Optimize the code to dedup references recorded in a bundle file.
* kn/bundle-dedup-optim:
bundle: fix non-linear performance scaling with refs
t6020: test for duplicate refnames in bundle creation
"make perf" fixes.
* pb/perf-test-fixes:
p7821: fix instructions for testing with threads
p9210: fix 'scalar clone' when running from a detached HEAD
p7821: fix test_perf invocation for prereqs
Incorrect sorting of refs with bytes with high-bit set on platforms
with signed char led to a BUG, which has been corrected.
* ps/refname-avail-check-optim:
refs/packed: fix BUG when seeking refs with UTF-8 characters
Remove remnants of the recursive merge strategy backend, which was
superseded by the ort merge strategy.
* en/merge-recursive-debug:
builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm
merge-recursive.[ch]: thoroughly debug these
merge, sequencer: switch recursive merges over to ort
sequencer: switch non-recursive merges over to ort
merge-ort: enable diff-algorithms other than histogram
builtin/merge-recursive: switch to using merge_ort_generic()
checkout: replace merge_trees() with merge_ort_nonrecursive()
"git blame --porcelain" mode now talks about unblamable lines and
lines that are blamed to an ignored commit.
* kn/blame-porcelain-unblamable:
blame: print unblamable and ignored commits in porcelain mode
"git fetch [<remote>]" with only the configured fetch refspec
should be the only thing to update refs/remotes/<remote>/HEAD,
but the code was overly eager to do so in other cases.
* jk/fetch-follow-remote-head-fix:
fetch: make set_head() call easier to read
fetch: don't ask for remote HEAD if followRemoteHEAD is "never"
fetch: only respect followRemoteHEAD with configured refspecs
"git cat-file --batch" and friends learned to allow "--filter=" to
omit certain objects, just like the transport layer does.
* ps/cat-file-filter-batch:
builtin/cat-file: use bitmaps to efficiently filter by object type
builtin/cat-file: deduplicate logic to iterate over all objects
pack-bitmap: introduce function to check whether a pack is bitmapped
pack-bitmap: add function to iterate over filtered bitmapped objects
pack-bitmap: allow passing payloads to `show_reachable_fn()`
builtin/cat-file: support "object:type=" objects filter
builtin/cat-file: support "blob:limit=" objects filter
builtin/cat-file: support "blob:none" objects filter
builtin/cat-file: wire up an option to filter objects
builtin/cat-file: introduce function to report object status
builtin/cat-file: rename variable that tracks usage
"make test" used to have a hard dependency on (basic) Perl; tests
have been rewritten help environment with NO_PERL test the build as
much as possible.
* ps/test-wo-perl-prereq:
t5703: refactor test to not depend on Perl
t5316: refactor `max_chain()` to not depend on Perl
t0210: refactor trace2 scrubbing to not use Perl
t0021: refactor `generate_random_characters()` to not depend on Perl
t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl
t/lib-t6000: refactor `name_from_description()` to not depend on Perl
t/lib-gpg: refactor `sanitize_pgp()` to not depend on Perl
t: refactor tests depending on Perl for textconv scripts
t: refactor tests depending on Perl to print data
t: refactor tests depending on Perl substitution operator
t: refactor tests depending on Perl transliteration operator
Makefile: stop requiring Perl when running tests
meson: stop requiring Perl when tests are enabled
t: adapt existing PERL prerequisites
t: introduce PERL_TEST_HELPERS prerequisite
t: adapt `test_readlink()` to not use Perl
t: adapt `test_copy_bytes()` to not use Perl
t: adapt character translation helpers to not use Perl
t: refactor environment sanitization to not use Perl
t: skip chain lint when PERL_PATH is unset
"git help --build-options" reports SHA-1 and SHA-256 backends used
in the build.
* jt/help-sha-backend-info-in-build-options:
help: include unsafe SHA-1 build info in version
help: include SHA implementation in version info
Updating multiple references have only been possible in all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner. A new "best effort batches of updates" mode
has been introduced.
* kn/non-transactional-batch-updates:
update-ref: add --batch-updates flag for stdin mode
refs: support rejection in batch updates during F/D checks
refs: implement batch reference update support
refs: introduce enum-based transaction error types
refs/reftable: extract code from the transaction preparation
refs/files: remove duplicate duplicates check
refs: move duplicate refname update check to generic layer
refs/files: remove redundant check in split_symref_update()
Auth-related (and unrelated) error handling in send-email has been
made more robust.
* zy/send-email-error-handling:
send-email: finer-grained SMTP error handling
send-email: capture errors in an eval {} block
"git rev-list" learns machine-parsable output format that delimits
each field with NUL.
* jt/rev-list-z:
rev-list: support NUL-delimited --missing option
rev-list: support NUL-delimited --boundary option
rev-list: support delimiting objects with NUL bytes
rev-list: refactor early option parsing
rev-list: inline `show_object_with_name()` in `show_object()`
Random build fixes.
* ps/misc-build-fixes:
ci: use Visual Studio for win+meson job on GitHub Workflows
meson: distinguish build and target host binaries
meson: respect 'tests' build option in contrib
gitweb: fix generation of "gitweb.js"
meson: fix handling of '-Dcurl=auto'
A few traditional unit tests have been rewritten to use the clar
framework.
* sk/clar-trailer-urlmatch-norm-test:
t/unit-tests: convert urlmatch-normalization test to clar
t/unit-tests: convert trailer test to use clar
The image pointed to by the fedora:latest tag has moved from fedora
41 to 42. The fedora 41 container images have awk installed while
the fedora 42 images do not. That change is most likely just part
of reducing the size of the base container images.
In both AlmaLinux and Fedora (as well as other RHEL
derivatives/relatives), awk is provided by the gawk package.
On Fedora, `dnf install awk` would work, by using the package
filelist data to determine that /usr/bin/awk is provided by gawk and
installs gawk as a result.
On AlmaLinux (8 & 9, by quick testing by Todd), that is not the case
and you'd need to use `dnf install gawk` or `dnf install '*bin/awk'`
to get it installed. Having said that, awk _is_ included in the
current AlmaLinux 8 and 9 images, so it isn't strictly needed. But
it's probably better to be explicit that we need it installed, as a
defense against some future change to the AlmaLinux container
removing awk.
Because we know that on both of these distros, our scripts that call
for 'awk' had been using 'gawk' that was installed as part of the
base image, let's make sure that we explicitly install 'gawk'. If
the image already has it, it would be a no-op that does not cause
breakage.
Suggested-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git-merge-file" documentation source, which has lines that look
like conflict markers, lacked custom conflict marker size defined,
which has been corrected..
* pw/custom-conflict-marker-size-for-merge-related-docs:
merge-file doc: set conflict-marker-size attribute
Code clean-up.
* js/comma-semicolon-confusion:
detect-compiler: detect clang even if it found CUDA
clang: warn when the comma operator is used
compat/regex: explicitly mark intentional use of the comma operator
wildmatch: avoid using of the comma operator
diff-delta: avoid using the comma operator
xdiff: avoid using the comma operator unnecessarily
clar: avoid using the comma operator unnecessarily
kwset: avoid using the comma operator unnecessarily
rebase: avoid using the comma operator unnecessarily
remote-curl: avoid using the comma operator unnecessarily
"git clone" still gave the message about the default branch name;
this message has been turned into an advice message that can be
turned off.
* jt/clone-guess-remote-head-fix:
advice: allow disabling default branch name advice
builtin/clone: suppress unexpected default branch advice
remote: allow `guess_remote_head()` to suppress advice
The job to coalesce loose objects into packfiles in "git
maintenance" now has configurable batch size.
* ds/maintenance-loose-objects-batchsize:
maintenance: add loose-objects.batchSize config
maintenance: force progress/no-quiet to children
Fix lockfile contention in reftable code on Windows.
* ps/mingw-creat-excl-fix:
compat/mingw: fix EACCESS when opening files with `O_CREAT | O_EXCL`
meson: fix compat sources when compiling with MSVC
"git reflog" learns "drop" subcommand, that discards the entire
reflog data for a ref.
* kn/reflog-drop:
reflog: implement subcommand to drop reflogs
reflog: improve error for when reflog is not found
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
Fix our use of zlib corner cases.
* jk/zlib-inflate-fixes:
unpack_loose_rest(): rewrite return handling for clarity
unpack_loose_rest(): simplify error handling
unpack_loose_rest(): never clean up zstream
unpack_loose_rest(): avoid numeric comparison of zlib status
unpack_loose_header(): avoid numeric comparison of zlib status
git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT
unpack_loose_header(): fix infinite loop on broken zlib input
unpack_loose_header(): report headers without NUL as "bad"
unpack_loose_header(): simplify next_out assignment
loose_object_info(): BUG() on inflating content with unknown type
In 7b31b55db1 (perf: amend the grep tests to test grep.threads,
2017-12-29), p7821 was tweaked to test the performance of 'git grep'
under different number of threads. These tests are run if
GIT_PERF_GREP_THREADS is set to a list of thread numbers, but the
comment at the top of the file instead mentions GIT_PERF_7821_THREADS.
Fix the comment.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packed_git_window_size and packed_git_limit are not used anywhere in
the codebase. A search found that all references were removed in
d284713bae (config: make `packed_git_(limit|window_size)` non-global
variables, 2024-12-03), except the ones in this file, as they were moved
to struct repo_settings.
Remove packed_git_window_size and packed_git_limit from environment.h.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a typo in a comment in refs.c: "checking checking" → "checking".
Signed-off-by: Christian Fredrik Johnsen <christian@johnsen.no>
Acked-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported that using git-pull(1) in a repository whose remote
contains branches with emojis leads to the following bug:
$ git pull
remote: Enumerating objects: 161255, done.
remote: Counting objects: 100% (55884/55884), done.
remote: Compressing objects: 100% (5518/5518), done.
remote: Total 161255 (delta 54253), reused 50509 (delta 50364),
pack-reused 105371 (from 4)
Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done.
Resolving deltas: 100% (118048/118048), completed with 13416 local objects.
From github.com:github/github
97ab7ae3f3745..8fb2f9fa180ed master -> origin/master
[...snip many screenfuls of updates to origin remotes...]
BUG: refs/packed-backend.c:984: packed-refs backend yielded reference
preceding its prefix
error: fetch died of signal 6
This issue bisects to 22600c0452 (refs/iterator: implement seeking for
packed-ref iterators, 2025-03-12) where we have implemented seeking for
the packed-ref iterator. As part of that change we introduced a check
that verifies that the iterator only returns refnames bigger than the
prefix. In theory, this check should always hold: when a prefix is set
we know that we would've seeked that prefix first, so we should never
see a reference sorting before that prefix.
But in practice the check itself is misbehaving when handling unicode
characters. The particular issue triggered with a branch that got the
"shaved ice" unicode character in its name, which is composed of the
bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname
"refs/heads/<shaved-ice>" to something like "refs/heads/z", and it
specifically hits when comparing the first byte, "0xEE".
The root cause is that the most-significant bit of 0xEE is set. The
`refname` and `prefix` pointers that we use to compare bytes with one
another are both pointers to signed characters. As such, when we
dereference the 0xEE byte the result is a _negative_ value, and this
value will of course compare smaller than "z".
We can see that this issue is avoided in `cmp_packed_refname()`, where
we explicitly cast each byte to its unsigned form. Fix the bug by doing
the same in `packed_ref_iterator_advance()`.
Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We ignore any error returned from set_head(), but 638060dcb9 (fetch
set_head: refactor to use remote directly, 2025-01-26) left its call in
a noop "if" conditional as a sort of note-to-self.
When c834d1a7ce (fetch: only respect followRemoteHEAD with configured
refspecs, 2025-03-18) added a "do_set_head" flag, it was rolled into the
same conditional, putting set_head() on the right-hand side of a
short-circuit AND.
That's not wrong, but it really hides the point of the line, which
is (maybe) calling the function.
Instead, let's have a full if() block for the flag, and then our comment
(with some rewording) will be sufficient to clarify the error handling.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `sparse` job still uses the `ubuntu-20.04` runner pool, but that
pool is about to go away, so let's stop using it.
There is no `sparse-22.04` artifact provided by the "Build sparse for
Ubuntu" Azure Pipeline, but that is not necessary anyway because Ubuntu
22.04 has the `sparse` package: https://packages.ubuntu.com/jammy/sparse
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With at least glibc 2.39, glibc provides a function declaration that
matches with this POSIX interface:
int regexec(const regex_t *restrict preg, const char *restrict string,
size_t nmatch, regmatch_t pmatch[restrict], int eflags);
such prototype requires variable-length-array for `pmatch'.
Thus, sparse reports this error:
> ../add-patch.c: note: in included file (through ../git-compat-util.h):
> /usr/include/regex.h:682:41: error: undefined identifier '__nmatch'
> /usr/include/regex.h:682:41: error: bad constant expression type
> /usr/include/regex.h:682:41: error: Variable length array is used.
Note: `__nmatch' is POSIX's nmatch.
The glibc's intention is informing their users to provides a large
enough buffer to hold `__nmatch' results and provides diagnosis if
necessary. It's merely a glibc' implementation detail.
Hide that usage from sparse by using standard C11's macro:
__STDC_NO_VLA__
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in f29f1990 (config:
teach repo_config to allow `repo` to be NULL, 2025-03-08) to allow
`repo` to be NULL, no need to check if `repo` is NULL before calling
`repo_config()`.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git bundle create' command has non-linear performance with the
number of refs in the repository. Benchmarking the command shows that
a large portion of the time (~75%) is spent in the
`object_array_remove_duplicates()` function.
The `object_array_remove_duplicates()` function was added in
b2a6d1c686 (bundle: allow the same ref to be given more than once,
2009-01-17) to skip duplicate refs provided by the user from being
written to the bundle. Since this is an O(N^2) algorithm, in repos with
large number of references, this can take up a large amount of time.
Let's instead use a 'strset' to skip duplicates inside
`write_bundle_refs()`. This improves the performance by around 6 times
when tested against in repository with 100000 refs:
Benchmark 1: bundle (refcount = 100000, revision = master)
Time (mean ± σ): 14.653 s ± 0.203 s [User: 13.940 s, System: 0.762 s]
Range (min … max): 14.237 s … 14.920 s 10 runs
Benchmark 2: bundle (refcount = 100000, revision = HEAD)
Time (mean ± σ): 2.394 s ± 0.023 s [User: 1.684 s, System: 0.798 s]
Range (min … max): 2.364 s … 2.425 s 10 runs
Summary
bundle (refcount = 100000, revision = HEAD) ran
6.12 ± 0.10 times faster than bundle (refcount = 100000, revision = master)
Previously, `object_array_remove_duplicates()` ensured that both the
refname and the object it pointed to were checked for duplicates. The
new approach, implemented within `write_bundle_refs()`, eliminates
duplicate refnames without comparing the objects they reference. This
works because, for bundle creation, we only need to prevent duplicate
refs from being written to the bundle header. The `revs->pending` array
can contain duplicates of multiple types.
First, references which resolve to the same refname. For e.g. "git
bundle create out.bdl master master" or "git bundle create out.bdl
refs/heads/master refs/heads/master" or "git bundle create out.bdl
master refs/heads/master". In these scenarios we want to prevent writing
"refs/heads/master" twice to the bundle header. Since both the refnames
here would point to the same object (unless there is a race), we do not
need to check equality of the object.
Second, refnames which are duplicates but do not point to the same
object. This can happen when we use an exclusion criteria. For e.g. "git
bundle create out.bdl master master^!", Here `revs->pending` would
contain two elements, both with refname set to "master". However, each
of them would be pointing to an INTERESTING and UNINTERESTING object
respectively. Since we only write refnames with INTERESTING objects to
the bundle header, we perform our duplicate checks only on such objects.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit b2a6d1c686 (bundle: allow the same ref to be given more than
once, 2009-01-17) added functionality to detect and remove duplicate
refnames from being added during bundle creation. This ensured that
clones created from such bundles wouldn't barf about duplicate refnames.
The following commit will add some optimizations to make this check
faster, but before doing that, it would be optimal to add tests to
capture the current behavior.
Add tests to capture duplicate refnames provided by the user during
bundle creation. This can be a combination of:
- refnames directly provided by the user.
- refname duplicate by using the '--all' flag alongside manual
references being provided.
- exclusion criteria provided via a refname "main^!".
- short forms of refnames provided, "main" vs "refs/heads/main".
Note that currently duplicates due to usage of short and long forms goes
undetected. This should be fixed with the optimizations made in the next
commit.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>