kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Patrick Steinhardt	8e0a1ec076	builtin/maintenance: introduce "reflog-expire" task By default, git-maintenance(1) uses the "gc" task to ensure that the repository is well-maintained. This can be changed, for example by either explicitly configuring which tasks should be enabled or by using the "incremental" maintenance strategy. If so, git-maintenance(1) does not know to expire reflog entries, which is a subtask that git-gc(1) knows to perform for the user. Consequently, the reflog will grow indefinitely unless the user manually trims it. Introduce a new "reflog-expire" task that plugs this gap: - When running the task directly, then we simply execute `git reflog expire --all`, which is the same as git-gc(1). - When running git-maintenance(1) with the `--auto` flag, then we only run the task in case the "HEAD" reflog has at least N reflog entries that would be discarded. By default, N is set to 100, but this can be configured via "maintenance.reflog-expire.auto". When a negative integer has been provided we always expire entries, zero causes us to never expire entries, and a positive value specifies how many entries need to exist before we consider pruning the entries. Note that the condition for the `--auto` flags is merely a heuristic and optimized for being fast. This is because `git maintenance run --auto` will be executed quite regularly, so scanning through all reflogs would likely be too expensive in many repositories. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:27 -07:00
Patrick Steinhardt	3fef24ac3f	builtin/gc: split out function to expire reflog entries We're about to introduce a new task for git-maintenance(1) that knows to expire reflog entries. The logic will be shared with git-gc(1), which already knows how to do this. Pull out the common logic into a separate function so that we can share the implementation between both builtins. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:27 -07:00
Patrick Steinhardt	d20fc193b6	builtin/reflog: make functions regarding `reflog_expire_options` public Make functions that are required to manage `reflog_expire_options` available elsewhere by moving them into "reflog.c" and exposing them in the corresponding header. The functions will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:27 -07:00
Patrick Steinhardt	964f364de9	builtin/reflog: stop storing per-reflog expiry dates globally As described in the preceding commit, the per-reflog expiry dates are stored in a global pair of variables. Refactor the code so that they are contained in `struct reflog_expire_options` to make the structure useful in other contexts. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:26 -07:00
Patrick Steinhardt	8565827570	builtin/reflog: stop storing default reflog expiry dates globally When expiring reflog entries, it is possible to configure expiry dates that depend on the name of the reflog. This requires us to store a couple of different expiry dates: - The default expiry date for reflog entries that aren't otherwise specified. - The per-reflog expiry date. - The currently active set of expiry dates for a given reference. While the last item is stored in `struct reflog_expire_options`, the other items aren't, which makes it hard to reuse the structure in other places. Refactor the code so that the default expiry date is stored as part of the structure. The per-reflog expiry dates will be adapted accordingly in the subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:26 -07:00
Patrick Steinhardt	2ed8008399	reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options` We're about to expose `struct cmd_reflog_expire_cb` via "reflog.h" so that we can also use this structure in "builtin/gc.c". Once we make it accessible to a wider scope though it becomes awkwardly named, as it isn't only useful in the context of a callback. Instead, the function is containing all kinds of options relevant to whether or not a reflog entry should be expired. Rename the structure to `reflog_expire_options` to prepare for this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-08 07:53:25 -07:00
Karthik Nayak	4d253071dd	blame: print unblamable and ignored commits in porcelain mode The 'git-blame(1)' command allows users to ignore specific revisions via the '--ignore-rev <rev>' and '--ignore-revs-file <file>' flags. These flags are often combined with the 'blame.markIgnoredLines' and 'blame.markUnblamableLines' config options. These config options prefix ignored and unblamable lines with a '?' and '*', respectively. However, this option was never extended to the porcelain mode of 'git-blame(1)'. Since the documentation does not indicate this exclusion, it is a bug. Fix this by printing 'ignored' and 'unblamable' respectively for the options when using the porcelain modes. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Toon Claes <toon@iotcl.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:50:18 -07:00
Patrick Steinhardt	8002e8ee18	builtin/cat-file: use bitmaps to efficiently filter by object type While it is now possible to filter objects by type, this mechanism is for now mostly a convenience. Most importantly, we still have to iterate through the whole packfile to find all objects of a specific type. This can be prohibitively expensive depending on the size of the packfiles. It isn't really possible to do better than this when only considering a packfile itself, as the order of objects is not fixed. But when we have a packfile with a corresponding bitmap, either because the packfile itself has one or because the multi-pack index has a bitmap for it, then we can use these bitmaps to improve the runtime. While bitmaps are typically used to compute reachability of objects, they also contain one bitmap per object type that encodes which object has what type. So instead of reading through the whole packfile(s), we can use the bitmaps and iterate through the type-specific bitmap. Typically, only a subset of packfiles will have a bitmap. But this isn't really much of a problem: we can use bitmaps when available, and then use the non-bitmap walk for every packfile that isn't covered by one. Overall, this leads to quite a significant speedup depending on how many objects of a certain type exist. The following benchmarks have been executed in the Chromium repository, which has a 50GB packfile with almost 25 million objects. As expected, there isn't really much of a change in performance without an object filter: Benchmark 1: cat-file with no-filter (revision = HEAD~) Time (mean ± σ): 89.675 s ± 4.527 s [User: 40.807 s, System: 10.782 s] Range (min … max): 83.052 s … 96.084 s 10 runs Benchmark 2: cat-file with no-filter (revision = HEAD) Time (mean ± σ): 88.991 s ± 2.488 s [User: 42.278 s, System: 10.305 s] Range (min … max): 82.843 s … 91.271 s 10 runs Summary cat-file with no-filter (revision = HEAD) ran 1.01 ± 0.06 times faster than cat-file with no-filter (revision = HEAD~) We still have to scan through all objects as we yield all of them, so using the bitmap in this case doesn't really buy us anything. What is noticeable in this benchmark is that we're I/O-bound, not CPU-bound, as can be seen from the user/system runtimes, which combined are way lower than the overall benchmarked runtime. But when we do use a filter we can see a significant improvement: Benchmark 1: cat-file with filter=object:type=commit (revision = HEAD~) Time (mean ± σ): 86.444 s ± 4.081 s [User: 36.830 s, System: 11.312 s] Range (min … max): 80.305 s … 93.104 s 10 runs Benchmark 2: cat-file with filter=object:type=commit (revision = HEAD) Time (mean ± σ): 2.089 s ± 0.015 s [User: 1.872 s, System: 0.207 s] Range (min … max): 2.073 s … 2.119 s 10 runs Summary cat-file with filter=object:type=commit (revision = HEAD) ran 41.38 ± 1.98 times faster than cat-file with filter=object:type=commit (revision = HEAD~) This is because we don't have to scan through all packfiles anymore, but can instead directly look up relevant objects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:52 -07:00
Patrick Steinhardt	d5ec7027bc	builtin/cat-file: deduplicate logic to iterate over all objects Pull out a common function that allows us to iterate over all objects in a repository. Right now the logic is trivial and would only require two function calls, making this refactoring a bit pointless. But in the next commit we will iterate on this logic to make use of bitmaps, so this is about to become a bit more complex. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:52 -07:00
Patrick Steinhardt	3d45483846	pack-bitmap: allow passing payloads to `show_reachable_fn()` The `show_reachable_fn` callback is used by a couple of functions to present reachable objects to the caller. The function does not provide a way for the caller to pass a payload though, which is functionality that we'll require in a subsequent commit. Change the callback type to accept a payload and adapt all callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:51 -07:00
Patrick Steinhardt	8fa9fe171a	builtin/cat-file: support "object:type=" objects filter Implement support for the "object:type=" filter in git-cat-file(1), which causes us to omit all objects that don't match the provided object type. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:51 -07:00
Patrick Steinhardt	dbe1b32d59	builtin/cat-file: support "blob:limit=" objects filter Implement support for the "blob:limit=" filter in git-cat-file(1), which causes us to omit all blobs that are bigger than a certain size. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:50 -07:00
Patrick Steinhardt	3794e9bf98	builtin/cat-file: support "blob:none" objects filter Implement support for the "blob:none" filter in git-cat-file(1), which causes us to omit all blobs. Note that this new filter requires us to read the object type via `oid_object_info_extended()` in `batch_object_write()`. But as we try to optimize away reading objects from the database the `data->info.typep` pointer may not be set. We thus have to adapt the logic to conditionally set the pointer in cases where the filter is given. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:50 -07:00
Patrick Steinhardt	eb83e4c64b	builtin/cat-file: wire up an option to filter objects In batch mode, git-cat-file(1) enumerates all objects and prints them by iterating through both loose and packed objects. This works without considering their reachability at all, and consequently most options to filter objects as they exist in e.g. git-rev-list(1) are not applicable. In some situations it may still be useful though to filter objects based on properties that are inherent to them. This includes the object size as well as its type. Such a filter already exists in git-rev-list(1) with the `--filter=` command line option. While this option supports a couple of filters that are not applicable to our usecase, some of them are quite a neat fit. Wire up the filter as an option for git-cat-file(1). This allows us to reuse the same syntax as in git-rev-list(1) so that we don't have to reinvent the wheel. For now, we die when any of the filter options has been passed by the user, but they will be wired up in subsequent commits. Further note that the filters that we are about to introduce don't significantly speed up the runtime of git-cat-file(1). While we can skip emitting a lot of objects in case they are uninteresting to us, the majority of time is spent reading the packfile, which is bottlenecked by I/O and not the processor. This will change though once we start to make use of bitmaps, which will allow us to skip reading the whole packfile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:50 -07:00
Patrick Steinhardt	1914ae0d70	builtin/cat-file: introduce function to report object status We have multiple callsites that report the status of an object, for example when the objec tis missing or its name is ambiguous. We're about to add a couple more such callsites to report on "excluded" objects. Prepare for this by introducing a new function `report_object_status()` that encapsulates the functionality. Note that this function also flushes stdout, which is a requirement so that request-response style batched modes can learn about the status before proceeding to the next object. We already flush correctly at all existing callsites, even though the flush in `batch_one_object()` only comes after the switch statement. That flush is now redundant, and we could in theory deduplicate it by moving it into all branches that don't use `report_object_status()`. But that doesn't quite feel sensible: - The duplicate flush should ultimately just be a no-op for us and thus shouldn't impact performance significantly. - By keeping the flush in `report_object_status()` we ensure that all future callers get semantics correct. So let's just be pragmatic and live with the duplicated flush. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:49 -07:00
Patrick Steinhardt	84a1d0039a	builtin/cat-file: rename variable that tracks usage The usage strings for git-cat-file(1) that we pass to `parse_options()` and `usage_msg_optf()` are stored in a variable called `usage`. This variable shadows the declaration of `usage()`, which we'll want to use in a subsequent commit. Rename the variable to `builtin_catfile_usage`, which is in line with how the variable is typically called in other builtins. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:43:49 -07:00
Junio C Hamano	8a753b9a44	Merge branch 'jh/hash-init-fixes' An earlier code refactoring of the hash machinery missed a few required calls to init_fn. * jh/hash-init-fixes: index-pack, unpack-objects: restore missing ->init_fn	2025-04-07 14:23:18 -07:00
Junio C Hamano	58a8c38226	Merge branch 'tb/combine-cruft-below-size' "git repack" learned "--combine-cruft-below-size" option that controls how cruft-packs are combined. * tb/combine-cruft-below-size: repack: begin combining cruft packs with `--combine-cruft-below-size` repack: avoid combining cruft packs with `--max-cruft-size` t/t7704-repack-cruft.sh: consolidate `write_blob()` t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests	2025-04-07 14:23:18 -07:00
Junio C Hamano	477cc3b6c7	Merge branch 'jc/name-rev-stdin' Using "git name-rev --stdin" as an example, improve the framework to prepare tests to pretend to be in the future where the breaking changes have already happened. * jc/name-rev-stdin: name-rev: remove "--stdin" support t6120: further modernize t6120: avoid hiding "git" exit status t: introduce WITH_BREAKING_CHANGES prerequisite t: extend test_lazy_prereq t: document test_lazy_prereq	2025-04-07 14:23:17 -07:00
Arnav Bhate	d2fc29380a	rm: fix sign comparison warnings There are multiple places in loops, where a signed and an unsigned data type are compared. Git uses a mix of signed and unsigned types to store lengths of arrays. This sometimes leads to using a signed index for an array whose length is stored in an unsigned variable or vice versa. get_ours_cache_pos is a special case where i, though derived from a signed variable is never negative. Move this part to the caller side and make i an unsigned argument of the function. Rename i to pos to make it descriptive, now that it is a function argument. Replace signed data types with unsigned data types and vice versa wherever necessary. Where both signed and unsigned data types have been used, define a new variable in the scope of the for loop for use as the iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS. Signed-off-by: Arnav Bhate <bhatearnav@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-29 01:04:40 -07:00
Junio C Hamano	ff926a6d1b	Merge branch 'en/random-cleanups' Miscellaneous code clean-ups. * en/random-cleanups: merge-ort: remove extraneous word in comment merge-ort: fix accidental strset<->strintmap t7615: be more explicit about diff algorithm used t6423: fix a comment that accidentally reversed two commits stash: remove merge-recursive.h include	2025-03-29 16:39:10 +09:00
Junio C Hamano	27fe152e88	Merge branch 'tb/multi-cruft-pack-refresh-fix' Certain "cruft" objects would have never been refreshed when there are multiple cruft packs in the repository, which has been corrected. * tb/multi-cruft-pack-refresh-fix: builtin/pack-objects.c: freshen objects from existing cruft packs	2025-03-29 16:39:09 +09:00
Junio C Hamano	650b2e2fdb	Merge branch 'jk/fetch-ref-prefix-cleanup' In protocol v2 where the refs advertisement is constrained, we try to tell the server side not to limit the advertisement when there is no specific need to, which has been the source of confusion and recent bugs. Revamp the logic to simplify. * jk/fetch-ref-prefix-cleanup: fetch: use ref prefix list to skip ls-refs fetch: avoid ls-refs only to ask for HEAD symref update fetch: stop protecting additions to ref-prefix list fetch: ask server to advertise HEAD for config-less fetch refspec_ref_prefixes(): clean up refspec_item logic t5516: beef up exact-oid ref prefixes test t5516: drop NEEDSWORK about v2 reachability behavior t5516: prefer "oid" to "sha1" in some test titles t5702: fix typo in test name	2025-03-29 16:39:08 +09:00
Junio C Hamano	eb7923be1f	Merge branch 'en/merge-ort-prepare-to-remove-recursive' First step of deprecating and removing merge-recursive. * en/merge-ort-prepare-to-remove-recursive: am: switch from merge_recursive_generic() to merge_ort_generic() merge-ort: fix merge.directoryRenames=false t3650: document bug when directory renames are turned off merge-ort: support having merge verbosity be set to 0 merge-ort: allow rename detection to be disabled merge-ort: add new merge_ort_generic() function	2025-03-29 16:39:07 +09:00
Junio C Hamano	8d6413a1be	Merge branch 'ps/refname-avail-check-optim' The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`	2025-03-29 16:39:07 +09:00
Junio C Hamano	01d17c0530	Merge branch 'cc/signed-fast-export-import' "git fast-export \| git fast-import" learns to deal with commit and tag objects with embedded signatures a bit better. * cc/signed-fast-export-import: fast-export, fast-import: add support for signed-commits fast-export: do not modify memory from get_commit_buffer git-fast-export.adoc: clarify why 'verbatim' may not be a good idea fast-export: rename --signed-tags='warn' to 'warn-verbatim' fast-export: fix missing whitespace after switch git-fast-import.adoc: add missing LF in the BNF	2025-03-29 16:39:07 +09:00
Johannes Schindelin	38c696d66b	rebase: avoid using the comma operator unnecessarily The comma operator is a somewhat obscure C feature that is often used by mistake and can even cause unintentional code flow. Better use a semicolon instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-28 17:38:08 -07:00
Junio C Hamano	1a764cdbdc	Merge branch 'ua/some-builtins-wo-the-repository' A handful of built-in command implementations have been rewritten to use the repository instance supplied by git.c:run_builtin(), its caller. * ua/some-builtins-wo-the-repository: builtin/checkout-index: stop using `the_repository` builtin/for-each-ref: stop using `the_repository` builtin/ls-files: stop using `the_repository` builtin/pack-refs: stop using `the_repository` builtin/send-pack: stop using `the_repository` builtin/verify-commit: stop using `the_repository` builtin/verify-tag: stop using `the_repository` config: teach repo_config to allow `repo` to be NULL	2025-03-26 16:26:10 +09:00
Junio C Hamano	de35b7b3ff	Merge branch 'sj/ref-consistency-checks-more' "git fsck" becomes more careful when checking the refs. * sj/ref-consistency-checks-more: builtin/fsck: add `git refs verify` child process packed-backend: check whether the "packed-refs" is sorted packed-backend: add "packed-refs" entry consistency check packed-backend: check whether the refname contains NUL characters packed-backend: add "packed-refs" header consistency check packed-backend: check if header starts with "# pack-refs with: " packed-backend: check whether the "packed-refs" is regular file builtin/refs: get worktrees without reading head information t0602: use subshell to ensure working directory unchanged	2025-03-26 16:26:10 +09:00
Junio C Hamano	f50df872a4	Merge branch 'jt/diff-pairs' A post-processing filter for "diff --raw" output has been introduced. * jt/diff-pairs: builtin/diff-pairs: allow explicit diff queue flush builtin: introduce diff-pairs command diff: add option to skip resolving diff statuses diff: return diff_filepair from diff queue helpers	2025-03-26 16:26:09 +09:00
Justin Tobler	c039a46e99	builtin/clone: suppress unexpected default branch advice In `199f44cb2e` (builtin/clone: allow remote helpers to detect repo, 2024-02-27), clones started partially initializing the refdb before executing the remote helpers by creating a HEAD file and "refs/" directory. This has resulted in some scenarios where git-clone(1) now prints the default branch name advice message where it previously did not. A side-effect of the HEAD file already existing, is that computation of the default branch name is handled later in execution. This matters because prior to `97abaab5f6` (refs: drop `git_default_branch_name()`, 2024-05-17), the default branch value would be computed during its first execution and cached. Subsequent invocations would simply return the cached value. Since the next `git_default_branch_name()` call site, which is invoked through `guess_remote_head()`, is not configured to suppress the advice message, computing the default branch name results in the advice message being printed. Configure `guess_remote_head()` to suppress the advice message, restoring the previous behavior. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-25 16:09:28 -07:00
Justin Tobler	d5d284df91	remote: allow `guess_remote_head()` to suppress advice The `repo_default_branch_name()` invoked through `guess_remote_head()` is configured to always display the default branch advice message. Adapt `guess_remote_head()` to accept flags and convert the `all` parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable suppression of advice messages. Call sites are updated accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-25 16:09:27 -07:00
Derrick Stolee	6540560fd6	maintenance: add loose-objects.batchSize config The 'loose-objects' task of 'git maintenance run' first deletes loose objects that exit within packfiles and then collects loose objects into a packfile. This second step uses an implicit limit of fifty thousand that cannot be modified by users. Add a new config option that allows this limit to be adjusted or ignored entirely. While creating tests for this option, I noticed that actually there was an off-by-one error due to the strict comparison in the limit check. I considered making the limit check turn true on equality, but instead I thought to use INT_MAX as a "no limit" barrier which should mean it's never possible to hit the limit. Thus, a new decrement to the limit is provided if the value is positive. (The restriction to positive values is to avoid underflow if INT_MIN is configured.) Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-23 23:06:01 -07:00
Derrick Stolee	286183da99	maintenance: force progress/no-quiet to children The --no-quiet option for 'git maintenance run' is supposed to indicate that progress should happen even while ignoring the value of isatty(2). However, Git implicitly asks child processes to check isatty(2) since these arguments are not passed through. The pass through of --no-quiet will be useful in a test in the next change. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-23 23:06:01 -07:00
Taylor Blau	27afc272c4	midx: implement writing incremental MIDX bitmaps Now that the pack-bitmap machinery has learned how to read and interact with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery (and relevant callers from within the MIDX machinery) to write such bitmaps. The details for doing so are mostly straightforward. The main changes are as follows: - find_object_pos() now makes use of an extra MIDX parameter which is used to locate the bit positions of objects which are from previous layers (and thus do not exist in the current layer's pack_order field). (Note also that the pack_order field is moved into struct write_midx_context to further simplify the callers for write_midx_bitmap()). - bitmap_writer_build_type_index() first determines how many objects precede the current bitmap layer and offsets the bits it sets in each respective type-level bitmap by that amount so they can be OR'd together. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 04:34:16 -07:00
Justin Tobler	b9fadeead7	builtin/fetch: avoid aborting closed reference transaction As part of the reference transaction commit phase, the transaction is set to a closed state regardless of whether it was successful of not. Attempting to abort a closed transaction via `ref_transaction_abort()` results in a `BUG()`. In `c92abe71df` (builtin/fetch: fix leaking transaction with `--atomic`, 2024-08-22), logic to free a transaction after the commit phase is moved to the centralized exit path. In cases where the transaction commit failed, this results in a closed transaction being aborted and signaling a bug. Free the transaction and set it to NULL when the commit fails. This allows the exit path to correctly handle the error without attempting to abort the transaction. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:59:46 -07:00
Taylor Blau	484d7adcda	repack: begin combining cruft packs with `--combine-cruft-below-size` The previous commit changed the behavior of repack's '--max-cruft-size' to specify a cruft pack-specific override for '--max-pack-size'. Introduce a new flag, '--combine-cruft-below-size' which is a replacement for the old behavior of '--max-cruft-size'. This new flag does explicitly what it says: it combines together cruft packs which are smaller than a given threshold, and leaves alone ones which are larger. This accomplishes the original intent of '--max-cruft-size', which was to avoid repacking cruft packs larger than the given threshold. The new behavior is slightly different. Instead of building up small packs together until the threshold is met, '--combine-cruft-below-size' packs up all cruft packs smaller than the threshold. This means that we may make a pack much larger than the given threshold (e.g., if you aggregate 5 packs which are each 99 MiB in size with a threshold of 100 MiB). But that's OK: the point isn't to restrict the size of the cruft packs we generate, it's to avoid working with ones that have already grown too large. If repositories still want to limit the size of the generated cruft pack(s), they may use '--max-cruft-size'. There's some minor test fallout as a result of the slight differences in behavior between the old meaning of '--max-cruft-size' and the behavior of '--combine-cruft-below-size'. In the test which is now called "--combine-cruft-below-size combines packs", we need to use the new flag over the old one to exercise that test's intended behavior. The remainder of the changes there are to improve the clarity of the comments. Suggested-by: Elijah Newren <newren@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:42:07 -07:00
Taylor Blau	0855ed966c	repack: avoid combining cruft packs with `--max-cruft-size` In `37dc6d8104` (builtin/repack.c: implement support for `--max-cruft-size`, 2023-10-02), we exposed new functionality that allowed repositories to specify the behavior of when we should combine multiple cruft packs together. This feature was designed to ensure that we never repacked cruft packs which were larger than the given threshold in order to provide tighter I/O bounds for repositories that have many unreachable objects. In essence, specifying '--max-cruft-size=N' instructed 'repack' to aggregate cruft packs together (in order of ascending size) until the combine size grows past 'N', and then make a new cruft pack whose contents includes the packs we rolled up. But this isn't quite how it works in practice. Suppose for example that we have two cruft packs which are each 100MiB in size. One might expect specifying "--max-cruft-size=200M" would combine these two packs together, and then avoid repacking them until a pruning GC takes place. In reality, 'repack' would try and aggregate these together, but writing a pack that is strictly smaller than 200 MiB (since pack-objects' "--max-pack-size" provides a strict bound for packs containing more than one object). So instead we'll write out a pack that is, say, 199 MiB in size, and then another 1 MiB pack containing the balance. If we later repack the repository without adding any new unreachable objects, we'll repeat the same exercise again, making the same 199 MiB and 1 MiB packs each time. This happens because of a poor choice to bolt the '--max-cruft-size' functionality onto pack-objects' '--max-pack-size', forcing us to generate packs which are always smaller than the provided threshold and thus subject to repacking. The following commit will introduce a new flag that implements something similar to the behavior above. Let's prepare for that by making repack's '--max-cruft-size' flag behave as an cruft pack-specific override for '--max-pack-size'. Do so by temporarily repurposing the 'collapse_small_cruft_packs()' function to instead generate a cruft pack using the same instructions as if we didn't specify any maximum pack size. The calling code looks something like: if (args->max_pack_size && !cruft_expiration) { collapse_small_cruft_packs(in, args->max_pack_size, existing); } else { for_each_string_list_item(item, &existing->non_kept_packs) fprintf(in, "-%s.pack\n", item->string); for_each_string_list_item(item, &existing->cruft_packs) fprintf(in, "-%s.pack\n", item->string); } This patch makes collapse_small_cruft_packs() behave identically to the 'else' arm of the conditional above. This repurposing of 'collapse_small_cruft_packs()' is intentional, since it will set us up nicely to introduce the new behavior in the following commit. Naturally, there is some test fallout in the test which exercises the old meaning of '--max-cruft-size'. Mark that test as failing for now to be dealt with in the following commit. Likewise, add a new test which explicitly tests the behavior of '--max-cruft-size' to place a hard limit on the size of any generated cruft pack(s). Note that this is a breaking change, as it alters the user-visible behavior of '--max-cruft-size'. But I'm OK changing this behavior in this instance, since the behavior wasn't accurate to begin with. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:42:07 -07:00
Justin Tobler	340e7523c0	rev-list: support NUL-delimited --missing option The `--missing={print,print-info}` option for git-rev-list(1) prints missing objects found while performing the object walk in the form: $ git rev-list --missing=print-info <rev> ?<oid> [SP <token>=<value>]... LF Add support for printing missing objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --missing=print-info <rev> <oid> NUL missing=yes NUL [<token>=<value> NUL]... In this mode, values containing special characters or spaces are printed as-is without being escaped or quoted. Instead of prefixing the missing OID with '?', a separate `missing=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:03 -07:00
Justin Tobler	1c3c1ab3d2	rev-list: support NUL-delimited --boundary option The `--boundary` option for git-rev-list(1) prints boundary objects found while performing the object walk in the form: $ git rev-list --boundary <rev> -<oid> LF Add support for printing boundary objects in a NUL-delimited format when the `-z` option is enabled. $ git rev-list -z --boundary <rev> <oid> NUL boundary=yes NUL In this mode, instead of prefixing the boundary OID with '-', a separate `boundary=yes` token/value pair is appended. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Justin Tobler	c3d59c2e70	rev-list: support delimiting objects with NUL bytes When walking objects, git-rev-list(1) prints each object entry on a separate line. Some options, such as `--objects`, may print additional information about tree and blob object on the same line in the form: $ git rev-list --objects <rev> <tree/blob oid> SP [<path>] LF Note that in this form the SP is appended regardless of whether the tree or blob object has path information available. Paths containing a newline are also truncated at the newline. Introduce the `-z` option for git-rev-list(1) which reformats the output to use NUL-delimiters between objects and associated info in the following form: $ git rev-list -z --objects <rev> <oid> NUL [path=<path> NUL] In this form, the start of each record is signaled by an OID entry that is all hexidecimal and does not contain any '='. Additional path info from `--objects` is appended to the record as a token/value pair `path=<path>` as-is without any truncation. For now, the `--objects` flag is the only options that can be used in combination with `-z`. In a subsequent commit, NUL-delimited support for other options is added. Other options that do not make sense when used in combination with `-z` are rejected. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Justin Tobler	c9907a1916	rev-list: refactor early option parsing Before invoking `setup_revisions()`, the `--missing` and `--exclude-promisor-objects` options are parsed early. In a subsequent commit, another option is added that must be parsed early. Refactor the code to parse both options in a single early pass. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Justin Tobler	1481e29112	rev-list: inline `show_object_with_name()` in `show_object()` The `show_object_with_name()` function only has a single call site. Inline call to `show_object_with_name()` in `show_object()` so the explicit function can be cleaned up and live closer to where it is used. While at it, factor out the code that prints the OID and newline for both objects with and without a name. In a subsequent commit, `show_object()` is modified to support printing object information in a NUL-delimited format. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 03:40:02 -07:00
Taylor Blau	459e54b549	refspec: replace `refspec_item_init()` with fetch/push variants For similar reasons as in the previous refactoring of `refspec_init()` into `refspec_init_fetch()` and `refspec_init_push()`, apply the same refactoring to `refspec_item_init()`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:45:16 -07:00
Taylor Blau	ec6829e484	refspec: remove refspec_item_init_or_die() There are two callers of this function, which ensures that a dispatched call to refspec_item_init() does not fail. In the following commit, we're going to add fetch/push-specific variants of refspec_item_init(), which will turn one function into two. To avoid introducing yet another pair of new functions (such as refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()), let's remove the thin wrapper entirely. This duplicates a single line of code among two callers, but thins the refspec.h API by one function, and prevents introducing two more in the following commit. Note that we still have a trailing Boolean argument in the function `refspec_item_init()`. The following commit will address this. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:45:16 -07:00
Taylor Blau	3809633d0a	refspec: treat 'fetch' as a Boolean value Since `6d4c057859` (refspec: introduce struct refspec, 2018-05-16), we have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly suggests that we might introduce other modes in the future, which, while possible, is highly unlikely. But these values are treated as a Boolean, and stored in a struct field called 'fetch'. So the following: if (refspec->fetch == REFSPEC_FETCH) { ... } , and if (refspec->fetch) { ... } are equivalent. Let's avoid renaming the Boolean values "true" and "false" here and remove the two REFSPEC_ macros mentioned above. Since this value is truly a Boolean and will only ever take on a value of 0 or 1, we can declare it as a single bit unsigned field. In practice this won't shrink the size of 'struct refspec', but it more clearly indicates the intent. Note that this introduces some awkwardness like: refspec_item_init_or_die(&spec, refspec, 1); , where it's unclear what the final "1" does. This will be addressed in the following commits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:45:15 -07:00
Jensen Huang	d39f04b638	index-pack, unpack-objects: restore missing ->init_fn Commit `0578f1e66a` ("global: adapt callers to use generic hash context helpers") accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1. This fixes the following error on fetch: fatal: fetch-pack: invalid index-pack output Signed-off-by: Jensen Huang <hmz007@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 12:27:33 -07:00
Jeff King	aab0f899d9	fetch: don't ask for remote HEAD if followRemoteHEAD is "never" When we are going to consider updating the refs/remotes/*/HEAD symref, we have to ask the remote side where its HEAD points. But if we know that the feature is disabled by config, we don't need to bother! This saves a little bit of work and network communication for the server. And even a little bit of effort on the client, as our local set_head() function did a bit of work matching the remote HEAD before realizing that we're not going to do anything with it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 12:21:26 -07:00
Jeff King	c834d1a7ce	fetch: only respect followRemoteHEAD with configured refspecs The new followRemoteHEAD feature is triggered for almost every fetch, causing us to ask the server about the remote "HEAD" and to consider updating our local tracking HEAD symref. This patch limits the feature only to the case when we are fetching a remote using its configured refspecs (typically into its refs/remotes/ hierarchy). There are two reasons for this. One is efficiency. E.g., the fixes in `6c915c3f85` (fetch: do not ask for HEAD unnecessarily, 2024-12-06) and `20010b8c20` (fetch: avoid ls-refs only to ask for HEAD symref update, 2025-03-08) were aimed at reducing the work we do when we would not be able to update HEAD anyway. But they do not quite cover all cases. The remaining one is: git fetch origin refs/heads/foo:refs/remotes/origin/foo which _sometimes_ can update HEAD, but usually not. And that leads us to the second point, which is being simple and explainable. The code for updating the tracking HEAD symref requires both that we learned which ref the remote HEAD points at, and that the server advertised that ref to us. But because the v2 protocol narrows the server's advertisement, the command above would not typically update HEAD at all, unless it happened to point to the "foo" branch. Or even weirder, it probably _would_ update if the server is very old and supports only the v0 protocol, which always gives a full advertisement. This creates confusing behavior for the user: sometimes we may try to update HEAD and sometimes not, depending on vague rules. One option here would be to loosen the update code to accept the remote HEAD even if the server did not advertise that ref. I think that could work, but it may also lead to interesting corner cases (e.g., creating a dangling symref locally, even though the branch is not unborn on the server, if we happen not to have fetched it). So let's instead simplify the rules: we'll only consider updating the tracking HEAD symref when we're doing a full fetch of the remote's configured refs. This is easy to implement; we can just set a flag at the moment we realize we're using the configured refspecs. And we can drop the special case code added by `6c915c3f85` and `20010b8c20`, since this covers those cases. The existing tests from those commits still pass. In t5505, an incidental call to "git fetch <remote> <refspec>" updated HEAD, which caused us to adjust the test in `3f763ddf28` (fetch: set remote/HEAD if it does not exist, 2024-11-22). We can now adjust that back to how it was before the feature was added. Even though t5505 is incidentally testing our new desired behavior, we'll add an explicit test in t5510 to make sure it is covered. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 12:21:25 -07:00
Elijah Newren	947e219fb6	am: switch from merge_recursive_generic() to merge_ort_generic() Switch from merge-recursive to merge-ort. Adjust the following testcases due to the switch: * t4151: This test left an untracked file in the way of the merge. merge-recursive could only sometimes tell when untracked files were in the way, and by the time it discovers others, it has already made too many changes to back out of the merge. So, instead of writing the results to e.g. 'file1' it would instead write them to 'file1~branch1'. This is confusing for users, because they might not notice 'file1~branch1' and accidentally add and commit 'file1'. In contrast, merge-ort correctly notices the file in the way before making any changes and aborts. Since this test didn't care about the file in the way, just remove it before calling git-am. * t4255: Usage of merge-ort allows us to change two known failures into successes. * t6427: As noted a few commits ago, the choice of conflict label for diff3 markers for the ancestor commit was previously handled by merge-recursive.c rather than by callers. Since that has now changed, `git am` needs to specify that label. Although the previous conflict label ("constructed merge base") was already fairly somewhat slanted towards `git am`, let's use wording more along the lines of the related command-line flag from `git apply` and function involved to tie it more closely to `git am`. Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-18 09:49:08 -07:00
Junio C Hamano	77f32ba430	Merge branch 'tb/multi-cruft-pack-refresh-fix' into tb/combine-cruft-below-size * tb/multi-cruft-pack-refresh-fix: builtin/pack-objects.c: freshen objects from existing cruft packs	2025-03-17 17:00:38 -07:00
Karthik Nayak	d1270689a1	reflog: implement subcommand to drop reflogs While 'git-reflog(1)' currently allows users to expire reflogs and delete individual entries, it lacks functionality to completely remove reflogs for specific references. This becomes problematic in repositories where reflogs are not needed but continue to accumulate entries despite setting 'core.logAllRefUpdates=false'. Add a new 'drop' subcommand to git-reflog that allows users to delete the entire reflog for a specified reference. Include an '--all' flag to enable dropping all reflogs from all worktrees and an addon flag '--single-worktree', to only drop all reflogs from the current worktree. While here, remove an extraneous newline in the file. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-17 16:58:11 -07:00
Karthik Nayak	52f2dfb084	reflog: improve error for when reflog is not found The 'git reflog expire' prints the error message '<ref> points nowhere!' when used with a non-existent ref. This message is a bit confusing and vague. Modify the message to be more clear and direct. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-17 16:58:11 -07:00
Elijah Newren	e40eefba02	stash: remove merge-recursive.h include stash was modified to use merge_ort_nonrecursive() instead of merge_recursive_generic() back in commit `874cf2a604` (stash: apply stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the inclusion of merge-recursive.h unnecessary. In preparation for the removal of merge-recursive.h, remove the unnecessary include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-17 15:39:03 -07:00
Taylor Blau	08f612ba70	builtin/pack-objects.c: freshen objects from existing cruft packs Once an object is written into a cruft pack, we can only freshen it by writing a new loose or packed copy of that object with a more recent mtime. Prior to `61568efa95` (builtin/pack-objects.c: support `--max-pack-size` with `--cruft`, 2023-08-28), we typically had at most one cruft pack in a repository at any given time. So freshening unreachable objects was straightforward when already rewriting the cruft pack (and its .mtimes file). But `61568efa95` changes things: 'pack-objects' now supports writing multiple cruft packs when invoked with `--cruft` and the `--max-pack-size` flag. Cruft packs are rewritten until they reach some size threshold, at which point they are considered "frozen", and will only be modified in a pruning GC, or if the threshold itself is adjusted. Prior to this patch, however, this process breaks down when we attempt to freshen an object packed in an earlier cruft pack, and that cruft pack is larger than the threshold and thus will survive the repack. When this is the case, it is impossible to freshen objects in cruft pack(s) when those cruft packs are larger than the threshold. This is because we would avoid writing them in the new cruft pack entirely, for a couple of reasons. 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()' we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping over the packs we're going to retain (which are marked as kept in-core by 'read_cruft_objects()'). This means that we will avoid enumerating additional packed copies of objects found in any cruft packs which are larger than the given size threshold. Thus there is no opportunity to call 'create_object_entry()' whatsoever. 2. We likewise will discard the loose copy (if one exists) of any unreachable object packed in a cruft pack that is larger than the threshold. Here our call path is 'add_unreachable_loose_objects()', which uses the 'add_loose_object()' callback. That function will eventually land us in 'want_object_in_pack()' (via 'add_cruft_object_entry()'), and we'll discard the object as it appears in one of the packs which we marked as kept in-core. This means in effect that it is impossible to freshen an unreachable object once it appears in a cruft pack larger than the given threshold. Instead, we should pack an additional copy of an unreachable object we want to freshen even if it appears in a cruft pack, provided that the cruft copy has an mtime which is before the mtime of the copy we are trying to pack/freshen. This is sub-optimal in the sense that it requires keeping an additional copy of unreachable objects upon freshening, but we don't have a better alternative without the ability to make in-place modifications to existing .mtimes files. In order to implement this, we have to adjust the behavior of 'want_found_object()'. When 'pack-objects' is told that we're not going to retain any cruft packs (i.e. the set of packs marked as kept in-core does not contain a cruft pack), the behavior is unchanged. But when there is at least one cruft pack that we're holding onto, it is no longer sufficient to reject a copy of an object found in that cruft pack for that reason alone. In this case, we only want to reject a candidate object when copies of that object either: - exists in a non-cruft pack that we are retaining, regardless of that pack's mtime, or - exists in a cruft pack with an mtime at least as recent as the copy we are debating whether or not to pack, in which case freshening would be redundant. To do this, keep track of whether or not we have any cruft packs in our in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft' flag. When we end up in this new special case, we replace a call to 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject objects when we have a copy in an existing cruft pack with at least as recent an mtime as our candidate (in which case "freshening" would be redundant). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-13 11:48:04 -07:00
Patrick Steinhardt	cec2b6f55a	refs/iterator: separate lifecycle from iteration The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-12 11:31:18 -07:00
Patrick Steinhardt	3c20bf0c85	builtin/update-ref: skip ambiguity checks when parsing object IDs Most of the commands in git-update-ref(1) accept an old and/or new object ID to update a specific reference to. These object IDs get parsed via `repo_get_oid()`, which not only handles plain object IDs, but also those that have a suffix like "~" or "^2". More surprisingly though, it even knows to resolve arbitrary revisions, despite the fact that its manpage does not mention this fact even once. One consequence of this is that we also check for ambiguous references: when parsing a full object ID where the DWIM mechanism would also cause us to resolve it as a branch, we'd end up printing a warning. While this check makes sense to have in general, it is arguably less useful in the context of git-update-ref(1). This is due to multiple reasons: - The manpage is explicitly structured around object IDs. So if we see a fully blown object ID, the intent should be quite clear in general. - The command is part of our plumbing layer and not a tool that users would generally use in interactive workflows. As such, the warning will likely not be visible to anybody in the first place. - Users can and should use the fully-qualified refname in case there is any potential for ambiguity. And given that this command is part of our plumbing layer, one should always try to be as defensive as possible and use fully-qualified refnames. Furthermore, this check can be quite expensive when updating lots of references via `--stdin`, because we try to read multiple references per object ID that we parse according to the DWIM rules. This effect can be seen both with the "files" and "reftable" backend. The issue is not unique to git-update-ref(1), but was also an issue in git-cat-file(1), where it was addressed by disabling the ambiguity check in `25fba78d36` (cat-file: disable object/refname ambiguity check for batch mode, 2013-07-12). Disable the warning in git-update-ref(1), which provides a significant speedup with both backends. The user-visible outcome is unchanged even when ambiguity exists, except that we don't show the warning anymore. The following benchmark creates 10000 new references with a 100000 preexisting refs with the "files" backend: Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~) Time (mean ± σ): 467.3 ms ± 5.1 ms [User: 100.0 ms, System: 365.1 ms] Range (min … max): 461.9 ms … 479.3 ms 10 runs Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) Time (mean ± σ): 394.1 ms ± 5.8 ms [User: 63.3 ms, System: 327.6 ms] Range (min … max): 384.9 ms … 405.7 ms 10 runs Summary update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran 1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~) And with the "reftable" backend: Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) Time (mean ± σ): 146.9 ms ± 2.2 ms [User: 90.4 ms, System: 56.0 ms] Range (min … max): 142.7 ms … 150.8 ms 19 runs Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) Time (mean ± σ): 63.2 ms ± 1.1 ms [User: 41.0 ms, System: 21.8 ms] Range (min … max): 61.1 ms … 66.6 ms 41 runs Summary update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran 2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~) Note that the absolute improvement with both backends is roughly in the same ballpark, but the relative improvement for the "reftable" backend is more significant because writing the new table to disk is faster in the first place. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-12 11:31:17 -07:00
Junio C Hamano	de3dec1187	name-rev: remove "--stdin" support As part of Git 3.0, remove the hidden synonym for "--annotate-stdin" for real. As this does not change the fact that it used to be called "--stdin" in older version of Git, keep that passage in the documentation for "--annotate-stdin". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-12 08:48:54 -07:00
Luke Shumaker	d9cb0e6ff8	fast-export, fast-import: add support for signed-commits fast-export has a --signed-tags= option that controls how to handle tag signatures. However, there is no equivalent for commit signatures; it just silently strips the signature out of the commit (analogously to --signed-tags=strip). While signatures are generally problematic for fast-export/fast-import (because hashes are likely to change), if they're going to support tag signatures, there's no reason to not also support commit signatures. So, implement a --signed-commits= option that mirrors the --signed-tags= option. On the fast-export side, try to be as much like signed-tags as possible, in both implementation and in user-interface. This will change the default behavior to '--signed-commits=abort' from what is now '--signed-commits=strip'. In order to provide an escape hatch for users of third-party tools that call fast-export and do not yet know of the --signed-commits= option, add an environment variable 'FAST_EXPORT_SIGNED_COMMITS_NOABORT=1' that changes the default to '--signed-commits=warn-strip'. Signed-off-by: Luke Shumaker <lukeshu@datawire.io> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:24:56 -07:00
Luke Shumaker	dda9bff3c5	fast-export: do not modify memory from get_commit_buffer fast-export's helper function find_encoding() takes a `const char *`, but modifies that memory despite the `const`. Ultimately, this memory came from get_commit_buffer(), and you're not supposed to modify the memory that you get from get_commit_buffer(). So, get rid of find_encoding() in favor of commit.h:find_commit_header(), which gives back a string length, rather than mutating the memory to insert a '\0' terminator. Because find_commit_header() detects the "\n\n" string that separates the headers and the commit message, move the call to be above the `message = strstr(..., "\n\n")` call. This helps readability, and allows for the value of `encoding` to be used for a better value of "..." so that the same memory doesn't need to be checked twice. Introduce a `commit_buffer_cursor` variable to avoid writing an awkward `encoding ? encoding + encoding_len : committer_end` expression. Signed-off-by: Luke Shumaker <lukeshu@datawire.io> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:24:56 -07:00
Luke Shumaker	3b24d86c56	fast-export: rename --signed-tags='warn' to 'warn-verbatim' The --signed-tags= option takes one of five arguments specifying how to handle signed tags during export. Among these arguments, 'strip' is to 'warn-strip' as 'verbatim' is to 'warn' (the unmentioned argument is 'abort', which stops the fast-export process entirely). That is, signatures are either stripped or copied verbatim while exporting, with or without a warning. Match the pattern and rename 'warn' to 'warn-verbatim' to make it clear that it instructs fast-export to copy signatures verbatim. To maintain backwards compatibility, 'warn' is still recognized as deprecated synonym of 'warn-verbatim'. Signed-off-by: Luke Shumaker <lukeshu@datawire.io> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:24:55 -07:00
Christian Couder	73ca6d2001	fast-export: fix missing whitespace after switch "Documentation/CodingGuidelines" says that there should be whitespaces around operators like 'if', 'switch', 'for', etc. Let's fix this in "builtin/fast-export.c". Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:24:55 -07:00
Patrick Steinhardt	7d70b29c4f	hash: stop depending on `the_repository` in `null_oid()` The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:20 -07:00
Patrick Steinhardt	19be71db9c	delta-islands: stop depending on `the_repository` There are multiple sites in "delta-islands.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository`. In most cases this is trivial because we already had a repository available in the calling context, with the only exception being `propagate_island_marks()`. Adapt it so that the repository gets passed in via a parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:20 -07:00
Patrick Steinhardt	f6e174b2d8	object-file-convert: stop depending on `the_repository` There are multiple sites in "object-file-convert.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. All of these callsites are transitively called from `convert_object_file()`, which indeed has no repo as input. Refactor the function so that it receives a repository as a parameter and pass it through to all internal functions to get rid of the dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:19 -07:00
Patrick Steinhardt	7835ee75cd	environment: move access to "core.bigFileThreshold" into repo settings The "core.bigFileThreshold" setting is stored in a global variable and populated via `git_default_core_config()`. This may cause issues in the case where one is handling multiple different repositories in a single process with different values for that config key, as we may or may not see the correct value in that case. Furthermore, global state blocks our path towards libification. Refactor the code so that we instead store the value in `struct repo_settings`, where the value is computed as-needed and cached. Note that this change requires us to adapt one test in t1050 that verifies that we die when parsing an invalid "core.bigFileThreshold" value. The exercised Git command doesn't use the value at all, and thus it won't hit the new code path that parses the value. This is addressed by using git-hash-object(1) instead, which does read the value. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:18 -07:00
Patrick Steinhardt	2582846f2f	pack-write: stop depending on `the_repository` and `the_hash_algo` There are a couple of functions in "pack-write.c" that implicitly depend on `the_repository` or `the_hash_algo`. Remove this dependency by injecting the repository via a parameter and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:18 -07:00
Patrick Steinhardt	74d414c9f1	object: stop depending on `the_repository` There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:18 -07:00
Patrick Steinhardt	228457c9d9	csum-file: stop depending on `the_repository` There are multiple sites in "csum-file.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository` by adapting functions to receive required data as parameters. Adapt callsites accordingly by either using `the_repository->hash_algo`, or by using a context-provided hash algorithm in case the subsystem already got rid of its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:18 -07:00
Jeff King	c702dd4856	fetch: use ref prefix list to skip ls-refs In git-fetch we have an optimization to avoid issuing an ls-refs command to the server if we don't care about the value of any refs (e.g., because we are fetching exact object ids), saving a round-trip to the server. This comes from `e70a3030e7` (fetch: do not list refs if fetching only hashes, 2018-09-27). It uses an explicit flag "must_list_refs" to decide when we need to do so. That was needed back then, because the list of ref-prefixes was not always complete. If it was empty, it did not necessarily mean that we were not interested in any refs). But that is no longer the case; an empty list of prefixes means that we truly do not care about any refs. And so rather than an explicit flag, we can just check whether we are interested in any ref prefixes. This simplifies the code slightly, as there is now a single source of truth for the decision. It also fixes a bug in / optimizes a very unlikely case, which is: git fetch $remote ^foo $oid I.e., a negative refspec combined with an exact oid fetch. This is somewhat nonsense, in that there are no positive refspecs mentioning refs to countermand with the negative one. But we should be able to do this without issuing an ls-refs command (excluding "foo" from the empty set will obviously still be the empty set). However, the current code does not do so. The negative refspec is not counted as a noop in un-setting the must_list_refs flag (hardly the fault of `e70a3030e7`, as negative refspecs did not appear until much later). But by using the prefix list as a source of truth, this naturally just works; the negative refspec does not add a prefix to ask about, and hence does not trigger the ls-refs call. This is esoteric enough that I didn't bother adding a test. The real value here is in the code simplification. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:13:46 -07:00
Jeff King	20010b8c20	fetch: avoid ls-refs only to ask for HEAD symref update When we fetch from a configured remote, we may try to update the local refs/remotes/<origin>/HEAD, and so we ask the server to advertise its HEAD to us. But if we aren't otherwise asking about any refs at all, then we know this HEAD update can never happen! To consider a new value for HEAD, the set_head() function uses guess_remote_head(). And even if it sees an explicit symref value for HEAD, it will only report that as a match if we also saw that remote ref advertised, and it mapped to a local tracking ref via get_fetch_map(). In other words, a fetch like this: git fetch origin $exact_oid:refs/heads/foo can never update HEAD, because we will never have fetched (nor even see the advertisement for) the ref that HEAD points to. Currently the command above will still call ls-refs to ask about the HEAD, even though it is pointless. This patch teaches it to skip the ls-refs call entirely in this case, which avoids a round-trip to the server. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:13:46 -07:00
Jeff King	095bc13f35	fetch: stop protecting additions to ref-prefix list When using the ref-prefix feature of protocol v2, a client which sends no prefixes at all will get the full advertisement. And so the code in git-fetch was historically loose about setting up that list based on our refspecs. There were cases where we needed to know about some refs, so we just didn't add anything to the ref-prefix list. And hence further code, like that for tag-following and updating origin/HEAD, had to be careful about adding to an empty list. E.g., see the bug fixed by `bd52d9a058` (fetch: fix following tags when fetching specific OID, 2025-03-07). But the previous commit removed the last such case, and now we know an empty ref-prefix list (at least inside git-fetch's do_fetch() function) means that we really don't need to see any refs. So we can drop those extra conditionals. This simplifies the code a little. But it also means that some cases can now use ref prefixes when they would not otherwise. As the test shows, fetching an exact oid into a local ref can now avoid enumerating all of the refs. The refspec itself doesn't need to know about any remote refs, and the tag auto-following can just ask about refs/tags/. The same is true for asking about HEAD to update the local origin/HEAD. I didn't add a test for that yet, though, as we can optimize it even further. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:13:45 -07:00
Jeff King	625ed92134	fetch: ask server to advertise HEAD for config-less fetch If we're not given any refspecs (either on the command line or via config) and we have no branch merge config, then we fetch the remote HEAD into our local FETCH_HEAD. In that case we do not send any ref-prefix option to the server at all, and we see the full advertisement. But this is sub-optimal. We only care about HEAD, so we can just ask for that, and ignore all of the other refs. The new test demonstrates a case where we see fewer refs (in this case only one less, but in theory we could be ignoring millions of them). This also removes the only case where we care about seeing some refs from the other side, but don't add anything to the ref_prefixes list. Cleaning this up means one less maintenance burden. Before this patch, any code which wanted to add to the list had to make sure the list was not empty, since an empty list meant "ask for everything". Now it really means "we are not interested in any refs". This should let us optimize a few more cases in subsequent patches. Note that we'll add "HEAD" to the list of prefixes, and later code for updating "refs/remotes/<remote>/HEAD" may likewise do so. In theory this could cause duplicates in the list, but in practice these can't both trigger. We hit our new case only if there are no refspecs, and the "<remote>/HEAD" feature is enabled only when we are fetching from a remote with configured refspecs. We could be defensive with a flag, but it didn't seem worth it to me (the absolute worse case is a useless redundant ref-prefix line sent to the server). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:13:45 -07:00
Junio C Hamano	5d55ad01f5	Merge branch 'tb/fetch-follow-tags-fix' * tb/fetch-follow-tags-fix: fetch: fix following tags when fetching specific OID	2025-03-10 08:45:58 -07:00
Usman Akinyemi	09cbf1597e	builtin/checkout-index: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/checkout-index.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_checkout_index()` function with `repo` set to NULL and then early in the function, `show_usage_with_options_if_asked()` call will give the options help and exit. Pass an instance of "struct index_state" available in the calling context to both `checkout_all()` and `checkout_file()` to remove their dependency on the global `the_repository` variable. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:02 -08:00
Usman Akinyemi	d9dce89192	builtin/for-each-ref: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/for-each-ref.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_for_each_ref()` function with `repo` set to NULL and then early in the function, `parse_options()` call will give the options help and exit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:02 -08:00
Usman Akinyemi	d9c5cfb18f	builtin/ls-files: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/ls-files.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_ls_files()` function with `repo` set to NULL and then early in the function, `show_usage_with_options_if_asked()` call will give the options help and exit. Pass the repository available in the calling context to both `expand_objectsize()` and `show_ru_info()` to remove their dependency on the global `the_repository` variable. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:01 -08:00
Usman Akinyemi	72fe8bfac8	builtin/pack-refs: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/pack-refs.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_pack_refs()` function with `repo` set to NULL and then early in the function, `parse_options()` call will give the options help and exit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:01 -08:00
Usman Akinyemi	1c14b1aede	builtin/send-pack: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/send-pack.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_send_pack()` function with `repo` set to NULL and then early in the function, `parse_options()` call will give the options help and exit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:01 -08:00
Usman Akinyemi	db58d5a351	builtin/verify-commit: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/verify-commit.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_verify_commit()` function with `repo` set to NULL and then early in the function, `parse_options()` call will give the options help and exit. Pass the repository available in the calling context to `verify_commit()` to remove it's dependency on the global `the_repository` variable. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:01 -08:00
Usman Akinyemi	43a8391977	builtin/verify-tag: stop using `the_repository` Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/verify-tag.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_verify_tag()` function with `repo` set to NULL and then early in the function, `parse_options()` call will give the options help and exit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:52:01 -08:00
Taylor Blau	bd52d9a058	fetch: fix following tags when fetching specific OID In `3f763ddf28` (fetch: set remote/HEAD if it does not exist, 2024-11-22), unconditionally adds "HEAD" to the list of ref prefixes we send to the server. This breaks a core assumption that the list of prefixes we send to the server is complete. We must either send all prefixes we care about, or none at all (in the latter case the server then advertises everything). The tag following code is careful to only add "refs/tags/" to the list of prefixes if there are already entries in the prefix list. But because the new code from `3f763ddf28` runs after the tag code, and because it unconditionally adds to the prefix list, we may end up with a prefix list that _should_ have "refs/tags/" in it, but doesn't. When that is the case, the server does not advertise any tags, and our auto-following breaks because we never learned about any tags in the first place. Fix this by only adding "HEAD" to the ref prefixes when we know that we are already limiting the advertisement. In either case we'll learn about HEAD (either through the limited advertisement, or implicitly through a full advertisement). Reported-by: Igor Todorovski <itodorov@ca.ibm.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-07 16:15:18 -08:00
Junio C Hamano	cdf458c60e	Merge branch 'kn/ref-migrate-skip-reflog' Usage string of "git refs" has been corrected. * kn/ref-migrate-skip-reflog: refs: show --no-reflog in the help text	2025-03-05 10:37:45 -08:00
Junio C Hamano	feffb34257	Merge branch 'ps/path-sans-the-repository' The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions	2025-03-05 10:37:43 -08:00
Junio C Hamano	6dff5de1da	refs: show --no-reflog in the help text We forgot that we must keep the documentation and help text in sync. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-03 14:51:29 -08:00
Justin Tobler	cf15095ec5	builtin/diff-pairs: allow explicit diff queue flush The diffs queued from git-diff-pairs(1) are flushed when stdin is closed. To enable greater flexibility, allow control over when the diff queue is flushed by writing a single NUL byte on stdin between input file pairs. Diff output between flushes is separated by a single NUL byte. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-03 08:17:47 -08:00
Justin Tobler	5bd10b2adc	builtin: introduce diff-pairs command Through git-diff(1), a single diff can be generated from a pair of blob revisions directly. Unfortunately, there is not a mechanism to compute batches of specific file pair diffs in a single process. Such a feature is particularly useful on the server-side where diffing between a large set of changes is not feasible all at once due to timeout concerns. To facilitate this, introduce git-diff-pairs(1) which acts as a backend passing its NUL-terminated raw diff format input from stdin through diff machinery to produce various forms of output such as patch or raw. The raw format was originally designed as an interchange format and represents the contents of the diff_queued_diff list making it possible to break the diff pipeline into separate stages. For example, git-diff-tree(1) can be used as a frontend to compute file pairs to queue and feed its raw output to git-diff-pairs(1) to compute patches. With this, batches of diffs can be progressively generated without having to recompute renames or retrieve object context. Something like the following: git diff-tree -r -z -M $old $new \| git diff-pairs -p -z should generate the same output as `git diff-tree -p -M`. Furthermore, each line of raw diff formatted input can also be individually fed to a separate git-diff-pairs(1) process and still produce the same output. Based-on-patch-by: Jeff King <peff@peff.net> Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-03 08:17:47 -08:00
Patrick Steinhardt	028f618658	path: adjust last remaining users of `the_repository` With the preceding refactorings we now only have a couple of implicit users of `the_repository` left in the "path" subsystem, all of which depend on global state via `calc_shared_perm()`. Make the dependency on `the_repository` explicit by passing the repo as a parameter instead and adjust callers accordingly. Note that this change bubbles up into a couple of subsystems that were previously declared as free from `the_repository`. Instead of marking all of them as `the_repository`-dependent again, we instead use the repository that is available in the calling context. There are three exceptions though with "copy.c", "pack-write.c" and "tempfile.c". Adjusting these would require us to adapt callsites all over the place, so this is left for a future iteration. Mark "path.c" as free from `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-28 13:54:11 -08:00
Patrick Steinhardt	f1ce861c34	environment: move access to "core.sharedRepository" into repo settings Similar as with the preceding commit, we track "core.sharedRepository" via a pair of global variables. Move them into `struct repo_settings` so that we can instead track them per-repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-28 13:54:11 -08:00
Patrick Steinhardt	88dd321cfe	path: drop `git_path()` in favor of `repo_git_path()` Remove `git_path()` in favor of the `repo_git_path()` family of functions, which makes the implicit dependency on `the_repository` go away. Note that `git_path()` returned a string allocated via `get_pathname()`, which uses a rotating set of statically allocated buffers. Consequently, callers didn't have to free the returned string. The same isn't true for `repo_common_path()`, so we also have to add logic to free the returned strings. This refactoring also allows us to remove `repo_common_pathv()` as well as `get_pathname()` from the public interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-28 13:54:11 -08:00
Patrick Steinhardt	8ee018d863	rerere: let `rerere_path()` write paths into a caller-provided buffer Same as with `get_worktree_git_dir()` a couple of commits ago, the `rerere_path()` function returns paths that need not be free'd by the caller because `git_path()` internally uses `get_pathname()`. Refactor the function to instead accept a caller-provided buffer that the path will be written into, passing on ownership to the caller. This refactoring prepares us for the removal of `git_path()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-28 13:54:11 -08:00
Junio C Hamano	3c0f4abaf5	Merge branch 'kn/ref-migrate-skip-reflog' "git refs migrate" can optionally be told not to migrate the reflog. * kn/ref-migrate-skip-reflog: builtin/refs: add '--no-reflog' flag to drop reflogs	2025-02-27 15:23:00 -08:00
Junio C Hamano	9d8cce051a	Merge branch 'ua/os-version-capability' The value of "uname -s" is by default sent over the wire as a part of the "version" capability. * ua/os-version-capability: agent: advertise OS name via agent capability t5701: add setup test to remove side-effect dependency version: extend get_uname_info() to hide system details version: refactor get_uname_info() version: refactor redact_non_printables() version: replace manual ASCII checks with isprint() for clarity	2025-02-27 15:23:00 -08:00
shejialuo	c1cf918d3a	builtin/fsck: add `git refs verify` child process At now, we have already implemented the ref consistency checks for both "files-backend" and "packed-backend". Although we would check some redundant things, it won't cause trouble. So, let's integrate it into the "git-fsck(1)" command to get feedback from the users. And also by calling "git refs verify" in "git-fsck(1)", we make sure that the new added checks don't break. Introduce a new function "fsck_refs" that initializes and runs a child process to execute the "git refs verify" command. In order to provide the user interface create a progress which makes the total task be 1. It's hard to know how many loose refs we will check now. We might improve this later. Then, introduce the option to allow the user to disable checking ref database consistency. Put this function in the very first execution sequence of "git-fsck(1)" due to that we don't want the existing code of "git-fsck(1)" which would implicitly check the consistency of refs to die the program. Last, update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-27 14:03:10 -08:00
shejialuo	fdf3820b7e	builtin/refs: get worktrees without reading head information In "packed-backend.c", there are some functions such as "create_snapshot" and "next_record" which would check the correctness of the content of the "packed-ref" file. When anything is bad, the program will die. It may seem that we have nothing relevant to above feature, because we are going to read and parse the raw "packed-ref" file without creating the snapshot and using the ref iterator to check the consistency. However, when using "get_worktrees" in "builtin/refs", we would parse the "HEAD" information. If the referent of the "HEAD" is inside the "packed-ref", we will call "create_snapshot" function to parse the "packed-ref" to get the information. No matter whether the entry of "HEAD" in "packed-ref" is correct, "create_snapshot" would call "verify_buffer_safe" to check whether there is a newline in the last line of the file. If not, the program will die. Although this behavior has no harm for the program, it will short-circuit the program. When the users execute "git refs verify" or "git fsck", we should avoid reading the head information, which may execute the read operation in packed backend with stricter checks to die the program. Instead, we should continue to check other parts of the "packed-refs" file completely. Fortunately, in `465a22b338` (worktree: skip reading HEAD when repairing worktrees, 2023-12-29), we have introduced a function "get_worktrees_internal" which allows us to get worktrees without reading head information. Create a new exposed function "get_worktrees_without_reading_head", then replace the "get_worktrees" in "builtin/refs" with the new created function. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-27 14:03:07 -08:00
Junio C Hamano	e24570b0a3	Merge branch 'jk/check-mailmap-wo-name-fix' "git check-mailmap" segfault fix. * jk/check-mailmap-wo-name-fix: mailmap: fix check-mailmap with full mailmap line	2025-02-26 08:51:00 -08:00
Junio C Hamano	9b07c152df	Merge branch 'pw/merge-tree-stdin-deadlock-fix' "git merge-tree --stdin" has been improved (including a workaround for a deadlock). * pw/merge-tree-stdin-deadlock-fix: merge-tree: fix link formatting in html docs merge-tree: improve docs for --stdin merge-tree: only use basic merge config merge-tree: remove redundant code merge-tree --stdin: flush stdout to avoid deadlock	2025-02-25 14:19:36 -08:00
Jacob Keller	bb60c52131	mailmap: fix check-mailmap with full mailmap line I recently had reported to me a crash from a coworker using the recently added sendemail mailmap support: 3724814 Segmentation fault (core dumped) git check-mailmap "bugs@company.xx" This appears to happen because of the NULL pointer name passed into map_user(). Fix this by passing "" instead of NULL so that we have a valid pointer. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-21 18:27:16 -08:00
Junio C Hamano	ee8020ff40	Merge branch 'ua/update-server-info-sans-the-repository' Code clean-up. * ua/update-server-info-sans-the-repository: builtin/update-server-info: remove the_repository global variable	2025-02-21 10:35:53 -08:00
Karthik Nayak	89be7d2774	builtin/refs: add '--no-reflog' flag to drop reflogs The "git refs migrate" subcommand converts the backend used for ref storage. It always migrates reflog data as well as refs. Introduce an option to exclude reflogs from migration, allowing them to be discarded when they are unnecessary. This is particularly useful in server-side repositories, where reflogs are typically not expected. However, some repositories may still have them due to historical reasons, such as bugs, misconfigurations, or administrative decisions to enable reflogs for debugging. In such repositories, it would be optimal to drop reflogs during the migration. To address this, introduce the '--no-reflog' flag, which prevents reflog migration. When this flag is used, reflogs from the original reference backend are migrated. Since only the new reference backend remains in the repository, all previous reflogs are permanently discarded. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-21 09:55:02 -08:00
Junio C Hamano	716b00e6e9	Merge branch 'da/difftool-sans-the-repository' "git difftool" code clean-up. * da/difftool-sans-the-repository: difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE difftool: eliminate use of the_repository difftool: eliminate use of global variables	2025-02-18 15:30:32 -08:00
Junio C Hamano	7722b997c6	Merge branch 'jt/rev-list-missing-print-info' "git rev-list --missing=" learned to accept "print-info" that gives known details expected of the missing objects, like path and type. * jt/rev-list-missing-print-info: rev-list: extend print-info to print missing object type rev-list: add print-info action to print missing object path	2025-02-18 15:30:32 -08:00
Junio C Hamano	e565f37553	Merge branch 'ds/backfill' Lazy-loading missing files in a blobless clone on demand is costly as it tends to be one-blob-at-a-time. "git backfill" is introduced to help bulk-download necessary files beforehand. * ds/backfill: backfill: assume --sparse when sparse-checkout is enabled backfill: add --sparse option backfill: add --min-batch-size=<n> option backfill: basic functionality and tests backfill: add builtin boilerplate	2025-02-18 15:30:31 -08:00
Phillip Wood	54cf5d2da8	merge-tree: only use basic merge config Commit `9c93ba4d0a` (merge-recursive: honor diff.algorithm, 2024-07-13) replaced init_merge_options() with init_basic_merge_config() for use in plumbing commands and init_ui_merge_config() for use in porcelain commands. As "git merge-tree" is a plumbing command it should call init_basic_merge_config() rather than init_ui_merge_config(). The merge ort machinery ignores "diff.algorithm" so the behavior is unchanged by this commit but it future proofs us against any future changes to init_ui_merge_config(). Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:52:39 -08:00
Phillip Wood	1b0e5f4499	merge-tree: remove redundant code real_merge() only ever returns "0" or "1" as it dies if the merge status is less than zero. Therefore the check for "result < 0" is redundant and the result variable is not needed. The return value of real_merge() is ignored because exit status of "git merge-tree --stdin" is "0" for both successful and conflicted merges (the status of each merge is written to stdout). The return type of real_merge() is not changed as it is used for the program's exit status when "--stdin" is not given. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:52:39 -08:00
Phillip Wood	344a107b55	merge-tree --stdin: flush stdout to avoid deadlock If a process tries to read the output from "git merge-tree --stdin" before it closes merge-tree's stdin then it deadlocks. This happens because merge-tree does not flush its output before trying to read another line of input and means that it is not possible to cherry-pick a sequence of commits using "git merge-tree --stdin". Fix this by calling maybe_flush_or_die() before trying to read the next line of input. Flushing the output after each merge does not seem to affect the performance, any difference is lost in the noise even after increasing the number of runs. $ git rev-list --merges --parents -n100 origin/master \| sed 's/^[^ ]* //' >/tmp/merges $ hyperfine -L flush 0,1 --warmup 1 --runs 30 \ 'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges' Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges Time (mean ± σ): 546.6 ms ± 11.7 ms [User: 503.2 ms, System: 40.9 ms] Range (min … max): 535.9 ms … 567.7 ms 30 runs Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges Time (mean ± σ): 546.9 ms ± 12.0 ms [User: 505.9 ms, System: 38.9 ms] Range (min … max): 529.8 ms … 570.0 ms 30 runs Summary 'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran 1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges' Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:52:39 -08:00
Usman Akinyemi	6aa09fd872	version: extend get_uname_info() to hide system details Currently, get_uname_info() function provides the full OS information. In a following commit, we will need it to provide only the OS name. Let's extend it to accept a "full" flag that makes it switch between providing full OS information and providing only the OS name. We may need to refactor this function in the future if an `osVersion.format` is added. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:05:12 -08:00
Usman Akinyemi	0a78d61247	version: refactor get_uname_info() Some code from "builtin/bugreport.c" uses uname(2) to get system information. Let's refactor this code into a new get_uname_info() function, so that we can reuse it in a following commit. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-18 09:05:12 -08:00
Junio C Hamano	c3fffcfe8e	Merge branch 'bf/fetch-set-head-fix' Fetching into a bare repository incorrectly assumed it always used a mirror layout when deciding to update remote-tracking HEAD, which has been corrected. * bf/fetch-set-head-fix: fetch set_head: fix non-mirror remotes in bare repositories fetch set_head: refactor to use remote directly	2025-02-14 17:53:48 -08:00
Junio C Hamano	5785d9143b	Merge branch 'tc/clone-single-revision' "git clone" learned to make a shallow clone for a single commit that is not necessarily be at the tip of any branch. * tc/clone-single-revision: builtin/clone: teach git-clone(1) the --revision= option parse-options: introduce die_for_incompatible_opt2() clone: introduce struct clone_opts in builtin/clone.c clone: add tags refspec earlier to fetch refspec clone: refactor wanted_peer_refs() clone: make it possible to specify --tags clone: cut down on global variables in clone.c	2025-02-14 17:53:48 -08:00
Junio C Hamano	998c5f0c75	Merge branch 'ms/refspec-cleanup' Code clean-up. cf. <Z6G-toOJjMmK8iJG@pks.im> * ms/refspec-cleanup: refspec: relocate apply_refspecs and related funtions refspec: relocate matching related functions remote: rename query_refspecs functions refspec: relocate refname_matches_negative_refspec_item remote: rename function omit_name_by_refspec	2025-02-12 10:08:54 -08:00
Junio C Hamano	5b9d01bc4d	Merge branch 'zh/gc-expire-to' "git gc" learned the "--expire-to" option and passes it down to underlying "git repack". * zh/gc-expire-to: gc: add `--expire-to` option	2025-02-12 10:08:53 -08:00
Junio C Hamano	07c401d392	Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo' "git repack --keep-unreachable" to send unreachable objects to the main pack "git repack -ad" produces did not work when there is no existing packs, which has been corrected. * ps/repack-keep-unreachable-in-unpacked-repo: builtin/repack: fix `--keep-unreachable` when there are no packs	2025-02-12 10:08:52 -08:00
Junio C Hamano	aae91a86fb	Merge branch 'ds/name-hash-tweaks' "git pack-objects" and its wrapper "git repack" learned an option to use an alternative path-hash function to improve delta-base selection to produce a packfile with deeper history than window size. * ds/name-hash-tweaks: pack-objects: prevent name hash version change test-tool: add helper for name-hash values p5313: add size comparison test pack-objects: add GIT_TEST_NAME_HASH_VERSION repack: add --name-hash-version option pack-objects: add --name-hash-version option pack-objects: create new name-hash function version	2025-02-12 10:08:51 -08:00
Usman Akinyemi	62898b8f5e	builtin/update-server-info: remove the_repository global variable Remove the_repository global variable in favor of the repository argument that gets passed in "builtin/update-server-info.c". When `-h` is passed to the command outside a Git repository, the `run_builtin()` will call the `cmd_update_server_info()` function with `repo` set to NULL and then early in the function, "parse_options()" call will give the options help and exit, without having to consult much of the configuration file. So it is safe to omit reading the config when `repo` argument the caller gave us is NULL. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-10 16:20:21 -08:00
Junio C Hamano	246569bf83	Merge branch 'ps/hash-cleanup' Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure	2025-02-10 10:18:31 -08:00
Patrick Steinhardt	07242c2a5a	path: drop `git_common_path()` in favor of `repo_common_path()` Remove `git_common_path()` in favor of the `repo_common_path()` family of functions, which makes the implicit dependency on `the_repository` go away. Note that `git_common_path()` used to return a string allocated via `get_pathname()`, which uses a rotating set of statically allocated buffers. Consequently, callers didn't have to free the returned string. The same isn't true for `repo_common_path()`, so we also have to add logic to free the returned strings. This refactoring also allows us to remove `repo_common_pathv()` from the public interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:23 -08:00
Patrick Steinhardt	8e4710f011	worktree: return allocated string from `get_worktree_git_dir()` The `get_worktree_git_dir()` function returns a string constant that does not need to be free'd by the caller. This string is computed for three different cases: - If we don't have a worktree we return a path into the Git directory. The returned string is owned by `the_repository`, so there is no need for the caller to free it. - If we have a worktree, but no worktree ID then the caller requests the main worktree. In this case we return a path into the common directory, which again is owned by `the_repository` and thus does not need to be free'd. - In the third case, where we have an actual worktree, we compute the path relative to "$GIT_COMMON_DIR/worktrees/". This string does not need to be released either, even though `git_common_path()` ends up allocating memory. But this doesn't result in a memory leak either because we write into a buffer returned by `get_pathname()`, which returns one out of four static buffers. We're about to drop `git_common_path()` in favor of `repo_common_path()`, which doesn't use the same mechanism but instead returns an allocated string owned by the caller. While we could adapt `get_worktree_git_dir()` to also use `get_pathname()` and print the derived common path into that buffer, the whole schema feels a lot like premature optimization in this context. There are some callsites where we call `get_worktree_git_dir()` in a loop that iterates through all worktrees. But none of these loops seem to be even remotely in the hot path, so saving a single allocation there does not feel worth it. Refactor the function to instead consistently return an allocated path so that we can start using `repo_common_path()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:23 -08:00
Patrick Steinhardt	3859e39659	path: drop `git_path_buf()` in favor of `repo_git_path_replace()` Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The latter does essentially the same, with the only exception that it does not rely on `the_repository` but takes the repo as separate parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:22 -08:00
Patrick Steinhardt	bba59f58a4	path: drop `git_pathdup()` in favor of `repo_git_path()` Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does essentially the same, with the only exception that it does not rely on `the_repository` but takes the repo as separate parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:22 -08:00
Patrick Steinhardt	f5c714e2a7	path: refactor `repo_submodule_path()` family of functions As explained in an earlier commit, we're refactoring path-related functions to provide a consistent interface for computing paths into the commondir, gitdir and worktree. Refactor the "submodule" family of functions accordingly. Note that in contrast to the other `repo_*_path()` families, we have to pass in the repository as a non-constant pointer. This is because we end up calling `repo_read_gitmodules()` deep down in the callstack, which may end up modifying the repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:22 -08:00
Patrick Steinhardt	f9467895d8	submodule: refactor `submodule_to_gitdir()` to accept a repo The `submodule_to_gitdir()` function implicitly uses `the_repository` to resolve submodule paths. Refactor the function to instead accept a repo as parameter to remove the dependency on global state. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:21 -08:00
David Aguilar	7c2f291943	difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE Remove the USE_THE_REPOSITORY_VARIABLE #define now that all state is passed to each function from callers. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 13:00:21 -08:00
David Aguilar	a24953f3df	difftool: eliminate use of the_repository Make callers pass a repository struct into each function instead of relying on the global the_repository variable. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 13:00:21 -08:00
David Aguilar	8241ae63d8	difftool: eliminate use of global variables Move difftool's global variables into a difftools_option struct in preparation for removal of USE_THE_REPOSITORY_VARIABLE. Signed-off-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 13:00:21 -08:00
Toon Claes	337855629f	builtin/clone: teach git-clone(1) the --revision= option The git-clone(1) command has the option `--branch` that allows the user to select the branch they want HEAD to point to. In a non-bare repository this also checks out that branch. Option `--branch` also accepts a tag. When a tag name is provided, the commit this tag points to is checked out and HEAD is detached. Thus `--branch` can be used to clone a repository and check out a ref kept under `refs/heads` or `refs/tags`. But some other refs might be in use as well. For example Git forges might use refs like `refs/pull/<id>` and `refs/merge-requests/<id>` to track pull/merge requests. These refs cannot be selected upon git-clone(1). Add option `--revision` to git-clone(1). This option accepts a fully qualified reference, or a hexadecimal commit ID. This enables the user to clone and check out any revision they want. `--revision` can be used in conjunction with `--depth` to do a minimal clone that only contains the blob and tree for a single revision. This can be useful for automated tests running in CI systems. Using option `--branch` and `--single-branch` together is a similar scenario, but serves a different purpose. Using these two options, a singlet remote tracking branch is created and the fetch refspec is set up so git-fetch(1) will receive updates on that branch from the remote. This allows the user work on that single branch. Option `--revision` on contrary detaches HEAD, creates no tracking branches, and writes no fetch refspec. Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im> [jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:26:42 -08:00
Toon Claes	9144b9362b	parse-options: introduce die_for_incompatible_opt2() The functions die_for_incompatible_opt3() and die_for_incompatible_opt4() already exist to die whenever a user specifies three or four options respectively that are not compatible. Introduce die_for_incompatible_opt2() which dies when two options that are incompatible are set. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:23:54 -08:00
Toon Claes	7a52a8c7d8	clone: introduce struct clone_opts in builtin/clone.c There is a lot of state stored in global variables in builtin/clone.c. In the long run we'd like to remove many of those. Introduce `struct clone_opts` in this file. This struct will be used to contain all details needed to perform the clone. The struct object can be thrown around to all the functions that need these details. The first field we're adding is `wants_head`. In some scenarios (specifically when both `--single-branch` and `--branch` are given) we are not interested in `HEAD` on the remote. The field `wants_head` in `struct clone_opts` will hold this information. We could have put `option_branch` and `option_single_branch` into that struct instead, but in a following commit we'll be using `wants_head` as well. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:23:54 -08:00
Toon Claes	2ca67c6f14	clone: add tags refspec earlier to fetch refspec In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs from the `remote->fetch` refspec into `ref_prefixes` of `transport_ls_refs_options`. Afterwards we add the tags prefix `refs/tags/` prefix as well. At a later point, in wanted_peer_refs() we process refs using both `remote->fetch` and `TAG_REFSPEC`. Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before calling refspec_ref_prefixes(). To be able to do this, we set `option_tags` to 0 when --mirror is given. This is because --mirror mirrors (hence the name) all the refs, including tags and they do not need to be treated separately. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:23:53 -08:00
Toon Claes	879780f9a1	clone: refactor wanted_peer_refs() The function wanted_peer_refs() is used to map the refs returned by the server to refs we will save in our clone. Over time this function grown to be very complex. Refactor it. Previously, there was a separate code path for when `option_single_branch` was set. It resulted in duplicated code and deeper nested conditions. After this refactor the code path for when `option_single_branch` is truthy modifies `refs` and then falls through to the common code path. This approach relies on the `refspec` being set correctly and thus only mapping refs that are relevant. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:23:53 -08:00
Toon Claes	bc26f7690a	clone: make it possible to specify --tags Option --no-tags was added in `0dab2468ee` (clone: add a --no-tags option to clone without tags, 2017-04-26). At the time there was no need to support --tags as well, although there was some conversation about it[1]. To simplify the code and to prepare for future commits, invert the flag internally. Functionally there is no change, because the flag is default-enabled passing `--tags` has no effect, so there's no need to add tests for this. [1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/ Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:23:53 -08:00
Toon Claes	7f420a6bda	clone: cut down on global variables in clone.c In clone.c the `struct option` which is used to parse the input options for git-clone(1) is a global variable. Due to this, many variables that are used to parse the value into, are also global. Make `builtin_clone_options` a local variable in cmd_clone() and carry along all variables that are only used in that function. Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 12:23:53 -08:00
Justin Tobler	3295c35398	rev-list: extend print-info to print missing object type Additional information about missing objects found in git-rev-list(1) can be printed by specifying the `print-info` missing action for the `--missing` option. Extend this action to also print missing object type information inferred from its containing object. This token follows the form `type=<type>` and specifies the expected object type of the missing object. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-05 09:32:01 -08:00
Justin Tobler	c6d896bcfd	rev-list: add print-info action to print missing object path Missing objects identified through git-rev-list(1) can be printed by setting the `--missing=print` option. Additional information about the missing object, such as its path and type, may be present in its containing object. Add the `print-info` missing action for the `--missing` option that, when set, prints additional insight about the missing object inferred from its containing object. Each line of output for a missing object is in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs containing additional information are separated from each other by a SP. The value is encoded in a token specific fashion, but SP or LF contained in value are always expected to be represented in such a way that the resulting encoded value does not have either of these two problematic bytes. This format is kept generic so it can be extended in the future to support additional information. For now, only a missing object path info is implemented. It follows the form `path=<path>` and specifies the full path to the object from the top-level tree. A path containing SP or special characters is enclosed in double-quotes in the C style as needed. In a subsequent commit, missing object type info will also be added. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-05 09:32:01 -08:00
Patrick Steinhardt	414c82300a	builtin/repack: fix `--keep-unreachable` when there are no packs The "--keep-unreachable" flag is supposed to append any unreachable objects to the newly written pack. This flag is explicitly documented as appending both packed and loose unreachable objects to the new packfile. And while this works alright when repacking with preexisting packfiles, it stops working when the repository does not have any packfiles at all. The root cause are the conditions used to decide whether or not we want to append "--pack-loose-unreachable" to git-pack-objects(1). There are a couple of conditions here: - `has_existing_non_kept_packs()` checks whether there are existing packfiles. This condition makes sense to guard "--keep-pack=", "--unpack-unreachable" and "--keep-unreachable", because all of these flags only make sense in combination with existing packfiles. But it does not make sense to disable `--pack-loose-unreachable` when there aren't any preexisting packfiles, as loose objects can be packed into the new packfile regardless of that. - `delete_redundant` checks whether we want to delete any objects or packs that are about to become redundant. The documentation of `--keep-unreachable` explicitly says that `git repack -ad` needs to be executed for the flag to have an effect. It is not immediately obvious why such redundant objects need to be deleted in order for "--pack-unreachable-objects" to be effective. But as things are working as documented this is nothing we'll change for now. - `pack_everything & PACK_CRUFT` checks that we're not creating a cruft pack. This condition makes sense in the context of "--pack-loose-unreachable", as unreachable objects would end up in the cruft pack anyway. So while the second and third condition are sensible, it does not make any sense to condition `--pack-loose-unreachable` on the existence of packfiles. Fix the bug by splitting out the "--pack-loose-unreachable" and only making it depend on the second and third condition. Like this, loose unreachable objects will be packed regardless of any preexisting packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-04 09:58:02 -08:00
Meet Soni	be0905fed1	remote: rename query_refspecs functions Rename functions related to handling refspecs in preparation for their move from `remote.c` to `refspec.c`. Update their names to better reflect their intent: - `query_refspecs()` -> `refspec_find_match()` for clarity, as it finds a single matching refspec. - `query_refspecs_multiple()` -> `refspec_find_all_matches()` to better reflect that it collects all matching refspecs instead of returning just the first match. - `query_matches_negative_refspec()` -> `refspec_find_negative_match()` for consistency with the updated naming convention, even though this static function didn't strictly require renaming. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-04 09:51:41 -08:00
Meet Soni	e4f6ab0085	remote: rename function omit_name_by_refspec Rename the function `omit_name_by_refspec()` to `refname_matches_negative_refspec_item()` to provide clearer intent. The previous function name was vague and did not accurately describe its purpose. By using `refname_matches_negative_refspec_item`, make the function's purpose more intuitive, clarifying that it checks if a reference name matches any negative refspec. Rename function parameters for consistency with existing naming conventions. Use `refname` instead of `name` to align with terminology in `refs.h`. Remove the redundant doc comment since the function name is now self-explanatory. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-04 09:51:41 -08:00
Derrick Stolee	85127bcdea	backfill: assume --sparse when sparse-checkout is enabled The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00
Derrick Stolee	bff4555767	backfill: add --sparse option One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensive as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Non-cone mode can describe the included files using both positive and negative patterns, which changes the possible return values of path_matches_pattern_list(). Test both kinds of matches for increased coverage. To test this, we can create a blobless sparse clone, expand the sparse-checkout slightly, and then run 'git backfill --sparse' to see how much data is downloaded. The general steps are 1. git clone --filter=blob:none --sparse <url> 2. git sparse-checkout set <dir1> ... <dirN> 3. git backfill --sparse For the Git repository with the 'builtin' directory in the sparse-checkout, we get these results for various batch sizes: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|-------\| \| (Initial clone) \| 3 \| 110 MB \| \| \| 10K \| 12 \| 192 MB \| 17.2s \| \| 15K \| 9 \| 192 MB \| 15.5s \| \| 20K \| 8 \| 192 MB \| 15.5s \| \| 25K \| 7 \| 192 MB \| 14.7s \| This case matters less because a full clone of the Git repository from GitHub is currently at 277 MB. Using a copy of the Linux repository with the 'kernel/' directory in the sparse-checkout, we get these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|------\| \| (Initial clone) \| 2 \| 1,876 MB \| \| \| 10K \| 11 \| 2,187 MB \| 46s \| \| 25K \| 7 \| 2,188 MB \| 43s \| \| 50K \| 5 \| 2,194 MB \| 44s \| \| 100K \| 4 \| 2,194 MB \| 48s \| This case is more meaningful because a full clone of the Linux repository is currently over 6 GB, so this is a valuable way to download a fraction of the repository and no longer need network access for all reachable objects within the sparse-checkout. Choosing a batch size will depend on a lot of factors, including the user's network speed or reliability, the repository's file structure, and how many versions there are of the file within the sparse-checkout scope. There will not be a one-size-fits-all solution. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00
Derrick Stolee	6840fe9ee2	backfill: add --min-batch-size=<n> option Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. For now, we let the path-walk API batches determine the boundaries. To get a feeling for the value of specifying the --min-batch-size parameter, I tested a number of open source repositories available on GitHub. The procedure was generally: 1. git clone --filter=blob:none <url> 2. git backfill Checking the number of packfiles and the size of the .git/objects/pack directory helps to identify the effects of different batch sizes. For the Git repository, we get these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|-------\| \| (Initial clone) \| 2 \| 119 MB \| \| \| 25K \| 8 \| 290 MB \| 24s \| \| 50K \| 5 \| 290 MB \| 24s \| \| 100K \| 4 \| 290 MB \| 29s \| Other than the packfile counts decreasing as we need fewer batches, the size and time required is not changing much for this small example. For the nodejs/node repository, we see these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|--------\| \| (Initial clone) \| 2 \| 330 MB \| \| \| 25K \| 19 \| 1,222 MB \| 1m 22s \| \| 50K \| 11 \| 1,221 MB \| 1m 24s \| \| 100K \| 7 \| 1,223 MB \| 1m 40s \| \| 250K \| 4 \| 1,224 MB \| 2m 23s \| \| 500K \| 3 \| 1,216 MB \| 4m 38s \| Here, we don't have much difference in the size of the repo, though the 500K batch size results in a few MB gained. That comes at a cost of a much longer time. This extra time is due to server-side delta compression happening as the on-disk deltas don't appear to be reusable all the time. But for smaller batch sizes, the server is able to find reasonable deltas partly because we are asking for objects that appear in the same region of the directory tree and include all versions of a file at a specific path. To contrast this example, I tested the microsoft/fluentui repo, which has been known to have inefficient packing due to name hash collisions. These results are found before GitHub had the opportunity to repack the server with more advanced name hash versions: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|--------\| \| (Initial clone) \| 2 \| 105 MB \| \| \| 5K \| 53 \| 348 MB \| 2m 26s \| \| 10K \| 28 \| 365 MB \| 2m 22s \| \| 15K \| 19 \| 407 MB \| 2m 21s \| \| 20K \| 15 \| 393 MB \| 2m 28s \| \| 25K \| 13 \| 417 MB \| 2m 06s \| \| 50K \| 8 \| 509 MB \| 1m 34s \| \| 100K \| 5 \| 535 MB \| 1m 56s \| \| 250K \| 4 \| 698 MB \| 1m 33s \| \| 500K \| 3 \| 696 MB \| 1m 42s \| Here, a larger variety of batch sizes were chosen because of the great variation in results. By asking the server to download small batches corresponding to fewer paths at a time, the server is able to provide better compression for these batches than it would for a regular clone. A typical full clone for this repository would require 738 MB. This example justifies the choice to batch requests by path name, leading to improved communication with a server that is not optimally packed. Finally, the same experiment for the Linux repository had these results: \| Batch Size \| Pack Count \| Pack Size \| Time \| \|-----------------\|------------\|-----------\|---------\| \| (Initial clone) \| 2 \| 2,153 MB \| \| \| 25K \| 63 \| 6,380 MB \| 14m 08s \| \| 50K \| 58 \| 6,126 MB \| 15m 11s \| \| 100K \| 30 \| 6,135 MB \| 18m 11s \| \| 250K \| 14 \| 6,146 MB \| 18m 22s \| \| 500K \| 8 \| 6,143 MB \| 33m 29s \| Even in this example, where the default name hash algorithm leads to decent compression of the Linux kernel repository, there is value for selecting a smaller batch size, to a limit. The 25K batch size has the fastest time, but uses 250 MB more than the 50K batch size. The 500K batch size took much more time due to server compression time and thus we should avoid large batch sizes like this. Based on these experiments, a batch size of 50,000 was chosen as the default value. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:42 -08:00
Derrick Stolee	1e72e889e7	backfill: basic functionality and tests The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior. The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already. The path-walk API provides lists of objects in batches according to a common path, but that list could be very small. We want to balance the number of requests to the server with the ability to have the process interrupted with minimal repeated work to catch up in the next run. Based on some experiments (detailed in the next change) a minimum batch size of 50,000 is selected for the default. This batch size is a _minimum_. As the path-walk API emits lists of blob IDs, they are collected into a list of objects for a request to the server. When that list is at least the minimum batch size, then the request is sent to the server for the new objects. However, the list of blob IDs from the path-walk API could be much longer than the batch size. At this moment, it is unclear if there is a benefit to split the list when there are too many objects at the same path. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:41 -08:00
Derrick Stolee	a3f79e9abd	backfill: add builtin boilerplate In anticipation of implementing 'git backfill', populate the necessary files with the boilerplate of a new builtin. Mark the builtin as experimental at this time, allowing breaking changes in the near future, if necessary. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-03 16:12:41 -08:00
Junio C Hamano	b83a2f9006	Merge branch 'kn/pack-write-with-reduced-globals' Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`	2025-02-03 10:23:34 -08:00
Junio C Hamano	803b5acaa7	Merge branch 'ps/3.0-remote-deprecation' Following the procedure we established to introduce breaking changes for Git 3.0, allow an early opt-in for removing support of $GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure remotes. * ps/3.0-remote-deprecation: remote: announce removal of "branches/" and "remotes/" builtin/pack-redundant: remove subcommand with breaking changes ci: repurpose "linux-gcc" job for deprecations ci: merge linux-gcc-default into linux-gcc Makefile: wire up build option for deprecated features	2025-02-03 10:23:33 -08:00
Junio C Hamano	caf17423d3	Merge branch 'tb/unsafe-hash-cleanup' The API around choosing to use unsafe variant of SHA-1 implementation has been updated in an attempt to make it harder to abuse. * tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper	2025-02-03 10:23:32 -08:00
Patrick Steinhardt	0578f1e66a	global: adapt callers to use generic hash context helpers Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-31 10:06:11 -08:00
Patrick Steinhardt	7346e340f1	hash: stop typedeffing the hash context We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-31 10:06:10 -08:00
Junio C Hamano	0cbcba5455	Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanup * tb/unsafe-hash-cleanup: hash.h: drop unsafe_ function variants csum-file: introduce hashfile_checkpoint_init() t/helper/test-hash.c: use unsafe_hash_algo() csum-file.c: use unsafe_hash_algo() hash.h: introduce `unsafe_hash_algo()` csum-file.c: extract algop from hashfile_checksum_valid() csum-file: store the hash algorithm as a struct field t/helper/test-tool: implement sha1-unsafe helper	2025-01-31 10:05:46 -08:00
Junio C Hamano	81309f424b	Merge branch 'jc/show-index-h-update' Doc and short-help text for "show-index" has been clarified to stress that the command reads its data from the standard input. * jc/show-index-h-update: show-index: the short help should say the command reads from its input	2025-01-31 09:44:16 -08:00
Junio C Hamano	8d6240d4c6	Merge branch 'rs/ref-fitler-used-atoms-value-fix' "git branch --sort=..." and "git for-each-ref --format=... --sort=..." did not work as expected with some atoms, which has been corrected. * rs/ref-fitler-used-atoms-value-fix: ref-filter: remove ref_format_clear() ref-filter: move is-base tip to used_atom ref-filter: move ahead-behind bases into used_atom	2025-01-29 14:05:09 -08:00
Junio C Hamano	de56e1d746	Merge branch 'ja/doc-commit-markup-updates' Doc updates. * ja/doc-commit-markup-updates: doc: migrate git-commit manpage secondary files to new format doc: convert git commit config to new format doc: make more direct explanations in git commit options doc: the mode param of -u of git commit is optional doc: apply new documentation guidelines to git commit	2025-01-29 14:05:09 -08:00
Junio C Hamano	73e055d71e	Merge branch 'mh/credential-cache-authtype-request-fix' The "cache" credential back-end did not handle authtype correctly, which has been corrected. * mh/credential-cache-authtype-request-fix: credential-cache: respect authtype capability	2025-01-28 13:02:24 -08:00
Junio C Hamano	f8b9821f7d	Merge branch 'jk/pack-header-parse-alignment-fix' It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings	2025-01-28 13:02:23 -08:00
Junio C Hamano	f0a371a39d	Merge branch 'jc/show-usage-help' The help text from "git $cmd -h" appear on the standard output for some $cmd and the standard error for others. The built-in commands have been fixed to show them on the standard output consistently. * jc/show-usage-help: builtin: send usage() help text to standard output oddballs: send usage() help text to standard output builtins: send usage_with_options() help text to standard output usage: add show_usage_if_asked() parse-options: add show_usage_with_options_if_asked() t0012: optionally check that "-h" output goes to stdout	2025-01-28 13:02:22 -08:00
Derrick Stolee	b4cf68476a	pack-objects: prevent name hash version change When the --name-hash-version option is used in 'git pack-objects', it can change from the initial assignment to when it is used based on interactions with other arguments. Specifically, when writing or reading bitmaps, we must force version 1 for now. This could change in the future when the bitmap format can store a name hash version value, indicating which was used during the writing of the packfile. Protect the 'git pack-objects' process from getting confused by failing with a BUG() statement if the value of the name hash version changes between calls to pack_name_hash_fn(). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	ce961135cc	pack-objects: add GIT_TEST_NAME_HASH_VERSION Add a new environment variable to opt-in to different values of the --name-hash-version=<n> option in 'git pack-objects'. This allows for extra testing of the feature without repeating all of the test scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add this to the linux-TEST-vars CI build as that test run is already overloaded. The behavior exposed by this test variable is of low risk and should be sufficient to allow manual testing when an issue arises. But this option isn't free. There are a few tests that change behavior with the variable enabled. First, there are a few tests that are very sensitive to certain delta bases being picked. These are both involving the generation of thin bundles and then counting their objects via 'git index-pack --fix-thin' which pulls the delta base into the new packfile. For these tests, disable the option as a decent long-term option. Second, there are some tests that compare the exact output of a 'git pack-objects' process when using bitmaps. The warning that ignores the --name-hash-version=2 and forces version 1 causes these tests to fail. Disable the environment variable to get around this issue. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	928ef41dd8	repack: add --name-hash-version option The new '--name-hash-version' option for 'git repack' is a simple pass-through to the underlying 'git pack-objects' subcommand. However, this subcommand may have other options and a temporary filename as part of the subcommand execution that may not be predictable or could change over time. The existing test_subcommand method requires an exact list of arguments for the subcommand. This is too rigid for our needs here, so create a new method, test_subcommand_flex. Use it to check that the --name-hash-version option is passing through. Since we are modifying the 'git repack' command, let's bring its usage in line with the Documentation's synopsis. This removes it from the allow list in t0450 so it will remain in sync in the future. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:43 -08:00
Derrick Stolee	fc62e033cd	pack-objects: add --name-hash-version option The previous change introduced a new pack_name_hash_v2() function that intends to satisfy much of the hash locality features of the existing pack_name_hash() function while also distinguishing paths with similar final components of their paths. This change adds a new --name-hash-version option for 'git pack-objects' to allow users to select their preferred function version. This use of an integer version allows for future expansion and a direct way to later store a name hash version in the .bitmap format. For now, let's consider how effective this mechanism is when repacking a repository with different name hash versions. Specifically, we will execute 'git pack-objects' the same way a 'git repack -adf' process would, except we include --name-hash-version=<n> for testing. On the Git repository, we do not expect much difference. All path names are short. This is backed by our results: \| Stage \| Pack Size \| Repack Time \| \|-----------------------\|-----------\|-------------\| \| After clone \| 260 MB \| N/A \| \| --name-hash-version=1 \| 127 MB \| 129s \| \| --name-hash-version=2 \| 127 MB \| 112s \| This example demonstrates how there is some natural overhead coming from the cloned copy because the server is hosting many forks and has not optimized for exactly this set of reachable objects. But the full repack has similar characteristics for both versions. Let's consider some repositories that are hitting too many collisions with version 1. First, let's explore the kinds of paths that are commonly causing these collisions: * "/CHANGELOG.json" is 15 characters, and is created by the beachball [1] tool. Only the final character of the parent directory can differentiate different versions of this file, but also only the two most-significant digits. If that character is a letter, then this is always a collision. Similar issues occur with the similar "/CHANGELOG.md" path, though there is more opportunity for differences In the parent directory. * Localization files frequently have common filenames but differentiates via parent directories. In C#, the name "/strings.resx.lcl" is used for these localization files and they will all collide in name-hash. [1] https://github.com/microsoft/beachball I've come across many other examples where some internal tool uses a common name across multiple directories and is causing Git to repack poorly due to name-hash collisions. One open-source example is the fluentui [2] repo, which uses beachball to generate CHANGELOG.json and CHANGELOG.md files, and these files have very poor delta characteristics when comparing against versions across parent directories. \| Stage \| Pack Size \| Repack Time \| \|-----------------------\|-----------\|-------------\| \| After clone \| 694 MB \| N/A \| \| --name-hash-version=1 \| 438 MB \| 728s \| \| --name-hash-version=2 \| 168 MB \| 142s \| [2] https://github.com/microsoft/fluentui In this example, we see significant gains in the compressed packfile size as well as the time taken to compute the packfile. Using a collection of repositories that use the beachball tool, I was able to make similar comparisions with dramatic results. While the fluentui repo is public, the others are private so cannot be shared for reproduction. The results are so significant that I find it important to share here: \| Repo \| --name-hash-version=1 \| --name-hash-version=2 \| \|----------\|-----------------------\|-----------------------\| \| fluentui \| 440 MB \| 161 MB \| \| Repo B \| 6,248 MB \| 856 MB \| \| Repo C \| 37,278 MB \| 6,755 MB \| \| Repo D \| 131,204 MB \| 7,463 MB \| Future changes could include making --name-hash-version implied by a config value or even implied by default during a full repack. It is important to point out that the name hash value is stored in the .bitmap file format, so we must force --name-hash-version=1 when bitmaps are being read or written. Later, the bitmap format could be updated to be aware of the name hash version so deltas can be quickly computed across the bitmapped/not-bitmapped boundary. To promote the safety of this parameter, the validate_name_hash_version() method will die() if the given name-hash version is incorrect and will disable newer versions if not yet compatible with other features, such as --write-bitmap-index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 13:21:41 -08:00
Bence Ferdinandy	93dc16483a	fetch set_head: fix non-mirror remotes in bare repositories In `b1b713f722` (fetch set_head: handle mirrored bare repositories, 2024-11-22) it was implicitly assumed that all remotes will be mirrors in a bare repository, thus fetching a non-mirrored remote could lead to HEAD pointing to a non-existent reference. Make sure we only overwrite HEAD if we are in a bare repository and fetching from a mirror. Otherwise, proceed as normally, and create refs/remotes/<nonmirrorremote>/HEAD instead. Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 08:16:47 -08:00
Bence Ferdinandy	638060dcb9	fetch set_head: refactor to use remote directly As a preparatory step to use even more properties from the remote struct, refactor set_head to take the entire struct as a parameter, instead of the necessary bits. This also allows consolidating the use of gtransport->remote in set_head, making the access of the remote's properties consistent in the function. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-27 08:16:45 -08:00
ZheNing Hu	08032fa30f	gc: add `--expire-to` option This commit extends the functionality of `git gc` by adding a new option, `--expire-to=<dir>`. Previously, this feature was implemented in `91badeba32` (builtin/repack.c: implement `--expire-to` for storing pruned objects, 2022-10-24), which allowing users to specify a directory where unreachable and expired cruft packs are stored during garbage collection. However, users had to run `git repack --cruft --expire-to=<dir>` followed by `git prune` to achieve similar results within `git gc`. By introducing `--expire-to=<dir>` directly into `git gc`, we simplify the process for users who wish to manage their repository's cleanup more efficiently. This change involves passing the `--expire-to=<dir>` parameter through to `git repack`, making it easier for users to set up a backup location for cruft packs that will be pruned. Due to the original `git gc --prune=now` deleting all unreachable objects by passing the `-a` parameter to git repack. With the addition of the `--cruft` and `--expire-to` options, it is necessary to modify this default behavior: instead of deleting these unreachable objects, they should be merged into a cruft pack and collected in a specified directory. Therefore, we do not pass `-a` to the repack command but instead pass `--cruft`, `--expire-to`, and `--cruft-expiration=now` to repack. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-24 14:32:28 -08:00
Patrick Steinhardt	8ccc75c245	remote: announce removal of "branches/" and "remotes/" Back when Git was in its infancy, remotes were configured via separate files in "branches/" (back in 2005). This mechanism was replaced later that year with the "remotes/" directory. Both mechanisms have eventually been replaced by config-based remotes, and it is very unlikely that anybody still uses these directories to configure their remotes. Both of these directories have been marked as deprecated, one in 2005 and the other one in 2011. Follow through with the deprecation and finally announce the removal of these features in Git 3.0. Signed-off-by: Patrick Steinhardt <ps@pks.im> [jc: with a small tweak to the help message] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-24 08:08:56 -08:00
Taylor Blau	a8dd3821fe	csum-file: introduce hashfile_checkpoint_init() In `106140a99f` (builtin/fast-import: fix segfault with unsafe SHA1 backend, 2024-12-30) and `9218c0bfe1` (bulk-checkin: fix segfault with unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to initialize a hashfile_checkpoint with the same hash function implementation as is used by the hashfile it is used to checkpoint. While both `106140a99f` and `9218c0bfe1` work around the immediate crash, changing the hash function implementation within the hashfile API to, for example, the non-unsafe variant would re-introduce the crash. This is a result of the tight coupling between initializing hashfiles and hashfile_checkpoints. Introduce and use a new function which ensures that both parts of a hashfile and hashfile_checkpoint pair use the same hash function implementation to avoid such crashes. A few things worth noting: - In the change to builtin/fast-import.c::stream_blob(), we can see that by removing the explicit reference to 'the_hash_algo->unsafe_init_fn()', we are hardened against the hashfile API changing away from the_hash_algo (or its unsafe variant) in the future. - The bulk-checkin code no longer needs to explicitly zero-initialize the hashfile_checkpoint, since it is now done as a result of calling 'hashfile_checkpoint_init()'. - Also in the bulk-checkin code, we add an additional call to prepare_to_stream() outside of the main loop in order to initialize 'state->f' so we know which hash function implementation to use when calling 'hashfile_checkpoint_init()'. This is OK, since subsequent 'prepare_to_stream()' calls are noops. However, we only need to call 'prepare_to_stream()' when we have the HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling 'prepare_to_stream()' does not assign 'state->f', so we have nothing to initialize. - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are appropriately guarded. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-23 10:28:17 -08:00
Karthik Nayak	6b2aa7fd37	pack-write: pass hash_algo to `write_rev_file()` The `write_rev_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Also modify children functions `write_rev_file_order()` and `write_rev_header()` to accept 'the_hash_algo'. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. However, in `midx-write.c`, since all usage of global variables is removed, don't reintroduce them and instead use the `repo` available in the context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
Karthik Nayak	7653e9af9b	pack-write: pass hash_algo to `write_idx_file()` The `write_idx_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls `write_idx_file()`, update it to accept a `struct git_hash_algo` as a parameter and pass it through to the callee. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
Karthik Nayak	e2f6f76585	pack-write: pass repository to `index_pack_lockfile()` The `index_pack_lockfile()` function uses the global `the_repository` variable to access the repository. To avoid global variable usage, pass the repository from the layers above. Altough the layers above could have access to the repository internally, simply pass in `the_repository`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
Karthik Nayak	8244d01de6	pack-write: pass hash_algo to `fixup_pack_header_footer()` The `fixup_pack_header_footer()` function uses the global `the_hash_algo` variable to access the repository's hash function. To avoid global variable usage, pass a hash_algo from the layers above. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00
René Scharfe	c5490ce9d1	ref-filter: remove ref_format_clear() Now that ref_format_clear() no longer releases any memory we don't need it anymore. Remove it and its counterpart, ref_format_init(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 09:06:24 -08:00
René Scharfe	5e58db6575	ref-filter: move ahead-behind bases into used_atom verify_ref_format() parses a ref-filter format string and stores recognized items in the static array "used_atom". For "ahead-behind:<committish>" it stores the committish part in a string_list member "bases" of struct ref_format. ref_sorting_options() also parses bare ref-filter format items and stores stores recognized ones in "used_atom" as well. The committish parts go to a dummy struct ref_format in parse_sorting_atom(), though, and are leaked and forgotten. If verify_ref_format() is called before ref_sorting_options(), like in git for-each-ref, then all works well if the sort key is included in the format string. If it isn't then sorting cannot work as the committishes are missing. If ref_sorting_options() is called first, like in git branch, then we have the additional issue that if the sort key is included in the format string then filter_ahead_behind() can't see its committish, will not generate any results for it and thus it will be expanded to an empty string. Fix those issues by replacing the string_list with a field in used_atom for storing the committish. This way it can be shared for handling both ref-filter format strings and sorting options in the same command. Reported-by: Ross Goldberg <ross.goldberg@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 09:06:15 -08:00
Junio C Hamano	7b39a128c8	Merge branch 'ps/the-repository' More code paths have a repository passed through the callchain, instead of assuming the primary the_repository object. * ps/the-repository: match-trees: stop using `the_repository` graph: stop using `the_repository` add-interactive: stop using `the_repository` tmp-objdir: stop using `the_repository` resolve-undo: stop using `the_repository` credential: stop using `the_repository` mailinfo: stop using `the_repository` diagnose: stop using `the_repository` server-info: stop using `the_repository` send-pack: stop using `the_repository` serve: stop using `the_repository` trace: stop using `the_repository` pager: stop using `the_repository` progress: stop using `the_repository`	2025-01-21 08:44:54 -08:00
Junio C Hamano	d6a7cace21	Merge branch 'jt/fsck-skiplist-parse-fix' A misconfigured "fsck.skiplist" configuration variable was not diagnosed as an error, which has been corrected. * jt/fsck-skiplist-parse-fix: fsck: reject misconfigured fsck.skipList	2025-01-21 08:44:53 -08:00
Junio C Hamano	cb441e1ec3	Merge branch 'ps/reftable-get-random-fix' The code to compute "unique" name used git_rand() which can fail or get stuck; the callsite does not require cryptographic security. Introduce the "insecure" mode and use it appropriately. * ps/reftable-get-random-fix: reftable/stack: accept insecure random bytes wrapper: allow generating insecure random bytes	2025-01-21 08:44:53 -08:00
Jeff King	98046591b9	index-pack, unpack-objects: use skip_prefix to avoid magic number When parsing --pack_header=, we manually skip 14 bytes to the data. Let's use skip_prefix() to do this automatically. Note that we overwrite our pointer to the front of the string, so we have to add more context to the error message. We could avoid this by declaring an extra pointer to hold the value, but I think the modified message is actually preferable; it should give translators a bit more context. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:56 -08:00
Jeff King	f1299bff26	index-pack, unpack-objects: use get_be32() for reading pack header Both of these commands read the incoming pack into a static unsigned char buffer in BSS, and then parse it by casting the start of the buffer to a struct pack_header. This can result in SIGBUS on some platforms if the compiler doesn't place the buffer in a position that is properly aligned for 4-byte integers. This reportedly happens with unpack-objects (but not index-pack) on sparc64 when compiled with clang (but not gcc). But we are definitely in the wrong in both spots; since the buffer's type is unsigned char, we can't depend on larger alignment. When it works it is only because we are lucky. We'll fix this by switching to get_be32() to read the headers (just like the last few commits similarly switched us to put_be32() for writing into the same buffer). It would be nice to factor this out into a common helper function, but the interface ends up quite awkward. Either the caller needs to hardcode how many bytes we'll need, or it needs to pass us its fill()/use() functions as pointers. So I've just fixed both spots in the same way; this is not code that is likely to be repeated a third time (most of the pack reading code uses an mmap'd buffer, which should be properly aligned). I did make one tweak to the shared code: our pack_version_ok() macro expects us to pass the big-endian value we'd get by casting. We can introduce a "native" variant which uses the host integer ordering. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:56 -08:00
Jeff King	798e0f4516	packfile: factor out --pack_header argument parsing Both index-pack and unpack-objects accept a --pack_header argument. This is an undocumented internal argument used by receive-pack and fetch to pass along information about the header of the pack, which they've already read from the incoming stream. In preparation for a bugfix, let's factor the duplicated code into a common helper. The callers are still responsible for identifying the option. While this could likewise be factored out, it is more flexible this way (e.g., if they ever started using parse-options and wanted to handle both the stuck and unstuck forms). Likewise, the callers are responsible for reporting errors, though they both just call die(). I've tweaked unpack-objects to match index-pack in marking the error for translation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 08:42:55 -08:00
Junio C Hamano	f66d1423f5	builtin: send usage() help text to standard output Using the show_usage_and_exit_if_asked() helper we introduced earlier, fix callers of usage() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. These are the bog standard "if we got only '-h', then that is a request for help" callers. Their if (argc == 2 && !strcmp(argv[1], "-h")) usage(message); are simply replaced with show_usage_and_exit_if_asked(argc, argv, message); With this, the built-ins tested by t0012 all send their help text to their standard output stream, so the check in t0012 that was half tightened earlier is now fully tightened to insist on standard error stream being empty. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Junio C Hamano	a36a822d7d	oddballs: send usage() help text to standard output Using the show_usage_if_asked() helper we introduced earlier, fix callers of usage() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. The callers in this step are oddballs in that their invocations of usage() are not guarded by if (argc == 2 && !strcmp(argv[1], "-h") usage(...); There are (unnecessarily) being clever ones that do things like if (argc != 2 \|\| !strcmp(argv[1], "-h") usage(...); to say "I know I take only one argument, so argc != 2 is always an error regardless of what is in argv[]. Ah, by the way, even if argc is 2, "-h" is a request for usage text, so we do the same". Some like "git var -h" just do not treat "-h" any specially, and let it take the same error code paths as a parameter error. Now we cannot do the same, so these callers are rewrittin to do the show_usage_and_exit_if_asked() first and then handle the usage error the way they used to. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Junio C Hamano	b821c999ca	builtins: send usage_with_options() help text to standard output Using the show_usage_with_options_if_asked() helper we introduced earlier, fix callers of usage_with_options() that want to show the help text when explicitly asked by the end-user. The help text now goes to the standard output stream for them. The test in t7600 for "git merge -h" may want to be retired, as the same is covered by t0012 already, but it is specifically testing that the "-h" option gets a response even with a corrupt index file, so for now let's leave it there. Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-17 13:30:03 -08:00
Junio C Hamano	564b907c8a	Merge branch 'ps/more-sign-compare' More -Wsign-compare fixes. * ps/more-sign-compare: sign-compare: avoid comparing ptrdiff with an int/unsigned commit-reach: use `size_t` to track indices when computing merge bases shallow: fix -Wsign-compare warnings builtin/log: fix remaining -Wsign-compare warnings builtin/log: use `size_t` to track indices commit-reach: use `size_t` to track indices in `get_reachable_subset()` commit-reach: use `size_t` to track indices in `remove_redundant()` commit-reach: fix type of `min_commit_date` commit-reach: fix index used to loop through unsigned integer prio-queue: fix type of `insertion_ctr`	2025-01-16 16:35:14 -08:00
Jean-Noël Avila	d533c10697	doc: the mode param of -u of git commit is optional Fix the synopsis to reflect the option description. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-15 14:43:36 -08:00
Junio C Hamano	b28fb93e51	Merge branch 'ps/build-sign-compare' Last-minute fix for a regression in "git blame --abbrev=<length>" when insane <length> is specified; we used to correctly cap it to the hash output length but broke it during the cycle. * ps/build-sign-compare: builtin/blame: fix out-of-bounds write with blank boundary commits builtin/blame: fix out-of-bounds read with excessive `--abbrev`	2025-01-10 09:19:34 -08:00
Patrick Steinhardt	e7fb2ca945	builtin/blame: fix out-of-bounds write with blank boundary commits When passing the `-b` flag to git-blame(1), then any blamed boundary commits which were marked as uninteresting will not get their actual commit ID printed, but will instead be replaced by a couple of spaces. The flag can lead to an out-of-bounds write as though when combined with `--abbrev=` when the abbreviation length is longer than `GIT_MAX_HEXSZ` as we simply use memset(3p) on that array with the user-provided length directly. The result is most likely that we segfault. An obvious fix would be to cull `length` to `GIT_MAX_HEXSZ` many bytes. But when the underlying object ID is SHA1, and if the abbreviated length exceeds the SHA1 length, it would cause us to print more bytes than desired, and the result would be misaligned. Instead, fix the bug by computing the length via strlen(3p). This makes us write as many bytes as the formatted object ID requires and thus effectively limits the length of what we may end up printing to the length of its hash. If `--abbrev=` asks us to abbreviate to something shorter than the full length of the underlying hash function it would be handled by the call to printf(3p) correctly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-10 06:56:55 -08:00
Patrick Steinhardt	1fbb8d7ecb	builtin/blame: fix out-of-bounds read with excessive `--abbrev` In `6411a0a896` (builtin/blame: fix type of `length` variable when emitting object ID, 2024-12-06) we have fixed the type of the `length` variable. In order to avoid a cast from `size_t` to `int` in the call to printf(3p) with the "%.*s" formatter we have converted the code to instead use fwrite(3p), which accepts the length as a `size_t`. It was reported though that this makes us read over the end of the OID array when the provided `--abbrev=` length exceeds the length of the object ID. This is because fwrite(3p) of course doesn't stop when it sees a NUL byte, whereas printf(3p) does. Fix the bug by reverting back to printf(3p) and culling the provided length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an `int`. Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-10 06:56:54 -08:00
M Hickford	0b43274850	credential-cache: respect authtype capability Previously, credential-cache populated authtype regardless whether "get" request had authtype capability. As documented in git-credential.txt, authtype "should not be sent unless the appropriate capability ... is provided". Add test. Without this change, the test failed because "credential fill" printed an incomplete credential with only protocol and host attributes (the unexpected authtype attribute was discarded by credential.c). Signed-off-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-09 15:04:15 -08:00
Justin Tobler	ca7158076f	fsck: reject misconfigured fsck.skipList In Git, fsck operations can ignore known broken objects via the `fsck.skipList` configuration. This option expects a path to a file with the list of object names. When the configuration is specified without a path, an error message is printed, but the command continues as if the configuration was not set. Configuring `fsck.skipList` without a value is a misconfiguration so config parsing should be more strict and reject it. Update `git_fsck_config()` to no longer ignore misconfiguration of `fsck.skipList`. The same behavior is also present for `fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration parsers for these are updated to ensure the related operations remain consistent. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-07 09:22:25 -08:00
Patrick Steinhardt	1568d1562e	wrapper: allow generating insecure random bytes The `csprng_bytes()` function generates randomness and writes it into a caller-provided buffer. It abstracts over a couple of implementations, where the exact one that is used depends on the platform. These implementations have different guarantees: while some guarantee to never fail (arc4random(3)), others may fail. There are two significant failures to distinguish from one another: - Systemic failure, where e.g. opening "/dev/urandom" fails or when OpenSSL doesn't have a provider configured. - Entropy failure, where the entropy pool is exhausted, and thus the function cannot guarantee strong cryptographic randomness. While we cannot do anything about the former, the latter failure can be acceptable in some situations where we don't care whether or not the randomness can be predicted. Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt into weak cryptographic randomness. The exact behaviour of the flag depends on the underlying implementation: - `arc4random_buf()` never returns an error, so it doesn't change. - `getrandom()` pulls from "/dev/urandom" by default, which never blocks on modern systems even when the entropy pool is empty. - `getentropy()` seems to block when there is not enough randomness available, and there is no way of changing that behaviour. - `GtlGenRandom()` doesn't mention anything about its specific failure mode. - The fallback reads from "/dev/urandom", which also returns bytes in case the entropy pool is drained in modern Linux systems. That only leaves OpenSSL with `RAND_bytes()`, which returns an error in case the returned data wouldn't be cryptographically safe. This function is replaced with a call to `RAND_pseudo_bytes()`, which can indicate whether or not the returned data is cryptographically secure via its return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag is set, then we ignore the insecurity and return the data regardless. It is somewhat questionable whether we really need the flag in the first place, or whether we wouldn't just ignore the potentially-insecure data. But the risk of doing that is that we might have or grow callsites that aren't aware of the potential insecureness of the data in places where it really matters. So using a flag to opt-in to that behaviour feels like the more secure choice. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-07 09:04:18 -08:00
Junio C Hamano	a41e394e21	Merge branch 'bf/fetch-set-head-config' A hotfix on an advice messagge added during this cycle. * bf/fetch-set-head-config: fetch: fix erroneous set_head advice message	2025-01-06 12:02:21 -08:00
Bence Ferdinandy	233d48f5de	fetch: fix erroneous set_head advice message `9e2b7005be` (fetch set_head: add warn-if-not-$branch option, 2024-12-05) tried to expand the advice message for set_head with the new option, but unfortunately did not manage to add the right incantation. Fix the advice message with the correct usage of warn-if-not-$branch. Reported-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-06 06:50:03 -08:00
Junio C Hamano	effbef2beb	Merge branch 'jk/lsan-race-ignore-false-positive' CI jobs that run threaded programs under LSan has been giving false positives from time to time, which has been worked around. This is an alternative to the jk/lsan-race-with-barrier topic with much smaller change to the production code. * jk/lsan-race-ignore-false-positive: test-lib: ignore leaks in the sanitizer's thread code test-lib: check leak logs for presence of DEDUP_TOKEN test-lib: simplify leak-log checking test-lib: rely on logs to detect leaks Revert barrier-based LSan threading race workaround	2025-01-02 13:37:08 -08:00
Junio C Hamano	fc89d14c63	Revert barrier-based LSan threading race workaround The extra "barrier" approach was too much code whose sole purpose was to work around a race that is not even ours (i.e. in LSan's teardown code). In preparation for queuing a solution taking a much-less-invasive approach, let's revert them.	2025-01-01 14:13:01 -08:00
Junio C Hamano	d893741e02	Merge branch 'jk/lsan-race-with-barrier' CI jobs that run threaded programs under LSan has been giving false positives from time to time, which has been worked around. * jk/lsan-race-with-barrier: grep: work around LSan threading race with barrier index-pack: work around LSan threading race with barrier thread-utils: introduce optional barrier type Revert "index-pack: spawn threads atomically" test-lib: use individual lsan dir for --stress runs	2025-01-01 09:21:15 -08:00
Junio C Hamano	98422943f0	Merge branch 'ps/weak-sha1-for-tail-sum-fix' An earlier "csum-file checksum does not have to be computed with sha1dc" topic had a few code paths that had initialized an implementation of a hash function to be used by an unmatching hash by mistake, which have been corrected. * ps/weak-sha1-for-tail-sum-fix: ci: exercise unsafe OpenSSL backend builtin/fast-import: fix segfault with unsafe SHA1 backend bulk-checkin: fix segfault with unsafe SHA1 backend	2025-01-01 09:21:14 -08:00
Patrick Steinhardt	106140a99f	builtin/fast-import: fix segfault with unsafe SHA1 backend Same as with the preceding commit, git-fast-import(1) is using the safe variant to initialize a hashfile checkpoint. This leads to a segfault when passing the checkpoint into the hashfile subsystem because it would use the unsafe variants instead: ++ git --git-dir=R/.git fast-import --big-file-threshold=1 AddressSanitizer:DEADLYSIGNAL ================================================================= ==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0) ==577126==The signal is caused by a READ memory access. ==577126==Hint: address points to the zero page. #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2 #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2 #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2 #4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2 #5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3 #6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5 #7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4 #8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4 #9 0x555555b1f493 in run_builtin ../git.c:480:11 #10 0x555555b1bfef in handle_builtin ../git.c:740:9 #11 0x555555b1e6f4 in run_argv ../git.c:807:4 #12 0x555555b1b87a in cmd_main ../git.c:947:19 #13 0x5555561649e6 in main ../common-main.c:64:11 #14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4) #15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4) #16 0x555555772c84 in _start (git+0x21ec84) ==577126==Register values: rax = 0x0000511000000cc0 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000 rdi = 0x0000000000000000 rsi = 0x00005070000009c0 rbp = 0x00005070000009c0 rsp = 0x00007fffffff5b30 r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30 r12 = 0x0000000000000000 r13 = 0x00007fffffff6b60 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910 AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex ==577126==ABORTING ./test-lib.sh: line 1039: 577126 Aborted git --git-dir=R/.git fast-import --big-file-threshold=1 < input error: last command exited with $?=134 not ok 167 - R: blob bigger than threshold The segfault is only exposed in case the unsafe and safe backends are different from one another. Fix the issue by initializing the context with the unsafe SHA1 variant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:46:30 -08:00
Jeff King	7a8d9efc26	grep: work around LSan threading race with barrier There's a race with LSan when spawning threads and one of the threads calls die(). We worked around one such problem with index-pack in the previous commit, but it exists in git-grep, too. You can see it with: make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux cd t ./t0003-attributes.sh --stress which fails pretty quickly with: ==git==4096424==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7f906de14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98 #1 0x7f906dc9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180 #2 0x7f906de2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150 #3 0x7f906de25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long, unsigned long, unsigned long, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614 #4 0x7f906de17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53 #5 0x7f906de143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431 #6 0x7f906dc9bf51 in start_thread nptl/pthread_create.c:447 #7 0x7f906dd1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 As with the previous commit, we can fix this by inserting a barrier that makes sure all threads have finished their setup before continuing. But there's one twist in this case: the thread which calls die() is not one of the worker threads, but the main thread itself! So we need the main thread to wait in the barrier, too, until all threads have gotten to it. And thus we initialize the barrier for num_threads+1, to account for all of the worker threads plus the main one. If we then test as above, t0003 should run indefinitely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:18:58 -08:00
Jeff King	526c0a851b	index-pack: work around LSan threading race with barrier We sometimes get false positives from our linux-leaks CI job because of a race in LSan itself. The problem is that one thread is still initializing its stack in LSan's code (and allocating memory to do so) while anothe thread calls die(), taking down the whole process and triggering a leak check. The problem is described in more detail in `993d38a066` (index-pack: spawn threads atomically, 2024-01-05), which tried to fix it by pausing worker threads until all calls to pthread_create() had completed. But that's not enough to fix the problem, because the LSan setup code runs in the threads themselves. So even though pthread_create() has returned, we have no idea if all threads actually finished their setup before letting any of them do real work. We can fix that by using a barrier inside the threads themselves, waiting for all of them to hit the start of their main function before any of them proceed. You can test for the race by running: make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux cd t ./t5309-pack-delta-cycles.sh --stress which fails quickly before this patch, and should run indefinitely without it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:18:58 -08:00
Jeff King	ca9d60f246	Revert "index-pack: spawn threads atomically" This reverts commit `993d38a066`. That commit was trying to solve a race between LSan setting up the threads stack and another thread calling exit(), by making sure that all pthread_create() calls have finished before doing any work that might trigger the exit(). But that isn't sufficient. The setup code actually runs in the individual threads themselves, not in the spawning thread's call to pthread_create(). So while it may have improved the race a bit, you can still trigger it pretty quickly with: make SANITIZE=leak cd t ./t5309-pack-delta-cycles.sh --stress Let's back out that failed attempt so we can try again. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-30 06:18:57 -08:00
Patrick Steinhardt	5e7fe8a7b8	commit-reach: use `size_t` to track indices when computing merge bases The functions `repo_get_merge_bases_many()` and friends accepts an array of commits as well as a parameter that indicates how large that array is. This parameter is using a signed integer, which leads to a couple of warnings with -Wsign-compare. Refactor the code to use `size_t` to track indices instead and adapt callers accordingly. While most callers are trivial, there are two callers that require a bit more scrutiny: - builtin/merge-base.c:show_merge_base() subtracts `1` from the `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if the variable was `0` it would wrap. This code is fine though because its only caller will execute that code only when `argc >= 2`, and it follows that `rev_nr >= 2`, as well. - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`. Again, there is only a single caller that populates `rev_nr` with `good_revs.nr`. And because a bisection always requires at least one good revision it follws that `rev_nr >= 1`. Mark the file as -Wsign-compare-clean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:12:40 -08:00
Patrick Steinhardt	1ab5948141	builtin/log: fix remaining -Wsign-compare warnings Fix remaining -Wsign-compare warnings in "builtin/log.c" and mark the file as -Wsign-compare-clean. While most of the fixes are obvious, one fix requires us to use `cast_size_t_to_int()`, which will cause us to die in case the `size_t` cannot be represented as `int`. This should be fine though, as the data would typically be set either via a config key or via the command line, neither of which should ever exceed a couple of kilobytes of data. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:46 -08:00
Patrick Steinhardt	0905ed201a	builtin/log: use `size_t` to track indices Similar as with the preceding commit, adapt "builtin/log.c" so that it tracks array indices via `size_t` instead of using signed integers. This fixes a couple of -Wsign-compare warnings and prepares the code for a similar refactoring of `repo_get_merge_bases_many()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:46 -08:00
Junio C Hamano	88e59f8027	Merge branch 'js/range-diff-diff-merges' "git range-diff" learned to optionally show and compare merge commits in the ranges being compared, with the --diff-merges option. * js/range-diff-diff-merges: range-diff: introduce the convenience option `--remerge-diff` range-diff: optionally include merge commits' diffs in the analysis	2024-12-23 09:32:17 -08:00
Junio C Hamano	002a8a9d36	Merge branch 'as/show-index-uninitialized-hash' Regression fix for 'show-index' when run outside of a repository. * as/show-index-uninitialized-hash: t5300: add test for 'show-index --object-format' show-index: fix uninitialized hash function	2024-12-23 09:32:12 -08:00
Junio C Hamano	4156b6a741	Merge branch 'ps/build-sign-compare' Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings	2024-12-23 09:32:11 -08:00
Junio C Hamano	49edce4ff9	show-index: the short help should say the command reads from its input The short help text given by "git show-index -h" says $ git show-index -h usage: git show-index [--object-format=<hash-algorithm>] --[no-]object-format <hash-algorithm> specify the hash algorithm to use The command takes a pack .idx file from its standard input. The user has to _know_ this, as there is no indication from this output. Give a hint that the data to work on is fed from its standard input. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-20 17:30:57 -08:00
Junio C Hamano	cb89eebf3b	Merge branch 'js/log-remerge-keep-ancestry' "git log -p --remerge-diff --reverse" was completely broken. * js/log-remerge-keep-ancestry: log: --remerge-diff needs to keep around commit parents	2024-12-19 10:58:31 -08:00
Junio C Hamano	a1f34d5955	Merge branch 'bf/fetch-set-head-config' "git fetch" honors "remote.<remote>.followRemoteHEAD" settings to tweak the remote-tracking HEAD in "refs/remotes/<remote>/HEAD". * bf/fetch-set-head-config: remote set-head: set followRemoteHEAD to "warn" if "always" fetch set_head: add warn-if-not-$branch option fetch set_head: move warn advice into advise_if_enabled fetch: add configuration for set_head behaviour	2024-12-19 10:58:30 -08:00
Junio C Hamano	ae75cefd94	Merge branch 'jc/set-head-symref-fix' "git fetch" from a configured remote learned to update a missing remote-tracking HEAD but it asked the remote about their HEAD even when it did not need to, which has been corrected. Incidentally, this also corrects "git fetch --tags $URL" which was broken by the new feature in an unspecified way. * jc/set-head-symref-fix: fetch: do not ask for HEAD unnecessarily	2024-12-19 10:58:28 -08:00
Junio C Hamano	5f212684ab	Merge branch 'bf/set-head-symref' When "git fetch $remote" notices that refs/remotes/$remote/HEAD is missing and discovers what branch the other side points with its HEAD, refs/remotes/$remote/HEAD is updated to point to it. * bf/set-head-symref: fetch set_head: handle mirrored bare repositories fetch: set remote/HEAD if it does not exist refs: add create_only option to refs_update_symref_extended refs: add TRANSACTION_CREATE_EXISTS error remote set-head: better output for --auto remote set-head: refactor for readability refs: atomically record overwritten ref in update_symref refs: standardize output of refs_read_symbolic_ref t/t5505-remote: test failure of set-head t/t5505-remote: set default branch to main	2024-12-19 10:58:27 -08:00
Patrick Steinhardt	727c71a112	tmp-objdir: stop using `the_repository` Stop using `the_repository` in the "tmp-objdir" subsystem by passing in the repostiroy when creating a new temporary object directory. While we could trivially update the caller to pass in the hash algorithm used by the index itself, we instead pass in `the_hash_algo`. This is mostly done to stay consistent with the rest of the code in that file, which isn't prepared to handle arbitrary repositories, either. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:31 -08:00
Patrick Steinhardt	6c27d22276	credential: stop using `the_repository` Stop using `the_repository` in the "credential" subsystem by passing in a repository when filling, approving or rejecting credentials. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:31 -08:00
Patrick Steinhardt	71e5afee8b	mailinfo: stop using `the_repository` Stop using `the_repository` in the "mailinfo" subsystem by passing in a repository when setting up the mailinfo structure. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:31 -08:00
Patrick Steinhardt	b4c476c43a	diagnose: stop using `the_repository` Stop using `the_repository` in the "diagnose" subsystem by passing in a repository when generating a diagnostics archive. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:31 -08:00
Patrick Steinhardt	c365dbb44e	server-info: stop using `the_repository` Stop using `the_repository` in the "server-info" subsystem by passing in a repository when updating server info and storing the repository in the `update_info_ctx` structure to make it accessible to other functions. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:30 -08:00
Patrick Steinhardt	5ee907bb3f	send-pack: stop using `the_repository` Stop using `the_repository` in the "send-pack" subsystem by passing in a repository when sending a packfile. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:30 -08:00
Patrick Steinhardt	395b584b57	serve: stop using `the_repository` Stop using `the_repository` in the "serve" subsystem by passing in a repository when advertising capabilities or serving requests. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:30 -08:00
Patrick Steinhardt	59b6131a67	pager: stop using `the_repository` Stop using `the_repository` in the "pager" subsystem by passing in a repository when setting up the pager and when configuring it. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:30 -08:00
Patrick Steinhardt	1f7e6478dc	progress: stop using `the_repository` Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:30 -08:00
Junio C Hamano	913a1e157c	Merge branch 'ps/build-sign-compare' into ps/the-repository * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings	2024-12-18 10:43:16 -08:00
Johannes Schindelin	4538338c7e	range-diff: introduce the convenience option `--remerge-diff` Just like `git log`, now also `git range-diff` has that option as a shortcut for the common operation that would otherwise require the quite unwieldy (if theoretically "more correct") `--diff-mode=remerge` option. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-16 08:45:48 -08:00
Johannes Schindelin	f8043236c6	range-diff: optionally include merge commits' diffs in the analysis The `git log` command already offers support for including diffs for merges, via the `--diff-merges=<format>` option. Let's add corresponding support for `git range-diff`, too. This makes it more convenient to spot differences between commit ranges that contain merges. This is especially true in scenarios with non-trivial merges, i.e. merges introducing changes other than, or in addition to, what merge ORT would have produced. Merging a topic branch that changes a function signature into a branch that added a caller of that function, for example, would require the merge commit itself to adjust that caller to the modified signature. In my code reviews, I found the `--diff-merges=remerge` option particularly useful. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-16 08:45:48 -08:00
Junio C Hamano	eb8374c652	Merge branch 'js/log-remerge-keep-ancestry' into js/range-diff-diff-merges * js/log-remerge-keep-ancestry: log: --remerge-diff needs to keep around commit parents	2024-12-16 08:45:14 -08:00
Junio C Hamano	ededd0d5dc	Merge branch 'jt/fix-fattening-promisor-fetch' Fix performance regression of a recent "fatten promisor pack with local objects" protection against an unwanted gc. * jt/fix-fattening-promisor-fetch: index-pack --promisor: also check commits' trees index-pack --promisor: don't check blobs index-pack --promisor: dedup before checking links	2024-12-15 17:54:31 -08:00
Junio C Hamano	ab738b2f1f	Merge branch 'jc/forbid-head-as-tagname' "git tag" has been taught to refuse to create refs/tags/HEAD as such a tag will be confusing in the context of UI provided by the Git Porcelain commands. * jc/forbid-head-as-tagname: tag: "git tag" refuses to use HEAD as a tagname t5604: do not expect that HEAD can be a valid tagname refs: drop strbuf_ prefix from helpers refs: move ref name helpers around	2024-12-15 17:54:26 -08:00
Junio C Hamano	73b7e03e9e	Merge branch 'jk/describe-perf' "git describe" optimization. * jk/describe-perf: describe: split "found all tags" and max_candidates logic describe: stop traversing when we run out of names describe: stop digging for max_candidates+1 t/perf: add tests for git-describe t6120: demonstrate weakness in disjoint-root handling	2024-12-15 17:54:25 -08:00
Junio C Hamano	ca43bd2562	Merge branch 'kn/midx-wo-the-repository' Yet another "pass the repository through the callchain" topic. * kn/midx-wo-the-repository: midx: inline the `MIDX_MIN_SIZE` definition midx: pass down `hash_algo` to functions using global variables midx: pass `repository` to `load_multi_pack_index` midx: cleanup internal usage of `the_repository` and `the_hash_algo` midx-write: pass down repository to `write_midx_file[_only]` write-midx: add repository field to `write_midx_context` midx-write: use `revs->repo` inside `read_refs_snapshot` midx-write: pass down repository to static functions packfile.c: remove unnecessary prepare_packed_git() call midx: add repository to `multi_pack_index` struct config: make `packed_git_(limit\|window_size)` non-global variables config: make `delta_base_cache_limit` a non-global variable packfile: pass down repository to `for_each_packed_object` packfile: pass down repository to `has_object[_kept]_pack` packfile: pass down repository to `odb_pack_name` packfile: pass `repository` to static function in the file packfile: use `repository` from `packed_git` directly packfile: add repository to struct `packed_git`	2024-12-13 07:33:44 -08:00
Junio C Hamano	3b11c9139d	Merge branch 'cw/worktree-extension' Introduce a new repository extension to prevent older Git versions from mis-interpreting worktrees created with relative paths. * cw/worktree-extension: worktree: refactor `repair_worktree_after_gitdir_move()` worktree: add relative cli/config options to `repair` command worktree: add relative cli/config options to `move` command worktree: add relative cli/config options to `add` command worktree: add `write_worktree_linking_files()` function worktree: refactor infer_backlink return worktree: add `relativeWorktrees` extension setup: correctly reinitialize repository version	2024-12-13 07:33:43 -08:00
Junio C Hamano	e56c283c15	Merge branch 'en/fast-import-verify-path' "git fast-import" learned to reject paths with ".." and "." as their components to avoid creating invalid tree objects. * en/fast-import-verify-path: t9300: test verification of renamed paths fast-import: disallow more path components fast-import: disallow "." and ".." path components	2024-12-13 07:33:41 -08:00
Junio C Hamano	a32668829d	Merge branch 'jt/bundle-fsck' "git bundle --unbundle" and "git clone" running on a bundle file both learned to trigger fsck over the new objects with configurable fck check levels. * jt/bundle-fsck: transport: propagate fsck configuration during bundle fetch fetch-pack: split out fsck config parsing bundle: support fsck message configuration bundle: add bundle verification options type	2024-12-13 07:33:36 -08:00
Johannes Schindelin	f94bfa1516	log: --remerge-diff needs to keep around commit parents To show a remerge diff, the merge needs to be recreated. For that to work, the merge base(s) need to be found, which means that the commits' parents have to be traversed until common ancestors are found (if any). However, one optimization that hails all the way back to `cb115748ec` (Some more memory leak avoidance, 2006-06-17) is to release the commit's list of parents immediately after showing it _and to set that parent list to `NULL`_. This can break the merge base computation. This problem is most obvious when traversing the commits in reverse: In that instance, if a parent of a merge commit has been shown as part of the `git log` command, by the time the merge commit's diff needs to be computed, that parent commit's list of parent commits will have been set to `NULL` and as a result no merge base will be found (even if one should be found). Traversing commits in reverse is far from the only circumstance in which this problem occurs, though. There are many avenues to traversing at least one commit in the revision walk that will later be part of a merge base computation, for example when not even walking any revisions in `git show <merge1> <merge2>` where `<merge1>` is part of the commit graph between the parents of `<merge2>`. Another way to force a scenario where a commit is traversed before it has to be traversed again as part of a merge base computation is to start with two revisions (where the first one is reachable from the second but not in a first-parent ancestry) and show the commit log with `--topo-order` and `--first-parent`. Let's fix this by special-casing the `remerge_diff` mode, similar to what we did with reflogs in `f35650dff6` (log: do not free parents when walking reflog, 2017-07-07). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-13 06:56:10 -08:00
Junio C Hamano	35f40385e4	Merge branch 'bc/allow-upload-pack-from-other-people' Loosen overly strict ownership check introduced in the recent past, to keep the promise "cloning a suspicious repository is a safe first step to inspect it". * bc/allow-upload-pack-from-other-people: Allow cloning from repositories owned by another user	2024-12-10 10:04:55 +09:00
Jonathan Tan	1a14c857db	index-pack --promisor: also check commits' trees Commit `c08589efdc` (index-pack: repack local links into promisor packs, 2024-11-01) seems to contain an oversight in that the tree of a commit is not checked. Teach git to check these trees. The fix slows down a fetch from a certain repo at $DAYJOB from 2m2.127s to 2m45.052s, but in order to make the fetch correct, it seems worth it. In order to test this, we could create server and client repos as follows... C S \ / O (O and C are commits both on the client and server. S is a commit only on the server. C and S have the same tree but different commit messages. The diff between O and C is non-zero.) ...and then, from the client, fetch S from the server. In theory, the client declares "have C" and the server can use this information to exclude S's tree (since it knows that the client has C's tree, which is the same as S's tree). However, it is also possible for the server to compute that it needs to send S and not O, and proceed from there; therefore the objects of C are not considered at all when determining what to send in the packfile. In order to prevent a test of client functionality from having such a dependence on server behavior, I have not included such a test. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-10 08:53:59 +09:00
Jonathan Tan	36198026d8	index-pack --promisor: don't check blobs As a follow-up to the parent of this commit, it was found that not checking for the existence of blobs linked from trees sped up the fetch from 24m47.815s to 2m2.127s. Teach Git to do that. The tradeoff of not checking blobs is documented in a code comment. (Blobs may also be linked from tag objects, but it is impossible to know the type of an object linked from a tag object without looking it up in the object database, so the code for that is untouched.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-10 08:53:59 +09:00
Jonathan Tan	911d14203c	index-pack --promisor: dedup before checking links Commit `c08589efdc` (index-pack: repack local links into promisor packs, 2024-11-01) fixed a bug with what was believed to be a negligible decrease in performance [1] [2]. But at $DAYJOB, with at least one repo, it was found that the decrease in performance was very significant. Looking at the patch, whenever we parse an object in the packfile to be indexed, we check the targets of all its outgoing links for its existence. However, this could be optimized by first collecting all such targets into an oidset (thus deduplicating them) before checking. Teach Git to do that. On a certain fetch from the aforementioned repo, this improved performance from approximately 7 hours to 24m47.815s. This number will be further reduced in a subsequent patch. [1] https://lore.kernel.org/git/CAG1j3zGiNMbri8rZNaF0w+yP+6OdMz0T8+8_Wgd1R_p1HzVasg@mail.gmail.com/ [2] https://lore.kernel.org/git/20241105212849.3759572-1-jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-10 08:53:59 +09:00
Junio C Hamano	6c915c3f85	fetch: do not ask for HEAD unnecessarily In `3f763ddf28` (fetch: set remote/HEAD if it does not exist, 2024-11-22), git-fetch learned to opportunistically set $REMOTE/HEAD when fetching by always asking for remote HEAD, in the hope that it will help setting refs/remotes/<name>/HEAD if missing. But it is not needed to always ask for remote HEAD. When we are fetching from a remote, for which we have remote-tracking branches, we do need to know about HEAD. But if we are doing one-shot fetch, e.g., $ git fetch --tags https://github.com/git/git we do not even know what sub-hierarchy of refs/remotes/<remote>/ we need to adjust the remote HEAD for. There is no need to ask for HEAD in such a case. Incidentally, because the unconditional request to list "HEAD" affected the number of ref-prefixes requested in the ls-remote request, this affected how the requests for tags are added to the same ls-remote request, breaking "git fetch --tags $URL" performed against a URL that is not configured as a remote. Reported-by: Josh Steadmon <steadmon@google.com> [jc: tests are also borrowed from Josh's patch] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-07 21:58:59 +09:00
Patrick Steinhardt	efb38ad49f	builtin/patch-id: fix type of `get_one_patchid()` In `get_one_patchid()` we assign either the result of `strlen()` or `remove_space()` to `len`. But while the former correctly returns a `size_t`, the latter returns an `int` to indicate the length of the stripped string even though it cannot ever return a negative value. This causes a warning with "-Wsign-conversion". In fact, even `get_one_patchid()` itself is also using an integer as return value even though it always returns the length of the patch, and this bubbles up to other callers. Adapt the function and its helpers to use `size_t` for string lengths consistently. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:05 +09:00
Patrick Steinhardt	6411a0a896	builtin/blame: fix type of `length` variable when emitting object ID The `length` variable is used to store how many bytes we wish to emit from an object ID. This value will either be the full hash algorithm's length, or the abbreviated hash that can be set via `--abbrev` or the "core.abbrev" option. The former is of type `size_t`, whereas the latter is of type `int`, which causes a warning with "-Wsign-compare". The reason why `abbrev` is using a signed type is mostly that it is initialized with `-1` to indicate that we have to compute the minimum abbreviation length. This length is computed via `find_alignment()`, which always gets called before `emit_other()`, and thus we can assume that the value would never be negative in `emit_other()`. In fact, we can even assume that the value will always be at least `MINIMUM_ABBREV`, which is enforced by both `git_default_core_config()` and `parse_opt_abbrev_cb()`. We implicitly rely on this by subtracting up to 3 without checking for whether the value becomes negative. We then pass the value to printf(3p) to print the prefix of our object's ID, so if that assumption was violated we may end up with undefined behaviour. Squelch the warning by asserting this invariant and casting the value of `abbrev` to `size_t`. This allows us to store the whole length as an unsigned integer, which we can then pass to `fwrite()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:05 +09:00
Patrick Steinhardt	80c9e70ebe	global: trivial conversions to fix `-Wsign-compare` warnings We have a bunch of loops which iterate up to an unsigned boundary using a signed index, which generates warnigs because we compare a signed and unsigned value in the loop condition. Address these sites for trivial cases and enable `-Wsign-compare` warnings for these code units. This patch only adapts those code units where we can drop the `DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:04 +09:00
Patrick Steinhardt	47d72a74a7	diff.h: fix index used to loop through unsigned integer The `struct diff_flags` structure is essentially an array of flags, all of which have the same type. We can thus use `sizeof()` to iterate through all of the flags, which we do in `diff_flags_or()`. But while the statement returns an unsigned integer, we used a signed integer to iterate through the flags, which generates a warning. Fix this by using `size_t` for the index instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:03 +09:00
Patrick Steinhardt	41f43b8243	global: mark code units that generate warnings with `-Wsign-compare` Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:02 +09:00
Jeff King	db162862b3	describe: split "found all tags" and max_candidates logic Commit a30154187a (describe: stop traversing when we run out of names, 2024-10-31) taught git-describe to automatically reduce the max_candidates setting to match the total number of possible names. This lets us break out of the traversal rather than fruitlessly searching for more candidates when there are no more to be found. However, setting max_candidates to 0 (e.g., if the repo has no tags) overlaps with the --exact-match option, which explicitly uses the same value. And this causes a regression with --always, which is ignored in exact-match mode. We used to get this in a repo with no tags: $ git describe --always HEAD b2f0a7f and now we get: $ git describe --always HEAD fatal: no tag exactly matches 'b2f0a7f47f5f2aebe1e7fceff19a57de20a78c06' The reason is that we bail early in describe_commit() when max_candidates is set to 0. This logic goes all the way back to `2c33f75754` (Teach git-describe --exact-match to avoid expensive tag searches, 2008-02-24). We should obviously fix this regression, but there are two paths, depending on what you think: $ git describe --always --exact-match and $ git describe --always --candidates=0 should do. Since the "--always" option was added, it has always been ignored in --exact-match (or --candidates=0) mode. I.e., we treat --exact-match as a true exact match of a tag, and never fall back to using --always, even if it was requested. If we think that's a bug (or at least a misfeature), then the right solution is to fix it by removing the early bail-out from `2c33f75754`, letting the noop algorithm run and then hitting the --always fallback output. And then our regression naturally goes away, because it follows the same path. If we think that the current "--exact-match --always" behavior is the right thing, then we have to differentiate the case where we automatically reduced max_candidates to 0 from the case where the user asked for it specifically. That's possible to do with a flag, but we can also just reimplement the logic from a30154187a to explicitly break out of the traversal when we run out of candidates (rather than relying on the existing max_candidates check). My gut feeling is along the lines of option 1 (it's a bug, and people would be happy for "--exact-match --always" to give the fallback rather than ignoring "--always"). But the documentation can be interpreted in the other direction, and we've certainly lived with the existing behavior for many years. So it's possible that changing it now is the wrong thing. So this patch fixes the regression by taking the second option, retaining the "--exact-match" behavior as-is. There are two new tests. The first shows that the regression is fixed (we don't even need a new repo without tags; a restrictive --match is enough to create the situation that there are no candidate names). The second test confirms that the "--exact-match --always" behavior remains unchanged and continues to die when there is no tag pointing at the specified commit. It's possible we may reconsider this in the future, but this shows that the approach described above is implemented faithfully. We can also run the perf tests in p6100 to see that we've retained the speedup that a30154187a was going for: Test HEAD^ HEAD -------------------------------------------------------------------------------------- 6100.2: describe HEAD 0.72(0.64+0.07) 0.72(0.66+0.06) +0.0% 6100.3: describe HEAD with one max candidate 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% 6100.4: describe HEAD with one tag 0.01(0.01+0.00) 0.01(0.01+0.00) +0.0% Reported-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 15:18:21 +09:00
Bence Ferdinandy	012bc566ba	remote set-head: set followRemoteHEAD to "warn" if "always" When running "remote set-head" manually it is unlikely, that the user would actually like to have "fetch" always update the remote/HEAD. On the contrary, it is more likely, that the user would expect remote/HEAD to stay the way they manually set it, and just forgot about having "followRemoteHEAD" set to "always". When "followRemoteHEAD" is set to "always" make running "remote set-head" change the config to "warn". Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 02:59:40 +09:00
Bence Ferdinandy	9e2b7005be	fetch set_head: add warn-if-not-$branch option Currently if we want to have a remote/HEAD locally that is different from the one on the remote, but we still want to get a warning if remote changes HEAD, our only option is to have an indiscriminate warning with "follow_remote_head" set to "warn". Add a new option "warn-if-not-$branch", where $branch is a branch name we do not wish to get a warning about. If the remote HEAD is $branch do not warn, otherwise, behave as "warn". E.g. let's assume, that our remote origin has HEAD set to "master", but locally we have "git remote set-head origin seen". Setting 'remote.origin.followRemoteHEAD = "warn"' will always print a warning, even though the remote has not changed HEAD from "master". Setting 'remote.origin.followRemoteHEAD = "warn-if-not-master" will squelch the warning message, unless the remote changes HEAD from "master". Note, that should the remote change HEAD to "seen" (which we have locally), there will still be no warning. Improve the advice message in report_set_head to also include silencing the warning message with "warn-if-not-$branch". Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 02:59:40 +09:00
Bence Ferdinandy	ad739f525e	fetch set_head: move warn advice into advise_if_enabled Advice about what to do when getting a warning is typed out explicitly twice and is printed as regular output. The output is also tested for. Extract the advice message into a single place and use a wrapper function, so if later the advice is made more chatty the signature only needs to be changed in once place. Remove the testing for the advice output in the tests. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 02:59:16 +09:00
Karthik Nayak	2fed09aa9b	midx-write: pass down repository to `write_midx_file[_only]` In a previous commit, we passed the repository field to all subcommands in the `builtin/` directory. Utilize this to pass the repository field down to the `write_midx_file[_only]` functions to remove the usage of `the_repository` global variables. With this, all usage of global variables in `midx-write.c` is removed, hence, remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 10:32:20 +09:00
Junio C Hamano	aaafb67ba9	Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands' into kn/midx-wo-the-repository * kn/pass-repo-to-builtin-sub-sub-commands: builtin: pass repository to sub commands Git 2.47.1 Makefile(s): avoid recipe prefix in conditional statements doc: switch links to https doc: update links to current pages The eleventh batch pack-objects: only perform verbatim reuse on the preferred pack t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure test-lib: move malloc-debug setup after $PATH setup builtin/difftool: intialize some hashmap variables refspec: store raw refspecs inside refspec_item refspec: drop separate raw_nr count fetch: adjust refspec->raw_nr when filtering prefetch refspecs test-lib: check malloc debug LD_PRELOAD before using	2024-12-04 10:32:02 +09:00
Junio C Hamano	33833ed08b	Merge branch 'kn/the-repository' into kn/midx-wo-the-repository * kn/the-repository: packfile.c: remove unnecessary prepare_packed_git() call midx: add repository to `multi_pack_index` struct config: make `packed_git_(limit\|window_size)` non-global variables config: make `delta_base_cache_limit` a non-global variable packfile: pass down repository to `for_each_packed_object` packfile: pass down repository to `has_object[_kept]_pack` packfile: pass down repository to `odb_pack_name` packfile: pass `repository` to static function in the file packfile: use `repository` from `packed_git` directly packfile: add repository to struct `packed_git`	2024-12-04 10:31:46 +09:00
Junio C Hamano	1e18cf4310	Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands' Built-in Git subcommands are supplied the repository object to work with; they learned to do the same when they invoke sub-subcommands. * kn/pass-repo-to-builtin-sub-sub-commands: builtin: pass repository to sub commands	2024-12-04 10:14:47 +09:00
Junio C Hamano	57e81b59f3	Merge branch 'sj/ref-contents-check' "git fsck" learned to issue warnings on "curiously formatted" ref contents that have always been taken valid but something Git wouldn't have written itself (e.g., missing terminating end-of-line after the full object name). * sj/ref-contents-check: ref: add symlink ref content check for files backend ref: check whether the target of the symref is a ref ref: add basic symref content check for files backend ref: add more strict checks for regular refs ref: port git-fsck(1) regular refs check for files backend ref: support multiple worktrees check for refs ref: initialize ref name outside of check functions ref: check the full refname instead of basename ref: initialize "fsck_ref_report" with zero	2024-12-04 10:14:42 +09:00
Junio C Hamano	7ee055b237	Merge branch 'ps/ref-backend-migration-optim' The migration procedure between two ref backends has been optimized. * ps/ref-backend-migration-optim: reftable: rename scratch buffer refs: adapt `initial_transaction` flag to be unsigned reftable/block: optimize allocations by using scratch buffer reftable/block: rename `block_writer::buf` variable reftable/writer: optimize allocations by using a scratch buffer refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG` refs: skip collision checks in initial transactions refs: use "initial" transaction semantics to migrate refs refs/files: support symbolic and root refs in initial transaction refs: introduce "initial" transaction flag refs/files: move logic to commit initial transaction refs: allow passing flags when setting up a transaction	2024-12-04 10:14:41 +09:00
Junio C Hamano	a5dd262a75	Merge branch 'ps/leakfixes-part-10' Leakfixes. * ps/leakfixes-part-10: (27 commits) t: remove TEST_PASSES_SANITIZE_LEAK annotations test-lib: unconditionally enable leak checking t: remove unneeded !SANITIZE_LEAK prerequisites t: mark some tests as leak free t5601: work around leak sanitizer issue git-compat-util: drop now-unused `UNLEAK()` macro global: drop `UNLEAK()` annotation t/helper: fix leaking commit graph in "read-graph" subcommand builtin/branch: fix leaking sorting options builtin/init-db: fix leaking directory paths builtin/help: fix leaks in `check_git_cmd()` help: fix leaking return value from `help_unknown_cmd()` help: fix leaking `struct cmdnames` help: refactor to not use globals for reading config builtin/sparse-checkout: fix leaking sanitized patterns split-index: fix memory leak in `move_cache_to_base_index()` git: refactor builtin handling to use a `struct strvec` git: refactor alias handling to use a `struct strvec` strvec: introduce new `strvec_splice()` function line-log: fix leak when rewriting commit parents ...	2024-12-04 10:14:40 +09:00
Junio C Hamano	2f605347da	Merge branch 'ps/gc-stale-lock-warning' Give a bit of advice/hint message when "git maintenance" stops finding a lock file left by another instance that still is potentially running. * ps/gc-stale-lock-warning: t7900: fix host-dependent behaviour when testing git-maintenance(1) builtin/gc: provide hint when maintenance hits a stale schedule lock	2024-12-04 10:14:37 +09:00
Karthik Nayak	d284713bae	config: make `packed_git_(limit\|window_size)` non-global variables The variables `packed_git_window_size` and `packed_git_limit` are global config variables used in the `packfile.c` file. Since it is only used in this file, let's change it from being a global config variable to a local variable for the subsystem. With this, we rid `packfile.c` from all global variable usage and this means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:55 +09:00
Karthik Nayak	d6b2d21fbf	config: make `delta_base_cache_limit` a non-global variable The `delta_base_cache_limit` variable is a global config variable used by multiple subsystems. Let's make this non-global, by adding this variable independently to the subsystems where it is used. First, add the setting to the `repo_settings` struct, this provides access to the config in places where the repository is available. Use this in `packfile.c`. In `index-pack.c` we add it to the `pack_idx_option` struct and its constructor. While the repository struct is available here, it may not be set because `git index-pack` can be used without a repository. In `gc.c` add it to the `gc_config` struct and also the constructor function. The gc functions currently do not have direct access to a repository struct. These changes are made to remove the usage of `delta_base_cache_limit` as a global variable in `packfile.c`. This brings us one step closer to removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c` which we complete in the next patch. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:55 +09:00
Karthik Nayak	c87910b96b	packfile: pass down repository to `for_each_packed_object` The function `for_each_packed_object` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from this function and closely related function `is_promisor_object`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	cc656f4eb2	packfile: pass down repository to `has_object[_kept]_pack` The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	873b00597b	packfile: pass down repository to `odb_pack_name` The function `odb_pack_name` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:54 +09:00
Karthik Nayak	2cf3fe63f6	packfile: add repository to struct `packed_git` The struct `packed_git` holds information regarding a packed object file. Let's add the repository variable to this object, to represent the repository that this packfile belongs to. This helps remove dependency on the global `the_repository` object in `packfile.c` by simply using repository information now readily available in the struct. We do need to consider that a packfile could be part of the alternates of a repository, but considering that we only have one repository struct and also that we currently anyways use 'the_repository', we should be OK with this change. We also modify `alloc_packed_git` to ensure that the repository is added to newly created `packed_git` structs. This requires modifying the function and all its callee to pass the repository object down the levels. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-04 08:21:53 +09:00
Junio C Hamano	93e5e048f8	refs: drop strbuf_ prefix from helpers The helper functions (strbuf_branchname, strbuf_check_branch_ref, and strbuf_check_tag_ref) are about handling branch and tag names, and it is a non-essential fact that these functions use strbuf to hold these names. Rename them to make it clarify that these are more about "ref". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-03 12:38:49 +09:00
Junio C Hamano	5bcbde9e49	refs: move ref name helpers around strbuf_branchname(), strbuf_check_{branch,tag}_ref() are helper functions to deal with branch and tag names, and the fact that they happen to use strbuf to hold the name of a branch or a tag is not essential. These functions fit better in the refs API than strbuf API, the latter of which is about string manipulations. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-03 12:38:49 +09:00
Elijah Newren	da91a90c2f	fast-import: disallow more path components Instead of just disallowing '.' and '..', make use of verify_path() to ensure that fast-import will disallow anything we wouldn't allow into the index, such as anything under .git/, .gitmodules as a symlink, or a dos drive prefix on Windows. Since a few fast-export and fast-import tests that tried to stress-test the correct handling of quoting relied on filenames that fail is_valid_win32_path(), such as spaces or periods at the end of filenames or backslashes within the filename, turn off core.protectNTFS for those tests to ensure they keep passing. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-02 10:09:48 +09:00
Bence Ferdinandy	b7f7d16562	fetch: add configuration for set_head behaviour In the current implementation, if refs/remotes/$remote/HEAD does not exist, running fetch will create it, but if it does exist it will not do anything, which is a somewhat safe and minimal approach. Unfortunately, for users who wish to NOT have refs/remotes/$remote/HEAD set for any reason (e.g. so that `git rev-parse origin` doesn't accidentally point them somewhere they do not want to), there is no way to remove this behaviour. On the other side of the spectrum, users may want fetch to automatically update HEAD or at least give them a warning if something changed on the remote. Introduce a new setting, remote.$remote.followRemoteHEAD with four options: - "never": do not ever do anything, not even create - "create": the current behaviour, now the default behaviour - "warn": print a message if remote and local HEAD is different - "always": silently update HEAD on every change Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-02 09:55:17 +09:00
Caleb White	e6df1ee2c1	worktree: add relative cli/config options to `repair` command This teaches the `worktree repair` command to respect the `--[no-]relative-paths` CLI option and `worktree.useRelativePaths` config setting. If an existing worktree with an absolute path is repaired with `--relative-paths`, the links will be replaced with relative paths, even if the original path was correct. This allows a user to covert existing worktrees between absolute/relative as desired. To simplify things, both linking files are written when one of the files needs to be repaired. In some cases, this fixes the other file before it is checked, in other cases this results in a correct file being written with the same contents. Signed-off-by: Caleb White <cdwhite3@pm.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-02 09:36:17 +09:00
Caleb White	298d2917e2	worktree: add relative cli/config options to `move` command This teaches the `worktree move` command to respect the `--[no-]relative-paths` CLI option and `worktree.useRelativePaths` config setting. If an existing worktree is moved with `--relative-paths` the new path will be relative (and visa-versa). Signed-off-by: Caleb White <cdwhite3@pm.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-02 09:36:17 +09:00
Caleb White	b7016344f1	worktree: add relative cli/config options to `add` command This introduces the `--[no-]relative-paths` CLI option and `worktree.useRelativePaths` configuration setting to the `worktree add` command. When enabled these options allow worktrees to be linked using relative paths, enhancing portability across environments where absolute paths may differ (e.g., containerized setups, shared network drives). Git still creates absolute paths by default, but these options allow users to opt-in to relative paths if desired. The t2408 test file is removed and more comprehensive tests are written for the various worktree operations in their own files. Signed-off-by: Caleb White <cdwhite3@pm.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-02 09:36:17 +09:00
Justin Tobler	87c01003cd	bundle: add bundle verification options type When `unbundle()` is invoked, fsck verification may be configured by passing the `VERIFY_BUNDLE_FSCK` flag. This mechanism allows fsck checks on the bundle to be enabled or disabled entirely. To facilitate more fine-grained fsck configuration, additional context must be provided to `unbundle()`. Introduce the `unbundle_opts` type, which wraps the existing `verify_bundle_flags`, to facilitate future extension of `unbundle()` configuration. Also update `unbundle()` and its call sites to accept this new options type instead of the flags directly. The end behavior is functionally the same, but allows for the set of configurable options to be extended. This is leveraged in a subsequent commit to enable fsck message severity configuration. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-28 12:07:57 +09:00
Junio C Hamano	761e62a09a	Merge branch 'bf/set-head-symref' into bf/fetch-set-head-config * bf/set-head-symref: fetch set_head: handle mirrored bare repositories fetch: set remote/HEAD if it does not exist refs: add create_only option to refs_update_symref_extended refs: add TRANSACTION_CREATE_EXISTS error remote set-head: better output for --auto remote set-head: refactor for readability refs: atomically record overwritten ref in update_symref refs: standardize output of refs_read_symbolic_ref t/t5505-remote: test failure of set-head t/t5505-remote: set default branch to main	2024-11-27 22:49:05 +09:00
Junio C Hamano	1f3d9b9814	Merge branch 'jt/index-pack-allow-promisor-only-while-fetching' We now ensure "index-pack" is used with the "--promisor" option only during a "git fetch". * jt/index-pack-allow-promisor-only-while-fetching: index-pack: teach --promisor to forbid pack name	2024-11-27 07:57:09 +09:00
Junio C Hamano	8eaa06590f	Merge branch 'en/fast-import-avoid-self-replace' "git fast-import" can be tricked into a replace ref that maps an object to itself, which is a useless thing to do. * en/fast-import-avoid-self-replace: fast-import: avoid making replace refs point to themselves	2024-11-27 07:57:08 +09:00
Junio C Hamano	93905d3b70	Merge branch 'bc/c23' C23 compatibility updates. * bc/c23: reflog: rename unreachable index-pack: rename struct thread_local	2024-11-27 07:57:05 +09:00
Karthik Nayak	6f33d8e255	builtin: pass repository to sub commands In `9b1cb5070f` (builtin: add a repository parameter for builtin functions, 2024-09-13) the repository was passed down to all builtin commands. This allowed the repository to be passed down to lower layers without depending on the global `the_repository` variable. Continue this work by also passing down the repository parameter from the command to sub-commands. This will help pass down the repository to other subsystems and cleanup usage of global variables like 'the_repository' and 'the_hash_algo'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-26 10:36:08 +09:00
Elijah Newren	4a2790a257	fast-import: disallow "." and ".." path components If a user specified e.g. M 100644 :1 ../some-file then fast-import previously would happily create a git history where there is a tree in the top-level directory named "..", and with a file inside that directory named "some-file". The top-level ".." directory causes problems. While git checkout will die with errors and fsck will report hasDotdot problems, the user is going to have problems trying to remove the problematic file. Simply avoid creating this bad history in the first place. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-26 10:30:04 +09:00
Junio C Hamano	058c36aa26	Merge branch 'kh/checkout-ignore-other-docfix' into maint-2.47 Doc updates. * kh/checkout-ignore-other-docfix: checkout: refer to other-worktree branch, not ref	2024-11-25 12:29:45 +09:00
Junio C Hamano	e52276d340	Merge branch 'jh/config-unset-doc-fix' into maint-2.47 Docfix. * jh/config-unset-doc-fix: git-config.1: remove value from positional args in unset usage	2024-11-25 12:29:40 +09:00
Bence Ferdinandy	b1b713f722	fetch set_head: handle mirrored bare repositories When adding a remote to bare repository with "git remote add --mirror", running fetch will fail to update HEAD to the remote's HEAD, since it does not know how to handle bare repositories. On the other hand HEAD already has content, since "git init --bare" has already set HEAD to whatever is the default branch set for the user. Unless this - by chance - is the same as the remote's HEAD, HEAD will be pointing to a bad symref. Teach set_head to handle bare repositories, by overwriting HEAD so it mirrors the remote's HEAD. Note, that in this case overriding the local HEAD reference is necessary, since HEAD will exist before fetch can be run, but this should not be an issue, since the whole purpose of --mirror is to be an exact mirror of the remote, so following any changes to HEAD makes sense. Also note, that although "git remote set-head" also fails when trying to update the remote's locally tracked HEAD in a mirrored bare repository, the usage of the command does not make much sense after this patch: fetch will update the remote HEAD correctly, and setting it manually to something else is antithetical to the concept of mirroring. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-25 11:46:37 +09:00
Bence Ferdinandy	3f763ddf28	fetch: set remote/HEAD if it does not exist When cloning a repository remote/HEAD is created, but when the user creates a repository with git init, and later adds a remote, remote/HEAD is only created if the user explicitly runs a variant of "remote set-head". Attempt to set remote/HEAD during fetch, if the user does not have it already set. Silently ignore any errors. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-25 11:46:37 +09:00
Bence Ferdinandy	9963746c84	refs: add create_only option to refs_update_symref_extended Allow the caller to specify that it only wants to update the symref if it does not already exist. Silently ignore the error from the transaction API if the symref already exists. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-25 11:46:36 +09:00
Bence Ferdinandy	dfe86fa06b	remote set-head: better output for --auto Currently, set-head --auto will print a message saying "remote/HEAD set to branch", which implies something was changed. Change the output of --auto, so the output actually reflects what was done: a) set a previously unset HEAD, b) change HEAD because remote changed or c) no updates. As edge cases, if HEAD is changed from a previous symbolic reference that was not a remote branch, explicitly call attention to this fact, and also notify the user if the previous reference was not a symbolic reference. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-25 11:46:36 +09:00
Bence Ferdinandy	4f07c45e25	remote set-head: refactor for readability Make two different readability refactors: Rename strbufs "buf" and "buf2" to something more explanatory. Instead of calling get_main_ref_store(the_repository) multiple times, call it once and store the result in a new refs variable. Although this change probably offers some performance benefits, the main purpose is to shorten the line lengths of function calls using this variable. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-25 11:46:35 +09:00
Bence Ferdinandy	2fd5555895	t/t5505-remote: test failure of set-head The test coverage was missing a test for the failure branch of remote set-head auto's output. Add the missing text and while we are at it, correct a small grammatical mistake in the error's output ("setup" is the noun, "set up" is the verb). Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-25 11:46:34 +09:00
Junio C Hamano	0a83b39594	Merge branch 'tb/multi-pack-reuse-dupfix' Object reuse code based on multi-pack-index sent an unwanted copy of object. * tb/multi-pack-reuse-dupfix: pack-objects: only perform verbatim reuse on the preferred pack t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure	2024-11-22 14:34:19 +09:00
Junio C Hamano	76bb16db5c	Merge branch 'sm/difftool' Use of some uninitialized variables in "git difftool" has been corrected. * sm/difftool: builtin/difftool: intialize some hashmap variables	2024-11-22 14:34:18 +09:00
Junio C Hamano	aa1d4b42e5	Merge branch 'jk/fetch-prefetch-double-free-fix' Double-free fix. * jk/fetch-prefetch-double-free-fix: refspec: store raw refspecs inside refspec_item refspec: drop separate raw_nr count fetch: adjust refspec->raw_nr when filtering prefetch refspecs	2024-11-22 14:34:17 +09:00
Patrick Steinhardt	d91a9db33c	global: drop `UNLEAK()` annotation There are two users of `UNLEAK()` left in our codebase: - In "builtin/clone.c", annotating the `repo` variable. That leak has already been fixed though as you can see in the context, where we do know to free `repo_to_free`. - In "builtin/diff.c", to unleak entries of the `blob[]` array. That leak has also been fixed, because the entries we assign to that array come from `rev.pending.objects`, and we do eventually release `rev`. This neatly demonstrates one of the issues with `UNLEAK()`: it is quite easy for the annotation to become stale. A second issue is that its whole intent is to paper over leaks. And while that has been a necessary evil in the past, because Git was leaking left and right, it isn't really much of an issue nowadays where our test suite has no known leaks anymore. Remove the last two users of this macro. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:23:46 +09:00
Patrick Steinhardt	b97301c13c	builtin/branch: fix leaking sorting options The sorting options are leaking, but given that they are marked with `UNLEAK()` the leak sanitizer doesn't complain. Fix the leak by creating a common exit path and clearing the vector such that we can get rid of the `UNLEAK()` annotation entirely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:23:45 +09:00
Patrick Steinhardt	8ef15c205b	builtin/init-db: fix leaking directory paths We've got a couple of leaking directory paths in git-init(1), all of which are marked with `UNLEAK()`. Fixing them is trivial, so let's do that instead so that we can get rid of `UNLEAK()` entirely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:23:45 +09:00
Patrick Steinhardt	2379b5c900	builtin/help: fix leaks in `check_git_cmd()` The `check_git_cmd()` function is declared to return a string constant. And while it sometimes does return a constant, it may also return an allocated string in two cases: - When handling aliases. This case is already marked with `UNLEAK()` to work around the leak. - When handling unknown commands in case "help.autocorrect" is enabled. This one is not marked with `UNLEAK()`. The function only has a single caller, so let's fix its return type to be non-constant, consistently return an allocated string and free it at its callsite to plug the leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:23:44 +09:00
Patrick Steinhardt	58e7568c61	builtin/sparse-checkout: fix leaking sanitized patterns Both `git sparse-checkout add` and `git sparse-checkout set` accept a list of additional directories or patterns. These get massaged via calls to `sanitize_paths()`, which may end up modifying the passed-in array by updating its pointers to be prefixed paths. This allocates memory that we never free. Refactor the code to instead use a `struct strvec`, which makes it way easier for us to track the lifetime correctly. The couple of extra memory allocations likely do not matter as we only ever populate it with command line arguments. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:23:43 +09:00
Patrick Steinhardt	65a1b7e2bd	builtin/blame: fix leaking blame entries with `--incremental` When passing `--incremental` to git-blame(1) we exit early by jumping to the `cleanup` label. But some of the cleanups we perform are handled between the `goto` and its label, and thus we leak the data. Move the cleanups after the `cleanup` label. While at it, move the logic to free the scoreboard's `final_buf` into `cleanup_scoreboard()` and drop its `const` declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:23:40 +09:00
shejialuo	7c78d819e6	ref: support multiple worktrees check for refs We have already set up the infrastructure to check the consistency for refs, but we do not support multiple worktrees. However, "git-fsck(1)" will check the refs of worktrees. As we decide to get feature parity with "git-fsck(1)", we need to set up support for multiple worktrees. Because each worktree has its own specific refs, instead of just showing the users "refs/worktree/foo", we need to display the full name such as "worktrees/<id>/refs/worktree/foo". So we should know the id of the worktree to get the full name. Add a new parameter "struct worktree *" for "refs-internal.h::fsck_fn". Then change the related functions to follow this new interface. The "packed-refs" only exists in the main worktree, so we should only check "packed-refs" in the main worktree. Use "is_main_worktree" method to skip checking "packed-refs" in "packed_fsck" function. Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add "worktree/<id>/" prefix when we are not in the main worktree. Last, add a new test to check the refname when there are multiple worktrees to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 08:21:32 +09:00
Patrick Steinhardt	1c299d03e5	refs: introduce "initial" transaction flag There are two different ways to commit a transaction: - `ref_transaction_commit()` can be used to commit a regular transaction and is what almost every caller wants. - `initial_ref_transaction_commit()` can be used when it is known that the ref store that the transaction is committed for is empty and when there are no concurrent processes. This is used when cloning a new repository. Implementing this via two separate functions has a couple of downsides. First, every reference backend needs to implement a separate callback even in the case where they don't special-case the initial transaction. Second, backends are basically forced to reimplement the whole logic for how to commit the transaction like the "files" backend does, even though backends may wish to only tweak certain behaviour of a "normal" commit. Third, it is awkward that callers must never prepare the transaction as this is somewhat different than how a transaction typically works. Refactor the code such that we instead mark initial transactions via a separate flag when starting the transaction. This addresses all of the mentioned painpoints, where the most important part is that it will allow backends to have way more leeway in how exactly they want to handle the initial transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 07:59:15 +09:00
Patrick Steinhardt	a0efef1446	refs: allow passing flags when setting up a transaction Allow passing flags when setting up a transaction such that the behaviour of the transaction itself can be altered. This functionality will be used in a subsequent patch. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 07:59:14 +09:00
Junio C Hamano	38e4df6615	Merge branch 'la/trailer-info' Renaming a handful of variables and structure fields. * la/trailer-info: trailer: spread usage of "trailer_block" language	2024-11-20 14:47:17 +09:00
Junio C Hamano	0c11ef1356	Merge branch 'jt/repack-local-promisor' "git gc" discards any objects that are outside promisor packs that are referred to by an object in a promisor pack, and we do not refetch them from the promisor at runtime, resulting an unusable repository. Work it around by including these objects in the referring promisor pack at the receiving end of the fetch. * jt/repack-local-promisor: index-pack: repack local links into promisor packs t5300: move --window clamp test next to unclamped t0410: use from-scratch server t0410: make test description clearer	2024-11-20 14:47:16 +09:00
Junio C Hamano	cc53ddf7f0	Merge branch 'db/submodule-fetch-with-remote-name-fix' into maint-2.47 A "git fetch" from the superproject going down to a submodule used a wrong remote when the default remote names are set differently between them. * db/submodule-fetch-with-remote-name-fix: submodule: correct remote name with fetch	2024-11-20 14:43:00 +09:00
Junio C Hamano	76c1953395	Merge branch 'ps/maintenance-start-crash-fix' into maint-2.47 "git maintenance start" crashed due to an uninitialized variable reference, which has been corrected. * ps/maintenance-start-crash-fix: builtin/gc: fix crash when running `git maintenance start`	2024-11-20 14:42:58 +09:00
Junio C Hamano	f1a50f12b9	Merge branch 'jk/fsmonitor-event-listener-race-fix' into maint-2.47 On macOS, fsmonitor can fall into a race condition that results in a client waiting forever to be notified for an event that have already happened. This problem has been corrected. * jk/fsmonitor-event-listener-race-fix: fsmonitor: initialize fs event listener before accepting clients simple-ipc: split async server initialization and running	2024-11-20 14:42:57 +09:00
Jonathan Tan	1f2be8bed6	index-pack: teach --promisor to forbid pack name Currently, - Running "index-pack --promisor" outside a repo segfaults. - It may be confusing to a user that running "index-pack --promisor" within a repo may make changes to the repo's object DB, especially since the packs indexed by the index-pack invocation may not even be related to the repo. As discussed in [1] and [2], teaching --promisor to forbid a packfile name solves both these problems. This combination of arguments requires a repo (since we are writing the resulting .pack and .idx to it) and it is clear that the files are related to the repo. Currently, Git uses "index-pack --promisor" only when fetching into a repo, so it could be argued that we should teach "index-pack" a new argument (say, "--fetching-mode") instead of tying --promisor to a generic argument like the packfile name. However, this --promisor feature could conceivably be used whenever we have a packfile that is known to come from the promisor remote (whether obtained through Git's fetch protocol or through other means) so not using a new argument seems reasonable - one could envision a user-made script obtaining a packfile and then running "index-pack --promisor --stdin", for example. In fact, it might be possible to relax the restriction further (say, by also allowing --promisor when indexing a packfile that is in the object DB), but relaxing the restriction is backwards-compatible so we can revisit that later. One thing to watch out for is the possibility of a future Git feature that indexes a pack in the context of a repo, but does not necessarily write the resulting pack to it (and does not necessarily desire to make any changes to the object DB). One such feature would be fetch quarantine, which might need the repo context in order to detect hash collisions, but would also need to ensure that the object DB is undisturbed in case the fetch fails for whatever reason, even if the reason occurs only after the indexing is complete. It may not be obvious to the implementer of such a feature that "index-pack" could sometimes write packs other than the indexed pack to the object DB, but there are already other ways that "fetch" could write to the object DB (in particular, packfile URIs and bundle URIs), so hopefully the implementation of this future feature would already include a test that the object DB be undisturbed. This change requires the change to t5300 by `1f52cdfacb` (index-pack: document and test the --promisor option, 2022-03-09) to be undone. (--promisor is already tested indirectly, so we don't need the explicit test here any more.) [1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/ [2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-20 10:37:56 +09:00
Patrick Steinhardt	656ca9204a	builtin/gc: provide hint when maintenance hits a stale schedule lock When running scheduled maintenance via `git maintenance start`, we acquire a lockfile to ensure that no other scheduled maintenance task is running in the repository concurrently. If so, we do provide an error to the user hinting that another process seems to be running in this repo. There are two important cases why such a lockfile may exist: - An actual git-maintenance(1) process is still running in this repository. - An earlier process may have crashed or was interrupted part way through and has left a stale lockfile behind. In `c95547a394` (builtin/gc: fix crash when running `git maintenance start`, 2024-10-10), we have fixed an issue where git-maintenance(1) would crash with the "start" subcommand, and the underlying bug causes the second scenario to trigger quite often now. Most users don't know how to get out of that situation again though. Ideally, we'd be removing the stale lock for our users automatically. But in the context of repository maintenance this is rather risky, as it can easily run for hours or even days. So finding a clear point where we know that the old process has exited is basically impossible. We have the same issue in other subsystems, e.g. when locking refs. Our lockfile interfaces thus provide the `unable_to_lock_message()` function for exactly this purpose: it provides a nice hint to the user that explains what is going on and how to get out of that situation again by manually removing the file. Adapt git-maintenance(1) to print a similar hint. While we could use the above function, we can provide a bit more context as we know exactly what kind of process would create the lockfile. Reported-by: Miguel Rincon Barahona <mrincon@gitlab.com> Reported-by: Kev Kloss <kkloss@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-20 10:26:12 +09:00
Elijah Newren	5e904f1a4a	fast-import: avoid making replace refs point to themselves If someone replaces a commit with a modified version, then builds on that commit, and then later decides to rewrite history in a format like git fast-export --all \| CMD_TO_TWEAK_THE_STREAM \| git fast-import and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the replacement did, then at the end you'd get a replace ref that points to itself. For example: $ git show-ref \| grep replace fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264 Git commands which pay attention to replace refs will die with an error when a self-referencing replace ref is present: $ git log fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 Avoid such problems by deleting replace refs that will simply end up pointing to themselves at the end of our writing. Unless users specify --quiet, warn them when we delete such a replace ref. Two notes about this patch: * We are not ignoring the problematic update of the replace ref (turning it into a no-op), we are replacing the update with a delete. The logic here is that if the repository had a value for the replace ref before fast-import was run, and the replace ref was explicitly named in the fast-import stream, we don't want the replace ref to be left with a pre-fast-import value. * While loops with more than one element (e.g. refs/replace/A points to B, and refs/replace/B points to A) are possible, they seem much less plausible. It is pretty easy to create a sequence of git-filter-repo commands that will trigger a self-referencing replace ref, but I do not know how to trigger a scenario with a cycle length greater than 1. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 09:39:33 +09:00
brian m. carlson	e8b3bcf491	index-pack: rename struct thread_local "thread_local" is a keyword in C23. To make sure that our code compiles on a wide variety of C versions, rename struct thread_local to "struct thread_local_data" to avoid a conflict. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-18 09:42:08 +09:00
brian m. carlson	0ffb5a6bf1	Allow cloning from repositories owned by another user Historically, Git has allowed users to clone from an untrusted repository, and we have documented that this is safe to do so: `upload-pack` tries to avoid any dangerous configuration options or hooks from the repository it's serving, making it safe to clone an untrusted directory and run commands on the resulting clone. However, this was broken by `f4aa8c8bb1` ("fetch/clone: detect dubious ownership of local repositories", 2024-04-10) in an attempt to make things more secure. That change resulted in a variety of problems when cloning locally and over SSH, but it did not change the stated security boundary. Because the security boundary has not changed, it is safe to adjust part of the code that patch introduced. To do that and restore the previous functionality, adjust enter_repo to take two flags instead of one. The two bits are - ENTER_REPO_STRICT: callers that require exact paths (as opposed to allowing known suffixes like ".git", ".git/.git" to be omitted) can set this bit. Corresponds to the "strict" parameter that the flags word replaces. - ENTER_REPO_ANY_OWNER_OK: callers that are willing to run without ownership check can set this bit. The former is --strict-paths option of "git daemon". The latter is set only by upload-pack, which honors the claimed security boundary. Note that local clones across ownership boundaries require --no-local so that upload-pack is used. Document this fact in the manual page and provide an example. This patch was based on one written by Junio C Hamano. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-15 11:05:06 +09:00
Taylor Blau	e199290592	pack-objects: only perform verbatim reuse on the preferred pack When reusing objects from source pack(s), write_reused_pack_verbatim() is responsible for reusing objects whole eword_t's at a time. It works by taking the longest continuous run of objects from the beginning of each source pack that the caller wants, and reuses the entirety of that section from each pack. This is based on the assumption that we don't have any gaps within the region. This assumption relieves us from having to patch any OFS_DELTAs, since we know that there aren't any gaps between any delta and its base in that region. To illustrate why this assumption is necessary, suppose we have some pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was selected from a pack other than P, then the bit corresponding to object Y will appear earlier in the bitmap than the bits corresponding to X and Z. If pack-objects already has or will use the copy of Y from the pack it was selected from in the MIDX, then it is an error to reuse all objects between X and Z in the source pack. Doing so will cause us to reuse Y from a different pack than the one which represents Y in the MIDX, causing us to either: - include the object twice, assuming that the caller wants Y in the pack, or - include the object once, resulting in us packing more objects than necessary. This regression comes from `ca0fd69e37` (pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which incorrectly assumed that there would be no gaps in reusable regions of non-preferred packs. Instead, we can only safely perform the whole-word reuse optimization on the preferred pack, where we know with certainty that no gaps exist in that region of the bitmap. We can still reuse objects from non-preferred packs, but we have to inspect them individually in write_reused_pack() to ensure that any gaps that may exist are accounted for. This allows us to simplify the implementation of write_reused_pack_verbatim() back to almost its pre-multi-pack reuse form, since we can now assume that the beginning of the pack appears at the beginning of the bitmap, meaning that we don't have to account for any bits up to the first word boundary (like we had to special case in `ca0fd69e37`). The only significant changes from the pre-ca0fd69e37 implementation are: - that we can no longer inspect words up to the end of reuse_packfile_bitmap->word_alloc, since we only want to look at words whose bits all correspond to objects in the given packfile, and - that we return early when given a reuse_packfile which is not preferred, making the call a noop. In the future, it might be possible to restore this optimization if we could guarantee that some reuse packs don't contain any gaps by construction (similar to the "disjoint packs" idea in very early versions of multi-pack reuse). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-15 09:13:31 +09:00
Junio C Hamano	51ba601160	Merge branch 'en/shallow-exclude-takes-a-ref-fix' The "--shallow-exclude=<ref>" option to various history transfer commands takes a ref, not an arbitrary revision. * en/shallow-exclude-takes-a-ref-fix: doc: correct misleading descriptions for --shallow-exclude upload-pack: fix ambiguous error message	2024-11-13 08:35:32 +09:00
Junio C Hamano	6890c99e38	Merge branch 'ps/leakfixes-part-9' More leakfixes. * ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...	2024-11-13 08:35:31 +09:00
Simon Marchi	98e4015593	builtin/difftool: intialize some hashmap variables When running a dir-diff command that produces no diff, variables `wt_modified` and `tmp_modified` are used while uninitialized, causing: $ /home/smarchi/src/git/git-difftool --dir-diff master free(): invalid pointer [1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master ... Invalid free() / delete / delete[] / realloc() at 0x48478EF: free (vg_replace_malloc.c:989) by 0x422CAC: hashmap_clear_ (hashmap.c:208) by 0x283830: run_dir_diff (difftool.c:667) by 0x284103: cmd_difftool (difftool.c:801) by 0x238E0F: run_builtin (git.c:484) by 0x2392B9: handle_builtin (git.c:750) by 0x2399BC: cmd_main (git.c:921) by 0x356FEF: main (common-main.c:64) Address 0x1ffefff180 is on thread 1's stack in frame #2, created by run_dir_diff (difftool.c:358) ... If taking any `goto finish` path before these variables are initialized, `hashmap_clear_and_free()` operates on uninitialized data, sometimes causing a crash. This regression was introduced in `7f795a1715` (builtin/difftool: plug several trivial memory leaks, 2024-09-26). Fix it by initializing those variables with the `HASHMAP_INIT` macro. Add a test comparing the main branch to itself, resulting in no diff. Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-13 08:11:19 +09:00
Jeff King	fe17a25905	refspec: store raw refspecs inside refspec_item The refspec struct keeps two matched arrays: one for the refspec_item structs and one for the original raw refspec strings. The main reason for this is that there are other users of refspec_item that do not care about the raw strings. But it does make managing the refspec struct awkward, as we must keep the two arrays in sync. This has led to bugs in the past (both leaks and double-frees). Let's just store a copy of the raw refspec string directly in each refspec_item struct. This simplifies the handling at a small cost: 1. Direct callers of refspec_item_init() will now get an extra copy of the refspec string, even if they don't need it. This should be negligible, as the struct is already allocating two strings for the parsed src/dst values (and we tend to only do it sparingly anyway for things like the TAG_REFSPEC literal). 2. Users of refspec_appendf() will now generate a temporary string, copy it, and then free the result (versus handing off ownership of the temporary string). We could get around this by having a "nodup" variant of refspec_item_init(), but it doesn't seem worth the extra complexity for something that is not remotely a hot code path. Code which accesses refspec->raw now needs to look at refspec->item.raw. Other callers which just use refspec_item directly can remain the same. We'll free the allocated string in refspec_item_clear(), which they should be calling anyway to free src/dst. One subtle note: refspec_item_init() can return an error, in which case we'll still have set its "raw" field. But that is also true of the "src" and "dst" fields, so any caller which does not _clear() the failed item is already potentially leaking. In practice most code just calls die() on an error anyway, but you can see the exception in valid_fetch_refspec(), which does correctly call _clear() even on error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 18:16:48 +09:00
Jeff King	d36af33081	refspec: drop separate raw_nr count A refspec struct contains zero or more refspec_item structs, along with matching "raw" strings. The items and raw strings are kept in separate arrays, but those arrays will always have the same length (because we write them only via refspec_append_nodup(), which grows both). This can lead to bugs when manipulating the array, since the arrays and lengths must be modified in lockstep. For example, the bug fixed in the previous commit, which forgot to decrement raw_nr. So let's get rid of "raw_nr" and have only "nr", making this kind of bug impossible (and also making it clear that the two are always matched, something that existing code already assumed but was not guaranteed by the interface). Even though we'd expect "alloc" and "raw_alloc" to likewise move in lockstep, we still need to keep separate counts there if we want to continue to use ALLOC_GROW() for both. Conceptually this would all be simpler if refspec_item just held onto its own raw string, and we had a single array. But there are callers which use refspec_item outside of "struct refspec" (and so don't hold on to a matching "raw" string at all), which we'd possibly need to adjust. So let's not worry about refactoring that for now, and just get rid of the redundant count variable. That is the first step on the road to combining them anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 18:16:48 +09:00
Jeff King	b970509c59	fetch: adjust refspec->raw_nr when filtering prefetch refspecs In filter_prefetch_refspecs(), we may remove one or more refspecs if they point into refs/tags/. When we do, we remove the item from the refspec->items array, shifting subsequent items down, and then decrement the refspec->nr count. We also remove the item from the refspec->raw array, but fail to decrement refspec->raw_nr. This leaves us with a count that is too high, and anybody looking at the "raw" array will erroneously see either: 1. The removed entry, if there were no subsequent items to shift down. 2. A duplicate of the final entry, as everything is shifted down but there was nothing to overwrite the final item. The obvious culprit to run into this is calling refspec_clear(), which will try to free the removed entry (case 1) or double-free the final entry (case 2). But even though the bug has existed since the function was added in `2e03115d0c` (fetch: add --prefetch option, 2021-04-16), we did not trigger it in the test suite. The --prefetch option is normally only used with configured refspecs, and we never bother to call refspec_clear() on those (they are stored as part of a struct remote, which is held in a global variable). But you could trigger case 2 manually like: git fetch --prefetch . refs/tags/foo refs/tags/bar Ironically you couldn't trigger case 1, because the code accidentally leaked the string in the raw array, and the two bugs (the leak and the double-free) cancelled out. But when we fixed the leak in `ea4780307c` (fetch: free "raw" string when shrinking refspec, 2024-09-24), it became possible to trigger that, too, with a single item: git fetch --prefetch . refs/tags/foo We can fix both cases by just correctly decrementing "raw_nr" when we shrink the array. Even though we don't expect people to use --prefetch with command-line refspecs, we'll add a test to make sure it behaves well (like the test just before it, we're just confirming that the filtered prefetch succeeds at all). Reported-by: Eric Mills <ermills@epic.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 18:16:47 +09:00
Jonathan Tan	c08589efdc	index-pack: repack local links into promisor packs Teach index-pack to, when processing the objects in a pack with --promisor specified on the CLI, repack local objects (and the local objects that they refer to, recursively) referenced by these objects into promisor packs. This prevents the situation in which, when fetching from a promisor remote, we end up with promisor objects (newly fetched) referring to non-promisor objects (locally created prior to the fetch). This situation may arise if the client had previously pushed objects to the remote, for example. One issue that arises in this situation is that, if the non-promisor objects become inaccessible except through promisor objects (for example, if the branch pointing to them has moved to point to the promisor object that refers to them), then GC will garbage collect them. There are other ways to solve this, but the simplest seems to be to enforce the invariant that we don't have promisor objects referring to non-promisor objects. This repacking is done from index-pack to minimize the performance impact. During a fetch, the only time most objects are fully inflated in memory is when their object ID is computed, so we also scan the objects (to see which objects they refer to) during this time. Also to minimize the performance impact, an object is calculated to be local if it's a loose object or present in a non-promisor pack. (If it's also in a promisor pack or referred to by an object in a promisor pack, it is technically already a promisor object. But a misidentification of a promisor object as a non-promisor object is relatively benign here - we will thus repack that promisor object into a promisor pack, duplicating it in the object store, but there is no correctness issue, just an issue of inefficiency.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-12 10:18:16 +09:00
Abhijeet Sonar	4da8d90fdd	show-index: fix uninitialized hash function In `c8aed5e8da` (repository: stop setting SHA1 as the default object hash), we got rid of the default hash algorithm for the_repository. Due to this change, it is now the responsibility of the callers to set their own default when this is not present. As stated in the docs, show-index should use SHA1 as the default hash algorithm when run outside a repository. Make sure this promise is met by falling back to SHA1 when the_hash_algo is not present (i.e. when the command is run outside a repository). Also add a test that verifies this behavior. Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-11 12:16:59 +09:00
Junio C Hamano	02a2d5706d	Merge branch 'jk/left-right-bitmap' When called with '--left-right' and '--use-bitmap-index', 'rev-list' will produce output without any left/right markers, which has been corrected. * jk/left-right-bitmap: rev-list: skip bitmap traversal for --left-right	2024-11-08 12:56:28 +09:00
Jeff King	b8150bfee1	describe: stop traversing when we run out of names When trying to describe a commit, we'll traverse from the commit, collecting candidate tags that point to its ancestors. But once we've seen all of the tags in the repo, there's no point in traversing further. There's nothing left to find! For a default "git describe", this isn't usually a big problem. In a large repo you'll probably have multiple tags, so we'll eventually find 10 candidates (the default for max_candidates) and stop there. And in a small repo, it's quick to traverse to the root. But you can imagine a large repo with few tags. Or, as we saw in a real world case, explicitly limiting the set of matches like this (on linux.git): git describe --match=v6.12-rc4 HEAD which goes all the way to the root before realizing that no, there are no other tags under consideration besides the one we fed via --match. If we add in "--candidates=1" there, it's much faster (at least as of the previous commit). But we should be able to speed this up without the user asking for it. After expanding all matching tags, we know the total number of names. We could just stop the traversal there, but as hinted at above we already have a mechanism for doing that: the max_candidate limit. So we can just reduce that limit to match the number of possible candidates. Our p6100 test shows this off: Test HEAD^ HEAD --------------------------------------------------------------------------------------- 6100.2: describe HEAD 0.71(0.65+0.06) 0.72(0.68+0.04) +1.4% 6100.3: describe HEAD with one max candidate 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% 6100.4: describe HEAD with one tag 0.72(0.66+0.05) 0.01(0.00+0.00) -98.6% Now we are fast automatically, just as if --candidates=1 were supplied by the user. Reported-by: Josh Poimboeuf <jpoimboe@kernel.org> Helped-by: Rasmus Villemoes <ravi@prevas.dk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-07 13:28:22 +09:00
Jeff King	7379046221	describe: stop digging for max_candidates+1 By default, describe considers only 10 candidate matches, and stops traversing when we have enough. This makes things much faster in a large repository, where collecting all candidates requires walking all the way down to the root (or at least to the oldest tag). This goes all the way back to `8713ab3079` (Improve git-describe performance by reducing revision listing., 2007-01-13). However, we don't stop immediately when we have enough candidates. We keep traversing and only bail when we find one more candidate that we're ignoring. Usually this is not too expensive, if the tags are sprinkled evenly throughout history. But if you are unlucky, you might hit the max candidate quickly, and then have a huge swath of history before finding the next one. Our p6100 test has exactly this unlucky case: with a max of "1", we find a recent tag quickly and then have to go all the way to the root to find the old tag that will be discarded. A more interesting real-world case is: git describe --candidates=1 --match=v6.12-rc4 HEAD in the linux.git repo. There we restrict the set of tags to a single one, so there is no older candidate to find at all! But despite --candidates=1, we keep traversing to the root only to find nothing. So why do we keep traversing after hitting thet max? There are two reasons I can see: 1. In theory the extra information that there was another candidate could be useful, and we record it in the gave_up_on variable. But we only show this information with --debug. 2. After finding the candidate, there's more processing we do in our loop. The most important of this is propagating the "within" flags to our parent commits, and putting them in the commit_list we'll use for finish_depth_computation(). That function continues the traversal until we've counted all commits reachable from the starting point but not reachable from our best candidate tag (so essentially counting "$tag..$start", but avoiding re-walking over the bits we've seen). If we break immediately without putting those commits into the list, our depth computation will be wrong (in the worst case we'll count all the way down to the root, not realizing those commits are included in our tag). But we don't need to find a new candidate for (2). As soon as we finish the loop iteration where we hit max_candidates, we can then quit on the next iteration. This should produce the same output as the original code (which could, after all, find a candidate on the very next commit anyway) but ends the traversal with less pointless digging. We still have to set "gave_up_on"; we've popped it off the list and it has to go back. An alternative would be to re-order the loop so that it never gets popped, but it's perhaps still useful to show in the --debug output, so we need to know it anyway. We do have to adjust the --debug output since it's now just a commit where we stopped traversing, and not the max+1th candidate. p6100 shows the speedup using linux.git: Test HEAD^ HEAD --------------------------------------------------------------------------------------- 6100.2: describe HEAD 0.70(0.63+0.06) 0.71(0.66+0.04) +1.4% 6100.3: describe HEAD with one max candidate 0.70(0.64+0.05) 0.01(0.00+0.00) -98.6% 6100.4: describe HEAD with one tag 0.70(0.67+0.03) 0.70(0.63+0.06) +0.0% Reported-by: Josh Poimboeuf <jpoimboe@kernel.org> Helped-by: Rasmus Villemoes <ravi@prevas.dk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-07 13:28:22 +09:00
Junio C Hamano	2664f2a0cb	Merge branch 'ps/leakfixes-part-9' into ps/leakfixes-part-10 * ps/leakfixes-part-9: (22 commits) list-objects-filter-options: work around reported leak on error builtin/merge: release output buffer after performing merge dir: fix leak when parsing "status.showUntrackedFiles" t/helper: fix leaking buffer in "dump-untracked-cache" t/helper: stop re-initialization of `the_repository` sparse-index: correctly free EWAH contents dir: release untracked cache data combine-diff: fix leaking lost lines builtin/tag: fix leaking key ID on failure to sign transport-helper: fix leaking import/export marks builtin/commit: fix leaking cleanup config trailer: fix leaking strbufs when formatting trailers trailer: fix leaking trailer values builtin/commit: fix leaking change data contents upload-pack: fix leaking URI protocols pretty: clear signature check diff-lib: fix leaking diffopts in `do_diff_cache()` revision: fix leaking bloom filters builtin/grep: fix leak with `--max-count=0` grep: fix leak in `grep_splice_or()` ...	2024-11-07 13:25:01 +09:00
Elijah Newren	00e10e0751	doc: correct misleading descriptions for --shallow-exclude The documentation for the --shallow-exclude option to clone/fetch/etc. claims that the option takes a revision, but it does not. As per upload-pack.c's process_deepen_not(), it passes the option to expand_ref() and dies if it does not find exactly one ref matching the name passed. Further, this has always been the case ever since these options were introduced by the commits merged in `a460ea4a3c` (Merge branch 'nd/shallow-deepen', 2016-10-10). Fix the documentation to match the implementation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:53:23 -08:00
Patrick Steinhardt	ff67083ccd	builtin/merge: release output buffer after performing merge The `obuf` member of `struct merge_options` is used to buffer output in some cases. In order to not discard its allocated memory we only release its contents in `merge_finalize()` when we're not currently recursing into a subtree. This results in some situations where we seemingly do not release the buffer reliably. We thus have calls to `strbuf_release()` for this buffer scattered across the codebase. But we're missing one callsite in git-merge(1), which causes a memory leak. We should ideally refactor this interface so that callers don't have to know about any such internals. But for now, paper over the issue by adding one more `strbuf_release()` call. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:57 -08:00
Patrick Steinhardt	d06e3ec858	builtin/tag: fix leaking key ID on failure to sign We do not free the key ID when signing a tag fails. Do so by using the common exit path. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:55 -08:00
Patrick Steinhardt	6ef9f77a15	builtin/commit: fix leaking cleanup config The cleanup string set by the config is leaking when it is being overridden by an option. Fix this by tracking these via two separate variables such that we can free the old value. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:54 -08:00
Patrick Steinhardt	d34b5cbf02	builtin/commit: fix leaking change data contents While we free the worktree change data, we never free its contents. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:53 -08:00
Patrick Steinhardt	43fedde3df	builtin/grep: fix leak with `--max-count=0` When executing with `--max-count=0` we'll return early from git-grep(1) without performing any cleanup, which causes memory leaks. Plug these. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:52 -08:00
Patrick Steinhardt	5f5dd8e297	builtin/ls-remote: plug leaking server options The list of server options populated via `OPT_STRING_LIST()` is never cleared, causing a memory leak. Plug it. This leak is exposed by t5702, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-04 22:37:51 -08:00
Taylor Blau	1c5a712f26	Merge branch 'jk/dumb-http-finalize' The dumb-http code regressed when the result of re-indexing a pack yielded an .idx file that differs in content from the .idx file it downloaded from the remote. This has been corrected by no longer relying on the .idx file we got from the remote. jk/dumb-http-finalize: packfile: use oidread() instead of hashcpy() to fill object_id packfile: use object_id in find_pack_entry_one() packfile: convert find_sha1_pack() to use object_id http-walker: use object_id instead of bare hash packfile: warn people away from parse_packed_git() packfile: drop sha1_pack_index_name() packfile: drop sha1_pack_name() packfile: drop has_pack_index() dumb-http: store downloaded pack idx as tempfile t5550: count fetches in "previously-fetched .idx" test midx: avoid duplicate packed_git entries	2024-11-01 12:53:32 -04:00
Taylor Blau	20ab7fa3b6	Merge branch 'sa/notes-edit' Teach 'git notes add' and 'git notes append' a new '-e' flag, instructing them to open the note in $GIT_EDITOR before saving. * sa/notes-edit: notes: teach the -e option to edit messages in editor	2024-11-01 12:53:25 -04:00
Taylor Blau	7b1f01f02e	Merge branch 'ss/duplicate-typos' Typofixes. * ss/duplicate-typos: global: Fix duplicate word typos	2024-11-01 12:53:23 -04:00
Taylor Blau	787297b396	Merge branch 'rj/cygwin-exit' Treat ECONNABORTED the same as ECONNRESET in 'git credential-cache' to work around a possible Cygwin regression. This resolves a race condition caused by changes in Cygwin's handling of socket closures, allowing the client to exit cleanly when encountering ECONNABORTED. * rj/cygwin-exit: credential-cache: treat ECONNABORTED like ECONNRESET	2024-11-01 12:53:19 -04:00
Taylor Blau	268fd2fe58	Merge branch 'ps/platform-compat-fixes' Various platform compatibility fixes split out of the larger effort to use Meson as the primary build tool. * ps/platform-compat-fixes: t6006: fix prereq handling with `test_format ()` http: fix build error on FreeBSD builtin/credential-cache: fix missing parameter for stub function t7300: work around platform-specific behaviour with long paths on MinGW t5500, t5601: skip tests which exercise paths with '[::1]' on Cygwin t3404: work around platform-specific behaviour on macOS 10.15 t1401: make invocation of tar(1) work with Win32-provided one t/lib-gpg: fix setup of GNUPGHOME in MinGW t/lib-gitweb: test against the build version of gitweb t/test-lib: wire up NO_ICONV prerequisite t/test-lib: fix quoting of TEST_RESULTS_SAN_FILE	2024-11-01 12:53:17 -04:00
Jeff King	16a186fede	rev-list: skip bitmap traversal for --left-right Running: git rev-list --left-right --use-bitmap-index one...two will produce output without any left-right markers, since the bitmap traversal returns only a single set of reachable commits. Instead we should refuse to use bitmaps here and produce the correct output using a traditional traversal. This is probably not the only remaining option that misbehaves with bitmaps, but it's particularly egregious in that it feels like it _could_ work. Doing two separate traversals for the left/right sides and then taking the symmetric set differences should yield the correct answer, but our traversal code doesn't know how to do that. It's not clear if naively doing two separate traversals would always be a performance win. A traditional traversal only needs to walk down to the merge base, but bitmaps always fill out the full reachability set. So depending on your bitmap coverage, we could end up walking old bits of history twice to fill out the same uninteresting bits on both sides. We'd also of course end up with a very large --boundary set, if the user asked for that. So this might or might not be something worth implementing later. But for now, let's make sure we don't produce the wrong answer if somebody tries it. The test covers this, but also the same thing with "--count" (which is what I originally tried in a real-world case). Ironically the try_bitmap_count() code already realizes that "--left-right" won't work there. But that just causes us to fall back to the regular bitmap traversal code, which itself doesn't handle counting (we produce a list of objects rather than a count). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-11-01 11:02:27 -04:00
Jeff King	479ab76c9f	packfile: use object_id in find_pack_entry_one() The main function we use to search a pack index for an object is find_pack_entry_one(). That function still takes a bare pointer to the hash, despite the fact that its underlying bsearch_pack() function needs an object_id struct. And so we end up making an extra copy of the hash into the struct just to do a lookup. As it turns out, all callers but one already have such an object_id. So we can just take a pointer to that struct and use it directly. This avoids the extra copy and provides a more type-safe interface. The one exception is get_delta_base() in packfile.c, when we are chasing a REF_DELTA from inside the pack (and thus we have a pointer directly to the mmap'd pack memory, not a struct). We can just bump the hashcpy() from inside find_pack_entry_one() to this one caller that needs it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	4d99559147	packfile: convert find_sha1_pack() to use object_id The find_sha1_pack() function has a few problems: - it's badly named, since it works with any object hash - it takes the hash as a bare pointer rather than an object_id struct We can fix both of these easily, as all callers actually have a real object_id anyway. I also found the existence of this function somewhat confusing, as it is about looking in an arbitrary set of linked packed_git structs. It's good for things like dumb-http which are looking in downloaded remote packs, and not our local packs. But despite the name, it is not a good way to find the pack which contains a local object (it skips the use of the midx, the pack mru list, and so on). So let's also add an explanatory comment above the declaration that may point people in the right direction. I suspect the calls in fast-import.c, which use the packed_git list from the repository struct, could actually just be using find_pack_entry(). But since we'd need to keep it anyway for dumb-http, I didn't dig further there. If we eventually drop dumb-http support, then it might be worth examining them to see if we can get rid of the function entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Jeff King	4390fea963	packfile: drop sha1_pack_index_name() Like sha1_pack_name() that we dropped in the previous commit, this function uses an error-prone static strbuf and the somewhat misleading name "sha1". The only caller left is in pack-redundant.c. While this command is marked for potential removal in our BreakingChanges document, we still have it for now. But it's simple enough to convert it to use its own strbuf with the underlying odb_pack_name() function, letting us drop the otherwise obsolete function. Note that odb_pack_name() does its own strbuf_reset(), so it's safe to use directly within a loop like this. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-25 17:35:46 -04:00
Taylor Blau	55d12c24d7	Merge branch 'wm/shortlog-hash' Teaches 'shortlog' to explicitly use SHA-1 when operating outside of a repository. * wm/shortlog-hash: builtin/shortlog: explicitly set hash algo when there is no repo	2024-10-25 14:02:49 -04:00
Taylor Blau	0ab43ed95c	Merge branch 'jc/a-commands-without-the-repo' Commands that can also work outside Git have learned to take the repository instance "repo" when we know we are in a repository, and NULL when we are not, in a parameter. The uses of the_repository variable in a few of them have been removed using the new calling convention. * jc/a-commands-without-the-repo: archive: remove the_repository global variable annotate: remove usage of the_repository global git: pass in repo to builtin based on setup_git_directory_gently	2024-10-25 14:02:36 -04:00
Taylor Blau	6cbcc68ea7	Merge branch 'db/submodule-fetch-with-remote-name-fix' A "git fetch" from the superproject going down to a submodule used a wrong remote when the default remote names are set differently between them. * db/submodule-fetch-with-remote-name-fix: submodule: correct remote name with fetch	2024-10-25 14:01:09 -04:00
Taylor Blau	8e08668322	Merge branch 'cw/worktree-relative' An extra worktree attached to a repository points at each other to allow finding the repository from the worktree and vice versa possible. Turn this linkage to relative paths. * cw/worktree-relative: worktree: add test for path handling in linked worktrees worktree: link worktrees with relative paths worktree: refactor infer_backlink() to use *strbuf worktree: repair copied repository and linked worktrees	2024-10-22 14:40:39 -04:00
Sven Strickroth	c32d4a8cfe	global: Fix duplicate word typos Used regex to find these typos: (?<!struct )(?<=\s)([a-z]{1,}) \1(?=\s) Signed-off-by: Sven Strickroth <email@cs-ware.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-21 16:05:04 -04:00
Abraham Samuel Adekunle	dab0b9e176	notes: teach the -e option to edit messages in editor Notes can be added to a commit using: - "-m" to provide a message on the command line. - -C to copy a note from a blob object. - -F to read the note from a file. When these options are used, Git does not open an editor, it simply takes the content provided via these options and attaches it to the commit as a note. Improve flexibility to fine-tune the note before finalizing it by allowing the messages to be prefilled in the editor and edited after the messages have been provided through -[mF]. Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-21 15:52:48 -04:00
Ramsay Jones	468a7e41e8	credential-cache: treat ECONNABORTED like ECONNRESET On Cygwin, t0301 fails because "git credential-cache exit" returns a non-zero exit code. What's supposed to happen here is: 1. The client (the "credential-cache" invocation above) connects to a previously-spawned credential-cache--daemon. 2. The client sends an "exit" command to the daemon. 3. The daemon unlinks the socket and then exits, closing the descriptor back to the client. 4. The client sees EOF on the descriptor and exits successfully. That works on most platforms, and even _used_ to work on Cygwin. But that changed in Cygwin's ef95c03522 (Cygwin: select: Fix FD_CLOSE handling, 2021-04-06). After that commit, the client sees a read error with errno set to ECONNABORTED, and it reports the error and dies. It's not entirely clear if this is a Cygwin bug. It seems that calling fclose() on the filehandles pointing to the sockets is sufficient to avoid this error return, even though exiting should in general look the same from the client's perspective. However, we can't just call fclose() here. It's important in step 3 above to unlink the socket before closing the descriptor to avoid the race mentioned by `7d5e9c9849` (credential-cache--daemon: clarify "exit" action semantics, 2016-03-18). The client will exit as soon as it sees the descriptor close, and the daemon may or may not have actually unlinked the socket by then. That makes test code like this: git credential exit && test_path_is_missing .git-credential-cache racy. So we probably _could_ fix this by calling: delete_tempfile(&socket_file); fclose(in); fclose(out); before we exit(). Or by replacing the exit() with a return up the stack, in which case the fclose() happens as we unwind. But in that case we'd still need to call delete_tempfile() here to avoid the race. But simpler still is that we can notice that we already special-case ECONNRESET on the client side, courtesy of `1f180e5eb9` (credential-cache: interpret an ECONNRESET as an EOF, 2017-07-27). We can just do the same thing here (I suspect that prior to the Cygwin commit that introduced this problem, we were really just seeing ECONNRESET instead of ECONNABORTED, so the "new" problem is just the switch of the errno values). There's loads more debugging in this thread: https://lore.kernel.org/git/9dc3e85f-a532-6cff-de11-1dfb2e4bc6b6@ramsayjones.plus.com/ but I've tried to summarize the useful bits in this commit message. [jk: commit message] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-18 17:18:05 -04:00
Taylor Blau	c1662a00b6	Merge branch 'ps/maintenance-start-crash-fix' "git maintenance start" crashed due to an uninitialized variable reference, which has been corrected. * ps/maintenance-start-crash-fix: builtin/gc: fix crash when running `git maintenance start`	2024-10-18 13:56:26 -04:00
Taylor Blau	645cc7a2a7	Merge branch 'kh/checkout-ignore-other-docfix' Doc updates. * kh/checkout-ignore-other-docfix: checkout: refer to other-worktree branch, not ref	2024-10-18 13:56:24 -04:00
Wolfgang Müller	b33001645e	builtin/shortlog: explicitly set hash algo when there is no repo Whilst git-shortlog(1) does not explicitly need any repository information when run without reference to one, it still parses some of its arguments with parse_revision_opt() which assumes that the hash algorithm is set. However, in `c8aed5e8da` (repository: stop setting SHA1 as the default object hash, 2024-05-07) we stopped setting up a default hash algorithm and instead require commands to set it up explicitly. This was done for most other commands like in `ab274909d4` (builtin/diff: explicitly set hash algo when there is no repo, 2024-05-07) but was missed for builtin/shortlog, making git-shortlog(1) segfault outside of a repository when given arguments like --author that trigger a call to parse_revision_opt(). Fix this for now by explicitly setting the hash algorithm to SHA1. Also add a regression test for the segfault. Thanks-to: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-17 16:10:54 -04:00
Patrick Steinhardt	87ad2a9d56	builtin/credential-cache: fix missing parameter for stub function When not compiling the credential cache we may use a stub function for `cmd_credential_cache()`. With commit `9b1cb5070f` (builtin: add a repository parameter for builtin functions, 2024-09-13), we have added a new parameter to all of those top-level `cmd_*()` functions, and did indeed adapt the non-stubbed-out `cmd_credential_cache()`. But we didn't adapt the stubbed-out variant, so the code does not compile. Fix this by adding the missing parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-16 17:00:49 -04:00
Taylor Blau	b43e23fa02	Merge branch 'jk/fsmonitor-event-listener-race-fix' On macOS, fsmonitor can fall into a race condition that results in a client waiting forever to be notified for an event that have already happened. This problem has been corrected. * jk/fsmonitor-event-listener-race-fix: fsmonitor: initialize fs event listener before accepting clients simple-ipc: split async server initialization and running	2024-10-15 16:56:43 -04:00
Taylor Blau	fd98f659fd	Merge branch 'xx/remote-server-option-config' A new configuration variable remote.<name>.serverOption makes the transport layer act as if the --serverOption=<value> option is given from the command line. * xx/remote-server-option-config: ls-remote: leakfix for not clearing server_options fetch: respect --server-option when fetching multiple remotes transport.c:🤝 make use of server options from remote remote: introduce remote.<name>.serverOption configuration transport: introduce parse_transport_option() method	2024-10-15 16:56:43 -04:00
Taylor Blau	f004467b04	Merge branch 'jh/config-unset-doc-fix' Docfix. * jh/config-unset-doc-fix: git-config.1: remove value from positional args in unset usage	2024-10-15 16:56:43 -04:00
Linus Arver	3f0346d4dc	trailer: spread usage of "trailer_block" language Deprecate the "trailer_info" struct name and replace it with "trailer_block". This is more readable, for two reasons: 1. "trailer_info" on the surface sounds like it's about a single trailer when in reality it is a collection of one or more trailers, and 2. the "_block" suffix is more informative than "_info", because it describes a block (or region) of contiguous text which has trailers in it, which has been parsed into the trailer_block structure. Rename the size_t trailer_block_start, trailer_block_end; members of trailer_info to just "start" and "end". Rename the "info" pointer to "trailer_block" because it is more descriptive. Update comments accordingly. Signed-off-by: Linus Arver <linus@ucla.edu> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-14 12:33:02 -04:00
John Cai	528d3e4d53	archive: remove the_repository global variable As part of the effort to get rid of global state due to the global the_repository variable, replace the_repository with the repository argument that gets passed down through the builtin function. The repo might be NULL, but we should be safe in write_archive() because it detects if we are outside of a repository and calls setup_git_directory() which will error. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-11 09:37:18 -07:00
John Cai	ebe8f4b6ec	annotate: remove usage of the_repository global As part of the effort to get rid of global state due to the_repository variable, remove the the_repository with the repository argument that gets passed down through the builtin function. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-11 09:37:18 -07:00
John Cai	5db948d413	git: pass in repo to builtin based on setup_git_directory_gently The current code in run_builtin() passes in a repository to the builtin based on whether cmd_struct's option flag has RUN_SETUP. This is incorrect, however, since some builtins that only have RUN_SETUP_GENTLY can potentially take a repository. setup_git_directory_gently() tells us whether or not a command is being run inside of a repository. Use the output of setup_git_directory_gently() to help determine whether or not there is a repository to pass to the builtin. If not, then we just pass NULL. As part of this patch, we need to modify add to check for a NULL repo before calling repo_git_config(), since add -h can be run outside of a repository. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-11 09:37:17 -07:00
Junio C Hamano	31bc4454de	Merge branch 'ps/leakfixes-part-8' More leakfixes. * ps/leakfixes-part-8: (23 commits) builtin/send-pack: fix leaking list of push options remote: fix leaking push reports t/helper: fix leaks in proc-receive helper pack-write: fix return parameter of `write_rev_file_order()` revision: fix leaking saved parents revision: fix memory leaks when rewriting parents midx-write: fix leaking buffer pack-bitmap-write: fix leaking OID array pseudo-merge: fix leaking strmap keys pseudo-merge: fix various memory leaks line-log: fix several memory leaks diff: improve lifecycle management of diff queues builtin/revert: fix leaking `gpg_sign` and `strategy` config t/helper: fix leaking repository in partial-clone helper builtin/clone: fix leaking repo state when cloning with bundle URIs builtin/pack-redundant: fix various memory leaks builtin/stash: fix leaking `pathspec_from_file` submodule: fix leaking submodule entry list wt-status: fix leaking buffer with sparse directories shell: fix leaking strings ...	2024-10-10 14:22:29 -07:00
Kristoffer Haugsbakk	b8139c8f4e	checkout: refer to other-worktree branch, not ref We can only check out commits or branches, not refs in general. And the problem here is if another worktree is using the branch that we want to check out. Let’s be more direct and just talk about branches instead of refs. Also replace “be held” with “in use”. Further, “in use” is not restricted to a branch being checked out (e.g. the branch could be busy on a rebase), hence generalize to “or otherwise in use” in the option description. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-10 13:09:13 -07:00
Patrick Steinhardt	c95547a394	builtin/gc: fix crash when running `git maintenance start` It was reported on the mailing list that running `git maintenance start` immediately segfaults starting with `b6c3f8e12c` (builtin/maintenance: fix leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is trivial to reproduce up to a point where one is scratching their head why we didn't catch this regression in our test suite. The root cause of this error is `get_schedule_cmd()`, which does not populate the `out` parameter in all cases anymore starting with the mentioned commit. Callers do assume it to always be populated though and will e.g. call `strvec_split()` on the returned value, which will of course segfault when the variable is uninitialized. So why didn't we catch this trivial regression? The reason is that our tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable via "t/test-lib.sh", which allows us to override the scheduler command with a custom one so that we don't accidentally modify the developer's system. But the faulty code where we don't set the `out` parameter will only get hit in case that environment variable is _not_ set, which is never the case when executing our tests. Fix the regression by again unconditionally allocating the value in the `out` parameter, if provided. Add a test that unsets the environment variable to catch future regressions in this area. Reported-by: Shubham Kanodia <shubham.kanodia10@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-10 10:04:43 -07:00

... 5 6 7 8 9 ...

12959 Commits (seen)