Commit Graph

838 Commits (seen)

Author SHA1 Message Date
Junio C Hamano c62d2d3810 Merge branch 'kn/maintenance-is-needed'
"git maintenance" command learned "is-needed" subcommand to tell if
it is necessary to perform various maintenance tasks.

* kn/maintenance-is-needed:
  maintenance: add 'is-needed' subcommand
  maintenance: add checking logic in `pack_refs_condition()`
  refs: add a `optimize_required` field to `struct ref_storage_be`
  reftable/stack: add function to check if optimization is required
  reftable/stack: return stack segments directly
2025-11-21 09:14:17 -08:00
Junio C Hamano ee27005905 Merge branch 'ps/ref-peeled-tags-fixes'
Another fix-up to "peeled-tags" topic.

* ps/ref-peeled-tags-fixes:
  object: fix performance regression when peeling tags
2025-11-19 10:55:42 -08:00
Junio C Hamano 7ccfc262d7 Merge branch 'kn/refs-optim-cleanup'
Code clean-up.

* kn/refs-optim-cleanup:
  t/pack-refs-tests: move the 'test_done' to callees
  refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
  refs: move to using the '.optimize' functions
2025-11-19 10:55:40 -08:00
Junio C Hamano 13134cecb0 Merge branch 'ps/ref-peeled-tags'
Some ref backend storage can hold not just the object name of an
annotated tag, but the object name of the object the tag points at.
The code to handle this information has been streamlined.

* ps/ref-peeled-tags:
  t7004: do not chdir around in the main process
  ref-filter: fix stale parsed objects
  ref-filter: parse objects on demand
  ref-filter: detect broken tags when dereferencing them
  refs: don't store peeled object IDs for invalid tags
  object: add flag to `peel_object()` to verify object type
  refs: drop infrastructure to peel via iterators
  refs: drop `current_ref_iter` hack
  builtin/show-ref: convert to use `reference_get_peeled_oid()`
  ref-filter: propagate peeled object ID
  upload-pack: convert to use `reference_get_peeled_oid()`
  refs: expose peeled object ID via the iterator
  refs: refactor reference status flags
  refs: fully reset `struct ref_iterator::ref` on iteration
  refs: introduce `.ref` field for the base iterator
  refs: introduce wrapper struct for `each_ref_fn`
2025-11-19 10:55:39 -08:00
Karthik Nayak f6c5ca387a refs: add a `optimize_required` field to `struct ref_storage_be`
To allow users of the refs namespace to check if the reference backend
requires optimization, add a new field `optimize_required` field to
`struct ref_storage_be`. This field is of type `optimize_required_fn`
which is also introduced in this commit.

Modify the debug, files, packed and reftable backend to implement this
field. A following commit will expose this via 'git pack-refs' and 'git
refs optimize'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10 09:28:48 -08:00
Junio C Hamano f58ea683b5 Merge branch 'pk/reflog-migrate-message-fix'
Message fix.

* pk/reflog-migrate-message-fix:
  refs: add missing space in messages
2025-11-06 14:52:57 -08:00
Patrick Steinhardt 7048e74609 object: fix performance regression when peeling tags
Our Bencher dashboards [1] have recently alerted us about a bunch of
performance regressions when writing references, specifically with the
reftable backend. There is a 3x regression when writing many refs with
preexisting refs in the reftable format, and a 10x regression when
migrating refs between backends in either of the formats.

Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled
object IDs for invalid tags, 2025-10-23). The gist of the commit is that
we may end up storing peeled objects in both reftables and packed-refs
for corrupted tags, where the claimed tagged object type is different
than the actual tagged object type. This will then cause us to create
the `struct object *` with a wrong type, as well, and obviously nothing
good comes out of that.

The fix for this issue was to introduce a new flag to `peel_object()`
that causes us to verify the tagged object's type before writing it into
the refdb -- if the tag is corrupt, we skip writing the peeled value.
To verify whether the peeled value is correct we have to look up the
object type via the ODB and compare the actual type with the claimed
type, and that additional object lookup is costly.

This also explains why we see the regression only when writing refs with
the reftable backend, but we see the regression with both backends when
migrating refs:

  - The reftable backend knows to store peeled values in the new table
    immediately, so it has to try and peel each ref it's about to write
    to the transaction. So the performance regression is visible for all
    writes.

  - The files backend only stores peeled values when writing the
    packed-refs file, so it wouldn't hit the performance regression for
    normal writes. But on ref migrations we know to write all new values
    into the packed-refs file immediately, and that's why we see the
    regression for both backends there.

Taking a step back though reveals an oddity in the new verification
logic: we not only verify the _tagged_ object's type, but we also verify
the type of the tag itself. But this isn't really needed, as we wouldn't
hit the bug in such a case anyway, as we only hit the issue with corrupt
tags claiming an invalid type for the tagged object.

The consequence of this is that we now started to look up the target
object of every single reference we're about to write, regardless of
whether it even is a tag or not. And that is of course quite costly.

Fix the issue by only verifying the type of the tagged objects. This
means that we of course still have a performance hit for actual tags.
But this only happens for writes anyway, and I'd claim it's preferable
to not store corrupted data in the refdb than to be fast here. Rename
the flag accordingly to clarify that we only verify the tagged object's
type.

This fix brings performance back to previous levels:

    Benchmark 1: baseline
      Time (mean ± σ):      46.0 ms ±   0.4 ms    [User: 40.0 ms, System: 5.7 ms]
      Range (min … max):    45.0 ms …  47.1 ms    54 runs

    Benchmark 2: regression
      Time (mean ± σ):     140.2 ms ±   1.3 ms    [User: 77.5 ms, System: 60.5 ms]
      Range (min … max):   138.0 ms … 142.7 ms    20 runs

    Benchmark 3: fix
      Time (mean ± σ):      46.2 ms ±   0.4 ms    [User: 40.2 ms, System: 5.7 ms]
      Range (min … max):    45.0 ms …  47.3 ms    55 runs

    Summary
      update-ref: baseline
        1.00 ± 0.01 times faster than fix
        3.05 ± 0.04 times faster than regression

[1]: https://bencher.dev/perf/git/plots

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06 10:54:34 -08:00
Junio C Hamano 994869e2b5 Merge branch 'ps/ref-peeled-tags' into ps/ref-peeled-tags-fixes
* ps/ref-peeled-tags:
  t7004: do not chdir around in the main process
  ref-filter: fix stale parsed objects
  ref-filter: parse objects on demand
  ref-filter: detect broken tags when dereferencing them
  refs: don't store peeled object IDs for invalid tags
  object: add flag to `peel_object()` to verify object type
  refs: drop infrastructure to peel via iterators
  refs: drop `current_ref_iter` hack
  builtin/show-ref: convert to use `reference_get_peeled_oid()`
  ref-filter: propagate peeled object ID
  upload-pack: convert to use `reference_get_peeled_oid()`
  refs: expose peeled object ID via the iterator
  refs: refactor reference status flags
  refs: fully reset `struct ref_iterator::ref` on iteration
  refs: introduce `.ref` field for the base iterator
  refs: introduce wrapper struct for `each_ref_fn`
2025-11-06 10:54:28 -08:00
Peter Krefting d9988b063f refs: add missing space in messages
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05 15:04:26 -08:00
Junio C Hamano 517964205c Merge branch 'xr/ref-debug-remove-on-disk'
The "debug" ref-backend was missing a method implementation, which
has been corrected.

* xr/ref-debug-remove-on-disk:
  refs: add missing remove_on_disk implementation for debug backend
2025-11-04 07:48:08 -08:00
Junio C Hamano 31177a8bb6 Merge branch 'kn/refs-optim-cleanup' into kn/maintenance-is-needed
* kn/refs-optim-cleanup:
  t/pack-refs-tests: move the 'test_done' to callees
  refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
  refs: move to using the '.optimize' functions
2025-11-04 07:38:48 -08:00
Junio C Hamano 4a1442a336 Merge branch 'ps/ref-peeled-tags' into kn/maintenance-is-needed
* ps/ref-peeled-tags: (23 commits)
  t7004: do not chdir around in the main process
  ref-filter: fix stale parsed objects
  ref-filter: parse objects on demand
  ref-filter: detect broken tags when dereferencing them
  refs: don't store peeled object IDs for invalid tags
  object: add flag to `peel_object()` to verify object type
  refs: drop infrastructure to peel via iterators
  refs: drop `current_ref_iter` hack
  builtin/show-ref: convert to use `reference_get_peeled_oid()`
  ref-filter: propagate peeled object ID
  upload-pack: convert to use `reference_get_peeled_oid()`
  refs: expose peeled object ID via the iterator
  refs: refactor reference status flags
  refs: fully reset `struct ref_iterator::ref` on iteration
  refs: introduce `.ref` field for the base iterator
  refs: introduce wrapper struct for `each_ref_fn`
  builtin/repo: add progress meter for structure stats
  builtin/repo: add keyvalue and nul format for structure stats
  builtin/repo: add object counts in structure output
  builtin/repo: introduce structure subcommand
  ...
2025-11-04 07:38:27 -08:00
Karthik Nayak 2cd99d9841 refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
The previous commit removed all references to 'pack_refs()' within
the refs subsystem. Continue this cleanup by also renaming
'pack_refs_opts' to 'refs_optimize_opts' and the respective flags
accordingly. Keeping the naming consistent will make the code easier to
maintain.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:35:12 -08:00
Karthik Nayak 9b93ab8a9c refs: move to using the '.optimize' functions
The `struct ref_store` variable exposes two ways to optimize a reftable
backend:

  1. pack_refs
  2. optimize

The former was specific to the 'files' + 'packed' refs backend. The
latter is more generic and covers all backends. While the naming is
different, both of these functions perform the same functionality.

Consolidate this code to only maintain the 'optimize' functions. Do this
by modifying the backends so that they exclusively implement the
`optimize` callback, only. All users of the refs subsystem already use
the 'optimize' function so there is no changes needed on the callee
side. Finally, cleanup all references to the 'pack_refs' field of the
structure and code around it.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:35:12 -08:00
Patrick Steinhardt 6ec4c0b45b refs: don't store peeled object IDs for invalid tags
Both the "files" and "reftable" backend store peeled object IDs for
references that point to tags:

  - The "files" backend stores the value when packing refs, where each
    peeled object ID is prefixed with "^".

  - The "reftable" backend stores the value whenever writing a new
    reference that points to a tag via a special ref record type.

Both of these backends use `peel_object()` to find the peeled object ID.
But as explained in the preceding commit, that function does not detect
the case where the tag's tagged object and its claimed type mismatch.

The consequence of storing these bogus peeled object IDs is that we're
less likely to detect such corruption in other parts of Git.
git-for-each-ref(1) for example does not notice anymore that the tag is
broken when using "--format=%(*objectname)" to dereference tags.

One could claim that this is good, because it still allows us to mostly
use the tag as intended. But the biggest problem here is that we now
have different behaviour for such a broken tag depending on whether or
not we have its peeled value in the refdb.

Fix the issue by verifying the object type when peeling the object. If
that verification fails we simply skip storing the peeled value in
either of the reference formats.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt 7ec85185b1 object: add flag to `peel_object()` to verify object type
When peeling a tag to a non-tag object we repeatedly call
`parse_object()` on the tagged object until we find the first object
that isn't a tag. While this feels sensible at first, there is a big
catch here: `parse_object()` doesn't actually verify the type of the
tagged object.

The relevant code path here eventually ends up in `parse_tag_buffer()`.
Here, we parse the various fields of the tag, including the "type". Once
we've figured out the type and the tagged object ID, we call one of the
`lookup_${type}()` functions for whatever type we have found. There is
two possible outcomes in the successful case:

  1. The object is already part of our cached objects. In that case we
     double-check whether the type we're trying to look up matches the
     type that was cached.

  2. The object is _not_ part of our cached objects. In that case, we
     simply create a new object with the expected type, but we don't
     parse that object.

In the first case we might notice type mismatches, but only in the case
where our cache has the object with the correct type. In the second
case, we'll blindly assume that the type is correct and then go with it.
We'll only notice that the type might be wrong when we try to parse the
object at a later point.

Now arguably, we could change `parse_tag_buffer()` to verify the tagged
object's type for us. But that would have the effect that such a tag
cannot be parsed at all anymore, and we have a small bunch of tests for
exactly this case that assert we still can open such tags. So this
change does not feel like something we can retroactively tighten, even
though one shouldn't ever hit such corrupted tags.

Instead, add a new `flags` field to `peel_object()` that allows the
caller to opt in to strict object verification. This will be wired up at
a subset of callsites over the next few commits.

Note that this change also inlines `deref_tag_noverify()`. There's only
been two callsites of that function, the one we're changing and one in
our test helpers. The latter callsite can trivially use `deref_tag()`
instead, so by inlining the function we avoid having to pass down the
flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt 705114772e refs: drop infrastructure to peel via iterators
Now that the peeled object ID gets propagated via the `struct reference`
there is no need anymore to call into the reference iterator itself to
dereference an object. Remove this infrastructure.

Most of the changes are straight-forward deletions of code. There is one
exception though in `refs/packed-backend.c::write_with_updates()`. Here
we stop peeling the iterator and instead just pass the peeled object ID
of that iterator directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt 5a5c7359f7 refs: drop `current_ref_iter` hack
In preceding commits we have refactored all callers of
`peel_iterated_oid()` to instead use `reference_get_peeled_oid()`. This
allows us to thus get rid of the former function.

Getting rid of that function is nice, but even nicer is that this also
allows us to get rid of the `current_ref_iter` hack. This global
variable tracked the currently-active ref iterator so that we can use it
to peel an object ID. Now that the peeled object ID is propagated via
`struct reference` though we don't have to depend on this hack anymore,
which makes for a more robust and easier-to-understand infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt f898661637 refs: expose peeled object ID via the iterator
Both the "files" and "reftable" backend are able to store peeled values
for tags in the respective formats. This allows for a more efficient
lookup of the target object of such a tag without having to manually
peel via the object database.

The infrastructure to access these peeled object IDs is somewhat funky
though. When iterating through objects, we store a pointer reference to
the current iterator in a global variable. The callbacks invoked by that
iterator are then expected to call `peel_iterated_oid()`, which checks
whether the globally-stored iterator's current reference refers to the
one handed into that function. If so, we ask the iterator to peel the
object, otherwise we manually peel the object via the object database.
Depending on global state like this is somewhat weird and also quite
fragile.

Introduce a new `struct reference::peeled_oid` field that can be
populated by the reference backends. This field can be accessed via a
new function `reference_get_peeled_oid()` that either uses that value,
if set, or alternatively peels via the ODB. With this change we don't
have to rely on global state anymore, but make the peeled object ID
available to the callback functions directly.

Adjust trivial callers that already have a `struct reference` available.
Remaining callers will be adjusted in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt 4cea042287 refs: fully reset `struct ref_iterator::ref` on iteration
With the introduction of the `struct ref_iterator::ref` field it now is
a whole lot easier to introduce new fields that become accessible to the
caller without having to adapt every single callsite. But there's a
downside: when a new field is introduced we always have to adapt all
backends to set that field.

This isn't something we can avoid in the general case: when the new
field is expected to be populated by all backends we of course cannot
avoid doing so. But new fields may be entirely optional, in which case
we'd still have such churn. And furthermore, it is very easy right now
to leak state from a previous iteration into the next iteration.

Address this issue by ensuring that the reference backends all fully
reset the field on every single iteration. This ensures that no state
from previous iterations can leak into the next one. And it ensures that
any newly introduced fields will be zeroed out by default.

Note that we don't have to explicitly adapt the "files" backend, as it
uses the `cache_ref_iterator` internally. Furthermore, other "wrapping"
iterators like for example the `prefix_ref_iterator` copy around the
whole reference, so these don't need to be adapted either.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt 89baa52da6 refs: introduce `.ref` field for the base iterator
The base iterator has a couple of fields that tracks the name, target,
object ID and flags for the current reference. Due to this design we
have to create a new `struct reference` whenever we want to hand over
that reference to the callback function, which is tedious and not very
efficient.

Convert the structure to instead contain a `struct reference` as member.
This member is expected to be populated by the implementations of the
iterator and is handed over to the callback directly.

While at it, simplify `should_pack_ref()` to take a `struct reference`
directly instead of passing its respective fields.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt bdbebe5714 refs: introduce wrapper struct for `each_ref_fn`
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:24 -08:00
Junio C Hamano 48d0b6545a Merge branch 'ps/symlink-symref-deprecation'
"Symlink symref" has been added to the list of things that will
disappear at Git 3.0 boundary.

* ps/symlink-symref-deprecation:
  refs/files: deprecate writing symrefs as symbolic links
2025-10-30 08:00:19 -07:00
Xinyu Ruan 6661cde2be refs: add missing remove_on_disk implementation for debug backend
The debug ref backend (refs_be_debug) was missing the remove_on_disk
function pointer, which caused a segmentation fault when running
'GIT_TRACE_REFS=1 git refs migrate --ref-format=reftable' commands.

Signed-off-by: Xinyu Ruan <r200981113@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-27 08:57:47 -07:00
Junio C Hamano 4b67e53fd6 Merge branch 'js/unreachable-workaround-for-no-symlink-head' into maint-2.51
Code clean-up.

* js/unreachable-workaround-for-no-symlink-head:
  refs: forbid clang to complain about unreachable code
2025-10-26 19:48:20 -07:00
Junio C Hamano 5a34f66fb9 Merge branch 'js/unreachable-workaround-for-no-symlink-head'
Code clean-up.

* js/unreachable-workaround-for-no-symlink-head:
  refs: forbid clang to complain about unreachable code
2025-10-20 14:12:17 -07:00
Junio C Hamano ff8ef0f9f3 Merge branch 'kn/refs-files-case-insensitive' into maint-2.51
Deal more gracefully with directory / file conflicts when the files
backend is used for ref storage, by failing only the ones that are
involved in the conflict while allowing others.

* kn/refs-files-case-insensitive:
  refs/files: handle D/F conflicts during locking
  refs/files: handle F/D conflicts in case-insensitive FS
  refs/files: use correct error type when lock exists
  refs/files: catch conflicts on case-insensitive file-systems
2025-10-15 10:29:31 -07:00
Junio C Hamano e04c0aded3 Merge branch 'ps/reflog-migrate-fixes' into maint-2.51
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.

* ps/reflog-migrate-fixes:
  refs: fix invalid old object IDs when migrating reflogs
  refs: stop unsetting REF_HAVE_OLD for log-only updates
  refs/files: detect race when generating reflog entry for HEAD
  refs: fix identity for migrated reflogs
  ident: fix type of string length parameter
  builtin/reflog: implement subcommand to write new entries
  refs: export `ref_transaction_update_reflog()`
  builtin/reflog: improve grouping of subcommands
  Documentation/git-reflog: convert to use synopsis type
2025-10-15 10:29:28 -07:00
Patrick Steinhardt f570bd91b3 refs/files: deprecate writing symrefs as symbolic links
The "files" backend has the ability to store symbolic refs as symbolic
links, which can be configured via "core.preferSymlinkRefs". This
feature stems back from the early days: the initial implementation of
symbolic refs used symlinks exclusively. The symref format was only
introduced in 9b143c6e15 (Teach update-ref about a symbolic ref stored
in a textfile., 2005-09-25) and made the default in 9f0bb90d16
(core.prefersymlinkrefs: use symlinks for .git/HEAD, 2006-05-02).

This is all about 20 years ago, and there are no known reasons nowadays
why one would want to use symlinks instead of symrefs. Mark the feature
for deprecation in Git 3.0.

Note that this only deprecates _writing_ symrefs as symbolic links.
Reading such symrefs is still supported for now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-15 09:11:08 -07:00
Junio C Hamano f50f046794 Merge branch 'kn/reftable-consistency-checks'
The reftable backend learned to sanity check its on-disk data more
carefully.

* kn/reftable-consistency-checks:
  refs/reftable: add fsck check for checking the table name
  reftable: add code to facilitate consistency checks
  fsck: order 'fsck_msg_type' alphabetically
  Documentation/fsck-msgids: remove duplicate msg id
  reftable: check for trailing newline in 'tables.list'
  refs: move consistency check msg to generic layer
  refs: remove unused headers
2025-10-13 22:00:35 -07:00
Johannes Schindelin 3860985105 refs: forbid clang to complain about unreachable code
When `NO_SYMLINK_HEAD` is defined, `create_ref_symlink()` is hard-coded
as `(-1)`, and as a consequence the condition `!create_ref_symlink()`
always evaluates to false, rendering any code guarded by that condition
unreachable.

Therefore, clang is _technically_ correct when it complains about
unreachable code. It does completely miss the fact that this is okay
because on _other_ platforms, where `NO_SYMLINK_HEAD` is not defined,
the code isn't unreachable at all.

Let's use the same trick as in 82e79c6364 (git-compat-util: add
NOT_CONSTANT macro and use it in atfork_prepare(), 2025-03-17) to
appease clang while at the same time keeping the `-Wunreachable` flag
to potentially find _actually_ unreachable code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-09 13:22:09 -07:00
Junio C Hamano 8d3abe9f8a Merge branch 'kn/ref-cache-seek-fix'
Handling of an empty subdirectory of .git/refs/ in the ref-files
backend has been corrected.

* kn/ref-cache-seek-fix:
  refs/ref-cache: fix SEGFAULT when seeking in empty directories
2025-10-08 12:17:54 -07:00
Karthik Nayak 466a3a1afd refs/reftable: add fsck check for checking the table name
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git' fsck errors.

Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. The
reftable specification mentions:

  It suggested to use
  ${min_update_index}-${max_update_index}-${random}.ref as a naming
  convention.

So treat non-conformant file names as warnings.

While adding the fsck header to 'refs/reftable-backend.c', modify the
list to maintain lexicographical ordering.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07 09:22:58 -07:00
Karthik Nayak 1ef32f0989 refs: move consistency check msg to generic layer
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07 09:22:57 -07:00
Karthik Nayak 2d2920c0ce refs: remove unused headers
In the 'refs/' namespace, some of the included header files are not
needed, let's remove them.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07 09:22:57 -07:00
Junio C Hamano db0babf9b2 Merge branch 'ms/refs-optimize'
"git refs optimize" is added for not very well explained reason
despite it does the same thing as "git pack-refs"...

* ms/refs-optimize:
  t: add test for git refs optimize subcommand
  t0601: refactor tests to be shareable
  builtin/refs: add optimize subcommand
  doc: pack-refs: factor out common options
  builtin/pack-refs: factor out core logic into a shared library
  builtin/pack-refs: convert to use the generic refs_optimize() API
  reftable-backend: implement 'optimize' action
  files-backend: implement 'optimize' action
  refs: add a generic 'optimize' API
2025-10-02 12:26:12 -07:00
Karthik Nayak 351c6e719a refs/ref-cache: fix SEGFAULT when seeking in empty directories
The 'cache_ref_iterator_seek()' function is used to seek the
`ref_iterator` to the desired reference in the ref-cache mechanism. We
use the seeking functionality to implement the '--start-after' flag in
'git-for-each-ref(1)'.

When using the files-backend with packed-refs, it is possible that some
of the refs directories are empty. For e.g. just after repacking, the
'refs/heads' directory would be empty. The ref-cache seek mechanism,
doesn't take this into consideration when descending into a
subdirectory, and makes an out of bounds access, causing SEGFAULT as we
try to access entries within the directory. Fix this by breaking out of
the loop when we enter an empty directory.

Since we start with the base directory of 'refs/' which is never empty,
it is okay to perform this check after the first iteration in the
`do..while` clause.

Add tests which simulate this behavior and also provide coverage over
using the feature over packed-refs.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-01 13:12:24 -07:00
Junio C Hamano 96ed0a8906 Merge branch 'kn/refs-files-case-insensitive'
Deal more gracefully with directory / file conflicts when the files
backend is used for ref storage, by failing only the ones that are
involved in the conflict while allowing others.

* kn/refs-files-case-insensitive:
  refs/files: handle D/F conflicts during locking
  refs/files: handle F/D conflicts in case-insensitive FS
  refs/files: use correct error type when lock exists
  refs/files: catch conflicts on case-insensitive file-systems
2025-09-29 11:40:35 -07:00
Meet Soni da0849a71e reftable-backend: implement 'optimize' action
To make the new generic `optimize` API fully functional, provide an
implementation for the 'reftable' reference backend.

For the reftable backend, the 'optimize' action is to compact its
tables. The existing `reftable_be_pack_refs()` function already provides
this logic, so the new `reftable_be_optimize()` function simply calls
it.

Wire up the new function to the `optimize` slot in the reftable
backend's virtual table.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-19 10:02:55 -07:00
Meet Soni 1fd6067181 files-backend: implement 'optimize' action
With the generic `refs_optimize()` API now in place, provide the first
implementation for the 'files' reference backend. This makes the new API
functional for existing repositories and serves as the foundation for
migrating user-facing commands to the new architecture.

The implementation simply calls the existing `files_pack_refs()`
function, as 'packing' is the method used to optimize the files-based
reference store.

Wire up the new `files_optimize()` function to the `optimize` slot in
the files backend's virtual table.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-19 10:02:55 -07:00
Meet Soni 8dfe077fb6 refs: add a generic 'optimize' API
The existing `pack-refs` API is conceptually tied to the 'files'
backend, but its behavior is generic (e.g., it triggers compaction for
reftable). This naming is confusing.

Introduce a new generic refs_optimize() API that dispatches to a
backend-specific implementation via a new 'optimize' vtable method.

This lays the architectural groundwork for different reference backends
(like 'files' and 'reftable') to provide their own storage optimization
logic, which will be called from a single, generic entry point.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-19 10:02:55 -07:00
Karthik Nayak 948b2ab0d8 refs/files: handle D/F conflicts during locking
The previous commit added the necessary validation and checks for F/D
conflicts in the files backend when working on case insensitive systems.

There is still a possibility for D/F conflicts. This is a different from
the F/D since for F/D conflicts, there would not be a conflict during
the lock creation phase:

    refs/heads/foo.lock
    refs/heads/foo/bar.lock

However there would be a conflict when the locks are committed, since we
cannot have 'refs/heads/foo/bar' and 'refs/heads/foo'. These kinds of
conflicts are checked and resolved in
`refs_verify_refnames_available()`, so the previous commit ensured that
for case-insensitive filesystems, we would lowercase the inputs to that
function.

For D/F conflicts, there is a conflict during the lock creation phase
itself:

    refs/heads/foo/bar.lock
    refs/heads/foo.lock

As in `lock_raw_ref()` after creating the lock, we also check for D/F
conflicts. This can occur in case-insensitive filesystems when trying to
fetch case-conflicted references like:

    refs/heads/Foo/new
    refs/heads/foo

D/F conflicts can also occur in case-sensitive filesystems, when the
repository already contains a directory with a lock file
'refs/heads/foo/bar.lock' and trying to fetch 'refs/heads/foo'. This
doesn't concern directories containing garbage files as those are
handled on a higher level.

To fix this, simply categorize the error as a name conflict. Also remove
this reference from the list of valid refnames for availability checks.
By categorizing the error and removing it from the list of valid
references, batched updates now knows to reject such reference updates
and apply the other reference updates.

Fix a small typo in `ref_transaction_maybe_set_rejected()` while here.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17 09:19:08 -07:00
Karthik Nayak 770f389b2d refs/files: handle F/D conflicts in case-insensitive FS
When using the files-backend on case-insensitive filesystems, there is
possibility of hitting F/D conflicts when creating references within a
single transaction, such as:

  - 'refs/heads/foo'
  - 'refs/heads/Foo/bar'

Ideally such conflicts are caught in `refs_verify_refnames_available()`
which is responsible for checking F/D conflicts within a given
transaction. This utility function is shared across the reference
backends. As such, it doesn't consider the issues of using a
case-insensitive file system, which only affects the files-backend.

While one solution would be to make the function aware of such issues,
this feels like leaking implementation details of file-backend specific
issues into the utility function. So opt for the more simpler option, of
lowercasing all references sent to this function when on a
case-insensitive filesystem and operating on the files-backend.

To do this, simply use a `struct strbuf` to convert the refname to
lowercase and append it to the list of refnames to be checked. Since we
use a `struct strbuf` and the memory is cleared right after, make sure
that the string list duplicates all provided string.

Without this change, the user would simply be left with a repository
with '.lock' files which were created in the 'prepare' phase of the
transaction, as the 'commit' phase would simply abort and not do the
necessary cleanup.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17 09:19:08 -07:00
Karthik Nayak 9b62a67bdb refs/files: use correct error type when lock exists
When fetching references into a repository, if a lock for a particular
reference exists, then `lock_raw_ref()` throws:

    - REF_TRANSACTION_ERROR_CASE_CONFLICT: when there is a conflict
    because the transaction contains conflicting references while being
    on a case-insensitive filesystem.

    - REF_TRANSACTION_ERROR_GENERIC: for all other errors.

The latter causes the entire set of batched updates to fail, even in
case sensitive filessystems.

Instead, return a 'REF_TRANSACTION_ERROR_CREATE_EXISTS' error. This
allows batched updates to reject the individual update which conflicts
with the existing file, while updating the rest of the references.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17 09:19:08 -07:00
Karthik Nayak 3c07063231 refs/files: catch conflicts on case-insensitive file-systems
During the 'prepare' phase of a reference transaction in the files
backend, we create the lock files for references to be created. When
using batched updates on case-insensitive filesystems, the entire
batched updates would be aborted if there are conflicting names such as:

  refs/heads/Foo
  refs/heads/foo

This affects all commands which were migrated to use batched updates in
Git 2.51, including 'git-fetch(1)' and 'git-receive-pack(1)'. Before
that, reference updates would be applied serially with one transaction
used per update. When users fetched multiple references on
case-insensitive systems, subsequent references would simply overwrite
any earlier references. So when fetching:

  refs/heads/foo: 5f34ec0bfeac225b1c854340257a65b106f70ea6
  refs/heads/Foo: ec3053b0977e83d9b67fc32c4527a117953994f3
  refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56

The user would simply end up with:

  refs/heads/foo: ec3053b0977e83d9b67fc32c4527a117953994f3
  refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56

This is buggy behavior since the user is never informed about the
overrides performed and missing references. Nevertheless, the user is
left with a working repository with a subset of the references. Since
Git 2.51, in such situations fetches would simply fail without updating
any references. Which is also buggy behavior and worse off since the
user is left without any references.

The error is triggered in `lock_raw_ref()` where the files backend
attempts to create a lock file. When a lock file already exists the
function returns a 'REF_TRANSACTION_ERROR_GENERIC'. When this happens,
the entire batched updates, not individual operation, is aborted as if
it were in a transaction.

Change this to return 'REF_TRANSACTION_ERROR_CASE_CONFLICT' instead to
aid the batched update mechanism to simply reject such errors. The
change only affects batched updates since batched updates will reject
individual updates with non-generic errors. So specifically this would
only affect:

    1. git fetch
    2. git receive-pack
    3. git update-ref --batch-updates

This bubbles the error type up to `files_transaction_prepare()` which
tries to lock each reference update. So if the locking fails, we check
if the rejection type can be ignored, which is done by calling
`ref_transaction_maybe_set_rejected()`.

As the error type is now 'REF_TRANSACTION_ERROR_CASE_CONFLICT',
the specific reference update would simply be rejected, while other
updates in the transaction would continue to be applied. This allows
partial application of references in case-insensitive filesystems when
fetching colliding references.

While the earlier implementation allowed the last reference to be
applied overriding the initial references, this change would allow the
first reference to be applied while rejecting consequent collisions.
This should be an okay compromise since with the files backend, there is
no scenario possible where we would retain all colliding references.

Let's also be more proactive and notify users on case-insensitive
filesystems about such problems by providing a brief about the issue
while also recommending using the reftable backend, which doesn't have
the same issue.

Reported-by: Joe Drew <joe.drew@indexexchange.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-17 09:19:07 -07:00
Junio C Hamano fea9d18c53 Merge branch 'jk/no-clobber-dangling-symref-with-fetch'
"git fetch" can clobber a symref that is dangling when the
remote-tracking HEAD is set to auto update, which has been
corrected.

* jk/no-clobber-dangling-symref-with-fetch:
  refs: do not clobber dangling symrefs
  t5510: prefer "git -C" to subshell for followRemoteHEAD tests
  t5510: stop changing top-level working directory
  t5510: make confusing config cleanup more explicit
2025-08-29 09:44:37 -07:00
Junio C Hamano 00c2c50ea6 Merge branch 'ps/reftable-libgit2-cleanup'
Code clean-ups.

* ps/reftable-libgit2-cleanup:
  refs/reftable: always reload stacks when creating lock
  reftable: don't second-guess errors from flock interface
  reftable/stack: handle outdated stacks when compacting
  reftable/stack: allow passing flags to `reftable_stack_add()`
  reftable/stack: fix compiler warning due to missing braces
  reftable/stack: reorder code to avoid forward declarations
  reftable/writer: drop Git-specific `QSORT()` macro
  reftable/writer: fix type used for number of records
2025-08-29 09:44:36 -07:00
Junio C Hamano 9a85fa8406 Merge branch 'ps/remote-rename-fix'
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.

* ps/remote-rename-fix:
  builtin/remote: only iterate through refs that are to be renamed
  builtin/remote: rework how remote refs get renamed
  builtin/remote: determine whether refs need renaming early on
  builtin/remote: fix sign comparison warnings
  refs: simplify logic when migrating reflog entries
  refs: pass refname when invoking reflog entry callback
2025-08-21 13:46:58 -07:00
Junio C Hamano c3c8b6910a Merge branch 'ps/reflog-migrate-fixes'
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.

* ps/reflog-migrate-fixes:
  refs: fix invalid old object IDs when migrating reflogs
  refs: stop unsetting REF_HAVE_OLD for log-only updates
  refs/files: detect race when generating reflog entry for HEAD
  refs: fix identity for migrated reflogs
  ident: fix type of string length parameter
  builtin/reflog: implement subcommand to write new entries
  refs: export `ref_transaction_update_reflog()`
  builtin/reflog: improve grouping of subcommands
  Documentation/git-reflog: convert to use synopsis type
2025-08-21 13:46:57 -07:00
Jeff King 450fc2bace refs: do not clobber dangling symrefs
When given an expected "before" state, the ref-writing code will avoid
overwriting any ref that does not match that expected state. We use the
null oid as a sentinel value for "nothing should exist", and likewise
that is the sentinel value we get when trying to read a ref that does
not exist.

But there's one corner case where this is ambiguous: dangling symrefs.
Trying to read them will yield the null oid, but there is potentially
something of value there: the dangling symref itself.

For a normal recursive write, this is OK. Imagine we have a symref
"FOO_HEAD" that points to a ref "refs/heads/bar" that does not exist,
and we try to write to it with a create operation like:

  oid=$(git rev-parse HEAD) ;# or whatever
  git symbolic-ref FOO_HEAD refs/heads/bar
  echo "create FOO_HEAD $oid" | git update-ref --stdin

The attempt to resolve FOO_HEAD will actually resolve "bar", yielding
the null oid. That matches our expectation, and the write proceeds. This
is correct, because we are not writing FOO_HEAD at all, but writing its
destination "bar", which in fact does not exist.

But what if the operation asked not to dereference symrefs? Like this:

  echo "create FOO_HEAD $oid" | git update-ref --no-deref --stdin

Resolving FOO_HEAD would still result in a null oid, and the write will
proceed. But it will overwrite FOO_HEAD itself, removing the fact that
it ever pointed to "bar".

This case is a little esoteric; we are clobbering a symref with a
no-deref write of a regular ref value. But the same problem occurs when
writing symrefs. For example:

  echo "symref-create FOO_HEAD refs/heads/other" |
  git update-ref --no-deref --stdin

The "create" operation asked us to create FOO_HEAD only if it did not
exist. But we silently overwrite the existing value.

You can trigger this without using update-ref via the fetch
followRemoteHEAD code. In "create" mode, it should not overwrite an
existing value. But if you manually create a symref pointing to a value
that does not yet exist (either via symbolic-ref or with "remote add
-m"), create mode will happily overwrite it.

Instead, we should detect this case and refuse to write. The correct
specification to overwrite FOO_HEAD in this case is to provide an
expected target ref value, like:

  echo "symref-update FOO_HEAD refs/heads/other ref refs/heads/bar" |
  git update-ref --no-deref --stdin

Note that the non-symref "update" directive does not allow you to do
this (you can only specify an oid). This is a weakness in the update-ref
interface, and you'd have to overwrite unconditionally, like:

  echo "update FOO_HEAD $oid" | git update-ref --no-deref --stdin

Likewise other symref operations like symref-delete do not accept the
"ref" keyword. You should be able to do:

  echo "symref-delete FOO_HEAD ref refs/heads/bar"

but cannot (and can only delete unconditionally). This patch doesn't
address those gaps. We may want to do so in a future patch for
completeness, but it's not clear if anybody actually wants to perform
those operations. The symref update case (specifically, via
followRemoteHEAD) is what I ran into in the wild.

The code for the fix is relatively straight-forward given the discussion
above. But note that we have to implement it independently for the files
and reftable backends. The "old oid" checks happen as part of the
locking process, which is implemented separately for each system. We may
want to factor this out somehow, but it's beyond the scope of this
patch. (Another curiosity is that the messages in the reftable code are
marked for translation, but the ones in the files backend are not. I
followed local convention in each case, but we may want to harmonize
this at some point).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-19 16:06:02 -07:00