Ref storages are typically only initialized once for `the_repository`
and then never released. Until now we got away with that without causing
memory leaks because `the_repository` stays reachable, and because the
ref backend is reachable via `the_repository` its memory basically never
leaks.
This is about to change though because of the upcoming migration logic,
which will create a secondary ref storage. In that case, we will either
have to release the old or new ref storage to avoid leaks.
Implement a new `release` callback and expose it via a new
`ref_storage_release()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reference backends have two callbacks `init` and `init_db`. The
similarity of these two callbacks has repeatedly confused me whenever I
was looking at them, where I always had to look up which of them does
what.
Rename the `init_db` callback to `create_on_disk`, which should
hopefully be clearer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git for-each-ref" learned "--include-root-refs" option to show
even the stuff outside the 'refs/' hierarchy.
* kn/for-all-refs:
for-each-ref: add new option to include root refs
ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'
refs: introduce `refs_for_each_include_root_refs()`
refs: extract out `loose_fill_ref_dir_regular_file()`
refs: introduce `is_pseudoref()` and `is_headref()`
Introduce a new ref iteration flag `DO_FOR_EACH_INCLUDE_ROOT_REFS`,
which will be used to iterate over regular refs plus pseudorefs and
HEAD.
Refs which fall outside the `refs/` and aren't either pseudorefs or HEAD
are more of a grey area. This is because we don't block the users from
creating such refs but they are not officially supported.
Introduce `refs_for_each_include_root_refs()` which calls
`do_for_each_ref()` with this newly introduced flag.
In `refs/files-backend.c`, introduce a new function
`add_pseudoref_and_head_entries()` to add pseudorefs and HEAD to the
`ref_dir`. We then finally call `add_pseudoref_and_head_entries()`
whenever the `DO_FOR_EACH_INCLUDE_ROOT_REFS` flag is set. Any new ref
backend will also have to implement similar changes on its end.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we have converted the reflog iterator of the
"files" backend to be ordered, which was the only remaining ref iterator
that wasn't ordered. Refactor the ref iterator infrastructure so that we
always assume iterators to be ordered, thus simplifying the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When iterating through reflogs in a worktree we create a merged iterator
that merges reflogs from both refdbs. The resulting refs are ordered so
that instead we first return all worktree reflogs before we return all
common refs.
This is the only remaining case where a ref iterator returns entries in
a non-lexicographic order. The result would look something like the
following (listed with a command we introduce in a subsequent commit):
```
$ git reflog list
HEAD
refs/worktree/per-worktree
refs/heads/main
refs/heads/wt
```
So we first print the per-worktree reflogs in lexicographic order, then
the common reflogs in lexicographic order. This is confusing and not
consistent with how we print per-worktree refs, which are exclusively
sorted lexicographically.
Sort reflogs lexicographically in the same way as we sort normal refs.
As this is already implemented properly by the "reftable" backend via a
separate selection function, we simply pull out that logic and reuse it
for the "files" backend. As logs are properly sorted now, mark the
merged reflog iterator as sorted.
Tests will be added in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.
This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:
- It becomes possible to do truly atomic writes where either all refs
are committed to disk or none are. This was not possible with the
"files" backend because ref updates were split across multiple loose
files.
- The disk space required to store many refs is reduced, both compared
to loose refs and packed-refs. This is enabled both by the reftable
format being a binary format, which is more compact, and by prefix
compression.
- We can ignore filesystem-specific behaviour as ref names are not
encoded via paths anymore. This means there is no need to handle
case sensitivity on Windows systems or Unicode precomposition on
macOS.
- There is no need to rewrite the complete refdb anymore every time a
ref is being deleted like it was the case for packed-refs. This
means that ref deletions are now constant time instead of scaling
linearly with the number of refs.
- We can ignore file/directory conflicts so that it becomes possible
to store both "refs/heads/foo" and "refs/heads/foo/bar".
- Due to this property we can retain reflogs for deleted refs. We have
previously been deleting reflogs together with their refs to avoid
file/directory conflicts, which is not necessary anymore.
- We can properly enumerate all refs. With the "files" backend it is
not easily possible to distinguish between refs and non-refs because
they may live side by side in the gitdir.
Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.
Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:
- It makes it significantly easier to land the "reftable" backend as
tests behave the same. It would be tough to argue for each and every
single test that doesn't pass with the "reftable" backend.
- It ensures compatibility between repositories that use the "files"
backend and repositories that use the "reftable" backend. Like this,
hosters can migrate their repositories to use the "reftable" backend
without causing issues for clients that use the "files" backend in
their clones.
It is expected that these artificial limitations may eventually go away
in the long term.
Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:
- Creating N refs in separate transactions shows that the "files"
backend is ~50% faster. This is not surprising given that creating a
ref only requires us to create a single loose ref. The "reftable"
backend will also perform auto compaction on updates. In real-world
workloads we would likely also want to perform pack loose refs,
which would likely change the picture.
Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms]
Range (min … max): 1.8 ms … 4.3 ms 133 runs
Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms]
Range (min … max): 2.4 ms … 2.9 ms 132 runs
Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s]
Range (min … max): 1.969 s … 1.980 s 3 runs
Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s]
Range (min … max): 2.597 s … 2.622 s 3 runs
Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s]
Range (min … max): 198.189 s … 198.670 s 3 runs
Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s]
Range (min … max): 290.223 s … 298.761 s 3 runs
- Creating N refs in a single transaction shows that the "files"
backend is significantly slower once we start to write many refs.
The "reftable" backend only needs to update two files, whereas the
"files" backend needs to write one file per ref.
Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms]
Range (min … max): 1.8 ms … 2.6 ms 151 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms]
Range (min … max): 2.4 ms … 3.4 ms 148 runs
Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms]
Range (min … max): 148.5 ms … 167.8 ms 15 runs
Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms]
Range (min … max): 56.3 ms … 72.9 ms 40 runs
Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s]
Range (min … max): 152.165 s … 153.542 s 3 runs
Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s]
Range (min … max): 51.769 s … 52.012 s 3 runs
- Deleting a ref in a fully-packed repository shows that the "files"
backend scales with the number of refs. The "reftable" backend has
constant-time deletions.
Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms]
Range (min … max): 1.6 ms … 2.1 ms 316 runs
Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms]
Range (min … max): 1.7 ms … 2.1 ms 294 runs
Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms]
Range (min … max): 1.9 ms … 2.5 ms 287 runs
Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms]
Range (min … max): 1.8 ms … 2.1 ms 217 runs
Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms]
Range (min … max): 224.6 ms … 245.2 ms 6 runs
Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms]
Range (min … max): 2.0 ms … 2.1 ms 3 runs
- Listing all refs shows no significant advantage for either of the
backends. The "files" backend is a bit faster, but not by a
significant margin. When repositories are not packed the "reftable"
backend outperforms the "files" backend because the "reftable"
backend performs auto-compaction.
Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.0 ms 1729 runs
Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 1.8 ms 1816 runs
Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms]
Range (min … max): 4.1 ms … 4.6 ms 645 runs
Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms]
Range (min … max): 4.2 ms … 5.9 ms 643 runs
Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s]
Range (min … max): 2.511 s … 2.627 s 10 runs
Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s]
Range (min … max): 2.692 s … 2.752 s 10 runs
Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 1.9 ms 1834 runs
Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.4 ms … 2.0 ms 1840 runs
Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms]
Range (min … max): 13.3 ms … 14.5 ms 208 runs
Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms]
Range (min … max): 4.3 ms … 6.2 ms 624 runs
Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s]
Range (min … max): 11.965 s … 12.370 s 10 runs
Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s]
Range (min … max): 2.769 s … 2.836 s 10 runs
- Printing a single ref shows no real difference between the "files"
and "reftable" backends.
Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms]
Range (min … max): 1.4 ms … 1.8 ms 1779 runs
Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.4 ms … 2.5 ms 1753 runs
Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms]
Range (min … max): 1.4 ms … 1.9 ms 1840 runs
Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.0 ms 1831 runs
Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.1 ms 1848 runs
Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms]
Range (min … max): 1.5 ms … 2.1 ms 1762 runs
So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.
The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".
There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.
As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.
[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/
Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of `refs_init_db()` is to initialize the on-disk files of a
new ref database. The function is quite inflexible right now though, as
callers can neither specify the `struct ref_store` nor can they pass any
flags.
Refactor the interface to accept both of these. This will be required so
that we can start initializing per-worktree ref databases via the ref
backend instead of open-coding the initialization in "worktree.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to look up ref storage backends, we're currently using a linked
list of backends, where each backend is expected to set up its `next`
pointer to the next ref storage backend. This is kind of a weird setup
as backends need to be aware of other backends without much of a reason.
Refactor the code so that the array of backends is centrally defined in
"refs.c", where each backend is now identified by an integer constant.
Expose functions to translate from those integer constants to the name
and vice versa, which will be required by subsequent patches.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that `refs_delete_refs` is implemented in a generic way via the ref
transaction interfaces there are no callers left that invoke the
`delete_refs` callback anymore. Remove it from all of our backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enumerating refs in the packed-refs file, while excluding refs that
match certain patterns, has been optimized.
* tb/refs-exclusion-and-packed-refs:
ls-refs.c: avoid enumerating hidden refs where possible
upload-pack.c: avoid enumerating hidden refs where possible
builtin/receive-pack.c: avoid enumerating hidden references
refs.h: implement `hidden_refs_to_excludes()`
refs.h: let `for_each_namespaced_ref()` take excluded patterns
revision.h: store hidden refs in a `strvec`
refs/packed-backend.c: add trace2 counters for jump list
refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
refs/packed-backend.c: refactor `find_reference_location()`
refs: plumb `exclude_patterns` argument throughout
builtin/for-each-ref.c: add `--exclude` option
ref-filter.c: parameterize match functions over patterns
ref-filter: add `ref_filter_clear()`
ref-filter: clear reachable list pointers after freeing
ref-filter.h: provide `REF_FILTER_INIT`
refs.c: rename `ref_filter`
The subsequent patch will want to access an optional `excluded_patterns`
array within `refs/packed-backend.c` that will cull out certain
references matching any of the given patterns on a best-effort basis.
To do so, the refs subsystem needs to be updated to pass this value
across a number of different locations.
Prepare for a future patch by introducing this plumbing now, passing
NULLs at top-level APIs in order to make that patch less noisy and more
easily readable.
Signed-off-by: Taylor Blau <me@ttaylorr.co>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At GitLab, we have a system that creates ephemeral internal refs that
don't live long before getting deleted. Having an option to exclude
certain refs from a packed-refs file allows these internal references to
be deleted much more efficiently.
Add an --exclude option to the pack-refs builtin, and use the ref
exclusions API to exclude certain refs from being packed into the final
packed-refs file
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had several header files include cache.h unnecessarily. Remove
those. These have all been verified via both ensuring that
gcc -E $HEADER | grep '"cache.h"'
found no hits and that
cat >temp.c <<EOF &&
#include "git-compat-util.h"
#include "$HEADER"
int main() {}
EOF
gcc -c temp.c
successfully compiles without warnings.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reading of symbolic and non-symbolic references is currently treated the
same in reference backends: we always call `refs_read_raw_ref()` and
then decide based on the returned flags what type it is. This has one
downside though: symbolic references may be treated different from
normal references in a backend from normal references. The packed-refs
backend for example doesn't even know about symbolic references, and as
a result it is pointless to even ask it for one.
There are cases where we really only care about whether a reference is
symbolic or not, but don't care about whether it exists at all or may be
a non-symbolic reference. But it is not possible to optimize for this
case right now, and as a consequence we will always first check for a
loose reference to exist, and if it doesn't, we'll query the packed-refs
backend for a known-to-not-be-symbolic reference. This is inefficient
and requires us to search all packed references even though we know to
not care for the result at all.
Introduce a new function `refs_read_symbolic_ref()` which allows us to
fix this case. This function will only ever return symbolic references
and can thus optimize for the scenario layed out above. By default, if
the backend doesn't provide an implementation for it, we just use the
old code path and fall back to `read_raw_ref()`. But in case the backend
provides its own, more efficient implementation, we will use that one
instead.
Note that this function is explicitly designed to not distinguish
between missing references and non-symbolic references. If it did, we'd
be forced to always search the packed-refs backend to see whether the
symbolic reference the user asked for really doesn't exist, or if it
exists as a non-symbolic reference.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because a deletion of ref would need to remove it from both the
loose ref store and the packed ref store, a delete-ref operation
that logically removes one ref may end up invoking ref-transaction
hook twice, which has been corrected.
* ps/avoid-unnecessary-hook-invocation-with-packed-refs:
refs: skip hooks when deleting uncovered packed refs
refs: do not execute reference-transaction hook on packing refs
refs: demonstrate excessive execution of the reference-transaction hook
refs: allow skipping the reference-transaction hook
refs: allow passing flags when beginning transactions
refs: extract packed_refs_delete_refs() to allow control of transaction
We do not currently have any flags when creating reference transactions,
but we'll add one to disable execution of the reference transaction hook
in some cases.
Allow passing flags to `ref_store_transaction_begin()` to prepare for
this change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "remainder" of hn/refs-errno-cleanup topic.
* ab/refs-errno-cleanup: (21 commits)
refs API: post-migration API renaming [2/2]
refs API: post-migration API renaming [1/2]
refs API: don't expose "errno" in run_transaction_hook()
refs API: make expand_ref() & repo_dwim_log() not set errno
refs API: make resolve_ref_unsafe() not set errno
refs API: make refs_ref_exists() not set errno
refs API: make refs_resolve_refdup() not set errno
refs tests: ignore ignore errno in test-ref-store helper
refs API: ignore errno in worktree.c's find_shared_symref()
refs API: ignore errno in worktree.c's add_head_info()
refs API: make files_copy_or_rename_ref() et al not set errno
refs API: make loose_fill_ref_dir() not set errno
refs API: make resolve_gitlink_ref() not set errno
refs API: remove refs_read_ref_full() wrapper
refs/files: remove "name exist?" check in lock_ref_oid_basic()
reflog tests: add --updateref tests
refs API: make refs_rename_ref_available() static
refs API: make parse_loose_ref_contents() not set errno
refs API: make refs_read_raw_ref() not set errno
refs API: add a version of refs_resolve_ref_unsafe() with "errno"
...
There is only one caller, builtin/checkout.c, and it hardcodes
force_create=1.
This argument was introduced in abd0cd3a30 (refs: new public ref function:
safe_create_reflog, 2015-07-21), which promised to immediately use it in a
follow-on commit, but that never happened.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Follow through the work to use the repo interface to access
submodule objects in-process, instead of abusing the alternate
object database interface.
* jt/no-abuse-alternate-odb-for-submodules:
submodule: trace adding submodule ODB as alternate
submodule: pass repo to check_has_commit()
object-file: only register submodule ODB if needed
merge-{ort,recursive}: remove add_submodule_odb()
refs: peeling non-the_repository iterators is BUG
refs: teach arbitrary repo support to iterators
refs: plumb repo into ref stores
Move the refs_rename_ref_available() function into
"refs/files-backend.c". It is file-backend specific.
This function was added in 5fe7d825da (refs.c: pass a list of names
to skip to is_refname_available, 2014-05-01) as rename_ref_available()
and was only ever used in this one file-backend specific codepath. So
let's move it there.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the parse_loose_ref_contents() function to stop setting "errno"
and failure, and to instead pass up a "failure_errno" via a
parameter. This requires changing its callers to do the same.
The EINVAL error from parse_loose_ref_contents is used in files-backend
to create a custom error message.
In untangling this we discovered a tricky edge case. The
refs_read_special_head() function was relying on
parse_loose_ref_contents() setting EINVAL.
By converting it to use "saved_errno" we can migrate away from "errno"
in this part of the code entirely, and do away with an existing
"save_errno" pattern, its only purpose was to not clobber the "errno"
we previously needed at the end of files_read_raw_ref().
Let's assert that we can do that by not having files_read_raw_ref()
itself operate on *failure_errno in addition to passing it on. Instead
we'll assert that if we return non-zero we actually do set errno, thus
assuring ourselves and callers that they can trust the resulting
"failure_errno".
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a "failure_errno" to refs_read_raw_ref(), his allows
refs_werrres_ref_unsafe() to pass along its "failure_errno", as a
first step before its own callers are migrated to pass it further up
the chain.
We are leaving out out the refs_read_special_head() in
refs_read_raw_ref() for now, as noted in a subsequent commit moving it
to "failure_errno" will require some special consideration.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.
* jk/ref-paranoia:
refs: drop "broken" flag from for_each_fullref_in()
ref-filter: drop broken-ref code entirely
ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
repack, prune: drop GIT_REF_PARANOIA settings
refs: turn on GIT_REF_PARANOIA by default
refs: omit dangling symrefs when using GIT_REF_PARANOIA
refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
refs-internal.h: move DO_FOR_EACH_* flags next to each other
t5312: be more assertive about command failure
t5312: test non-destructive repack
t5312: create bogus ref as necessary
t5312: drop "verbose" helper
t5600: provide detached HEAD for corruption failures
t5516: don't use HEAD ref for invalid ref-deletion tests
t7900: clean up some more broken refs
Note that should_pack_ref() is called when writing refs, which is only
supported for the_repository, hence the_repository is hardcoded there.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for the next 2 patches that adds (partial) support for
arbitrary repositories to ref iterators, plumb a repository into all ref
stores. There are no changes to program logic.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the DO_FOR_EACH_INCLUDE_BROKEN flag is used, we include both actual
corrupt refs (illegal names, missing objects), but also symrefs that
point to nothing. This latter is not really a corruption, but just
something that may happen normally. For example, the symref at
refs/remotes/origin/HEAD may point to a tracking branch which is later
deleted. (The local HEAD may also be unborn, of course, but we do not
access it through ref iteration).
Most callers of for_each_ref() etc, do not care. They don't pass
INCLUDE_BROKEN, so don't see it at all. But for those which do pass it,
this somewhat-normal state causes extra warnings (e.g., from
for-each-ref) or even aborts operations (destructive repacks with
GIT_REF_PARANOIA set).
This patch just introduces the flag and the mechanism; there are no
callers yet (and hence no tests). Two things to note on the
implementation:
- we actually skip any symref that does not resolve to a ref. This
includes ones which point to an invalidly-named ref. You could argue
this is a more serious breakage than simple dangling. But the
overall effect is the same (we could not follow the symref), as well
as the impact on things like REF_PARANOIA (either way, a symref we
can't follow won't impact reachability, because we'll see the ref
itself during iteration). The underlying resolution function doesn't
distinguish these two cases (they both get REF_ISBROKEN).
- we change the iterator in refs/files-backend.c where we check
INCLUDE_BROKEN. There's a matching spot in refs/packed-backend.c,
but we don't know need to do anything there. The packed backend does
not support symrefs at all.
The resulting set of flags might be a bit easier to follow if we broke
this down into "INCLUDE_CORRUPT_REFS" and "INCLUDE_DANGLING_SYMREFS".
But there are a few reasons not do so:
- adding a new OMIT_DANGLING_SYMREFS flag lets us leave existing
callers intact, without changing their behavior (and some of them
really do want to see the dangling symrefs; e.g., t5505 has a test
which expects us to report when a symref becomes dangling)
- they're not actually independent. You cannot say "include dangling
symrefs" without also including refs whose objects are not
reachable, because dangling symrefs by definition do not have an
object. We could tweak the implementation to distinguish this, but
in practice nobody wants to ask for that. Adding the OMIT flag keeps
the implementation simple and makes sure we don't regress the
current behavior.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for the DO_FOR_EACH_* flags is sprinkled over the
refs-internal.h file. We define the two flags in one spot, and then
describe them in more detail far away from there, in the definitions of
refs_ref_iterator_begin() and ref_iterator_advance_fn().
Let's try to organize this a bit better:
- convert the #defines to an enum. This makes it clear that they are
related, and that the enum shows the complete set of flags.
- combine all descriptions for each flag in a single spot, next to the
flag's definition
- use the enum rather than a bare int for functions which take the
flags. This helps readers realize which flags can be used.
- clarify the mention of flags for ref_iterator_advance_fn(). It does
not take flags itself, but is meant to depend on ones set up
earlier.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are currently two DO_FOR_EACH_* flags, which must not have their
bits overlap. Yet they're defined hundreds of lines apart. Let's move
them next to each other to make it clear that they are related and are a
complete set (which matters if you are adding a new flag and would like
to know what the next available bit is).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it explicit how alternative ref backends should report errors in
read_raw_ref_fn.
read_raw_ref_fn needs to supply a credible errno for a number of cases. These
are primarily:
1) The files backend calls read_raw_ref from lock_raw_ref, and uses the
resulting error codes to create/remove directories as needed.
2) ENOENT should be translated in a zero OID, optionally with REF_ISBROKEN set,
returning the last successfully resolved symref. This is necessary so
read_raw_ref("HEAD") on an empty repo returns refs/heads/main (or the default branch
du-jour), and we know on which branch to create the first commit.
Make this information flow explicit by adding a failure_errno to the signature
of read_raw_ref. All errnos from the files backend are still propagated
unchanged, even though inspection suggests only ENOTDIR, EISDIR and ENOENT are
relevant.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit does not change code; it documents the fact that an alternate ref
backend does not need to return EINVAL from read_raw_ref_fn to function
properly.
This is correct, because refs_read_raw_ref is only called from;
* resolve_ref_unsafe(), which does not care for the EINVAL errno result.
* refs_verify_refname_available(), which does not inspect errno.
* files-backend.c, where errno is overwritten on failure.
* packed-backend.c (is_packed_transaction_needed), which calls it for the
packed ref backend, which never emits EINVAL.
A grep for EINVAL */*c reveals that no code checks errno against EINVAL after
reading references. In addition, the refs.h file does not mention errno at all.
A grep over resolve_ref_unsafe() turned up the following callers that inspect
errno:
* sequencer.c::print_commit_summary, which uses it for die_errno
* lock_ref_oid_basic(), which only treats EISDIR and ENOTDIR specially.
The files ref backend does use EINVAL. The files backend does not call into
the generic API (refs_read_raw), but into the files-specific function
(files_read_raw_ref), which we are not changing in this commit.
As the errno sideband is unintuitive and error-prone, remove EINVAL
value, as a step towards getting rid of the errno sideband altogether.
Spotted by Ævar Arnfjörð Bjarmason <avarab@gmail.com>.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the the preceding commit the "oid" parameter to reflog_expire()
is always NULL, but it was not cleaned up to reduce the size of the
diff. Let's do that subsequent API and documentation cleanup now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use -1 as error return value throughout.
This removes spurious differences in the GIT_TRACE_REFS output, depending on the
ref storage backend active.
Before, the cached ref_iterator (but only that iterator!) would return
peel_object() output directly. No callers relied on the peel_status values
beyond success/failure. All calls to these functions go through
peel_iterated_oid(), which returns peel_object() as a fallback, but also
squashing the error values.
The iteration interface already passes REF_ISSYMREF and REF_ISBROKEN through the
flags argument, so the additional error values in enum peel_status provide no
value.
The ref iteration interface provides a separate peel() function because certain
formats (eg. packed-refs and reftable) can store the peeled object next to the
tag SHA1. Passing the peeled SHA1 as an optional argument to each_ref_fn maps
more naturally to the implementation of ref databases. Changing the code in this
way is left for a future refactoring.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When set in the environment, GIT_TRACE_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.
Example:
$ GIT_TRACE_REFS="1" ./git branch
15:42:09.769631 refs/debug.c:26 ref_store for .git
15:42:09.769681 refs/debug.c:249 read_raw_ref: HEAD: 0000000000000000000000000000000000000000 (=> refs/heads/ref-debug) type 1: 0
15:42:09.769695 refs/debug.c:249 read_raw_ref: refs/heads/ref-debug: 3a238e539b (=> refs/heads/ref-debug) type 0: 0
15:42:09.770282 refs/debug.c:233 ref_iterator_begin: refs/heads/ (0x1)
15:42:09.770290 refs/debug.c:189 iterator_advance: refs/heads/b4 (0)
15:42:09.770295 refs/debug.c:189 iterator_advance: refs/heads/branch3 (0)
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog.
Other ref backends will need to duplicate this logic too, so move it to a
central place.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares for handling FETCH_HEAD (which is not a regular ref)
separately from the ref backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regarding reflog messages:
- We expect that a reflog message consists of a single line. The
file format used by the files backend may add a LF after the
message as a delimiter, and output by commands like "git log -g"
may complete such an incomplete line by adding a LF at the end,
but philosophically, the terminating LF is not a part of the
message.
- We however allow callers of refs API to supply a random sequence
of NUL terminated bytes. We cleanse caller-supplied message by
squashing a run of whitespaces into a SP, and by trimming trailing
whitespace, before storing the message. This is how we tolerate,
instead of erring out, a message with LF in it (be it at the end,
in the middle, or both).
Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out. This is sufficient with the
current code, as that is the only backend that writes reflogs. But
new backends can be added that write reflogs, and we'd want the
resulting log message we would read out of "log -g" the same no
matter what backend is used, and moving the code to do so to the
generic layer is a way to do so.
An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow
multi-line log messages if we wanted to, and when that happens, it
would help a lot to ensure we covered all bases if the cleansing
function (which would be updated) is called from the generic layer.
Side note: I am not interested in supporting multi-line reflog
messages right at the moment (nobody is asking for it), but I
envision that instead of the "squash a run of whitespaces into a SP
and rtrim" cleansing, we can %urlencode problematic bytes in the
message *AND* append a SP at the end, when a new version of Git that
supports multi-line and/or verbatim reflog messages writes a reflog
record. The reading side can detect the presense of SP at the end
(which should have been rtrimmed out if it were written by existing
versions of Git) as a signal that decoding %urlencode recovers the
original reflog message.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document some of the flag options in refs_ref_iterator_begin, and explain how
ref_iterator_advance_fn should handle them.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 60ce76d358 (refs: add repository argument to for_each_replace_ref,
2018-04-11) and 0d296c57ae (refs: allow for_each_replace_ref to handle
arbitrary repositories, 2018-04-11), for_each_replace_ref learned how
to iterate over refs by a given arbitrary repository.
New attempts in the object store conversion have shown that it is useful
to have the repository handle available that the refs iteration is
currently iterating over.
To achieve this goal we will need to add a repository argument to
each_ref_fn in refs.h. However as many callers rely on the signature
such a patch would be too large.
So convert the internals of the ref subsystem first to pass through a
repository argument without exposing the change to the user. Assume
the_repository for the passed through repository, although it is not
used anywhere yet.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>