Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since a64215b6cd ("object.h: stop depending on cache.h; make
cache.h depend on object.h", 2023-02-24), we have a few headers that
could have replaced their include of cache.h with an include of
object.h. Make that change now.
Some C files had to start including cache.h after this change (or some
smaller header it had brought in), because the C files were depending
on things from cache.h but were only formerly implicitly getting
cache.h through one of these headers being modified in this patch.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions were all defined in a separate ident.c already, so
create ident.h and move the declarations into that file.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had several header files include cache.h unnecessarily. Remove
those. These have all been verified via both ensuring that
gcc -E $HEADER | grep '"cache.h"'
found no hits and that
cat >temp.c <<EOF &&
#include "git-compat-util.h"
#include "$HEADER"
int main() {}
EOF
gcc -c temp.c
successfully compiles without warnings.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had several C files ignoring the rule to include one of the
appropriate headers first; fix that.
While at it, the rule in Documentation/CodingGuidelines about which
header to include has also fallen out of sync, so update the wording to
mention other allowed headers.
Unfortunately, C files in reftable/ don't actually follow the previous
or updated rule. If you follow the #include chain in its C files,
reftable/system.h _tends_ to be first (i.e. record.c first includes
record.h, which first includes basics.h, which first includees
system.h), but not always (e.g. publicbasics.c includes another header
first that does not include system.h). However, I'm going to punt on
making actual changes to the C files in reftable/ since I do not want to
risk bringing it out-of-sync with any version being used externally.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the sequence to fsync $GIT_DIR/packed-refs file that forgot to
flush its output to the disk..
* ps/fsync-refs-fix:
refs: fix corruption by not correctly syncing packed-refs to disk
At GitLab we have recently received a report where a repository was left
with a corrupted `packed-refs` file after the node hard-crashed even
though `core.fsync=reference` was set. This is something that in theory
should not happen if we correctly did the atomic-rename dance to:
1. Write the data into a temporary file.
2. Synchronize the temporary file to disk.
3. Rename the temporary file into place.
So if we crash in the middle of writing the `packed-refs` file we should
only ever see either the old or the new state of the file.
And while we do the dance when writing the `packed-refs` file, there is
indeed one gotcha: we use a `FILE *` stream to write the temporary file,
but don't flush it before synchronizing it to disk. As a consequence any
data that is still buffered will not get synchronized and a crash of the
machine may cause corruption.
Fix this bug by flushing the file stream before we fsync.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to handle worktree refs (worktrees/NAME/REF and
main-worktree/REF) existed in two places:
* ref_type() in refs.c
* parse_worktree_ref() in worktree.c
Collapse this logic together in one function parse_worktree_ref():
this avoids having to cross-check the result of parse_worktree_ref()
and ref_type().
Introduce enum ref_worktree_type, which is slightly different from
enum ref_type. The latter is a misleading name (one would think that
'ref_type' would have the symref option).
Instead, enum ref_worktree_type only makes explicit how a refname
relates to a worktree. From this point of view, HEAD and
refs/bisect/abc are the same: they specify the current worktree
implicitly.
The files-backend must avoid packing refs/bisect/* and friends into
packed-refs, so expose is_per_worktree_ref() separately.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs code uses various polymorphic types (e.g., loose vs packed
ref_stores, abstracted iterators). Not every virtual function or
callback needs all of its parameters. Let's mark the unused ones to
quiet -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.
* ep/maint-equals-null-cocci:
tree-wide: apply equals-null.cocci
tree-wide: apply equals-null.cocci
contrib/coccinnelle: add equals-null.cocci
Code clean-up.
* ab/refs-various-fixes:
refs debug: add a wrapper for "read_symbolic_ref"
packed-backend: remove stub BUG(...) functions
misc *.c: use designated initializers for struct assignments
refs: use designated initializers for "struct ref_iterator_vtable"
refs: use designated initializers for "struct ref_storage_be"
Updates to refs traditionally weren't fsync'ed, but we can
configure using core.fsync variable to do so.
* ps/fsync-refs:
core.fsync: new option to harden references
In cd475b3b03 (refs: add ability for backends to special-case reading
of symbolic refs, 2022-03-01) when the "read_symbolic_ref" callback
was added we'd fall back on "refs_read_raw_ref" if there wasn't any
backend implementation of "read_symbolic_ref".
As discussed in the preceding commit this would only happen if we were
running the "debug" backend, e.g. in the "setup for ref completion"
test in t9902-completion.sh with:
GIT_TRACE_REFS=1 git fetch --no-tags other
Let's improve the trace output, but and also eliminate the
now-redundant refs_read_raw_ref() fallback case. As noted in the
preceding commit the "packed" backend will never call
refs_read_symbolic_ref() (nor is it ever going to). For any future
backend such as reftable it's OK to ask that they either implement
this (or a wrapper) themselves.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the stub BUG(...) functions previously used by the "struct
ref_storage_be refs_be_packed" backend.
We never call any functions in the packed backend by using it as a
"normal" primary ref store, instead we'll always initialize a "files"
backend ref-store.
It will then via the "packed_ref_store" member of "struct
files_ref_store" call selected functions in the "packed" backend, and
we'll in addition call others via wrappers in refs.c.
So while these would arguably give us *slightly* more meaningful error
messages we'll NULL the missing members in the initializer anyway, so
we'll reliably get a segfault if we're ever changing the backend and
having it call something it doesn't have.
So there's no need for this verbose boilerplate, and as shown in a
subsequent commit it might even lead to some confusion about the
packed backend being a "real" backend. Let's make it clear that it's
not.
As an aside, this also fixes a warning emitted by SunCC in at least
versions 12.5 and 12.6 of Oracle Developer Studio:
"refs/packed-backend.c", line 1599: warning: Function has no return statement : packed_create_symref
"refs/packed-backend.c", line 1606: warning: Function has no return statement : packed_rename_ref)
"refs/packed-backend.c", line 1613: warning: Function has no return statement : packed_copy_ref
"refs/packed-backend.c", line 1648: warning: Function has no return statement : packed_create_reflog
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the definition of the three refs backends we currently carry to
use designated initializers.
The "= NULL" assignments being retained here are redundant, and could
be removed, but let's keep them for clarity. All of these backends
define almost all fields, so we're not saving much in terms of line
count by omitting these, but e.g. for "refs_be_debug" it's immediately
apparent that we're omitting "init" when comparing its assignment to
the others.
This is a follow-up to similar work merged in bd4232fac3 (Merge
branch 'ab/struct-init', 2021-07-16), a4b9fb6a5c (Merge branch
'ab/designated-initializers-more', 2021-10-18) and a30321b9ea (Merge
branch 'ab/designated-initializers' into
ab/designated-initializers-more, 2021-09-27).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing both loose and packed references to disk we first create a
lockfile, write the updated values into that lockfile, and on commit we
rename the file into place. According to filesystem developers, this
behaviour is broken because applications should always sync data to disk
before doing the final rename to ensure data consistency [1][2][3]. If
applications fail to do this correctly, a hard crash of the machine can
easily result in corrupted on-disk data.
This kind of corruption can in fact be easily observed with Git when the
machine hard-resets shortly after writing references to disk. On
machines with ext4, this will likely lead to the "empty files" problem:
the file has been renamed, but its data has not been synced to disk. The
result is that the reference is corrupt, and in the worst case this can
lead to data loss.
Implement a new option to harden references so that users and admins can
avoid this scenario by syncing locked loose and packed references to
disk before we rename them into place.
[1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading references via `files_read_raw_ref()` we always consult
both the loose reference, and if that wasn't found, we also consult the
packed-refs file. While this makes sense to read a generic reference, it
is wasteful in the case where we only care about symbolic references:
the packed-refs backend does not support them, and thus it cannot ever
return one for us.
Special-case reading of symbolic references for the files backend such
that we always skip asking the packed-refs backend.
We use `refs_read_symbolic_ref()` extensively to determine whether we
need to skip updating local symbolic references during a fetch, which is
why the change results in a significant speedup when doing fetches in
repositories with huge numbers of references. The following benchmark
executes a mirror-fetch in a repository with about 2 million references
via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`:
Benchmark 1: HEAD~
Time (mean ± σ): 68.372 s ± 2.344 s [User: 65.629 s, System: 8.786 s]
Range (min … max): 65.745 s … 70.246 s 3 runs
Benchmark 2: HEAD
Time (mean ± σ): 60.259 s ± 0.343 s [User: 61.019 s, System: 7.245 s]
Range (min … max): 60.003 s … 60.649 s 3 runs
Summary
'HEAD' ran
1.13 ± 0.04 times faster than 'HEAD~'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reading of symbolic and non-symbolic references is currently treated the
same in reference backends: we always call `refs_read_raw_ref()` and
then decide based on the returned flags what type it is. This has one
downside though: symbolic references may be treated different from
normal references in a backend from normal references. The packed-refs
backend for example doesn't even know about symbolic references, and as
a result it is pointless to even ask it for one.
There are cases where we really only care about whether a reference is
symbolic or not, but don't care about whether it exists at all or may be
a non-symbolic reference. But it is not possible to optimize for this
case right now, and as a consequence we will always first check for a
loose reference to exist, and if it doesn't, we'll query the packed-refs
backend for a known-to-not-be-symbolic reference. This is inefficient
and requires us to search all packed references even though we know to
not care for the result at all.
Introduce a new function `refs_read_symbolic_ref()` which allows us to
fix this case. This function will only ever return symbolic references
and can thus optimize for the scenario layed out above. By default, if
the backend doesn't provide an implementation for it, we just use the
old code path and fall back to `read_raw_ref()`. But in case the backend
provides its own, more efficient implementation, we will use that one
instead.
Note that this function is explicitly designed to not distinguish
between missing references and non-symbolic references. If it did, we'd
be forced to always search the packed-refs backend to see whether the
symbolic reference the user asked for really doesn't exist, or if it
exists as a non-symbolic reference.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because a deletion of ref would need to remove it from both the
loose ref store and the packed ref store, a delete-ref operation
that logically removes one ref may end up invoking ref-transaction
hook twice, which has been corrected.
* ps/avoid-unnecessary-hook-invocation-with-packed-refs:
refs: skip hooks when deleting uncovered packed refs
refs: do not execute reference-transaction hook on packing refs
refs: demonstrate excessive execution of the reference-transaction hook
refs: allow skipping the reference-transaction hook
refs: allow passing flags when beginning transactions
refs: extract packed_refs_delete_refs() to allow control of transaction
Remove the now-unused "failure_errno" parameter from the
refs_resolve_ref_unsafe() signature. In my recent 96f6623ada (Merge
branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its
callers explicitly request the errno via an output parameter.
As that series shows all but one caller ended up passing in a
boilerplate "ignore_errno", since they only cared about whether the
return value was NULL or not, i.e. if the ref could be resolved.
There was one small issue with that series fixed with a follow-up in
31e3912369 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small
bug in that series was fixed.
After those two there was one caller left in sequencer.c that used the
"failure_errno', but as of the preceding commit it uses a boilerplate
"ignore_errno" instead.
This leaves the public refs API without any use of "failure_errno" at
all. We could still do with a bit of cleanup and generalization
between refs.c and refs/files-backend.c before the "reftable"
integration lands, but that's all internal to the reference code
itself.
So let's remove this output parameter. Not only isn't it used now, but
it's unlikely that we'll want it again in the future. We'd like to
slowly move the refs API to a more file-backend independent way of
communicating error codes, having it use a "failure_errno" was only
the first step in that direction. If this or any other function needs
to communicate what specifically is wrong with the requested "refname"
it'll be better to have the function set some output enum of
well-defined error states than piggy-backend on "errno".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When deleting refs from the loose-files refs backend, then we need to be
careful to also delete the same ref from the packed refs backend, if it
exists. If we don't, then deleting the loose ref would "uncover" the
packed ref. We thus always have to queue up deletions of refs for both
the loose and the packed refs backend. This is done in two separate
transactions, where the end result is that the reference-transaction
hook is executed twice for the deleted refs.
This behaviour is quite misleading: it's exposing implementation details
of how the files backend works to the user, in contrast to the logical
updates that we'd really want to expose via the hook. Worse yet, whether
the hook gets executed once or twice depends on how well-packed the
repository is: if the ref only exists as a loose ref, then we execute it
once, otherwise if it is also packed then we execute it twice.
Fix this behaviour and don't execute the reference-transaction hook at
all when refs in the packed-refs backend if it's driven by the files
backend. This works as expected even in case the refs to be deleted only
exist in the packed-refs backend because the loose-backend always queues
refs in its own transaction even if they don't exist such that they can
be locked for concurrent creation. And it also does the right thing in
case neither of the backends has the ref because that would cause the
transaction to fail completely.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference-transaction hook is supposed to track logical changes to
references, but it currently also gets executed when packing refs in a
repository. This is unexpected and ultimately not all that useful:
packing refs is not supposed to result in any user-visible change to the
refs' state, and it ultimately is an implementation detail of how refs
stores work.
Fix this excessive execution of the hook when packing refs.
Reported-by: Waleed Khan <me@waleedkhan.name>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not currently have any flags when creating reference transactions,
but we'll add one to disable execution of the reference transaction hook
in some cases.
Allow passing flags to `ref_store_transaction_begin()` to prepare for
this change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When deleting loose refs, then we also have to delete the refs in the
packed backend. This is done by calling `refs_delete_refs()`, which
then uses the packed-backend's logic to delete refs. This doesn't allow
us to exercise any control over the reference transaction which is being
created in the packed backend, which is required in a subsequent commit.
Extract a new function `packed_refs_delete_refs()`, which hosts most of
the logic to delete refs except for creating the transaction itself.
Like this, we can easily create the transaction in the files backend
and thus exert more control over it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a logic error in refs_resolve_ref_unsafe() introduced in a recent
series of mine to abstract the refs API away from errno. See
96f6623ada (Merge branch 'ab/refs-errno-cleanup', 2021-11-29)for that
series.
In that series introduction of "failure_errno" to
refs_resolve_ref_unsafe came in ef18119dec (refs API: add a version
of refs_resolve_ref_unsafe() with "errno", 2021-10-16). There we'd set
"errno = 0" immediately before refs_read_raw_ref(), and then set
"failure_errno" to "errno" if errno was non-zero afterwards.
Then in the next commit 8b72fea7e9 (refs API: make
refs_read_raw_ref() not set errno, 2021-10-16) we started expecting
"refs_read_raw_ref()" to set "failure_errno". It would do that if
refs_read_raw_ref() failed, but it wouldn't be the same errno.
So we might set the "errno" here to any arbitrary bad value, and end
up e.g. returning NULL when we meant to return the refname from
refs_resolve_ref_unsafe(), or the other way around. Instrumenting this
code will reveal cases where refs_read_raw_ref() will fail, and
"errno" and "failure_errno" will be set to different values.
In practice I haven't found a case where this scary bug changed
anything in practice. The reason for that is that we'll not care about
the actual value of "errno" here per-se, but only whether:
1. We have an errno
2. If it's one of ENOENT, EISDIR or ENOTDIR. See the adjacent code
added in a1c1d8170d (refs_resolve_ref_unsafe: handle d/f
conflicts for writes, 2017-10-06)
I.e. if we clobber "failure_errno" with "errno", but it happened to be
one of those three, and we'll clobber it with another one of the three
we were OK.
Perhaps there are cases where the difference ended up mattering, but I
haven't found them. Instrumenting the test suite to fail if "errno"
and "failure_errno" are different shows a lot of failures, checking if
they're different *and* one is but not the other is outside that list
of three "errno" values yields no failures.
But let's fix the obvious bug. We should just stop paying attention to
"errno" in refs_resolve_ref_unsafe(). In addition let's change the
partial resetting of "errno" in files_read_raw_ref() to happen just
before the "return", to ensure that any such bug will be more easily
spotted in the future.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code refactoring in the reflog part of refs API.
* ab/reflog-prep:
reflog + refs-backend: move "verbose" out of the backend
refs files-backend: assume cb->newlog if !EXPIRE_REFLOGS_DRY_RUN
reflog: reduce scope of "struct rev_info"
reflog expire: don't use lookup_commit_reference_gently()
reflog expire: refactor & use "tip_commit" only for UE_NORMAL
reflog expire: use "switch" over enum values
reflog: change one->many worktree->refnames to use a string_list
reflog expire: narrow scope of "cb" in cmd_reflog_expire()
reflog delete: narrow scope of "cmd" passed to count_reflog_ent()
Move the handling of the "verbose" flag entirely out of
"refs/files-backend.c" and into "builtin/reflog.c". This allows the
backend to stop knowing about the EXPIRE_REFLOGS_VERBOSE flag.
The expire_reflog_ent() function shouldn't need to deal with the
implementation detail of whether or not we're emitting verbose output,
by doing this the --verbose output becomes backend-agnostic, so
reftable will get the same output.
I think the output is rather bad currently, and should e.g. be
implemented with some better future mode of progress.[ch], but that's
a topic for another improvement.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not possible for "cb->newlog" to be NULL if
!EXPIRE_REFLOGS_DRY_RUN, since files_reflog_expire() would have
error()'d and taken the "goto failure" branch if it couldn't open the
file. By not using the "newlog" field private to "file-backend.c"'s
"struct expire_reflog_cb", we can move this verbosity logging to
"builtin/reflog.c" in a subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is consistent with the calling convention for ref backend creation, and
avoids storing ".git/packed-refs" (the name of a regular file) in a variable called
ref_store::gitdir.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test helper for refs subsystem learned to write bogus and/or
nonexistent object name to refs to simulate error situations we
want to test Git in.
* hn/allow-bogus-oid-in-ref-tests:
t1430: create valid symrefs using test-helper
t1430: remove refs using test-tool
refs: introduce REF_SKIP_REFNAME_VERIFICATION flag
refs: introduce REF_SKIP_OID_VERIFICATION flag
refs: update comment.
test-ref-store: plug memory leak in cmd_delete_refs
test-ref-store: parse symbolic flag constants
test-ref-store: remove force-create argument for create-reflog
Prepare tests on ref API to help testing reftable backends.
* hn/reflog-tests:
refs/debug: trim trailing LF from reflog message
test-ref-store: tweaks to for-each-reflog-ent format
t1405: check for_each_reflog_ent_reverse() more thoroughly
test-ref-store: don't add newline to reflog message
show-branch: show reflog message
This lets the ref-store test helper write non-existent or unparsable objects
into the ref storage.
Use this to make t1006 and t3800 independent of the files storage backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
REF_IS_PRUNING is right below this comment, so it clearly does not belong in
this comment. This was apparently introduced in commit 5ac95fee (Nov 5, 2017
"refs: tidy up and adjust visibility of the `ref_update` flags").
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On iteration, the reflog message is always terminated by a newline. Trim it to
avoid clobbering the console with is this extra newline.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "remainder" of hn/refs-errno-cleanup topic.
* ab/refs-errno-cleanup: (21 commits)
refs API: post-migration API renaming [2/2]
refs API: post-migration API renaming [1/2]
refs API: don't expose "errno" in run_transaction_hook()
refs API: make expand_ref() & repo_dwim_log() not set errno
refs API: make resolve_ref_unsafe() not set errno
refs API: make refs_ref_exists() not set errno
refs API: make refs_resolve_refdup() not set errno
refs tests: ignore ignore errno in test-ref-store helper
refs API: ignore errno in worktree.c's find_shared_symref()
refs API: ignore errno in worktree.c's add_head_info()
refs API: make files_copy_or_rename_ref() et al not set errno
refs API: make loose_fill_ref_dir() not set errno
refs API: make resolve_gitlink_ref() not set errno
refs API: remove refs_read_ref_full() wrapper
refs/files: remove "name exist?" check in lock_ref_oid_basic()
reflog tests: add --updateref tests
refs API: make refs_rename_ref_available() static
refs API: make parse_loose_ref_contents() not set errno
refs API: make refs_read_raw_ref() not set errno
refs API: add a version of refs_resolve_ref_unsafe() with "errno"
...
There is only one caller, builtin/checkout.c, and it hardcodes
force_create=1.
This argument was introduced in abd0cd3a30 (refs: new public ref function:
safe_create_reflog, 2015-07-21), which promised to immediately use it in a
follow-on commit, but that never happened.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Follow through the work to use the repo interface to access
submodule objects in-process, instead of abusing the alternate
object database interface.
* jt/no-abuse-alternate-odb-for-submodules:
submodule: trace adding submodule ODB as alternate
submodule: pass repo to check_has_commit()
object-file: only register submodule ODB if needed
merge-{ort,recursive}: remove add_submodule_odb()
refs: peeling non-the_repository iterators is BUG
refs: teach arbitrary repo support to iterators
refs: plumb repo into ref stores
Rename the transitory refs_werrres_ref_unsafe() function to
refs_resolve_ref_unsafe(), now that all callers of the old function
have learned to pass in a "failure_errno" parameter.
The coccinelle semantic patch added in the preceding commit works, but
I couldn't figure out how to get spatch(1) to re-flow these argument
lists (and sometimes make lines way too long), so this rename was done
with:
perl -pi -e 's/refs_werrres_ref_unsafe/refs_resolve_ref_unsafe/g' \
$(git grep -l refs_werrres_ref_unsafe -- '*.c')
But after that "make contrib/coccinelle/refs.cocci.patch" comes up
empty, so the result would have been the same. Let's remove that
transitory semantic patch file, we won't need to retain it for any
other in-flight changes, refs_werrres_ref_unsafe() only existed within
this patch series.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of the callers of rename_ref() and copy_ref() care about errno,
and as seen in the context here we already emit our own non-errno
using error() in the case where we'd use it.
So let's have it explicitly ignore errno, and do the same in
commit_ref_update(), which is only used within other code in
files_copy_or_rename_ref() itself which doesn't care about errno
either.
It might actually be sensible to have the callers use errno if the
failure was filesystem-specific, and with the upcoming reftable
backend we don't want to rely on that sort of thing, so let's keep
ignoring that for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the refs_resolve_ref_unsafe() invoked in loose_fill_ref_dir()
to a form that ignores errno. The only eventual caller of this
function is create_ref_cache(), whose callers in turn don't have their
failure depend on any errno set here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the refs_read_ref_full() wrapper in favor of migrating various
refs.c API users to the underlying refs_werrres_ref_unsafe() function.
A careful reading of these callers shows that the callers of this
function did not care about "errno", by moving away from the
refs_resolve_ref_unsafe() wrapper we can be sure that nothing relies
on it anymore.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In lock_ref_oid_basic() we'll happily lock a reference that doesn't
exist yet. That's normal, and is how references are initially born,
but we don't need to retain checks here in lock_ref_oid_basic() about
the state of the ref, when what we're checking is either checked
already, or something we're about to discover by trying to lock the
ref with raceproof_create_file().
The one exception is the caller in files_reflog_expire(), who passes
us a "type" to find out if the reference is a symref or not. We can
move the that logic over to that caller, which can now defer its
discovery of whether or not the ref is a symref until it's needed. In
the preceding commit an exhaustive regression test was added for that
case in a new test in "t1417-reflog-updateref.sh".
The improved diagnostics here were added in
5b2d8d6f21 (lock_ref_sha1_basic(): improve diagnostics for ref D/F
conflicts, 2015-05-11), and then much of the surrounding code went
away recently in my 245fbba46d (refs/files: remove unused "errno ==
EISDIR" code, 2021-08-23).
The refs_resolve_ref_unsafe() code being removed here looks like it
should be tasked with doing that, but it's actually redundant to other
code.
The reason for that is as noted in 245fbba46d this once widely used
function now only has a handful of callers left, which all handle this
case themselves.
To the extent that we're racy between their check and ours removing
this check actually improves the situation, as we'll be doing fewer
things between the not-under-lock initial check and acquiring the
lock.
Why this is OK for all the remaining callers of lock_ref_oid_basic()
is noted below. There are only two of those callers:
* "git branch -[cm] <oldbranch> <newbranch>":
In files_copy_or_rename_ref() we'll call this when we copy or rename
refs via rename_ref() and copy_ref(). but only after we've checked
if the refname exists already via its own call to
refs_resolve_ref_unsafe() and refs_rename_ref_available().
As the updated comment to the latter here notes neither of those are
actually needed. If we delete not only this code but also
refs_rename_ref_available() we'll do just fine, we'll just emit a
less friendly error message if e.g. "git branch -m A B/C" would have
a D/F conflict with a "B" file.
Actually we'd probably die before that in case reflogs for the
branch existed, i.e. when the try to rename() or copy_file() the
relevant reflog, since if we've got a D/F conflict with a branch
name we'll probably also have the same with its reflogs (but not
necessarily, we might have reflogs, but it might not).
As some #leftoverbits that code seems buggy to me, i.e. the reflog
"protocol" should be to get a lock on the main ref, and then perform
ref and/or reflog operations. That code dates back to
c976d415e5 (git-branch: add options and tests for branch renaming,
2006-11-28) and probably pre-dated the solidifying of that
convention. But in any case, that edge case is not our bug or
problem right now.
* "git reflog expire <ref>":
In files_reflog_expire() we'll call this without previous ref
existence checking in files-backend.c, but that code is in turn
called by code that's just finished checking if the refname whose
reflog we're expiring exists.
See ae35e16cd4 (reflog expire: don't lock reflogs using previously
seen OID, 2021-08-23) for the current state of that code, and
5e6f003ca8 (reflog_expire(): ignore --updateref for symbolic
references, 2015-03-03) for the code we'd break if we only did a
"update = !!ref" here, which is covered by the aforementioned
regression test in "t1417-reflog-updateref.sh".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the refs_rename_ref_available() function into
"refs/files-backend.c". It is file-backend specific.
This function was added in 5fe7d825da (refs.c: pass a list of names
to skip to is_refname_available, 2014-05-01) as rename_ref_available()
and was only ever used in this one file-backend specific codepath. So
let's move it there.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the parse_loose_ref_contents() function to stop setting "errno"
and failure, and to instead pass up a "failure_errno" via a
parameter. This requires changing its callers to do the same.
The EINVAL error from parse_loose_ref_contents is used in files-backend
to create a custom error message.
In untangling this we discovered a tricky edge case. The
refs_read_special_head() function was relying on
parse_loose_ref_contents() setting EINVAL.
By converting it to use "saved_errno" we can migrate away from "errno"
in this part of the code entirely, and do away with an existing
"save_errno" pattern, its only purpose was to not clobber the "errno"
we previously needed at the end of files_read_raw_ref().
Let's assert that we can do that by not having files_read_raw_ref()
itself operate on *failure_errno in addition to passing it on. Instead
we'll assert that if we return non-zero we actually do set errno, thus
assuring ourselves and callers that they can trust the resulting
"failure_errno".
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a "failure_errno" to refs_read_raw_ref(), his allows
refs_werrres_ref_unsafe() to pass along its "failure_errno", as a
first step before its own callers are migrated to pass it further up
the chain.
We are leaving out out the refs_read_special_head() in
refs_read_raw_ref() for now, as noted in a subsequent commit moving it
to "failure_errno" will require some special consideration.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.
* jk/ref-paranoia:
refs: drop "broken" flag from for_each_fullref_in()
ref-filter: drop broken-ref code entirely
ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
repack, prune: drop GIT_REF_PARANOIA settings
refs: turn on GIT_REF_PARANOIA by default
refs: omit dangling symrefs when using GIT_REF_PARANOIA
refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
refs-internal.h: move DO_FOR_EACH_* flags next to each other
t5312: be more assertive about command failure
t5312: test non-destructive repack
t5312: create bogus ref as necessary
t5312: drop "verbose" helper
t5600: provide detached HEAD for corruption failures
t5516: don't use HEAD ref for invalid ref-deletion tests
t7900: clean up some more broken refs
There is currently no support for peeling the current ref of an iterator
iterating over a non-the_repository ref store, and none is needed. Thus,
for now, BUG() if that happens.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Note that should_pack_ref() is called when writing refs, which is only
supported for the_repository, hence the_repository is hardcoded there.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for the next 2 patches that adds (partial) support for
arbitrary repositories to ref iterators, plumb a repository into all ref
stores. There are no changes to program logic.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Futz with the way 'errno' is relied on in the refs API to carry the
failure modes up the call chain.
* hn/refs-errno-cleanup:
refs: make errno output explicit for read_raw_ref_fn
refs/files-backend: stop setting errno from lock_ref_oid_basic
refs: remove EINVAL errno output from specification of read_raw_ref_fn
refs file backend: move raceproof_create_file() here
Continued work on top of the hn/refs-errno-cleanup topic.
* ab/refs-files-cleanup:
refs/files: remove unused "errno != ENOTDIR" condition
refs/files: remove unused "errno == EISDIR" code
refs/files: remove unused "oid" in lock_ref_oid_basic()
refs API: remove OID argument to reflog_expire()
reflog expire: don't lock reflogs using previously seen OID
refs/files: add a comment about refs_reflog_exists() call
refs: make repo_dwim_log() accept a NULL oid
refs/debug: re-indent argument list for "prepare"
refs/files: remove unused "skip" in lock_raw_ref() too
refs/files: remove unused "extras/skip" in lock_ref_oid_basic()
refs: drop unused "flags" parameter to lock_ref_oid_basic()
refs/files: remove unused REF_DELETING in lock_ref_oid_basic()
refs/packet: add missing BUG() invocations to reflog callbacks
Remove the now-unused "incomplete" parameter from create_dir_entry(),
all its callers specify it as "1", so let's drop the "incomplete=0"
case. The last caller to use it was search_for_subdir(), but that code
was removed in the preceding commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the "mkdir" parameter from the find_containing_dir() function,
the add_ref_entry() function removed in the preceding commit was its
last user.
Since "mkdir" is always "0" we can also remove the parameter from
search_for_subdir(), which in turn means that we can delete most of
that function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function has not been used since 9dd389f3d8 (packed_ref_store:
get rid of the `ref_cache` entirely, 2017-09-25).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was missed in 9939b33d6a (packed-backend: rip out some
now-unused code, 2017-09-08), and has been orphaned since then. Let's
delete it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the DO_FOR_EACH_INCLUDE_BROKEN flag is used, we include both actual
corrupt refs (illegal names, missing objects), but also symrefs that
point to nothing. This latter is not really a corruption, but just
something that may happen normally. For example, the symref at
refs/remotes/origin/HEAD may point to a tracking branch which is later
deleted. (The local HEAD may also be unborn, of course, but we do not
access it through ref iteration).
Most callers of for_each_ref() etc, do not care. They don't pass
INCLUDE_BROKEN, so don't see it at all. But for those which do pass it,
this somewhat-normal state causes extra warnings (e.g., from
for-each-ref) or even aborts operations (destructive repacks with
GIT_REF_PARANOIA set).
This patch just introduces the flag and the mechanism; there are no
callers yet (and hence no tests). Two things to note on the
implementation:
- we actually skip any symref that does not resolve to a ref. This
includes ones which point to an invalidly-named ref. You could argue
this is a more serious breakage than simple dangling. But the
overall effect is the same (we could not follow the symref), as well
as the impact on things like REF_PARANOIA (either way, a symref we
can't follow won't impact reachability, because we'll see the ref
itself during iteration). The underlying resolution function doesn't
distinguish these two cases (they both get REF_ISBROKEN).
- we change the iterator in refs/files-backend.c where we check
INCLUDE_BROKEN. There's a matching spot in refs/packed-backend.c,
but we don't know need to do anything there. The packed backend does
not support symrefs at all.
The resulting set of flags might be a bit easier to follow if we broke
this down into "INCLUDE_CORRUPT_REFS" and "INCLUDE_DANGLING_SYMREFS".
But there are a few reasons not do so:
- adding a new OMIT_DANGLING_SYMREFS flag lets us leave existing
callers intact, without changing their behavior (and some of them
really do want to see the dangling symrefs; e.g., t5505 has a test
which expects us to report when a symref becomes dangling)
- they're not actually independent. You cannot say "include dangling
symrefs" without also including refs whose objects are not
reachable, because dangling symrefs by definition do not have an
object. We could tweak the implementation to distinguish this, but
in practice nobody wants to ask for that. Adding the OMIT flag keeps
the implementation simple and makes sure we don't regress the
current behavior.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for the DO_FOR_EACH_* flags is sprinkled over the
refs-internal.h file. We define the two flags in one spot, and then
describe them in more detail far away from there, in the definitions of
refs_ref_iterator_begin() and ref_iterator_advance_fn().
Let's try to organize this a bit better:
- convert the #defines to an enum. This makes it clear that they are
related, and that the enum shows the complete set of flags.
- combine all descriptions for each flag in a single spot, next to the
flag's definition
- use the enum rather than a bare int for functions which take the
flags. This helps readers realize which flags can be used.
- clarify the mention of flags for ref_iterator_advance_fn(). It does
not take flags itself, but is meant to depend on ones set up
earlier.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are currently two DO_FOR_EACH_* flags, which must not have their
bits overlap. Yet they're defined hundreds of lines apart. Let's move
them next to each other to make it clear that they are related and are a
complete set (which matters if you are adding a new flag and would like
to know what the next available bit is).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only need to provide a mode if we are willing to let open(2) create
the file, which is not the case here, so drop the unnecessary parameter.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it explicit how alternative ref backends should report errors in
read_raw_ref_fn.
read_raw_ref_fn needs to supply a credible errno for a number of cases. These
are primarily:
1) The files backend calls read_raw_ref from lock_raw_ref, and uses the
resulting error codes to create/remove directories as needed.
2) ENOENT should be translated in a zero OID, optionally with REF_ISBROKEN set,
returning the last successfully resolved symref. This is necessary so
read_raw_ref("HEAD") on an empty repo returns refs/heads/main (or the default branch
du-jour), and we know on which branch to create the first commit.
Make this information flow explicit by adding a failure_errno to the signature
of read_raw_ref. All errnos from the files backend are still propagated
unchanged, even though inspection suggests only ENOTDIR, EISDIR and ENOENT are
relevant.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/files-backend.c::lock_ref_oid_basic() tries to signal how it failed
to its callers using errno.
It is safe to stop setting errno here, because the callers of this
file-scope static function are
* files_copy_or_rename_ref()
* files_create_symref()
* files_reflog_expire()
None of them looks at errno after seeing a negative return from
lock_ref_oid_basic() to make any decision, and no caller of these three
functions looks at errno after they signal a failure by returning a
negative value. In particular,
* files_copy_or_rename_ref() - here, calls are followed by error()
(which performs I/O) or write_ref_to_lockfile() (which calls
parse_object() which may perform I/O)
* files_create_symref() - here, calls are followed by error() or
create_symref_locked() (which performs I/O and does not inspect
errno)
* files_reflog_expire() - here, calls are followed by error() or
refs_reflog_exists() (which calls a function in a vtable that is not
documented to use and/or preserve errno)
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit does not change code; it documents the fact that an alternate ref
backend does not need to return EINVAL from read_raw_ref_fn to function
properly.
This is correct, because refs_read_raw_ref is only called from;
* resolve_ref_unsafe(), which does not care for the EINVAL errno result.
* refs_verify_refname_available(), which does not inspect errno.
* files-backend.c, where errno is overwritten on failure.
* packed-backend.c (is_packed_transaction_needed), which calls it for the
packed ref backend, which never emits EINVAL.
A grep for EINVAL */*c reveals that no code checks errno against EINVAL after
reading references. In addition, the refs.h file does not mention errno at all.
A grep over resolve_ref_unsafe() turned up the following callers that inspect
errno:
* sequencer.c::print_commit_summary, which uses it for die_errno
* lock_ref_oid_basic(), which only treats EISDIR and ENOTDIR specially.
The files ref backend does use EINVAL. The files backend does not call into
the generic API (refs_read_raw), but into the files-specific function
(files_read_raw_ref), which we are not changing in this commit.
As the errno sideband is unintuitive and error-prone, remove EINVAL
value, as a step towards getting rid of the errno sideband altogether.
Spotted by Ævar Arnfjörð Bjarmason <avarab@gmail.com>.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the raceproof_create_file() API added to cache.h and
object-file.c in 177978f56a (raceproof_create_file(): new function,
2017-01-06) to its only user, refs/files-backend.c.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a follow-up to the preceding commit where we removed the adjacent
"errno == EISDIR" condition in the same function, remove the
"last_errno != ENOTDIR" condition here.
It's not possible for us to hit this condition added in
5b2d8d6f21 (lock_ref_sha1_basic(): improve diagnostics for ref D/F
conflicts, 2015-05-11). Since a1c1d8170d (refs_resolve_ref_unsafe:
handle d/f conflicts for writes, 2017-10-06) we've explicitly caught
these in refs_resolve_ref_unsafe() before returning NULL:
if (errno != ENOENT &&
errno != EISDIR &&
errno != ENOTDIR)
return NULL;
We'd then always return the refname from refs_resolve_ref_unsafe()
even if we were in a broken state as explained in the preceding
commit. The elided context here is a call to refs_resolve_ref_unsafe().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we lock a reference like "foo" we need to handle the case where
"foo" exists, but is an empty directory. That's what this code added
in bc7127ef0f (ref locking: allow 'foo' when 'foo/bar' used to exist
but not anymore., 2006-09-30) seems like it should be dealing with.
Except it doesn't, and we never take this branch. The reason is that
when bc7127ef0f was written this looked like:
ref = resolve_ref([...]);
if (!ref && errno == EISDIR) {
[...]
And in resolve_ref() we had this code:
fd = open(path, O_RDONLY);
if (fd < 0)
return NULL;
I.e. we would attempt to read "foo" with open(), which would fail with
EISDIR and we'd return NULL. We'd then take this branch, call
remove_empty_directories() and continue.
Since a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for
writes, 2017-10-06) we don't. E.g. in the case of
files_copy_or_rename_ref() our callstack will look something like:
[...] ->
files_copy_or_rename_ref() ->
lock_ref_oid_basic() ->
refs_resolve_ref_unsafe()
At that point the first (now only) refs_resolve_ref_unsafe() call in
lock_ref_oid_basic() would do the equivalent of this in the resulting
call to refs_read_raw_ref() in refs_resolve_ref_unsafe():
/* Via refs_read_raw_ref() */
fd = open(path, O_RDONLY);
if (fd < 0)
/* get errno == EISDIR */
/* later, in refs_resolve_ref_unsafe() */
if ([...] && errno != EISDIR)
return NULL;
[...]
/* returns the refs/heads/foo to the caller, even though it's a directory */
return refname;
I.e. even though we got an "errno == EISDIR" we won't take this
branch, since in cases of EISDIR "resolved" is always
non-NULL. I.e. we pretend at this point as though everything's OK and
there is no "foo" directory.
We then proceed with the entire ref update and don't call
remove_empty_directories() until we call commit_ref_update(). See
5387c0d883 (commit_ref(): if there is an empty dir in the way, delete
it, 2016-05-05) for the addition of that code, and
a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for writes,
2017-10-06) for the commit that changed the original codepath added in
bc7127ef0f to use this "EISDIR" handling.
Further historical commentary:
Before the two preceding commits the caller in files_reflog_expire()
was the only one out of our 4 callers that would pass non-NULL as an
oid. We would then set a (now gone) "resolve_flags" to
"RESOLVE_REF_READING" and just before that "errno != EISDIR" check do:
if (resolve_flags & RESOLVE_REF_READING)
return NULL;
There may have been some case where this ended up mattering and we
couldn't safely make this change before we removed the "oid"
parameter, but I don't think there was, see [1] for some discussion on
that.
In any case, now that we've removed the "oid" parameter in a preceding
commit we can be sure that this code is redundant, so let's remove it.
1. http://lore.kernel.org/git/871r801yp6.fsf@evledraar.gmail.com
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit the last caller that passed a non-NULL OID was
changed to pass NULL to lock_ref_oid_basic(). As noted in preceding
commits use of this API has been going away (we should use ref
transactions, or lock_raw_ref()), so we're unlikely to gain new
callers that want to pass the "oid".
So let's remove it, doing so means we can remove the "mustexist"
condition, and therefore anything except the "flags =
RESOLVE_REF_NO_RECURSE" case.
Furthermore, since the verify_lock() function we called did most of
its work when the "oid" was passed (as "old_oid") we can inline the
trivial part of it that remains in its only remaining caller. Without
a NULL "oid" passed it was equivalent to calling refs_read_ref_full()
followed by oidclr().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the the preceding commit the "oid" parameter to reflog_expire()
is always NULL, but it was not cleaned up to reduce the size of the
diff. Let's do that subsequent API and documentation cleanup now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During reflog expiry, the cmd_reflog_expire() function first iterates
over all reflogs in logs/*, and then one-by-one acquires the lock for
each one and expires it. This behavior has been with us since this
command was implemented in 4264dc15e1 ("git reflog expire",
2006-12-19).
Change this to stop calling lock_ref_oid_basic() with the OID we saw
when we looped over the logs, instead have it pass the OID it managed
to lock.
This mostly mitigates a race condition where e.g. "git gc" will fail
in a concurrently updated repository because the branch moved since
"git reflog expire --all" was started. I.e. with:
error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>
This behavior of passing in an "oid" was needed for an edge-case that
I've untangled in this and preceding commits though, namely that we
needed this OID because we'd:
1. Lookup the reflog name/OID via dwim_log()
2. With that OID, lock the reflog
3. Later in builtin/reflog.c we use the OID we looked as input to
lookup_commit_reference_gently(), assured that it's equal to the
OID we got from dwim_log().
We can be sure that this change is safe to make because between
dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other
logic relevant to the OID or expiry run in the cmd_reflog_expire()
caller.
We can thus treat that code as a black box, before and after this
change it would get an OID that's been locked, the only difference is
that now we mostly won't be failing to get the lock due to the TOCTOU
race[0]. That failure was purely an implementation detail in how the
"current OID" was looked up, it was divorced from the locking
mechanism.
What do we mean with "mostly"? It mostly mitigates it because we'll
still run into cases where the ref is locked and being updated as we
want to expire it, and other git processes wanting to update the refs
will in turn race with us as we expire the reflog.
That remaining race can in turn be mitigated with the
core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry
acquiring reference locks for 100ms", 2017-08-21). In practice if that
value is high enough we'll probably never have ref updates or reflog
expiry failing, since the clients involved will retry for far longer
than the time any of those operations could take.
See [1] for an initial report of how this impacted "git gc" and a
large discussion about this change in early 2019. In particular patch
looked good to Michael Haggerty, see his[2]. That message seems to not
have made it to the ML archive, its content is quoted in full in my
[3].
I'm leaving behind now-unused code the refs API etc. that takes the
now-NULL "unused_oid" argument, and other code that can be simplified now
that we never have on OID in that context, that'll be cleaned up in
subsequent commits, but for now let's narrowly focus on fixing the
"git gc" issue. As the modified assert() shows we always pass a NULL
oid to reflog_expire() now.
Unfortunately this sort of probabilistic contention is hard to turn
into a test. I've tested this by running the following three subshells
in concurrent terminals:
(
rm -rf /tmp/git &&
git init /tmp/git &&
while true
do
head -c 10 /dev/urandom | hexdump >/tmp/git/out &&
git -C /tmp/git add out &&
git -C /tmp/git commit -m"out"
done
)
(
rm -rf /tmp/git-clone &&
git clone file:///tmp/git /tmp/git-clone &&
while git -C /tmp/git-clone pull
do
date
done
)
(
while git -C /tmp/git-clone reflog expire --all
do
date
done
)
Before this change the "reflog expire" would fail really quickly with
the "but expected" error noted above.
After this change both the "pull" and "reflog expire" will run for a
while, but eventually fail because I get unlucky with
core.filesRefLockTimeout (the "reflog expire" is in a really tight
loop). As noted above that can in turn be mitigated with higher values
of core.filesRefLockTimeout than the 100ms default.
As noted in the commentary added in the preceding commit there's also
the case of branches being racily deleted, that can be tested by
adding this to the above:
(
while git -C /tmp/git-clone branch topic master &&
git -C /tmp/git-clone branch -D topic
do
date
done
)
With core.filesRefLockTimeout set to 10 seconds (it can probably be a
lot lower) I managed to run all four of these concurrently for about
an hour, and accumulated ~125k commits, auto-gc's and all, and didn't
have a single failure. The loops visibly stall while waiting for the
lock, but that's expected and desired behavior.
0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/
2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu
3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a comment about why it is that we need to check for the the
existence of a reflog we're deleting after we've successfully acquired
the lock in files_reflog_expire(). As noted in [1] the lock protocol
for reflogs is somewhat intuitive.
This early exit code the comment applies to dates all the way back to
4264dc15e1 (git reflog expire, 2006-12-19).
1. https://lore.kernel.org/git/54DCDA42.2060800@alum.mit.edu/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Re-indent this argument list that's been mis-indented since it was
added in 34c319970d (refs/debug: trace into reflog expiry too,
2021-04-23). This makes a subsequent change smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the unused "skip" parameter to lock_raw_ref(), it was never
used. We do use it when passing "skip" to the
refs_rename_ref_available() function in files_copy_or_rename_ref(),
but not here.
This is part of a larger series that modifies lock_ref_oid_basic()
extensively, there will be no more modifications of this function in
this series, but since the preceding commit removed this unused
parameter from lock_ref_oid_basic(), let's do it here too for
consistency.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lock_ref_oid_basic() function has gradually been replaced by use
of the file transaction API, there are only 4 remaining callers of
it.
None of those callers pass non-NULL "extras" and "skip" parameters,
the last such caller went away in 92b1551b1d (refs: resolve symbolic
refs first, 2016-04-25), so let's remove the parameters.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the last commit we removed the REF_DELETING flag from
lock_ref_oid_basic(). Since then all of the remaining callers do pass
REF_NO_DEREF, but that has been ignored completely since
7a418f3a17 (lock_ref_sha1_basic(): only handle REF_NODEREF mode,
2016-04-22).
So we can simply get rid of the parameter entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lock_ref_oid_basic() function has gradually been replaced by
most callers no longer performing a low-level "acquire lock,
update and release", and instead using the ref transaction API.
So there are only 4 remaining callers of lock_ref_oid_basic().
None of those callers pass REF_DELETING anymore, the last caller went
away in 92b1551b1d (refs: resolve symbolic refs first,
2016-04-25).
Before that we'd refactored and moved this code in:
- 8df4e51138 (struct ref_update: move "have_old" into "flags",
2015-02-17)
- 7bd9bcf372 (refs: split filesystem-based refs code into a new
file, 2015-11-09)
- 165056b2fc (lock_ref_for_update(): new function, 2016-04-24)
We then finally stopped using it in 92b1551b1d (noted above). So let's
remove the handling of this parameter.
By itself this change doesn't benefit us much, but it's the start of
even more removal of unused code in and around this function in
subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In e0cc8ac820 (packed_ref_store: make class into a subclass of
`ref_store`, 2017-06-23) a die() was added to packed_create_reflog(),
but not to any of the other reflog callbacks, let's do that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the empty prefix ("") stand out better.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup around struct_type_init() functions.
* ab/struct-init:
string-list.h users: change to use *_{nodup,dup}()
string-list.[ch]: add a string_list_init_{nodup,dup}()
dir.[ch]: replace dir_init() with DIR_INIT
*.c *_init(): define in terms of corresponding *_INIT macro
*.h: move some *_INIT to designated initializers
Change all in-tree users of the string_list_init(LIST, BOOL) API to
use string_list_init_{nodup,dup}(LIST) instead.
As noted in the preceding commit let's leave the now-unused
string_list_init() wrapper in-place for any in-flight users, it can be
removed at some later date.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use -1 as error return value throughout.
This removes spurious differences in the GIT_TRACE_REFS output, depending on the
ref storage backend active.
Before, the cached ref_iterator (but only that iterator!) would return
peel_object() output directly. No callers relied on the peel_status values
beyond success/failure. All calls to these functions go through
peel_iterated_oid(), which returns peel_object() as a fallback, but also
squashing the error values.
The iteration interface already passes REF_ISSYMREF and REF_ISBROKEN through the
flags argument, so the additional error values in enum peel_status provide no
value.
The ref iteration interface provides a separate peel() function because certain
formats (eg. packed-refs and reftable) can store the peeled object next to the
tag SHA1. Passing the peeled SHA1 as an optional argument to each_ref_fn maps
more naturally to the implementation of ref databases. Changing the code in this
way is left for a future refactoring.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git update-ref -d" removes a ref that is packed, it left
empty directories under $GIT_DIR/refs/ for
* wc/packed-ref-removal-cleanup:
refs: cleanup directories when deleting packed ref
When deleting a packed ref via 'update-ref -d', a lockfile is made in
the directory that would contain the loose copy of that ref, creating
any directories in the ref's path that do not exist. When the
transaction completes, the lockfile is deleted, but any empty parent
directories made when creating the lockfile are left in place. These
empty directories are not removed by 'pack-refs' or other housekeeping
tasks and will accumulate over time.
When deleting a loose ref, we remove all empty parent directories at the
end of the transaction.
This commit applies the parent directory cleanup logic used when
deleting loose refs to packed refs as well.
Signed-off-by: Will Chandler <wfc@wfchandler.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SHA-256 transition.
* bc/hash-transition-interop-part-1:
hex: print objects using the hash algorithm member
hex: default to the_hash_algo on zero algorithm value
builtin/pack-objects: avoid using struct object_id for pack hash
commit-graph: don't store file hashes as struct object_id
builtin/show-index: set the algorithm for object IDs
hash: provide per-algorithm null OIDs
hash: set, copy, and use algo field in struct object_id
builtin/pack-redundant: avoid casting buffers to struct object_id
Use the final_oid_fn to finalize hashing of object IDs
hash: add a function to finalize object IDs
http-push: set algorithm when reading object ID
Always use oidread to read into struct object_id
hash: add an algo member to struct object_id
Up until recently, object IDs did not have an algorithm member, only a
hash. Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms. Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.
Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref backend API uses errno as a sideband error channel.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commits, avoid peeking into the `struct
lock_file`. Use the lock file API instead. Note how we obtain the path
to the lock file if `fdopen_lock_file()` failed and that this is not a
problem: as documented in lockfile.h, failure to "fdopen" does not roll
back the lock file and we're free to, e.g., query it for its path.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When set in the environment, GIT_TRACE_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.
Example:
$ GIT_TRACE_REFS="1" ./git branch
15:42:09.769631 refs/debug.c:26 ref_store for .git
15:42:09.769681 refs/debug.c:249 read_raw_ref: HEAD: 0000000000000000000000000000000000000000 (=> refs/heads/ref-debug) type 1: 0
15:42:09.769695 refs/debug.c:249 read_raw_ref: refs/heads/ref-debug: 3a238e539b (=> refs/heads/ref-debug) type 0: 0
15:42:09.770282 refs/debug.c:233 ref_iterator_begin: refs/heads/ (0x1)
15:42:09.770290 refs/debug.c:189 iterator_advance: refs/heads/b4 (0)
15:42:09.770295 refs/debug.c:189 iterator_advance: refs/heads/branch3 (0)
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog.
Other ref backends will need to duplicate this logic too, so move it to a
central place.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares for handling FETCH_HEAD (which is not a regular ref)
separately from the ref backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10)
centralized reflog normalizaton. However, the normalizaton added a
leading "\t" to the message. This is an artifact of the reflog
storage format in the files backend, so it should be added there.
Routines that parse back the reflog (such as grab_nth_branch_switch)
expect the "\t" to not be in the message, so without this fix, git
with reftable cannot process the "@{-1}" syntax.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regarding reflog messages:
- We expect that a reflog message consists of a single line. The
file format used by the files backend may add a LF after the
message as a delimiter, and output by commands like "git log -g"
may complete such an incomplete line by adding a LF at the end,
but philosophically, the terminating LF is not a part of the
message.
- We however allow callers of refs API to supply a random sequence
of NUL terminated bytes. We cleanse caller-supplied message by
squashing a run of whitespaces into a SP, and by trimming trailing
whitespace, before storing the message. This is how we tolerate,
instead of erring out, a message with LF in it (be it at the end,
in the middle, or both).
Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out. This is sufficient with the
current code, as that is the only backend that writes reflogs. But
new backends can be added that write reflogs, and we'd want the
resulting log message we would read out of "log -g" the same no
matter what backend is used, and moving the code to do so to the
generic layer is a way to do so.
An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow
multi-line log messages if we wanted to, and when that happens, it
would help a lot to ensure we covered all bases if the cleansing
function (which would be updated) is called from the generic layer.
Side note: I am not interested in supporting multi-line reflog
messages right at the moment (nobody is asking for it), but I
envision that instead of the "squash a run of whitespaces into a SP
and rtrim" cleansing, we can %urlencode problematic bytes in the
message *AND* append a SP at the end, when a new version of Git that
supports multi-line and/or verbatim reflog messages writes a reflog
record. The reading side can detect the presense of SP at the end
(which should have been rtrimmed out if it were written by existing
versions of Git) as a signal that decoding %urlencode recovers the
original reflog message.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document some of the flag options in refs_ref_iterator_begin, and explain how
ref_iterator_advance_fn should handle them.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cleaning up a transaction that has no updates queued, then the
transaction's backend data will not have been allocated. We correctly
handle this for the packed backend, where the cleanup function checks
whether the backend data has been allocated at all -- if not, then there
is nothing to clean up. For the files backend we do not check this and
as a result will hit a segfault due to dereferencing a `NULL` pointer
when cleaning up such a transaction.
Fix the issue by checking whether `backend_data` is set in the files
backend, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We often skip an optional prefix in a string with a hardcoded
constant, e.g.
if (starts_with(string, "prefix"))
string += 6;
which is less error prone when written
skip_prefix(string, "prefix", &string);
Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying
'--expire=nonsense.timestamp' is not a valid timestamp
but with this change, we say
'nonsense.timestamp' is not a valid timestamp
which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs_read_ref_full() wraps refs_resolve_ref_unsafe(), which handles a
NULL oid pointer of callers not interested in the resolved object ID.
Pass NULL from files_copy_or_rename_ref() to clarify that it is one
such caller.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-refs" can lose refs that are created while running, which
is getting corrected.
* sc/pack-refs-deletion-racefix:
pack-refs: always refresh after taking the lock file
When a packed ref is deleted, the whole packed-refs file is
rewritten to omit the ref that no longer exists. However if another
gc command is running and calls `pack-refs --all` simultaneously,
there is a chance that a ref that was just updated lose the newly
created commits.
Through these steps, losing commits on newly updated refs can be
demonstrated:
# step 1: compile git without `USE_NSEC` option
Some kernel releases do enable it by default while some do
not. And if we compile git without `USE_NSEC`, it will be easier
demonstrated by the following steps.
# step 2: setup a repository and add the first commit
git init repo &&
(cd repo &&
git config core.logallrefupdates true &&
git commit --allow-empty -m foo)
# step 3: in one terminal, repack the refs repeatedly
cd repo &&
while true
do
git pack-refs --all
done
# step 4: in another terminal, simultaneously update the
# master with update-ref, and create and delete an
# unrelated ref also with update-ref
cd repo &&
while true
do
us=$(git commit-tree -m foo -p HEAD HEAD^{tree}) &&
git update-ref refs/heads/newbranch $us &&
git update-ref refs/heads/master $us &&
git update-ref -d refs/heads/newbranch &&
them=$(git rev-parse master) &&
if test "$them" != "$us"
then
echo >&2 "lost commit: $us"
exit 1
fi
# eye candy
printf .
done
Though we have the packed-refs lock file and loose refs lock
files to avoid updating conflicts, a ref will lost its newly
commits if racy stat-validity of `packed-refs` file happens
(which is quite same as the racy-git described in
`Documentation/technical/racy-git.txt`), the following
specific set of operations demonstrates the problem:
1. Call `pack-refs --all` to pack all the loose refs to
packed-refs, and let say the modify time of the
packed-refs is DATE_M.
2. Call `update-ref` to update a new commit to master while
it is already packed. the old value (let us call it
OID_A) remains in the packed-refs file and write the new
value (let us call it OID_B) to $GIT_DIR/refs/heads/master.
3. Call `update-ref -d` within the same DATE_M from the 1th
step to delete a different ref newbranch which is packed
in the packed-refs file. It check newbranch's oid from
packed-refs file without locking it.
Meanwhile it keeps a snapshot of the packed-refs file in
memory and record the file's attributes with the snapshot.
The oid of master in the packed-refs's snapshot is OID_A.
4. Call a new `pack-refs --all` to pack the loose refs, the
oid of master in packe-refs file is OID_B, and the loose
refs $GIT_DIR/refs/heads/master is removed. Let's say
the `pack-refs --all` is very quickly done and the new
packed-refs file's modify time is still DATE_M, and it
has the same file size, even the same inode.
5. 3th step now goes on after checking the newbranch, it
begin to rewrite the packed-refs file. After get the
lock file of packed-ref file, it checks it's on-disk
file attributes with the snapshot, suck as the timestamp,
the file size and the inode value. If they are both the
same values, and the snapshot is not refreshed.
Because the loose ref of master is removed by 4th step,
`update-ref -d` will updates the new packed-ref to disk
which contains master with the oid OID_A. So now the
newly commit OID_B of master is lost.
The best path forward is just always refreshing after take
the lock file of `packed-refs` file. Traditionally we avoided
that because refreshing it implied parsing the whole file.
But these days we mmap it, so it really is just an extra
open()/mmap() and a quick read of the header. That doesn't seem
like an outrageous cost to pay when we're already taking the lock.
Signed-off-by: Sun Chao <sunchao9@huawei.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sun Chao <sunchao9@huawei.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the possibility of giving flags to dir_iterator_begin to initialize
a dir-iterator with special options.
Currently possible flags are:
- DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort
immediately in the case of an error, instead of keep looking for the
next valid entry;
- DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow
symlinks and include linked directories' contents in the iteration.
These new flags will be used in a subsequent patch.
Also add tests for the flags' usage and adjust refs/files-backend.c to
the new dir_iterator_begin signature.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir_iterator_advance() is a large function with two nested loops. Let's
improve its readability factoring out three functions and simplifying
its mechanics. The refactored model will no longer depend on
level.initialized and level.dir_state to keep track of the iteration
state and will perform on a single loop.
Also, dir_iterator_begin() currently does not check if the given string
represents a valid directory path. Since the refactored model will have
to stat() the given path at initialization, let's also check for this
kind of error and make dir_iterator_begin() return NULL, on failures,
with errno appropriately set. And add tests for this new behavior.
Improve documentation at dir-iteration.h and code comments at
dir-iterator.c to reflect the changes and eliminate possible
ambiguities.
Finally, adjust refs/files-backend.c to check for now possible
dir_iterator_begin() failures.
Original-patch-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A corner case bug in the refs API has been corrected.
* jk/refs-double-abort:
refs/files-backend: don't look at an aborted transaction
refs/files-backend: handle packed transaction prepare failure
"git rebase" uses the refs/rewritten/ hierarchy to store its
intermediate states, which inherently makes the hierarchy per
worktree, but it didn't quite work well.
* nd/rewritten-ref-is-per-worktree:
Make sure refs/rewritten/ is per-worktree
files-backend.c: reduce duplication in add_per_worktree_entries_to_dir()
files-backend.c: factor out per-worktree code in loose_fill_ref_dir()
When deleting refs, we hold packed-refs.lock and prepare a packed
transaction to drop the refs from the packed-refs file. If it turns out
that we don't need to rewrite the packed refs (e.g., because none of the
deletions were present in the file), then we abort the transaction.
If that abort succeeds, then the transaction struct will have been
freed, and we set our local pointer to NULL so we don't look at it
again.
However, if it fails, then the struct will _still_ have been freed
(because ref_transaction_abort() always frees). But we don't clean up
the pointer, and will jump to our cleanup code, which will try to abort
it again, causing a use-after-free.
It's actually impossible for this to trigger in practice, since
packed_transaction_abort() will never return anything but success. But
let's fix it anyway, since that's more than we should assume about the
packed-refs code (after all, we are already bothering to check for an
error result which cannot be triggered).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In files_transaction_prepare(), if we have to delete some refs, we use a
subordinate packed_transaction to do so. It's rare for that
sub-transaction's prepare step to fail, since we hold the packed-refs
lock. But if it does, we trigger a BUG() due to these steps:
- we've attached the packed transaction to the files transaction as
backend_data->packed_transaction
- when the prepare step fails, the packed transaction cleans itself
up, putting itself into the CLOSED state
- the error value from preparing the packed transaction lets us know
in files_transaction_prepare() that we should also clean up and
return an error. We call files_transaction_cleanup(), which tries to
abort backend_data->packed_transaction. Since it's already CLOSED,
that triggers an assertion in ref_transaction_abort().
We can fix that by disconnecting the packed transaction from the outer
files transaction, and then free-ing (not aborting!) it ourselves.
A few other options/alternatives I considered:
- we could just make it a noop to abort a CLOSED transaction. But that
seems less safe, since clearly this code expects (and enforces) a
particular set of state transitions.
- we could have files_transaction_cleanup() selectively call abort()
vs free() based on the state of the on the packed transaction.
That's basically a more restricted version of the above, but also
potentially unsafe.
- instead of disconnecting backend_data->packed_transaction on error,
we could wait to install it until we successfully prepare. That
might make the flow a little simpler, but it introduces a hassle.
Earlier parts of files_transaction_prepare() that encounter an error
will jump to the cleanup label, and expect that cleaning up the
outer transaction will clean up the packed transaction, too. We'd
have to adjust those sites to clean up the packed transaction.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
a9be29c981 (sequencer: make refs generated by the `label` command
worktree-local, 2018-04-25) adds refs/rewritten/ as per-worktree
reference space. Unfortunately (my bad) there are a couple places that
need update to make sure it's really per-worktree.
- add_per_worktree_entries_to_dir() is updated to make sure ref listing
look at per-worktree refs/rewritten/ instead of per-repo one [1]
- common_list[] is updated so that git_path() returns the correct
location. This includes "rev-parse --git-path".
This mess is created by me. I started trying to fix it with the
introduction of refs/worktree, where all refs will be per-worktree
without special treatments. Unfortunate refs/rewritten came before
refs/worktree so this is all we can do.
This also fixes logs/refs/worktree not being per-worktree.
[1] note that ref listing still works sometimes. For example, if you
have .git/worktrees/foo/refs/rewritten/bar AND the directory
.git/worktrees/refs/rewritten, refs/rewritten/bar will show up.
add_per_worktree_entries_to_dir() is only needed when the directory
.git/worktrees/refs/rewritten is missing.
Reported-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is duplicated to handle refs/bisect/ and refs/worktree/
and a third prefix is coming. Time to clean up.
This also fixes incorrect "refs/worktrees/" length in this code. The
correct length is 14 not 11. The test in the next patch will also cover
this.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the first step for further cleaning up and extending this
function.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This parameter was added in fcc42ea0c9 (split_symref_update(): add a
files_ref_store argument, 2016-09-04) without comment, but never used.
The splitting is purely mechanical, and doesn't depend on the particular
ref-store. Let's drop this parameter in the name of simplicity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function files_reflog_path returns void, which usually means
"return;" not returning "void value" from another function.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to traverse objects for reachability, used to decide what
objects are unreferenced and expendable, have been taught to also
consider per-worktree refs of other worktrees as starting points to
prevent data loss.
* nd/per-worktree-ref-iteration:
git-worktree.txt: correct linkgit command name
reflog expire: cover reflog from all worktrees
fsck: check HEAD and reflog from other worktrees
fsck: move fsck_head_link() to get_default_heads() to avoid some globals
revision.c: better error reporting on ref from different worktrees
revision.c: correct a parameter name
refs: new ref types to make per-worktree refs visible to all worktrees
Add a place for (not) sharing stuff between worktrees
refs.c: indent with tabs, not spaces
More codepaths are moving away from hardcoded hash sizes.
* bc/hash-transition-part-15:
rerere: convert to use the_hash_algo
submodule: make zero-oid comparison hash function agnostic
apply: rename new_sha1_prefix and old_sha1_prefix
apply: replace hard-coded constants
tag: express constant in terms of the_hash_algo
transport: use parse_oid_hex instead of a constant
upload-pack: express constants in terms of the_hash_algo
refs/packed-backend: express constants using the_hash_algo
packfile: express constants in terms of the_hash_algo
pack-revindex: express constants in terms of the_hash_algo
builtin/fetch-pack: remove constants with parse_oid_hex
builtin/mktree: remove hard-coded constant
builtin/repack: replace hard-coded constants
pack-bitmap-write: use GIT_MAX_RAWSZ for allocation
object_id.cocci: match only expressions of type 'struct object_id'
One of the problems with multiple worktree is accessing per-worktree
refs of one worktree from another worktree. This was sort of solved by
multiple ref store, where the code can open the ref store of another
worktree and has access to the ref space of that worktree.
The problem with this is reporting. "HEAD" in another ref space is
also called "HEAD" like in the current ref space. In order to
differentiate them, all the code must somehow carry the ref store
around and print something like "HEAD from this ref store".
But that is not feasible (or possible with a _lot_ of work). With the
current design, we pass a reference around as a string (so called
"refname"). Extending this design to pass a string _and_ a ref store
is a nightmare, especially when handling extended SHA-1 syntax.
So we do it another way. Instead of entering a separate ref space, we
make refs from other worktrees available in the current ref space. So
"HEAD" is always HEAD of the current worktree, but then we can have
"worktrees/blah/HEAD" to denote HEAD from a worktree named
"blah". This syntax coincidentally matches the underlying directory
structure which makes implementation a bit easier.
The main worktree has to be treated specially because well... it's
special from the beginning. So HEAD from the main worktree is
acccessible via the name "main-worktree/HEAD" instead of
"worktrees/main/HEAD" because "main" could be just another secondary
worktree.
This patch also makes it possible to specify refs from one worktree in
another one, e.g.
git log worktrees/foo/HEAD
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recently introduced commit-graph auxiliary data is incompatible
with mechanisms such as replace & grafts that "breaks" immutable
nature of the object reference relationship. Disable optimizations
based on its use (and updating existing commit-graph) when these
incompatible features are in use in the repository.
* ds/commit-graph-with-grafts:
commit-graph: close_commit_graph before shallow walk
commit-graph: not compatible with uninitialized repo
commit-graph: not compatible with grafts
commit-graph: not compatible with replace objects
test-repository: properly init repo
commit-graph: update design document
refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
refs.c: migrate internal ref iteration to pass thru repository argument
Switch uses of GIT_SHA1_HEXSZ to use the_hash_algo so that they are
appropriate for the any given hash length.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When multiple worktrees are used, we need rules to determine if
something belongs to one worktree or all of them. Instead of keeping
adding rules when new stuff comes (*), have a generic rule:
- Inside $GIT_DIR, which is per-worktree by default, add
$GIT_DIR/common which is always shared. New features that want to
share stuff should put stuff under this directory.
- Inside refs/, which is shared by default except refs/bisect, add
refs/worktree/ which is per-worktree. We may eventually move
refs/bisect to this new location and remove the exception in refs
code.
(*) And it may also include stuff from external commands which will
have no way to modify common/per-worktree rules.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:
if (oidcmp(E1, E2))
As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.
There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 60ce76d358 (refs: add repository argument to for_each_replace_ref,
2018-04-11) and 0d296c57ae (refs: allow for_each_replace_ref to handle
arbitrary repositories, 2018-04-11), for_each_replace_ref learned how
to iterate over refs by a given arbitrary repository.
New attempts in the object store conversion have shown that it is useful
to have the repository handle available that the refs iteration is
currently iterating over.
To achieve this goal we will need to add a repository argument to
each_ref_fn in refs.h. However as many callers rely on the signature
such a patch would be too large.
So convert the internals of the ref subsystem first to pass through a
repository argument without exposing the change to the user. Assume
the_repository for the passed through repository, although it is not
used anywhere yet.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up to use size_t/ssize_t when they are the right type.
* jk/size-t:
strbuf_humanise: use unsigned variables
pass st.st_size as hint for strbuf_readlink()
strbuf_readlink: use ssize_t
strbuf: use size_t for length in intermediate variables
reencode_string: use size_t for string lengths
reencode_string: use st_add/st_mult helpers
Conversion from uchar[40] to struct object_id continues.
* bc/object-id:
pretty: switch hard-coded constants to the_hash_algo
sha1-file: convert constants to uses of the_hash_algo
log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
builtin/merge-recursive: make hash independent
builtin/merge: switch to use the_hash_algo
builtin/fmt-merge-msg: make hash independent
builtin/update-index: simplify parsing of cacheinfo
builtin/update-index: convert to using the_hash_algo
refs/files-backend: use the_hash_algo for writing refs
sha1-name: use the_hash_algo when parsing object names
strbuf: allocate space with GIT_MAX_HEXSZ
commit: express tree entry constants in terms of the_hash_algo
hex: switch to using the_hash_algo
tree-walk: replace hard-coded constants with the_hash_algo
cache: update object ID functions for the_hash_algo
The codebase has been updated to compile cleanly with -pedantic
option.
* bb/pedantic:
utf8.c: avoid char overflow
string-list.c: avoid conversion from void * to function pointer
sequencer.c: avoid empty statements at top level
convert.c: replace "\e" escapes with "\033".
fixup! refs/refs-internal.h: avoid forward declaration of an enum
refs/refs-internal.h: avoid forward declaration of an enum
fixup! connect.h: avoid forward declaration of an enum
connect.h: avoid forward declaration of an enum
When we initially added the strbuf_readlink() function in
b11b7e13f4 (Add generic 'strbuf_readlink()' helper function,
2008-12-17), the point was that we generally have a _guess_
as to the correct size based on the stat information, but we
can't necessarily trust it.
Over the years, a few callers have grown up that simply pass
in 0, even though they have the stat information. Let's have
them pass in their hint for consistency (and in theory
efficiency, since it may avoid an extra resize/syscall loop,
but neither location is probably performance critical).
Note that st.st_size is actually an off_t, so in theory we
need xsize_t() here. But none of the other callsites use it,
and since this is just a hint, it doesn't matter either way
(if we wrap we'll simply start with a too-small hint and
then eventually complain when we cannot allocate the
memory).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to ensure we write the correct amount, use the_hash_algo to
find the correct number of bytes for the current hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we don't care about how many bytes were written, simplify the return
value logic.
log_ref_write_fd() was written long before strbuf was fleshed out. Remove
the old manual buffer management code and replace it with strbuf(). Also
update copy_reflog_msg() which is called only by log_ref_write_fd() to use
strbuf as it keeps things consistent.
Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of parse_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.
* js/use-bug-macro:
BUG_exit_code: fix sparse "symbol not declared" warning
Convert remaining die*(BUG) messages
Replace all die("BUG: ...") calls by BUG() ones
run-command: use BUG() to report bugs, not die()
test-tool: help verifying BUG() code paths
Code clean-up to adjust to a more recent lockfile API convention that
allows lockfile instances kept on the stack.
* ma/lockfile-cleanup:
lock_file: move static locks into functions
lock_file: make function-local locks non-static
refs.c: do not die if locking fails in `delete_pseudoref()`
refs.c: do not die if locking fails in `write_pseudoref()`
t/helper/test-write-cache: clean up lock-handling
Placing `struct lock_file`s on the stack used to be a bad idea, because
the temp- and lockfile-machinery would keep a pointer into the struct.
But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we can safely have lockfiles on the stack. (This applies
even if a user returns early, leaving a locked lock behind.)
These `struct lock_file`s are local to their respective functions and we
can drop their staticness.
For good measure, I have inspected these sites and come to believe that
they always release the lock, with the possible exception of bailing out
using `die()` or `exit()` or by returning from a `cmd_foo()`.
As pointed out by Jeff King, it would be bad if someone held on to a
`struct lock_file *` for some reason. After some grepping, I agree with
his findings: no-one appears to be doing that.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The effort to pass the repository in-core structure throughout the
API continues. This round deals with the code that implements the
refs/replace/ mechanism.
* sb/object-store-replace:
replace-object: allow lookup_replace_object to handle arbitrary repositories
replace-object: allow do_lookup_replace_object to handle arbitrary repositories
replace-object: allow prepare_replace_object to handle arbitrary repositories
refs: allow for_each_replace_ref to handle arbitrary repositories
refs: store the main ref store inside the repository struct
replace-object: add repository argument to lookup_replace_object
replace-object: add repository argument to do_lookup_replace_object
replace-object: add repository argument to prepare_replace_object
refs: add repository argument to for_each_replace_ref
refs: add repository argument to get_main_ref_store
replace-object: check_replace_refs is safe in multi repo environment
replace-object: eliminate replace objects prepared flag
object-store: move lookup_replace_object to replace-object.h
replace-object: move replace_map to object store
replace_object: use oidmap
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the 'main_ref_store', which was a global variable in refs.c
into the repository struct.
This patch does not deal with the parts in the refs subsystem which deal
with the submodules there. A later patch needs to get rid of the submodule
exposure in the refs API, such as 'get_submodule_ref_store(path)'.
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f57f37e2e1 (files-backend: remove the use of
git_path(), 2017-03-26) introduced a regression when a
relative $GIT_DIR is used in a working tree:
- when we initialize the ref backend, we make a copy of
get_git_dir(), which may be relative
- later, we may call setup_work_tree() and chdir to the
root of the working tree
- further calls to the ref code will use the stored git
directory, but relative paths will now point to the
wrong place
The new test in t1501 demonstrates one such instance (the
bug causes us to write the ref update to the nonsense
"relative/relative/.git").
Since setup_work_tree() now uses chdir_notify, we can just
ask it update our relative paths when necessary.
Reported-by: Rafael Ascensao <rafa.almas@gmail.com>
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid mmapping small files while using packed refs (especially ones
with zero size, which would cause later munmap() to fail).
* kg/packed-ref-cache-fix:
packed_ref_cache: don't use mmap() for small files
load_contents(): don't try to mmap an empty file
packed_ref_iterator_begin(): make optimization more general
find_reference_location(): make function safe for empty snapshots
create_snapshot(): use `xmemdupz()` rather than a strbuf
struct snapshot: store `start` rather than `header_len`
Crash fix for a corner case where an error codepath tried to unlock
what it did not acquire lock on.
* mr/packed-ref-store-fix:
files_initial_transaction_commit(): only unlock if locked
Take a hint from commit ea68b0ce9f (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.
Signed-off-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't actually create zero-length `packed-refs` files, but they are
valid and we should handle them correctly. The old code `xmmap()`ed
such files, which led to an error when `munmap()` was called. So, if
the `packed-refs` file is empty, leave the snapshot at its zero values
and return 0 without trying to read or mmap the file.
Returning 0 also makes `create_snapshot()` exit early, which avoids
the technically undefined comparison `NULL < NULL`.
Reported-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can return an empty iterator not only if the `packed-refs` file is
missing, but also if it is empty or if there are no references whose
names succeed `prefix`. Optimize away those cases as well by moving
the call to `find_reference_location()` higher in the function and
checking whether the determined start position is the same as
`snapshot->eof`. (This is possible now because the previous commit
made `find_reference_location()` robust against empty snapshots.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function had two problems if called for an empty snapshot (i.e.,
`snapshot->start == snapshot->eof == NULL`):
* It checked `NULL < NULL`, which is undefined by C (albeit highly
unlikely to fail in the real world).
* (Assuming the above comparison behaved as expected), it returned
NULL when `mustexist` was false, contrary to its docstring.
Change the check and fix the docstring.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Store a pointer to the start of the actual references within the
`packed-refs` contents rather than storing the length of the header.
This is more convenient for most users of this field.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and
safer, as it automatically infers the size of elements.
Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running git clone --single-branch --mirror -b TAGNAME previously
triggered the following error message:
fatal: multiple updates for ref 'refs/tags/TAGNAME' not allowed.
This error condition is handled in files_initial_transaction_commit().
42c7f7ff9 ("commit_packed_refs(): remove call to `packed_refs_unlock()`", 2017-06-23)
introduced incorrect unlocking in the error path of this function,
which changes the error message to
fatal: BUG: packed_refs_unlock() called when not locked
Move the call to packed_refs_unlock() above the "cleanup:" label
since the unlocking should only be done in the last error path.
Signed-off-by: Mathias Rav <m@git.strova.dk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent update to the refs infrastructure implementation started
rewriting packed-refs file more often than before; this has been
optimized again for most trivial cases.
* mh/avoid-rewriting-packed-refs:
files-backend: don't rewrite the `packed-refs` file unnecessarily
t1409: check that `packed-refs` is not rewritten unnecessarily
Code clean-up in refs API implementation.
* mh/tidy-ref-update-flags:
refs: update some more docs to use "oid" rather than "sha1"
write_packed_entry(): take `object_id` arguments
refs: rename constant `REF_ISPRUNING` to `REF_IS_PRUNING`
refs: rename constant `REF_NODEREF` to `REF_NO_DEREF`
refs: tidy up and adjust visibility of the `ref_update` flags
ref_transaction_add_update(): remove a check
ref_transaction_update(): die on disallowed flags
prune_ref(): call `ref_transaction_add_update()` directly
files_transaction_prepare(): don't leak flags to packed transaction
Recent update to the refs infrastructure implementation started
rewriting packed-refs file more often than before; this has been
optimized again for most trivial cases.
* mh/avoid-rewriting-packed-refs:
files-backend: don't rewrite the `packed-refs` file unnecessarily
t1409: check that `packed-refs` is not rewritten unnecessarily
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (25 commits)
refs/files-backend: convert static functions to object_id
refs: convert read_raw_ref backends to struct object_id
refs: convert peel_object to struct object_id
refs: convert resolve_ref_unsafe to struct object_id
worktree: convert struct worktree to object_id
refs: convert resolve_gitlink_ref to struct object_id
Convert remaining callers of resolve_gitlink_ref to object_id
sha1_file: convert index_path and index_fd to struct object_id
refs: convert reflog_expire parameter to struct object_id
refs: convert read_ref_at to struct object_id
refs: convert peel_ref to struct object_id
builtin/pack-objects: convert to struct object_id
pack-bitmap: convert traverse_bitmap_commit_list to object_id
refs: convert dwim_log to struct object_id
builtin/reflog: convert remaining unsigned char uses to object_id
refs: convert dwim_ref and expand_ref to struct object_id
refs: convert read_ref and read_ref_full to object_id
refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
Convert check_connected to use struct object_id
refs: update ref transactions to use struct object_id
...
Change `write_packed_entry()` to take `struct object_id *` rather than
`unsigned char *` arguments.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even after working with this code for years, I still see this constant
name as "ref node ref". Rename it to make it's meaning clearer.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The constants used for `ref_update::flags` were rather disorganized:
* The definitions in `refs.h` were not close to the functions that
used them.
* Maybe constants were defined in `refs-internal.h`, making them
visible to the whole refs module, when in fact they only made sense
for the files backend.
* Their documentation wasn't very consistent and partly still referred
to sha1s rather than oids.
* The numerical values followed no rational scheme
Fix all of these problems. The main functional improvement is that
some constants' visibility is now limited to `files-backend.c`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to make `REF_ISPRUNING` internal to the files backend. For
this to be possible, `ref_transaction_add_update()` mustn't know about
it. So move the check that `REF_ISPRUNING` is only used with
`REF_NODEREF` from this function to `files_transaction_prepare()`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`prune_ref()` needs to use the `REF_ISPRUNING` flag, but we want to
make that flag private to the files backend. So instead of calling
`ref_transaction_delete()`, which is a public function and therefore
shouldn't allow the `REF_ISPRUNING` flag, change `prune_ref()` to call
`ref_transaction_add_update()`, which is private to the refs
module. (Note that we don't need any of the other services provided by
`ref_transaction_delete()`.)
This allows us to change `ref_transaction_update()` to reject the
`REF_ISPRUNING` flag. Do so by adjusting
`REF_TRANSACTION_UPDATE_ALLOWED_FLAGS`. Also add parentheses to its
definition to avoid potential future mishaps.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The files backend uses `ref_update::flags` for several internal flags.
But those flags have no meaning to the packed backend. So when adding
updates for the packed-refs transaction, only use flags that make
sense to the packed backend.
`REF_NODEREF` is part of the public interface, and it's logically what
we want, so include it. In fact it is actually ignored by the packed
backend (which doesn't support symbolic references), but that's its
own business.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even when we are deleting references, we needn't overwrite the
`packed-refs` file if the references that we are deleting only exist
as loose references. Implement this optimization as follows:
* Add a function `is_packed_transaction_needed()`, which checks
whether a given packed-refs transaction actually needs to be carried
out (i.e., it returns false if the transaction obviously wouldn't
have any effect). This function must be called while holding the
`packed-refs` lock to avoid races.
* Change `files_transaction_prepare()` to check whether the
packed-refs transaction is actually needed. If not, squelch it, but
continue holding the `packed-refs` lock until the end of the
transaction to avoid races.
This fixes a mild regression caused by dc39e09942 (files_ref_store:
use a transaction to update packed refs, 2017-09-08). Before that
commit, unnecessary rewrites of `packed-refs` were suppressed by
`repack_without_refs()`. But the transaction-based writing introduced
by that commit didn't perform that optimization.
Note that the pre-dc39e09942 code still had to *read* the whole
`packed-refs` file to determine that the rewrite could be skipped, so
the performance for the cases that the write could be elided was
`O(N)` in the number of packed references both before and after
dc39e09942. But after that commit the constant factor increased.
This commit reimplements the optimization of eliding unnecessary
`packed-refs` rewrites. That, plus the fact that since
cfa2e29c34 (packed_ref_store: get rid of the `ref_cache` entirely,
2017-03-17) we don't necessarily have to read the whole `packed-refs`
file at all, means that deletes of one or a few loose references can
now be done with `O(n lg N)` effort, where `n` is the number of loose
references being deleted and `N` is the total number of packed
references.
This commit fixes two tests in t1409.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Transactions to update multiple references that involves a deletion
was quite broken in an error codepath and did not abort everything
correctly.
* mh/ref-locking-fix:
files_transaction_prepare(): fix handling of ref lock failure
t1404: add a bunch of tests of D/F conflicts
Since dc39e09942 (files_ref_store: use a transaction to update packed
refs, 2017-09-08), failure to lock a reference has been handled
incorrectly by `files_transaction_prepare()`. If
`lock_ref_for_update()` fails in the lock-acquisition loop of that
function, it sets `ret` then breaks out of that loop. Prior to
dc39e09942, that was OK, because the only thing following the loop was
the cleanup code. But dc39e09942 added another blurb of code between
the loop and the cleanup. That blurb sometimes resets `ret` to zero,
making the cleanup code think that the locking was successful.
Specifically, whenever
* One or more reference deletions have been processed successfully in
the lock-acquisition loop. (Processing the first such reference
causes a packed-ref transaction to be initialized.)
* Then `lock_ref_for_update()` fails for a subsequent reference. Such
a failure can happen for a number of reasons, such as the old SHA-1
not being correct, lock contention, etc. This causes a `break` out
of the lock-acquisition loop.
* The `packed-refs` lock is acquired successfully and
`ref_transaction_prepare()` succeeds for the packed-ref transaction.
This has the effect of resetting `ret` back to 0, and making the
cleanup code think that lock acquisition was successful.
In that case, any reference updates that were processed prior to
breaking out of the loop would be carried out (loose and packed), but
the reference that couldn't be locked and any subsequent references
would silently be ignored.
This can easily cause data loss if, for example, the user was trying
to push a new name for an existing branch while deleting the old name.
After the push, the branch could be left unreachable, and could even
subsequently be garbage-collected.
This problem was noticed in the context of deleting one reference and
creating another in a single transaction, when the two references D/F
conflict with each other, like
git update-ref --stdin <<EOF
delete refs/foo
create refs/foo/bar HEAD
EOF
This triggers the above bug because the deletion is processed
successfully for `refs/foo`, then the D/F conflict causes
`lock_ref_for_update()` to fail when `refs/foo/bar` is processed. In
this case the transaction *should* fail, but instead it causes
`refs/foo` to be deleted without creating `refs/foo`. This could
easily result in data loss.
The fix is simple: instead of just breaking out of the loop, jump
directly to the cleanup code. This fixes some tests in t1404 that were
added in the previous commit.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert several static functions to take pointers to struct object_id.
Change the relevant parameters to write_packed_entry to be const, as we
don't modify them. Rename lock_ref_sha1_basic to lock_ref_oid_basic to
reflect its new argument. Update the docstring for verify lock to
account for the new parameter name, and note additionally that the
old_oid may be NULL.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the unsigned char * parameter to struct object_id * for
files_read_raw_ref and packed_read_raw_ref. Update the documentation.
Switch from using get_sha1_hex and a hard-coded 40 to using
parse_oid_hex.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert resolve_ref_unsafe to take a pointer to struct object_id by
converting one remaining caller to use struct object_id, removing the
temporary NULL pointer check in expand_ref, converting the declaration
and definition, and applying the following semantic patch:
@@
expression E1, E2, E3, E4;
@@
- resolve_ref_unsafe(E1, E2, E3.hash, E4)
+ resolve_ref_unsafe(E1, E2, &E3, E4)
@@
expression E1, E2, E3, E4;
@@
- resolve_ref_unsafe(E1, E2, E3->hash, E4)
+ resolve_ref_unsafe(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
reflog_expire already used struct object_id internally, but it did not
take it as a parameter. Adjust the parameter (and the callers) to pass
a pointer to struct object_id instead of a pointer to unsigned char.
Remove the temporary inserted earlier as it is no longer required.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All but two of the call sites already have parameters using the hash
parameter of struct object_id, so convert them to take a pointer to the
struct directly. Also convert refs_read_refs_full, the underlying
implementation.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the ref transaction code to use struct object_id. Remove one
NULL pointer check which was previously inserted around a dereference;
since we now pass a pointer to struct object_id directly through, the
code we're calling handles this for us.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert delete_ref and refs_delete_ref to take a pointer to struct
object_id. Update the documentation accordingly, including referring to
null_oid in lowercase, as it is not a #define constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the member of this struct to be a struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* rs/cleanup-strbuf-users:
graph: use strbuf_addchars() to add spaces
use strbuf_addstr() for adding strings to strbufs
path: use strbuf_add_real_path()
Code clean-up.
* rs/resolve-ref-optional-result:
refs: pass NULL to resolve_refdup() if hash is not needed
refs: pass NULL to refs_resolve_refdup() if hash is not needed
Operations that do not touch (majority of) packed refs have been
optimized by making accesses to packed-refs file lazy; we no longer
pre-parse everything, and an access to a single ref in the
packed-refs does not touch majority of irrelevant refs, either.
* mh/mmap-packed-refs: (21 commits)
packed-backend.c: rename a bunch of things and update comments
mmapped_ref_iterator: inline into `packed_ref_iterator`
ref_cache: remove support for storing peeled values
packed_ref_store: get rid of the `ref_cache` entirely
ref_store: implement `refs_peel_ref()` generically
packed_read_raw_ref(): read the reference from the mmapped buffer
packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
read_packed_refs(): ensure that references are ordered when read
packed_ref_cache: keep the `packed-refs` file mmapped if possible
packed-backend.c: reorder some definitions
mmapped_ref_iterator_advance(): no peeled value for broken refs
mmapped_ref_iterator: add iterator over a packed-refs file
packed_ref_cache: remember the file-wide peeling state
read_packed_refs(): read references with minimal copying
read_packed_refs(): make parsing of the header line more robust
read_packed_refs(): only check for a header at the top of the file
read_packed_refs(): use mmap to read the `packed-refs` file
die_unterminated_line(), die_invalid_line(): new functions
packed_ref_cache: add a backlink to the associated `packed_ref_store`
prefix_ref_iterator: break when we leave the prefix
...
Code clean-up to prevent future mistakes by copying and pasting
code that checks the result of read_in_full() function.
* jk/read-in-full:
worktree: check the result of read_in_full()
worktree: use xsize_t to access file size
distinguish error versus short read from read_in_full()
avoid looking at errno for short read_in_full() returns
prefer "!=" when checking read_in_full() result
notes-merge: drop dead zero-write code
files-backend: prefer "0" for write_in_full() error check
"git branch" learned "-c/-C" to create a new branch by copying an
existing one.
* sd/branch-copy:
branch: fix "copy" to never touch HEAD
branch: add a --copy (-c) option to go with --move (-m)
branch: add test for -m renaming multiple config sections
config: create a function to format section headers
Use strbuf_addstr() instead of strbuf_addf() for adding strings. That's
simpler and makes the intent clearer.
Patch generated by Coccinelle and contrib/coccinelle/strbuf.cocci;
adjusted indentation in refs/packed-backend.c manually.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* rs/resolve-ref-optional-result:
refs: pass NULL to resolve_ref_unsafe() if hash is not needed
refs: pass NULL to refs_resolve_ref_unsafe() if hash is not needed
refs: make sha1 output parameter of refs_resolve_ref_unsafe() optional
Commit 06f46f237a (avoid "write_in_full(fd, buf, len) !=
len" pattern, 2017-09-13) converted this callsite from:
write_in_full(...) != 1
to
write_in_full(...) < 0
But during the conflict resolution in c50424a6f0 (Merge
branch 'jk/write-in-full-fix', 2017-09-25), this morphed
into
write_in_full(...) < 1
This behaves as we want, but we prefer to avoid modeling the
"less than length" error-check which can be subtly buggy, as
shown in efacf609c8 (config: avoid "write_in_full(fd, buf,
len) < len" pattern, 2017-09-13).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've made huge changes to this file, and some of the old names and
comments are no longer very fitting. So rename a bunch of things:
* `struct packed_ref_cache` → `struct snapshot`
* `acquire_packed_ref_cache()` → `acquire_snapshot()`
* `release_packed_ref_buffer()` → `clear_snapshot_buffer()`
* `release_packed_ref_cache()` → `release_snapshot()`
* `clear_packed_ref_cache()` → `clear_snapshot()`
* `struct packed_ref_entry` → `struct snapshot_record`
* `cmp_packed_ref_entries()` → `cmp_packed_ref_records()`
* `cmp_entry_to_refname()` → `cmp_record_to_refname()`
* `sort_packed_refs()` → `sort_snapshot()`
* `read_packed_refs()` → `create_snapshot()`
* `validate_packed_ref_cache()` → `validate_snapshot()`
* `get_packed_ref_cache()` → `get_snapshot()`
* Renamed local variables and struct members accordingly.
Also update a bunch of comments to reflect the renaming and the
accumulated changes that the code has undergone.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since `packed_ref_iterator` is now delegating to
`mmapped_ref_iterator` rather than `cache_ref_iterator` to do the
heavy lifting, there is no need to keep the two iterators separate. So
"inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This
removes a bunch of boilerplate.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the `packed-refs` backend doesn't use `ref_cache`, there is
nobody left who might want to store peeled values of references in
`ref_cache`. So remove that feature.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that everything has been changed to read what it needs directly
out of the `packed-refs` file, `packed_ref_store` doesn't need to
maintain a `ref_cache` at all. So get rid of it.
First of all, this will save a lot of memory and lots of little
allocations. Instead of needing to store complicated parsed data
structures in memory, we just mmap the file (potentially sharing
memory with other processes) and parse only what we need.
Moreover, since the mmapped access to the file reads only the parts of
the file that it needs, this might save reading all of the data from
disk at all (at least if the file starts out sorted).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to stop storing packed refs in a `ref_cache`. That means
that the only way we have left to optimize `peel_ref()` is by checking
whether the reference being peeled is the one currently being iterated
over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`.
But this can be done generically; it doesn't have to be implemented
per-backend.
So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()`
method from the refs API.
This removes the last callers of a couple of functions, so delete
them. More cleanup to come...
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of reading the reference from the `ref_cache`, read it
directly from the mmapped buffer.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have an efficient way to iterate, in order, over the
mmapped contents of the `packed-refs` file, we can use that directly
to implement reference iteration for the `packed_ref_store`, rather
than iterating over the `ref_cache`. This is the next step towards
getting rid of the `ref_cache` entirely.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It doesn't actually matter now, because the references are only
iterated over to fill the associated `ref_cache`, which itself puts
them in the correct order. But we want to get rid of the `ref_cache`,
so we want to be able to iterate directly over the `packed-refs`
buffer, and then the iteration will need to be ordered correctly.
In fact, we already write the `packed-refs` file sorted, but it is
possible that other Git clients don't get it right. So let's not
assume that a `packed-refs` file is sorted unless it is explicitly
declared to be so via a `sorted` trait in its header line.
If it is *not* declared to be sorted, then scan quickly through the
file to check. If it is found to be out of order, then sort the
records into a new memory-only copy. This checking and sorting is done
quickly, without parsing the full file contents. However, it needs a
little bit of care to avoid reading past the end of the buffer even if
the `packed-refs` file is corrupt.
Since *we* always write the file correctly sorted, include that trait
when we write or rewrite a `packed-refs` file. This means that the
scan described in the previous paragraph should only have to be done
for `packed-refs` files that were written by older versions of the Git
command-line client, or by other clients that haven't yet learned to
write the `sorted` trait.
If `packed-refs` was already sorted, then (if the system allows it) we
can use the mmapped file contents directly. But if the system doesn't
allow a file that is currently mmapped to be replaced using
`rename()`, then it would be bad for us to keep the file mmapped for
any longer than necessary. So, on such systems, always make a copy of
the file contents, either as part of the sorting process, or
afterwards.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep a copy of the `packed-refs` file contents in memory for as long
as a `packed_ref_cache` object is in use:
* If the system allows it, keep the `packed-refs` file mmapped.
* If not (either because the system doesn't support `mmap()` at all,
or because a file that is currently mmapped cannot be replaced via
`rename()`), then make a copy of the file's contents in
heap-allocated space, and keep that around instead.
We base the choice of behavior on a new build-time switch,
`MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows
variants.
After this commit, `MMAP_NONE` and `MMAP_TEMPORARY` are still handled
identically. But the next commit will introduce a difference.
This whole change is still pointless, because we only read the
`packed-refs` file contents immediately after instantiating the
`packed_ref_cache`. But that will soon change.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No code has been changed. This will make subsequent patches more
self-contained.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a reference is broken, suppress its peeled value.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new `mmapped_ref_iterator`, which can iterate over the
references in an mmapped `packed-refs` file directly. Use this
iterator from `read_packed_refs()` to fill the packed refs cache.
Note that we are not yet willing to promise that the new iterator
generates its output in order. That doesn't matter for now, because
the packed refs cache doesn't care what order it is filled.
This change adds a lot of boilerplate without providing any obvious
benefits. The benefits will come soon, when we get rid of the
`ref_cache` for packed references altogether.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than store the peeling state (i.e., the one defined by traits
in the `packed-refs` file header line) in a local variable in
`read_packed_refs()`, store it permanently in `packed_ref_cache`. This
will be needed when we stop reading all packed refs at once.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of copying data from the `packed-refs` file one line at time
and then processing it, process the data in place as much as possible.
Also, instead of processing one line per iteration of the main loop,
process a reference line plus its corresponding peeled line (if
present) together.
Note that this change slightly tightens up the parsing of the
`packed-refs` file. Previously, the parser would have accepted
multiple "peeled" lines for a single reference (ignoring all but the
last one). Now it would reject that.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.
* jk/write-in-full-fix:
read_pack_header: handle signed/unsigned comparison in read result
config: flip return value of store_write_*()
notes-merge: use ssize_t for write_in_full() return value
pkt-line: check write_in_full() errors against "< 0"
convert less-trivial versions of "write_in_full() != len"
avoid "write_in_full(fd, buf, len) != len" pattern
get-tar-commit-id: check write_in_full() return against 0
config: avoid "write_in_full(fd, buf, len) < len" pattern
This allows us to get rid of two write-only variables, one of them
being a SHA1 buffer.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement transactional update to the packed-ref representation of
references.
* mh/packed-ref-transactions:
files_transaction_finish(): delete reflogs before references
packed-backend: rip out some now-unused code
files_ref_store: use a transaction to update packed refs
t1404: demonstrate two problems with reference transactions
files_initial_transaction_commit(): use a transaction for packed refs
prune_refs(): also free the linked list
files_pack_refs(): use a reference transaction to write packed refs
packed_delete_refs(): implement method
packed_ref_store: implement reference transactions
struct ref_transaction: add a place for backends to store data
packed-backend: don't adjust the reference count on lock/unlock
The long-standing rule that an in-core lockfile instance, once it
is used, must not be freed, has been lifted and the lockfile and
tempfile APIs have been updated to reduce the chance of programming
errors.
* jk/incore-lockfile-removal:
stop leaking lock structs in some simple cases
ref_lock: stop leaking lock_files
lockfile: update lifetime requirements in documentation
tempfile: auto-allocate tempfiles on heap
tempfile: remove deactivated list entries
tempfile: use list.h for linked list
tempfile: release deactivated strbufs instead of resetting
tempfile: robustify cleanup handler
tempfile: factor out deactivation
tempfile: factor out activation
tempfile: replace die("BUG") with BUG()
tempfile: handle NULL tempfile pointers gracefully
tempfile: prefer is_tempfile_active to bare access
lockfile: do not rollback lock on failed close
tempfile: do not delete tempfile on failed close
always check return value of close_tempfile
verify_signed_buffer: prefer close_tempfile() to close()
setup_temporary_shallow: move tempfile struct into function
setup_temporary_shallow: avoid using inactive tempfile
write_index_as_tree: cleanup tempfile on error
"git gc" and friends when multiple worktrees are used off of a
single repository did not consider the index and per-worktree refs
of other worktrees as the root for reachability traversal, making
objects that are in use only in other worktrees to be subject to
garbage collection.
* nd/prune-in-worktree:
refs.c: reindent get_submodule_ref_store()
refs.c: remove fallback-to-main-store code get_submodule_ref_store()
rev-list: expose and document --single-worktree
revision.c: --reflog add HEAD reflog from all worktrees
files-backend: make reflog iterator go through per-worktree reflog
revision.c: --all adds HEAD from all worktrees
refs: remove dead for_each_*_submodule()
refs.c: move for_each_remote_ref_submodule() to submodule.c
revision.c: use refs_for_each*() instead of for_each_*_submodule()
refs: add refs_head_ref()
refs: move submodule slash stripping code to get_submodule_ref_store
refs.c: refactor get_submodule_ref_store(), share common free block
revision.c: --indexed-objects add objects from all worktrees
revision.c: refactor add_index_objects_to_pending()
refs.c: use is_dir_sep() in resolve_gitlink_ref()
revision.h: new flag in struct rev_info wrt. worktree-related refs
A leakfix.
* ma/split-symref-update-fix:
refs/files-backend: add `refname`, not "HEAD", to list
refs/files-backend: correct return value in lock_ref_for_update
refs/files-backend: fix memory leak in lock_ref_for_update
refs/files-backend: add longer-scoped copy of string to list
The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).
So instead, use `string_list_split_in_place()` to split the traits
into tokens then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing
space (or indeed, if it is missing the space after the colon, or if it
has multiple spaces somewhere).
However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's still done in a pretty stupid way, involving more data copying
than necessary. That will improve in future commits.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract some helper functions for reporting errors. While we're at it,
prevent them from spewing unlimited output to the terminal. These
functions will soon have more callers.
These functions accept the problematic line as a `(ptr, len)` pair
rather than a NUL-terminated string, and `die_invalid_line()` checks
for an EOL itself, because these calling conventions will be
convenient for future callers. (Efficiency is not a concern here
because these functions are only ever called if the `packed-refs` file
is corrupt.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the underlying iterator is ordered, then `prefix_ref_iterator` can
stop as soon as it sees a refname that comes after the prefix. This
will rarely make a big difference now, because `ref_cache_iterator`
only iterates over the directory containing the prefix (and usually
the prefix will span a whole directory anyway). But if *hint, hint* a
future reference backend doesn't itself know where to stop the
iteration, then this optimization will be a big win.
Note that there is no guarantee that the underlying iterator doesn't
include output preceding the prefix, so we have to skip over any
unwanted references before we get to the ones that we want.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
References are iterated over in order by refname, but reflogs are not.
Some consumers of reference iteration care about the difference. Teach
each `ref_iterator` to keep track of whether its output is ordered.
`overlay_ref_iterator` is one of the picky consumers. Add a sanity
check in `overlay_ref_iterator_begin()` to verify that its inputs are
ordered.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prior commit converted many sites to check the return
value of write_in_full() for negativity, rather than a
mismatch with the input length. This patch covers similar
cases, but where the return value is stored in an
intermediate variable. These should get the same treatment,
but they need to be reviewed more carefully since it would
be a bug if the return value is stored in an unsigned type
(which indeed, it is in one of the cases).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value of write_in_full() is either "-1", or the
requested number of bytes[1]. If we make a partial write
before seeing an error, we still return -1, not a partial
value. This goes back to f6aa66cb95 (write_in_full: really
write in full or return error on disk full., 2007-01-11).
So checking anything except "was the return value negative"
is pointless. And there are a couple of reasons not to do
so:
1. It can do a funny signed/unsigned comparison. If your
"len" is signed (e.g., a size_t) then the compiler will
promote the "-1" to its unsigned variant.
This works out for "!= len" (unless you really were
trying to write the maximum size_t bytes), but is a
bug if you check "< len" (an example of which was fixed
recently in config.c).
We should avoid promoting the mental model that you
need to check the length at all, so that new sites are
not tempted to copy us.
2. Checking for a negative value is shorter to type,
especially when the length is an expression.
3. Linus says so. In d34cf19b89 (Clean up write_in_full()
users, 2007-01-11), right after the write_in_full()
semantics were changed, he wrote:
I really wish every "write_in_full()" user would just
check against "<0" now, but this fixes the nasty and
stupid ones.
Appeals to authority aside, this makes it clear that
writing it this way does not have an intentional
benefit. It's a historical curiosity that we never
bothered to clean up (and which was undoubtedly
cargo-culted into new sites).
So let's convert these obviously-correct cases (this
includes write_str_in_full(), which is just a wrapper for
write_in_full()).
[1] A careful reader may notice there is one way that
write_in_full() can return a different value. If we ask
write() to write N bytes and get a return value that is
_larger_ than N, we could return a larger total. But
besides the fact that this would imply a totally broken
version of write(), it would already invoke undefined
behavior. Our internal remaining counter is an unsigned
size_t, which means that subtracting too many byte will
wrap it around to a very large number. So we'll instantly
begin reading off the end of the buffer, trying to write
gigabytes (or petabytes) of data.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier patch rewrote `split_symref_update()` to add a copy of a
string to a string list instead of adding the original string. That was
so that the original string could be freed in a later patch, but it is
also conceptually cleaner, since now all calls to `string_list_insert()`
and `string_list_append()` add `update->refname`. --- Except a literal
"HEAD" is added in `split_head_update()`.
Restructure `split_head_update()` in the same way as the earlier patch
did for `split_symref_update()`. This does not correct any practical
problem, but makes things conceptually cleaner. The downside is a call
to `string_list_has_string()`, which should be relatively cheap.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In one code path we return a literal -1 and not a symbolic constant. The
value -1 would be interpreted as TRANSACTION_NAME_CONFLICT, which is
wrong. Use TRANSACTION_GENERIC_ERROR instead (that is the only other
return value we have to choose from).
Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After the previous patch, none of the functions we call hold on to
`referent.buf`, so we can safely release the string buffer before
returning.
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
split_symref_update() receives a string-pointer `referent` and adds it
to the list of `affected_refnames`. The list simply holds on to the
pointers it is given, it does not copy the strings and it does not ever
free them. The `referent` string in split_symref_update() belongs to a
string buffer in the caller. After we return, the string will be leaked.
In the next patch, we want to properly release the string buffer in the
caller, but we can't safely do so until we've made sure that
`affected_refnames` will not be holding on to a pointer to the string.
We could configure the list to handle its own resources, but it would
mean some alloc/free-churning. The list is already handling other
strings (through other code paths) which we do not need to worry about,
and we'd be memory-churning those strings too, completely unnecessary.
Observe that split_symref_update() creates a `new_update`-object through
ref_transaction_add_update(), after which `new_update->refname` is a
copy of `referent`. The difference is, this copy will be freed, and it
will be freed *after* `affected_refnames` has been cleared.
Rearrange the handling of `referent`, so that we don't add it directly
to `affected_refnames`. Instead, first just check whether `referent`
exists in the string list, and later add `new_update->refname`.
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now the outside world interacts with the packed ref store only via the
generic refs API plus a few lock-related functions. This allows us to
delete some functions that are no longer used, thereby completing the
encapsulation of the packed ref store.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.
This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the two
revealed by test cases added in the previous commit.
First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire the
`packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are supposed
to be detected in the "prepare" phase). The new code acquires the
`packed-refs` lock in `files_transaction_prepare()`, the same stage of
the processing when the loose reference locks are being acquired,
removing another reason why the "prepare" phase might succeed and the
"finish" phase might nevertheless fail.
Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)
Third, if a reference deletion fails to acquire the `packed-refs` lock
altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.
Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference in
the time gap, which would otherwise defeat our attempt to delete it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a `packed_ref_store` transaction in the implementation of
`files_initial_transaction_commit()` rather than using internal
features of the packed ref store. This further decouples
`files_ref_store` from `packed_ref_store`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At least since v1.7, the elements of the `refs_to_prune` linked list
have been leaked. Fix the leak by teaching `prune_refs()` to free the
list elements as it processes them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the packed reference store supports transactions, we can use
a transaction to write the packed versions of references that we want
to pack. This decreases the coupling between `files_ref_store` and
`packed_ref_store`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement `packed_delete_refs()` using a reference transaction. This
means that `files_delete_refs()` can use `refs_delete_refs()` instead
of `repack_without_refs()` to delete any packed references, decreasing
the coupling between the classes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>