Make repository clean-up tasks "gc" can do available to "git
maintenance" front-end.
* ps/maintenance-missing-tasks:
builtin/maintenance: introduce "rerere-gc" task
builtin/gc: move rerere garbage collection into separate function
builtin/maintenance: introduce "worktree-prune" task
builtin/gc: move pruning of worktrees into a separate function
builtin/gc: remove global variables where it is trivial to do
builtin/gc: fix indentation of `cmd_gc()` parameters
As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
case `branch_has_merge_config()` would return early, but we can even
avoid enumerating the refs prefixes in that case, saving even more CPU
cycles.
Technically, we should enclose these two statements in an `if (branch)
{...}` block, but the indentation is already quite deep, therefore I
refrained from doing that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs. Yet this is
exactly what happens with the `acked_commits` attribute that is used in
`cmd_fetch()`: The pointer to a local variable is assigned to it.
Now, it is Git's convention that `cmd_*()` functions are essentially
only returning just before exiting the process, therefore there is
little danger that this attribute is used after the code flow returns
from that function.
However, code in `cmd_*()` function is often so useful that it gets
lifted into a library function, at which point this issue could become a
real problem.
Let's make sure to clear the `acked_commits` attribute out after it was
used, and before the function returns (at which point the address would
go stale).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The difference of two unsigned integers is defined to be unsigned, and
therefore it is misleading to check whether it is greater than zero
(instead, the more natural way would be to check whether the difference
is zero or not).
Let's instead avoid the subtraction altogether, and compare the two
operands directly, which makes the code more obvious as a side effect.
Pointed out by CodeQL's rule with the ID
`cpp/unsigned-difference-expression-compared-zero`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the_repository everywhere with repo, feed repo from cmd_replay()
to all the other functions in the file that need it, and remove the
UNUSED annotation on repo.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we made "git whatchanged" require "--i-still-use-this" and asked
the users to report if they still want to use it, the logical next
step is to allow us build Git without "whatchanged" to prepare for
its eventual removal.
If we were to follow the pattern established in 8ccc75c2 (remote:
announce removal of "branches/" and "remotes/", 2025-01-22), we can
do this together with the documentation update to officially list
that the command will be removed in the BreakingChanges document,
but let's just keep the changes separate just in case we want to
proceed a bit slower.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation of "git whatchanged" is pretty explicit that the
command was retained for historical reasons to help those whose fingers
cannot be retrained. Let's see if they still are finding it hard to
type "git log --raw" instead of "git whatchanged" by marking the
command as "nominated for removal", and require "--i-still-use-this"
on the command line. Adjust the tests so that the option is passed
when we invoke the command. In addition, we test that the command
fails when "--i-still-use-this" is not given.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git index-pack --fix-thin" used to abort to prevent a cycle in
delta chains from forming in a corner case even when there is no
such cycle.
* ds/fix-thin-fix:
index-pack: allow revisiting REF_DELTA chains
t5309: create failing test for 'git index-pack'
test-tool: add pack-deltas helper
Further code clean-up in the object-store layer.
* ps/object-store-cleanup:
object-store: drop `repo_has_object_file()`
treewide: convert users of `repo_has_object_file()` to `has_object()`
object-store: allow fetching objects via `has_object()`
object-store: move function declarations to their respective subsystems
object-store: move and rename `odb_pack_keep()`
object-store: drop `loose_object_path()`
object-store: move `struct packed_git` into "packfile.h"
Commands slated for removal like "git pack-redundant" now require
an explicit "--i-still-use-this" option to run. This is to
discourage casual use and surface their pending deprecation to
users.
The warning message is long, so factor it into a helper function
you_still_use_that() to simplify reuse by other commands.
Also add a missing test to ensure this enforcement works for
"pack-redundant".
Helped-by: Elijah Newren <newren@gmail.com>
[en: log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function does not free the oidmap struct itself; it just drops all
items from the map (using hashmap_clear_() internally). It should be
called oidmap_clear(), per CodingGuidelines.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/am.c:split_mail_stgit_series, if `fopen` failed,
`series_dir_buf` allocated by `xstrdup` will leak. Add `free` in
`!fp` if branch will prevent the leak.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv a a/b dst" would ask to move the directory 'a' itself, as
well as its contents, in a single destination directory, which is
a contradicting request that is impossible to satisfy. This case is
now detected and the command errors out.
* ps/mv-contradiction-fix:
builtin/mv: convert assert(3p) into `BUG()`
builtin/mv: bail out when trying to move child and its parent
While git-gc(1) knows to garbage collect the rerere cache,
git-maintenance(1) does not yet have a task for this cleanup. Introduce
a new "rerere-gc" task to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we are going to introduce a new "rerere-gc" task
for git-maintenance(1). To prepare for this, refactor the code that
spawns `git rerere gc` into a separate function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-gc(1) knows to prune stale worktrees, git-maintenance(1) does
not yet have a task for this cleanup. Introduce a new "worktree-prune"
task to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we will introduce a new "worktree-prune" task for
git-maintenance(1). To prepare for this, refactor the code that spawns
`git worktree prune` into a separate function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use a couple of global variables to assemble command line arguments
for subprocesses we execute in git-gc(1). All of these variables except
the one for git-repack(1) are only used in a single place though, so
they don't really add anything but confusion.
Remove those variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parameters of `cmd_gc()` aren't indented properly. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of asserts is discouraged in our codebase because they lead to
different behaviour depending on how Git is built. When being unsure
enough whether a condition always holds so that one adds the assert,
then the assert should probably trigger regardless of how Git is being
built.
Drop the call to assert(3p) in git-mv(1) and instead use `BUG()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a known issue in git-mv(1) where moving both a child and any of
its parents causes an assert to trigger because the child cannot be
found anymore in the index. We have added a test for this in commit
0fcd473fdd (t7001: add failure test which triggers assertion,
2024-10-22) without addressing the issue, which is why the test itself
is marked as `test_expect_failure`.
The behaviour of that test relies on a call to assert(3p) though, which
may or may not be compiled into the resulting binary depending on
whether or not we pass `-DNDEBUG`. When these asserts are compiled into
Git this may cause our CI to hang on Windows though, because asserts may
cause a modal window to be shown.
While we could work around the issue by converting this into a call to
`BUG()`, let's rather address the root cause of the issue by bailing out
in case we see that both a child and any of its parents are being moved
in the same command.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since a call to repo_config() can be called with repo set to NULL
these days, a command that is marked as RUN_SETUP in the builtin
command table does not have to check repo with NULL before making
the call.
* ua/call-repo-config-with-possibly-null-repository:
builtin/difftool: remove unnecessary if statement
builtin/add: remove unnecessary if statement
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:
- The new function has a short-and-sweet name.
- More explicit defaults: `has_object()` doesn't fetch missing objects
via promisor remotes, and neither does it reload packfiles if an
object wasn't found by default. This ensures that it becomes
immediately obvious when a simple object existence check may result
in expensive actions.
Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.
Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:
- `repo_has_object_file(...)` is equivalent to
`has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., 0)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.
The replacements should be functionally equivalent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pack_keep()` creates a file at the passed-in path. If
this fails, then the function re-tries by first creating any potentially
missing leading directories and then trying to create the file once
again. As such, this function doesn't host any kind of logic that is
specific to the object store, but is rather a generic helper function.
Rename the function to `safe_create_file_with_leading_directories()` and
move it into "path.c". While at it, refactor it so that it loses its
dependency on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.
This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.
The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.
Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.
However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.
This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.
This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().
The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.
The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various build tweaks, including CSPRNG selection on some platforms.
* rj/build-tweaks:
config.mak.uname: set CSPRNG_METHOD to getrandom on Linux
config.mak.uname: add arc4random to the cygwin build
config.mak.uname: add sysinfo() configuration for cygwin
builtin/gc.c: correct RAM calculation when using sysinfo
config.mak.uname: add clock_gettime() to the cygwin build
config.mak.uname: add HAVE_GETDELIM to the cygwin section
config.mak.uname: only set NO_REGEX on cygwin for v1.7
config.mak.uname: add a note about NO_STRLCPY for Linux
Makefile: remove NEEDS_LIBRT build variable
meson.build: set default help format to html on windows
meson.build: only set build variables for non-default values
Makefile: only set some BASIC_CFLAGS when RUNTIME_PREFIX is set
meson.build: remove -DCURL_DISABLE_TYPECHECK
Update parse-options API to catch mistakes to pass address of an
integral variable of a wrong type/size.
* ps/parse-options-integers:
parse-options: detect mismatches in integer signedness
parse-options: introduce precision handling for `OPTION_UNSIGNED`
parse-options: introduce precision handling for `OPTION_INTEGER`
parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
parse-options: support unit factors in `OPT_INTEGER()`
global: use designated initializers for options
parse: fix off-by-one for minimum signed values
Code clean-up.
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
Doc mark-up updates.
* ja/doc-reset-mv-rm-markup-updates:
doc: add markup for characters in Guidelines
doc: fix asciidoctor synopsis processing of triple-dots
doc: convert git-mv to new documentation format
doc: move synopsis git-mv commands in the synopsis section
doc: convert git-rm to new documentation format
doc: fix synopsis analysis logic
doc: convert git-reset to new documentation format
When using the launchctl scheduler, the weekly job runs daily, and the
daily job runs on the first six days of each month. This appears to be
due to specifying "Day" in the calendar intervals, which according to
launchd.plist(5) is for specifying days of the month rather than days of
the week. The behaviour of running a job on the 0th day is undocumented,
but in my testing appears to be the same as not specifying "Day" in the
calendar interval, in which case the job will run daily.
Use "Weekday" in the calendar intervals, which is the correct way to
schedule jobs to run on specific days of the week.
Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in "f29f1990b5
(config: teach repo_config to allow `repo` to be NULL, 2025-03-08)"
to allow `repo` to be NULL, no need to check if `repo` is NULL
before calling `repo_config()`.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in "f29f1990b5
(config: teach repo_config to allow `repo` to be NULL, 2025-03-08)"
to allow `repo` to be NULL, no need to check if `repo` is NULL
before calling `repo_config()`.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove remnants of the recursive merge strategy backend, which was
superseded by the ort merge strategy.
* en/merge-recursive-debug:
builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm
merge-recursive.[ch]: thoroughly debug these
merge, sequencer: switch recursive merges over to ort
sequencer: switch non-recursive merges over to ort
merge-ort: enable diff-algorithms other than histogram
builtin/merge-recursive: switch to using merge_ort_generic()
checkout: replace merge_trees() with merge_ort_nonrecursive()
"git blame --porcelain" mode now talks about unblamable lines and
lines that are blamed to an ignored commit.
* kn/blame-porcelain-unblamable:
blame: print unblamable and ignored commits in porcelain mode
"git fetch [<remote>]" with only the configured fetch refspec
should be the only thing to update refs/remotes/<remote>/HEAD,
but the code was overly eager to do so in other cases.
* jk/fetch-follow-remote-head-fix:
fetch: make set_head() call easier to read
fetch: don't ask for remote HEAD if followRemoteHEAD is "never"
fetch: only respect followRemoteHEAD with configured refspecs
It was reported that "t5620-backfill.sh" fails on s390x and sparc64 in a
test that exercises the "--min-batch-size" command line option. The
symptom was that the option didn't seem to have an effect: we didn't
fetch objects with a batch size of 20, but instead fetched all objects
at once.
As it turns out, the root cause is that `--min-batch-size` uses
`OPT_INTEGER()` to parse the command line option. While this macro
expects the caller to pass a pointer to an integer, we instead pass a
pointer to a `size_t`. This coincidentally works on most platforms, but
it breaks apart on the mentioned platforms because they are big endian.
This issue isn't specific to git-backfill(1): there are a couple of
other places where we have the same type confusion going on. This
indicates that the issue really is the interface that the parse-options
subsystem provides -- it is simply too easy to get this wrong as there
isn't any kind of compiler warning, and things just work on the most
common systems.
Address the systemic issue by introducing two new build asserts
`BARF_UNLESS_SIGNED()` and `BARF_UNLESS_UNSIGNED()`. As the names
already hint at, those macros will cause a compiler error when passed a
value that is not signed or unsigned, respectively.
Adapt `OPT_INTEGER()`, `OPT_UNSIGNED()` as well as `OPT_MAGNITUDE()` to
use those asserts. This uncovers a small set of sites where we indeed
have the same bug as in git-backfill(1). Adapt all of them to use the
correct option.
Reported-by: Todd Zullinger <tmz@pobox.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPTION_INTEGER` option type accepts a signed integer. The type of
the underlying integer is a simple `int`, which restricts the range of
values accepted by such options. But there is a catch: because the
caller provides a pointer to the value via the `.value` field, which is
a simple void pointer. This has two consequences:
- There is no check whether the passed value is sufficiently long to
store the entire range of `int`. This can lead to integer wraparound
in the best case and out-of-bounds writes in the worst case.
- Even when a caller knows that they want to store a value larger than
`INT_MAX` they don't have a way to do so.
In practice this doesn't tend to be a huge issue because users typically
don't end up passing huge values to most commands. But the parsing logic
is demonstrably broken, and it is too easy to get the calling convention
wrong.
Improve the situation by introducing a new `precision` field into the
structure. This field gets assigned automatically by `OPT_INTEGER_F()`
and tracks the size of the passed value. Like this it becomes possible
for the caller to pass arbitrarily-sized integers and the underlying
logic knows to handle it correctly by doing range checks. Furthermore,
convert the code to use `strtoimax()` intstead of `strtol()` so that we
can also parse values larger than `LONG_MAX`.
Note that we do not yet assert signedness of the passed variable, which
is another source of bugs. This will be handled in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit, `OPT_INTEGER()` has learned to support unit
factors. Consequently, the major differencen between `OPT_INTEGER()` and
`OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of
them do support them now. Instead, the difference is that one handles
signed and the other handles unsigned integers.
Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to
`OPT_UNSIGNED()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we expose macros for most of our different option types understood
by the "parse-options" subsystem, not every combination of fields that
has one as that would otherwise quickly lead to an explosion of macros.
Instead, we just initialize structures manually for those variants of
fields that don't have a macro.
Callsites that open-code these structure initialization don't use
designated initializers though and instead just provide values for each
of the fields that they want to initialize. This has three significant
downsides:
- Callsites need to specify all values up to the last field that they
care about. This often includes fields that should simply be left at
their default zero-initialized state, which adds distraction.
- Any reader not deeply familiar with the layout of the structure
has a hard time figuring out what the respective initializers mean.
- Reordering or introducing new fields in the middle of the structure
is impossible without adapting all callsites.
Convert all sites to instead use designated initializers, which we have
started using in our codebase quite a while ago. This allows us to skip
any default-initialized fields, gives the reader context by specifying
the field names and allows us to reorder or introduce new fields where
we want to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The man page for sysinfo(2) on Linux states that (from v2.3.48) the
sizes of the memory and swap fields, of the returned structure, are
given as multiples of 'mem_unit' bytes. In earlier versions (prior to
v2.3.23 on i386 in particular), the 'mem_unit' field was not part of
the structure, and all sizes were measured in bytes. The man page does
not discuss the motivation for this change, but it is possible that the
change was intended for the, relatively rare, 32-bit platform with more
than 4GB of memory.
The total_ram() function makes the assumption that the 'totalram' field
of the 'struct sysinfo' is measured in bytes, or alternatively that the
'mem_unit' field is always equal to one. Having writen a program to call
the sysinfo() function and print the structure fields, it seems that, on
Linux x84_64 and i686 anyway, the 'mem_unit' field is indeed set to one
(note that the 32-bit system had only 2GB ram). However, cygwin also has
an sysinfo() implementation, which gives the following values:
$ ./sysinfo
uptime: 21381
loads: 0, 0, 0
total ram: 2074637
free ram: 843237
shared ram: 0
buffer ram: 0
total swap: 327680
free swap: 306932
procs: 15
total high: 0
free high: 0
mem_unit: 4096
total ram: 8497713152
$
[This laptop has 8GB ram, so a little bit seems to be missing. ;) ]
Modify the total_ram() function to allow for the possibility that the
memory size is not specified in bytes (ie 'mem_unit' is greater than
one).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file --batch" and friends learned to allow "--filter=" to
omit certain objects, just like the transport layer does.
* ps/cat-file-filter-batch:
builtin/cat-file: use bitmaps to efficiently filter by object type
builtin/cat-file: deduplicate logic to iterate over all objects
pack-bitmap: introduce function to check whether a pack is bitmapped
pack-bitmap: add function to iterate over filtered bitmapped objects
pack-bitmap: allow passing payloads to `show_reachable_fn()`
builtin/cat-file: support "object:type=" objects filter
builtin/cat-file: support "blob:limit=" objects filter
builtin/cat-file: support "blob:none" objects filter
builtin/cat-file: wire up an option to filter objects
builtin/cat-file: introduce function to report object status
builtin/cat-file: rename variable that tracks usage
Updating multiple references have only been possible in all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner. A new "best effort batches of updates" mode
has been introduced.
* kn/non-transactional-batch-updates:
update-ref: add --batch-updates flag for stdin mode
refs: support rejection in batch updates during F/D checks
refs: implement batch reference update support
refs: introduce enum-based transaction error types
refs/reftable: extract code from the transaction preparation
refs/files: remove duplicate duplicates check
refs: move duplicate refname update check to generic layer
refs/files: remove redundant check in split_symref_update()
"git rev-list" learns machine-parsable output format that delimits
each field with NUL.
* jt/rev-list-z:
rev-list: support NUL-delimited --missing option
rev-list: support NUL-delimited --boundary option
rev-list: support delimiting objects with NUL bytes
rev-list: refactor early option parsing
rev-list: inline `show_object_with_name()` in `show_object()`
Code clean-up.
* js/comma-semicolon-confusion:
detect-compiler: detect clang even if it found CUDA
clang: warn when the comma operator is used
compat/regex: explicitly mark intentional use of the comma operator
wildmatch: avoid using of the comma operator
diff-delta: avoid using the comma operator
xdiff: avoid using the comma operator unnecessarily
clar: avoid using the comma operator unnecessarily
kwset: avoid using the comma operator unnecessarily
rebase: avoid using the comma operator unnecessarily
remote-curl: avoid using the comma operator unnecessarily
"git clone" still gave the message about the default branch name;
this message has been turned into an advice message that can be
turned off.
* jt/clone-guess-remote-head-fix:
advice: allow disabling default branch name advice
builtin/clone: suppress unexpected default branch advice
remote: allow `guess_remote_head()` to suppress advice
The job to coalesce loose objects into packfiles in "git
maintenance" now has configurable batch size.
* ds/maintenance-loose-objects-batchsize:
maintenance: add loose-objects.batchSize config
maintenance: force progress/no-quiet to children
"git reflog" learns "drop" subcommand, that discards the entire
reflog data for a ref.
* kn/reflog-drop:
reflog: implement subcommand to drop reflogs
reflog: improve error for when reflog is not found
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `hash_object_file()`, `write_object_file()` and
`index_fd()` reuse the same set of flags to alter their behaviour. This
not only adds confusion, but given that every function only supports a
subset of the flags it becomes very hard to see which flags can be
passed to what function. Last but not least, this entangles the
implementation of all three function families.
Split up concerns by creating separate flags for each of the function
families.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have the "object-store.h" header, most of the functionality for
object stores is actually hosted in "object-file.c". This makes it hard
to find relevant functions and causes us to mix up concerns.
Split out functions relating to the object store subsystem into a new
"object-store.c" file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `safe_create_leading_directories()` function and its relatives are
located in "object-file.c", which is not a good fit as they provide
generic functionality not related to objects at all. Move them into
"path.c", which already hosts `safe_create_dir()` and its relative
`safe_create_dir_in_gitdir()`.
"path.c" is free of `the_repository`, but the moved functions depend on
`the_repository` to read the "core.sharedRepository" config. Adapt the
function signature to accept a repository as argument to fix the issue
and adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `mkdir_in_gitdir()` function is similar to `safe_create_dir()`, but
the former is hosted in "object-file.c" whereas the latter is hosted in
"path.c". The latter code unit makes way more sense though as the logic
has nothing to do with object files in particular.
Move the file into "path.c". While at it, we:
- Rename the function to `safe_create_dir_in_gitdir()` so that the
function names are similar to one another.
- Remove the dependency on `the_repository` by making the callers pass
the repository instead.
Adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Unfortunately, there's an inconsistency in the synopsis style, where
the ellipsis is used to indicate that the option can be repeated, but
it can also be used in Git's three-dot notation to indicate a range of
commits. The rendering engine will not be able to distinguish
between these two cases.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This also entails changing the help output for the command to match the new
synopsis.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We ignore any error returned from set_head(), but 638060dcb9 (fetch
set_head: refactor to use remote directly, 2025-01-26) left its call in
a noop "if" conditional as a sort of note-to-self.
When c834d1a7ce (fetch: only respect followRemoteHEAD with configured
refspecs, 2025-03-18) added a "do_set_head" flag, it was rolled into the
same conditional, putting set_head() on the right-hand side of a
short-circuit AND.
That's not wrong, but it really hides the point of the line, which
is (maybe) calling the function.
Instead, let's have a full if() block for the flag, and then our comment
(with some rewording) will be sufficient to clarify the error handling.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in f29f1990 (config:
teach repo_config to allow `repo` to be NULL, 2025-03-08) to allow
`repo` to be NULL, no need to check if `repo` is NULL before calling
`repo_config()`.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
This environment variable existed to allow the testsuite to reuse all
the merge-related tests in the testsuite while easily flipping between
the 'recursive' and the 'ort' backends. Now that we have removed
merge-recursive and remapped 'recursive' to mean 'ort', we don't need
this scaffolding anymore. Remove it from these three builtins.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More precisely, replace calls to merge_recursive() with
merge_ort_recursive().
Also change t7615 to quit calling out recursive; it is not needed
anymore, and we are in fact using ort now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t6430: most of the test differences here were due to improved D/F
conflict handling explained in more detail in ef52778708 (merge
tests: expect improved directory/file conflict handling in ort,
2020-10-26). These changes weren't made to this test back in that
commit simply because I had been looking at `git merge` rather than
`git merge-recursive`. The final test in this testsuite, though, was
expunged because it was looking for specific output, and the calls to
output_commit_title() were discarded from merge_ort_internal() in its
adaptation from merge_recursive_internal(); see 8119214f4e
(merge-ort: implement merge_incore_recursive(), 2020-12-16).
* t6434: This test is built entirely around rename/delete conflicts,
which had a suboptimal handling under merge-recursive. As explained
in more detail in commits 1f3c9ba707 ("t6425: be more flexible with
rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404,
t6423: expect improved rename/delete handling in ort backend",
2020-10-26), rename/delete conflicts should each have two entries in
the index rather than just one. Adjust the expectations for all the
tests in this testcase to see the two entries per rename/delete
conflict.
* t6424: merge-recursive had a special check-if-toplevel-trees-match
check that it ran at the beginning on both the merge-base and the
other side being merged in. In such a case, it exited early and
printed an "Already up to date." message. merge-ort got rid of
this, and instead checks the merge base tree matching the other
side throughout the tree instead of just at the toplevel, allowing
it to avoid recursing into various subtrees. As part of that, it
got rid of the specialty toplevel message. That message hasn't
been missed for years from `git merge`, so I don't think it is
necessary to keep it just for `git merge-recursive`, especially
since the latter is rarely used. (git itself only references it
in the testsuite, whereas it used to power one of the three
rebase backends that existed once upon a time.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the use of merge_trees() from merge-recursive.[ch] with the
merge-ort equivalent.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Incrementally updating multi-pack index files.
* tb/incremental-midx-part-2:
midx: implement writing incremental MIDX bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
pack-bitmap.c: keep track of each layer's type bitmaps
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: open and store incremental bitmap layers
pack-revindex: prepare for incremental MIDX bitmaps
Documentation: describe incremental MIDX bitmaps
Documentation: remove a "future work" item from the MIDX docs
When updating multiple references through stdin, Git's update-ref
command normally aborts the entire transaction if any single update
fails. This atomic behavior prevents partial updates. Introduce a new
batch update system, where the updates the performed together similar
but individual updates are allowed to fail.
Add a new `--batch-updates` flag that allows the transaction to continue
even when individual reference updates fail. This flag can only be used
in `--stdin` mode and builds upon the batch update support added to the
refs subsystem in the previous commits. When enabled, failed updates are
reported in the following format:
rejected SP (<old-oid> | <old-target>) SP (<new-oid> | <new-target>) SP <rejection-reason> LF
Update the documentation to reflect this change and also tests to cover
different scenarios where an update could be rejected.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace preprocessor-defined transaction errors with a strongly-typed
enum `ref_transaction_error`. This change:
- Improves type safety and function signature clarity.
- Makes error handling more explicit and discoverable.
- Maintains existing error cases, while adding new error cases for
common scenarios.
This refactoring paves the way for more comprehensive error handling
which we will utilize in the upcoming commits to add batch reference
update support.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, git-maintenance(1) uses the "gc" task to ensure that the
repository is well-maintained. This can be changed, for example by
either explicitly configuring which tasks should be enabled or by using
the "incremental" maintenance strategy. If so, git-maintenance(1) does
not know to expire reflog entries, which is a subtask that git-gc(1)
knows to perform for the user. Consequently, the reflog will grow
indefinitely unless the user manually trims it.
Introduce a new "reflog-expire" task that plugs this gap:
- When running the task directly, then we simply execute `git reflog
expire --all`, which is the same as git-gc(1).
- When running git-maintenance(1) with the `--auto` flag, then we only
run the task in case the "HEAD" reflog has at least N reflog entries
that would be discarded. By default, N is set to 100, but this can
be configured via "maintenance.reflog-expire.auto". When a negative
integer has been provided we always expire entries, zero causes us
to never expire entries, and a positive value specifies how many
entries need to exist before we consider pruning the entries.
Note that the condition for the `--auto` flags is merely a heuristic and
optimized for being fast. This is because `git maintenance run --auto`
will be executed quite regularly, so scanning through all reflogs would
likely be too expensive in many repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to introduce a new task for git-maintenance(1) that knows to
expire reflog entries. The logic will be shared with git-gc(1), which
already knows how to do this.
Pull out the common logic into a separate function so that we can share
the implementation between both builtins.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make functions that are required to manage `reflog_expire_options`
available elsewhere by moving them into "reflog.c" and exposing them in
the corresponding header. The functions will be used in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As described in the preceding commit, the per-reflog expiry dates are
stored in a global pair of variables. Refactor the code so that they are
contained in `struct reflog_expire_options` to make the structure useful
in other contexts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When expiring reflog entries, it is possible to configure expiry dates
that depend on the name of the reflog. This requires us to store a
couple of different expiry dates:
- The default expiry date for reflog entries that aren't otherwise
specified.
- The per-reflog expiry date.
- The currently active set of expiry dates for a given reference.
While the last item is stored in `struct reflog_expire_options`, the
other items aren't, which makes it hard to reuse the structure in other
places.
Refactor the code so that the default expiry date is stored as part of
the structure. The per-reflog expiry dates will be adapted accordingly
in the subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to expose `struct cmd_reflog_expire_cb` via "reflog.h" so
that we can also use this structure in "builtin/gc.c". Once we make it
accessible to a wider scope though it becomes awkwardly named, as it
isn't only useful in the context of a callback. Instead, the function is
containing all kinds of options relevant to whether or not a reflog
entry should be expired.
Rename the structure to `reflog_expire_options` to prepare for this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-blame(1)' command allows users to ignore specific revisions via
the '--ignore-rev <rev>' and '--ignore-revs-file <file>' flags. These
flags are often combined with the 'blame.markIgnoredLines' and
'blame.markUnblamableLines' config options. These config options prefix
ignored and unblamable lines with a '?' and '*', respectively.
However, this option was never extended to the porcelain mode of
'git-blame(1)'. Since the documentation does not indicate this
exclusion, it is a bug.
Fix this by printing 'ignored' and 'unblamable' respectively for the
options when using the porcelain modes.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Toon Claes <toon@iotcl.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it is now possible to filter objects by type, this mechanism is
for now mostly a convenience. Most importantly, we still have to iterate
through the whole packfile to find all objects of a specific type. This
can be prohibitively expensive depending on the size of the packfiles.
It isn't really possible to do better than this when only considering a
packfile itself, as the order of objects is not fixed. But when we have
a packfile with a corresponding bitmap, either because the packfile
itself has one or because the multi-pack index has a bitmap for it, then
we can use these bitmaps to improve the runtime.
While bitmaps are typically used to compute reachability of objects,
they also contain one bitmap per object type that encodes which object
has what type. So instead of reading through the whole packfile(s), we
can use the bitmaps and iterate through the type-specific bitmap.
Typically, only a subset of packfiles will have a bitmap. But this isn't
really much of a problem: we can use bitmaps when available, and then
use the non-bitmap walk for every packfile that isn't covered by one.
Overall, this leads to quite a significant speedup depending on how many
objects of a certain type exist. The following benchmarks have been
executed in the Chromium repository, which has a 50GB packfile with
almost 25 million objects. As expected, there isn't really much of a
change in performance without an object filter:
Benchmark 1: cat-file with no-filter (revision = HEAD~)
Time (mean ± σ): 89.675 s ± 4.527 s [User: 40.807 s, System: 10.782 s]
Range (min … max): 83.052 s … 96.084 s 10 runs
Benchmark 2: cat-file with no-filter (revision = HEAD)
Time (mean ± σ): 88.991 s ± 2.488 s [User: 42.278 s, System: 10.305 s]
Range (min … max): 82.843 s … 91.271 s 10 runs
Summary
cat-file with no-filter (revision = HEAD) ran
1.01 ± 0.06 times faster than cat-file with no-filter (revision = HEAD~)
We still have to scan through all objects as we yield all of them, so
using the bitmap in this case doesn't really buy us anything. What is
noticeable in this benchmark is that we're I/O-bound, not CPU-bound, as
can be seen from the user/system runtimes, which combined are way lower
than the overall benchmarked runtime.
But when we do use a filter we can see a significant improvement:
Benchmark 1: cat-file with filter=object:type=commit (revision = HEAD~)
Time (mean ± σ): 86.444 s ± 4.081 s [User: 36.830 s, System: 11.312 s]
Range (min … max): 80.305 s … 93.104 s 10 runs
Benchmark 2: cat-file with filter=object:type=commit (revision = HEAD)
Time (mean ± σ): 2.089 s ± 0.015 s [User: 1.872 s, System: 0.207 s]
Range (min … max): 2.073 s … 2.119 s 10 runs
Summary
cat-file with filter=object:type=commit (revision = HEAD) ran
41.38 ± 1.98 times faster than cat-file with filter=object:type=commit (revision = HEAD~)
This is because we don't have to scan through all packfiles anymore, but
can instead directly look up relevant objects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pull out a common function that allows us to iterate over all objects in
a repository. Right now the logic is trivial and would only require two
function calls, making this refactoring a bit pointless. But in the next
commit we will iterate on this logic to make use of bitmaps, so this is
about to become a bit more complex.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `show_reachable_fn` callback is used by a couple of functions to
present reachable objects to the caller. The function does not provide a
way for the caller to pass a payload though, which is functionality that
we'll require in a subsequent commit.
Change the callback type to accept a payload and adapt all callsites
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the "object:type=" filter in git-cat-file(1),
which causes us to omit all objects that don't match the provided object
type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the "blob:limit=" filter in git-cat-file(1), which
causes us to omit all blobs that are bigger than a certain size.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the "blob:none" filter in git-cat-file(1), which
causes us to omit all blobs.
Note that this new filter requires us to read the object type via
`oid_object_info_extended()` in `batch_object_write()`. But as we try to
optimize away reading objects from the database the `data->info.typep`
pointer may not be set. We thus have to adapt the logic to conditionally
set the pointer in cases where the filter is given.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In batch mode, git-cat-file(1) enumerates all objects and prints them
by iterating through both loose and packed objects. This works without
considering their reachability at all, and consequently most options to
filter objects as they exist in e.g. git-rev-list(1) are not applicable.
In some situations it may still be useful though to filter objects based
on properties that are inherent to them. This includes the object size
as well as its type.
Such a filter already exists in git-rev-list(1) with the `--filter=`
command line option. While this option supports a couple of filters that
are not applicable to our usecase, some of them are quite a neat fit.
Wire up the filter as an option for git-cat-file(1). This allows us to
reuse the same syntax as in git-rev-list(1) so that we don't have to
reinvent the wheel. For now, we die when any of the filter options has
been passed by the user, but they will be wired up in subsequent
commits.
Further note that the filters that we are about to introduce don't
significantly speed up the runtime of git-cat-file(1). While we can skip
emitting a lot of objects in case they are uninteresting to us, the
majority of time is spent reading the packfile, which is bottlenecked by
I/O and not the processor. This will change though once we start to make
use of bitmaps, which will allow us to skip reading the whole packfile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple callsites that report the status of an object, for
example when the objec tis missing or its name is ambiguous. We're about
to add a couple more such callsites to report on "excluded" objects.
Prepare for this by introducing a new function `report_object_status()`
that encapsulates the functionality.
Note that this function also flushes stdout, which is a requirement so
that request-response style batched modes can learn about the status
before proceeding to the next object. We already flush correctly at all
existing callsites, even though the flush in `batch_one_object()` only
comes after the switch statement. That flush is now redundant, and we
could in theory deduplicate it by moving it into all branches that don't
use `report_object_status()`. But that doesn't quite feel sensible:
- The duplicate flush should ultimately just be a no-op for us and
thus shouldn't impact performance significantly.
- By keeping the flush in `report_object_status()` we ensure that all
future callers get semantics correct.
So let's just be pragmatic and live with the duplicated flush.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usage strings for git-cat-file(1) that we pass to `parse_options()`
and `usage_msg_optf()` are stored in a variable called `usage`. This
variable shadows the declaration of `usage()`, which we'll want to use
in a subsequent commit.
Rename the variable to `builtin_catfile_usage`, which is in line with
how the variable is typically called in other builtins.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.
* jh/hash-init-fixes:
index-pack, unpack-objects: restore missing ->init_fn
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.
get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
pos to make it descriptive, now that it is a function argument.
Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.
* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
In 199f44cb2e (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.
A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.
Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.
Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.
Add a new config option that allows this limit to be adjusted or ignored
entirely.
While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.
The pass through of --no-quiet will be useful in a test in the next
change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the reference transaction commit phase, the transaction is
set to a closed state regardless of whether it was successful of not.
Attempting to abort a closed transaction via `ref_transaction_abort()`
results in a `BUG()`.
In c92abe71df (builtin/fetch: fix leaking transaction with `--atomic`,
2024-08-22), logic to free a transaction after the commit phase is moved
to the centralized exit path. In cases where the transaction commit
failed, this results in a closed transaction being aborted and signaling
a bug.
Free the transaction and set it to NULL when the commit fails. This
allows the exit path to correctly handle the error without attempting to
abort the transaction.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:
$ git rev-list --missing=print-info <rev>
?<oid> [SP <token>=<value>]... LF
Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --missing=print-info <rev>
<oid> NUL missing=yes NUL [<token>=<value> NUL]...
In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:
$ git rev-list --boundary <rev>
-<oid> LF
Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --boundary <rev>
<oid> NUL boundary=yes NUL
In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:
$ git rev-list --objects <rev>
<tree/blob oid> SP [<path>] LF
Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.
Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:
$ git rev-list -z --objects <rev>
<oid> NUL [path=<path> NUL]
In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.
For now, the `--objects` flag is the only options that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.
Refactor the code to parse both options in a single early pass.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For similar reasons as in the previous refactoring of `refspec_init()`
into `refspec_init_fetch()` and `refspec_init_push()`, apply the same
refactoring to `refspec_item_init()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two callers of this function, which ensures that a dispatched
call to refspec_item_init() does not fail.
In the following commit, we're going to add fetch/push-specific variants
of refspec_item_init(), which will turn one function into two. To avoid
introducing yet another pair of new functions (such as
refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()),
let's remove the thin wrapper entirely.
This duplicates a single line of code among two callers, but thins the
refspec.h API by one function, and prevents introducing two more in the
following commit.
Note that we still have a trailing Boolean argument in the function
`refspec_item_init()`. The following commit will address this.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 6d4c057859 (refspec: introduce struct refspec, 2018-05-16), we
have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly
suggests that we might introduce other modes in the future, which, while
possible, is highly unlikely.
But these values are treated as a Boolean, and stored in a struct field
called 'fetch'. So the following:
if (refspec->fetch == REFSPEC_FETCH) { ... }
, and
if (refspec->fetch) { ... }
are equivalent. Let's avoid renaming the Boolean values "true" and
"false" here and remove the two REFSPEC_ macros mentioned above.
Since this value is truly a Boolean and will only ever take on a value
of 0 or 1, we can declare it as a single bit unsigned field. In
practice this won't shrink the size of 'struct refspec', but it more
clearly indicates the intent.
Note that this introduces some awkwardness like:
refspec_item_init_or_die(&spec, refspec, 1);
, where it's unclear what the final "1" does. This will be addressed in
the following commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers")
accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1.
This fixes the following error on fetch:
fatal: fetch-pack: invalid index-pack output
Signed-off-by: Jensen Huang <hmz007@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are going to consider updating the refs/remotes/*/HEAD symref,
we have to ask the remote side where its HEAD points. But if we know
that the feature is disabled by config, we don't need to bother!
This saves a little bit of work and network communication for the
server. And even a little bit of effort on the client, as our local
set_head() function did a bit of work matching the remote HEAD before
realizing that we're not going to do anything with it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new followRemoteHEAD feature is triggered for almost every fetch,
causing us to ask the server about the remote "HEAD" and to consider
updating our local tracking HEAD symref. This patch limits the feature
only to the case when we are fetching a remote using its configured
refspecs (typically into its refs/remotes/ hierarchy). There are two
reasons for this.
One is efficiency. E.g., the fixes in 6c915c3f85 (fetch: do not ask for
HEAD unnecessarily, 2024-12-06) and 20010b8c20 (fetch: avoid ls-refs
only to ask for HEAD symref update, 2025-03-08) were aimed at reducing
the work we do when we would not be able to update HEAD anyway. But they
do not quite cover all cases. The remaining one is:
git fetch origin refs/heads/foo:refs/remotes/origin/foo
which _sometimes_ can update HEAD, but usually not. And that leads us to
the second point, which is being simple and explainable.
The code for updating the tracking HEAD symref requires both that we
learned which ref the remote HEAD points at, and that the server
advertised that ref to us. But because the v2 protocol narrows the
server's advertisement, the command above would not typically update
HEAD at all, unless it happened to point to the "foo" branch. Or even
weirder, it probably _would_ update if the server is very old and
supports only the v0 protocol, which always gives a full advertisement.
This creates confusing behavior for the user: sometimes we may try to
update HEAD and sometimes not, depending on vague rules.
One option here would be to loosen the update code to accept the remote
HEAD even if the server did not advertise that ref. I think that could
work, but it may also lead to interesting corner cases (e.g., creating a
dangling symref locally, even though the branch is not unborn on the
server, if we happen not to have fetched it).
So let's instead simplify the rules: we'll only consider updating the
tracking HEAD symref when we're doing a full fetch of the remote's
configured refs. This is easy to implement; we can just set a flag at
the moment we realize we're using the configured refspecs. And we can
drop the special case code added by 6c915c3f85 and 20010b8c20, since
this covers those cases. The existing tests from those commits still
pass.
In t5505, an incidental call to "git fetch <remote> <refspec>" updated
HEAD, which caused us to adjust the test in 3f763ddf28 (fetch: set
remote/HEAD if it does not exist, 2024-11-22). We can now adjust that
back to how it was before the feature was added.
Even though t5505 is incidentally testing our new desired behavior,
we'll add an explicit test in t5510 to make sure it is covered.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t4151: This test left an untracked file in the way of the merge.
merge-recursive could only sometimes tell when untracked files were
in the way, and by the time it discovers others, it has already made
too many changes to back out of the merge. So, instead of writing the
results to e.g. 'file1' it would instead write them to
'file1~branch1'. This is confusing for users, because they might not
notice 'file1~branch1' and accidentally add and commit 'file1'.
In contrast, merge-ort correctly notices the file in the way before
making any changes and aborts. Since this test didn't care about the
file in the way, just remove it before calling git-am.
* t4255: Usage of merge-ort allows us to change two known failures into
successes.
* t6427: As noted a few commits ago, the choice of conflict label for
diff3 markers for the ancestor commit was previously handled by
merge-recursive.c rather than by callers. Since that has now changed,
`git am` needs to specify that label. Although the previous conflict
label ("constructed merge base") was already fairly somewhat slanted
towards `git am`, let's use wording more along the lines of the
related command-line flag from `git apply` and function involved to
tie it more closely to `git am`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git-reflog(1)' currently allows users to expire reflogs and
delete individual entries, it lacks functionality to completely remove
reflogs for specific references. This becomes problematic in
repositories where reflogs are not needed but continue to accumulate
entries despite setting 'core.logAllRefUpdates=false'.
Add a new 'drop' subcommand to git-reflog that allows users to delete
the entire reflog for a specified reference. Include an '--all' flag to
enable dropping all reflogs from all worktrees and an addon flag
'--single-worktree', to only drop all reflogs from the current worktree.
While here, remove an extraneous newline in the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git reflog expire' prints the error message '<ref> points nowhere!'
when used with a non-existent ref. This message is a bit confusing and
vague. Modify the message to be more clear and direct.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
stash was modified to use merge_ort_nonrecursive() instead of
merge_recursive_generic() back in commit 874cf2a604 (stash: apply
stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the
inclusion of merge-recursive.h unnecessary. In preparation for the
removal of merge-recursive.h, remove the unnecessary include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.
Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).
But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.
Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.
When this is the case, it is impossible to freshen objects in cruft
pack(s) when those cruft packs are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.
1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
over the packs we're going to retain (which are marked as kept
in-core by 'read_cruft_objects()').
This means that we will avoid enumerating additional packed copies
of objects found in any cruft packs which are larger than the given
size threshold. Thus there is no opportunity to call
'create_object_entry()' whatsoever.
2. We likewise will discard the loose copy (if one exists) of any
unreachable object packed in a cruft pack that is larger than the
threshold. Here our call path is 'add_unreachable_loose_objects()',
which uses the 'add_loose_object()' callback.
That function will eventually land us in 'want_object_in_pack()'
(via 'add_cruft_object_entry()'), and we'll discard the object as it
appears in one of the packs which we marked as kept in-core.
This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.
Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.
In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.
But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when copies of that object either:
- exists in a non-cruft pack that we are retaining, regardless of that
pack's mtime, or
- exists in a cruft pack with an mtime at least as recent as the copy
we are debating whether or not to pack, in which case freshening
would be redundant.
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.
One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:
- The manpage is explicitly structured around object IDs. So if we see
a fully blown object ID, the intent should be quite clear in
general.
- The command is part of our plumbing layer and not a tool that users
would generally use in interactive workflows. As such, the warning
will likely not be visible to anybody in the first place.
- Users can and should use the fully-qualified refname in case there
is any potential for ambiguity. And given that this command is part
of our plumbing layer, one should always try to be as defensive as
possible and use fully-qualified refnames.
Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.
The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36 (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).
Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.
The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 467.3 ms ± 5.1 ms [User: 100.0 ms, System: 365.1 ms]
Range (min … max): 461.9 ms … 479.3 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 394.1 ms ± 5.8 ms [User: 63.3 ms, System: 327.6 ms]
Range (min … max): 384.9 ms … 405.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
And with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 146.9 ms ± 2.2 ms [User: 90.4 ms, System: 56.0 ms]
Range (min … max): 142.7 ms … 150.8 ms 19 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 63.2 ms ± 1.1 ms [User: 41.0 ms, System: 21.8 ms]
Range (min … max): 61.1 ms … 66.6 ms 41 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of Git 3.0, remove the hidden synonym for "--annotate-stdin"
for real. As this does not change the fact that it used to be
called "--stdin" in older version of Git, keep that passage in the
documentation for "--annotate-stdin".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export has a --signed-tags= option that controls how to handle tag
signatures. However, there is no equivalent for commit signatures; it
just silently strips the signature out of the commit (analogously to
--signed-tags=strip).
While signatures are generally problematic for fast-export/fast-import
(because hashes are likely to change), if they're going to support tag
signatures, there's no reason to not also support commit signatures.
So, implement a --signed-commits= option that mirrors the --signed-tags=
option.
On the fast-export side, try to be as much like signed-tags as possible,
in both implementation and in user-interface. This will change the
default behavior to '--signed-commits=abort' from what is now
'--signed-commits=strip'. In order to provide an escape hatch for users
of third-party tools that call fast-export and do not yet know of the
--signed-commits= option, add an environment variable
'FAST_EXPORT_SIGNED_COMMITS_NOABORT=1' that changes the default to
'--signed-commits=warn-strip'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export's helper function find_encoding() takes a `const char *`, but
modifies that memory despite the `const`. Ultimately, this memory came
from get_commit_buffer(), and you're not supposed to modify the memory
that you get from get_commit_buffer().
So, get rid of find_encoding() in favor of commit.h:find_commit_header(),
which gives back a string length, rather than mutating the memory to
insert a '\0' terminator.
Because find_commit_header() detects the "\n\n" string that separates the
headers and the commit message, move the call to be above the
`message = strstr(..., "\n\n")` call. This helps readability, and allows
for the value of `encoding` to be used for a better value of "..." so that
the same memory doesn't need to be checked twice. Introduce a
`commit_buffer_cursor` variable to avoid writing an awkward
`encoding ? encoding + encoding_len : committer_end` expression.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --signed-tags= option takes one of five arguments specifying how to
handle signed tags during export. Among these arguments, 'strip' is to
'warn-strip' as 'verbatim' is to 'warn' (the unmentioned argument is
'abort', which stops the fast-export process entirely). That is,
signatures are either stripped or copied verbatim while exporting, with
or without a warning.
Match the pattern and rename 'warn' to 'warn-verbatim' to make it clear
that it instructs fast-export to copy signatures verbatim.
To maintain backwards compatibility, 'warn' is still recognized as
deprecated synonym of 'warn-verbatim'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Documentation/CodingGuidelines" says that there should be whitespaces
around operators like 'if', 'switch', 'for', etc.
Let's fix this in "builtin/fast-export.c".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "delta-islands.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`.
Refactor the code to stop using `the_repository`. In most cases this is
trivial because we already had a repository available in the calling
context, with the only exception being `propagate_island_marks()`. Adapt
it so that the repository gets passed in via a parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "object-file-convert.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`. All of these callsites are transitively called
from `convert_object_file()`, which indeed has no repo as input.
Refactor the function so that it receives a repository as a parameter
and pass it through to all internal functions to get rid of the
dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.
Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.
Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions in "pack-write.c" that implicitly depend
on `the_repository` or `the_hash_algo`. Remove this dependency by
injecting the repository via a parameter and adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "csum-file.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`.
Refactor the code to stop using `the_repository` by adapting functions
to receive required data as parameters. Adapt callsites accordingly by
either using `the_repository->hash_algo`, or by using a context-provided
hash algorithm in case the subsystem already got rid of its dependency
on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-fetch we have an optimization to avoid issuing an ls-refs command
to the server if we don't care about the value of any refs (e.g.,
because we are fetching exact object ids), saving a round-trip to the
server. This comes from e70a3030e7 (fetch: do not list refs if fetching
only hashes, 2018-09-27).
It uses an explicit flag "must_list_refs" to decide when we need to do
so. That was needed back then, because the list of ref-prefixes was not
always complete. If it was empty, it did not necessarily mean that we
were not interested in any refs). But that is no longer the case; an
empty list of prefixes means that we truly do not care about any refs.
And so rather than an explicit flag, we can just check whether we are
interested in any ref prefixes. This simplifies the code slightly, as
there is now a single source of truth for the decision.
It also fixes a bug in / optimizes a very unlikely case, which is:
git fetch $remote ^foo $oid
I.e., a negative refspec combined with an exact oid fetch. This is
somewhat nonsense, in that there are no positive refspecs mentioning
refs to countermand with the negative one. But we should be able to do
this without issuing an ls-refs command (excluding "foo" from the empty
set will obviously still be the empty set).
However, the current code does not do so. The negative refspec is not
counted as a noop in un-setting the must_list_refs flag (hardly the
fault of e70a3030e7, as negative refspecs did not appear until much
later). But by using the prefix list as a source of truth, this
naturally just works; the negative refspec does not add a prefix to ask
about, and hence does not trigger the ls-refs call.
This is esoteric enough that I didn't bother adding a test. The real
value here is in the code simplification.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fetch from a configured remote, we may try to update the local
refs/remotes/<origin>/HEAD, and so we ask the server to advertise its
HEAD to us.
But if we aren't otherwise asking about any refs at all, then we know
this HEAD update can never happen! To consider a new value for HEAD,
the set_head() function uses guess_remote_head(). And even if it sees an
explicit symref value for HEAD, it will only report that as a match if
we also saw that remote ref advertised, and it mapped to a local
tracking ref via get_fetch_map().
In other words, a fetch like this:
git fetch origin $exact_oid:refs/heads/foo
can never update HEAD, because we will never have fetched (nor even see
the advertisement for) the ref that HEAD points to.
Currently the command above will still call ls-refs to ask about the
HEAD, even though it is pointless. This patch teaches it to skip the
ls-refs call entirely in this case, which avoids a round-trip to the
server.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the ref-prefix feature of protocol v2, a client which sends
no prefixes at all will get the full advertisement. And so the code in
git-fetch was historically loose about setting up that list based on our
refspecs. There were cases where we needed to know about some refs, so
we just didn't add anything to the ref-prefix list.
And hence further code, like that for tag-following and updating
origin/HEAD, had to be careful about adding to an empty list. E.g., see
the bug fixed by bd52d9a058 (fetch: fix following tags when fetching
specific OID, 2025-03-07).
But the previous commit removed the last such case, and now we know an
empty ref-prefix list (at least inside git-fetch's do_fetch() function)
means that we really don't need to see any refs. So we can drop those
extra conditionals.
This simplifies the code a little. But it also means that some cases can
now use ref prefixes when they would not otherwise. As the test shows,
fetching an exact oid into a local ref can now avoid enumerating all of
the refs. The refspec itself doesn't need to know about any remote refs,
and the tag auto-following can just ask about refs/tags/.
The same is true for asking about HEAD to update the local origin/HEAD.
I didn't add a test for that yet, though, as we can optimize it even
further.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we're not given any refspecs (either on the command line or via
config) and we have no branch merge config, then we fetch the remote
HEAD into our local FETCH_HEAD. In that case we do not send any
ref-prefix option to the server at all, and we see the full
advertisement.
But this is sub-optimal. We only care about HEAD, so we can just ask
for that, and ignore all of the other refs.
The new test demonstrates a case where we see fewer refs (in this case
only one less, but in theory we could be ignoring millions of them).
This also removes the only case where we care about seeing some refs
from the other side, but don't add anything to the ref_prefixes list.
Cleaning this up means one less maintenance burden. Before this patch,
any code which wanted to add to the list had to make sure the list was
not empty, since an empty list meant "ask for everything". Now it really
means "we are not interested in any refs".
This should let us optimize a few more cases in subsequent patches.
Note that we'll add "HEAD" to the list of prefixes, and later code for
updating "refs/remotes/<remote>/HEAD" may likewise do so. In theory this
could cause duplicates in the list, but in practice these can't both
trigger. We hit our new case only if there are no refspecs, and the
"<remote>/HEAD" feature is enabled only when we are fetching from a
remote with configured refspecs. We could be defensive with a flag, but
it didn't seem worth it to me (the absolute worse case is a useless
redundant ref-prefix line sent to the server).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/checkout-index.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_checkout_index()` function with `repo`
set to NULL and then early in the function, `show_usage_with_options_if_asked()`
call will give the options help and exit.
Pass an instance of "struct index_state" available in the calling
context to both `checkout_all()` and `checkout_file()` to remove their
dependency on the global `the_repository` variable.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/for-each-ref.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_for_each_ref()` function with `repo`
set to NULL and then early in the function, `parse_options()` call will
give the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/ls-files.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_ls_files()` function with `repo` set
to NULL and then early in the function, `show_usage_with_options_if_asked()`
call will give the options help and exit.
Pass the repository available in the calling context to both
`expand_objectsize()` and `show_ru_info()` to remove their
dependency on the global `the_repository` variable.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/pack-refs.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_pack_refs()` function with `repo` set
to NULL and then early in the function, `parse_options()` call will give
the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/send-pack.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_send_pack()` function with `repo` set
to NULL and then early in the function, `parse_options()` call will give
the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/verify-commit.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_verify_commit()` function with `repo`
set to NULL and then early in the function, `parse_options()` call will
give the options help and exit.
Pass the repository available in the calling context to `verify_commit()`
to remove it's dependency on the global `the_repository` variable.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/verify-tag.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_verify_tag()` function with `repo` set
to NULL and then early in the function, `parse_options()` call will give
the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 3f763ddf28 (fetch: set remote/HEAD if it does not exist, 2024-11-22),
unconditionally adds "HEAD" to the list of ref prefixes we send to the
server.
This breaks a core assumption that the list of prefixes we send to the
server is complete. We must either send all prefixes we care about, or
none at all (in the latter case the server then advertises everything).
The tag following code is careful to only add "refs/tags/" to the list
of prefixes if there are already entries in the prefix list. But because
the new code from 3f763ddf28 runs after the tag code, and because it
unconditionally adds to the prefix list, we may end up with a prefix
list that _should_ have "refs/tags/" in it, but doesn't.
When that is the case, the server does not advertise any tags, and our
auto-following breaks because we never learned about any tags in the
first place.
Fix this by only adding "HEAD" to the ref prefixes when we know that we
are already limiting the advertisement. In either case we'll learn about
HEAD (either through the limited advertisement, or implicitly through a
full advertisement).
Reported-by: Igor Todorovski <itodorov@ca.ibm.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The path.[ch] API takes an explicit repository parameter passed
throughout the callchain, instead of relying on the_repository
singleton instance.
* ps/path-sans-the-repository:
path: adjust last remaining users of `the_repository`
environment: move access to "core.sharedRepository" into repo settings
environment: move access to "core.hooksPath" into repo settings
repo-settings: introduce function to clear struct
path: drop `git_path()` in favor of `repo_git_path()`
rerere: let `rerere_path()` write paths into a caller-provided buffer
path: drop `git_common_path()` in favor of `repo_common_path()`
worktree: return allocated string from `get_worktree_git_dir()`
path: drop `git_path_buf()` in favor of `repo_git_path_replace()`
path: drop `git_pathdup()` in favor of `repo_git_path()`
path: drop unused `strbuf_git_path()` function
path: refactor `repo_submodule_path()` family of functions
submodule: refactor `submodule_to_gitdir()` to accept a repo
path: refactor `repo_worktree_path()` family of functions
path: refactor `repo_git_path()` family of functions
path: refactor `repo_common_path()` family of functions
The diffs queued from git-diff-pairs(1) are flushed when stdin is
closed. To enable greater flexibility, allow control over when the diff
queue is flushed by writing a single NUL byte on stdin between input
file pairs. Diff output between flushes is separated by a single NUL
byte.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Through git-diff(1), a single diff can be generated from a pair of blob
revisions directly. Unfortunately, there is not a mechanism to compute
batches of specific file pair diffs in a single process. Such a feature
is particularly useful on the server-side where diffing between a large
set of changes is not feasible all at once due to timeout concerns.
To facilitate this, introduce git-diff-pairs(1) which acts as a backend
passing its NUL-terminated raw diff format input from stdin through diff
machinery to produce various forms of output such as patch or raw.
The raw format was originally designed as an interchange format and
represents the contents of the diff_queued_diff list making it possible
to break the diff pipeline into separate stages. For example,
git-diff-tree(1) can be used as a frontend to compute file pairs to
queue and feed its raw output to git-diff-pairs(1) to compute patches.
With this, batches of diffs can be progressively generated without
having to recompute renames or retrieve object context. Something like
the following:
git diff-tree -r -z -M $old $new |
git diff-pairs -p -z
should generate the same output as `git diff-tree -p -M`. Furthermore,
each line of raw diff formatted input can also be individually fed to a
separate git-diff-pairs(1) process and still produce the same output.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding refactorings we now only have a couple of implicit
users of `the_repository` left in the "path" subsystem, all of which
depend on global state via `calc_shared_perm()`. Make the dependency on
`the_repository` explicit by passing the repo as a parameter instead and
adjust callers accordingly.
Note that this change bubbles up into a couple of subsystems that were
previously declared as free from `the_repository`. Instead of marking
all of them as `the_repository`-dependent again, we instead use the
repository that is available in the calling context. There are three
exceptions though with "copy.c", "pack-write.c" and "tempfile.c".
Adjusting these would require us to adapt callsites all over the place,
so this is left for a future iteration.
Mark "path.c" as free from `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, we track "core.sharedRepository"
via a pair of global variables. Move them into `struct repo_settings` so
that we can instead track them per-repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_path()` in favor of the `repo_git_path()` family of
functions, which makes the implicit dependency on `the_repository` go
away.
Note that `git_path()` returned a string allocated via `get_pathname()`,
which uses a rotating set of statically allocated buffers. Consequently,
callers didn't have to free the returned string. The same isn't true for
`repo_common_path()`, so we also have to add logic to free the returned
strings.
This refactoring also allows us to remove `repo_common_pathv()` as well
as `get_pathname()` from the public interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with `get_worktree_git_dir()` a couple of commits ago, the
`rerere_path()` function returns paths that need not be free'd by the
caller because `git_path()` internally uses `get_pathname()`.
Refactor the function to instead accept a caller-provided buffer that
the path will be written into, passing on ownership to the caller. This
refactoring prepares us for the removal of `git_path()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git refs migrate" can optionally be told not to migrate the reflog.
* kn/ref-migrate-skip-reflog:
builtin/refs: add '--no-reflog' flag to drop reflogs
The value of "uname -s" is by default sent over the wire as a part
of the "version" capability.
* ua/os-version-capability:
agent: advertise OS name via agent capability
t5701: add setup test to remove side-effect dependency
version: extend get_uname_info() to hide system details
version: refactor get_uname_info()
version: refactor redact_non_printables()
version: replace manual ASCII checks with isprint() for clarity
At now, we have already implemented the ref consistency checks for both
"files-backend" and "packed-backend". Although we would check some
redundant things, it won't cause trouble. So, let's integrate it into
the "git-fsck(1)" command to get feedback from the users. And also by
calling "git refs verify" in "git-fsck(1)", we make sure that the new
added checks don't break.
Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git refs verify" command. In order to provide
the user interface create a progress which makes the total task be 1.
It's hard to know how many loose refs we will check now. We might
improve this later.
Then, introduce the option to allow the user to disable checking ref
database consistency. Put this function in the very first execution
sequence of "git-fsck(1)" due to that we don't want the existing code of
"git-fsck(1)" which would implicitly check the consistency of refs to
die the program.
Last, update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c", there are some functions such as "create_snapshot"
and "next_record" which would check the correctness of the content of
the "packed-ref" file. When anything is bad, the program will die.
It may seem that we have nothing relevant to above feature, because we
are going to read and parse the raw "packed-ref" file without creating
the snapshot and using the ref iterator to check the consistency.
However, when using "get_worktrees" in "builtin/refs", we would parse
the "HEAD" information. If the referent of the "HEAD" is inside the
"packed-ref", we will call "create_snapshot" function to parse the
"packed-ref" to get the information. No matter whether the entry of
"HEAD" in "packed-ref" is correct, "create_snapshot" would call
"verify_buffer_safe" to check whether there is a newline in the last
line of the file. If not, the program will die.
Although this behavior has no harm for the program, it will
short-circuit the program. When the users execute "git refs verify" or
"git fsck", we should avoid reading the head information, which may
execute the read operation in packed backend with stricter checks to die
the program. Instead, we should continue to check other parts of the
"packed-refs" file completely.
Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
worktrees, 2023-12-29), we have introduced a function
"get_worktrees_internal" which allows us to get worktrees without
reading head information.
Create a new exposed function "get_worktrees_without_reading_head", then
replace the "get_worktrees" in "builtin/refs" with the new created
function.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).
* pw/merge-tree-stdin-deadlock-fix:
merge-tree: fix link formatting in html docs
merge-tree: improve docs for --stdin
merge-tree: only use basic merge config
merge-tree: remove redundant code
merge-tree --stdin: flush stdout to avoid deadlock
I recently had reported to me a crash from a coworker using the recently
added sendemail mailmap support:
3724814 Segmentation fault (core dumped) git check-mailmap "bugs@company.xx"
This appears to happen because of the NULL pointer name passed into
map_user(). Fix this by passing "" instead of NULL so that we have a
valid pointer.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git refs migrate" subcommand converts the backend used for ref
storage. It always migrates reflog data as well as refs. Introduce an
option to exclude reflogs from migration, allowing them to be discarded
when they are unnecessary.
This is particularly useful in server-side repositories, where reflogs
are typically not expected. However, some repositories may still have
them due to historical reasons, such as bugs, misconfigurations, or
administrative decisions to enable reflogs for debugging. In such
repositories, it would be optimal to drop reflogs during the migration.
To address this, introduce the '--no-reflog' flag, which prevents reflog
migration. When this flag is used, reflogs from the original reference
backend are migrated. Since only the new reference backend remains in
the repository, all previous reflogs are permanently discarded.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git difftool" code clean-up.
* da/difftool-sans-the-repository:
difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE
difftool: eliminate use of the_repository
difftool: eliminate use of global variables
"git rev-list --missing=" learned to accept "print-info" that gives
known details expected of the missing objects, like path and type.
* jt/rev-list-missing-print-info:
rev-list: extend print-info to print missing object type
rev-list: add print-info action to print missing object path
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time. "git backfill" is introduced
to help bulk-download necessary files beforehand.
* ds/backfill:
backfill: assume --sparse when sparse-checkout is enabled
backfill: add --sparse option
backfill: add --min-batch-size=<n> option
backfill: basic functionality and tests
backfill: add builtin boilerplate
Commit 9c93ba4d0a (merge-recursive: honor diff.algorithm, 2024-07-13)
replaced init_merge_options() with init_basic_merge_config() for use in
plumbing commands and init_ui_merge_config() for use in porcelain
commands. As "git merge-tree" is a plumbing command it should call
init_basic_merge_config() rather than init_ui_merge_config(). The merge
ort machinery ignores "diff.algorithm" so the behavior is unchanged by
this commit but it future proofs us against any future changes to
init_ui_merge_config().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
real_merge() only ever returns "0" or "1" as it dies if the merge status
is less than zero. Therefore the check for "result < 0" is redundant and
the result variable is not needed. The return value of real_merge() is
ignored because exit status of "git merge-tree --stdin" is "0" for both
successful and conflicted merges (the status of each merge is written to
stdout). The return type of real_merge() is not changed as it is used
for the program's exit status when "--stdin" is not given.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a process tries to read the output from "git merge-tree --stdin"
before it closes merge-tree's stdin then it deadlocks. This happens
because merge-tree does not flush its output before trying to read
another line of input and means that it is not possible to cherry-pick a
sequence of commits using "git merge-tree --stdin". Fix this by calling
maybe_flush_or_die() before trying to read the next line of
input. Flushing the output after each merge does not seem to affect the
performance, any difference is lost in the noise even after increasing
the number of runs.
$ git rev-list --merges --parents -n100 origin/master |
sed 's/^[^ ]* //' >/tmp/merges
$ hyperfine -L flush 0,1 --warmup 1 --runs 30 \
'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges'
Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges
Time (mean ± σ): 546.6 ms ± 11.7 ms [User: 503.2 ms, System: 40.9 ms]
Range (min … max): 535.9 ms … 567.7 ms 30 runs
Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges
Time (mean ± σ): 546.9 ms ± 12.0 ms [User: 505.9 ms, System: 38.9 ms]
Range (min … max): 529.8 ms … 570.0 ms 30 runs
Summary
'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran
1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges'
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, get_uname_info() function provides the full OS information.
In a following commit, we will need it to provide only the OS name.
Let's extend it to accept a "full" flag that makes it switch between
providing full OS information and providing only the OS name.
We may need to refactor this function in the future if an
`osVersion.format` is added.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code from "builtin/bugreport.c" uses uname(2) to get system
information.
Let's refactor this code into a new get_uname_info() function, so
that we can reuse it in a following commit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.
* bf/fetch-set-head-fix:
fetch set_head: fix non-mirror remotes in bare repositories
fetch set_head: refactor to use remote directly
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.
* tc/clone-single-revision:
builtin/clone: teach git-clone(1) the --revision= option
parse-options: introduce die_for_incompatible_opt2()
clone: introduce struct clone_opts in builtin/clone.c
clone: add tags refspec earlier to fetch refspec
clone: refactor wanted_peer_refs()
clone: make it possible to specify --tags
clone: cut down on global variables in clone.c
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/update-server-info.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_update_server_info()` function
with `repo` set to NULL and then early in the function, "parse_options()"
call will give the options help and exit, without having to consult much
of the configuration file. So it is safe to omit reading the config when
`repo` argument the caller gave us is NULL.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
Remove `git_common_path()` in favor of the `repo_common_path()` family
of functions, which makes the implicit dependency on `the_repository` go
away.
Note that `git_common_path()` used to return a string allocated via
`get_pathname()`, which uses a rotating set of statically allocated
buffers. Consequently, callers didn't have to free the returned string.
The same isn't true for `repo_common_path()`, so we also have to add
logic to free the returned strings.
This refactoring also allows us to remove `repo_common_pathv()` from the
public interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_worktree_git_dir()` function returns a string constant that
does not need to be free'd by the caller. This string is computed for
three different cases:
- If we don't have a worktree we return a path into the Git directory.
The returned string is owned by `the_repository`, so there is no
need for the caller to free it.
- If we have a worktree, but no worktree ID then the caller requests
the main worktree. In this case we return a path into the common
directory, which again is owned by `the_repository` and thus does
not need to be free'd.
- In the third case, where we have an actual worktree, we compute the
path relative to "$GIT_COMMON_DIR/worktrees/". This string does not
need to be released either, even though `git_common_path()` ends up
allocating memory. But this doesn't result in a memory leak either
because we write into a buffer returned by `get_pathname()`, which
returns one out of four static buffers.
We're about to drop `git_common_path()` in favor of `repo_common_path()`,
which doesn't use the same mechanism but instead returns an allocated
string owned by the caller. While we could adapt `get_worktree_git_dir()`
to also use `get_pathname()` and print the derived common path into that
buffer, the whole schema feels a lot like premature optimization in this
context. There are some callsites where we call `get_worktree_git_dir()`
in a loop that iterates through all worktrees. But none of these loops
seem to be even remotely in the hot path, so saving a single allocation
there does not feel worth it.
Refactor the function to instead consistently return an allocated path
so that we can start using `repo_common_path()` in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The
latter does essentially the same, with the only exception that it does
not rely on `the_repository` but takes the repo as separate parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does
essentially the same, with the only exception that it does not rely on
`the_repository` but takes the repo as separate parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "submodule" family of
functions accordingly.
Note that in contrast to the other `repo_*_path()` families, we have to
pass in the repository as a non-constant pointer. This is because we end
up calling `repo_read_gitmodules()` deep down in the callstack, which
may end up modifying the repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `submodule_to_gitdir()` function implicitly uses `the_repository` to
resolve submodule paths. Refactor the function to instead accept a repo
as parameter to remove the dependency on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the USE_THE_REPOSITORY_VARIABLE #define now that all
state is passed to each function from callers.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make callers pass a repository struct into each function instead
of relying on the global the_repository variable.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move difftool's global variables into a difftools_option struct
in preparation for removal of USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.
Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).
Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.
Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions die_for_incompatible_opt3() and
die_for_incompatible_opt4() already exist to die whenever a user
specifies three or four options respectively that are not compatible.
Introduce die_for_incompatible_opt2() which dies when two options that
are incompatible are set.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a lot of state stored in global variables in builtin/clone.c.
In the long run we'd like to remove many of those.
Introduce `struct clone_opts` in this file. This struct will be used to
contain all details needed to perform the clone. The struct object can
be thrown around to all the functions that need these details.
The first field we're adding is `wants_head`. In some scenarios
(specifically when both `--single-branch` and `--branch` are given) we
are not interested in `HEAD` on the remote. The field `wants_head` in
`struct clone_opts` will hold this information. We could have put
`option_branch` and `option_single_branch` into that struct instead, but
in a following commit we'll be using `wants_head` as well.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs
from the `remote->fetch` refspec into `ref_prefixes` of
`transport_ls_refs_options`. Afterwards we add the tags prefix
`refs/tags/` prefix as well. At a later point, in wanted_peer_refs() we
process refs using both `remote->fetch` and `TAG_REFSPEC`.
Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before
calling refspec_ref_prefixes().
To be able to do this, we set `option_tags` to 0 when --mirror is given.
This is because --mirror mirrors (hence the name) all the refs,
including tags and they do not need to be treated separately.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function wanted_peer_refs() is used to map the refs returned by the
server to refs we will save in our clone.
Over time this function grown to be very complex. Refactor it.
Previously, there was a separate code path for when
`option_single_branch` was set. It resulted in duplicated code and
deeper nested conditions. After this refactor the code path for when
`option_single_branch` is truthy modifies `refs` and then falls through
to the common code path. This approach relies on the `refspec` being set
correctly and thus only mapping refs that are relevant.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].
To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change, because the flag is
default-enabled passing `--tags` has no effect, so there's no need to
add tests for this.
[1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In clone.c the `struct option` which is used to parse the input options
for git-clone(1) is a global variable. Due to this, many variables that
are used to parse the value into, are also global.
Make `builtin_clone_options` a local variable in cmd_clone() and carry
along all variables that are only used in that function.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.
Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.
For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename functions related to handling refspecs in preparation for their
move from `remote.c` to `refspec.c`. Update their names to better
reflect their intent:
- `query_refspecs()` -> `refspec_find_match()` for clarity, as it
finds a single matching refspec.
- `query_refspecs_multiple()` -> `refspec_find_all_matches()` to
better reflect that it collects all matching refspecs instead of
returning just the first match.
- `query_matches_negative_refspec()` ->
`refspec_find_negative_match()` for consistency with the
updated naming convention, even though this static function
didn't strictly require renaming.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the function `omit_name_by_refspec()` to
`refname_matches_negative_refspec_item()` to provide clearer intent.
The previous function name was vague and did not accurately describe its
purpose. By using `refname_matches_negative_refspec_item`, make the
function's purpose more intuitive, clarifying that it checks if a
reference name matches any negative refspec.
Rename function parameters for consistency with existing naming
conventions. Use `refname` instead of `name` to align with terminology
in `refs.h`.
Remove the redundant doc comment since the function name is now
self-explanatory.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.
However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.
Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.
However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.
Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.
This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.
Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.
To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are
1. git clone --filter=blob:none --sparse <url>
2. git sparse-checkout set <dir1> ... <dirN>
3. git backfill --sparse
For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3 | 110 MB | |
| 10K | 12 | 192 MB | 17.2s |
| 15K | 9 | 192 MB | 15.5s |
| 20K | 8 | 192 MB | 15.5s |
| 25K | 7 | 192 MB | 14.7s |
This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.
Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2 | 1,876 MB | |
| 10K | 11 | 2,187 MB | 46s |
| 25K | 7 | 2,188 MB | 43s |
| 50K | 5 | 2,194 MB | 44s |
| 100K | 4 | 2,194 MB | 48s |
This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.
Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.
We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.
To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:
1. git clone --filter=blob:none <url>
2. git backfill
Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.
For the Git repository, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2 | 119 MB | |
| 25K | 8 | 290 MB | 24s |
| 50K | 5 | 290 MB | 24s |
| 100K | 4 | 290 MB | 29s |
Other than the packfile counts decreasing as we need fewer batches, the
size and time required is not changing much for this small example.
For the nodejs/node repository, we see these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 330 MB | |
| 25K | 19 | 1,222 MB | 1m 22s |
| 50K | 11 | 1,221 MB | 1m 24s |
| 100K | 7 | 1,223 MB | 1m 40s |
| 250K | 4 | 1,224 MB | 2m 23s |
| 500K | 3 | 1,216 MB | 4m 38s |
Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.
To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 105 MB | |
| 5K | 53 | 348 MB | 2m 26s |
| 10K | 28 | 365 MB | 2m 22s |
| 15K | 19 | 407 MB | 2m 21s |
| 20K | 15 | 393 MB | 2m 28s |
| 25K | 13 | 417 MB | 2m 06s |
| 50K | 8 | 509 MB | 1m 34s |
| 100K | 5 | 535 MB | 1m 56s |
| 250K | 4 | 698 MB | 1m 33s |
| 500K | 3 | 696 MB | 1m 42s |
Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.
This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.
Finally, the same experiment for the Linux repository had these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2 | 2,153 MB | |
| 25K | 63 | 6,380 MB | 14m 08s |
| 50K | 58 | 6,126 MB | 15m 11s |
| 100K | 30 | 6,135 MB | 18m 11s |
| 250K | 14 | 6,146 MB | 18m 22s |
| 500K | 8 | 6,143 MB | 33m 29s |
Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.
Based on these experiments, a batch size of 50,000 was chosen as the
default value.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.
The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.
The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.
This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin. Mark the builtin as experimental at
this time, allowing breaking changes in the near future, if necessary.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.
* ps/3.0-remote-deprecation:
remote: announce removal of "branches/" and "remotes/"
builtin/pack-redundant: remove subcommand with breaking changes
ci: repurpose "linux-gcc" job for deprecations
ci: merge linux-gcc-default into linux-gcc
Makefile: wire up build option for deprecated features
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
Doc and short-help text for "show-index" has been clarified to
stress that the command reads its data from the standard input.
* jc/show-index-h-update:
show-index: the short help should say the command reads from its input
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.
* rs/ref-fitler-used-atoms-value-fix:
ref-filter: remove ref_format_clear()
ref-filter: move is-base tip to used_atom
ref-filter: move ahead-behind bases into used_atom
Doc updates.
* ja/doc-commit-markup-updates:
doc: migrate git-commit manpage secondary files to new format
doc: convert git commit config to new format
doc: make more direct explanations in git commit options
doc: the mode param of -u of git commit is optional
doc: apply new documentation guidelines to git commit
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.
* mh/credential-cache-authtype-request-fix:
credential-cache: respect authtype capability
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.
* jk/pack-header-parse-alignment-fix:
index-pack, unpack-objects: use skip_prefix to avoid magic number
index-pack, unpack-objects: use get_be32() for reading pack header
parse_pack_header_option(): avoid unaligned memory writes
packfile: factor out --pack_header argument parsing
bswap.h: squelch potential sparse -Wcast-truncate warnings
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others. The built-in commands
have been fixed to show them on the standard output consistently.
* jc/show-usage-help:
builtin: send usage() help text to standard output
oddballs: send usage() help text to standard output
builtins: send usage_with_options() help text to standard output
usage: add show_usage_if_asked()
parse-options: add show_usage_with_options_if_asked()
t0012: optionally check that "-h" output goes to stdout
When the --name-hash-version option is used in 'git pack-objects', it
can change from the initial assignment to when it is used based on
interactions with other arguments. Specifically, when writing or reading
bitmaps, we must force version 1 for now. This could change in the
future when the bitmap format can store a name hash version value,
indicating which was used during the writing of the packfile.
Protect the 'git pack-objects' process from getting confused by failing
with a BUG() statement if the value of the name hash version changes
between calls to pack_name_hash_fn().
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new environment variable to opt-in to different values of the
--name-hash-version=<n> option in 'git pack-objects'. This allows for
extra testing of the feature without repeating all of the test
scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add
this to the linux-TEST-vars CI build as that test run is already
overloaded. The behavior exposed by this test variable is of low risk
and should be sufficient to allow manual testing when an issue arises.
But this option isn't free. There are a few tests that change behavior
with the variable enabled.
First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.
Second, there are some tests that compare the exact output of a 'git
pack-objects' process when using bitmaps. The warning that ignores the
--name-hash-version=2 and forces version 1 causes these tests to fail.
Disable the environment variable to get around this issue.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.
This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.
For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 260 MB | N/A |
| --name-hash-version=1 | 127 MB | 129s |
| --name-hash-version=2 | 127 MB | 112s |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.
Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences In the parent directory.
* Localization files frequently have common filenames but
differentiates via parent directories. In C#, the name
"/strings.resx.lcl" is used for these localization files and they
will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
One open-source example is the fluentui [2] repo, which uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 694 MB | N/A |
| --name-hash-version=1 | 438 MB | 728s |
| --name-hash-version=2 | 168 MB | 142s |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui | 440 MB | 161 MB |
| Repo B | 6,248 MB | 856 MB |
| Repo C | 37,278 MB | 6,755 MB |
| Repo D | 131,204 MB | 7,463 MB |
Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.
It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b1b713f722 (fetch set_head: handle mirrored bare repositories,
2024-11-22) it was implicitly assumed that all remotes will be mirrors
in a bare repository, thus fetching a non-mirrored remote could lead to
HEAD pointing to a non-existent reference. Make sure we only overwrite
HEAD if we are in a bare repository and fetching from a mirror.
Otherwise, proceed as normally, and create
refs/remotes/<nonmirrorremote>/HEAD instead.
Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a preparatory step to use even more properties from the remote
struct, refactor set_head to take the entire struct as a parameter,
instead of the necessary bits. This also allows consolidating the use of
gtransport->remote in set_head, making the access of the remote's
properties consistent in the function.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit extends the functionality of `git gc`
by adding a new option, `--expire-to=<dir>`. Previously,
this feature was implemented in 91badeba32 (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allowing users to specify a directory where unreachable
and expired cruft packs are stored during garbage collection.
However, users had to run `git repack --cruft --expire-to=<dir>`
followed by `git prune` to achieve similar results within `git gc`.
By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.
Due to the original `git gc --prune=now` deleting all unreachable
objects by passing the `-a` parameter to git repack. With the
addition of the `--cruft` and `--expire-to` options, it is necessary
to modify this default behavior: instead of deleting these
unreachable objects, they should be merged into a cruft pack and
collected in a specified directory. Therefore, we do not pass `-a`
to the repack command but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now` to repack.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.
Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.
While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.
Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.
A few things worth noting:
- In the change to builtin/fast-import.c::stream_blob(), we can see
that by removing the explicit reference to
'the_hash_algo->unsafe_init_fn()', we are hardened against the
hashfile API changing away from the_hash_algo (or its unsafe
variant) in the future.
- The bulk-checkin code no longer needs to explicitly zero-initialize
the hashfile_checkpoint, since it is now done as a result of calling
'hashfile_checkpoint_init()'.
- Also in the bulk-checkin code, we add an additional call to
prepare_to_stream() outside of the main loop in order to initialize
'state->f' so we know which hash function implementation to use when
calling 'hashfile_checkpoint_init()'.
This is OK, since subsequent 'prepare_to_stream()' calls are noops.
However, we only need to call 'prepare_to_stream()' when we have the
HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
'prepare_to_stream()' does not assign 'state->f', so we have nothing
to initialize.
- Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
appropriately guarded.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_rev_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above. Also modify children functions
`write_rev_file_order()` and `write_rev_header()` to accept
'the_hash_algo'.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
However, in `midx-write.c`, since all usage of global variables is
removed, don't reintroduce them and instead use the `repo` available in
the context.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.
Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_pack_lockfile()` function uses the global `the_repository`
variable to access the repository. To avoid global variable usage, pass
the repository from the layers above.
Altough the layers above could have access to the repository internally,
simply pass in `the_repository`. This avoids any compatibility issues
and bubbles up global variable usage to upper layers which can be
eventually resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that ref_format_clear() no longer releases any memory we don't need
it anymore. Remove it and its counterpart, ref_format_init().
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
verify_ref_format() parses a ref-filter format string and stores
recognized items in the static array "used_atom". For
"ahead-behind:<committish>" it stores the committish part in a
string_list member "bases" of struct ref_format.
ref_sorting_options() also parses bare ref-filter format items and
stores stores recognized ones in "used_atom" as well. The committish
parts go to a dummy struct ref_format in parse_sorting_atom(), though,
and are leaked and forgotten.
If verify_ref_format() is called before ref_sorting_options(), like in
git for-each-ref, then all works well if the sort key is included in the
format string. If it isn't then sorting cannot work as the committishes
are missing.
If ref_sorting_options() is called first, like in git branch, then we
have the additional issue that if the sort key is included in the format
string then filter_ahead_behind() can't see its committish, will not
generate any results for it and thus it will be expanded to an empty
string.
Fix those issues by replacing the string_list with a field in used_atom
for storing the committish. This way it can be shared for handling both
ref-filter format strings and sorting options in the same command.
Reported-by: Ross Goldberg <ross.goldberg@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.
* ps/the-repository:
match-trees: stop using `the_repository`
graph: stop using `the_repository`
add-interactive: stop using `the_repository`
tmp-objdir: stop using `the_repository`
resolve-undo: stop using `the_repository`
credential: stop using `the_repository`
mailinfo: stop using `the_repository`
diagnose: stop using `the_repository`
server-info: stop using `the_repository`
send-pack: stop using `the_repository`
serve: stop using `the_repository`
trace: stop using `the_repository`
pager: stop using `the_repository`
progress: stop using `the_repository`
A misconfigured "fsck.skiplist" configuration variable was not
diagnosed as an error, which has been corrected.
* jt/fsck-skiplist-parse-fix:
fsck: reject misconfigured fsck.skipList
The code to compute "unique" name used git_rand() which can fail or
get stuck; the callsite does not require cryptographic security.
Introduce the "insecure" mode and use it appropriately.
* ps/reftable-get-random-fix:
reftable/stack: accept insecure random bytes
wrapper: allow generating insecure random bytes
When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.
Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.
This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works it is only because we
are lucky.
We'll fix this by switching to get_be32() to read the headers (just like
the last few commits similarly switched us to put_be32() for writing
into the same buffer).
It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).
I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.
Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.
In preparation for a bugfix, let's factor the duplicated code into a
common helper.
The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).
Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>