Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.
When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and
a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:
- Repacking periodically, either by packing anything loose in the
repository (via `git repack -d`) or producing a geometric sequence
of packs (via `git repack --geometric=<d> -d`).
- Every so often, splitting the repository into two packs, one cruft
to store the unreachable objects, and another non-cruft pack to
store the reachable objects.
Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.
This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).
This all works, but can be costly from an I/O-perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if every) pruned.
Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU-perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).
At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that support which we were
lacking.
Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:
- Sort a list of any existing cruft packs in ascending order of pack
size.
- Starting from the beginning of the list, group cruft packs together
while the accumulated size is smaller than the maximum specified
pack size.
- Combine the objects in these cruft packs together into a new cruft
pack, along with any other unreachable objects which have since
entered the repository.
Once a cruft pack grows beyond the size specified via `--max-cruft-size`
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.
When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.
This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a configuration option to enable updating and reading from
compatibility hash maps when git accesses the reposotiry.
Call the helper function repo_set_compat_hash_algo with the value
that compatObjectFormat is set to.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit implemented the `gc.repackFilter` config option
to specify a filter that should be used by `git gc` when
performing repacks.
Another previous commit has implemented
`git repack --filter-to=<dir>` to specify the location of the
packfile containing filtered out objects when using a filter.
Let's implement the `gc.repackFilterTo` config option to specify
that location in the config when `gc.repackFilter` is used.
Now when `git gc` will perform a repack with a <dir> configured
through this option and not empty, the repack process will be
passed a corresponding `--filter-to=<dir>` argument.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.
Users might want to perform such a cleanup regularly at the same time as
they perform other repacks and cleanups, so as part of `git gc`.
Let's allow them to configure a <filter-spec> for that purpose using a
new gc.repackFilter config option.
Now when `git gc` will perform a repack with a <filter-spec> configured
through this option and not empty, the repack process will be passed a
corresponding `--filter=<filter-spec>` argument.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new configuration option diff.statNameWidth=<width> that is equivalent
to the command-line option --stat-name-width=<width>, but it is ignored
by format-patch. This follows the logic established by the already
existing configuration option diff.statGraphWidth=<width>.
Limiting the widths of names and graphs in the --stat output makes sense
for interactive work on wide terminals with many columns, hence the support
for these configuration options. They don't affect format-patch because
it already adheres to the traditional 80-column standard.
Update the documentation and add more tests to cover new configuration
option diff.statNameWidth=<width>. While there, perform a few minor code
and whitespace cleanups here and there, as spotted.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We now limit depth of the tree objects and maximum length of
pathnames recorded in tree objects.
* jk/tree-name-and-depth-limit:
lower core.maxTreeDepth default to 2048
tree-diff: respect max_allowed_tree_depth
list-objects: respect max_allowed_tree_depth
read_tree(): respect max_allowed_tree_depth
traverse_trees(): respect max_allowed_tree_depth
add core.maxTreeDepth config
fsck: detect very large tree pathnames
tree-walk: rename "error" variable
tree-walk: drop MAX_TRAVERSE_TREES macro
tree-walk: reduce stack size for recursive functions
Most of our tree traversal algorithms use recursion to visit sub-trees.
For pathologically large trees, this can cause us to run out of stack
space and abort in an uncontrolled way. Let's put our own limit here so
that we can fail gracefully rather than segfaulting.
In similar cases where we recursed along the commit graph, we rewrote
the algorithms to avoid recursion and keep any stack data on the heap.
But the commit graph is meant to grow without bound, whereas it's not an
imposition to put a limit on the maximum size of tree we'll handle.
And this has a bonus side effect: coupled with a limit on individual
tree entry names, this limits the total size of a path we may encounter.
This gives us an extra protection against code handling long path names
which may suffer from integer overflows in the size (which could then be
exploited by malicious trees).
The default of 4096 is set to be much longer than anybody would care
about in the real world. Even with single-letter interior tree names
(like "a/b/c"), such a path is at least 8191 bytes. While most operating
systems will let you create such a path incrementally, trying to
reference the whole thing in a system call (as Git would do when
actually trying to access it) will result in ENAMETOOLONG. Coupled with
the recent fsck.largePathname warning, the maximum total pathname Git
will handle is (by default) 16MB.
This config option doesn't do anything yet; future patches will convert
various algorithms to respect the limit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Overly long label names used in the sequencer machinery are now
chopped to fit under filesystem limitation.
* mp/rebase-label-length-limit:
rebase: allow overriding the maximal length of the generated labels
sequencer: truncate labels to accommodate loose refs
With this change, users can override the compiled-in default for the
maximal length of the label names generated by `git rebase
--rebase-merges`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Mark Ruvald Pedersen <mped@demant.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-objects" learned to invoke a new hook program that
enumerates extra objects to be used as anchoring points to keep
otherwise unreachable objects in cruft packs.
* tb/gc-recent-object-hook:
gc: introduce `gc.recentObjectsHook`
reachable.c: extract `obj_is_recent()`
The object traversal using reachability bitmap done by
"pack-object" has been tweaked to take advantage of the fact that
using "boundary" commits as representative of all the uninteresting
ones can save quite a lot of object enumeration.
* tb/pack-bitmap-traversal-with-boundary:
pack-bitmap.c: use commit boundary during bitmap traversal
pack-bitmap.c: extract `fill_in_bitmap()`
object: add object_array initializer helper function
'git worktree add' learned how to create a worktree based on an
orphaned branch with `--orphan`.
* ja/worktree-orphan:
worktree add: emit warn when there is a bad HEAD
worktree add: extend DWIM to infer --orphan
worktree add: introduce "try --orphan" hint
worktree add: add --orphan flag
t2400: add tests to verify --quiet
t2400: refactor "worktree add" opt exclusion tests
t2400: cleanup created worktree in test
worktree add: include -B in usage docs
This patch introduces a new multi-valued configuration option,
`gc.recentObjectsHook` as a means to mark certain objects as recent (and
thus exempt from garbage collection), regardless of their age.
When performing a garbage collection operation on a repository with
unreachable objects, Git makes its decision on what to do with those
object(s) based on how recent the objects are or not. Generally speaking,
unreachable-but-recent objects stay in the repository, and older objects
are discarded.
However, we have no convenient way to keep certain precious, unreachable
objects around in the repository, even if they have aged out and would
be pruned. Our options today consist of:
- Point references at the reachability tips of any objects you
consider precious, which may be undesirable or infeasible if there
are many such objects.
- Track them via the reflog, which may be undesirable since the
reflog's lifetime is limited to that of the reference it's tracking
(and callers may want to keep those unreachable objects around for
longer).
- Extend the grace period, which may keep around other objects that
the caller *does* want to discard.
- Manually modify the mtimes of objects you want to keep. If those
objects are already loose, this is easy enough to do (you can just
enumerate and `touch -m` each one).
But if they are packed, you will either end up modifying the mtimes
of *all* objects in that pack, or be forced to write out a loose
copy of that object, both of which may be undesirable. Even worse,
if they are in a cruft pack, that requires modifying its `*.mtimes`
file by hand, since there is no exposed plumbing for this.
- Force the caller to construct the pack of objects they want
to keep themselves, and then mark the pack as kept by adding a
".keep" file. This works, but is burdensome for the caller, and
having extra packs is awkward as you roll forward your cruft pack.
This patch introduces a new option to the above list via the
`gc.recentObjectsHook` configuration, which allows the caller to
specify a program (or set of programs) whose output is treated as a set
of objects to treat as recent, regardless of their true age.
The implementation is straightforward. Git enumerates recent objects via
`add_unseen_recent_objects_to_traversal()`, which enumerates loose and
packed objects, and eventually calls add_recent_object() on any objects
for which `want_recent_object()`'s conditions are met.
This patch modifies the recency condition from simply "is the mtime of
this object more recent than the cutoff?" to "[...] or, is this object
mentioned by at least one `gc.recentObjectsHook`?".
Depending on whether or not we are generating a cruft pack, this allows
the caller to do one of two things:
- If generating a cruft pack, the caller is able to retain additional
objects via the cruft pack, even if they would have otherwise been
pruned due to their age.
- If not generating a cruft pack, the caller is likewise able to
retain additional objects as loose.
A potential alternative here is to introduce a new mode to alter the
contents of the reachable pack instead of the cruft one. One could
imagine a new option to `pack-objects`, say `--extra-reachable-tips`
that does the same thing as above, adding the visited set of objects
along the traversal to the pack.
But this has the unfortunate side-effect of altering the reachability
closure of that pack. If parts of the unreachable object graph mentioned
by one or more of the "extra reachable tips" programs is not closed,
then the resulting pack won't be either. This makes it impossible in the
general case to write out reachability bitmaps for that pack, since
closure is a requirement there.
Instead, keep these unreachable objects in the cruft pack (or set of
unreachable, loose objects) instead, to ensure that we can continue to
have a pack containing just reachable objects, which is always safe to
write a bitmap over.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new advice/hint in `git worktree add` for when the user
tries to create a new worktree from a reference that doesn't exist.
Current Behavior:
% git init foo
Initialized empty Git repository in /path/to/foo/
% touch file
% git -C foo commit -q -a -m "test commit"
% git -C foo switch --orphan norefbranch
% git -C foo worktree add newbranch/
Preparing worktree (new branch 'newbranch')
fatal: invalid reference: HEAD
%
New Behavior:
% git init --bare foo
Initialized empty Git repository in /path/to/foo/
% touch file
% git -C foo commit -q -a -m "test commit"
% git -C foo switch --orphan norefbranch
% git -C foo worktree add newbranch/
Preparing worktree (new branch 'newbranch')
hint: If you meant to create a worktree containing a new orphan branch
hint: (branch with no commits) for this repository, you can do so
hint: using the --orphan option:
hint:
hint: git worktree add --orphan newbranch/
hint:
hint: Disable this message with "git config advice.worktreeAddOrphan false"
fatal: invalid reference: HEAD
% git -C foo worktree add -b newbranch2 new_wt/
Preparing worktree (new branch 'newbranch')
hint: If you meant to create a worktree containing a new orphan branch
hint: (branch with no commits) for this repository, you can do so
hint: using the --orphan option:
hint:
hint: git worktree add --orphan -b newbranch2 new_wt/
hint:
hint: Disable this message with "git config advice.worktreeAddOrphan false"
fatal: invalid reference: HEAD
%
Signed-off-by: Jacob Abel <jacobabel@nullpo.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the unit (bytes per second) for http.lowSpeedLimit
in the documentation.
Signed-off-by: Corentin Garcia <corenting@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.
Consider a set of positive and negative tips (which we'll refer to with
their standard bitmap parlance by "wants", and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in `prepare_bitmap_walk()` does something like:
1. Consider if we can even compute the set of objects with bitmaps,
and fall back to the usual traversal if we cannot. For example,
pathspec limiting traversals can't be computed using bitmaps (since
they don't know which objects are at which paths). The same is true
of certain kinds of non-trivial object filters.
2. If we can compute the traversal with bitmaps, partition the
(dereferenced) tips into two object lists, "haves", and "wants",
based on whether or not the objects have the UNINTERESTING flag,
respectively.
3. Fall back to the ordinary object traversal if either (a) there are
more than zero haves, none of which are in the bitmapped pack or
MIDX, or (b) there are no wants.
4. Construct a reachability bitmap for the "haves" side by walking
from the revision tips down to any existing bitmaps, OR-ing in any
bitmaps as they are found.
5. Then do the same for the "wants" side, stopping at any objects that
appear in the "haves" bitmap.
6. Filter the results if any object filter (that can be easily
computed with bitmaps alone) was given, and then return back to the
caller.
When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.
But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.
One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.
But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.
Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.
The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.
The new algorithm works as follows:
1. Build a (partial) bitmap of the haves side by first OR-ing any
bitmap(s) that already exist for UNINTERESTING commits between the
haves and the boundary.
2. For each commit along the boundary, add it as a fill-in traversal
tip (where the traversal terminates once an existing bitmap is
found), and perform fill-in traversal.
3. Build up a complete bitmap of the wants side as usual, stopping any
time we intersect the (partial) haves side.
4. Return the results.
And is more-or-less equivalent to using the *old* algorithm with this
invocation:
$ git rev-list --objects --use-bitmap-index $WANTS --not \
$(git rev-list --objects --boundary $WANTS --not $HAVES |
perl -lne 'print $1 if /^-(.*)/')
The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.
Note that when using the old bitmap traversal algorithm, the results can
be *slower* than without bitmaps! Under the new algorithm, the result is
computed faster with bitmaps than without (at the cost of over-counting
the true number of objects in a similar fashion as the non-bitmap
traversal):
# (Computing the number of tagged objects not on any branches
# without bitmaps).
$ time git rev-list --count --objects --tags --not --branches
20
real 0m1.388s
user 0m1.092s
sys 0m0.296s
# (Computing the same query using the old bitmap traversal).
$ time git rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m22.709s
user 0m21.628s
sys 0m1.076s
# (this commit)
$ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m1.518s
user 0m1.234s
sys 0m0.284s
The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 15-fold improvement over the existing traversal.
In a more realistic setting (using my local copy of git.git), I can
observe a similar (if more modest) speed-up:
$ argv="--count --objects --branches --not --tags"
hyperfine \
-n 'no bitmaps' "git.compile rev-list $argv" \
-n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \
-n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv"
Benchmark 1: no bitmaps
Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms]
Range (min … max): 122.6 ms … 133.1 ms 22 runs
Benchmark 2: existing traversal
Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms]
Range (min … max): 365.1 ms … 374.8 ms 10 runs
Benchmark 3: boundary traversal
Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms]
Range (min … max): 166.1 ms … 169.2 ms 17 runs
Summary
'no bitmaps' ran
1.34 ± 0.02 times faster than 'boundary traversal'
2.96 ± 0.05 times faster than 'existing traversal'
Here, the new algorithm is also still slower than not using bitmaps, but
represents a more than 2-fold improvement over the existing traversal in
a more modest example.
Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:
- Decrease the cost to decompress and OR together many bitmaps
together (particularly when enumerating the uninteresting side of
the walk). Here we could explore more efficient bitmap storage
techniques, like Roaring+Run and/or use SIMD instructions to speed
up ORing them together.
- Store pseudo-merge bitmaps, which could allow us to OR together
fewer "summary" bitmaps (which would also help with the above).
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, adding a header different than CC or TO is desirable; for
example, when using Debbugs, it is best to use 'X-Debbugs-Cc' headers
to keep people in CC; this is an example use case enabled by the new
'--header-cmd' option.
The header unfolding logic is extracted to a subroutine so that it can
be reused; a test is added for coverage.
Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time. Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.
* tb/enable-cruft-packs-by-default:
repository.h: drop unused `gc_cruft_packs`
builtin/gc.c: make `gc.cruftPacks` enabled by default
t/t9300-fast-import.sh: prepare for `gc --cruft` by default
t/t6500-gc.sh: add additional test cases
t/t6500-gc.sh: refactor cruft pack tests
t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
t/t5304-prune.sh: prepare for `gc --cruft` by default
builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
builtin/repack.c: fix incorrect reference to '-C'
pack-write.c: plug a leak in stage_tmp_packfiles()
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.
* tb/pack-revindex-on-disk:
t: invert `GIT_TEST_WRITE_REV_INDEX`
config: enable `pack.writeReverseIndex` by default
pack-revindex: introduce `pack.readReverseIndex`
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: make `load_pack_revindex` take a repository
t5325: mark as leak-free
pack-write.c: plug a leak in stage_tmp_packfiles()
Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt-in to writing cruft packs when
collecting or pruning unreachable objects.
Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.
Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects. This
prevents the need to store unreachable objects as loose as they age out
of the repository, and avoids the problem of loose object explosions.
Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because pairs of similar unreachable objects serve
as delta bases for one another.
In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as above. Users
who either run `git gc` manually, or via `git maintenance` can benefit
from having cruft packs.
As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cruft packs were implemented, we never adjusted the code for `git
gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft
packs. This option and configuration option share a common
implementation, but including cruft packs is wrong in both cases:
- Running `git gc --keep-largest-pack` in a repository where the
largest pack is the cruft pack itself will make it impossible for
`git gc` to prune objects, since the cruft pack itself is kept.
- The same is true for `gc.bigPackThreshold`, if the size of the cruft
pack exceeds the limit set by the caller.
In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.
There are some complexities to doing so, mainly around handling
pruning objects that are in an existing cruft pack that is above the
threshold (which would either need to be rewritten, or else delay
pruning). Rewriting a substantially similar cruft pack isn't ideal, but
it is significantly better than the status-quo.
If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GC's, it should probably
be pruned anyway.
In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.
A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack") to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).
Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index index.
They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:
- an ~80% reduction in the time it takes to serve fetches on a popular
repository, Homebrew/homebrew-core.
- a ~60% reduction in the peak memory usage to serve fetches on that
same repository.
- a collective savings of ~35% in CPU time across all pack-objects
invocations serving fetches across all repositories in a single
datacenter.
Reverse indexes are also beneficial to end-users as well as forges. For
example, the time it takes to generate a pack containing the objects for
the 10 most recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms]
Range (min … max): 521.0 ms … 577.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms]
Range (min … max): 226.0 ms … 259.6 ms 13 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
The same is true of writing a pack containing the objects for the 30
most-recent commits:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms]
Range (min … max): 839.3 ms … 886.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms]
Range (min … max): 567.5 ms … 599.3 ms 10 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:
$ git rev-parse HEAD >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms]
Range (min … max): 290.3 ms … 331.1 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms]
Range (min … max): 1.6 ms … 4.6 ms 1155 runs
Summary
'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both from users enabling `pack.writeReverseIndexes`, and from GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.
This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.
(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, that essentially boil
down to a trade-off between time to generate, and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used).
[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1615c567b8 (Documentation/config/pack.txt: advertise
'pack.writeReverseIndex', 2021-01-25), we have had the
`pack.writeReverseIndex` configuration option, which tells Git whether
or not it is allowed to write a ".rev" file when indexing a pack.
Introduce a complementary configuration knob, `pack.readReverseIndex` to
control whether or not Git will read any ".rev" file(s) that may be
available on disk.
This option is useful for debugging, as well as disabling the effect of
".rev" files in certain instances.
This is useful because of the trade-off[^1] between the time it takes to
generate a reverse index (slow from scratch, fast when reading an
existing ".rev" file), and the time it takes to access a record (the
opposite).
For example, even though it is faster to use the on-disk reverse index
when computing the on-disk size of a packed object, it is slower to
enumerate the same value for all objects.
Here are a couple of examples from linux.git. When computing the above
for a single object, using the on-disk reverse index is significantly
faster:
$ git rev-parse HEAD >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 302.5 ms ± 12.5 ms [User: 258.7 ms, System: 43.6 ms]
Range (min … max): 291.1 ms … 328.1 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 3.9 ms ± 0.3 ms [User: 1.6 ms, System: 2.4 ms]
Range (min … max): 2.0 ms … 4.4 ms 801 runs
Summary
'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
, but when instead trying to compute the on-disk object size for all
objects in the repository, using the ".rev" file is a disadvantage over
creating the reverse index from scratch:
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 8.258 s ± 0.035 s [User: 7.949 s, System: 0.308 s]
Range (min … max): 8.199 s … 8.293 s 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 16.976 s ± 0.107 s [User: 16.706 s, System: 0.268 s]
Range (min … max): 16.839 s … 17.105 s 10 runs
Summary
'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
Luckily, the results when running `git cat-file` with `--unordered` are
closer together:
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s]
Range (min … max): 4.943 s … 5.220 s 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s]
Range (min … max): 6.145 s … 6.356 s 10 runs
Summary
'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
Because the equilibrium point between these two is highly machine- and
repository-dependent, allow users to configure whether or not they will
read any ".rev" file(s) with this configuration knob.
[^1]: Generating a reverse index in memory takes O(N) time (where N is
the number of objects in the repository), since we use a radix sort.
Reading an entry from an on-disk ".rev" file is slower since each
operation is bound by disk I/O instead of memory I/O.
In order to compute the on-disk size of a packed object, we need to
find the offset of our object, and the adjacent object (the on-disk
size difference of these two). Finding the first offset requires a
binary search. Finding the latter involves a single .rev lookup.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When no merge.tool or diff.tool is configured or manually selected, the
selection of a default tool is sensitive to the DISPLAY variable; in a
GUI session a gui-specific tool will be proposed if found, and
otherwise a terminal-based one. This "GUI-optimizing" behavior is
important because a GUI can make a huge difference to a user's ability
to understand and correctly complete a non-trivial conflicting merge.
Some time ago the merge.guitool and diff.guitool config options were
introduced to enable users to configure both a GUI tool, and a non-GUI
tool (with fallback if no GUI tool configured), in the same environment.
Unfortunately, the --gui argument introduced to support the selection of
the guitool is still explicit. When using configured tools, there is no
equivalent of the no-tool-configured "propose a GUI tool if we are in a GUI
environment" behavior.
As proposed in <xmqqmtb8jsej.fsf@gitster.g>, introduce new configuration
options, difftool.guiDefault and mergetool.guiDefault, supporting a special
value "auto" which causes the corresponding tool or guitool to be selected
depending on the presence of a non-empty DISPLAY value. Also support "true"
to say "default to the guitool (unless --no-gui is passed on the
commandline)", and "false" as the previous default behavior when these new
configuration options are not specified.
Signed-off-by: Tao Klerks <tao@klerks.biz>
Acked-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Streamline --rebase-merges command line option handling and
introduce rebase.merges configuration variable.
* ah/rebase-merges-config:
rebase: add a config option for --rebase-merges
rebase: deprecate --rebase-merges=""
rebase: add documentation and test for --no-rebase-merges
The purpose of the new option is to accommodate users who would like
--rebase-merges to be on by default and to facilitate turning on
--rebase-merges by default without configuration in a future version of
Git.
Name the new option rebase.rebaseMerges, even though it is a little
redundant, for consistency with the name of the command line option and
to be clear when scrolling through values in the [rebase] section of
.gitconfig.
Support setting rebase.rebaseMerges to the nonspecific value "true" for
users who don't need to or don't want to learn about the difference
between rebase-cousins and no-rebase-cousins.
Make --rebase-merges without an argument on the command line override
any value of rebase.rebaseMerges in the configuration, for consistency
with other command line flags with optional arguments that have an
associated config option.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch" honors the src/dst prefixes set to nonstandard
values with configuration variables like "diff.noprefix", causing
receiving end of the patch that expects the standard -p1 format to
break. Teach "format-patch" to ignore end-user configuration and
always use the standard prefixes.
This is a backward compatibility breaking change.
* jk/format-patch-ignore-noprefix:
rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
format-patch: add format.noprefix option
format-patch: do not respect diff.noprefix
diff: add --default-prefix option
t4013: add tests for diff prefix options
diff: factor out src/dst prefix setup
The previous commit dropped support for diff.noprefix in format-patch.
While this will do the right thing in most cases (where sending patches
without a prefix was an accidental side effect of the sender preferring
to see their local patches without prefixes), it left no good option for
a project or workflow where you really do want to send patches without
prefixes. You'd be stuck using "--no-prefix" for every invocation.
So let's add a config option specific to format-patch that enables this
behavior. That gives people who have such a workflow a way to get what
they want, but makes it hard to accidentally trigger it.
A more backwards-compatible way of doing the transition would be to have
format.noprefix default to diff.noprefix when it's not set. But that
doesn't really help the "accidental" problem; people would have to
manually set format.noprefix=false. And it's unlikely that anybody
really wants format.noprefix=true in the first place. I'm adding it here
mostly as an escape hatch, not because anybody has expressed any
interest in it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The user might not necessarily know why ff only was configured, maybe an
admin did it, or the installer (Git for Windows), or perhaps they just
followed some online advice.
This can happen not only on pull.ff=only, but merge.ff=only too.
Even worse if the user has configured pull.rebase=false and
merge.ff=only, because in those cases a diverging merge will constantly
keep failing. There's no trivial way to get out of this other than
`git merge --no-ff`.
Let's not assume our users are experts in git who completely understand
all their configurations.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This document only explains PGP signatures, but Git now supports X.509
signatures as of 1e7adb9756 (gpg-interface: introduce new signature
format "x509" using gpgsm, 2018-07-17), and SSH signatures as of
29b315778e (ssh signing: add ssh key format and signing code,
2021-09-10).
Additionally, explain that these signature formats are controlled
`gpg.format`, linking to its documentation, and explain in said
`gpg.format` documentation that the underlying signature format is
documented in signature-format.txt.
Signed-off-by: Gwyneth Morgan <gwymor@tilde.club>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a lower precedence configuration file (e.g. /etc/gitconfig)
defines format.attach in any way, there was no way to disable it in
a more specific configuration file (e.g. $HOME/.gitconfig).
Change the behaviour of setting it to an empty string. It used to
mean that the result is still a multipart message with only dashes
used as a multi-part separator, but now it resets the setting to
the default (which would be to give an inline patch, unless other
command line options are in effect).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finally retire the scripted "git add -p/-i" implementation and have
everybody use the one reimplemented in C.
* ab/retire-scripted-add-p:
docs & comments: replace mentions of "git-add--interactive.perl"
add API: remove run_add_interactive() wrapper function
add: remove "add.interactive.useBuiltin" & Perl "git add--interactive"
The bundle-URI subsystem adds support for creation-token heuristics
to help incremental fetches.
* ds/bundle-uri-5:
bundle-uri: test missing bundles with heuristic
bundle-uri: store fetch.bundleCreationToken
fetch: fetch from an external bundle URI
bundle-uri: drop bundle.flag from design doc
clone: set fetch.bundleURI if appropriate
bundle-uri: download in creationToken order
bundle-uri: parse bundle.<id>.creationToken values
bundle-uri: parse bundle.heuristic=creationToken
t5558: add tests for creationToken heuristic
bundle: verify using check_connected()
bundle: test unbundling with incomplete history
Since [1] first released with Git v2.37.0 the built-in version of "add
-i" has been the default. That built-in implementation was added in
[2], first released with Git v2.25.0.
At this point enough time has passed to allow for finding any
remaining bugs in this new implementation, so let's remove the
fallback code.
As with similar migrations for "stash"[3] and "rebase"[4] we're
keeping a mention of "add.interactive.useBuiltin" in the
documentation, but adding a warning() to notify any outstanding users
that the built-in is now the default. As with [5] and [6] we should
follow-up in the future and eventually remove that warning.
1. 0527ccb1b5 (add -i: default to the built-in implementation,
2021-11-30)
2. f83dff60a7 (Start to implement a built-in version of `git add
--interactive`, 2019-11-13)
3. 8a2cd3f512 (stash: remove the stash.useBuiltin setting,
2020-03-03)
4. d03ebd411c (rebase: remove the rebase.useBuiltin setting,
2019-03-18)
5. deeaf5ee07 (stash: remove documentation for `stash.useBuiltin`,
2022-01-27)
6. 9bcde4d531 (rebase: remove transitory rebase.useBuiltin setting &
env, 2021-03-23)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).
When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.
To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).
To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.
The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.
There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.
The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).
For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.
Later changes will update 'git fetch' to consume this option.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.
Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.
As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce an optional configuration to allow the trailing hash that
protects the index file from bit flipping.
* ds/omit-trailing-hash-in-index:
features: feature.manyFiles implies fast index writes
test-lib-functions: add helper for trailing hash
read-cache: add index.skipHash config option
hashfile: allow skipping the hash function
The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt-in to such behavior:
1. The feature.manyFiles config option enables some options that are
helpful for repositories with many files at HEAD.
2. 'scalar register' and 'scalar reconfigure' set config options that
optimize for large repositories.
In both of these cases, set index.skipHash=true to gain this
speedup. Add tests that demonstrate the proper way that
index.skipHash=true can override feature.manyFiles=true.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.
One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.
This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple
reasons:
1. Writes block users. Writing the index takes place in many user-
blocking foreground operations. The speed improvement directly
impacts their use. Other file formats are typically written in the
background (commit-graph, multi-pack-index) or are super-critical to
correctness (pack-files).
2. Index files are short lived. It is rare that a user leaves an index
for a long time with many staged changes. Outside of staged changes,
the index can be completely destroyed and rewritten with minimal
impact to the user.
Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.
[1] 21fed2d914
We load this config from the repository config given by istate->repo,
with a fallback to the_repository if it is not set.
While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.
The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but was split out to just
the index in a33fc72fe9 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.
Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.
As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.
Benchmark 1: with hash
Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms]
Range (min … max): 34.3 ms … 79.1 ms 82 runs
Benchmark 2: without hash
Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms]
Range (min … max): 16.3 ms … 42.0 ms 69 runs
Summary
'without hash' ran
1.78 ± 0.76 times faster than 'with hash'
These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.
Test this new config option, both at a command-line level and within a
submodule. The confirmation is currently limited to confirm that 'git
fsck' does not complain about the index. Future updates will make this
test more robust.
It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch" learned to honor format.mboxrd even when sending
patches to the standard output stream,
* ew/format-patch-mboxrd:
format-patch: support format.mboxrd with --stdout
Bundle URIs part 4.
* ds/bundle-uri-4:
clone: unbundle the advertised bundles
bundle-uri: download bundles from an advertised list
bundle-uri: allow relative URLs in bundle lists
strbuf: introduce strbuf_strip_file_from_path()
bundle-uri: serve bundle.* keys from config
bundle-uri client: add helper for testing server
transport: rename got_remote_heads
bundle-uri client: add boolean transfer.bundleURI setting
clone: request the 'bundle-uri' command when available
t: create test harness for 'bundle-uri' command
protocol v2: add server-side "bundle-uri" skeleton
mboxrd is a more robust output format when used with --stdout
and needs more exposure. Introducing this config knob lets
users choose the more robust format for all their --stdout
uses.
Relying on --pretty=mboxrd and including all of pretty-formats.txt
in the `git format-patch' documentation would likely be
confusing to users. Furthermore, this setting is useful across
multiple invocations. So introduce `format.mboxrd' as a boolean
configuration knob that changes the default --pretty=email format
to --pretty=mboxrd when (and only when) --stdout is in use.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The yet-to-be introduced client support for bundle-uri will always
fall back on a full clone, but we'd still like to be able to ignore a
server's bundle-uri advertisement entirely.
The new transfer.bundleURI config option defaults to 'false', but a user
can set it to 'true' to enable checking for bundle URIs from the origin
Git server using protocol v2.
Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git learned pushing submodules without pushing the superproject by
the user specifying --recurse-submodules=only through 6c656c3fe4
("submodules: add RECURSE_SUBMODULES_ONLY value", 2016-12-20) and
225e8bf778 ("push: add option to push only submodules", 2016-12-20).
For users who use this feature regularly, it is desirable to have an
equivalent configuration.
It turns out that such a configuration (push.recurseSubmodules=only) is
already supported, even though it is neither documented nor mentioned
in the commit messages, due to the way the --recurse-submodules=only
feature was implemented (a function used to parse --recurse-submodules
was updated to support "only", but that same function is used to parse
push.recurseSubmodules too). What is left is to document it and test it,
which is what this commit does.
There is a possible point of confusion when recursing into a submodule
that itself has the push.recurseSubmodules=only configuration, because
if a repository has only its submodules pushed and not itself, its
superproject can never be pushed. Therefore, treat such configurations
as being "on-demand", and print a warning message.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The glossary entries for "commit-graph file" and "reachability
bitmap" have been added.
* po/glossary-around-traversal:
glossary: add reachability bitmap description
glossary: add "commit graph" description
doc: use 'object database' not ODB or abbreviation
doc: use "commit-graph" hyphenation consistently
Enable gc.cruftpacks by default for those who opt into
feature.experimental setting.
* es/mark-gc-cruft-as-experimental:
config: let feature.experimental imply gc.cruftPacks=true
gc: add tests for --cruft and friends
Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.
* ds/bundle-uri-3:
bundle-uri: suppress stderr from remote-https
bundle-uri: quiet failed unbundlings
bundle: add flags to verify_bundle()
bundle-uri: fetch a list of bundles
bundle: properly clear all revision flags
bundle-uri: limit recursion depth for bundle lists
bundle-uri: parse bundle list in config format
bundle-uri: unit test "key=value" parsing
bundle-uri: create "key=value" line parsing
bundle-uri: create base key-value pair parsing
bundle-uri: create bundle_list struct and helpers
bundle-uri: use plain string in find_temp_filename()
Note, historical release notes have not been updated.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We are interested in exploring whether gc.cruftPacks=true should become
the default value.
To determine whether it is safe to do so, let's encourage more users to
try it out.
Users who have set feature.experimental=true have already volunteered to
try new and possibly-breaking config changes, so let's try this new
default with that set of users.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation lacks mention of specific <msg-id> that are supported.
While git-help --config will display a list of these options, often
developers' first instinct is to consult the git docs to find valid
config values.
Add a list of fsck error messages, and link to it from the git-fsck
documentation.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, use of fsmonitor on a repository on networked
filesystem is disabled. Add knobs to make it workable on macOS.
* ed/fsmonitor-on-networked-macos:
fsmonitor: fix leak of warning message
fsmonitor: add documentation for allowRemote and socketDir options
fsmonitor: check for compatability before communicating with fsmonitor
fsmonitor: deal with synthetic firmlinks on macOS
fsmonitor: avoid socket location check if using hook
fsmonitor: relocate socket file if .git directory is remote
fsmonitor: refactor filesystem checks to common interface
There will be two primary ways to advertise a bundle list: as a list of
packet lines in Git's protocol v2 and as a config file served from a
bundle URI. Both of these fundamentally use a list of key-value pairs.
We will use the same set of key-value pairs across these formats.
Create a new bundle_list_update() method that is currently unusued, but
will be used in the next change. It inspects each key to see if it is
understood and then applies it to the given bundle_list. Here are the
keys that we teach Git to understand:
* bundle.version: This value should be an integer. Git currently
understands only version 1 and will ignore the list if the version is
any other value. This version can be increased in the future if we
need to add new keys that Git should not ignore. We can add new
"heuristic" keys without incrementing the version.
* bundle.mode: This value should be one of "all" or "any". If this
mode is not understood, then Git will ignore the list. This mode
indicates whether Git needs all of the bundle list items to make a
complete view of the content or if any single item is sufficient.
The rest of the keys use a bundle identifier "<id>" as part of the key
name. Keys using the same "<id>" describe a single bundle list item.
* bundle.<id>.uri: This stores the URI of the bundle item. This
currently is expected to be an absolute URI, but will be relaxed to be
a relative URI in the future.
While parsing, return an error if a URI key is repeated, since we can
make that restriction with bundle lists.
Make the git_parse_int() method global so we can parse the integer
version value carefully.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add documentation for 'fsmonitor.allowRemote' and 'fsmonitor.socketDir'.
Call-out experimental nature of 'fsmonitor.allowRemote' and limited
filesystem support for 'fsmonitor.socketDir'.
Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier patch discussed and fixed a scenario where Git could be used
as a vector to exfiltrate sensitive data through a Docker container when
a potential victim clones a suspicious repository with local submodules
that contain symlinks.
That security hole has since been plugged, but a similar one still
exists. Instead of convincing a would-be victim to clone an embedded
submodule via the "file" protocol, an attacker could convince an
individual to clone a repository that has a submodule pointing to a
valid path on the victim's filesystem.
For example, if an individual (with username "foo") has their home
directory ("/home/foo") stored as a Git repository, then an attacker
could exfiltrate data by convincing a victim to clone a malicious
repository containing a submodule pointing at "/home/foo/.git" with
`--recurse-submodules`. Doing so would expose any sensitive contents in
stored in "/home/foo" tracked in Git.
For systems (such as Docker) that consider everything outside of the
immediate top-level working directory containing a Dockerfile as
inaccessible to the container (with the exception of volume mounts, and
so on), this is a violation of trust by exposing unexpected contents in
the working copy.
To mitigate the likelihood of this kind of attack, adjust the "file://"
protocol's default policy to be "user" to prevent commands that execute
without user input (including recursive submodule initialization) from
taking place by default.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Share the text used to explain configuration variables used by "git
<subcmd>" in "git help <subcmd>" with the text from "git help config".
* ab/dedup-config-and-command-docs:
docs: add CONFIGURATION sections that fuzzy map to built-ins
docs: add CONFIGURATION sections that map to a built-in
log docs: de-duplicate configuration sections
difftool docs: de-duplicate configuration sections
notes docs: de-duplicate and combine configuration sections
apply docs: de-duplicate configuration sections
send-email docs: de-duplicate configuration sections
grep docs: de-duplicate configuration sections
docs: add and use include template for config/* includes
Inspired by 24966cd982 ("doc: fix repeated words", 08-09-2019),
I ran "egrep -R "\<([a-zA-Z]+)\> \<\1\>" ./Documentation/*" to
find current cases of repeated words such as "the the" that were
quite clearly typos.
There were many false positives reported, such as "really really"
or valid uses of "that that" which I left alone.
Signed-off-by: Jacob Stopak <jacob@initialcommit.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch --from=<ident>" can be told to add an in-body
"From:" line even for commits that are authored by the given
<ident> with "--force-in-body-from"option.
* jc/format-patch-force-in-body-from:
format-patch: learn format.forceInBodyFrom configuration variable
format-patch: allow forcing the use of in-body From: header
pretty: separate out the logic to decide the use of in-body from
Include the "config/difftool.txt" file in "git-difftool.txt", and move
the relevant part of git-difftool(1) configuration from
"config/diff.txt" to config/difftool.txt".
Doing this is slightly odd, as we usually discuss configuration in
alphabetical order, but by doing it we're able to include the full set
of configuration used by git-difftool(1) (and only that configuration)
in its own documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Combine the various "notes" configuration sections spread across
Documentation/config/notes.txt and Documentation/git-notes.txt to live
in the former, and to be included in the latter.
We'll now forward link from "git notes" to the "CONFIGURATION" section
below, rather than to "git-config(1)" when discussing configuration
variables that are (also) discussed in that section.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
De-duplicate the discussion of "send-email" configuration, such that
the "git-config(1)" manual page becomes the source of truth, and
"git-send-email(1)" includes the relevant part.
Most commands that suffered from such duplication had diverging text
discussing the same variables, but in this case some config was also
only discussed in one or the other.
This is mostly a move-only change, the exception is a minor rewording
of changing wording like "see above" to "see linkgit:git-config[1]",
as well as a clarification about the big section of command-line
option tweaking config being discussed in git-send-email(1)'s main
docs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include the "config/grep.txt" file in "git-grep.txt", instead of
repeating an almost identical description of the "grep" configuration
variables in two places.
There is no loss of information here that isn't shown in the addition
to "grep.txt". This change was made by copying the contents of
"git-grep.txt"'s version over the "grep.txt" version. Aside from the
change "grep.txt" being made here the two were identical.
This documentation started being copy/pasted around in
b22520a37c (grep: allow -E and -n to be turned on by default via
configuration, 2011-03-30). After that in e.g. 6453f7b348 (grep: add
grep.fullName config variable, 2014-03-17) they started drifting
apart, with only grep.fullName being described in the command
documentation.
In 434e6e753f (config.txt: move grep.* to a separate file,
2018-10-27) we gained the include, but didn't do this next step, let's
do it now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack bitmap file gained a bitmap-lookup table to speed up
locating the necessary bitmap for a given commit.
* ac/bitmap-lookup-table:
pack-bitmap-write: drop unused pack_idx_entry parameters
bitmap-lookup-table: add performance tests for lookup table
pack-bitmap: prepare to read lookup table extension
pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
pack-bitmap-write.c: write lookup table extension
bitmap: move `get commit positions` code to `bitmap_writer_finish`
Documentation/technical: describe bitmap lookup table extension
The namespaces used by "log --decorate" from "refs/" hierarchy by
default has been tightened.
* ds/decorate-filter-tweak:
fetch: use ref_namespaces during prefetch
maintenance: stop writing log.excludeDecoration
log: create log.initialDecorationSet=all
log: add --clear-decorations option
log: add default decoration filter
log-tree: use ref_namespaces instead of if/else-if
refs: use ref_namespaces for replace refs base
refs: add array of ref namespaces
t4207: test coloring of grafted decorations
t4207: modernize test
refs: allow "HEAD" as decoration filter
As the need to use the "--force-in-body-from" option primarily is
tied to which mailing list the mails go to (and get their From:
address mangled), it is likely that a user who needs to use this
option once to interact with their upstream project needs to use it
for all patches they send out.
Add a configuration variable, suitable for setting in the local
configuration file per repository, for this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach Git to provide a way for users to enable/disable bitmap lookup
table extension by providing a config option named 'writeBitmapLookupTable'.
Default is false.
Also add test to verify writting of lookup table.
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose a lot of "tech docs" via "git help" interface.
* ab/tech-docs-to-help:
docs: move http-protocol docs to man section 5
docs: move cruft pack docs to gitformat-pack
docs: move pack format docs to man section 5
docs: move signature docs to man section 5
docs: move index format docs to man section 5
docs: move protocol-related docs to man section 5
docs: move commit-graph format docs to man section 5
git docs: add a category for file formats, protocols and interfaces
git docs: add a category for user-facing file, repo and command UX
git help doc: use "<doc>" instead of "<guide>"
help.c: remove common category behavior from drop_prefix() behavior
help.c: refactor drop_prefix() to use a "switch" statement"
Add missing documentation for "include" and "includeIf" features in
"git config" file format, which incidentally teaches the command
line completion to include them in its offerings.
source: <pull.1285.v2.git.1658002423864.gitgitgadget@gmail.com>
* mb/config-document-include:
config.txt: document include, includeIf
The previous change introduced the --clear-decorations option for users
who do not want their decorations limited to a narrow set of ref
namespaces.
Add a config option that is equivalent to specifying --clear-decorations
by default.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the move of existing Documentation/technical/* protocol and
file-format documentation into our main documentation space by moving
the various documentation pertaining to the *.pack format and related
files, and updating things that refer to it to link to the new
location.
By moving these we can properly link from the newly created
gitformat-commit-graph to a gitformat-chunk-format page.
Integrating "Documentation/technical/bitmap-format.txt" and
"Documentation/technical/cruft-packs.txt" might logically be part of
this change, but as those cover parts of the wider "pack
format" (including associated files) that's documented outside of
"Documentation/technical/pack-format.txt" let's leave those for now,
subsequent commit(s) will address those.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the move of existing Documentation/technical/* protocol and
file-format documentation into our main documentation space. By moving
the things that discuss the protocol we can properly link from
e.g. lsrefs.unborn and protocol.version documentation to a manpage we
build by default.
So far we have been using the "gitformat-" prefix for the
documentation we've been moving over from Documentation/technical/*,
but for protocol documentation let's use "gitprotocol-*".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rebase -i" learns to update branches whose tip appear in the
rebased range with "--update-refs" option.
source: <pull.1247.v5.git.1658255624.gitgitgadget@gmail.com>
* ds/rebase-update-ref:
sequencer: notify user of --update-refs activity
sequencer: ignore HEAD ref under --update-refs
rebase: add rebase.updateRefs config option
sequencer: rewrite update-refs as user edits todo list
rebase: update refs from 'update-ref' commands
rebase: add --update-refs option
sequencer: add update-ref command
sequencer: define array with enum values
rebase-interactive: update 'merge' description
branch: consider refs under 'update-refs'
t2407: test branches currently using apply backend
t2407: test bisect and rebase as black-boxes
Add missing documentation for "include" and "includeIf" features in
"git config" file format, which incidentally teaches the command
line completion to include them in its offerings.
* mb/config-document-include:
config.txt: document include, includeIf
The previous change added the --update-refs command-line option. For
users who always want this mode, create the rebase.updateRefs config
option which behaves the same way as rebase.autoSquash does with the
--autosquash option.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse checkout feature can be used in "cone mode" or "non-cone
mode". In this one instance in the documentation, we refer to the latter
as "non cone mode" with whitespace rather than a hyphen. Align this with
the rest of our documentation.
A few words later in the same paragraph, there's mention of "a more
flexible patterns". Drop that leading "a" to fix the grammar.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git config's tab completion does not yet know about the "include"
and "includeIf" sections, nor the related "path" variable.
Add a description for these two sections in
'Documentation/config/includeif.txt', which points to git-config's
documentation, specifically the "Includes" and "Conditional Includes"
subsections.
As a side effect, tab completion can successfully complete the
'include', 'includeIf', and 'include.add' expressions.
This effect is tested by two new ad-hoc tests.
Variable completion only works for "include" for now.
Credit for the ideas behind this patch goes to
Ævar Arnfjörð Bjarmason.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Manuel Boni <ziosombrero@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a known social engineering attack that takes advantage of the
fact that a working tree can include an entire bare repository,
including a config file. A user could run a Git command inside the bare
repository thinking that the config file of the 'outer' repository would
be used, but in reality, the bare repository's config file (which is
attacker-controlled) is used, which may result in arbitrary code
execution. See [1] for a fuller description and deeper discussion.
A simple mitigation is to forbid bare repositories unless specified via
`--git-dir` or `GIT_DIR`. In environments that don't use bare
repositories, this would be minimally disruptive.
Create a config variable, `safe.bareRepository`, that tells Git whether
or not to die() when working with a bare repository. This config is an
enum of:
- "all": allow all bare repositories (this is the default)
- "explicit": only allow bare repositories specified via --git-dir
or GIT_DIR.
If we want to protect users from such attacks by default, neither value
will suffice - "all" provides no protection, but "explicit" is
impractical for bare repository users. A more usable default would be to
allow only non-embedded bare repositories ([2] contains one such
proposal), but detecting if a repository is embedded is potentially
non-trivial, so this work is not implemented in this series.
[1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
[2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use git_protected_config() to read `safe.directory` instead of
read_very_early_config(), making it 'protected configuration only'.
As a result, `safe.directory` now respects "-c", so update the tests and
docs accordingly. It used to ignore "-c" due to how it was implemented,
not because of security or correctness concerns [1].
[1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For security reasons, there are config variables that are only trusted
when they are specified in certain configuration scopes, which are
sometimes referred to on-list as 'protected configuration' [1]. A future
commit will introduce another such variable, so let's define our terms
so that we can have consistent documentation and implementation.
In our documentation, define 'protected configuration' as the system,
global and command config scopes. As a shorthand, I will refer to
variables that are only respected in protected configuration as
'protected configuration only', but this term is not used in the
documentation.
This definition of protected configuration is based on whether or not
Git can reasonably protect the user by ignoring the configuration scope:
- System, global and command line config are considered protected
because an attacker who has control over any of those can do plenty of
harm without Git, so we gain very little by ignoring those scopes.
- On the other hand, local (and similarly, worktree) config are not
considered protected because it is relatively easy for an attacker to
control local config, e.g.:
- On some shared user environments, a non-admin attacker can create a
repository high up the directory hierarchy (e.g. C:\.git on
Windows), and a user may accidentally use it when their PS1
automatically invokes "git" commands.
`safe.directory` prevents attacks of this form by making sure that
the user intended to use the shared repository. It obviously
shouldn't be read from the repository, because that would end up
trusting the repository that Git was supposed to reject.
- "git upload-pack" is expected to run in repositories that may not be
controlled by the user. We cannot ignore all config in that
repository (because "git upload-pack" would fail), but we can limit
the risks by ignoring `uploadpack.packObjectsHook`.
Only `uploadpack.packObjectsHook` is 'protected configuration only'. The
following variables are intentionally excluded:
- `safe.directory` should be 'protected configuration only', but it does
not technically fit the definition because it is not respected in the
"command" scope. A future commit will fix this.
- `trace2.*` happens to read the same scopes as `safe.directory` because
they share an implementation. However, this is not for security
reasons; it is because we want to start tracing so early that
repository-level config and "-c" are not available [2].
This requirement is unique to `trace2.*`, so it does not makes sense
for protected configuration to be subject to the same constraints.
[1] For example,
https://lore.kernel.org/git/6af83767-576b-75c4-c778-0284344a8fe7@github.com/
[2] https://lore.kernel.org/git/a0c89d0d-669e-bf56-25d2-cbb09b012e70@jeffhostetler.com/
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow large objects read from a packstream to be streamed into a
loose object file straight, without having to keep it in-core as a
whole.
* hx/unpack-streaming:
unpack-objects: use stream_loose_object() to unpack large objects
core doc: modernize core.bigFileThreshold documentation
object-file.c: add "stream_loose_object()" to handle large object
object-file.c: factor out deflate part of write_loose_object()
object-file.c: refactor write_loose_object() to several steps
unpack-objects: low memory footprint for get_data() in dry_run mode
"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps. The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".
* zk/push-use-bitmaps:
send-pack.c: add config push.useBitmaps
43966ab315 (revert: optionally refer to commit in the "reference"
format, 2022-05-26) added the documentation file config/revert.txt.
Actually include it in config.txt.
Make is used with a bare infinitive after the object; remove the "to".
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7281c196b1 (transfer doc: move fetch.credentialsInUrl to
"transfer" config namespace, 2022-06-15) propagates a typo from
6dcbdc0d66 (remote: create fetch.credentialsInUrl config, 2022-06-06),
where "other" is misspelled as "oher". Fix the typo accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.35:
Git 2.35.4
Git 2.34.4
Git 2.33.4
Git 2.32.3
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.34:
Git 2.34.4
Git 2.33.4
Git 2.32.3
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.33:
Git 2.33.4
Git 2.32.3
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.32:
Git 2.32.3
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.31:
Git 2.31.4
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
* maint-2.30:
Git 2.30.5
setup: tighten ownership checks post CVE-2022-24765
git-compat-util: allow root to access both SUDO_UID and root owned
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
"sudo git foo" used to consider a repository owned by the original
user a safe one to access; it now also considers a repository owned
by root a safe one, too (after all, if an attacker can craft a
malicious repository owned by root, the box is 0wned already).
* cb/path-owner-check-with-sudo-plus:
git-compat-util: allow root to access both SUDO_UID and root owned
Reachability bitmaps are designed to speed up the "counting objects"
phase of generating a pack during a clone or fetch. They are not
optimized for Git clients sending a small topic branch via "git push".
In some cases (see [1]), using reachability bitmaps during "git push"
can cause significant performance regressions.
Add a new "push.useBitmaps" configuration variable to allow users to
tell "git push" not to use bitmaps. We already have "pack.bitmaps"
that controls the use of bitmaps, but a separate configuration variable
allows the reachability bitmaps to still be used in other areas,
such as "git upload-pack", while disabling it only for "git push".
[1]: https://lore.kernel.org/git/87zhoz8b9o.fsf@evledraar.gmail.com/
Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous changes introduced a regression which will prevent root for
accessing repositories owned by thyself if using sudo because SUDO_UID
takes precedence.
Loosen that restriction by allowing root to access repositories owned
by both uid by default and without having to add a safe.directory
exception.
A previous workaround that was documented in the tests is no longer
needed so it has been removed together with its specially crafted
prerequisite.
Helped-by: Johanness Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some config variables are combinations of multiple words, and we
typically write them in camelCase forms in manpage and translatable
strings. It's not easy to find mismatches for these camelCase config
variables during code reviews, but occasionally they are identified
during localization translations.
To check for mismatched config variables, I introduced a new feature
in the helper program for localization[^1]. The following mismatched
config variables have been identified by running the helper program,
such as "git-po-helper check-pot".
Lowercase in manpage should use camelCase:
* Documentation/config/http.txt: http.pinnedpubkey
Lowercase in translable strings should use camelCase:
* builtin/fast-import.c: pack.indexversion
* builtin/gc.c: gc.logexpiry
* builtin/index-pack.c: pack.indexversion
* builtin/pack-objects.c: pack.indexversion
* builtin/repack.c: pack.writebitmaps
* commit.c: i18n.commitencoding
* gpg-interface.c: user.signingkey
* http.c: http.postbuffer
* submodule-config.c: submodule.fetchjobs
Mismatched camelCases, choose the former:
* Documentation/config/transfer.txt: transfer.credentialsInUrl
remote.c: transfer.credentialsInURL
[^1]: https://github.com/git-l10n/git-po-helper
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename fetch.credentialsInUrl to transfer.credentialsInUrl as the
single configuration variable should work both in pushing and
fetching.
* ab/credentials-in-url-more:
transfer doc: move fetch.credentialsInUrl to "transfer" config namespace
fetch doc: note "pushurl" caveat about "credentialsInUrl", elaborate
"git revert" learns "--reference" option to use more human-readable
reference to the commit it reverts in the message template it
prepares for the user.
* jc/revert-show-parent-info:
revert: --reference should apply only to 'revert', not 'cherry-pick'
revert: optionally refer to commit in the "reference" format
Rename the "fetch.credentialsInUrl" configuration variable introduced
in 6dcbdc0d66 (remote: create fetch.credentialsInUrl config,
2022-06-06) to "transfer".
There are existing exceptions, but generally speaking the
"<namespace>.<var>" configuration should only apply to command
described in the "namespace" (and its sub-commands, so e.g. "clone.*"
or "fetch.*" might also configure "git-remote-https").
But in the case of "fetch.credentialsInUrl" we've got a configuration
variable that configures the behavior of all of "clone", "push" and
"fetch", someone adjusting "fetch.*" configuration won't expect to
have the behavior of "git push" altered, especially as we have the
pre-existing "{transfer,fetch,receive}.fsckObjects", which configures
different parts of the transfer dialog.
So let's move this configuration variable to the "transfer" namespace
before it's exposed in a release. We could add all of
"{transfer,fetch,pull}.credentialsInUrl" at some other time, but once
we have "fetch" configure "pull" such an arrangement would would be a
confusing mess, as we'd at least need to have "fetch" configure
"push" (but not the other way around), or change existing behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the documentation and release notes entry for the
"fetch.credentialsInUrl" feature added in 6dcbdc0d66 (remote: create
fetch.credentialsInUrl config, 2022-06-06), it currently doesn't
detect passwords in `remote.<name>.pushurl` configuration. We
shouldn't lull users into a false sense of security, so we need to
mention that prominently.
This also elaborates and clarifies the "exposes the password in
multiple ways" part of the documentation. As noted in [1] a user
unfamiliar with git's implementation won't know what to make of that
scary claim, e.g. git hypothetically have novel git-specific ways of
exposing configured credentials.
The reality is that this configuration is intended as an aid for users
who can't fully trust their OS's or system's security model, so lets
say that's what this is intended for, and mention the most common ways
passwords stored in configuration might inadvertently get exposed.
1. https://lore.kernel.org/git/220524.86ilpuvcqh.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "fetch.credentialsInUrl" configuration variable controls what
happens when a URL with embedded login credential is used.
* ds/credentials-in-url:
remote: create fetch.credentialsInUrl config
Make use of the stream_loose_object() function introduced in the
preceding commit to unpack large objects. Before this we'd need to
malloc() the size of the blob before unpacking it, which could cause
OOM with very large blobs.
We could use the new streaming interface to unpack all blobs, but
doing so would be much slower, as demonstrated e.g. with this
benchmark using git-hyperfine[0]:
rm -rf /tmp/scalar.git &&
git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git &&
mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack &&
git hyperfine \
-r 2 --warmup 1 \
-L rev origin/master,HEAD -L v "10,512,1k,1m" \
-s 'make' \
-p 'git init --bare dest.git' \
-c 'rm -rf dest.git' \
'./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/scalar.git/my.pack'
Here we'll perform worse with lower core.bigFileThreshold settings
with this change in terms of speed, but we're getting lower memory use
in return:
Summary
'./git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master' ran
1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
1.01 ± 0.02 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
1.02 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
1.09 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
1.10 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
1.11 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
A better benchmark to demonstrate the benefits of that this one, which
creates an artificial repo with a 1, 25, 50, 75 and 100MB blob:
rm -rf /tmp/repo &&
git init /tmp/repo &&
(
cd /tmp/repo &&
for i in 1 25 50 75 100
do
dd if=/dev/urandom of=blob.$i count=$(($i*1024)) bs=1024
done &&
git add blob.* &&
git commit -mblobs &&
git gc &&
PACK=$(echo .git/objects/pack/pack-*.pack) &&
cp "$PACK" my.pack
) &&
git hyperfine \
--show-output \
-L rev origin/master,HEAD -L v "512,50m,100m" \
-s 'make' \
-p 'git init --bare dest.git' \
-c 'rm -rf dest.git' \
'/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum'
Using this test we'll always use >100MB of memory on
origin/master (around ~105MB), but max out at e.g. ~55MB if we set
core.bigFileThreshold=50m.
The relevant "Maximum resident set size" lines were manually added
below the relevant benchmark:
'/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master' ran
Maximum resident set size (kbytes): 107080
1.02 ± 0.78 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
Maximum resident set size (kbytes): 106968
1.09 ± 0.79 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
Maximum resident set size (kbytes): 107032
1.42 ± 1.07 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
Maximum resident set size (kbytes): 107072
1.83 ± 1.02 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
Maximum resident set size (kbytes): 55704
2.16 ± 1.19 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
Maximum resident set size (kbytes): 4564
This shows that if you have enough memory this new streaming method is
slower the lower you set the streaming threshold, but the benefit is
more bounded memory use.
An earlier version of this patch introduced a new
"core.bigFileStreamingThreshold" instead of re-using the existing
"core.bigFileThreshold" variable[1]. As noted in a detailed overview
of its users in [2] using it has several different meanings.
Still, we consider it good enough to simply re-use it. While it's
possible that someone might want to e.g. consider objects "small" for
the purposes of diffing but "big" for the purposes of writing them
such use-cases are probably too obscure to worry about. We can always
split up "core.bigFileThreshold" in the future if there's a need for
that.
0. https://github.com/avar/git-hyperfine/
1. https://lore.kernel.org/git/20211210103435.83656-1-chiyutianyi@gmail.com/
2. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.bigFileThreshold documentation has been largely unchanged
since 5eef828bc0 (fast-import: Stream very large blobs directly to
pack, 2010-02-01).
But since then this setting has been expanded to affect a lot more
than that description indicated. Most notably in how "git diff" treats
them, see 6bf3b81348 (diff --stat: mark any file larger than
core.bigfilethreshold binary, 2014-08-16).
In addition to that, numerous commands and APIs make use of a
streaming mode for files above this threshold.
So let's attempt to summarize 12 years of changes in behavior, which
can be seen with:
git log --oneline -Gbig_file_thre 5eef828bc03.. -- '*.c'
To do that turn this into a bullet-point list. The summary Han Xin
produced in [1] helped a lot, but is a bit too detailed for
documentation aimed at users. Let's instead summarize how
user-observable behavior differs, and generally describe how we tend
to stream these files in various commands.
1. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/
Helped-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using `ssh-add -L` for gpg.ssh.defaultKeyCommand is not a good
recommendation. It might switch keys depending on the order of known
keys and it only supports ssh-* and no ecdsa or other keys.
Clarify that we expect a literal key prefixed by `key::`, give valid
example use cases and refer to `user.signingKey` as the preferred
option.
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users sometimes provide a "username:password" combination in their
plaintext URLs. Since Git stores these URLs in plaintext in the
.git/config file, this is a very insecure way of storing these
credentials. Credential managers are a more secure way of storing this
information.
System administrators might want to prevent this kind of use by users on
their machines.
Create a new "fetch.credentialsInUrl" config option and teach Git to
warn or die when seeing a URL with this kind of information. The warning
anonymizes the sensitive information of the URL to be clear about the
issue.
This change currently defaults the behavior to "allow" which does
nothing with these URLs. We can consider changing this behavior to
"warn" by default if we wish. At that time, we may want to add some
advice about setting fetch.credentialsInUrl=ignore for users who still
want to follow this pattern (and not receive the warning).
An earlier version of this change injected the logic into
url_normalize() in urlmatch.c. While most code paths that parse URLs
eventually normalize the URL, that normalization does not happen early
enough in the stack to avoid attempting connections to the URL first. By
inserting a check into the remote validation, we identify the issue
before making a connection. In the old code path, this was revealed by
testing the new t5601-clone.sh test under --stress, resulting in an
instance where the return code was 13 (SIGPIPE) instead of 128 from the
die().
However, we can reuse the parsing information from url_normalize() in
order to benefit from its well-worn parsing logic. We can use the struct
url_info that is created in that method to replace the password with
"<redacted>" in our error messages. This comes with a slight downside
that the normalized URL might look slightly different from the input URL
(for instance, the normalized version adds a closing slash). This should
not hinder users figuring out what the problem is and being able to fix
the issue.
As an attempt to ensure the parsing logic did not catch any
unintentional cases, I modified this change locally to to use the "die"
option by default. Running the test suite succeeds except for the
explicit username:password URLs used in t5550-http-fetch-dumb.sh and
t5541-http-push-smart.sh. This means that all other tested URLs did not
trigger this logic.
The tests show that the proper error messages appear (or do not
appear), but also count the number of error messages. When only warning,
each process validates the remote URL and outputs a warning. This
happens twice for clone, three times for fetch, and once for push.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.
* tb/cruft-packs:
sha1-file.c: don't freshen cruft packs
builtin/gc.c: conditionally avoid pruning objects via loose
builtin/repack.c: add cruft packs to MIDX during geometric repack
builtin/repack.c: use named flags for existing_packs
builtin/repack.c: allow configuring cruft pack generation
builtin/repack.c: support generating a cruft pack
builtin/pack-objects.c: --cruft with expiration
reachable: report precise timestamps from objects in cruft packs
reachable: add options to add_unseen_recent_objects_to_traversal
builtin/pack-objects.c: --cruft without expiration
builtin/pack-objects.c: return from create_object_entry()
t/helper: add 'pack-mtimes' test-tool
pack-mtimes: support writing pack .mtimes files
chunk-format.h: extract oid_version()
pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
pack-mtimes: support reading .mtimes files
Documentation/technical: add cruft-packs.txt
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.
* ns/batch-fsync:
core.fsyncmethod: performance tests for batch mode
t/perf: add iteration setup mechanism to perf-lib
core.fsyncmethod: tests for batch mode
test-lib-functions: add parsing helpers for ls-files and ls-tree
core.fsync: use batch mode and sync loose objects by default on Windows
unpack-objects: use the bulk-checkin infrastructure
update-index: use the bulk-checkin infrastructure
builtin/add: add ODB transaction around add_files_to_cache
cache-tree: use ODB transaction around writing a tree
core.fsyncmethod: batched disk flushes for loose-objects
bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
bulk-checkin: rename 'state' variable and separate 'plugged' boolean
Deprecate non-cone mode of the sparse-checkout feature.
* en/sparse-cone-becomes-default:
Documentation: some sparsity wording clarifications
git-sparse-checkout.txt: mark non-cone mode as deprecated
git-sparse-checkout.txt: flesh out pattern set sections a bit
git-sparse-checkout.txt: add a new EXAMPLES section
git-sparse-checkout.txt: shuffle some sections and mark as internal
git-sparse-checkout.txt: update docs for deprecation of 'init'
git-sparse-checkout.txt: wording updates for the cone mode default
sparse-checkout: make --cone the default
tests: stop assuming --no-cone is the default mode for sparse-checkout
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.
* js/use-builtin-add-i:
add -i: default to the built-in implementation
t2016: require the PERL prereq only when necessary
With the new http.curloptResolve configuration, the CURLOPT_RESOLVE
mechanism that allows cURL based applications to use pre-resolved
IP addresses for the requests is exposed to the scripts.
* cc/http-curlopt-resolve:
http: add custom hostname to IP address resolutions
A typical "git revert" commit uses the full title of the original
commit in its title, and starts its body of the message with:
This reverts commit 8fa7f667cf61386257c00d6e954855cc3215ae91.
This does not encourage the best practice of describing not just
"what" (i.e. "Revert X" on the title says what we did) but "why"
(i.e. and it does not say why X was undesirable).
We can instead phrase this first line of the body to be more like
This reverts commit 8fa7f667 (do this and that, 2022-04-25)
so that the title does not have to be
Revert "do this and that"
We can instead use the title to describe "why" we are reverting the
original commit.
Introduce the "--reference" option to "git revert", and also the
revert.reference configuration variable, which defaults to false, to
tweak the title and the first line of the draft commit message for
when creating a "revert" commit.
When this option is in use, the first line of the pre-filled editor
buffer becomes a comment line that tells the user to say _why_. If
the user exits the editor without touching this line by mistake,
what we prepare to become the first line of the body, i.e. "This
reverts commit 8fa7f667 (do this and that, 2022-04-25)", ends up to
be the title of the resulting commit. This behaviour is designed to
help such a user to identify such a revert in "git log --oneline"
easily so that it can be further reworded with "git rebase -i" later.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.
Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With a recent update to refuse access to repositories of other
people by default, "sudo make install" and "sudo git describe"
stopped working. This series intends to loosen it while keeping
the safety.
* cb/path-owner-check-with-sudo:
t0034: add negative tests and allow git init to mostly work under sudo
git-compat-util: avoid failing dir ownership checks if running privileged
t: regression git needs safe.directory when using sudo
"git -c branch.autosetupmerge=simple branch $A $B" will set the $B
as $A's upstream only when $A and $B shares the same name, and "git
-c push.default=simple" on branch $A would push to update the
branch $A at the remote $B came from. Also more places use the
sole remote, if exists, before defaulting to 'origin'.
* tk/simple-autosetupmerge:
push: new config option "push.autoSetupRemote" supports "simple" push
push: default to single remote even when not named origin
branch: new autosetupmerge option 'simple' for matching branches
New tests for the safe.directory mechanism.
* sg/safe-directory-tests-and-docs:
safe.directory: document and check that it's ignored in the environment
t0033-safe-directory: check when 'safe.directory' is ignored
t0033-safe-directory: check the error message without matching the trash dir
Libcurl has a CURLOPT_RESOLVE easy option that allows
the result of hostname resolution in the following
format to be passed:
[+]HOST:PORT:ADDRESS[,ADDRESS]
This way, redirects and everything operating against the
HOST+PORT will use the provided ADDRESS(s).
The following format is also allowed to stop using
hostname resolutions that have already been passed:
-HOST:PORT
See https://curl.se/libcurl/c/CURLOPT_RESOLVE.html for
more details.
Let's add a corresponding "http.curloptResolve" config
option that takes advantage of CURLOPT_RESOLVE.
Each value configured for the "http.curloptResolve" key
is passed "as is" to libcurl through CURLOPT_RESOLVE, so
it should be in one of the above 2 formats. This keeps
the implementation simple and makes us consistent with
libcurl's CURLOPT_RESOLVE, and with curl's corresponding
`--resolve` command line option.
The implementation uses CURLOPT_RESOLVE only in
get_active_slot() which is called by all the HTTP
request sending functions.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
bdc77d1d68 (Add a function to determine whether a path is owned by the
current user, 2022-03-02) checks for the effective uid of the running
process using geteuid() but didn't account for cases where that user was
root (because git was invoked through sudo or a compatible tool) and the
original uid that repository trusted for its config was no longer known,
therefore failing the following otherwise safe call:
guy@renard ~/Software/uncrustify $ sudo git describe --always --dirty
[sudo] password for guy:
fatal: unsafe repository ('/home/guy/Software/uncrustify' is owned by someone else)
Attempt to detect those cases by using the environment variables that
those tools create to keep track of the original user id, and do the
ownership check using that instead.
This assumes the environment the user is running on after going
privileged can't be tampered with, and also adds code to restrict that
the new behavior only applies if running as root, therefore keeping the
most common case, which runs unprivileged, from changing, but because of
that, it will miss cases where sudo (or an equivalent) was used to change
to another unprivileged user or where the equivalent tool used to raise
privileges didn't track the original id in a sudo compatible way.
Because of compatibility with sudo, the code assumes that uid_t is an
unsigned integer type (which is not required by the standard) but is used
that way in their codebase to generate SUDO_UID. In systems where uid_t
is signed, sudo might be also patched to NOT be unsigned and that might
be able to trigger an edge case and a bug (as described in the code), but
it is considered unlikely to happen and even if it does, the code would
just mostly fail safely, so there was no attempt either to detect it or
prevent it by the code, which is something that might change in the future,
based on expected user feedback.
Reported-by: Guy Maurel <guy.j@maurel.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Randall Becker <rsbecker@nexbridge.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement "vimdiff[123]" mergetool drivers with a more generic
layout mechanism.
* fr/vimdiff-layout:
mergetools: add description to all diff/merge tools
vimdiff: add tool documentation
vimdiff: integrate layout tests in the unit tests framework ('t' folder)
vimdiff: new implementation with layout support
In some "simple" centralized workflows, users expect remote tracking
branch names to match local branch names. "git push" pushes to the
remote version/instance of the branch, and "git pull" pulls any changes
to the remote branch (changes made by the same user in another place, or
by other users).
This expectation is supported by the push.default default option "simple"
which refuses a default push for a mismatching tracking branch name, and
by the new branch.autosetupmerge option, "simple", which only sets up
remote tracking for same-name remote branches.
When a new branch has been created by the user and has not yet been
pushed (and push.default is not set to "current"), the user is prompted
with a "The current branch %s has no upstream branch" error, and
instructions on how to push and add tracking.
This error is helpful in that following the advice once per branch
"resolves" the issue for that branch forever, but inconvenient in that
for the "simple" centralized workflow, this is always the right thing to
do, so it would be better to just do it.
Support this workflow with a new config setting, push.autoSetupRemote,
which will cause a default push, when there is no remote tracking branch
configured, to push to the same-name on the remote and --set-upstream.
Also add a hint offering this new option when the "The current branch %s
has no upstream branch" error is encountered, and add corresponding tests.
Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With "push.default=current" configured, a simple "git push" will push to
the same-name branch on the current branch's branch.<name>.pushRemote, or
remote.pushDefault, or origin. If none of these are defined, the push will
fail with error "fatal: No configured push destination".
The same "default to origin if no config" behavior applies with
"push.default=matching".
Other commands use "origin" as a default when there are multiple options,
but default to the single remote when there is only one - for example,
"git checkout <something>". This "assume the single remote if there is
only one" behavior is more friendly/useful than a defaulting behavior
that only uses the name "origin" no matter what.
Update "git push" to also default to the single remote (and finally fall
back to "origin" as default if there are several), for
"push.default=current" and for other current and future remote-defaulting
push behaviors.
This change also modifies the behavior of ls-remote in a consistent way,
so defaulting not only supplies 'origin', but any single configured remote
also.
Document the change in behavior, correct incorrect assumptions in related
tests, and add test cases reflecting this new single-remote-defaulting
behavior.
Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the default push.default option, "simple", beginners are
protected from accidentally pushing to the "wrong" branch in
centralized workflows: if the remote tracking branch they would push
to does not have the same name as the local branch, and they try to do
a "default push", they get an error and explanation with options.
There is a particular centralized workflow where this often happens:
a user branches to a new local topic branch from an existing
remote branch, eg with "checkout -b feature1 origin/master". With
the default branch.autosetupmerge configuration (value "true"), git
will automatically add origin/master as the upstream tracking branch.
When the user pushes with a default "git push", with the intention of
pushing their (new) topic branch to the remote, they get an error, and
(amongst other things) a suggestion to run "git push origin HEAD".
If they follow this suggestion the push succeeds, but on subsequent
default pushes they continue to get an error - so eventually they
figure out to add "-u" to change the tracking branch, or they spelunk
the push.default config doc as proposed and set it to "current", or
some GUI tooling does one or the other of these things for them.
When one of their coworkers later works on the same topic branch,
they don't get any of that "weirdness". They just "git checkout
feature1" and everything works exactly as they expect, with the shared
remote branch set up as remote tracking branch, and push and pull
working out of the box.
The "stable state" for this way of working is that local branches have
the same-name remote tracking branch (origin/feature1 in this
example), and multiple people can work on that remote feature branch
at the same time, trusting "git pull" to merge or rebase as required
for them to be able to push their interim changes to that same feature
branch on that same remote.
(merging from the upstream "master" branch, and merging back to it,
are separate more involved processes in this flow).
There is a problem in this flow/way of working, however, which is that
the first user, when they first branched from origin/master, ended up
with the "wrong" remote tracking branch (different from the stable
state). For a while, before they pushed (and maybe longer, if they
don't use -u/--set-upstream), their "git pull" wasn't getting other
users' changes to the feature branch - it was getting any changes from
the remote "master" branch instead (a completely different class of
changes!)
An experienced git user might say "well yeah, that's what it means to
have the remote tracking branch set to origin/master!" - but the
original user above didn't *ask* to have the remote master branch
added as remote tracking branch - that just happened automatically
when they branched their feature branch. They didn't necessarily even
notice or understand the meaning of the "set up to track 'origin/master'"
message when they created the branch - especially if they are using a
GUI.
Looking at how to fix this, you might think "OK, so disable auto setup
of remote tracking - set branch.autosetupmerge to false" - but that
will inconvenience the *second* user in this story - the one who just
wanted to start working on the topic branch. The first and second
users swap roles at different points in time of course - they should
both have a sane configuration that does the right thing in both
situations.
Make this "branches have the same name locally as on the remote"
workflow less painful / more obvious by introducing a new
branch.autosetupmerge option called "simple", to match the same-name
"push.default" option that makes similar assumptions.
This new option automatically sets up tracking in a *subset* of the
current default situations: when the original ref is a remote tracking
branch *and* has the same branch name on the remote (as the new local
branch name).
Update the error displayed when the 'push.default=simple' configuration
rejects a mismatching-upstream-name default push, to offer this new
branch.autosetupmerge option that will prevent this class of error.
With this new configuration, in the example situation above, the first
user does *not* get origin/master set up as the tracking branch for
the new local branch. If they "git pull" in their new local-only
branch, they get an error explaining there is no upstream branch -
which makes sense and is helpful. If they "git push", they get an
error explaining how to push *and* suggesting they specify
--set-upstream - which is exactly the right thing to do for them.
This new option is likely not appropriate for users intentionally
implementing a "triangular workflow" with a shared upstream tracking
branch, that they "git pull" in and a "private" feature branch that
they push/force-push to just for remote safe-keeping until they are
ready to push up to the shared branch explicitly/separately. Such
users are likely to prefer keeping the current default
merge.autosetupmerge=true behavior, and change their push.default to
"current".
Also extend the existing branch tests with three new cases testing
this option - the obvious matching-name and non-matching-name cases,
and also a non-matching-ref-type case. The matching-name case needs to
temporarily create an independent repo to fetch from, as the general
strategy of using the local repo as the remote in these tests
precludes locally branching with the same name as in the "remote".
Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of 'safe.directory' mentions that it's respected in
the system and global configs, and ignored in the repository config
and on the command line, but it doesn't mention whether it's respected
or ignored when specified via environment variables (nor does the
commit message adding 'safe.directory' [1]).
Clarify that 'safe.directory' is ignored when specified in the
environment, and add tests to make sure that it remains so.
[1] 8959555cee (setup_git_directory(): add an owner check for the
top-level directory, 2022-03-02)
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make cone mode the default, and update the documentation accordingly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the addition of the safe.directory in 8959555ce
(setup_git_directory(): add an owner check for the top-level directory,
2022-03-02) released in v2.35.2, we are receiving feedback from a
variety of users about the feature.
Some users have a very large list of shared repositories and find it
cumbersome to add this config for every one of them.
In a more difficult case, certain workflows involve running Git commands
within containers. The container boundary prevents any global or system
config from communicating `safe.directory` values from the host into the
container. Further, the container almost always runs as a different user
than the owner of the directory in the host.
To simplify the reactions necessary for these users, extend the
definition of the safe.directory config value to include a possible '*'
value. This value implies that all directories are safe, providing a
single setting to opt-out of this protection.
Note that an empty assignment of safe.directory clears all previous
values, and this is already the case with the "if (!value || !*value)"
condition.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asciidoc renders `--` as em-dash. This is not appropriate for command
names. It also breaks linkgit links to these commands.
Fix git-credential-cache--daemon and git-fsmonitor--daemon. The latter
was added 3248486920 (fsmonitor: document builtin fsmonitor, 2022-03-25)
and included several links. A check for broken links in the HTML docs
turned this up.
Manually inspecting the other Documentation/git-*--*.txt files turned up
the issue in git-credential-cache--daemon.
While here, quote `git credential-cache--daemon` in the synopsis to
match the vast majority of our other documentation.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.
One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.
When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.
At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
another fsync internal to that operation. This is not the default
today, but the user now has the option of syncing the index and there
is a separate patch series to implement syncing of refs.
On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.
Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.
In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where dataloss doesn't occur.
In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a little higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.
_Performance numbers_:
Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
Adding 500 files to the repo with 'git add' Times reported in seconds.
object file syncing | Linux | Mac | Windows
--------------------|-------|-------|--------
disabled | 0.06 | 0.35 | 0.61
fsync | 1.88 | 11.18 | 2.47
batch | 0.15 | 0.41 | 1.53
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Built-in fsmonitor (part 2).
* jh/builtin-fsmonitor-part2: (30 commits)
t7527: test status with untracked-cache and fsmonitor--daemon
fsmonitor: force update index after large responses
fsmonitor--daemon: use a cookie file to sync with file system
fsmonitor--daemon: periodically truncate list of modified files
t/perf/p7519: add fsmonitor--daemon test cases
t/perf/p7519: speed up test on Windows
t/perf/p7519: fix coding style
t/helper/test-chmtime: skip directories on Windows
t/perf: avoid copying builtin fsmonitor files into test repo
t7527: create test for fsmonitor--daemon
t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon
help: include fsmonitor--daemon feature flag in version info
fsmonitor--daemon: implement handle_client callback
compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS
compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent
compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows
fsmonitor--daemon: create token-based changed path cache
fsmonitor--daemon: define token-ids
fsmonitor--daemon: add pathname classification
fsmonitor--daemon: implement 'start' command
...
Give hint when branch tracking cannot be established because fetch
refspecs from multiple remote repositories overlap.
* tk/ambiguous-fetch-refspec:
tracking branches: add advice to ambiguous refspec error
"git fetch --refetch" learned to fetch everything without telling
the other side what we already have, which is useful when you
cannot trust what you have in the local object store.
* rc/fetch-refetch:
docs: mention --refetch fetch option
fetch: after refetch, encourage auto gc repacking
t5615-partial-clone: add test for fetch --refetch
fetch: add --refetch option
builtin/fetch-pack: add --refetch option
fetch-pack: add refetch
fetch-negotiator: add specific noop initializer
Running 'git {merge,diff}tool --tool-help' now also prints usage
information about the vimdiff tool (and its variants) instead of just
its name.
Two new functions ('diff_cmd_help()' and 'merge_cmd_help()') have been
added to the set of functions that each merge tool (ie. scripts found
inside "mergetools/") can overwrite to provided tool specific
information.
Right now, only 'mergetools/vimdiff' implements these functions, but
other tools are encouraged to do so in the future, specially if they
take configuration options not explained anywhere else (as it is the
case with the 'vimdiff' tool and the new 'layout' option)
Note that the function 'show_tool_names', used in the implementation of
'git mergetool --tool-help', is also used in Documentation/Makefile to
generate the list of allowed values for the configuration variables
'{diff,merge}.{gui,}tool'. Adjust the rule so its output is an Asciidoc
"description list" instead of a plain list, with the tool name as the
item and the newly added tool description as the description.
In addition, a section has been added to
"Documentation/git-mergetool.txt" to explain the new "layout"
configuration option with examples.
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error "not tracking: ambiguous information for ref" is raised
when we are evaluating what tracking information to set on a branch,
and find that the ref to be added as tracking branch is mapped
under multiple remotes' fetch refspecs.
This can easily happen when a user copy-pastes a remote definition
in their git config, and forgets to change the tracking path.
Add advice in this situation, explicitly highlighting which remotes
are involved and suggesting how to correct the situation. Also
update a test to explicitly expect that advice.
Signed-off-by: Tao Klerks <tao@klerks.biz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash" does not allow subcommands it internally runs as its
implementation detail, except for "git reset", to emit messages;
now "git reset" part has also been squelched.
* vd/stash-silence-reset:
reset: show --no-refresh in the short-help
reset: remove 'reset.refresh' config option
reset: remove 'reset.quiet' config option
reset: do not make '--quiet' disable index refresh
stash: make internal resets quiet and refresh index
reset: suppress '--no-refresh' advice if logging is silenced
reset: replace '--quiet' with '--no-refresh' in performance advice
reset: introduce --[no-]refresh option to --mixed
reset: revise index refresh advice
Document it for partial clones as a means to apply a new filter, and
reference it from the remote.<name>.partialclonefilter config parameter.
Signed-off-by: Robert Coup <robert@coup.net.nz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Updates to refs traditionally weren't fsync'ed, but we can
configure using core.fsync variable to do so.
* ps/fsync-refs:
core.fsync: new option to harden references
Replace core.fsyncObjectFiles with two new configuration variables,
core.fsync and core.fsyncMethod.
* ns/core-fsyncmethod:
core.fsync: documentation and user-friendly aggregate options
core.fsync: new option to harden the index
core.fsync: add configuration parsing
core.fsync: introduce granular fsync control infrastructure
core.fsyncmethod: add writeout-only mode
wrapper: make inclusion of Windows csprng header tightly scoped
Document how `core.fsmonitor` can be set to a boolean to enable
or disable the builtin FSMonitor.
Update references to `core.fsmonitor` and `core.fsmonitorHookVersion` and
pointers to `Watchman` to refer to it.
Create `git-fsmonitor--daemon` manual page and describe its features.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.34:
Git 2.34.2
Git 2.33.2
Git 2.32.1
Git 2.31.2
GIT-VERSION-GEN: bump to v2.33.1
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.33:
Git 2.33.2
Git 2.32.1
Git 2.31.2
GIT-VERSION-GEN: bump to v2.33.1
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.32:
Git 2.32.1
Git 2.31.2
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.31:
Git 2.31.2
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.30:
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
Remove the 'reset.quiet' config option, remove '--no-quiet' documentation in
'Documentation/git-reset.txt'. In 4c3abd0551 (reset: add new reset.quiet
config setting, 2018-10-23), 'reset.quiet' was introduced as a way to
globally change the default behavior of 'git reset --mixed' to skip index
refresh.
However, now that '--quiet' does not affect index refresh, 'reset.quiet'
would only serve to globally silence logging. This was not the original
intention of the config setting, and there's no precedent for such a setting
in other commands with a '--quiet' option, so it appears to be obsolete.
In addition to the options & its documentation, remove 'reset.quiet' from
the recommended config for 'scalar'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack" learned a new configuration to disable triggering of
age-old "update-server-info" command, which is rarely useful these
days.
* ps/repack-with-server-info:
repack: add config to skip updating server info
repack: refactor to avoid double-negation of update-server-info
It poses a security risk to search for a git directory outside of the
directories owned by the current user.
For example, it is common e.g. in computer pools of educational
institutes to have a "scratch" space: a mounted disk with plenty of
space that is regularly swiped where any authenticated user can create
a directory to do their work. Merely navigating to such a space with a
Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/`
can lead to a compromised account.
The same holds true in multi-user setups running Windows, as `C:\` is
writable to every authenticated user by default.
To plug this vulnerability, we stop Git from accepting top-level
directories owned by someone other than the current user. We avoid
looking at the ownership of each and every directories between the
current and the top-level one (if there are any between) to avoid
introducing a performance bottleneck.
This new default behavior is obviously incompatible with the concept of
shared repositories, where we expect the top-level directory to be owned
by only one of its legitimate users. To re-enable that use case, we add
support for adding exceptions from the new default behavior via the
config setting `safe.directory`.
The `safe.directory` config setting is only respected in the system and
global configs, not from repository configs or via the command-line, and
can have multiple values to allow for multiple shared repositories.
We are particularly careful to provide a helpful message to any user
trying to use a shared repository.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When writing both loose and packed references to disk we first create a
lockfile, write the updated values into that lockfile, and on commit we
rename the file into place. According to filesystem developers, this
behaviour is broken because applications should always sync data to disk
before doing the final rename to ensure data consistency [1][2][3]. If
applications fail to do this correctly, a hard crash of the machine can
easily result in corrupted on-disk data.
This kind of corruption can in fact be easily observed with Git when the
machine hard-resets shortly after writing references to disk. On
machines with ext4, this will likely lead to the "empty files" problem:
the file has been renamed, but its data has not been synced to disk. The
result is that the reference is corrupt, and in the worst case this can
lead to data loss.
Implement a new option to harden references so that users and admins can
avoid this scenario by syncing locked loose and packed references to
disk before we rename them into place.
[1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ns/core-fsyncmethod:
core.fsync: documentation and user-friendly aggregate options
core.fsync: new option to harden the index
core.fsync: add configuration parsing
core.fsync: introduce granular fsync control infrastructure
core.fsyncmethod: add writeout-only mode
wrapper: make inclusion of Windows csprng header tightly scoped
This commit adds aggregate options for the core.fsync setting that are
more user-friendly. These options are specified in terms of 'levels of
safety', indicating which Git operations are considered to be sync
points for durability.
The new documentation is also included here in its entirety for ease of
review.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace references to '--quiet' with '--no-refresh' in the advice on how to
skip refreshing the index. When the advice was introduced, '--quiet' was the
only way to avoid the expensive 'refresh_index(...)' at the end of a mixed
reset. After introducing '--no-refresh', however, '--quiet' became only a
fallback option for determining refresh behavior, overridden by
'--[no-]refresh' or 'reset.refresh' if either is set. To ensure users are
advised to use the most reliable option for avoiding 'refresh_index(...)',
replace recommendation of '--quiet' with '--[no-]refresh'.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the advice describing index refresh from "enumerate unstaged changes"
to "refresh the index." Describing 'refresh_index(...)' as "enumerating
unstaged changes" is not fully representative of what an index refresh is
doing; more generally, it updates the properties of index entries that are
affected by outside-of-index state, e.g. CE_UPTODATE, which is affected by
the file contents on-disk. This distinction is relevant to operations that
read the index but do not refresh first - e.g., 'git read-tree' - where a
stale index may cause incorrect behavior.
In addition to changing the advice message, use the "advise" function to
print advice.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, git-repack(1) will update server info that is required by
the dumb HTTP transport. This can be skipped by passing the `-n` flag,
but what we're noticably missing is a config option to permanently
disable updating this information.
Add a new option "repack.updateServerInfo" which can be used to disable
the logic. Most hosting providers have turned off the dumb HTTP protocol
anyway, and on the client-side it woudln't typically be useful either.
Giving a persistent way to disable this feature thus makes quite some
sense to avoid wasting compute cycles and storage.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This change introduces code to parse the core.fsync setting and
configure the fsync_components variable.
core.fsync is configured as a comma-separated list of component names to
sync. Each time a core.fsync variable is encountered in the
configuration heirarchy, we start off with a clean state with the
platform default value. Passing 'none' resets the value to indicate
nothing will be synced. We gather all negative and positive entries from
the comma separated list and then compute the new value by removing all
the negative entries and adding all of the positive entries.
We issue a warning for components that are not recognized so that the
configuration code is compatible with configs from future versions of
Git with more repo components.
Complete documentation for the new setting is included in a later patch
in the series so that it can be reviewed once in final form.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit introduces the `core.fsyncMethod` configuration
knob, which can currently be set to `fsync` or `writeout-only`.
The new writeout-only mode attempts to tell the operating system to
flush its in-memory page cache to the storage hardware without issuing a
CACHE_FLUSH command to the storage controller.
Writeout-only fsync is significantly faster than a vanilla fsync on
common hardware, since data is written to a disk-side cache rather than
all the way to a durable medium. Later changes in this patch series will
take advantage of this primitive to implement batching of hardware
flushes.
When git_fsync is called with FSYNC_WRITEOUT_ONLY, it may fail and the
caller is expected to do an ordinary fsync as needed.
On Apple platforms, the fsync system call does not issue a CACHE_FLUSH
directive to the storage controller. This change updates fsync to do
fcntl(F_FULLFSYNC) to make fsync actually durable. We maintain parity
with existing behavior on Apple platforms by setting the default value
of the new core.fsyncMethod option.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In sparse-checkouts, files mis-marked as missing from the working tree
could lead to later problems. Such files were hard to discover, and
harder to correct. Automatically detecting and correcting the marking
of such files has been added to avoid these problems.
* en/present-despite-skipped:
repo_read_index: add config to expect files outside sparse patterns
Accelerate clear_skip_worktree_from_present_files() by caching
Update documentation related to sparsity and the skip-worktree bit
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
unpack-trees: fix accidental loss of user changes
t1011: add testcase demonstrating accidental loss of user modifications
The error message given by "git switch HEAD~4" has been clarified
to suggest the "--detach" option that is required.
* ah/advice-switch-requires-detach-to-detach:
switch: mention the --detach option when dying due to lack of a branch
Typically with sparse checkouts, we expect files outside the sparsity
patterns to be marked as SKIP_WORKTREE and be missing from the working
tree. Sometimes this expectation would be violated however; including
in cases such as:
* users grabbing files from elsewhere and writing them to the worktree
(perhaps by editing a cached copy in an editor, copying/renaming, or
even untarring)
* various git commands having incomplete or no support for the
SKIP_WORKTREE bit[1,2]
* users attempting to "abort" a sparse-checkout operation with a
not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
working tree is not atomic)[3].
When the SKIP_WORKTREE bit in the index did not reflect the presence of
the file in the working tree, it traditionally caused confusion and was
difficult to detect and recover from. So, in a sparse checkout, since
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE
bit at index read time for entries corresponding to files that are
present in the working tree.
There is another workflow, however, where it is expected that paths
outside the sparsity patterns appear to exist in the working tree and
that they do not lose the SKIP_WORKTREE bit, at least until they get
modified. A Git-aware virtual file system[4] takes advantage of its
position as a file system driver to expose all files in the working
tree, fetch them on demand using partial clone on access, and tell Git
to pay attention to them on demand by updating the sparse checkout
pattern on writes. This means that commands like "git status" only have
to examine files that have potentially been modified, whereas commands
like "ls" are able to show the entire codebase without requiring manual
updates to the sparse checkout pattern.
Thus since af6a51875a, Git with such Git-aware virtual file systems
unsets the SKIP_WORKTREE bit for all files and commands like "git
status" have to fetch and examine them all.
Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
allow limiting the tracked set of files to a small set once again. A
Git-aware virtual file system or other application that wants to
maintain files outside of the sparse checkout can set this in a
repository to instruct Git not to check for the presence of
SKIP_WORKTREE files. The setting defaults to false, so most users of
sparse checkout will still get the benefit of an automatically updating
index to recover from the variety of difficult issues detailed in
af6a51875a for paths with SKIP_WORKTREE set despite the path being
present.
[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] The three long paragraphs in the middle of
https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[4] such as the vfsd described in
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users who are accustomed to doing `git checkout <tag>` assume that
`git switch <tag>` will do the same thing. Inform them of the --detach
option so they aren't left wondering why `git switch` doesn't work but
`git checkout` does.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone --filter=... --recurse-submodules" only makes the
top-level a partial clone, while submodules are fully cloned. This
behaviour is changed to pass the same filter down to the submodules.
* js/apply-partial-clone-filters-recursively:
clone, submodule: pass partial clone filters to submodules
"git sparse-checkout" wants to work with per-worktree configuration,
but did not work well in a worktree attached to a bare repository.
* ds/sparse-checkout-requires-per-worktree-config:
config: make git_configset_get_string_tmp() private
worktree: copy sparse-checkout patterns and config on add
sparse-checkout: set worktree-config correctly
config: add repo_config_set_worktree_gently()
worktree: create init_worktree_config()
Documentation: add extensions.worktreeConfig details
"git branch" learned the "--recurse-submodules" option.
* gc/branch-recurse-submodules:
branch.c: use 'goto cleanup' in setup_tracking() to fix memory leaks
branch: add --recurse-submodules option for branch creation
builtin/branch: consolidate action-picking logic in cmd_branch()
branch: add a dry_run parameter to create_branch()
branch: make create_branch() always create a branch
branch: move --set-upstream-to behavior to dwim_and_setup_tracking()
Removal of unused code and doc.
* js/no-more-legacy-stash:
stash: stop warning about the obsolete `stash.useBuiltin` config setting
stash: remove documentation for `stash.useBuiltin`
add: remove support for `git-legacy-stash`
git-sh-setup: remove remnant bits referring to `git-legacy-stash`
Interaction between fetch.negotiationAlgorithm and
feature.experimental configuration variables has been corrected.
* en/fetch-negotiation-default-fix:
repo-settings: rename the traditional default fetch.negotiationAlgorithm
repo-settings: fix error handling for unknown values
repo-settings: fix checking for fetch.negotiationAlgorithm=default
When cloning a repo with a --filter and with --recurse-submodules
enabled, the partial clone filter only applies to the top-level repo.
This can lead to unexpected bandwidth and disk usage for projects which
include large submodules. For example, a user might wish to make a
partial clone of Gerrit and would run:
`git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`.
However, only the superproject would be a partial clone; all the
submodules would have all blobs downloaded regardless of their size.
With this change, the same filter can also be applied to submodules,
meaning the expected bandwidth and disk savings apply consistently.
To avoid changing default behavior, add a new clone flag,
`--also-filter-submodules`. When this is set along with `--filter` and
`--recurse-submodules`, the filter spec is passed along to git-submodule
and git-submodule--helper, such that submodule clones also have the
filter applied.
This applies the same filter to the superproject and all submodules.
Users who need to customize the filter per-submodule would need to clone
with `--no-recurse-submodules` and then manually initialize each
submodule with the proper filter.
Applying filters to submodules should be safe thanks to Jonathan Tan's
recent work [1, 2, 3] eliminating the use of alternates as a method of
accessing submodule objects, so any submodule object access now triggers
a lazy fetch from the submodule's promisor remote if the accessed object
is missing. This patch is a reworked version of [4], which was created
prior to Jonathan Tan's work.
[1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16)
[2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate',
2021-09-20)
[3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules',
2021-10-25)
[4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The extensions.worktreeConfig extension was added in 58b284a (worktree:
add per-worktree config files, 2018-10-21) and was somewhat documented
in Documentation/git-config.txt. However, the extensions.worktreeConfig
value was not specified further in the list of possible config keys. The
location of the config.worktree file is not specified, and there are
some precautions that should be mentioned clearly, but are only
mentioned in git-worktree.txt.
Expand the documentation to help users discover the complexities of
extensions.worktreeConfig by adding details and cross links in these
locations (relative to Documentation/):
- config/extensions.txt
- git-config.txt
- git-worktree.txt
The updates focus on items such as
* $GIT_DIR/config.worktree takes precedence over $GIT_COMMON_DIR/config.
* The core.worktree and core.bare=true settings are incorrect to have in
the common config file when extensions.worktreeConfig is enabled.
* The sparse-checkout settings core.sparseCheckout[Cone] are recommended
to be set in the worktree config.
As documented in 11664196ac ("Revert "check_repository_format_gently():
refuse extensions for old repositories"", 2020-07-15), this extension
must be considered regardless of the repository format version for
historical reasons.
A future change will update references to extensions.worktreeConfig
within git-sparse-checkout.txt, but a behavior change is needed before
making those updates.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To improve the submodules UX, we would like to teach Git to handle
branches in submodules. Start this process by teaching "git branch" the
--recurse-submodules option so that "git branch --recurse-submodules
topic" will create the `topic` branch in the superproject and its
submodules.
Although this commit does not introduce breaking changes, it does not
work well with existing --recurse-submodules commands because "git
branch --recurse-submodules" writes to the submodule ref store, but most
commands only consider the superproject gitlink and ignore the submodule
ref store. For example, "git checkout --recurse-submodules" will check
out the commits in the superproject gitlinks (and put the submodules in
detached HEAD) instead of checking out the submodule branches.
Because of this, this commit introduces a new configuration value,
`submodule.propagateBranches`. The plan is for Git commands to
prioritize submodule ref store information over superproject gitlinks if
this value is true. Because "git branch --recurse-submodules" writes to
submodule ref stores, for the sake of clarity, it will not function
unless this configuration value is set.
This commit also includes changes that support working with submodules
from a superproject commit because "branch --recurse-submodules" (and
future commands) need to read .gitmodules and gitlinks from the
superproject commit, but submodules are typically read from the
filesystem's .gitmodules and the index's gitlinks. These changes are:
* add a submodules_of_tree() helper that gives the relevant
information of an in-tree submodule (e.g. path and oid) and
initializes the repository
* add is_tree_submodule_active() by adding a treeish_name parameter to
is_submodule_active()
* add the "submoduleNotUpdated" advice to advise users to update the
submodules in their trees
Incidentally, fix an incorrect usage string that combined the 'list'
usage of git branch (-l) with the 'create' usage; this string has been
incorrect since its inception, a8dfd5eac4 (Make builtin-branch.c use
parse_options., 2007-10-07).
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Glen Choo <chooglen@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Give the traditional default fetch.negotiationAlgorithm the name
'consecutive'. Also allow a choice of 'default' to have Git decide
between the choices (currently, picking 'skipping' if
feature.experimental is true and 'consecutive' otherwise). Update the
documentation accordingly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 8a2cd3f512 (stash: remove the stash.useBuiltin setting, 2020-03-03),
we removed the setting, and for a couple of major versions, we still
documented the setting, telling users that it is gone.
We can now safely remove even the documentation.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an apostrophe to "signatures" to indicate the possessive
relationship in "the signature's creation".
Signed-off-by: Greg Hurrell <greg@hurrell.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Specifically, replace the tab between "the" and "first" with a space.
Signed-off-by: Greg Hurrell <greg@hurrell.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git -c branch.autosetupmerge=inherit branch new old" makes "new"
to have the same upstream as the "old" branch, instead of marking
"old" itself as its upstream.
* js/branch-track-inherit:
config: require lowercase for branch.*.autosetupmerge
branch: add flags and config to inherit tracking
branch: accept multiple upstream branches for tracking
The cryptographic signing using ssh keys can specify literal keys
for keytypes whose name do not begin with the "ssh-" prefix by
using the "key::" prefix mechanism (e.g. "key::ecdsa-sha2-nistp256").
* fs/ssh-signing-other-keytypes:
ssh signing: make sign/amend test more resilient
ssh signing: support non ssh-* keytypes
Extend the signing of objects with SSH keys and learn to pay
attention to the key validity time range when verifying.
* fs/ssh-signing-key-lifetime:
ssh signing: verify ssh-keygen in test prereq
ssh signing: make fmt-merge-msg consider key lifetime
ssh signing: make verify-tag consider key lifetime
ssh signing: make git log verify key lifetime
ssh signing: make verify-commit consider key lifetime
ssh signing: add key lifetime test prereqs
ssh signing: use sigc struct to pass payload
t/fmt-merge-msg: make gpgssh tests more specific
t/fmt-merge-msg: do not redirect stderr
It can be helpful when creating a new branch to use the existing
tracking configuration from the branch point. However, there is
currently not a method to automatically do so.
Teach git-{branch,checkout,switch} an "inherit" argument to the
"--track" option. When this is set, creating a new branch will cause the
tracking configuration to default to the configuration of the branch
point, if set.
For example, if branch "main" tracks "origin/main", and we run
`git checkout --track=inherit -b feature main`, then branch "feature"
will track "origin/main". Thus, `git status` will show us how far
ahead/behind we are from origin, and `git pull` will pull from origin.
This is particularly useful when creating branches across many
submodules, such as with `git submodule foreach ...` (or if running with
a patch such as [1], which we use at $job), as it avoids having to
manually set tracking info for each submodule.
Since we've added an argument to "--track", also add "--track=direct" as
another way to explicitly get the original "--track" behavior ("--track"
without an argument still works as well).
Finally, teach branch.autoSetupMerge a new "inherit" option. When this
is set, "--track=inherit" becomes the default behavior.
[1]: https://lore.kernel.org/git/20180927221603.148025-1-sbeller@google.com/
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing colon to ensure correct rendering of definition list
item. Without the proper number of colons, it renders as just another
top-level paragraph rather than a list item.
Signed-off-by: Greg Hurrell <greg@hurrell.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The discussion for gpg.ssh.allowedSignersFile shows an example string
that contains "user1@example.com,user2@example.com". Asciidoc thinks
these are real email addresses and generates "mailto" footnotes for
them. This makes the rendered content more confusing, as it has extra
"[1]" markers:
The file consists of one or more lines of principals followed by an
ssh public key. e.g.: user1@example.com[1],user2@example.com[2]
ssh-rsa AAAAX1... See ssh-keygen(1) "ALLOWED SIGNERS" for details.
and also generates pointless notes at the end of the page:
NOTES
1. user1@example.com
mailto:user1@example.com
2. user2@example.com
mailto:user2@example.com
We can fix this by putting the example into a backtick literal block.
That inhibits the mailto generation, and as a bonus typesets the example
text in a way that sets it off from the regular prose (a tt font for
html, or bold in the roff manpage).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If valid-before/after dates are configured for this signatures key in the
allowedSigners file then the verification should check if the key was valid at
the time the commit was made. This allows for graceful key rollover and
revoking keys without invalidating all previous commits.
This feature needs openssh > 8.8. Older ssh-keygen versions will simply
ignore this flag and use the current time.
Strictly speaking this feature is available in 8.7, but since 8.7 has a
bug that makes it unusable in another needed call we require 8.8.
Timestamp information is present on most invocations of check_signature.
However signer ident is not. We will need the signer email / name to be able
to implement "Trust on first use" functionality later.
Since the payload contains all necessary information we can parse it
from there. The caller only needs to provide us some info about the
payload by setting payload_type in the signature_check struct.
- Add payload_type field & enum and payload_timestamp to struct
signature_check
- Populate the timestamp when not already set if we know about the
payload type
- Pass -Overify-time={payload_timestamp} in the users timezone to all
ssh-keygen verification calls
- Set the payload type when verifying commits
- Add tests for expired, not yet valid and keys having a commit date
outside of key validity as well as within
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We documented that with grep.patternType set to default, the "git
grep" command returns to "the default matching behavior" in 84befcd0
(grep: add a grep.patternType configuration setting, 2012-08-03).
The grep.extendedRegexp configuration variable was the only way to
configure the behavior before that, after b22520a3 (grep: allow -E
and -n to be turned on by default via configuration, 2011-03-30)
introduced it.
It is understandable that we referred to the behavior that honors
the older configuration variable as "the default matching"
behavior. It is fairly clear in its log message:
When grep.patternType is set to a value other than "default", the
grep.extendedRegexp setting is ignored. The value of "default" restores
the current default behavior, including the grep.extendedRegexp
behavior.
But when the paragraph is read in isolation by a new person who is
not aware of that backstory (which is the synonym for "most users"),
the "default matching behavior" can be read as "how 'git grep'
behaves without any configuration variables or options", which is
"match the pattern as BRE".
Clarify what the passage means by elaborating what the phrase
"default matching behavior" wanted to mean.
Helped-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 9a5315edfd (Merge branch 'js/patch-mode-in-others-in-c',
2020-02-05), Git acquired a built-in implementation of `git add`'s
interactive mode that could be turned on via the config option
`add.interactive.useBuiltin`.
The first official Git version to support this knob was v2.26.0.
In 2df2d81ddd (add -i: use the built-in version when
feature.experimental is set, 2020-09-08), this built-in implementation
was also enabled via `feature.experimental`. The first version with this
change was v2.29.0.
More than a year (and very few bug reports) later, it is time to declare
the built-in implementation mature and to turn it on by default.
We specifically leave the `add.interactive.useBuiltin` configuration in
place, to give users an "escape hatch" in the unexpected case should
they encounter a previously undetected bug in that implementation.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The user.signingKey config for ssh signing supports either a path to a
file containing the key or for the sake of convenience a literal string
with the ssh public key. To differentiate between those two cases we
check if the first few characters contain "ssh-" which is unlikely to be
the start of a path. ssh supports other key types which are not prefixed
with "ssh-" and will currently be treated as a file path and therefore
fail to load. To remedy this we move the prefix check into its own
function and introduce the prefix `key::` for literal ssh keys. This way
we don't need to add new key types when they become available. The
existing `ssh-` prefix is retained for compatibility with current user
configs but removed from the official documentation to discourage its
use.
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --grep=string --author=name" learns to highlight hits just
like "git grep string" does.
* hm/paint-hits-in-log-grep:
grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data
pretty: colorize pattern matches in commit messages
grep: refactor next_match() and match_one_pattern() for external use
Fix-up for the other topic already in 'next'.
* fs/ssh-signing-fix:
gpg-interface: fix leak of strbufs in get_ssh_key_fingerprint()
gpg-interface: fix leak of "line" in parse_ssh_output()
ssh signing: clarify trustlevel usage in docs
ssh signing: fmt-merge-msg tests & config parse
Use ssh public crypto for object and push-cert signing.
* fs/ssh-signing:
ssh signing: test that gpg fails for unknown keys
ssh signing: tests for logs, tags & push certs
ssh signing: duplicate t7510 tests for commits
ssh signing: verify signatures using ssh-keygen
ssh signing: provide a textual signing_key_id
ssh signing: retrieve a default key from ssh-agent
ssh signing: add ssh key format and signing code
ssh signing: add test prereqs
ssh signing: preliminary refactoring and clean-up
Fix a regression in 8c32856133 (blame: document --color-* options,
2021-10-08), which added an extra newline before the "+" syntax.
The "Documentation/doc-diff HEAD~ HEAD" output with this applied is:
[...]
@@ -1815,13 +1815,13 @@ CONFIGURATION FILE
specified colors if the line was introduced before the given
timestamp, overwriting older timestamped colors.
- + Instead of an absolute timestamp relative timestamps work as well,
- e.g. 2.weeks.ago is valid to address anything older than 2 weeks.
+ Instead of an absolute timestamp relative timestamps work as well,
+ e.g. 2.weeks.ago is valid to address anything older than 2 weeks.
- + It defaults to blue,12 month ago,white,1 month ago,red, which colors
- everything older than one year blue, recent changes between one month
- and one year old are kept white, and lines introduced within the last
- month are colored red.
+ It defaults to blue,12 month ago,white,1 month ago,red, which
+ colors everything older than one year blue, recent changes between
+ one month and one year old are kept white, and lines introduced
+ within the last month are colored red.
color.blame.repeatedLines
Use the specified color to colorize line annotations for git blame
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--color-lines" and "--color-by-age" options of "git blame"
have been missing, which are now documented.
* bs/doc-blame-color-lines:
blame: document --color-* options
blame: describe default output format
The "--preserve-merges" option of "git rebase" has been removed.
* js/retire-preserve-merges:
sequencer: restrict scope of a formerly public function
rebase: remove a no-longer-used function
rebase: stop mentioning the -p option in comments
rebase: remove obsolete code comment
rebase: drop the internal `rebase--interactive` command
git-svn: drop support for `--preserve-merges`
rebase: drop support for `--preserve-merges`
pull: remove support for `--rebase=preserve`
tests: stop testing `git rebase --preserve-merges`
remote: warn about unhandled branch.<name>.rebase values
t5520: do not use `pull.rebase=preserve`
facca53ac added verification for ssh signatures but incorrectly
described the usage of gpg.minTrustLevel. While the verifications
trustlevel is stil set to fully or undefined depending on if the key is
known or not it has no effect on the verification result. Unknown keys
will always fail verification. This commit updates the docs to match
this behaviour.
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Doc update plus improved error reporting.
* jk/log-warn-on-bogus-encoding:
docs: use "character encoding" to refer to commit-object encoding
logmsg_reencode(): warn when iconv() fails
* fs/ssh-signing:
ssh signing: test that gpg fails for unknown keys
ssh signing: tests for logs, tags & push certs
ssh signing: duplicate t7510 tests for commits
ssh signing: verify signatures using ssh-keygen
ssh signing: provide a textual signing_key_id
ssh signing: retrieve a default key from ssh-agent
ssh signing: add ssh key format and signing code
ssh signing: add test prereqs
ssh signing: preliminary refactoring and clean-up
"git multi-pack-index write --bitmap" learns to propagate the
hashcache from original bitmap to resulting bitmap.
* tb/midx-write-propagate-namehash:
t5326: test propagating hashcache values
p5326: generate pack bitmaps before writing the MIDX bitmap
p5326: don't set core.multiPackIndex unnecessarily
p5326: create missing 'perf-tag' tag
midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
pack-bitmap.c: propagate namehash values from existing bitmaps
t/helper/test-bitmap.c: add 'dump-hashes' mode
The "git log" command limits its output to the commits that contain strings
matched by a pattern when the "--grep=<pattern>" option is used, but unlike
output from "git grep -e <pattern>", the matches are not highlighted,
making them harder to spot.
Teach the pretty-printer code to highlight matches from the
"--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
options (to view the last one, you may have to ask for --pretty=fuller).
Also, it must be noted that we are effectively greping the content twice
(because it would be a hassle to rework the existing matching code to do
a /g match and then pass it all down to the coloring code), however it only
slows down "git log --author=^H" on this repository by around 1-2%
(compared to v2.33.0), so it should be a small enough slow down to justify
the addition of the feature.
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit cdc2d5f11f (builtin/blame: dim uninteresting metadata lines,
2018-04-23) and 25d5f52901 (builtin/blame: highlight recently changed
lines, 2018-04-23) introduce --color-lines and --color-by-age options to
git blame, respectively. While both options are mentioned in usage help,
they aren't documented in git-blame(1). Document them.
Co-authored-by: Dr. Matthias St. Pierre <m.st.pierre@ncp-e.com>
Signed-off-by: Dr. Matthias St. Pierre <m.st.pierre@ncp-e.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, the bitmap writing code learned to propagate
values from an existing hash-cache extension into the bitmap that it is
writing.
Now that that functionality exists, let's expose it by teaching the 'git
multi-pack-index' builtin to respect the `pack.writeBitmapHashCache`
option so that the hash-cache may be written at all.
Two minor points worth noting here:
- The 'git multi-pack-index write' sub-command didn't previously read
any configuration (instead this is handled in the base command). A
separate handler is added here to respect this write-specific
config option.
- I briefly considered adding a 'bitmap_flags' field to the static
options struct, but decided against it since it would require
plumbing through a new parameter to the write_midx_file() function.
Instead, a new MIDX-specific flag is added, which is translated to
the corresponding bitmap one.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To verify a ssh signature we first call ssh-keygen -Y find-principal to
look up the signing principal by their public key from the
allowedSignersFile. If the key is found then we do a verify. Otherwise
we only validate the signature but can not verify the signers identity.
Verification uses the gpg.ssh.allowedSignersFile (see ssh-keygen(1) "ALLOWED
SIGNERS") which contains valid public keys and a principal (usually
user@domain). Depending on the environment this file can be managed by
the individual developer or for example generated by the central
repository server from known ssh keys with push access. This file is usually
stored outside the repository, but if the repository only allows signed
commits/pushes, the user might choose to store it in the repository.
To revoke a key put the public key without the principal prefix into
gpg.ssh.revocationKeyring or generate a KRL (see ssh-keygen(1)
"KEY REVOCATION LISTS"). The same considerations about who to trust for
verification as with the allowedSignersFile apply.
Using SSH CA Keys with these files is also possible. Add
"cert-authority" as key option between the principal and the key to mark
it as a CA and all keys signed by it as valid for this CA.
See "CERTIFICATES" in ssh-keygen(1).
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If user.signingkey is not set and a ssh signature is requested we call
gpg.ssh.defaultKeyCommand (typically "ssh-add -L") and use the first key we get
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implements the actual sign_buffer_ssh operation and move some shared
cleanup code into a strbuf function
Set gpg.format = ssh and user.signingkey to either a ssh public key
string (like from an authorized_keys file), or a ssh key file.
If the key file or the config value itself contains only a public key
then the private key needs to be available via ssh-agent.
gpg.ssh.program can be set to an alternative location of ssh-keygen.
A somewhat recent openssh version (8.2p1+) of ssh-keygen is needed for
this feature. Since only ssh-keygen is needed it can this way be
installed seperately without upgrading your system openssh packages.
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic for auto-correction of misspelt subcommands learned to go
interactive when the help.autocorrect configuration variable is set
to 'prompt'.
* ab/help-autocorrect-prompt:
help.c: help.autocorrect=prompt waits for user action
Doc update plus improved error reporting.
* jk/log-warn-on-bogus-encoding:
docs: use "character encoding" to refer to commit-object encoding
logmsg_reencode(): warn when iconv() fails
"git upload-pack" which runs on the other side of "git fetch"
forgot to take the ref namespaces into account when handling
want-ref requests.
* ka/want-ref-in-namespace:
docs: clarify the interaction of transfer.hideRefs and namespaces
upload-pack.c: treat want-ref relative to namespace
t5730: introduce fetch command helper
In preparation for `git-rebase--preserve-merges.sh` entering its after
life, we remove this (deprecated) option that would still rely on it.
To help users transition who still did not receive the memo about the
deprecation, we offer a helpful error message instead of throwing our
hands in the air and saying that we don't know that option, never heard
of it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expand the section about namespaces in the documentation of
`transfer.hideRefs` to point out the subtle differences between
`upload-pack` and `receive-pack`.
ffcfb68176 (upload-pack.c: treat want-ref relative to namespace,
2021-07-30) taught `upload-pack` to reject `want-ref`s for hidden refs,
which is now mentioned. It is clarified that at no point the name of a
hidden ref is revealed, but the object id it points to may.
Signed-off-by: Kim Altintop <kim@eagain.st>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Silently skipping commits when rebasing with --no-reapply-cherry-picks
(currently the default behavior) can cause user confusion. Issue
warnings when this happens, as well as advice on how to preserve the
skipped commits.
These warnings and advice are displayed only when using the (default)
"merge" rebase backend.
Update the git-rebase docs to mention the warnings and advice.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "encoding" can mean a lot of things (e.g., base64 or
quoted-printable encoding in emails, HTML entities, URL encoding, and so
on). The documentation for i18n.commitEncoding and i18n.logOutputEncoding
uses the phrase "character encoding" to make this more clear.
Let's use that phrase in other places to make it clear what kind of
encoding we are talking about. This patch covers the gui.encoding
option, as well as the --encoding option for git-log, etc (in this
latter case, I word-smithed the sentence a little at the same time).
That, coupled with the mention of iconv in the --encoding description,
should make this more clear.
The other spot I looked at is the working-tree-encoding section of
gitattributes(5). But it gives specific examples of encodings that I
think make the meaning pretty clear already.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation on "git diff -l<n>" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.
* en/rename-limits-doc:
rename: bump limit defaults yet again
diffcore-rename: treat a rename_limit of 0 as unlimited
doc: clarify documentation for rename/copy limits
diff: correct warning message when renameLimit exceeded
"git send-email" optimization.
* ab/send-email-optim:
perl: nano-optimize by replacing Cwd::cwd() with Cwd::getcwd()
send-email: move trivial config handling to Perl
perl: lazily load some common Git.pm setup code
send-email: lazily load modules for a big speedup
send-email: get rid of indirect object syntax
send-email: use function syntax instead of barewords
send-email: lazily shell out to "git var"
send-email: lazily load config for a big speedup
send-email: copy "config_regxp" into git-send-email.perl
send-email: refactor sendemail.smtpencryption config parsing
send-email: remove non-working support for "sendemail.smtpssl"
send-email tests: test for boolean variables without a value
send-email tests: support GIT_TEST_PERL_FATAL_WARNINGS=true
The doc for 'submodule.recurse' starts with "Specifies if commands
recurse into submodles by default". This is not exactly true of all
commands that have a '--recurse-submodules' option. For example, 'git
pull --recurse-submodules' does not run 'git pull' in each submodule,
but rather runs 'git submodule update --recursive' so that the submodule
working trees after the pull matches the commits recorded in the
superproject.
Clarify that by just saying that it enables '--recurse-submodules'.
Note that the way this setting interacts with 'fetch.recurseSubmodules'
and 'push.recurseSubmodules', which can have other values than true or
false, is already documented since 4da9e99e6e (doc: be more precise on
(fetch|push).recurseSubmodules, 2020-04-06).
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code recently added to support common ancestry negotiation during
"git push" did not sanity check its arguments carefully enough.
* ab/fetch-negotiate-segv-fix:
fetch: fix segfault in --negotiate-only without --negotiation-tip=*
fetch: document the --negotiate-only option
send-pack.c: move "no refs in common" abort earlier
These were last bumped in commit 92c57e5c1d (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).
Since that time:
* Linus has continued recommending kernel folks to set
diff.renameLimit=0 (maps to 32767, currently)
* Folks with repositories with lots of renames were happy to set
merge.renameLimit above 32767, once the code supported that, to
get correct cherry-picks
* Processors have gotten faster
* It has been discovered that the timing methodology used last time
probably used too large example files.
The last point is probably worth explaining a bit more:
* The "average" file size used appears to have been average blob size
in the linux kernel history at the time (probably v2.6.25 or
something close to it).
* Since bigger files are modified more frequently, such a computation
weights towards larger files.
* Larger files may be more likely to be modified over time, but are
not more likely to be renamed -- the mean and median blob size
within a tree are a bit higher than the mean and median of blob
sizes in the history leading up to that version for the linux
kernel.
* The mean blob size in v2.6.25 was half the average blob size in
history leading to that point
* The median blob size in v2.6.25 was about 40% of the mean blob size
in v2.6.25.
* Since the mean blob size is more than double the median blob size,
any file as big as the mean will not be compared to any files of
median size or less (because they'd be more than 50% dissimilar).
* Since it is the number of files compared that provides the O(n^2)
behavior, median-sized files should matter more than mean-sized
ones.
The combined effect of the above is that the file size used in past
calculations was likely about 5x too large. Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.
Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly. In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.) For now, let's just bump the approximate
time limit from 10s to 1m.
(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)
Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):
N Timing
1300 1.995s
7100 59.973s
So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.
Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):
#!/bin/bash
n=$1; shift
rm -rf repo
mkdir repo && cd repo
git init -q -b main
mkdata() {
mkdir $1
for i in `seq 1 $2`; do
(sed "s/^/$i /" <../sample
echo tag: $1
) >$1/$i
done
}
mkdata initial $n
git add .
git commit -q -m initial
mkdata new $n
git add .
cd new
for i in *; do git mv $i $i.renamed; done
cd ..
git rm -q -rf initial
git commit -q -m new
time git diff-tree -M -l0 --summary HEAD^ HEAD
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection. The following two commits each
introduced an exception to this:
9027f53cb5 (Do linear-time/space rename logic for exact renames,
2007-10-25)
bd24aa2f97 (diffcore-rename: guide inexact rename detection based
on basenames, 2021-02-14)
(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection. So, for copy detection, the documentation was
closer to correct.)
Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm. While at it, also note the
default values for all these settings.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was no documentation for the --negotiate-only option added in
9c1e657a8f (fetch: teach independent negotiation (no packfile),
2021-05-04), only documentation for the related push.negotiation
option added in the following commit in 477673d6f3 (send-pack:
support push negotiation, 2021-05-04).
Let's document it, and update the cross-linking I'd added between
--negotiation-tip=* and 'fetch.negotiationAlgorithm' in
526608284a (fetch doc: cross-link two new negotiation options,
2018-08-01).
I think it would be better to say "in common with the remote" here
than "...the server", but the documentation for --negotiation-tip=*
above this talks about "the server", so let's continue doing that in
this related option. See 3390e42adb (fetch-pack: support negotiation
tip whitelist, 2018-07-02) for that documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As can be seen in files "Documentation/blame-options.txt" and
"builtin/blame.c", the name of this configuration option is
"blame.markUnblamableLines".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option is almost never a good idea, as the resulting repository is
larger and slower (see the new explanations in the docs).
I outlined the potential problems. We could go further and make the
option harder to find (or at least, make the command-line option
descriptions a much more terse "you probably don't want this; see
pack.packsizeLimit for details"). But this seems like a minimal change
that may prevent people from thinking it's more useful than it is.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a01f7f2ba0 (merge: enable defaulttoupstream by default,
2014-04-20) forgot to mention the new default in the configuration
documentation.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the code has been simplified and it's clear what it's
actually doing, update the documentation to reflect that.
Namely; the simple mode only barfs when working on a centralized
workflow, and there's no configured upstream branch with the same name.
Cc: Elijah Newren <newren@gmail.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the already dead code to support "sendemail.smtpssl" by finally
removing the dead code supporting the configuration option.
In f6bebd121a (git-send-email: add support for TLS via
Net::SMTP::SSL, 2008-06-25) the --smtp-ssl command-line option was
documented as deprecated, later in 65180c6618 (List send-email config
options in config.txt., 2009-07-22) the "sendemail.smtpssl"
configuration option was also documented as such.
Then in in 3ff15040e2 (send-email: fix regression in
sendemail.identity parsing, 2019-05-17) I unintentionally removed
support for it by introducing a bug in read_config().
As can be seen from the diff context we've already returned unless
$enc i defined, so it's not possible for us to reach the "elsif"
branch here. This code was therefore already dead since Git v2.23.0.
So let's just remove it. We were already 11 years into a stated
deprecation period of this variable when 3ff15040e2 landed, now it's
around 13. Since it hasn't worked anyway for around 2 years it looks
like we can safely remove it.
The --smtp-ssl option is still deprecated, if someone cares they can
follow-up and remove that too, but unlike the config option that one
could still be in use in the wild. I'm just removing this code that's
provably unused already.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Another brown paper bag inconsistency fix for a new feature
introduced during this cycle.
* dl/stash-show-untracked-fixup:
stash show: use stash.showIncludeUntracked even when diff options given
If options pertaining to how the diff is displayed is provided to
`git stash show`, the command will ignore the stash.showIncludeUntracked
configuration variable, defaulting to not showing any untracked files.
This is unintuitive behaviour since the format of the diff output and
whether or not to display untracked files are orthogonal.
Use stash.showIncludeUntracked even when diff options are given. Of
course, this is still overridable via the command-line options.
Update the documentation to explicitly say which configuration variables
will be overridden when a diff options are given.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation for color.pager is technically correct, but
slightly misleading and doesn't really clarify the purpose of the
variable. As explained in the original thread which added it:
https://lore.kernel.org/git/E1G6zPH-00062L-Je@moooo.ath.cx/
the point is to deal with pagers that don't understand colors. And hence it
being set to "true" is necessary for colorizing output to the pager, but
not sufficient by itself (you must also have enabled one of the other
color options, though note that these are set to "auto" by default these
days).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push" learns to discover common ancestor with the receiving
end over protocol v2.
* jt/push-negotiation:
send-pack: support push negotiation
fetch: teach independent negotiation (no packfile)
fetch-pack: refactor command and capability write
fetch-pack: refactor add_haves()
fetch-pack: refactor process_acks()
"git rev-list" learns the "--filter=object:type=<type>" option,
which can be used to exclude objects of the given kind from the
packfile generated by pack-objects.
* ps/rev-list-object-type-filter:
rev-list: allow filtering of provided items
pack-bitmap: implement combined filter
pack-bitmap: implement object type filter
list-objects: implement object type filter
list-objects: support filtering by tag and commit
list-objects: move tag processing into its own function
revision: mark commit parents as NOT_USER_GIVEN
uploadpack.txt: document implication of `uploadpackfilter.allow`
"git add" and "git rm" learned not to touch those paths that are
outside of sparse checkout.
* mt/add-rm-in-sparse-checkout:
rm: honor sparse checkout patterns
add: warn when asked to update SKIP_WORKTREE entries
refresh_index(): add flag to ignore SKIP_WORKTREE entries
pathspec: allow to ignore SKIP_WORKTREE entries on index matching
add: make --chmod and --renormalize honor sparse checkouts
t3705: add tests for `git add` in sparse checkouts
add: include magic part of pathspec on --refresh error
The checkout machinery has been taught to perform the actual
write-out of the files in parallel when able.
* mt/parallel-checkout-part-2:
parallel-checkout: add design documentation
parallel-checkout: support progress displaying
parallel-checkout: add configuration options
parallel-checkout: make it truly parallel
unpack-trees: add basic support for parallel checkout
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
While it already is possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to filter out only a specific
type of objects. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it is unfit to ask only for these
smallish blobs, given that git-rev-list(1) will continue to print tags,
commits and trees.
Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. Above query can thus
trivially be implemented with the following command:
$ git rev-list --objects --filter=object:type=blob \
--filter=blob:limit=200
Furthermore, this filter allows to optimize for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.
The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.
To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.
Local SSD:
Sequential 10 workers Speedup
Clone 8.805 s ± 0.043 s 3.564 s ± 0.041 s 2.47 ± 0.03
Checkout I 9.678 s ± 0.057 s 4.486 s ± 0.050 s 2.16 ± 0.03
Checkout II 5.034 s ± 0.072 s 3.021 s ± 0.038 s 1.67 ± 0.03
Local HDD:
Sequential 10 workers Speedup
Clone 32.288 s ± 0.580 s 30.724 s ± 0.522 s 1.05 ± 0.03
Checkout I 54.172 s ± 7.119 s 54.429 s ± 6.738 s 1.00 ± 0.18
Checkout II 40.465 s ± 2.402 s 38.682 s ± 1.365 s 1.05 ± 0.07
Linux NFS server (v4.1, on EBS, single availability zone):
Sequential 32 workers Speedup
Clone 240.368 s ± 6.347 s 57.349 s ± 0.870 s 4.19 ± 0.13
Checkout I 242.862 s ± 2.215 s 58.700 s ± 0.904 s 4.14 ± 0.07
Checkout II 65.751 s ± 1.577 s 23.820 s ± 0.407 s 2.76 ± 0.08
EFS (v4.1, replicated over multiple availability zones):
Sequential 32 workers Speedup
Clone 922.321 s ± 2.274 s 210.453 s ± 3.412 s 4.38 ± 0.07
Checkout I 1011.300 s ± 7.346 s 297.828 s ± 0.964 s 3.40 ± 0.03
Checkout II 294.104 s ± 1.836 s 126.017 s ± 1.190 s 2.33 ± 0.03
The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.
To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:
sequential fallback 10 workers speedup
10 files 772.3 ms ± 12.6 ms 769.0 ms ± 13.6 ms 1.00 ± 0.02
20 files 780.5 ms ± 15.8 ms 775.2 ms ± 9.2 ms 1.01 ± 0.02
50 files 806.2 ms ± 13.8 ms 767.4 ms ± 8.5 ms 1.05 ± 0.02
100 files 833.7 ms ± 21.4 ms 750.5 ms ± 16.8 ms 1.11 ± 0.04
200 files 897.6 ms ± 30.9 ms 730.5 ms ± 14.7 ms 1.23 ± 0.05
500 files 1035.4 ms ± 48.0 ms 677.1 ms ± 22.3 ms 1.53 ± 0.09
1000 files 1244.6 ms ± 35.6 ms 654.0 ms ± 38.3 ms 1.90 ± 0.12
2000 files 1488.8 ms ± 53.4 ms 658.8 ms ± 23.8 ms 2.26 ± 0.12
From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.
Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.
About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New log.diffMerges configuration variable sets the format that
--diff-merges=on will be using. The default is "separate".
t4013: add the following tests for log.diffMerges config:
* Test that wrong values are denied.
* Test that the value of log.diffMerges properly affects both
--diff-merges=on and -m.
t9902: fix completion tests for log.d* to match log.diffMerges.
Added documentation for log.diffMerges.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A configuration variable has been added to force tips of certain
refs to be given a reachability bitmap.
* tb/pack-preferred-tips-to-give-bitmap:
builtin/pack-objects.c: respect 'pack.preferBitmapTips'
t/helper/test-bitmap.c: initial commit
pack-bitmap: add 'test_bitmap_commits()' helper
When `uploadpackfilter.allow` is set to `true`, it means that filters
are enabled by default except in the case where a filter is explicitly
disabled via `uploadpackilter.<filter>.allow`. This option will not only
enable the currently supported set of filters, but also any filters
which get added in the future. As such, an admin which wants to have
tight control over which filters are allowed and which aren't probably
shouldn't ever set `uploadpackfilter.allow=true`.
Amend the documentation to make the ramifications more explicit so that
admins are aware of this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git add` refrains from adding or updating index entries that are
outside the current sparse checkout, but `git rm` doesn't follow the
same restriction. This is somewhat counter-intuitive and inconsistent.
So make `rm` honor the sparsity rules and advise on how to remove
SKIP_WORKTREE entries just like `add` does. Also add some tests for the
new behavior.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git add` already refrains from updating SKIP_WORKTREE entries, but it
silently exits with zero code when it is asked to do so. Instead, let's
warn the user and display a hint on how to update these entries.
Note that we only warn the user whey they give a pathspec item that
matches no eligible path for updating, but it does match one or more
SKIP_WORKTREE entries. A warning was chosen over erroring out right away
to reproduce the same behavior `add` already exhibits with ignored
files. This also allow users to continue their workflow without having
to invoke `add` again with only the eligible paths (as those will have
already been added).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone --reject-shallow" option fails the clone as soon as we
notice that we are cloning from a shallow repository.
* ll/clone-reject-shallow:
builtin/clone.c: add --reject-shallow option
In some scenarios, users may want more history than the repository
offered for cloning, which happens to be a shallow repository, can
give them. But because users don't know it is a shallow repository
until they download it to local, we may want to refuse to clone
this kind of repository, without creating any unnecessary files.
The '--depth=x' option cannot be used as a solution; the source may
be deep enough to give us 'x' commits when cloned, but the user may
later need to deepen the history to arbitrary depth.
Teach '--reject-shallow' option to "git clone" to abort as soon as
we find out that we are cloning from a shallow repository.
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new pack with a bitmap, it is sometimes convenient to
indicate some reference prefixes which should receive priority when
selecting which commits to receive bitmaps.
A truly motivated caller could accomplish this by setting
'pack.islandCore', (since all commits in the core island are similarly
marked as preferred) but this requires callers to opt into using delta
islands, which they may or may not want to do.
Introduce a new multi-valued configuration, 'pack.preferBitmapTips' to
allow callers to specify a list of reference prefixes. All references
which have a prefix contained in 'pack.preferBitmapTips' will mark their
tips as "preferred" in the same way as commits are marked as preferred
for selection by 'pack.islandCore'.
The choice of the verb "prefer" is intentional: marking the NEEDS_BITMAP
flag on an object does *not* guarantee that that object will receive a
bitmap. It merely guarantees that that commit will receive a bitmap over
any *other* commit in the same window by bitmap_writer_select_commits().
The test this patch adds reflects this quirk, too. It only tests that
a commit (which didn't receive bitmaps by default) is selected for
bitmaps after changing the value of 'pack.preferBitmapTips' to include
it. Other commits may lose their bitmaps as a byproduct of how the
selection process works (bitmap_writer_select_commits() ignores the
remainder of a window after seeing a commit with the NEEDS_BITMAP flag).
This configuration will aide in selecting important references for
multi-pack bitmaps, since they do not respect the same pack.islandCore
configuration. (They could, but doing so may be confusing, since it is
packs--not bitmaps--which are influenced by the delta-islands
configuration).
In a fork network repository (one which lists all forks of a given
repository as remotes), for example, it is useful to set
pack.preferBitmapTips to 'refs/remotes/<root>/heads' and
'refs/remotes/<root>/tags', where '<root>' is an opaque identifier
referring to the repository which is at the base of the fork chain.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When enabled, this config option signals that index writes should
attempt to use sparse-directory entries.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the rebase.useBuiltin setting and the now-obsolete
GIT_TEST_REBASE_USE_BUILTIN test flag.
This was left in place after my d03ebd411c (rebase: remove the
rebase.useBuiltin setting, 2019-03-18) to help anyone who'd used the
experimental flag and wanted to know that it was the default, or that
they should transition their test environment to use the builtin
rebase unconditionally.
It's been more than long enough for those users to get a headsup about
this. So remove all the scaffolding that was left inplace after
d03ebd411c. I'm also removing the documentation entry, if anyone
still has this left in their configuration they can do some source
archaeology to figure out what it used to do, which makes more sense
than exposing every git user reading the documentation to this legacy
configuration switch.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash show" learned to optionally show untracked part of the
stash.
* dl/stash-show-untracked:
stash show: learn stash.showIncludeUntracked
stash show: teach --include-untracked and --only-untracked
A new configuration variable has been introduced to allow choosing
which version of the generation number gets used in the
commit-graph file.
* ds/commit-graph-generation-config:
commit-graph: use config to specify generation type
commit-graph: create local repository pointer
Disable the recent mergetool's hideresolved feature by default for
backward compatibility and safety.
* jn/mergetool-hideresolved-is-optional:
doc: describe mergetool configuration in git-mergetool(1)
mergetool: do not enable hideResolved by default
When 98ea309b3f (mergetool: add hideResolved configuration,
2021-02-09) introduced the mergetool.hideResolved setting to reduce
the clutter in viewing non-conflicted sections of files in a
mergetool, it enabled it by default, explaining:
No adverse effects were noted in a small survey of popular mergetools[1]
so this behavior defaults to `true`.
In practice, alas, adverse effects do appear. A few issues:
1. No indication is shown in the UI that the base, local, and remote
versions shown have been modified by additional resolution. This
is inherent in the design: the idea of mergetool.hideResolved is to
convince a mergetool that expects pristine local, base, and remote
files to show partially resolved verisons of those files instead;
there is no additional source of information accessible to the
mergetool to see where the resolution has happened.
(By contrast, a mergetool generating the partial resolution from
conflict markers for itself would be able to hilight the resolved
sections with a different color.)
A user accustomed to seeing the files without partial resolution
gets no indication that this behavior has changed when they upgrade
Git.
2. If the computed merge did not line up the files correctly (for
example due to repeated sections in the file), the partially
resolved files can be misleading and do not have enough information
to reconstruct what happened and compute the correct merge result.
3. Resolving a conflict can involve information beyond the textual
conflict. For example, if the local and remote versions added
overlapping functionality in different ways, seeing the full
unresolved versions of each alongside the base gives information
about each side's intent that makes it possible to come up with a
resolution that combines those two intents. By contrast, when
starting with partially resolved versions of those files, one can
produce a subtly wrong resolution that includes redundant extra
code added by one side that is not needed in the approach taken
on the other.
All that said, a user wanting to focus on textual conflicts with
reduced clutter can still benefit from mergetool.hideResolved=true as
a way to deemphasize sections of the code that resolve cleanly without
requiring any changes to the invoked mergetool. The caveats described
above are reduced when the user has explicitly turned this on, because
then the user is aware of them.
Flip the default to 'false'.
Reported-by: Dana Dahlstrom <dahlstrom@google.com>
Helped-by: Seth House <seth@eseth.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit teaches `git stash show --include-untracked`. It
may be desirable for a user to be able to always enable the
--include-untracked behavior. Teach the stash.showIncludeUntracked
config option which allows users to do this in a similar manner to
stash.showPatch.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rebase --[no-]fork-point" gained a configuration variable
rebase.forkPoint so that users do not have to keep specifying a
non-default setting.
* ah/rebase-no-fork-point-config:
rebase: add a config option for --no-fork-point
We have two established generation number versions:
1: topological levels
2: corrected commit dates
The corrected commit dates are enabled by default, but they also write
extra data in the GDAT and GDOV chunks. Services that host Git data
might want to have more control over when this feature rolls out than
just updating the Git binaries.
Add a new "commitGraph.generationVersion" config option that specifies
the intended generation number version. If this value is less than 2,
then the GDAT chunk is never written _or read_ from an existing file.
This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT
environment variable in the test suite. Remove it.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users (myself included) would prefer to have this feature off by
default because it can silently drop commits.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mergetool" feeds three versions (base, local and remote) of
a conflicted path unmodified. The command learned to optionally
prepare these files with unconflicted parts already resolved.
* sh/mergetool-hideresolved:
mergetool: add per-tool support and overrides for the hideResolved flag
mergetool: break setup_tool out into separate initialization function
mergetool: add hideResolved configuration