The structure has nothing to do with what "git bisect" does; as
nobody other than the "git rev-list" implementation uses it, move it
to builtin/rev-list.c as a private data type.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git uses `rebase.autostash` or `merge.autostash` to determine whether a
dirty worktree is allowed during pull. However, this behavior is not
clearly documented, making it difficult for users to discover how to
enable autostash, or causing them to unknowingly enable it. Add new
config option `pull.autostash` along with its documentation and test
cases.
`pull.autostash` provides the same functionality as `rebase.autostash`
and `merge.autostash`, but overrides them when set. If `pull.autostash`
is not set, it falls back to `rebase.autostash` or `merge.autostash`,
depending on the value of `pull.rebase`.
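To illustrate the precedence, here is a minimal sketch (not the actual
implementation; the helper name and the -1-means-unset convention are
assumptions) of how the effective setting could be resolved:

    static int effective_autostash(int pull_autostash, int rebase_autostash,
                                   int merge_autostash, int rebasing)
    {
            /* pull.autostash overrides the other two when set */
            if (pull_autostash >= 0)
                    return pull_autostash;
            /* otherwise fall back based on the value of pull.rebase */
            if (rebasing)
                    return rebase_autostash >= 0 ? rebase_autostash : 0;
            return merge_autostash >= 0 ? merge_autostash : 0;
    }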
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We added the --early-output feature long ago in cdcefbc971 (Add
"--early-output" log flag for interactive GUI use, 2007-11-03). The idea
was that GUIs could use it to progressively render a history view,
showing something quick-and-inaccurate at first and then enhancing it
later.
But we never documented it, and it appears never to have been used, even
by the projects which initially expressed interest. There was an RFC
patch for gitk to use it:
http://public-inbox.org/git/18221.2285.259487.655684@cargo.ozlabs.ibm.com/
but it was never merged. Likewise QGit had a patch in:
https://lore.kernel.org/git/e5bfff550711040225ne67c907r2023b1354c35f35@mail.gmail.com/
but it was never fully merged (to this day, QGit has a commented-out line to
add "--early-output" to the "log" invocation). Searching for other
mentions on the web or forges like github.com turns up nothing.
Meanwhile, the feature has been broken off and on over the years without
anybody noticing (and naturally, there are no tests, either). From 2011
to 2017 the option didn't even turn on via "--early-output"; this was
fixed in e35b6ac56f (revision.h: turn rev_info.early_output back into an
unsigned int, 2017-06-10).
It worked for a while then, but it does not interact well at all with
commit-graphs (which are turned on by default these days). The main
logic to count early commits is triggered by limit_list(), which we
traditionally invoked when showing output in topo-order (and
--early-output always enables --topo-order). But that changed in
f0d9cc4196 (revision.c: begin refactoring --topo-order logic,
2018-11-01). Now when we have generation numbers, we skip limit_list()
entirely, and the early-output code shows no commits, just the final
header "Final output: 1 done", which is syntactically OK but
semantically wrong: that message should give the total number of
commits we're about to show.
So let's drop the feature. It is extra code that is untested and
undocumented, and makes working on the revision machinery more brittle.
Given the history above, it seems unlikely that anybody is using it (or
has used it), and we can drop it without the usual deprecation period.
A gentler option might be to "soft" drop it: keep accepting the option,
have it imply --topo-order as it does now, print "Final output: 1 done",
and then do our regular traversal. That would keep any hypothetical
caller working. But it doesn't seem worth the hassle to me.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote" now detects remote names that overlap with each other
(e.g., remote nickname "outer" and "outer/inner" are used at the
same time), as it will lead to overlapping remote-tracking
branches.
* jk/remote-avoid-overlapping-names:
remote: detect collisions in remote names
"pack-objects" has been taught to avoid pointing into objects in
cruft packs from midx.
* tb/midx-avoid-cruft-packs:
repack: exclude cruft pack(s) from the MIDX where possible
pack-objects: introduce '--stdin-packs=follow'
pack-objects: swap 'show_{object,commit}_pack_hint'
pack-objects: fix typo in 'show_object_pack_hint()'
pack-objects: perform name-hash traversal for unpacked objects
pack-objects: declare 'rev_info' for '--stdin-packs' earlier
pack-objects: factor out handling '--stdin-packs'
pack-objects: limit scope in 'add_object_entry_from_pack()'
pack-objects: use standard option incompatibility functions
Prepare to flip the default hash function to SHA-256.
* bc/use-sha256-by-default-in-3.0:
Enable SHA-256 by default in breaking changes mode
help: add a build option for default hash
t5300: choose the built-in hash outside of a repo
t4042: choose the built-in hash outside of a repo
t1007: choose the built-in hash outside of a repo
t: default to compile-time default hash if not set
setup: use the default algorithm to initialize repo format
Use legacy hash for legacy formats
builtin: use default hash when outside a repository
hash: add a constant for the legacy hash algorithm
hash: add a constant for the default hash algorithm
The function `force_object_loose()` forces an object to become a loose
object in case it only exists in its packed form. To do so it implicitly
relies on `the_repository`.
Refactor the function by passing a `struct odb_source` as parameter.
While the check whether any such loose object exists already acts on the
whole object database, writing the loose object happens in one specific
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `read_loose_object()` takes a path to an object file and
tries to parse it. As such, the function does not depend on any specific
object database but instead acts as an ODB-independent way to read a
specific file. Consequently, all it needs as input is a repository so
that we can derive repo settings and the hash algorithm.
That repository isn't passed in as a parameter though, as we implicitly
depend on the global `the_repository`. Refactor the function so that we
pass in the repository as a parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The iterators for loose objects still rely on `the_repository`. Refactor
them:
- `for_each_loose_file_in_objdir()` is refactored so that the caller
is now expected to pass an `odb_source` as parameter instead of the
path to that source. Furthermore, it is renamed accordingly to
`for_each_loose_file_in_source()`.
- `for_each_loose_object()` is refactored to take in an object
database now and calls the above function in a loop.
This allows us to get rid of the global dependency.
Adjust callers accordingly.
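A rough sketch of the resulting shape (field and type names here are
assumptions for illustration, not the exact code):

    int for_each_loose_object(struct object_database *odb,
                              each_loose_object_fn cb, void *data, int flags)
    {
            struct odb_source *source;

            /* walk every source, delegating to the per-source iterator */
            for (source = odb->sources; source; source = source->next) {
                    int ret = for_each_loose_file_in_source(source, cb,
                                                            data, flags);
                    if (ret)
                            return ret;
            }
            return 0;
    }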
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic that writes loose objects still relies on `the_repository` to
decide where exactly the object shall be written to. Refactor it so that
the logic instead operates on a `struct odb_source` so that we can get
rid of this global dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have a backend-agnostic way to write objects into an object
database. While there is `write_object_file()`, this function is rather
specific to the loose object format.
Introduce `odb_write_object()` to plug this gap. For now, this function
is a simple wrapper around `write_object_file()` and doesn't even use
the passed-in object database yet. This will change in subsequent
commits, where `write_object_file()` is converted so that it works on
top of an `odb_source`. `odb_write_object()` will then become
responsible for deciding which source an object shall be written to.
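At this stage the wrapper might look like the following sketch (the
exact signature is an assumption):

    int odb_write_object(struct object_database *odb,
                         const void *buf, unsigned long len,
                         enum object_type type, struct object_id *oid)
    {
            /*
             * The passed-in database is not used yet; subsequent commits
             * will make this function pick the odb_source to write to.
             */
            return write_object_file(buf, len, type, oid);
    }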
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We implicitly depend on `the_repository` when moving an object file into
place in `finalize_object_file()`. Get rid of this global dependency by
passing in a repository.
Note that one might be pressed to inject an object database instead of a
repository. But the function doesn't really care about the ODB at all.
All it does is to move a file into place while checking whether there is
any collision. As such, the functionality it provides is independent of
the object database and only needs the repository as parameter so that
it can adjust permissions of the file we are about to finalize.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We implicitly depend on `the_repository` in `has_loose_object()`.
Refactor the function to accept an `odb_source` as input that should be
checked for such a loose object.
This refactoring changes semantics of the function to not check the
whole object database for such a loose object anymore, but instead we
now only check that single source. Existing callers thus need to loop
through all sources manually now.
While this change may seem illogical at first, whether or not an object
exists in a specific format should be answered by the source using that
format. As such, we can eventually convert this into a generic function
`odb_source_has_object()` that simply checks whether a given object
exists in an object source. And as we know which format any given
source uses, we can derive whether the object exists in a given format.
This change also makes `has_loose_object_nonlocal()` obsolete. The only
caller of this function is adapted so that it skips the primary object
source.
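For illustration, a caller that previously checked the whole object
database might now loop along these lines (names are assumptions):

    static int has_loose_object_anywhere(struct object_database *odb,
                                         const struct object_id *oid)
    {
            struct odb_source *source;

            /* check each source in turn instead of the whole database */
            for (source = odb->sources; source; source = source->next)
                    if (has_loose_object(source, oid))
                            return 1;
            return 0;
    }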
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Define .precision for more canned parse-options types to avoid bugs
coming from using a variable of the wrong type to capture the
parsed values.
* rs/parse-options-precision:
parse-options: add precision handling for OPTION_COUNTUP
parse-options: add precision handling for OPTION_BITOP
parse-options: add precision handling for OPTION_NEGBIT
parse-options: add precision handling for OPTION_BIT
parse-options: add precision handling for OPTION_SET_INT
parse-options: add precision handling for PARSE_OPT_CMDMODE
parse-options: require PARSE_OPT_NOARG for OPTION_BITOP
"git fetch --prune" used to be O(n^2) expensive when there are many
refs, which has been corrected.
* ph/fetch-prune-optim:
clean up interface for refs_warn_dangling_symrefs
refs: remove old refs_warn_dangling_symref
fetch-prune: optimize dangling-ref reporting
When core.commentChar is set to "auto", Git selects a comment character
by scanning the commit message contents and avoiding any character
already present in the message.
If the message still contains old conflict comments (starting with a
comment character), Git assumes that character is in use and chooses a
different one. As a result, those existing comment lines are no longer
recognized as comments and end up being included in the final commit
message.
To avoid this, skip scanning the trailing comment block when selecting
the comment character. This allows Git to safely reuse the original
character when appropriate, keeping the commit message clean and free of
leftover conflict information.
Background:
The "auto" value for core.commentchar was introduced in the commit
84c9dc2c5a (commit: allow core.commentChar=auto for character auto
selection, 2014-05-17) but did not exhibit this issue at that time.
The bug was introduced in commit a6c2654f83 (rebase -m: fix --signoff
with conflicts, 2024-04-18) where Git started writing conflict comments
to the file at 'rebase_path_message()'.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is,
however, in the process of being removed in favor of storing the MIDX
as part of the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
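A sketch of the new calling convention (the alternate-preparation
helper and field names are assumptions for illustration):

    struct odb_source *source;

    odb_prepare_alternates(repo->objects); /* assumed helper */
    for (source = repo->objects->sources; source; source = source->next) {
            struct multi_pack_index *midx = get_multi_pack_index(source);

            if (!midx)
                    continue;
            /* ... operate on this source's multi-pack index ... */
    }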
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--start-after' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
lexicographically next reference and iterates from there onward.
This enables efficient pagination workflows, where the calling script
can remember the last provided reference and use that as the starting
point for the next set of references:
git for-each-ref --count=100
git for-each-ref --count=100 --start-after=refs/heads/branch-100
git for-each-ref --count=100 --start-after=refs/heads/branch-200
Since the reference iterators only allow seeking to a specified marker
via `ref_iterator_seek()`, we introduce a helper function
`start_ref_iterator_after()`, which seeks to the next reference by
simply appending (char) 1 to the marker.
Note that pagination always continues from the provided marker; as
such, any concurrent reference updates lexicographically behind the
marker will not be output. Document this.
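A minimal sketch of that helper (the signatures are assumptions):

    static int start_ref_iterator_after(struct ref_iterator *iter,
                                        const char *marker)
    {
            struct strbuf seek = STRBUF_INIT;
            int ret;

            /* smallest string that sorts strictly after the marker */
            strbuf_addstr(&seek, marker);
            strbuf_addch(&seek, 1);
            ret = ref_iterator_seek(iter, seek.buf);
            strbuf_release(&seek);
            return ret;
    }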
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code paths in "git prune" used to ignore the passed-in
repository object and used the_repository singleton instance
instead, which has been corrected.
* ac/prune-wo-the-repository:
builtin/prune: stop depending on 'the_repository'
repository: move 'repository_format_precious_objects' to repo scope
Use of sysctl() system call to learn the total RAM size used on
BSDs has been corrected.
* cb/total-ram-bsd-fix:
builtin/gc: correct total_ram calculation with HAVE_BSD_SYSCTL
A recent commit, d9cb0e6ff8 (fast-export, fast-import: add support for
signed-commits, 2025-03-10), added support for signed commits to
fast-export and fast-import.
When a signed commit is processed, fast-export can output either
"gpgsig sha1" or "gpgsig sha256" depending on whether the signed
commit uses the SHA-1 or SHA-256 Git object format.
However, this implementation has a number of limitations:
- the output format was not properly described in the documentation,
- the output format is not very informative as it doesn't even say
if the signature is an OpenPGP, an SSH, or an X509 signature,
- the implementation doesn't support having both one signature on
the SHA-1 object and one on the SHA-256 object.
Let's improve on these limitations by improving fast-export and
fast-import so that:
- all the signatures are exported,
- at most one signature on the SHA-1 object and one on the SHA-256
are imported,
- if there is more than one signature on the SHA-1 object or on
the SHA-256 object, fast-import emits a warning for each
additional signature,
- the output format is "gpgsig <git-hash-algo> <signature-format>",
where <git-hash-algo> is the Git object format as before, and
<signature-format> is the signature type ("openpgp", "x509",
"ssh" or "unknown"),
- the output is properly documented.
About the output format:
- <git-hash-algo> makes it possible to know which representation of the commit
was signed (the SHA-1 or the SHA-256 version) which helps with
both signature verification and interoperability between repos
with different hash functions,
- <signature-format> helps tools that process the fast-export
stream, so they don't have to parse the ASCII armor to identify
the signature type.
It could be even better to be able to import more than one signature
on the SHA-1 object and on the SHA-256 object, but other parts of
Git don't handle that well for now, so this is left for future
improvements.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_NEGBIT. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_BIT. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_SET_INT. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Factor out the casting code from the part of do_get_value() that handles
OPTION_INTEGER to avoid code duplication. We're going to use it in the
next patches as well.
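As an illustration, the factored-out helper could take roughly this
shape (names are assumptions, not the actual code):

    static void store_value(const struct option *opt, intmax_t value)
    {
    #define STORE(type) do { \
                    if (value != (type)value) \
                            die(_("value %"PRIdMAX" for option '%s' does not fit"), \
                                value, opt->long_name); \
                    *(type *)opt->value = (type)value; \
            } while (0)

            switch (opt->precision) {
            case 1: STORE(int8_t); break;
            case 2: STORE(int16_t); break;
            case 4: STORE(int32_t); break;
            case 8: STORE(int64_t); break;
            default:
                    BUG("invalid precision for option '%s'", opt->long_name);
            }
    #undef STORE
    }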
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build on 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) to support value variables of different
sizes for PARSE_OPT_CMDMODE options. Do that by requiring their
"precision" to be set and casting their "value" pointer accordingly.
Call the function that does the raw casting do_get_int_value() to
reserve the name get_int_value() for a more friendly wrapper we're
going to introduce in one of the next patches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When two remotes collide in the destinations of their fetch refspecs,
the results can be confusing. For example, in this silly example:
git config remote.one.url [...]
git config remote.one.fetch +refs/heads/*:refs/remotes/collide/*
git config remote.two.url [...]
git config remote.two.fetch +refs/heads/*:refs/remotes/collide/*
git fetch --all
we may try to write to the same ref twice (once for each remote we're
fetching). There's also a more subtle version of this. If you have
remotes "outer/inner" and "outer", then the ref "inner/branch" on the
second remote will conflict with just "branch" on the former (they both
want to write to "refs/remotes/outer/inner/branch").
We probably don't want to forbid this kind of overlap completely. While
the results can be confusing, there are legitimate reasons to have
multiple refs write into the same namespace (e.g., if one is a "backup"
of the other that is rarely fetched from).
But it may be worth limiting the porcelain "git remote" command to avoid
this confusion. The example above cannot be done with "git remote",
because it always[1] matches the refspecs to the remote name, and you
can only have one instance of each remote name. But you can still
trigger the more subtle variant like this:
git remote add outer [...]
git remote add outer/inner [...]
So let's detect that kind of name collision (in both directions) and
forbid it. You can still do whatever you like by manipulating the config
directly, but this should prevent the most obvious foot-gun.
[1] Almost always. With the --mirror option, the resulting refspec will
just write into "refs/*"; the remote name does not appear in the ref
namespace at all.
Our new "names must not overlap" rule is not necessary for that
case, but it seems reasonable to enforce it consistently. We already
require all remote names to be valid in the ref namespace, even
though we won't ever use them in that context for --mirror remotes.
Likewise, our new rule doesn't help with overlap here. Any two
mirror remotes will always overlap (in fact, any mirror remote along
with any other single one, since refs/remotes/ is a subset of the
mirrored refs). I'm not sure this is worth worrying about, but if it
is, we'd want an additional rule like "mirror remotes must be the
only remote".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push" and "git fetch" are taught to update refs in batches to
gain performance.
* kn/fetch-push-bulk-ref-update:
receive-pack: handle reference deletions separately
refs/files: skip updates with errors in batched updates
receive-pack: use batched reference updates
send-pack: fix memory leak around duplicate refs
fetch: use batched reference updates
refs: add function to translate errors to strings
Updating submodules from the upstream did not work well when the
submodule's HEAD is detached, which has been improved.
* jk/submodule-remote-lookup-cleanup:
submodule: look up remotes by URL first
submodule: move get_default_remote_submodule()
submodule--helper: improve logic for fallback remote name
remote: remove the_repository from some functions
dir: move starts_with_dot(_dot)_slash to dir.h
remote: fix tear down of struct remote
remote: remove branch->merge_name and fix branch_release()
The calls to sysctl() assume the variable holding the value is 64 bits
wide, but the actual size depends on the key name and the platform,
at least for HW_PHYSMEM.
Detect any mismatched reads, and retry with a shorter variable
when needed.
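A sketch of the detection-and-retry logic (assuming the usual BSD
sysctl(3) interface; the helper name is illustrative):

    static uint64_t get_physical_memory(int mib0, int mib1)
    {
            int mib[2] = { mib0, mib1 };
            uint64_t mem64 = 0;
            size_t len = sizeof(mem64);

            /* the kernel reports the actual size of the value in 'len' */
            if (!sysctl(mib, 2, &mem64, &len, NULL, 0) &&
                len == sizeof(mem64))
                    return mem64;
            if (len == sizeof(uint32_t)) {
                    uint32_t mem32 = 0;

                    len = sizeof(mem32);
                    if (!sysctl(mib, 2, &mem32, &len, NULL, 0))
                            return mem32;
            }
            return 0;
    }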
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor builtin/prune.c to remove the dependency on the global
'the_repository'. Replace all occurrences of 'the_repository' with
'repo' and thus remove the '#define USE_THE_REPOSITORY_VARIABLE'
definition. Also, add a test to make sure that 'git prune -h' can be
called when the repository is `NULL`.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'extensions.preciousObjects' setting, when set to true, prevents
operations that might drop objects from the object storage. This setting
is populated into the global variable
'repository_format_precious_objects'.
Move this global variable to repo scope by adding it to 'struct
repository' and refactor all the occurrences accordingly.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs_warn_dangling_symrefs interface is a bit fragile as it passes
in printf-formatting strings with expectations about the number of
arguments. This patch series made it worse by adding a 2nd positional
argument. But there are only two call sites, and they both use almost
identical display options.
Make this safer by moving the format strings into the function that uses
them to make it easier to see when the arguments don't match. Pass a
prefix string and a dry_run flag so the decision logic can be handled
where needed.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pruning during `git fetch` we check each pruned ref against the
ref_store one at a time to decide whether to report it as dangling.
This causes every local ref to be scanned for each ref being pruned.
If there are N refs in the repo and M refs being pruned, this code is
O(M*N). However, `git remote prune` uses a very similar function that
is only O(N*log(M)).
Remove the wasteful ref scanning for each pruned ref and use the faster
version already available in refs_warn_dangling_symrefs. Change the
message to include the original refname since the message is no longer
printed immediately after the line that printed the refname.
In a repo with 126,000 refs, where I was pruning 28,000 refs, this
code made about 3.6 billion calls to strcmp and consumed 410 seconds
of CPU. (Invariably in that time, my remote would timeout and the
fetch would fail anyway.)
After this change, the same operation completes in under a second.
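The shape of the speed-up, sketched with plain C library calls: sort
the M pruned refnames once, then each of the N lookups is O(log M)
instead of a linear scan.

    static int cmp_refname(const void *a, const void *b)
    {
            return strcmp(*(const char *const *)a, *(const char *const *)b);
    }

    /* callers sort once: qsort(pruned, nr, sizeof(*pruned), cmp_refname); */
    static int was_pruned(const char *refname, const char **pruned, size_t nr)
    {
            return !!bsearch(&refname, pruned, nr, sizeof(*pruned),
                             cmp_refname);
    }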
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the legacy hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have some commands that can operate inside or outside a repository.
If we're operating outside a repository, we clearly cannot use the
repository's hash algorithm as a default since it doesn't exist, so
instead, let's pick the default instead of specifically SHA-1. Right
now this results in no functional change since the default is SHA-1, but
that may change in the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `read_object_with_reference()` to `odb_read_object_peeled()` to
match other functions related to the object database and our modern
coding guidelines. More importantly, the old name didn't really
describe very well what this function actually does, which is to walk
down any commit and tag objects until an object of the required type has
been found. This is generally referred to as "peeling", so the new name
should be way more descriptive.
No compatibility wrapper is introduced as the function is not used a lot
throughout our codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--recursive" flag for git-grep(1) allows users to grep for a string
across submodule boundaries. To make this work we add each submodule's
object sources to our own object database so that the objects can be
accessed directly.
The infrastructure for this depends on a global string list of submodule
paths. The caller is expected to call `add_submodule_odb_by_path()` for
each source and the object database will then eventually register all
submodule sources via `do_oid_object_info_extended()` in case it isn't
able to look up a specific object.
This reliance on global state is of course suboptimal with regards to
our libification efforts.
Refactor the logic so that the list of submodule sources is instead
tracked in the object database itself. This allows us to lose the
condition of `r == the_repository` before registering submodule sources
as we only ever add submodule sources to `the_repository` anyway. As
such, behaviour before and after this refactoring should always be the
same.
Rename the functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of iterator-style functions that execute a callback
for each instance of a given set, all of which currently depend on
`the_repository`. Refactor them to instead take an object database as
parameter so that we can get rid of this dependency.
Rename the functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions to manage alternates all depend on `the_repository`.
Refactor them to accept an object database as a parameter and adjust all
callers. The functions are renamed accordingly.
Note that right now the situation is still somewhat weird because we end
up using the object store path provided by the object store's repository
anyway. Consequently, we could instead have passed in a pointer to the
repository rather than a pointer to the object store. This will be
addressed in subsequent commits, where we will start to use the path
owned by the object store itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `odb_mkstemp()` by
passing in the object database as a parameter and adjusting all callers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `assert_oid_type()` by
passing in the object database as a parameter and adjusting all callers.
Rename the function to `odb_assert_oid_type()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `find_odb()` by passing
in the object database in which we want to search for the source and
adjusting all callers.
Rename the function to `odb_find_source()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. A couple of alternatives
were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce, for example,
an `odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit plugged one leak in the normal code path, but
there is an early exit that leaves without releasing any resources
acquired in the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
4e51389000 (builtin/config: introduce "get" subcommand, 2024-05-06)
introduced `get` and `--url` but didn’t add `--url` to the synopsis.
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option was introduced in a series of commits from fe3ccc7aab (Merge
branch 'ps/config-subcommands', 2024-05-15). But two styles were used
for the value provided to the option:
1. Synopsis: `--value=<value>`
2. Deprecated Modes: `--value=<pattern>`
(2) is also used in the synopsis of the command.
Use (2) consistently throughout since it’s a pattern in the general
case (`value` sounds more generic).
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge/pull" has been taught the "--compact-summary" option to
use the compact-summary format, instead of diffstat, when showing
the summary of the incoming changes.
* jc/merge-compact-summary:
merge/pull: extend merge.stat configuration variable to cover --compact-summary
merge/pull: add the "--compact-summary" option
An interchange format for stash entries is defined, and subcommands
of "git stash" to import/export them have been added.
* bc/stash-export-import:
builtin/stash: provide a way to import stashes from a ref
builtin/stash: provide a way to export stashes to a ref
builtin/stash: factor out revision parsing into a function
object-name: make get_oid quietly return an error
Commit c800963578 ("fetch-pack, send-pack: clean up shallow oid
array", 2024-09-25) cleaned up the shallow oid array in cmd_send_pack,
but didn't clean up extra_have, which is still leaked at program exit.
I suspect the particular tests in t5539 don't trigger any additions to
the extra_have array, which explains why the tests can pass leak free
despite this gap.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git maintenance" lacked the care "git gc" had to avoid holding
onto the repository lock for too long during packing refs, which
has been remedied.
* ps/maintenance-ref-lock:
builtin/maintenance: fix locking race when handling "gc" task
builtin/gc: avoid global state in `gc_before_repack()`
usage: allow dying without writing an error message
builtin/maintenance: fix locking race with refs and reflogs tasks
builtin/maintenance: split into foreground and background tasks
builtin/maintenance: fix typedef for function pointers
builtin/maintenance: extract function to run tasks
builtin/maintenance: stop modifying global array of tasks
builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
builtin/maintenance: centralize configuration of explicit tasks
builtin/gc: drop redundant local variable
builtin/gc: use designated field initializers for maintenance tasks
"git whatchanged" that is longer to type than "git log --raw"
which is its modern rough equivalent has outlived its usefulness
more than 10 years ago. Plan to deprecate and remove it.
* jc/you-still-use-whatchanged:
whatschanged: list it in BreakingChanges document
whatchanged: remove when built with WITH_BREAKING_CHANGES
whatchanged: require --i-still-use-this
tests: prepare for a world without whatchanged
doc: prepare for a world without whatchanged
you-still-use-that??: help deprecating commands for removal
In 9d2962a7c4 (receive-pack: use batched reference updates, 2025-05-19)
we updated the 'git-receive-pack(1)' command to use batched reference
updates. One edge case which was missed during this implementation was
when a user pushes multiple branches such as:
delete refs/heads/branch/conflict
create refs/heads/branch
Before using batched updates, the references would be applied
sequentially and hence no conflicts would arise. With batched updates,
while the first update applies, the second fails due to a D/F conflict. A
similar issue was present in 'git-fetch(1)' and was fixed by separating
out reference pruning into a separate transaction in the commit 'fetch:
use batched reference updates'. Apply a similar mechanism for
'git-receive-pack(1)' and separate out reference deletions into its own
batch.
This means 'git-receive-pack(1)' will now use up to two transactions,
whereas before using batched updates it would use _at least_ two
transactions. So using batched updates is still the better option.
Add a test to validate this behavior.
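Schematically, the application now happens in two phases (queue_update()
and flush_batch() are hypothetical helpers standing in for the
batched-update machinery):

    static void apply_commands(struct command *commands)
    {
            struct command *cmd;

            /* batch 1: deletions, flushed on their own */
            for (cmd = commands; cmd; cmd = cmd->next)
                    if (is_null_oid(&cmd->new_oid))
                            queue_update(cmd);
            flush_batch();

            /* batch 2: creations and updates, no longer blocked by D/F */
            for (cmd = commands; cmd; cmd = cmd->next)
                    if (!is_null_oid(&cmd->new_oid))
                            queue_update(cmd);
            flush_batch();
    }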
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash" recorded a wrong branch name when submodules are
present in the current checkout, which has been corrected.
* kj/stash-onbranch-submodule-fix:
stash: fix incorrect branch name in stash message
The get_default_remote_submodule() function performs a lookup to find
the appropriate remote to use within a submodule. The function first
checks to see if it can find the remote for the current branch. If this
fails, it then checks to see if there is exactly one remote. It will use
this, before finally falling back to "origin" as the default.
If a user happens to rename their default remote from origin, either
manually or by setting something like clone.defaultRemoteName, this
fallback will not work.
In such cases, the submodule logic will try to use a non-existent
remote. This usually manifests as a failure to trigger the submodule
update.
The parent project already knows and stores the submodule URL in either
.gitmodules or its .git/config.
Add a new repo_remote_from_url() helper which will iterate over all the
remotes in a repository and return the first remote which has a matching
URL.
Refactor get_default_remote_submodule to find the submodule and get its
URL. If a valid URL exists, first try to obtain a remote using the new
repo_remote_from_url(). Fall back to the repo_default_remote()
otherwise.
The fallback logic is kept in case for some reason the user has manually
changed the URL within the submodule. Additionally, we still try to use
a remote rather than directly passing the URL in the
fetch_in_submodule() logic. This ensures that an update will properly
update the remote refs within the submodule as expected, rather than
just fetching into FETCH_HEAD.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future refactor of get_default_remote_submodule() is going to depend
on resolve_relative_url(). That function depends on get_default_remote().
Move get_default_remote_submodule() after resolve_relative_url() first
to make the additional functionality easier to review.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repo_get_default_remote() function in submodule--helper currently
tries to figure out the proper remote name to use for a submodule based
on a few factors.
First, it tries to find the remote for the currently checked out branch.
This works if the submodule is configured to checkout to a branch
instead of a detached HEAD state.
In the detached HEAD state, the code falls back to using "origin", on
the assumption that this is the default remote name. Some users may
change this, such as by setting clone.defaultRemoteName, or by changing
the remote name manually within the submodule repository.
As a first step to improving this situation, refactor to reuse the logic
from remotes_remote_for_branch(). This function uses the remote from the
branch if it has one. If it doesn't then it checks to see if there is
exactly one remote. It uses this remote first before attempting to fall
back to "origin".
To allow using this helper function, introduce a repo_default_remote()
helper to remote.c which takes a repository structure. This helper will
load the remote configuration and get the "HEAD" branch. Then it will
call remotes_remote_for_branch to find the default remote.
Replace calls of repo_get_default_remote() with the calls to this new
function. To maintain consistency with the existing callers, continue
copying the returned string with xstrdup.
This isn't a perfect solution for users who change remote names, but it
should help in cases where the remote name is changed but users haven't
added any additional remotes.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both submodule--helper.c and submodule-config.c have an implementation
of starts_with_dot_slash and starts_with_dot_dot_slash. The dir.h header
has starts_with_dot(_dot)_slash_native, which sets PATH_MATCH_NATIVE.
Move the helpers to dir.h as static inlines. I thought about renaming
them with a _platform suffix but that felt too long and ugly. On the
other hand it might be slightly confusing with _native.
This simplifies a submodule refactor which wants to use the helpers
earlier in the submodule--helper.c file.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The branch structure has both branch->merge_name and branch->merge for
tracking the merge information. The former is allocated by add_merge()
and stores the names read from the configuration file. The latter is
allocated by set_merge() which is called by branch_get() when an
external caller requests a branch.
This leads to the confusing situation where branch->merge_nr tracks both
the size of branch->merge (once it's allocated) and branch->merge_name.
The branch_release() function incorrectly assumes that branch->merge is
always set when branch->merge_nr is non-zero, and can potentially crash
if read_config() is called without branch_get() being called on every
branch.
In addition, branch_release() fails to free some of the memory
associated with the structure including:
* Failure to free the refspec_item containers in branch->merge[i]
* Failure to free the strings in branch->merge_name[i]
* Failure to free the branch->merge_name parent array.
The set_merge() function sets branch->merge_nr to 0 when there is no
valid remote_name, to avoid external callers seeing a non-zero merge_nr
but a NULL merge array. This results in failure to release most of the
merge data as well.
These issues could be fixed directly, and indeed I initially proposed
such a change at [1] in the past. While this works, there was some
confusion during review because of the inconsistencies.
Instead, it's time to clean up the situation properly. Remove
branch->merge_name entirely. Instead, allocate branch->merge earlier
within add_merge() instead of within set_merge(). Instead of having
set_merge() copy from merge_name[i] to merge[i]->src, just have
add_merge() directly initialize merge[i]->src.
Modify the add_merge() to call xstrdup() itself, instead of having
the caller of add_merge() do so. This makes it more obvious which code
owns the memory.
Update all callers which use branch->merge_name[i] to use
branch->merge[i]->src instead.
Add a merge_clear() function which properly releases all of the
merge-related memory, and which sets branch->merge_nr to zero. Use this
both in branch_release() and in set_merge(), fixing the leak when
set_merge() finds no valid remote_name.
Add a set_merge variable to the branch structure, which indicates
whether set_merge() has been called. This replaces the previous use of a
NULL check against the branch->merge array.
With these changes, the merge array is always allocated when merge_nr is
non-zero.
This use of refspec_item to store the names should be safe. External
callers should be using branch_get() to obtain a pointer to the branch,
which will call set_merge(), and the callers internal to remote.c
already handle the partially initialized refspec_item structure safely.
The end result is cleaner, and avoids duplicating the merge names
twice.
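A sketch of the new teardown helper (assuming the usual
refspec_item_clear() and FREE_AND_NULL() helpers):

    static void merge_clear(struct branch *branch)
    {
            int i;

            for (i = 0; i < branch->merge_nr; i++) {
                    refspec_item_clear(branch->merge[i]);
                    free(branch->merge[i]);
            }
            FREE_AND_NULL(branch->merge);
            branch->merge_nr = 0;
    }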
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Link: [1] https://lore.kernel.org/git/20250617-jk-submodule-helper-use-url-v2-1-04cbb003177d@gmail.com/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.
While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.
To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying
[...] it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are
likely to get thrown away.
, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.
This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.
This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.
Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.
If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.
But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.
(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)
Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.
Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).
The full details are in ddee3703b3, but the gist is if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.
Note that the traversal here is best-effort, similar to the existing
traversal which provides name-hash hints. This means that the object
traversal may hand us back a blob that does not actually exist. We
*won't* see missing trees/commits with 'ignore_missing_links' because:
- missing commit parents are discarded at the commit traversal stage by
revision.c::process_parents()
- missing tag objects are discarded by revision.c::handle_commit()
- missing tree objects are discarded by the list-objects code in
list-objects.c::process_tree()
But we have to handle potentially-missing blobs specially by making a
separate check to ensure they exist in the repository. Failing to do so
would mean that we'd add an object to the packing list which doesn't
actually exist, rendering us unable to write out the pack.
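Schematically, the blob case reduces to an existence check before an
object is added (the callback shape and the add helper below are
illustrative; has_object() is the real existence check):

    static void show_object_if_present(struct object *obj, const char *name,
                                       void *data)
    {
            /* the best-effort traversal may hand back missing blobs */
            if (obj->type == OBJ_BLOB &&
                !has_object(the_repository, &obj->oid, 0))
                    return;
            add_object_entry(&obj->oid, obj->type, name, 0);
    }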
This prepares us for new repacking behavior which will "resurrect"
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show_commit_pack_hint() has heretofore been a noop, so it only needs to
appear before its first use within its compilation unit.
But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.
This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.
Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.
Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_objects_in_unpacked_packs()` and add any loose objects as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.
A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (not included nor excluded) pack.
Move the code which initializes and performs the reachability
traversal into the caller, not the callee, which prepares us
to share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.
The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary. Move it down as far as it can go to reduce its
scope.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).
Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).
There is one minor piece of test fallout in t5331, which expected the
old message format; it has been adjusted accordingly.
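For example (output sketched from the usual wording produced by
die_for_incompatible_opt2()):

$ git pack-objects --stdin-packs --filter=blob:none /tmp/pack
fatal: options '--stdin-packs' and '--filter' cannot be used together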
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A memory leak on an error code path has been plugged.
* ly/submodule-update-failure-leakfix:
builtin/submodule--helper: fix leak when remote_submodule_branch() failed
"git diff --no-index dirA dirB" can limit the comparison with
pathspec at the end of the command line, just like normal "git
diff".
* jk/diff-no-index-with-pathspec:
diff --no-index: support limiting by pathspec
pathspec: add flag to indicate operation without repository
pathspec: add match_leading_pathspec variant
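A sketch of the new usage (directory names and pathspec are
illustrative):

$ git diff --no-index dirA dirB '*.h'    # compare only paths matching *.h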
"git cat-file --batch" learns to understand %(objectmode) atom to
allow the caller to tell missing objects (due to repository
corruption) and submodules (whose commit objects are OK to be
missing) apart.
* vd/cat-file-objectmode-update:
cat-file.c: add batch handling for submodules
cat-file: add %(objectmode) atom
t1006: update 'run_tests' to test generic object specifiers
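A sketch of the new atom (the path and output shown are illustrative);
a mode of 160000 identifies a submodule gitlink, whose commit object
may legitimately be absent:

$ echo HEAD:sub | git cat-file --batch-check='%(objectmode) %(objectname)'
160000 0123456789abcdef0123456789abcdef01234567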
"git pack-objects" learns to find delta bases from blobs at the
same path, using the --path-walk API.
* ds/path-walk-2:
pack-objects: allow --shallow and --path-walk
path-walk: add new 'edge_aggressive' option
pack-objects: thread the path-based compression
pack-objects: refactor path-walk delta phase
scalar: enable path-walk during push via config
pack-objects: enable --path-walk via config
repack: add --path-walk option
t5538: add tests to confirm deltas in shallow pushes
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
p5313: add performance tests for --path-walk
pack-objects: update usage to match docs
pack-objects: add --path-walk option
pack-objects: extract should_attempt_deltas()
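A sketch of enabling the feature (the 'pack.usePathWalk' configuration
name is an assumption based on the 'enable --path-walk via config'
patch above):

$ git repack -adf --path-walk          # one-off repack using the path-walk API
$ git -c pack.usePathWalk=true push    # what scalar effectively enables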
The existing `merge.stat` configuration variable is a Boolean that
defaults to `true` to control `git merge --[no-]stat` behaviour.
Extend it to be "Boolean or text", taking false, true, or
"compact", with the last one triggering the --compact-summary option
introduced earlier. Any other value is treated the same as true,
instead of signaling an error, as it is not a grave enough offence to
stop the merge.
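For example (branch name illustrative):

$ git config merge.stat compact
$ git merge topic    # the post-merge summary uses "diff --compact-summary"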
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge" and "git pull" shows "git diff --stat --summary @{1}"
when they finish to indicate the extent of the changes brought into
the history by default. While it gives a good overview, it becomes
annoying when there are very many created or deleted paths.
Introduce "--compact-summary" option to these two commands that
tells it to instead show "git diff --compact-summary @{1}", which
gives the same information in a lot more compact form in such a
situation.
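For example (remote and branch names illustrative):

$ git merge --compact-summary topic
$ git pull --compact-summary origin topic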
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have a way to export stashes to a ref, let's provide a way
to import them from such a ref back to the stash. This works much the
way the export code does, except that we strip off the first parent
chain commit and then store each resulting commit back to the stash.
We don't clear the stash first and instead add the specified stashes to
the top of the stash. This is because users may want to export just a
few stashes, such as to share a small amount of work in progress with a
colleague, and it would be undesirable for the receiving user to lose
all of their data. For users who do want to replace the stash, it is
easy to do so: simply run "git stash clear" first.
We specifically rely on the fact that we'll produce identical stash
commits on both sides in our tests. This provides a cheap,
straightforward check for our tests and also makes it easy for users to
see if they already have the same data in both repositories.
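A sketch of the import flow described above (the ref name is
illustrative):

$ git fetch origin refs/stashes/wip
$ git stash import FETCH_HEAD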
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common user problem is how to sync in-progress work to another
machine. Users currently must use some sort of transfer of the working
tree, which poses security risks and also necessarily causes the index
to become dirty. The experience is suboptimal and frustrating for
users.
A reasonable idea is to use the stash for this purpose, but the stash is
stored in the reflog, not in a ref, and as such it cannot be pushed or
pulled. This also means that it cannot be saved into a bundle or
preserved elsewhere, which is a problem when using throwaway development
environments.
In addition, users often want to replicate stashes across machines,
such as when they split work across multiple machines, or when they
use throwaway dev environments like those based on the Devcontainer
spec, where they might otherwise lose various in-progress work.
Let's solve this problem by allowing the user to export the stash to a
ref (or, to just write it into the repository and print the hash, à la
git commit-tree). Introduce git stash export, which writes a chain of
commits where each commit's first parent points to the previous stash
in the chain (or, for the final item, to a single empty commit) and
whose second parent is the stash commit normally written to the
reflog.
Iterate over each stash from top to bottom, looking up the data for each
one, and then create the chain from the single empty commit back up in
reverse order. Generate a predictable empty commit so our behavior is
reproducible. Create a useful commit message, preserving the author and
committer information, to help users identify stash commits when viewing
them as normal commits.
If the user has specified particular stashes they'd like to export,
use those rather than iterating over all of the stashes.
As part of this, specifically request quiet behavior when looking up the
OID for a revision because we will eventually hit a revision that
doesn't exist and we don't want to die when that occurs.
When exporting stashes, be sure to verify that they look like valid
stashes and don't contain invalid data. This will help avoid failures
on import or problems due to attempting to export invalid refs that are
not stashes.
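A sketch of the two export modes described above (the ref name is
illustrative):

$ git stash export --print                   # write the chain, print its tip
$ git stash export --to-ref refs/stashes/wip
$ git push origin refs/stashes/wip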
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allow several special forms of stash names in this code. In the
future, we'll want to allow these same forms without parsing a stash
commit, so let's refactor this code out into a function for reuse.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a stash, Git uses the current branch name
of the superproject to construct the stash commit message.
However, in repositories with submodules,
the message may mistakenly display the submodule branch name instead.
This is because `refs_resolve_ref_unsafe()` returns a pointer to a static buffer.
Subsequent calls to the same function overwrite the buffer,
corrupting the originally fetched `branch_name` used for the stash message.
Use `xstrdup()` to duplicate the branch name immediately after resolving it,
so that later buffer overwrites do not affect the stash message.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If rebase.instructionFormat is invalid, the repository is left in a
strange state when the interactive rebase fails. `git status` outputs
both what it would in the normal case *and* something related to the
interactive rebase:
$ git -c rebase.instructionFormat=blah rebase -i
fatal: invalid --pretty format: blah
$ git status
On branch master
Your branch is ahead of 'upstream/master' by 1 commit.
(use "git push" to publish your local commits)
git-rebase-todo is missing.
No commands done.
No commands remaining.
You are currently editing a commit while rebasing branch 'master' on '8db3019401'.
(use "git commit --amend" to amend the current commit)
(use "git rebase --continue" once you are satisfied with your changes)
Avoid this scenario by attempting to write the rebase script before
initializing the rebase state.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/submodule--helper.c:update_submodule(), the variable
remote_name is allocated in get_default_remote_submodule() but
may be leaked if remote_submodule_branch() fails. Although it is
unlikely that remote_submodule_branch() would fail after successfully
obtaining a remote ref name from get_default_remote_submodule(),
it is still possible. To prevent a potential memory leak, add a
call to free(remote_name) at the early exit point.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The support for assuming "push" when "-p" is given, introduced in
9e140909f6 (stash: allow pathspecs in the no verb form, 2017-02-28), is
very narrow: neither "git stash -m <message> -p <pathspec>" nor "git
stash --patch <pathspec>" implies "push"; both die instead. Relax this by
passing PARSE_OPT_STOP_AT_NON_OPTION when push is being assumed and then
setting "force_assume" if "--patch" was present. This means "git stash
<pathspec> -p" still dies so that it does not assume the user meant
"push" if they mistype a subcommand name but "git stash -m <message> -p
<pathspec>" will now succeed. The test added in the last commit is
adjusted to check that push is still assumed when "--patch" comes after
other options on the command-line.
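A sketch of what is now accepted versus still rejected (pathspecs
illustrative):

$ git stash -m 'wip' -p main.c    # now implies "push"
$ git stash --patch main.c        # now implies "push"
$ git stash main.c -p             # still dies (possible mistyped subcommand)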
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically "git stash [<options>]" was assumed to mean "git stash save
[<options>]". Since 1ada5020b3 (stash: use stash_push for no verb form,
2017-02-28) it is assumed to mean "git stash push [<options>]". As the
push subcommand supports pathspecs, 9e140909f6 (stash: allow pathspecs
in the no verb form, 2017-02-28) allowed "git stash -p <pathspec>" to
mean "git stash push -p <pathspec>". This was broken in 8c3713cede
(stash: eliminate crude option parsing, 2020-02-17) which failed to
account for "push" being added to the start of argv in cmd_stash()
before it calls push_stash() and kept looking in argv[0] for "-p" after
moving the code to push_stash().
Fix this regression by checking argv[1] instead of argv[0], and add a
couple of tests to prevent future regressions.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Memory allocated by pretty.c:repo_logmsg_reencode() should be freed
with repo_unuse_commit_buffer(). Some callers forgot to do so at their
exit points. Add the missing `repo_unuse_commit_buffer()` calls to
insert_records_from_trailers() in builtin/shortlog.c and
create_commit() in builtin/replay.c.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit-graph.c:graph_write(), if read_one_commit() fails, the
progress meter allocated in start_delayed_progress() is leaked. Add a
stop_progress() call before the 'goto cleanup'.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/fetch-pack.c:cmd_fetch_pack(), if finish_connect() fails,
we return the error code without cleaning up, which causes a memory
leak. Add a 'cleanup' label before the frees at the end of
cmd_fetch_pack(), and jump to it when finish_connect() fails.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>