Commit Graph

12934 Commits (main)

Patrick Steinhardt 961038856b odb: get rid of `the_repository` in `assert_oid_type()`
Get rid of our dependency on `the_repository` in `assert_oid_type()` by
passing in the object database as a parameter and adjusting all callers.

Rename the function to `odb_assert_oid_type()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:35 -07:00
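
A minimal, self-contained sketch of the pattern in the commit above, using simplified stand-in types (the toy `type_of` callback and `always_blob` helper are hypothetical and not Git's API): the callee receives the object database as an explicit parameter instead of reaching for a global repository.

    #include <stdio.h>
    #include <stdlib.h>

    enum object_type { OBJ_BLOB, OBJ_COMMIT };

    struct object_database {
            /* toy lookup callback standing in for the real object store */
            enum object_type (*type_of)(const char *oid);
    };

    static enum object_type always_blob(const char *oid)
    {
            (void)oid;
            return OBJ_BLOB;
    }

    /* old: assert_oid_type(oid, type) silently used the_repository */
    static void odb_assert_oid_type(struct object_database *odb,
                                    const char *oid, enum object_type expect)
    {
            if (odb->type_of(oid) != expect) {
                    fprintf(stderr, "fatal: object is of unexpected type\n");
                    exit(128);
            }
    }

    int main(void)
    {
            struct object_database odb = { always_blob };

            /* callers now pass the object database explicitly */
            odb_assert_oid_type(&odb, "1234abcd", OBJ_BLOB);
            return 0;
    }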
Patrick Steinhardt bd52ea343d odb: get rid of `the_repository` in `find_odb()`
Get rid of our dependency on `the_repository` in `find_odb()` by passing
in the object database in which we want to search for the source and
adjusting all callers.

Rename the function to `odb_find_source()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:35 -07:00
Patrick Steinhardt 8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Patrick Steinhardt a1e2581a1e object-store: rename `object_directory` to `odb_source`
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.

While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched
for now, it is conceivable that this might eventually even be some
form of database, for example.

As such, the current name of this structure will become worse over time
as we evolve in the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.

Let's preempt this by renaming the structure. A couple of alternatives
were discussed:

  - `odb_backend` was discarded because it led to the association that
    one object database has a single backend, but the model is that one
    alternate has one backend. Furthermore, "backend" is more about the
    actual backing implementation and less about the high-level concept.

  - `odb_alternate` was discarded because it is a bit of a stretch to
    also call the main object directory an "alternate".

Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.

In the future, this change allows us to easily introduce, for example,
an `odb_files_source` and other format-specific implementations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
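
A rough sketch of where the renaming points, under heavily simplified assumptions (the real structures in odb.{c,h} carry far more state; the fields and values shown here are illustrative only):

    #include <stdio.h>

    struct odb_source {
            struct odb_source *next; /* further sources, i.e. alternates */
            const char *path;        /* e.g. ".git/objects" */
    };

    struct object_database {
            struct odb_source *sources; /* main source first, then alternates */
    };

    int main(void)
    {
            struct odb_source alternate = { NULL, "/srv/alt/objects" };
            struct odb_source main_source = { &alternate, ".git/objects" };
            struct object_database odb = { &main_source };

            for (struct odb_source *s = odb.sources; s; s = s->next)
                    printf("source: %s\n", s->path);
            return 0;
    }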
Junio C Hamano b0e9d25865 send-pack: clean-up even when taking an early exit
The previous commit plugged one leak in the normal code path, but
there is an early exit that leaves without releasing any of the
resources acquired in the function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:17:25 -07:00
Kristoffer Haugsbakk c4e9775c60 config: mention --url in the synopsis
4e51389000 (builtin/config: introduce "get" subcommand, 2024-05-06)
introduced `get` and `--url` but didn’t add `--url` to the synopsis.

Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 10:28:48 -07:00
Kristoffer Haugsbakk f322f86e30 config: use --value=<pattern> consistently
This option was introduced in a series of commits from fe3ccc7aab (Merge
branch 'ps/config-subcommands', 2024-05-15).  But two styles were used
for the value provided to the option:

1. Synopsis: `--value=<value>`
2. Deprecated Modes: `--value=<pattern>`

(2) is also used in the synopsis of the command itself.

Use (2) consistently throughout since it’s a pattern in the general
case (`value` sounds more generic).

Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 10:28:41 -07:00
Junio C Hamano d2e49d2b76 Merge branch 'jc/merge-compact-summary'
"git merge/pull" has been taught the "--compact-summary" option to
use the compact-summary format, instead of diffstat, when showing
the summary of the incoming changes.

* jc/merge-compact-summary:
  merge/pull: extend merge.stat configuration variable to cover --compact-summary
  merge/pull: add the "--compact-summary" option
2025-06-30 14:30:31 -07:00
Junio C Hamano 91f10d7ca2 Merge branch 'bc/stash-export-import'
An interchange format for stash entries is defined, and subcommands
of "git stash" to import/export stashes have been added.

* bc/stash-export-import:
  builtin/stash: provide a way to import stashes from a ref
  builtin/stash: provide a way to export stashes to a ref
  builtin/stash: factor out revision parsing into a function
  object-name: make get_oid quietly return an error
2025-06-30 14:30:31 -07:00
Jacob Keller d1c44861f9 send-pack: clean up extra_have oid array
Commit c800963578 ("fetch-pack, send-pack: clean up shallow oid
array", 2024-09-25) cleaned up the shallow oid array in cmd_send_pack,
but didn't clean up extra_have, which is still leaked at program exit.
I suspect the particular tests in t5539 don't trigger any additions to
the extra_have array, which explains why the tests can pass leak free
despite this gap.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-27 15:17:57 -07:00
Junio C Hamano 4c9a5d7729 Merge branch 'ps/maintenance-ref-lock'
"git maintenance" lacked the care "git gc" had to avoid holding
onto the repository lock for too long during packing refs, which
has been remedied.

* ps/maintenance-ref-lock:
  builtin/maintenance: fix locking race when handling "gc" task
  builtin/gc: avoid global state in `gc_before_repack()`
  usage: allow dying without writing an error message
  builtin/maintenance: fix locking race with refs and reflogs tasks
  builtin/maintenance: split into foreground and background tasks
  builtin/maintenance: fix typedef for function pointers
  builtin/maintenance: extract function to run tasks
  builtin/maintenance: stop modifying global array of tasks
  builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
  builtin/maintenance: centralize configuration of explicit tasks
  builtin/gc: drop redundant local variable
  builtin/gc: use designated field initializers for maintenance tasks
2025-06-25 14:07:36 -07:00
Junio C Hamano a5cc6a2bc5 Merge branch 'jc/you-still-use-whatchanged'
"git whatchanged" that is longer to type than "git log --raw"
which is its modern rough equivalent has outlived its usefulness
more than 10 years ago.  Plan to deprecate and remove it.

* jc/you-still-use-whatchanged:
  whatschanged: list it in BreakingChanges document
  whatchanged: remove when built with WITH_BREAKING_CHANGES
  whatchanged: require --i-still-use-this
  tests: prepare for a world without whatchanged
  doc: prepare for a world without whatchanged
  you-still-use-that??: help deprecating commands for removal
2025-06-25 14:07:35 -07:00
Karthik Nayak 5c697f0b7d receive-pack: handle reference deletions separately
In 9d2962a7c4 (receive-pack: use batched reference updates, 2025-05-19)
we updated the 'git-receive-pack(1)' command to use batched reference
updates. One edge case which was missed during this implementation was
when a user pushes multiple branches such as:

  delete refs/heads/branch/conflict
  create refs/heads/branch

Before using batched updates, the references would be applied
sequentially and hence no conflicts would arise. With batched updates,
while the first update applies, the second fails due to D/F conflict. A
similar issue was present in 'git-fetch(1)' and was fixed by separating
out reference pruning into a separate transaction in the commit 'fetch:
use batched reference updates'. Apply a similar mechanism for
'git-receive-pack(1)' and separate out reference deletions into their
own batch.

This means 'git-receive-pack(1)' will now use up to two transactions,
whereas before using batched updates it would use _at least_ two
transactions. So using batched updates is still the better option.

Add a test to validate this behavior.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-25 08:20:27 -07:00
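
An illustrative, self-contained sketch of the split described above (not Git's actual transaction code; `struct ref_update` and `apply_batch()` are toy stand-ins): deletions are flushed in their own batch before creations and updates, so the D/F conflict between "branch/conflict" and "branch" never materializes.

    #include <stdio.h>

    struct ref_update { const char *refname; int is_delete; };

    /* toy stand-in for queueing and committing one batched transaction */
    static void apply_batch(const struct ref_update *updates, size_t nr,
                            int deletions_only)
    {
            for (size_t i = 0; i < nr; i++)
                    if (updates[i].is_delete == deletions_only)
                            printf("%s %s\n",
                                   deletions_only ? "delete" : "update",
                                   updates[i].refname);
    }

    int main(void)
    {
            struct ref_update cmds[] = {
                    { "refs/heads/branch/conflict", 1 },
                    { "refs/heads/branch", 0 },
            };

            apply_batch(cmds, 2, 1); /* first transaction: deletions only */
            apply_batch(cmds, 2, 0); /* second transaction: everything else */
            return 0;
    }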
Junio C Hamano 77eb1dc722 Merge branch 'kj/stash-onbranch-submodule-fix'
"git stash" recorded a wrong branch name when submodules are
present in the current checkout, which has been corrected.

* kj/stash-onbranch-submodule-fix:
  stash: fix incorrect branch name in stash message
2025-06-24 09:48:48 -07:00
Junio C Hamano d2fb103447 Merge branch 'pw/stash-p-pathspec-fixes'
"git stash -p <pathspec>" improvements.

* pw/stash-p-pathspec-fixes:
  stash: allow "git stash [<options>] --patch <pathspec>" to assume push
  stash: allow "git stash -p <pathspec>" to assume push again
2025-06-24 09:48:47 -07:00
Jacob Keller ca62f524c1 submodule: look up remotes by URL first
The get_default_remote_submodule() function performs a lookup to find
the appropriate remote to use within a submodule. The function first
checks to see if it can find the remote for the current branch. If this
fails, it then checks to see if there is exactly one remote. It will use
this, before finally falling back to "origin" as the default.

If a user happens to rename their default remote from origin, either
manually or by setting something like clone.defaultRemoteName, this
fallback will not work.

In such cases, the submodule logic will try to use a non-existent
remote. This usually manifests as a failure to trigger the submodule
update.

The parent project already knows and stores the submodule URL in either
.gitmodules or its .git/config.

Add a new repo_remote_from_url() helper which will iterate over all the
remotes in a repository and return the first remote which has a matching
URL.

Refactor get_default_remote_submodule to find the submodule and get its
URL. If a valid URL exists, first try to obtain a remote using the new
repo_remote_from_url(). Fall back to the repo_default_remote()
otherwise.

The fallback logic is kept in case for some reason the user has manually
changed the URL within the submodule. Additionally, we still try to use
a remote rather than directly passing the URL in the
fetch_in_submodule() logic. This ensures that an update will properly
update the remote refs within the submodule as expected, rather than
just fetching into FETCH_HEAD.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 16:38:57 -07:00
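
A sketch of the lookup order described in the commit above, with simplified stand-in types (`struct remote` and `remote_from_url()` here are toys; the real helpers live in remote.c and builtin/submodule--helper.c): match the submodule's recorded URL against the configured remotes first, and only fall back to the default-remote logic when no URL matches.

    #include <stdio.h>
    #include <string.h>

    struct remote { const char *name; const char *url; };

    static const struct remote *remote_from_url(const struct remote *remotes,
                                                size_t nr, const char *url)
    {
            for (size_t i = 0; i < nr; i++)
                    if (!strcmp(remotes[i].url, url))
                            return &remotes[i];
            return NULL;
    }

    int main(void)
    {
            struct remote remotes[] = {
                    { "upstream", "https://example.com/sub.git" },
            };
            /* URL recorded for the submodule in .gitmodules / .git/config */
            const char *url = "https://example.com/sub.git";
            const struct remote *r = remote_from_url(remotes, 1, url);

            /* no match: fall back to the default-remote logic ("origin" here) */
            printf("%s\n", r ? r->name : "origin");
            return 0;
    }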
Jacob Keller fedfb0735b submodule: move get_default_remote_submodule()
A future refactor of get_default_remote_submodule() is going to depend on
resolve_relative_url(). That function depends on get_default_remote().

Move get_default_remote_submodule() after resolve_relative_url() first
to make the additional functionality easier to review.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 16:38:57 -07:00
Jacob Keller e759275c8f submodule--helper: improve logic for fallback remote name
The repo_get_default_remote() function in submodule--helper currently
tries to figure out the proper remote name to use for a submodule based
on a few factors.

First, it tries to find the remote for the currently checked out branch.
This works if the submodule is configured to checkout to a branch
instead of a detached HEAD state.

In the detached HEAD state, the code falls back to using "origin", on
the assumption that this is the default remote name. Some users may
change this, such as by setting clone.defaultRemoteName, or by changing
the remote name manually within the submodule repository.

As a first step to improving this situation, refactor to reuse the logic
from remotes_remote_for_branch(). This function uses the remote from the
branch if it has one. If it doesn't then it checks to see if there is
exactly one remote. It uses this remote first before attempting to fall
back to "origin".

To allow using this helper function, introduce a repo_default_remote()
helper to remote.c which takes a repository structure. This helper will
load the remote configuration and get the "HEAD" branch. Then it will
call remotes_remote_for_branch to find the default remote.

Replace calls of repo_get_default_remote() with calls to this new
function. To maintain consistency with the existing callers, continue
copying the returned string with xstrdup.

This isn't a perfect solution for users who change remote names, but it
should help in cases where the remote name is changed but users haven't
added any additional remotes.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 16:38:57 -07:00
Jacob Keller 059268fd05 dir: move starts_with_dot(_dot)_slash to dir.h
Both submodule--helper.c and submodule-config.c have an implementation
of starts_with_dot_slash and starts_with_dot_dot_slash. The dir.h header
has starts_with_dot(_dot)_slash_native, which sets PATH_MATCH_NATIVE.

Move the helpers to dir.h as static inlines. I thought about renaming
them with a _platform suffix, but that felt too long and ugly. On the
other hand, the unsuffixed names might be slightly confusing next to
_native.

This simplifies a submodule refactor which wants to use the helpers
earlier in the submodule--helper.c file.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 16:38:56 -07:00
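
A simplified approximation of the moved helpers (the real versions in dir.h delegate to path_match_flags() and are more involved; this sketch only shows the static-inline shape):

    #include <stdio.h>

    static inline int starts_with_dot_slash(const char *str)
    {
            /* accept both separators here purely for illustration */
            return str[0] == '.' && (str[1] == '/' || str[1] == '\\');
    }

    static inline int starts_with_dot_dot_slash(const char *str)
    {
            return str[0] == '.' && starts_with_dot_slash(str + 1);
    }

    int main(void)
    {
            printf("%d %d\n", starts_with_dot_slash("./sub"),
                   starts_with_dot_dot_slash("../sub"));
            return 0;
    }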
Jacob Keller f62dcc7f30 remote: remove branch->merge_name and fix branch_release()
The branch structure has both branch->merge_name and branch->merge for
tracking the merge information. The former is allocated by add_merge()
and stores the names read from the configuration file. The latter is
allocated by set_merge() which is called by branch_get() when an
external caller requests a branch.

This leads to the confusing situation where branch->merge_nr tracks both
the size of branch->merge (once its allocated) and branch->merge_name.
The branch_release() function incorrectly assumes that branch->merge is
always set when branch->merge_nr is non-zero, and can potentially crash
if read_config() is called without branch_get() being called on every
branch.

In addition, branch_release() fails to free some of the memory
associated with the structure including:

 * Failure to free the refspec_item containers in branch->merge[i]
 * Failure to free the strings in branch->merge_name[i]
 * Failure to free the branch->merge_name parent array.

The set_merge() function sets branch->merge_nr to 0 when there is no
valid remote_name, to avoid external callers seeing a non-zero merge_nr
but a NULL merge array. This results in failure to release most of the
merge data as well.

These issues could be fixed directly, and indeed I initially proposed
such a change at [1] in the past. While this works, there was some
confusion during review because of the inconsistencies.

Instead, it's time to clean up the situation properly. Remove
branch->merge_name entirely. Instead, allocate branch->merge earlier
within add_merge() instead of within set_merge(). Instead of having
set_merge() copy from merge_name[i] to merge[i]->src, just have
add_merge() directly initialize merge[i]->src.

Modify add_merge() to call xstrdup() itself, instead of having
the caller of add_merge() do so. This makes it more obvious which code
owns the memory.

Update all callers which use branch->merge_name[i] to use
branch->merge[i]->src instead.

Add a merge_clear() function which properly releases all of the
merge-related memory, and which sets branch->merge_nr to zero. Use this
both in branch_release() and in set_merge(), fixing the leak when
set_merge() finds no valid remote_name.

Add a set_merge variable to the branch structure, which indicates
whether set_merge() has been called. This replaces the previous use of a
NULL check against the branch->merge array.

With these changes, the merge array is always allocated when merge_nr is
non-zero.

This use of refspec_item to store the names should be safe. External
callers should be using branch_get() to obtain a pointer to the branch,
which will call set_merge(), and the callers internal to remote.c
already handle the partially initialized refspec_item structure safely.

The end result is cleaner and avoids duplicating the merge names.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Link: [1] https://lore.kernel.org/git/20250617-jk-submodule-helper-use-url-v2-1-04cbb003177d@gmail.com/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 16:38:55 -07:00
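
A compact sketch of the ownership model the commit above arrives at, using simplified stand-in types and plain strdup()/realloc() in place of Git's wrappers (allocation failure checks omitted): add_merge() duplicates and owns the string inside the refspec item, and merge_clear() releases everything and resets merge_nr.

    #include <stdlib.h>
    #include <string.h>

    struct refspec_item { char *src; };
    struct branch { struct refspec_item **merge; int merge_nr; };

    static void add_merge(struct branch *b, const char *name)
    {
            struct refspec_item *item = calloc(1, sizeof(*item));

            item->src = strdup(name); /* xstrdup() in Git proper */
            b->merge = realloc(b->merge, sizeof(*b->merge) * (b->merge_nr + 1));
            b->merge[b->merge_nr++] = item;
    }

    static void merge_clear(struct branch *b)
    {
            for (int i = 0; i < b->merge_nr; i++) {
                    free(b->merge[i]->src);
                    free(b->merge[i]);
            }
            free(b->merge);
            b->merge = NULL;
            b->merge_nr = 0;
    }

    int main(void)
    {
            struct branch b = { 0 };

            add_merge(&b, "refs/heads/main");
            merge_clear(&b); /* usable from both branch_release() and set_merge() */
            return 0;
    }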
Taylor Blau 5ee86c273b repack: exclude cruft pack(s) from the MIDX where possible
In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:38 -07:00
Taylor Blau cd846bacc7 pack-objects: introduce '--stdin-packs=follow'
When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.

Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).

The full details are in ddee3703b3, but the gist is that if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.

Note that the traversal here is best-effort, similar to the existing
traversal which provides name-hash hints. This means that the object
traversal may hand us back a blob that does not actually exist. We
*won't* see missing trees/commits with 'ignore_missing_links' because:

 - missing commit parents are discarded at the commit traversal stage by
   revision.c::process_parents()

 - missing tag objects are discarded by revision.c::handle_commit()

 - missing tree objects are discarded by the list-objects code in
   list-objects.c::process_tree()

But we have to handle potentially-missing blobs specially by making a
separate check to ensure they exist in the repository. Failing to do so
would mean that we'd add an object to the packing list which doesn't
actually exist, rendering us unable to write out the pack.

This prepares us for new repacking behavior which will "resurrect"
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:37 -07:00
Taylor Blau 63195f013b pack-objects: swap 'show_{object,commit}_pack_hint'
show_commit_pack_hint() has heretofore been a noop, so within its
compilation unit it only needs to appear before its first use.

But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:37 -07:00
Taylor Blau 8ed5d87bdd pack-objects: fix typo in 'show_object_pack_hint()'
Noticed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:37 -07:00
Taylor Blau d6220cce6b pack-objects: perform name-hash traversal for unpacked objects
With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.

This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.

Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.

Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_objects_in_unpacked_packs()` and add any loose objects as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:36 -07:00
Taylor Blau 97ec43247c pack-objects: declare 'rev_info' for '--stdin-packs' earlier
Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.

A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (not included nor excluded) pack.

Extract the code which initializes and performs the reachability
traversal to take place in the caller, not the callee, which prepares us
to share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:36 -07:00
Taylor Blau 67e1a7827b pack-objects: factor out handling '--stdin-packs'
At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.

The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:36 -07:00
Taylor Blau 9809d4ae9f pack-objects: limit scope in 'add_object_entry_from_pack()'
In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary.  Move it down as far as it can go to reduce its
scope.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:36 -07:00
Taylor Blau 798ddd947f pack-objects: use standard option incompatibility functions
pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).

Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).

There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:35 -07:00
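
The shape of the conversion, as a self-contained sketch. The real die_for_incompatible_opt2() helper is declared in parse-options.h; a stand-in with a similar flavor of signature is defined here (treat the exact prototype and message as an assumption) so the example compiles on its own.

    #include <stdio.h>
    #include <stdlib.h>

    /* stand-in; the real helper lives in parse-options.h */
    static void die_for_incompatible_opt2(int opt1, const char *opt1_name,
                                          int opt2, const char *opt2_name)
    {
            if (opt1 && opt2) {
                    fprintf(stderr,
                            "fatal: options '%s' and '%s' cannot be used together\n",
                            opt1_name, opt2_name);
                    exit(128);
            }
    }

    int main(void)
    {
            int stdin_packs = 1, filter = 0;

            /* before: if (stdin_packs && filter) die("cannot use --filter ..."); */
            die_for_incompatible_opt2(stdin_packs, "--stdin-packs",
                                      filter, "--filter");
            return 0;
    }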
Junio C Hamano 92daf08c84 Merge branch 'ly/submodule-update-failure-leakfix'
A memory leak on an error code path has been plugged.

* ly/submodule-update-failure-leakfix:
  builtin/submodule--helper: fix leak when remote_submodule_branch() failed
2025-06-18 13:53:36 -07:00
Junio C Hamano 0d0d56bca4 Merge branch 'ly/commit-buffer-reencode-leakfix'
Leakfix.

* ly/commit-buffer-reencode-leakfix:
  repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode ()
2025-06-18 13:53:34 -07:00
Junio C Hamano 2024ab3d97 Merge branch 'jk/diff-no-index-with-pathspec'
"git diff --no-index dirA dirB" can limit the comparison with
pathspec at the end of the command line, just like normal "git
diff".

* jk/diff-no-index-with-pathspec:
  diff --no-index: support limiting by pathspec
  pathspec: add flag to indicate operation without repository
  pathspec: add match_leading_pathspec variant
2025-06-17 10:44:42 -07:00
Junio C Hamano 5e22d03832 Merge branch 'ly/fetch-pack-leakfix'
A memory-leak in an error code path has been plugged.

* ly/fetch-pack-leakfix:
  builtin/fetch-pack: cleanup before return error
2025-06-17 10:44:41 -07:00
Junio C Hamano b5a135b1f7 Merge branch 'ly/commit-graph-graph-write-leakfix'
A memory-leak in an error code path has been plugged.

* ly/commit-graph-graph-write-leakfix:
  commit-graph: fix start_delayed_progress() leak
2025-06-17 10:44:41 -07:00
Junio C Hamano 1f622bb0ab Merge branch 'ly/do-not-localize-bug-messages'
Code clean-up.

* ly/do-not-localize-bug-messages:
  BUG(): remove leading underscore of the format string
2025-06-17 10:44:40 -07:00
Junio C Hamano 4fd5b1ddc7 Merge branch 'vd/cat-file-objectmode-update'
"git cat-file --batch" learns to understand %(objectmode) atom to
allow the caller to tell missing objects (due to repository
corruption) and submodules (whose commit objects are OK to be
missing) apart.

* vd/cat-file-objectmode-update:
  cat-file.c: add batch handling for submodules
  cat-file: add %(objectmode) atom
  t1006: update 'run_tests' to test generic object specifiers
2025-06-17 10:44:39 -07:00
Junio C Hamano 88134a8417 Merge branch 'ds/path-walk-2'
"git pack-objects" learns to find delta bases from blobs at the
same path, using the --path-walk API.

* ds/path-walk-2:
  pack-objects: allow --shallow and --path-walk
  path-walk: add new 'edge_aggressive' option
  pack-objects: thread the path-based compression
  pack-objects: refactor path-walk delta phase
  scalar: enable path-walk during push via config
  pack-objects: enable --path-walk via config
  repack: add --path-walk option
  t5538: add tests to confirm deltas in shallow pushes
  pack-objects: introduce GIT_TEST_PACK_PATH_WALK
  p5313: add performance tests for --path-walk
  pack-objects: update usage to match docs
  pack-objects: add --path-walk option
  pack-objects: extract should_attempt_deltas()
2025-06-17 10:44:38 -07:00
Junio C Hamano c8b4805897 merge/pull: extend merge.stat configuration variable to cover --compact-summary
The existing `merge.stat` configuration variable is a Boolean that
defaults to `true` and controls the `git merge --[no-]stat` behaviour.

Extend it to be "Boolean or text", taking false, true, or "compact",
with the last one triggering the --compact-summary option introduced
earlier.  Any other value is taken as true, instead of signaling an
error---it is not a grave enough offence to stop the merge.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-13 11:54:14 -07:00
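
A sketch of the "Boolean or text" handling described above, with a toy parser (Git proper goes through its config machinery rather than this hypothetical function, and recognizes more boolean spellings):

    #include <stdio.h>
    #include <strings.h>

    enum stat_mode { STAT_OFF, STAT_DIFFSTAT, STAT_COMPACT };

    static enum stat_mode parse_merge_stat(const char *value)
    {
            if (!strcasecmp(value, "false") || !strcasecmp(value, "no"))
                    return STAT_OFF;
            if (!strcasecmp(value, "compact"))
                    return STAT_COMPACT;
            /* any other value counts as "true" rather than an error */
            return STAT_DIFFSTAT;
    }

    int main(void)
    {
            printf("%d\n", parse_merge_stat("compact")); /* prints 2 */
            return 0;
    }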
Junio C Hamano 3a54f5bd5d merge/pull: add the "--compact-summary" option
"git merge" and "git pull" shows "git diff --stat --summary @{1}"
when they finish to indicate the extent of the changes brought into
the history by default.  While it gives a good overview, it becomes
annoying when there are very many created or deleted paths.

Introduce "--compact-summary" option to these two commands that
tells it to instead show "git diff --compact-summary @{1}", which
gives the same information in a lot more compact form in such a
situation.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-13 11:50:37 -07:00
brian m. carlson bc303718cc builtin/stash: provide a way to import stashes from a ref
Now that we have a way to export stashes to a ref, let's provide a way
to import them from such a ref back to the stash.  This works much the
way the export code does, except that we strip off the first parent
chain commit and then store each resulting commit back to the stash.

We don't clear the stash first and instead add the specified stashes to
the top of the stash.  This is because users may want to export just a
few stashes, such as to share a small amount of work in progress with a
colleague, and it would be undesirable for the receiving user to lose
all of their data.  For users who do want to replace the stash, it's
easy to do: simply run "git stash clear" first.

We specifically rely on the fact that we'll produce identical stash
commits on both sides in our tests.  This provides a cheap,
straightforward check for our tests and also makes it easy for users to
see if they already have the same data in both repositories.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-12 13:32:18 -07:00
brian m. carlson 27c0be9a3f builtin/stash: provide a way to export stashes to a ref
A common user problem is how to sync in-progress work to another
machine.  Users currently must use some sort of transfer of the working
tree, which poses security risks and also necessarily causes the index
to become dirty.  The experience is suboptimal and frustrating for
users.

A reasonable idea is to use the stash for this purpose, but the stash is
stored in the reflog, not in a ref, and as such it cannot be pushed or
pulled.  This also means that it cannot be saved into a bundle or
preserved elsewhere, which is a problem when using throwaway development
environments.

In addition, users often want to replicate stashes across machines, such
as when they must use multiple machines or when they use throwaway dev
environments, such as those based on the Devcontainer spec, where they
might otherwise lose various in-progress work.

Let's solve this problem by allowing the user to export the stash to a
ref (or, to just write it into the repository and print the hash, à la
git commit-tree).  Introduce git stash export, which writes a chain of
commits where the first parent always links to the previous stash in
the chain (or, for the final item, to a single empty commit) and the
second parent is the stash commit normally written to the reflog.

Iterate over each stash from top to bottom, looking up the data for each
one, and then create the chain from the single empty commit back up in
reverse order.  Generate a predictable empty commit so our behavior is
reproducible.  Create a useful commit message, preserving the author and
committer information, to help users identify stash commits when viewing
them as normal commits.

If the user has specified specific stashes they'd like to export
instead, use those instead of iterating over all of the stashes.

As part of this, specifically request quiet behavior when looking up the
OID for a revision because we will eventually hit a revision that
doesn't exist and we don't want to die when that occurs.

When exporting stashes, be sure to verify that they look like valid
stashes and don't contain invalid data.  This will help avoid failures
on import or problems due to attempting to export invalid refs that are
not stashes.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-12 13:32:17 -07:00
brian m. carlson 7572e59b3d builtin/stash: factor out revision parsing into a function
We allow several special forms of stash names in this code.  In the
future, we'll want to allow these same forms without parsing a stash
commit, so let's refactor this code out into a function for reuse.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-12 13:32:17 -07:00
K Jayatheerth ffb36c64f2 stash: fix incorrect branch name in stash message
When creating a stash, Git uses the current branch name
of the superproject to construct the stash commit message.
However, in repositories with submodules,
the message may mistakenly display the submodule branch name instead.

This is because `refs_resolve_ref_unsafe()` returns a pointer to a static buffer.
Subsequent calls to the same function overwrite the buffer,
corrupting the originally fetched `branch_name` used for the stash message.

Use `xstrdup()` to duplicate the branch name immediately after resolving it,
so that later buffer overwrites do not affect the stash message.

Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-11 08:59:32 -07:00
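
A self-contained illustration of the bug class fixed above, with a hypothetical resolver standing in for refs_resolve_ref_unsafe(): the returned pointer aliases a static buffer, so the caller must duplicate it (xstrdup() in Git, plain strdup() here) before the next call clobbers it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* hypothetical resolver; like refs_resolve_ref_unsafe(), it returns
     * a pointer into static storage */
    static const char *resolve_ref(const char *ref)
    {
            static char buf[64];

            snprintf(buf, sizeof(buf), "resolved:%s", ref);
            return buf; /* always points at the same static buffer */
    }

    int main(void)
    {
            char *branch = strdup(resolve_ref("HEAD"));  /* xstrdup() in Git */

            resolve_ref("refs/heads/sub-branch");        /* clobbers the buffer */
            printf("%s\n", branch);                      /* still the right name */
            free(branch);
            return 0;
    }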
Øystein Walle ade14bffd7 rebase: write script before initializing state
If rebase.instructionFormat is invalid the repository is left in a
strange state when the interactive rebase fails. `git status` outputs
both what it would print in the normal case *and* something related to
interactive rebase:

    $ git -c rebase.instructionFormat=blah rebase -i
    fatal: invalid --pretty format: blah
    $ git status
    On branch master
    Your branch is ahead of 'upstream/master' by 1 commit.
      (use "git push" to publish your local commits)

    git-rebase-todo is missing.
    No commands done.
    No commands remaining.
    You are currently editing a commit while rebasing branch 'master' on '8db3019401'.
      (use "git commit --amend" to amend the current commit)
      (use "git rebase --continue" once you are satisfied with your changes)

By attempting to write the rebase script before initializing the state,
this scenario is avoided.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-09 15:56:57 -07:00
Lidong Yan bfc9f9cc64 builtin/submodule--helper: fix leak when remote_submodule_branch() failed
In builtin/submodule--helper.c:update_submodule(), the variable
remote_name is allocated in get_default_remote_submodule() but
may be leaked if remote_submodule_branch() fails. Although it is
unlikely that remote_submodule_branch() would fail after successfully
obtaining a remote ref name from get_default_remote_submodule(),
it is still possible. To prevent a potential memory leak, add a
call to free(remote_name) at the early exit point.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-08 08:49:48 -07:00
Phillip Wood 468817bab2 stash: allow "git stash [<options>] --patch <pathspec>" to assume push
The support for assuming "push" when "-p" is given introduced in
9e140909f6 (stash: allow pathspecs in the no verb form, 2017-02-28) is
very narrow, neither "git stash -m <message> -p <pathspec>" nor "git
stash --patch <pathspec>" imply "push" and die instead. Relax this by
passing PARSE_OPT_STOP_AT_NON_OPTION when push is being assumed and then
setting "force_assume" if "--patch" was present. This means "git stash
<pathspec> -p" still dies so that it does not assume the user meant
"push" if they mistype a subcommand name but "git stash -m <message> -p
<pathspec>" will now succeed. The test added in the last commit is
adjusted to check that push is still assumed when "--patch" comes after
other options on the command-line.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-07 10:37:17 -07:00
Phillip Wood e6659b77df stash: allow "git stash -p <pathspec>" to assume push again
Historically "git stash [<options>]" was assumed to mean "git stash save
[<options>]". Since 1ada5020b3 (stash: use stash_push for no verb form,
2017-02-28) it is assumed to mean "git stash push [<options>]". As the
push subcommand supports pathspecs, 9e140909f6 (stash: allow pathspecs
in the no verb form, 2017-02-28) allowed "git stash -p <pathspec>" to
mean "git stash push -p <pathspec>". This was broken in 8c3713cede
(stash: eliminate crude option parsing, 2020-02-17) which failed to
account for "push" being added to the start of argv in cmd_stash()
before it calls push_stash() and kept looking in argv[0] for "-p" after
moving the code to push_stash().

Fix this regression by checking argv[1] instead of argv[0] and add a
couple of tests to prevent future regressions.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-07 10:37:16 -07:00
Lidong Yan 61372dd613 repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode ()
Memory allocated by pretty.c:repo_logmsg_reencode() should be freed with
repo_unuse_commit_buffer(). Callers sometimes forgot to free it at their
exit points. Add `repo_unuse_commit_buffer()` calls in
insert_records_from_trailers() in builtin/shortlog.c and in
create_commit() in builtin/replay.c.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-05 08:35:22 -07:00
Lidong Yan 7082da85cb commit-graph: fix start_delayed_progress() leak
In commit-graph.c:graph_write(), if read_one_commit() fails, the
progress allocated by start_delayed_progress() will leak. Add a
stop_progress() call before the goto cleanup.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-04 08:55:30 -07:00
Lidong Yan aedebdb6b9 builtin/fetch-pack: cleanup before return error
In builtin/fetch-pack.c:cmd_fetch_pack(), if finish_connect() fails, the
function returns an error code without cleaning up, which causes a
memory leak. Add a cleanup label before the frees at the end of
cmd_fetch_pack(), and `goto cleanup` when finish_connect() fails.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-04 08:52:25 -07:00
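
The general shape of the fix, as a minimal sketch (do_work() stands in for finish_connect(), and the buffer stands in for the resources cmd_fetch_pack() acquires): early exits are routed through a single cleanup label instead of returning directly.

    #include <stdlib.h>

    static int do_work(void)
    {
            return -1; /* pretend the connection teardown failed */
    }

    static int cmd_example(void)
    {
            int ret = 0;
            char *buf = malloc(64); /* resource acquired up front */

            if (do_work() < 0) {
                    ret = 1;
                    goto cleanup; /* previously: "return 1;", leaking buf */
            }

    cleanup:
            free(buf);
            return ret;
    }

    int main(void)
    {
            return cmd_example();
    }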
Victoria Dye b0b910e052 cat-file.c: add batch handling for submodules
When an object specification is passed to 'cat-file --batch[-check]'
referring to a submodule (e.g. 'HEAD:path/to/my/submodule'), the current
behavior of the command is to print the "missing" error message. However, it
is often valuable for callers to distinguish between paths that are actually
missing and "the submodule tree entry exists, but the object does not exist
in the repository".

To disambiguate without needing to invoke a separate Git process (e.g.
'ls-tree'), print the message "<oid> submodule" for such objects instead of
"<object> missing". In addition to the change from "missing" to "submodule",
the new message differs from the old in that it always prints the resolved
tree entry's OID, rather than the input object specification.

Note that this implementation maintains a distinction between submodules
where the commit OID is not present in the repo, and submodules where the
commit OID *is* present; the former will now print "<object> submodule", but
the latter will still print the full object content.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 12:08:58 -07:00
Victoria Dye aba1438435 cat-file: add %(objectmode) atom
Add a formatting atom, used with the --batch-check/--batch-command options,
that prints the octal representation of the object mode if a given revision
includes that information, e.g. one that follows the format
<tree-ish>:<path>. If the mode information does not exist, an empty string
is printed instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 12:08:58 -07:00
Junio C Hamano d9a1e51c76 Merge branch 'bs/total-ram-bsd'
Update the total_ram() function on BSD variants.

* bs/total-ram-bsd:
  builtin/gc: correct physical memory detection for OpenBSD / NetBSD
2025-06-03 08:55:24 -07:00
Lidong Yan 5dceb8bd05 BUG(): remove leading underscore of the format string
BUG() is not end-user facing but programmer facing, and we do not
use _("...") in them. Replace all `BUG(_("..."))` with `BUG("...")`

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:36:11 -07:00
Patrick Steinhardt 1b5074e614 builtin/maintenance: fix locking race when handling "gc" task
The "gc" task has a similar locking race as the one that we have fixed
for the "pack-refs" and "reflog-expire" tasks in preceding commits. Fix
this by splitting up the logic of the "gc" task:

  - We execute `gc_before_repack()` in the foreground, which contains
    the logic that git-gc(1) itself would execute in the foreground, as
    well.

  - We spawn git-gc(1) after detaching, but with a new hidden flag that
    suppresses calling `gc_before_repack()`.

Like this we have roughly the same logic as git-gc(1) itself and know to
repack refs and reflogs before detaching, thus fixing the race.

Note that `gc_before_repack()` is renamed to `gc_foreground_tasks()` to
better reflect what this function does.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:52 -07:00
Patrick Steinhardt d2b084c660 builtin/gc: avoid global state in `gc_before_repack()`
The `gc_before_repack()` function should only ever run once in git-gc(1),
but we may end up calling it twice when the "--detach" flag is passed. The
duplicated call is avoided via a static flag in this function.

This pattern is somewhat unintuitive though. Refactor it to drop the
static flag and instead guard the second call of `gc_before_repack()`
via `opts.detach`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:52 -07:00
Patrick Steinhardt 697202b0b1 usage: allow dying without writing an error message
Sometimes code wants to die in a situation where it already has written
an error message. To use the same error code as `die()` we have to use
`exit(128)`, which is easy to get wrong and leaves magic numbers all
over our codebase.

Teach `die_message_builtin()` to not print any error when passed a
`NULL` pointer as error string. Like this, such users can now call
`die(NULL)` to achieve the same result without any hardcoded error
codes.

Adapt a couple of builtins to use this new pattern to demonstrate that
there is a need for such a helper.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:51 -07:00
Patrick Steinhardt c367852d9e builtin/maintenance: fix locking race with refs and reflogs tasks
As explained in the preceding commit, git-gc(1) knows to detach only
after it has already packed references and expired reflogs. This is done
to avoid racing around their respective lockfiles.

Adapt git-maintenance(1) accordingly and run the "pack-refs" and
"reflog-expire" tasks in the foreground. Note that the "gc" task has the
same issue, but the fix is a bit more involved there and will thus be
done in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:51 -07:00
Patrick Steinhardt 5bb4298acf builtin/maintenance: split into foreground and background tasks
Both git-gc(1) and git-maintenance(1) have logic to daemonize so that
the maintenance tasks are performed in the background. git-gc(1) has
some special logic though to not perform _all_ housekeeping tasks in the
background: both references and reflogs are still handled synchronously
in the foreground.

This split exists because otherwise it may easily happen that git-gc(1)
keeps the "packed-refs" file locked for an extended amount of time,
where the next Git command that wants to modify any reference could now
fail. This was especially important in the past, where git-gc(1) was
still executed directly as part of our automatic maintenance: git-gc(1)
was invoked via `git gc --auto --detach`, so we knew to handle most of
the maintenance tasks in the background while doing those parts that may
cause locking issues in the foreground.

We have since moved to git-maintenance(1), which is a more flexible
replacement for git-gc(1). By default this command runs git-gc(1), only,
but it can be configured to run different tasks, as well. This command
does not know about the split between maintenance tasks that should run
before and after detach though, and this has led to several bug reports
about spurious locking errors for the "packed-refs" file.

Prepare for a fix by introducing this split for maintenance tasks. Note
that this commit does not yet change any of the tasks, so there should
not (yet) be a change in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:51 -07:00
Patrick Steinhardt 3236e03c66 builtin/maintenance: fix typedef for function pointers
The typedefs for `maintenance_task_fn` and `maintenance_auto_fn` are
somewhat confusingly not true function pointers. As such, any user of
those typedefs needs to manually add the pointer to make use of them.

Fix this by making these true function pointers.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:50 -07:00
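
A sketch of the typedef change described above (the parameter list is illustrative, not the real one): making the typedef itself a function pointer lets users declare `maintenance_task_fn fn` without adding the `*` by hand.

    struct maintenance_run_opts;

    /* before: typedef int maintenance_task_fn(...);
     * users then had to write "maintenance_task_fn *fn" */
    typedef int (*maintenance_task_fn)(struct maintenance_run_opts *opts);

    static int maintenance_task_gc(struct maintenance_run_opts *opts)
    {
            (void)opts;
            return 0;
    }

    int main(void)
    {
            maintenance_task_fn fn = maintenance_task_gc; /* no extra '*' needed */

            return fn(0);
    }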
Patrick Steinhardt 2aa9ee7eec builtin/maintenance: extract function to run tasks
Extract the function to run maintenance tasks. This function will be
reused in a subsequent commit where we introduce a split between
maintenance tasks that run before and after daemonizing the process.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:50 -07:00
Patrick Steinhardt 38a8fa5a9a builtin/maintenance: stop modifying global array of tasks
When configuring maintenance tasks run by git-maintenance(1) we do so by
modifying the global array of tasks directly. This is already quite bad
on its own, as global state makes for logic that is hard to follow.

Even more importantly though we use multiple different fields to track
whether or not a task should be run:

  - "enabled" tracks the "maintenance.*.enabled" config key. This field
    disables execution of a task, unless the user has explicitly asked
    for the task.

  - "selected_order" tracks the order in which jobs have been asked for
    by the user via the "--task=" command line option. It overrides
    everything else, but only has an effect if at least one job has been
    selected.

  - "schedule" tracks the schedule priority for a job, that is how often
    it should run. This field only plays a role when the user has passed
    the "--schedule=" command line option.

All of this makes it non-trivial to figure out which job really should
be running right now. The logic to configure these fields and the logic
that interprets them is distributed across multiple functions, making it
even harder to follow it.

Refactor the logic so that we stop modifying global state. Instead, we
now compute which jobs should be run in `initialize_task_config()`,
represented as an array of jobs to run that is stored in the options
structure. Like this, all logic becomes self-contained and any users of
this array only need to iterate through the tasks and execute them one
by one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:49 -07:00
Patrick Steinhardt a7c86d328f builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
The "--task=" option explicitly allows the user to say which maintenance
tasks should be run, whereas "--schedule=" only respects the maintenance
strategy configured for a specific repository. As such, it is not
sensible to accept both options at the same time.

Mark them as incompatible with one another. While at it, also convert
the existing logic that marks "--auto" and "--schedule=" as incompatible
to use `die_for_incompatible_opt2()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:49 -07:00
Patrick Steinhardt 1bb6bdb646 builtin/maintenance: centralize configuration of explicit tasks
Users of git-maintenance(1) can explicitly ask it to run specific tasks
by passing the `--task=` command line option. This option can be passed
multiple times, which causes us to execute tasks in the same order as
the tasks have been provided by the user.

The order in which tasks are run is computed in `task_option_parse()`:
every time we parse such a command line argument, we modify the global
array of tasks by setting the selected index for that specific task.
This has two downsides:

  - We modify global state, which makes it hard to follow the logic.

  - The configuration of tasks is split across multiple different
    functions, so it is not easy to figure out the different factors
    that play a role in selecting tasks.

Refactor the logic so that `task_option_parse()` does not modify global
state anymore. Instead, this function now only collects the list of
configured tasks. The logic to configure ordering of the respective
tasks is then deferred to `initialize_task_config()`.

This refactoring solves the second problem, that the configuration of
tasks is spread across multiple different locations. The first problem,
that we modify global state, will be fixed in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:49 -07:00
Patrick Steinhardt bd19b94a66 builtin/gc: drop redundant local variable
We have two different variables that track the quietness for git-gc(1):

  - The local variable `quiet`, which we wire up.

  - The `quiet` field of `struct maintenance_run_opts`.

This leads to confusion about which of these variables should be used
and what the respective effect is.

Simplify this logic by dropping the local variable in favor of the
options field.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:48 -07:00
Patrick Steinhardt 95b5039f5b builtin/gc: use designated field initializers for maintenance tasks
Convert the array of maintenance tasks to use designated field
initializers. This makes it easier to add more fields to the struct
without having to modify all tasks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-03 08:30:48 -07:00
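
A sketch of the conversion, with illustrative fields rather than the real task definition: designated initializers name each field, so new fields can be added to the struct without touching every entry.

    struct maintenance_task {
            const char *name;
            int (*fn)(void);
            int enabled;
    };

    static int run_gc(void)
    {
            return 0;
    }

    /* before: { "gc", run_gc, 1 },  -- every field positional */
    static struct maintenance_task tasks[] = {
            {
                    .name = "gc",
                    .fn = run_gc,
                    .enabled = 1,
            },
    };

    int main(void)
    {
            return tasks[0].fn();
    }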
Brad Smith 35c1d592cd builtin/gc: correct physical memory detection for OpenBSD / NetBSD
OpenBSD / NetBSD use HW_PHYSMEM64 to detect the amount of physical
memory in a system. HW_PHYSMEM will not provide the correct amount
on a system with >=4GB of memory.

Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-01 19:01:07 -07:00
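
A minimal sketch of the detection on OpenBSD/NetBSD (error handling trimmed): HW_PHYSMEM64 yields a 64-bit byte count, whereas HW_PHYSMEM is a 32-bit value and caps out below 4GB.

    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int mib[2] = { CTL_HW, HW_PHYSMEM64 };
            int64_t physmem = 0;
            size_t len = sizeof(physmem);

            if (sysctl(mib, 2, &physmem, &len, NULL, 0) == 0)
                    printf("physical memory: %lld bytes\n", (long long)physmem);
            return 0;
    }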
Junio C Hamano 0b4c6baa70 fast-export: --signed-commits is experimental
As the design of signature handling is still being discussed, it is
likely that the data stream produced by the code in Git 2.50 would
have to be changed in a way that is not backward compatible.

Mark the feature as experimental and discourage its use for now.

Also flip the default on the generation side to "strip"; users of
existing versions would not have passed --signed-commits=strip and
would be broken by this change if the default were made to abort.
Instead, the error message encourages them to explicitly pass the
--signed-commits option and opt into a data stream that carries no
guarantee against future breakage.

As we tone down the default behaviour, we no longer need the
FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was
not discoverable enough.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-28 10:30:47 -07:00
Junio C Hamano b4847a4477 Merge branch 'jt/receive-pack-skip-connectivity-check'
"git receive-pack" optionally learns not to care about connectivity
check, which can be useful when the repository arranges to ensure
connectivity by some other means.

* jt/receive-pack-skip-connectivity-check:
  builtin/receive-pack: add option to skip connectivity check
  t5410: test receive-pack connectivity check
2025-05-28 07:59:56 -07:00
Junio C Hamano f9cdaa2860 Merge branch 'js/misc-fixes'
Assorted fixes for issues found with CodeQL.

* js/misc-fixes:
  sequencer: stop pretending that an assignment is a condition
  bundle-uri: avoid using undefined output of `sscanf()`
  commit-graph: avoid using stale stack addresses
  trace2: avoid "futile conditional"
  Avoid redundant conditions
  fetch: avoid unnecessary work when there is no current branch
  has_dir_name(): make code more obvious
  upload-pack: rename `enum` to reflect the operation
  commit-graph: avoid malloc'ing a local variable
  fetch: carefully clear local variable's address after use
  commit: simplify code
2025-05-27 13:59:11 -07:00
Junio C Hamano 6e5fb398d3 Merge branch 'ds/sparse-apply-add-p'
"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.

* ds/sparse-apply-add-p:
  p2000: add performance test for patch-mode commands
  reset: integrate sparse index with --patch
  git add: make -p/-i aware of sparse index
  apply: integrate with the sparse index
2025-05-27 13:59:09 -07:00
Junio C Hamano f545f401be Merge branch 'en/merge-tree-check'
"git merge-tree" learned an option to see if it resolves cleanly
without actually creating a result.

* en/merge-tree-check:
  merge-tree: add a new --quiet flag
  merge-ort: add a new mergeability_only option
2025-05-27 13:59:08 -07:00
Junio C Hamano 17d9dbd3c2 Merge branch 'jk/no-funny-object-types'
Support to create a loose object file with unknown object type has
been dropped.

* jk/no-funny-object-types:
  object-file: drop support for writing objects with unknown types
  hash-object: handle --literally with OPT_NEGBIT
  hash-object: merge HASH_* and INDEX_* flags
  hash-object: stop allowing unknown types
  t: add lib-loose.sh
  t/helper: add zlib test-tool
  oid_object_info(): drop type_name strbuf
  fsck: stop using object_info->type_name strbuf
  oid_object_info_convert(): stop using string for object type
  cat-file: use type enum instead of buffer for -t option
  object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
  cat-file: make --allow-unknown-type a noop
  object-file.h: fix typo in variable declaration
2025-05-27 13:59:08 -07:00
Junio C Hamano 96d127896d Merge branch 'en/replay-wo-the-repository'
The dependency on the_repository variable has been reduced from the
code paths in "git replay".

* en/replay-wo-the-repository:
  replay: replace the_repository with repo parameter passed to cmd_replay ()
2025-05-23 15:34:08 -07:00
Jacob Keller 09fb155f11 diff --no-index: support limiting by pathspec
The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.

When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:

  $ git diff-tree -r <treeish1> <treeish2>

where the treeish could point to a subdirectory of the repository.

When invoked this way, users can limit the selected paths of the tree by
using a pathspec, either by providing a list of paths to accept or by
removing paths via a negative pathspec.

The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
diffutils have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detection, and similar.

Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.

For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.

Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.

Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.

After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.
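
As a rough illustration of the skip-offset idea (not the actual
implementation), the offset is just the length of the root argument,
adjusted for a possibly missing trailing slash, so that matching
starts below the compared directory:

  #include <string.h>

  /* Number of leading bytes of a contained path to skip so that pathspec
   * matching sees "sub/file.c" rather than "dirA/sub/file.c". */
  static size_t skip_offset(const char *root)
  {
      size_t len = strlen(root);

      if (len && root[len - 1] != '/')
          len++;   /* account for the '/' separating root from entries */
      return len;
  }

  /* skip_offset("dirA")  -> 5, so "dirA/sub/file.c" matches as "sub/file.c"
   * skip_offset("dirA/") -> 5 as well, despite the trailing slash */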

Pass the pathspec object and the skip offsets into queue_diff, which
in turn must pass them along to read_directory_contents.

Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.

The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.

Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.

Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.

If we always passed DO_MATCH_LEADING_PATHSPEC, then we would
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic would see that 'a' matches the leading
part of the '**/' and accept this even though 'c' doesn't match.

To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.

Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22 14:20:11 -07:00
Justin Tobler 68cb0b5253 builtin/receive-pack: add option to skip connectivity check
During git-receive-pack(1), connectivity of the object graph is
validated to ensure that the received packfile does not leave the
repository in a broken state. This is done via git-rev-list(1) and
walking the objects, which can be expensive for large repositories.

Generally, this check is critical to prevent an incomplete received
packfile from corrupting a repository. Server operators, though, may
have additional knowledge about exactly how Git is being used on the
server side, which can be used to compute the connectivity of incoming
objects more efficiently.

For example, if it can be ensured that all objects in a repository are
connected and do not depend on any missing objects, the connectivity of
newly written objects can be checked by walking the object graph
containing only the new objects from the updated tips and identifying
the missing objects which represent the boundary between the new objects
and the repository. These boundary objects can be checked in the
canonical repository to ensure the new objects connect as expected and
thus avoid walking the rest of the object graph.

Git itself cannot make the guarantees required for such an optimization
as it is possible for a repository to contain an unreachable object that
references a missing object without the repository being considered
corrupt.

Introduce the --skip-connectivity-check option for git-receive-pack(1)
which bypasses this connectivity check to give more control to the
server-side. Note that without proper server-side validation of newly
received objects handled outside of Git, usage of this option risks
corrupting a repository.
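
If wired up via the parse-options API, the new switch boils down to
something like the following sketch (the variable name and help text
are illustrative, not necessarily what the patch uses):

  #include "parse-options.h"

  static int skip_connectivity_check;

  static const struct option receive_pack_options[] = {
      OPT_BOOL(0, "skip-connectivity-check", &skip_connectivity_check,
               "do not check connectivity of the received objects"),
      OPT_END()
  };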

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 11:43:36 -07:00
Junio C Hamano a9dcacbf2a Merge branch 'jk/oidmap-cleanup'
Code cleanup.

* jk/oidmap-cleanup:
  raw_object_store: drop extra pointer to replace_map
  oidmap: add size function
  oidmap: rename oidmap_free() to oidmap_clear()
2025-05-19 16:02:47 -07:00
Junio C Hamano 6660b42929 Merge branch 'ly/am-split-stgit-leakfix'
Leakfix.

* ly/am-split-stgit-leakfix:
  builtin/am: fix memory leak in `split_mail_stgit_series`
2025-05-19 16:02:46 -07:00
Karthik Nayak 9d2962a7c4 receive-pack: use batched reference updates
The reference updates performed as part of 'git-receive-pack(1)' take
place one at a time. For each reference update, a new transaction is
created and committed. This is necessary to ensure we can allow
individual updates to fail without failing the entire command. The
command also supports an 'atomic' mode, which uses a single transaction
to update all of the references. But this mode has an all-or-nothing
approach, where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in
'git-receive-pack(1)'. This provides a significant bump in performance,
especially when dealing with repositories with a large number of
references.

With the reftable backend there is an 18x performance improvement when
performing receive-pack with 10000 refs:

  Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      4.276 s ±  0.078 s    [User: 0.796 s, System: 3.318 s]
    Range (min … max):    4.185 s …  4.430 s    10 runs

  Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     235.4 ms ±   6.9 ms    [User: 75.4 ms, System: 157.3 ms]
    Range (min … max):   228.5 ms … 254.2 ms    11 runs

  Summary
    receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.21x performance
improvement:

  Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):      1.121 s ±  0.021 s    [User: 0.128 s, System: 0.975 s]
    Range (min … max):    1.097 s …  1.156 s    10 runs

  Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     927.9 ms ±  22.6 ms    [User: 99.0 ms, System: 815.2 ms]
    Range (min … max):   903.1 ms … 978.0 ms    10 runs

  Summary
    receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)

As using batched updates requires the error handling to be moved to the
end of the flow, create and use a 'struct strset' to track the failed
refs and attribute the correct errors to them.
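
A rough sketch of that bookkeeping, assuming the `strset` API from
strmap.h (the surrounding logic is heavily simplified and not the
actual receive-pack code):

  #include <stdio.h>
  #include "strmap.h"

  /* Record a ref that was rejected while the batched updates were
   * queued or applied ... */
  static void note_rejected(struct strset *failed, const char *refname)
  {
      strset_add(failed, refname);
  }

  /* ... and attribute the error only when the status report is produced
   * at the end of the flow. */
  static void report_ref(struct strset *failed, const char *refname)
  {
      if (strset_contains(failed, refname))
          fprintf(stderr, "ng %s failed to update ref\n", refname);
      else
          fprintf(stderr, "ok %s\n", refname);
  }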

This change also uncovers an issue when a client provides multiple
updates to the same reference. For example:

  $ git send-pack remote.git A:foo B:foo
  Enumerating objects: 3, done.
  Counting objects: 100% (3/3), done.
  Delta compression using up to 20 threads
  Compressing objects: 100% (2/2), done.
  Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done.
  Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
  remote: error: cannot lock ref 'refs/heads/foo': reference already exists
  To remote.git
   ! [remote rejected] A -> foo (failed to update ref)
   ! [remote failure]  B -> foo (remote failed to report status)

As you can see, the remote runs into an error because it cannot lock the
target reference for the second update. Furthermore, the remote complains
that the first update has been rejected whereas the second update didn't
receive any status update because we failed to lock it. Reading this status
message alone a user would probably expect that `foo` has not been updated
at all. But that's not the case: while we claim that the ref wasn't updated,
it surprisingly points to `A` now.

One could argue that this is merely an error in how we report the result of
this push. But ultimately, the user's request itself is already broken:
it doesn't make any sense in the first place and cannot ever lead to a
sensible outcome that honors the full request.

The conversion to batched transactions fixes the issue because we now try to
queue both updates in the same transaction. As such, the transaction itself
will notice this conflict and refuse the update altogether before we commit
any of the values.

Note that this requires changes to a couple of tests in t5408 that happened
to exercise this behaviour. Given that the generated output is misleading
and given that the user request cannot ever be fully honored this really
feels more like a bug than properly designed behaviour. As such, changing
the behaviour feels like the right thing to do.

Since reference updates are now batched, the 'reference-transaction'
hook will be invoked with all updates together. Currently git will 'die'
when the hook returns with a non-zero exit status in the 'prepared'
stage. For 'git-receive-pack(1)', this allowed users to reject an
individual reference update, git would have applied previous updates but
immediately abort further execution. This is definitely an incorrect
usage of this hook, since the right place to do this would be the
'update' hook. This patch retains the latter behavior, but
'reference-transaction' hook now changes to a all-or-nothing behavior
when a non-zero exit status is returned in the 'prepared' stage, since
batch updates use a transaction under the hood. This explains the change
in 't1416'.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19 11:06:32 -07:00
Karthik Nayak 0e358de64a fetch: use batched reference updates
The reference updates performed as part of 'git-fetch(1)' take place
one at a time. For each reference update, a new transaction is created
and committed. This is necessary to ensure we can allow individual
updates to fail without failing the entire command. The command also
supports an '--atomic' mode, which uses a single transaction to update
all of the references. But this mode has an all-or-nothing approach,
where if a single update fails, all updates would fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in 'git-fetch(1)'.
This provides a significant bump in performance, especially when dealing
with repositories with a large number of references.

Adding support for batched updates simply means modifying the flow to
also create a batch update transaction in the non-atomic case.

With the reftable backend there is a 22x performance improvement when
performing 'git-fetch(1)' with 10000 refs:

  Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
    Time (mean ± σ):      3.403 s ±  0.775 s    [User: 1.875 s, System: 1.417 s]
    Range (min … max):    2.454 s …  4.529 s    10 runs

  Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     154.3 ms ±  17.6 ms    [User: 102.5 ms, System: 56.1 ms]
    Range (min … max):   145.2 ms … 220.5 ms    18 runs

  Summary
    fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
     22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.25x performance
improvement:

  Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
    Time (mean ± σ):     605.5 ms ±   9.4 ms    [User: 117.8 ms, System: 483.3 ms]
    Range (min … max):   595.6 ms … 621.5 ms    10 runs

  Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
    Time (mean ± σ):     485.8 ms ±   4.3 ms    [User: 91.1 ms, System: 396.7 ms]
    Range (min … max):   477.6 ms … 494.3 ms    10 runs

  Summary
    fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
      1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)

With this we'll either be using a regular transaction or a batch update
transaction. This helps clean up some code which is no longer needed as
we'll now always have some type of 'ref_transaction' object being
propagated.

One big change is that earlier, each individual update would propagate
a failure, whereas now the `ref_transaction_for_each_rejected_update`
function is called at the end of the flow to capture the exit status for
'git-fetch(1)' and also to print F/D conflict errors. This does change
the order of the errors being printed, but the behavior stays the same.

Since transaction errors are now explicitly defined as part of
76e760b999 (refs: introduce enum-based transaction error types,
2025-04-08), utilize them and get rid of custom errors defined within
'builtin/fetch.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19 11:06:31 -07:00
Karthik Nayak b3de3832ce refs: add function to translate errors to strings
The commit 76e760b999 (refs: introduce enum-based transaction error
types, 2025-04-08) introduced enum-based transaction error types. The
refs transaction logic was also modified to propagate these errors. For
clients of the ref transaction system, it would be beneficial to provide
human-readable messages for these errors.

There is already an existing mapping in 'builtin/update-ref.c'; move it
to 'refs.c' as `ref_transaction_error_msg()` and use it from
'builtin/update-ref.c'.
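
Conceptually the helper is just a switch over the enum; a sketch with
hypothetical enum values and messages (the real names live in the refs
code, not here):

  enum ref_transaction_error {
      REF_TRANSACTION_ERROR_GENERIC,
      REF_TRANSACTION_ERROR_NAME_CONFLICT,
      REF_TRANSACTION_ERROR_CREATE_EXISTS,
  };

  const char *ref_transaction_error_msg(enum ref_transaction_error err)
  {
      switch (err) {
      case REF_TRANSACTION_ERROR_NAME_CONFLICT:
          return "refname conflicts with an existing ref";
      case REF_TRANSACTION_ERROR_CREATE_EXISTS:
          return "reference already exists";
      default:
          return "reference update failed";
      }
  }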

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-19 11:06:31 -07:00
Elijah Newren 29d7bf1951 merge-tree: add a new --quiet flag
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted.  For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit.  This option allows the merge machinery to, in the
outer layer of the merge:
    * exit early when a conflict is detected
    * avoid writing (most) merged blobs/trees to the object store

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 15:09:14 -07:00
Derrick Stolee c178b02e29 pack-objects: allow --shallow and --path-walk
There does not appear to be anything particularly incompatible about the
--shallow and --path-walk options of 'git pack-objects'. If shallow
commits are to be handled differently, then it is by the revision walk
that defines the commit set and which commits are interesting or
uninteresting.

However, before the previous change, a trivial removal of the warning
would cause a failure in t5500-fetch-pack.sh when
GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more
objects than we desired, due to some incorrect behavior of the path-walk
API, especially around walking uninteresting objects.

The recently-added tests in t5538-push-shallow.sh help to confirm this
behavior is working with the --path-walk option if
GIT_TEST_PACK_PATH_WALK is enabled. These tests passed previously due to
the --path-walk feature being disabled in the presence of a shallow
clone.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:41 -07:00
Derrick Stolee e5394794a5 pack-objects: thread the path-based compression
Adapting the implementation of ll_find_deltas(), create a threaded
version of the --path-walk compression step in 'git pack-objects'.

This involves adding a 'regions' member to the thread_params struct,
allowing each thread to own a section of paths. We can simplify the way
jobs are split because there is no value in extending the batch based on
name-hash the way sections of the object entry array are attempted to be
grouped. We re-use the 'list_size' and 'remaining' items for the purpose
of borrowing work in progress from other "victim" threads when a thread
has finished its batch of work more quickly.

Using the Git repository as a test repo, the p5313 performance test
shows that the resulting size of the repo is the same, but the threaded
implementation gives gains of varying degrees depending on the number of
objects being packed. (This was tested on a 16-core machine.)

Test                        HEAD~1      HEAD
---------------------------------------------------
5313.20: big pack             2.38      1.99 -16.4%
5313.21: big pack size       16.1M     16.0M  -0.2%
5313.24: repack             107.32     45.41 -57.7%
5313.25: repack size        213.3M    213.2M  -0.0%

(Test output is formatted to better fit in message.)

This ~60% reduction in 'git repack --path-walk' time is typical across
all repos I used for testing. What is interesting is to compare when the
overall time improves enough to outperform the --name-hash-version=1
case. These time improvements correlate with repositories with data
shapes that significantly improve their data size as well. The
--path-walk feature frequently takes longer than --name-hash-version=2,
trading some extra computation for some additional compression. The
natural place where this additional computation comes from is the two
compression passes that --path-walk takes, though the first pass is
naturally faster due to the path boundaries avoiding a number of delta
compression attempts.

For example, the microsoft/fluentui repo has significant size reduction
from --name-hash-version=1 to --name-hash-version=2 followed by further
improvements with --path-walk. The threaded computation makes
--path-walk more competitive in time compared to --name-hash-version=2,
though still ~31% more expensive in that metric.

Repack Method       Pack Size       Time
------------------------------------------
Hash v1                439.4M      87.24s
Hash v2                161.7M      21.51s
Path Walk (Before)     142.5M      81.29s
Path Walk (After)      142.5M      28.16s

Similar results hold for the Git repository:

Repack Method       Pack Size       Time
------------------------------------------
Hash v1                248.8M      30.44s
Hash v2                249.0M      30.15s
Path Walk (Before)     213.2M     142.50s
Path Walk (After)      213.3M      45.41s

...as well as the nodejs/node repository:

Repack Method       Pack Size       Time
------------------------------------------
Hash v1                739.9M      71.18s
Hash v2                764.6M      67.82s
Path Walk (Before)     698.1M     208.10s
Path Walk (After)      698.0M      75.10s

Finally, the Linux kernel repository is a good test for this repacking
time change, even though the space savings are more subtle:

Repack Method       Pack Size       Time
------------------------------------------
Hash v1                  2.5G     554.41s
Hash v2                  2.5G     549.62s
Path Walk (Before)       2.2G    1562.36s
Path Walk (After)        2.2G     559.00s

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:40 -07:00
Derrick Stolee 206a1bb203 pack-objects: refactor path-walk delta phase
Previously, the --path-walk option to 'git pack-objects' would compute
deltas inline with the path-walk logic. This would make the progress
indicator look like it was taking a long time to enumerate objects and
then computing deltas very quickly.

Instead of computing deltas on each region of objects organized by tree,
store a list of regions corresponding to these groups. These can later
be pulled from the list for delta compression before doing the "global"
delta search.

This presents a new progress indicator that can be used in tests to
verify that this stage is happening.

The current implementation is not integrated with threads, but we are
setting it up to arrive in the next change.

Since we do not attempt to sort objects by size until after exploring
all trees, we can remove the previous change to t5530 due to a different
error message appearing first.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:40 -07:00
Derrick Stolee 4f7f571204 pack-objects: enable --path-walk via config
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.

This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so it would be bad to enable on Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.

Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".

In order to test that this config works, add a new trace2 region around
the path walk code that can be checked by a 'git push' command.
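
A sketch of what such instrumentation typically looks like with the
trace2 API; the category and label strings here are placeholders, not
necessarily the ones the patch uses:

  #include "trace2.h"

  struct repository;

  static void run_path_walk(struct repository *repo)
  {
      trace2_region_enter("pack-objects", "path-walk", repo);
      /* ... walk batches of objects grouped by path ... */
      trace2_region_leave("pack-objects", "path-walk", repo);
  }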

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee 5f711504d9 repack: add --path-walk option
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

Add the --path-walk option to the performance tests in p5313.

For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:

Test                                                     this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk                      0.08(0.06+0.02)
5313.19: thin pack size with --path-walk                           18.4K
5313.20: big pack with --path-walk                       2.10(7.80+0.26)
5313.21: big pack size with --path-walk                            19.8M
5313.22: shallow fetch pack with --path-walk             1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk                        33.6M
5313.24: repack with --path-walk                         81.29(96.08+0.71)
5313.25: repack size with --path-walk                             142.5M

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

Along with the earlier tests in p5313, I'll instead reformat the
comparison as follows:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             439.4M      87.24s
Hash v2             161.7M      21.51s
Path Walk           142.5M      81.29s

There are a few things to notice here:

 1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
    significant, but --path-walk still compresses better than that
    option.

 2. The --path-walk command is still using --name-hash-version=1 for the
    second pass of delta computation, using the increased name hash
    collisions as a potential method for opportunistic compression on
    top of the path-focused compression.

 3. The --path-walk algorithm is currently sequential and does not use
    multiple threads for delta compression. Threading will be
    implemented in a future change so the computation time will improve
    to better compete in this metric.

There are small benefits in size for my copy of the Git repository:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             248.8M      30.44s
Hash v2             249.0M      30.15s
Path Walk           213.2M     142.50s

As well as in the nodejs/node repository [3]:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1             739.9M      71.18s
Hash v2             764.6M      67.82s
Path Walk           698.1M     208.10s

[3] https://github.com/nodejs/node

This benefit also repeats in my copy of the Linux kernel repository:

Repack Method    Pack Size       Time
---------------------------------------
Hash v1               2.5G     554.41s
Hash v2               2.5G     549.62s
Path Walk             2.2G    1562.36s

It is important to see that even when the repository shape does not have
many name-hash collisions, there is a slight space boost to be found
using this method.

As this repacking strategy was released in Git for Windows 2.47.0, some
users have reported cases where the --path-walk compression is slightly
worse than the --name-hash-version=2 option. In those cases, it may be
beneficial to combine the two options. However, there has not been a
released version of Git that has both options and I don't have access to
these repos for testing.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee 861d4bc292 pack-objects: introduce GIT_TEST_PACK_PATH_WALK
There are many tests that validate whether 'git pack-objects' works as
expected. Instead of duplicating these tests, add a new test environment
variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default
when specified.
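
The usual pattern for such a knob looks roughly like this (a sketch;
the exact integration point in pack-objects may differ):

  #include "config.h"   /* for git_env_bool() */

  /* An explicit --path-walk/--no-path-walk on the command line wins;
   * otherwise fall back to the test variable. -1 means "not given". */
  static int use_path_walk(int path_walk_opt)
  {
      if (path_walk_opt >= 0)
          return path_walk_opt;
      return git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
  }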

This was useful in testing the --path-walk implementation, helping to
find tests that are overly specific to the
default object walk. These include:

 - t0411-clone-from-partial.sh : One test fetches from a repo that does
   not have the boundary objects. This causes the path-based walk to
   fail. Disable the variable for this test.

 - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo
   without a boundary object.

 - t5310-pack-bitmaps.sh : One test compares the case when packing with
   bitmaps to the case when packing without them. Since we disable the
   test variable when writing bitmaps, this causes a difference in the
   object list (the --path-walk option adds an extra object). Specify
   --no-path-walk in both processes for the comparison. Another test
   checks for a specific delta base, but when computing dynamically
   without using bitmaps, the base object is too small to be considered
   in the delta calculations so no base is used.

 - t5316-pack-delta-depth.sh : This script cares about certain delta
   choices and their chain lengths. The --path-walk option changes how
   these chains are selected, and thus changes the results of this test.

 - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of
   the --sparse option and how it combines with --path-walk.

 - t5332-multi-pack-reuse.sh : This test verifies that the preferred
   pack is used for delta reuse when possible. The --path-walk option is
   not currently aware of the preferred pack at all, so it finds a
   different delta base.

 - t7406-submodule-update.sh : When using the variable, the --depth
   option collides with the --path-walk feature, resulting in a warning
   message. Disable the variable so this warning does not appear.

I want to call out one specific test change that is only temporary:

 - t5530-upload-pack-error.sh : One test cares specifically about an
   "unable to read" error message. Since the current implementation
   performs delta calculations within the path-walk API callback, a
   different "unable to get size" error message appears. When this
   is changed in a future refactoring, this test change can be reverted.

Similar to GIT_TEST_NAME_HASH_VERSION, we do not add this option to the
linux-TEST-vars CI build as that's already an overloaded build.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:39 -07:00
Derrick Stolee 9fcfe12ac4 pack-objects: update usage to match docs
The t0450 test script verifies that builtin usage matches the synopsis
in the documentation. Adjust the builtin to match and then remove 'git
pack-objects' from the exception list.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:38 -07:00
Derrick Stolee 70664d2865 pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons. First, this will cause
the "Enumerating objects" phase to be much longer than usual. Second, it
does not take advantage of threading during the path-scoped delta
calculations. Even with this lack of threading, the path-walk option is
sometimes faster than the usual approach. Future changes will refactor
this code to allow for threading, but that complexity is deferred until
later to keep this patch as simple as possible.

This new walk is incompatible with some features and is ignored by
others:

 * Object filters are not currently integrated with the path-walk API,
   such as sparse-checkout or tree depth. A blobless packfile could be
   integrated easily, but that is deferred for later.

 * Server-focused features such as delta islands, shallow packs, and
   using a bitmap index are incompatible with the path-walk API.

 * The path walk API is only compatible with the --revs option, not
   taking object lists or pack lists over stdin. These alternative ways
   to specify the objects currently ignore the --path-walk option
   without even a warning.

Future changes will create performance tests that demonstrate the power
of this approach.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:38 -07:00
Derrick Stolee 4bc0ba0829 pack-objects: extract should_attempt_deltas()
This will be helpful in a future change, which will reuse this logic.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:37 -07:00
Derrick Stolee efab7dc1f4 reset: integrate sparse index with --patch
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.

Move this check earlier within cmd_reset() to improve performance here.

Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:02:47 -07:00
Derrick Stolee 02ed8555f6 git add: make -p/-i aware of sparse index
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.

It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.

Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.

Add test cases that confirm that these interactive add options work with
the sparse index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:01:51 -07:00
Derrick Stolee 952de281fe apply: integrate with the sparse index
The sparse index allows storing directory entries in the index, marked
with the skip-worktree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.

This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.

The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.

The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:00:33 -07:00
Jeff King f710fd7b49 hash-object: handle --literally with OPT_NEGBIT
Since we recently removed the hash_literally() function, the hash-object
--literally option has been simplified to just removing the
INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool,
we can just have the option parser remove the bit from the set of flags
directly. This simplifies the helper functions.
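
OPT_NEGBIT clears the given bits whenever the option is passed, so the
shape is roughly the following (the flag constant is a stand-in so the
sketch is self-contained; the real code uses INDEX_FORMAT_CHECK):

  #include "parse-options.h"

  #define FORMAT_CHECK (1 << 0)   /* stand-in for INDEX_FORMAT_CHECK */

  static int index_flags = FORMAT_CHECK;

  static const struct option hash_object_options[] = {
      /* --literally simply drops the format-check bit from the flags. */
      OPT_NEGBIT(0, "literally", &index_flags,
                 "hash the contents without format checks", FORMAT_CHECK),
      OPT_END()
  };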

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King 931e5ca507 hash-object: merge HASH_* and INDEX_* flags
The hash-object command has its own custom flag bits that it sets based
on command-line options. But since we dropped hash_literally() in the
previous commit, the only thing we do with those flag bits is convert
them directly into "index_flags" to pass to index_fd().

This extra layer of indirection makes the code harder to read and reason
about. Let's just use the INDEX_* flags directly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King 65a6a79b42 hash-object: stop allowing unknown types
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).

Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.

We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").

The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).

We'll be able to clean up these code paths a bit more in subsequent
patches.

The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King 4ae0e9423c fsck: stop using object_info->type_name strbuf
When fsck-ing a loose object, we use object_info's type_name strbuf to
record the parsed object type as a string. For most objects this is
redundant with the object_type enum, but it does let us report the
string when we encounter an object with an unknown type (for which there
is no matching enum value).

There are a few downsides, though:

  1. The code to report these cases is not actually robust. Since we did
     not pass a strbuf to unpack_loose_header(), we only retrieved types
     from headers up to 32 bytes. In longer cases, we'd simply say
     "object corrupt or missing".

  2. This is the last caller that uses object_info's type_name strbuf
     support. It would be nice to refactor it so that we can simplify
     that code.

  3. Likewise, we'll check the hash of the object using its unknown type
     (again, as long as that type is short enough). That depends on the
     hash_object_file_literally() code, which we'd eventually like to
     get rid of.

So we can simplify things by bailing immediately in read_loose_object()
when we encounter an unknown type. This has a few user-visible effects:

  a. Instead of producing a single line of error output like this:

       error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6

     we'll now issue two lines (the first from read_loose_object() when
     we see the unparsable header, and the second from the fsck code,
     since we couldn't read the object):

       error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
       error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6

     This is a little more verbose, but this sort of error should be
     rare (such objects are almost impossible to work with, and cannot
     be transferred between repositories as they are not representable
     in packfiles). And as a bonus, reporting the broken header in full
     could help with debugging other cases (e.g., a header like "blob
     xyzzy\0" would fail in parsing the size, but previously we'd not
     have shown the offending bytes).

  b. An object with an unknown type will be reported as corrupt, without
     actually doing a hash check. Again, I think this is unlikely to
     matter in practice since such objects are totally unusable.

We'll update one fsck test to match the new error strings. And we can
remove another test that covered the case of an object with an unknown
type _and_ a hash corruption. Since we'll skip the hash check now in
this case, the test is no longer interesting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:10 -07:00
Jeff King aac2abeca7 cat-file: use type enum instead of buffer for -t option
Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is
no need to pass a strbuf into oid_object_info_extended() to record the
type. The regular object_type enum is sufficient to capture all of the
types we will allow.

This simplifies the code a bit, and will eventually let us drop
object_info's type_name strbuf support.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:10 -07:00
Jeff King f227fc7d43 cat-file: make --allow-unknown-type a noop
The cat-file command has some minor support for handling objects with
"unknown" types. I.e., strings that are not "blob", "commit", "tree", or
"tag".

In theory this could be used for debugging or experimenting with
extensions to Git. But in practice this support is not very useful:

  1. You can get the type and size of such objects, but nothing else.
     Not even the contents!

  2. Only loose objects are supported, since packfiles use numeric ids
     for the types, rather than strings.

  3. Likewise you cannot ever transfer objects between repositories,
     because they cannot be represented in the packfiles used for the
     on-the-wire protocol.

The support for these unknown types complicates the object-parsing code,
and has led to bugs such as b748ddb7a4 (unpack_loose_header(): fix
infinite loop on broken zlib input, 2025-02-25). So let's drop it.

The first step is to remove the user-facing parts, which are accessible
only via cat-file. This is technically backwards-incompatible, but given
the limitations listed above, these objects couldn't possibly be useful
in any workflow.

However, we can't just rip out the option entirely. That would hurt a
caller who ran:

  git cat-file -t --allow-unknown-type <oid>

and fed it normal, well-formed objects. There --allow-unknown-type was
doing nothing, but we wouldn't want to start bailing with an error. So
to protect any such callers, we'll retain --allow-unknown-type as a
noop.

The code change is fairly small (but we'll be able to clean up more code in
follow-on patches). The test updates drop any use of the option. We
still retain tests that feed the broken objects to cat-file without
--allow-unknown-type, as we should continue to confirm that those
objects are rejected. Note that in one spot we can drop a layer of loop,
re-indenting the body; viewing the diff with "-w" helps there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:09 -07:00
Junio C Hamano 4dda60c9df Merge branch 'ps/maintenance-missing-tasks'
Make repository clean-up tasks "gc" can do available to "git
maintenance" front-end.

* ps/maintenance-missing-tasks:
  builtin/maintenance: introduce "rerere-gc" task
  builtin/gc: move rerere garbage collection into separate function
  builtin/maintenance: introduce "worktree-prune" task
  builtin/gc: move pruning of worktrees into a separate function
  builtin/gc: remove global variables where it is trivial to do
  builtin/gc: fix indentation of `cmd_gc()` parameters
2025-05-15 17:24:56 -07:00
Johannes Schindelin 6c91162449 fetch: avoid unnecessary work when there is no current branch
As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
case `branch_has_merge_config()` would return early, but we can even
avoid enumerating the refs prefixes in that case, saving even more CPU
cycles.

Technically, we should enclose these two statements in an `if (branch)
{...}` block, but the indentation is already quite deep, therefore I
refrained from doing that.
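
A sketch of the resulting short-circuit (simplified; the helper called
inside the condition is a made-up stand-in for the real fetch code):

  #include "remote.h"

  /* Stand-in for the code that derives ref prefixes from the current
   * branch's branch.<name>.merge configuration. */
  static void add_merge_config_prefixes(struct branch *branch);

  static void maybe_add_merge_config_prefixes(void)
  {
      struct branch *branch = branch_get(NULL);

      /* No current branch: nothing to look up, no prefixes to enumerate. */
      if (branch && branch_has_merge_config(branch))
          add_merge_config_prefixes(branch);
  }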

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-15 13:46:47 -07:00
Johannes Schindelin c607410ada fetch: carefully clear local variable's address after use
As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs. Yet this is
exactly what happens with the `acked_commits` attribute that is used in
`cmd_fetch()`: The pointer to a local variable is assigned to it.

Now, it is Git's convention that `cmd_*()` functions essentially return
only just before the process exits, and therefore there is
little danger that this attribute is used after the code flow returns
from that function.

However, code in `cmd_*()` functions is often so useful that it gets
lifted into a library function, at which point this issue could become a
real problem.

Let's make sure to clear the `acked_commits` attribute out after it was
used, and before the function returns (at which point the address would
go stale).
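
In schematic form (names and types here are simplified stand-ins, not
the exact fetch code), the guarded-against pattern and the fix look
like this:

  #include "oidset.h"

  struct transport_opts_like {
      struct oidset *acked_commits;   /* may point into a caller's stack */
  };

  static int cmd_like_function(struct transport_opts_like *opts)
  {
      struct oidset acked = OIDSET_INIT;

      opts->acked_commits = &acked;    /* address of a local variable */
      /* ... perform the fetch and consume the acked commits ... */
      oidset_clear(&acked);
      opts->acked_commits = NULL;      /* don't leave a stale stack address */
      return 0;
  }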

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-15 13:46:45 -07:00
Johannes Schindelin 131a8fa815 commit: simplify code
The difference of two unsigned integers is defined to be unsigned, and
therefore it is misleading to check whether it is greater than zero
(instead, the more natural way would be to check whether the difference
is zero or not).

Let's instead avoid the subtraction altogether, and compare the two
operands directly, which makes the code more obvious as a side effect.
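
For example (plain C, not the actual commit.c code):

  #include <stdio.h>

  int main(void)
  {
      unsigned int a = 2, b = 5;

      /* Misleading: the difference wraps around, so this branch is taken
       * even though a < b. */
      if (a - b > 0)
          printf("a - b \"is positive\" (it is actually %u)\n", a - b);

      /* Obvious and correct: compare the operands directly. */
      if (a > b)
          printf("a is greater than b\n");
      return 0;
  }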

Pointed out by CodeQL's rule with the ID
`cpp/unsigned-difference-expression-compared-zero`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-15 13:46:44 -07:00
Elijah Newren d2c3e94a0a replay: replace the_repository with repo parameter passed to cmd_replay ()
Replace the_repository everywhere with repo, feed repo from cmd_replay()
to all the other functions in the file that need it, and remove the
UNUSED annotation on repo.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14 15:00:49 -07:00
Junio C Hamano 07572f220a whatchanged: remove when built with WITH_BREAKING_CHANGES
As we made "git whatchanged" require "--i-still-use-this" and asked
the users to report if they still want to use it, the logical next
step is to allow us to build Git without "whatchanged" to prepare for
its eventual removal.

If we were to follow the pattern established in 8ccc75c2 (remote:
announce removal of "branches/" and "remotes/", 2025-01-22), we can
do this together with the documentation update to officially list
that the command will be removed in the BreakingChanges document,
but let's just keep the changes separate just in case we want to
proceed a bit slower.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 15:30:12 -07:00
Junio C Hamano 731a2c7dda whatchanged: require --i-still-use-this
The documentation of "git whatchanged" is pretty explicit that the
command was retained for historical reasons to help those whose fingers
cannot be retrained.  Let's see if they still are finding it hard to
type "git log --raw" instead of "git whatchanged" by marking the
command as "nominated for removal", and require "--i-still-use-this"
on the command line.  Adjust the tests so that the option is passed
when we invoke the command.  In addition, we test that the command
fails when "--i-still-use-this" is not given.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 15:29:37 -07:00
Junio C Hamano 6dbc41631d Merge branch 'ds/fix-thin-fix'
"git index-pack --fix-thin" used to abort to prevent a cycle in
delta chains from forming in a corner case even when there is no
such cycle.

* ds/fix-thin-fix:
  index-pack: allow revisiting REF_DELTA chains
  t5309: create failing test for 'git index-pack'
  test-tool: add pack-deltas helper
2025-05-12 14:22:49 -07:00
Junio C Hamano bd99d6e8db Merge branch 'ps/object-store-cleanup'
Further code clean-up in the object-store layer.

* ps/object-store-cleanup:
  object-store: drop `repo_has_object_file()`
  treewide: convert users of `repo_has_object_file()` to `has_object()`
  object-store: allow fetching objects via `has_object()`
  object-store: move function declarations to their respective subsystems
  object-store: move and rename `odb_pack_keep()`
  object-store: drop `loose_object_path()`
  object-store: move `struct packed_git` into "packfile.h"
2025-05-12 14:22:49 -07:00
Junio C Hamano 4511d56e1a you-still-use-that??: help deprecating commands for removal
Commands slated for removal like "git pack-redundant" now require
an explicit "--i-still-use-this" option to run.  This is to
discourage casual use and surface their pending deprecation to
users.

The warning message is long, so factor it into a helper function
you_still_use_that() to simplify reuse by other commands.

Also add a missing test to ensure this enforcement works for
"pack-redundant".

Helped-by: Elijah Newren <newren@gmail.com>
[en: log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 13:11:43 -07:00
Jeff King 2744646834 oidmap: rename oidmap_free() to oidmap_clear()
This function does not free the oidmap struct itself; it just drops all
items from the map (using hashmap_clear_() internally). It should be
called oidmap_clear(), per CodingGuidelines.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 13:06:26 -07:00
Lidong Yan e5dd0a05ed builtin/am: fix memory leak in `split_mail_stgit_series`
In builtin/am.c:split_mail_stgit_series, if `fopen` fails, the
`series_dir_buf` allocated by `xstrdup` will leak. Adding `free` in the
`!fp` branch prevents the leak.
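
In schematic form the fix is simply the following (surrounding logic
and names trimmed down from builtin/am.c):

  char *series_dir_buf = xstrdup(series_path);
  FILE *fp = fopen(series_path, "r");

  if (!fp) {
      free(series_dir_buf);   /* previously leaked on this early return */
      return -1;              /* error handling simplified for the sketch */
  }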

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 10:28:16 -07:00
Junio C Hamano 0730906043 Merge branch 'ps/mv-contradiction-fix'
"git mv a a/b dst" would ask to move the directory 'a' itself, as
well as its contents, into a single destination directory, which is
a contradictory request that is impossible to satisfy. This case is
now detected and the command errors out.

* ps/mv-contradiction-fix:
  builtin/mv: convert assert(3p) into `BUG()`
  builtin/mv: bail out when trying to move child and its parent
2025-05-08 12:36:32 -07:00
Patrick Steinhardt 283621a553 builtin/maintenance: introduce "rerere-gc" task
While git-gc(1) knows to garbage collect the rerere cache,
git-maintenance(1) does not yet have a task for this cleanup. Introduce
a new "rerere-gc" task to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 10:50:15 -07:00
Patrick Steinhardt 255251cce1 builtin/gc: move rerere garbage collection into separate function
In a subsequent commit we are going to introduce a new "rerere-gc" task
for git-maintenance(1). To prepare for this, refactor the code that
spawns `git rerere gc` into a separate function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 10:50:15 -07:00
Patrick Steinhardt ec31474656 builtin/maintenance: introduce "worktree-prune" task
While git-gc(1) knows to prune stale worktrees, git-maintenance(1) does
not yet have a task for this cleanup. Introduce a new "worktree-prune"
task to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 10:50:14 -07:00
Patrick Steinhardt ae76c1c990 builtin/gc: move pruning of worktrees into a separate function
In a subsequent commit we will introduce a new "worktree-prune" task for
git-maintenance(1). To prepare for this, refactor the code that spawns
`git worktree prune` into a separate function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 10:50:14 -07:00
Patrick Steinhardt e3a69d72b1 builtin/gc: remove global variables where it is trivial to do
We use a couple of global variables to assemble command line arguments
for subprocesses we execute in git-gc(1). All of these variables except
the one for git-repack(1) are only used in a single place though, so
they don't really add anything but confusion.

Remove those variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 10:50:13 -07:00
Patrick Steinhardt 58f62837fb builtin/gc: fix indentation of `cmd_gc()` parameters
The parameters of `cmd_gc()` aren't indented properly. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 10:50:13 -07:00
Patrick Steinhardt 974f0d4664 builtin/mv: convert assert(3p) into `BUG()`
The use of asserts is discouraged in our codebase because they lead to
different behaviour depending on how Git is built. If one is unsure
enough about whether a condition always holds to add an assert for it,
then that assert should probably trigger regardless of how Git is
built.

Drop the call to assert(3p) in git-mv(1) and instead use `BUG()`.
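
The shape of such a conversion (the condition shown is illustrative):

  /* before */
  assert(pos >= 0);

  /* after */
  if (pos < 0)
          BUG("position must not be negative");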

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-30 15:22:04 -07:00
Patrick Steinhardt 8583c9dcbc builtin/mv: bail out when trying to move child and its parent
We have a known issue in git-mv(1) where moving both a child and any of
its parents causes an assert to trigger because the child cannot be
found anymore in the index. We have added a test for this in commit
0fcd473fdd (t7001: add failure test which triggers assertion,
2024-10-22) without addressing the issue, which is why the test itself
is marked as `test_expect_failure`.

The behaviour of that test relies on a call to assert(3p) though, which
may or may not be compiled into the resulting binary depending on
whether or not we pass `-DNDEBUG`. When these asserts are compiled into
Git, this may cause our CI to hang on Windows, because asserts may
cause a modal window to be shown.

While we could work around the issue by converting this into a call to
`BUG()`, let's rather address the root cause of the issue by bailing out
in case we see that both a child and any of its parents are being moved
in the same command.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-30 15:05:15 -07:00
Junio C Hamano 0c9d6b7ced Merge branch 'jh/gc-launchctl-schedule-fix'
Fix for scheduled maintenance tasks on platforms using launchctl.

* jh/gc-launchctl-schedule-fix:
  maintenance: fix launchctl calendar intervals
2025-04-29 14:21:29 -07:00
Junio C Hamano 5a6de390d8 Merge branch 'az/tighten-string-array-constness'
Code clean-up.

* az/tighten-string-array-constness:
  global: mark usage strings and string tables const
2025-04-29 14:21:28 -07:00
Junio C Hamano a501213402 Merge branch 'ua/call-repo-config-with-possibly-null-repository'
Since repo_config() can be called with repo set to NULL these days,
a command that is marked as RUN_SETUP in the builtin command table
does not have to check whether repo is NULL before making the call.

* ua/call-repo-config-with-possibly-null-repository:
  builtin/difftool: remove unnecessary if statement
  builtin/add: remove unnecessary if statement
2025-04-29 14:21:27 -07:00
Patrick Steinhardt 062b914c84 treewide: convert users of `repo_has_object_file()` to `has_object()`
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:

  - The new function has a short-and-sweet name.

  - More explicit defaults: by default, `has_object()` doesn't fetch
    missing objects via promisor remotes, nor does it reload packfiles
    if an object wasn't found. This ensures that it becomes
    immediately obvious when a simple object existence check may result
    in expensive actions.

Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.

Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:

  - `repo_has_object_file(...)` is equivalent to
    `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
    is equivalent to `has_object(..., 0)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
    is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
    is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.

The replacements should be functionally equivalent.
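
As an illustration, a call site relying on the old implicit defaults
would be converted like this (the surrounding code is made up; the
flag names are the ones listed above):

  /* before: rechecks packfiles and fetches from promisor remotes */
  if (!repo_has_object_file(repo, oid))
          die("object %s is missing", oid_to_hex(oid));

  /* after: the same behaviour, spelled out explicitly */
  if (!has_object(repo, oid,
                  HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR))
          die("object %s is missing", oid_to_hex(oid));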

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 10:08:13 -07:00
Patrick Steinhardt 1a793261c5 object-store: move function declarations to their respective subsystems
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 10:08:12 -07:00
Patrick Steinhardt 0b8ed25b66 object-store: move and rename `odb_pack_keep()`
The function `odb_pack_keep()` creates a file at the passed-in path. If
this fails, then the function re-tries by first creating any potentially
missing leading directories and then trying to create the file once
again. As such, this function doesn't host any kind of logic that is
specific to the object store, but is rather a generic helper function.

Rename the function to `safe_create_file_with_leading_directories()` and
move it into "path.c". While at it, refactor it so that it loses its
dependency on `the_repository`.
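
The pattern it implements is roughly the following (a simplified
sketch; `make_leading_dirs()` stands in for whatever helper creates
the missing parent directories):

  static int create_keep_file(const char *path)
  {
          int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
          if (fd < 0 && errno == ENOENT) {
                  /* create missing leading directories, then retry */
                  if (make_leading_dirs(path) < 0)
                          return -1;
                  fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
          }
          return fd;
  }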

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 10:08:12 -07:00
Derrick Stolee 98f8854c94 index-pack: allow revisiting REF_DELTA chains
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and in fact that caution leads to failures even when there is
no cycle.

This change adjusts the algorithm so that it no longer fails in these
cases: the cycle cases no longer fail, and, more importantly, neither
do the valid cases. The resulting packfile from the --fix-thin
operation will not contain cycles either, since REF_DELTAs are
forbidden in the on-disk format and OFS_DELTAs are impossible to write
as a cycle.

The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.

Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.

However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.

This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.

This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().

The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.

The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.
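
Conceptually, the adjusted scan looks something like the following
(heavily simplified pseudocode; the real loop in
threaded_second_pass() manages much more state, and the helper name is
made up):

  for (i = 0; i < nr_children; i++) {
          struct object_entry *child = children[i];

          /*
           * A child whose real_type is no longer OBJ_REF_DELTA was
           * already resolved via another copy of its base; skip it
           * instead of dying.
           */
          if (child->real_type != OBJ_REF_DELTA)
                  continue;
          resolve_child_against(parent, child);
  }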

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-28 15:37:26 -07:00
Junio C Hamano 028c43269e Merge branch 'rj/build-tweaks'
Various build tweaks, including CSPRNG selection on some platforms.

* rj/build-tweaks:
  config.mak.uname: set CSPRNG_METHOD to getrandom on Linux
  config.mak.uname: add arc4random to the cygwin build
  config.mak.uname: add sysinfo() configuration for cygwin
  builtin/gc.c: correct RAM calculation when using sysinfo
  config.mak.uname: add clock_gettime() to the cygwin build
  config.mak.uname: add HAVE_GETDELIM to the cygwin section
  config.mak.uname: only set NO_REGEX on cygwin for v1.7
  config.mak.uname: add a note about NO_STRLCPY for Linux
  Makefile: remove NEEDS_LIBRT build variable
  meson.build: set default help format to html on windows
  meson.build: only set build variables for non-default values
  Makefile: only set some BASIC_CFLAGS when RUNTIME_PREFIX is set
  meson.build: remove -DCURL_DISABLE_TYPECHECK
2025-04-24 17:25:34 -07:00
Junio C Hamano 2bc5414c41 Merge branch 'ps/parse-options-integers'
Update the parse-options API to catch mistakes of passing the address
of an integral variable of the wrong type/size.

* ps/parse-options-integers:
  parse-options: detect mismatches in integer signedness
  parse-options: introduce precision handling for `OPTION_UNSIGNED`
  parse-options: introduce precision handling for `OPTION_INTEGER`
  parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
  parse-options: support unit factors in `OPT_INTEGER()`
  global: use designated initializers for options
  parse: fix off-by-one for minimum signed values
2025-04-24 17:25:34 -07:00
Junio C Hamano 36d8035d27 Merge branch 'ps/object-file-cleanup'
Code clean-up.

* ps/object-file-cleanup:
  object-store: merge "object-store-ll.h" and "object-store.h"
  object-store: remove global array of cached objects
  object: split out functions relating to object store subsystem
  object-file: drop `index_blob_stream()`
  object-file: split up concerns of `HASH_*` flags
  object-file: split out functions relating to object store subsystem
  object-file: move `xmmap()` into "wrapper.c"
  object-file: move `git_open_cloexec()` to "compat/open.c"
  object-file: move `safe_create_leading_directories()` into "path.c"
  object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-24 17:25:33 -07:00
Junio C Hamano d61ff9c237 Merge branch 'ps/object-file-cleanup' into ps/object-store-cleanup
* ps/object-file-cleanup:
  object-store: merge "object-store-ll.h" and "object-store.h"
  object-store: remove global array of cached objects
  object: split out functions relating to object store subsystem
  object-file: drop `index_blob_stream()`
  object-file: split up concerns of `HASH_*` flags
  object-file: split out functions relating to object store subsystem
  object-file: move `xmmap()` into "wrapper.c"
  object-file: move `git_open_cloexec()` to "compat/open.c"
  object-file: move `safe_create_leading_directories()` into "path.c"
  object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-24 11:37:21 -07:00
Junio C Hamano 29860f3282 Merge branch 'ja/doc-reset-mv-rm-markup-updates'
Doc mark-up updates.

* ja/doc-reset-mv-rm-markup-updates:
  doc: add markup for characters in Guidelines
  doc: fix asciidoctor synopsis processing of triple-dots
  doc: convert git-mv to new documentation format
  doc: move synopsis git-mv commands in the synopsis section
  doc: convert git-rm to new documentation format
  doc: fix synopsis analysis logic
  doc: convert git-reset to new documentation format
2025-04-23 13:58:51 -07:00
Josh Heinrichs eb2d7beb0e maintenance: fix launchctl calendar intervals
When using the launchctl scheduler, the weekly job runs daily, and the
daily job runs on the first six days of each month. This appears to be
due to specifying "Day" in the calendar intervals, which according to
launchd.plist(5) is for specifying days of the month rather than days of
the week. The behaviour of running a job on the 0th day is undocumented,
but in my testing appears to be the same as not specifying "Day" in the
calendar interval, in which case the job will run daily.

Use "Weekday" in the calendar intervals, which is the correct way to
schedule jobs to run on specific days of the week.
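
For reference, a weekly interval then looks roughly like this in the
generated plist (the values shown are illustrative):

  <key>StartCalendarInterval</key>
  <array>
    <dict>
      <key>Weekday</key><integer>0</integer>
      <key>Hour</key><integer>0</integer>
      <key>Minute</key><integer>0</integer>
    </dict>
  </array>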

Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-23 12:58:52 -07:00
Ahelenia Ziemiańska 86eef3541e global: mark usage strings and string tables const
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-21 21:01:19 -07:00
Usman Akinyemi b502a648ef builtin/difftool: remove unnecessary if statement
Since f29f1990b5 (config: teach repo_config to allow `repo` to be
NULL, 2025-03-08) already taught `repo_config()` to accept a NULL
`repo`, there is no need to check whether `repo` is NULL before
calling `repo_config()`.
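
The change therefore boils down to dropping the guard; its shape is
roughly the following (the callback name is a stand-in):

  /* before */
  if (repo)
          repo_config(repo, difftool_config_cb, NULL);

  /* after */
  repo_config(repo, difftool_config_cb, NULL);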

Suggested-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-20 14:17:22 -07:00
Usman Akinyemi 2e4e439ec2 builtin/add: remove unnecessary if statement
Since f29f1990b5 (config: teach repo_config to allow `repo` to be
NULL, 2025-03-08) already taught `repo_config()` to accept a NULL
`repo`, there is no need to check whether `repo` is NULL before
calling `repo_config()`.

Suggested-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-20 14:17:20 -07:00
Junio C Hamano 72801dfde1 Merge branch 'ua/update-update-server-info'
Code simplification.

* ua/update-update-server-info:
  builtin/update-server-info: remove unnecessary if statement
2025-04-17 10:28:19 -07:00
Junio C Hamano c3ebf18eb2 Merge branch 'en/merge-recursive-debug'
Remove remnants of the recursive merge strategy backend, which was
superseded by the ort merge strategy.

* en/merge-recursive-debug:
  builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
  tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm
  merge-recursive.[ch]: thoroughly debug these
  merge, sequencer: switch recursive merges over to ort
  sequencer: switch non-recursive merges over to ort
  merge-ort: enable diff-algorithms other than histogram
  builtin/merge-recursive: switch to using merge_ort_generic()
  checkout: replace merge_trees() with merge_ort_nonrecursive()
2025-04-17 10:28:18 -07:00
Junio C Hamano fe7ae3b87e Merge branch 'kn/blame-porcelain-unblamable'
"git blame --porcelain" mode now talks about unblamable lines and
lines that are blamed to an ignored commit.

* kn/blame-porcelain-unblamable:
  blame: print unblamable and ignored commits in porcelain mode
2025-04-17 10:28:18 -07:00
Junio C Hamano b45113f581 Merge branch 'jk/fetch-follow-remote-head-fix'
"git fetch [<remote>]" with only the configured fetch refspec
should be the only thing to update refs/remotes/<remote>/HEAD,
but the code was overly eager to do so in other cases.

* jk/fetch-follow-remote-head-fix:
  fetch: make set_head() call easier to read
  fetch: don't ask for remote HEAD if followRemoteHEAD is "never"
  fetch: only respect followRemoteHEAD with configured refspecs
2025-04-17 10:28:17 -07:00
Patrick Steinhardt 791aeddfa2 parse-options: detect mismatches in integer signedness
It was reported that "t5620-backfill.sh" fails on s390x and sparc64 in a
test that exercises the "--min-batch-size" command line option. The
symptom was that the option didn't seem to have an effect: we didn't
fetch objects with a batch size of 20, but instead fetched all objects
at once.

As it turns out, the root cause is that `--min-batch-size` uses
`OPT_INTEGER()` to parse the command line option. While this macro
expects the caller to pass a pointer to an integer, we instead pass a
pointer to a `size_t`. This coincidentally works on most platforms, but
it breaks on the mentioned platforms because they are big endian.

This issue isn't specific to git-backfill(1): there are a couple of
other places where we have the same type confusion going on. This
indicates that the issue really is the interface that the parse-options
subsystem provides -- it is simply too easy to get this wrong as there
isn't any kind of compiler warning, and things just work on the most
common systems.

Address the systemic issue by introducing two new build asserts
`BARF_UNLESS_SIGNED()` and `BARF_UNLESS_UNSIGNED()`. As their names
already hint, those macros will cause a compiler error when passed a
value that is not signed or unsigned, respectively.

Adapt `OPT_INTEGER()`, `OPT_UNSIGNED()` as well as `OPT_MAGNITUDE()` to
use those asserts. This uncovers a small set of sites where we indeed
have the same bug as in git-backfill(1). Adapt all of them to use the
correct option.
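
One way such an assert can be spelled (a sketch of the idea, not
necessarily the exact macros that were added) is to check the sign of
-1 converted to the variable's type at compile time:

  /* evaluates to 0, but fails to compile for the wrong signedness */
  #define BARF_UNLESS_SIGNED(var) \
          BUILD_ASSERT_OR_ZERO(((typeof(var))-1) < 0)
  #define BARF_UNLESS_UNSIGNED(var) \
          BUILD_ASSERT_OR_ZERO(((typeof(var))-1) > 0)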

Reported-by: Todd Zullinger <tmz@pobox.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:16 -07:00
Patrick Steinhardt 09705696f7 parse-options: introduce precision handling for `OPTION_INTEGER`
The `OPTION_INTEGER` option type accepts a signed integer. The type of
the underlying integer is a simple `int`, which restricts the range of
values accepted by such options. But there is a catch: the caller
provides a pointer to the value via the `.value` field, which is a
simple void pointer. This has two consequences:

  - There is no check whether the passed value is sufficiently long to
    store the entire range of `int`. This can lead to integer wraparound
    in the best case and out-of-bounds writes in the worst case.

  - Even when a caller knows that they want to store a value larger than
    `INT_MAX` they don't have a way to do so.

In practice this doesn't tend to be a huge issue because users typically
don't end up passing huge values to most commands. But the parsing logic
is demonstrably broken, and it is too easy to get the calling convention
wrong.

Improve the situation by introducing a new `precision` field into the
structure. This field gets assigned automatically by `OPT_INTEGER_F()`
and tracks the size of the passed value. This makes it possible for
the caller to pass arbitrarily-sized integers, and the underlying
logic knows to handle them correctly by doing range checks.
Furthermore, convert the code to use `strtoimax()` instead of
`strtol()` so that we
can also parse values larger than `LONG_MAX`.

Note that we do not yet assert signedness of the passed variable, which
is another source of bugs. This will be handled in a subsequent commit.
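
The principle can be sketched like this (illustrative only; the real
code also range-checks the parsed value against the target precision
before storing it):

  char *end;
  intmax_t v = strtoimax(arg, &end, 10); /* plus error checking */

  switch (opt->precision) {
  case 1: *(int8_t *) opt->value = v; break;
  case 2: *(int16_t *)opt->value = v; break;
  case 4: *(int32_t *)opt->value = v; break;
  case 8: *(int64_t *)opt->value = v; break;
  default: BUG("invalid precision for option");
  }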

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:15 -07:00
Patrick Steinhardt 785c17df78 parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
With the preceding commit, `OPT_INTEGER()` has learned to support unit
factors. Consequently, the major difference between `OPT_INTEGER()` and
`OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of
them do support them now. Instead, the difference is that one handles
signed and the other handles unsigned integers.

Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to
`OPT_UNSIGNED()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:15 -07:00
Patrick Steinhardt d012ceb5f3 global: use designated initializers for options
While we expose macros for most of our different option types understood
by the "parse-options" subsystem, not every combination of fields that
has one as that would otherwise quickly lead to an explosion of macros.
Instead, we just initialize structures manually for those variants of
fields that don't have a macro.

Callsites that open-code these structure initializations don't use
designated initializers though and instead just provide values for each
of the fields that they want to initialize. This has three significant
downsides:

  - Callsites need to specify all values up to the last field that they
    care about. This often includes fields that should simply be left at
    their default zero-initialized state, which adds distraction.

  - Any reader not deeply familiar with the layout of the structure
    has a hard time figuring out what the respective initializers mean.

  - Reordering or introducing new fields in the middle of the structure
    is impossible without adapting all callsites.

Convert all sites to instead use designated initializers, which we have
started using in our codebase quite a while ago. This allows us to skip
any default-initialized fields, gives the reader context by specifying
the field names and allows us to reorder or introduce new fields where
we want to.
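
As a simple illustration (the option shown is made up; the field names
are those of `struct option`):

  /* before: positional, the reader must know the field order */
  { OPTION_CALLBACK, 'x', "example", &value, NULL,
    N_("an example option"), PARSE_OPT_NOARG, example_callback },

  /* after: designated, default-initialized fields simply omitted */
  {
          .type = OPTION_CALLBACK,
          .short_name = 'x',
          .long_name = "example",
          .value = &value,
          .help = N_("an example option"),
          .flags = PARSE_OPT_NOARG,
          .callback = example_callback,
  },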

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17 08:15:15 -07:00
Ramsay Jones c9a51775a3 builtin/gc.c: correct RAM calculation when using sysinfo
The man page for sysinfo(2) on Linux states that (from v2.3.48) the
sizes of the memory and swap fields, of the returned structure, are
given as multiples of 'mem_unit' bytes. In earlier versions (prior to
v2.3.23 on i386 in particular), the 'mem_unit' field was not part of
the structure, and all sizes were measured in bytes. The man page does
not discuss the motivation for this change, but it is possible that the
change was intended for the, relatively rare, 32-bit platform with more
than 4GB of memory.

The total_ram() function makes the assumption that the 'totalram' field
of the 'struct sysinfo' is measured in bytes, or alternatively that the
'mem_unit' field is always equal to one. Having written a program to
call the sysinfo() function and print the structure fields, it seems
that, on Linux x86_64 and i686 anyway, the 'mem_unit' field is indeed
set to one (note that the 32-bit system had only 2GB ram). However,
cygwin also has a sysinfo() implementation, which gives the following
values:

  $ ./sysinfo
  uptime:      21381
  loads:       0, 0, 0
  total ram:   2074637
  free ram:    843237
  shared ram:  0
  buffer ram:  0
  total swap:  327680
  free swap:   306932
  procs:       15
  total high:  0
  free high:   0
  mem_unit:    4096

  total ram: 8497713152
  $

[This laptop has 8GB ram, so a little bit seems to be missing. ;) ]

Modify the total_ram() function to allow for the possibility that the
memory size is not specified in bytes (ie 'mem_unit' is greater than
one).
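
The resulting calculation is then roughly (a sketch of the idea, not
the literal patch):

  static uint64_t total_ram(void)
  {
          struct sysinfo si;

          if (sysinfo(&si))
                  return 0;
          /* 'totalram' is a count of 'mem_unit'-byte units */
          return (uint64_t)si.totalram * si.mem_unit;
  }

For the cygwin numbers above this yields 2074637 * 4096 = 8497713152
bytes, matching the "total ram" line at the end of the output.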

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-16 20:43:45 -07:00
Junio C Hamano a271b05066 Merge branch 'ps/cat-file-filter-batch'
"git cat-file --batch" and friends learned to allow "--filter=" to
omit certain objects, just like the transport layer does.

* ps/cat-file-filter-batch:
  builtin/cat-file: use bitmaps to efficiently filter by object type
  builtin/cat-file: deduplicate logic to iterate over all objects
  pack-bitmap: introduce function to check whether a pack is bitmapped
  pack-bitmap: add function to iterate over filtered bitmapped objects
  pack-bitmap: allow passing payloads to `show_reachable_fn()`
  builtin/cat-file: support "object:type=" objects filter
  builtin/cat-file: support "blob:limit=" objects filter
  builtin/cat-file: support "blob:none" objects filter
  builtin/cat-file: wire up an option to filter objects
  builtin/cat-file: introduce function to report object status
  builtin/cat-file: rename variable that tracks usage
2025-04-16 13:54:21 -07:00
Junio C Hamano 47478802da Merge branch 'kn/non-transactional-batch-updates'
Updating multiple references has only been possible in an all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner.  A new "best effort batches of updates" mode
has been introduced.

* kn/non-transactional-batch-updates:
  update-ref: add --batch-updates flag for stdin mode
  refs: support rejection in batch updates during F/D checks
  refs: implement batch reference update support
  refs: introduce enum-based transaction error types
  refs/reftable: extract code from the transaction preparation
  refs/files: remove duplicate duplicates check
  refs: move duplicate refname update check to generic layer
  refs/files: remove redundant check in split_symref_update()
2025-04-16 13:54:19 -07:00
Junio C Hamano 01a6e244f9 Merge branch 'ps/maintenance-reflog-expire'
"git maintenance" learns a new task to expire reflog entries.

* ps/maintenance-reflog-expire:
  builtin/maintenance: introduce "reflog-expire" task
  builtin/gc: split out function to expire reflog entries
  builtin/reflog: make functions regarding `reflog_expire_options` public
  builtin/reflog: stop storing per-reflog expiry dates globally
  builtin/reflog: stop storing default reflog expiry dates globally
  reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options`
2025-04-16 13:54:19 -07:00
Junio C Hamano 1a1661bd41 Merge branch 'jt/rev-list-z'
"git rev-list" learns machine-parsable output format that delimits
each field with NUL.

* jt/rev-list-z:
  rev-list: support NUL-delimited --missing option
  rev-list: support NUL-delimited --boundary option
  rev-list: support delimiting objects with NUL bytes
  rev-list: refactor early option parsing
  rev-list: inline `show_object_with_name()` in `show_object()`
2025-04-16 13:54:18 -07:00