Commit Graph

12753 Commits (v2.50.1)

Author SHA1 Message Date
Usman Akinyemi 6aa09fd872 version: extend get_uname_info() to hide system details
Currently, get_uname_info() function provides the full OS information.
In a following commit, we will need it to provide only the OS name.

Let's extend it to accept a "full" flag that makes it switch between
providing full OS information and providing only the OS name.

We may need to refactor this function in the future if an
`osVersion.format` is added.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:05:12 -08:00
Usman Akinyemi 0a78d61247 version: refactor get_uname_info()
Some code from "builtin/bugreport.c" uses uname(2) to get system
information.

Let's refactor this code into a new get_uname_info() function, so
that we can reuse it in a following commit.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:05:12 -08:00
Junio C Hamano c3fffcfe8e Merge branch 'bf/fetch-set-head-fix'
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.

* bf/fetch-set-head-fix:
  fetch set_head: fix non-mirror remotes in bare repositories
  fetch set_head: refactor to use remote directly
2025-02-14 17:53:48 -08:00
Junio C Hamano 5785d9143b Merge branch 'tc/clone-single-revision'
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.

* tc/clone-single-revision:
  builtin/clone: teach git-clone(1) the --revision= option
  parse-options: introduce die_for_incompatible_opt2()
  clone: introduce struct clone_opts in builtin/clone.c
  clone: add tags refspec earlier to fetch refspec
  clone: refactor wanted_peer_refs()
  clone: make it possible to specify --tags
  clone: cut down on global variables in clone.c
2025-02-14 17:53:48 -08:00
Junio C Hamano 998c5f0c75 Merge branch 'ms/refspec-cleanup'
Code clean-up.  cf. <Z6G-toOJjMmK8iJG@pks.im>

* ms/refspec-cleanup:
  refspec: relocate apply_refspecs and related funtions
  refspec: relocate matching related functions
  remote: rename query_refspecs functions
  refspec: relocate refname_matches_negative_refspec_item
  remote: rename function omit_name_by_refspec
2025-02-12 10:08:54 -08:00
Junio C Hamano 5b9d01bc4d Merge branch 'zh/gc-expire-to'
"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".

* zh/gc-expire-to:
  gc: add `--expire-to` option
2025-02-12 10:08:53 -08:00
Junio C Hamano 07c401d392 Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo'
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.

* ps/repack-keep-unreachable-in-unpacked-repo:
  builtin/repack: fix `--keep-unreachable` when there are no packs
2025-02-12 10:08:52 -08:00
Junio C Hamano aae91a86fb Merge branch 'ds/name-hash-tweaks'
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.

* ds/name-hash-tweaks:
  pack-objects: prevent name hash version change
  test-tool: add helper for name-hash values
  p5313: add size comparison test
  pack-objects: add GIT_TEST_NAME_HASH_VERSION
  repack: add --name-hash-version option
  pack-objects: add --name-hash-version option
  pack-objects: create new name-hash function version
2025-02-12 10:08:51 -08:00
Usman Akinyemi 62898b8f5e builtin/update-server-info: remove the_repository global variable
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/update-server-info.c".

When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_update_server_info()` function
with `repo` set to NULL and then early in the function, "parse_options()"
call will give the options help and exit, without having to consult much
of the configuration file. So it is safe to omit reading the config when
`repo` argument the caller gave us is NULL.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10 16:20:21 -08:00
Junio C Hamano 246569bf83 Merge branch 'ps/hash-cleanup'
Further code clean-up on the use of hash functions.  Now the
context object knows what hash function it is working with.

* ps/hash-cleanup:
  global: adapt callers to use generic hash context helpers
  hash: provide generic wrappers to update hash contexts
  hash: stop typedeffing the hash context
  hash: convert hashing context to a structure
2025-02-10 10:18:31 -08:00
Patrick Steinhardt 07242c2a5a path: drop `git_common_path()` in favor of `repo_common_path()`
Remove `git_common_path()` in favor of the `repo_common_path()` family
of functions, which makes the implicit dependency on `the_repository` go
away.

Note that `git_common_path()` used to return a string allocated via
`get_pathname()`, which uses a rotating set of statically allocated
buffers. Consequently, callers didn't have to free the returned string.
The same isn't true for `repo_common_path()`, so we also have to add
logic to free the returned strings.

This refactoring also allows us to remove `repo_common_pathv()` from the
public interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:23 -08:00
Patrick Steinhardt 8e4710f011 worktree: return allocated string from `get_worktree_git_dir()`
The `get_worktree_git_dir()` function returns a string constant that
does not need to be free'd by the caller. This string is computed for
three different cases:

  - If we don't have a worktree we return a path into the Git directory.
    The returned string is owned by `the_repository`, so there is no
    need for the caller to free it.

  - If we have a worktree, but no worktree ID then the caller requests
    the main worktree. In this case we return a path into the common
    directory, which again is owned by `the_repository` and thus does
    not need to be free'd.

  - In the third case, where we have an actual worktree, we compute the
    path relative to "$GIT_COMMON_DIR/worktrees/". This string does not
    need to be released either, even though `git_common_path()` ends up
    allocating memory. But this doesn't result in a memory leak either
    because we write into a buffer returned by `get_pathname()`, which
    returns one out of four static buffers.

We're about to drop `git_common_path()` in favor of `repo_common_path()`,
which doesn't use the same mechanism but instead returns an allocated
string owned by the caller. While we could adapt `get_worktree_git_dir()`
to also use `get_pathname()` and print the derived common path into that
buffer, the whole schema feels a lot like premature optimization in this
context. There are some callsites where we call `get_worktree_git_dir()`
in a loop that iterates through all worktrees. But none of these loops
seem to be even remotely in the hot path, so saving a single allocation
there does not feel worth it.

Refactor the function to instead consistently return an allocated path
so that we can start using `repo_common_path()` in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:23 -08:00
Patrick Steinhardt 3859e39659 path: drop `git_path_buf()` in favor of `repo_git_path_replace()`
Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The
latter does essentially the same, with the only exception that it does
not rely on `the_repository` but takes the repo as separate parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt bba59f58a4 path: drop `git_pathdup()` in favor of `repo_git_path()`
Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does
essentially the same, with the only exception that it does not rely on
`the_repository` but takes the repo as separate parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt f5c714e2a7 path: refactor `repo_submodule_path()` family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "submodule" family of
functions accordingly.

Note that in contrast to the other `repo_*_path()` families, we have to
pass in the repository as a non-constant pointer. This is because we end
up calling `repo_read_gitmodules()` deep down in the callstack, which
may end up modifying the repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:22 -08:00
Patrick Steinhardt f9467895d8 submodule: refactor `submodule_to_gitdir()` to accept a repo
The `submodule_to_gitdir()` function implicitly uses `the_repository` to
resolve submodule paths. Refactor the function to instead accept a repo
as parameter to remove the dependency on global state.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07 09:59:21 -08:00
David Aguilar 7c2f291943 difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE
Remove the USE_THE_REPOSITORY_VARIABLE #define now that all
state is passed to each function from callers.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 13:00:21 -08:00
David Aguilar a24953f3df difftool: eliminate use of the_repository
Make callers pass a repository struct into each function instead
of relying on the global the_repository variable.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 13:00:21 -08:00
David Aguilar 8241ae63d8 difftool: eliminate use of global variables
Move difftool's global variables into a difftools_option struct
in preparation for removal of USE_THE_REPOSITORY_VARIABLE.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 13:00:21 -08:00
Toon Claes 337855629f builtin/clone: teach git-clone(1) the --revision= option
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.

Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).

Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.

Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.

Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.

Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:26:42 -08:00
Toon Claes 9144b9362b parse-options: introduce die_for_incompatible_opt2()
The functions die_for_incompatible_opt3() and
die_for_incompatible_opt4() already exist to die whenever a user
specifies three or four options respectively that are not compatible.

Introduce die_for_incompatible_opt2() which dies when two options that
are incompatible are set.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:54 -08:00
Toon Claes 7a52a8c7d8 clone: introduce struct clone_opts in builtin/clone.c
There is a lot of state stored in global variables in builtin/clone.c.
In the long run we'd like to remove many of those.

Introduce `struct clone_opts` in this file. This struct will be used to
contain all details needed to perform the clone. The struct object can
be thrown around to all the functions that need these details.

The first field we're adding is `wants_head`. In some scenarios
(specifically when both `--single-branch` and `--branch` are given) we
are not interested in `HEAD` on the remote. The field `wants_head` in
`struct clone_opts` will hold this information. We could have put
`option_branch` and `option_single_branch` into that struct instead, but
in a following commit we'll be using `wants_head` as well.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:54 -08:00
Toon Claes 2ca67c6f14 clone: add tags refspec earlier to fetch refspec
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs
from the `remote->fetch` refspec into `ref_prefixes` of
`transport_ls_refs_options`. Afterwards we add the tags prefix
`refs/tags/` prefix as well. At a later point, in wanted_peer_refs() we
process refs using both `remote->fetch` and `TAG_REFSPEC`.

Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before
calling refspec_ref_prefixes().

To be able to do this, we set `option_tags` to 0 when --mirror is given.
This is because --mirror mirrors (hence the name) all the refs,
including tags and they do not need to be treated separately.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Toon Claes 879780f9a1 clone: refactor wanted_peer_refs()
The function wanted_peer_refs() is used to map the refs returned by the
server to refs we will save in our clone.

Over time this function grown to be very complex. Refactor it.

Previously, there was a separate code path for when
`option_single_branch` was set. It resulted in duplicated code and
deeper nested conditions. After this refactor the code path for when
`option_single_branch` is truthy modifies `refs` and then falls through
to the common code path. This approach relies on the `refspec` being set
correctly and thus only mapping refs that are relevant.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Toon Claes bc26f7690a clone: make it possible to specify --tags
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].

To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change, because the flag is
default-enabled passing `--tags` has no effect, so there's no need to
add tests for this.

[1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Toon Claes 7f420a6bda clone: cut down on global variables in clone.c
In clone.c the `struct option` which is used to parse the input options
for git-clone(1) is a global variable. Due to this, many variables that
are used to parse the value into, are also global.

Make `builtin_clone_options` a local variable in cmd_clone() and carry
along all variables that are only used in that function.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Justin Tobler 3295c35398 rev-list: extend print-info to print missing object type
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 09:32:01 -08:00
Justin Tobler c6d896bcfd rev-list: add print-info action to print missing object path
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.

Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.

For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 09:32:01 -08:00
Patrick Steinhardt 414c82300a builtin/repack: fix `--keep-unreachable` when there are no packs
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.

The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:

  - `has_existing_non_kept_packs()` checks whether there are existing
    packfiles. This condition makes sense to guard "--keep-pack=",
    "--unpack-unreachable" and "--keep-unreachable", because all of
    these flags only make sense in combination with existing packfiles.
    But it does not make sense to disable `--pack-loose-unreachable`
    when there aren't any preexisting packfiles, as loose objects can be
    packed into the new packfile regardless of that.

  - `delete_redundant` checks whether we want to delete any objects or
    packs that are about to become redundant. The documentation of
    `--keep-unreachable` explicitly says that `git repack -ad` needs to
    be executed for the flag to have an effect.

    It is not immediately obvious why such redundant objects need to be
    deleted in order for "--pack-unreachable-objects" to be effective.
    But as things are working as documented this is nothing we'll change
    for now.

  - `pack_everything & PACK_CRUFT` checks that we're not creating a
    cruft pack. This condition makes sense in the context of
    "--pack-loose-unreachable", as unreachable objects would end up in
    the cruft pack anyway.

So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.

Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:58:02 -08:00
Meet Soni be0905fed1 remote: rename query_refspecs functions
Rename functions related to handling refspecs in preparation for their
move from `remote.c` to `refspec.c`. Update their names to better
reflect their intent:

    - `query_refspecs()` -> `refspec_find_match()` for clarity, as it
      finds a single matching refspec.

    - `query_refspecs_multiple()` -> `refspec_find_all_matches()` to
      better reflect that it collects all matching refspecs instead of
      returning just the first match.

    - `query_matches_negative_refspec()` ->
      `refspec_find_negative_match()` for consistency with the
      updated naming convention, even though this static function
      didn't strictly require renaming.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:41 -08:00
Meet Soni e4f6ab0085 remote: rename function omit_name_by_refspec
Rename the function `omit_name_by_refspec()` to
`refname_matches_negative_refspec_item()` to provide clearer intent.
The previous function name was vague and did not accurately describe its
purpose. By using `refname_matches_negative_refspec_item`, make the
function's purpose more intuitive, clarifying that it checks if a
reference name matches any negative refspec.

Rename function parameters for consistency with existing naming
conventions. Use `refname` instead of `name` to align with terminology
in `refs.h`.

Remove the redundant doc comment since the function name is now
self-explanatory.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:41 -08:00
Derrick Stolee 85127bcdea backfill: assume --sparse when sparse-checkout is enabled
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.

However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.

Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee bff4555767 backfill: add --sparse option
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.

However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.

Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.

To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are

 1. git clone --filter=blob:none --sparse <url>
 2. git sparse-checkout set <dir1> ... <dirN>
 3. git backfill --sparse

For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3          | 110 MB    |       |
| 10K             | 12         | 192 MB    | 17.2s |
| 15K             | 9          | 192 MB    | 15.5s |
| 20K             | 8          | 192 MB    | 15.5s |
| 25K             | 7          | 192 MB    | 14.7s |

This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.

Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:

| Batch Size      | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2          | 1,876 MB  |      |
| 10K             | 11         | 2,187 MB  | 46s  |
| 25K             | 7          | 2,188 MB  | 43s  |
| 50K             | 5          | 2,194 MB  | 44s  |
| 100K            | 4          | 2,194 MB  | 48s  |

This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.

Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee 6840fe9ee2 backfill: add --min-batch-size=<n> option
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.

We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.

To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:

 1. git clone --filter=blob:none <url>
 2. git backfill

Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.

For the Git repository, we get these results:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2          | 119 MB    |       |
| 25K             | 8          | 290 MB    | 24s   |
| 50K             | 5          | 290 MB    | 24s   |
| 100K            | 4          | 290 MB    | 29s   |

Other than the packfile counts decreasing as we need fewer batches, the
size and time required is not changing much for this small example.

For the nodejs/node repository, we see these results:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 330 MB    |        |
| 25K             | 19         | 1,222 MB  | 1m 22s |
| 50K             | 11         | 1,221 MB  | 1m 24s |
| 100K            | 7          | 1,223 MB  | 1m 40s |
| 250K            | 4          | 1,224 MB  | 2m 23s |
| 500K            | 3          | 1,216 MB  | 4m 38s |

Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.

To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 105 MB    |        |
| 5K              | 53         | 348 MB    | 2m 26s |
| 10K             | 28         | 365 MB    | 2m 22s |
| 15K             | 19         | 407 MB    | 2m 21s |
| 20K             | 15         | 393 MB    | 2m 28s |
| 25K             | 13         | 417 MB    | 2m 06s |
| 50K             | 8          | 509 MB    | 1m 34s |
| 100K            | 5          | 535 MB    | 1m 56s |
| 250K            | 4          | 698 MB    | 1m 33s |
| 500K            | 3          | 696 MB    | 1m 42s |

Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.

This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.

Finally, the same experiment for the Linux repository had these results:

| Batch Size      | Pack Count | Pack Size | Time    |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2          | 2,153 MB  |         |
| 25K             | 63         | 6,380 MB  | 14m 08s |
| 50K             | 58         | 6,126 MB  | 15m 11s |
| 100K            | 30         | 6,135 MB  | 18m 11s |
| 250K            | 14         | 6,146 MB  | 18m 22s |
| 500K            | 8          | 6,143 MB  | 33m 29s |

Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.

Based on these experiments, a batch size of 50,000 was chosen as the
default value.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee 1e72e889e7 backfill: basic functionality and tests
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.

The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.

The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.

This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:41 -08:00
Derrick Stolee a3f79e9abd backfill: add builtin boilerplate
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin. Mark the builtin as experimental at
this time, allowing breaking changes in the near future, if necessary.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:41 -08:00
Junio C Hamano b83a2f9006 Merge branch 'kn/pack-write-with-reduced-globals'
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03 10:23:34 -08:00
Junio C Hamano 803b5acaa7 Merge branch 'ps/3.0-remote-deprecation'
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.

* ps/3.0-remote-deprecation:
  remote: announce removal of "branches/" and "remotes/"
  builtin/pack-redundant: remove subcommand with breaking changes
  ci: repurpose "linux-gcc" job for deprecations
  ci: merge linux-gcc-default into linux-gcc
  Makefile: wire up build option for deprecated features
2025-02-03 10:23:33 -08:00
Junio C Hamano caf17423d3 Merge branch 'tb/unsafe-hash-cleanup'
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.

* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-02-03 10:23:32 -08:00
Patrick Steinhardt 0578f1e66a global: adapt callers to use generic hash context helpers
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:11 -08:00
Patrick Steinhardt 7346e340f1 hash: stop typedeffing the hash context
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.

Drop the typedef and adapt all callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 10:06:10 -08:00
Junio C Hamano 0cbcba5455 Merge branch 'tb/unsafe-hash-cleanup' into ps/hash-cleanup
* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-01-31 10:05:46 -08:00
Junio C Hamano 81309f424b Merge branch 'jc/show-index-h-update'
Doc and short-help text for "show-index" has been clarified to
stress that the command reads its data from the standard input.

* jc/show-index-h-update:
  show-index: the short help should say the command reads from its input
2025-01-31 09:44:16 -08:00
Junio C Hamano 8d6240d4c6 Merge branch 'rs/ref-fitler-used-atoms-value-fix'
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.

* rs/ref-fitler-used-atoms-value-fix:
  ref-filter: remove ref_format_clear()
  ref-filter: move is-base tip to used_atom
  ref-filter: move ahead-behind bases into used_atom
2025-01-29 14:05:09 -08:00
Junio C Hamano de56e1d746 Merge branch 'ja/doc-commit-markup-updates'
Doc updates.

* ja/doc-commit-markup-updates:
  doc: migrate git-commit manpage secondary files to new format
  doc: convert git commit config to new format
  doc: make more direct explanations in git commit options
  doc: the mode param of -u of git commit is optional
  doc: apply new documentation guidelines to git commit
2025-01-29 14:05:09 -08:00
Junio C Hamano 73e055d71e Merge branch 'mh/credential-cache-authtype-request-fix'
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.

* mh/credential-cache-authtype-request-fix:
  credential-cache: respect authtype capability
2025-01-28 13:02:24 -08:00
Junio C Hamano f8b9821f7d Merge branch 'jk/pack-header-parse-alignment-fix'
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.

* jk/pack-header-parse-alignment-fix:
  index-pack, unpack-objects: use skip_prefix to avoid magic number
  index-pack, unpack-objects: use get_be32() for reading pack header
  parse_pack_header_option(): avoid unaligned memory writes
  packfile: factor out --pack_header argument parsing
  bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-28 13:02:23 -08:00
Junio C Hamano f0a371a39d Merge branch 'jc/show-usage-help'
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others.  The built-in commands
have been fixed to show them on the standard output consistently.

* jc/show-usage-help:
  builtin: send usage() help text to standard output
  oddballs: send usage() help text to standard output
  builtins: send usage_with_options() help text to standard output
  usage: add show_usage_if_asked()
  parse-options: add show_usage_with_options_if_asked()
  t0012: optionally check that "-h" output goes to stdout
2025-01-28 13:02:22 -08:00
Derrick Stolee b4cf68476a pack-objects: prevent name hash version change
When the --name-hash-version option is used in 'git pack-objects', it
can change from the initial assignment to when it is used based on
interactions with other arguments. Specifically, when writing or reading
bitmaps, we must force version 1 for now. This could change in the
future when the bitmap format can store a name hash version value,
indicating which was used during the writing of the packfile.

Protect the 'git pack-objects' process from getting confused by failing
with a BUG() statement if the value of the name hash version changes
between calls to pack_name_hash_fn().

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 13:21:43 -08:00
Derrick Stolee ce961135cc pack-objects: add GIT_TEST_NAME_HASH_VERSION
Add a new environment variable to opt-in to different values of the
--name-hash-version=<n> option in 'git pack-objects'. This allows for
extra testing of the feature without repeating all of the test
scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add
this to the linux-TEST-vars CI build as that test run is already
overloaded. The behavior exposed by this test variable is of low risk
and should be sufficient to allow manual testing when an issue arises.

But this option isn't free. There are a few tests that change behavior
with the variable enabled.

First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.

Second, there are some tests that compare the exact output of a 'git
pack-objects' process when using bitmaps. The warning that ignores the
--name-hash-version=2 and forces version 1 causes these tests to fail.
Disable the environment variable to get around this issue.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 13:21:43 -08:00
Derrick Stolee 928ef41dd8 repack: add --name-hash-version option
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.

The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.

Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 13:21:43 -08:00
Derrick Stolee fc62e033cd pack-objects: add --name-hash-version option
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.

This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.

For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.

On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:

| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 260 MB    | N/A         |
| --name-hash-version=1 | 127 MB    | 129s        |
| --name-hash-version=2 | 127 MB    | 112s        |

This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.

Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:

 * "/CHANGELOG.json" is 15 characters, and is created by the beachball
   [1] tool. Only the final character of the parent directory can
   differentiate different versions of this file, but also only the two
   most-significant digits. If that character is a letter, then this is
   always a collision. Similar issues occur with the similar
   "/CHANGELOG.md" path, though there is more opportunity for
   differences In the parent directory.

 * Localization files frequently have common filenames but
   differentiates via parent directories. In C#, the name
   "/strings.resx.lcl" is used for these localization files and they
   will all collide in name-hash.

[1] https://github.com/microsoft/beachball

I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.

One open-source example is the fluentui [2] repo, which  uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.

| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 694 MB    | N/A         |
| --name-hash-version=1 | 438 MB    | 728s        |
| --name-hash-version=2 | 168 MB    | 142s        |

[2] https://github.com/microsoft/fluentui

In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.

Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:

| Repo     | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui |               440 MB  |               161 MB  |
| Repo B   |             6,248 MB  |               856 MB  |
| Repo C   |            37,278 MB  |             6,755 MB  |
| Repo D   |           131,204 MB  |             7,463 MB  |

Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.

It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 13:21:41 -08:00
Bence Ferdinandy 93dc16483a fetch set_head: fix non-mirror remotes in bare repositories
In b1b713f722 (fetch set_head: handle mirrored bare repositories,
2024-11-22) it was implicitly assumed that all remotes will be mirrors
in a bare repository, thus fetching a non-mirrored remote could lead to
HEAD pointing to a non-existent reference. Make sure we only overwrite
HEAD if we are in a bare repository and fetching from a mirror.
Otherwise, proceed as normally, and create
refs/remotes/<nonmirrorremote>/HEAD instead.

Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 08:16:47 -08:00
Bence Ferdinandy 638060dcb9 fetch set_head: refactor to use remote directly
As a preparatory step to use even more properties from the remote
struct, refactor set_head to take the entire struct as a parameter,
instead of the necessary bits. This also allows consolidating the use of
gtransport->remote in set_head, making the access of the remote's
properties consistent in the function.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 08:16:45 -08:00
ZheNing Hu 08032fa30f gc: add `--expire-to` option
This commit extends the functionality of `git gc`
by adding a new option, `--expire-to=<dir>`. Previously,
this feature was implemented in 91badeba32 (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allowing users to specify a directory where unreachable
and expired cruft packs are stored during garbage collection.
However, users had to run `git repack --cruft --expire-to=<dir>`
followed by `git prune` to achieve similar results within `git gc`.

By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.

Due to the original `git gc --prune=now` deleting all unreachable
objects by passing the `-a` parameter to git repack. With the
addition of the `--cruft` and `--expire-to` options, it is necessary
to modify this default behavior: instead of deleting these
unreachable objects, they should be merged into a cruft pack and
collected in a specified directory. Therefore, we do not pass `-a`
to the repack command but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now` to repack.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24 14:32:28 -08:00
Patrick Steinhardt 8ccc75c245 remote: announce removal of "branches/" and "remotes/"
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.

Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24 08:08:56 -08:00
Taylor Blau a8dd3821fe csum-file: introduce hashfile_checkpoint_init()
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.

While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.

Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.

A few things worth noting:

  - In the change to builtin/fast-import.c::stream_blob(), we can see
    that by removing the explicit reference to
    'the_hash_algo->unsafe_init_fn()', we are hardened against the
    hashfile API changing away from the_hash_algo (or its unsafe
    variant) in the future.

  - The bulk-checkin code no longer needs to explicitly zero-initialize
    the hashfile_checkpoint, since it is now done as a result of calling
    'hashfile_checkpoint_init()'.

  - Also in the bulk-checkin code, we add an additional call to
    prepare_to_stream() outside of the main loop in order to initialize
    'state->f' so we know which hash function implementation to use when
    calling 'hashfile_checkpoint_init()'.

    This is OK, since subsequent 'prepare_to_stream()' calls are noops.
    However, we only need to call 'prepare_to_stream()' when we have the
    HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
    'prepare_to_stream()' does not assign 'state->f', so we have nothing
    to initialize.

  - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
    appropriately guarded.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
Karthik Nayak 6b2aa7fd37 pack-write: pass hash_algo to `write_rev_file()`
The `write_rev_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above. Also modify children functions
`write_rev_file_order()` and `write_rev_header()` to accept
'the_hash_algo'.

Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.

However, in `midx-write.c`, since all usage of global variables is
removed, don't reintroduce them and instead use the `repo` available in
the context.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
Karthik Nayak 7653e9af9b pack-write: pass hash_algo to `write_idx_file()`
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.

Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.

Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
Karthik Nayak e2f6f76585 pack-write: pass repository to `index_pack_lockfile()`
The `index_pack_lockfile()` function uses the global `the_repository`
variable to access the repository. To avoid global variable usage, pass
the repository from the layers above.

Altough the layers above could have access to the repository internally,
simply pass in `the_repository`. This avoids any compatibility issues
and bubbles up global variable usage to upper layers which can be
eventually resolved.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
Karthik Nayak 8244d01de6 pack-write: pass hash_algo to `fixup_pack_header_footer()`
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.

Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
René Scharfe c5490ce9d1 ref-filter: remove ref_format_clear()
Now that ref_format_clear() no longer releases any memory we don't need
it anymore.  Remove it and its counterpart, ref_format_init().

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 09:06:24 -08:00
René Scharfe 5e58db6575 ref-filter: move ahead-behind bases into used_atom
verify_ref_format() parses a ref-filter format string and stores
recognized items in the static array "used_atom".  For
"ahead-behind:<committish>" it stores the committish part in a
string_list member "bases" of struct ref_format.

ref_sorting_options() also parses bare ref-filter format items and
stores stores recognized ones in "used_atom" as well.  The committish
parts go to a dummy struct ref_format in parse_sorting_atom(), though,
and are leaked and forgotten.

If verify_ref_format() is called before ref_sorting_options(), like in
git for-each-ref, then all works well if the sort key is included in the
format string.  If it isn't then sorting cannot work as the committishes
are missing.

If ref_sorting_options() is called first, like in git branch, then we
have the additional issue that if the sort key is included in the format
string then filter_ahead_behind() can't see its committish, will not
generate any results for it and thus it will be expanded to an empty
string.

Fix those issues by replacing the string_list with a field in used_atom
for storing the committish.  This way it can be shared for handling both
ref-filter format strings and sorting options in the same command.

Reported-by: Ross Goldberg <ross.goldberg@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 09:06:15 -08:00
Junio C Hamano 7b39a128c8 Merge branch 'ps/the-repository'
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.

* ps/the-repository:
  match-trees: stop using `the_repository`
  graph: stop using `the_repository`
  add-interactive: stop using `the_repository`
  tmp-objdir: stop using `the_repository`
  resolve-undo: stop using `the_repository`
  credential: stop using `the_repository`
  mailinfo: stop using `the_repository`
  diagnose: stop using `the_repository`
  server-info: stop using `the_repository`
  send-pack: stop using `the_repository`
  serve: stop using `the_repository`
  trace: stop using `the_repository`
  pager: stop using `the_repository`
  progress: stop using `the_repository`
2025-01-21 08:44:54 -08:00
Junio C Hamano d6a7cace21 Merge branch 'jt/fsck-skiplist-parse-fix'
A misconfigured "fsck.skiplist" configuration variable was not
diagnosed as an error, which has been corrected.

* jt/fsck-skiplist-parse-fix:
  fsck: reject misconfigured fsck.skipList
2025-01-21 08:44:53 -08:00
Junio C Hamano cb441e1ec3 Merge branch 'ps/reftable-get-random-fix'
The code to compute "unique" name used git_rand() which can fail or
get stuck; the callsite does not require cryptographic security.
Introduce the "insecure" mode and use it appropriately.

* ps/reftable-get-random-fix:
  reftable/stack: accept insecure random bytes
  wrapper: allow generating insecure random bytes
2025-01-21 08:44:53 -08:00
Jeff King 98046591b9 index-pack, unpack-objects: use skip_prefix to avoid magic number
When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.

Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 08:42:56 -08:00
Jeff King f1299bff26 index-pack, unpack-objects: use get_be32() for reading pack header
Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.

This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works it is only because we
are lucky.

We'll fix this by switching to get_be32() to read the headers (just like
the last few commits similarly switched us to put_be32() for writing
into the same buffer).

It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).

I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.

Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 08:42:56 -08:00
Jeff King 798e0f4516 packfile: factor out --pack_header argument parsing
Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.

In preparation for a bugfix, let's factor the duplicated code into a
common helper.

The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).

Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 08:42:55 -08:00
Junio C Hamano f66d1423f5 builtin: send usage() help text to standard output
Using the show_usage_and_exit_if_asked() helper we introduced
earlier, fix callers of usage() that want to show the help text when
explicitly asked by the end-user.  The help text now goes to the
standard output stream for them.

These are the bog standard "if we got only '-h', then that is a
request for help" callers.  Their

	if (argc == 2 && !strcmp(argv[1], "-h"))
		usage(message);

are simply replaced with

	show_usage_and_exit_if_asked(argc, argv, message);

With this, the built-ins tested by t0012 all send their help text to
their standard output stream, so the check in t0012 that was half
tightened earlier is now fully tightened to insist on standard error
stream being empty.

Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-17 13:30:03 -08:00
Junio C Hamano a36a822d7d oddballs: send usage() help text to standard output
Using the show_usage_if_asked() helper we introduced earlier, fix
callers of usage() that want to show the help text when explicitly
asked by the end-user.  The help text now goes to the standard
output stream for them.

The callers in this step are oddballs in that their invocations of
usage() are *not* guarded by

	if (argc == 2 && !strcmp(argv[1], "-h")
		usage(...);

There are (unnecessarily) being clever ones that do things like

	if (argc != 2 || !strcmp(argv[1], "-h")
		usage(...);

to say "I know I take only one argument, so argc != 2 is always an
error regardless of what is in argv[].  Ah, by the way, even if argc
is 2, "-h" is a request for usage text, so we do the same".

Some like "git var -h" just do not treat "-h" any specially, and let
it take the same error code paths as a parameter error.

Now we cannot do the same, so these callers are rewrittin to do the
show_usage_and_exit_if_asked() first and then handle the usage error
the way they used to.

Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-17 13:30:03 -08:00
Junio C Hamano b821c999ca builtins: send usage_with_options() help text to standard output
Using the show_usage_with_options_if_asked() helper we introduced
earlier, fix callers of usage_with_options() that want to show the
help text when explicitly asked by the end-user.  The help text now
goes to the standard output stream for them.

The test in t7600 for "git merge -h" may want to be retired, as the
same is covered by t0012 already, but it is specifically testing that
the "-h" option gets a response even with a corrupt index file, so
for now let's leave it there.

Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-17 13:30:03 -08:00
Junio C Hamano 564b907c8a Merge branch 'ps/more-sign-compare'
More -Wsign-compare fixes.

* ps/more-sign-compare:
  sign-compare: avoid comparing ptrdiff with an int/unsigned
  commit-reach: use `size_t` to track indices when computing merge bases
  shallow: fix -Wsign-compare warnings
  builtin/log: fix remaining -Wsign-compare warnings
  builtin/log: use `size_t` to track indices
  commit-reach: use `size_t` to track indices in `get_reachable_subset()`
  commit-reach: use `size_t` to track indices in `remove_redundant()`
  commit-reach: fix type of `min_commit_date`
  commit-reach: fix index used to loop through unsigned integer
  prio-queue: fix type of `insertion_ctr`
2025-01-16 16:35:14 -08:00
Jean-Noël Avila d533c10697 doc: the mode param of -u of git commit is optional
Fix the synopsis to reflect the option description.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-15 14:43:36 -08:00
Junio C Hamano b28fb93e51 Merge branch 'ps/build-sign-compare'
Last-minute fix for a regression in "git blame --abbrev=<length>"
when insane <length> is specified; we used to correctly cap it to
the hash output length but broke it during the cycle.

* ps/build-sign-compare:
  builtin/blame: fix out-of-bounds write with blank boundary commits
  builtin/blame: fix out-of-bounds read with excessive `--abbrev`
2025-01-10 09:19:34 -08:00
Patrick Steinhardt e7fb2ca945 builtin/blame: fix out-of-bounds write with blank boundary commits
When passing the `-b` flag to git-blame(1), then any blamed boundary
commits which were marked as uninteresting will not get their actual
commit ID printed, but will instead be replaced by a couple of spaces.

The flag can lead to an out-of-bounds write as though when combined with
`--abbrev=` when the abbreviation length is longer than `GIT_MAX_HEXSZ`
as we simply use memset(3p) on that array with the user-provided length
directly. The result is most likely that we segfault.

An obvious fix would be to cull `length` to `GIT_MAX_HEXSZ` many bytes.
But when the underlying object ID is SHA1, and if the abbreviated length
exceeds the SHA1 length, it would cause us to print more bytes than
desired, and the result would be misaligned.

Instead, fix the bug by computing the length via strlen(3p). This makes
us write as many bytes as the formatted object ID requires and thus
effectively limits the length of what we may end up printing to the
length of its hash. If `--abbrev=` asks us to abbreviate to something
shorter than the full length of the underlying hash function it would be
handled by the call to printf(3p) correctly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-10 06:56:55 -08:00
Patrick Steinhardt 1fbb8d7ecb builtin/blame: fix out-of-bounds read with excessive `--abbrev`
In 6411a0a896 (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.

It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, whereas printf(3p) does.

Fix the bug by reverting back to printf(3p) and culling the provided
length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
`int`.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-10 06:56:54 -08:00
M Hickford 0b43274850 credential-cache: respect authtype capability
Previously, credential-cache populated authtype regardless whether
"get" request had authtype capability. As documented in
git-credential.txt, authtype "should not be sent unless the appropriate
capability ... is provided".

Add test. Without this change, the test failed because "credential fill"
printed an incomplete credential with only protocol and host attributes
(the unexpected authtype attribute was discarded by credential.c).

Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-09 15:04:15 -08:00
Justin Tobler ca7158076f fsck: reject misconfigured fsck.skipList
In Git, fsck operations can ignore known broken objects via the
`fsck.skipList` configuration. This option expects a path to a file with
the list of object names. When the configuration is specified without a
path, an error message is printed, but the command continues as if the
configuration was not set. Configuring `fsck.skipList` without a value
is a misconfiguration so config parsing should be more strict and reject
it.

Update `git_fsck_config()` to no longer ignore misconfiguration of
`fsck.skipList`. The same behavior is also present for
`fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration
parsers for these are updated to ensure the related operations remain
consistent.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07 09:22:25 -08:00
Patrick Steinhardt 1568d1562e wrapper: allow generating insecure random bytes
The `csprng_bytes()` function generates randomness and writes it into a
caller-provided buffer. It abstracts over a couple of implementations,
where the exact one that is used depends on the platform.

These implementations have different guarantees: while some guarantee to
never fail (arc4random(3)), others may fail. There are two significant
failures to distinguish from one another:

  - Systemic failure, where e.g. opening "/dev/urandom" fails or when
    OpenSSL doesn't have a provider configured.

  - Entropy failure, where the entropy pool is exhausted, and thus the
    function cannot guarantee strong cryptographic randomness.

While we cannot do anything about the former, the latter failure can be
acceptable in some situations where we don't care whether or not the
randomness can be predicted.

Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt
into weak cryptographic randomness. The exact behaviour of the flag
depends on the underlying implementation:

    - `arc4random_buf()` never returns an error, so it doesn't change.

    - `getrandom()` pulls from "/dev/urandom" by default, which never
      blocks on modern systems even when the entropy pool is empty.

    - `getentropy()` seems to block when there is not enough randomness
      available, and there is no way of changing that behaviour.

    - `GtlGenRandom()` doesn't mention anything about its specific
      failure mode.

    - The fallback reads from "/dev/urandom", which also returns bytes in
      case the entropy pool is drained in modern Linux systems.

That only leaves OpenSSL with `RAND_bytes()`, which returns an error in
case the returned data wouldn't be cryptographically safe. This function
is replaced with a call to `RAND_pseudo_bytes()`, which can indicate
whether or not the returned data is cryptographically secure via its
return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag
is set, then we ignore the insecurity and return the data regardless.

It is somewhat questionable whether we really need the flag in the first
place, or whether we wouldn't just ignore the potentially-insecure data.
But the risk of doing that is that we might have or grow callsites that
aren't aware of the potential insecureness of the data in places where
it really matters. So using a flag to opt-in to that behaviour feels
like the more secure choice.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07 09:04:18 -08:00
Junio C Hamano a41e394e21 Merge branch 'bf/fetch-set-head-config'
A hotfix on an advice messagge added during this cycle.

* bf/fetch-set-head-config:
  fetch: fix erroneous set_head advice message
2025-01-06 12:02:21 -08:00
Bence Ferdinandy 233d48f5de fetch: fix erroneous set_head advice message
9e2b7005be (fetch set_head: add warn-if-not-$branch option, 2024-12-05)
tried to expand the advice message for set_head with the new option, but
unfortunately did not manage to add the right incantation. Fix the
advice message with the correct usage of warn-if-not-$branch.

Reported-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-06 06:50:03 -08:00
Junio C Hamano effbef2beb Merge branch 'jk/lsan-race-ignore-false-positive'
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.

This is an alternative to the jk/lsan-race-with-barrier topic with
much smaller change to the production code.

* jk/lsan-race-ignore-false-positive:
  test-lib: ignore leaks in the sanitizer's thread code
  test-lib: check leak logs for presence of DEDUP_TOKEN
  test-lib: simplify leak-log checking
  test-lib: rely on logs to detect leaks
  Revert barrier-based LSan threading race workaround
2025-01-02 13:37:08 -08:00
Junio C Hamano fc89d14c63 Revert barrier-based LSan threading race workaround
The extra "barrier" approach was too much code whose sole purpose
was to work around a race that is not even ours (i.e. in LSan's
teardown code).

In preparation for queuing a solution taking a much-less-invasive
approach, let's revert them.
2025-01-01 14:13:01 -08:00
Junio C Hamano d893741e02 Merge branch 'jk/lsan-race-with-barrier'
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.

* jk/lsan-race-with-barrier:
  grep: work around LSan threading race with barrier
  index-pack: work around LSan threading race with barrier
  thread-utils: introduce optional barrier type
  Revert "index-pack: spawn threads atomically"
  test-lib: use individual lsan dir for --stress runs
2025-01-01 09:21:15 -08:00
Junio C Hamano 98422943f0 Merge branch 'ps/weak-sha1-for-tail-sum-fix'
An earlier "csum-file checksum does not have to be computed with
sha1dc" topic had a few code paths that had initialized an
implementation of a hash function to be used by an unmatching hash
by mistake, which have been corrected.

* ps/weak-sha1-for-tail-sum-fix:
  ci: exercise unsafe OpenSSL backend
  builtin/fast-import: fix segfault with unsafe SHA1 backend
  bulk-checkin: fix segfault with unsafe SHA1 backend
2025-01-01 09:21:14 -08:00
Patrick Steinhardt 106140a99f builtin/fast-import: fix segfault with unsafe SHA1 backend
Same as with the preceding commit, git-fast-import(1) is using the safe
variant to initialize a hashfile checkpoint. This leads to a segfault
when passing the checkpoint into the hashfile subsystem because it would
use the unsafe variants instead:

    ++ git --git-dir=R/.git fast-import --big-file-threshold=1
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0)
    ==577126==The signal is caused by a READ memory access.
    ==577126==Hint: address points to the zero page.
        #0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
        #1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
        #2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
        #3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
        #4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2
        #5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3
        #6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5
        #7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4
        #8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4
        #9 0x555555b1f493 in run_builtin ../git.c:480:11
        #10 0x555555b1bfef in handle_builtin ../git.c:740:9
        #11 0x555555b1e6f4 in run_argv ../git.c:807:4
        #12 0x555555b1b87a in cmd_main ../git.c:947:19
        #13 0x5555561649e6 in main ../common-main.c:64:11
        #14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
        #16 0x555555772c84 in _start (git+0x21ec84)

    ==577126==Register values:
    rax = 0x0000511000000cc0  rbx = 0x0000000000000000  rcx = 0x000000000000000c  rdx = 0x0000000000000000
    rdi = 0x0000000000000000  rsi = 0x00005070000009c0  rbp = 0x00005070000009c0  rsp = 0x00007fffffff5b30
     r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x00007ffff7a01a30
    r12 = 0x0000000000000000  r13 = 0x00007fffffff6b60  r14 = 0x00007ffff7ffd000  r15 = 0x00005555563b9910
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
    ==577126==ABORTING
    ./test-lib.sh: line 1039: 577126 Aborted                 git --git-dir=R/.git fast-import --big-file-threshold=1 < input
    error: last command exited with $?=134
    not ok 167 - R: blob bigger than threshold

The segfault is only exposed in case the unsafe and safe backends are
different from one another.

Fix the issue by initializing the context with the unsafe SHA1 variant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:46:30 -08:00
Jeff King 7a8d9efc26 grep: work around LSan threading race with barrier
There's a race with LSan when spawning threads and one of the threads
calls die(). We worked around one such problem with index-pack in the
previous commit, but it exists in git-grep, too. You can see it with:

  make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
  cd t
  ./t0003-attributes.sh --stress

which fails pretty quickly with:

  ==git==4096424==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
      #0 0x7f906de14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
      #1 0x7f906dc9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
      #2 0x7f906de2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
      #3 0x7f906de25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
      #4 0x7f906de17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
      #5 0x7f906de143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
      #6 0x7f906dc9bf51 in start_thread nptl/pthread_create.c:447
      #7 0x7f906dd1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

As with the previous commit, we can fix this by inserting a barrier that
makes sure all threads have finished their setup before continuing. But
there's one twist in this case: the thread which calls die() is not one
of the worker threads, but the main thread itself!

So we need the main thread to wait in the barrier, too, until all
threads have gotten to it. And thus we initialize the barrier for
num_threads+1, to account for all of the worker threads plus the main
one.

If we then test as above, t0003 should run indefinitely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:18:58 -08:00
Jeff King 526c0a851b index-pack: work around LSan threading race with barrier
We sometimes get false positives from our linux-leaks CI job because of
a race in LSan itself. The problem is that one thread is still
initializing its stack in LSan's code (and allocating memory to do so)
while anothe thread calls die(), taking down the whole process and
triggering a leak check.

The problem is described in more detail in 993d38a066 (index-pack: spawn
threads atomically, 2024-01-05), which tried to fix it by pausing worker
threads until all calls to pthread_create() had completed. But that's
not enough to fix the problem, because the LSan setup code runs in the
threads themselves. So even though pthread_create() has returned, we
have no idea if all threads actually finished their setup before letting
any of them do real work.

We can fix that by using a barrier inside the threads themselves,
waiting for all of them to hit the start of their main function before
any of them proceed.

You can test for the race by running:

  make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
  cd t
  ./t5309-pack-delta-cycles.sh --stress

which fails quickly before this patch, and should run indefinitely
without it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:18:58 -08:00
Jeff King ca9d60f246 Revert "index-pack: spawn threads atomically"
This reverts commit 993d38a066.

That commit was trying to solve a race between LSan setting up the
threads stack and another thread calling exit(), by making sure that all
pthread_create() calls have finished before doing any work that might
trigger the exit().

But that isn't sufficient. The setup code actually runs in the
individual threads themselves, not in the spawning thread's call to
pthread_create(). So while it may have improved the race a bit, you can
still trigger it pretty quickly with:

  make SANITIZE=leak
  cd t
  ./t5309-pack-delta-cycles.sh --stress

Let's back out that failed attempt so we can try again.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:18:57 -08:00
Patrick Steinhardt 5e7fe8a7b8 commit-reach: use `size_t` to track indices when computing merge bases
The functions `repo_get_merge_bases_many()` and friends accepts an array
of commits as well as a parameter that indicates how large that array
is. This parameter is using a signed integer, which leads to a couple of
warnings with -Wsign-compare.

Refactor the code to use `size_t` to track indices instead and adapt
callers accordingly. While most callers are trivial, there are two
callers that require a bit more scrutiny:

  - builtin/merge-base.c:show_merge_base() subtracts `1` from the
    `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if
    the variable was `0` it would wrap. This code is fine though because
    its only caller will execute that code only when `argc >= 2`, and it
    follows that `rev_nr >= 2`, as well.

  - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`.
    Again, there is only a single caller that populates `rev_nr` with
    `good_revs.nr`. And because a bisection always requires at least one
    good revision it follws that `rev_nr >= 1`.

Mark the file as -Wsign-compare-clean.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:12:40 -08:00
Patrick Steinhardt 1ab5948141 builtin/log: fix remaining -Wsign-compare warnings
Fix remaining -Wsign-compare warnings in "builtin/log.c" and mark the
file as -Wsign-compare-clean. While most of the fixes are obvious, one
fix requires us to use `cast_size_t_to_int()`, which will cause us to
die in case the `size_t` cannot be represented as `int`. This should be
fine though, as the data would typically be set either via a config key
or via the command line, neither of which should ever exceed a couple of
kilobytes of data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:11:46 -08:00
Patrick Steinhardt 0905ed201a builtin/log: use `size_t` to track indices
Similar as with the preceding commit, adapt "builtin/log.c" so that it
tracks array indices via `size_t` instead of using signed integers. This
fixes a couple of -Wsign-compare warnings and prepares the code for
a similar refactoring of `repo_get_merge_bases_many()` in a subsequent
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:11:46 -08:00
Junio C Hamano 88e59f8027 Merge branch 'js/range-diff-diff-merges'
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.

* js/range-diff-diff-merges:
  range-diff: introduce the convenience option `--remerge-diff`
  range-diff: optionally include merge commits' diffs in the analysis
2024-12-23 09:32:17 -08:00
Junio C Hamano 002a8a9d36 Merge branch 'as/show-index-uninitialized-hash'
Regression fix for 'show-index' when run outside of a repository.

* as/show-index-uninitialized-hash:
  t5300: add test for 'show-index --object-format'
  show-index: fix uninitialized hash function
2024-12-23 09:32:12 -08:00
Junio C Hamano 4156b6a741 Merge branch 'ps/build-sign-compare'
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23 09:32:11 -08:00
Junio C Hamano 49edce4ff9 show-index: the short help should say the command reads from its input
The short help text given by "git show-index -h" says

    $ git show-index -h
    usage: git show-index [--object-format=<hash-algorithm>]

        --[no-]object-format <hash-algorithm>
                              specify the hash algorithm to use

The command takes a pack .idx file from its standard input.  The
user has to _know_ this, as there is no indication from this output.

Give a hint that the data to work on is fed from its standard input.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 17:30:57 -08:00
Junio C Hamano cb89eebf3b Merge branch 'js/log-remerge-keep-ancestry'
"git log -p --remerge-diff --reverse" was completely broken.

* js/log-remerge-keep-ancestry:
  log: --remerge-diff needs to keep around commit parents
2024-12-19 10:58:31 -08:00
Junio C Hamano a1f34d5955 Merge branch 'bf/fetch-set-head-config'
"git fetch" honors "remote.<remote>.followRemoteHEAD" settings to
tweak the remote-tracking HEAD in "refs/remotes/<remote>/HEAD".

* bf/fetch-set-head-config:
  remote set-head: set followRemoteHEAD to "warn" if "always"
  fetch set_head: add warn-if-not-$branch option
  fetch set_head: move warn advice into advise_if_enabled
  fetch: add configuration for set_head behaviour
2024-12-19 10:58:30 -08:00
Junio C Hamano ae75cefd94 Merge branch 'jc/set-head-symref-fix'
"git fetch" from a configured remote learned to update a missing
remote-tracking HEAD but it asked the remote about their HEAD even
when it did not need to, which has been corrected.  Incidentally,
this also corrects "git fetch --tags $URL" which was broken by the
new feature in an unspecified way.

* jc/set-head-symref-fix:
  fetch: do not ask for HEAD unnecessarily
2024-12-19 10:58:28 -08:00
Junio C Hamano 5f212684ab Merge branch 'bf/set-head-symref'
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is
missing and discovers what branch the other side points with its
HEAD, refs/remotes/$remote/HEAD is updated to point to it.

* bf/set-head-symref:
  fetch set_head: handle mirrored bare repositories
  fetch: set remote/HEAD if it does not exist
  refs: add create_only option to refs_update_symref_extended
  refs: add TRANSACTION_CREATE_EXISTS error
  remote set-head: better output for --auto
  remote set-head: refactor for readability
  refs: atomically record overwritten ref in update_symref
  refs: standardize output of refs_read_symbolic_ref
  t/t5505-remote: test failure of set-head
  t/t5505-remote: set default branch to main
2024-12-19 10:58:27 -08:00
Patrick Steinhardt 727c71a112 tmp-objdir: stop using `the_repository`
Stop using `the_repository` in the "tmp-objdir" subsystem by passing
in the repostiroy when creating a new temporary object directory.

While we could trivially update the caller to pass in the hash algorithm
used by the index itself, we instead pass in `the_hash_algo`. This is
mostly done to stay consistent with the rest of the code in that file,
which isn't prepared to handle arbitrary repositories, either.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:31 -08:00
Patrick Steinhardt 6c27d22276 credential: stop using `the_repository`
Stop using `the_repository` in the "credential" subsystem by passing in
a repository when filling, approving or rejecting credentials.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:31 -08:00
Patrick Steinhardt 71e5afee8b mailinfo: stop using `the_repository`
Stop using `the_repository` in the "mailinfo" subsystem by passing in
a repository when setting up the mailinfo structure.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:31 -08:00
Patrick Steinhardt b4c476c43a diagnose: stop using `the_repository`
Stop using `the_repository` in the "diagnose" subsystem by passing in a
repository when generating a diagnostics archive.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:31 -08:00
Patrick Steinhardt c365dbb44e server-info: stop using `the_repository`
Stop using `the_repository` in the "server-info" subsystem by passing in
a repository when updating server info and storing the repository in the
`update_info_ctx` structure to make it accessible to other functions.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Patrick Steinhardt 5ee907bb3f send-pack: stop using `the_repository`
Stop using `the_repository` in the "send-pack" subsystem by passing in a
repository when sending a packfile.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Patrick Steinhardt 395b584b57 serve: stop using `the_repository`
Stop using `the_repository` in the "serve" subsystem by passing in a
repository when advertising capabilities or serving requests.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Patrick Steinhardt 59b6131a67 pager: stop using `the_repository`
Stop using `the_repository` in the "pager" subsystem by passing in a
repository when setting up the pager and when configuring it.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Patrick Steinhardt 1f7e6478dc progress: stop using `the_repository`
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Junio C Hamano 913a1e157c Merge branch 'ps/build-sign-compare' into ps/the-repository
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18 10:43:16 -08:00
Johannes Schindelin 4538338c7e range-diff: introduce the convenience option `--remerge-diff`
Just like `git log`, now also `git range-diff` has that option as a
shortcut for the common operation that would otherwise require the quite
unwieldy (if theoretically "more correct") `--diff-mode=remerge` option.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16 08:45:48 -08:00
Johannes Schindelin f8043236c6 range-diff: optionally include merge commits' diffs in the analysis
The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=<format>` option.

Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between commit ranges that contain
merges.

This is especially true in scenarios with non-trivial merges, i.e.
merges introducing changes other than, or in addition to, what merge ORT
would have produced. Merging a topic branch that changes a function
signature into a branch that added a caller of that function, for
example, would require the merge commit itself to adjust that caller to
the modified signature.

In my code reviews, I found the `--diff-merges=remerge` option
particularly useful.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-16 08:45:48 -08:00
Junio C Hamano eb8374c652 Merge branch 'js/log-remerge-keep-ancestry' into js/range-diff-diff-merges
* js/log-remerge-keep-ancestry:
  log: --remerge-diff needs to keep around commit parents
2024-12-16 08:45:14 -08:00
Junio C Hamano ededd0d5dc Merge branch 'jt/fix-fattening-promisor-fetch'
Fix performance regression of a recent "fatten promisor pack with
local objects" protection against an unwanted gc.

* jt/fix-fattening-promisor-fetch:
  index-pack --promisor: also check commits' trees
  index-pack --promisor: don't check blobs
  index-pack --promisor: dedup before checking links
2024-12-15 17:54:31 -08:00
Junio C Hamano ab738b2f1f Merge branch 'jc/forbid-head-as-tagname'
"git tag" has been taught to refuse to create refs/tags/HEAD
as such a tag will be confusing in the context of UI provided by
the Git Porcelain commands.

* jc/forbid-head-as-tagname:
  tag: "git tag" refuses to use HEAD as a tagname
  t5604: do not expect that HEAD can be a valid tagname
  refs: drop strbuf_ prefix from helpers
  refs: move ref name helpers around
2024-12-15 17:54:26 -08:00
Junio C Hamano 73b7e03e9e Merge branch 'jk/describe-perf'
"git describe" optimization.

* jk/describe-perf:
  describe: split "found all tags" and max_candidates logic
  describe: stop traversing when we run out of names
  describe: stop digging for max_candidates+1
  t/perf: add tests for git-describe
  t6120: demonstrate weakness in disjoint-root handling
2024-12-15 17:54:25 -08:00
Junio C Hamano ca43bd2562 Merge branch 'kn/midx-wo-the-repository'
Yet another "pass the repository through the callchain" topic.

* kn/midx-wo-the-repository:
  midx: inline the `MIDX_MIN_SIZE` definition
  midx: pass down `hash_algo` to functions using global variables
  midx: pass `repository` to `load_multi_pack_index`
  midx: cleanup internal usage of `the_repository` and `the_hash_algo`
  midx-write: pass down repository to `write_midx_file[_only]`
  write-midx: add repository field to `write_midx_context`
  midx-write: use `revs->repo` inside `read_refs_snapshot`
  midx-write: pass down repository to static functions
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
2024-12-13 07:33:44 -08:00
Junio C Hamano 3b11c9139d Merge branch 'cw/worktree-extension'
Introduce a new repository extension to prevent older Git versions
from mis-interpreting worktrees created with relative paths.

* cw/worktree-extension:
  worktree: refactor `repair_worktree_after_gitdir_move()`
  worktree: add relative cli/config options to `repair` command
  worktree: add relative cli/config options to `move` command
  worktree: add relative cli/config options to `add` command
  worktree: add `write_worktree_linking_files()` function
  worktree: refactor infer_backlink return
  worktree: add `relativeWorktrees` extension
  setup: correctly reinitialize repository version
2024-12-13 07:33:43 -08:00
Junio C Hamano e56c283c15 Merge branch 'en/fast-import-verify-path'
"git fast-import" learned to reject paths with ".."  and "." as
their components to avoid creating invalid tree objects.

* en/fast-import-verify-path:
  t9300: test verification of renamed paths
  fast-import: disallow more path components
  fast-import: disallow "." and ".." path components
2024-12-13 07:33:41 -08:00
Junio C Hamano a32668829d Merge branch 'jt/bundle-fsck'
"git bundle --unbundle" and "git clone" running on a bundle file
both learned to trigger fsck over the new objects with configurable
fck check levels.

* jt/bundle-fsck:
  transport: propagate fsck configuration during bundle fetch
  fetch-pack: split out fsck config parsing
  bundle: support fsck message configuration
  bundle: add bundle verification options type
2024-12-13 07:33:36 -08:00
Johannes Schindelin f94bfa1516 log: --remerge-diff needs to keep around commit parents
To show a remerge diff, the merge needs to be recreated. For that to
work, the merge base(s) need to be found, which means that the commits'
parents have to be traversed until common ancestors are found (if any).

However, one optimization that hails all the way back to cb115748ec
(Some more memory leak avoidance, 2006-06-17) is to release the commit's
list of parents immediately after showing it _and to set that parent
list to `NULL`_. This can break the merge base computation.

This problem is most obvious when traversing the commits in reverse: In
that instance, if a parent of a merge commit has been shown as part of
the `git log` command, by the time the merge commit's diff needs to be
computed, that parent commit's list of parent commits will have been set
to `NULL` and as a result no merge base will be found (even if one
should be found).

Traversing commits in reverse is far from the only circumstance in which
this problem occurs, though. There are many avenues to traversing at
least one commit in the revision walk that will later be part of a merge
base computation, for example when not even walking any revisions in
`git show <merge1> <merge2>` where `<merge1>` is part of the commit
graph between the parents of `<merge2>`.

Another way to force a scenario where a commit is traversed before it
has to be traversed again as part of a merge base computation is to
start with two revisions (where the first one is reachable from the
second but not in a first-parent ancestry) and show the commit log with
`--topo-order` and `--first-parent`.

Let's fix this by special-casing the `remerge_diff` mode, similar to
what we did with reflogs in f35650dff6 (log: do not free parents when
walking reflog, 2017-07-07).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-13 06:56:10 -08:00
Junio C Hamano 35f40385e4 Merge branch 'bc/allow-upload-pack-from-other-people'
Loosen overly strict ownership check introduced in the recent past,
to keep the promise "cloning a suspicious repository is a safe
first step to inspect it".

* bc/allow-upload-pack-from-other-people:
  Allow cloning from repositories owned by another user
2024-12-10 10:04:55 +09:00
Jonathan Tan 1a14c857db index-pack --promisor: also check commits' trees
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) seems to contain an oversight in that the tree of a commit
is not checked. Teach git to check these trees.

The fix slows down a fetch from a certain repo at $DAYJOB from 2m2.127s
to 2m45.052s, but in order to make the fetch correct, it seems worth it.

In order to test this, we could create server and client repos as
follows...

 C   S
  \ /
   O

(O and C are commits both on the client and server. S is a commit
only on the server. C and S have the same tree but different commit
messages. The diff between O and C is non-zero.)

...and then, from the client, fetch S from the server.

In theory, the client declares "have C" and the server can use this
information to exclude S's tree (since it knows that the client has C's
tree, which is the same as S's tree). However, it is also possible for
the server to compute that it needs to send S and not O, and proceed
from there; therefore the objects of C are not considered at all when
determining what to send in the packfile. In order to prevent a test of
client functionality from having such a dependence on server behavior, I
have not included such a test.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-10 08:53:59 +09:00
Jonathan Tan 36198026d8 index-pack --promisor: don't check blobs
As a follow-up to the parent of this commit, it was found that not
checking for the existence of blobs linked from trees sped up the fetch
from 24m47.815s to 2m2.127s. Teach Git to do that.

The tradeoff of not checking blobs is documented in a code comment.

(Blobs may also be linked from tag objects, but it is impossible to know
the type of an object linked from a tag object without looking it up in
the object database, so the code for that is untouched.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-10 08:53:59 +09:00
Jonathan Tan 911d14203c index-pack --promisor: dedup before checking links
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) fixed a bug with what was believed to be a negligible
decrease in performance [1] [2]. But at $DAYJOB, with at least one repo,
it was found that the decrease in performance was very significant.

Looking at the patch, whenever we parse an object in the packfile to
be indexed, we check the targets of all its outgoing links for its
existence. However, this could be optimized by first collecting all such
targets into an oidset (thus deduplicating them) before checking. Teach
Git to do that.

On a certain fetch from the aforementioned repo, this improved
performance from approximately 7 hours to 24m47.815s. This number will
be further reduced in a subsequent patch.

[1] https://lore.kernel.org/git/CAG1j3zGiNMbri8rZNaF0w+yP+6OdMz0T8+8_Wgd1R_p1HzVasg@mail.gmail.com/
[2] https://lore.kernel.org/git/20241105212849.3759572-1-jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-10 08:53:59 +09:00
Junio C Hamano 6c915c3f85 fetch: do not ask for HEAD unnecessarily
In 3f763ddf28 (fetch: set remote/HEAD if it does not exist,
2024-11-22), git-fetch learned to opportunistically set $REMOTE/HEAD
when fetching by always asking for remote HEAD, in the hope that it
will help setting refs/remotes/<name>/HEAD if missing.

But it is not needed to always ask for remote HEAD.  When we are
fetching from a remote, for which we have remote-tracking branches,
we do need to know about HEAD.  But if we are doing one-shot fetch,
e.g.,

  $ git fetch --tags https://github.com/git/git

we do not even know what sub-hierarchy of refs/remotes/<remote>/
we need to adjust the remote HEAD for.  There is no need to ask for
HEAD in such a case.

Incidentally, because the unconditional request to list "HEAD"
affected the number of ref-prefixes requested in the ls-remote
request, this affected how the requests for tags are added to the
same ls-remote request, breaking "git fetch --tags $URL" performed
against a URL that is not configured as a remote.

Reported-by: Josh Steadmon <steadmon@google.com>
[jc: tests are also borrowed from Josh's patch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 21:58:59 +09:00
Patrick Steinhardt efb38ad49f builtin/patch-id: fix type of `get_one_patchid()`
In `get_one_patchid()` we assign either the result of `strlen()` or
`remove_space()` to `len`. But while the former correctly returns a
`size_t`, the latter returns an `int` to indicate the length of the
stripped string even though it cannot ever return a negative value. This
causes a warning with "-Wsign-conversion".

In fact, even `get_one_patchid()` itself is also using an integer as
return value even though it always returns the length of the patch, and
this bubbles up to other callers.

Adapt the function and its helpers to use `size_t` for string lengths
consistently.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:05 +09:00
Patrick Steinhardt 6411a0a896 builtin/blame: fix type of `length` variable when emitting object ID
The `length` variable is used to store how many bytes we wish to emit
from an object ID. This value will either be the full hash algorithm's
length, or the abbreviated hash that can be set via `--abbrev` or the
"core.abbrev" option. The former is of type `size_t`, whereas the latter
is of type `int`, which causes a warning with "-Wsign-compare".

The reason why `abbrev` is using a signed type is mostly that it is
initialized with `-1` to indicate that we have to compute the minimum
abbreviation length. This length is computed via `find_alignment()`,
which always gets called before `emit_other()`, and thus we can assume
that the value would never be negative in `emit_other()`.

In fact, we can even assume that the value will always be at least
`MINIMUM_ABBREV`, which is enforced by both `git_default_core_config()`
and `parse_opt_abbrev_cb()`. We implicitly rely on this by subtracting
up to 3 without checking for whether the value becomes negative. We then
pass the value to printf(3p) to print the prefix of our object's ID, so
if that assumption was violated we may end up with undefined behaviour.

Squelch the warning by asserting this invariant and casting the value of
`abbrev` to `size_t`. This allows us to store the whole length as an
unsigned integer, which we can then pass to `fwrite()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:05 +09:00
Patrick Steinhardt 80c9e70ebe global: trivial conversions to fix `-Wsign-compare` warnings
We have a bunch of loops which iterate up to an unsigned boundary using
a signed index, which generates warnigs because we compare a signed and
unsigned value in the loop condition. Address these sites for trivial
cases and enable `-Wsign-compare` warnings for these code units.

This patch only adapts those code units where we can drop the
`DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:04 +09:00
Patrick Steinhardt 47d72a74a7 diff.h: fix index used to loop through unsigned integer
The `struct diff_flags` structure is essentially an array of flags, all
of which have the same type. We can thus use `sizeof()` to iterate
through all of the flags, which we do in `diff_flags_or()`. But while
the statement returns an unsigned integer, we used a signed integer to
iterate through the flags, which generates a warning.

Fix this by using `size_t` for the index instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:03 +09:00
Patrick Steinhardt 41f43b8243 global: mark code units that generate warnings with `-Wsign-compare`
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
Jeff King db162862b3 describe: split "found all tags" and max_candidates logic
Commit a30154187a (describe: stop traversing when we run out of names,
2024-10-31) taught git-describe to automatically reduce the
max_candidates setting to match the total number of possible names. This
lets us break out of the traversal rather than fruitlessly searching for
more candidates when there are no more to be found.

However, setting max_candidates to 0 (e.g., if the repo has no tags)
overlaps with the --exact-match option, which explicitly uses the same
value. And this causes a regression with --always, which is ignored in
exact-match mode. We used to get this in a repo with no tags:

  $ git describe --always HEAD
  b2f0a7f

and now we get:

  $ git describe --always HEAD
  fatal: no tag exactly matches 'b2f0a7f47f5f2aebe1e7fceff19a57de20a78c06'

The reason is that we bail early in describe_commit() when
max_candidates is set to 0. This logic goes all the way back to
2c33f75754 (Teach git-describe --exact-match to avoid expensive tag
searches, 2008-02-24).

We should obviously fix this regression, but there are two paths,
depending on what you think:

  $ git describe --always --exact-match

and

  $ git describe --always --candidates=0

should do. Since the "--always" option was added, it has always been
ignored in --exact-match (or --candidates=0) mode. I.e., we treat
--exact-match as a true exact match of a tag, and never fall back to
using --always, even if it was requested.

If we think that's a bug (or at least a misfeature), then the right
solution is to fix it by removing the early bail-out from 2c33f75754,
letting the noop algorithm run and then hitting the --always fallback
output. And then our regression naturally goes away, because it follows
the same path.

If we think that the current "--exact-match --always" behavior is the
right thing, then we have to differentiate the case where we
automatically reduced max_candidates to 0 from the case where the user
asked for it specifically. That's possible to do with a flag, but we can
also just reimplement the logic from a30154187a to explicitly break out
of the traversal when we run out of candidates (rather than relying on
the existing max_candidates check).

My gut feeling is along the lines of option 1 (it's a bug, and people
would be happy for "--exact-match --always" to give the fallback rather
than ignoring "--always"). But the documentation can be interpreted in
the other direction, and we've certainly lived with the existing
behavior for many years. So it's possible that changing it now is the
wrong thing.

So this patch fixes the regression by taking the second option,
retaining the "--exact-match" behavior as-is. There are two new tests.
The first shows that the regression is fixed (we don't even need a new
repo without tags; a restrictive --match is enough to create the
situation that there are no candidate names).

The second test confirms that the "--exact-match --always" behavior
remains unchanged and continues to die when there is no tag pointing at
the specified commit. It's possible we may reconsider this in the
future, but this shows that the approach described above is implemented
faithfully.

We can also run the perf tests in p6100 to see that we've retained the
speedup that a30154187a was going for:

  Test                                           HEAD^             HEAD
  --------------------------------------------------------------------------------------
  6100.2: describe HEAD                          0.72(0.64+0.07)   0.72(0.66+0.06) +0.0%
  6100.3: describe HEAD with one max candidate   0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
  6100.4: describe HEAD with one tag             0.01(0.01+0.00)   0.01(0.01+0.00) +0.0%

Reported-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 15:18:21 +09:00
Bence Ferdinandy 012bc566ba remote set-head: set followRemoteHEAD to "warn" if "always"
When running "remote set-head" manually it is unlikely, that the user
would actually like to have "fetch" always update the remote/HEAD. On
the contrary, it is more likely, that the user would expect remote/HEAD
to stay the way they manually set it, and just forgot about having
"followRemoteHEAD" set to "always".

When "followRemoteHEAD" is set to "always" make running "remote
set-head" change the config to "warn".

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 02:59:40 +09:00
Bence Ferdinandy 9e2b7005be fetch set_head: add warn-if-not-$branch option
Currently if we want to have a remote/HEAD locally that is different
from the one on the remote, but we still want to get a warning if remote
changes HEAD, our only option is to have an indiscriminate warning with
"follow_remote_head" set to "warn". Add a new option
"warn-if-not-$branch", where $branch is a branch name we do not wish to
get a warning about. If the remote HEAD is $branch do not warn,
otherwise, behave as "warn".

E.g. let's assume, that our remote origin has HEAD
set to "master", but locally we have "git remote set-head origin seen".
Setting 'remote.origin.followRemoteHEAD = "warn"' will always print
a warning, even though the remote has not changed HEAD from "master".
Setting 'remote.origin.followRemoteHEAD = "warn-if-not-master" will
squelch the warning message, unless the remote changes HEAD from
"master". Note, that should the remote change HEAD to "seen" (which we
have locally), there will still be no warning.

Improve the advice message in report_set_head to also include silencing
the warning message with "warn-if-not-$branch".

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 02:59:40 +09:00
Bence Ferdinandy ad739f525e fetch set_head: move warn advice into advise_if_enabled
Advice about what to do when getting a warning is typed out explicitly
twice and is printed as regular output. The output is also tested for.
Extract the advice message into a single place and use a wrapper
function, so if later the advice is made more chatty the signature only
needs to be changed in once place. Remove the testing for the advice
output in the tests.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 02:59:16 +09:00
Karthik Nayak 2fed09aa9b midx-write: pass down repository to `write_midx_file[_only]`
In a previous commit, we passed the repository field to all
subcommands in the `builtin/` directory. Utilize this to pass the
repository field down to the `write_midx_file[_only]` functions to
remove the usage of `the_repository` global variables.

With this, all usage of global variables in `midx-write.c` is removed,
hence, remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:20 +09:00
Junio C Hamano aaafb67ba9 Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands' into kn/midx-wo-the-repository
* kn/pass-repo-to-builtin-sub-sub-commands:
  builtin: pass repository to sub commands
  Git 2.47.1
  Makefile(s): avoid recipe prefix in conditional statements
  doc: switch links to https
  doc: update links to current pages
  The eleventh batch
  pack-objects: only perform verbatim reuse on the preferred pack
  t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
  test-lib: move malloc-debug setup after $PATH setup
  builtin/difftool: intialize some hashmap variables
  refspec: store raw refspecs inside refspec_item
  refspec: drop separate raw_nr count
  fetch: adjust refspec->raw_nr when filtering prefetch refspecs
  test-lib: check malloc debug LD_PRELOAD before using
2024-12-04 10:32:02 +09:00
Junio C Hamano 33833ed08b Merge branch 'kn/the-repository' into kn/midx-wo-the-repository
* kn/the-repository:
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
2024-12-04 10:31:46 +09:00
Junio C Hamano 1e18cf4310 Merge branch 'kn/pass-repo-to-builtin-sub-sub-commands'
Built-in Git subcommands are supplied the repository object to work
with; they learned to do the same when they invoke sub-subcommands.

* kn/pass-repo-to-builtin-sub-sub-commands:
  builtin: pass repository to sub commands
2024-12-04 10:14:47 +09:00
Junio C Hamano 57e81b59f3 Merge branch 'sj/ref-contents-check'
"git fsck" learned to issue warnings on "curiously formatted" ref
contents that have always been taken valid but something Git
wouldn't have written itself (e.g., missing terminating end-of-line
after the full object name).

* sj/ref-contents-check:
  ref: add symlink ref content check for files backend
  ref: check whether the target of the symref is a ref
  ref: add basic symref content check for files backend
  ref: add more strict checks for regular refs
  ref: port git-fsck(1) regular refs check for files backend
  ref: support multiple worktrees check for refs
  ref: initialize ref name outside of check functions
  ref: check the full refname instead of basename
  ref: initialize "fsck_ref_report" with zero
2024-12-04 10:14:42 +09:00
Junio C Hamano 7ee055b237 Merge branch 'ps/ref-backend-migration-optim'
The migration procedure between two ref backends has been optimized.

* ps/ref-backend-migration-optim:
  reftable: rename scratch buffer
  refs: adapt `initial_transaction` flag to be unsigned
  reftable/block: optimize allocations by using scratch buffer
  reftable/block: rename `block_writer::buf` variable
  reftable/writer: optimize allocations by using a scratch buffer
  refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
  refs: skip collision checks in initial transactions
  refs: use "initial" transaction semantics to migrate refs
  refs/files: support symbolic and root refs in initial transaction
  refs: introduce "initial" transaction flag
  refs/files: move logic to commit initial transaction
  refs: allow passing flags when setting up a transaction
2024-12-04 10:14:41 +09:00
Junio C Hamano a5dd262a75 Merge branch 'ps/leakfixes-part-10'
Leakfixes.

* ps/leakfixes-part-10: (27 commits)
  t: remove TEST_PASSES_SANITIZE_LEAK annotations
  test-lib: unconditionally enable leak checking
  t: remove unneeded !SANITIZE_LEAK prerequisites
  t: mark some tests as leak free
  t5601: work around leak sanitizer issue
  git-compat-util: drop now-unused `UNLEAK()` macro
  global: drop `UNLEAK()` annotation
  t/helper: fix leaking commit graph in "read-graph" subcommand
  builtin/branch: fix leaking sorting options
  builtin/init-db: fix leaking directory paths
  builtin/help: fix leaks in `check_git_cmd()`
  help: fix leaking return value from `help_unknown_cmd()`
  help: fix leaking `struct cmdnames`
  help: refactor to not use globals for reading config
  builtin/sparse-checkout: fix leaking sanitized patterns
  split-index: fix memory leak in `move_cache_to_base_index()`
  git: refactor builtin handling to use a `struct strvec`
  git: refactor alias handling to use a `struct strvec`
  strvec: introduce new `strvec_splice()` function
  line-log: fix leak when rewriting commit parents
  ...
2024-12-04 10:14:40 +09:00
Junio C Hamano 2f605347da Merge branch 'ps/gc-stale-lock-warning'
Give a bit of advice/hint message when "git maintenance" stops finding a
lock file left by another instance that still is potentially running.

* ps/gc-stale-lock-warning:
  t7900: fix host-dependent behaviour when testing git-maintenance(1)
  builtin/gc: provide hint when maintenance hits a stale schedule lock
2024-12-04 10:14:37 +09:00
Karthik Nayak d284713bae config: make `packed_git_(limit|window_size)` non-global variables
The variables `packed_git_window_size` and `packed_git_limit` are global
config variables used in the `packfile.c` file. Since it is only used in
this file, let's change it from being a global config variable to a
local variable for the subsystem.

With this, we rid `packfile.c` from all global variable usage and this
means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from
the file.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:55 +09:00
Karthik Nayak d6b2d21fbf config: make `delta_base_cache_limit` a non-global variable
The `delta_base_cache_limit` variable is a global config variable used
by multiple subsystems. Let's make this non-global, by adding this
variable independently to the subsystems where it is used.

First, add the setting to the `repo_settings` struct, this provides
access to the config in places where the repository is available. Use
this in `packfile.c`.

In `index-pack.c` we add it to the `pack_idx_option` struct and its
constructor. While the repository struct is available here, it may not
be set  because `git index-pack` can be used without a repository.

In `gc.c` add it to the `gc_config` struct and also the constructor
function. The gc functions currently do not have direct access to a
repository struct.

These changes are made to remove the usage of `delta_base_cache_limit`
as a global variable in `packfile.c`. This brings us one step closer to
removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
which we complete in the next patch.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:55 +09:00
Karthik Nayak c87910b96b packfile: pass down repository to `for_each_packed_object`
The function `for_each_packed_object` currently relies on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from this
function and closely related function `is_promisor_object`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:54 +09:00
Karthik Nayak cc656f4eb2 packfile: pass down repository to `has_object[_kept]_pack`
The functions `has_object[_kept]_pack` currently rely on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from these
functions and any related ones.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:54 +09:00
Karthik Nayak 873b00597b packfile: pass down repository to `odb_pack_name`
The function `odb_pack_name` currently relies on the global variable
`the_repository`. To eliminate global variable usage in `packfile.c`, we
should progressively shift the dependency on the_repository to higher
layers.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:54 +09:00
Karthik Nayak 2cf3fe63f6 packfile: add repository to struct `packed_git`
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.

We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently anyways use 'the_repository', we should be
OK with this change.

We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callee to pass the repository object down the
levels.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:53 +09:00
Junio C Hamano 93e5e048f8 refs: drop strbuf_ prefix from helpers
The helper functions (strbuf_branchname, strbuf_check_branch_ref,
and strbuf_check_tag_ref) are about handling branch and tag names,
and it is a non-essential fact that these functions use strbuf to
hold these names.  Rename them to make it clarify that these are
more about "ref".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-03 12:38:49 +09:00
Junio C Hamano 5bcbde9e49 refs: move ref name helpers around
strbuf_branchname(), strbuf_check_{branch,tag}_ref() are helper
functions to deal with branch and tag names, and the fact that they
happen to use strbuf to hold the name of a branch or a tag is not
essential.  These functions fit better in the refs API than strbuf
API, the latter of which is about string manipulations.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-03 12:38:49 +09:00
Elijah Newren da91a90c2f fast-import: disallow more path components
Instead of just disallowing '.' and '..', make use of verify_path() to
ensure that fast-import will disallow anything we wouldn't allow into
the index, such as anything under .git/, .gitmodules as a symlink, or
a dos drive prefix on Windows.

Since a few fast-export and fast-import tests that tried to stress-test
the correct handling of quoting relied on filenames that fail
is_valid_win32_path(), such as spaces or periods at the end of filenames
or backslashes within the filename, turn off core.protectNTFS for those
tests to ensure they keep passing.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 10:09:48 +09:00
Bence Ferdinandy b7f7d16562 fetch: add configuration for set_head behaviour
In the current implementation, if refs/remotes/$remote/HEAD does not
exist, running fetch will create it, but if it does exist it will not do
anything, which is a somewhat safe and minimal approach. Unfortunately,
for users who wish to NOT have refs/remotes/$remote/HEAD set for any
reason (e.g. so that `git rev-parse origin` doesn't accidentally point
them somewhere they do not want to), there is no way to remove this
behaviour. On the other side of the spectrum, users may want fetch to
automatically update HEAD or at least give them a warning if something
changed on the remote.

Introduce a new setting, remote.$remote.followRemoteHEAD with four
options:

    - "never": do not ever do anything, not even create
    - "create": the current behaviour, now the default behaviour
    - "warn": print a message if remote and local HEAD is different
    - "always": silently update HEAD on every change

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 09:55:17 +09:00
Caleb White e6df1ee2c1 worktree: add relative cli/config options to `repair` command
This teaches the `worktree repair` command to respect the
`--[no-]relative-paths` CLI option and `worktree.useRelativePaths`
config setting. If an existing worktree with an absolute path is repaired
with `--relative-paths`, the links will be replaced with relative paths,
even if the original path was correct. This allows a user to covert
existing worktrees between absolute/relative as desired.

To simplify things, both linking files are written when one of the files
needs to be repaired. In some cases, this fixes the other file before it
is checked, in other cases this results in a correct file being written
with the same contents.

Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 09:36:17 +09:00
Caleb White 298d2917e2 worktree: add relative cli/config options to `move` command
This teaches the `worktree move` command to respect the
`--[no-]relative-paths` CLI option and `worktree.useRelativePaths`
config setting. If an existing worktree is moved with `--relative-paths`
the new path will be relative (and visa-versa).

Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 09:36:17 +09:00
Caleb White b7016344f1 worktree: add relative cli/config options to `add` command
This introduces the `--[no-]relative-paths` CLI option and
`worktree.useRelativePaths` configuration setting to the `worktree add`
command. When enabled these options allow worktrees to be linked using
relative paths, enhancing portability across environments where absolute
paths may differ (e.g., containerized setups, shared network drives).
Git still creates absolute paths by default, but these options allow
users to opt-in to relative paths if desired.

The t2408 test file is removed and more comprehensive tests are
written for the various worktree operations in their own files.

Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-02 09:36:17 +09:00
Justin Tobler 87c01003cd bundle: add bundle verification options type
When `unbundle()` is invoked, fsck verification may be configured by
passing the `VERIFY_BUNDLE_FSCK` flag. This mechanism allows fsck checks
on the bundle to be enabled or disabled entirely. To facilitate more
fine-grained fsck configuration, additional context must be provided to
`unbundle()`.

Introduce the `unbundle_opts` type, which wraps the existing
`verify_bundle_flags`, to facilitate future extension of `unbundle()`
configuration. Also update `unbundle()` and its call sites to accept
this new options type instead of the flags directly. The end behavior is
functionally the same, but allows for the set of configurable options to
be extended. This is leveraged in a subsequent commit to enable fsck
message severity configuration.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28 12:07:57 +09:00
Junio C Hamano 761e62a09a Merge branch 'bf/set-head-symref' into bf/fetch-set-head-config
* bf/set-head-symref:
  fetch set_head: handle mirrored bare repositories
  fetch: set remote/HEAD if it does not exist
  refs: add create_only option to refs_update_symref_extended
  refs: add TRANSACTION_CREATE_EXISTS error
  remote set-head: better output for --auto
  remote set-head: refactor for readability
  refs: atomically record overwritten ref in update_symref
  refs: standardize output of refs_read_symbolic_ref
  t/t5505-remote: test failure of set-head
  t/t5505-remote: set default branch to main
2024-11-27 22:49:05 +09:00
Junio C Hamano 1f3d9b9814 Merge branch 'jt/index-pack-allow-promisor-only-while-fetching'
We now ensure "index-pack" is used with the "--promisor" option
only during a "git fetch".

* jt/index-pack-allow-promisor-only-while-fetching:
  index-pack: teach --promisor to forbid pack name
2024-11-27 07:57:09 +09:00
Junio C Hamano 8eaa06590f Merge branch 'en/fast-import-avoid-self-replace'
"git fast-import" can be tricked into a replace ref that maps an
object to itself, which is a useless thing to do.

* en/fast-import-avoid-self-replace:
  fast-import: avoid making replace refs point to themselves
2024-11-27 07:57:08 +09:00
Junio C Hamano 93905d3b70 Merge branch 'bc/c23'
C23 compatibility updates.

* bc/c23:
  reflog: rename unreachable
  index-pack: rename struct thread_local
2024-11-27 07:57:05 +09:00
Karthik Nayak 6f33d8e255 builtin: pass repository to sub commands
In 9b1cb5070f (builtin: add a repository parameter for builtin
functions, 2024-09-13) the repository was passed down to all builtin
commands. This allowed the repository to be passed down to lower layers
without depending on the global `the_repository` variable.

Continue this work by also passing down the repository parameter from
the command to sub-commands. This will help pass down the repository to
other subsystems and cleanup usage of global variables like
'the_repository' and 'the_hash_algo'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 10:36:08 +09:00
Elijah Newren 4a2790a257 fast-import: disallow "." and ".." path components
If a user specified e.g.
   M 100644 :1 ../some-file
then fast-import previously would happily create a git history where
there is a tree in the top-level directory named "..", and with a file
inside that directory named "some-file".  The top-level ".." directory
causes problems.  While git checkout will die with errors and fsck will
report hasDotdot problems, the user is going to have problems trying to
remove the problematic file.  Simply avoid creating this bad history in
the first place.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26 10:30:04 +09:00
Junio C Hamano 058c36aa26 Merge branch 'kh/checkout-ignore-other-docfix' into maint-2.47
Doc updates.

* kh/checkout-ignore-other-docfix:
  checkout: refer to other-worktree branch, not ref
2024-11-25 12:29:45 +09:00
Junio C Hamano e52276d340 Merge branch 'jh/config-unset-doc-fix' into maint-2.47
Docfix.

* jh/config-unset-doc-fix:
  git-config.1: remove value from positional args in unset usage
2024-11-25 12:29:40 +09:00
Bence Ferdinandy b1b713f722 fetch set_head: handle mirrored bare repositories
When adding a remote to bare repository with "git remote add --mirror",
running fetch will fail to update HEAD to the remote's HEAD, since it
does not know how to handle bare repositories. On the other hand HEAD
already has content, since "git init --bare" has already set HEAD to
whatever is the default branch set for the user. Unless this - by chance
- is the same as the remote's HEAD, HEAD will be pointing to a bad
symref. Teach set_head to handle bare repositories, by overwriting HEAD
so it mirrors the remote's HEAD.

Note, that in this case overriding the local HEAD reference is
necessary, since HEAD will exist before fetch can be run, but this
should not be an issue, since the whole purpose of --mirror is to be an
exact mirror of the remote, so following any changes to HEAD makes
sense.

Also note, that although "git remote set-head" also fails when trying to
update the remote's locally tracked HEAD in a mirrored bare repository,
the usage of the command does not make much sense after this patch:
fetch will update the remote HEAD correctly, and setting it manually to
something else is antithetical to the concept of mirroring.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 11:46:37 +09:00
Bence Ferdinandy 3f763ddf28 fetch: set remote/HEAD if it does not exist
When cloning a repository remote/HEAD is created, but when the user
creates a repository with git init, and later adds a remote, remote/HEAD
is only created if the user explicitly runs a variant of "remote
set-head". Attempt to set remote/HEAD during fetch, if the user does not
have it already set. Silently ignore any errors.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 11:46:37 +09:00
Bence Ferdinandy 9963746c84 refs: add create_only option to refs_update_symref_extended
Allow the caller to specify that it only wants to update the symref if
it does not already exist. Silently ignore the error from the
transaction API if the symref already exists.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 11:46:36 +09:00
Bence Ferdinandy dfe86fa06b remote set-head: better output for --auto
Currently, set-head --auto will print a message saying "remote/HEAD set
to branch", which implies something was changed.

Change the output of --auto, so the output actually reflects what was
done: a) set a previously unset HEAD, b) change HEAD because remote
changed or c) no updates. As edge cases, if HEAD is changed from
a previous symbolic reference that was not a remote branch, explicitly
call attention to this fact, and also notify the user if the previous
reference was not a symbolic reference.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 11:46:36 +09:00
Bence Ferdinandy 4f07c45e25 remote set-head: refactor for readability
Make two different readability refactors:

Rename strbufs "buf" and "buf2" to something more explanatory.

Instead of calling get_main_ref_store(the_repository) multiple times,
call it once and store the result in a new refs variable. Although this
change probably offers some performance benefits, the main purpose is to
shorten the line lengths of function calls using this variable.

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 11:46:35 +09:00
Bence Ferdinandy 2fd5555895 t/t5505-remote: test failure of set-head
The test coverage was missing a test for the failure branch of remote
set-head auto's output. Add the missing text and while we are at it,
correct a small grammatical mistake in the error's output ("setup" is
the noun, "set up" is the verb).

Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-25 11:46:34 +09:00
Junio C Hamano 0a83b39594 Merge branch 'tb/multi-pack-reuse-dupfix'
Object reuse code based on multi-pack-index sent an unwanted copy
of object.

* tb/multi-pack-reuse-dupfix:
  pack-objects: only perform verbatim reuse on the preferred pack
  t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
2024-11-22 14:34:19 +09:00
Junio C Hamano 76bb16db5c Merge branch 'sm/difftool'
Use of some uninitialized variables in "git difftool" has been
corrected.

* sm/difftool:
  builtin/difftool: intialize some hashmap variables
2024-11-22 14:34:18 +09:00
Junio C Hamano aa1d4b42e5 Merge branch 'jk/fetch-prefetch-double-free-fix'
Double-free fix.

* jk/fetch-prefetch-double-free-fix:
  refspec: store raw refspecs inside refspec_item
  refspec: drop separate raw_nr count
  fetch: adjust refspec->raw_nr when filtering prefetch refspecs
2024-11-22 14:34:17 +09:00
Patrick Steinhardt d91a9db33c global: drop `UNLEAK()` annotation
There are two users of `UNLEAK()` left in our codebase:

  - In "builtin/clone.c", annotating the `repo` variable. That leak has
    already been fixed though as you can see in the context, where we do
    know to free `repo_to_free`.

  - In "builtin/diff.c", to unleak entries of the `blob[]` array. That
    leak has also been fixed, because the entries we assign to that
    array come from `rev.pending.objects`, and we do eventually release
    `rev`.

This neatly demonstrates one of the issues with `UNLEAK()`: it is quite
easy for the annotation to become stale. A second issue is that its
whole intent is to paper over leaks. And while that has been a necessary
evil in the past, because Git was leaking left and right, it isn't
really much of an issue nowadays where our test suite has no known leaks
anymore.

Remove the last two users of this macro.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:46 +09:00
Patrick Steinhardt b97301c13c builtin/branch: fix leaking sorting options
The sorting options are leaking, but given that they are marked with
`UNLEAK()` the leak sanitizer doesn't complain.

Fix the leak by creating a common exit path and clearing the vector such
that we can get rid of the `UNLEAK()` annotation entirely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:45 +09:00
Patrick Steinhardt 8ef15c205b builtin/init-db: fix leaking directory paths
We've got a couple of leaking directory paths in git-init(1), all of
which are marked with `UNLEAK()`. Fixing them is trivial, so let's do
that instead so that we can get rid of `UNLEAK()` entirely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:45 +09:00
Patrick Steinhardt 2379b5c900 builtin/help: fix leaks in `check_git_cmd()`
The `check_git_cmd()` function is declared to return a string constant.
And while it sometimes does return a constant, it may also return an
allocated string in two cases:

  - When handling aliases. This case is already marked with `UNLEAK()`
    to work around the leak.

  - When handling unknown commands in case "help.autocorrect" is
    enabled. This one is not marked with `UNLEAK()`.

The function only has a single caller, so let's fix its return type to
be non-constant, consistently return an allocated string and free it at
its callsite to plug the leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:44 +09:00
Patrick Steinhardt 58e7568c61 builtin/sparse-checkout: fix leaking sanitized patterns
Both `git sparse-checkout add` and `git sparse-checkout set` accept a
list of additional directories or patterns. These get massaged via calls
to `sanitize_paths()`, which may end up modifying the passed-in array by
updating its pointers to be prefixed paths. This allocates memory that
we never free.

Refactor the code to instead use a `struct strvec`, which makes it way
easier for us to track the lifetime correctly. The couple of extra
memory allocations likely do not matter as we only ever populate it with
command line arguments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:43 +09:00
Patrick Steinhardt 65a1b7e2bd builtin/blame: fix leaking blame entries with `--incremental`
When passing `--incremental` to git-blame(1) we exit early by jumping to
the `cleanup` label. But some of the cleanups we perform are handled
between the `goto` and its label, and thus we leak the data.

Move the cleanups after the `cleanup` label. While at it, move the logic
to free the scoreboard's `final_buf` into `cleanup_scoreboard()` and
drop its `const` declaration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:23:40 +09:00
shejialuo 7c78d819e6 ref: support multiple worktrees check for refs
We have already set up the infrastructure to check the consistency for
refs, but we do not support multiple worktrees. However, "git-fsck(1)"
will check the refs of worktrees. As we decide to get feature parity
with "git-fsck(1)", we need to set up support for multiple worktrees.

Because each worktree has its own specific refs, instead of just showing
the users "refs/worktree/foo", we need to display the full name such as
"worktrees/<id>/refs/worktree/foo". So we should know the id of the
worktree to get the full name. Add a new parameter "struct worktree *"
for "refs-internal.h::fsck_fn". Then change the related functions to
follow this new interface.

The "packed-refs" only exists in the main worktree, so we should only
check "packed-refs" in the main worktree. Use "is_main_worktree" method
to skip checking "packed-refs" in "packed_fsck" function.

Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
"worktree/<id>/" prefix when we are not in the main worktree.

Last, add a new test to check the refname when there are multiple
worktrees to exercise the code.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 08:21:32 +09:00
Patrick Steinhardt 1c299d03e5 refs: introduce "initial" transaction flag
There are two different ways to commit a transaction:

  - `ref_transaction_commit()` can be used to commit a regular
    transaction and is what almost every caller wants.

  - `initial_ref_transaction_commit()` can be used when it is known that
    the ref store that the transaction is committed for is empty and
    when there are no concurrent processes. This is used when cloning a
    new repository.

Implementing this via two separate functions has a couple of downsides.
First, every reference backend needs to implement a separate callback
even in the case where they don't special-case the initial transaction.
Second, backends are basically forced to reimplement the whole logic for
how to commit the transaction like the "files" backend does, even though
backends may wish to only tweak certain behaviour of a "normal" commit.
Third, it is awkward that callers must never prepare the transaction as
this is somewhat different than how a transaction typically works.

Refactor the code such that we instead mark initial transactions via a
separate flag when starting the transaction. This addresses all of the
mentioned painpoints, where the most important part is that it will
allow backends to have way more leeway in how exactly they want to
handle the initial transaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
Patrick Steinhardt a0efef1446 refs: allow passing flags when setting up a transaction
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.

Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:14 +09:00
Junio C Hamano 38e4df6615 Merge branch 'la/trailer-info'
Renaming a handful of variables and structure fields.

* la/trailer-info:
  trailer: spread usage of "trailer_block" language
2024-11-20 14:47:17 +09:00
Junio C Hamano 0c11ef1356 Merge branch 'jt/repack-local-promisor'
"git gc" discards any objects that are outside promisor packs that
are referred to by an object in a promisor pack, and we do not
refetch them from the promisor at runtime, resulting an unusable
repository.  Work it around by including these objects in the
referring promisor pack at the receiving end of the fetch.

* jt/repack-local-promisor:
  index-pack: repack local links into promisor packs
  t5300: move --window clamp test next to unclamped
  t0410: use from-scratch server
  t0410: make test description clearer
2024-11-20 14:47:16 +09:00
Junio C Hamano cc53ddf7f0 Merge branch 'db/submodule-fetch-with-remote-name-fix' into maint-2.47
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-11-20 14:43:00 +09:00
Junio C Hamano 76c1953395 Merge branch 'ps/maintenance-start-crash-fix' into maint-2.47
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-11-20 14:42:58 +09:00
Junio C Hamano f1a50f12b9 Merge branch 'jk/fsmonitor-event-listener-race-fix' into maint-2.47
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened.  This problem has been corrected.

* jk/fsmonitor-event-listener-race-fix:
  fsmonitor: initialize fs event listener before accepting clients
  simple-ipc: split async server initialization and running
2024-11-20 14:42:57 +09:00
Jonathan Tan 1f2be8bed6 index-pack: teach --promisor to forbid pack name
Currently,

 - Running "index-pack --promisor" outside a repo segfaults.
 - It may be confusing to a user that running "index-pack --promisor"
   within a repo may make changes to the repo's object DB, especially
   since the packs indexed by the index-pack invocation may not even be
   related to the repo.

As discussed in [1] and [2], teaching --promisor to forbid a packfile
name solves both these problems. This combination of arguments requires
a repo (since we are writing the resulting .pack and .idx to it) and it
is clear that the files are related to the repo.

Currently, Git uses "index-pack --promisor" only when fetching into
a repo, so it could be argued that we should teach "index-pack" a
new argument (say, "--fetching-mode") instead of tying --promisor to
a generic argument like the packfile name. However, this --promisor
feature could conceivably be used whenever we have a packfile that is
known to come from the promisor remote (whether obtained through Git's
fetch protocol or through other means) so not using a new argument seems
reasonable - one could envision a user-made script obtaining a packfile
and then running "index-pack --promisor --stdin", for example. In fact,
it might be possible to relax the restriction further (say, by also
allowing --promisor when indexing a packfile that is in the object DB),
but relaxing the restriction is backwards-compatible so we can revisit
that later.

One thing to watch out for is the possibility of a future Git feature
that indexes a pack in the context of a repo, but does not necessarily
write the resulting pack to it (and does not necessarily desire to
make any changes to the object DB). One such feature would be fetch
quarantine, which might need the repo context in order to detect
hash collisions, but would also need to ensure that the object DB
is undisturbed in case the fetch fails for whatever reason, even if
the reason occurs only after the indexing is complete. It may not be
obvious to the implementer of such a feature that "index-pack" could
sometimes write packs other than the indexed pack to the object DB,
but there are already other ways that "fetch" could write to the object
DB (in particular, packfile URIs and bundle URIs), so hopefully the
implementation of this future feature would already include a test that
the object DB be undisturbed.

This change requires the change to t5300 by 1f52cdfacb (index-pack:
document and test the --promisor option, 2022-03-09) to be undone.
(--promisor is already tested indirectly, so we don't need the explicit
test here any more.)

[1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 10:37:56 +09:00
Patrick Steinhardt 656ca9204a builtin/gc: provide hint when maintenance hits a stale schedule lock
When running scheduled maintenance via `git maintenance start`, we
acquire a lockfile to ensure that no other scheduled maintenance task is
running in the repository concurrently. If so, we do provide an error to
the user hinting that another process seems to be running in this repo.

There are two important cases why such a lockfile may exist:

  - An actual git-maintenance(1) process is still running in this
    repository.

  - An earlier process may have crashed or was interrupted part way
    through and has left a stale lockfile behind.

In c95547a394 (builtin/gc: fix crash when running `git maintenance
start`, 2024-10-10), we have fixed an issue where git-maintenance(1)
would crash with the "start" subcommand, and the underlying bug causes
the second scenario to trigger quite often now.

Most users don't know how to get out of that situation again though.
Ideally, we'd be removing the stale lock for our users automatically.
But in the context of repository maintenance this is rather risky, as it
can easily run for hours or even days. So finding a clear point where we
know that the old process has exited is basically impossible.

We have the same issue in other subsystems, e.g. when locking refs. Our
lockfile interfaces thus provide the `unable_to_lock_message()` function
for exactly this purpose: it provides a nice hint to the user that
explains what is going on and how to get out of that situation again by
manually removing the file.

Adapt git-maintenance(1) to print a similar hint. While we could use the
above function, we can provide a bit more context as we know exactly
what kind of process would create the lockfile.

Reported-by: Miguel Rincon Barahona <mrincon@gitlab.com>
Reported-by: Kev Kloss <kkloss@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 10:26:12 +09:00
Elijah Newren 5e904f1a4a fast-import: avoid making replace refs point to themselves
If someone replaces a commit with a modified version, then builds on
that commit, and then later decides to rewrite history in a format like

    git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import

and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the
replacement did, then at the end you'd get a replace ref that points to
itself.  For example:

    $ git show-ref | grep replace
    fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264

Git commands which pay attention to replace refs will die with an error
when a self-referencing replace ref is present:

    $ git log
    fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264

Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing.  Unless users specify
--quiet, warn them when we delete such a replace ref.

Two notes about this patch:
  * We are not ignoring the problematic update of the replace ref
    (turning it into a no-op), we are replacing the update with a delete.
    The logic here is that if the repository had a value for the replace
    ref before fast-import was run, and the replace ref was explicitly
    named in the fast-import stream, we don't want the replace ref to be
    left with a pre-fast-import value.
  * While loops with more than one element (e.g. refs/replace/A points
    to B, and refs/replace/B points to A) are possible, they seem much
    less plausible.  It is pretty easy to create a sequence of
    git-filter-repo commands that will trigger a self-referencing replace
    ref, but I do not know how to trigger a scenario with a cycle length
    greater than 1.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 09:39:33 +09:00
brian m. carlson e8b3bcf491 index-pack: rename struct thread_local
"thread_local" is a keyword in C23.  To make sure that our code compiles
on a wide variety of C versions, rename struct thread_local to "struct
thread_local_data" to avoid a conflict.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:42:08 +09:00
brian m. carlson 0ffb5a6bf1 Allow cloning from repositories owned by another user
Historically, Git has allowed users to clone from an untrusted
repository, and we have documented that this is safe to do so:

    `upload-pack` tries to avoid any dangerous configuration options or
    hooks from the repository it's serving, making it safe to clone an
    untrusted directory and run commands on the resulting clone.

However, this was broken by f4aa8c8bb1 ("fetch/clone: detect dubious
ownership of local repositories", 2024-04-10) in an attempt to make
things more secure.  That change resulted in a variety of problems when
cloning locally and over SSH, but it did not change the stated security
boundary.  Because the security boundary has not changed, it is safe to
adjust part of the code that patch introduced.

To do that and restore the previous functionality, adjust enter_repo to
take two flags instead of one.

The two bits are

 - ENTER_REPO_STRICT: callers that require exact paths (as opposed
   to allowing known suffixes like ".git", ".git/.git" to be
   omitted) can set this bit.  Corresponds to the "strict" parameter
   that the flags word replaces.

 - ENTER_REPO_ANY_OWNER_OK: callers that are willing to run without
   ownership check can set this bit.

The former is --strict-paths option of "git daemon".  The latter is
set only by upload-pack, which honors the claimed security boundary.

Note that local clones across ownership boundaries require --no-local so
that upload-pack is used.  Document this fact in the manual page and
provide an example.

This patch was based on one written by Junio C Hamano.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 11:05:06 +09:00
Taylor Blau e199290592 pack-objects: only perform verbatim reuse on the preferred pack
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects whole eword_t's at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.

This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.

To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.

If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:

 - include the object twice, assuming that the caller wants Y in the
   pack, or

 - include the object once, resulting in us packing more objects than
   necessary.

This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.

Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.

This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).

The only significant changes from the pre-ca0fd69e37 implementation are:

 - that we can no longer inspect words up to the end of
   reuse_packfile_bitmap->word_alloc, since we only want to look at
   words whose bits all correspond to objects in the given packfile, and

 - that we return early when given a reuse_packfile which is not
   preferred, making the call a noop.

In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 09:13:31 +09:00
Junio C Hamano 51ba601160 Merge branch 'en/shallow-exclude-takes-a-ref-fix'
The "--shallow-exclude=<ref>" option to various history transfer
commands takes a ref, not an arbitrary revision.

* en/shallow-exclude-takes-a-ref-fix:
  doc: correct misleading descriptions for --shallow-exclude
  upload-pack: fix ambiguous error message
2024-11-13 08:35:32 +09:00
Junio C Hamano 6890c99e38 Merge branch 'ps/leakfixes-part-9'
More leakfixes.

* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-13 08:35:31 +09:00
Simon Marchi 98e4015593 builtin/difftool: intialize some hashmap variables
When running a dir-diff command that produces no diff, variables
`wt_modified` and `tmp_modified` are used while uninitialized, causing:

    $ /home/smarchi/src/git/git-difftool --dir-diff master
    free(): invalid pointer
    [1]    334004 IOT instruction (core dumped)  /home/smarchi/src/git/git-difftool --dir-diff master
    $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
    ...
    Invalid free() / delete / delete[] / realloc()
       at 0x48478EF: free (vg_replace_malloc.c:989)
       by 0x422CAC: hashmap_clear_ (hashmap.c:208)
       by 0x283830: run_dir_diff (difftool.c:667)
       by 0x284103: cmd_difftool (difftool.c:801)
       by 0x238E0F: run_builtin (git.c:484)
       by 0x2392B9: handle_builtin (git.c:750)
       by 0x2399BC: cmd_main (git.c:921)
       by 0x356FEF: main (common-main.c:64)
     Address 0x1ffefff180 is on thread 1's stack
     in frame #2, created by run_dir_diff (difftool.c:358)
    ...

If taking any `goto finish` path before these variables are initialized,
`hashmap_clear_and_free()` operates on uninitialized data, sometimes
causing a crash.

This regression was introduced in 7f795a1715 (builtin/difftool: plug
several trivial memory leaks, 2024-09-26).

Fix it by initializing those variables with the `HASHMAP_INIT` macro.

Add a test comparing the main branch to itself, resulting in no diff.

Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13 08:11:19 +09:00
Jeff King fe17a25905 refspec: store raw refspecs inside refspec_item
The refspec struct keeps two matched arrays: one for the refspec_item
structs and one for the original raw refspec strings. The main reason
for this is that there are other users of refspec_item that do not care
about the raw strings. But it does make managing the refspec struct
awkward, as we must keep the two arrays in sync. This has led to bugs in
the past (both leaks and double-frees).

Let's just store a copy of the raw refspec string directly in each
refspec_item struct. This simplifies the handling at a small cost:

  1. Direct callers of refspec_item_init() will now get an extra copy of
     the refspec string, even if they don't need it. This should be
     negligible, as the struct is already allocating two strings for the
     parsed src/dst values (and we tend to only do it sparingly anyway
     for things like the TAG_REFSPEC literal).

  2. Users of refspec_appendf() will now generate a temporary string,
     copy it, and then free the result (versus handing off ownership of
     the temporary string). We could get around this by having a "nodup"
     variant of refspec_item_init(), but it doesn't seem worth the extra
     complexity for something that is not remotely a hot code path.

Code which accesses refspec->raw now needs to look at refspec->item.raw.
Other callers which just use refspec_item directly can remain the same.
We'll free the allocated string in refspec_item_clear(), which they
should be calling anyway to free src/dst.

One subtle note: refspec_item_init() can return an error, in which case
we'll still have set its "raw" field. But that is also true of the "src"
and "dst" fields, so any caller which does not _clear() the failed item
is already potentially leaking. In practice most code just calls die()
on an error anyway, but you can see the exception in valid_fetch_refspec(),
which does correctly call _clear() even on error.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:48 +09:00
Jeff King d36af33081 refspec: drop separate raw_nr count
A refspec struct contains zero or more refspec_item structs, along with
matching "raw" strings. The items and raw strings are kept in separate
arrays, but those arrays will always have the same length (because we
write them only via refspec_append_nodup(), which grows both). This can
lead to bugs when manipulating the array, since the arrays and lengths
must be modified in lockstep. For example, the bug fixed in the previous
commit, which forgot to decrement raw_nr.

So let's get rid of "raw_nr" and have only "nr", making this kind of bug
impossible (and also making it clear that the two are always matched,
something that existing code already assumed but was not guaranteed by
the interface).

Even though we'd expect "alloc" and "raw_alloc" to likewise move in
lockstep, we still need to keep separate counts there if we want to
continue to use ALLOC_GROW() for both.

Conceptually this would all be simpler if refspec_item just held onto
its own raw string, and we had a single array. But there are callers
which use refspec_item outside of "struct refspec" (and so don't hold on
to a matching "raw" string at all), which we'd possibly need to adjust.
So let's not worry about refactoring that for now, and just get rid of
the redundant count variable. That is the first step on the road to
combining them anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:48 +09:00
Jeff King b970509c59 fetch: adjust refspec->raw_nr when filtering prefetch refspecs
In filter_prefetch_refspecs(), we may remove one or more refspecs if
they point into refs/tags/. When we do, we remove the item from the
refspec->items array, shifting subsequent items down, and then decrement
the refspec->nr count.

We also remove the item from the refspec->raw array, but fail to
decrement refspec->raw_nr. This leaves us with a count that is too high,
and anybody looking at the "raw" array will erroneously see either:

  1. The removed entry, if there were no subsequent items to shift down.

  2. A duplicate of the final entry, as everything is shifted down but
     there was nothing to overwrite the final item.

The obvious culprit to run into this is calling refspec_clear(), which
will try to free the removed entry (case 1) or double-free the final
entry (case 2). But even though the bug has existed since the function
was added in 2e03115d0c (fetch: add --prefetch option, 2021-04-16), we
did not trigger it in the test suite. The --prefetch option is normally
only used with configured refspecs, and we never bother to call
refspec_clear() on those (they are stored as part of a struct remote,
which is held in a global variable).

But you could trigger case 2 manually like:

  git fetch --prefetch . refs/tags/foo refs/tags/bar

Ironically you couldn't trigger case 1, because the code accidentally
leaked the string in the raw array, and the two bugs (the leak and the
double-free) cancelled out. But when we fixed the leak in ea4780307c
(fetch: free "raw" string when shrinking refspec, 2024-09-24), it became
possible to trigger that, too, with a single item:

  git fetch --prefetch . refs/tags/foo

We can fix both cases by just correctly decrementing "raw_nr" when we
shrink the array. Even though we don't expect people to use --prefetch
with command-line refspecs, we'll add a test to make sure it behaves
well (like the test just before it, we're just confirming that the
filtered prefetch succeeds at all).

Reported-by: Eric Mills <ermills@epic.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:47 +09:00
Jonathan Tan c08589efdc index-pack: repack local links into promisor packs
Teach index-pack to, when processing the objects in a pack with
--promisor specified on the CLI, repack local objects (and the local
objects that they refer to, recursively) referenced by these objects
into promisor packs.

This prevents the situation in which, when fetching from a promisor
remote, we end up with promisor objects (newly fetched) referring
to non-promisor objects (locally created prior to the fetch). This
situation may arise if the client had previously pushed objects to the
remote, for example. One issue that arises in this situation is that,
if the non-promisor objects become inaccessible except through promisor
objects (for example, if the branch pointing to them has moved to
point to the promisor object that refers to them), then GC will garbage
collect them. There are other ways to solve this, but the simplest
seems to be to enforce the invariant that we don't have promisor objects
referring to non-promisor objects.

This repacking is done from index-pack to minimize the performance
impact. During a fetch, the only time most objects are fully inflated
in memory is when their object ID is computed, so we also scan the
objects (to see which objects they refer to) during this time.

Also to minimize the performance impact, an object is calculated to be
local if it's a loose object or present in a non-promisor pack. (If it's
also in a promisor pack or referred to by an object in a promisor pack,
it is technically already a promisor object. But a misidentification
of a promisor object as a non-promisor object is relatively benign
here - we will thus repack that promisor object into a promisor pack,
duplicating it in the object store, but there is no correctness issue,
just an issue of inefficiency.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 10:18:16 +09:00
Abhijeet Sonar 4da8d90fdd show-index: fix uninitialized hash function
In c8aed5e8da (repository: stop setting SHA1 as the default object
hash), we got rid of the default hash algorithm for the_repository.
Due to this change, it is now the responsibility of the callers to set
their own default when this is not present.

As stated in the docs, show-index should use SHA1 as the default hash
algorithm when run outside a repository. Make sure this promise is
met by falling back to SHA1 when the_hash_algo is not present (i.e.
when the command is run outside a repository). Also add a test that
verifies this behavior.

Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-11 12:16:59 +09:00
Junio C Hamano 02a2d5706d Merge branch 'jk/left-right-bitmap'
When called with '--left-right' and '--use-bitmap-index', 'rev-list'
will produce output without any left/right markers, which has been
corrected.

* jk/left-right-bitmap:
  rev-list: skip bitmap traversal for --left-right
2024-11-08 12:56:28 +09:00
Jeff King b8150bfee1 describe: stop traversing when we run out of names
When trying to describe a commit, we'll traverse from the commit,
collecting candidate tags that point to its ancestors. But once we've
seen all of the tags in the repo, there's no point in traversing
further. There's nothing left to find!

For a default "git describe", this isn't usually a big problem. In a
large repo you'll probably have multiple tags, so we'll eventually find
10 candidates (the default for max_candidates) and stop there. And in a
small repo, it's quick to traverse to the root.

But you can imagine a large repo with few tags. Or, as we saw in a real
world case, explicitly limiting the set of matches like this (on
linux.git):

  git describe --match=v6.12-rc4 HEAD

which goes all the way to the root before realizing that no, there are
no other tags under consideration besides the one we fed via --match.
If we add in "--candidates=1" there, it's much faster (at least as of
the previous commit).

But we should be able to speed this up without the user asking for it.
After expanding all matching tags, we know the total number of names. We
could just stop the traversal there, but as hinted at above we already
have a mechanism for doing that: the max_candidate limit. So we can just
reduce that limit to match the number of possible candidates.

Our p6100 test shows this off:

  Test                                           HEAD^             HEAD
  ---------------------------------------------------------------------------------------
  6100.2: describe HEAD                          0.71(0.65+0.06)   0.72(0.68+0.04) +1.4%
  6100.3: describe HEAD with one max candidate   0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
  6100.4: describe HEAD with one tag             0.72(0.66+0.05)   0.01(0.00+0.00) -98.6%

Now we are fast automatically, just as if --candidates=1 were supplied
by the user.

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Helped-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-07 13:28:22 +09:00
Jeff King 7379046221 describe: stop digging for max_candidates+1
By default, describe considers only 10 candidate matches, and stops
traversing when we have enough. This makes things much faster in a large
repository, where collecting all candidates requires walking all the way
down to the root (or at least to the oldest tag). This goes all the way
back to 8713ab3079 (Improve git-describe performance by reducing
revision listing., 2007-01-13).

However, we don't stop immediately when we have enough candidates. We
keep traversing and only bail when we find one more candidate that we're
ignoring. Usually this is not too expensive, if the tags are sprinkled
evenly throughout history. But if you are unlucky, you might hit the max
candidate quickly, and then have a huge swath of history before finding
the next one.

Our p6100 test has exactly this unlucky case: with a max of "1", we find
a recent tag quickly and then have to go all the way to the root to find
the old tag that will be discarded.

A more interesting real-world case is:

  git describe --candidates=1 --match=v6.12-rc4 HEAD

in the linux.git repo. There we restrict the set of tags to a single
one, so there is no older candidate to find at all! But despite
--candidates=1, we keep traversing to the root only to find nothing.

So why do we keep traversing after hitting thet max? There are two
reasons I can see:

  1. In theory the extra information that there was another candidate
     could be useful, and we record it in the gave_up_on variable. But
     we only show this information with --debug.

  2. After finding the candidate, there's more processing we do in our
     loop. The most important of this is propagating the "within" flags
     to our parent commits, and putting them in the commit_list we'll
     use for finish_depth_computation().

     That function continues the traversal until we've counted all
     commits reachable from the starting point but not reachable from
     our best candidate tag (so essentially counting "$tag..$start", but
     avoiding re-walking over the bits we've seen).  If we break
     immediately without putting those commits into the list, our depth
     computation will be wrong (in the worst case we'll count all the
     way down to the root, not realizing those commits are included in
     our tag).

But we don't need to find a new candidate for (2). As soon as we finish
the loop iteration where we hit max_candidates, we can then quit on the
next iteration. This should produce the same output as the original code
(which could, after all, find a candidate on the very next commit
anyway) but ends the traversal with less pointless digging.

We still have to set "gave_up_on"; we've popped it off the list and it
has to go back. An alternative would be to re-order the loop so that it
never gets popped, but it's perhaps still useful to show in the --debug
output, so we need to know it anyway. We do have to adjust the --debug
output since it's now just a commit where we stopped traversing, and not
the max+1th candidate.

p6100 shows the speedup using linux.git:

  Test                                           HEAD^             HEAD
  ---------------------------------------------------------------------------------------
  6100.2: describe HEAD                          0.70(0.63+0.06)   0.71(0.66+0.04) +1.4%
  6100.3: describe HEAD with one max candidate   0.70(0.64+0.05)   0.01(0.00+0.00) -98.6%
  6100.4: describe HEAD with one tag             0.70(0.67+0.03)   0.70(0.63+0.06) +0.0%

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Helped-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-07 13:28:22 +09:00
Junio C Hamano 2664f2a0cb Merge branch 'ps/leakfixes-part-9' into ps/leakfixes-part-10
* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-07 13:25:01 +09:00
Elijah Newren 00e10e0751 doc: correct misleading descriptions for --shallow-exclude
The documentation for the --shallow-exclude option to clone/fetch/etc.
claims that the option takes a revision, but it does not.  As per
upload-pack.c's process_deepen_not(), it passes the option to
expand_ref() and dies if it does not find exactly one ref matching the
name passed.  Further, this has always been the case ever since these
options were introduced by the commits merged in a460ea4a3c (Merge
branch 'nd/shallow-deepen', 2016-10-10).  Fix the documentation to
match the implementation.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:53:23 -08:00
Patrick Steinhardt ff67083ccd builtin/merge: release output buffer after performing merge
The `obuf` member of `struct merge_options` is used to buffer output in
some cases. In order to not discard its allocated memory we only release
its contents in `merge_finalize()` when we're not currently recursing
into a subtree.

This results in some situations where we seemingly do not release the
buffer reliably. We thus have calls to `strbuf_release()` for this
buffer scattered across the codebase. But we're missing one callsite in
git-merge(1), which causes a memory leak.

We should ideally refactor this interface so that callers don't have to
know about any such internals. But for now, paper over the issue by
adding one more `strbuf_release()` call.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:57 -08:00
Patrick Steinhardt d06e3ec858 builtin/tag: fix leaking key ID on failure to sign
We do not free the key ID when signing a tag fails. Do so by using
the common exit path.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:55 -08:00
Patrick Steinhardt 6ef9f77a15 builtin/commit: fix leaking cleanup config
The cleanup string set by the config is leaking when it is being
overridden by an option. Fix this by tracking these via two separate
variables such that we can free the old value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:54 -08:00
Patrick Steinhardt d34b5cbf02 builtin/commit: fix leaking change data contents
While we free the worktree change data, we never free its contents. Fix
this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:53 -08:00
Patrick Steinhardt 43fedde3df builtin/grep: fix leak with `--max-count=0`
When executing with `--max-count=0` we'll return early from git-grep(1)
without performing any cleanup, which causes memory leaks. Plug these.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:52 -08:00
Patrick Steinhardt 5f5dd8e297 builtin/ls-remote: plug leaking server options
The list of server options populated via `OPT_STRING_LIST()` is never
cleared, causing a memory leak. Plug it.

This leak is exposed by t5702, but plugging it alone does not make the
whole test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:51 -08:00
Taylor Blau 1c5a712f26 Merge branch 'jk/dumb-http-finalize'
The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.

* jk/dumb-http-finalize:
  packfile: use oidread() instead of hashcpy() to fill object_id
  packfile: use object_id in find_pack_entry_one()
  packfile: convert find_sha1_pack() to use object_id
  http-walker: use object_id instead of bare hash
  packfile: warn people away from parse_packed_git()
  packfile: drop sha1_pack_index_name()
  packfile: drop sha1_pack_name()
  packfile: drop has_pack_index()
  dumb-http: store downloaded pack idx as tempfile
  t5550: count fetches in "previously-fetched .idx" test
  midx: avoid duplicate packed_git entries
2024-11-01 12:53:32 -04:00
Taylor Blau 20ab7fa3b6 Merge branch 'sa/notes-edit'
Teach 'git notes add' and 'git notes append' a new '-e' flag,
instructing them to open the note in $GIT_EDITOR before saving.

* sa/notes-edit:
  notes: teach the -e option to edit messages in editor
2024-11-01 12:53:25 -04:00
Taylor Blau 7b1f01f02e Merge branch 'ss/duplicate-typos'
Typofixes.

* ss/duplicate-typos:
  global: Fix duplicate word typos
2024-11-01 12:53:23 -04:00
Taylor Blau 787297b396 Merge branch 'rj/cygwin-exit'
Treat ECONNABORTED the same as ECONNRESET in 'git credential-cache' to
work around a possible Cygwin regression. This resolves a race condition
caused by changes in Cygwin's handling of socket closures, allowing the
client to exit cleanly when encountering ECONNABORTED.

* rj/cygwin-exit:
  credential-cache: treat ECONNABORTED like ECONNRESET
2024-11-01 12:53:19 -04:00
Taylor Blau 268fd2fe58 Merge branch 'ps/platform-compat-fixes'
Various platform compatibility fixes split out of the larger effort
to use Meson as the primary build tool.

* ps/platform-compat-fixes:
  t6006: fix prereq handling with `test_format ()`
  http: fix build error on FreeBSD
  builtin/credential-cache: fix missing parameter for stub function
  t7300: work around platform-specific behaviour with long paths on MinGW
  t5500, t5601: skip tests which exercise paths with '[::1]' on Cygwin
  t3404: work around platform-specific behaviour on macOS 10.15
  t1401: make invocation of tar(1) work with Win32-provided one
  t/lib-gpg: fix setup of GNUPGHOME in MinGW
  t/lib-gitweb: test against the build version of gitweb
  t/test-lib: wire up NO_ICONV prerequisite
  t/test-lib: fix quoting of TEST_RESULTS_SAN_FILE
2024-11-01 12:53:17 -04:00
Jeff King 16a186fede rev-list: skip bitmap traversal for --left-right
Running:

  git rev-list --left-right --use-bitmap-index one...two

will produce output without any left-right markers, since the bitmap
traversal returns only a single set of reachable commits. Instead we
should refuse to use bitmaps here and produce the correct output using a
traditional traversal.

This is probably not the only remaining option that misbehaves with
bitmaps, but it's particularly egregious in that it feels like it
_could_ work. Doing two separate traversals for the left/right sides and
then taking the symmetric set differences should yield the correct
answer, but our traversal code doesn't know how to do that.

It's not clear if naively doing two separate traversals would always be
a performance win. A traditional traversal only needs to walk down to
the merge base, but bitmaps always fill out the full reachability set.
So depending on your bitmap coverage, we could end up walking old bits
of history twice to fill out the same uninteresting bits on both sides.
We'd also of course end up with a very large --boundary set, if the user
asked for that.

So this might or might not be something worth implementing later. But
for now, let's make sure we don't produce the wrong answer if somebody
tries it.

The test covers this, but also the same thing with "--count" (which is
what I originally tried in a real-world case). Ironically the
try_bitmap_count() code already realizes that "--left-right" won't work
there. But that just causes us to fall back to the regular bitmap
traversal code, which itself doesn't handle counting (we produce a list
of objects rather than a count).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-11-01 11:02:27 -04:00
Jeff King 479ab76c9f packfile: use object_id in find_pack_entry_one()
The main function we use to search a pack index for an object is
find_pack_entry_one(). That function still takes a bare pointer to the
hash, despite the fact that its underlying bsearch_pack() function needs
an object_id struct. And so we end up making an extra copy of the hash
into the struct just to do a lookup.

As it turns out, all callers but one already have such an object_id. So
we can just take a pointer to that struct and use it directly. This
avoids the extra copy and provides a more type-safe interface.

The one exception is get_delta_base() in packfile.c, when we are chasing
a REF_DELTA from inside the pack (and thus we have a pointer directly to
the mmap'd pack memory, not a struct). We can just bump the hashcpy()
from inside find_pack_entry_one() to this one caller that needs it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King 4d99559147 packfile: convert find_sha1_pack() to use object_id
The find_sha1_pack() function has a few problems:

  - it's badly named, since it works with any object hash

  - it takes the hash as a bare pointer rather than an object_id struct

We can fix both of these easily, as all callers actually have a real
object_id anyway.

I also found the existence of this function somewhat confusing, as it is
about looking in an arbitrary set of linked packed_git structs. It's
good for things like dumb-http which are looking in downloaded remote
packs, and not our local packs. But despite the name, it is not a good
way to find the pack which contains a local object (it skips the use of
the midx, the pack mru list, and so on).

So let's also add an explanatory comment above the declaration that may
point people in the right direction.

I suspect the calls in fast-import.c, which use the packed_git list from
the repository struct, could actually just be using find_pack_entry().
But since we'd need to keep it anyway for dumb-http, I didn't dig
further there. If we eventually drop dumb-http support, then it might be
worth examining them to see if we can get rid of the function entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King 4390fea963 packfile: drop sha1_pack_index_name()
Like sha1_pack_name() that we dropped in the previous commit, this
function uses an error-prone static strbuf and the somewhat misleading
name "sha1".

The only caller left is in pack-redundant.c. While this command is
marked for potential removal in our BreakingChanges document, we still
have it for now. But it's simple enough to convert it to use its own
strbuf with the underlying odb_pack_name() function, letting us drop the
otherwise obsolete function.

Note that odb_pack_name() does its own strbuf_reset(), so it's safe to
use directly within a loop like this.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Taylor Blau 55d12c24d7 Merge branch 'wm/shortlog-hash'
Teaches 'shortlog' to explicitly use SHA-1 when operating outside of
a repository.

* wm/shortlog-hash:
  builtin/shortlog: explicitly set hash algo when there is no repo
2024-10-25 14:02:49 -04:00
Taylor Blau 0ab43ed95c Merge branch 'jc/a-commands-without-the-repo'
Commands that can also work outside Git have learned to take the
repository instance "repo" when we know we are in a repository, and
NULL when we are not, in a parameter.  The uses of the_repository
variable in a few of them have been removed using the new calling
convention.

* jc/a-commands-without-the-repo:
  archive: remove the_repository global variable
  annotate: remove usage of the_repository global
  git: pass in repo to builtin based on setup_git_directory_gently
2024-10-25 14:02:36 -04:00
Taylor Blau 6cbcc68ea7 Merge branch 'db/submodule-fetch-with-remote-name-fix'
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-10-25 14:01:09 -04:00
Taylor Blau 8e08668322 Merge branch 'cw/worktree-relative'
An extra worktree attached to a repository points at each other to
allow finding the repository from the worktree and vice versa
possible.  Turn this linkage to relative paths.

* cw/worktree-relative:
  worktree: add test for path handling in linked worktrees
  worktree: link worktrees with relative paths
  worktree: refactor infer_backlink() to use *strbuf
  worktree: repair copied repository and linked worktrees
2024-10-22 14:40:39 -04:00
Sven Strickroth c32d4a8cfe global: Fix duplicate word typos
Used regex to find these typos:

    (?<!struct )(?<=\s)([a-z]{1,}) \1(?=\s)

Signed-off-by: Sven Strickroth <email@cs-ware.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 16:05:04 -04:00
Abraham Samuel Adekunle dab0b9e176 notes: teach the -e option to edit messages in editor
Notes can be added to a commit using:
	- "-m" to provide a message on the command line.
	- -C to copy a note from a blob object.
	- -F to read the note from a file.
When these options are used, Git does not open an editor,
it simply takes the content provided via these options and
attaches it to the commit as a note.

Improve flexibility to fine-tune the note before finalizing it
by allowing the messages to be prefilled in the editor and edited
after the messages have been provided through -[mF].

Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 15:52:48 -04:00
Ramsay Jones 468a7e41e8 credential-cache: treat ECONNABORTED like ECONNRESET
On Cygwin, t0301 fails because "git credential-cache exit" returns a
non-zero exit code. What's supposed to happen here is:

  1. The client (the "credential-cache" invocation above) connects to a
     previously-spawned credential-cache--daemon.

  2. The client sends an "exit" command to the daemon.

  3. The daemon unlinks the socket and then exits, closing the
     descriptor back to the client.

  4. The client sees EOF on the descriptor and exits successfully.

That works on most platforms, and even _used_ to work on Cygwin. But
that changed in Cygwin's ef95c03522 (Cygwin: select: Fix FD_CLOSE
handling, 2021-04-06). After that commit, the client sees a read error
with errno set to ECONNABORTED, and it reports the error and dies.

It's not entirely clear if this is a Cygwin bug. It seems that calling
fclose() on the filehandles pointing to the sockets is sufficient to
avoid this error return, even though exiting should in general look the
same from the client's perspective.

However, we can't just call fclose() here. It's important in step 3
above to unlink the socket before closing the descriptor to avoid the
race mentioned by 7d5e9c9849 (credential-cache--daemon: clarify "exit"
action semantics, 2016-03-18). The client will exit as soon as it sees
the descriptor close, and the daemon may or may not have actually
unlinked the socket by then. That makes test code like this:

  git credential exit &&
  test_path_is_missing .git-credential-cache

racy.

So we probably _could_ fix this by calling:

  delete_tempfile(&socket_file);
  fclose(in);
  fclose(out);

before we exit(). Or by replacing the exit() with a return up the stack,
in which case the fclose() happens as we unwind. But in that case we'd
still need to call delete_tempfile() here to avoid the race.

But simpler still is that we can notice that we already special-case
ECONNRESET on the client side, courtesy of 1f180e5eb9 (credential-cache:
interpret an ECONNRESET as an EOF, 2017-07-27). We can just do the same
thing here (I suspect that prior to the Cygwin commit that introduced
this problem, we were really just seeing ECONNRESET instead of
ECONNABORTED, so the "new" problem is just the switch of the errno
values).

There's loads more debugging in this thread:

  https://lore.kernel.org/git/9dc3e85f-a532-6cff-de11-1dfb2e4bc6b6@ramsayjones.plus.com/

but I've tried to summarize the useful bits in this commit message.

[jk: commit message]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-18 17:18:05 -04:00
Taylor Blau c1662a00b6 Merge branch 'ps/maintenance-start-crash-fix'
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-10-18 13:56:26 -04:00
Taylor Blau 645cc7a2a7 Merge branch 'kh/checkout-ignore-other-docfix'
Doc updates.

* kh/checkout-ignore-other-docfix:
  checkout: refer to other-worktree branch, not ref
2024-10-18 13:56:24 -04:00
Wolfgang Müller b33001645e builtin/shortlog: explicitly set hash algo when there is no repo
Whilst git-shortlog(1) does not explicitly need any repository
information when run without reference to one, it still parses some of
its arguments with parse_revision_opt() which assumes that the hash
algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) we stopped setting up a default
hash algorithm and instead require commands to set it up explicitly.

This was done for most other commands like in ab274909d4 (builtin/diff:
explicitly set hash algo when there is no repo, 2024-05-07) but was
missed for builtin/shortlog, making git-shortlog(1) segfault outside of
a repository when given arguments like --author that trigger a call to
parse_revision_opt().

Fix this for now by explicitly setting the hash algorithm to SHA1. Also
add a regression test for the segfault.

Thanks-to: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Wolfgang Müller <wolf@oriole.systems>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:10:54 -04:00
Patrick Steinhardt 87ad2a9d56 builtin/credential-cache: fix missing parameter for stub function
When not compiling the credential cache we may use a stub function for
`cmd_credential_cache()`. With commit 9b1cb5070f (builtin: add a
repository parameter for builtin functions, 2024-09-13), we have added a
new parameter to all of those top-level `cmd_*()` functions, and did
indeed adapt the non-stubbed-out `cmd_credential_cache()`. But we didn't
adapt the stubbed-out variant, so the code does not compile.

Fix this by adding the missing parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-16 17:00:49 -04:00
Taylor Blau b43e23fa02 Merge branch 'jk/fsmonitor-event-listener-race-fix'
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened.  This problem has been corrected.

* jk/fsmonitor-event-listener-race-fix:
  fsmonitor: initialize fs event listener before accepting clients
  simple-ipc: split async server initialization and running
2024-10-15 16:56:43 -04:00
Taylor Blau fd98f659fd Merge branch 'xx/remote-server-option-config'
A new configuration variable remote.<name>.serverOption makes the
transport layer act as if the --serverOption=<value> option is
given from the command line.

* xx/remote-server-option-config:
  ls-remote: leakfix for not clearing server_options
  fetch: respect --server-option when fetching multiple remotes
  transport.c:🤝 make use of server options from remote
  remote: introduce remote.<name>.serverOption configuration
  transport: introduce parse_transport_option() method
2024-10-15 16:56:43 -04:00
Taylor Blau f004467b04 Merge branch 'jh/config-unset-doc-fix'
Docfix.

* jh/config-unset-doc-fix:
  git-config.1: remove value from positional args in unset usage
2024-10-15 16:56:43 -04:00
Linus Arver 3f0346d4dc trailer: spread usage of "trailer_block" language
Deprecate the "trailer_info" struct name and replace it with
"trailer_block". This is more readable, for two reasons:

  1. "trailer_info" on the surface sounds like it's about a single
     trailer when in reality it is a collection of one or more trailers,
     and

  2. the "*_block" suffix is more informative than "*_info", because it
     describes a block (or region) of contiguous text which has trailers
     in it, which has been parsed into the trailer_block structure.

Rename the

    size_t trailer_block_start, trailer_block_end;

members of trailer_info to just "start" and "end". Rename the "info"
pointer to "trailer_block" because it is more descriptive. Update
comments accordingly.

Signed-off-by: Linus Arver <linus@ucla.edu>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-14 12:33:02 -04:00
John Cai 528d3e4d53 archive: remove the_repository global variable
As part of the effort to get rid of global state due to the global
the_repository variable, replace the_repository with the repository
argument that gets passed down through the builtin function.

The repo might be NULL, but we should be safe in write_archive() because
it detects if we are outside of a repository and calls
setup_git_directory() which will error.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-11 09:37:18 -07:00
John Cai ebe8f4b6ec annotate: remove usage of the_repository global
As part of the effort to get rid of global state due to the_repository
variable, remove the the_repository with the repository argument that
gets passed down through the builtin function.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-11 09:37:18 -07:00
John Cai 5db948d413 git: pass in repo to builtin based on setup_git_directory_gently
The current code in run_builtin() passes in a repository to the builtin
based on whether cmd_struct's option flag has RUN_SETUP.

This is incorrect, however, since some builtins that only have
RUN_SETUP_GENTLY can potentially take a repository.
setup_git_directory_gently() tells us whether or not a command is being
run inside of a repository.

Use the output of setup_git_directory_gently() to help determine whether
or not there is a repository to pass to the builtin. If not, then we
just pass NULL.

As part of this patch, we need to modify add to check for a NULL repo
before calling repo_git_config(), since add -h can be run outside of a
repository.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-11 09:37:17 -07:00
Junio C Hamano 31bc4454de Merge branch 'ps/leakfixes-part-8'
More leakfixes.

* ps/leakfixes-part-8: (23 commits)
  builtin/send-pack: fix leaking list of push options
  remote: fix leaking push reports
  t/helper: fix leaks in proc-receive helper
  pack-write: fix return parameter of `write_rev_file_order()`
  revision: fix leaking saved parents
  revision: fix memory leaks when rewriting parents
  midx-write: fix leaking buffer
  pack-bitmap-write: fix leaking OID array
  pseudo-merge: fix leaking strmap keys
  pseudo-merge: fix various memory leaks
  line-log: fix several memory leaks
  diff: improve lifecycle management of diff queues
  builtin/revert: fix leaking `gpg_sign` and `strategy` config
  t/helper: fix leaking repository in partial-clone helper
  builtin/clone: fix leaking repo state when cloning with bundle URIs
  builtin/pack-redundant: fix various memory leaks
  builtin/stash: fix leaking `pathspec_from_file`
  submodule: fix leaking submodule entry list
  wt-status: fix leaking buffer with sparse directories
  shell: fix leaking strings
  ...
2024-10-10 14:22:29 -07:00
Kristoffer Haugsbakk b8139c8f4e checkout: refer to other-worktree branch, not ref
We can only check out commits or branches, not refs in general.  And the
problem here is if another worktree is using the branch that we want to
check out.

Let’s be more direct and just talk about branches instead of refs.

Also replace “be held” with “in use”.  Further, “in use” is not
restricted to a branch being checked out (e.g. the branch could be busy
on a rebase), hence generalize to “or otherwise in use” in the option
description.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10 13:09:13 -07:00
Patrick Steinhardt c95547a394 builtin/gc: fix crash when running `git maintenance start`
It was reported on the mailing list that running `git maintenance start`
immediately segfaults starting with b6c3f8e12c (builtin/maintenance: fix
leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is
trivial to reproduce up to a point where one is scratching their head
why we didn't catch this regression in our test suite.

The root cause of this error is `get_schedule_cmd()`, which does not
populate the `out` parameter in all cases anymore starting with the
mentioned commit. Callers do assume it to always be populated though and
will e.g. call `strvec_split()` on the returned value, which will of
course segfault when the variable is uninitialized.

So why didn't we catch this trivial regression? The reason is that our
tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable
via "t/test-lib.sh", which allows us to override the scheduler command
with a custom one so that we don't accidentally modify the developer's
system. But the faulty code where we don't set the `out` parameter will
only get hit in case that environment variable is _not_ set, which is
never the case when executing our tests.

Fix the regression by again unconditionally allocating the value in the
`out` parameter, if provided. Add a test that unsets the environment
variable to catch future regressions in this area.

Reported-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10 10:04:43 -07:00
Daniel Black 0c1a9987da submodule: correct remote name with fetch
The code fetches the submodules remote based on the superproject remote name
instead of the submodule remote name[1].

Instead of grabbing the default remote of the superproject repository, ask
the default remote of the submodule we are going to run 'git fetch' in.

1. https://lore.kernel.org/git/ZJR5SPDj4Wt_gmRO@pweza/

Signed-off-by: Daniel Black <daniel@mariadb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-09 10:48:08 -07:00
Josh Heinrichs f36b8cbaef git-config.1: remove value from positional args in unset usage
The synopsis for `git config unset` mentions two positional arguments:
`<name>` and `<value>`. While the first argument is correct, the second
is not. Users are expected to provide the value via `--value=<value>`.

Remove the positional argument. The `--value=<value>` option is already
documented correctly, so this is all we need to do to fix the
documentation.

Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 23:35:45 -07:00
Jeff King 51907f8fee fsmonitor: initialize fs event listener before accepting clients
There's a racy hang in fsmonitor on macOS that we sometimes see in CI.
When we serve a client, what's supposed to happen is:

  1. The client thread calls with_lock__wait_for_cookie() in which we
     create a cookie file and then wait for a pthread_cond event

  2. The filesystem event listener sees the cookie file creation, does
     some internal book-keeping, and then triggers the pthread_cond.

But there's a problem: we start the listener that accepts client threads
before we start the fs event thread. So it's possible for us to accept a
client which creates the cookie file and starts waiting before the fs
event thread is initialized, and we miss those filesystem events
entirely. That leaves the client thread hanging forever.

In CI, the symptom is that t9210 (which is testing scalar, which always
enables fsmonitor under the hood) may hang forever in "scalar clone". It
is waiting on "git fetch" which is waiting on the fsmonitor daemon.

The race happens more frequently under load, but you can trigger it
predictably with a sleep like this, which delays the start of the fs
event thread:

  --- a/compat/fsmonitor/fsm-listen-darwin.c
  +++ b/compat/fsmonitor/fsm-listen-darwin.c
  @@ -510,6 +510,7 @@ void fsm_listen__loop(struct fsmonitor_daemon_state *state)
          FSEventStreamSetDispatchQueue(data->stream, data->dq);
          data->stream_scheduled = 1;

  +       sleep(1);
          if (!FSEventStreamStart(data->stream)) {
                  error(_("Failed to start the FSEventStream"));
                  goto force_error_stop_without_loop;

One solution might be to reverse the order of initialization: start the
fs event thread before we start the thread listening for clients. But
the fsmonitor code explicitly does it in the opposite direction. The fs
event thread wants to refer to the ipc_server_data struct, so we need it
to be initialized first.

A further complication is that we need a signal from the fs event thread
that it is actually ready and listening. And those details happen within
backend-specific fsmonitor code, whereas the initialization is in the
shared code.

So instead, let's use the ipc_server init/start split added in the
previous commit. The generic fsmonitor code will init the ipc_server but
_not_ start it, leaving that to the backend specific code, which now
needs to call ipc_server_start_async() at the right time.

For macOS, that is right after we start the FSEventStream that you can
see in the diff above.

It's not clear to me if Windows suffers from the same problem (and we
simply don't trigger it in CI), or if it is immune. Regardless, the
obvious place to start accepting clients there is right after we've
established the ReadDirectoryChanges watch.

This makes the hangs go away in our macOS CI environment, even when
compiled with the sleep() above.

Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 12:03:56 -07:00
Jeff King 766fce69e9 simple-ipc: split async server initialization and running
To start an async ipc server, you call ipc_server_run_async(). That
initializes the ipc_server_data object, and starts all of the threads
running, which may immediately start serving clients.

This can create some awkward timing problems, though. In the fsmonitor
daemon (the sole user of the simple-ipc system), we want to create the
ipc server early in the process, which means we may start serving
clients before the rest of the daemon is fully initialized.

To solve this, let's break run_async() into two parts: an initialization
which allocates all data and spawns the threads (without letting them
run), and a start function which actually lets them begin work. Since we
have two simple-ipc implementations, we have to handle this twice:

  - in ipc-unix-socket.c, we have a central listener thread which hands
    connections off to worker threads using a work_available mutex. We
    can hold that mutex after init, and release it when we're ready to
    start.

    We do need an extra "started" flag so that we know whether the main
    thread is holding the mutex or not (e.g., if we prematurely stop the
    server, we want to make sure all of the worker threads are released
    to hear about the shutdown).

  - in ipc-win32.c, we don't have a central mutex. So we'll introduce a
    new startup_barrier mutex, which we'll similarly hold until we're
    ready to let the threads proceed.

    We again need a "started" flag here to make sure that we release the
    barrier mutex when shutting down, so that the sub-threads can
    proceed to the finish.

I've renamed the run_async() function to init_async() to make sure we
catch all callers, since they'll now need to call the matching
start_async().

We could leave run_async() as a wrapper that does both, but there's not
much point. There are only two callers, one of which is fsmonitor, which
will want to actually do work between the two calls. And the other is
just a test-tool wrapper.

For now I've added the start_async() calls in fsmonitor where they would
otherwise have happened, so there should be no behavior change with this
patch.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 12:03:56 -07:00
Caleb White 717af916cd worktree: link worktrees with relative paths
Git currently stores absolute paths to both the main repository and
linked worktrees. However, this causes problems when moving repositories
or working in containerized environments where absolute paths differ
between systems. The worktree links break, and users are required to
manually execute `worktree repair` to repair them, leading to workflow
disruptions. Additionally, mapping repositories inside of containerized
environments renders the repository unusable inside the containers, and
this is not repairable as repairing the worktrees inside the containers
will result in them being broken outside the containers.

To address this, this patch makes Git always write relative paths when
linking worktrees. Relative paths increase the resilience of the
worktree links across various systems and environments, particularly
when the worktrees are self-contained inside the main repository (such
as when using a bare repository with worktrees). This improves
portability, workflow efficiency, and reduces overall breakages.

Although Git now writes relative paths, existing repositories with
absolute paths are still supported. There are no breaking changes
to workflows based on absolute paths, ensuring backward compatibility.

At a low level, the changes involve modifying functions in `worktree.c`
and `builtin/worktree.c` to use `relative_path()` when writing the
worktree’s `.git` file and the main repository’s `gitdir` reference.
Instead of hardcoding absolute paths, Git now computes the relative path
between the worktree and the repository, ensuring that these links are
portable. Locations where these respective file are read have also been
updated to properly handle both absolute and relative paths. Generally,
relative paths are always resolved into absolute paths before any
operations or comparisons are performed.

Additionally, `repair_worktrees_after_gitdir_move()` has been introduced
to address the case where both the `<worktree>/.git` and
`<repo>/worktrees/<id>/gitdir` links are broken after the gitdir is
moved (such as during a re-initialization). This function repairs both
sides of the worktree link using the old gitdir path to reestablish the
correct paths after a move.

The `worktree.path` struct member has also been updated to always store
the absolute path of a worktree. This ensures that worktree consumers
never have to worry about trying to resolve the absolute path themselves.

Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 11:49:22 -07:00
Xing Xin 0f490d270a ls-remote: leakfix for not clearing server_options
Ensure `server_options` is properly cleared using `string_list_clear()`
in `builtin/ls-remote.c:cmd_ls_remote`.

Although we cannot yet enable `TEST_PASSES_SANITIZE_LEAK=true` for
`t/t5702-protocol-v2.sh` due to other existing leaks, this fix ensures
that "git-ls-remote" related server options tests pass the sanitize leak
check:

  ...
  ok 12 - server-options are sent when using ls-remote
  ok 13 - server-options from configuration are used by ls-remote
  ...

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 10:22:10 -07:00