Commit Graph

76220 Commits (e565f3755342caf1d21e22359eaf09ec11d8c0ae)

Author SHA1 Message Date
Junio C Hamano e565f37553 Merge branch 'ds/backfill'
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time.  "git backfill" is introduced
to help bulk-download necessary files beforehand.

* ds/backfill:
  backfill: assume --sparse when sparse-checkout is enabled
  backfill: add --sparse option
  backfill: add --min-batch-size=<n> option
  backfill: basic functionality and tests
  backfill: add builtin boilerplate
2025-02-18 15:30:31 -08:00
Junio C Hamano 0394451348 The eleventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14 17:53:49 -08:00
Junio C Hamano 60cb8e79cb Merge branch 'ps/doc-http-upload-archive-service'
Doc update.

* ps/doc-http-upload-archive-service:
  doc: documentation for http.uploadarchive config option
2025-02-14 17:53:49 -08:00
Junio C Hamano 82522a9e2c Merge branch 'kn/reflog-migration-fix-followup'
Code clean-up.

* kn/reflog-migration-fix-followup:
  reftable: prevent 'update_index' changes after adding records
  refs: use 'uint64_t' for 'ref_update.index'
  refs: mark `ref_transaction_update_reflog()` as static
2025-02-14 17:53:48 -08:00
Junio C Hamano c3fffcfe8e Merge branch 'bf/fetch-set-head-fix'
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.

* bf/fetch-set-head-fix:
  fetch set_head: fix non-mirror remotes in bare repositories
  fetch set_head: refactor to use remote directly
2025-02-14 17:53:48 -08:00
Junio C Hamano 09e74b06ea Merge branch 'op/worktree-is-main-bare-fix'
Going into a secondary worktree and asking "is the main worktree
bare?" did not work correctly when per-worktree configuration
option was in use, which has been corrected.

* op/worktree-is-main-bare-fix:
  worktree: detect from secondary worktree if main worktree is bare
2025-02-14 17:53:48 -08:00
Junio C Hamano 5785d9143b Merge branch 'tc/clone-single-revision'
"git clone" learned to make a shallow clone for a single commit
that is not necessarily at the tip of any branch.

* tc/clone-single-revision:
  builtin/clone: teach git-clone(1) the --revision= option
  parse-options: introduce die_for_incompatible_opt2()
  clone: introduce struct clone_opts in builtin/clone.c
  clone: add tags refspec earlier to fetch refspec
  clone: refactor wanted_peer_refs()
  clone: make it possible to specify --tags
  clone: cut down on global variables in clone.c
2025-02-14 17:53:48 -08:00
Junio C Hamano 0cc13007e5 Merge branch 'bc/doc-adoc-not-txt'
All the documentation .txt files have been renamed to .adoc to help
content-aware editors.

* bc/doc-adoc-not-txt:
  Remove obsolete ".txt" extensions for AsciiDoc files
  doc: use .adoc extension for AsciiDoc files
  gitattributes: mark AsciiDoc files as LF-only
  editorconfig: add .adoc extension
  doc: update gitignore for .adoc extension
2025-02-14 17:53:47 -08:00
Junio C Hamano e2067b49ec The tenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 10:09:08 -08:00
Junio C Hamano 2d7a874493 Merge branch 'da/help-autocorrect-one-fix'
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.

* da/help-autocorrect-one-fix:
  help: add "show" as a valid configuration value
  help: show the suggested command when help.autocorrect is false
2025-02-12 10:08:55 -08:00
Junio C Hamano 39de0ffbe3 Merge branch 'sc/help-autocorrect-one'
"[help] autocorrect = 1" used to be a way to say "please wait for
0.1 second after suggesting a typofix of the command name before
running that command"; now it means "yes, if there is a plausible
typofix for the command name, please run it immediately".

* sc/help-autocorrect-one:
  help: interpret boolean string values for help.autocorrect
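
The value handling described in the two help.autocorrect topics above can
be sketched as follows. This is a minimal illustration rather than Git's
implementation: the function name and the returned tuples are hypothetical,
the set of accepted boolean spellings is an assumption, and values the
summaries do not mention are simply rejected.

```python
def interpret_autocorrect(value: str):
    """Map a help.autocorrect value to (action, delay_in_deciseconds).

    Hypothetical helper: only the cases named in the merge summaries
    (boolean strings, "show", 0, 1, larger integers) are handled.
    """
    v = value.strip().lower()
    # "1" and true-like strings: run the plausible typofix immediately.
    if v in ("1", "true", "yes", "on"):
        return ("run", 0)
    # "0", false-like strings and "show": only show the suggestion.
    if v in ("0", "false", "no", "off", "show"):
        return ("show", 0)
    # Larger integers: wait that many tenths of a second, then run.
    n = int(v)  # raises ValueError for anything that is not an integer
    if n < 2:
        raise ValueError(f"value not covered by this sketch: {value!r}")
    return ("run_after_delay", n)
```

With this sketch, "1" and "true" give the same result, whereas before the
change "1" would have meant a delay of 0.1 second.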
2025-02-12 10:08:55 -08:00
Junio C Hamano 0a99ffb4d6 Merge branch 'ms/remote-valid-remote-name'
Code shuffling.

* ms/remote-valid-remote-name:
  remote: relocate valid_remote_name
2025-02-12 10:08:54 -08:00
Junio C Hamano 998c5f0c75 Merge branch 'ms/refspec-cleanup'
Code clean-up.  cf. <Z6G-toOJjMmK8iJG@pks.im>

* ms/refspec-cleanup:
  refspec: relocate apply_refspecs and related functions
  refspec: relocate matching related functions
  remote: rename query_refspecs functions
  refspec: relocate refname_matches_negative_refspec_item
  remote: rename function omit_name_by_refspec
2025-02-12 10:08:54 -08:00
Junio C Hamano 791677a5dd Merge branch 'jp/doc-trailer-config'
Documentation updates.

* jp/doc-trailer-config:
  config.txt: add trailer.* variables
2025-02-12 10:08:54 -08:00
Junio C Hamano 5b9d01bc4d Merge branch 'zh/gc-expire-to'
"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".

* zh/gc-expire-to:
  gc: add `--expire-to` option
2025-02-12 10:08:53 -08:00
Junio C Hamano a4af0b6288 Merge branch 'js/libgit-rust'
Foreign language interface for Rust into our code base has been added.

* js/libgit-rust:
  libgit: add higher-level libgit crate
  libgit-sys: also export some config_set functions
  libgit-sys: introduce Rust wrapper for libgit.a
  common-main: split init and exit code into new files
2025-02-12 10:08:53 -08:00
Junio C Hamano 3f3fd0f346 Merge branch 'ac/t5401-use-test-path-is-file'
Test clean-up.

* ac/t5401-use-test-path-is-file:
  t5401: prefer test_path_is_* helper function
2025-02-12 10:08:52 -08:00
Junio C Hamano 9865ef2457 Merge branch 'ac/t6423-unhide-git-exit-status'
Test clean-up.

* ac/t6423-unhide-git-exit-status:
  t6423: fix suppression of Git’s exit code in tests
2025-02-12 10:08:52 -08:00
Junio C Hamano 07c401d392 Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo'
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there are no
existing packs, which has been corrected.

* ps/repack-keep-unreachable-in-unpacked-repo:
  builtin/repack: fix `--keep-unreachable` when there are no packs
2025-02-12 10:08:52 -08:00
Junio C Hamano aae91a86fb Merge branch 'ds/name-hash-tweaks'
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.

* ds/name-hash-tweaks:
  pack-objects: prevent name hash version change
  test-tool: add helper for name-hash values
  p5313: add size comparison test
  pack-objects: add GIT_TEST_NAME_HASH_VERSION
  repack: add --name-hash-version option
  pack-objects: add --name-hash-version option
  pack-objects: create new name-hash function version
2025-02-12 10:08:51 -08:00
Junio C Hamano 388218fac7 The ninth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10 10:18:32 -08:00
Junio C Hamano 50e1821529 Merge branch 'jk/ci-coverity-update'
CI update to make Coverity job work again.

* jk/ci-coverity-update:
  ci: set CI_JOB_IMAGE for coverity job
2025-02-10 10:18:31 -08:00
Junio C Hamano 6f0b72205d Merge branch 'sk/unit-tests-0130'
Convert a handful of unit tests to work with the clar framework.

* sk/unit-tests-0130:
  t/unit-tests: convert strcmp-offset test to use clar test framework
  t/unit-tests: convert strbuf test to use clar test framework
  t/unit-tests: adapt example decorate test to use clar test framework
  t/unit-tests: convert hashmap test to use clar test framework
2025-02-10 10:18:31 -08:00
Junio C Hamano 246569bf83 Merge branch 'ps/hash-cleanup'
Further code clean-up on the use of hash functions.  Now the
context object knows what hash function it is working with.

* ps/hash-cleanup:
  global: adapt callers to use generic hash context helpers
  hash: provide generic wrappers to update hash contexts
  hash: stop typedeffing the hash context
  hash: convert hashing context to a structure
2025-02-10 10:18:31 -08:00
Junio C Hamano 0ca6b46d7c Merge branch 'jt/gitlab-ci-base-fix'
Two CI tasks, whitespace check and style check, work on the
difference from the base version and the version being checked, but
the base was computed incorrectly in GitLab CI in some cases, which
has been corrected.

* jt/gitlab-ci-base-fix:
  ci: fix base commit fallback for check-whitespace and check-style
2025-02-10 10:18:30 -08:00
Junio C Hamano 34736ff48e Merge branch 'pw/apply-ulong-overflow-check'
"git apply" internally uses unsigned long for line numbers and uses
strtoul() to parse numbers on the hunk headers.  It however forgot
to check parse errors.

* pw/apply-ulong-overflow-check:
  apply: detect overflow when parsing hunk header
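
The kind of check this topic adds can be sketched in Python. This is an
illustration only: Python integers do not overflow, so the limit is
emulated with a constant, and both the 64-bit width of unsigned long and
the function name are assumptions.

```python
import re

# Assumption: unsigned long is 64 bits wide on the platform being modelled.
ULONG_MAX = 2**64 - 1

# Matches "@@ -<old>[,<count>] +<new>[,<count>] @@"; re.ASCII keeps \d to 0-9.
HUNK_RE = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.ASCII)


def parse_hunk_header(line: str):
    """Return (old_start, old_count, new_start, new_count) for a hunk header.

    Raises ValueError for a malformed header and OverflowError when a
    number would not fit in the assumed unsigned long.
    """
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"malformed hunk header: {line!r}")
    numbers = []
    for group in m.groups():
        # An omitted count defaults to 1 in unified diff format.
        value = 1 if group is None else int(group)
        if value > ULONG_MAX:
            raise OverflowError(f"number too large in hunk header: {group}")
        numbers.append(value)
    return tuple(numbers)
```

The point of the sketch is that the parse step reports the out-of-range
number instead of silently continuing with a wrong value.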
2025-02-10 10:18:30 -08:00
Junio C Hamano 442b7e0018 Merge branch 'ps/setup-reinit-fixes'
"git init" to reinitialize a repository that already exists cannot
change the hash function and ref backends; such a request is
silently ignored now.

* ps/setup-reinit-fixes:
  setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH
  setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT
  t0001: remove duplicate test
2025-02-10 10:18:29 -08:00
Junio C Hamano 9520f7d998 The eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 14:56:45 -08:00
Junio C Hamano 5f338eae76 Merge branch 'ps/leakfixes-0129'
A few more leakfixes.

* ps/leakfixes-0129:
  scalar: free result of `remote_default_branch()`
  unix-socket: fix memory leak when chdir(3p) fails
2025-02-06 14:56:45 -08:00
Junio C Hamano 9d0e81e2ae Merge branch 'ps/zlib-ng'
The code paths to interact with zlib have been cleaned up in
preparation for building with zlib-ng.

* ps/zlib-ng:
  ci: make "linux-musl" job use zlib-ng
  ci: switch linux-musl to use Meson
  compat/zlib: allow use of zlib-ng as backend
  git-zlib: cast away potential constness of `next_in` pointer
  compat/zlib: provide stubs for `deflateSetHeader()`
  compat/zlib: provide `deflateBound()` shim centrally
  git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
  compat: introduce new "zlib.h" header
  git-compat-util: drop `z_const` define
  compat: drop `uncompress2()` compatibility shim
2025-02-06 14:56:45 -08:00
Junio C Hamano 9fad473fae Merge branch 'js/bundle-unbundle-fd-reuse-fix'
The code path used when "git fetch" fetches from a bundle file
closed the same file descriptor twice, which sometimes broke things
unexpectedly when the file descriptor was reused, which has been
corrected.

* js/bundle-unbundle-fd-reuse-fix:
  bundle: avoid closing file descriptor twice
2025-02-06 14:56:44 -08:00
Junio C Hamano 2bf3c7fab1 Merge branch 'ps/ci-misc-updates'
CI updates (containerization, dropping stale ones, etc.).

* ps/ci-misc-updates:
  ci: remove stale code for Azure Pipelines
  ci: use latest Ubuntu release
  ci: stop special-casing for Ubuntu 16.04
  gitlab-ci: add linux32 job testing against i386
  gitlab-ci: remove the "linux-old" job
  github: simplify computation of the job's distro
  github: convert all Linux jobs to be containerized
  github: adapt containerized jobs to be rootless
  t7422: fix flaky test caused by buffered stdout
  t0060: fix EBUSY in MinGW when setting up runtime prefix
2025-02-06 14:56:44 -08:00
Piotr Szlazak dd1eb665ef doc: documentation for http.uploadarchive config option
In Git v2.44.0, support for 'git archive' over the HTTP protocol
was added, but it was not documented anywhere how it should be
enabled in git-http-backend.

Add missing documentation.

Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:33:14 -08:00
Toon Claes 337855629f builtin/clone: teach git-clone(1) the --revision= option
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.

Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).

Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.

Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
single remote-tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user to work on that single branch.

Option `--revision`, by contrast, detaches HEAD, creates no tracking
branches, and writes no fetch refspec.

Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:26:42 -08:00
Toon Claes 9144b9362b parse-options: introduce die_for_incompatible_opt2()
The functions die_for_incompatible_opt3() and
die_for_incompatible_opt4() already exist to die whenever a user
specifies three or four options respectively that are not compatible.

Introduce die_for_incompatible_opt2() which dies when two options that
are incompatible are set.
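
A Python analogue of such a helper might look like the sketch below. The
parameter order, the error message, and the option names in the usage
example are assumptions made for illustration and are not taken from
parse-options.

```python
def die_for_incompatible_opt2(opt1_set, opt1_name, opt2_set, opt2_name):
    """Exit with an error when both of two incompatible options are set.

    Illustrative analogue only; the wording of the message is an assumption.
    """
    if opt1_set and opt2_set:
        raise SystemExit(
            f"options '{opt1_name}' and '{opt2_name}' cannot be used together"
        )


# Hypothetical usage: only one of the two options is set, so nothing happens.
die_for_incompatible_opt2(True, "--foo", False, "--bar")
```

Having a two-option variant saves callers from passing a dummy third
option to the existing three-option helper.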

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:54 -08:00
Toon Claes 7a52a8c7d8 clone: introduce struct clone_opts in builtin/clone.c
There is a lot of state stored in global variables in builtin/clone.c.
In the long run we'd like to remove many of those.

Introduce `struct clone_opts` in this file. This struct will be used to
contain all details needed to perform the clone. The struct object can
be thrown around to all the functions that need these details.

The first field we're adding is `wants_head`. In some scenarios
(specifically when both `--single-branch` and `--branch` are given) we
are not interested in `HEAD` on the remote. The field `wants_head` in
`struct clone_opts` will hold this information. We could have put
`option_branch` and `option_single_branch` into that struct instead, but
in a following commit we'll be using `wants_head` as well.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:54 -08:00
Toon Claes 2ca67c6f14 clone: add tags refspec earlier to fetch refspec
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs
from the `remote->fetch` refspec into `ref_prefixes` of
`transport_ls_refs_options`. Afterwards we add the tags prefix
`refs/tags/` as well. At a later point, in wanted_peer_refs() we
process refs using both `remote->fetch` and `TAG_REFSPEC`.

Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before
calling refspec_ref_prefixes().

To be able to do this, we set `option_tags` to 0 when --mirror is given.
This is because --mirror mirrors (hence the name) all the refs,
including tags and they do not need to be treated separately.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Toon Claes 879780f9a1 clone: refactor wanted_peer_refs()
The function wanted_peer_refs() is used to map the refs returned by the
server to refs we will save in our clone.

Over time this function has grown to be very complex. Refactor it.

Previously, there was a separate code path for when
`option_single_branch` was set. It resulted in duplicated code and
deeper nested conditions. After this refactor the code path for when
`option_single_branch` is truthy modifies `refs` and then falls through
to the common code path. This approach relies on the `refspec` being set
correctly and thus only mapping refs that are relevant.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Toon Claes bc26f7690a clone: make it possible to specify --tags
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].

To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change: because the flag is enabled
by default, passing `--tags` has no effect, so there's no need to add
tests for this.

[1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Toon Claes 7f420a6bda clone: cut down on global variables in clone.c
In clone.c the `struct option` which is used to parse the input options
for git-clone(1) is a global variable. Due to this, many variables that
are used to parse the value into, are also global.

Make `builtin_clone_options` a local variable in cmd_clone() and carry
along all variables that are only used in that function.

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Olga Pilipenco 78a95e0d80 worktree: detect from secondary worktree if main worktree is bare
When extensions.worktreeConfig is true and the main worktree is
bare -- that is, its config.worktree file contains core.bare=true
-- commands run from secondary worktrees incorrectly see the main
worktree as not bare. As such, those commands incorrectly think
that the repository's default branch (typically "main" or
"master") is checked out in the bare repository even though it's
not. This makes it impossible, for instance, to checkout or delete
the default branch from a secondary worktree, among other
shortcomings.

This problem occurs because, when extensions.worktreeConfig is
true, commands run in secondary worktrees only consult
$commondir/config and $commondir/worktrees/<id>/config.worktree,
thus they never see the main worktree's core.bare=true setting in
$commondir/config.worktree.

Fix this problem by consulting the main worktree's config.worktree
file when checking whether it is bare. (This extra work is
performed only when running from a secondary worktree.)
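
The lookup described above can be modelled with plain dictionaries standing
in for the config files. This is a simplified sketch: the function names
are hypothetical, config values are plain strings, and later files simply
override earlier ones.

```python
def is_bare_seen_from_secondary(common_config, secondary_worktree_config):
    """Before the fix: only $commondir/config and the secondary worktree's
    own config.worktree are consulted, so the main worktree's setting in
    $commondir/config.worktree is never seen."""
    merged = {**common_config, **secondary_worktree_config}
    return merged.get("core.bare", "false") == "true"


def is_main_worktree_bare(common_config, main_worktree_config):
    """With the fix: also consult the main worktree's config.worktree."""
    merged = {**common_config, **main_worktree_config}
    return merged.get("core.bare", "false") == "true"


# extensions.worktreeConfig is true and the main worktree is bare.
common = {"extensions.worktreeconfig": "true"}
main_wt = {"core.bare": "true"}   # $commondir/config.worktree
secondary_wt = {}                 # $commondir/worktrees/<id>/config.worktree

assert is_bare_seen_from_secondary(common, secondary_wt) is False  # the bug
assert is_main_worktree_bare(common, main_wt) is True              # the fix
```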

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 09:46:23 -08:00
Patrick Steinhardt 414c82300a builtin/repack: fix `--keep-unreachable` when there are no packs
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.

The root cause lies in the conditions used to decide whether or not we
want to append "--pack-loose-unreachable" to git-pack-objects(1). There
are a couple of conditions here:

  - `has_existing_non_kept_packs()` checks whether there are existing
    packfiles. This condition makes sense to guard "--keep-pack=",
    "--unpack-unreachable" and "--keep-unreachable", because all of
    these flags only make sense in combination with existing packfiles.
    But it does not make sense to disable `--pack-loose-unreachable`
    when there aren't any preexisting packfiles, as loose objects can be
    packed into the new packfile regardless of that.

  - `delete_redundant` checks whether we want to delete any objects or
    packs that are about to become redundant. The documentation of
    `--keep-unreachable` explicitly says that `git repack -ad` needs to
    be executed for the flag to have an effect.

    It is not immediately obvious why such redundant objects need to be
    deleted in order for "--pack-unreachable-objects" to be effective.
    But as things are working as documented this is nothing we'll change
    for now.

  - `pack_everything & PACK_CRUFT` checks that we're not creating a
    cruft pack. This condition makes sense in the context of
    "--pack-loose-unreachable", as unreachable objects would end up in
    the cruft pack anyway.

So while the second and third conditions are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.

Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
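
The change in the decision can be sketched as two boolean functions. The
function and parameter names are hypothetical and only mirror the three
conditions listed above; `cruft` stands for `pack_everything & PACK_CRUFT`.

```python
def pass_pack_loose_unreachable_before(keep_unreachable,
                                       has_existing_non_kept_packs,
                                       delete_redundant, cruft):
    """Old behaviour: all three conditions guarded the flag."""
    return (keep_unreachable and has_existing_non_kept_packs
            and delete_redundant and not cruft)


def pass_pack_loose_unreachable_after(keep_unreachable,
                                      delete_redundant, cruft):
    """New behaviour: the existence of packfiles no longer matters."""
    return keep_unreachable and delete_redundant and not cruft


# Repository without any packfiles, repacked with "-ad --keep-unreachable":
assert not pass_pack_loose_unreachable_before(True, False, True, False)
assert pass_pack_loose_unreachable_after(True, True, False)
```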

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:58:02 -08:00
Meet Soni f21ea69d94 remote: relocate valid_remote_name
Move the `valid_remote_name()` function from the refspec subsystem to
the remote subsystem to better align with the separation of concerns.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:55:59 -08:00
Meet Soni d549b6c9ff refspec: relocate apply_refspecs and related functions
Move the functions `apply_refspecs()` and `apply_negative_refspecs()`
from `remote.c` to `refspec.c`. These functions focus on applying
refspecs, so centralizing them in `refspec.c` improves code organization
by keeping refspec-related logic in one place.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:42 -08:00
Meet Soni 7b24a170d2 refspec: relocate matching related functions
Move the functions `refspec_find_match()`, `refspec_find_all_matches()`
and `refspec_find_negative_match()` from `remote.c` to `refspec.c`.
These functions focus on matching refspecs, so centralizing them in
`refspec.c` improves code organization by keeping refspec-related logic
in one place.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:41 -08:00
Meet Soni be0905fed1 remote: rename query_refspecs functions
Rename functions related to handling refspecs in preparation for their
move from `remote.c` to `refspec.c`. Update their names to better
reflect their intent:

    - `query_refspecs()` -> `refspec_find_match()` for clarity, as it
      finds a single matching refspec.

    - `query_refspecs_multiple()` -> `refspec_find_all_matches()` to
      better reflect that it collects all matching refspecs instead of
      returning just the first match.

    - `query_matches_negative_refspec()` ->
      `refspec_find_negative_match()` for consistency with the
      updated naming convention, even though this static function
      didn't strictly require renaming.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:41 -08:00
Meet Soni 230d022fe3 refspec: relocate refname_matches_negative_refspec_item
Move the functions `refname_matches_negative_refspec_item()`,
`refspec_match()`, and `match_name_with_pattern()` from `remote.c` to
`refspec.c`. These functions focus on refspec matching, so placing them
in `refspec.c` aligns with the separation of concerns. Keep
refspec-related logic in `refspec.c` and remote-specific logic in
`remote.c` for better code organization.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:41 -08:00
Meet Soni e4f6ab0085 remote: rename function omit_name_by_refspec
Rename the function `omit_name_by_refspec()` to
`refname_matches_negative_refspec_item()` to provide clearer intent.
The previous function name was vague and did not accurately describe its
purpose. By using `refname_matches_negative_refspec_item`, make the
function's purpose more intuitive, clarifying that it checks if a
reference name matches any negative refspec.

Rename function parameters for consistency with existing naming
conventions. Use `refname` instead of `name` to align with terminology
in `refs.h`.

Remove the redundant doc comment since the function name is now
self-explanatory.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-04 09:51:41 -08:00
Derrick Stolee 85127bcdea backfill: assume --sparse when sparse-checkout is enabled
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.

However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.

Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee bff4555767 backfill: add --sparse option
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.

However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.

Be sure to test this in both cone mode and non-cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.

To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are

 1. git clone --filter=blob:none --sparse <url>
 2. git sparse-checkout set <dir1> ... <dirN>
 3. git backfill --sparse
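
The three steps can also be written down as argument lists, for example to
drive them from a script. This is a sketch only: the function name is
hypothetical, nothing is executed here, and steps 2 and 3 are assumed to
run inside the directory created by the clone. The optional
`--min-batch-size=<n>` argument is the option added earlier in this series.

```python
def backfill_commands(url, directories, min_batch_size=None):
    """Return the argv lists for the clone, sparse-checkout and backfill steps."""
    backfill = ["git", "backfill", "--sparse"]
    if min_batch_size is not None:
        backfill.append(f"--min-batch-size={min_batch_size}")
    return [
        ["git", "clone", "--filter=blob:none", "--sparse", url],
        ["git", "sparse-checkout", "set", *directories],
        backfill,
    ]


# Example with a placeholder URL; each list could be passed to subprocess.run().
for argv in backfill_commands("https://example.com/repo.git", ["builtin"], 25000):
    print(" ".join(argv))
```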

For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3          | 110 MB    |       |
| 10K             | 12         | 192 MB    | 17.2s |
| 15K             | 9          | 192 MB    | 15.5s |
| 20K             | 8          | 192 MB    | 15.5s |
| 25K             | 7          | 192 MB    | 14.7s |

This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.

Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:

| Batch Size      | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2          | 1,876 MB  |      |
| 10K             | 11         | 2,187 MB  | 46s  |
| 25K             | 7          | 2,188 MB  | 43s  |
| 50K             | 5          | 2,194 MB  | 44s  |
| 100K            | 4          | 2,194 MB  | 48s  |

This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.

Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00