Commit Graph

16251 Commits (7c78c599bb9b51e5cbdae3e7dc1d723eefcf7c61)

Author SHA1 Message Date
Todd Zullinger 7c78c599bb CodingGuidelines: *.txt -> *.adoc fixes
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 13:49:20 -08:00
Todd Zullinger 7d90a272ac doc: remove unneeded .gitattributes
The top-level .gitattributes file contains entries for the Documentation
tree.  Documentation/.gitattributes has not been touched since it was
added in 14f9e128d3 (Define the project whitespace policy, 2008-02-10).

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 13:49:19 -08:00
Junio C Hamano 06d9252bcc doc: fix build-docdep.perl
We renamed from .txt to .adoc all the asciidoc source files and
necessary includes.  We also need to adjust the build-docdep tool to
work on files whose suffix is .adoc when computing the documentation
dependencies.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-01 10:26:15 -08:00
Todd Zullinger 41c793eae9 doc: update howto-index.sh for .adoc extensions
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20).  This left broken links in
the generated howto-index.html.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-01 10:00:51 -08:00
Junio C Hamano 08bdfd4535 Git 2.49-rc0
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-26 08:55:18 -08:00
Junio C Hamano 5a526e5e18 The fourteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-25 14:19:37 -08:00
Junio C Hamano 092180990d Merge branch 'ad/set-default-target-in-makefiles'
Correct the default target in Documentation/Makefile, and
future-proof all Makefiles from similar breakages by declaring the
default target (which happens to be "all") upfront.

* ad/set-default-target-in-makefiles:
  Makefile: set default goals in makefiles
2025-02-25 14:19:36 -08:00
Junio C Hamano 9b07c152df Merge branch 'pw/merge-tree-stdin-deadlock-fix'
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).

* pw/merge-tree-stdin-deadlock-fix:
  merge-tree: fix link formatting in html docs
  merge-tree: improve docs for --stdin
  merge-tree: only use basic merge config
  merge-tree: remove redundant code
  merge-tree --stdin: flush stdout to avoid deadlock
2025-02-25 14:19:36 -08:00
Junio C Hamano 37b34c4e99 Merge branch 'mh/doc-commit-title-not-subject'
The documentation of "git commit" and "git rebase" now refer to
commit titles as such, not "subject".

* mh/doc-commit-title-not-subject:
  doc: use 'title' consistently
2025-02-25 14:19:36 -08:00
Junio C Hamano 2d2a71ce85 The thirteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-21 10:35:54 -08:00
Junio C Hamano 84a5ce3f03 Merge branch 'ac/doc-http-ssl-type-config'
Two configuration variables about SSL authentication material that
weren't mentioned in the documentations are now mentioned.

* ac/doc-http-ssl-type-config:
  docs: indicate http.sslCertType and sslKeyType
2025-02-21 10:35:54 -08:00
Junio C Hamano 0fbe93b36c Merge branch 'jc/doc-boolean-synonyms'
Doc updates.

* jc/doc-boolean-synonyms:
  doc: centrally document various ways tospell `true` and `false`
2025-02-21 10:35:53 -08:00
Junio C Hamano 55b5ba87f1 Merge branch 'en/doc-renormalize'
Doc updates.

* en/doc-renormalize:
  doc: clarify the intent of the renormalize option in the merge machinery
2025-02-21 10:35:53 -08:00
Junio C Hamano a554262210 The twelfth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 15:30:33 -08:00
Junio C Hamano 7722b997c6 Merge branch 'jt/rev-list-missing-print-info'
"git rev-list --missing=" learned to accept "print-info" that gives
known details expected of the missing objects, like path and type.

* jt/rev-list-missing-print-info:
  rev-list: extend print-info to print missing object type
  rev-list: add print-info action to print missing object path
2025-02-18 15:30:32 -08:00
Junio C Hamano e565f37553 Merge branch 'ds/backfill'
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time.  "git backfill" is introduced
to help bulk-download necessary files beforehand.

* ds/backfill:
  backfill: assume --sparse when sparse-checkout is enabled
  backfill: add --sparse option
  backfill: add --min-batch-size=<n> option
  backfill: basic functionality and tests
  backfill: add builtin boilerplate
2025-02-18 15:30:31 -08:00
M Hickford c2d96bc42c doc: use 'title' consistently
The first line of a commit message is variously called 'title' or
'subject'.

Prefer 'title' unless discussing email.

Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:56:00 -08:00
Phillip Wood 6a9ae81015 merge-tree: fix link formatting in html docs
In the html documentation the link to the "OUTPUT" section is surrounded
by square brackets. Fix this by adding explicit link text to the cross
reference.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:52:40 -08:00
Phillip Wood 3e681a7ccc merge-tree: improve docs for --stdin
Add a section for --stdin in the list of options and document that it
implies -z so readers know how to parse the output. Also correct the
merge status documentation for --stdin as if the status is less than
zero "git merge-tree" dies before printing it.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:52:40 -08:00
Adam Dinwoodie 5309c1e9fb Makefile: set default goals in makefiles
Explicitly set the default goal at the very top of various makefiles.
This is already present in some makefiles, but not all of them.

In particular, this corrects a regression introduced in a38edab7c8
(Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06).  That
commit added some config files as build targets for the Documentation
directory, and put the target configuration in a sensible place.
Unfortunately, that sensible place was above any other build target
definitions, meaning the default goal changed to being those
configuration files only, rather than the HTML and man page
documentation.

Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:02:26 -08:00
Junio C Hamano 0394451348 The eleventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14 17:53:49 -08:00
Junio C Hamano 60cb8e79cb Merge branch 'ps/doc-http-upload-archive-service'
Doc update.

* ps/doc-http-upload-archive-service:
  doc: documentation for http.uploadarchive config option
2025-02-14 17:53:49 -08:00
Junio C Hamano 5785d9143b Merge branch 'tc/clone-single-revision'
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.

* tc/clone-single-revision:
  builtin/clone: teach git-clone(1) the --revision= option
  parse-options: introduce die_for_incompatible_opt2()
  clone: introduce struct clone_opts in builtin/clone.c
  clone: add tags refspec earlier to fetch refspec
  clone: refactor wanted_peer_refs()
  clone: make it possible to specify --tags
  clone: cut down on global variables in clone.c
2025-02-14 17:53:48 -08:00
Junio C Hamano 0cc13007e5 Merge branch 'bc/doc-adoc-not-txt'
All the documentation .txt files have been renamed to .adoc to help
content aware editors.

* bc/doc-adoc-not-txt:
  Remove obsolete ".txt" extensions for AsciiDoc files
  doc: use .adoc extension for AsciiDoc files
  gitattributes: mark AsciiDoc files as LF-only
  editorconfig: add .adoc extension
  doc: update gitignore for .adoc extension
2025-02-14 17:53:47 -08:00
Junio C Hamano e2067b49ec The tenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-12 10:09:08 -08:00
Junio C Hamano 2d7a874493 Merge branch 'da/help-autocorrect-one-fix'
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.

* da/help-autocorrect-one-fix:
  help: add "show" as a valid configuration value
  help: show the suggested command when help.autocorrect is false
2025-02-12 10:08:55 -08:00
Junio C Hamano 39de0ffbe3 Merge branch 'sc/help-autocorrect-one'
"[help] autocorrect = 1" used to be a way to say "please wait for
0.1 second after suggesting a typofix of the command name before
running that command"; now it means "yes, if there is a plausible
typofix for the command name, please run it immediately".

* sc/help-autocorrect-one:
  help: interpret boolean string values for help.autocorrect
2025-02-12 10:08:55 -08:00
Junio C Hamano 791677a5dd Merge branch 'jp/doc-trailer-config'
Documentaiton updates.

* jp/doc-trailer-config:
  config.txt: add trailer.* variables
2025-02-12 10:08:54 -08:00
Junio C Hamano 5b9d01bc4d Merge branch 'zh/gc-expire-to'
"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".

* zh/gc-expire-to:
  gc: add `--expire-to` option
2025-02-12 10:08:53 -08:00
Junio C Hamano aae91a86fb Merge branch 'ds/name-hash-tweaks'
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.

* ds/name-hash-tweaks:
  pack-objects: prevent name hash version change
  test-tool: add helper for name-hash values
  p5313: add size comparison test
  pack-objects: add GIT_TEST_NAME_HASH_VERSION
  repack: add --name-hash-version option
  pack-objects: add --name-hash-version option
  pack-objects: create new name-hash function version
2025-02-12 10:08:51 -08:00
Elijah Newren 45761988ac doc: clarify the intent of the renormalize option in the merge machinery
The -X renormalize (or merge.renormalize config) option is intended to
reduce conflicts due to normalization of newer versions of history.  It
does so by renormalizing files that it is about to do a three-way
content merge on.  Some folks thought it would renormalize all files
throughout the tree, and the previous wording wasn't clear enough to
dispell that misconception.  Update the docs to make it clear that the
merge machinery will only apply renormalization to files which need a
three-way content merge.

(Technically, the merge machinery also does renormalization on
modify/delete conflicts, in order to see if the modification was merely
a normalization; if so, it can accept the delete and not report a
conflict.  But it's not clear that this piece needs to be explained to
users, and trying to distinguish it might feel like splitting hairs and
overcomplicating the explanation, so we leave it out.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-11 13:34:36 -08:00
Junio C Hamano 832f56f06a doc: centrally document various ways tospell `true` and `false`
We do not seem to centrally document exhaustively ways to spell
Boolean values.

The description in the Environment Variables of git(1) section
assumes that the reader is already familiar with how "Boolean valued
configuration variables" are specified, without referring to
anything, so there is no way for the readers to find out more.

The description of `bool` in the section on "--type
<type>" in "git config --help" might be the place to do so, but it
is not telling us all that much.

The description of Boolean valued placeholders in the pretty formats
section of "git log --help" enumerates the possible values with "etc."
implying there may be other synonyms; shrink the list of samples and
instead refer to the canonical and authoritative source of truth, which
now is git-config(1).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-11 10:12:04 -08:00
Junio C Hamano 388218fac7 The ninth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10 10:18:32 -08:00
Junio C Hamano 9520f7d998 The eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 14:56:45 -08:00
Piotr Szlazak dd1eb665ef doc: documentation for http.uploadarchive config option
In Git v2.44.0 support for 'git archive' over HTTP protocol
was added, but it was nowhere documented how it should be
enabled in git-http-backend.

Add missing documentation.

Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:33:14 -08:00
Toon Claes 337855629f builtin/clone: teach git-clone(1) the --revision= option
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.

Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).

Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.

Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.

Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.

Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:26:42 -08:00
Toon Claes bc26f7690a clone: make it possible to specify --tags
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].

To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change, because the flag is
default-enabled passing `--tags` has no effect, so there's no need to
add tests for this.

[1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/

Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-06 12:23:53 -08:00
Andrew Carter 3eeed876a9 docs: indicate http.sslCertType and sslKeyType
0a01d41ee4 (http: add support for different sslcert and sslkey types.,
2023-03-20) added useful SSL config options, but did not document them.

Signed-off-by: Andrew Carter <andrew@emailcarter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 09:43:38 -08:00
Justin Tobler 3295c35398 rev-list: extend print-info to print missing object type
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 09:32:01 -08:00
Justin Tobler c6d896bcfd rev-list: add print-info action to print missing object path
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.

Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.

For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-05 09:32:01 -08:00
Derrick Stolee 85127bcdea backfill: assume --sparse when sparse-checkout is enabled
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.

However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.

Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee bff4555767 backfill: add --sparse option
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.

However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.

Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.

To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are

 1. git clone --filter=blob:none --sparse <url>
 2. git sparse-checkout set <dir1> ... <dirN>
 3. git backfill --sparse

For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3          | 110 MB    |       |
| 10K             | 12         | 192 MB    | 17.2s |
| 15K             | 9          | 192 MB    | 15.5s |
| 20K             | 8          | 192 MB    | 15.5s |
| 25K             | 7          | 192 MB    | 14.7s |

This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.

Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:

| Batch Size      | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2          | 1,876 MB  |      |
| 10K             | 11         | 2,187 MB  | 46s  |
| 25K             | 7          | 2,188 MB  | 43s  |
| 50K             | 5          | 2,194 MB  | 44s  |
| 100K            | 4          | 2,194 MB  | 48s  |

This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.

Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee 6840fe9ee2 backfill: add --min-batch-size=<n> option
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.

We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.

To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:

 1. git clone --filter=blob:none <url>
 2. git backfill

Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.

For the Git repository, we get these results:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2          | 119 MB    |       |
| 25K             | 8          | 290 MB    | 24s   |
| 50K             | 5          | 290 MB    | 24s   |
| 100K            | 4          | 290 MB    | 29s   |

Other than the packfile counts decreasing as we need fewer batches, the
size and time required is not changing much for this small example.

For the nodejs/node repository, we see these results:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 330 MB    |        |
| 25K             | 19         | 1,222 MB  | 1m 22s |
| 50K             | 11         | 1,221 MB  | 1m 24s |
| 100K            | 7          | 1,223 MB  | 1m 40s |
| 250K            | 4          | 1,224 MB  | 2m 23s |
| 500K            | 3          | 1,216 MB  | 4m 38s |

Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.

To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 105 MB    |        |
| 5K              | 53         | 348 MB    | 2m 26s |
| 10K             | 28         | 365 MB    | 2m 22s |
| 15K             | 19         | 407 MB    | 2m 21s |
| 20K             | 15         | 393 MB    | 2m 28s |
| 25K             | 13         | 417 MB    | 2m 06s |
| 50K             | 8          | 509 MB    | 1m 34s |
| 100K            | 5          | 535 MB    | 1m 56s |
| 250K            | 4          | 698 MB    | 1m 33s |
| 500K            | 3          | 696 MB    | 1m 42s |

Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.

This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.

Finally, the same experiment for the Linux repository had these results:

| Batch Size      | Pack Count | Pack Size | Time    |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2          | 2,153 MB  |         |
| 25K             | 63         | 6,380 MB  | 14m 08s |
| 50K             | 58         | 6,126 MB  | 15m 11s |
| 100K            | 30         | 6,135 MB  | 18m 11s |
| 250K            | 14         | 6,146 MB  | 18m 22s |
| 500K            | 8          | 6,143 MB  | 33m 29s |

Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.

Based on these experiments, a batch size of 50,000 was chosen as the
default value.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee 1e72e889e7 backfill: basic functionality and tests
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.

The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.

The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.

This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:41 -08:00
Derrick Stolee a3f79e9abd backfill: add builtin boilerplate
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin. Mark the builtin as experimental at
this time, allowing breaking changes in the near future, if necessary.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:41 -08:00
David Aguilar e4542d8b35 help: add "show" as a valid configuration value
Add a literal value for showing the suggested autocorrection
for consistency with the rest of the help.autocorrect options.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 15:22:05 -08:00
David Aguilar e21bf2c431 help: show the suggested command when help.autocorrect is false
Make the handling of false boolean values for help.autocorrect
consistent with the handling of value 0 by showing the suggested
commands but not running them.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 15:22:03 -08:00
Junio C Hamano a0fc18f042 Merge branch 'sc/help-autocorrect-one' into da/help-autocorrect-one-fix
* sc/help-autocorrect-one:
  help: interpret boolean string values for help.autocorrect
2025-02-03 15:21:57 -08:00
Junio C Hamano bc204b7427 The seventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 10:23:35 -08:00
Junio C Hamano 803b5acaa7 Merge branch 'ps/3.0-remote-deprecation'
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.

* ps/3.0-remote-deprecation:
  remote: announce removal of "branches/" and "remotes/"
  builtin/pack-redundant: remove subcommand with breaking changes
  ci: repurpose "linux-gcc" job for deprecations
  ci: merge linux-gcc-default into linux-gcc
  Makefile: wire up build option for deprecated features
2025-02-03 10:23:33 -08:00