Commit Graph

76101 Commits (85127bcdeab5ab34f9c738da3fcc88d637f39089)

Author SHA1 Message Date
Derrick Stolee 85127bcdea backfill: assume --sparse when sparse-checkout is enabled
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not enable it by default. However, this is
likely the behavior that users will most often want.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.

This is nevertheless left as a separate change so the decision can be
reviewed independently of the value of the '--[no-]sparse' option.

Add a test that passes the '--sparse' option in a repo without
sparse-checkout to make it clear that doing so is an error.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee bff4555767 backfill: add --sparse option
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, but the
sparse-checkout also reduces the number of objects that need to be
downloaded from a promisor remote.

However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time when the user is
willing to pay that extra cost.

Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, providing a resumable download if things are
interrupted.

This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.
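
As a sketch of how a consumer can populate that member (assuming dir.h's
get_sparse_checkout_patterns() helper; error handling is simplified):

    struct path_walk_info info = PATH_WALK_INFO_INIT;
    struct pattern_list pl = { 0 };

    if (get_sparse_checkout_patterns(&pl) < 0)
        die(_("problem loading the sparse-checkout definition"));

    /* restrict the object walk to paths inside the sparse-checkout */
    info.pl = &pl;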

Be sure to test this in both cone mode and non-cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.
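
For reference, the primitive involved looks roughly like this (signature
as in dir.h; the surrounding walk logic is simplified):

    /* 'path', 'pl', and 'istate' come from the walk's context */
    const char *base = strrchr(path, '/');
    int dtype = DT_DIR;
    enum pattern_match_result m;

    m = path_matches_pattern_list(path, strlen(path),
                                  base ? base + 1 : path, &dtype,
                                  pl, istate);
    if (m == NOT_MATCHED)
        return 0; /* prune: nothing under this path is wanted */

In non-cone mode, negative patterns mean a directory check may come back
UNDECIDED rather than a definitive answer, so the walk cannot prune as
aggressively as it can in cone mode.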

To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are

 1. git clone --filter=blob:none --sparse <url>
 2. git sparse-checkout set <dir1> ... <dirN>
 3. git backfill --sparse

For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3          | 110 MB    |       |
| 10K             | 12         | 192 MB    | 17.2s |
| 15K             | 9          | 192 MB    | 15.5s |
| 20K             | 8          | 192 MB    | 15.5s |
| 25K             | 7          | 192 MB    | 14.7s |

This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.

Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:

| Batch Size      | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2          | 1,876 MB  |      |
| 10K             | 11         | 2,187 MB  | 46s  |
| 25K             | 7          | 2,188 MB  | 43s  |
| 50K             | 5          | 2,194 MB  | 44s  |
| 100K            | 4          | 2,194 MB  | 48s  |

This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.

Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee 6840fe9ee2 backfill: add --min-batch-size=<n> option
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.

We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.
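
For example, a user on a slow or unreliable connection could choose
smaller (and therefore more easily resumable) requests by running
'git backfill --min-batch-size=10000', at the cost of more round trips
to the server.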

To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:

 1. git clone --filter=blob:none <url>
 2. git backfill

Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.
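
(For example, 'git count-objects -v' reports both numbers in its
'packs' and 'size-pack' fields.)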

For the Git repository, we get these results:

| Batch Size      | Pack Count | Pack Size | Time  |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2          | 119 MB    |       |
| 25K             | 8          | 290 MB    | 24s   |
| 50K             | 5          | 290 MB    | 24s   |
| 100K            | 4          | 290 MB    | 29s   |

Other than the packfile counts decreasing as we need fewer batches, the
size and time required do not change much for this small example.

For the nodejs/node repository, we see these results:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 330 MB    |        |
| 25K             | 19         | 1,222 MB  | 1m 22s |
| 50K             | 11         | 1,221 MB  | 1m 24s |
| 100K            | 7          | 1,223 MB  | 1m 40s |
| 250K            | 4          | 1,224 MB  | 2m 23s |
| 500K            | 3          | 1,216 MB  | 4m 38s |

Here, we don't see much difference in the size of the repo, though the
500K batch size saves a few MB. That savings comes at the cost of a
much longer time. The extra time is due to server-side delta
compression, as the on-disk deltas don't appear to be reusable all the
time. For smaller batch sizes, the server is able to find reasonable
deltas partly because we are asking for objects that appear in the same
region of the directory tree and include all versions of a file at a
specific path.

To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results were gathered before GitHub had the opportunity to repack
the repository with more advanced name hash versions:

| Batch Size      | Pack Count | Pack Size | Time   |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2          | 105 MB    |        |
| 5K              | 53         | 348 MB    | 2m 26s |
| 10K             | 28         | 365 MB    | 2m 22s |
| 15K             | 19         | 407 MB    | 2m 21s |
| 20K             | 15         | 393 MB    | 2m 28s |
| 25K             | 13         | 417 MB    | 2m 06s |
| 50K             | 8          | 509 MB    | 1m 34s |
| 100K            | 5          | 535 MB    | 1m 56s |
| 250K            | 4          | 698 MB    | 1m 33s |
| 500K            | 3          | 696 MB    | 1m 42s |

Here, a larger variety of batch sizes was chosen because of the great
variation in results. By requesting small batches corresponding to
fewer paths at a time, we allow the server to provide better
compression for these batches than it could for a regular clone.
A typical full clone for this repository would require 738 MB.

This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.

Finally, the same experiment for the Linux repository had these results:

| Batch Size      | Pack Count | Pack Size | Time    |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2          | 2,153 MB  |         |
| 25K             | 63         | 6,380 MB  | 14m 08s |
| 50K             | 58         | 6,126 MB  | 15m 11s |
| 100K            | 30         | 6,135 MB  | 18m 11s |
| 250K            | 14         | 6,146 MB  | 18m 22s |
| 500K            | 8          | 6,143 MB  | 33m 29s |

Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value in
selecting a smaller batch size, up to a limit. The 25K batch size has
the fastest time, but uses 250 MB more than the 50K batch size. The
500K batch size took much more time due to server-side compression, so
batch sizes that large should be avoided.

Based on these experiments, a batch size of 50,000 was chosen as the
default value.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:42 -08:00
Derrick Stolee 1e72e889e7 backfill: basic functionality and tests
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.

The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.
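
A minimal sketch of that setup, using the names introduced by the
path-walk API (see the ds/path-walk-1 merge below); treat this as an
outline rather than the actual builtin code:

    struct rev_info revs;
    struct path_walk_info info = PATH_WALK_INFO_INIT;
    const char *argv[] = { "backfill", "HEAD", NULL };

    repo_init_revisions(repo, &revs, NULL);
    setup_revisions(2, argv, &revs, NULL);

    info.revs = &revs;
    info.blobs = 1;                  /* visit batches of blobs ... */
    info.trees = 0;                  /* ... but skip other types */
    info.commits = 0;
    info.tags = 0;
    info.path_fn = fill_batch;       /* hypothetical callback; see below */
    info.path_fn_data = &batch;

    walk_objects_by_path(&info);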

The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change), a minimum
batch size of 50,000 is selected as the default.

This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list reaches the minimum batch size, the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to splitting
the list when there are too many objects at the same path.
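
A sketch of that accumulation, using a hypothetical 'struct batch' as
the accumulator (de-duplication and error handling are omitted):

    static int fill_batch(const char *path, struct oid_array *list,
                          enum object_type type, void *data)
    {
        struct batch *batch = data;
        size_t i;

        for (i = 0; i < list->nr; i++)
            oid_array_append(&batch->oids, &list->oid[i]);

        if (batch->oids.nr >= batch->min_batch_size) {
            /* one request to the promisor remote for the whole batch */
            promisor_remote_get_direct(batch->repo, batch->oids.oid,
                                       batch->oids.nr);
            oid_array_clear(&batch->oids);
        }
        return 0;
    }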

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:41 -08:00
Derrick Stolee a3f79e9abd backfill: add builtin boilerplate
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin. Mark the builtin as experimental at
this time, allowing breaking changes in the near future, if necessary.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 16:12:41 -08:00
Junio C Hamano e5a0d5d8bb Merge branch 'master' into ds/backfill
* master: (446 commits)
  The seventh batch
  The sixth batch
  The fifth batch
  The fourth batch
  refs/reftable: fix uninitialized memory access of `max_index`
  remote: announce removal of "branches/" and "remotes/"
  The third batch
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
  trace2: prevent segfault on config collection with valueless true
  refs: fix creation of reflog entries for symrefs
  ci: wire up Visual Studio build with Meson
  ci: raise error when Meson generates warnings
  meson: fix compilation with Visual Studio
  ...
2025-02-03 16:12:33 -08:00
Junio C Hamano bc204b7427 The seventh batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03 10:23:35 -08:00
Junio C Hamano 1f124f3024 Merge branch 'kn/reflog-migration-fix-fix'
Fix bugs in an earlier attempt to fix "git refs migrate".

* kn/reflog-migration-fix-fix:
  refs/reftable: fix uninitialized memory access of `max_index`
  reftable: write correct max_update_index to header
2025-02-03 10:23:35 -08:00
Junio C Hamano b83a2f9006 Merge branch 'kn/pack-write-with-reduced-globals'
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03 10:23:34 -08:00
Junio C Hamano f49905d47d Merge branch 'ps/build-meson-fixes'
More build fixes and enhancements on meson based build procedure.

* ps/build-meson-fixes:
  ci: wire up Visual Studio build with Meson
  ci: raise error when Meson generates warnings
  meson: fix compilation with Visual Studio
  meson: make the CSPRNG backend configurable
  meson: wire up fuzzers
  meson: wire up generation of distribution archive
  meson: wire up development environments
  meson: fix dependencies for generated headers
  meson: populate project version via GIT-VERSION-GEN
  GIT-VERSION-GEN: allow running without input and output files
  GIT-VERSION-GEN: simplify computing the dirty marker
2025-02-03 10:23:34 -08:00
Junio C Hamano 803b5acaa7 Merge branch 'ps/3.0-remote-deprecation'
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.

* ps/3.0-remote-deprecation:
  remote: announce removal of "branches/" and "remotes/"
  builtin/pack-redundant: remove subcommand with breaking changes
  ci: repurpose "linux-gcc" job for deprecations
  ci: merge linux-gcc-default into linux-gcc
  Makefile: wire up build option for deprecated features
2025-02-03 10:23:33 -08:00
Junio C Hamano c43136d67b Merge branch 'jk/combine-diff-cleanup'
Code clean-up for code paths around combined diff.

* jk/combine-diff-cleanup:
  tree-diff: make list tail-passing more explicit
  tree-diff: simplify emit_path() list management
  tree-diff: use the name "tail" to refer to list tail
  tree-diff: drop list-tail argument to diff_tree_paths()
  combine-diff: drop public declaration of combine_diff_path_size()
  tree-diff: inline path_appendnew()
  tree-diff: pass whole path string to path_appendnew()
  tree-diff: drop path_appendnew() alloc optimization
  run_diff_files(): de-mystify the size of combine_diff_path struct
  diff: add a comment about combine_diff_path.parent.path
  combine-diff: use pointer for parent paths
  tree-diff: clear parent array in path_appendnew()
  combine-diff: add combine_diff_path_new()
  run_diff_files(): delay allocation of combine_diff_path
2025-02-03 10:23:33 -08:00
Junio C Hamano caf17423d3 Merge branch 'tb/unsafe-hash-cleanup'
The API around choosing to use the unsafe variant of the SHA-1
implementation has been updated in an attempt to make it harder to
abuse.

* tb/unsafe-hash-cleanup:
  hash.h: drop unsafe_ function variants
  csum-file: introduce hashfile_checkpoint_init()
  t/helper/test-hash.c: use unsafe_hash_algo()
  csum-file.c: use unsafe_hash_algo()
  hash.h: introduce `unsafe_hash_algo()`
  csum-file.c: extract algop from hashfile_checksum_valid()
  csum-file: store the hash algorithm as a struct field
  t/helper/test-tool: implement sha1-unsafe helper
2025-02-03 10:23:32 -08:00
Junio C Hamano 58b5801aa9 The sixth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 09:44:16 -08:00
Junio C Hamano 81309f424b Merge branch 'jc/show-index-h-update'
Doc and short-help text for "show-index" have been clarified to
stress that the command reads its data from the standard input.

* jc/show-index-h-update:
  show-index: the short help should say the command reads from its input
2025-01-31 09:44:16 -08:00
Junio C Hamano bdd1988eb3 Merge branch 'ja/doc-notes-markup-updates'
Doc mark-up updates.

* ja/doc-notes-markup-updates:
  doc: convert git-notes to new documentation format
2025-01-31 09:44:15 -08:00
Junio C Hamano ecba2c181c Merge branch 'sk/strlen-returns-size_t'
Code clean-up.

* sk/strlen-returns-size_t:
  date.c: Fix type missmatch warings from msvc
2025-01-31 09:44:15 -08:00
Junio C Hamano dccd9c5cf2 Merge branch 'ja/doc-restore-markup-update'
Doc mark-up updates.

* ja/doc-restore-markup-update:
  doc: convert git-restore to new style format
2025-01-31 09:44:15 -08:00
Junio C Hamano 3b0d05c4a7 The fifth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-29 14:05:10 -08:00
Junio C Hamano 296cf82f93 Merge branch 'ps/reflog-migration-with-logall-fix'
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.

* ps/reflog-migration-with-logall-fix:
  refs: fix migration of reflogs respecting "core.logAllRefUpdates"
2025-01-29 14:05:10 -08:00
Junio C Hamano c5216a1bc6 Merge branch 'am/trace2-with-valueless-true'
The trace2 code was not prepared to show a configuration variable
that is set to true using the valueless true syntax, which has been
corrected.

* am/trace2-with-valueless-true:
  trace2: prevent segfault on config collection with valueless true
2025-01-29 14:05:10 -08:00
Junio C Hamano d205f06ae0 Merge branch 'kn/reflog-symref-fix'
Reflog entries for symbolic ref updates were broken, which has been
corrected.

* kn/reflog-symref-fix:
  refs: fix creation of reflog entries for symrefs
2025-01-29 14:05:10 -08:00
Junio C Hamano 8d6240d4c6 Merge branch 'rs/ref-fitler-used-atoms-value-fix'
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.

* rs/ref-fitler-used-atoms-value-fix:
  ref-filter: remove ref_format_clear()
  ref-filter: move is-base tip to used_atom
  ref-filter: move ahead-behind bases into used_atom
2025-01-29 14:05:09 -08:00
Junio C Hamano de56e1d746 Merge branch 'ja/doc-commit-markup-updates'
Doc updates.

* ja/doc-commit-markup-updates:
  doc: migrate git-commit manpage secondary files to new format
  doc: convert git commit config to new format
  doc: make more direct explanations in git commit options
  doc: the mode param of -u of git commit is optional
  doc: apply new documentation guidelines to git commit
2025-01-29 14:05:09 -08:00
Junio C Hamano f046ab2dd4 Merge branch 'ds/path-walk-1'
Introduce a new API to visit objects in batches based on a common
path, or by type.

* ds/path-walk-1:
  path-walk: drop redundant parse_tree() call
  path-walk: reorder object visits
  path-walk: mark trees and blobs as UNINTERESTING
  path-walk: visit tags and cached objects
  path-walk: allow consumer to specify object types
  t6601: add helper for testing path-walk API
  test-lib-functions: add test_cmp_sorted
  path-walk: introduce an object walk by path
2025-01-29 14:05:09 -08:00
Junio C Hamano da898a5c64 The fourth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28 13:02:25 -08:00
Junio C Hamano b09b10ad26 Merge branch 'jp/t8002-printf-fix'
Test fix.

* jp/t8002-printf-fix:
  t8002: fix ambiguous printf conversion specifications
2025-01-28 13:02:24 -08:00
Junio C Hamano a17fd7dd3a Merge branch 'ps/reftable-sign-compare'
The reftable/ library code has been made -Wsign-compare clean.

* ps/reftable-sign-compare:
  reftable: address trivial -Wsign-compare warnings
  reftable/blocksource: adjust `read_block()` to return `ssize_t`
  reftable/blocksource: adjust type of the block length
  reftable/block: adjust type of the restart length
  reftable/block: adapt header and footer size to return a `size_t`
  reftable/basics: adjust `hash_size()` to return `uint32_t`
  reftable/basics: adjust `common_prefix_size()` to return `size_t`
  reftable/record: handle overflows when decoding varints
  reftable/record: drop unused `print` function pointer
  meson: stop disabling -Wsign-compare
2025-01-28 13:02:24 -08:00
Junio C Hamano 73e055d71e Merge branch 'mh/credential-cache-authtype-request-fix'
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.

* mh/credential-cache-authtype-request-fix:
  credential-cache: respect authtype capability
2025-01-28 13:02:24 -08:00
Junio C Hamano f8b9821f7d Merge branch 'jk/pack-header-parse-alignment-fix'
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.

* jk/pack-header-parse-alignment-fix:
  index-pack, unpack-objects: use skip_prefix to avoid magic number
  index-pack, unpack-objects: use get_be32() for reading pack header
  parse_pack_header_option(): avoid unaligned memory writes
  packfile: factor out --pack_header argument parsing
  bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-28 13:02:23 -08:00
Junio C Hamano 3ddeb7f337 Merge branch 'ps/build-meson-subtree'
The meson-driven build is now aware of "git-subtree" housed in the
contrib/subtree hierarchy.

* ps/build-meson-subtree:
  meson: wire up the git-subtree(1) command
  meson: introduce build option for contrib
  contrib/subtree: fix building docs
2025-01-28 13:02:23 -08:00
Junio C Hamano 63d555a2dc Merge branch 'mh/connect-sign-compare'
The code in connect.c has been updated to work around complaints
from -Wsign-compare.

* mh/connect-sign-compare:
  connect: address -Wsign-compare warnings
2025-01-28 13:02:23 -08:00
Junio C Hamano 8d335468ec Merge branch 'sk/unit-tests'
Move a few more unit tests to the clar test framework.

* sk/unit-tests:
  t/unit-tests: convert reftable tree test to use clar test framework
  t/unit-tests: adapt priority queue test to use clar test framework
  t/unit-tests: convert mem-pool test to use clar test framework
  t/unit-tests: handle dashes in test suite filenames
2025-01-28 13:02:22 -08:00
Junio C Hamano f0a371a39d Merge branch 'jc/show-usage-help'
The help text from "git $cmd -h" appears on the standard output for
some $cmd and on the standard error for others.  The built-in commands
have been fixed to show it on the standard output consistently.

* jc/show-usage-help:
  builtin: send usage() help text to standard output
  oddballs: send usage() help text to standard output
  builtins: send usage_with_options() help text to standard output
  usage: add show_usage_if_asked()
  parse-options: add show_usage_with_options_if_asked()
  t0012: optionally check that "-h" output goes to stdout
2025-01-28 13:02:22 -08:00
Karthik Nayak f11f0a5a2d refs/reftable: fix uninitialized memory access of `max_index`
When migrating reflogs between reference backends, maintaining the
original order of the reflog entries is crucial. To achieve this, an
`index` field is stored within the `ref_update` struct that encodes the
relative order of reflog entries. This field is used by the reftable
backend as the update index for the respective reflog entries to maintain
that ordering.

These update indices must be respected when writing table headers, which
encode the minimum and maximum update index of contained records in the
header and footer. This logic was added in commit bc67b4ab5f (reftable:
write correct max_update_index to header, 2025-01-15), which started to
use `reftable_writer_set_limits()` to propagate the minimum and maximum
update index of all records contained in a ref transaction.

However, we only set the maximum update index for the first transaction
argument, even though there can be multiple such arguments. This is the
case when we write to multiple stacks in a single transaction, e.g. when
updating references in two different worktrees at once. Consequently,
the update index for all but the first argument remain uninitialized,
which may cause undefined behaviour.

Fix this by moving the assignment of the maximum update index in
`reftable_be_transaction_finish()` inside the loop, which ensures that
all elements of the array are correctly initialized.

Furthermore, initialize the `max_index` field to 0 when queueing a new
transaction argument. This is not strictly necessary, as all elements of
`write_transaction_table_arg.max_index` are now assigned correctly.
However, this initialization is added for consistency and to safeguard
against potential future changes that might inadvertently introduce
uninitialized memory access.
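
Schematically, the fix amounts to the following (names abbreviated;
this is an outline, not the actual diff):

    /* before: only the first per-stack argument was initialized */
    tx_data->args[0].max_index = max_update_index;

    /* after: every per-stack argument carries its own maximum */
    for (i = 0; i < tx_data->args_nr; i++)
        tx_data->args[i].max_index = max_update_index;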

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-27 08:21:41 -08:00
Patrick Steinhardt 8ccc75c245 remote: announce removal of "branches/" and "remotes/"
Back when Git was in its infancy in 2005, remotes were configured via
separate files in "branches/". This mechanism was replaced later that
year by the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.

Both of these directories have been marked as deprecated, one in 2005
and the other in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-24 08:08:56 -08:00
Junio C Hamano 5f8f7081f7 The third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 15:07:03 -08:00
Junio C Hamano 39ba2e8e56 Merge branch 'jc/cli-doc-option-and-config'
Doc update.

* jc/cli-doc-option-and-config:
  gitcli: document that command line trumps config and env
2025-01-23 15:07:02 -08:00
Junio C Hamano 6ecb4fc149 Merge branch 'mh/doc-credential-helpers-with-pat'
Document that it is insecure to use Personal Access Tokens, which
some hosting providers take as username/password, embedded in URLs.

* mh/doc-credential-helpers-with-pat:
  docs: discuss caching personal access tokens
  docs: list popular credential helpers
2025-01-23 15:07:02 -08:00
Junio C Hamano 294673a17e Merge branch 'ak/instaweb-python-port-binding-fix'
The "instaweb" bound only to local IP address without "--local" and
to all addresses with "--local", which was the other way around, when
using Python's http.server class, which has been corrected.

* ak/instaweb-python-port-binding-fix:
  instaweb: fix ip binding for the python http.server
2025-01-23 15:07:02 -08:00
Junio C Hamano aa31820d9d Merge branch 'sj/meson-doc-technical-dependency-fix'
The meson build procedure for the Documentation/technical/ hierarchy was
missing necessary dependencies, which has been corrected.

* sj/meson-doc-technical-dependency-fix:
  meson: fix missing deps for technical articles
2025-01-23 15:07:02 -08:00
Junio C Hamano d8093fd6c1 Merge branch 'tc/meson-use-our-version-def-h'
The meson build procedure looked for the 'version-def.h' file in the
wrong directory, which has been corrected.

* tc/meson-use-our-version-def-h:
  meson: ensure correct version-def.h is used
2025-01-23 15:07:01 -08:00
Junio C Hamano 7e3cb2e515 Merge branch 'en/object-name-with-funny-refname-fix'
The extended SHA-1 expression parser did not work well when a branch
with an unusual name (e.g. "foo{bar") was involved.

* en/object-name-with-funny-refname-fix:
  object-name: be more strict in parsing describe-like output
  object-name: fix resolution of object names containing curly braces
2025-01-23 15:07:01 -08:00
Junio C Hamano 0cb454c072 Merge branch 'ds/path-walk-1' into ds/backfill
* ds/path-walk-1:
  path-walk: drop redundant parse_tree() call
  path-walk: reorder object visits
  path-walk: mark trees and blobs as UNINTERESTING
  path-walk: visit tags and cached objects
  path-walk: allow consumer to specify object types
  t6601: add helper for testing path-walk API
  test-lib-functions: add test_cmp_sorted
  path-walk: introduce an object walk by path
2025-01-23 12:00:40 -08:00
Taylor Blau 04292c3796 hash.h: drop unsafe_ function variants
Now that all callers have been converted from:

    the_hash_algo->unsafe_init_fn();

to

    unsafe_hash_algo(the_hash_algo)->init_fn();

and similar, we can remove the scaffolding for the unsafe_ function
variants and force callers to use the new unsafe_hash_algo() mechanic
instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
Taylor Blau a8dd3821fe csum-file: introduce hashfile_checkpoint_init()
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as the one used by the hashfile it checkpoints.

While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.

Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.
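
The new function is roughly the following (a sketch; the actual
declaration lives in csum-file.h):

    void hashfile_checkpoint_init(struct hashfile *f,
                                  struct hashfile_checkpoint *checkpoint)
    {
        memset(checkpoint, 0, sizeof(*checkpoint));
        /* reuse whichever implementation the hashfile itself uses */
        f->algop->init_fn(&checkpoint->ctx);
    }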

A few things worth noting:

  - In the change to builtin/fast-import.c::stream_blob(), we can see
    that by removing the explicit reference to
    'the_hash_algo->unsafe_init_fn()', we are hardened against the
    hashfile API changing away from the_hash_algo (or its unsafe
    variant) in the future.

  - The bulk-checkin code no longer needs to explicitly zero-initialize
    the hashfile_checkpoint, since it is now done as a result of calling
    'hashfile_checkpoint_init()'.

  - Also in the bulk-checkin code, we add an additional call to
    prepare_to_stream() outside of the main loop in order to initialize
    'state->f' so we know which hash function implementation to use when
    calling 'hashfile_checkpoint_init()'.

    This is OK, since subsequent 'prepare_to_stream()' calls are noops.
    However, we only need to call 'prepare_to_stream()' when we have the
    HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
    'prepare_to_stream()' does not assign 'state->f', so we have nothing
    to initialize.

  - Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
    appropriately guarded.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
Taylor Blau 3339180b28 t/helper/test-hash.c: use unsafe_hash_algo()
Remove a series of conditionals within the shared cmd_hash_impl() helper
that powers the 'sha1' and 'sha1-unsafe' helpers.

Instead, replace them with a single conditional that transforms the
specified hash algorithm into its unsafe variant. Then all subsequent
calls can directly use whatever function it wants to call without having
to decide between the safe and unsafe variants.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
Taylor Blau f0c266af4e csum-file.c: use unsafe_hash_algo()
Instead of calling the unsafe_ hash function variants directly, make use
of the shared 'algop' pointer by initializing it to:

    f->algop = unsafe_hash_algo(the_hash_algo);

, thus making all calls use the unsafe variants without naming them
explicitly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:17 -08:00
Taylor Blau 7b081d2f70 hash.h: introduce `unsafe_hash_algo()`
In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants,
2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing
functions by adding new functions like "unsafe_init_fn()" and so
on.

This approach has a major shortcoming that callers must remember to
consistently use one variant or the other. Failing to consistently use
(or not use) the unsafe variants can lead to crashes at best, or subtle
memory corruption issues at worst.

In the hashfile API, this isn't difficult to achieve, but verifying that
all callers consistently use the unsafe variants is somewhat of a chore
given how spread out all of the callers are. In the sha1 and sha1-unsafe
test helpers, all of the calls to various hash functions are guarded by
an "if (unsafe)" conditional, which is repetitive and cumbersome.

Address these issues by introducing a new pattern whereby one
'git_hash_algo' can return a pointer to another 'git_hash_algo' that
represents the unsafe version of itself. So instead of having something
like:

    if (unsafe) {
      the_hash_algo->unsafe_init_fn(...);
      the_hash_algo->unsafe_update_fn(...);
      the_hash_algo->unsafe_final_fn(...);
    } else {
      the_hash_algo->init_fn(...);
      the_hash_algo->update_fn(...);
      the_hash_algo->final_fn(...);
    }

we can instead write:

    struct git_hash_algo *algop = the_hash_algo;
    if (unsafe)
      algop = unsafe_hash_algo(algop);

    algop->init_fn(...);
    algop->update_fn(...);
    algop->final_fn(...);

This removes the existing shortcoming: the caller no longer has to
"remember" which variant of the hash functions it wants to call, and
instead only holds onto a 'struct git_hash_algo' pointer that is
initialized once.

Similarly, while there currently is still a way to "mix" safe and unsafe
functions, this too will go away after subsequent commits remove all
direct calls to the unsafe_ variants.

Note that hash_algo_by_ptr() needs an adjustment to allow passing in the
unsafe variant of a hash function. All other query functions on the
hash_algos array will continue to return the safe variants of any
function.
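
With that in place, the accessor itself can be as simple as the
following sketch, assuming 'struct git_hash_algo' gains an 'unsafe'
pointer to its faster counterpart:

    const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop)
    {
        /* If we have a faster "unsafe" implementation, use that. */
        if (algop->unsafe)
            return algop->unsafe;
        /* Otherwise fall back to the algorithm itself. */
        return algop;
    }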

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:16 -08:00
Taylor Blau 5fcc683338 csum-file.c: extract algop from hashfile_checksum_valid()
Perform a transformation similar to the one in the previous commit, focused
instead on hashfile_checksum_valid(). This function does not work with a
hashfile structure itself, and instead validates the raw contents of a
file written using the hashfile API.

We'll want to be ready for a similar change to this function in the
future, so prepare for it by extracting 'the_hash_algo' into its own
variable for use within this function.
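
After the extraction, the function reads roughly as follows (a sketch;
the trailing-checksum comparison is simplified to memcmp()):

    int hashfile_checksum_valid(const unsigned char *data, size_t total_len)
    {
        unsigned char got[GIT_MAX_RAWSZ];
        git_hash_ctx ctx;
        const struct git_hash_algo *algop = the_hash_algo;
        size_t data_len;

        if (total_len < algop->rawsz)
            return 0;
        data_len = total_len - algop->rawsz;

        /* hash everything but the trailing checksum ... */
        algop->init_fn(&ctx);
        algop->update_fn(&ctx, data, data_len);
        algop->final_fn(got, &ctx);

        /* ... and compare against the stored trailer */
        return !memcmp(got, data + data_len, algop->rawsz);
    }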

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-23 10:28:16 -08:00