git/builtin
Derrick Stolee 70664d2865 pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.
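
To illustrate what "batches representing a common path" means, here is a
minimal sketch in C. It is not the path-walk API itself: `struct path_entry`,
`for_each_path_batch()` and the callback type are hypothetical names invented
for this example, and sorting an in-memory array stands in for the real walk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one object seen at one path (not git's own struct). */
struct path_entry {
	const char *path; /* full path at which the object appears */
	const char *oid;  /* abbreviated object id, for illustration only */
};

/* Hypothetical callback invoked once per batch of same-path entries. */
typedef void (*batch_fn)(const struct path_entry *batch, size_t nr, void *data);

/* qsort comparator: order entries by their full path. */
static int cmp_by_path(const void *a, const void *b)
{
	const struct path_entry *x = a, *y = b;
	return strcmp(x->path, y->path);
}

/*
 * Sort the entries by path, then hand each run of identical paths to
 * the callback as one batch. Returns the number of batches found.
 */
static size_t for_each_path_batch(struct path_entry *entries, size_t nr,
				  batch_fn fn, void *data)
{
	size_t start = 0, batches = 0;

	assert(entries != NULL || nr == 0);
	if (nr)
		qsort(entries, nr, sizeof(*entries), cmp_by_path);

	while (start < nr) {
		size_t end = start + 1;

		/* Extend the batch while the full path stays the same. */
		while (end < nr &&
		       !strcmp(entries[end].path, entries[start].path))
			end++;

		if (fn)
			fn(entries + start, end - start, data);
		batches++;
		start = end;
	}
	return batches;
}
```

Every entry in a batch shares the exact same path, so delta candidates can be
compared within the batch right away, without first sorting the full object
list by name-hash.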

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons. First, this will cause
the "Enumerating objects" phase to be much longer than usual. Second, it
does not take advantage of threading during the path-scoped delta
calculations. Even with this lack of threading, the path-walk option is
sometimes faster than the usual approach. Future changes will refactor
this code to allow for threading, but that complexity is deferred until
later to keep this patch as simple as possible.

This new walk is incompatible with some features and is ignored by
others:

 * Object filters, such as sparse-checkout or tree depth, are not
   currently integrated with the path-walk API. A blobless packfile
   could be integrated easily, but that is deferred for later.

 * Server-focused features such as delta islands, shallow packs, and
   using a bitmap index are incompatible with the path-walk API.

 * The path-walk API is only compatible with the --revs option, not
   with object lists or pack lists taken over stdin. These alternative
   ways to specify the objects currently ignore the --path-walk option
   without even a warning.

Future changes will create performance tests that demonstrate the power
of this approach.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:15:38 -07:00
add.c global: trivial conversions to fix `-Wsign-compare` warnings 2024-12-06 20:20:04 +09:00
am.c path: drop `git_pathdup()` in favor of `repo_git_path()` 2025-02-07 09:59:22 -08:00
annotate.c Merge branch 'jc/a-commands-without-the-repo' 2024-10-25 14:02:36 -04:00
apply.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
archive.c archive: remove the_repository global variable 2024-10-11 09:37:18 -07:00
backfill.c Merge branch 'ds/backfill' 2025-02-18 15:30:31 -08:00
bisect.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
blame.c Merge branch 'ps/the-repository' 2025-01-21 08:44:54 -08:00
branch.c Merge branch 'rs/ref-fitler-used-atoms-value-fix' 2025-01-29 14:05:09 -08:00
bugreport.c Merge branch 'ua/os-version-capability' 2025-02-27 15:23:00 -08:00
bundle.c Merge branch 'jt/bundle-fsck' 2024-12-13 07:33:36 -08:00
cat-file.c Merge branch 'ps/build-sign-compare' 2024-12-23 09:32:11 -08:00
check-attr.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
check-ignore.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
check-mailmap.c mailmap: fix check-mailmap with full mailmap line 2025-02-21 18:27:16 -08:00
check-ref-format.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
checkout--worker.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
checkout-index.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
checkout.c Merge branch 'ps/build-sign-compare' 2024-12-23 09:32:11 -08:00
clean.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
clone.c Merge branch 'ps/path-sans-the-repository' 2025-03-05 10:37:43 -08:00
column.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
commit-graph.c progress: stop using `the_repository` 2024-12-18 10:44:30 -08:00
commit-tree.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
commit.c path: drop `git_path()` in favor of `repo_git_path()` 2025-02-28 13:54:11 -08:00
config.c path: drop `git_pathdup()` in favor of `repo_git_path()` 2025-02-07 09:59:22 -08:00
count-objects.c packfile: pass down repository to `has_object[_kept]_pack` 2024-12-04 08:21:54 +09:00
credential-cache--daemon.c Merge branch 'mh/credential-cache-authtype-request-fix' 2025-01-28 13:02:24 -08:00
credential-cache.c Merge branch 'rj/cygwin-exit' 2024-11-01 12:53:19 -04:00
credential-store.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
credential.c Merge branch 'jc/show-usage-help' 2025-01-28 13:02:22 -08:00
describe.c Merge branch 'ps/build-sign-compare' 2024-12-23 09:32:11 -08:00
diagnose.c diagnose: stop using `the_repository` 2024-12-18 10:44:31 -08:00
diff-files.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
diff-index.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
diff-tree.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
diff.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
difftool.c difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE 2025-02-06 13:00:21 -08:00
fast-export.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
fast-import.c Merge branch 'ps/path-sans-the-repository' 2025-03-05 10:37:43 -08:00
fetch-pack.c oddballs: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
fetch.c Merge branch 'tb/fetch-follow-tags-fix' 2025-03-10 08:45:58 -07:00
fmt-merge-msg.c Merge branch 'jc/pass-repo-to-builtins' 2024-09-23 10:35:09 -07:00
for-each-ref.c ref-filter: remove ref_format_clear() 2025-01-21 09:06:24 -08:00
for-each-repo.c global: trivial conversions to fix `-Wsign-compare` warnings 2024-12-06 20:20:04 +09:00
fsck.c worktree: return allocated string from `get_worktree_git_dir()` 2025-02-07 09:59:23 -08:00
fsmonitor--daemon.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
gc.c Merge branch 'ps/path-sans-the-repository' 2025-03-05 10:37:43 -08:00
get-tar-commit-id.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
grep.c Revert barrier-based LSan threading race workaround 2025-01-01 14:13:01 -08:00
hash-object.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
help.c pager: stop using `the_repository` 2024-12-18 10:44:30 -08:00
hook.c builtin: pass repository to sub commands 2024-11-26 10:36:08 +09:00
index-pack.c Merge branch 'ps/hash-cleanup' 2025-02-10 10:18:31 -08:00
init-db.c environment: move access to "core.sharedRepository" into repo settings 2025-02-28 13:54:11 -08:00
interpret-trailers.c trailer: spread usage of "trailer_block" language 2024-10-14 12:33:02 -04:00
log.c environment: move access to "core.sharedRepository" into repo settings 2025-02-28 13:54:11 -08:00
ls-files.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
ls-remote.c builtin/ls-remote: plug leaking server options 2024-11-04 22:37:51 -08:00
ls-tree.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
mailinfo.c mailinfo: stop using `the_repository` 2024-12-18 10:44:31 -08:00
mailsplit.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-base.c commit-reach: use `size_t` to track indices when computing merge bases 2024-12-27 08:12:40 -08:00
merge-file.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
merge-index.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-ours.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-recursive.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-tree.c merge-tree: only use basic merge config 2025-02-18 09:52:39 -08:00
merge.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
mktag.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
mktree.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
multi-pack-index.c midx-write: pass down repository to `write_midx_file[_only]` 2024-12-04 10:32:20 +09:00
mv.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
name-rev.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
notes.c path: drop `git_path()` in favor of `repo_git_path()` 2025-02-28 13:54:11 -08:00
pack-objects.c pack-objects: add --path-walk option 2025-05-16 12:15:38 -07:00
pack-redundant.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
pack-refs.c diff.h: fix index used to loop through unsigned integer 2024-12-06 20:20:03 +09:00
patch-id.c global: adapt callers to use generic hash context helpers 2025-01-31 10:06:11 -08:00
prune-packed.c builtin: remove USE_THE_REPOSITORY for those without the_repository 2024-09-13 14:33:30 -07:00
prune.c progress: stop using `the_repository` 2024-12-18 10:44:30 -08:00
pull.c global: trivial conversions to fix `-Wsign-compare` warnings 2024-12-06 20:20:04 +09:00
push.c remote: rename query_refspecs functions 2025-02-04 09:51:41 -08:00
range-diff.c Merge branch 'js/range-diff-diff-merges' 2024-12-23 09:32:17 -08:00
read-tree.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
rebase.c path: drop `git_path()` in favor of `repo_git_path()` 2025-02-28 13:54:11 -08:00
receive-pack.c Merge branch 'ps/path-sans-the-repository' 2025-03-05 10:37:43 -08:00
reflog.c diff.h: fix index used to loop through unsigned integer 2024-12-06 20:20:03 +09:00
refs.c refs: show --no-reflog in the help text 2025-03-03 14:51:29 -08:00
remote-ext.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
remote-fd.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
remote.c Merge branch 'ps/path-sans-the-repository' 2025-03-05 10:37:43 -08:00
repack.c Merge branch 'ps/repack-keep-unreachable-in-unpacked-repo' 2025-02-12 10:08:52 -08:00
replace.c path: drop `git_pathdup()` in favor of `repo_git_path()` 2025-02-07 09:59:22 -08:00
replay.c parse-options: introduce die_for_incompatible_opt2() 2025-02-06 12:23:54 -08:00
rerere.c rerere: let `rerere_path()` write paths into a caller-provided buffer 2025-02-28 13:54:11 -08:00
reset.c diff.h: fix index used to loop through unsigned integer 2024-12-06 20:20:03 +09:00
rev-list.c rev-list: extend print-info to print missing object type 2025-02-05 09:32:01 -08:00
rev-parse.c path: drop `git_path()` in favor of `repo_git_path()` 2025-02-28 13:54:11 -08:00
revert.c diff.h: fix index used to loop through unsigned integer 2024-12-06 20:20:03 +09:00
rm.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
send-pack.c send-pack: stop using `the_repository` 2024-12-18 10:44:30 -08:00
shortlog.c diff.h: fix index used to loop through unsigned integer 2024-12-06 20:20:03 +09:00
show-branch.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
show-index.c Merge branch 'jc/show-index-h-update' 2025-01-31 09:44:16 -08:00
show-ref.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
sparse-checkout.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
stash.c global: trivial conversions to fix `-Wsign-compare` warnings 2024-12-06 20:20:04 +09:00
stripspace.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
submodule--helper.c path: refactor `repo_submodule_path()` family of functions 2025-02-07 09:59:22 -08:00
symbolic-ref.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
tag.c path: drop `git_pathdup()` in favor of `repo_git_path()` 2025-02-07 09:59:22 -08:00
unpack-file.c oddballs: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
unpack-objects.c global: adapt callers to use generic hash context helpers 2025-01-31 10:06:11 -08:00
update-index.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
update-ref.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
update-server-info.c builtin/update-server-info: remove the_repository global variable 2025-02-10 16:20:21 -08:00
upload-archive.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
upload-pack.c serve: stop using `the_repository` 2024-12-18 10:44:30 -08:00
var.c Merge branch 'jc/show-usage-help' 2025-01-28 13:02:22 -08:00
verify-commit.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
verify-pack.c builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h 2024-09-13 14:32:24 -07:00
verify-tag.c ref-filter: remove ref_format_clear() 2025-01-21 09:06:24 -08:00
worktree.c path: drop `git_path()` in favor of `repo_git_path()` 2025-02-28 13:54:11 -08:00
write-tree.c Merge branch 'jc/pass-repo-to-builtins' 2024-09-23 10:35:09 -07:00