git/builtin
Derrick Stolee 2dc858e69e pack-objects: support sparse:oid filter with path-walk
The --filter=sparse:<oid> option to 'git pack-objects' allows focusing
an object set to a sparse-checkout definition. This reduces the set of
matching blobs while retaining all reachable trees. No server currently
supports fetching with this filter because it is expensive to compute
and reachability bitmaps do not help without a significant effort to
extend the bitmap feature to store bitmaps for each supported sparse-
checkout definition.

Without focusing on serving fetches and clones with these filters, there
are still benefits that could be realized by making this faster. With
the sparse index, it's more realistic now than ever to be able to
operate a local clone that was bootstrapped by a packfile created with
a sparse filter, because the missing trees are not needed to move a
sparse-checkout from one commit to another or to view the history of any
path in scope. Such clones could perhaps be bootstrapped by partial
bundles.

Until now, constructing these sparse packs has been computationally
very inefficient. The revision walk that explores which
objects are in scope spends a lot of time checking each object to see if
it matches the sparse-checkout patterns, causing quadratic behavior
(number of objects times number of sparse-checkout patterns). This
improves somewhat when using cone-mode sparse-checkout patterns that can
use hashtables and prefix matches to determine containment. However, the
check per object is still too expensive for most cases.
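
To illustrate why cone-mode checks are cheaper than general pattern
matching, here is a minimal Python sketch. It is not Git's C
implementation. The names build_cone() and in_cone() are hypothetical,
and it assumes cone-mode semantics in which listed directories are
included recursively while their ancestors (and the root) contribute
only their immediate files.

```python
import posixpath

def build_cone(recursive_dirs):
    """Return (recursive, parents) sets for a list of directory paths."""
    recursive = {d.strip("/") for d in recursive_dirs}
    parents = {""}  # assumption: files directly in the root are always in scope
    for d in recursive:
        p = posixpath.dirname(d)
        while p:
            parents.add(p)
            p = posixpath.dirname(p)
    return recursive, parents

def in_cone(path, recursive, parents):
    """Decide containment for a file path using only set lookups."""
    d = posixpath.dirname(path)
    if d in parents:
        return True  # immediate file of the root or of an ancestor directory
    while d:
        if d in recursive:
            return True  # somewhere beneath a recursively included directory
        d = posixpath.dirname(d)
    return False
```

Each lookup costs time proportional to the path depth rather than to
the number of patterns, but it is still paid once per object when
driven by the revision walk.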

This is where the path-walk feature comes in. We can proceed as normal
by placing objects in bins by path and _then_ check a group of objects
all at once. Since sparse:<oid> only restricts blobs, the path-walk must
include all reachable trees while using the cone-mode patterns to skip
blobs at paths outside the sparse scope. This establishes a baseline for
a potential future "treesparse:<oid>" filter that would also restrict
trees, but introducing such a new filter is deferred to a later change.
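
The binning idea can be sketched in Python as follows. This is an
illustration only, not the path-walk API: filter_by_path() and the
blob_path_in_scope callback are hypothetical names, and objects are
modeled as plain (path, kind, oid) tuples.

```python
from collections import defaultdict

def filter_by_path(objects, blob_path_in_scope):
    """Bin objects by path, then test each blob path once.

    objects: iterable of (path, kind, oid), kind is "tree" or "blob".
    Returns (kept_oids, number_of_pattern_checks).
    """
    bins = defaultdict(list)
    for path, kind, oid in objects:
        bins[(path, kind)].append(oid)

    kept = []
    checks = 0
    for (path, kind), oids in bins.items():
        if kind == "blob":
            checks += 1  # one check covers every blob version at this path
            if not blob_path_in_scope(path):
                continue
        # Trees are always kept, matching the blob-only sparse filter.
        kept.extend(oids)
    return kept, checks
```

With many historical versions of a file at the same path, the number of
pattern checks tracks the number of distinct paths instead of the
number of objects.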

The implementation here focuses on loading the sparse-checkout
patterns from the provided object ID and checking that the patterns are
indeed cone-mode patterns. We can then load the correct pattern list
into the path walk context and use the logic that already exists from
bff4555767 (backfill: add --sparse option, 2025-02-03), though that
feature loads sparse-checkout patterns from the worktree's local
settings and also restricts tree objects. We use a combination of errors
and warnings to signal problems during this load. Errors are likely
fatal for the non-path-walk version too, while warnings are probably
just implementation details of the path-walk version, so
'git pack-objects' can fall back to the revision walk version.
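
The warn-and-fall-back decision can be sketched in Python as below.
This is a rough illustration with hypothetical names
(looks_like_cone_pattern(), choose_walk()). The cone-mode test is a
deliberate simplification that assumes patterns of the forms "/*",
"!/*/", "/dir/" and "!/dir/*/" with no escaped glob characters; Git's
real validation is more involved.

```python
import warnings

GLOB_CHARS = set("*?[\\")

def looks_like_cone_pattern(line):
    """Simplified check for the pattern shapes written in cone mode."""
    pat = line[1:] if line.startswith("!") else line
    if pat in ("/*", "/*/"):
        return True
    if not (pat.startswith("/") and pat.endswith("/")):
        return False
    body = pat[1:-1]
    if body.endswith("/*"):
        body = body[:-2]
    return bool(body) and not (GLOB_CHARS & set(body))

def choose_walk(pattern_lines):
    """Return "path-walk" for cone-mode patterns, else warn and fall back."""
    patterns = [l for l in pattern_lines if l and not l.startswith("#")]
    if all(looks_like_cone_pattern(p) for p in patterns):
        return "path-walk"
    warnings.warn("patterns are not cone-mode; using the revision walk")
    return "revision-walk"
```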

Now that the SEEN flag is deferred until after pattern checks (from the
previous commit), handle the case where a tree with a shared OID appears
at both an out-of-cone and in-cone path. When trees are not being pruned
(pl_sparse_trees == 0), the path-walk re-walks the tree at the in-cone
path so that in-cone blobs within it are discovered. The new tests in
t5317 and t6601 demonstrate this behavior and would fail without these
changes.
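
The shared-OID case can be reproduced with a toy model in Python. This
is a sketch under simplifying assumptions, not the path-walk code:
collect_in_cone_blobs() is a hypothetical name, trees are a plain dict
from OID to entries, and dir_in_cone() is a caller-supplied predicate
on directory paths (root-level files are ignored for brevity).

```python
from collections import deque

def collect_in_cone_blobs(trees, root_oid, dir_in_cone, defer_seen):
    """Walk trees breadth-first and collect blobs under in-cone directories.

    trees: dict mapping tree OID -> list of (name, kind, oid).
    defer_seen: if True, a tree is only marked seen once it has been
    visited at an in-cone path, so it can be re-walked later.
    """
    seen_trees = set()
    kept_blobs = set()
    queue = deque([("", root_oid)])
    while queue:
        dir_path, tree_oid = queue.popleft()
        if tree_oid in seen_trees:
            continue
        inside = dir_in_cone(dir_path)
        if inside or not defer_seen:
            seen_trees.add(tree_oid)
        for name, kind, oid in trees[tree_oid]:
            child = f"{dir_path}/{name}" if dir_path else name
            if kind == "tree":
                queue.append((child, oid))
            elif inside:
                kept_blobs.add(oid)
    return kept_blobs
```

When the same tree OID appears first at an out-of-cone path and then at
an in-cone path, marking it seen on the first visit hides the in-cone
blobs, while deferring the mark lets the second visit find them.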

The performance test p5315 shows the impact of this change when using
sparse filters:

Test                                              HEAD~1      HEAD   Change
---------------------------------------------------------------------------
5315.10: repack (sparse:oid)                       77.98     77.47    -0.7%
5315.11: repack size (sparse:oid)                 187.5M    187.4M    -0.0%
5315.12: repack (sparse:oid, --path-walk)          77.91     31.41   -59.7%
5315.13: repack size (sparse:oid, --path-walk)    187.5M    161.1M   -14.1%

These performance tests were run on the Git repository. The --path-walk
feature shows meaningful space savings (14% smaller for sparse packs)
and dramatic time savings (60% faster) by leveraging the path-walk's
ability to skip blobs outside the sparse scope.
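
As a quick check of the rounded figures, the relative changes for the
--path-walk rows follow from the table values. The helper name
percent_change() is only for illustration.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent, to one decimal."""
    return round((after - before) / before * 100, 1)

time_change = percent_change(77.91, 31.41)   # repack time, --path-walk
size_change = percent_change(187.5, 161.1)   # repack size in MB, --path-walk
print(time_change, size_change)
```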

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
..
add.c Merge branch 'ps/history-split' 2026-03-24 12:31:32 -07:00
am.c Merge branch 'vp/http-rate-limit-retries' 2026-04-01 10:28:18 -07:00
annotate.c
apply.c builtin: use default hash when outside a repository 2025-07-01 14:58:24 -07:00
archive.c
backfill.c path-walk: add pl_sparse_trees to control tree pruning 2026-05-24 18:41:06 +09:00
bisect.c refs: replace `refs_for_each_glob_ref_in()` 2026-02-23 13:21:19 -08:00
blame.c mailmap: stop using the_repository 2026-02-20 08:13:58 -08:00
branch.c object-name: turn INTERPRET_BRANCH_* constants into enum values 2026-03-18 12:52:29 -07:00
bugreport.c object-file: move `safe_create_leading_directories()` into "path.c" 2025-04-15 08:24:35 -07:00
bundle.c
cat-file.c odb: rename `odb_has_object()` flags 2026-03-31 20:43:14 -07:00
check-attr.c config: drop `git_config()` wrapper 2025-07-23 08:15:18 -07:00
check-ignore.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
check-mailmap.c mailmap: stop using the_repository 2026-02-20 08:13:58 -08:00
check-ref-format.c
checkout--worker.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
checkout-index.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
checkout.c Merge branch 'ps/history-split' 2026-03-24 12:31:32 -07:00
clean.c Merge branch 'jk/color-variable-fixes' 2025-09-29 11:40:35 -07:00
clone.c Merge branch 'ob/core-attributesfile-in-repository' 2026-03-05 10:04:49 -08:00
column.c config: drop `git_config()` wrapper 2025-07-23 08:15:18 -07:00
commit-graph.c commit-graph: add new config for changed-paths & recommend it in scalar 2025-10-22 10:40:11 -07:00
commit-tree.c commit: rename `free_commit_list()` to conform to coding guidelines 2026-01-15 05:32:31 -08:00
commit.c Merge branch 'ps/history-split' 2026-03-24 12:31:32 -07:00
config.c config: store allocated string in non-const pointer 2026-03-26 12:47:17 -07:00
count-objects.c packfile: introduce macro to iterate through packs 2025-10-16 14:42:39 -07:00
credential-cache--daemon.c config: drop `git_config_get_bool()` wrapper 2025-07-23 08:15:20 -07:00
credential-cache.c
credential-store.c builtin/credential-store: move is_rfc3986_unreserved to url.[ch] 2026-01-12 11:56:56 -08:00
credential.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
describe.c refs: replace `refs_for_each_rawref()` 2026-02-23 13:21:18 -08:00
diagnose.c object-file: move `safe_create_leading_directories()` into "path.c" 2025-04-15 08:24:35 -07:00
diff-files.c config: drop `git_config()` wrapper 2025-07-23 08:15:18 -07:00
diff-index.c config: drop `git_config()` wrapper 2025-07-23 08:15:18 -07:00
diff-pairs.c
diff-tree.c Merge branch 'ps/commit-list-functions-renamed' 2026-02-13 13:39:25 -08:00
diff.c diff: --no-index should ignore the worktree 2025-08-09 17:22:01 -07:00
difftool.c odb: rename `repo_read_object_file()` 2025-07-01 14:46:38 -07:00
fast-export.c fast-import: add 'abort-if-invalid' mode to '--signed-commits=<mode>' 2026-03-26 12:42:57 -07:00
fast-import.c Merge branch 'jt/fast-import-signed-modes' 2026-04-07 14:59:27 -07:00
fetch-pack.c builtin/fetch-pack: cleanup before return error 2025-06-04 08:52:25 -07:00
fetch.c odb: rename `odb_has_object()` flags 2026-03-31 20:43:14 -07:00
fmt-merge-msg.c builtin/fmt-merge-msg: stop depending on 'the_repository' 2025-08-11 09:19:40 -07:00
for-each-ref.c Merge branch 'ms/refs-list' 2025-08-22 13:13:20 -07:00
for-each-repo.c for-each-repo: simplify passing of parameters 2026-03-03 10:20:00 -08:00
fsck.c Merge branch 'ps/odb-cleanup' 2026-04-08 10:19:17 -07:00
fsmonitor--daemon.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
gc.c Merge branch 'ps/object-counting' 2026-03-25 12:58:05 -07:00
get-tar-commit-id.c
grep.c Merge branch 'ps/odb-sources' 2026-03-12 14:09:07 -07:00
hash-object.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
help.c Merge branch 'ac/help-sort-correctly' 2026-03-23 09:20:30 -07:00
history.c history: fix short help for argument of --update-refs 2026-04-06 10:17:36 -07:00
hook.c hook: reject unknown hook names in git-hook(1) 2026-03-25 14:00:48 -07:00
index-pack.c Merge branch 'ps/odb-cleanup' 2026-04-08 10:19:17 -07:00
init-db.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
interpret-trailers.c Merge branch 'kh/doc-interpret-trailers-1' 2026-03-27 11:00:02 -07:00
last-modified.c Merge branch 'tc/last-modified-not-a-tree' 2026-02-13 13:39:25 -08:00
log.c Merge branch 'mf/format-patch-cover-letter-format' 2026-04-03 13:01:08 -07:00
ls-files.c Merge branch 'ds/ls-files-lazy-unsparse' 2025-09-08 14:54:35 -07:00
ls-remote.c ref-filter: propagate peeled object ID 2025-11-04 07:32:25 -08:00
ls-tree.c cocci: convert parse_tree functions to repo_ variants 2026-01-09 18:36:18 -08:00
mailinfo.c
mailsplit.c
merge-base.c commit: rename `free_commit_list()` to conform to coding guidelines 2026-01-15 05:32:31 -08:00
merge-file.c Merge branch 'mr/merge-file-object-id-worktree-fix' 2026-03-27 11:00:01 -07:00
merge-index.c
merge-ours.c merge-ours: integrate with sparse-index 2026-02-06 11:45:33 -08:00
merge-recursive.c builtin: also setup gently for --help-all 2025-08-08 11:13:12 -07:00
merge-tree.c Merge branch 'ps/commit-list-functions-renamed' 2026-02-13 13:39:25 -08:00
merge.c run-command: wean auto_maintenance() functions off the_repository 2026-03-12 08:30:57 -07:00
mktag.c fsck: store repository in fsck options 2026-03-23 08:33:10 -07:00
mktree.c builtin/mktree: remove USE_THE_REPOSITORY_VARIABLE 2026-03-12 10:03:23 -07:00
multi-pack-index.c Merge branch 'ps/object-counting' 2026-03-25 12:58:05 -07:00
mv.c environment: stop using core.sparseCheckout globally 2026-02-26 07:22:51 -08:00
name-rev.c use commit_stack instead of prio_queue in LIFO mode 2026-03-18 10:39:56 -07:00
notes.c Merge branch 'jc/strbuf-split' 2025-08-21 13:47:00 -07:00
pack-objects.c pack-objects: support sparse:oid filter with path-walk 2026-05-24 18:41:06 +09:00
pack-redundant.c pack-redundant: fix memory leak when open_pack_index() fails 2026-02-21 21:26:53 -08:00
pack-refs.c builtin/pack-refs: factor out core logic into a shared library 2025-09-19 10:02:55 -07:00
patch-id.c patch-id: use “patch ID” throughout 2026-01-09 06:07:21 -08:00
prune-packed.c
prune.c Merge branch 'ps/object-file-wo-the-repository' 2025-08-05 11:53:55 -07:00
pull.c run-command: wean start_command() off the_repository 2026-03-12 08:30:57 -07:00
push.c environment: move "branch.autoSetupMerge" into `struct repo_config_values` 2026-02-26 07:22:53 -08:00
range-diff.c Merge branch 'kh/format-patch-range-diff-notes' 2025-10-14 12:56:09 -07:00
read-tree.c cocci: convert parse_tree functions to repo_ variants 2026-01-09 18:36:18 -08:00
rebase.c use strvec_pushv() to add another strvec 2026-03-24 12:26:58 -07:00
receive-pack.c Merge branch 'jk/c23-const-preserving-fixes-more' 2026-04-09 11:21:59 -07:00
reflog.c Merge branch 'ps/reflog-migrate-fixes' into maint-2.51 2025-10-15 10:29:28 -07:00
refs.c fsck: store repository in fsck options 2026-03-23 08:33:10 -07:00
remote-ext.c
remote-fd.c
remote.c odb: rename `odb_has_object()` flags 2026-03-31 20:43:14 -07:00
repack.c repack: mark non-MIDX packs above the split as excluded-open 2026-03-27 13:40:40 -07:00
replace.c refs: introduce wrapper struct for `each_ref_fn` 2025-11-04 07:32:24 -08:00
replay.c replay: allow to specify a ref with option --ref 2026-04-01 21:34:25 -07:00
repo.c repo: show subcommand-specific help text 2026-03-25 10:35:27 -07:00
rerere.c config: drop `git_config()` wrapper 2025-07-23 08:15:18 -07:00
reset.c add-patch: allow disabling editing of hunks 2026-03-03 15:09:36 -08:00
rev-list.c rev-list: use reduce_heads() for --maximal-only 2026-04-06 12:02:30 -07:00
rev-parse.c rev-parse: avoid writing to const string for parent marks 2026-03-26 12:47:17 -07:00
revert.c Merge branch 'pw/3.0-commentchar-auto-deprecation' 2025-09-18 10:07:00 -07:00
rm.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
send-pack.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
shortlog.c mailmap: stop using the_repository 2026-02-20 08:13:58 -08:00
show-branch.c commit: rename `free_commit_list()` to conform to coding guidelines 2026-01-15 05:32:31 -08:00
show-index.c show-index: use gettext wrapping in user facing error messages 2026-01-30 08:58:12 -08:00
show-ref.c odb: rename `odb_has_object()` flags 2026-03-31 20:43:14 -07:00
sparse-checkout.c Merge branch 'ob/core-attributesfile-in-repository' 2026-03-05 10:04:49 -08:00
stash.c docs: fix "git stash [push]" documentation 2026-03-30 08:19:40 -07:00
stripspace.c config: drop `git_config()` wrapper 2025-07-23 08:15:18 -07:00
submodule--helper.c Merge branch 'ps/object-counting' 2026-03-25 12:58:05 -07:00
symbolic-ref.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
tag.c Merge branch 'jt/fast-import-sign-again' 2026-03-24 12:31:31 -07:00
unpack-file.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
unpack-objects.c Merge branch 'ps/odb-cleanup' 2026-04-08 10:19:17 -07:00
update-index.c odb: add transaction interface 2025-09-16 11:37:06 -07:00
update-ref.c update-ref: utilize rejected error details if available 2026-01-25 22:27:33 -08:00
update-server-info.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
upload-archive.c path: move `enter_repo()` into "setup.c" 2025-11-19 17:41:03 -08:00
upload-pack.c path: move `enter_repo()` into "setup.c" 2025-11-19 17:41:03 -08:00
var.c Merge branch 'jc/string-list-split' 2025-08-21 13:46:59 -07:00
verify-commit.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
verify-pack.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00
verify-tag.c tag: support arbitrary repositories in gpg_verify_tag() 2025-12-29 22:02:53 +09:00
worktree.c Merge branch 'pw/worktree-reduce-the-repository' 2026-04-03 13:01:09 -07:00
write-tree.c config: move Git config parsing into "environment.c" 2025-07-23 08:15:22 -07:00