git/builtin
Taylor Blau 5ee86c273b repack: exclude cruft pack(s) from the MIDX where possible
In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.

While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.

To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying:

    [...] it's desirable to avoid including cruft packs in the MIDX
    because it causes the MIDX to store a bunch of objects which are
    likely to get thrown away.

That is true, but it hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects, which bloats the MIDX
and slows down object lookup.

This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read, yet any object lookup that goes through
the MIDX must still binary search over them.

This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.
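
As a rough illustration of what "closed under reachability" means here,
the toy model below treats each pack as a set of object names and checks
whether every object referenced from the MIDX's packs is itself in one
of those packs. It is only a sketch: the object names, the EDGES graph,
and the is_closed() helper are all made up for this example and say
nothing about how git actually stores or walks objects.

```python
# Toy model: objects are strings, and EDGES maps each object to the
# objects it references. All names are hypothetical.
EDGES = {
    "commit_new": ["tree_new"],
    "tree_new": ["blob_old"],  # blob_old was once unreachable
    "blob_old": [],
}


def is_closed(packs):
    """Return True if every object referenced from the union of
    `packs` is also present in that union."""
    present = set().union(*packs)
    return all(ref in present for obj in present for ref in EDGES[obj])


# The once-unreachable blob lives only in the cruft pack.
cruft_pack = {"blob_old"}

# Modeling plain '--stdin-packs': the new pack lacks blob_old.
plain_pack = {"commit_new", "tree_new"}

# Modeling '--stdin-packs=follow': the new pack carries a copy of it.
follow_pack = {"commit_new", "tree_new", "blob_old"}

print(is_closed([plain_pack]))              # False: MIDX without the cruft pack
print(is_closed([plain_pack, cruft_pack]))  # True: MIDX with the cruft pack
print(is_closed([follow_pack]))             # True: no cruft pack needed
```

The first case is the one where bitmap generation may fail; the last
case is what "following" is meant to achieve.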

Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is a pack in the existing MIDX
that is not going to be represented (either verbatim or as part of a
geometric rollup) in the new MIDX.

If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.

But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.

(One exception here is when "starting from scratch" results in a no-op
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)

Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.
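
The conditions described above can be summarized in a small decision
sketch. This is a toy model written from the prose of this message, not
the code in builtin/repack.c: the function name, its parameters, and the
pack names in the usage lines are all hypothetical.

```python
def midx_includes_cruft(must_include_cruft, old_midx_packs,
                        kept_packs, rolled_up_packs, repack_was_noop):
    """Toy decision: should cruft pack(s) go into the new MIDX?

    must_include_cruft models 'repack.midxMustIncludeCruft' (the
    existing behavior, and the default, corresponds to True).
    """
    if must_include_cruft:
        return True
    if repack_was_noop:
        # We cannot tell how the existing packs were generated.
        return True
    # A pack in the old MIDX that is neither kept verbatim nor rolled
    # up suggests that not every pack was generated with "following".
    unknown = set(old_midx_packs) - set(kept_packs) - set(rolled_up_packs)
    return bool(unknown)


# Default configuration: cruft pack(s) are always included.
print(midx_includes_cruft(True, [], [], [], False))
# Starting from scratch (no old MIDX) with the config set to false.
print(midx_includes_cruft(False, [], ["a.pack"], [], False))
# The old MIDX has a pack that the new MIDX will not represent.
print(midx_includes_cruft(False, ["x.pack"], ["a.pack"], [], False))
```

Under these assumptions, only the second call returns False, i.e. the
cruft pack(s) can be left out of the MIDX.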

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-23 15:41:38 -07:00
add.c Merge branch 'ds/sparse-apply-add-p' 2025-05-27 13:59:09 -07:00
am.c Merge branch 'ly/am-split-stgit-leakfix' 2025-05-19 16:02:46 -07:00
annotate.c
apply.c apply: integrate with the sparse index 2025-05-16 12:00:33 -07:00
archive.c
backfill.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
bisect.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
blame.c Merge branch 'az/tighten-string-array-constness' 2025-04-29 14:21:28 -07:00
branch.c Merge branch 'rs/ref-fitler-used-atoms-value-fix' 2025-01-29 14:05:09 -08:00
bugreport.c object-file: move `safe_create_leading_directories()` into "path.c" 2025-04-15 08:24:35 -07:00
bundle.c Merge branch 'jt/bundle-fsck' 2024-12-13 07:33:36 -08:00
cat-file.c cat-file.c: add batch handling for submodules 2025-06-03 12:08:58 -07:00
check-attr.c
check-ignore.c
check-mailmap.c mailmap: fix check-mailmap with full mailmap line 2025-02-21 18:27:16 -08:00
check-ref-format.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
checkout--worker.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
checkout-index.c builtin/checkout-index: stop using `the_repository` 2025-03-07 16:52:02 -08:00
checkout.c Merge branch 'ps/object-file-cleanup' 2025-04-24 17:25:33 -07:00
clean.c global: mark code units that generate warnings with `-Wsign-compare` 2024-12-06 20:20:02 +09:00
clone.c Merge branch 'ps/object-store-cleanup' 2025-05-12 14:22:49 -07:00
column.c parse-options: detect mismatches in integer signedness 2025-04-17 08:15:16 -07:00
commit-graph.c Merge branch 'ly/commit-graph-graph-write-leakfix' 2025-06-17 10:44:41 -07:00
commit-tree.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
commit.c commit: simplify code 2025-05-15 13:46:44 -07:00
config.c global: use designated initializers for options 2025-04-17 08:15:15 -07:00
count-objects.c object-store: move function declarations to their respective subsystems 2025-04-29 10:08:12 -07:00
credential-cache--daemon.c object-file: move `safe_create_leading_directories()` into "path.c" 2025-04-15 08:24:35 -07:00
credential-cache.c
credential-store.c
credential.c Merge branch 'jc/show-usage-help' 2025-01-28 13:02:22 -08:00
describe.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
diagnose.c object-file: move `safe_create_leading_directories()` into "path.c" 2025-04-15 08:24:35 -07:00
diff-files.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
diff-index.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
diff-pairs.c builtin/diff-pairs: allow explicit diff queue flush 2025-03-03 08:17:47 -08:00
diff-tree.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
diff.c diff --no-index: support limiting by pathspec 2025-05-22 14:20:11 -07:00
difftool.c Merge branch 'ua/call-repo-config-with-possibly-null-repository' 2025-04-29 14:21:27 -07:00
fast-export.c fast-export: --signed-commits is experimental 2025-05-28 10:30:47 -07:00
fast-import.c object-store: move and rename `odb_pack_keep()` 2025-04-29 10:08:12 -07:00
fetch-pack.c builtin/fetch-pack: cleanup before return error 2025-06-04 08:52:25 -07:00
fetch.c fetch: avoid unnecessary work when there is no current branch 2025-05-15 13:46:47 -07:00
fmt-merge-msg.c parse-options: introduce precision handling for `OPTION_INTEGER` 2025-04-17 08:15:15 -07:00
for-each-ref.c builtin/for-each-ref: stop using `the_repository` 2025-03-07 16:52:02 -08:00
for-each-repo.c global: trivial conversions to fix `-Wsign-compare` warnings 2024-12-06 20:20:04 +09:00
fsck.c fsck: stop using object_info->type_name strbuf 2025-05-16 09:43:10 -07:00
fsmonitor--daemon.c builtins: send usage_with_options() help text to standard output 2025-01-17 13:30:03 -08:00
gc.c builtin/gc: correct physical memory detection for OpenBSD / NetBSD 2025-06-01 19:01:07 -07:00
get-tar-commit-id.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
grep.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
hash-object.c hash-object: handle --literally with OPT_NEGBIT 2025-05-16 09:43:11 -07:00
help.c pager: stop using `the_repository` 2024-12-18 10:44:30 -08:00
hook.c
index-pack.c Merge branch 'ds/fix-thin-fix' 2025-05-12 14:22:49 -07:00
init-db.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
interpret-trailers.c
log.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
ls-files.c Merge branch 'ps/object-wo-the-repository' 2025-04-15 13:50:15 -07:00
ls-remote.c global: use designated initializers for options 2025-04-17 08:15:15 -07:00
ls-tree.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
mailinfo.c mailinfo: stop using `the_repository` 2024-12-18 10:44:31 -08:00
mailsplit.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-base.c commit-reach: use `size_t` to track indices when computing merge bases 2024-12-27 08:12:40 -08:00
merge-file.c object-file: split out functions relating to object store subsystem 2025-04-15 08:24:36 -07:00
merge-index.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-ours.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
merge-recursive.c builtin/merge-recursive: switch to using merge_ort_generic() 2025-04-08 13:59:11 -07:00
merge-tree.c merge-tree: add a new --quiet flag 2025-05-16 15:09:14 -07:00
merge.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
mktag.c Merge branch 'ly/do-not-localize-bug-messages' 2025-06-17 10:44:40 -07:00
mktree.c Merge branch 'az/tighten-string-array-constness' 2025-04-29 14:21:28 -07:00
multi-pack-index.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
mv.c Merge branch 'ps/mv-contradiction-fix' 2025-05-08 12:36:32 -07:00
name-rev.c Merge branch 'ps/object-wo-the-repository' 2025-04-15 13:50:15 -07:00
notes.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
pack-objects.c pack-objects: introduce '--stdin-packs=follow' 2025-06-23 15:41:37 -07:00
pack-redundant.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
pack-refs.c builtin/pack-refs: stop using `the_repository` 2025-03-07 16:52:01 -08:00
patch-id.c global: adapt callers to use generic hash context helpers 2025-01-31 10:06:11 -08:00
prune-packed.c
prune.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
pull.c refspec: replace `refspec_item_init()` with fetch/push variants 2025-03-21 01:45:16 -07:00
push.c remote: rename query_refspecs functions 2025-02-04 09:51:41 -08:00
range-diff.c Merge branch 'js/range-diff-diff-merges' 2024-12-23 09:32:17 -08:00
read-tree.c global: use designated initializers for options 2025-04-17 08:15:15 -07:00
rebase.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
receive-pack.c builtin/receive-pack: add option to skip connectivity check 2025-05-20 11:43:36 -07:00
reflog.c Merge branch 'ps/maintenance-reflog-expire' 2025-04-16 13:54:19 -07:00
refs.c Merge branch 'sj/ref-consistency-checks-more' 2025-03-26 16:26:10 +09:00
remote-ext.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
remote-fd.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
remote.c treewide: convert users of `repo_has_object_file()` to `has_object()` 2025-04-29 10:08:13 -07:00
repack.c repack: exclude cruft pack(s) from the MIDX where possible 2025-06-23 15:41:38 -07:00
replace.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
replay.c replay: replace the_repository with repo parameter passed to cmd_replay () 2025-05-14 15:00:49 -07:00
rerere.c rerere: let `rerere_path()` write paths into a caller-provided buffer 2025-02-28 13:54:11 -08:00
reset.c reset: integrate sparse index with --patch 2025-05-16 12:02:47 -07:00
rev-list.c oidmap: rename oidmap_free() to oidmap_clear() 2025-05-12 13:06:26 -07:00
rev-parse.c path: drop `git_path()` in favor of `repo_git_path()` 2025-02-28 13:54:11 -08:00
revert.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
rm.c rm: fix sign comparison warnings 2025-03-29 01:04:40 -07:00
send-pack.c builtin/send-pack: stop using `the_repository` 2025-03-07 16:52:01 -08:00
shortlog.c diff.h: fix index used to loop through unsigned integer 2024-12-06 20:20:03 +09:00
show-branch.c Merge branch 'az/tighten-string-array-constness' 2025-04-29 14:21:28 -07:00
show-index.c Merge branch 'jc/show-index-h-update' 2025-01-31 09:44:16 -08:00
show-ref.c treewide: convert users of `repo_has_object_file()` to `has_object()` 2025-04-29 10:08:13 -07:00
sparse-checkout.c object-file: move `safe_create_leading_directories()` into "path.c" 2025-04-15 08:24:35 -07:00
stash.c stash: remove merge-recursive.h include 2025-03-17 15:39:03 -07:00
stripspace.c
submodule--helper.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
symbolic-ref.c
tag.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
unpack-file.c object-store: merge "object-store-ll.h" and "object-store.h" 2025-04-15 08:24:37 -07:00
unpack-objects.c treewide: convert users of `repo_has_object_file()` to `has_object()` 2025-04-29 10:08:13 -07:00
update-index.c Merge branch 'ps/parse-options-integers' 2025-04-24 17:25:34 -07:00
update-ref.c Merge branch 'kn/non-transactional-batch-updates' 2025-04-16 13:54:19 -07:00
update-server-info.c builtin/update-server-info: remove unnecessary if statement 2025-04-08 14:47:37 -07:00
upload-archive.c builtin: send usage() help text to standard output 2025-01-17 13:30:03 -08:00
upload-pack.c serve: stop using `the_repository` 2024-12-18 10:44:30 -08:00
var.c Merge branch 'jc/show-usage-help' 2025-01-28 13:02:22 -08:00
verify-commit.c builtin/verify-commit: stop using `the_repository` 2025-03-07 16:52:01 -08:00
verify-pack.c
verify-tag.c builtin/verify-tag: stop using `the_repository` 2025-03-07 16:52:01 -08:00
worktree.c Merge branch 'ly/do-not-localize-bug-messages' 2025-06-17 10:44:40 -07:00
write-tree.c global: use designated initializers for options 2025-04-17 08:15:15 -07:00