git/Documentation
Taylor Blau 37dc6d8104 builtin/repack.c: implement support for `--max-cruft-size`
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.

When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and
a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:

  - Repacking periodically, either by packing anything loose in the
    repository (via `git repack -d`) or producing a geometric sequence
    of packs (via `git repack --geometric=<d> -d`).

  - Every so often, splitting the repository into two packs, one cruft
    to store the unreachable objects, and another non-cruft pack to
    store the reachable objects.

Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.

This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).

This all works, but can be costly from an I/O perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if ever) pruned.

Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).

At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that missing support.

Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:

  - Sort a list of any existing cruft packs in ascending order of pack
    size.

  - Starting from the beginning of the list, group cruft packs together
    while the accumulated size is smaller than the maximum specified
    pack size.

  - Combine the objects in these cruft packs together into a new cruft
    pack, along with any other unreachable objects which have since
    entered the repository.
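The selection steps above can be sketched in C roughly as follows. This is a simplified illustration, not the code in `builtin/repack.c`: `struct cruft_pack` and `select_cruft_to_combine()` are hypothetical names, and the sketch assumes a pack is selected only when adding its size keeps the running total at or below the limit.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cruft pack: only its name and size matter here. */
struct cruft_pack {
	const char *name;
	size_t size;
};

static int cmp_by_size(const void *va, const void *vb)
{
	const struct cruft_pack *a = va, *b = vb;

	return (a->size > b->size) - (a->size < b->size);
}

/*
 * Sort `packs` in ascending order of size, then select packs from the
 * front of the list while the running total stays within `max_size`.
 * Returns the number of selected packs: packs[0..n) get combined into a
 * new cruft pack along with any new unreachable objects, and the
 * remaining packs are left untouched.
 */
static size_t select_cruft_to_combine(struct cruft_pack *packs, size_t nr,
				      size_t max_size)
{
	size_t total = 0, i;

	qsort(packs, nr, sizeof(*packs), cmp_by_size);
	for (i = 0; i < nr; i++) {
		/* Written this way so that total + size cannot overflow. */
		if (packs[i].size > max_size - total)
			break; /* sorted, so no later (larger) pack fits either */
		total += packs[i].size;
	}
	return i;
}
```

For example, given packs of 300, 100, 900 and 200 bytes and a 650-byte limit, the function sorts them to 100, 200, 300, 900 and returns 3: the three smallest packs (600 bytes in total) are combined, and the 900-byte pack is left untouched. Stopping at the first pack that does not fit is safe only because the list is sorted, so every later pack is at least as large.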

Once a cruft pack grows beyond the size specified via `--max-cruft-size`,
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.
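To make the I/O claim concrete, here is a toy C model. It is hypothetical and not taken from Git: it assumes each repack discovers the same number of new unreachable bytes, that nothing is ever pruned, and that existing cruft packs are combined with the smallest-first rule described above. `simulate_cruft_io()` is an illustrative name. Passing `SIZE_MAX` as the limit models the old scheme of a single cruft pack that is rewritten every time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int cmp_size(const void *va, const void *vb)
{
	size_t a = *(const size_t *)va, b = *(const size_t *)vb;

	return (a > b) - (a < b);
}

/*
 * Toy model (hypothetical): every repack finds `added` bytes of new
 * unreachable objects and nothing is ever pruned. Existing cruft packs
 * are combined smallest-first while their running total stays within
 * `max_size`; the combined objects plus the new ones are written out as
 * one new cruft pack. Returns the total bytes written over `rounds`
 * repacks. Pass SIZE_MAX as `max_size` to model one cruft pack that is
 * rewritten from scratch every time.
 */
static size_t simulate_cruft_io(size_t rounds, size_t added, size_t max_size)
{
	/* Each round adds at most one pack, so `rounds` slots always suffice. */
	size_t *packs = calloc(rounds ? rounds : 1, sizeof(*packs));
	size_t nr = 0, written = 0;

	if (!packs)
		return 0;

	for (size_t r = 0; r < rounds; r++) {
		size_t total = 0, i;

		qsort(packs, nr, sizeof(*packs), cmp_size);
		for (i = 0; i < nr; i++) {
			if (packs[i] > max_size - total)
				break; /* this pack and every larger one are kept as-is */
			total += packs[i];
		}

		/* Drop the i combined packs, keeping the others ... */
		memmove(packs, packs + i, (nr - i) * sizeof(*packs));
		nr -= i;
		/* ... and write one new pack holding the old and new objects. */
		packs[nr++] = total + added;
		written += total + added;
	}

	free(packs);
	return written;
}
```

Under these assumptions, with 1 new byte per repack and a 4-byte limit, 20 repacks write 60 bytes in total, against 210 bytes for the single rewritten pack. Each generation costs 1 + 2 + 3 + 4 + 5 = 15 bytes before its pack grows past the limit and stops being rewritten, so the capped cost grows with the square of the limit but only linearly in the number of repacks, while the uncapped cost grows with the square of the number of repacks.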

When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.

This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for which all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-05 13:26:11 -07:00
RelNotes The fourteenth batch 2023-10-02 11:20:00 -07:00
config builtin/repack.c: implement support for `--max-cruft-size` 2023-10-05 13:26:11 -07:00
howto new-command.txt: update reference to builtin docs 2023-02-06 14:07:33 -08:00
includes
mergetools
technical Merge branch 'en/header-split-cache-h-part-3' 2023-06-29 16:43:21 -07:00
.gitattributes
.gitignore doc: remove manpage-base-url workaround 2023-04-05 14:18:53 -07:00
CodingGuidelines Merge branch 'en/header-split-cache-h-part-3' 2023-06-29 16:43:21 -07:00
Makefile Merge branch 'fc/doc-stop-using-manversion' 2023-04-21 15:35:04 -07:00
MyFirstContribution.txt Merge branch 'jc/doc-sent-patch-now-what' 2023-08-04 10:52:31 -07:00
MyFirstObjectWalk.txt Merge branch 'vd/adjust-mfow-doc-to-updated-headers' 2023-07-17 11:30:42 -07:00
ReviewingGuidelines.txt
SubmittingPatches SubmittingPatches: use of older maintenance tracks is an exception 2023-07-27 13:07:40 -07:00
ToolsForGit.txt
asciidoc.conf doc: asciidoc: remove custom header macro 2023-04-05 21:37:45 -07:00
asciidoctor-extensions.rb
blame-options.txt blame: use different author name for fake commit generated by --contents 2023-04-24 15:16:31 -07:00
build-docdep.perl
cat-texi.perl
cmd-list.perl
config.txt docs: typofixes 2023-06-12 13:52:51 -07:00
date-formats.txt
diff-format.txt
diff-generate-patch.txt docs: link generating patch sections 2023-01-13 12:55:14 -08:00
diff-options.txt diff --stat: add config option to limit filename width 2023-09-18 09:39:07 -07:00
doc-diff doc-diff: drop SOURCE_DATE_EPOCH override 2023-05-05 14:28:03 -07:00
docbook-xsl.css
docbook.xsl
everyday.txto
fetch-options.txt fetch: introduce machine-parseable "porcelain" output format 2023-05-10 10:35:25 -07:00
fix-texi.perl
fsck-msgids.txt fsck: detect very large tree pathnames 2023-08-31 15:51:07 -07:00
git-add.txt docs & comments: replace mentions of "git-add--interactive.perl" 2023-02-06 15:03:34 -08:00
git-am.txt am: refer to format-patch in the documentation 2023-03-21 13:18:45 -07:00
git-annotate.txt
git-apply.txt Documentation: render dash correctly 2023-01-23 09:40:14 -08:00
git-archimport.txt
git-archive.txt archive: add --mtime 2023-02-18 09:29:13 -08:00
git-bisect-lk2009.txt git-bisect-lk2009: update nist report link 2023-01-13 11:58:51 -08:00
git-bisect.txt docs: update when `git bisect visualize` uses `gitk` 2023-08-04 09:47:10 -07:00
git-blame.txt blame: allow --contents to work with non-HEAD commit 2023-03-24 12:05:22 -07:00
git-branch.txt branch, for-each-ref, tag: add option to omit empty lines 2023-04-13 08:07:45 -07:00
git-bugreport.txt
git-bundle.txt Merge branch 'jk/bundle-use-dash-for-stdfiles' 2023-03-19 15:03:12 -07:00
git-cat-file.txt cat-file: add option '-Z' that delimits input and output with NUL 2023-06-12 13:23:46 -07:00
git-check-attr.txt attr: add flag `--source` to work with tree-ish 2023-01-14 08:49:55 -08:00
git-check-ignore.txt
git-check-mailmap.txt
git-check-ref-format.txt
git-checkout-index.txt
git-checkout.txt checkout/restore: refuse unmerging paths unless checking out of the index 2023-07-31 16:10:54 -07:00
git-cherry-pick.txt git-cherry-pick.txt: do not use 'ORIG_HEAD' in example 2023-01-13 09:55:45 -08:00
git-cherry.txt
git-citool.txt
git-clean.txt Merge branch 'ch/clean-docfix' 2023-09-22 17:01:37 -07:00
git-clone.txt clone: error specifically with --local and symlinked objects 2023-04-11 08:46:09 -07:00
git-column.txt
git-commit-graph.txt Merge branch 'ab/doc-synopsis-and-cmd-usage' 2022-10-28 11:26:54 -07:00
git-commit-tree.txt
git-commit.txt
git-config.txt git-config: fix misworded --type=path explanation 2023-09-15 14:09:37 -07:00
git-count-objects.txt
git-credential-cache--daemon.txt
git-credential-cache.txt Documentation: clarify that cache forgets credentials if the system restarts 2023-01-29 09:21:07 -08:00
git-credential-store.txt
git-credential.txt credential: erase all matching credentials 2023-06-15 13:26:41 -07:00
git-cvsexportcommit.txt
git-cvsimport.txt
git-cvsserver.txt docs: typofixes 2023-06-12 13:52:51 -07:00
git-daemon.txt
git-describe.txt docs: typofixes 2023-06-12 13:52:51 -07:00
git-diagnose.txt
git-diff-files.txt
git-diff-index.txt
git-diff-tree.txt
git-diff.txt Documentation: document AUTO_MERGE 2023-05-23 17:21:47 +09:00
git-difftool.txt mergetool: new config guiDefault supports auto-toggling gui by DISPLAY 2023-04-05 21:03:29 -07:00
git-fast-export.txt
git-fast-import.txt
git-fetch-pack.txt
git-fetch.txt fetch: introduce machine-parseable "porcelain" output format 2023-05-10 10:35:25 -07:00
git-filter-branch.txt
git-fmt-merge-msg.txt
git-for-each-ref.txt Merge branch 'ks/ref-filter-describe' 2023-08-02 09:37:24 -07:00
git-for-each-repo.txt
git-format-patch.txt Merge branch 'dd/format-patch-rfc-updates' 2023-09-07 15:06:08 -07:00
git-fsck-objects.txt
git-fsck.txt
git-fsmonitor--daemon.txt
git-gc.txt builtin/repack.c: implement support for `--max-cruft-size` 2023-10-05 13:26:11 -07:00
git-get-tar-commit-id.txt
git-grep.txt
git-gui.txt
git-hash-object.txt docs: add git hash-object -t option's possible values 2023-06-28 23:00:10 -07:00
git-help.txt
git-hook.txt hook: support a --to-stdin=<path> option 2023-02-08 12:50:03 -08:00
git-http-backend.txt
git-http-fetch.txt
git-http-push.txt
git-imap-send.txt
git-index-pack.txt
git-init-db.txt
git-init.txt
git-instaweb.txt
git-interpret-trailers.txt doc: trailer: add more examples in DESCRIPTION 2023-06-14 21:42:20 -07:00
git-log.txt
git-ls-files.txt ls-files: align format atoms with ls-tree 2023-05-23 20:12:57 +09:00
git-ls-remote.txt ls-remote doc: document the output format 2023-05-19 08:19:34 -07:00
git-ls-tree.txt Merge branch 'rs/doc-ls-tree-hex-literal' 2023-06-22 16:29:07 -07:00
git-mailinfo.txt
git-mailsplit.txt
git-maintenance.txt maintenance: add option to register in a specific config 2022-11-14 22:39:25 -05:00
git-merge-base.txt
git-merge-file.txt
git-merge-index.txt
git-merge-one-file.txt
git-merge-tree.txt Merge branch 'as/doc-markup-fix' 2023-03-19 15:03:11 -07:00
git-merge.txt Documentation: document AUTO_MERGE 2023-05-23 17:21:47 +09:00
git-mergetool--lib.txt
git-mergetool.txt mergetool: new config guiDefault supports auto-toggling gui by DISPLAY 2023-04-05 21:03:29 -07:00
git-mktag.txt docs: typofixes 2023-06-12 13:52:51 -07:00
git-mktree.txt
git-multi-pack-index.txt
git-mv.txt
git-name-rev.txt name-rev: make --stdin hidden 2023-05-06 14:32:20 -07:00
git-notes.txt notes doc: tidy up `--no-stripspace` paragraph 2023-08-16 11:37:25 -07:00
git-p4.txt
git-pack-objects.txt builtin/pack-objects.c: support `--max-pack-size` with `--cruft` 2023-08-29 11:58:06 -07:00
git-pack-redundant.txt pack-redundant: document deprecation 2023-03-30 07:50:43 -07:00
git-pack-refs.txt pack-refs: teach pack-refs --include option 2023-05-12 14:54:14 -07:00
git-patch-id.txt
git-prune-packed.txt
git-prune.txt
git-pull.txt
git-push.txt Merge branch 'ws/git-push-doc-grammofix' 2023-08-24 09:32:33 -07:00
git-quiltimport.txt
git-range-diff.txt
git-read-tree.txt Documentation: render dash correctly 2023-01-23 09:40:14 -08:00
git-rebase.txt rebase: add a config option for --rebase-merges 2023-03-27 09:32:49 -07:00
git-receive-pack.txt
git-reflog.txt
git-remote-ext.txt
git-remote-fd.txt
git-remote-helpers.txto
git-remote.txt
git-repack.txt builtin/repack.c: implement support for `--max-cruft-size` 2023-10-05 13:26:11 -07:00
git-replace.txt
git-request-pull.txt
git-rerere.txt
git-reset.txt git-reset.txt: mention 'ORIG_HEAD' in the Description 2023-01-13 09:55:45 -08:00
git-restore.txt checkout/restore: refuse unmerging paths unless checking out of the index 2023-07-31 16:10:54 -07:00
git-rev-list.txt
git-rev-parse.txt parse-options: show negatability of options in short help 2023-08-06 17:16:50 -07:00
git-revert.txt git-revert.txt: add discussion 2023-09-02 15:21:44 -07:00
git-rm.txt
git-send-email.txt Merge branch 'mc/send-email-header-cmd' 2023-05-15 13:59:03 -07:00
git-send-pack.txt
git-sh-i18n--envsubst.txt
git-sh-i18n.txt
git-sh-setup.txt
git-shell.txt
git-shortlog.txt
git-show-branch.txt show-branch doc: say <ref>, not <reference> 2023-05-19 08:19:34 -07:00
git-show-index.txt
git-show-ref.txt show-ref doc: fix carets in monospace 2023-08-16 11:40:10 -07:00
git-show.txt show doc: redirect user to git log manual instead of git diff-tree 2023-09-20 08:52:59 -07:00
git-sparse-checkout.txt docs: typofixes 2023-06-12 13:52:51 -07:00
git-stage.txt
git-stash.txt docs: typofixes 2023-06-12 13:52:51 -07:00
git-status.txt Documentation/git-status: add missing line breaks 2023-09-22 15:27:51 -07:00
git-stripspace.txt
git-submodule.txt doc: highlight that .gitmodules does not support !command 2023-07-25 14:55:07 -07:00
git-svn.txt
git-switch.txt
git-symbolic-ref.txt
git-tag.txt doc: tag: document `TAG_EDITMSG` 2023-05-16 11:38:14 -07:00
git-tools.txt
git-unpack-file.txt
git-unpack-objects.txt
git-update-index.txt update-index: add --show-index-version 2023-09-12 16:21:53 -07:00
git-update-ref.txt
git-update-server-info.txt
git-upload-archive.txt
git-upload-pack.txt
git-var.txt var: add config file locations 2023-06-27 11:31:06 -07:00
git-verify-commit.txt
git-verify-pack.txt
git-verify-tag.txt
git-version.txt
git-web--browse.txt
git-whatchanged.txt
git-worktree.txt worktree add: extend DWIM to infer --orphan 2023-05-17 15:55:25 -07:00
git-write-tree.txt
git.txt doc: sha256 is no longer experimental 2023-07-31 09:11:04 -07:00
gitattributes.txt ll-merge: killing the external merge driver aborts the merge 2023-06-23 09:27:10 -07:00
gitcli.txt
gitcore-tutorial.txt
gitcredentials.txt Merge branch 'mh/doc-credential-helpers' 2023-07-18 07:28:52 -07:00
gitcvs-migration.txt
gitdiffcore.txt
giteveryday.txt
gitfaq.txt
gitformat-bundle.txt
gitformat-chunk.txt
gitformat-commit-graph.txt doc: use "commit-graph" hyphenation consistently 2022-10-30 19:58:40 -04:00
gitformat-index.txt docs: document zero bits in index "mode" 2023-02-01 08:49:23 -08:00
gitformat-pack.txt Documentation/gitformat-pack.txt: drop mixed version section 2023-08-29 11:58:26 -07:00
gitformat-signature.txt Merge branch 'gm/signature-format-doc' 2023-03-06 21:51:56 -08:00
gitglossary.txt
githooks.txt Merge branch 'ms/send-email-feed-header-to-validate-hook' 2023-05-10 10:23:28 -07:00
gitignore.txt Merge branch 'jc/gitignore-doc-pattern-markup' 2023-07-27 15:26:37 -07:00
gitk.txt
gitmailmap.txt
gitmodules.txt doc: highlight that .gitmodules does not support !command 2023-07-25 14:55:07 -07:00
gitnamespaces.txt
gitprotocol-capabilities.txt
gitprotocol-common.txt
gitprotocol-http.txt
gitprotocol-pack.txt
gitprotocol-v2.txt *: fix typos which duplicate a word 2023-01-08 10:28:34 +09:00
gitremote-helpers.txt
gitrepository-layout.txt
gitrevisions.txt
gitsubmodules.txt
gittutorial-2.txt
gittutorial.txt gittutorial: wrap literal examples in backticks 2023-04-20 14:34:08 -07:00
gitweb.conf.txt
gitweb.txt docs: typofixes 2023-06-12 13:52:51 -07:00
gitworkflows.txt
glossary-content.txt glossary: add reachability bitmap description 2022-10-30 19:58:46 -04:00
howto-index.sh
i18n.txt
install-doc-quick.sh
install-webdoc.sh
line-range-format.txt
line-range-options.txt
lint-fsck-msgids.perl
lint-gitlink.perl
lint-man-end-blurb.perl
lint-man-section-order.perl
manpage-bold-literal.xsl
manpage-normal.xsl Merge branch 'fc/doc-man-lift-title-length-limit' 2023-05-10 10:23:29 -07:00
manpage.xsl
merge-options.txt
merge-strategies.txt
object-format-disclaimer.txt doc: sha256 is no longer experimental 2023-07-31 09:11:04 -07:00
pretty-formats.txt pretty: add pointer and tag options to %(decorate) 2023-08-21 11:40:10 -07:00
pretty-options.txt range-diff: treat notes like `log` 2023-09-19 14:40:19 -07:00
pull-fetch-param.txt
ref-reachability-filters.txt
rerere-options.txt
rev-list-description.txt
rev-list-options.txt rev-list-options: fix typo in `--stdin` documentation 2023-08-16 11:42:54 -07:00
revisions.txt Documentation: document AUTO_MERGE 2023-05-23 17:21:47 +09:00
scalar.txt scalar: add --[no-]src option 2023-08-28 09:16:06 -07:00
sequencer.txt
signoff-option.txt
texi.xsl
trace2-target-values.txt
transfer-data-leaks.txt
urls-remotes.txt docs: typofixes 2023-06-12 13:52:51 -07:00
urls.txt
user-manual.conf
user-manual.txt cache.h: remove this no-longer-used header 2023-06-21 13:39:53 -07:00