Jeff King 07e7dbf0db gc: default aggressive depth to 50
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.

The "--aggressive" flag to git-gc does three things:

  1. use "-f" to throw out existing deltas and recompute from
     scratch

  2. use "--window=250" to look harder for deltas

  3. use "--depth=250" to make longer delta chains

Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.

Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.
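
For reference, the depth and window used by "--aggressive" can
also be set per repository through the gc.aggressiveDepth and
gc.aggressiveWindow config variables. A minimal sketch,
assuming a git that supports both variables and "git -C"; the
helper name and its default arguments are only illustrative:

```shell
# Hypothetical helper: pin the aggressive-gc parameters for one repository.
# Usage: set_aggressive_params <repo-dir> [depth] [window]
set_aggressive_params () {
	repo=$1
	depth=${2:-50}
	window=${3:-250}
	git -C "$repo" config gc.aggressiveDepth "$depth" &&
	git -C "$repo" config gc.aggressiveWindow "$window"
}
```

Calling "set_aggressive_params /path/to/repo" would record
depth 50 and window 250; a later "git gc --aggressive" in that
repository should then pick those values up.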

The existing "250" numbers for "--aggressive" come
originally from this thread:

  http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/

where Linus says:

  So when I said "--depth=250 --window=250", I chose those
  numbers more as an example of extremely aggressive
  packing, and I'm not at all sure that the end result is
  necessarily wonderfully usable. It's going to save disk
  space (and network bandwidth - the delta's will be re-used
  for the network protocol too!), but there are definitely
  downsides too, and using long delta chains may
  simply not be worth it in practice.

There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:

  http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/

talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:

  https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

but again, no discussion of the timing impact.

In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):

  http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/

we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.

So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
mean "spend time now to make a better pack". It is not clear
that "--depth=250" is actually a better pack. It may be
slightly _smaller_, but it carries a run-time penalty.

Here are some more recent timings I did to verify that. They
show three things:

  - the size of the resulting pack (so disk saved to store,
    bandwidth saved on clones/fetches)

  - the cost of "rev-list --objects --all", which shows the
    effect of the delta chains on trees (commits typically
    don't delta, and the command doesn't touch the blobs at
    all)

  - the cost of "log -Sfoo", which will additionally access
    each blob

All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).
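
The measurement procedure described above could be sketched
roughly as follows. This is not the script used for the numbers
below; the function names and the DRY_RUN switch are
assumptions made for illustration, and the "time" keyword
assumes a bash-like shell when commands are actually run:

```shell
# Illustrative sketch only; not the original test script.
# With DRY_RUN=1 each command is printed instead of executed.
run () {
	if test -n "$DRY_RUN"
	then
		echo "$*"
	else
		"$@"
	fi
}

# Time a command and discard its standard output.
timed () {
	if test -n "$DRY_RUN"
	then
		echo "time $*"
	else
		time "$@" >/dev/null
	fi
}

# Usage: bench_depths <depth>...   (run inside the repository)
bench_depths () {
	for d in "$@"
	do
		# Throw out existing deltas and repack at this depth.
		run git repack -adf --depth="$d" --window=250 || return 1
		# "size-pack" in the output is the pack size in KiB.
		run git count-objects -v || return 1
		# Three runs each, so the best one can be picked by hand.
		for i in 1 2 3
		do
			timed git rev-list --objects --all || return 1
		done
		for i in 1 2 3
		do
			timed git log -Sfoo || return 1
		done
	done
}
```

Running "DRY_RUN=1 bench_depths 250 100 50 10" only prints the
commands, which is a cheap way to check the plan before
spending the CPU time on a real repository.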

The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.

The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.

Each test is done for four depths: 250 (the current
aggressive value), 50 (the current non-aggressive default,
which tested well previously), 100 (to show something on the
larger side, which previous tests showed was not a good
tradeoff), and 10 (the very old default, which previous tests
showed was worse than 50).

Here are the numbers for linux.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088s  |   n/a
    100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060s  |  -9.5%
     50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
     10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%

and for git.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
    100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
     50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
     10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%

You can see that the CPU savings for regular operations improve as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).

But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.
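
The percentage columns in the tables are relative changes
against the depth=250 row. A small sketch of that arithmetic,
assuming awk is available; the helper name is hypothetical:

```shell
# Hypothetical helper: relative change of <new> against <base>, in percent.
# Usage: pct_change <base> <new>
pct_change () {
	awk -v base="$1" -v new="$2" \
		'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}
```

For example, "pct_change 48.159 37.778" reproduces the rev-list
figure for linux.git at depth 50, and "pct_change 967 979"
reproduces the matching size figure.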

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:53:19 -07:00
Documentation gc: default aggressive depth to 50 2016-08-11 11:53:19 -07:00
block-sha1
builtin gc: default aggressive depth to 50 2016-08-11 11:53:19 -07:00
compat
contrib Merge branch 'sg/completion-commit-cleanup' into maint 2015-08-03 10:41:33 -07:00
ewah Merge branch 'es/osx-header-pollutes-mask-macro' into maint 2015-07-15 11:41:24 -07:00
git-gui
gitk-git
gitweb
mergetools
perl
po l10n: de.po: translation fix for fall-back to 3way merge 2015-06-12 20:40:04 +02:00
ppc
t Sync with 2.3.10 2015-09-28 15:28:31 -07:00
templates
vcs-svn
xdiff
.gitattributes
.gitignore
.mailmap
COPYING
GIT-VERSION-GEN Git 2.4.11 2016-03-17 11:23:05 -07:00
INSTALL
LGPL-2.1
Makefile Merge branch 'jk/make-fix-dependencies' into maint 2015-06-25 11:02:16 -07:00
README
RelNotes Git 2.4.11 2016-03-17 11:23:05 -07:00
abspath.c
aclocal.m4
advice.c
advice.h
alias.c
alloc.c
archive-tar.c
archive-zip.c
archive.c
archive.h
argv-array.c
argv-array.h
attr.c Merge branch 'pt/xdg-config-path' into maint 2015-06-05 12:00:04 -07:00
attr.h
base85.c
bisect.c
bisect.h
blob.c
blob.h
branch.c
branch.h
builtin.h
bulk-checkin.c
bulk-checkin.h
bundle.c
bundle.h
cache-tree.c
cache-tree.h
cache.h Merge branch 'jk/index-pack-reduce-recheck' into maint 2015-07-27 12:21:38 -07:00
check-builtins.sh
check-racy.c
check_bindir
color.c
color.h
column.c
column.h
combine-diff.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
command-list.txt
commit-slab.h
commit.c Merge branch 'jk/squelch-missing-link-warning-for-unreachable' into maint 2015-06-25 11:02:10 -07:00
commit.h Merge branch 'jk/squelch-missing-link-warning-for-unreachable' into maint 2015-06-25 11:02:10 -07:00
config.c config.c: fix writing config files on Windows network shares 2015-06-30 11:01:59 -07:00
config.mak.in
config.mak.uname
configure.ac
connect.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
connect.h
connected.c
connected.h
convert.c
convert.h
copy.c
credential-cache--daemon.c
credential-cache.c
credential-store.c Merge branch 'pt/xdg-config-path' into maint 2015-06-05 12:00:04 -07:00
credential.c
credential.h
csum-file.c
csum-file.h
ctype.c
daemon.c daemon: unbreak NO_IPV6 build regression 2015-05-05 11:03:24 -07:00
date.c Merge branch 'jc/epochtime-wo-tz' into maint-2.3 2015-05-11 14:33:58 -07:00
decorate.c
decorate.h
delta.h
diff-delta.c
diff-lib.c
diff-no-index.c
diff.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
diff.h tree-diff: catch integer overflow in combine_diff_path allocation 2016-03-16 10:41:02 -07:00
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c react to errors in xdi_diff 2015-09-28 14:57:10 -07:00
diffcore-rename.c
diffcore.h
dir.c Merge branch 'rs/janitorial' into maint 2015-06-16 14:33:47 -07:00
dir.h
editor.c
entry.c
environment.c
exec_cmd.c Merge branch 'jk/git-no-more-argv0-path-munging' into maint 2015-05-26 13:49:18 -07:00
exec_cmd.h
fast-import.c
fetch-pack.c Merge branch 'me/fetch-into-shallow-safety' into maint 2015-07-15 11:41:20 -07:00
fetch-pack.h
fmt-merge-msg.h
fsck.c Merge branch 'jc/fsck-retire-require-eoh' into maint 2015-07-27 12:21:46 -07:00
fsck.h
generate-cmdlist.sh
gettext.c
gettext.h
git-add--interactive.perl
git-am.sh Merge branch 'pt/am-abort-fix' into maint 2015-08-03 10:41:32 -07:00
git-archimport.perl
git-bisect.sh
git-compat-util.h add helpers for detecting size_t overflow 2016-03-16 10:41:02 -07:00
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-difftool.perl
git-filter-branch.sh filter-branch: avoid passing commit message through sed 2015-04-29 10:01:04 -07:00
git-instaweb.sh
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py
git-parse-remote.sh
git-pull.sh Merge branch 'pt/pull-tags-error-diag' into maint 2015-06-25 11:02:12 -07:00
git-quiltimport.sh
git-rebase--am.sh rebase: return non-zero error code if format-patch fails 2015-07-08 15:36:42 -07:00
git-rebase--interactive.sh Merge branch 'js/rebase-i-clean-up-upon-continue-to-skip' into maint 2015-08-03 10:41:34 -07:00
git-rebase--merge.sh
git-rebase.sh Merge branch 'jk/rebase-quiet-noop' into maint 2015-05-26 13:49:23 -07:00
git-relink.perl
git-remote-testgit.sh
git-request-pull.sh
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-stash.sh Merge branch 'jk/stash-require-clean-index' into maint 2015-06-25 23:03:27 -07:00
git-submodule.sh submodule: allow only certain protocols for submodule fetches 2015-09-23 11:35:48 -07:00
git-svn.perl
git-web--browse.sh
git.c
git.rc
git.spec.in
gpg-interface.c
gpg-interface.h
graph.c
graph.h
grep.c
grep.h
hashmap.c
hashmap.h
help.c
help.h
hex.c
http-backend.c http-backend: spool ref negotiation requests to buffer 2015-05-25 20:43:18 -07:00
http-fetch.c
http-push.c http-push: stop using name_path 2016-03-16 10:41:02 -07:00
http-walker.c
http.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
http.h
ident.c
imap-send.c
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
line-log.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
line-log.h
line-range.c
line-range.h
list-objects.c list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
list-objects.h list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
ll-merge.c xdiff: reject files larger than ~1GB 2015-09-28 14:57:23 -07:00
ll-merge.h
lockfile.c
lockfile.h
log-tree.c log: do not shorten decoration names too early 2015-05-13 12:40:57 -07:00
log-tree.h
mailmap.c
mailmap.h
match-trees.c
merge-blobs.c
merge-blobs.h
merge-recursive.c use file_exists() to check if a file exists in the worktree 2015-05-20 13:49:10 -07:00
merge-recursive.h
merge.c
mergesort.c
mergesort.h
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c
notes-merge.h
notes-utils.c
notes-utils.h
notes.c Sync with 2.2.3 2015-09-04 10:29:28 -07:00
notes.h
object.c Merge branch 'jk/type-from-string-gently' into maint 2015-05-13 14:05:54 -07:00
object.h
pack-bitmap-write.c list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
pack-bitmap.c list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
pack-bitmap.h
pack-check.c
pack-objects.c
pack-objects.h
pack-revindex.c
pack-revindex.h
pack-write.c
pack.h
pager.c Merge branch 'jc/unexport-git-pager-in-use-in-pager' into maint 2015-07-27 12:21:44 -07:00
parse-options-cb.c
parse-options.c
parse-options.h Merge branch 'iu/fix-parse-options-h-comment' into maint 2015-04-21 12:12:20 -07:00
patch-delta.c
patch-ids.c
patch-ids.h
path.c Merge branch 'pt/xdg-config-path' into maint 2015-06-05 12:00:04 -07:00
pathspec.c
pathspec.h
pkt-line.c
pkt-line.h
preload-index.c
pretty.c
prio-queue.c
prio-queue.h
progress.c
progress.h
prompt.c
prompt.h
quote.c
quote.h
reachable.c list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
reachable.h
read-cache.c Merge branch 'jk/diagnose-config-mmap-failure' into maint 2015-06-25 11:02:11 -07:00
reflog-walk.c
reflog-walk.h
refs.c Merge branch 'mh/reporting-broken-refs-from-for-each-ref' into maint 2015-08-03 10:41:31 -07:00
refs.h
remote-curl.c
remote-testsvn.c
remote.c
remote.h
replace_object.c
rerere.c Merge branch 'jk/rerere-forget-check-enabled' into maint 2015-06-05 12:00:25 -07:00
rerere.h
resolve-undo.c
resolve-undo.h
revision.c list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
revision.h list-objects: pass full pathname to callbacks 2016-03-16 10:41:04 -07:00
run-command.c
run-command.h
send-pack.c Merge branch 'jc/push-cert' into maint 2015-04-27 12:23:50 -07:00
send-pack.h
sequencer.c
sequencer.h
server-info.c
setup.c setup_git_directory: delay core.bare/core.worktree errors 2015-05-29 09:27:27 -07:00
sh-i18n--envsubst.c
sha1-array.c
sha1-array.h
sha1-lookup.c
sha1-lookup.h
sha1_file.c Sync with 2.3.9 2015-09-04 10:34:19 -07:00
sha1_name.c use file_exists() to check if a file exists in the worktree 2015-05-20 13:49:10 -07:00
shallow.c
shell.c
shortlog.h
show-index.c
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
strbuf.c strbuf: strbuf_read_file() should return ssize_t 2015-07-03 18:25:02 -07:00
strbuf.h strbuf: strbuf_read_file() should return ssize_t 2015-07-03 18:25:02 -07:00
streaming.c
streaming.h
string-list.c
string-list.h
submodule.c use file_exists() to check if a file exists in the worktree 2015-05-20 13:49:10 -07:00
submodule.h
symlinks.c
tag.c
tag.h
tar.h
test-chmtime.c
test-config.c
test-ctype.c
test-date.c
test-delta.c
test-dump-cache-tree.c
test-dump-split-index.c
test-genrandom.c
test-hashmap.c
test-index-version.c
test-line-buffer.c
test-match-trees.c
test-mergesort.c
test-mktemp.c
test-parse-options.c
test-path-utils.c
test-prio-queue.c
test-read-cache.c
test-regex.c
test-revision-walking.c
test-run-command.c
test-scrap-cache-tree.c
test-sha1-array.c
test-sha1.c
test-sha1.sh
test-sigchain.c
test-string-list.c
test-subprocess.c
test-svn-fe.c
test-urlmatch-normalization.c
test-wildmatch.c
thread-utils.c
thread-utils.h
trace.c
trace.h
trailer.c
trailer.h
transport-helper.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
transport.c Sync with 2.3.10 2015-09-28 15:28:31 -07:00
transport.h Sync with 2.3.10 2015-09-28 15:28:31 -07:00
tree-diff.c tree-diff: catch integer overflow in combine_diff_path allocation 2016-03-16 10:41:02 -07:00
tree-walk.c
tree-walk.h
tree.c Merge branch 'jk/squelch-missing-link-warning-for-unreachable' into maint 2015-06-25 11:02:10 -07:00
tree.h Merge branch 'jk/squelch-missing-link-warning-for-unreachable' into maint 2015-06-25 11:02:10 -07:00
unicode_width.h
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Sync with 2.2.3 2015-09-04 10:29:28 -07:00
unpack-trees.h
update_unicode.sh
upload-pack.c
url.c
url.h
urlmatch.c
urlmatch.h
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h Merge branch 'es/utf8-stupid-compiler-workaround' into maint 2015-07-15 11:41:23 -07:00
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c
walker.h
wildmatch.c
wildmatch.h
wrap-for-bin.sh
wrapper.c
write_or_die.c
ws.c
wt-status.c Merge branch 'sg/commit-cleanup-scissors' into maint 2015-08-03 10:41:30 -07:00
wt-status.h
xdiff-interface.c xdiff: reject files larger than ~1GB 2015-09-28 14:57:23 -07:00
xdiff-interface.h xdiff: reject files larger than ~1GB 2015-09-28 14:57:23 -07:00
zlib.c

README

////////////////////////////////////////////////////////////////

	Git - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-commandname.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with "man gittutorial" or "git help tutorial", and the
documentation of each command with "man git-commandname" or "git help
commandname".

CVS users may also want to read Documentation/gitcvs-migration.txt
("man gitcvs-migration" or "git help cvs-migration" if git is
installed).

Many Git online resources are accessible from http://git-scm.com/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org (read
Documentation/SubmittingPatches for instructions on patch submission).
To subscribe to the list, send an email with just "subscribe git" in
the body to majordomo@vger.kernel.org. The mailing list archives are
available at http://news.gmane.org/gmane.comp.version-control.git/,
http://marc.info/?l=git and other archival sites.

The maintainer frequently sends the "What's cooking" reports that
list the current status of various development topics to the mailing
list.  The discussion following them gives a good reference for
project status, development direction and remaining tasks.