git/builtin
René Scharfe 2d53975488 name-rev: release unused name strings
name_rev() assigns a name to a commit and its parents and grandparents
and so on.  Commits share their name string with their first parent,
which in turn does the same, recursively to the root.  That saves a lot
of allocations.  When a better name is found, the old name is replaced,
but its memory is not released.  That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times?  Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0.  These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.

That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.

For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly.  Here's the output of GNU
time before:

0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.534 s ±  0.006 s    [User: 1.039 s, System: 0.494 s]
  Range (min … max):    1.522 s …  1.542 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.338 s ±  0.006 s    [User: 1.047 s, System: 0.291 s]
  Range (min … max):    1.327 s …  1.346 s    10 runs

For the Linux repo it doesn't pay off; memory usage only gets down from:

0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
  Range (min … max):   824.1 ms … 838.9 ms    10 runs

... to:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
  Range (min … max):   843.4 ms … 854.3 ms    10 runs

Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca.  1000x longer in the latter.

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
..
add.c Merge branch 'js/add-p-in-c' 2019-12-25 11:22:01 -08:00
am.c Merge branch 'en/merge-recursive-cleanup' 2019-10-15 13:47:59 +09:00
annotate.c
apply.c
archive.c
bisect--helper.c Merge branch 'mr/bisect-save-pointer-to-const-string' 2019-12-25 11:22:01 -08:00
blame.c Merge branch 'sg/blame-indent-heuristics-is-now-the-default' 2019-12-01 09:04:30 -08:00
branch.c l10n: minor case fix in 'git branch' '--unset-upstream' description 2019-12-09 12:30:55 -08:00
bundle.c bundle-verify: add --quiet 2019-11-11 11:46:29 +09:00
cat-file.c Merge branch 'cc/multi-promisor' 2019-09-18 11:50:09 -07:00
check-attr.c
check-ignore.c treewide: rename 'exclude' methods to 'pattern' 2019-09-05 14:05:12 -07:00
check-mailmap.c
check-ref-format.c
checkout-index.c Merge branch 'nd/the-index-final' 2019-02-06 22:05:23 -08:00
checkout.c checkout, restore: support the --pathspec-from-file option 2019-12-04 10:10:37 -08:00
clean.c Merge branch 'en/clean-nested-with-ignored' 2019-10-11 14:24:46 +09:00
clone.c Merge branch 'ds/sparse-cone' 2019-12-25 11:21:58 -08:00
column.c builtin: consistently pass cmd_* prefix to parse_options 2019-05-13 14:22:53 +09:00
commit-graph.c test-tool: use 'read-graph' helper 2019-11-13 11:14:16 +09:00
commit-tree.c commit-tree: utilize parse-options api 2019-03-08 10:31:24 +09:00
commit.c Merge branch 'am/pathspec-from-file' 2019-12-25 11:21:57 -08:00
config.c
count-objects.c
credential.c
describe.c Merge branch 'ew/hashmap' 2019-10-15 13:48:02 +09:00
diff-files.c
diff-index.c
diff-tree.c Merge branch 'en/combined-all-paths' 2019-03-07 09:59:54 +09:00
diff.c Merge branch 'nd/diff-parseopt-4' 2019-04-25 16:41:12 +09:00
difftool.c hashmap: remove type arg from hashmap_{get,put,remove}_entry 2019-10-07 10:20:12 +09:00
env--helper.c env--helper: mark a file-local symbol as static 2019-07-11 14:31:04 -07:00
fast-export.c Merge branch 'ew/hashmap' 2019-10-15 13:48:02 +09:00
fetch-pack.c fetch_pack(): drop unused parameters 2019-03-20 18:34:09 +09:00
fetch.c Merge branch 'ds/commit-graph-set-size-mult' 2020-01-06 14:17:51 -08:00
fmt-merge-msg.c Merge branch 'hi/gpg-use-check-signature' 2019-12-10 13:11:45 -08:00
for-each-ref.c parse_opt_ref_sorting: always use with NONEG flag 2019-03-21 12:03:35 +09:00
fsck.c fsck: only provide oid/type in fsck_error callback 2019-10-28 14:05:18 +09:00
gc.c Fix spelling errors in code comments 2019-11-10 16:00:54 +09:00
get-tar-commit-id.c builtin/get-tar-commit-id: make hash size independent 2019-04-01 11:57:39 +09:00
grep.c Merge branch 'cb/pcre2-chartables-leakfix' 2019-10-23 14:43:11 +09:00
hash-object.c builtin: consistently pass cmd_* prefix to parse_options 2019-05-13 14:22:53 +09:00
help.c completion: add more parameter value completion 2019-02-20 12:31:56 -08:00
index-pack.c Merge branch 'bc/object-id-part17' 2019-10-11 14:24:46 +09:00
init-db.c Merge branch 'nd/init-relative-template-fix' into maint 2019-07-25 14:27:06 -07:00
interpret-trailers.c interpret-trailers: load default config 2019-06-19 07:12:49 -07:00
log.c Merge branch 'dl/format-patch-notes-config-fixup' 2019-12-25 11:21:58 -08:00
ls-files.c Merge branch 'ds/include-exclude' 2019-09-30 13:19:32 +09:00
ls-remote.c parse_opt_ref_sorting: always use with NONEG flag 2019-03-21 12:03:35 +09:00
ls-tree.c
mailinfo.c
mailsplit.c
merge-base.c
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c Ensure index matches head before invoking merge machinery, round N 2019-08-19 10:08:03 -07:00
merge-tree.c Merge branch 'jk/tree-walk-overflow' 2019-08-22 12:34:10 -07:00
merge.c Merge branch 'tg/stash-refresh-index' 2019-10-07 11:32:53 +09:00
mktag.c
mktree.c mktree: drop unused length parameter 2019-05-13 14:22:54 +09:00
multi-pack-index.c multi-pack-index: add [--[no-]progress] option. 2019-10-23 12:05:06 +09:00
mv.c
name-rev.c name-rev: release unused name strings 2020-02-05 10:24:15 -08:00
notes.c notes: fix minimum number of parameters to "copy" subcommand 2019-10-18 09:43:10 +09:00
pack-objects.c Fix spelling errors in code comments 2019-11-10 16:00:54 +09:00
pack-redundant.c object-store: rename and expand packed_git's sha1 member 2019-04-01 11:57:38 +09:00
pack-refs.c Honor core.precomposeUnicode in more places 2019-04-26 10:54:03 +09:00
patch-id.c patch-id: use oid_to_hex() to print multiple object IDs 2019-12-09 12:26:40 -08:00
prune-packed.c Merge branch 'rj/prune-packed-excess-args' 2019-03-07 09:59:55 +09:00
prune.c object: convert lookup_object() to use object_id 2019-06-20 10:18:09 -07:00
pull.c pull, fetch: add --set-upstream option 2019-08-19 13:05:58 -07:00
push.c push: use skip_prefix() instead of starts_with() 2019-11-27 11:18:39 +09:00
range-diff.c range-diff: clear `other_arg` at end of function 2019-12-06 12:36:53 -08:00
read-tree.c sparse-checkout: update working directory in-process 2019-11-22 16:11:44 +09:00
rebase.c Revert "Merge branch 'ra/rebase-i-more-options'" 2020-01-12 13:25:18 -08:00
receive-pack.c builtin/receive-pack: replace sha1_to_hex 2019-08-19 15:04:59 -07:00
reflog.c Merge branch 'jk/loose-object-cache-oid' 2019-02-06 22:05:27 -08:00
remote-ext.c
remote-fd.c
remote.c remote: pass NULL to read_ref_full() because object ID is not needed 2019-12-11 13:48:46 -08:00
repack.c Merge branch 'wb/midx-progress' 2019-11-10 18:02:14 +09:00
replace.c Merge branch 'bc/object-id-part17' 2019-10-11 14:24:46 +09:00
rerere.c
reset.c Merge branch 'am/pathspec-from-file' 2019-12-10 13:11:41 -08:00
rev-list.c Merge branch 'rs/dedup-includes' 2019-10-11 14:24:48 +09:00
rev-parse.c rev-parse: make --show-toplevel without a worktree an error 2019-11-20 10:19:58 +09:00
revert.c Merge branch 'ra/cherry-pick-revert-skip' 2019-07-19 11:30:21 -07:00
rm.c Merge branch 'jc/denoise-rm-to-resolve' into maint 2019-07-29 12:38:17 -07:00
send-pack.c
shortlog.c
show-branch.c show-branch: drop unused parameter from show_independent() 2019-05-13 14:22:54 +09:00
show-index.c builtin/show-index: replace sha1_to_hex 2019-08-19 15:04:59 -07:00
show-ref.c Merge branch 'en/unicode-in-refnames' 2019-05-19 16:45:30 +09:00
sparse-checkout.c sparse-checkout: list directories in cone mode 2019-12-30 09:07:18 -08:00
stash.c Merge branch 'tg/stash-refresh-index' 2019-12-01 09:04:37 -08:00
stripspace.c
submodule--helper.c Merge branch 'jt/clone-recursesub-ref-advise' 2019-12-10 13:11:43 -08:00
symbolic-ref.c
tag.c tag: add tag.gpgSign config option to force all tags be GPG-signed 2019-06-05 14:39:28 -07:00
unpack-file.c
unpack-objects.c builtin/unpack-objects.c: show throughput progress 2019-11-20 10:30:18 +09:00
update-index.c Merge branch 'js/update-index-ignore-removal-for-skip-worktree' 2019-11-10 18:02:16 +09:00
update-ref.c
update-server-info.c
upload-archive.c
upload-pack.c builtin: consistently pass cmd_* prefix to parse_options 2019-05-13 14:22:53 +09:00
var.c
verify-commit.c Merge branch 'jk/no-system-includes-in-dot-c' 2019-07-31 14:38:56 -07:00
verify-pack.c
verify-tag.c verify-tag: drop signal.h include 2019-06-19 08:19:21 -07:00
worktree.c Merge branch 'pb/no-recursive-reset-hard-in-worktree-add' 2019-12-01 09:04:31 -08:00
write-tree.c cmd_{read,write}_tree: rename "unused" variable that is used 2019-05-13 14:22:53 +09:00