When generating bitmaps, `bitmap_builder_init()` starts with an initial
selection of commits to receive bitmap coverage, and then determines a
set of "maximal" commits based on its input.
Commit 089f751360 (pack-bitmap-write: build fewer intermediate bitmaps,
2020-12-08) has extensive details, but the gist is as follows:
Each selected commit starts with one commit_mask bit in its "commit
mask" bitmap. Then, we walk the first-parent history in topological
order and OR each commit's mask into its (first) parent. Whenever that
OR results in the parent having more bits set, the child is deemed to be
non-maximal, and the frontier is pushed further back along the first
parent history.
That approach works extremely well for ordinary selected commits, whose
first-parent histories often describe real sharing between the bitmaps
we are going to write.
It struggles, however, to efficiently generate pseudo-merge bitmaps.
Unlike ordinary commits for which the above algorithm is designed,
pseudo-merges don't represent any "real" commit in history, just a
grouping of non-bitmapped reference tips. In that sense, their first
parent is just a part of a larger set, and treating them like ordinary
selected commits imposes a significant slow-down when generating bitmaps
with pseudo-merges enabled.
Consider partitioning all non-bitmapped reference tips into eight
individual pseudo-merges via the following configuration:
[bitmapPseudoMerge "all"]
pattern=refs/
threshold=now
stableSize=10000000
maxMerges=8
, the cost of generating a bitmap from scratch rises significantly:
+------------------+-----------------+---------------+---------------------+
| | no pseudo-merge | pseudo-merges | Delta |
| | | (HEAD^) | |
+------------------+-----------------+---------------+---------------------+
| elapsed | 294.1 s | 575.0 s | +280.9 s (+95.5%) |
| cycles | 1,365.5 B | 2,686.9 B | +1,321.4 B (+96.8%) |
| instructions | 1,389.8 B | 2,546.6 B | +1,156.8 B (+83.2%) |
| CPI | 0.983 | 1.055 | +0.073 (+7.4%) |
+------------------+-----------------+---------------+---------------------+
This is a particularly poor trade-off, because the time saved by these
pseudo-merges during, e.g.,
$ git rev-list --count --all --objects --use-bitmap-index
is only:
$ hyperfine -L v true,false -n 'pseudo-merges: {v}' '
GIT_TEST_USE_PSEUDO_MERGES={v} git.compile rev-list --count \
--objects --all --use-bitmap-index
'
Benchmark 1: pseudo-merges: true
Time (mean ± σ): 2.613 s ± 0.012 s [User: 2.308 s, System: 0.305 s]
Range (min … max): 2.594 s … 2.633 s 10 runs
Benchmark 2: pseudo-merges: false
Time (mean ± σ): 52.205 s ± 0.170 s [User: 51.500 s, System: 0.697 s]
Range (min … max): 51.956 s … 52.458 s 10 runs
Summary
pseudo-merges: true ran
19.98 ± 0.11 times faster than pseudo-merges: false
In other words, we pay a nearly ~5 minute penalty to generate
pseudo-merge bitmaps, but only save ~50 seconds during traversal.
The problem stems from injecting pseudo-merges into the bitmap builder
as if they were normal commits. The maximal commit selection algorithm
was simply not designed for that case, and performs predictably poorly.
The only reason we reused the maximal commit selection routine for
pseudo-merges alongside regular non-pseudo-merge commits is because we
represent them both as commit objects (where the pseudo-merge commits
just represent a made-up commit as opposed to one that actually exists
in a repository's object store).
Instead, build the regular selected commit bitmaps first, considering
only non-pseudo-merge commits in `bitmap_builder_init()`. Once those
bitmaps have been stored, build each pseudo-merge bitmap separately and
attach its parent and object bitmaps to the corresponding pseudo-merge
entry before writing the extension.
This keeps the regular bitmap build shaped like the no-pseudo-merge
case. The later pseudo-merge fill can still stop at stored selected
ancestor bitmaps, so it does not have to rewalk each pseudo-merge
closure from scratch.
When an existing bitmap has the same pseudo-merge parent set, reuse and
remap that whole pseudo-merge bitmap before falling back to
fill_bitmap_commit(). This preserves the benefit of stable pseudo-merges
while keeping the on-disk format and reader behavior unchanged.
As a result, the overhead cost for generating pseudo-merges in the above
configuration is much smaller:
+------------------+-----------------+---------------+-------------------+
| | no pseudo-merge | pseudo-merges | Delta |
| | | (HEAD) | |
+------------------+-----------------+---------------+-------------------+
| elapsed | 294.1 s | 328.4 s | +34.3 s (+11.7%) |
| cycles | 1,365.5 B | 1,529.3 B | +163.7 B (+12.0%) |
| instructions | 1,389.8 B | 1,552.8 B | +163.0 B (+11.7%) |
| CPI | 0.983 | 0.985 | +0.002 (+0.2%) |
+------------------+-----------------+---------------+-------------------+
Recall that at the start of this series, generating reachability bitmaps
took 612.5 seconds *without* pseudo-merges. With this commit, it is
still ~46.38% *faster* to generate reachability bitmaps *with*
pseudo-merges than it was to generate bitmaps wihtout them at the
beginning of this series.
The changes to implement this are mostly straightforward. We exclude
pseudo-merge commits from the existing bitmap generation, and walk over
them in a separate pass, by either reusing an existing on-disk
pseudo-merge, or passing the pseudo-merge commit itself back to the
existing routine in `fill_bitmap_commit()`.
(Note that the routine to build pseudo-merge bitmaps is the same both
before and after this change, the difference is only that we do not let
psuedo-merges participate in determining the set of maximal commits.)
The only wrinkle is that `fill_bitmap_commit()` must be taught to not
expect that all tree objects have been parsed, which is the case for any
portion of history reachable by one or more pseudo-merge(s), but not by
any non-pseudo-merge commit selected for bitmapping.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.
Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/
including full documentation and Git related tools.
See Documentation/gittutorial.adoc to get started, then see
Documentation/giteveryday.adoc for a useful minimum set of commands, and
Documentation/git-<commandname>.adoc for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial or git help tutorial, and the
documentation of each command with man git-<commandname> or git help <commandname>.
Those wishing to help with error message, usage and informational message
string translations (localization l10) should see po/README.md
(a po file is a Portable Object file that holds the translations).
Issues which are security relevant should be disclosed privately to
the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that
list the current status of various development topics to the mailing
list. The discussion following them give a good reference for
project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very
first version. He described the tool as "the stupid content tracker"
and the name as (depending on your mood):
random three-letter combination that is pronounceable, and not
actually used by any common UNIX command. The fact that it is a
mispronunciation of "get" may or may not be relevant.
stupid. contemptible and despicable. simple. Take your pick from the
dictionary of slang.
"global information tracker": you're in a good mood, and it actually
works for you. Angels sing, and a light suddenly fills the room.
"goddamn idiotic truckload of sh*t": when it breaks