"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
As we will explore in later changes, the default name-hash function used
in 'git pack-objects' has a tendency to cause collisions and cause poor
delta selection. This change creates an alternative that avoids some
collisions while preserving some amount of hash locality.
The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the most-significant digits of the hash be focused on the final
characters.
Here's the crux of the implementation:
/*
* This effectively just creates a sortable number from the
* last sixteen non-whitespace characters. Last characters
* count "most", so things that end in ".c" sort together.
*/
while ((c = *name++) != 0) {
if (isspace(c))
continue;
hash = (hash >> 2) + (c << 24);
}
As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This cause some filenames to collide more than
others. This collision is somewhat by design in order to promote hash
locality for files that have similar types (.c, .h, .json) or could be the
same file across a directory rename (a/foo.txt to b/foo.txt). This leads to
decent cross-path deltas in cases like shallow clones or packing a
repository with very few historical versions of files that share common data
with other similarly-named files.
However, when the name-hash instead leads to a large number of name-hash
collisions for otherwise unrelated files, this can lead to confusing the
delta calculation to prefer cross-path deltas over previous versions of the
same file.
The new pack_name_hash_v2() function attempts to fix this issue by
taking more of the directory path into account through its hash
function. Its naming implies that we will later wire up details for
choosing a name-hash function by version.
The first change is to be more careful about paths using non-ASCII
characters. With these characters in mind, reverse the bits in the byte
as the least-significant bits have the highest entropy and we want to
maximize their influence. This is done with some bit manipulation that
swaps the two halves, then the quarters within those halves, and then
the bits within those quarters.
The second change is to perform hash composition operations at every
level of the path. This is done by storing a 'base' hash value that
contains the hash of the parent directory. When reaching a directory
boundary, we XOR the current level's name-hash value with a downshift of
the previous level's hash. This perturbation intends to create low-bit
distinctions for paths with the same final 16 bytes but distinct parent
directory structures.
The collision rate and effectiveness of this hash function will be
explored in later changes as the function is integrated with 'git
pack-objects' and 'git repack'.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `delta_base_cache_limit` variable is a global config variable used
by multiple subsystems. Let's make this non-global, by adding this
variable independently to the subsystems where it is used.
First, add the setting to the `repo_settings` struct, this provides
access to the config in places where the repository is available. Use
this in `packfile.c`.
In `index-pack.c` we add it to the `pack_idx_option` struct and its
constructor. While the repository struct is available here, it may not
be set because `git index-pack` can be used without a repository.
In `gc.c` add it to the `gc_config` struct and also the constructor
function. The gc functions currently do not have direct access to a
repository struct.
These changes are made to remove the usage of `delta_base_cache_limit`
as a global variable in `packfile.c`. This brings us one step closer to
removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
which we complete in the next patch.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack-objects internals use a packing_data struct to track what
objects are part of the pack(s) being formed.
Since these structures contain allocated fields, failing to
appropriately free() them results in a leak. Plug that leak by
introducing a clear_packing_data() function, and call it in the
appropriate spots.
This is a fairly straightforward leak to plug, since none of the callers
expect to read any values or have any references to parts of the address
space being freed.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size and padding of `struct object_entry` is an important factor in
determining the memory usage of `pack-objects`. For this reason,
3b13a5f263 (pack-objects: reorder members to shrink struct object_entry,
2018-04-14) added a comment containing some information from pahole
indicating the size and padding of that struct.
Unfortunately, this comment hasn't been updated since 9ac3f0e5b3
(pack-objects: fix performance issues on packing large deltas,
2018-07-22), despite the size of this struct changing many times since
that commit.
To see just how often the size of object_entry changes, I skimmed the
first-parent history with this script:
for sha in $(git rev-list --first-parent --reverse 9ac3f0e..)
do
echo -n "$sha "
git checkout -q $sha
make -s pack-objects.o 2>/dev/null
pahole -C object_entry pack-objects.o | sed -n \
-e 's/\/\* size: \([0-9]*\).*/size \1/p' \
-e 's/\/\*.*padding: \([0-9]*\).*/padding \1/p' | xargs
done | uniq -f1
In between each merge, the size of object_entry changes too often to
record every instance here. But the important merges (along with their
corresponding sizes and bit paddings) in chronological order are:
ad635e82d6 (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23) size 80 padding 4
29d9e3e2c4 (Merge branch 'nd/pack-deltify-regression-fix', 2018-08-22) size 80 padding 9
3ebdef2e1b (Merge branch 'jk/pack-delta-reuse-with-bitmap', 2018-09-17) size 80 padding 8
33e4ae9c50 (Merge branch 'bc/sha-256', 2019-01-29) size 96 padding 8
(indicating that the current size of the struct is 96 bytes, with 8
padding bits).
Even though this comment was written in a good spirit, it is updated
infrequently enough that it serves to confuse rather than to encourage
contributors to update the appropriate values when the modify the
definition of object_entry.
For that reason, eliminate the confusion by removing the comment
altogether.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the `.mtimes` format is defined, supplement the pack-write API
to be able to conditionally write an `.mtimes` file along with a pack by
setting an additional flag and passing an oidmap that contains the
timestamps corresponding to each object in the pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code that is only used in builtin/pack-objects.c out of
pack-objects.h.
This fixes an issue where Solaris's SunCC hasn't been able to compile
git since 483fa7f42d (t/helper/test-bitmap.c: initial commit,
2021-03-31).
The real origin of that issue is that in 898eba5e63 (pack-objects:
refer to delta objects by index instead of pointer, 2018-04-14)
utility functions only needed by builtin/pack-objects.c were added to
pack-objects.h. Since then the header has been used in a few other
places, but 483fa7f42d was the first time it was used by test helper.
Since Solaris is stricter about linking and the oe_get_size_slow()
function lives in builtin/pack-objects.c the build started failing
with:
Undefined first referenced
symbol in file
oe_get_size_slow t/helper/test-bitmap.o
ld: fatal: symbol referencing errors. No output written to t/helper/test-tool
On other platforms this is presumably OK because the compiler and/or
linker detects that the "static inline" functions that reference
oe_get_size_slow() aren't used.
Let's solve this by moving the relevant code from pack-objects.h to
builtin/pack-objects.c. This is almost entirely a code-only move, but
because of the early macro definitions in that file referencing some
of these inline functions we need to move the definition of "static
struct packing_data to_pack" earlier, and declare these inline
functions above the macros.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already store an object_id internally, and now our sole caller also
has one. Let's stop passing around the internal hash array, which adds a
bit of type safety.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes an extreme slowdown in pack-objects when you have more
than 1023 packs. See below for numbers.
Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry,
2018-04-14), we use a complicated system to save some per-object memory.
Each object_entry structs gets a 10-bit field to store the index of the
pack it's in. We map those indices into pointers using
packing_data->in_pack_by_idx, which we initialize at the start of the
program. If we have 2^10 or more packs, then we instead create an array
of pack pointers, one per object. This is packing_data->in_pack.
So far so good. But there's one other tricky case: if a new pack arrives
after we've initialized in_pack_by_idx, it won't have an index yet. We
solve that by calling oe_map_new_pack(), which just switches on the fly
to the less-optimal in_pack mechanism, allocating the array and
back-filling it for already-seen objects.
But that logic kicks in even when we've switched to it already (whether
because we really did see a new pack, or because we had too many packs
in the first place). The result doesn't produce a wrong outcome, but
it's very slow. What happens is this:
- imagine you have a repo with 500k objects and 2000 packs that you
want to repack.
- before looking at any objects, we call prepare_in_pack_by_idx(). It
starts allocating an index for each pack. On the 1024th pack, it
sees there are too many, so it bails, leaving in_pack_by_idx as
NULL.
- while actually adding objects to the packing list, we call
oe_set_in_pack(), which checks whether the pack already has an
index. If it's one of the packs after the first 1023, then it
doesn't have one, and we'll call oe_map_new_pack().
But there's no useful work for that function to do. We're already
using in_pack, so it just uselessly walks over the complete list of
objects, trying to backfill in_pack.
And we end up doing this for almost 1000 packs (each of which may be
triggered by more than one object). And each time it triggers, we
may iterate over up to 500k objects. So in the absolute worst case,
this is quadratic in the number of objects.
The solution is simple: we don't need to bother checking whether the
pack has an index if we've already converted to using in_pack, since by
definition we're not going to use it. So we can just push the "does the
pack have a valid index" check down into that half of the conditional,
where we know we're going to use it.
The current test in p5303 sadly doesn't notice this problem, since it
maxes out at 1000 packs. If we add a new test to it at 2000 packs, it
does show the improvement:
Test HEAD^ HEAD
----------------------------------------------------------------------
5303.12: repack (2000) 26.72(39.68+0.67) 15.70(28.70+0.66) -41.2%
However, these many-pack test cases are rather expensive to run, so
adding larger and larger numbers isn't appealing. Instead, we can show
it off more easily by using GIT_TEST_FULL_IN_PACK_ARRAY, which forces us
into the absolute worst case: no pack has an index, so we'll trigger
oe_map_new_pack() pointlessly for every single object, making it truly
quadratic.
Here are the numbers (on git.git) with the included change to p5303:
Test HEAD^ HEAD
----------------------------------------------------------------------
5303.3: rev-list (1) 2.05(1.98+0.06) 2.06(1.99+0.06) +0.5%
5303.4: repack (1) 33.45(33.46+0.19) 2.75(2.73+0.22) -91.8%
5303.6: rev-list (50) 2.07(2.01+0.06) 2.06(2.01+0.05) -0.5%
5303.7: repack (50) 34.21(35.18+0.16) 3.49(4.50+0.12) -89.8%
5303.9: rev-list (1000) 2.87(2.78+0.08) 2.88(2.80+0.07) +0.3%
5303.10: repack (1000) 41.26(51.30+0.47) 10.75(20.75+0.44) -73.9%
Again, those improvements aren't realistic for the 1-pack case (because
in the real world, the full-array solution doesn't kick in), but it's
more useful to be testing the more-complicated code path.
While we're looking at this issue, we'll tweak one more thing: in
oe_map_new_pack(), we call REALLOC_ARRAY(pack->in_pack). But we'd never
expect to get here unless we're back-filling it for the first time, in
which case it would be NULL. So let's switch that to ALLOC_ARRAY() for
clarity, and add a BUG() to document the expectation. Unfortunately this
code isn't well-covered in the test suite because it's inherently racy
(it only kicks in if somebody else adds a new pack while we're in the
middle of repacking).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, the code to add an object to our packing list in
pack-objects all lived in a single function. It computed the position
within the hash table once, then used it to check if the object was
already present, and if not, to add it.
Later, in 2834bc27c1 (pack-objects: refactor the packing list,
2013-10-24), this was split into two functions: packlist_find() and
packlist_alloc(). We ended up with an "index_pos" variable that gets
passed through several functions to make it from one to the other.
The resulting code is rather confusing to follow. The "index_pos"
variable is sometimes undefined, if we don't yet have a hash table. This
works out in practice because in that case packlist_alloc() won't use it
at all, since it will have to create/grow the hash table. But it's hard
to verify that, and it does cause gcc 9.2.1's -Wmaybe-uninitialized to
complain when compiled with "-flto -O3" (rightfully, since we do pass
the uninitialized value as a function parameter, even if nobody ends up
using it).
All of this is to save computing the hash index again when we're
inserting into the hash table, which I found doesn't make a measurable
difference in the program runtime (which is not surprising, since we're
doing all kinds of other heavyweight things for each object).
Let's just drop this index_pos variable entirely, simplifying the code
(and pleasing the compiler).
We might be better still refactoring this custom hash table to use one
of our existing implementations (an oidmap, or a kh_oid_map). I stopped
short of that here, but this would be the likely first step towards that
anyway.
Reported-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only caller of packlist_alloc() already has a "struct object_id",
and we immediately copy the hash they pass us into our own object_id.
Let's avoid the unnecessary round-trip to a raw sha1 pointer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We take a raw hash pointer, but most of our callers have a "struct
object_id" already. Let's switch to taking the full struct, which will
let us continue removing uses of raw sha1 buffers.
There are two callers that do need special attention:
- in rebuild_existing_bitmaps(), we need to switch to
nth_packed_object_oid(). This incurs an extra hash copy over
pointing straight to the mmap'd sha1, but it shouldn't be measurable
compared to the rest of the operation.
- in can_reuse_delta() we already spent the effort to copy the sha1
into a "struct object_id", but now we just have to do so a little
earlier in the function (we can't easily convert that function's
callers because they may be pointing at mmap'd REF_DELTA blocks).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry,
2018-04-14), we store the source pack for each object as a small index
rather than as a pointer. When we see a new pack that has no allocated
index, we fall back to generating an array of pointers by calling
oe_map_new_pack().
Perhaps counter-intuitively, that function does not need to actually see
our new index-less pack. It only allocates and populates the array with
the existing packs, after which oe_set_in_pack() actually adds the new
pack to the array.
Let's drop the unused "struct packed_git" argument to oe_map_new_pack()
to avoid confusion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-objects" incorrectly used uninitialized mutex, which has
been corrected.
* ph/pack-objects-mutex-fix:
pack-objects: merge read_lock and lock in packing_data struct
pack-objects: move read mutex to packing_data struct
Rename the packing_data lock to obd_lock and upgrade it to a recursive
mutex to make it suitable for current read_lock usages. Additionally
remove the superfluous #ifndef NO_PTHREADS guard around mutex
initialization in prepare_packing_data as the mutex functions
themselves are already protected.
Signed-off-by: Patrick Hogg <phogg@novamoon.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ac77d0c37 ("pack-objects: shrink size field in struct object_entry",
2018-04-14) added an extra usage of read_lock/read_unlock in the newly
introduced oe_get_size_slow for thread safety in parallel calls to
try_delta(). Unfortunately oe_get_size_slow is also used in serial
code, some of which is called before the first invocation of
ll_find_deltas. As such the read mutex is not guaranteed to be
initialized.
Resolve this by moving the read mutex to packing_data and initializing
it in prepare_packing_data which is initialized in cmd_pack_objects.
Signed-off-by: Patrick Hogg <phogg@novamoon.net>
Reviewed-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few issues in the implementation of "delta-islands" feature has
been corrected.
* cc/delta-islands:
pack-objects: fix off-by-one in delta-island tree-depth computation
pack-objects: zero-initialize tree_depth/layer arrays
pack-objects: fix tree_depth and layer invariants
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) started maintaining a tree_depth array that
matches the "objects" array. We extend the array when:
1. The objects array is extended, in which case we use realloc to
extend the tree_depth array.
2. A caller asks to store a tree_depth for object N, and this is the
first such request; we create the array from scratch and store the
value for N.
In the latter case, though, we use regular xmalloc(), and the depth
values for any objects besides N is undefined. This happens to not
trigger a bug with the current code, but the reasons are quite subtle:
- we never ask about the depth for any object with index i < N. This is
because we store the depth immediately for all trees and blobs. So
any such "i" must be a non-tree, and therefore we will never need to
care about its depth (in fact, we really only care about the depth of
trees).
- there are no objects at this point with index i > N, because we
always fill in the depth for a tree immediately after its object
entry is created (we may still allocate uninitialized depth entries,
but they'll be initialized by packlist_alloc() when it initializes
the entry in the "objects" array).
So it works, but only by chance. To be defensive, let's zero the array,
which matches the "unset" values which would be handed out by
oe_tree_depth() already.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) dynamically manages a tree_depth array in
packing_data that maintains one of these invariants:
1. tree_depth is NULL (i.e., the requested options don't require us to
track tree depths)
2. tree_depth is non-NULL and has as many entries as the "objects"
array
We maintain (2) by:
a. When the objects array grows, grow tree_depth to the same size
(unless it's NULL, in which case we can leave it).
b. When a caller asks to set a depth via oe_set_tree_depth(), if
tree_depth is NULL we allocate it.
But in (b), we use the number of stored objects, _not_ the allocated
size of the objects array. So we can run into a situation like this:
1. packlist_alloc() needs to store the Nth object, so it grows the
objects array to M, where M > N.
2. oe_set_tree_depth() wants to store a depth, so it allocates an
array of length N. Now we've violated our invariant.
3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
grow the objects array, since N <= M still holds. We try to assign
to tree_depth[N+1], which is out of bounds.
That doesn't happen in our test scripts, because the repositories they
use are so small, but it's easy to trigger by running:
echo HEAD | git pack-objects --revs --delta-islands --stdout >/dev/null
in any reasonably-sized repo (like git.git).
We can fix it by always growing the array to match pack->nr_alloc, not
pack->nr_objects. Likewise for the "layer" array from fe0ac2fb7f
(pack-objects: move 'layer' into 'struct packing_data', 2018-08-16),
which has the same bug.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codebase has been cleaned up to reduce "#ifndef NO_PTHREADS".
* nd/pthreads:
Clean up pthread_create() error handling
read-cache.c: initialize copy_len to shut up gcc 8
read-cache.c: reduce branching based on HAVE_THREADS
read-cache.c: remove #ifdef NO_PTHREADS
pack-objects: remove #ifdef NO_PTHREADS
preload-index.c: remove #ifdef NO_PTHREADS
grep: clean up num_threads handling
grep: remove #ifdef NO_PTHREADS
attr.c: remove #ifdef NO_PTHREADS
name-hash.c: remove #ifdef NO_PTHREADS
index-pack: remove #ifdef NO_PTHREADS
send-pack.c: move async's #ifdef NO_PTHREADS back to run-command.c
run-command.h: include thread-utils.h instead of pthread.h
thread-utils: macros to unconditionally compile pthreads API
A mutex used in "git pack-objects" were not correctly initialized
and this caused "git repack" to dump core on Windows.
* js/pack-objects-mutex-init-fix:
pack-objects (mingw): initialize `packing_data` mutex in the correct spot
pack-objects (mingw): demonstrate a segmentation fault with large deltas
pack-objects: fix typo 'detla' -> 'delta'
Lift code from GitHub to restrict delta computation so that an
object that exists in one fork is not made into a delta against
another object that does not appear in the same forked repository.
* cc/delta-islands:
pack-objects: move 'layer' into 'struct packing_data'
pack-objects: move tree_depth into 'struct packing_data'
t5320: tests for delta islands
repack: add delta-islands support
pack-objects: add delta-islands support
pack-objects: refactor code into compute_layer_order()
Add delta-islands.{c,h}
When creating a thin pack, which allows objects to be made into a
delta against another object that is not in the resulting pack but
is known to be present on the receiving end, the code learned to
take advantage of the reachability bitmap; this allows the server
to send a delta against a base beyond the "boundary" commit.
* jk/pack-delta-reuse-with-bitmap:
pack-objects: reuse on-disk deltas for thin "have" objects
pack-bitmap: save "have" bitmap from walk
t/perf: add perf tests for fetches from a bitmapped server
t/perf: add infrastructure for measuring sizes
t/perf: factor out percent calculations
t/perf: factor boilerplate out of test_perf
In a recent update in 2.18 era, "git pack-objects" started
producing a larger than necessary packfiles by missing
opportunities to use large deltas.
* nd/pack-deltify-regression-fix:
pack-objects: fix performance issues on packing large deltas
When we serve a fetch, we pass the "wants" and "haves" from
the fetch negotiation to pack-objects. That tells us not
only which objects we need to send, but we also use the
boundary commits as "preferred bases": their trees and blobs
are candidates for delta bases, both for reusing on-disk
deltas and for finding new ones.
However, this misses some opportunities. Modulo some special
cases like shallow or partial clones, we know that every
object reachable from the "haves" could be a preferred base.
We don't use all of them for two reasons:
1. It's expensive to traverse the whole history and
enumerate all of the objects the other side has.
2. The delta search is expensive, so we want to keep the
number of candidate bases sane. The boundary commits
are the most likely to work.
When we have reachability bitmaps, though, reason 1 no
longer applies. We can efficiently compute the set of
reachable objects on the other side (and in fact already did
so as part of the bitmap set-difference to get the list of
interesting objects). And using this set conveniently
covers the shallow and partial cases, since we have to
disable the use of bitmaps for those anyway.
The second reason argues against using these bases in the
search for new deltas. But there's one case where we can use
this information for free: when we have an existing on-disk
delta that we're considering reusing, we can do so if we
know the other side has the base object. This in fact saves
time during the delta search, because it's one less delta we
have to compute.
And that's exactly what this patch does: when we're
considering whether to reuse an on-disk delta, if bitmaps
tell us the other side has the object (and we're making a
thin-pack), then we reuse it.
Here are the results on p5311 using linux.git, which
simulates a client fetching after `N` days since their last
fetch:
Test origin HEAD
--------------------------------------------------------------------------
5311.3: server (1 days) 0.27(0.27+0.04) 0.12(0.09+0.03) -55.6%
5311.4: size (1 days) 0.9M 237.0K -73.7%
5311.5: client (1 days) 0.04(0.05+0.00) 0.10(0.10+0.00) +150.0%
5311.7: server (2 days) 0.34(0.42+0.04) 0.13(0.10+0.03) -61.8%
5311.8: size (2 days) 1.5M 347.7K -76.5%
5311.9: client (2 days) 0.07(0.08+0.00) 0.16(0.15+0.01) +128.6%
5311.11: server (4 days) 0.56(0.77+0.08) 0.13(0.10+0.02) -76.8%
5311.12: size (4 days) 2.8M 566.6K -79.8%
5311.13: client (4 days) 0.13(0.15+0.00) 0.34(0.31+0.02) +161.5%
5311.15: server (8 days) 0.97(1.39+0.11) 0.30(0.25+0.05) -69.1%
5311.16: size (8 days) 4.3M 1.0M -76.0%
5311.17: client (8 days) 0.20(0.22+0.01) 0.53(0.52+0.01) +165.0%
5311.19: server (16 days) 1.52(2.51+0.12) 0.30(0.26+0.03) -80.3%
5311.20: size (16 days) 8.0M 2.0M -74.5%
5311.21: client (16 days) 0.40(0.47+0.03) 1.01(0.98+0.04) +152.5%
5311.23: server (32 days) 2.40(4.44+0.20) 0.31(0.26+0.04) -87.1%
5311.24: size (32 days) 14.1M 4.1M -70.9%
5311.25: client (32 days) 0.70(0.90+0.03) 1.81(1.75+0.06) +158.6%
5311.27: server (64 days) 11.76(26.57+0.29) 0.55(0.50+0.08) -95.3%
5311.28: size (64 days) 89.4M 47.4M -47.0%
5311.29: client (64 days) 5.71(9.31+0.27) 15.20(15.20+0.32) +166.2%
5311.31: server (128 days) 16.15(36.87+0.40) 0.91(0.82+0.14) -94.4%
5311.32: size (128 days) 134.8M 100.4M -25.5%
5311.33: client (128 days) 9.42(16.86+0.49) 25.34(25.80+0.46) +169.0%
In all cases we save CPU time on the server (sometimes
significant) and the resulting pack is smaller. We do spend
more CPU time on the client side, because it has to
reconstruct more deltas. But that's the right tradeoff to
make, since clients tend to outnumber servers. It just means
the thin pack mechanism is doing its job.
From the user's perspective, the end-to-end time of the
operation will generally be faster. E.g., in the 128-day
case, we saved 15s on the server at a cost of 16s on the
client. Since the resulting pack is 34MB smaller, this is a
net win if the network speed is less than 270Mbit/s. And
that's actually the worst case. The 64-day case saves just
over 11s at a cost of just under 11s. So it's a slight win
at any network speed, and the 40MB saved is pure bonus. That
trend continues for the smaller fetches.
The implementation itself is mostly straightforward, with
the new logic going into check_object(). But there are two
tricky bits.
The first is that check_object() needs access to the
relevant information (the thin flag and bitmap result). We
can do this by pushing these into program-lifetime globals.
The second is that the rest of the code assumes that any
reused delta will point to another "struct object_entry" as
its base. But of course the case we are interested in here
is the one where don't have such an entry!
I looked at a number of options that didn't quite work:
- we could use a flag to signal a reused delta, but it's
not a single bit. We have to actually store the oid of
the base, which is normally done by pointing to the
existing object_entry. And we'd have to modify all the
code which looks at deltas.
- we could add the reused bases to the end of the existing
object_entry array. While this does create some extra
work as later stages consider the extra entries, it's
actually not too bad (we're not sending them, so they
don't cost much in the delta search, and at most we'd
have 2*N of them).
But there's a more subtle problem. Adding to the existing
array means we might need to grow it with realloc, which
could move the earlier entries around. While many of the
references to other entries are done by integer index,
some (including ones on the stack) use pointers, which
would become invalidated.
This isn't insurmountable, but it would require quite a
bit of refactoring (and it's hard to know that you've got
it all, since it may work _most_ of the time and then
fail subtly based on memory allocation patterns).
- we could allocate a new one-off entry for the base. In
fact, this is what an earlier version of this patch did.
However, since the refactoring brought in by ad635e82d6
(Merge branch 'nd/pack-objects-pack-struct', 2018-05-23),
the delta_idx code requires that both entries be in the
main packing list.
So taking all of those options into account, what I ended up
with is a separate list of "external bases" that are not
part of the main packing list. Each delta entry that points
to an external base has a single-bit flag to do so; we have a
little breathing room in the bitfield section of
object_entry.
This lets us limit the change primarily to the oe_delta()
and oe_set_delta_ext() functions. And as a bonus, most of
the rest of the code does not consider these dummy entries
at all, saving both runtime CPU and code complexity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reduces the size of 'struct object_entry' from 88 bytes
to 80 and therefore makes packing objects more efficient.
For example on a Linux repo with 12M objects,
`git pack-objects --all` needs extra 96MB memory even if the
layer feature is not used.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reduces the size of 'struct object_entry' and therefore
makes packing objects more efficient.
This also renames cmp_tree_depth() into tree_depth_compare(),
as it is more modern to have the name of the compare functions
end with "compare".
Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.
Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.
This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.
Because the inefficiency primarily arises when an
object is deltified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.
So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.
This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.
A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.
The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I looped over the toplevel header files, creating a temporary two-line C
program for each consisting of
#include "git-compat-util.h"
#include $HEADER
This patch is the result of manually fixing errors in compiling those
tiny programs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.
These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 1MB. So if any new delta is produced and larger than
1MB, it's dropped because we can't really save such a large size
anywhere. Fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.
While this should help small machines repacking large repos without
large deltas (i.e. less memory pressure), dropping large deltas during
the delta selection process could end up with worse pack files. And if
existing packfiles already have >1MB delta and pack-objects is
instructed to not reuse deltas, all of them will be dropped on the
floor, and the resulting pack would be definitely bigger.
There is also a regression in terms of CPU/IO if we have large on-disk
deltas because fallback code needs to parse the pack every time the
delta size is needed and just access to the mmap'd pack data is enough
for extra page faults when memory is under pressure.
Both of these issues were reported on the mailing list. Here's some
numbers for comparison.
Version Pack (MB) MaxRSS(kB) Time (s)
------- --------- ---------- --------
2.17.0 5498 43513628 2494.85
2.18.0 10531 40449596 4168.94
This patch provides a better fallback that is
- cheaper in terms of cpu and io because we won't have to read
existing pack files as much
- better in terms of pack size because the pack heuristics is back to
2.17.0 time, we do not drop large deltas at all
If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 1MB limit, we stop using delta_size_ field for
this because it can't contain such size anyway. A new array of delta
size is dynamically allocated and can hold all the deltas that 2.17.0
can. This array only contains delta sizes that delta_size_ can't
contain.
With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.
Delta size limit is also raised from 1MB to 16MB to better cover
common case and avoid that extra memory consumption (99.999% deltas in
this reported repo are under 12MB; Jeff noted binary artifacts topped
out at about 3MB in some other private repos). Other fields are
shuffled around to keep this struct packed tight. We don't use more
memory in common case even with this limit update.
A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.
Access to e->delta_size_ (and by extension
pack->delta_size[e - pack->objects]) is unprotected as before, the
thread scheduler in pack-objects must make sure "e" is never updated
by two different threads.
The area under the new lock is as small as possible, avoiding locking
at all in common case, since lock contention with high thread count
could be expensive (most blobs are small enough that delta compute
time is short and we end up taking the lock very often). The previous
attempt to always hold a lock in oe_delta_size() and
oe_set_delta_size() increases execution time by 33% when repacking
linux.git with with 40 threads.
Reported-by: Elijah Newren <newren@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-objects" needs to allocate tons of "struct object_entry"
while doing its work, and shrinking its size helps the performance
quite a bit.
* nd/pack-objects-pack-struct:
ci: exercise the whole test suite with uncommon code in pack-objects
pack-objects: reorder members to shrink struct object_entry
pack-objects: shrink delta_size field in struct object_entry
pack-objects: shrink size field in struct object_entry
pack-objects: clarify the use of object_entry::size
pack-objects: don't check size when the object is bad
pack-objects: shrink z_delta_size field in struct object_entry
pack-objects: refer to delta objects by index instead of pointer
pack-objects: move in_pack out of struct object_entry
pack-objects: move in_pack_pos out of struct object_entry
pack-objects: use bitfield for object_entry::depth
pack-objects: use bitfield for object_entry::dfs_state
pack-objects: turn type and in_pack_type to bitfields
pack-objects: a bit of document about struct object_entry
read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.
Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.
gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).
This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes on 64-bit systems, before any field shrinking is done)
with 16 bits to spare (and a couple more in in_pack_header_size when
we really run out of bits).
This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).
Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime
of repack has stayed the same throughout this series. Ævar's testing
on a big monorepo he has access to (bigger than linux-2.6.git) has
shown a 7.9% reduction, so the overall expected improvement should be
somewhere around 8%.
See 87po42cwql.fsf@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql.fsf@evledraar.gmail.com/) for
more detailed numbers and a test script used to produce the numbers
cited above.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.
If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.
Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like
make test GIT_TEST_OE_SIZE=4
which forces this code on all objects larger than 3 bytes.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.
"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.
This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.
If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.
The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running
make test GIT_TEST_FULL_IN_PACK_ARRAY=1
to take advantage of other special case tests.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because of struct packing from now on we can only handle max depth
4095 (or even lower when new booleans are added in this struct). This
should be ok since long delta chain will cause significant slow down
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>