Documentation: describe incremental MIDX bitmaps

Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.

This commit begins by describing the relevant format and usage details
for incremental MIDX bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau 2025-03-20 13:56:28 -04:00 committed by Junio C Hamano
parent 4a9179d151
commit 4887bdd4c7
1 changed file with 72 additions and 0 deletions

@@ -164,6 +164,78 @@ objects_nr($H2) + objects_nr($H1) + i
(in the C implementation, this is often computed as `i +
m->num_objects_in_base`).
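To make the position arithmetic above concrete, here is a small C sketch (not taken from Git) that models a MIDX chain as an array of layers, base first, and derives each layer's `num_objects_in_base` by summing the layers before it. The `struct layer` type and the `compute_bases()` and `global_pos()` helpers are hypothetical names chosen for illustration; only the `i + m->num_objects_in_base` relationship comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MIDX layer; layers[0] is the base. */
struct layer {
	uint32_t num_objects;         /* objects contributed by this layer */
	uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/* Fill in num_objects_in_base for each layer from the layers before it. */
static void compute_bases(struct layer *layers, size_t nr)
{
	uint32_t total = 0;
	for (size_t k = 0; k < nr; k++) {
		layers[k].num_objects_in_base = total;
		total += layers[k].num_objects;
	}
}

/* Translate a position local to layer m into a global position. */
static uint32_t global_pos(const struct layer *m, uint32_t i)
{
	assert(i < m->num_objects);
	return i + m->num_objects_in_base;
}
```

For a three-layer chain whose layers hold 10, 5, and 7 objects, the object at local position 3 in the topmost layer lands at global position 3 + 10 + 5 = 18.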

=== Pseudo-pack order for incremental MIDXs

The original implementation of multi-pack reachability bitmaps defined
the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
titled "multi-pack-index reverse indexes") roughly as follows:

____
In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
objects in packs stored by the MIDX, laid out in pack order, and the
packs arranged in MIDX order (with the preferred pack coming first).
____
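The quoted definition can be sketched in C as a sort over objects that have already been de-duplicated, so that each object appears exactly once, in the pack that was selected for it. The `struct obj` type and `pseudo_pack_order()` are hypothetical, and how duplicates are resolved is deliberately left out of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one (already de-duplicated) object and its chosen pack. */
struct obj {
	uint32_t pack;   /* the pack's index in MIDX order */
	uint64_t offset; /* offset of the object within that pack */
	uint32_t rank;   /* sort key derived from the pack */
};

static int cmp_obj(const void *va, const void *vb)
{
	const struct obj *a = va, *b = vb;
	if (a->rank != b->rank)
		return a->rank < b->rank ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Arrange objects in pseudo-pack order: the preferred pack first, then
 * the remaining packs in MIDX order, with each pack's objects in
 * offset order.
 */
static void pseudo_pack_order(struct obj *objs, size_t nr, uint32_t preferred)
{
	assert(nr == 0 || objs != NULL);
	for (size_t i = 0; i < nr; i++)
		objs[i].rank = objs[i].pack == preferred ? 0 : objs[i].pack + 1;
	qsort(objs, nr, sizeof(*objs), cmp_obj);
}
```

Giving the preferred pack rank 0 and every other pack `pack + 1` moves the preferred pack to the front while keeping the relative MIDX order of the rest.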

In the incremental MIDX design, we extend this definition to include
objects from multiple layers of the MIDX chain. The pseudo-pack order
for incremental MIDXs is determined by concatenating the pseudo-pack
ordering for each layer of the MIDX chain in order. Formally, two objects
`o1` and `o2` are compared as follows:

1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
`o1` sorts ahead of `o2`.

2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
MIDX layer has no base (i.e. it is the first layer in the chain), then
if exactly one of `pack(o1)` and `pack(o2)` is preferred, the object in
the preferred pack sorts ahead of the other. If the layer does have a
base, or if neither pack is preferred, then if `pack(o1)` appears
earlier in that MIDX layer's pack order, `o1` sorts ahead of `o2`;
likewise, if `pack(o2)` appears earlier, the opposite is true.

3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
same MIDX layer. Sort `o1` and `o2` by their offset within their
containing packfile.

Note that the preferred pack is a property of the MIDX chain, not of the
individual layers. In principle, we could introduce a per-layer
preferred pack, but this is less relevant now that we can perform
multi-pack reuse across the set of packs in a MIDX.

=== Reachability bitmaps and incremental MIDXs

Each layer of an incremental MIDX chain may have its objects (and the
objects from any previous layer in the same MIDX chain) represented in
its own `*.bitmap` file.

The structure of a `*.bitmap` file belonging to an incremental MIDX
chain is identical to that of a non-incremental MIDX bitmap, or a
classic single-pack bitmap. Since new objects are only ever appended to
the end of the incremental MIDX's pseudo-pack order (see above), the
positions of existing objects never change, so it is possible to extend
a bitmap when appending a new layer to the end of a MIDX chain.
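One way to see why appending is safe: a bitmap computed against the earlier layers keeps its meaning in the new layer, because none of its bit positions move. The sketch below uses a hypothetical uncompressed `struct bitmap` (Git's on-disk bitmaps are EWAH-compressed) and a hypothetical `extend_bitmap()` helper that copies an older commit's bitmap and then sets bits for new-layer objects at `num_objects_in_base + i`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical uncompressed bitmap, for illustration only. */
struct bitmap {
	uint64_t *words;
	size_t nr_words;
};

static int bitmap_get(const struct bitmap *b, size_t pos)
{
	size_t w = pos / 64;
	return w < b->nr_words && (b->words[w] >> (pos % 64)) & 1;
}

/* Set a bit, growing (and zero-filling) the bitmap as needed. */
static void bitmap_set(struct bitmap *b, size_t pos)
{
	size_t w = pos / 64;
	if (w >= b->nr_words) {
		uint64_t *grown = realloc(b->words, (w + 1) * sizeof(*grown));
		assert(grown);
		memset(grown + b->nr_words, 0,
		       (w + 1 - b->nr_words) * sizeof(*grown));
		b->words = grown;
		b->nr_words = w + 1;
	}
	b->words[w] |= UINT64_C(1) << (pos % 64);
}

/*
 * Start from an older bitmap (whose positions are still valid) and add
 * the new layer's objects, given as layer-local positions.
 */
static struct bitmap extend_bitmap(const struct bitmap *old,
				   uint32_t num_objects_in_base,
				   const uint32_t *local, size_t nr)
{
	struct bitmap out = { NULL, 0 };
	if (old->nr_words) {
		out.words = malloc(old->nr_words * sizeof(*out.words));
		assert(out.words);
		memcpy(out.words, old->words, old->nr_words * sizeof(*out.words));
		out.nr_words = old->nr_words;
	}
	for (size_t i = 0; i < nr; i++)
		bitmap_set(&out, (size_t)num_objects_in_base + local[i]);
	return out;
}
```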

(Note: it is likewise possible to compress a contiguous sequence of
incremental MIDX layers, and their `*.bitmap` files, into a single layer
and `*.bitmap`, but this is not yet implemented.)

The object positions used are global within the pseudo-pack order, so
each subsequent layer will have, for example, `m->num_objects_in_base`
leading `0` bits in each of its four type bitmaps. This follows from the
fact that we only write type bitmap entries for objects present in the
layer that the bitmap directly corresponds to.
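As an illustration of those leading zero bits, the following sketch builds the four type bitmaps for a single layer. To keep it short, each bitmap is stored one byte per bit, and the `enum obj_type` values and `build_type_bitmaps()` are hypothetical; only this layer's objects receive bits, at their global positions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The four object types that get type bitmaps (hypothetical names). */
enum obj_type { T_COMMIT, T_TREE, T_BLOB, T_TAG, T_NR };

/*
 * Build one layer's type bitmaps, indexed by global position and stored
 * one byte per bit. types[] lists this layer's objects in pseudo-pack
 * order; positions below num_objects_in_base are left as 0.
 */
static void build_type_bitmaps(uint8_t *out[T_NR],
			       const enum obj_type *types, uint32_t nr,
			       uint32_t num_objects_in_base)
{
	size_t len = (size_t)num_objects_in_base + nr;

	for (int t = 0; t < T_NR; t++) {
		out[t] = calloc(len ? len : 1, 1);
		assert(out[t]);
	}
	for (uint32_t i = 0; i < nr; i++)
		out[types[i]][num_objects_in_base + i] = 1;
}
```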

Note also that only the bitmap pertaining to the most recent layer in an
incremental MIDX chain is used to store reachability information about
the interesting and uninteresting objects in a reachability query.
Earlier bitmap layers are only used to look up commit and pseudo-merge
bitmaps from that layer, as well as the type-level bitmaps for objects
in that layer.
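The following sketch illustrates this division of labor: stored commit bitmaps are looked up in the layer whose range of global positions contains the commit, while the query result is accumulated in a single bitmap covering the whole chain. The `struct bitmap_layer` type, its toy single-word bitmaps (so at most 64 objects), and all helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer view of a *.bitmap file. */
struct bitmap_layer {
	uint32_t num_objects_in_base;
	uint32_t num_objects;
	const uint32_t *commits; /* global positions of commits with bitmaps */
	size_t nr_commits;
	const uint64_t *bitmaps; /* toy one-word bitmap per such commit */
};

/* Find the layer whose range of global positions contains pos. */
static const struct bitmap_layer *layer_for(const struct bitmap_layer *chain,
					    size_t nr, uint32_t pos)
{
	/* Walk from the newest layer toward the base. */
	for (size_t k = nr; k-- > 0; ) {
		const struct bitmap_layer *l = &chain[k];
		if (pos < l->num_objects_in_base)
			continue;
		return pos < l->num_objects_in_base + l->num_objects ? l : NULL;
	}
	return NULL;
}

/* Look up a stored commit bitmap in the layer containing that commit. */
static const uint64_t *find_commit_bitmap(const struct bitmap_layer *chain,
					  size_t nr, uint32_t pos)
{
	const struct bitmap_layer *l = layer_for(chain, nr, pos);
	if (!l)
		return NULL;
	for (size_t i = 0; i < l->nr_commits; i++)
		if (l->commits[i] == pos)
			return &l->bitmaps[i];
	return NULL;
}

/* OR the stored bitmaps of the given tips into one chain-wide result. */
static uint64_t union_of_commits(const struct bitmap_layer *chain, size_t nr,
				 const uint32_t *tips, size_t nr_tips)
{
	uint64_t result = 0;
	for (size_t i = 0; i < nr_tips; i++) {
		const uint64_t *b = find_commit_bitmap(chain, nr, tips[i]);
		assert(b); /* this sketch assumes every tip has a stored bitmap */
		result |= *b;
	}
	return result;
}
```

Since positions are global, a bitmap read from an earlier layer can be OR'd into the result without any translation.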

To simplify the implementation, the type-level bitmaps from each layer
are iterated simultaneously, and their results are OR'd together, to
avoid recursively calling internal bitmap functions.
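A rough model of this simultaneous iteration: walk every layer's type bitmap in lockstep, OR the words together, and intersect the combined type bitmap with the query result. This sketch uses uncompressed 64-bit words rather than EWAH, and `struct type_bitmap` and `filter_by_type()` are hypothetical names; it shows the idea only and is not Git's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uncompressed type bitmap belonging to one layer. */
struct type_bitmap {
	const uint64_t *words;
	size_t nr_words; /* earlier layers cover fewer positions */
};

/*
 * Walk the type bitmaps of every layer in lockstep, word by word,
 * OR-ing them together, and keep only the result bits of that type.
 * Writes nr_words words into out; returns the number of matching bits.
 */
static size_t filter_by_type(const uint64_t *result, size_t nr_words,
			     const struct type_bitmap *layers, size_t nr_layers,
			     uint64_t *out)
{
	size_t count = 0;

	assert(nr_words == 0 || out != NULL);
	for (size_t w = 0; w < nr_words; w++) {
		uint64_t type_word = 0;
		for (size_t k = 0; k < nr_layers; k++)
			if (w < layers[k].nr_words)
				type_word |= layers[k].words[w];
		out[w] = result[w] & type_word;
		/* Count set bits by clearing the lowest one each time. */
		for (uint64_t x = out[w]; x; x &= x - 1)
			count++;
	}
	return count;
}
```

Shorter bitmaps from earlier layers contribute nothing past their end, and later layers contribute only zeros below their base, which matches each layer setting bits for its own objects only.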

Future Work
-----------