kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Taylor Blau	49633dc88c	pack-bitmap: build pseudo-merge bitmaps after regular bitmaps When generating bitmaps, `bitmap_builder_init()` starts with an initial selection of commits to receive bitmap coverage, and then determines a set of "maximal" commits based on its input. Commit `089f751360` (pack-bitmap-write: build fewer intermediate bitmaps, 2020-12-08) has extensive details, but the gist is as follows: Each selected commit starts with one commit_mask bit in its "commit mask" bitmap. Then, we walk the first-parent history in topological order and OR each commit's mask into its (first) parent. Whenever that OR results in the parent having more bits set, the child is deemed to be non-maximal, and the frontier is pushed further back along the first parent history. That approach works extremely well for ordinary selected commits, whose first-parent histories often describe real sharing between the bitmaps we are going to write. It struggles, however, to efficiently generate pseudo-merge bitmaps. Unlike ordinary commits for which the above algorithm is designed, pseudo-merges don't represent any "real" commit in history, just a grouping of non-bitmapped reference tips. In that sense, their first parent is just a part of a larger set, and treating them like ordinary selected commits imposes a significant slow-down when generating bitmaps with pseudo-merges enabled. Consider partitioning all non-bitmapped reference tips into eight individual pseudo-merges via the following configuration: [bitmapPseudoMerge "all"] pattern=refs/ threshold=now stableSize=10000000 maxMerges=8 , the cost of generating a bitmap from scratch rises significantly: +------------------+-----------------+---------------+---------------------+ \| \| no pseudo-merge \| pseudo-merges \| Delta \| \| \| \| (HEAD^) \| \| +------------------+-----------------+---------------+---------------------+ \| elapsed \| 294.1 s \| 575.0 s \| +280.9 s (+95.5%) \| \| cycles \| 1,365.5 B \| 2,686.9 B \| +1,321.4 B (+96.8%) \| \| instructions \| 1,389.8 B \| 2,546.6 B \| +1,156.8 B (+83.2%) \| \| CPI \| 0.983 \| 1.055 \| +0.073 (+7.4%) \| +------------------+-----------------+---------------+---------------------+ This is a particularly poor trade-off, because the time saved by these pseudo-merges during, e.g., $ git rev-list --count --all --objects --use-bitmap-index is only: $ hyperfine -L v true,false -n 'pseudo-merges: {v}' ' GIT_TEST_USE_PSEUDO_MERGES={v} git.compile rev-list --count \ --objects --all --use-bitmap-index ' Benchmark 1: pseudo-merges: true Time (mean ± σ): 2.613 s ± 0.012 s [User: 2.308 s, System: 0.305 s] Range (min … max): 2.594 s … 2.633 s 10 runs Benchmark 2: pseudo-merges: false Time (mean ± σ): 52.205 s ± 0.170 s [User: 51.500 s, System: 0.697 s] Range (min … max): 51.956 s … 52.458 s 10 runs Summary pseudo-merges: true ran 19.98 ± 0.11 times faster than pseudo-merges: false In other words, we pay a nearly ~5 minute penalty to generate pseudo-merge bitmaps, but only save ~50 seconds during traversal. The problem stems from injecting pseudo-merges into the bitmap builder as if they were normal commits. The maximal commit selection algorithm was simply not designed for that case, and performs predictably poorly. The only reason we reused the maximal commit selection routine for pseudo-merges alongside regular non-pseudo-merge commits is because we represent them both as commit objects (where the pseudo-merge commits just represent a made-up commit as opposed to one that actually exists in a repository's object store). Instead, build the regular selected commit bitmaps first, considering only non-pseudo-merge commits in `bitmap_builder_init()`. Once those bitmaps have been stored, build each pseudo-merge bitmap separately and attach its parent and object bitmaps to the corresponding pseudo-merge entry before writing the extension. This keeps the regular bitmap build shaped like the no-pseudo-merge case. The later pseudo-merge fill can still stop at stored selected ancestor bitmaps, so it does not have to rewalk each pseudo-merge closure from scratch. When an existing bitmap has the same pseudo-merge parent set, reuse and remap that whole pseudo-merge bitmap before falling back to fill_bitmap_commit(). This preserves the benefit of stable pseudo-merges while keeping the on-disk format and reader behavior unchanged. As a result, the overhead cost for generating pseudo-merges in the above configuration is much smaller: +------------------+-----------------+---------------+-------------------+ \| \| no pseudo-merge \| pseudo-merges \| Delta \| \| \| \| (HEAD) \| \| +------------------+-----------------+---------------+-------------------+ \| elapsed \| 294.1 s \| 328.4 s \| +34.3 s (+11.7%) \| \| cycles \| 1,365.5 B \| 1,529.3 B \| +163.7 B (+12.0%) \| \| instructions \| 1,389.8 B \| 1,552.8 B \| +163.0 B (+11.7%) \| \| CPI \| 0.983 \| 0.985 \| +0.002 (+0.2%) \| +------------------+-----------------+---------------+-------------------+ Recall that at the start of this series, generating reachability bitmaps took 612.5 seconds without pseudo-merges. With this commit, it is still ~46.38% faster to generate reachability bitmaps with pseudo-merges than it was to generate bitmaps wihtout them at the beginning of this series. The changes to implement this are mostly straightforward. We exclude pseudo-merge commits from the existing bitmap generation, and walk over them in a separate pass, by either reusing an existing on-disk pseudo-merge, or passing the pseudo-merge commit itself back to the existing routine in `fill_bitmap_commit()`. (Note that the routine to build pseudo-merge bitmaps is the same both before and after this change, the difference is only that we do not let psuedo-merges participate in determining the set of maximal commits.) The only wrinkle is that `fill_bitmap_commit()` must be taught to not expect that all tree objects have been parsed, which is the case for any portion of history reachable by one or more pseudo-merge(s), but not by any non-pseudo-merge commit selected for bitmapping. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:01 +09:00
Taylor Blau	b04d26607d	pack-bitmap: remember pseudo-merge parents write_pseudo_merges() currently builds an array of temporary bitmaps for the parent set of each pseudo-merge, then serializes those bitmaps later while writing the extension. Move those parent bitmaps onto the corresponding bitmapped_commit entries instead. This keeps the on-disk output unchanged, but gives the parent bitmap the same lifetime and access pattern that later changes will use when pseudo-merge object bitmaps are built before the write step. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:01 +09:00
Taylor Blau	dcccd99746	pack-bitmap: sort bitmaps before XORing Reachability bitmaps may be stored as XORs against nearby bitmaps, up to 10 away. However, when callers provide selected commits in an arbitrary order, the writer may miss good ancestor/descendant pairs and produce much larger bitmap files without changing query coverage. Sort the selected bitmaps in date order (from oldest to newest) before computing XOR offsets, leaving pseudo-merge bitmaps alone (which we will deal with separately in following commits). On our same testing repository from previous commits, this change shrunk our selection of 1,261 bitmaps from ~635.46 MiB to 176.4 MiB for a ~72.24% reduction in the on-disk size of our *.bitmap file. The time to generate the smaller bitmap file decreased by ~3.69 seconds, though this is likely mostly noise. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:01 +09:00
Taylor Blau	c720bbcc53	pack-bitmap: cache object positions during fill The previous commits removed some redundant work from bitmap generation by avoiding unnecessary tree recursion and by reusing selected bitmaps that have already been computed. Even with those changes in place, there is still an extremely hot path from `fill_bitmap_commit()` and `fill_bitmap_tree()` to translate object IDs into their corresponding bit positions in order to generate their bitmaps. In a small repository, this overhead is not significant. However, in a very large repository (e.g., the one that we have been using as a benchmark over the past several commits with ~57M total objects), the overhead of locating object bit positions (often repeatedly) adds up significantly. Combat this by adding a small, direct-mapped cache to the bitmap writer which maps object IDs to their corresponding bit positions. Size the cache according to the number of objects being written, with fixed lower and upper bounds so small repositories do not pay for a large table and large repositories can avoid most repeated packlist and MIDX lookups. On my machine with (a somewhat outdated) GCC 15.2.0, each entry in the cache is 40 bytes wide: $ pahole -C bitmap_pos_cache_entry pack-bitmap-write.o struct bitmap_pos_cache_entry { struct object_id oid; /* 0 36 / uint32_t pos; / 36 4 / / size: 40, cachelines: 1, members: 2 / / last cacheline: 40 bytes */ }; , and we will allocate up to 2^21 entries for a maximum total of 80 MiB of cache overhead. In our example repository from above and in earlier commits, this results in a ~9.4% reduction in runtime relative to the previous commit: +------------------+-------------+-------------+---------------------+ \| \| HEAD^ \| HEAD \| Delta \| +------------------+-------------+-------------+---------------------+ \| elapsed \| 324.8 s \| 294.1 s \| -30.7 s (-9.4%) \| \| cycles \| 1,508.6 B \| 1,365.5 B \| -143.0 B (-9.5%) \| \| instructions \| 1,436.6 B \| 1,389.8 B \| -46.9 B (-3.3%) \| \| CPI \| 1.050 \| 0.983 \| -0.068 (-6.4%) \| +------------------+-------------+-------------+---------------------+ When generating bitmaps on this repository (to produce the above timings), the cache grew to its maximum size of 80 MiB, and resulted in 1.024B cache hits and 59.957M cache misses. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:01 +09:00
Taylor Blau	ece3465d44	pack-bitmap: consolidate `find_object_pos()` success path Both sides of `find_object_pos()` report success in the same way by setting the optional `found` out-parameter and return the resolved bitmap position. Prepare for adding more bookkeeping around object-position lookups by storing the result in a local `pos` variable and sharing the success return path between the packlist and MIDX cases. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:00 +09:00
Taylor Blau	3ea5fe8482	pack-bitmap: reuse stored selected bitmaps When `fill_bitmap_commit()` reaches an ancestor that was selected for its own bitmap and processed earlier, its object closure is already stored in `writer->bitmaps` as an EWAH bitmap. As a result, walking through that commit's tree and parents again is redundant. Teach `fill_bitmap_commit()` to notice that case. For non-root commits in the walk, look for a stored selected bitmap and OR it into the bitmap being built. If one exists, skip the commit, its tree, and its parents. Building bitmaps from scratch on the same test repository from the previous commits yields a significant speed-up: +------------------+-------------+-------------+---------------------+ \| \| HEAD^ \| HEAD \| Delta \| +------------------+-------------+-------------+---------------------+ \| elapsed \| 562.8 s \| 324.8 s \| -237.9 s (-42.3%) \| \| cycles \| 2,621.3 B \| 1,508.6 B \| -1,112.7 B (-42.4%) \| \| instructions \| 2,348.9 B \| 1,436.6 B \| -912.3 B (-38.8%) \| \| CPI \| 1.116 \| 1.050 \| -0.066 (-5.9%) \| +------------------+-------------+-------------+---------------------+ In our testing repository, there are 1,261 commits selected for bitmap coverage, and 1,382 maximal commits induced as a result of that. Of the 1,382 calls made to `fill_bitmap_commit()` (one per maximal commit), 131 of them can be short-circuited at some point during their traversal as a consequence of this change. In large repositories where the cost of filling the bitmap for any individual commit is large, being able to short-circuit even ~9.5% of the calls to `fill_bitmap_commit()` results in a significant savings. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:00 +09:00
Taylor Blau	1760c37258	pack-bitmap: check subtree bits before recursing In the previous commit, we adjusted the callers of `fill_bitmap_tree()` to pass in the bit position of the tree they wish to fill. This commit makes use of that information at the call site to avoid setting up a stack frame for fill_bitmap_tree() entirely whenever a tree's bit position is already set. Since this is such a hot path, the avoided cost of setting up and tearing down stack frames for each noop'd call to `fill_bitmap_tree()` is significant: +--------------+-------------+-------------+-------------------+ \| \| HEAD^ \| HEAD \| Delta \| +--------------+-------------+-------------+-------------------+ \| elapsed \| 582.4 s \| 562.8 s \| -19.6 s (-3.4%) \| \| cycles \| 2,713.3 B \| 2,621.3 B \| -92.0 B (-3.4%) \| \| instructions \| 2,415.5 B \| 2,348.9 B \| -66.6 B (-2.8%) \| \| CPI \| 1.123 \| 1.116 \| -0.007 (-0.7%) \| +--------------+-------------+-------------+-------------------+ In the same repository as in the previous commit, our timings dropped from ~582.4 seconds down to ~562.77 seconds. While the cycles-per-instruction ratio is basically unchanged, we execute significantly fewer instructions, and correspondingly fewer cycles. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:00 +09:00
Taylor Blau	e3959cc78c	pack-bitmap: pass object position to `fill_bitmap_tree()` In the following commit, callers of `fill_bitmap_tree()` will be required to check the bit corresponding to their tree before calling that function. That change will reduce the overhead of setting up and tearing down stack frames for trees whose bits are already set. To prepare for that change, have callers pass in the tree's bit position in `fill_bitmap_tree()`, which will make the next commit easier to read. In the meantime, this change has a surprising and measurable benefit during bitmap generation, particularly on very large repositories. When processing sub-trees within `fill_bitmap_tree()`, the preimage of this patch did the following: while (tree_entry(&desc, entry)) { switch (object_type(entry.mode)) { case OBJ_TREE: if (fill_bitmap_tree(writer, bitmap, lookup_tree(writer->repo, &entry.oid)) < 0) { /* ... / } / ... / } } , first performing the object lookup via `lookup_tree()`, and then locating its bit position within the recursive call. This patch effectively reorders those two calls so that we first discover the sub-tree's bit position, then* load its tree. By reordering these two operations, we spend fewer CPU cycles per instruction, likely due to improved CPU dependency/cache/pipeline behavior. Comparing the results of: running `perf stat` before and after this commit, we have: +--------------+-------------+-------------+-------------------+ \| \| HEAD^ \| HEAD \| Delta \| +--------------+-------------+-------------+-------------------+ \| elapsed \| 612.5 s \| 582.4 s \| -30.1 s (-4.9%) \| \| cycles \| 2,857.3 B \| 2,713.3 B \| -144.0 B (-5.0%) \| \| instructions \| 2,413.2 B \| 2,415.5 B \| +2.3 B (+0.1%) \| \| CPI \| 1.184 \| 1.123 \| -0.061 (-5.1%) \| +--------------+-------------+-------------+-------------------+ In a large repository with ~4.8M commit, and ~37.1M tree objects this change improves timing from ~612.5 seconds down to ~582.4 seconds, or a ~4.9% improvement. More importantly, the number of CPU cycles spent dropped off significantly as a result of this commit, lowering our cycles-per-instruction ratio by about ~5.1%. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 05:23:00 +09:00
Taylor Blau	8b5f199f30	pack-bitmap: fix pseudo-merge lookup for shared commits When a commit appears in more than one pseudo-merge group, its entry in the commit lookup table has the high bit set in its offset field, indicating that the offset points to an "extended" table containing the set of pseudo-merges for that commit. There are three bugs in this path: * The `next_ext` offset in `write_pseudo_merges()` undercounts the per-entry size of the lookup table (8 vs. 12 bytes). * `nth_pseudo_merge_ext()` calls `read_pseudo_merge_commit_at()` on a pseudo-merge bitmap offset, misinterpreting it as a 12-byte commit table entry. * The error check after `pseudo_merge_ext_at()` in `apply_pseudo_merges_for_commit()` tests `< -1` instead of `< 0`, silently swallowing errors from `error()`. The first bug is on the write side: each commit lookup entry contains a 4- and 8-byte unsigned value for a total of 12 bytes, but the calculation assumes that the entry only contains 8 bytes of data. This makes `next_ext` too small, so the extended-table offsets that get written point into the middle of the non-extended lookup table rather than past it. The reader then interprets non-extended lookup data as extended entries, producing garbage. The second bug is on the read side and is independently fatal: even with a correctly positioned extended table, `nth_pseudo_merge_ext()` feeds the offset it reads (which points at pseudo-merge bitmap data) to `read_pseudo_merge_commit_at()`. That function tries to parse 12 bytes as a `pseudo_merge_commit` struct, clobbering `merge->pseudo_merge_ofs` with whatever happens to be at that location. The caller only needs `pseudo_merge_ofs`, so the fix is to store the offset directly rather than re-parsing a commit table entry. The `commit_pos` field is left untouched, retaining the value that `find_pseudo_merge()` set earlier. The third bug is latent. With the first two fixes applied, the extended table is correctly written and read, so `pseudo_merge_ext_at()` does not fail during normal operation. The `< -1` vs `< 0` distinction only matters when the bitmap file is corrupt or truncated, in which case the error would be silently ignored and the code would proceed with uninitialized data. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-12 10:36:18 +09:00
Taylor Blau	ff21343a59	pack-bitmap-write: sort pseudo-merge commit lookup table in pack order The pseudo-merge commit lookup table stores each commit's position in the pack- or pseudo-pack order, and is used to perform a binary search in order to determine which pseudo-merge(s) a given commit belongs to. However, the table was previously sorted in lexical order (via `oid_array_sort()`), causing the binary search to fail. While this causes pseudo-merge bitmaps to be de-facto broken for fill-in traversal, there are a couple of important points to keep in mind: * Pseudo-merge application during the initial phases of a bitmap-based traversal are applied via `cascade_pseudo_merges_1()`. This function enumerates the known pseudo-merges and determines if its parents are a subset of the traversal roots. This is a different path than the fill-in traversal, where we are looking for any pseudo-merges which may be satisfied after visiting some commit along an object walk, which involves the aforementioned (broken) binary search. As a consequence, any pseudo-merges we apply at this stage are done so correctly. * While this bug makes applying pseudo-merges during fill-in traversal effectively broken, it does not produce wrong results. Instead of applying the wrong pseudo-merge, we will simply fail to find satisfied pseudo-merges, leaving the traversal to use the existing fill-in routines. Fix this by sorting the table by bit position before writing, matching the order that the reader's binary search expects. This does produce a change the on-disk format insofar as the actual code now complies with the documented format (for more details, refer to: Documentation/technical/bitmap-format.adoc). Given that this never worked in the first place, such a change should be OK to perform. If an out-of-tree implementation of pseudo-merges happened to generate bitmaps that comply with the documented format, they will continue to be read and interpreted as normal. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-12 10:36:18 +09:00
Junio C Hamano	5288202433	Merge branch 'ps/commit-list-functions-renamed' Rename three functions around the commit_list data structure. * ps/commit-list-functions-renamed: commit: rename `free_commit_list()` to conform to coding guidelines commit: rename `reverse_commit_list()` to conform to coding guidelines commit: rename `copy_commit_list()` to conform to coding guidelines	2026-02-13 13:39:25 -08:00
Junio C Hamano	214cbb7b1d	Merge branch 'rs/tree-wo-the-repository' Remove implicit reliance on the_repository global in the APIs around tree objects and make it explicit which repository to work in. * rs/tree-wo-the-repository: cocci: remove obsolete the_repository rules cocci: convert parse_tree functions to repo_ variants tree: stop using the_repository tree: use repo_parse_tree() path-walk: use repo_parse_tree_gently() pack-bitmap-write: use repo_parse_tree() delta-islands: use repo_parse_tree() bloom: use repo_parse_tree() add-interactive: use repo_parse_tree_indirect() tree: add repo_parse_tree*() environment: move access to core.maxTreeDepth into repo settings	2026-01-21 16:16:28 -08:00
Patrick Steinhardt	9f18d089c5	commit: rename `free_commit_list()` to conform to coding guidelines Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming schema, `free_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-15 05:32:31 -08:00
René Scharfe	fd64c6fbcb	pack-bitmap-write: use repo_parse_tree() `1a6768d1dd` (pack-bitmap-write: stop depending on `the_repository`, 2025-03-10) replaced explicit uses of the_repository. parse_tree() uses it internally, though, so call repo_parse_tree() instead and hand it the correct repository. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 18:36:17 -08:00
René Scharfe	065523812f	pack-bitmap-write: use commit_stack Use commit_stack instead of open-coding it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-25 08:29:28 +09:00
Junio C Hamano	51b50c55a9	Merge branch 'ps/object-store' Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`	2025-07-15 15:18:18 -07:00
Patrick Steinhardt	e989dd96b8	odb: rename `oid_object_info()` Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:37 -07:00
Patrick Steinhardt	1b1679c688	odb: get rid of `the_repository` in `odb_mkstemp()` Get rid of our dependency on `the_repository` in `odb_mkstemp()` by passing in the object database as a parameter and adjusting all callers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:35 -07:00
Patrick Steinhardt	8f49151763	object-store: rename files to "odb.{c,h}" In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:34 -07:00
Junio C Hamano	1f622bb0ab	Merge branch 'ly/do-not-localize-bug-messages' Code clean-up. * ly/do-not-localize-bug-messages: BUG(): remove leading underscore of the format string	2025-06-17 10:44:40 -07:00
Lidong Yan	5dceb8bd05	BUG(): remove leading underscore of the format string BUG() is not end-user facing but programmer facing, and we do not use _("...") in them. Replace all `BUG(_("..."))` with `BUG("...")` Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-03 08:36:11 -07:00
Junio C Hamano	36d8035d27	Merge branch 'ps/object-file-cleanup' Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"	2025-04-24 17:25:33 -07:00
Junio C Hamano	ee847e0034	Merge branch 'ps/object-wo-the-repository' The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`	2025-04-15 13:50:15 -07:00
Patrick Steinhardt	68cd492a3e	object-store: merge "object-store-ll.h" and "object-store.h" The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:37 -07:00
Taylor Blau	27afc272c4	midx: implement writing incremental MIDX bitmaps Now that the pack-bitmap machinery has learned how to read and interact with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery (and relevant callers from within the MIDX machinery) to write such bitmaps. The details for doing so are mostly straightforward. The main changes are as follows: - find_object_pos() now makes use of an extra MIDX parameter which is used to locate the bit positions of objects which are from previous layers (and thus do not exist in the current layer's pack_order field). (Note also that the pack_order field is moved into struct write_midx_context to further simplify the callers for write_midx_bitmap()). - bitmap_writer_build_type_index() first determines how many objects precede the current bitmap layer and offsets the bits it sets in each respective type-level bitmap by that amount so they can be OR'd together. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 04:34:16 -07:00
Patrick Steinhardt	1a6768d1dd	pack-bitmap-write: stop depending on `the_repository` There are multiple sites in "pack-bitmap-write.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code so that the `struct bitmap_writer` stores the repository it is getting initialized with. Like this, we can adapt callsites that use `the_repository` to instead use the repository provided by the writer. Remove the `USE_THE_REPOSITORY_VARIABLE` define. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:19 -07:00
Patrick Steinhardt	228457c9d9	csum-file: stop depending on `the_repository` There are multiple sites in "csum-file.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository` by adapting functions to receive required data as parameters. Adapt callsites accordingly by either using `the_repository->hash_algo`, or by using a context-provided hash algorithm in case the subsystem already got rid of its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:18 -07:00
Patrick Steinhardt	028f618658	path: adjust last remaining users of `the_repository` With the preceding refactorings we now only have a couple of implicit users of `the_repository` left in the "path" subsystem, all of which depend on global state via `calc_shared_perm()`. Make the dependency on `the_repository` explicit by passing the repo as a parameter instead and adjust callers accordingly. Note that this change bubbles up into a couple of subsystems that were previously declared as free from `the_repository`. Instead of marking all of them as `the_repository`-dependent again, we instead use the repository that is available in the calling context. There are three exceptions though with "copy.c", "pack-write.c" and "tempfile.c". Adjusting these would require us to adapt callsites all over the place, so this is left for a future iteration. Mark "path.c" as free from `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-28 13:54:11 -08:00
Patrick Steinhardt	1f7e6478dc	progress: stop using `the_repository` Stop using `the_repository` in the "progress" subsystem by passing in a repository when initializing `struct progress`. Furthermore, store a pointer to the repository in that struct so that we can pass it to the trace2 API when logging information. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-18 10:44:30 -08:00
Patrick Steinhardt	41f43b8243	global: mark code units that generate warnings with `-Wsign-compare` Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:02 +09:00
Patrick Steinhardt	7f97266ee1	pack-bitmap-write: fix leaking OID array Fix a leaking OID array in `write_pseudo_merges()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-09-30 11:23:07 -07:00
Patrick Steinhardt	55e563a90c	pseudo-merge: fix various memory leaks Fix various memory leaks hit by the pseudo-merge machinery. These leaks are exposed by t5333, but plugging them does not yet make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-09-30 11:23:06 -07:00
Junio C Hamano	2ff26d2286	Merge branch 'jk/drop-unused-parameters' Drop unused parameters from functions. * jk/drop-unused-parameters: diff-lib: drop unused index argument from get_stat_data() ref-filter: drop unused parameters from email_atom_option_parser() pack-bitmap: drop unused parameters from select_pseudo_merges() pack-bitmap: load writer config from repository parameter refs: drop some unused parameters from create_symref_lock()	2024-08-26 11:32:22 -07:00
Jeff King	4756494504	pack-bitmap: drop unused parameters from select_pseudo_merges() We take the array of indexed_commits (and its length), but there's no need. The selection is based on ref reachability, not the linearized set of commits we're packing. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-17 09:44:41 -07:00
Jeff King	ecc6fa9ae9	pack-bitmap: load writer config from repository parameter In bitmap_writer_init(), we take a repository parameter but ever look at it. Most of the initialization here is independent of the repository, but we do load some config. So let's pass the repo we get down to load_pseudo_merges_from_config(), which in turn can use repo_config(), rather than depending on the_repository via git_config(). The outcome is the same, since all callers pass in the_repository anyway. But it takes us a step closer to getting rid of the global, and as a bonus it silences an unused parameter warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-17 09:44:40 -07:00
Taylor Blau	187504f9b2	pack-bitmap-write.c: select pseudo-merges even for small bitmaps Ordinarily, the pack-bitmap machinery will select some subset of reachable commits to receive bitmaps. But when there are fewer than 100 commits indexed in the first place, they will all receive bitmaps as a special case. When this happens, pseudo-merges are not generated, making it impossible to test pseudo-merge corner cases with fewer than 100 commits. Select pseudo-merges even for bitmaps with fewer than 100 commits to make such testing easier. In practice, this should not make a difference to non-testing bitmaps, as they are unlikely to be used when a repository has so few commits to begin with. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-15 11:25:02 -07:00
Taylor Blau	11a08e8332	pack-bitmap: drop redundant args from `bitmap_writer_finish()` In a similar fashion as the previous commit, drop a redundant argument from the `bitmap_writer_finish()` function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-15 11:23:15 -07:00
Taylor Blau	f00dda4849	pack-bitmap: drop redundant args from `bitmap_writer_build()` In a similar fashion as the previous commit, drop a redundant argument from the `bitmap_writer_build()` function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-15 11:22:27 -07:00
Taylor Blau	125ee4ae80	pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()` The previous commit ensures that the bitmap_writer's "to_pack" field is initialized early on, so the "to_pack" and "index_nr" arguments to `bitmap_writer_build_type_index()` are redundant. Drop them and adjust the callers accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-15 11:20:24 -07:00
Taylor Blau	01e9d12939	pack-bitmap: initialize `bitmap_writer_init()` with packing_data In order to determine its object order, the pack-bitmap machinery keeps a 'struct packing_data' corresponding to the pack or pseudo-pack (when writing a MIDX bitmap) being written. The to_pack field is provided to the bitmap machinery by callers of bitmap_writer_build() and assigned to the bitmap_writer struct at that point. But a subsequent commit will want to have access to that data earlier on during commit selection. Prepare for that by adding a 'to_pack' argument to 'bitmap_writer_init()', and initializing the field during that function. Subsequent commits will clean up other functions which take now-redundant arguments (like nr_objects, which is equivalent to pdata->objects_nr, or pdata itself). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-15 11:18:04 -07:00
Junio C Hamano	7b472da915	Merge branch 'ps/use-the-repository' A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions	2024-07-02 09:59:00 -07:00
Patrick Steinhardt	e7da938570	global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	f4836570a7	hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:32 -07:00
Taylor Blau	7252d9a036	pseudo-merge: implement support for finding existing merges This patch implements support for reusing existing pseudo-merge commits when writing bitmaps when there is an existing pseudo-merge bitmap which has exactly the same set of parents as one that we are about to write. Note that unstable pseudo-merges are likely to change between consecutive repacks, and so are generally poor candidates for reuse. However, stable pseudo-merges (see the configuration option 'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely to change between runs (as they represent long-running branches). Because there is no index from a set of pseudo-merge parents to a matching pseudo-merge bitmap, we have to construct the bitmap corresponding to the set of parents for each pending pseudo-merge commit and see if a matching bitmap exists. This is technically quadratic in the number of pseudo-merges, but is OK in practice for a couple of reasons: - non-matching pseudo-merge bitmaps are rejected quickly as soon as they differ in a single bit - already-matched pseudo-merge bitmaps are discarded from subsequent rounds of search - the number of pseudo-merges is generally small, even for large repositories In order to do this, implement (a) a function that finds a matching pseudo-merge given some uncompressed bitset describing its parents, (b) a function that computes the bitset of parents for a given pseudo-merge commit, and (c) call that function before computing the set of reachable objects for some pending pseudo-merge. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:44 -07:00
Taylor Blau	53ea3ec479	pack-bitmap-write.c: write pseudo-merge table Now that the pack-bitmap writer machinery understands how to select and store pseudo-merge commits, teach it how to write the new optional pseudo-merge .bitmap extension. No readers yet exist for this new extension to the .bitmap format. The following commits will take any preparatory step(s) necessary before then implementing the routines necessary to read this new table. In the meantime, the new `write_pseudo_merges()` function implements writing this new format as described by a previous commit in Documentation/technical/bitmap-format.txt. Writing this table is fairly straightforward and consists of a few sub-components: - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge "parents", and another for the objects reachable from those parents) - for each commit, the offset of either (a) the pseudo-merge it belongs to, or (b) an extended lookup table if it belongs to >1 pseudo-merge groups - if there are any commits belonging to >1 pseudo-merge group, the extended lookup tables (which each consist of the number of pseudo-merge groups a commit appears in, and then that many 4-byte unsigned ) Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:42 -07:00
Taylor Blau	faf558b23e	pseudo-merge: implement support for selecting pseudo-merge commits Teach the new pseudo-merge machinery how to select non-bitmapped commits for inclusion in different pseudo-merge group(s) based on a handful of criteria. Note that the selected pseudo-merge commits aren't actually used or written anywhere yet. This will be done in the following commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:42 -07:00
Taylor Blau	c059c8795e	pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public The pseudo-merge selection code will be added in a subsequent commit, and will need a way to push the allocated commit structures into the bitmap writer from a separate compilation unit. Make the `bitmap_writer_push_bitmapped_commit()` function part of the pack-bitmap.h header in order to make this possible. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:42 -07:00
Taylor Blau	245a7f2e01	pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Prepare to implement pseudo-merge bitmap selection by implementing a necessary new function, `bitmap_writer_has_bitmapped_object_id()`. This function returns whether or not the bitmap_writer selected the given object ID for bitmapping. This will allow the pseudo-merge machinery to reject candidates for pseudo-merges if they have already been selected as an ordinary bitmap tip. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:42 -07:00
Taylor Blau	0d41b18317	pack-bitmap-write: support storing pseudo-merge commits Prepare to write pseudo-merge bitmaps by annotating individual bitmapped commits (which are represented by the `bitmapped_commit` structure) with an extra bit indicating whether or not they are a pseudo-merge. In subsequent commits, pseudo-merge bitmaps will be generated by allocating a fake commit node with parents covering the full set of commits represented by the pseudo-merge bitmap. These commits will be added to the set of "selected" commits as usual, but will be written specially instead of being included with the rest of the selected commits. Mechanically speaking, there are two parts of this change: - The bitmapped_commit struct gets a new bit indicating whether it is a pseudo-merge, or an ordinary commit selected for bitmaps. - A handful of changes to only write out the non-pseudo-merge commits when enumerating through the selected array (see the new `bitmap_writer_selected_nr()` function). Pseudo-merge commits appear after all non-pseudo-merge commits, so it is safe to enumerate through the selected array like so: for (i = 0; i < bitmap_writer_selected_nr(); i++) if (writer.selected[i].pseudo_merge) BUG("unexpected pseudo-merge"); without encountering the BUG(). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:42 -07:00
Taylor Blau	4722e06edc	pack-bitmap: move some initialization to `bitmap_writer_init()` The pack-bitmap-writer machinery uses a oidmap (backed by khash.h) to map from commits selected for bitmaps (by OID) to a bitmapped_commit structure (containing the bitmap itself, among other things like its XOR offset, etc.) This map was initialized at the end of `bitmap_writer_build()`. New entries are added in `pack-bitmap-write.c::store_selected()`, which is called by the bitmap_builder machinery (which is responsible for traversing history and generating the actual bitmaps). Reorganize when this field is initialized and when entries are added to it so that we can quickly determine whether a commit is a candidate for pseudo-merge selection, or not (since it was already selected to receive a bitmap, and thus storing it in a pseudo-merge would be redundant). The changes are as follows: - Introduce a new `bitmap_writer_init()` function which initializes the `writer.bitmaps` field (instead of waiting until the end of `bitmap_writer_build()`). - Add map entries in `push_bitmapped_commit()` (which is called via `bitmap_writer_select_commits()`) with OID keys and NULL values to track whether or not we expect to write a bitmap for some given commit. - Validate that a NULL entry is found matching the given key when we store a selected bitmap. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-24 11:40:41 -07:00

1 2 3

149 Commits (2c677d20b684cdcae669fe508d4cd60fea8cb320)