kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	ffaa2eddd0	Merge branch 'ds/path-walk-filters' The "git pack-objects --path-walk" traversal has been integrated with several object filters, including blobless and sparse filters. * ds/path-walk-filters: path-walk: support `combine` filter path-walk: support `object:type` filter path-walk: support `tree:0` filter t6601: tag otherwise-unreachable trees pack-objects: support sparse:oid filter with path-walk path-walk: add pl_sparse_trees to control tree pruning path-walk: support blob size limit filter backfill: die on incompatible filter options path-walk: support blobless filter path-walk: always emit directly-requested objects t/perf: add pack-objects filter and path-walk benchmark pack-objects: pass --objects with --path-walk t5620: make test work with path-walk var	2026-06-02 16:15:29 +09:00
Derrick Stolee	2dc858e69e	pack-objects: support sparse:oid filter with path-walk The --filter=sparse:<oid> option to 'git pack-objects' allows focusing an object set to a sparse-checkout definition. This reduces the set of matching blobs while retaining all reachable trees. No server currently supports fetching with this filter because it is expensive to compute and reachability bitmaps do not help without a significant effort to extend the bitmap feature to store bitmaps for each supported sparse- checkout definition. Without focusing on serving fetches and clones with these filters, there are still benefits that could be realized by making this faster. With the sparse index, it's more realistic now than ever to be able to operate a local clone that was bootstrapped by a packfile created with a sparse filter, because the missing trees are not needed to move a sparse-checkout from one commit to another or to view the history of any path in scope. Such clones could perhaps be bootstrapped by partial bundles. Previously, constructing these sparse packs has been incredibly computationally inefficient. The revision walk that explores which objects are in scope spends a lot of time checking each object to see if it matches the sparse-checkout patterns, causing quadratic behavior (number of objects times number of sparse-checkout patterns). This improves somewhat when using cone-mode sparse-checkout patterns that can use hashtables and prefix matches to determine containment. However, the check per object is still too expensive for most cases. This is where the path-walk feature comes in. We can proceed as normal by placing objects in bins by path and _then_ check a group of objects all at once. Since sparse:<oid> only restricts blobs, the path-walk must include all reachable trees while using the cone-mode patterns to skip blobs at paths outside the sparse scope. This establishes a baseline for a potential future "treesparse:<oid>" filter that would also restrict trees, but introducing such a new filter is deferred to a later change. The implementation here is focused around loading the sparse-checkout patterns from the provided object ID and checking that the patterns are indeed cone-mode patterns. We can then load the correct pattern list into the path walk context and use the logic that already exists from `bff4555767` (backfill: add --sparse option, 2025-02-03), though that feature loads sparse-checkout patterns from the worktree's local settings and also restricts tree objects. We use a combination of errors and warnings to signal problems during this load. The difference is that errors are likely fatal for the non-path-walk version while the warnings are probably just implementation details for the path-walk version and the 'git pack-objects' command can fall back to the revision walk version. Now that the SEEN flag is deferred until after pattern checks (from the previous commit), handle the case where a tree with a shared OID appears at both an out-of-cone and in-cone path. When trees are not being pruned (pl_sparse_trees == 0), the path-walk re-walks the tree at the in-cone path so that in-cone blobs within it are discovered. The new tests in t5317 and t6601 demonstrate this behavior and would fail without these changes. The performance test p5315 shows the impact of this change when using sparse filters: Test HEAD~1 HEAD ---------------------------------------------------------------------- 5315.10: repack (sparse:oid) 77.98 77.47 -0.7% 5315.11: repack size (sparse:oid) 187.5M 187.4M -0.0% 5315.12: repack (sparse:oid, --path-walk) 77.91 31.41 -59.7% 5315.13: repack size (sparse:oid, --path-walk) 187.5M 161.1M -14.1% These performance tests were run on the Git repository. The --path-walk feature shows meaningful space savings (14% smaller for sparse packs) and dramatic time savings (60% faster) by leveraging the path-walk's ability to skip blobs outside the sparse scope. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blaue <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Derrick Stolee	6d87f0e8a3	path-walk: support blobless filter The 'git pack-objects' command can opt-in to using the path-walk API for scanning the objects. Currently, this option is dynamically disabled if combined with '--filter=<X>', even when using a simple filter such as 'blob:none' to signal a blobless packfile. This is a common scenario for repos at scale, so is worth integrating. Also, users can opt-in to the '--path-walk' option by default through the pack.usePathWalk=true config option. When using that in a blobless partial clone, the following warning can appear even though the user did not specify either option directly: warning: cannot use --filter with --path-walk Teach the path-walk API to handle the 'blob:none' object filter natively. When revs->filter.choice is LOFC_BLOB_NONE, the path-walk sets info->blobs to 0 (skipping all blob objects) and clears the filter from revs so that prepare_revision_walk() does not reject the configuration. This check is implemented in the static prepare_filters() method, which will simultaneously check if the input filters are compatible and will make the appropriate mutations to the path_walk_info and filters if the path_walk_info is non-NULL. This allows us to use this logic both in the API method path_walk_filter_compatible() for use in builtin/pack-objects.c and as a prep step in walk_objects_by_path(). Update the test helper (test-path-walk) to accept --filter=<spec> as a test-tool option (before '--'), applying it to revs after setup_revisions() to avoid the --objects requirement check. We can also revert recent GIT_TEST_PACK_PATH_WALK overrides in t5620. Also switch test-path-walk from REV_INFO_INIT with manual repo assignment to repo_init_revisions(), which properly initializes the filter_spec strbuf needed for filter parsing. Add tests for blob:none with --all and with a single branch. The performance test p5315 shows the impact of this change when using blobless filters: Test HEAD~1 HEAD --------------------------------------------------------------------- 5315.6: repack (blob:none) 13.53 13.87 +2.5% 5315.7: repack size (blob:none) 137.7M 137.8M +0.1% 5315.8: repack (blob:none, --path-walk) 13.51 23.43 +73.4% 5315.9: repack size (blob:none, --path-walk) 137.7M 115.2M -16.3% These performance tests were run on the Git repository. The --path-walk feature shows meaningful space savings (16% smaller for blobless packs) at the cost of increased computation time due to the two compression passes. This data demonstrates that the feature is engaged and provides real compression benefits when --no-reuse-delta forces fresh deltas. Co-Authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Derrick Stolee	35567889ef	pack-objects: pass --objects with --path-walk When 'git pack-objects' has the --path-walk option enabled, it uses a different set of revision walk parameters than normal. For one, --objects was previously assumed by the path-walk API and could be omitted. We also needed --boundary to allow discovering UNINTERESTING objects to use as delta bases. We will be updating the path-walk API soon to work with some filter options. However, the revision machinery will trigger a fatal error: fatal: object filtering requires --objects The fix is easy: add the --objects option as an argument. This has no effect on the path-walk API but does simplify the revision option parsing for the objects filter. We can remove the comment about "removing" the options because they were never removed and instead not added. We still need to disable using bitmaps. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Johannes Schindelin	606c192380	odb, packfile: use size_t for streaming object sizes The odb_read_stream structure uses unsigned long for the size field, which is 32-bit on Windows even in 64-bit builds. When streaming objects larger than 4GB, the size would be truncated to zero or an incorrect value, resulting in empty files being written to disk. Change the size field in odb_read_stream to size_t and introduce unpack_object_header_sz() to return sizes via size_t pointer. Since object_info.sizep remains unsigned long for API compatibility, use temporary variables where the types differ, with comments noting the truncation limitation for code paths that still use unsigned long. Widening the producers to size_t in this way introduces a handful of silent size_t -> unsigned long narrowings on Windows, all in builtin/pack-objects.c, where the consumers are still typed unsigned long. Make those narrowings explicit with cast_size_t_to_ulong() so they assert loudly the moment an object actually exceeds ULONG_MAX bytes: - oe_get_size_slow() returns unsigned long but holds a size_t locally; cast at the return. - write_reuse_object() passes a size_t into check_pack_inflate(), whose expect parameter is unsigned long; cast at the call. - check_object() routes a size_t through SET_SIZE() and SET_DELTA_SIZE(), both of which take unsigned long via oe_set_size() / oe_set_delta_size(); cast at the three call sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the non-delta default arm. The cast-only treatment is deliberately a stop-gap. Properly widening oe_set_size, oe_get_size_slow's return type, check_pack_inflate's expect parameter, object_info.sizep, patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series that is too large to be reviewable, so the proper widening is deferred to a follow-up topic. Until then, cast_size_t_to_ulong() at least makes the truncation explicit at the source: it documents the boundary, and on a 64-bit non-Windows platform it is a no-op. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
Junio C Hamano	03311dca7f	Merge branch 'tb/stdin-packs-excluded-but-open' pack-objects's --stdin-packs=follow mode learns to handle excluded-but-open packs. * tb/stdin-packs-excluded-but-open: repack: mark non-MIDX packs above the split as excluded-open pack-objects: support excluded-open packs with --stdin-packs t7704: demonstrate failure with once-cruft objects above the geometric split pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` pack-objects: plug leak in `read_stdin_packs()`	2026-04-06 15:42:49 -07:00
Junio C Hamano	d75badf83b	Merge branch 'ps/odb-generic-object-name-handling' Object name handling (disambiguation and abbreviation) has been refactored to be backend-generic, moving logic into the respective object database backends. * ps/odb-generic-object-name-handling: odb: introduce generic `odb_find_abbrev_len()` object-file: move logic to compute packed abbreviation length object-name: move logic to compute loose abbreviation length object-name: simplify computing common prefixes object-name: abbreviate loose object names without `disambiguate_state` object-name: merge `update_candidates()` and `match_prefix()` object-name: backend-generic `get_short_oid()` object-name: backend-generic `repo_collect_ambiguous()` object-name: extract function to parse object ID prefixes object-name: move logic to iterate through packed prefixed objects object-name: move logic to iterate through loose prefixed objects odb: introduce `struct odb_for_each_object_options` oidtree: extend iteration to allow for arbitrary return codes oidtree: modernize the code a bit object-file: fix sparse 'plain integer as NULL pointer' error	2026-04-06 15:42:49 -07:00
Taylor Blau	3f7c0e722e	pack-objects: support excluded-open packs with --stdin-packs In `cd846bacc7` (pack-objects: introduce '--stdin-packs=follow', 2025-06-23), pack-objects learned to traverse through commits in included packs when using '--stdin-packs=follow', rescuing reachable objects from unlisted packs into the output. When we encounter a commit in an excluded pack during this rescuing phase we will traverse through its parents. But because we set `revs.no_kept_objects = 1`, commit simplification will prevent us from showing it via `get_revision()`. (In practice, `--stdin-packs=follow` walks commits down to the roots, but only opens up trees for ones that do not appear in an excluded pack.) But there are certain cases where we do need to see the parents of an object in an excluded pack. Namely, if an object is rescue-able, but only reachable from object(s) which appear in excluded packs, then commit simplification will exclude those commits from the object traversal, and we will never see a copy of that object, and thus not rescue it. This is what causes the failure in the previous commit during repacking. When performing a geometric repack, packs above the geometric split that weren't part of the previous MIDX (e.g., packs pushed directly into `$GIT_DIR/objects/pack`) may not have full object closure. When those packs are listed as excluded via the '^' marker, the reachability traversal encounters the sequence described above, and may miss objects which we expect to rescue with `--stdin-packs=follow`. Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed packs, objects from '!'-prefixed packs are excluded from the resulting pack. But unlike '^', commits in '!'-prefixed packs are used as starting points for the follow traversal, and the traversal does not treat them as a closure boundary. In order to distinguish excluded-closed from excluded-open packs during the traversal, introduce a new `pack_keep_in_core_open` bit on `struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN` flag for the kept-pack cache. In `add_object_entry_from_pack()`, move the `want_object_in_pack()` check to after `add_pending_oid()`. This is necessary so that commits from excluded-open packs are added as traversal tips even though their objects won't appear in the output. As a consequence, the caller `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we are able to drop the "if (p)" conditional. The `include_check` and `include_check_obj` callbacks on `rev_info` are used to halt the walk at closed-excluded packs, since objects behind a '^' boundary are guaranteed to have closure and need not be rescued. The following commit will make use of this new functionality within the repack layer to resolve the test failure demonstrated in the previous commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:40 -07:00
Taylor Blau	d31d1f2e06	pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` The '--stdin-packs' mode of pack-objects maintains two separate string_lists: one for included packs, and one for excluded packs. Each list stores the pack basename as a string and the corresponding `packed_git` pointer in its `->util` field. This works, but makes it awkward to extend the set of pack "kinds" that pack-objects can accept via stdin, since each new kind would need its own string_list and duplicated handling. A future commit will want to do just this, so prepare for that change by handling the various "kinds" of packs specified over stdin in a more generic fashion. Namely, replace the two `string_list`s with a single `strmap` keyed on the pack basename, with values pointing to a new `struct stdin_pack_info`. This struct tracks both the `packed_git` pointer and a `kind` bitfield indicating whether the pack was specified as included or excluded. Extract the logic for sorting packs by mtime and adding their objects into a separate `stdin_packs_add_pack_entries()` helper. While we could have used a `string_list`, we must handle the case where the same pack is specified more than once. With a `string_list` only, we would have to pay a quadratic cost to either (a) insert elements into their sorted positions, or (b) a repeated linear search, which is accidentally quadratic. For that reason, use a strmap instead. This patch does not include any functional changes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:39 -07:00
Taylor Blau	81e2906437	pack-objects: plug leak in `read_stdin_packs()` The `read_stdin_packs()` function added originally via `339bce27f4` (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22) declares a `rev_info` struct but neglects to call `release_revisions()` on it before returning, creating the potential for a leak. The related change in `97ec43247c` (pack-objects: declare 'rev_info' for '--stdin-packs' earlier, 2025-06-23) carried forward this oversight and did not address it. Ensure that we call `release_revisions()` appropriately to prevent a potential leak from this function. Note that in practice our `rev_info` here does not have a present leak, hence t5331 passes cleanly before this commit, even when built with SANITIZE=leak. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:39 -07:00
Junio C Hamano	8023abc632	Merge branch 'ps/upload-pack-buffer-more-writes' Reduce system overhead "git upload-pack" spends on relaying "git pack-objects" output to the "git fetch" running on the other end of the connection. * ps/upload-pack-buffer-more-writes: builtin/pack-objects: reduce lock contention when writing packfile data csum-file: drop `hashfd_throughput()` csum-file: introduce `hashfd_ext()` sideband: use writev(3p) to send pktlines wrapper: introduce writev(3p) wrappers compat/posix: introduce writev(3p) wrapper upload-pack: reduce lock contention when writing packfile data upload-pack: prefer flushing data over sending keepalive upload-pack: adapt keepalives based on buffering upload-pack: fix debug statement when flushing packfile data	2026-03-24 12:31:34 -07:00
Patrick Steinhardt	cfd575f0a9	odb: introduce `struct odb_for_each_object_options` The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:41 -07:00
Patrick Steinhardt	835e0aaf6f	builtin/pack-objects: reduce lock contention when writing packfile data When running `git pack-objects --stdout` we feed the data through `hashfd_ext()` with a progress meter and a smaller-than-usual buffer length of 8kB so that we can track throughput more granularly. But as packfiles tend to be on the larger side, this small buffer size may cause a ton of write(3p) syscalls. Originally, the buffer we used in `hashfd()` was 8kB for all use cases. This was changed though in `2ca245f8be` (csum-file.h: increase hashfile buffer size, 2021-05-18) because we noticed that the number of writes can have an impact on performance. So the buffer size was increased to 128kB, which improved performance a bit for some use cases. But the commit didn't touch the buffer size for `hashd_throughput()`. The reasoning here was that callers expect the progress indicator to update frequently, and a larger buffer size would of course reduce the update frequency especially on slow networks. While that is of course true, there was (and still is, even though it's now a call to `hashfd_ext()`) only a single caller of this function in git-pack-objects(1). This command is responsible for writing packfiles, and those packfiles are often on the bigger side. So arguably: - The user won't care about increments of 8kB when packfiles tend to be megabytes or even gigabytes in size. - Reducing the number of syscalls would be even more valuable here than it would be for multi-pack indices, which was the benchmark done in the mentioned commit, as MIDXs are typically significantly smaller than packfiles. - Nowadays, many internet connections should be able to transfer data at a rate significantly higher than 8kB per second. Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`, which translates to ~64kB. This limit was chosen because `git pack-objects --stdout` is most often used when sending packfiles via git-upload-pack(1), where packfile data is chunked into pktlines when using the sideband. Furthermore, most internet connections should have a bandwidth signifcantly higher than 64kB/s, so we'd still be able to observe progress updates at a rate of at least once per second. This change significantly reduces the number of write(3p) syscalls from 355,000 to 44,000 when packing the Linux repository. While this results in a small performance improvement on an otherwise-unused system, this improvement is mostly negligible. More importantly though, it will reduce lock contention in the kernel on an extremely busy system where we have many processes writing data at once. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-13 08:54:15 -07:00
Patrick Steinhardt	2bf8f36ddb	csum-file: drop `hashfd_throughput()` The `hashfd_throughput()` function is used by a single callsite in git-pack-objects(1). In contrast to `hashfd()`, this function uses a progress meter to measure throughput and a smaller buffer length so that the progress meter can provide more granular metrics. We're going to change that caller in the next commit to be a bit more specific to packing objects. As such, `hashfd_throughput()` will be a somewhat unfitting mechanism for any potential new callers. Drop the function and replace it with a call to `hashfd_ext()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-13 08:54:15 -07:00
Junio C Hamano	c89a495ce4	Merge branch 'ps/odb-sources' The object source API is getting restructured to allow plugging new backends. * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-12 14:09:07 -07:00
Junio C Hamano	6953f24e40	Merge branch 'rs/parse-options-duplicated-long-options' The parse-options API learned to notice an options[] array with duplicated long options. * rs/parse-options-duplicated-long-options: parseopt: check for duplicate long names and numerical options pack-objects: remove duplicate --stdin-packs definition	2026-03-10 14:23:19 -07:00
Patrick Steinhardt	d9ecf268ef	odb: embed base source in the "files" backend The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:15 -08:00
Patrick Steinhardt	cb506a8a69	odb: introduce "files" source Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:14 -08:00
Junio C Hamano	9eb5b3b999	Merge branch 'ps/odb-for-each-object' Revamp object enumeration API around odb. * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags	2026-03-02 17:06:50 -08:00
Junio C Hamano	aa95f87c74	Merge branch 'ps/for-each-ref-in-fixes' A handful of places used refs_for_each_ref_in() API incorrectly, which has been corrected. * ps/for-each-ref-in-fixes: bisect: simplify string_list memory handling bisect: fix misuse of `refs_for_each_ref_in()` pack-bitmap: fix bug with exact ref match in "pack.preferBitmapTips" pack-bitmap: deduplicate logic to iterate over preferred bitmap tips	2026-02-27 15:11:50 -08:00
René Scharfe	a5f2ff6ce8	pack-objects: remove duplicate --stdin-packs definition `cd846bacc7` (pack-objects: introduce '--stdin-packs=follow', 2025-06-23) added a new definition of the option --stdin-packs that accepts an argument. It kept the old definition, which still shows up in the short help, but is shadowed by the new one. Remove it. Hinted-at-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-27 11:38:25 -08:00
Junio C Hamano	b1f4b5888b	Merge branch 'ps/pack-concat-wo-backfill' "git pack-objects --stdin-packs" with "--exclude-promisor-objects" fetched objects that are promised, which was not wanted. This has been fixed. * ps/pack-concat-wo-backfill: builtin/pack-objects: don't fetch objects when merging packs	2026-02-25 11:54:18 -08:00
Junio C Hamano	703c97519d	Merge branch 'ps/odb-for-each-object' into ps/odb-sources * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags	2026-02-23 13:48:00 -08:00
Patrick Steinhardt	ed693078e9	pack-bitmap: deduplicate logic to iterate over preferred bitmap tips We have two locations that iterate over the preferred bitmap tips as configured by the user via "pack.preferBitmapTips". Both of these callsites are subtly wrong: when the preferred bitmap tips contain an exact refname match, then we will hit a `BUG()`. Prepare for the fix by unifying the two callsites into a new `for_each_preferred_bitmap_tip()` function. This removes the last callsite of `bitmap_preferred_tips()` outside of "pack-bitmap.c". As such, convert the function to be local to that file only. Note that the function is still used by a second caller, so we cannot just inline it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-19 10:41:18 -08:00
Junio C Hamano	354b8d89ac	Merge branch 'rs/clean-includes' Clean up redundant includes of header files. * rs/clean-includes: remove duplicate includes	2026-02-17 13:30:42 -08:00
Patrick Steinhardt	f4eff7116d	builtin/pack-objects: don't fetch objects when merging packs The "--stdin-packs" option can be used to merge objects from multiple packfiles given via stdin into a new packfile. One big upside of this option is that we don't have to perform a complete rev walk to enumerate objects. Instead, we can simply enumerate all objects that are part of the specified packfiles, which can be significantly faster in very large repositories. There is one downside though: when we don't perform a rev walk we also don't have a good way to learn about the respective object's names. As a consequence, we cannot use the name hashes as a heuristic to get better delta selection. We try to offset this downside though by performing a localized rev walk: we queue all objects that we're about to repack as interesting, and all objects from excluded packfiles as uninteresting. We then perform a best-effort rev walk that allows us to fill in object names. There is one gotcha here though: when "--exclude-promisor-objects" has not been given we will perform backfill fetches for any promised objects that are missing. This used to not be an issue though as this option was mutually exclusive with "--stdin-packs". But that has changed recently, and starting with `dcc9c7ef47` (builtin/repack: handle promisor packs with geometric repacking, 2026-01-05) we will now repack promisor packs during geometric compaction. The consequence is that a geometric repack may now perform a bunch of backfill fetches. We of course cannot pass "--exclude-promisor-objects" to fix this issue -- after all, the whole intent is to repack objects part of a promisor pack. But arguably we don't have to: the rev walk is intended as best effort, and we already configure it to ignore missing links to other objects. So we can adapt the walk to unconditionally disable fetching any missing objects. Do so and add a test that verifies we don't backfill any objects. Reported-by: Lukas Wanko <lwanko@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-11 09:44:09 -08:00
René Scharfe	10c68d2577	remove duplicate includes The following command reports that some header files are included twice: $ git grep '#include' '*.c' \| sort \| uniq -cd Remove the second #include line in each case, as it has no effect. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-08 15:03:06 -08:00
Amisha Chhajed	2e711acfbd	string-list: add string_list_sort_u() that mimics "sort -u" Many callsites of string_list_remove_duplicates() call it immdediately after calling string_list_sort(), understandably as the former requires string-list to be sorted, it is clear that these places are sorting only to remove duplicates and for no other reason. Introduce a helper function string_list_sort_u that combines these two calls that often appear together, to simplify these callsites. Replace the current calls of those methods with string_list_sort_u(). Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-29 09:32:50 -08:00
Patrick Steinhardt	dd097bbe29	builtin/pack-objects: use `packfile_store_for_each_object()` When enumerating objects that are supposed to be stored in a new cruft pack we use `for_each_packed_object()` and then derive each object's mtime individually. Refactor this logic to instead use the new `packfile_store_for_each_object()` function with an object info request that asks for the respective mtimes. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	bd1855b897	odb: rename `FOR_EACH_OBJECT_` flags Rename the `FOR_EACH_OBJECT_` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:06 -08:00
Junio C Hamano	070fa41675	Merge branch 'ps/geometric-repacking-with-promisor-remotes' "git repack --geometric" did not work with promisor packs, which has been corrected. * ps/geometric-repacking-with-promisor-remotes: builtin/repack: handle promisor packs with geometric repacking repack-promisor: extract function to remove redundant packs repack-promisor: extract function to finalize repacking repack-geometry: extract function to compute repacking split builtin/pack-objects: exclude promisor objects with "--stdin-packs"	2026-01-21 16:16:27 -08:00
Junio C Hamano	bc5cbbe246	Merge branch 'ps/read-object-info-improvements' The object-info API has been cleaned up. * ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info	2026-01-21 08:29:00 -08:00
Junio C Hamano	d627023d80	Merge branch 'ps/packfile-store-in-odb-source' The packfile_store data structure is moved from object store to odb source. * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source	2026-01-21 08:28:59 -08:00
Junio C Hamano	ec16dde5c8	Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-object * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs	2026-01-15 05:50:16 -08:00
Junio C Hamano	c8e1706e8d	Merge branch 'ps/read-object-info-improvements' into ps/odb-for-each-object * ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info	2026-01-15 05:47:47 -08:00
Patrick Steinhardt	0cd306ebc8	builtin/pack-objects: exclude promisor objects with "--stdin-packs" It is currently not possible to combine "--exclude-promisor-objects" with "--stdin-packs" because both flags want to set up a revision walk to enumerate the objects to pack. In a subsequent commit though we want to extend geometric repacks to support promisor objects, and for that we need to handle the combination of both flags. There are two cases we have to think about here: - "--stdin-packs" asks us to pack exactly the objects part of the specified packfiles. It is somewhat questionable what to do in the case where the user asks us to exclude promisor objects, but at the same time explicitly passes a promisor pack to us. For now, we simply abort the request as it is self-contradicting. As we have also been dying before this commit there is no regression here. - "--stdin-packs=follow" does the same as the first flag, but it also asks us to include all objects transitively reachable from any object in the packs we are about to repack. This is done by doing the revision walk mentioned further up. Luckily, fixing this case is trivial: we only need to modify the revision walk to also set the `exclude_promisor_objects` field. Note that we do not support the "--exclude-promisor-objects-best-effort" flag for now as we don't need it to support geometric repacking with promisor objects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-14 06:29:24 -08:00
Patrick Steinhardt	12d3b58b55	packfile: drop repository parameter from `packed_object_info()` The function `packed_object_info()` takes a packfile and offset and returns the object info for the corresponding object. Despite these two parameters though it also takes a repository pointer. This is redundant information though, as `struct packed_git` already has a repository pointer that is always populated. Drop the redundant parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-12 06:51:15 -08:00
Patrick Steinhardt	84f0e60b28	packfile: move packfile store into object source The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:07 -08:00
Patrick Steinhardt	085de91b95	packfile: refactor kept-pack cache to work with packfile stores The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-09 06:40:06 -08:00
René Scharfe	b6e4cc8c32	tag: support arbitrary repositories in parse_tag() Allow callers of parse_tag() pass in the repository to use. Let most of them pass in the_repository to get the same result as before. One of them has stopped using the_repository in `ef9b0370da` (sha1-name.c: store and use repo in struct disambiguate_state, 2019-04-16); let it pass in its stored repository. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-29 22:02:54 +09:00
Junio C Hamano	dbe54273a7	Merge branch 'ps/object-read-stream' The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`	2025-12-16 11:08:34 +09:00
Junio C Hamano	f1799202ea	Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-source * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`	2025-12-15 17:40:31 +09:00
Junio C Hamano	a545103244	Merge branch 'ps/object-source-loose' A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable	2025-11-24 15:46:41 -08:00
Patrick Steinhardt	7b94028652	streaming: drop redundant type and size pointers In the preceding commits we have turned `struct odb_read_stream` into a publicly visible structure. Furthermore, this structure now contains the type and size of the object that we are about to stream. Consequently, the out-pointers that we used before to propagate the type and size of the streamed object are now somewhat redundant with the data contained in the structure itself. Drop these out-pointers and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:46 -08:00
Patrick Steinhardt	1599b68d5e	streaming: move into object database subsystem The "streaming" terminology is somewhat generic, so it may not be immediately obvious that "streaming.{c,h}" is specific to the object database. Rectify this by moving it into the "odb/" directory so that it can be immediately attributed to the object subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:46 -08:00
Patrick Steinhardt	378ec56beb	streaming: refactor interface to be object-database-centric Refactor the streaming interface to be centered around object databases instead of centered around the repository. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:45 -08:00
Patrick Steinhardt	6bdda3a3b0	streaming: rename `git_istream` into `odb_read_stream` In the following patches we are about to make the `git_istream` more generic so that it becomes fully controlled by the specific object source that wants to create it. As part of these refactorings we'll fully move the structure into the object database subsystem. Prepare for this change by renaming the structure from `git_istream` to `odb_read_stream`. This mirrors the `odb_write_stream` structure that we already have. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:44 -08:00
Junio C Hamano	13134cecb0	Merge branch 'ps/ref-peeled-tags' Some ref backend storage can hold not just the object name of an annotated tag, but the object name of the object the tag points at. The code to handle this information has been streamlined. * ps/ref-peeled-tags: t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn`	2025-11-19 10:55:39 -08:00
Patrick Steinhardt	f898661637	refs: expose peeled object ID via the iterator Both the "files" and "reftable" backend are able to store peeled values for tags in the respective formats. This allows for a more efficient lookup of the target object of such a tag without having to manually peel via the object database. The infrastructure to access these peeled object IDs is somewhat funky though. When iterating through objects, we store a pointer reference to the current iterator in a global variable. The callbacks invoked by that iterator are then expected to call `peel_iterated_oid()`, which checks whether the globally-stored iterator's current reference refers to the one handed into that function. If so, we ask the iterator to peel the object, otherwise we manually peel the object via the object database. Depending on global state like this is somewhat weird and also quite fragile. Introduce a new `struct reference::peeled_oid` field that can be populated by the reference backends. This field can be accessed via a new function `reference_get_peeled_oid()` that either uses that value, if set, or alternatively peels via the ODB. With this change we don't have to rely on global state anymore, but make the peeled object ID available to the callback functions directly. Adjust trivial callers that already have a `struct reference` available. Remaining callers will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-04 07:32:25 -08:00
Patrick Steinhardt	bdbebe5714	refs: introduce wrapper struct for `each_ref_fn` The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is `e8207717f1` (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-04 07:32:24 -08:00

1 2 3 4 5 ...

714 Commits (ffaa2eddd07afa5a86daaf0f9fd8838fb283dc2d)