kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	ffaa2eddd0	Merge branch 'ds/path-walk-filters' The "git pack-objects --path-walk" traversal has been integrated with several object filters, including blobless and sparse filters. * ds/path-walk-filters: path-walk: support `combine` filter path-walk: support `object:type` filter path-walk: support `tree:0` filter t6601: tag otherwise-unreachable trees pack-objects: support sparse:oid filter with path-walk path-walk: add pl_sparse_trees to control tree pruning path-walk: support blob size limit filter backfill: die on incompatible filter options path-walk: support blobless filter path-walk: always emit directly-requested objects t/perf: add pack-objects filter and path-walk benchmark pack-objects: pass --objects with --path-walk t5620: make test work with path-walk var	2026-06-02 16:15:29 +09:00
Junio C Hamano	7b3ab91768	Merge branch 'jk/connect-service-enum' The "name" argument in git_connect() and related functions has been converted to a "service" enum to improve type safety and clarify its purpose. * jk/connect-service-enum: transport-helper: fix typo in BUG() message connect: use "service" enum for "name" argument	2026-06-02 16:15:28 +09:00
Junio C Hamano	33da2f4d3b	Merge branch 'sa/cat-file-batch-mailmap-switch' "git cat-file --batch" learns an in-line command "mailmap" that lets the user toggle use of mailmap. * sa/cat-file-batch-mailmap-switch: cat-file: add mailmap subcommand to --batch-command	2026-05-31 10:00:38 +09:00
Junio C Hamano	4d11b9c218	Merge branch 'pt/fsmonitor-linux' The fsmonitor daemon has been implemented for Linux. * pt/fsmonitor-linux: fsmonitor: convert shown khash to strset in do_handle_client fsmonitor: add tests for Linux fsmonitor: add timeout to daemon stop command fsmonitor: close inherited file descriptors and detach in daemon run-command: add close_fd_above_stderr option fsmonitor: implement filesystem change listener for Linux fsmonitor: rename fsm-settings-darwin.c to fsm-settings-unix.c fsmonitor: rename fsm-ipc-darwin.c to fsm-ipc-unix.c fsmonitor: use pthread_cond_timedwait for cookie wait compat/win32: add pthread_cond_timedwait fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon fsmonitor: fix khash memory leak in do_handle_client t9210, t9211: disable GIT_TEST_SPLIT_INDEX for scalar clone tests	2026-05-31 10:00:38 +09:00
Junio C Hamano	d2c01318b0	Merge branch 'jr/bisect-custom-terms-in-output' "git bisect" now uses the selected terms (e.g., old/new) more consistently in its output. * jr/bisect-custom-terms-in-output: rev-parse: use selected alternate terms to look up refs bisect: print bisect terms in single quotes bisect: use selected alternate terms in status output	2026-05-31 10:00:37 +09:00
Junio C Hamano	455ff75d35	Merge branch 'ps/setup-wo-the-repository' Many uses of the_repository have been updated to use a more appropriate struct repository instance in the setup.c codepath. * ps/setup-wo-the-repository: setup: stop using `the_repository` in `init_db()` setup: stop using `the_repository` in `create_reference_database()` setup: stop using `the_repository` in `initialize_repository_version()` setup: stop using `the_repository` in `check_repository_format()` setup: stop using `the_repository` in `upgrade_repository_format()` setup: stop using `the_repository` in `setup_git_directory()` setup: stop using `the_repository` in `setup_git_directory_gently()` setup: stop using `the_repository` in `setup_git_env()` setup: stop using `the_repository` in `set_git_work_tree()` setup: stop using `the_repository` in `setup_work_tree()` setup: stop using `the_repository` in `enter_repo()` setup: stop using `the_repository` in `verify_non_filename()` setup: stop using `the_repository` in `verify_filename()` setup: stop using `the_repository` in `path_inside_repo()` setup: stop using `the_repository` in `prefix_path()` setup: stop using `the_repository` in `is_inside_work_tree()` setup: stop using `the_repository` in `is_inside_git_dir()` setup: replace use of `the_repository` in static functions	2026-05-27 14:15:46 +09:00
Junio C Hamano	2f952b81ed	Merge branch 'jt/odb-transaction-write' ODB transaction interface is being reworked to explicitly handle object writes. * jt/odb-transaction-write: odb/transaction: make `write_object_stream()` pluggable object-file: generalize packfile writes to use odb_write_stream object-file: avoid fd seekback by checking object size upfront object-file: remove flags from transaction packfile writes odb: update `struct odb_write_stream` read() callback odb/transaction: use pluggable `begin_transaction()` odb: split `struct odb_transaction` into separate header	2026-05-27 14:15:45 +09:00
Junio C Hamano	8b5873a1f2	Merge branch 'tb/incremental-midx-part-3.3' The repacking code has been refactored, compaction of MIDX layers has been implemented, and an incremental strategy that does not require all-into-one repacking has been introduced. * tb/incremental-midx-part-3.3: repack: allow `--write-midx=incremental` without `--geometric` repack: introduce `--write-midx=incremental` repack: implement incremental MIDX repacking packfile: ensure `close_pack_revindex()` frees in-memory revindex builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK` repack-geometry: prepare for incremental MIDX repacking repack-midx: extract `repack_fill_midx_stdin_packs()` repack-midx: factor out `repack_prepare_midx_command()` midx: expose `midx_layer_contains_pack()` repack: track the ODB source via existing_packs midx: support custom `--base` for incremental MIDX writes midx: introduce `--no-write-chain-file` for incremental MIDX writes midx: use `strvec` for `keep_hashes` midx: build `keep_hashes` array in order midx: use `strset` for retained MIDX files midx-write: handle noop writes when converting incremental chains	2026-05-27 14:15:45 +09:00
Junio C Hamano	1103041f34	Merge branch 'ds/fetch-negotiation-options' The negotiation tip options in "git fetch" have been reworked to allow requiring certain refs to be sent as "have" lines, and to restrict negotiation to a specific set of refs. * ds/fetch-negotiation-options: send-pack: pass negotiation config in push remote: add remote.*.negotiationInclude config fetch: add --negotiation-include option for negotiation negotiator: add have_sent() interface remote: add remote.*.negotiationRestrict config transport: rename negotiation_tips fetch: add --negotiation-restrict option t5516: fix test order flakiness	2026-05-27 14:15:45 +09:00
Junio C Hamano	9020a116d6	Merge branch 'kk/merge-octopus-optim' The logic to determine that branches in an octopus merge are independent has been optimized. * kk/merge-octopus-optim: merge: use repo_in_merge_bases for octopus up-to-date check	2026-05-27 14:15:44 +09:00
Junio C Hamano	6d2ba7ead7	Merge branch 'en/batch-prefetch' In a lazy clone, "git cherry" and "git grep" often fetch necessary blob objects one by one from promisor remotes. It has been corrected to collect necessary object names and fetch them in bulk to gain reasonable performance. * en/batch-prefetch: grep: prefetch necessary blobs builtin/log: prefetch necessary blobs for `git cherry` patch-ids.h: add missing trailing parenthesis in documentation comment promisor-remote: document caller filtering contract	2026-05-27 14:15:44 +09:00
Junio C Hamano	0d5b240d73	Merge branch 'kk/paint-down-to-common-optim' "git merge-base" optimization. * kk/paint-down-to-common-optim: commit-reach: early exit paint_down_to_common for single merge-base commit-reach: introduce merge_base_flags enum	2026-05-25 09:40:07 +09:00
Derrick Stolee	2dc858e69e	pack-objects: support sparse:oid filter with path-walk The --filter=sparse:<oid> option to 'git pack-objects' allows focusing an object set to a sparse-checkout definition. This reduces the set of matching blobs while retaining all reachable trees. No server currently supports fetching with this filter because it is expensive to compute and reachability bitmaps do not help without a significant effort to extend the bitmap feature to store bitmaps for each supported sparse- checkout definition. Without focusing on serving fetches and clones with these filters, there are still benefits that could be realized by making this faster. With the sparse index, it's more realistic now than ever to be able to operate a local clone that was bootstrapped by a packfile created with a sparse filter, because the missing trees are not needed to move a sparse-checkout from one commit to another or to view the history of any path in scope. Such clones could perhaps be bootstrapped by partial bundles. Previously, constructing these sparse packs has been incredibly computationally inefficient. The revision walk that explores which objects are in scope spends a lot of time checking each object to see if it matches the sparse-checkout patterns, causing quadratic behavior (number of objects times number of sparse-checkout patterns). This improves somewhat when using cone-mode sparse-checkout patterns that can use hashtables and prefix matches to determine containment. However, the check per object is still too expensive for most cases. This is where the path-walk feature comes in. We can proceed as normal by placing objects in bins by path and _then_ check a group of objects all at once. Since sparse:<oid> only restricts blobs, the path-walk must include all reachable trees while using the cone-mode patterns to skip blobs at paths outside the sparse scope. 
This establishes a baseline for a potential future "treesparse:<oid>" filter that would also restrict trees, but introducing such a new filter is deferred to a later change. The implementation here is focused around loading the sparse-checkout patterns from the provided object ID and checking that the patterns are indeed cone-mode patterns. We can then load the correct pattern list into the path walk context and use the logic that already exists from `bff4555767` (backfill: add --sparse option, 2025-02-03), though that feature loads sparse-checkout patterns from the worktree's local settings and also restricts tree objects. We use a combination of errors and warnings to signal problems during this load. The difference is that errors are likely fatal for the non-path-walk version while the warnings are probably just implementation details for the path-walk version and the 'git pack-objects' command can fall back to the revision walk version. Now that the SEEN flag is deferred until after pattern checks (from the previous commit), handle the case where a tree with a shared OID appears at both an out-of-cone and in-cone path. When trees are not being pruned (pl_sparse_trees == 0), the path-walk re-walks the tree at the in-cone path so that in-cone blobs within it are discovered. The new tests in t5317 and t6601 demonstrate this behavior and would fail without these changes. The performance test p5315 shows the impact of this change when using sparse filters:
Test                                            HEAD~1    HEAD
----------------------------------------------------------------------
5315.10: repack (sparse:oid)                    77.98     77.47   -0.7%
5315.11: repack size (sparse:oid)               187.5M    187.4M  -0.0%
5315.12: repack (sparse:oid, --path-walk)       77.91     31.41   -59.7%
5315.13: repack size (sparse:oid, --path-walk)  187.5M    161.1M  -14.1%
These performance tests were run on the Git repository. 
The --path-walk feature shows meaningful space savings (14% smaller for sparse packs) and dramatic time savings (60% faster) by leveraging the path-walk's ability to skip blobs outside the sparse scope. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
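The binning idea in 2dc858e69e (group objects by path first, then test the sparse scope once per group) can be illustrated with a toy model. This is only a sketch, not Git's implementation: `select_blobs` and `path_in_scope` are hypothetical names, and a simple prefix test stands in for cone-mode pattern matching.

```python
from collections import defaultdict


def select_blobs(objects, path_in_scope):
    """Bin (path, oid) pairs by path, then test each path only once.

    Returns the kept object IDs and the number of scope checks performed.
    """
    bins = defaultdict(list)
    for path, oid in objects:
        bins[path].append(oid)

    kept = []
    checks = 0
    for path, oids in bins.items():
        checks += 1  # one scope check per path, not one per object
        if path_in_scope(path):
            kept.extend(oids)
    return kept, checks


# Three versions of src/a.c and one version of docs/x (hypothetical data).
objects = [("src/a.c", "1"), ("src/a.c", "2"), ("docs/x", "3"), ("src/a.c", "4")]
kept, checks = select_blobs(objects, lambda path: path.startswith("src/"))
```

With four objects spread over two paths, the scope predicate runs twice instead of four times, which is the effect the commit message attributes to checking a group of objects all at once.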
Derrick Stolee	8ff8de7616	path-walk: add pl_sparse_trees to control tree pruning The path-walk API prunes trees and blobs when a sparse-checkout pattern list is provided, which is the correct behavior for 'git backfill --sparse' since it only needs to fill in objects at paths within the sparse cone. However, a future change will use the path-walk API with a sparse:<oid> filter that restricts only blobs while retaining all reachable trees. To support both behaviors, add a 'pl_sparse_trees' flag to path_walk_info. When set (as in 'git backfill --sparse' and the --stdin-pl test helper mode), the sparse patterns prune both trees and blobs. When unset, only blobs are filtered and all trees are walked and reported. Additionally, move the SEEN flag assignment in add_tree_entries() to after the sparse pattern and pathspec checks. Previously, SEEN was set immediately upon discovering an object, before checking whether its path matched the sparse patterns. When the same object ID appeared at multiple paths (e.g. sibling directories with identical contents), the first path to be visited would mark the object as SEEN. If that path was outside the sparse cone, the object would be skipped there but also never discovered at its in-cone path. By deferring the SEEN flag until after the checks pass, objects that are skipped due to sparse filtering remain discoverable at other paths where they may be in scope. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
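The SEEN ordering problem described in 8ff8de7616 can be reproduced with a small model. The sketch below is not the code of add_tree_entries(): `walk`, `in_cone`, and the entry list are hypothetical stand-ins, assuming a flat list of (path, oid) pairs visited in order.

```python
def walk(entries, in_cone, mark_seen_early):
    """Return the (path, oid) pairs reported by a toy path walk.

    mark_seen_early=True models the old ordering (SEEN set on discovery);
    mark_seen_early=False models the new ordering (SEEN set after the checks).
    """
    seen = set()
    reported = []
    for path, oid in entries:
        if oid in seen:
            continue  # already marked, never revisited
        if mark_seen_early:
            seen.add(oid)
        if not in_cone(path):
            continue  # filtered out by the sparse patterns
        seen.add(oid)
        reported.append((path, oid))
    return reported


def in_cone(path):
    # Hypothetical cone: only src/ is in scope.
    return path.startswith("src/")


# Sibling directories with identical contents share one object ID.
entries = [("docs/", "abc123"), ("src/", "abc123")]

old = walk(entries, in_cone, mark_seen_early=True)
new = walk(entries, in_cone, mark_seen_early=False)
```

In this model the old ordering reports nothing, because the out-of-cone path marks the shared object first, while the deferred ordering still reports it at `src/`.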
Derrick Stolee	f1b5d3da16	path-walk: support blob size limit filter Extend the path-walk API to handle the 'blob:limit=<size>' object filter natively. This filter omits blobs whose size is equal to or greater than the given limit, matching the semantics used by the list-objects-filter machinery. When revs->filter.choice is LOFC_BLOB_LIMIT, the prepare_filters() method stores the limit value in info->blob_limit and clears the filter from revs. If the limit is zero, this degenerates to blob:none (all blobs excluded), so info->blobs is set to 0 instead. During walk_path(), blob batches are filtered before being delivered to the callback: each blob's size is checked via odb_read_object_info(), and only blobs strictly smaller than the limit are included. Blobs whose size cannot be determined (e.g. missing in a partial clone) are conservatively included, matching the existing filter behavior. Empty batches after filtering are skipped entirely. The check for inclusion in the path batch looks a little strange at first glance. We use odb_read_object_info() to read the object's size. Based on all of the assumptions to this point, this _should_ return OBJ_BLOB. Since we are focused on the size filter, we use a short-circuited OR (||) to skip the size check if that method returns a different object type. Notice that this inspection of object sizes requires the content to be present in the repository. The odb_read_object_info() call will download a missing blob on-demand. This means that the use of the path-walk API within 'git backfill' would not operate nicely with this filter type. The intention of that command is to download missing blobs in batches. Downloading objects one-by-one would go against the point. Update the validation in 'git backfill' to add its own compatibility check on top of path_walk_filter_compatible(). 
Add tests for blob:limit=0 (equivalent to blob:none) and blob:limit=3 (which exercises partial filtering within a batch where some blobs are kept and others are excluded). Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
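The batch filtering rules stated in f1b5d3da16 can be summarized in a short model. This is a sketch under stated assumptions rather than the walk_path() code: the function names are hypothetical, a dict stands in for odb_read_object_info(), and a missing dict entry models a blob whose size cannot be determined.

```python
def filter_blob_batch(oids, sizes, limit):
    """Apply blob:limit=<limit> to one batch of blob IDs."""
    if limit == 0:
        return []  # a zero limit degenerates to blob:none
    kept = []
    for oid in oids:
        size = sizes.get(oid)  # None: size could not be determined
        if size is None or size < limit:
            kept.append(oid)  # unknown sizes are conservatively kept
    return kept


def batches_to_deliver(batches, sizes, limit):
    """Filter every per-path batch and drop the ones that end up empty."""
    delivered = {}
    for path, oids in batches.items():
        kept = filter_blob_batch(oids, sizes, limit)
        if kept:
            delivered[path] = kept
    return delivered


sizes = {"a": 2, "b": 3, "c": 10}  # hypothetical blob sizes in bytes
mixed = filter_blob_batch(["a", "b", "c", "gone"], sizes, 3)
none = filter_blob_batch(["a", "b", "c"], sizes, 0)
delivered = batches_to_deliver({"small.txt": ["a"], "big.bin": ["c"]}, sizes, 3)
```

With a limit of 3, blob `b` (exactly 3 bytes) is omitted because only blobs strictly smaller than the limit pass, and the `big.bin` batch disappears entirely once it becomes empty.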
Derrick Stolee	bf24de4b7c	backfill: die on incompatible filter options The 'git backfill' command uses the path-walk API in a critical way: it uses the objects output from the command to find the batches of missing objects that should be requested from the server. Unlike 'git pack-objects', we cannot fall back to another mechanism. The previous change added the path_walk_filter_compatible() method that we can reuse here. Use it during argument validation in cmd_backfill(). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Derrick Stolee	6d87f0e8a3	path-walk: support blobless filter The 'git pack-objects' command can opt-in to using the path-walk API for scanning the objects. Currently, this option is dynamically disabled if combined with '--filter=<X>', even when using a simple filter such as 'blob:none' to signal a blobless packfile. This is a common scenario for repos at scale, so is worth integrating. Also, users can opt-in to the '--path-walk' option by default through the pack.usePathWalk=true config option. When using that in a blobless partial clone, the following warning can appear even though the user did not specify either option directly: warning: cannot use --filter with --path-walk Teach the path-walk API to handle the 'blob:none' object filter natively. When revs->filter.choice is LOFC_BLOB_NONE, the path-walk sets info->blobs to 0 (skipping all blob objects) and clears the filter from revs so that prepare_revision_walk() does not reject the configuration. This check is implemented in the static prepare_filters() method, which will simultaneously check if the input filters are compatible and will make the appropriate mutations to the path_walk_info and filters if the path_walk_info is non-NULL. This allows us to use this logic both in the API method path_walk_filter_compatible() for use in builtin/pack-objects.c and as a prep step in walk_objects_by_path(). Update the test helper (test-path-walk) to accept --filter=<spec> as a test-tool option (before '--'), applying it to revs after setup_revisions() to avoid the --objects requirement check. We can also revert recent GIT_TEST_PACK_PATH_WALK overrides in t5620. Also switch test-path-walk from REV_INFO_INIT with manual repo assignment to repo_init_revisions(), which properly initializes the filter_spec strbuf needed for filter parsing. Add tests for blob:none with --all and with a single branch. 
The performance test p5315 shows the impact of this change when using blobless filters:
Test                                          HEAD~1    HEAD
---------------------------------------------------------------------
5315.6: repack (blob:none)                    13.53     13.87   +2.5%
5315.7: repack size (blob:none)               137.7M    137.8M  +0.1%
5315.8: repack (blob:none, --path-walk)       13.51     23.43   +73.4%
5315.9: repack size (blob:none, --path-walk)  137.7M    115.2M  -16.3%
These performance tests were run on the Git repository. The --path-walk feature shows meaningful space savings (16% smaller for blobless packs) at the cost of increased computation time due to the two compression passes. This data demonstrates that the feature is engaged and provides real compression benefits when --no-reuse-delta forces fresh deltas. Co-Authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Derrick Stolee	35567889ef	pack-objects: pass --objects with --path-walk When 'git pack-objects' has the --path-walk option enabled, it uses a different set of revision walk parameters than normal. For one, --objects was previously assumed by the path-walk API and could be omitted. We also needed --boundary to allow discovering UNINTERESTING objects to use as delta bases. We will be updating the path-walk API soon to work with some filter options. However, the revision machinery will trigger a fatal error: fatal: object filtering requires --objects The fix is easy: add the --objects option as an argument. This has no effect on the path-walk API but does simplify the revision option parsing for the objects filter. We can remove the comment about "removing" the options because they were never removed and instead not added. We still need to disable using bitmaps. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Junio C Hamano	686213114e	Merge branch 'mm/git-url-parse' The internal URL parsing logic has been made accessible via a new subcommand "git url-parse". * mm/git-url-parse: t9904: add tests for the new url-parse builtin doc: describe the url-parse builtin builtin: create url-parse command urlmatch: define url_parse function url: return URL_SCHEME_UNKNOWN instead of dying url: move scheme detection to URL header/source url: move url_is_local_not_ssh to url.h connect: rename enum protocol to url_scheme	2026-05-21 12:06:48 +09:00
Junio C Hamano	2a098fd2f6	Merge branch 'kn/refs-generic-helpers' Refactor service routines in the ref subsystem backends. * kn/refs-generic-helpers: refs: use peeled tag values in reference backends refs: add peeled object ID to the `ref_update` struct refs: move object parsing to the generic layer update-ref: handle rejections while adding updates update-ref: move `print_rejected_refs()` up refs: return `ref_transaction_error` from `ref_transaction_update()` refs: extract out reflog config to generic layer refs: introduce `ref_store_init_options` refs: remove unused typedef 'ref_transaction_commit_fn'	2026-05-21 12:06:47 +09:00
Derrick Stolee	6f37fecfed	remote: add remote.*.negotiationInclude config Add a new 'remote.<name>.negotiationInclude' multi-valued config option that provides default values for --negotiation-include when no --negotiation-include arguments are specified over the command line. This is a mirror of how 'remote.<name>.negotiationRestrict' specifies defaults for the --negotiation-restrict arguments. Each value is either an exact ref name or a glob pattern whose tips should always be sent as 'have' lines during negotiation. The config values are resolved through the same resolve_negotiation_include() codepath as the CLI options. This option is additive with the normal negotiation process: the negotiation algorithm still runs and advertises its own selected commits, but the refs matching the config are sent unconditionally on top of those heuristically selected commits. Similar to the negotiationRestrict config, an empty value resets the value list to allow ignoring earlier config values, such as those that might be set in system or global config. Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:33:24 +09:00
Derrick Stolee	e2164742c9	fetch: add --negotiation-include option for negotiation Add a new --negotiation-include option to 'git fetch', which ensures that certain ref tips are always sent as 'have' lines during fetch negotiation, regardless of what the negotiation algorithm selects. This is useful when the repository has a large number of references, so the normal negotiation algorithm truncates the list. This is especially important in repositories with long parallel commit histories. For example, a repo could have a 'dev' branch for development and a 'release' branch for released versions. If the 'dev' branch isn't selected for negotiation, then it's not a big deal because there are many in-progress development branches with a shared history. However, if 'release' is not selected for negotiation, then the server may think that this is the first time the client has asked for that reference, causing a full download of its parallel commit history (and any extra data that may be unique to that branch). This is based on a real example where certain fetches would grow to 60+ GB when a release branch updated. This option is a complement to --negotiation-restrict, which reduces the negotiation ref set to a specific list. In the earlier example, using --negotiation-restrict to focus the negotiation to 'dev' and 'release' would avoid those problematic downloads, but would still not allow advertising potentially-relevant user branches. In this way, the 'include' version solves the problem I mention while allowing negotiation to pick other references opportunistically. The two options can also be combined to allow the best of both worlds. The argument may be an exact ref name or a glob pattern. Non-existent refs are silently ignored. This behavior is also updated in the ref matching logic for the related --negotiation-restrict option to match. The implementation outputs the requested objects as haves before the negotiator performs its own algorithm to choose the next haves. 
Use the new have_sent() interface to signal these have commits were sent before engaging with the negotiator's next() iterator. Also add --negotiation-include to 'git pull' passthrough options. Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:33:24 +09:00
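The ordering described in e2164742c9 (included tips are sent first, then the negotiator chooses further haves) can be modelled roughly as follows. This is a simplified sketch, not the fetch code: `build_haves` is a hypothetical name, the `sent` set loosely plays the role of the have_sent() notification, and skipping already-sent commits in the second loop is an assumption about how the negotiator reacts to it.

```python
def build_haves(include_tips, negotiator_picks):
    """Return the 'have' commits in the order they would be sent.

    include_tips: commits resolved from --negotiation-include.
    negotiator_picks: commits the negotiation algorithm would choose by itself.
    """
    haves = []
    sent = set()

    # Included tips go out unconditionally, before the negotiator runs.
    for tip in include_tips:
        if tip not in sent:
            sent.add(tip)
            haves.append(tip)

    # The negotiator then adds its own picks on top (assumed to skip
    # anything that was already sent).
    for tip in negotiator_picks:
        if tip not in sent:
            sent.add(tip)
            haves.append(tip)
    return haves


haves = build_haves(["release"], ["topic-1", "release", "topic-2"])
```

Here `release` is always advertised even if the negotiator would have truncated its own list before reaching it, which is the situation the commit message describes for long parallel histories.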
Derrick Stolee	8bb252f86c	remote: add remote.*.negotiationRestrict config In a previous change, the --negotiation-restrict command-line option of 'git fetch' was added as a synonym of --negotiation-tip. Both of these options restrict the set of 'haves' the client can send as part of negotiation. This was previously not available via a configuration option. Add a new 'remote.<name>.negotiationRestrict' multi-valued config option that updates 'git fetch <name>' to use these restrictions by default. If the user provides even one --negotiation-restrict argument, then the config is ignored. An empty value resets the value list to allow ignoring earlier config values, such as those that might be set in system or global config. Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:33:24 +09:00
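The precedence rules in 8bb252f86c (command-line arguments override the config, and an empty config value resets the list) can be sketched as below. The function name and the sample ref names are hypothetical; only the two rules themselves come from the commit message.

```python
def effective_restrict(config_values, cli_values):
    """Resolve the negotiation-restrict list for one fetch.

    config_values: remote.<name>.negotiationRestrict values in the order read
                   (system, then global, then local config).
    cli_values: --negotiation-restrict arguments from the command line.
    """
    if cli_values:
        return list(cli_values)  # even one CLI argument ignores the config

    result = []
    for value in config_values:
        if value == "":
            result = []  # an empty value discards everything read so far
        else:
            result.append(value)
    return result


config = ["refs/heads/old", "", "refs/heads/dev", "refs/heads/release"]
from_config = effective_restrict(config, [])
from_cli = effective_restrict(config, ["refs/heads/main"])
```

The empty entry drops `refs/heads/old`, which is how a local config can ignore a value inherited from system or global config.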
Derrick Stolee	4aef7dbb06	transport: rename negotiation_tips The previous change added the --negotiation-restrict synonym for the --negotiation-tip option for 'git fetch'. In anticipation of adding a new option that behaves similarly but with distinct changes to its behavior, rename the internal representation of this data from 'negotiation_tips' to 'negotiation_restrict_tips'. The 'tips' part is kept because this is an oid_array in the transport layer. This requires the builtin to handle parsing refs into collections of oids so the transport layer can handle this cleaner form of the data. Also update the string_list used to store the inputs from command-line options. Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:33:23 +09:00
Derrick Stolee	1a445fc60b	fetch: add --negotiation-restrict option The --negotiation-tip option to 'git fetch' and 'git pull' allows users to specify that they want to focus negotiation on a small set of references. This is a _restriction_ on the negotiation set, helping to focus the negotiation when the ref count is high. However, it doesn't allow for the ability to opportunistically select references beyond that list. This subtle detail that this is a 'maximum set' and not a 'minimum set' is not immediately clear from the option name. This makes it more complicated to add a new option that provides the complementary behavior of a minimum set. For now, create a new synonym option, --negotiation-restrict, that behaves identically to --negotiation-tip. Update the documentation to make it clear that this new name is the preferred option, but we keep the old name for compatibility. Mark --negotiation-tip as an alias of the new, preferred option. Update a few warning messages with the new option, but also make them translatable with the option name inserted by formatting. At least one of these messages will be reused later for a new option. Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:33:23 +09:00
Taylor Blau	06733a50ee	repack: allow `--write-midx=incremental` without `--geometric` Previously, `--write-midx=incremental` required `--geometric` and would die() without it. Relax this restriction so that incremental MIDX repacking can be used independently. Without `--geometric`, the behavior is append-only: a single new MIDX layer is created containing whatever packs were written by the repack and appended to the existing chain (or a new chain is started). Existing layers are preserved as-is with no compaction or merging. Implement this via a new repack_make_midx_append_plan() that builds a plan consisting of a WRITE step for the freshly written packs followed by COPY steps for every existing MIDX layer. The existing compaction plan (repack_make_midx_compaction_plan) is used only when `--geometric` is active. Update the documentation to describe the behavior with and without `--geometric`, and replace the test that enforced the old restriction with one exercising append-only incremental MIDX repacking. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:14 +09:00
Taylor Blau	938af89260	repack: introduce `--write-midx=incremental` Expose the incremental MIDX repacking mode (implemented in an earlier commit) via a new --write-midx=incremental option for `git repack`. Add "incremental" as a recognized argument to the --write-midx OPT_CALLBACK, mapping it to REPACK_WRITE_MIDX_INCREMENTAL. When this mode is active and --geometric is in use, set the midx_layer_threshold on the pack geometry so that only packs in sufficiently large tip layers are considered for repacking. Two new configuration options control the compaction behavior: - repack.midxSplitFactor (default: 2): the factor used in the geometric merging condition for MIDX layers. - repack.midxNewLayerThreshold (default: 8): the minimum number of packs in the tip MIDX layer before its packs are considered as candidates for geometric repacking. Add tests exercising the new mode across a variety of scenarios including basic geometric violations, multi-round chain integrity, branching and merging histories, cross-layer object uniqueness, and threshold-based compaction. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:14 +09:00
Taylor Blau	1da62fb5c8	repack: implement incremental MIDX repacking Implement the `write_midx_incremental()` function, which builds and maintains an incremental MIDX chain as part of the geometric repacking process. Unlike the default mode which writes a single flat MIDX, the incremental mode constructs a compaction plan that determines which MIDX layers to write, compact, or copy, and then executes each step using `git multi-pack-index` subcommands with the --no-write-chain-file flag. The repacking strategy works as follows: * Acquire the lock guarding the multi-pack-index-chain. * A new MIDX layer is always written containing the newly created pack(s). If the tip MIDX layer was rewritten during geometric repacking, any surviving packs from that layer are also included. * Starting from the new layer, adjacent MIDX layers are merged together as long as the accumulated object count exceeds half the object count of the next deeper layer (controlled by 'repack.midxSplitFactor'). * Remaining layers in the chain are evaluated pairwise and either compacted or copied as-is, following the same merging condition. * Write the contents of the new multi-pack-index chain, atomically move it into place, and then release the lock. * Delete any now-unused MIDX layers. After writing the new layer, the strategy is evaluated among the existing MIDX layers in order from oldest to newest. Each step that writes a new MIDX layer uses "--no-write-chain-file" to avoid updating the multi-pack-index-chain file. After all steps are complete, the new chain file is written and then atomically moved into place. At present, this functionality is exposed behind a new enum value, `REPACK_WRITE_MIDX_INCREMENTAL`, but has no external callers. A subsequent commit will expose this mode via `git repack --write-midx=incremental`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:14 +09:00
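The merging condition in 1da62fb5c8 (keep merging while the accumulated object count exceeds half the object count of the next deeper layer) can be sketched numerically. This is an illustrative model only: `plan_tip_merge` is a hypothetical name, and generalizing "half" to `accumulated * split_factor > next_count` is an assumption about how repack.midxSplitFactor enters the comparison.

```python
def plan_tip_merge(new_layer_objects, deeper_layers, split_factor=2):
    """Decide how many existing layers get merged into the new tip layer.

    new_layer_objects: object count of the freshly written layer.
    deeper_layers: object counts of existing layers, nearest to the tip first.
    Returns (number of layers merged, accumulated object count).
    """
    accumulated = new_layer_objects
    merged = 0
    for count in deeper_layers:
        # With split_factor=2 this reads: accumulated > count / 2.
        if accumulated * split_factor > count:
            accumulated += count
            merged += 1
        else:
            break  # the next deeper layer is large enough to stay separate
    return merged, accumulated


one_merge = plan_tip_merge(100, [150, 1000, 5000])
no_merge = plan_tip_merge(10, [100])
```

In the first call the 150-object layer is absorbed (200 > 150), but the merged 250 objects do not exceed half of 1000, so the walk stops there.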
Taylor Blau	d376967fbf	builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK` Change the --write-midx (-m) flag from an OPT_BOOL to an OPT_CALLBACK that accepts an optional mode argument. Introduce an enum with REPACK_WRITE_MIDX_NONE and REPACK_WRITE_MIDX_DEFAULT to distinguish between the two states, and update all existing boolean checks accordingly. For now, passing no argument (or just `-m`) selects the default mode, preserving existing behavior. A subsequent commit will add a new mode for writing incremental MIDXs. Extract repack_write_midx() as a dispatcher that selects the appropriate MIDX-writing implementation based on the mode. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:14 +09:00
Taylor Blau	f0ef2afb8b	repack: track the ODB source via existing_packs Store the ODB source in the `existing_packs` struct and use that in place of the raw `repo->objects->sources` access within `cmd_repack()`. The source used is still assigned from the first source in the list, so there are no functional changes in this commit. The changes instead serve two purposes (one immediate, one not): - The incremental MIDX-based repacking machinery will need to know what source is being used to read the existing MIDX/chain (should one exist). - In the future, if "git repack" is taught how to operate on other object sources, this field will serve as the authoritative value for that source. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:13 +09:00
Taylor Blau	0cd2255e64	midx: support custom `--base` for incremental MIDX writes Both `compact` and `write --incremental` fix the base of the resulting MIDX layer: `compact` always places the compacted result on top of "from's" immediate parent in the chain, and `write --incremental` always appends a new layer to the existing tip. In both cases the base is not configurable. Future callers need additional flexibility. For instance, the incremental MIDX-based repacking code may wish to write a layer based on some intermediate ancestor rather than the current tip, or produce a root layer when replacing the bottommost entries in the chain. Introduce a new `--base` option for both subcommands to specify the checksum of the MIDX layer to use as the base. The given checksum must refer to a valid layer in the MIDX chain that is an ancestor of the topmost layer being written or compacted. The special value "none" is accepted to produce a root layer with no parent. This will be needed when the incremental repacking machinery determines that the bottommost layers of the chain should be replaced. If no `--base` is given, behavior is unchanged: `compact` uses "from's" immediate parent in the chain, and `write` appends to the existing tip. For the `write` subcommand, `--base` requires `--no-write-chain-file`. A plain `write --incremental` appends a new layer to the live chain tip with no mechanism to atomically replace it; overriding the base would produce a layer that does not extend the tip, breaking chain invariants. With `--no-write-chain-file` the chain is left unmodified and the caller is responsible for assembling a valid chain. For `compact`, no such restriction applies. The compaction operation atomically replaces the compacted range in the chain file, so writing the result on top of any valid ancestor preserves chain invariants. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:13 +09:00
Taylor Blau	8d342ed4b5	midx: introduce `--no-write-chain-file` for incremental MIDX writes When writing an incremental MIDX layer, the MIDX machinery writes the new layer into the multi-pack-index.d directory and then updates the multi-pack-index-chain file to include the freshly written layer. Future callers however may not wish to immediately update the MIDX chain itself, preferring instead to write out new layer(s) themselves before atomically updating the chain. Concretely, the new incremental MIDX-based repacking strategy will want to do exactly this (that is, assemble the new MIDX chain itself before writing a new chain file and atomically linking it into place). Introduce a `--no-write-chain-file` flag that: * writes the new MIDX layer into the multi-pack-index.d directory * prints its checksum * does not update the multi-pack-index-chain file. The MIDX chain file (and thus, the lock protecting it) remain untouched, allowing callers to assemble the chain themselves. This flag requires `--incremental`, since the notion of a separate layer only makes sense for incremental MIDXs. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:13 +09:00
Junio C Hamano	f5fc0f53de	Merge branch 'sb/unpack-index-pack-buffer-resize' Use a larger buffer size in the code paths to ingest pack stream. * sb/unpack-index-pack-buffer-resize: index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB	2026-05-20 10:30:58 +09:00
Junio C Hamano	ca7d7d6424	Merge branch 'ps/history-fixup' "git history" learned "fixup" command. * ps/history-fixup: builtin/history: introduce "fixup" subcommand builtin/history: generalize function to commit trees replay: allow callers to control what happens with empty commits	2026-05-20 10:30:57 +09:00
Junio C Hamano	a6876b2068	Merge branch 'js/objects-larger-than-4gb-on-windows' Update code paths that assumed "unsigned long" was long enough for "size_t". * js/objects-larger-than-4gb-on-windows: ci: run expensive tests on push builds to integration branches t5608: mark >4GB tests as EXPENSIVE test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1 test-tool synthesize: precompute pack for 4 GiB + 1 test-tool synthesize: use the unsafe hash for speed t5608: add regression test for >4GB object clone test-tool: add a helper to synthesize large packfiles delta, packfile: use size_t for delta header sizes odb, packfile: use size_t for streaming object sizes git-zlib: handle data streams larger than 4GB index-pack, unpack-objects: use size_t for object size	2026-05-20 10:30:56 +09:00
Patrick Steinhardt	df69f40c34	setup: stop using `the_repository` in `init_db()` Stop using `the_repository` in `init_db()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:25 +09:00
Patrick Steinhardt	15053894cb	setup: stop using `the_repository` in `create_reference_database()` Stop using `the_repository` in `create_reference_database()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:25 +09:00
Patrick Steinhardt	779fbcd9eb	setup: stop using `the_repository` in `initialize_repository_version()` Stop using `the_repository` in `initialize_repository_version()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:25 +09:00
Patrick Steinhardt	f9210dbc8a	setup: stop using `the_repository` in `setup_git_directory()` Stop using `the_repository` in `setup_git_directory()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:25 +09:00
Patrick Steinhardt	a80a8e3ea6	setup: stop using `the_repository` in `setup_git_directory_gently()` Stop using `the_repository` in `setup_git_directory_gently()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	7a6a82fba0	setup: stop using `the_repository` in `set_git_work_tree()` Stop using `the_repository` in `set_git_work_tree()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Similar as with the preceding commit, we track whether the worktree has been initialized already via a global variable so that we can die in case the repository is re-initialized with a different worktree path. Store this info in the `struct repository` instead so that we correctly handle this per repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	bd2851d84f	setup: stop using `the_repository` in `setup_work_tree()` Stop using `the_repository` in `setup_work_tree()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Note that the function tracks two bits of information via global variables. This of course doesn't make much sense anymore now that we can set up worktrees for arbitrary repositories: - We track whether the worktree has already been initialized and, if so, we skip the call to `chdir_notify()` and setenv(3p). It does not make much sense to store this info in the repository, as we _would_ want to update the environment when switching between worktrees back and forth. So instead of storing this info in the repository, we drop this state entirely and live with the fact that we may execute the logic twice. It should ultimately be idempotent though and thus not be much of a problem. - We track whether the worktree configuration is bogus. If so, and if later on some caller tries to set up the worktree, then we'll die instead. This is indeed information that we can move into the repository itself. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	ea1d0f886d	setup: stop using `the_repository` in `enter_repo()` Stop using `the_repository` in `enter_repo()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	920dba4581	setup: stop using `the_repository` in `verify_non_filename()` Stop using `the_repository` in `verify_non_filename()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	6e7e50cc7b	setup: stop using `the_repository` in `verify_filename()` Stop using `the_repository` in `verify_filename()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	e6a380201e	setup: stop using `the_repository` in `path_inside_repo()` Stop using `the_repository` in `path_inside_repo()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	2c46e933fa	setup: stop using `the_repository` in `prefix_path()` Stop using `the_repository` in `prefix_path()` and instead accept the repository as a parameter. The injection of `the_repository` is thus bumped one level higher, where callers now pass it in explicitly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:24 +09:00
Patrick Steinhardt	8da5ecdb4d	setup: stop using `the_repository` in `is_inside_work_tree()` Similar as with the preceding commit, `is_inside_work_tree()` determines whether the current working directory is located inside the worktree of `the_repository`. Perform the same refactoring by dropping the caching mechanism and injecting the repository that shall be checked. Note that, same as in the preceding commit, we're also resolving the worktree path via `realpath()`. In theory this step is not necessary as we always set the worktree path via `repo_set_worktree()`, and that function already resolves the path for us. But resolving the path a second time is unlikely to matter performance-wise, and it feels fragile to rely on the repository's worktree path being absolute. We thus perform the same extra step even though it's ultimately not required. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:23 +09:00
Patrick Steinhardt	ce70cbc294	setup: stop using `the_repository` in `is_inside_git_dir()` The function `is_inside_git_dir()` verifies whether or not the current working directory is located inside the gitdir of `the_repository`. This is done by taking the gitdir path and verifying that it's a prefix of the current working directory. This information is cached so that we don't have to re-do this check multiple times. Furthermore, we proactively set the value in multiple locations so that we don't even have to perform the check when we have discovered the repository. While we could simply move the caching variable into the repository, the current layout doesn't really feel sensible in the first place: - It can easily lead to false positives or negatives if at any point in time we may switch the current working directory. - We don't call the function in a hot loop, and neither is it overly expensive to compute. Drop the caching infrastructure and instead compute the property ad-hoc via an injected repository. Note that there is one small gotcha: we often end up with relative gitdir paths, and if so `is_inside_dir()` might fail. This wasn't an issue before because of how we proactively set the cached value during repository discovery. Now that we stop doing that it becomes a problem though, which we work around by resolving the gitdir via `realpath()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 19:36:23 +09:00
Jeff King	3198237bf3	connect: use "service" enum for "name" argument The git_connect() function takes a "name" argument which is a bit confusing. It is _not_ the program to run on the remote repo, which is specified by the "prog" argument. It should instead be one of a few well-known strings specifying the type of operation (e.g., "git-upload-pack"). But to add to the confusion, unless otherwise configured, those well-known strings will also be the same as the programs we run, making it easy to mistake which variable is which. This confusion comes from `eaa0fd6584` (git_connect(): fix corner cases in downgrading v2 to v0, 2023-03-17), though in its defense, the term "name" and the use of a string are found in other connect code, going all the way back to `b236752a87` (Support remote archive from all smart transports, 2009-12-09). But let's see if we can clean things up a bit. The term "name" is overly vague. We use "service" in other places, including in the smart-http protocol, so let's use it here, too. Using a string invites the notion that it can be anything, not one of a defined set. Let's instead introduce an enum, which has the added bonus that the compiler can catch typos for us, rather than quietly choosing the wrong service from an unexpected strcmp() result. We do still have to turn our enum into those well-known strings to pass along in the remote-helper protocol (e.g., for a stateless-connect directive). But now we do so explicitly and in a way that I think is much more obvious to follow. This is a pure cleanup; there should be no behavior change. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-19 15:05:46 +09:00
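
The merging condition described in the "repack: implement incremental MIDX repacking" commit above (adjacent layers are merged while the accumulated object count exceeds half the object count of the next deeper layer) can be illustrated with a small sketch. This is not git's implementation: the function name `layers_to_merge`, its parameters, and the newest-to-oldest array layout are assumptions made only for illustration, and `split_factor` stands in for `repack.midxSplitFactor` (default 2 per the commit message).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): given the object count of the
 * freshly written layer and the object counts of the existing layers,
 * ordered from newest to oldest, return how many existing layers would
 * be folded into the new one.
 *
 * A layer is merged while
 *     accumulated * split_factor > objects in the next deeper layer
 * which, for split_factor == 2, matches "accumulated exceeds half the
 * object count of the next deeper layer". Multiplying instead of
 * dividing avoids integer truncation.
 */
size_t layers_to_merge(size_t new_objects,
		       const size_t *layer_objects, size_t nr,
		       size_t split_factor)
{
	size_t accumulated = new_objects;
	size_t merged = 0;

	while (merged < nr &&
	       accumulated * split_factor > layer_objects[merged]) {
		/* The merged layer's objects now count toward the total. */
		accumulated += layer_objects[merged];
		merged++;
	}

	return merged;
}
```

For example, with a new layer of 80 objects, existing layers of 100, 150, and 2000 objects (newest first), and a split factor of 2, the first two layers are merged (160 > 100, then 360 > 150) and the third is left alone (660 is not greater than 2000), so the function returns 2.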


13720 Commits (ffaa2eddd07afa5a86daaf0f9fd8838fb283dc2d)