kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	99bd5a5c9f	Merge branch 'tc/last-modified-active-paths-optimization' "git last-modified" was optimized by narrowing the set of paths to follow as it dug deeper in the history. * tc/last-modified-active-paths-optimization: last-modified: implement faster algorithm	2025-11-12 11:45:24 -08:00
Junio C Hamano	e569dced68	Merge branch 'cc/fast-import-export-i18n-cleanup' Messages from fast-import/export are now marked for i18n. * cc/fast-import-export-i18n-cleanup: gpg-interface: mark a string for translation fast-import: mark strings for translation fast-export: mark strings for translation gpg-interface: use left shift to define GPG_VERIFY_* gpg-interface: simplify ssh fingerprint parsing	2025-11-06 15:17:01 -08:00
Junio C Hamano	c8a641c590	Merge branch 'rz/t0450-bisect-doc-update' The help text and manual page of "git bisect" command have been made consistent with each other. * rz/t0450-bisect-doc-update: bisect: update usage and docs to match each other	2025-11-05 13:41:51 -08:00
Junio C Hamano	a9db6c66f5	Merge branch 'jt/repo-structure' "git repo structure", a new command. * jt/repo-structure: builtin/repo: add progress meter for structure stats builtin/repo: add keyvalue and nul format for structure stats builtin/repo: add object counts in structure output builtin/repo: introduce structure subcommand ref-filter: export ref_kind_from_refname() ref-filter: allow NULL filter pattern builtin/repo: rename repo_info() to cmd_repo_info()	2025-11-04 07:48:07 -08:00
Toon Claes	2a04e8c293	last-modified: implement faster algorithm The current implementation of git-last-modified(1) works by doing a revision walk, and inspecting the diff at each level of that walk to annotate entries remaining in the hashmap of paths. In other words, if the diff at some level touches a path which has not yet been associated with a commit, then that commit becomes associated with the path. While a perfectly reasonable implementation, it can perform poorly in either one of two scenarios: 1. There are many entries of interest, in which case there is simply a lot of work to do. 2. Or, there are (even a few) entries which have not been updated in a long time, and so we must walk through a lot of history in order to find a commit that touches that path. This patch rewrites the last-modified implementation that addresses the second point. The idea behind the algorithm is to propagate a set of 'active' paths (a path is 'active' if it does not yet belong to a commit) up to parents and do a truncated revision walk. The walk is truncated because it does not produce a revision for every change in the original pathspec, but rather only for active paths. More specifically, consider a priority queue of commits sorted by generation number. First, enqueue the set of boundary commits with all paths in the original spec marked as interesting. Then, while the queue is not empty, do the following: 1. Pop an element, say, 'c', off of the queue, making sure that 'c' isn't reachable by anything in the '--not' set. 2. For each parent 'p' (with index 'parent_i') of 'c', do the following: a. Compute the diff between 'c' and 'p'. b. Pass any active paths that are TREESAME from 'c' to 'p'. c. If 'p' has any active paths, push it onto the queue. 3. Any path that remains active on 'c' is associated to that commit. This ends up being equivalent to doing something like 'git log -1 -- $path' for each path simultaneously. But, it allows us to go much faster than the original implementation by limiting the number of diffs we compute, since we can avoid parts of history that would have been considered by the revision walk in the original implementation, but are known to be uninteresting to us because we have already marked all paths in that area to be inactive. To avoid computing many first-parent diffs, add another trick on top of this and check if all paths active in 'c' are DEFINITELY NOT in c's Bloom filter. Since the commit-graph only stores first-parent diffs in the Bloom filters, we can only apply this trick to first-parent diffs. Comparing the performance of this new algorithm shows about a 2.5x improvement on git.git: Benchmark 1: master no bloom Time (mean ± σ): 2.868 s ± 0.023 s [User: 2.811 s, System: 0.051 s] Range (min … max): 2.847 s … 2.926 s 10 runs Benchmark 2: master with bloom Time (mean ± σ): 949.9 ms ± 15.2 ms [User: 907.6 ms, System: 39.5 ms] Range (min … max): 933.3 ms … 971.2 ms 10 runs Benchmark 3: HEAD no bloom Time (mean ± σ): 782.0 ms ± 6.3 ms [User: 740.7 ms, System: 39.2 ms] Range (min … max): 776.4 ms … 798.2 ms 10 runs Benchmark 4: HEAD with bloom Time (mean ± σ): 307.1 ms ± 1.7 ms [User: 276.4 ms, System: 29.9 ms] Range (min … max): 303.7 ms … 309.5 ms 10 runs Summary HEAD with bloom ran 2.55 ± 0.02 times faster than HEAD no bloom 3.09 ± 0.05 times faster than master with bloom 9.34 ± 0.09 times faster than master no bloom In short, the existing implementation is comparably fast with Bloom filters as the new implementation is without Bloom filters. So, most repositories should get a dramatic speed-up by just deploying this (even without computing Bloom filters), and all repositories should get faster still when computing Bloom filters. When comparing a more extreme example of `git last-modified -- COPYING t`, the difference is even 5 times better: Benchmark 1: master Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s] Range (min … max): 4.308 s … 4.509 s 10 runs Benchmark 2: HEAD Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms] Range (min … max): 810.6 ms … 881.2 ms 10 runs Summary HEAD ran 5.29 ± 0.16 times faster than master As an added benefit, results are more consistent now. For example implementation in 'master' gives: $ git log --max-count=1 --format=%H -- pkt-line.h `15df15fe07` $ git last-modified -- pkt-line.h `15df15fe07` pkt-line.h $ git last-modified \| grep pkt-line.h `5b49c1af03` pkt-line.h With the changes in this patch the results of git-last-modified(1) always match those of `git log --max-count=1`. One thing to note though, the results might be outputted in a different order than before. This is not considerd to be an issue because nowhere is documented the order is guaranteed. Based-on-patches-by: Derrick Stolee <stolee@gmail.com> Based-on-patches-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Taylor Blau <me@ttaylorr.com> [jc: tweaked use of xcalloc() to unbreak coccicheck] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-03 07:25:41 -08:00
Junio C Hamano	3cf3369e81	Merge branch 'ps/maintenance-geometric' "git maintenance" command learns the "geometric" strategy where it avoids doing maintenance tasks that rebuilds everything from scratch. * ps/maintenance-geometric: t7900: fix a flaky test due to git-repack always regenerating MIDX builtin/maintenance: introduce "geometric" strategy builtin/maintenance: make "gc" strategy accessible builtin/maintenance: extend "maintenance.strategy" to manual maintenance builtin/maintenance: run maintenance tasks depending on type builtin/maintenance: improve readability of strategies builtin/maintenance: don't silently ignore invalid strategy builtin/maintenance: make the geometric factor configurable builtin/maintenance: introduce "geometric-repack" task builtin/gc: make `too_many_loose_objects()` reusable without GC config builtin/gc: remove global `repack` variable	2025-11-03 06:49:55 -08:00
Junio C Hamano	be414e17e5	Merge branch 'rz/bisect-help-unknown' "git bisect" command did not react correctly to "git bisect help" and "git bisect unknown", which has been corrected. * rz/bisect-help-unknown: bisect: fix handling of `help` and invalid subcommands	2025-10-30 08:00:20 -07:00
Junio C Hamano	5554738038	Merge branch 'ps/remove-packfile-store-get-packs' Two slightly different ways to get at "all the packfiles" in API has been cleaned up. * ps/remove-packfile-store-get-packs: packfile: rename `packfile_store_get_all_packs()` packfile: introduce macro to iterate through packs packfile: drop `packfile_store_get_packs()` builtin/grep: simplify how we preload packs builtin/gc: convert to use `packfile_store_get_all_packs()` object-name: convert to use `packfile_store_get_all_packs()`	2025-10-30 08:00:19 -07:00
Junio C Hamano	923436e23d	Merge branch 'ey/commit-graph-changed-paths-config' A new configuration variable commitGraph.changedPaths allows to turn "--changed-paths" on by default for "git commit-graph". * ey/commit-graph-changed-paths-config: commit-graph: add new config for changed-paths & recommend it in scalar	2025-10-30 08:00:19 -07:00
Christian Couder	c295115ec6	fast-import: mark strings for translation Some error or warning messages in "builtin/fast-import.c" are marked for translation, but many are not. To be more consistent and provide a better experience to people using a translated version, let's mark all the remaining error or warning messages for translation. While at it, let's make the following small changes: - replace "GIT" or "git" in a few error messages to just "Git", - replace "Expected from command, got %s" to "expected 'from' command, got '%s'", which makes it clearer that "from" is a command and should not be translated, - downcase error and warning messages that start with an uppercase, - fix test cases in "t9300-fast-import.sh" that broke because an error or warning message was downcased, - split error and warning messages that are too long, - adjust the indentation of some arguments of the error functions. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:06:58 -07:00
Christian Couder	d53287b734	fast-export: mark strings for translation Some error or warning messages in "builtin/fast-export.c" are marked for translation, but many are not. To be more consistent and provide a better experience to people using a translated version, let's mark all the remaining error or warning messages for translation. While at it: - improve how some arguments to some error functions are indented, - remove "Error:" at the start of an error message, - downcase error and warning messages that start with an uppercase. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-30 07:06:58 -07:00
Junio C Hamano	c1b23bd8aa	Merge branch 'tb/incremental-midx-part-3.1' Clean-up "git repack" machinery to prepare for incremental update of midx files. * tb/incremental-midx-part-3.1: (49 commits) builtin/repack.c: clean up unused `#include`s repack: move `write_cruft_pack()` out of the builtin repack: move `write_filtered_pack()` out of the builtin repack: move `pack_kept_objects` to `struct pack_objects_args` repack: move `finish_pack_objects_cmd()` out of the builtin builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()` repack: extract `write_pack_opts_is_local()` repack: move `find_pack_prefix()` out of the builtin builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()` builtin/repack.c: introduce `struct write_pack_opts` repack: 'write_midx_included_packs' API from the builtin builtin/repack.c: inline packs within `write_midx_included_packs()` builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs` builtin/repack.c: inline `remove_redundant_bitmaps()` builtin/repack.c: reorder `remove_redundant_bitmaps()` repack: keep track of MIDX pack names using existing_packs builtin/repack.c: use a string_list for 'midx_pack_names' builtin/repack.c: extract opts struct for 'write_midx_included_packs()' builtin/repack.c: remove ref snapshotting from builtin repack: remove pack_geometry API from the builtin ...	2025-10-29 12:38:24 -07:00
Ruoyu Zhong	bb42dc9710	bisect: update usage and docs to match each other Update the usage string of `git bisect` and documentation to match each other. While at it, also: 1. Move the synopsis of `git bisect` subcommands to the synopsis section, so that the test `t0450-txt-doc-vs-help.sh` can pass. 2. Document the `git bisect next` subcommand, which exists in the code but is missing from the documentation. See also: [1]. [1]: https://lore.kernel.org/git/3DA38465-7636-4EEF-B074-53E4628F5355@gmail.com/ Suggested-by: Ben Knoble <ben.knoble@gmail.com> Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-28 15:41:42 -07:00
Junio C Hamano	3deb97fe24	Merge branch 'cc/fast-import-strip-signed-tags' "git fast-import" is taught to handle signed tags, just like it recently learned to handle signed commits, in different ways. * cc/fast-import-strip-signed-tags: fast-import: add '--signed-tags=<mode>' option fast-export: handle all kinds of tag signatures t9350: properly count annotated tags lib-gpg: allow tests with GPGSM or GPGSSH prereq first doc: git-tag: stop focusing on GPG signed tags	2025-10-28 10:29:09 -07:00
Junio C Hamano	54ac3809c3	Merge branch 'ds/sparse-checkout-clean' "git sparse-checkout" subcommand learned a new "clean" action to prune otherwise unused working-tree files that are outside the areas of interest. * ds/sparse-checkout-clean: sparse-index: improve advice message instructions t: expand tests around sparse merges and clean sparse-index: point users to new 'clean' action sparse-checkout: add --verbose option to 'clean' dir: add generic "walk all files" helper sparse-checkout: match some 'clean' behavior sparse-checkout: add basics of 'clean' command sparse-checkout: remove use of the_repository	2025-10-28 10:29:09 -07:00
Patrick Steinhardt	d9bccf2ec3	builtin/maintenance: introduce "geometric" strategy We have two different repacking strategies in Git: - The "gc" strategy uses git-gc(1). - The "incremental" strategy uses multi-pack indices and `git multi-pack-index repack` to merge together smaller packfiles as determined by a specific batch size. The former strategy is our old and trusted default, whereas the latter has historically been used for our scheduled maintenance. But both strategies have their shortcomings: - The "gc" strategy performs regular all-into-one repacks. Furthermore it is rather inflexible, as it is not easily possible for a user to enable or disable specific subtasks. - The "incremental" strategy is not a full replacement for the "gc" strategy as it doesn't know to prune stale data. So today, we don't have a strategy that is well-suited for large repos while being a full replacement for the "gc" strategy. Introduce a new "geometric" strategy that aims to fill this gap. This strategy invokes all the usual cleanup tasks that git-gc(1) does like pruning reflogs and rerere caches as well as stale worktrees. But where it differs from both the "gc" and "incremental" strategy is that it uses our geometric repacking infrastructure exposed by git-repack(1) to repack packfiles. The advantage of geometric repacking is that we only need to perform an all-into-one repack when the object count in a repo has grown significantly. One downside of this strategy is that pruning of unreferenced objects is not going to happen regularly anymore. Every geometric repack knows to soak up all loose objects regardless of their reachability, and merging two or more packs doesn't consider reachability, either. Consequently, the number of unreachable objects will grow over time. This is remedied by doing an all-into-one repack instead of a geometric repack whenever we determine that the geometric repack would end up merging all packfiles anyway. This all-into-one repack then performs our usual reachability checks and writes unreachable objects into a cruft pack. As cruft packs won't ever be merged during geometric repacks we can thus phase out these objects over time. Of course, this still means that we retain unreachable objects for far longer than with the "gc" strategy. But the maintenance strategy is intended especially for large repositories, where the basic assumption is that the set of unreachable objects will be significantly dwarfed by the number of reachable objects. If this assumption is ever proven to be too disadvantageous we could for example introduce a time-based strategy: if the largest packfile has not been touched for longer than $T, we perform an all-into-one repack. But for now, such a mechanism is deferred into the future as it is not clear yet whether it is needed in the first place. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:45 -07:00
Patrick Steinhardt	40a7415833	builtin/maintenance: make "gc" strategy accessible While the user can pick the "incremental" maintenance strategy, it is not possible to explicitly use the "gc" strategy. This has two downsides: - It is impossible to use the default "gc" strategy for a specific repository when the strategy was globally set to a different strategy. - It is not possible to use git-gc(1) for scheduled maintenance. Address these issues by making making the "gc" strategy configurable. Furthermore, extend the strategy so that git-gc(1) runs for both manual and scheduled maintenance. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:44 -07:00
Patrick Steinhardt	0e994d9f38	builtin/maintenance: extend "maintenance.strategy" to manual maintenance The "maintenance.strategy" configuration allows users to configure how Git is supposed to perform repository maintenance. The idea is that we provide a set of high-level strategies that may be useful in different contexts, like for example when handling a large monorepo. Furthermore, the strategy can be tweaked by the user by overriding specific tasks. In its current form though, the strategy only applies to scheduled maintenance. This creates something of a gap, as scheduled and manual maintenance will now use _different_ strategies as the latter would continue to use git-gc(1) by default. This makes the strategies way less useful than they could be on the one hand. But even more importantly, the two different strategies might clash with one another, where one of the strategies performs maintenance in such a way that it discards benefits from the other strategy. So ideally, it should be possible to pick one strategy that then applies globally to all the different ways that we perform maintenance. This doesn't necessarily mean that the strategy always does the _same_ thing for every maintenance type. But it means that the strategy can configure the different types to work in tandem with each other. Change the meaning of "maintenance.strategy" accordingly so that the strategy is applied to both types, manual and scheduled. As preceding commits have introduced logic to run maintenance tasks depending on this type we can tweak strategies so that they perform those tasks depending on the context. Note that this raises the question of backwards compatibility: when the user has configured the "incremental" strategy we would have ignored that strategy beforehand. Instead, repository maintenance would have continued to use git-gc(1) by default. But luckily, we can match that behaviour by: - Keeping all current tasks of the incremental strategy as `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not run during manual maintenance. - Configuring the "gc" task so that it is invoked during manual maintenance. Like this, the user shouldn't observe any difference in behaviour. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:44 -07:00
Patrick Steinhardt	6a7d3eeb47	builtin/maintenance: run maintenance tasks depending on type We basically have three different ways to execute repository maintenance: 1. Manual maintenance via `git maintenance run`. 2. Automatic maintenance via `git maintenance run --auto`. 3. Scheduled maintenance via `git maintenance run --schedule=`. At the moment, maintenance strategies only have an effect for the last type of maintenance. This is about to change in subsequent commits, but to do so we need to be able to skip some tasks depending on how exactly maintenance was invoked. Introduce a new maintenance type that discern between manual (1 & 2) and scheduled (3) maintenance. Convert the `enabled` field into a bitset so that it becomes possible to specifiy which tasks exactly should run in a specific context. The types picked for existing strategies match the status quo: - The default strategy is only ever executed as part of a manual maintenance run. It is not possible to use it for scheduled maintenance. - The incremental strategy is only ever executed as part of a scheduled maintenance run. It is not possible to use it for manual maintenance. The strategies will be tweaked in subsequent commits to make use of this new infrastructure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:44 -07:00
Patrick Steinhardt	e83e92e876	builtin/maintenance: improve readability of strategies Our maintenance strategies are essentially a large array of structures, where each of the tasks can be enabled and scheduled individually. With the current layout though all the configuration sits on the same nesting layer, which makes it a bit hard to discern which initialized fields belong to what task. Improve readability of the individual tasks by using nested designated initializers instead. Suggested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:44 -07:00
Patrick Steinhardt	d465be2327	builtin/maintenance: don't silently ignore invalid strategy When parsing maintenance strategies we completely ignore the user-configured value in case it is unknown to us. This makes it basically undiscoverable to the user that scheduled maintenance is devolving into a no-op. Change this to instead die when seeing an unknown maintenance strategy. While at it, pull out the parsing logic into a separate function so that we can reuse it in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:43 -07:00
Patrick Steinhardt	5c2ad50193	builtin/maintenance: make the geometric factor configurable The geometric repacking task uses a factor of two for its geometric sequence, meaning that each next pack must contain at least twice as many objects as the next-smaller one. In some cases it may be helpful to configure this factor though to reduce the number of packfile merges even further, e.g. in very big repositories. But while git-repack(1) itself supports doing this, the maintenance task does not give us a way to tune it. Introduce a new "maintenance.geometric-repack.splitFactor" configuration to plug this gap. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:43 -07:00
Patrick Steinhardt	9bc151850c	builtin/maintenance: introduce "geometric-repack" task Introduce a new "geometric-repack" task. This task uses our geometric repack infrastructure as provided by git-repack(1) itself, which is a strategy that especially hosting providers tend to use to amortize the costs of repacking objects. There is one issue though with geometric repacks, namely that they unconditionally pack all loose objects, regardless of whether or not they are reachable. This is done because it means that we can completely skip the reachability step, which significantly speeds up the operation. But it has the big downside that we are unable to expire objects over time. To address this issue we thus use a split strategy in this new task: whenever a geometric repack would merge together all packs, we instead do an all-into-one repack. By default, these all-into-one repacks have cruft packs enabled, so unreachable objects would now be written into their own pack. Consequently, they won't be soaked up during geometric repacking anymore and can be expired with the next full repack, assuming that their expiry date has surpassed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:43 -07:00
Patrick Steinhardt	60c0af8e20	builtin/gc: make `too_many_loose_objects()` reusable without GC config To decide whether or not a repository needs to be repacked we estimate the number of loose objects. If the number exceeds a certain threshold we perform the repack, otherwise we don't. This is done via `too_many_loose_objects()`, which takes as parameter the `struct gc_config`. This configuration is only used to determine the threshold. In a subsequent commit we'll add another caller of this function that wants to pass a different limit than the one stored in that structure. Refactor the function accordingly so that we only take the limit as parameter instead of the whole structure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:42 -07:00
Patrick Steinhardt	0ea94b023a	builtin/gc: remove global `repack` variable The global `repack` variable is used to store all command line arguments that we eventually want to pass to git-repack(1). It is being appended to from multiple different functions, which makes it hard to follow the logic. Besides being hard to follow, it also makes it unnecessarily hard to reuse this infrastructure in new code. Refactor the code so that we store this variable on the stack and pass a pointer to it around as needed. This is done so that we can reuse `add_repack_all_options()` in a subsequent commit. The refactoring itself is straight-forward. One function that deserves attention though is `need_to_gc()`: this function determines whether or not we need to execute garbage collection for `git gc --auto`, but also for `git maintenance run --auto`. But besides figuring out whether we have to perform GC, the function also sets up the `repack` arguments. For `git gc --auto` it's trivial to adapt, as we already have the on-stack variable at our fingertips. But for the maintenance condition it's less obvious what to do. As it turns out, we can just use another temporary variable there that we then immediately discard. If we need to perform GC we execute a child git-gc(1) process to repack objects for us, and that process will have to recompute the arguments anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 13:42:42 -07:00
Junio C Hamano	98401c10fc	Merge branch 'bc/sha1-256-interop-01' The beginning of SHA1-SHA256 interoperability work. * bc/sha1-256-interop-01: t1010: use BROKEN_OBJECTS prerequisite t: allow specifying compatibility hash fsck: consider gpgsig headers expected in tags rev-parse: allow printing compatibility hash docs: add documentation for loose objects docs: improve ambiguous areas of pack format documentation docs: reflect actual double signature for tags docs: update offset order for pack index v3 docs: update pack index v3 format	2025-10-22 11:38:58 -07:00
Ruoyu Zhong	2bb3a012f3	bisect: fix handling of `help` and invalid subcommands As documented in git-bisect(1), `git bisect help` should display usage information. However, since the migration of `git bisect` to a full builtin command in `73fce29427` (Turn `git bisect` into a full built-in, 2022-11-10), this behavior was broken. Running `git bisect help` would, instead of showing usage, either fail silently if already in a bisect session, or otherwise trigger an interactive autostart prompt asking "Do you want me to do it for you [Y/n]?". Similarly, since `df63421be9` (bisect--helper: handle states directly, 2022-11-10), running invalid subcommands like `git bisect foobar` also led to the same behavior. This occurred because `help` and other unrecognized subcommands were being unconditionally passed to `bisect_state`, which then called `bisect_autostart`, triggering the interactive prompt. Fix this by: 1. Adding explicit handling for the `help` subcommand to show usage; 2. Validating that unrecognized commands are actually valid state commands before calling `bisect_state`; 3. Showing an error with usage for truly invalid commands. This ensures that `git bisect help` displays the usage as documented, and invalid commands fail cleanly without entering interactive mode. Alternate terms are still handled correctly through `check_and_set_terms`. Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-22 11:25:23 -07:00
Emily Yang	fafdf23b2f	commit-graph: add new config for changed-paths & recommend it in scalar The changed-path Bloom filters feature has proven stable and reliable over several years of use, delivering significant performance improvement for file history computation in large monorepos. Currently a user can opt-in to writing the changed-path Bloom filters using the "--changed-paths" option to "git commit-graph write". The filters will be persisted until the user drops the filters using the "--no-changed-paths" option. For this functionality, refer to `0087a87ba8` (commit-graph: persist existence of changed-paths, 2020-07-01). Large monorepos using Git's background maintenance to build and update commit-graph files could use an easy switch to enable this feature without a foreground computation. In this commit, we're proposing a new config option "commitGraph.changedPaths": * If "true", "git commit-graph write" will write Bloom filters, equivalent to passing "--changed-paths"; * If "false" or "unset", Bloom filters will be written during "git commit-graph write" only if the filters already exist in the current commit-graph file. This matches the default behaviour of "git commit-graph write" without any "--[no-]changed-paths" option. Note "false" can disable a previous "true" config value but doesn't imply "--no-changed-paths". This config will always respect the precedence of command line option "--[no-]changed-paths". We also set this new config as optional recommended config in scalar to turn on this feature for large repos. Helped-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Emily Yang <emilyyang.git@gmail.com> Acked-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-22 10:40:11 -07:00
Justin Tobler	16a93c03c7	builtin/repo: add progress meter for structure stats When using the structure subcommand for git-repo(1), evaluating a repository may take some time depending on its shape. Add a progress meter to provide feedback to the user about what is happening. The progress meter is enabled by default when the command is executed from a tty. It can also be explicitly enabled/disabled via the --[no-]progress option. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-21 14:40:38 -07:00
Justin Tobler	17215675b5	builtin/repo: add keyvalue and nul format for structure stats All repository structure stats are outputted in a human-friendly table form. This format is not suitable for machine parsing. Add a --format option that supports three output modes: `table`, `keyvalue`, and `nul`. The `table` mode is the default format and prints the same table output as before. With the `keyvalue` mode, each line of output contains a key-value pair of a repository stat. The '=' character is used to delimit between keys and values. The `nul` mode is similar to `keyvalue`, but key-values are delimited by a NUL character instead of a newline. Also, instead of a '=' character to delimit between keys and values, a newline character is used. This allows stat values to support special characters without having to cquote them. These two new modes provides output that is more machine-friendly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-21 14:40:38 -07:00
Justin Tobler	eb5cf58ffc	builtin/repo: add object counts in structure output The amount of objects in a repository can provide insight regarding its shape. To surface this information, use the path-walk API to count the number of reachable objects in the repository by object type. All regular references are used to determine the reachable set of objects. The object counts are appended to the same table containing the reference information. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-21 14:40:38 -07:00
Justin Tobler	bbb2b93348	builtin/repo: introduce structure subcommand The structure of a repository's history can have huge impacts on the performance and health of the repository itself. Currently, Git lacks a means to surface repository metrics regarding its structure/shape via a single command. Acquiring this information requires users to be familiar with the relevant data points and the various Git commands required to surface them. To fill this gap, supplemental tools such as git-sizer(1) have been developed. To allow users to more readily identify repository structure related information, introduce the "structure" subcommand in git-repo(1). The goal of this subcommand is to eventually provide similar functionality to git-sizer(1), but natively in Git. The initial version of this command only iterates through all references in the repository and tracks the count of branches, tags, remote refs, and other reference types. The corresponding information is displayed in a human-friendly table formatted in a very similar manner to git-sizer(1). The width of each table column is adjusted automatically to satisfy the requirements of the widest row contained. Subsequent commits will surface additional relevant data points to output and also provide other more machine-friendly output formats. Based-on-patch-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-21 14:40:37 -07:00
Justin Tobler	026ad60160	builtin/repo: rename repo_info() to cmd_repo_info() Subcommand functions are often prefixed with `cmd_` to denote that they are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-21 14:40:37 -07:00
Junio C Hamano	8bca1c5d59	Merge branch 'tb/incremental-midx-part-3.1' into ps/maintenance-geometric * tb/incremental-midx-part-3.1: (64 commits) builtin/repack.c: clean up unused `#include`s repack: move `write_cruft_pack()` out of the builtin repack: move `write_filtered_pack()` out of the builtin repack: move `pack_kept_objects` to `struct pack_objects_args` repack: move `finish_pack_objects_cmd()` out of the builtin builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()` repack: extract `write_pack_opts_is_local()` repack: move `find_pack_prefix()` out of the builtin builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()` builtin/repack.c: introduce `struct write_pack_opts` repack: 'write_midx_included_packs' API from the builtin builtin/repack.c: inline packs within `write_midx_included_packs()` builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs` builtin/repack.c: inline `remove_redundant_bitmaps()` builtin/repack.c: reorder `remove_redundant_bitmaps()` repack: keep track of MIDX pack names using existing_packs builtin/repack.c: use a string_list for 'midx_pack_names' builtin/repack.c: extract opts struct for 'write_midx_included_packs()' builtin/repack.c: remove ref snapshotting from builtin repack: remove pack_geometry API from the builtin ...	2025-10-21 11:39:31 -07:00
Junio C Hamano	8329f6724b	Merge branch 'tb/cat-file-objectmode-update' Code clean-up. * tb/cat-file-objectmode-update: builtin/cat-file.c: simplify calling `report_object_status()`	2025-10-20 14:12:18 -07:00
Patrick Steinhardt	ecad863c12	packfile: rename `packfile_store_get_all_packs()` In a preceding commit we have removed `packfile_store_get_packs()`. With this function removed it's somewhat useless to still have the "all" infix in `packfile_store_get_all_packs()`. Rename the latter to drop that infix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:40 -07:00
Patrick Steinhardt	86d8c62f48	packfile: introduce macro to iterate through packs We have a bunch of different sites that want to iterate through all packs of a given `struct packfile_store`. This pattern is somewhat verbose and repetitive, which makes it somewhat cumbersome. Introduce a new macro `repo_for_each_pack()` that removes some of the boilerplate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:39 -07:00
Patrick Steinhardt	fdebc5d4da	builtin/grep: simplify how we preload packs When using multiple threads in git-grep(1) we eagerly preload both the gitmodules file as well as the packfiles so that the threads won't race with one another to initialize these data structures. For packfiles, this is done by calling `packfile_store_get_packs()`, which first loads our packfiles and then returns a pointer to the first such packfile. This pointer is ignored though, as all we really care about is that `packfile_store_prepare()` was called. Historically, that function was file-local to "packfile.c", but that changed with 4188332569 (packfile: move `get_multi_pack_index()` into "midx.c", 2025-09-02). We can thus simplify the code by calling that function directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:39 -07:00
Patrick Steinhardt	07fbf2be2f	builtin/gc: convert to use `packfile_store_get_all_packs()` When running maintenance tasks via git-maintenance(1) we have a couple of auto-conditions that check whether or not a specific task should be running. One such check is for incremental repacks, which essentially use `git multi-pack-index repack` to repack a set of smaller packfiles into one larger packfile. The auto-condition for this task checks how many packfiles there are that aren't indexed by any multi-pack index. If there is a sufficient number then we execute the above command to combine those into a single pack and add that pack to the MIDX. As we don't care about MIDX'd packs we use `packfile_store_get_packs()`, which knows to not load any packs that are indexed by a MIDX. But as explained in the preceding commit, we want to get rid of that function. We already handle packfiles that have a MIDX by the very nature of this function, as we explicitly count non-MIDX'd packs. As such, we can trivially switch over to use `packfile_store_get_all_packs()` instead. Do so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 14:42:39 -07:00
Junio C Hamano	057a94fbbb	Merge branch 'tb/incremental-midx-part-3.1' into ps/remove-packfile-store-get-packs * tb/incremental-midx-part-3.1: (64 commits) builtin/repack.c: clean up unused `#include`s repack: move `write_cruft_pack()` out of the builtin repack: move `write_filtered_pack()` out of the builtin repack: move `pack_kept_objects` to `struct pack_objects_args` repack: move `finish_pack_objects_cmd()` out of the builtin builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()` repack: extract `write_pack_opts_is_local()` repack: move `find_pack_prefix()` out of the builtin builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()` builtin/repack.c: introduce `struct write_pack_opts` repack: 'write_midx_included_packs' API from the builtin builtin/repack.c: inline packs within `write_midx_included_packs()` builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs` builtin/repack.c: inline `remove_redundant_bitmaps()` builtin/repack.c: reorder `remove_redundant_bitmaps()` repack: keep track of MIDX pack names using existing_packs builtin/repack.c: use a string_list for 'midx_pack_names' builtin/repack.c: extract opts struct for 'write_midx_included_packs()' builtin/repack.c: remove ref snapshotting from builtin repack: remove pack_geometry API from the builtin ...	2025-10-16 14:42:27 -07:00
Taylor Blau	935ab44a0a	builtin/repack.c: clean up unused `#include`s Over the past several dozen commits, we have moved a large amount of functionality out of the repack builtin and into other files like repack.c, repack-cruft.c, repack-filtered.c, repack-midx.c, and repack-promisor.c. These files specify the minimal set of `#include`s that they need to compile successfully, but we did not change the set of `#include`s in the repack builtin itself. Now that the code movement is complete, let's clean up that set of `#include`s and trim down the builtin to include the minimal amount of external headers necessary to compile. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:57 -07:00
Taylor Blau	09797bd966	repack: move `write_cruft_pack()` out of the builtin In an identical fashion as the previous commit, move the function `write_cruft_pack()` into its own compilation unit, and make the function visible through the repack.h API. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:57 -07:00
Taylor Blau	7ac4231b42	repack: move `write_filtered_pack()` out of the builtin In a similar fashion as in previous commits, move the function `write_filtered_pack()` out of the builtin and into its own compilation unit. This function is now part of the repack.h API, but implemented in its own "repack-filtered.c" unit as it is a separate component from other kinds of repacking operations. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:57 -07:00
Taylor Blau	d278970aef	repack: move `pack_kept_objects` to `struct pack_objects_args` The "pack_kept_objects" variable is defined as static to the repack builtin, but is inherently related to the pack-objects arguments that the builtin uses when generating new packs. Move that field into the "struct pack_objects_args", and shuffle around where we append the corresponding command-line option when preparing a pack-objects process. Specifically: - `write_cruft_pack()` always wants to pass "--honor-pack-keep", so explicitly set the `pack_kept_objects` field to "0" when initializing the `write_pack_opts` struct before calling `write_cruft_pack()`. - `write_filtered_pack()` no longer needs to handle writing the command-line option "--honor-pack-keep" when preparing a pack-objects process, since its call to `prepare_pack_objects()` will have already taken care of that. `write_filtered_pack()` also reads the `pack_kept_objects` field to determine whether to write the existing kept packs with a leading "^" character, so update that to read through the `po_args` pointer instead. - `cmd_repack()` also no longer has to write the "--honor-pack-keep" flag explicitly, since this is also handled via its call to `prepare_pack_objects()`. Since there is a default value for "pack_kept_objects" that relies on whether or not we are writing a bitmap (and not writing a MIDX), extract a default initializer for `struct pack_objects_args` that keeps this conditional default behavior. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:57 -07:00
Taylor Blau	fa0787a6cc	repack: move `finish_pack_objects_cmd()` out of the builtin In a similar spirit as the previous commit(s), now that the function `finish_pack_objects_cmd()` has no explicit dependencies within the repack builtin, let's extract it. This prepares us to extract the remaining two functions within the repack builtin that explicitly write packfiles, which are `write_cruft_pack()` and `write_filtered_pack()`, which will be done in the future commits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:56 -07:00
Taylor Blau	80db3cd189	builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()` To prepare to move the `finish_pack_objects_cmd()` function out of the builtin and into the repack.h API, there are a couple of things we need to do first: - First, let's take advantage of `write_pack_opts_is_local()` function introduced in the previous commit instead of passing "local" explicitly. - Let's also avoid referring to the static 'packtmp' field within builtin/repack.c by instead accessing it through the write_pack_opts argument. There are three callers which need to adjust themselves in order to account for this change. The callers which reside in write_cruft_pack() and write_filtered_pack() both already have an "opts" in scope, so they can pass it through transparently. The other call (at the bottom of `cmd_repack()`) needs to initialize its own write_pack_opts to pass the necessary fields over to the direct call to `finish_pack_objects_cmd()`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:56 -07:00
Taylor Blau	2f79c79bba	repack: extract `write_pack_opts_is_local()` Similar to the previous commit, the functions `write_cruft_pack()` and `write_filtered_pack()` both compute a "local" variable via the exact same mechanism: const char *scratch; int local = skip_prefix(opts->destination, opts->packdir, &scratch); Not only does this cause us to repeat the same pair of lines, it also introduces an unnecessary "scratch" variable that is common between both functions. Instead of repeating ourselves, let's extract that functionality into a new function in the repack.h API called "write_pack_opts_is_local()". That function takes a pointer to a "struct write_pack_opts" (which has as fields both "destination" and "packdir"), and can encapsulate the dangling "scratch" field. Extract that function and make it visible within the repack.h API, and use it within both `write_cruft_pack()` and `write_filtered_pack()`. While we're at it, match our modern conventions by returning a "bool" instead of "int", and use `starts_with()` instead of `skip_prefix()` to avoid storing the dummy "scratch" variable. The remaining duplication (that is, that both `write_cruft_pack()` and `write_filtered_pack()` still both call `write_pack_opts_is_local()`) will be addressed in the following commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:56 -07:00
Taylor Blau	98fa0d50a7	repack: move `find_pack_prefix()` out of the builtin Both callers within the repack builtin which call functions that take a 'write_pack_opts' structure have the following pattern: struct write_pack_opts opts = { .packdir = packdir, .packtmp = packtmp, .pack_prefix = find_pack_prefix(packdir, packtmp), /* ... / }; int ret = write_some_kind_of_pack(&opts, / ... */); , but both "packdir" and "packtmp" are fields within the write_pack_opts struct itself! Instead of also computing the pack_prefix ahead of time, let's have the callees compute it themselves by moving `find_pack_prefix()` out of the repack builtin, and have it take a write_pack_opts pointer instead of the "packdir" and "packtmp" fields directly. This avoids the callers having to do some prep work that is common between the two of them, but also avoids the potential pitfall of accidentally writing: .pack_prefix = find_pack_prefix(packtmp, packdir), (which is well-typed) when the caller meant to instead write: .pack_prefix = find_pack_prefix(packdir, packtmp), Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:56 -07:00
Taylor Blau	3d2ac2065e	builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()` Similar to the changes made in the previous commit to `write_filtered_pack()`, teach `write_cruft_pack()` to take a `write_pack_opts` struct and use that where possible. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:56 -07:00
Taylor Blau	7a9c81a38d	builtin/repack.c: introduce `struct write_pack_opts` There are various functions within the 'repack' builtin which are responsible for writing different kinds of packs. They include: - `static int write_filtered_pack(...)` - `static int write_cruft_pack(...)` as well as the function `finish_pack_objects_cmd()`, which is responsible for finalizing a new pack write, and recording the checksum of its contents in the 'names' list. Both of these `write_` functions have a few things in common. They both take a pointer to the 'pack_objects_args' struct, as well as a pair of character pointers for `destination` and `pack_prefix`. Instead of repeating those arguments for each function, let's extract an options struct called "write_pack_opts" which has these three parameters as member fields. While we're at it, add fields for "packdir," and "packtmp", both of which are static variables within the builtin, and need to be read from within these two functions. This will shorten the list of parameters that callers have to provide to `write_filtered_pack()`, avoid ambiguity when passing multiple variables of the same type, and provide a unified interface for the two functions mentioned earlier. (Note that "pack_prefix" can be derived on the fly as a function of "packdir" and "packtmp", making it unnecessary to store "pack_prefix" explicitly. This commit ignores that potential cleanup in the name of doing as few things as possible, but a later commit will make that change.) Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-16 10:08:56 -07:00

1 2 3 4 5 ...

13162 Commits (master)