kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	92daf08c84	Merge branch 'ly/submodule-update-failure-leakfix' A memory leak on an error code path has been plugged. * ly/submodule-update-failure-leakfix: builtin/submodule--helper: fix leak when remote_submodule_branch() failed	2025-06-18 13:53:36 -07:00
Junio C Hamano	0d0d56bca4	Merge branch 'ly/commit-buffer-reencode-leakfix' Leakfix. * ly/commit-buffer-reencode-leakfix: repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode ()	2025-06-18 13:53:34 -07:00
Junio C Hamano	2024ab3d97	Merge branch 'jk/diff-no-index-with-pathspec' "git diff --no-index dirA dirB" can limit the comparison with pathspec at the end of the command line, just like normal "git diff". * jk/diff-no-index-with-pathspec: diff --no-index: support limiting by pathspec pathspec: add flag to indicate operation without repository pathspec: add match_leading_pathspec variant	2025-06-17 10:44:42 -07:00
Junio C Hamano	5e22d03832	Merge branch 'ly/fetch-pack-leakfix' A memory-leak in an error code path has been plugged. * ly/fetch-pack-leakfix: builtin/fetch-pack: cleanup before return error	2025-06-17 10:44:41 -07:00
Junio C Hamano	b5a135b1f7	Merge branch 'ly/commit-graph-graph-write-leakfix' A memory-leak in an error code path has been plugged. * ly/commit-graph-graph-write-leakfix: commit-graph: fix start_delayed_progress() leak	2025-06-17 10:44:41 -07:00
Junio C Hamano	1f622bb0ab	Merge branch 'ly/do-not-localize-bug-messages' Code clean-up. * ly/do-not-localize-bug-messages: BUG(): remove leading underscore of the format string	2025-06-17 10:44:40 -07:00
Junio C Hamano	4fd5b1ddc7	Merge branch 'vd/cat-file-objectmode-update' "git cat-file --batch" learns to understand %(objectmode) atom to allow the caller to tell missing objects (due to repository corruption) and submodules (whose commit objects are OK to be missing) apart. * vd/cat-file-objectmode-update: cat-file.c: add batch handling for submodules cat-file: add %(objectmode) atom t1006: update 'run_tests' to test generic object specifiers	2025-06-17 10:44:39 -07:00
Junio C Hamano	88134a8417	Merge branch 'ds/path-walk-2' "git pack-objects" learns to find delta bases from blobs at the same path, using the --path-walk API. * ds/path-walk-2: pack-objects: allow --shallow and --path-walk path-walk: add new 'edge_aggressive' option pack-objects: thread the path-based compression pack-objects: refactor path-walk delta phase scalar: enable path-walk during push via config pack-objects: enable --path-walk via config repack: add --path-walk option t5538: add tests to confirm deltas in shallow pushes pack-objects: introduce GIT_TEST_PACK_PATH_WALK p5313: add performance tests for --path-walk pack-objects: update usage to match docs pack-objects: add --path-walk option pack-objects: extract should_attempt_deltas()	2025-06-17 10:44:38 -07:00
Lidong Yan	bfc9f9cc64	builtin/submodule--helper: fix leak when remote_submodule_branch() failed In builtin/submodule--helper.c:update_submodule(), the variable remote_name is allocated in get_default_remote_submodule() but may be leaked if remote_submodule_branch() fails. Although it is unlikely that remote_submodule_branch() would fail after successfully obtaining a remote ref name from get_default_remote_submodule(), it is still possible. To prevent a potential memory leak, add a call to free(remote_name) at the early exit point. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-08 08:49:48 -07:00
Lidong Yan	61372dd613	repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode () Memory allocated by pretty.c:repo_logmsg_reencode() should be freed with repo_unuse_commit_buffer(). Callers sometimes forgot to free it at the exit point. Add `repo_unuse_commit_buffer()` in insert_records_from_trailers at builtin/shortlog.c and create_commit at builtin/replay.c. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-05 08:35:22 -07:00
Lidong Yan	7082da85cb	commit-graph: fix start_delayed_progress() leak In commit-graph.c:graph_write(), if read_one_commit() failed, progress allocated in start_delayed_progress() will leak. Add stop_progress() before goto cleanup. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-04 08:55:30 -07:00
Lidong Yan	aedebdb6b9	builtin/fetch-pack: cleanup before return error In builtin/fetch-pack.c:cmd_fetch_pack(), if finish_connect() failed, it returns an error code without cleanup, which causes a memory leak. Add a cleanup label before the frees at the end of cmd_fetch_pack(), and add `goto cleanup` if finish_connect() failed. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-04 08:52:25 -07:00
Victoria Dye	b0b910e052	cat-file.c: add batch handling for submodules When an object specification is passed to 'cat-file --batch[-check]' referring to a submodule (e.g. 'HEAD:path/to/my/submodule'), the current behavior of the command is to print the "missing" error message. However, it is often valuable for callers to distinguish between paths that are actually missing and "the submodule tree entry exists, but the object does not exist in the repository". To disambiguate without needing to invoke a separate Git process (e.g. 'ls-tree'), print the message "<oid> submodule" for such objects instead of "<object> missing". In addition to the change from "missing" to "submodule", the new message differs from the old in that it always prints the resolved tree entry's OID, rather than the input object specification. Note that this implementation maintains a distinction between submodules where the commit OID is not present in the repo, and submodules where the commit OID is present; the former will now print "<object> submodule", but the latter will still print the full object content. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-03 12:08:58 -07:00
Victoria Dye	aba1438435	cat-file: add %(objectmode) atom Add a formatting atom, used with the --batch-check/--batch-command options, that prints the octal representation of the object mode if a given revision includes that information, e.g. one that follows the format <tree-ish>:<path>. If the mode information does not exist, an empty string is printed instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-03 12:08:58 -07:00
Junio C Hamano	d9a1e51c76	Merge branch 'bs/total-ram-bsd' Update total_ram() function on BSD variants. * bs/total-ram-bsd: builtin/gc: correct physical memory detection for OpenBSD / NetBSD	2025-06-03 08:55:24 -07:00
Lidong Yan	5dceb8bd05	BUG(): remove leading underscore of the format string BUG() is not end-user facing but programmer facing, and we do not use _("...") in them. Replace all `BUG(_("..."))` with `BUG("...")` Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-03 08:36:11 -07:00
Brad Smith	35c1d592cd	builtin/gc: correct physical memory detection for OpenBSD / NetBSD OpenBSD / NetBSD use HW_PHYSMEM64 to detect the amount of physical memory in a system. HW_PHYSMEM will not provide the correct amount on a system with >=4GB of memory. Signed-off-by: Brad Smith <brad@comstyle.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-01 19:01:07 -07:00
Junio C Hamano	0b4c6baa70	fast-export: --signed-commits is experimental As the design of signature handling is still being discussed, it is likely that the data stream produced by the code in Git 2.50 would have to be changed in such a way that is not backward compatible. Mark the feature as experimental and discourage its use for now. Also flip the default on the generation side to "strip"; users of existing versions would not have passed --signed-commits=strip and will be broken by this change if the default is made to abort, and will be encouraged by the error message to produce data stream with future breakage guarantees by passing --signed-commits option. As we tone down the default behaviour, we no longer need the FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was not discoverable enough. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-28 10:30:47 -07:00
Junio C Hamano	b4847a4477	Merge branch 'jt/receive-pack-skip-connectivity-check' "git receive-pack" optionally learns not to care about connectivity check, which can be useful when the repository arranges to ensure connectivity by some other means. * jt/receive-pack-skip-connectivity-check: builtin/receive-pack: add option to skip connectivity check t5410: test receive-pack connectivity check	2025-05-28 07:59:56 -07:00
Junio C Hamano	f9cdaa2860	Merge branch 'js/misc-fixes' Assorted fixes for issues found with CodeQL. * js/misc-fixes: sequencer: stop pretending that an assignment is a condition bundle-uri: avoid using undefined output of `sscanf()` commit-graph: avoid using stale stack addresses trace2: avoid "futile conditional" Avoid redundant conditions fetch: avoid unnecessary work when there is no current branch has_dir_name(): make code more obvious upload-pack: rename `enum` to reflect the operation commit-graph: avoid malloc'ing a local variable fetch: carefully clear local variable's address after use commit: simplify code	2025-05-27 13:59:11 -07:00
Junio C Hamano	6e5fb398d3	Merge branch 'ds/sparse-apply-add-p' "git apply" and "git add -i/-p" code paths no longer unnecessarily expand sparse-index while working. * ds/sparse-apply-add-p: p2000: add performance test for patch-mode commands reset: integrate sparse index with --patch git add: make -p/-i aware of sparse index apply: integrate with the sparse index	2025-05-27 13:59:09 -07:00
Junio C Hamano	f545f401be	Merge branch 'en/merge-tree-check' "git merge-tree" learned an option to see if it resolves cleanly without actually creating a result. * en/merge-tree-check: merge-tree: add a new --quiet flag merge-ort: add a new mergeability_only option	2025-05-27 13:59:08 -07:00
Junio C Hamano	17d9dbd3c2	Merge branch 'jk/no-funny-object-types' Support to create a loose object file with unknown object type has been dropped. * jk/no-funny-object-types: object-file: drop support for writing objects with unknown types hash-object: handle --literally with OPT_NEGBIT hash-object: merge HASH_* and INDEX_* flags hash-object: stop allowing unknown types t: add lib-loose.sh t/helper: add zlib test-tool oid_object_info(): drop type_name strbuf fsck: stop using object_info->type_name strbuf oid_object_info_convert(): stop using string for object type cat-file: use type enum instead of buffer for -t option object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag cat-file: make --allow-unknown-type a noop object-file.h: fix typo in variable declaration	2025-05-27 13:59:08 -07:00
Junio C Hamano	96d127896d	Merge branch 'en/replay-wo-the-repository' The dependency on the_repository variable has been reduced from the code paths in "git replay". * en/replay-wo-the-repository: replay: replace the_repository with repo parameter passed to cmd_replay ()	2025-05-23 15:34:08 -07:00
Jacob Keller	09fb155f11	diff --no-index: support limiting by pathspec The --no-index option of git-diff enables using the diff machinery from git while operating outside of a repository. This mode of git diff is able to compare directories and produce a diff of their contents. When operating git diff in a repository, git has the notion of "pathspecs" which can specify which files to compare. In particular, when using git to diff two trees, you might invoke: $ git diff-tree -r <treeish1> <treeish2>. where the treeish could point to a subdirectory of the repository. When invoked this way, users can limit the selected paths of the tree by using a pathspec, either by providing some list of paths to accept, or by removing paths via a negative pathspec. The git diff --no-index mode does not support pathspecs, and cannot limit the diff output in this way. Other diff programs such as GNU difftools have options for excluding paths based on a pattern match. However, using git diff as a diff replacement has several advantages over many popular diff tools, including coloring moved lines, rename detections, and similar. Teach git diff --no-index how to handle pathspecs to limit the comparisons. This will only be supported if both provided paths are directories. For comparisons where one path isn't a directory, the --no-index mode already has some DWIM shortcuts implemented in the fixup_paths() function. Modify the fixup_paths function to return 1 if both paths are directories. If this is the case, interpret any extra arguments to git diff as pathspecs via parse_pathspec. Use parse_pathspec to load the remaining arguments (if any) to git diff --no-index as pathspec items. Disable PATHSPEC_ATTR support since we do not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP since we do not have a repository root. All pathspecs are treated as rooted at the provided comparison paths. 
After loading the pathspec data, calculate skip offsets for skipping past the root portion of the paths. This is required to ensure that pathspecs start matching from the provided path, rather than matching from the absolute path. We could instead pass the paths as prefix values to parse_pathspec. This is slightly problematic because the paths come from the command line and don't necessarily have the proper trailing slash. Additionally, that would require parsing pathspecs multiple times. Pass the pathspec object and the skip offsets into queue_diff, which in-turn must pass them along to read_directory_contents. Modify read_directory_contents to check against the pathspecs when scanning the directory. Use the skip offset to skip past the initial root of the path, and only match against portions that are below the intended directory structure being compared. The search algorithm for finding paths is recursive with read_dir. To make pathspec matching work properly, we must set both DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC. Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against pathspecs like "a/b/c". This is usually achieved by setting the is_dir parameter of match_pathspec. Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match against pathspecs like "a/b/c/d". This is crucial because we recursively iterate down the directories. We could simply avoid checking pathspecs at subdirectories, but this would force recursion down directories which would simply be skipped. If we always passed DO_MATCH_LEADING_PATHSPEC, then we will incorrectly match in certain cases such as matching 'a/c' against ':(glob)**/d'. The match logic will see that a matches the leading part of the **/ and accept this even though c doesn't match. To avoid this, use the match_leading_pathspec() variant recently introduced. This sets both flags when is_dir is set, but leaves them both cleared when is_dir is 0. 
Add test cases and documentation covering the new functionality. Note for the documentation I opted not to move the placement of '--' which is sometimes used to disambiguate arguments. The diff --no-index mode requires exactly 2 arguments determining what to compare. Any additional arguments are interpreted as pathspecs and must come afterwards. Use of '--' would not actually disambiguate anything, since there will never be ambiguity over which arguments represent paths or pathspecs. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-22 14:20:11 -07:00
Justin Tobler	68cb0b5253	builtin/receive-pack: add option to skip connectivity check During git-receive-pack(1), connectivity of the object graph is validated to ensure that the received packfile does not leave the repository in a broken state. This is done via git-rev-list(1) and walking the objects, which can be expensive for large repositories. Generally, this check is critical to avoid an incomplete received packfile from corrupting a repository. Server operators may have additional knowledge though around exactly how Git is being used on the server-side which can be used to facilitate more efficient connectivity computation of incoming objects. For example, if it can be ensured that all objects in a repository are connected and do not depend on any missing objects, the connectivity of newly written objects can be checked by walking the object graph containing only the new objects from the updated tips and identifying the missing objects which represent the boundary between the new objects and the repository. These boundary objects can be checked in the canonical repository to ensure the new objects connect as expected and thus avoid walking the rest of the object graph. Git itself cannot make the guarantees required for such an optimization as it is possible for a repository to contain an unreachable object that references a missing object without the repository being considered corrupt. Introduce the --skip-connectivity-check option for git-receive-pack(1) which bypasses this connectivity check to give more control to the server-side. Note that without proper server-side validation of newly received objects handled outside of Git, usage of this option risks corrupting a repository. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 11:43:36 -07:00
Junio C Hamano	a9dcacbf2a	Merge branch 'jk/oidmap-cleanup' Code cleanup. * jk/oidmap-cleanup: raw_object_store: drop extra pointer to replace_map oidmap: add size function oidmap: rename oidmap_free() to oidmap_clear()	2025-05-19 16:02:47 -07:00
Junio C Hamano	6660b42929	Merge branch 'ly/am-split-stgit-leakfix' Leakfix. * ly/am-split-stgit-leakfix: builtin/am: fix memory leak in `split_mail_stgit_series`	2025-05-19 16:02:46 -07:00
Elijah Newren	29d7bf1951	merge-tree: add a new --quiet flag Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new --quiet flag which will make use of the new mergeability_only flag added to merge-ort in the previous commit. This option allows the merge machinery to, in the outer layer of the merge: * exit early when a conflict is detected * avoid writing (most) merged blobs/trees to the object store Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 15:09:14 -07:00
Derrick Stolee	c178b02e29	pack-objects: allow --shallow and --path-walk There does not appear to be anything particularly incompatible about the --shallow and --path-walk options of 'git pack-objects'. If shallow commits are to be handled differently, then it is by the revision walk that defines the commit set and which are interesting or uninteresting. However, before the previous change, a trivial removal of the warning would cause a failure in t5500-fetch-pack.sh when GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more objects than we desired, due to some incorrect behavior of the path-walk API, especially around walking uninteresting objects. The recently-added tests in t5538-push-shallow.sh help to confirm this behavior is working with the --path-walk option if GIT_TEST_PACK_PATH_WALK is enabled. These tests passed previously due to the --path-walk feature being disabled in the presence of a shallow clone. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:41 -07:00
Derrick Stolee	e5394794a5	pack-objects: thread the path-based compression Adapting the implementation of ll_find_deltas(), create a threaded version of the --path-walk compression step in 'git pack-objects'. This involves adding a 'regions' member to the thread_params struct, allowing each thread to own a section of paths. We can simplify the way jobs are split because there is no value in extending the batch based on name-hash the way sections of the object entry array are attempted to be grouped. We re-use the 'list_size' and 'remaining' items for the purpose of borrowing work in progress from other "victim" threads when a thread has finished its batch of work more quickly. Using the Git repository as a test repo, the p5313 performance test shows that the resulting size of the repo is the same, but the threaded implementation gives gains of varying degrees depending on the number of objects being packed. (This was tested on a 16-core machine.)
Test                      HEAD~1     HEAD
---------------------------------------------------
5313.20: big pack           2.38     1.99  -16.4%
5313.21: big pack size     16.1M    16.0M   -0.2%
5313.24: repack           107.32    45.41  -57.7%
5313.25: repack size      213.3M   213.2M   -0.0%
(Test output is formatted to better fit in message.) This ~60% reduction in 'git repack --path-walk' time is typical across all repos I used for testing. What is interesting is to compare when the overall time improves enough to outperform the --name-hash-version=1 case. These time improvements correlate with repositories with data shapes that significantly improve their data size as well. The --path-walk feature frequently takes longer than --name-hash-version=2, trading some extra computation for some additional compression. The natural place where this additional computation comes from is the two compression passes that --path-walk takes, though the first pass is naturally faster due to the path boundaries avoiding a number of delta compression attempts.
For example, the microsoft/fluentui repo has significant size reduction from --name-hash-version=1 to --name-hash-version=2 followed by further improvements with --path-walk. The threaded computation makes --path-walk more competitive in time compared to --name-hash-version=2, though still ~31% more expensive in that metric.
Repack Method        Pack Size       Time
------------------------------------------
Hash v1                 439.4M     87.24s
Hash v2                 161.7M     21.51s
Path Walk (Before)      142.5M     81.29s
Path Walk (After)       142.5M     28.16s
Similar results hold for the Git repository:
Repack Method        Pack Size       Time
------------------------------------------
Hash v1                 248.8M     30.44s
Hash v2                 249.0M     30.15s
Path Walk (Before)      213.2M    142.50s
Path Walk (After)       213.3M     45.41s
...as well as the nodejs/node repository:
Repack Method        Pack Size       Time
------------------------------------------
Hash v1                 739.9M     71.18s
Hash v2                 764.6M     67.82s
Path Walk (Before)      698.1M    208.10s
Path Walk (After)       698.0M     75.10s
Finally, the Linux kernel repository is a good test for this repacking time change, even though the space savings is more subtle:
Repack Method        Pack Size       Time
------------------------------------------
Hash v1                   2.5G    554.41s
Hash v2                   2.5G    549.62s
Path Walk (Before)        2.2G   1562.36s
Path Walk (After)         2.2G    559.00s
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:40 -07:00
Derrick Stolee	206a1bb203	pack-objects: refactor path-walk delta phase Previously, the --path-walk option to 'git pack-objects' would compute deltas inline with the path-walk logic. This would make the progress indicator look like it is taking a long time to enumerate objects, and then very quickly compute deltas. Instead of computing deltas on each region of objects organized by tree, store a list of regions corresponding to these groups. These can later be pulled from the list for delta compression before doing the "global" delta search. This presents a new progress indicator that can be used in tests to verify that this stage is happening. The current implementation is not integrated with threads, but we are setting it up to arrive in the next change. Since we do not attempt to sort objects by size until after exploring all trees, we can remove the previous change to t5530 due to a different error message appearing first. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:40 -07:00
Derrick Stolee	4f7f571204	pack-objects: enable --path-walk via config Users may want to enable the --path-walk option for 'git pack-objects' by default, especially underneath commands like 'git push' or 'git repack'. This should be limited to client repositories, since the --path-walk option disables bitmap walks, so would be bad to include in Git servers when serving fetches and clones. There is potential that it may be helpful to consider when repacking the repository, to take advantage of improved deltas across historical versions of the same files. Much like how "pack.useSparse" was introduced and included in "feature.experimental" before being enabled by default, use the repository settings infrastructure to make the new "pack.usePathWalk" config enabled by "feature.experimental" and "feature.manyFiles". In order to test that this config works, add a new trace2 region around the path walk code that can be checked by a 'git push' command. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:39 -07:00
Derrick Stolee	5f711504d9	repack: add --path-walk option Since 'git pack-objects' supports a --path-walk option, allow passing it through in 'git repack'. This presents interesting testing opportunities for comparing the different repacking strategies against each other. Add the --path-walk option to the performance tests in p5313. For the microsoft/fluentui repo [1] checked out at a specific commit [2], the --path-walk tests in p5313 look like this:
Test                                           this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk            0.08(0.06+0.02)
5313.19: thin pack size with --path-walk       18.4K
5313.20: big pack with --path-walk             2.10(7.80+0.26)
5313.21: big pack size with --path-walk        19.8M
5313.22: shallow fetch pack with --path-walk   1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk    33.6M
5313.24: repack with --path-walk               81.29(96.08+0.71)
5313.25: repack size with --path-walk          142.5M
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
Along with the earlier tests in p5313, I'll instead reformat the comparison as follows:
Repack Method    Pack Size       Time
---------------------------------------
Hash v1             439.4M     87.24s
Hash v2             161.7M     21.51s
Path Walk           142.5M     81.29s
There are a few things to notice here:
1. The benefits of --name-hash-version=2 over --name-hash-version=1 are significant, but --path-walk still compresses better than that option.
2. The --path-walk command is still using --name-hash-version=1 for the second pass of delta computation, using the increased name hash collisions as a potential method for opportunistic compression on top of the path-focused compression.
3. The --path-walk algorithm is currently sequential and does not use multiple threads for delta compression. Threading will be implemented in a future change so the computation time will improve to better compete in this metric.
There are small benefits in size for my copy of the Git repository:
Repack Method    Pack Size       Time
---------------------------------------
Hash v1             248.8M     30.44s
Hash v2             249.0M     30.15s
Path Walk           213.2M    142.50s
As well as in the nodejs/node repository [3]:
Repack Method    Pack Size       Time
---------------------------------------
Hash v1             739.9M     71.18s
Hash v2             764.6M     67.82s
Path Walk           698.1M    208.10s
[3] https://github.com/nodejs/node
This benefit also repeats in my copy of the Linux kernel repository:
Repack Method    Pack Size       Time
---------------------------------------
Hash v1               2.5G    554.41s
Hash v2               2.5G    549.62s
Path Walk             2.2G   1562.36s
It is important to see that even when the repository shape does not have many name-hash collisions, there is a slight space boost to be found using this method. As this repacking strategy was released in Git for Windows 2.47.0, some users have reported cases where the --path-walk compression is slightly worse than the --name-hash-version=2 option. In those cases, it may be beneficial to combine the two options. However, there has not been a released version of Git that has both options and I don't have access to these repos for testing. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:39 -07:00
Derrick Stolee	861d4bc292	pack-objects: introduce GIT_TEST_PACK_PATH_WALK There are many tests that validate whether 'git pack-objects' works as expected. Instead of duplicating these tests, add a new test environment variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default when specified. This was useful in testing the --path-walk implementation, helping to find tests that are overly specific to the default object walk. These include:
 - t0411-clone-from-partial.sh : One test fetches from a repo that does not have the boundary objects. This causes the path-based walk to fail. Disable the variable for this test.
 - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo without a boundary object.
 - t5310-pack-bitmaps.sh : One test compares the case when packing with bitmaps to the case when packing without them. Since we disable the test variable when writing bitmaps, this causes a difference in the object list (the --path-walk option adds an extra object). Specify --no-path-walk in both processes for the comparison. Another test checks for a specific delta base, but when computing dynamically without using bitmaps, the base object is too small to be considered in the delta calculations so no base is used.
 - t5316-pack-delta-depth.sh : This script cares about certain delta choices and their chain lengths. The --path-walk option changes how these chains are selected, and thus changes the results of this test.
 - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of the --sparse option and how it combines with --path-walk.
 - t5332-multi-pack-reuse.sh : This test verifies that the preferred pack is used for delta reuse when possible. The --path-walk option is not currently aware of the preferred pack at all, so finds a different delta base.
 - t7406-submodule-update.sh : When using the variable, the --depth option collides with the --path-walk feature, resulting in a warning message. Disable the variable so this warning does not appear.
I want to call out one specific test change that is only temporary:
 - t5530-upload-pack-error.sh : One test cares specifically about an "unable to read" error message. Since the current implementation performs delta calculations within the path-walk API callback, a different "unable to get size" error message appears. When this is changed in a future refactoring, this test change can be reverted.
Similar to GIT_TEST_NAME_HASH_VERSION, we do not add this option to the linux-TEST-vars CI build as that's already an overloaded build. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:39 -07:00
Derrick Stolee	9fcfe12ac4	pack-objects: update usage to match docs The t0450 test script verifies that builtin usage matches the synopsis in the documentation. Adjust the builtin to match and then remove 'git pack-objects' from the exception list. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:38 -07:00
Derrick Stolee	70664d2865	pack-objects: add --path-walk option In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible. This new walk is incompatible with some features and is ignored by others: * Object filters are not currently integrated with the path-walk API, such as sparse-checkout or tree depth. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. 
These alternative ways to specify the objects currently ignore the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:38 -07:00
Derrick Stolee	4bc0ba0829	pack-objects: extract should_attempt_deltas() This will be helpful in a future change, which will reuse this logic. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:15:37 -07:00
Derrick Stolee	efab7dc1f4	reset: integrate sparse index with --patch Similar to the previous change for 'git add -p', the reset builtin checked for integration with the sparse index after possibly redirecting its logic toward the interactive logic. This means that the builtin would expand the sparse index to a full one upon read. Move this check earlier within cmd_reset() to improve performance here. Add tests to guarantee that we are not universally expanding the index. Add behavior tests to check that we are doing the same operations as a full index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:02:47 -07:00
Derrick Stolee	02ed8555f6	git add: make -p/-i aware of sparse index It is slow to expand a sparse index in-memory due to parsing of trees. We aim to minimize that performance cost when possible. 'git add -p' uses 'git apply' child processes to modify the index, but still there are some expansions that occur. It turns out that control flows out of cmd_add() in the interactive cases before the lines that confirm that the builtin is integrated with the sparse index. Moving that integration point earlier in cmd_add() allows 'git add -i' and 'git add -p' to operate without expanding a sparse index to a full one. Add test cases that confirm that these interactive add options work with the sparse index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:01:51 -07:00
Derrick Stolee	952de281fe	apply: integrate with the sparse index The sparse index allows storing directory entries in the index, marked with the skip-worktree bit and pointing to a tree object. This may be an unexpected data shape for some implementation areas, so we are rolling it out incrementally on a builtin-per-builtin basis. This change enables the sparse index for 'git apply'. The main motivation for this change is that 'git apply' is used as a child process of 'git add -p' and expanding the sparse index for each of those child processes can lead to significant performance issues. The good news is that the actual index manipulation code used by 'git apply' is already integrated with the sparse index, so the only product change is to mark the builtin as allowing the sparse index so it isn't inflated on read. The more involved part of this change is around adding tests that verify how 'git apply' behaves in a sparse-checkout environment and whether or not the index expands in certain operations. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:00:33 -07:00
Jeff King	f710fd7b49	hash-object: handle --literally with OPT_NEGBIT Since we recently removed the hash_literally() function, the hash-object --literally option has been simplified to just removing the INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool, we can just have the option parser remove the bit from the set of flags directly. This simplifies the helper functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	931e5ca507	hash-object: merge HASH_* and INDEX_* flags The hash-object command has its own custom flag bits that it sets based on command-line options. But since we dropped hash_literally() in the previous commit, the only thing we do with those flag bits is convert them directly into "index_flags" to pass to index_fd(). This extra layer of indirection makes the code harder to read and reason about. Let's just use the INDEX_* flags directly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	65a6a79b42	hash-object: stop allowing unknown types When passed the "--literally" option, hash-object will allow any arbitrary string for its "-t" type option. Such objects are only useful for testing or debugging, as they cannot be used in the normal way (e.g., you cannot fetch their contents!). Let's drop this feature, which will eventually let us simplify the object-writing code. This is technically backwards incompatible, but since such objects were never really functional, it seems unlikely that anybody will notice. We will retain the --literally flag, as it also instructs hash-object not to worry about other format issues (e.g., type-specific things that fsck would complain about). The documentation does not need to be updated, as it was always vague about which checks we're loosening (it uses only the phrase "any garbage"). The code change is a bit hard to verify from just the patch text. We can drop our local hash_literally() helper, but it was really just wrapping write_object_file_literally(). We now replace that with calling index_fd(), as we do for the non-literal code path, but dropping the INDEX_FORMAT_CHECK flag. This ends up being the same semantically as what the _literally() code path was doing (modulo handling unknown types, which is our goal). We'll be able to clean up these code paths a bit more in subsequent patches. The existing test is flipped to show that we now reject the unknown type. The additional "extra-long type" test is now redundant, as we bail early upon seeing a bogus type. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00
Jeff King	4ae0e9423c	fsck: stop using object_info->type_name strbuf When fsck-ing a loose object, we use object_info's type_name strbuf to record the parsed object type as a string. For most objects this is redundant with the object_type enum, but it does let us report the string when we encounter an object with an unknown type (for which there is no matching enum value). There are a few downsides, though: 1. The code to report these cases is not actually robust. Since we did not pass a strbuf to unpack_loose_header(), we only retrieved types from headers up to 32 bytes. In longer cases, we'd simply say "object corrupt or missing". 2. This is the last caller that uses object_info's type_name strbuf support. It would be nice to refactor it so that we can simplify that code. 3. Likewise, we'll check the hash of the object using its unknown type (again, as long as that type is short enough). That depends on the hash_object_file_literally() code, which we'd eventually like to get rid of. So we can simplify things by bailing immediately in read_loose_object() when we encounter an unknown type. This has a few user-visible effects: a. Instead of producing a single line of error output like this: error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 we'll now issue two lines (the first from read_loose_object() when we see the unparsable header, and the second from the fsck code, since we couldn't read the object): error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 This is a little more verbose, but this sort of error should be rare (such objects are almost impossible to work with, and cannot be transferred between repositories as they are not representable in packfiles). 
And as a bonus, reporting the broken header in full could help with debugging other cases (e.g., a header like "blob xyzzy\0" would fail in parsing the size, but previously we would not have shown the offending bytes). b. An object with an unknown type will be reported as corrupt, without actually doing a hash check. Again, I think this is unlikely to matter in practice since such objects are totally unusable. We'll update one fsck test to match the new error strings. And we can remove another test that covered the case of an object with an unknown type _and_ a hash corruption. Since we'll skip the hash check now in this case, the test is no longer interesting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	aac2abeca7	cat-file: use type enum instead of buffer for -t option Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is no need to pass a strbuf into oid_object_info_extended() to record the type. The regular object_type enum is sufficient to capture all of the types we will allow. This simplifies the code a bit, and will eventually let us drop object_info's type_name strbuf support. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:10 -07:00
Jeff King	f227fc7d43	cat-file: make --allow-unknown-type a noop The cat-file command has some minor support for handling objects with "unknown" types. I.e., strings that are not "blob", "commit", "tree", or "tag". In theory this could be used for debugging or experimenting with extensions to Git. But in practice this support is not very useful: 1. You can get the type and size of such objects, but nothing else. Not even the contents! 2. Only loose objects are supported, since packfiles use numeric ids for the types, rather than strings. 3. Likewise you cannot ever transfer objects between repositories, because they cannot be represented in the packfiles used for the on-the-wire protocol. The support for these unknown types complicates the object-parsing code, and has led to bugs such as `b748ddb7a4` (unpack_loose_header(): fix infinite loop on broken zlib input, 2025-02-25). So let's drop it. The first step is to remove the user-facing parts, which are accessible only via cat-file. This is technically backwards-incompatible, but given the limitations listed above, these objects couldn't possibly be useful in any workflow. However, we can't just rip out the option entirely. That would hurt a caller who ran: git cat-file -t --allow-unknown-type <oid> and fed it normal, well-formed objects. There --allow-unknown-type was doing nothing, but we wouldn't want to start bailing with an error. So to protect any such callers, we'll retain --allow-unknown-type as a noop. The code change is fairly small (but we'll be able to clean up more code in follow-on patches). The test updates drop any use of the option. We still retain tests that feed the broken objects to cat-file without --allow-unknown-type, as we should continue to confirm that those objects are rejected. Note that in one spot we can drop a layer of loop, re-indenting the body; viewing the diff with "-w" helps there. 
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:09 -07:00
Junio C Hamano	4dda60c9df	Merge branch 'ps/maintenance-missing-tasks' Make the repository clean-up tasks that "gc" can perform available to the "git maintenance" front-end. * ps/maintenance-missing-tasks: builtin/maintenance: introduce "rerere-gc" task builtin/gc: move rerere garbage collection into separate function builtin/maintenance: introduce "worktree-prune" task builtin/gc: move pruning of worktrees into a separate function builtin/gc: remove global variables where it is trivial to do builtin/gc: fix indentation of `cmd_gc()` parameters	2025-05-15 17:24:56 -07:00
Johannes Schindelin	6c91162449	fetch: avoid unnecessary work when there is no current branch As pointed out by CodeQL, `branch_get()` may return `NULL`, in which case `branch_has_merge_config()` would return early, but we can even avoid enumerating the refs prefixes in that case, saving even more CPU cycles. Technically, we should enclose these two statements in an `if (branch) {...}` block, but the indentation is already quite deep, therefore I refrained from doing that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-15 13:46:47 -07:00
Johannes Schindelin	c607410ada	fetch: carefully clear local variable's address after use As pointed out by CodeQL, it is a potentially dangerous practice to store local variables' addresses in non-local structs. Yet this is exactly what happens with the `acked_commits` attribute that is used in `cmd_fetch()`: The pointer to a local variable is assigned to it. Now, it is Git's convention that `cmd_()` functions essentially return only just before the process exits, therefore there is little danger that this attribute is used after the code flow returns from that function. However, code in `cmd_()` functions is often so useful that it gets lifted into a library function, at which point this issue could become a real problem. Let's make sure to clear the `acked_commits` attribute out after it was used, and before the function returns (at which point the address would go stale). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-15 13:46:45 -07:00
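
The hazard described in c607410ada (a non-local struct holding the address of a local variable) can be illustrated with a minimal C sketch. All names here (`struct negotiation_state`, `shared_state`, `sum_acked`, `run_command_like`) are hypothetical and are not taken from Git's source; the sketch only shows the general clear-before-return pattern under those assumptions, not the actual change to `cmd_fetch()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a long-lived struct with a borrowed pointer. */
struct negotiation_state {
	const int *acked;  /* borrowed, not owned by this struct */
	size_t acked_nr;
};

/* Non-local state: it outlives any single function call. */
static struct negotiation_state shared_state;

/* Hypothetical consumer that reads through the borrowed pointer. */
static int sum_acked(const struct negotiation_state *state)
{
	int total = 0;

	/* A NULL pointer is only acceptable when there is nothing to read. */
	assert(state->acked != NULL || state->acked_nr == 0);
	for (size_t i = 0; i < state->acked_nr; i++)
		total += state->acked[i];
	return total;
}

/* Plays the role of a command function that lends out a local's address. */
static int run_command_like(void)
{
	int local_acked[] = { 1, 2, 3 };
	int result;

	/* Lend the address of a local variable to the non-local struct. */
	shared_state.acked = local_acked;
	shared_state.acked_nr = sizeof(local_acked) / sizeof(local_acked[0]);

	result = sum_acked(&shared_state);

	/*
	 * Clear the borrowed pointer before returning. Once this function
	 * returns, local_acked no longer exists and the address is stale.
	 */
	shared_state.acked = NULL;
	shared_state.acked_nr = 0;

	return result;
}
```

With the clearing step in place, a later caller that inspects `shared_state` sees an empty state rather than a dangling pointer, which matters once such code is lifted out of a command function into a library function.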


12778 Commits (ef03aa432ab7fffa81a866ec21e08ecd8a876a26)