kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	f6c8fe189b	Merge branch 'jk/commit-graph-lazy-load-fallback' The logic to lazy-load trees from the commit-graph has been made more robust by falling back to reading the commit object when the commit-graph is no longer available. * jk/commit-graph-lazy-load-fallback: commit: fall back to full read when maybe_tree is NULL	2026-05-31 10:00:38 +09:00
Junio C Hamano	382705906f	Merge branch 'jk/commit-sign-overflow-fix' Leakfix. * jk/commit-sign-overflow-fix: commit: handle large commit messages in utf8 verification	2026-05-22 08:48:20 +09:00
Jeff King	3d8e4004c6	commit: fall back to full read when maybe_tree is NULL When we load a commit object from the commit graph (rather than reading the object contents), we don't fill in its "maybe_tree" entry, but rather wait to lazy-load it. This goes back to `7b8a21dba1` (commit-graph: lazy-load trees for commits, 2018-04-06), and saves the work of instantiating tree objects that nobody cares about. But it creates a data dependency: now the commit struct depends on the graph file to do that lazy load. This is a problem if we close the graph file; now we have a commit struct that claims to be parsed but is missing some of its data. It's rare for this to be a problem in practice, because we don't tend to close the graph files at all, and if we do we don't tend to look at their commits afterward. But there is one case that is easy to trigger: git-clone's --dissociate option will close the object database before running the dissociate repack, and then afterwards still try to check out the working tree. This will yield an error like: fatal: unable to parse commit b29edc0babef41810f7b1c9ee1d74058f22e4080 warning: Clone succeeded, but checkout failed. What happens is that we expect repo_get_commit_tree() to lazy-load the tree, but commit_graph_position() returns COMMIT_NOT_FROM_GRAPH because the position slab has gone away (and even if it hadn't, we don't have the graph file itself available anymore). Let's try harder to find the tree in repo_get_commit_tree() by actually opening the commit object and parsing the tree line. This is extra work, but no more than we'd have to go to if we hadn't done the initial graph load in the first place. It does mean that a corrupt commit (e.g., one that points to a non-tree object for which we couldn't instantiate a struct) will repeatedly load the object from disk, once for each call to repo_get_commit_tree(). 
But such corruptions should be rare, and we don't tend to perform such calls repeatedly (usually we'd abort the operation upon seeing corruption). It also means we have to reimplement a bit of the commit parsing. We can't just use parse_commit_buffer() here, because it expects an unparsed struct and wants to load everything, including parent links. But we don't know if the parent list has been munged during traversal, so it's not safe for us to touch it. Fortunately, it's quite easy to load just the tree, as it is always the first line of the commit object. There is an alternative approach which I considered but rejected: "complete" each graph-loaded commit struct when we close the graph file by looking up and instantiating their trees at close time. This is the most elegant solution in some sense, as it resolves the data dependency at the moment it goes away. And it avoids ever opening the commit objects at all, which can be more efficient. But not always. The resolving effort scales with the number of graph-loaded commits, even though we may only later access one or a few. So the tradeoff depends on how many were loaded in total versus how many will be later accessed. And in most cases, we will not access any at all! Programs which close the object database before exiting will then do a bunch of work for no reason. This could be mitigated by requiring a separate function to resolve the graph structs before closing the file. But now each close call has to consider whether to call that resolving function. So we'd fix this case in git-clone, but we don't know what other cases (if any) are lurking. Moreover, this strategy does nothing if we lose access to the graph file unexpectedly (e.g., due to a system error). I'm not entirely sure this is possible now (we mmap it, so I'd guess any error would turn into SIGBUS anyway). But it feels like making the lazy-load more robust (which this patch does) is the best way to handle a wide variety of possible failure modes. 
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 15:38:01 +09:00
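The fallback described above only needs the first line of the commit object, because the tree header always comes first. Below is a minimal sketch of that parsing step. It assumes SHA-1 (40-hex-digit) object IDs, and `parse_tree_line()` and `SKETCH_HEXSZ` are hypothetical names. Git's actual change also has to read the object and instantiate the tree struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: SHA-1 repositories only (40 hex digits); Git also supports SHA-256. */
#define SKETCH_HEXSZ 40

/*
 * Copy the tree ID from the first line of a raw commit buffer into out
 * (at least SKETCH_HEXSZ + 1 bytes). Only the "tree" header is read;
 * parent lines and the rest of the object are left untouched.
 * Returns false if the first line is not a well-formed "tree <hex>" line.
 */
static bool parse_tree_line(const char *buf, size_t len, char *out)
{
	static const char prefix[] = "tree ";
	const size_t plen = sizeof(prefix) - 1;

	assert(buf && out);
	if (len < plen + SKETCH_HEXSZ + 1 || memcmp(buf, prefix, plen))
		return false;
	if (buf[plen + SKETCH_HEXSZ] != '\n')
		return false;
	for (size_t i = 0; i < SKETCH_HEXSZ; i++) {
		char c = buf[plen + i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return false;
	}
	memcpy(out, buf + plen, SKETCH_HEXSZ);
	out[SKETCH_HEXSZ] = '\0';
	return true;
}
```

Because the parser stops after the first line, it never touches the parent list. This matters because, as the message explains, that list may have been modified during traversal.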
Junio C Hamano	cf7151110d	Merge branch 'bc/sign-commit-with-custom-encoding' Signing commit with custom encoding was passing the data to be signed at a wrong stage in the pipeline, which has been corrected. * bc/sign-commit-with-custom-encoding: commit: sign commit after mutating buffer commit: name UTF-8 function appropriately	2026-05-20 10:30:57 +09:00
Jeff King	65ea197dca	commit: handle large commit messages in utf8 verification Running t4205 under UBSan with the EXPENSIVE prereq enabled triggers an error when we try to create a commit message that is over 2GB: commit.c:1574:6: runtime error: signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' The problem is that find_invalid_utf8() is not prepared to handle large buffers, as it uses an "int" to represent buffer sizes and offsets. We can fix this with a few changes: 1. We'll take in "len" as a size_t (which is what the caller has anyway, since it's working with a strbuf). 2. We need to return a size_t to give the offset to the invalid utf8, but we also need a sentinel value for "no invalid value" (previously "-1"). Let's split these to return a bool for "found invalid utf8" and then pass back the offset as an out-parameter. We'll switch the function name to match the new semantics. 3. The caller in verify_utf8() uses a "long" to store buffer positions, which is a bit funny. This goes back to `08a94a145c` (commit/commit-tree: correct latin1 to utf-8, 2012-06-28) and is perhaps trying to match our use of "unsigned long" for object sizes (though we don't care about it ever becoming negative here). This should be a size_t, too, as some platforms (like Windows) still use a 32-bit long on machines with 64-bit pointers. 4. The "bytes" field within find_invalid_utf8() does not have range problems. It is the number of bytes the utf8 sequence claims to have, so is limited by how many bits can be set in a single 8-bit byte. However, if we leave it as an "int" then the compiler will complain about the sign mismatch when comparing it to "len". So let's make it unsigned, too. All of this is a little silly, of course, because 2GB text commit messages are clearly nonsense. So we might consider rejecting them outright, but it is easy enough to make these helper functions more robust in the meantime. 
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-16 21:43:14 +09:00
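The four changes above can be illustrated with a simplified scanner that takes a `size_t` length, returns a `bool`, and reports the offset of the first invalid sequence through an out-parameter. This is a sketch, not Git's code. `find_invalid_utf8_sketch()` is a hypothetical name, and the sketch omits Git's extra rejection of Unicode non-characters. It rejects the bytes 0xfe and 0xff, which are never valid in UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true and sets *offset to the start of the first invalid
 * sequence, or returns false if buf[0..len) is valid UTF-8.
 * Checks lead bytes, continuation bytes, overlong forms, the U+10FFFF
 * limit and surrogates; all sizes and offsets are size_t.
 */
static bool find_invalid_utf8_sketch(const char *buf, size_t len, size_t *offset)
{
	static const unsigned int max_codepoint[] = { 0x7f, 0x7ff, 0xffff, 0x10ffff };
	size_t pos = 0;

	assert(offset != NULL && (buf != NULL || len == 0));
	while (pos < len) {
		unsigned char c = (unsigned char)buf[pos];
		unsigned int bytes = 0; /* continuation bytes the lead byte claims */
		unsigned int codepoint;
		size_t start = pos;

		if (c < 0x80) {
			pos++;
			continue;
		}
		/* Each extra high bit after the first means one more byte follows. */
		while (c & 0x40) {
			c = (unsigned char)(c << 1);
			bytes++;
		}
		/* Only 1..3 continuation bytes are valid; 0xfe/0xff give 6/7. */
		if (bytes < 1 || bytes > 3 || len - pos - 1 < bytes) {
			*offset = start;
			return true;
		}
		codepoint = (c & 0x7f) >> bytes; /* payload bits of the lead byte */
		pos++;
		for (unsigned int i = 0; i < bytes; i++, pos++) {
			unsigned char cont = (unsigned char)buf[pos];
			if ((cont & 0xc0) != 0x80) {
				*offset = start;
				return true;
			}
			codepoint = (codepoint << 6) | (cont & 0x3f);
		}
		/* Reject overlong encodings, values past U+10FFFF, and surrogates. */
		if (codepoint <= max_codepoint[bytes - 1] ||
		    codepoint > max_codepoint[bytes] ||
		    (codepoint >= 0xd800 && codepoint <= 0xdfff)) {
			*offset = start;
			return true;
		}
	}
	return false;
}
```

Separating "found" from "where" removes the need for a `-1` sentinel, so the offset can use the full `size_t` range.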
brian m. carlson	7735d7eee3	commit: sign commit after mutating buffer The ensure_utf8 function can mutate the buffer to change its encoding, so we must call it before signing the buffer so that we do not invalidate the signature, which is made over raw bytes. Fix a bug which caused the compatibility code to not convert the compatibility buffer if the main buffer was invalid UTF-8. We expect both buffers to be valid UTF-8 or both invalid, since the only data that would differ between them would be hex object IDs, which are always valid UTF-8. Add a test for this case using 0xfe and 0xff, which are never valid in UTF-8. Reported-by: Kushal Das <kushal@sunet.se> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-04-28 09:51:11 +09:00
brian m. carlson	1ddc0481cf	commit: name UTF-8 function appropriately We have a function named verify_utf8, but it does more than verify, it modifies the buffer if it is not UTF-8. This is different from what most people would expect, so call the function ensure_utf8, since it mutates the buffer in some cases. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-04-28 09:51:11 +09:00
Emily Shaffer	ae25764e50	hook: mark non-parallelizable hooks Several hooks are known to be inherently non-parallelizable, so initialize them with RUN_HOOKS_OPT_INIT_FORCE_SERIAL. This pins jobs=1 and overrides any hook.jobs or runtime -j flags. These hooks are: applypatch-msg, pre-commit, prepare-commit-msg, commit-msg, post-commit, post-checkout, and push-to-checkout. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-04-10 07:58:53 -07:00
Justin Tobler	86ebf870b9	gpg-interface: allow sign_buffer() to use default signing key The `sign_commit_to_strbuf()` helper in "commit.c" provides fallback logic to get the default configured signing key when a key is not provided and handles generating the commit signature accordingly. This signing operation is not really specific to commits as any arbitrary buffer can be signed. Also, in a subsequent commit, this same logic is reused by git-fast-import(1) when signing commits with invalid signatures. Remove the `sign_commit_to_strbuf()` helper from "commit.c" and extend `sign_buffer()` in "gpg-interface.c" to support using the default key as a fallback when the `SIGN_BUFFER_USE_DEFAULT_KEY` flag is provided. Call sites are updated accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 21:28:20 -07:00
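The key-selection rule described above can be sketched as a small helper. An explicit, non-empty key always wins. Otherwise the configured default is used only when the caller passes the opt-in flag. The flag name comes from the commit message, but its numeric value and the helper `choose_signing_key()` are assumptions for illustration, not the real `sign_buffer()` signature.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name from the commit message; this bit value is an assumption. */
#define SIGN_BUFFER_USE_DEFAULT_KEY (1u << 0)

/*
 * Hypothetical helper: return the key to sign with, or NULL if none.
 * A non-empty explicit key is used as-is; an absent or empty key falls
 * back to configured_default only when the caller opted in.
 */
static const char *choose_signing_key(const char *key, unsigned flags,
				      const char *configured_default)
{
	if (key && *key)
		return key;
	if (flags & SIGN_BUFFER_USE_DEFAULT_KEY)
		return configured_default;
	return NULL;
}
```

Making the fallback opt-in keeps the generic buffer-signing path usable by callers that must not silently pick up a default key.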
Junio C Hamano	13763ecf7d	Merge branch 'ps/receive-pack-shallow-optim' The code to accept shallow "git push" has been optimized. * ps/receive-pack-shallow-optim: commit: use commit graph in `lookup_commit_reference_gently()` commit: make `repo_parse_commit_no_graph()` more robust commit: avoid parsing non-commits in `lookup_commit_reference_gently()`	2026-03-02 17:06:53 -08:00
Patrick Steinhardt	bb5da75d61	commit: use commit graph in `lookup_commit_reference_gently()` In the preceding commit we refactored `lookup_commit_reference_gently()` so that it doesn't parse non-commit objects anymore. This has led to a speedup when git-receive-pack(1) accepts a shallow push into a repo with lots of refs that point to blobs or trees. But while this case is now faster, accepting pushes with lots of "normal" refs that point to commits is still slow. This is mostly because we look up the commits via the object database, and that is rather costly. Adapt the code to use `repo_parse_commit_gently()` instead of `parse_object()` to parse the resulting commit object. This function knows to use the commit-graph to fill in the object, which is way more cost efficient. This leads to another significant speedup when accepting shallow pushes. The following benchmark pushes a single object from a shallow clone into a repository with 600,000 references that all point to commits: Benchmark 1: git-receive-pack (rev = HEAD~) Time (mean ± σ): 9.179 s ± 0.031 s [User: 8.858 s, System: 0.528 s] Range (min … max): 9.154 s … 9.213 s 3 runs Benchmark 2: git-receive-pack (rev = HEAD) Time (mean ± σ): 2.337 s ± 0.032 s [User: 2.331 s, System: 0.234 s] Range (min … max): 2.308 s … 2.371 s 3 runs Summary git-receive-pack . </tmp/input (rev = HEAD) ran 3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~) Also, this again leads to a significant reduction in memory allocations. Before this change: HEAP SUMMARY: in use at exit: 17,524,978 bytes in 22,393 blocks total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated And after this change: HEAP SUMMARY: in use at exit: 11,534,036 bytes in 12,406 blocks total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-19 09:34:26 -08:00
Patrick Steinhardt	f23ac77a43	commit: avoid parsing non-commits in `lookup_commit_reference_gently()` The function `lookup_commit_reference_gently()` can be used to look up a committish by object ID. As such, the function knows to peel for example tag objects so that we eventually end up with the commit. The function is used quite a lot throughout our tree. One such user is "shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this function is to figure out whether a shallow push is missing any objects that are required to satisfy the ref updates, and if so, which of the ref updates is missing objects. This is done by painting the tree with `UNINTERESTING`. We start painting by calling `refs_for_each_ref()` so that we can mark all existing referenced objects as the boundary of objects that we already have, and which are supposed to be fully connected. The reference tips are then parsed via `lookup_commit_reference_gently()`, and the commit is then marked as uninteresting. But references may not necessarily point to a committish, and if a lot of them aren't then this step takes a lot of time. This is mostly due to the way that `lookup_commit_reference_gently()` is implemented: before we learn about the type of the object we already call `parse_object()` on the object ID. This has two consequences: - We parse all objects, including trees and blobs, even though we don't even need the contents of them. - More importantly though, `parse_object()` will cause us to check whether the object ID matches its contents. Combined this means that we deflate and hash every non-committish object, and that of course ends up being both CPU- and memory-intensive. Improve the logic so that we first use `peel_object()`. This function won't parse the object for us, and thus it allows us to learn about the object's type before we parse and return it. The following benchmark pushes a single object from a shallow clone into a repository that has 100,000 refs. 
These refs were created by listing all objects via `git rev-list --objects --all` and creating refs for a subset of them, so lots of those refs will cover non-commit objects. Benchmark 1: git-receive-pack (rev = HEAD~) Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s] Range (min … max): 62.191 s … 63.010 s 3 runs Benchmark 2: git-receive-pack (rev = HEAD) Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s] Range (min … max): 38.176 s … 38.551 s 3 runs Summary git-receive-pack . </tmp/input (rev = HEAD) ran 1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~) This leads to a sizeable speedup as we now skip reading and parsing non-commit objects. Before this change we spent around 40% of the time in `assign_shallow_commits_to_refs()`; after the change we only spend around 1.2% of the time in there. Almost the entire remainder of the time is spent in git-rev-list(1) to perform the connectivity checks. Besides the speedup, this also leads to a massive reduction in allocations. Before: HEAP SUMMARY: in use at exit: 352,480,441 bytes in 97,185 blocks total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated And after: HEAP SUMMARY: in use at exit: 17,524,978 bytes in 22,393 blocks total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated Note that when all references refer to commits, performance stays roughly the same, as expected. The following benchmark was executed with 600k commits: Benchmark 1: git-receive-pack (rev = HEAD~) Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s] Range (min … max): 9.095 s … 9.106 s 3 runs Benchmark 2: git-receive-pack (rev = HEAD) Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s] Range (min … max): 9.019 s … 9.188 s 3 runs Summary git-receive-pack (rev = HEAD~) ran 1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD) This will be improved in the next commit. 
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-19 09:34:16 -08:00
Junio C Hamano	83037cb357	Merge branch 'rs/commit-commit-stack' Code clean-up to use the commit_stack API. * rs/commit-commit-stack: commit: use commit_stack	2026-02-17 13:30:42 -08:00
Junio C Hamano	354b8d89ac	Merge branch 'rs/clean-includes' Clean up redundant includes of header files. * rs/clean-includes: remove duplicate includes	2026-02-17 13:30:42 -08:00
Junio C Hamano	5288202433	Merge branch 'ps/commit-list-functions-renamed' Rename three functions around the commit_list data structure. * ps/commit-list-functions-renamed: commit: rename `free_commit_list()` to conform to coding guidelines commit: rename `reverse_commit_list()` to conform to coding guidelines commit: rename `copy_commit_list()` to conform to coding guidelines	2026-02-13 13:39:25 -08:00
René Scharfe	10c68d2577	remove duplicate includes The following command reports that some header files are included twice: $ git grep '#include' '*.c' | sort | uniq -cd Remove the second #include line in each case, as it has no effect. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-08 15:03:06 -08:00
René Scharfe	050566633a	commit: use commit_stack Use commit_stack instead of open-coding it. Also convert the loop counter i to size_t to match the type of the nr member of struct commit_stack. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-08 15:02:09 -08:00
Junio C Hamano	a3d1f391d3	Revert "Merge branch 'ar/run-command-hook'" This reverts commit `f406b89552`, reversing changes made to `1627809eef`. It seems to have caused a few regressions; we have proposed solutions for two of the three known ones. Let's give ourselves a bit more room to maneuver during the pre-release freeze period and restart once 2.53 ships.	2026-01-15 13:02:38 -08:00
Patrick Steinhardt	9f18d089c5	commit: rename `free_commit_list()` to conform to coding guidelines Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming scheme, `free_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-15 05:32:31 -08:00
Patrick Steinhardt	a468f3cefa	commit: rename `reverse_commit_list()` to conform to coding guidelines Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming scheme, `reverse_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-15 05:32:31 -08:00
Patrick Steinhardt	ff9fb2cfe6	commit: rename `copy_commit_list()` to conform to coding guidelines Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming scheme, `copy_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-15 05:32:31 -08:00
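The rename-plus-wrapper pattern used in these three commits can be sketched as follows. The sketch assumes the new names are `commit_list_free()` and `commit_list_reverse()`, as the quoted `S_<verb>()` rule suggests. It also uses a toy `struct commit_list` that holds an `int` instead of a commit pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in: Git's struct commit_list holds a struct commit pointer. */
struct commit_list {
	int item;
	struct commit_list *next;
};

/* New-style names: S_<verb>(), taking struct S as the first parameter. */
static void commit_list_free(struct commit_list *list)
{
	while (list) {
		struct commit_list *next = list->next;
		free(list);
		list = next;
	}
}

static struct commit_list *commit_list_reverse(struct commit_list *list)
{
	struct commit_list *prev = NULL;

	while (list) {
		struct commit_list *next = list->next;
		list->next = prev;
		prev = list;
		list = next;
	}
	return prev;
}

/* Transitional wrappers: old names forward to the new ones so that
 * in-flight callers keep compiling until the wrappers are removed. */
static inline void free_commit_list(struct commit_list *list)
{
	commit_list_free(list);
}

static inline struct commit_list *reverse_commit_list(struct commit_list *list)
{
	return commit_list_reverse(list);
}
```

Because the wrappers are trivial forwarders, topics written against the old names still build and behave identically. This avoids semantic conflicts at merge time.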
Junio C Hamano	3235ef374e	Merge branch 'rs/commit-stack' Code clean-up, unifying various hand-rolled "list of commit objects" and use the commit_stack API. * rs/commit-stack: commit-reach: use commit_stack commit-graph: use commit_stack commit: add commit_stack_grow() shallow: use commit_stack pack-bitmap-write: use commit_stack commit: add commit_stack_init() test-reach: use commit_stack remote: use commit_stack for src_commits remote: use commit_stack for sent_tips remote: use commit_stack for local_commits name-rev: use commit_stack midx: use commit_stack log: use commit_stack revision: export commit_stack	2026-01-12 05:19:52 -08:00
Junio C Hamano	f406b89552	Merge branch 'ar/run-command-hook' Use hook API to replace ad-hoc invocation of hook scripts with the run_command() API. * ar/run-command-hook: receive-pack: convert receive hooks to hook API receive-pack: convert update hooks to new API hooks: allow callers to capture output run-command: allow capturing of collated output hook: allow overriding the ungroup option reference-transaction: use hook API instead of run-command transport: convert pre-push to hook API hook: convert 'post-rewrite' hook in sequencer.c to hook API hook: provide stdin via callback run-command: add stdin callback for parallelization run-command: add first helper for pp child states	2026-01-06 16:33:53 +09:00
Adrian Ratiu	857f047e40	hook: allow overriding the ungroup option When calling run_process_parallel() in run_hooks_opt(), the ungroup option is currently hardcoded to .ungroup = 1. This causes problems when ungrouping should be disabled, for example when sideband-reading collated output from child hooks, because sideband-reading and ungrouping are mutually exclusive. Thus a new hook.h option is added to allow overriding. The existing ungroup=1 behavior is preserved in the run_hooks() API and the "hook run" command. We could modify these to take an option if necessary, so I added two code comments there. Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-28 14:02:07 +09:00
René Scharfe	958a816794	commit: add commit_stack_grow() Add a function for increasing the capacity of a commit_stack. It is useful for reducing reallocations when the target size is known in advance. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-25 08:29:28 +09:00
René Scharfe	2ebaa2b45e	commit: add commit_stack_init() Add a function for initializing a struct commit_stack, for when static initialization is not possible or impractical. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-25 08:29:28 +09:00
René Scharfe	d8a17ef09b	revision: export commit_stack Dynamic arrays of commit pointers are used in several places. Some of them use a custom struct to hold the array, item count and capacity, others have them as separate variables linked by a common name part. Pick one succinct, clean implementation -- commit_stack -- and convert the different variants to it to reduce code duplication. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-25 08:29:27 +09:00
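The commit_stack entries above describe a dynamic array of commit pointers with an item count and capacity, a runtime initializer, and a grow function for known target sizes. The sketch below shows that shape. The struct layout and the functions `commit_stack_push()`, `commit_stack_pop()` and `commit_stack_clear()` are assumptions for illustration, not Git's exact definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct commit; /* opaque: the stack only stores pointers */

/* Hypothetical layout: the array, item count and capacity in one struct. */
struct commit_stack {
	struct commit **items;
	size_t nr, alloc;
};

#define COMMIT_STACK_INIT { NULL, 0, 0 }

/* Runtime initialization, for when COMMIT_STACK_INIT cannot be used. */
static void commit_stack_init(struct commit_stack *stack)
{
	struct commit_stack blank = COMMIT_STACK_INIT;
	*stack = blank;
}

/* Reserve room for `extra` more items with at most one reallocation. */
static void commit_stack_grow(struct commit_stack *stack, size_t extra)
{
	size_t want = stack->nr + extra;
	struct commit **items;

	if (want <= stack->alloc)
		return;
	items = realloc(stack->items, want * sizeof(*items));
	if (!items)
		abort();
	stack->items = items;
	stack->alloc = want;
}

static void commit_stack_push(struct commit_stack *stack, struct commit *commit)
{
	/* Double the capacity (starting at 8) when full. */
	if (stack->nr == stack->alloc)
		commit_stack_grow(stack, stack->alloc ? stack->alloc : 8);
	stack->items[stack->nr++] = commit;
}

static struct commit *commit_stack_pop(struct commit_stack *stack)
{
	return stack->nr ? stack->items[--stack->nr] : NULL;
}

static void commit_stack_clear(struct commit_stack *stack)
{
	free(stack->items);
	commit_stack_init(stack);
}
```

Calling `commit_stack_grow()` once with the final size avoids the repeated reallocations that doubling would otherwise cause. This is the use case the commit_stack_grow() message gives.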
Christian Couder	cb034c020a	commit: refactor verify_commit_buffer() In a following commit, we are going to check commit signatures, but we won't have a commit yet, only a commit buffer, and we are going to discard this commit buffer if the signature is invalid. So it would be wasteful to create a commit that we might discard, just to be able to check a commit signature. It would be simpler to be able to check commit signatures using only a commit buffer instead of a commit. To be able to do that, let's extract some code from the check_commit_signature() function into a new verify_commit_buffer() function, and then let's make check_commit_signature() call verify_commit_buffer(). Note that this doesn't fundamentally change how check_commit_signature() works. It used to call parse_signed_commit() which calls repo_get_commit_buffer(), parse_buffer_signed_by_header() and repo_unuse_commit_buffer(). Now these 3 functions are called directly by verify_commit_buffer(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-16 20:56:35 -08:00
Junio C Hamano	9a85fa8406	Merge branch 'ps/remote-rename-fix' "git remote rename origin upstream" failed to move origin/HEAD to upstream/HEAD when origin/HEAD is unborn and performed other renames extremely inefficiently, which has been corrected. * ps/remote-rename-fix: builtin/remote: only iterate through refs that are to be renamed builtin/remote: rework how remote refs get renamed builtin/remote: determine whether refs need renaming early on builtin/remote: fix sign comparison warnings refs: simplify logic when migrating reflog entries refs: pass refname when invoking reflog entry callback	2025-08-21 13:46:58 -07:00
Patrick Steinhardt	b9fd73a234	refs: pass refname when invoking reflog entry callback With `refs_for_each_reflog_ent()` callers can iterate through all the reflog entries for a given reference. The callback that is being invoked for each such entry does not receive the name of the reference that we are currently iterating through. This isn't really a limiting factor, as callers can simply pass the name via the callback data. But this layout sometimes does make for a bit of an awkward calling pattern. One example: when iterating through all reflogs, and for each reflog we iterate through all refnames, we have to do some extra bookkeeping to track which reference name we are currently yielding reflog entries for. Change the signature of the callback function so that the reference name of the reflog gets passed through to it. Adapt callers accordingly and start using the new parameter in trivial cases. The next commit will refactor the reference migration logic to make use of this parameter so that we can simplify its logic a bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-08-06 14:19:30 -07:00
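The signature change can be illustrated with a simplified, hypothetical callback type that keeps only the ref name and message. Git's real `each_reflog_ent_fn` carries additional fields, such as the old and new object IDs and the committer. `struct reflog`, `for_each_reflog_ent_sketch()` and `count_branch_entries()` are invented for this sketch. The point is that a callback iterating over several reflogs no longer has to stash the current ref name in its callback data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical callback type: refname is now the first argument. */
typedef int each_reflog_ent_fn(const char *refname, const char *msg, void *cb_data);

/* Toy reflog: a ref name plus its entry messages. */
struct reflog {
	const char *refname;
	const char **msgs;
	size_t nr;
};

/* Invoke fn for every entry, passing the ref name along; stop early on
 * a non-zero return value and propagate it. */
static int for_each_reflog_ent_sketch(const struct reflog *log,
				      each_reflog_ent_fn fn, void *cb_data)
{
	for (size_t i = 0; i < log->nr; i++) {
		int ret = fn(log->refname, log->msgs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: counts entries of refs under refs/heads/ using only
 * the refname argument, with no extra bookkeeping in cb_data. */
static int count_branch_entries(const char *refname, const char *msg, void *cb_data)
{
	(void)msg;
	if (!strncmp(refname, "refs/heads/", strlen("refs/heads/")))
		(*(size_t *)cb_data)++;
	return 0;
}
```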
Junio C Hamano	4ce0caa7cc	Merge branch 'ps/object-file-wo-the-repository' Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings	2025-08-05 11:53:55 -07:00
Junio C Hamano	0f6e5037d4	Merge branch 'rs/pop-recent-commit-with-prio-queue' The pop_most_recent_commit() function can have quite expensive worst case performance characteristics, which has been optimized by using prio-queue data structure. * rs/pop-recent-commit-with-prio-queue: commit: use prio_queue_replace() in pop_most_recent_commit() prio-queue: add prio_queue_replace() commit: convert pop_most_recent_commit() to prio_queue	2025-07-28 12:02:34 -07:00
René Scharfe	a79e3519d6	commit: use prio_queue_replace() in pop_most_recent_commit() Optimize pop_most_recent_commit() by adding the first parent using the more efficient prio_queue_peek() and prio_queue_replace() instead of prio_queue_get() and prio_queue_put(). On my machine this neutralizes the performance hit it took in Git's own repository when we converted it to prio_queue two patches ago (git_pq): $ hyperfine -w3 -L git ./git_2.50.1,./git_pq,./git '{git} rev-parse :/^Initial.revision' Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision Time (mean ± σ): 1.073 s ± 0.003 s [User: 1.053 s, System: 0.019 s] Range (min … max): 1.069 s … 1.078 s 10 runs Benchmark 2: ./git_pq rev-parse :/^Initial.revision Time (mean ± σ): 1.077 s ± 0.002 s [User: 1.057 s, System: 0.018 s] Range (min … max): 1.072 s … 1.079 s 10 runs Benchmark 3: ./git rev-parse :/^Initial.revision Time (mean ± σ): 1.069 s ± 0.003 s [User: 1.049 s, System: 0.018 s] Range (min … max): 1.065 s … 1.074 s 10 runs Summary ./git rev-parse :/^Initial.revision ran 1.00 ± 0.00 times faster than ./git_2.50.1 rev-parse :/^Initial.revision 1.01 ± 0.00 times faster than ./git_pq rev-parse :/^Initial.revision Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-22 07:28:40 -07:00
René Scharfe	d6ec08788e	commit: convert pop_most_recent_commit() to prio_queue pop_most_recent_commit() calls commit_list_insert_by_date() for parent commits, which is itself called in a loop. This can lead to quadratic complexity if there are many merges. Replace the commit_list with a prio_queue to ensure logarithmic worst case complexity and convert all three users. Add a performance test that exercises one of them using a pathological history that consists of 50% merges and 50% root commits to demonstrate the speedup: Test v2.50.1 HEAD ---------------------------------------------------------------------- 1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9% Alas, sane histories don't benefit from the conversion much, and traversing Git's own history takes a 1% performance hit on my machine: $ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision' Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s] Range (min … max): 1.067 s … 1.078 s 10 runs Benchmark 2: ./git rev-parse :/^Initial.revision Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s] Range (min … max): 1.074 s … 1.083 s 10 runs Summary ./git_2.50.1 rev-parse :/^Initial.revision ran 1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-22 07:28:23 -07:00
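A binary max-heap keyed on commit date shows why the two prio_queue commits above help. Each put or get costs O(log n), so repeated insertion avoids the quadratic behavior of sorted-list insertion. In addition, replacing the top element with the first parent needs a single sift-down, whereas a get followed by a put needs a sift-down plus a sift-up. The sketch below is a toy with a fixed capacity and hypothetical names (`toy_queue`, `queue_replace()`, `pop_and_push_parent()`). It is not Git's `prio_queue` API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal commit: only its date matters for ordering. */
struct toy_commit { unsigned long date; };

#define TOY_QUEUE_MAX 64

/* Max-heap on date: the most recent commit sits at heap[0]. */
struct toy_queue {
	struct toy_commit *heap[TOY_QUEUE_MAX];
	size_t nr;
};

static int newer(const struct toy_commit *a, const struct toy_commit *b)
{
	return a->date > b->date;
}

static void queue_swap(struct toy_queue *q, size_t i, size_t j)
{
	struct toy_commit *tmp = q->heap[i];
	q->heap[i] = q->heap[j];
	q->heap[j] = tmp;
}

static void sift_down(struct toy_queue *q, size_t i)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, best = i;
		if (l < q->nr && newer(q->heap[l], q->heap[best]))
			best = l;
		if (r < q->nr && newer(q->heap[r], q->heap[best]))
			best = r;
		if (best == i)
			return;
		queue_swap(q, i, best);
		i = best;
	}
}

static void queue_put(struct toy_queue *q, struct toy_commit *c)
{
	size_t i;

	assert(q->nr < TOY_QUEUE_MAX);
	i = q->nr++;
	q->heap[i] = c;
	/* Sift up while newer than the parent. */
	while (i && newer(q->heap[i], q->heap[(i - 1) / 2])) {
		queue_swap(q, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

static struct toy_commit *queue_peek(const struct toy_queue *q)
{
	return q->nr ? q->heap[0] : NULL;
}

static struct toy_commit *queue_get(struct toy_queue *q)
{
	struct toy_commit *top;

	if (!q->nr)
		return NULL;
	top = q->heap[0];
	q->heap[0] = q->heap[--q->nr];
	sift_down(q, 0);
	return top;
}

/* Overwrite the top element and restore order with one sift-down. */
static void queue_replace(struct toy_queue *q, struct toy_commit *c)
{
	if (!q->nr) {
		queue_put(q, c);
		return;
	}
	q->heap[0] = c;
	sift_down(q, 0);
}

/* Pop the newest commit, putting its first parent (if any) in its place. */
static struct toy_commit *pop_and_push_parent(struct toy_queue *q,
					      struct toy_commit *parent)
{
	struct toy_commit *top = queue_peek(q);

	if (!top)
		return NULL;
	if (parent)
		queue_replace(q, parent);
	else
		queue_get(q);
	return top;
}
```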
Patrick Steinhardt	ab1c6e1d12	odb: introduce `odb_write_object()` We do not have a backend-agnostic way to write objects into an object database. While there is `write_object_file()`, this function is rather specific to the loose object format. Introduce `odb_write_object()` to plug this gap. For now, this function is a simple wrapper around `write_object_file()` and doesn't even use the passed-in object database yet. This will change in subsequent commits, where `write_object_file()` is converted so that it works on top of an `odb_source`. `odb_write_object()` will then become responsible for deciding which source an object shall be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-16 22:16:15 -07:00
Patrick Steinhardt	fcf8e3e111	odb: rename `has_object()` Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:38 -07:00
Patrick Steinhardt	d4ff88aee3	odb: rename `repo_read_object_file()` Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:38 -07:00
Patrick Steinhardt	e989dd96b8	odb: rename `oid_object_info()` Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:37 -07:00
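The three rename commits above each keep the old name available as a compatibility wrapper so that in-flight topics still compile. The sketch below illustrates that technique with hypothetical names (`toy_odb_has_object()` as the new name, `toy_has_object()` as the old one). How Git itself arranges its wrappers may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object database holding a fixed list of object names. */
struct toy_odb {
	const char **names;
	size_t nr;
};

/* New, namespaced function name following the "odb_" convention. */
static bool toy_odb_has_object(const struct toy_odb *odb, const char *name)
{
	for (size_t i = 0; i < odb->nr; i++)
		if (!strcmp(odb->names[i], name))
			return true;
	return false;
}

/*
 * Compatibility wrapper: the old name keeps compiling for existing
 * callers and simply forwards to the new one. It can be removed once
 * no callers remain.
 */
static inline bool toy_has_object(const struct toy_odb *odb, const char *name)
{
	return toy_odb_has_object(odb, name);
}
```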
Patrick Steinhardt	961038856b	odb: get rid of `the_repository` in `assert_oid_type()` Get rid of our dependency on `the_repository` in `assert_oid_type()` by passing in the object database as a parameter and adjusting all callers. Rename the function to `odb_assert_oid_type()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:35 -07:00
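Removing a global dependency means the function receives the object database explicitly, so the caller decides which repository is consulted. This minimal sketch uses hypothetical types, with object IDs reduced to array indices. To keep it testable, it reports a mismatch by returning -1 instead of terminating the program.

```c
#include <stddef.h>
#include <stdio.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT };

/* Hypothetical object database: object index -> object type. */
struct toy_odb {
	const enum toy_type *types;
	size_t nr;
};

/*
 * The database is a parameter rather than a global such as
 * `the_repository`, so two databases can be checked independently.
 */
static int toy_odb_assert_oid_type(const struct toy_odb *odb, size_t oid,
				   enum toy_type expect)
{
	if (oid >= odb->nr || odb->types[oid] != expect) {
		fprintf(stderr, "object %zu is not of the expected type\n", oid);
		return -1;
	}
	return 0;
}
```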
Patrick Steinhardt	8f49151763	object-store: rename files to "odb.{c,h}" In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:34 -07:00
Junio C Hamano	36d8035d27	Merge branch 'ps/object-file-cleanup' Code clean-up. * ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"	2025-04-24 17:25:33 -07:00
Junio C Hamano	ee847e0034	Merge branch 'ps/object-wo-the-repository' The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`	2025-04-15 13:50:15 -07:00
Patrick Steinhardt	68cd492a3e	object-store: merge "object-store-ll.h" and "object-store.h" The "object-store-ll.h" header has been introduced to keep transitive header dependencies and compile times at bay. Now that we have created a new "object-store.c" file, though, we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative, we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:37 -07:00
Patrick Steinhardt	d9f517d051	object-file: split out functions relating to object store subsystem While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns. Split out functions relating to the object store subsystem into a new "object-store.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:36 -07:00
René Scharfe	98b423bc1c	commit: move clear_commit_marks_many() loop body to clear_commit_marks() clear_commit_marks_many() clears multiple commits one by one. Move the code for handling a single commit to clear_commit_marks() and call it instead of the other way around, to simplify the code. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-24 14:52:29 +09:00
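After this change, the single-commit function holds the real work, and the "many" variant is just a loop over it. The sketch below shows that structure on a hypothetical `struct toy_commit`. For brevity it recurses into parents, which Git avoids for deep histories, so treat it as an outline of the call structure rather than of Git's traversal.

```c
#include <stddef.h>

/* Hypothetical commit node: a flag word and up to two parents. */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parents[2];
	size_t nr_parents;
};

/* Clear `mark` on one commit and on every ancestor that still carries it. */
static void toy_clear_commit_marks(struct toy_commit *c, unsigned mark)
{
	if (!c || !(c->flags & mark))
		return; /* already clear: nothing below needs visiting */
	c->flags &= ~mark;
	for (size_t i = 0; i < c->nr_parents; i++)
		toy_clear_commit_marks(c->parents[i], mark);
}

/* The "many" variant simply delegates to the single-commit version. */
static void toy_clear_commit_marks_many(size_t nr, struct toy_commit **commits,
					unsigned mark)
{
	for (size_t i = 0; i < nr; i++)
		toy_clear_commit_marks(commits[i], mark);
}
```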
Patrick Steinhardt	f6e174b2d8	object-file-convert: stop depending on `the_repository` There are multiple sites in "object-file-convert.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. All of these callsites are transitively called from `convert_object_file()`, which indeed has no repo as input. Refactor the function so that it receives a repository as a parameter and pass it through to all internal functions to get rid of the dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:19 -07:00
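The refactoring threads a repository parameter from the public entry point down through every internal helper. Each helper then reads the hash algorithm from that repository instead of from a global such as `the_hash_algo`. In this minimal sketch, all types and functions are hypothetical. The only real facts it relies on are the raw digest sizes: 20 bytes for SHA-1 and 32 bytes for SHA-256.

```c
#include <stddef.h>

/* Hypothetical hash-algorithm and repository types. */
struct toy_hash_algo {
	const char *name;
	size_t rawsz;
};

struct toy_repo {
	const struct toy_hash_algo *hash_algo;
	const struct toy_hash_algo *compat_hash_algo;
};

/* Internal helper: the repository is received, not looked up globally. */
static size_t toy_oid_len(const struct toy_repo *repo, int compat)
{
	return compat ? repo->compat_hash_algo->rawsz : repo->hash_algo->rawsz;
}

/* Public entry point takes the repository and passes it down the chain. */
static size_t toy_convert_object(const struct toy_repo *repo, size_t nr_oids)
{
	/* Bytes needed for nr_oids object IDs in the compat algorithm. */
	return nr_oids * toy_oid_len(repo, 1);
}
```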
René Scharfe	1ca727f230	commit: avoid parent list buildup in clear_commit_marks_many() clear_commit_marks_1() clears the marks of the first parent and its first parent and so on, and saves the higher numbered parents in a list for later. There is no benefit in keeping that list growing with each handled commit. Clear it after each run to reduce peak memory usage. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-24 08:51:18 -08:00
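The walk described here follows first parents in a loop and saves other still-marked parents on a pending list. Draining that list after each starting commit, rather than once at the very end, bounds how large it can grow. This sketch uses hypothetical types and function names and tracks the list's peak size so the effect is visible. It mirrors the description in the commit message, not Git's actual code.

```c
#include <stdlib.h>

/* Hypothetical commit node: a flag word and up to two parents. */
struct toy_commit {
	unsigned flags;
	struct toy_commit *parents[2];
	size_t nr_parents;
};

/* Hypothetical growable stack of commits still to visit. */
struct pending {
	struct toy_commit **items;
	size_t nr, alloc, peak;
};

static void pending_push(struct pending *p, struct toy_commit *c)
{
	if (p->nr == p->alloc) {
		size_t new_alloc = p->alloc ? 2 * p->alloc : 8;
		struct toy_commit **grown = realloc(p->items, new_alloc * sizeof(*grown));
		if (!grown)
			abort(); /* keep the sketch short */
		p->items = grown;
		p->alloc = new_alloc;
	}
	p->items[p->nr++] = c;
	if (p->nr > p->peak)
		p->peak = p->nr;
}

/* Follow first parents; remember other still-marked parents for later. */
static void clear_first_parent_chain(struct pending *p, struct toy_commit *c,
				     unsigned mark)
{
	while (c && (c->flags & mark)) {
		c->flags &= ~mark;
		for (size_t i = 1; i < c->nr_parents; i++)
			if (c->parents[i]->flags & mark)
				pending_push(p, c->parents[i]);
		c = c->nr_parents ? c->parents[0] : NULL;
	}
}

/* Drain the pending list after each starting commit, not once at the end. */
static void clear_marks_many(struct pending *p, size_t nr,
			     struct toy_commit **commits, unsigned mark)
{
	for (size_t i = 0; i < nr; i++) {
		clear_first_parent_chain(p, commits[i], mark);
		while (p->nr)
			clear_first_parent_chain(p, p->items[--p->nr], mark);
	}
}
```

With two independent merges as starting points, the pending list never holds more than one entry. Deferring the drain to the end would let it reach two.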
Patrick Steinhardt	85ee0680e2	commit-reach: use `size_t` to track indices in `get_reachable_subset()` As in the preceding commit, adapt `get_reachable_subset()` so that it tracks array indices via `size_t` instead of using signed integers to fix a couple of -Wsign-compare warnings. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:45 -08:00
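The warning being fixed arises when a signed loop index such as `int i` is compared against an unsigned count such as `size_t nr`. GCC's `-Wsign-compare` flags `i < nr` because the signed operand is converted to unsigned. Declaring the index and any counters as `size_t` removes the mixed comparison. The function below is a hypothetical example of the pattern, not code from `get_reachable_subset()`.

```c
#include <stddef.h>

/*
 * Index and counter share the unsigned type of `nr`, so `i < nr` compares
 * two size_t values and -Wsign-compare has nothing to report.
 */
static size_t toy_count_marked(const unsigned *flags, size_t nr, unsigned mark)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++)
		if (flags[i] & mark)
			count++;
	return count;
}
```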
Junio C Hamano	4156b6a741	Merge branch 'ps/build-sign-compare' Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings	2024-12-23 09:32:11 -08:00
Junio C Hamano	e6663b9ac5	Merge branch 'bf/explicit-config-set-in-advice-messages' The advice messages now tell the newer 'git config set' command to set the advice.token configuration variable to squelch a message. * bf/explicit-config-set-in-advice-messages: advice: suggest using subcommand "git config set"	2024-12-15 17:54:28 -08:00


693 Commits (600fe743028cbfb640855f659e9851522214bc0b)