The logic to lazy-load trees from the commit-graph has been made
more robust by falling back to reading the commit object when
the commit-graph is no longer available.
* jk/commit-graph-lazy-load-fallback:
commit: fall back to full read when maybe_tree is NULL
When we load a commit object from the commit graph (rather than reading
the object contents), we don't fill in its "maybe_tree" entry, but
rather wait to lazy-load it. This goes back to 7b8a21dba1 (commit-graph:
lazy-load trees for commits, 2018-04-06), and saves the work of
instantiating tree objects that nobody cares about.
But it creates a data dependency: now the commit struct depends on the
graph file to do that lazy load. This is a problem if we close the graph
file; now we have a commit struct that claims to be parsed but is
missing some of its data.
It's rare for this to be a problem in practice, because we don't tend to
close the graph files at all, and if we do we don't tend to look at
their commits afterward. But there is one case that is easy to trigger:
git-clone's --dissociate option will close the object database before
running the dissociate repack, and then afterwards still try to check
out the working tree. This will yield an error like:
fatal: unable to parse commit b29edc0babef41810f7b1c9ee1d74058f22e4080
warning: Clone succeeded, but checkout failed.
What happens is that we expect repo_get_commit_tree() to lazy-load the
tree, but commit_graph_position() returns COMMIT_NOT_FROM_GRAPH because
the position slab has gone away (and even if it hadn't, we don't have
the graph file itself available anymore).
Let's try harder to find the tree in repo_get_commit_tree() by actually
opening the commit object and parsing the tree line. This is extra work,
but no more than we'd have to go to if we hadn't done the initial graph
load in the first place.
It does mean that a corrupt commit (e.g., one that points to a non-tree
object for which we couldn't instantiate a struct) will repeatedly load
the object from disk, once for each call to repo_get_commit_tree(). But
such corruptions should be rare, and we don't tend to perform such calls
repeatedly (usually we'd abort the operation upon seeing corruption).
It also means we have to reimplement a bit of the commit parsing. We
can't just use parse_commit_buffer() here, because it expects an
unparsed struct and wants to load everything, including parent links.
But we don't know if the parent list has been munged during traversal,
so it's not safe for us to touch it. Fortunately, it's quite easy to
load just the tree, as it is always the first line of the commit object.
There is an alternative approach which I considered but rejected:
"complete" each graph-loaded commit struct when we close the graph file
by looking up and instantiating their trees at close time. This is the
most elegant solution in some sense, as it resolves the data dependency
at the moment it goes away. And it avoids ever opening the commit
objects at all, which can be more efficient.
But not always. The resolving effort scales with the number of
graph-loaded commits, even though we may only later access one or a few.
So the tradeoff depends on how many were loaded in total versus how many
will be later accessed.
And in most cases, we will not access any at all! Programs which close
the object database before exiting will then do a bunch of work for no
reason. This could be mitigated by requiring a separate function to
resolve the graph structs before closing the file. But now each close
call has to consider whether to call that resolving function. So we'd
fix this case in git-clone, but we don't know what other cases (if any)
are lurking.
Moreover, this strategy does nothing if we lose access to the graph file
unexpectedly (e.g., due to a system error). I'm not entirely sure this
is possible now (we mmap it, so I'd guess any error would turn into
SIGBUS anyway). But it feels like making the lazy-load more robust
(which this patch does) is the best way to handle a wide variety of
possible failure modes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signing commit with custom encoding was passing the data to be
signed at a wrong stage in the pipeline, which has been corrected.
* bc/sign-commit-with-custom-encoding:
commit: sign commit after mutating buffer
commit: name UTF-8 function appropriately
Running t4205 under UBSan with the EXPENSIVE prereq enabled triggers an
error when we try to create a commit message that is over 2GB:
commit.c:1574:6: runtime error: signed integer overflow:
-2147483648 - 1 cannot be represented in type 'int'
The problem is that find_invalid_utf8() is not prepared to handle
large buffers, as it uses an "int" to represent buffer sizes and
offsets.
We can fix this with a few changes:
1. We'll take in "len" as a size_t (which is what the caller has
anyway, since it's working with a strbuf).
2. We need to return a size_t to give the offset to the invalid utf8,
but we also need a sentinel value for "no invalid value"
(previously "-1"). Let's split these to return a bool for "found
invalid utf8" and then pass back the offset as an out-parameter.
We'll switch the function name to match the new semantics.
3. The caller in verify_utf8() uses a "long" to store buffer
positions, which is a bit funny. This goes back to 08a94a145c
(commit/commit-tree: correct latin1 to utf-8, 2012-06-28) and is
perhaps trying to match our use of "unsigned long" for object sizes
(though we don't care about it ever becoming negative here). This
should be a size_t, too, as some platforms (like Windows) still use
a 32-bit long on machines with 64-bit pointers.
4. The "bytes" field within find_invalid_utf() does not have range
problems. It is the number of bytes the utf8 sequence claims to
have, so is limited by how many bits can be set in a single 8-bit
byte. However, if we leave it as an "int" then the compiler will
complain about the sign mismatch when comparing it to "len". So
let's make it unsigned, too.
All of this is a little silly, of course, because 2GB text commit
messages are clearly nonsense. So we might consider rejecting them
outright, but it is easy enough to make these helper functions more
robust in the meantime.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ensure_utf8 function can mutate the buffer to change its encoding,
so we must call it before signing the buffer so that we do not
invalidate the signature, which is made over raw bytes. Fix a bug which
caused the compatibility code to not convert the compatibility buffer if
the main buffer was invalid UTF-8. We expect both buffers to be valid
UTF-8 or both invalid, since the only data that would differ between
them would be hex object IDs, which are always valid UTF-8.
Add a test for this case using 0xfe and 0xff, which are never valid in
UTF-8.
Reported-by: Kushal Das <kushal@sunet.se>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function named verify_utf8, but it does more than verify, it
modifies the buffer if it is not UTF-8. This is different from what
most people would expect, so call the function ensure_utf8, since it
mutates the buffer in some cases.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several hooks are known to be inherently non-parallelizable, so initialize
them with RUN_HOOKS_OPT_INIT_FORCE_SERIAL. This pins jobs=1 and overrides
any hook.jobs or runtime -j flags.
These hooks are:
applypatch-msg, pre-commit, prepare-commit-msg, commit-msg, post-commit,
post-checkout, and push-to-checkout.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `sign_commit_to_strbuf()` helper in "commit.c" provides fallback
logic to get the default configured signing key when a key is not
provided and handles generating the commit signature accordingly. This
signing operation is not really specific to commits as any arbitrary
buffer can be signed. Also, in a subsequent commit, this same logic is
reused by git-fast-import(1) when signing commits with invalid
signatures.
Remove the `sign_commit_to_strbuf()` helper from "commit.c" and extend
`sign_buffer()` in "gpg-interface.c" to support using the default key as
a fallback when the `SIGN_BUFFER_USE_DEFAULT_KEY` flag is provided. Call
sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to accept shallow "git push" has been optimized.
* ps/receive-pack-shallow-optim:
commit: use commit graph in `lookup_commit_reference_gently()`
commit: make `repo_parse_commit_no_graph()` more robust
commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
In the preceding commit we refactored `lookup_commit_reference_gently()`
so that it doesn't parse non-commit objects anymore. This has led to a
speedup when git-receive-pack(1) accepts a shallow push into a repo
with lots of refs that point to blobs or trees.
But while this case is now faster, we still have the issue that
accepting pushes with lots of "normal" refs that point to commits are
still slow. This is mostly because we look up the commits via the object
database, and that is rather costly.
Adapt the code to use `repo_parse_commit_gently()` instead of
`parse_object()` to parse the resulting commit object. This function
knows to use the commit-graph to fill in the object, which is way more
cost efficient.
This leads to another significant speedup when accepting shallow pushes.
The following benchmark pushes a single objects from a shallow clone
into a repository with 600,000 references that all point to commits:
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 9.179 s ± 0.031 s [User: 8.858 s, System: 0.528 s]
Range (min … max): 9.154 s … 9.213 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 2.337 s ± 0.032 s [User: 2.331 s, System: 0.234 s]
Range (min … max): 2.308 s … 2.371 s 3 runs
Summary
git-receive-pack . </tmp/input (rev = HEAD) ran
3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)
Also, this again leads to a significant reduction in memory allocations.
Before this change:
HEAP SUMMARY:
in use at exit: 17,524,978 bytes in 22,393 blocks
total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
And after this change:
HEAP SUMMARY:
in use at exit: 11,534,036 bytes in 12,406 blocks
total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel for example
tag objects so that we eventually end up with the commit.
The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.
This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
is then marked as uninteresting.
But references may not necessarily point to a committish, and if a lot
of them aren't then this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:
- We parse all objects, including trees and blobs, even though we
don't even need the contents of them.
- More importantly though, `parse_object()` will cause us to check
whether the object ID matches its contents.
Combined this means that we deflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.
Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.
The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list(1) --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 62.571 s ± 0.413 s [User: 58.331 s, System: 4.053 s]
Range (min … max): 62.191 s … 63.010 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 38.339 s ± 0.192 s [User: 36.220 s, System: 1.992 s]
Range (min … max): 38.176 s … 38.551 s 3 runs
Summary
git-receive-pack . </tmp/input (rev = HEAD) ran
1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`, after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.
Despite the speedup though, this also leads to a massive reduction in
allocations. Before:
HEAP SUMMARY:
in use at exit: 352,480,441 bytes in 97,185 blocks
total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
And after:
HEAP SUMMARY:
in use at exit: 17,524,978 bytes in 22,393 blocks
total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
Note that when all references refer to commits performance stays roughly
the same, as expected. The following benchmark was executed with 600k
commits:
Benchmark 1: git-receive-pack (rev = HEAD~)
Time (mean ± σ): 9.101 s ± 0.006 s [User: 8.800 s, System: 0.520 s]
Range (min … max): 9.095 s … 9.106 s 3 runs
Benchmark 2: git-receive-pack (rev = HEAD)
Time (mean ± σ): 9.128 s ± 0.094 s [User: 8.820 s, System: 0.522 s]
Range (min … max): 9.019 s … 9.188 s 3 runs
Summary
git-receive-pack (rev = HEAD~) ran
1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
This will be improved in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename three functions around the commit_list data structure.
* ps/commit-list-functions-renamed:
commit: rename `free_commit_list()` to conform to coding guidelines
commit: rename `reverse_commit_list()` to conform to coding guidelines
commit: rename `copy_commit_list()` to conform to coding guidelines
The following command reports that some header files are included twice:
$ git grep '#include' '*.c' | sort | uniq -cd
Remove the second #include line in each case, as it has no effect.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use commit_stack instead of open-coding it. Also convert the loop
counter i to size_t to match the type of the nr member of struct
commit_stack.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f406b89552,
reversing changes made to 1627809eef.
It seems to have caused a few regressions, two of the three known
ones we have proposed solutions for. Let's give ourselves a bit
more room to maneuver during the pre-release freeze period and
restart once the 2.53 ships.
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `reverse_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `copy_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.
* rs/commit-stack:
commit-reach: use commit_stack
commit-graph: use commit_stack
commit: add commit_stack_grow()
shallow: use commit_stack
pack-bitmap-write: use commit_stack
commit: add commit_stack_init()
test-reach: use commit_stack
remote: use commit_stack for src_commits
remote: use commit_stack for sent_tips
remote: use commit_stack for local_commits
name-rev: use commit_stack
midx: use commit_stack
log: use commit_stack
revision: export commit_stack
Use hook API to replace ad-hoc invocation of hook scripts with the
run_command() API.
* ar/run-command-hook:
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
hooks: allow callers to capture output
run-command: allow capturing of collated output
hook: allow overriding the ungroup option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add first helper for pp child states
When calling run_process_parallel() in run_hooks_opt(), the
ungroup option is currently hardcoded to .ungroup = 1.
This causes problems when ungrouping should be disabled, for
example when sideband-reading collated output from child hooks,
because sideband-reading and ungrouping are mutually exclusive.
Thus a new hook.h option is added to allow overriding.
The existing ungroup=1 behavior is preserved in the run_hooks()
API and the "hook run" command. We could modify these to take
an option if necessary, so I added two code comments there.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for increasing the capacity of a commit_stack. It is
useful for reducing reallocations when the target size is known in
advance.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for initializing a struct commit_stack, for when static
initialization is not possible or impractical.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dynamic arrays of commit pointers are used in several places. Some of
them use a custom struct to hold array, item count and capacity, others
have them as separate variables linked by a common name part.
Pick one succinct, clean implementation -- commit_stack -- and convert
the different variants to it to reduce code duplication.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, we are going to check commit signatures, but we
won't have a commit yet, only a commit buffer, and we are going to
discard this commit buffer if the signature is invalid. So it would be
wasteful to create a commit that we might discard, just to be able to
check a commit signature.
It would be simpler instead to be able to check commit signatures
using only a commit buffer instead of a commit.
To be able to do that, let's extract some code from the
check_commit_signature() function into a new verify_commit_buffer()
function, and then let's make check_commit_signature() call
verify_commit_buffer().
Note that this doesn't fundamentally change how
check_commit_signature() works. It used to call parse_signed_commit()
which calls repo_get_commit_buffer(), parse_buffer_signed_by_header()
and repo_unuse_commit_buffer(). Now these 3 functions are called
directly by verify_commit_buffer().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.
* ps/object-file-wo-the-repository:
object-file: get rid of `the_repository` in index-related functions
object-file: get rid of `the_repository` in `force_object_loose()`
object-file: get rid of `the_repository` in `read_loose_object()`
object-file: get rid of `the_repository` in loose object iterators
object-file: remove declaration for `for_each_file_in_obj_subdir()`
object-file: inline `for_each_loose_file_in_objdir_buf()`
object-file: get rid of `the_repository` when writing objects
odb: introduce `odb_write_object()`
loose: write loose objects map via their source
object-file: get rid of `the_repository` in `finalize_object_file()`
object-file: get rid of `the_repository` in `loose_object_info()`
object-file: get rid of `the_repository` when freshening objects
object-file: inline `check_and_freshen()` functions
object-file: get rid of `the_repository` in `has_loose_object()`
object-file: stop using `the_hash_algo`
object-file: fix -Wsign-compare warnings
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
Optimize pop_most_recent_commit() by adding the first parent using the
more efficient prio_queue_peek() and prio_queue_replace() instead of
prio_queue_get() and prio_queue_put().
On my machine this neutralizes the performance hit it took in Git's own
repository when we converted it to prio_queue two patches ago (git_pq):
$ hyperfine -w3 -L git ./git_2.50.1,./git_pq,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.073 s ± 0.003 s [User: 1.053 s, System: 0.019 s]
Range (min … max): 1.069 s … 1.078 s 10 runs
Benchmark 2: ./git_pq rev-parse :/^Initial.revision
Time (mean ± σ): 1.077 s ± 0.002 s [User: 1.057 s, System: 0.018 s]
Range (min … max): 1.072 s … 1.079 s 10 runs
Benchmark 3: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.069 s ± 0.003 s [User: 1.049 s, System: 0.018 s]
Range (min … max): 1.065 s … 1.074 s 10 runs
Summary
./git rev-parse :/^Initial.revision ran
1.00 ± 0.00 times faster than ./git_2.50.1 rev-parse :/^Initial.revision
1.01 ± 0.00 times faster than ./git_pq rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop. This can lead to quadratic
complexity if there are many merges. Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.
Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:
Test v2.50.1 HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9%
Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:
$ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s]
Range (min … max): 1.067 s … 1.078 s 10 runs
Benchmark 2: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s]
Range (min … max): 1.074 s … 1.083 s 10 runs
Summary
./git_2.50.1 rev-parse :/^Initial.revision ran
1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have a backend-agnostic way to write objects into an object
database. While there is `write_object_file()`, this function is rather
specific to the loose object format.
Introduce `odb_write_object()` to plug this gap. For now, this function
is a simple wrapper around `write_object_file()` and doesn't even use
the passed-in object database yet. This will change in subsequent
commits, where `write_object_file()` is converted so that it works on
top of an `odb_source`. `odb_write_object()` will then become
responsible for deciding which source an object shall be written to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `assert_oid_type()` by
passing in the object database as a parameter and adjusting all callers.
Rename the function to `odb_assert_oid_type()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have the "object-store.h" header, most of the functionality for
object stores is actually hosted in "object-file.c". This makes it hard
to find relevant functions and causes us to mix up concerns.
Split out functions relating to the object store subsystem into a new
"object-store.c" file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "object-file-convert.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`. All of these callsites are transitively called
from `convert_object_file()`, which indeed has no repo as input.
Refactor the function so that it receives a repository as a parameter
and pass it through to all internal functions to get rid of the
dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks_1() clears the marks of the first parent and its
first parent and so on, and saves the higher numbered parents in a list
for later. There is no benefit in keeping that list growing with each
handled commit. Clear it after each run to reduce peak memory usage.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, adapt `get_reachable_subset()` so
that it tracks array indices via `size_t` instead of using signed
integers to fix a couple of -Wsign-compare warnings. Adapt callers
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
The advice messages now tell the newer 'git config set' command to
set the advice.token configuration variable to squelch a message.
* bf/explicit-config-set-in-advice-messages:
advice: suggest using subcommand "git config set"