The commit-graph format reserved a byte among the header of the file to
store a "hash version". During the SHA-256 work, this was not modified
because file formats are not necessarily intended to work across hash
versions. If a repository has SHA-256 as its hash algorithm, it
automatically up-shifts the lengths of object names in all necessary
formats.
However, since we have this byte available for adjusting the version, we
can make the file formats more obviously incompatible instead of relying
on other context from the repository.
Update the oid_version() method in commit-graph.c to add a new value, 2,
for sha-256. This automatically writes the new value in a SHA-256
repository _and_ verifies the value is correct. This is a breaking
change relative to the current 'master' branch since 092b677 (Merge
branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking
relative to any released version of Git.
The test impact is relatively minor: the output of 'test-tool
read-graph' lists the header information, so those instances of '1' need
to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A
more careful test is added that specifically creates a repository of
each type then swaps the commit-graph files. The important value here is
that the "git log" command succeeds while writing a message to stderr.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we call test_oid_init in the setup for all test scripts,
there's no point in calling it individually. Remove all of the places
where we've done so to help keep tests tidy.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
43d3561 (commit-graph write: don't die if the existing graph is corrupt,
2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment
variable. This was created to verify that commit-graph was not loaded
when writing a new non-incremental commit-graph.
An upcoming change wants to load a commit-graph in some valuable cases,
but we want to maintain that we don't trust the commit-graph data when
writing our new file. Instead of dying on load, instead die if we ever
try to parse a commit from the commit-graph. This functionally verifies
the same intended behavior, but allows a more advanced feature in the
next change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we iterate over all entries of the Chunk Lookup table we make
sure that we don't attempt to read past the end of the mmap-ed
commit-graph file, and check in each iteration that the chunk ID and
offset we are about to read is still within the mmap-ed memory region.
However, these checks in each iteration are not really necessary,
because the number of chunks in the commit-graph file is already known
before this loop from the just parsed commit-graph header.
So let's check that the commit-graph file is large enough for all
entries in the Chunk Lookup table before we start iterating over those
entries, and drop those per-iteration checks. While at it, take into
account the size of everything that is necessary to have a valid
commit-graph file, i.e. the size of the header, the size of the
mandatory OID Fanout chunk, and the size of the signature in the
trailer as well.
Note that this necessitates the change of the error message as well,
and, consequently, have to update the 'detect incorrect chunk count'
test in 't5318-commit-graph.sh' as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph file format specifies that the chunks may be in any
order. However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:
fatal: invalid commit position. commit-graph is likely corrupt
In this case the error is wrong, the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.
The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
with the hash size. However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offsets of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label. This is currenly implemented in a
bit complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk). Alas, while parsing
the Chunk Lookup table we only interate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calulate the size of the last chunk. Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and turn don't run the piece of code determining the number of
commits in the commit graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().
Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.
Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.
This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following lines were not covered in a recent line-coverage test
against Git:
builtin/commit-graph.c
5b6653e5 244) progress = start_delayed_progress(
5b6653e5 268) stop_progress(&progress);
These statements are executed when both '--stdin-commits' and
'--progress' are passed. Introduce a trio of tests that exercise various
combinations of these options to ensure that these lines are covered.
More importantly, this is exercising a (somewhat) previously-ignored
feature of '--stdin-commits', which is that it respects '--progress'.
Prior to 5b6653e523 (builtin/commit-graph.c: dereference tags in
builtin, 2020-05-13), dereferencing input from '--stdin-commits' was
done inside of commit-graph.c.
Now that an additional progress meter may be generated from outside of
commit-graph.c, add a corresponding test to make sure that it also
respects '--[no]-progress'.
The other location that generates progress meter output (from d335ce8f24
(commit-graph.c: show progress of finding reachable commits,
2020-05-13)) is already covered by any test that passes '--reachable'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of tests in t5318 use 'test_line_count = 0 ...' to make sure
that some command does not write any output. While correct, it is more
idiomatic to use 'test_must_be_empty' instead. Switch the former
invocations to use the latter instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.
This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.
Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.
If callers do wish to retain this behavior, they can easily work around
this change by doing the following:
git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
awk '
!/commit/ { print "not-a-commit:"$1 }
/commit/ { print $1 }
' |
git commit-graph write --stdin-commits
To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, we will introduce a dependency on
'graph_read_expect' from t5318.7. Preemptively move it below
'graph_read_expect()'s definition so that the test can call it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, Git learned 'hold_lock_file_for_update_mode' to
allow the caller to specify the permission bits (prior to further
adjustment by the umask and shared repository permissions) used when
acquiring a temporary file.
Use this in the commit-graph machinery for writing a non-split graph to
acquire an opened temporary file with permissions read-only permissions
to match the split behavior. (In the split case, Git uses
git_mkstemp_mode' for each of the commit-graph layers with permission
bits '0444').
One can notice this discrepancy when moving a non-split graph to be part
of a new chain. This causes a commit-graph chain where all layers have
read-only permission bits, except for the base layer, which is writable
for the current user.
Resolve this discrepancy by using the new
'hold_lock_file_for_update_mode' and passing the desired permission
bits.
Doing so causes some test fallout in t5318 and t6600. In t5318, this
occurs in tests that corrupt a commit-graph file by writing into it. For
these, 'chmod u+w'-ing the file beforehand resolves the issue. The
additional spot in 'corrupt_graph_verify' is necessary because of the
extra 'git commit-graph write' beforehand (which *does* rewrite the
commit-graph file). In t6600, this is caused by copying a read-only
commit-graph file into place and then trying to replace it. For these,
make these files writable.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.
For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects. Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.
Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.
No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.
Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.
This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite
in order to toggle writing Bloom filters when running any of the git tests.
If set to true, we will compute and write Bloom filters every time a test
calls `git commit-graph write`, as if the `--changed-paths` option was
passed in.
The test suite passes when GIT_TEST_COMMIT_GRAPH and
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch two tests to use $ZERO_OID to represent the all-zeros object ID.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In f237c8b6fe (commit-graph: implement git-commit-graph write,
2018-04-02) the test t5318.3 was introduced to ensure that calling 'git
commit-graph write' in a repository with no packfiles does not write any
commit-graph file(s).
To exercise more paths in 'builtin/commit-graph.c', this test passes
'--object-dir' to 'git commit-graph write', but the given argument
refers to the working copy, not the object directory.
Since the commit-graph sub-commands currently swallow these errors, this
does not result in a test failure. But, it is only lucky that the test
ends with no commit-graphs, since there were none to begin with.
In preparation for a future commit where an '--object-dir' argument that
does not match a known object directory will print out a failure, let's
fix the test to still use '--object-dir', but pass the correct location
to the object store instead of '.'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running with SHA-256 as the hash algorithm, the hash version octet
is 2 instead of 1. Pick the right value depending on the hash algorithm
and use it where we look for the existing value. To ensure the test
checking for invalid data passes, use 3 as the test value for an invalid
hash version.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The start_delayed_progress() method is a preferred way to show
optional progress to users as it ignores steps that take less
than two seconds. However, this makes testing unreliable as tests
expect to be very fast.
In addition, users may want to decrease or increase this time
interval depending on their preferences for terminal noise.
Create the GIT_PROGRESS_DELAY environment variable to control
the delay set during start_delayed_progress(). Set the value
in some tests to guarantee their output remains consistent.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git commit-graph read' subcommand is used in test scripts to check
that the commit-graph contents match the expected data. Mostly, this
helps check the header information and the list of chunks. Users do not
need this information, so move the functionality to a test helper.
Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus. I.e., doing this:
parse_commit(commit);
...
if (parse_commit(commit) < 0)
die("commit is broken");
will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.
There are two obvious ways to fix this:
1. Stop setting "parsed" until we've successfully parsed.
2. Keep a second "corrupt" flag to indicate that we saw an error (and
when the parsed flag is set, return 0/-1 depending on the corrupt
flag).
This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.
There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).
We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add --[no-]progress to git commit-graph write and verify.
The progress feature was introduced in 7b0f229
("commit-graph write: add progress output", 2018-09-17) but
the ability to opt-out was overlooked.
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply similar treatment as in the previous commit to handle an unchecked
call to 'get_commit_tree_oid()'. Previously, a NULL return value from
this function would be immediately dereferenced with '->hash', and then
cause a segfault.
Before dereferencing to access the 'hash' member, check the return value
of 'get_commit_tree_oid()' to make sure that it is not NULL.
To make this check correct, a related change is also needed in
'commit.c', which is to check the return value of 'get_commit_tree'
before taking its address. If 'get_commit_tree' returns NULL, we
encounter an undefined behavior when taking the address of the return
value of 'get_commit_tree' and then taking '->object.oid'. (On my system,
this is memory address 0x8, which is obviously wrong).
Fix this by making sure that 'get_commit_tree' returns something
non-NULL before digging through a structure that is not there, thus
preventing a segfault down the line in the commit graph code.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To write a commit graph chunk, 'write_graph_chunk_data()' takes a list
of commits to write and parses each one before writing the necessary
data, and continuing on to the next commit in the list.
Since the majority of these commits are not parsed ahead of time (an
exception is made for the *last* commit in the list, which is parsed
early within 'copy_oids_to_commits'), it is possible that calling
'parse_commit_no_graph()' on them may return an error. Failing to catch
these errors before de-referencing later calls can result in a undefined
memory access and a SIGSEGV.
One such example of this is 'get_commit_tree_oid()', which expects a
parsed object as its input (in this case, the commit-graph code passes
'*list'). If '*list' causes a parse error, the subsequent call will
fail.
Prevent such an issue by checking the return value of
'parse_commit_no_graph()' to avoid passing an unparsed object to a
function which expects a parsed object, thus preventing a segfault.
It is worth noting that this fix is really skirting around the issue in
object.c's 'parse_object()', which makes it difficult to tell how
corrupt an object is without digging into it. Presumably one could
change the meaning of 'parse_object' returns, but this would require
adjusting each callsite accordingly. Instead of that, add an additional
check to the object parsed.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When invoking 'git commit-graph' in a corrupt repository, one can cause
a segfault when ancestral commits are corrupt in one way or another.
This is due to two function calls in the 'commit-graph.c' code that may
return NULL, but are not checked for NULL-ness before dereferencing.
Before fixing the bug, introduce two failing tests that demonstrate the
problem. The first test corrupts an ancestral commit's parent to point
to a non-existent object. The second test instead corrupts an ancestral
tree by removing the 'tree' information entirely from the commit. Both
of these cases cause segfaults, each at different lines.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git commit-graph write --stdin-commits' expects commit object
ids as input, it accepts and silently skips over any invalid commit
object ids, and still exits with success:
# nonsense
$ echo not-a-commit-oid | git commit-graph write --stdin-commits
$ echo $?
0
# sometimes I forgot that refs are not good...
$ echo HEAD | git commit-graph write --stdin-commits
$ echo $?
0
# valid tree OID, but not a commit OID
$ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits
$ echo $?
0
$ ls -l .git/objects/info/commit-graph
ls: cannot access '.git/objects/info/commit-graph': No such file or directory
Check that all input records are indeed valid commit object ids and
return with error otherwise, the same way '--stdin-packs' handles
invalid input; see e103f7276f (commit-graph: return with errors during
write, 2019-06-12).
Note that it should only return with error when encountering an
invalid commit object id coming from standard input. However,
'--reachable' uses the same code path to process object ids pointed to
by all refs, and that includes tag object ids as well, which should
still be skipped over. Therefore add a new flag to 'enum
commit_graph_write_flags' and a corresponding field to 'struct
write_commit_graph_context', so we can differentiate between those two
cases.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 't5318-commit-graph.sh' the test 'close with correct error on bad
input' manually verifies the exit code of a 'git commit-graph write'
command.
Use 'test_expect_code' instead.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend write_commit_graph() to write a commit-graph chain when given the
COMMIT_GRAPH_SPLIT flag.
This implementation is purposefully simplistic in how it creates a new
chain. The commits not already in the chain are added to a new tip
commit-graph file.
Much of the logic around writing a graph-{hash}.graph file and updating
the commit-graph-chain file is the same as the commit-graph file case.
However, there are several places where we need to do some extra logic
in the split case.
Track the list of graph filenames before and after the planned write.
This will be more important when we start merging graph files, but it
also allows us to upgrade our commit-graph file to the appropriate
graph-{hash}.graph file when we upgrade to a chain of commit-graphs.
Note that we use the eighth byte of the commit-graph header to store the
number of base graph files. This determines the length of the base
graphs chunk.
A subtle change of behavior with the new logic is that we do not write a
commit-graph if we our commit list is empty. This extends to the typical
case, which is reflected in t5318-commit-graph.sh.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type. Return zero on success
and a negative value on failure.
Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.
The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value. Test that 'git commit-graph
write' returns a proper error code when hitting a failure condition
in write_commit_graph().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change "commit-graph verify" to error on open() failures other than
ENOENT. As noted in the third paragraph of 283e68c72f ("commit-graph:
add 'verify' subcommand", 2018-06-27) and the test it added it's
intentional that "commit-graph verify" doesn't error out when the file
doesn't exist.
But let's not be overly promiscuous in what we accept. If we can't
read the file for other reasons, e.g. permission errors, bad file
descriptor etc. we'd like to report an error to the user.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the commit-graph loading code work as a library that returns an
error code instead of calling exit(1) when the commit-graph is
corrupt. This means that e.g. "status" will now report commit-graph
corruption as an "error: [...]" at the top of its output, but then
proceed to work normally.
This required splitting up the load_commit_graph_one() function so
that the code that deals with open()-ing and stat()-ing the graph can
now be called independently as open_commit_graph().
This is needed because "commit-graph verify" where the graph doesn't
exist isn't an error. See the third paragraph in
283e68c72f ("commit-graph: add 'verify' subcommand",
2018-06-27). There's a bug in that logic where we conflate the
intended ENOENT with other errno values (e.g. EACCES), but this change
doesn't address that. That'll be addressed in a follow-up change.
I'm then splitting most of the logic out of load_commit_graph_one()
into load_commit_graph_one_fd_st(), which allows for providing an
existing file descriptor and stat information to the loading
code. This isn't strictly needed, but it would be redundant and
confusing to open() and stat() the file twice for some of the
codepaths, this allows for calling open_commit_graph() followed by
load_commit_graph_one_fd_st(). The "graph_file" still needs to be
passed to that function for the the "graph file %s is too small" error
message.
This leaves load_commit_graph_one() unused by everything except the
internal prepare_commit_graph_one() function, so let's mark it as
"static". If someone needs it in the future we can remove the "static"
attribute. I could also rewrite its sole remaining
user ("prepare_commit_graph_one()") to use
load_commit_graph_one_fd_st() instead, but let's leave it at this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When core.commitGraph=true is set, various common commands now consult
the commit graph. Because the commit-graph code is very trusting of
its input data, it's possibly to construct a graph that'll cause an
immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In
some other cases where git immediately exits with a cryptic error
about the graph being broken.
The root cause of this is that while the "commit-graph verify"
sub-command exhaustively verifies the graph, other users of the graph
simply trust the graph, and will e.g. deference data found at certain
offsets as pointers, causing segfaults.
This change does the bare minimum to ensure that we don't segfault in
the common fill_commit_in_graph() codepath called by
e.g. setup_revisions(), to do this instrument the "commit-graph
verify" tests to always check if "status" would subsequently
segfault. This fixes the following tests which would previously
segfault:
not ok 50 - detect low chunk count
not ok 51 - detect missing OID fanout chunk
not ok 52 - detect missing OID lookup chunk
not ok 53 - detect missing commit data chunk
Those happened because with the commit-graph enabled setup_revisions()
would eventually call fill_commit_in_graph(), where e.g.
g->chunk_commit_data is used early as an offset (and will be
0x0). With this change we get far enough to detect that the graph is
broken, and show an error instead. E.g.:
$ git status; echo $?
error: commit-graph is missing the Commit Data chunk
1
That also sucks, we should *warn* and not hard-fail "status" just
because the commit-graph is corrupt, but fixing is left to a follow-up
change.
A side-effect of changing the reporting from graph_report() to error()
is that we now have an "error: " prefix for these even for
"commit-graph verify". Pseudo-diff before/after:
$ git commit-graph verify
-commit-graph is missing the Commit Data chunk
+error: commit-graph is missing the Commit Data chunk
Changing that is OK. Various errors it emits now early on are prefixed
with "error: ", moving these over and changing the output doesn't
break anything.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a remnant from early versions of the commit-graph patch series
[1], when 'git commit-graph --write' printed the hash of the created
commit-graph file, and tests did look at the command's output, because
the commit-graph file's name included that hash as well.
[1] https://public-inbox.org/git/1517348383-112294-6-git-send-email-dstolee@microsoft.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the recently split-up components of the corrupt_graph_and_verify()
function to assert that we error on graphs that are too small. The
error was added in 2a2e32bdc5 ("commit-graph: implement git
commit-graph read", 2018-04-10), but there was no test for it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split up the corrupt_graph_and_verify() function added in
d9b9f8a6fd ("commit-graph: verify catches corrupt signature",
2018-06-27) into its logical components of setting up the test itself,
doing the corruption in a particular way with "dd", and then finally
testing that stderr is what we expect.
This allows for re-using everything except the now slimmer
corrupt_graph_and_verify() to corrupt the graph in a way that doesn't
involve inserting a given byte sequence at a given position,
e.g. truncating it entirely to a custom value.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change an unportable invocation of "dd" with count=0, that wanted to
truncate the commit-graph file. In POSIX it is unspecified what
happens when count=0 is provided[1]. The NetBSD "dd" behavior
differs from GNU (and seemingly other BSDs), which has left this test
broken since d2b86fbaa1 ("commit-graph: fix buffer read-overflow",
2019-01-15).
Copying from /dev/null would seek/truncate to seek=$zero_pos and
stop immediately after that (without being able to copy anything),
which is the right way to truncate the file.
1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are platforms (e.g. NonStop) that lack /dev/zero; use the
generate_zero_bytes helper we just introduced to append stream
of NULs at the end of the file.
The original, even though it uses "dd seek=... count=..." to make it
look like it is overwriting the middle part of an existing file, has
truncated the file before this step with another use of "dd", which
may make it tricky to see why this rewrite is a correct one.
Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges(). However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk. It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits. And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refer to the same thing, and this "extra edges" term is much better at
describing these edges.
So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges". There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.
We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text. The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its input, but that command
is explicitly documented as debugging aid, so we can change its output
and the affected tests safely.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various places where the ordering was obviously wrong, meaning it
was easy to find with grep.
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --file option to grep isn't in POSIX[1], but -f is[1]. Let's check
for that in the future, and fix the portability regression in
f237c8b6fe ("commit-graph: implement git-commit-graph write",
2018-04-02) that broke e.g. AIX.
1. http://pubs.opengroup.org/onlinepubs/009695399/utilities/grep.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Augment commit_graph_compatible(r) to return false when the given
repository r has commit grafts or is a shallow clone. Test that in these
situations we ignore existing commit-graph files and we do not write new
commit-graph files.
Helped-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create new method commit_graph_compatible(r) to check if a given
repository r is compatible with the commit-graph feature. Fill the
method with a check to see if replace-objects exist. Test this
interaction succeeds, including ignoring an existing commit-graph and
failing to write a new commit-graph. However, we do ensure that
we write a new commit-graph by setting read_replace_refs to 0, thereby
ignoring the replace refs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two tests added in dade47c06c (commit-graph: add repo arg to graph
readers, 2018-07-11) prepare the contents of 'expect' files by
'echo'ing the results of command substitutions. That's unncessary,
avoid them by directly saving the output of the commands executed in
those command substitutions.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph files are binary files, so they should not be
compared with 'test_cmp', because that might cause issues like
crashing[1] or infinite loop[2] on Windows, where 'test_cmp' is a
shell function to deal with random LF-CRLF conversions[3].
Use 'test_cmp_bin' instead.
1 - b93e6e3663 (t5000, t5003: do not use test_cmp to compare binary
files, 2014-06-04)
2 - f9f3851b4d (t9300: use test_cmp_bin instead of test_cmp to compare
binary files, 2014-09-12)
3 - 4d715ac05c (Windows: a test_cmp that is agnostic to random LF <>
CRLF conversions, 2013-10-26)
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a struct repository argument to the functions in commit-graph.h that
read the commit graph. (This commit does not affect functions that write
commit graphs.)
Because the commit graph functions can now read the commit graph of any
repository, the global variable core_commit_graph has been removed.
Instead, the config option core.commitGraph is now read on the first
time in a repository that a commit is attempted to be parsed using its
commit graph.
This commit includes a test that exercises the functionality on an
arbitrary repository that is not the_repository.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.
Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.
We use a new process for two reasons:
1. The subcommand decouples the details of loading and verifying a
commit-graph file from the other fsck details.
2. The commit-graph verification requires the commits to be loaded
in a specific order to guarantee we parse from the commit-graph
file for some objects and from the object database for others.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>