Commit Graph

139 Commits (9a41735af66dba1b677a8e88e7c2bc2f831bf6d2)

Author SHA1 Message Date
Junio C Hamano 70cdbbe3a7 Merge branch 'ds/commit-graph-bloom-updates' into master
Updates to the changed-paths bloom filter.

* ds/commit-graph-bloom-updates:
  commit-graph: check all leading directories in changed path Bloom filters
  revision: empty pathspecs should not use Bloom filters
  revision.c: fix whitespace
  commit-graph: check chunk sizes after writing
  commit-graph: simplify chunk writes into loop
  commit-graph: unify the signatures of all write_graph_chunk_*() functions
  commit-graph: persist existence of changed-paths
  bloom: fix logic in get_bloom_filter()
  commit-graph: change test to die on parse, not load
  commit-graph: place bloom_settings in context
2020-07-30 13:20:31 -07:00
Junio C Hamano de6dda0dc3 Merge branch 'sg/commit-graph-cleanups' into master
The changed-path Bloom filter is improved using ideas from an
independent implementation.

* sg/commit-graph-cleanups:
  commit-graph: simplify write_commit_graph_file() #2
  commit-graph: simplify write_commit_graph_file() #1
  commit-graph: simplify parse_commit_graph() #2
  commit-graph: simplify parse_commit_graph() #1
  commit-graph: clean up #includes
  diff.h: drop diff_tree_oid() & friends' return value
  commit-slab: add a function to deep free entries on the slab
  commit-graph-format.txt: all multi-byte numbers are in network byte order
  commit-graph: fix parsing the Chunk Lookup table
  tree-walk.c: don't match submodule entries for 'submod/anything'
2020-07-30 13:20:30 -07:00
brian m. carlson e023ff0691 t: remove test_oid_init in tests
Now that we call test_oid_init in the setup for all test scripts,
there's no point in calling it individually.  Remove all of the places
where we've done so to help keep tests tidy.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 09:16:49 -07:00
Derrick Stolee 7b671f8c2b commit-graph: change test to die on parse, not load
43d3561 (commit-graph write: don't die if the existing graph is corrupt,
2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment
variable. This was created to verify that commit-graph was not loaded
when writing a new non-incremental commit-graph.

An upcoming change wants to load a commit-graph in some valuable cases,
but we want to maintain that we don't trust the commit-graph data when
writing our new file. Instead of dying on load, instead die if we ever
try to parse a commit from the commit-graph. This functionally verifies
the same intended behavior, but allows a more advanced feature in the
next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 17:12:08 -07:00
Junio C Hamano abacefe865 Merge branch 'tb/t5318-cleanup'
Code cleanup.

* tb/t5318-cleanup:
  t5318: test that '--stdin-commits' respects '--[no-]progress'
  t5318: use 'test_must_be_empty'
2020-06-17 21:54:01 -07:00
Junio C Hamano dc57a9be5e Merge branch 'tb/commit-graph-no-check-oids'
Clean-up the commit-graph codepath.

* tb/commit-graph-no-check-oids:
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  t5318: reorder test below 'graph_read_expect'
  commit-graph.c: simplify 'fill_oids_from_commits'
  builtin/commit-graph.c: dereference tags in builtin
  builtin/commit-graph.c: extract 'read_one_commit()'
  commit-graph.c: peel refs in 'add_ref_to_set'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: extract 'refs_cb_data'
2020-06-08 18:06:27 -07:00
SZEDER Gábor 2ad4f1a7c4 commit-graph: simplify parse_commit_graph() #1
While we iterate over all entries of the Chunk Lookup table we make
sure that we don't attempt to read past the end of the mmap-ed
commit-graph file, and check in each iteration that the chunk ID and
offset we are about to read is still within the mmap-ed memory region.
However, these checks in each iteration are not really necessary,
because the number of chunks in the commit-graph file is already known
before this loop from the just parsed commit-graph header.

So let's check that the commit-graph file is large enough for all
entries in the Chunk Lookup table before we start iterating over those
entries, and drop those per-iteration checks.  While at it, take into
account the size of everything that is necessary to have a valid
commit-graph file, i.e. the size of the header, the size of the
mandatory OID Fanout chunk, and the size of the signature in the
trailer as well.

Note that this necessitates the change of the error message as well,
and, consequently, have to update the 'detect incorrect chunk count'
test in 't5318-commit-graph.sh' as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 12:28:49 -07:00
SZEDER Gábor cb9daf16db commit-graph: fix parsing the Chunk Lookup table
The commit-graph file format specifies that the chunks may be in any
order.  However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:

  fatal: invalid commit position. commit-graph is likely corrupt

In this case the error is wrong, the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.

The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
with the hash size.  However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offsets of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label.  This is currenly implemented in a
bit complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk).  Alas, while parsing
the Chunk Lookup table we only interate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calulate the size of the last chunk.  Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and turn don't run the piece of code determining the number of
commits in the commit graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().

Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.

Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.

This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 12:28:48 -07:00
Taylor Blau 94fbd9149a t5318: test that '--stdin-commits' respects '--[no-]progress'
The following lines were not covered in a recent line-coverage test
against Git:

  builtin/commit-graph.c
  5b6653e5 244) progress = start_delayed_progress(
  5b6653e5 268) stop_progress(&progress);

These statements are executed when both '--stdin-commits' and
'--progress' are passed. Introduce a trio of tests that exercise various
combinations of these options to ensure that these lines are covered.

More importantly, this is exercising a (somewhat) previously-ignored
feature of '--stdin-commits', which is that it respects '--progress'.
Prior to 5b6653e523 (builtin/commit-graph.c: dereference tags in
builtin, 2020-05-13), dereferencing input from '--stdin-commits' was
done inside of commit-graph.c.

Now that an additional progress meter may be generated from outside of
commit-graph.c, add a corresponding test to make sure that it also
respects '--[no]-progress'.

The other location that generates progress meter output (from d335ce8f24
(commit-graph.c: show progress of finding reachable commits,
2020-05-13)) is already covered by any test that passes '--reachable'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 07:54:08 -07:00
Taylor Blau 6334c5ff97 t5318: use 'test_must_be_empty'
A handful of tests in t5318 use 'test_line_count = 0 ...' to make sure
that some command does not write any output. While correct, it is more
idiomatic to use 'test_must_be_empty' instead. Switch the former
invocations to use the latter instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 07:52:54 -07:00
Taylor Blau 2f00c355cb commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around
this change by doing the following:

     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
     awk '
       !/commit/ { print "not-a-commit:"$1 }
        /commit/ { print $1 }
     ' |
     git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:51:11 -07:00
Taylor Blau 1f1304d497 t5318: reorder test below 'graph_read_expect'
In the subsequent commit, we will introduce a dependency on
'graph_read_expect' from t5318.7. Preemptively move it below
'graph_read_expect()'s definition so that the test can call it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:51:11 -07:00
Junio C Hamano 1d7e9c4c4e Merge branch 'tb/commit-graph-perm-bits'
Some of the files commit-graph subsystem keeps on disk did not
correctly honor the core.sharedRepository settings and some were
left read-write.

* tb/commit-graph-perm-bits:
  commit-graph.c: make 'commit-graph-chain's read-only
  commit-graph.c: ensure graph layers respect core.sharedRepository
  commit-graph.c: write non-split graphs as read-only
  lockfile.c: introduce 'hold_lock_file_for_update_mode'
  tempfile.c: introduce 'create_tempfile_mode'
2020-05-05 14:54:28 -07:00
Junio C Hamano 9b6606f43d Merge branch 'gs/commit-graph-path-filter'
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.

* gs/commit-graph-path-filter:
  bloom: ignore renames when computing changed paths
  commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
  t4216: add end to end tests for git log with Bloom filters
  revision.c: add trace2 stats around Bloom filter usage
  revision.c: use Bloom filters to speed up path based revision walks
  commit-graph: add --changed-paths option to write subcommand
  commit-graph: reuse existing Bloom filters during write
  commit-graph: write Bloom filters to commit graph file
  commit-graph: examine commits by generation number
  commit-graph: examine changed-path objects in pack order
  commit-graph: compute Bloom filters for changed paths
  diff: halt tree-diff early after max_changes
  bloom.c: core Bloom filter implementation for changed paths.
  bloom.c: introduce core Bloom filter constructs
  bloom.c: add the murmur3 hash implementation
  commit-graph: define and use MAX_NUM_CHUNKS
2020-05-01 13:39:53 -07:00
Junio C Hamano 6a1c17d05b Merge branch 'tb/commit-graph-split-strategy'
"git commit-graph write" learned different ways to write out split
files.

* tb/commit-graph-split-strategy:
  Revert "commit-graph.c: introduce '--[no-]check-oids'"
  commit-graph.c: introduce '--[no-]check-oids'
  commit-graph.h: replace 'commit_hex' with 'commits'
  oidset: introduce 'oidset_size'
  builtin/commit-graph.c: introduce split strategy 'replace'
  builtin/commit-graph.c: introduce split strategy 'no-merge'
  builtin/commit-graph.c: support for '--split[=<strategy>]'
  t/helper/test-read-graph.c: support commit-graph chains
2020-05-01 13:39:52 -07:00
Junio C Hamano dbd5e0a186 Revert "commit-graph.c: introduce '--[no-]check-oids'"
This reverts commit 7a9ce0269b,
which has not yet gained consensus.
2020-04-29 12:44:40 -07:00
Taylor Blau 1f9becaedc commit-graph.c: write non-split graphs as read-only
In the previous commit, Git learned 'hold_lock_file_for_update_mode' to
allow the caller to specify the permission bits (prior to further
adjustment by the umask and shared repository permissions) used when
acquiring a temporary file.

Use this in the commit-graph machinery for writing a non-split graph to
acquire an opened temporary file with permissions read-only permissions
to match the split behavior. (In the split case, Git uses
git_mkstemp_mode' for each of the commit-graph layers with permission
bits '0444').

One can notice this discrepancy when moving a non-split graph to be part
of a new chain. This causes a commit-graph chain where all layers have
read-only permission bits, except for the base layer, which is writable
for the current user.

Resolve this discrepancy by using the new
'hold_lock_file_for_update_mode' and passing the desired permission
bits.

Doing so causes some test fallout in t5318 and t6600. In t5318, this
occurs in tests that corrupt a commit-graph file by writing into it. For
these, 'chmod u+w'-ing the file beforehand resolves the issue. The
additional spot in 'corrupt_graph_verify' is necessary because of the
extra 'git commit-graph write' beforehand (which *does* rewrite the
commit-graph file). In t6600, this is caused by copying a read-only
commit-graph file into place and then trying to replace it. For these,
make these files writable.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29 12:35:30 -07:00
Taylor Blau 7a9ce0269b commit-graph.c: introduce '--[no-]check-oids'
When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.

For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects.  Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.

Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.

No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15 09:20:34 -07:00
Taylor Blau 6830c36077 commit-graph.h: replace 'commit_hex' with 'commits'
The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.

Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.

This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15 09:20:30 -07:00
Garima Singh d5b873c832 commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite
in order to toggle writing Bloom filters when running any of the git tests.
If set to true, we will compute and write Bloom filters every time a test
calls `git commit-graph write`, as if the `--changed-paths` option was
passed in.

The test suite passes when GIT_TEST_COMMIT_GRAPH and
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06 11:08:37 -07:00
Junio C Hamano 5af345a438 Merge branch 'bc/hash-independent-tests-part-8'
Preparation for SHA-256 migration continues.

* bc/hash-independent-tests-part-8: (21 commits)
  t6024: update for SHA-256
  t6006: make hash size independent
  t6000: abstract away SHA-1-specific constants
  t5703: make test work with SHA-256
  t5607: make hash size independent
  t5318: update for SHA-256
  t5515: make test hash independent
  t5321: make test hash independent
  t5313: make test hash independent
  t5309: make test hash independent
  t5302: make hash size independent
  t4060: make test work with SHA-256
  t4211: add test cases for SHA-256
  t4211: move SHA-1-specific test cases into a directory
  t4013: make test hash independent
  t3311: make test work with SHA-256
  t3310: make test work with SHA-256
  t3309: make test work with SHA-256
  t3308: make test work with SHA-256
  t3206: make hash size independent
  ...
2020-02-17 13:22:16 -08:00
Junio C Hamano 53c3be2c29 Merge branch 'tb/commit-graph-object-dir'
The code to compute the commit-graph has been taught to use a more
robust way to tell if two object directories refer to the same
thing.

* tb/commit-graph-object-dir:
  commit-graph.h: use odb in 'load_commit_graph_one_fd_st'
  commit-graph.c: remove path normalization, comparison
  commit-graph.h: store object directory in 'struct commit_graph'
  commit-graph.h: store an odb in 'struct write_commit_graph_context'
  t5318: don't pass non-object directory to '--object-dir'
2020-02-14 12:54:24 -08:00
brian m. carlson 48c10cc0e6 t5318: update for SHA-256
Switch two tests to use $ZERO_OID to represent the all-zeros object ID.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-07 11:07:30 -08:00
Junio C Hamano f52ab33616 Merge branch 'bc/hash-independent-tests-part-7'
Preparation of test scripts for the day when the object names will
use SHA-256 continues.

* bc/hash-independent-tests-part-7:
  t5604: make hash independent
  t5601: switch into repository to hash object
  t5562: use $ZERO_OID
  t5540: make hash size independent
  t5537: make hash size independent
  t5530: compute results based on object length
  t5512: abstract away SHA-1-specific constants
  t5510: make hash size independent
  t5504: make hash algorithm independent
  t5324: make hash size independent
  t5319: make test work with SHA-256
  t5319: change invalid offset for SHA-256 compatibility
  t5318: update for SHA-256
  t4300: abstract away SHA-1-specific constants
  t4204: make hash size independent
  t4202: abstract away SHA-1-specific constants
  t4200: make hash size independent
  t4134: compute appropriate length constant
  t4066: compute index line in diffs
  t4054: make hash-size independent
2020-02-05 14:34:59 -08:00
Taylor Blau 1793280e91 t5318: don't pass non-object directory to '--object-dir'
In f237c8b6fe (commit-graph: implement git-commit-graph write,
2018-04-02) the test t5318.3 was introduced to ensure that calling 'git
commit-graph write' in a repository with no packfiles does not write any
commit-graph file(s).

To exercise more paths in 'builtin/commit-graph.c', this test passes
'--object-dir' to 'git commit-graph write', but the given argument
refers to the working copy, not the object directory.

Since the commit-graph sub-commands currently swallow these errors, this
does not result in a test failure. But, it is only lucky that the test
ends with no commit-graphs, since there were none to begin with.

In preparation for a future commit where an '--object-dir' argument that
does not match a known object directory will print out a failure, let's
fix the test to still use '--object-dir', but pass the correct location
to the object store instead of '.'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 12:47:40 -08:00
brian m. carlson 1d86c8f0ce t5318: update for SHA-256
When running with SHA-256 as the hash algorithm, the hash version octet
is 2 instead of 1.  Pick the right value depending on the hash algorithm
and use it where we look for the existing value.  To ensure the test
checking for invalid data passes, use 3 as the test value for an invalid
hash version.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-15 14:06:19 -08:00
Junio C Hamano 41dac79c2f Merge branch 'ds/commit-graph-delay-gen-progress'
One kind of progress messages were always given during commit-graph
generation, instead of following the "if it takes more than two
seconds, show progress" pattern, which has been corrected.

* ds/commit-graph-delay-gen-progress:
  commit-graph: use start_delayed_progress()
  progress: create GIT_PROGRESS_DELAY
2019-12-10 13:11:43 -08:00
Junio C Hamano 723a8adba5 Merge branch 'ds/test-read-graph'
Dev support for commit-graph feature.

* ds/test-read-graph:
  test-tool: use 'read-graph' helper
2019-12-01 09:04:39 -08:00
Derrick Stolee 44a4693bfc progress: create GIT_PROGRESS_DELAY
The start_delayed_progress() method is a preferred way to show
optional progress to users as it ignores steps that take less
than two seconds. However, this makes testing unreliable as tests
expect to be very fast.

In addition, users may want to decrease or increase this time
interval depending on their preferences for terminal noise.

Create the GIT_PROGRESS_DELAY environment variable to control
the delay set during start_delayed_progress(). Set the value
in some tests to guarantee their output remains consistent.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-27 10:57:10 +09:00
Derrick Stolee 4bd0593e0f test-tool: use 'read-graph' helper
The 'git commit-graph read' subcommand is used in test scripts to check
that the commit-graph contents match the expected data. Mostly, this
helps check the header information and the list of chunks. Users do not
need this information, so move the functionality to a test helper.

Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 11:14:16 +09:00
Jeff King 228c78fbd4 commit, tag: don't set parsed bit for parse failures
If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus.  I.e., doing this:

  parse_commit(commit);
  ...
  if (parse_commit(commit) < 0)
          die("commit is broken");

will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.

There are two obvious ways to fix this:

  1. Stop setting "parsed" until we've successfully parsed.

  2. Keep a second "corrupt" flag to indicate that we saw an error (and
     when the parsed flag is set, return 0/-1 depending on the corrupt
     flag).

This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.

There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).

We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:04:49 +09:00
Junio C Hamano 80693e3f09 Merge branch 'tb/commit-graph-harden'
The code to parse and use the commit-graph file has been made more
robust against corrupted input.

* tb/commit-graph-harden:
  commit-graph.c: handle corrupt/missing trees
  commit-graph.c: handle commit parsing errors
  t/t5318: introduce failing 'git commit-graph write' tests
2019-10-07 11:32:58 +09:00
Garima Singh 7371612255 commit-graph: add --[no-]progress to write and verify
Add --[no-]progress to git commit-graph write and verify.
The progress feature was introduced in 7b0f229
("commit-graph write: add progress output", 2018-09-17) but
the ability to opt-out was overlooked.

Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-18 14:23:09 -07:00
Taylor Blau 806278dead commit-graph.c: handle corrupt/missing trees
Apply similar treatment as in the previous commit to handle an unchecked
call to 'get_commit_tree_oid()'. Previously, a NULL return value from
this function would be immediately dereferenced with '->hash', and then
cause a segfault.

Before dereferencing to access the 'hash' member, check the return value
of 'get_commit_tree_oid()' to make sure that it is not NULL.

To make this check correct, a related change is also needed in
'commit.c', which is to check the return value of 'get_commit_tree'
before taking its address. If 'get_commit_tree' returns NULL, we
encounter an undefined behavior when taking the address of the return
value of 'get_commit_tree' and then taking '->object.oid'. (On my system,
this is memory address 0x8, which is obviously wrong).

Fix this by making sure that 'get_commit_tree' returns something
non-NULL before digging through a structure that is not there, thus
preventing a segfault down the line in the commit graph code.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09 10:55:59 -07:00
Taylor Blau 16749b8dd2 commit-graph.c: handle commit parsing errors
To write a commit graph chunk, 'write_graph_chunk_data()' takes a list
of commits to write and parses each one before writing the necessary
data, and continuing on to the next commit in the list.

Since the majority of these commits are not parsed ahead of time (an
exception is made for the *last* commit in the list, which is parsed
early within 'copy_oids_to_commits'), it is possible that calling
'parse_commit_no_graph()' on them may return an error. Failing to catch
these errors before de-referencing later calls can result in a undefined
memory access and a SIGSEGV.

One such example of this is 'get_commit_tree_oid()', which expects a
parsed object as its input (in this case, the commit-graph code passes
'*list'). If '*list' causes a parse error, the subsequent call will
fail.

Prevent such an issue by checking the return value of
'parse_commit_no_graph()' to avoid passing an unparsed object to a
function which expects a parsed object, thus preventing a segfault.

It is worth noting that this fix is really skirting around the issue in
object.c's 'parse_object()', which makes it difficult to tell how
corrupt an object is without digging into it. Presumably one could
change the meaning of 'parse_object' returns, but this would require
adjusting each callsite accordingly. Instead of that, add an additional
check to the object parsed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09 10:55:58 -07:00
Taylor Blau 23424ea759 t/t5318: introduce failing 'git commit-graph write' tests
When invoking 'git commit-graph' in a corrupt repository, one can cause
a segfault when ancestral commits are corrupt in one way or another.
This is due to two function calls in the 'commit-graph.c' code that may
return NULL, but are not checked for NULL-ness before dereferencing.

Before fixing the bug, introduce two failing tests that demonstrate the
problem. The first test corrupts an ancestral commit's parent to point
to a non-existent object. The second test instead corrupts an ancestral
tree by removing the 'tree' information entirely from the commit. Both
of these cases cause segfaults, each at different lines.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-09 10:55:53 -07:00
SZEDER Gábor 7c5c9b9c57 commit-graph: error out on invalid commit oids in 'write --stdin-commits'
While 'git commit-graph write --stdin-commits' expects commit object
ids as input, it accepts and silently skips over any invalid commit
object ids, and still exits with success:

  # nonsense
  $ echo not-a-commit-oid | git commit-graph write --stdin-commits
  $ echo $?
  0
  # sometimes I forgot that refs are not good...
  $ echo HEAD | git commit-graph write --stdin-commits
  $ echo $?
  0
  # valid tree OID, but not a commit OID
  $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits
  $ echo $?
  0
  $ ls -l .git/objects/info/commit-graph
  ls: cannot access '.git/objects/info/commit-graph': No such file or directory

Check that all input records are indeed valid commit object ids and
return with error otherwise, the same way '--stdin-packs' handles
invalid input; see e103f7276f (commit-graph: return with errors during
write, 2019-06-12).

Note that it should only return with error when encountering an
invalid commit object id coming from standard input.  However,
'--reachable' uses the same code path to process object ids pointed to
by all refs, and that includes tag object ids as well, which should
still be skipped over.  Therefore add a new flag to 'enum
commit_graph_write_flags' and a corresponding field to 'struct
write_commit_graph_context', so we can differentiate between those two
cases.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-05 14:33:39 -07:00
SZEDER Gábor 9916073be5 t5318-commit-graph: use 'test_expect_code'
In 't5318-commit-graph.sh' the test 'close with correct error on bad
input' manually verifies the exit code of a 'git commit-graph write'
command.

Use 'test_expect_code' instead.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-05 14:30:59 -07:00
Junio C Hamano 92b1ea66b9 Merge branch 'ds/commit-graph-incremental'
The commits in a repository can be described by multiple
commit-graph files now, which allows the commit-graph files to be
updated incrementally.

* ds/commit-graph-incremental:
  commit-graph: test verify across alternates
  commit-graph: normalize commit-graph filenames
  commit-graph: test --split across alternate without --split
  commit-graph: test octopus merges with --split
  commit-graph: clean up chains after flattened write
  commit-graph: verify chains with --shallow mode
  commit-graph: create options for split files
  commit-graph: expire commit-graph files
  commit-graph: allow cross-alternate chains
  commit-graph: merge commit-graph chains
  commit-graph: add --split option to builtin
  commit-graph: write commit-graph chains
  commit-graph: rearrange chunk count logic
  commit-graph: add base graphs chunk
  commit-graph: load commit-graph chains
  commit-graph: rename commit_compare to oid_compare
  commit-graph: prepare for commit-graph chains
  commit-graph: document commit-graph chains
2019-07-19 11:30:20 -07:00
Junio C Hamano e1168940ce Merge branch 'ds/commit-graph-write-refactor'
Renamed from commit-graph-format-v2 and changed scope.

* ds/commit-graph-write-refactor:
  commit-graph: extract write_commit_graph_file()
  commit-graph: extract copy_oids_to_commits()
  commit-graph: extract count_distinct_commits()
  commit-graph: extract fill_oids_from_all_packs()
  commit-graph: extract fill_oids_from_commit_hex()
  commit-graph: extract fill_oids_from_packs()
  commit-graph: create write_commit_graph_context
  commit-graph: remove Future Work section
  commit-graph: collapse parameters into flags
  commit-graph: return with errors during write
  commit-graph: fix the_repository reference
2019-07-09 15:25:36 -07:00
Derrick Stolee 6c622f9f0b commit-graph: write commit-graph chains
Extend write_commit_graph() to write a commit-graph chain when given the
COMMIT_GRAPH_SPLIT flag.

This implementation is purposefully simplistic in how it creates a new
chain. The commits not already in the chain are added to a new tip
commit-graph file.

Much of the logic around writing a graph-{hash}.graph file and updating
the commit-graph-chain file is the same as the commit-graph file case.
However, there are several places where we need to do some extra logic
in the split case.

Track the list of graph filenames before and after the planned write.
This will be more important when we start merging graph files, but it
also allows us to upgrade our commit-graph file to the appropriate
graph-{hash}.graph file when we upgrade to a chain of commit-graphs.

Note that we use the eighth byte of the commit-graph header to store the
number of base graph files. This determines the length of the base
graphs chunk.

A subtle change of behavior with the new logic is that we do not write a
commit-graph if we our commit list is empty. This extends to the typical
case, which is reflected in t5318-commit-graph.sh.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-19 20:46:26 -07:00
Derrick Stolee e103f7276f commit-graph: return with errors during write
The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type. Return zero on success
and a negative value on failure.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value. Test that 'git commit-graph
write' returns a proper error code when hitting a failure condition
in write_commit_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:53 -07:00
Junio C Hamano a5e4be2f68 Merge branch 'ab/commit-graph-fixes'
Code cleanup with more careful error checking before using data
read from the commit-graph file.

* ab/commit-graph-fixes:
  commit-graph: improve & i18n error messages
  commit-graph write: don't die if the existing graph is corrupt
  commit-graph verify: detect inability to read the graph
  commit-graph: don't pass filename to load_commit_graph_one_fd_st()
  commit-graph: don't early exit(1) on e.g. "git status"
  commit-graph: fix segfault on e.g. "git status"
  commit-graph tests: test a graph that's too small
  commit-graph tests: split up corrupt_graph_and_verify()
2019-04-25 16:41:15 +09:00
Ævar Arnfjörð Bjarmason 43d3561805 commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.

We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".

Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).

Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.

Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.

Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.

This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason 7b8ce9c673 commit-graph verify: detect inability to read the graph
Change "commit-graph verify" to error on open() failures other than
ENOENT. As noted in the third paragraph of 283e68c72f ("commit-graph:
add 'verify' subcommand", 2018-06-27) and the test it added it's
intentional that "commit-graph verify" doesn't error out when the file
doesn't exist.

But let's not be overly promiscuous in what we accept. If we can't
read the file for other reasons, e.g. permission errors, bad file
descriptor etc. we'd like to report an error to the user.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason 61df89c8e5 commit-graph: don't early exit(1) on e.g. "git status"
Make the commit-graph loading code work as a library that returns an
error code instead of calling exit(1) when the commit-graph is
corrupt. This means that e.g. "status" will now report commit-graph
corruption as an "error: [...]" at the top of its output, but then
proceed to work normally.

This required splitting up the load_commit_graph_one() function so
that the code that deals with open()-ing and stat()-ing the graph can
now be called independently as open_commit_graph().

This is needed because "commit-graph verify" where the graph doesn't
exist isn't an error. See the third paragraph in
283e68c72f ("commit-graph: add 'verify' subcommand",
2018-06-27). There's a bug in that logic where we conflate the
intended ENOENT with other errno values (e.g. EACCES), but this change
doesn't address that. That'll be addressed in a follow-up change.

I'm then splitting most of the logic out of load_commit_graph_one()
into load_commit_graph_one_fd_st(), which allows for providing an
existing file descriptor and stat information to the loading
code. This isn't strictly needed, but it would be redundant and
confusing to open() and stat() the file twice for some of the
codepaths, this allows for calling open_commit_graph() followed by
load_commit_graph_one_fd_st(). The "graph_file" still needs to be
passed to that function for the the "graph file %s is too small" error
message.

This leaves load_commit_graph_one() unused by everything except the
internal prepare_commit_graph_one() function, so let's mark it as
"static". If someone needs it in the future we can remove the "static"
attribute. I could also rewrite its sole remaining
user ("prepare_commit_graph_one()") to use
load_commit_graph_one_fd_st() instead, but let's leave it at this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 12:14:50 +09:00
Ævar Arnfjörð Bjarmason 2ac138d568 commit-graph: fix segfault on e.g. "git status"
When core.commitGraph=true is set, various common commands now consult
the commit graph. Because the commit-graph code is very trusting of
its input data, it's possibly to construct a graph that'll cause an
immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In
some other cases where git immediately exits with a cryptic error
about the graph being broken.

The root cause of this is that while the "commit-graph verify"
sub-command exhaustively verifies the graph, other users of the graph
simply trust the graph, and will e.g. deference data found at certain
offsets as pointers, causing segfaults.

This change does the bare minimum to ensure that we don't segfault in
the common fill_commit_in_graph() codepath called by
e.g. setup_revisions(), to do this instrument the "commit-graph
verify" tests to always check if "status" would subsequently
segfault. This fixes the following tests which would previously
segfault:

    not ok 50 - detect low chunk count
    not ok 51 - detect missing OID fanout chunk
    not ok 52 - detect missing OID lookup chunk
    not ok 53 - detect missing commit data chunk

Those happened because with the commit-graph enabled setup_revisions()
would eventually call fill_commit_in_graph(), where e.g.
g->chunk_commit_data is used early as an offset (and will be
0x0). With this change we get far enough to detect that the graph is
broken, and show an error instead. E.g.:

    $ git status; echo $?
    error: commit-graph is missing the Commit Data chunk
    1

That also sucks, we should *warn* and not hard-fail "status" just
because the commit-graph is corrupt, but fixing is left to a follow-up
change.

A side-effect of changing the reporting from graph_report() to error()
is that we now have an "error: " prefix for these even for
"commit-graph verify". Pseudo-diff before/after:

    $ git commit-graph verify
    -commit-graph is missing the Commit Data chunk
    +error: commit-graph is missing the Commit Data chunk

Changing that is OK. Various errors it emits now early on are prefixed
with "error: ", moving these over and changing the output doesn't
break anything.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 12:14:49 +09:00
SZEDER Gábor 0b918b75af t5318-commit-graph: remove unused variable
This is a remnant from early versions of the commit-graph patch series
[1], when 'git commit-graph --write' printed the hash of the created
commit-graph file, and tests did look at the command's output, because
the commit-graph file's name included that hash as well.

[1] https://public-inbox.org/git/1517348383-112294-6-git-send-email-dstolee@microsoft.com/

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 21:37:07 +09:00
Ævar Arnfjörð Bjarmason 945944ca70 commit-graph tests: test a graph that's too small
Use the recently split-up components of the corrupt_graph_and_verify()
function to assert that we error on graphs that are too small. The
error was added in 2a2e32bdc5 ("commit-graph: implement git
commit-graph read", 2018-04-10), but there was no test for it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 14:31:43 -08:00
Ævar Arnfjörð Bjarmason f6761faaa1 commit-graph tests: split up corrupt_graph_and_verify()
Split up the corrupt_graph_and_verify() function added in
d9b9f8a6fd ("commit-graph: verify catches corrupt signature",
2018-06-27) into its logical components of setting up the test itself,
doing the corruption in a particular way with "dd", and then finally
testing that stderr is what we expect.

This allows for re-using everything except the now slimmer
corrupt_graph_and_verify() to corrupt the graph in a way that doesn't
involve inserting a given byte sequence at a given position,
e.g. truncating it entirely to a custom value.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 14:29:49 -08:00
Ævar Arnfjörð Bjarmason b9cc405612 commit-graph tests: fix unportable "dd" invocation
Change an unportable invocation of "dd" with count=0, that wanted to
truncate the commit-graph file.  In POSIX it is unspecified what
happens when count=0 is provided[1]. The NetBSD "dd" behavior
differs from GNU (and seemingly other BSDs), which has left this test
broken since d2b86fbaa1 ("commit-graph: fix buffer read-overflow",
2019-01-15).

Copying from /dev/null would seek/truncate to seek=$zero_pos and
stop immediately after that (without being able to copy anything),
which is the right way to truncate the file.

1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-22 11:20:56 -08:00
Randall S. Becker 24b451e77c t5318: replace use of /dev/zero with generate_zero_bytes
There are platforms (e.g. NonStop) that lack /dev/zero; use the
generate_zero_bytes helper we just introduced to append stream
of NULs at the end of the file.

The original, even though it uses "dd seek=... count=..." to make it
look like it is overwriting the middle part of an existing file, has
truncated the file before this step with another use of "dd", which
may make it tricky to see why this rewrite is a correct one.

Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-13 09:23:45 -08:00
Junio C Hamano e5eac57356 Merge branch 'ab/commit-graph-write-progress'
The codepath to show progress meter while writing out commit-graph
file has been improved.

* ab/commit-graph-write-progress:
  commit-graph write: emit a percentage for all progress
  commit-graph write: add itermediate progress
  commit-graph write: remove empty line for readability
  commit-graph write: add more descriptive progress output
  commit-graph write: show progress for object search
  commit-graph write: more descriptive "writing out" output
  commit-graph write: add "Writing out" progress output
  commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily
  commit-graph: rename "large edges" to "extra edges"
2019-02-05 14:26:14 -08:00
SZEDER Gábor 5af7417bd8 commit-graph: rename "large edges" to "extra edges"
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents, and the
names of most of the macros, variables, struct fields, and functions
related to this chunk contain the term "large edges", e.g.
write_graph_chunk_large_edges().  However, it's not a really great
term, as the edges to the second and subsequent parents stored in this
chunk are not any larger than the edges to the first and second
parents stored in the "main" 'Commit Data' chunk.  It's the number of
edges, IOW number of parents, that is larger compared to non-merge and
"regular" two-parent merge commits.  And indeed, two functions in
'commit-graph.c' have a local variable called 'num_extra_edges' that
refer to the same thing, and this "extra edges" term is much better at
describing these edges.

So let's rename all these references to "large edges" in macro,
variable, function, etc. names to "extra edges".  There is a
GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency
rename it to GRAPH_EXTRA_EDGES_NEEDED.

We can do so safely without causing any incompatibility issues,
because the term "large edges" doesn't come up in the file format
itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there
is no 'L' in there), but only in the specification text.  The string
"large edges", however, does come up in the output of 'git
commit-graph read' and in tests looking at its input, but that command
is explicitly documented as debugging aid, so we can change its output
and the affected tests safely.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-22 11:33:46 -08:00
Josh Steadmon d2b86fbaa1 commit-graph: fix buffer read-overflow
fuzz-commit-graph identified a case where Git will read past the end of
a buffer containing a commit graph if the graph's header has an
incorrect chunk count. A simple bounds check in parse_commit_graph()
prevents this.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 20:32:00 -08:00
Junio C Hamano f2e2136ad7 Merge branch 'md/test-cleanup'
Various test scripts have been updated for style and also correct
handling of exit status of various commands.

* md/test-cleanup:
  tests: order arguments to git-rev-list properly
  t9109: don't swallow Git errors upstream of pipes
  tests: don't swallow Git errors upstream of pipes
  t/*: fix ordering of expected/observed arguments
  tests: standardize pipe placement
  Documentation: add shell guidelines
  t/README: reformat Do, Don't, Keep in mind lists
2018-10-16 16:16:01 +09:00
Junio C Hamano 6d8f8ebb74 Merge branch 'ds/commit-graph-with-grafts'
The recently introduced commit-graph auxiliary data is incompatible
with mechanisms such as replace & grafts that "breaks" immutable
nature of the object reference relationship.  Disable optimizations
based on its use (and updating existing commit-graph) when these
incompatible features are in use in the repository.

* ds/commit-graph-with-grafts:
  commit-graph: close_commit_graph before shallow walk
  commit-graph: not compatible with uninitialized repo
  commit-graph: not compatible with grafts
  commit-graph: not compatible with replace objects
  test-repository: properly init repo
  commit-graph: update design document
  refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
  refs.c: migrate internal ref iteration to pass thru repository argument
2018-10-16 16:15:59 +09:00
Matthew DeVore dcbaa0b361 t/*: fix ordering of expected/observed arguments
Fix various places where the ordering was obviously wrong, meaning it
was easy to find with grep.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 08:51:18 +09:00
Derrick Stolee ae0c89d41b t5318: use test_oid for HASH_LEN
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 08:10:32 -07:00
Ævar Arnfjörð Bjarmason 4a3ed63802 tests: fix and add lint for non-portable grep --file
The --file option to grep isn't in POSIX[1], but -f is[1]. Let's check
for that in the future, and fix the portability regression in
f237c8b6fe ("commit-graph: implement git-commit-graph write",
2018-04-02) that broke e.g. AIX.

1. http://pubs.opengroup.org/onlinepubs/009695399/utilities/grep.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-27 14:07:32 -07:00
Derrick Stolee 20fd6d5799 commit-graph: not compatible with grafts
Augment commit_graph_compatible(r) to return false when the given
repository r has commit grafts or is a shallow clone. Test that in these
situations we ignore existing commit-graph files and we do not write new
commit-graph files.

Helped-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 10:22:51 -07:00
Derrick Stolee d6538246d3 commit-graph: not compatible with replace objects
Create new method commit_graph_compatible(r) to check if a given
repository r is compatible with the commit-graph feature. Fill the
method with a check to see if replace-objects exist. Test this
interaction succeeds, including ignoring an existing commit-graph and
failing to write a new commit-graph. However, we do ensure that
we write a new commit-graph by setting read_replace_refs to 0, thereby
ignoring the replace refs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 10:22:51 -07:00
Junio C Hamano 3bc484af74 Merge branch 'jt/commit-graph-per-object-store'
Test update.

* jt/commit-graph-per-object-store:
  t5318: avoid unnecessary command substitutions
2018-08-20 11:33:51 -07:00
Junio C Hamano 5dd54744b8 Merge branch 'ds/commit-graph-fsck'
Test fix.

* ds/commit-graph-fsck:
  t5318: use 'test_cmp_bin' to compare commit-graph files
2018-08-20 11:33:51 -07:00
SZEDER Gábor 3c4586301d t5318: avoid unnecessary command substitutions
Two tests added in dade47c06c (commit-graph: add repo arg to graph
readers, 2018-07-11) prepare the contents of 'expect' files by
'echo'ing the results of command substitutions.  That's unncessary,
avoid them by directly saving the output of the commands executed in
those command substitutions.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:09:02 -07:00
SZEDER Gábor eb7cc5bc80 t5318: use 'test_cmp_bin' to compare commit-graph files
The commit-graph files are binary files, so they should not be
compared with 'test_cmp', because that might cause issues like
crashing[1] or infinite loop[2] on Windows, where 'test_cmp' is a
shell function to deal with random LF-CRLF conversions[3].

Use 'test_cmp_bin' instead.

1 - b93e6e3663 (t5000, t5003: do not use test_cmp to compare binary
    files, 2014-06-04)
2 - f9f3851b4d (t9300: use test_cmp_bin instead of test_cmp to compare
    binary files, 2014-09-12)
3 - 4d715ac05c (Windows: a test_cmp that is agnostic to random LF <>
    CRLF conversions, 2013-10-26)

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 12:07:29 -07:00
Jonathan Tan dade47c06c commit-graph: add repo arg to graph readers
Add a struct repository argument to the functions in commit-graph.h that
read the commit graph. (This commit does not affect functions that write
commit graphs.)

Because the commit graph functions can now read the commit graph of any
repository, the global variable core_commit_graph has been removed.
Instead, the config option core.commitGraph is now read on the first
time in a repository that a commit is attempted to be parsed using its
commit graph.

This commit includes a test that exercises the functionality on an
arbitrary repository that is not the_repository.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 15:47:48 -07:00
Derrick Stolee d5d5d7b641 gc: automatically write commit-graph files
The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 59fb87701f commit-graph: add '--reachable' option
When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee e0fd51e1d7 fsck: verify commit-graph
If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 41df0e307f commit-graph: verify contents match checksum
The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 437787ae1b commit-graph: test for corrupted octopus edge
The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 88968ebf86 commit-graph: verify commit date
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 1373e547f7 commit-graph: verify generation number
While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 53614b1351 commit-graph: verify parent list
The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0xFFFFFFFF to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 2e3c07378f commit-graph: verify root tree OIDs
The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Derrick Stolee 96af91d410 commit-graph: verify objects exist
In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:05 -07:00
Derrick Stolee 9bda846789 commit-graph: verify corrupt OID fanout and lookup
In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:05 -07:00
Derrick Stolee 2bd0365f37 commit-graph: verify required chunks are present
The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:05 -07:00
Derrick Stolee d9b9f8a6fd commit-graph: verify catches corrupt signature
This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:05 -07:00
Derrick Stolee 283e68c72f commit-graph: add 'verify' subcommand
If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:05 -07:00
Derrick Stolee 55abcb417b t5318-commit-graph.sh: use core.commitGraph
The commit-graph tests should be checking that normal Git operations
succeed and have matching output with and without the commit-graph
feature enabled. However, the test was toggling 'core.graph' instead
of the correct 'core.commitGraph' variable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:04 -07:00
Derrick Stolee 7adf526670 merge: check config before loading commits
Now that we use generation numbers from the commit-graph, we must
ensure that all commits that exist in the commit-graph are loaded
from that file instead of from the object database. Since the
commit-graph file is only checked if core.commitGraph is true, we
must check the default config before we load any commits.

In the merge builtin, the config was checked after loading the HEAD
commit. This was due to the use of the global 'branch' when checking
merge-specific config settings.

Move the config load to be between the initialization of 'branch' and
the commit lookup.

Without this change, a fast-forward merge would hit a BUG("bad
generation skip") statement in commit.c during paint_down_to_common().
This is because the HEAD commit would be loaded with "infinite"
generation but then reached by commits with "finite" generation
numbers.

Add a test to t5318-commit-graph.sh that exercises this code path to
prevent a regression.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:34 +09:00
Derrick Stolee 7547b95b4f commit-graph: implement "--append" option
Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:02 +09:00
Derrick Stolee 3d5df01b5e commit-graph: build graph from starting commits
Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:02 +09:00
Derrick Stolee 049d51a2bb commit-graph: read only from specific pack-indexes
Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:02 +09:00
Derrick Stolee 177722b344 commit: integrate commit graph with commit parsing
Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:02 +09:00
Derrick Stolee 2a2e32bdc5 commit-graph: implement git commit-graph read
Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:01 +09:00
Derrick Stolee f237c8b6fe commit-graph: implement git-commit-graph write
Teach git-commit-graph to write graph files. Create new test script to verify
this command succeeds without failure.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02 14:27:38 -07:00