From d6045294a9099af7d0f793cba557ce59d900bf3d Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 6 Sep 2022 17:02:13 -0400 Subject: [PATCH 1/2] lookup_commit_in_graph(): use prepare_commit_graph() to check for graph We exit early from lookup_commit_in_graph() if the commit_graph pointer is NULL, under the assumption that we don't have a graph to look at. But the graph pointer is lazy-loaded; if no other code happens to have called prepare_commit_graph(), we'll incorrectly assume that one isn't available at all. This has a pretty small performance impact in practice, because the fallback will generally be to call parse_object() instead. That ends up in parse_commit_buffer(), which loads the graph data itself. So the first commit we see won't use the graph, but subsequent ones will. Since using the graph is just an optimization there's generally no user-visible difference, but if you instrument rev-list like so: diff --git a/revision.c b/revision.c index ee702e498a..63c488ffb6 100644 --- a/revision.c +++ b/revision.c @@ -381,6 +381,9 @@ static struct object *get_reference(struct rev_info *revs, const char *name, * parsing commit data from disk. */ commit = lookup_commit_in_graph(revs->repo, oid); + warning("%s %s in commit graph", + commit ? "found" : "did not find", + name); if (commit) object = &commit->object; else and run (in git.git): git commit-graph write --reachable git rev-list origin/master origin/next >/dev/null you'll see that we fail to find the first one: warning: did not find origin/master in commit graph warning: found origin/next in commit graph After this patch, you'll see that we find both: warning: found origin/master in commit graph warning: found origin/next in commit graph Even though the performance implication is small here, there are two important reasons to do this: - it's downright confusing if you are hunting a bug triggered by the use of the commit graph. It may or may not trigger depending on the number and ordering of tips you ask for. - prepare_commit_graph() has other policy logic, too. In particular, if we've loaded a commit graph and then disabled the graph via disable_commit_graph(), that should take precedence. I'm not sure if this can trigger bad behavior in practice. The only caller there is upload-pack's deepen_by_rev_list(), which should be avoiding the commit graph for its traversal tips, but probably wasn't before this patch. Whether you could come up with a case where that mattered is unclear. Still, this is obviously the right thing to be doing. Signed-off-by: Jeff King Reviewed-by: Taylor Blau Signed-off-by: Junio C Hamano --- commit-graph.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commit-graph.c b/commit-graph.c index a487d49c3e..e8f16dc579 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -902,7 +902,7 @@ struct commit *lookup_commit_in_graph(struct repository *repo, const struct obje struct commit *commit; uint32_t pos; - if (!repo->objects->commit_graph) + if (!prepare_commit_graph(repo)) return NULL; if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos)) return NULL; From b27ccae34ba2cabbface87d6e0f80695316bcd59 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Tue, 6 Sep 2022 17:04:35 -0400 Subject: [PATCH 2/2] rev-list: disable commit graph with --verify-objects Since the point of --verify-objects is to actually load and checksum the bytes of each object, optimizing out reads using the commit graph runs contrary to our goal. The most targeted way to implement this would be for the revision traversal code to check revs->verify_objects and avoid using the commit graph. But it's difficult to be sure we've hit all of the correct spots. For instance, I started this patch by writing the first of the included test cases, where the corrupted commit is directly on rev-list's command line. And that is easy to fix by teaching get_reference() to check revs->verify_objects before calling lookup_commit_in_graph(). But that doesn't cover the second test case: when we traverse to a corrupted commit, we'd parse the parent in process_parents(). So we'd need to check there, too. And it keeps going. In handle_commit() we sometimes parses commits, too, though I couldn't figure out a way to trigger it that did not already parse via get_reference() or tag peeling. And try_to_simplify_commit() has its own parse call, and so on. So it seems like the safest thing is to just disable the commit graph for the whole process when we see the --verify-objects option. We can do that either in builtin/rev-list.c, where we use the option, or in revision.c, where we parse it. There are some subtleties: - putting it in rev-list.c is less surprising in some ways, because there we know we are just doing a single traversal. In a command which does multiple traversals in a single process, it's rather unexpected to globally disable the commit graph. - putting it in revision.c is less surprising in some ways, because the caller does not have to remember to disable the graph themselves. But this is already tricky! The verify_objects flag in rev_info doesn't do anything by itself. The caller has to provide an object callback which does the right thing. - for that reason, in practice nobody but rev-list uses this option in the first place. So the distinction is probably not important either way. Arguably it should just be an option of rev-list, and not the general revision machinery; right now you can run "git log --verify-objects", but it does not actually do anything useful. - checking for a parsed revs.verify_objects flag in rev-list.c is too late. By that time we've already passed the arguments to setup_revisions(), which will have parsed the commits using the graph. So this commit disables the graph as soon as we see the option in revision.c. That's a pretty broad hammer, but it does what we want, and in practice nobody but rev-list is using this flag anyway. The tests cover both the "tip" and "parent" cases. Obviously our hammer hits them both in this case, but it's good to check both in case somebody later tries the more focused approach. Signed-off-by: Jeff King Reviewed-by: Taylor Blau Signed-off-by: Junio C Hamano --- revision.c | 1 + t/t1450-fsck.sh | 28 ++++++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/revision.c b/revision.c index 0c6e26cd9c..d0e9e68a2e 100644 --- a/revision.c +++ b/revision.c @@ -2398,6 +2398,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->tree_objects = 1; revs->blob_objects = 1; revs->verify_objects = 1; + disable_commit_graph(revs->repo); } else if (!strcmp(arg, "--unpacked")) { revs->unpacked = 1; } else if (starts_with(arg, "--unpacked=")) { diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 53c2aa10b7..f9a1bc5de7 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -507,6 +507,34 @@ test_expect_success 'rev-list --verify-objects with bad sha1' ' test_i18ngrep -q "error: hash mismatch $(dirname $new)$(test_oid ff_2)" out ' +test_expect_success 'set up repository with commit-graph' ' + git init corrupt-graph && + ( + cd corrupt-graph && + test_commit one && + test_commit two && + git commit-graph write --reachable + ) +' + +corrupt_graph_obj () { + oid=$(git -C corrupt-graph rev-parse "$1") && + obj=corrupt-graph/.git/objects/$(test_oid_to_path $oid) && + test_when_finished 'mv backup $obj' && + mv $obj backup && + echo garbage >$obj +} + +test_expect_success 'rev-list --verify-objects with commit graph (tip)' ' + corrupt_graph_obj HEAD && + test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD +' + +test_expect_success 'rev-list --verify-objects with commit graph (parent)' ' + corrupt_graph_obj HEAD^ && + test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD +' + test_expect_success 'force fsck to ignore double author' ' git cat-file commit HEAD >basis && sed "s/^author .*/&,&/" multiple-authors &&