The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.
* ak/corrected-commit-date:
  doc: add corrected commit date info
  commit-reach: use corrected commit dates in paint_down_to_common()
  commit-graph: use generation v2 only if entire chain does
  commit-graph: implement generation data chunk
  commit-graph: implement corrected commit date
  commit-graph: return 64-bit generation number
  commit-graph: add a slab to store topological levels
  t6600-test-reach: generalize *_three_modes
  commit-graph: consolidate fill_commit_graph_info
  revision: parse parent in indegree_walk_step()
  commit-graph: fix regression when computing Bloom filters
@@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph:
Values 1-4 satisfy the requirements of parse_commit_gently().
Define the "generation number" of a commit recursively as follows:
There are two definitions of generation number:
1. Corrected committer dates (generation number v2)
2. Topological levels (generation number v1)
* A commit with no parents (a root commit) has generation number one.
Define "corrected committer date" of a commit recursively as follows:
* A commit with at least one parent has generation number one more than
the largest generation number among its parents.
* A commit with no parents (a root commit) has corrected committer date
equal to its committer date.
Equivalently, the generation number of a commit A is one more than the
* A commit with at least one parent has corrected committer date equal to
the maximum of its committer date and one more than the largest corrected
committer date among its parents.
* As a special case, a root commit with timestamp zero has corrected commit
date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO
(that is, an uncomputed corrected commit date).
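The three rules for corrected committer dates can be sketched as follows. This is a minimal illustration, not Git's implementation: the function name, the dictionary-based history, and the sample timestamps are all assumptions made for the example.

```python
def corrected_commit_dates(parents, committer_date):
    """Map every commit to its corrected committer date.

    parents:        dict of commit -> list of parent commits (hypothetical layout)
    committer_date: dict of commit -> committer timestamp (non-negative int)
    """
    corrected = {}
    for start in parents:
        stack = [start]  # explicit stack: parents are resolved before children
        while stack:
            commit = stack[-1]
            pending = [p for p in parents[commit] if p not in corrected]
            if pending:
                stack.extend(pending)
                continue
            stack.pop()
            if not parents[commit]:
                # Root commit: its own committer date, except that timestamp
                # zero becomes 1 so it differs from GENERATION_NUMBER_ZERO.
                corrected[commit] = committer_date[commit] or 1
            else:
                largest_parent = max(corrected[p] for p in parents[commit])
                corrected[commit] = max(committer_date[commit], largest_parent + 1)
    return corrected


# Illustrative history: "A" has a skewed clock (older than its parent "R"),
# and "Z" is a root commit with timestamp zero.
parents = {"R": [], "A": ["R"], "B": ["A"], "Z": [], "M": ["B", "Z"]}
committer_date = {"R": 100, "A": 90, "B": 200, "Z": 0, "M": 150}
dates = corrected_commit_dates(parents, committer_date)
```

In this sample, the skewed commit "A" is lifted to one more than its parent's corrected date, so a child always ends up with a strictly larger value than each of its parents.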
Define the "topological level" of a commit recursively as follows:
* A commit with no parents (a root commit) has topological level of one.
* A commit with at least one parent has topological level one more than
the largest topological level among its parents.
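As a rough sketch of the recursive definition of topological level, assuming a history given as a plain dictionary (the function name and the sample commits are hypothetical, not taken from Git's source):

```python
def topological_levels(parents):
    """Map every commit to its topological level.

    parents: dict of commit -> list of parent commits (hypothetical layout)
    """
    levels = {}
    for start in parents:
        stack = [start]  # explicit stack: parents are resolved before children
        while stack:
            commit = stack[-1]
            pending = [p for p in parents[commit] if p not in levels]
            if pending:
                stack.extend(pending)
                continue
            stack.pop()
            # A root has no parents, so default=0 yields level one.
            levels[commit] = 1 + max((levels[p] for p in parents[commit]), default=0)
    return levels


# "M" merges a two-commit side ("A", "A2") with a one-commit side ("B").
parents = {"R": [], "A": ["R"], "A2": ["A"], "B": ["R"], "M": ["A2", "B"]}
levels = topological_levels(parents)
```

The merge "M" takes its level from the longer side of the history, since the definition uses the largest level among the parents.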
Equivalently, the topological level of a commit A is one more than the
length of a longest path from A to a root commit. The recursive definition
is easier to use for computation and observing the following property:
@@ -60,6 +77,9 @@ is easier to use for computation and observing the following property:
generation numbers, then we always expand the boundary commit with highest
generation number and can easily detect the stopping condition.
The property applies to both versions of generation number, that is both
corrected committer dates and topological levels.
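The walk described above can be illustrated with a small reachability check. It is a sketch under simplifying assumptions: every commit has a fully computed generation number that is strictly larger than those of its parents, the *_INFINITY and *_ZERO cases are ignored, and the function and variable names are invented for the example.

```python
import heapq


def can_reach(start, target, parents, generation):
    """Return True if `target` is reachable from `start` via parent links."""
    if start == target:
        return True
    # heapq is a min-heap, so generations are negated to pop the highest first.
    queue = [(-generation[start], start)]
    seen = {start}
    while queue:
        neg_gen, commit = heapq.heappop(queue)
        if -neg_gen <= generation[target]:
            # Stopping condition: the highest generation left on the boundary
            # does not exceed the target's, so nothing queued can reach it.
            return False
        for parent in parents[commit]:
            if parent == target:
                return True
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(queue, (-generation[parent], parent))
    return False


parents = {"R": [], "A": ["R"], "B": ["R"], "M": ["A", "B"]}
generation = {"R": 1, "A": 2, "B": 2, "M": 3}  # topological levels
reach_m_r = can_reach("M", "R", parents, generation)
reach_a_b = can_reach("A", "B", parents, generation)
```

Because the target is detected when it is first discovered, it never sits in the queue, which keeps the stopping condition valid even when two commits share a generation number. The same function works unchanged if `generation` holds corrected committer dates instead of topological levels.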
This property can be used to significantly reduce the time it takes to
walk commits and determine topological relationships. Without generation
numbers, the general heuristic is the following:
@@ -67,7 +87,9 @@ numbers, the general heuristic is the following:
If A and B are commits with commit time X and Y, respectively, and
X < Y, then A _probably_ cannot reach B.
This heuristic is currently used whenever the computation is allowed to
In the absence of corrected commit dates (for example, old versions of Git or
mixed generation graph chains),
this heuristic is currently used whenever the computation is allowed to
violate topological relationships due to clock skew (such as "git log"
with default order), but is not used when the topological order is
required (such as merge base calculations, "git log --graph").
@@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite"
generation number and walk until reaching commits with known generation
number.
We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
We use the macro GENERATION_NUMBER_INFINITY to mark commits not
in the commit-graph file. If a commit-graph file was written by a version
of Git that did not compute generation numbers, then those commits will
have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
@@ -93,12 +115,12 @@ fully-computed generation numbers. Using strict inequality may result in
walking a few extra commits, but the simplicity in dealing with commits
with generation number *_INFINITY or *_ZERO is valuable.
We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
generation numbers are computed to be at least this value. We limit at
this value since it is the largest value that can be stored in the
commit-graph file using the 30 bits available to generation numbers. This
presents another case where a commit can have generation number equal to
that of a parent.
We use the macro GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF for commits whose
topological levels (generation number v1) are computed to be at least
this value. We limit at this value since it is the largest value that
can be stored in the commit-graph file using the 30 bits available
to topological levels. This presents another case where a commit can
have generation number equal to that of a parent.
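The effect of the 30-bit limit can be shown with a short sketch. The constant mirrors the macro named above; the helper function is hypothetical and only illustrates the clamping.

```python
GENERATION_NUMBER_V1_MAX = 0x3FFFFFFF  # largest value that fits in 30 bits


def stored_topological_level(parent_levels):
    """Topological level as it would be stored, clamped to the 30-bit limit."""
    level = 1 + max(parent_levels, default=0)
    return min(level, GENERATION_NUMBER_V1_MAX)


root_level = stored_topological_level([])
saturated = stored_topological_level([GENERATION_NUMBER_V1_MAX])
```

Once a parent has reached the limit, the child is clamped to the same value, which is the case where a commit and its parent share a generation number.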
Design Details
--------------
@@ -267,6 +289,35 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
number of commits) could be extracted into config settings for full
flexibility.
## Handling Mixed Generation Number Chains
With the introduction of generation number v2 and generation data chunk, the
following scenario is possible:
1. "New" Git writes a commit-graph with the corrected commit dates.
2. "Old" Git writes a split commit-graph on top without corrected commit dates.
A naive approach of using the newest available generation number from
each layer would lead to violated expectations: the lower layer would
use corrected commit dates which are much larger than the topological
levels of the higher layer. For this reason, Git inspects the topmost
layer to see if the layer is missing corrected commit dates. In such a case
Git only uses topological level for generation numbers.
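The read-side decision can be sketched as below, assuming a chain modelled as a list of booleans ordered from the base layer to the topmost layer, where True means the layer carries corrected commit dates. The function name and this representation are illustrative only.

```python
def chain_uses_corrected_dates(layers):
    """Decide which generation number a reader uses for the whole chain.

    layers: list of bools, base layer first, topmost layer last
            (hypothetical representation of a split commit-graph).
    """
    # Only the topmost layer is inspected; if it lacks corrected commit
    # dates, the whole chain falls back to topological levels.
    return bool(layers) and layers[-1]


new_then_old = chain_uses_corrected_dates([True, False])  # scenario above
all_new = chain_uses_corrected_dates([True, True])
```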
When writing a new layer in a split commit-graph, we write corrected commit
dates if the topmost layer has corrected commit dates written. This
guarantees that if a layer has corrected commit dates, all lower layers
must have corrected commit dates as well.
When merging layers, we do not consider whether the merged layers had corrected
commit dates. Instead, the new layer will have corrected commit dates if the
layer below the new layer has corrected commit dates.
While writing or merging layers, if the new layer is the only layer, it will
have corrected commit dates when written by compatible versions of Git. Thus,
rewriting a split commit-graph as a single file (`--split=replace`) creates a
single layer with corrected commit dates.
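The write-side rules of this section can be summarised in a small sketch. The function, its `writer_supports_gdat` parameter (standing in for "new" versus "old" Git), and the list-of-booleans chain are assumptions made for illustration.

```python
def new_layer_has_corrected_dates(layers_below, writer_supports_gdat=True):
    """Decide whether a newly written or merged layer gets corrected dates.

    layers_below: list of bools for the layers that remain under the new
                  layer, base first; True means corrected dates are present.
    """
    if not writer_supports_gdat:
        return False  # "old" Git never writes corrected commit dates
    if not layers_below:
        return True  # the new layer is the only layer
    # Otherwise follow the layer directly below, regardless of what the
    # merged layers themselves contained.
    return layers_below[-1]


on_top_of_new = new_layer_has_corrected_dates([True])
on_top_of_old = new_layer_has_corrected_dates([True, False])
after_replace = new_layer_has_corrected_dates([])  # e.g. --split=replace
```

Following the layer directly below is what preserves the guarantee that a layer with corrected commit dates never sits on top of a layer without them.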
## Deleting graph-{hash} files
After a new tip file is written, some `graph-{hash}` files may no longer
@@ -387,6 +387,9 @@ GIT_TEST_COMMIT_GRAPH=<boolean>, when true, forces the commit-graph to
be written after every 'git commit' command, and overrides the
'core.commitGraph' setting to true.
GIT_TEST_COMMIT_GRAPH_NO_GDAT=<boolean>, when true, forces the
commit-graph to be written without generation data chunk.
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=<boolean>, when true, forces
commit-graph write to compute and write changed path Bloom filters for
every 'git commit-graph write', as if the `--changed-paths` option was
@@ -76,7 +76,7 @@ graph_git_behavior 'no graph' full commits/3 commits/1
graph_read_expect() {
OPTIONAL=""
NUM_CHUNKS=3
if test ! -z $2
if test ! -z "$2"
then
OPTIONAL=" $2"
NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
@@ -103,14 +103,14 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' '
# valid commit and tree OID
git rev-parse HEAD HEAD^{tree} >in &&
git commit-graph write --stdin-commits <in &&
graph_read_expect 3
graph_read_expect 3 generation_data
'
test_expect_success 'write graph' '
cd "$TRASH_DIRECTORY/full" &&
git commit-graph write &&
test_path_is_file $objdir/info/commit-graph &&
graph_read_expect "3"
graph_read_expect "3" generation_data
'
test_expect_success POSIXPERM 'write graph has correct permissions' '
@@ -219,7 +219,7 @@ test_expect_success 'write graph with merges' '