Rerere
======

This document describes the rerere logic.

Conflict normalization
----------------------

To ensure recorded conflict resolutions can be looked up in the rerere
database, even when branches are merged in a different order,
different branches are merged that result in the same conflict, or
when different conflict style settings are used, rerere normalizes the
conflicts before writing them to the rerere database.

Different conflict styles and branch names are normalized by stripping
the labels from the conflict markers, and removing the common ancestor
version from the `diff3` conflict style. Branches that are merged
in different order are normalized by sorting the conflict hunks. More
on each of those steps in the following sections.

Once these two normalization operations are applied, a conflict ID is
calculated based on the normalized conflict, which is later used by
rerere to look up the conflict in the rerere database.

Removing the common ancestor version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Say we have three branches AB, AC and AC2. The common ancestor of
these branches has a file with a line containing the string "A" (for
brevity this is called "line A" in the rest of the document). In
branch AB this line is changed to "B", in AC, this line is changed to
"C", and branch AC2 is forked off of AC, after the line was changed to
"C".

Forking a branch ABAC off of branch AB and then merging AC into it, we
get a conflict like the following:

    <<<<<<< HEAD
    B
    =======
    C
    >>>>>>> AC

Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
and then merging branch AC2 into it), using the diff3 conflict style,
we get a conflict like the following:

    <<<<<<< HEAD
    B
    ||||||| merged common ancestors
    A
    =======
    C
    >>>>>>> AC2

By resolving this conflict, to leave line D, the user declares:

    After examining what branches AB and AC did, I believe that making
    line A into line D is the best thing to do that is compatible with
    what AB and AC wanted to do.

As branch AC2 refers to the same commit as AC, the above implies that
this is also compatible with what AB and AC2 wanted to do.

By extension, this means that rerere should recognize that the above
conflicts are the same. To do this, the labels on the conflict
markers are stripped, and the common ancestor version is removed. The above
examples would both result in the following normalized conflict:

    <<<<<<<
    B
    =======
    C
    >>>>>>>

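The label stripping and ancestor removal described in this section can be sketched in Python as follows. This is only an illustration and not rerere's actual code: the names `MARKER_SIZE`, `is_marker` and `normalize_hunk` are hypothetical, the marker length of 7 is assumed, and the input is assumed to be the lines of a single, non-nested conflict hunk.

```python
MARKER_SIZE = 7  # assumed default conflict-marker length


def is_marker(line, char):
    """True if `line` is a conflict marker made of `char`, with or without a label."""
    marker = char * MARKER_SIZE
    return line == marker or line.startswith(marker + " ")


def normalize_hunk(lines):
    """Strip marker labels and drop the diff3 common-ancestor part of one hunk."""
    normalized = []
    in_ancestor = False
    for line in lines:
        if is_marker(line, "<") or is_marker(line, ">"):
            normalized.append(line[:MARKER_SIZE])  # keep the marker, drop the label
        elif is_marker(line, "|"):
            in_ancestor = True                     # common ancestor version starts
        elif is_marker(line, "="):
            in_ancestor = False                    # second side starts
            normalized.append(line[:MARKER_SIZE])
        elif not in_ancestor:
            normalized.append(line)
    return normalized


merge_style = ["<<<<<<< HEAD", "B", "=======", "C", ">>>>>>> AC"]
diff3_style = ["<<<<<<< HEAD", "B", "||||||| merged common ancestors", "A",
               "=======", "C", ">>>>>>> AC2"]

# Both conflicts collapse to the same normalized form.
assert normalize_hunk(merge_style) == normalize_hunk(diff3_style)
print("\n".join(normalize_hunk(diff3_style)))
```

Both example conflicts from this section normalize to the same five lines, which is what lets a resolution recorded for one be found for the other.
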
Sorting hunks
~~~~~~~~~~~~~

As before, let's imagine that a common ancestor had a file with line A
in its early part, and line X in its late part. Then four branches
are forked that do these things:

- AB: changes A to B
- AC: changes A to C
- XY: changes X to Y
- XZ: changes X to Z

Now, forking a branch ABAC off of branch AB and then merging AC into
it, and forking a branch ACAB off of branch AC and then merging AB
into it, would yield the conflict in a different order. The former
would say "A became B or C, what now?" while the latter would say "A
became C or B, what now?"

As a reminder, the act of merging AC into ABAC and resolving the
conflict to leave line D means that the user declares:

    After examining what branches AB and AC did, I believe that
    making line A into line D is the best thing to do that is
    compatible with what AB and AC wanted to do.

So the conflict we would see when merging AB into ACAB should be
resolved the same way---it is the resolution that is in line with that
declaration.

Similarly, imagine that a branch XYXZ was previously forked from XY,
and XZ was merged into it, and "X became Y or Z" was resolved into "X
became W".

Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
would have line B in its early part and line Y in its later part.
Such a merge would be quite clean. We can construct 4 combinations
using these four branches ((AB, AC) x (XY, XZ)).

Merging ABXY and ACXZ would make "an early A became B or C, a late X
became Y or Z" conflict, while merging ACXY and ABXZ would make "an
early A became C or B, a late X became Y or Z". We can see there are
4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").

By sorting, the conflict is given its canonical name, namely, "an
early part became B or C, a late part became Y or Z", and whenever
any of these four patterns appears, we can get to the same conflict
and resolution that we saw earlier.

Without the sorting, we'd have to somehow find a previous resolution
among a combinatorial explosion of possible orderings.

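To make the effect of sorting concrete, here is a small Python sketch. It assumes each hunk is represented as a pair of sides and that the sides are compared with a plain lexicographic comparison. The helper `sort_sides` is a hypothetical name used only for this illustration.

```python
from itertools import product


def sort_sides(side_one, side_two):
    """Return the two sides of one conflict hunk in canonical (sorted) order."""
    return tuple(sorted((side_one, side_two)))


early = [("B", "C"), ("C", "B")]  # "A became B or C", in either order
late = [("Y", "Z"), ("Z", "Y")]   # "X became Y or Z", in either order

# Normalize every one of the four possible orderings.
canonical_forms = {
    (sort_sides(*first), sort_sides(*second))
    for first, second in product(early, late)
}

# All four orderings collapse to a single canonical conflict.
assert canonical_forms == {(("B", "C"), ("Y", "Z"))}
```

Because every ordering maps to the same canonical form, a single recorded resolution is enough to cover all four merges.
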
Conflict ID calculation
~~~~~~~~~~~~~~~~~~~~~~~

Once the conflict normalization is done, the conflict ID is calculated
as the sha1 hash of the conflict hunks appended to each other,
separated by <NUL> characters. The conflict markers are stripped out
before the sha1 is calculated. So in the example above, where we
merge branch AC, which changes line A to line C, into branch AB, which
changes line A to line B, the conflict ID would be
SHA1('B<NUL>C<NUL>').

If there are multiple conflicts in one file, the sha1 is calculated
the same way with all hunks appended to each other, in the order in
which they appear in the file, separated by a <NUL> character.

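The calculation can be sketched with Python's `hashlib`. This follows the simplified notation used in this section: each side is hashed followed by a NUL byte, and how line endings are fed into the hash is left out as an assumption. The function name `conflict_id` and the list-of-pairs input are hypothetical.

```python
import hashlib


def conflict_id(hunks):
    """Hash normalized hunks given as (side_one, side_two) pairs, in file order."""
    digest = hashlib.sha1()
    for side_one, side_two in hunks:
        # Markers are already stripped; each side is followed by a NUL byte.
        digest.update(side_one.encode() + b"\0")
        digest.update(side_two.encode() + b"\0")
    return digest.hexdigest()


# Single conflict from the example above: SHA1('B<NUL>C<NUL>').
single = conflict_id([("B", "C")])
assert single == hashlib.sha1(b"B\0C\0").hexdigest()

# Several conflicts in one file are hashed in the order they appear.
multiple = conflict_id([("B", "C"), ("Y", "Z")])
assert multiple == hashlib.sha1(b"B\0C\0Y\0Z\0").hexdigest()
```
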
Nested conflicts
~~~~~~~~~~~~~~~~

Nested conflicts are handled very similarly to "simple" conflicts.
Similar to simple conflicts, the conflict is first normalized by
stripping the labels from conflict markers, stripping the common ancestor
version, and sorting the conflict hunks, both for the outer and the
inner conflict. This is done recursively, so any number of nested
conflicts can be handled.

Note that this only works for conflict markers that "cleanly nest". If
there are any unmatched conflict markers, rerere will fail to handle
the conflict and will not record a conflict resolution.

The only difference from simple conflicts is in how the conflict ID is
calculated. For the inner conflict, the conflict markers themselves
are not stripped out before calculating the sha1.

Say we have the following conflict for example:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

After stripping out the labels of the conflict markers, and sorting
the hunks, the conflict would look as follows:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and finally the conflict ID would be calculated as:
`sha1('1<NUL><<<<<<<\n2\n=======\n3\n>>>>>>><NUL>')`
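A recursive version of the whole procedure might look like the following Python sketch. It is an illustration under stated assumptions, not rerere's implementation: the names `MARKER_SIZE`, `is_marker`, `parse_conflict` and `nested_conflict_id` are hypothetical, the marker length of 7 is assumed, and the inner conflict is assumed to be hashed in its normalized (label-stripped and sorted) form with its markers kept. Lines are joined with `\n` to mirror the notation used in this section.

```python
import hashlib

MARKER_SIZE = 7  # assumed default conflict-marker length


def is_marker(line, char):
    marker = char * MARKER_SIZE
    return line == marker or line.startswith(marker + " ")


def parse_conflict(lines, pos):
    """Normalize the conflict whose opening marker is at lines[pos - 1].

    Returns (side_one, side_two, next_pos); each side is a list of lines.
    """
    sides = {"one": [], "two": []}
    current = "one"  # set to None while inside the common ancestor part
    while pos < len(lines):
        line = lines[pos]
        pos += 1
        if is_marker(line, "<"):
            # Nested conflict: normalize it recursively, keep its markers.
            inner_one, inner_two, pos = parse_conflict(lines, pos)
            nested = (["<" * MARKER_SIZE] + inner_one + ["=" * MARKER_SIZE]
                      + inner_two + [">" * MARKER_SIZE])
            if current:
                sides[current] += nested
        elif is_marker(line, "|"):
            current = None
        elif is_marker(line, "="):
            current = "two"
        elif is_marker(line, ">"):
            one, two = sorted(sides.values(), key="\n".join)
            return one, two, pos
        elif current:
            sides[current].append(line)
    raise ValueError("conflict markers do not nest cleanly")


def nested_conflict_id(lines):
    """Conflict ID for a single (possibly nested) conflict starting at lines[0]."""
    if not is_marker(lines[0], "<"):
        raise ValueError("expected an opening conflict marker")
    one, two, _ = parse_conflict(lines, 1)
    data = "\n".join(one) + "\0" + "\n".join(two) + "\0"
    return hashlib.sha1(data.encode()).hexdigest()


example = ["<<<<<<< HEAD", "1", "=======", "<<<<<<< HEAD", "3", "=======",
           "2", ">>>>>>> branch-2", ">>>>>>> branch-3~"]
expected = hashlib.sha1(b"1\0<<<<<<<\n2\n=======\n3\n>>>>>>>\0").hexdigest()
assert nested_conflict_id(example) == expected
```

Input whose closing markers are missing raises `ValueError`, which corresponds to the case where the markers do not nest cleanly and no resolution is recorded. The sketch does not detect every form of malformed marker ordering.
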