Optimize out repeated rename detection in a sequence of mergy operations.

* en/ort-perf-batch-11:
  merge-ort, diffcore-rename: employ cached renames when possible
  merge-ort: handle interactions of caching and rename/rename(1to1) cases
  merge-ort: add helper functions for using cached renames
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: avoid accidental API mis-use
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: populate caches of rename detection results
  merge-ort: add data structures for in-memory caching of rename detection
  t6429: testcases for remembering renames
  fast-rebase: write conflict state to working tree, index, and HEAD
  fast-rebase: change assert() to BUG()
  Documentation/technical: describe remembering renames optimization
  t6423: rename file within directory that other side renamed

8 changed files with 1804 additions and 37 deletions
@@ -0,0 +1,671 @@
Rebases and cherry-picks involve a sequence of merges whose results are
recorded as new single-parent commits. The first parent side of those
merges represents the "upstream" side, and often includes a far larger set
of changes than the second parent side. Traditionally, the renames on the
first-parent side of that sequence of merges were repeatedly re-detected
for every merge. This file explains why it is safe and effective during
rebases and cherry-picks to remember renames on the upstream side of
history as an optimization, assuming all merges are automatic and clean
(i.e. no conflicts and not interrupted for user input or editing).

Outline:

  0. Assumptions

  1. How rebasing and cherry-picking work

  2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
     superset of the renames on MERGE_SIDE1 for the next pick.

  3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
     a rename on MERGE_SIDE1 for the next pick

  4. A detailed description of the counter-examples to #3.

  5. Why the special cases in #4 are still fully reasonable to use to pair
     up files for three-way content merging in the merge machinery, and why
     they do not affect the correctness of the merge.

  6. Interaction with skipping of "irrelevant" renames

  7. Additional items that need to be cached

  8. How directory rename detection interacts with the above and why this
     optimization is still safe even if merge.directoryRenames is set to
     "true".


=== 0. Assumptions ===

There are two assumptions that will hold throughout this document:

  * The upstream side where commits are transplanted to is treated as the
    first parent side when rebase/cherry-pick call the merge machinery

  * All merges are fully automatic

and a third that will hold in sections 2-5 for simplicity, that I'll later
address in section 8:

  * No directory renames occur


Let me explain more about each assumption and why I include it:


The first assumption is merely for the purposes of making this document
clearer; the optimization implementation does not actually depend upon it.
However, the assumption does hold in all cases because it reflects the way
that both rebase and cherry-pick were implemented; and the implementations
of cherry-pick and rebase are not readily changeable for backwards
compatibility reasons (see for example the discussion of the --ours and
--theirs flags in the documentation of `git checkout`, particularly the
comments about how they behave with rebase). The optimization avoids
checking first-parent-ness, though. It checks the conditions that make the
optimization valid instead, so it would still continue working if someone
changed the parent ordering that cherry-pick and rebase use. But making
this assumption does make this document much clearer and prevents me from
having to repeat every example twice.

If the second assumption is violated, then the optimization simply is
turned off and thus isn't relevant to consider. The second assumption can
also be stated as "there is no interruption for a user to resolve conflicts
or to just further edit or tweak files". While real rebases and
cherry-picks are often interrupted (either because it's an interactive
rebase where the user requested to stop and edit, or because there were
conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or
cherry-pick stops for the user to resolve the operation.

The third assumption makes sections 2-5 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 8. It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
away, and thus that there won't be an optimization to apply. So, the only
reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 8.


=== 1. How rebasing and cherry-picking work ===

Consider the following setup (from the git-rebase manpage):

          A---B---C topic
         /
    D---E---F---G main

After rebasing or cherry-picking topic onto main, this will appear as:

                  A'--B'--C' topic
                 /
    D---E---F---G main

The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be:

  To create A':
     MERGE_BASE:   E
     MERGE_SIDE1:  G
     MERGE_SIDE2:  A

  To create B':
     MERGE_BASE:   A
     MERGE_SIDE1:  A'
     MERGE_SIDE2:  B

  To create C':
     MERGE_BASE:   B
     MERGE_SIDE1:  B'
     MERGE_SIDE2:  C

Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a
slightly different light. For example, in creating C', you can view it as
either:

  * Apply the changes between B & C to B'
  * Apply the changes between B & B' to C

Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.

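The pattern in the three merges listed above can be written down as a small
Python sketch. The function name pick_inputs and the use of plain string
labels for commits are assumptions made for illustration only; nothing here
is taken from git's source.

```python
def pick_inputs(onto, fork_point, picks):
    """Return (base, side1, side2, result) labels for each pick.

    onto       -- tip of the upstream side (G in the diagram)
    fork_point -- parent of the first commit to pick (E in the diagram)
    picks      -- commits to transplant, oldest first (A, B, C)
    """
    merges = []
    base, side1 = fork_point, onto
    for commit in picks:
        result = commit + "'"
        merges.append((base, side1, commit, result))
        # The next pick uses this commit as its merge base and the commit
        # that was just created as its first-parent side.
        base, side1 = commit, result
    return merges


for base, side1, side2, result in pick_inputs("G", "E", ["A", "B", "C"]):
    print(f"To create {result}: MERGE_BASE={base} "
          f"MERGE_SIDE1={side1} MERGE_SIDE2={side2}")
```

After the first pick, MERGE_BASE and MERGE_SIDE1 are always the previous
MERGE_SIDE2 and the previous result. The rest of this document relies on
that relationship.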

=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
=== superset of the renames on MERGE_SIDE1 for the next pick. ===

The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
filename under one of three conditions:

  * To make both pieces of a conflict available to a user during conflict
    resolution (examples: directory/file conflict, add/add type conflict
    such as symlink vs. regular file)

  * When MERGE_SIDE1 renames the file.

  * When MERGE_SIDE2 renames the file.

First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:

  To create A':
     MERGE_BASE:   E
     MERGE_SIDE1:  G
     MERGE_SIDE2:  A

  To create B':
     MERGE_BASE:   A
     MERGE_SIDE1:  A'
     MERGE_SIDE2:  B

So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.

A' is created by the first merge. A' will only have renames for one of the
three reasons listed above. The first case, a conflict, results in a
situation where the cache is dropped and thus this optimization doesn't
take effect, so we need not consider that case. The third case, a rename
on MERGE_SIDE2 (i.e. from E to A), will show up in A' but it also shows up
in A -- therefore when diffing A and A' that path does not show up as a
rename. The only remaining way for renames to show up in A' is for the
rename to come from MERGE_SIDE1. Therefore, all renames between A and A'
are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'.


=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ ===
=== always also a rename on MERGE_SIDE1 for the next pick. ===

Let's again look at the first two picks:

  To create A':
     MERGE_BASE:   E
     MERGE_SIDE1:  G
     MERGE_SIDE2:  A

  To create B':
     MERGE_BASE:   A
     MERGE_SIDE1:  A'
     MERGE_SIDE2:  B

Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:
    E:oldfile
    G:newfile
    A:oldfile
and produce a new result:
    A':newfile

Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict
that will cause the rebase or cherry-pick operation to halt and drop the
in-memory cache of renames and thus doesn't need to be considered further.
In the special case that E->A does rename the file but also renames it to
newfile, then there is no conflict from the renaming and the merge can
succeed. In this special case, the rename is not valid to cache because
the second merge will find A:newfile in the MERGE_BASE (see also the new
testcases in t6429 with "rename same file identically" in their
description). So a rename/rename(1to1) needs to be specially handled by
pruning renames from the cache and decrementing the dir_rename_counts in
the current and leading directories associated with those renames. Or,
since these are really rare, one could just take the easy way out and
disable the remembering renames optimization when a rename/rename(1to1)
happens.

The previous paragraph handled the cases for E->A renaming oldfile; let's
continue assuming that oldfile is not renamed in A.

As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
to A'. So, we are curious whether A:oldfile and A':newfile will be viewed
as renames. Note that:

  * There will be no A':oldfile (because there could not have been a
    G:oldfile as we do not do break detection in the merge machinery and
    G:newfile was detected as a rename, and by the construction of the
    rename above that merged cleanly, the merge machinery will ensure there
    is no 'oldfile' in the result).

  * There will be no A:newfile (if there had been, we would have had a
    rename/add conflict).

  * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
    clean three-way content merge involving A:oldfile).

We can also expound on the third point above, by noting that three-way
content merges can also be viewed as applying the differences between the
base and one side to the other side. Thus we can view A':newfile as
having been created by taking the changes between E:oldfile and G:newfile
(which were detected as being related, i.e. <50% changed) and applying
them to A:oldfile.

Thus A:oldfile and A':newfile are just as related as E:oldfile and
G:newfile are -- they have exactly identical differences. Since the latter
were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.


=== 4. A detailed description of the counter-examples to #3. ===

We already noted in section 3 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?

Let's repeat an earlier point that section 3 made:

  A':newfile was created by applying the changes between E:oldfile and
  G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
  <50% of the size of E:oldfile.

If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
renames. However, if there is a dramatic size reduction between E:oldfile
and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.

Here's an example where that can happen:
  * E:oldfile had 20 lines
  * G:newfile added 10 new lines at the beginning of the file
  * A:oldfile kept the first 3 lines of the file, and deleted all the rest
then
  => A':newfile would have 13 lines, 3 of which match those in A:oldfile.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be.

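The arithmetic behind this example can be checked with a rough Python
sketch. The similarity() function below is a simplified, line-based
stand-in chosen for illustration (shared lines divided by the size of the
larger file); it is an assumption of this sketch and is not git's actual
scoring code. The 50% threshold is the one used throughout this document.

```python
from collections import Counter

RENAME_THRESHOLD = 0.5  # files must be at least 50% similar to be paired


def similarity(src_lines, dst_lines):
    """Shared lines (as a multiset) divided by the size of the larger file."""
    common = sum((Counter(src_lines) & Counter(dst_lines)).values())
    return common / max(len(src_lines), len(dst_lines))


e_old = [f"original line {i}" for i in range(1, 21)]  # E:oldfile, 20 lines
added = [f"added line {i}" for i in range(1, 11)]     # 10 lines added by G
g_new = added + e_old                                 # G:newfile, 30 lines
a_old = e_old[:3]                                     # A:oldfile, 3 lines
a_new = added + a_old                                 # A':newfile, 13 lines

for label, src, dst in (("E:oldfile -> G:newfile", e_old, g_new),
                        ("A:oldfile -> A':newfile", a_old, a_new)):
    score = similarity(src, dst)
    verdict = "rename" if score >= RENAME_THRESHOLD else "not a rename"
    print(f"{label}: {score:.0%} similar, {verdict}")
```

Under this simplified measure the first pair shares 20 of 30 lines and
clears the threshold, while the second pair shares only 3 of 13 lines and
does not, even though the same 10 lines were added in both cases.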

=== 5. Why the special cases in #4 are still fully reasonable to use to ===
=== pair up files for three-way content merging in the merge machinery, ===
=== and why they do not affect the correctness of the merge. ===

In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename. However, files with the same filename
are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 3 and 4. The remainder of this section
will be devoted to that case as well.

So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons:

  * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
    is *exactly* the same as the diff between E:oldfile and G:newfile. The
    latter pair were detected as renames, so it seems unlikely to surprise
    users for us to treat A:oldfile and A':newfile as renames.

  * In fact, "oldfile" and "newfile" were at one point detected as renames
    due to how they were constructed in the E..G chain. And we used that
    information once already in this rebase/cherry-pick. I think users
    would be unlikely to be surprised at us continuing to treat the files
    as renames and would quickly understand why we had done so.

  * Marking or declaring files as renames is *not* the end goal for merges.
    Merges use renames to determine which files make sense to be paired up
    for three-way content merges.

  * A:oldfile and A':newfile were _already_ paired up in a three-way
    content merge; that is how A':newfile was created. In fact, that
    three-way content merge was clean. So using them again in a later
    three-way content merge seems very reasonable.

However, the above is focusing on the common scenarios. Let's try to look
at all possible unusual scenarios and compare the behavior without the
optimization to the behavior with it. Consider the following theoretical
cases; we will then dive into each to determine which of them are
possible, and if so, what they mean:

  1. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge also results in a conflict.
     Questions: Are the conflicts confusingly different? Better in one case?

  2. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge also results in NO conflict.
     Questions: Are the merges the same?

  3. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge results in NO conflict.
     Questions: Possible? Bug, bugfix, or something else?

  4. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge results in a conflict.
     Questions: Possible? Bug, bugfix, or something else?

I'll consider all four cases, but out of order.

The fourth case is impossible. For the code without the remembering
renames optimization to not get a conflict, B:oldfile would need to exactly
match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
If A:oldfile matches B:oldfile exactly, then a three-way content merge
between A:oldfile, A':newfile, and B:oldfile would have no conflict and
just give us the version of newfile from A' as the result.

From the same logic as the above paragraph, the second case would indeed
result in identical merges. When A:oldfile exactly matches B:oldfile, an
undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
and the other side deleted it. I'll delete it. And I see you have this
brand new file named 'newfile' in A', so I'll keep it." That gives the
same results as three-way content merging A:oldfile, A':newfile, and
B:oldfile -- a removal of oldfile with the version of newfile from A'
showing up in the result.

The third case is interesting. It means that A:oldfile and A':newfile were
not just similar enough, but that the changes between them did not conflict
with the changes between A:oldfile and B:oldfile. This would validate our
hunch that the files were similar enough to be used in a three-way content
merge, and thus seems entirely correct for us to have used them that way.
(Sidenote: One particular example here may be enlightening. Let's say that
B was an immediate revert of A. B clearly would have been a clean revert
of A, since A was B's immediate parent. One would assume that if you can
pick a commit, you should also be able to cherry-pick its immediate revert.
However, this is one of those funny corner cases; without this
optimization, we just successfully picked a commit cleanly, but we are
unable to cherry-pick its immediate revert due to the size differences
between E:oldfile and A:oldfile.)

That leaves only the first case to consider -- when we get conflicts both
with and without the optimization. Without the optimization, we'll have a
modify/delete conflict, where both A':newfile and B:oldfile are left in the
tree for the user to deal with and no hints about the potential similarity
between the two. With the optimization, we'll have a three-way content
merged A:oldfile, A':newfile, and B:oldfile with conflict markers
suggesting we thought the files were related but giving the user the chance
to resolve. As noted above, I don't think users will find us treating
'oldfile' and 'newfile' as related as a surprise since they were treated
as related between E and G. In any event, though, this case shouldn't be
concerning since we hit a conflict in both cases, told the user what we
know, and asked them to resolve it.

So, in summary, case 4 is impossible, case 2 yields the same behavior, and
cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.


=== 6. Interaction with skipping of "irrelevant" renames ===

Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits:

  * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
                need rename detection", 2021-03-11)
  * 2fd9eda462 ("merge-ort: precompute whether directory rename
                detection is needed", 2021-03-11)
  * 9bd342137e ("diffcore-rename: determine which relevant_sources are
                no longer relevant", 2021-03-13)

Relevance is always determined by what the _other_ side of history has
done, in terms of modifying a file that our side renamed, or adding a
file to a directory which our side renamed. This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
next commit.

The upshot of this is that we can only cache rename detection results
for relevant paths, and need to re-check relevance in subsequent
commits. If those subsequent commits have additional paths that are
relevant for rename detection, then we will need to redo rename
detection -- though we can limit it to the paths for which we have not
already detected renames.

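The re-check described above can be sketched in Python as a set
difference. The helper name sources_needing_detection and the example paths
are hypothetical; only the idea of a cache keyed by source path, where an
entry of None records a path already known to be an unpaired delete, mirrors
the cached_pairs description in section 7.

```python
def sources_needing_detection(relevant_sources, cached_pairs):
    """Return the relevant source paths that have no cached answer yet.

    cached_pairs maps a source path to its rename destination, or to None
    when the path was already determined to be an unpaired delete.
    """
    return sorted(set(relevant_sources) - set(cached_pairs))


# Results remembered from picking the first commit (hypothetical paths).
cached_pairs = {"old/a.c": "new/a.c", "gone.c": None}

# Paths that are relevant when picking the second commit.
relevant_for_pick2 = ["old/a.c", "gone.c", "old/b.c"]

print(sources_needing_detection(relevant_for_pick2, cached_pairs))
```

Only old/b.c would need to go through rename detection again here; the
other two paths already have cached answers.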

=== 7. Additional items that need to be cached ===

It turns out we have to cache more than just renames; we also cache:

  A) non-renames (i.e. unpaired deletes)
  B) counts of renames within directories
  C) sources that were marked as RELEVANT_LOCATION, but which were
     downgraded to RELEVANT_NO_MORE
  D) the toplevel trees involved in the merge

These are all stored in struct rename_info, and respectively appear in
  * cached_pairs (alongside actual renames, just with a value of NULL)
  * dir_rename_counts
  * cached_irrelevant
  * merge_trees

The reason for (A) comes from the irrelevant renames skipping
optimization discussed in section 6. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
renames that are relevant for that later commit). Since unpaired
deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.

The reason for (B) is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.

The reason for (C) is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)

The reason for (D) is that this is how we determine whether the
remembering renames optimization can be used. In particular, remembering
that our sequence of merges looks like:

   Merge 1:
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A
   => Creates    A'

   Merge 2:
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
   => Creates    B'

It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So
we store the trees to compare with what we are asked to merge next
time.

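A minimal Python sketch of that comparison follows. The MergeTrees class
and the function can_reuse_cached_renames are names invented for this
illustration, trees are represented by plain strings, and only the
MERGE_SIDE1 arrangement assumed throughout this document is handled; this
is not the check as written in merge-ort.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeTrees:
    base: str
    side1: str
    side2: str


def can_reuse_cached_renames(prev: Optional[MergeTrees],
                             prev_result: Optional[str],
                             cur: MergeTrees) -> bool:
    """True if the cached MERGE_SIDE1 renames apply to the current merge.

    prev        -- trees of the previous merge, or None if there was none
    prev_result -- tree produced by the previous merge, or None if that
                   merge was not clean (the cache is dropped in that case)
    """
    if prev is None or prev_result is None:
        return False
    # The previous MERGE_SIDE2 must be the new MERGE_BASE, and the previous
    # result must be the new MERGE_SIDE1.
    return cur.base == prev.side2 and cur.side1 == prev_result


merge1 = MergeTrees(base="E", side1="G", side2="A")
merge2 = MergeTrees(base="A", side1="A'", side2="B")
print(can_reuse_cached_renames(merge1, "A'", merge2))
```

For the Merge 1 and Merge 2 shown above, the check succeeds because A and
A' reappear as the base and first side of the second merge.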

=== 8. How directory rename detection interacts with the above and ===
=== why this optimization is still safe even if ===
=== merge.directoryRenames is set to "true". ===

As noted in the assumptions section:

    """
    ...if directory renames do occur, then the default of
    merge.directoryRenames being set to "conflict" means that the operation
    will stop for users to resolve the conflicts and the cache will be
    thrown away, and thus that there won't be an optimization to apply.
    So, the only reason we need to address directory renames specifically,
    is that some users will have set merge.directoryRenames to "true" to
    allow the merges to continue to proceed automatically.
    """

Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section
1:

  First pick does this three-way merge:
    MERGE_BASE:   E
    MERGE_SIDE1:  G
    MERGE_SIDE2:  A
    => creates A'

  Second pick does this three-way merge:
    MERGE_BASE:   A
    MERGE_SIDE1:  A'
    MERGE_SIDE2:  B
    => creates B'

Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
directory, then the merge (with merge.directoryRenames=true) can move the
file into the new directory. There are two qualitatively different ways to
add a new file to an old directory: create a new file, or rename a file
into that directory. Also, directory renames can be done on either side of
history, so there are four cases to consider:

  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
  * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
  * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

One last note before we consider these four cases: There are some
important properties about how we implement this optimization with
respect to directory rename detection that we need to bear in mind
while considering all of these cases:

  * rename caching occurs *after* applying directory renames

  * a rename created by directory rename detection is recorded for the side
    of history that did the directory rename.

  * dir_rename_counts, the nested map of
        {oldname => {newname => count}},
    is cached between runs as well. This basically means that directory
    rename detection is also cached, though only on the side of history
    that we cache renames for (MERGE_SIDE1 as far as this document is
    concerned; see the assumptions section). Two interesting sub-notes
    about these counts:

    * If we need to perform rename-detection again on the given side (e.g.
      some paths are relevant for rename detection that weren't before),
      then we clear dir_rename_counts and recompute it, making use of
      cached_pairs. The reason it is important to do this is optimizations
      around RELEVANT_LOCATION exist to prevent us from computing
      unnecessary renames for directory rename detection and from computing
      dir_rename_counts for irrelevant directories; but those same renames
      or directories may become necessary for subsequent merges. The
      easiest way to "fix up" dir_rename_counts in such cases is to just
      recompute it.

    * If we prune rename/rename(1to1) entries from the cache, then we also
      need to update dir_rename_counts to decrement the counts for the
      involved directory and any relevant parent directories (to undo what
      update_dir_rename_counts() in diffcore-rename.c incremented when the
      rename was initially found). If we instead just disable the
      remembering renames optimization when the exceedingly rare
      rename/rename(1to1) cases occur, then dir_rename_counts will get
      re-computed the next time rename detection occurs, as noted above.

  * the side with multiple commits to pick is the side of history that we
    do NOT cache renames for. Thus, there are no additional commits to
    change the number of renames in a directory, except for those done by
    directory rename detection (which always pad the majority).

  * the "renames" we cache are modified slightly by any directory rename,
    as noted below.

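To make the {oldname => {newname => count}} map and the decrement-on-prune
step more concrete, here is a simplified Python sketch. The helpers
dir_pairs and adjust_counts, and the rule used for walking up to parent
directories, are assumptions made for this illustration; they are loosely
inspired by the description above and are not a port of
update_dir_rename_counts().

```python
import posixpath
from collections import defaultdict


def dir_pairs(old_path, new_path):
    """Yield the (old_dir, new_dir) pairs implied by one file rename."""
    old_dir = posixpath.dirname(old_path)
    new_dir = posixpath.dirname(new_path)
    while old_dir != new_dir:
        yield old_dir, new_dir
        # Keep climbing only while the component being stripped matches,
        # e.g. a/b/c/d -> a/b/x/d also suggests a/b/c -> a/b/x.
        if posixpath.basename(old_dir) != posixpath.basename(new_dir):
            break
        old_dir = posixpath.dirname(old_dir)
        new_dir = posixpath.dirname(new_dir)


def adjust_counts(counts, old_path, new_path, delta):
    """Add delta (+1 to record, -1 to prune) for every affected directory."""
    for old_dir, new_dir in dir_pairs(old_path, new_path):
        counts[old_dir][new_dir] += delta
        if counts[old_dir][new_dir] == 0:
            del counts[old_dir][new_dir]
            if not counts[old_dir]:
                del counts[old_dir]


counts = defaultdict(lambda: defaultdict(int))
adjust_counts(counts, "olddir/a.c", "newdir/a.c", +1)
adjust_counts(counts, "olddir/b.c", "newdir/b.c", +1)
# Pruning a rename/rename(1to1) entry undoes what recording it added.
adjust_counts(counts, "olddir/b.c", "newdir/b.c", -1)
print({old: dict(new) for old, new in counts.items()})
```

The important property is symmetry: pruning walks exactly the same
directory pairs that recording did, so the counts return to what they would
have been had the pruned rename never been found.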

Now, with those notes out of the way, let's go through the four cases
in order:

Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir

  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
    MERGE_SIDE1:  G,   Renames olddir/ -> newdir/
    MERGE_SIDE2:  A,   Adds olddir/newfile
    => creates    A',  With newdir/newfile

    MERGE_BASE:   A,   Has olddir/newfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Modifies olddir/newfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.

Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir |
||||
|
||||
This case looks like this: |
||||
MERGE_BASE: E oldfile, olddir/ |
||||
MERGE_SIDE1: G oldfile, olddir/ -> newdir/ |
||||
MERGE_SIDE2: A oldfile -> olddir/newfile |
||||
=> creates A', With newdir/newfile representing original oldfile |
||||
|
||||
MERGE_BASE: A olddir/newfile |
||||
MERGE_SIDE1: A' newdir/newfile |
||||
MERGE_SIDE2: B modify olddir/newfile |
||||
=> expected B', with threeway-merged newdir/newfile from above |
||||
|
||||
In this case, with the optimization, note that after the first commit: |
||||
* MERGE_SIDE1 remembers olddir/ -> newdir/ |
||||
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile |
||||
(NOT oldfile -> newdir/newfile; compare to case with |
||||
(p->status == 'R' && new_path) in possibly_cache_new_pair()) |
||||
|
||||
Given the cached rename noted above, the second merge can proceed as |
||||
expected without needing to perform rename detection from A -> A'. |
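
How the second merge of Case 2 could consult the remembered pair can be
sketched as a simple lookup. The tab-separated list and the function
name lookup_cached_rename are stand-ins invented for this example; the
real cache lives in in-memory data structures inside merge-ort.

```shell
# Hypothetical stand-in for the rename cache: one "source<TAB>target"
# line per rename remembered for MERGE_SIDE1.
cached_renames=$(printf "olddir/newfile\tnewdir/newfile")

# Print the cached target for path $1; fail if nothing is cached for it.
lookup_cached_rename () {
	printf "%s\n" "$cached_renames" |
	awk -F "\t" -v src="$1" '
		$1 == src { print $2; found = 1 }
		END { exit !found }
	'
}

# Second merge of Case 2: B modifies olddir/newfile, which A' holds as
# newdir/newfile.
if target=$(lookup_cached_rename olddir/newfile)
then
	echo "reuse cached rename: olddir/newfile -> $target"
else
	echo "nothing cached; rename detection would be needed"
fi
```

Note that a lookup keyed on oldfile would find nothing, which matches
the point above that oldfile -> newdir/newfile is not the pair to cache.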

Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir

  This case looks like this:

  MERGE_BASE:   E,   Has olddir/
  MERGE_SIDE1:  G,   Adds olddir/newfile
  MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
  => creates    A',  With newdir/newfile

  MERGE_BASE:   A,   Has newdir/, but no notion of newdir/newfile
  MERGE_SIDE1:  A',  Has newdir/newfile
  MERGE_SIDE2:  B,   Has newdir/, but no notion of newdir/newfile
  => expected   B',  with newdir/newfile from A'

  In this case, with the optimization, note that after the first commit there
  were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
  But the second merge didn't need any renames so this is fine.

Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

  This case looks like this:

  MERGE_BASE:   E,   Has oldfile, olddir/
  MERGE_SIDE1:  G,   Renames oldfile -> olddir/newfile
  MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
  => creates    A',  With newdir/newfile representing original oldfile

  MERGE_BASE:   A,   Has oldfile
  MERGE_SIDE1:  A',  Has newdir/newfile
  MERGE_SIDE2:  B,   Modifies oldfile
  => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers oldfile -> newdir/newfile
      (NOT oldfile -> olddir/newfile; compare to case of second
       block under p->status == 'R' in possibly_cache_new_pair())
    * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
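
Cases 2 and 4 cache different source paths for the same final location.
The contrast can be sketched as follows; pair_to_cache is a made-up
name, the paths are the ones used in the cases above, and none of this
is the logic of possibly_cache_new_pair() itself.

```shell
# Hypothetical sketch (not git code): which pair is remembered for
# MERGE_SIDE1 when a file rename oldfile -> olddir/newfile meets a
# directory rename olddir/ -> newdir/ made by the other side.
#   $1: the side that performed the file rename (1 or 2)
pair_to_cache () {
	old_path=oldfile
	renamed_path=olddir/newfile
	final_path=newdir/${renamed_path#olddir/}
	case "$1" in
	1)
		# Case 4: the next merge base (A) still has oldfile, so
		# the original name is the source to remember.
		echo "$old_path -> $final_path"
		;;
	2)
		# Case 2: the next merge base (A) already has
		# olddir/newfile, so that name is the source to remember.
		echo "$renamed_path -> $final_path"
		;;
	*)
		return 1
		;;
	esac
}

pair_to_cache 2
pair_to_cache 1
```

In both cases the source is whatever name the file has in the next merge
base, and the target is the name it has in the newly created commit.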

Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization mean we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself.  In such a case, we do not know whether the directory was
renamed; we want to be careful to avoid caching some kind of "this
directory was not renamed" statement.  If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
our erroneous "this directory was not renamed" cache would preclude.
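
The distinction between "unknown" and "not renamed" can be sketched as
below. The variable olddir_renamed_to and the function dir_rename_status
are assumptions made for this example; the point is only that the cache
records positive knowledge and treats absence as "unknown".

```shell
# Hypothetical stand-in for cached directory-rename knowledge.  An empty
# value means "no information yet"; it is never read as "not renamed".
olddir_renamed_to=""

dir_rename_status () {
	if test -n "$olddir_renamed_to"
	then
		echo "renamed to $olddir_renamed_to"
	else
		echo "unknown"
	fi
}

# First pick: no file under olddir/ was relevant, so no renames were
# detected there and nothing is recorded.
dir_rename_status

# A later pick adds a file to olddir/, detection runs for those paths,
# and only then is the directory rename recorded.
olddir_renamed_to=newdir/
dir_rename_status
```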
@@ -0,0 +1,700 @@
#!/bin/sh

test_description="remember regular & dir renames in sequence of merges"

. ./test-lib.sh

#
# NOTE 1: this testfile tends to not only rename files, but modify on both
#         sides; without modifying on both sides, optimizations can kick in
#         which make rename detection irrelevant or trivial.  We want to make
#         sure that we are triggering rename caching rather than rename
#         bypassing.
#
# NOTE 2: this testfile uses 'test-tool fast-rebase' instead of either
#         cherry-pick or rebase.  sequencer.c is only superficially
#         integrated with merge-ort; it calls merge_switch_to_result()
#         after EACH merge, which updates the index and working copy AND
#         throws away the cached results (because merge_switch_to_result()
#         is only supposed to be called at the end of the sequence).
#         Integrating them more deeply is a big task, so for now the tests
#         use 'test-tool fast-rebase'.
#

#
# In the following simple testcase:
#   Base:     numbers_1, values_1
#   Upstream: numbers_2, values_2
#   Topic_1:  sequence_3
#   Topic_2:  scruples_3
# or, in English, rename numbers -> sequence in the first commit, and rename
# values -> scruples in the second commit.
#
# This shouldn't be a challenge; it's just verifying that cached renames
# aren't preventing us from finding new renames.
#
test_expect_success 'caching renames does not preclude finding new ones' '
	test_create_repo caching-renames-and-new-renames &&
	(
		cd caching-renames-and-new-renames &&

		test_seq 2 10 >numbers &&
		test_seq 2 10 >values &&
		git add numbers values &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch upstream &&
		test_seq 1 10 >numbers &&
		test_seq 1 10 >values &&
		git add numbers values &&
		git commit -m "Tweaked both files" &&

		git switch topic &&

		test_seq 2 12 >numbers &&
		git add numbers &&
		git mv numbers sequence &&
		git commit -m A &&

		test_seq 2 12 >values &&
		git add values &&
		git mv values scruples &&
		git commit -m B &&

		#
		# Actual testing
		#

		git switch upstream &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream~1..topic

		git ls-files >tracked-files &&
		test_line_count = 2 tracked-files &&
		test_seq 1 12 >expect &&
		test_cmp expect sequence &&
		test_cmp expect scruples
	)
'

#
# In the following testcase:
#   Base:     numbers_1
#   Upstream: rename numbers_1 -> sequence_2
#   Topic_1:  numbers_3
#   Topic_2:  numbers_1
# or, in English, the first commit on the topic branch modifies numbers by
# shrinking it (dramatically) and the second commit on topic reverts its
# parent.
#
# Can git apply both patches?
#
# Traditional cherry-pick/rebase will fail to apply the second commit, the
# one that reverted its parent, because despite detecting the rename from
# 'numbers' to 'sequence' for the first commit, it fails to detect that
# rename when picking the second commit.  That's "reasonable" given the
# dramatic change in size of the file, but remembering the rename and
# reusing it is reasonable too.
#
# We also test here that rename detection is only run once in total
# (the topic side of history doesn't need renames, and with caching we
# should be able to only run rename detection on the upstream side one
# time.)
test_expect_success 'cherry-pick both a commit and its immediate revert' '
	test_create_repo pick-commit-and-its-immediate-revert &&
	(
		cd pick-commit-and-its-immediate-revert &&

		test_seq 11 30 >numbers &&
		git add numbers &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch upstream &&
		test_seq 1 30 >numbers &&
		git add numbers &&
		git mv numbers sequence &&
		git commit -m "Renamed (and modified) numbers -> sequence" &&

		git switch topic &&

		test_seq 11 13 >numbers &&
		git add numbers &&
		git commit -m A &&

		git revert HEAD &&

		#
		# Actual testing
		#

		git switch upstream &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream~1..topic &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 1 calls
	)
'

#
# In the following testcase:
#   Base:     sequence_1
#   Upstream: rename sequence_1 -> values_2
#   Topic_1:  rename sequence_1 -> values_3
#   Topic_2:  add unrelated sequence_4
# or, in English, both sides rename sequence -> values, and then the second
# commit on the topic branch adds an unrelated file called sequence.
#
# This testcase presents no problems for git traditionally, but having both
# sides do the same rename in effect "uses it up"; if the rename remained
# cached, it could cause a spurious rename/add conflict.
#
test_expect_success 'rename same file identically, then reintroduce it' '
	test_create_repo rename-rename-1to1-then-add-old-filename &&
	(
		cd rename-rename-1to1-then-add-old-filename &&

		test_seq 3 8 >sequence &&
		git add sequence &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch upstream &&
		test_seq 1 8 >sequence &&
		git add sequence &&
		git mv sequence values &&
		git commit -m "Renamed (and modified) sequence -> values" &&

		git switch topic &&

		test_seq 3 10 >sequence &&
		git add sequence &&
		git mv sequence values &&
		git commit -m A &&

		test_write_lines A B C D E F G H I J >sequence &&
		git add sequence &&
		git commit -m B &&

		#
		# Actual testing
		#

		git switch upstream &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream~1..topic &&

		git ls-files >tracked &&
		test_line_count = 2 tracked &&
		test_path_is_file values &&
		test_path_is_file sequence &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 2 calls
	)
'

#
# In the following testcase:
#   Base:     olddir/{valuesZ_1, valuesY_1, valuesX_1}
#   Upstream: rename olddir/valuesZ_1 -> dirA/valuesZ_2
#             rename olddir/valuesY_1 -> dirA/valuesY_2
#             rename olddir/valuesX_1 -> dirB/valuesX_2
#   Topic_1:  rename olddir/valuesZ_1 -> dirA/valuesZ_3
#             rename olddir/valuesY_1 -> dirA/valuesY_3
#   Topic_2:  add olddir/newfile
#   Expected Pick1: dirA/{valuesZ, valuesY}, dirB/valuesX
#   Expected Pick2: dirA/{valuesZ, valuesY}, dirB/{valuesX, newfile}
#
# This testcase presents no problems for git traditionally, but having both
# sides do the same renames in effect "uses them up"; if the renames
# remained cached, the directory rename could put newfile in the wrong
# directory.
#
test_expect_success 'rename same file identically, then add file to old dir' '
	test_create_repo rename-rename-1to1-then-add-file-to-old-dir &&
	(
		cd rename-rename-1to1-then-add-file-to-old-dir &&

		mkdir olddir/ &&
		test_seq 3 8 >olddir/valuesZ &&
		test_seq 3 8 >olddir/valuesY &&
		test_seq 3 8 >olddir/valuesX &&
		git add olddir &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch upstream &&
		test_seq 1 8 >olddir/valuesZ &&
		test_seq 1 8 >olddir/valuesY &&
		test_seq 1 8 >olddir/valuesX &&
		git add olddir &&
		mkdir dirA &&
		git mv olddir/valuesZ olddir/valuesY dirA &&
		git mv olddir/ dirB/ &&
		git commit -m "Renamed (and modified) values*" &&

		git switch topic &&

		test_seq 3 10 >olddir/valuesZ &&
		test_seq 3 10 >olddir/valuesY &&
		git add olddir &&
		mkdir dirA &&
		git mv olddir/valuesZ olddir/valuesY dirA &&
		git commit -m A &&

		>olddir/newfile &&
		git add olddir/newfile &&
		git commit -m B &&

		#
		# Actual testing
		#

		git switch upstream &&
		git config merge.directoryRenames true &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream~1..topic &&

		git ls-files >tracked &&
		test_line_count = 4 tracked &&
		test_path_is_file dirA/valuesZ &&
		test_path_is_file dirA/valuesY &&
		test_path_is_file dirB/valuesX &&
		test_path_is_file dirB/newfile &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 3 calls
	)
'

#
# In the following testcase, upstream renames a directory, and the topic branch
# first adds a file to the directory, then later renames the directory
# differently:
#   Base:     olddir/a
#             olddir/b
#   Upstream: rename olddir/ -> newdir/
#   Topic_1:  add olddir/newfile
#   Topic_2:  rename olddir/ -> otherdir/
#
# Here we are just concerned that cached renames might prevent us from seeing
# the rename conflict, and we want to ensure that we do get a conflict.
#
# While at it, though, we do test that we only try to detect renames two
# times and not three.  (The first merge needs to detect renames on the
# upstream side.  Traditionally, the second merge would need to detect
# renames on both sides of history, but our caching of upstream renames
# should avoid the need to re-detect upstream renames.)
#
test_expect_success 'cached dir rename does not prevent noticing later conflict' '
	test_create_repo dir-rename-cache-not-occluding-later-conflict &&
	(
		cd dir-rename-cache-not-occluding-later-conflict &&

		mkdir olddir &&
		test_seq 3 10 >olddir/a &&
		test_seq 3 10 >olddir/b &&
		git add olddir &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch upstream &&
		test_seq 3 10 >olddir/a &&
		test_seq 3 10 >olddir/b &&
		git add olddir &&
		git mv olddir newdir &&
		git commit -m "Dir renamed" &&

		git switch topic &&

		>olddir/newfile &&
		git add olddir/newfile &&
		git commit -m A &&

		test_seq 1 8 >olddir/a &&
		test_seq 1 8 >olddir/b &&
		git add olddir &&
		git mv olddir otherdir &&
		git commit -m B &&

		#
		# Actual testing
		#

		git switch upstream &&
		git config merge.directoryRenames true &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output &&
		#git cherry-pick upstream..topic &&

		grep CONFLICT..rename/rename output &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 2 calls
	)
'

# Helper for the next two tests
test_setup_upstream_rename () {
	test_create_repo "$1" &&
	(
		cd "$1" &&

		test_seq 3 8 >somefile &&
		test_seq 3 8 >relevant-rename &&
		git add somefile relevant-rename &&
		mkdir olddir &&
		test_write_lines a b c d e f g >olddir/a &&
		test_write_lines z y x w v u t >olddir/b &&
		git add olddir &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch upstream &&
		test_seq 1 8 >somefile &&
		test_seq 1 8 >relevant-rename &&
		git add somefile relevant-rename &&
		git mv relevant-rename renamed &&
		echo h >>olddir/a &&
		echo s >>olddir/b &&
		git add olddir &&
		git mv olddir newdir &&
		git commit -m "Dir renamed"
	)
}

#
# In the following testcase, upstream renames a file in the toplevel directory
# as well as its only directory:
#   Base:     relevant-rename_1
#             somefile
#             olddir/a
#             olddir/b
#   Upstream: rename relevant-rename_1 -> renamed_2
#             rename olddir/           -> newdir/
#   Topic_1:  relevant-rename_3
#   Topic_2:  olddir/newfile_1
#   Topic_3:  olddir/newfile_2
#
# In this testcase, since the first commit being picked only modifies a
# file in the toplevel directory, the directory rename is irrelevant for
# that first merge.  However, we need to notice the directory rename for
# the merge that picks the second commit, and we don't want the third
# commit to mess up its location either.  We want to make sure that
# olddir/newfile doesn't exist in the result and that newdir/newfile does.
#
# We also test that we only do rename detection twice.  We never need
# rename detection on the topic side of history, but we do need it twice on
# the upstream side of history.  For the first topic commit, we only need
# the
#   relevant-rename -> renamed
# rename, because olddir is unmodified by Topic_1.  For Topic_2, however,
# the new file being added to olddir means files that were previously
# irrelevant for rename detection are now relevant, forcing us to repeat
# rename detection for the paths we don't already have cached.  Topic_3 also
# tweaks olddir/newfile, but the renames in olddir/ will have been cached
# from the second rename detection run.
#
test_expect_success 'dir rename unneeded, then add new file to old dir' '
	test_setup_upstream_rename dir-rename-unneeded-until-new-file &&
	(
		cd dir-rename-unneeded-until-new-file &&

		git switch topic &&

		test_seq 3 10 >relevant-rename &&
		git add relevant-rename &&
		git commit -m A &&

		echo foo >olddir/newfile &&
		git add olddir/newfile &&
		git commit -m B &&

		echo bar >>olddir/newfile &&
		git add olddir/newfile &&
		git commit -m C &&

		#
		# Actual testing
		#

		git switch upstream &&
		git config merge.directoryRenames true &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream..topic &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 2 calls &&

		git ls-files >tracked &&
		test_line_count = 5 tracked &&
		test_path_is_missing olddir/newfile &&
		test_path_is_file newdir/newfile
	)
'

#
# The following testcase is *very* similar to the last one, but instead of
# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
#   Base:     relevant-rename_1
#             somefile_1
#             olddir/a
#             olddir/b
#   Upstream: rename relevant-rename_1 -> renamed_2
#             rename olddir/           -> newdir/
#   Topic_1:  relevant-rename_3
#   Topic_2:  rename somefile -> olddir/newfile_2
#   Topic_3:  modify olddir/newfile_3
#
# In this testcase, since the first commit being picked only modifies a
# file in the toplevel directory, the directory rename is irrelevant for
# that first merge.  However, we need to notice the directory rename for
# the merge that picks the second commit, and we don't want the third
# commit to mess up its location either.  We want to make sure that
# neither somefile nor olddir/newfile exists in the result and that
# newdir/newfile does.
#
# This testcase needs one more call to rename detection than the last
# testcase, because of the somefile -> olddir/newfile rename in Topic_2.
test_expect_success 'dir rename unneeded, then rename existing file into old dir' '
	test_setup_upstream_rename dir-rename-unneeded-until-file-moved-inside &&
	(
		cd dir-rename-unneeded-until-file-moved-inside &&

		git switch topic &&

		test_seq 3 10 >relevant-rename &&
		git add relevant-rename &&
		git commit -m A &&

		test_seq 1 10 >somefile &&
		git add somefile &&
		git mv somefile olddir/newfile &&
		git commit -m B &&

		test_seq 1 12 >olddir/newfile &&
		git add olddir/newfile &&
		git commit -m C &&

		#
		# Actual testing
		#

		git switch upstream &&
		git config merge.directoryRenames true &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream..topic &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 3 calls &&

		test_path_is_missing somefile &&
		test_path_is_missing olddir/newfile &&
		test_path_is_file newdir/newfile &&
		git ls-files >tracked &&
		test_line_count = 4 tracked
	)
'

# Helper for the next two tests
test_setup_topic_rename () {
	test_create_repo "$1" &&
	(
		cd "$1" &&

		test_seq 3 8 >somefile &&
		mkdir olddir &&
		test_seq 3 8 >olddir/a &&
		echo b >olddir/b &&
		git add olddir somefile &&
		git commit -m orig &&

		git branch upstream &&
		git branch topic &&

		git switch topic &&
		test_seq 1 8 >somefile &&
		test_seq 1 8 >olddir/a &&
		git add somefile olddir/a &&
		git mv olddir newdir &&
		git commit -m "Dir renamed" &&

		test_seq 1 10 >somefile &&
		git add somefile &&
		mkdir olddir &&
		>olddir/unrelated-file &&
		git add olddir &&
		git commit -m "Unrelated file in recreated old dir"
	)
}

#
# In the following testcase, the first commit on the topic branch renames
# a directory, while the second recreates the old directory and places a
# file into it:
#   Base:     somefile
#             olddir/a
#             olddir/b
#   Upstream: olddir/newfile
#   Topic_1:  somefile_2
#             rename olddir/ -> newdir/
#   Topic_2:  olddir/unrelated-file
#
# Note that the first pick should merge:
#   Base:     somefile
#             olddir/{a,b}
#   Upstream: olddir/newfile
#   Topic_1:  rename olddir/ -> newdir/
# For which the expected result (assuming merge.directoryRenames=true) is
# clearly:
#   Result:   somefile
#             newdir/{a, b, newfile}
#
# While the second pick does the following three-way merge:
#   Base (Topic_1):           somefile
#                             newdir/{a,b}
#   Upstream (Result from 1): same files as base, but adds newdir/newfile
#   Topic_2:                  same files as base, but adds olddir/unrelated-file
#
# The second merge is pretty trivial; upstream adds newdir/newfile, and
# topic_2 adds olddir/unrelated-file.  We're just testing that we don't
# accidentally cache directory renames somehow and rename
# olddir/unrelated-file to newdir/unrelated-file.
#
# This testcase should only need one call to diffcore_rename_extended().
test_expect_success 'caching renames only on upstream side, part 1' '
	test_setup_topic_rename cache-renames-only-upstream-add-file &&
	(
		cd cache-renames-only-upstream-add-file &&

		git switch upstream &&

		>olddir/newfile &&
		git add olddir/newfile &&
		git commit -m "Add newfile" &&

		#
		# Actual testing
		#

		git switch upstream &&

		git config merge.directoryRenames true &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream..topic &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 1 calls &&

		git ls-files >tracked &&
		test_line_count = 5 tracked &&
		test_path_is_missing newdir/unrelated-file &&
		test_path_is_file olddir/unrelated-file &&
		test_path_is_file newdir/newfile &&
		test_path_is_file newdir/b &&
		test_path_is_file newdir/a &&
		test_path_is_file somefile
	)
'

#
# The following testcase is *very* similar to the last one, but instead of
# adding a new olddir/newfile, it renames somefile -> olddir/newfile:
#   Base:     somefile
#             olddir/a
#             olddir/b
#   Upstream: somefile_1 -> olddir/newfile
#   Topic_1:  rename olddir/ -> newdir/
#             somefile_2
#   Topic_2:  olddir/unrelated-file
#             somefile_3
#
# Much like the previous test, this case is actually trivial and we are just
# making sure there isn't some spurious directory rename caching going on
# for the wrong side of history.
#
# This testcase should only need two calls to diffcore_rename_extended(),
# both for the first merge, one for each side of history.
#
test_expect_success 'caching renames only on upstream side, part 2' '
	test_setup_topic_rename cache-renames-only-upstream-rename-file &&
	(
		cd cache-renames-only-upstream-rename-file &&

		git switch upstream &&

		git mv somefile olddir/newfile &&
		git commit -m "Add newfile" &&

		#
		# Actual testing
		#

		git switch upstream &&

		git config merge.directoryRenames true &&

		GIT_TRACE2_PERF="$(pwd)/trace.output" &&
		export GIT_TRACE2_PERF &&

		test-tool fast-rebase --onto HEAD upstream~1 topic &&
		#git cherry-pick upstream..topic &&

		grep region_enter.*diffcore_rename trace.output >calls &&
		test_line_count = 2 calls &&

		git ls-files >tracked &&
		test_line_count = 4 tracked &&
		test_path_is_missing newdir/unrelated-file &&
		test_path_is_file olddir/unrelated-file &&
		test_path_is_file newdir/newfile &&
		test_path_is_file newdir/b &&
		test_path_is_file newdir/a
	)
'

test_done