diff --git a/Documentation/technical/remembering-renames.txt b/Documentation/technical/remembering-renames.txt new file mode 100644 index 0000000000..2fd5cc88e0 --- /dev/null +++ b/Documentation/technical/remembering-renames.txt @@ -0,0 +1,671 @@ +Rebases and cherry-picks involve a sequence of merges whose results are +recorded as new single-parent commits. The first parent side of those +merges represent the "upstream" side, and often include a far larger set of +changes than the second parent side. Traditionally, the renames on the +first-parent side of that sequence of merges were repeatedly re-detected +for every merge. This file explains why it is safe and effective during +rebases and cherry-picks to remember renames on the upstream side of +history as an optimization, assuming all merges are automatic and clean +(i.e. no conflicts and not interrupted for user input or editing). + +Outline: + + 0. Assumptions + + 1. How rebasing and cherry-picking work + + 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a + superset of the renames on MERGE_SIDE1 for the next pick. + + 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also + a rename on MERGE_SIDE1 for the next pick + + 4. A detailed description of the the counter-examples to #3. + + 5. Why the special cases in #4 are still fully reasonable to use to pair + up files for three-way content merging in the merge machinery, and why + they do not affect the correctness of the merge. + + 6. Interaction with skipping of "irrelevant" renames + + 7. Additional items that need to be cached + + 8. How directory rename detection interacts with the above and why this + optimization is still safe even if merge.directoryRenames is set to + "true". + + +=== 0. Assumptions === + +There are two assumptions that will hold throughout this document: + + * The upstream side where commits are transplanted to is treated as the + first parent side when rebase/cherry-pick call the merge machinery + + * All merges are fully automatic + +and a third that will hold in sections 2-5 for simplicity, that I'll later +address in section 8: + + * No directory renames occur + + +Let me explain more about each assumption and why I include it: + + +The first assumption is merely for the purposes of making this document +clearer; the optimization implementation does not actually depend upon it. +However, the assumption does hold in all cases because it reflects the way +that both rebase and cherry-pick were implemented; and the implementation +of cherry-pick and rebase are not readily changeable for backwards +compatibility reasons (see for example the discussion of the --ours and +--theirs flag in the documentation of `git checkout`, particularly the +comments about how they behave with rebase). The optimization avoids +checking first-parent-ness, though. It checks the conditions that make the +optimization valid instead, so it would still continue working if someone +changed the parent ordering that cherry-pick and rebase use. But making +this assumption does make this document much clearer and prevents me from +having to repeat every example twice. + +If the second assumption is violated, then the optimization simply is +turned off and thus isn't relevant to consider. The second assumption can +also be stated as "there is no interruption for a user to resolve conflicts +or to just further edit or tweak files". While real rebases and +cherry-picks are often interrupted (either because it's an interactive +rebase where the user requested to stop and edit, or because there were +conflicts that the user needs to resolve), the cache of renames is not +stored on disk, and thus is thrown away as soon as the rebase or cherry +pick stops for the user to resolve the operation. + +The third assumption makes sections 2-5 simpler, and allows people to +understand the basics of why this optimization is safe and effective, and +then I can go back and address the specifics in section 8. It is probably +also worth noting that if directory renames do occur, then the default of +merge.directoryRenames being set to "conflict" means that the operation +will stop for users to resolve the conflicts and the cache will be thrown +away, and thus that there won't be an optimization to apply. So, the only +reason we need to address directory renames specifically, is that some +users will have set merge.directoryRenames to "true" to allow the merges to +continue to proceed automatically. The optimization is still safe with +this config setting, but we have to discuss a few more cases to show why; +this discussion is deferred until section 8. + + +=== 1. How rebasing and cherry-picking work === + +Consider the following setup (from the git-rebase manpage): + + A---B---C topic + / + D---E---F---G main + +After rebasing or cherry-picking topic onto main, this will appear as: + + A'--B'--C' topic + / + D---E---F---G main + +The way the commits A', B', and C' are created is through a series of +merges, where rebase or cherry-pick sequentially uses each of the three +A-B-C commits in a special merge operation. Let's label the three commits +in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For +this picture, the three commits for each of the three merges would be: + +To create A': + MERGE_BASE: E + MERGE_SIDE1: G + MERGE_SIDE2: A + +To create B': + MERGE_BASE: A + MERGE_SIDE1: A' + MERGE_SIDE2: B + +To create C': + MERGE_BASE: B + MERGE_SIDE1: B' + MERGE_SIDE2: C + +Sometimes, folks are surprised that these three-way merges are done. It +can be useful in understanding these three-way merges to view them in a +slightly different light. For example, in creating C', you can view it as +either: + + * Apply the changes between B & C to B' + * Apply the changes between B & B' to C + +Conceptually the two statements above are the same as a three-way merge of +B, B', and C, at least the parts before you decide to record a commit. + + +=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a === +=== superset of the renames on MERGE_SIDE1 for the next pick. === + +The merge machinery uses the filenames it is fed from MERGE_BASE, +MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different +filename under one of three conditions: + + * To make both pieces of a conflict available to a user during conflict + resolution (examples: directory/file conflict, add/add type conflict + such as symlink vs. regular file) + + * When MERGE_SIDE1 renames the file. + + * When MERGE_SIDE2 renames the file. + +First, let's remember what commits are involved in the first and second +picks of the cherry-pick or rebase sequence: + +To create A': + MERGE_BASE: E + MERGE_SIDE1: G + MERGE_SIDE2: A + +To create B': + MERGE_BASE: A + MERGE_SIDE1: A' + MERGE_SIDE2: B + +So, in particular, we need to show that the renames between E and G are a +superset of those between A and A'. + +A' is created by the first merge. A' will only have renames for one of the +three reasons listed above. The first case, a conflict, results in a +situation where the cache is dropped and thus this optimization doesn't +take effect, so we need not consider that case. The third case, a rename +on MERGE_SIDE2 (i.e. from G to A), will show up in A' but it also shows up +in A -- therefore when diffing A and A' that path does not show up as a +rename. The only remaining way for renames to show up in A' is for the +rename to come from MERGE_SIDE1. Therefore, all renames between A and A' +are a subset of those between E and G. Equivalently, all renames between E +and G are a superset of those between A and A'. + + +=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === +=== always also a rename on MERGE_SIDE1 for the next pick. === + +Let's again look at the first two picks: + +To create A': + MERGE_BASE: E + MERGE_SIDE1: G + MERGE_SIDE2: A + +To create B': + MERGE_BASE: A + MERGE_SIDE1: A' + MERGE_SIDE2: B + +Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. +any given rename from E to G. Let's use the filenames 'oldfile' and +'newfile' for demonstration purposes. That first pick will function as +follows; when the rename is detected, the merge machinery will do a +three-way content merge of the following: + E:oldfile + G:newfile + A:oldfile +and produce a new result: + A':newfile + +Note above that I've assumed that E->A did not rename oldfile. If that +side did rename, then we most likely have a rename/rename(1to2) conflict +that will cause the rebase or cherry-pick operation to halt and drop the +in-memory cache of renames and thus doesn't need to be considered further. +In the special case that E->A does rename the file but also renames it to +newfile, then there is no conflict from the renaming and the merge can +succeed. In this special case, the rename is not valid to cache because +the second merge will find A:newfile in the MERGE_BASE (see also the new +testcases in t6429 with "rename same file identically" in their +description). So a rename/rename(1to1) needs to be specially handled by +pruning renames from the cache and decrementing the dir_rename_counts in +the current and leading directories associated with those renames. Or, +since these are really rare, one could just take the easy way out and +disable the remembering renames optimization when a rename/rename(1to1) +happens. + +The previous paragraph handled the cases for E->A renaming oldfile, let's +continue assuming that oldfile is not renamed in A. + +As per the diagram for creating B', MERGE_SIDE1 involves the changes from A +to A'. So, we are curious whether A:oldfile and A':newfile will be viewed +as renames. Note that: + + * There will be no A':oldfile (because there could not have been a + G:oldfile as we do not do break detection in the merge machinery and + G:newfile was detected as a rename, and by the construction of the + rename above that merged cleanly, the merge machinery will ensure there + is no 'oldfile' in the result). + + * There will be no A:newfile (if there had been, we would have had a + rename/add conflict). + + * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a + clean three-way content merge involving A:oldfile). + +We can also expound on the third point above, by noting that three-way +content merges can also be viewed as applying the differences between the +base and one side to the other side. Thus we can view A':newfile as +having been created by taking the changes between E:oldfile and G:newfile +(which were detected as being related, i.e. <50% changed) to A:oldfile. + +Thus A:oldfile and A':newfile are just as related as E:oldfile and +G:newfile are -- they have exactly identical differences. Since the latter +were detected as renames, A:oldfile and A':newfile should also be +detectable as renames almost always. + + +=== 4. A detailed description of the counter-examples to #3. === + +We already noted in section 3 that rename/rename(1to1) (i.e. both sides +renaming a file the same way) was one counter-example. The more +interesting bit, though, is why did we need to use the "almost" qualifier +when stating that A:oldfile and A':newfile are "almost" always detectable +as renames? + +Let's repeat an earlier point that section 3 made: + + A':newfile was created by applying the changes between E:oldfile and + G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were + <50% of the size of E:oldfile. + +If those changes that were <50% of the size of E:oldfile are also <50% of +the size of A:oldfile, then A:oldfile and A':newfile will be detectable as +renames. However, if there is a dramatic size reduction between E:oldfile +and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile +still somehow merge cleanly), then traditional rename detection would not +detect A:oldfile and A':newfile as renames. + +Here's an example where that can happen: + * E:oldfile had 20 lines + * G:newfile added 10 new lines at the beginning of the file + * A:oldfile kept the first 3 lines of the file, and deleted all the rest +then + => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. +E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and +A':newfile would not be. + + +=== 5. Why the special cases in #4 are still fully reasonable to use to === +=== pair up files for three-way content merging in the merge machinery, === +=== and why they do not affect the correctness of the merge. === + +In the rename/rename(1to1) case, A:newfile and A':newfile are not renames +since they use the *same* filename. However, files with the same filename +are obviously fine to pair up for three-way content merging (the merge +machinery has never employed break detection). The interesting +counter-example case is thus not the rename/rename(1to1) case, but the case +where A did not rename oldfile. That was the case that we spent most of +the time discussing in sections 3 and 4. The remainder of this section +will be devoted to that case as well. + +So, even if A:oldfile and A':newfile aren't detectable as renames, why is +it still reasonable to pair them up for three-way content merging in the +merge machinery? There are multiple reasons: + + * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile + is *exactly* the same as the diff between E:oldfile and G:newfile. The + latter pair were detected as renames, so it seems unlikely to surprise + users for us to treat A:oldfile and A':newfile as renames. + + * In fact, "oldfile" and "newfile" were at one point detected as renames + due to how they were constructed in the E..G chain. And we used that + information once already in this rebase/cherry-pick. I think users + would be unlikely to be surprised at us continuing to treat the files + as renames and would quickly understand why we had done so. + + * Marking or declaring files as renames is *not* the end goal for merges. + Merges use renames to determine which files make sense to be paired up + for three-way content merges. + + * A:oldfile and A':newfile were _already_ paired up in a three-way + content merge; that is how A':newfile was created. In fact, that + three-way content merge was clean. So using them again in a later + three-way content merge seems very reasonable. + +However, the above is focusing on the common scenarios. Let's try to look +at all possible unusual scenarios and compare without the optimization to +with the optimization. Consider the following theoretical cases; we will +then dive into each to determine which of them are possible, +and if so, what they mean: + + 1. Without the optimization, the second merge results in a conflict. + With the optimization, the second merge also results in a conflict. + Questions: Are the conflicts confusingly different? Better in one case? + + 2. Without the optimization, the second merge results in NO conflict. + With the optimization, the second merge also results in NO conflict. + Questions: Are the merges the same? + + 3. Without the optimization, the second merge results in a conflict. + With the optimization, the second merge results in NO conflict. + Questions: Possible? Bug, bugfix, or something else? + + 4. Without the optimization, the second merge results in NO conflict. + With the optimization, the second merge results in a conflict. + Questions: Possible? Bug, bugfix, or something else? + +I'll consider all four cases, but out of order. + +The fourth case is impossible. For the code without the remembering +renames optimization to not get a conflict, B:oldfile would need to exactly +match A:oldfile -- if it doesn't, there would be a modify/delete conflict. +If A:oldfile matches B:oldfile exactly, then a three-way content merge +between A:oldfile, A':newfile, and B:oldfile would have no conflict and +just give us the version of newfile from A' as the result. + +From the same logic as the above paragraph, the second case would indeed +result in identical merges. When A:oldfile exactly matches B:oldfile, an +undetected rename would say, "Oh, I see one side didn't modify 'oldfile' +and the other side deleted it. I'll delete it. And I see you have this +brand new file named 'newfile' in A', so I'll keep it." That gives the +same results as three-way content merging A:oldfile, A':newfile, and +B:oldfile -- a removal of oldfile with the version of newfile from A' +showing up in the result. + +The third case is interesting. It means that A:oldfile and A':newfile were +not just similar enough, but that the changes between them did not conflict +with the changes between A:oldfile and B:oldfile. This would validate our +hunch that the files were similar enough to be used in a three-way content +merge, and thus seems entirely correct for us to have used them that way. +(Sidenote: One particular example here may be enlightening. Let's say that +B was an immediate revert of A. B clearly would have been a clean revert +of A, since A was B's immediate parent. One would assume that if you can +pick a commit, you should also be able to cherry-pick its immediate revert. +However, this is one of those funny corner cases; without this +optimization, we just successfully picked a commit cleanly, but we are +unable to cherry-pick its immediate revert due to the size differences +between E:oldfile and A:oldfile.) + +That leaves only the first case to consider -- when we get conflicts both +with or without the optimization. Without the optimization, we'll have a +modify/delete conflict, where both A':newfile and B:oldfile are left in the +tree for the user to deal with and no hints about the potential similarity +between the two. With the optimization, we'll have a three-way content +merged A:oldfile, A':newfile, and B:oldfile with conflict markers +suggesting we thought the files were related but giving the user the chance +to resolve. As noted above, I don't think users will find us treating +'oldfile' and 'newfile' as related as a surprise since they were between E +and G. In any event, though, this case shouldn't be concerning since we +hit a conflict in both cases, told the user what we know, and asked them to +resolve it. + +So, in summary, case 4 is impossible, case 2 yields the same behavior, and +cases 1 and 3 seem to provide as good or better behavior with the +optimization than without. + + +=== 6. Interaction with skipping of "irrelevant" renames === + +Previous optimizations involved skipping rename detection for paths +considered to be "irrelevant". See for example the following commits: + + * 32a56dfb99 ("merge-ort: precompute subset of sources for which we + need rename detection", 2021-03-11) + * 2fd9eda462 ("merge-ort: precompute whether directory rename + detection is needed", 2021-03-11) + * 9bd342137e ("diffcore-rename: determine which relevant_sources are + no longer relevant", 2021-03-13) + +Relevance is always determined by what the _other_ side of history has +done, in terms of modifing a file that our side renamed, or adding a +file to a directory which our side renamed. This means that a path +that is "irrelevant" when picking the first commit of a series in a +rebase or cherry-pick, may suddenly become "relevant" when picking the +next commit. + +The upshot of this is that we can only cache rename detection results +for relevant paths, and need to re-check relevance in subsequent +commits. If those subsequent commits have additional paths that are +relevant for rename detection, then we will need to redo rename +detection -- though we can limit it to the paths for which we have not +already detected renames. + + +=== 7. Additional items that need to be cached === + +It turns out we have to cache more than just renames; we also cache: + + A) non-renames (i.e. unpaired deletes) + B) counts of renames within directories + C) sources that were marked as RELEVANT_LOCATION, but which were + downgraded to RELEVANT_NO_MORE + D) the toplevel trees involved in the merge + +These are all stored in struct rename_info, and respectively appear in + * cached_pairs (along side actual renames, just with a value of NULL) + * dir_rename_counts + * cached_irrelevant + * merge_trees + +The reason for (A) comes from the irrelevant renames skipping +optimization discussed in section 6. The fact that irrelevant renames +are skipped means we only get a subset of the potential renames +detected and subsequent commits may need to run rename detection on +the upstream side on a subset of the remaining renames (to get the +renames that are relevant for that later commit). Since unpaired +deletes are involved in rename detection too, we don't want to +repeatedly check that those paths remain unpaired on the upstream side +with every commit we are transplanting. + +The reason for (B) is that diffcore_rename_extended() is what +generates the counts of renames by directory which is needed in +directory rename detection, and if we don't run +diffcore_rename_extended() again then we need to have the output from +it, including dir_rename_counts, from the previous run. + +The reason for (C) is that merge-ort's tree traversal will again think +those paths are relevant (marking them as RELEVANT_LOCATION), but the +fact that they were downgraded to RELEVANT_NO_MORE means that +dir_rename_counts already has the information we need for directory +rename detection. (A path which becomes RELEVANT_CONTENT in a +subsequent commit will be removed from cached_irrelevant.) + +The reason for (D) is that is how we determine whether the remember +renames optimization can be used. In particular, remembering that our +sequence of merges looks like: + + Merge 1: + MERGE_BASE: E + MERGE_SIDE1: G + MERGE_SIDE2: A + => Creates A' + + Merge 2: + MERGE_BASE: A + MERGE_SIDE1: A' + MERGE_SIDE2: B + => Creates B' + +It is the fact that the trees A and A' appear both in Merge 1 and in +Merge 2, with A as a parent of A' that allows this optimization. So +we store the trees to compare with what we are asked to merge next +time. + + +=== 8. How directory rename detection interacts with the above and === +=== why this optimization is still safe even if === +=== merge.directoryRenames is set to "true". === + +As noted in the assumptions section: + + """ + ...if directory renames do occur, then the default of + merge.directoryRenames being set to "conflict" means that the operation + will stop for users to resolve the conflicts and the cache will be + thrown away, and thus that there won't be an optimization to apply. + So, the only reason we need to address directory renames specifically, + is that some users will have set merge.directoryRenames to "true" to + allow the merges to continue to proceed automatically. + """ + +Let's remember that we need to look at how any given pick affects the next +one. So let's again use the first two picks from the diagram in section +one: + + First pick does this three-way merge: + MERGE_BASE: E + MERGE_SIDE1: G + MERGE_SIDE2: A + => creates A' + + Second pick does this three-way merge: + MERGE_BASE: A + MERGE_SIDE1: A' + MERGE_SIDE2: B + => creates B' + +Now, directory rename detection exists so that if one side of history +renames a directory, and the other side adds a new file to the old +directory, then the merge (with merge.directoryRenames=true) can move the +file into the new directory. There are two qualitatively different ways to +add a new file to an old directory: create a new file, or rename a file +into that directory. Also, directory renames can be done on either side of +history, so there are four cases to consider: + + * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir + * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir + * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir + * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir + +One last note before we consider these four cases: There are some +important properties about how we implement this optimization with +respect to directory rename detection that we need to bear in mind +while considering all of these cases: + + * rename caching occurs *after* applying directory renames + + * a rename created by directory rename detection is recorded for the side + of history that did the directory rename. + + * dir_rename_counts, the nested map of + {oldname => {newname => count}}, + is cached between runs as well. This basically means that directory + rename detection is also cached, though only on the side of history + that we cache renames for (MERGE_SIDE1 as far as this document is + concerned; see the assumptions section). Two interesting sub-notes + about these counts: + + * If we need to perform rename-detection again on the given side (e.g. + some paths are relevant for rename detection that weren't before), + then we clear dir_rename_counts and recompute it, making use of + cached_pairs. The reason it is important to do this is optimizations + around RELEVANT_LOCATION exist to prevent us from computing + unnecessary renames for directory rename detection and from computing + dir_rename_counts for irrelevant directories; but those same renames + or directories may become necessary for subsequent merges. The + easiest way to "fix up" dir_rename_counts in such cases is to just + recompute it. + + * If we prune rename/rename(1to1) entries from the cache, then we also + need to update dir_rename_counts to decrement the counts for the + involved directory and any relevant parent directories (to undo what + update_dir_rename_counts() in diffcore-rename.c incremented when the + rename was initially found). If we instead just disable the + remembering renames optimization when the exceedingly rare + rename/rename(1to1) cases occur, then dir_rename_counts will get + re-computed the next time rename detection occurs, as noted above. + + * the side with multiple commits to pick, is the side of history that we + do NOT cache renames for. Thus, there are no additional commits to + change the number of renames in a directory, except for those done by + directory rename detection (which always pad the majority). + + * the "renames" we cache are modified slightly by any directory rename, + as noted below. + +Now, with those notes out of the way, let's go through the four cases +in order: + +Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir + + This case looks like this: + + MERGE_BASE: E, Has olddir/ + MERGE_SIDE1: G, Renames olddir/ -> newdir/ + MERGE_SIDE2: A, Adds olddir/newfile + => creates A', With newdir/newfile + + MERGE_BASE: A, Has olddir/newfile + MERGE_SIDE1: A', Has newdir/newfile + MERGE_SIDE2: B, Modifies olddir/newfile + => expected B', with threeway-merged newdir/newfile from above + + In this case, with the optimization, note that after the first commit: + * MERGE_SIDE1 remembers olddir/ -> newdir/ + * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile + Given the cached rename noted above, the second merge can proceed as + expected without needing to perform rename detection from A -> A'. + +Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir + + This case looks like this: + MERGE_BASE: E oldfile, olddir/ + MERGE_SIDE1: G oldfile, olddir/ -> newdir/ + MERGE_SIDE2: A oldfile -> olddir/newfile + => creates A', With newdir/newfile representing original oldfile + + MERGE_BASE: A olddir/newfile + MERGE_SIDE1: A' newdir/newfile + MERGE_SIDE2: B modify olddir/newfile + => expected B', with threeway-merged newdir/newfile from above + + In this case, with the optimization, note that after the first commit: + * MERGE_SIDE1 remembers olddir/ -> newdir/ + * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile + (NOT oldfile -> newdir/newfile; compare to case with + (p->status == 'R' && new_path) in possibly_cache_new_pair()) + + Given the cached rename noted above, the second merge can proceed as + expected without needing to perform rename detection from A -> A'. + +Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir + + This case looks like this: + + MERGE_BASE: E, Has olddir/ + MERGE_SIDE1: G, Adds olddir/newfile + MERGE_SIDE2: A, Renames olddir/ -> newdir/ + => creates A', With newdir/newfile + + MERGE_BASE: A, Has newdir/, but no notion of newdir/newfile + MERGE_SIDE1: A', Has newdir/newfile + MERGE_SIDE2: B, Has newdir/, but no notion of newdir/newfile + => expected B', with newdir/newfile from A' + + In this case, with the optimization, note that after the first commit there + were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. + But the second merge didn't need any renames so this is fine. + +Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir + + This case looks like this: + + MERGE_BASE: E, Has olddir/ + MERGE_SIDE1: G, Renames oldfile -> olddir/newfile + MERGE_SIDE2: A, Renames olddir/ -> newdir/ + => creates A', With newdir/newfile representing original oldfile + + MERGE_BASE: A, Has oldfile + MERGE_SIDE1: A', Has newdir/newfile + MERGE_SIDE2: B, Modifies oldfile + => expected B', with threeway-merged newdir/newfile from above + + In this case, with the optimization, note that after the first commit: + * MERGE_SIDE1 remembers oldfile -> newdir/newfile + (NOT oldfile -> olddir/newfile; compare to case of second + block under p->status == 'R' in possibly_cache_new_pair()) + * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered + + Given the cached rename noted above, the second merge can proceed as + expected without needing to perform rename detection from A -> A'. + +Finally, I'll just note here that interactions with the +skip-irrelevant-renames optimization means we sometimes don't detect +renames for any files within a directory that was renamed, in which +case we will not have been able to detect any rename for the directory +itself. In such a case, we do not know whether the directory was +renamed; we want to be careful to avoid cacheing some kind of "this +directory was not renamed" statement. If we did, then a subsequent +commit being rebased could add a file to the old directory, and the +user would expect it to end up in the correct directory -- something +our erroneous "this directory was not renamed" cache would preclude. diff --git a/diffcore-rename.c b/diffcore-rename.c index 963ca58221..3375e24659 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -568,7 +568,8 @@ static void update_dir_rename_counts(struct dir_rename_info *info, static void initialize_dir_rename_info(struct dir_rename_info *info, struct strintmap *relevant_sources, struct strintmap *dirs_removed, - struct strmap *dir_rename_count) + struct strmap *dir_rename_count, + struct strmap *cached_pairs) { struct hashmap_iter iter; struct strmap_entry *entry; @@ -633,6 +634,17 @@ static void initialize_dir_rename_info(struct dir_rename_info *info, rename_dst[i].p->two->path); } + /* Add cached_pairs to counts */ + strmap_for_each_entry(cached_pairs, &iter, entry) { + const char *old_name = entry->key; + const char *new_name = entry->value; + if (!new_name) + /* known delete; ignore it */ + continue; + + update_dir_rename_counts(info, dirs_removed, old_name, new_name); + } + /* * Now we collapse * dir_rename_count: old_directory -> {new_directory -> count} @@ -1247,7 +1259,8 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info, void diffcore_rename_extended(struct diff_options *options, struct strintmap *relevant_sources, struct strintmap *dirs_removed, - struct strmap *dir_rename_count) + struct strmap *dir_rename_count, + struct strmap *cached_pairs) { int detect_rename = options->detect_rename; int minimum_score = options->rename_score; @@ -1363,7 +1376,8 @@ void diffcore_rename_extended(struct diff_options *options, /* Preparation for basename-driven matching. */ trace2_region_enter("diff", "dir rename setup", options->repo); initialize_dir_rename_info(&info, relevant_sources, - dirs_removed, dir_rename_count); + dirs_removed, dir_rename_count, + cached_pairs); trace2_region_leave("diff", "dir rename setup", options->repo); /* Utilize file basenames to quickly find renames. */ @@ -1560,5 +1574,5 @@ void diffcore_rename_extended(struct diff_options *options, void diffcore_rename(struct diff_options *options) { - diffcore_rename_extended(options, NULL, NULL, NULL); + diffcore_rename_extended(options, NULL, NULL, NULL, NULL); } diff --git a/diffcore.h b/diffcore.h index f5c6de4841..533b30e21e 100644 --- a/diffcore.h +++ b/diffcore.h @@ -181,7 +181,8 @@ void diffcore_rename(struct diff_options *); void diffcore_rename_extended(struct diff_options *options, struct strintmap *relevant_sources, struct strintmap *dirs_removed, - struct strmap *dir_rename_count); + struct strmap *dir_rename_count, + struct strmap *cached_pairs); void diffcore_merge_broken(void); void diffcore_pickaxe(struct diff_options *); void diffcore_order(const char *orderfile); diff --git a/merge-ort.c b/merge-ort.c index 4a9ce2a822..b954f7184a 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -53,6 +53,8 @@ enum merge_side { MERGE_SIDE2 = 2 }; +static unsigned RESULT_INITIALIZED = 0x1abe11ed; /* unlikely accidental value */ + struct traversal_callback_data { unsigned long mask; unsigned long dirmask; @@ -140,6 +142,72 @@ struct rename_info { int callback_data_nr, callback_data_alloc; char *callback_data_traverse_path; + /* + * merge_trees: trees passed to the merge algorithm for the merge + * + * merge_trees records the trees passed to the merge algorithm. But, + * this data also is stored in merge_result->priv. If a sequence of + * merges are being done (such as when cherry-picking or rebasing), + * the next merge can look at this and re-use information from + * previous merges under certain circumstances. + * + * See also all the cached_* variables. + */ + struct tree *merge_trees[3]; + + /* + * cached_pairs_valid_side: which side's cached info can be reused + * + * See the description for merge_trees. For repeated merges, at most + * only one side's cached information can be used. Valid values: + * MERGE_SIDE2: cached data from side2 can be reused + * MERGE_SIDE1: cached data from side1 can be reused + * 0: no cached data can be reused + */ + int cached_pairs_valid_side; + + /* + * cached_pairs: Caching of renames and deletions. + * + * These are mappings recording renames and deletions of individual + * files (not directories). They are thus a map from an old + * filename to either NULL (for deletions) or a new filename (for + * renames). + */ + struct strmap cached_pairs[3]; + + /* + * cached_target_names: just the destinations from cached_pairs + * + * We sometimes want a fast lookup to determine if a given filename + * is one of the destinations in cached_pairs. cached_target_names + * is thus duplicative information, but it provides a fast lookup. + */ + struct strset cached_target_names[3]; + + /* + * cached_irrelevant: Caching of rename_sources that aren't relevant. + * + * If we try to detect a rename for a source path and succeed, it's + * part of a rename. If we try to detect a rename for a source path + * and fail, then it's a delete. If we do not try to detect a rename + * for a path, then we don't know if it's a rename or a delete. If + * merge-ort doesn't think the path is relevant, then we just won't + * cache anything for that path. But there's a slight problem in + * that merge-ort can think a path is RELEVANT_LOCATION, but due to + * commit 9bd342137e ("diffcore-rename: determine which + * relevant_sources are no longer relevant", 2021-03-13), + * diffcore-rename can downgrade the path to RELEVANT_NO_MORE. To + * avoid excessive calls to diffcore_rename_extended() we still need + * to cache such paths, though we cannot record them as either + * renames or deletes. So we cache them here as a "turned out to be + * irrelevant *for this commit*" as they are often also irrelevant + * for subsequent commits, though we will have to do some extra + * checking to see whether such paths become relevant for rename + * detection when cherry-picking/rebasing subsequent commits. + */ + struct strset cached_irrelevant[3]; + /* * needed_limit: value needed for inexact rename detection to run * @@ -382,6 +450,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, reinitialize ? strmap_partial_clear : strmap_clear; void (*strintmap_func)(struct strintmap *) = reinitialize ? strintmap_partial_clear : strintmap_clear; + void (*strset_func)(struct strset *) = + reinitialize ? strset_partial_clear : strset_clear; /* * We marked opti->paths with strdup_strings = 0, so that we @@ -417,15 +487,21 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, /* Free memory used by various renames maps */ for (i = MERGE_SIDE1; i <= MERGE_SIDE2; ++i) { strintmap_func(&renames->dirs_removed[i]); - - partial_clear_dir_rename_count(&renames->dir_rename_count[i]); - if (!reinitialize) - strmap_clear(&renames->dir_rename_count[i], 1); - strmap_func(&renames->dir_renames[i], 0); - strintmap_func(&renames->relevant_sources[i]); + if (!reinitialize) + assert(renames->cached_pairs_valid_side == 0); + if (i != renames->cached_pairs_valid_side) { + strset_func(&renames->cached_target_names[i]); + strmap_func(&renames->cached_pairs[i], 1); + strset_func(&renames->cached_irrelevant[i]); + partial_clear_dir_rename_count(&renames->dir_rename_count[i]); + if (!reinitialize) + strmap_clear(&renames->dir_rename_count[i], 1); + } } + renames->cached_pairs_valid_side = 0; + renames->dir_rename_mask = 0; if (!reinitialize) { struct hashmap_iter iter; @@ -448,8 +524,6 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, strmap_clear(&opti->output, 0); } - renames->dir_rename_mask = 0; - /* Clean out callback_data as well. */ FREE_AND_NULL(renames->callback_data); renames->callback_data_nr = renames->callback_data_alloc = 0; @@ -690,15 +764,48 @@ static void add_pair(struct merge_options *opt, struct rename_info *renames = &opt->priv->renames; int names_idx = is_add ? side : 0; - if (!is_add) { + if (is_add) { + if (strset_contains(&renames->cached_target_names[side], + pathname)) + return; + } else { unsigned content_relevant = (match_mask == 0); unsigned location_relevant = (dir_rename_mask == 0x07); + /* + * If pathname is found in cached_irrelevant[side] due to + * previous pick but for this commit content is relevant, + * then we need to remove it from cached_irrelevant. + */ + if (content_relevant) + /* strset_remove is no-op if strset doesn't have key */ + strset_remove(&renames->cached_irrelevant[side], + pathname); + + /* + * We do not need to re-detect renames for paths that we already + * know the pairing, i.e. for cached_pairs (or + * cached_irrelevant). However, handle_deferred_entries() needs + * to loop over the union of keys from relevant_sources[side] and + * cached_pairs[side], so for simplicity we set relevant_sources + * for all the cached_pairs too and then strip them back out in + * prune_cached_from_relevant() at the beginning of + * detect_regular_renames(). + */ if (content_relevant || location_relevant) { /* content_relevant trumps location_relevant */ strintmap_set(&renames->relevant_sources[side], pathname, content_relevant ? RELEVANT_CONTENT : RELEVANT_LOCATION); } + + /* + * Avoid creating pair if we've already cached rename results. + * Note that we do this after setting relevant_sources[side] + * as noted in the comment above. + */ + if (strmap_contains(&renames->cached_pairs[side], pathname) || + strset_contains(&renames->cached_irrelevant[side], pathname)) + return; } one = alloc_filespec(pathname); @@ -2037,6 +2144,9 @@ static int process_renames(struct merge_options *opt, VERIFY_CI(side2); if (!strcmp(pathnames[1], pathnames[2])) { + struct rename_info *ri = &opt->priv->renames; + int j; + /* Both sides renamed the same way */ assert(side1 == side2); memcpy(&side1->stages[0], &base->stages[0], @@ -2046,6 +2156,16 @@ static int process_renames(struct merge_options *opt, base->merged.is_null = 1; base->merged.clean = 1; + /* + * Disable remembering renames optimization; + * rename/rename(1to1) is incredibly rare, and + * just disabling the optimization is easier + * than purging cached_pairs, + * cached_target_names, and dir_rename_counts. + */ + for (j = 0; j < 3; j++) + ri->merge_trees[j] = NULL; + /* We handled both renames, i.e. i+1 handled */ i++; /* Move to next rename */ @@ -2273,7 +2393,9 @@ static inline int possible_side_renames(struct rename_info *renames, static inline int possible_renames(struct rename_info *renames) { return possible_side_renames(renames, 1) || - possible_side_renames(renames, 2); + possible_side_renames(renames, 2) || + !strmap_empty(&renames->cached_pairs[1]) || + !strmap_empty(&renames->cached_pairs[2]); } static void resolve_diffpair_statuses(struct diff_queue_struct *q) @@ -2297,6 +2419,112 @@ static void resolve_diffpair_statuses(struct diff_queue_struct *q) } } +static void prune_cached_from_relevant(struct rename_info *renames, + unsigned side) +{ + /* Reason for this function described in add_pair() */ + struct hashmap_iter iter; + struct strmap_entry *entry; + + /* Remove from relevant_sources all entries in cached_pairs[side] */ + strmap_for_each_entry(&renames->cached_pairs[side], &iter, entry) { + strintmap_remove(&renames->relevant_sources[side], + entry->key); + } + /* Remove from relevant_sources all entries in cached_irrelevant[side] */ + strset_for_each_entry(&renames->cached_irrelevant[side], &iter, entry) { + strintmap_remove(&renames->relevant_sources[side], + entry->key); + } +} + +static void use_cached_pairs(struct merge_options *opt, + struct strmap *cached_pairs, + struct diff_queue_struct *pairs) +{ + struct hashmap_iter iter; + struct strmap_entry *entry; + + /* + * Add to side_pairs all entries from renames->cached_pairs[side_index]. + * (Info in cached_irrelevant[side_index] is not relevant here.) + */ + strmap_for_each_entry(cached_pairs, &iter, entry) { + struct diff_filespec *one, *two; + const char *old_name = entry->key; + const char *new_name = entry->value; + if (!new_name) + new_name = old_name; + + /* We don't care about oid/mode, only filenames and status */ + one = alloc_filespec(old_name); + two = alloc_filespec(new_name); + diff_queue(pairs, one, two); + pairs->queue[pairs->nr-1]->status = entry->value ? 'R' : 'D'; + } +} + +static void cache_new_pair(struct rename_info *renames, + int side, + char *old_path, + char *new_path, + int free_old_value) +{ + char *old_value; + new_path = xstrdup(new_path); + old_value = strmap_put(&renames->cached_pairs[side], + old_path, new_path); + strset_add(&renames->cached_target_names[side], new_path); + if (free_old_value) + free(old_value); + else + assert(!old_value); +} + +static void possibly_cache_new_pair(struct rename_info *renames, + struct diff_filepair *p, + unsigned side, + char *new_path) +{ + int dir_renamed_side = 0; + + if (new_path) { + /* + * Directory renames happen on the other side of history from + * the side that adds new files to the old directory. + */ + dir_renamed_side = 3 - side; + } else { + int val = strintmap_get(&renames->relevant_sources[side], + p->one->path); + if (val == RELEVANT_NO_MORE) { + assert(p->status == 'D'); + strset_add(&renames->cached_irrelevant[side], + p->one->path); + } + if (val <= 0) + return; + } + + if (p->status == 'D') { + /* + * If we already had this delete, we'll just set it's value + * to NULL again, so no harm. + */ + strmap_put(&renames->cached_pairs[side], p->one->path, NULL); + } else if (p->status == 'R') { + if (!new_path) + new_path = p->two->path; + else + cache_new_pair(renames, dir_renamed_side, + p->two->path, new_path, 0); + cache_new_pair(renames, side, p->one->path, new_path, 1); + } else if (p->status == 'A' && new_path) { + cache_new_pair(renames, dir_renamed_side, + p->two->path, new_path, 0); + } +} + static int compare_pairs(const void *a_, const void *b_) { const struct diff_filepair *a = *((const struct diff_filepair **)a_); @@ -2312,6 +2540,7 @@ static void detect_regular_renames(struct merge_options *opt, struct diff_options diff_opts; struct rename_info *renames = &opt->priv->renames; + prune_cached_from_relevant(renames, side_index); if (!possible_side_renames(renames, side_index)) { /* * No rename detection needed for this side, but we still need @@ -2322,6 +2551,7 @@ static void detect_regular_renames(struct merge_options *opt, return; } + partial_clear_dir_rename_count(&renames->dir_rename_count[side_index]); repo_diff_setup(opt->repo, &diff_opts); diff_opts.flags.recursive = 1; diff_opts.flags.rename_empty = 0; @@ -2339,7 +2569,8 @@ static void detect_regular_renames(struct merge_options *opt, diffcore_rename_extended(&diff_opts, &renames->relevant_sources[side_index], &renames->dirs_removed[side_index], - &renames->dir_rename_count[side_index]); + &renames->dir_rename_count[side_index], + &renames->cached_pairs[side_index]); trace2_region_leave("diff", "diffcore_rename", opt->repo); resolve_diffpair_statuses(&diff_queued_diff); @@ -2379,6 +2610,7 @@ static int collect_renames(struct merge_options *opt, char *new_path; /* non-NULL only with directory renames */ if (p->status != 'A' && p->status != 'R') { + possibly_cache_new_pair(renames, p, side_index, NULL); diff_free_filepair(p); continue; } @@ -2390,6 +2622,7 @@ static int collect_renames(struct merge_options *opt, &collisions, &clean); + possibly_cache_new_pair(renames, p, side_index, new_path); if (p->status != 'R' && !new_path) { diff_free_filepair(p); continue; @@ -2445,6 +2678,8 @@ static int detect_and_process_renames(struct merge_options *opt, trace2_region_enter("merge", "regular renames", opt->repo); detect_regular_renames(opt, MERGE_SIDE1); detect_regular_renames(opt, MERGE_SIDE2); + use_cached_pairs(opt, &renames->cached_pairs[1], &renames->pairs[1]); + use_cached_pairs(opt, &renames->cached_pairs[2], &renames->pairs[2]); trace2_region_leave("merge", "regular renames", opt->repo); trace2_region_enter("merge", "directory renames", opt->repo); @@ -3635,6 +3870,10 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) assert(opt->obuf.len == 0); assert(opt->priv == NULL); + if (result->_properly_initialized != 0 && + result->_properly_initialized != RESULT_INITIALIZED) + BUG("struct merge_result passed to merge_incore_*recursive() must be zeroed or filled with values from a previous run"); + assert(!!result->priv == !!result->_properly_initialized); if (result->priv) { opt->priv = result->priv; result->priv = NULL; @@ -3674,8 +3913,22 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) NULL, 1); strmap_init_with_options(&renames->dir_renames[i], NULL, 0); + /* + * relevant_sources uses -1 for the default, because we need + * to be able to distinguish not-in-strintmap from valid + * relevant_source values from enum file_rename_relevance. + * In particular, possibly_cache_new_pair() expects a negative + * value for not-found entries. + */ strintmap_init_with_options(&renames->relevant_sources[i], - 0, NULL, 0); + -1 /* explicitly invalid */, + NULL, 0); + strmap_init_with_options(&renames->cached_pairs[i], + NULL, 1); + strset_init_with_options(&renames->cached_irrelevant[i], + NULL, 1); + strset_init_with_options(&renames->cached_target_names[i], + NULL, 0); } /* @@ -3701,6 +3954,50 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) trace2_region_leave("merge", "allocate/init", opt->repo); } +static void merge_check_renames_reusable(struct merge_options *opt, + struct merge_result *result, + struct tree *merge_base, + struct tree *side1, + struct tree *side2) +{ + struct rename_info *renames; + struct tree **merge_trees; + struct merge_options_internal *opti = result->priv; + + if (!opti) + return; + + renames = &opti->renames; + merge_trees = renames->merge_trees; + + /* + * Handle case where previous merge operation did not want cache to + * take effect, e.g. because rename/rename(1to1) makes it invalid. + */ + if (!merge_trees[0]) { + assert(!merge_trees[0] && !merge_trees[1] && !merge_trees[2]); + renames->cached_pairs_valid_side = 0; /* neither side valid */ + return; + } + + /* + * Handle other cases; note that merge_trees[0..2] will only + * be NULL if opti is, or if all three were manually set to + * NULL by e.g. rename/rename(1to1) handling. + */ + assert(merge_trees[0] && merge_trees[1] && merge_trees[2]); + + /* Check if we meet a condition for re-using cached_pairs */ + if (oideq(&merge_base->object.oid, &merge_trees[2]->object.oid) && + oideq(&side1->object.oid, &result->tree->object.oid)) + renames->cached_pairs_valid_side = MERGE_SIDE1; + else if (oideq(&merge_base->object.oid, &merge_trees[1]->object.oid) && + oideq(&side2->object.oid, &result->tree->object.oid)) + renames->cached_pairs_valid_side = MERGE_SIDE2; + else + renames->cached_pairs_valid_side = 0; /* neither side valid */ +} + /*** Function Grouping: merge_incore_*() and their internal variants ***/ /* @@ -3751,6 +4048,7 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt, result->clean &= strmap_empty(&opt->priv->conflicted); if (!opt->priv->call_depth) { result->priv = opt->priv; + result->_properly_initialized = RESULT_INITIALIZED; opt->priv = NULL; } } @@ -3848,7 +4146,16 @@ void merge_incore_nonrecursive(struct merge_options *opt, trace2_region_enter("merge", "merge_start", opt->repo); assert(opt->ancestor != NULL); + merge_check_renames_reusable(opt, result, merge_base, side1, side2); merge_start(opt, result); + /* + * Record the trees used in this merge, so if there's a next merge in + * a cherry-pick or rebase sequence it might be able to take advantage + * of the cached_pairs in that next merge. + */ + opt->priv->renames.merge_trees[0] = merge_base; + opt->priv->renames.merge_trees[1] = side1; + opt->priv->renames.merge_trees[2] = side2; trace2_region_leave("merge", "merge_start", opt->repo); merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result); diff --git a/merge-ort.h b/merge-ort.h index d53a0a339f..c011864ffe 100644 --- a/merge-ort.h +++ b/merge-ort.h @@ -29,6 +29,8 @@ struct merge_result { * !clean) and to print "CONFLICT" messages. Not for external use. */ void *priv; + /* Also private */ + unsigned _properly_initialized; }; /* diff --git a/t/helper/test-fast-rebase.c b/t/helper/test-fast-rebase.c index 373212256a..fc2d460904 100644 --- a/t/helper/test-fast-rebase.c +++ b/t/helper/test-fast-rebase.c @@ -91,7 +91,6 @@ int cmd__fast_rebase(int argc, const char **argv) struct commit *last_commit = NULL, *last_picked_commit = NULL; struct object_id head; struct lock_file lock = LOCK_INIT; - int clean = 1; struct strvec rev_walk_args = STRVEC_INIT; struct rev_info revs; struct commit *commit; @@ -124,7 +123,8 @@ int cmd__fast_rebase(int argc, const char **argv) assert(oideq(&onto->object.oid, &head)); hold_locked_index(&lock, LOCK_DIE_ON_ERROR); - assert(repo_read_index(the_repository) >= 0); + if (repo_read_index(the_repository) < 0) + BUG("Could not read index"); repo_init_revisions(the_repository, &revs, NULL); revs.verbose_header = 1; @@ -175,11 +175,10 @@ int cmd__fast_rebase(int argc, const char **argv) free((char*)merge_opt.ancestor); merge_opt.ancestor = NULL; if (!result.clean) - die("Aborting: Hit a conflict and restarting is not implemented."); + break; last_picked_commit = commit; last_commit = create_commit(result.tree, commit, last_commit); } - fprintf(stderr, "\nDone.\n"); /* TODO: There should be some kind of rev_info_free(&revs) call... */ memset(&revs, 0, sizeof(revs)); @@ -188,24 +187,39 @@ int cmd__fast_rebase(int argc, const char **argv) if (result.clean < 0) exit(128); - strbuf_addf(&reflog_msg, "finish rebase %s onto %s", - oid_to_hex(&last_picked_commit->object.oid), - oid_to_hex(&last_commit->object.oid)); - if (update_ref(reflog_msg.buf, branch_name.buf, - &last_commit->object.oid, - &last_picked_commit->object.oid, - REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) { - error(_("could not update %s"), argv[4]); - die("Failed to update %s", argv[4]); + if (result.clean) { + fprintf(stderr, "\nDone.\n"); + strbuf_addf(&reflog_msg, "finish rebase %s onto %s", + oid_to_hex(&last_picked_commit->object.oid), + oid_to_hex(&last_commit->object.oid)); + if (update_ref(reflog_msg.buf, branch_name.buf, + &last_commit->object.oid, + &last_picked_commit->object.oid, + REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) { + error(_("could not update %s"), argv[4]); + die("Failed to update %s", argv[4]); + } + if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0) + die(_("unable to update HEAD")); + strbuf_release(&reflog_msg); + strbuf_release(&branch_name); + + prime_cache_tree(the_repository, the_repository->index, + result.tree); + } else { + fprintf(stderr, "\nAborting: Hit a conflict.\n"); + strbuf_addf(&reflog_msg, "rebase progress up to %s", + oid_to_hex(&last_picked_commit->object.oid)); + if (update_ref(reflog_msg.buf, "HEAD", + &last_commit->object.oid, + &head, + REF_NO_DEREF, UPDATE_REFS_MSG_ON_ERR)) { + error(_("could not update %s"), argv[4]); + die("Failed to update %s", argv[4]); + } } - if (create_symref("HEAD", branch_name.buf, reflog_msg.buf) < 0) - die(_("unable to update HEAD")); - strbuf_release(&reflog_msg); - strbuf_release(&branch_name); - - prime_cache_tree(the_repository, the_repository->index, result.tree); if (write_locked_index(&the_index, &lock, COMMIT_LOCK | SKIP_IF_UNCHANGED)) die(_("unable to write %s"), get_index_file()); - return (clean == 0); + return (result.clean == 0); } diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh index 7134769149..be84d22419 100755 --- a/t/t6423-merge-rename-directories.sh +++ b/t/t6423-merge-rename-directories.sh @@ -4966,6 +4966,64 @@ test_expect_success '12g: Testcase with two kinds of "relevant" renames' ' ) ' +# Testcase 12h, Testcase with two kinds of "relevant" renames +# Commit O: olddir/{a_1, b} +# Commit A: newdir/{a_2, b} +# Commit B: olddir/{alpha_1, b} +# Expected: newdir/{alpha_2, b} + +test_setup_12h () { + test_create_repo 12h && + ( + cd 12h && + + mkdir olddir && + test_seq 3 8 >olddir/a && + >olddir/b && + git add olddir && + git commit -m orig && + + git branch O && + git branch A && + git branch B && + + git switch A && + test_seq 3 10 >olddir/a && + git add olddir/a && + git mv olddir newdir && + git commit -m A && + + git switch B && + + git mv olddir/a olddir/alpha && + git commit -m B + ) +} + +test_expect_failure '12h: renaming a file within a renamed directory' ' + test_setup_12h && + ( + cd 12h && + + git checkout A^0 && + + test_might_fail git -c merge.directoryRenames=true merge -s recursive B^0 && + + git ls-files >tracked && + test_line_count = 2 tracked && + + test_path_is_missing olddir/a && + test_path_is_file newdir/alpha && + test_path_is_file newdir/b && + + git rev-parse >actual \ + HEAD:newdir/alpha HEAD:newdir/b && + git rev-parse >expect \ + A:newdir/a O:oldir/b && + test_cmp expect actual + ) +' + ########################################################################### # SECTION 13: Checking informational and conflict messages # diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh new file mode 100755 index 0000000000..035edc40b1 --- /dev/null +++ b/t/t6429-merge-sequence-rename-caching.sh @@ -0,0 +1,700 @@ +#!/bin/sh + +test_description="remember regular & dir renames in sequence of merges" + +. ./test-lib.sh + +# +# NOTE 1: this testfile tends to not only rename files, but modify on both +# sides; without modifying on both sides, optimizations can kick in +# which make rename detection irrelevant or trivial. We want to make +# sure that we are triggering rename caching rather than rename +# bypassing. +# +# NOTE 2: this testfile uses 'test-tool fast-rebase' instead of either +# cherry-pick or rebase. sequencer.c is only superficially +# integrated with merge-ort; it calls merge_switch_to_result() +# after EACH merge, which updates the index and working copy AND +# throws away the cached results (because merge_switch_to_result() +# is only supposed to be called at the end of the sequence). +# Integrating them more deeply is a big task, so for now the tests +# use 'test-tool fast-rebase'. +# + + +# +# In the following simple testcase: +# Base: numbers_1, values_1 +# Upstream: numbers_2, values_2 +# Topic_1: sequence_3 +# Topic_2: scruples_3 +# or, in english, rename numbers -> sequence in the first commit, and rename +# values -> scruples in the second commit. +# +# This shouldn't be a challenge, it's just verifying that cached renames isn't +# preventing us from finding new renames. +# +test_expect_success 'caching renames does not preclude finding new ones' ' + test_create_repo caching-renames-and-new-renames && + ( + cd caching-renames-and-new-renames && + + test_seq 2 10 >numbers && + test_seq 2 10 >values && + git add numbers values && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch upstream && + test_seq 1 10 >numbers && + test_seq 1 10 >values && + git add numbers values && + git commit -m "Tweaked both files" && + + git switch topic && + + test_seq 2 12 >numbers && + git add numbers && + git mv numbers sequence && + git commit -m A && + + test_seq 2 12 >values && + git add values && + git mv values scruples && + git commit -m B && + + # + # Actual testing + # + + git switch upstream && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream~1..topic + + git ls-files >tracked-files && + test_line_count = 2 tracked-files && + test_seq 1 12 >expect && + test_cmp expect sequence && + test_cmp expect scruples + ) +' + +# +# In the following testcase: +# Base: numbers_1 +# Upstream: rename numbers_1 -> sequence_2 +# Topic_1: numbers_3 +# Topic_2: numbers_1 +# or, in english, the first commit on the topic branch modifies numbers by +# shrinking it (dramatically) and the second commit on topic reverts its +# parent. +# +# Can git apply both patches? +# +# Traditional cherry-pick/rebase will fail to apply the second commit, the +# one that reverted its parent, because despite detecting the rename from +# 'numbers' to 'sequence' for the first commit, it fails to detect that +# rename when picking the second commit. That's "reasonable" given the +# dramatic change in size of the file, but remembering the rename and +# reusing it is reasonable too. +# +# We do test here that we expect rename detection to only be run once total +# (the topic side of history doesn't need renames, and with caching we +# should be able to only run rename detection on the upstream side one +# time.) +test_expect_success 'cherry-pick both a commit and its immediate revert' ' + test_create_repo pick-commit-and-its-immediate-revert && + ( + cd pick-commit-and-its-immediate-revert && + + test_seq 11 30 >numbers && + git add numbers && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch upstream && + test_seq 1 30 >numbers && + git add numbers && + git mv numbers sequence && + git commit -m "Renamed (and modified) numbers -> sequence" && + + git switch topic && + + test_seq 11 13 >numbers && + git add numbers && + git commit -m A && + + git revert HEAD && + + # + # Actual testing + # + + git switch upstream && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream~1..topic && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 1 calls + ) +' + +# +# In the following testcase: +# Base: sequence_1 +# Upstream: rename sequence_1 -> values_2 +# Topic_1: rename sequence_1 -> values_3 +# Topic_2: add unrelated sequence_4 +# or, in english, both sides rename sequence -> values, and then the second +# commit on the topic branch adds an unrelated file called sequence. +# +# This testcase presents no problems for git traditionally, but having both +# sides do the same rename in effect "uses it up" and if it remains cached, +# could cause a spurious rename/add conflict. +# +test_expect_success 'rename same file identically, then reintroduce it' ' + test_create_repo rename-rename-1to1-then-add-old-filename && + ( + cd rename-rename-1to1-then-add-old-filename && + + test_seq 3 8 >sequence && + git add sequence && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch upstream && + test_seq 1 8 >sequence && + git add sequence && + git mv sequence values && + git commit -m "Renamed (and modified) sequence -> values" && + + git switch topic && + + test_seq 3 10 >sequence && + git add sequence && + git mv sequence values && + git commit -m A && + + test_write_lines A B C D E F G H I J >sequence && + git add sequence && + git commit -m B && + + # + # Actual testing + # + + git switch upstream && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream~1..topic && + + git ls-files >tracked && + test_line_count = 2 tracked && + test_path_is_file values && + test_path_is_file sequence && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 2 calls + ) +' + +# +# In the following testcase: +# Base: olddir/{valuesZ_1, valuesY_1, valuesX_1} +# Upstream: rename olddir/valuesZ_1 -> dirA/valuesZ_2 +# rename olddir/valuesY_1 -> dirA/valuesY_2 +# rename olddir/valuesX_1 -> dirB/valuesX_2 +# Topic_1: rename olddir/valuesZ_1 -> dirA/valuesZ_3 +# rename olddir/valuesY_1 -> dirA/valuesY_3 +# Topic_2: add olddir/newfile +# Expected Pick1: dirA/{valuesZ, valuesY}, dirB/valuesX +# Expected Pick2: dirA/{valuesZ, valuesY}, dirB/{valuesX, newfile} +# +# This testcase presents no problems for git traditionally, but having both +# sides do the same renames in effect "use it up" but if the renames remain +# cached, the directory rename could put newfile in the wrong directory. +# +test_expect_success 'rename same file identically, then add file to old dir' ' + test_create_repo rename-rename-1to1-then-add-file-to-old-dir && + ( + cd rename-rename-1to1-then-add-file-to-old-dir && + + mkdir olddir/ && + test_seq 3 8 >olddir/valuesZ && + test_seq 3 8 >olddir/valuesY && + test_seq 3 8 >olddir/valuesX && + git add olddir && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch upstream && + test_seq 1 8 >olddir/valuesZ && + test_seq 1 8 >olddir/valuesY && + test_seq 1 8 >olddir/valuesX && + git add olddir && + mkdir dirA && + git mv olddir/valuesZ olddir/valuesY dirA && + git mv olddir/ dirB/ && + git commit -m "Renamed (and modified) values*" && + + git switch topic && + + test_seq 3 10 >olddir/valuesZ && + test_seq 3 10 >olddir/valuesY && + git add olddir && + mkdir dirA && + git mv olddir/valuesZ olddir/valuesY dirA && + git commit -m A && + + >olddir/newfile && + git add olddir/newfile && + git commit -m B && + + # + # Actual testing + # + + git switch upstream && + git config merge.directoryRenames true && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream~1..topic && + + git ls-files >tracked && + test_line_count = 4 tracked && + test_path_is_file dirA/valuesZ && + test_path_is_file dirA/valuesY && + test_path_is_file dirB/valuesX && + test_path_is_file dirB/newfile && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 3 calls + ) +' + +# +# In the following testcase, upstream renames a directory, and the topic branch +# first adds a file to the directory, then later renames the directory +# differently: +# Base: olddir/a +# olddir/b +# Upstream: rename olddir/ -> newdir/ +# Topic_1: add olddir/newfile +# Topic_2: rename olddir/ -> otherdir/ +# +# Here we are just concerned that cached renames might prevent us from seeing +# the rename conflict, and we want to ensure that we do get a conflict. +# +# While at it, though, we do test that we only try to detect renames 2 +# times and not three. (The first merge needs to detect renames on the +# upstream side. Traditionally, the second merge would need to detect +# renames on both sides of history, but our caching of upstream renames +# should avoid the need to re-detect upstream renames.) +# +test_expect_success 'cached dir rename does not prevent noticing later conflict' ' + test_create_repo dir-rename-cache-not-occluding-later-conflict && + ( + cd dir-rename-cache-not-occluding-later-conflict && + + mkdir olddir && + test_seq 3 10 >olddir/a && + test_seq 3 10 >olddir/b && + git add olddir && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch upstream && + test_seq 3 10 >olddir/a && + test_seq 3 10 >olddir/b && + git add olddir && + git mv olddir newdir && + git commit -m "Dir renamed" && + + git switch topic && + + >olddir/newfile && + git add olddir/newfile && + git commit -m A && + + test_seq 1 8 >olddir/a && + test_seq 1 8 >olddir/b && + git add olddir && + git mv olddir otherdir && + git commit -m B && + + # + # Actual testing + # + + git switch upstream && + git config merge.directoryRenames true && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test_must_fail test-tool fast-rebase --onto HEAD upstream~1 topic >output && + #git cherry-pick upstream..topic && + + grep CONFLICT..rename/rename output && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 2 calls + ) +' + +# Helper for the next two tests +test_setup_upstream_rename () { + test_create_repo $1 && + ( + cd $1 && + + test_seq 3 8 >somefile && + test_seq 3 8 >relevant-rename && + git add somefile relevant-rename && + mkdir olddir && + test_write_lines a b c d e f g >olddir/a && + test_write_lines z y x w v u t >olddir/b && + git add olddir && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch upstream && + test_seq 1 8 >somefile && + test_seq 1 8 >relevant-rename && + git add somefile relevant-rename && + git mv relevant-rename renamed && + echo h >>olddir/a && + echo s >>olddir/b && + git add olddir && + git mv olddir newdir && + git commit -m "Dir renamed" + ) +} + +# +# In the following testcase, upstream renames a file in the toplevel directory +# as well as its only directory: +# Base: relevant-rename_1 +# somefile +# olddir/a +# olddir/b +# Upstream: rename relevant-rename_1 -> renamed_2 +# rename olddir/ -> newdir/ +# Topic_1: relevant-rename_3 +# Topic_2: olddir/newfile_1 +# Topic_3: olddir/newfile_2 +# +# In this testcase, since the first commit being picked only modifies a +# file in the toplevel directory, the directory rename is irrelevant for +# that first merge. However, we need to notice the directory rename for +# the merge that picks the second commit, and we don't want the third +# commit to mess up its location either. We want to make sure that +# olddir/newfile doesn't exist in the result and that newdir/newfile does. +# +# We also test that we only do rename detection twice. We never need +# rename detection on the topic side of history, but we do need it twice on +# the upstream side of history. For the first topic commit, we only need +# the +# relevant-rename -> renamed +# rename, because olddir is unmodified by Topic_1. For Topic_2, however, +# the new file being added to olddir means files that were previously +# irrelevant for rename detection are now relevant, forcing us to repeat +# rename detection for the paths we don't already have cached. Topic_3 also +# tweaks olddir/newfile, but the renames in olddir/ will have been cached +# from the second rename detection run. +# +test_expect_success 'dir rename unneeded, then add new file to old dir' ' + test_setup_upstream_rename dir-rename-unneeded-until-new-file && + ( + cd dir-rename-unneeded-until-new-file && + + git switch topic && + + test_seq 3 10 >relevant-rename && + git add relevant-rename && + git commit -m A && + + echo foo >olddir/newfile && + git add olddir/newfile && + git commit -m B && + + echo bar >>olddir/newfile && + git add olddir/newfile && + git commit -m C && + + # + # Actual testing + # + + git switch upstream && + git config merge.directoryRenames true && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream..topic && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 2 calls && + + git ls-files >tracked && + test_line_count = 5 tracked && + test_path_is_missing olddir/newfile && + test_path_is_file newdir/newfile + ) +' + +# +# The following testcase is *very* similar to the last one, but instead of +# adding a new olddir/newfile, it renames somefile -> olddir/newfile: +# Base: relevant-rename_1 +# somefile_1 +# olddir/a +# olddir/b +# Upstream: rename relevant-rename_1 -> renamed_2 +# rename olddir/ -> newdir/ +# Topic_1: relevant-rename_3 +# Topic_2: rename somefile -> olddir/newfile_2 +# Topic_3: modify olddir/newfile_3 +# +# In this testcase, since the first commit being picked only modifies a +# file in the toplevel directory, the directory rename is irrelevant for +# that first merge. However, we need to notice the directory rename for +# the merge that picks the second commit, and we don't want the third +# commit to mess up its location either. We want to make sure that +# neither somefile or olddir/newfile exists in the result and that +# newdir/newfile does. +# +# This testcase needs one more call to rename detection than the last +# testcase, because of the somefile -> olddir/newfile rename in Topic_2. +test_expect_success 'dir rename unneeded, then rename existing file into old dir' ' + test_setup_upstream_rename dir-rename-unneeded-until-file-moved-inside && + ( + cd dir-rename-unneeded-until-file-moved-inside && + + git switch topic && + + test_seq 3 10 >relevant-rename && + git add relevant-rename && + git commit -m A && + + test_seq 1 10 >somefile && + git add somefile && + git mv somefile olddir/newfile && + git commit -m B && + + test_seq 1 12 >olddir/newfile && + git add olddir/newfile && + git commit -m C && + + # + # Actual testing + # + + git switch upstream && + git config merge.directoryRenames true && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream..topic && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 3 calls && + + test_path_is_missing somefile && + test_path_is_missing olddir/newfile && + test_path_is_file newdir/newfile && + git ls-files >tracked && + test_line_count = 4 tracked + ) +' + +# Helper for the next two tests +test_setup_topic_rename () { + test_create_repo $1 && + ( + cd $1 && + + test_seq 3 8 >somefile && + mkdir olddir && + test_seq 3 8 >olddir/a && + echo b >olddir/b && + git add olddir somefile && + git commit -m orig && + + git branch upstream && + git branch topic && + + git switch topic && + test_seq 1 8 >somefile && + test_seq 1 8 >olddir/a && + git add somefile olddir/a && + git mv olddir newdir && + git commit -m "Dir renamed" && + + test_seq 1 10 >somefile && + git add somefile && + mkdir olddir && + >olddir/unrelated-file && + git add olddir && + git commit -m "Unrelated file in recreated old dir" + ) +} + +# +# In the following testcase, the first commit on the topic branch renames +# a directory, while the second recreates the old directory and places a +# file into it: +# Base: somefile +# olddir/a +# olddir/b +# Upstream: olddir/newfile +# Topic_1: somefile_2 +# rename olddir/ -> newdir/ +# Topic_2: olddir/unrelated-file +# +# Note that the first pick should merge: +# Base: somefile +# olddir/{a,b} +# Upstream: olddir/newfile +# Topic_1: rename olddir/ -> newdir/ +# For which the expected result (assuming merge.directoryRenames=true) is +# clearly: +# Result: somefile +# newdir/{a, b, newfile} +# +# While the second pick does the following three-way merge: +# Base (Topic_1): somefile +# newdir/{a,b} +# Upstream (Result from 1): same files as base, but adds newdir/newfile +# Topic_2: same files as base, but adds olddir/unrelated-file +# +# The second merge is pretty trivial; upstream adds newdir/newfile, and +# topic_2 adds olddir/unrelated-file. We're just testing that we don't +# accidentally cache directory renames somehow and rename +# olddir/unrelated-file to newdir/unrelated-file. +# +# This testcase should only need one call to diffcore_rename_extended(). +test_expect_success 'caching renames only on upstream side, part 1' ' + test_setup_topic_rename cache-renames-only-upstream-add-file && + ( + cd cache-renames-only-upstream-add-file && + + git switch upstream && + + >olddir/newfile && + git add olddir/newfile && + git commit -m "Add newfile" && + + # + # Actual testing + # + + git switch upstream && + + git config merge.directoryRenames true && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream..topic && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 1 calls && + + git ls-files >tracked && + test_line_count = 5 tracked && + test_path_is_missing newdir/unrelated-file && + test_path_is_file olddir/unrelated-file && + test_path_is_file newdir/newfile && + test_path_is_file newdir/b && + test_path_is_file newdir/a && + test_path_is_file somefile + ) +' + +# +# The following testcase is *very* similar to the last one, but instead of +# adding a new olddir/newfile, it renames somefile -> olddir/newfile: +# Base: somefile +# olddir/a +# olddir/b +# Upstream: somefile_1 -> olddir/newfile +# Topic_1: rename olddir/ -> newdir/ +# somefile_2 +# Topic_2: olddir/unrelated-file +# somefile_3 +# +# Much like the previous test, this case is actually trivial and we are just +# making sure there isn't some spurious directory rename caching going on +# for the wrong side of history. +# +# +# This testcase should only need two calls to diffcore_rename_extended(), +# both for the first merge, one for each side of history. +# +test_expect_success 'caching renames only on upstream side, part 2' ' + test_setup_topic_rename cache-renames-only-upstream-rename-file && + ( + cd cache-renames-only-upstream-rename-file && + + git switch upstream && + + git mv somefile olddir/newfile && + git commit -m "Add newfile" && + + # + # Actual testing + # + + git switch upstream && + + git config merge.directoryRenames true && + + GIT_TRACE2_PERF="$(pwd)/trace.output" && + export GIT_TRACE2_PERF && + + test-tool fast-rebase --onto HEAD upstream~1 topic && + #git cherry-pick upstream..topic && + + grep region_enter.*diffcore_rename trace.output >calls && + test_line_count = 2 calls && + + git ls-files >tracked && + test_line_count = 4 tracked && + test_path_is_missing newdir/unrelated-file && + test_path_is_file olddir/unrelated-file && + test_path_is_file newdir/newfile && + test_path_is_file newdir/b && + test_path_is_file newdir/a + ) +' + +test_done