Merge branch 'rj/doc-technical-fixes'
Documentation mark-up fixes.

* rj/doc-technical-fixes:
  doc: add large-object-promisors.adoc to the docs build
  doc: commit-graph.adoc: fix up some formatting
  doc: sparse-checkout.adoc: fix asciidoc warnings
  doc: remembering-renames.adoc: fix asciidoc warnings

main
commit
411903ce4c
@@ -123,6 +123,7 @@ TECH_DOCS += technical/bundle-uri
TECH_DOCS += technical/commit-graph
TECH_DOCS += technical/directory-rename-detection
TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/large-object-promisors
TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri

@@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph:
Values 1-4 satisfy the requirements of parse_commit_gently().

There are two definitions of generation number:

1. Corrected committer dates (generation number v2)
2. Topological levels (generation number v1)

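As a rough illustration of the two definitions listed in this hunk, the sketch below computes both for a toy history. It assumes the usual formulations: a commit's topological level is one more than the largest level among its parents (1 for a root commit), and its corrected committer date is the larger of its own committer date and one more than the largest corrected date among its parents. The `Commit` tuple and the function name are made up for this example and are not Git's internal API.

```python
from typing import Dict, List, NamedTuple, Tuple


class Commit(NamedTuple):
    parents: List[str]
    committer_date: int  # seconds since the epoch


def generation_numbers(history: Dict[str, Commit]) -> Dict[str, Tuple[int, int]]:
    """Return {commit: (topological_level, corrected_date)}.

    `history` must list parents before children (a topological order),
    which keeps this sketch free of recursion.
    """
    result: Dict[str, Tuple[int, int]] = {}
    for name, commit in history.items():
        parent_levels = [result[p][0] for p in commit.parents]
        parent_dates = [result[p][1] for p in commit.parents]
        # v1: one more than the highest parent level (roots get 1).
        level = 1 + max(parent_levels, default=0)
        # v2: never smaller than any parent's corrected date plus one.
        corrected = max([commit.committer_date] + [d + 1 for d in parent_dates])
        result[name] = (level, corrected)
    return result


history = {
    "root": Commit([], 100),
    "skewed": Commit(["root"], 90),  # clock skew: dated before its parent
    "tip": Commit(["skewed"], 95),
}
print(generation_numbers(history))
```

The `skewed` commit shows why the second definition exists: its raw date is older than its parent's, yet its corrected date still sorts after the parent.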
@@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs,
we enable fast writes of new commit data without rewriting the entire commit
history -- at least, most of the time.

## File Layout
File Layout
~~~~~~~~~~~

A commit-graph chain uses multiple files, and we use a fixed naming convention
to organize these files. Each commit-graph file has a name
@@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest".

For example, if the `commit-graph-chain` file contains the lines

```
----
{hash0}
{hash1}
{hash2}
```
----

then the commit-graph chain looks like the following diagram:

@@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example,
`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
`{hash0}` and `{hash1}`.

## Merging commit-graph files
Merging commit-graph files
~~~~~~~~~~~~~~~~~~~~~~~~~~

If we only added a new commit-graph file on every write, we would run into a
linear search problem through many commit-graph files. Instead, we use a merge
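The relationship between the lines of `commit-graph-chain`, the `graph-{hash}.graph` file names, and the base hashes each layer records can be sketched as follows. This is only a minimal model of the layout described in the context lines; `chain_layers` is a hypothetical helper, not a Git function.

```python
from typing import List, Tuple


def chain_layers(chain_text: str) -> List[Tuple[str, List[str]]]:
    """Map the lines of a commit-graph-chain file to (file name, base hashes).

    Lines are ordered from the "lowest" layer to the "highest"; each layer
    records the hashes of every layer below it.
    """
    hashes = [line.strip() for line in chain_text.splitlines() if line.strip()]
    return [(f"graph-{h}.graph", hashes[:i]) for i, h in enumerate(hashes)]


for name, bases in chain_layers("{hash0}\n{hash1}\n{hash2}\n"):
    print(name, bases)
```

With the three-line example from the text, the top layer `graph-{hash2}.graph` comes out with `{hash0}` and `{hash1}` as its base hashes, matching the description in the hunk.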
@@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to
the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
file.

....
+---------------------+
|                     |
|    (new commits)    |
@@ -250,6 +254,7 @@ file.
|                       |
|                       |
+-----------------------+
....

During this process, the commits to write are combined, sorted and we write the
contents to a temporary file, all while holding a `commit-graph-chain.lock`
@@ -257,14 +262,15 @@ lock-file. When the file is flushed, we rename it to `graph-{hash3}`
according to the computed `{hash3}`. Finally, we write the new chain data to
`commit-graph-chain.lock`:

```
----
{hash3}
{hash0}
```
----

We then close the lock-file.

## Merge Strategy
Merge Strategy
~~~~~~~~~~~~~~

When writing a set of commits that do not exist in the commit-graph stack of
height N, we default to creating a new file at level N + 1. We then decide to
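The hunk context stops before the merge conditions are spelled out, so the following is only a sketch under stated assumptions. It assumes the rule documented for split commit-graph writes: the new layer is merged into the layer beneath it when that layer holds fewer than `size_multiple` times as many commits as the new one, or when the new layer holds more than `max_commits` commits, and the check then repeats against the next layer down. The function and parameter names are illustrative; the defaults are the values 2 and 64,000 quoted in the surrounding text.

```python
from typing import List


def plan_layers(existing: List[int], new_commits: int,
                size_multiple: int = 2, max_commits: int = 64000) -> List[int]:
    """Return commit counts per layer (lowest first) after one write.

    `existing` holds the commit counts of the current layers, lowest first.
    """
    layers = list(existing)
    top = new_commits
    # Assumed cascade: keep collapsing while either condition holds.
    while layers and (layers[-1] < size_multiple * top or top > max_commits):
        top += layers.pop()  # fold the layer below into the new file
    return layers + [top]


print(plan_layers([100000, 1000], 10))   # small write: new layer on top
print(plan_layers([100000, 1000], 600))  # 1000 < 2 * 600: merge one layer
```

Under the first condition each layer stays at least `size_multiple` times larger than the one above it, which is what keeps the number of layers small.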
@@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
number of commits) could be extracted into config settings for full
flexibility.

## Handling Mixed Generation Number Chains
Handling Mixed Generation Number Chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With the introduction of generation number v2 and generation data chunk, the
following scenario is possible:
@@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus,
rewriting split commit-graph as a single file (`--split=replace`) creates a
single layer with corrected commit dates.

## Deleting graph-{hash} files
Deleting graph-\{hash\} files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a new tip file is written, some `graph-{hash}` files may no longer
be part of a chain. It is important to remove these files from disk, eventually.
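The clean-up rule this section describes (delete `graph-{hash}` files that are no longer part of a chain once their modified time falls outside an expiry window, which defaults to zero) can be modelled as a pure function. This is a minimal sketch, not Git's implementation: `expired_graph_files` is a hypothetical helper, and treating a file modified exactly at the cutoff as expired is an assumption.

```python
from typing import Iterable, List, Set, Tuple


def expired_graph_files(files: Iterable[Tuple[str, float]],
                        chain_hashes: Set[str],
                        now: float,
                        expiry_window: float = 0.0) -> List[str]:
    """Pick graph-{hash}.graph files that are safe to delete.

    `files` yields (file name, modification time) pairs. A file is kept if
    its hash is still listed in the chain or if it was modified within the
    expiry window.
    """
    cutoff = now - expiry_window
    doomed = []
    for name, mtime in files:
        file_hash = name[len("graph-"):-len(".graph")]
        if file_hash not in chain_hashes and mtime <= cutoff:
            doomed.append(name)
    return doomed


files = [("graph-aaa.graph", 50.0), ("graph-bbb.graph", 10.0),
         ("graph-ccc.graph", 95.0)]
print(expired_graph_files(files, {"aaa"}, now=100.0, expiry_window=20.0))
```

A non-zero window gives concurrent readers that still hold the old chain some time before the files they reference disappear.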
@@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window
defaults to zero, but can be changed using command-line arguments or a config
setting.

## Chains across multiple object directories
Chains across multiple object directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a repo with alternates, we look for the `commit-graph-chain` file starting
in the local object directory and then in each alternate. The first file that

@@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:

https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/

0) Non goals
------------
Non goals
---------

- We will not discuss those client side improvements here, as they
would require changes in different parts of Git than this effort.
@@ -90,8 +90,8 @@ later in this document:
even more to host content with larger blobs or more large blobs
than currently.

I) Issues with the current situation
------------------------------------
I Issues with the current situation
-----------------------------------

- Some statistics made on GitLab repos have shown that more than 75%
of the disk space is used by blobs that are larger than 1MB and
@@ -138,8 +138,8 @@ I) Issues with the current situation
complaining that these tools require significant effort to set up,
learn and use correctly.

II) Main features of the "Large Object Promisors" solution
----------------------------------------------------------
II Main features of the "Large Object Promisors" solution
---------------------------------------------------------

The main features below should give a rough overview of how the
solution may work. Details about needed elements can be found in
@@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
other objects.

Note 1
++++++
^^^^^^

To clarify, a LOP is a normal promisor remote, except that:

@@ -178,7 +178,7 @@ To clarify, a LOP is a normal promisor remote, except that:
itself.

Note 2
++++++
^^^^^^

Git already makes it possible for a main remote to also be a promisor
remote storing both regular objects and large blobs for a client that
@@ -186,13 +186,13 @@ clones from it with a filter on blob size. But here we explicitly want
to avoid that.

Rationale
+++++++++
^^^^^^^^^

LOPs aim to be good at handling large blobs while main remotes are
already good at handling other objects.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

Git already has support for multiple promisor remotes, see
link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
underlying object storage appear like a remote to Git.

Note
++++
^^^^

A LOP can be a promisor remote accessed using a remote helper by
both some clients and the main remote.

Rationale
+++++++++
^^^^^^^^^

This looks like the simplest way to create LOPs that can cheaply
handle many large blobs.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

Remote helpers are quite easy to write as shell scripts, but it might
be more efficient and maintainable to write them using other languages
@@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
storage for large files handled by Git LFS.

Rationale
+++++++++
^^^^^^^^^

This would simplify the server side if it wants to both use a LOP and
act as a Git LFS server.
@@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
LOP all its blobs with a size over a configurable threshold.

Rationale
+++++++++
^^^^^^^^^

This makes it easy to set things up and to clean things up. For
example, an admin could use this to manually convert a repo not using
@@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
to regularly make sure the large blobs are moved to the LOP.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

Using something based on `git repack --filter=...` to separate the
blobs we want to offload from the other Git objects could be a good
@@ -284,13 +284,13 @@ should have ways to prevent oversize blobs to be fetched, and also
perhaps pushed, into it.

Rationale
+++++++++
^^^^^^^^^

A main remote containing many oversize blobs would defeat the purpose
of LOPs.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

The way to offload to a LOP discussed in 4) above can be used to
regularly offload oversize blobs. About preventing oversize blobs from
@@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
fetch those blobs from the LOP to be able to serve the client.

Note
++++
^^^^

For fetches instead of clones, a protocol negotiation might not always
happen, see the "What about fetches?" FAQ entry below for details.

Rationale
+++++++++
^^^^^^^^^

Security, configurability and efficiency of setting things up.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

A "promisor-remote" protocol v2 capability looks like a good way to
implement this. The way the client and server use this capability
@@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
but might not need anymore, to the LOP.

Note
++++
^^^^

It might depend on the context if it should be OK or not for clients
to offload large blobs they have created, instead of fetched, directly
@@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
implementing this feature.

Rationale
+++++++++
^^^^^^^^^

On the client, the easiest way to deal with unneeded large blobs is to
offload them.

Implementation
++++++++++++++
^^^^^^^^^^^^^^

This is very similar to what 4) above is about, except on the client
side instead of the server side. So a good solution to 4) could likely
@@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
a LOP, it is likely, and can easily be confirmed, that the LOP still
has them, so that they can just be removed from the client.

III) Benefits of using LOPs
---------------------------
III Benefits of using LOPs
--------------------------

Many benefits are related to the issues discussed in "I) Issues with
the current situation" above:
@@ -406,8 +406,8 @@ the current situation" above:

- Reduced storage needs on the client side.

IV) FAQ
-------
IV FAQ
------

What about using multiple LOPs on the server and client side?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
on a promisor remote.

Regular fetch
+++++++++++++
^^^^^^^^^^^^^

In a regular fetch, the client will contact the main remote and a
protocol negotiation will happen between them. It's a good thing that
@@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
using, or not using, the same LOP(s) as last time.

"Backfill" or "lazy" fetch
++++++++++++++++++++++++++
^^^^^^^^^^^^^^^^^^^^^^^^^^

When there is a backfill fetch, the client doesn't necessarily contact
the main remote first. It will try to fetch from its promisor remotes
@@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
token when performing a protocol negotiation with the main remote (see
section II.6 above).

V) Future improvements
----------------------
V Future improvements
---------------------

It is expected that at the beginning using LOPs will be mostly worth
it either in a corporate context where the Git version that clients

@@ -13,6 +13,7 @@ articles = [
'commit-graph.adoc',
'directory-rename-detection.adoc',
'hash-function-transition.adoc',
'large-object-promisors.adoc',
'long-running-process-protocol.adoc',
'multi-pack-index.adoc',
'packfile-uri.adoc',

@@ -10,32 +10,32 @@ history as an optimization, assuming all merges are automatic and clean

Outline:

0. Assumptions
1. Assumptions

1. How rebasing and cherry-picking work
2. How rebasing and cherry-picking work

2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
3. Why the renames on MERGE_SIDE1 in any given pick are *always* a
superset of the renames on MERGE_SIDE1 for the next pick.

3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
a rename on MERGE_SIDE1 for the next pick

4. A detailed description of the counter-examples to #3.
5. A detailed description of the counter-examples to #4.

5. Why the special cases in #4 are still fully reasonable to use to pair
6. Why the special cases in #5 are still fully reasonable to use to pair
up files for three-way content merging in the merge machinery, and why
they do not affect the correctness of the merge.

6. Interaction with skipping of "irrelevant" renames
7. Interaction with skipping of "irrelevant" renames

7. Additional items that need to be cached
8. Additional items that need to be cached

8. How directory rename detection interacts with the above and why this
9. How directory rename detection interacts with the above and why this
optimization is still safe even if merge.directoryRenames is set to
"true".


=== 0. Assumptions ===
== 1. Assumptions ==

There are two assumptions that will hold throughout this document:

@@ -44,8 +44,8 @@ There are two assumptions that will hold throughout this document:

* All merges are fully automatic

and a third that will hold in sections 2-5 for simplicity, that I'll later
address in section 8:
and a third that will hold in sections 3-6 for simplicity, that I'll later
address in section 9:

* No directory renames occur

@@ -77,9 +77,9 @@ conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or cherry
pick stops for the user to resolve the operation.

The third assumption makes sections 2-5 simpler, and allows people to
The third assumption makes sections 3-6 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 8. It is probably
then I can go back and address the specifics in section 9. It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
@@ -88,22 +88,26 @@ reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 8.
this discussion is deferred until section 9.


=== 1. How rebasing and cherry-picking work ===
== 2. How rebasing and cherry-picking work ==

Consider the following setup (from the git-rebase manpage):

------------
      A---B---C topic
     /
D---E---F---G main
------------

After rebasing or cherry-picking topic onto main, this will appear as:

------------
              A'--B'--C' topic
             /
D---E---F---G main
------------

The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
@@ -111,6 +115,7 @@ A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be:

....
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
@@ -125,6 +130,7 @@ To create C':
MERGE_BASE: B
MERGE_SIDE1: B'
MERGE_SIDE2: C
....

Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a
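The pattern behind the three merges listed in this section can be written down mechanically: each pick uses the picked commit's parent as MERGE_BASE, the tip built so far as MERGE_SIDE1, and the picked commit itself as MERGE_SIDE2. The sketch below only reproduces that bookkeeping for the A-B-C example; `pick_merges` and its arguments are hypothetical names, not part of Git.

```python
from typing import Dict, List, Tuple


def pick_merges(onto: str, picks: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """List the three-way merge performed for each picked commit.

    `picks` holds (parent, commit) pairs in the order they are replayed;
    `onto` is the commit the first pick lands on.
    """
    merges = []
    side1 = onto
    for parent, commit in picks:
        result = commit + "'"
        merges.append({"creates": result, "MERGE_BASE": parent,
                       "MERGE_SIDE1": side1, "MERGE_SIDE2": commit})
        side1 = result  # the next pick builds on the commit just created
    return merges


for merge in pick_merges("G", [("E", "A"), ("A", "B"), ("B", "C")]):
    print(merge)
```

Note how MERGE_SIDE1 of each pick is the result of the previous one, which is what makes renames detected on that side worth remembering.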
@@ -138,8 +144,7 @@ Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.


=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
=== superset of the renames on MERGE_SIDE1 for the next pick. ===
== 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. ==

The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
@@ -156,6 +161,7 @@ filename under one of three conditions:
First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:

....
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
@@ -165,6 +171,7 @@ To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
....

So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.
@@ -181,11 +188,11 @@ are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'.


=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ ===
=== always also a rename on MERGE_SIDE1 for the next pick. ===
== 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. ==

Let's again look at the first two picks:

....
To create A':
MERGE_BASE: E
MERGE_SIDE1: G
@@ -195,17 +202,25 @@ To create B':
MERGE_BASE: A
MERGE_SIDE1: A'
MERGE_SIDE2: B
....

Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:

....
E:oldfile
G:newfile
A:oldfile
....

and produce a new result:

....
A':newfile
....

Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict
@@ -254,19 +269,21 @@ were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.


=== 4. A detailed description of the counter-examples to #3. ===
== 5. A detailed description of the counter-examples to #4. ==

We already noted in section 3 that rename/rename(1to1) (i.e. both sides
We already noted in section 4 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?

Let's repeat an earlier point that section 3 made:
Let's repeat an earlier point that section 4 made:

....
A':newfile was created by applying the changes between E:oldfile and
G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
<50% of the size of E:oldfile.
....

If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
@@ -276,18 +293,21 @@ still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.

Here's an example where that can happen:

* E:oldfile had 20 lines
* G:newfile added 10 new lines at the beginning of the file
* A:oldfile kept the first 3 lines of the file, and deleted all the rest

then

....
=> A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be.
....


=== 5. Why the special cases in #4 are still fully reasonable to use to ===
=== pair up files for three-way content merging in the merge machinery, ===
=== and why they do not affect the correctness of the merge. ===
== 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. ==

In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename. However, files with the same filename
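The arithmetic behind the 20-line example can be checked with a deliberately simplified model. Git's real similarity score is computed on content, not line counts; the sketch below assumes a line-based stand-in (shared lines divided by the size of the larger file) purely to show why one pair clears the 50% threshold and the other does not. `line_similarity` and `THRESHOLD` are names invented for this illustration.

```python
def line_similarity(old_lines: int, new_lines: int, shared_lines: int) -> float:
    """Fraction of the larger file that the two versions have in common."""
    return shared_lines / max(old_lines, new_lines)


THRESHOLD = 0.5  # the 50% cut-off the text refers to

# E:oldfile (20 lines) -> G:newfile (10 lines added at the top, 30 total)
e_to_g = line_similarity(20, 30, shared_lines=20)
# A:oldfile (first 3 lines kept) -> A':newfile (10 added + 3 kept = 13)
a_to_a_prime = line_similarity(3, 13, shared_lines=3)

print(f"E->G  {e_to_g:.2f} rename={e_to_g >= THRESHOLD}")
print(f"A->A' {a_to_a_prime:.2f} rename={a_to_a_prime >= THRESHOLD}")
```

The same 10 added lines are a minority of the 30-line file but a large majority of the 13-line one, which is the whole counter-example.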
@@ -295,14 +315,14 @@ are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 3 and 4. The remainder of this section
the time discussing in sections 4 and 5. The remainder of this section
will be devoted to that case as well.

So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons:

* As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
* As noted in sections 4 and 5, the diff between A:oldfile and A':newfile
is *exactly* the same as the diff between E:oldfile and G:newfile. The
latter pair were detected as renames, so it seems unlikely to surprise
users for us to treat A:oldfile and A':newfile as renames.
@@ -394,7 +414,7 @@ cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.


=== 6. Interaction with skipping of "irrelevant" renames ===
== 7. Interaction with skipping of "irrelevant" renames ==

Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits:
@@ -421,24 +441,27 @@ detection -- though we can limit it to the paths for which we have not
already detected renames.


=== 7. Additional items that need to be cached ===
== 8. Additional items that need to be cached ==

It turns out we have to cache more than just renames; we also cache:

....
A) non-renames (i.e. unpaired deletes)
B) counts of renames within directories
C) sources that were marked as RELEVANT_LOCATION, but which were
downgraded to RELEVANT_NO_MORE
D) the toplevel trees involved in the merge
....

These are all stored in struct rename_info, and respectively appear in

* cached_pairs (along side actual renames, just with a value of NULL)
* dir_rename_counts
* cached_irrelevant
* merge_trees

The reason for (A) comes from the irrelevant renames skipping
optimization discussed in section 6. The fact that irrelevant renames
The reason for `(A)` comes from the irrelevant renames skipping
optimization discussed in section 7. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
@@ -447,23 +470,24 @@ deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.

The reason for (B) is that diffcore_rename_extended() is what
The reason for `(B)` is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.

The reason for (C) is that merge-ort's tree traversal will again think
The reason for `(C)` is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)

The reason for (D) is that is how we determine whether the remember
The reason for `(D)` is that is how we determine whether the remember
renames optimization can be used. In particular, remembering that our
sequence of merges looks like:

....
Merge 1:
MERGE_BASE: E
MERGE_SIDE1: G
@@ -475,6 +499,7 @@ sequence of merges looks like:
MERGE_SIDE1: A'
MERGE_SIDE2: B
=> Creates B'
....

It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So
@@ -482,12 +507,11 @@ we store the trees to compare with what we are asked to merge next
time.


=== 8. How directory rename detection interacts with the above and ===
=== why this optimization is still safe even if ===
=== merge.directoryRenames is set to "true". ===
== 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". ==

As noted in the assumptions section:

....
"""
...if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
@@ -497,11 +521,13 @@ As noted in the assumptions section:
is that some users will have set merge.directoryRenames to "true" to
allow the merges to continue to proceed automatically.
"""
....

Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section
one:

....
First pick does this three-way merge:
MERGE_BASE: E
MERGE_SIDE1: G
@@ -513,6 +539,7 @@ one:
MERGE_SIDE1: A'
MERGE_SIDE2: B
=> creates B'
....

Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
@@ -545,7 +572,7 @@ while considering all of these cases:
concerned; see the assumptions section). Two interesting sub-notes
about these counts:

* If we need to perform rename-detection again on the given side (e.g.
** If we need to perform rename-detection again on the given side (e.g.
some paths are relevant for rename detection that weren't before),
then we clear dir_rename_counts and recompute it, making use of
cached_pairs. The reason it is important to do this is optimizations
@@ -556,7 +583,7 @@ while considering all of these cases:
easiest way to "fix up" dir_rename_counts in such cases is to just
recompute it.

* If we prune rename/rename(1to1) entries from the cache, then we also
** If we prune rename/rename(1to1) entries from the cache, then we also
need to update dir_rename_counts to decrement the counts for the
involved directory and any relevant parent directories (to undo what
update_dir_rename_counts() in diffcore-rename.c incremented when the
@@ -578,6 +605,7 @@ in order:

Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir

....
This case looks like this:

MERGE_BASE: E, Has olddir/
@@ -595,10 +623,13 @@ Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
....

Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir

....
This case looks like this:

MERGE_BASE: E oldfile, olddir/
MERGE_SIDE1: G oldfile, olddir/ -> newdir/
MERGE_SIDE2: A oldfile -> olddir/newfile
@@ -617,9 +648,11 @@ Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir

Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
....

Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir

....
This case looks like this:

MERGE_BASE: E, Has olddir/
@@ -635,9 +668,11 @@ Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
In this case, with the optimization, note that after the first commit there
were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
But the second merge didn't need any renames so this is fine.
....

Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

....
This case looks like this:

MERGE_BASE: E, Has olddir/
@@ -658,6 +693,7 @@ Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'.
....

Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization means we sometimes don't detect

File diff suppressed because it is too large