Merge branch 'rj/doc-technical-fixes'

Documentation mark-up fixes.

* rj/doc-technical-fixes:
  doc: add large-object-promisors.adoc to the docs build
  doc: commit-graph.adoc: fix up some formatting
  doc: sparse-checkout.adoc: fix asciidoc warnings
  doc: remembering-renames.adoc: fix asciidoc warnings
2025-10-24 13:48:04 -07:00 · 411903ce4c
parent 1d10771264 1c1fc86d55
commit 411903ce4c
6 changed files with 494 additions and 399 deletions
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -123,6 +123,7 @@ TECH_DOCS += technical/bundle-uri
 TECH_DOCS += technical/commit-graph
 TECH_DOCS += technical/directory-rename-detection
 TECH_DOCS += technical/hash-function-transition
+TECH_DOCS += technical/large-object-promisors
 TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/packfile-uri
--- a/Documentation/technical/commit-graph.adoc
+++ b/Documentation/technical/commit-graph.adoc
@@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph:
 Values 1-4 satisfy the requirements of parse_commit_gently().

 There are two definitions of generation number:
+
 1. Corrected committer dates (generation number v2)
 2. Topological levels (generation number v1)

@@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs,
 we enable fast writes of new commit data without rewriting the entire commit
 history -- at least, most of the time.

-## File Layout
+File Layout
+~~~~~~~~~~~

 A commit-graph chain uses multiple files, and we use a fixed naming convention
 to organize these files. Each commit-graph file has a name
@@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest".

 For example, if the `commit-graph-chain` file contains the lines

-```
+----
 	{hash0}
 	{hash1}
 	{hash2}
-```
+----

 then the commit-graph chain looks like the following diagram:

@@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example,
 `graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
 `{hash0}` and `{hash1}`.

-## Merging commit-graph files
+Merging commit-graph files
+~~~~~~~~~~~~~~~~~~~~~~~~~~

 If we only added a new commit-graph file on every write, we would run into a
 linear search problem through many commit-graph files.  Instead, we use a merge
@@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to
 the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
 file.

+....
 			    +---------------------+
 			    |                     |
 			    |    (new commits)    |
@@ -250,6 +254,7 @@ file.
 |                       |
 |                       |
 +-----------------------+
+....

 During this process, the commits to write are combined, sorted and we write the
 contents to a temporary file, all while holding a `commit-graph-chain.lock`
@@ -257,14 +262,15 @@ lock-file.  When the file is flushed, we rename it to `graph-{hash3}`
 according to the computed `{hash3}`. Finally, we write the new chain data to
 `commit-graph-chain.lock`:

-```
+----
 	{hash3}
 	{hash0}
-```
+----

 We then close the lock-file.

-## Merge Strategy
+Merge Strategy
+~~~~~~~~~~~~~~

 When writing a set of commits that do not exist in the commit-graph stack of
 height N, we default to creating a new file at level N + 1. We then decide to
@@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.

-## Handling Mixed Generation Number Chains
+Handling Mixed Generation Number Chains
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 With the introduction of generation number v2 and generation data chunk, the
 following scenario is possible:
@@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus,
 rewriting split commit-graph as a single file (`--split=replace`) creates a
 single layer with corrected commit dates.

-## Deleting graph-{hash} files
+Deleting graph-\{hash\} files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 After a new tip file is written, some `graph-{hash}` files may no longer
 be part of a chain. It is important to remove these files from disk, eventually.
@@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window
 defaults to zero, but can be changed using command-line arguments or a config
 setting.

-## Chains across multiple object directories
+Chains across multiple object directories
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In a repo with alternates, we look for the `commit-graph-chain` file starting
 in the local object directory and then in each alternate. The first file that
--- a/Documentation/technical/large-object-promisors.adoc
+++ b/Documentation/technical/large-object-promisors.adoc
@@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:

 https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/

-0) Non goals
------------
+Non goals
+---------

 - We will not discuss those client side improvements here, as they
  would require changes in different parts of Git than this effort.
@@ -90,8 +90,8 @@ later in this document:
    even more to host content with larger blobs or more large blobs
    than currently.

-I) Issues with the current situation
------------------------------------
+I Issues with the current situation
+-----------------------------------

 - Some statistics made on GitLab repos have shown that more than 75%
  of the disk space is used by blobs that are larger than 1MB and
@@ -138,8 +138,8 @@ I) Issues with the current situation
  complaining that these tools require significant effort to set up,
  learn and use correctly.

-II) Main features of the "Large Object Promisors" solution
----------------------------------------------------------
+II Main features of the "Large Object Promisors" solution
+---------------------------------------------------------

 The main features below should give a rough overview of how the
 solution may work. Details about needed elements can be found in
@@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
 other objects.

 Note 1
-++++++
+^^^^^^

 To clarify, a LOP is a normal promisor remote, except that:

@@ -178,7 +178,7 @@ To clarify, a LOP is a normal promisor remote, except that:
  itself.

 Note 2
-++++++
+^^^^^^

 Git already makes it possible for a main remote to also be a promisor
 remote storing both regular objects and large blobs for a client that
@@ -186,13 +186,13 @@ clones from it with a filter on blob size. But here we explicitly want
 to avoid that.

 Rationale
-+++++++++
+^^^^^^^^^

 LOPs aim to be good at handling large blobs while main remotes are
 already good at handling other objects.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 Git already has support for multiple promisor remotes, see
 link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
 underlying object storage appear like a remote to Git.

 Note
-++++
+^^^^

 A LOP can be a promisor remote accessed using a remote helper by
 both some clients and the main remote.

 Rationale
-+++++++++
+^^^^^^^^^

 This looks like the simplest way to create LOPs that can cheaply
 handle many large blobs.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 Remote helpers are quite easy to write as shell scripts, but it might
 be more efficient and maintainable to write them using other languages
@@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
 storage for large files handled by Git LFS.

 Rationale
-+++++++++
+^^^^^^^^^

 This would simplify the server side if it wants to both use a LOP and
 act as a Git LFS server.
@@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
 LOP all its blobs with a size over a configurable threshold.

 Rationale
-+++++++++
+^^^^^^^^^

 This makes it easy to set things up and to clean things up. For
 example, an admin could use this to manually convert a repo not using
@@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
 to regularly make sure the large blobs are moved to the LOP.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 Using something based on `git repack --filter=...` to separate the
 blobs we want to offload from the other Git objects could be a good
@@ -284,13 +284,13 @@ should have ways to prevent oversize blobs to be fetched, and also
 perhaps pushed, into it.

 Rationale
-+++++++++
+^^^^^^^^^

 A main remote containing many oversize blobs would defeat the purpose
 of LOPs.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 The way to offload to a LOP discussed in 4) above can be used to
 regularly offload oversize blobs. About preventing oversize blobs from
@@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
 fetch those blobs from the LOP to be able to serve the client.

 Note
-++++
+^^^^

 For fetches instead of clones, a protocol negotiation might not always
 happen, see the "What about fetches?" FAQ entry below for details.

 Rationale
-+++++++++
+^^^^^^^^^

 Security, configurability and efficiency of setting things up.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 A "promisor-remote" protocol v2 capability looks like a good way to
 implement this. The way the client and server use this capability
@@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
 but might not need anymore, to the LOP.

 Note
-++++
+^^^^

 It might depend on the context if it should be OK or not for clients
 to offload large blobs they have created, instead of fetched, directly
@@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
 implementing this feature.

 Rationale
-+++++++++
+^^^^^^^^^

 On the client, the easiest way to deal with unneeded large blobs is to
 offload them.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 This is very similar to what 4) above is about, except on the client
 side instead of the server side. So a good solution to 4) could likely
@@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
 a LOP, it is likely, and can easily be confirmed, that the LOP still
 has them, so that they can just be removed from the client.

-III) Benefits of using LOPs
---------------------------
+III Benefits of using LOPs
+--------------------------

 Many benefits are related to the issues discussed in "I) Issues with
 the current situation" above:
@@ -406,8 +406,8 @@ the current situation" above:

 - Reduced storage needs on the client side.

-IV) FAQ
-------
+IV FAQ
+------

 What about using multiple LOPs on the server and client side?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
 on a promisor remote.

 Regular fetch
-+++++++++++++
+^^^^^^^^^^^^^

 In a regular fetch, the client will contact the main remote and a
 protocol negotiation will happen between them. It's a good thing that
@@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
 using, or not using, the same LOP(s) as last time.

 "Backfill" or "lazy" fetch
-++++++++++++++++++++++++++
+^^^^^^^^^^^^^^^^^^^^^^^^^^

 When there is a backfill fetch, the client doesn't necessarily contact
 the main remote first. It will try to fetch from its promisor remotes
@@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
 token when performing a protocol negotiation with the main remote (see
 section II.6 above).

-V) Future improvements
----------------------
+V Future improvements
+---------------------

 It is expected that at the beginning using LOPs will be mostly worth
 it either in a corporate context where the Git version that clients
--- a/Documentation/technical/meson.build
+++ b/Documentation/technical/meson.build
@@ -13,6 +13,7 @@ articles = [
  'commit-graph.adoc',
  'directory-rename-detection.adoc',
  'hash-function-transition.adoc',
+  'large-object-promisors.adoc',
  'long-running-process-protocol.adoc',
  'multi-pack-index.adoc',
  'packfile-uri.adoc',
--- a/Documentation/technical/remembering-renames.adoc
+++ b/Documentation/technical/remembering-renames.adoc
@@ -10,32 +10,32 @@ history as an optimization, assuming all merges are automatic and clean

 Outline:

-  0. Assumptions
+  1. Assumptions

-  1. How rebasing and cherry-picking work
+  2. How rebasing and cherry-picking work

-  2. Why the renames on MERGE_SIDE1 in any given pick are *always* a
+  3. Why the renames on MERGE_SIDE1 in any given pick are *always* a
     superset of the renames on MERGE_SIDE1 for the next pick.

-  3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
+  4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
     a rename on MERGE_SIDE1 for the next pick

-  4. A detailed description of the counter-examples to #3.
+  5. A detailed description of the counter-examples to #4.

-  5. Why the special cases in #4 are still fully reasonable to use to pair
+  6. Why the special cases in #5 are still fully reasonable to use to pair
     up files for three-way content merging in the merge machinery, and why
     they do not affect the correctness of the merge.

-  6. Interaction with skipping of "irrelevant" renames
+  7. Interaction with skipping of "irrelevant" renames

-  7. Additional items that need to be cached
+  8. Additional items that need to be cached

-  8. How directory rename detection interacts with the above and why this
+  9. How directory rename detection interacts with the above and why this
     optimization is still safe even if merge.directoryRenames is set to
     "true".


-=== 0. Assumptions ===
+== 1. Assumptions ==

 There are two assumptions that will hold throughout this document:

@@ -44,8 +44,8 @@ There are two assumptions that will hold throughout this document:

  * All merges are fully automatic

-and a third that will hold in sections 2-5 for simplicity, that I'll later
-address in section 8:
+and a third that will hold in sections 3-6 for simplicity, that I'll later
+address in section 9:

  * No directory renames occur

@@ -77,9 +77,9 @@ conflicts that the user needs to resolve), the cache of renames is not
 stored on disk, and thus is thrown away as soon as the rebase or cherry
 pick stops for the user to resolve the operation.

-The third assumption makes sections 2-5 simpler, and allows people to
+The third assumption makes sections 3-6 simpler, and allows people to
 understand the basics of why this optimization is safe and effective, and
-then I can go back and address the specifics in section 8.  It is probably
+then I can go back and address the specifics in section 9.  It is probably
 also worth noting that if directory renames do occur, then the default of
 merge.directoryRenames being set to "conflict" means that the operation
 will stop for users to resolve the conflicts and the cache will be thrown
@@ -88,22 +88,26 @@ reason we need to address directory renames specifically, is that some
 users will have set merge.directoryRenames to "true" to allow the merges to
 continue to proceed automatically.  The optimization is still safe with
 this config setting, but we have to discuss a few more cases to show why;
-this discussion is deferred until section 8.
+this discussion is deferred until section 9.


-=== 1. How rebasing and cherry-picking work ===
+== 2. How rebasing and cherry-picking work ==

 Consider the following setup (from the git-rebase manpage):

+------------
 		     A---B---C topic
 		    /
 	       D---E---F---G main
+------------

 After rebasing or cherry-picking topic onto main, this will appear as:

+------------
 			     A'--B'--C' topic
 			    /
 	       D---E---F---G main
+------------

 The way the commits A', B', and C' are created is through a series of
 merges, where rebase or cherry-pick sequentially uses each of the three
@@ -111,6 +115,7 @@ A-B-C commits in a special merge operation.  Let's label the three commits
 in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2.  For
 this picture, the three commits for each of the three merges would be:

+....
 To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
@@ -125,6 +130,7 @@ To create C':
   MERGE_BASE:   B
   MERGE_SIDE1:  B'
   MERGE_SIDE2:  C
+....

 Sometimes, folks are surprised that these three-way merges are done.  It
 can be useful in understanding these three-way merges to view them in a
@@ -138,8 +144,7 @@ Conceptually the two statements above are the same as a three-way merge of
 B, B', and C, at least the parts before you decide to record a commit.


-=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a ===
-===    superset of the renames on MERGE_SIDE1 for the next pick.     ===
+== 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. ==

 The merge machinery uses the filenames it is fed from MERGE_BASE,
 MERGE_SIDE1, and MERGE_SIDE2.  It will only move content to a different
@@ -156,6 +161,7 @@ filename under one of three conditions:
 First, let's remember what commits are involved in the first and second
 picks of the cherry-pick or rebase sequence:

+....
 To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
@@ -165,6 +171,7 @@ To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
+....

 So, in particular, we need to show that the renames between E and G are a
 superset of those between A and A'.
@@ -181,11 +188,11 @@ are a subset of those between E and G.  Equivalently, all renames between E
 and G are a superset of those between A and A'.


-=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_   ===
-===    always also a rename on MERGE_SIDE1 for the next pick.        ===
+== 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. ==

 Let's again look at the first two picks:

+....
 To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
@@ -195,17 +202,25 @@ To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
+....

 Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
 any given rename from E to G.  Let's use the filenames 'oldfile' and
 'newfile' for demonstration purposes.  That first pick will function as
 follows; when the rename is detected, the merge machinery will do a
 three-way content merge of the following:
+
+....
    E:oldfile
    G:newfile
    A:oldfile
+....
+
 and produce a new result:
+
+....
    A':newfile
+....

 Note above that I've assumed that E->A did not rename oldfile.  If that
 side did rename, then we most likely have a rename/rename(1to2) conflict
@@ -254,19 +269,21 @@ were detected as renames, A:oldfile and A':newfile should also be
 detectable as renames almost always.


-=== 4. A detailed description of the counter-examples to #3.         ===
+== 5. A detailed description of the counter-examples to #4. ==

-We already noted in section 3 that rename/rename(1to1) (i.e. both sides
+We already noted in section 4 that rename/rename(1to1) (i.e. both sides
 renaming a file the same way) was one counter-example.  The more
 interesting bit, though, is why did we need to use the "almost" qualifier
 when stating that A:oldfile and A':newfile are "almost" always detectable
 as renames?

-Let's repeat an earlier point that section 3 made:
+Let's repeat an earlier point that section 4 made:

+....
  A':newfile was created by applying the changes between E:oldfile and
  G:newfile to A:oldfile.  The changes between E:oldfile and G:newfile were
  <50% of the size of E:oldfile.
+....

 If those changes that were <50% of the size of E:oldfile are also <50% of
 the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
@@ -276,18 +293,21 @@ still somehow merge cleanly), then traditional rename detection would not
 detect A:oldfile and A':newfile as renames.

 Here's an example where that can happen:
+
  * E:oldfile had 20 lines
  * G:newfile added 10 new lines at the beginning of the file
  * A:oldfile kept the first 3 lines of the file, and deleted all the rest
+
 then
+
+....
  => A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
-E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
-A':newfile would not be.
+  E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
+  A':newfile would not be.
+....


-=== 5. Why the special cases in #4 are still fully reasonable to use to    ===
-===    pair up files for three-way content merging in the merge machinery, ===
-===    and why they do not affect the correctness of the merge.            ===
+== 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. ==

 In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
 since they use the *same* filename.  However, files with the same filename
@@ -295,14 +315,14 @@ are obviously fine to pair up for three-way content merging (the merge
 machinery has never employed break detection).  The interesting
 counter-example case is thus not the rename/rename(1to1) case, but the case
 where A did not rename oldfile.  That was the case that we spent most of
-the time discussing in sections 3 and 4.  The remainder of this section
+the time discussing in sections 4 and 5.  The remainder of this section
 will be devoted to that case as well.

 So, even if A:oldfile and A':newfile aren't detectable as renames, why is
 it still reasonable to pair them up for three-way content merging in the
 merge machinery?  There are multiple reasons:

-  * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile
+  * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile
    is *exactly* the same as the diff between E:oldfile and G:newfile.  The
    latter pair were detected as renames, so it seems unlikely to surprise
    users for us to treat A:oldfile and A':newfile as renames.
@@ -394,7 +414,7 @@ cases 1 and 3 seem to provide as good or better behavior with the
 optimization than without.


-=== 6. Interaction with skipping of "irrelevant" renames ===
+== 7. Interaction with skipping of "irrelevant" renames ==

 Previous optimizations involved skipping rename detection for paths
 considered to be "irrelevant".  See for example the following commits:
@@ -421,24 +441,27 @@ detection -- though we can limit it to the paths for which we have not
 already detected renames.


-=== 7. Additional items that need to be cached ===
+== 8. Additional items that need to be cached ==

 It turns out we have to cache more than just renames; we also cache:

+....
  A) non-renames (i.e. unpaired deletes)
  B) counts of renames within directories
  C) sources that were marked as RELEVANT_LOCATION, but which were
     downgraded to RELEVANT_NO_MORE
  D) the toplevel trees involved in the merge
+....

 These are all stored in struct rename_info, and respectively appear in
+
  * cached_pairs (along side actual renames, just with a value of NULL)
  * dir_rename_counts
  * cached_irrelevant
  * merge_trees

-The reason for (A) comes from the irrelevant renames skipping
-optimization discussed in section 6.  The fact that irrelevant renames
+The reason for `(A)` comes from the irrelevant renames skipping
+optimization discussed in section 7.  The fact that irrelevant renames
 are skipped means we only get a subset of the potential renames
 detected and subsequent commits may need to run rename detection on
 the upstream side on a subset of the remaining renames (to get the
@@ -447,23 +470,24 @@ deletes are involved in rename detection too, we don't want to
 repeatedly check that those paths remain unpaired on the upstream side
 with every commit we are transplanting.

-The reason for (B) is that diffcore_rename_extended() is what
+The reason for `(B)` is that diffcore_rename_extended() is what
 generates the counts of renames by directory which is needed in
 directory rename detection, and if we don't run
 diffcore_rename_extended() again then we need to have the output from
 it, including dir_rename_counts, from the previous run.

-The reason for (C) is that merge-ort's tree traversal will again think
+The reason for `(C)` is that merge-ort's tree traversal will again think
 those paths are relevant (marking them as RELEVANT_LOCATION), but the
 fact that they were downgraded to RELEVANT_NO_MORE means that
 dir_rename_counts already has the information we need for directory
 rename detection.  (A path which becomes RELEVANT_CONTENT in a
 subsequent commit will be removed from cached_irrelevant.)

-The reason for (D) is that is how we determine whether the remember
+The reason for `(D)` is that is how we determine whether the remember
 renames optimization can be used.  In particular, remembering that our
 sequence of merges looks like:

+....
   Merge 1:
   MERGE_BASE:   E
   MERGE_SIDE1:  G
@@ -475,6 +499,7 @@ sequence of merges looks like:
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
   => Creates    B'
+....

 It is the fact that the trees A and A' appear both in Merge 1 and in
 Merge 2, with A as a parent of A' that allows this optimization.  So
@@ -482,12 +507,11 @@ we store the trees to compare with what we are asked to merge next
 time.


-=== 8. How directory rename detection interacts with the above and   ===
-===    why this optimization is still safe even if                   ===
-===    merge.directoryRenames is set to "true".                      ===
+== 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". ==

 As noted in the assumptions section:

+....
    """
    ...if directory renames do occur, then the default of
    merge.directoryRenames being set to "conflict" means that the operation
@@ -497,11 +521,13 @@ As noted in the assumptions section:
    is that some users will have set merge.directoryRenames to "true" to
    allow the merges to continue to proceed automatically.
    """
+....

 Let's remember that we need to look at how any given pick affects the next
 one.  So let's again use the first two picks from the diagram in section
 one:

+....
  First pick does this three-way merge:
    MERGE_BASE:   E
    MERGE_SIDE1:  G
@@ -513,6 +539,7 @@ one:
    MERGE_SIDE1:  A'
    MERGE_SIDE2:  B
    => creates B'
+....

 Now, directory rename detection exists so that if one side of history
 renames a directory, and the other side adds a new file to the old
@@ -545,7 +572,7 @@ while considering all of these cases:
    concerned; see the assumptions section).  Two interesting sub-notes
    about these counts:

-    * If we need to perform rename-detection again on the given side (e.g.
+   ** If we need to perform rename-detection again on the given side (e.g.
      some paths are relevant for rename detection that weren't before),
      then we clear dir_rename_counts and recompute it, making use of
      cached_pairs.  The reason it is important to do this is optimizations
@@ -556,7 +583,7 @@ while considering all of these cases:
      easiest way to "fix up" dir_rename_counts in such cases is to just
      recompute it.

-    * If we prune rename/rename(1to1) entries from the cache, then we also
+   ** If we prune rename/rename(1to1) entries from the cache, then we also
      need to update dir_rename_counts to decrement the counts for the
      involved directory and any relevant parent directories (to undo what
      update_dir_rename_counts() in diffcore-rename.c incremented when the
@@ -578,6 +605,7 @@ in order:

 Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir

+....
  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
@@ -595,10 +623,13 @@ Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
+....

 Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir

+....
  This case looks like this:
+
    MERGE_BASE:   E    oldfile, olddir/
    MERGE_SIDE1:  G    oldfile, olddir/ -> newdir/
    MERGE_SIDE2:  A    oldfile -> olddir/newfile
@@ -617,9 +648,11 @@ Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
+....

 Case 3: MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir

+....
  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
@@ -635,9 +668,11 @@ Case 3: MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir
  In this case, with the optimization, note that after the first commit there
  were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
  But the second merge didn't need any renames so this is fine.
+....

 Case 4: MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir

+....
  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
@@ -658,6 +693,7 @@ Case 4: MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
+....

 Finally, I'll just note here that interactions with the
 skip-irrelevant-renames optimization means we sometimes don't detect
--- a/Documentation/technical/sparse-checkout.adoc
+++ b/Documentation/technical/sparse-checkout.adoc