Merge branch 'rj/doc-technical-fixes'

Documentation mark-up fixes.

* rj/doc-technical-fixes:
  doc: add large-object-promisors.adoc to the docs build
  doc: commit-graph.adoc: fix up some formatting
  doc: sparse-checkout.adoc: fix asciidoc warnings
  doc: remembering-renames.adoc: fix asciidoc warnings
Junio C Hamano 2025-10-24 13:48:04 -07:00
commit 411903ce4c
6 changed files with 494 additions and 399 deletions


@ -123,6 +123,7 @@ TECH_DOCS += technical/bundle-uri
TECH_DOCS += technical/commit-graph TECH_DOCS += technical/commit-graph
TECH_DOCS += technical/directory-rename-detection TECH_DOCS += technical/directory-rename-detection
TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/large-object-promisors
TECH_DOCS += technical/long-running-process-protocol TECH_DOCS += technical/long-running-process-protocol
TECH_DOCS += technical/multi-pack-index TECH_DOCS += technical/multi-pack-index
TECH_DOCS += technical/packfile-uri TECH_DOCS += technical/packfile-uri


@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph:
Values 1-4 satisfy the requirements of parse_commit_gently(). Values 1-4 satisfy the requirements of parse_commit_gently().


There are two definitions of generation number: There are two definitions of generation number:

1. Corrected committer dates (generation number v2) 1. Corrected committer dates (generation number v2)
2. Topological levels (generation number v1) 2. Topological levels (generation number v1)


@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs,
we enable fast writes of new commit data without rewriting the entire commit we enable fast writes of new commit data without rewriting the entire commit
history -- at least, most of the time. history -- at least, most of the time.


## File Layout File Layout
~~~~~~~~~~~


A commit-graph chain uses multiple files, and we use a fixed naming convention A commit-graph chain uses multiple files, and we use a fixed naming convention
to organize these files. Each commit-graph file has a name to organize these files. Each commit-graph file has a name
@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest".


For example, if the `commit-graph-chain` file contains the lines For example, if the `commit-graph-chain` file contains the lines


``` ----
{hash0} {hash0}
{hash1} {hash1}
{hash2} {hash2}
``` ----


then the commit-graph chain looks like the following diagram: then the commit-graph chain looks like the following diagram:


@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example,
`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains `graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
`{hash0}` and `{hash1}`. `{hash0}` and `{hash1}`.
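
For readers of this hunk, the layering rule above can be sketched in a few lines of Python. This is only an illustration, not Git's implementation: the function name `chain_layers` and the dictionary keys are hypothetical, and the sketch assumes the `graph-{hash}.graph` naming and the "lowest" to "highest" ordering quoted in the surrounding text.

```python
def chain_layers(chain_lines):
    """Map each line of a commit-graph-chain file to its layer file and the
    hashes of the lower layers that file is described as containing.

    Hypothetical helper for illustration only; assumes one hash per line,
    ordered from "lowest" to "highest".
    """
    hashes = [line.strip() for line in chain_lines if line.strip()]
    layers = []
    for index, layer_hash in enumerate(hashes):
        layers.append({
            "file": f"graph-{layer_hash}.graph",
            # Every layer lists the hashes of all layers below it.
            "base_hashes": hashes[:index],
        })
    return layers


layers = chain_layers(["{hash0}\n", "{hash1}\n", "{hash2}\n"])
```

With the three-line chain from the example, the entry for `graph-{hash1}.graph` lists `{hash0}` and the entry for `graph-{hash2}.graph` lists `{hash0}` and `{hash1}`, which matches the description above.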


## Merging commit-graph files Merging commit-graph files
~~~~~~~~~~~~~~~~~~~~~~~~~~


If we only added a new commit-graph file on every write, we would run into a If we only added a new commit-graph file on every write, we would run into a
linear search problem through many commit-graph files. Instead, we use a merge linear search problem through many commit-graph files. Instead, we use a merge
@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to
the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}` the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
file. file.


....
+---------------------+ +---------------------+
| | | |
| (new commits) | | (new commits) |
@ -250,6 +254,7 @@ file.
| | | |
| | | |
+-----------------------+ +-----------------------+
....


During this process, the commits to write are combined, sorted and we write the During this process, the commits to write are combined, sorted and we write the
contents to a temporary file, all while holding a `commit-graph-chain.lock` contents to a temporary file, all while holding a `commit-graph-chain.lock`
@ -257,14 +262,15 @@ lock-file. When the file is flushed, we rename it to `graph-{hash3}`
according to the computed `{hash3}`. Finally, we write the new chain data to according to the computed `{hash3}`. Finally, we write the new chain data to
`commit-graph-chain.lock`: `commit-graph-chain.lock`:


``` ----
{hash3} {hash3}
{hash0} {hash0}
``` ----


We then close the lock-file. We then close the lock-file.


## Merge Strategy Merge Strategy
~~~~~~~~~~~~~~


When writing a set of commits that do not exist in the commit-graph stack of When writing a set of commits that do not exist in the commit-graph stack of
height N, we default to creating a new file at level N + 1. We then decide to height N, we default to creating a new file at level N + 1. We then decide to
@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
number of commits) could be extracted into config settings for full number of commits) could be extracted into config settings for full
flexibility. flexibility.


## Handling Mixed Generation Number Chains Handling Mixed Generation Number Chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


With the introduction of generation number v2 and generation data chunk, the With the introduction of generation number v2 and generation data chunk, the
following scenario is possible: following scenario is possible:
@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus,
rewriting split commit-graph as a single file (`--split=replace`) creates a rewriting split commit-graph as a single file (`--split=replace`) creates a
single layer with corrected commit dates. single layer with corrected commit dates.


## Deleting graph-{hash} files Deleting graph-\{hash\} files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


After a new tip file is written, some `graph-{hash}` files may no longer After a new tip file is written, some `graph-{hash}` files may no longer
be part of a chain. It is important to remove these files from disk, eventually. be part of a chain. It is important to remove these files from disk, eventually.
@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window
defaults to zero, but can be changed using command-line arguments or a config defaults to zero, but can be changed using command-line arguments or a config
setting. setting.


## Chains across multiple object directories Chains across multiple object directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


In a repo with alternates, we look for the `commit-graph-chain` file starting In a repo with alternates, we look for the `commit-graph-chain` file starting
in the local object directory and then in each alternate. The first file that in the local object directory and then in each alternate. The first file that


@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:


https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/ https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/


0) Non goals Non goals
------------ ---------


- We will not discuss those client side improvements here, as they - We will not discuss those client side improvements here, as they
would require changes in different parts of Git than this effort. would require changes in different parts of Git than this effort.
@ -90,8 +90,8 @@ later in this document:
even more to host content with larger blobs or more large blobs even more to host content with larger blobs or more large blobs
than currently. than currently.


I) Issues with the current situation I Issues with the current situation
------------------------------------ -----------------------------------


- Some statistics made on GitLab repos have shown that more than 75% - Some statistics made on GitLab repos have shown that more than 75%
of the disk space is used by blobs that are larger than 1MB and of the disk space is used by blobs that are larger than 1MB and
@ -138,8 +138,8 @@ I) Issues with the current situation
complaining that these tools require significant effort to set up, complaining that these tools require significant effort to set up,
learn and use correctly. learn and use correctly.


II) Main features of the "Large Object Promisors" solution II Main features of the "Large Object Promisors" solution
---------------------------------------------------------- ---------------------------------------------------------


The main features below should give a rough overview of how the The main features below should give a rough overview of how the
solution may work. Details about needed elements can be found in solution may work. Details about needed elements can be found in
@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
other objects. other objects.


Note 1 Note 1
++++++ ^^^^^^


To clarify, a LOP is a normal promisor remote, except that: To clarify, a LOP is a normal promisor remote, except that:


@ -178,7 +178,7 @@ To clarify, a LOP is a normal promisor remote, except that:
itself. itself.


Note 2 Note 2
++++++ ^^^^^^


Git already makes it possible for a main remote to also be a promisor Git already makes it possible for a main remote to also be a promisor
remote storing both regular objects and large blobs for a client that remote storing both regular objects and large blobs for a client that
@ -186,13 +186,13 @@ clones from it with a filter on blob size. But here we explicitly want
to avoid that. to avoid that.


Rationale Rationale
+++++++++ ^^^^^^^^^


LOPs aim to be good at handling large blobs while main remotes are LOPs aim to be good at handling large blobs while main remotes are
already good at handling other objects. already good at handling other objects.


Implementation Implementation
++++++++++++++ ^^^^^^^^^^^^^^


Git already has support for multiple promisor remotes, see Git already has support for multiple promisor remotes, see
link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation]. link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
underlying object storage appear like a remote to Git. underlying object storage appear like a remote to Git.


Note Note
++++ ^^^^


A LOP can be a promisor remote accessed using a remote helper by A LOP can be a promisor remote accessed using a remote helper by
both some clients and the main remote. both some clients and the main remote.


Rationale Rationale
+++++++++ ^^^^^^^^^


This looks like the simplest way to create LOPs that can cheaply This looks like the simplest way to create LOPs that can cheaply
handle many large blobs. handle many large blobs.


Implementation Implementation
++++++++++++++ ^^^^^^^^^^^^^^


Remote helpers are quite easy to write as shell scripts, but it might Remote helpers are quite easy to write as shell scripts, but it might
be more efficient and maintainable to write them using other languages be more efficient and maintainable to write them using other languages
@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
storage for large files handled by Git LFS. storage for large files handled by Git LFS.


Rationale Rationale
+++++++++ ^^^^^^^^^


This would simplify the server side if it wants to both use a LOP and This would simplify the server side if it wants to both use a LOP and
act as a Git LFS server. act as a Git LFS server.
@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
LOP all its blobs with a size over a configurable threshold. LOP all its blobs with a size over a configurable threshold.


Rationale Rationale
+++++++++ ^^^^^^^^^


This makes it easy to set things up and to clean things up. For This makes it easy to set things up and to clean things up. For
example, an admin could use this to manually convert a repo not using example, an admin could use this to manually convert a repo not using
@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
to regularly make sure the large blobs are moved to the LOP. to regularly make sure the large blobs are moved to the LOP.


Implementation Implementation
++++++++++++++ ^^^^^^^^^^^^^^


Using something based on `git repack --filter=...` to separate the Using something based on `git repack --filter=...` to separate the
blobs we want to offload from the other Git objects could be a good blobs we want to offload from the other Git objects could be a good
@ -284,13 +284,13 @@ should have ways to prevent oversize blobs to be fetched, and also
perhaps pushed, into it. perhaps pushed, into it.


Rationale Rationale
+++++++++ ^^^^^^^^^


A main remote containing many oversize blobs would defeat the purpose A main remote containing many oversize blobs would defeat the purpose
of LOPs. of LOPs.


Implementation Implementation
++++++++++++++ ^^^^^^^^^^^^^^


The way to offload to a LOP discussed in 4) above can be used to The way to offload to a LOP discussed in 4) above can be used to
regularly offload oversize blobs. About preventing oversize blobs from regularly offload oversize blobs. About preventing oversize blobs from
@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
fetch those blobs from the LOP to be able to serve the client. fetch those blobs from the LOP to be able to serve the client.


Note Note
++++ ^^^^


For fetches instead of clones, a protocol negotiation might not always For fetches instead of clones, a protocol negotiation might not always
happen, see the "What about fetches?" FAQ entry below for details. happen, see the "What about fetches?" FAQ entry below for details.


Rationale Rationale
+++++++++ ^^^^^^^^^


Security, configurability and efficiency of setting things up. Security, configurability and efficiency of setting things up.


Implementation Implementation
++++++++++++++ ^^^^^^^^^^^^^^


A "promisor-remote" protocol v2 capability looks like a good way to A "promisor-remote" protocol v2 capability looks like a good way to
implement this. The way the client and server use this capability implement this. The way the client and server use this capability
@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
but might not need anymore, to the LOP. but might not need anymore, to the LOP.


Note Note
++++ ^^^^


It might depend on the context if it should be OK or not for clients It might depend on the context if it should be OK or not for clients
to offload large blobs they have created, instead of fetched, directly to offload large blobs they have created, instead of fetched, directly
@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
implementing this feature. implementing this feature.


Rationale Rationale
+++++++++ ^^^^^^^^^


On the client, the easiest way to deal with unneeded large blobs is to On the client, the easiest way to deal with unneeded large blobs is to
offload them. offload them.


Implementation Implementation
++++++++++++++ ^^^^^^^^^^^^^^


This is very similar to what 4) above is about, except on the client This is very similar to what 4) above is about, except on the client
side instead of the server side. So a good solution to 4) could likely side instead of the server side. So a good solution to 4) could likely
@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
a LOP, it is likely, and can easily be confirmed, that the LOP still a LOP, it is likely, and can easily be confirmed, that the LOP still
has them, so that they can just be removed from the client. has them, so that they can just be removed from the client.


III) Benefits of using LOPs III Benefits of using LOPs
--------------------------- --------------------------


Many benefits are related to the issues discussed in "I) Issues with Many benefits are related to the issues discussed in "I) Issues with
the current situation" above: the current situation" above:
@ -406,8 +406,8 @@ the current situation" above:


- Reduced storage needs on the client side. - Reduced storage needs on the client side.


IV) FAQ IV FAQ
------- ------


What about using multiple LOPs on the server and client side? What about using multiple LOPs on the server and client side?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
on a promisor remote. on a promisor remote.


Regular fetch Regular fetch
+++++++++++++ ^^^^^^^^^^^^^


In a regular fetch, the client will contact the main remote and a In a regular fetch, the client will contact the main remote and a
protocol negotiation will happen between them. It's a good thing that protocol negotiation will happen between them. It's a good thing that
@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
using, or not using, the same LOP(s) as last time. using, or not using, the same LOP(s) as last time.


"Backfill" or "lazy" fetch "Backfill" or "lazy" fetch
++++++++++++++++++++++++++ ^^^^^^^^^^^^^^^^^^^^^^^^^^


When there is a backfill fetch, the client doesn't necessarily contact When there is a backfill fetch, the client doesn't necessarily contact
the main remote first. It will try to fetch from its promisor remotes the main remote first. It will try to fetch from its promisor remotes
@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
token when performing a protocol negotiation with the main remote (see token when performing a protocol negotiation with the main remote (see
section II.6 above). section II.6 above).


V) Future improvements V Future improvements
---------------------- ---------------------


It is expected that at the beginning using LOPs will be mostly worth It is expected that at the beginning using LOPs will be mostly worth
it either in a corporate context where the Git version that clients it either in a corporate context where the Git version that clients


@ -13,6 +13,7 @@ articles = [
'commit-graph.adoc', 'commit-graph.adoc',
'directory-rename-detection.adoc', 'directory-rename-detection.adoc',
'hash-function-transition.adoc', 'hash-function-transition.adoc',
'large-object-promisors.adoc',
'long-running-process-protocol.adoc', 'long-running-process-protocol.adoc',
'multi-pack-index.adoc', 'multi-pack-index.adoc',
'packfile-uri.adoc', 'packfile-uri.adoc',


@ -10,32 +10,32 @@ history as an optimization, assuming all merges are automatic and clean


Outline: Outline:


0. Assumptions 1. Assumptions


1. How rebasing and cherry-picking work 2. How rebasing and cherry-picking work


2. Why the renames on MERGE_SIDE1 in any given pick are *always* a 3. Why the renames on MERGE_SIDE1 in any given pick are *always* a
superset of the renames on MERGE_SIDE1 for the next pick. superset of the renames on MERGE_SIDE1 for the next pick.


3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
a rename on MERGE_SIDE1 for the next pick a rename on MERGE_SIDE1 for the next pick


4. A detailed description of the counter-examples to #3. 5. A detailed description of the counter-examples to #4.


5. Why the special cases in #4 are still fully reasonable to use to pair 6. Why the special cases in #5 are still fully reasonable to use to pair
up files for three-way content merging in the merge machinery, and why up files for three-way content merging in the merge machinery, and why
they do not affect the correctness of the merge. they do not affect the correctness of the merge.


6. Interaction with skipping of "irrelevant" renames 7. Interaction with skipping of "irrelevant" renames


7. Additional items that need to be cached 8. Additional items that need to be cached


8. How directory rename detection interacts with the above and why this 9. How directory rename detection interacts with the above and why this
optimization is still safe even if merge.directoryRenames is set to optimization is still safe even if merge.directoryRenames is set to
"true". "true".




=== 0. Assumptions === == 1. Assumptions ==


There are two assumptions that will hold throughout this document: There are two assumptions that will hold throughout this document:


@ -44,8 +44,8 @@ There are two assumptions that will hold throughout this document:


* All merges are fully automatic * All merges are fully automatic


and a third that will hold in sections 2-5 for simplicity, that I'll later and a third that will hold in sections 3-6 for simplicity, that I'll later
address in section 8: address in section 9:


* No directory renames occur * No directory renames occur


@ -77,9 +77,9 @@ conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or cherry stored on disk, and thus is thrown away as soon as the rebase or cherry
pick stops for the user to resolve the operation. pick stops for the user to resolve the operation.


The third assumption makes sections 2-5 simpler, and allows people to The third assumption makes sections 3-6 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 8. It is probably then I can go back and address the specifics in section 9. It is probably
also worth noting that if directory renames do occur, then the default of also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown will stop for users to resolve the conflicts and the cache will be thrown
@ -88,22 +88,26 @@ reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why; this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 8. this discussion is deferred until section 9.




=== 1. How rebasing and cherry-picking work === == 2. How rebasing and cherry-picking work ==


Consider the following setup (from the git-rebase manpage): Consider the following setup (from the git-rebase manpage):


------------
A---B---C topic A---B---C topic
/ /
D---E---F---G main D---E---F---G main
------------


After rebasing or cherry-picking topic onto main, this will appear as: After rebasing or cherry-picking topic onto main, this will appear as:


------------
A'--B'--C' topic A'--B'--C' topic
/ /
D---E---F---G main D---E---F---G main
------------


The way the commits A', B', and C' are created is through a series of The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three merges, where rebase or cherry-pick sequentially uses each of the three
@ -111,6 +115,7 @@ A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be: this picture, the three commits for each of the three merges would be:


....
To create A': To create A':
MERGE_BASE: E MERGE_BASE: E
MERGE_SIDE1: G MERGE_SIDE1: G
@ -125,6 +130,7 @@ To create C':
MERGE_BASE: B MERGE_BASE: B
MERGE_SIDE1: B' MERGE_SIDE1: B'
MERGE_SIDE2: C MERGE_SIDE2: C
....
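
The pattern in the listing above can be expressed as a short Python sketch. It is not how rebase or cherry-pick is implemented; the function name `pick_triples` and its parameters are hypothetical, and commits are represented by their one-letter labels from the diagram.

```python
def pick_triples(upstream_tip, fork_point, picks):
    """Return (result, MERGE_BASE, MERGE_SIDE1, MERGE_SIDE2) for each pick.

    Hypothetical helper: upstream_tip is the commit being replayed onto (G),
    fork_point is the parent of the first picked commit (E), and picks is
    the ordered list of commits to replay (A, B, C).
    """
    triples = []
    merge_base = fork_point
    merge_side1 = upstream_tip
    for commit in picks:
        result = commit + "'"
        triples.append((result, merge_base, merge_side1, commit))
        # The next pick uses the commit just picked as its base and the
        # commit just created as its MERGE_SIDE1.
        merge_base = commit
        merge_side1 = result
    return triples


triples = pick_triples("G", "E", ["A", "B", "C"])
```

For the example history this yields E, G, A for A'; A, A', B for B'; and B, B', C for C'.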


Sometimes, folks are surprised that these three-way merges are done. It Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a can be useful in understanding these three-way merges to view them in a
@ -138,8 +144,7 @@ Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit. B, B', and C, at least the parts before you decide to record a commit.




=== 2. Why the renames on MERGE_SIDE1 in any given pick are always a === == 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. ==
=== superset of the renames on MERGE_SIDE1 for the next pick. ===


The merge machinery uses the filenames it is fed from MERGE_BASE, The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
@ -156,6 +161,7 @@ filename under one of three conditions:
First, let's remember what commits are involved in the first and second First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence: picks of the cherry-pick or rebase sequence:


....
To create A': To create A':
MERGE_BASE: E MERGE_BASE: E
MERGE_SIDE1: G MERGE_SIDE1: G
@ -165,6 +171,7 @@ To create B':
MERGE_BASE: A MERGE_BASE: A
MERGE_SIDE1: A' MERGE_SIDE1: A'
MERGE_SIDE2: B MERGE_SIDE2: B
....


So, in particular, we need to show that the renames between E and G are a So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'. superset of those between A and A'.
@ -181,11 +188,11 @@ are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'. and G are a superset of those between A and A'.




=== 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === == 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. ==
=== always also a rename on MERGE_SIDE1 for the next pick. ===


Let's again look at the first two picks: Let's again look at the first two picks:


....
To create A': To create A':
MERGE_BASE: E MERGE_BASE: E
MERGE_SIDE1: G MERGE_SIDE1: G
@ -195,17 +202,25 @@ To create B':
MERGE_BASE: A MERGE_BASE: A
MERGE_SIDE1: A' MERGE_SIDE1: A'
MERGE_SIDE2: B MERGE_SIDE2: B
....


Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as 'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following: three-way content merge of the following:

....
E:oldfile E:oldfile
G:newfile G:newfile
A:oldfile A:oldfile
....

and produce a new result: and produce a new result:

....
A':newfile A':newfile
....


Note above that I've assumed that E->A did not rename oldfile. If that Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict side did rename, then we most likely have a rename/rename(1to2) conflict
@ -254,19 +269,21 @@ were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always. detectable as renames almost always.




=== 4. A detailed description of the counter-examples to #3. === == 5. A detailed description of the counter-examples to #4. ==


We already noted in section 3 that rename/rename(1to1) (i.e. both sides We already noted in section 4 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable when stating that A:oldfile and A':newfile are "almost" always detectable
as renames? as renames?


Let's repeat an earlier point that section 3 made: Let's repeat an earlier point that section 4 made:


....
A':newfile was created by applying the changes between E:oldfile and A':newfile was created by applying the changes between E:oldfile and
G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
<50% of the size of E:oldfile. <50% of the size of E:oldfile.
....


If those changes that were <50% of the size of E:oldfile are also <50% of If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
@ -276,18 +293,21 @@ still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames. detect A:oldfile and A':newfile as renames.


Here's an example where that can happen: Here's an example where that can happen:

* E:oldfile had 20 lines * E:oldfile had 20 lines
* G:newfile added 10 new lines at the beginning of the file * G:newfile added 10 new lines at the beginning of the file
* A:oldfile kept the first 3 lines of the file, and deleted all the rest * A:oldfile kept the first 3 lines of the file, and deleted all the rest

then then

....
=> A':newfile would have 13 lines, 3 of which matches those in A:oldfile. => A':newfile would have 13 lines, 3 of which matches those in A:oldfile.
E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
A':newfile would not be. A':newfile would not be.
....
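
The arithmetic behind this example can be checked with a small Python sketch. The similarity measure here is a deliberate simplification (shared lines divided by the line count of the larger file) and is an assumption of this sketch, not Git's actual rename-detection scoring; the names `similarity` and `THRESHOLD` are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.5  # assumed stand-in for the 50% rename cutoff discussed above


def similarity(src, dst):
    """Shared lines divided by the larger file's line count (simplified)."""
    if not src and not dst:
        return 1.0
    shared = sum((Counter(src) & Counter(dst)).values())
    return shared / max(len(src), len(dst))


e_old = [f"line {i}" for i in range(20)]   # E:oldfile had 20 lines
added = [f"added {i}" for i in range(10)]  # G added 10 lines at the beginning
g_new = added + e_old                      # G:newfile
a_old = e_old[:3]                          # A kept only the first 3 lines
a_new = added + a_old                      # A':newfile, G's change applied to A

e_to_g = similarity(e_old, g_new)          # 20 shared of 30 lines
a_to_a_prime = similarity(a_old, a_new)    # 3 shared of 13 lines
```

Under this simplified measure, E:oldfile and G:newfile share 20 of 30 lines and clear the threshold, while A:oldfile and A':newfile share only 3 of 13 lines and fall well below it.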




=== 5. Why the special cases in #4 are still fully reasonable to use to === == 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. ==
=== pair up files for three-way content merging in the merge machinery, ===
=== and why they do not affect the correctness of the merge. ===


In the rename/rename(1to1) case, A:newfile and A':newfile are not renames In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename. However, files with the same filename since they use the *same* filename. However, files with the same filename
@ -295,14 +315,14 @@ are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 3 and 4. The remainder of this section the time discussing in sections 4 and 5. The remainder of this section
will be devoted to that case as well. will be devoted to that case as well.


So, even if A:oldfile and A':newfile aren't detectable as renames, why is So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons: merge machinery? There are multiple reasons:


* As noted in sections 3 and 4, the diff between A:oldfile and A':newfile * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile
is *exactly* the same as the diff between E:oldfile and G:newfile. The is *exactly* the same as the diff between E:oldfile and G:newfile. The
latter pair were detected as renames, so it seems unlikely to surprise latter pair were detected as renames, so it seems unlikely to surprise
users for us to treat A:oldfile and A':newfile as renames. users for us to treat A:oldfile and A':newfile as renames.
@ -394,7 +414,7 @@ cases 1 and 3 seem to provide as good or better behavior with the
optimization than without. optimization than without.




=== 6. Interaction with skipping of "irrelevant" renames === == 7. Interaction with skipping of "irrelevant" renames ==


Previous optimizations involved skipping rename detection for paths Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits: considered to be "irrelevant". See for example the following commits:
@ -421,24 +441,27 @@ detection -- though we can limit it to the paths for which we have not
already detected renames. already detected renames.




=== 7. Additional items that need to be cached === == 8. Additional items that need to be cached ==


It turns out we have to cache more than just renames; we also cache: It turns out we have to cache more than just renames; we also cache:


....
A) non-renames (i.e. unpaired deletes) A) non-renames (i.e. unpaired deletes)
B) counts of renames within directories B) counts of renames within directories
C) sources that were marked as RELEVANT_LOCATION, but which were C) sources that were marked as RELEVANT_LOCATION, but which were
downgraded to RELEVANT_NO_MORE downgraded to RELEVANT_NO_MORE
D) the toplevel trees involved in the merge D) the toplevel trees involved in the merge
....


These are all stored in struct rename_info, and respectively appear in These are all stored in struct rename_info, and respectively appear in

* cached_pairs (alongside actual renames, just with a value of NULL) * cached_pairs (alongside actual renames, just with a value of NULL)
* dir_rename_counts * dir_rename_counts
* cached_irrelevant * cached_irrelevant
* merge_trees * merge_trees


The reason for (A) comes from the irrelevant renames skipping The reason for `(A)` comes from the irrelevant renames skipping
optimization discussed in section 6. The fact that irrelevant renames optimization discussed in section 7. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the the upstream side on a subset of the remaining renames (to get the
@ -447,23 +470,24 @@ deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting. with every commit we are transplanting.


The reason for (B) is that diffcore_rename_extended() is what The reason for `(B)` is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run. it, including dir_rename_counts, from the previous run.


The reason for (C) is that merge-ort's tree traversal will again think The reason for `(C)` is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.) subsequent commit will be removed from cached_irrelevant.)


The reason for (D) is that this is how we determine whether the remember The reason for `(D)` is that this is how we determine whether the remember
renames optimization can be used. In particular, remembering that our renames optimization can be used. In particular, remembering that our
sequence of merges looks like: sequence of merges looks like:


....
Merge 1: Merge 1:
MERGE_BASE: E MERGE_BASE: E
MERGE_SIDE1: G MERGE_SIDE1: G
@ -475,6 +499,7 @@ sequence of merges looks like:
MERGE_SIDE1: A' MERGE_SIDE1: A'
MERGE_SIDE2: B MERGE_SIDE2: B
=> Creates B' => Creates B'
....


It is the fact that the trees A and A' appear both in Merge 1 and in It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So Merge 2, with A as a parent of A' that allows this optimization. So
@ -482,12 +507,11 @@ we store the trees to compare with what we are asked to merge next
time. time.
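
The tree comparison described above can be sketched roughly as follows. This is only an illustration, not the code in merge-ort: the `PreviousMerge` record and the `can_reuse_side1_renames` helper are hypothetical names, and trees are represented as plain strings rather than object IDs. It covers just the pattern shown in the text, where the previous MERGE_SIDE2 (A) becomes the new merge base and the previous result (A') becomes the new MERGE_SIDE1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreviousMerge:
    """Trees remembered from the last merge (hypothetical record)."""
    base: str    # MERGE_BASE of the previous merge, e.g. E
    side1: str   # MERGE_SIDE1 of the previous merge, e.g. G
    side2: str   # MERGE_SIDE2 of the previous merge, e.g. A
    result: str  # tree created by the previous merge, e.g. A'


def can_reuse_side1_renames(prev: Optional[PreviousMerge],
                            base: str, side1: str, side2: str) -> bool:
    """Return True when the new merge follows the pattern from the text.

    The cached upstream renames are only considered when the new merge
    base is the previous MERGE_SIDE2 and the new MERGE_SIDE1 is the tree
    the previous merge created. side2 is accepted for symmetry with the
    merge inputs but does not affect the decision in this sketch.
    """
    if prev is None:
        # Nothing was remembered, so rename detection must run normally.
        return False
    return base == prev.side2 and side1 == prev.result


# Merge 1 (E, G, A) created A'; Merge 2 is (A, A', B).
merge1 = PreviousMerge(base="E", side1="G", side2="A", result="A'")
reusable = can_reuse_side1_renames(merge1, "A", "A'", "B")
```

If the trees do not line up this way (for example, an unrelated merge is requested next), the helper returns False and the sketch falls back to full rename detection.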




=== 8. How directory rename detection interacts with the above and === == 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". ==
=== why this optimization is still safe even if ===
=== merge.directoryRenames is set to "true". ===


As noted in the assumptions section: As noted in the assumptions section:


....
""" """
...if directory renames do occur, then the default of ...if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation merge.directoryRenames being set to "conflict" means that the operation
@ -497,11 +521,13 @@ As noted in the assumptions section:
is that some users will have set merge.directoryRenames to "true" to is that some users will have set merge.directoryRenames to "true" to
allow the merges to continue to proceed automatically. allow the merges to continue to proceed automatically.
""" """
....


Let's remember that we need to look at how any given pick affects the next Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section one. So let's again use the first two picks from the diagram in section
one: one:


....
First pick does this three-way merge: First pick does this three-way merge:
MERGE_BASE: E MERGE_BASE: E
MERGE_SIDE1: G MERGE_SIDE1: G
@ -513,6 +539,7 @@ one:
MERGE_SIDE1: A' MERGE_SIDE1: A'
MERGE_SIDE2: B MERGE_SIDE2: B
=> creates B' => creates B'
....


Now, directory rename detection exists so that if one side of history Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old renames a directory, and the other side adds a new file to the old
@ -545,7 +572,7 @@ while considering all of these cases:
concerned; see the assumptions section). Two interesting sub-notes concerned; see the assumptions section). Two interesting sub-notes
about these counts: about these counts:


* If we need to perform rename-detection again on the given side (e.g. ** If we need to perform rename-detection again on the given side (e.g.
some paths are relevant for rename detection that weren't before), some paths are relevant for rename detection that weren't before),
then we clear dir_rename_counts and recompute it, making use of then we clear dir_rename_counts and recompute it, making use of
cached_pairs. The reason it is important to do this is optimizations cached_pairs. The reason it is important to do this is optimizations
@ -556,7 +583,7 @@ while considering all of these cases:
easiest way to "fix up" dir_rename_counts in such cases is to just easiest way to "fix up" dir_rename_counts in such cases is to just
recompute it. recompute it.


* If we prune rename/rename(1to1) entries from the cache, then we also ** If we prune rename/rename(1to1) entries from the cache, then we also
need to update dir_rename_counts to decrement the counts for the need to update dir_rename_counts to decrement the counts for the
involved directory and any relevant parent directories (to undo what involved directory and any relevant parent directories (to undo what
update_dir_rename_counts() in diffcore-rename.c incremented when the update_dir_rename_counts() in diffcore-rename.c incremented when the
@ -578,6 +605,7 @@ in order:


Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir


....
This case looks like this: This case looks like this:


MERGE_BASE: E, Has olddir/ MERGE_BASE: E, Has olddir/
@ -595,10 +623,13 @@ Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
* MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
Given the cached rename noted above, the second merge can proceed as Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'. expected without needing to perform rename detection from A -> A'.
....


Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir


....
This case looks like this: This case looks like this:

MERGE_BASE: E oldfile, olddir/ MERGE_BASE: E oldfile, olddir/
MERGE_SIDE1: G oldfile, olddir/ -> newdir/ MERGE_SIDE1: G oldfile, olddir/ -> newdir/
MERGE_SIDE2: A oldfile -> olddir/newfile MERGE_SIDE2: A oldfile -> olddir/newfile
@ -617,9 +648,11 @@ Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir


Given the cached rename noted above, the second merge can proceed as Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'. expected without needing to perform rename detection from A -> A'.
....


Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir


....
This case looks like this: This case looks like this:


MERGE_BASE: E, Has olddir/ MERGE_BASE: E, Has olddir/
@ -635,9 +668,11 @@ Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
In this case, with the optimization, note that after the first commit there In this case, with the optimization, note that after the first commit there
were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
But the second merge didn't need any renames so this is fine. But the second merge didn't need any renames so this is fine.
....


Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir


....
This case looks like this: This case looks like this:


MERGE_BASE: E, Has olddir/ MERGE_BASE: E, Has olddir/
@ -658,6 +693,7 @@ Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir


Given the cached rename noted above, the second merge can proceed as Given the cached rename noted above, the second merge can proceed as
expected without needing to perform rename detection from A -> A'. expected without needing to perform rename detection from A -> A'.
....


Finally, I'll just note here that interactions with the Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization mean we sometimes don't detect skip-irrelevant-renames optimization mean we sometimes don't detect


@ -14,37 +14,41 @@ Table of contents:
* Reference Emails * Reference Emails




=== Terminology === == Terminology ==


cone mode: one of two modes for specifying the desired subset of files *`cone mode`*::
one of two modes for specifying the desired subset of files
in a sparse-checkout. In cone-mode, the user specifies in a sparse-checkout. In cone-mode, the user specifies
directories (getting both everything under that directory as directories (getting both everything under that directory as
well as everything in leading directories), while in non-cone well as everything in leading directories), while in non-cone
mode, the user specifies gitignore-style patterns. Controlled mode, the user specifies gitignore-style patterns. Controlled
by the --[no-]cone option to sparse-checkout init|set. by the --[no-]cone option to sparse-checkout init|set.


SKIP_WORKTREE: When tracked files do not match the sparse specification and *`SKIP_WORKTREE`*::
When tracked files do not match the sparse specification and
are removed from the working tree, the file in the index is marked are removed from the working tree, the file in the index is marked
with a SKIP_WORKTREE bit. Note that if a tracked file has the with a SKIP_WORKTREE bit. Note that if a tracked file has the
SKIP_WORKTREE bit set but the file is later written by the user to SKIP_WORKTREE bit set but the file is later written by the user to
the working tree anyway, the SKIP_WORKTREE bit will be cleared at the working tree anyway, the SKIP_WORKTREE bit will be cleared at
the beginning of any subsequent Git operation. the beginning of any subsequent Git operation.
+
Most sparse checkout users are unaware of this implementation
detail, and the term should generally be avoided in user-facing
descriptions and command flags. Unfortunately, prior to the
`sparse-checkout` subcommand this low-level detail was exposed,
and as of time of writing, is still exposed in various places.


Most sparse checkout users are unaware of this implementation *`sparse-checkout`*::
detail, and the term should generally be avoided in user-facing a subcommand in git used to reduce the files present in
descriptions and command flags. Unfortunately, prior to the
`sparse-checkout` subcommand this low-level detail was exposed,
and as of time of writing, is still exposed in various places.

sparse-checkout: a subcommand in git used to reduce the files present in
the working tree to a subset of all tracked files. Also, the the working tree to a subset of all tracked files. Also, the
name of the file in the $GIT_DIR/info directory used to track name of the file in the $GIT_DIR/info directory used to track
the sparsity patterns corresponding to the user's desired the sparsity patterns corresponding to the user's desired
subset. subset.


sparse cone: see cone mode *`sparse cone`*:: see cone mode


sparse directory: An entry in the index corresponding to a directory, which *`sparse directory`*::
An entry in the index corresponding to a directory, which
appears in the index instead of all the files under that directory appears in the index instead of all the files under that directory
that would normally appear. See also sparse-index. Something that that would normally appear. See also sparse-index. Something that
can cause confusion is that the "sparse directory" does NOT match can cause confusion is that the "sparse directory" does NOT match
@ -52,7 +56,8 @@ sparse directory: An entry in the index corresponding to a directory, which
working tree. May be renamed in the future (e.g. to "skipped working tree. May be renamed in the future (e.g. to "skipped
directory"). directory").


sparse index: A special mode for sparse-checkout that also makes the *`sparse index`*::
A special mode for sparse-checkout that also makes the
index sparse by recording a directory entry in lieu of all the index sparse by recording a directory entry in lieu of all the
files underneath that directory (thus making that a "skipped files underneath that directory (thus making that a "skipped
directory" which unfortunately has also been called a "sparse directory" which unfortunately has also been called a "sparse
@ -60,7 +65,8 @@ sparse index: A special mode for sparse-checkout that also makes the
directories. Controlled by the --[no-]sparse-index option to directories. Controlled by the --[no-]sparse-index option to
init|set|reapply. init|set|reapply.


sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to *`sparsity patterns`*::
patterns from $GIT_DIR/info/sparse-checkout used to
define the set of files of interest. A warning: It is easy to define the set of files of interest. A warning: It is easy to
over-use this term (or the shortened "patterns" term), for two over-use this term (or the shortened "patterns" term), for two
reasons: (1) users in cone mode specify directories rather than reasons: (1) users in cone mode specify directories rather than
@ -70,7 +76,8 @@ sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to
transiently differ in the working tree or index from the sparsity transiently differ in the working tree or index from the sparsity
patterns (see "Sparse specification vs. sparsity patterns"). patterns (see "Sparse specification vs. sparsity patterns").


sparse specification: The set of paths in the user's area of focus. This *`sparse specification`*::
The set of paths in the user's area of focus. This
is typically just the tracked files that match the sparsity is typically just the tracked files that match the sparsity
patterns, but the sparse specification can temporarily differ and patterns, but the sparse specification can temporarily differ and
include additional files. (See also "Sparse specification include additional files. (See also "Sparse specification
@ -87,12 +94,13 @@ sparse specification: The set of paths in the user's area of focus. This
* If working with the index and the working copy, the sparse * If working with the index and the working copy, the sparse
specification is the union of the paths from above. specification is the union of the paths from above.


vivifying: When a command restores a tracked file to the working tree (and *`vivifying`*::
When a command restores a tracked file to the working tree (and
hopefully also clears the SKIP_WORKTREE bit in the index for that hopefully also clears the SKIP_WORKTREE bit in the index for that
file), this is referred to as "vivifying" the file. file), this is referred to as "vivifying" the file.




=== Purpose of sparse-checkouts === == Purpose of sparse-checkouts ==


sparse-checkouts exist to allow users to work with a subset of their sparse-checkouts exist to allow users to work with a subset of their
files. files.
@ -120,14 +128,12 @@ those usecases, sparse-checkouts can modify different subcommands in over a
half dozen different ways. Let's start by considering the high level half dozen different ways. Let's start by considering the high level
usecases: usecases:


A) Users are _only_ interested in the sparse portion of the repo [horizontal]

A):: Users are _only_ interested in the sparse portion of the repo
A*) Users are _only_ interested in the sparse portion of the repo A*):: Users are _only_ interested in the sparse portion of the repo
that they have downloaded so far that they have downloaded so far

B):: Users want a sparse working tree, but are working in a larger whole
B) Users want a sparse working tree, but are working in a larger whole C):: sparse-checkout is a behind-the-scenes implementation detail allowing

C) sparse-checkout is a behind-the-scenes implementation detail allowing
Git to work with a specially crafted in-house virtual file system; Git to work with a specially crafted in-house virtual file system;
users are actually working with a "full" working tree that is users are actually working with a "full" working tree that is
lazily populated, and sparse-checkout helps with the lazy population lazily populated, and sparse-checkout helps with the lazy population
@ -136,7 +142,7 @@ usecases:
It may be worth explaining each of these in a bit more detail: It may be worth explaining each of these in a bit more detail:




(Behavior A) Users are _only_ interested in the sparse portion of the repo === (Behavior A) Users are _only_ interested in the sparse portion of the repo


These folks might know there are other things in the repository, but These folks might know there are other things in the repository, but
don't care. They are uninterested in other parts of the repository, and don't care. They are uninterested in other parts of the repository, and
@ -163,8 +169,7 @@ side-effects of various other commands (such as the printed diffstat
after a merge or pull) can lead to worries about local repository size after a merge or pull) can lead to worries about local repository size
growing unnecessarily[10]. growing unnecessarily[10].


(Behavior A*) Users are _only_ interested in the sparse portion of the repo === (Behavior A*) Users are _only_ interested in the sparse portion of the repo that they have downloaded so far (a variant on the first usecase)
that they have downloaded so far (a variant on the first usecase)


This variant is driven by folks who are using partial clones together with This variant is driven by folks who are using partial clones together with
sparse checkouts and do disconnected development (so far sounding like a sparse checkouts and do disconnected development (so far sounding like a
@ -173,15 +178,14 @@ reason for yet another variant is that downloading even just the blobs
through history within their sparse specification may be too much, so they through history within their sparse specification may be too much, so they
only download some. They would still like operations to succeed without only download some. They would still like operations to succeed without
network connectivity, though, so things like `git log -S${SEARCH_TERM} -p` network connectivity, though, so things like `git log -S${SEARCH_TERM} -p`
or `git grep ${SEARCH_TERM} OLDREV ` would need to be prepared to provide or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide
partial results that depend on what happens to have been downloaded. partial results that depend on what happens to have been downloaded.


This variant could be viewed as Behavior A with the sparse specification This variant could be viewed as Behavior A with the sparse specification
for history querying operations modified from "sparsity patterns" to for history querying operations modified from "sparsity patterns" to
"sparsity patterns limited to the blobs we have already downloaded". "sparsity patterns limited to the blobs we have already downloaded".


(Behavior B) Users want a sparse working tree, but are working in a === (Behavior B) Users want a sparse working tree, but are working in a larger whole
larger whole


Stolee described this usecase this way[11]: Stolee described this usecase this way[11]:


@ -229,8 +233,7 @@ those expensive checks when interacting with the working copy, and may
prefer getting "unrelated" results from their history queries over having prefer getting "unrelated" results from their history queries over having
slow commands. slow commands.


(Behavior C) sparse-checkout is an implementational detail supporting a === (Behavior C) sparse-checkout is an implementational detail supporting a special VFS.
special VFS.


This usecase goes slightly against the traditional definition of This usecase goes slightly against the traditional definition of
sparse-checkout in that it actually tries to present a full or dense sparse-checkout in that it actually tries to present a full or dense
@ -255,13 +258,13 @@ will perceive the checkout as dense, and commands should thus behave as if
all files are present. all files are present.




=== Usecases of primary concern === == Usecases of primary concern ==


Most of the rest of this document will focus on Behavior A and Behavior Most of the rest of this document will focus on Behavior A and Behavior
B. Some notes about the other two cases and why we are not focusing on B. Some notes about the other two cases and why we are not focusing on
them: them:


(Behavior A*) === (Behavior A*)


Supporting this usecase is estimated to be difficult and a lot of work. Supporting this usecase is estimated to be difficult and a lot of work.
There are no plans to implement it currently, but it may be a potential There are no plans to implement it currently, but it may be a potential
@ -275,7 +278,7 @@ valid for this usecase, with the only exception being that it redefines the
sparse specification to restrict it to already-downloaded blobs. The hard sparse specification to restrict it to already-downloaded blobs. The hard
part is in making commands capable of respecting that modified definition. part is in making commands capable of respecting that modified definition.


(Behavior C) === (Behavior C)


This usecase violates some of the early sparse-checkout documented This usecase violates some of the early sparse-checkout documented
assumptions (since files marked as SKIP_WORKTREE will be displayed to users assumptions (since files marked as SKIP_WORKTREE will be displayed to users
@ -300,20 +303,20 @@ Behavior C do not assume they are part of the Behavior B camp and propose
patches that break things for the real Behavior B folks. patches that break things for the real Behavior B folks.




=== Oversimplified mental models === == Oversimplified mental models ==


An oversimplification of the differences in the above behaviors is: An oversimplification of the differences in the above behaviors is:


Behavior A: Restrict worktree and history operations to sparse specification (Behavior A):: Restrict worktree and history operations to sparse specification
Behavior B: Restrict worktree operations to sparse specification; have any (Behavior B):: Restrict worktree operations to sparse specification; have any
history operations work across all files history operations work across all files
Behavior C: Do not restrict either worktree or history operations to the (Behavior C):: Do not restrict either worktree or history operations to the
sparse specification...with the exception of branch checkouts or sparse specification...with the exception of branch checkouts or
switches which avoid writing files that will match the index so switches which avoid writing files that will match the index so
they can later lazily be populated instead. they can later lazily be populated instead.




=== Desired behavior === == Desired behavior ==


As noted previously, despite the simple idea of just working with a subset As noted previously, despite the simple idea of just working with a subset
of files, there are a range of different behavioral changes that need to be of files, there are a range of different behavioral changes that need to be
@ -326,37 +329,38 @@ understanding these differences can be beneficial.


* Commands behaving the same regardless of high-level use-case * Commands behaving the same regardless of high-level use-case


* commands that only look at files within the sparsity specification ** commands that only look at files within the sparsity specification


* diff (without --cached or REVISION arguments) *** diff (without --cached or REVISION arguments)
* grep (without --cached or REVISION arguments) *** grep (without --cached or REVISION arguments)
* diff-files *** diff-files


* commands that restore files to the working tree that match sparsity ** commands that restore files to the working tree that match sparsity
patterns, and remove unmodified files that don't match those patterns, and remove unmodified files that don't match those
patterns: patterns:


* switch *** switch
* checkout (the switch-like half) *** checkout (the switch-like half)
* read-tree *** read-tree
* reset --hard *** reset --hard


* commands that write conflicted files to the working tree, but otherwise ** commands that write conflicted files to the working tree, but otherwise
will omit writing files to the working tree that do not match the will omit writing files to the working tree that do not match the
sparsity patterns: sparsity patterns:


* merge *** merge
* rebase *** rebase
* cherry-pick *** cherry-pick
* revert *** revert


* `am` and `apply --cached` should probably be in this section but *** `am` and `apply --cached` should probably be in this section but
are buggy (see the "Known bugs" section below) are buggy (see the "Known bugs" section below)


The behavior for these commands somewhat depends upon the merge The behavior for these commands somewhat depends upon the merge
strategy being used: strategy being used:
* `ort` behaves as described above
* `octopus` and `resolve` will always vivify any file changed in the merge *** `ort` behaves as described above
*** `octopus` and `resolve` will always vivify any file changed in the merge
relative to the first parent, which is rather suboptimal. relative to the first parent, which is rather suboptimal.


It is also important to note that these commands WILL update the index It is also important to note that these commands WILL update the index
@ -372,21 +376,21 @@ understanding these differences can be beneficial.
specification and the sparsity patterns (much like the commands in the specification and the sparsity patterns (much like the commands in the
previous section). previous section).


* commands that always ignore sparsity since commits must be full-tree ** commands that always ignore sparsity since commits must be full-tree


* archive *** archive
* bundle *** bundle
* commit *** commit
* format-patch *** format-patch
* fast-export *** fast-export
* fast-import *** fast-import
* commit-tree *** commit-tree


* commands that write any modified file to the working tree (conflicted ** commands that write any modified file to the working tree (conflicted
or not, and whether those paths match sparsity patterns or not): or not, and whether those paths match sparsity patterns or not):


* stash *** stash
* apply (without `--index` or `--cached`) *** apply (without `--index` or `--cached`)


* Commands that may slightly differ for behavior A vs. behavior B: * Commands that may slightly differ for behavior A vs. behavior B:


@ -394,19 +398,20 @@ understanding these differences can be beneficial.
behaviors, but may differ in verbosity and types of warning and error behaviors, but may differ in verbosity and types of warning and error
messages. messages.


* commands that make modifications to which files are tracked: ** commands that make modifications to which files are tracked:
* add
* rm *** add
* mv *** rm
* update-index *** mv
*** update-index


The fact that files can move between the 'tracked' and 'untracked' The fact that files can move between the 'tracked' and 'untracked'
categories means some commands will have to treat untracked files categories means some commands will have to treat untracked files
differently. But if we have to treat untracked files differently, differently. But if we have to treat untracked files differently,
then additional commands may also need changes: then additional commands may also need changes:


* status *** status
* clean *** clean


In particular, `status` may need to report any untracked files outside In particular, `status` may need to report any untracked files outside
the sparsity specification as an erroneous condition (especially to the sparsity specification as an erroneous condition (especially to
@ -420,9 +425,10 @@ understanding these differences can be beneficial.
may need to ignore the sparse specification by its nature. Also, its may need to ignore the sparse specification by its nature. Also, its
current --[no-]ignore-skip-worktree-entries default is totally bogus. current --[no-]ignore-skip-worktree-entries default is totally bogus.


* commands for manually tweaking paths in both the index and the working tree ** commands for manually tweaking paths in both the index and the working tree
* `restore`
* the restore-like half of `checkout` *** `restore`
*** the restore-like half of `checkout`


These commands should be similar to add/rm/mv in that they should These commands should be similar to add/rm/mv in that they should
only operate on the sparse specification by default, and require a only operate on the sparse specification by default, and require a
@ -433,18 +439,19 @@ understanding these differences can be beneficial.


* Commands that significantly differ for behavior A vs. behavior B: * Commands that significantly differ for behavior A vs. behavior B:


* commands that query history ** commands that query history
* diff (with --cached or REVISION arguments)
* grep (with --cached or REVISION arguments) *** diff (with --cached or REVISION arguments)
* show (when given commit arguments) *** grep (with --cached or REVISION arguments)
* blame (only matters when one or more -C flags are passed) *** show (when given commit arguments)
* and annotate *** blame (only matters when one or more -C flags are passed)
* log **** and annotate
* whatchanged (may not exist anymore) *** log
* ls-files *** whatchanged (may not exist anymore)
* diff-index *** ls-files
* diff-tree *** diff-index
* ls-tree *** diff-tree
*** ls-tree


Note: for log and whatchanged, revision walking logic is unaffected Note: for log and whatchanged, revision walking logic is unaffected
but displaying of patches is affected by scoping the command to the but displaying of patches is affected by scoping the command to the
@ -458,91 +465,91 @@ understanding these differences can be beneficial.


* Commands I don't know how to classify * Commands I don't know how to classify


* range-diff ** range-diff


Is this like `log` or `format-patch`? Is this like `log` or `format-patch`?


* cherry ** cherry


See range-diff See range-diff


* Commands unaffected by sparse-checkouts * Commands unaffected by sparse-checkouts


* shortlog ** shortlog
* show-branch ** show-branch
* rev-list ** rev-list
* bisect ** bisect


* branch ** branch
* describe ** describe
* fetch ** fetch
* gc ** gc
* init ** init
* maintenance ** maintenance
* notes ** notes
* pull (merge & rebase have the necessary changes) ** pull (merge & rebase have the necessary changes)
* push ** push
* submodule ** submodule
* tag ** tag


* config ** config
* filter-branch (works in separate checkout without sparse-checkout setup) ** filter-branch (works in separate checkout without sparse-checkout setup)
* pack-refs ** pack-refs
* prune ** prune
* remote ** remote
* repack ** repack
* replace ** replace


* bugreport ** bugreport
* count-objects ** count-objects
* fsck ** fsck
* gitweb ** gitweb
* help ** help
* instaweb ** instaweb
* merge-tree (doesn't touch worktree or index, and merges always compute full-tree) ** merge-tree (doesn't touch worktree or index, and merges always compute full-tree)
* rerere ** rerere
* verify-commit ** verify-commit
* verify-tag ** verify-tag


* commit-graph ** commit-graph
* hash-object ** hash-object
* index-pack ** index-pack
* mktag ** mktag
* mktree ** mktree
* multi-pack-index ** multi-pack-index
* pack-objects ** pack-objects
* prune-packed ** prune-packed
* symbolic-ref ** symbolic-ref
* unpack-objects ** unpack-objects
* update-ref ** update-ref
* write-tree (operates on index, possibly optimized to use sparse dir entries) ** write-tree (operates on index, possibly optimized to use sparse dir entries)


* for-each-ref ** for-each-ref
* get-tar-commit-id ** get-tar-commit-id
* ls-remote ** ls-remote
* merge-base (merges are computed full tree, so merge base should be too) ** merge-base (merges are computed full tree, so merge base should be too)
* name-rev ** name-rev
* pack-redundant ** pack-redundant
* rev-parse ** rev-parse
* show-index ** show-index
* show-ref ** show-ref
* unpack-file ** unpack-file
* var ** var
* verify-pack ** verify-pack


* <Everything under 'Interacting with Others' in 'git help --all'> ** <Everything under 'Interacting with Others' in 'git help --all'>
* <Everything under 'Low-level...Syncing' in 'git help --all'> ** <Everything under 'Low-level...Syncing' in 'git help --all'>
* <Everything under 'Low-level...Internal Helpers' in 'git help --all'> ** <Everything under 'Low-level...Internal Helpers' in 'git help --all'>
* <Everything under 'External commands' in 'git help --all'> ** <Everything under 'External commands' in 'git help --all'>


* Commands that might be affected, but who cares? * Commands that might be affected, but who cares?


* merge-file ** merge-file
* merge-index ** merge-index
* gitk? ** gitk?




=== Behavior classes === == Behavior classes ==


From the above there are a few classes of behavior: From the above there are a few classes of behavior:


@ -573,6 +580,7 @@ From the above there are a few classes of behavior:


Commands in this class generally behave like the "restrict" class, Commands in this class generally behave like the "restrict" class,
except that: except that:

(1) they will ignore the sparse specification and write files with (1) they will ignore the sparse specification and write files with
conflicts to the working tree (thus temporarily expanding the conflicts to the working tree (thus temporarily expanding the
sparse specification to include such files.) sparse specification to include such files.)
@ -609,37 +617,39 @@ From the above there are a few classes of behavior:
specification. specification.




=== Subcommand-dependent defaults === == Subcommand-dependent defaults ==


Note that we have different defaults depending on the command for the Note that we have different defaults depending on the command for the
desired behavior: desired behavior:


* Commands defaulting to "restrict": * Commands defaulting to "restrict":
* diff-files
* diff (without --cached or REVISION arguments)
* grep (without --cached or REVISION arguments)
* switch
* checkout (the switch-like half)
* reset (<commit>)


* restore ** diff-files
* checkout (the restore-like half) ** diff (without --cached or REVISION arguments)
* checkout-index ** grep (without --cached or REVISION arguments)
* reset (with pathspec) ** switch
** checkout (the switch-like half)
** reset (<commit>)

** restore
** checkout (the restore-like half)
** checkout-index
** reset (with pathspec)


This behavior makes sense; these interact with the working tree. This behavior makes sense; these interact with the working tree.


* Commands defaulting to "restrict modulo conflicts": * Commands defaulting to "restrict modulo conflicts":
* merge
* rebase
* cherry-pick
* revert


* am ** merge
* apply --index (which is kind of like an `am --no-commit`) ** rebase
** cherry-pick
** revert


* read-tree (especially with -m or -u; is kind of like a --no-commit merge) ** am
* reset (<tree-ish>, due to similarity to read-tree) ** apply --index (which is kind of like an `am --no-commit`)

** read-tree (especially with -m or -u; is kind of like a --no-commit merge)
** reset (<tree-ish>, due to similarity to read-tree)


These also interact with the working tree, but require slightly These also interact with the working tree, but require slightly
different behavior either so that (a) conflicts can be resolved or (b) different behavior either so that (a) conflicts can be resolved or (b)
@ -648,16 +658,17 @@ desired behavior :
(See also the "Known bugs" section below regarding `am` and `apply`) (See also the "Known bugs" section below regarding `am` and `apply`)


* Commands defaulting to "no restrict": * Commands defaulting to "no restrict":
* archive
* bundle
* commit
* format-patch
* fast-export
* fast-import
* commit-tree


* stash ** archive
* apply (without `--index`) ** bundle
** commit
** format-patch
** fast-export
** fast-import
** commit-tree

** stash
** apply (without `--index`)


These have completely different defaults and perhaps deserve the most These have completely different defaults and perhaps deserve the most
detailed explanation: detailed explanation:
@ -679,15 +690,18 @@ desired behavior :
sparse specification then we'll lose changes from the user. sparse specification then we'll lose changes from the user.


* Commands defaulting to "restrict also specially applied to untracked files": * Commands defaulting to "restrict also specially applied to untracked files":
* add
* rm
* mv
* update-index
* status
* clean (?)


** add
** rm
** mv
** update-index
** status
** clean (?)

....
Our original implementation for the first three of these commands was Our original implementation for the first three of these commands was
"no restrict", but it had some severe usability issues: "no restrict", but it had some severe usability issues:

* `git add <somefile>` if honored and outside the sparse * `git add <somefile>` if honored and outside the sparse
specification, can result in the file randomly disappearing later specification, can result in the file randomly disappearing later
when some subsequent command is run (since various commands when some subsequent command is run (since various commands
@ -701,8 +715,10 @@ desired behavior :
So, we switched `add` and `rm` to default to "restrict", which made So, we switched `add` and `rm` to default to "restrict", which made
usability problems much less severe and less frequent, but we still got usability problems much less severe and less frequent, but we still got
complaints because commands like: complaints because commands like:

git add <file-outside-sparse-specification> git add <file-outside-sparse-specification>
git rm <file-outside-sparse-specification> git rm <file-outside-sparse-specification>

would silently do nothing. We should instead print an error in those would silently do nothing. We should instead print an error in those
cases to get usability right. cases to get usability right.


@ -711,21 +727,22 @@ desired behavior :


There may be a difference in here between behavior A and behavior B in There may be a difference in here between behavior A and behavior B in
terms of verboseness of errors or additional warnings. terms of verboseness of errors or additional warnings.
....


* Commands falling under "restrict or no restrict dependent upon behavior * Commands falling under "restrict or no restrict dependent upon behavior
A vs. behavior B" A vs. behavior B"


* diff (with --cached or REVISION arguments) ** diff (with --cached or REVISION arguments)
* grep (with --cached or REVISION arguments) ** grep (with --cached or REVISION arguments)
* show (when given commit arguments) ** show (when given commit arguments)
* blame (only matters when one or more -C flags passed) ** blame (only matters when one or more -C flags passed)
* and annotate *** and annotate
* log ** log
* and variants: shortlog, gitk, show-branch, whatchanged, rev-list *** and variants: shortlog, gitk, show-branch, whatchanged, rev-list
* ls-files ** ls-files
* diff-index ** diff-index
* diff-tree ** diff-tree
* ls-tree ** ls-tree


For now, we default to behavior B for these, which want a default of For now, we default to behavior B for these, which want a default of
"no restrict". "no restrict".
@ -749,7 +766,7 @@ desired behavior :
implemented. implemented.
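
The per-command defaults listed in this section can be summarized as a lookup table. The following is a minimal Python sketch, not Git's implementation: `DEFAULT_CLASS`, `CLASS_BY_COMMAND`, and `default_class` are hypothetical names, and only commands whose class does not depend on their arguments are included.

```python
# Hypothetical lookup table. The class names and command lists are copied from
# the lists in this section; the table and function names are illustrative only.
DEFAULT_CLASS = {
    "restrict": [
        "diff-files", "switch", "restore", "checkout-index",
    ],
    "restrict modulo conflicts": [
        "merge", "rebase", "cherry-pick", "revert", "am", "read-tree",
    ],
    "no restrict": [
        "archive", "bundle", "commit", "format-patch",
        "fast-export", "fast-import", "commit-tree", "stash",
    ],
    # "clean" is marked with a question mark in the source list.
    "restrict also specially applied to untracked files": [
        "add", "rm", "mv", "update-index", "status", "clean",
    ],
}

# Invert the table once so lookups are keyed by command name.
CLASS_BY_COMMAND = {
    command: behavior
    for behavior, commands in DEFAULT_CLASS.items()
    for command in commands
}


def default_class(command):
    """Return the default behavior class for a command.

    Returns None when the command is not in the table, for example because
    its class depends on arguments (diff, grep, checkout, reset, apply) or
    on behavior A vs. behavior B (log, ls-files, diff-tree, ...).
    """
    return CLASS_BY_COMMAND.get(command)
```

Commands such as `diff` deliberately return `None` here, since the section assigns them to different classes depending on whether `--cached` or REVISION arguments are given.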




=== Sparse specification vs. sparsity patterns === == Sparse specification vs. sparsity patterns ==


In a well-behaved situation, the sparse specification is given directly In a well-behaved situation, the sparse specification is given directly
by the $GIT_DIR/info/sparse-checkout file. However, it can transiently by the $GIT_DIR/info/sparse-checkout file. However, it can transiently
@ -821,45 +838,48 @@ under behavior B index operations are lumped with history and tend to
operate full-tree. operate full-tree.




=== Implementation Questions === == Implementation Questions ==


* Do the options --scope={sparse,all} sound good to others? Are there better * Do the options --scope={sparse,all} sound good to others? Are there better options?
options?
* Names in use, or appearing in patches, or previously suggested: ** Names in use, or appearing in patches, or previously suggested:
* --sparse/--dense
* --ignore-skip-worktree-bits *** --sparse/--dense
* --ignore-skip-worktree-entries *** --ignore-skip-worktree-bits
* --ignore-sparsity *** --ignore-skip-worktree-entries
* --[no-]restrict-to-sparse-paths *** --ignore-sparsity
* --full-tree/--sparse-tree *** --[no-]restrict-to-sparse-paths
* --[no-]restrict *** --full-tree/--sparse-tree
* --scope={sparse,all} *** --[no-]restrict
* --focus/--unfocus *** --scope={sparse,all}
* --limit/--unlimited *** --focus/--unfocus
* Rationale making me lean slightly towards --scope={sparse,all}: *** --limit/--unlimited
* We want a name that works for many commands, so we need a name that
** Rationale making me lean slightly towards --scope={sparse,all}:

*** We want a name that works for many commands, so we need a name that
does not conflict does not conflict
* We know that we have more than two possible usecases, so it is best *** We know that we have more than two possible usecases, so it is best
to avoid a flag that appears to be binary. to avoid a flag that appears to be binary.
* --scope={sparse,all} isn't overly long and seems relatively *** --scope={sparse,all} isn't overly long and seems relatively
explanatory explanatory
* `--sparse`, as used in add/rm/mv, is totally backwards for *** `--sparse`, as used in add/rm/mv, is totally backwards for
grep/log/etc. Changing the meaning of `--sparse` for these grep/log/etc. Changing the meaning of `--sparse` for these
commands would fix the backwardness, but possibly break existing commands would fix the backwardness, but possibly break existing
scripts. Using a new name pairing would allow us to treat scripts. Using a new name pairing would allow us to treat
`--sparse` in these commands as a deprecated alias. `--sparse` in these commands as a deprecated alias.
* There is a different `--sparse`/`--dense` pair for commands using *** There is a different `--sparse`/`--dense` pair for commands using
revision machinery, so using that naming might cause confusion revision machinery, so using that naming might cause confusion
* There is also a `--sparse` in both pack-objects and show-branch, which *** There is also a `--sparse` in both pack-objects and show-branch, which
don't conflict but do suggest that `--sparse` is overloaded don't conflict but do suggest that `--sparse` is overloaded
* The name --ignore-skip-worktree-bits is a double negative, is *** The name --ignore-skip-worktree-bits is a double negative, is
quite a mouthful, refers to an implementation detail that many quite a mouthful, refers to an implementation detail that many
users may not be familiar with, and we'd need a negation for it users may not be familiar with, and we'd need a negation for it
which would probably be even more ridiculously long. (But we which would probably be even more ridiculously long. (But we
can make --ignore-skip-worktree-bits a deprecated alias for can make --ignore-skip-worktree-bits a deprecated alias for
--no-restrict.) --no-restrict.)


* If a config option is added (sparse.scope?) what should the values and ** If a config option is added (sparse.scope?) what should the values and
description be? "sparse" (behavior A), "worktree-sparse-history-dense" description be? "sparse" (behavior A), "worktree-sparse-history-dense"
(behavior B), "dense" (behavior C)? There's a risk of confusion, (behavior B), "dense" (behavior C)? There's a risk of confusion,
because even for Behaviors A and B we want some commands to be because even for Behaviors A and B we want some commands to be
@ -868,19 +888,20 @@ operate full-tree.
the primary difference we are focusing on is just the history-querying the primary difference we are focusing on is just the history-querying
commands (log/diff/grep). Previous config suggestion here: [13] commands (log/diff/grep). Previous config suggestion here: [13]


* Is `--no-expand` a good alias for ls-files's `--sparse` option? ** Is `--no-expand` a good alias for ls-files's `--sparse` option?
(`--sparse` does not map to either `--scope=sparse` or `--scope=all`, (`--sparse` does not map to either `--scope=sparse` or `--scope=all`,
because in non-cone mode it does nothing and in cone-mode it shows the because in non-cone mode it does nothing and in cone-mode it shows the
sparse directory entries which are technically outside the sparse sparse directory entries which are technically outside the sparse
specification) specification)


* Under Behavior A: ** Under Behavior A:
* Does ls-files' `--no-expand` override the default `--scope=all`, or
does it need an extra flag?
* Does ls-files' `-t` option imply `--scope=all`?
* Does update-index's `--[no-]skip-worktree` option imply `--scope=all`?


* sparse-checkout: once behavior A is fully implemented, should we take *** Does ls-files' `--no-expand` override the default `--scope=all`, or
does it need an extra flag?
*** Does ls-files' `-t` option imply `--scope=all`?
*** Does update-index's `--[no-]skip-worktree` option imply `--scope=all`?

** sparse-checkout: once behavior A is fully implemented, should we take
an interim measure to ease people into switching the default? Namely, an interim measure to ease people into switching the default? Namely,
if folks are not already in a sparse checkout, then require if folks are not already in a sparse checkout, then require
`sparse-checkout init/set` to take a `sparse-checkout init/set` to take a
@ -892,7 +913,7 @@ operate full-tree.
is seamless for them. is seamless for them.




=== Implementation Goals/Plans === == Implementation Goals/Plans ==


* Get buy-in on this document in general. * Get buy-in on this document in general.


@ -910,25 +931,26 @@ operate full-tree.
request that they not trigger this bug." flag request that they not trigger this bug." flag


* Flags & Config * Flags & Config
* Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all`
* Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore ** Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all`
** Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore
a deprecated alias for `--scope=all` a deprecated alias for `--scope=all`
* Create config option (sparse.scope?), tie it to the "Cliff notes" ** Create config option (sparse.scope?), tie it to the "Cliff notes"
overview overview


* Add --scope=sparse (and --scope=all) flag to each of the history querying ** Add --scope=sparse (and --scope=all) flag to each of the history querying
commands. IMPORTANT: make sure diff machinery changes don't mess with commands. IMPORTANT: make sure diff machinery changes don't mess with
format-patch, fast-export, etc. format-patch, fast-export, etc.
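
The "deprecated alias" idea in the list above can be illustrated with a small option parser. This is a hedged sketch using Python's `argparse`, not Git's option parsing code (Git is written in C). The `build_parser` helper and the program name are hypothetical, and the default of `sparse` is an assumption based on add/rm/mv defaulting to "restrict".

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical add/rm/mv-like command."""
    parser = argparse.ArgumentParser(prog="add-like-command")
    # Proposed flag: a named scope rather than a binary switch.
    parser.add_argument(
        "--scope",
        choices=["sparse", "all"],
        default="sparse",  # assumption: these commands default to "restrict"
    )
    # Deprecated alias: --sparse writes "all" into the same destination.
    parser.add_argument(
        "--sparse",
        dest="scope",
        action="store_const",
        const="all",
        default=argparse.SUPPRESS,  # let --scope own the default value
        help="deprecated alias for --scope=all",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["--sparse"]).scope)
```

Routing both spellings into one destination keeps the rest of the command unaware of which flag the user typed, which is what makes the old name removable later.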


=== Known bugs === == Known bugs ==


This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've
been working on it. been working on it.


0. Behavior A is not well supported in Git. (Behavior B didn't use to 1. Behavior A is not well supported in Git. (Behavior B didn't use to
be either, but was the easier of the two to implement.) be either, but was the easier of the two to implement.)


1. am and apply: 2. am and apply:


apply, without `--index` or `--cached`, relies on files being present apply, without `--index` or `--cached`, relies on files being present
in the working copy, and also writes to them unconditionally. As in the working copy, and also writes to them unconditionally. As
@ -948,7 +970,7 @@ been working on it.
files and then complain that those vivified files would be files and then complain that those vivified files would be
overwritten by merge. overwritten by merge.


2. reset --hard: 3. reset --hard:


reset --hard provides confusing error message (works correctly, but reset --hard provides confusing error message (works correctly, but
misleads the user into believing it didn't): misleads the user into believing it didn't):
@ -971,13 +993,13 @@ been working on it.
`git reset --hard` DID remove addme from the index and the working tree, contrary `git reset --hard` DID remove addme from the index and the working tree, contrary
to the error message, but in line with how reset --hard should behave. to the error message, but in line with how reset --hard should behave.


3. read-tree 4. read-tree


`read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the
entries it reads into the index, resulting in all your files suddenly entries it reads into the index, resulting in all your files suddenly
appearing to be "deleted". appearing to be "deleted".


4. Checkout, restore: 5. Checkout, restore:


These commands do not handle path & revision arguments appropriately: These commands do not handle path & revision arguments appropriately:


@ -1030,7 +1052,7 @@ been working on it.
S tracked S tracked
H tracked-but-maybe-skipped H tracked-but-maybe-skipped


5. checkout and restore --staged, continued: 6. checkout and restore --staged, continued:


These commands do not correctly scope operations to the sparse These commands do not correctly scope operations to the sparse
specification, and make it worse by not setting important SKIP_WORKTREE specification, and make it worse by not setting important SKIP_WORKTREE
@ -1046,56 +1068,82 @@ been working on it.
the sparse specification, but then it will be important to set the the sparse specification, but then it will be important to set the
SKIP_WORKTREE bits appropriately. SKIP_WORKTREE bits appropriately.


6. Performance issues; see: 7. Performance issues; see:

https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/ https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/




=== Reference Emails === == Reference Emails ==


Emails that detail various bugs we've had in sparse-checkout: Emails that detail various bugs we've had in sparse-checkout:


[1] (Original descriptions of behavior A & behavior B) [1] (Original descriptions of behavior A & behavior B):
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences)
https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/
[3] (Present-despite-skipped entries)
https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
[4] (Clone --no-checkout interaction)
https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout)
[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`)
https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/
[6] (SKIP_WORKTREE is advisory, not mandatory)
https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/
[7] (`worktree add` should copy sparsity settings from current worktree)
https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/
[8] (Avoid negative surprises in add, rm, and mv)
https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/
https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/
[9] (Move from out-of-cone to in-cone)
https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/
https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/
[10] (Unnecessarily downloading objects outside sparse specification)
https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/


[11] (Stolee's comments on high-level usecases) https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/
[2] (Fix stash applications in sparse checkouts; bugs from behavioral differences):

https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/

[3] (Present-despite-skipped entries):

https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/

[4] (Clone --no-checkout interaction):

https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout)

[5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`):

https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/

[6] (SKIP_WORKTREE is advisory, not mandatory):

https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/

[7] (`worktree add` should copy sparsity settings from current worktree):

https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/

[8] (Avoid negative surprises in add, rm, and mv):

* https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/
* https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/

[9] (Move from out-of-cone to in-cone):

* https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/
* https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/

[10] (Unnecessarily downloading objects outside sparse specification):

https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/

[11] (Stolee's comments on high-level usecases):

https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/


[12] Others commenting on eventually switching default to behavior A: [12] Others commenting on eventually switching default to behavior A:

* https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/ * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/
* https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/
* https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/ * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/


[13] Previous config name suggestion and description [13] Previous config name suggestion and description:
* https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/


[14] Tangential issue: switch to cone mode as default sparse specification mechanism: [14] Tangential issue: switch to cone mode as default sparse specification mechanism:
https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/
https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/


[15] Lengthy email on grep behavior, covering what should be searched: [15] Lengthy email on grep behavior, covering what should be searched:
* https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/


[16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations, [16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations,
search for the parenthetical comment starting "We do not check". search for the parenthetical comment starting "We do not check".
https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/


[17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ [17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/