| Table of contents:
 | |
| 
 | |
|   * Terminology
 | |
|   * Purpose of sparse-checkouts
 | |
|   * Usecases of primary concern
 | |
|   * Oversimplified mental models ("Cliff Notes" for this document!)
 | |
|   * Desired behavior
 | |
|   * Behavior classes
 | |
|   * Subcommand-dependent defaults
 | |
|   * Sparse specification vs. sparsity patterns
 | |
|   * Implementation Questions
 | |
|   * Implementation Goals/Plans
 | |
|   * Known bugs
 | |
|   * Reference Emails
 | |
| 
 | |
| 
 | |
| === Terminology ===
 | |
| 
 | |
| cone mode: one of two modes for specifying the desired subset of files
 | |
| 	in a sparse-checkout.  In cone-mode, the user specifies
 | |
| 	directories (getting both everything under that directory as
 | |
| 	well as everything in leading directories), while in non-cone
 | |
| 	mode, the user specifies gitignore-style patterns.  Controlled
 | |
| 	by the --[no-]cone option to sparse-checkout init|set.
 | |
| 
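The cone-mode rule above can be sketched as a small path predicate. This is only an illustrative model, not Git's implementation: the function name `in_cone` is hypothetical, and the sketch assumes that "everything in leading directories" means the files sitting directly in each leading directory (including the top level), not their other subdirectories.

```python
from pathlib import PurePosixPath


def in_cone(path: str, cone_dirs: list[str]) -> bool:
    """Return True if the file `path` is selected by the given cone directories."""
    p = PurePosixPath(path)
    if p.parent == PurePosixPath("."):
        return True  # assumption: top-level files are always included
    for d in cone_dirs:
        cone = PurePosixPath(d)
        if cone in p.parents:
            return True  # anything under a specified directory, recursively
        if p.parent in cone.parents:
            return True  # a file sitting directly in a leading directory
    return False
```

With the single cone directory `src/lib`, this model selects `src/lib/util.c`, `src/Makefile` and `README`, but not `src/other/x.c` or `docs/a.txt`.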
 | |
| SKIP_WORKTREE: When tracked files do not match the sparse specification and
 | |
| 	are removed from the working tree, the file in the index is marked
 | |
| 	with a SKIP_WORKTREE bit.  Note that if a tracked file has the
 | |
| 	SKIP_WORKTREE bit set but the file is later written by the user to
 | |
| 	the working tree anyway, the SKIP_WORKTREE bit will be cleared at
 | |
| 	the beginning of any subsequent Git operation.
 | |
| 
 | |
| 	Most sparse checkout users are unaware of this implementation
 | |
| 	detail, and the term should generally be avoided in user-facing
 | |
| 	descriptions and command flags.  Unfortunately, prior to the
 | |
| 	`sparse-checkout` subcommand this low-level detail was exposed,
 | |
| 	and as of time of writing, is still exposed in various places.
 | |
| 
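As a rough model of the clearing rule described above (not the actual Git code), the sketch below treats the index as a list of entries and the working tree as a set of present paths. `IndexEntry` and `clear_present_skip_worktree` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    path: str
    skip_worktree: bool = False


def clear_present_skip_worktree(index, worktree_paths):
    """Clear SKIP_WORKTREE on entries whose file is present in the working tree.

    Returns the paths whose bit was cleared, in index order.
    """
    cleared = []
    for entry in index:
        if entry.skip_worktree and entry.path in worktree_paths:
            entry.skip_worktree = False
            cleared.append(entry.path)
    return cleared
```

Running this at the start of an operation mirrors the statement that a file written by the user loses its SKIP_WORKTREE bit on the next Git command.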
 | |
| sparse-checkout: a subcommand in git used to reduce the files present in
 | |
| 	the working tree to a subset of all tracked files.  Also, the
 | |
| 	name of the file in the $GIT_DIR/info directory used to track
 | |
| 	the sparsity patterns corresponding to the user's desired
 | |
| 	subset.
 | |
| 
 | |
| sparse cone: see cone mode
 | |
| 
 | |
| sparse directory: An entry in the index corresponding to a directory, which
 | |
| 	appears in the index instead of all the files under that directory
 | |
| 	that would normally appear.  See also sparse index.  Something that
 | |
| 	can cause confusion is that the "sparse directory" does NOT match
 | |
| 	the sparse specification, i.e. the directory is NOT present in the
 | |
| 	working tree.  May be renamed in the future (e.g. to "skipped
 | |
| 	directory").
 | |
| 
 | |
| sparse index: A special mode for sparse-checkout that also makes the
 | |
| 	index sparse by recording a directory entry in lieu of all the
 | |
| 	files underneath that directory (thus making that a "skipped
 | |
| 	directory" which unfortunately has also been called a "sparse
 | |
| 	directory"), and does this for potentially multiple
 | |
| 	directories.  Controlled by the --[no-]sparse-index option to
 | |
| 	init|set|reapply.
 | |
| 
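To make the idea of "a directory entry in lieu of all the files underneath" concrete, here is a simplified sketch assuming cone mode. The helper names are hypothetical, and the trailing-slash naming of directory entries is used here purely as an illustration convention; the real index format carries more data than a list of names.

```python
from pathlib import PurePosixPath


def dir_overlaps_cone(d, cones):
    """True if directory d is a cone directory, lies inside one, or leads to one."""
    return any(d == c or c in d.parents or d in c.parents for c in cones)


def sparsify(index_paths, cone_dirs):
    """Collapse files outside the cone into their top-most skipped directory."""
    cones = [PurePosixPath(c) for c in cone_dirs]
    entries = []
    for path in sorted(index_paths):
        p = PurePosixPath(path)
        # Ancestor directories from the top level down, excluding the root ".".
        ancestors = list(p.parents)[::-1][1:]
        skipped = next((d for d in ancestors if not dir_overlaps_cone(d, cones)), None)
        entry = f"{skipped}/" if skipped is not None else path
        if entry not in entries:
            entries.append(entry)
    return entries
```

With the cone directory `src/lib`, the files under `docs/` and `src/other/` each collapse into one directory entry, while `README`, `src/Makefile` and `src/lib/util.c` remain as individual file entries.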
 | |
| sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to
 | |
| 	define the set of files of interest.  A warning: It is easy to
 | |
| 	over-use this term (or the shortened "patterns" term), for two
 | |
| 	reasons: (1) users in cone mode specify directories rather than
 | |
| 	patterns (their directories are transformed into patterns, but
 | |
| 	users may think you are talking about non-cone mode if you use the
 | |
| 	word "patterns"), and (2) the sparse specification might
 | |
| 	transiently differ in the working tree or index from the sparsity
 | |
| 	patterns (see "Sparse specification vs. sparsity patterns").
 | |
| 
 | |
| sparse specification: The set of paths in the user's area of focus.  This
 | |
| 	is typically just the tracked files that match the sparsity
 | |
| 	patterns, but the sparse specification can temporarily differ and
 | |
| 	include additional files.  (See also "Sparse specification
 | |
| 	vs. sparsity patterns")
 | |
| 
 | |
| 	* When working with history, the sparse specification is exactly
 | |
| 	  the set of files matching the sparsity patterns.
 | |
| 	* When interacting with the working tree, the sparse specification
 | |
| 	  is the set of tracked files with a clear SKIP_WORKTREE bit or
 | |
| 	  tracked files present in the working copy.
 | |
| 	* When modifying or showing results from the index, the sparse
 | |
| 	  specification is the set of files with a clear SKIP_WORKTREE bit
 | |
| 	  or that differ in the index from HEAD.
 | |
| 	* If working with the index and the working copy, the sparse
 | |
| 	  specification is the union of the paths from above.
 | |
| 
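The four cases above can be read as set definitions. The following sketch models each tracked file with a few boolean attributes and computes the sparse specification for a given context. The `Tracked` record and the context names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    path: str
    matches_patterns: bool         # matches the sparsity patterns
    skip_worktree: bool            # SKIP_WORKTREE bit set in the index
    in_worktree: bool              # file present in the working tree
    index_differs_from_head: bool  # index entry differs from HEAD


def sparse_specification(files, context):
    """Return the set of paths in the sparse specification for `context`."""
    if context == "history":
        return {f.path for f in files if f.matches_patterns}
    worktree = {f.path for f in files if not f.skip_worktree or f.in_worktree}
    index = {f.path for f in files
             if not f.skip_worktree or f.index_differs_from_head}
    if context == "worktree":
        return worktree
    if context == "index":
        return index
    if context == "both":
        return worktree | index  # union of the two cases above
    raise ValueError(f"unknown context: {context}")
```

A file outside the sparsity patterns therefore enters the specification only through the working-tree or index cases, for example when it is present on disk or has a staged change.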
 | |
| vivifying: When a command restores a tracked file to the working tree (and
 | |
| 	hopefully also clears the SKIP_WORKTREE bit in the index for that
 | |
| 	file), this is referred to as "vivifying" the file.
 | |
| 
 | |
| 
 | |
| === Purpose of sparse-checkouts ===
 | |
| 
 | |
| sparse-checkouts exist to allow users to work with a subset of their
 | |
| files.
 | |
| 
 | |
| You can think of sparse-checkouts as subdividing "tracked" files into two
 | |
| categories -- a sparse subset, and all the rest.  Implementationally, we
 | |
| mark "all the rest" in the index with a SKIP_WORKTREE bit and leave them
 | |
| out of the working tree.  The SKIP_WORKTREE files are still tracked, just
 | |
| not present in the working tree.
 | |
| 
 | |
| In the past, sparse-checkouts were defined by "SKIP_WORKTREE means the file
 | |
| is missing from the working tree but pretend the file contents match HEAD".
 | |
| That was not only bogus (it actually meant the file missing from the
 | |
| working tree matched the index rather than HEAD), but it was also a
 | |
| low-level detail which only provided decent behavior for a few commands.
 | |
| There were a surprising number of ways in which that guiding principle gave
 | |
| command results that violated user expectations, and as such was a bad
 | |
| mental model.  However, it persisted for many years and may still be found
 | |
| in some corners of the code base.
 | |
| 
 | |
| Anyway, the idea of "working with a subset of files" is simple enough, but
 | |
| there are multiple different high-level usecases which affect how some Git
 | |
| subcommands should behave.  Further, even if we only considered one of
 | |
| those usecases, sparse-checkouts can modify different subcommands in over a
 | |
| half dozen different ways.  Let's start by considering the high level
 | |
| usecases:
 | |
| 
 | |
|   A) Users are _only_ interested in the sparse portion of the repo
 | |
| 
 | |
|   A*) Users are _only_ interested in the sparse portion of the repo
 | |
|       that they have downloaded so far
 | |
| 
 | |
|   B) Users want a sparse working tree, but are working in a larger whole
 | |
| 
 | |
|   C) sparse-checkout is a behind-the-scenes implementation detail allowing
 | |
|      Git to work with a specially crafted in-house virtual file system;
 | |
|      users are actually working with a "full" working tree that is
 | |
|      lazily populated, and sparse-checkout helps with the lazy population
 | |
|      piece.
 | |
| 
 | |
| It may be worth explaining each of these in a bit more detail:
 | |
| 
 | |
| 
 | |
|   (Behavior A) Users are _only_ interested in the sparse portion of the repo
 | |
| 
 | |
| These folks might know there are other things in the repository, but
 | |
| don't care.  They are uninterested in other parts of the repository, and
 | |
| only want to know about changes within their area of interest.  Showing
 | |
| them other files from history (e.g. from diff/log/grep/etc.) is a
 | |
| usability annoyance, potentially a huge one since other changes in
 | |
| history may dwarf the changes they are interested in.
 | |
| 
 | |
| Some of these users also arrive at this usecase from wanting to use partial
 | |
| clones together with sparse checkouts (in a way where they have downloaded
 | |
| blobs within the sparse specification) and do disconnected development.
 | |
| Not only do these users generally not care about other parts of the
 | |
| repository, but consider it a blocker for Git commands to try to operate on
 | |
| those.  If commands attempt to access paths in history outside the sparse
 | |
| specification, then the partial clone will attempt to download additional
 | |
| blobs on demand, fail, and then fail the user's command.  (This may be
 | |
| unavoidable in some cases, e.g. when `git merge` has non-trivial changes to
 | |
| reconcile outside the sparse specification, but we should limit how often
 | |
| users are forced to connect to the network.)
 | |
| 
 | |
| Also, even for users using partial clones that do not mind being
 | |
| always connected to the network, the need to download blobs as
 | |
| side-effects of various other commands (such as the printed diffstat
 | |
| after a merge or pull) can lead to worries about local repository size
 | |
| growing unnecessarily[10].
 | |
| 
 | |
|   (Behavior A*) Users are _only_ interested in the sparse portion of the repo
 | |
|       that they have downloaded so far (a variant on the first usecase)
 | |
| 
 | |
| This variant is driven by folks who are using partial clones together with
 | |
| sparse checkouts and do disconnected development (so far sounding like a
 | |
| subset of behavior A users) and are doing so on very large repositories.  The
 | |
| reason for yet another variant is that downloading even just the blobs
 | |
| through history within their sparse specification may be too much, so they
 | |
| only download some.  They would still like operations to succeed without
 | |
| network connectivity, though, so things like `git log -S${SEARCH_TERM} -p`
 | |
| or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide
 | |
| partial results that depend on what happens to have been downloaded.
 | |
| 
 | |
| This variant could be viewed as Behavior A with the sparse specification
 | |
| for history querying operations modified from "sparsity patterns" to
 | |
| "sparsity patterns limited to the blobs we have already downloaded".
 | |
| 
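A minimal sketch of what "partial results" could look like for a grep-style history query under Behavior A*. No such mode exists in Git today; every name here (`grep_available`, `tree`, `local_blobs`) is hypothetical, and blobs are modelled as plain strings keyed by an object id.

```python
def grep_available(term, tree, local_blobs, matches_patterns):
    """Search only blobs that match the sparsity patterns AND are already local.

    tree:        mapping of path -> blob id for the revision being searched
    local_blobs: mapping of blob id -> content for blobs downloaded so far
    Returns (hits, skipped): paths containing `term`, and in-pattern paths
    whose blob is missing locally and was therefore not searched.
    """
    hits, skipped = [], []
    for path, oid in sorted(tree.items()):
        if not matches_patterns(path):
            continue
        content = local_blobs.get(oid)
        if content is None:
            skipped.append(path)  # would need the network; report, don't fail
        elif term in content:
            hits.append(path)
    return hits, skipped
```

Returning the skipped paths separately is one possible design choice. It would let a command tell the user that the results depend on what happens to have been downloaded.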
 | |
|   (Behavior B) Users want a sparse working tree, but are working in a
 | |
|       larger whole
 | |
| 
 | |
| Stolee described this usecase this way[11]:
 | |
| 
 | |
| "I'm also focused on users that know that they are a part of a larger
 | |
| whole. They know they are operating on a large repository but focus on
 | |
| what they need to contribute their part. I expect multiple "roles" to
 | |
| use very different, almost disjoint parts of the codebase. Some other
 | |
| "architect" users operate across the entire tree or hop between different
 | |
| sections of the codebase as necessary. In this situation, I'm wary of
 | |
| scoping too many features to the sparse-checkout definition, especially
 | |
| "git log," as it can be too confusing to have their view of the codebase
 | |
| depend on your "point of view.""
 | |
| 
 | |
| People might also end up wanting behavior B due to complex inter-project
 | |
| dependencies.  The initial attempts to use sparse-checkouts usually involve
 | |
| the directories you are directly interested in plus what those directories
 | |
| depend upon within your repository.  But there's a monkey wrench here: if
 | |
| you have integration tests, they invert the hierarchy: to run integration
 | |
| tests, you need not only what you are interested in and its in-tree
 | |
| dependencies, you also need everything that depends upon what you are
 | |
| interested in or that depends upon one of your dependencies...AND you need
 | |
| all the in-tree dependencies of that expanded group.  That can easily
 | |
| change your sparse-checkout into a nearly dense one.
 | |
| 
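The way integration tests inflate a sparse-checkout can be illustrated with a toy dependency graph. The project names and the graph below are invented for the example. The point is only the order of the three closure steps described above.

```python
def closure(start, edges):
    """All nodes reachable from `start` by following `edges` (start included)."""
    seen, stack = set(start), list(start)
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(deps):
    """Turn 'X depends on Y' edges into 'Y is depended upon by X' edges."""
    rev = {}
    for src, targets in deps.items():
        for target in targets:
            rev.setdefault(target, set()).add(src)
    return rev


# Hypothetical in-repo projects: key depends on each value.
deps = {
    "app1": {"libA"}, "app2": {"libA", "libB"}, "app3": {"libB", "libC"},
    "libA": {"core"}, "libB": {"core"}, "libC": set(), "core": set(),
}
interest = {"libA"}
build_set = closure(interest, deps)             # interest + its dependencies
dependents = closure(build_set, invert(deps))   # + everything depending on those
test_set = closure(dependents, deps)            # + dependencies of that group
```

In this toy graph, building needs only `libA` and `core`, but running integration tests pulls in all seven projects.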
 | |
| Naturally, that tends to kill the benefits of sparse-checkouts.  There are
 | |
| a couple solutions to this conundrum: either avoid grabbing in-repo
 | |
| dependencies (maybe have built versions of your in-repo dependencies pulled
 | |
| from a CI cache somewhere), or say that users shouldn't run integration
 | |
| tests directly and instead do it on the CI server when they submit a code
 | |
| review.  Or do both.  Regardless of whether you stub out your in-repo
 | |
| dependencies or stub out the things that depend upon you, there is
 | |
| certainly a reason to want to query and be aware of those other stubbed-out
 | |
| parts of the repository, particularly when the dependencies are complex or
 | |
| change relatively frequently.  Thus, for such uses, sparse-checkouts can be
 | |
| used to limit what you directly build and modify, but these users do not
 | |
| necessarily want their sparse checkout paths to limit their queries of
 | |
| versions in history.
 | |
| 
 | |
| Some people may also be interested in behavior B over behavior A simply as
 | |
| a performance workaround: if they are using non-cone mode, then they have
 | |
| to deal with its inherent quadratic performance problems.  In that mode,
 | |
| every operation that checks whether paths match the sparse specification
 | |
| can be expensive.  As such, these users may only be willing to pay for
 | |
| those expensive checks when interacting with the working copy, and may
 | |
| prefer getting "unrelated" results from their history queries over having
 | |
| slow commands.
 | |
| 
 | |
|   (Behavior C) sparse-checkout is an implementational detail supporting a
 | |
| 	       special VFS.
 | |
| 
 | |
| This usecase goes slightly against the traditional definition of
 | |
| sparse-checkout in that it actually tries to present a full or dense
 | |
| checkout to the user.  However, this usecase utilizes the same underlying
 | |
| technical underpinnings in a new way which does provide some performance
 | |
| advantages to users.  The basic idea is that a company can have an in-house
 | |
| Git-aware Virtual File System which pretends all files are present in the
 | |
| working tree, by intercepting all file system accesses and using those to
 | |
| fetch and write accessed files on demand via partial clones.  The VFS uses
 | |
| sparse-checkout to prevent Git from writing or paying attention to many
 | |
| files, and manually updates the sparse checkout patterns itself based on
 | |
| user access and modification of files in the working tree.  See commit
 | |
| ecc7c8841d ("repo_read_index: add config to expect files outside sparse
 | |
| patterns", 2022-02-25) and the link at [17] for a more detailed description
 | |
| of such a VFS.
 | |
| 
 | |
| The biggest difference here is that users are completely unaware that the
 | |
| sparse-checkout machinery is even in use.  The sparse patterns are not
 | |
| specified by the user but rather are under the complete control of the VFS
 | |
| (and the patterns are updated frequently and dynamically by it).  The user
 | |
| will perceive the checkout as dense, and commands should thus behave as if
 | |
| all files are present.
 | |
| 
 | |
| 
 | |
| === Usecases of primary concern ===
 | |
| 
 | |
| Most of the rest of this document will focus on Behavior A and Behavior
 | |
| B.  Some notes about the other two cases and why we are not focusing on
 | |
| them:
 | |
| 
 | |
|   (Behavior A*)
 | |
| 
 | |
| Supporting this usecase is estimated to be difficult and a lot of work.
 | |
| There are no plans to implement it currently, but it may be a potential
 | |
| future alternative.  Knowing about the existence of additional alternatives
 | |
| may affect our choice of command line flags (e.g. if we need tri-state or
 | |
| quad-state flags rather than just binary flags), so it was still important
 | |
| to at least note.
 | |
| 
 | |
| Further, I believe the descriptions below for Behavior A are probably still
 | |
| valid for this usecase, with the only exception being that it redefines the
 | |
| sparse specification to restrict it to already-downloaded blobs.  The hard
 | |
| part is in making commands capable of respecting that modified definition.
 | |
| 
 | |
|   (Behavior C)
 | |
| 
 | |
| This usecase violates some of the early sparse-checkout documented
 | |
| assumptions (since files marked as SKIP_WORKTREE will be displayed to users
 | |
| as present in the working tree).  That violation may mean various
 | |
| sparse-checkout related behaviors are not well suited to this usecase and
 | |
| we may need tweaks -- to both documentation and code -- to handle it.
 | |
| However, this usecase is also perhaps the simplest model to support in that
 | |
| everything behaves like a dense checkout with a few exceptions (e.g. branch
 | |
| checkouts and switches write fewer things, knowing the VFS will lazily
 | |
| write the rest on an as-needed basis).
 | |
| 
 | |
| Since there is no publicly available VFS-related code for folks to try,
 | |
| the number of folks who can test such a usecase is limited.
 | |
| 
 | |
| The primary reason to note the Behavior C usecase is that as we fix things
 | |
| to better support Behaviors A and B, there may be additional places where
 | |
| we need to make tweaks allowing folks in this usecase to get the original
 | |
| non-sparse treatment.  For an example, see ecc7c8841d ("repo_read_index:
 | |
| add config to expect files outside sparse patterns", 2022-02-25).  The
 | |
| secondary reason to note Behavior C is so that folks taking advantage of
 | |
| Behavior C do not assume they are part of the Behavior B camp and propose
 | |
| patches that break things for the real Behavior B folks.
 | |
| 
 | |
| 
 | |
| === Oversimplified mental models ===
 | |
| 
 | |
| An oversimplification of the differences in the above behaviors is:
 | |
| 
 | |
|   Behavior A: Restrict worktree and history operations to sparse specification
 | |
|   Behavior B: Restrict worktree operations to sparse specification; have any
 | |
| 	      history operations work across all files
 | |
|   Behavior C: Do not restrict either worktree or history operations to the
 | |
| 	      sparse specification...with the exception of branch checkouts or
 | |
| 	      switches which avoid writing files that will match the index so
 | |
| 	      they can later lazily be populated instead.
 | |
| 
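The oversimplified models above amount to a small lookup table. The sketch below encodes it and applies it to a set of candidate paths. The names are hypothetical, and the Behavior C exception for branch checkouts and switches is deliberately left out of the model.

```python
# Whether each behavior restricts a kind of operation to the sparse specification.
RESTRICTS = {
    "A": {"worktree": True, "history": True},
    "B": {"worktree": True, "history": False},
    # Behavior C's checkout/switch exception is not modelled here.
    "C": {"worktree": False, "history": False},
}


def paths_considered(behavior, operation, all_paths, sparse_spec):
    """Paths an operation of the given kind would consider under a behavior."""
    if RESTRICTS[behavior][operation]:
        return set(all_paths) & set(sparse_spec)
    return set(all_paths)
```

For example, a history operation under Behavior B considers every path, while the same operation under Behavior A considers only the paths in the sparse specification.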
 | |
| 
 | |
| === Desired behavior ===
 | |
| 
 | |
| As noted previously, despite the simple idea of just working with a subset
 | |
| of files, there are a range of different behavioral changes that need to be
 | |
| made to different subcommands to work well with such a feature.  See
 | |
| [1,2,3,4,5,6,7,8,9,10] for various examples.  In particular, at [2], we saw
 | |
| that mere composition of other commands that individually worked correctly
 | |
| in a sparse-checkout context did not imply that the higher level command
 | |
| would work correctly; it sometimes requires further tweaks.  So,
 | |
| understanding these differences can be beneficial.
 | |
| 
 | |
| * Commands behaving the same regardless of high-level use-case
 | |
| 
 | |
|   * commands that only look at files within the sparse specification
 | |
| 
 | |
|       * diff (without --cached or REVISION arguments)
 | |
|       * grep (without --cached or REVISION arguments)
 | |
|       * diff-files
 | |
| 
 | |
|   * commands that restore files to the working tree that match sparsity
 | |
|     patterns, and remove unmodified files that don't match those
 | |
|     patterns:
 | |
| 
 | |
|       * switch
 | |
|       * checkout (the switch-like half)
 | |
|       * read-tree
 | |
|       * reset --hard
 | |
| 
 | |
|   * commands that write conflicted files to the working tree, but otherwise
 | |
|     will omit writing files to the working tree that do not match the
 | |
|     sparsity patterns:
 | |
| 
 | |
|       * merge
 | |
|       * rebase
 | |
|       * cherry-pick
 | |
|       * revert
 | |
| 
 | |
|       * `am` and `apply --cached` should probably be in this section but
 | |
| 	are buggy (see the "Known bugs" section below)
 | |
| 
 | |
|     The behavior for these commands somewhat depends upon the merge
 | |
|     strategy being used:
 | |
|       * `ort` behaves as described above
 | |
|       * `octopus` and `resolve` will always vivify any file changed in the merge
 | |
| 	relative to the first parent, which is rather suboptimal.
 | |
| 
 | |
|     It is also important to note that these commands WILL update the index
 | |
|     outside the sparse specification relative to when the operation began,
 | |
|     BUT these commands often make a commit just before or after such that
 | |
|     by the end of the operation there is no change to the index outside the
 | |
|     sparse specification.  Of course, if the operation hits conflicts or
 | |
|     does not make a commit, then these operations clearly can modify the
 | |
|     index outside the sparse specification.
 | |
| 
 | |
|     Finally, it is important to note that at least the first four of these
 | |
|     commands also try to remove differences between the sparse
 | |
|     specification and the sparsity patterns (much like the commands in the
 | |
|     previous section).
 | |
| 
 | |
|   * commands that always ignore sparsity since commits must be full-tree
 | |
| 
 | |
|       * archive
 | |
|       * bundle
 | |
|       * commit
 | |
|       * format-patch
 | |
|       * fast-export
 | |
|       * fast-import
 | |
|       * commit-tree
 | |
| 
 | |
|   * commands that write any modified file to the working tree (conflicted
 | |
|     or not, and whether those paths match sparsity patterns or not):
 | |
| 
 | |
|       * stash
 | |
|       * apply (without `--index` or `--cached`)
 | |
| 
 | |
| * Commands that may slightly differ for behavior A vs. behavior B:
 | |
| 
 | |
|   Commands in this category behave mostly the same between the two
 | |
|   behaviors, but may differ in verbosity and types of warning and error
 | |
|   messages.
 | |
| 
 | |
|   * commands that make modifications to which files are tracked:
 | |
|       * add
 | |
|       * rm
 | |
|       * mv
 | |
|       * update-index
 | |
| 
 | |
|     The fact that files can move between the 'tracked' and 'untracked'
 | |
|     categories means some commands will have to treat untracked files
 | |
|     differently.  But if we have to treat untracked files differently,
 | |
|     then additional commands may also need changes:
 | |
| 
 | |
|       * status
 | |
|       * clean
 | |
| 
 | |
|     In particular, `status` may need to report any untracked files outside
 | |
|     the sparse specification as an erroneous condition (especially to
 | |
|     avoid the user trying to `git add` them, forcing `git add` to display
 | |
|     an error).
 | |
| 
 | |
|     It's not clear to me exactly how (or even if) `clean` would change,
 | |
|     but it's the other command that also affects untracked files.
 | |
| 
 | |
|     `update-index` may be slightly special.  Its --[no-]skip-worktree flag
 | |
|     may need to ignore the sparse specification by its nature.  Also, its
 | |
|     current --[no-]ignore-skip-worktree-entries default is totally bogus.
 | |
| 
 | |
|   * commands for manually tweaking paths in both the index and the working tree
 | |
|       * `restore`
 | |
|       * the restore-like half of `checkout`
 | |
| 
 | |
|     These commands should be similar to add/rm/mv in that they should
 | |
|     only operate on the sparse specification by default, and require a
 | |
|     special flag to operate on all files.
 | |
| 
 | |
|     Also, note that these commands currently have a number of issues (see
 | |
|     the "Known bugs" section below)
 | |
| 
 | |
| * Commands that significantly differ for behavior A vs. behavior B:
 | |
| 
 | |
|   * commands that query history
 | |
|       * diff (with --cached or REVISION arguments)
 | |
|       * grep (with --cached or REVISION arguments)
 | |
|       * show (when given commit arguments)
 | |
|       * blame (only matters when one or more -C flags are passed)
 | |
| 	* and annotate
 | |
|       * log
 | |
|       * whatchanged
 | |
|       * ls-files
 | |
|       * diff-index
 | |
|       * diff-tree
 | |
|       * ls-tree
 | |
| 
 | |
|     Note: for log and whatchanged, revision walking logic is unaffected
 | |
|     but displaying of patches is affected by scoping the command to the
 | |
|     sparse-checkout.  (The fact that revision walking is unaffected is
 | |
|     why rev-list, shortlog, show-branch, and bisect are not in this
 | |
|     list.)
 | |
| 
 | |
|     ls-files may be slightly special in that e.g. `git ls-files -t` is
 | |
|     often used to see what is sparse and what is not.  Perhaps -t should
 | |
|     always work on the full tree?
 | |
| 
 | |
| * Commands I don't know how to classify
 | |
| 
 | |
|   * range-diff
 | |
| 
 | |
|     Is this like `log` or `format-patch`?
 | |
| 
 | |
|   * cherry
 | |
| 
 | |
|     See range-diff
 | |
| 
 | |
| * Commands unaffected by sparse-checkouts
 | |
| 
 | |
|   * shortlog
 | |
|   * show-branch
 | |
|   * rev-list
 | |
|   * bisect
 | |
| 
 | |
|   * branch
 | |
|   * describe
 | |
|   * fetch
 | |
|   * gc
 | |
|   * init
 | |
|   * maintenance
 | |
|   * notes
 | |
|   * pull (merge & rebase have the necessary changes)
 | |
|   * push
 | |
|   * submodule
 | |
|   * tag
 | |
| 
 | |
|   * config
 | |
|   * filter-branch (works in separate checkout without sparse-checkout setup)
 | |
|   * pack-refs
 | |
|   * prune
 | |
|   * remote
 | |
|   * repack
 | |
|   * replace
 | |
| 
 | |
|   * bugreport
 | |
|   * count-objects
 | |
|   * fsck
 | |
|   * gitweb
 | |
|   * help
 | |
|   * instaweb
 | |
|   * merge-tree (doesn't touch worktree or index, and merges always compute full-tree)
 | |
|   * rerere
 | |
|   * verify-commit
 | |
|   * verify-tag
 | |
| 
 | |
|   * commit-graph
 | |
|   * hash-object
 | |
|   * index-pack
 | |
|   * mktag
 | |
|   * mktree
 | |
|   * multi-pack-index
 | |
|   * pack-objects
 | |
|   * prune-packed
 | |
|   * symbolic-ref
 | |
|   * unpack-objects
 | |
|   * update-ref
 | |
|   * write-tree (operates on index, possibly optimized to use sparse dir entries)
 | |
| 
 | |
|   * for-each-ref
 | |
|   * get-tar-commit-id
 | |
|   * ls-remote
 | |
|   * merge-base (merges are computed full tree, so merge base should be too)
 | |
|   * name-rev
 | |
|   * pack-redundant
 | |
|   * rev-parse
 | |
|   * show-index
 | |
|   * show-ref
 | |
|   * unpack-file
 | |
|   * var
 | |
|   * verify-pack
 | |
| 
 | |
|   * <Everything under 'Interacting with Others' in 'git help --all'>
 | |
|   * <Everything under 'Low-level...Syncing' in 'git help --all'>
 | |
|   * <Everything under 'Low-level...Internal Helpers' in 'git help --all'>
 | |
|   * <Everything under 'External commands' in 'git help --all'>
 | |
| 
 | |
| * Commands that might be affected, but who cares?
 | |
| 
 | |
|   * merge-file
 | |
|   * merge-index
 | |
|   * gitk?
 | |
| 
 | |
| 
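The note on `log` and `whatchanged` in the list above (revision walking unaffected, patch display scoped) can be sketched as follows. This is one possible reading of that note, not a description of current Git behavior. `log_patches` and its arguments are hypothetical, and each commit is reduced to an id plus the list of paths it changed.

```python
def log_patches(commits, matches_patterns, behavior):
    """Return (commit id, paths whose patch would be shown) for every commit.

    The walk visits every commit regardless of behavior; only the set of
    displayed paths differs between Behavior A and Behavior B.
    """
    out = []
    for commit_id, changed_paths in commits:
        if behavior == "A":
            shown = [p for p in changed_paths if matches_patterns(p)]
        else:
            shown = list(changed_paths)
        out.append((commit_id, shown))
    return out
```

Under this reading, a commit that touches only paths outside the sparsity patterns still appears in the output for Behavior A, just with nothing to display for its patch.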
 | |
| === Behavior classes ===
 | |
| 
 | |
| From the above there are a few classes of behavior:
 | |
| 
 | |
|   * "restrict"
 | |
| 
 | |
|     Commands in this class only read or write files in the working tree
 | |
|     within the sparse specification.
 | |
| 
 | |
|     When moving to a new commit (e.g. switch, reset --hard), these commands
 | |
|     may update index files outside the sparse specification as of the start
 | |
|     of the operation, but by the end of the operation those index files
 | |
|     will match HEAD again and thus those files will again be outside the
 | |
|     sparse specification.
 | |
| 
 | |
|     When paths are explicitly specified, these paths are intersected with
 | |
|     the sparse specification and will only operate on such paths.
 | |
|     (e.g. `git restore [--staged] -- '*.png'`, `git reset -p -- '*.md'`)
 | |
| 
 | |
|     Some of these commands may also attempt, at the end of their operation,
 | |
|     to cull transient differences between the sparse specification and the
 | |
|     sparsity patterns (see "Sparse specification vs. sparsity patterns" for
 | |
|     details, but this basically means either removing unmodified files not
 | |
|     matching the sparsity patterns and marking those files as
 | |
|     SKIP_WORKTREE, or vivifying files that match the sparsity patterns and
 | |
|     marking those files as !SKIP_WORKTREE).
 | |
| 
 | |
|   * "restrict modulo conflicts"
 | |
| 
 | |
|     Commands in this class generally behave like the "restrict" class,
 | |
|     except that:
 | |
|       (1) they will ignore the sparse specification and write files with
 | |
| 	  conflicts to the working tree (thus temporarily expanding the
 | |
| 	  sparse specification to include such files).
 | |
|       (2) they are grouped with commands which move to a new commit, since
 | |
| 	  they often create a commit and then move to it, even though we
 | |
| 	  know there are many exceptions to moving to the new commit.  (For
 | |
| 	  example, the user may rebase a commit that becomes empty, or have
 | |
| 	  a cherry-pick which conflicts, or a user could run `merge
 | |
| 	  --no-commit`, and we also view `apply --index` kind of like `am
 | |
| 	  --no-commit`.)  As such, these commands can make changes to index
 | |
| 	  files outside the sparse specification, though they'll mark such
 | |
| 	  files with SKIP_WORKTREE.
 | |
| 
 | |
|   * "restrict also specially applied to untracked files"
 | |
| 
 | |
|     Commands in this class generally behave like the "restrict" class,
 | |
|     except that they have to handle untracked files differently too, often
 | |
|     because these commands are dealing with files changing state between
 | |
|     'tracked' and 'untracked'.  Often, this may mean printing an error
 | |
|     message if the command had nothing to do, but the arguments may have
 | |
|     referred to files whose tracked-ness state could have changed were it
 | |
|     not for the sparsity patterns excluding them.
 | |
| 
 | |
|   * "no restrict"
 | |
| 
 | |
|     Commands in this class ignore the sparse specification entirely.
 | |
| 
 | |
|   * "restrict or no restrict dependent upon behavior A vs. behavior B"
 | |
| 
 | |
|     Commands in this class behave like "no restrict" for folks in the
 | |
|     behavior B camp, and like "restrict" for folks in the behavior A camp.
 | |
|     However, when behaving like "restrict" a warning of some sort might be
 | |
|     provided that history queries have been limited by the sparse-checkout
 | |
|     specification.
 | |
| 
 | |
| 
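Two of the classes above lend themselves to a short sketch: the pathspec intersection performed by "restrict" commands, and the extra files written by "restrict modulo conflicts" commands. The function names are hypothetical. `fnmatchcase` is used as a stand-in for Git's pathspec matching, which it only approximates.

```python
from fnmatch import fnmatchcase


def restrict_pathspec(pathspec, candidate_paths, sparse_spec):
    # "restrict": paths matching the pathspec, intersected with the
    # sparse specification.
    return sorted(p for p in candidate_paths
                  if fnmatchcase(p, pathspec) and p in sparse_spec)


def files_written_by_merge(changed, conflicted, sparse_spec):
    # "restrict modulo conflicts": changed paths inside the sparse
    # specification, plus every conflicted path regardless of where it lives.
    return sorted((set(changed) & set(sparse_spec)) | set(conflicted))


def history_query_class(behavior):
    # "restrict or no restrict dependent upon behavior A vs. behavior B"
    return "restrict" if behavior == "A" else "no restrict"
```

In this model, `git restore -- '*.png'` would touch only the matching files that are already in the sparse specification, while a merge would additionally write any conflicted file so that the user can resolve it.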
 | |
| === Subcommand-dependent defaults ===
 | |
| 
 | |
| Note that we have different defaults depending on the command for the
 | |
| desired behavior:
 | |
| 
 | |
|   * Commands defaulting to "restrict":
 | |
|     * diff-files
 | |
|     * diff (without --cached or REVISION arguments)
 | |
|     * grep (without --cached or REVISION arguments)
 | |
|     * switch
 | |
|     * checkout (the switch-like half)
 | |
|     * reset (<commit>)
 | |
| 
 | |
|     * restore
 | |
|     * checkout (the restore-like half)
 | |
|     * checkout-index
 | |
|     * reset (with pathspec)
 | |
| 
 | |
|     This behavior makes sense; these interact with the working tree.
 | |
| 
 | |
|   * Commands defaulting to "restrict modulo conflicts":
 | |
|     * merge
 | |
|     * rebase
 | |
|     * cherry-pick
 | |
|     * revert
 | |
| 
 | |
|     * am
 | |
|     * apply --index (which is kind of like an `am --no-commit`)
 | |
| 
 | |
|     * read-tree (especially with -m or -u; is kind of like a --no-commit merge)
 | |
|     * reset (<tree-ish>, due to similarity to read-tree)
 | |
| 
 | |
|     These also interact with the working tree, but require slightly
 | |
|     different behavior, either (a) so that conflicts can be resolved or (b)
 | |
|     because they are kind of like a merge-without-commit operation.
 | |
| 
 | |
|     (See also the "Known bugs" section below regarding `am` and `apply`)
 | |
| 
 | |
|   * Commands defaulting to "no restrict":
 | |
|     * archive
 | |
|     * bundle
 | |
|     * commit
 | |
|     * format-patch
 | |
|     * fast-export
 | |
|     * fast-import
 | |
|     * commit-tree
 | |
| 
 | |
|     * stash
 | |
|     * apply (without `--index`)
 | |
| 
 | |
|     These have completely different defaults and perhaps deserve the most
 | |
|     detailed explanation:
 | |
| 
 | |
|     In the case of commands in the first group (format-patch,
 | |
|     fast-export, bundle, archive, etc.), these are commands for
 | |
|     communicating history, which will be broken if they restrict to a
 | |
|     subset of the repository.  As such, they operate on full paths and
 | |
|     have no `--restrict` option for overriding.  Some of these commands may
 | |
|     take paths for manually restricting what is exported, but that
 | |
|     restriction needs to be very explicit.
 | |
| 
 | |
|     In the case of stash, it needs to vivify files to avoid losing the
 | |
|     user's changes.
 | |
| 
 | |
|     In the case of apply without `--index`, that command needs to update
 | |
|     the working tree without the index (or the index without the working
 | |
|     tree if `--cached` is passed), and if we restrict those updates to the
 | |
|     sparse specification then we'll lose changes from the user.
 | |
| 
 | |
|   * Commands defaulting to "restrict also specially applied to untracked files":
 | |
|     * add
 | |
|     * rm
 | |
|     * mv
 | |
|     * update-index
 | |
|     * status
 | |
|     * clean (?)
 | |
| 
 | |
|     Our original implementation for the first three of these commands was
 | |
|     "no restrict", but it had some severe usability issues:
 | |
|       * `git add <somefile>`, if honored for a file outside the sparse
 | |
| 	specification, can result in the file randomly disappearing later
 | |
| 	when some subsequent command is run (since various commands
 | |
| 	automatically clean up unmodified files outside the sparse
 | |
| 	specification).
 | |
|       * `git rm '*.jpg'` could very negatively surprise users if it deletes
 | |
| 	files outside the range of the user's interest.
 | |
|       * `git mv` has similar surprises when moving into or out of the cone,
 | |
| 	so it is best to restrict by default.
 | |
| 
 | |
|     So, we switched `add` and `rm` to default to "restrict", which made
 | |
|     usability problems much less severe and less frequent, but we still got
 | |
|     complaints because commands like:
 | |
| 	git add <file-outside-sparse-specification>
 | |
| 	git rm <file-outside-sparse-specification>
 | |
|     would silently do nothing.  We should instead print an error in those
 | |
|     cases to get usability right.
 | |
| 
 | |
|     update-index needs to be updated to match, and status and maybe clean
 | |
|     also need to be updated to specially handle untracked paths.
 | |
| 
 | |
|     There may be a difference in here between behavior A and behavior B in
 | |
|     terms of verbosity of errors or additional warnings.
 | |
| 
 | |
|   * Commands falling under "restrict or no restrict dependent upon behavior
 | |
|     A vs. behavior B"
 | |
| 
 | |
|     * diff (with --cached or REVISION arguments)
 | |
|     * grep (with --cached or REVISION arguments)
 | |
|     * show (when given commit arguments)
 | |
|     * blame (only matters when one or more -C flags passed)
 | |
|       * and annotate
 | |
|     * log
 | |
|       * and variants: shortlog, gitk, show-branch, whatchanged, rev-list
 | |
|     * ls-files
 | |
|     * diff-index
 | |
|     * diff-tree
 | |
|     * ls-tree
 | |
| 
 | |
|     For now, we default to behavior B for these, which means a default of
 | |
|     "no restrict".
 | |
| 
 | |
|     Note that two of these commands -- diff and grep -- also appeared in a
 | |
|     different list with a default of "restrict", but only when limited to
 | |
|     searching the working tree.  The working tree vs. history distinction
 | |
|     is fundamental in how behavior B operates, so this is expected.  Note,
 | |
|     though, that for diff and grep with --cached, when doing "restrict"
 | |
|     behavior, the difference between sparse specification and sparsity
 | |
|     patterns is important to handle.
 | |
| 
 | |
|     "restrict" may make more sense as the long term default for these[12].
 | |
|     Also, supporting "restrict" for these commands might be a fair amount
 | |
|     of work to implement, meaning it might be implemented over multiple
 | |
|     releases.  If that behavior were the default in the commands that
 | |
|     supported it, that would force behavior B users to need to learn to
 | |
|     slowly add additional flags to their commands, depending on git
 | |
|     version, to get the behavior they want.  That gradual switchover would
 | |
|     be painful, so we should avoid it at least until it's fully
 | |
|     implemented.
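As a rough illustration of the "restrict" versus "no restrict" defaults listed above, the following sketch builds a throwaway repository, narrows the working tree using cone mode, and then compares what is on disk with what `git archive` exports.  The layout (src/, docs/, README), the file contents, and the identity settings are made up for the example, and details of the output may vary between Git versions.

```shell
#!/bin/sh
# Sketch only: the repository layout (src/, docs/, README) is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir src docs
echo 'int main(void) { return 0; }' >src/main.c
echo 'guide' >docs/guide.txt
echo 'readme' >README
git add .
git commit -q -m "initial"

# Restrict the working tree to src/ (cone mode also keeps top-level files).
git sparse-checkout init --cone
git sparse-checkout set src

# The working tree honors the sparse specification ...
worktree_files=$(find . -name .git -prune -o -type f -print | sort)
echo "working tree:"
echo "$worktree_files"

# ... while archive, a "no restrict" command, exports every tracked path.
archive_files=$(git archive HEAD | tar -tf - | sort)
echo "archive:"
echo "$archive_files"
```

In this sketch docs/guide.txt should be missing from the working tree listing but present in the archive listing, which is the point of keeping history-communicating commands full-tree.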
 | |
| 
 | |
| 
 | |
| === Sparse specification vs. sparsity patterns ===
 | |
| 
 | |
| In a well-behaved situation, the sparse specification is given directly
 | |
| by the $GIT_DIR/info/sparse-checkout file.  However, it can transiently
 | |
| diverge for a few reasons:
 | |
| 
 | |
|     * needing to resolve conflicts (merging will vivify conflicted files)
 | |
|     * running Git commands that implicitly vivify files (e.g. "git stash apply")
 | |
|     * running Git commands that explicitly vivify files (e.g. "git checkout
 | |
|       --ignore-skip-worktree-bits FILENAME")
 | |
|     * other commands that write to these files (perhaps a user copies them
 | |
|       from elsewhere)
 | |
| 
 | |
| For the last item, note that we do automatically clear the SKIP_WORKTREE
 | |
| bit for files that are present in the working tree.  This has been true
 | |
| since 82386b4496 ("Merge branch 'en/present-despite-skipped'",
 | |
| 2022-03-09).
 | |
| 
 | |
| However, such a situation is transient because:
 | |
| 
 | |
|    * Such transient differences can and will be automatically removed as
 | |
|      a side-effect of commands which call unpack_trees() (checkout,
 | |
|      merge, reset, etc.).
 | |
|    * Users can also request such transient differences be corrected via
 | |
|      running `git sparse-checkout reapply`.  Several of Git's warning
 | |
|      messages recommend running that command.
 | |
|    * Additional commands are also welcome to implicitly fix these
 | |
|      differences; we may add more in the future.
 | |
| 
 | |
| While we avoid dropping unstaged changes or files which have conflicts,
 | |
| we otherwise aggressively try to fix these transient differences.  If
 | |
| users want these differences to persist, they should run the `set` or
 | |
| `add` subcommands of `git sparse-checkout` to reflect their intended
 | |
| sparse specification.
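To make the two notions concrete, the sketch below prints both of them in a throwaway repository: the sparsity patterns (via `git sparse-checkout list` and the info/sparse-checkout file) and the per-file index state (via `git ls-files -t`, where `S` marks SKIP_WORKTREE entries).  The paths in-cone/ and out-of-cone/ are hypothetical, and this only covers the well-behaved case where the two agree; it does not try to reproduce a transient divergence.

```shell
#!/bin/sh
# Sketch only: directory and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

mkdir in-cone out-of-cone
echo a >in-cone/a.txt
echo b >out-of-cone/b.txt
git add .
git commit -q -m "initial"

git sparse-checkout init --cone
git sparse-checkout set in-cone

# Sparsity patterns: what the user asked for.
patterns=$(git sparse-checkout list)
pattern_file=$(cat "$(git rev-parse --git-path info/sparse-checkout)")
echo "patterns: $patterns"

# Index state: which entries currently carry SKIP_WORKTREE ("S").
index_state=$(git ls-files -t)
echo "$index_state"

# Ask Git to bring the working tree back in line with the patterns.
# Nothing has diverged here, so this should leave the state unchanged.
git sparse-checkout reapply
index_state_after=$(git ls-files -t)
```

If a file under out-of-cone/ were later written by hand, `git ls-files -t` would be expected to stop reporting it as `S` until one of the mechanisms described above removes the difference.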
 | |
| 
 | |
| However, when we need to do a query on history restricted to the
 | |
| "relevant subset of files" such a transiently expanded sparse
 | |
| specification is ignored.  There are a couple of reasons for this:
 | |
| 
 | |
|    * The behavior wanted when doing something like
 | |
| 	 git grep expression REVISION
 | |
|      is roughly what the users would expect from
 | |
| 	 git checkout REVISION && git grep expression
 | |
|      (modulo a "REVISION:" prefix), which has a couple of ramifications:
 | |
| 
 | |
|    * REVISION may have paths not in the current index, so there is no
 | |
|      path we can consult for a SKIP_WORKTREE setting for those paths.
 | |
| 
 | |
|    * Since `checkout` is one of those commands that tries to remove
 | |
|      transient differences in the sparse specification, it makes sense
 | |
|      to use the corrected sparse specification
 | |
|      (i.e. $GIT_DIR/info/sparse-checkout) rather than attempting to
 | |
|      consult SKIP_WORKTREE anyway.
 | |
| 
 | |
| So, a transiently expanded (or restricted) sparse specification applies to
 | |
| the working tree, but not to history queries, where we always use the
 | |
| sparsity patterns.  (See [16] for an early discussion of this.)
 | |
| 
 | |
| Similar to a transiently expanded sparse specification of the working tree
 | |
| based on additional files being present in the working tree, we also need
 | |
| to consider additional files being modified in the index.  In particular,
 | |
| if the user has staged changes to files (relative to HEAD) that do not
 | |
| match the sparsity patterns, and the file is not present in the working
 | |
| tree, we still want to consider the file part of the sparse specification
 | |
| if we are specifically performing a query related to the index (e.g. git
 | |
| diff --cached [REVISION], git diff-index [REVISION], git restore --staged
 | |
| --source=REVISION -- PATHS, etc.).  Note that a transiently expanded sparse
 | |
| specification for the index usually only matters under behavior A, since
 | |
| under behavior B index operations are lumped with history and tend to
 | |
| operate full-tree.
 | |
| 
 | |
| 
 | |
| === Implementation Questions ===
 | |
| 
 | |
|   * Do the options --scope={sparse,all} sound good to others?  Are there better
 | |
|     options?
 | |
|     * Names in use, or appearing in patches, or previously suggested:
 | |
|       * --sparse/--dense
 | |
|       * --ignore-skip-worktree-bits
 | |
|       * --ignore-skip-worktree-entries
 | |
|       * --ignore-sparsity
 | |
|       * --[no-]restrict-to-sparse-paths
 | |
|       * --full-tree/--sparse-tree
 | |
|       * --[no-]restrict
 | |
|       * --scope={sparse,all}
 | |
|       * --focus/--unfocus
 | |
|       * --limit/--unlimited
 | |
|     * Rationale making me lean slightly towards --scope={sparse,all}:
 | |
|       * We want a name that works for many commands, so we need a name that
 | |
| 	does not conflict
 | |
|       * We know that we have more than two possible usecases, so it is best
 | |
| 	to avoid a flag that appears to be binary.
 | |
|       * --scope={sparse,all} isn't overly long and seems relatively
 | |
| 	explanatory
 | |
|       * `--sparse`, as used in add/rm/mv, is totally backwards for
 | |
| 	grep/log/etc.  Changing the meaning of `--sparse` for these
 | |
| 	commands would fix the backwardness, but possibly break existing
 | |
| 	scripts.  Using a new name pairing would allow us to treat
 | |
| 	`--sparse` in these commands as a deprecated alias.
 | |
|       * There is a different `--sparse`/`--dense` pair for commands using
 | |
| 	revision machinery, so using that naming might cause confusion
 | |
|       * There is also a `--sparse` in both pack-objects and show-branch, which
 | |
| 	don't conflict but do suggest that `--sparse` is overloaded
 | |
|       * The name --ignore-skip-worktree-bits is a double negative, is
 | |
| 	quite a mouthful, refers to an implementation detail that many
 | |
| 	users may not be familiar with, and we'd need a negation for it
 | |
| 	which would probably be even more ridiculously long.  (But we
 | |
| 	can make --ignore-skip-worktree-bits a deprecated alias for
 | |
| 	--scope=all.)
 | |
| 
 | |
|   * If a config option is added (sparse.scope?) what should the values and
 | |
|     description be?  "sparse" (behavior A), "worktree-sparse-history-dense"
 | |
|     (behavior B), "dense" (behavior C)?  There's a risk of confusion,
 | |
|     because even for Behaviors A and B we want some commands to be
 | |
|     full-tree and others to operate sparsely, so the wording may need to be
 | |
|     more tied to the usecases and somehow explain that.  Also, right now,
 | |
|     the primary difference we are focusing on is just the history-querying
 | |
|     commands (log/diff/grep).  Previous config suggestion here: [13]
 | |
| 
 | |
|   * Is `--no-expand` a good alias for ls-files's `--sparse` option?
 | |
|     (`--sparse` does not map to either `--scope=sparse` or `--scope=all`,
 | |
|     because in non-cone mode it does nothing and in cone-mode it shows the
 | |
|     sparse directory entries which are technically outside the sparse
 | |
|     specification)
 | |
| 
 | |
|   * Under Behavior A:
 | |
|     * Does ls-files' `--no-expand` override the default `--scope=all`, or
 | |
|       does it need an extra flag?
 | |
|     * Does ls-files' `-t` option imply `--scope=all`?
 | |
|     * Does update-index's `--[no-]skip-worktree` option imply `--scope=all`?
 | |
| 
 | |
|   * sparse-checkout: once behavior A is fully implemented, should we take
 | |
|     an interim measure to ease people into switching the default?  Namely,
 | |
|     if folks are not already in a sparse checkout, then require
 | |
|     `sparse-checkout init/set` to take a
 | |
|     `--set-scope=(sparse|worktree-sparse-history-dense|dense)` flag (which
 | |
|     would set sparse.scope according to the setting given), and throw an
 | |
|     error if the flag is not provided?  That error would be a great place
 | |
|     to warn folks that the default may change in the future, and get them
 | |
|     used to specifying what they want so that the eventual default switch
 | |
|     is seamless for them.
 | |
| 
 | |
| 
 | |
| === Implementation Goals/Plans ===
 | |
| 
 | |
|  * Get buy-in on this document in general.
 | |
| 
 | |
|  * Figure out answers to the 'Implementation Questions' section (above)
 | |
| 
 | |
|  * Fix bugs in the 'Known bugs' section (below)
 | |
| 
 | |
|  * Provide some kind of method for backfilling the blobs within the sparse
 | |
|    specification in a partial clone
 | |
| 
 | |
|  [Below here is kind of spitballing since the first two haven't been resolved]
 | |
| 
 | |
|  * update-index: flip the default to --no-ignore-skip-worktree-entries,
 | |
|    nuke this stupid "Oh, there's a bug?  Let me add a flag to let users
 | |
|    request that they not trigger this bug." flag
 | |
| 
 | |
|  * Flags & Config
 | |
|    * Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all`
 | |
|    * Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore
 | |
|      a deprecated alias for `--scope=all`
 | |
|    * Create config option (sparse.scope?), tie it to the "Cliff notes"
 | |
|      overview
 | |
| 
 | |
|    * Add --scope=sparse (and --scope=all) flag to each of the history querying
 | |
|      commands.  IMPORTANT: make sure diff machinery changes don't mess with
 | |
|      format-patch, fast-export, etc.
 | |
| 
 | |
| === Known bugs ===
 | |
| 
 | |
| This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've
 | |
| been working on it.
 | |
| 
 | |
| 0. Behavior A is not well supported in Git.  (Behavior B didn't use to
 | |
|    be either, but was the easier of the two to implement.)
 | |
| 
 | |
| 1. am and apply:
 | |
| 
 | |
|    apply, without `--index` or `--cached`, relies on files being present
 | |
|    in the working copy, and also writes to them unconditionally.  As
 | |
|    such, it should first check for the files' presence, and if found to
 | |
|    be SKIP_WORKTREE, then clear the bit and vivify the paths, then do
 | |
|    its work.  Currently, it just throws an error.
 | |
| 
 | |
|    apply, with either `--cached` or `--index`, will not preserve the
 | |
|    SKIP_WORKTREE bit.  This is fine if the file has conflicts, but
 | |
|    otherwise SKIP_WORKTREE bits should be preserved for --cached and
 | |
|    probably also for --index.
 | |
| 
 | |
|    am, if there are no conflicts, will vivify files and fail to preserve
 | |
|    the SKIP_WORKTREE bit.  If there are conflicts and `-3` is not
 | |
|    specified, it will vivify files and then complain the patch doesn't
 | |
|    apply.  If there are conflicts and `-3` is specified, it will vivify
 | |
|    files and then complain that those vivified files would be
 | |
|    overwritten by merge.
 | |
| 
 | |
| 2. reset --hard:
 | |
| 
 | |
|    reset --hard provides a confusing error message (works correctly, but
 | |
|    misleads the user into believing it didn't):
 | |
| 
 | |
|     $ touch addme
 | |
|     $ git add addme
 | |
|     $ git ls-files -t
 | |
|     H addme
 | |
|     H tracked
 | |
|     S tracked-but-maybe-skipped
 | |
|     $ git reset --hard                           # usually works great
 | |
|     error: Path 'addme' not uptodate; will not remove from working tree.
 | |
|     HEAD is now at bdbbb6f third
 | |
|     $ git ls-files -t
 | |
|     H tracked
 | |
|     S tracked-but-maybe-skipped
 | |
|     $ ls -1
 | |
|     tracked
 | |
| 
 | |
|     `git reset --hard` DID remove addme from the index and the working tree, contrary
 | |
|     to the error message, but in line with how reset --hard should behave.
 | |
| 
 | |
| 3. read-tree:
 | |
| 
 | |
|    `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the
 | |
|    entries it reads into the index, resulting in all your files suddenly
 | |
|    appearing to be "deleted".
 | |
| 
 | |
| 4. checkout and restore:
 | |
| 
 | |
|    These commands do not handle path & revision arguments appropriately:
 | |
| 
 | |
|     $ ls
 | |
|     tracked
 | |
|     $ git ls-files -t
 | |
|     H tracked
 | |
|     S tracked-but-maybe-skipped
 | |
|     $ git status --porcelain
 | |
|     $ git checkout -- '*skipped'
 | |
|     error: pathspec '*skipped' did not match any file(s) known to git
 | |
|     $ git ls-files -- '*skipped'
 | |
|     tracked-but-maybe-skipped
 | |
|     $ git checkout HEAD -- '*skipped'
 | |
|     error: pathspec '*skipped' did not match any file(s) known to git
 | |
|     $ git ls-tree HEAD | grep skipped
 | |
|     100644 blob 276f5a64354b791b13840f02047738c77ad0584f	tracked-but-maybe-skipped
 | |
|     $ git status --porcelain
 | |
|     $ git checkout HEAD~1 -- '*skipped'
 | |
|     $ git ls-files -t
 | |
|     H tracked
 | |
|     H tracked-but-maybe-skipped
 | |
|     $ git status --porcelain
 | |
|     M  tracked-but-maybe-skipped
 | |
|     $ git checkout HEAD -- '*skipped'
 | |
|     $ git status --porcelain
 | |
|     $
 | |
| 
 | |
|     Note that checkout without a revision (or restore --staged) fails to
 | |
|     find a file to restore from the index, even though ls-files shows
 | |
|     such a file certainly exists.
 | |
| 
 | |
|     Similar issues occur with HEAD (--source=HEAD in restore's case),
 | |
|     but checkout suddenly works when HEAD~1 is specified.  And then after that it
 | |
|     will work with HEAD specified, even though it didn't before.
 | |
| 
 | |
|     Directories are also an issue:
 | |
| 
 | |
|     $ git sparse-checkout set nomatches
 | |
|     $ git status
 | |
|     On branch main
 | |
|     You are in a sparse checkout with 0% of tracked files present.
 | |
| 
 | |
|     nothing to commit, working tree clean
 | |
|     $ git checkout .
 | |
|     error: pathspec '.' did not match any file(s) known to git
 | |
|     $ git checkout HEAD~1 .
 | |
|     Updated 1 path from 58916d9
 | |
|     $ git ls-files -t
 | |
|     S tracked
 | |
|     H tracked-but-maybe-skipped
 | |
| 
 | |
| 5. checkout and restore --staged, continued:
 | |
| 
 | |
|    These commands do not correctly scope operations to the sparse
 | |
|    specification, and make it worse by not setting important SKIP_WORKTREE
 | |
|    bits:
 | |
| 
 | |
|    $ git restore --source OLDREV --staged outside-sparse-cone/
 | |
|    $ git status --porcelain
 | |
|    MD outside-sparse-cone/file1
 | |
|    MD outside-sparse-cone/file2
 | |
|    MD outside-sparse-cone/file3
 | |
| 
 | |
|    We can add a --scope=all mode to `git restore` to let it operate outside
 | |
|    the sparse specification, but then it will be important to set the
 | |
|    SKIP_WORKTREE bits appropriately.
 | |
| 
 | |
| 6. Performance issues; see:
 | |
|     https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/
 | |
| 
 | |
| 
 | |
| === Reference Emails ===
 | |
| 
 | |
| Emails that detail various bugs we've had in sparse-checkout:
 | |
| 
 | |
| [1] (Original descriptions of behavior A & behavior B)
 | |
|     https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
 | |
| [2] (Fix stash applications in sparse checkouts; bugs from behavioral differences)
 | |
|     https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/
 | |
| [3] (Present-despite-skipped entries)
 | |
|     https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
 | |
| [4] (Clone --no-checkout interaction)
 | |
|     https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/
 | |
| [5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`)
 | |
|     https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/
 | |
| [6] (SKIP_WORKTREE is advisory, not mandatory)
 | |
|     https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/
 | |
| [7] (`worktree add` should copy sparsity settings from current worktree)
 | |
|     https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/
 | |
| [8] (Avoid negative surprises in add, rm, and mv)
 | |
|     https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/
 | |
|     https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/
 | |
| [9] (Move from out-of-cone to in-cone)
 | |
|     https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/
 | |
|     https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/
 | |
| [10] (Unnecessarily downloading objects outside sparse specification)
 | |
|      https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/
 | |
| 
 | |
| [11] (Stolee's comments on high-level usecases)
 | |
|      https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/
 | |
| 
 | |
| [12] Others commenting on eventually switching default to behavior A:
 | |
|   * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/
 | |
|   * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/
 | |
|   * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/
 | |
| 
 | |
| [13] Previous config name suggestion and description
 | |
|   * https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/
 | |
| 
 | |
| [14] Tangential issue: switch to cone mode as default sparse specification mechanism:
 | |
|   https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/
 | |
| 
 | |
| [15] Lengthy email on grep behavior, covering what should be searched:
 | |
|   * https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/
 | |
| 
 | |
| [16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations,
 | |
|      search for the parenthetical comment starting "We do not check".
 | |
|     https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/
 | |
| 
 | |
| [17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
 |