git/builtin
Derrick Stolee 28cb5e66dd maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a50264a (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
..
add.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
am.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c
bisect--helper.c Merge branch 'al/bisect-first-parent' 2020-08-17 17:02:45 -07:00
blame.c Merge branch 'dl/opt-callback-cleanup' 2020-05-05 14:54:27 -07:00
branch.c Merge branch 'es/get-worktrees-unsort' 2020-07-06 22:09:15 -07:00
bundle.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
cat-file.c Merge branch 'cc/cat-file-usage-update' into master 2020-07-09 14:00:41 -07:00
check-attr.c
check-ignore.c check-ignore: fix documentation and implementation to match 2020-02-18 15:28:58 -08:00
check-mailmap.c
check-ref-format.c
checkout-index.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
checkout.c checkout: support renormalization with checkout -m <paths> 2020-08-03 11:48:15 -07:00
clean.c clean: optimize and document cases where we recurse into subdirectories 2020-06-12 17:27:16 -07:00
clone.c Merge branch 'jk/strvec' 2020-08-10 10:23:57 -07:00
column.c
commit-graph.c Merge branch 'ds/commit-graph-bloom-updates' into master 2020-07-30 13:20:31 -07:00
commit-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
commit.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
config.c worktree: drop get_worktrees() unused 'flags' argument 2020-06-22 10:31:15 -07:00
count-objects.c
credential.c
describe.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
diff-files.c diff-files: treat "i-t-a" files as "not-in-index" 2020-06-22 10:46:45 -07:00
diff-index.c
diff-tree.c diff-tree.c: load notes machinery when required 2020-04-20 18:22:54 -07:00
diff.c Merge branch 'ct/diff-with-merge-base-clarification' into master 2020-07-09 14:00:43 -07:00
difftool.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
env--helper.c env--helper: mark a file-local symbol as static 2019-07-11 14:31:04 -07:00
fast-export.c fast-export: use local array to store anonymized oid 2020-06-25 14:19:23 -07:00
fetch-pack.c Merge branch 'jt/cdn-offload' 2020-06-25 12:27:47 -07:00
fetch.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
fmt-merge-msg.c Lib-ify fmt-merge-msg 2020-03-24 15:04:43 -07:00
for-each-ref.c Merge branch 'jk/for-each-ref-multi-key-sort-fix' 2020-05-08 14:25:04 -07:00
fsck.c fsck: do not lazy fetch known non-promisor object 2020-08-06 13:01:03 -07:00
gc.c maintenance: add prefetch task 2020-09-25 10:53:04 -07:00
get-tar-commit-id.c
grep.c Merge branch 'jk/strvec' 2020-08-10 10:23:57 -07:00
hash-object.c
help.c help: drop usage of 'common' and 'useful' for guides 2020-08-04 18:34:01 -07:00
index-pack.c builtin/index-pack: add option to specify hash algorithm 2020-06-19 14:04:08 -07:00
init-db.c repository: enable SHA-256 support by default 2020-07-30 09:16:49 -07:00
interpret-trailers.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
log.c Merge branch 'jk/log-fp-implies-m' 2020-08-17 17:02:49 -07:00
ls-files.c Merge branch 'dl/opt-callback-cleanup' 2020-05-05 14:54:27 -07:00
ls-remote.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
ls-tree.c
mailinfo.c
mailsplit.c
merge-base.c rebase: --fork-point regression fix 2020-02-11 09:59:39 -08:00
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c Ensure index matches head before invoking merge machinery, round N 2019-08-19 10:08:03 -07:00
merge-tree.c Merge branch 'jk/tree-walk-overflow' 2019-08-22 12:34:10 -07:00
merge.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
mktag.c sha1-file: allow check_object_signature() to handle any repo 2020-01-31 10:45:39 -08:00
mktree.c
multi-pack-index.c multi-pack-index: add [--[no-]progress] option. 2019-10-23 12:05:06 +09:00
mv.c git-mv: improve error message for conflicted file 2020-07-20 14:35:43 -07:00
name-rev.c name-rev: sort tip names before applying 2020-02-05 10:36:33 -08:00
notes.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
pack-objects.c Merge branch 'jt/has_object' 2020-08-13 14:13:39 -07:00
pack-redundant.c
pack-refs.c
patch-id.c patch-id: use oid_to_hex() to print multiple object IDs 2019-12-09 12:26:40 -08:00
prune-packed.c Lib-ify prune-packed 2020-03-24 15:04:44 -07:00
prune.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
pull.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
push.c Merge branch 'dl/push-recurse-submodules-fix' 2020-05-05 14:54:28 -07:00
range-diff.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
read-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rebase.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
receive-pack.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
reflog.c Merge branch 'es/get-worktrees-unsort' 2020-07-06 22:09:15 -07:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
repack.c strvec: fix indentation in renamed calls 2020-07-28 15:02:18 -07:00
replace.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
rerere.c
reset.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rev-list.c bisect: combine args passed to find_bisection() 2020-08-07 15:13:03 -07:00
rev-parse.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
revert.c Merge branch 'ra/cherry-pick-revert-skip' 2019-07-19 11:30:21 -07:00
rm.c rm: support the --pathspec-from-file option 2020-02-19 10:56:49 -08:00
send-pack.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
shortlog.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
show-branch.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
show-index.c builtin/show-index: provide options to determine hash algo 2020-05-27 10:07:07 -07:00
show-ref.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
sparse-checkout.c Merge branch 'xl/upgrade-repo-format' 2020-06-29 14:17:24 -07:00
stash.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
stripspace.c
submodule--helper.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
symbolic-ref.c
tag.c Merge branch 'jk/for-each-ref-multi-key-sort-fix' 2020-05-08 14:25:04 -07:00
unpack-file.c
unpack-objects.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
update-index.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
update-ref.c strvec: rename files from argv-array to strvec 2020-07-28 15:02:17 -07:00
update-server-info.c
upload-archive.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
upload-pack.c
var.c
verify-commit.c Merge branch 'jk/no-system-includes-in-dot-c' 2019-07-31 14:38:56 -07:00
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c
worktree.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
write-tree.c