Jeff King b6e8a3b540 limit_list: avoid quadratic behavior from still_interesting
When we are limiting a rev-list traversal due to
UNINTERESTING refs, we have to walk down the tips (both
interesting and uninteresting) to find where they intersect.
We keep a queue of commits to examine, pop commits off
the queue one by one, and potentially add their parents.  The
size of the queue will naturally fluctuate based on the
"width" of the history graph; i.e., the number of
simultaneous lines of development. But for the most part it
will stay in the same ballpark as the initial number of tips
we fed, shrinking over time (as we hit common ancestors of
the tips). So roughly speaking, if we start with `N` tips,
we'll spend much of the time with a queue around `N` items.

For each UNINTERESTING commit we pop, we call
still_interesting to check whether marking its parents as
UNINTERESTING has made the whole queue uninteresting (in
which case we can quit early).  Because the queue is stored
as a linked list, this is `O(N)`, where `N` is the number of
items in the queue. So processing a queue with `N` commits
marked UNINTERESTING (and one or more interesting commits)
will take `O(N^2)`.
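
To make the cost concrete, here is a minimal sketch of the kind of linear scan described above. It is not the code in revision.c: the struct layouts, the flag value, the function name `queue_has_interesting`, and the `steps` counter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define UNINTERESTING (1u << 1)	/* illustrative flag value, not git's */

struct commit {
	unsigned int flags;
};

struct commit_list {
	struct commit *item;
	struct commit_list *next;
};

/*
 * Return 1 if any queued commit is still interesting, 0 otherwise.
 * *steps is incremented once per list node visited, so callers can
 * see the cost of each check.
 */
static int queue_has_interesting(const struct commit_list *queue,
				 size_t *steps)
{
	const struct commit_list *p;

	assert(steps != NULL);
	for (p = queue; p; p = p->next) {
		(*steps)++;
		if (!(p->item->flags & UNINTERESTING))
			return 1;	/* early exit on the first hit */
	}
	return 0;
}
```

With the only interesting commit at the tail of an `N`-item queue, every call visits all `N` nodes, and we make one such call per UNINTERESTING commit popped.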

If you feed a lot of positive tips, this isn't a problem.
They aren't UNINTERESTING, so they don't incur the
still_interesting check.  It also isn't a problem if you
traverse from an interesting tip to some UNINTERESTING
bases. We order the queue by recency, so the interesting
commits stay at the front of the queue as we walk down them.
The linear check can exit early as soon as it sees one
interesting commit left in the queue.

But if you want to know whether an older commit is reachable
from a set of newer tips, we end up processing in the
opposite direction: from the UNINTERESTING ones down to the
interesting one. This may happen when we call:

  git rev-list $commits --not --all

in check_everything_connected after a fetch. If we fetched
something much older than most of our refs, and if we have a
large number of refs, the traversal cost is dominated by the
quadratic behavior.

These commands simulate the connectivity check of such a
fetch, when you have `$n` distinct refs in the receiver:

    # positive ref is 100,000 commits deep
    git rev-list --all | head -100000 | tail -1 >input

    # huge number of more recent negative refs
    git rev-list --all | head -$n | sed s/^/^/ >>input

    time git rev-list --stdin <input

Here are timings for various `n` on the linux.git
repository. The `n=1` case provides a baseline for just
walking the commits, which lets us see the still_interesting
overhead. The times marked with `+` subtract that baseline
to show just the extra time growth due to the large number
of refs. The `x` numbers show the slowdown of the adjusted
time versus the prior trial.

       n  | before                 | after
    --------------------------------------------------------
        1 | 0.991s                 | 0.848s
    10000 | 1.120s (+0.129s)       | 0.885s (+0.037s)
    20000 | 1.451s (+0.460s, 3.5x) | 0.923s (+0.075s, 2.0x)
    40000 | 2.731s (+1.740s, 3.8x) | 0.994s (+0.146s, 1.9x)
    80000 | 8.235s (+7.244s, 4.2x) | 1.123s (+0.275s, 1.9x)

Each trial doubles `n`, so you can see the quadratic (`4x`)
behavior before this patch. Afterwards, we have a roughly
linear relationship.
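
For reference, the `x` figures in the table can be recomputed with a few lines of C. This is only a sketch of the arithmetic; the helper name `adjusted_slowdown` is made up here and is not part of git.

```c
#include <assert.h>

/*
 * Slowdown of trial `cur` relative to trial `prev`, after subtracting
 * the n=1 baseline from both (all values in seconds).
 */
static double adjusted_slowdown(double baseline, double prev, double cur)
{
	assert(prev > baseline);
	return (cur - baseline) / (prev - baseline);
}
```

For example, `adjusted_slowdown(0.991, 2.731, 8.235)` is 7.244 / 1.740, which rounds to the `4.2x` shown for `n=80000` before the patch.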

The implementation is fairly straightforward. Whenever we do
the linear search, we cache the interesting commit we find,
and next time check it before doing another linear search.
If that commit is removed from the list or becomes
UNINTERESTING itself, then we fall back to the linear
search. This is very similar to the trick used by fce87ae
(Fix quadratic performance in rewrite_one., 2008-07-12).
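
The caching idea can be sketched roughly as follows. This is a simplified model rather than the patch itself: the struct layouts, the flag value, the names `has_interesting_cached` and `pop_commit_cached`, and the `steps` counter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define UNINTERESTING (1u << 1)	/* illustrative flag value, not git's */

struct commit {
	unsigned int flags;
};

struct commit_list {
	struct commit *item;
	struct commit_list *next;
};

/*
 * Return 1 if some queued commit is still interesting. *cache is
 * consulted first; *steps counts list nodes visited by rescans.
 */
static int has_interesting_cached(struct commit_list *queue,
				  struct commit **cache, size_t *steps)
{
	struct commit_list *p;

	assert(cache != NULL && steps != NULL);

	/* Fast path: the cached commit is still interesting. */
	if (*cache && !((*cache)->flags & UNINTERESTING))
		return 1;

	/* Slow path: rescan and remember the first interesting commit. */
	*cache = NULL;
	for (p = queue; p; p = p->next) {
		(*steps)++;
		if (!(p->item->flags & UNINTERESTING)) {
			*cache = p->item;
			return 1;
		}
	}
	return 0;
}

/* Pop the head of the queue; drop the cache if it pointed at that commit. */
static struct commit *pop_commit_cached(struct commit_list **queue,
					struct commit **cache)
{
	struct commit *c;

	assert(queue != NULL && *queue != NULL);
	c = (*queue)->item;
	*queue = (*queue)->next;
	if (c == *cache)
		*cache = NULL;
	return c;
}
```

As long as the cached commit stays in the queue and stays interesting, repeated checks cost a single flag test instead of a walk over the list.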

I considered and rejected several possible alternatives:

  1. Keep a count of UNINTERESTING commits in the queue.
     This requires managing the count not only when removing
     an item from the queue, but also when marking an item
     as UNINTERESTING. That requires touching the other
     functions which mark commits, and would require knowing
     quickly which commits are in the queue (lookup in the
     queue is linear, so we would need an auxiliary
     structure or to also maintain an IN_QUEUE flag in each
     commit object).

  2. Keep a separate list of interesting commits. Drop items
     from it when they are dropped from the queue, or if
     they become UNINTERESTING. This again suffers from
     extra complexity to maintain the list, not to mention
     CPU and memory.

  3. Use a better data structure for the queue. This is
     something that could help the fix in fce87ae, because
     we order the queue by recency, and it is about
     inserting quickly in recency order. So a normal
     priority queue would help there. But here, we cannot
     disturb the order of the queue, which makes things
     harder. We really do need an auxiliary index to track
     the flag we care about, which is basically option (2)
     above.

The "cache" trick is simple, and the numbers above show that
it works well in practice. This is because the length of
time it takes to find an interesting commit is proportional
to the length of time it will remain cached (i.e., if we
have to walk a long way to find it, it also means we have to
pop a lot of elements in the queue until we get rid of it
and have to find another interesting commit).

The worst case is still quadratic, though. We could have `N`
uninteresting commits at the front of the queue, followed by
`N` interesting commits, where commit `i` has parent `i+N`.
When we pop commit `i`, we will notice that the parent of
the next commit, `i+1+N` is still interesting and cache it.
But then handling commit `i+1`, we will mark its parent
`i+1+N` uninteresting, and immediately invalidate our cache.
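
The worst case above can be reproduced in a toy model. The sketch below is not git code: it uses an array in place of the linked list, gives each commit at most one parent, and the name `worst_case_scans` is invented here. It only counts how many queue entries the cached check still has to scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define UNINTERESTING 1u	/* illustrative flag, not git's value */

struct commit {
	unsigned int flags;
	struct commit *parent;	/* at most one parent in this toy model */
};

/*
 * Build the adversarial queue for a given n, drain the uninteresting
 * front half, and return the total number of queue entries scanned.
 */
static size_t worst_case_scans(size_t n)
{
	struct commit *c;
	struct commit *cache = NULL;
	size_t head, i, scans = 0;

	assert(n > 0);
	c = calloc(2 * n, sizeof(*c));
	assert(c != NULL);
	for (i = 0; i < n; i++) {
		c[i].flags = UNINTERESTING;
		c[i].parent = &c[i + n];	/* commit i has parent i+n */
	}

	/* The queue is c[head..2n-1]; popping just advances head. */
	for (head = 0; head < n; ) {
		struct commit *popped = &c[head++];

		popped->parent->flags |= UNINTERESTING;
		if (popped == cache)
			cache = NULL;

		/* Cached check: only rescan when the cache went stale. */
		if (cache && !(cache->flags & UNINTERESTING))
			continue;
		cache = NULL;
		for (i = head; i < 2 * n; i++) {
			scans++;
			if (!(c[i].flags & UNINTERESTING)) {
				cache = &c[i];
				break;
			}
		}
	}
	free(c);
	return scans;
}
```

In this model the cache is stale on every pop, so each check rescans about `n` entries and the total is `n*n + n - 1`. Doubling `n` roughly quadruples the work.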

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-17 15:22:05 -07:00
Documentation Git 2.0.5 2014-12-17 11:30:46 -08:00
block-sha1
builtin diff-tree: avoid lookup_unknown_object 2014-07-28 10:14:34 -07:00
compat Merge branch 'cb/byte-order' into maint 2014-07-22 10:25:02 -07:00
contrib Merge branch 'ep/shell-assign-and-export-vars' into maint 2014-07-22 10:22:57 -07:00
ewah ewah_bitmap.c: do not assume size_t and eword_t are the same size 2014-04-22 16:21:16 -07:00
git-gui Merge branch 'jl/use-vsatisfy-correctly-for-2.0' 2014-05-19 10:35:24 -07:00
gitk-git
gitweb Merge branch 'jl/nor-or-nand-and' 2014-04-08 12:00:28 -07:00
mergetools
perl code and test: fix misuses of "nor" 2014-03-31 15:29:33 -07:00
po fr: a lot of good fixups 2014-05-17 19:08:59 +02:00
ppc
t Merge branch 'maint-1.9' into maint-2.0 2015-01-07 13:27:19 -08:00
templates
vcs-svn
xdiff
.gitattributes
.gitignore
.mailmap .mailmap: combine Stefan Beller's emails 2014-07-23 11:27:05 -07:00
COPYING
GIT-VERSION-GEN Git 2.0.5 2014-12-17 11:30:46 -08:00
INSTALL
LGPL-2.1
Makefile Merge branch 'nd/index-pack-one-fd-per-thread' into maint 2014-06-25 11:47:58 -07:00
README
RelNotes Git 2.0.5 2014-12-17 11:30:46 -08:00
abspath.c
aclocal.m4
advice.c
advice.h
alias.c
alloc.c alloc: factor out commit index 2014-07-28 10:14:33 -07:00
archive-tar.c
archive-zip.c
archive.c Merge branch 'rm/strchrnul-not-strlen' 2014-03-18 13:51:18 -07:00
archive.h
argv-array.c
argv-array.h
attr.c
attr.h
base85.c
bisect.c Merge branch 'nd/log-show-linear-break' 2014-04-03 12:38:11 -07:00
bisect.h
blob.c add object_as_type helper for casting objects 2014-07-28 10:14:33 -07:00
blob.h
branch.c Merge branch 'an/branch-config-message' 2014-03-31 16:31:20 -07:00
branch.h
builtin.h
bulk-checkin.c
bulk-checkin.h
bundle.c Merge branch 'nd/log-show-linear-break' 2014-04-03 12:38:11 -07:00
bundle.h
cache-tree.c Merge branch 'rm/strchrnul-not-strlen' 2014-03-18 13:51:18 -07:00
cache-tree.h
cache.h Sync with v1.9.5 2014-12-17 11:28:54 -08:00
check-builtins.sh check-builtins.sh: use the $(...) construct for command substitution 2014-03-25 13:42:52 -07:00
check-racy.c
check_bindir
color.c
color.h
column.c comments: fix misuses of "nor" 2014-03-31 15:29:27 -07:00
column.h
combine-diff.c Merge branch 'mk/show-s-no-extra-blank-line-for-merges' into maint 2014-06-25 11:49:39 -07:00
command-list.txt
commit-slab.h commit-slab: provide a static initializer 2014-06-13 12:08:17 -07:00
commit.c add object_as_type helper for casting objects 2014-07-28 10:14:33 -07:00
commit.h reuse cached commit buffer when parsing signatures 2014-06-13 12:10:13 -07:00
config.c Sync with v1.9.5 2014-12-17 11:28:54 -08:00
config.mak.in
config.mak.uname Sync with v1.9.5 2014-12-17 11:28:54 -08:00
configure.ac Merge branch 'dm/configure-iconv-locale-charset' 2014-03-25 11:07:51 -07:00
connect.c use xmemdupz() to allocate copies of strings given by start and length 2014-07-21 10:37:02 -07:00
connect.h
connected.c
connected.h
convert.c
convert.h
copy.c
credential-cache--daemon.c
credential-cache.c
credential-store.c
credential.c
credential.h
csum-file.c
csum-file.h
ctype.c
daemon.c
date.c i18n: fix uncatchable comments for translators in date.c 2014-04-17 11:03:28 -07:00
decorate.c
decorate.h
delta.h comments: fix misuses of "nor" 2014-03-31 15:29:27 -07:00
diff-delta.c
diff-lib.c Merge branch 'jk/diff-files-assume-unchanged' into maint 2014-06-25 11:47:09 -07:00
diff-no-index.c Merge branch 'jc/fix-diff-no-index-diff-opt-parse' into maint 2014-04-09 11:59:16 -07:00
diff.c Merge branch 'bg/xcalloc-nmemb-then-size' into maint 2014-07-22 10:25:17 -07:00
diff.h
diffcore-break.c
diffcore-delta.c
diffcore-order.c Merge branch 'nd/no-more-fnmatch' 2014-03-14 14:25:31 -07:00
diffcore-pickaxe.c pickaxe: simplify kwset loop in contains() 2014-03-24 15:13:17 -07:00
diffcore-rename.c Merge branch 'dd/use-alloc-grow' 2014-03-18 13:50:21 -07:00
diffcore.h Merge branch 'nd/diff-quiet-stat-dirty' into maint 2014-03-18 13:59:56 -07:00
dir.c dir.c:trim_trailing_spaces(): fix for " \ " sequence 2014-06-02 15:48:48 -07:00
dir.h
editor.c
entry.c Merge branch 'mh/remove-subtree-long-pathname-fix' into maint 2014-04-03 13:39:05 -07:00
environment.c Sync with v1.9.5 2014-12-17 11:28:54 -08:00
exec_cmd.c
exec_cmd.h
fast-import.c
fetch-pack.c Merge branch 'jk/shallow-update-fix' into maint 2014-04-03 13:39:03 -07:00
fetch-pack.h
fmt-merge-msg.h
fsck.c Sync with v1.9.5 2014-12-17 11:28:54 -08:00
fsck.h
generate-cmdlist.sh
gettext.c
gettext.h
git-add--interactive.perl Merge branch 'jl/nor-or-nand-and' 2014-04-08 12:00:28 -07:00
git-am.sh Merge branch 'jl/nor-or-nand-and' 2014-04-08 12:00:28 -07:00
git-archimport.perl
git-bisect.sh
git-compat-util.h Merge branch 'ym/fix-opportunistic-index-update-race' into maint 2014-06-25 11:49:48 -07:00
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-difftool.perl
git-filter-branch.sh filter-branch: eliminate duplicate mapped parents 2014-07-01 08:30:41 -07:00
git-instaweb.sh git-instaweb: add support for Apache 2.4 2014-05-27 12:57:19 -07:00
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py Merge branch 'cl/p4-use-diff-tree' into maint 2014-05-08 10:01:32 -07:00
git-parse-remote.sh
git-pull.sh pull: do not abuse 'break' inside a shell 'case' 2014-06-12 12:15:49 -07:00
git-quiltimport.sh
git-rebase--am.sh Merge branch 'km/avoid-non-function-return-in-rebase' 2014-04-21 10:42:46 -07:00
git-rebase--interactive.sh Merge branch 'rr/rebase-autostash-fix' into maint 2014-06-25 11:49:31 -07:00
git-rebase--merge.sh Merge branch 'bc/fix-rebase-merge-skip' into maint 2014-07-16 11:16:16 -07:00
git-rebase.sh Merge branch 'rr/rebase-autostash-fix' into maint 2014-06-25 11:49:31 -07:00
git-relink.perl
git-remote-testgit.sh Merge branch 'ep/shell-assign-and-export-vars' into maint 2014-07-22 10:22:57 -07:00
git-request-pull.sh Merge branch 'lt/request-pull' 2014-05-19 10:35:36 -07:00
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-stash.sh Merge branch 'ep/shell-assign-and-export-vars' into maint 2014-07-22 10:22:57 -07:00
git-submodule.sh Revert "submodule: explicit local branch creation in module_clone" 2014-04-02 14:15:36 -07:00
git-svn.perl Git 2.0: git svn: Set default --prefix='origin/' if --prefix is not given 2014-04-19 11:30:13 +00:00
git-web--browse.sh
git.c
git.rc
git.spec.in
gpg-interface.c
gpg-interface.h
graph.c
graph.h
grep.c Merge branch 'rs/grep-h-c' 2014-03-18 13:51:20 -07:00
grep.h
hashmap.c
hashmap.h
help.c Merge branch 'rt/help-pretty-prints-cmd-names' 2014-03-14 14:27:00 -07:00
help.h
hex.c
http-backend.c use xmemdupz() to allocate copies of strings given by start and length 2014-07-21 10:37:02 -07:00
http-fetch.c
http-push.c Merge branch 'ah/fix-http-push' into maint 2014-07-22 10:29:07 -07:00
http-walker.c
http.c Merge branch 'mh/object-code-cleanup' 2014-03-14 14:26:29 -07:00
http.h Merge branch 'jl/nor-or-nand-and' 2014-04-08 12:00:28 -07:00
ident.c
imap-send.c imap-send.c: rearrange xcalloc arguments 2014-05-27 14:02:45 -07:00
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
line-log.c
line-log.h
line-range.c
line-range.h
list-objects.c Merge branch 'jk/pack-bitmap' 2014-04-08 12:00:33 -07:00
list-objects.h
ll-merge.c
ll-merge.h
lockfile.c
log-tree.c Merge branch 'zk/log-graph-showsig' into maint 2014-07-22 10:28:51 -07:00
log-tree.h
mailmap.c
mailmap.h
match-trees.c
merge-blobs.c
merge-blobs.h
merge-recursive.c Merge branch 'jk/commit-buffer-length' into maint 2014-07-16 11:16:38 -07:00
merge-recursive.h
merge.c
mergesort.c
mergesort.h
name-hash.c
notes-cache.c replace dangerous uses of strbuf_attach 2014-06-12 10:29:42 -07:00
notes-cache.h
notes-merge.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
notes-merge.h
notes-utils.c commit_tree: take a pointer/len pair rather than a const strbuf 2014-06-12 10:29:41 -07:00
notes-utils.h commit_tree: take a pointer/len pair rather than a const strbuf 2014-06-12 10:29:41 -07:00
notes.c notes.c: rearrange xcalloc arguments 2014-05-27 14:02:45 -07:00
notes.h
object.c object_as_type: set commit index 2014-07-28 10:14:34 -07:00
object.h add object_as_type helper for casting objects 2014-07-28 10:14:33 -07:00
pack-bitmap-write.c
pack-bitmap.c add `ignore_missing_links` mode to revwalk 2014-04-04 13:31:38 -07:00
pack-bitmap.h
pack-check.c
pack-objects.c pack-objects: use free()+xcalloc() instead of xrealloc()+memset() 2014-06-02 13:51:22 -07:00
pack-objects.h
pack-revindex.c pack-revindex.c: rearrange xcalloc arguments 2014-05-27 14:02:45 -07:00
pack-revindex.h
pack-write.c
pack.h
pager.c pager: do allow spawning pager recursively 2014-04-28 16:03:22 -07:00
parse-options-cb.c
parse-options.c Merge branch 'mr/opt-set-ptr' 2014-04-08 12:00:17 -07:00
parse-options.h Merge branch 'mr/opt-set-ptr' 2014-04-08 12:00:17 -07:00
patch-delta.c
patch-ids.c
patch-ids.h
path.c Sync with v1.9.5 2014-12-17 11:28:54 -08:00
pathspec.c use xcalloc() to allocate zero-initialized memory 2014-07-21 10:30:21 -07:00
pathspec.h
pkt-line.c
pkt-line.h comments: fix misuses of "nor" 2014-03-31 15:29:27 -07:00
preload-index.c
pretty.c Merge branch 'jk/pretty-G-format-fixes' into maint 2014-07-16 11:17:21 -07:00
prio-queue.c
prio-queue.h
progress.c
progress.h
prompt.c
prompt.h
quote.c
quote.h
reachable.c
reachable.h
read-cache.c Sync with v1.9.5 2014-12-17 11:28:54 -08:00
reflog-walk.c reflog-walk.c: rearrange xcalloc arguments 2014-05-27 14:02:45 -07:00
reflog-walk.h
refs.c add object_as_type helper for casting objects 2014-07-28 10:14:33 -07:00
refs.h remote prune: optimize "dangling symref" check/warning 2014-05-27 12:30:47 -07:00
remote-curl.c
remote-testsvn.c
remote.c remote.c: rearrange xcalloc arguments 2014-05-27 14:02:45 -07:00
remote.h
replace_object.c Merge branch 'dd/use-alloc-grow' 2014-03-18 13:50:21 -07:00
rerere.c
rerere.h
resolve-undo.c
resolve-undo.h
revision.c limit_list: avoid quadratic behavior from still_interesting 2015-04-17 15:22:05 -07:00
revision.h Merge branch 'jk/pack-bitmap' 2014-04-08 12:00:33 -07:00
run-command.c commit: fix patch hunk editing with "commit -p -m" 2014-03-18 11:25:12 -07:00
run-command.h run-command: mark run_hook_with_custom_index as deprecated 2014-03-18 11:26:12 -07:00
send-pack.c
send-pack.h
sequencer.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
sequencer.h
server-info.c
setup.c Merge branch 'mw/symlinks' into maint 2014-05-28 15:45:57 -07:00
sh-i18n--envsubst.c use xmemdupz() to allocate copies of strings given by start and length 2014-07-21 10:37:02 -07:00
sha1-array.c
sha1-array.h
sha1-lookup.c
sha1-lookup.h
sha1_file.c Merge branch 'rs/fix-alt-odb-path-comparison' into maint 2014-07-16 11:17:08 -07:00
sha1_name.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
shallow.c shallow: verify shallow file after taking lock 2014-03-17 15:03:32 -07:00
shell.c
shortlog.h
show-index.c
sideband.c sideband.c: do not use ANSI control sequence on non-terminal 2014-06-02 11:02:27 -07:00
sideband.h
sigchain.c
sigchain.h
strbuf.c
strbuf.h
streaming.c
streaming.h
string-list.c
string-list.h
submodule.c
submodule.h
symlinks.c
tag.c add object_as_type helper for casting objects 2014-07-28 10:14:33 -07:00
tag.h
tar.h
test-chmtime.c comments: fix misuses of "nor" 2014-03-31 15:29:27 -07:00
test-ctype.c
test-date.c
test-delta.c
test-dump-cache-tree.c
test-genrandom.c
test-hashmap.c
test-index-version.c
test-line-buffer.c
test-match-trees.c
test-mergesort.c
test-mktemp.c
test-parse-options.c parse-options: remove unused OPT_SET_PTR 2014-03-31 13:01:19 -07:00
test-path-utils.c
test-prio-queue.c
test-read-cache.c
test-regex.c
test-revision-walking.c
test-run-command.c
test-scrap-cache-tree.c
test-sha1.c
test-sha1.sh
test-sigchain.c
test-string-list.c
test-subprocess.c
test-svn-fe.c
test-urlmatch-normalization.c
test-wildmatch.c
thread-utils.c
thread-utils.h
trace.c
transport-helper.c transport-helper.c: rearrange xcalloc arguments 2014-05-27 14:02:45 -07:00
transport.c
transport.h
tree-diff.c
tree-walk.c
tree-walk.h
tree.c add object_as_type helper for casting objects 2014-07-28 10:14:33 -07:00
tree.h
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Sync with v1.9.5 2014-12-17 11:28:54 -08:00
unpack-trees.h
upload-pack.c Merge branch 'nd/log-show-linear-break' 2014-04-03 12:38:11 -07:00
url.c
url.h
urlmatch.c
urlmatch.h
usage.c
userdiff.c userdiff: have 'cpp' hunk header pattern catch more C++ anchor points 2014-03-21 15:03:32 -07:00
userdiff.h
utf8.c Merge branch 'maint-1.9' into maint-2.0 2015-01-07 13:27:19 -08:00
utf8.h utf8: add is_hfs_dotgit() helper 2014-12-17 11:04:39 -08:00
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c object.h: centralize object flag allocation 2014-03-25 15:09:24 -07:00
walker.h
wildmatch.c
wildmatch.h
wrap-for-bin.sh
wrapper.c read-cache.c: verify index file before we opportunistically update it 2014-04-10 12:27:58 -07:00
write_or_die.c
ws.c
wt-status.c Merge branch 'jl/status-added-submodule-is-never-ignored' into maint 2014-06-25 11:50:03 -07:00
wt-status.h Merge branch 'mm/status-porcelain-format-i18n-fix' 2014-03-31 16:31:25 -07:00
xdiff-interface.c
xdiff-interface.h
zlib.c

README

////////////////////////////////////////////////////////////////

	Git - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public
License version 2 (some parts of it are under different licenses,
compatible with the GPLv2). It was originally written by Linus
Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

See Documentation/gittutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands, and
Documentation/git-commandname.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with "man gittutorial" or "git help tutorial", and the
documentation of each command with "man git-commandname" or "git help
commandname".

CVS users may also want to read Documentation/gitcvs-migration.txt
("man gitcvs-migration" or "git help cvs-migration" if git is
installed).

Many Git online resources are accessible from http://git-scm.com/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org (read
Documentation/SubmittingPatches for instructions on patch submission).
To subscribe to the list, send an email with just "subscribe git" in
the body to majordomo@vger.kernel.org. The mailing list archives are
available at http://news.gmane.org/gmane.comp.version-control.git/,
http://marc.info/?l=git and other archival sites.

The maintainer frequently sends the "What's cooking" reports that
list the current status of various development topics to the mailing
list.  The discussion following them gives a good reference for
project status, development direction and remaining tasks.