René Scharfe afc72b5d3a mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot.  A recursive version could avoid that, but would require log2(n)
stack frames.  Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.

While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.

The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2.  Such a construct can accommodate lists of any length up to
n_max.  Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole ranks stack.

We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and checking
its bits, instead of checking for NULL in the ranks stack, to find out
whether a sublist of a certain rank exists.  This avoids memory
accesses.
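
The binary-counter view can be sketched in C as follows.  This is only
an illustration, not the code from this patch: RANK_SLOTS,
rank_is_occupied() and merges_for_next_item() are hypothetical names
chosen for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One slot per bit of size_t covers any list length up to SIZE_MAX. */
#define RANK_SLOTS (CHAR_BIT * sizeof(size_t))

/*
 * With n items stored so far, slot i holds a sublist of length 2^i
 * exactly when bit i of n is set, so the stack itself is not read.
 */
static int rank_is_occupied(size_t n, size_t i)
{
	assert(i < RANK_SLOTS);
	return (n >> i) & 1;
}

/*
 * Number of merges that adding one more item triggers: one per
 * trailing 1 bit of n, like the carries of a binary increment.
 */
static size_t merges_for_next_item(size_t n)
{
	size_t merges = 0;

	while (n & 1) {
		merges++;
		n >>= 1;
	}
	return merges;
}
```

For example, with five items stored (binary 101) the slots of rank 0
and rank 2 are occupied, and the sixth item triggers exactly one merge.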

The first item can go into the empty first slot as a sublist of length
2^0.  The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into the vacated first slot and so on.  At the end we
merge all the sublists to get the result.
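
The steps above can be sketched as a small C program fragment.  It is a
simplified illustration under stated assumptions, not the
llist_mergesort() implementation: struct node, merge() and sort_list()
are hypothetical, the list is typed instead of generic, and the merge
is deliberately naive.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define RANK_SLOTS (CHAR_BIT * sizeof(size_t))

/* Hypothetical list node, used only for this illustration. */
struct node {
	int value;
	struct node *next;
};

/* Naive stable merge: on ties, take from `older` (items seen earlier). */
static struct node *merge(struct node *older, struct node *newer)
{
	struct node head, *tail = &head;

	while (older && newer) {
		if (newer->value < older->value) {
			tail->next = newer;
			newer = newer->next;
		} else {
			tail->next = older;
			older = older->next;
		}
		tail = tail->next;
	}
	tail->next = older ? older : newer;
	return head.next;
}

static struct node *sort_list(struct node *list)
{
	struct node *ranks[RANK_SLOTS];
	struct node *result = NULL;
	size_t n = 0;	/* number of items consumed so far */
	size_t i;

	while (list) {
		struct node *cur = list;

		list = list->next;
		cur->next = NULL;
		/* Carry: every set low bit of n marks an occupied slot. */
		for (i = 0; n & ((size_t)1 << i); i++)
			cur = merge(ranks[i], cur);
		assert(i < RANK_SLOTS);
		ranks[i] = cur;
		n++;
	}

	/* Merge the leftover sublists, lowest rank first. */
	for (i = 0; n; i++, n >>= 1) {
		if (!(n & 1))
			continue;
		result = result ? merge(ranks[i], result) : ranks[i];
	}
	return result;
}
```

Slots are only read when the corresponding bit of n is set, so the
ranks array does not need to be initialized.  Sublists of a higher rank
always hold earlier items, which is why they are passed as `older`.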

The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality.  That's
done by preferring items from sublists with a higher rank.

The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given.  The old code couldn't
do the latter because it kept all items in a single list.
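
A merge along these lines might look like the sketch below.  It is an
assumption-laden illustration rather than the function added by this
patch: struct node, set_next(), merge_runs() and the next_writes
counter are hypothetical and exist only to make the saved pointer
updates visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, used only for this illustration. */
struct node {
	int value;
	struct node *next;
};

static size_t next_writes;	/* counts pointer updates */

static void set_next(struct node *item, struct node *next)
{
	item->next = next;
	next_writes++;
}

/*
 * Merge two non-empty sorted lists.  `list` holds the items seen
 * earlier, so it wins ties, which keeps the sort stable.  A next
 * pointer is only written when the merge switches between the lists.
 */
static struct node *merge_runs(struct node *list, struct node *other)
{
	struct node *result, *tail;
	int take_other;

	assert(list && other);
	take_other = other->value < list->value;
	result = take_other ? other : list;

	for (;;) {
		if (take_other) {
			/* Skip along `other` while its items come first. */
			do {
				tail = other;
				other = other->next;
				if (!other) {
					set_next(tail, list);
					return result;
				}
			} while (other->value < list->value);
			set_next(tail, list);
		} else {
			/* Skip along `list`; it also wins on equality. */
			do {
				tail = list;
				list = list->next;
				if (!list) {
					set_next(tail, other);
					return result;
				}
			} while (list->value <= other->value);
			set_next(tail, other);
		}
		take_other = !take_other;
	}
}
```

Merging 1, 2, 5 with 3, 4 needs only two pointer updates here (2 to 3
and 4 to 5), whereas a merge that relinks every item would need five.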

The number of comparisons stays the same, though.  Here's example output
of "test-tool mergesort test" for the rand distributions with the
highest number of comparisons with the ranks stack:

   $ t/helper/test-tool mergesort test | awk '
       NR > 1 && $1 != "rand" {next}
       $7 > max[$3] {max[$3] = $7; line[$3] = $0}
       END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
rand      copy                  100       32      669      420      569 OK
rand      dither               1023       64     9997     5396     8974 OK
rand      dither               1024      512    10007     6159     8983 OK
rand      dither               1025      256    10993     5988     9968 OK

Here are the differences to the results without this patch:

distribut mode                    n        m get_next set_next  compare
rand      copy                  100       32     -515     -280        0
rand      dither               1023       64    -6376    -4834        0
rand      dither               1024      512    -6377    -4081        0
rand      dither               1025      256    -7461    -5287        0

The numbers of get_next and set_next calls are reduced significantly.

NB: These winners are different from the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.

Here are the distributions with the most comparisons overall with the
ranks stack:

   $ t/helper/test-tool mergesort test | awk '
       $7 > max[$3] {max[$3] = $7; line[$3] = $0}
       END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
sawtooth  unriffle_skewed       100      128      689      632      589 OK
sawtooth  unriffle_skewed      1023     1024    10230    10220     9207 OK
sawtooth  unriffle             1024     1024    10241    10240     9217 OK
sawtooth  unriffle_skewed      1025     2048    11266    10242    10241 OK

And here are the differences to before:

distribut mode                    n        m get_next set_next  compare
sawtooth  unriffle_skewed       100      128     -495      -68        0
sawtooth  unriffle_skewed      1023     1024    -6143      -10        0
sawtooth  unriffle             1024     1024    -6143        0        0
sawtooth  unriffle_skewed      1025     2048    -7188    -1033        0

We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.

And here are the results of p0071-sort.sh before:

0071.12: llist_mergesort() unsorted    0.36(0.33+0.01)
0071.14: llist_mergesort() sorted      0.15(0.13+0.01)
0071.16: llist_mergesort() reversed    0.16(0.14+0.01)

... and here the ones with this patch:

0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:09 -07:00
.github Merge branch 'tb/ci-run-cocci-with-18.04' into maint 2021-02-11 13:57:36 -08:00
Documentation Git 2.30.2 2021-02-12 15:51:13 +01:00
block-sha1
builtin Merge branch 'ab/branch-sort' into maint 2021-02-08 14:05:55 -08:00
ci ci/install-depends: attempt to fix "brew cask" stuff 2021-01-14 19:08:56 -08:00
compat Sync with 2.29.3 2021-02-12 15:51:12 +01:00
contrib Merge branch 'js/cmake-extra-built-ins-fix' 2020-12-14 10:21:38 -08:00
ewah
git-gui Merge https://github.com/prati0100/git-gui 2020-12-18 15:07:10 -08:00
gitk-git
gitweb
mergetools
negotiator
perl
po l10n: zh_CN: for git v2.30.0 l10n round 1 and 2 2020-12-27 19:23:27 +08:00
ppc
refs refs/files-backend: don't peek into `struct lock_file` 2021-01-06 13:53:32 -08:00
sha1collisiondetection@855827c583
sha1dc
sha256
t p0071: test performance of llist_mergesort() 2021-10-01 12:43:09 -07:00
templates
trace2
vcs-svn
xdiff
.cirrus.yml
.clang-format
.editorconfig
.gitattributes
.gitignore
.gitmodules
.mailmap
.travis.yml
.tsan-suppressions
CODE_OF_CONDUCT.md
COPYING
GIT-VERSION-GEN Git 2.30.2 2021-02-12 15:51:13 +01:00
INSTALL doc: mention Python 3.x supports 2020-12-14 15:01:03 -08:00
LGPL-2.1
Makefile Merge branch 'js/skip-dashed-built-ins-from-config-mak' into maint 2021-02-05 16:31:28 -08:00
README.md
RelNotes Git 2.30.2 2021-02-12 15:51:13 +01:00
abspath.c
aclocal.m4
add-interactive.c
add-interactive.h
add-patch.c
advice.c
advice.h
alias.c
alias.h
alloc.c
alloc.h
apply.c Merge branch 'ab/unreachable-break' 2020-12-18 15:15:18 -08:00
apply.h
archive-tar.c
archive-zip.c
archive.c
archive.h
attr.c
attr.h
banned.h
base85.c
bisect.c
bisect.h
blame.c
blame.h
blob.c
blob.h
bloom.c
bloom.h
branch.c
branch.h
builtin.h
bulk-checkin.c
bulk-checkin.h
bundle.c
bundle.h
cache-tree.c
cache-tree.h
cache.h Sync with 2.29.3 2021-02-12 15:51:12 +01:00
chdir-notify.c
chdir-notify.h
check-builtins.sh
check_bindir
checkout.c
checkout.h
color.c
color.h
column.c
column.h
combine-diff.c
command-list.txt
commit-graph.c commit-graph: don't peek into `struct lock_file` 2021-01-06 13:53:32 -08:00
commit-graph.h
commit-reach.c
commit-reach.h
commit-slab-decl.h
commit-slab-impl.h
commit-slab.h
commit.c
commit.h
common-main.c
config.c
config.h
config.mak.dev
config.mak.in
config.mak.uname Merge branch 'rb/nonstop-config-mak-uname-update' 2020-12-18 15:15:18 -08:00
configure.ac
connect.c Merge branch 'jk/forbid-lf-in-git-url' into maint 2021-02-05 16:31:27 -08:00
connect.h
connected.c
connected.h
convert.c
convert.h
copy.c
credential.c
credential.h
csum-file.c
csum-file.h
ctype.c
daemon.c
date.c
decorate.c
decorate.h
delta-islands.c
delta-islands.h
delta.h
detect-compiler
diff-delta.c
diff-lib.c
diff-no-index.c
diff.c Merge branch 'jc/diff-I-status-fix' 2020-12-18 15:15:18 -08:00
diff.h
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c
diffcore.h
dir-iterator.c
dir-iterator.h
dir.c
dir.h
editor.c
entry.c
environment.c
exec-cmd.c
exec-cmd.h
fetch-negotiator.c
fetch-negotiator.h
fetch-pack.c
fetch-pack.h
fmt-merge-msg.c
fmt-merge-msg.h
fsck.c Merge branch 'jk/forbid-lf-in-git-url' into maint 2021-02-05 16:31:27 -08:00
fsck.h
fsmonitor.c
fsmonitor.h
fuzz-commit-graph.c
fuzz-pack-headers.c
fuzz-pack-idx.c
generate-cmdlist.sh
generate-configlist.sh
gettext.c gettext.c: remove/reword a mostly-useless comment 2021-01-11 13:07:33 -08:00
gettext.h
git-add--interactive.perl
git-archimport.perl
git-bisect.sh
git-compat-util.h Sync with 2.29.3 2021-02-12 15:51:12 +01:00
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-filter-branch.sh
git-instaweb.sh
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh Merge branch 'pb/mergetool-tool-help-fix' into maint 2021-02-05 16:31:24 -08:00
git-mergetool.sh
git-p4.py Merge branch 'dl/p4-encode-after-kw-expansion' into maint 2021-02-08 14:05:54 -08:00
git-quiltimport.sh
git-rebase--preserve-merges.sh
git-request-pull.sh
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-submodule.sh
git-svn.perl
git-web--browse.sh
git.c
git.rc
gpg-interface.c
gpg-interface.h
graph.c
graph.h
grep.c
grep.h
hash.h
hashmap.c
hashmap.h
help.c
help.h
hex.c
http-backend.c
http-fetch.c
http-push.c
http-walker.c
http.c
http.h
ident.c
imap-send.c
iterator.h
json-writer.c
json-writer.h
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
line-log.c
line-log.h
line-range.c
line-range.h
linear-assignment.c
linear-assignment.h
list-objects-filter-options.c
list-objects-filter-options.h
list-objects-filter.c
list-objects-filter.h
list-objects.c
list-objects.h
list.h
ll-merge.c
ll-merge.h
lockfile.c
lockfile.h
log-tree.c
log-tree.h
ls-refs.c
ls-refs.h
mailinfo.c
mailinfo.h
mailmap.c
mailmap.h
match-trees.c
mem-pool.c
mem-pool.h
merge-blobs.c
merge-blobs.h
merge-ort-wrappers.c
merge-ort-wrappers.h
merge-ort.c
merge-ort.h
merge-recursive.c
merge-recursive.h
merge.c
mergesort.c mergesort: use ranks stack 2021-10-01 12:43:09 -07:00
mergesort.h
midx.c midx: don't peek into `struct lock_file` 2021-01-06 13:53:32 -08:00
midx.h
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c
notes-merge.h
notes-utils.c
notes-utils.h
notes.c
notes.h
object-store.h
object.c
object.h
oid-array.c
oid-array.h
oidmap.c
oidmap.h
oidset.c
oidset.h
pack-bitmap-write.c
pack-bitmap.c
pack-bitmap.h
pack-check.c
pack-objects.c
pack-objects.h
pack-revindex.c
pack-revindex.h
pack-write.c
pack.h
packfile.c
packfile.h
pager.c
parse-options-cb.c
parse-options.c
parse-options.h parse-options: format argh like error messages 2021-01-06 15:10:27 -08:00
patch-delta.c
patch-ids.c Merge branch 'jk/log-cherry-pick-duplicate-patches' into maint 2021-02-05 16:31:28 -08:00
patch-ids.h patch-ids: handle duplicate hashmap entries 2021-01-12 11:13:32 -08:00
path.c
path.h
pathspec.c
pathspec.h
pkt-line.c
pkt-line.h
preload-index.c
pretty.c
pretty.h
prio-queue.c
prio-queue.h
progress.c
progress.h
promisor-remote.c
promisor-remote.h
prompt.c
prompt.h
protocol.c
protocol.h
prune-packed.c
prune-packed.h
quote.c
quote.h
range-diff.c
range-diff.h
reachable.c
reachable.h
read-cache.c read-cache: try not to peek into `struct {lock_,temp}file` 2021-01-06 13:53:32 -08:00
rebase-interactive.c
rebase-interactive.h
rebase.c
rebase.h
ref-filter.c branch: show "HEAD detached" first under reverse sort 2021-01-07 15:13:21 -08:00
ref-filter.h branch: sort detached HEAD based on a flag 2021-01-07 15:13:21 -08:00
reflog-walk.c
reflog-walk.h
refs.c
refs.h
refspec.c
refspec.h
remote-curl.c
remote.c Merge branch 'nk/refspecs-negative-fix' 2020-12-23 13:59:46 -08:00
remote.h
replace-object.c
replace-object.h
repo-settings.c
repository.c
repository.h
rerere.c
rerere.h
reset.c
reset.h
resolve-undo.c
resolve-undo.h
revision.c Merge branch 'jk/log-cherry-pick-duplicate-patches' into maint 2021-02-05 16:31:28 -08:00
revision.h
run-command.c Sync with 2.29.3 2021-02-12 15:51:12 +01:00
run-command.h
send-pack.c
send-pack.h
sequencer.c
sequencer.h
serve.c
serve.h
server-info.c
setup.c
sh-i18n--envsubst.c
sha1-file.c
sha1-lookup.c
sha1-lookup.h
sha1-name.c
sha1dc_git.c
sha1dc_git.h
shallow.c
shallow.h
shell.c
shortlog.h
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
stable-qsort.c
strbuf.c
strbuf.h
streaming.c
streaming.h
string-list.c
string-list.h
strmap.c
strmap.h strmap: make callers of strmap_remove() to call it in void context 2020-12-15 15:30:44 -08:00
strvec.c
strvec.h
sub-process.c
sub-process.h
submodule-config.c
submodule-config.h
submodule.c
submodule.h
symlinks.c Sync with 2.20.5 2021-02-12 15:49:35 +01:00
tag.c
tag.h
tar.h
tempfile.c
tempfile.h
thread-utils.c
thread-utils.h
tmp-objdir.c
tmp-objdir.h
trace.c
trace.h
trace2.c
trace2.h
trailer.c
trailer.h
transport-helper.c
transport-internal.h
transport.c
transport.h
tree-diff.c
tree-walk.c
tree-walk.h
tree.c
tree.h
unicode-width.h
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Sync with 2.28.1 2021-02-12 15:50:14 +01:00
unpack-trees.h
upload-pack.c Merge branch 'tb/partial-clone-filters-fix' 2020-12-17 15:06:40 -08:00
upload-pack.h
url.c
url.h
urlmatch.c
urlmatch.h
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c
walker.h
wildmatch.c
wildmatch.h
worktree.c
worktree.h
wrap-for-bin.sh
wrapper.c
write-or-die.c
ws.c
wt-status.c branch: sort detached HEAD based on a flag 2021-01-07 15:13:21 -08:00
wt-status.h branch: sort detached HEAD based on a flag 2021-01-07 15:13:21 -08:00
xdiff-interface.c
xdiff-interface.h
zlib.c

README.md


Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks