Add a new test-tool helper, name-hash, to output the value of the name-hash algorithms for the input list of strings, one per line.

Since the name-hash values can be stored in the .bitmap files, it is important that these hash functions do not change across Git versions. Add a simple test to t5310-pack-bitmaps.sh to provide some testing of the current values. Due to how these functions are implemented, it would be difficult to change them without disturbing these values. The paths used for this test are carefully selected to demonstrate some of the behavior differences of the two current name hash versions, including which conditions will cause them to collide.

Create a performance test that uses test_size to demonstrate how collisions occur for these hash algorithms. This test helps inform someone as to the behavior of the name-hash algorithms for their repo based on the paths at HEAD.

My copy of the Git repository shows modest statistics around the collisions of the default name-hash algorithm:

| Test | this tree |
|---|---|
| 5314.1: paths at head | 4.5K |
| 5314.2: distinct hash value: v1 | 4.1K |
| 5314.3: maximum multiplicity: v1 | 13 |
| 5314.4: distinct hash value: v2 | 4.2K |
| 5314.5: maximum multiplicity: v2 | 9 |

Here, the maximum collision multiplicity is 13, but around 10% of paths have a collision with another path.

In a more interesting example, the microsoft/fluentui [1] repo had these statistics at time of committing:

| Test | this tree |
|---|---|
| 5314.1: paths at head | 19.5K |
| 5314.2: distinct hash value: v1 | 8.2K |
| 5314.3: maximum multiplicity: v1 | 279 |
| 5314.4: distinct hash value: v2 | 17.8K |
| 5314.5: maximum multiplicity: v2 | 44 |

[1] https://github.com/microsoft/fluentui

That demonstrates that the nearly twenty thousand path names are assigned only around eight thousand distinct values. 279 paths are assigned to a single value, leading the packing algorithm to sort objects from those paths together, by size. With the v2 name hash function, the maximum multiplicity lowers to 44, leaving some room for further improvement.

In a more extreme example, an internal monorepo had a much worse collision rate:

| Test | this tree |
|---|---|
| 5314.1: paths at head | 227.3K |
| 5314.2: distinct hash value: v1 | 72.3K |
| 5314.3: maximum multiplicity: v1 | 14.4K |
| 5314.4: distinct hash value: v2 | 166.5K |
| 5314.5: maximum multiplicity: v2 | 138 |

Here, we can see that the v2 name hash function provides some improvements, but there are still a number of collisions that could lead to repacking problems at this scale.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
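
To make the "distinct hash value" and "maximum multiplicity" rows in the commit message concrete, here is a minimal C sketch. It assumes the default (v1) name hash behaves like the shift-and-add loop below, where bytes near the end of the path carry the most weight. The names `name_hash_v1`, `collect_stats`, and `struct hash_stats` are hypothetical and are not part of Git's API; the v2 hash is not modeled here.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Sketch modeled on the default (v1) name hash: every non-whitespace
 * byte is added at the top of the value while older bytes are shifted
 * down, so roughly the last sixteen characters decide the result.
 * Assumption: bytes are read as unsigned char.
 */
static uint32_t name_hash_v1(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Three-way comparison of two uint32_t values for qsort(). */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return (x > y) - (x < y);
}

struct hash_stats {
	size_t distinct;          /* number of distinct hash values */
	size_t max_multiplicity;  /* largest group of paths sharing one value */
};

/* Hash every path, sort the values, then count runs of equal values. */
static struct hash_stats collect_stats(const char **paths, size_t nr)
{
	struct hash_stats st = { 0, 0 };
	uint32_t *h = malloc(nr * sizeof(*h));
	size_t i, run = 0;

	if (!h)
		return st;
	for (i = 0; i < nr; i++)
		h[i] = name_hash_v1(paths[i]);
	qsort(h, nr, sizeof(*h), cmp_u32);
	for (i = 0; i < nr; i++) {
		if (i == 0 || h[i] != h[i - 1]) {
			st.distinct++;
			run = 0;
		}
		if (++run > st.max_multiplicity)
			st.max_multiplicity = run;
	}
	free(h);
	return st;
}
```

As a usage example with made-up paths: `packages/one/src/components/index.ts` and `packages/two/src/components/index.ts` differ only far from the end of the string, so under this sketch they hash to the same value, while `packages/one/src/components/index.js` differs in its final characters and gets a different value. Passing those three paths to `collect_stats` would therefore report two distinct values and a maximum multiplicity of two.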
Name | ||
---|---|---|
.github | ||
Documentation | ||
block-sha1 | ||
builtin | ||
ci | ||
compat | ||
contrib | ||
ewah | ||
git-gui | ||
gitk-git | ||
gitweb | ||
mergetools | ||
negotiator | ||
oss-fuzz | ||
perl | ||
po | ||
refs | ||
reftable | ||
sha1 | ||
sha1collisiondetection@855827c583 | ||
sha1dc | ||
sha256 | ||
t | ||
templates | ||
trace2 | ||
xdiff | ||
.cirrus.yml | ||
.clang-format | ||
.editorconfig | ||
.gitattributes | ||
.gitignore | ||
.gitlab-ci.yml | ||
.gitmodules | ||
.mailmap | ||
.tsan-suppressions | ||
CODE_OF_CONDUCT.md | ||
COPYING | ||
GIT-VERSION-GEN | ||
INSTALL | ||
LGPL-2.1 | ||
Makefile | ||
README.md | ||
RelNotes | ||
SECURITY.md | ||
abspath.c | ||
abspath.h | ||
aclocal.m4 | ||
add-interactive.c | ||
add-interactive.h | ||
add-patch.c | ||
advice.c | ||
advice.h | ||
alias.c | ||
alias.h | ||
alloc.c | ||
alloc.h | ||
apply.c | ||
apply.h | ||
archive-tar.c | ||
archive-zip.c | ||
archive.c | ||
archive.h | ||
attr.c | ||
attr.h | ||
banned.h | ||
base85.c | ||
base85.h | ||
bisect.c | ||
bisect.h | ||
blame.c | ||
blame.h | ||
blob.c | ||
blob.h | ||
bloom.c | ||
bloom.h | ||
branch.c | ||
branch.h | ||
builtin.h | ||
bulk-checkin.c | ||
bulk-checkin.h | ||
bundle-uri.c | ||
bundle-uri.h | ||
bundle.c | ||
bundle.h | ||
cache-tree.c | ||
cache-tree.h | ||
cbtree.c | ||
cbtree.h | ||
chdir-notify.c | ||
chdir-notify.h | ||
check-builtins.sh | ||
checkout.c | ||
checkout.h | ||
chunk-format.c | ||
chunk-format.h | ||
color.c | ||
color.h | ||
column.c | ||
column.h | ||
combine-diff.c | ||
command-list.txt | ||
commit-graph.c | ||
commit-graph.h | ||
commit-reach.c | ||
commit-reach.h | ||
commit-slab-decl.h | ||
commit-slab-impl.h | ||
commit-slab.h | ||
commit.c | ||
commit.h | ||
common-main.c | ||
config.c | ||
config.h | ||
config.mak.dev | ||
config.mak.in | ||
config.mak.uname | ||
configure.ac | ||
connect.c | ||
connect.h | ||
connected.c | ||
connected.h | ||
convert.c | ||
convert.h | ||
copy.c | ||
copy.h | ||
credential.c | ||
credential.h | ||
csum-file.c | ||
csum-file.h | ||
ctype.c | ||
daemon.c | ||
date.c | ||
date.h | ||
decorate.c | ||
decorate.h | ||
delta-islands.c | ||
delta-islands.h | ||
delta.h | ||
detect-compiler | ||
diagnose.c | ||
diagnose.h | ||
diff-delta.c | ||
diff-lib.c | ||
diff-merges.c | ||
diff-merges.h | ||
diff-no-index.c | ||
diff.c | ||
diff.h | ||
diffcore-break.c | ||
diffcore-delta.c | ||
diffcore-order.c | ||
diffcore-pickaxe.c | ||
diffcore-rename.c | ||
diffcore-rotate.c | ||
diffcore.h | ||
dir-iterator.c | ||
dir-iterator.h | ||
dir.c | ||
dir.h | ||
editor.c | ||
editor.h | ||
entry.c | ||
entry.h | ||
environment.c | ||
environment.h | ||
exec-cmd.c | ||
exec-cmd.h | ||
fetch-negotiator.c | ||
fetch-negotiator.h | ||
fetch-pack.c | ||
fetch-pack.h | ||
fmt-merge-msg.c | ||
fmt-merge-msg.h | ||
fsck.c | ||
fsck.h | ||
fsmonitor--daemon.h | ||
fsmonitor-ipc.c | ||
fsmonitor-ipc.h | ||
fsmonitor-ll.h | ||
fsmonitor-path-utils.h | ||
fsmonitor-settings.c | ||
fsmonitor-settings.h | ||
fsmonitor.c | ||
fsmonitor.h | ||
generate-cmdlist.sh | ||
generate-configlist.sh | ||
generate-hooklist.sh | ||
gettext.c | ||
gettext.h | ||
git-archimport.perl | ||
git-compat-util.h | ||
git-curl-compat.h | ||
git-cvsexportcommit.perl | ||
git-cvsimport.perl | ||
git-cvsserver.perl | ||
git-difftool--helper.sh | ||
git-filter-branch.sh | ||
git-instaweb.sh | ||
git-merge-octopus.sh | ||
git-merge-one-file.sh | ||
git-merge-resolve.sh | ||
git-mergetool--lib.sh | ||
git-mergetool.sh | ||
git-p4.py | ||
git-quiltimport.sh | ||
git-request-pull.sh | ||
git-send-email.perl | ||
git-sh-i18n.sh | ||
git-sh-setup.sh | ||
git-submodule.sh | ||
git-svn.perl | ||
git-web--browse.sh | ||
git-zlib.c | ||
git-zlib.h | ||
git.c | ||
git.rc | ||
gpg-interface.c | ||
gpg-interface.h | ||
graph.c | ||
graph.h | ||
grep.c | ||
grep.h | ||
hash-lookup.c | ||
hash-lookup.h | ||
hash.h | ||
hashmap.c | ||
hashmap.h | ||
help.c | ||
help.h | ||
hex-ll.c | ||
hex-ll.h | ||
hex.c | ||
hex.h | ||
hook.c | ||
hook.h | ||
http-backend.c | ||
http-fetch.c | ||
http-push.c | ||
http-walker.c | ||
http.c | ||
http.h | ||
ident.c | ||
ident.h | ||
imap-send.c | ||
iterator.h | ||
json-writer.c | ||
json-writer.h | ||
khash.h | ||
kwset.c | ||
kwset.h | ||
levenshtein.c | ||
levenshtein.h | ||
line-log.c | ||
line-log.h | ||
line-range.c | ||
line-range.h | ||
linear-assignment.c | ||
linear-assignment.h | ||
list-objects-filter-options.c | ||
list-objects-filter-options.h | ||
list-objects-filter.c | ||
list-objects-filter.h | ||
list-objects.c | ||
list-objects.h | ||
list.h | ||
lockfile.c | ||
lockfile.h | ||
log-tree.c | ||
log-tree.h | ||
loose.c | ||
loose.h | ||
ls-refs.c | ||
ls-refs.h | ||
mailinfo.c | ||
mailinfo.h | ||
mailmap.c | ||
mailmap.h | ||
match-trees.c | ||
match-trees.h | ||
mem-pool.c | ||
mem-pool.h | ||
merge-blobs.c | ||
merge-blobs.h | ||
merge-ll.c | ||
merge-ll.h | ||
merge-ort-wrappers.c | ||
merge-ort-wrappers.h | ||
merge-ort.c | ||
merge-ort.h | ||
merge-recursive.c | ||
merge-recursive.h | ||
merge.c | ||
merge.h | ||
mergesort.h | ||
midx-write.c | ||
midx.c | ||
midx.h | ||
name-hash.c | ||
name-hash.h | ||
notes-cache.c | ||
notes-cache.h | ||
notes-merge.c | ||
notes-merge.h | ||
notes-utils.c | ||
notes-utils.h | ||
notes.c | ||
notes.h | ||
object-file-convert.c | ||
object-file-convert.h | ||
object-file.c | ||
object-file.h | ||
object-name.c | ||
object-name.h | ||
object-store-ll.h | ||
object-store.h | ||
object.c | ||
object.h | ||
oid-array.c | ||
oid-array.h | ||
oidmap.c | ||
oidmap.h | ||
oidset.c | ||
oidset.h | ||
oidtree.c | ||
oidtree.h | ||
pack-bitmap-write.c | ||
pack-bitmap.c | ||
pack-bitmap.h | ||
pack-check.c | ||
pack-mtimes.c | ||
pack-mtimes.h | ||
pack-objects.c | ||
pack-objects.h | ||
pack-revindex.c | ||
pack-revindex.h | ||
pack-write.c | ||
pack.h | ||
packfile.c | ||
packfile.h | ||
pager.c | ||
pager.h | ||
parallel-checkout.c | ||
parallel-checkout.h | ||
parse-options-cb.c | ||
parse-options.c | ||
parse-options.h | ||
parse.c | ||
parse.h | ||
patch-delta.c | ||
patch-ids.c | ||
patch-ids.h | ||
path.c | ||
path.h | ||
pathspec.c | ||
pathspec.h | ||
pkt-line.c | ||
pkt-line.h | ||
preload-index.c | ||
preload-index.h | ||
pretty.c | ||
pretty.h | ||
prio-queue.c | ||
prio-queue.h | ||
progress.c | ||
progress.h | ||
promisor-remote.c | ||
promisor-remote.h | ||
prompt.c | ||
prompt.h | ||
protocol-caps.c | ||
protocol-caps.h | ||
protocol.c | ||
protocol.h | ||
prune-packed.c | ||
prune-packed.h | ||
pseudo-merge.c | ||
pseudo-merge.h | ||
quote.c | ||
quote.h | ||
range-diff.c | ||
range-diff.h | ||
reachable.c | ||
reachable.h | ||
read-cache-ll.h | ||
read-cache.c | ||
read-cache.h | ||
rebase-interactive.c | ||
rebase-interactive.h | ||
rebase.c | ||
rebase.h | ||
ref-filter.c | ||
ref-filter.h | ||
reflog-walk.c | ||
reflog-walk.h | ||
reflog.c | ||
reflog.h | ||
refs.c | ||
refs.h | ||
refspec.c | ||
refspec.h | ||
remote-curl.c | ||
remote.c | ||
remote.h | ||
replace-object.c | ||
replace-object.h | ||
repo-settings.c | ||
repo-settings.h | ||
repository.c | ||
repository.h | ||
rerere.c | ||
rerere.h | ||
reset.c | ||
reset.h | ||
resolve-undo.c | ||
resolve-undo.h | ||
revision.c | ||
revision.h | ||
run-command.c | ||
run-command.h | ||
sane-ctype.h | ||
scalar.c | ||
send-pack.c | ||
send-pack.h | ||
sequencer.c | ||
sequencer.h | ||
serve.c | ||
serve.h | ||
server-info.c | ||
server-info.h | ||
setup.c | ||
setup.h | ||
sh-i18n--envsubst.c | ||
sha1dc_git.c | ||
sha1dc_git.h | ||
shallow.c | ||
shallow.h | ||
shared.mak | ||
shell.c | ||
shortlog.h | ||
sideband.c | ||
sideband.h | ||
sigchain.c | ||
sigchain.h | ||
simple-ipc.h | ||
sparse-index.c | ||
sparse-index.h | ||
split-index.c | ||
split-index.h | ||
stable-qsort.c | ||
statinfo.c | ||
statinfo.h | ||
strbuf.c | ||
strbuf.h | ||
streaming.c | ||
streaming.h | ||
string-list.c | ||
string-list.h | ||
strmap.c | ||
strmap.h | ||
strvec.c | ||
strvec.h | ||
sub-process.c | ||
sub-process.h | ||
submodule-config.c | ||
submodule-config.h | ||
submodule.c | ||
submodule.h | ||
symlinks.c | ||
symlinks.h | ||
tag.c | ||
tag.h | ||
tar.h | ||
tempfile.c | ||
tempfile.h | ||
thread-utils.c | ||
thread-utils.h | ||
tmp-objdir.c | ||
tmp-objdir.h | ||
trace.c | ||
trace.h | ||
trace2.c | ||
trace2.h | ||
trailer.c | ||
trailer.h | ||
transport-helper.c | ||
transport-internal.h | ||
transport.c | ||
transport.h | ||
tree-diff.c | ||
tree-walk.c | ||
tree-walk.h | ||
tree.c | ||
tree.h | ||
unicode-width.h | ||
unimplemented.sh | ||
unix-socket.c | ||
unix-socket.h | ||
unix-stream-server.c | ||
unix-stream-server.h | ||
unpack-trees.c | ||
unpack-trees.h | ||
upload-pack.c | ||
upload-pack.h | ||
url.c | ||
url.h | ||
urlmatch.c | ||
urlmatch.h | ||
usage.c | ||
userdiff.c | ||
userdiff.h | ||
utf8.c | ||
utf8.h | ||
varint.c | ||
varint.h | ||
version.c | ||
version.h | ||
versioncmp.c | ||
versioncmp.h | ||
walker.c | ||
walker.h | ||
wildmatch.c | ||
wildmatch.h | ||
worktree.c | ||
worktree.h | ||
wrap-for-bin.sh | ||
wrapper.c | ||
wrapper.h | ||
write-or-die.c | ||
write-or-die.h | ||
ws.c | ||
ws.h | ||
wt-status.c | ||
wt-status.h | ||
xdiff-interface.c | ||
xdiff-interface.h |
README.md
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and `Documentation/git-<commandname>.txt` for documentation of each command. If git has been correctly installed, then the tutorial can also be read with `man gittutorial` or `git help tutorial`, and the documentation of each command with `man git-<commandname>` or `git help <commandname>`.
CVS users may also want to read Documentation/gitcvs-migration.txt (`man gitcvs-migration` or `git help cvs-migration` if git is installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).
Those wishing to help with error message, usage and informational message string translations (localization, l10n) should see po/README.md (a `po` file is a Portable Object file that holds the translations).
To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks