Go to file
Garima Singh f1294eaf7f bloom.c: introduce core Bloom filter constructs
Introduce the constructs for Bloom filters, Bloom filter keys
and Bloom filter settings.
For details on what Bloom filters are and how they work, refer
to Dr. Derrick Stolee's blog post [1]. It provides a concise
explanation of the adoption of Bloom filters as described in
[2] and [3].

Implementation specifics:
1. We currently use 7 and 10 for the number of hashes and the
   size of each entry respectively. They served as great starting
   values, the mathematical details behind this choice are
   described in [1] and [4]. The implementation, while not
   completely open to it at the moment, is flexible enough to allow
   for tweaking these settings in the future.

   Note: The performance gains we have observed with these values
   are significant enough that we did not need to tweak these
   settings. The performance numbers are included in the cover letter
   of this series and in the commit message of the subsequent commit
   where we use Bloom filters to speed up `git log -- path`.

2. As described in [1] and [3], we do not need 7 independent hashing
   functions. We use the Murmur3 hashing scheme, seed it twice and
   then combine those to procure an arbitrary number of hash values.

3. The filters will be sized according to the number of changes in
   each commit, in multiples of 8 bit words.

[1] Derrick Stolee
      "Supercharging the Git Commit Graph IV: Bloom Filters"
      https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/

[2] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese
    "An Improved Construction for Counting Bloom Filters"
    http://theory.stanford.edu/~rinap/papers/esa2006b.pdf
    https://doi.org/10.1007/11841036_61

[3] Peter C. Dillinger and Panagiotis Manolios
    "Bloom Filters in Probabilistic Verification"
    http://www.ccs.neu.edu/home/pete/pub/Bloom-filters-verification.pdf
    https://doi.org/10.1007/978-3-540-30494-4_26

[4] Thomas Mueller Graf, Daniel Lemire
    "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters"
    https://arxiv.org/abs/1912.08258

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 09:59:53 -07:00
.github
Documentation Merge branch 'ds/default-pack-use-sparse-to-true' 2020-03-29 09:32:51 -07:00
block-sha1
builtin Merge branch 'ds/default-pack-use-sparse-to-true' 2020-03-29 09:32:51 -07:00
ci Merge branch 'yz/p4-py3' 2020-03-25 13:57:43 -07:00
compat Merge branch 'js/mingw-open-in-gdb' into maint 2020-03-17 15:02:25 -07:00
contrib Merge branch 'kk/complete-diff-color-moved' 2020-03-09 11:21:20 -07:00
ewah Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
git-gui Merge https://github.com/prati0100/git-gui 2020-03-19 16:06:51 -07:00
gitk-git
gitweb
mergetools
negotiator
perl
po l10n: tr.po: change file mode to 644 2020-03-21 18:26:56 +08:00
ppc
refs
sha1collisiondetection@855827c583
sha1dc
sha256 hash: implement and use a context cloning function 2020-02-24 09:33:21 -08:00
t bloom.c: introduce core Bloom filter constructs 2020-03-30 09:59:53 -07:00
templates Merge branch 'kw/fsmonitor-watchman-racefix' 2020-02-14 12:54:20 -08:00
trace2
vcs-svn
xdiff
.cirrus.yml
.clang-format
.editorconfig
.gitattributes
.gitignore stash: remove the stash.useBuiltin setting 2020-03-05 12:50:28 -08:00
.gitmodules
.mailmap Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
.travis.yml
.tsan-suppressions
CODE_OF_CONDUCT.md
COPYING
GIT-VERSION-GEN The first batch post 2.26 cycle 2020-03-25 13:57:44 -07:00
INSTALL
LGPL-2.1
Makefile bloom.c: add the murmur3 hash implementation 2020-03-30 09:59:53 -07:00
README.md
RelNotes The first batch post 2.26 cycle 2020-03-25 13:57:44 -07:00
abspath.c real_path_if_valid(): remove unsafe API 2020-03-10 11:41:40 -07:00
aclocal.m4
add-interactive.c Merge branch 'js/builtin-add-i-cmds' into maint 2020-03-17 15:02:20 -07:00
add-interactive.h
add-patch.c
advice.c Merge branch 'hw/advise-ng' 2020-03-25 13:57:41 -07:00
advice.h Merge branch 'hw/advise-ng' 2020-03-25 13:57:41 -07:00
alias.c
alias.h
alloc.c
alloc.h
apply.c convert: permit passing additional metadata to filter processes 2020-03-16 11:37:02 -07:00
apply.h
archive-tar.c
archive-zip.c
archive.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
archive.h convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
argv-array.c
argv-array.h
attr.c
attr.h
azure-pipelines.yml Azure Pipeline: switch to the latest agent pools 2020-02-27 09:58:43 -08:00
banned.h
base85.c
bisect.c bisect: libify `bisect_next_all` 2020-02-19 09:37:15 -08:00
bisect.h bisect: libify `bisect_next_all` 2020-02-19 09:37:15 -08:00
blame.c
blame.h blame: provide type of fingerprints pointer 2020-02-24 12:08:48 -08:00
blob.c
blob.h
bloom.c bloom.c: introduce core Bloom filter constructs 2020-03-30 09:59:53 -07:00
bloom.h bloom.c: introduce core Bloom filter constructs 2020-03-30 09:59:53 -07:00
branch.c
branch.h
builtin.h
bulk-checkin.c
bulk-checkin.h
bundle.c
bundle.h
cache-tree.c
cache-tree.h
cache.h Merge branch 'bc/filter-process' 2020-03-26 17:11:20 -07:00
chdir-notify.c
chdir-notify.h
check-builtins.sh
check_bindir
checkout.c
checkout.h
color.c color.c: alias RGB colors 8-15 to aixterm colors 2020-02-11 11:19:00 -08:00
color.h
column.c
column.h
combine-diff.c
command-list.txt
commit-graph.c commit-graph: define and use MAX_NUM_CHUNKS 2020-03-30 09:59:52 -07:00
commit-graph.h
commit-reach.c
commit-reach.h
commit-slab-decl.h
commit-slab-impl.h
commit-slab.h commit-slab: clarify slabname##_peek()'s return value 2020-03-10 11:44:24 -07:00
commit.c Merge branch 'at/rebase-fork-point-regression-fix' 2020-03-26 17:11:21 -07:00
commit.h
common-main.c
config.c Merge branch 'bw/remote-rename-update-config' 2020-02-25 11:18:32 -08:00
config.h config: provide access to the current line number 2020-02-10 10:52:10 -08:00
config.mak.dev Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
config.mak.in
config.mak.uname
configure.ac
connect.c
connect.h
connected.c connected.c: reprepare packs for corner cases 2020-03-15 15:39:00 -07:00
connected.h
convert.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
convert.h convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
copy.c
credential-cache--daemon.c
credential-cache.c
credential-store.c
credential.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
credential.h Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
csum-file.c hash: implement and use a context cloning function 2020-02-24 09:33:21 -08:00
csum-file.h
ctype.c
daemon.c
date.c
decorate.c
decorate.h
delta-islands.c
delta-islands.h
delta.h
detect-compiler
diff-delta.c
diff-lib.c
diff-no-index.c
diff.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
diff.h
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c
diffcore.h
dir-iterator.c
dir-iterator.h
dir.c Merge branch 'ds/sparse-add' 2020-03-05 10:43:02 -08:00
dir.h
editor.c real_path: remove unsafe API 2020-03-10 11:41:40 -07:00
entry.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
environment.c real_path: remove unsafe API 2020-03-10 11:41:40 -07:00
exec-cmd.c
exec-cmd.h
fast-import.c fast-import: add options for rewriting submodules 2020-02-28 09:53:41 -08:00
fetch-negotiator.c
fetch-negotiator.h
fetch-pack.c
fetch-pack.h
fmt-merge-msg.h
fsck.c
fsck.h
fsmonitor.c
fsmonitor.h
fuzz-commit-graph.c
fuzz-pack-headers.c
fuzz-pack-idx.c
generate-cmdlist.sh
gettext.c
gettext.h
git-add--interactive.perl
git-archimport.perl
git-bisect.sh
git-compat-util.h
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-filter-branch.sh
git-instaweb.sh
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py Merge branch 'yz/p4-py3' 2020-03-25 13:57:43 -07:00
git-parse-remote.sh
git-quiltimport.sh
git-rebase--preserve-merges.sh
git-request-pull.sh
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-submodule.sh Merge branch 'es/recursive-single-branch-clone' 2020-03-05 10:43:03 -08:00
git-svn.perl
git-web--browse.sh
git.c stash: remove the stash.useBuiltin setting 2020-03-05 12:50:28 -08:00
git.rc
gpg-interface.c gpg-interface: prefer check_signature() for GPG verification 2020-03-15 09:46:28 -07:00
gpg-interface.h gpg-interface: prefer check_signature() for GPG verification 2020-03-15 09:46:28 -07:00
graph.c
graph.h
grep.c
grep.h
hash.h hash: implement and use a context cloning function 2020-02-24 09:33:21 -08:00
hashmap.c
hashmap.h
help.c
help.h
hex.c hex: add functions to parse hex object IDs in any algorithm 2020-02-24 09:33:21 -08:00
http-backend.c
http-fetch.c
http-push.c
http-walker.c
http.c Merge branch 'js/https-proxy-config' 2020-03-25 13:57:42 -07:00
http.h
ident.c
imap-send.c
interdiff.c
interdiff.h
iterator.h
json-writer.c
json-writer.h
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
line-log.c
line-log.h
line-range.c
line-range.h
linear-assignment.c
linear-assignment.h
list-objects-filter-options.c
list-objects-filter-options.h
list-objects-filter.c
list-objects-filter.h
list-objects.c
list-objects.h
list.h
ll-merge.c
ll-merge.h
lockfile.c
lockfile.h
log-tree.c Merge branch 'hi/gpg-prefer-check-signature' 2020-03-26 17:11:20 -07:00
log-tree.h
ls-refs.c
ls-refs.h
mailinfo.c Merge branch 'rs/micro-cleanups' 2020-03-02 15:07:20 -08:00
mailinfo.h
mailmap.c
mailmap.h
match-trees.c
mem-pool.c
mem-pool.h
merge-blobs.c
merge-blobs.h
merge-recursive.c Merge branch 'en/t3433-rebase-stat-dirty-failure' into maint 2020-03-17 15:02:23 -07:00
merge-recursive.h
merge.c builtin/checkout: compute checkout metadata for checkouts 2020-03-16 11:37:02 -07:00
mergesort.c
mergesort.h
midx.c nth_packed_object_oid(): use customary integer return 2020-02-24 12:55:42 -08:00
midx.h
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c
notes-merge.h
notes-utils.c strbuf: add and use strbuf_insertstr() 2020-02-10 09:04:45 -08:00
notes-utils.h
notes.c Merge branch 'jh/notes-fanout-fix' into maint 2020-03-17 15:02:22 -07:00
notes.h
object-store.h packed_object_info(): use object_id for returning delta base 2020-02-24 12:55:53 -08:00
object.c Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
object.h pack-bitmap: fix leak of haves/wants object lists 2020-02-13 09:08:58 -08:00
oidmap.c
oidmap.h
oidset.c
oidset.h Merge branch 'en/oidset-uninclude-hashmap' 2020-03-25 13:57:44 -07:00
pack-bitmap-write.c
pack-bitmap.c Merge branch 'jk/nth-packed-object-id' 2020-03-05 10:43:03 -08:00
pack-bitmap.h Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
pack-check.c pack-check: push oid lookup into loop 2020-02-24 12:55:53 -08:00
pack-objects.c pack-objects: convert oe_set_delta_ext() to use object_id 2020-02-24 12:55:52 -08:00
pack-objects.h pack-objects: convert oe_set_delta_ext() to use object_id 2020-02-24 12:55:52 -08:00
pack-revindex.c
pack-revindex.h
pack-write.c
pack.h
packfile.c packfile: drop nth_packed_object_sha1() 2020-02-24 12:55:53 -08:00
packfile.h packfile: drop nth_packed_object_sha1() 2020-02-24 12:55:53 -08:00
pager.c
parse-options-cb.c parse-options: simplify parse_options_dup() 2020-02-10 09:45:49 -08:00
parse-options.c Merge branch 'pb/am-show-current-patch' 2020-03-09 11:21:19 -07:00
parse-options.h Merge branch 'pb/am-show-current-patch' 2020-03-09 11:21:19 -07:00
patch-delta.c
patch-ids.c
patch-ids.h
path.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
path.h
pathspec.c prefix_path: show gitdir if worktree unavailable 2020-03-15 09:35:46 -07:00
pathspec.h
pkt-line.c
pkt-line.h
preload-index.c
pretty.c Merge branch 'rs/strbuf-insertstr' 2020-02-17 13:22:17 -08:00
pretty.h
prio-queue.c
prio-queue.h
progress.c
progress.h
promisor-remote.c
promisor-remote.h
prompt.c
prompt.h
protocol.c
protocol.h
quote.c quote: use isalnum() to check for alphanumeric characters 2020-02-24 09:30:29 -08:00
quote.h
range-diff.c
range-diff.h
reachable.c pack-bitmap: basic noop bitmap filter infrastructure 2020-02-14 10:46:22 -08:00
reachable.h
read-cache.c
rebase-interactive.c Merge branch 'rt/format-zero-length-fix' 2020-03-09 11:21:21 -07:00
rebase-interactive.h Merge branch 'en/rebase-backend' 2020-03-02 15:07:19 -08:00
rebase.c pull --rebase/remote rename: document and honor single-letter abbreviations rebase types 2020-02-10 10:52:10 -08:00
rebase.h pull --rebase/remote rename: document and honor single-letter abbreviations rebase types 2020-02-10 10:52:10 -08:00
ref-filter.c Merge branch 'dr/push-remote-ref-update' 2020-03-11 10:58:16 -07:00
ref-filter.h
reflog-walk.c
reflog-walk.h
refs.c
refs.h
refspec.c
refspec.h
remote-curl.c Merge branch 'rs/show-progress-in-dumb-http-fetch' 2020-03-09 11:21:21 -07:00
remote-testsvn.c
remote.c remote: drop "explicit" parameter from remote_ref_for_branch() 2020-03-03 14:56:05 -08:00
remote.h remote: drop "explicit" parameter from remote_ref_for_branch() 2020-03-03 14:56:05 -08:00
replace-object.c
replace-object.h
repo-settings.c config: set pack.useSparse=true by default 2020-03-20 14:22:31 -07:00
repository.c repository: require a build flag to use SHA-256 2020-02-24 09:33:21 -08:00
repository.h
rerere.c
rerere.h
resolve-undo.c
resolve-undo.h
revision.c
revision.h
run-command.c Merge branch 'bc/run-command-nullness-after-free-fix' into maint 2020-02-14 12:42:27 -08:00
run-command.h run-command.h: fix mis-indented struct member 2020-02-22 09:05:34 -08:00
send-pack.c
send-pack.h
sequencer.c Merge branch 'bc/filter-process' 2020-03-26 17:11:20 -07:00
sequencer.h Merge branch 'pw/advise-rebase-skip' 2020-03-25 13:57:43 -07:00
serve.c
serve.h
server-info.c
setup.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
sh-i18n--envsubst.c
sha1-array.c
sha1-array.h
sha1-file.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
sha1-lookup.c
sha1-lookup.h
sha1-name.c nth_packed_object_oid(): use customary integer return 2020-02-24 12:55:42 -08:00
sha1dc_git.c
sha1dc_git.h
shallow.c
shell.c
shortlog.h
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
stable-qsort.c
strbuf.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
strbuf.h Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
streaming.c
streaming.h
string-list.c
string-list.h Merge branch 'en/string-list-can-be-custom-sorted' into maint 2020-02-14 12:42:27 -08:00
sub-process.c
sub-process.h
submodule-config.c Merge branch 'mr/show-config-scope' 2020-02-17 13:22:17 -08:00
submodule-config.h
submodule.c Merge branch 'dt/submodule-rm-with-stale-cache' into maint 2020-03-17 15:02:21 -07:00
submodule.h get_superproject_working_tree(): return strbuf 2020-03-10 11:41:40 -07:00
symlinks.c
tag.c
tag.h
tar.h
tempfile.c
tempfile.h
thread-utils.c
thread-utils.h
tmp-objdir.c
tmp-objdir.h
trace.c
trace.h
trace2.c
trace2.h
trailer.c
trailer.h
transport-helper.c
transport-internal.h
transport.c Merge branch 'jk/no-flush-upon-disconnecting-slrpc-transport' into maint 2020-02-14 12:42:28 -08:00
transport.h
tree-diff.c
tree-walk.c
tree-walk.h
tree.c
tree.h
unicode-width.h unicode: update the width tables to Unicode 13.0 2020-03-17 15:06:37 -07:00
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Merge branch 'bc/filter-process' 2020-03-26 17:11:20 -07:00
unpack-trees.h builtin/checkout: compute checkout metadata for checkouts 2020-03-16 11:37:02 -07:00
upload-pack.c config: split repo scope to local and worktree 2020-02-10 10:32:20 -08:00
upload-pack.h
url.c
url.h
urlmatch.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
urlmatch.h credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c Merge branch 'rs/show-progress-in-dumb-http-fetch' 2020-03-09 11:21:21 -07:00
walker.h remote-curl: show progress for fetches over dumb HTTP 2020-03-03 13:15:40 -08:00
wildmatch.c
wildmatch.h
worktree.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
worktree.h worktree: add utility to find worktree by pathname 2020-02-24 13:04:30 -08:00
wrap-for-bin.sh
wrapper.c
write-or-die.c
ws.c
wt-status.c
wt-status.h
xdiff-interface.c
xdiff-interface.h
zlib.c

README.md

Build Status

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks