Jeff King c9e3a4e76d patch-ids: handle duplicate hashmap entries
This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a
hand-rolled hashmap implementation, 2016-07-29) in which

  git rev-list --cherry-pick A...B

will fail to suppress commits reachable from A even if a commit with
matching patch-id appears in B.

Around the time of that commit, the algorithm for "--cherry-pick" looked
something like this:

  0. Traverse all of the commits, marking them as being on the left or
     right side of the symmetric difference.

  1. Iterate over the left-hand commits, inserting a patch-id struct for
     each into a hashmap, and pointing commit->util to the patch-id
     struct.

  2. Iterate over the right-hand commits, checking which are present in
     the hashmap. If so, we exclude the commit from the output _and_ we
     mark the patch-id as "seen".

  3. Iterate again over the left-hand commits, checking whether
     commit->util->seen is set; if so, exclude them from the output.

At the end, we'll have eliminated commits from both sides that have a
matching patch-id on the other side. But there's a subtle assumption
here: for any given patch-id, we must have exactly one struct
representing it. If two commits from A both have the same patch-id and
we allow duplicates in the hashmap, then we run into a problem:

  a. In step 1, we insert two patch-id structs into the hashmap.

  b. In step 2, our lookups will find only one of these structs, so only
     one "seen" flag is marked.

  c. In step 3, one of the commits in A will have its commit->util->seen
     set, but the other will not. We'll erroneously output the latter.

Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards,
it used hashmap_add(), which explicitly does allow duplicates.

At that point, the solution would have been easy: when we are about to
add a duplicate, skip doing so and return the existing entry which
matches. But it gets more complicated.

In 683f17ec44 (patch-ids: replace the seen indicator with a commit
pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2,
when the right-hand side finds a matching patch_id from the left-hand
side, we can directly mark the left-hand patch_id->commit to be omitted.
Solving that would be easy, too; there's a one-to-many relationship of
patch-ids to commits, so we just need to keep a list.

But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary
patch IDs, 2016-07-29) built on that by lazily computing the full
patch-ids. So we don't even know when adding to the hashmap whether two
commits truly have the same id. We'd have to tentatively assign them a
list, and then possibly split them apart (possibly into N new structs)
at the moment we compute the real patch-ids. This could work, but it's
complicated and error-prone.

Instead, let's accept that we may store duplicates, and teach the lookup
side to be more clever. Rather than asking for a single matching
patch-id, it will need to iterate over all matching patch-ids. This does
mean examining every entry in a single hash bucket, but the worst-case
for a hash lookup was already doing that.

We'll keep the hashmap details out of the caller by providing a simple
iteration interface. We can retain the simple has_commit_patch_id()
interface for the other callers, but we'll simplify its return value
into an integer, rather than returning the patch_id struct. That way
they won't be tempted to look at the "commit" field of the return value
without iterating.
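
As a rough illustration of the approach described above, here is a minimal sketch of an iteration interface over a chained hash table that tolerates duplicates. Everything named `toy_*` is hypothetical, as are the integer ids and the `commit_excluded` flag standing in for marking the entry's commit; this is not the code in patch-ids.c, and it skips the lazy patch-id computation entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUCKETS 4
#define TOY_MAX 16

struct toy_entry {
	int id;                 /* stands in for the patch-id */
	int commit_excluded;    /* stands in for marking entry->commit */
	struct toy_entry *next; /* next entry in the same bucket */
};

struct toy_map {
	struct toy_entry *buckets[TOY_BUCKETS];
	struct toy_entry pool[TOY_MAX];
	size_t nr;
};

static size_t toy_bucket(int id)
{
	return (size_t)((unsigned)id % TOY_BUCKETS);
}

/* Add unconditionally: duplicates of the same id are kept. */
static struct toy_entry *toy_add(struct toy_map *m, int id)
{
	struct toy_entry *e;

	assert(m->nr < TOY_MAX);
	e = &m->pool[m->nr++];
	e->id = id;
	e->commit_excluded = 0;
	e->next = m->buckets[toy_bucket(id)];
	m->buckets[toy_bucket(id)] = e;
	return e;
}

/* Walk a bucket chain until an entry with this id (or the end). */
static struct toy_entry *toy_scan(struct toy_entry *e, int id)
{
	while (e && e->id != id)
		e = e->next;
	return e;
}

/* First entry matching id, or NULL. */
static struct toy_entry *toy_iter_first(struct toy_map *m, int id)
{
	return toy_scan(m->buckets[toy_bucket(id)], id);
}

/* Next entry with the same id as cur, or NULL. */
static struct toy_entry *toy_iter_next(struct toy_entry *cur)
{
	return toy_scan(cur->next, cur->id);
}

/* Simple yes/no query: returns an integer, never the entry itself. */
static int toy_has_patch_id(struct toy_map *m, int id)
{
	return toy_iter_first(m, id) != NULL;
}

/* Mark every entry matching id; return how many were marked. */
static size_t toy_mark_matches(struct toy_map *m, int id)
{
	size_t n = 0;

	for (struct toy_entry *e = toy_iter_first(m, id); e;
	     e = toy_iter_next(e)) {
		e->commit_excluded = 1;
		n++;
	}
	return n;
}
```

In this sketch the caller that needs to touch commits goes through toy_iter_first() and toy_iter_next(), so all duplicates get marked, while callers that only need a yes/no answer use toy_has_patch_id() and never see an entry they might misuse.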

Reported-by: Arnaud Morin <arnaud.morin@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12 11:13:32 -08:00
.github Merge branch 'js/ci-ghwf-dedup-tests' 2020-10-08 21:53:26 -07:00
Documentation Git 2.29.2 2020-10-29 14:24:09 -07:00
block-sha1
builtin Merge branch 'jk/committer-date-is-author-date-fix' into maint 2020-10-29 14:18:47 -07:00
ci ci: do not skip tagged revisions in GitHub workflows 2020-10-08 11:58:41 -07:00
compat compat/mingw.h: drop extern from function declaration 2020-10-07 09:55:20 -07:00
contrib Merge branch 'js/cmake-vs' 2020-10-05 14:01:52 -07:00
ewah
git-gui Merge https://github.com/prati0100/git-gui 2020-10-17 13:10:58 -07:00
gitk-git Merge remote-tracking branch 'paulus/master' into pm/gitk-update 2020-10-03 10:06:27 -07:00
gitweb
mergetools Merge branch 'ls/mergetool-meld-auto-merge' 2020-09-22 12:36:29 -07:00
negotiator
perl
po Merge branch 'master' of github.com:Softcatala/git-po 2020-10-18 09:56:33 +08:00
ppc
refs Merge branch 'hn/refs-trace-backend' 2020-09-22 12:36:28 -07:00
sha1collisiondetection@855827c583
sha1dc
sha256
t patch-ids: handle duplicate hashmap entries 2021-01-12 11:13:32 -08:00
templates hooks--update.sample: use hash-agnostic zero OID 2020-09-23 09:31:45 -07:00
trace2
vcs-svn
xdiff
.cirrus.yml
.clang-format
.editorconfig
.gitattributes
.gitignore Merge branch 'js/cmake-vs' 2020-10-05 14:01:52 -07:00
.gitmodules
.mailmap
.travis.yml
.tsan-suppressions
CODE_OF_CONDUCT.md
COPYING
GIT-VERSION-GEN Git 2.29.2 2020-10-29 14:24:09 -07:00
INSTALL
LGPL-2.1
Makefile Merge branch 'js/no-builtins-on-disk-option' into maint 2020-10-22 15:01:22 -07:00
README.md
RelNotes Git 2.29.2 2020-10-29 14:24:09 -07:00
abspath.c
aclocal.m4
add-interactive.c
add-interactive.h
add-patch.c Merge branch 'pw/add-p-edit-ita-path' 2020-09-22 12:36:28 -07:00
advice.c
advice.h
alias.c
alias.h
alloc.c
alloc.h
apply.c Merge branch 'jk/leakfix' 2020-08-27 14:04:49 -07:00
apply.h
archive-tar.c archive: read short blobs in archive.c::write_archive_entry() 2020-09-19 15:56:05 -07:00
archive-zip.c archive: read short blobs in archive.c::write_archive_entry() 2020-09-19 15:56:05 -07:00
archive.c archive: add --add-file 2020-09-19 15:56:06 -07:00
archive.h archive: add --add-file 2020-09-19 15:56:06 -07:00
attr.c
attr.h
banned.h
base85.c
bisect.c bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C 2020-09-24 12:06:30 -07:00
bisect.h
blame.c Merge branch 'tb/bloom-improvements' 2020-09-29 14:01:20 -07:00
blame.h
blob.c
blob.h
bloom.c builtin/commit-graph.c: introduce '--max-new-filters=<n>' 2020-09-18 10:35:39 -07:00
bloom.h bloom: encode out-of-bounds filters as non-empty 2020-09-17 21:55:50 -07:00
branch.c wt-status: tolerate dangling marks 2020-09-02 14:39:25 -07:00
branch.h
builtin.h Merge branch 'ds/maintenance-part-1' 2020-09-25 15:25:38 -07:00
bulk-checkin.c
bulk-checkin.h
bundle.c Merge branch 'jt/interpret-branch-name-fallback' 2020-09-09 13:53:09 -07:00
bundle.h
cache-tree.c
cache-tree.h
cache.h builtin/clone: avoid failure with GIT_DEFAULT_HASH 2020-09-22 09:22:32 -07:00
chdir-notify.c
chdir-notify.h
check-builtins.sh
check_bindir
checkout.c
checkout.h
color.c
color.h
column.c
column.h
combine-diff.c Merge branch 'jk/diff-cc-oidfind-fix' 2020-10-05 14:01:55 -07:00
command-list.txt maintenance: create basic maintenance runner 2020-09-17 11:30:04 -07:00
commit-graph.c Merge branch 'tb/bloom-improvements' 2020-09-29 14:01:20 -07:00
commit-graph.h Merge branch 'tb/bloom-improvements' 2020-09-29 14:01:20 -07:00
commit-reach.c commit-reach: fix in_merge_bases_many bug 2020-10-02 10:26:31 -07:00
commit-reach.h
commit-slab-decl.h
commit-slab-impl.h
commit-slab.h
commit.c Merge branch 'jt/interpret-branch-name-fallback' 2020-09-09 13:53:09 -07:00
commit.h drop unused argc parameters 2020-09-30 12:53:47 -07:00
common-main.c
config.c Merge branch 'jk/leakfix' 2020-08-27 14:04:49 -07:00
config.h
config.mak.dev
config.mak.in
config.mak.uname
configure.ac
connect.c Merge branch 'jk/leakfix' 2020-08-27 14:04:49 -07:00
connect.h
connected.c Merge branch 'rs/more-buffered-io' 2020-08-24 14:54:31 -07:00
connected.h
convert.c convert: drop unused crlf_action from check_global_conv_flags_eol() 2020-09-30 12:53:47 -07:00
convert.h
copy.c
credential.c credential: treat CR/LF as line endings in the credential protocol 2020-10-03 10:41:03 -07:00
credential.h
csum-file.c
csum-file.h
ctype.c
daemon.c
date.c
decorate.c
decorate.h
delta-islands.c
delta-islands.h
delta.h
detect-compiler
diff-delta.c
diff-lib.c Merge branch 'so/combine-diff-simplify' 2020-10-05 14:01:51 -07:00
diff-no-index.c
diff.c diff: fix modified lines stats with --stat and --numstat 2020-09-24 12:31:45 -07:00
diff.h Merge branch 'so/combine-diff-simplify' 2020-10-05 14:01:51 -07:00
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c
diffcore.h
dir-iterator.c
dir-iterator.h
dir.c dir.c: drop unused "untracked" from treat_path_fast() 2020-09-30 12:53:48 -07:00
dir.h
editor.c
entry.c
environment.c Merge branch 'jk/leakfix' 2020-08-27 14:04:49 -07:00
exec-cmd.c
exec-cmd.h
fetch-negotiator.c
fetch-negotiator.h
fetch-pack.c Merge branch 'jt/lazy-fetch' 2020-09-03 12:37:04 -07:00
fetch-pack.h Merge branch 'jt/lazy-fetch' 2020-09-03 12:37:04 -07:00
fmt-merge-msg.c
fmt-merge-msg.h
fsck.c
fsck.h
fsmonitor.c
fsmonitor.h
fuzz-commit-graph.c commit-graph: pass a 'struct repository *' in more places 2020-09-09 12:51:48 -07:00
fuzz-pack-headers.c
fuzz-pack-idx.c
generate-cmdlist.sh Fit to Plan 9's ANSI/POSIX compatibility layer 2020-09-09 22:31:31 -07:00
generate-configlist.sh
gettext.c
gettext.h
git-add--interactive.perl Merge branch 'pw/add-p-edit-ita-path' 2020-09-22 12:36:28 -07:00
git-archimport.perl
git-bisect.sh Merge branch 'mr/bisect-in-c-2' 2020-10-04 12:49:08 -07:00
git-compat-util.h
git-cvsexportcommit.perl cvsexportcommit: do not run git programs in dashed form 2020-08-26 14:49:52 -07:00
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-filter-branch.sh
git-instaweb.sh
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py git-p4: use HEAD~$n to find parent commit for unshelve 2020-09-19 13:44:55 -07:00
git-parse-remote.sh
git-quiltimport.sh
git-rebase--preserve-merges.sh
git-request-pull.sh
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-submodule.sh Merge branch 'td/submodule-update-quiet' 2020-10-05 14:01:53 -07:00
git-svn.perl
git-web--browse.sh
git.c Merge branch 'js/no-builtins-on-disk-option' 2020-10-08 21:53:26 -07:00
git.rc
gpg-interface.c
gpg-interface.h
graph.c
graph.h
grep.c
grep.h
hash.h
hashmap.c
hashmap.h hashmap_for_each_entry(): workaround MSVC's runtime check failure #3 2020-09-30 13:26:54 -07:00
help.c help: do not expect built-in commands to be hardlinked 2020-10-07 15:25:10 -07:00
help.h help: do not expect built-in commands to be hardlinked 2020-10-07 15:25:10 -07:00
hex.c
http-backend.c
http-fetch.c
http-push.c
http-walker.c
http.c
http.h
ident.c Merge branch 'pw/rebase-i-more-options' 2020-09-03 12:37:01 -07:00
imap-send.c
iterator.h
json-writer.c
json-writer.h
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
line-log.c Merge branch 'tb/bloom-improvements' 2020-09-29 14:01:20 -07:00
line-log.h
line-range.c
line-range.h
linear-assignment.c
linear-assignment.h
list-objects-filter-options.c fetch: do not override partial clone filter 2020-09-28 16:11:59 -07:00
list-objects-filter-options.h
list-objects-filter.c
list-objects-filter.h
list-objects.c
list-objects.h
list.h
ll-merge.c
ll-merge.h
lockfile.c
lockfile.h
log-tree.c Merge branch 'so/combine-diff-simplify' 2020-10-05 14:01:51 -07:00
log-tree.h
ls-refs.c
ls-refs.h
mailinfo.c
mailinfo.h
mailmap.c
mailmap.h
match-trees.c
mem-pool.c
mem-pool.h
merge-blobs.c
merge-blobs.h
merge-recursive.c
merge-recursive.h
merge.c
mergesort.c
mergesort.h
midx.c Merge branch 'rs/misc-cleanups' 2020-09-18 17:58:00 -07:00
midx.h
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c
notes-merge.h
notes-utils.c
notes-utils.h
notes.c
notes.h
object-store.h
object.c
object.h maintenance: add auto condition for commit-graph task 2020-09-17 11:30:05 -07:00
oid-array.c
oid-array.h
oidmap.c
oidmap.h
oidset.c blame: validate and peel the object names on the ignore list 2020-09-24 22:20:58 -07:00
oidset.h blame: validate and peel the object names on the ignore list 2020-09-24 22:20:58 -07:00
pack-bitmap-write.c pack-bitmap-write: use hashwrite_be32() in write_hash_cache() 2020-09-06 13:40:41 -07:00
pack-bitmap.c
pack-bitmap.h
pack-check.c
pack-objects.c
pack-objects.h
pack-revindex.c
pack-revindex.h
pack-write.c pack-write: use hashwrite_be32() in write_idx_file() 2020-09-19 12:15:36 -07:00
pack.h
packfile.c Merge branch 'mt/delta-base-cache-races' 2020-10-04 12:49:15 -07:00
packfile.h midx: traverse the local MIDX first 2020-08-28 14:07:09 -07:00
pager.c
parse-options-cb.c assert PARSE_OPT_NONEG in parse-options callbacks 2020-09-30 12:53:47 -07:00
parse-options.c
parse-options.h
patch-delta.c
patch-ids.c patch-ids: handle duplicate hashmap entries 2021-01-12 11:13:32 -08:00
patch-ids.h patch-ids: handle duplicate hashmap entries 2021-01-12 11:13:32 -08:00
path.c
path.h
pathspec.c
pathspec.h
pkt-line.c
pkt-line.h
preload-index.c
pretty.c pretty: refactor `format_sanitized_subject()` 2020-08-28 13:52:51 -07:00
pretty.h pretty: refactor `format_sanitized_subject()` 2020-08-28 13:52:51 -07:00
prio-queue.c
prio-queue.h
progress.c
progress.h
promisor-remote.c promisor-remote: remove unused variable 2020-09-21 22:32:49 -07:00
promisor-remote.h promisor-remote: remove unused variable 2020-09-21 22:32:49 -07:00
prompt.c
prompt.h
protocol.c protocol: re-enable v2 protocol by default 2020-09-25 11:40:42 -07:00
protocol.h
prune-packed.c
prune-packed.h
quote.c quote: turn 'nodq' parameter into a set of flags 2020-09-10 13:08:07 -07:00
quote.h quote: turn 'nodq' parameter into a set of flags 2020-09-10 13:08:07 -07:00
range-diff.c
range-diff.h
reachable.c
reachable.h
read-cache.c read-cache: fix mem-pool allocation for multi-threaded index loading 2020-09-06 12:34:12 -07:00
rebase-interactive.c
rebase-interactive.h
rebase.c
rebase.h
ref-filter.c Merge branch 'ma/worktree-cleanups' 2020-10-05 14:01:52 -07:00
ref-filter.h ref-filter: make internal reachable-filter API more precise 2020-09-18 15:41:55 -07:00
reflog-walk.c
reflog-walk.h
refs.c Merge branch 'hn/refs-trace-backend' 2020-09-22 12:36:28 -07:00
refs.h Merge branch 'jt/interpret-branch-name-fallback' 2020-09-09 13:53:09 -07:00
refspec.c Merge branch 'jk/refspecs-negative' 2020-10-05 14:01:54 -07:00
refspec.h Merge branch 'jk/refspecs-negative' 2020-10-05 14:01:54 -07:00
remote-curl.c Merge branch 'jt/lazy-fetch' 2020-09-03 12:37:04 -07:00
remote.c Merge branch 'jk/refspecs-negative' 2020-10-05 14:01:54 -07:00
remote.h Merge branch 'jk/refspecs-negative' 2020-10-05 14:01:54 -07:00
replace-object.c
replace-object.h
repo-settings.c Merge branch 'tb/bloom-improvements' 2020-09-29 14:01:20 -07:00
repository.c
repository.h Merge branch 'tb/bloom-improvements' 2020-09-29 14:01:20 -07:00
rerere.c
rerere.h
reset.c
reset.h
resolve-undo.c
resolve-undo.h
revision.c patch-ids: handle duplicate hashmap entries 2021-01-12 11:13:32 -08:00
revision.h revision: add separate field for "-m" of "diff-index -m" 2020-08-31 13:42:58 -07:00
run-command.c maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
run-command.h maintenance: replace run_auto_gc() 2020-09-17 11:30:05 -07:00
send-pack.c Merge branch 'hx/push-atomic-with-cert' 2020-09-25 15:25:41 -07:00
send-pack.h
sequencer.c Merge branch 'jk/committer-date-is-author-date-fix' into maint 2020-10-29 14:18:47 -07:00
sequencer.h Merge branch 'pw/rebase-i-more-options' 2020-09-03 12:37:01 -07:00
serve.c
serve.h
server-info.c
setup.c
sh-i18n--envsubst.c
sha1-file.c
sha1-lookup.c
sha1-lookup.h
sha1-name.c wt-status: tolerate dangling marks 2020-09-02 14:39:25 -07:00
sha1dc_git.c
sha1dc_git.h
shallow.c
shallow.h
shell.c
shortlog.h shortlog: allow multiple groups to be specified 2020-09-27 12:21:05 -07:00
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
stable-qsort.c
strbuf.c
strbuf.h
streaming.c
streaming.h
string-list.c
string-list.h
strvec.c
strvec.h
sub-process.c
sub-process.h
submodule-config.c
submodule-config.h
submodule.c Merge branch 'so/combine-diff-simplify' 2020-10-05 14:01:51 -07:00
submodule.h
symlinks.c
tag.c
tag.h
tar.h
tempfile.c
tempfile.h
thread-utils.c
thread-utils.h
tmp-objdir.c
tmp-objdir.h
trace.c
trace.h
trace2.c
trace2.h
trailer.c Merge branch 'jk/shortlog-group-by-trailer' 2020-10-04 12:49:14 -07:00
trailer.h trailer: add interface for iterating over commit trailers 2020-09-27 12:21:05 -07:00
transport-helper.c Merge branch 'jx/proc-receive-hook' 2020-09-25 15:25:39 -07:00
transport-internal.h
transport.c Merge branch 'jx/proc-receive-hook' 2020-09-25 15:25:39 -07:00
transport.h Merge branch 'jt/lazy-fetch' 2020-09-03 12:37:04 -07:00
tree-diff.c bloom/diff: properly short-circuit on max_changes 2020-09-17 09:31:25 -07:00
tree-walk.c
tree-walk.h
tree.c
tree.h
unicode-width.h
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c
unpack-trees.h
upload-pack.c Merge branch 'rs/more-buffered-io' 2020-08-24 14:54:31 -07:00
upload-pack.h
url.c
url.h
urlmatch.c
urlmatch.h
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c
walker.h
wildmatch.c
wildmatch.h
worktree.c Merge branch 'ma/worktree-cleanups' 2020-10-05 14:01:52 -07:00
worktree.h Merge branch 'ma/worktree-cleanups' 2020-10-05 14:01:52 -07:00
wrap-for-bin.sh
wrapper.c xrealloc: do not reuse pointer freed by zero-length realloc() 2020-09-02 12:18:14 -07:00
write-or-die.c
ws.c
wt-status.c Merge branch 'ma/worktree-cleanups' 2020-10-05 14:01:52 -07:00
wt-status.h wt-status: introduce wt_status_state_free_buffers() 2020-09-27 14:21:47 -07:00
xdiff-interface.c
xdiff-interface.h
zlib.c

README.md

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussions following them give a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks