Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fail gracefully instead of crashing when attempting to write the
contents of a corrupt in-core index as a tree object.
* ps/cache-tree-w-broken-index-entry:
unpack-trees: detect mismatching number of cache-tree/index entries
cache-tree: detect mismatching number of index entries
cache-tree: refactor verification to return error codes
In t4058 we have some tests that exercise git-read-tree(1) when used
with a tree that contains duplicate entries. While the expectation is
that we fail, we ideally should fail gracefully without a segfault.
But that is not the case: we never check that the number of entries in
the cache-tree is less than or equal to the number of entries in the
index. This can lead to an out-of-bounds read as we unconditionally
access `istate->cache[idx]`, where `idx` is controlled by the number of
cache-tree entries and the current position therein. The result is a
segfault.
Fix this segfault by adding a sanity check for the number of index
entries before dereferencing them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `cache_tree_verify()` will `BUG()` whenever it finds that
the cache-tree extension of the index is corrupt. The function is thus
inherently untestable because the resulting call to `abort()` will be
detected by our testing framework and labelled an error. And rightfully
so: it shouldn't ever be possible to hit bugs, as they should indicate a
programming error rather than corruption of on-disk state.
Refactor the function to instead return error codes. This also ensures
that the function can be used e.g. by git-fsck(1) without the whole
process dying. Furthermore, this refactoring plugs some memory leaks
when returning early by creating a common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_git_dir()` function retrieves the path to the Git directory for
`the_repository`. Make it accept a `struct repository` such that it can
work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to
derive the hash function that shall be used. Require callers to pass in
the hash algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
"git merge-tree" has learned that the three trees involved in the
3-way merge only need to be trees, not necessarily commits.
* js/merge-tree-3-trees:
fill_tree_descriptor(): mark error message for translation
cache-tree: avoid an unnecessary check
Always check `parse_tree*()`'s return value
t4301: verify that merge-tree fails on missing blob objects
merge-ort: do check `parse_tree()`'s return value
merge-tree: fail with a non-zero exit code on missing tree objects
merge-tree: accept 3 trees as arguments
The first thing the `parse_tree()` function does is to return early if
the tree has already been parsed. Therefore we do not need to guard the
`parse_tree()` call behind a check of that flag.
As of time of writing, there are no other instances of this in Git's
code bases: whenever the `parsed` flag guards a `parse_tree()` call, it
guards more than just that call.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise we may easily run into serious crashes: For example, if we run
`init_tree_desc()` directly after a failed `parse_tree()`, we are
accessing uninitialized data or trying to dereference `NULL`.
Note that the `parse_tree()` function already takes care of showing an
error message. The `parse_tree_indirectly()` and
`repo_get_commit_tree()` functions do not, therefore those latter call
sites need to show a useful error message while the former do not.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.
Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.
Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.
Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To create the proper signatures for commit objects both versions of
the commit object need to be generated and signed. After that it is
a waste to throw away the work of generating the compatibility hash
so update write_object_file_flags to take a compatibility hash input
parameter that it can use to skip the work of generating the
compatability hash.
Update the places that don't generate the compatability hash to
pass NULL so it is easy to tell write_object_file_flags should
not attempt to use their compatability hash.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.
Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.
The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.
The change to change "get_index_format_default(the_repository)" in
"read-cache.c" to use the "r" variable instead should arguably have
been part of [1], or in the subsequent cleanup in [2]. Let's do it
here, as can be seen from the initial code in [3] it's not important
that we use "the_repository" there, but would prefer to always use the
current repository.
This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.
1. ee1f0c242e (read-cache: add index.skipHash config option,
2023-01-06)
2. 6269f8eaad (treewide: always have a valid "index_state.repo"
member, 2023-01-17)
3. 7211b9e753 (repo-settings: consolidate some config settings,
2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the part of "the_repository.pending.cocci" pertaining to
"promisor-remote.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the part of "the_repository.pending.cocci" pertaining to
"object-store.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since a64215b6cd ("object.h: stop depending on cache.h; make
cache.h depend on object.h", 2023-02-24), we have a few headers that
could have replaced their include of cache.h with an include of
object.h. Make that change now.
Some C files had to start including cache.h after this change (or some
smaller header it had brought in), because the C files were depending
on things from cache.h but were only formerly implicitly getting
cache.h through one of these headers being modified in this patch.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use size_t to store the original length of the strbuf tree_len, as
that's the correct type.
Don't double the allocated size of the strbuf when adding a subdirectory
name. And the chance of the trailing slash fitting in the slack left by
strbuf_add() is very high, so stop pre-growing the strbuf at all.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix typos in code comments which repeat various words. Most of the
cases are simple in that they repeat a word that usually cannot be
repeated in a grammatically correct sentence. Just remove the
incorrectly duplicated word in these cases and rewrap text, if needed.
A tricky case is usage of "that that", which is sometimes grammatically
correct. However, an instance of this in "t7527-builtin-fsmonitor.sh"
doesn't need two words "that", because there is only one daemon being
discussed, so replace the second "that" with "the".
Reword code comment "entries exist on on-disk index" in function
update_one in file cache-tree.c, by replacing incorrect preposition "on"
with "in".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function has never used its repository or cache_tree parameters
since it was introduced in 9ad2d5ea71 (sparse-index: loose integration
with cache_tree_verify(), 2021-03-30).
As that commit notes, it may eventually be extended further, and that
might require looking at more data. But we can easily add them back if
necessary (and the repository is even included in the index_state these
days already). In the mean time, dropping them makes the code shorter
and appeases -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts 080ab56a46 (cache-tree: implement cache_tree_find_path(),
2022-05-23). The cache_tree_find_path() method was never actually called
in the topic that added it. I cannot find any reference to it in any of
my forks, so this appears to not be needed at the moment.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.
* ab/bug-if-bug:
cache-tree.c: use bug() and BUG_if_bug()
receive-pack: use bug() and BUG_if_bug()
parse-options.c: use optbug() instead of BUG() "opts" check
parse-options.c: use new bug() API for optbug()
usage.c: add a non-fatal bug() function to go with BUG()
common-main.c: move non-trace2 exit() behavior out of trace2.c
"sparse-checkout" learns to work well with the sparse-index
feature.
* ds/sparse-sparse-checkout:
sparse-checkout: integrate with sparse index
p2000: add test for 'git sparse-checkout [add|set]'
sparse-index: complete partial expansion
sparse-index: partially expand directories
sparse-checkout: --no-sparse-index needs a full index
cache-tree: implement cache_tree_find_path()
sparse-index: introduce partially-sparse indexes
sparse-index: create expand_index()
t1092: stress test 'git sparse-checkout set'
t1092: refactor 'sparse-index contents' test
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.
* ns/batch-fsync:
core.fsyncmethod: performance tests for batch mode
t/perf: add iteration setup mechanism to perf-lib
core.fsyncmethod: tests for batch mode
test-lib-functions: add parsing helpers for ls-files and ls-tree
core.fsync: use batch mode and sync loose objects by default on Windows
unpack-objects: use the bulk-checkin infrastructure
update-index: use the bulk-checkin infrastructure
builtin/add: add ODB transaction around add_files_to_cache
cache-tree: use ODB transaction around writing a tree
core.fsyncmethod: batched disk flushes for loose-objects
bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
bulk-checkin: rename 'state' variable and separate 'plugged' boolean
Change "BUG" output originally added in a97e4075a1 (Keep
rename/rename conflicts of intermediate merges while doing recursive
merge, 2007-03-31), and later made to say it was a "BUG" in
19c6a4f836 (merge-recursive: do not return NULL only to cause
segfault, 2010-01-21) to use the new bug() function.
This gets the same job done with slightly less code, as we won't need
to prefix lines with "BUG: ". More importantly we'll now log the full
set of messages via trace2, before this we'd only log the one BUG()
invocation.
We don't replace the last "BUG()" invocation with "BUG_if_bug()", as
in this case we're sure that we called bug() earlier, so there's no
need to make it a conditional.
While we're at it let's replace "There" with "there" in the message,
i.e. not start a message with a capital letter, per the
CodingGuidelines.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.
The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way. The
method could be implemented simply using tail-recursion, but this while
loop does the same thing.
This new method is not currently used, but will be in an upcoming
change.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Take advantage of the odb transaction infrastructure around writing the
cached tree to the object database.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the hash_object_file() function to take an "enum
object_type".
Since a preceding commit all of its callers are passing either
"{commit,tree,blob,tag}_type", or the result of a call to type_name(),
the parse_object() caller that would pass NULL is now using
stream_object_signature().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the write_object_file() function to take an "enum object_type"
instead of a "const char *type". Its callers either passed
{commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type
instead, or were hardcoding strings like "blob".
This avoids the back & forth fragility where the callers of
write_object_file() would have the enum type, and convert it
themselves via type_name(). We do have to now do that conversion
ourselves before calling write_object_file_prepare(), but those
codepaths will be similarly adjusted in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various operating modes of "git reset" have been made to work
better with the sparse index.
* vd/sparse-reset:
unpack-trees: improve performance of next_cache_entry
reset: make --mixed sparse-aware
reset: make sparse-aware (except --mixed)
reset: integrate with sparse index
reset: expand test coverage for sparse checkouts
sparse-index: update command for expand/collapse test
reset: preserve skip-worktree bit in mixed reset
reset: rename is_missing to !is_in_reset_tree
Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Disable `command_requires_full_index` repo setting and add
`ensure_full_index` guards around code paths that cannot yet use sparse
directory index entries. `reset --soft` does not modify the index, so no
compatibility changes are needed for it to function without expanding the
index. For all other reset modes (`--mixed`, `--hard`, `--keep`, `--merge`),
the full index is expanded to prevent cache tree corruption and invalid
variable accesses.
Additionally, the `read_cache()` check verifying an uncorrupted index is
moved after argument parsing and preparing the repo settings. The index is
not used by the preceding argument handling, but `read_cache()` must be run
*after* enabling sparse index for the command (so that the index is not
expanded unnecessarily) and *before* using the index for reset (so that it
is verified as uncorrupted).
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent sparse-index addition, namely any use of index_name_pos(),
can expand sparse index entries and breaks any code that walks
cache-tree or existing index entries. One such instance of such a
breakage has been corrected.
* pw/sparse-cache-tree-verify-fix:
t1092: run "rebase --apply" without "-q" in testing
sparse index: fix use-after-free bug in cache_tree_verify()
Fix a regression in the error output emitted when .git/objects can't
be written to. Before 9c4d6c0297 (cache-tree: Write updated
cache-tree after commit, 2014-07-13) we'd emit only one "insufficient
permission" error, now we'll do so again.
The cause is rather straightforward, we've got WRITE_TREE_SILENT for
the use-case of wanting to prepare an index silently, quieting any
permission etc. error output. Then when we attempt to update to
that (possibly broken) index we'll run into the same errors again.
But with 9c4d6c0297 the gap between the cache-tree API and the object
store wasn't closed in terms of asking write_object_file() to be
silent. I.e. post-9c4d6c0297b the first call is to prepare_index(),
and after that we'll call prepare_to_commit(). We only want verbose
error output from the latter.
So let's add and use that facility with a corresponding HASH_SILENT
flag, its only user is cache-tree.c's update_one(), which will set it
if its "WRITE_TREE_SILENT" flag is set.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified. Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.
==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
#0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
#1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)
0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
#0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
#2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
#6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
#7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
#8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
previously allocated by thread T0 here:
#0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
#2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
#3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
#4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
#7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
#8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.
The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>