For diff family commands, we can tell them to exclude changes outside
of some directories if --relative is requested.
In diff_unmerge(), NULL will be returned if the requested path is
outside of the interesting directories, thus we'll run into NULL
pointer dereference in run_diff_files when trying to dereference
its return value.
Checking for return value of diff_unmerge before dereferencing
is not sufficient, though. Since, diff engine will try to work on such
pathspec later.
Let's not run diff on those unintesting entries, instead.
As a side effect, by skipping like that, we can save some CPU cycles.
Reported-by: Thomas De Zeeuw <thomas@slight.dev>
Tested-by: Carlo Arenas <carenas@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.
Use diff_tree_oid() appropriately to compute the diff.
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Up until recently, object IDs did not have an algorithm member, only a
hash. Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms. Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.
Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Validate that fsmonitor is valid to futureproof against bugs where
check_removed might be called from places that haven't refreshed.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git to honor fsmonitor rather than issuing an lstat
when checking for dirty local deletes. Eliminates O(files)
lstats during `git diff HEAD`
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
do_diff_cache() builds a struct rev_info to hand to diff_cache() from
scratch by initializing it using repo_init_revisions() and then
replacing its diffopt and prune_data members.
The diffopt member is initialized to a heap-allocated list of options,
though. Release it using diff_setup_done() before overwriting it.
The initial value of the prune_data member doesn't need to be released,
but the copy created using copy_pathspec() does. Clear it after use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" and other commands that share the same machinery to
compare with working tree files have been taught to take advantage
of the fsmonitor data when available.
* nk/diff-files-vs-fsmonitor:
p7519-fsmonitor: add a git add benchmark
p7519-fsmonitor: refactor to avoid code duplication
perf lint: add make test-lint to perf tests
t/perf: add fsmonitor perf test for git diff
t/perf/p7519-fsmonitor.sh: warm cache on first git status
t/perf/README: elaborate on output format
fsmonitor: use fsmonitor data in `git diff`
With fsmonitor enabled, the first call to match_stat_with_submodule
calls refresh_fsmonitor, incurring the overhead of reading the list of
updated files -- but run_diff_files does not respect the
CE_FSMONITOR_VALID flag.
Make use of the fsmonitor extension to skip lstat() calls on files
that fsmonitor judged as unmodified.
Notably, this change improves performance of the git shell prompt when
GIT_PS1_SHOWDIRTYSTATE is set.
Signed-off-by: Alex Vandiver <alexmv@dropbox.com>
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of 'dense' argument that is redundant for every function that has
'struct rev_info *rev' argument as well, as the value of 'dense' passed is
always taken from 'rev->dense_combined_merges' field.
The only place where this was not the case is in 'submodule.c' where
'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However,
at that call the 'revs' instance used is local to the function, and we now just
set 'revs->dense_combined_merges' to 1 in this local instance.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"format-patch --range-diff=<prev> <origin>..HEAD" has been taught
not to ignore <origin> when <prev> is a single version.
* es/format-patch-interdiff-cleanup:
format-patch: use 'origin' as start of current-series-range when known
diff-lib: tighten show_interdiff()'s interface
diff: move show_interdiff() from its own file to diff-lib
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.
Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will be using this function to implement
--merge-base functionality in various diff commands.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will teach run_diff_index() to accept more
options via flag bits. For now, change `cached` into a flag in the
`option` bitfield. The behaviour should remain exactly the same.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To compute and show an interdiff, show_interdiff() needs only the two
OID's to compare and a diffopts, yet it expects callers to supply an
entire rev_info. The demand for rev_info is not only overkill, but also
places unnecessary burden on potential future callers which might not
otherwise have a rev_info at hand. Address this by tightening its
signature to require only the items it needs instead of a full rev_info.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show_interdiff() is a relatively small function and not likely to grow
larger or more complicated. Rather than dedicating an entire source file
to it, relocate it to diff-lib.c which houses other "take two things and
compare them" functions meant to be re-used but not so low-level as to
reside in the core diff implementation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add separate 'match_missing' field for diff-index to use and set it when we
encounter "-m" option. This field won't then be cleared when another meaning of
"-m" is reverted (e.g., by "--no-diff-merges"), nor it will be affected by
future option(s) that might drive 'ignore_merges' field.
Use this new field from diff-lib:do_oneway_diff() instead of reusing
'ignore_merges' field.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating "new file" diffs against i-t-a index entries, diff-lib
erroneously used the mode of the cache entry rather than the mode of the
file in the worktree. This changes run_diff_files() to correctly use the
mode of the worktree file in this case.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documented behavior of `git diff-files --raw` is to display
[...] 0{40} if creation, unmerged or "look at work tree".
on the right hand (i.e. postimage) side. This happens for files that
have unstaged modifications, and for files that are unmodified but
stat-dirty.
For intent-to-add files, we used to show the empty blob's hash instead.
In c26022ea8f (diff: convert diff_addremove to struct object_id,
2017-05-30), we made that worse by inadvertently changing that to the
hash of the empty tree.
Let's make the behavior consistent with files that have unstaged
modifications (which applies to intent-to-add files, too) by showing
all-zero values also for intent-to-add files.
Accordingly, this patch adjusts the expectations set by the regression
test introduced in feea6946a5 (diff-files: treat "i-t-a" files as
"not-in-index", 2020-06-20).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this bug fix, t7519's four "status doesn't detect unreported
modifications" test cases would fail occasionally (and, oddly enough,
*a lot* more frequently on Windows).
The reason is that these test cases intentionally use the side effect of
`git status` to re-write the index if any updates were detected: they
first clean the worktree, run `git status` to update the index as well
as show the output to the casual reader, then make the worktree dirty
again and expect no changes to reported if running with a mocked
fsmonitor hook.
The problem with this strategy was that the index was written during
said `git status` on the clean worktree for the *wrong* reason: not
because the index was marked as changed (it wasn't), but because the
recorded mtimes were racy with the index' own mtime.
As the mtime granularity on Windows is 100 nanoseconds (see e.g.
https://docs.microsoft.com/en-us/windows/desktop/SysInfo/file-times),
the mtimes of the files are often enough *not* racy with the index', so
that that `git status` call currently does not always update the index
(including the fsmonitor extension), causing the test case to fail.
The obvious fix: if we change *any* index entry's `CE_FSMONITOR_VALID`
flag, we should also mark the index as changed. That will cause the
index to be written upon `git status`, *including* an updated fsmonitor
extension.
Side note: Even though the reader might think that the t7519 issue
should be *much* more prevalent on Linux, given that the ext4 filesystem
(that seems to be used by every Linux distribution) stores mtimes in
nanosecond precision. However, ext4 uses `current_kernel_time()` (see
https://unix.stackexchange.com/questions/11599#comment762968_11599; it
is *amazingly* hard to find any proper source of information about such
ext4 questions) whose accuracy seems to depend on many factors but is
safely worse than the 100-nanosecond granularity of NTFS (again, it is
*horribly* hard to find anything remotely authoritative about this
question). So it seems that the racy index condition that hid the bug
fixed by this patch simply is a lot more likely on Linux than on
Windows. But not impossible ;-)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sole purpose of this function is to fix the sorting order of the
queued diff entries. It doesn't need to know about any diff options, so
we can drop the unused parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".
* nd/the-index: (23 commits)
revision.c: reduce implicit dependency the_repository
revision.c: remove implicit dependency on the_index
ws.c: remove implicit dependency on the_index
tree-diff.c: remove implicit dependency on the_index
submodule.c: remove implicit dependency on the_index
line-range.c: remove implicit dependency on the_index
userdiff.c: remove implicit dependency on the_index
rerere.c: remove implicit dependency on the_index
sha1-file.c: remove implicit dependency on the_index
patch-ids.c: remove implicit dependency on the_index
merge.c: remove implicit dependency on the_index
merge-blobs.c: remove implicit dependency on the_index
ll-merge.c: remove implicit dependency on the_index
diff-lib.c: remove implicit dependency on the_index
read-cache.c: remove implicit dependency on the_index
diff.c: remove implicit dependency on the_index
grep.c: remove implicit dependency on the_index
diff.c: remove the_index dependency in textconv() functions
blame.c: rename "repo" argument to "r"
combine-diff.c: remove implicit dependency on the_index
...
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.
* jk/cocci:
show_dirstat: simplify same-content check
read-cache: use oideq() in ce_compare functions
convert hashmap comparison functions to oideq()
convert "hashcmp() != 0" to "!hasheq()"
convert "oidcmp() != 0" to "!oideq()"
convert "hashcmp() == 0" to hasheq()
convert "oidcmp() == 0" to oideq()
introduce hasheq() and oideq()
coccinelle: use <...> for function exclusion
The unpack_trees() API used in checking out a branch and merging
walks one or more trees along with the index. When the cache-tree
in the index tells us that we are walking a tree whose flattened
contents is known (i.e. matches a span in the index), as linearly
scanning a span in the index is much more efficient than having to
open tree objects recursively and listing their entries, the walk
can be optimized, which is done in this topic.
* nd/unpack-trees-with-cache-tree:
Document update for nd/unpack-trees-with-cache-tree
cache-tree: verify valid cache-tree in the test suite
unpack-trees: add missing cache invalidation
unpack-trees: reuse (still valid) cache-tree from src_index
unpack-trees: reduce malloc in cache-tree walk
unpack-trees: optimize walking same trees with cache-tree
unpack-trees: add performance tracing
trace.h: support nested performance tracing
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:
if (oidcmp(E1, E2))
As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.
There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance measurements are listed right now as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them could be just part of a bigger
measurement and a flat list gives a wrong impression that they are
executed at the same level instead of nested.
Add trace_performance_enter() and trace_performance_leave() to allow
indent these nested measurements. For now it does not help much
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). This will help more
because I'm going to add some more tracing that's actually nested.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the match_patchspec API and friends take an index_state instead
of assuming the_index in dir.c. All external call sites are converted
blindly to keep the patch simple and retain current behavior.
Individual call sites may receive further updates to use the right
index instead of the_index.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pull --rebase" on a corrupt HEAD caused a segfault. In
general we substitute an empty tree object when running the in-core
equivalent of the diff-index command, and the codepath has been
corrected to do so as well to fix this issue.
* jk/has-uncommitted-changes-fix:
has_uncommitted_changes(): fall back to empty tree
If has_uncommitted_changes() can't resolve HEAD (e.g.,
because it's unborn or corrupt), then we end up calling
run_diff_index() with an empty revs.pending array. This
causes a segfault, as run_diff_index() blindly looks at the
first pending item.
Fixing this raises a question of fault: should
run_diff_index() handle this case, or is the caller wrong to
pass an empty pending list?
Looking at the other callers of run_diff_index(), they
handle this in one of three ways:
- they resolve the object themselves, and avoid doing the
diff if it's not valid
- they resolve the object themselves, and fall back to the
empty tree
- they use setup_revisions(), which will die() if the
object isn't valid
Since this is the only broken caller, that argues that the
fix should go there. Falling back to the empty tree makes
sense here, as we'd claim uncommitted changes if and only if
the index is non-empty. This may be a little funny in the
case of corruption (the corrupt HEAD probably _isn't_
empty), but:
- we don't actually know the reason here that HEAD didn't
resolve (the much more likely case is that we have an
unborn HEAD, in which case the empty tree comparison is
the right thing)
- this matches how other code, like "git diff", behaves
While we're thinking about it, let's add an assertion to
run_diff_index(). It should always be passed a single
object, and as this bug shows, it's easy to get it wrong
(and an assertion is easier to hunt down than a segfault, or
a quietly ignored extra tree).
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option is supposed to fix the diff of "diff-files" (not reporting
ita entries as new files) and "diff-index --cached <tree>" (showing ita
entries as present in the index with empty content) but not
"diff-index <tree>".
When --ita-invisible-in-index is set on "git diff-index <tree>",
unpack_trees() will eventually call oneway_diff() on the ita entry
with the same code flow as "diff-index --cached <tree>". We want to
ignore the ita entry for "diff-index --cached <tree>" but not
"diff-index <tree>" since the latter will examine and produce a diff
based on worktree entry's (real) content, not ita index entry's
(empty) content.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An infrastructure to define what hash function is used in Git is
introduced, and an effort to plumb that throughout various
codepaths has been started.
* bc/hash-algo:
repository: fix a sparse 'using integer as NULL pointer' warning
Switch empty tree and blob lookups to use hash abstraction
Integrate hash algorithm support with repo setup
Add structure representing hash algorithm
setup: expose enumerated repo info
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
Switch the uses of empty_tree_oid and empty_blob_oid to use the
current_hash abstraction that represents the current hash algorithm in
use.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A single-word "unsigned flags" in the diff options is being split
into a structure with many bitfields.
* bw/diff-opt-impl-to-bitfields:
diff: make struct diff_flags members lowercase
diff: remove DIFF_OPT_CLR macro
diff: remove DIFF_OPT_SET macro
diff: remove DIFF_OPT_TST macro
diff: remove touched flags
diff: add flag to indicate textconv was set via cmdline
diff: convert flags to be stored in bitfields
add, reset: use DIFF_OPT_SET macro to set a diff flag
Remove the `DIFF_OPT_SET` macro and instead set the flags directly.
This conversion is done using the following semantic patch:
@@
expression E;
identifier fld;
@@
- DIFF_OPT_SET(&E, fld)
+ E.flags.fld = 1
@@
type T;
T *ptr;
identifier fld;
@@
- DIFF_OPT_SET(ptr, fld)
+ ptr->flags.fld = 1
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:
@@
expression E;
identifier fld;
@@
- DIFF_OPT_TST(&E, fld)
+ E.flags.fld
@@
type T;
T *ptr;
identifier fld;
@@
- DIFF_OPT_TST(ptr, fld)
+ ptr->flags.fld
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We cannot add many more flags to the diff machinery due to the
limitations of the number of flags that can be stored in a single
unsigned int. In order to allow for more flags to be added to the diff
machinery in the future this patch converts the flags to be stored in
bitfields in 'struct diff_flags'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.
We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.
In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.
To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.
Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of freeing `foo.objects` for an object array `foo` (sometimes
conditionally), call `object_array_clear(&foo)`. This means we don't
poke as much into the implementation, which is already a good thing, but
also that we release the individual entries as well, thereby fixing at
least one memory-leak (in diff-lib.c).
If someone is holding on to a pointer to an element's `name` or `path`,
that is now a dangling pointer, i.e., we'd be turning an unpleasant
situation into an outright bug. To the best of my understanding no such
long-term pointers are being taken.
The way we handle `study` in builting/reflog.c still looks like it might
leak. That will be addressed in the next commit.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our code often opens a path to an optional file, to work on its
contents when we can successfully open it. We can ignore a failure
to open if such an optional file does not exist, but we do want to
report a failure in opening for other reasons (e.g. we got an I/O
error, or the file is there, but we lack the permission to open).
The exact errors we need to ignore are ENOENT (obviously) and
ENOTDIR (less obvious). Instead of repeating comparison of errno
with these two constants, introduce a helper function to do so.
* jc/noent-notdir:
treewide: use is_missing_file_error() where ENOENT and ENOTDIR are checked
compat-util: is_missing_file_error()
Convert diff_change to take a struct object_id. In addition convert the
function pointer type 'change_fn_t' to also take a struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert diff_addremove to take a struct object_id. In addtion convert
the function pointer type 'add_remove_fn_t' to also take a struct
object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the is_missing_file_error() helper introduced in the previous
step, update all hits from
$ git grep -e ENOENT --and -e ENOTDIR
There are codepaths that only check ENOENT, and it is possible that
some of them should be checking both. Updating them is kept out of
this step deliberately, as we do not want to change behaviour in this
step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert parse_tree_indirect to take a pointer to struct object_id.
Update all the callers. This transformation was achieved using the
following semantic patch and manual updates to the declaration and
definition. Update builtin/checkout.c manually as well, since it uses a
ternary expression not handled by the semantic patch.
@@
expression E1;
@@
- parse_tree_indirect(E1.hash)
+ parse_tree_indirect(&E1)
@@
expression E1;
@@
- parse_tree_indirect(E1->hash)
+ parse_tree_indirect(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed to convert parse_tree_indirect.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If i-t-a entries are present and there is no change between the index
and HEAD i-t-a entries, index_differs_from() still returns "dirty, new
entries" (aka, the resulting commit is not empty), but cache-tree will
skip i-t-a entries and produce the exact same tree of current
commit.
index_differs_from() is supposed to catch this so we can abort
git-commit (unless --no-empty is specified). Update it to optionally
ignore i-t-a entries when doing a diff between the index and HEAD so
that it would return "no change" in this case and abort commit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When comparing the index and the working tree to show which paths are
new, and comparing the tree recorded in the HEAD and the index to see if
committing the contents recorded in the index would result in an empty
commit, we would want the former comparison to say "these are new paths"
and the latter to say "there is no change" for paths that are marked as
intent-to-add.
We made a similar attempt at d95d728a ("diff-lib.c: adjust position of
i-t-a entries in diff", 2015-03-16), which redefined the semantics of
these two comparison modes globally, which was a disaster and had to be
reverted at 78cc1a54 ("Revert "diff-lib.c: adjust position of i-t-a
entries in diff"", 2015-06-23).
To make sure we do not repeat the same mistake, introduce a new internal
diffopt option so that this different semantics can be asked for only by
callers that ask it, while making sure other unaudited callers will get
the same comparison result.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
This reverts commit d95d728aba.
It turns out that many other commands that need to interact with the
result of running diff-files and diff-index, e.g. "git apply", "git
rm", etc., need to be adjusted to the new world order it brings in.
For example, it would break this sequence to correct a whitespace
breakage in the parts you changed:
git add -N file
git diff --cached file | git apply --cached --whitespace=fix
git checkout file
In the old world order, "diff" showed a patch to modify an existing
empty file by adding its full contents, and "apply" updated the
index by modifying the existing empty blob (which is what an
Intent-to-Add entry records in the index) with that patch.
In the new world order, "diff" shows a patch to create a new file
with its full contents, but because "apply" thinks that the i-t-a
entry already exists in the index, it refused to accept a creation.
Adjusting "apply" to this new world order is easy, but we need to
assess the extent of the damage to the rest of the system the new
world order brought in before going forward and adjust them all,
after which we can resurrect the commit being reverted here.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After "git add -N", the path appeared in output of "git diff HEAD"
and "git diff --cached HEAD", leading "git status" to classify it
as "Changes to be committed". Such a path, however, is not yet to
be scheduled to be committed. "git diff" showed the change to the
path as modification, not as a "new file", in the header of its
output.
Treat such paths as "yet to be added to the index but Git already
know about them"; "git diff HEAD" and "git diff --cached HEAD"
should not talk about them, and "git diff" should show them as new
files yet to be added to the index.
* nd/diff-i-t-a:
diff-lib.c: adjust position of i-t-a entries in diff
Identify parts of the code that knows that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.
* bc/object-id:
apply: convert threeway_stage to object_id
patch-id: convert to use struct object_id
commit: convert parts to struct object_id
diff: convert struct combine_diff_path to object_id
bulk-checkin.c: convert to use struct object_id
zip: use GIT_SHA1_HEXSZ for trailers
archive.c: convert to use struct object_id
bisect.c: convert leaf functions to use struct object_id
define utility functions for object IDs
define a structure for object IDs
Entries added by "git add -N" are reminder for the user so that they
don't forget to add them before committing. These entries appear in
the index even though they are not real. Their presence in the index
leads to a confusing "git status" like this:
On branch master
Changes to be committed:
new file: foo
Changes not staged for commit:
modified: foo
If you do a "git commit", "foo" will not be included even though
"status" reports it as "to be committed". This patch changes the
output to become
On branch master
Changes not staged for commit:
new file: foo
no changes added to commit
The two hunks in diff-lib.c adjust "diff-index" and "diff-files" so
that i-t-a entries appear as new files in diff-files and nothing in
diff-index.
Due to this change, diff-files may start to report "new files" for the
first time. "add -u" needs to be told about this or it will die in
denial, screaming "new files can't exist! Reality is wrong." Luckily,
it's the only one among run_diff_files() callers that needs fixing.
Now in the new world order, a hierarchy in the index that contain
i-t-a paths is written out as a tree object as if these i-t-a
entries do not exist, and comparing the index with such a tree
object that would result from writing out the hierarchy will result
in no difference. Update a test in t2203 that expected the i-t-a
entries to appear as "added to the index" in the comparison to
instead expect no output.
An earlier change eec3e7e4 (cache-tree: invalidate i-t-a paths after
generating trees, 2012-12-16) becomes an unnecessary pessimization
in the new world order---a cache-tree in the index that corresponds
to a hierarchy with i-t-a paths can now be marked as valid and
record the object name of the tree that results from writing a tree
object out of that hierarchy, as it will compare equal to that tree.
Reverting the commit is left for the future, though, as it is purely
a performance issue and no longer affects correctness.
Helped-by: Michael J Gruber <git@drmicha.warpmail.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also, convert a constant to GIT_SHA1_HEXSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the need to have duplicated "if there is a change then feed
null_sha1 and otherwise sha1 from the cache entry" for the "new"
side of the diff by introducing two temporary variables to point
at the object name of the old and the new side of the blobs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we try to diff an index entry marked CE_VALID (because it
was marked with --assume-unchanged), we do not bother even
running stat() on the file to see if it was removed. This
started long ago with 540e694 (Prevent diff machinery from
examining assume-unchanged entries on worktree, 2009-08-11).
However, the subsequent code may look at our "struct stat"
and expect to find actual data; currently it will find
whatever cruft was left on the stack. This can cause
problems in two situations:
1. We call match_stat_with_submodule with the stat data,
so a submodule may be erroneously marked as changed.
2. If --find-copies-harder is in effect, we pass all
entries, even unchanged ones, to diff_change, so it can
list them as rename/copy sources. Since we found no
change, we assume that function will realize it and not
actually display any diff output. However, we end up
feeding it a bogus mode, leading it to sometimes claim
there was a mode change.
We can fix both by splitting the CE_VALID and regular code
paths, and making sure only to look at the stat information
in the latter. Furthermore, we push the declaration of our
"struct stat" down into the code paths that actually set it,
so we cannot accidentally access it uninitialized in future
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach combine-diff to honour the path-output-order imposed by
diffcore-order, and optimize how matching paths are found in
the N-way diffs made with parents.
* ks/combine-diff:
tests: add checking that combine-diff emits only correct paths
combine-diff: simplify intersect_paths() further
combine-diff: combine_diff_path.len is not needed anymore
combine-diff: optimize combine_diff_path sets intersection
diff test: add tests for combine-diff with orderfile
diffcore-order: export generic ordering interface
The field was used in order to speed-up name comparison and also to
mark removed paths by setting it to 0.
Because the updated code does significantly less strcmp and also
just removes paths from the list and free right after we know a path
will not be needed, it is not needed anymore.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
Teach "git diff --diff-filter" to express "I do not want to see
these classes of changes" more directly by listing only the
unwanted ones in lowercase (e.g. "--diff-filter=d" will show
everything but deletion) and deprecate "diff-files -q" which did
the same thing as "--diff-filter=d".
* jc/diff-filter-negation:
diff: deprecate -q option to diff-files
diff: allow lowercase letter to specify what change class to exclude
diff: reject unknown change class given to --diff-filter
diff: preparse --diff-filter string argument
diff: factor out match_filter()
diff: pass the whole diff_options to diffcore_apply_filter()
This was inherited from "show-diff -q" that was invented to tell
comparison between the index and the working tree to ignore only
removals in 2005.
These days, it is spelled as "--diff-filter=d".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reimplements the ancient "-q" option to "git diff-files" that
was inherited from "show-diff -q" in terms of "--diff-filter=d". We
will be deprecating the "-q" option, so let's issue a warning when
we do so.
Incidentally this also tentatively fixes "git diff --no-index" to
honor "-q" and hide deletions; the use will get the same warning.
We should remove the support for "-q" in a future version but it is
not that urgent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, move free_pathspec() to pathspec.c
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).
This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.
Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.
max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the type merge_fn_t to accept the array of cache_entry pointers
as const pointers to const pointers. This documents the fact that the
merge functions don't modify the cache_entry contents or replace any of
the pointers in the array.
Only a single cast is necessary in unpack_nondirectories because adding
two const modifiers at once is not allowed in C. The cast is safe in
that it doesn't mask any modfication; call_unpack_fn only needs the
array for reading.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add const to struct cache_entry pointers throughout the tree which are
only used for reading. This allows callers to pass in const pointers.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathspec structure has a few bits of data to drive various operation
modes after we unified the pathspec matching logic in various codepaths.
For example, max_depth field is there so that "git grep" can limit the
output for files found in limited depth of tree traversal. Also in order
to show just the surface level differences in "git diff-tree", recursive
field stops us from descending into deeper level of the tree structure
when it is set to false, and this also affects pathspec matching when
we have wildcards in the pathspec.
The diff-index has always wanted the recursive behaviour, and wanted to
match pathspecs without any depth limit. But we forgot to do so when we
updated tree_entry_interesting() logic to unify the pathspec matching
logic.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/diff-index-unpack:
diff-index: pass pathspec down to unpack-trees machinery
unpack-trees: allow pruning with pathspec
traverse_trees(): allow pruning with pathspec
The git term is 'working tree', so replace the most public references
to 'working copy'.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
And finally, pass the pathspec down through unpack_trees() to traverse_trees()
callchain.
Before and after applying this series, looking for changes in the kernel
repository with a fairly narrow pathspec becomes somewhat faster.
(without patch)
$ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
0.48user 0.05system 0:00.53elapsed 100%CPU (0avgtext+0avgdata 163296maxresident)k
0inputs+952outputs (0major+11163minor)pagefaults 0swaps
(with patch)
$ /usr/bin/time git diff --raw v2.6.27 -- net/ipv6 >/dev/null
0.01user 0.00system 0:00.02elapsed 104%CPU (0avgtext+0avgdata 43856maxresident)k
0inputs+24outputs (0major+3688minor)pagefaults 0swaps
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The latter is meant to be an API for internal callers that want to inspect
the resulting diff-queue, while the former is an implementation of "git
diff-index" command. Extract the common logic into a single helper
function and make them thin wrappers around it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 34110cd (Make 'unpack_trees()' have a separate source and
destination index, 2008-03-06), we can run unpack_trees() without munging
the index at all, but do_diff_cache() tried ever so carefully to work
around the old behaviour of the function.
We can just tell unpack_trees() not to touch the original index and there
is no need to clean-up whatever the previous round has done.
Signed-off-by: Junio C Hamano <gitster@pobox.com>