Commit Graph

418 Commits (48bc5094de4d2c549efd82780d4488071955d4ff)

Author SHA1 Message Date
Junio C Hamano 15108de2fa Merge branch 'jk/format-patch-ignore-noprefix'
"git format-patch" honors the src/dst prefixes set to nonstandard
values with configuration variables like "diff.noprefix", causing
receiving end of the patch that expects the standard -p1 format to
break.  Teach "format-patch" to ignore end-user configuration and
always use the standard prefixes.

This is a backward compatibility breaking change.

* jk/format-patch-ignore-noprefix:
  rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
  format-patch: add format.noprefix option
  format-patch: do not respect diff.noprefix
  diff: add --default-prefix option
  t4013: add tests for diff prefix options
  diff: factor out src/dst prefix setup
2023-03-21 14:18:55 -07:00
Junio C Hamano 88cc8ed8bc Merge branch 'en/header-cleanup'
Code clean-up to clarify the rule that "git-compat-util.h" must be
the first to be included.

* en/header-cleanup:
  diff.h: remove unnecessary include of object.h
  Remove unnecessary includes of builtin.h
  treewide: replace cache.h with more direct headers, where possible
  replace-object.h: move read_replace_refs declaration from cache.h to here
  object-store.h: move struct object_info from cache.h
  dir.h: refactor to no longer need to include cache.h
  object.h: stop depending on cache.h; make cache.h depend on object.h
  ident.h: move ident-related declarations out of cache.h
  pretty.h: move has_non_ascii() declaration from commit.h
  cache.h: remove dependence on hex.h; make other files include it explicitly
  hex.h: move some hex-related declarations from cache.h
  hash.h: move some oid-related declarations from cache.h
  alloc.h: move ALLOC_GROW() functions from cache.h
  treewide: remove unnecessary cache.h includes in source files
  treewide: remove unnecessary cache.h includes
  treewide: remove unnecessary git-compat-util.h includes in headers
  treewide: ensure one of the appropriate headers is sourced first
2023-03-17 14:03:09 -07:00
Jeff King 6799aadfdf diff: factor out src/dst prefix setup
We directly manipulate diffopt's a_prefix and b_prefix to set up either
the default "a/foo" prefix or the "--no-prefix" variant. Although this
is only a few lines, it's worth pulling these into their own functions.
That lets us avoid one repetition already in this patch, but will also
give us a cleaner interface for callers which want to tweak this
setting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-09 08:32:17 -08:00
Junio C Hamano ece8dc97ae Merge branch 'jc/diff-algo-attribute'
The "diff" drivers specified by the "diff" attribute attached to
paths can now specify which algorithm (e.g. histogram) to use.

* jc/diff-algo-attribute:
  diff: teach diff to read algorithm from diff driver
  diff: consolidate diff algorithm option parsing
2023-02-27 10:08:56 -08:00
Elijah Newren f524970185 diff.h: remove unnecessary include of object.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:30 -08:00
Elijah Newren a64215b6cd object.h: stop depending on cache.h; make cache.h depend on object.h
Things should be able to depend on object.h without pulling in all of
cache.h.  Move an enum to allow this.

Note that a couple files previously depended on things brought in
through cache.h indirectly (revision.h -> commit.h -> object.h ->
cache.h).  As such, this change requires making existing dependencies
more explicit in half a dozen files.  The inclusion of strbuf.h in
some headers if of particular note: these headers directly embedded a
strbuf in some new structs, meaning they should have been including
strbuf.h all along but were indirectly getting the necessary
definitions.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
John Cai a4cf900ee7 diff: teach diff to read algorithm from diff driver
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.

The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.

Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes & config combination, then
finally the diff.algorithm config.

To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-21 09:29:10 -08:00
Junio C Hamano ab91f6b7c4 Merge branch 'rs/diff-parseopts'
The way the diff machinery prepares the options array for the
parse_options API has been refactored to avoid resource leaks.

* rs/diff-parseopts:
  diff: remove parseopts member from struct diff_options
  diff: use add_diff_options() in diff_opt_parse()
  diff: factor out add_diff_options()
2022-12-19 11:46:17 +09:00
René Scharfe 189e97bc4b diff: remove parseopts member from struct diff_options
repo_diff_setup() builds the struct option array with git diff's command
line options and stores a pointer to it in the parseopts member of
struct diff_options.  The array is freed by diff_setup_done(), but not
by release_revisions().  Thus calling only repo_diff_setup() and
release_revisions() leaks that array.

We could free it in release_revisions() as well to plug that leak, but
there is a better way: Only build it when needed.  Absorb
prep_parse_options() into the last place that uses the parseopts member
of struct diff_options, add_diff_parseopts(), and get rid of said
member.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-02 08:25:30 +09:00
René Scharfe c5630c4868 diff: factor out add_diff_options()
Add a function for appending the parseopts member of struct diff_options
to a struct option array.  Use it in two sites instead of accessing the
parseopts member directly.  Decoupling callers from diff internals like
that allows us to change the latter.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-02 08:25:29 +09:00
Jerry Zhang 51276c1832 patch-id: use stable patch-id for rebases
Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant. Use the stable
logic for all internal patch-id calculations to minimize the number of
code paths and improve test coverage.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:19 -07:00
Junio C Hamano 9a160990ef Merge branch 'js/diff-filter-negation-fix'
"git diff --diff-filter=aR" is now parsed correctly.

* js/diff-filter-negation-fix:
  diff-filter: be more careful when looking for negative bits
  diff.c: move the diff filter bits definitions up a bit
  docs(diff): lose incorrect claim about `diff-files --diff-filter=A`
2022-02-16 15:14:30 -08:00
Elijah Newren 95433eeed9 diff: add ability to insert additional headers for paths
When additional headers are provided, we need to
  * add diff_filepairs to diff_queued_diff for each paths in the
    additional headers map which, unless that path is part of
    another diff_filepair already found in diff_queued_diff
  * format the headers (colorization, line_prefix for --graph)
  * make sure the various codepaths that attempt to return early
    if there are "no changes" take into account the headers that
    need to be shown.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:28 -08:00
Johannes Schindelin 75408ca949 diff-filter: be more careful when looking for negative bits
The `--diff-filter=<bits>` option allows to filter the diff by certain
criteria, for example `R` to only show renamed files. It also supports
negating a filter via a down-cased letter, i.e. `r` to show _everything
but_ renamed files.

However, the code is a bit overzealous when trying to figure out whether
`git diff` should start with all diff-filters turned on because the user
provided a lower-case letter: if the `--diff-filter` argument starts
with an upper-case letter, we must not start with all bits turned on.

Even worse, it is possible to specify the diff filters in multiple,
separate options, e.g. `--diff-filter=AM [...] --diff-filter=m`.

Let's accumulate the include/exclude filters independently, and only
special-case the "only exclude filters were specified" case after
parsing the options altogether.

Note: The code replaced by this commit took pains to avoid setting any
unused bits of `options->filter`. That was unnecessary, though, as all
accesses happen via the `filter_bit_tst()` function using specific bits,
and setting the unused bits has no effect. Therefore, we can simplify
the code by using `~0` (or in this instance, `~<unwanted-bit>`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-28 10:18:18 -08:00
Ævar Arnfjörð Bjarmason a47fcbe6e4 diff.h: move pickaxe fields together again
Move the pickaxe and pickaxe_opts fields next to each other again. In
a past life they'd been on adjacent lines, but when they got moved
from a global variable to the diff_options struct in 6b5ee137e5 (Diff
clean-up., 2005-09-21) they got split apart.

That split made sense at the time, the "char*" and "int" (flags)
options were being grouped, but we've long since abandoned that
pattern in the diff_options struct, and now it makes more sense to
group these together again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason d26ec88009 pickaxe: die when --find-object and --pickaxe-all are combined
Neither the --pickaxe-all documentation nor --find-object's has ever
suggested that you can combine the two. See f506b8e8b5 (git log/diff:
add -G<regexp> that greps in the patch text, 2010-08-23) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04).

But we've silently tolerated it, which makes the logic in
diffcore_pickaxe() harder to reason about. Let's assert that we won't
have the two combined.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason 188e9e28c5 pickaxe: die when -G and --pickaxe-regex are combined
When the -G and --pickaxe-regex options are combined we simply ignore
the --pickaxe-regex option. Let's die instead as suggested by our
documentation, since -G is always a regex.

When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23) neither the
documentation for --pickaxe-regex was updated accordingly, nor was
something like this assertion added.

Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
with -S, but have silently tolerated combining it with -G, let's die
instead.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Junio C Hamano 845d6030f8 Merge branch 'jc/diffcore-rotate'
"git {diff,log} --{skip,rotate}-to=<path>" allows the user to
discard diff output for early paths or move them to the end of the
output.

* jc/diffcore-rotate:
  diff: --{rotate,skip}-to=<path>
2021-02-25 16:43:30 -08:00
Junio C Hamano 1eb4136ac2 diff: --{rotate,skip}-to=<path>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning.  Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.

 - "git diff --rotate-to=C", when the resulting patch would show
   paths A B C D E without the option, would "rotate" the paths to
   shows patch to C D E A B instead.  It is an error when there is
   no patch for C is shown.

 - "git diff --skip-to=C" would instead "skip" the paths before C,
   and shows patch to C D E.  Again, it is an error when there is no
   patch for C is shown.

 - "git log [-p]" also accepts these two options, but it is not an
   error if there is no change to the specified path.  Instead, the
   set of output paths are rotated or skipped to the specified path
   or the first path that sorts after the specified path.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 09:30:42 -08:00
Ævar Arnfjörð Bjarmason e900d494dc diff: add an API for deferred freeing
Add a diff_free() function to free anything we may have allocated in
the "diff_options" struct, and the ability to make calling it a noop
by setting "no_free" in "diff_options".

This is required because when e.g. "git diff" is run we'll allocate
things in that struct, use the diff machinery once, and then exit.

But if we run e.g. "git log -p" we're going to re-use what we
allocated across multiple diff_flush() calls, and only want to free
things at the end.

We've thus ended up with features like the recently added "diff -I"[1]
where we'll leak memory. As it turns out it could have simply used the
pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
the diffopt.close_file attribute, 2016-06-22).

Manually adding more such flags to things log_tree_commit() every time
we need to allocate something would be tedious. Let's instead move
that fclose() code it to a new diff_free(), in anticipation of freeing
more things in that function in follow-up commits.

Some functions such as log_tree_commit() need an idiom of optionally
retaining a previous "no_free", as they may either free the memory
themselves, or their caller may do so. I'm keeping that idiom in
log_show_early() for good measure, even though I don't think it's
currently called in this manner. It also gets passed an existing
"struct rev_info", so future callers may want to set the "no_free"
flag.

This change is a bit hard to read because while the freeing pattern
we're introducing isn't unusual, the "file" member is a special
snowflake. We usually don't want to fclose() it. This is because
"file" is usually stdout, in which case we don't want to fclose()
it. We only want to opt-in to closing it when we e.g. open a file on
the filesystem. Thus the opt-in "close_file" flag.

So the API in general just needs a "no_free" flag to defer freeing,
but the "file" member still needs its "close_file" flag. This is made
more confusing because while refactoring this code we could replace
some "close_file=0" with "no_free=1", whereas others need to set both
flags.

This is because there were some cases where an existing "close_file=0"
meant "let's defer deallocation", and others where it meant "we don't
want to close this file handle at all".

1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
   2020-10-20)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11 09:21:05 -08:00
Junio C Hamano 0806279428 Merge branch 'sj/untracked-files-in-submodule-directory-is-not-dirty'
"git diff" showed a submodule working tree with untracked cruft as
"Submodule commit <objectname>-dirty", but a natural expectation is
that the "-dirty" indicator would align with "git describe --dirty",
which does not consider having untracked files in the working tree
as source of dirtiness.  The inconsistency has been fixed.

* sj/untracked-files-in-submodule-directory-is-not-dirty:
  diff: do not show submodule with untracked files as "-dirty"
2021-01-25 14:19:18 -08:00
Sangeeta Jain 8ef9312464 diff: do not show submodule with untracked files as "-dirty"
Git diff reports a submodule directory as -dirty even when there are
only untracked files in the submodule directory. This is inconsistent
with what `git describe --dirty` says when run in the submodule
directory in that state.

Make `--ignore-submodules=untracked` the default for `git diff` when
there is no configuration variable or command line option, so that the
command would not give '-dirty' suffix to a submodule whose working
tree has untracked files, to make it consistent with `git
describe --dirty` that is run in the submodule working tree.

And also make `--ignore-submodules=none` the default for `git status`
so that the user doesn't end up deleting a submodule that has
uncommitted (untracked) files.

Signed-off-by: Sangeeta Jain <sangunb09@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:27:35 -08:00
Junio C Hamano 1ae0949a03 Merge branch 'mk/diff-ignore-regex'
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I<regex> that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
2020-11-02 13:17:44 -08:00
Junio C Hamano b6fb70c985 Merge branch 'dl/diff-merge-base'
"git diff A...B" learned "git diff --merge-base A B", which is a
longer short-hand to say the same thing.

* dl/diff-merge-base:
  contrib/completion: complete `git diff --merge-base`
  builtin/diff-tree: learn --merge-base
  builtin/diff-index: learn --merge-base
  t4068: add --merge-base tests
  diff-lib: define diff_get_merge_base()
  diff-lib: accept option flags in run_diff_index()
  contrib/completion: extract common diff/difftool options
  git-diff.txt: backtick quote command text
  git-diff-index.txt: make --cached description a proper sentence
  t4068: remove unnecessary >tmp
2020-11-02 13:17:39 -08:00
Michał Kępień 296d4a94e7 diff: add -I<regex> that ignores matching changes
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:53:26 -07:00
Junio C Hamano 34415c76c8 Merge branch 'so/combine-diff-simplify'
Code simplification.

* so/combine-diff-simplify:
  diff: get rid of redundant 'dense' argument
2020-10-05 14:01:51 -07:00
Junio C Hamano 288ed98bf7 Merge branch 'tb/bloom-improvements'
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=<n>'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&&'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
2020-09-29 14:01:20 -07:00
Sergey Organov d01141de5a diff: get rid of redundant 'dense' argument
Get rid of 'dense' argument that is redundant for every function that has
'struct rev_info *rev' argument as well, as the value of 'dense' passed is
always taken from 'rev->dense_combined_merges' field.

The only place where this was not the case is in 'submodule.c' where
'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However,
at that call the 'revs' instance used is local to the function, and we now just
set 'revs->dense_combined_merges' to 1 in this local instance.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-29 11:54:53 -07:00
Denton Liu 0f5a1d449b builtin/diff-index: learn --merge-base
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.

Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Denton Liu 177a830268 diff-lib: define diff_get_merge_base()
In a future commit, we will be using this function to implement
--merge-base functionality in various diff commands.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Denton Liu 4c3fe82ef1 diff-lib: accept option flags in run_diff_index()
In a future commit, we will teach run_diff_index() to accept more
options via flag bits. For now, change `cached` into a flag in the
`option` bitfield. The behaviour should remain exactly the same.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Derrick Stolee b16a827764 bloom/diff: properly short-circuit on max_changes
Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).

To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queue_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.

This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.

One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:

1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
   allowing smaller commits to engage with this logic.

2. Create several interesting cases of edits, adds, removes, and mode
   changes (in the second commit). By testing both sides of the
   inequality with the *_MAX_CHANGED_PATHS variable, we can see that
   the count is exactly correct, so none of these changes are missed
   or over-counted.

3. Use the trace2 data value filter_found_large to verify that these
   commits are on the correct side of the limit.

Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command

  GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
    git commit-graph write --reachable --changed-paths

and reporting the wall time and resulting commit-graph size.

For Git, the results are

|        |      N=1       |       N=10     |      N=512     |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s  9.18MB | 11.11s  9.34MB | 11.31s  9.35MB |
| HEAD   |  9.21s  8.62MB | 11.11s  9.29MB | 11.29s  9.34MB |

For Linux, the results are

|        |       N=1      |     N=20      |     N=512     |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s  64.3MB | 76.9s  72.6MB | 77.6s  72.6MB |
| HEAD   | 49.44s  56.3MB | 68.7s  65.9MB | 69.2s  65.9MB |

Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 09:31:25 -07:00
Eric Sunshine 72a7239016 diff-lib: tighten show_interdiff()'s interface
To compute and show an interdiff, show_interdiff() needs only the two
OID's to compare and a diffopts, yet it expects callers to supply an
entire rev_info. The demand for rev_info is not only overkill, but also
places unnecessary burden on potential future callers which might not
otherwise have a rev_info at hand. Address this by tightening its
signature to require only the items it needs instead of a full rev_info.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:03:27 -07:00
Eric Sunshine cdffbdc217 diff: move show_interdiff() from its own file to diff-lib
show_interdiff() is a relatively small function and not likely to grow
larger or more complicated. Rather than dedicating an entire source file
to it, relocate it to diff-lib.c which houses other "take two things and
compare them" functions meant to be re-used but not so low-level as to
reside in the core diff implementation.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:03:26 -07:00
SZEDER Gábor 0ee3cb888d diff.h: drop diff_tree_oid() & friends' return value
ll_diff_tree_oid() has only ever returned 0 [1], so it's return value
is basically useless.  It's only caller diff_tree_oid() has only ever
returned the return value of ll_diff_tree_oid() as-is [2], so its
return value is just as useless.  Most of diff_tree_oid()'s callers
simply ignore its return value, except:

  - diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and
    returns with its return value, but all of diff_root_tree_oid()'s
    callers ignore its return value.

  - rev_compare_tree() and rev_same_tree_as_empty() do look at the
    return value in a condition, but, since the return value is always
    0, the former's < 0 condition is never fulfilled, while the
    latter's >= 0 condition is always fulfilled.

So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid()
and diff_root_tree_oid(), and drop those conditions from
rev_compare_tree() and rev_same_tree_as_empty() as well.

[1] ll_diff_tree_oid() and its ancestors have been returning only 0
    ever since it was introduced as diff_tree() in 9174026cfe (Add
    "diff-tree" program to show which files have changed between two
    trees., 2005-04-09).
[2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026cfe as
    well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 12:28:49 -07:00
Derrick Stolee e369698016 diff: halt tree-diff early after max_changes
When computing the changed-paths bloom filters for the commit-graph,
we limit the size of the filter by restricting the number of paths
in the diff. Instead of computing a large diff and then ignoring the
result, it is better to halt the diff computation early.

Create a new "max_changes" option in struct diff_options. If non-zero,
then halt the diff computation after discovering strictly more changed
paths. This includes paths corresponding to trees that change.

Use this max_changes option in the bloom filter calculations. This
reduces the time taken to compute the filters for the Linux kernel
repo from 2m50s to 2m35s. On a large internal repository with ~500
commits that perform tree-wide changes, the time reduced from
6m15s to 3m48s.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 09:59:53 -07:00
Junio C Hamano 26c816a67d Merge branch 'hw/doc-in-header'
* hw/doc-in-header:
  trace2: move doc to trace2.h
  submodule-config: move doc to submodule-config.h
  tree-walk: move doc to tree-walk.h
  trace: move doc to trace.h
  run-command: move doc to run-command.h
  parse-options: add link to doc file in parse-options.h
  credential: move doc to credential.h
  argv-array: move doc to argv-array.h
  cache: move doc to cache.h
  sigchain: move doc to sigchain.h
  pathspec: move doc to pathspec.h
  revision: move doc to revision.h
  attr: move doc to attr.h
  refs: move doc to refs.h
  remote: move doc to remote.h and refspec.h
  sha1-array: move doc to sha1-array.h
  merge: move doc to ll-merge.h
  graph: move doc to graph.h and graph.c
  dir: move doc to dir.h
  diff: move doc to diff.h and diffcore.h
2019-12-16 13:08:39 -08:00
Heba Waly 13c4d7eb22 diff: move doc to diff.h and diffcore.h
Move the documentation from Documentation/technical/api-diff.txt to both
diff.h and diffcore.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.

Also documentation/technical/api-diff.txt is removed because the information
it has is now redundant and it'll be hard to keep it up to date and
synchronized with the documentation in the header files.

There are three members documented in the doc file that weren't found in
the header files, assuming the doc wasn't up to date and the members
no longer exist:
touched_flags, COLOR_DIFF_WORDS and QUIET.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-18 15:21:28 +09:00
Daniel Ferreira e4cb659ebd diff: export diffstat interface
Make the diffstat interface (namely, the diffstat_t struct and
compute_diffstat) no longer be internal to diff.c and allow it to be used
by other parts of git.

This is helpful for code that may want to easily extract information
from files using the diff machinery, while flushing it differently from
how the show_* functions used by diff_flush() do it. One example is the
builtin implementation of git-add--interactive's status.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Slavica Đukić <slawica92@hotmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-14 11:10:04 +09:00
brian m. carlson 36261e42ec patch-id: convert to use the_hash_algo
Convert the two separate patch-id implementations to use the_hash_algo
in their implementation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:57 -07:00
Thomas Gummerer 430be36eb5 range-diff: suppress line count in outer diff
The line count in the outer diff's hunk headers of a range diff is not
all that interesting.  It merely shows how far along the inner diff
are on both sides.  That number is of no use for human readers, and
range-diffs are not meant to be machine readable.

In a subsequent commit we're going to add some more contextual
information such as the filename corresponding to the diff to the hunk
headers.  Remove the unnecessary information, and just keep the "@@"
to indicate that a new hunk of the outer diff is starting.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11 14:29:27 -07:00
Junio C Hamano 8202d12fca Merge branch 'sb/format-patch-base-patch-id-fix'
The "--base" option of "format-patch" computed the patch-ids for
prerequisite patches in an unstable way, which has been updated to
compute in a way that is compatible with "git patch-id --stable".

* sb/format-patch-base-patch-id-fix:
  format-patch: make --base patch-id output stable
  format-patch: inform user that patch-id generation is unstable
2019-06-13 13:18:46 -07:00
Junio C Hamano 20aa7c594f Merge branch 'nd/diff-parseopt'
A brown-paper-bag bugfix to a change already in 'master'.

* nd/diff-parseopt:
  parse-options: check empty value in OPT_INTEGER and OPT_ABBREV
  diff-parseopt: restore -U (no argument) behavior
  diff-parseopt: correct variable types that are used by parseopt
2019-05-30 10:50:44 -07:00
Nguyễn Thái Ngọc Duy 7f125ff909 diff-parseopt: correct variable types that are used by parseopt
Most number-related OPT_ macros store the value in an 'int'
variable. Many of the variables in 'struct diff_options' have a
different type, but during the conversion to using parse_options() I
failed to notice and correct.

The problem was reported on s360x which is a big-endian
architechture. The variable to store '-w' option in this case is
xdl_opts, 'long' type, 8 bytes. But since parse_options() assumes
'int' (4 bytes), it will store bits in the wrong part of xdl_opts. The
problem was found on little-endian platforms because parse_options()
will accidentally store at the right part of xdl_opts.

There aren't much to say about the type change (except that 'int' for
xdl_opts should still be big enough, since Windows' long is the same
size as 'int' and nobody has complained so far). Some safety checks may
be implemented in the future to prevent class of bugs.

Reported-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-29 11:04:30 -07:00
Stephen Boyd a8f6855f48 format-patch: make --base patch-id output stable
We weren't flushing the context each time we processed a hunk in the
patch-id generation code in diff.c, but we were doing that when we
generated "stable" patch-ids with the 'patch-id' tool. Let's port that
similar logic over from patch-id.c into diff.c so we can get the same
hash when we're generating patch-ids for 'format-patch --base=' types of
command invocations.

Cc: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-08 19:27:43 +09:00
Junio C Hamano dcd6a8c09a Merge branch 'nd/diff-parseopt-4'
Fourth batch to teach the diff machinery to use the parse-options
API.

* nd/diff-parseopt-4:
  am: avoid diff_opt_parse()
  diff --no-index: use parse_options() instead of diff_opt_parse()
  range-diff: use parse_options() instead of diff_opt_parse()
  diff.c: allow --no-color-moved-ws
  diff-parseopt: convert --color-moved-ws
  diff-parseopt: convert --[no-]color-moved
  diff-parseopt: convert --inter-hunk-context
  diff-parseopt: convert --no-prefix
  diff-parseopt: convert --line-prefix
  diff-parseopt: convert --[src|dst]-prefix
  diff-parseopt: convert --[no-]abbrev
  diff-parseopt: convert --diff-filter
  diff-parseopt: convert --find-object
  diff-parseopt: convert -O
  diff-parseopt: convert --pickaxe-all|--pickaxe-regex
  diff-parseopt: convert -S|-G
  diff-parseopt: convert -l
  diff-parseopt: convert -z
  diff-parseopt: convert --ita-[in]visible-in-index
  diff-parseopt: convert --ws-error-highlight
2019-04-25 16:41:12 +09:00
Nguyễn Thái Ngọc Duy cdb5330a9b am: avoid diff_opt_parse()
diff_opt_parse() is a heavy hammer to just set diff filter. But it's
the only way because of the diff_status_letters[] mapping. Add a new
API to set diff filter and use it in git-am. diff_opt_parse()'s only
remaining call site in revision.c will be gone soon and having it here
just because of git-am does not make sense.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:24 +09:00
Nguyễn Thái Ngọc Duy 16bb3d714d diff --no-index: use parse_options() instead of diff_opt_parse()
While at there, move exit() back to the caller. It's easier to see the
flow that way than burying it in diff-no-index.c

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:24 +09:00
Junio C Hamano 12e5bdd9c4 Merge branch 'jk/diff-no-index-initialize'
"git diff --no-index" may still want to access Git goodies like
--ext-diff and --textconv, but so far these have been ignored,
which has been corrected.

* jk/diff-no-index-initialize:
  diff: reuse diff setup for --no-index case
2019-03-07 09:59:59 +09:00
Junio C Hamano 1dc2f8c122 Merge branch 'jk/unused-params'
Code clean-up.

* jk/unused-params:
  ref-filter: drop unused "sz" parameters
  ref-filter: drop unused "obj" parameters
  ref-filter: drop unused buf/sz pairs
  files-backend: drop refs parameter from split_symref_update()
  pack-objects: drop unused parameter from oe_map_new_pack()
  merge-recursive: drop several unused parameters
  diff: drop complete_rewrite parameter from run_external_diff()
  diff: drop unused emit data parameter from sane_truncate_line()
  diff: drop unused color reset parameters
  diff: drop options parameter from diffcore_fix_diff_index()
2019-03-07 09:59:57 +09:00
Junio C Hamano c425d361f5 Merge branch 'en/combined-all-paths'
Output from "diff --cc" did not show the original paths when the
merge involved renames.  A new option adds the paths in the
original trees to the output.

* en/combined-all-paths:
  log,diff-tree: add --combined-all-paths option
2019-03-07 09:59:54 +09:00
Jeff King 287ab28bfa diff: reuse diff setup for --no-index case
When "--no-index" is in effect (or implied by the arguments), git-diff
jumps early to a special code path to perform that diff. This means we
miss out on some settings like enabling --ext-diff and --textconv by
default.

Let's jump to the no-index path _after_ we've done more setup on
rev.diffopt. Since some of the options don't affect us (e.g., items
related to the index), let's re-order the setup into two blocks (see the
in-code comments).

Note that we also need to stop re-initializing the diffopt struct in
diff_no_index(). This should not be necessary, as it will already have
been initialized by cmd_diff() (and there are no other callers). That in
turn lets us drop the "repository" argument from diff_no_index (which
never made much sense, since the whole point is that you don't need a
repository).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-24 07:08:34 -08:00
Jeff King 784c0daed5 diff: drop options parameter from diffcore_fix_diff_index()
The sole purpose of this function is to fix the sorting order of the
queued diff entries. It doesn't need to know about any diff options, so
we can drop the unused parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14 15:26:14 -08:00
Elijah Newren d76ce4f734 log,diff-tree: add --combined-all-paths option
The combined diff format for merges will only list one filename, even if
rename or copy detection is active.  For example, with raw format one
might see:

  ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM	describe.c
  ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM	bar.sh
  ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR	phooey.c

This doesn't let us know what the original name of bar.sh was in the
first parent, and doesn't let us know what either of the original names
of phooey.c were in either of the parents.  In contrast, for non-merge
commits, raw format does provide original filenames (and a rename score
to boot).  In order to also provide original filenames for merge
commits, add a --combined-all-paths option (which must be used with
either -c or --cc, and is likely only useful with rename or copy
detection active) so that we can print tab-separated filenames when
renames are involved.  This transforms the above output to:

  ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM	desc.c	desc.c	desc.c
  ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM	foo.sh	bar.sh	bar.sh
  ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR	fooey.c	fuey.c	phooey.c

Further, in patch format, this changes the from/to headers so that
instead of just having one "from" header, we get one for each parent.
For example, instead of having

  --- a/phooey.c
  +++ b/phooey.c

we would see

  --- a/fooey.c
  --- a/fuey.c
  +++ b/phooey.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-07 20:15:25 -08:00
Nguyễn Thái Ngọc Duy 4a28847839 diff.c: prepare to use parse_options() for parsing
This is a preparation step to start using parse_options() to parse
diff/revision options instead of what we have now. There are a couple
of good things from using parse_options():

- better help usage
- easier to add new options
- better completion support
- help usage generation
- better integration with main command option parser. We can just
  concat the main command's option array and diffopt's together and
  parse all in one go.
- detect colidding options (e.g. --reverse is used by revision code,
  so diff code can't use it as long name for -R)
- consistent syntax, e.g. option that takes mandatory argument will
  now accept both "--option=value" and "--option value".

The plan is migrate all diff/rev options to parse_options(). Then we
could get rid of diff_opt_parse() and expose parseopts[] directly to
the caller.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-27 16:28:18 -08:00
Nguyễn Thái Ngọc Duy 2b393ef357 diff.h: avoid bit fields in struct diff_flags
Bitfield addresses cannot be passed around in a pointer. This makes it
hard to use parse-options to set/unset them. Turn this struct to
normal integers. This of course increases the size of this struct
multiple times, but since we only have a handful of diff_options
variables around, memory consumption is not at all a concern.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-27 16:28:18 -08:00
Nguyễn Thái Ngọc Duy 4a1b13a397 diff.h: keep forward struct declarations sorted
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-27 16:28:18 -08:00
Junio C Hamano 932b867be0 Merge branch 'sb/diff-color-moved-config-option-fixup'
Minor inconsistency fix.

* sb/diff-color-moved-config-option-fixup:
  diff: align move detection error handling with other options
2019-01-14 15:29:31 -08:00
Stefan Beller d173e799ea diff: align move detection error handling with other options
This changes the error handling for the options --color-moved-ws
and --color-moved-ws to be like the rest of the options.

Move the die() call out of parse_color_moved_ws into the parsing
of command line options. As the function returns a bit field, change
its signature to return an unsigned instead of an int; add a new bit
to signal errors. Once the error is signaled, we discard the other
bits, such that it doesn't matter if the error bit overlaps with any
other bit.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14 16:27:50 +09:00
Nguyễn Thái Ngọc Duy bd7ad45b64 notes-cache.c: remove the_repository references
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:50:06 +09:00
Nguyễn Thái Ngọc Duy ffc00a48ef diff-lib.c: remove the_repository references
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:50:06 +09:00
Junio C Hamano 11877b9ebe Merge branch 'nd/the-index'
Various codepaths in the core-ish part learn to work on an
arbitrary in-core index structure, not necessarily the default
instance "the_index".

* nd/the-index: (23 commits)
  revision.c: reduce implicit dependency the_repository
  revision.c: remove implicit dependency on the_index
  ws.c: remove implicit dependency on the_index
  tree-diff.c: remove implicit dependency on the_index
  submodule.c: remove implicit dependency on the_index
  line-range.c: remove implicit dependency on the_index
  userdiff.c: remove implicit dependency on the_index
  rerere.c: remove implicit dependency on the_index
  sha1-file.c: remove implicit dependency on the_index
  patch-ids.c: remove implicit dependency on the_index
  merge.c: remove implicit dependency on the_index
  merge-blobs.c: remove implicit dependency on the_index
  ll-merge.c: remove implicit dependency on the_index
  diff-lib.c: remove implicit dependency on the_index
  read-cache.c: remove implicit dependency on the_index
  diff.c: remove implicit dependency on the_index
  grep.c: remove implicit dependency on the_index
  diff.c: remove the_index dependency in textconv() functions
  blame.c: rename "repo" argument to "r"
  combine-diff.c: remove implicit dependency on the_index
  ...
2018-10-19 13:34:02 +09:00
Nguyễn Thái Ngọc Duy acd00ea049 userdiff.c: remove implicit dependency on the_index
[jc: squashed in missing forward decl in userdiff.h found by Ramsay]

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:50:58 -07:00
Nguyễn Thái Ngọc Duy e675765235 diff.c: remove implicit dependency on the_index
A new variant repo_diff_setup() is added that takes 'struct repository *'
and diff_setup() becomes a thin macro around it that is protected by
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to NO_THE_INDEX_....
The plan is these macros will always be defined for all library files
and the macros are only accessible in builtin/

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:48:10 -07:00
Nguyễn Thái Ngọc Duy 6afaf80785 diff.c: remove the_index dependency in textconv() functions
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:48:10 -07:00
Nguyễn Thái Ngọc Duy b78ea5fc35 diff.c: reduce implicit dependency on the_index
diff and textconv code has so widespread use that it's hard to simply
update their api and all call sites at once because it would result in
a big patch. For now reduce the_index references to two places:
diff_setup() and fill_textconv().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21 09:48:10 -07:00
Junio C Hamano 30035d1d60 Merge branch 'sb/range-diff-colors'
The color output support for recently introduced "range-diff"
command got tweaked a bit.

* sb/range-diff-colors:
  range-diff: indent special lines as context
  range-diff: make use of different output indicators
  diff.c: add --output-indicator-{new, old, context}
  diff.c: rewrite emit_line_0 more understandably
  diff.c: omit check for line prefix in emit_line_0
  diff: use emit_line_0 once per line
  diff.c: add set_sign to emit_line_0
  diff.c: reorder arguments for emit_line_ws_markup
  diff.c: simplify caller of emit_line_0
  t3206: add color test for range-diff --dual-color
  test_decode_color: understand FAINT and ITALIC
2018-09-17 13:53:54 -07:00
Stefan Beller 7648b79eee diff.c: add --output-indicator-{new, old, context}
This will prove useful in range-diff in a later patch as we will be able to
differentiate between adding a new file (that line is starting with +++
and then the file name) and regular new lines.

It could also be useful for experimentation in new patch formats, i.e.
we could teach git to emit moved lines with lines other than +/-.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 14:33:24 -07:00
Junio C Hamano 81eab6871e Merge branch 'js/range-diff'
"git tbdiff" that lets us compare individual patches in two
iterations of a topic has been rewritten and made into a built-in
command.

* js/range-diff: (21 commits)
  range-diff: use dim/bold cues to improve dual color mode
  range-diff: make --dual-color the default mode
  range-diff: left-pad patch numbers
  completion: support `git range-diff`
  range-diff: populate the man page
  range-diff --dual-color: skip white-space warnings
  range-diff: offer to dual-color the diffs
  diff: add an internal option to dual-color diffs of diffs
  color: add the meta color GIT_COLOR_REVERSE
  range-diff: use color for the commit pairs
  range-diff: add tests
  range-diff: do not show "function names" in hunk headers
  range-diff: adjust the output of the commit pairs
  range-diff: suppress the diff headers
  range-diff: indent the diffs just like tbdiff
  range-diff: right-trim commit messages
  range-diff: also show the diff between patches
  range-diff: improve the order of the shown commits
  range-diff: first rudimentary implementation
  Introduce `range-diff` to compare iterations of a topic branch
  ...
2018-08-20 11:33:53 -07:00
Nguyễn Thái Ngọc Duy ff7fe37b05 diff.c: move read_index() code back to the caller
This code is only needed for diff-tree (since f0c6b2a2fd ([PATCH]
Optimize diff-tree -[CM] --stdin - 2005-05-27)). Let the caller do the
preparation instead and avoid read_index() in diff.c code.

read_index() should be avoided (in addition to the_index) because it
uses get_index_file() underneath to get the path $GIT_DIR/index. This
effectively pulls the_repository in and may become the only reason to
pull a 'struct repository *' in diff.c. Let's keep the dependencies as
few as possible and kick it back to diff-tree.c

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Johannes Schindelin a7be92acd9 range-diff: use dim/bold cues to improve dual color mode
It *is* a confusing thing to look at a diff of diffs. All too easy is it
to mix up whether the -/+ markers refer to the "inner" or the "outer"
diff, i.e. whether a `+` indicates that a line was added by either the
old or the new diff (or both), or whether the new diff does something
different than the old diff.

To make things easier to process for normal developers, we introduced
the dual color mode which colors the lines according to the commit diff,
i.e. lines that are added by a commit (whether old, new, or both) are
colored in green. In non-dual color mode, the lines would be colored
according to the outer diff: if the old commit added a line, it would be
colored red (because that line addition is only present in the first
commit range that was specified on the command-line, i.e. the "old"
commit, but not in the second commit range, i.e. the "new" commit).

However, this dual color mode is still not making things clear enough,
as we are looking at two levels of diffs, and we still only pick a color
according to *one* of them (the outer diff marker is colored
differently, of course, but in particular with deep indentation, it is
easy to lose track of that outer diff marker's background color).

Therefore, let's add another dimension to the mix. Still use
green/red/normal according to the commit diffs, but now also dim the
lines that were only in the old commit, and use bold face for the lines
that are only in the new commit.

That way, it is much easier not to lose track of, say, when we are
looking at a line that was added in the previous iteration of a patch
series but the new iteration adds a slightly different version: the
obsolete change will be dimmed, the current version of the patch will be
bold.

At least this developer has a much easier time reading the range-diffs
that way.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 10:44:52 -07:00
Johannes Schindelin f7c3b4e2d8 diff: add an internal option to dual-color diffs of diffs
When diffing diffs, it can be quite daunting to figure out what the heck
is going on, as there are nested +/- signs.

Let's make this easier by adding a flag in diff_options that allows
color-coding the outer diff sign with inverted colors, so that the
preimage and postimage is colored like the diff it is.

Of course, this really only makes sense when the preimage and postimage
*are* diffs. So let's not expose this flag via a command-line option for
now.

This is a feature that was invented by git-tbdiff, and it will be used
by `git range-diff` in the next commit, by offering it via a new option:
`--dual-color`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 10:44:51 -07:00
Johannes Schindelin 1cdde296a5 range-diff: suppress the diff headers
When showing the diff between corresponding patches of the two branch
versions, we have to make up a fake filename to run the diff machinery.

That filename does not carry any meaningful information, hence tbdiff
suppresses it. So we should, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 10:44:51 -07:00
Nguyễn Thái Ngọc Duy f758a7f8ac diff.h: remove extern from function declaration
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-03 10:42:55 -07:00
Junio C Hamano a81575aa91 Merge branch 'sb/diff-color-move-more'
"git diff --color-moved" feature has further been tweaked.

* sb/diff-color-move-more:
  diff.c: offer config option to control ws handling in move detection
  diff.c: add white space mode to move detection that allows indent changes
  diff.c: factor advance_or_nullify out of mark_color_as_moved
  diff.c: decouple white space treatment from move detection algorithm
  diff.c: add a blocks mode for moved code detection
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: do not pass diff options as keydata to hashmap
  t4015: avoid git as a pipe input
  xdiff/xdiffi.c: remove unneeded function declarations
  xdiff/xdiff.h: remove unused flags
2018-08-02 15:30:40 -07:00
Stefan Beller ca1f4ae4df diff.c: add white space mode to move detection that allows indent changes
The option of --color-moved has proven to be useful as observed on the
mailing list. However when refactoring sometimes the indentation changes,
for example when partitioning a functions into smaller helper functions
the code usually mostly moved around except for a decrease in indentation.

To just review the moved code ignoring the change in indentation, a mode
to ignore spaces in the move detection as implemented in a previous patch
would be enough.  However the whole move coloring as motivated in commit
2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
up the notion of the reviewer being able to trust the move of a "block".

As there are languages such as python, which depend on proper relative
indentation for the control flow of the program, ignoring any white space
change in a block would not uphold the promises of 2e2d5ac that allows
reviewers to pay less attention to the inside of a block, as inside
the reviewer wants to assume the same program flow.

This new mode of white space ignorance will take this into account and will
only allow the same white space changes per line in each block. This patch
even allows only for the same change at the beginning of the lines.

As this is a white space mode, it is made exclusive to other white space
modes in the move detection.

This patch brings some challenges, related to the detection of blocks.
We need a wide net to catch the possible moved lines, but then need to
narrow down to check if the blocks are still intact. Consider this
example (ignoring block sizes):

 - A
 - B
 - C
 +    A
 +    B
 +    C

At the beginning of a block when checking if there is a counterpart
for A, we have to ignore all space changes. However at the following
lines we have to check if the indent change stayed the same.

Checking if the indentation change did stay the same, is done by computing
the indentation change by the difference in line length, and then assume
the change is only in the beginning of the longer line, the common tail
is the same. That is why the test contains lines like:

 - <TAB> A
 ...
 + A <TAB>
 ...

As the first line starting a block is caught using a compare function that
ignores white spaces unlike the rest of the block, where the white space
delta is taken into account for the comparison, we also have to think about
the following situation:

 - A
 - B
 -   A
 -   B
 +    A
 +    B
 +      A
 +      B

When checking if the first A (both in the + and - lines) is a start of
a block, we have to check all 'A' and record all the white space deltas
such that we can find the example above to be just one block that is
indented.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-19 12:02:54 -07:00
Stefan Beller b3095712f9 diff.c: decouple white space treatment from move detection algorithm
In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff.  Some cases came up where different treatment would
have been nice.

Allow the user to specify that white space should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options by introducing the option --color-moved-ws=<modes>
with the modes named "ignore-space-at-eol", "ignore-space-change" and
"ignore-all-space", which is used only during the move detection phase.

As we change the default, we'll adjust the tests.

For now we do not infer any options to treat white spaces in the move
detection from the generic white space options given to diff.
This can be tuned later to reasonable default.

As we plan on adding more white space related options in a later patch,
that interferes with the current white space options, use a flag field
and clamp it down to  XDF_WHITESPACE_FLAGS, as that (a) allows to easily
check at parse time if we give invalid combinations and (b) can reuse
parts of this patch.

By having the white space treatment in its own option, we'll also
make it easier for a later patch to have an config option for
spaces in the move detection.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 11:25:31 -07:00
Stefan Beller 51da15eb23 diff.c: add a blocks mode for moved code detection
The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
 (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 11:25:31 -07:00
Ben Peart 85b460305c merge: add merge.renames config setting
Add the ability to control rename detection for merge via a config setting.
This setting behaves the same and defaults to the value of diff.renames but only
applies to merge.

Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08 16:19:41 +09:00
Junio C Hamano 868f7d2338 Merge branch 'nd/diff-stat-with-summary'
"git diff" and friends learned "--compact-summary" that shows the
information usually given with the "--summary" option on the same
line as the diffstat output of the "--stat" option (which saves
vertical space and keeps info on a single path at the same place).

* nd/diff-stat-with-summary:
  diff: add --compact-summary
  diff.c: refactor pprint_rename() to use strbuf
2018-03-14 12:01:02 -07:00
Nguyễn Thái Ngọc Duy ddf88fa616 diff: add --compact-summary
Certain information is currently shown with --summary, but when used
in combination with --stat it's a bit hard to read since info of the
same file is in two places (--stat and --summary).

On top of that, commits that add or remove files double the number of
display lines, which could be a lot if you add or remove a lot of
files.

--compact-summary embeds most of --summary back in --stat in the
little space between the file name part and the graph line, e.g. with
commit 0433d533f1:

   Documentation/merge-config.txt         |  4 +
   builtin/merge.c                        |  2 +
   ...-pull-verify-signatures.sh (new +x) | 81 ++++++++++++++
   t/t7612-merge-verify-signatures.sh     | 45 ++++++++
   4 files changed, 132 insertions(+)

It helps both condensing information and saving some text
space. What's new in diffstat is:

- A new 0644 file is shown as (new)
- A new 0755 file is shown as (new +x)
- A new symlink is shown as (new +l)
- A deleted file is shown as (gone)
- A mode change adding executable bit is shown as (mode +x)
- A mode change removing it is shown as (mode -x)

Note that --compact-summary does not contain all the information
--summary provides. Rewrite percentage is not shown but it could be
added later, like R50% or C20%.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-27 15:22:47 -08:00
Junio C Hamano c0d75f0e2e Merge branch 'sb/diff-blobfind-pickaxe'
"diff" family of commands learned "--find-object=<object-id>" option
to limit the findings to changes that involve the named object.

* sb/diff-blobfind-pickaxe:
  diff: use HAS_MULTI_BITS instead of counting bits manually
  diff: properly error out when combining multiple pickaxe options
  diffcore: add a pickaxe option to find a specific blob
  diff: introduce DIFF_PICKAXE_KINDS_MASK
  diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bit
  diff.h: make pickaxe_opts an unsigned bit field
2018-01-23 13:16:37 -08:00
Stefan Beller 15af58c1ad diffcore: add a pickaxe option to find a specific blob
Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

One might be tempted to extend git-describe to also work with blobs,
such that `git describe <blob-id>` gives a description as
'<commit-ish>:<path>'.  This was implemented at [2]; as seen by the sheer
number of responses (>110), it turns out this is tricky to get right.
The hard part to get right is picking the correct 'commit-ish' as that
could be the commit that (re-)introduced the blob or the blob that
removed the blob; the blob could exist in different branches.

Junio hinted at a different approach of solving this problem, which this
patch implements. Teach the diff machinery another flag for restricting
the information to what is shown. For example:

    $ ./git log --oneline --find-object=v2.0.0:Makefile
    b2feb64309 Revert the whole "ask curl-config" topic for now
    47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"

we observe that the Makefile as shipped with 2.0 was appeared in
v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b.  The
reason why these commits both occur prior to v2.0.0 are evil
merges that are not found using this new mechanism.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob
[2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-04 15:02:40 -08:00
Stefan Beller cf63051ada diff: introduce DIFF_PICKAXE_KINDS_MASK
Currently the check whether to perform pickaxing is done via checking
`diffopt->pickaxe`, which contains the command line argument that we
want to pickaxe for. Soon we'll introduce a new type of pickaxing, that
will not store anything in the `.pickaxe` field, so let's migrate the
check to be dependent on pickaxe_opts.

It is not enough to just replace the check for pickaxe by pickaxe_opts,
because flags might be set, but pickaxing was not requested ('-i').
To cope with that, introduce a mask to check only for the bits indicating
the modes of operation.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-04 15:02:40 -08:00
Stefan Beller c1ddc4610c diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bit
Currently flags for pickaxing are found in different places. Unify the
flags into the `pickaxe_opts` field, which will contain any pickaxe related
flags.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-04 15:02:40 -08:00
Stefan Beller 929ed70a72 diff.h: make pickaxe_opts an unsigned bit field
This variable is used as a bit field[1], and as we are about to add more
fields, indicate its usage as a bit field by making it unsigned.

[1] containing the bits

    #define DIFF_PICKAXE_ALL	1
    #define DIFF_PICKAXE_REGEX	2
    #define DIFF_PICKAXE_KIND_S	4
    #define DIFF_PICKAXE_KIND_G	8

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-04 15:02:40 -08:00
Jonathan Tan 2477ab2ea8 diff: support anchoring line(s)
Teach diff a new algorithm, one that attempts to prevent user-specified
lines from appearing as a deletion or addition in the end result. The
end user can use this by specifying "--anchored=<text>" one or more
times when using Git commands like "diff" and "show".

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-28 10:40:04 +09:00
Junio C Hamano 8cc633286a Merge branch 'bw/diff-opt-impl-to-bitfields'
A single-word "unsigned flags" in the diff options is being split
into a structure with many bitfields.

* bw/diff-opt-impl-to-bitfields:
  diff: make struct diff_flags members lowercase
  diff: remove DIFF_OPT_CLR macro
  diff: remove DIFF_OPT_SET macro
  diff: remove DIFF_OPT_TST macro
  diff: remove touched flags
  diff: add flag to indicate textconv was set via cmdline
  diff: convert flags to be stored in bitfields
  add, reset: use DIFF_OPT_SET macro to set a diff flag
2017-11-09 14:31:27 +09:00
Junio C Hamano f4c214b529 Merge branch 'jk/revision-pruning-optim'
Pathspec-limited revision traversal was taught not to keep finding
unneeded differences once it knows two trees are different inside
given pathspec.

* jk/revision-pruning-optim:
  revision: quit pruning diff more quickly when possible
2017-11-06 14:24:26 +09:00
Brandon Williams 0d1e0e7801 diff: make struct diff_flags members lowercase
Now that the flags stored in struct diff_flags are being accessed
directly and not through macros, change all struct members from being
uppercase to lowercase.
This conversion is done using the following semantic patch:

	@@
	expression E;
	@@
	- E.RECURSIVE
	+ E.recursive

	@@
	expression E;
	@@
	- E.TREE_IN_RECURSIVE
	+ E.tree_in_recursive

	@@
	expression E;
	@@
	- E.BINARY
	+ E.binary

	@@
	expression E;
	@@
	- E.TEXT
	+ E.text

	@@
	expression E;
	@@
	- E.FULL_INDEX
	+ E.full_index

	@@
	expression E;
	@@
	- E.SILENT_ON_REMOVE
	+ E.silent_on_remove

	@@
	expression E;
	@@
	- E.FIND_COPIES_HARDER
	+ E.find_copies_harder

	@@
	expression E;
	@@
	- E.FOLLOW_RENAMES
	+ E.follow_renames

	@@
	expression E;
	@@
	- E.RENAME_EMPTY
	+ E.rename_empty

	@@
	expression E;
	@@
	- E.HAS_CHANGES
	+ E.has_changes

	@@
	expression E;
	@@
	- E.QUICK
	+ E.quick

	@@
	expression E;
	@@
	- E.NO_INDEX
	+ E.no_index

	@@
	expression E;
	@@
	- E.ALLOW_EXTERNAL
	+ E.allow_external

	@@
	expression E;
	@@
	- E.EXIT_WITH_STATUS
	+ E.exit_with_status

	@@
	expression E;
	@@
	- E.REVERSE_DIFF
	+ E.reverse_diff

	@@
	expression E;
	@@
	- E.CHECK_FAILED
	+ E.check_failed

	@@
	expression E;
	@@
	- E.RELATIVE_NAME
	+ E.relative_name

	@@
	expression E;
	@@
	- E.IGNORE_SUBMODULES
	+ E.ignore_submodules

	@@
	expression E;
	@@
	- E.DIRSTAT_CUMULATIVE
	+ E.dirstat_cumulative

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_FILE
	+ E.dirstat_by_file

	@@
	expression E;
	@@
	- E.ALLOW_TEXTCONV
	+ E.allow_textconv

	@@
	expression E;
	@@
	- E.TEXTCONV_SET_VIA_CMDLINE
	+ E.textconv_set_via_cmdline

	@@
	expression E;
	@@
	- E.DIFF_FROM_CONTENTS
	+ E.diff_from_contents

	@@
	expression E;
	@@
	- E.DIRTY_SUBMODULES
	+ E.dirty_submodules

	@@
	expression E;
	@@
	- E.IGNORE_UNTRACKED_IN_SUBMODULES
	+ E.ignore_untracked_in_submodules

	@@
	expression E;
	@@
	- E.IGNORE_DIRTY_SUBMODULES
	+ E.ignore_dirty_submodules

	@@
	expression E;
	@@
	- E.OVERRIDE_SUBMODULE_CONFIG
	+ E.override_submodule_config

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_LINE
	+ E.dirstat_by_line

	@@
	expression E;
	@@
	- E.FUNCCONTEXT
	+ E.funccontext

	@@
	expression E;
	@@
	- E.PICKAXE_IGNORE_CASE
	+ E.pickaxe_ignore_case

	@@
	expression E;
	@@
	- E.DEFAULT_FOLLOW_RENAMES
	+ E.default_follow_renames

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:51:40 +09:00
Brandon Williams b2100e5291 diff: remove DIFF_OPT_CLR macro
Remove the `DIFF_OPT_CLR` macro and instead set the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_CLR(&E, fld)
	+ E.flags.fld = 0

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_CLR(ptr, fld)
	+ ptr->flags.fld = 0

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:51:30 +09:00
Brandon Williams 23dcf77f48 diff: remove DIFF_OPT_SET macro
Remove the `DIFF_OPT_SET` macro and instead set the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_SET(&E, fld)
	+ E.flags.fld = 1

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_SET(ptr, fld)
	+ ptr->flags.fld = 1

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
Brandon Williams 3b69daed86 diff: remove DIFF_OPT_TST macro
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_TST(&E, fld)
	+ E.flags.fld

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_TST(ptr, fld)
	+ ptr->flags.fld

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
Brandon Williams 25567af805 diff: remove touched flags
Now that the set of parallel touched flags are no longer being used,
remove them.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:02 +09:00
Brandon Williams afa73c5384 diff: add flag to indicate textconv was set via cmdline
git-show is unique in that it wants to use textconv by default except
for when it is showing blobs.  When asked to show a blob, show doesn't
want to use textconv unless the user explicitly requested that it be
used by providing the command line flag '--textconv'.

Currently this is done by using a parallel set of 'touched' flags which
get set every time a particular flag is set or cleared.  In a future
patch we want to eliminate this parallel set of flags so instead of
relying on if the textconv flag has been touched, add a new flag
'TEXTCONV_SET_VIA_CMDLINE' which is only set if textconv is set to true
via the command line.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:02 +09:00
Brandon Williams 02f2f56bc3 diff: convert flags to be stored in bitfields
We cannot add many more flags to the diff machinery due to the
limitations of the number of flags that can be stored in a single
unsigned int.  In order to allow for more flags to be added to the diff
machinery in the future this patch converts the flags to be stored in
bitfields in 'struct diff_flags'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:02 +09:00
Jeff King a937b37e76 revision: quit pruning diff more quickly when possible
When the revision traversal machinery is given a pathspec,
we must compute the parent-diff for each commit to determine
which ones are TREESAME. We set the QUICK diff flag to avoid
looking at more entries than we need; we really just care
whether there are any changes at all.

But there is one case where we want to know a bit more: if
--remove-empty is set, we care about finding cases where the
change consists only of added entries (in which case we may
prune the parent in try_to_simplify_commit()). To cover that
case, our file_add_remove() callback does not quit the diff
upon seeing an added entry; it keeps looking for other types
of entries.

But this means when --remove-empty is not set (and it is not
by default), we compute more of the diff than is necessary.
You can see this in a pathological case where a commit adds
a very large number of entries, and we limit based on a
broad pathspec. E.g.:

  perl -e '
    chomp(my $blob = `git hash-object -w --stdin </dev/null`);
    for my $a (1..1000) {
      for my $b (1..1000) {
        print "100644 $blob\t$a/$b\n";
      }
    }
  ' | git update-index --index-info
  git commit -qm add

  git rev-list HEAD -- .

This case takes about 100ms now, but after this patch only
needs 6ms. That's not a huge improvement, but it's easy to
get and it protects us against even more pathological cases
(e.g., going from 1 million to 10 million files would take
ten times as long with the current code, but not increase at
all after this patch).

This is reported to minorly speed-up pathspec limiting in
real world repositories (like the 100-million-file Windows
repository), but probably won't make a noticeable difference
outside of pathological setups.

This patch actually covers the case without --remove-empty,
and the case where we see only deletions. See the in-code
comment for details.

Note that we have to add a new member to the diff_options
struct so that our callback can see the value of
revs->remove_empty_trees. This callback parameter could be
passed to the "add_remove" and "change" callbacks, but
there's not much point. They already receive the
diff_options struct, and doing it this way avoids having to
update the function signature of the other callbacks
(arguably the format_callback and output_prefix functions
could benefit from the same simplification).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-14 11:43:49 +09:00
Jonathan Tan f0b8fb6e59 diff: define block by number of alphanumeric chars
The existing behavior of diff --color-moved=zebra does not define the
minimum size of a block at all, instead relying on a heuristic applied
later to filter out sets of adjacent moved lines that are shorter than 3
lines long. This can be confusing, because a block could thus be colored
as moved at the source but not at the destination (or vice versa),
depending on its neighbors.

Instead, teach diff that the minimum size of a block is 20 alphanumeric
characters, the same heuristic used by "git blame". This allows diff to
still exclude uninteresting lines appearing on their own (such as those
solely consisting of one or a few closing braces), as was the intention
of the adjacent-moved-line heuristic.

This requires a change in some tests in that some of their lines are no
longer considered to be part of a block, because they are too short.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-16 11:44:00 -07:00
Stefan Beller 86b452e276 diff.c: add dimming to moved line detection
Any lines inside a moved block of code are not interesting. Boundaries
of blocks are only interesting if they are next to another block of moved
code.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:59:42 -07:00
Stefan Beller 176841f0c9 diff.c: color moved lines differently, plain mode
Add the 'plain' mode for move detection of code. This omits the checking
for adjacent blocks, so it is not as useful. If you have a lot of the
same blocks moved in the same patch, the 'Zebra' would end up slow as it
is O(n^2) (n is number of same blocks). So this may be useful there and
is generally easy to add. Instead be very literal at the move detection,
do not skip over short blocks here.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:59:42 -07:00