This replaces the greedy implementation to coalesce lost lines by using
dynamic programming to find the Longest Common Subsequence.
The O(n²) time complexity is obviously bigger than previous
implementation but it can produce shorter diff results (and most likely
easier to read).
List of lost lines is now doubly-linked because we reverse-read it when
reading the direction matrix.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combined diff --cc output does not honor options to ignore
whitespace changes (-b, -w, and --ignore-space-at-eol).
Correct this by passing diff flags to diff engine, so that combined
diff behaves as normal diff does with spaces, and by coalescing
lines that are removed from both (or more) parents, honoring the
same rule to ignore whitespace changes.
With this change, a conflict-less merge done using a ignore-*
strategy option will not show any conflict if shown in combined-diff
using the same option.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "git log --graph --cc -p" the diff output for merges is not
indented by the graph structure, unlike the diffs of non-merge commits
(added in commit 7be5761 - diff.c: Output the text graph padding before
each diff line).
Fix this by teaching the combined diff code to output diff_line_prefix()
before each line.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "raw" format of combine-diff output is supposed to have as many
colons as there are parents at the beginning, then blob modes for
these parents, and then object names for these parents.
We weren't however prepared to handle a more than 32-way merge and
did not show the correct number of colons in such a case.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If both la and context are zero at the start of the loop, la wraps around
and we end up reading from memory far away. Skip the loop in that case
instead.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of passing the hash of a commit and then searching that
same commit in the single caller, simply pass the commit directly.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintaining an array of hashes is easier using sha1_array than
open-coding it. This patch also fixes a leak of the SHA1 array
in diff_tree_combined_merge().
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches combine-diff machinery to feed a combined merge to a callback
function when DIFF_FORMAT_CALLBACK is specified.
So far, format callback functions are not used for anything but 2-way
diffs. A callback is given a diff_queue_struct, which is an array of
diff_filepair. As its name suggests, a diff_filepair is a _pair_ of
diff_filespec that represents a single preimage and a single postimage.
Since "diff -c" is to compare N parents with a single merge result and
filter out any paths whose result match one (or more) of the parent(s),
its output has to be able to represent N preimages and 1 postimage. For
this reason, a callback function that inspects a diff_filepair that
results from this new infrastructure can and is expected to view the
preimage side (i.e. pair->one) as an array of diff_filespec. Each element
in the array, except for the last one, is marked with "has_more_entries"
bit, so that the same callback function can be used for 2-way diffs and
combined diffs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This lets us store more than just a bit flag for whether we
want color; we can also store whether we want automatic
colors. This can be useful for making the automatic-color
decision closer to the point of use.
This mostly just involves replacing DIFF_OPT_* calls with
manipulations of the flag. The biggest exception is that
calls to DIFF_OPT_TST must check for "o->use_color > 0",
which lets an "unknown" value (i.e., the default) stay at
"no color". In the previous code, a value of "-1" was not
propagated at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combined diff machinery can be used to compare:
- a merge commit with its parent commits;
- a working-tree file with multiple stages in an unmerged index; or
- a working-tree file with the HEAD and the index.
The internal function combine-diff.c:show_patch_diff() checked if it needs
to read the "result" from the working tree by looking at the object name
of the result --- if it is null_sha1, it read from the working tree.
This mistook a merge that records a deletion as the conflict resolution
as if it is a cue to read from the working tree. Pass this information
explicitly from the caller instead.
Noticed and reported by Johan Herland.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing a combined diff, we did not respect textconv attributes at
all. This generally lead to us printing "Binary files differ" when we
could show a combined diff of the converted text.
This patch converts file contents according to textconv attributes. The
implementation is slightly ugly; because the textconv code is tightly
linked with the diff_filespec code, we temporarily create a diff_filespec
during conversion. In practice, though, this should not create a
performance problem.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combined diff code path is totally different from the
regular diff code path, and didn't handle binary files at
all. The results of a combined diff on a binary file could
range from annoying (since we spewed binary garbage,
possibly upsetting the user's terminal), to wrong (embedded
NULs caused us to show incorrect diffs, with lines truncated
at the NUL character), to potential security problems
(embedded NULs could interfere with "-z" output, possibly
defeating policy hooks which parse diff output).
Instead, we consider a combined diff to be binary if any of
the input blobs is binary. To show a binary combined diff,
we indicate "Binary blobs differ"; the "index" meta line
will show which parents had which blob.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One loop combined both the patch generation and checking
whether there was any mode change to report. Let's factor
that into two separate loops, as we may care about the mode
change even if we are not generating patches (e.g., because
we are showing a binary diff, which will come in a future
patch).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a pretty big logical chunk, so it makes the function
a bit more readable to have it split out. In addition, it
will make it easier to add an alternate code path for binary
diffs in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdi_diff_outf() overrides the structure members of its last parameter,
ignoring any value that callers pass in. It's no surprise then that all
callers pass a pointer to an uninitialized structure. They also don't
read it after the call, so the parameter is neither used for input nor
for output. Turn it into a local variable of xdi_diff_outf().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since the xdiff library had been introduced to git, all its callers
have used the flag XDF_NEED_MINIMAL. It makes sure that the smallest
possible diff is produced, but that takes quite some time if there are
lots of differences that can be expressed in multiple ways.
This flag makes a difference for only 0.1% of the non-merge commits in
the git repo of Linux, both in terms of diff size and execution time.
The patches there are mostly nice and small.
SungHyun Nam however reported a case in a different repo where a diff
took more than 20 times longer to generate with XDF_NEED_MINIMAL than
without. Rebasing became really slow.
This patch removes this flag from all callers. The default of xdiff is
saner because it has minimal to no impact in the normal case of small
diffs and doesn't incur that much of a speed penalty for large ones.
A follow-up patch may introduce a command line option to set the flag if
the user needs it, similar to GNU diff's -d/--minimal.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Consider an evil merge of two commits A and B, both of which have a
file 'foo', but the merge result does not have that file.
The combined-diff code learned in 4462731 (combine-diff: do not punt
on removed or added files., 2006-02-06) to concisely show only the
removal, since that is the evil part and the previous contents are
presumably uninteresting.
However, to diagnose an empty merge result, it overloaded the variable
that holds the file's length. This means that the check also triggers
for truncated files. Consequently, such files were not shown in the
diff at all despite the merge being clearly evil.
Fix this by adding a new variable that distinguishes whether the file
was deleted (which is the case 4462731 handled) or truncated. In the
truncated case, we show the full combined diff again, which is rather
spammy but at least does not hide the evilness.
Reported-by: David Martínez Martí <desarrollo@gestiweb.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inspired by the coloring of quilt.
Introduce a separate color and paint the hunk comment part, i.e. the name
of the function, in a separate color "diff.func" (defaults to plain).
Whitespace between hunk header and hunk comment is printed in plain color.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When combine-diff inspected the diff from one parent to the merge result,
it misinterpreted a header in the form @@ -l,k +0,0 @@.
This hunk header means that K lines were removed from the beginning of the
file, so the lost lines must be queued to the sline that represents the
first line of the merge result, but we incremented our pointer incorrectly
and ended up queuing it to the second line, which in turn made the lossage
appear _after_ the first line.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a deleted line in a patch with the parent we are looking at, the
append_lost() function finds the same line among a run of lines that were
deleted from the same location by patches from parents we previously
checked. This is so that patches with two parents
@@ -1,4 +1,3 @@ @@ -1,4 +1,3 @@
one one
-two -two
three three
-quatro -fyra
+four +four
can be coalesced into this sequence, reusing one line that describes the
removal of "two" for both parents.
@@@ -1,4 -1,4 +1,3 @@@
one
--two
three
- quatro
-frya
++four
While reading the second patch (that removes "two" and then "fyra"), after
finding where removal of the "two" matches, we need to find existing
removal of "fyra" (if exists) in the removal list, but the match has to
happen after all the existing matches (in this case "two"). The code used
a naïve O(n^2) algorithm to compute this by scanning the whole removal
list over and over again.
This patch remembers where the next scan should be started in the existing
removal list to avoid this.
Noticed by Linus Torvalds.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lots of die() calls did not actually report the kind of error, which
can leave the user confused as to the real problem. Use die_errno()
where we check a system/library call that sets errno on failure, or
one of the following that wrap such calls:
Function Passes on error from
-------- --------------------
odb_pack_keep open
read_ancestry fopen
read_in_full xread
strbuf_read xread
strbuf_read_file open or strbuf_read_file
strbuf_readlink readlink
write_in_full xwrite
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Essentially; s/type* /type */ as per the coding guidelines.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combine diff logic knew only about blobs (and their checked-out form
in the work tree, either regular files or symlinks), and barfed when fed
submodules. This "externalizes" gitlinks in the same way as the normal
patch generation codepath does (i.e. "Subproject commit Xxx\n") to fix the
issue.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These weren't used outside and can be safely moved
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently inside show_patch_diff() we have an fstat() call after an
ok lstat() call. Since before the call to fstat() we have already
tested for the link case with S_ISLNK(), the fstat() can be removed.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When showing combined diff using work tree contents, use strbuf_readlink()
to read symbolic links.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
We're going to be adding some parameters to this, so we can't have
any uninitialized data in it.
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
With a new configuration "diff.mnemonicprefix", "git diff" shows the
differences between various combinations of preimage and postimage trees
with prefixes different from the standard "a/" and "b/". Hopefully this
will make the distinction stand out for some people.
"git diff" compares the (i)ndex and the (w)ork tree;
"git diff HEAD" compares a (c)ommit and the (w)ork tree;
"git diff --cached" compares a (c)ommit and the (i)ndex;
"git-diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity;
"git diff --no-index a b" compares two non-git things (1) and (2).
Because these mnemonics now have meanings, they are swapped when reverse
diff is in effect and this feature is enabled.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the tracked contents have CRLF line endings, colored diff output
shows "^M" at the end of output lines, which is distracting, even though
the pager we use by default ("less") knows to hide them.
The problem is that "less" hides a carriage-return only at the end of the
line, immediately before a line feed. The colored diff output does not
take this into account, and emits four element sequence for each line:
- force this color;
- the line up to but not including the terminating line feed;
- reset color
- line feed.
By including the carriage return at the end of the line in the second
item, we are breaking the smart our pager has in order not to show "^M".
This can be fixed by changing the sequence to:
- force this color;
- the line up to but not including the terminating end-of-line;
- reset color
- end-of-line.
where end-of-line is either a single linefeed or a CRLF pair. When the
output is not colored, "force this color" and "reset color" sequences are
both empty, so we won't have this problem with or without this patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix git-diff to make it produce useful 3-way diffs for merge conflicts in
repositories with autocrlf enabled. Otherwise it always reports that the
whole file was changed, because it uses the contents from the working tree
without necessary conversion.
Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This further enhances xdi_diff_outf() interface so that it takes two
common parameters: the callback function that processes one line at a
time, and a pointer to its application specific callback data structure.
xdi_diff_outf() creates its own "xdiff_emit_state" structure and stashes
these two away inside it, which is used by the lowest level output
function in the xdiff_outf() callchain, consume_one(), to call back to the
application layer. With this restructuring, we lift the requirement that
the caller supplied callback data structure embeds xdiff_emit_state
structure as its first member.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare for the need to initialize and release resources for an
xdi_diff with the xdiff_outf output function, make a new function to
wrap this usage.
Old:
ecb.outf = xdiff_outf;
ecb.priv = &state;
...
xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
New:
xdi_diff_outf(file_p, file_o, &state.xm, &xpp, &xecfg, &ecb);
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we include a few uninteresting lines before the interesting ones as
context, we are only interested in seeing the surviving lines themselves
and not the deleted lines that are before them. Mark the added leading
context lines in give_context() and not show deleted lines form them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These variables were made unnecessary by commit
3969cf7db1.
Signed-off-by: Adam Simpkins <adam@adamsimpkins.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The resulting data is zero terminated after the read loop, but
the subsequent loop that scans for '\n' will overrun the buffer.
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the logic to quote two paths (prefix + path) in
C-style introduced in the previous commit from the
dump_quoted_path() in combine-diff.c to quote.c, and uses it to
fix rewrite_diff() that never C-quoted the pathnames correctly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier when showing combined diff, the filenames on the ---/+++
header lines were quoted incorrectly. a/ (or b/) prefix was
output literally and then the path was output, with c-quoting.
This fixes the quoting logic, and while at it, adjusts the code
to use the customizable prefix (a_prefix and b_prefix)
introduced recently.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This inserts a new function xdi_diff() that currently does not
do anything other than calling the underlying xdl_diff() to the
callchain of current callers of xdl_diff() function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
reverse_diff was a bit-value in disguise, it's merged in the flags now.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* quote_c_style works on a strbuf instead of a wild buffer.
* quote_c_style is now clever enough to not add double quotes if not needed.
* write_name_quoted inherits those advantages, but also take a different
set of arguments. Now instead of asking for quotes or not, you pass a
"terminator". If it's \0 then we assume you don't want to escape, else C
escaping is performed. In any case, the terminator is also appended to the
stream. It also no longer takes the prefix/prefix_len arguments, as it's
seldomly used, and makes some optimizations harder.
* write_name_quotedpfx is created to work like write_name_quoted and take
the prefix/prefix_len arguments.
Thanks to those API changes, diff.c has somehow lost weight, thanks to the
removal of functions that were wrappers around the old write_name_quoted
trying to give it a semantics like the new one, but performing a lot of
allocations for this goal. Now we always write directly to the stream, no
intermediate allocation is performed.
As a side effect of the refactor in builtin-apply.c, the length of the bar
graphs in diffstats are not affected anymore by the fact that the path was
clipped.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
The instances of xdemitconf_t were initialized member by member.
Instead, initialize them to all zero, so we do not have
to update those places each time we introduce a new member.
[jc: minimally fixed by getting rid of a new global]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes all calls to xread() where the return value is not
stored into an ssize_t. The patch should not have any effect whatsoever,
other than putting better/more appropriate type names on variables.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This enhances the attributes mechanism so that external programs
meant for existing GIT_EXTERNAL_DIFF interface can be specifed
per path.
To configure such a custom diff driver, first define a custom
diff driver in the configuration:
[diff "my-c-diff"]
command = <<your command string comes here>>
Then mark the paths that you want to use this custom driver
using the attribute mechanism.
*.c diff=my-c-diff
The intent of this separation is that the attribute mechanism is
used for specifying the type of the contents, while the
configuration mechanism is used to define what needs to be done
to that type of the contents, which would be specific to both
platform and personal taste.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some systems have sizeof(off_t) == 8 while sizeof(size_t) == 4.
This implies that we are able to access and work on files whose
maximum length is around 2^63-1 bytes, but we can only malloc or
mmap somewhat less than 2^32-1 bytes of memory.
On such a system an implicit conversion of off_t to size_t can cause
the size_t to wrap, resulting in unexpected and exciting behavior.
Right now we are working around all gcc warnings generated by the
-Wshorten-64-to-32 option by passing the off_t through xsize_t().
In the future we should make xsize_t on such problematic platforms
detect the wrapping and die if such a file is accessed.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When core.symlinks is false, and a merge of symbolic links had conflicts,
the merge result is left as a file in the working directory. A decision
must be made whether the file is treated as a regular file or as a
symbolic link. This patch treats the file as a symbolic link only if
all merge parents were also symbolic links.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Few of us use git to compare or even version-control 2GB files,
but when we do, we'll want it to work.
Reading a recent patch, I noticed two lines like this:
int len = st.st_size;
Instead of "int", that should be "size_t". Otherwise, in the
non-symlink case, with 64-bit size_t, if the file's size is 2GB,
the following xmalloc will fail:
result = xmalloc(len + 1);
trying to allocate 2^64 - 2^31 + 1 bytes (assuming sign-extension
in the int-to-size_t promotion). And even if it didn't fail, the
subsequent "result[len] = 0;" would be equivalent to an unpleasant
"result[-2147483648] = 0;"
The other nearby "int"-declared size variable, sz, should also be of
type size_t, for the same reason. If sz ever wraps around and becomes
negative, xread will corrupt memory _before_ the "result" buffer.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>