A new diffcore transformation, diffcore-break.c, is introduced.
When the -B flag is given, a patch that represents a complete
rewrite is broken into a deletion followed by a creation. This
makes it easier to review such a complete rewrite patch.
The -B flag takes the same syntax as the -M and -C flags to
specify the minimum amount of non-source material the resulting
file needs to have to be considered a complete rewrite, and
defaults to 99% if not specified.
As the new test t4008-diff-break-rewrite.sh demonstrates, if a
file is a complete rewrite, it is broken into a delete/create
pair, which can further be subjected to the usual rename
detection if -M or -C is used. For example, if file0 gets
completely rewritten to make it as if it were rather based on
file1 which itself disappeared, the following happens:
The original change looks like this:
file0 --> file0' (quite different from file0)
file1 --> /dev/null
After diffcore-break runs, it would become this:
file0 --> /dev/null
/dev/null --> file0'
file1 --> /dev/null
Then diffcore-rename matches them up:
file1 --> file0'
The internal score values are finer grained now. Earlier
maximum of 10000 has been raised to 60000; there is no user
visible changes but there is no reason to waste available bits.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The three diff-* brothers had a sequence of calls into diffcore
that were almost identical. Introduce a new diffcore_std()
function that takes all the necessary arguments to consolidate
it. This will make later enhancements and changing the order of
diffcore application simpler.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This attempts to optimize "diff-tree -[CM] --stdin", which
compares successible tree pairs. This optimization does not
make much sense for other commands in the diff-* brothers.
When reading from --stdin and using rename/copy detection, the
patch makes diff-tree to read the current index file first.
This is done to reuse the optimization used by diff-cache in the
non-cached case. Similarity estimator can avoid expanding a
blob if the index says what is in the work tree has an exact
copy of that blob already expanded.
Another optimization the patch makes is to check only file sizes
first to terminate similarity estimation early. In order for
this to work, it needs a way to tell the size of the blob
without expanding it. Since an obvious way of doing it, which
is to keep all the blobs previously used in the memory, is too
costly, it does so by keeping the filesize for each object it
has already seen in memory.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When --pickaxe-all is given in addition to -S, pickaxe shows the
entire diffs contained in the changeset, not just the diffs for
the filepair that touched the sought-after string. This is
useful to see the changes in context.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the argument of diff_setup() from an integer that
says if we are feeding reversed diff to a bitmask, so that later
global options can be added more easily.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need to make them multiple lines, in fact we explicitly don't want that.
This also fixes a 64-bit problem pointed out by Markus F.X.J. Oberhumer,
where we gave "%.*s" a "ptrdiff_t" length argument instead of an "int".
diff-tree does the culling of uninteresting paths internally, and
fundamentally has to do so for performance reasons. So there's no
point in calling the separate pathname culling logic here,
especially as it seems slightly broken.
This adds a "-t" flag to tell the raw diff output to include the tree
objects in the output when doing a recursive diff.
Since that's how the non-recursive output already handles trees and the
flag thus doesn't make sense without "-r", I made "-t" imply "-r".
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of checking silent flag all over the place, simply use
the NO_OUTPUT option diffcore provides to suppress the diff
output.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For later stages to reorder patches, pruning logic and rename detection
logic should not decide which delete to discard (because another entry
said it will take over the file as a rename) until the very end.
Also fix some tests that were assuming the earlier "last one is rename
or keep everything else is copy" semantics of diff-raw format, which no
longer is true.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes the diff-raw format again, following the mailing
list discussion. The new format explicitly expresses which one
is a rename and which one is a copy.
The documentation and tests are updated to match this change.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Thomas Glanzmann noticed that diff-tree -z HEAD piped to
diff-helper -z did not work. Since diff-helper -z expects NUL
terminated lines, we should generate such.
The output side of the diff-helper should always be using '\n'
termination; earlier it used the same line_termination used for
the input side, which was a mistake.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This moves the path selection logic from individual programs to a new
diffcore transformer (diff-tree still needs to have its own for
performance reasons). Also the header printing code in diff-tree was
tweaked not to produce anything when pickaxe is in effect and there is
nothing interesting to report. An interesting example is the following
in the GIT archive itself:
$ git-whatchanged -p -C -S'or something in a real script'
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update the diff-raw format as Linus and I discussed, except that
it does not use sequence of underscore '_' letters to express
nonexistence. All '0' mode is used for that purpose instead.
The new diff-raw format can express rename/copy, and the earlier
restriction that -M and -C _must_ be used with the patch format
output is no longer necessary. The patch makes -M and -C flags
independent of -p flag, so you need to say git-whatchanged -M -p
to get the diff/patch format.
Updated are both documentations and tests.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This does not actually supress the extra headers when pickaxe is
used, but prepares enough support for diff-tree to implement it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the commit explanation buffer larger, and make sure that if
we truncate it, we put a "..." marker there to visually tell people
about the truncation (tested with a much smaller buffer to make
sure it looks sane).
Also make sure that the explanation is properly line-terminated,
and add an extra newline iff we have a diff.
This steals the "pickaxe" feature from JIT and make it available
to the bare Plumbing layer. From the command line, the user
gives a string he is intersted in.
Using the diff-core infrastructure previously introduced, it
filters the differences to limit the output only to the diffs
between <src> and <dst> where the string appears only in one but
not in the other. For example:
$ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M
would show the diffs that touch the string "diff-tree-helper".
In real software-archaeologist application, you would typically
look for a few to several lines of code and see where that code
came from.
The "pickaxe" module runs after "rename/copy detection" module,
so it even crosses the file rename boundary, as the above
example demonstrates.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces the diff-core, the layer between the diff-tree
family and the external diff interface engine. The calls to the
interface diff-tree family uses (diff_change and diff_addremove)
have not changed and will not change. The purpose of the
diff-core layer is to provide an infrastructure to transform the
set of differences sent from the applications, before sending
them to the external diff interface.
The recently introduced rename detection code has been rewritten
to use the diff-core facility. When applications send in
separate creates and deletes, matching ones are transformed into
a single rename-and-edit diff, and sent out to the external diff
interface as such.
This patch also enhances the rename detection code further to be
able to detect copies. Currently this happens only as long as
copy sources appear as part of the modified files, but there
already is enough provision for callers to report unmodified
files to diff-core, so that they can be also used as copy source
candidates. Extending the callers this way will be done in a
separate patch.
Please see and marvel at how well this works by trying out the
newly added t/t4003-diff-rename-1.sh test script.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various things that sparse complains about:
- use NULL instead of 0
- make sure we declare everything properly, or mark it static
- use proper function declarations ("fn(void)" instead of "fn()")
Sparse is always right.
Add '-R' flag to diff-tree, and change the test subdirectory
shell files to be executable (something that Junio couldn't
get me to do through the pure patch with my current patch
handling infrastructure).
This cleans up the way calls are made into the diff core from diff-tree
family and diff-helper. Earlier, these programs had "if
(generating_patch)" sprinkled all over the place, but those ugliness are
gone and handled uniformly from the diff core, even when not generating
patch format.
This also allowed diff-cache and diff-files to acquire -R
(reverse) option to generate diff in reverse. Users of
diff-tree can swap two trees easily so I did not add -R there.
[ Linus' note: I'll add -R to "diff-tree" too, since a "commit
diff" doesn't have another tree to switch around: the other
tree is always the parent(s) of the commit ]
Also -M<digits-as-mantissa> suggestion made by Linus has been
implemented.
Documentation updates are also included.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes all in-code names that leaved during "big name change".
Signed-off-by: Alexey Nezhdanov <snake@penza-gsm.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rips out the rename detection engine from diff-helper and moves it
to the diff core, and updates the internal calling convention used by
diff-tree family into the diff core. In order to give the same option
name to diff-tree family as well as to diff-helper, I've changed the
earlier diff-helper '-r' option to '-M' (stands for Move; sorry but the
natural abbreviation 'r' for 'rename' is already taken for 'recursive').
Although I did a fair amount of test with the git-diff-tree with
existing rename commits in the core GIT repository, this should still be
considered beta (preview) release. This patch depends on the diff-delta
infrastructure just committed.
This implements almost everything I wanted to see in this series of
patch, except a few minor cleanups in the calling convention into diff
core, but that will be a separate cleanup patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This normally doesn't matter, but if you have a filename that is
sometimes a directory and sometimes a regular file (or symlink),
we don't want the regular file case to trigger a "partial match".
We used to trigger the "interesting subdirectory" check for any
matching name that started with the same character series, regardless
of whether it had the matching slash or not.
We can't just do the "sha1_to_hex()" thing directly, since the
buffer in question will be overwritten by the name of the parent.
So teach diff_tree_commit() to generate the proper hex name itself.
We use "--" to mark end of command line switches, not "-". Also,
allow more flexibility in the passed-in sha1 names, in that a
single sha1 uses the "commit-diff" logic that compares against
its parent(s).
This updates the usage message string and Documentation/core-git.txt
to describe the new flags added to the git-diff-tree command.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows you to trivially do fancy and readable output. Something like
git-rev-list HEAD | git-diff-tree -p -v --stdin kernel/ | less -S
gives a nice output of what has changed in the kernel/ subdirectory lately.
This only shows the tree headers when something actually changed. Also,
add a "silent" mode, which doesn't actually show the changes at all,
just the commit information.
This means that you can do
git-rev-list HEAD --max-count=10 | git-diff-tree --stdin update-cache.c
to see which (if any) of the last ten commits changed update-cache.c.
Use the "-m" flag to see merges too. Normally they are suppressed.
Mention the '-p' option in the usage help string of git-diff-tree.
Signed-Off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-Off-by: Linus Torvalds <torvalds@osdl.org>
This allows the programs to use various simplified versions of
the SHA1 names, eg just say "HEAD" for the SHA1 pointed to by
the .git/HEAD file etc.
For example, this commit has been done with
git-commit-tree $(git-write-tree) -p HEAD
instead of the traditional "$(cat .git/HEAD)" syntax.
This patch renames read_tree_with_tree_or_commit_sha1() to
read_object_with_reference() and extends it to automatically
dereference not just "commit" objects but "tag" objects. With
this patch, you can say e.g.:
ls-tree $tag
read-tree -m $(merge-base $tag $HEAD) $tag $HEAD
diff-cache $tag
diff-tree $tag $HEAD
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes diff-tree -p imply recursive behaviour.
Other commands in the family always takes a flat universe view
so this is not even needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This uses the reworked diff interface to generate patches directly out
of diff-tree when -p is specified.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce xmalloc and xrealloc to die gracefully with a descriptive
message when out of memory, rather than taking a SIGSEGV.
Signed-off-by: Christopher Li<chrislgit@chrisli.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is based on a patch by David Woodhouse, but with the selection
tests much simplified and streamlined.
It makes diff-tree take extra arguments, specifying the files or
directories which should be considered "interesting". Changes in
uninteresting directories are not reported.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Usage string fixes to make maintenance easier (only one instance
of a string to update not multiple copies). I've spotted and
corrected inconsistent usage text in diff-tree while doing this.
Also diff-cache and read-tree usage text have been corrected to
match their up-to-date features. Earlier, neither "--cached"
form of diff-cache nor "-m single-merge" form of read-tree were
described.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Updates diff-tree.c to use read_tree_with_tree_or_commit_sha1()
function. The command can take either tree or commit IDs with this patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sometimes it's just easier to not have to look up the "commit"->"tree"
translation by hand first. It's trivial to do inside diff-tree, and
it's just being polite.