|
|
|
/*
|
|
|
|
* Helper functions for tree diff generation
|
|
|
|
*/
|
|
|
|
#include "cache.h"
|
|
|
|
#include "diff.h"
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
#include "diffcore.h"
|
|
|
|
#include "tree.h"
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Compare two tree entries, taking into account only path/S_ISDIR(mode),
|
|
|
|
* but not their sha1's.
|
|
|
|
*
|
|
|
|
* NOTE files and directories *always* compare differently, even when having
|
|
|
|
* the same name - thanks to base_name_compare().
|
tree-diff: remove special-case diff-emitting code for empty-tree cases
While walking trees, we iterate their entries from lowest to highest in
sort order, so empty tree means all entries were already went over.
If we artificially assign +infinity value to such tree "entry", it will
go after all usual entries, and through the usual driver loop we will be
taking the same actions, which were hand-coded for special cases, i.e.
t1 empty, t2 non-empty
pathcmp(+∞, t2) -> +1
show_path(/*t1=*/NULL, t2); /* = t1 > t2 case in main loop */
t1 non-empty, t2-empty
pathcmp(t1, +∞) -> -1
show_path(t1, /*t2=*/NULL); /* = t1 < t2 case in main loop */
In other words when we have t1 and t2, we return a sign that tells the
caller to indicate the "earlier" one to be emitted, and by returning the
sign that causes the non-empty side to be emitted, we will automatically
cause the entries from the remaining side to be emitted, without
attempting to touch the empty side at all. We can teach
tree_entry_pathcmp() to pretend that an empty tree has an element that
sorts after anything else to achieve this.
Right now we never go to when compared tree descriptors are both
infinity, as this condition is checked in the loop beginning as
finishing criteria, but will do so in the future, when there will be
several parents iterated simultaneously, and some pair of them would run
to the end.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
*
|
|
|
|
* NOTE empty (=invalid) descriptor(s) take part in comparison as +infty,
|
|
|
|
* so that they sort *after* valid tree entries.
|
|
|
|
*
|
|
|
|
* Due to this convention, if trees are scanned in sorted order, all
|
|
|
|
* non-empty descriptors will be processed first.
|
|
|
|
*/
|
|
|
|
static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
|
|
|
|
{
|
|
|
|
struct name_entry *e1, *e2;
|
|
|
|
int cmp;
|
|
|
|
|
tree-diff: remove special-case diff-emitting code for empty-tree cases
While walking trees, we iterate their entries from lowest to highest in
sort order, so empty tree means all entries were already went over.
If we artificially assign +infinity value to such tree "entry", it will
go after all usual entries, and through the usual driver loop we will be
taking the same actions, which were hand-coded for special cases, i.e.
t1 empty, t2 non-empty
pathcmp(+∞, t2) -> +1
show_path(/*t1=*/NULL, t2); /* = t1 > t2 case in main loop */
t1 non-empty, t2-empty
pathcmp(t1, +∞) -> -1
show_path(t1, /*t2=*/NULL); /* = t1 < t2 case in main loop */
In other words when we have t1 and t2, we return a sign that tells the
caller to indicate the "earlier" one to be emitted, and by returning the
sign that causes the non-empty side to be emitted, we will automatically
cause the entries from the remaining side to be emitted, without
attempting to touch the empty side at all. We can teach
tree_entry_pathcmp() to pretend that an empty tree has an element that
sorts after anything else to achieve this.
Right now we never go to when compared tree descriptors are both
infinity, as this condition is checked in the loop beginning as
finishing criteria, but will do so in the future, when there will be
several parents iterated simultaneously, and some pair of them would run
to the end.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
/* empty descriptors sort after valid tree entries */
|
|
|
|
if (!t1->size)
|
|
|
|
return t2->size ? 1 : 0;
|
|
|
|
else if (!t2->size)
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
e1 = &t1->entry;
|
|
|
|
e2 = &t2->entry;
|
|
|
|
cmp = base_name_compare(e1->path, tree_entry_len(e1), e1->mode,
|
|
|
|
e2->path, tree_entry_len(e2), e2->mode);
|
|
|
|
return cmp;
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/* convert path, t1/t2 -> opt->diff_*() callbacks */
|
|
|
|
static void emit_diff(struct diff_options *opt, struct strbuf *path,
|
|
|
|
struct tree_desc *t1, struct tree_desc *t2)
|
|
|
|
{
|
|
|
|
unsigned int mode1 = t1 ? t1->entry.mode : 0;
|
|
|
|
unsigned int mode2 = t2 ? t2->entry.mode : 0;
|
|
|
|
|
|
|
|
if (mode1 && mode2) {
|
|
|
|
opt->change(opt, mode1, mode2, t1->entry.sha1, t2->entry.sha1,
|
|
|
|
1, 1, path->buf, 0, 0);
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
const unsigned char *sha1;
|
|
|
|
unsigned int mode;
|
|
|
|
int addremove;
|
|
|
|
|
|
|
|
if (mode2) {
|
|
|
|
addremove = '+';
|
|
|
|
sha1 = t2->entry.sha1;
|
|
|
|
mode = mode2;
|
|
|
|
} else {
|
|
|
|
addremove = '-';
|
|
|
|
sha1 = t1->entry.sha1;
|
|
|
|
mode = mode1;
|
|
|
|
}
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
|
|
|
|
opt->add_remove(opt, addremove, mode, sha1, 1, path->buf, 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
|
|
|
|
/* new path should be added to diff
|
|
|
|
*
|
|
|
|
* 3 cases on how/when it should be called and behaves:
|
|
|
|
*
|
|
|
|
* !t1, t2 -> path added, parent lacks it
|
|
|
|
* t1, !t2 -> path removed from parent
|
|
|
|
* t1, t2 -> path modified
|
|
|
|
*/
|
|
|
|
static void show_path(struct strbuf *base, struct diff_options *opt,
|
|
|
|
struct tree_desc *t1, struct tree_desc *t2)
|
|
|
|
{
|
|
|
|
unsigned mode;
|
|
|
|
const char *path;
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
int pathlen;
|
|
|
|
int old_baselen = base->len;
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
int isdir, recurse = 0, emitthis = 1;
|
|
|
|
|
|
|
|
/* at least something has to be valid */
|
|
|
|
assert(t1 || t2);
|
|
|
|
|
|
|
|
if (t2) {
|
|
|
|
/* path present in resulting tree */
|
|
|
|
tree_entry_extract(t2, &path, &mode);
|
|
|
|
pathlen = tree_entry_len(&t2->entry);
|
|
|
|
isdir = S_ISDIR(mode);
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* a path was removed - take path from parent. Also take
|
|
|
|
* mode from parent, to decide on recursion.
|
|
|
|
*/
|
|
|
|
tree_entry_extract(t1, &path, &mode);
|
|
|
|
pathlen = tree_entry_len(&t1->entry);
|
|
|
|
|
|
|
|
isdir = S_ISDIR(mode);
|
|
|
|
mode = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (DIFF_OPT_TST(opt, RECURSIVE) && isdir) {
|
|
|
|
recurse = 1;
|
|
|
|
emitthis = DIFF_OPT_TST(opt, TREE_IN_RECURSIVE);
|
|
|
|
}
|
|
|
|
|
|
|
|
strbuf_add(base, path, pathlen);
|
|
|
|
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
if (emitthis)
|
|
|
|
emit_diff(opt, base, t1, t2);
|
|
|
|
|
|
|
|
if (recurse) {
|
|
|
|
strbuf_addch(base, '/');
|
tree-diff: consolidate code for emitting diffs and recursion in one place
Currently both compare_tree_entry() and show_entry() invoke opt diff
callbacks (opt->add_remove() and opt->change()), and also they both have
code which decides whether to recurse into sub-tree, and whether to emit
a tree as separate entry if DIFF_OPT_TREE_IN_RECURSIVE is set.
I.e. we have code duplication and logic scattered on two places.
Let's consolidate it - all diff emiting code and recurion logic moves
to show_entry, which is now named as show_path, because it shows diff
for a path, based on up to two tree entries, with actual diff emitting
code being kept in new helper emit_diff() for clarity.
What we have as the result, is that compare_tree_entry is now free from
code with logic for diff generation, and also performance is not
affected as timings for
`git log --raw --no-abbrev --no-renames`
for navy.git and `linux.git v3.10..v3.11`, just like in previous patch,
stay the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
diff_tree_sha1(t1 ? t1->entry.sha1 : NULL,
|
|
|
|
t2 ? t2->entry.sha1 : NULL, base->buf, opt);
|
|
|
|
}
|
|
|
|
|
|
|
|
strbuf_setlen(base, old_baselen);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void skip_uninteresting(struct tree_desc *t, struct strbuf *base,
|
|
|
|
struct diff_options *opt)
|
Set up for better tree diff optimizations
This is mainly just a cleanup patch, and sets up for later changes where
the tree-diff.c "interesting()" function can return more than just a
yes/no value.
In particular, it should be quite possible to say "no subsequent entries
in this tree can possibly be interesting any more", and thus allow the
callers to short-circuit the tree entirely.
In fact, changing the callers to do so is trivial, and is really all this
patch really does, because changing "interesting()" itself to say that
nothing further is going to be interesting is definitely more complicated,
considering that we may have arbitrary pathspecs.
But in cleaning up the callers, this actually fixes a potential small
performance issue in diff_tree(): if the second tree has a lot of
uninterestign crud in it, we would keep on doing the "is it interesting?"
check on the first tree for each uninteresting entry in the second one.
The answer is obviously not going to change, so that was just not helping.
The new code is clearer and simpler and avoids this issue entirely.
I also renamed "interesting()" to "tree_entry_interesting()", because I
got frustrated by the fact that
- we actually had *another* function called "interesting()" in another
file, and I couldn't tell from the profiles which one was the one that
mattered more.
- when rewriting it to return a ternary value, you can't just do
if (interesting(...))
...
any more, but want to assign the return value to a local variable. The
name of choice for that variable would normally be "interesting", so
I just wanted to make the function name be more specific, and avoid
that whole issue (even though I then didn't choose that name for either
of the users, just to avoid confusion in the patch itself ;)
In other words, this doesn't really change anything, but I think it's a
good thing to do, and if somebody comes along and writes the logic for
"yeah, none of the pathspecs you have are interesting", we now support
that trivially.
It could easily be a meaningful optimization for things like "blame",
where there's just one pathspec, and stopping when you've seen it would
allow you to avoid about 50% of the tree traversals on average.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
18 years ago
|
|
|
{
|
|
|
|
enum interesting match;
|
|
|
|
|
Set up for better tree diff optimizations
This is mainly just a cleanup patch, and sets up for later changes where
the tree-diff.c "interesting()" function can return more than just a
yes/no value.
In particular, it should be quite possible to say "no subsequent entries
in this tree can possibly be interesting any more", and thus allow the
callers to short-circuit the tree entirely.
In fact, changing the callers to do so is trivial, and is really all this
patch really does, because changing "interesting()" itself to say that
nothing further is going to be interesting is definitely more complicated,
considering that we may have arbitrary pathspecs.
But in cleaning up the callers, this actually fixes a potential small
performance issue in diff_tree(): if the second tree has a lot of
uninterestign crud in it, we would keep on doing the "is it interesting?"
check on the first tree for each uninteresting entry in the second one.
The answer is obviously not going to change, so that was just not helping.
The new code is clearer and simpler and avoids this issue entirely.
I also renamed "interesting()" to "tree_entry_interesting()", because I
got frustrated by the fact that
- we actually had *another* function called "interesting()" in another
file, and I couldn't tell from the profiles which one was the one that
mattered more.
- when rewriting it to return a ternary value, you can't just do
if (interesting(...))
...
any more, but want to assign the return value to a local variable. The
name of choice for that variable would normally be "interesting", so
I just wanted to make the function name be more specific, and avoid
that whole issue (even though I then didn't choose that name for either
of the users, just to avoid confusion in the patch itself ;)
In other words, this doesn't really change anything, but I think it's a
good thing to do, and if somebody comes along and writes the logic for
"yeah, none of the pathspecs you have are interesting", we now support
that trivially.
It could easily be a meaningful optimization for things like "blame",
where there's just one pathspec, and stopping when you've seen it would
allow you to avoid about 50% of the tree traversals on average.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
18 years ago
|
|
|
while (t->size) {
|
|
|
|
match = tree_entry_interesting(&t->entry, base, 0, &opt->pathspec);
|
|
|
|
if (match) {
|
|
|
|
if (match == all_entries_not_interesting)
|
|
|
|
t->size = 0;
|
|
|
|
break;
|
Set up for better tree diff optimizations
This is mainly just a cleanup patch, and sets up for later changes where
the tree-diff.c "interesting()" function can return more than just a
yes/no value.
In particular, it should be quite possible to say "no subsequent entries
in this tree can possibly be interesting any more", and thus allow the
callers to short-circuit the tree entirely.
In fact, changing the callers to do so is trivial, and is really all this
patch really does, because changing "interesting()" itself to say that
nothing further is going to be interesting is definitely more complicated,
considering that we may have arbitrary pathspecs.
But in cleaning up the callers, this actually fixes a potential small
performance issue in diff_tree(): if the second tree has a lot of
uninterestign crud in it, we would keep on doing the "is it interesting?"
check on the first tree for each uninteresting entry in the second one.
The answer is obviously not going to change, so that was just not helping.
The new code is clearer and simpler and avoids this issue entirely.
I also renamed "interesting()" to "tree_entry_interesting()", because I
got frustrated by the fact that
- we actually had *another* function called "interesting()" in another
file, and I couldn't tell from the profiles which one was the one that
mattered more.
- when rewriting it to return a ternary value, you can't just do
if (interesting(...))
...
any more, but want to assign the return value to a local variable. The
name of choice for that variable would normally be "interesting", so
I just wanted to make the function name be more specific, and avoid
that whole issue (even though I then didn't choose that name for either
of the users, just to avoid confusion in the patch itself ;)
In other words, this doesn't really change anything, but I think it's a
good thing to do, and if somebody comes along and writes the logic for
"yeah, none of the pathspecs you have are interesting", we now support
that trivially.
It could easily be a meaningful optimization for things like "blame",
where there's just one pathspec, and stopping when you've seen it would
allow you to avoid about 50% of the tree traversals on average.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
18 years ago
|
|
|
}
|
|
|
|
update_tree_entry(t);
|
Set up for better tree diff optimizations
This is mainly just a cleanup patch, and sets up for later changes where
the tree-diff.c "interesting()" function can return more than just a
yes/no value.
In particular, it should be quite possible to say "no subsequent entries
in this tree can possibly be interesting any more", and thus allow the
callers to short-circuit the tree entirely.
In fact, changing the callers to do so is trivial, and is really all this
patch really does, because changing "interesting()" itself to say that
nothing further is going to be interesting is definitely more complicated,
considering that we may have arbitrary pathspecs.
But in cleaning up the callers, this actually fixes a potential small
performance issue in diff_tree(): if the second tree has a lot of
uninterestign crud in it, we would keep on doing the "is it interesting?"
check on the first tree for each uninteresting entry in the second one.
The answer is obviously not going to change, so that was just not helping.
The new code is clearer and simpler and avoids this issue entirely.
I also renamed "interesting()" to "tree_entry_interesting()", because I
got frustrated by the fact that
- we actually had *another* function called "interesting()" in another
file, and I couldn't tell from the profiles which one was the one that
mattered more.
- when rewriting it to return a ternary value, you can't just do
if (interesting(...))
...
any more, but want to assign the return value to a local variable. The
name of choice for that variable would normally be "interesting", so
I just wanted to make the function name be more specific, and avoid
that whole issue (even though I then didn't choose that name for either
of the users, just to avoid confusion in the patch itself ;)
In other words, this doesn't really change anything, but I think it's a
good thing to do, and if somebody comes along and writes the logic for
"yeah, none of the pathspecs you have are interesting", we now support
that trivially.
It could easily be a meaningful optimization for things like "blame",
where there's just one pathspec, and stopping when you've seen it would
allow you to avoid about 50% of the tree traversals on average.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
18 years ago
|
|
|
}
|
|
|
|
}
|
|
|
|
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
static int ll_diff_tree_sha1(const unsigned char *old, const unsigned char *new,
|
|
|
|
const char *base_str, struct diff_options *opt)
|
|
|
|
{
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
struct tree_desc t1, t2;
|
|
|
|
void *t1tree, *t2tree;
|
|
|
|
struct strbuf base;
|
|
|
|
int baselen = strlen(base_str);
|
|
|
|
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
t1tree = fill_tree_descriptor(&t1, old);
|
|
|
|
t2tree = fill_tree_descriptor(&t2, new);
|
|
|
|
|
|
|
|
/* Enable recursion indefinitely */
|
|
|
|
opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
|
|
|
|
|
|
|
|
strbuf_init(&base, PATH_MAX);
|
|
|
|
strbuf_add(&base, base_str, baselen);
|
|
|
|
|
Set up for better tree diff optimizations
This is mainly just a cleanup patch, and sets up for later changes where
the tree-diff.c "interesting()" function can return more than just a
yes/no value.
In particular, it should be quite possible to say "no subsequent entries
in this tree can possibly be interesting any more", and thus allow the
callers to short-circuit the tree entirely.
In fact, changing the callers to do so is trivial, and is really all this
patch really does, because changing "interesting()" itself to say that
nothing further is going to be interesting is definitely more complicated,
considering that we may have arbitrary pathspecs.
But in cleaning up the callers, this actually fixes a potential small
performance issue in diff_tree(): if the second tree has a lot of
uninterestign crud in it, we would keep on doing the "is it interesting?"
check on the first tree for each uninteresting entry in the second one.
The answer is obviously not going to change, so that was just not helping.
The new code is clearer and simpler and avoids this issue entirely.
I also renamed "interesting()" to "tree_entry_interesting()", because I
got frustrated by the fact that
- we actually had *another* function called "interesting()" in another
file, and I couldn't tell from the profiles which one was the one that
mattered more.
- when rewriting it to return a ternary value, you can't just do
if (interesting(...))
...
any more, but want to assign the return value to a local variable. The
name of choice for that variable would normally be "interesting", so
I just wanted to make the function name be more specific, and avoid
that whole issue (even though I then didn't choose that name for either
of the users, just to avoid confusion in the patch itself ;)
In other words, this doesn't really change anything, but I think it's a
good thing to do, and if somebody comes along and writes the logic for
"yeah, none of the pathspecs you have are interesting", we now support
that trivially.
It could easily be a meaningful optimization for things like "blame",
where there's just one pathspec, and stopping when you've seen it would
allow you to avoid about 50% of the tree traversals on average.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
18 years ago
|
|
|
for (;;) {
|
|
|
|
int cmp;
|
|
|
|
|
|
|
|
if (diff_can_quit_early(opt))
|
|
|
|
break;
|
|
|
|
if (opt->pathspec.nr) {
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
skip_uninteresting(&t1, &base, opt);
|
|
|
|
skip_uninteresting(&t2, &base, opt);
|
|
|
|
}
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
if (!t1.size && !t2.size)
|
tree-diff: remove special-case diff-emitting code for empty-tree cases
While walking trees, we iterate their entries from lowest to highest in
sort order, so empty tree means all entries were already went over.
If we artificially assign +infinity value to such tree "entry", it will
go after all usual entries, and through the usual driver loop we will be
taking the same actions, which were hand-coded for special cases, i.e.
t1 empty, t2 non-empty
pathcmp(+∞, t2) -> +1
show_path(/*t1=*/NULL, t2); /* = t1 > t2 case in main loop */
t1 non-empty, t2-empty
pathcmp(t1, +∞) -> -1
show_path(t1, /*t2=*/NULL); /* = t1 < t2 case in main loop */
In other words when we have t1 and t2, we return a sign that tells the
caller to indicate the "earlier" one to be emitted, and by returning the
sign that causes the non-empty side to be emitted, we will automatically
cause the entries from the remaining side to be emitted, without
attempting to touch the empty side at all. We can teach
tree_entry_pathcmp() to pretend that an empty tree has an element that
sorts after anything else to achieve this.
Right now we never go to when compared tree descriptors are both
infinity, as this condition is checked in the loop beginning as
finishing criteria, but will do so in the future, when there will be
several parents iterated simultaneously, and some pair of them would run
to the end.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
break;
|
tree-diff: don't assume compare_tree_entry() returns -1,0,1
It does, but we'll be reworking it in the next patch after it won't, and
besides it is better to stick to standard
strcmp/memcmp/base_name_compare/etc... convention, where comparison
function returns <0, =0, >0
Regarding performance, comparing for <0, =0, >0 should be a little bit
faster, than switch, because it is just 1 test-without-immediate
instruction and then up to 3 conditional branches, and in switch you
have up to 3 tests with immediate and up to 3 conditional branches.
No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it
will be good after we'll be adding support for multiparent walker and
will stay that way.
=0 case goes first, because it happens more often in real diffs - i.e.
paths are the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
cmp = tree_entry_pathcmp(&t1, &t2);
|
tree-diff: don't assume compare_tree_entry() returns -1,0,1
It does, but we'll be reworking it in the next patch after it won't, and
besides it is better to stick to standard
strcmp/memcmp/base_name_compare/etc... convention, where comparison
function returns <0, =0, >0
Regarding performance, comparing for <0, =0, >0 should be a little bit
faster, than switch, because it is just 1 test-without-immediate
instruction and then up to 3 conditional branches, and in switch you
have up to 3 tests with immediate and up to 3 conditional branches.
No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it
will be good after we'll be adding support for multiparent walker and
will stay that way.
=0 case goes first, because it happens more often in real diffs - i.e.
paths are the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
|
|
|
|
/* t1 = t2 */
|
|
|
|
if (cmp == 0) {
|
|
|
|
if (DIFF_OPT_TST(opt, FIND_COPIES_HARDER) ||
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
hashcmp(t1.entry.sha1, t2.entry.sha1) ||
|
|
|
|
(t1.entry.mode != t2.entry.mode))
|
|
|
|
show_path(&base, opt, &t1, &t2);
|
|
|
|
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
update_tree_entry(&t1);
|
|
|
|
update_tree_entry(&t2);
|
tree-diff: don't assume compare_tree_entry() returns -1,0,1
It does, but we'll be reworking it in the next patch after it won't, and
besides it is better to stick to standard
strcmp/memcmp/base_name_compare/etc... convention, where comparison
function returns <0, =0, >0
Regarding performance, comparing for <0, =0, >0 should be a little bit
faster, than switch, because it is just 1 test-without-immediate
instruction and then up to 3 conditional branches, and in switch you
have up to 3 tests with immediate and up to 3 conditional branches.
No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it
will be good after we'll be adding support for multiparent walker and
will stay that way.
=0 case goes first, because it happens more often in real diffs - i.e.
paths are the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
}
|
|
|
|
|
|
|
|
/* t1 < t2 */
|
|
|
|
else if (cmp < 0) {
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
show_path(&base, opt, &t1, /*t2=*/NULL);
|
|
|
|
update_tree_entry(&t1);
|
tree-diff: don't assume compare_tree_entry() returns -1,0,1
It does, but we'll be reworking it in the next patch after it won't, and
besides it is better to stick to standard
strcmp/memcmp/base_name_compare/etc... convention, where comparison
function returns <0, =0, >0
Regarding performance, comparing for <0, =0, >0 should be a little bit
faster, than switch, because it is just 1 test-without-immediate
instruction and then up to 3 conditional branches, and in switch you
have up to 3 tests with immediate and up to 3 conditional branches.
No worry, that update_tree_entry(t2) is duplicated for =0 and >0 - it
will be good after we'll be adding support for multiparent walker and
will stay that way.
=0 case goes first, because it happens more often in real diffs - i.e.
paths are the same.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
}
|
|
|
|
|
|
|
|
/* t1 > t2 */
|
|
|
|
else {
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
show_path(&base, opt, /*t1=*/NULL, &t2);
|
|
|
|
update_tree_entry(&t2);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
strbuf_release(&base);
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
free(t2tree);
|
|
|
|
free(t1tree);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
/*
|
|
|
|
* Does it look like the resulting diff might be due to a rename?
|
|
|
|
* - single entry
|
|
|
|
* - not a valid previous file
|
|
|
|
*/
|
|
|
|
static inline int diff_might_be_rename(void)
|
|
|
|
{
|
|
|
|
return diff_queued_diff.nr == 1 &&
|
|
|
|
!DIFF_FILE_VALID(diff_queued_diff.queue[0]->one);
|
|
|
|
}
|
|
|
|
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
static void try_to_follow_renames(const unsigned char *old, const unsigned char *new, const char *base, struct diff_options *opt)
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
{
|
|
|
|
struct diff_options diff_opts;
|
|
|
|
struct diff_queue_struct *q = &diff_queued_diff;
|
|
|
|
struct diff_filepair *choice;
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* follow-rename code is very specific, we need exactly one
|
|
|
|
* path. Magic that matches more than one path is not
|
|
|
|
* supported.
|
|
|
|
*/
|
|
|
|
GUARD_PATHSPEC(&opt->pathspec, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
|
|
|
|
#if 0
|
|
|
|
/*
|
|
|
|
* We should reject wildcards as well. Unfortunately we
|
|
|
|
* haven't got a reliable way to detect that 'foo\*bar' in
|
|
|
|
* fact has no wildcards. nowildcard_len is merely a hint for
|
|
|
|
* optimization. Let it slip for now until wildmatch is taught
|
|
|
|
* about dry-run mode and returns wildcard info.
|
|
|
|
*/
|
|
|
|
if (opt->pathspec.has_wildcard)
|
|
|
|
die("BUG:%s:%d: wildcards are not supported",
|
|
|
|
__FILE__, __LINE__);
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/* Remove the file creation entry from the diff queue, and remember it */
|
|
|
|
choice = q->queue[0];
|
|
|
|
q->nr = 0;
|
|
|
|
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
diff_setup(&diff_opts);
|
|
|
|
DIFF_OPT_SET(&diff_opts, RECURSIVE);
|
|
|
|
DIFF_OPT_SET(&diff_opts, FIND_COPIES_HARDER);
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
|
|
|
|
diff_opts.single_follow = opt->pathspec.items[0].match;
|
Fix diffcore-break total breakage
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.
In particular, there was a pattern of doing single commits with renames
that looked basically like
- rename "filename.h" -> "filename_64.h"
- create new "filename.c" that includes "filename_32.h" or
"filename_64.h" depending on whether we're 32-bit or 64-bit.
which was preparatory for smushing the two trees together.
Now, there's two issues here:
- "filename.c" *remained*. Yes, it was a rename, but there was a new file
created with the old name in the same commit. This was important,
because we wanted each commit to compile properly, so that it was
bisectable, so splitting the rename into one commit and the "create
helper file" into another was *not* an option.
So we need to break associations where the contents change too much.
Fine. We have the -B flag for that. When we break things up, then the
rename detection will be able to figure out whether there are better
alternatives.
- "git log --follow" didn't with with -B.
Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following. So now "git log -B --follow" works
fine:
diff --git a/tree-diff.c b/tree-diff.c
index 26bdbdd..7c261fd 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
diff_opts.detect_rename = DIFF_DETECT_RENAME;
diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
diff_opts.single_follow = opt->paths[0];
+ diff_opts.break_opt = opt->break_opt;
paths[0] = NULL;
diff_tree_setup_paths(paths, &diff_opts);
if (diff_setup_done(&diff_opts) < 0)
however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!
In particular:
- it used to do
if (base_size < MINIMUM_BREAK_SIZE)
return 0; /* we do not break too small filepair */
which basically says "don't bother to break small files". But that
"base_size" is the *smaller* of the two sizes, which means that if some
large file was rewritten into one that just includes another file, we
would look at the (small) result, and decide that it's smaller than the
break size, so it cannot be worth it to break it up! Even if the other
side was ten times bigger and looked *nothing* like the samell file!
That's clearly bogus. I replaced "base_size" with "max_size", so that
we compare the *bigger* of the filepair with the break size.
- It calculated a "merge_score", which was the score needed to merge it
back together if nothing else wanted it. But even if it was *so*
different that we would never want to merge it back, we wouldn't
consider it a break! That makes no sense. So I added
if (*merge_score_p > break_score)
return 1;
to make it clear that if we wouldn't want to merge it at the end, it
was *definitely* a break.
- It compared the whole "extent of damage", counting all inserts and
deletes, but it based this score on the "base_size", and generated the
damage score with
delta_size = src_removed + literal_added;
damage_score = delta_size * MAX_SCORE / base_size;
but that makes no sense either, since quite often, this will result in
a number that is *bigger* than MAX_SCORE! Why? Because base_size is
(again) the smaller of the two files we compare, and when you start out
from a small file and add a lot (or start out from a large file and
remove a lot), the base_size is going to be much smaller than the
damage!
Again, the fix was to replace "base_size" with "max_size", at which
point the damage actually becomes a sane percentage of the whole.
With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now
git log -B --follow arch/x86/kernel/vmlinux_64.lds.S
actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history
git log --stat -B -C
on my kernel tree with the old code and the new code.
There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates). Here's a few examples of the "--stat" output:
- This:
include/asm-x86/Kbuild | 2 -
include/asm-x86/debugreg.h | 79 +++++++++++++++++++++++++++++++++++------
include/asm-x86/debugreg_32.h | 64 ---------------------------------
include/asm-x86/debugreg_64.h | 65 ---------------------------------
4 files changed, 68 insertions(+), 142 deletions(-)
Becomes:
include/asm-x86/Kbuild | 2 -
include/asm-x86/{debugreg_64.h => debugreg.h} | 9 +++-
include/asm-x86/debugreg_32.h | 64 -------------------------
3 files changed, 7 insertions(+), 68 deletions(-)
- This:
include/asm-x86/bug.h | 41 +++++++++++++++++++++++++++++++++++++++--
include/asm-x86/bug_32.h | 37 -------------------------------------
include/asm-x86/bug_64.h | 34 ----------------------------------
3 files changed, 39 insertions(+), 73 deletions(-)
Becomes
include/asm-x86/{bug_64.h => bug.h} | 20 +++++++++++++-----
include/asm-x86/bug_32.h | 37 -----------------------------------
2 files changed, 14 insertions(+), 43 deletions(-)
Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertizing: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.
So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
17 years ago
|
|
|
diff_opts.break_opt = opt->break_opt;
|
use custom rename score during --follow
If you provide a custom rename score on the command line,
like:
git log -M50 --follow foo.c
it is completely ignored, and there is no way to --follow
with a looser rename score. Instead, let's use the same
rename score that will be used for generating diffs. This is
convenient, and mirrors what we do with the break-score.
You can see an example of it being useful in git.git:
$ git log --oneline --summary --follow \
Documentation/technical/api-string-list.txt
86d4b52 string-list: Add API to remove an item from an unsorted list
1d2f80f string_list: Fix argument order for string_list_append
e242148 string-list: add unsorted_string_list_lookup()
0dda1d1 Fix two leftovers from path_list->string_list
c455c87 Rename path_list to string_list
create mode 100644 Documentation/technical/api-string-list.txt
$ git log --oneline --summary -M40 --follow \
Documentation/technical/api-string-list.txt
86d4b52 string-list: Add API to remove an item from an unsorted list
1d2f80f string_list: Fix argument order for string_list_append
e242148 string-list: add unsorted_string_list_lookup()
0dda1d1 Fix two leftovers from path_list->string_list
c455c87 Rename path_list to string_list
rename Documentation/technical/{api-path-list.txt => api-string-list.txt} (47%)
328a475 path-list documentation: document all functions and data structures
530e741 Start preparing the API documents.
create mode 100644 Documentation/technical/api-path-list.txt
You could have two separate rename scores, one for following
and one for diff. But almost nobody is going to want that,
and it would just be unnecessarily confusing. Besides which,
we re-use the diff results from try_to_follow_renames for
the actual diff output, which means having them as separate
scores is actively wrong. E.g., with the current code, you
get:
$ git log --oneline --diff-filter=R --name-status \
-M90 --follow git.spec.in
27dedf0 GIT 0.99.9j aka 1.0rc3
R084 git-core.spec.in git.spec.in
f85639c Rename the RPM from "git" to "git-core"
R098 git.spec.in git-core.spec.in
The first one should not be considered a rename by the -M
score we gave, but we print it anyway, since we blindly
re-use the diff information from the follow (which uses the
default score). So this could also be considered simply a
bug-fix, as with the current code "-M" is completely ignored
when using "--follow".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
13 years ago
|
|
|
diff_opts.rename_score = opt->rename_score;
|
|
|
|
diff_setup_done(&diff_opts);
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
ll_diff_tree_sha1(old, new, base, &diff_opts);
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
diffcore_std(&diff_opts);
|
|
|
|
free_pathspec(&diff_opts.pathspec);
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
|
|
|
|
/* Go through the new set of filepairing, and see if we find a more interesting one */
|
|
|
|
opt->found_follow = 0;
|
|
|
|
for (i = 0; i < q->nr; i++) {
|
|
|
|
struct diff_filepair *p = q->queue[i];
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
|
|
|
|
/*
|
|
|
|
* Found a source? Not only do we use that for the new
|
|
|
|
* diff_queued_diff, we will also use that as the path in
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
* the future!
|
|
|
|
*/
|
|
|
|
if ((p->status == 'R' || p->status == 'C') &&
|
|
|
|
!strcmp(p->two->path, opt->pathspec.items[0].match)) {
|
|
|
|
const char *path[2];
|
|
|
|
|
|
|
|
/* Switch the file-pairs around */
|
|
|
|
q->queue[i] = choice;
|
|
|
|
choice = p;
|
|
|
|
|
|
|
|
/* Update the path we use from now on.. */
|
|
|
|
path[0] = p->one->path;
|
|
|
|
path[1] = NULL;
|
|
|
|
free_pathspec(&opt->pathspec);
|
|
|
|
parse_pathspec(&opt->pathspec,
|
|
|
|
PATHSPEC_ALL_MAGIC & ~PATHSPEC_LITERAL,
|
|
|
|
PATHSPEC_LITERAL_PATH, "", path);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The caller expects us to return a set of vanilla
|
|
|
|
* filepairs to let a later call to diffcore_std()
|
|
|
|
* it makes to sort the renames out (among other
|
|
|
|
* things), but we already have found renames
|
|
|
|
* ourselves; signal diffcore_std() not to muck with
|
|
|
|
* rename information.
|
|
|
|
*/
|
|
|
|
opt->found_follow = 1;
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Then, discard all the non-relevant file pairs...
|
|
|
|
*/
|
|
|
|
for (i = 0; i < q->nr; i++) {
|
|
|
|
struct diff_filepair *p = q->queue[i];
|
|
|
|
diff_free_filepair(p);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* .. and re-instate the one we want (which might be either the
|
|
|
|
* original one, or the rename/copy we found)
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
*/
|
|
|
|
q->queue[0] = choice;
|
|
|
|
q->nr = 1;
|
Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.
But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..
The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.
So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.
To make a long story short: I'm sorry for it taking so long until I just
did it.
I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.
All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.
One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.
IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.
Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.
And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do
git log -p --follow builtin-rev-list.c
and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
18 years ago
|
|
|
}
|
|
|
|
|
|
|
|
int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, struct diff_options *opt)
|
|
|
|
{
|
|
|
|
int retval;
|
|
|
|
|
tree-diff: rework diff_tree interface to be sha1 based
In the next commit this will allow to reduce intermediate calls, when
recursing into subtrees - at that stage we know only subtree sha1, and
it is natural for tree walker to start from that phase. For now we do
diff_tree
show_path
diff_tree_sha1
diff_tree
...
and the change will allow to reduce it to
diff_tree
show_path
diff_tree
Also, it will allow to omit allocating strbuf for each subtree, and just
reuse the common strbuf via playing with its len.
The above-mentioned improvements go in the next 2 patches.
The downside is that try_to_follow_renames(), if active, we cause
re-reading of 2 initial trees, which was negligible based on my timings,
and which is outweighed cogently by the upsides.
NOTE To keep with the current interface and semantics, I needed to
rename the function from diff_tree() to diff_tree_sha1(). As
diff_tree_sha1() was already used, and the function we are talking here
is its more low-level helper, let's use convention for prefixing
such helpers with "ll_". So the final renaming is
diff_tree() -> ll_diff_tree_sha1()
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
|
|
|
retval = ll_diff_tree_sha1(old, new, base, opt);
|
|
|
|
if (!*base && DIFF_OPT_TST(opt, FOLLOW_RENAMES) && diff_might_be_rename())
|
|
|
|
try_to_follow_renames(old, new, base, opt);
|
|
|
|
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
|
|
|
int diff_root_tree_sha1(const unsigned char *new, const char *base, struct diff_options *opt)
|
|
|
|
{
|
|
|
|
return diff_tree_sha1(NULL, new, base, opt);
|
|
|
|
}
|