When we introduced the cached origin per commit, we gave up proper
garbage collection because commits held onto their
cached copy. There is no need to do so.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The reason to do this is the same as in the previous change for
line copy detection within the same file (-M).
This also fixes the -C and -C -C (aka find-copies-harder) logic. In
this application we are not interested in the similarity
matching diffcore-rename does: we only want to scan files
that were modified or, in the case of -C -C, all files in
the parent, and we want to do that ourselves.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Otherwise we would miss copied lines that are contained in the
parts before or after the part that we find after splitting the
blame_entry (i.e. split[0] and split[2]).
Signed-off-by: Junio C Hamano <junkio@cox.net>
If more than one parent in an Octopus merge has the same
origin, ignore the later ones because they would not make any
difference in the outcome.
Signed-off-by: Junio C Hamano <junkio@cox.net>
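
The rule in the message above can be sketched as follows. This is only an
illustration: origins are modelled as hashable tuples, and both the tuple
layout and the name unique_parent_origins() are assumptions made for the
example, not the actual data structures.

```python
def unique_parent_origins(parent_origins):
    """Keep the first occurrence of each origin, preserving parent order.

    parent_origins holds one entry per parent of the merge; None means
    that parent does not have the file at all (hypothetical convention).
    """
    seen = set()
    unique = []
    for origin in parent_origins:
        if origin is None:
            continue
        if origin in seen:
            # A later parent with the same origin cannot change the outcome.
            continue
        seen.add(origin)
        unique.append(origin)
    return unique


# Three parents, two of which resolve to the same origin.
origins = [("c1", "Makefile"), ("c2", "Makefile"), ("c1", "Makefile")]
deduped = unique_parent_origins(origins)
```

Only the first parent carrying a given origin is examined; the later
duplicates are skipped.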
The idea is that we are interested in renaming into only one path, so
we do not care about renames that happen elsewhere.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We forgot to add the prefix to the given path.
[jc: interestingly enough, Jeff King had the same idea after I
pushed mine out to "pu", and his patch was cleaner, so I dropped
mine.]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a shorthand for "<rev> --not <rev>^@", i.e. "include
this commit but exclude any of its parents".
When a new file $F is introduced by revision $R, this notation
can be used to find a copy-and-paste from an existing file in the
parents of that revision without annotating the ancestry of the
lines that were copied from:

    git pickaxe -f -C $R^! -- $F

Signed-off-by: Junio C Hamano <junkio@cox.net>
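
To see why "<rev> --not <rev>^@" selects exactly one commit, here is a
small sketch on a toy history. The dictionary-based DAG and the helper
names ancestors() and rev_bang() are assumptions made for the example.

```python
def ancestors(dag, start):
    """Return every commit reachable from start, including start itself."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(dag.get(commit, []))
    return seen


def rev_bang(dag, rev):
    """Commits selected by '<rev> --not <all parents of rev>'."""
    included = ancestors(dag, rev)
    excluded = set()
    for parent in dag.get(rev, []):
        excluded |= ancestors(dag, parent)
    return included - excluded


# Toy history: R is a merge of P1 and P2, which both descend from A.
dag = {"R": ["P1", "P2"], "P1": ["A"], "P2": ["A"], "A": []}
selected = rev_bang(dag, "R")
```

Everything reachable from the parents is excluded, so only R itself
remains in the range.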
Depending on how bushy the commit DAG is, this saves calls to
the internal diff-tree for fork-point commits. For example,
annotating Makefile in the kernel repository saves about a third
of such diff-tree calls.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When a merge adds a new file from the second parent, the
earlier code tried to find renames in the first parent before
noticing that the version from the second parent was added
without modification.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When compiled for debugging, make sure that refcnt sanity check
code detects underflows in origin reference counting.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes "git-pickaxe -C master -- revision.c" finish with
proper refcounts for all origins. I am reasonably happy with
it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The command rejects -L1,10 as an invalid line range specifier
and I got frustrated enough by it, so this makes it allow both
forms of input.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The origin structure is allocated for each commit and path, and
as the code traverses down the history it is copied into different
blame entries. To avoid leaks, try refcounting them.
This still seems to leak; I haven't tracked that down fully yet.
Signed-off-by: Junio C Hamano <junkio@cox.net>
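
A minimal sketch of the reference counting idea, including the kind of
underflow check mentioned for debugging builds. The class and the helper
names are illustrative assumptions; the "freed" flag merely stands in for
releasing the structure.

```python
class Origin:
    """One (commit, path) pair shared by several blame entries."""

    def __init__(self, commit, path):
        self.commit = commit
        self.path = path
        self.refcnt = 1      # the creator holds the first reference
        self.freed = False   # stands in for actually freeing the memory


def origin_incref(origin):
    """Take another reference, e.g. when copying into a new blame entry."""
    origin.refcnt += 1
    return origin


def origin_decref(origin):
    """Drop a reference; release the origin when the last one goes away."""
    if origin.refcnt <= 0:
        # Sanity check: dropping more references than were ever taken.
        raise RuntimeError("origin refcnt underflow")
    origin.refcnt -= 1
    if origin.refcnt == 0:
        origin.freed = True
```

Each blame entry that copies the origin takes a reference and drops it
when the entry is discarded, so the origin is released exactly once.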
When assigning blames for code movements across file boundaries,
we used to iterate over blame entries (i.e. groups of lines to
be blamed) in the outer loop and compared each entry with paths
in the parent commit in an inner loop. This meant that we
opened the blob data from each path a number of times.
Reorganize the loop so that we read the same path only once, and
compare it against all relevant blame entries.
This should perform better, but it seems to give mixed
results.
Signed-off-by: Junio C Hamano <junkio@cox.net>
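
The loop reorganization can be illustrated with a toy model in which a
blame entry is a string and "matching" is a substring test. All names
here are assumptions made for the example; only the loop order is the
point.

```python
def make_reader(blobs):
    """Return (read_blob, counter); counter['reads'] counts blob reads."""
    counter = {"reads": 0}

    def read_blob(path):
        counter["reads"] += 1
        return blobs[path]

    return read_blob, counter


def match_entry_major(entries, paths, read_blob):
    """Old order: entries outside, paths inside (re-reads every blob)."""
    found = []
    for ent in entries:
        for path in paths:
            if ent in read_blob(path):
                found.append((ent, path))
    return found


def match_path_major(entries, paths, read_blob):
    """New order: read each path once, compare against all entries."""
    found = []
    for path in paths:
        blob = read_blob(path)
        for ent in entries:
            if ent in blob:
                found.append((ent, path))
    return found


blobs = {"a.c": "int x;\nint y;\n", "b.c": "int z;\n"}
entries = ["int x;", "int z;", "int y;"]
```

With E entries and P paths the first form reads E * P blobs and the
second reads P, while both find the same matches.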
After finding out which path in the parent to scan to pass
blames, using get_tree_entry() to extract the blob information
again was quite wasteful, since diff-tree already gave us that
information. Separate the function to create an origin out as
get_origin().
You'll never know what is more efficient unless you try and/or
think hard. I somehow thought that extracting one known path
out of commit's tree is cheaper than running a diff-tree for the
current path between the commit and its parent, but it is not
the case. In real, non-toy projects, most commits do not touch
the path you are interested in, and if the path is a few levels
away from the toplevel, the whole-subdirectory comparison logic
in diff-tree allows us to skip opening lower subdirectories.
This commit rewrites the find_origin() function to use a single-path
diff-tree to see if the parent has the same blob as the current
suspect, which is cheaper than extracting the blob information
using get_tree_entry() and comparing it with what the current
suspect has. This shaves about 6% overhead when annotating
kernel/sched.c in the Linux kernel repository on my machine.
The saving rises to 25% for arch/i386/kernel/Makefile.
Signed-off-by: Junio C Hamano <junkio@cox.net>
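
The subdirectory-skipping argument can be sketched on a toy tree model.
Trees are nested dictionaries carrying an "id", and the function name
unchanged_in_parent() is an assumption for the example; this is not how
tree objects are actually represented.

```python
def unchanged_in_parent(child_root, parent_root, path):
    """Return (unchanged, trees_opened) for path between two toy trees.

    If two trees at the same level have the same id, everything below
    them is identical and no lower subdirectory needs to be opened.
    """
    child, parent = child_root, parent_root
    opened = 0
    for name in path.split("/"):
        if child["id"] == parent["id"]:
            return True, opened          # identical subtree: stop early
        opened += 1                      # have to look inside this level
        child = child["entries"][name]
        parent = parent.get("entries", {}).get(name)
        if parent is None:
            return False, opened         # path does not exist in parent
    return child["id"] == parent["id"], opened


arch = {"id": "arch-1",
        "entries": {"i386": {"id": "i386-1",
                             "entries": {"Makefile": {"id": "blob-m"}}}}}
parent_root = {"id": "root-1", "entries": {
    "arch": arch,
    "kernel": {"id": "kernel-1", "entries": {"sched.c": {"id": "blob-a"}}},
}}
child_root = {"id": "root-2", "entries": {
    "arch": arch,
    "kernel": {"id": "kernel-2", "entries": {"sched.c": {"id": "blob-b"}}},
}}
```

In this toy commit only kernel/sched.c changed, so the question for
arch/i386/Makefile is answered after opening just the top-level tree.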
It used to be that we could compare the addresses of origin
structures to determine if they are the same because they were
always registered with the scoreboard. After the introduction of the
loop that tries to find the best split, that is no longer true.
The current code has rather serious leaks with the origin structure,
but more importantly it gets confused when two origins
point at the same commit and same path.
We might eventually have to refcount and gc origin, but let's
fix the correctness issue first.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds scoring logic to blame_entry to prevent blames on very
trivial chunks (e.g. lots of empty lines, indent followed by a
closing brace) from being passed down to unrelated lines in the
parent.
The current heuristics are quite simple and may need to be
tweaked later, but we need to start somewhere.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead of comparing number of lines matched, look at the
matched characters and count alnums, so that we do not pass
blame on not-so-interesting lines, such as an empty line and
a line that is indentation followed by a closing brace.
Add an option --score-debug to show the score of each
blame_entry while we cook this further on the "next" branch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
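
A sketch of the alnum-counting idea. The function name ent_score() is
illustrative, and Python's str.isalnum() is Unicode-aware, unlike the C
isalnum(), so this only approximates the behaviour for ASCII input.

```python
def ent_score(lines):
    """Score a group of lines by the number of alphanumeric characters."""
    return sum(1 for line in lines for ch in line if ch.isalnum())


# Blank lines and a lone closing brace carry no score at all.
trivial = ["", "    }", "\n"]
# 'int' (3) + 'x' (1) + '1' (1) alphanumeric characters.
interesting = ["int x = 1;"]
```

A group of lines whose score stays at or near zero is not worth passing
down to the parent, however many lines it spans.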
We would want to be able to refer to the end of the file as
"the beginning of the Nth line" for a file that is N lines long.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This completes the initial round of git-pickaxe. In addition to
the detection of line movements we already have, this finds new
lines that were created by moving or cutting-and-pasting lines
from different files in the parent.
With this,

    git pickaxe -f -n -C v1.4.0 -- revision.c

finds that a major part of that file actually came from
rev-list.c when Linus split the latter at commit ae563642 and
blames them to earlier commits that touch rev-list.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes pickaxe more intelligent than the classic blame.
A typical example is a change that moves one static C function
from the lower part of the file to the upper part of the same file,
because you added a new caller in the middle.
The versions in the parent and the child would look like this:

    parent              child

    A                   static foo() {
    B                     ...
    C                   }
    D                   A
    E                   B
    F                   C
    G                   D
    static foo() {      ... call foo();
      ...               E
    }                   F
    H                   G
                        H

With the classic blame algorithm, we can blame lines A B C D E F
G and H to the parent. The child is guilty of introducing the
line "... call foo();", and the blame is placed on the child.
However, the classic blame algorithm fails to notice that the
implementation of foo() at the top of the file is not new, but
was moved from the lower part of the parent.
This commit introduces detection of such line movements, and
correctly blames the lines that were simply moved in the file to
the parent.
Signed-off-by: Junio C Hamano <junkio@cox.net>
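
The example from the message above can be reproduced with a rough
two-pass sketch. It uses difflib from the Python standard library for
the ordinary diff, and a simplified second pass that only recognizes a
leftover run when the whole run appears contiguously in the parent; the
real code splits blame entries to find partial matches, which this
sketch does not attempt. The function name is an assumption.

```python
from difflib import SequenceMatcher


def blame_with_moves(parent, child):
    """Label each child line 'parent', 'parent (moved)' or 'child'."""
    blame = ["child"] * len(child)

    # Pass 1: classic blame. Lines the diff matches go to the parent.
    matcher = SequenceMatcher(None, parent, child, autojunk=False)
    for _a, b, size in matcher.get_matching_blocks():
        for i in range(b, b + size):
            blame[i] = "parent"

    # Pass 2: look for each leftover run elsewhere in the parent.
    i = 0
    while i < len(child):
        if blame[i] != "child":
            i += 1
            continue
        j = i
        while j < len(child) and blame[j] == "child":
            j += 1
        run = child[i:j]
        last_start = len(parent) - len(run)
        if any(parent[k:k + len(run)] == run for k in range(last_start + 1)):
            for k in range(i, j):
                blame[k] = "parent (moved)"
        i = j
    return blame


parent = ["A", "B", "C", "D", "E", "F", "G",
          "static foo() {", "...", "}", "H"]
child = ["static foo() {", "...", "}", "A", "B", "C", "D",
         "... call foo();", "E", "F", "G", "H"]
blame = blame_with_moves(parent, child)
```

The three lines of foo() are attributed to the parent as moved lines,
and only the new call site stays with the child.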
Currently it does what git-blame does, but only faster.
More importantly, its internal structure is designed to support
content movement (aka cut-and-paste) more easily by allowing
more than one path to be taken from the same commit.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The latest GNU diff from CVS emits an empty line to express
an empty context line, instead of more traditional "single
white space followed by a newline". Do not get broken by it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
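
A sketch of tolerating both spellings of an empty context line when
reading a hunk body. The function name and the (kind, text) return shape
are assumptions for the example, and other hunk-body lines such as
"\ No newline at end of file" are deliberately not handled.

```python
def parse_hunk_line(line):
    """Return (kind, text), where kind is ' ', '+' or '-'."""
    line = line.rstrip("\n")
    if line == "":
        # Newer GNU diff: a bare empty line is an empty context line.
        return (" ", "")
    kind, text = line[0], line[1:]
    if kind not in " +-":
        raise ValueError("unexpected hunk line: %r" % line)
    return (kind, text)
```

Both "\n" and " \n" come out as the same empty context line, so the
caller does not need to care which diff produced the patch.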
I noticed a case not handled in a recent patch.
Demonstrate it like this:

  $ touch new-file
  $ git-send-email --dry-run --from j --to k new-file 2>err
  new-file
  OK. Log says:
  Date: Thu, 19 Oct 2006 10:26:24 +0200
  Sendmail: /usr/sbin/sendmail
  From: j
  Subject:
  Cc:
  To: k
  Result: OK
  $ cat err
  Use of uninitialized value in pattern match (m//) at /p/bin/git-send-email line 416.
  Use of uninitialized value in concatenation (.) or string at /p/bin/git-send-email line 420.
  Use of uninitialized value in concatenation (.) or string at /p/bin/git-send-email line 468.

There's a patch for the $author_name part below.
The example above shows that $subject may also be used uninitialized.
That should be easy to fix, too.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* mw/pathinfo:
  gitweb: Fix search form when PATH_INFO is enabled
  gitweb: Document features better
  gitweb: warn if feature cannot be overridden.
  gitweb: start to generate PATH_INFO URLs.
Conflicts:
  gitweb/README
* jc/send-email:
  Make git-send-email detect mbox-style patches more readily
  git-send-email: real name with period need to be dq-quoted on From: line
  git-send-email: do not drop custom headers the user prepared
* rs/rebase:
  git-rebase: Add a -v option to show a diffstat of the changes upstream at the start of a rebase.
  git-rebase: Use --ignore-if-in-upstream option when executing git-format-patch.
Supposing that the base and result sizes were both full-size 64-bit
values, their encoding would occupy only 9.2 bytes each. Therefore
inflating 64 bytes is way overkill. Limit it to 20 bytes instead, which
should be plenty for a couple of years to come.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
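
The arithmetic can be checked with a sketch of the size encoding, which
stores seven bits per byte with the high bit as a continuation flag.
A 64-bit value needs 64/7, about 9.14, so ten whole bytes, and two sizes
fit in 20. The function names below are assumptions for the example.

```python
def encode_size(n):
    """Encode n with 7 bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_size(buf, pos=0):
    """Decode one size starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


largest = 2 ** 64 - 1
header = encode_size(largest) + encode_size(largest)
```

Even with both sizes at the 64-bit maximum the header is exactly 20
bytes long, so nothing beyond that needs to be inflated to read it.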
Cyrus imap refuses messages with a 'From ' Header.
[jc: Mike McCormack says this is fine with Courier as well.]
Signed-off-by: Markus Amsler <markus.amsler@oribi.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
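
One way to avoid sending the mbox separator is to drop a leading
"From " line before the message is uploaded. This is only a sketch of
the idea; the function name is an assumption and the actual change may
be done differently.

```python
def strip_mbox_from_line(message):
    """Drop a leading mbox 'From ' separator line, if there is one."""
    if message.startswith("From "):
        _first, _sep, rest = message.partition("\n")
        return rest
    return message


raw = ("From 0123abc Mon Sep 17 00:00:00 2001\n"
       "From: A U Thor <author@example.com>\n"
       "Subject: test\n"
       "\n"
       "body\n")
clean = strip_mbox_from_line(raw)
```

The real "From:" header is left alone because it has a colon where the
separator line has a space.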
We are not rebuilding the xdiff library when its header files change.
Add dependencies for those to the main Makefile.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Ryan Anderson <ryan@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I had local modifications in the tree and doing bisect reset required me to
manually edit .git/HEAD.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Jim Meyering noticed that the xdiff library took an insanely long time
when comparing files with many identical lines.
This was because the hash function used in the library is broken
on 64-bit architectures and caused too many collisions.
http://thread.gmane.org/gmane.comp.version-control.git/28962/focus=28994
Acked-by: Davide Libenzi <davidel@xmaliserver.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Currently git-svnimport generates broken tags missing the timespec in the
'tagger' line. This is a random stab at a minimal fix.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
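
For reference, a well-formed 'tagger' line ends with a timespec made of
seconds since the epoch and a timezone offset. The helper below is a
sketch with a made-up name and made-up sample values, not the code used
by git-svnimport.

```python
def tagger_line(name, email, epoch_seconds, tz_offset_minutes):
    """Format a 'tagger' header line including the trailing timespec."""
    sign = "+" if tz_offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(tz_offset_minutes), 60)
    return "tagger %s <%s> %d %s%02d%02d" % (
        name, email, epoch_seconds, sign, hours, minutes)


line = tagger_line("A U Thor", "author@example.com", 1161246384, 120)
```

A line that stops after the closing angle bracket is missing the
timespec, which is the kind of broken tag described above.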