This splits Geert's similarity fingerprint code into a main
program and a fingerprinting function. The next step would be to
try using this in pack-objects.c::try_delta(), which would make
a good evaluation.
Signed-off-by: Junio C Hamano <junkio@cox.net>
I've merged everything I think is ready for 1.3.0, so this is
the final round; hopefully I can release this as v1.3.0 early next
week with minimal last-minute fixups.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/logopt:
Fix up rev-list option parsing.
Fix up default abbrev in setup_revisions() argument parser.
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out:
- get rid of "struct log_tree_opt"
  The fields in "log_tree_opt" are moved into "struct rev_info", and all
  users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revisions()"
- make setup_revisions() set a flag (revs->diff) if the diff-related
  arguments were used. This allows "git log" to decide whether it wants
  to show diffs or not.
- make setup_revisions() also initialize the diffopt part of rev_info
  (the field already existed, but was never initialized)
- make setup_revisions() apply all the "finishing touches" (it will
  do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_;
they may want tree-ish arguments instead. That meant that I had to change
setup_revisions() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and it reorganizes the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking. That was not true before, since there was lots of overlap with
get_commit_reference() handling etc.; now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, because I wanted the diff
itself to be smaller, and to show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then; that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler than this.
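The split between argument parsing and walking can be pictured with a toy shell model. The function and variable names only echo the C names for readability; they are hypothetical, the option list is an assumed subset of the diff options, and none of this is git's actual code. Parsing records whether any diff-related option was seen and queues every other argument, unresolved, as a pending object; a separate step consumes the pending list.

```sh
# Toy model only (hypothetical names, not git's C code).
# Parse options once: note whether any diff-related option was seen, and
# queue every non-option argument, unresolved, as a pending object.
parse_revision_args () {
	revs_diff=0
	revs_pending=
	for arg
	do
		case "$arg" in
		-p|-u|--stat|--raw|--name-only|--name-status)
			revs_diff=1 ;;          # diff-related: the caller may show diffs
		-*)
			;;                      # other options are ignored in this toy
		*)
			revs_pending="$revs_pending $arg" ;;   # commit or tree-ish
		esac
	done
	revs_pending=${revs_pending# }      # drop the leading separator
}

# A separate step consumes the pending list; only here would each entry be
# looked up and turned into a starting point for the walk.
prepare_revision_walk () {
	for obj in $revs_pending
	do
		echo "walk from: $obj"
	done
}
```

A "log"-like caller would check revs_diff after parsing to decide whether to show diffs, which is the role the message above gives to revs->diff.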
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I noticed that bisect does not work well unless both good and bad
commits are given. Running this script in the git.git repository
gives quite different results for the two midpoints:
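	#!/bin/sh
	initial=e83c5163316f89bfbde7d9ab23ca2e25604af290
	# midpoint when the initial commit is excluded (treated as "good")
	mid0=`git rev-list --bisect ^$initial --all`
	git rev-list $mid0 | wc -l          # commits reachable from mid0
	git rev-list ^$mid0 --all | wc -l   # commits not reachable from it
	# midpoint with nothing excluded
	mid1=`git rev-list --bisect --all`
	git rev-list $mid1 | wc -l
	git rev-list ^$mid1 --all | wc -l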
The $initial commit is the very first commit of the git.git
history. The first midpoint bisects the history evenly, as
designed, but the second does not.
The reason I got interested in this was that I was wondering
if something like the following would help people converting a
huge repository from a foreign SCM, or preparing a repository to
be fetched over plain dumb HTTP only:
	#!/bin/sh
	N=4
	P=.git/objects/pack
	bottom=
	while test "$N" -gt 0
	do
		N=$((N-1))
		# next cut-off: bisect everything, or only what lies above $bottom
		if test -z "$bottom"
		then
			newbottom=`git rev-list --bisect --all`
		else
			newbottom=`git rev-list --bisect ^$bottom --all`
		fi
		# the slice of history to pack in this round
		if test -z "$bottom"
		then
			rev_list="$newbottom"
		elif test 0 = $N
		then
			rev_list="^$bottom --all"
		else
			rev_list="^$bottom $newbottom"
		fi
		p=$(git rev-list --unpacked --objects $rev_list |
			git pack-objects $P/pack)
		git show-index <$P/pack-$p.idx | wc -l   # objects in the new pack
		bottom=$newbottom
	done
The idea is to pack the older half of the history into one pack, then
the older half of the remaining history into another, and to continue a
few times, using finer granularity as we get closer to the tip.
This may not matter, since for a truly huge history, running
bisect a number of times could be quite time-consuming, and we
might be better off running "git rev-list --all" once into a
temporary file, and manually picking cut-off points from the
resulting list of commits. After all, we are talking about
"approximately half" for such a usage, and older history does
not matter much.
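The cheaper alternative, listing commits once and picking cut-off points by position, might look like the following sketch. It assumes the list is in the default "git rev-list --all" order (newest first) and uses line position as a stand-in for "approximately half", which is only reasonable for mostly linear history; the helper name is hypothetical.

```sh
# pick_cutoffs FILE COUNT (hypothetical helper): FILE holds the output of
# "git rev-list --all", newest commit first, one per line.  Print up to
# COUNT cut-off commits; each sits at about the middle of the commits
# newer than the previous one, so granularity gets finer toward the tip.
pick_cutoffs () {
	awk -v count="$2" '
	{ line[NR] = $0 }                   # remember every commit by position
	END {
		pos = NR
		for (i = 0; i < count; i++) {
			pos = int(pos / 2)          # halve the newer part each time
			if (pos < 1)
				break
			print line[pos]
		}
	}' "$1"
}
```

For example, "git rev-list --all >revs.txt" followed by "pick_cutoffs revs.txt 4" would print up to four commits that could play the role of the successive $newbottom values in the loop above, without running bisect repeatedly.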
Signed-off-by: Junio C Hamano <junkio@cox.net>
Partly because we've messed up and now have some commits with trailing
whitespace, but partly because this also just simplifies the code, let's
remove trailing whitespace from the end when pretty-printing commits.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Noticed by Johannes. We do not install it anymore, but still have
been shipping the source, which was crazy.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes the f4ee3eb689 breakage, which mistakenly added an
extra trailing blank line after stripping trailing blank lines.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Relying on the eye-candy progress bar was fragile to begin with.
Run fetch-pack with the -k option, and count the objects in the
pack that was transferred from the other end.
Signed-off-by: Junio C Hamano <junkio@cox.net>
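The regexp on the right-hand side of the expr : operator was
broken:
	expr 'z+pu:refs/tags/ko-pu' : 'z\+\(.*\)'
does not strip '+', because in GNU expr's basic regexps '\+' means
"one or more of the preceding character", so 'z\+' consumes only
the 'z' and the '+' ends up in the captured group. Write
'z+\(.*\)' instead.
We should probably switch to shell-based substring operations
after 1.3.0; those are not bashisms but plain POSIX anyway.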
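A sketch of both spellings, using the ref from the example above. The force variable is a hypothetical stand-in for whatever the caller does with the leading '+', and the expected outputs assume an expr whose basic regexps treat a bare '+' as an ordinary character, as POSIX requires.

```sh
ref='+pu:refs/tags/ko-pu'

# Broken with GNU expr: leaves the "+" in the captured group.
#   expr "z$ref" : 'z\+\(.*\)'
# A bare "+" is an ordinary character in a basic regexp, so this strips it:
stripped=$(expr "z$ref" : 'z+\(.*\)')
echo "$stripped"                    # pu:refs/tags/ko-pu

# POSIX shell alternative: no expr at all, just parameter expansion.
force=
case "$ref" in
+*)
	force=t                         # hypothetical: remember the "+" prefix
	ref=${ref#+} ;;
esac
echo "$ref"                         # pu:refs/tags/ko-pu
```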
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, you can say "git diff --stat" (to get an idea how many changes are
uncommitted), or "git log --stat".
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some words, e.g., `match', are special to expr(1), and cause strange
parsing effects. Track down all uses of expr and mangle the arguments
so that this isn't a problem.
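A minimal sketch of this kind of mangling, assuming GNU expr (whose keywords include match, length, index and substr). It shows the general "prefix a z" idiom rather than the exact edits made by the patch.

```sh
word=match

# Risky with GNU expr: a first operand such as "match", "length", "index"
# or "substr" can be taken as a keyword instead of a string, e.g.
#   expr "$word" : '.*'
# Mangled: put a "z" in front so the operand can never be a keyword (or
# start with "-"), and account for the extra character afterwards.
len=$(expr "z$word" : '.*')         # length of "zmatch"
len=$((len - 1))                    # minus the added "z"
echo "$len"                         # 5
copy=$(expr "z$word" : 'z\(.*\)')   # capture everything after the "z"
echo "$copy"                        # match
```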
Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When running the t3600-rm test under fakeroot (or as root), we
cannot make a file unremovable with "chmod a-w .". Detect this
case early and skip that test.
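One way to detect the condition, as a hypothetical sketch rather than the code actually added to t3600-rm: create a file in a scratch directory, make the directory read-only, and check whether the file can still be removed.

```sh
# can_make_unremovable (hypothetical helper): succeed if a read-only
# directory really protects its entries from removal.  As root, or under
# fakeroot, the removal goes through anyway and the helper fails.
can_make_unremovable () {
	probe_dir=unremovable-probe.$$
	mkdir "$probe_dir" || return 2
	: >"$probe_dir/file"
	chmod a-w "$probe_dir"
	rm -f "$probe_dir/file" 2>/dev/null
	if test -f "$probe_dir/file"
	then
		probe_result=0              # file survived: permissions are enforced
	else
		probe_result=1              # removed anyway: (fake) root
	fi
	chmod u+w "$probe_dir"
	rm -rf "$probe_dir"
	return $probe_result
}

if can_make_unremovable
then
	echo "running the unremovable-file test"
else
	echo "skipping the unremovable-file test (root or fakeroot?)"
fi
```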
Signed-off-by: Junio C Hamano <junkio@cox.net>
This trivially avoids keeping the commit message data around after we
don't need it any more, avoiding a continually growing "git log" memory
footprint.
It's not a huge deal, but it's somewhat noticeable. For the current kernel
tree, a full "git log" gave:
- before: /usr/bin/time git log > /dev/null
  0.81user 0.02system 0:00.84elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+8851minor)pagefaults 0swaps
- after: /usr/bin/time git log > /dev/null
  0.79user 0.03system 0:00.83elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+5039minor)pagefaults 0swaps
i.e., the number of touched pages (minor page faults) dropped from 8851 to
5039. For the historic kernel archive, the numbers are 18357->11037 minor
page faults.
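To put the page-fault figures quoted above in proportion, the relative drop can be computed with awk (the helper name is hypothetical):

```sh
# percent_drop BEFORE AFTER (hypothetical helper): relative reduction,
# printed with one decimal place.
percent_drop () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%.1f%%\n", 100 * (before - after) / before }'
}

percent_drop 8851 5039      # 43.1%  (current kernel tree)
percent_drop 18357 11037    # 39.9%  (historic kernel archive)
```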
We could/should in theory free the commits themselves, but that's really a
lot harder, since during revision traversal we may hit the same commit
twice through different children having it as a parent, even after we've
shown it once (when that happens, we'll silently ignore it next time, but
we still need the "struct commit" to know).
And as the commit message data is clearly the biggest part of the commit,
this is the really easy 60% solution.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This target lists commands that are undocumented, or whose documentation
is not referenced from the main git documentation.
For now, there are some exceptions I added primarily because I
lack the energy to document them myself:
- merge backends (we should really document them)
- ssh-push/ssh-pull (does anybody still use them?)
- annotate and blame (maybe after one of them eats the other ;-)
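A rough shell approximation of such a check, assuming that each command's page lives in DOCDIR/<command>.txt and that the main page is DOCDIR/git.txt. The helper is hypothetical, and the real Makefile target may work differently.

```sh
# list_undocumented DOCDIR CMD... (hypothetical helper): print each
# command that has no DOCDIR/CMD.txt page, or whose name never appears
# in DOCDIR/git.txt.  The name check is a plain substring match, so it
# is only approximate (git-log would also match git-log-foo).
list_undocumented () {
	docdir=$1
	shift
	for cmd
	do
		if test ! -f "$docdir/$cmd.txt"
		then
			echo "no doc: $cmd"
		elif ! grep -F -q -e "$cmd" "$docdir/git.txt"
		then
			echo "no link: $cmd"
		fi
	done
}
```

For example, "list_undocumented Documentation git-annotate git-blame git-ssh-pull" would report whichever of those commands lack a page or a mention in git.txt.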
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/combine:
stripspace: make sure not to leave an incomplete line.
git-commit: do not muck with commit message when no_edit is set.
When showing a commit message, do not lose an incomplete line.
Retire t5501-old-fetch-and-upload test.
combine-diff: type fix.
* master:
stripspace: make sure not to leave an incomplete line.
git-commit: do not muck with commit message when no_edit is set.
When showing a commit message, do not lose an incomplete line.
Retire t5501-old-fetch-and-upload test.
The variable hunk_end holds a line number, so make it unsigned long
like all the other variables that hold line numbers.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When dealing with a commit log message for human consumption, it
never makes sense to keep a log that ends with an incomplete
line, so make it a part of the clean-up process done by
git-stripspace.
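As a rough illustration of the clean-up rules, here is a hypothetical awk-based approximation of stripspace. The C program remains the authority; this sketch only mirrors the behaviour as described: trailing whitespace removed, runs of blank lines squeezed, leading and trailing blank lines dropped, and the output always ending in a newline.

```sh
# stripspace (hypothetical shell approximation, not git's C source).
stripspace () {
	awk '
	{
		sub(/[ \t]+$/, "")          # drop trailing whitespace
		if ($0 == "") {             # blank: defer until more text shows up
			if (seen)
				pending = 1
			next
		}
		if (pending)
			print ""                # at most one blank line between paragraphs
		pending = 0
		seen = 1
		print                       # print appends ORS, a newline
	}'
}
```

For example, "printf 'Subject  \n\n\nBody text' | stripspace" yields "Subject", one blank line and "Body text", with the last line properly terminated.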
Acked-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Spotted by Linus and Darrin Thompson. When we took a commit
message from -F <file> with an incomplete line, we appended "git
status" output, which ended up attaching a lone "#" at the end.
We still need the "do we have anything to commit?" check by
running "status" (which has to know what to do in different
cases with -i/-o/-a), but there is no point appending its output
to the proposed commit message given by the user.
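The failure mode is easy to reproduce with plain files. The sketch below (hypothetical helper and file name) detects a message that ends in an incomplete line, and shows how an appended "#" line gets glued onto it.

```sh
# ends_with_incomplete_line FILE (hypothetical helper): true when FILE is
# non-empty and its last byte is not a newline.  Command substitution
# drops a trailing newline, so "$(tail -c 1 FILE)" is empty exactly when
# the file ends with one.
ends_with_incomplete_line () {
	test -s "$1" && test -n "$(tail -c 1 "$1")"
}

# Demonstration with a made-up file: a -F style message lacking its final
# newline, followed by an appended "#" line as in the bug described above.
printf 'Fix the frobnicator' >demo-msg
if ends_with_incomplete_line demo-msg
then
	echo "demo-msg ends with an incomplete line"
fi
printf '#\n' >>demo-msg
head -n 1 demo-msg                  # Fix the frobnicator#
rm -f demo-msg
```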
Signed-off-by: Junio C Hamano <junkio@cox.net>
The previous round showed the delete-only hunks at the end, but
forgot to mark them as interesting when they were.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to lose hunks that appear at the end and contain only
deletions. This makes sure that the record beyond the end of the
file (which holds such deletions) is examined.
Signed-off-by: Junio C Hamano <junkio@cox.net>