When creating a pack using objects that reside in existing packs, we try
to avoid recomputing futile delta between an object (trg) and a candidate
for its base object (src) if they are stored in the same packfile, and trg
is not recorded as a delta already. This heuristics makes sense because it
is likely that we tried to express trg as a delta based on src but it did
not produce a good delta when we created the existing pack.
As the pack heuristics prefer producing delta to remove data, and Linus's
law dictates that the size of a file grows over time, we tend to record
the newest version of the file as inflated, and older ones as delta
against it.
When creating a thin-pack to transfer recent history, it is likely that we
will try to send an object that is recorded in full, as it is newer. But
the heuristics to avoid recomputing futile delta effectively forbids us
from attempting to express such an object as a delta based on another
object. Sending an object in full is often more expensive than sending a
suboptimal delta based on other objects, and it is even more so if we
could use an object we know the receiving end already has (i.e. preferred
base object) as the delta base.
Tweak the recomputation avoidance logic, so that we do not punt on
computing delta against a preferred base object.
The effect of this change can be seen on two simulated upload-pack
workloads. The first is based on 44 reflog entries from my git.git
origin/master reflog, and represents the packs that kernel.org sent me git
updates for the past month or two. The second workload represents much
larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to
v1.2.0, and so on.
The table below shows the average generated pack size and the average CPU
time consumed for each dataset, both before and after the patch:
dataset
| reflog | tags
---------------------------------
before | 53358 | 2750977
size after | 32398 | 2668479
change | -39% | -3%
---------------------------------
before | 0.18 | 1.12
CPU after | 0.18 | 1.15
change | +0% | +3%
This patch makes a much bigger difference for packs with a shorter slice
of history (since its effect is seen at the boundaries of the pack) though
it has some benefit even for larger packs.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When showing the raw timestamp, we format the numeric
seconds-since-epoch into a buffer, followed by the timezone
string. This string has come straight from the commit
object. A well-formed object should have a timezone string
of only a few bytes, but we could be operating on data
pushed by a malicious user.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fetch from a remote, we print a status table like:
From url
* [new branch] foo -> origin/foo
We create this table in a static buffer using sprintf. If
the remote refnames are long, they can overflow this buffer
and smash the stack.
Instead, let's use a strbuf to build the string.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment on top of stripspace() claims that the buffer
will no longer be NUL-terminated. However, this has not been
the case at least since the move to using strbuf in 2007.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running git describe --dirty the index should be refreshed. Previously
the cached index would cause describe to think that the index was dirty when,
in reality, it was just stale.
The issue was exposed by python setuptools which hardlinks files into another
directory when building a distribution.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git remote rename' will only update the remote's fetch refspec if it
looks like a default one. If the remote has no default fetch refspec,
as in
[remote "origin"]
url = git://git.kernel.org/pub/scm/git/git.git
fetch = +refs/heads/*:refs/remotes/upstream/*
we would not update the fetch refspec and even if there is a ref
called "refs/remotes/origin/master", we should not rename it, since it
was not created by fetching from the remote.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote, we also try to update the fetch refspec
accordingly, but only if it has the default format. For others, such
as refs/heads/master:refs/heads/origin, we are conservative and leave
it untouched. Let's give the user a warning about refspecs that are
not updated, so he can manually update the config if necessary.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote called 'o' using 'git remote rename o foo', git
should also rename any remote-tracking branches for the remote. This
does happen, but any remote-tracking branches starting with
'refs/remotes/o', such as 'refs/remotes/origin/bar', will also be
renamed (to 'refs/remotes/foorigin/bar' in this case).
Fix it by simply matching one more character, up to the slash
following the remote name.
Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote whose name is contained in a configured fetch
refspec for that remote, we currently replace the first occurrence of
the remote name in the refspec. This is correct in most cases, but
breaks if the remote name occurs in the fetch refspec before the
expected place. For example, we currently change
[remote "remote"]
url = git://git.kernel.org/pub/scm/git/git.git
fetch = +refs/heads/*:refs/remotes/remote/*
into
[remote "origin"]
url = git://git.kernel.org/pub/scm/git/git.git
fetch = +refs/heads/*:refs/origins/remote/*
Reduce the risk of changing incorrect sections of the refspec by
matching the entire ":refs/remotes/<name>/" instead of just "<name>".
Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It makes no sense to do the - possibly very expensive - call to "rev-list
<new-ref-sha1> --not --all" in check_for_new_submodule_commits() when
there aren't any submodules configured.
Leave check_for_new_submodule_commits() early when no name <-> path
mappings for submodules are found in the configuration. To make that work
reading the configuration had to be moved further up in cmd_fetch(), as
doing that after the actual fetch of the superproject was too late.
Reported-by: Martin Fick <mfick@codeaurora.org>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit ffa69e61d3, reversing
changes made to 4a13c4d148.
Adding a new command line option to receive-pack and feed it from
send-pack is not an acceptable way to add features, as there is no
guarantee that your updated send-pack will be talking to updated
receive-pack. New features need to be added via the capability mechanism
negotiated over the protocol.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked if "refs///heads/master" is valid, check-ref-format says "Yes,
it is well formed", and when asked to print canonical form, it shows
"refs/heads/master". This is so that it can be tucked after "$GIT_DIR/"
to form a valid pathname for a loose ref, and we normalize a pathname like
"$GIT_DIR/refs///heads/master" to de-dup the slashes in it.
Similarly, when asked if "/refs/heads/master" is valid, check-ref-format
says "Yes, it is Ok", but the leading slash is not removed when printing,
leading to "$GIT_DIR//refs/heads/master".
Fix it to make sure such leading slashes are removed. Add tests that such
refnames are accepted and normalized correctly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cloning from a local repository blindly copies or hardlinks all the files
under objects/ hierarchy. This results in two issues:
- If the repository cloned has an "objects/info/alternates" file, and the
command line of clone specifies --reference, the ones specified on the
command line get overwritten by the copy from the original repository.
- An entry in a "objects/info/alternates" file can specify the object
stores it borrows objects from as a path relative to the "objects/"
directory. When cloning a repository with such an alternates file, if
the new repository is not sitting next to the original repository, such
relative paths needs to be adjusted so that they can be used in the new
repository.
This updates add_to_alternates_file() to take the path to the alternate
object store, including the "/objects" part at the end (earlier, it was
taking the path to $GIT_DIR and was adding "/objects" itself), as it is
technically possible to specify in objects/info/alternates file the path
of a directory whose name does not end with "/objects".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also add a test to expose a long-standing bug that is triggered when
cloning with --reference option from a local repository that has its own
alternates. The alternate object stores specified on the command line
are lost, and only alternates copied from the source repository remain.
The bug will be fixed in the next patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function was not gentle at all to the callers and died without giving
them a chance to deal with possible errors. Rename it to read_gitfile(),
and update all the callers.
As no existing caller needs a true "gently" variant, we do not bother
adding one at this point.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A malicious server can return ACK with non-existent SHA-1 or not a
commit. lookup_commit() in this case may return NULL. Do not let
fetch-pack crash by accessing NULL address in this case.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first paragraph about flag order is no longer true and is
mentioned in git-checkout-index.txt. The rest is also mentioned in
git-checkout-index.txt.
Remove it and keep uptodate document in one place.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following sequence of commands reveals an issue with error
reporting of relative paths:
$ mkdir sub
$ cd sub
$ git ls-files --error-unmatch ../bbbbb
error: pathspec 'b' did not match any file(s) known to git.
$ git commit --error-unmatch ../bbbbb
error: pathspec 'b' did not match any file(s) known to git.
This bug is visible only if the normalized path (i.e., the relative
path from the repository root) is longer than the prefix.
Otherwise, the code skips over the normalized path and reads from
an unused memory location which still contains a leftover of the
original command line argument.
So instead, use the existing facilities to deal with relative paths
correctly.
Also fix inconsistency between "checkout" and "commit", e.g.
$ cd Documentation
$ git checkout nosuch.txt
error: pathspec 'Documentation/nosuch.txt' did not match...
$ git commit nosuch.txt
error: pathspec 'nosuch.txt' did not match...
by propagating the prefix down the codepath that reports the error.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many pathnames in a fast-import stream need to be quoted. In
particular:
1. Pathnames at the end of an "M" or "D" line need quoting
if they contain a LF or start with double-quote.
2. Pathnames on a "C" or "R" line need quoting as above,
but also if they contain spaces.
For (1), we weren't quoting at all. For (2), we put
double-quotes around the paths to handle spaces, but ignored
the possibility that they would need further quoting.
This patch checks whether each pathname needs c-style
quoting, and uses it. This is slightly overkill for (1),
which doesn't actually need to quote many characters that
vanilla c-style quoting does. However, it shouldn't hurt, as
any implementation needs to be ready to handle quoted
strings anyway.
In addition to adding a test, we have to tweak a test which
blindly assumed that case (2) would always use
double-quotes, whether it needed to or not.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reflog manpage says:
git reflog [show] [log-options] [<ref>]
the subcommand 'show' is the default "in the absence of any
subcommands". Currently this is only true if the user provided either
at least one option or no additional argument at all. For example:
git reflog master
won't work. Change this by actually calling cmd_log_reflog in
absence of any subcommand.
Signed-off-by: Michael Schubert <mschub@elegosoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, git push --quiet produces some non-error output, e.g.:
$ git push --quiet
Unpacking objects: 100% (3/3), done.
Add the --quiet option to send-pack/receive-pack and pass it to
unpack-objects in the receive-pack codepath and to receive-pack in
the push codepath.
This fixes a bug reported for the fedora git package:
https://bugzilla.redhat.com/show_bug.cgi?id=725593
Reported-by: Jesse Keating <jkeating@redhat.com>
Cc: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the case of a corrupt repository, git ls-tree may report an error but
presently it exits with a code of 0.
This change uses the return code of read_tree_recursive instead.
Improved-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Jon Seymour <jon.seymour@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reset command creates its reflog entry from argv.
However, it does so after having run parse_options, which
means the only thing left in argv is any non-option
arguments. Thus you would end up with confusing reflog
entries like:
$ git reset --hard HEAD^
$ git reset --soft HEAD@{1}
$ git log -2 -g --oneline
8e46cad HEAD@{0}: HEAD@{1}: updating HEAD
1eb9486 HEAD@{1}: HEAD^: updating HEAD
However, we must also consider that some scripts may set
GIT_REFLOG_ACTION before calling reset, and we need to show
their reflog action (with our text appended). For example:
rebase -i (squash): updating HEAD
On top of that, we also set the ORIG_HEAD reflog action
(even though it doesn't generally exist). In that case, the
reset argument is somewhat meaningless, as it has nothing to
do with what's in ORIG_HEAD.
This patch changes the reset reflog code to show:
$GIT_REFLOG_ACTION: updating {HEAD,ORIG_HEAD}
as before, but only if GIT_REFLOG_ACTION is set. Otherwise,
show:
reset: moving to $rev
for HEAD, and:
reset: updating ORIG_HEAD
for ORIG_HEAD (this is still somewhat superfluous, since we
are in the ORIG_HEAD reflog, obviously, but at least we now
mention which command was used to update it).
While we're at it, we can clean up the code a bit:
- Use strbufs to make the message.
- Use the "rev" parameter instead of showing all options.
This makes more sense, since it is the only thing
impacting the writing of the ref.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because "diff --cached HEAD" showed an incorrect blob object name on the
LHS of the diff, we ended up updating the index entry with bogus value,
not what we read from the tree.
Noticed by John Nowak.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 03feddd (git-check-ref-format: reject funny ref names, 2005-10-13),
"git branch -d" can take more than one branch names to remove.
The documentation was correct, but the usage string was not.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, "git tag -l foo* bar*" would silently ignore the
second argument, showing only refs starting with "foo". It's
not just unfriendly not to take a second pattern; we
actually generated subtly wrong results (from the user's
perspective) because some of the requested tags were
omitted.
This patch allows an arbitrary number of patterns on the
command line; if any of them matches, the ref is shown.
While we're tweaking the documentation, let's also make it
clear that the pattern is fnmatch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we want to know if commit A contains commit B (or any
one of a set of commits, B through Z), we generally
calculate the merge bases and see if B is a merge base of A
(or for a set, if any of the commits B through Z have that
property).
When we are going to check a series of commits A1 through An
to see whether each contains B (e.g., because we are
deciding which tags to show with "git tag --contains"), we
do a series of merge base calculations. This can be very
expensive, as we repeat a lot of traversal work.
Instead, let's leverage the fact that we are going to use
the same --contains list for each tag, and mark areas of the
commit graph is definitely containing those commits, or
definitely not containing those commits. Later tags can then
stop traversing as soon as they see a previously calculated
answer.
This sped up "git tag --contains HEAD~200" in the linux-2.6
repository from:
real 0m15.417s
user 0m15.197s
sys 0m0.220s
to:
real 0m5.329s
user 0m5.144s
sys 0m0.184s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.
But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.
In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.
Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().
There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When create a new branch, we fed "refs/heads/<proposed name>" as a string
to get_sha1() and expected it to fail when a branch already exists.
The right way to check if a ref exists is to check with resolve_ref().
A naïve solution that might appear attractive but does not work is to
forbid slashes in get_describe_name() but that will not work. A describe
name is is in the form of "ANYTHING-g<short sha1>", and that ANYTHING part
comes from a original tag name used in the repository the user ran the
describe command. A sick user could have a confusing hierarchical tag
whose name is "refs/heads/foobar" (stored as refs/tags/refs/heads/foobar")
to generate a describe name "refs/heads/foobar-6-g02ac983", and we should
be able to use that name to refer to the object whose name is 02ac983.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit simply hijacked --quiet and essentially made it into a
no-op. Instead, take it as a cue that the end user wants to omit the patch
output from commands that default to show patches, e.g. "show".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In older versions of git, we used rfc822 header folding to
indicate that the original subject line had multiple lines
in it. But since a1f6baa (format-patch: wrap long header
lines, 2011-02-23), we now use header folding whenever there
is a long line.
This means that "git am" cannot trust header folding as a
sign from format-patch that newlines should be preserved.
Instead, format-patch needs to signal more explicitly that
the newlines are significant. This patch does so by
rfc2047-encoding the newlines in the subject line. No
changes are needed on the "git am" end; it already decodes
the newlines properly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a pretty_print_context representing the parameters
for a pretty-print session, but we did not use it uniformly.
As a result, functions kept growing more and more arguments.
Let's clean this up in a few ways:
1. All pretty-print pp_* functions now take a context.
This lets us reduce the number of arguments to these
functions, since we were just passing around the
context values separately.
2. The context argument now has a cmit_fmt field, which
was passed around separately. That's one less argument
per function.
3. The context argument always comes first, which makes
calling a little more uniform.
This drops lines from some callers, and adds lines in a few
places (because we need an extra line to set the context's
fmt field). Overall, we don't save many lines, but the lines
that are there are a lot simpler and more readable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many callers don't actually care about the pretty print
context at all; let's just give them a simple way of
pretty-printing a commit without having to create a context
struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without the "-k" option, mailinfo will convert a folded
subject header like:
Subject: this is a
subject that doesn't
fit on one line
into a single line. With "-k", however, we assumed that
these newlines were significant and represented something
that the sending side would want us to preserve.
For messages created by format-patch, this assumption was
broken by a1f6baa (format-patch: wrap long header lines,
2011-02-23). For messages sent by arbitrary MUAs, this was
probably never a good assumption to make, as they may have
been folding subjects in accordance with rfc822's line
length recommendations all along.
This patch now joins folded lines with a single whitespace
character. This treats header folding purely as a syntactic
feature of the transport mechanism, not as something that
format-patch is trying to tell us about the original
subject.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 13fc2c1 (remote: disallow some nonsensical option
combinations, 2011-03-30) made it impossible to use "remote
add -t foo --mirror". The argument was that specifying
specific branches is useless because:
1. Push mirrors do not want a refspec at all.
2. The point of fetch mirroring is to use a broad refspec
like "refs/*", but using "-t" overrides that.
Point (1) is valid; "-t" with push mirrors is useless. But
point (2) ignored another side effect of using --mirror: it
fetches the refs directly into the refs/ namespace as they
are found upstream, instead of placing them in a
separate-remote layout.
So 13fc2c1 was overly constrictive, and disallowed
reasonable specific-branch mirroring, like:
git remote add -t heads/foo -t heads/bar --mirror=fetch
which makes the local "foo" and "bar" branches direct
mirrors of the remote, but does not fetch anything else.
This patch restores the original behavior, but only for
fetch mirrors.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The option can be used to check if read-tree with the same set of other
options like "-m" and "-u" would succeed without actually changing either
the index or the working tree.
The relevant tests in the t10?? range were extended to do a read-tree -n
before the real read-tree to make sure neither the index nor any local
files were changed with -n and the same exit code as without -n is
returned. The helper functions added for that purpose reside in the new
t/lib-read-tree.sh file.
The only exception is #13 in t1004 ("unlinking an un-unlink-able
symlink"). As this is an issue of wrong directory permissions it is not
detected with -n.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When receiving a push, we advertise ref tips from any
alternate repositories, in case that helps the client send a
smaller pack. Since these refs don't actually exist in the
destination repository, we don't transmit the real ref
names, but instead use the pseudo-ref ".have".
If your alternate has a large number of duplicate refs (for
example, because it is aggregating objects from many related
repositories, some of which will have the same tags and
branch tips), then we will send each ".have $sha1" line
multiple times. This is a pointless waste of bandwidth, as
we are simply repeating the same fact to the client over and
over.
This patch eliminates duplicate .have refs early on. It does
so efficiently by sorting the complete list and skipping
duplicates. This has the side effect of re-ordering the
.have lines by ascending sha1; this isn't a problem, though,
as the original order was meaningless.
There is a similar .have system in fetch-pack, but it
does not suffer from the same problem. For each alternate
ref we consider in fetch-pack, we actually open the object
and mark it with the SEEN flag, so duplicates are
automatically culled.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The foreach_alt_odb function triggers a callback for each
alternate object db we have, with room for a single void
pointer as data. Currently, we always call refs_from_alternate_cb
as the callback function, and then pass another callback (to
receive each ref individually) as the void pointer.
This has two problems:
1. C technically forbids stuffing a function pointer into
a "void *". In practice, this probably doesn't matter
on any architectures git runs on, but it never hurts to
follow the letter of the law.
2. There is no room for an extra data pointer. Indeed, the
alternate_ref_fn that refs_from_alternate_cb calls
takes a void* for data, but we always pass it NULL.
Instead, let's properly stuff our function pointer into a
data struct, which also leaves room for an extra
caller-supplied data pointer. And to keep things simple for
existing callers, let's make a for_each_alternate_ref
function that takes care of creating the extra struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We add every local ref to a list so that we can mark them
and all of their ancestors back to a certain cutoff point.
However, if some refs point to the same commit, we will end
up adding them to the list many times.
Furthermore, since commit_lists are stored as linked lists,
we must do an O(n) traversal of the list in order to find
the right place to insert each commit. This makes building
the list O(n^2) in the number of refs.
For normal repositories, this isn't a big deal. We have a
few hundreds refs at most, and most of them are unique. But
consider an "alternates" repo that serves as an object
database for many other similar repos. For reachability, it
needs to keep a copy of the refs in each child repo. This
means it may have a large number of refs, many of which
point to the same commits.
By noting commits we have already added to the list, we can
shrink the size of "n" in such a repo to the number of
unique commits, which is on the order of what a normal repo
would contain (it's actually more than a normal repo, since child repos
may have branches at different states, but in practice it tends
to be much smaller than the list with duplicates).
Here are the results on one particular giant repo
(containing objects for all Rails forks on GitHub):
$ git for-each-ref | wc -l
112514
[before]
$ git fetch --no-tags ../remote.git
63.52user 0.12system 1:03.68elapsed 99%CPU (0avgtext+0avgdata 137648maxresident)k
1856inputs+48outputs (11major+19603minor)pagefaults 0swaps
$ git fetch --no-tags ../remote.git
6.15user 0.08system 0:06.25elapsed 99%CPU (0avgtext+0avgdata 123856maxresident)k
0inputs+40outputs (0major+18872minor)pagefaults 0swaps
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of barfing, simply ignore bad object names seen in the
input. This is useful when reading from "git notes list" output
that may refer to objects that have already been garbage collected.
Signed-off-by: Junio C Hamano <gitster@pobox.com>