The builtin-clone now does the http commit walking and the tree unpacking
in the same process, and the commit walker leaves the in-core objects in a
funny state. When forgetting the data read from the tree object, the
object should be marked "not parsed yet" for later users.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also fix an underallocation in walker.c::interpret_target().
Signed-off-by: Krzysztof Kowalczyk <kkowalczyk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies a few things, makes a few things slightly more
complicated, but, more importantly, allows that, when struct ref can
represent a symref, http_fetch_ref() can return one.
Incidentally makes the string that http_fetch_ref() gets include "refs/"
(if appropriate), because that's how the name field of struct ref works.
As far as I can tell, the usage in walker:interpret_target() wouldn't have
worked previously, if it ever would have been used, which it wouldn't
(since the fetch process uses the hash instead of the name of the ref
there).
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This error message prints the reponse from the server at this point.
Label it as such in the output.
Signed-off-by: Sam Vilain <sam@vilain.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This turns the extern functions to be provided by the backend into a
struct of pointers, renames the functions to be more
namespace-friendly, and updates http-fetch to this interface. It
removes the unused include from http-push.c. It makes git-http-fetch a
builtin (with the implementation a separate file, accessible
directly).
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_line is now strbuf_getline, and is a first class citizen, it returns 0
when reading a line worked, EOF else.
The ->eof marker was used non-locally by fast-import.c, mimic the same
behaviour using a static int in "read_next_command", that now returns -1 on
EOF, and avoids to call strbuf_getline when it's in EOF state.
Also no longer automagically strbuf_release the buffer, it's counter
intuitive and breaks fast-import in a very subtle way.
Note: being at EOF implies that command_buf.len == 0.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add strbuf_rtrim to remove trailing spaces.
* Add strbuf_insert to insert data at a given position.
* Off-by one fix in strbuf_addf: strbuf_avail() does not counts the final
\0 so the overflow test for snprintf is the strict comparison. This is
not critical as the growth mechanism chosen will always allocate _more_
memory than asked, so the second test will not fail. It's some kind of
miracle though.
* Add size extension hints for strbuf_init and strbuf_read. If 0, default
applies, else:
+ initial buffer has the given size for strbuf_init.
+ first growth checks it has at least this size rather than the
default 8192.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this patch, the code would look for the submodule
commits in the superproject and (needlessly) fail when it
couldn't find them.
Signed-off-by: Sven Verdoolaege <skimo@liacs.nl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes slightly more lines than it adds, but the real reason for
doing this is that future optimizations will require more setup of the
tree descriptor, and so we want to do it in one place.
Also renamed the "desc.buf" field to "desc.buffer" just to trigger
compiler errors for old-style manual initializations, making sure I
didn't miss anything.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
Otherwise there're such things like:
Cannot obtain needed none 9a6e87b60dbd2305c95cecce7d9d60f849a0658d
while processing commit 0000000000000000000000000000000000000000.
which while looks weird. What is the none needed for?
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a "int *flag" parameter to resolve_ref() and makes
for_each_ref() family to call callback function with an extra
"int flag" parameter. They are used to give two bits of
information (REF_ISSYMREF and REF_ISPACKED) about the ref.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a long overdue fix to the API for for_each_ref() family
of functions. It allows the callers to specify a callback data
pointer, so that the caller does not have to use static
variables to communicate with the callback funciton.
The updated for_each_ref() family takes a function of type
int (*fn)(const char *, const unsigned char *, void *)
and a void pointer as parameters, and calls the function with
the name of the ref and its SHA-1 with the caller-supplied void
pointer as parameters.
The commit updates two callers, builtin-name-rev.c and
builtin-pack-refs.c as an example.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like xmalloc and xrealloc xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.
I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.
[jc: removed the part that deals with last_XXX, which I am
finding more and more dubious these days.]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.
A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*. This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
[jc: Splitted the patch to "master" part, to be followed by a
patch for merge-recursive.c which is not in "master" yet.
Fixed the cast in the latter hunk to combine-diff.c which was
wrong in the original.
Also converted ones left-over in combine-diff.c, diff-lib.c and
upload-pack.c ]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With the latest changes in fetch.c, http-fetch crashed accessing
write_ref[i], where write_ref was NULL.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes it possible to fetch many commits (refs) at once, greatly
speeding up cg-clone.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
pull() now takes an array of arguments instead of just one of each kind.
Currently, no users use the new capability, but that'll change.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Currently it's a bit weird that pull() takes a single argument
describing the commit but takes the write_ref from a global variable.
This makes it take that as a parameter as well, which might be nicer
for the libification in the future, but especially it will make for
nicer code when we implement pull()ing multiple commits at once.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.
Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.
In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.
This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.
There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.
Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A few style fixes to get the code in line with the rest.
- asterisk to make a type a pointer to something goes in front
of the variable, not at the end of the base type.
E.g. a pointer to an integer is "int *ip", not "int* ip".
- open parenthesis for function parameter list, unlike
syntactic constructs, comes immediately after the function
name. E.g. "if (foo) bar();" not "if(foo) bar ();".
- "else" does not come on the same line as the closing brace of
corresponding "if".
The style is mostly a matter of personal taste, and people may
disagree, but consistency is important.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This function reads a freshly fetched tree object, and schedules
the objects pointed by it for further fetching, so doing
lookup_tree() and process_tree() recursively from there does not
make much sense. We need to use process() on it to make sure we
fetch it first, and leave the recursive processing to later
stages.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".
It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.
This allows tree traversal with
struct tree_desc desc;
struct name_entry entry;
desc.buf = tree->buffer;
desc.size = tree->size;
while (tree_entry(&desc, &entry) {
... use "entry.{path, sha1, mode, pathlen}" ...
}
which is not only shorter than writing it out in full, it's hopefully less
error prone too.
[ It's actually a tad faster too - we don't need to recalculate the entry
pathlength in both extract and update, but need to do it only once.
Also, some callers can avoid doing a "strlen()" on the result, since
it's returned as part of the name_entry structure.
However, by now we're talking just 1% speedup on "git-rev-list --objects
--all", and we're definitely at the point where tree walking is no
longer the issue any more. ]
NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.
We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This leaves only the horrid code in builtin-read-tree.c using the old
interface. Some day I will gather the strength to tackle that one too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.
The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.
Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This finally removes the tree-entry list from "struct tree", since most of
the users can just use the tree-walk infrastructure to walk the raw tree
buffers instead of the tree-entry list.
The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is generally
no harder to use than following a linked list, and allows us to do most
tree parsing in-place.
Some programs still use the old tree-entry lists, and are a bit painful to
convert without major surgery. For them we have a helper function that
creates a temporary tree-entry list on demand. We can convert those too
eventually, but with this they no longer affect any users who don't need
the explicit lists.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Funnily enough, this variable was never assigned ever since it
was introduced, and has been protecting some code that has never
been executed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
If a ref is changed by http-fetch, local-fetch or ssh-fetch
record the change and the remote URL/name in the log for the ref.
This requires loading the config file to check logAllRefUpdates.
Also fixed a bug in the ref lock generation; the log file name was
not being produced right due to a bad prefix length.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Created 'struct ref_lock' to contain the data necessary to perform
a ref update. This change improves writing a ref as the file names
are generated only once (rather than twice) and supports following
symrefs (up to the maximum depth). Further the ref_lock structure
provides room to extend the update API with ref logging.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Be sure not to fetch objects that already exist in the local repository.
The main process loop no longer performs this check, http-fetch now checks
prior to starting a new request queue entry and when fetch_object() is called,
and local-fetch now checks when fetch_object() is called.
As discussed in this thread: http://marc.theaimsgroup.com/?t=112854890500001
Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
With the --recover option, we verify that we have absolutely
everything reachable from the target, not assuming that things
reachable from refs will be complete.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The fetch code does not need object ref lists; by disabling them we
can save some time and memory.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The call to parse_object() in process() is not actually needed - if
the object type is unknown, parse_object() will be called by loop();
if the type is known, the object will be parsed by the appropriate
process_*() function.
After this change blobs which exist locally are no longer parsed,
which gives about 2x CPU usage improvement; the downside is that there
will be no warnings for existing corrupted blobs, but detecting such
corruption is the job of git-fsck-objects, not the fetch programs.
Newly fetched objects are still checked for corruption in http-fetch.c
and ssh-fetch.c (local-fetch.c does not seem to do it, but the removed
parse_object() call would not be reached for new objects anyway).
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Remove holes left after deleting flags, and use shifts to emphasize
that flags are single bits.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If the SEEN flag was not set, the TO_SCAN flag cannot be set,
therefore testing it is pointless.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It does not matter if we call prefetch() or set the TO_SCAN flag before
or after adding the object to process_queue. However, doing it before
object_list_insert() allows us to kill 3 lines of duplicated code.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The TO_FETCH flag also became redundant after adding the SEEN flag -
it was set and checked in process() to prevent adding the same object
to process_queue multiple times, but now SEEN guards against this.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
After adding the SEEN flag, the SCANNED flag became obviously
redundant - each object can get into process_queue through process()
only once, and therefore multiple calls to process_object() for the
same object are not possible.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The process() function is very often called multiple times for the
same object (because lots of trees refer to the same blobs), but did
not have a fast check for this, therefore a lot of useless calls to
has_sha1_file() and parse_object() were made before discovering that
nothing needs to be done.
This patch adds the SEEN flag which is used in process() to make it
look at each object only once. When testing git-local-fetch on the
repository of GIT, this gives a 14x improvement in CPU usage (mainly
because the redundant calls to parse_object() are now avoided -
parse_object() always unpacks and parses the object data, even if it
was already parsed before).
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In all places where process() is called except the one in pull() (which
is executed only once) the pointer to the object is already available,
so pass it as the argument to process() instead of sha1 and avoid an
unneeded call to lookup_object_type().
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The call to parse_object() in process() is not actually needed - if
the object type is unknown, parse_object() will be called by loop();
if the type is known, the object will be parsed by the appropriate
process_*() function.
After this change blobs which exist locally are no longer parsed,
which gives about 2x CPU usage improvement; the downside is that there
will be no warnings for existing corrupted blobs, but detecting such
corruption is the job of git-fsck-objects, not the fetch programs.
Newly fetched objects are still checked for corruption in http-fetch.c
and ssh-fetch.c (local-fetch.c does not seem to do it, but the removed
parse_object() call would not be reached for new objects anyway).
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Remove holes left after deleting flags, and use shifts to emphasize
that flags are single bits.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>