This patch introduces a strict mode, which ensures that:
- no malformed object will be written
- no object with broken links will be written
The patch ensures this by delaying the write of all non blob object.
These object are written, after all objects they link to are written.
An error can only result in unreferenced objects.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Preventing objects with broken links entering the repository
means, that write of some objects must be delayed.
This patch adds a cache to keep the object data in memory. The delta
resolving code must also search in the cache.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since it is now OK to pass a null pointer to display_progress() and
stop_progress() resulting in a no-op, then we can simplify the code
and remove a bunch of lines by not making those calls conditional all
the time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows for better management of progress "object" existence,
as well as making the progress display implementation more independent
from its callers.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each progress can be on a single line instead of two.
[sp: Changed "Checking files out" to "Checking out files" at
Johannes Sixt's suggestion as it better explains the
action that is taking place]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This patch fixes all calls to xread() where the return value is not
stored into an ssize_t. The patch should not have any effect whatsoever,
other than putting better/more appropriate type names on variables.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If the progress bar ends up in a box, better provide a title for it too.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display. If someday someone wishes to
display a cheezy progress bar instead then only one file will have to
be changed.
Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increase the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets. This prevents any bad data to be
generated/processed if off_t happens to not be large enough to handle
some big packs.
Better be safe than sorry.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.
It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily. Leftover from this will be fixed in a separate step, including
idiotic conversions like
if (!strncmp("foo", arg, 3))
=>
if (!(-prefixcmp(arg, "foo")))
This was done by using this script in px.perl
#!/usr/bin/perl -i.bak -p
if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
}
if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
}
and running:
$ git grep -l strncmp -- '*.c' | xargs perl px.perl
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some applications which invoke unpack-objects or index-pack --stdin
may want to examine the pack header to determine the number of
objects contained in the pack and use that value to determine which
executable to invoke to handle the rest of the pack stream.
However if the caller consumes the pack header from the input stream
then its no longer available for unpack-objects or index-pack --stdin,
both of which need the version and object count to process the stream.
This change introduces --pack_header=ver,cnt as a command line option
that the caller can supply to indicate it has already consumed the
pack header and what version and object count were found in that
header. As this option is only meant for low level applications
such as receive-pack we are not documenting it at this time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
For delta resolution to be possible, a list of sha1/offset tuple must
be constructed in memory in order to load the appropriate base object.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to
OBJ_REF_DELTA to better make the distinction between those two delta
objects, and adds support for the handling of those new delta objects
in sha1_file.c only.
The OBJ_OFS_DELTA contains a relative offset from the delta object's
position in a pack instead of the 20-byte SHA1 reference to identify
the base object. Since the base is likely to be not so far away, the
relative offset is more likely to have a smaller encoding on average
than an absolute offset. And for those delta objects the base must
always be stored first because there is no way to know the distance of
later objects when streaming a pack. Hence this relative offset is
always meant to be negative.
The offset encoding is slightly denser than the one used for object
size -- credits to <linux@horizon.com> (whoever this is) for bringing
it to my attention.
This allows for pack size reduction between 3.2% (Linux-2.6) to over 5%
(linux-historic). Runtime pack access should be faster too since delta
replay does skip a search in the pack index for each delta in a chain.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The code called this operation "desperate" but the option flag is -r
and the word "recover" describes what it does better.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The command unpack-objects dies upon the first error. This is
probably considered a feature -- if a pack is corrupt, instead
of trying to extract from it and possibly risking to contaminate
a good repository with objects whose validity is dubious, we
should seek a good copy of the pack and retry. However, we may
not have any good copy anywhere. This implements the last
resort effort to extract what are salvageable from such a
corrupt pack.
This flag might have helped Sergio when recovering from a
corrupt pack. In my test, it managed to salvage 247 objects out
of a pack that had 251 objects but without it the command
stopped after extracting 73 objects.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.
A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*. This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
[jc: Splitted the patch to "master" part, to be followed by a
patch for merge-recursive.c which is not in "master" yet.
Fixed the cast in the latter hunk to combine-diff.c which was
wrong in the original.
Also converted ones left-over in combine-diff.c, diff-lib.c and
upload-pack.c ]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Introduces global inline:
hashcmp(const unsigned char *sha1, const unsigned char *sha2)
Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With this, unpack-objects will write out the loose objects with
new-style headers when core.legacyheaders configuration is set
to false.
One unfortunate thing is that we still need inflate/deflate cycle
when unpacking, even for objects in the pack stream that are not
deltified, because it is not possible to determine the boundary of
objects in the encoded stream cheaply without inflating it first.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The very initial version of unpack-objects.c::unpack_all() used
to unpack from the end of the pack, but since end of June last
year it was changed to stream from the front and the comment
does not reflect the reality anymore.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.
Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
After experimenting with code to add the ability to encode a delta
against part of the deltified file, it turns out that resulting packs
are _bigger_ than when this ability is not used. The raw delta output
might be smaller, but it doesn't compress as well using gzip with a
negative net saving on average.
Said bit would in fact be more useful to allow for encoding the copying
of chunks larger than 64KB providing more savings with large files.
This will correspond to packs version 3.
While the current code still produces packs version 2, it is made future
proof so pack versions 2 and 3 are accepted. Any pack version 2 are
compatible with version 3 since the redefined bit was never used before.
When enough time has passed, code to use that bit to produce version 3
packs could be added.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We had errno==EINTR check after read(2)/write(2) sprinkled all
over the places, always doing continue. Consolidate them into
xread()/xwrite() wrapper routines.
Credits for suggestion goes to HPA -- bugs are mine.
Signed-off-by: Junio C Hamano <junkio@cox.net>
These commands are converted to run from a subdirectory.
commit-tree convert-objects merge-base merge-index mktag
pack-objects pack-redundant prune-packed read-tree tar-tree
unpack-file unpack-objects update-server-info write-tree
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch documents the -n command-line option to git-unpack-objects,
as it was previously undocumented.
Signed-off-by: Nikolai Weibull <nikolai@bitwi.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
- Call inflateEnd to release zlib state after use.
- After resolving delta, free base object data.
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to print the index of the object we unpacked, not how many we
had unpacked. Which caused slightly confusing progress reports like
100% (2/3) done
rather than the more obvious "3/3" for 100% ;)
This ends up being very calming for big "git clone"s, since otherwise
you just get very frustrated with a long silence, wondering whether it's
working at all.
Use "-q" to quiet it down.
Now if we could just do the same for the initial "figure out what to
pack" phase, which can also be quite slow if the other end is busy (or
not packed and not in cache)...
zlib actually writes a header for that case, and while ignoring that
header will get us the right data, it will also end up messing up our
stream position. So we actually want zlib to "uncompress" even an empty
object.
It can no longer be as verbose, since it doesn't have a good way to
resolve deltas (now that it is purely streaming, it cannot seek around
to read the objects a delta is based on).
But it can check that the thing unpacks cleanly at least as far as pack
syntax goes - all the objects uncompress cleanly, and the pack has the
right final SHA1.
I'd like to add back the "dry-run" thing, but it turns out that to do it
well, I'd have to keep all the object data in memory (which is not
acceptable). So I'll clean it up a bit and make it do as many checks as
it can.
This makes it match the new delta encoding, and admittedly makes the
code easier to follow.
This also updates the PACK file version to 2, since this (and the delta
encoding change in the previous commit) are incompatible with the old
format.
It gets a bit more complicated to unpack in a streaming environment, but
here it is. The rewrite is actually a lot cleaner in other ways, it's
just a bit more subtle.
Standalone unpack-objects command was not adjusted for header length
encoding change when dealing with deltified entry. This fixes it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This also adds a header with a signature, version info, and the number
of objects to the pack file. It also encodes the file length and type
more efficiently.
Also, make the writing of the SHA1 as a end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we migh end up using the same helper
functions for the object files themselves, that don't do this).
This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1). That way you can
validate the pairing of the two if you want to.
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing. The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.
Update unpack-objects to know about the extra 20 bytes at the end
of the index.
This actually successfully packed and unpacked a git archive down to
1.3MB (17MB unpacked).
Right now unpacking is way too noisy, lots of debug messages left.
This finishes the initial round of git-pack-object /
git-unpack-object pair. They are now good enough to be used as
a transport medium:
- Fix delta direction in pack-objects; the original was
computing delta to create the base object from the object to
be squashed, which was quite unfriendly for unpacker ;-).
- Add a script to test the very basics.
- Implement unpacker for both regular and deltified objects.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>