This probably hasn't been properly tested before. Here's a script to
create a 8GB repo with the necessary characteristics (copy the
test-genrandom executable from the Git build tree to /tmp first):
-----
#!/bin/bash
git init
git config core.compression 0
# create big objects with no deltas
for i in $(seq -w 1 2 63)
do
echo $i
/tmp/test-genrandom $i 268435456 > file_$i
git add file_$i
rm file_$i
echo "file_$i -delta" >> .gitattributes
done
# create "deltifiable" objects in between big objects
for i in $(seq -w 2 2 64)
do
echo "$i $i $i" >> grow
cp grow file_$i
git add file_$i
rm file_$i
done
rm grow
# create a pack with them
git commit -q -m "commit of big objects interlaced with small deltas"
git repack -a -d
-----
Then clone this repo over the Git protocol.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same spirit of prettifying Git's output display for mere mortals,
here's a simple extension to the progress API allowing for a final
message to be provided when terminating a progress line, and use it for
the display of the number of objects needed to complete a thin pack,
saving yet one more line of screen display.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The throughput display needs a delay period before accounting and
displaying anything. Yet it might be called after some amount of data
has already been transferred. The display of total data is therefore
accounted late and therefore smaller than the reality.
Let's call display_throughput() with an absolute amount of transferred
data instead of a relative number, and let the throughput code find the
relative amount of data by itself as needed. This way the displayed
total is always exact.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a good idea to use pack index version 2 all the time since it has
proper protection against propagation of certain pack corruptions when
repacking which is not possible with index version 1, as demonstrated
in test t5302.
Hence this config option.
The default is still pack index version 1.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
... and call it "Receiving objects" when over stdin to look clearer
to end users.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since it is now OK to pass a null pointer to display_progress() and
stop_progress() resulting in a no-op, then we can simplify the code
and remove a bunch of lines by not making those calls conditional all
the time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows for better management of progress "object" existence,
as well as making the progress display implementation more independent
from its callers.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that some pointers have lost their const attribute, we can free their
associated memory when done with them. This is more a correctness issue
about the rule for freeing those pointers which isn't completely trivial
more than the leak itself which didn't matter as the program is
exiting anyway.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Two functions, namely write_idx_file() and open_pack_file(), currently
return a const pointer. However that pointer is either a copy of the
first argument, or set to a malloc'd buffer when that first argument
is null. In the later case it is wrong to qualify that pointer as const
since ownership of the buffer is transferred to the caller to dispose of,
and obviously the free() function is not meant to be passed const
pointers.
Making the return pointer not const causes a warning when the first
argument is returned since that argument is also marked const.
The correct thing to do is therefore to remove the const qualifiers,
avoiding the need for ugly casts only to silence some warnings.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Each progress can be on a single line instead of two.
[sp: Changed "Checking files out" to "Checking out files" at
Johannes Sixt's suggestion as it better explains the
action that is taking place]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
xmkstemp() performs error checking and prints a standard error message when
an error occur.
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I audited git for potential undetected write failures.
In the cases fixed below, the diagnostics I add mimic the diagnostics
used in surrounding code, even when that means not reporting
the precise strerror(errno) cause of the error.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch unifies the write_index_file functions in
builtin-pack-objects.c and index-pack.c. As the name
"index" is overloaded in git, move in the direction of
using "idx" and "pack idx" when refering to the pack index.
There should be no change in functionality.
Signed-off-by: Geert Bosch <bosch@gnat.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch fixes all calls to xread() where the return value is not
stored into an ssize_t. The patch should not have any effect whatsoever,
other than putting better/more appropriate type names on variables.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that fast-import is using a "library function" to handle
correcting its packfile's object count and trailing SHA-1 we
should reuse the same function in index-pack, to reduce the
size of the code we must maintain.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the progress bar ends up in a box, better provide a title for it too.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display. If someday someone wishes to
display a cheezy progress bar instead then only one file will have to
be changed.
Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increase the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is necessary for testing the new capabilities in some automated
way without having an actual 4GB+ pack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like previous patch but for index-pack.
[ There is quite some code duplication between pack-objects and index-pack
for generating a pack index (and fast-import as well I suppose). This
should be reworked into a common function eventually. But not now. ]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets. This prevents any bad data to be
generated/processed if off_t happens to not be large enough to handle
some big packs.
Better be safe than sorry.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.
It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When some operations are interrupted (or "die()'d" or crashed) then the
partial object/pack/index file may remain around. Make it more obvious
in their name that those files are temporary stuff and can be cleaned up
if no operation is in progress.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When appending objects to a pack, make sure the appended data is really
what we expect instead of simply loading potentially corrupted objects
and legitimating them by computing a SHA1 of that corrupt data.
With this the sha1_object() can lose its test_for_collision parameter
which is now redundent.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Use hash_sha1_file() instead of duplicating code to compute object SHA1.
While at it make it accept a const pointer.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Waaaaaaay back Git was considered to be secure as it never overwrote
an object it already had. This was ensured by always unpacking the
packfile received over the network (both in fetch and receive-pack)
and our already existing logic to not create a loose object for an
object we already have.
Lately however we keep "large-ish" packfiles on both fetch and push
by running them through index-pack instead of unpack-objects. This
would let an attacker perform a birthday attack.
How? Assume the attacker knows a SHA-1 that has two different
data streams. He knows the client is likely to have the "good"
one. So he sends the "evil" variant to the other end as part of
a "large-ish" packfile. The recipient keeps that packfile, and
indexes it. Now since this is a birthday attack there is a SHA-1
collision; two objects exist in the repository with the same SHA-1.
They have *very* different data streams. One of them is "evil".
Currently the poor recipient cannot tell the two objects apart,
short of by examining the timestamp of the packfiles. But lets
say the recipient repacks before he realizes he's been attacked.
We may wind up packing the "evil" version of the object, and deleting
the "good" one. This is made *even more likely* by Junio's recent
rearrange_packed_git patch (b867092f).
It is extremely unlikely for a SHA1 collisions to occur, but if it
ever happens with a remote (hence untrusted) object we simply must
not let the fetch succeed.
Normally received packs should not contain objects we already have.
But when they do we must ensure duplicated objects with the same SHA1
actually contain the same data.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A filesystem might not be able to completely supply our pread
request in one system call, such as if we are reading data from a
network file system and the requested length is just simply huge.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Sometime typename() is used, sometimes type_names[] is accessed directly.
Let's enforce typename() all the time which allows for validating the
type.
Also let's add a function to go from a name to a type and use it instead
of manual memcpy() when appropriate.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch fixes issues mentioned by Junio, Nico and Simon:
- I forgot to convert the usage string when removing the "--" from
the subcommands,
- a style fix in the bundle_header,
- use xread() instead of read(),
- use write_or_die() instead of write(),
- make the bundle header extensible,
- fail if the whitespace after a sha1 of a reference is missing,
- close() the fds passed to a subprocess,
- in verify_bundle(), do not use "rev-list --stdin", but rather
pass the revs directly (avoiding a fork()),
- fix a corrupted comment in show_object(), and
- fix the size check in index_pack.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some workflows require use of repositories on machines that cannot be
connected, preventing use of git-fetch / git-push to transport objects and
references between the repositories.
git-bundle provides an alternate transport mechanism, effectively allowing
git-fetch and git-pull to operate using sneakernet transport. `git-bundle
create` allows the user to create a bundle containing one or more branches
or tags, but with specified basis assumed to exist on the target
repository. At the receiving end, git-bundle acts like git-fetch-pack,
allowing the user to invoke git-fetch or git-pull using the bundle file as
the URL. git-fetch and git-ls-remote determine they have a bundle URL by
checking that the URL points to a file, but are otherwise unchanged in
operation with bundles.
The original patch was done by Mark Levedahl <mdl123@verizon.net>.
It was updated to make git-bundle a builtin, and get rid of the tar
format: now, the first line is supposed to say "# v2 git bundle", the next
lines either contain a prerequisite ("-" followed by the hash of the
needed commit), or a ref (the hash of a commit, followed by the name of
the ref), and finally the pack. As a result, the bundle argument can be
"-" now.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily. Leftover from this will be fixed in a separate step, including
idiotic conversions like
if (!strncmp("foo", arg, 3))
=>
if (!(-prefixcmp(arg, "foo")))
This was done by using this script in px.perl
#!/usr/bin/perl -i.bak -p
if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
}
if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
}
and running:
$ git grep -l strncmp -- '*.c' | xargs perl px.perl
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked write() calls. Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes. Switch to using
the new write_in_full(). Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().
Note, the changes to config handling are much larger and handled
in the next patch in the sequence.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked read() calls. Often we are
expecting read() to read exactly the size we requested or fail, this
fails to handle interrupts or short reads. Add a read_in_full()
providing those semantics. Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xread().
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If ever new object types are added for future extensions then better
have current git version report them as "unknown" instead of
"corrupted".
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
It was reported by Randal L. Schwartz <merlyn@stonehenge.com> that
indexing the Linux repository ~150MB pack takes about an hour on OS x
while it's a minute on Linux. It seems that the OS X mmap()
implementation is more than 2 orders of magnitude slower than the Linux
one.
Linus proposed a patch replacing mmap() with pread() bringing index-pack
performance on OS X in line with the Linux one. The performances on
Linux also improved by a small margin.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
git-index-pack can call memcpy with overlapping source and destination
buffers. The patch below makes it use memmove instead.
If you want to demonstrate a failure, add the following two lines
+ if (input_offset < input_len)
+ abort ();
before the existing memcpy call (shown in the patch below),
and then run this:
(cd t; sh ./t5500-fetch-pack.sh)
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The declaration of discard_cache() in cache.h already has its "void".
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes both git-fetch and git-push (fetch-pack and receive-pack)
safe against a possible race with aparallel git-repack -a -d that could
prune the new pack while it is not yet referenced, and remove the .keep
file after refs have been updated.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If by chance we receive a pack which content (list of objects) matches
another pack that we already have, and if that pack is marked with a
.keep file, then we should not overwrite it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some applications which invoke unpack-objects or index-pack --stdin
may want to examine the pack header to determine the number of
objects contained in the pack and use that value to determine which
executable to invoke to handle the rest of the pack stream.
However if the caller consumes the pack header from the input stream
then its no longer available for unpack-objects or index-pack --stdin,
both of which need the version and object count to process the stream.
This change introduces --pack_header=ver,cnt as a command line option
that the caller can supply to indicate it has already consumed the
pack header and what version and object count were found in that
header. As this option is only meant for low level applications
such as receive-pack we are not documenting it at this time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
To prevent a race condition between `index-pack --stdin` and
`repack -a -d` where the repack deletes the newly created pack
file before any refs are updated to reference objects contained
within it we mark the pack file as one that should be kept. This
removes it from the list of packs that `repack -a -d` will consider
for removal.
Callers such as `receive-pack` which want to invoke `index-pack`
should use this new --keep option to prevent the newly created pack
and index file pair from being deleted before they have finished any
related ref updates. Only after all ref updates have been finished
should the associated .keep file be removed.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Use proper english. Be more exact in one comment.
[jc: I threw in a bit of style clean-up as well]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It appears that git-unpack-objects writes the last part of the input
buffer to stdout after the pack has been parsed. This looks a bit
suspicious since the last fill() might have filled the buffer up to
the 4096 byte limit and more data might still be pending on stdin,
but since this is about being a drop-in replacement for unpack-objects
let's simply duplicate the same behavior for now.
[jc: with fix-up appeared in Nico's sleep]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is more interesting to look at when performing a big fetch.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>