In a repository created with git older than f49fb35 (git-init-db: create
"pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
not created upon initialization. It was Ok because subdirectories are
created as needed inside directories init-db creates, and back then,
packfiles were recent invention.
After the said commit, new codepaths started relying on the presense of
objects/pack/ directory in the repository. This was exacerbated with
8b4eb6b (Do not perform cross-directory renames when creating packs,
2008-09-22) that moved the location temporary pack files are created from
objects/ directory to objects/pack/ directory, because moving temporary to
the final location was done carefully with lazy leading directory creation.
Many packfile related operations in such an old repository can fail
mysteriously because of this.
This commit introduces two helper functions to make things work better.
- odb_mkstemp() is a specialized version of mkstemp() to refactor the
code and teach it to create leading directories as needed;
- odb_pack_keep() refactors the code to create a ".keep" file while
create leading directories as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Programs that use git_config need to find the global configuration.
When runtime prefix computation is enabled, this requires that
git_extract_argv0_path() is called early in the program's main().
This commit adds the necessary calls.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
LF at the end of format strings given to die() is redundant because
die already adds one on its own.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In one case, it was possible to have a bad offset equal to 0 effectively
pointing a delta onto itself and crashing git after too many recursions.
In the other cases, a negative offset could result due to off_t being
signed. Catch those.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Another (but minor this time) fallout from commit 9441b61 (index-pack:
rationalize delta resolution code, 2008-10-17).
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before commit d0b92a3f6e it was possible to run 'git index-pack'
directly in the .git/objects/pack/ directory. Restore that ability.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit 9441b61dc5, two issues affected correct behavior of
index-pack:
1) The real_type of a delta object is the 'real_type' of its base, not
the 'type' which can be a "delta type". Consequence of this is a
corrupted pack index file which only needs to be recreated with a
good index-pack command ('git verify-pack' will flag those).
2) The code sequence:
result->data = patch_delta(get_base_data(base), base->obj->size,
delta_data, delta_size, &result->size);
has two issues of its own since base->obj->size should instead be
base->size as we want the size of the actual object data and not
the size of the delta object it is represented by. Except that
simply replacing base->obj->size with base->size won't make the
code more correct as the C language doesn't enforce a particular
ordering for the evaluation of needed arguments for a function call,
hence base->size could be pushed on the stack before get_base_data()
which initializes base->size is called.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Win32 does not allow renaming read-only files (at least on a Samba
share), making push into a local directory to fail. Thus, defer
the chmod() call in index-pack.c:final() only after
move_temp_to_file() was called.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is no need to keep the base object data around after its last delta
has been resolved. This also means that long delta chains with only one
delta per base won't grow the cache size unnecessarily as the base will
be freed before recursing down.
To make it easy, find_delta_children() is modified so the first and last
indices are initialized in all cases.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of having strange loops for walking unresolved deltas with the
same base duplicated in many places, let's rework the code so this is
done in a single place instead. This simplifies callers quite a bit too.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since v1.6.0.2~13^2~ the completion of a thin pack uses sha1write() for
its ability to compute a SHA1 on the written data. This also provides
data buffering which, along with commit 92392b4a45, will confuse pread()
whenever an appended object is 1) freed due to memory pressure because
of the depth-first delta processing, and 2) needed again because it has
many delta children, and 3) its data is still buffered by sha1write().
Let's fix the issue by simply forcing cached data out when such an
object is written so it can be pread()'d at leisure.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If we use pread() while at the end of the file, it will return 0, which is
not an error from the operating system point of view. In this case, errno
has not been set and must not be used.
Signed-off-by: Samuel Tardieu <sam@rfc1149.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On ARM I have the following compilation errors:
CC fast-import.o
In file included from cache.h:8,
from builtin.h:6,
from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1
This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version. Compilation of git is probably broken on PPC too for the
same reason.
Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c. But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.
As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A comment on top of create_tmpfile() describes caveats ('can have
problems on various systems (FAT, NFS, Coda)') that should apply
in this situation as well. This in the end did not end up solving
any of my personal problems, but it might be a useful cleanup patch
nevertheless.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When completing a thin pack, a new header has to be written to
the pack and a new SHA1 computed. Make sure that the SHA1 of what
is being read back matches the SHA1 of what was written for both:
the original pack and the appended objects.
To do so, a couple write_or_die() calls were converted to sha1write()
which has the advantage of doing some buffering as well as handling
SHA1 and CRC32 checksum already.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, this function has the potential to read corrupted pack data
from disk and give it a valid SHA1 checksum. Let's add the ability to
validate SHA1 checksum of existing data along the way, including before
and after any arbitrary point in the pack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git index-pack" is an independent command and does not setup git
repository while still need pack.indexversion. It may miss the
info if it is in a subdirectory of the repository.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When index-pack completes a thin pack it appends objects to the pack.
Since the commit 92392b4(index-pack: Honor core.deltaBaseCacheLimit when
resolving deltas) such an object can be pruned in case of memory
pressure, and will be read back again by get_data_from_pack(). For this
to work, the fields in object_entry structure need to be initialized
properly.
Noticed by Pierre Habouzit.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are trying to resolve deltas for a long delta chain composed
of multi-megabyte objects we can easily run into requiring 500M+
of memory to hold each object in the chain on the call stack while
we recurse into the dependent objects and resolve them.
We now use a simple delta cache that discards objects near the
bottom of the call stack first, as they are the most least recently
used objects in this current delta chain. If we recurse out of a
chain we may find the base object is no longer available, as it was
free'd to keep memory under the deltaBaseCacheLimit. In such cases
we must unpack the base object again, which will require recursing
back to the root of the top of the delta chain as we released that
root first.
The astute reader will probably realize that we can still exceed
the delta base cache limit, but this happens only if the most
recent base plus the delta plus the inflated dependent sum up to
more than the base cache limit. Due to the way patch_delta is
currently implemented we cannot operate in less memory anyway.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we free the data stored within a base_data we need the struct
object_entry to get the data back again for use with another dependent
delta. Storing the object_entry* in base_data makes it simple to call
get_data_from_pack() to recover the compressed information.
This however means that we must add the missing base object to the end of
our packfile prior to calling resolve_delta() on each of the dependent
deltas. Adding the base first ensures we can read the base back from the
pack we are indexing, as if it had been included by the remote side.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to release earlier inflated base objects when memory gets
low, which means we need to be able to walk up or down the stack
to locate the objects we want to release, and free their data.
The new link/unlink routines allow inserting and removing the struct
base_data during recursion inside resolve_delta, and the global
base_cache gives us the head of the chain (bottom of the stack)
so we can traverse it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to discard base objects which are not recently used if our
memory gets low, such as when we are unpacking a long delta chain
of a very large object.
To support tracking the available base objects we combine the
pointer and size into a struct. Future changes would allow the
data pointer to be free'd and marked NULL if memory gets low.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form. So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.
This patch makes git commands show the dash-less version.
For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When printing valuds of type uint32_t, we should use PRIu32, and should
not assume that it is unsigned int. On 32-bit platforms, it could be
defined as unsigned long. The same caution applies to ntohl().
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This means that we can depend on packs always being stable on disk,
simplifying a lot of the object serialization worries. And unlike loose
objects, serializing pack creation IO isn't going to be a performance
killer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adds strict option, which bails out if the pack would
introduces broken object or links in the repository.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This probably hasn't been properly tested before. Here's a script to
create a 8GB repo with the necessary characteristics (copy the
test-genrandom executable from the Git build tree to /tmp first):
-----
#!/bin/bash
git init
git config core.compression 0
# create big objects with no deltas
for i in $(seq -w 1 2 63)
do
echo $i
/tmp/test-genrandom $i 268435456 > file_$i
git add file_$i
rm file_$i
echo "file_$i -delta" >> .gitattributes
done
# create "deltifiable" objects in between big objects
for i in $(seq -w 2 2 64)
do
echo "$i $i $i" >> grow
cp grow file_$i
git add file_$i
rm file_$i
done
rm grow
# create a pack with them
git commit -q -m "commit of big objects interlaced with small deltas"
git repack -a -d
-----
Then clone this repo over the Git protocol.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same spirit of prettifying Git's output display for mere mortals,
here's a simple extension to the progress API allowing for a final
message to be provided when terminating a progress line, and use it for
the display of the number of objects needed to complete a thin pack,
saving yet one more line of screen display.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The throughput display needs a delay period before accounting and
displaying anything. Yet it might be called after some amount of data
has already been transferred. The display of total data is therefore
accounted late and therefore smaller than the reality.
Let's call display_throughput() with an absolute amount of transferred
data instead of a relative number, and let the throughput code find the
relative amount of data by itself as needed. This way the displayed
total is always exact.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a good idea to use pack index version 2 all the time since it has
proper protection against propagation of certain pack corruptions when
repacking which is not possible with index version 1, as demonstrated
in test t5302.
Hence this config option.
The default is still pack index version 1.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
... and call it "Receiving objects" when over stdin to look clearer
to end users.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since it is now OK to pass a null pointer to display_progress() and
stop_progress() resulting in a no-op, then we can simplify the code
and remove a bunch of lines by not making those calls conditional all
the time.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows for better management of progress "object" existence,
as well as making the progress display implementation more independent
from its callers.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that some pointers have lost their const attribute, we can free their
associated memory when done with them. This is more a correctness issue
about the rule for freeing those pointers which isn't completely trivial
more than the leak itself which didn't matter as the program is
exiting anyway.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Two functions, namely write_idx_file() and open_pack_file(), currently
return a const pointer. However that pointer is either a copy of the
first argument, or set to a malloc'd buffer when that first argument
is null. In the later case it is wrong to qualify that pointer as const
since ownership of the buffer is transferred to the caller to dispose of,
and obviously the free() function is not meant to be passed const
pointers.
Making the return pointer not const causes a warning when the first
argument is returned since that argument is also marked const.
The correct thing to do is therefore to remove the const qualifiers,
avoiding the need for ugly casts only to silence some warnings.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Each progress can be on a single line instead of two.
[sp: Changed "Checking files out" to "Checking out files" at
Johannes Sixt's suggestion as it better explains the
action that is taking place]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
xmkstemp() performs error checking and prints a standard error message when
an error occur.
Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I audited git for potential undetected write failures.
In the cases fixed below, the diagnostics I add mimic the diagnostics
used in surrounding code, even when that means not reporting
the precise strerror(errno) cause of the error.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch unifies the write_index_file functions in
builtin-pack-objects.c and index-pack.c. As the name
"index" is overloaded in git, move in the direction of
using "idx" and "pack idx" when refering to the pack index.
There should be no change in functionality.
Signed-off-by: Geert Bosch <bosch@gnat.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch fixes all calls to xread() where the return value is not
stored into an ssize_t. The patch should not have any effect whatsoever,
other than putting better/more appropriate type names on variables.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that fast-import is using a "library function" to handle
correcting its packfile's object count and trailing SHA-1 we
should reuse the same function in index-pack, to reduce the
size of the code we must maintain.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the progress bar ends up in a box, better provide a title for it too.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display. If someday someone wishes to
display a cheezy progress bar instead then only one file will have to
be changed.
Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increase the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is necessary for testing the new capabilities in some automated
way without having an actual 4GB+ pack.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like previous patch but for index-pack.
[ There is quite some code duplication between pack-objects and index-pack
for generating a pack index (and fast-import as well I suppose). This
should be reworked into a common function eventually. But not now. ]
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Change a few size and offset variables to more appropriate type, then
add overflow tests on those offsets. This prevents any bad data to be
generated/processed if off_t happens to not be large enough to handle
some big packs.
Better be safe than sorry.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch introduces the MSB() macro to obtain the desired number of
most significant bits from a given variable independently of the variable
type.
It is then used to better implement the overflow test on the OBJ_OFS_DELTA
base offset variable with the property of always working correctly
regardless of the type/size of that variable.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>