This makes it match the new delta encoding, and admittedly makes the
code easier to follow.
This also updates the PACK file version to 2, since this (and the delta
encoding change in the previous commit) are incompatible with the old
format.
Since the delta data format is no longer tied to any actual git object,
now is the time to make a small improvement to the delta data header,
as has been done for the packed object header. This patch reduces the
delta header by about 2 bytes and makes for simpler code.
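A minimal sketch of a variable-length size field of the kind described
here, assuming seven payload bits per byte, least-significant group
first, with the high bit of each byte marking "more bytes follow". The
function names are illustrative and are not taken from the patch.

```python
from typing import Tuple


def encode_size(n: int) -> bytes:
    """Encode a non-negative size, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("size must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_size(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (size, next_pos) for the size field starting at buf[pos]."""
    size = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return size, pos
```

Under this assumption, sizes below 128 take a single byte, which is
where the saving over a fixed-width field would come from.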
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Deltas are useless by themselves and when you use them you need to get
to their base objects. A base object should inherit recency from the
most recent deltified object that is based on it and that is what this
patch teaches git-pack-objects.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It gets a bit more complicated to unpack in a streaming environment, but
here it is. The rewrite is actually a lot cleaner in other ways, it's
just a bit more subtle.
The diff_delta() interface was extended to reject generating too big a
delta while we were working on the packed GIT archive format.
Take advantage of that when generating a delta in the similarity
estimator used in diffcore-rename.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The standalone unpack-objects command was not adjusted for the header
length encoding change when dealing with deltified entries. This fixes it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The fsck-cache command complains if objects referred to by files in
.git/refs/ or objects stored in files under .git/objects/??/ are not
found as stand-alone SHA1 files (i.e. when they are found only in the
alternate object pools listed in GIT_ALTERNATE_OBJECT_DIRECTORIES or in
packed archives stored under .git/objects/pack).
Although these are good semantics for maintaining the consistency of a
single .git/objects directory as a self-contained set of objects, it is
sometimes useful to treat the repository as OK as long as these
"outside" objects are available.
This commit introduces a new flag, --standalone, to git-fsck-cache.
When it is not specified, connectivity checks and .git/refs pointer
checks are taught that it is OK when expected objects do not exist under
the .git/objects/?? hierarchy but are available from a packed archive or
in an alternate object pool.
Another new flag, --full, makes git-fsck-cache check not only the
current GIT_OBJECT_DIRECTORY but also objects found in alternate object
pools and packed GIT archives.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The commands git-fsck-cache and probably git-*-pull need to have a way
to enumerate objects contained in packed GIT archives and alternate
object pools. This commit exposes the data structure used to keep track
of them from sha1_file.c, and adds a couple of accessor interface
functions for use by the enhanced git-fsck-cache command.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was causing random segfaults, because use_packed_git() got
confused by random garbage there.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The fsck-cache command complains if objects referred to by files in
.git/refs/ or objects stored in files under .git/objects/??/ are not
found as stand-alone SHA1 files (i.e. when they are found only in the
alternate object pools listed in GIT_ALTERNATE_OBJECT_DIRECTORIES or in
packed archives stored under .git/objects/pack).
Although these are good semantics for maintaining the consistency of a
single .git/objects directory as a self-contained set of objects, it is
sometimes useful to treat the repository as OK as long as these
"outside" objects are available.
This commit introduces a new flag, --standalone, to git-fsck-cache.
When it is not specified, connectivity checks and .git/refs pointer
checks are taught that it is OK when expected objects do not exist under
the .git/objects/?? hierarchy but are available from a packed archive or
in an alternate object pool.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This also adds a header with a signature, version info, and the number
of objects to the pack file. It also encodes the file length and type
more efficiently.
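A minimal sketch of a header layout along these lines. The concrete
layout is an assumption, not something the message spells out: a 4-byte
"PACK" signature followed by a 32-bit version and a 32-bit object count
in network byte order, and a per-object header whose first byte carries
a 3-bit type plus the low 4 bits of the length, with further length bits
following 7 per byte while the high bit is set. All function names are
illustrative.

```python
import struct
from typing import Tuple

PACK_SIGNATURE = b"PACK"  # assumed signature bytes


def pack_header(version: int, num_objects: int) -> bytes:
    """Build the 12-byte file header: signature, version, object count."""
    return struct.pack(">4sII", PACK_SIGNATURE, version, num_objects)


def parse_pack_header(data: bytes) -> Tuple[int, int]:
    """Return (version, num_objects), rejecting a bad signature."""
    signature, version, num_objects = struct.unpack(">4sII", data[:12])
    if signature != PACK_SIGNATURE:
        raise ValueError("not a pack file")
    return version, num_objects


def encode_object_header(obj_type: int, size: int) -> bytes:
    """Pack a 3-bit type and a variable-length size into a few bytes."""
    if not 0 <= obj_type <= 7:
        raise ValueError("type must fit in 3 bits")
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: more size bits follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_object_header(buf: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (type, size, next_pos) for the header starting at buf[pos]."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

With this encoding, any object shorter than 16 bytes needs only a
one-byte header, and the type is available without decoding the size.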
We use sha1_object_info() now, and getting size is also trivial.
I admit that this is more of "because we can" not "because I see
immediate need for it", though.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When trying to find out the type of the object, there is no need
to uncompress the whole object. Just use sha1_object_info().
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The initial one was not doing enough to figure things out
without uncompressing too much. It also fixes a potential
segfault resulting from a missing use_packed_git() call.
We would need to introduce an unuse_packed_git() call and do proper
use counting to figure out when it is safe to unmap, but
currently we do not unmap packed files at all.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now, there's still a misfeature there, which is that when you
create a new object, it doesn't check whether that object
already exists in the pack-file, so you'll end up with a few
recent objects that you really don't need (notably tree
objects), and this patch fixes it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces sha1sum(1) with sum(1) in t/t1002. GNU sum(1) runs in
"BSD compatibility" mode by default, and not all systems have GNU
coreutils. On any system without GNU coreutils (or sha1sum) t1002 will
fail. This patch should make t1002 complete successfully everywhere
that sum(1) runs.
I've tested this on Darwin and Linux; it works on both platforms.
Signed-off-by: Mark Allen <mrallen1@yahoo.com>
Acked-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES can
have a "pack" subdirectory that houses "packed GIT" files
produced by git-pack-objects (e.g. .git/objects/pack/foo.pack
and .git/objects/pack/foo.idx; always store them as pairs). The
following functions in sha1_file.c can then read object contents
from such packed files:
- sha1_object_info()
- has_sha1_file()
- read_sha1_file()
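A minimal sketch of how the "always store them as pairs" rule could be
checked when scanning an object directory. This is an illustration in
Python, not the code in sha1_file.c, and find_pack_pairs is a
hypothetical name.

```python
import os
from typing import List, Tuple


def find_pack_pairs(object_dir: str) -> List[Tuple[str, str]]:
    """Return (pack_path, idx_path) for every complete pair under pack/."""
    pack_dir = os.path.join(object_dir, "pack")
    if not os.path.isdir(pack_dir):
        return []
    pairs = []
    for name in sorted(os.listdir(pack_dir)):
        if not name.endswith(".idx"):
            continue
        idx_path = os.path.join(pack_dir, name)
        pack_path = idx_path[: -len(".idx")] + ".pack"
        # An index without its data file is unusable, so skip it.
        if os.path.isfile(pack_path):
            pairs.append((pack_path, idx_path))
    return pairs
```

The same scan would be repeated for each alternate object directory.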
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Packed delta files created by git-pack-objects seem to be the
way to go, and the existing "delta" object handling code has exposed
the object representation details to too many places. Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In contrast to other plumbing tools, git-ssh-push only
allows a very restricted form of commit-id filenames.
This patch removes this restriction.
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we have a very long commit message, and we end up getting a
bufferful of data from git-rev-list that all belongs to one commit,
we would throw away the data from a previous read that should
have been included. The result was an error message about not being
able to parse the output of git-rev-list.
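A minimal sketch of the kind of buffering that avoids this problem:
data left over from earlier reads is kept until a complete record has
arrived, even when several reads in a row contain no record boundary.
It is written in Python for illustration and assumes NUL-terminated
records; the RecordBuffer class is hypothetical and is not gitk's code.

```python
from typing import List


class RecordBuffer:
    """Accumulate chunks and hand back only complete records."""

    def __init__(self, terminator: bytes = b"\0") -> None:
        self._pending = b""
        self._terminator = terminator

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk; return the records completed by it, if any."""
        data = self._pending + chunk
        # Everything after the last terminator is an incomplete record
        # and must be carried over to the next read, not discarded.
        *complete, self._pending = data.split(self._terminator)
        return complete
```

A read that falls entirely inside one long commit message simply
returns an empty list and grows the pending data.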
Also, if the git-rev-list output that we can't parse is long, only put
the first 80 chars in the error message. Otherwise we end up with an
enormous error window.
Also, make the writing of the SHA1 as an end-header conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we might end up using the same helper
functions for the object files themselves, which don't do this).
This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1). That way you can
validate the pairing of the two if you want to.
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing. The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.
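A minimal sketch of the trailer scheme described above: the SHA1 of a
file's contents is appended as its last 20 bytes, and the index
additionally carries the pack's SHA1 just before its own. The helper
names are hypothetical and the payloads stand in for real pack and index
data.

```python
import hashlib

SHA1_LEN = 20


def append_sha1_trailer(payload: bytes) -> bytes:
    """Return payload followed by the SHA1 of payload."""
    return payload + hashlib.sha1(payload).digest()


def verify_sha1_trailer(data: bytes) -> bool:
    """Check that the last 20 bytes are the SHA1 of everything before."""
    if len(data) < SHA1_LEN:
        return False
    payload, trailer = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(payload).digest() == trailer


def build_index(index_body: bytes, pack_data: bytes) -> bytes:
    """Append the pack's trailing SHA1, then the index's own SHA1."""
    return append_sha1_trailer(index_body + pack_data[-SHA1_LEN:])


def index_matches_pack(index_data: bytes, pack_data: bytes) -> bool:
    """Validate that an index and a pack file belong together."""
    return index_data[-2 * SHA1_LEN:-SHA1_LEN] == pack_data[-SHA1_LEN:]
```

A reader that only needs the index entries has to skip the final 40
bytes under this layout.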
Update unpack-objects to know about the extra 20 bytes at the end
of the index.
Here is a script to simplify validating the gpg signature created by
git-tag-script. It might be useful to add to the git tree so that people
don't have to search for the right post in the git mailing list archives.
Check that $GIT_DIR (or .git, if GIT_DIR is not set) is a directory.
This means we can give a more informative error message if the user
runs gitk somewhere that isn't a git repository.
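A minimal sketch of this check, written in Python for illustration
rather than in gitk's own Tcl. The function name and the wording of the
error message are assumptions.

```python
import os
from typing import Mapping, Optional


def find_git_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return $GIT_DIR (or ".git" if unset), insisting it is a directory."""
    env = os.environ if environ is None else environ
    git_dir = env.get("GIT_DIR", ".git")
    if not os.path.isdir(git_dir):
        # Fail early with a message that names the path that was tried.
        raise NotADirectoryError(
            'Cannot find the git directory "%s".' % git_dir
        )
    return git_dir
```

Passing the environment as a parameter keeps the check easy to exercise
without modifying the real process environment.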
Starting from big objects and going backwards means that we end up
picking a delta that goes from a bigger object to a smaller one. That's
advantageous for two reasons: the bigger object is likely the newer one
(since things tend to grow, rather than shrink), and doing a delete
tends to be smaller than doing an add.
So the deltas don't tend to be top-of-tree, and the packed end result is
just slightly smaller.
This will scan 2 or more object repositories and look for common objects, check
if they are hardlinked, and replace one with a hardlink to the other if not.
This version warns when skipping files because of size differences, and
handles more than 2 repositories automatically.
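A minimal sketch of the relinking step for one pair of object
directories, written in Python for illustration; it is not the script
added by this commit, and relink_common_objects is a hypothetical name.
It assumes both directories are on the same filesystem, since hardlinks
cannot cross filesystems.

```python
import os


def relink_common_objects(master: str, other: str) -> int:
    """Hardlink files in `other` to same-named files in `master`.

    Returns the number of files that were replaced by hardlinks.
    """
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(master):
        rel = os.path.relpath(dirpath, master)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(other, rel, name)
            if not os.path.isfile(dst):
                continue  # object is not common to both repositories
            if os.path.samefile(src, dst):
                continue  # already hardlinked
            if os.path.getsize(src) != os.path.getsize(dst):
                print("warning: size mismatch, skipping %s" % dst)
                continue
            # Link under a temporary name first so dst is never missing.
            tmp = dst + ".tmp-link"
            os.link(src, tmp)
            os.replace(tmp, dst)
            linked += 1
    return linked
```

Handling more than two repositories amounts to calling this once per
additional directory with the same master.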
Signed-off-by: Ryan Anderson <ryan@michonline.com>
Cheered-on-by: Jeff Garzik <jgarzik@pobox.com>
Acked-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If you have two lists of heads, and you want to see the ones reachable
from list $a but not from list $b, just do

	git-rev-list $(git-rev-parse $a --not $b)

which is useful both for bisecting (where "b" would be the list of known
good revisions, and "a" would be the latest found bad head) and for just
seeing what the difference between two sets of heads is if you want to
generate a pack-file for the difference.
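A minimal sketch of what "reachable from $a but not from $b" means,
modelling the commit graph as a Python dictionary from each commit to
its parents. The graph representation and function names are
illustrative only; git-rev-list itself does not work on such a
dictionary.

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], heads: Iterable[str]) -> Set[str]:
    """Return every commit reachable from any of the given heads."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def rev_list(parents: Dict[str, List[str]],
             include: Iterable[str],
             exclude: Iterable[str]) -> Set[str]:
    """Commits reachable from `include` but not from `exclude`."""
    return reachable(parents, include) - reachable(parents, exclude)
```

For bisecting, `exclude` would hold the known good revisions and
`include` the latest bad head, so the result is the set of suspects.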
This actually successfully packed and unpacked a git archive down to
1.3MB (17MB unpacked).
Right now unpacking is way too noisy; lots of debug messages are left.
This finishes the initial round of the git-pack-objects /
git-unpack-objects pair. They are now good enough to be used as
a transport medium:
- Fix delta direction in pack-objects; the original was
  computing delta to create the base object from the object to
  be squashed, which was quite unfriendly for unpacker ;-).
- Add a script to test the very basics.
- Implement unpacker for both regular and deltified objects.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>