The initial one was not doing enough to figure things out
without uncompressing too much. It also fixes a potential
segfault resulting from missing use_packed_git() call.
We would need to introduce unuse_packed_git() call and do proper
use counting to figure out when it is safe to unmap, but
currently we do not unmap packed file yet.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now, there's still a misfeature there, which is that when you
create a new object, it doesn't check whether that object
already exists in the pack-file, so you'll end up with a few
recent objects that you really don't need (notably tree
objects), and this patch fixes it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces sha1sum(1) with sum(1) in t/t1002. GNU sum(1) runs in
"BSD compatibility" mode by default, and not all systems have GNU
coreutils. On any system without GNU coreutils (or sha1sum) t1002 will
fail. This patch should make t1002 complete successfully everywhere
that sum(1) runs.
I've tested this on Darwin and Linux; it works on both platforms.
Signed-off-by: Mark Allen <mrallen1@yahoo.com>
Acked-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES can
have the "pack" subdirectory that houses "packed GIT" files
produced by git-pack-objects (e.g. .git/objects/pack/foo.pack
and .git/objects/pack/foo.idx; always store them as pairs). The
following functions in sha1_file.c can then read object contents
from such packed file:
- sha1_object_info()
- has_sha1_file()
- read_sha1_file()
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets us eliminate one use of map_sha1_file() outside
sha1_file.c, to bring us one step closer to the packed GIT.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Packed delta files created by git-pack-objects seems to be the
way to go, and existing "delta" object handling code has exposed
the object representation details to too many places. Remove it
while we refactor code to come up with a proper interface in
sha1_file.c.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In contrast to other plumbing tools, git-ssh-push only
allow a very restrictive form of commit-id filenames.
This patch removes this restriction.
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we have a very long commit message, and we end up getting a
bufferfull of data from git-rev-list that all belongs to one commit,
we ended up throwing away the data from a previous read that should
have been included. The result was a error message about not being
able to parse the output of git-rev-list.
Also, if the git-rev-list output that we can't parse is long, only put
the first 80 chars in the error message. Otherwise we end up with an
enormous error window.
Also, make the writing of the SHA1 as a end-header be conditional: not
every user will necessarily want to write the SHA1 to the file itself,
even though current users do (but we migh end up using the same helper
functions for the object files themselves, that don't do this).
This also makes the packed index file contain the SHA1 of the packed
data file at the end (just before its own SHA1). That way you can
validate the pairing of the two if you want to.
We want to be able to check their integrity later, and putting the
sha1-sum of the contents at the end is a good thing. The writing
routines are generic, so we could try to re-use them for the index file,
instead of having the same logic duplicated.
Update unpack-objects to know about the extra 20 bytes at the end
of the index.
Here is a script to simplify validating the gpg signature created by
git-tag-script. Might be useful to add to the git tree so that people
don't have to search for the right post in the git mailinglist archives
Check that $GIT_DIR (or .git, if GIT_DIR is not set) is a directory.
This means we can give a more informative error message if the user
runs gitk somewhere that isn't a git repository.
Starting from big objects and going backwards means that we end up
picking a delta that goes from a bigger object to a smaller one. That's
advantageous for two reasons: the bigger object is likely the newer one
(since things tend to grow, rather than shrink), and doing a delete
tends to be smaller than doing an add.
So the deltas don't tend to be top-of-tree, and the packed end result is
just slightly smaller.
This will scan 2 or more object repositories and look for common objects, check
if they are hardlinked, and replace one with a hardlink to the other if not.
This version warns when skipping files because of size differences, and
handle more than 2 repositories automatically.
Signed-off-by: Ryan Anderson <ryan@michonline.com>
Cheered-on-by: Jeff Garzik <jgarzik@pobox.com>
Acked-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If you have two lists of heads, and you want to see ones reachable from
list $a but not from list $b, just do
git-rev-list $(git-rev-parse $a --not $b)
which is useful for both bisecting (where "b" would be the list of known
good revisions, and "a" would be the latest found bad head) and for just
seeing what the difference between two sets of heads are if you want to
generate a pack-file for the difference.
This actually successfully packed and unpacked a git archive down to
1.3MB (17MB unpacked).
Right now unpacking is way too noisy, lots of debug messages left.
This finishes the initial round of git-pack-object /
git-unpack-object pair. They are now good enough to be used as
a transport medium:
- Fix delta direction in pack-objects; the original was
computing delta to create the base object from the object to
be squashed, which was quite unfriendly for unpacker ;-).
- Add a script to test the very basics.
- Implement unpacker for both regular and deltified objects.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A zero disables delta generation (like before), but we make the window
be one bigger than specified, since we use one entry for the one to be
tested (it used to be that "--window=1" was meaningless, since we'd have
used up the single-entry window with the entry to be tested, and had no
chance of actually ever finding a delta).
The default window remains at 10, but now it really means "test the 10
closest objects", not "test the 9 closest objects".
Anything that generates a delta to see if two objects are close usually
isn't interested in the delta ends up being bigger than some specified
size, and this allows us to stop delta generation early when that
happens.
When Junio fixed the lack of a successful error code from try_delta(),
that uncovered an off-by-one error in the caller.
Also, some testing made it clear that we now find a lot more deltas,
because we used to (incorrectly) break early on bogus "failure"
cases.
Return value of try_delta is checked for negativeness, but the
success path does not return anything, letting compiler warn and
presumably return garbage.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Describe what to implement in fetch() and fetch_ref() for
pull backend writers a bit better.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An earlier change to optimize directory-file conflict check
broke what "read-tree --emu23" expects. This is fixed by this
commit.
(1) Introduces an explicit flag to tell add_cache_entry() not to
check for conflicts and use it when reading an existing tree
into an empty stage --- by definition this case can never
introduce such conflicts.
(2) Makes read-cache.c:has_file_name() and read-cache.c:has_dir_name()
aware of the cache stages, and flag conflict only with paths
in the same stage.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a merge adds a file DF and removes a directory there by
deleting a path DF/DF, git-merge-one-file-script can be called
for the removal of DF/DF when the path DF is already created by
"git-read-tree -m -u". When this happens, we get confused by a
failure return from 'rm -f -- "$4"' (where $4 is DF/DF); finding
file DF there the "rm -f" command complains that DF is not a
directory.
What we want to ensure is that there is no file DF/DF in this
case. Avoid getting ourselves confused by first checking if
there is a file, and only then try to remove it (and check for
failure from the "rm" command).
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds more tests for --emu23. One is to show how it can
carry forward more local changes than the straightforward
two-way fast forward, and another is to show the recent
overeager optimization of directory/file conflict check broke
things, which will be fixed in the next commit.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using git-cherry, forward port local commits missing from the
new upstream head. This also depends on "-m" flag support in
git-commit-script.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The git-cherry command helps the git-rebase script by finding
commits that have not been merged upstream. Commits already
included in upstream are prefixed with '-' (meaning "drop from
my local pull"), while commits missing from upstream are
prefixed with '+' (meaning "add to the updated upstream").
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With -m flag specified, git-commit-script takes the commit
message along with author information from an existing commit.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Usually all of the match_xxx routines in date.c fill tm
structure assuming that the parsed string talks about local
time, and parse_date routine compensates for it by adjusting the
value with tz offset parsed out separately. However, this logic
does not work well when we feed GIT raw commit timestamp to it,
because what match_digits gets is already in GMT.
A good testcase is:
$ make test-date
$ ./test-date 'Fri Jun 24 16:55:27 2005 -0700' '1119657327 -0700'
These two timestamps represent the same time, but the second one
without the fix this commit introduces gives you 7 hours off.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
So far it just reads the header and generates the list of objects.
It also sorts them by the order they are written in the pack file,
since that ends up being the same order we got them originally, and
is thus "most recent first".
This is kind of like a tar-ball for a set of objects, ready to be
shipped off to another end. Alternatively, you could use is as a packed
representation of the object database directly, if you changed
"read_sha1_file()" to read these kinds of packs.
The latter is partiularly useful to generate a "packed history", ie you
could pack up your old history efficiently, but still have it available
(at a performance hit, of course).
I haven't actually written an unpacker yet, so the end result has not
been verified in any way yet. I obviously always write bug-free code,
so it just has to work, no?