kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	ab7cd7bb8c	pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	3f9ac8d259	pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	cac251d0bc	relax delta selection filtering in pack-objects This change provides a 8% saving on the pack size with a 4% CPU time increase for git-repack -a on the current git archive. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	7a979d99ba	Thin pack - create packfile with missing delta base. This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	e4c9327a77	pack-objects: avoid delta chains that are too long. This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	ca5381d43e	pack-objects: finishing touches. This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	a49dd05fd0	pack-objects: reuse data from existing packs. When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	024701f1d8	Make pack-objects chattier. You could give -q to squelch it, but currently no tool does it. This would make 'git clone host:repo here' over ssh not silent again. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	21fcd1bdea	fetch-clone progress: finishing touches. This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	82f9d58a39	code comments: spell Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nikolai Weibull	63ae26f87a	Document the --non-empty command-line option to git-pack-objects. This provides (minimal) documentation for the --non-empty command-line option to the pack-objects command. Signed-off-by: Nikolai Weibull <nikolai@bitwi.se> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	53228a5fb8	Make the rest of commands work from a subdirectory. These commands are converted to run from a subdirectory. commit-tree convert-objects merge-base merge-index mktag pack-objects pack-redundant prune-packed read-tree tar-tree unpack-file unpack-objects update-server-info write-tree Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	ef07618fdd	git-repack: Properly abort in corrupt repository In a corrupt repository, git-repack produces a pack that does not contain needed objects without complaining, and the result of this combined with -d flag can be very painful -- e.g. a lossage of one tree object can lead to lossage of blobs reachable only through that tree. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	f3123c4ab3	pack-objects: Allow use of pre-generated pack. git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does not keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	4546738b58	Unlocalized isspace and friends Do our own ctype.h, just to get the sane semantics: we want locale-independence, _and_ we want the right signed behaviour. Plus we only use a very small subset of ctype.h anyway (isspace, isalpha, isdigit and isalnum). Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	64560374cc	Add support for "local" packing This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	84c8d8aec5	Fix packname hash generation. This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Sergey Vlasov	adee7bdf50	[PATCH] Plug memory leak in git-pack-objects find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>	20 years ago
Linus Torvalds	5f3de58ff8	Make the name of a pack-file depend on the objects packed there-in. This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes.	20 years ago
Linus Torvalds	1c4a291202	Add "--non-empty" flag to git-pack-objects It skips writing the pack-file if it ends up being empty.	20 years ago
Linus Torvalds	eb019375ab	Add "--incremental" flag to git-pack-objects It won't add an object that is already in a pack to the new pack.	20 years ago
Nicolas Pitre	dcde55bc58	[PATCH] assorted delta code cleanup This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Linus Torvalds	01247d8742	Make git pack files use little-endian size encoding This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format.	20 years ago
Junio C Hamano	9d5ab9625d	[PATCH] Emit base objects of a delta chain when the delta is output. Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Junio C Hamano	e1ddc97684	[PATCH] Fix unpack-objects for header length information. Standalone unpack-objects command was not adjusted for header length encoding change when dealing with deltified entry. This fixes it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Linus Torvalds	a733cb606f	Change pack file format. Hopefully for the last time. This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently.	20 years ago
Linus Torvalds	d22b9290ab	git-pack-objects: add "--stdout" flag to write the pack file to stdout This also suppresses creation of the index file.	20 years ago
Linus Torvalds	a69d094366	Teach packing about "tag" objects (And teach sha1_file and unpack-object know how to unpack them too, of course)	20 years ago
Junio C Hamano	36e4d74a21	[PATCH] Enhance sha1_file_size() into sha1_object_info() This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Junio C Hamano	c4584ae3fd	[PATCH] Remove "delta" object representation. Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Linus Torvalds	e18088451d	csum-file interface updates: return resulting SHA1 Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to.	20 years ago
Linus Torvalds	c38138cd78	git-pack-objects: write the pack files with a SHA1 csum We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index.	20 years ago
Linus Torvalds	27225f2e87	git-pack-objects: use name information (if any) to sort objects for packing. This is incredibly cheezy. But it's cheap, and it works pretty well.	20 years ago
Linus Torvalds	521a4f4cf4	git-pack-objects: do the delta search in reverse size order Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller.	20 years ago
Linus Torvalds	c4fb06c0d0	Fix object packing/unpacking. This actually successfully packed and unpacked a git archive down to 1.3MB (17MB unpacked). Right now unpacking is way too noisy, lots of debug messages left.	20 years ago
Junio C Hamano	8ee378a0f0	[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair. This finishes the initial round of git-pack-object / git-unpack-object pair. They are now good enough to be used as a transport medium: - Fix delta direction in pack-objects; the original was computing delta to create the base object from the object to be squashed, which was quite unfriendly for unpacker ;-). - Add a script to test the very basics. - Implement unpacker for both regular and deltified objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Linus Torvalds	d116a45a9a	Add "--depth=N" parameter to git-pack-objects to limit maximum delta depth It too defaults to 10. A nice round random number.	20 years ago
Linus Torvalds	f846bbff15	git-pack-objects: make "--window=x" semantics more logical. A zero disables delta generation (like before), but we make the window be one bigger than specified, since we use one entry for the one to be tested (it used to be that "--window=1" was meaningless, since we'd have used up the single-entry window with the entry to be tested, and had no chance of actually ever finding a delta). The default window remains at 10, but now it really means "test the 10 closest objects", not "test the 9 closest objects".	20 years ago
Linus Torvalds	75c42d8cc3	Add a "max_size" parameter to diff_delta() Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens.	20 years ago
Linus Torvalds	78817c15de	Fix delta "sliding window" code When Junio fixed the lack of a successful error code from try_delta(), that uncovered an off-by-one error in the caller. Also, some testing made it clear that we now find a lot more deltas, because we used to (incorrectly) break early on bogus "failure" cases.	20 years ago
Junio C Hamano	eb41ab11e8	[PATCH] (patchlet) pack-objects.c: try_delta() Return value of try_delta is checked for negativeness, but the success path does not return anything, letting compiler warn and presumably return garbage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	20 years ago
Linus Torvalds	d38c3721a1	git-pack-objects: mark the delta packing with a 'D'. When writing a delta, we take the real type from the object we're doing the delta against, and just write a 'D' as the type of the current object.	20 years ago
Linus Torvalds	49397104f2	git-pack-objects: fix typo ("<" should be "=")	20 years ago
Linus Torvalds	c323ac7d9c	git-pack-objects: create a packed object representation. This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no?	20 years ago

... 2 3 4 5 6

253 Commits (56a853b62c0ae7ebaad0a7a0a704f5ef561eb795)