kernel/git - git - PowerEL Git System


Author	SHA1	Message	Date
Nicolas Pitre	67c08ce14f	pack-objects: remove redundant status information The final 'nr_result' and 'written' values must always be the same otherwise we're in deep trouble. So let's remove a redundant report. And for paranoia's sake let's make sure those two variables are actually equal after all objects are written (one never knows). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	e9195b584f	pack-objects: tweak "do not even attempt delta" heuristics The heuristics to give up deltification when both the source and the target are both in the same pack affects negatively when we are repacking the subset of objects in the existing pack. This caused any incremental updates to use suboptimal packs. Tweak the heuristics to avoid this problem. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	231f240b63	git-pack-objects progress flag documentation and cleanup This adds documentation for --progress and --all-progress, remove a duplicate --progress handling and make usage string more readable. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	fa438a2eb1	make git-push a bit more verbose Currently git-push displays progress status for the local packing of objects to send, but nothing once it starts to push it over the connection. Having progress status in that later case is especially nice when pushing lots of objects over a slow network link. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	63fba759bc	pack-objects: document --delta-base-offset option Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	a270069699	allow delta data reuse even if base object is a preferred base Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	f130446920	zap a debug remnant Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	780e6e735b	make pack data reuse compatible with both delta types This is the missing part to git-pack-objects allowing it to reuse delta data to/from any of the two delta types. It can reuse delta from any type, and it outputs base offsets when --allow-delta-base-offset is provided and the base is also included in the pack. Otherwise it outputs base sha1 references just like it always did. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	be6b19145f	make git-pack-objects able to create deltas with offset to base This is enabled with --delta-base-offset only, and doesn't work with pack data reuse yet. The idea is to allow for the fetch protocol to use an extension flag to notify the remote end that --delta-base-offset can be used with git-pack-objects. Eventually git-repack will always provide this flag. With this, all delta base objects are now pushed before deltas that depend on them. This is a requirement for OBJ_OFS_DELTA. This is not a requirement for OBJ_REF_DELTA but always doing so makes the code simpler. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	eb32d236df	introduce delta objects with offset to base This adds a new object, namely OBJ_OFS_DELTA, renames OBJ_DELTA to OBJ_REF_DELTA to better make the distinction between those two delta objects, and adds support for the handling of those new delta objects in sha1_file.c only. The OBJ_OFS_DELTA contains a relative offset from the delta object's position in a pack instead of the 20-byte SHA1 reference to identify the base object. Since the base is likely to be not so far away, the relative offset is more likely to have a smaller encoding on average than an absolute offset. And for those delta objects the base must always be stored first because there is no way to know the distance of later objects when streaming a pack. Hence this relative offset is always meant to be negative. The offset encoding is slightly denser than the one used for object size -- credits to <linux@horizon.com> (whoever this is) for bringing it to my attention. This allows for pack size reduction between 3.2% (Linux-2.6) to over 5% (linux-historic). Runtime pack access should be faster too since delta replay does skip a search in the pack index for each delta in a chain. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Nicolas Pitre	43057304c0	many cleanups to sha1_file.c Those cleanups are mainly to set the table for the support of deltas with base objects referenced by offsets instead of sha1. This means that many pack lookup functions are converted to take a pack/offset tuple instead of a sha1. This eliminates many struct pack_entry usages since this structure carried redundant information in many cases, and it increased stack footprint needlessly for a couple recursively called functions that used to declare a local copy of it for every recursion loop. In the process, packed_object_info_detail() has been reorganized as well so to look much saner and more amenable to deltas with offset support. Finally the appropriate adjustments have been made to functions that depend on the above changes. But there is no functionality changes yet simply some code refactoring at this point. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	4321134cd8	pack-objects: document --revs, --unpacked and --all. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	8d1d8f83b5	pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	b5d97e6b0a	pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' | pack-objects --revs pack and rev-list --objects master..next | pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	72518e9c26	more lightweight revalidation while reusing deflated stream in packing When copying from an existing pack and when copying from a loose object with new style header, the code makes sure that the piece we are going to copy out inflates well and inflate() consumes the data in full while doing so. The check to see if the xdelta really apply is quite expensive as you described, because you would need to have the image of the base object which can be represented as a delta against something else. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	7042dbf7a1	pack-objects: fix thinko in revalidate code When revalidating an entry from an existing pack entry->size and entry->type are not necessarily the size of the final object when the entry is deltified, but for base objects they must match. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Junio C Hamano	df6d61017a	pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it straight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
Shawn Pearce	e702496e43	Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparison. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	18 years ago
David Rientjes	a89fccd281	Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char *sha1, const unsigned char *sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
David Rientjes	96f1e58f52	remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Matthias Kestenholz	5d4a600335	Make git-pack-objects a builtin Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	ceec1361eb	pack-objects: reuse deflated data from new-style loose objects. When packing an object without deltifying, if the data is stored in a loose object that is encoded with a new style header, copy it without inflating and deflating. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Jeff King	4812a93a8c	pack-objects: check pack.window for default window size For some repositories, deltas simply don't make sense. One can disable them for git-repack by adding --window, but git-push insists on making the deltas which can be very CPU-intensive for little benefit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Pavel Roskin	82e5a82fd7	Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	560b25a86f	don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	8dbbd14ea3	consider previous pack undeltified object state only when reusing delta data Without this there would never be a chance to improve packing for previously undeltified objects. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	51d1e83f91	Do not try futile object pairs when repacking. In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	363b7817e0	upload-pack: prepare for sideband message support. This does not implement sideband for propagating the status to the downloader yet, but add code to capture the standard error output from the pack-objects process in preparation for sending it off to the client when the protocol extension allows us to do so. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Florian Forster	1d7f171c3a	Remove all void-pointer arithmetic. ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	ce0bd64299	pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	4c068a9831	tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry)) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	c3b06a69ff	improve depth heuristic for maximum delta size This provides a linear decrement on the penalty related to delta depth instead of being an 1/x function. With this another 5% reduction is observed on packs for both the GIT repo and the Linux kernel repo, as well as fixing a pack size regression in another sample repo I have. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	1b9bc5a7b7	Fix pack-index issue on 64-bit platforms a bit more portably. Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use "unsigned int" -- hopefully that would stay 32-bit on every platform we care about, at least until we update the pack-index file format. Our sha1 routines optimized for architectures use uint32_t and expects '#include <stdint.h>' to be enough, so OpenBSD on arm or ppc might have similar issues down the road, I dunno. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	ff45715ce5	pack-object: slightly more efficient Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	4e8da19581	simple heuristic for further free packing improvements Given that the early eviction of objects with maximum delta depth may exhibit bad packing on its own, why not considering a bias against deep base objects in try_delta() to mitigate that bad behavior. This patch adjust the MAX_size allowed for a delta based on the depth of the base object as well as enabling the early eviction of max depth objects from the object window. When used separately, those two things produce slightly better and much worse results respectively. But their combined effect is a surprising significant packing improvement. With this really simple patch the GIT repo gets nearly 15% smaller, and the Linux kernel repo about 5% smaller, with no significantly measurable CPU usage difference. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Ben Clifford	d9635e9c53	include header to define uint32_t, necessary on Mac OS X Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Dennis Stosberg	66561f5a77	Fix git-pack-objects for 64-bit platforms The offset of an object in the pack is recorded as a 4-byte integer in the index file. When reading the offset from the mmap'ed index in prepare_pack_revindex(), the address is dereferenced as a long*. This works fine as long as the long type is four bytes wide. On NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing the offset produces garbage. [jc: taking suggestion by Linus to use uint32_t] Signed-off-by: Dennis Stosberg <dennis@stosberg.net> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	86118bcb46	pack-object: squelch eye-candy on non-tty One of my post-update scripts runs a git-fetch into a separate repository and sends the results back to me (2>&1); I end up getting this in the mail: Generating pack... Done counting 180 objects. Result has 131 objects. Deltifying 131 objects. 0% (0/131) done^M 1% (2/131) done^M... This defaults not to do the progress report when not on a tty. You could give --progress to force the progress report, but let's not bother even documenting it nor mentioning it in the usage string. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	9a8b6a0a9d	pack-objects: update size heuristics. We used to omit delta base candidates that is much bigger than the target, but delta size does not grow when we delete more, so that was not a very good heuristics. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	f6c7081aa9	use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	0dec30b978	fix pack-object buffer size The input line has 40 _chars_ of sha1 and not 20 _bytes_. It should also account for the space before the pathname, and the terminating \n and \0. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	f527cb8c38	pack-objects: do not stop at object that is "too small" Because we sort the delta window by name-hash and then size, hitting an object that is too small to consider as a delta base for the current object does not mean we do not have better candidate in the window beyond it. Noticed by Shawn Pearce, analyzed by Nico, Linus and me. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	ca9de6cadf	Try using Geert similarity code in pack-objects. It appears the fingerprinting itself is too expensive to be worth doing for this purpose. A failed experiment. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	5379a5c5ee	Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Peter Eriksen	8e44025925	Use blob_, commit_, tag_, and tree_type throughout. This replaces occurrences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	687dd75c95	safe_fgets() - even more anal fgets() This is from Linus -- the previous round forgot to clear error after EINTR case. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	da93d12b00	pack-objects: be incredibly anal about stdio semantics This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	fb7a6531e6	Fix Solaris stdio signal handling stupidities This uses sigaction() to install the SIGALRM handler with SA_RESTART, so that Solaris stdio doesn't break completely when a signal interrupts a read. Thanks to Jason Riedy for confirming the silly Solaris signal behaviour. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	1b0c7174a1	tree/diff header cleanup. Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	70ca1a3f85	pack-objects: simplify "thin" pack. There was a misguided logic to overly prefer using objects that we are not going to pack as the base object. This was unnecessary. It does not matter to the unpacking side where the base object is -- it matters more to make the resulting delta smaller. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
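
Commit eb32d236df above says the OBJ_OFS_DELTA base offset uses an encoding that is "slightly denser than the one used for object size". The C sketch below illustrates such a scheme, assuming the layout git uses for this field: 7 value bits per byte, most significant group first, the high bit set when another byte follows, and 1 added to the accumulator before each shift so that no offset has two encodings. The names `encode_ofs()` and `decode_ofs()` are hypothetical and do not come from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers sketching the offset-to-base encoding.
 * Each byte holds 7 bits, most significant group first; 0x80 marks
 * "more bytes follow". Subtracting 1 per continuation while encoding
 * (and adding it back while decoding) removes redundant encodings.
 */

/* Encode ofs into out[] (room for 10 bytes); returns bytes written. */
static size_t encode_ofs(uint64_t ofs, unsigned char *out)
{
	unsigned char tmp[10];
	size_t pos = sizeof(tmp) - 1;

	tmp[pos] = ofs & 127;                 /* final byte: no continuation bit */
	while (ofs >>= 7)
		tmp[--pos] = 128 | (--ofs & 127); /* earlier bytes carry 0x80 */

	size_t len = sizeof(tmp) - pos;
	for (size_t i = 0; i < len; i++)
		out[i] = tmp[pos + i];
	return len;
}

/* Decode an offset from in[]; stores the number of bytes consumed. */
static uint64_t decode_ofs(const unsigned char *in, size_t *used)
{
	size_t i = 0;
	unsigned char c = in[i++];
	uint64_t ofs = c & 127;

	while (c & 128) {
		ofs += 1;                         /* undo the encoder's decrement */
		c = in[i++];
		ofs = (ofs << 7) + (c & 127);
	}
	*used = i;
	return ofs;
}
```

Under this scheme one byte covers offsets 0 to 127 and two bytes cover 128 to 16511, whereas a plain base-128 encoding tops out at 16383 for two bytes because groups with leading zeros are wasted.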

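Commit ce0bd64299 above replaces the path hash with a 32-bit "sorting number" in which earlier characters are "aged" by two bits so the end of a filename counts most. The sketch below follows that description; the function name `name_hash()` and the choice to skip whitespace are assumptions, and the code in the commit itself may differ in detail.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the "sorting number": every new character is
 * added into the top byte and everything already accumulated is shifted
 * right by two bits. After 16 further characters an older character has
 * been shifted out entirely, so only the last 16 non-whitespace
 * characters of the path affect the result.
 */
static uint32_t name_hash(const char *name)
{
	unsigned char c;
	uint32_t hash = 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;                      /* whitespace is ignored */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

Because the newest character always enters at the top while older ones drift toward the low bits, sorting by this value tends to place paths with the same ending next to each other. As the commit notes, collisions do not matter here: the value is only a sort key, not a unique identifier.
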
22 Commits (a990999e0df3d0518a2fef60feb1ec269e36ada6)
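
Commits da93d12b00 and 687dd75c95 above describe a "safe" fgets() for stdio implementations that report an error when a signal interrupts a read. The sketch below shows that retry pattern under the assumption that the interruption appears as ferror() set with errno equal to EINTR. The name `safe_fgets` comes from the commit title, but this body is an illustrative reconstruction, not the code from either commit.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Illustrative reconstruction: retry fgets() when a signal interrupted
 * the read, and return NULL only on a genuine EOF or a non-EINTR error.
 */
static char *safe_fgets(char *buf, int size, FILE *stream)
{
	for (;;) {
		errno = 0;
		if (fgets(buf, size, stream))
			return buf;
		if (ferror(stream) && errno == EINTR) {
			/* Clear the sticky error flag before retrying; 687dd75c95
			 * fixes a previous version that forgot this step. */
			clearerr(stream);
			continue;
		}
		return NULL;          /* real EOF, or an error other than EINTR */
	}
}
```

The caller's buffer must still hold a whole input line. Commit 0dec30b978 makes this point for pack-objects input, which needs room for 40 hex characters of sha1, a space, the pathname, the trailing newline and the terminating NUL.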