kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	4321134cd8	pack-objects: document --revs, --unpacked and --all. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	8d1d8f83b5	pack-objects: further work on internal rev-list logic. This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	b5d97e6b0a	pack-objects: run rev-list equivalent internally. Instead of piping the rev-list output from its standard input, you can say: pack-objects --all --unpacked --revs pack and feed the rev parameters you would otherwise give the rev-list on its command line from the standard input. In other words: echo 'master..next' \| pack-objects --revs pack and rev-list --objects master..next \| pack-objects pack are equivalent. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	72518e9c26	more lightweight revalidation while reusing deflated stream in packing When copying from an existing pack and when copying from a loose object with new style header, the code makes sure that the piece we are going to copy out inflates well and inflate() consumes the data in full while doing so. The check to see if the xdelta really apply is quite expensive as you described, because you would need to have the image of the base object which can be represented as a delta against something else. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	7042dbf7a1	pack-objects: fix thinko in revalidate code When revalidating an entry from an existing pack entry->size and entry->type are not necessarily the size of the final object when the entry is deltified, but for base objects they must match. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	df6d61017a	pack-objects: re-validate data we copy from elsewhere. When reusing data from an existing pack and from a new style loose objects, we used to just copy it staight into the resulting pack. Instead make sure they are not corrupt, but do so only when we are not streaming to stdout, in which case the receiving end will do the validation either by unpacking the stream or by constructing the .idx file. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Shawn Pearce	e702496e43	Convert memcpy(a,b,20) to hashcpy(a,b). This abstracts away the size of the hash values when copying them from memory location to memory location, much as the introduction of hashcmp abstracted away hash value comparsion. A few call sites were using char* rather than unsigned char* so I added the cast rather than open hashcpy to be void. This is a reasonable tradeoff as most call sites already use unsigned char and the existing hashcmp is also declared to be unsigned char*. [jc: Splitted the patch to "master" part, to be followed by a patch for merge-recursive.c which is not in "master" yet. Fixed the cast in the latter hunk to combine-diff.c which was wrong in the original. Also converted ones left-over in combine-diff.c, diff-lib.c and upload-pack.c ] Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
David Rientjes	a89fccd281	Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length. Introduces global inline: hashcmp(const unsigned char sha1, const unsigned char sha2) Uses memcmp for comparison and returns the result based on the length of the hash name (a future runtime decision). Acked-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
David Rientjes	96f1e58f52	remove unnecessary initializations [jc: I needed to hand merge the changes to the updated codebase, so the result needs to be checked.] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Matthias Kestenholz	5d4a600335	Make git-pack-objects a builtin Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	ceec1361eb	pack-objects: reuse deflated data from new-style loose objects. When packing an object without deltifying, if the data is stored in a loose object that is encoded with a new style header, copy it without inflating and deflating. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Jeff King	4812a93a8c	pack-objects: check pack.window for default window size For some repositories, deltas simply don't make sense. One can disable them for git-repack by adding --window, but git-push insists on making the deltas which can be very CPU-intensive for little benefit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Pavel Roskin	82e5a82fd7	Fix more typos, primarily in the code The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	560b25a86f	don't load objects needlessly when repacking If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	8dbbd14ea3	consider previous pack undeltified object state only when reusing delta data Without this there would never be a chance to improve packing for previously undeltified objects. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	51d1e83f91	Do not try futile object pairs when repacking. In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	363b7817e0	upload-pack: prepare for sideband message support. This does not implement sideband for propagating the status to the downloader yet, but add code to capture the standard error output from the pack-objects process in preparation for sending it off to the client when the protocol extension allows us to do so. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Florian Forster	1d7f171c3a	Remove all void-pointer arithmetic. ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	ce0bd64299	pack-objects: improve path grouping heuristics. This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	4c068a9831	tree_entry(): new tree-walking helper function This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	c3b06a69ff	improve depth heuristic for maximum delta size This provides a linear decrement on the penalty related to delta depth instead of being an 1/x function. With this another 5% reduction is observed on packs for both the GIT repo and the Linux kernel repo, as well as fixing a pack size regression in another sample repo I have. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	1b9bc5a7b7	Fix pack-index issue on 64-bit platforms a bit more portably. Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use "unsigned int" -- hopefully that would stay 32-bit on every platform we care about, at least until we update the pack-index file format. Our sha1 routines optimized for architectures use uint32_t and expects '#include <stdint.h>' to be enough, so OpenBSD on arm or ppc might have similar issues down the road, I dunno. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	ff45715ce5	pack-object: slightly more efficient Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	4e8da19581	simple euristic for further free packing improvements Given that the early eviction of objects with maximum delta depth may exhibit bad packing on its own, why not considering a bias against deep base objects in try_delta() to mitigate that bad behavior. This patch adjust the MAX_size allowed for a delta based on the depth of the base object as well as enabling the early eviction of max depth objects from the object window. When used separately, those two things produce slightly better and much worse results respectively. But their combined effect is a surprising significant packing improvement. With this really simple patch the GIT repo gets nearly 15% smaller, and the Linux kernel repo about 5% smaller, with no significantly measurable CPU usage difference. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Ben Clifford	d9635e9c53	include header to define uint32_t, necessary on Mac OS X Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Dennis Stosberg	66561f5a77	Fix git-pack-objects for 64-bit platforms The offset of an object in the pack is recorded as a 4-byte integer in the index file. When reading the offset from the mmap'ed index in prepare_pack_revindex(), the address is dereferenced as a long*. This works fine as long as the long type is four bytes wide. On NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing the offset produces garbage. [jc: taking suggestion by Linus to use uint32_t] Signed-off-by: Dennis Stosberg <dennis@stosberg.net> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	86118bcb46	pack-object: squelch eye-candy on non-tty One of my post-update scripts runs a git-fetch into a separate repository and sends the results back to me (2>&1); I end up getting this in the mail: Generating pack... Done counting 180 objects. Result has 131 objects. Deltifying 131 objects. 0% (0/131) done^M 1% (2/131) done^M... This defaults not to do the progress report when not on a tty. You could give --progress to force the progress report, but let's not bother even documenting it nor mentioning it in the usage string. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	9a8b6a0a9d	pack-objects: update size heuristucs. We used to omit delta base candidates that is much bigger than the target, but delta size does not grow when we delete more, so that was not a very good heuristics. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	f6c7081aa9	use delta index data when finding best delta matches This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	0dec30b978	fix pack-object buffer size The input line has 40 _chars_ of sha1 and no 20 _bytes_. It should also account for the space before the pathname, and the terminating \n and \0. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	f527cb8c38	pack-objects: do not stop at object that is "too small" Because we sort the delta window by name-hash and then size, hitting an object that is too small to consider as a delta base for the current object does not mean we do not have better candidate in the window beyond it. Noticed by Shawn Pearce, analyzed by Nico, Linus and me. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	ca9de6cadf	Try using Geert similarity code in pack-objects. It appears the fingerprinting itself is too expensive to be worth doing for this purpose. A failed experiment. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	5379a5c5ee	Thin pack generation: optimization. Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Peter Eriksen	8e44025925	Use blob_, commit_, tag_, and tree_type throughout. This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	687dd75c95	safe_fgets() - even more anal fgets() This is from Linus -- the previous round forgot to clear error after EINTR case. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	da93d12b00	pack-objects: be incredibly anal about stdio semantics This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Linus Torvalds	fb7a6531e6	Fix Solaris stdio signal handling stupidities This uses sigaction() to install the SIGALRM handler with SA_RESTART, so that Solaris stdio doesn't break completely when a signal interrupts a read. Thanks to Jason Riedy for confirming the silly Solaris signal behaviour. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	1b0c7174a1	tree/diff header cleanup. Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	70ca1a3f85	pack-objects: simplify "thin" pack. There was a misguided logic to overly prefer using objects that we are not going to pack as the base object. This was unnecessary. It does not matter to the unpacking side where the base object is -- it matters more to make the resulting delta smaller. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	38fd0721d0	diff-delta: allow reusing of the reference buffer index When a reference buffer is used multiple times then its index can be computed only once and reused multiple times. This patch adds an extra pointer to a pointer argument (from_index) to diff_delta() for this. If from_index is NULL then everything is like before. If from_index is non NULL and from_index is NULL then the index is created and its location stored to from_index. In this case the caller has the responsibility to free the memory pointed to by from_index. If from_index and from_index are non NULL then the index is reused as is. This currently saves about 10% of CPU time to repack the git archive. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Luck, Tony	2b74cffa91	Re-fix compilation warnings. Commit `8fcf1ad9c6` has a combination of double cast and Andreas' switch to using unsigned long ... just the latter is sufficient (and a lot less ugly than using the double cast). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Timo Hirvonen	962554c616	Use setenv(), fix warnings - Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <tihirvon@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Luck, Tony	8fcf1ad9c6	fix warning from pack-objects.c When compiling on ia64 I get this warning (from gcc 3.4.3): gcc -o pack-objects.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>' pack-objects.c pack-objects.c: In function `pack_revindex_ix': pack-objects.c:94: warning: cast from pointer to integer of different size A double cast (first to long, then to int) shuts gcc up, but is there a better way? [jc: Andreas Ericsson suggests to use ulong instead. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	eeef7135fe	pack-objects: hash basename and direname a bit differently. ...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	b76f6b6278	pack-objects: allow "thin" packs to exceed depth limits When creating a new pack to be used in .git/objects/pack/ directory, we carefully count the depth of deltified objects to be reused, so that the generated pack does not to exceed the specified depth limit for runtime efficiency. However, when we are generating a thin pack that does not contain base objects, such a pack can only be used during network transfer that is expanded on the other end upon reception, so being careful and artificially cutting the delta chain does not buy us anything except increased bandwidth requirement. This patch disables the delta chain depth limit check when reusing an existing delta. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	1d6b38cc76	pack-objects: use full pathname to help hashing with "thin" pack. This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	b925410d10	pack-objects: thin pack micro-optimization. Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Junio C Hamano	183bdb2ccc	pack-objects eye-candy: finishing touches. This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	5e8dc750ee	also adds progress when actually writing a pack If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago
Nicolas Pitre	b2504a0d2f	nicer eye candies for pack-objects This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	19 years ago

11 Commits (83b5d2f5b0c95fe102bc3d1cc2947abbdf5e5c5b)