kernel/git - git - PowerEL Git System

Author	SHA1	Message	Date
Junio C Hamano	4947367267	list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Michael Haggerty	d932f4eb9f	Rename git_checkattr() to git_check_attr() Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	1b4bb16b9e	pack-objects: optimize "recency order"
This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways:
- Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths...";
- When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other.
I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
                         v1.7.6       with patch
Total number of access : 258,031      258,032
0.0% percentile        : 12           12
10.0% percentile       : 259          259
20.0% percentile       : 294          294
30.0% percentile       : 326          326
40.0% percentile       : 363          363
50.0% percentile       : 415          415
60.0% percentile       : 513          513
70.0% percentile       : 857          858
80.0% percentile       : 10,434       10,441
90.0% percentile       : 91,985       91,996
95.0% percentile       : 260,852      260,885
99.0% percentile       : 1,150,680    1,152,811
99.9% percentile       : 3,148,435    3,148,435
Less than 2MiB seek    : 99.70%       99.69%
95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
                         v1.7.6       with patch
Total number of access : 559,511      558,663
0.0% percentile        : 0            0
10.0% percentile       : 182          167
20.0% percentile       : 259          233
30.0% percentile       : 357          304
40.0% percentile       : 714          485
50.0% percentile       : 5,046        3,976
60.0% percentile       : 688,671      443,578
70.0% percentile       : 319,574,732  110,370,100
80.0% percentile       : 361,647,599  123,707,229
90.0% percentile       : 393,195,669  128,947,636
95.0% percentile       : 405,496,875  131,609,321
99.0% percentile       : 412,942,470  133,078,115
99.5% percentile       : 413,172,266  133,163,349
99.9% percentile       : 413,354,356  133,240,445
Less than 2MiB seek    : 61.71%       62.87%
With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
                         v1.7.6       with patch
Total number of access : 178,147      178,166
0.0% percentile        : 0            0
10.0% percentile       : 142          139
20.0% percentile       : 222          194
30.0% percentile       : 373          300
40.0% percentile       : 1,168        837
50.0% percentile       : 11,248       7,334
60.0% percentile       : 305,121,284  106,850,130
70.0% percentile       : 361,427,854  123,709,715
80.0% percentile       : 388,127,343  128,171,047
90.0% percentile       : 399,987,762  130,200,707
95.0% percentile       : 408,230,673  132,174,308
99.0% percentile       : 412,947,017  133,181,160
99.5% percentile       : 413,312,798  133,220,425
99.9% percentile       : 413,352,366  133,269,051
Less than 2MiB seek    : 56.47%       56.83%
The result is very similar to the pathspec-limited log above, which only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) | git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
                         v1.7.6       with patch
Total number of access : 11,511       11,514
0.0% percentile        : 0            0
10.0% percentile       : 48           47
20.0% percentile       : 134          98
30.0% percentile       : 332          178
40.0% percentile       : 1,386        293
50.0% percentile       : 8,030        478
60.0% percentile       : 33,676       1,195
70.0% percentile       : 147,268      26,216
80.0% percentile       : 9,178,662    464,598
90.0% percentile       : 67,922,665   965,782
95.0% percentile       : 87,773,251   1,226,102
99.0% percentile       : 98,011,763   1,932,377
99.5% percentile       : 100,074,427  33,642,128
99.9% percentile       : 105,336,398  275,772,650
Less than 2MiB seek    : 77.09%       99.04%
The long-tail part of the result looks worse with the patch, but the change helps the majority of accesses. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
                         v1.7.6       with patch
Total number of access : 2,791,228    2,788,802
0.0% percentile        : 9            9
10.0% percentile       : 140          89
20.0% percentile       : 233          167
30.0% percentile       : 322          235
40.0% percentile       : 464          310
50.0% percentile       : 862          423
60.0% percentile       : 2,566        686
70.0% percentile       : 25,827       1,498
80.0% percentile       : 1,317,862    4,971
90.0% percentile       : 11,926,385   119,398
95.0% percentile       : 41,304,149   952,519
99.0% percentile       : 227,613,070  6,709,650
99.5% percentile       : 321,265,121  11,734,871
99.9% percentile       : 382,919,785  33,155,191
Less than 2MiB seek    : 81.73%       96.92%
As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would be unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	ef49a7a012	zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	225a6f1068	zlib: wrap deflateBound() too Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	55bb5c9147	zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	15366280c2	Teach core.bigfilethreshold to pack-objects The pack-objects command should take notice of the object file and refrain from attempting to delta large ones, to be consistent with the fast-import command. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	ebcfb3791a	write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default_version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Junio C Hamano	b361888dd5	thread-utils.h: simplify the inclusion All files that include this header file use the same four line incantation: #ifndef NO_PTHREADS #include <pthread.h> #include "thread-utils.h" #endif Move the responsibility for that gymnastics to the header file from the files that include it. This approach makes it easier to later declare new services that are related to threading in thread-utils.h and have them available to all the threading code. Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Jonathan Nieder	bc9b21755e	pack-objects: mark file-local variable static old_try_to_free_routine is not meant for use from other files. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Nicolas Pitre	71064a956b	make pack-objects a bit more resilient to repo corruption Right now, packing valid objects could fail when creating a thin pack simply because a pack edge object used as a preferred base is corrupted. Since preferred base objects are not strictly needed to produce a valid pack, let's not consider the inability to read them as a fatal error. Delta compression may well be attempted against other objects in the search window. To avoid warning storms (we are in the inner loop of the delta search window) a warning is emitted only on the first occurrence. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Štěpán Němec	884220653f	Put a space between `<' and argument in pack-objects usage string This makes it consistent with other places (including the git-pack-objects(1) manpage itself) and avoids possible confusion (I, for one, mistook `<object-list' for a `<object-list>' typo at first when preparing this series). Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Štěpán Němec	0adda9362a	Use parentheses and `...' where appropriate Remove some stray usage of other bracket types and asterisks for the same purpose. Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Štěpán Němec	62b4698e55	Use angles for placeholders consistently Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Erik Faye-Lund	c03c83152d	do not depend on signed integer overflow Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Johannes Schindelin	8695353147	Fix typo in pack-objects' usage Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	14 years ago
Linus Torvalds	81b50f3ce4	Move 'builtin-' into a 'builtin/' subdirectory This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editor, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Michael Lukashov	1b22b6c897	refactor duplicated encode_header in pack-objects and fast-import The following function is duplicated: encode_header Move this function to sha1_file.c and rename it 'encode_in_pack_object_header', as suggested by Junio C Hamano Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Nicolas Pitre	720c9f7bda	Revert "pack-objects: fix pack generation when using pack_size_limit" This reverts most of commit `a2430dde8c`. That commit made the situation better for repositories with relatively small number of objects. However with many objects and a small pack size limit, the time required to complete the repack tends towards O(n^2), or even much worse with long delta chains. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Nicolas Pitre	07cf0f2407	make --max-pack-size argument to 'git pack-object' count in bytes The value passed to --max-pack-size used to count in MiB which was inconsistent with the corresponding configuration variable as well as other command arguments which are defined to count in bytes with an optional unit suffix. This brings --max-pack-size in line with the rest of Git. Also, in order not to cause havoc with people used to the previous megabyte scale, and because this is a sane thing to do anyway, a minimum size of 1 MiB is enforced to avoid an explosion of pack files. Adjust and extend test suite accordingly. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Nicolas Pitre	a2430dde8c	pack-objects: fix pack generation when using pack_size_limit Current handling of pack_size_limit is quite suboptimal. Let's consider a list of objects to pack which contains alternately big and small objects (which pretty much matches reality when big blobs are interlaced with tree objects). Currently, the code simply closes the pack and opens a new one when the next object in line breaks the size limit. The current code may degenerate into: - small tree object => store into pack #1 - big blob object busting the pack size limit => store into pack #2 - small blob but pack #2 is over the limit already => pack #3 - big blob busting the size limit => pack #4 - small tree but pack #4 is over the limit => pack #5 - big blob => pack #6 - small tree => pack #7 - ... and so on. The reality is that the content of packs 1, 3, 5 and 7 could well be stored more efficiently (and delta compressed) together in pack #1 if the big blobs were not forcing an immediate transition to a new pack. Incidentally this can be fixed pretty easily by simply skipping over those objects that are too big to fit in the current pack while trying the whole list of unwritten objects, and then considering that list from the beginning again when a new pack is opened. This creates far fewer smallish pack files and helps make test cases for the test suite more predictable. This change made one of the self sanity checks useless so it is removed as well. That check was rather redundant already anyway. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Dan McGee	7eb151d6e2	Make NO_PTHREADS the sole thread configuration variable When the first piece of threaded code was introduced in commit `8ecce684`, it came with its own THREADED_DELTA_SEARCH Makefile option. Since this time, more threaded code has come into the codebase and a NO_PTHREADS option has also been added. Get rid of the original option as the newer, more generic option covers everything we need. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Linus Torvalds	3bb7256281	make "index-pack" a built-in This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Junio C Hamano	7fb0eaa289	git_attr(): fix function signature The function took (name, namelen) as its arguments, but all the public callers wanted to pass a full string. Demote the counted-string interface to an internal API status, and allow public callers to just pass the string to the function. Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Andrzej K. Haczewski	44626dc7d5	MSVC: Windows-native implementation for subset of Pthreads API This patch implements a Windows-native subset of the pthreads API used by Git. It allows removing the Pthreads for Win32 dependency for MSVC, msysgit and Cygwin. [J6t: If the MinGW build was built as part of the msysgit build environment, then threading was already enabled because the pthreads-win32 package is available in msysgit. With this patch, we can now enable threaded code unconditionally.] Signed-off-by: Andrzej K. Haczewski <ahaczewski@gmail.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Nicolas Pitre	4f36627518	pack-objects: split implications of --all-progress from progress activation Currently the --all-progress flag is used to force progress display during the writing object phase even if output goes to stdout which is primarily the case during a push operation. This has the unfortunate side effect of forcing progress display even if stderr is not a terminal. Let's introduce the --all-progress-implied argument which has the same intent except for actually forcing the activation of any progress display. With this, progress display will be automatically inhibited whenever stderr is not a terminal, or full progress display will be included otherwise. This should let people use 'git push' within a cron job without filling their logs with useless percentage displays. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Tested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Nicolas Pitre	ef0555712c	pack-objects: move thread autodetection closer to relevant code Let's keep thread stuff close together if possible. And in this case, this even reduces the #ifdef noise, and allows for skipping the autodetection altogether if delta search is not needed (like with a pure clone). Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	15 years ago
Thiago Farina	812bdbc31b	pack-objects: remove SP at the end of usage string These spaces immediately before the end of lines are unnecessary. While at it, instead of using a single string literal with backslashes at end of each line, split the lines into individual string literals and tell the compiler to concatenate them. Signed-off-by: Thiago Farina <tfransosi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Nicolas Pitre	0ef95f72f8	pack-objects: free preferred base memory after usage When adding objects for preferred delta base, the content from tree objects leading to given paths is kept in a cache. This has the potential to grow significantly, especially with large directories as the whole tree object content is loaded in memory, even if in practice the number of those objects is limited to the 256 cache entries plus the $window root tree objects. Still, that can't hurt freeing that up after object enumeration is done, and before more memory is needed for delta search. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	dcda3614d4	builtin-pack-objects.c: avoid vla This is one of only two places that we use C99 variable length array on the stack, which some older compilers apparently are not happy with. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Brian Gianforcaro	eeefa7c90e	Style fixes, add a space after if/for/while. The majority of code in core git appears to use a single space after if/for/while. This is an attempt to bring more code to this standard. These are entirely cosmetic changes. Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Nicolas Pitre	5749b0b2f9	don't let the delta cache grow unbounded in 'git repack' I have 4GB of RAM on my system which should, in theory, be quite enough to repack a 600 MB repository. However the unbounded delta cache size always pushes it into swap, at which point everything virtually comes to a halt. So unbounded caches are never a good idea. A default of 256MB should be a good compromise between memory usage and speed where medium sized repositories are still likely to fit in the cache with a reasonable memory usage, and larger repositories are going to take quite some time to repack already anyway. While at it, clarify the associated config variable documentation entries a bit. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Johannes Schindelin	7f3140cd23	git repack: keep commits hidden by a graft When you have grafts that pretend that a given commit has different parents than the ones recorded in the commit object, it is dangerous to let 'git repack' remove those hidden parents, as you can easily remove the graft and end up with a broken repository. So let's play it safe and keep those parent objects and everything that is reachable by them, in addition to the grafted parents. As this behavior can only be triggered by git pack-objects, and as that command handles duplicate parents gracefully, we do not bother to cull duplicated parents that may result by using both true and grafted parents. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Thomas Rast	d824cbba02	Convert existing die(..., strerror(errno)) to die_errno() Change calls to die(..., strerror(errno)) to use the new die_errno(). In the process, also make slight style adjustments: at least state _something_ about the function that failed (instead of just printing the pathname), and put paths in single quotes. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Linus Torvalds	48fb7deb5b	Fix big left-shifts of unsigned char Shifting 'unsigned char' or 'unsigned short' left can result in sign extension errors, since the C integer promotion rules mean that the unsigned char/short will get implicitly promoted to a signed 'int' due to the shift (or due to other operations). This normally doesn't matter, but if you shift things up sufficiently, it will now set the sign bit in 'int', and a subsequent cast to a bigger type (eg 'long' or 'unsigned long') will now sign-extend the value despite the original expression being unsigned. One example of this would be something like unsigned long size; unsigned char c; size += c << 24; where despite all the variables being unsigned, 'c << 24' ends up being a signed entity, and will get sign-extended when then doing the addition in an 'unsigned long' type. Since git uses 'unsigned char' pointers extensively, we actually have this bug in a couple of places. I may have missed some, but this is the result of looking at git grep '[^0-9 ][ ]<<[ ][a-z]' -- '*.c' '*.h' git grep '<<[ ]24' which catches at least the common byte cases (shifting variables by a variable amount, and shifting by 24 bits). I also grepped for just 'unsigned char' variables in general, and converted the ones that most obviously ended up getting implicitly cast immediately anyway (eg hash_name(), encode_85()). In addition to just avoiding 'unsigned char', this patch also tries to use a common idiom for the delta header size thing. We had three different variations on it: "& 0x7fUL" in one place (getting the sign extension right), and "& ~0x80" and "& 0x7f" in two other places (not getting it right). Apart from making them all just avoid using "unsigned char" at all, I also unified them to then use a simple "& 0x7f". I considered making a sparse extension which warns about doing implicit casts from unsigned types to signed types, but it gets rather complex very quickly, so this is just a hack. 
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Christian Couder	dae556bdb1	environment: add global variable to disable replacement This new "read_replace_refs" global variable is set to 1 by default, so that replace refs are used by default. But reachability traversal and packing commands ("cmd_fsck", "cmd_prune", "cmd_pack_objects", "upload_pack", "cmd_unpack_objects") set it to 0, as they must work with the original DAG. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Mike Ralphson	3ea3c215c0	Fix typos / spelling in comments Signed-off-by: Mike Ralphson <mike@abacus.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Linus Torvalds	8d2dfc49b1	process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? 
It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of objects are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Linus Torvalds	cf2ab916af	show_object(): push path_name() call further down In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Linus Torvalds	213152688c	process_{tree,blob}: Remove useless xstrdup calls On Wed, 8 Apr 2009, Björn Steinbrink wrote: > > The name of the processed object was duplicated for passing it to > add_object(), but that already calls path_name, which allocates a new > string anyway. So the memory allocated by the xstrdup calls just went > nowhere, leaking memory. Ack, ack. There's another easy 5% or so for the built-in object walker: once we've created the hash from the name, the name isn't interesting any more, and so something trivial like this can help a bit. Does it matter? Probably not on its own. But a few more memory saving tricks and it might all make a difference. Linus Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Dan McGee	b6c29915d2	Update delta compression message to be less misleading In the case of a small repository, pack-objects is smart enough to not start more threads than necessary. However, the output to the user always reports the value of the pack.threads configuration and not the real number of threads to be used. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Christian Couder	11c211fa06	list-objects: add "void *data" parameter to show functions The goal of this patch is to get rid of the "static struct rev_info revs" static variable in "builtin-rev-list.c". To do that, we need to pass the revs to the "show_commit" function in "builtin-rev-list.c" and this in turn means that the "traverse_commit_list" function in "list-objects.c" must be passed functions pointers to functions with 2 parameters instead of one. So we have to change all the callers and all the functions passed to "traverse_commit_list". Anyway this makes the code more clean and more generic, so it should be a good thing in the long run. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Nicolas Pitre	720fe22d50	avoid possible overflow in delta size filtering computation On a 32-bit system, the maximum possible size for an object is less than 4GB, while 64-bit systems may cope with larger objects. Due to this limitation, variables holding object sizes are using an unsigned long type (32 bits on 32-bit systems, or 64 bits on 64-bit systems). When large objects are encountered, and/or people play with large delta depth values, it is possible for the maximum allowed delta size computation to overflow, especially on a 32-bit system. When this occurs, surviving result bits may represent a value much smaller than what it is supposed to be, or even zero. This prevents some objects from being deltified although they do get deltified when a smaller depth limit is used. Fix this by always performing a 64-bit multiplication. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Brandon Casey	094085e336	pack-objects: don't loosen objects available in alternate or kept packs If pack-objects is called with the --unpack-unreachable option then it will unpack (i.e. loosen) all unreferenced objects from local not-kept packs, including those that also exist in packs residing in an alternate object database or a locally kept pack. The only user of this option is git-repack. In this case, repack will follow the call to pack-objects with a call to prune-packed, which will delete these newly loosened objects, making the act of loosening a waste of time. The unnecessary loosening can be avoided by checking whether an object exists in a non-local pack or a locally kept pack before loosening it. This fixes the 'local packed unreachable obs that exist in alternate ODB are not loosened' test in t7700. Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Brandon Casey	4d6acb7041	Remove --kept-pack-only option and associated infrastructure This option to pack-objects/rev-list was created to improve the -A and -a options of repack. It was found to be lacking in that it did not provide the ability to differentiate between local and non-local kept packs, and found to be unnecessary since objects residing in local kept packs can be filtered out by the --honor-pack-keep option. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Brandon Casey	79bc4c7155	pack-objects: only repack or loosen objects residing in "local" packs These two features were invented for use by repack when repack will delete the local packs that have been made redundant. The packs accessible through alternates are not deleted by repack, so the objects contained in them are still accessible after the local packs are deleted. They do not need to be repacked into the new pack or loosened. For the case of loosening they would immediately be deleted by the subsequent prune-packed that is called by repack anyway. This fixes the test 'packed unreachable obs in alternate ODB are not loosened' in t7700. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	69e020ae00	is_kept_pack(): final clean-up Now is_kept_pack() is just a member lookup into a structure, we can write it as such. Also rewrite the sole caller of has_sha1_kept_pack() to switch on the criteria the callee uses (namely, revs->kept_pack_only) between calling has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not have to take a pointer to struct rev_info as an argument. This removes the header file dependency issue temporarily introduced by the earlier commit, so we revert changes associated to that as well. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	03a9683d22	Simplify is_kept_pack() This removes --unpacked=<packfile> parameter from the revision parser, and rewrites its use in git-repack to pass a single --kept-pack-only option instead. The new --kept-pack-only option means just that. When this option is given, is_kept_pack() that used to say "not on the --unpacked=<packfile> list" now says "the packfile has corresponding .keep file". Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	386cb77210	Consolidate ignore_packed logic more This refactors three loops that check if a given packfile is on the ignore_packed list into a function is_kept_pack(). The function returns false for a pack on the list, and true for a pack not on the list, because this list is solely used by "git repack" to pass list of packfiles that do not have corresponding .keep files, i.e. a packfile not on the list is "kept". Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago
Junio C Hamano	6e180cdcec	Make sure objects/pack exists before creating a new pack In a repository created with git older than `f49fb35` (git-init-db: create "pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is not created upon initialization. It was Ok because subdirectories are created as needed inside directories init-db creates, and back then, packfiles were a recent invention. After the said commit, new codepaths started relying on the presence of objects/pack/ directory in the repository. This was exacerbated with `8b4eb6b` (Do not perform cross-directory renames when creating packs, 2008-09-22) that moved the location temporary pack files are created from objects/ directory to objects/pack/ directory, because moving temporary to the final location was done carefully with lazy leading directory creation. Many packfile related operations in such an old repository can fail mysteriously because of this. This commit introduces two helper functions to make things work better. - odb_mkstemp() is a specialized version of mkstemp() to refactor the code and teach it to create leading directories as needed; - odb_pack_keep() refactors the code to create a ".keep" file while creating leading directories as needed. Signed-off-by: Junio C Hamano <gitster@pobox.com>	16 years ago

25 Commits (e5fa45c159241b609bce40fa7a8687796e4b941d)
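
Note on 720fe22d50 (avoid possible overflow in delta size filtering computation): the message describes a size limit that is scaled by delta depth and can wrap when the multiplication is done in 32 bits. The following is a minimal sketch of that failure mode and of the widening fix, not the code from the commit. The function names, parameter names, and the exact scaling formula are assumptions made for illustration, and `uint32_t` stands in for `unsigned long` on a 32-bit system.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a size limit by the remaining delta depth
 * using 32-bit arithmetic only, as unsigned long would on a 32-bit system.
 */
static uint32_t scale_limit_32(uint32_t max_size, uint32_t max_depth,
                               uint32_t src_depth, uint32_t ref_depth)
{
    assert(src_depth < max_depth && ref_depth <= max_depth);
    /* The product may exceed UINT32_MAX and wrap before the division. */
    return max_size * (max_depth - src_depth) / (max_depth - ref_depth + 1);
}

/*
 * Same computation, but the multiplication is widened to 64 bits first,
 * so the intermediate product cannot wrap for 32-bit inputs.
 */
static uint64_t scale_limit_64(uint32_t max_size, uint32_t max_depth,
                               uint32_t src_depth, uint32_t ref_depth)
{
    assert(src_depth < max_depth && ref_depth <= max_depth);
    uint64_t product = (uint64_t)max_size * (max_depth - src_depth);
    return product / (max_depth - ref_depth + 1);
}
```

With small inputs the two helpers agree. With a size close to the 4GB limit and a large depth, the 32-bit version keeps only the surviving low bits and returns a much smaller limit, which matches the symptom in the commit message: objects are not deltified even though they would be with a smaller depth limit.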