The traverse_commit_list() API takes two callback functions, one to show
commit objects, and the other to show other kinds of objects. Even though
the former has a callback data parameter, so that the callback does not
have to rely on global state, the latter does not.
Give the show_objects() callback the same callback data parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suggested by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
v1.7.6 with patch
Total number of access : 258,031 258,032
0.0% percentile : 12 12
10.0% percentile : 259 259
20.0% percentile : 294 294
30.0% percentile : 326 326
40.0% percentile : 363 363
50.0% percentile : 415 415
60.0% percentile : 513 513
70.0% percentile : 857 858
80.0% percentile : 10,434 10,441
90.0% percentile : 91,985 91,996
95.0% percentile : 260,852 260,885
99.0% percentile : 1,150,680 1,152,811
99.9% percentile : 3,148,435 3,148,435
Less than 2MiB seek: 99.70% 99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
v1.7.6 with patch
Total number of access : 559,511 558,663
0.0% percentile : 0 0
10.0% percentile : 182 167
20.0% percentile : 259 233
30.0% percentile : 357 304
40.0% percentile : 714 485
50.0% percentile : 5,046 3,976
60.0% percentile : 688,671 443,578
70.0% percentile : 319,574,732 110,370,100
80.0% percentile : 361,647,599 123,707,229
90.0% percentile : 393,195,669 128,947,636
95.0% percentile : 405,496,875 131,609,321
99.0% percentile : 412,942,470 133,078,115
99.5% percentile : 413,172,266 133,163,349
99.9% percentile : 413,354,356 133,240,445
Less than 2MiB seek: 61.71% 62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
v1.7.6 with patch
Total number of access : 178,147 178,166
0.0% percentile : 0 0
10.0% percentile : 142 139
20.0% percentile : 222 194
30.0% percentile : 373 300
40.0% percentile : 1,168 837
50.0% percentile : 11,248 7,334
60.0% percentile : 305,121,284 106,850,130
70.0% percentile : 361,427,854 123,709,715
80.0% percentile : 388,127,343 128,171,047
90.0% percentile : 399,987,762 130,200,707
95.0% percentile : 408,230,673 132,174,308
99.0% percentile : 412,947,017 133,181,160
99.5% percentile : 413,312,798 133,220,425
99.9% percentile : 413,352,366 133,269,051
Less than 2MiB seek: 56.47% 56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
v1.7.6 with patch
Total number of access : 11,511 11,514
0.0% percentile : 0 0
10.0% percentile : 48 47
20.0% percentile : 134 98
30.0% percentile : 332 178
40.0% percentile : 1,386 293
50.0% percentile : 8,030 478
60.0% percentile : 33,676 1,195
70.0% percentile : 147,268 26,216
80.0% percentile : 9,178,662 464,598
90.0% percentile : 67,922,665 965,782
95.0% percentile : 87,773,251 1,226,102
99.0% percentile : 98,011,763 1,932,377
99.5% percentile : 100,074,427 33,642,128
99.9% percentile : 105,336,398 275,772,650
Less than 2MiB seek: 77.09% 99.04%
The long-tail part of the result looks worse with the patch, but
the change helps majority of the access. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
v1.7.6 with patch
Total number of access : 2,791,228 2,788,802
0.0% percentile : 9 9
10.0% percentile : 140 89
20.0% percentile : 233 167
30.0% percentile : 322 235
40.0% percentile : 464 310
50.0% percentile : 862 423
60.0% percentile : 2,566 686
70.0% percentile : 25,827 1,498
80.0% percentile : 1,317,862 4,971
90.0% percentile : 11,926,385 119,398
95.0% percentile : 41,304,149 952,519
99.0% percentile : 227,613,070 6,709,650
99.5% percentile : 321,265,121 11,734,871
99.9% percentile : 382,919,785 33,155,191
Less than 2MiB seek: 81.73% 96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would unlikely to see much performance difference on a modern and
reasonably beefy machine with enough memory and local disks. Benchmarking
with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.
But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.
In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.
Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().
There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack-objects command should take notice of the object file and
refrain from attempting to delta large ones, to be consistent with
the fast-import command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove two globals, pack_idx_default version and pack_idx_off32_limit,
and place them in a pack_idx_option structure. Allow callers to pass
it to write_idx_file() as a parameter.
Adjust all callers to the API change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All files that include this header file use the same four line
incantation:
#ifndef NO_PTHREADS
#include <pthread.h>
#include "thread-utils.h"
#endif
Move the responsibility for that gymnastics to the header file from the
files that include it. This approach makes it easier to later declare new
services that are related to threading in thread-utils.h and have them
available to all the threading code.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
old_try_to_free_routine is not meant for use from other files.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, packing valid objects could fail when creating a thin pack
simply because a pack edge object used as a preferred base is corrupted.
Since preferred base objects are not strictly needed to produce a valid
pack, let's not consider the inability to read them as a fatal error.
Delta compression may well be attempted against other objects in the
search window. To avoid warning storms (we are in the inner loop of
the delta search window) a warning is emitted only on the first
occurrence.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it cosistent with other places (including the
git-pack-objects(1) manpage itself) and avoids possible confusion (I,
for one, mistook `<object-list' for a `<object-list>' typo at first when
preparing this series).
Signed-off-by: Štěpán Němec <stepnem@gmail.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove some stray usage of other bracket types and asterisks for the
same purpose.
Signed-off-by: Štěpán Němec <stepnem@gmail.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed integer overflow is not defined in C, so do not depend on it.
This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would
consider "consumed_bytes > consumed_bytes + bytes" as a constant
expression, and never execute the die()-call.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following function is duplicated:
encode_header
Move this function to sha1_file.c and rename it 'encode_in_pack_object_header',
as suggested by Junio C Hamano
Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts most of commit a2430dde8c.
That commit made the situation better for repositories with relatively
small number of objects. However with many objects and a small pack size
limit, the time required to complete the repack tends towards O(n^2),
or even much worse with long delta chains.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The value passed to --max-pack-size used to count in MiB which was
inconsistent with the corresponding configuration variable as well as
other command arguments which are defined to count in bytes with an
optional unit suffix. This brings --max-pack-size in line with the
rest of Git.
Also, in order not to cause havoc with people used to the previous
megabyte scale, and because this is a sane thing to do anyway, a
minimum size of 1 MiB is enforced to avoid an explosion of pack files.
Adjust and extend test suite accordingly.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Current handling of pack_size_limit is quite suboptimal. Let's consider
a list of objects to pack which contain alternatively big and small
objects (which pretty matches reality when big blobs are interlaced
with tree objects). Currently, the code simply close the pack and opens
a new one when the next object in line breaks the size limit.
The current code may degenerate into:
- small tree object => store into pack #1
- big blob object busting the pack size limit => store into pack #2
- small blob but pack #2 is over the limit already => pack #3
- big blob busting the size limit => pack #4
- small tree but pack #4 is over the limit => pack #5
- big blob => pack #6
- small tree => pack #7
- ... and so on.
The reality is that the content of packs 1, 3, 5 and 7 could well be
stored more efficiently (and delta compressed) together in pack #1 if
the big blobs were not forcing an immediate transition to a new pack.
Incidentally this can be fixed pretty easily by simply skipping over
those objects that are too big to fit in the current pack while trying
the whole list of unwritten objects, and then that list considered from
the beginning again when a new pack is opened. This creates much fewer
smallish pack files and help making more predictable test cases for the
test suite.
This change made one of the self sanity checks useless so it is removed
as well. That check was rather redundant already anyway.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the first piece of threaded code was introduced in commit 8ecce684, it
came with its own THREADED_DELTA_SEARCH Makefile option. Since this time,
more threaded code has come into the codebase and a NO_PTHREADS option has
also been added. Get rid of the original option as the newer, more generic
option covers everything we need.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This required some fairly trivial packfile function 'const' cleanup,
since the builtin commands get a const char *argv[] array.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function took (name, namelen) as its arguments, but all the public
callers wanted to pass a full string.
Demote the counted-string interface to an internal API status, and allow
public callers to just pass the string to the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch implements native to Windows subset of pthreads API used by Git.
It allows to remove Pthreads for Win32 dependency for MSVC, msysgit and
Cygwin.
[J6t: If the MinGW build was built as part of the msysgit build
environment, then threading was already enabled because the
pthreads-win32 package is available in msysgit. With this patch, we can now
enable threaded code unconditionally.]
Signed-off-by: Andrzej K. Haczewski <ahaczewski@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently the --all-progress flag is used to use force progress display
during the writing object phase even if output goes to stdout which is
primarily the case during a push operation. This has the unfortunate
side effect of forcing progress display even if stderr is not a
terminal.
Let's introduce the --all-progress-implied argument which has the same
intent except for actually forcing the activation of any progress
display. With this, progress display will be automatically inhibited
whenever stderr is not a terminal, or full progress display will be
included otherwise. This should let people use 'git push' within a cron
job without filling their logs with useless percentage displays.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's keep thread stuff close together if possible. And in this case,
this even reduces the #ifdef noise, and allows for skipping the
autodetection altogether if delta search is not needed (like with a pure
clone).
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These spaces immediately before the end of lines are unnecessary.
While at it, instead of using a single string literal with backslashes
at end of each line, split the lines into individual string literals
and tell the compiler to concatenate them.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding objects for preferred delta base, the content from tree
objects leading to given paths is kept in a cache. This has the
potential to grow significantly, especially with large directories as
the whole tree object content is loaded in memory, even if in practice
the number of those objects is limited to the 256 cache entries plus the
$window root tree objects. Still, that can't hurt freeing that up after
object enumeration is done, and before more memory is needed for delta
search.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one of only two places that we use C99 variable length array on
the stack, which some older compilers apparently are not happy with.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The majority of code in core git appears to use a single
space after if/for/while. This is an attempt to bring more
code to this standard. These are entirely cosmetic changes.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I have 4GB of RAM on my system which should, in theory, be quite enough
to repack a 600 MB repository. However the unbounded delta cache size
always pushes it into swap, at which point everything virtually comes to
a halt. So unbounded caches are never a good idea.
A default of 256MB should be a good compromize between memory usage and
speed where medium sized repositories are still likely to fit in the
cache with a reasonable memory usage, and larger repositories are going
to take quite some time to repack already anyway.
While at it, clarify the associated config variable documentation
entries a bit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.
So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.
As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change calls to die(..., strerror(errno)) to use the new die_errno().
In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).
This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.
One example of this would be something like
unsigned long size;
unsigned char c;
size += c << 24;
where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.
Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.
I may have missed some, but this is the result of looking at
git grep '[^0-9 ][ ]*<<[ ][a-z]' -- '*.c' '*.h'
git grep '<<[ ]*24'
which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).
I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).
In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".
I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Here's a less trivial thing, and slightly more dubious one.
I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.
Now, there are possible downsides to this:
- the "buffer using object_array" _can_ in theory result in at least
better I-cache usage (two tight loops rather than one more spread out
one). I don't think this is a real issue, but in theory..
- this _does_ change the order of the objects printed. Instead of doing a
"process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
over the commits (which puts all the root trees _first_ in the object
list, this patch just adds them to the list of pending objects, and
then we'll traverse them in that order (and thus show each root tree
object together with the objects we discover under it)
I _think_ the new ordering actually makes more sense, but the object
ordering is actually a subtle thing when it comes to packing
efficiency, so any change in order is going to have implications for
packing. Good or bad, I dunno.
- There may be some reason why we did it that odd way with the object
array, that I have simply forgotten.
Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.
This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...
Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.
Or maybe there _used_ to be a reason, and no longer is.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow
- more clarity into who "owns" the name (ie now when we free the name in
the show_object callback, it's because we generated it ourselves by
calling path_name())
- not calling path_name() at all, either because we don't care about the
name in the first place, or because we are actually happy walking the
linked list of "struct name_path *" and the last component.
Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.
Why? We use that name for two things:
- add_preferred_base_object(), which actually _wants_ to traverse the
path, and now does it by looking for '/' characters!
- for 'name_hash()', which only cares about the last 16 characters of a
name, so again, generating the full name seems to be just unnecessary
work.
Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.
This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Wed, 8 Apr 2009, Björn Steinbrink wrote:
>
> The name of the processed object was duplicated for passing it to
> add_object(), but that already calls path_name, which allocates a new
> string anyway. So the memory allocated by the xstrdup calls just went
> nowhere, leaking memory.
Ack, ack.
There's another easy 5% or so for the built-in object walker: once we've
created the hash from the name, the name isn't interesting any more, and
so something trivial like this can help a bit.
Does it matter? Probably not on its own. But a few more memory saving
tricks and it might all make a difference.
Linus
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the case of a small repository, pack-objects is smart enough to not
start more threads than necessary. However, the output to the user always
reports the value of the pack.threads configuration and not the real
number of threads to be used.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The goal of this patch is to get rid of the "static struct rev_info
revs" static variable in "builtin-rev-list.c".
To do that, we need to pass the revs to the "show_commit" function
in "builtin-rev-list.c" and this in turn means that the
"traverse_commit_list" function in "list-objects.c" must be passed
functions pointers to functions with 2 parameters instead of one.
So we have to change all the callers and all the functions passed
to "traverse_commit_list".
Anyway this makes the code more clean and more generic, so it
should be a good thing in the long run.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a 32-bit system, the maximum possible size for an object is less than
4GB, while 64-bit systems may cope with larger objects. Due to this
limitation, variables holding object sizes are using an unsigned long
type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).
When large objects are encountered, and/or people play with large delta
depth values, it is possible for the maximum allowed delta size
computation to overflow, especially on a 32-bit system. When this
occurs, surviving result bits may represent a value much smaller than
what it is supposed to be, or even zero. This prevents some objects
from being deltified although they do get deltified when a smaller depth
limit is used. Fix this by always performing a 64-bit multiplication.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If pack-objects is called with the --unpack-unreachable option then it
will unpack (i.e. loosen) all unreferenced objects from local not-kept
packs, including those that also exist in packs residing in an alternate
object database or a locally kept pack. The only user of this option is
git-repack.
In this case, repack will follow the call to pack-objects with a call to
prune-packed, which will delete these newly loosened objects, making the
act of loosening a waste of time. The unnecessary loosening can be
avoided by checking whether an object exists in a non-local pack or a
locally kept pack before loosening it.
This fixes the 'local packed unreachable obs that exist in alternate ODB
are not loosened' test in t7700.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack. It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two features were invented for use by repack when repack will delete
the local packs that have been made redundant. The packs accessible
through alternates are not deleted by repack, so the objects contained in
them are still accessible after the local packs are deleted. They do not
need to be repacked into the new pack or loosened. For the case of
loosening they would immediately be deleted by the subsequent prune-packed
that is called by repack anyway.
This fixes the test
'packed unreachable obs in alternate ODB are not loosened' in t7700.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.
The new --kept-pack-only option means just that. When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a repository created with git older than f49fb35 (git-init-db: create
"pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
not created upon initialization. It was Ok because subdirectories are
created as needed inside directories init-db creates, and back then,
packfiles were recent invention.
After the said commit, new codepaths started relying on the presense of
objects/pack/ directory in the repository. This was exacerbated with
8b4eb6b (Do not perform cross-directory renames when creating packs,
2008-09-22) that moved the location temporary pack files are created from
objects/ directory to objects/pack/ directory, because moving temporary to
the final location was done carefully with lazy leading directory creation.
Many packfile related operations in such an old repository can fail
mysteriously because of this.
This commit introduces two helper functions to make things work better.
- odb_mkstemp() is a specialized version of mkstemp() to refactor the
code and teach it to create leading directories as needed;
- odb_pack_keep() refactors the code to create a ".keep" file while
create leading directories as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>