Our current strategy with prune is that an object falls into
one of three categories:
1. Reachable (from ref tips, reflogs, index, etc).
2. Not reachable, but recent (based on the --expire time).
3. Not reachable and not recent.
We keep objects from (1) and (2), but prune objects in (3).
The point of (2) is that these objects may be part of an
in-progress operation that has not yet updated any refs.
However, it is not always the case that objects for an
in-progress operation will have a recent mtime. For example,
the object database may have an old copy of a blob (from an
abandoned operation, a branch that was deleted, etc). If we
create a new tree that points to it, a simultaneous prune
will leave our tree, but delete the blob. Referencing that
tree with a commit will then work (we check that the tree is
in the object database, but not that all of its referred
objects are), as will mentioning the commit in a ref. But
the resulting repo is corrupt; we are missing the blob
reachable from a ref.
One way to solve this is to be more thorough when
referencing a sha1: make sure that not only do we have that
sha1, but that we have objects it refers to, and so forth
recursively. The problem is that this is very expensive.
Creating a parent link would require traversing the entire
object graph!
Instead, this patch pushes the extra work onto prune, which
runs less frequently (and has to look at the whole object
graph anyway). It creates a new category of objects: objects
which are not recent, but which are reachable from a recent
object. We do not prune these objects, just like the
reachable and recent ones.
This lets us avoid the recursive check above, because if we
have an object, even if it is unreachable, we should have
its referent. We can make a simple inductive argument that
with this patch, this property holds (that there are no
objects with missing referents in the repository):
0. When we have no objects, we have nothing to refer or be
referred to, so the property holds.
1. If we add objects to the repository, their direct
referents must generally exist (e.g., if you create a
tree, the blobs it references must exist; if you create
a commit to point at the tree, the tree must exist).
This is already the case before this patch. And it is
not 100% foolproof (you can make bogus objects using
`git hash-object`, for example), but it should be the
case for normal usage.
Therefore for any sequence of object additions, the
property will continue to hold.
2. If we remove objects from the repository, then we will
not remove a child object (like a blob) if an object
that refers to it is being kept. That is the part
implemented by this patch.
Note, however, that our reachability check and the
actual pruning are not atomic. So it _is_ still
possible to violate the property (e.g., an object
becomes referenced just as we are deleting it). This
patch is shooting for eliminating problems where the
mtimes of dependent objects differ by hours or days,
and one is dropped without the other. It does nothing
to help with short races.
Naively, the simplest way to implement this would be to add
all recent objects as tips to the reachability traversal.
However, this does not perform well. In a recently-packed
repository, all reachable objects will also be recent, and
therefore we have to look at each object twice. This patch
instead performs the reachability traversal, then follows up
with a second traversal for recent objects, skipping any
that have already been marked.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.
Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).
We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The semantics of this flag was changed in commit
e1111cef23 inline lookup_replace_object() calls
but wasn't renamed at the time to minimize code churn. Rename it now,
and add a comment explaining its use.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it, rename prune_tmp_object(), which used to be a helper to
remove temporary files that were created to become loose object
files, to prune_tmp_file(), as the function is also used to remove
any random cruft whose name begins with tmp_ directly in .git/object
or .git/object/pack directories these days.
Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch teaches "prune" to remove shallow roots that are no longer
reachable from any refs (e.g. when the relevant refs are removed).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b60daf0 (Make git-prune-packed a bit more chatty. - 2007-01-12)
changes the meaning of prune_packed_objects()'s argument, from "dry
run or not dry run" to a bitmap.
It however forgot to update prune_packed_objects() caller in
builtin/prune.c to use new DRY_RUN macro. It's fine (for a long time!)
but there is a risk that someday someone may change the value of
DRY_RUN to something else and builtin/prune.c suddenly breaks. Avoid
that possibility.
While at there, change "opts == VERBOSE" to "opts & VERBOSE" as there
is no obvious reason why we only be chatty when DRY_RUN is not set.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we added support for --expire=all (or --expire=now) that
considers all crufts, regardless of their age, as eligible for
garbage collection by turning command argument parsers that use
approxidate() to use parse_expiry_date(), but "git prune" used a
built-in parse-options facility OPT_DATE() and did not benefit from
the new function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some call-sites do:
o = parse_object(sha1);
if (!o)
die("bad object %s", some_name);
We can now handle that as a one-liner, and get more
consistent output.
In the third case of this patch, it looks like we are losing
information, as the existing message also outputs the sha1
hex; however, parse_object will already have written a more
specific complaint about the sha1, so there is no point in
repeating it here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to updating the xstrdup(mkpath(...)) call sites with
mkpathdup(), we also fix a memory leak (in merge_3way()) caused by
neglecting to free the memory allocated to the 'base_name' variable.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git prune" reports removal of loose object files that are no longer
necessary only under the "-v" option, but unconditionally reports
removal of temporary files that are no longer needed.
The original thinking was that the presence of a leftover temporary
file should be an unusual occurrence that may indicate an earlier
failure of some sort, and the user may want to be reminded of it.
Removing an unnecessary loose object file, on the other hand, is
just part of the normal operation. That is why the former is always
printed out and the latter only when -v is used.
But neither report is particularly useful. Hide both of these
behind the "-v" option for consistency.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both git-prune and git-repack (and thus, git-gc) try to rmdir while
holding a DIR* handle on the directory. This can leave dangling
empty directories in the .git/objects on platforms where directory
cannot be removed while they are open.
First call closedir() and then rmdir(); that is more logical ordering.
Reported-by: John Chen <john0312@gmail.com>
Reported-by: Stefan Naewe <stefan.naewe@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Improved-and-Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
prune already shows progress meter while pruning. The marking part may
take a few seconds or more, depending on repository size. Show
progress meter during this time too.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows better help text to be defined than "dry run". Also make use
of the macro in places that already had a different description. No
object code changes intended.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows better help text to be defined than "be verbose". Also make use
of the macro in places that already had a different description. No
object code changes intended.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For consistency with other git commands, let git prune accept the long
options --dry-run and --verbose for the respective short ones -n and -v.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To give OPT_FILENAME the prefix, we pass the prefix to parse_options()
which passes the prefix to parse_options_start() which sets the prefix
member of parse_opts_ctx accordingly. If there isn't a prefix in the
calling context, passing NULL will suffice.
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps to notice when something's going wrong, especially on
systems which lock open files.
I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
called from such a function
- it is in a static function, returning void and the function is only
called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new inline function is_dot_or_dotdot is used to check if the
directory name is either "." or "..". It returns a non-zero value if
the given string is "." or "..". It's applicable to a lot of Git
source code.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds an option "-v" which makes "git prune" more verbose:
It outputs all removed objects while removing them.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As Shawn pointed out, not all temporary file creation routines can
ensure that the generated temporary file is of a certain length.
e.g. Java's createTempFile(prefix, suffix). So just depend on the
prefix 'tmp_obj_' for detection.
Update prune, and fix the "fix" introduced by a08c53a1 :)
Signed-off-by: Brandon "appendixless" Casey <casey@nrlssc.navy.mil>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 5723fe7e, temporary objects are now created in their final destination
directories, rather than in .git/objects/. Teach fsck to recognize and
ignore the temporary objects it encounters, and teach prune to remove them.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form. So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.
This patch makes git commands show the dash-less version.
For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the pack-files are now always created stably on disk, there is no
need to sync() before pruning lose objects or old stale pack-files.
[jc: with Nico's clean-up]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finally, this resurrects the documented behaviour to protect other
objects listed on the command line from getting pruned.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the OPT_DATE() introduced earlier, this updates builtin-prune to
use parse_options().
Signed-off-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write errors when repacking (eg, due to out-of-space conditions)
can leave temporary packs (and possibly other files beginning
with "tmp_") lying around which no existing
codepath removes and which aren't obvious to the casual user.
These can also be multi-megabyte files wasting noticeable space.
Unfortunately there's no way to definitely tell in builtin-prune
that a tmp_ file is not being used by a concurrent process,
such as a fetch. However, it is documented that pruning should
only be done on a quiet repository and --expire is honoured
(using code from Johannes Schindelin, along with a test case
he wrote) so that its safety is the same as that of loose
object pruning.
Since they might be signs of a problem (unlike orphaned loose
objects) the names of any removed files are printed.
Signed-off-by: David Tweed (david.tweed@gmail.com)
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, 'git prune' would prune all loose unreachable objects.
This could be quite dangerous, as the objects could be used in
an ongoing operation.
This patch adds a mode to expire only loose, unreachable objects
which are older than a certain time. For example, by
git prune --expire 14.days
you can prune only those objects which are loose, unreachable
and older than 14 days (and thus probably outdated).
The implementation uses st.st_mtime rather than st.st_ctime,
because it can be tested better, using 'touch -d <time>' (and
omitting the test when the platform does not support that
command line switch).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't try to remove the containing directory for every pruned object but
try only once after the directory has been scanned instead.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This reverts commit 9b088c4e39.
Protecting 'mature' objects does not make it any safer. We should
admit that git-prune is inherently unsafe when run in parallel with
other operations without involving unwarranted locking overhead,
and with the latest git, even rebase and reset would not immediately
create crufts anyway.
This option gives grace period to objects that are unreachable
from the refs from getting pruned.
The default value is 24 hours and may be changed using
gc.prunegrace.
Signed-off-by: Matthias Lederhofer <matled@gmx.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This moves major part of builtin-prune into a separate file,
reachable.c. It is used to mark the objects that are reachable
from refs, and optionally from reflogs.
The patch looks very large, but if you look at it with diff -C,
which this message is formatted in, most of them are copied
lines and there are very little additions.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is necessary for the next step, because the reason I am
making the connectivity walker into a library is because I want
to use it for cleaning up stale reflog entries.
Signed-off-by: Junio C Hamano <junkio@cox.net>
I want to make the first part of 'git prune' that marks the
reachable objects callable as a library, so this starts the
first step toward the goal by making the callchain to pass
rev_info structure as an argument.
No functionality change should be in this step.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Somehow we forgot to turn save_commit_buffer off while walking
the reachable objects. Releasing the memory for commit object
data that we do not use matters for large projects (for example,
about 90MB is saved while traversing linux-2.6 history).
Signed-off-by: Junio C Hamano <junkio@cox.net>
It passed (const char*) to a function that took a (char *); the
buffer itself was of course writable, so pass the buffer itself.
Signed-off-by: Junio C Hamano <junkio@cox.net>
prune_object() in show_only mode would previously just show the path to the
object that would be deleted. The path the object is stored in shouldn't be
shown to users, they only know about sha1 identifiers so show that instead.
Further, the sha1 alone isn't that useful for examining what is going to be
deleted. This patch also adds the object type to the output, which makes it
easy to pick out, say, the commits and use git-show to display them.
Signed-off-by: Andy Parkins <andyparkins@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Both the git-prune manpage and everday.txt say that git-prune should also prune
unpacked objects that are also found in packs, by running git prune-packed.
Junio thought this was "a regression when prune was rewritten as a built-in."
So modify prune to call prune-packed again.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This adds a "int *flag" parameter to resolve_ref() and makes
for_each_ref() family to call callback function with an extra
"int flag" parameter. They are used to give two bits of
information (REF_ISSYMREF and REF_ISPACKED) about the ref.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a long overdue fix to the API for for_each_ref() family
of functions. It allows the callers to specify a callback data
pointer, so that the caller does not have to use static
variables to communicate with the callback funciton.
The updated for_each_ref() family takes a function of type
int (*fn)(const char *, const unsigned char *, void *)
and a void pointer as parameters, and calls the function with
the name of the ref and its SHA-1 with the caller-supplied void
pointer as parameters.
The commit updates two callers, builtin-name-rev.c and
builtin-pack-refs.c as an example.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like xmalloc and xrealloc xstrdup dies with a useful message if
the native strdup() implementation returns NULL rather than a
valid pointer.
I just tried to use xstrdup in new code and found it to be missing.
However I expected it to be present as xmalloc and xrealloc are
already commonly used throughout the code.
[jc: removed the part that deals with last_XXX, which I am
finding more and more dubious these days.]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>