9f673f9 (gc: config option for running --auto in background -
2014-02-08) puts "gc --auto" in background to reduce user's wait
time. Part of the garbage collecting is pack-refs and pruning
reflogs. These require locking some refs and may abort other processes
trying to lock the same ref. If gc --auto is fired in the middle of a
script, gc's holding locks in the background could fail the script,
which could never happen before 9f673f9.
Keep running pack-refs and "reflog --prune" in foreground to stop
parallel ref updates. The remaining background operations (repack,
prune and rerere) should not impact running git processes.
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 1c192f3 (gc --aggressive: make it really aggressive - 2007-12-06)
made --depth=250 the default value, it didn't really explain the
reason behind, especially the pros and cons of --depth=250.
An old mail from Linus below explains it at length. Long story short,
--depth=250 is a disk saver and a performance killer. Not everybody
agrees on that aggressiveness. Let the user configure it.
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] gc --aggressive: make it really aggressive
Date: Thu, 6 Dec 2007 08:19:24 -0800 (PST)
Message-ID: <alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org>
Gmane-URL: http://article.gmane.org/gmane.comp.gcc.devel/94637
On Thu, 6 Dec 2007, Harvey Harrison wrote:
>
> 7:41:25elapsed 86%CPU
Heh. And this is why you want to do it exactly *once*, and then just
export the end result for others ;)
> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46...pack
But yeah, especially if you allow longer delta chains, the end result can
be much smaller (and what makes the one-time repack more expensive is the
window size, not the delta chain - you could make the delta chains longer
with no cost overhead at packing time)
HOWEVER.
The longer delta chains do make it potentially much more expensive to then
use old history. So there's a trade-off. And quite frankly, a delta depth
of 250 is likely going to cause overflows in the delta cache (which is
only 256 entries in size *and* it's a hash, so it's going to start having
hash conflicts long before hitting the 250 depth limit).
So when I said "--depth=250 --window=250", I chose those numbers more as
an example of extremely aggressive packing, and I'm not at all sure that
the end result is necessarily wonderfully usable. It's going to save disk
space (and network bandwidth - the delta's will be re-used for the network
protocol too!), but there are definitely downsides too, and using long
delta chains may simply not be worth it in practice.
(And some of it might just want to have git tuning, ie if people think
that long deltas are worth it, we could easily just expand on the delta
hash, at the cost of some more memory used!)
That said, the good news is that working with *new* history will not be
affected negatively, and if you want to be _really_ sneaky, there are ways
to say "create a pack that contains the history up to a version one year
ago, and be very aggressive about those old versions that we still want to
have around, but do a separate pack for newer stuff using less aggressive
parameters"
So this is something that can be tweaked, although we don't really have
any really nice interfaces for stuff like that (ie the git delta cache
size is hardcoded in the sources and cannot be set in the config file, and
the "pack old history more aggressively" involves some manual scripting
and knowing how "git pack-objects" works rather than any nice simple
command line switch).
So the thing to take away from this is:
- git is certainly flexible as hell
- .. but to get the full power you may need to tweak things
- .. happily you really only need to have one person to do the tweaking,
and the tweaked end results will be available to others that do not
need to know/care.
And whether the difference between 320MB and 500MB is worth any really
involved tweaking (considering the potential downsides), I really don't
know. Only testing will tell.
Linus
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't change git environment: move the GIT_EDITOR=":" override to the
hook command subprocess, like it's already done for GIT_INDEX_FILE.
Signed-off-by: Benoit Pierre <benoit.pierre@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`gc --auto` takes time and can block the user temporarily (but not any
less annoyingly). Make it run in background on systems that support
it. The only thing lost with running in background is printouts. But
gc output is not really interesting. You can keep it in foreground by
changing gc.autodetach.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 64a99eb4 git gc refuses to run without the --force option if
another gc process on the same repository is already running.
However, if the repository is shared and user A runs git gc on the
repository and while that gc is still running user B runs git gc on
the same repository the gc process run by user A will not be noticed
and the gc run by user B will go ahead and run.
The problem is that the kill(pid, 0) test fails with an EPERM error
since user B is not allowed to signal processes owned by user A
(unless user B is root).
Update the test to recognize an EPERM error as meaning the process
exists and another gc should not be run (unless --force is given).
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch teaches "prune" to remove shallow roots that are no longer
reachable from any refs (e.g. when the relevant refs are removed).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This file isn't really harmful, but isn't useful either, and can create
minor annoyance for the user:
* It's confusing, as the presence of a *.pid file often implies that a
process is currently running. A user running "ls .git/" and finding
this file may incorrectly guess that a "git gc" is currently running.
* Leaving this file means that a "git gc" in an already gc-ed repo is
no-longer a no-op. A user running "git gc" in a set of repositories,
and then synchronizing this set (e.g. rsync -av, unison, ...) will see
all the gc.pid files as changed, which creates useless noise.
This patch unlinks the file after the garbage collection is done, so that
gc.pid is actually present only during execution.
Future versions of Git may want to use the information left in the gc.pid
file (e.g. for policies like "don't attempt to run a gc if one has
already been ran less than X hours ago"). If so, this patch can safely be
reverted. For now, let's not bother the users.
Explained-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This may happen when `git gc --auto` is run automatically, then the
user, to avoid wait time, switches to a new terminal, keeps working
and `git gc --auto` is started again because the first gc instance has
not clean up the repository.
This patch tries to avoid multiple gc running, especially in --auto
mode. In the worst case, gc may be delayed 12 hours if a daemon reuses
the pid stored in gc.pid.
kill(pid, 0) support is added to MinGW port so it should work on
Windows too.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.
This patch does not change semantics of any command intentionally.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --quiet is requested, gc --auto should not display messages unless
there is an error.
Signed-off-by: Tobias Ulmer <tobiasu@tmux.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-gc executes many sub-commands. The argument list for
some of these is constant, but for others we add more
arguments at runtime. The latter is implemented by allocating
a constant extra number of NULLs, and either using a custom
append function, or just referencing unused slots by number.
As of commit 7e52f56, which added two new arguments, it is
possible to exceed the constant number of slots for "repack"
by running "git gc --aggressive", causing "git gc" to die.
This patch converts all of the static argv lists to use
argv-array. In addition to fixing the overflow caused by
7e52f56, it has a few advantages:
1. We can drop the custom append function (which,
incidentally, had an off-by-one error exacerbating the
static limit).
2. We can drop the ugly magic numbers used when adding
arguments to "prune".
3. Adding further arguments will be easier; you can just
add new "push" calls without worrying about increasing
any static limits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we pack everything into one big pack with "git repack
-Ad", any unreferenced objects in to-be-deleted packs are
exploded into loose objects, with the intent that they will
be examined and possibly cleaned up by the next run of "git
prune".
Since the exploded objects will receive the mtime of the
pack from which they come, if the source pack is old, those
loose objects will end up pruned immediately. In that case,
it is much more efficient to skip the exploding step
entirely for these objects.
This patch teaches pack-objects to receive the expiration
information and avoid writing these objects out. It also
teaches "git gc" to pass the value of gc.pruneexpire to
repack (which in turn learns to pass it along to
pack-objects) so that this optimization happens
automatically during "git gc" and "git gc --auto".
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split up the "Auto packing the repository" message into quiet and
verbose variants to make translation easier.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows better help text to be defined than "be quiet". Also make use
of the macro in a place that already had a different description. No
object code changes intended.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given a request for command-line usage information rather than some
more substantial action, the only friendly thing to do is to report
the usage information as soon as possible and exit.
Without this change, as "git gc" glances over the repository, it can
be distracted by the desire to report a malformed configuration file.
Noticed while working through reports from Duy's repository access
checker.
[jn: with rewritten log message and tests]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git reset' is missing --quiet, and 'git gc' is not using OPT__QUIET.
Let's fix that.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "gc --auto --quiet" decides there is something to do, it tells the
user what it is doing, as it is going to make the user wait for a bit.
But the message was a bit too long.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To give OPT_FILENAME the prefix, we pass the prefix to parse_options()
which passes the prefix to parse_options_start() which sets the prefix
member of parse_opts_ctx accordingly. If there isn't a prefix in the
calling context, passing NULL will suffice.
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default was not to change the window or depth at all. As suggested
by Jon Smirl, Linus Torvalds and others, default to
--window=250 --depth=250
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this patch, "git gc --no-prune" will not prune any loose (and
dangling) object, and "git gc --prune=5.minutes.ago" will prune
all loose objects older than 5 minutes.
This patch benefitted from suggestions by Thomas Rast and Jan Krï¿œger.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A function that runs a hook is used in several Git commands.
builtin-commit.c has the one that is most general for cases without
piping. The one in builtin-gc.c prints some useful warnings.
This patch moves a merged version of these variants into libgit and
lets the other builtins use this libified run_hook().
The run_hook() function used in receive-pack.c feeds the standard
input of the pre-receive or post-receive hooks. This function is
renamed to run_receive_hook() because the libified run_hook() cannot
handle this.
Mentored-by: Daniel Barkalow <barkalow@iabervon.org>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is no grace period before pruning unreferenced objects, it is
pointless to push those objects in their loose form just to delete them
right away.
Also be more explicit about the possibility of using "now" in the
gc.pruneexpire config variable (needed for the above behavior to
happen).
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form. So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.
This patch makes git commands show the dash-less version.
For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that repack -A will leave unreferenced objects unpacked, there is
no reason to use the -a option to repack (which will discard unreferenced
objects). The unpacked unreferenced objects will not be repacked by a
subsequent repack, and will eventually be pruned by git-gc based on the
gc.pruneExpire config option.
If such a hook is available and exits with a non-zero status, then
git-gc --auto won't run.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent discussion on the list, with the improvement f7c22cc (always start
looking up objects in the last used pack first, 2007-05-30) brought in,
reached the concensus that the current default 20 is too low.
Reference: http://thread.gmane.org/gmane.comp.version-control.git/77586
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gc.auto configuration variable is somewhat ambiguous now that there
is also a gc.autopacklimit setting. Some users may assume that it controls
all auto-gc'ing. Also, now users must set two configuration variables to
zero when they want to disable autopacking. Since it is unlikely that users
will want to autopack based on some threshold of pack files when they have
disabled autopacking based on the number of loose objects, be nice and allow
a setting of zero for gc.auto to disable all autopacking.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only reason we did not call "prune" in git-gc was that it is an
inherently dangerous operation: if there is a commit going on, you will
prune loose objects that were just created, and are, in fact, needed by the
commit object just about to be created.
Since it is dangerous, we told users so. That led to many users not even
daring to run it when it was actually safe. Besides, they are users, and
should not have to remember such details as when to call git-gc with
--prune, or to call git-prune directly.
Of course, the consequence was that "git gc --auto" gets triggered much
more often than we would like, since unreferenced loose objects (such as
left-overs from a rebase or a reset --hard) were never pruned.
Alas, git-prune recently learnt the option --expire <minimum-age>, which
makes it a much safer operation. This allows us to call prune from git-gc,
with a grace period of 2 weeks for the unreferenced loose objects (this
value was determined in a discussion on the git list as a safe one).
If you want to override this grace period, just set the config variable
gc.pruneExpire to a different value; an example would be
[gc]
pruneExpire = 6.months.ago
or even "never", if you feel really paranoid.
Note that this new behaviour makes "--prune" be a no-op.
While adding a test to t5304-prune.sh (since it really tests the implicit
call to "prune"), also the original test for "prune --expire" was moved
there from t1410-reflog.sh, where it did not belong.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Brandon Casey correctly points out that we repack with -A without --prune
and with -a with --prune, so it is not just unreferenced loose objects
that are pruned away when the option is given.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous message had too much of a "boy, you should
really turn off this annoying gc" flair to it. Instead,
let's make sure the user understands what is happening, that
they can run it themselves, and where to find more info.
Suggested by Brian Gernhardt.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since "-a" is removed from the base repack command line,
we can simplify how we add additional options to this
command line when using --auto.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This makes use of repack's new "-A" option which does not drop packed
unreachable objects. This makes git-gc safe to call at any time,
particularly when a repository is referenced as an alternate by
another repository.
git-gc --prune will use the "-a" option to repack instead of "-A", so
that packed unreachable objects will be removed.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This teaches "git-gc --auto" to consolidate many packs into one
without losing unreachable objects in them by using "repack -A"
when there are too many packfiles that are not marked with *.keep
in the repository. gc.autopacklimit configuration can be used
to set the maximum number of packs a repository is allowed to
have before this mechanism kicks in.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to build the command line to run repack outside of
need_to_gc() but with the next patch we would want to tweak the
command line depending on the nature of need.
Signed-off-by: Junio C Hamano <gitster@pobox.com>