git-repack(1)
=============

NAME
----
git-repack - Pack unpacked objects in a repository

SYNOPSIS
--------
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]

DESCRIPTION
-----------

This command is used to combine all objects that do not currently
reside in a "pack" into a pack. It can also be used to re-organize
existing packs into a single, more efficient pack.

A pack is a collection of objects, individually compressed, with
delta compression applied, stored in a single file, with an
associated index file.

Packs are used to reduce the load on mirror systems, backup
engines, disk storage, etc.

OPTIONS
-------

-a::
Instead of incrementally packing the unpacked objects,
pack everything referenced into a single pack.
Especially useful when packing a repository that is used
for private development. Use with `-d`. This will clean up
the objects that `git prune` leaves behind, but
`git fsck --full --dangling` shows as dangling.
+
Note that users fetching over dumb protocols will have to fetch the
whole new pack in order to get any contained object, no matter how many
other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
into another separate pack, and an empty ".promisor" file corresponding
to the new separate pack will be written.

-A::
Same as `-a`, unless `-d` is used. Then any unreachable
objects in a previous pack become loose, unpacked objects,
instead of being left in the old pack. Unreachable objects
are never intentionally added to a pack, even when repacking.
This option prevents unreachable objects from being immediately
deleted by way of being left in the old pack and then
removed. Instead, the loose unreachable objects
will be pruned according to normal expiry rules
with the next 'git gc' invocation. See linkgit:git-gc[1].
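+
How `-a`, `-A` and `-d` treat an unreachable object that already lives in a pack can be summarized as a small decision function. This is a minimal Python sketch that models only the behavior described above (it ignores `-k` and `--unpack-unreachable`); `Fate` and `unreachable_packed_fate` are hypothetical names, not part of Git.
+
```python
from enum import Enum


class Fate(Enum):
    LEFT_IN_OLD_PACK = "left in the old pack"
    DELETED = "deleted along with the old pack"
    LOOSENED = "written out as a loose object"


def unreachable_packed_fate(loosen: bool, delete_redundant: bool) -> Fate:
    """Hypothetical model of an unreachable, already-packed object's fate.

    loosen: True for `-A`, False for `-a`.
    delete_redundant: True when `-d` is given.
    """
    if not delete_redundant:
        # Without -d the old packs stay, so -a and -A behave the same.
        return Fate.LEFT_IN_OLD_PACK
    if loosen:
        # -A -d: the object becomes loose and is pruned later by `git gc`.
        return Fate.LOOSENED
    # -a -d: the old pack is removed and the object goes with it.
    return Fate.DELETED
```
+
For example, `unreachable_packed_fate(loosen=True, delete_redundant=True)` models `git repack -Ad` and returns `Fate.LOOSENED`.
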

-d::
After packing, if the newly created packs make some
existing packs redundant, remove the redundant packs.
Also run 'git prune-packed' to remove redundant
loose object files.

-l::
Pass the `--local` option to 'git pack-objects'. See
linkgit:git-pack-objects[1].

-f::
Pass the `--no-reuse-delta` option to `git-pack-objects`, see
linkgit:git-pack-objects[1].

-F::
Pass the `--no-reuse-object` option to `git-pack-objects`, see
linkgit:git-pack-objects[1].

-q::
Pass the `-q` option to 'git pack-objects'. See
linkgit:git-pack-objects[1].

-n::
Do not update the server information with
'git update-server-info'. This option skips
updating local catalog files needed to publish
this repository (or a direct copy of it)
over HTTP or FTP. See linkgit:git-update-server-info[1].

--window=<n>::
--depth=<n>::
These two options affect how the objects contained in the pack are
stored using delta compression. The objects are first internally
sorted by type, size, and optionally name, and compared against the
other objects within `--window` to see if using delta compression saves
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
+
The default value for `--window` is 10 and for `--depth` is 50. The
maximum depth is 4095.
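+
The search that `--window` and `--depth` control can be pictured with a toy model: objects are sorted, each one is compared only against the previous `window` objects of the same type, and a candidate base is skipped if using it would make the delta chain longer than `depth`. This Python sketch is illustrative, not Git's implementation: the sort key is simplified and `toy_delta_cost` is a hypothetical stand-in for real delta encoding.
+
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Obj:
    name: str
    type: str
    data: bytes
    base: Optional["Obj"] = None  # chosen delta base, if any
    depth: int = 0                # length of the delta chain ending here


def toy_delta_cost(base: bytes, target: bytes) -> int:
    """Hypothetical cost: bytes of `target` not covered by a common
    prefix with `base`, plus one byte of overhead."""
    common = 0
    for a, b in zip(base, target):
        if a != b:
            break
        common += 1
    return len(target) - common + 1


def choose_bases(objs: List[Obj], window: int, depth: int) -> List[Obj]:
    # Simplified sort: by type, then larger objects first, then name.
    ordered = sorted(objs, key=lambda o: (o.type, -len(o.data), o.name))
    for i, target in enumerate(ordered):
        best_cost = len(target.data)  # cost of storing the object whole
        # Only the previous `window` objects are candidate bases.
        for cand in ordered[max(0, i - window):i]:
            if cand.type != target.type or cand.depth >= depth:
                continue  # different type, or chain would exceed --depth
            cost = toy_delta_cost(cand.data, target.data)
            if cost < best_cost:
                best_cost, target.base = cost, cand
        target.depth = target.base.depth + 1 if target.base else 0
    return ordered
```
+
With `window=1` and `depth=2`, a run of similar objects gets delta depths 0, 1, 2, 0, 1, 2, ...: whenever the only candidate base already sits at the maximum depth, the object is stored whole, so a reader never applies more than two deltas to rebuild any of them.
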

--threads=<n>::
This option is passed through to `git pack-objects`.

--window-memory=<n>::
This option provides an additional limit on top of `--window`;
the window size will dynamically scale down so as to not take
up more than '<n>' bytes in memory. This is useful in
repositories with a mix of large and small objects: it avoids
running out of memory with a large window while still taking
advantage of the large window for the smaller objects. The
size can be suffixed with "k", "m", or "g".
`--window-memory=0` makes memory usage unlimited. The default
is taken from the `pack.windowMemory` configuration variable.
Note that the actual memory usage will be the limit multiplied
by the number of threads used by linkgit:git-pack-objects[1].
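+
The dynamic scaling can be pictured as a window that evicts its oldest candidates whenever their combined size exceeds the limit. This Python sketch models that idea; `MemoryBoundedWindow` and `worst_case_bytes` are hypothetical names, and always keeping the newest entry is an assumption of the sketch, not documented Git behavior. `worst_case_bytes` restates the note above: the limit applies per thread.
+
```python
from collections import deque
from typing import Deque, List, Tuple


class MemoryBoundedWindow:
    """Hypothetical delta-search window limited by entry count (--window)
    and by total bytes (--window-memory; 0 means unlimited)."""

    def __init__(self, window: int, memory_limit: int) -> None:
        self.window = window
        self.memory_limit = memory_limit
        self.entries: Deque[Tuple[str, int]] = deque()
        self.used = 0

    def push(self, name: str, size: int) -> List[str]:
        """Add an object and return the names of any evicted entries."""
        evicted = []
        self.entries.append((name, size))
        self.used += size
        # Count limit from --window.
        while len(self.entries) > self.window:
            evicted.append(self._pop_oldest())
        # Byte limit from --window-memory; the newest entry always stays.
        while (self.memory_limit and self.used > self.memory_limit
               and len(self.entries) > 1):
            evicted.append(self._pop_oldest())
        return evicted

    def _pop_oldest(self) -> str:
        name, size = self.entries.popleft()
        self.used -= size
        return name


def worst_case_bytes(memory_limit: int, threads: int) -> int:
    # Each pack-objects thread has its own window, so the limit is per thread.
    return memory_limit * threads
```
+
For instance, a 256 MiB limit with 4 threads allows up to `worst_case_bytes(256 * 1024 * 1024, 4)`, i.e. 1 GiB, of window memory in total.
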

--max-pack-size=<n>::
Maximum size of each output pack file. The size can be suffixed with
"k", "m", or "g". The minimum size allowed is 1 MiB.
If specified, multiple packfiles may be created, which also
prevents the creation of a bitmap index.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set. Note that this option may result in
a larger and slower repository; see the discussion in
`pack.packSizeLimit`.
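+
The size syntax and the 1 MiB minimum can be illustrated with a short Python sketch. It assumes the suffixes are binary multiples (`k` = 1024, `m` = 1024^2, `g` = 1024^3, case-insensitive) and that a nonzero value below the minimum is raised to 1 MiB rather than rejected; `parse_size` and `effective_max_pack_size` are hypothetical helpers, not Git functions.
+
```python
UNIT_FACTORS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
MIN_PACK_SIZE = 1024 * 1024  # 1 MiB


def parse_size(value: str) -> int:
    """Parse sizes such as '100', '512k', '2m' or '1g'.

    Assumption: suffixes are binary multiples and case-insensitive.
    """
    value = value.strip()
    digits = value.rstrip("kKmMgG")
    suffix = value[len(digits):].lower()
    if not digits.isdigit() or suffix not in UNIT_FACTORS:
        raise ValueError(f"invalid size: {value!r}")
    return int(digits) * UNIT_FACTORS[suffix]


def effective_max_pack_size(value: str) -> int:
    """Return the per-pack limit in bytes; 0 means unlimited."""
    limit = parse_size(value)
    if 0 < limit < MIN_PACK_SIZE:
        # Sketch assumption: values below the minimum are raised to 1 MiB.
        return MIN_PACK_SIZE
    return limit
```
+
Under these assumptions, `effective_max_pack_size("100k")` yields 1048576 and `effective_max_pack_size("2m")` yields 2097152.
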

-b::
--write-bitmap-index::
Write a reachability bitmap index as part of the repack. This
only makes sense when used with `-a`, `-A` or `-m`, as the bitmaps
must be able to refer to all reachable objects. This option
overrides the setting of `repack.writeBitmaps`. This option
has no effect if multiple packfiles are created, unless writing a
MIDX (in which case a multi-pack bitmap is created).
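+
The requirement that the bitmaps "must be able to refer to all reachable objects" is a closure property: every object reachable from the bitmapped objects has to be present in the packed set. This Python sketch checks that property on a toy object graph. The dictionary representation and the `reachable` and `is_closed` helpers are illustrative assumptions, not Git's data structures.
+
```python
from typing import Dict, Iterable, List, Set


def reachable(graph: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Return every object reachable from `tips` (commits -> trees -> blobs)."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, []))
    return seen


def is_closed(graph: Dict[str, List[str]], packed: Set[str],
              tips: Iterable[str]) -> bool:
    """True if every object reachable from `tips` is in `packed`."""
    return reachable(graph, tips) <= packed
```
+
If some reachable objects were left behind in a `.keep` pack, a check like this would fail; that is the situation `--pack-kept-objects` is meant to avoid.
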

--pack-kept-objects::
Include objects from packs marked with a `.keep` file when repacking.
Note that we still do not delete `.keep` packs after `pack-objects`
finishes. This means that we may duplicate objects, but this makes the
option safe to use when there are concurrent pushes or fetches.
This option is generally only useful if you are writing bitmaps
with `-b` or `repack.writeBitmaps`, as it ensures that the
bitmapped packfile has the necessary objects.

--keep-pack=<pack-name>::
Exclude the given pack from repacking. This is the equivalent
of having a `.keep` file on the pack. `<pack-name>` is the
pack file name without leading directory (e.g. `pack-123.pack`).
The option can be specified multiple times to keep multiple
packs.

--unpack-unreachable=<when>::
When loosening unreachable objects, do not bother loosening any
objects older than `<when>`. This can be used to optimize out
the write of any objects that would be immediately pruned by
a follow-up `git prune`.
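+
In effect, `<when>` acts as a cutoff on the age of each unreachable object: newer objects are loosened, and the rest are not written out, since a follow-up `git prune` would remove them anyway. This is a minimal Python sketch, assuming each unreachable object carries a Unix timestamp and `<when>` has already been converted to one; how Git determines an object's age and parses `<when>` is not modeled, and `split_unreachable` is a hypothetical helper.
+
```python
from typing import Dict, List, Tuple


def split_unreachable(mtimes: Dict[str, int],
                      cutoff: int) -> Tuple[List[str], List[str]]:
    """Partition unreachable objects into (loosen, skip).

    mtimes: hypothetical map of object name to Unix timestamp.
    cutoff: the <when> value as a Unix timestamp.
    """
    loosen, skip = [], []
    for name, mtime in sorted(mtimes.items()):
        # Objects not newer than the cutoff would be pruned right away,
        # so writing them out as loose objects is wasted work.
        (loosen if mtime > cutoff else skip).append(name)
    return loosen, skip
```
+
With a cutoff of 200, objects stamped 100 and 200 are skipped and only the one stamped 300 is loosened.
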

-k::
--keep-unreachable::
When used with `-ad`, any unreachable objects from existing
packs will be appended to the end of the packfile instead of
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).

-i::
--delta-islands::
Pass the `--delta-islands` option to `git-pack-objects`, see
linkgit:git-pack-objects[1].

-g=<factor>::
--geometric=<factor>::
Arrange resulting pack structure so that each pack contains at
least `<factor>` times as many objects as the next-smaller pack.
+
`git repack` ensures this by determining a "cut" of packfiles that need
to be repacked into one in order to ensure a geometric progression. It
picks the smallest set of packfiles such that as many of the larger
packfiles (by count of objects contained in that pack) may be left
intact.
+
Unlike other repack modes, the set of objects to pack is determined
uniquely by the set of packs being "rolled-up"; in other words, the
packs that must be combined in order to restore a geometric
progression.
+
When `--unpacked` is specified, loose objects are implicitly included in
this "roll-up", without respect to their reachability. This is subject
to change in the future. This option (implying a drastically different
repack mode) is not guaranteed to work with all other combinations of
options to `git repack`.
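+
The "cut" can be found in two passes over the packs sorted by object count. The first pass walks down from the largest pack while neighbours already differ by at least `<factor>`. The second pass grows the rolled-up pack with any larger pack it would otherwise violate the progression against. This is a minimal Python sketch of that idea using object counts only, as the description above does; `geometric_split` is a hypothetical name, and overflow checks and other details of Git's implementation are omitted.
+
```python
from typing import List, Tuple


def geometric_split(counts: List[int], factor: int) -> Tuple[int, int]:
    """Return (split, rolled_up_size) for packs with the given object counts.

    After sorting ascending, the first `split` packs would be combined into
    one new pack of `rolled_up_size` objects; the rest are left intact.
    """
    packs = sorted(counts)
    n = len(packs)
    if n == 0:
        return 0, 0

    # Pass 1: from the largest pack down, find where the progression breaks.
    i = n - 1
    while i > 0:
        if packs[i] < factor * packs[i - 1]:
            break
        i -= 1
    # If it broke in the middle, the larger pack of that pair must go too.
    split = i + 1 if i > 0 else 0

    # Pass 2: the new pack may now be too large for the packs above it,
    # so absorb them until the progression holds again.
    total = sum(packs[:split])
    while split < n and packs[split] < factor * total:
        total += packs[split]
        split += 1
    return split, total
```
+
For packs with `1, 1, 1, 2, 4, 32` objects and a factor of 2, the first pass marks the three 1-object packs for roll-up, and the second pass absorbs the packs with 2 and 4 objects, leaving a new 9-object pack next to the untouched 32-object pack.
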

-m::
--write-midx::
Write a multi-pack index (see linkgit:git-multi-pack-index[1])
containing the non-redundant packs.

CONFIGURATION
-------------

Various configuration variables affect packing, see
linkgit:git-config[1] (search for "pack" and "delta").

By default, the command passes the `--delta-base-offset` option to
'git pack-objects'; this typically results in slightly smaller packs,
but the generated packs are incompatible with versions of Git older than
version 1.4.4. If you need to share your repository with such ancient Git
versions, either directly or via the dumb http protocol, then you
need to set the configuration variable `repack.UseDeltaBaseOffset` to
"false" and repack. Access from old Git versions over the native protocol
is unaffected by this option as the conversion is performed on the fly
as needed in that case.

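The size saving comes from how a delta names its base. With `--delta-base-offset`, the base is given as a backwards byte offset within the same pack, written as a variable-length number, instead of as the base object's full name (20 bytes with SHA-1). This Python sketch shows that offset encoding as the pack format describes it: each byte carries 7 bits, the high bit marks continuation, and one is subtracted at each continuation step so no value has two encodings. It is illustrative, not taken from Git's sources, and `encode_ofs`/`decode_ofs` are hypothetical names.

```python
def encode_ofs(offset: int) -> bytes:
    """Encode a delta-base offset in the variable-length OFS_DELTA form."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    out = [offset & 0x7F]          # last byte: high bit clear
    offset >>= 7
    while offset:
        offset -= 1                # avoid redundant encodings
        out.append(0x80 | (offset & 0x7F))
        offset >>= 7
    return bytes(reversed(out))    # most significant group first


def decode_ofs(data: bytes) -> int:
    """Inverse of encode_ofs."""
    c = data[0]
    value = c & 0x7F
    i = 1
    while c & 0x80:
        value += 1
        c = data[i]
        i += 1
        value = (value << 7) + (c & 0x7F)
    return value
```

Offsets up to 127 fit in one byte and offsets up to 16511 in two, far less than a 20-byte object name.
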
Delta compression is not used on objects larger than the size set by
the `core.bigFileThreshold` configuration variable, nor on files with
the attribute `delta` set to false.

SEE ALSO
--------
linkgit:git-pack-objects[1]
linkgit:git-prune-packed[1]

GIT
---
Part of the linkgit:git[1] suite