= Cruft packs

The cruft packs feature offers an alternative to Git's traditional mechanism of
removing unreachable objects. This document provides an overview of Git's
pruning mechanism, and how a cruft pack can be used instead to accomplish the
same.

== Background

To remove unreachable objects from your repository, Git offers `git repack -Ad`
(see linkgit:git-repack[1]). Quoting from the documentation:

[quote]
[...] unreachable objects in a previous pack become loose, unpacked objects,
instead of being left in the old pack. [...] loose unreachable objects will be
pruned according to normal expiry rules with the next 'git gc' invocation.

Unreachable objects aren't removed immediately, since doing so could race with
an incoming push which may reference an object which is about to be deleted.
Instead, those unreachable objects are stored as loose objects and stay that way
until they are older than the expiration window, at which point they are removed
by linkgit:git-prune[1].

Git must store these unreachable objects loose in order to keep track of their
per-object mtimes. If these unreachable objects were written into one big pack,
then either freshening that pack (because an object contained within it was
re-written) or creating a new pack of unreachable objects would cause the pack's
mtime to get updated, and the objects within it would never leave the expiration
window. Instead, objects are stored loose in order to keep track of the
individual object mtimes and avoid a situation where all cruft objects are
freshened at once.

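The difference between per-object mtimes and a single pack mtime can be
illustrated with a small simulation. This is only a sketch: the function names,
the day-based clock, and the 14-day grace period are assumptions made for the
example and do not correspond to Git internals.

```python
GRACE_PERIOD = 14  # days; an illustrative value chosen for this sketch


def prunable_loose(object_mtimes, now, grace=GRACE_PERIOD):
    """Loose storage: each object is judged by its own mtime."""
    return sorted(oid for oid, mtime in object_mtimes.items()
                  if now - mtime > grace)


def prunable_single_pack(object_ids, pack_mtime, now, grace=GRACE_PERIOD):
    """One pack with one mtime: every object inherits the pack's mtime."""
    if now - pack_mtime > grace:
        return sorted(object_ids)
    return []


# Three unreachable objects, last touched on days 0, 1 and 20.
mtimes = {"a": 0, "b": 1, "c": 20}
now = 21

# Judged individually, "a" and "b" have left the grace period.
loose_result = prunable_loose(mtimes, now)

# Freshening or rewriting the pack on day 20 resets the shared mtime,
# so none of the objects appear old enough to prune.
pack_result = prunable_single_pack(mtimes.keys(), pack_mtime=20, now=now)

print(loose_result)
print(pack_result)
```

In this toy model the loose layout lets the two old objects expire, while the
single-pack layout keeps all three alive for as long as the pack keeps being
touched.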
This can lead to undesirable situations when a repository contains many
unreachable objects which have not yet left the grace period. Having large
directories in the shards of `.git/objects` can lead to decreased performance in
the repository. Given enough unreachable objects, it can also lead to inode
starvation and degrade the performance of the whole system. Because those
objects can never be packed, these repositories often take up a large amount of
disk space: the objects can only be zlib-compressed individually, not stored in
delta chains.

== Cruft packs

A cruft pack eliminates the need for storing unreachable objects in a loose
state by including the per-object mtimes in a separate file alongside a single
pack containing all of the unreachable objects that would otherwise be stored
loose.

A cruft pack is written by `git repack --cruft` when generating a new pack,
using linkgit:git-pack-objects[1]'s `--cruft` option. Note that
`git repack --cruft` is a classic all-into-one repack, meaning that everything
in the resulting pack is reachable, and everything else is unreachable. Once
that pack is written, the `--cruft` option instructs `git repack` to generate
another pack containing only objects not packed in the previous step (which
equates to packing all unreachable objects together). This progresses as
follows:

1. Enumerate every object, marking as a traversal tip any object which (a) is
   not contained in a kept pack, and (b) has an mtime within the grace period.

2. Perform a reachability traversal based on the tips gathered in the previous
   step, adding every object along the way to the pack.

3. Write the pack out, along with a `.mtimes` file that records the per-object
   timestamps.

This mode is invoked internally by linkgit:git-repack[1] when instructed to
write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
of packs which will not be deleted by the repack; in other words, they contain
all of the repository's reachable objects.

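The three steps above can be sketched as a toy model. Everything here is
hypothetical: `build_cruft_pack`, the dictionary-based object store, and the
day-based clock exist only for illustration and are not how
linkgit:git-pack-objects[1] is implemented. The sketch also assumes that objects
already stored in a kept pack are skipped during the traversal.

```python
def build_cruft_pack(objects, kept, now, grace):
    """Toy model of the cruft pack steps.

    objects: dict mapping object id -> (mtime, [ids of referenced objects])
    kept:    set of object ids stored in kept packs (the reachable objects)
    Returns (sorted ids in the cruft pack, dict standing in for `.mtimes`).
    """
    # Step 1: tips are objects outside any kept pack whose mtime is
    # still within the grace period.
    tips = [oid for oid, (mtime, _) in objects.items()
            if oid not in kept and now - mtime <= grace]

    # Step 2: reachability traversal from the tips. Skipping kept objects
    # is an assumption of this sketch.
    packed = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in packed or oid in kept or oid not in objects:
            continue
        packed.add(oid)
        stack.extend(objects[oid][1])

    # Step 3: "write" the pack together with its per-object timestamps.
    pack = sorted(packed)
    mtimes = {oid: objects[oid][0] for oid in pack}
    return pack, mtimes


objects = {
    "commit_live": (99, []),            # reachable, lives in a kept pack
    "commit_new": (95, ["tree_old"]),   # unreachable but recent
    "tree_old": (10, ["blob_old"]),     # old, yet referenced by commit_new
    "blob_old": (10, []),
    "blob_stale": (5, []),              # old and referenced by nothing recent
}

pack, mtimes = build_cruft_pack(objects, kept={"commit_live"}, now=100, grace=14)
print(pack)
print(mtimes)
```

In this model `tree_old` and `blob_old` are kept despite their age because a
recent tip still references them, while `blob_stale` is left out of the pack.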
When a repository already has a cruft pack, `git repack --cruft` typically only
adds objects to it. An exception to this is when `git repack` is given the
`--cruft-expiration` option, which allows the generated cruft pack to omit
expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
later on.

It is linkgit:git-gc[1] that is typically responsible for removing expired
unreachable objects.

== Caution for mixed-version environments

Repositories that have cruft packs in them will continue to work with any older
version of Git. Note, however, that previous versions of Git which do not
understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
all of the objects in it. In other words, do not expect older (pre-cruft pack)
versions of Git to interpret or even read the contents of the `.mtimes` file.

Note that having mixed versions of Git GC-ing the same repository can lead to
unreachable objects never being completely pruned. This can happen under the
following circumstances:

- An older version of Git running GC explodes the contents of an existing
  cruft pack loose, using the cruft pack's mtime.
- A newer version running GC collects those loose objects into a cruft pack,
  where the `.mtimes` file reflects the loose objects' actual mtimes, but the
  cruft pack mtime is "now".

Repeating this process will lead to unreachable objects not getting pruned as a
result of repeatedly resetting the objects' mtimes to the present time.

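The cycle described above can be simulated in a few lines. This is a
hypothetical model rather than Git code: `old_git_gc`, `new_git_gc`, the
dictionary layout, and the day-based schedule are all assumptions chosen to make
the feedback loop visible.

```python
GRACE = 14  # days; illustrative grace period for this sketch


def old_git_gc(cruft_pack):
    """Older Git: explode the pack loose, stamping every object with the
    cruft pack's own mtime because `.mtimes` is not understood."""
    return {oid: cruft_pack["pack_mtime"] for oid in cruft_pack["mtimes"]}


def new_git_gc(loose, now):
    """Newer Git: collect loose objects into a cruft pack. The `.mtimes`
    data mirrors the loose mtimes, but the pack's mtime is "now"."""
    return {"pack_mtime": now, "mtimes": dict(loose)}


# Object "a" was last genuinely touched on day 0 and packed on day 10.
pack = new_git_gc({"a": 0}, now=10)

apparent_ages = []
for day in range(12, 60, 6):
    loose = old_git_gc(pack)                # older Git runs GC on `day`
    apparent_ages.append(day - loose["a"])  # age as the older Git sees it
    pack = new_git_gc(loose, now=day + 3)   # newer Git runs GC 3 days later

true_age = 54 - 0  # last simulated day minus the real last-touched day
print(apparent_ages)
print(true_age)
```

Under these assumptions the object's apparent age stays at a few days in every
round even though its true age grows well past the grace period, so neither
version ever considers it prunable.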
If you are GC-ing repositories in a mixed version environment, consider omitting
the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
leaving the `gc.cruftPacks` configuration unset until all writers understand
cruft packs.

== Alternatives

Notable alternatives to this design include:

- The location of the per-object mtime data, and
- Storing unreachable objects in multiple cruft packs.

On the location of mtime data, a new auxiliary file tied to the pack was chosen
to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
support for optional chunks of data, it may make sense to consolidate the
`.mtimes` format into the `.idx` itself.

Storing unreachable objects among multiple cruft packs (e.g., creating a new
cruft pack during each repacking operation including only unreachable objects
which aren't already stored in an earlier cruft pack) is significantly more
complicated to construct, and so isn't pursued here. The obvious drawback to
the current implementation is that the entire cruft pack must be re-written from
scratch.