git-pack-objects(1)
===================

NAME
----
git-pack-objects - Create a packed archive of objects


SYNOPSIS
--------
[verse]
'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
	[--no-reuse-delta] [--delta-base-offset] [--non-empty]
	[--local] [--incremental] [--window=<n>] [--depth=<n>]
	[--revs [--unpacked | --all]] [--stdout | base-name]
	[--keep-true-parents] < object-list


DESCRIPTION
-----------
Reads a list of objects from the standard input, and writes a packed
archive with the specified base-name, or to the standard output.

A packed archive is an efficient way to transfer a set of objects
between two repositories as well as an access-efficient archival
format. In a packed archive, an object is either stored as a
compressed whole or as a difference from some other object.
The latter is often called a delta.

The packed archive format (.pack) is designed to be self-contained
so that it can be unpacked without any further information. Therefore,
each object that a delta depends upon must be present within the pack.

A pack index file (.idx) is generated for fast, random access to the
objects in the pack. Placing both the index file (.idx) and the packed
archive (.pack) in the pack/ subdirectory of $GIT_OBJECT_DIRECTORY (or
any of the directories on $GIT_ALTERNATE_OBJECT_DIRECTORIES)
enables Git to read from the pack archive.

The 'git unpack-objects' command can read the packed archive and
expand the objects contained in the pack into "one-file
one-object" format; this is typically done by the smart-pull
commands when a pack is created on-the-fly for efficient network
transport by their peers.


OPTIONS
-------
base-name::
Write into a pair of files (.pack and .idx), using
<base-name> to determine the name of the created file.
When this option is used, the two files are written in
<base-name>-<SHA-1>.{pack,idx} files. <SHA-1> is a hash
of the sorted object names, used to make the resulting filename
depend on the pack content, and is written to the standard
output of the command.

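
As a rough illustration of the base-name form, the sketch below builds a
throwaway repository and packs everything reachable from HEAD. The scratch
directory, the file name, and the `demo` base-name are assumptions invented
for this example, not part of the command.

```shell
# Hypothetical scratch repository; every path and name here is illustrative.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example"
git config user.email "example@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "initial commit"

# Feed object names on standard input; "demo" is the base-name.
mkdir -p "$work/out"
pack_hash=$(git rev-list --objects HEAD | git pack-objects -q "$work/out/demo")

# The hash printed on standard output names the two generated files.
ls "$work/out/demo-$pack_hash.pack" "$work/out/demo-$pack_hash.idx"
```

The captured hash is the only thing the command prints on standard output
here; progress output goes to standard error and is silenced by `-q`.
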
--stdout::
Write the pack contents (what would have been written to
.pack file) out to the standard output.

--revs::
Read the revision arguments from the standard input, instead of
individual object names. The revision arguments are processed
the same way as 'git rev-list' with the `--objects` flag
uses its `commit` arguments to build the list of objects it
outputs. The objects on the resulting list are packed.

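
The following sketch shows one way `--revs` might be used to pack only the
objects introduced by the latest commit, next to a pack of the whole history.
The repository layout and file names are assumptions made up for the example;
the object counts are read back with 'git index-pack' and 'git show-index'.

```shell
# Hypothetical two-commit repository; names and paths are illustrative only.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example"
git config user.email "example@example.com"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "first"
echo "second" >> notes.txt
git commit -q -am "second"

# With --revs, standard input carries revision arguments, not object names.
# "^HEAD~1" excludes everything reachable from the first commit.
printf 'HEAD\n^HEAD~1\n' |
    git pack-objects -q --revs --stdout > "$work/incremental.pack"

# For comparison, pack the whole history.
echo HEAD | git pack-objects -q --revs --stdout > "$work/full.pack"

# Index both packs so that their objects can be counted.
git index-pack "$work/incremental.pack" > /dev/null
git index-pack "$work/full.pack" > /dev/null
incremental_count=$(git show-index < "$work/incremental.idx" | wc -l | tr -d ' ')
full_count=$(git show-index < "$work/full.idx" | wc -l | tr -d ' ')
echo "incremental: $incremental_count objects, full: $full_count objects"
```
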
--unpacked::
This implies `--revs`. When processing the list of
revision arguments read from the standard input, limit
the objects packed to those that are not already packed.

--all::
This implies `--revs`. In addition to the list of
revision arguments read from the standard input, pretend
as if all refs under `refs/` are specified to be
included.

--include-tag::
Include unasked-for annotated tags if the object they
reference was included in the resulting packfile. This
can be useful to send new tags to native Git clients.

--window=<n>::
--depth=<n>::
These two options affect how the objects contained in
the pack are stored using delta compression. The
objects are first internally sorted by type, size and
optionally names, and compared against the other objects
within --window to see if using delta compression saves
space. --depth limits the maximum delta depth; making
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
The default value for --window is 10 and --depth is 50.

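
A minimal sketch of tuning these two options follows. It assumes a scratch
repository with several revisions of one growing file, and the values 50 and
10 are arbitrary choices for illustration, not recommendations.
'git verify-pack -v' is used afterwards to inspect how the objects were stored.

```shell
# Hypothetical repository with five revisions of a growing file.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example"
git config user.email "example@example.com"

: > data.txt
n=1
while [ "$n" -le 5 ]; do
    i=1
    while [ "$i" -le 100 ]; do
        echo "revision $n, line $i" >> data.txt
        i=$((i + 1))
    done
    git add data.txt
    git commit -q -m "revision $n"
    n=$((n + 1))
done

# A wider window and a shallower depth limit than the defaults (illustrative).
mkdir -p "$work/out"
pack_hash=$(git rev-list --objects HEAD |
    git pack-objects -q --window=50 --depth=10 "$work/out/tuned")

# List every object; deltified entries also show their depth and base object.
git verify-pack -v "$work/out/tuned-$pack_hash.idx"
```
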
--window-memory=<n>::
This option provides an additional limit on top of `--window`;
the window size will dynamically scale down so as to not take
up more than '<n>' bytes in memory. This is useful in
repositories with a mix of large and small objects to not run
out of memory with a large window, but still be able to take
advantage of the large window for the smaller objects. The
size can be suffixed with "k", "m", or "g".
`--window-memory=0` makes memory usage unlimited, which is the
default.

--max-pack-size=<n>::
Maximum size of each output pack file. The size can be suffixed with
"k", "m", or "g". The minimum size allowed is limited to 1 MiB.
If specified, multiple packfiles may be created.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set.

--honor-pack-keep::
This flag causes an object already in a local pack that
has a .keep file to be ignored, even if it would have
otherwise been packed.

--incremental::
This flag causes an object already in a pack to be ignored
even if it would have otherwise been packed.

--local::
This flag causes an object that is borrowed from an alternate
object store to be ignored even if it would have otherwise been
packed.

--non-empty::
Only create a packed archive if it would contain at
least one object.

--progress::
Progress status is reported on the standard error stream
by default when it is attached to a terminal, unless -q
is specified. This flag forces progress status even if
the standard error stream is not directed to a terminal.

--all-progress::
When --stdout is specified, a progress report is
displayed during the object count and compression phases
but inhibited during the write-out phase. The reason is
that in some cases the output stream is directly linked
to another command which may wish to display progress
status of its own as it processes incoming pack data.
This flag is like --progress except that it forces a progress
report for the write-out phase as well even if --stdout is
used.

--all-progress-implied::
This is used to imply --all-progress whenever progress display
is activated. Unlike --all-progress this flag doesn't actually
force any progress display by itself.

-q::
This flag makes the command not report its progress
on the standard error stream.

--no-reuse-delta::
When creating a packed archive in a repository that
has existing packs, the command reuses existing deltas.
This sometimes results in a slightly suboptimal pack.
This flag tells the command not to reuse existing deltas
but compute them from scratch.

--no-reuse-object::
This flag tells the command not to reuse existing object data at all,
including non-deltified objects, forcing recompression of everything.
This implies --no-reuse-delta. Useful only in the obscure case where
wholesale enforcement of a different compression level on the
packed data is desired.

--compression=<n>::
Specifies compression level for newly-compressed data in the
generated pack. If not specified, pack compression level is
determined first by pack.compression, then by core.compression,
and defaults to -1, the zlib default, if neither is set.
Add --no-reuse-object if you want to force a uniform compression
level on all data no matter the source.

--thin::
Create a "thin" pack by omitting the common objects between a
sender and a receiver in order to reduce network transfer. This
option only makes sense in conjunction with --stdout.
+
Note: A thin pack violates the packed archive format by omitting
required objects and is thus unusable by Git without making it
self-contained. Use `git index-pack --fix-thin`
(see linkgit:git-index-pack[1]) to restore the self-contained property.

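
The round trip described in the note can be sketched roughly as follows. The
`sender` and `receiver` repositories and the file names are hypothetical, and
the receiver is assumed to already hold everything reachable from the
sender's previous commit.

```shell
# Hypothetical sender and receiver; every path here is illustrative.
work=$(mktemp -d)
git init -q "$work/sender"
cd "$work/sender"
git config user.name "Example"
git config user.email "example@example.com"
i=1
while [ "$i" -le 200 ]; do echo "line $i"; i=$((i + 1)); done > data.txt
git add data.txt
git commit -q -m "one"

# The receiver starts out with the first commit only.
git clone -q "$work/sender" "$work/receiver"

echo "one more line" >> data.txt
git commit -q -am "two"
new_commit=$(git rev-parse HEAD)

# Pack what the receiver lacks. With --thin, deltas may refer to base
# objects that are left out of the pack.
printf 'HEAD\n^HEAD~1\n' |
    git pack-objects -q --revs --thin --stdout > "$work/thin.pack"

# On the receiving side, --fix-thin appends any missing base objects so
# that the stored pack is self-contained again.
cd "$work/receiver"
git index-pack --fix-thin --stdin < "$work/thin.pack"
git cat-file -t "$new_commit"
```
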
--delta-base-offset::
A packed archive can express the base object of a delta as
either a 20-byte object name or as an offset in the
stream, but ancient versions of Git don't understand the
latter. By default, 'git pack-objects' only uses the
former format for better compatibility. This option
allows the command to use the latter format for
compactness. Depending on the average delta chain
length, this option typically shrinks the resulting
packfile by 3-5 percent.
+
Note: Porcelain commands such as `git gc` (see linkgit:git-gc[1]) and
`git repack` (see linkgit:git-repack[1]) pass this option by default
in modern Git when they put objects in your repository into pack files.
So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.

--threads=<n>::
Specifies the number of threads to spawn when searching for best
delta matches. This requires that pack-objects be compiled with
pthreads; otherwise this option is ignored with a warning.
This is meant to reduce packing time on multiprocessor machines.
However, the amount of memory required for the delta search
window is multiplied by the number of threads.
Specifying 0 will cause Git to auto-detect the number of CPUs
and set the number of threads accordingly.

--index-version=<version>[,<offset>]::
This is intended to be used by the test suite only. It allows
forcing the version of the generated pack index, and forcing
64-bit index entries on objects located above the given offset.

--keep-true-parents::
With this option, parents that are hidden by grafts are packed
nevertheless.

SEE ALSO
--------
linkgit:git-rev-list[1]
linkgit:git-repack[1]
linkgit:git-prune-packed[1]

GIT
---
Part of the linkgit:git[1] suite