Commit Graph

270 Commits (bed8947751b8057bb6714c53eb15a098b62c65da)

Author SHA1 Message Date
Jonathan Tan ffb2c0fe5c index-pack: support checking objects but not links
The index-pack command currently supports the
--check-self-contained-and-connected argument, for internal use only,
that instructs it to only check for broken links and not broken objects.
For partial clones, we need the inverse, so add a --fsck-objects
argument that checks for broken objects and not broken links, also for
internal use only.

This will be used by fetch-pack in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 10:16:24 -07:00
Junio C Hamano 99321e327b Merge branch 'nd/object-allocation-comments'
Code doc update.

* nd/object-allocation-comments:
  object.h: realign object flag allocation comment
  object.h: update flag allocation comment
2018-03-14 12:01:06 -07:00
brian m. carlson b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson abef9020e3 sha1_file: convert sha1_object_info* to object_id
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names.  Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:

@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)

@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson ef7b5195f1 streaming: convert open_istream to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson 17e65451e3 sha1_file: convert check_sha1_signature to struct object_id
Convert this function to take a pointer to struct object_id and rename
it check_object_signature.  Introduce temporaries to convert the return
values of lookup_replace_object and lookup_replace_object_extended into
struct object_id.

The temporaries are needed because in order to convert
lookup_replace_object, open_istream needs to be converted, and
open_istream needs check_sha1_signature to be converted, causing a loop
of dependencies.  The temporaries will be removed in a future patch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson af8caf33d5 builtin/index-pack: convert struct ref_delta_entry to object_id
Convert this struct to use a member of type object_id.  Convert various
static functions as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
Junio C Hamano 169c9c0169 Merge branch 'bw/c-plus-plus'
Avoid using identifiers that clash with C++ keywords.  Even though
it is not a goal to compile Git with C++ compilers, changes like
this help use of code analysis tools that targets C++ on our
codebase.

* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-03-06 14:54:07 -08:00
Nguyễn Thái Ngọc Duy 95308d64ce object.h: update flag allocation comment
Since the "flags" is shared, it's a good idea to keep track of who
uses what bit. When we need to use more flags in library code, we can
be sure it won't be re-used for another purpose by some caller.

While at there, fix the location of "5" (should be in a different
column than "4" two lines down)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-06 11:41:21 -08:00
Junio C Hamano 0fd90daba8 Merge branch 'bc/hash-algo'
More abstraction of hash function from the codepath.

* bc/hash-algo:
  hash: update obsolete reference to SHA1_HEADER
  bulk-checkin: abstract SHA-1 usage
  csum-file: abstract uses of SHA-1
  csum-file: rename sha1file to hashfile
  read-cache: abstract away uses of SHA-1
  pack-write: switch various SHA-1 values to abstract forms
  pack-check: convert various uses of SHA-1 to abstract forms
  fast-import: switch various uses of SHA-1 to the_hash_algo
  sha1_file: switch uses of SHA-1 to the_hash_algo
  builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
  builtin/index-pack: improve hash function abstraction
  hash: create union for hash context allocation
  hash: move SHA-1 macros to hash.h
2018-02-15 14:55:47 -08:00
Junio C Hamano 8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Brandon Williams debca9d2fe object: rename function 'typename' to 'type_name'
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
Junio C Hamano f3d618d2bf Merge branch 'jh/fsck-promisors'
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.

* jh/fsck-promisors:
  gc: do not repack promisor packfiles
  rev-list: support termination at promisor objects
  sha1_file: support lazily fetching missing objects
  introduce fetch-object: fetch one promisor object
  index-pack: refactor writing of .keep files
  fsck: support promisor objects as CLI argument
  fsck: support referenced promisor objects
  fsck: support refs pointing to promisor objects
  fsck: introduce partialclone extension
  extension.partialclone: introduce partial clone extension
2018-02-13 13:39:03 -08:00
brian m. carlson 98a3beab6a csum-file: rename sha1file to hashfile
Rename struct sha1file to struct hashfile, along with all of its related
functions.

The transformation in this commit was made by global search-and-replace.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
brian m. carlson 454253f059 builtin/index-pack: improve hash function abstraction
Convert several uses of unsigned char [20] to struct object_id and
convert various hard-coded constants and uses of SHA-1 functions to use
the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
Patryk Obara f070faccc1 sha1_file: convert hash_sha1_file to object_id
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.

Rename this function to hash_object_file.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Christian Couder 72885a6d51 index-pack: use skip_to_optional_arg()
Let's simplify index-pack option parsing using
skip_to_optional_arg().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-11 16:10:12 -08:00
Jonathan Tan 8b4c0103a9 sha1_file: support lazily fetching missing objects
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Jonathan Tan 88e2f9ed8e introduce fetch-object: fetch one promisor object
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jonathan Tan 8e29c7c3af index-pack: refactor writing of .keep files
In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Derrick Stolee 19716b21a4 cleanup: fix possible overflow errors in binary search
A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:

	mid = (min + max) / 2;

Instead, use the overflow-safe version:

	mid = min + (max - min) / 2;

This translation is safe since the operation occurs inside a loop
conditioned on "min < max". The included changes were found using
the following git grep:

	git grep '/ *2;' '*.c'

Making this cleanup will prevent future review friction when a new
binary search is contructed based on existing code.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 08:57:24 +09:00
Jonathan Tan 4f39cd821d pack: move pack name-related functions
Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Junio C Hamano 00b7cf2379 Merge branch 'jt/unify-object-info'
Code clean-ups.

* jt/unify-object-info:
  sha1_file: refactor has_sha1_file_with_flags
  sha1_file: do not access pack if unneeded
  sha1_file: teach sha1_object_info_extended more flags
  sha1_file: refactor read_object
  sha1_file: move delta base cache code up
  sha1_file: rename LOOKUP_REPLACE_OBJECT
  sha1_file: rename LOOKUP_UNKNOWN_OBJECT
  sha1_file: teach packed_object_info about typename
2017-07-05 13:32:57 -07:00
Jonathan Tan e83e71c5e1 sha1_file: refactor has_sha1_file_with_flags
has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 10:28:58 -07:00
Junio C Hamano 50f03c6676 Merge branch 'ab/free-and-null'
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24 14:28:41 -07:00
Junio C Hamano f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Ævar Arnfjörð Bjarmason 6a83d90207 coccinelle: make use of the "type" FREE_AND_NULL() rule
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-16 12:44:03 -07:00
Brandon Williams b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
brian m. carlson c251c83df2 object: convert parse_object* to take struct object_id
Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id.  Remove the temporary variables inserted
earlier, since they are no longer necessary.  Transform all of the
callers using the following semantic patch:

@@
expression E1;
@@
- parse_object(E1.hash)
+ parse_object(&E1)

@@
expression E1;
@@
- parse_object(E1->hash)
+ parse_object(E1)

@@
expression E1, E2;
@@
- parse_object_or_die(E1.hash, E2)
+ parse_object_or_die(&E1, E2)

@@
expression E1, E2;
@@
- parse_object_or_die(E1->hash, E2)
+ parse_object_or_die(E1, E2)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1.hash, E2, E3, E4, E5)
+ parse_object_buffer(&E1, E2, E3, E4, E5)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1->hash, E2, E3, E4, E5)
+ parse_object_buffer(E1, E2, E3, E4, E5)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:58 +09:00
brian m. carlson 3aca1fc6c9 Convert lookup_blob to struct object_id
Convert lookup_blob to take a pointer to struct object_id.

The commit was created with manual changes to blob.c and blob.h, plus
the following semantic patch:

@@
expression E1;
@@
- lookup_blob(E1.hash)
+ lookup_blob(&E1)

@@
expression E1;
@@
- lookup_blob(E1->hash)
+ lookup_blob(E1)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson 3e9309815d Convert remaining callers of lookup_blob to object_id
All but a few callers of lookup_blob have been converted to struct
object_id.  Introduce a temporary, which will be removed later, into
parse_object to ease the transition, and convert the remaining callers
so that we can update lookup_blob to take struct object_id *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson e6a492b7be pack: convert struct pack_idx_entry to struct object_id
Convert struct pack_idx_entry to use struct object_id by changing the
definition and applying the following semantic patch, plus the standard
object_id transforms:

@@
struct pack_idx_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct pack_idx_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
Junio C Hamano dfe46c5ce6 Merge branch 'jk/loose-object-info-report-error'
Update error handling for codepath that deals with corrupt loose
objects.

* jk/loose-object-info-report-error:
  index-pack: detect local corruption in collision check
  sha1_loose_object_info: return error for corrupted objects
2017-04-16 23:29:30 -07:00
Jeff King 51054177b3 index-pack: detect local corruption in collision check
When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().

If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
report that instead (just like we do if reading the rest of
the object content fails a few lines later).

Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).

Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:48:11 -07:00
Jeff King 5b1ef2cef4 replace unchecked snprintf calls with heap buffers
We'd prefer to avoid unchecked snprintf calls because
truncation can lead to unexpected results.

These are all cases where truncation shouldn't ever happen,
because the input to snprintf is fixed in size. That makes
them candidates for xsnprintf(), but it's simpler still to
just use the heap, and then nobody has to wonder if "100" is
big enough.

We'll use xstrfmt() where possible, and a strbuf when we need
the resulting size or to reuse the same buffer in a loop.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-30 14:59:50 -07:00
Jeff King 594fa9998c odb_mkstemp: write filename into strbuf
The odb_mkstemp() function expects the caller to provide a
fixed buffer to write the resulting tempfile name into. But
it creates the template using snprintf without checking the
return value. This means we could silently truncate the
filename.

In practice, it's unlikely that the truncation would end in
the template-pattern that mkstemp needs to open the file. So
we'd probably end up failing either way, unless the path was
specially crafted.

The simplest fix would be to notice the truncation and die.
However, we can observe that most callers immediately
xstrdup() the result anyway. So instead, let's switch to
using a strbuf, which is easier for them (and isn't a big
deal for the other 2 callers, who can just strbuf_release
when they're done with it).

Note that many of the callers used static buffers, but this
was purely to avoid putting a large buffer on the stack. We
never passed the static buffers out of the function, so
there's no complicated memory handling we need to change.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-28 15:28:04 -07:00
Jeff King 892e723afd do not check odb_mkstemp return value for errors
The odb_mkstemp function does not return an error; it dies
on failure instead. But many of its callers compare the
resulting descriptor against -1 and die themselves.

Mostly this is just pointless, but it does raise a question
when looking at the callers: if they show the results of the
"template" buffer after a failure, what's in it? The answer
is: it doesn't matter, because it cannot happen.

So let's make that clear by removing the bogus error checks.
In bitmap_writer_finish(), we can drop the error-handling
code entirely. In the other two cases, it's shared with the
open() in another code path; we can just move the
error-check next to that open() call.

And while we're at it, let's flesh out the function's
docstring a bit to make the error behavior clear.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-28 15:28:04 -07:00
Jeff King f20754802a index-pack: make pointer-alias fallbacks safer
The final() function accepts a NULL value for certain
parameters, and falls back to writing into a reusable "name"
buffer, and then either:

  1. For "keep_name", requiring all uses to do "keep_name ?
     keep_name : name.buf". This is awkward, and it's easy
     to accidentally look at the maybe-NULL keep_name.

  2. For "final_index_name" and "final_pack_name", aliasing
     those pointers to the "name" buffer. This is easier to
     use, but the aliased pointers become invalid after the
     buffer is reused (this isn't a bug now, but it's a
     potential pitfall).

One way to make this safer would be to introduce an extra
pointer to do the aliasing, and have its lifetime match the
validity of the "name" buffer. But it's still easy to
accidentally use the wrong name (i.e., to use
"final_pack_name" instead of the aliased pointer).

Instead, let's use three separate buffers that will remain
valid through the function. That makes it safe to alias the
pointers and use them consistently. The extra allocations
shouldn't matter, as this function is not performance
sensitive.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:33:43 -07:00
Jeff King ba47a3088f replace snprintf with odb_pack_name()
In several places we write the name of the pack filename
into a fixed-size buffer using snprintf(), but do not check
the return value.  As a result, a very long object directory
could cause us to quietly truncate the pack filename
(potentially leading to a corrupted repository, as a newly
written packfile could be missing its .pack extension).

We can use odb_pack_name() to do this with a strbuf (and
shorten the code, as well).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:26:18 -07:00
Jeff King eaeefc3276 odb_pack_keep(): stop generating keepfile name
The odb_pack_keep() function generates the name of a .keep
file and opens it. This has two problems:

  1. It requires a fixed-size buffer to create the filename
     and doesn't notice when the result is truncated.

  2. Of the two callers, one sometimes wants to open a
     filename it already has, which makes things awkward (it
     has to do so manually, and skips the leading-directory
     creation).

Instead, let's have odb_pack_keep() just open the file.
Generating the name isn't hard, and a future patch will
switch callers over to odb_pack_name() anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:17:00 -07:00
Jeff King 29401e1575 index-pack: skip collision check when not in repository
You can run "git index-pack path/to/foo.pack" outside of a
repository to generate an index file, or just to verify the
contents. There's no point in doing a collision check, since
we obviously do not have any objects to collide with.

The current code will blindly look in .git/objects based on
the result of setup_git_env(). That effectively gives us the
right answer (since we won't find any objects), but it's a
waste of time, and it conflicts with our desire to
eventually get rid of the "fallback to .git" behavior of
setup_git_env().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16 13:57:19 -08:00
Jeff King 7176a31444 index-pack: complain when --stdin is used outside of a repo
The index-pack builtin is marked as RUN_SETUP_GENTLY,
because it's perfectly fine to index a pack in the
filesystem outside of any repository. However, --stdin mode
will write the result to the object database, which does not
make sense outside of a repository. Doing so creates a bogus
".git" directory with nothing in it except the newly-created
pack and its index.

Instead, let's flag this as an error and abort.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16 09:29:43 -08:00
René Scharfe 1b5294de40 use QSORT, part 2
Convert two more qsort(3) calls to QSORT to reduce code size and for
better safety and consistency.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 20:40:23 -07:00
René Scharfe 9ed0d8d6e6 use QSORT
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT.  The resulting code is
shorter and supports empty arrays with NULL pointers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 15:42:18 -07:00
Jeff King 411481be6f index-pack: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Junio C Hamano a58a8e3f71 Merge branch 'jk/push-progress'
"git push" and "git clone" learned to give better progress meters
to the end user who is waiting on the terminal.

* jk/push-progress:
  receive-pack: send keepalives during quiet periods
  receive-pack: turn on connectivity progress
  receive-pack: relay connectivity errors to sideband
  receive-pack: turn on index-pack resolving progress
  index-pack: add flag for showing delta-resolution progress
  clone: use a real progress meter for connectivity check
  check_connected: add progress flag
  check_connected: relay errors to alternate descriptor
  check_everything_connected: use a struct with named options
  check_everything_connected: convert to argv_array
  rev-list: add optional progress reporting
  check_everything_connected: always pass --quiet to rev-list
2016-08-03 15:10:28 -07:00
Jeff King 83558686ce receive-pack: send keepalives during quiet periods
After a client has sent us the complete pack, we may spend
some time processing the data and running hooks. If the
client asked us to be quiet, receive-pack won't send any
progress data during the index-pack or connectivity-check
steps. And hooks may or may not produce their own progress
output. In these cases, the network connection is totally
silent from both ends.

Git itself doesn't care about this (it will wait forever),
but other parts of the system (e.g., firewalls,
load-balancers, etc) might hang up the connection. So we'd
like to send some sort of keepalive to let the network and
the client side know that we're still alive and processing.

We can use the same trick we did in 05e9515 (upload-pack:
send keepalive packets during pack computation, 2013-09-08).
Namely, we will send an empty sideband data packet every `N`
seconds that we do not relay any stderr data over the
sideband channel. As with 05e9515, this means that we won't
bother sending keepalives when there's actual progress data,
but will kick in when it has been disabled (or if there is a
lull in the progress data).

The concept is simple, but the details are subtle enough
that they need discussing here.

Before the client sends us the pack, we don't want to do any
keepalives. We'll have sent our ref advertisement, and we're
waiting for them to send us the pack (and tell us that they
support sidebands at all).

While we're receiving the pack from the client (or waiting
for it to start), there's no need for keepalives; it's up to
them to keep the connection active by sending data.
Moreover, it would be wrong for us to do so. When we are the
server in the smart-http protocol, we must treat our
connection as half-duplex. So any keepalives we send while
receiving the pack would potentially be buffered by the
webserver. Not only does this make them useless (since they
would not be delivered in a timely manner), but it could
actually cause a deadlock if we fill up the buffer with
keepalives. (It wouldn't be wrong to send keepalives in this
phase for a full-duplex connection like ssh; it's simply
pointless, as it is the client's responsibility to speak).

As soon as we've gotten all of the pack data, then the
client is waiting for us to speak, and we should start
keepalives immediately. From here until the end of the
connection, we send one any time we are not otherwise
sending data.

But there's a catch. Receive-pack doesn't know the moment
we've gotten all the data. It passes the descriptor to
index-pack, who reads all of the data, and then starts
resolving the deltas. We have to communicate that back.

To make this work, we instruct the sideband muxer to enable
keepalives in three phases:

  1. In the beginning, not at all.

  2. While reading from index-pack, wait for a signal
     indicating end-of-input, and then start them.

  3. Afterwards, always.

The signal from index-pack in phase 2 has to come over the
stderr channel which the muxer is reading. We can't use an
extra pipe because the portable run-command interface only
gives us stderr and stdout.

Stdout is already used to pass the .keep filename back to
receive-pack. We could also send a signal there, but then we
would find out about it in the main thread. And the
keepalive needs to be done by the async muxer thread (since
it's the one writing sideband data back to the client). And
we can't reliably signal the async thread from the main
thread, because the async code sometimes uses threads and
sometimes uses forked processes.

Therefore the signal must come over the stderr channel,
where it may be interspersed with other random
human-readable messages from index-pack. This patch makes
the signal a single NUL byte.  This is easy to parse, should
not appear in any normal stderr output, and we don't have to
worry about any timing issues (like seeing half the signal
bytes in one read(), and half in a subsequent one).

This is a bit ugly, but it's simple to code and should work
reliably.

Another option would be to stop using an async thread for
muxing entirely, and just poll() both stderr and stdout of
index-pack from the main thread. This would work for
index-pack (because we aren't doing anything useful in the
main thread while it runs anyway). But it would make the
connectivity check and the hook muxers much more
complicated, as they need to simultaneously feed the
sub-programs while reading their stderr.

The index-pack phase is the only one that needs this
signaling, so it could simply behave differently than the
other two. That would mean having two separate
implementations of copy_to_sideband (and the keepalive
code), though. And it still doesn't get rid of the
signaling; it just means we can write a nicer message like
"END_OF_INPUT" or something on stdout, since we don't have
to worry about separating it from the stderr cruft.

One final note: this signaling trick is only done with
index-pack, not with unpack-objects. There's no point in
doing it for the latter, because by definition it only kicks
in for a small number of objects, where keepalives are not
as useful (and this conveniently lets us avoid duplicating
the implementation).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:11 -07:00
Jeff King e376f17fd1 index-pack: add flag for showing delta-resolution progress
The index-pack command has two progress meters: one for
"receiving objects", and one for "resolving deltas". You get
neither by default, or both with "-v".

But for a push through receive-pack, we would want only the
"resolving deltas" phase, _not_ the "receiving objects"
progress. There are two reasons for this.

One is simply that existing clients are already printing
"writing objects" progress at the same time.  Arguably
"receiving" from the far end is more useful, because it
tells you what has actually gotten there, as opposed to what
might be stuck in a buffer somewhere between the client and
server. But that would require a protocol extension to tell
clients not to print their progress. Possible, but
complexity for little gain.

The second reason is much more important. In a full-duplex
connection like git-over-ssh, we can print progress while
the pack is incoming, and it will immediately get to the
client. But for a half-duplex connection like git-over-http,
we should not say anything until we have received the full
request.  Anything we write is subject to being stuck in a
buffer by the webserver.  Worse, we can end up in a deadlock
if that buffer fills up.

So our best bet is to avoid writing anything that isn't a
small fixed size until we've received the full pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:10 -07:00
Nguyễn Thái Ngọc Duy da49a7da3a index-pack: correct "offset" type in unpack_entry_data()
unpack_entry_data() receives an off_t value from unpack_raw_entry(),
which could be larger than unsigned long on 32-bit systems with large
file support. Correct the type so truncation does not happen. This
only affects bad object reporting though.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13 09:15:08 -07:00
Nguyễn Thái Ngọc Duy fd3e67474c index-pack: report correct bad object offsets even if they are large
Use the right type for offsets in this case, off_t, which makes a
difference on 32-bit systems with large file support, and change
formatting code accordingly.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13 09:14:47 -07:00
Nguyễn Thái Ngọc Duy 7171a0b0cf index-pack: correct "len" type in unpack_data()
On 32-bit systems with large file support, one entry could be larger
than 4GB and overflow "len". Correct it so we can unpack a full entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13 09:14:38 -07:00
Junio C Hamano b262b8f889 Merge branch 'va/i18n-misc-updates' into maint
Mark several messages for translation.

* va/i18n-misc-updates:
  i18n: unpack-trees: avoid substituting only a verb in sentences
  i18n: builtin/pull.c: split strings marked for translation
  i18n: builtin/pull.c: mark placeholders for translation
  i18n: git-parse-remote.sh: mark strings for translation
  i18n: branch: move comment for translators
  i18n: branch: unmark string for translation
  i18n: builtin/rm.c: remove a comma ',' from string
  i18n: unpack-trees: mark strings for translation
  i18n: builtin/branch.c: mark option for translation
  i18n: index-pack: use plural string instead of normal one
2016-05-26 13:17:20 -07:00
Junio C Hamano e5e7a9115d Merge branch 'va/i18n-misc-updates'
Mark several messages for translation.

* va/i18n-misc-updates:
  i18n: unpack-trees: avoid substituting only a verb in sentences
  i18n: builtin/pull.c: split strings marked for translation
  i18n: builtin/pull.c: mark placeholders for translation
  i18n: git-parse-remote.sh: mark strings for translation
  i18n: branch: move comment for translators
  i18n: branch: unmark string for translation
  i18n: builtin/rm.c: remove a comma ',' from string
  i18n: unpack-trees: mark strings for translation
  i18n: builtin/branch.c: mark option for translation
  i18n: index-pack: use plural string instead of normal one
2016-05-17 14:38:23 -07:00
Junio C Hamano 1d1cbe224f Merge branch 'jc/index-pack' into maint
Code clean-up.

* jc/index-pack:
  index-pack: add a helper function to derive .idx/.keep filename
  index-pack: correct --keep[=<msg>]
2016-04-14 18:37:16 -07:00
Vasco Almeida 71d99b81da i18n: index-pack: use plural string instead of normal one
Git could output "completed with 1 local objects", but in this
case using "object" instead of "objects" is the correct form.

Use Q_() instead of _().

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-08 15:15:54 -07:00
Junio C Hamano 3583bf594d Merge branch 'jc/index-pack'
Code clean-up.

* jc/index-pack:
  index-pack: add a helper function to derive .idx/.keep filename
2016-04-03 10:29:31 -07:00
Junio C Hamano d4a22303ab Merge branch 'jc/maint-index-pack-keep'
"git index-pack --keep[=<msg>] pack-$name.pack" simply did not work.

* jc/maint-index-pack-keep:
  index-pack: correct --keep[=<msg>]
2016-04-03 10:29:29 -07:00
Junio C Hamano 090de6b289 Merge branch 'jk/pack-idx-corruption-safety'
The code to read the pack data using the offsets stored in the pack
idx file has been made more carefully check the validity of the
data in the idx.

* jk/pack-idx-corruption-safety:
  sha1_file.c: mark strings for translation
  use_pack: handle signed off_t overflow
  nth_packed_object_offset: bounds-check extended offset
  t5313: test bounds-checks of corrupted/malicious pack/idx files
2016-03-04 13:45:47 -08:00
Junio C Hamano bfee614a2f index-pack: add a helper function to derive .idx/.keep filename
These are automatically named by replacing .pack suffix in the
name of the packfile.  Add a small helper to do so, as I'll be
adding another one soonish.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-03 13:16:53 -08:00
Junio C Hamano 13f0a6ddb9 Merge branch 'jc/maint-index-pack-keep' into jc/index-pack
* jc/maint-index-pack-keep:
  index-pack: correct --keep[=<msg>]
2016-03-03 13:16:45 -08:00
Junio C Hamano 0e94242df1 index-pack: correct --keep[=<msg>]
When 592ce208 (index-pack: use strip_suffix to avoid magic numbers,
2014-06-30) refactored the code to derive names of .idx and .keep
files from the name of .pack file, a copy-and-paste typo crept in,
mistakingly attempting to create and store the keep message file in
the .idx file we just created, instead of .keep file.

As we create the .keep file with O_CREAT|O_EXCL, and we do so after
we write the .idx file, we luckily do not clobber the .idx file, but
because we deliberately ignored EEXIST when creating .keep file
(which is justifiable because only the existence of .keep file
matters), nobody noticed this mistake so far.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-03 11:06:01 -08:00
Jeff King 47fe3f6ef0 nth_packed_object_offset: bounds-check extended offset
If a pack .idx file has a corrupted offset for an object, we
may try to access an offset in the .idx or .pack file that
is larger than the file's size.  For the .pack case, we have
use_pack() to protect us, which realizes the access is out
of bounds. But if the corrupted value asks us to look in the
.idx file's secondary 64-bit offset table, we blindly add it
to the mmap'd index data and access arbitrary memory.

We can fix this with a simple bounds-check compared to the
size we found when we opened the .idx file.

Note that there's similar code in index-pack that is
triggered only during "index-pack --verify". To support
both, we pull the bounds-check into a separate function,
which dies when it sees a corrupted file.

It would be nice if we could return an error, so that the
pack code could try to find a good copy of the object
elsewhere. Currently nth_packed_object_offset doesn't have
any way to return an error, but it could probably use "0" as
a sentinel value (since no object can start there). This is
the minimal fix, and we can improve the resilience later on
top.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-25 11:32:43 -08:00
Jeff King 50a6c8efa2 use st_add and st_mult for allocation size computation
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
brian m. carlson ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson 7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Jeff King ef1286d3c0 use xsnprintf for generating git object headers
We generally use 32-byte buffers to format git's "type size"
header fields. These should not generally overflow unless
you can produce some truly gigantic objects (and our types
come from our internal array of constant strings). But it is
a good idea to use xsnprintf to make sure this is the case.

Note that we slightly modify the interface to
write_sha1_file_prepare, which nows uses "hdrlen" as an "in"
parameter as well as an "out" (on the way in it stores the
allocated size of the header, and on the way out it returns
the ultimate size of the header).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano 51a22ce147 Merge branch 'jc/finalize-temp-file'
Long overdue micro clean-up.

* jc/finalize-temp-file:
  sha1_file.c: rename move_temp_to_file() to finalize_object_file()
2015-08-19 14:48:55 -07:00
Junio C Hamano cb5add5868 sha1_file.c: rename move_temp_to_file() to finalize_object_file()
Since 5a688fe4 ("core.sharedrepository = 0mode" should set, not
loosen, 2009-03-25), we kept reminding ourselves:

    NEEDSWORK: this should be renamed to finalize_temp_file() as
    "moving" is only a part of what it does, when no patch between
    master to pu changes the call sites of this function.

without doing anything about it.  Let's do so.

The purpose of this function was not to move but to finalize.  The
detail of the primarily implementation of finalizing was to link the
temporary file to its final name and then to unlink, which wasn't
even "moving".  The alternative implementation did "move" by calling
rename(2), which is a fun tangent.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 11:10:37 -07:00
Junio C Hamano b2f44feba5 Merge branch 'js/fsck-opt'
Allow ignoring fsck errors on specific set of known-to-be-bad
objects, and also tweaking warning level of various kinds of non
critical breakages reported.

* js/fsck-opt:
  fsck: support ignoring objects in `git fsck` via fsck.skiplist
  fsck: git receive-pack: support excluding objects from fsck'ing
  fsck: introduce `git fsck --connectivity-only`
  fsck: support demoting errors to warnings
  fsck: document the new receive.fsck.<msg-id> options
  fsck: allow upgrading fsck warnings to errors
  fsck: optionally ignore specific fsck issues completely
  fsck: disallow demoting grave fsck errors to warnings
  fsck: add a simple test for receive.fsck.<msg-id>
  fsck: make fsck_tag() warn-friendly
  fsck: handle multiple authors in commits specially
  fsck: make fsck_commit() warn-friendly
  fsck: make fsck_ident() warn-friendly
  fsck: report the ID of the error/warning
  fsck (receive-pack): allow demoting errors to warnings
  fsck: offer a function to demote fsck errors to warnings
  fsck: provide a function to parse fsck message IDs
  fsck: introduce identifiers for fsck messages
  fsck: introduce fsck options
2015-08-03 11:01:18 -07:00
Junio C Hamano de62fe8c42 Merge branch 'jk/index-pack-reduce-recheck' into maint
Disable "have we lost a race with competing repack?" check while
receiving a huge object transfer that runs index-pack.

* jk/index-pack-reduce-recheck:
  index-pack: avoid excessive re-reading of pack directory
2015-07-27 12:21:38 -07:00
Junio C Hamano 0bf46af089 Merge branch 'jc/fix-alloc-sortbuf-in-index-pack'
A hotfix for what is in 2.5-rc but not in 2.4.

* jc/fix-alloc-sortbuf-in-index-pack:
  index-pack: fix allocation of sorted_by_pos array
2015-07-09 14:31:42 -07:00
Junio C Hamano 781d93067d index-pack: fix allocation of sorted_by_pos array
When c6458e60 (index-pack: kill union delta_base to save memory,
2015-04-18) attempted to reduce the memory footprint of index-pack,
one of the key thing it did was to keep track of ref-deltas and
ofs-deltas separately.

In fix_unresolved_deltas(), however it forgot that it now wants to
look only at ref deltas in one place.  The code allocated an array
for nr_unresolved, which is sum of number of ref- and ofs-deltas
minus nr_resolved, which may be larger or smaller than the number
ref-deltas.  Depending on nr_resolved, this was either under or over
allocating.

Also, the old code before this change had to use 'i' and 'n' because
some of the things we see in the (old) deltas[] array we scanned
with 'i' would not make it into the sorted_by_pos[] array in the old
world order, but now because you have only ref delta in a separate
ref_deltas[] array, they increment lock&step.  We no longer need
separate variables.  And most importantly, we shouldn't pass the
nr_unresolved parameter, as this number does not play a role in the
working of this helper function.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-04 15:26:03 -07:00
Junio C Hamano 712b351bd3 Merge branch 'jk/index-pack-reduce-recheck'
Disable "have we lost a race with competing repack?" check while
receiving a huge object transfer that runs index-pack.

* jk/index-pack-reduce-recheck:
  index-pack: avoid excessive re-reading of pack directory
2015-06-24 12:21:54 -07:00
Johannes Schindelin 5d477a334a fsck (receive-pack): allow demoting errors to warnings
For example, missing emails in commit and tag objects can be demoted to
mere warnings with

	git config receive.fsck.missingemail=warn

The value is actually a comma-separated list.

In case that the same key is listed in multiple receive.fsck.<msg-id>
lines in the config, the latter configuration wins (this can happen for
example when both $HOME/.gitconfig and .git/config contain message type
settings).

As git receive-pack does not actually perform the checks, it hands off
the setting to index-pack or unpack-objects in the form of an optional
argument to the --strict option.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:34 -07:00
Johannes Schindelin 22410549fc fsck: introduce fsck options
Just like the diff machinery, we are about to introduce more settings,
therefore it makes sense to carry them around as a (pointer to a) struct
containing all of them.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 10:23:32 -07:00
Junio C Hamano c3b1c1e9b2 Merge branch 'nd/slim-index-pack-memory-usage'
An earlier optimization broke index-pack for a large object
transfer; this fixes it before the breakage hits any released
version.

* nd/slim-index-pack-memory-usage:
  index-pack: fix truncation of off_t in comparison
2015-06-16 14:27:08 -07:00
Jeff King 0eeb077be7 index-pack: avoid excessive re-reading of pack directory
Since 45e8a74 (has_sha1_file: re-check pack directory before
giving up, 2013-08-30), we spend extra effort for
has_sha1_file to give the right answer when somebody else is
repacking. Usually this effort does not matter, because
after finding that the object does not exist, the next step
is usually to die().

However, some code paths make a large number of
has_sha1_file checks which are _not_ expected to return 1.
The collision test in index-pack.c is such a case. On a
local system, this can cause a performance slowdown of
around 5%. But on a system with high-latency system calls
(like NFS), it can be much worse.

This patch introduces a "quick" flag to has_sha1_file which
callers can use when they would prefer high performance at
the cost of false negatives during repacks. There may be
other code paths that can use this, but the index-pack one
is the most obviously critical, so we'll start with
switching that one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-09 12:26:35 -07:00
Jeff King f0e7f11d05 index-pack: fix truncation of off_t in comparison
Commit c6458e6 (index-pack: kill union delta_base to save
memory, 2015-04-18) refactored the comparison functions used
in sorting and binary searching our delta list. The
resulting code does something like:

  int cmp_offsets(off_t a, off_t b)
  {
	  return a - b;
  }

This works most of the time, but produces nonsensical
results when the difference between the two offsets is
larger than what can be stored in an "int". This can lead to
unresolved deltas if the packsize is larger than 2G (even on
64-bit systems, an int is still typically 32 bits):

  $ git clone git://github.com/mozilla/gecko-dev
  Cloning into 'gecko-dev'...
  remote: Counting objects: 4800161, done.
  remote: Compressing objects: 100% (178/178), done.
  remote: Total 4800161 (delta 88), reused 0 (delta 0), pack-reused 4799978
  Receiving objects: 100% (4800161/4800161), 2.21 GiB | 3.26 MiB/s, done.
  Resolving deltas:  99% (3808820/3811944), completed with 0 local objects.
  fatal: pack has 3124 unresolved deltas
  fatal: index-pack failed

We can fix it by doing direct comparisons between the
offsets and returning constants; the callers only care about
the sign of the comparison, not the magnitude.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-04 10:28:57 -07:00
Junio C Hamano eb10a85098 Merge branch 'nd/slim-index-pack-memory-usage'
Memory usage of "git index-pack" has been trimmed by tens of
per-cent.

* nd/slim-index-pack-memory-usage:
  index-pack: kill union delta_base to save memory
  index-pack: reduce object_entry size to save memory
2015-05-11 14:23:44 -07:00
Nguyễn Thái Ngọc Duy c6458e60ed index-pack: kill union delta_base to save memory
Once we know the number of objects in the input pack, we allocate an
array of nr_objects of struct delta_entry. On x86-64, this struct is
32 bytes long. The union delta_base, which is part of struct
delta_entry, provides enough space to store either ofs-delta (8 bytes)
or ref-delta (20 bytes).

Because ofs-delta encoding is more efficient space-wise and more
performant at runtime than ref-delta encoding, Git packers try to use
ofs-delta whenever possible, and it is expected that objects encoded
as ref-delta are minority.

In the best clone case where no ref-delta object is present, we waste
(20-8) * nr_objects bytes because of this union. That's about 38MB out
of 100MB for deltas[] with 3.4M objects, or 38%. deltas[] would be
around 62MB without the waste.

This patch attempts to eliminate that. deltas[] array is split into
two: one for ofs-delta and one for ref-delta. Many functions are also
duplicated because of this split. With this patch, ofs_deltas[] array
takes 51MB. ref_deltas[] should remain unallocated in clone case (0
bytes). This array grows as we see ref-delta. We save about half in
this case, or 25% of total bookkeeping.

The saving is more than the calculation above because some padding in
the old delta_entry struct is removed. ofs_delta_entry is 16 bytes,
including the 4 bytes padding. That's 13MB for padding, but packing
the struct could break platforms that do not support unaligned
access. If someone on 32-bit is really low on memory and only deals
with packs smaller than 2G, using 32-bit off_t would eliminate the
padding and save 27MB on top.

A note about ofs_deltas allocation. We could use ref_deltas memory
allocation strategy for ofs_deltas. But that probably just adds more
overhead on top. ofs-deltas are generally the majority (1/2 to 2/3) in
any pack. Incremental realloc may lead to too many memcpy. And if we
preallocate, say 1/2 or 2/3 of nr_objects initially, the growth rate
of ALLOC_GROW() could make this array larger than nr_objects, wasting
more memory.

Brought-up-by: Matthew Sporleder <msporleder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-18 17:48:32 -07:00
Junio C Hamano 6902c4da58 Merge branch 'rs/deflate-init-cleanup'
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-17 16:01:26 -07:00
René Scharfe 9a6f1287fb zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-05 15:46:03 -08:00
Nguyễn Thái Ngọc Duy 417305764a index-pack: reduce object_entry size to save memory
For each object in the input pack, we need one struct object_entry. On
x86-64, this struct is 64 bytes long. Although:

 - The 8 bytes for delta_depth and base_object_no are only useful when
   show_stat is set. And it's never set unless someone is debugging.

 - The three fields hdr_size, type and real_type take 4 bytes each
   even though they never use more than 4 bits.

By moving delta_depth and base_object_no out of struct object_entry
and make the other 3 fields one byte long instead of 4, we shrink 25%
of this struct.

On a 3.4M object repo (*) that's about 53MB. The saving is less
impressive compared to index-pack memory use for basic bookkeeping (**),
about 16%.

(*) linux-2.6.git already has 4M objects as of v3.19-rc7 so this is
not an unrealistic number of objects that we have to deal with.

(**)  3.4M * (sizeof(object_entry) + sizeof(delta_entry)) = 311MB

Brought-up-by: Matthew Sporleder <msporleder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-27 12:23:25 -08:00
Junio C Hamano 1cb4b3d380 Merge branch 'js/fsck-tag-validation'
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.

* js/fsck-tag-validation:
  index-pack: terminate object buffers with NUL
  fsck: properly bound "invalid tag name" error message
2014-12-22 12:27:41 -08:00
Duy Nguyen a1e920a0a7 index-pack: terminate object buffers with NUL
We have some tricky checks in fsck that rely on a side effect of
require_end_of_header(), and would otherwise easily run outside
non-NUL-terminated buffers. This is a bit brittle, so let's make sure
that only NUL-terminated buffers are passed around to begin with.

Jeff "Peff" King contributed the detailed analysis which call paths are
involved and pointed out that we also have to patch the get_data()
function in unpack-objects.c, which is what Johannes "Dscho" Schindelin
implemented.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Analyzed-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-09 11:56:37 -08:00
Etienne Buira e0e21283b6 index-pack: fix compilation with NO_PTHREADS
type_cas_lock/unlock() should be defined as no-op for NO_PTHREADS
build, just like all the other locking primitives.

Signed-off-by: Etienne Buira <etienne.buira@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-13 12:33:30 -07:00
Junio C Hamano 1c2ea2cdc0 Merge branch 'rs/realloc-array'
Code cleanup.

* rs/realloc-array:
  use REALLOC_ARRAY for changing the allocation size of arrays
  add macro REALLOC_ARRAY
2014-09-26 14:39:45 -07:00
Junio C Hamano 13f4f04692 Merge branch 'js/fsck-tag-validation'
Teach "git fsck" to inspect the contents of annotated tag objects.

* js/fsck-tag-validation:
  Make sure that index-pack --strict checks tag objects
  Add regression tests for stricter tag fsck'ing
  fsck: check tag objects' headers
  Make sure fsck_commit_buffer() does not run out of the buffer
  fsck_object(): allow passing object data separately from the object itself
  Refactor type_from_string() to allow continuing after detecting an error
2014-09-26 14:39:43 -07:00
Junio C Hamano bd656f6e7b Merge branch 'jk/index-pack-threading-races'
When receiving an invalid pack stream that records the same object
twice, multiple threads got confused due to a race.  We should
reject or correct such a stream upon receiving, but that will be a
larger change.

* jk/index-pack-threading-races:
  index-pack: fix race condition with duplicate bases
2014-09-19 11:38:34 -07:00
René Scharfe 2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Johannes Schindelin 90a398bbd7 fsck_object(): allow passing object data separately from the object itself
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 13:54:21 -07:00
Jeff King ab791dd138 index-pack: fix race condition with duplicate bases
When we are resolving deltas in an indexed pack, we do it by
first selecting a potential base (either one stored in full
in the pack, or one created by resolving another delta), and
then resolving any deltas that use that base.  When we
resolve a particular delta, we flip its "real_type" field
from OBJ_{REF,OFS}_DELTA to whatever the real type is.

We assume that traversing the objects this way will visit
each delta only once. This is correct for most packs; we
visit the delta only when we process its base, and each
object (and thus each base) appears only once. However, if a
base object appears multiple times in the pack, we will try
to resolve any deltas based on it once for each instance.

We can detect this case by noting that a delta we are about
to resolve has already had its real_type field flipped, and
we already do so with an assert().  However, if multiple
threads are in use, we may race with another thread on
comparing and flipping the field. We need to synchronize the
access.

The right mechanism for doing this is a compare-and-swap (we
atomically "claim" the delta for our own and find out
whether our claim was successful). We can implement this
in C by using a pthread mutex to protect the operation. This
is not the fastest way of doing a compare-and-swap; many
processors provide instructions for this, and gcc and other
compilers provide builtins to access them. However, some
experiments showed that lock contention does not cause a
significant slowdown here. Adding c-a-s support for many
compilers would increase the maintenance burden (and we
would still end up including the pthread version as a
fallback).

Note that we only need to touch the OBJ_REF_DELTA codepath
here. An OBJ_OFS_DELTA object points to its base using an
offset, and therefore has only one base, even if another
copy of that base object appears in the pack (we do still
touch it briefly because the setting of real_type is
factored out of resolve_data).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 14:50:43 -07:00
Junio C Hamano 9ab0882255 Merge branch 'maint'
* maint:
  use xmemdupz() to allocate copies of strings given by start and length
  use xcalloc() to allocate zero-initialized memory
2014-07-21 12:35:39 -07:00
René Scharfe 51a60f5bfb use xcalloc() to allocate zero-initialized memory
Use xcalloc() instead of xmalloc() followed by memset() to allocate
and zero out memory because it's shorter and avoids duplicating the
function parameters.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-21 10:30:21 -07:00
Junio C Hamano 6e4094731a Merge branch 'jk/strip-suffix'
* jk/strip-suffix:
  prepare_packed_git_one: refactor duplicate-pack check
  verify-pack: use strbuf_strip_suffix
  strbuf: implement strbuf_strip_suffix
  index-pack: use strip_suffix to avoid magic numbers
  use strip_suffix instead of ends_with in simple cases
  replace has_extension with ends_with
  implement ends_with via strip_suffix
  add strip_suffix function
  sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-07-16 11:26:00 -07:00
Junio C Hamano 5c18fde0d9 Merge branch 'jk/commit-buffer-length' into maint
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed.  The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-16 11:16:38 -07:00
Junio C Hamano 8061ae8b46 Merge branch 'jk/commit-buffer-length'
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths.  Use this to optimize the code paths to
validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02 12:53:02 -07:00
Jeff King 592ce20820 index-pack: use strip_suffix to avoid magic numbers
We also switch to using strbufs, which lets us avoid the
potentially dangerous combination of a manual malloc
followed by a strcpy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:17 -07:00
Jeff King 2975c770ca replace has_extension with ends_with
These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.

This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:16 -07:00
Junio C Hamano 1881d2b88c Merge branch 'ym/fix-opportunistic-index-update-race' into maint
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running.  Detect such a race and avoid overwriting the index.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-25 11:49:48 -07:00
Junio C Hamano 182c3d69e4 Merge branch 'jk/index-pack-report-missing' into maint
The error reporting from "git index-pack" has been improved to
distinguish missing objects from type errors.

* jk/index-pack-report-missing:
  index-pack: distinguish missing objects from type errors
2014-06-25 11:48:14 -07:00
Junio C Hamano a9041df7ab Merge branch 'nd/index-pack-one-fd-per-thread' into maint
We used to disable threaded "git index-pack" on platforms without
thread-safe pread(); use a different workaround for such
platforms to allow threaded "git index-pack".

* nd/index-pack-one-fd-per-thread:
  index-pack: work around thread-unsafe pread()
2014-06-25 11:47:58 -07:00
Jeff King 8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King 0fb370da9c provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it.  But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual).  In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.

Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:07:47 -07:00
Junio C Hamano a0460132a7 Merge branch 'jk/index-pack-report-missing'
* jk/index-pack-report-missing:
  index-pack: distinguish missing objects from type errors
2014-06-06 11:28:13 -07:00
Junio C Hamano 53f52cd92a Merge branch 'nd/index-pack-one-fd-per-thread'
Enable threaded index-pack on platforms without thread-unsafe
pread() emulation.

* nd/index-pack-one-fd-per-thread:
  index-pack: work around thread-unsafe pread()
2014-06-03 12:06:42 -07:00
Junio C Hamano 9af098c29b Merge branch 'ym/fix-opportunistic-index-update-race'
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.

Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath.  While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file.  So in that sense that
can be done as a less urgent follow-up for this topic.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-03 12:06:41 -07:00
Jeff King 77583e7739 index-pack: distinguish missing objects from type errors
When we fetch a pack that does not contain an object we
expected to receive, we get an error like:

  $ git init --bare tmp.git && cd tmp.git
  $ git fetch ../parent.git
  [...]
  error: Could not read 964953ec7bcc0245cb1d0db4095455edd21a2f2e
  fatal: Failed to traverse parents of commit b8247b40caf6704fe52736cdece6d6aae87471aa
  error: ../parent.git did not send all necessary objects

This comes from the check_everything_connected rev-list. If
we try cloning the same repo (rather than a fetch), we end
up using index-pack's --check-self-contained-and-connected
option instead, which produces output like:

  $ git clone --no-local --bare parent.git tmp.git
  [...]
  fatal: object of unexpected type
  fatal: index-pack failed

Not only is the sha1 missing, but it's a misleading message.
There's no type problem, but rather a missing object
problem; we don't notice the difference because we simply
compare OBJ_BAD != OBJ_BLOB.  Let's provide a different
message for this case:

  $ git clone --no-local --bare parent.git tmp.git
  fatal: did not receive expected object 6b00a8c61ed379d5f925a72c1987c9c52129d364
  fatal: index-pack failed

While we're at it, let's also improve a true type mismatch
error to look like

  fatal: object 6b00a8c61ed379d5f925a72c1987c9c52129d364: expected type blob, got tree

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-12 11:27:50 -07:00
Nguyễn Thái Ngọc Duy 39539495ac index-pack: work around thread-unsafe pread()
Multi-threaing of index-pack was disabled with c0f8654
(index-pack: Disable threading on cygwin - 2012-06-26), because
pread() implementations for Cygwin and MSYS were not thread
safe.  Recent Cygwin does offer usable pread() and we enabled
multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread,
2013-07-19).

Work around this problem on platforms with a thread-unsafe
pread() emulation by opening one file handle per thread; it
would prevent parallel pread() on different file handles from
stepping on each other.

Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654
because it's no longer used anywhere.

This workaround is unconditional, even for platforms with
thread-safe pread() because the overhead is small (a couple file
handles more) and not worth fragmenting the code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Tested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-16 09:29:41 -07:00
Yiannis Marangos 9aa91af036 wrapper.c: add xpread() similar to xread()
It is a common mistake to call read(2)/pread(2) and forget to
anticipate that they may return error with EAGAIN/EINTR when the
system call is interrupted.

We have xread() helper to relieve callers of read(2) from having to
worry about it; add xpread() helper to do the same for pread(2).

Update the caller in the builtin/index-pack.c and the mmap emulation
in compat/.

Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10 12:18:55 -07:00
Junio C Hamano 3dd108348f Merge branch 'nd/index-pack-error-message' into maint
* nd/index-pack-error-message:
  index-pack: report error using the correct variable
2014-04-03 13:39:04 -07:00
Junio C Hamano 0e8c09263e Merge branch 'nd/index-pack-error-message'
* nd/index-pack-error-message:
  index-pack: report error using the correct variable
2014-03-25 11:08:19 -07:00
Junio C Hamano de983a0a18 index-pack: report error using the correct variable
We feed a string pointer that is potentially NULL to die() when
showing the message.  Don't.

Noticed-by: Nguyễn Thái Ngọc Duy  <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-17 15:08:36 -07:00
Michael Haggerty afc711b8e1 rename read_replace_refs to check_replace_refs
The semantics of this flag was changed in commit

    e1111cef23 inline lookup_replace_object() calls

but wasn't renamed at the time to minimize code churn.  Rename it now,
and add a comment explaining its use.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20 14:16:55 -08:00
Christian Couder 5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano b8f23112f0 Merge branch 'jk/free-tree-buffer'
* jk/free-tree-buffer:
  clear parsed flag when we free tree buffers
2013-09-17 11:37:33 -07:00
Jeff King 6e454b9a31 clear parsed flag when we free tree buffers
Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually unsets the "parsed" flag back.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.

It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.

In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:

  1. The fsck code relies on the parsed flag to know that we
     were able to parse the object at one point. We can
     switch this to using a flag in the "flags" field.

  2. The index-pack code sets the buffer to NULL but does
     not free it (it is freed by a caller). We should still
     unset the parsed flag here, but we cannot use our
     helper, as we do not want to free the buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-06 10:29:12 -07:00
Nguyễn Thái Ngọc Duy c6807a40dc clone: open a shortcut for connectivity check
In order to make sure the cloned repository is good, we run "rev-list
--objects --not --all $new_refs" on the repository. This is expensive
on large repositories. This patch attempts to mitigate the impact in
this special case.

In the "good" clone case, we only have one pack. If all of the
following are met, we can be sure that all objects reachable from the
new refs exist, which is the intention of running "rev-list ...":

 - all refs point to an object in the pack
 - there are no dangling pointers in any object in the pack
 - no objects in the pack point to objects outside the pack

The second and third checks can be done with the help of index-pack as
a slight variation of --strict check (which introduces a new condition
for the shortcut: pack transfer must be used and the number of objects
large enough to call index-pack). The first is checked in
check_everything_connected after we get an "ok" from index-pack.

"index-pack + new checks" is still faster than the current "index-pack
+ rev-list", which is the whole point of this patch. If any of the
conditions fail, we fall back to the good old but expensive "rev-list
..". In that case it's even more expensive because we have to pay for
the new checks in index-pack. But that should only happen when the
other side is either buggy or malicious.

Cloning linux-2.6 over file://

        before         after
real    3m25.693s      2m53.050s
user    5m2.037s       4m42.396s
sys     0m13.750s      0m16.574s

A more realistic test with ssh:// over wireless

        before         after
real    11m26.629s     10m4.213s
user    5m43.196s      5m19.444s
sys     0m35.812s      0m37.630s

This shortcut is not applied to shallow clones, partly because shallow
clones should have no more objects than a usual fetch and the cost of
rev-list is acceptable, partly to avoid dealing with corner cases when
grafting is involved.

This shortcut does not apply to unpack-objects code path either
because the number of objects must be small in order to trigger that
code path.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 08:07:20 -07:00
Nguyễn Thái Ngọc Duy 920734b069 index-pack: remove dead code (it should never happen)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 08:07:03 -07:00
Junio C Hamano 1d066c58ee Merge branch 'nd/index-pack-threaded-fixes'
"index-pack --verify-stat" used a few counters outside protection
of mutex, possibly showing incorrect numbers.

* nd/index-pack-threaded-fixes:
  index-pack: guard nr_resolved_deltas reads by lock
  index-pack: protect deepest_delta in multithread code
2013-04-01 09:06:23 -07:00
Junio C Hamano ed8852c286 Merge branch 'jk/index-pack-correct-depth-fix'
"index-pack --fix-thin" used uninitialize value to compute delta
depths of objects it appends to the resulting pack.

* jk/index-pack-correct-depth-fix:
  index-pack: always zero-initialize object_entry list
2013-04-01 09:06:19 -07:00
Jeff King 57165db003 index-pack: always zero-initialize object_entry list
Commit 38a4556 (index-pack: start learning to emulate
"verify-pack -v", 2011-06-03) added a "delta_depth" counter
to each "struct object_entry". Initially, all object entries
have their depth set to 0; in resolve_delta, we then set the
depth of each delta to "base + 1". Base entries never have
their depth touched, and remain at 0.

To ensure that all depths start at 0, that commit changed
calls to xmalloc the object_entry list into calls to
xcalloc.  However, it forgot that we grow the list with
xrealloc later. These extra entries are used when we add an
object from elsewhere to complete a thin pack. If we add a
non-delta object, its depth value will just be uninitialized
heap data.

This patch fixes it by zero-initializing entries we add to
the objects list via the xrealloc.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-20 12:53:26 -07:00
Thomas Rast 8f82aad4e7 index-pack: guard nr_resolved_deltas reads by lock
The threaded parts of index-pack increment the number of resolved
deltas in nr_resolved_deltas guarded by counter_mutex.  However, the
per-thread outer loop accessed nr_resolved_deltas without any locks.

This is not wrong as such, since it doesn't matter all that much
whether we get an outdated value.  However, unless someone proves that
this one lock makes all the performance difference, it would be much
cleaner to guard _all_ accesses to the variable with the lock.

The only such use is display_progress() in the threaded section (all
others are in the conclude_pack() callchain outside the threaded
part).  To make it obvious that it cannot deadlock, move it out of
work_mutex.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19 08:40:47 -07:00
Nguyễn Thái Ngọc Duy 3aba2fddcb index-pack: protect deepest_delta in multithread code
deepest_delta is a global variable but is updated without protection
in resolve_delta(), a multithreaded function. Add a new mutex for it,
but only protect and update when it's actually used (i.e. show_stat is
non-zero).

Another variable that will not be updated is delta_depth in "struct
object_entry" as it's only useful when show_stat is 1. Putting it in
"if (show_stat)" makes it clearer.

The local variable "stat" is renamed to "show_stat" after moving to
global scope because the name "stat" conflicts with stat(2) syscall.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19 08:40:19 -07:00
Nguyễn Thái Ngọc Duy 5c3459fc61 index-pack: fix buffer overflow caused by translations
The translation of "completed with %d local objects" is put in a
48-byte buffer, which may be enough for English but not true for any
translations. Convert it to use strbuf (i.e. no hard limit on
translation length).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-16 22:08:53 -07:00
Nguyễn Thái Ngọc Duy f350df429f i18n: mark more index-pack strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-31 13:05:05 -07:00
Junio C Hamano 37b92a9a2e Merge branch 'jk/index-pack-streaming-fix'
The streaming index-pack introduced in 1.7.11 had a data corruption
bug, and this should fix it.

* jk/index-pack-streaming-fix:
  index-pack: loop while inflating objects in unpack_data
2012-07-15 21:40:07 -07:00
Jeff King f8b090386b index-pack: loop while inflating objects in unpack_data
When the unpack_data function is given a consume() callback,
it unpacks only 64K of the input at a time, feeding it to
git_inflate along with a 64K output buffer.  However,
because we are inflating, there is a good chance that the
output buffer will fill before consuming all of the input.
In this case, we need to loop on git_inflate until we have
fed the whole input buffer, feeding each chunk of output to
the consume buffer.

The current code does not do this, and as a result, will
fail the loop condition and trigger a fatal "serious inflate
inconsistency" error in this case.

While we're rearranging the loop, let's get rid of the
extra last_out pointer. It is meant to point to the
beginning of the buffer that we feed to git_inflate, but in
practice this is always the beginning of our same 64K
buffer, because:

  1. At the beginning of the loop, we are feeding the
     buffer.

  2. At the end of the loop, if we are using a consume()
     function, we reset git_inflate's pointer to the
     beginning of the buffer.  If we are not using a
     consume() function, then we do not care about the value
     of last_out at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-10 16:45:00 -07:00
Junio C Hamano c592023aed Merge branch 'rj/platform-pread-may-be-thread-unsafe'
On Cygwin, the platform pread(3) is not thread safe, just like our
own compat/ emulation, and cannot be used in the index-pack program.

* rj/platform-pread-may-be-thread-unsafe:
  index-pack: Disable threading on cygwin
2012-07-09 09:00:45 -07:00
Junio C Hamano c0f86547c5 index-pack: Disable threading on cygwin
The Cygwin implementation of pread() is not thread-safe since, just
like the emulation provided by compat/pread.c, it uses a sequence of
seek-read-seek calls. In order to avoid failues due to thread-safety
issues, commit b038a61 disables threading when NO_PREAD is defined.
(ie when using the emulation code in compat/pread.c).

We introduce a new build variable, NO_THREAD_SAFE_PREAD, which allows
use to disable the threaded index-pack code on cygwin, in addition to
the above NO_PREAD case.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-26 14:23:03 -07:00
Nguyễn Thái Ngọc Duy 4614043c8f index-pack: use streaming interface for collision test on large blobs
When putting whole objects in core is unavoidable, try match object
type and size first before actually inflating.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24 10:37:48 -07:00
Nguyễn Thái Ngọc Duy 8a2e163ccd index-pack: factor out unpack core from get_data_from_pack
This allows caller to consume large inflated object with a fixed
amount of memory.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:08:55 -07:00
Nguyễn Thái Ngọc Duy 9ec2dde9f3 index-pack: use streaming interface on large blobs (most of the time)
unpack_raw_entry() will not allocate and return decompressed blobs if
they are larger than core.bigFileThreshold. sha1_object() may not be
called on those objects because there's no actual content.

sha1_object() is called later on those objects, where we can safely
use get_data_from_pack() to retrieve blob content for checking.
However we always do that when we definitely need the blob
content. And we often don't.

There are two cases when we may need object content. The first case is
when we find an in-repo blob with the same SHA-1. We need to do
collision test, byte-on-byte. If this test is on, the blob must be
loaded on memory (i.e. no streaming). Normally (e.g. in
fetch/pull/clone) this does not happen because git avoid to send
objects that client already has.

The other case is when --strict is specified and the object in
question is not a blob, which can't happen in reality becase we deal
with large _blobs_ here.

Note: --verify (or git-verify-pack) a pack from current repository
will trigger collision test on every object in the pack, which
effectively disables this patch. This could be easily worked around by
setting GIT_DIR to an imaginary place with no packs.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:08:54 -07:00
Nguyễn Thái Ngọc Duy 681b07de11 index-pack: hash non-delta objects while reading from stream
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:08:54 -07:00
Junio C Hamano cc13431a49 Merge branch 'nd/threaded-index-pack'
Enables threading in index-pack to resolve base data in parallel.

By Nguyễn Thái Ngọc Duy (3) and Ramsay Jones (1)
* nd/threaded-index-pack:
  index-pack: disable threading if NO_PREAD is defined
  index-pack: support multithreaded delta resolving
  index-pack: restructure pack processing into three main functions
  compat/win32/pthread.h: Add an pthread_key_delete() implementation
2012-05-14 11:50:40 -07:00
Nguyễn Thái Ngọc Duy b038a61020 index-pack: disable threading if NO_PREAD is defined
NO_PREAD simulates pread() as a sequence of seek, read, seek in
compat/pread.c. The simulation is not thread-safe because another
thread could move the file offset away in the middle of pread
operation. Do not allow threading in that case.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07 15:48:19 -07:00
Nguyễn Thái Ngọc Duy b8a2486f15 index-pack: support multithreaded delta resolving
This puts delta resolving on each base on a separate thread, one base
cache per thread. Per-thread data is grouped in struct thread_local.
When running with nr_threads == 1, no pthreads calls are made. The
system essentially runs in non-thread mode.

An experiment on a Xeon 24 core machine with git.git shows that
performance does not increase proportional to the number of cores. So
by default, we use maximum 3 cores. Some numbers with --threads from 1
to 16:

1..4
real    0m8.003s  0m5.307s  0m4.321s  0m3.830s
user    0m7.720s  0m8.009s  0m8.133s  0m8.305s
sys     0m0.224s  0m0.372s  0m0.360s  0m0.360s

5..8
real    0m3.727s  0m3.604s  0m3.332s  0m3.369s
user    0m9.361s  0m9.817s  0m9.525s  0m9.769s
sys     0m0.584s  0m0.624s  0m0.540s  0m0.560s

9..12
real    0m3.036s  0m3.139s  0m3.177s  0m2.961s
user    0m8.977s  0m10.205s 0m9.737s  0m10.073s
sys     0m0.596s  0m0.680s  0m0.684s  0m0.680s

13..16
real    0m2.985s  0m2.894s  0m2.975s  0m2.971s
user    0m9.825s  0m10.573s 0m10.833s 0m11.361s
sys     0m0.788s  0m0.732s  0m0.904s  0m1.016s

On an Intel dual core and linux-2.6.git

1..4
real    2m37.789s 2m7.963s  2m0.920s  1m58.213s
user    2m28.415s 2m52.325s 2m50.176s 2m41.187s
sys     0m7.808s  0m11.181s 0m11.224s 0m10.731s

Thanks Ramsay Jones for troubleshooting and support on MinGW platform.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07 15:48:15 -07:00
Nguyễn Thái Ngọc Duy 5272f75587 index-pack: restructure pack processing into three main functions
The second pass in parse_pack_objects() are split into
resolve_deltas(). The final phase, fixing thin pack or just seal the
pack, is now in conclude_pack() function. Main pack processing is now
a sequence of these functions:

 - parse_pack_objects() reads through the input pack
 - resolve_deltas()     makes sure all deltas can be resolved
 - conclude_pack()      seals the output pack
 - write_idx_file()     writes companion index file
 - final()              moves the pack/index to proper place

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-07 15:45:03 -07:00
Nguyễn Thái Ngọc Duy c2b97ecf51 i18n: index-pack: mark strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-24 14:55:48 -07:00
Nguyễn Thái Ngọc Duy 20e95d0a84 index-pack: eliminate unlimited recursion in get_base_data()
Revese the order of delta applying so that by the time a delta is
applied, its base is either non-delta or already inflated.
get_base_data() is still recursive, but because base's data is always
ready, the inner get_base_data() call never has any chance to call
itself again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16 14:34:33 -08:00
Nguyễn Thái Ngọc Duy 2baad22013 index-pack: eliminate recursion in find_unresolved_deltas
Current find_unresolved_deltas() links all bases together in a form of
tree, using struct base_data, with prev_base pointer to point to
parent node. Then it traverses down from parent to children in
recursive manner with all base_data allocated on stack.

To eliminate recursion, we simply need to put all on heap
(parse_pack_objects and fix_unresolved_deltas). After that, it's
simple non-recursive depth-first traversal loop. Each node also
maintains its own state (ofs and ref indices) to iterate over all
children nodes.

So we process one node:

 - if it returns a new (child) node (a parent base), we link it to our
   tree, then process the new node.

 - if it returns nothing, the node is done, free it. We go back to
   parent node and resume whatever it's doing.

and do it until we have no nodes to process.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-16 14:28:27 -08:00
Ævar Arnfjörð Bjarmason ab1900a36e Appease Sun Studio by renaming "tmpfile"
On Solaris the system headers define the "tmpfile" name, which'll
cause Git compiled with Sun Studio 12 Update 1 to whine about us
redefining the name:

    "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
    "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
    "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
    "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)

Just renaming the "tmpfile" variable to "tmp_file" in the relevant
places is the easiest way to fix this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21 10:21:04 -08:00
Junio C Hamano 68be2fea50 receive-pack, fetch-pack: reject bogus pack that records objects twice
When receive-pack & fetch-pack are run and store the pack obtained over
the wire to a local repository, they internally run the index-pack command
with the --strict option. Make sure that we reject incoming packfile that
records objects twice to avoid spreading such a damage.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-16 22:05:21 -08:00
Junio C Hamano d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Junio C Hamano eb4f4076aa Merge branch 'jc/zlib-wrap'
* jc/zlib-wrap:
  zlib: allow feeding more than 4GB in one go
  zlib: zlib can only process 4GB at a time
  zlib: wrap deflateBound() too
  zlib: wrap deflate side of the API
  zlib: wrap inflateInit2 used to accept only for gzip format
  zlib: wrap remaining calls to direct inflate/inflateEnd
  zlib wrapper: refactor error message formatter

Conflicts:
	sha1_file.c
2011-07-19 09:33:04 -07:00
Junio C Hamano ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano 55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Junio C Hamano d1a0ed187c index-pack: show histogram when emulating "verify-pack -v"
The histogram produced by "verify-pack -v" always had an artificial
limit of 50, but index-pack knows what the maximum delta depth is, so
we do not have to limit it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 22:45:37 -07:00