From 0e8189e2708bc1da08c77c7e1d960f420b6890a5 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Fri, 31 Oct 2008 11:31:08 -0400 Subject: [PATCH 01/11] close another possibility for propagating pack corruption Abstract -------- With index v2 we have a per object CRC to allow quick and safe reuse of pack data when repacking. This, however, doesn't currently prevent a stealth corruption from being propagated into a new pack when _not_ reusing pack data as demonstrated by the modification to t5302 included here. The Context ----------- The Git database is all checksummed with SHA1 hashes. Any kind of corruption can be confirmed by verifying this per object hash against corresponding data. However this can be costly to perform systematically and therefore this check is often not performed at run time when accessing the object database. First, the loose object format is entirely compressed with zlib which already provide a CRC verification of its own when inflating data. Any disk corruption would be caught already in this case. Then, packed objects are also compressed with zlib but only for their actual payload. The object headers and delta base references are not deflated for obvious performance reasons, however this leave them vulnerable to potentially undetected disk corruptions. Object types are often validated against the expected type when they're requested, and deflated size must always match the size recorded in the object header, so those cases are pretty much covered as well. Where corruptions could go unnoticed is in the delta base reference. Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the reference is a pack offset which would have to match the start boundary of a different base object but still with the same size, and although this is relatively much more "probable" than in the OBJ_REF_DELTA case, the probability is also about zero in absolute terms. Still, the possibility exists as demonstrated in t5302 and is certainly greater than a SHA1 collision, especially in the OBJ_OFS_DELTA case which is now the default when repacking. Again, repacking by reusing existing pack data is OK since the per object CRC provided by index v2 guards against any such corruptions. What t5302 failed to test is a full repack in such case. The Solution ------------ As unlikely as this kind of stealth corruption can be in practice, it certainly isn't acceptable to propagate it into a freshly created pack. But, because this is so unlikely, we don't want to pay the run time cost associated with extra validation checks all the time either. Furthermore, consequences of such corruption in anything but repacking should be rather visible, and even if it could be quite unpleasant, it still has far less severe consequences than actively creating bad packs. So the best compromize is to check packed object CRC when unpacking objects, and only during the compression/writing phase of a repack, and only when not streaming the result. The cost of this is minimal (less than 1% CPU time), and visible only with a full repack. Someone with a stats background could provide an objective evaluation of this, but I suspect that it's bad RAM that has more potential for data corruptions at this point, even in those cases where this extra check is not performed. Still, it is best to prevent a known hole for corruption when recreating object data into a new pack. What about the streamed pack case? Well, any client receiving a pack must always consider that pack as untrusty and perform full validation anyway, hence no such stealth corruption could be propagated to remote repositoryes already. It is therefore worthless doing local validation in that case. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 10 ++++++++++ cache.h | 3 +++ sha1_file.c | 15 +++++++++++++++ t/t5302-pack-index.sh | 3 ++- 4 files changed, 30 insertions(+), 1 deletion(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 15b80db5a1..1591417962 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1698,6 +1698,16 @@ static void prepare_pack(int window, int depth) get_object_details(); + /* + * If we're locally repacking then we need to be doubly careful + * from now on in order to make sure no stealth corruption gets + * propagated to the new pack. Clients receiving streamed packs + * should validate everything they get anyway so no need to incur + * the additional cost here in that case. + */ + if (!pack_to_stdout) + do_check_packed_object_crc = 1; + if (!nr_objects || !window || !depth) return; diff --git a/cache.h b/cache.h index b0edbf9b9f..6b18fab3c1 100644 --- a/cache.h +++ b/cache.h @@ -565,6 +565,9 @@ extern int force_object_loose(const unsigned char *sha1, time_t mtime); /* just like read_sha1_file(), but non fatal in presence of bad objects */ extern void *read_object(const unsigned char *sha1, enum object_type *type, unsigned long *size); +/* global flag to enable extra checks when accessing packed objects */ +extern int do_check_packed_object_crc; + extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type); extern int move_temp_to_file(const char *tmpfile, const char *filename); diff --git a/sha1_file.c b/sha1_file.c index ab2b520f03..88d9cf357f 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1694,6 +1694,8 @@ static void *unpack_delta_entry(struct packed_git *p, return result; } +int do_check_packed_object_crc; + void *unpack_entry(struct packed_git *p, off_t obj_offset, enum object_type *type, unsigned long *sizep) { @@ -1701,6 +1703,19 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset, off_t curpos = obj_offset; void *data; + if (do_check_packed_object_crc && p->index_version > 1) { + struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); + unsigned long len = revidx[1].offset - obj_offset; + if (check_pack_crc(p, &w_curs, obj_offset, len, revidx->nr)) { + const unsigned char *sha1 = + nth_packed_object_sha1(p, revidx->nr); + error("bad packed object CRC for %s", + sha1_to_hex(sha1)); + mark_bad_packed_object(p, sha1); + return NULL; + } + } + *type = unpack_object_header(p, &w_curs, &curpos, sizep); switch (*type) { case OBJ_OFS_DELTA: diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh index 344ab25b8b..29896141b9 100755 --- a/t/t5302-pack-index.sh +++ b/t/t5302-pack-index.sh @@ -160,7 +160,8 @@ test_expect_success \ test_expect_success \ '[index v2] 5) pack-objects refuses to reuse corrupted data' \ - 'test_must_fail git pack-objects test-5 Date: Wed, 29 Oct 2008 19:02:45 -0400 Subject: [PATCH 02/11] better validation on delta base object offsets In one case, it was possible to have a bad offset equal to 0 effectively pointing a delta onto itself and crashing git after too many recursions. In the other cases, a negative offset could result due to off_t being signed. Catch those. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 4 ++-- builtin-unpack-objects.c | 2 ++ index-pack.c | 2 +- sha1_file.c | 2 +- 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 1591417962..cc1e47f41a 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1038,10 +1038,10 @@ static void check_object(struct object_entry *entry) c = buf[used_0++]; ofs = (ofs << 7) + (c & 127); } - if (ofs >= entry->in_pack_offset) + ofs = entry->in_pack_offset - ofs; + if (ofs <= 0 || ofs >= entry->in_pack_offset) die("delta base offset out of bound for %s", sha1_to_hex(entry->idx.sha1)); - ofs = entry->in_pack_offset - ofs; if (reuse_delta && !entry->preferred_base) { struct revindex_entry *revidx; revidx = find_pack_revindex(p, ofs); diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c index 9f4bdd3296..47ed610677 100644 --- a/builtin-unpack-objects.c +++ b/builtin-unpack-objects.c @@ -370,6 +370,8 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size, base_offset = (base_offset << 7) + (c & 127); } base_offset = obj_list[nr].offset - base_offset; + if (base_offset <= 0 || base_offset >= obj_list[nr].offset) + die("offset value out of bound for delta base object"); delta_data = get_data(delta_size); if (dry_run || !delta_data) { diff --git a/index-pack.c b/index-pack.c index 6f89bb9ac7..da03eeeca1 100644 --- a/index-pack.c +++ b/index-pack.c @@ -334,7 +334,7 @@ static void *unpack_raw_entry(struct object_entry *obj, union delta_base *delta_ base_offset = (base_offset << 7) + (c & 127); } delta_base->offset = obj->idx.offset - base_offset; - if (delta_base->offset >= obj->idx.offset) + if (delta_base->offset <= 0 || delta_base->offset >= obj->idx.offset) bad_object(obj->idx.offset, "delta base offset is out of bound"); break; case OBJ_COMMIT: diff --git a/sha1_file.c b/sha1_file.c index 88d9cf357f..e57949b415 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1355,7 +1355,7 @@ static off_t get_delta_base(struct packed_git *p, base_offset = (base_offset << 7) + (c & 127); } base_offset = delta_obj_offset - base_offset; - if (base_offset >= delta_obj_offset) + if (base_offset <= 0 || base_offset >= delta_obj_offset) return 0; /* out of bound */ *curpos += used; } else if (type == OBJ_REF_DELTA) { From 09ded04b7e1f0096bb2fe356b2f5a298296151dd Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:46 -0400 Subject: [PATCH 03/11] make unpack_object_header() non fatal It is possible to have pack corruption in the object header. Currently unpack_object_header() simply die() on them instead of letting the caller deal with that gracefully. So let's have unpack_object_header() return an error instead, and find a better name for unpack_object_header_gently() in that context. All callers of unpack_object_header() are ready for it. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 2 +- cache.h | 2 +- sha1_file.c | 20 +++++++++++--------- 3 files changed, 13 insertions(+), 11 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index cc1e47f41a..64aefdf23b 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1002,7 +1002,7 @@ static void check_object(struct object_entry *entry) * We want in_pack_type even if we do not reuse delta * since non-delta representations could still be reused. */ - used = unpack_object_header_gently(buf, avail, + used = unpack_object_header_buffer(buf, avail, &entry->in_pack_type, &entry->size); diff --git a/cache.h b/cache.h index 6b18fab3c1..c440598e27 100644 --- a/cache.h +++ b/cache.h @@ -754,7 +754,7 @@ extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t); extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *); extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); -extern unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); +extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); extern const char *packed_object_info_detail(struct packed_git *, off_t, unsigned long *, unsigned long *, unsigned int *, unsigned char *); extern int matches_pack_name(struct packed_git *p, const char *name); diff --git a/sha1_file.c b/sha1_file.c index e57949b415..7698177488 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1110,7 +1110,8 @@ static int legacy_loose_object(unsigned char *map) return 0; } -unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep) +unsigned long unpack_object_header_buffer(const unsigned char *buf, + unsigned long len, enum object_type *type, unsigned long *sizep) { unsigned shift; unsigned char c; @@ -1122,10 +1123,10 @@ unsigned long unpack_object_header_gently(const unsigned char *buf, unsigned lon size = c & 15; shift = 4; while (c & 0x80) { - if (len <= used) - return 0; - if (sizeof(long) * 8 <= shift) + if (len <= used || sizeof(long) * 8 <= shift) { + error("bad object header"); return 0; + } c = buf[used++]; size += (c & 0x7f) << shift; shift += 7; @@ -1164,7 +1165,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon * really worth it and we don't write it any longer. But we * can still read it. */ - used = unpack_object_header_gently(map, mapsize, &type, &size); + used = unpack_object_header_buffer(map, mapsize, &type, &size); if (!used || !valid_loose_object_type[type]) return -1; map += used; @@ -1411,10 +1412,11 @@ static int unpack_object_header(struct packed_git *p, * insane, so we know won't exceed what we have been given. */ base = use_pack(p, w_curs, *curpos, &left); - used = unpack_object_header_gently(base, left, &type, sizep); - if (!used) - die("object offset outside of pack file"); - *curpos += used; + used = unpack_object_header_buffer(base, left, &type, sizep); + if (!used) { + type = OBJ_BAD; + } else + *curpos += used; return type; } From 3d77d8774fc18216f18efa5c5907f73b22d61604 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:47 -0400 Subject: [PATCH 04/11] make packed_object_info() resilient to pack corruptions In the same spirit as commit 8eca0b47ff, let's try to survive a pack corruption by making packed_object_info() able to fall back to alternate packs or loose objects. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- sha1_file.c | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 7698177488..384a43044e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1314,8 +1314,10 @@ unsigned long get_size_from_delta(struct packed_git *p, } while ((st == Z_OK || st == Z_BUF_ERROR) && stream.total_out < sizeof(delta_head)); inflateEnd(&stream); - if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) - die("delta data unpack-initial failed"); + if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) { + error("delta data unpack-initial failed"); + return 0; + } /* Examine the initial part of the delta to figure out * the result size. @@ -1382,15 +1384,29 @@ static int packed_delta_info(struct packed_git *p, off_t base_offset; base_offset = get_delta_base(p, w_curs, &curpos, type, obj_offset); + if (!base_offset) + return OBJ_BAD; type = packed_object_info(p, base_offset, NULL); + if (type <= OBJ_NONE) { + struct revindex_entry *revidx = find_pack_revindex(p, base_offset); + const unsigned char *base_sha1 = + nth_packed_object_sha1(p, revidx->nr); + mark_bad_packed_object(p, base_sha1); + type = sha1_object_info(base_sha1, NULL); + if (type <= OBJ_NONE) + return OBJ_BAD; + } /* We choose to only get the type of the base object and * ignore potentially corrupt pack file that expects the delta * based on a base with a wrong size. This saves tons of * inflate() calls. */ - if (sizep) + if (sizep) { *sizep = get_size_from_delta(p, w_curs, curpos); + if (*sizep == 0) + type = OBJ_BAD; + } return type; } @@ -1500,8 +1516,9 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, *sizep = size; break; default: - die("pack %s contains unknown object type %d", - p->pack_name, type); + error("unknown object type %i at offset %"PRIuMAX" in %s", + type, (uintmax_t)obj_offset, p->pack_name); + type = OBJ_BAD; } unuse_pack(&w_curs); return type; @@ -1971,7 +1988,14 @@ int sha1_object_info(const unsigned char *sha1, unsigned long *sizep) if (!find_pack_entry(sha1, &e, NULL)) return status; } - return packed_object_info(e.p, e.offset, sizep); + + status = packed_object_info(e.p, e.offset, sizep); + if (status < 0) { + mark_bad_packed_object(e.p, sha1); + status = sha1_object_info(sha1, sizep); + } + + return status; } static void *read_packed_sha1(const unsigned char *sha1, From 03d660150cbc80cd7d2ec90c3c4e6ce563295e3a Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:48 -0400 Subject: [PATCH 05/11] make check_object() resilient to pack corruptions The check_object() function tries to get away with the least amount of pack access possible when it already has partial information on given object rather than calling the more costly packed_object_info(). When things don't look right, it should just give up and fall back to packed_object_info() directly instead of die()'ing. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 64aefdf23b..c05fb944b0 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1005,6 +1005,8 @@ static void check_object(struct object_entry *entry) used = unpack_object_header_buffer(buf, avail, &entry->in_pack_type, &entry->size); + if (used == 0) + goto give_up; /* * Determine if this is a delta and if so whether we can @@ -1016,6 +1018,8 @@ static void check_object(struct object_entry *entry) /* Not a delta hence we've already got all we need. */ entry->type = entry->in_pack_type; entry->in_pack_header_size = used; + if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB) + goto give_up; unuse_pack(&w_curs); return; case OBJ_REF_DELTA: @@ -1032,16 +1036,20 @@ static void check_object(struct object_entry *entry) ofs = c & 127; while (c & 128) { ofs += 1; - if (!ofs || MSB(ofs, 7)) - die("delta base offset overflow in pack for %s", - sha1_to_hex(entry->idx.sha1)); + if (!ofs || MSB(ofs, 7)) { + error("delta base offset overflow in pack for %s", + sha1_to_hex(entry->idx.sha1)); + goto give_up; + } c = buf[used_0++]; ofs = (ofs << 7) + (c & 127); } ofs = entry->in_pack_offset - ofs; - if (ofs <= 0 || ofs >= entry->in_pack_offset) - die("delta base offset out of bound for %s", - sha1_to_hex(entry->idx.sha1)); + if (ofs <= 0 || ofs >= entry->in_pack_offset) { + error("delta base offset out of bound for %s", + sha1_to_hex(entry->idx.sha1)); + goto give_up; + } if (reuse_delta && !entry->preferred_base) { struct revindex_entry *revidx; revidx = find_pack_revindex(p, ofs); @@ -1078,6 +1086,8 @@ static void check_object(struct object_entry *entry) */ entry->size = get_size_from_delta(p, &w_curs, entry->in_pack_offset + entry->in_pack_header_size); + if (entry->size == 0) + goto give_up; unuse_pack(&w_curs); return; } @@ -1087,6 +1097,7 @@ static void check_object(struct object_entry *entry) * with sha1_object_info() to find about the object type * at this point... */ + give_up: unuse_pack(&w_curs); } From 08698b1e32bc414f214b7300b40c30a30d9ecd1c Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:49 -0400 Subject: [PATCH 06/11] make find_pack_revindex() aware of the nasty world It currently calls die() whenever given offset is not found thinking that such thing should never happen. But this offset may come from a corrupted pack whych _could_ happen and not be found. Callers should deal with this possibility gracefully instead. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 2 ++ pack-revindex.c | 3 ++- sha1_file.c | 18 ++++++++++++------ 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index c05fb944b0..04a5a55ef9 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1053,6 +1053,8 @@ static void check_object(struct object_entry *entry) if (reuse_delta && !entry->preferred_base) { struct revindex_entry *revidx; revidx = find_pack_revindex(p, ofs); + if (!revidx) + goto give_up; base_ref = nth_packed_object_sha1(p, revidx->nr); } entry->in_pack_header_size = used + used_0; diff --git a/pack-revindex.c b/pack-revindex.c index 6096b6224a..1de53c8934 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -140,7 +140,8 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) else lo = mi + 1; } while (lo < hi); - die("internal error: pack revindex corrupt"); + error("bad offset for revindex"); + return NULL; } void discard_revindex(void) diff --git a/sha1_file.c b/sha1_file.c index 384a43044e..9ce1df0cff 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1388,9 +1388,12 @@ static int packed_delta_info(struct packed_git *p, return OBJ_BAD; type = packed_object_info(p, base_offset, NULL); if (type <= OBJ_NONE) { - struct revindex_entry *revidx = find_pack_revindex(p, base_offset); - const unsigned char *base_sha1 = - nth_packed_object_sha1(p, revidx->nr); + struct revindex_entry *revidx; + const unsigned char *base_sha1; + revidx = find_pack_revindex(p, base_offset); + if (!revidx) + return OBJ_BAD; + base_sha1 = nth_packed_object_sha1(p, revidx->nr); mark_bad_packed_object(p, base_sha1); type = sha1_object_info(base_sha1, NULL); if (type <= OBJ_NONE) @@ -1682,9 +1685,12 @@ static void *unpack_delta_entry(struct packed_git *p, * This is costly but should happen only in the presence * of a corrupted pack, and is better than failing outright. */ - struct revindex_entry *revidx = find_pack_revindex(p, base_offset); - const unsigned char *base_sha1 = - nth_packed_object_sha1(p, revidx->nr); + struct revindex_entry *revidx; + const unsigned char *base_sha1; + revidx = find_pack_revindex(p, base_offset); + if (!revidx) + return NULL; + base_sha1 = nth_packed_object_sha1(p, revidx->nr); error("failed to read delta base object %s" " at offset %"PRIuMAX" from %s", sha1_to_hex(base_sha1), (uintmax_t)base_offset, From 64bd76b1de75483dea646c39c390113ffc821299 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:50 -0400 Subject: [PATCH 07/11] pack-objects: allow "fixing" a corrupted pack without a full repack When the pack data to be reused is found to be bad, let's fall back to full object access through the generic path which has its own strategies to find alternate object sources in that case. This allows for "fixing" a corrupted pack simply by copying either another pack containing the object(s) found to be bad, or the loose object itself, into the object store and launch a repack without the need for -f. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 04a5a55ef9..5be6664c72 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -277,6 +277,7 @@ static unsigned long write_object(struct sha1file *f, */ if (!to_reuse) { + no_reuse: if (!usable_delta) { buf = read_sha1_file(entry->idx.sha1, &type, &size); if (!buf) @@ -358,20 +359,30 @@ static unsigned long write_object(struct sha1file *f, struct revindex_entry *revidx; off_t offset; - if (entry->delta) { + if (entry->delta) type = (allow_ofs_delta && entry->delta->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; - reused_delta++; - } hdrlen = encode_header(type, entry->size, header); + offset = entry->in_pack_offset; revidx = find_pack_revindex(p, offset); datalen = revidx[1].offset - offset; if (!pack_to_stdout && p->index_version > 1 && - check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) - die("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1)); + check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) { + error("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1)); + unuse_pack(&w_curs); + goto no_reuse; + } + offset += entry->in_pack_header_size; datalen -= entry->in_pack_header_size; + if (!pack_to_stdout && p->index_version == 1 && + check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) { + error("corrupt packed object for %s", sha1_to_hex(entry->idx.sha1)); + unuse_pack(&w_curs); + goto no_reuse; + } + if (type == OBJ_OFS_DELTA) { off_t ofs = entry->idx.offset - entry->delta->idx.offset; unsigned pos = sizeof(dheader) - 1; @@ -383,21 +394,19 @@ static unsigned long write_object(struct sha1file *f, sha1write(f, header, hdrlen); sha1write(f, dheader + pos, sizeof(dheader) - pos); hdrlen += sizeof(dheader) - pos; + reused_delta++; } else if (type == OBJ_REF_DELTA) { if (limit && hdrlen + 20 + datalen + 20 >= limit) return 0; sha1write(f, header, hdrlen); sha1write(f, entry->delta->idx.sha1, 20); hdrlen += 20; + reused_delta++; } else { if (limit && hdrlen + datalen + 20 >= limit) return 0; sha1write(f, header, hdrlen); } - - if (!pack_to_stdout && p->index_version == 1 && - check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) - die("corrupt packed object for %s", sha1_to_hex(entry->idx.sha1)); copy_pack_data(f, p, &w_curs, offset, datalen); unuse_pack(&w_curs); reused++; @@ -1074,6 +1083,7 @@ static void check_object(struct object_entry *entry) */ entry->type = entry->in_pack_type; entry->delta = base_entry; + entry->delta_size = entry->size; entry->delta_sibling = base_entry->delta_child; base_entry->delta_child = entry; unuse_pack(&w_curs); From 538cf6b6e5f663eea9b777ef12ae58eb0c919480 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:51 -0400 Subject: [PATCH 08/11] extend test coverage for latest pack corruption resilience improvements Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- t/t5303-pack-corruption-resilience.sh | 96 +++++++++++++++++++++++++-- 1 file changed, 89 insertions(+), 7 deletions(-) diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh index 31b20b21d2..ac181ea38a 100755 --- a/t/t5303-pack-corruption-resilience.sh +++ b/t/t5303-pack-corruption-resilience.sh @@ -41,11 +41,17 @@ create_new_pack() { git verify-pack -v ${pack}.pack } +do_repack() { + pack=`printf "$blob_1\n$blob_2\n$blob_3\n" | + git pack-objects $@ .git/objects/pack/pack` && + pack=".git/objects/pack/pack-${pack}" +} + do_corrupt_object() { ofs=`git show-index < ${pack}.idx | grep $1 | cut -f1 -d" "` && ofs=$(($ofs + $2)) && chmod +w ${pack}.pack && - dd if=/dev/zero of=${pack}.pack count=1 bs=1 conv=notrunc seek=$ofs && + dd of=${pack}.pack count=1 bs=1 conv=notrunc seek=$ofs && test_must_fail git verify-pack ${pack}.pack } @@ -60,7 +66,7 @@ test_expect_success \ test_expect_success \ 'create corruption in header of first object' \ - 'do_corrupt_object $blob_1 0 && + 'do_corrupt_object $blob_1 0 < /dev/zero && test_must_fail git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' @@ -119,7 +125,7 @@ test_expect_success \ 'create corruption in header of first delta' \ 'create_new_pack && git prune-packed && - do_corrupt_object $blob_2 0 && + do_corrupt_object $blob_2 0 < /dev/zero && git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' @@ -133,6 +139,15 @@ test_expect_success \ git cat-file blob $blob_2 > /dev/null && git cat-file blob $blob_3 > /dev/null' +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + test_expect_success \ 'create corruption in data of first delta' \ 'create_new_pack && @@ -152,11 +167,20 @@ test_expect_success \ git cat-file blob $blob_2 > /dev/null && git cat-file blob $blob_3 > /dev/null' +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + test_expect_success \ 'corruption in delta base reference of first delta (OBJ_REF_DELTA)' \ 'create_new_pack && git prune-packed && - do_corrupt_object $blob_2 2 && + do_corrupt_object $blob_2 2 < /dev/zero && git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' @@ -171,17 +195,75 @@ test_expect_success \ git cat-file blob $blob_3 > /dev/null' test_expect_success \ - 'corruption in delta base reference of first delta (OBJ_OFS_DELTA)' \ + '... and then a repack "clears" the corruption' \ + 'do_repack && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + 'corruption #0 in delta base reference of first delta (OBJ_OFS_DELTA)' \ 'create_new_pack --delta-base-offset && git prune-packed && - do_corrupt_object $blob_2 2 && + do_corrupt_object $blob_2 2 < /dev/zero && git cat-file blob $blob_1 > /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' test_expect_success \ - '... and a redundant pack allows for full recovery too' \ + '... but having a loose copy allows for full recovery' \ 'mv ${pack}.idx tmp && + git hash-object -t blob -w file_2 && + mv tmp ${pack}.idx && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack --delta-base-offset && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + 'corruption #1 in delta base reference of first delta (OBJ_OFS_DELTA)' \ + 'create_new_pack --delta-base-offset && + git prune-packed && + printf "\x01" | do_corrupt_object $blob_2 2 && + git cat-file blob $blob_1 > /dev/null && + test_must_fail git cat-file blob $blob_2 > /dev/null && + test_must_fail git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... but having a loose copy allows for full recovery' \ + 'mv ${pack}.idx tmp && + git hash-object -t blob -w file_2 && + mv tmp ${pack}.idx && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... and then a repack "clears" the corruption' \ + 'do_repack --delta-base-offset && + git prune-packed && + git verify-pack ${pack}.pack && + git cat-file blob $blob_1 > /dev/null && + git cat-file blob $blob_2 > /dev/null && + git cat-file blob $blob_3 > /dev/null' + +test_expect_success \ + '... and a redundant pack allows for full recovery too' \ + 'do_corrupt_object $blob_2 2 < /dev/zero && + git cat-file blob $blob_1 > /dev/null && + test_must_fail git cat-file blob $blob_2 > /dev/null && + test_must_fail git cat-file blob $blob_3 > /dev/null && + mv ${pack}.idx tmp && git hash-object -t blob -w file_1 && git hash-object -t blob -w file_2 && printf "$blob_1\n$blob_2\n" | git pack-objects .git/objects/pack/pack && From 59dd9ed18398d96922e345f7987f1245870fb3e5 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:52 -0400 Subject: [PATCH 09/11] pack-objects: don't leak pack window reference when splitting packs Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 5be6664c72..1b6eff314e 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -389,22 +389,28 @@ static unsigned long write_object(struct sha1file *f, dheader[pos] = ofs & 127; while (ofs >>= 7) dheader[--pos] = 128 | (--ofs & 127); - if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) + if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) { + unuse_pack(&w_curs); return 0; + } sha1write(f, header, hdrlen); sha1write(f, dheader + pos, sizeof(dheader) - pos); hdrlen += sizeof(dheader) - pos; reused_delta++; } else if (type == OBJ_REF_DELTA) { - if (limit && hdrlen + 20 + datalen + 20 >= limit) + if (limit && hdrlen + 20 + datalen + 20 >= limit) { + unuse_pack(&w_curs); return 0; + } sha1write(f, header, hdrlen); sha1write(f, entry->delta->idx.sha1, 20); hdrlen += 20; reused_delta++; } else { - if (limit && hdrlen + datalen + 20 >= limit) + if (limit && hdrlen + datalen + 20 >= limit) { + unuse_pack(&w_curs); return 0; + } sha1write(f, header, hdrlen); } copy_pack_data(f, p, &w_curs, offset, datalen); From 8c1f6f6c5797d1a3fa9ec9d38ae30b4f19e70a3c Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sun, 9 Nov 2008 13:08:38 -0800 Subject: [PATCH 10/11] t5303: work around printf breakage in dash Signed-off-by: Junio C Hamano --- t/t5303-pack-corruption-resilience.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh index ac181ea38a..f471eadb96 100755 --- a/t/t5303-pack-corruption-resilience.sh +++ b/t/t5303-pack-corruption-resilience.sh @@ -234,7 +234,7 @@ test_expect_success \ 'corruption #1 in delta base reference of first delta (OBJ_OFS_DELTA)' \ 'create_new_pack --delta-base-offset && git prune-packed && - printf "\x01" | do_corrupt_object $blob_2 2 && + tr "\000" "\001" /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null' From a2d2c478f325690e1575e14e7d310fd7435d83be Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sun, 9 Nov 2008 13:11:06 -0800 Subject: [PATCH 11/11] t5303: fix printf format string for portability printf "\x01" is bad; write printf "\001" for portability. Testing with dash is a good way to find this kind of POSIX.1 violation breakages. Signed-off-by: Junio C Hamano --- t/t5303-pack-corruption-resilience.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh index f471eadb96..d4e30fc43c 100755 --- a/t/t5303-pack-corruption-resilience.sh +++ b/t/t5303-pack-corruption-resilience.sh @@ -234,7 +234,7 @@ test_expect_success \ 'corruption #1 in delta base reference of first delta (OBJ_OFS_DELTA)' \ 'create_new_pack --delta-base-offset && git prune-packed && - tr "\000" "\001" /dev/null && test_must_fail git cat-file blob $blob_2 > /dev/null && test_must_fail git cat-file blob $blob_3 > /dev/null'