Merge branch 'ds/chunked-file-api'

The common code to deal with "chunked file format" that is shared
by the multi-pack-index and commit-graph files has been factored
out, to help codepaths for both filetypes to become more robust.

* ds/chunked-file-api:
  commit-graph.c: display correct number of chunks when writing
  chunk-format: add technical docs
  chunk-format: restore duplicate chunk checks
  midx: use 64-bit multiplication for chunk sizes
  midx: use chunk-format read API
  commit-graph: use chunk-format read API
  chunk-format: create read chunk API
  midx: use chunk-format API in write_midx_internal()
  midx: drop chunk progress during write
  midx: return success/failure in chunk write methods
  midx: add num_large_offsets to write_midx_context
  midx: add pack_perm to write_midx_context
  midx: add entries to write_midx_context
  midx: use context in write_midx_pack_names()
  midx: rename pack_info to write_midx_context
  commit-graph: use chunk-format write API
  chunk-format: create chunk format write API
  commit-graph: anonymize data in chunk_write_fn
2021-03-01 14:02:57 -08:00 · 2021-03-01 14:02:57 -08:00 · 660dd97a62
parent 12bd17521c c4ff24bbb3
commit 660dd97a62
10 changed files with 650 additions and 463 deletions
--- a/Documentation/technical/chunk-format.txt
+++ b/Documentation/technical/chunk-format.txt
@@ -0,0 +1,116 @@
 Chunk-based file formats
 ========================
 Some file formats in Git use a common concept of "chunks" to describe
 sections of the file. This allows structured access to a large file by
 scanning a small "table of contents" for the remaining data. This common
 format is used by the `commit-graph` and `multi-pack-index` files. See
 link:technical/pack-format.html[the `multi-pack-index` format] and
 link:technical/commit-graph-format.html[the `commit-graph` format] for
 how they use the chunks to describe structured data.
 A chunk-based file format begins with some header information custom to
 that format. That header should include enough information to identify
 the file type, format version, and number of chunks in the file. From this
 information, a reader can determine the start of the chunk-based region.
 The chunk-based region starts with a table of contents describing where
 each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
 where C is the number of chunks. Consider the following table:
  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
  |--------------------|------------------------|
  | ID[0]              | OFFSET[0]              |
  | ...                | ...                    |
  | ID[C-1]            | OFFSET[C-1]            |
  | 0x00000000         | OFFSET[C]              |
 Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
 Each integer is stored in network-byte order.
 The chunk identifier `ID[i]` is a label for the data stored within this
 file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
 size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
 and `OFFSET[i]`. This requires that the chunk data appears contiguously
 in the same order as the table of contents.
 The chunk ID of the final entry in the table of contents must be four
 zero bytes. This confirms that the table of contents is ending and
 provides the offset for the end of the chunk-based data.
 Note: The chunk-based format expects that the file contains _at least_ a
 trailing hash after `OFFSET[C]`.
 Functions for working with chunk-based file formats are declared in
 `chunk-format.h`. Using these methods provides extra checks that assist
 developers when creating new file formats.
 Writing chunk-based file formats
 --------------------------------
 To write a chunk-based file format, create a `struct chunkfile` by
 calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
 caller is responsible for opening the `hashfile` and writing header
 information so the file format is identifiable before the chunk-based
 format begins.
 Then, call `add_chunk()` for each chunk that is intended for write. This
 populates the `chunkfile` with information about the order and size of
 each chunk to write. Provide a `chunk_write_fn` function pointer to
 perform the write of the chunk data upon request.
 Call `write_chunkfile()` to write the table of contents to the `hashfile`
 followed by each of the chunks. This will verify that each chunk wrote
 the expected amount of data so the table of contents is correct.
 Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
 caller is responsible for finalizing the `hashfile` by writing the trailing
 hash and closing the file.
 Reading chunk-based file formats
 --------------------------------
 To read a chunk-based file format, the file must be opened as a
 memory-mapped region. The chunk-format API expects that the entire file
 is mapped as a contiguous memory region.
 Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
 After reading the header information from the beginning of the file,
 including the chunk count, call `read_table_of_contents()` to populate
 the `struct chunkfile` with the list of chunks, their offsets, and their
 sizes.
 Extract the data information for each chunk using `pair_chunk()` or
 `read_chunk()`:
 * `pair_chunk()` assigns a given pointer with the location inside the
  memory-mapped file corresponding to that chunk's offset. If the chunk
  does not exist, then the pointer is not modified.
 * `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
  with the appropriate initial pointer and size information. The function
  is not called if the chunk does not exist. Use this method to read chunks
  if you need to perform immediate parsing or if you need to execute logic
  based on the size of the chunk.
 After calling these methods, call `free_chunkfile()` to clear the
 `struct chunkfile` data. This will not close the memory-mapped region.
 Callers are expected to own that data for the timeframe the pointers into
 the region are needed.
 Examples
 --------
 These file formats use the chunk-format API, and can be used as examples
 for future formats:
 * *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
  in `commit-graph.c` for how the chunk-format API is used to write and
  parse the commit-graph file format documented in
  link:technical/commit-graph-format.html[the commit-graph file format].
 * *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
  in `midx.c` for how the chunk-format API is used to write and
  parse the multi-pack-index file format documented in
  link:technical/pack-format.html[the multi-pack-index file format].
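
The table-of-contents layout described in this document can be illustrated with a small standalone sketch. It is not code from this series: `TOC_ENTRY_SIZE`, `load_be32()`, `load_be64()` and `toc_find()` are hypothetical names chosen for the example, and the sketch assumes the caller has already validated that the buffer holds `nr` chunk rows plus the terminating row. It shows how a chunk's size falls out of two adjacent offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ENTRY_SIZE 12 /* 4-byte chunk ID followed by an 8-byte offset */

/* Hypothetical helper: read a big-endian 32-bit integer from a buffer. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read a big-endian 64-bit integer from a buffer. */
static uint64_t load_be64(const unsigned char *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/*
 * Look up chunk `id` in a table of contents that holds `nr` chunk rows
 * followed by one terminating row. On success, store the chunk's offset
 * and size and return 0. Return -1 if no row carries that ID.
 */
int toc_find(const unsigned char *toc, size_t nr, uint32_t id,
	     uint64_t *offset, uint64_t *size)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		const unsigned char *row = toc + i * TOC_ENTRY_SIZE;

		if (load_be32(row) != id)
			continue;
		*offset = load_be64(row + 4);
		/* The next row (possibly the terminator) marks the end. */
		*size = load_be64(row + TOC_ENTRY_SIZE + 4) - *offset;
		return 0;
	}
	return -1;
}
```

Because the size of the last chunk is read from the terminating row, a lookup like this only works when that row is present, which is why the format requires it.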
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -61,6 +61,9 @@ CHUNK LOOKUP:
      the length using the next chunk position if necessary.) Each chunk
      ID appears at most once.
  The CHUNK LOOKUP matches the table of contents from
  link:technical/chunk-format.html[the chunk-based file format].
  The remaining data in the body is described one chunk at a time, and
  these chunks may be given in any order. Chunks are required unless
  otherwise specified.
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -336,6 +336,9 @@ CHUNK LOOKUP:
 	    (Chunks are provided in file-order, so you can infer the length
 	    using the next chunk position if necessary.)
 	The CHUNK LOOKUP matches the table of contents from
 	link:technical/chunk-format.html[the chunk-based file format].
 	The remaining data in the body is described one chunk at a time, and
 	these chunks may be given in any order. Chunks are required unless
 	otherwise specified.
--- a/1
+++ b/1
@@ -834,6 +834,7 @@ LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += chdir-notify.o
 LIB_OBJS += checkout.o
 LIB_OBJS += chunk-format.o
 LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -0,0 +1,179 @@
 #include "cache.h"
 #include "chunk-format.h"
 #include "csum-file.h"
 /*
 * When writing a chunk-based file format, collect the chunks in
 * an array of chunk_info structs. The size stores the _expected_
 * amount of data that will be written by write_fn.
 */
 struct chunk_info {
 	uint32_t id;
 	uint64_t size;
 	chunk_write_fn write_fn;
 	const void *start;
 };
 struct chunkfile {
 	struct hashfile *f;
 	struct chunk_info *chunks;
 	size_t chunks_nr;
 	size_t chunks_alloc;
 };
 struct chunkfile *init_chunkfile(struct hashfile *f)
 {
 	struct chunkfile *cf = xcalloc(1, sizeof(*cf));
 	cf->f = f;
 	return cf;
 }
 void free_chunkfile(struct chunkfile *cf)
 {
 	if (!cf)
 		return;
 	free(cf->chunks);
 	free(cf);
 }
 int get_num_chunks(struct chunkfile *cf)
 {
 	return cf->chunks_nr;
 }
 void add_chunk(struct chunkfile *cf,
 	       uint32_t id,
 	       size_t size,
 	       chunk_write_fn fn)
 {
 	ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc);
 	cf->chunks[cf->chunks_nr].id = id;
 	cf->chunks[cf->chunks_nr].write_fn = fn;
 	cf->chunks[cf->chunks_nr].size = size;
 	cf->chunks_nr++;
 }
 int write_chunkfile(struct chunkfile *cf, void *data)
 {
 	int i;
 	uint64_t cur_offset = hashfile_total(cf->f);
 	/* Add the table of contents to the current offset */
 	cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE;
 	for (i = 0; i < cf->chunks_nr; i++) {
 		hashwrite_be32(cf->f, cf->chunks[i].id);
 		hashwrite_be64(cf->f, cur_offset);
 		cur_offset += cf->chunks[i].size;
 	}
 	/* Trailing entry marks the end of the chunks */
 	hashwrite_be32(cf->f, 0);
 	hashwrite_be64(cf->f, cur_offset);
 	for (i = 0; i < cf->chunks_nr; i++) {
 		off_t start_offset = hashfile_total(cf->f);
 		int result = cf->chunks[i].write_fn(cf->f, data);
 		if (result)
 			return result;
 		if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size)
 			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
 			    cf->chunks[i].size, cf->chunks[i].id,
 			    hashfile_total(cf->f) - start_offset);
 	}
 	return 0;
 }
 int read_table_of_contents(struct chunkfile *cf,
 			   const unsigned char *mfile,
 			   size_t mfile_size,
 			   uint64_t toc_offset,
 			   int toc_length)
 {
 	int i;
 	uint32_t chunk_id;
 	const unsigned char *table_of_contents = mfile + toc_offset;
 	ALLOC_GROW(cf->chunks, toc_length, cf->chunks_alloc);
 	while (toc_length--) {
 		uint64_t chunk_offset, next_chunk_offset;
 		chunk_id = get_be32(table_of_contents);
 		chunk_offset = get_be64(table_of_contents + 4);
 		if (!chunk_id) {
 			error(_("terminating chunk id appears earlier than expected"));
 			return 1;
 		}
 		table_of_contents += CHUNK_TOC_ENTRY_SIZE;
 		next_chunk_offset = get_be64(table_of_contents + 4);
 		if (next_chunk_offset < chunk_offset ||
 		    next_chunk_offset > mfile_size - the_hash_algo->rawsz) {
 			error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""),
 			      chunk_offset, next_chunk_offset);
 			return -1;
 		}
 		for (i = 0; i < cf->chunks_nr; i++) {
 			if (cf->chunks[i].id == chunk_id) {
 				error(_("duplicate chunk ID %"PRIx32" found"),
 					chunk_id);
 				return -1;
 			}
 		}
 		cf->chunks[cf->chunks_nr].id = chunk_id;
 		cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
 		cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset;
 		cf->chunks_nr++;
 	}
 	chunk_id = get_be32(table_of_contents);
 	if (chunk_id) {
 		error(_("final chunk has non-zero id %"PRIx32""), chunk_id);
 		return -1;
 	}
 	return 0;
 }
 static int pair_chunk_fn(const unsigned char *chunk_start,
 			 size_t chunk_size,
 			 void *data)
 {
 	const unsigned char **p = data;
 	*p = chunk_start;
 	return 0;
 }
 int pair_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
 	       const unsigned char **p)
 {
 	return read_chunk(cf, chunk_id, pair_chunk_fn, p);
 }
 int read_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
 	       chunk_read_fn fn,
 	       void *data)
 {
 	int i;
 	for (i = 0; i < cf->chunks_nr; i++) {
 		if (cf->chunks[i].id == chunk_id)
 			return fn(cf->chunks[i].start, cf->chunks[i].size, data);
 	}
 	return CHUNK_NOT_FOUND;
 }
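
The first loop of `write_chunkfile()` derives every table-of-contents offset from the expected chunk sizes before any chunk data is written. A minimal standalone sketch of that arithmetic follows. `plan_offsets()` and `TOC_ENTRY_SIZE` are hypothetical names used only for this illustration, and `header_size` stands in for the value `hashfile_total()` would report once the caller has written the format-specific header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ENTRY_SIZE 12 /* 4-byte chunk ID followed by an 8-byte offset */

/*
 * Fill offsets[0..nr] for `nr` chunks whose table of contents starts at
 * byte `header_size`. offsets[i] is where chunk i begins; offsets[nr] is
 * the value stored in the terminating row, i.e. the end of the chunk data.
 */
void plan_offsets(uint64_t header_size, const uint64_t *sizes,
		  size_t nr, uint64_t *offsets)
{
	/* Chunk data begins after the header and the (nr + 1)-row table. */
	uint64_t cur = header_size + (nr + 1) * TOC_ENTRY_SIZE;
	size_t i;

	for (i = 0; i < nr; i++) {
		offsets[i] = cur;
		cur += sizes[i];
	}
	offsets[nr] = cur;
}
```

For an 8-byte header and three chunks of 1024, 40 and 72 bytes, the table occupies 48 bytes, so the first chunk starts at offset 56 and the terminating row records 1192.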
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -0,0 +1,68 @@
 #ifndef CHUNK_FORMAT_H
 #define CHUNK_FORMAT_H
 #include "git-compat-util.h"
 struct hashfile;
 struct chunkfile;
 #define CHUNK_TOC_ENTRY_SIZE (sizeof(uint32_t) + sizeof(uint64_t))
 /*
 * Initialize a 'struct chunkfile' for writing _or_ reading a file
 * with the chunk format.
 *
 * If writing a file, supply a non-NULL 'struct hashfile *' that will
 * be used to write.
 *
 * If reading a file, use a NULL 'struct hashfile *' and then call
 * read_table_of_contents(). Supply the memory-mapped data to the
 * pair_chunk() or read_chunk() methods, as appropriate.
 *
 * DO NOT MIX THESE MODES. Use different 'struct chunkfile' instances
 * for reading and writing.
 */
 struct chunkfile *init_chunkfile(struct hashfile *f);
 void free_chunkfile(struct chunkfile *cf);
 int get_num_chunks(struct chunkfile *cf);
 typedef int (*chunk_write_fn)(struct hashfile *f, void *data);
 void add_chunk(struct chunkfile *cf,
 	       uint32_t id,
 	       size_t size,
 	       chunk_write_fn fn);
 int write_chunkfile(struct chunkfile *cf, void *data);
 int read_table_of_contents(struct chunkfile *cf,
 			   const unsigned char *mfile,
 			   size_t mfile_size,
 			   uint64_t toc_offset,
 			   int toc_length);
 #define CHUNK_NOT_FOUND (-2)
 /*
 * Find 'chunk_id' in the given chunkfile and assign the
 * given pointer to the position in the mmap'd file where
 * that chunk begins.
 *
 * Returns CHUNK_NOT_FOUND if the chunk does not exist.
 */
 int pair_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
 	       const unsigned char **p);
 typedef int (*chunk_read_fn)(const unsigned char *chunk_start,
 			     size_t chunk_size, void *data);
 /*
 * Find 'chunk_id' in the given chunkfile and call the
 * given chunk_read_fn method with the information for
 * that chunk.
 *
 * Returns CHUNK_NOT_FOUND if the chunk does not exist.
 */
 int read_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
 	       chunk_read_fn fn,
 	       void *data);
 #endif
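
The read side of this header is built around a callback that receives a chunk's start pointer and size. The sketch below imitates that pattern with standalone mock types so it can be compiled without Git's sources. `struct mock_chunk`, `mock_read_fn`, `mock_read_chunk()`, `struct fanout` and `read_fanout()` are all hypothetical names, not part of the real API, and the fanout size check is modelled on `midx_read_oid_fanout()` from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CHUNK_NOT_FOUND (-2)

/* Hypothetical stand-in for one parsed table-of-contents entry. */
struct mock_chunk {
	uint32_t id;
	const unsigned char *start;
	size_t size;
};

typedef int (*mock_read_fn)(const unsigned char *start, size_t size,
			    void *data);

/* Call `fn` on the chunk with the given ID, if such a chunk exists. */
int mock_read_chunk(const struct mock_chunk *chunks, size_t nr,
		    uint32_t id, mock_read_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (chunks[i].id == id)
			return fn(chunks[i].start, chunks[i].size, data);
	}
	return MOCK_CHUNK_NOT_FOUND;
}

/* Caller-owned state that the callback fills in. */
struct fanout {
	const unsigned char *table;
};

/* Accept the chunk only if it has the size of a 256-entry fanout. */
int read_fanout(const unsigned char *start, size_t size, void *data)
{
	struct fanout *f = data;

	if (size != 4 * 256)
		return 1;
	f->table = start;
	return 0;
}
```

A callback is the right tool when the chunk size matters, as it does here. When only the start pointer is needed, the simpler `pair_chunk()` form avoids writing a callback at all.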
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -19,6 +19,7 @@
 #include "shallow.h"
 #include "json-writer.h"
 #include "trace2.h"
 #include "chunk-format.h"
 void git_test_write_commit_graph_or_die(void)
 {
@@ -44,7 +45,6 @@ void git_test_write_commit_graph_or_die(void)
 #define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
 #define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
 #define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */
 #define MAX_NUM_CHUNKS 9
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
@@ -59,8 +59,7 @@ void git_test_write_commit_graph_or_die(void)
 #define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
-#define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * CHUNK_TOC_ENTRY_SIZE \
 			+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 #define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)
@@ -298,15 +297,43 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 static int graph_read_oid_lookup(const unsigned char *chunk_start,
 				 size_t chunk_size, void *data)
 {
 	struct commit_graph *g = data;
 	g->chunk_oid_lookup = chunk_start;
 	g->num_commits = chunk_size / g->hash_len;
 	return 0;
 }
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
 	struct commit_graph *g = data;
 	uint32_t hash_version;
 	g->chunk_bloom_data = chunk_start;
 	hash_version = get_be32(chunk_start);
 	if (hash_version != 1)
 		return 0;
 	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
 	g->bloom_filter_settings->hash_version = hash_version;
 	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
 	g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8);
 	g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
 	return 0;
 }
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
 {
-	const unsigned char *data, *chunk_lookup;
+	const unsigned char *data;
 	uint32_t i;
 	struct commit_graph *graph;
 	uint64_t next_chunk_offset;
 	uint32_t graph_signature;
 	unsigned char graph_version, hash_version;
 	struct chunkfile *cf = NULL;
 	if (!graph_map)
 		return NULL;
@@ -347,7 +374,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	graph->data_len = graph_size;
 	if (graph_size < GRAPH_HEADER_SIZE +
-			 (graph->num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH +
+			 (graph->num_chunks + 1) * CHUNK_TOC_ENTRY_SIZE +
 			 GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) {
 		error(_("commit-graph file is too small to hold %u chunks"),
 		      graph->num_chunks);
@@ -355,108 +382,28 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}
-	chunk_lookup = data + 8;
-	next_chunk_offset = get_be64(chunk_lookup + 4);
-	for (i = 0; i < graph->num_chunks; i++) {
-		uint32_t chunk_id;
-		uint64_t chunk_offset = next_chunk_offset;
-		int chunk_repeated = 0;
-		chunk_id = get_be32(chunk_lookup + 0);
-
-		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
-		next_chunk_offset = get_be64(chunk_lookup + 4);
-		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
-			error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
-			      (uint32_t)chunk_offset);
-			goto free_and_return;
-		}
-		switch (chunk_id) {
-		case GRAPH_CHUNKID_OIDFANOUT:
-			if (graph->chunk_oid_fanout)
-				chunk_repeated = 1;
-			else
-				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
-			break;
-		case GRAPH_CHUNKID_OIDLOOKUP:
-			if (graph->chunk_oid_lookup)
-				chunk_repeated = 1;
-			else {
-				graph->chunk_oid_lookup = data + chunk_offset;
-				graph->num_commits = (next_chunk_offset - chunk_offset)
-						     / graph->hash_len;
-			}
-			break;
-		case GRAPH_CHUNKID_DATA:
-			if (graph->chunk_commit_data)
-				chunk_repeated = 1;
-			else
-				graph->chunk_commit_data = data + chunk_offset;
-			break;
-		case GRAPH_CHUNKID_GENERATION_DATA:
-			if (graph->chunk_generation_data)
-				chunk_repeated = 1;
-			else
-				graph->chunk_generation_data = data + chunk_offset;
-			break;
-		case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW:
-			if (graph->chunk_generation_data_overflow)
-				chunk_repeated = 1;
-			else
-				graph->chunk_generation_data_overflow = data + chunk_offset;
-			break;
-		case GRAPH_CHUNKID_EXTRAEDGES:
-			if (graph->chunk_extra_edges)
-				chunk_repeated = 1;
-			else
-				graph->chunk_extra_edges = data + chunk_offset;
-			break;
-		case GRAPH_CHUNKID_BASE:
-			if (graph->chunk_base_graphs)
-				chunk_repeated = 1;
-			else
-				graph->chunk_base_graphs = data + chunk_offset;
-			break;
-		case GRAPH_CHUNKID_BLOOMINDEXES:
-			if (graph->chunk_bloom_indexes)
-				chunk_repeated = 1;
-			else if (r->settings.commit_graph_read_changed_paths)
-				graph->chunk_bloom_indexes = data + chunk_offset;
-			break;
-		case GRAPH_CHUNKID_BLOOMDATA:
-			if (graph->chunk_bloom_data)
-				chunk_repeated = 1;
-			else if (r->settings.commit_graph_read_changed_paths) {
-				uint32_t hash_version;
-				graph->chunk_bloom_data = data + chunk_offset;
-				hash_version = get_be32(data + chunk_offset);
-				if (hash_version != 1)
-					break;
-				graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
-				graph->bloom_filter_settings->hash_version = hash_version;
-				graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
-				graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
-				graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
-			}
-			break;
-		}
-		if (chunk_repeated) {
-			error(_("commit-graph chunk id %08x appears multiple times"), chunk_id);
-			goto free_and_return;
-		}
+	cf = init_chunkfile(NULL);
+	if (read_table_of_contents(cf, graph->data, graph_size,
+				   GRAPH_HEADER_SIZE, graph->num_chunks))
+		goto free_and_return;
+	pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
+		   (const unsigned char **)&graph->chunk_oid_fanout);
+	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
+	pair_chunk(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
+	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
+	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
+	pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+		   &graph->chunk_generation_data);
+	pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
+		   &graph->chunk_generation_data_overflow);
+	if (r->settings.commit_graph_read_changed_paths) {
+		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+			   &graph->chunk_bloom_indexes);
+		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
+			   graph_read_bloom_data, graph);
 	}
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
@@ -473,9 +420,11 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	if (verify_commit_graph_lite(graph))
 		goto free_and_return;
 	free_chunkfile(cf);
 	return graph;
 free_and_return:
 	free_chunkfile(cf);
 	free(graph->bloom_filter_settings);
 	free(graph);
 	return NULL;
@@ -1051,8 +1000,9 @@ struct write_commit_graph_context {
 };
 static int write_graph_chunk_fanout(struct hashfile *f,
-				    struct write_commit_graph_context *ctx)
+				    void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	int i, count = 0;
 	struct commit **list = ctx->commits.list;
@@ -1077,8 +1027,9 @@ static int write_graph_chunk_fanout(struct hashfile *f,
 }
 static int write_graph_chunk_oids(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	struct commit **list = ctx->commits.list;
 	int count;
 	for (count = 0; count < ctx->commits.nr; count++, list++) {
@@ -1096,8 +1047,9 @@ static const struct object_id *commit_to_oid(size_t index, const void *table)
 }
 static int write_graph_chunk_data(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				  void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t num_extra_edges = 0;
@@ -1198,8 +1150,9 @@ static int write_graph_chunk_data(struct hashfile *f,
 }
 static int write_graph_chunk_generation_data(struct hashfile *f,
-					      struct write_commit_graph_context *ctx)
+					     void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	int i, num_generation_data_overflows = 0;
 	for (i = 0; i < ctx->commits.nr; i++) {
@@ -1221,8 +1174,9 @@ static int write_graph_chunk_generation_data(struct hashfile *f,
 }
 static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
-						       struct write_commit_graph_context *ctx)
+						      void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	int i;
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
@@ -1239,8 +1193,9 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
 }
 static int write_graph_chunk_extra_edges(struct hashfile *f,
-					 struct write_commit_graph_context *ctx)
+					 void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	struct commit_list *parent;
@@ -1293,8 +1248,9 @@ static int write_graph_chunk_extra_edges(struct hashfile *f,
 }
 static int write_graph_chunk_bloom_indexes(struct hashfile *f,
-					   struct write_commit_graph_context *ctx)
+					   void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t cur_pos = 0;
@@ -1328,8 +1284,9 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx)
 }
 static int write_graph_chunk_bloom_data(struct hashfile *f,
-					struct write_commit_graph_context *ctx)
+					void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	struct commit **list = ctx->commits.list;
 	struct commit **last = ctx->commits.list + ctx->commits.nr;
@@ -1805,8 +1762,9 @@ static int write_graph_chunk_base_1(struct hashfile *f,
 }
 static int write_graph_chunk_base(struct hashfile *f,
-				  struct write_commit_graph_context *ctx)
+				    void *data)
 {
 	struct write_commit_graph_context *ctx = data;
 	int num = write_graph_chunk_base_1(f, ctx->new_base_graph);
 	if (num != ctx->num_commit_graphs_after - 1) {
@@ -1817,27 +1775,16 @@ static int write_graph_chunk_base(struct hashfile *f,
 	return 0;
 }
-typedef int (*chunk_write_fn)(struct hashfile *f,
-			      struct write_commit_graph_context *ctx);
-struct chunk_info {
-	uint32_t id;
-	uint64_t size;
-	chunk_write_fn write_fn;
-};
 static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
 	int fd;
 	struct hashfile *f;
 	struct lock_file lk = LOCK_INIT;
-	struct chunk_info chunks[MAX_NUM_CHUNKS + 1];
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
-	int num_chunks = 3;
-	uint64_t chunk_offset;
 	struct object_id file_hash;
+	struct chunkfile *cf;
 	if (ctx->split) {
 		struct strbuf tmp_file = STRBUF_INIT;
@@ -1883,98 +1830,62 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 		f = hashfd(fd, get_lock_file_path(&lk));
 	}
-	chunks[0].id = GRAPH_CHUNKID_OIDFANOUT;
-	chunks[0].size = GRAPH_FANOUT_SIZE;
-	chunks[0].write_fn = write_graph_chunk_fanout;
-	chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP;
-	chunks[1].size = hashsz * ctx->commits.nr;
-	chunks[1].write_fn = write_graph_chunk_oids;
-	chunks[2].id = GRAPH_CHUNKID_DATA;
-	chunks[2].size = (hashsz + 16) * ctx->commits.nr;
-	chunks[2].write_fn = write_graph_chunk_data;
+	cf = init_chunkfile(f);
+
+	add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, GRAPH_FANOUT_SIZE,
+		  write_graph_chunk_fanout);
+	add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, hashsz * ctx->commits.nr,
+		  write_graph_chunk_oids);
+	add_chunk(cf, GRAPH_CHUNKID_DATA, (hashsz + 16) * ctx->commits.nr,
+		  write_graph_chunk_data);
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0))
 		ctx->write_generation_data = 0;
-	if (ctx->write_generation_data) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA;
-		chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
-		chunks[num_chunks].write_fn = write_graph_chunk_generation_data;
-		num_chunks++;
-	}
-	if (ctx->num_generation_data_overflows) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW;
-		chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows;
-		chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow;
-		num_chunks++;
-	}
-	if (ctx->num_extra_edges) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES;
-		chunks[num_chunks].size = 4 * ctx->num_extra_edges;
-		chunks[num_chunks].write_fn = write_graph_chunk_extra_edges;
-		num_chunks++;
-	}
+	if (ctx->write_generation_data)
+		add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+			  sizeof(uint32_t) * ctx->commits.nr,
+			  write_graph_chunk_generation_data);
+	if (ctx->num_generation_data_overflows)
+		add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
+			  sizeof(timestamp_t) * ctx->num_generation_data_overflows,
+			  write_graph_chunk_generation_data_overflow);
+	if (ctx->num_extra_edges)
+		add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES,
+			  4 * ctx->num_extra_edges,
+			  write_graph_chunk_extra_edges);
 	if (ctx->changed_paths) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES;
-		chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
-		chunks[num_chunks].write_fn = write_graph_chunk_bloom_indexes;
-		num_chunks++;
-		chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA;
-		chunks[num_chunks].size = sizeof(uint32_t) * 3
-					  + ctx->total_bloom_filter_data_size;
-		chunks[num_chunks].write_fn = write_graph_chunk_bloom_data;
-		num_chunks++;
+		add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+			  sizeof(uint32_t) * ctx->commits.nr,
+			  write_graph_chunk_bloom_indexes);
+		add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
+			  sizeof(uint32_t) * 3
+				+ ctx->total_bloom_filter_data_size,
+			  write_graph_chunk_bloom_data);
 	}
-	if (ctx->num_commit_graphs_after > 1) {
-		chunks[num_chunks].id = GRAPH_CHUNKID_BASE;
-		chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1);
-		chunks[num_chunks].write_fn = write_graph_chunk_base;
-		num_chunks++;
-	}
-	chunks[num_chunks].id = 0;
-	chunks[num_chunks].size = 0;
+	if (ctx->num_commit_graphs_after > 1)
+		add_chunk(cf, GRAPH_CHUNKID_BASE,
+			  hashsz * (ctx->num_commit_graphs_after - 1),
+			  write_graph_chunk_base);
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 	hashwrite_u8(f, GRAPH_VERSION);
 	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, get_num_chunks(cf));
 	hashwrite_u8(f, ctx->num_commit_graphs_after - 1);
-	chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	for (i = 0; i <= num_chunks; i++) {
-		uint32_t chunk_write[3];
-		chunk_write[0] = htonl(chunks[i].id);
-		chunk_write[1] = htonl(chunk_offset >> 32);
-		chunk_write[2] = htonl(chunk_offset & 0xffffffff);
-		hashwrite(f, chunk_write, 12);
-		chunk_offset += chunks[i].size;
-	}
 	if (ctx->report_progress) {
 		strbuf_addf(&progress_title,
 			    Q_("Writing out commit graph in %d pass",
 			       "Writing out commit graph in %d passes",
-			       num_chunks),
-			    num_chunks);
+			       get_num_chunks(cf)),
+			    get_num_chunks(cf));
 		ctx->progress = start_delayed_progress(
 			progress_title.buf,
-			num_chunks * ctx->commits.nr);
+			get_num_chunks(cf) * ctx->commits.nr);
 	}
-	for (i = 0; i < num_chunks; i++) {
-		uint64_t start_offset = f->total + f->offset;
-		if (chunks[i].write_fn(f, ctx))
-			return -1;
-		if (f->total + f->offset != start_offset + chunks[i].size)
-			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
-			    chunks[i].size, chunks[i].id,
-			    f->total + f->offset - start_offset);
-	}
+	write_chunkfile(cf, ctx);
 	stop_progress(&ctx->progress);
 	strbuf_release(&progress_title);
@@ -1991,6 +1902,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	close_commit_graph(ctx->r->objects);
 	finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	free_chunkfile(cf);
 	if (ctx->split) {
 		FILE *chainf = fdopen_lock_file(&lk, "w");
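
The write callbacks in this file now take a `void *data` argument and cast it back to their own context type, while `write_chunkfile()` checks that each callback wrote exactly the number of bytes announced through `add_chunk()`. The following standalone sketch imitates both ideas with mock types. `struct mock_sink`, `mock_write_fn`, `struct mock_ctx`, `write_positions()` and `run_chunk()` are hypothetical names, and where the real code calls `BUG()` on a size mismatch this sketch simply returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output sink that only counts the bytes written so far. */
struct mock_sink {
	uint64_t total;
};

/* Type-erased callback: the context travels as an opaque pointer. */
typedef int (*mock_write_fn)(struct mock_sink *out, void *data);

struct mock_ctx {
	uint32_t nr_commits;
};

/* Pretend to write one 4-byte value per commit. */
int write_positions(struct mock_sink *out, void *data)
{
	struct mock_ctx *ctx = data; /* recover the concrete type */

	out->total += (uint64_t)ctx->nr_commits * sizeof(uint32_t);
	return 0;
}

/*
 * Run one chunk callback and compare what it wrote with `expected`.
 * Returns the callback's own error if it is nonzero, -1 on a size
 * mismatch, and 0 otherwise.
 */
int run_chunk(struct mock_sink *out, mock_write_fn fn,
	      uint64_t expected, void *data)
{
	uint64_t start = out->total;
	int result = fn(out, data);

	if (result)
		return result;
	return out->total - start == expected ? 0 : -1;
}
```

Erasing the context type is what lets one chunk-writing loop serve both the commit-graph and multi-pack-index writers, at the cost of losing compile-time checking of the context argument.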
--- a/midx.c
+++ b/midx.c
@@ -11,6 +11,7 @@
 #include "trace2.h"
 #include "run-command.h"
 #include "repository.h"
 #include "chunk-format.h"
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -21,14 +22,12 @@
 #define MIDX_HEADER_SIZE 12
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
 #define MIDX_MAX_CHUNKS 5
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
 #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@@ -53,6 +52,19 @@ static char *get_midx_filename(const char *object_dir)
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 static int midx_read_oid_fanout(const unsigned char *chunk_start,
 				size_t chunk_size, void *data)
 {
 	struct multi_pack_index *m = data;
 	m->chunk_oid_fanout = (uint32_t *)chunk_start;
 	if (chunk_size != 4 * 256) {
 		error(_("multi-pack-index OID fanout is of the wrong size"));
 		return 1;
 	}
 	return 0;
 }
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -64,6 +76,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	char *midx_name = get_midx_filename(object_dir);
 	uint32_t i;
 	const char *cur_pack_name;
+	struct chunkfile *cf = NULL;
 	fd = git_open(midx_name);
@@ -113,58 +126,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
-	for (i = 0; i < m->num_chunks; i++) {
-		uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
-					     MIDX_CHUNKLOOKUP_WIDTH * i);
-		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
-						 MIDX_CHUNKLOOKUP_WIDTH * i);
-		if (chunk_offset >= m->data_len)
-			die(_("invalid chunk offset (too large)"));
-		switch (chunk_id) {
-			case MIDX_CHUNKID_PACKNAMES:
-				m->chunk_pack_names = m->data + chunk_offset;
-				break;
-			case MIDX_CHUNKID_OIDFANOUT:
-				m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset);
-				break;
-			case MIDX_CHUNKID_OIDLOOKUP:
-				m->chunk_oid_lookup = m->data + chunk_offset;
-				break;
-			case MIDX_CHUNKID_OBJECTOFFSETS:
-				m->chunk_object_offsets = m->data + chunk_offset;
-				break;
-			case MIDX_CHUNKID_LARGEOFFSETS:
-				m->chunk_large_offsets = m->data + chunk_offset;
-				break;
-			case 0:
-				die(_("terminating multi-pack-index chunk id appears earlier than expected"));
-				break;
-			default:
-				/*
-				 * Do nothing on unrecognized chunks, allowing future
-				 * extensions to add optional chunks.
-				 */
-				break;
-		}
-	}
+	cf = init_chunkfile(NULL);
-	if (!m->chunk_pack_names)
+	if (read_table_of_contents(cf, m->data, midx_size,
+				   MIDX_HEADER_SIZE, m->num_chunks))
+		goto cleanup_fail;
+	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required pack-name chunk"));
-	if (!m->chunk_oid_fanout)
+	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID fanout chunk"));
-	if (!m->chunk_oid_lookup)
+	if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID lookup chunk"));
-	if (!m->chunk_object_offsets)
+	if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required object offsets chunk"));
+	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 	m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
@@ -190,6 +168,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 cleanup_fail:
 	free(m);
 	free(midx_name);
+	free(cf);
 	if (midx_map)
 		munmap(midx_map, midx_size);
 	if (0 <= fd)
@@ -265,7 +244,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	const unsigned char *offset_data;
 	uint32_t offset32;
-	offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH;
+	offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH;
 	offset32 = get_be32(offset_data + sizeof(uint32_t));
 	if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
@@ -281,7 +260,8 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
 {
-	return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH);
+	return get_be32(m->chunk_object_offsets +
+			(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
 }
 static int nth_midxed_pack_entry(struct repository *r,
@@ -451,49 +431,56 @@ static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
-struct pack_list {
+struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
 	uint32_t alloc;
 	struct multi_pack_index *m;
 	struct progress *progress;
 	unsigned pack_paths_checked;
+	struct pack_midx_entry *entries;
+	uint32_t entries_nr;
+	uint32_t *pack_perm;
+	unsigned large_offsets_needed:1;
+	uint32_t num_large_offsets;
 };
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
 			     const char *file_name, void *data)
 {
-	struct pack_list *packs = (struct pack_list *)data;
+	struct write_midx_context *ctx = data;
 	if (ends_with(file_name, ".idx")) {
-		display_progress(packs->progress, ++packs->pack_paths_checked);
+		display_progress(ctx->progress, ++ctx->pack_paths_checked);
-		if (packs->m && midx_contains_pack(packs->m, file_name))
+		if (ctx->m && midx_contains_pack(ctx->m, file_name))
 			return;
-		ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc);
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-		packs->info[packs->nr].p = add_packed_git(full_path,
+		ctx->info[ctx->nr].p = add_packed_git(full_path,
 						      full_path_len,
 						      0);
-		if (!packs->info[packs->nr].p) {
+		if (!ctx->info[ctx->nr].p) {
 			warning(_("failed to add packfile '%s'"),
 				full_path);
 			return;
 		}
-		if (open_pack_index(packs->info[packs->nr].p)) {
+		if (open_pack_index(ctx->info[ctx->nr].p)) {
 			warning(_("failed to open pack-index '%s'"),
 				full_path);
-			close_pack(packs->info[packs->nr].p);
+			close_pack(ctx->info[ctx->nr].p);
-			FREE_AND_NULL(packs->info[packs->nr].p);
+			FREE_AND_NULL(ctx->info[ctx->nr].p);
 			return;
 		}
-		packs->info[packs->nr].pack_name = xstrdup(file_name);
+		ctx->info[ctx->nr].pack_name = xstrdup(file_name);
-		packs->info[packs->nr].orig_pack_int_id = packs->nr;
+		ctx->info[ctx->nr].orig_pack_int_id = ctx->nr;
-		packs->info[packs->nr].expired = 0;
+		ctx->info[ctx->nr].expired = 0;
-		packs->nr++;
+		ctx->nr++;
 	}
 }
@@ -643,27 +630,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	return deduplicated_entries;
 }
-static size_t write_midx_pack_names(struct hashfile *f,
-				    struct pack_info *info,
-				    uint32_t num_packs)
+static int write_midx_pack_names(struct hashfile *f, void *data)
 {
+	struct write_midx_context *ctx = data;
 	uint32_t i;
 	unsigned char padding[MIDX_CHUNK_ALIGNMENT];
 	size_t written = 0;
-	for (i = 0; i < num_packs; i++) {
+	for (i = 0; i < ctx->nr; i++) {
 		size_t writelen;
-		if (info[i].expired)
+		if (ctx->info[i].expired)
 			continue;
-		if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0)
+		if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
 			BUG("incorrect pack-file order: %s before %s",
-			    info[i - 1].pack_name,
+			    ctx->info[i - 1].pack_name,
-			    info[i].pack_name);
+			    ctx->info[i].pack_name);
-		writelen = strlen(info[i].pack_name) + 1;
+		writelen = strlen(ctx->info[i].pack_name) + 1;
-		hashwrite(f, info[i].pack_name, writelen);
+		hashwrite(f, ctx->info[i].pack_name, writelen);
 		written += writelen;
 	}
@@ -672,18 +658,17 @@ static size_t write_midx_pack_names(struct hashfile *f,
 	if (i < MIDX_CHUNK_ALIGNMENT) {
 		memset(padding, 0, sizeof(padding));
 		hashwrite(f, padding, i);
-		written += i;
 	}
-	return written;
+	return 0;
 }
-static size_t write_midx_oid_fanout(struct hashfile *f,
-				    struct pack_midx_entry *objects,
-				    uint32_t nr_objects)
+static int write_midx_oid_fanout(struct hashfile *f,
+				 void *data)
 {
-	struct pack_midx_entry *list = objects;
-	struct pack_midx_entry *last = objects + nr_objects;
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
 	uint32_t count = 0;
 	uint32_t i;
@@ -704,21 +689,21 @@ static size_t write_midx_oid_fanout(struct hashfile *f,
 		list = next;
 	}
-	return MIDX_CHUNK_FANOUT_SIZE;
+	return 0;
 }
-static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
-				    struct pack_midx_entry *objects,
-				    uint32_t nr_objects)
+static int write_midx_oid_lookup(struct hashfile *f,
+				 void *data)
 {
-	struct pack_midx_entry *list = objects;
+	struct write_midx_context *ctx = data;
+	unsigned char hash_len = the_hash_algo->rawsz;
+	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i;
-	size_t written = 0;
-	for (i = 0; i < nr_objects; i++) {
+	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
-		if (i < nr_objects - 1) {
+		if (i < ctx->entries_nr - 1) {
 			struct pack_midx_entry *next = list;
 			if (oidcmp(&obj->oid, &next->oid) >= 0)
 				BUG("OIDs not in order: %s >= %s",
@@ -727,50 +712,48 @@ static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
 		}
 		hashwrite(f, obj->oid.hash, (int)hash_len);
-		written += hash_len;
 	}
-	return written;
+	return 0;
 }
-static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed,
-					uint32_t *perm,
-					struct pack_midx_entry *objects, uint32_t nr_objects)
+static int write_midx_object_offsets(struct hashfile *f,
+				     void *data)
 {
-	struct pack_midx_entry *list = objects;
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
 	uint32_t i, nr_large_offset = 0;
-	size_t written = 0;
-	for (i = 0; i < nr_objects; i++) {
+	for (i = 0; i < ctx->entries_nr; i++) {
 		struct pack_midx_entry *obj = list++;
-		if (perm[obj->pack_int_id] == PACK_EXPIRED)
+		if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
 			BUG("object %s is in an expired pack with int-id %d",
 			    oid_to_hex(&obj->oid),
 			    obj->pack_int_id);
-		hashwrite_be32(f, perm[obj->pack_int_id]);
+		hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
-		if (large_offset_needed && obj->offset >> 31)
+		if (ctx->large_offsets_needed && obj->offset >> 31)
 			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
-		else if (!large_offset_needed && obj->offset >> 32)
+		else if (!ctx->large_offsets_needed && obj->offset >> 32)
 			BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
 			    oid_to_hex(&obj->oid),
 			    obj->offset);
 		else
 			hashwrite_be32(f, (uint32_t)obj->offset);
-		written += MIDX_CHUNK_OFFSET_WIDTH;
 	}
-	return written;
+	return 0;
 }
-static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset,
-				       struct pack_midx_entry *objects, uint32_t nr_objects)
+static int write_midx_large_offsets(struct hashfile *f,
+				    void *data)
 {
-	struct pack_midx_entry *list = objects, *end = objects + nr_objects;
-	size_t written = 0;
+	struct write_midx_context *ctx = data;
+	struct pack_midx_entry *list = ctx->entries;
+	struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
+	uint32_t nr_large_offset = ctx->num_large_offsets;
 	while (nr_large_offset) {
 		struct pack_midx_entry *obj;
@@ -785,34 +768,26 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
 		if (!(offset >> 31))
 			continue;
-		written += hashwrite_be64(f, offset);
+		hashwrite_be64(f, offset);
 		nr_large_offset--;
 	}
-	return written;
+	return 0;
 }
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop, unsigned flags)
 {
-	unsigned char cur_chunk, num_chunks = 0;
 	char *midx_name;
 	uint32_t i;
 	struct hashfile *f = NULL;
 	struct lock_file lk;
-	struct pack_list packs;
-	uint32_t *pack_perm = NULL;
-	uint64_t written = 0;
-	uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
-	uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
-	uint32_t nr_entries, num_large_offsets = 0;
-	struct pack_midx_entry *entries = NULL;
-	struct progress *progress = NULL;
-	int large_offsets_needed = 0;
+	struct write_midx_context ctx = { 0 };
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
+	struct chunkfile *cf;
 	midx_name = get_midx_filename(object_dir);
 	if (safe_create_leading_directories(midx_name))
@@ -820,61 +795,62 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			  midx_name);
 	if (m)
-		packs.m = m;
+		ctx.m = m;
 	else
-		packs.m = load_multi_pack_index(object_dir, 1);
+		ctx.m = load_multi_pack_index(object_dir, 1);
-	packs.nr = 0;
+	ctx.nr = 0;
-	packs.alloc = packs.m ? packs.m->num_packs : 16;
+	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
-	packs.info = NULL;
+	ctx.info = NULL;
-	ALLOC_ARRAY(packs.info, packs.alloc);
+	ALLOC_ARRAY(ctx.info, ctx.alloc);
-	if (packs.m) {
+	if (ctx.m) {
-		for (i = 0; i < packs.m->num_packs; i++) {
+		for (i = 0; i < ctx.m->num_packs; i++) {
-			ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc);
+			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
-			packs.info[packs.nr].orig_pack_int_id = i;
+			ctx.info[ctx.nr].orig_pack_int_id = i;
-			packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]);
+			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			packs.info[packs.nr].p = NULL;
+			ctx.info[ctx.nr].p = NULL;
-			packs.info[packs.nr].expired = 0;
+			ctx.info[ctx.nr].expired = 0;
-			packs.nr++;
+			ctx.nr++;
 		}
 	}
-	packs.pack_paths_checked = 0;
+	ctx.pack_paths_checked = 0;
 	if (flags & MIDX_PROGRESS)
-		packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
+		ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
 	else
-		packs.progress = NULL;
+		ctx.progress = NULL;
-	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
+	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
-	stop_progress(&packs.progress);
+	stop_progress(&ctx.progress);
-	if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
-	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
-	for (i = 0; i < nr_entries; i++) {
-		if (entries[i].offset > 0x7fffffff)
-			num_large_offsets++;
-		if (entries[i].offset > 0xffffffff)
-			large_offsets_needed = 1;
+	ctx.large_offsets_needed = 0;
+	for (i = 0; i < ctx.entries_nr; i++) {
+		if (ctx.entries[i].offset > 0x7fffffff)
+			ctx.num_large_offsets++;
+		if (ctx.entries[i].offset > 0xffffffff)
+			ctx.large_offsets_needed = 1;
 	}
-	QSORT(packs.info, packs.nr, pack_info_compare);
+	QSORT(ctx.info, ctx.nr, pack_info_compare);
 	if (packs_to_drop && packs_to_drop->nr) {
 		int drop_index = 0;
 		int missing_drops = 0;
-		for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) {
+		for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
-			int cmp = strcmp(packs.info[i].pack_name,
+			int cmp = strcmp(ctx.info[i].pack_name,
 					 packs_to_drop->items[drop_index].string);
 			if (!cmp) {
 				drop_index++;
-				packs.info[i].expired = 1;
+				ctx.info[i].expired = 1;
 			} else if (cmp > 0) {
 				error(_("did not see pack-file %s to drop"),
 				      packs_to_drop->items[drop_index].string);
@@ -882,7 +858,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 				missing_drops++;
 				i--;
 			} else {
-				packs.info[i].expired = 0;
+				ctx.info[i].expired = 0;
 			}
 		}
@@ -898,19 +874,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	 *
 	 * pack_perm[old_id] = new_id
 	 */
-	ALLOC_ARRAY(pack_perm, packs.nr);
+	ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
-	for (i = 0; i < packs.nr; i++) {
+	for (i = 0; i < ctx.nr; i++) {
-		if (packs.info[i].expired) {
+		if (ctx.info[i].expired) {
 			dropped_packs++;
-			pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED;
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
 		} else {
-			pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs;
+			ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
 		}
 	}
-	for (i = 0; i < packs.nr; i++) {
+	for (i = 0; i < ctx.nr; i++) {
-		if (!packs.info[i].expired)
+		if (!ctx.info[i].expired)
-			pack_name_concat_len += strlen(packs.info[i].pack_name) + 1;
+			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
@@ -921,123 +897,52 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 	FREE_AND_NULL(midx_name);
-	if (packs.m)
+	if (ctx.m)
-		close_midx(packs.m);
+		close_midx(ctx.m);
-	cur_chunk = 0;
-	num_chunks = large_offsets_needed ? 5 : 4;
-	if (packs.nr - dropped_packs == 0) {
+	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
 		goto cleanup;
 	}
-	written = write_midx_header(f, num_chunks, packs.nr - dropped_packs);
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
-	chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len;
-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE;
-	cur_chunk++;
-	chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz;
-	cur_chunk++;
-	chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH;
-	if (large_offsets_needed) {
-		chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS;
-		cur_chunk++;
-		chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] +
-					   num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
-	}
-	chunk_ids[cur_chunk] = 0;
-	for (i = 0; i <= num_chunks; i++) {
-		if (i && chunk_offsets[i] < chunk_offsets[i - 1])
-			BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64,
-			    chunk_offsets[i - 1],
-			    chunk_offsets[i]);
-		if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT)
-			BUG("chunk offset %"PRIu64" is not properly aligned",
-			    chunk_offsets[i]);
-		hashwrite_be32(f, chunk_ids[i]);
-		hashwrite_be64(f, chunk_offsets[i]);
-		written += MIDX_CHUNKLOOKUP_WIDTH;
-	}
-	if (flags & MIDX_PROGRESS)
-		progress = start_delayed_progress(_("Writing chunks to multi-pack-index"),
-					  num_chunks);
-	for (i = 0; i < num_chunks; i++) {
-		if (written != chunk_offsets[i])
-			BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
-			    chunk_offsets[i],
-			    written,
-			    chunk_ids[i]);
-		switch (chunk_ids[i]) {
-			case MIDX_CHUNKID_PACKNAMES:
-				written += write_midx_pack_names(f, packs.info, packs.nr);
-				break;
-			case MIDX_CHUNKID_OIDFANOUT:
-				written += write_midx_oid_fanout(f, entries, nr_entries);
-				break;
-			case MIDX_CHUNKID_OIDLOOKUP:
-				written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries);
-				break;
-			case MIDX_CHUNKID_OBJECTOFFSETS:
-				written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries);
-				break;
-			case MIDX_CHUNKID_LARGEOFFSETS:
-				written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries);
-				break;
-			default:
-				BUG("trying to write unknown chunk id %"PRIx32,
-				    chunk_ids[i]);
-		}
-		display_progress(progress, i + 1);
-	}
-	stop_progress(&progress);
-	if (written != chunk_offsets[num_chunks])
-		BUG("incorrect final offset %"PRIu64" != %"PRIu64,
-		    written,
-		    chunk_offsets[num_chunks]);
+	cf = init_chunkfile(f);
+	add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
+		  write_midx_pack_names);
+	add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
+		  write_midx_oid_fanout);
+	add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
+		  (size_t)ctx.entries_nr * the_hash_algo->rawsz,
+		  write_midx_oid_lookup);
+	add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
+		  (size_t)ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH,
+		  write_midx_object_offsets);
+	if (ctx.large_offsets_needed)
+		add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
+			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
+			write_midx_large_offsets);
+	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
+	write_chunkfile(cf, &ctx);
 	finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+	free_chunkfile(cf);
 	commit_lock_file(&lk);
 cleanup:
-	for (i = 0; i < packs.nr; i++) {
+	for (i = 0; i < ctx.nr; i++) {
-		if (packs.info[i].p) {
+		if (ctx.info[i].p) {
-			close_pack(packs.info[i].p);
+			close_pack(ctx.info[i].p);
-			free(packs.info[i].p);
+			free(ctx.info[i].p);
 		}
-		free(packs.info[i].pack_name);
+		free(ctx.info[i].pack_name);
 	}
-	free(packs.info);
+	free(ctx.info);
-	free(entries);
+	free(ctx.entries);
-	free(pack_perm);
+	free(ctx.pack_perm);
 	free(midx_name);
 	return result;
 }
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -585,7 +585,7 @@ test_expect_success 'detect bad hash version' '
 test_expect_success 'detect low chunk count' '
 	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
-		"missing the .* chunk"
+		"final chunk has non-zero id"
 '
 test_expect_success 'detect missing OID fanout chunk' '
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' '
 test_expect_success 'verify truncated chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
-		"missing required"
+		"final chunk has non-zero id"
 '
 test_expect_success 'verify extended chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
-		"terminating multi-pack-index chunk id appears earlier than expected"
+		"terminating chunk id appears earlier than expected"
 '
 test_expect_success 'verify missing required chunk' '
@@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' '
 test_expect_success 'verify invalid chunk offset' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
-		"invalid chunk offset (too large)"
+		"improper chunk offset(s)"
 '
 test_expect_success 'verify packnames out of order' '