In reftable/writer.c:writer_index_hash(), if `reftable_buf_add` fails,
the key allocated by `reftable_malloc` will not be inserted into
`obj_index_tree` and thus leaks. Simply adding reftable_free(key)
solves this problem.
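A minimal sketch of the fixed error path, with the surrounding logic of
writer_index_hash() condensed and field names approximated:

    key = reftable_malloc(sizeof(*key));
    if (!key)
            return REFTABLE_OUT_OF_MEMORY_ERROR;

    reftable_buf_init(&key->hash);
    err = reftable_buf_add(&key->hash, hash->buf, hash->len);
    if (err < 0) {
            reftable_free(key); /* previously leaked on this path */
            return err;
    }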
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable/writer.c:padded_write(), if w->writer fails, the `zeroed`
buffer allocated via `reftable_calloc` will leak. w->writer could be
`reftable_write_data` in reftable/stack.c and could fail due to some
write error. Simply adding reftable_free(zeroed) solves this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Overhaul of the reftable API.
* ps/reftable-api-revamp:
reftable/table: move printing logic into test helper
reftable/constants: make block types part of the public interface
reftable/table: introduce iterator for table blocks
reftable/table: add `reftable_table` to the public interface
reftable/block: expose a generic iterator over reftable records
reftable/block: make block iterators reseekable
reftable/block: store block pointer in the block iterator
reftable/block: create public interface for reading blocks
git-zlib: use `struct z_stream_s` instead of typedef
reftable/block: rename `block_reader` to `reftable_block`
reftable/block: rename `block` to `block_data`
reftable/table: move reading block into block reader
reftable/block: simplify how we track restart points
reftable/blocksource: consolidate code into a single file
reftable/reader: rename data structure to "table"
reftable: fix formatting of the license header
Make the code in the reftable library less reliant on the service
routines it used to borrow from Git proper, to make it easier to use
for external users of the library.
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
Now that reftable blocks can be read individually via the public
interface it becomes necessary for callers to be able to distinguish the
different types of blocks. Expose the relevant constants.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The license headers used across the reftable library don't follow our
typical coding style for multi-line comments. Fix them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, write_object_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.
Update write_object_record() to inspect the error code returned by
block_writer_add() and only flush and reinitialize the writer when the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.
If the flush and reinitialization still fail with
REFTABLE_ENTRY_TOO_BIG_ERROR, reset the record's offset length to zero
before a final attempt.
All call sites now handle various error codes returned by
block_writer_add().
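The resulting control flow looks roughly like the sketch below; names
follow reftable/writer.c, but the actual patch differs in its details:

    err = block_writer_add(w->block_writer, &rec);
    if (err == 0)
            goto done;
    if (err != REFTABLE_ENTRY_TOO_BIG_ERROR)
            goto done; /* propagate allocation/IO errors instead of retrying */

    /* The block is full: flush it, start a new one and retry. */
    err = writer_flush_block(w);
    if (err < 0)
            goto done;
    err = writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
    if (err < 0)
            goto done;

    err = block_writer_add(w->block_writer, &rec);
    if (err == REFTABLE_ENTRY_TOO_BIG_ERROR) {
            /* Still too big: drop the offsets and make a final attempt. */
            rec.u.obj.offset_len = 0;
            err = block_writer_add(w->block_writer, &rec);
    }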
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, writer_add_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.
Update writer_add_record() to inspect the error code returned by
block_writer_add() and only flush and reinitialize the writer when the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the `REFTABLE_UNUSED` annotation and replace all existing
users of `UNUSED` in the reftable library to use the new macro instead.
Note that we unconditionally define `MAYBE_UNUSED` in the exact same
way, so doing so unconditionally for `REFTABLE_UNUSED` should be fine,
too.
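A sketch of the annotation and a hypothetical use; the real definition
lives in reftable/basics.h:

    /* Defined unconditionally, mirroring how MAYBE_UNUSED is defined. */
    #define REFTABLE_UNUSED __attribute__((__unused__))

    /* Hypothetical callback that must accept a parameter it never uses. */
    static int noop_flush(void *arg REFTABLE_UNUSED)
    {
            return 0;
    }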
Suggested-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're using a mixture of big endian conversion functions provided both
by the reftable library and by the Git codebase. Refactor the
code so that we exclusively use reftable-provided wrappers in order to
untangle us from the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `BUG()` in the remaining trivial cases that we still have in
the reftable library. Instead of aborting the program, we'll now bubble
up a `REFTABLE_API_ERROR` to indicate misuse of the calling conventions.
Note that in both `reftable_reader_{inc,dec}ref()` we simply stop
calling `BUG()` altogether. The only situation where the counter should
be zero is when the structure has already been free'd anyway, so we
would run into undefined behaviour regardless of whether we try to abort
the program or not.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* kn/reflog-migration-fix-followup:
reftable: prevent 'update_index' changes after adding records
refs: use 'uint64_t' for 'ref_update.index'
refs: mark `ref_transaction_update_reflog()` as static
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.
Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.
To protect against this bug, prevent callers from updating these values
after any record is written. To do this, modify the function to return
an error whenever the limits are modified after any record has been
added. Check whether records have been added within
`reftable_writer_set_limits()` by checking the `last_key` and `next`
variables. The former is updated after each record is added, but is
reset at certain points. The latter is set after writing the first
block.
Modify all callers of the function to check the returned value and
handle errors accordingly. Add a unit test to also ensure the function
returns the error as expected.
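A sketch of the resulting guard, with the writer's fields simplified:

    int reftable_writer_set_limits(struct reftable_writer *w,
                                   uint64_t min, uint64_t max)
    {
            /*
             * Records have already been added (last_key is set) or the
             * first block has already been written (next is non-zero), so
             * the header may already contain the old limits.
             */
            if (w->last_key.len > 0 || w->next > 0)
                    return REFTABLE_API_ERROR;

            w->min_update_index = min;
            w->max_update_index = max;
            return 0;
    }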
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `common_prefix_size()` function computes the length of the common
prefix between two buffers. As such its return value will always be an
unsigned integer, as the length cannot be negative. Regardless of that,
the function returns a signed integer, which is nonsensical and causes a
couple of -Wsign-compare warnings all over the place.
Adjust the function to return a `size_t` instead.
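The adjusted function boils down to something like this sketch, assuming
the `reftable_buf` type it operates on:

    size_t common_prefix_size(struct reftable_buf *a, struct reftable_buf *b)
    {
            size_t p = 0;
            while (p < a->len && p < b->len && a->buf[p] == b->buf[p])
                    p++;
            return p;
    }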
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() and reftable_buf_add() avoid leaking by restoring the
original pointer value on failure, but all other callers seem to be OK
with losing the old allocation. Add a new variant of the macro,
REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the
allocation counter. Use it for those callers.
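A hypothetical caller then looks roughly as follows; the field names are
made up for illustration, but the failed grow becomes detectable through
the NULLed pointer:

    REFTABLE_ALLOC_GROW_OR_NULL(w->index, w->index_len + 1, w->index_cap);
    if (!w->index)
            return REFTABLE_OUT_OF_MEMORY_ERROR;
    w->index[w->index_len++] = entry;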
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable backend adds a check for the upper limit of a log's
update_index.
* kn/reftable-writer-log-write-verify:
reftable/writer: ensure valid range for log's update_index
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.
* ps/reftable-detach:
reftable/system: provide thin wrapper for lockfile subsystem
reftable/stack: drop only use of `get_locked_file_path()`
reftable/system: provide thin wrapper for tempfile subsystem
reftable/stack: stop using `fsync_component()` directly
reftable/system: stop depending on "hash.h"
reftable: explicitly handle hash format IDs
reftable/system: move "dir.h" to its only user
Each reftable addition has an associated update_index. While writing
refs, the update_index is verified to be within the range of the
reftable writer, i.e. `writer.min_update_index <= ref.update_index` and
`writer.max_update_index >= ref.update_index`.
The corresponding check for reflogs in `reftable_writer_add_log` is
however missing. Add a similar check, but only check for the upper
limit. This is because reflogs are treated a bit differently than refs.
Each reflog entry in reftable has an associated update_index and we also
allow expiring entries in the middle, which is done by simply writing a
new reflog entry with the same update_index. This means that writing
reflog entries with an update_index lower than the writer's
min_update_index is an expected scenario.
Add a new unit test to check for the limits. Also fix some of the
existing tests, which were setting arbitrary values for the
update_index, by ensuring they stay within the now-checked limits.
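The added check amounts to roughly the following, simplified:

    /* In reftable_writer_add_log(): only the upper bound is enforced. */
    if (log->update_index > w->max_update_index)
            return REFTABLE_API_ERROR;
    /*
     * There is no lower-bound check: expiring a reflog entry rewrites it
     * with the same, older update_index, so values below min_update_index
     * are expected.
     */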
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `struct block_writer` and `struct reftable_writer` have a `buf`
member that is being reused to optimize the number of allocations.
Rename the variable to `scratch` to clarify its intent and provide a
comment explaining why it exists.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `writer_add_record()` and `reftable_writer_add_ref()` get executed
for every single ref record we're adding to the reftable writer. And as
both functions use a local buffer to write data, the allocations we have
to do here add up during larger transactions.
Refactor the code to use a scratch buffer that is part of the
`reftable_writer` itself such that we can reuse it. This significantly
reduces the number
of allocations during large transactions, e.g. when migrating refs from
the "files" backend to the "reftable" backend. Before this change:
HEAP SUMMARY:
in use at exit: 80,048 bytes in 49 blocks
total heap usage: 5,032,171 allocs, 5,032,122 frees, 418,792,092 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 80,048 bytes in 49 blocks
total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated
This also translates into a small speedup:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 827.2 ms ± 16.5 ms [User: 689.4 ms, System: 124.9 ms]
Range (min … max): 809.0 ms … 924.7 ms 50 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 813.6 ms ± 11.6 ms [User: 679.0 ms, System: 123.4 ms]
Range (min … max): 786.7 ms … 833.5 ms 50 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.02 ± 0.02 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include "hash.h" in "reftable/system.h" such that we can use hash
format IDs as well as the raw size of SHA1 and SHA256. As we are in the
process of converting the reftable library to become standalone we of
course cannot rely on those constants anymore.
Introduce a new `enum reftable_hash` to replace internal uses of the
hash format IDs and new constants that replace internal uses of the hash
size. Adapt the reftable backend to set up the correct hash function.
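The new interface has roughly the following shape; the enumerator values
shown here are placeholders, while the sizes match SHA-1 and SHA-256:

    enum reftable_hash {
            REFTABLE_HASH_SHA1 = 1,   /* placeholder value */
            REFTABLE_HASH_SHA256 = 2, /* placeholder value */
    };

    #define REFTABLE_HASH_SIZE_SHA1   20
    #define REFTABLE_HASH_SIZE_SHA256 32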
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hash format IDs are used for two different things across the
reftable codebase:
- They are used as a 32 bit unsigned integer when reading and writing
the header in order to identify the hash function.
- They are used internally to identify which hash function is in use.
When one only considers the second use case one might think that one
can easily change the representation of those hash IDs. But because
those IDs end up in the reftable header and footer on disk, it is
important that they never change.
Create separate constants `REFTABLE_FORMAT_ID_*` and use them in
contexts where we read or write reftable headers. This serves multiple
purposes:
- It allows us to more easily discern cases where we actually use
those constants for the on-disk format.
- It detangles us from the same constants that are defined in
libgit.a, which is another required step to convert the reftable
library to become standalone.
- It makes the next step easier where we stop using `GIT_*_FORMAT_ID`
constants in favor of a custom enum.
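The separated constants spell out the hash names in ASCII, matching what
is written to the reftable header and footer:

    /* On-disk format IDs; these values must never change. */
    #define REFTABLE_FORMAT_ID_SHA1   0x73686131 /* "sha1" */
    #define REFTABLE_FORMAT_ID_SHA256 0x73323536 /* "s256" */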
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the reftable library such that we handle failures with the
new `reftable_buf` interfaces.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `reftable_record_key()` function cannot pass any errors to the
caller as it has a `void` return type. Adapt it and its callers such
that we can handle errors and start handling allocation failures.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Convert the reftable library to use the `reftable_buf` interface instead
of the `strbuf` interface. This is mostly a mechanical change via sed(1)
with some manual fixes where functions for `strbuf` and `reftable_buf`
differ. The converted code does not yet handle allocation failures. This
will be handled in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We're about to introduce our own `reftable_buf` type to replace
`strbuf`. Get rid of the seldom-used `strbuf_addbuf()` function such
that we have to reimplement one less function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We have several calls to `FREE_AND_NULL()` in the reftable library,
which of course uses free(3P). As the reftable allocators are pluggable
we should rather call the reftable-specific function, which is
`reftable_free()`.
Introduce a new macro `REFTABLE_FREE_AND_NULL()` and adapt the callsites
accordingly.
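The macro is a direct analogue of FREE_AND_NULL(), just built on top of
the pluggable allocator:

    #define REFTABLE_FREE_AND_NULL(p) \
            do { reftable_free(p); (p) = NULL; } while (0)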
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle trivial allocation failures in the reftable library and its unit
tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree interfaces of the reftable library handle both insertion and
searching of tree nodes with a single function, where the behaviour is
altered between the two via an `insert` bit. This makes it quite awkward
to handle allocation failures because on inserting we'd have to check
for `NULL` pointers and return an error, whereas on searching entries we
don't have to handle it as an allocation error.
Split up concerns of this function into two separate functions, one for
inserting entries and one for searching entries. This makes it easy for
us to check for allocation errors as `tree_insert()` should never return
a `NULL` pointer now. Adapt callers accordingly.
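The split interface roughly looks as follows; the exact prototypes in
reftable/tree.h may differ:

    struct tree_node;

    /* Returns the matching node, or NULL when the key is not present. */
    struct tree_node *tree_search(struct tree_node *tree, void *key,
                                  int (*compare)(const void *, const void *));

    /* Returns the inserted (or existing) node, or NULL on allocation failure. */
    struct tree_node *tree_insert(struct tree_node **rootp, void *key,
                                  int (*compare)(const void *, const void *));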
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `block_writer_init()` and
`block_reader_init()`. This requires us to bubble up error codes into
`writer_reinit_block_writer()`. Adapt call sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_new_writer()`. Adapt the
function to return an error code to report such failures. While at it,
rename it to match our code style as we have to touch up every callsite
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation errors in `writer_index_hash()`. Adjust its only
caller in `reftable_writer_add_ref()` accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable code uses a lot of virtual function pointers, but many of
the concrete implementations do not need all of the parameters.
For the most part these are obviously fine to just mark as UNUSED (e.g.,
the empty_iterator functions unsurprisingly do not do anything). Here
are a few cases where I dug a little deeper (but still ended up just
marking them UNUSED):
- the iterator exclude_patterns is best-effort and optional (though it
would be nice to support in the long run as an optimization)
- ignoring the ref_store in many transaction functions is unexpected,
but works because the ref_transaction itself carries enough
information to do what we need.
- ignoring "err" for in some cases (e.g., transaction abort) is OK
because we do not return any errors. It is a little odd for
reftable_be_create_reflog(), though, since we do return errors
there. We should perhaps be creating string error messages at this
layer, but I've punted on that for now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable format only supports block sizes up to 16MB. When the
writer is being passed a value bigger than that it simply calls
abort(3P), which isn't all that helpful due to the lack of a proper
error message.
Improve this by calling `BUG()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a static variable in the reftable writer code that is merely
used to initialize the `last_key` of the writer. Convert the code to
instead use `strbuf_init()` and drop the variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes pass the reftable write options by value and sometimes by
pointer. This is quite confusing and makes the reader wonder whether the
options get modified sometimes.
In fact, `reftable_new_writer()` does cause the caller-provided options
to get updated when some values aren't set up. This is quite unexpected,
but didn't cause any harm until now.
Adapt the code so that we do not modify the caller-provided values
anymore. While at it, refactor the code to consistently pass the
options as a constant pointer to clarify that the caller-provided opts
will not ever get modified.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable writer tracks the last key that it has written so that it
can properly compute the compressed prefix for the next record it is
about to write. This last key must be reset whenever we move on to write
the next block, which is done in `writer_reinit_block_writer()`. We do
this by calling `strbuf_release()` though, which needlessly deallocates
the underlying buffer.
Convert the code to use `strbuf_reset()` instead, which saves one
allocation per block we're about to write. This requires us to also
amend `reftable_writer_free()` to release the buffer's memory now as we
previously seemingly relied on `writer_reinit_block_writer()` to release
the memory for us. Releasing memory here is the right thing to do
anyway.
While at it, convert a callsite where we truncate the buffer by setting
its length to zero to instead use `strbuf_reset()`, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two code paths which release memory of the reftable writer:
- `reftable_writer_close()` releases internal state after it has
written data.
- `reftable_writer_free()` releases the block that was written to and
the writer itself.
Both code paths free different parts of the writer, and consequently the
caller must make sure to call both. And while callers mostly do this
already, this falls apart when a write failure causes the caller to skip
calling `reftable_writer_close()`.
Introduce a new function `reftable_writer_release()` that releases all
internal state and call it from both paths. Like this it is fine for the
caller to not call `reftable_writer_close()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_flush_nonempty_block()` such that it conforms
better to it and add some documentation that explains some of its more
intricate behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_add_record()` such that it conforms better to it
and add some documentation that explains some of its more intricate
behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up in various reftable code paths.
* ps/reftable-styles:
reftable/record: improve semantics when initializing records
reftable/merged: refactor initialization of iterators
reftable/merged: refactor seeking of records
reftable/stack: use `size_t` to track stack length
reftable/stack: use `size_t` to track stack slices during compaction
reftable/stack: index segments with `size_t`
reftable/stack: fix parameter validation when compacting range
reftable: introduce macros to allocate arrays
reftable: introduce macros to grow arrays
Similar to the preceding commit, let's carry over macros to allocate
arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This
requires us to change the signature of `reftable_calloc()`, which only
takes a single argument right now and thus puts the burden on the caller
to calculate the final array's size. This is a net improvement though as
it means that we can now provide proper overflow checks when multiplying
the array size with the member size.
Convert callsites of `reftable_calloc()` to the new signature and start
using the new macros where possible.
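A sketch of the changed allocator and the new helpers; the overflow
checks mentioned above are elided here for brevity:

    void *reftable_calloc(size_t nelem, size_t elsize);

    #define REFTABLE_ALLOC_ARRAY(x, alloc) \
            (x) = reftable_malloc(sizeof(*(x)) * (alloc))
    #define REFTABLE_CALLOC_ARRAY(x, alloc) \
            (x) = reftable_calloc((alloc), sizeof(*(x)))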
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Throughout the reftable library we have many cases where we need to grow
arrays. In order to avoid too many reallocations, we roughly double the
capacity of the array on each iteration. The resulting code pattern is
duplicated across many sites.
We have similar patterns in our main codebase, which is why we have
eventually introduced an `ALLOC_GROW()` macro to abstract it away and
avoid some code duplication. We cannot easily reuse this macro here
though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will
call realloc(3P) to grow the array. The reftable code is structured as a
library though (even if the boundaries are fuzzy), and one property this
brings with it is that it is possible to plug in your own allocators. So
instead of using realloc(3P), we need to use `reftable_realloc()` that
knows to use the user-provided implementation.
So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and
`REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase,
with two modifications:
- They use `reftable_realloc()`, as explained above.
- They use a different growth factor of `2 * cap + 1` instead of `(cap
+ 16) * 3 / 2`.
The second change is because we know a bit more about the allocation
patterns in the reftable library. In most cases, we end up only having a
handful of items in the array and don't end up growing them. The initial
capacity that our normal growth factor uses (which is 24) would thus end
up over-allocating in a lot of code paths. This effect is measurable:
- Before change:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated
- After change with a growth factor of `(2 * alloc + 1)`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated
- After change with a growth factor of `(alloc + 16) * 3 / 2`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated
While the total heap usage is roughly the same, we do end up allocating
significantly more bytes with our usual growth factor (in fact, roughly
21 times as many).
Convert the reftable library to use these new macros.
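A sketch of the two macros; overflow checking is again omitted here:

    #define REFTABLE_REALLOC_ARRAY(x, alloc) \
            (x) = reftable_realloc((x), sizeof(*(x)) * (alloc))

    #define REFTABLE_ALLOC_GROW(x, nr, alloc) \
            do { \
                    if ((nr) > (alloc)) { \
                            (alloc) = 2 * (alloc) + 1; \
                            if ((alloc) < (nr)) \
                                    (alloc) = (nr); \
                            REFTABLE_REALLOC_ARRAY(x, alloc); \
                    } \
            } while (0)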
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way the index gets written and read is not trivial at all and
requires the reader to piece together a bunch of parts to figure out how
it works. Add some documentation to hopefully make this easier to
understand for the next reader.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When finishing a section we will potentially write an index that makes
it more efficient to look up relevant blocks. The index records written
will encode, for each block of the indexed section, what the offset of
that block is as well as the last key of that block. Thus, the reader
would iterate through the index records to find the first key larger or
equal to the wanted key and then use the encoded offset to look up the
desired block.
When there are a lot of blocks to index though we may end up writing
multiple index blocks, too. To not require a linear search across all
index blocks we instead end up writing a multi-level index. Instead of
referring to the block we are after, an index record may point to
another index block. The reader will then access the highest-level index
and follow down the chain of index blocks until it hits the sought-after
block.
It has been observed though that it is impossible to seek ref records of
the last ref block when using a multi-level index. While the multi-level
index exists and looks fine for the most part, the highest-level
index was missing an index record pointing to the last block of the next
index. Thus, every additional level made more refs become unseekable at
the end of the ref section.
The root cause is that we are not flushing the last block of the current
level once done writing the level. Consequently, it wasn't recorded in
the blocks that need to be indexed by the next-higher level and thus we
forgot about it.
Fix this bug by flushing blocks after we have written all index records.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When finishing the current section some index records might be written
for the section to the table. The logic that adds these records to the
writer duplicates what we already have in `writer_add_record()`, making
this more complicated than it really has to be.
Simplify the code by using `writer_add_record()` instead. While at it,
drop the unneeded braces around a loop to make the code conform to our
code style better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable writer is tracking the number of blocks it has to index via
the `index_len` variable. But while this variable is of type `size_t`,
some sites use an `int` to loop through the index entries.
Convert the code to consistently use `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the reffiles backend honors configured fsync settings, the
reftable backend does not. Address this by fsyncing reftable files using
the write-or-die api's fsync_component() in two places: when we
add additional entries into the table, and when we close the reftable
writer.
This commit adds a flush function pointer as a new member of
reftable_writer because we are not sure that the first argument to the
*write function pointer always contains a file descriptor. In the case of
strbuf_add_void, the first argument is a buffer. This way, we can pass
in a corresponding flush function that knows how to flush depending on
which writer is being used.
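With the new member, constructing a writer looks roughly like the
declaration below; the exact prototype in reftable/reftable-writer.h may
differ:

    struct reftable_writer *
    reftable_new_writer(ssize_t (*writer_func)(void *, const void *, size_t),
                        int (*flush_func)(void *),
                        void *writer_arg,
                        struct reftable_write_options *opts);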
This patch does not contain tests as they will need to wait for another
patch to start to exercise the reftable backend. At that point, the
tests will be added to observe that fsyncs are happening when the
reftable is in use.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.
As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:
1. We write all blocks for the corresponding record. Each block that
gets written is added to a list of blocks to index.
2. Once all blocks were written we finish the section. If at least two
blocks have been added to the list of blocks to index then we will
now write the index for those blocks and flush it, as well.
When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.
Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.
Fix this corruption by clearing the index after having written the last
block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>