kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	3151e6a121	Merge branch 'ps/reftable-alloc-failures-zalloc-fix' Recent reftable updates mistook a NULL return from a request for 0-byte allocation as OOM and died unnecessarily, which has been corrected. * ps/reftable-alloc-failures-zalloc-fix: reftable/basics: return NULL on zero-sized allocations reftable/stack: fix zero-sized allocation when there are no readers reftable/merged: fix zero-sized allocation when there are no readers reftable/stack: don't perform auto-compaction with less than two tables	2024-12-23 09:32:06 -08:00
Patrick Steinhardt	5ab83521cf	reftable/merged: fix zero-sized allocation when there are no readers It was reported [1] that Git started to fail with an out-of-memory error when initializing repositories with the reftable backend on NonStop platforms. A bisect led to `802c0646ac` (reftable/merged: handle allocation failures in `merged_table_init_iter()`, 2024-10-02), which changed how we allocate memory when initializing a merged table. The root cause of this seems to be that NonStop returns a `NULL` pointer when doing a zero-sized allocation. This would've already happened before the above change, but we never noticed because we did not check the result. Now we do notice and thus return an out-of-memory error to the caller. Fix the issue by skipping the allocation altogether in case there are no readers. [1]: <00ad01db5017$aa9ce340$ffd6a9c0$@nexbridge.com> Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-22 00:58:23 -08:00
Patrick Steinhardt	9d471b9dfe	reftable/merged: drain priority queue on reseek In `5bf96e0c39` (reftable/generic: move seeking of records into the iterator, 2024-05-13) we have refactored the reftable codebase such that iterators can be initialized once and then re-seeked multiple times. This feature is used by `1869525066` (refs/reftable: wire up support for exclude patterns, 2024-09-16) in order to skip records based on exclude patterns provided by the caller. The logic to re-seek the merged iterator is insufficient though because we don't drain the priority queue on a re-seek. This means that the queue may contain stale entries and thus reading the next record in the queue will return the wrong entry. While this is an obvious bug, it is harmless in the context of above exclude patterns: - If the queue contained stale entries that match the pattern then the caller would already know to filter out such refs. This is because our codebase is prepared to handle backends that don't have a way to efficiently implement exclude patterns. - If the queue contained stale entries that don't match the pattern we'd eventually filter out any duplicates. This is because the reftable code discards items with the same ref name and sorts any remaining entries properly. So things happen to work in this context regardless of the bug, and there is no other use case yet where we re-seek iterators. We're about to introduce a caching mechanism though where iterators are reused by the reftable backend, and that will expose the bug. Fix the issue by draining the priority queue when seeking and add a testcase that surfaces the issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-26 17:18:38 +09:00
Patrick Steinhardt	c2f08236ed	reftable/system: stop depending on "hash.h" We include "hash.h" in "reftable/system.h" such that we can use hash format IDs as well as the raw size of SHA1 and SHA256. As we are in the process of converting the reftable library to become standalone we of course cannot rely on those constants anymore. Introduce a new `enum reftable_hash` to replace internal uses of the hash format IDs and new constants that replace internal uses of the hash size. Adapt the reftable backend to set up the correct hash function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-19 12:23:10 +09:00
Patrick Steinhardt	12b9078066	reftable: handle trivial allocation failures Handle trivial allocation failures in the reftable library and its unit tests. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-02 07:53:55 -07:00
Patrick Steinhardt	d0501c8c9d	reftable/pq: handle allocation failures when adding entries Handle allocation failures when adding entries to the pqueue. Adapt its only caller accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-02 07:53:55 -07:00
Patrick Steinhardt	0a8372f509	reftable/reader: handle allocation failures in `reader_init_iter()` Handle allocation failures in `reader_init_iter()`. This requires us to also adapt `reftable_reader_init_*_iterator()` to bubble up the new error codes. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-02 07:53:53 -07:00
Patrick Steinhardt	802c0646ac	reftable/merged: handle allocation failures in `merged_table_init_iter()` Handle allocation failures in `merged_table_init_iter()`. While at it, merge `merged_iter_init()` into the function. It only has a single caller and merging them makes it easier to handle allocation failures consistently. This change also requires us to adapt `reftable_stack_init_*_iterator()` to bubble up the new error codes of `merged_table_iter_init()`. Adapt callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-02 07:53:53 -07:00
Patrick Steinhardt	6014639837	reftable/generic: drop interface The `reftable_table` interface provides a generic infrastructure that can abstract away whether the underlying table is a single table, or a merged table. This abstraction can make it rather hard to reason about the code. We didn't ever use it to implement the reftable backend, and with the preceding patches in this patch series we in fact don't use it at all anymore. Furthermore, it became somewhat useless with the recent refactorings that made it possible to seek reftable iterators multiple times, as these now provide generic access to tables for us. The interface is thus redundant and only brings unnecessary complexity with it. Remove the `struct reftable_table` interface and its associated functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-22 07:59:48 -07:00
Patrick Steinhardt	b8ca235ca5	reftable/merged: stop using generic tables in the merged table The merged table provides access to a reftable stack by merging the contents of those tables into a virtual table. These subtables are being tracked via `struct reftable_table`, which is a generic interface for accessing either a single reftable or a merged reftable. So in theory, it would be possible for the merged table to merge together other merged tables. This is somewhat nonsensical though: we only ever set up a merged table over normal reftables, and there is no reason to do otherwise. This generic interface thus makes the code way harder to follow and reason about than really necessary. The abstraction layer may also have an impact on performance, even though the extra set of vtable function calls probably doesn't really matter. Refactor the merged tables to use a `struct reftable_reader` for each of the subtables instead, which gives us direct access to the underlying tables. Adjust names accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-22 07:59:46 -07:00
Patrick Steinhardt	6631ed3ce7	reftable/merged: rename `reftable_new_merged_table()` Rename `reftable_new_merged_table()` to `reftable_merged_table_new()` such that the name matches our coding style. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-22 07:59:46 -07:00
Patrick Steinhardt	987762a51a	reftable/merged: expose functions to initialize iterators We do not expose any functions via our public headers that would allow a caller to initialize a reftable iterator from a merged table. Instead, they are expected to go via the generic `reftable_table` interface, which is somewhat roundabout. Implement two new functions to initialize iterators for ref and log records to plug this gap. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-22 07:59:45 -07:00
Junio C Hamano	40a163f217	Merge branch 'ps/ref-storage-migration' A new command has been added to migrate a repository that uses the files backend for its ref storage to use the reftable backend, with limitations. * ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version	2024-06-17 15:55:55 -07:00
Patrick Steinhardt	b5d7db9e83	reftable: inline `merged_table_release()` The function `merged_table_release()` releases a merged table, whereas `reftable_merged_table_free()` releases a merged table and then also free's its pointer. But all callsites of `merged_table_release()` are in fact followed by `reftable_merged_table_free()`, which is redundant. Inline `merged_table_release()` into `reftable_merged_table_free()` to get rid of this redundance. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-06 09:04:32 -07:00
Patrick Steinhardt	369b84196e	reftable/merged: adapt interface to allow reuse of iterators Refactor the interfaces exposed by `struct reftable_merged_table` and `struct merged_iter` such that they support iterator reuse. This is done by separating initialization of the iterator and seeking on it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-13 17:04:19 -07:00
Patrick Steinhardt	08efe69212	reftable/stack: provide convenience functions to create iterators There exist a bunch of call sites in the reftable backend that want to create iterators for a reftable stack. This is rather convoluted right now, where you always have to go via the merged table. And it is about to become even more convoluted when we split up iterator initialization and seeking in the next commit. Introduce convenience functions that allow the caller to create an iterator from a reftable stack directly without going through the merged table. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-13 17:04:19 -07:00
Patrick Steinhardt	5bf96e0c39	reftable/generic: move seeking of records into the iterator Reftable iterators are created by seeking on the parent structure of a corresponding record. For example, to create an iterator for the merged table you would call `reftable_merged_table_seek_ref()`. Most notably, it is not posible to create an iterator and then seek it afterwards. While this may be a bit easier to reason about, it comes with two significant downsides. The first downside is that the logic to find records is split up between the parent data structure and the iterator itself. Conceptually, it is more straight forward if all that logic was contained in a single place, which should be the iterator. The second and more significant downside is that it is impossible to reuse iterators for multiple seeks. Whenever you want to look up a record, you need to re-create the whole infrastructure again, which is quite a waste of time. Furthermore, it is impossible to optimize seeks, such as when seeking the same record multiple times. To address this, we essentially split up the concerns properly such that the parent data structure is responsible for setting up the iterator via a new `init_iter()` callback, whereas the iterator handles seeks via a new `seek()` callback. This will eventually allow us to call `seek()` on the iterator multiple times, where every iterator can potentially optimize for certain cases. Note that at this point in time we are not yet ready to reuse the iterators. This will be left for a future patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-13 17:04:18 -07:00
Patrick Steinhardt	701713a254	reftable/merged: simplify indices for subiterators When seeking on a merged table, we perform the seek for each of the subiterators. If the subiterator has the desired record we add it to the priority queue, otherwise we skip it and don't add it to the stack of subiterators hosted by the merged table. The consequence of this is that the index of the subiterator in the merged table does not necessarily correspond to the index of it in the merged iterator. Next to being potentially confusing, it also means that we won't easily be able to re-seek the merged iterator because we have no clear connection between both of the data structures. Refactor the code so that the index stays the same in both structures. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-13 17:04:18 -07:00
Patrick Steinhardt	e08f49a4f5	reftable/merged: split up initialization and seeking of records To initialize a `struct merged_iter`, we need to seek all subiterators to the wanted record and then add their results to the priority queue used to sort the records. This logic is split up across two functions, `merged_table_seek_record()` and `merged_iter_init()`. The scope of these functions is somewhat weird though, where `merged_iter_init()` is only responsible for adding the records of the subiterators to the priority queue. Clarify the scope of those functions such that `merged_iter_init()` is only responsible for initializing the iterator's structure. Performing the subiterator seeks are now part of `merged_table_seek_record()`. This step is required to move seeking of records into the generic `struct reftable_iterator` infrastructure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-13 17:04:17 -07:00
Patrick Steinhardt	080f8c4565	reftable/merged: avoid duplicate pqueue emptiness check When calling `merged_iter_next_void()` we first check whether the iter has been exhausted already. We already perform this check two levels down the stack in `merged_iter_next_entry()` though, which makes this check redundant. Now if this check was there to accelerate the common case it might have made sense to keep it. But the iterator being exhausted is rather the uncommon case because you can expect most reftable stacks to contain more than two refs. Simplify the code by removing the check. As `merged_iter_next_void()` is basically empty except for calling `merged_iter_next()` now, merge these two functions. This also results in a tiny speedup when iterating over many refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 125.6 ms ± 3.8 ms [User: 122.7 ms, System: 2.8 ms] Range (min … max): 122.4 ms … 153.4 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.8 ms] Range (min … max): 120.1 ms … 156.4 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:40 -08:00
Patrick Steinhardt	f8c1a8e2e1	reftable/merged: circumvent pqueue with single subiter The merged iterator uses a priority queue to order records so that we can yielid them in the expected order. This priority queue of course comes with some overhead as we need to add, compare and remove entries in that priority queue. In the general case, that overhead cannot really be avoided. But when we have a single subiter left then there is no need to use the priority queue anymore because the order is exactly the same as what that subiter would return. While having a single subiter may sound like an edge case, it happens more frequently than one might think. In the most common scenario, you can expect a repository to have a single large table that contains most of the records and then a set of smaller tables which contain later additions to the reftable stack. In this case it is quite likely that we exhaust subiters of those smaller stacks before exhausting the large table. Special-case this and return records directly from the remaining subiter. This results in a sizeable speedup when iterating over 1m refs in a repository with a single table: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 135.4 ms ± 4.4 ms [User: 132.5 ms, System: 2.8 ms] Range (min … max): 131.0 ms … 166.3 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 126.3 ms ± 3.9 ms [User: 123.3 ms, System: 2.8 ms] Range (min … max): 122.7 ms … 157.0 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:40 -08:00
Patrick Steinhardt	3b6dd6ad1d	reftable/merged: handle subiter cleanup on close only When advancing one of the subiters fails we immediately release resources associated with that subiter. This is not necessary though as we will release these resources when closing the merged iterator anyway. Drop the logic and only release resources when the merged iterator is done. This is a mere cleanup that should help reduce the cognitive load when reading through the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:39 -08:00
Patrick Steinhardt	2d71a1d4a2	reftable/merged: remove unnecessary null check for subiters Whenever we advance a subiter we first call `iterator_is_null()`. This is not needed though because we only ever advance subiters which have entries in the priority queue, and we do not end entries to the priority queue when the subiter has been exhausted. Drop the check as well as the now-unused function. This results in a surprisingly big speedup: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 138.1 ms ± 4.4 ms [User: 135.1 ms, System: 2.8 ms] Range (min … max): 133.4 ms … 167.3 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 134.4 ms ± 4.2 ms [User: 131.5 ms, System: 2.8 ms] Range (min … max): 130.0 ms … 164.0 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:39 -08:00
Patrick Steinhardt	bb2d6be4c1	reftable/merged: make subiters own their records For each subiterator, the merged table needs to track their current record. This record is owned by the priority queue though instead of by the merged iterator. This is not optimal performance-wise. For one, we need to move around records whenever we add or remove a record from the priority queue. Thus, the bigger the entries the more bytes we need to copy around. And compared to pointers, a reftable record is rather on the bigger side. The other issue is that this makes it harder to reuse the records. Refactor the code so that the merged iterator tracks ownership of the records per-subiter. Instead of having records in the priority queue, we can now use mere pointers to the per-subiter records. This also allows us to swap records between the caller and the per-subiter record instead of doing an actual copy via `reftable_record_copy_from()`, which removes the need to release the caller-provided record. This results in a noticeable speedup when iterating through many refs. The following benchmark iterates through 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 145.5 ms ± 4.5 ms [User: 142.5 ms, System: 2.8 ms] Range (min … max): 141.3 ms … 177.0 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 139.0 ms ± 4.7 ms [User: 136.1 ms, System: 2.8 ms] Range (min … max): 134.2 ms … 182.2 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) This refactoring also allows a subsequent refactoring where we start reusing memory allocated by the reftable records because we do not need to release the caller-provided record anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:39 -08:00
Patrick Steinhardt	aad8ad6fe1	reftable/merged: advance subiter on subsequent iteration When advancing the merged iterator, we pop the topmost entry from its priority queue and then advance the sub-iterator that the entry belongs to, adding the result as a new entry. This is quite sensible in the case where the merged iterator is used to actually iterate through records. But the merged iterator is also used when we look up a single record, only, so advancing the sub-iterator is wasted effort because we would never even look at the result. Instead of immediately advancing the sub-iterator, we can also defer this to the next iteration of the merged iterator by storing the intent-to-advance. This results in a small speedup when reading many records. The following benchmark creates 10000 refs, which will also end up with many ref lookups: Benchmark 1: update-ref: create many refs (revision = HEAD~) Time (mean ± σ): 337.2 ms ± 7.3 ms [User: 200.1 ms, System: 136.9 ms] Range (min … max): 329.3 ms … 373.2 ms 100 runs Benchmark 2: update-ref: create many refs (revision = HEAD) Time (mean ± σ): 332.5 ms ± 5.9 ms [User: 197.2 ms, System: 135.1 ms] Range (min … max): 327.6 ms … 359.8 ms 100 runs Summary update-ref: create many refs (revision = HEAD) ran 1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~) While this speedup alone isn't really worth it, this refactoring will also allow two additional optimizations in subsequent patches. First, it will allow us to special-case when there is only a single sub-iter left to circumvent the priority queue altogether. And second, it makes it easier to avoid copying records to the caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:30 -08:00
Patrick Steinhardt	48929d2e47	reftable/merged: make `merged_iter` structure private The `merged_iter` structure is not used anywhere outside of "merged.c", but is declared in its header. Move it into the code file so that it is clear that its implementation details are never exposed to anything. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:30 -08:00
Junio C Hamano	9f67cbd0a7	Merge branch 'ps/reftable-iteration-perf' The code to iterate over refs with the reftable backend has seen some optimization. * ps/reftable-iteration-perf: reftable/reader: add comments to `table_iter_next()` reftable/record: don't try to reallocate ref record name reftable/block: swap buffers instead of copying reftable/pq: allocation-less comparison of entry keys reftable/merged: skip comparison for records of the same subiter reftable/merged: allocation-less dropping of shadowed records reftable/record: introduce function to compare records by key	2024-02-26 18:10:24 -08:00
Patrick Steinhardt	5730a9dccf	reftable/merged: skip comparison for records of the same subiter When retrieving the next entry of a merged iterator we need to drop all records of other sub-iterators that would be shadowed by the record that we are about to return. We do this by comparing record keys, dropping all keys that are smaller or equal to the key of the record we are about to return. There is an edge case here where we can skip that comparison: when the record in the priority queue comes from the same subiterator as the record we are about to return then we know that its key must be larger than the key of the record we are about to return. This property is guaranteed by the sub-iterators, and if it didn't hold then the whole merged iterator would return records in the wrong order, too. While this may seem like a very specific edge case it's in fact quite likely to happen. For most repositories out there you can assume that we will end up with one large table and several smaller ones on top of it. Thus, it is very likely that the next entry will sort towards the top of the priority queue. Special case this and break out of the loop in that case. The following benchmark uses git-show-ref(1) to print a single ref matching a pattern out of 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 162.6 ms ± 4.5 ms [User: 159.0 ms, System: 3.5 ms] Range (min … max): 156.6 ms … 188.5 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 156.8 ms ± 4.7 ms [User: 153.0 ms, System: 3.6 ms] Range (min … max): 151.4 ms … 188.4 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-12 09:18:04 -08:00
Patrick Steinhardt	a96e9a20f3	reftable/merged: allocation-less dropping of shadowed records The purpose of the merged reftable iterator is to iterate through all entries of a set of tables in the correct order. This is implemented by using a sub-iterator for each table, where the next entry of each of these iterators gets put into a priority queue. For each iteration, we do roughly the following steps: 1. Retrieve the top record of the priority queue. This is the entry we want to return to the caller. 2. Retrieve the next record of the sub-iterator that this record came from. If any, add it to the priority queue at the correct position. The position is determined by comparing the record keys, which e.g. corresponds to the refname for ref records. 3. Keep removing the top record of the priority queue until we hit the first entry whose key is larger than the returned record's key. This is required to drop "shadowed" records. The last step will lead to at least one comparison to the next entry, but may lead to many comparisons in case the reftable stack consists of many tables with shadowed records. It is thus part of the hot code path when iterating through records. The code to compare the entries with each other is quite inefficient though. Instead of comparing record keys with each other directly, we first format them into `struct strbuf`s and only then compare them with each other. While we already optimized this code path to reuse buffers in `829231dc20` (reftable/merged: reuse buffer to compute record keys, 2023-12-11), the cost to format the keys into the buffers still adds up quite significantly. Refactor the code to use `reftable_record_cmp()` instead, which has been introduced in the preceding commit. This function compares records with each other directly without requiring any memory allocations or copying and is thus way more efficient. The following benchmark uses git-show-ref(1) to print a single ref matching a pattern out of 1 million refs. This is the most direct way to exercise ref iteration speed as we remove all overhead of having to show the refs, too. Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 180.7 ms ± 4.7 ms [User: 177.1 ms, System: 3.4 ms] Range (min … max): 174.9 ms … 211.7 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 162.1 ms ± 4.4 ms [User: 158.5 ms, System: 3.4 ms] Range (min … max): 155.4 ms … 189.3 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-12 09:18:04 -08:00
Patrick Steinhardt	3ddef475d0	reftable/record: improve semantics when initializing records According to our usual coding style, the `reftable_new_record()` function would indicate that it is allocating a new record. This is not the case though as the function merely initializes records without allocating any memory. Replace `reftable_new_record()` with a new `reftable_record_init()` function that takes a record pointer as input and initializes it accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-06 12:10:09 -08:00
Patrick Steinhardt	62d3c8e8c8	reftable/merged: refactor initialization of iterators Refactor the initialization of the merged iterator to fit our code style better. This refactoring prepares the code for a refactoring of how records are being initialized. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-06 12:10:09 -08:00
Patrick Steinhardt	59f302ca5a	reftable/merged: refactor seeking of records The code to seek reftable records in the merged table code is quite hard to read and does not conform to our coding style in multiple ways: - We have multiple exit paths where we release resources even though that is not really necessary. - We use a scoped error variable `e` which is hard to reason about. This variable is not required at all. - We allocate memory in the variable declarations, which is easy to miss. Refactor the function so that it becomes more maintainable in the future. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-06 12:10:08 -08:00
Patrick Steinhardt	81879123c3	reftable/stack: use `size_t` to track stack length While the stack length is already stored as `size_t`, we frequently use `int`s to refer to those stacks throughout the reftable library. Convert those cases to use `size_t` instead to make things consistent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-06 12:10:08 -08:00
Patrick Steinhardt	b4ff12c8ee	reftable: introduce macros to allocate arrays Similar to the preceding commit, let's carry over macros to allocate arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This requires us to change the signature of `reftable_calloc()`, which only takes a single argument right now and thus puts the burden on the caller to calculate the final array's size. This is a net improvement though as it means that we can now provide proper overflow checks when multiplying the array size with the member size. Convert callsites of `reftable_calloc()` to the new signature and start using the new macros where possible. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-06 12:10:08 -08:00
Junio C Hamano	481d69dd63	Merge branch 'ps/reftable-fixes-and-optims' More fixes and optimizations to the reftable backend. * ps/reftable-fixes-and-optims: reftable/merged: transfer ownership of records when iterating reftable/merged: really reuse buffers to compute record keys reftable/record: store "val2" hashes as static arrays reftable/record: store "val1" hashes as static arrays reftable/record: constify some parts of the interface reftable/writer: fix index corruption when writing multiple indices reftable/stack: do not auto-compact twice in `reftable_stack_add()` reftable/stack: do not overwrite errors when compacting	2024-01-16 10:11:57 -08:00
Junio C Hamano	492ee03f60	Merge branch 'en/header-cleanup' Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files	2024-01-08 14:05:15 -08:00
Patrick Steinhardt	19b9496c1f	reftable/merged: transfer ownership of records when iterating When iterating over records with the merged iterator we put the records into a priority queue before yielding them to the caller. This means that we need to allocate the contents of these records before we can pass them over to the caller. The handover to the caller is quite inefficient though because we first deallocate the record passed in by the caller and then copy over the new record, which requires us to reallocate memory. Refactor the code to instead transfer ownership of the new record to the caller. So instead of reallocating all contents, we now release the old record and then copy contents of the new record into place. The following benchmark of `git show-ref --quiet` in a repository with around 350k refs shows a clear improvement. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated This shows that we now have roundabout a single allocation per record that we're yielding from the iterator. Ideally, we'd also get rid of this allocation so that the number of allocations doesn't scale with the number of refs anymore. This would require some larger surgery though because the memory is owned by the priority queue before transferring it over to the caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:21 -08:00
Patrick Steinhardt	5473aca376	reftable/merged: really reuse buffers to compute record keys In `829231dc20` (reftable/merged: reuse buffer to compute record keys, 2023-12-11), we have refactored the merged iterator to reuse a pair of long-living strbufs by relying on the fact that `reftable_record_key()` tries to reuse already allocated strbufs by calling `strbuf_reset()`, which should give us significantly fewer reallocations compared to the old code that used on-stack strbufs that are allocated for each and every iteration. Unfortunately, we called `strbuf_release()` on these long-living strbufs that we meant to reuse on each iteration, defeating the optimization. Fix this performance issue by not releasing those buffers on iteration anymore, where we instead rely on `merged_iter_close()` to release the buffers for us. Using `git show-ref --quiet` in a repository with ~350k refs this leads to a significant drop in allocations. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:21 -08:00
Elijah Newren	eea0e59ffb	treewide: remove unnecessary includes in source files Each of these were checked with gcc -E -I. ${SOURCE_FILE} \| grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:31 -08:00
Patrick Steinhardt	829231dc20	reftable/merged: reuse buffer to compute record keys When iterating over entries in the merged iterator's queue, we compute the key of each of the entries and write it into a buffer. We do not reuse the buffer though and thus re-allocate it on every iteration, which is wasteful given that we never transfer ownership of the allocated bytes outside of the loop. Refactor the code to reuse the buffer. This also fixes a potential memory leak when `merged_iter_advance_subiter()` returns an error. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-11 07:23:16 -08:00
Elijah Conners	c18eecbe5c	reftable: use a pointer for pq_entry param The speed of the merged_iter_pqueue_add() can be improved by using a pointer to the pq_entry struct, which is 96 bytes. Since the pq_entry param is worked directly on the stack and does not currently have a pointer to it, the merged_iter_pqueue_add() function is slightly slower. References to pq_entry in reftable have typically included pointers, such as both of the params for pq_less(). Since we are working with pointers in the pq_entry param, as keenly pointed out, the pq_entry param has also been made into a const since the contents of the pq_entry param are copied and not manipulated. Signed-off-by: Elijah Conners <business@elijahpepe.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-15 11:32:37 -07:00
Han-Wen Nienhuys	66c0dabab5	reftable: make reftable_record a tagged union This reduces the amount of glue code, because we don't need a void pointer or vtable within the structure. The only snag is that reftable_index_record contain a strbuf, so it cannot be zero-initialized. To address this, use reftable_new_record() to return fresh instance, given a record type. Since reftable_new_record() doesn't cause heap allocation anymore, it should be balanced with reftable_record_release() rather than reftable_record_destroy(). Thanks to Peff for the suggestion. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-20 11:31:53 -08:00
Han-Wen Nienhuys	1ae2b8cda8	reftable: add merged table view This adds an abstract, read-only interface to the ref database. This primitive is used to construct the read view of the ref database (the read view is constructed by merging several *.ref files). It also provides the mechanism to provide a unified view of the refs in the main repository and the per-worktree refs. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 10:45:48 -07:00

43 Commits (71e19a003197960cec38d30e71b49d182bcf8510)