kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Junio C Hamano	a819a3da85	Merge branch 'ps/reftable-api-revamp' Overhaul of the reftable API. * ps/reftable-api-revamp: reftable/table: move printing logic into test helper reftable/constants: make block types part of the public interface reftable/table: introduce iterator for table blocks reftable/table: add `reftable_table` to the public interface reftable/block: expose a generic iterator over reftable records reftable/block: make block iterators reseekable reftable/block: store block pointer in the block iterator reftable/block: create public interface for reading blocks git-zlib: use `struct z_stream_s` instead of typedef reftable/block: rename `block_reader` to `reftable_block` reftable/block: rename `block` to `block_data` reftable/table: move reading block into block reader reftable/block: simplify how we track restart points reftable/blocksource: consolidate code into a single file reftable/reader: rename data structure to "table" reftable: fix formatting of the license header	2025-04-29 14:21:30 -07:00
Patrick Steinhardt	50d8459477	reftable/block: expose a generic iterator over reftable records Expose a generic iterator over reftable records and expose it via the public interface. Together with an upcoming iterator for reftable blocks contained in a table this will allow users to trivially iterate through blocks and their respective records individually. This functionality will be used to implement consistency checks for the reftable backend, which requires more fine-grained control over how we read data. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:12 -07:00
Patrick Steinhardt	6da48a5e00	reftable/block: make block iterators reseekable Refactor the block iterators so that initialization and seeking are different from one another. This makes the iterator trivially reseekable by storing the pointer to the block at initialization time, which we can then reuse on every seek. This refactoring prepares the code for exposing a `reftable_iterator` interface for blocks in a subsequent commit. Callsites are adjusted accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:11 -07:00
Patrick Steinhardt	156d79cef0	reftable/block: store block pointer in the block iterator The block iterator requires access to a bunch of data from the underlying `reftable_block` that it is iterating over. This data is stored by copying over relevant data into a separate set of variables. This has multiple downsides: - We require more storage space than necessary. This is more of a theoretical issue as we shouldn't ever have many blocks. - We have to perform more bookkeeping, and the variable names are inconsistent across the two data structures. This can lead to some confusion. - The lifetime of the block iterator is tied to the block anyway, but we hide that a bit by only storing pointers pointing into the block. There isn't really any good reason why we rip out parts of the block instead of storing a pointer to the block itself. Refactor the code to do so. Despite being simpler, it also allows us to decouple the lifetime of the block iterator from seeking in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:11 -07:00
Patrick Steinhardt	655e18d6b4	reftable/block: create public interface for reading blocks While users of the reftable library wouldn't generally require access to individual blocks in a reftable table, there are valid usecases where one may require low-level access to them. One such upcoming usecase in the Git codebase is to implement consistency checks for the reftable library where we want to verify each block individually. Create a public interface for reading blocks. The interface isn't yet complete and lacks e.g. a way to read individual records from a block. Such missing functionality will be backfilled in subsequent commits. Note that this change also requires us to expose `reftable_buf`, which is used by the `reftable_block_first_key()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:11 -07:00
Patrick Steinhardt	ce76cec964	git-zlib: use `struct z_stream_s` instead of typedef Throughout the Git codebase we're using the typedeffed version of `z_stream`, which maps to `struct z_stream_s`. By using a typedef instead of the struct it becomes somewhat harder to predeclare the symbol so that headers depending on the struct can do so without having to pull in "zlib-compat.h". We don't yet have users that would really care about this: the only users that declare `z_stream` as a pointer are in "reftable/block.h", which is a header that is internal to the reftable library. But in the next step we're going to expose the `struct reftable_block` publicly, and that struct does contain a pointer to `z_stream`. And as the public header shouldn't depend on "reftable/system.h", which is an internal implementation detail, we won't have the typedef for `z_stream` readily available. Prepare for this change by using `struct z_stream_s` throughout our code base. In case zlib-ng is used we use a define to map from `z_stream_s` to `zng_stream_s`. Drop the pre-declaration of `struct z_stream` while at it. This struct does not exist in the first place, and the declaration wasn't needed because "reftable/block.h" already includes "reftable/basics.h" which transitively includes "reftable/system.h" and thus "git-zlib.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:11 -07:00
Patrick Steinhardt	12a9aa8cb7	reftable/block: rename `block_reader` to `reftable_block` The `block_reader` structure is used to access parsed data of a reftable block. The structure is currently treated as an internal implementation detail and not exposed via our public interfaces. The functionality provided by the structure is useful to external users of the reftable library though, for example when implementing consistency checks that need to scan through the blocks manually. Rename the structure to `reftable_block` now that the name has been made available in the preceding commit. This name is in line with the naming schema used for other data structures like `reftable_table` in that it describes the underlying entity that it provides access to. The new data structure isn't yet exposed via the public interface, which is left for a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:10 -07:00
Patrick Steinhardt	2b3362c10d	reftable/block: rename `block` to `block_data` The `reftable_block` structure associates a byte slice with a block source. As such it only holds the data of a reftable block without actually encoding any of the details for how to access that data. Rename the structure to instead be called `reftable_block_data`. Besides clarifying that this really only holds data, it also allows us to rename the `reftable_block_reader` to `reftable_block` in the next commit, as this is the structure that actually encapsulates access to the reftable blocks. Rename the `struct reftable_block_reader::block` member accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:10 -07:00
Patrick Steinhardt	fd888311fb	reftable/table: move reading block into block reader The logic to read blocks from a reftable is scattered across both the table and the block subsystems. Besides causing somewhat fuzzy responsibilities, it also means that we have to awkwardly pass around the ownership of blocks between the subsystems. Refactor the code so that we stop passing the block when initializing a reader, but instead by passing in the block source plus the offset at which we're supposed to read a block. Like this, the ownership of the block itself doesn't need to get handed over as the block reader is the one owning the block right from the start. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:10 -07:00
Patrick Steinhardt	ba620d296a	reftable/block: simplify how we track restart points Restart points record the location of reftable records that do not use prefix compression and are used to perform a binary search inside of a block. These restart points are encoded at the end of a block, between the record data and the footer of a table. The block structure contains three different variables related to these restart points: - The block length contains the length of the reftable block up to the restart points. - The restart count contains the number of restart points contained in the block. - The restart bytes variable tracks where the restart point data begins. Tracking all three of these variables is unnecessary though as the data can be derived from one another: the block length without restart points is the exact same as the offset of the restart count data, which we already track via the `restart_bytes` data. Refactor the code so that we track the location of restart bytes not as a pointer, but instead as an offset. This allows us to trivially get rid of the `block_len` variable as described above. This avoids having the confusing `block_len` variable and allows us to do less bookkeeping overall. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:09 -07:00
Patrick Steinhardt	1ac4e5e83d	reftable/blocksource: consolidate code into a single file The code that implements block sources is distributed across a couple of files. Consolidate all of it into "reftable/blocksource.c" and its accompanying header so that it is easier to locate and more self contained. While at it, rename some of the functions to have properly scoped names. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:09 -07:00
Patrick Steinhardt	6dcc05ffc3	reftable: fix formatting of the license header The license headers used across the reftable library doesn't follow our typical coding style for multi-line comments. Fix it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-07 14:53:09 -07:00
Meet Soni	27571684dd	reftable: propagate specific error codes in block_writer_add() Previously, functions block_writer_add() and related functions returned -1 when the record did not fit, forcing the caller to assume that any failure meant the entry was too big. Replace these generic -1 returns with defined error codes. This prepares the codebase for finer-grained error handling so that callers can distinguish between a block-full condition and other errors. Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-21 01:51:07 -07:00
Patrick Steinhardt	ffe6643668	reftable/block: adapt header and footer size to return a `size_t` The functions `header_size()` and `footer_size()` return a positive integer representing the size of the header and footer, respectively, dependent on the version of the reftable format. Similar to the preceding commit, these functions return a signed integer though, which is nonsensical given that there is no way for these functions to return negative. Adapt the functions to return a `size_t` instead to fix a couple of sign comparison warnings. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 14:20:29 -08:00
Patrick Steinhardt	57adf71b93	reftable/basics: adjust `hash_size()` to return `uint32_t` The `hash_size()` function returns the number of bytes used by the hash function. Weirdly enough though, it returns a signed integer for its size even though the size obviously cannot ever be negative. The only case where it could be negative is if the function returned an error when asked for an unknown hash, but we assert(3p) instead. Adjust the type of `hash_size()` to be `uint32_t` and adapt all places that use signed integers for the hash size to follow suit. This also allows us to get rid of a couple asserts that we had which verified that the size was indeed positive, which further stresses the point that this refactoring makes sense. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 14:20:29 -08:00
Patrick Steinhardt	ef46ad0815	reftable: rename scratch buffer Both `struct block_writer` and `struct reftable_writer` have a `buf` member that is being reused to optimize the number of allocations. Rename the variable to `scratch` to clarify its intend and provide a comment explaining why it exists. Suggested-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-26 08:39:38 +09:00
Patrick Steinhardt	d94ac23d3b	reftable/block: optimize allocations by using scratch buffer The block writer needs to compute the key for every record that one adds to the writer. The buffer for this key is stored on the stack and thus reallocated on every call to `block_writer_add()`, which is inefficient. Refactor the code so that we store the buffer in the `block_writer` struct itself so that we can reuse it. This reduces the number of allocations when writing many refs, e.g. when migrating one million refs from the "files" backend to the "reftable backend. Before this change: HEAP SUMMARY: in use at exit: 80,048 bytes in 49 blocks total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated After this change: HEAP SUMMARY: in use at exit: 80,048 bytes in 49 blocks total heap usage: 2,013,250 allocs, 2,013,201 frees, 347,543,583 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 07:59:17 +09:00
Patrick Steinhardt	aa248b8ab2	reftable/block: rename `block_writer::buf` variable Adapt the name of the `block_writer::buf` variable to instead be called `block`. This aligns it with the existing `block_len` variable, which tracks the length of this buffer, and is generally a bit more tied to the actual context where this variable gets used. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-11-21 07:59:17 +09:00
Patrick Steinhardt	be4c070a3c	reftable: convert from `strbuf` to `reftable_buf` Convert the reftable library to use the `reftable_buf` interface instead of the `strbuf` interface. This is mostly a mechanical change via sed(1) with some manual fixes where functions for `strbuf` and `reftable_buf` differ. The converted code does not yet handle allocation failures. This will be handled in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2024-10-17 16:59:56 -04:00
Patrick Steinhardt	2d5dbb37b2	reftable/block: handle allocation failures Handle allocation failures in `block_writer_init()` and `block_reader_init()`. This requires us to bubble up error codes into `writer_reinit_block_writer()`. Adapt call sites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-10-02 07:53:55 -07:00
Patrick Steinhardt	8e9e136d61	reftable: use `uint16_t` to track restart interval The restart interval can at most be `UINT16_MAX` as specified in the technical documentation of the reftable format. Furthermore, it cannot ever be negative. Regardless of that we use an `int` to track the restart interval. Change the type to use an `uint16_t` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-13 17:02:38 -07:00
Junio C Hamano	5aec7231c8	Merge branch 'ps/reftable-write-optim' Code to write out reftable has seen some optimization and simplification. * ps/reftable-write-optim: reftable/block: reuse compressed array reftable/block: reuse zstream when writing log blocks reftable/writer: reset `last_key` instead of releasing it reftable/writer: unify releasing memory reftable/writer: refactorings for `writer_flush_nonempty_block()` reftable/writer: refactorings for `writer_add_record()` refs/reftable: don't recompute committer ident reftable: remove name checks refs/reftable: skip duplicate name checks refs/reftable: perform explicit D/F check when writing symrefs refs/reftable: fix D/F conflict error message on ref copy	2024-05-08 10:18:43 -07:00
Patrick Steinhardt	9da5c992dd	reftable/block: avoid copying block iterators on seek When seeking a reftable record in a block we need to position the iterator _before_ the sought-after record so that the next call to `block_iter_next()` would yield that record. To achieve this, the loop that performs the linear seeks to restore the previous position once it has found the record. This is done by advancing two `block_iter`s: one to check whether the next record is our sought-after record, and one that we update after every iteration. This of course involves quite a lot of copying and also leads to needless memory allocations. Refactor the code to get rid of the `next` iterator and the copying this involves. Instead, we can restore the previous offset such that the call to `next` will return the correct record. Next to being simpler conceptually this also leads to a nice speedup. The following benchmark parser 10k refs out of 100k existing refs via `git-rev-list --no-walk`: Benchmark 1: rev-list: print many refs (HEAD~) Time (mean ± σ): 170.2 ms ± 1.7 ms [User: 86.1 ms, System: 83.6 ms] Range (min … max): 166.4 ms … 180.3 ms 500 runs Benchmark 2: rev-list: print many refs (HEAD~) Time (mean ± σ): 161.6 ms ± 1.6 ms [User: 78.1 ms, System: 83.0 ms] Range (min … max): 158.4 ms … 172.3 ms 500 runs Summary rev-list: print many refs (HEAD) ran 1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:37:59 -07:00
Patrick Steinhardt	ce1f213cc9	reftable/block: reuse `zstream` state on inflation When calling `inflateInit()` and `inflate()`, the zlib library will allocate several data structures for the underlying `zstream` to keep track of various information. Thus, when inflating repeatedly, it is possible to optimize memory allocation patterns by reusing the `zstream` and then calling `inflateReset()` on it to prepare it for the next chunk of data to inflate. This is exactly what the reftable code is doing: when iterating through reflogs we need to potentially inflate many log blocks, but we discard the `zstream` every single time. Instead, as we reuse the `block_reader` for each of the blocks anyway, we can initialize the `zstream` once and then reuse it for subsequent inflations. Refactor the code to do so, which leads to a significant reduction in the number of allocations. The following measurements were done when iterating through 1 million reflog entries. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	dd347bbce6	reftable/block: reuse uncompressed blocks The reftable backend stores reflog entries in a compressed format and thus needs to uncompress blocks before one can read records from it. For each reflog block we thus have to allocate an array that we can decompress the block contents into. This block is being discarded whenever the table iterator moves to the next block. Consequently, we reallocate a new array on every block, which is quite wasteful. Refactor the code to reuse the uncompressed block data when moving the block reader to a new block. This significantly reduces the number of allocations when iterating through many compressed blocks. The following measurements are done with `git reflog list` when listing 100k reflogs. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	bcdc586db0	reftable/block: move ownership of block reader into `struct table_iter` The table iterator allows the caller to iterate through all records in a reftable table. To do so it iterates through all blocks of the desired type one by one, where for each block it creates a new block iterator and yields all its entries. One of the things that is somewhat confusing in this context is who owns the block reader that is being used to read the blocks and pass them to the block iterator. Intuitively, as the table iterator is responsible for iterating through the blocks, one would assume that this iterator is also responsible for managing the lifecycle of the reader. And while it somewhat is, the block reader is ultimately stored inside of the block iterator. Refactor the code such that the block reader is instead fully managed by the table iterator. Instead of passing the reader to the block iterator, we now only end up passing the block data to it. Despite clearing up the lifecycle of the reader, it will also allow for better reuse of the reader in subsequent patches. The following benchmark prints a single matching ref out of 1 million refs. Before: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated After: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated Note that while there are more allocation and free calls now, the overall number of bytes allocated is significantly lower. The number of allocations will be reduced significantly by the next patch though. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	b371221a60	reftable/block: introduce `block_reader_release()` Introduce a new function `block_reader_release()` that releases resources acquired by the block reader. This function will be extended in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	aac8c03cc4	reftable/block: better grouping of functions Function definitions and declaration of `struct block_reader` and `struct block_iter` are somewhat mixed up, making it hard to see which functions belong together. Rearrange them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	42c7bdc36d	reftable/block: merge `block_iter_seek()` and `block_reader_seek()` The function `block_iter_seek()` is merely a simple wrapper around `block_reader_seek()`. Merge those two functions into a new function `block_iter_seek_key()` that more clearly says what it is actually doing. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	3122d44025	reftable/block: rename `block_reader_start()` The function `block_reader_start()` does not really apply to the block reader, but to the block iterator. It's name is thus somewhat confusing. Rename it to `block_iter_seek_start()` to clarify. We will rename `block_reader_seek()` in similar spirit in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-15 10:36:09 -07:00
Patrick Steinhardt	fa74f32291	reftable/block: reuse compressed array Similar to the preceding commit, let's reuse the `compressed` array that we use to store compressed data in. This results in a small reduction in memory allocations when writing many refs. Before: HEAP SUMMARY: in use at exit: 671,931 bytes in 151 blocks total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated After: HEAP SUMMARY: in use at exit: 671,931 bytes in 151 blocks total heap usage: 22,618,257 allocs, 22,618,106 frees, 1,236,351,528 bytes allocated So while the reduction in allocations isn't really all that big, it's a low hanging fruit and thus there isn't much of a reason not to pick it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 17:01:42 -07:00
Patrick Steinhardt	a155ab2bf4	reftable/block: reuse zstream when writing log blocks While most reftable blocks are written to disk as-is, blocks for log records are compressed with zlib. To compress them we use `compress2()`, which is a simple wrapper around the more complex `zstream` interface that would require multiple function invocations. One downside of this interface is that `compress2()` will reallocate internal state of the `zstream` interface on every single invocation. Consequently, as we call `compress2()` for every single log block which we are about to write, this can lead to quite some memory allocation churn. Refactor the code so that the block writer reuses a `zstream`. This significantly reduces the number of bytes allocated when writing many refs in a single transaction, as demonstrated by the following benchmark that writes 100k refs in a single transaction. Before: HEAP SUMMARY: in use at exit: 671,931 bytes in 151 blocks total heap usage: 22,631,887 allocs, 22,631,736 frees, 1,854,670,793 bytes allocated After: HEAP SUMMARY: in use at exit: 671,931 bytes in 151 blocks total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 17:01:42 -07:00
Patrick Steinhardt	7b8abc4d8c	reftable/record: use scratch buffer when decoding records When decoding log records we need a temporary buffer to decode the reflog entry's name, mail address and message. As this buffer is local to the function we thus have to reallocate it for every single log record which we're about to decode, which is inefficient. Refactor the code such that callers need to pass in a scratch buffer, which allows us to reuse it for multiple decodes. This reduces the number of allocations when iterating through reflogs. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated Note that this commit also drop some redundant calls to `strbuf_reset()` right before calling `decode_string()`. The latter already knows to reset the buffer, so there is no need for these. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-05 09:10:06 -08:00
Patrick Steinhardt	daf4f43d0d	reftable/record: decode keys in place When reading a record from a block, we need to decode the record's key. As reftable keys are prefix-compressed, meaning they reuse a prefix from the preceding record's key, this is a bit more involved than just having to copy the relevant bytes: we need to figure out the prefix and suffix lengths, copy the prefix from the preceding record and finally copy the suffix from the current record. This is done by passing three buffers to `reftable_decode_key()`: one buffer that holds the result, one buffer that holds the last key, and one buffer that points to the current record. The final key is then assembled by calling `strbuf_add()` twice to copy over the prefix and suffix. Performing two memory copies is inefficient though. And we can indeed do better by decoding keys in place. Instead of providing two buffers, the caller may only call a single buffer that is already pre-populated with the last key. Like this, we only have to call `strbuf_setlen()` to trim the record to its prefix and then `strbuf_add()` to add the suffix. This refactoring leads to a noticeable performance bump when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 112.2 ms ± 3.9 ms [User: 109.3 ms, System: 2.8 ms] Range (min … max): 109.2 ms … 149.6 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 106.0 ms ± 3.5 ms [User: 103.2 ms, System: 2.7 ms] Range (min … max): 103.2 ms … 133.7 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:49 -08:00
Patrick Steinhardt	c0cadb0576	reftable/block: reuse buffer to compute record keys When iterating over entries in the block iterator we compute the key of each of the entries and write it into a buffer. We do not reuse the buffer though and thus re-allocate it on every iteration, which is wasteful. Refactor the code to reuse the buffer. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-11 07:23:17 -08:00
Patrick Steinhardt	a8305bc6d8	reftable/block: introduce macro to initialize `struct block_iter` There are a bunch of locations where we initialize members of `struct block_iter`, which makes it harder than necessary to expand this struct to have additional members. Unify the logic via a new `BLOCK_ITER_INIT` macro that initializes all members. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-11 07:23:17 -08:00
Han-Wen Nienhuys	019bd34082	reftable: fix typo in header Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-23 12:28:28 -08:00
Han-Wen Nienhuys	e581fd7231	reftable: reading/writing blocks The reftable format is structured as a sequence of block. Within a block, records are prefix compressed, with an index of offsets for fully expand keys to enable binary search within blocks. This commit provides the logic to read and write these blocks. Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 10:45:48 -07:00

38 Commits (v2.50.0-rc2)