Commit Graph

403 Commits (v2.50.0-rc2)

Author SHA1 Message Date
Carlo Marcelo Arenas Belón f1228cd12c reftable: make REFTABLE_UNUSED C99 compatible
Since f93b2a0424 (reftable/basics: introduce `REFTABLE_UNUSED`
annotation, 2025-02-18), the reftable library was migrated to
use an internal version of `UNUSED`, which unconditionally sets
a GNU __attribute__ to avoid warnings function parameters that
are not being used.

Make the definition conditional to prevent breaking the build
with non GNU compilers.

Reported-by: "Randall S. Becker" <rsbecker@nexbridge.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-29 09:18:53 -07:00
Junio C Hamano 90eedabbf7 Merge branch 'ps/reftable-read-block-perffix'
Performance regression in not-yet-released code has been corrected.

* ps/reftable-read-block-perffix:
  reftable: fix perf regression when reading blocks of unwanted type
2025-05-19 16:02:48 -07:00
Junio C Hamano 2b3303166b Merge branch 'ly/reftable-writer-leakfix'
Leakfix.

* ly/reftable-writer-leakfix:
  reftable/writer: fix memory leak when `writer_index_hash()` fails
  reftable/writer: fix memory leak when `padded_write()` fails
2025-05-19 16:02:47 -07:00
Patrick Steinhardt 1970333644 reftable: fix perf regression when reading blocks of unwanted type
In fd888311fb (reftable/table: move reading block into block reader,
2025-04-07), we have refactored how reftable blocks are read so that
most of the logic is contained in the "block.c" subsystem itself. Most
importantly, the whole logic to read the data itself is now contained in
that subsystem.

This change caused a significant performance regression though when
reading blocks that aren't of the specific type one is searching for:

    Benchmark 1: update-ref: create 100k refs (revision = fd888311fbc~)
      Time (mean ± σ):      2.171 s ±  0.028 s    [User: 1.189 s, System: 0.977 s]
      Range (min … max):    2.117 s …  2.206 s    10 runs

    Benchmark 2: update-ref: create 100k refs (revision = fd888311fb)
      Time (mean ± σ):      3.418 s ±  0.030 s    [User: 2.371 s, System: 1.037 s]
      Range (min … max):    3.377 s …  3.473 s    10 runs

    Summary
      update-ref: create 100k refs (revision = fd888311fbc~) ran
        1.57 ± 0.02 times faster than update-ref: create 100k refs (revision = fd888311fb)

The root caute of the performance regression is that we changed when
exactly blocks of an uninteresting type are being discarded. Previous to
the refactoring in the mentioned commit we'd load the block data, read
its type, notice that it's not the wanted type and discard the block.
After the commit though we don't discard the block immediately, but we
fully decode it only to realize that it's not the desired type. We then
discard the block again, but have already performed a bunch of pointless
work.

Fix the regression by making `reftable_block_init()` return early in
case the block is not of the desired type. This fixes the performance
hit:

    Benchmark 1: update-ref: create 100k refs (revision = HEAD~)
      Time (mean ± σ):      2.712 s ±  0.018 s    [User: 1.990 s, System: 0.716 s]
      Range (min … max):    2.682 s …  2.741 s    10 runs

    Benchmark 2: update-ref: create 100k refs (revision = HEAD)
      Time (mean ± σ):      1.670 s ±  0.012 s    [User: 0.991 s, System: 0.676 s]
      Range (min … max):    1.652 s …  1.693 s    10 runs

    Summary
      update-ref: create 100k refs (revision = HEAD) ran
        1.62 ± 0.02 times faster than update-ref: create 100k refs (revision = HEAD~)

Note that the baseline performance is lower than in the original due to
a couple of unrelated performance improvements that have landed since
the original commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 10:55:24 -07:00
Lidong Yan 91db6c735d reftable/writer: fix memory leak when `writer_index_hash()` fails
In reftable/writer.c:writer_index_hash(), if `reftable_buf_add` failed,
key allocated by `reftable_malloc` will not be insert into `obj_index_tree`
thus leaks. Simple add reftable_free(key) will solve this problem.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 09:19:50 -07:00
Lidong Yan c8e752eaef reftable/writer: fix memory leak when `padded_write()` fails
In reftable/writer.c:padded_write(), if w->writer failed, zeroed
allocated in `reftable_calloc` will leak. w->writer could be
`reftable_write_data` in reftable/stack.c, and could fail due to
some write error. Simply add reftable_free(zeroed) will solve this
problem.

Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 09:19:49 -07:00
Junio C Hamano a819a3da85 Merge branch 'ps/reftable-api-revamp'
Overhaul of the reftable API.

* ps/reftable-api-revamp:
  reftable/table: move printing logic into test helper
  reftable/constants: make block types part of the public interface
  reftable/table: introduce iterator for table blocks
  reftable/table: add `reftable_table` to the public interface
  reftable/block: expose a generic iterator over reftable records
  reftable/block: make block iterators reseekable
  reftable/block: store block pointer in the block iterator
  reftable/block: create public interface for reading blocks
  git-zlib: use `struct z_stream_s` instead of typedef
  reftable/block: rename `block_reader` to `reftable_block`
  reftable/block: rename `block` to `block_data`
  reftable/table: move reading block into block reader
  reftable/block: simplify how we track restart points
  reftable/blocksource: consolidate code into a single file
  reftable/reader: rename data structure to "table"
  reftable: fix formatting of the license header
2025-04-29 14:21:30 -07:00
Junio C Hamano 139d703511 Merge branch 'ps/reftable-windows-unlink-fix'
Portability fix.

* ps/reftable-windows-unlink-fix:
  reftable: ignore file-in-use errors when unlink(3p) fails on Windows
2025-04-15 13:50:13 -07:00
Junio C Hamano 6e2a3b8ae0 Merge branch 'ps/reftable-sans-compat-util'
Make the code in reftable library less reliant on the service
routines it used to borrow from Git proper, to make it easier to
use by external users of the library.

* ps/reftable-sans-compat-util:
  Makefile: skip reftable library for Coccinelle
  reftable: decouple from Git codebase by pulling in "compat/posix.h"
  git-compat-util.h: split out POSIX-emulating bits
  compat/mingw: split out POSIX-related bits
  reftable/basics: introduce `REFTABLE_UNUSED` annotation
  reftable/basics: stop using `SWAP()` macro
  reftable/stack: stop using `sleep_millisec()`
  reftable/system: introduce `reftable_rand()`
  reftable/reader: stop using `ARRAY_SIZE()` macro
  reftable/basics: provide wrappers for big endian conversion
  reftable/basics: stop using `st_mult()` in array allocators
  reftable: stop using `BUG()` in trivial cases
  reftable/record: don't `BUG()` in `reftable_record_cmp()`
  reftable/record: stop using `BUG()` in `reftable_record_init()`
  reftable/record: stop using `COPY_ARRAY()`
  reftable/blocksource: stop using `xmmap()`
  reftable/stack: stop using `write_in_full()`
  reftable/stack: stop using `read_in_full()`
2025-04-08 11:43:14 -07:00
Patrick Steinhardt e0011188ca reftable/table: move printing logic into test helper
The logic to print individual blocks in a table is hosted in the
reftable library. This is only the case due to historical reasons though
because users of the library had no interfaces to read blocks one by
one. Otherwise, printing individual blocks has no place in the reftable
library given that the format will not be generic in the first place.

We have now grown a public interface to iterate through blocks contained
in a table, and thus we can finally move the logic to print them into
the test helper.

Move over the logic and refactor it accordingly. Note that the iterator
also trivially allows us to access index sections, which we previously
didn't print at all. This omission wasn't intentional though, so start
dumping those sections as well so that we can assert that indices are
written as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:13 -07:00
Patrick Steinhardt 0f8ee94b63 reftable/constants: make block types part of the public interface
Now that reftable blocks can be read individually via the public
interface it becomes necessary for callers to be able to distinguish the
different types of blocks. Expose the relevant constants.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt da89659365 reftable/table: introduce iterator for table blocks
Introduce a new iterator that allows the caller to iterate through all
blocks contained in a table. This gives users more fine-grained control
over how exactly those blocks are being read and exposes information to
callers that was previously inaccessible.

This iterator will be required by a future patch series that adds
consistency checks for the reftable backend. In addition to that though
we will also reimplement `reftable_table_print_blocks()` on top of this
new iterator in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt c8cbe85a23 reftable/table: add `reftable_table` to the public interface
The `reftable_table` interface is an internal implementation detail that
callers have no access to. Having direct access to this structure is
important though for a subsequent patch series that will implement
consistency checks for the reftable backend.

Move the structure into "reftable-table.h" so that it part of the public
interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt 50d8459477 reftable/block: expose a generic iterator over reftable records
Expose a generic iterator over reftable records and expose it via the
public interface. Together with an upcoming iterator for reftable blocks
contained in a table this will allow users to trivially iterate through
blocks and their respective records individually.

This functionality will be used to implement consistency checks for the
reftable backend, which requires more fine-grained control over how we
read data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt 6da48a5e00 reftable/block: make block iterators reseekable
Refactor the block iterators so that initialization and seeking are
different from one another. This makes the iterator trivially reseekable
by storing the pointer to the block at initialization time, which we can
then reuse on every seek.

This refactoring prepares the code for exposing a `reftable_iterator`
interface for blocks in a subsequent commit. Callsites are adjusted
accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt 156d79cef0 reftable/block: store block pointer in the block iterator
The block iterator requires access to a bunch of data from the
underlying `reftable_block` that it is iterating over. This data is
stored by copying over relevant data into a separate set of variables.
This has multiple downsides:

  - We require more storage space than necessary. This is more of a
    theoretical issue as we shouldn't ever have many blocks.

  - We have to perform more bookkeeping, and the variable names are
    inconsistent across the two data structures. This can lead to some
    confusion.

  - The lifetime of the block iterator is tied to the block anyway, but
    we hide that a bit by only storing pointers pointing into the block.

There isn't really any good reason why we rip out parts of the block
instead of storing a pointer to the block itself.

Refactor the code to do so. Despite being simpler, it also allows us to
decouple the lifetime of the block iterator from seeking in a subsequent
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt 655e18d6b4 reftable/block: create public interface for reading blocks
While users of the reftable library wouldn't generally require access to
individual blocks in a reftable table, there are valid usecases where
one may require low-level access to them. One such upcoming usecase in
the Git codebase is to implement consistency checks for the reftable
library where we want to verify each block individually.

Create a public interface for reading blocks. The interface isn't yet
complete and lacks e.g. a way to read individual records from a block.
Such missing functionality will be backfilled in subsequent commits.

Note that this change also requires us to expose `reftable_buf`, which
is used by the `reftable_block_first_key()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt ce76cec964 git-zlib: use `struct z_stream_s` instead of typedef
Throughout the Git codebase we're using the typedeffed version of
`z_stream`, which maps to `struct z_stream_s`. By using a typedef
instead of the struct it becomes somewhat harder to predeclare the
symbol so that headers depending on the struct can do so without having
to pull in "zlib-compat.h".

We don't yet have users that would really care about this: the only
users that declare `z_stream` as a pointer are in "reftable/block.h",
which is a header that is internal to the reftable library. But in the
next step we're going to expose the `struct reftable_block` publicly,
and that struct does contain a pointer to `z_stream`. And as the public
header shouldn't depend on "reftable/system.h", which is an internal
implementation detail, we won't have the typedef for `z_stream` readily
available.

Prepare for this change by using `struct z_stream_s` throughout our code
base. In case zlib-ng is used we use a define to map from `z_stream_s`
to `zng_stream_s`.

Drop the pre-declaration of `struct z_stream` while at it. This struct
does not exist in the first place, and the declaration wasn't needed
because "reftable/block.h" already includes "reftable/basics.h" which
transitively includes "reftable/system.h" and thus "git-zlib.h".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt 12a9aa8cb7 reftable/block: rename `block_reader` to `reftable_block`
The `block_reader` structure is used to access parsed data of a reftable
block. The structure is currently treated as an internal implementation
detail and not exposed via our public interfaces. The functionality
provided by the structure is useful to external users of the reftable
library though, for example when implementing consistency checks that
need to scan through the blocks manually.

Rename the structure to `reftable_block` now that the name has been made
available in the preceding commit. This name is in line with the naming
schema used for other data structures like `reftable_table` in that it
describes the underlying entity that it provides access to.

The new data structure isn't yet exposed via the public interface, which
is left for a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:10 -07:00
Patrick Steinhardt 2b3362c10d reftable/block: rename `block` to `block_data`
The `reftable_block` structure associates a byte slice with a block
source. As such it only holds the data of a reftable block without
actually encoding any of the details for how to access that data.

Rename the structure to instead be called `reftable_block_data`. Besides
clarifying that this really only holds data, it also allows us to rename
the `reftable_block_reader` to `reftable_block` in the next commit, as
this is the structure that actually encapsulates access to the reftable
blocks.

Rename the `struct reftable_block_reader::block` member accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:10 -07:00
Patrick Steinhardt fd888311fb reftable/table: move reading block into block reader
The logic to read blocks from a reftable is scattered across both the
table and the block subsystems. Besides causing somewhat fuzzy
responsibilities, it also means that we have to awkwardly pass around
the ownership of blocks between the subsystems.

Refactor the code so that we stop passing the block when initializing a
reader, but instead by passing in the block source plus the offset at
which we're supposed to read a block. Like this, the ownership of the
block itself doesn't need to get handed over as the block reader is the
one owning the block right from the start.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:10 -07:00
Patrick Steinhardt ba620d296a reftable/block: simplify how we track restart points
Restart points record the location of reftable records that do not use
prefix compression and are used to perform a binary search inside of a
block. These restart points are encoded at the end of a block, between
the record data and the footer of a table.

The block structure contains three different variables related to these
restart points:

  - The block length contains the length of the reftable block up to the
    restart points.

  - The restart count contains the number of restart points contained in
    the block.

  - The restart bytes variable tracks where the restart point data
    begins.

Tracking all three of these variables is unnecessary though as the data
can be derived from one another: the block length without restart points
is the exact same as the offset of the restart count data, which we
already track via the `restart_bytes` data.

Refactor the code so that we track the location of restart bytes not as
a pointer, but instead as an offset. This allows us to trivially get rid
of the `block_len` variable as described above. This avoids having the
confusing `block_len` variable and allows us to do less bookkeeping
overall.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Patrick Steinhardt 1ac4e5e83d reftable/blocksource: consolidate code into a single file
The code that implements block sources is distributed across a couple of
files. Consolidate all of it into "reftable/blocksource.c" and its
accompanying header so that it is easier to locate and more self
contained.

While at it, rename some of the functions to have properly scoped names.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Patrick Steinhardt b648bd6549 reftable/reader: rename data structure to "table"
The `struct reftable_reader` subsystem encapsulates a table that has
been read from the disk. As such, the current name of that structure is
somewhat hard to understand as it only talks about the fact that we read
something from disk, without really giving an indicator _what_ that is.

Furthermore, this naming schema doesn't really fit well into how the
other structures are named: `reftable_merged_table`, `reftable_stack`,
`reftable_block` and `reftable_record` are all named after what they
encapsulate.

Rename the subsystem to `reftable_table`, which directly gives a hint
that the data structure is about handling the individual tables part of
the stack.

While this change results in a lot of churn, it prepares for us exposing
the APIs to third-party callers now that the reftable library is a
standalone library that can be linked against by other projects.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Patrick Steinhardt 6dcc05ffc3 reftable: fix formatting of the license header
The license headers used across the reftable library doesn't follow our
typical coding style for multi-line comments. Fix it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Junio C Hamano c7c4e5e419 Merge branch 'ps/reftable-sans-compat-util' into ps/reftable-api-revamp
* ps/reftable-sans-compat-util:
  Makefile: skip reftable library for Coccinelle
  reftable: decouple from Git codebase by pulling in "compat/posix.h"
  git-compat-util.h: split out POSIX-emulating bits
  compat/mingw: split out POSIX-related bits
  reftable/basics: introduce `REFTABLE_UNUSED` annotation
  reftable/basics: stop using `SWAP()` macro
  reftable/stack: stop using `sleep_millisec()`
  reftable/system: introduce `reftable_rand()`
  reftable/reader: stop using `ARRAY_SIZE()` macro
  reftable/basics: provide wrappers for big endian conversion
  reftable/basics: stop using `st_mult()` in array allocators
  reftable: stop using `BUG()` in trivial cases
  reftable/record: don't `BUG()` in `reftable_record_cmp()`
  reftable/record: stop using `BUG()` in `reftable_record_init()`
  reftable/record: stop using `COPY_ARRAY()`
  reftable/blocksource: stop using `xmmap()`
  reftable/stack: stop using `write_in_full()`
  reftable/stack: stop using `read_in_full()`
2025-04-01 19:05:13 +09:00
Meet Soni 0e1b9c5eed reftable: adapt write_object_record() to propagate block_writer_add() errors
Previously, write_object_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.

Update the write_object_record() to inspect the error code returned by
block_writer_add() and flush and reinitialize the writer iff the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.

If the flush and reinitialization still fail with
REFTABLE_ENTRY_TOO_BIG_ERROR, reset the record's offset length to zero
before a final attempt.

All call sites now handle various error codes returned by
block_writer_add().

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 01:51:08 -07:00
Meet Soni 9ce297239b reftable: adapt writer_add_record() to propagate block_writer_add() errors
Previously, writer_add_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.

Update the writer_add_record() to inspect the error code returned by
block_writer_add() and only flush and reinitialize the writer when the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 01:51:07 -07:00
Meet Soni 27571684dd reftable: propagate specific error codes in block_writer_add()
Previously, functions block_writer_add() and related functions returned
-1 when the record did not fit, forcing the caller to assume that any
failure meant the entry was too big. Replace these generic -1 returns
with defined error codes.

This prepares the codebase for finer-grained error handling so that
callers can distinguish between a block-full condition and other errors.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 01:51:07 -07:00
René Scharfe bad7910399 reftable: release name on reftable_reader_new() error
If block_source_read_block() or parse_footer() fail, we leak the "name"
member of struct reftable_reader in reftable_reader_new().  Release it.

Reported by: H Z <shiyuyuranzh@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-04 09:21:39 -08:00
Patrick Steinhardt 3262a53c12 reftable: ignore file-in-use errors when unlink(3p) fails on Windows
Unlinking a file may fail on Windows systems when the file is still held
open by another process. This is incompatible with POSIX semantics and
by extension with Git's assumed semantics when unlinking files, which
is that files can be unlinked regardless of whether they are still open
or not. To counteract this incompatibility, we have some custom error
handling in the `mingw_unlink()` wrapper that first retries the deletion
with some delay, and then asks the user whether we should continue to
retry.

While this logic might be sensible in many callsites throughout Git, it
is less when used in the reftable library. We only use unlink(3) there
to delete tables which aren't referenced anymore, and the code is very
aware of the limitations on Windows. As such, all calls to unlink(3p)
don't perform any error checking at all and are fine with the call
failing.

Instead, the library provides the `reftable_stack_clean()` function,
which Git knows to execute in git-pack-refs(1) after compacting a stack.
The effect of this function is that all stale tables will eventually get
deleted once they aren't kept open anymore.

So while we're fine with unlink(3p) failing, the Windows-emulation of
that function will still perform several sleeps and ultimately end up
asking the user:

    $ git pack-refs
    Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
    Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
    Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n

It even asks multiple times, which is doubly annoying and puzzling to
the user:

  1. It asks when trying to delete the old file after having written the
     compacted stack.

  2. It asks when reloading the stack, where it will try to unlink
     now-unreferenced tables.

  3. It asks when calling `reftable_stack_clean()`, where it will try to
     unlink now-stale tables.

Fix the issue by making it possible to disable this behaviour with a
preprocessor define. As "git-compat-util.h" is only included from
"system.h", and given that "system.h" is only ever included by headers
and code that are internal to the reftable library, we can set that
macro in this header without impacting anything else but the reftable
library.

Reported-by: Christian Reich <Zottelbart@t-online.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 14:29:18 -08:00
Patrick Steinhardt 6af23ac66c reftable: decouple from Git codebase by pulling in "compat/posix.h"
The reftable library includes "git-compat-util.h" in order to get a
POSIX-like programming environment that papers over various differences
between platforms. The header also brings with it a couple of helpers
specific to the Git codebase though, and over time we have started to
use these helpers in the reftable library, as well.

This makes it very hard to use the reftable library as a standalone
library without the rest of the Git codebase, so other libraries like
e.g. libgit2 cannot easily use it. But now that we have removed all
calls to Git-specific functionality and have split out "compat/posix.h"
as a separate header we can address this.

Stop including "git-compat-util.h" and instead include "compat/posix.h"
to finalize the decoupling of the reftable library from the rest of the
Git codebase. The only bits which remain specific to Git are "system.h"
and "system.c", which projects will have to provide.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:41 -08:00
Patrick Steinhardt f93b2a0424 reftable/basics: introduce `REFTABLE_UNUSED` annotation
Introduce the `REFTABLE_UNUSED` annotation and replace all existing
users of `UNUSED` in the reftable library to use the new macro instead.

Note that we unconditionally define `MAYBE_UNUSED` in the exact same
way, so doing so unconditionally for `REFTABLE_UNUSED` should be fine,
too.

Suggested-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:38 -08:00
Patrick Steinhardt f8ed12dec4 reftable/basics: stop using `SWAP()` macro
Stop using `SWAP()` macro in favor of an open-coded variant of it. Note
that this also requires us to open-code the build assert that `SWAP()`
itself uses to verify that the size of both variables matches.

This is done to reduce our dependency on the Git codebase.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:38 -08:00
Patrick Steinhardt 10f2935c7f reftable/stack: stop using `sleep_millisec()`
Refactor our use of `sleep_millisec()` by open-coding it with poll(3p),
which is the current implementation of this function. Ideally, we'd use
a more direct way to sleep, but there is no equivalent to sleep(3p) that
would accept milliseconds as input.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:38 -08:00
Patrick Steinhardt 712f6cfe54 reftable/system: introduce `reftable_rand()`
Introduce a new system-level `reftable_rand()` function that generates a
single unsigned integer for us. The implementation of this function is
to be provided by the calling codebase, which allows us to more easily
hook into pre-seeded random number generators.

Adapt the two callsites where we generated random data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:38 -08:00
Patrick Steinhardt 01a587da8c reftable/reader: stop using `ARRAY_SIZE()` macro
We have a single user of the `ARRAY_SIZE()` macro in the reftable
reader. Drop its use to reduce our dependence on the Git codebase.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:37 -08:00
Patrick Steinhardt e676694298 reftable/basics: provide wrappers for big endian conversion
We're using a mixture of big endian conversion functions provided by
both the reftable library, but also by the Git codebase. Refactor the
code so that we exclusively use reftable-provided wrappers in order to
untangle us from the Git codebase.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:37 -08:00
Patrick Steinhardt 6e3ea71639 reftable/basics: stop using `st_mult()` in array allocators
We're using `st_mult()` as part of our macro helpers that allocate
arrays. This is bad due two two reasons:

  - `st_mult()` causes us to die in case the multiplication overflows.

  - `st_mult()` ties us to the Git codebase.

Refactor the code to instead detect overflows manually and return an
error in such cases.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:37 -08:00
Patrick Steinhardt 445f9f4f35 reftable: stop using `BUG()` in trivial cases
Stop using `BUG()` in the remaining trivial cases that we still have in
the reftable library. Instead of aborting the program, we'll now bubble
up a `REFTABLE_API_ERROR` to indicate misuse of the calling conventions.

Note that in both `reftable_reader_{inc,dec}ref()` we simply stop
calling `BUG()` altogether. The only situation where the counter should
be zero is when the structure has already been free'd anyway, so we
would run into undefined behaviour regardless of whether we try to abort
the program or not.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:36 -08:00
Patrick Steinhardt 6f6127decd reftable/record: don't `BUG()` in `reftable_record_cmp()`
The reftable library aborts with a bug in case `reftable_record_cmp()`
is invoked with two records of differing types. This would cause the
program to die without the caller being able to handle the error, which
is not something we want in the context of library code. And it ties us
to the Git codebase.

Refactor the code such that `reftable_record_cmp()` returns an error
code separate from the actual comparison result. This requires us to
also adapt some callers up the callchain in a similar fashion.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:36 -08:00
Patrick Steinhardt 9d9fac0f34 reftable/record: stop using `BUG()` in `reftable_record_init()`
We're aborting the program via `BUG()` in case `reftable_record_init()`
was invoked with an unknown record type. This is bad because we may now
die in library code, and because it makes us depend on the Git codebase.

Refactor the code such that `reftable_record_init()` can return an error
code to the caller. Adapt any callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:36 -08:00
Patrick Steinhardt a967966432 reftable/record: stop using `COPY_ARRAY()`
Drop our use of `COPY_ARRAY()`, replacing it with an open-coded variant
thereof. This is done to reduce our dependency on the Git library.

While at it, guard the whole array copy logic so that we only copy it in
case there actually is anything to be copied. Otherwise, we may end up
trying to allocate a zero-sized array, which will return a NULL pointer
and thus cause us to return an `REFTABLE_OUT_OF_MEMORY_ERROR`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:35 -08:00
Patrick Steinhardt 70afa6fa31 reftable/blocksource: stop using `xmmap()`
We use `xmmap()` to map reftables into memory. This function has two
problems:

  - It causes us to die in case the mmap fails.

  - It ties us to the Git codebase.

Refactor the code to use mmap(3p) instead with manual error checking.
Note that this function may not be the system-provided mmap(3p), but may
point to our `git_mmap()` wrapper that emulates the syscall on systems
that do not have mmap(3p) available.

Fix `reftable_block_source_from_file()` to properly bubble up the error
code in case the map(3p) call fails.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:35 -08:00
Patrick Steinhardt e31db89558 reftable/stack: stop using `write_in_full()`
Similar to the preceding commit, drop our use of `write_in_full()` and
implement a new wrapper `reftable_write_full()` that handles this logic
for us. This is done to reduce our dependency on the Git library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:35 -08:00
Patrick Steinhardt cb3e368b69 reftable/stack: stop using `read_in_full()`
There is a single callsite of `read_in_full()` in the reftable library.
Open-code the function to reduce our dependency on the Git library.

Note that we only partially port over the logic from `read_in_full()`
and its underlying `xread()` helper. Most importantly, the latter also
knows to handle `EWOULDBLOCK` via `handle_nonblock()`. This logic is
irrelevant for us though because the reftable library never sets the
`O_NONBLOCK` option in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 10:55:35 -08:00
Junio C Hamano 82522a9e2c Merge branch 'kn/reflog-migration-fix-followup'
Code clean-up.

* kn/reflog-migration-fix-followup:
  reftable: prevent 'update_index' changes after adding records
  refs: use 'uint64_t' for 'ref_update.index'
  refs: mark `ref_transaction_update_reflog()` as static
2025-02-14 17:53:48 -08:00
Junio C Hamano 9d0e81e2ae Merge branch 'ps/zlib-ng'
The code paths to interact with zlib has been cleaned up in
preparation for building with zlib-ng.

* ps/zlib-ng:
  ci: make "linux-musl" job use zlib-ng
  ci: switch linux-musl to use Meson
  compat/zlib: allow use of zlib-ng as backend
  git-zlib: cast away potential constness of `next_in` pointer
  compat/zlib: provide stubs for `deflateSetHeader()`
  compat/zlib: provide `deflateBound()` shim centrally
  git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
  compat: introduce new "zlib.h" header
  git-compat-util: drop `z_const` define
  compat: drop `uncompress2()` compatibility shim
2025-02-06 14:56:45 -08:00
Patrick Steinhardt 41f1a8435a git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
We include "compat/zlib.h" in "git-compat-util.h", which is
unnecessarily broad given that we only have a small handful of files
that use the zlib library. Move the header into "git-zlib.h" instead and
adapt users of zlib to include that header.

One exception is the reftable library, as we don't want to use the
Git-specific wrapper of zlib there, so we include "compat/zlib.h"
instead. Furthermore, we move the include into "reftable/system.h" so
that users of the library other than Git can wire up zlib themselves.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28 13:03:22 -08:00
Patrick Steinhardt 629188ede7 compat: introduce new "zlib.h" header
Introduce a new "compat/zlib-compat.h" header that we include instead of
including <zlib.h> directly. This will allow us to wire up zlib-ng as an
alternative backend for zlib compression in a subsequent commit.

Note that we cannot just call the file "compat/zlib.h", as that may
otherwise cause us to include that file instead of <zlib.h>.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28 13:03:22 -08:00