Commit Graph

8 Commits (db271e7bb60d42f3d6eadce99ee7cd8749b49754)

Author SHA1 Message Date
Patrick Steinhardt f1bf54aee3 reftable: allow inlining of a few functions
We have a few functions which are basically just accessors to
structures. As those functions are executed inside the hot loop when
iterating through many refs, the fact that they cannot be inlined is
costing us some performance.

Move the function definitions into their respective headers so that they
can be inlined. This results in a performance improvement when iterating
over 1 million refs:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     105.9 ms ±   3.6 ms    [User: 103.0 ms, System: 2.8 ms]
      Range (min … max):   103.1 ms … 133.4 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     100.7 ms ±   3.4 ms    [User: 97.8 ms, System: 2.8 ms]
      Range (min … max):    97.8 ms … 124.0 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:49 -08:00
Patrick Steinhardt bb2d6be4c1 reftable/merged: make subiters own their records
For each subiterator, the merged table needs to track their current
record. This record is owned by the priority queue though instead of by
the merged iterator. This is not optimal performance-wise.

For one, we need to move around records whenever we add or remove a
record from the priority queue. Thus, the bigger the entries the more
bytes we need to copy around. And compared to pointers, a reftable
record is rather on the bigger side. The other issue is that this makes
it harder to reuse the records.

Refactor the code so that the merged iterator tracks ownership of the
records per-subiter. Instead of having records in the priority queue, we
can now use mere pointers to the per-subiter records. This also allows
us to swap records between the caller and the per-subiter record instead
of doing an actual copy via `reftable_record_copy_from()`, which removes
the need to release the caller-provided record.

This results in a noticeable speedup when iterating through many refs.
The following benchmark iterates through 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     145.5 ms ±   4.5 ms    [User: 142.5 ms, System: 2.8 ms]
    Range (min … max):   141.3 ms … 177.0 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     139.0 ms ±   4.7 ms    [User: 136.1 ms, System: 2.8 ms]
    Range (min … max):   134.2 ms … 182.2 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

This refactoring also allows a subsequent refactoring where we start
reusing memory allocated by the reftable records because we do not need
to release the caller-provided record anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:39 -08:00
Junio C Hamano 9f67cbd0a7 Merge branch 'ps/reftable-iteration-perf'
The code to iterate over refs with the reftable backend has seen
some optimization.

* ps/reftable-iteration-perf:
  reftable/reader: add comments to `table_iter_next()`
  reftable/record: don't try to reallocate ref record name
  reftable/block: swap buffers instead of copying
  reftable/pq: allocation-less comparison of entry keys
  reftable/merged: skip comparison for records of the same subiter
  reftable/merged: allocation-less dropping of shadowed records
  reftable/record: introduce function to compare records by key
2024-02-26 18:10:24 -08:00
Patrick Steinhardt dbe4e8b3fd reftable/pq: allocation-less comparison of entry keys
The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.

To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_strcmp()` on the result. We then discard
the buffers immediately after the comparison.

This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.

Refactor the code to use the new `refatble_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     224.4 ms ±   6.5 ms    [User: 220.6 ms, System: 3.6 ms]
    Range (min … max):   216.5 ms … 261.1 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     172.9 ms ±   4.4 ms    [User: 169.2 ms, System: 3.6 ms]
    Range (min … max):   166.5 ms … 204.6 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
Patrick Steinhardt f6b58c1be4 reftable: introduce macros to grow arrays
Throughout the reftable library we have many cases where we need to grow
arrays. In order to avoid too many reallocations, we roughly double the
capacity of the array on each iteration. The resulting code pattern is
duplicated across many sites.

We have similar patterns in our main codebase, which is why we have
eventually introduced an `ALLOC_GROW()` macro to abstract it away and
avoid some code duplication. We cannot easily reuse this macro here
though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will
call realloc(3P) to grow the array. The reftable code is structured as a
library though (even if the boundaries are fuzzy), and one property this
brings with it is that it is possible to plug in your own allocators. So
instead of using realloc(3P), we need to use `reftable_realloc()` that
knows to use the user-provided implementation.

So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and
`REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase,
with two modifications:

  - They use `reftable_realloc()`, as explained above.

  - They use a different growth factor of `2 * cap + 1` instead of `(cap
    + 16) * 3 / 2`.

The second change is because we know a bit more about the allocation
patterns in the reftable library. In most cases, we end up only having a
handful of items in the array and don't end up growing them. The initial
capacity that our normal growth factor uses (which is 24) would thus end
up over-allocating in a lot of code paths. This effect is measurable:

  - Before change:

      HEAP SUMMARY:
          in use at exit: 671,983 bytes in 152 blocks
        total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated

  - After change with a growth factor of `(2 * alloc + 1)`:

      HEAP SUMMARY:
          in use at exit: 671,983 bytes in 152 blocks
        total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated

  - After change with a growth factor of `(alloc + 16)* 2 / 3`:

      HEAP SUMMARY:
          in use at exit: 671,983 bytes in 152 blocks
        total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated

While the total heap usage is roughly the same, we do end up allocating
significantly more bytes with our usual growth factor (in fact, roughly
21 times as many).

Convert the reftable library to use these new macros.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06 12:10:08 -08:00
Elijah Conners c18eecbe5c reftable: use a pointer for pq_entry param
The speed of the merged_iter_pqueue_add() can be improved by using a
pointer to the pq_entry struct, which is 96 bytes. Since the pq_entry
param is worked directly on the stack and does not currently have a
pointer to it, the merged_iter_pqueue_add() function is slightly
slower.

References to pq_entry in reftable have typically included pointers,
such as both of the params for pq_less().

Since we are working with pointers in the pq_entry param, as keenly
pointed out, the pq_entry param has also been made into a const since
the contents of the pq_entry param are copied and not manipulated.

Signed-off-by: Elijah Conners <business@elijahpepe.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-15 11:32:37 -07:00
Han-Wen Nienhuys 66c0dabab5 reftable: make reftable_record a tagged union
This reduces the amount of glue code, because we don't need a void
pointer or vtable within the structure.

The only snag is that reftable_index_record contain a strbuf, so it
cannot be zero-initialized. To address this, use reftable_new_record()
to return fresh instance, given a record type. Since
reftable_new_record() doesn't cause heap allocation anymore, it should
be balanced with reftable_record_release() rather than
reftable_record_destroy().

Thanks to Peff for the suggestion.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-20 11:31:53 -08:00
Han-Wen Nienhuys 3b34f636df reftable: add a heap-based priority queue for reftable records
This is needed to create a merged view multiple reftables

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00