Move functions relating to the reftable iterator from "generic.c" into
"iter.c". This prepares for the removal of the former subsystem.
While at it, remove some unneeded braces to conform to our coding style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The filtering ref iterator can be used to only yield refs which are not
in a specific skip list. This iterator has an option to double-check the
results it returns, which causes us to seek the reference we are about
to yield via a separate table such that we detect whether the reference
that the first iterator has yielded actually exists.
The value of this is somewhat dubious, and I cannot think of any usecase
where this functionality should be required. Furthermore, this option is
never set in our codebase, which means that it is essentially untested.
And last but not least, the `struct reftable_table` that is used to
implement it is about to go away.
So while we could refactor the code to not use a `reftable_table`, it
very much feels like a wasted effort. Let's just drop this code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To read a reference for the reftable stack, we first create a generic
`reftable_table` from the merged table and then read the reference via a
convenience function. We are about to remove these generic interfaces,
so let's instead open-code the logic to prepare for this removal.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merged table provides access to a reftable stack by merging the
contents of those tables into a virtual table. These subtables are being
tracked via `struct reftable_table`, which is a generic interface for
accessing either a single reftable or a merged reftable. So in theory,
it would be possible for the merged table to merge together other merged
tables.
This is somewhat nonsensical though: we only ever set up a merged table
over normal reftables, and there is no reason to do otherwise. This
generic interface thus makes the code way harder to follow and reason
about than really necessary. The abstraction layer may also have an
impact on performance, even though the extra set of vtable function
calls probably doesn't really matter.
Refactor the merged tables to use a `struct reftable_reader` for each of
the subtables instead, which gives us direct access to the underlying
tables. Adjust names accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `reftable_new_merged_table()` to `reftable_merged_table_new()`
such that the name matches our coding style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not expose any functions via our public headers that would allow a
caller to initialize a reftable iterator from a merged table. Instead,
they are expected to go via the generic `reftable_table` interface,
which is somewhat roundabout.
Implement two new functions to initialize iterators for ref and log
records to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/reftable-stack-compaction:
reftable/stack: handle locked tables during auto-compaction
reftable/stack: fix corruption on concurrent compaction
reftable/stack: use lock_file when adding table to "tables.list"
reftable/stack: do not die when fsyncing lock file files
reftable/stack: simplify tracking of table locks
reftable/stack: update stats on failed full compaction
reftable/stack: test compaction with already-locked tables
reftable/stack: extract function to setup stack with N tables
reftable/stack: refactor function to gather table sizes
Perforce tests have been updated.
* ps/p4-tests-updates:
t98xx: mark Perforce tests as memory-leak free
ci: update Perforce version to r23.2
t98xx: fix Perforce tests with p4d r23 and newer
An expensive operation to prepare tracing was done in re-encoding
code path even when the tracing was not requested, which has been
corrected.
* dh/encoding-trace-optim:
convert: return early when not tracing
Some project conventions have been added to CodingGuidelines.
* ps/doc-more-c-coding-guidelines:
Documentation: consistently use spaces inside initializers
Documentation: document idiomatic function names
Documentation: document naming schema for structs and their functions
Documentation: clarify indentation style for C preprocessor directives
clang-format: fix indentation width for preprocessor directives
"git grep -W" omits blank lines that follow the found function at
the end of the file, just like it omits blank lines before the next
function.
* rs/grep-omit-blank-lines-after-function-at-eof:
grep: -W: skip trailing empty lines at EOF, too
"git notes add -m '' --allow-empty" and friends that take prepared
data to create notes should not invoke an editor, but it started
doing so since Git 2.42, which has been corrected.
* dd/notes-empty-no-edit-by-default:
notes: do not trigger editor when adding an empty note
Test script linter has been updated to catch an attempt to use
one-shot export construct "VAR=VAL func" for shell functions (which
does not work for some shells) better.
* es/shell-check-updates:
check-non-portable-shell: improve `VAR=val shell-func` detection
check-non-portable-shell: suggest alternative for `VAR=val shell-func`
check-non-portable-shell: loosen one-shot assignment error message
t4034: fix use of one-shot variable assignment with shell function
t3430: drop unnecessary one-shot "VAR=val shell-func" invocation
A 'P' command to "git add -p" that passes the patch hunk to the
pager has been added.
* rj/add-p-pager:
add-patch: render hunks through the pager
pager: introduce wait_for_pager
pager: do not close fd 2 unnecessarily
add-patch: test for 'p' command
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.
But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.
Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.
This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The locking employed by compaction uses the following schema:
1. Lock "tables.list" and verify that it matches the version we have
loaded in core.
2. Lock each of the tables in the user-supplied range of tables that
we are supposed to compact. These locks prohibit any concurrent
process to compact those tables while we are doing that.
3. Unlock "tables.list". This enables concurrent processes to add new
tables to the stack, but also allows them to compact tables outside
of the range of tables that we have locked.
4. Perform the compaction.
5. Lock "tables.list" again.
6. Move the compacted table into place.
7. Write the new order of tables, including the compacted table, into
the lockfile.
8. Commit the lockfile into place.
Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.
But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.
Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.
This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.
Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.
Refactor the code to consistently lock "tables.list" via the `lock_file`
subsytem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.
Adapt accordingly and use `fsync_component()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.
This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match
the lock indices anymore.
Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.
Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to add two tests, and both of them will want to initialize
the reftable stack with a set of N tables. Introduce a new function that
handles this and refactor existing tests that use such a setup to use
it.
Note that this changes the exact records contained in the preexisting
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.
Furthermore, with this change we now start to disable auto compaction
when writing the tables, as otherwise we might not end up with the
expected amount of new tables added. This also slightly changes the
behaviour of these tests, but the properties we care for remain intact.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reviewing guidelines document now explicitly encourages people
to give positive reviews and how.
* jc/doc-reviewing-guidelines-positive-reviews:
ReviewingGuidelines: encourage positive reviews more
"git rebase --help" referred to "offset" (the difference between
the location a change was taken from and the change gets replaced)
incorrectly and called it "fuzz", which has been corrected.
* jc/doc-rebase-fuzz-vs-offset-fix:
doc: difference in location to apply is "offset", not "fuzz"
It has been documented that we avoid "VAR=VAL shell_func" and why.
* jc/doc-one-shot-export-with-shell-func:
CodingGuidelines: document a shell that "fails" "VAR=VAL shell_func"
Another reftable test has been ported to use the unit test framework.
* cp/unit-test-reftable-merged:
t-reftable-merged: add test for REFTABLE_FORMAT_ERROR
t-reftable-merged: use reftable_ref_record_equal to compare ref records
t-reftable-merged: add tests for reftable_merged_table_max_update_index
t-reftable-merged: improve the const-correctness of helper functions
t-reftable-merged: improve the test t_merged_single_record()
t: harmonize t-reftable-merged.c with coding guidelines
t: move reftable/merged_test.c to the unit testing framework
A CI job that use clang-format to check coding style issues in new
code has been added.
* kn/ci-clang-format:
ci/style-check: add `RemoveBracesLLVM` in CI job
check-whitespace: detect if no base_commit is provided
ci: run style check on GitHub and GitLab
clang-format: formalize some of the spacing rules
clang-format: avoid spacing around bitfield colon
clang-format: indent preprocessor directives after hash
"git checkout --ours" (no other arguments) complained that the
option is incompatible with branch switching, which is technically
correct, but found confusing by some users. It now says that the
user needs to give pathspec to specify what paths to checkout.
* jc/checkout-no-op-switch-errors:
checkout: special case error messages during noop switching
"git add -p" by users with diff.suppressBlankEmpty set to true
failed to parse the patch that represents an unmodified empty line
with an empty line (not a line with a single space on it), which
has been corrected.
* pw/add-patch-with-suppress-blank-empty:
add-patch: use normalize_marker() when recounting edited hunk
add-patch: handle splitting hunks with diff.suppressBlankEmpty
A build tweak knob has been simplified by not setting the value
that is already the default; another unused one has been removed.
* rj/make-cleanup:
config.mak.uname: remove unused uname_P variable
Makefile: drop -Wno-universal-initializer from SP_EXTRA_FLAGS
Many Porcelain commands that internally use the merge machinery
were taught to consistently honor the diff.algorithm configuration.
* ad/merge-with-diff-algorithm:
merge-recursive: honor diff.algorithm
Unit test clean-up.
* rs/t-strvec-use-test-msg:
t-strvec: fix type mismatch in check_strvec
t-strvec: improve check_strvec() output
t-strvec: use test_msg()
All the Perforce tests are free of memory leaks. This went unnoticed
because most folks do not have p4 and p4d installed on their computers.
Consequently, given that the prerequisites for running those tests
aren't fulfilled, `TEST_PASSES_SANITIZE_LEAK=check` won't notice that
those tests are indeed memory leak free.
Mark those tests accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update our Perforce version from r21.2 to r23.2. Note that the updated
version is not the newest version. Instead, it is the last version where
the way that Perforce is being distributed remains the same as in r21.2.
Newer releases stopped distributing p4 and p4d executables as well as
the macOS archives directly and would thus require more work.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the tests in t98xx modify the Perforce depot in ways that the
tool wouldn't normally allow. This is done to test behaviour of git-p4
in certain edge cases that we have observed in the wild, but which
should in theory not be possible.
Naturally, modifying the depot on disk directly is quite intimate with
the tool and thus prone to breakage when Perforce updates the way that
data is stored. And indeed, those tests are broken nowadays with r23 of
Perforce. While a file revision was previously stored as a plain file
"depot/file,v", it is now stored in a directory "depot/file,d" with
compression.
Adapt those tests to handle both old- and new-style depot layouts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.
This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.
Signed-off-by: D Harithamma <harithamma.d@ibm.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guide is inconsistent with how it uses spaces inside of
initializers (`struct foo bar = { something }`). While we mostly carry
the space between open and closing braces and the initialized members,
in one case we don't.
Fix this one instance such that we consistently carry the space. This is
also consistent with how clang-format formats such initializers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We semi-regularly have discussions around whether a function shall be
named `S_release()`, `S_clear()` or `S_free()`. Indeed, it may not be
obvious which of these is preferable as we never really defined what
each of these variants means exactly.
Carve out a space where we can add idiomatic names for common functions
in our coding guidelines and define each of those functions. Like this,
we can get to a shared understanding of their respective semantics and
can easily point towards our style guide in future discussions such that
our codebase becomes more consistent over time.
Note that the intent is not to rename all functions which violate these
semantics right away. Rather, the intent is to slowly converge towards a
common style over time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We nowadays have a proper mishmash of struct-related functions that are
called `<verb>_<struct>` (e.g. `clear_prio_queue()`) versus functions
that are called `<struct>_<verb>` (e.g. `strbuf_clear()`). While the
former style may be easier to tie into a spoken conversation, most of
our communication happens in text anyway. Furthermore, prefixing
functions with the name of the structure they operate on makes it way
easier to group them together, see which functions are related, and will
also help folks who are using code completion.
Let's thus settle on one style, namely the one where functions start
with the name of the structure they operate on.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit, we have settled on using a single space per
nesting level to indent preprocessor directives. Clarify our coding
guidelines accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In [1], we have improved our clang-format configuration to also specify
the style for how to indent preprocessor directives. But while we have
settled the question of where to put the indentation, either before or
after the hash sign, we didn't specify exactly how to indent.
With the current configuration, clang-format uses tabs to indent each
level of nested preprocessor directives, which is in fact unintentional
and never done in our codebase. Instead, we use a mixture of indenting
by either one or two spaces, where using a single space is somewhat more
common.
Adapt our clang-format configuration accordingly by specifying an
indentation width of one space.
[1]: <20240708092317.267915-1-karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>