With `GIT_TEST_PASSING_SANITIZE_LEAK=check`, one can double check
whether a memory leak fix caused some test suites to become leak free.
This is done by running all tests with the leak checker enabled. If a
test suite does not declare `TEST_PASSES_SANITIZE_LEAK=true` but still
finishes successfully with the leak checker enabled, then this indicates
that the test is leak free and thus missing the annotation.
It is somewhat slow to execute though because it runs all of our test
suites with the leak sanitizer enabled. It is also pointless in most
cases, because the only test suites that need to be checked are those
which _aren't_ yet marked with `TEST_PASSES_SANITIZE_LEAK=true`.
Introduce a new value "check-failing". When set, we behave the same as
if "check" was passed, except that we only check those tests which do
not have `TEST_PASSES_SANITIZE_LEAK=true` set. This is significantly
faster than running all test suites but still fulfills the usecase of
finding newly-leak-free test suites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This change affects how 'git cat-file' works with the index when
specifying an object with the ":<path>" syntax (which will give file
contents from the index).
'git cat-file' expands a sparse index to a full index any time contents
are requested from the index by specifying an object with the ":<path>"
syntax. This is true even when the requested file is part of the sparse
index, and results in much slower 'git cat-file' operations when working
within the sparse index.
Mark 'git cat-file' as not needing a full index, so that you only pay
the cost of expanding the sparse index to a full index when you request
a file outside of the sparse index.
Add tests to ensure both that:
- 'git cat-file' returns the correct file contents whether or not the
file is in the sparse index
- 'git cat-file' expands to the full index any time you request
something outside of the sparse index
Signed-off-by: Kevin Lyles <klyles+github@epic.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'run_on_sparse' and 'run_on_all' functions do not work correctly for
commands accepting standard input, because they run the same command
multiple times and the first instance consumes it. This also indirectly
affects 'test_all_match' and 'test_sparse_match'.
To allow these functions to work with commands accepting standard input,
first slurp standard input to a temporary file, and then run the command
with its standard input redirected from the temporary file. This ensures
that each command sees the same contents from its standard input.
Note that this does not impact commands that do not read from standard
input; they continue to ignore it. Additionally, existing uses of the
run_on_* functions do not need to do anything differently, as the
standard input of the test environment is already connected to
/dev/null.
We do not explicitly clean up the input files because they are cleaned
up with the rest of the test repositories and their contents may be
useful for figuring out which command failed when a test case fails.
Signed-off-by: Kevin Lyles <klyles@epic.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we're using `clar` as powerful test framework, we have to
adjust the Visual C build (read: the CMake definition) to be able to
handle that, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the ctype tests to use the new clar unit testing framework.
Introduce a new function `cl_failf()` that allows us to print a
formatted error message, which we can use to point out which of the
characters was classified incorrectly. This results in output like this
on failure:
# start of suite 1: ctype
not ok 1 - ctype::isspace
---
reason: |
Test failed.
0x0d is classified incorrectly: expected 0, got 1
at:
file: 't/unit-tests/ctype.c'
line: 36
function: 'test_ctype__isspace'
---
ok 2 - ctype::isdigit
ok 3 - ctype::isalpha
ok 4 - ctype::isalnum
ok 5 - ctype::is_glob_special
ok 6 - ctype::is_regex_special
ok 7 - ctype::is_pathspec_magic
ok 8 - ctype::isascii
ok 9 - ctype::islower
ok 10 - ctype::isupper
ok 11 - ctype::iscntrl
ok 12 - ctype::ispunct
ok 13 - ctype::isxdigit
ok 14 - ctype::isprint
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the strvec tests to use the new clar unit testing framework.
This is a first test balloon that demonstrates how the testing infra for
clar-based tests looks like.
The tests are part of the "t/unit-tests/bin/unit-tests" binary. When
running that binary with an injected error, it generates TAP output:
# ./t/unit-tests/bin/unit-tests
TAP version 13
# start of suite 1: strvec
ok 1 - strvec::init
ok 2 - strvec::dynamic_init
ok 3 - strvec::clear
not ok 4 - strvec::push
---
reason: |
String mismatch: (&vec)->v[i] != expect[i]
'foo' != 'fo' (at byte 2)
at:
file: 't/unit-tests/strvec.c'
line: 48
function: 'test_strvec__push'
---
ok 5 - strvec::pushf
ok 6 - strvec::pushl
ok 7 - strvec::pushv
ok 8 - strvec::replace_at_head
ok 9 - strvec::replace_at_tail
ok 10 - strvec::replace_in_between
ok 11 - strvec::replace_with_substring
ok 12 - strvec::remove_at_head
ok 13 - strvec::remove_at_tail
ok 14 - strvec::remove_in_between
ok 15 - strvec::pop_empty_array
ok 16 - strvec::pop_non_empty_array
ok 17 - strvec::split_empty_string
ok 18 - strvec::split_single_item
ok 19 - strvec::split_multiple_items
ok 20 - strvec::split_whitespace_only
ok 21 - strvec::split_multiple_consecutive_whitespaces
ok 22 - strvec::detach
1..22
The binary also supports some parameters that allow us to run only a
subset of unit tests or alter the output:
$ ./t/unit-tests/bin/unit-tests -h
Usage: ./t/unit-tests/bin/unit-tests [options]
Options:
-sname Run only the suite with `name` (can go to individual test name)
-iname Include the suite with `name`
-xname Exclude the suite with `name`
-v Increase verbosity (show suite names)
-q Only report tests that had an error
-Q Quit as soon as a test fails
-t Display results in tap format
-l Print suite names
-r[filename] Write summary file (to the optional filename)
Furthermore, running `make unit-tests` runs the binary along with all
the other unit tests we have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test driver in "unit-test.c" is responsible for setting up our unit
tests and eventually running them. As such, it is also responsible for
parsing the command line arguments.
The clar unit testing framework provides function `clar_test()` that
parses command line arguments and then executes the tests for us. In
theory that would already be sufficient. We have the special requirement
to always generate TAP-formatted output though, so we'd have to always
pass the "-t" argument to clar. Furthermore, some of the options exposed
by clar are ineffective when "-t" is used, but they would still be shown
when the user passes the "-h" parameter to have the clar show its usage.
Implement our own option handling instead of using the one provided by
clar, which gives us greater flexibility in how exactly we set things
up.
We would ideally not use any "normal" code of ours for this such that
the unit testing framework doesn't depend on it working correctly. But
it is somewhat dubious whether we really want to reimplement all of the
option parsing. So for now, let's be pragmatic and reuse it until we
find a good reason in the future why we'd really want to avoid it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up the clar unit testing framework by introducing a new
"unit-tests" executable. In contrast to the existing framework, this
will result in a single executable for all test suites. The ability to
pick specific tests to execute is retained via functionality built into
the clar itself.
Note that we need to be a bit careful about how we need to invalidate
our Makefile rules. While we obviously have to regenerate the clar suite
when our test suites change, we also have to invalidate it in case any
of the test suites gets removed. We do so by using our typical pattern
of creating a `GIT-TEST-SUITES` file that gets updated whenever the set
of test suites changes, so that we can easily depend on that file.
Another specialty is that we generate a "clar-decls.h" file. The test
functions are neither static, nor do they have external declarations.
This is because they are getting parsed via "generate.py", which then
creates the external generations that get populated into an array. These
declarations are only seen by the main function though.
The consequence is that we will get a bunch of "missing prototypes"
errors from our compiler for each of these test functions. To fix those
errors, we extract the `extern` declarations from "clar.suite" and put
them into a standalone header that then gets included by each of our
unit tests. This gets rid of compiler warnings for every function which
has been extracted by "generate.py". More importantly though, it does
_not_ get rid of warnings in case a function really isn't being used by
anything. Thus, it would cause a compiler error if a function name was
mistyped and thus not picked up by "generate.py".
The test driver "unit-test.c" is an empty stub for now. It will get
implemented in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several third-party sources in our codebase that we have
imported from upstream projects. These sources are mostly excluded from
our static analysis, for example when running Coccinelle.
Do the same for our "sparse" target by filtering them out.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "hdr-check" Makefile target compiles each of our headers as a
standalone code unit to ensure that they are not missing any type
declarations and can be included standalone.
With the next commit we will wire up the clar unit testing framework,
which will have the effect that some headers start depending on
generated ones. While we could declare that dependency explicitly, it
does not really feel very maintainable in the future.
Instead, we do the same as in the preceding commit and have the objects
depend on all of our generated headers. While again overly broad, it is
easy to maintain and generating headers is not an expensive thing to do
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "check" Makefile target is essentially an alias around the "sparse"
target. The one difference though is that it will tell users to instead
run the "test" target in case they do not have sparse(1) installed, as
chances are high that they wanted to execute the test suite rather than
doing semantic checks.
But even though the "check" target ultimately just ends up executing
`make sparse`, it still depends on our generated headers. This does not
make any sense though: they are irrelevant for the "test" target advice,
and if these headers are required for the "sparse" target they must be
declared as a dependency on the aliased target, not the alias.
But even moving the dependency to the "sparse" target is wrong, as
concurrent builds may then end up generating the headers and running
sparse concurrently. Instead, we make them a dependency of the specific
objects. While that is overly broad, it does ensure correct ordering.
The alternative, specifying which file depends on what generated header
explicitly, feels rather unmaintainable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `shellapi.h` header was included as of
https://github.com/clar-test/clar/commit/136e763211aa, to have
`SHFileOperation()` declared so that it could be called.
However, https://github.com/clar-test/clar/commit/5ce31b69b525 removed
that call, and therefore that `#include <shellapi.h>` is unnecessary.
It is also unwanted in Git because this project uses a subset of Git for
Windows' SDK in its CI builds that (for bandwidth reasons) excludes tons
of header files, including `shellapi.h`.
So let's remove it.
Note: Since the `windows.h` header would include `shellapi.h` anyway, we
also define `WIN32_LEAN_AND_MEAN` to avoid this and similar other
unnecessary includes before including `windows.h`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When CLAR_FIXTURE_PATH is unset, the `fs_copy()` function seems not to
be used. But it is declared as `static`, and GCC does not like that,
complaining that it should not be declared/defined to begin with.
We could mark this function as (potentially) unused by following the
`MAYBE_UNUSED` pattern from Git's `git-compat-util.h`. However, this is
a GCC-only construct that is not understood by Visual C. Besides, `clar`
does not use that pattern at all.
Instead, let's use the `((void)SYMBOL);` pattern that `clar` already
uses elsewhere; This avoids the compile error by sorta kinda make the
function used after a fashion.
Note: GCC 14.x (which Git for Windows' SDK already uses) is able to
figure out that this function is unused even though there are recursive
calls between `fs_copy()` and `fs_copydir_helper()`; Earlier GCC
versions do not detect that, and therefore the issue has been hidden
from the regular Linux CI builds (where GCC 14.x is not yet used). That
is the reason why this change is only made in the Windows-specific
portion of `t/unit-tests/clar/clar/fs.h`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using mingw-w64 to compile the code, and using `_stat()`, it is
necessary to use `struct _stat`, too, and not `struct stat` (as the
latter is incompatible with the "dashed" version because it is limited
to 32-bit time types for backwards compatibility).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The NonStop platform does not have `mkdtemp()` available, which we rely
on in `build_sandbox_path()`. Fix this issue by using `mktemp()` and
`mkdir()` instead on this platform.
This has been cherry-picked from the upstream pull request at [1].
[1]: https://github.com/clar-test/clar/pull/96
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our unit testing framework is a homegrown solution. While it supports
most of our needs, it is likely that the volume of unit tests will grow
quite a bit in the future such that we can exercise low-level subsystems
directly. This surfaces several shortcomings that the current solution
has:
- There is no way to run only one specific tests. While some of our
unit tests wire this up manually, others don't. In general, it
requires quite a bit of boilerplate to get this set up correctly.
- Failures do not cause a test to stop execution directly. Instead,
the test author needs to return manually whenever an assertion
fails. This is rather verbose and is not done correctly in most of
our unit tests.
- Wiring up a new testcase requires both implementing the test
function and calling it in the respective test suite's main
function, which is creating code duplication.
We can of course fix all of these issues ourselves, but that feels
rather pointless when there are already so many unit testing frameworks
out there that have those features.
We line out some requirements for any unit testing framework in
"Documentation/technical/unit-tests.txt". The "clar" unit testing
framework, which isn't listed in that table yet, ticks many of the
boxes:
- It is licensed under ISC, which is compatible.
- It is easily vendorable because it is rather tiny at around 1200
lines of code.
- It is easily hackable due to the same reason.
- It has TAP support.
- It has skippable tests.
- It preprocesses test files in order to extract test functions, which
then get wired up automatically.
While it's not perfect, the fact that clar originates from the libgit2
project means that it should be rather easy for us to collaborate with
upstream to plug any gaps.
Import the clar unit testing framework at commit 1516124 (Merge pull
request #97 from pks-t/pks-whitespace-fixes, 2024-08-15). The framework
will be wired up in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the prove target, we append GIT_TEST_OPTS to the arguments
that we execute each of the tests with. This doesn't only include the
intended test scripts, but also ends up passing the arguments to our
unit tests. This is unintentional though as they do not even know to
interpret those arguments, and is inconsistent with how we execute unit
tests without prove.
This isn't much of an issue because our current set of unit tests mostly
ignore their arguments anyway. With the introduction of clar-based unit
tests this is about to become an issue though, as these do parse their
command line argument to alter behaviour.
Prepare for this by passing GIT_TEST_OPTS to "run-test.sh" via an
environment variable. Like this, we can conditionally forward it to our
test scripts, only.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide
whether or not a repack is in order. This switch has been introduced
mostly with the "reftable" backend in mind, which already knows to
auto-compact its tables during normal operations. When the flag is set,
then it will use the same auto-compaction mechanism and thus end up
doing nothing in most cases.
The "files" backend does not have any such heuristic yet and instead
packs any loose references unconditionally. So we rewrite the complete
"packed-refs" file even if there's only a single loose reference to be
packed.
Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using
`git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is
unconditionally executed via our auto maintenance, so we end up repacking
references every single time auto maintenance kicks in. And while that
commit already mentioned that the "files" backend unconditionally packs
refs now, the author obviously didn't quite think about the consequences
thereof. So while the idea was sound, we really should have added a
heuristic to the "files" backend before implementing it.
Introduce a heuristic that decides whether or not it is worth to pack
loose references. The important factors to decide here are the number of
loose references in comparison to the overall size of the "packed-refs"
file. The bigger the "packed-refs" file, the longer it takes to rewrite
it and thus we scale up the limit of allowed loose references before we
repack.
As is the nature of heuristics, this mechansim isn't obviously
"correct", but should rather be seen as a tradeoff between how much
resources we spend packing refs and how inefficient the ref store
becomes. For all I can say, we have successfully been using the exact
same heuristic in Gitaly for several years by now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two tests in t0601 which exercise the same underlying logic,
once via `git pack-refs --auto` and once via `git maintenance run
--auto`. Merge these two tests into one such that it becomes easier to
extend test coverage for both commands at the same time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have an implementation of a function that computes the log2 for an
integer. While we could instead use log2(3P), that involves floating
point numbers and is thus needlessly complex and inefficient.
We're about to add a second caller that wants to compute log2 for a
`size_t`. Let's thus move the function into "wrapper.h" such that it
becomes generally available.
While at it, tweak the implementation a bit:
- The parameter is converted from `int` to `uintmax_t`. This
conversion is safe to do in "bisect.c" because we already check that
the argument is positive.
- The return value is an `unsigned`. It cannot ever be negative, so it
is pointless for it to be a signed integer.
- Loop until `!n` instead of requiring that `n > 1` and then subtract
1 from the result and add a special case for `!sz`. This helps
compilers to generate more efficient code.
Compilers recognize the pattern of this function and optimize
accordingly. On GCC 14.2 x86_64:
log2u(unsigned long):
test rdi, rdi
je .L3
bsr rax, rdi
ret
.L3:
mov eax, -1
ret
Clang 18.1 does not yet recognize the pattern, but starts to do so on
Clang trunk x86_64. The code isn't quite as efficient as the one
generated by GCC, but still manages to optimize away the loop:
log2u(unsigned long):
test rdi, rdi
je .LBB0_1
shr rdi
bsr rcx, rdi
mov eax, 127
cmovne rax, rcx
xor eax, -64
add eax, 65
ret
.LBB0_1:
mov eax, -1
ret
The pattern is also recognized on other platforms like ARM64 GCC 14.2.0,
where we end up using `clz`:
log2u(unsigned long):
clz x2, x0
cmp x0, 0
mov w1, 63
sub w0, w1, w2
csinv w0, w0, wzr, ne
ret
Note that we have a similar function `fastlog2()` in the reftable code.
As that codebase is separate from the Git codebase we do not adapt it to
use the new function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported that git-verify-pack(1) has started to crash with Git
v2.46.0 when run outside of a repository. This is another fallout from
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), where we have stopped setting the default hash algorithm
for `the_repository`. Consequently, code that relies on `the_hash_algo`
will now crash when it hasn't explicitly been initialized, which may be
the case when running outside of a Git repository.
The crash is not in git-verify-pack(1) but instead in git-index-pack(1),
which gets called by the former. Ideally, both of these programs should
be able to identify the hash algorithm used by the packfile and index
without having to rely on external information. But unfortunately, the
format for neither of them is completely self-describing, so it is not
possible to derive that information. This is a design issue that we
should address by introducing a new packfile version that encodes its
object hash.
For now though the more important fix is to not make either of these
programs crash anymore, which we do by falling back to SHA1 when the
object hash is unconfigured. This pessimizes reading packfiles which
use a different hash than SHA1, but restores previous behaviour.
Reported-by: Ilya K <me@0upti.me>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If "git rebase" fails to start after stashing the user's uncommitted
changes then it forgets to restore the stashed changes and remove the
state directory. To make matters worse, running "git rebase --abort" to
apply the stashed changes and cleanup the state directory fails because
the state directory only contains the "autostash" file and is missing
the "head-name" and "onto" files required by read_basic_state().
Fix this by applying the autostash and removing the state directory if
the pre-rebase hook or initial checkout fail. This matches what
finish_rebase() does at the end of a successful rebase. If the user
modifies any files after the autostash is created it is possible there
will be conflicts when the autostash is applied. In that case
apply_autostash() saves the stash in a new entry under refs/stash and so
it is safe to remove the state directory containing the autostash file.
New tests are added to check the autostash is applied and the state
directory is removed if the rebase fails to start. Checks are also added
to some existing tests in order to ensure there is no state directory
left behind when a rebase fails to start and no autostash has been
created.
Reported-by: Brian Lyles <brianmlyles@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-pack-redundant(1) command is already in the process of being
phased out and dies unless the user passes the `--i-still-use-this` flag
since 4406522b76 (pack-redundant: escalate deprecation warning to an
error, 2023-03-23). We haven't heard any complaints, so let's announce
the removal of this command in Git 3.0 in our breaking changes document.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More trace2 events at key points on push and fetch code paths have
been added.
* js/fetch-push-trace2-annotation:
send-pack: add new tracing regions for push
fetch: add top-level trace2 regions
trace2: implement trace2_printf() for event target
VSCode has supported three-way merges since 2022, see
<https://github.com/microsoft/vscode/issues/5770#issuecomment-1188658476>.
Although the program binary is located at /usr/bin/code, name the
mergetool "vscode" because the word "code" is too generic and would lead
to confusion. The name "vscode" also matches Git's existing
contrib/vscode directory.
On Windows, VSCode adds the directory that contains code.cmd to %PATH%,
so there is no need to invoke mergetool_find_win32_cmd to search for the
program.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
helper/test-oid-array.c along with t0064-oid-array.sh test the
oid-array.h API, which provides storage and processing
efficiency over large lists of object identifiers.
Migrate them to the unit testing framework for better runtime
performance and efficiency. As we don't initialize a repository
in these tests, the hash algo that functions like oid_array_lookup()
use is not initialized, therefore call repo_set_hash_algo() to
initialize it. And init_hash_algo():lib-oid.c can aid in this
process, so make it public.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If neither HAVE_DEV_TTY nor GIT_WINDOWS_NATIVE is set, the fallback
code calls the system getpass(). This unfortunately ignores the "echo"
boolean parameter, as we have no way to implement that functionality.
But we still have to keep the unused parameter, since our interface
has to match the other implementations.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-log we leave the save_commit_buffer flag set to "1", which tells
the commit parsing code to store the object content after it has parsed
it to find parents, tree, etc. That lets us reuse the contents for
pretty-printing the commit in the output. And then after printing each
commit, we call free_commit_buffer(), since we don't need it anymore.
But some options may cause us to traverse commits which are not part of
the output. And so git-log does not see them at all, and doesn't free
them. One such case is something like:
git log -n 1000 --skip=1000000
which will churn through a million commits, before showing only a
thousand. We loop through these inside get_revision(), without freeing
the contents. As a result, we end up storing the object data for those
million commits simultaneously.
We should free the stored buffers (if any) for those commits as we skip
over them, which is what this patch does. Running the above command in
linux.git drops the peak heap usage from ~1.1GB to ~200MB, according to
valgrind/massif. (I thought we might get an even bigger improvement, but
the remaining memory is going to commit/tree structs, which we do hold
on to forever).
Note that this problem doesn't occur if:
- you're running a git-rev-list without a --format parameter; it turns
off save_commit_buffer by default, since it only output the object
id
- you've built a commit-graph file, since in that case we'd use the
optimized graph data instead of the initial parse, and then do a
lazy parse for commits we're actually going to output
There are probably some other option combinations that can likewise
end up with useless stored commit buffers. For example, if you ask for
"foo..bar", then we'll have to walk down to the merge base, and
everything on the "foo" side won't be shown. Tuning the "save" behavior
to handle that might be tricky (I guess maybe drop buffers for anything
we mark as UNINTERESTING?). And in the long run, the right solution here
is probably to make sure the commit-graph is built (since it fixes the
memory problem _and_ drastically reduces CPU usage).
But since this "--skip" case is an easy one-liner, it's worth fixing in
the meantime. It should be OK to make this call even if there is no
saved buffer (e.g., because save_commit_buffer=0, or because a
commit-graph was used), since it's O(1) to look up the buffer and is a
noop if it isn't present. I verified by running the above command after
"git commit-graph write --reachable", and it takes the same time with
and without this patch.
Reported-by: Yuri Karnilaev <karnilaev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed to build things with -Werror=unused-parameter on a
platform without symbolic link support.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide custom malloc/free callbacks for the pcre library to use.
Those take an extra "data" parameter, but we don't use it. Back when
these were added in 513f2b0bbd (grep: make PCRE2 aware of custom
allocator, 2019-10-16), we only had MAYBE_UNUSED.
But these days we have UNUSED, which we should prefer, as it will
let the compiler inform us if the code changes to actually use the
parameters.
I also moved the annotations to come after the variable name, which is
how we typically spell it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "opts" parameter is always used, so marking it with MAYBE_UNUSED is
just confusing.
This annotation goes back to 41abfe15d9 (maintenance: add pack-refs
task, 2021-02-09), when it really was unused. Back then we did not have
the UNUSED macro that would complain if the code changed to use the
parameter. So when we started using it in bfc2f9eb8e (builtin/gc:
forward git-gc(1)'s `--auto` flag when packing refs, 2024-03-25), nobody
noticed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A function that uses a parameter in one build may lose all uses of
the parameter in another build, depending on the configuration. A
workaround for such a case, MAYBE_UNUSED, should also be mentioned
when we recommend the use of UNUSED to our developers.
Keep the addition to the guideline short and document the criteria
to choose between UNUSED and MAYBE_UNUSED near their definition.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/unused-parameters:
CodingGuidelines: mention -Wunused-parameter and UNUSED
config.mak.dev: enable -Wunused-parameter by default
compat: mark unused parameters in win32/mingw functions
compat: disable -Wunused-parameter in win32/headless.c
compat: disable -Wunused-parameter in 3rd-party code
t-reftable-readwrite: mark unused parameter in callback function
gc: mark unused config parameter in virtual functions
The underlying machinery for "git diff-index" has long been made to
expand the sparse index as needed, but the command fully expanded
the sparse index upfront, which now has been taught not to do.
* ds/sparse-diff-index:
diff-index: integrate with the sparse index
Another test for reftable library ported to the unit test framework.
* cp/unit-test-reftable-block:
t-reftable-block: mark unused argv/argc
t-reftable-block: add tests for index blocks
t-reftable-block: add tests for obj blocks
t-reftable-block: add tests for log blocks
t-reftable-block: remove unnecessary variable 'j'
t-reftable-block: use xstrfmt() instead of xstrdup()
t-reftable-block: use block_iter_reset() instead of block_iter_close()
t-reftable-block: use reftable_record_key() instead of strbuf_addstr()
t-reftable-block: use reftable_record_equal() instead of check_str()
t-reftable-block: release used block reader
t: harmonize t-reftable-block.c with coding guidelines
t: move reftable/block_test.c to the unit testing framework
The code in the reftable library has been cleaned up by discarding
unused "generic" interface.
* ps/reftable-drop-generic:
reftable: mark unused parameters in empty iterator functions
reftable/generic: drop interface
t/helper: refactor to not use `struct reftable_table`
t/helper: use `hash_to_hex_algop()` to print hashes
t/helper: inline printing of reftable records
t/helper: inline `reftable_table_print()`
t/helper: inline `reftable_stack_print_directory()`
t/helper: inline `reftable_reader_print_file()`
t/helper: inline `reftable_dump_main()`
reftable/dump: drop unused `compact_stack()`
reftable/generic: move generic iterator code into iterator interface
reftable/iter: drop double-checking logic
reftable/stack: open-code reading refs
reftable/merged: stop using generic tables in the merged table
reftable/merged: rename `reftable_new_merged_table()`
reftable/merged: expose functions to initialize iterators
The command line prompt support used to be littered with bash-isms,
which has been corrected to work with more shells.
* ah/git-prompt-portability:
git-prompt: support custom 0-width PS1 markers
git-prompt: ta-da! document usage in other shells
git-prompt: don't use shell $'...'
git-prompt: add some missing quotes
git-prompt: replace [[...]] with standard code
git-prompt: don't use shell arrays
git-prompt: fix uninitialized variable
git-prompt: use here-doc instead of here-string
These unused parameters were marked in a68ec8683a (reftable: mark unused
parameters in virtual functions, 2024-08-17), but the functions were
moved to a new file in a parallel branch via f2406c81b9
(reftable/generic: move generic iterator code into iterator interface,
2024-08-22).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>