When parsing `%(if)` atoms we expect a few other atoms to exist to
complete it, like `%(then)` and `%(end)`. Whether or not we have seen
these other atoms is tracked in an allocated `if_then_else` structure,
which gets free'd by the `if_then_else_handler()` once we have parsed
the complete conditional expression.
This results in a memory leak when the `%(if)` atom is not terminated
correctly and thus incomplete. We never end up executing its handler and
thus don't end up freeing the structure.
Plug this memory leak by introducing a new `at_end_data_free` callback
function. If set, we'll execute it in `pop_stack_element()` and pass it
the `at_end_data` variable with the intent to free its state. Wire it up
for the `%(if)` atom accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After using the ref-filter API, callers should use ref_filter_clear() to
free any used memory. However, there's not a matching function to clear
the ref_format struct.
Traditionally this did not need to be cleaned up, as it was just a way
for the caller to store and pass format options as a single unit. Even
though the parsing step of some placeholders may allocate data, that's
usually inside their "used_atom" structs, which are part of the
ref_filter itself.
But a few placeholders keep data outside of there. The %(ahead-behind)
and %(is-base) parsers both keep a master list of bases, because they
perform a single filtering pass outside of the use of any particular
atom. And since the format parser does not have access to the ref_filter
struct, they store their cross-atom data in the ref_format struct
itself.
And thus when they are finished, the ref_format also needs to be cleaned
up. So let's add a function to do so, and call it from all of the users
of the ref-filter API.
The %(is-base) case is found by running LSan on t6300. After this patch,
the script can now be marked leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we expand the %(upstream) or %(push) placeholders, we rely on
remote.c's remote_ref_for_branch() to fill in the ":refname" argument.
But that function has confusing memory ownership semantics: it may or
may not return an allocated string, depending on whether we are in
"upstream" mode or "push" mode. The caller in ref-filter.c always
duplicates the result, meaning that we leak the original in the case of
%(push:refname).
To solve this, let's make the return value from remote_ref_for_branch()
consistent, by always returning an allocated pointer. Note that the
switch to returning a non-const pointer has a ripple effect inside the
function, too. We were storing the "dst" result as a const pointer, too,
even though it is always allocated! It is the return value from
apply_refspecs(), which is always a non-const allocated string.
And then on the caller side in ref-filter.c (and this is the only caller
at all), we just need to avoid the extra duplication when the return
value is non-NULL.
This clears up one case that LSan finds in t6300, but there are more.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we parse a %(describe) placeholder, we stuff its arguments into a
strvec, which is then detached into the used_atom struct. But later,
when ref_array_clear() frees the atom, we never free the memory.
To solve this, we just need to add the appropriate free() calls. But
it's a little awkward, since we have to free each element of the array,
in addition to the array itself. Instead, let's store the actual strvec,
which lets us do a simple strvec_clear().
This clears up one case that LSan finds in t6300, but there are more.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we parse a placeholder like "%(trailers:key=foo)", our atom parsing
function is passed just the argument string "key=foo". We duplicate this
into its own string, but never free it, causing a leak.
We do the duplication for two reasons:
1. There's a mismatch with the pretty.c trailer-formatting code that
we rely on. It expects to see a closing paren, like "key=foo)". So
we duplicate the argument string with that extra character to pass
along.
This is probably something we could fix in the long run, but it's
somewhat non-trivial if we want to avoid regressing error cases for
things like "git log --format='%(trailer:oops'". So let's accept
it as a necessity for now.
2. The argument parser expects to store the list of "key" entries
("foo" in this case) in a string-list. It also stores the length of
the string in the string-list "util" field. The original caller in
pretty.c uses this with a "nodup" string list to avoid making extra
copies, which creates a subtle dependency on the lifetime of the
original format string.
We do the same here, which creates that same dependency. So we
can't simply free it as soon as the parsing is done.
There are two possible solutions here. The first is to hold on to the
duplicated "argbuf" string in the used_atom struct, so that it lives as
long as the string_list which references it.
But I think a less-subtle solution, and what this patch does, is to
switch to a duplicating string_list. That makes it self-contained, and
lets us free argbuf immediately. It may involve a few extra allocations,
but this parsing is something that happens once per program, not once
per output ref.
This clears up one case that LSan finds in t6300, but there are more.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The trailer API takes options via a trailer_opts struct. Some of those
options point to data structures which require extra storage. Those
structures aren't actually embedded in the options struct, but rather we
pass pointers, and the caller is responsible for managing them. This is
a little convoluted, but makes sense since some of them are not even
concrete (e.g., you can pass a filter function and a void data pointer,
but the trailer code doesn't even know what's in the pointer).
When for-each-ref, etc, parse the %(trailers) placeholder, they stuff
the extra data into a ref_trailer_buf struct. But we only hold a single
static global instance of this struct. So if a format string has
multiple %(trailer) placeholders, they'll stomp on each other: the "key"
list will end up with entries for all of them, and the separator buffers
will use the values from whichever was parsed last.
Instead, we should have a ref_trailer_buf for each instance of the
placeholder, and store it alongside the trailer_opts in the used_atom
structure.
And that's what this patch does. Note that we also have to add code to
clean them up in ref_array_clear(). The original code did not bother
cleaning them up, but it wasn't technically a "leak" since they were
still reachable from the static global instance.
Reported-by: Brooke Kuhlmann <brooke@alchemists.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to cast invalid_arg before freeing it. It is already a
non-const pointer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To expand the "%(trailers)" placeholder, we have to feed the commit or
tag body to the trailer API. But that API doesn't know anything about
signatures, and will be confused by a signed tag like this:
the subject
the body
Some-trailer: foo
-----BEGIN PGP SIGNATURE-----
...etc...
because it will start looking for trailers after the signature, and get
stopped walking backwards by the very non-trailer signature lines. So it
thinks there are no trailers.
This problem has existed since %(trailers) was added to the ref-filter
code, but back then trailers on tags weren't something we really
considered (commits don't have the same problem because their signatures
are embedded in the header). But since 066cef7707 (builtin/tag: add
--trailer option, 2024-05-05), we'd generate an object like the above
for "git tag -s --trailer 'Some-trailer: foo' my-tag".
The implementation here is pretty simple: we just make a NUL-terminated
copy of the non-signature part of the tag (which we've already parsed)
and pass it to the trailer API. There are some alternatives I rejected,
at least for now:
- the trailer code already understands skipping past some cruft at the
end of a commit, such as patch dividers. see find_end_of_log_message().
We could teach it to do the same for signatures. But since this is
the only context where we'd want that feature, and since we've already
parsed the object into subject/body/signature here, it seemed easier
to just pass in the truncated message.
- it would be nice if we could just pass in a pointer/len pair to the
trailer API (rather than a NUL-terminated string) to avoid the extra
copy. I think this is possible, since as noted above, the trailer
code already has to deal with ignoring some cruft at the end of the
input. But after an initial attempt at this, it got pretty messy, as
we have to touch a lot of intermediate functions that are also
called in other contexts.
So I went for the simple and stupid thing, at least for now. I don't
think the extra copy overhead will be all that bad. The previous
patch noted that an extra copy seemed to cause about 1-2% slowdown
for something simple like "%(subject)". But here we are only
triggering it for "%(trailers)" (and only when there is a
signature), and the trailer code is a bit allocation-heavy already.
I couldn't measure any difference formatting "%(trailers)" on
linux.git before and after (even though there are not even any
trailers to find).
Reported-by: Brooke Kuhlmann <brooke@alchemists.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we know we're going to show the subject or body of a tag or commit,
we call find_subpos(), which returns pointers and lengths for the three
parts: subject, body, signature.
Oddly, the function finds the signature twice: once by calling
parse_signature() at the start, which copies the signature into a
separate strbuf, and then again by calling parse_signed_buffer() after
we've parsed past the subject.
This is due to 482c119186 (gpg-interface: improve interface for parsing
tags, 2021-02-11) and 88bce0e24c (ref-filter: hoist signature parsing,
2021-02-11). The idea is that in a multi-hash world, tag signatures may
appear in the header, rather than at the end of the body, in which case
we need to extract them into a separate buffer.
But parse_signature() would never find such a buffer! It only looks for
signature lines (like "-----BEGIN PGP") at the start of each line,
without any header keyword. So this code will never find anything except
the usual in-body signature.
And the extra code has two downsides:
1. We spend time copying the payload and signature into strbufs. That
might even be useful if we ended up with a NUL-terminated copy of
the payload data, but we throw it away immediately. And the
signature, since it comes at the end of the message, is already its
own NUL-terminated buffer.
The overhead isn't huge, but I measured a pretty consistent 1-2%
speedup running "git for-each-ref --format='%(subject)'" with this
patch on a clone of linux.git.
2. The output of find_subpos() is a set of three ptr/len combinations,
but only two of them point into the original buffer. This makes the
interface confusing: you can't do pointer comparisons between them,
and you have to remember to free the signature buffer. Since
there's only one caller, it's not too bad in practice, but it did
bite me while working on the next patch (and simplifying it will
pave the way for that).
In the long run we might have to go back to something like this
approach, if we do have multi-hash header signatures. But I would argue
that the extra buffer should kick in only for a header signature, and be
passed out of find_subpos() separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't usually include newlines in test titles, because you get funny
TAP output like:
ok 417 - show good signature with custom format
ok 418 - show good signature with custom format
with ssh
ok 419 - signature atom with grade option and bad signature
where a TAP parser would ignore the extra line anyway, giving the wrong
title. This comes from 26c9c03f0a (ref-filter: add new "signature" atom,
2023-06-04), and I think it was probably just editor line wrapping.
I checked for other cases with:
git grep "test_expect_success [A-Z_,]* '[^']*$"
git grep 'test_expect_success [A-Z_,]* "[^"]*$'
but this was the only hit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make our codebase compilable with the -Werror=unused-parameter
option.
* jk/unused-parameters:
CodingGuidelines: mention -Wunused-parameter and UNUSED
config.mak.dev: enable -Wunused-parameter by default
compat: mark unused parameters in win32/mingw functions
compat: disable -Wunused-parameter in win32/headless.c
compat: disable -Wunused-parameter in 3rd-party code
t-reftable-readwrite: mark unused parameter in callback function
gc: mark unused config parameter in virtual functions
More trace2 events at key points on push and fetch code paths have
been added.
* js/fetch-push-trace2-annotation:
send-pack: add new tracing regions for push
fetch: add top-level trace2 regions
trace2: implement trace2_printf() for event target
This is needed to build things with -Werror=unused-parameter on a
platform without symbolic link support.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide custom malloc/free callbacks for the pcre library to use.
Those take an extra "data" parameter, but we don't use it. Back when
these were added in 513f2b0bbd (grep: make PCRE2 aware of custom
allocator, 2019-10-16), we only had MAYBE_UNUSED.
But these days we have UNUSED, which we should prefer, as it will
let the compiler inform us if the code changes to actually use the
parameters.
I also moved the annotations to come after the variable name, which is
how we typically spell it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "opts" parameter is always used, so marking it with MAYBE_UNUSED is
just confusing.
This annotation goes back to 41abfe15d9 (maintenance: add pack-refs
task, 2021-02-09), when it really was unused. Back then we did not have
the UNUSED macro that would complain if the code changed to use the
parameter. So when we started using it in bfc2f9eb8e (builtin/gc:
forward git-gc(1)'s `--auto` flag when packing refs, 2024-03-25), nobody
noticed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A function that uses a parameter in one build may lose all uses of
the parameter in another build, depending on the configuration. A
workaround for such a case, MAYBE_UNUSED, should also be mentioned
when we recommend the use of UNUSED to our developers.
Keep the addition to the guideline short and document the criteria
to choose between UNUSED and MAYBE_UNUSED near their definition.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/unused-parameters:
CodingGuidelines: mention -Wunused-parameter and UNUSED
config.mak.dev: enable -Wunused-parameter by default
compat: mark unused parameters in win32/mingw functions
compat: disable -Wunused-parameter in win32/headless.c
compat: disable -Wunused-parameter in 3rd-party code
t-reftable-readwrite: mark unused parameter in callback function
gc: mark unused config parameter in virtual functions
The underlying machinery for "git diff-index" has long been made to
expand the sparse index as needed, but the command fully expanded
the sparse index upfront, which now has been taught not to do.
* ds/sparse-diff-index:
diff-index: integrate with the sparse index
Another test for reftable library ported to the unit test framework.
* cp/unit-test-reftable-block:
t-reftable-block: mark unused argv/argc
t-reftable-block: add tests for index blocks
t-reftable-block: add tests for obj blocks
t-reftable-block: add tests for log blocks
t-reftable-block: remove unnecessary variable 'j'
t-reftable-block: use xstrfmt() instead of xstrdup()
t-reftable-block: use block_iter_reset() instead of block_iter_close()
t-reftable-block: use reftable_record_key() instead of strbuf_addstr()
t-reftable-block: use reftable_record_equal() instead of check_str()
t-reftable-block: release used block reader
t: harmonize t-reftable-block.c with coding guidelines
t: move reftable/block_test.c to the unit testing framework
The code in the reftable library has been cleaned up by discarding
unused "generic" interface.
* ps/reftable-drop-generic:
reftable: mark unused parameters in empty iterator functions
reftable/generic: drop interface
t/helper: refactor to not use `struct reftable_table`
t/helper: use `hash_to_hex_algop()` to print hashes
t/helper: inline printing of reftable records
t/helper: inline `reftable_table_print()`
t/helper: inline `reftable_stack_print_directory()`
t/helper: inline `reftable_reader_print_file()`
t/helper: inline `reftable_dump_main()`
reftable/dump: drop unused `compact_stack()`
reftable/generic: move generic iterator code into iterator interface
reftable/iter: drop double-checking logic
reftable/stack: open-code reading refs
reftable/merged: stop using generic tables in the merged table
reftable/merged: rename `reftable_new_merged_table()`
reftable/merged: expose functions to initialize iterators
The command line prompt support used to be littered with bash-isms,
which has been corrected to work with more shells.
* ah/git-prompt-portability:
git-prompt: support custom 0-width PS1 markers
git-prompt: ta-da! document usage in other shells
git-prompt: don't use shell $'...'
git-prompt: add some missing quotes
git-prompt: replace [[...]] with standard code
git-prompt: don't use shell arrays
git-prompt: fix uninitialized variable
git-prompt: use here-doc instead of here-string
These unused parameters were marked in a68ec8683a (reftable: mark unused
parameters in virtual functions, 2024-08-17), but the functions were
moved to a new file in a parallel branch via f2406c81b9
(reftable/generic: move generic iterator code into iterator interface,
2024-08-22).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is conceptually the same as the cases in df9d638c24 (unit-tests:
ignore unused argc/argv, 2024-08-17), but this unit test was migrated
from the reftable tests in a parallel branch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that -Wunused-parameter is on by default for DEVELOPER=1 builds,
people may trigger it, blocking their build. When it's a mistake for the
parameter to exist, the path forward is obvious: remove it. But
sometimes you need to suppress the warning, and the "UNUSED" mechanism
for that is specific to our project, so people may not know about it.
Let's put some advice in CodingGuidelines, including an example warning
message. That should help people who grep for the warning text after
seeing it from the compiler.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having now removed or annotated all of the unused function parameters in
our code base, I found that each instance falls into one of three
categories:
1. ignoring the parameter is a bug (e.g., a function takes a ptr/len
pair, but ignores the length). Detecting these helps us find the
bugs.
2. the parameter is unnecessary (and usually left over from a
refactoring or earlier iteration of a patches series). Removing
these cleans up the code.
3. the function has to conform to a specific interface (because it's
used via a function pointer, or matches something on the other side
of an #ifdef). These ones are annoying, but annotating them with
UNUSED is not too bad (especially if the compiler tells you about
the problem promptly).
Certainly instances of (3) are more common than (1), but after finding
all of these, I think there were enough cases of (1) that it justifies
the work in annotating all of the (3)s.
And since the code base is now at a spot where we compile cleanly with
-Wunused-parameter, turning it on will make it the responsibility of
individual patch writers going forward.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The compat/ directory contains many stub functions, wrappers, and so on
that have to conform to a specific interface, but don't necessarily need
to use all of their parameters. Let's mark them to avoid complaints from
-Wunused-parameter.
This was done mostly via guess-and-check with the Windows build in
GitHub CI. I also confirmed that the win+VS build is similarly happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the files touched in the previous commit, win32/headless.c does
not include git-compat-util.h, so it doesn't have our UNUSED macro.
Unlike those ones, this is not third-party code, so it would not be a
big deal to modify it.
However, I'm not sure if including git-compat-util.h would create other
headaches (and I don't even have a machine to test this on; I'm relying
on Windows CI to compile it at all). Given how trivial the file is, and
that the unused parameters are not interesting (they are just
boilerplate for the wWinMain() function), we can just use the same trick
as the previous commit and disable the warnings via pragma.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We carry some vendored 3rd-party code in compat/ that does not build
cleanly with -Wunused-parameters. We could mark these with UNUSED, but
there are two reasons not to:
1. This is code imported from elsewhere, so we'd prefer to avoid
modifying it in an invasive way that could create conflicts if we
tried to pull in a new version.
2. These files don't include git-compat-util.h at all, so we'd need to
factor out (or repeat) our UNUSED macro.
In theory we could modify the build process to invoke the compiler with
the extra warning disabled for these files, but there are tricky corner
cases there (e.g., for NO_REGEX we cannot assume that the compiler
understands -Wno-unused-parameter as an option, so we'd have to use our
detect-compiler script).
Instead, let's rely on the gcc diagnostic #pragma. This is horribly
unportable, of course, but it should do what we want. Compilers which
don't understand this particular pragma should ignore it (per the
standard), and compilers which do care about "-Wunused-parameter" will
hopefully respect it, even if they are not gcc (e.g., clang does).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This spot was originally marked in in 4695c3f3a9 (reftable: mark unused
parameters in virtual functions, 2024-08-17), but was copied in
5b539a5361 (t: move reftable/readwrite_test.c to the unit testing
framework, 2024-08-13).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d1ae15d68b (builtin/gc: refactor to read config into structure,
2024-08-16) added a new parameter to the maintenance_task virtual
functions, but most of them don't need to look at it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases, a user may be generating a patch for an old commit which
now has an out-of-date author or other identity. For example, consider a
team member who contributes to an internal fork of an upstream project,
but leaves before this change is submitted upstream.
In this case, the team members company address may no longer be valid,
and will thus bounce when sending email.
This can be manually avoided by editing the generated patch files, or by
carefully using --suppress-<cc|to> options. This requires a lot of
manual intervention and is easy to forget.
Git has support for mapping old email addresses and names to a canonical
name and address via the .mailmap file (and its associated mailmap.file,
mailmap.blob, and log.mailmap options).
Teach git send-email to enable mailmap support for all addresses. This
ensures that addresses point to the canonical real name and email
address.
Add the sendemail.mailmap configuration option and its associated
--mailmap (and --use-mailmap for compatibility with git log) options.
For now, the default behavior is to disable the mailmap in order to
avoid any surprises or breaking any existing setups.
These options support per-identity configuration via the
sendemail.identity configuration blocks. This enables identity-specific
configuration in cases where users may not want to enable support.
In addition, support send-email specific mailmap data via
sendemail.mailmap.file, sendemail.mailmap.blob and their
identity-specific variants.
The intention of these options is to enable mapping addresses which are
no longer valid to a current project or team maintainer. Such mappings
may change the actual person being referred to, and may not make sense
in a traditional mailmap file which is intended for updating canonical
name and address for the same individual.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git check-mailmap command reads the mailmap from either the default
.mailmap location and then from the mailmap.blob and mailmap.file
configurations.
A following change to git send-email will want to support new
configuration options based on the configured identity. The
identity-based configuration and options only make sense in the context
of git send-email.
Expose the read_mailmap_file and read_mailmap_blob functions from
mailmap.c. Teach git check-mailmap the --mailmap-file and
--mailmap-blob options which load the additional mailmap sources.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git check-mailmap splits each provided contact using split_ident_line.
This function requires that the contact either be of the form "Name
<user@host>" or of the form "<user@host>". In particular, if the mail
portion of the contact is not surrounded by angle brackets,
split_ident_line will reject it.
This results in git check-mailmap rejecting attempts to translate simple
email addresses:
$ git check-mailmap user@host
fatal: unable to parse contact: user@host
This limits the usability of check-mailmap as it requires placing angle
brackets around plain email addresses.
In particular, attempting to use git check-mailmap to support mapping
addresses in git send-email is not straight forward. The sanitization
and validation functions in git send-email strip angle brackets from
plain email addresses. It is not trivial to add brackets prior to
invoking git check-mailmap.
Instead, modify check_mailmap() to allow such strings as contacts. In
particular, treat any line which cannot be split by split_ident_line as
a simple email address.
No attempt is made to actually parse the address line, or validate that
it is actually an email address. Implementing such validation is not
trivial. Besides, we weren't validating the address between angle
brackets before anyways.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>