Commit Graph

12420 Commits (v2.48.0-rc0)

Author SHA1 Message Date
Patrick Steinhardt 1c299d03e5 refs: introduce "initial" transaction flag
There are two different ways to commit a transaction:

  - `ref_transaction_commit()` can be used to commit a regular
    transaction and is what almost every caller wants.

  - `initial_ref_transaction_commit()` can be used when it is known that
    the ref store that the transaction is committed for is empty and
    when there are no concurrent processes. This is used when cloning a
    new repository.

Implementing this via two separate functions has a couple of downsides.
First, every reference backend needs to implement a separate callback
even in the case where they don't special-case the initial transaction.
Second, backends are basically forced to reimplement the whole logic for
how to commit the transaction like the "files" backend does, even though
backends may wish to only tweak certain behaviour of a "normal" commit.
Third, it is awkward that callers must never prepare the transaction as
this is somewhat different than how a transaction typically works.

Refactor the code such that we instead mark initial transactions via a
separate flag when starting the transaction. This addresses all of the
mentioned painpoints, where the most important part is that it will
allow backends to have way more leeway in how exactly they want to
handle the initial transaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:15 +09:00
Patrick Steinhardt a0efef1446 refs: allow passing flags when setting up a transaction
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.

Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21 07:59:14 +09:00
Junio C Hamano 38e4df6615 Merge branch 'la/trailer-info'
Renaming a handful of variables and structure fields.

* la/trailer-info:
  trailer: spread usage of "trailer_block" language
2024-11-20 14:47:17 +09:00
Junio C Hamano 0c11ef1356 Merge branch 'jt/repack-local-promisor'
"git gc" discards any objects that are outside promisor packs that
are referred to by an object in a promisor pack, and we do not
refetch them from the promisor at runtime, resulting an unusable
repository.  Work it around by including these objects in the
referring promisor pack at the receiving end of the fetch.

* jt/repack-local-promisor:
  index-pack: repack local links into promisor packs
  t5300: move --window clamp test next to unclamped
  t0410: use from-scratch server
  t0410: make test description clearer
2024-11-20 14:47:16 +09:00
Junio C Hamano cc53ddf7f0 Merge branch 'db/submodule-fetch-with-remote-name-fix' into maint-2.47
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-11-20 14:43:00 +09:00
Junio C Hamano 76c1953395 Merge branch 'ps/maintenance-start-crash-fix' into maint-2.47
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-11-20 14:42:58 +09:00
Junio C Hamano f1a50f12b9 Merge branch 'jk/fsmonitor-event-listener-race-fix' into maint-2.47
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened.  This problem has been corrected.

* jk/fsmonitor-event-listener-race-fix:
  fsmonitor: initialize fs event listener before accepting clients
  simple-ipc: split async server initialization and running
2024-11-20 14:42:57 +09:00
Jonathan Tan 1f2be8bed6 index-pack: teach --promisor to forbid pack name
Currently,

 - Running "index-pack --promisor" outside a repo segfaults.
 - It may be confusing to a user that running "index-pack --promisor"
   within a repo may make changes to the repo's object DB, especially
   since the packs indexed by the index-pack invocation may not even be
   related to the repo.

As discussed in [1] and [2], teaching --promisor to forbid a packfile
name solves both these problems. This combination of arguments requires
a repo (since we are writing the resulting .pack and .idx to it) and it
is clear that the files are related to the repo.

Currently, Git uses "index-pack --promisor" only when fetching into
a repo, so it could be argued that we should teach "index-pack" a
new argument (say, "--fetching-mode") instead of tying --promisor to
a generic argument like the packfile name. However, this --promisor
feature could conceivably be used whenever we have a packfile that is
known to come from the promisor remote (whether obtained through Git's
fetch protocol or through other means) so not using a new argument seems
reasonable - one could envision a user-made script obtaining a packfile
and then running "index-pack --promisor --stdin", for example. In fact,
it might be possible to relax the restriction further (say, by also
allowing --promisor when indexing a packfile that is in the object DB),
but relaxing the restriction is backwards-compatible so we can revisit
that later.

One thing to watch out for is the possibility of a future Git feature
that indexes a pack in the context of a repo, but does not necessarily
write the resulting pack to it (and does not necessarily desire to
make any changes to the object DB). One such feature would be fetch
quarantine, which might need the repo context in order to detect
hash collisions, but would also need to ensure that the object DB
is undisturbed in case the fetch fails for whatever reason, even if
the reason occurs only after the indexing is complete. It may not be
obvious to the implementer of such a feature that "index-pack" could
sometimes write packs other than the indexed pack to the object DB,
but there are already other ways that "fetch" could write to the object
DB (in particular, packfile URIs and bundle URIs), so hopefully the
implementation of this future feature would already include a test that
the object DB be undisturbed.

This change requires the change to t5300 by 1f52cdfacb (index-pack:
document and test the --promisor option, 2022-03-09) to be undone.
(--promisor is already tested indirectly, so we don't need the explicit
test here any more.)

[1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 10:37:56 +09:00
Patrick Steinhardt 656ca9204a builtin/gc: provide hint when maintenance hits a stale schedule lock
When running scheduled maintenance via `git maintenance start`, we
acquire a lockfile to ensure that no other scheduled maintenance task is
running in the repository concurrently. If so, we do provide an error to
the user hinting that another process seems to be running in this repo.

There are two important cases why such a lockfile may exist:

  - An actual git-maintenance(1) process is still running in this
    repository.

  - An earlier process may have crashed or was interrupted part way
    through and has left a stale lockfile behind.

In c95547a394 (builtin/gc: fix crash when running `git maintenance
start`, 2024-10-10), we have fixed an issue where git-maintenance(1)
would crash with the "start" subcommand, and the underlying bug causes
the second scenario to trigger quite often now.

Most users don't know how to get out of that situation again though.
Ideally, we'd be removing the stale lock for our users automatically.
But in the context of repository maintenance this is rather risky, as it
can easily run for hours or even days. So finding a clear point where we
know that the old process has exited is basically impossible.

We have the same issue in other subsystems, e.g. when locking refs. Our
lockfile interfaces thus provide the `unable_to_lock_message()` function
for exactly this purpose: it provides a nice hint to the user that
explains what is going on and how to get out of that situation again by
manually removing the file.

Adapt git-maintenance(1) to print a similar hint. While we could use the
above function, we can provide a bit more context as we know exactly
what kind of process would create the lockfile.

Reported-by: Miguel Rincon Barahona <mrincon@gitlab.com>
Reported-by: Kev Kloss <kkloss@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-20 10:26:12 +09:00
Elijah Newren 5e904f1a4a fast-import: avoid making replace refs point to themselves
If someone replaces a commit with a modified version, then builds on
that commit, and then later decides to rewrite history in a format like

    git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import

and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the
replacement did, then at the end you'd get a replace ref that points to
itself.  For example:

    $ git show-ref | grep replace
    fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264

Git commands which pay attention to replace refs will die with an error
when a self-referencing replace ref is present:

    $ git log
    fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264

Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing.  Unless users specify
--quiet, warn them when we delete such a replace ref.

Two notes about this patch:
  * We are not ignoring the problematic update of the replace ref
    (turning it into a no-op), we are replacing the update with a delete.
    The logic here is that if the repository had a value for the replace
    ref before fast-import was run, and the replace ref was explicitly
    named in the fast-import stream, we don't want the replace ref to be
    left with a pre-fast-import value.
  * While loops with more than one element (e.g. refs/replace/A points
    to B, and refs/replace/B points to A) are possible, they seem much
    less plausible.  It is pretty easy to create a sequence of
    git-filter-repo commands that will trigger a self-referencing replace
    ref, but I do not know how to trigger a scenario with a cycle length
    greater than 1.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 09:39:33 +09:00
brian m. carlson e8b3bcf491 index-pack: rename struct thread_local
"thread_local" is a keyword in C23.  To make sure that our code compiles
on a wide variety of C versions, rename struct thread_local to "struct
thread_local_data" to avoid a conflict.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:42:08 +09:00
brian m. carlson 0ffb5a6bf1 Allow cloning from repositories owned by another user
Historically, Git has allowed users to clone from an untrusted
repository, and we have documented that this is safe to do so:

    `upload-pack` tries to avoid any dangerous configuration options or
    hooks from the repository it's serving, making it safe to clone an
    untrusted directory and run commands on the resulting clone.

However, this was broken by f4aa8c8bb1 ("fetch/clone: detect dubious
ownership of local repositories", 2024-04-10) in an attempt to make
things more secure.  That change resulted in a variety of problems when
cloning locally and over SSH, but it did not change the stated security
boundary.  Because the security boundary has not changed, it is safe to
adjust part of the code that patch introduced.

To do that and restore the previous functionality, adjust enter_repo to
take two flags instead of one.

The two bits are

 - ENTER_REPO_STRICT: callers that require exact paths (as opposed
   to allowing known suffixes like ".git", ".git/.git" to be
   omitted) can set this bit.  Corresponds to the "strict" parameter
   that the flags word replaces.

 - ENTER_REPO_ANY_OWNER_OK: callers that are willing to run without
   ownership check can set this bit.

The former is --strict-paths option of "git daemon".  The latter is
set only by upload-pack, which honors the claimed security boundary.

Note that local clones across ownership boundaries require --no-local so
that upload-pack is used.  Document this fact in the manual page and
provide an example.

This patch was based on one written by Junio C Hamano.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 11:05:06 +09:00
Taylor Blau e199290592 pack-objects: only perform verbatim reuse on the preferred pack
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects whole eword_t's at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.

This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.

To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.

If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:

 - include the object twice, assuming that the caller wants Y in the
   pack, or

 - include the object once, resulting in us packing more objects than
   necessary.

This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.

Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.

This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).

The only significant changes from the pre-ca0fd69e37 implementation are:

 - that we can no longer inspect words up to the end of
   reuse_packfile_bitmap->word_alloc, since we only want to look at
   words whose bits all correspond to objects in the given packfile, and

 - that we return early when given a reuse_packfile which is not
   preferred, making the call a noop.

In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-15 09:13:31 +09:00
Junio C Hamano 51ba601160 Merge branch 'en/shallow-exclude-takes-a-ref-fix'
The "--shallow-exclude=<ref>" option to various history transfer
commands takes a ref, not an arbitrary revision.

* en/shallow-exclude-takes-a-ref-fix:
  doc: correct misleading descriptions for --shallow-exclude
  upload-pack: fix ambiguous error message
2024-11-13 08:35:32 +09:00
Junio C Hamano 6890c99e38 Merge branch 'ps/leakfixes-part-9'
More leakfixes.

* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-13 08:35:31 +09:00
Simon Marchi 98e4015593 builtin/difftool: intialize some hashmap variables
When running a dir-diff command that produces no diff, variables
`wt_modified` and `tmp_modified` are used while uninitialized, causing:

    $ /home/smarchi/src/git/git-difftool --dir-diff master
    free(): invalid pointer
    [1]    334004 IOT instruction (core dumped)  /home/smarchi/src/git/git-difftool --dir-diff master
    $ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
    ...
    Invalid free() / delete / delete[] / realloc()
       at 0x48478EF: free (vg_replace_malloc.c:989)
       by 0x422CAC: hashmap_clear_ (hashmap.c:208)
       by 0x283830: run_dir_diff (difftool.c:667)
       by 0x284103: cmd_difftool (difftool.c:801)
       by 0x238E0F: run_builtin (git.c:484)
       by 0x2392B9: handle_builtin (git.c:750)
       by 0x2399BC: cmd_main (git.c:921)
       by 0x356FEF: main (common-main.c:64)
     Address 0x1ffefff180 is on thread 1's stack
     in frame #2, created by run_dir_diff (difftool.c:358)
    ...

If taking any `goto finish` path before these variables are initialized,
`hashmap_clear_and_free()` operates on uninitialized data, sometimes
causing a crash.

This regression was introduced in 7f795a1715 (builtin/difftool: plug
several trivial memory leaks, 2024-09-26).

Fix it by initializing those variables with the `HASHMAP_INIT` macro.

Add a test comparing the main branch to itself, resulting in no diff.

Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13 08:11:19 +09:00
Jeff King fe17a25905 refspec: store raw refspecs inside refspec_item
The refspec struct keeps two matched arrays: one for the refspec_item
structs and one for the original raw refspec strings. The main reason
for this is that there are other users of refspec_item that do not care
about the raw strings. But it does make managing the refspec struct
awkward, as we must keep the two arrays in sync. This has led to bugs in
the past (both leaks and double-frees).

Let's just store a copy of the raw refspec string directly in each
refspec_item struct. This simplifies the handling at a small cost:

  1. Direct callers of refspec_item_init() will now get an extra copy of
     the refspec string, even if they don't need it. This should be
     negligible, as the struct is already allocating two strings for the
     parsed src/dst values (and we tend to only do it sparingly anyway
     for things like the TAG_REFSPEC literal).

  2. Users of refspec_appendf() will now generate a temporary string,
     copy it, and then free the result (versus handing off ownership of
     the temporary string). We could get around this by having a "nodup"
     variant of refspec_item_init(), but it doesn't seem worth the extra
     complexity for something that is not remotely a hot code path.

Code which accesses refspec->raw now needs to look at refspec->item.raw.
Other callers which just use refspec_item directly can remain the same.
We'll free the allocated string in refspec_item_clear(), which they
should be calling anyway to free src/dst.

One subtle note: refspec_item_init() can return an error, in which case
we'll still have set its "raw" field. But that is also true of the "src"
and "dst" fields, so any caller which does not _clear() the failed item
is already potentially leaking. In practice most code just calls die()
on an error anyway, but you can see the exception in valid_fetch_refspec(),
which does correctly call _clear() even on error.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:48 +09:00
Jeff King d36af33081 refspec: drop separate raw_nr count
A refspec struct contains zero or more refspec_item structs, along with
matching "raw" strings. The items and raw strings are kept in separate
arrays, but those arrays will always have the same length (because we
write them only via refspec_append_nodup(), which grows both). This can
lead to bugs when manipulating the array, since the arrays and lengths
must be modified in lockstep. For example, the bug fixed in the previous
commit, which forgot to decrement raw_nr.

So let's get rid of "raw_nr" and have only "nr", making this kind of bug
impossible (and also making it clear that the two are always matched,
something that existing code already assumed but was not guaranteed by
the interface).

Even though we'd expect "alloc" and "raw_alloc" to likewise move in
lockstep, we still need to keep separate counts there if we want to
continue to use ALLOC_GROW() for both.

Conceptually this would all be simpler if refspec_item just held onto
its own raw string, and we had a single array. But there are callers
which use refspec_item outside of "struct refspec" (and so don't hold on
to a matching "raw" string at all), which we'd possibly need to adjust.
So let's not worry about refactoring that for now, and just get rid of
the redundant count variable. That is the first step on the road to
combining them anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:48 +09:00
Jeff King b970509c59 fetch: adjust refspec->raw_nr when filtering prefetch refspecs
In filter_prefetch_refspecs(), we may remove one or more refspecs if
they point into refs/tags/. When we do, we remove the item from the
refspec->items array, shifting subsequent items down, and then decrement
the refspec->nr count.

We also remove the item from the refspec->raw array, but fail to
decrement refspec->raw_nr. This leaves us with a count that is too high,
and anybody looking at the "raw" array will erroneously see either:

  1. The removed entry, if there were no subsequent items to shift down.

  2. A duplicate of the final entry, as everything is shifted down but
     there was nothing to overwrite the final item.

The obvious culprit to run into this is calling refspec_clear(), which
will try to free the removed entry (case 1) or double-free the final
entry (case 2). But even though the bug has existed since the function
was added in 2e03115d0c (fetch: add --prefetch option, 2021-04-16), we
did not trigger it in the test suite. The --prefetch option is normally
only used with configured refspecs, and we never bother to call
refspec_clear() on those (they are stored as part of a struct remote,
which is held in a global variable).

But you could trigger case 2 manually like:

  git fetch --prefetch . refs/tags/foo refs/tags/bar

Ironically you couldn't trigger case 1, because the code accidentally
leaked the string in the raw array, and the two bugs (the leak and the
double-free) cancelled out. But when we fixed the leak in ea4780307c
(fetch: free "raw" string when shrinking refspec, 2024-09-24), it became
possible to trigger that, too, with a single item:

  git fetch --prefetch . refs/tags/foo

We can fix both cases by just correctly decrementing "raw_nr" when we
shrink the array. Even though we don't expect people to use --prefetch
with command-line refspecs, we'll add a test to make sure it behaves
well (like the test just before it, we're just confirming that the
filtered prefetch succeeds at all).

Reported-by: Eric Mills <ermills@epic.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 18:16:47 +09:00
Jonathan Tan c08589efdc index-pack: repack local links into promisor packs
Teach index-pack to, when processing the objects in a pack with
--promisor specified on the CLI, repack local objects (and the local
objects that they refer to, recursively) referenced by these objects
into promisor packs.

This prevents the situation in which, when fetching from a promisor
remote, we end up with promisor objects (newly fetched) referring
to non-promisor objects (locally created prior to the fetch). This
situation may arise if the client had previously pushed objects to the
remote, for example. One issue that arises in this situation is that,
if the non-promisor objects become inaccessible except through promisor
objects (for example, if the branch pointing to them has moved to
point to the promisor object that refers to them), then GC will garbage
collect them. There are other ways to solve this, but the simplest
seems to be to enforce the invariant that we don't have promisor objects
referring to non-promisor objects.

This repacking is done from index-pack to minimize the performance
impact. During a fetch, the only time most objects are fully inflated
in memory is when their object ID is computed, so we also scan the
objects (to see which objects they refer to) during this time.

Also to minimize the performance impact, an object is calculated to be
local if it's a loose object or present in a non-promisor pack. (If it's
also in a promisor pack or referred to by an object in a promisor pack,
it is technically already a promisor object. But a misidentification
of a promisor object as a non-promisor object is relatively benign
here - we will thus repack that promisor object into a promisor pack,
duplicating it in the object store, but there is no correctness issue,
just an issue of inefficiency.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-12 10:18:16 +09:00
Junio C Hamano 02a2d5706d Merge branch 'jk/left-right-bitmap'
When called with '--left-right' and '--use-bitmap-index', 'rev-list'
will produce output without any left/right markers, which has been
corrected.

* jk/left-right-bitmap:
  rev-list: skip bitmap traversal for --left-right
2024-11-08 12:56:28 +09:00
Jeff King b8150bfee1 describe: stop traversing when we run out of names
When trying to describe a commit, we'll traverse from the commit,
collecting candidate tags that point to its ancestors. But once we've
seen all of the tags in the repo, there's no point in traversing
further. There's nothing left to find!

For a default "git describe", this isn't usually a big problem. In a
large repo you'll probably have multiple tags, so we'll eventually find
10 candidates (the default for max_candidates) and stop there. And in a
small repo, it's quick to traverse to the root.

But you can imagine a large repo with few tags. Or, as we saw in a real
world case, explicitly limiting the set of matches like this (on
linux.git):

  git describe --match=v6.12-rc4 HEAD

which goes all the way to the root before realizing that no, there are
no other tags under consideration besides the one we fed via --match.
If we add in "--candidates=1" there, it's much faster (at least as of
the previous commit).

But we should be able to speed this up without the user asking for it.
After expanding all matching tags, we know the total number of names. We
could just stop the traversal there, but as hinted at above we already
have a mechanism for doing that: the max_candidate limit. So we can just
reduce that limit to match the number of possible candidates.

Our p6100 test shows this off:

  Test                                           HEAD^             HEAD
  ---------------------------------------------------------------------------------------
  6100.2: describe HEAD                          0.71(0.65+0.06)   0.72(0.68+0.04) +1.4%
  6100.3: describe HEAD with one max candidate   0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
  6100.4: describe HEAD with one tag             0.72(0.66+0.05)   0.01(0.00+0.00) -98.6%

Now we are fast automatically, just as if --candidates=1 were supplied
by the user.

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Helped-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-07 13:28:22 +09:00
Jeff King 7379046221 describe: stop digging for max_candidates+1
By default, describe considers only 10 candidate matches, and stops
traversing when we have enough. This makes things much faster in a large
repository, where collecting all candidates requires walking all the way
down to the root (or at least to the oldest tag). This goes all the way
back to 8713ab3079 (Improve git-describe performance by reducing
revision listing., 2007-01-13).

However, we don't stop immediately when we have enough candidates. We
keep traversing and only bail when we find one more candidate that we're
ignoring. Usually this is not too expensive, if the tags are sprinkled
evenly throughout history. But if you are unlucky, you might hit the max
candidate quickly, and then have a huge swath of history before finding
the next one.

Our p6100 test has exactly this unlucky case: with a max of "1", we find
a recent tag quickly and then have to go all the way to the root to find
the old tag that will be discarded.

A more interesting real-world case is:

  git describe --candidates=1 --match=v6.12-rc4 HEAD

in the linux.git repo. There we restrict the set of tags to a single
one, so there is no older candidate to find at all! But despite
--candidates=1, we keep traversing to the root only to find nothing.

So why do we keep traversing after hitting thet max? There are two
reasons I can see:

  1. In theory the extra information that there was another candidate
     could be useful, and we record it in the gave_up_on variable. But
     we only show this information with --debug.

  2. After finding the candidate, there's more processing we do in our
     loop. The most important of this is propagating the "within" flags
     to our parent commits, and putting them in the commit_list we'll
     use for finish_depth_computation().

     That function continues the traversal until we've counted all
     commits reachable from the starting point but not reachable from
     our best candidate tag (so essentially counting "$tag..$start", but
     avoiding re-walking over the bits we've seen).  If we break
     immediately without putting those commits into the list, our depth
     computation will be wrong (in the worst case we'll count all the
     way down to the root, not realizing those commits are included in
     our tag).

But we don't need to find a new candidate for (2). As soon as we finish
the loop iteration where we hit max_candidates, we can then quit on the
next iteration. This should produce the same output as the original code
(which could, after all, find a candidate on the very next commit
anyway) but ends the traversal with less pointless digging.

We still have to set "gave_up_on"; we've popped it off the list and it
has to go back. An alternative would be to re-order the loop so that it
never gets popped, but it's perhaps still useful to show in the --debug
output, so we need to know it anyway. We do have to adjust the --debug
output since it's now just a commit where we stopped traversing, and not
the max+1th candidate.

p6100 shows the speedup using linux.git:

  Test                                           HEAD^             HEAD
  ---------------------------------------------------------------------------------------
  6100.2: describe HEAD                          0.70(0.63+0.06)   0.71(0.66+0.04) +1.4%
  6100.3: describe HEAD with one max candidate   0.70(0.64+0.05)   0.01(0.00+0.00) -98.6%
  6100.4: describe HEAD with one tag             0.70(0.67+0.03)   0.70(0.63+0.06) +0.0%

Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Helped-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-07 13:28:22 +09:00
Junio C Hamano 2664f2a0cb Merge branch 'ps/leakfixes-part-9' into ps/leakfixes-part-10
* ps/leakfixes-part-9: (22 commits)
  list-objects-filter-options: work around reported leak on error
  builtin/merge: release output buffer after performing merge
  dir: fix leak when parsing "status.showUntrackedFiles"
  t/helper: fix leaking buffer in "dump-untracked-cache"
  t/helper: stop re-initialization of `the_repository`
  sparse-index: correctly free EWAH contents
  dir: release untracked cache data
  combine-diff: fix leaking lost lines
  builtin/tag: fix leaking key ID on failure to sign
  transport-helper: fix leaking import/export marks
  builtin/commit: fix leaking cleanup config
  trailer: fix leaking strbufs when formatting trailers
  trailer: fix leaking trailer values
  builtin/commit: fix leaking change data contents
  upload-pack: fix leaking URI protocols
  pretty: clear signature check
  diff-lib: fix leaking diffopts in `do_diff_cache()`
  revision: fix leaking bloom filters
  builtin/grep: fix leak with `--max-count=0`
  grep: fix leak in `grep_splice_or()`
  ...
2024-11-07 13:25:01 +09:00
Elijah Newren 00e10e0751 doc: correct misleading descriptions for --shallow-exclude
The documentation for the --shallow-exclude option to clone/fetch/etc.
claims that the option takes a revision, but it does not.  As per
upload-pack.c's process_deepen_not(), it passes the option to
expand_ref() and dies if it does not find exactly one ref matching the
name passed.  Further, this has always been the case ever since these
options were introduced by the commits merged in a460ea4a3c (Merge
branch 'nd/shallow-deepen', 2016-10-10).  Fix the documentation to
match the implementation.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:53:23 -08:00
Patrick Steinhardt ff67083ccd builtin/merge: release output buffer after performing merge
The `obuf` member of `struct merge_options` is used to buffer output in
some cases. In order to not discard its allocated memory we only release
its contents in `merge_finalize()` when we're not currently recursing
into a subtree.

This results in some situations where we seemingly do not release the
buffer reliably. We thus have calls to `strbuf_release()` for this
buffer scattered across the codebase. But we're missing one callsite in
git-merge(1), which causes a memory leak.

We should ideally refactor this interface so that callers don't have to
know about any such internals. But for now, paper over the issue by
adding one more `strbuf_release()` call.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:57 -08:00
Patrick Steinhardt d06e3ec858 builtin/tag: fix leaking key ID on failure to sign
We do not free the key ID when signing a tag fails. Do so by using
the common exit path.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:55 -08:00
Patrick Steinhardt 6ef9f77a15 builtin/commit: fix leaking cleanup config
The cleanup string set by the config is leaking when it is being
overridden by an option. Fix this by tracking these via two separate
variables such that we can free the old value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:54 -08:00
Patrick Steinhardt d34b5cbf02 builtin/commit: fix leaking change data contents
While we free the worktree change data, we never free its contents. Fix
this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:53 -08:00
Patrick Steinhardt 43fedde3df builtin/grep: fix leak with `--max-count=0`
When executing with `--max-count=0` we'll return early from git-grep(1)
without performing any cleanup, which causes memory leaks. Plug these.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:52 -08:00
Patrick Steinhardt 5f5dd8e297 builtin/ls-remote: plug leaking server options
The list of server options populated via `OPT_STRING_LIST()` is never
cleared, causing a memory leak. Plug it.

This leak is exposed by t5702, but plugging it alone does not make the
whole test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04 22:37:51 -08:00
Taylor Blau 1c5a712f26 Merge branch 'jk/dumb-http-finalize'
The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.

* jk/dumb-http-finalize:
  packfile: use oidread() instead of hashcpy() to fill object_id
  packfile: use object_id in find_pack_entry_one()
  packfile: convert find_sha1_pack() to use object_id
  http-walker: use object_id instead of bare hash
  packfile: warn people away from parse_packed_git()
  packfile: drop sha1_pack_index_name()
  packfile: drop sha1_pack_name()
  packfile: drop has_pack_index()
  dumb-http: store downloaded pack idx as tempfile
  t5550: count fetches in "previously-fetched .idx" test
  midx: avoid duplicate packed_git entries
2024-11-01 12:53:32 -04:00
Taylor Blau 20ab7fa3b6 Merge branch 'sa/notes-edit'
Teach 'git notes add' and 'git notes append' a new '-e' flag,
instructing them to open the note in $GIT_EDITOR before saving.

* sa/notes-edit:
  notes: teach the -e option to edit messages in editor
2024-11-01 12:53:25 -04:00
Taylor Blau 7b1f01f02e Merge branch 'ss/duplicate-typos'
Typofixes.

* ss/duplicate-typos:
  global: Fix duplicate word typos
2024-11-01 12:53:23 -04:00
Taylor Blau 787297b396 Merge branch 'rj/cygwin-exit'
Treat ECONNABORTED the same as ECONNRESET in 'git credential-cache' to
work around a possible Cygwin regression. This resolves a race condition
caused by changes in Cygwin's handling of socket closures, allowing the
client to exit cleanly when encountering ECONNABORTED.

* rj/cygwin-exit:
  credential-cache: treat ECONNABORTED like ECONNRESET
2024-11-01 12:53:19 -04:00
Taylor Blau 268fd2fe58 Merge branch 'ps/platform-compat-fixes'
Various platform compatibility fixes split out of the larger effort
to use Meson as the primary build tool.

* ps/platform-compat-fixes:
  t6006: fix prereq handling with `test_format ()`
  http: fix build error on FreeBSD
  builtin/credential-cache: fix missing parameter for stub function
  t7300: work around platform-specific behaviour with long paths on MinGW
  t5500, t5601: skip tests which exercise paths with '[::1]' on Cygwin
  t3404: work around platform-specific behaviour on macOS 10.15
  t1401: make invocation of tar(1) work with Win32-provided one
  t/lib-gpg: fix setup of GNUPGHOME in MinGW
  t/lib-gitweb: test against the build version of gitweb
  t/test-lib: wire up NO_ICONV prerequisite
  t/test-lib: fix quoting of TEST_RESULTS_SAN_FILE
2024-11-01 12:53:17 -04:00
Jeff King 16a186fede rev-list: skip bitmap traversal for --left-right
Running:

  git rev-list --left-right --use-bitmap-index one...two

will produce output without any left-right markers, since the bitmap
traversal returns only a single set of reachable commits. Instead we
should refuse to use bitmaps here and produce the correct output using a
traditional traversal.

This is probably not the only remaining option that misbehaves with
bitmaps, but it's particularly egregious in that it feels like it
_could_ work. Doing two separate traversals for the left/right sides and
then taking the symmetric set differences should yield the correct
answer, but our traversal code doesn't know how to do that.

It's not clear if naively doing two separate traversals would always be
a performance win. A traditional traversal only needs to walk down to
the merge base, but bitmaps always fill out the full reachability set.
So depending on your bitmap coverage, we could end up walking old bits
of history twice to fill out the same uninteresting bits on both sides.
We'd also of course end up with a very large --boundary set, if the user
asked for that.

So this might or might not be something worth implementing later. But
for now, let's make sure we don't produce the wrong answer if somebody
tries it.

The test covers this, but also the same thing with "--count" (which is
what I originally tried in a real-world case). Ironically the
try_bitmap_count() code already realizes that "--left-right" won't work
there. But that just causes us to fall back to the regular bitmap
traversal code, which itself doesn't handle counting (we produce a list
of objects rather than a count).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-11-01 11:02:27 -04:00
Jeff King 479ab76c9f packfile: use object_id in find_pack_entry_one()
The main function we use to search a pack index for an object is
find_pack_entry_one(). That function still takes a bare pointer to the
hash, despite the fact that its underlying bsearch_pack() function needs
an object_id struct. And so we end up making an extra copy of the hash
into the struct just to do a lookup.

As it turns out, all callers but one already have such an object_id. So
we can just take a pointer to that struct and use it directly. This
avoids the extra copy and provides a more type-safe interface.

The one exception is get_delta_base() in packfile.c, when we are chasing
a REF_DELTA from inside the pack (and thus we have a pointer directly to
the mmap'd pack memory, not a struct). We can just bump the hashcpy()
from inside find_pack_entry_one() to this one caller that needs it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King 4d99559147 packfile: convert find_sha1_pack() to use object_id
The find_sha1_pack() function has a few problems:

  - it's badly named, since it works with any object hash

  - it takes the hash as a bare pointer rather than an object_id struct

We can fix both of these easily, as all callers actually have a real
object_id anyway.

I also found the existence of this function somewhat confusing, as it is
about looking in an arbitrary set of linked packed_git structs. It's
good for things like dumb-http which are looking in downloaded remote
packs, and not our local packs. But despite the name, it is not a good
way to find the pack which contains a local object (it skips the use of
the midx, the pack mru list, and so on).

So let's also add an explanatory comment above the declaration that may
point people in the right direction.

I suspect the calls in fast-import.c, which use the packed_git list from
the repository struct, could actually just be using find_pack_entry().
But since we'd need to keep it anyway for dumb-http, I didn't dig
further there. If we eventually drop dumb-http support, then it might be
worth examining them to see if we can get rid of the function entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Jeff King 4390fea963 packfile: drop sha1_pack_index_name()
Like sha1_pack_name() that we dropped in the previous commit, this
function uses an error-prone static strbuf and the somewhat misleading
name "sha1".

The only caller left is in pack-redundant.c. While this command is
marked for potential removal in our BreakingChanges document, we still
have it for now. But it's simple enough to convert it to use its own
strbuf with the underlying odb_pack_name() function, letting us drop the
otherwise obsolete function.

Note that odb_pack_name() does its own strbuf_reset(), so it's safe to
use directly within a loop like this.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-25 17:35:46 -04:00
Taylor Blau 55d12c24d7 Merge branch 'wm/shortlog-hash'
Teaches 'shortlog' to explicitly use SHA-1 when operating outside of
a repository.

* wm/shortlog-hash:
  builtin/shortlog: explicitly set hash algo when there is no repo
2024-10-25 14:02:49 -04:00
Taylor Blau 0ab43ed95c Merge branch 'jc/a-commands-without-the-repo'
Commands that can also work outside Git have learned to take the
repository instance "repo" when we know we are in a repository, and
NULL when we are not, in a parameter.  The uses of the_repository
variable in a few of them have been removed using the new calling
convention.

* jc/a-commands-without-the-repo:
  archive: remove the_repository global variable
  annotate: remove usage of the_repository global
  git: pass in repo to builtin based on setup_git_directory_gently
2024-10-25 14:02:36 -04:00
Taylor Blau 6cbcc68ea7 Merge branch 'db/submodule-fetch-with-remote-name-fix'
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.

* db/submodule-fetch-with-remote-name-fix:
  submodule: correct remote name with fetch
2024-10-25 14:01:09 -04:00
Taylor Blau 8e08668322 Merge branch 'cw/worktree-relative'
An extra worktree attached to a repository points at each other to
allow finding the repository from the worktree and vice versa
possible.  Turn this linkage to relative paths.

* cw/worktree-relative:
  worktree: add test for path handling in linked worktrees
  worktree: link worktrees with relative paths
  worktree: refactor infer_backlink() to use *strbuf
  worktree: repair copied repository and linked worktrees
2024-10-22 14:40:39 -04:00
Sven Strickroth c32d4a8cfe global: Fix duplicate word typos
Used regex to find these typos:

    (?<!struct )(?<=\s)([a-z]{1,}) \1(?=\s)

Signed-off-by: Sven Strickroth <email@cs-ware.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 16:05:04 -04:00
Abraham Samuel Adekunle dab0b9e176 notes: teach the -e option to edit messages in editor
Notes can be added to a commit using:
	- "-m" to provide a message on the command line.
	- -C to copy a note from a blob object.
	- -F to read the note from a file.
When these options are used, Git does not open an editor,
it simply takes the content provided via these options and
attaches it to the commit as a note.

Improve flexibility to fine-tune the note before finalizing it
by allowing the messages to be prefilled in the editor and edited
after the messages have been provided through -[mF].

Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-21 15:52:48 -04:00
Ramsay Jones 468a7e41e8 credential-cache: treat ECONNABORTED like ECONNRESET
On Cygwin, t0301 fails because "git credential-cache exit" returns a
non-zero exit code. What's supposed to happen here is:

  1. The client (the "credential-cache" invocation above) connects to a
     previously-spawned credential-cache--daemon.

  2. The client sends an "exit" command to the daemon.

  3. The daemon unlinks the socket and then exits, closing the
     descriptor back to the client.

  4. The client sees EOF on the descriptor and exits successfully.

That works on most platforms, and even _used_ to work on Cygwin. But
that changed in Cygwin's ef95c03522 (Cygwin: select: Fix FD_CLOSE
handling, 2021-04-06). After that commit, the client sees a read error
with errno set to ECONNABORTED, and it reports the error and dies.

It's not entirely clear if this is a Cygwin bug. It seems that calling
fclose() on the filehandles pointing to the sockets is sufficient to
avoid this error return, even though exiting should in general look the
same from the client's perspective.

However, we can't just call fclose() here. It's important in step 3
above to unlink the socket before closing the descriptor to avoid the
race mentioned by 7d5e9c9849 (credential-cache--daemon: clarify "exit"
action semantics, 2016-03-18). The client will exit as soon as it sees
the descriptor close, and the daemon may or may not have actually
unlinked the socket by then. That makes test code like this:

  git credential exit &&
  test_path_is_missing .git-credential-cache

racy.

So we probably _could_ fix this by calling:

  delete_tempfile(&socket_file);
  fclose(in);
  fclose(out);

before we exit(). Or by replacing the exit() with a return up the stack,
in which case the fclose() happens as we unwind. But in that case we'd
still need to call delete_tempfile() here to avoid the race.

But simpler still is that we can notice that we already special-case
ECONNRESET on the client side, courtesy of 1f180e5eb9 (credential-cache:
interpret an ECONNRESET as an EOF, 2017-07-27). We can just do the same
thing here (I suspect that prior to the Cygwin commit that introduced
this problem, we were really just seeing ECONNRESET instead of
ECONNABORTED, so the "new" problem is just the switch of the errno
values).

There's loads more debugging in this thread:

  https://lore.kernel.org/git/9dc3e85f-a532-6cff-de11-1dfb2e4bc6b6@ramsayjones.plus.com/

but I've tried to summarize the useful bits in this commit message.

[jk: commit message]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-18 17:18:05 -04:00
Taylor Blau c1662a00b6 Merge branch 'ps/maintenance-start-crash-fix'
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.

* ps/maintenance-start-crash-fix:
  builtin/gc: fix crash when running `git maintenance start`
2024-10-18 13:56:26 -04:00
Taylor Blau 645cc7a2a7 Merge branch 'kh/checkout-ignore-other-docfix'
Doc updates.

* kh/checkout-ignore-other-docfix:
  checkout: refer to other-worktree branch, not ref
2024-10-18 13:56:24 -04:00
Wolfgang Müller b33001645e builtin/shortlog: explicitly set hash algo when there is no repo
Whilst git-shortlog(1) does not explicitly need any repository
information when run without reference to one, it still parses some of
its arguments with parse_revision_opt() which assumes that the hash
algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) we stopped setting up a default
hash algorithm and instead require commands to set it up explicitly.

This was done for most other commands like in ab274909d4 (builtin/diff:
explicitly set hash algo when there is no repo, 2024-05-07) but was
missed for builtin/shortlog, making git-shortlog(1) segfault outside of
a repository when given arguments like --author that trigger a call to
parse_revision_opt().

Fix this for now by explicitly setting the hash algorithm to SHA1. Also
add a regression test for the segfault.

Thanks-to: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Wolfgang Müller <wolf@oriole.systems>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-17 16:10:54 -04:00
Patrick Steinhardt 87ad2a9d56 builtin/credential-cache: fix missing parameter for stub function
When not compiling the credential cache we may use a stub function for
`cmd_credential_cache()`. With commit 9b1cb5070f (builtin: add a
repository parameter for builtin functions, 2024-09-13), we have added a
new parameter to all of those top-level `cmd_*()` functions, and did
indeed adapt the non-stubbed-out `cmd_credential_cache()`. But we didn't
adapt the stubbed-out variant, so the code does not compile.

Fix this by adding the missing parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-16 17:00:49 -04:00
Taylor Blau b43e23fa02 Merge branch 'jk/fsmonitor-event-listener-race-fix'
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened.  This problem has been corrected.

* jk/fsmonitor-event-listener-race-fix:
  fsmonitor: initialize fs event listener before accepting clients
  simple-ipc: split async server initialization and running
2024-10-15 16:56:43 -04:00
Taylor Blau fd98f659fd Merge branch 'xx/remote-server-option-config'
A new configuration variable remote.<name>.serverOption makes the
transport layer act as if the --serverOption=<value> option is
given from the command line.

* xx/remote-server-option-config:
  ls-remote: leakfix for not clearing server_options
  fetch: respect --server-option when fetching multiple remotes
  transport.c:🤝 make use of server options from remote
  remote: introduce remote.<name>.serverOption configuration
  transport: introduce parse_transport_option() method
2024-10-15 16:56:43 -04:00
Taylor Blau f004467b04 Merge branch 'jh/config-unset-doc-fix'
Docfix.

* jh/config-unset-doc-fix:
  git-config.1: remove value from positional args in unset usage
2024-10-15 16:56:43 -04:00
Linus Arver 3f0346d4dc trailer: spread usage of "trailer_block" language
Deprecate the "trailer_info" struct name and replace it with
"trailer_block". This is more readable, for two reasons:

  1. "trailer_info" on the surface sounds like it's about a single
     trailer when in reality it is a collection of one or more trailers,
     and

  2. the "*_block" suffix is more informative than "*_info", because it
     describes a block (or region) of contiguous text which has trailers
     in it, which has been parsed into the trailer_block structure.

Rename the

    size_t trailer_block_start, trailer_block_end;

members of trailer_info to just "start" and "end". Rename the "info"
pointer to "trailer_block" because it is more descriptive. Update
comments accordingly.

Signed-off-by: Linus Arver <linus@ucla.edu>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-14 12:33:02 -04:00
John Cai 528d3e4d53 archive: remove the_repository global variable
As part of the effort to get rid of global state due to the global
the_repository variable, replace the_repository with the repository
argument that gets passed down through the builtin function.

The repo might be NULL, but we should be safe in write_archive() because
it detects if we are outside of a repository and calls
setup_git_directory() which will error.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-11 09:37:18 -07:00
John Cai ebe8f4b6ec annotate: remove usage of the_repository global
As part of the effort to get rid of global state due to the_repository
variable, remove the the_repository with the repository argument that
gets passed down through the builtin function.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-11 09:37:18 -07:00
John Cai 5db948d413 git: pass in repo to builtin based on setup_git_directory_gently
The current code in run_builtin() passes in a repository to the builtin
based on whether cmd_struct's option flag has RUN_SETUP.

This is incorrect, however, since some builtins that only have
RUN_SETUP_GENTLY can potentially take a repository.
setup_git_directory_gently() tells us whether or not a command is being
run inside of a repository.

Use the output of setup_git_directory_gently() to help determine whether
or not there is a repository to pass to the builtin. If not, then we
just pass NULL.

As part of this patch, we need to modify add to check for a NULL repo
before calling repo_git_config(), since add -h can be run outside of a
repository.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-11 09:37:17 -07:00
Junio C Hamano 31bc4454de Merge branch 'ps/leakfixes-part-8'
More leakfixes.

* ps/leakfixes-part-8: (23 commits)
  builtin/send-pack: fix leaking list of push options
  remote: fix leaking push reports
  t/helper: fix leaks in proc-receive helper
  pack-write: fix return parameter of `write_rev_file_order()`
  revision: fix leaking saved parents
  revision: fix memory leaks when rewriting parents
  midx-write: fix leaking buffer
  pack-bitmap-write: fix leaking OID array
  pseudo-merge: fix leaking strmap keys
  pseudo-merge: fix various memory leaks
  line-log: fix several memory leaks
  diff: improve lifecycle management of diff queues
  builtin/revert: fix leaking `gpg_sign` and `strategy` config
  t/helper: fix leaking repository in partial-clone helper
  builtin/clone: fix leaking repo state when cloning with bundle URIs
  builtin/pack-redundant: fix various memory leaks
  builtin/stash: fix leaking `pathspec_from_file`
  submodule: fix leaking submodule entry list
  wt-status: fix leaking buffer with sparse directories
  shell: fix leaking strings
  ...
2024-10-10 14:22:29 -07:00
Kristoffer Haugsbakk b8139c8f4e checkout: refer to other-worktree branch, not ref
We can only check out commits or branches, not refs in general.  And the
problem here is if another worktree is using the branch that we want to
check out.

Let’s be more direct and just talk about branches instead of refs.

Also replace “be held” with “in use”.  Further, “in use” is not
restricted to a branch being checked out (e.g. the branch could be busy
on a rebase), hence generalize to “or otherwise in use” in the option
description.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10 13:09:13 -07:00
Patrick Steinhardt c95547a394 builtin/gc: fix crash when running `git maintenance start`
It was reported on the mailing list that running `git maintenance start`
immediately segfaults starting with b6c3f8e12c (builtin/maintenance: fix
leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is
trivial to reproduce up to a point where one is scratching their head
why we didn't catch this regression in our test suite.

The root cause of this error is `get_schedule_cmd()`, which does not
populate the `out` parameter in all cases anymore starting with the
mentioned commit. Callers do assume it to always be populated though and
will e.g. call `strvec_split()` on the returned value, which will of
course segfault when the variable is uninitialized.

So why didn't we catch this trivial regression? The reason is that our
tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable
via "t/test-lib.sh", which allows us to override the scheduler command
with a custom one so that we don't accidentally modify the developer's
system. But the faulty code where we don't set the `out` parameter will
only get hit in case that environment variable is _not_ set, which is
never the case when executing our tests.

Fix the regression by again unconditionally allocating the value in the
`out` parameter, if provided. Add a test that unsets the environment
variable to catch future regressions in this area.

Reported-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10 10:04:43 -07:00
Daniel Black 0c1a9987da submodule: correct remote name with fetch
The code fetches the submodules remote based on the superproject remote name
instead of the submodule remote name[1].

Instead of grabbing the default remote of the superproject repository, ask
the default remote of the submodule we are going to run 'git fetch' in.

1. https://lore.kernel.org/git/ZJR5SPDj4Wt_gmRO@pweza/

Signed-off-by: Daniel Black <daniel@mariadb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-09 10:48:08 -07:00
Josh Heinrichs f36b8cbaef git-config.1: remove value from positional args in unset usage
The synopsis for `git config unset` mentions two positional arguments:
`<name>` and `<value>`. While the first argument is correct, the second
is not. Users are expected to provide the value via `--value=<value>`.

Remove the positional argument. The `--value=<value>` option is already
documented correctly, so this is all we need to do to fix the
documentation.

Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 23:35:45 -07:00
Jeff King 51907f8fee fsmonitor: initialize fs event listener before accepting clients
There's a racy hang in fsmonitor on macOS that we sometimes see in CI.
When we serve a client, what's supposed to happen is:

  1. The client thread calls with_lock__wait_for_cookie() in which we
     create a cookie file and then wait for a pthread_cond event

  2. The filesystem event listener sees the cookie file creation, does
     some internal book-keeping, and then triggers the pthread_cond.

But there's a problem: we start the listener that accepts client threads
before we start the fs event thread. So it's possible for us to accept a
client which creates the cookie file and starts waiting before the fs
event thread is initialized, and we miss those filesystem events
entirely. That leaves the client thread hanging forever.

In CI, the symptom is that t9210 (which is testing scalar, which always
enables fsmonitor under the hood) may hang forever in "scalar clone". It
is waiting on "git fetch" which is waiting on the fsmonitor daemon.

The race happens more frequently under load, but you can trigger it
predictably with a sleep like this, which delays the start of the fs
event thread:

  --- a/compat/fsmonitor/fsm-listen-darwin.c
  +++ b/compat/fsmonitor/fsm-listen-darwin.c
  @@ -510,6 +510,7 @@ void fsm_listen__loop(struct fsmonitor_daemon_state *state)
          FSEventStreamSetDispatchQueue(data->stream, data->dq);
          data->stream_scheduled = 1;

  +       sleep(1);
          if (!FSEventStreamStart(data->stream)) {
                  error(_("Failed to start the FSEventStream"));
                  goto force_error_stop_without_loop;

One solution might be to reverse the order of initialization: start the
fs event thread before we start the thread listening for clients. But
the fsmonitor code explicitly does it in the opposite direction. The fs
event thread wants to refer to the ipc_server_data struct, so we need it
to be initialized first.

A further complication is that we need a signal from the fs event thread
that it is actually ready and listening. And those details happen within
backend-specific fsmonitor code, whereas the initialization is in the
shared code.

So instead, let's use the ipc_server init/start split added in the
previous commit. The generic fsmonitor code will init the ipc_server but
_not_ start it, leaving that to the backend specific code, which now
needs to call ipc_server_start_async() at the right time.

For macOS, that is right after we start the FSEventStream that you can
see in the diff above.

It's not clear to me if Windows suffers from the same problem (and we
simply don't trigger it in CI), or if it is immune. Regardless, the
obvious place to start accepting clients there is right after we've
established the ReadDirectoryChanges watch.

This makes the hangs go away in our macOS CI environment, even when
compiled with the sleep() above.

Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 12:03:56 -07:00
Jeff King 766fce69e9 simple-ipc: split async server initialization and running
To start an async ipc server, you call ipc_server_run_async(). That
initializes the ipc_server_data object, and starts all of the threads
running, which may immediately start serving clients.

This can create some awkward timing problems, though. In the fsmonitor
daemon (the sole user of the simple-ipc system), we want to create the
ipc server early in the process, which means we may start serving
clients before the rest of the daemon is fully initialized.

To solve this, let's break run_async() into two parts: an initialization
which allocates all data and spawns the threads (without letting them
run), and a start function which actually lets them begin work. Since we
have two simple-ipc implementations, we have to handle this twice:

  - in ipc-unix-socket.c, we have a central listener thread which hands
    connections off to worker threads using a work_available mutex. We
    can hold that mutex after init, and release it when we're ready to
    start.

    We do need an extra "started" flag so that we know whether the main
    thread is holding the mutex or not (e.g., if we prematurely stop the
    server, we want to make sure all of the worker threads are released
    to hear about the shutdown).

  - in ipc-win32.c, we don't have a central mutex. So we'll introduce a
    new startup_barrier mutex, which we'll similarly hold until we're
    ready to let the threads proceed.

    We again need a "started" flag here to make sure that we release the
    barrier mutex when shutting down, so that the sub-threads can
    proceed to the finish.

I've renamed the run_async() function to init_async() to make sure we
catch all callers, since they'll now need to call the matching
start_async().

We could leave run_async() as a wrapper that does both, but there's not
much point. There are only two callers, one of which is fsmonitor, which
will want to actually do work between the two calls. And the other is
just a test-tool wrapper.

For now I've added the start_async() calls in fsmonitor where they would
otherwise have happened, so there should be no behavior change with this
patch.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 12:03:56 -07:00
Caleb White 717af916cd worktree: link worktrees with relative paths
Git currently stores absolute paths to both the main repository and
linked worktrees. However, this causes problems when moving repositories
or working in containerized environments where absolute paths differ
between systems. The worktree links break, and users are required to
manually execute `worktree repair` to repair them, leading to workflow
disruptions. Additionally, mapping repositories inside of containerized
environments renders the repository unusable inside the containers, and
this is not repairable as repairing the worktrees inside the containers
will result in them being broken outside the containers.

To address this, this patch makes Git always write relative paths when
linking worktrees. Relative paths increase the resilience of the
worktree links across various systems and environments, particularly
when the worktrees are self-contained inside the main repository (such
as when using a bare repository with worktrees). This improves
portability, workflow efficiency, and reduces overall breakages.

Although Git now writes relative paths, existing repositories with
absolute paths are still supported. There are no breaking changes
to workflows based on absolute paths, ensuring backward compatibility.

At a low level, the changes involve modifying functions in `worktree.c`
and `builtin/worktree.c` to use `relative_path()` when writing the
worktree’s `.git` file and the main repository’s `gitdir` reference.
Instead of hardcoding absolute paths, Git now computes the relative path
between the worktree and the repository, ensuring that these links are
portable. Locations where these respective file are read have also been
updated to properly handle both absolute and relative paths. Generally,
relative paths are always resolved into absolute paths before any
operations or comparisons are performed.

Additionally, `repair_worktrees_after_gitdir_move()` has been introduced
to address the case where both the `<worktree>/.git` and
`<repo>/worktrees/<id>/gitdir` links are broken after the gitdir is
moved (such as during a re-initialization). This function repairs both
sides of the worktree link using the old gitdir path to reestablish the
correct paths after a move.

The `worktree.path` struct member has also been updated to always store
the absolute path of a worktree. This ensures that worktree consumers
never have to worry about trying to resolve the absolute path themselves.

Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 11:49:22 -07:00
Xing Xin 0f490d270a ls-remote: leakfix for not clearing server_options
Ensure `server_options` is properly cleared using `string_list_clear()`
in `builtin/ls-remote.c:cmd_ls_remote`.

Although we cannot yet enable `TEST_PASSES_SANITIZE_LEAK=true` for
`t/t5702-protocol-v2.sh` due to other existing leaks, this fix ensures
that "git-ls-remote" related server options tests pass the sanitize leak
check:

  ...
  ok 12 - server-options are sent when using ls-remote
  ok 13 - server-options from configuration are used by ls-remote
  ...

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 10:22:10 -07:00
Xing Xin 148bc7bf4b fetch: respect --server-option when fetching multiple remotes
Fix an issue where server options specified via the command line
(`--server-option` or `-o`) were not sent when fetching from multiple
remotes using Git protocol v2.

To reproduce the issue with a repository containing multiple remotes:

  GIT_TRACE_PACKET=1 git -c protocol.version=2 fetch --server-option=demo --all

Observe that no server options are sent to any remote.

The root cause was identified in `builtin/fetch.c:fetch_multiple`, which
is invoked when fetching from more than one remote. This function forks
a `git-fetch` subprocess for each remote but did not include the
specified server options in the subprocess arguments.

This commit ensures that command-line specified server options are
properly passed to each subprocess. Relevant tests have been added.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 10:22:09 -07:00
Xing Xin 06708ce180 transport: introduce parse_transport_option() method
Add the `parse_transport_option()` method to parse the `push.pushOption`
configuration. This method will also be used in the next commit to
handle the new `remote.<name>.serverOption` configuration for setting
server options in Git protocol v2.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-08 10:22:06 -07:00
Junio C Hamano 2ab53b59ef Merge branch 'kn/osx-fsmonitor-with-submodules-fix'
macOS with fsmonitor daemon can hang forever when a submodule is
involved, which has been corrected.

* kn/osx-fsmonitor-with-submodules-fix:
  fsmonitor OSX: fix hangs for submodules
2024-10-04 14:21:43 -07:00
Koji Nakamaru 435a6900d2 fsmonitor OSX: fix hangs for submodules
fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
the value with trailing "/." (as repo_get_git_dir(the_repository) on
Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
IS_OUTSIDE_CONE.

In this case, fsevent_callback() doesn't update cookie_list so that
fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
not invoked.

As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
that with_lock__mark_cookies_seen() should unlock, the whole daemon
hangs.

Remove trailing "/." from state->path_gitdir_watch.buf for submodules
and add a corresponding test in t7527-builtin-fsmonitor.sh. The test is
disabled for MINGW because hangs treated with this patch occur only for
Darwin and there is no simple way to terminate the win32 fsmonitor
daemon that hangs.

Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-04 08:01:27 -07:00
Junio C Hamano 59ee4f7013 Merge branch 'jk/http-leakfixes'
Leakfixes.

* jk/http-leakfixes: (28 commits)
  http-push: clean up local_refs at exit
  http-push: clean up loose request when falling back to packed
  http-push: clean up objects list
  http-push: free xml_ctx.cdata after use
  http-push: free remote_ls_ctx.dentry_name
  http-push: free transfer_request strbuf
  http-push: free transfer_request dest field
  http-push: free curl header lists
  http-push: free repo->url string
  http-push: clear refspecs before exiting
  http-walker: free fake packed_git list
  remote-curl: free HEAD ref with free_one_ref()
  http: stop leaking buffer in http_get_info_packs()
  http: call git_inflate_end() when releasing http_object_request
  http: fix leak of http_object_request struct
  http: fix leak when redacting cookies from curl trace
  transport-helper: fix leak of dummy refs_list
  fetch-pack: clear pack lockfiles list
  fetch: free "raw" string when shrinking refspec
  transport-helper: fix strbuf leak in push_refs_with_push()
  ...
2024-10-02 07:46:26 -07:00
Junio C Hamano 365529e1ea Merge branch 'ps/leakfixes-part-7'
More leak-fixes.

* ps/leakfixes-part-7: (23 commits)
  diffcore-break: fix leaking filespecs when merging broken pairs
  revision: fix leaking parents when simplifying commits
  builtin/maintenance: fix leak in `get_schedule_cmd()`
  builtin/maintenance: fix leaking config string
  promisor-remote: fix leaking partial clone filter
  grep: fix leaking grep pattern
  submodule: fix leaking submodule ODB paths
  trace2: destroy context stored in thread-local storage
  builtin/difftool: plug several trivial memory leaks
  builtin/repack: fix leaking configuration
  diffcore-order: fix leaking buffer when parsing orderfiles
  parse-options: free previous value of `OPTION_FILENAME`
  diff: fix leaking orderfile option
  builtin/pull: fix leaking "ff" option
  dir: fix off by one errors for ignored and untracked entries
  builtin/submodule--helper: fix leaking remote ref on errors
  t/helper: fix leaking subrepo in nested submodule config helper
  builtin/submodule--helper: fix leaking error buffer
  builtin/submodule--helper: clear child process when not running it
  submodule: fix leaking update strategy
  ...
2024-10-02 07:46:26 -07:00
Junio C Hamano 9293a93186 Merge branch 'ds/sparse-checkout-expansion-advice'
When "git sparse-checkout disable" turns a sparse checkout into a
regular checkout, the index is fully expanded.  This totally
expected behaviour however had an "oops, we are expanding the
index" advice message, which has been corrected.

* ds/sparse-checkout-expansion-advice:
  sparse-checkout: disable advice in 'disable'
2024-10-02 07:46:25 -07:00
Junio C Hamano 4251403327 Merge branch 'ds/background-maintenance-with-credential'
Background tasks "git maintenance" runs may need to use credential
information when going over the network, but a credential helper
may work only in an interactive environment, and end up blocking a
scheduled task waiting for UI.  Credential helpers can now behave
differently when they are not running interactively.

* ds/background-maintenance-with-credential:
  scalar: configure maintenance during 'reconfigure'
  maintenance: add custom config to background jobs
  credential: add new interactive config option
2024-09-30 16:16:16 -07:00
Junio C Hamano 22baac8892 Merge branch 'pw/submodule-process-sigpipe'
When a subprocess to work in a submodule spawned by "git submodule"
fails with SIGPIPE, the parent Git process caught the death of it,
but gave a generic "failed to work in that submodule", which was
misleading.  We now behave as if the parent got SIGPIPE and die.

* pw/submodule-process-sigpipe:
  submodule status: propagate SIGPIPE
2024-09-30 16:16:15 -07:00
Patrick Steinhardt 66893a14d0 builtin/send-pack: fix leaking list of push options
The list of push options is leaking. Plug the leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:09 -07:00
Patrick Steinhardt a6c30623d7 remote: fix leaking push reports
The push reports that report failures to the user when pushing a
reference leak in several places. Plug these leaks by introducing a new
function `ref_push_report_free()` that frees the list of reports and
call it as required. While at it, fix a trivially leaking error string
in the vicinity.

These leaks get hit in t5411, but plugging them does not make the whole
test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:08 -07:00
Patrick Steinhardt 2f0ee051dd pack-write: fix return parameter of `write_rev_file_order()`
While the return parameter of `write_rev_file_order()` is a string
constant, the function may indeed return an allocated string when its
first parameter is a `NULL` pointer. This makes for a confusing calling
convention, where callers need to be aware of these intricate ownership
rules and cast away the constness to free the string in some cases.

Adapt the function and its caller `write_rev_file()` to always return an
allocated string and adapt callers to always free the return value.

Note that this requires us to also adapt `rename_tmp_packfile()`, which
compares the pointers to packfile data with each other. Now that the
path of the reverse index file gets allocated unconditionally the check
will always fail. This is fixed by using strcmp(3P) instead, which also
feels way less fragile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:08 -07:00
Patrick Steinhardt fdf972a9df builtin/revert: fix leaking `gpg_sign` and `strategy` config
We leak the config values when `gpg_sign` or `strategy` options are
being overridden via the command line. To fix this we need to free the
old value, which requires us to figure out whether the value was changed
via an option in the first place. The easy way to do this, which is to
initialize local variables with `NULL`, doesn't work because we cannot
tell the case where the user has passed e.g. `--no-gpg-sign`. Instead,
we use a sentinel value for both values that we can compare against to
check whether the user has passed the option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:05 -07:00
Patrick Steinhardt 6361dea6e8 builtin/clone: fix leaking repo state when cloning with bundle URIs
When cloning with bundle URIs we re-initialize `the_repository` after
having fetched the bundle. This causes a bunch of memory leaks though
because we do not release its previous state.

These leaks can be plugged by calling `repo_clear()` before we call
`repo_init()`. But this causes another issue because the remote that we
used is tied to the lifetime of the repository's remote state, which
would also get released. We thus have to make sure that it does not get
free'd under our feet.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:04 -07:00
Patrick Steinhardt a0f2a2f581 builtin/pack-redundant: fix various memory leaks
There are various different memory leaks in git-pack-redundant(1),
mostly caused by not even trying to free allocated memory. Fix them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:04 -07:00
Patrick Steinhardt 64fe1e4a8c builtin/stash: fix leaking `pathspec_from_file`
The `OPT_PATHSPEC_FROM_FILE()` option maps to `OPT_FILENAME()`, which we
know will always allocate memory when passed. We never free the memory
though, causing a memory leak. Plug it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:04 -07:00
Patrick Steinhardt 9a48fc1da2 builtin/annotate: fix leaking args vector
We're leaking the args vector in git-annotate(1) because we never clear
it. Fixing it isn't as easy as calling `strvec_clear()` though because
calling `cmd_blame()` will cause the underlying array to be modified.
Instead, we also need to pass a shallow copy of the argv array to the
function.

Do so to plug the memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:02 -07:00
Junio C Hamano a5031223cd Merge branch 'jk/http-leakfixes' into ps/leakfixes-part-8
* jk/http-leakfixes: (28 commits)
  http-push: clean up local_refs at exit
  http-push: clean up loose request when falling back to packed
  http-push: clean up objects list
  http-push: free xml_ctx.cdata after use
  http-push: free remote_ls_ctx.dentry_name
  http-push: free transfer_request strbuf
  http-push: free transfer_request dest field
  http-push: free curl header lists
  http-push: free repo->url string
  http-push: clear refspecs before exiting
  http-walker: free fake packed_git list
  remote-curl: free HEAD ref with free_one_ref()
  http: stop leaking buffer in http_get_info_packs()
  http: call git_inflate_end() when releasing http_object_request
  http: fix leak of http_object_request struct
  http: fix leak when redacting cookies from curl trace
  transport-helper: fix leak of dummy refs_list
  fetch-pack: clear pack lockfiles list
  fetch: free "raw" string when shrinking refspec
  transport-helper: fix strbuf leak in push_refs_with_push()
  ...
2024-09-30 11:22:21 -07:00
Junio C Hamano 674e46fdd5 Merge branch 'ps/leakfixes-part-7' into ps/leakfixes-part-8
* ps/leakfixes-part-7: (23 commits)
  diffcore-break: fix leaking filespecs when merging broken pairs
  revision: fix leaking parents when simplifying commits
  builtin/maintenance: fix leak in `get_schedule_cmd()`
  builtin/maintenance: fix leaking config string
  promisor-remote: fix leaking partial clone filter
  grep: fix leaking grep pattern
  submodule: fix leaking submodule ODB paths
  trace2: destroy context stored in thread-local storage
  builtin/difftool: plug several trivial memory leaks
  builtin/repack: fix leaking configuration
  diffcore-order: fix leaking buffer when parsing orderfiles
  parse-options: free previous value of `OPTION_FILENAME`
  diff: fix leaking orderfile option
  builtin/pull: fix leaking "ff" option
  dir: fix off by one errors for ignored and untracked entries
  builtin/submodule--helper: fix leaking remote ref on errors
  t/helper: fix leaking subrepo in nested submodule config helper
  builtin/submodule--helper: fix leaking error buffer
  builtin/submodule--helper: clear child process when not running it
  submodule: fix leaking update strategy
  ...
2024-09-30 11:22:10 -07:00
Patrick Steinhardt b6c3f8e12c builtin/maintenance: fix leak in `get_schedule_cmd()`
The `get_schedule_cmd()` function allows us to override the schedule
command with a specific test command such that we can verify the
underlying logic in a platform-independent way. Its memory management is
somewhat wild though, because it basically gives up and assigns an
allocated string to the string constant output pointer. While this part
is marked with `UNLEAK()` to mask this, we also leak the local string
lists.

Rework the function such that it has a separate out parameter. If set,
we will assign it the final allocated command. Plug the other memory
leaks and create a common exit path.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:37 -07:00
Patrick Steinhardt 84e9fc361d builtin/maintenance: fix leaking config string
When parsing the maintenance strategy from config we allocate a config
string, but do not free it after parsing it. Plug this leak by instead
using `git_config_get_string_tmp()`, which does not allocate any memory.

This leak is exposed by t7900, but plugging it alone does not make the
test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:37 -07:00
Patrick Steinhardt 7f795a1715 builtin/difftool: plug several trivial memory leaks
There are several leaking data structures in git-difftool(1). Plug them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:36 -07:00
Patrick Steinhardt dea4a9521e builtin/repack: fix leaking configuration
When repacking, we assemble git-pack-objects(1) arguments both for the
"normal" pack and for the cruft pack. This configuration gets populated
with a bunch of `OPT_PASSTHRU` options that we end up passing to the
child process. These options are allocated, but never free'd.

Create a new `pack_objects_args_release()` function that releases the
memory for us and call it for both sets of options.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:36 -07:00
Patrick Steinhardt 49af1b7722 builtin/pull: fix leaking "ff" option
The `opt_ff` field gets populated either via `OPT_PASSTHRU` via
`config_get_ff()` or when `--rebase` is passed. So we sometimes end up
overriding the value in `opt_ff` with another value, but we do not free
the old value, causing a memory leak.

Adapt the type of the variable to be `char *` and consistently assign
allocated strings to it such that we can easily free it when it is being
overridden.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:35 -07:00
Patrick Steinhardt 5bf922a4e9 builtin/submodule--helper: fix leaking remote ref on errors
When `update_submodule()` fails we return with `die_message()`, which
only causes us to print the same message as `die()` would without
actually causing the process to die. We don't free memory in that case
and thus leak memory.

Fix the leak by freeing the remote ref.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:35 -07:00
Patrick Steinhardt 2266bb4f6a builtin/submodule--helper: fix leaking error buffer
Fix leaking error buffer when `compute_alternate_path()` fails.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:34 -07:00
Patrick Steinhardt 8f786a8e9f builtin/submodule--helper: clear child process when not running it
In `runcommand_in_submodule_cb()` we may end up not executing the child
command when `argv` is empty. But we still populate the command with
environment variables and other things, which needs cleanup. This leads
to a memory leak because we do not call `finish_command()`.

Fix this by clearing the child process when we don't execute it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:34 -07:00
Patrick Steinhardt 2e492f2047 submodule: fix leaking update strategy
We're not freeing the submodule update strategy command. Provide a
helper function that does this for us and call it in
`update_data_release()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:34 -07:00
Patrick Steinhardt 0f26223b6d builtin/help: fix leaking `html_path` when reading config multiple times
The `html_path` variable gets populated via `git_help_config()`, which
puts an allocated string into it if its value has been configured. We do
not clear the old value though, which causes a memory leak in case the
config exists multiple times.

Plug this leak. The leak is exposed by t0012, but plugging it alone is
not sufficient to make the test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:34 -07:00
Patrick Steinhardt 02e36f9ffa builtin/help: fix dangling reference to `html_path`
In `get_html_page_path()` we may end up assigning the return value of
`system_path()` to the global `html_path` variable. But as we also
assign the returned value to `to_free`, we will deallocate its memory
upon returning from the function. Consequently, `html_path` will now
point to deallocated memory.

Fix this issue by instead assigning the value to a separate local
variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 08:25:34 -07:00
Junio C Hamano a344b47165 Merge branch 'ak/typofix-builtins'
Typofix.

* ak/typofix-builtins:
  builtin: fix typos
2024-09-25 18:24:50 -07:00
Junio C Hamano 52f57e94bd Merge branch 'ps/reftable-exclude'
The reftable backend learned to more efficiently handle exclude
patterns while enumerating the refs.

* ps/reftable-exclude:
  refs/reftable: wire up support for exclude patterns
  reftable/reader: make table iterator reseekable
  t/unit-tests: introduce reftable library
  Makefile: stop listing test library objects twice
  builtin/receive-pack: fix exclude patterns when announcing refs
  refs: properly apply exclude patterns to namespaced refs
2024-09-25 10:37:11 -07:00
Jeff King d121a7dd21 fetch-pack: clear pack lockfiles list
If the --lock-pack option is passed (which it typically is when
fetch-pack is used under the hood by smart-http), then we may end up
with entries in our pack_lockfiles string_list. We need to clear them
before returning to avoid a leak.

In git-fetch this isn't a problem, since the same cleanup happens via
transport_unlock_pack(). But the leak is detectable in t5551, which does
http fetches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-25 10:24:54 -07:00