Commit Graph

464 Commits (cf47560e945f3d9a0a42e89b9d6ef59fb085bbb5)

Author SHA1 Message Date
Junio C Hamano 56072ff038 Merge branch 'jk/fetch-check-graph-objects-fix'
Under a race against another process that is repacking the
repository, especially a partially cloned one, "git fetch" may
mistakenly think some objects we do have are missing, which has
been corrected.

* jk/fetch-check-graph-objects-fix:
  fetch-pack: re-scan when double-checking graph objects
2025-08-28 11:28:58 -07:00
Jeff King 8f32a5a6c0 fetch-pack: re-scan when double-checking graph objects
The fetch code tries to avoid asking the remote side for an object we
already have. It does this by traversing recent commits reachable from
our refs looking for matches. Commit 5d4cc78f72 (fetch-pack: die if in
commit graph but not obj db, 2024-11-05) introduced an extra check
there: if we think we have an object because it's in the commit graph,
we double-check that we actually have it in our object database with a
call to odb_has_object().

But that call does not pass any flags, and so the function won't call
reprepared_packed_git() if it does not find the object. That opens us up
to the usual race against some other process repacking the odb:

  1. We scan the list of packs in objects/pack but haven't yet opened them.

  2. Somebody else packs the object into a new pack (which we don't know
     about), and deletes the old pack it was in.

  3. Our odb_has_object() calls tries to open that old pack, but finds it
     is gone. We declare that we don't have the object.

And this causes us to erroneously complain and abort the fetch, thinking
our commit-graph and object database are out of sync. Instead, we should
pass HAS_OBJECT_RECHECK_PACKED, which will add a new step:

  4. We re-scan the pack directory again, find the new pack, and locate
     the object.

Often the fetch code tries to avoid these kinds of re-scans if it's
likely that we won't have the object. If the other side has told us
about object X and we want to know if we have it, we'll skip the re-scan
(to avoid spending a lot of effort when there are many such objects). We
can accept the racy false negative in that case because the worst case
is that we ask the other side to send us the object.

But this is not one of those cases. These are objects which are
accessible from _our_ refs, and which we already found in the commit
graph file. We should have them, and if we don't, we'll die()
immediately. So the performance impact is negligible, and getting the
right answer is important.

There's no test here because it's inherently racy. In fact, I had
trouble even developing a minimal test. The problem seen in the wild can
be produced like this:

  # Any git.git mirror which supports partial clones; I think this
  # should work with any repo that contains submodules, but note that
  # $obj below is specific to this repo
  url=https://github.com/git/git.git

  # This is a commit that is not at the tip of any branches (so after
  # we have it, we'll still have some commits to fetch).
  obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9

  git init
  git fetch --filter=tree:0 $url $obj:refs/heads/foo
  git checkout foo
  git commit-graph write --reachable
  git fetch $url

What happens here is that the initial fetch grabs that older commit (and
its ancestors) but no trees or blobs, and the subsequent checkout grabs
the necessary trees and blobs just for that commit. The final fetch
spawns a long sequence of child fetches due to fetch_submodules(), which
wants to check whether there have been any gitlink modifications which
should trigger a fetch of the related submodule (we'll leave aside the
irony that we did not even check out any submodules yet).

That series of fetches causes us to accumulate packs, which eventually
triggers background maintenance to run. That repacks all-into-one, and
the pack containing $obj goes away in favor of a new pack. And then the
fetch eventually fails with:

  fatal: You are attempting to fetch cf6f63ea6b, which is in the commit graph file but
not in the object database.

In the scenario above, the race becomes likely because of the long
series of quick fetches. But I _think_ the bug is independent of partial
clones entirely, and you could run into the same thing with a single
fetch, some other process running "git repack" simultaneously, and a bit
of bad luck. I haven't been able to reproduce, though. I'm not sure if
that's because there's some mis-analysis above, or if the race window is
just small enough that it's hard to trigger.

At any rate, re-scanning here seems like an obviously correct thing to
do with no downside, and it does fix the partial-clone case shown above.

Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25 10:30:03 -07:00
Junio C Hamano 971ba42dd4 Merge branch 'jc/string-list-split'
string_list_split*() family of functions have been extended to
simplify common use cases.

* jc/string-list-split:
  string-list: split-then-remove-empty can be done while splitting
  string-list: optionally omit empty string pieces in string_list_split*()
  diff: simplify parsing of diff.colormovedws
  string-list: optionally trim string pieces split by string_list_split*()
  string-list: unify string_list_split* functions
  string-list: align string_list_split() with its _in_place() counterpart
  string-list: report programming error with BUG
2025-08-21 13:46:59 -07:00
Junio C Hamano 540aaa607c Merge branch 'ps/config-wo-the-repository'
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.

* ps/config-wo-the-repository: (21 commits)
  config: fix sign comparison warnings
  config: move Git config parsing into "environment.c"
  config: remove unused `the_repository` wrappers
  config: drop `git_config_set_multivar()` wrapper
  config: drop `git_config_get_multivar_gently()` wrapper
  config: drop `git_config_set_multivar_in_file_gently()` wrapper
  config: drop `git_config_set_in_file_gently()` wrapper
  config: drop `git_config_set()` wrapper
  config: drop `git_config_set_gently()` wrapper
  config: drop `git_config_set_in_file()` wrapper
  config: drop `git_config_get_bool()` wrapper
  config: drop `git_config_get_ulong()` wrapper
  config: drop `git_config_get_int()` wrapper
  config: drop `git_config_get_string()` wrapper
  config: drop `git_config_get_string()` wrapper
  config: drop `git_config_get_string_multi()` wrapper
  config: drop `git_config_get_value()` wrapper
  config: drop `git_config_get_value()` wrapper
  config: drop `git_config_get()` wrapper
  config: drop `git_config_clear()` wrapper
  ...
2025-08-04 08:10:33 -07:00
Junio C Hamano 9f6dfe43c8 string-list: align string_list_split() with its _in_place() counterpart
The string_list_split_in_place() function was updated by 52acddf3
(string-list: multi-delimiter `string_list_split_in_place()`,
2023-04-24) to take more than one delimiter characters, hoping that
we can later use it to replace our uses of strtok().  We however did
not make a matching change to the string_list_split() function,
which is very similar.

Before giving both functions more features in future commits, allow
string_list_split() to also take more than one delimiter characters
to make them closer to each other.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-02 22:29:27 -07:00
Junio C Hamano 0f6e5037d4 Merge branch 'rs/pop-recent-commit-with-prio-queue'
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.

* rs/pop-recent-commit-with-prio-queue:
  commit: use prio_queue_replace() in pop_most_recent_commit()
  prio-queue: add prio_queue_replace()
  commit: convert pop_most_recent_commit() to prio_queue
2025-07-28 12:02:34 -07:00
Patrick Steinhardt 5d215a7b3e config: drop `git_config_get_bool()` wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config_get_bool()`. All
callsites are adjusted so that they use
`repo_config_get_bool(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:20 -07:00
Patrick Steinhardt 3fda14d86d config: drop `git_config_get_int()` wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config_get_int()`. All
callsites are adjusted so that they use
`repo_config_get_int(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:20 -07:00
Patrick Steinhardt 627d08cca7 config: drop `git_config_get_string()` wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:19 -07:00
Patrick Steinhardt 9ce196e86b config: drop `git_config()` wrapper
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.

Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23 08:15:18 -07:00
René Scharfe d6ec08788e commit: convert pop_most_recent_commit() to prio_queue
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop.  This can lead to quadratic
complexity if there are many merges.  Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.

Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:

Test                          v2.50.1           HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535'   2.48(2.47+0.00)   0.20(0.19+0.00) -91.9%

Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:

   $ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
   Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
     Time (mean ± σ):      1.071 s ±  0.004 s    [User: 1.052 s, System: 0.017 s]
     Range (min … max):    1.067 s …  1.078 s    10 runs

   Benchmark 2: ./git rev-parse :/^Initial.revision
     Time (mean ± σ):      1.079 s ±  0.003 s    [User: 1.060 s, System: 0.017 s]
     Range (min … max):    1.074 s …  1.083 s    10 runs

   Summary
     ./git_2.50.1 rev-parse :/^Initial.revision ran
       1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-22 07:28:23 -07:00
Junio C Hamano a636d395ff Merge branch 'bc/use-sha256-by-default-in-3.0'
Prepare to flip the default hash function to SHA-256.

* bc/use-sha256-by-default-in-3.0:
  Enable SHA-256 by default in breaking changes mode
  help: add a build option for default hash
  t5300: choose the built-in hash outside of a repo
  t4042: choose the built-in hash outside of a repo
  t1007: choose the built-in hash outside of a repo
  t: default to compile-time default hash if not set
  setup: use the default algorithm to initialize repo format
  Use legacy hash for legacy formats
  builtin: use default hash when outside a repository
  hash: add a constant for the legacy hash algorithm
  hash: add a constant for the default hash algorithm
2025-07-21 09:14:25 -07:00
Junio C Hamano 86c9c14eb9 Merge branch 'bc/use-sha256-by-default-in-3.0' into ps/config-wo-the-repository
* bc/use-sha256-by-default-in-3.0:
  Enable SHA-256 by default in breaking changes mode
  help: add a build option for default hash
  t5300: choose the built-in hash outside of a repo
  t4042: choose the built-in hash outside of a repo
  t1007: choose the built-in hash outside of a repo
  t: default to compile-time default hash if not set
  setup: use the default algorithm to initialize repo format
  Use legacy hash for legacy formats
  builtin: use default hash when outside a repository
  hash: add a constant for the legacy hash algorithm
  hash: add a constant for the default hash algorithm
2025-07-17 09:30:56 -07:00
brian m. carlson 667d251a04 Use legacy hash for legacy formats
We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the legacy hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:58:24 -07:00
Patrick Steinhardt fcf8e3e111 odb: rename `has_object()`
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.

Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:38 -07:00
Patrick Steinhardt e989dd96b8 odb: rename `oid_object_info()`
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.

Introduce compatibility wrappers so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:37 -07:00
Patrick Steinhardt 798c661ce3 odb: get rid of `the_repository` in `for_each()` functions
There are a couple of iterator-style functions that execute a callback
for each instance of a given set, all of which currently depend on
`the_repository`. Refactor them to instead take an object database as
parameter so that we can get rid of this dependency.

Rename the functions accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:36 -07:00
Patrick Steinhardt 8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Patrick Steinhardt 062b914c84 treewide: convert users of `repo_has_object_file()` to `has_object()`
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:

  - The new function has a short-and-sweet name.

  - More explicit defaults: `has_object()` doesn't fetch missing objects
    via promisor remotes, and neither does it reload packfiles if an
    object wasn't found by default. This ensures that it becomes
    immediately obvious when a simple object existence check may result
    in expensive actions.

Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.

Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:

  - `repo_has_object_file(...)` is equivalent to
    `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
    is equivalent to `has_object(..., 0)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
    is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.

  - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
    is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.

The replacements should be functionally equivalent.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 10:08:13 -07:00
Patrick Steinhardt 68cd492a3e object-store: merge "object-store-ll.h" and "object-store.h"
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.

Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:37 -07:00
Junio C Hamano b83a2f9006 Merge branch 'kn/pack-write-with-reduced-globals'
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03 10:23:34 -08:00
Karthik Nayak e2f6f76585 pack-write: pass repository to `index_pack_lockfile()`
The `index_pack_lockfile()` function uses the global `the_repository`
variable to access the repository. To avoid global variable usage, pass
the repository from the layers above.

Altough the layers above could have access to the repository internally,
simply pass in `the_repository`. This avoids any compatibility issues
and bubbles up global variable usage to upper layers which can be
eventually resolved.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
Justin Tobler ca7158076f fsck: reject misconfigured fsck.skipList
In Git, fsck operations can ignore known broken objects via the
`fsck.skipList` configuration. This option expects a path to a file with
the list of object names. When the configuration is specified without a
path, an error message is printed, but the command continues as if the
configuration was not set. Configuring `fsck.skipList` without a value
is a misconfiguration so config parsing should be more strict and reject
it.

Update `git_fsck_config()` to no longer ignore misconfiguration of
`fsck.skipList`. The same behavior is also present for
`fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration
parsers for these are updated to ensure the related operations remain
consistent.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07 09:22:25 -08:00
Junio C Hamano 4156b6a741 Merge branch 'ps/build-sign-compare'
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23 09:32:11 -08:00
Patrick Steinhardt 41f43b8243 global: mark code units that generate warnings with `-Wsign-compare`
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
Justin Tobler 05596e93c5 fetch-pack: split out fsck config parsing
When `fetch_pack_config()` is invoked, fetch-pack configuration is
parsed from the config. As part of this operation, fsck message severity
configuration is assigned to the `fsck_msg_types` global variable. This
is optionally used to configure the downstream git-index-pack(1) when
the `--strict` option is specified.

The same parsed fsck message severity configuration is also needed
outside of fetch-pack. Instead of exposing/relying on the existing
global state, split out the fsck config parsing logic into
`fetch_pack_fsck_config()` and expose it. In a subsequent commit, this
is used to provide fsck configuration when invoking `unbundle()`.

For `fetch_pack_fsck_config()` to discern between errors and unhandled
config variables, the return code when `git_config_path()` errors is
changed to a different value also indicating success. This frees up the
previous return code to now indicate the provided config variable
was unhandled. The behavior remains functionally the same.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28 12:07:58 +09:00
Jonathan Tan 5d4cc78f72 fetch-pack: die if in commit graph but not obj db
When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

Therefore, make it a fatal error when this occurs. (Note that we cannot
proceed to include this object in the list of objects to be fetched
without changing at least the fetch negotiation code: what would happen
is that the client will send "want X" and "have X" and when I tested
at $DAYJOB with a work server that uses JGit, the server reasonably
returned an empty packfile. And changing the fetch negotiation code to
only use the object DB when deciding what to report as "have" would be
an unnecessary slowdown, I think.)

This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, there is no infinite loop. Note that
although the repo corruption we discovered was caused by a bug in GC in
a partial clone, the behavior that this patch teaches Git to warn about
applies to any repo with commit graph enabled and with a missing commit,
whether it is a partial clone or not.

t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
fetch and the commit graph does not cause an infinite loop. This patch
changes the exit code in that situation, so that test had to be changed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05 18:57:22 -08:00
Jonathan Tan bf1feb9e53 Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
This reverts commit a6e65fb39c.

This revert simplifies the next patch in this patch set.

The commit message of that commit mentions that the new function "will
be used for the bundle-uri client in a subsequent commit", but it seems
that eventually it wasn't used.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05 18:57:22 -08:00
Junio C Hamano 3eb6679959 Merge branch 'ps/environ-wo-the-repository'
Code clean-up.

* ps/environ-wo-the-repository: (21 commits)
  environment: stop storing "core.notesRef" globally
  environment: stop storing "core.warnAmbiguousRefs" globally
  environment: stop storing "core.preferSymlinkRefs" globally
  environment: stop storing "core.logAllRefUpdates" globally
  refs: stop modifying global `log_all_ref_updates` variable
  branch: stop modifying `log_all_ref_updates` variable
  repo-settings: track defaults close to `struct repo_settings`
  repo-settings: split out declarations into a standalone header
  environment: guard state depending on a repository
  environment: reorder header to split out `the_repository`-free section
  environment: move `set_git_dir()` and related into setup layer
  environment: make `get_git_namespace()` self-contained
  environment: move object database functions into object layer
  config: make dependency on repo in `read_early_config()` explicit
  config: document `read_early_config()` and `read_very_early_config()`
  environment: make `get_git_work_tree()` accept a repository
  environment: make `get_graft_file()` accept a repository
  environment: make `get_index_file()` accept a repository
  environment: make `get_object_directory()` accept a repository
  environment: make `get_git_common_dir()` accept a repository
  ...
2024-09-23 10:35:05 -07:00
Junio C Hamano 16c0906e8c Merge branch 'ps/leakfixes-part-6'
More leakfixes.

* ps/leakfixes-part-6: (22 commits)
  builtin/repack: fix leaking keep-pack list
  merge-ort: fix two leaks when handling directory rename modifications
  match-trees: fix leaking prefixes in `shift_tree()`
  builtin/fmt-merge-msg: fix leaking buffers
  builtin/grep: fix leaking object context
  builtin/pack-objects: plug leaking list of keep-packs
  builtin/repack: fix leaking line buffer when packing promisors
  negotiator/skipping: fix leaking commit entries
  shallow: fix leaking members of `struct shallow_info`
  shallow: free grafts when unregistering them
  object: clear grafts when clearing parsed object pool
  gpg-interface: fix misdesigned signing key interfaces
  send-pack: fix leaking push cert nonce
  remote: fix leak in reachability check of a remote-tracking ref
  remote: fix leaking tracking refs
  builtin/submodule--helper: fix leaking refs on push-check
  submodule: fix leaking fetch task data
  upload-pack: fix leaking child process data on reachability checks
  builtin/push: fix leaking refspec query result
  send-pack: fix leaking common object IDs
  ...
2024-09-20 11:16:30 -07:00
Patrick Steinhardt a3673f4898 environment: make `get_object_directory()` accept a repository
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12 10:15:39 -07:00
Jeff King 1a60f2066a drop trailing newline from warning/error/die messages
Our error reporting routines append a trailing newline, and the strings
we pass to them should not include them (otherwise we get an extra blank
line after the message).

These cases were all found by looking at the results of:

  git grep -P '[^_](error|error_errno|warning|die|die_errno)\(.*\\n"[,)]' '*.c'

Note that we _do_ sometimes include a newline in the middle of such
messages, to create multiline output (hence our grep matching "," or ")"
after we see the newline, so we know we're at the end of the string).

It's possible that one or more of these cases could intentionally be
including a blank line at the end, but having looked at them all
manually, I think these are all just mistakes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05 09:07:12 -07:00
Patrick Steinhardt 63494913ec fetch-pack: fix memory leaks on fetch negotiation
We leak both the `nt_object_array` and `negotiator` structures in
`negotiate_using_fetch()`. Plug both of these leaks.

These leaks were exposed by t5516, but fixing them is not sufficient to
make the whole test suite leak free.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05 08:49:10 -07:00
John Cai e8207717f1 refs: add referent to each_ref_fn
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09 08:47:34 -07:00
Junio C Hamano 125e389470 Merge branch 'xx/bundie-uri-fixes'
When bundleURI interface fetches multiple bundles, Git failed to
take full advantage of all bundles and ended up slurping duplicated
objects.

* xx/bundie-uri-fixes:
  unbundle: extend object verification for fetches
  fetch-pack: expose fsckObjects configuration logic
  bundle-uri: verify oid before writing refs
2024-07-08 14:53:11 -07:00
Junio C Hamano 7b472da915 Merge branch 'ps/use-the-repository'
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.

* ps/use-the-repository:
  hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
  t/helper: remove dependency on `the_repository` in "proc-receive"
  t/helper: fix segfault in "oid-array" command without repository
  t/helper: use correct object hash in partial-clone helper
  compat/fsmonitor: fix socket path in networked SHA256 repos
  replace-object: use hash algorithm from passed-in repository
  protocol-caps: use hash algorithm from passed-in repository
  oidset: pass hash algorithm when parsing file
  http-fetch: don't crash when parsing packfile without a repo
  hash-ll: merge with "hash.h"
  refs: avoid include cycle with "repository.h"
  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
  hash: require hash algorithm in `empty_tree_oid_hex()`
  hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
  hash: make `is_null_oid()` independent of `the_repository`
  hash: convert `oidcmp()` and `oideq()` to compare whole hash
  global: ensure that object IDs are always padded
  hash: require hash algorithm in `oidread()` and `oidclr()`
  hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
  hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-07-02 09:59:00 -07:00
Junio C Hamano b781a3e08e Merge branch 'jk/fetch-pack-fsck-wo-lock-pack'
"git fetch-pack -k -k" without passing "--lock-pack" (which we
never do ourselves) did not work at all, which has been corrected.

* jk/fetch-pack-fsck-wo-lock-pack:
  fetch-pack: fix segfault when fscking without --lock-pack
2024-06-27 09:19:59 -07:00
Jeff King 96a6621d25 fetch-pack: fix segfault when fscking without --lock-pack
The fetch-pack internals have multiple options related to creating
".keep" lock-files for the received pack:

  - if args.lock_pack is set, then we tell index-pack to create a .keep
    file. In the fetch-pack plumbing command, this is triggered by
    passing "-k" twice.

  - if the caller passes in a pack_lockfiles string list, then we use it
    to record the path of the keep-file created by index-pack. We get
    that name by reading the stdout of index-pack. In the fetch-pack
    command, this is triggered by passing the (undocumented) --lock-pack
    option; without it, we pass in a NULL string list.

So it's possible to ask index-pack to create the lock-file (using "-k
-k") but not ask to record it (by avoiding "--lock-pack"). This worked
fine until 5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22), but now it causes a segfault.

Before that commit, if pack_lockfiles was NULL, we wouldn't bother
reading the output from index-pack at all. But since that commit,
index-pack may produce extra output if we asked it to fsck. So even if
nobody cares about the lockfile path, we still need to read it to skip
to the output we do care about.

We correctly check that we didn't get a NULL lockfile path (which can
happen if we did not ask it to create a .keep file at all), but we
missed the case where the lockfile path is not NULL (due to "-k -k") but
the pack_lockfiles string_list is NULL (because nobody passed
"--lock-pack"), and segfault trying to add to the NULL string-list.

We can fix this by skipping the append to the string list when either
the value or the list is NULL. In that case we must also free the
lockfile path to avoid leaking it when it's non-NULL.

Nobody noticed the bug for so long because the transport code used by
"git fetch" always passes in a pack_lockfiles pointer, and remote-curl
(the main user of the fetch-pack plumbing command) always passes
--lock-pack.

Reported-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:58:00 -07:00
Xing Xin d0cbc75680 fetch-pack: expose fsckObjects configuration logic
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches or fetches with _bundle-uri_ enabled.

This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, this function will also be
used to extend fsck support for bundle-involved fetches.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:30:07 -07:00
Patrick Steinhardt e7da938570 global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Junio C Hamano cf792653ad Merge branch 'ps/leakfixes'
Leakfixes.

* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name
2024-06-06 12:49:23 -07:00
Patrick Steinhardt 6073b3b5c3 config: clarify memory ownership in `git_config_pathname()`
The out parameter of `git_config_pathname()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-27 11:19:59 -07:00
Junio C Hamano d365a27bf7 Merge branch 'dg/fetch-pack-code-cleanup'
Code clean-up to remove an unused struct definition.

* dg/fetch-pack-code-cleanup:
  fetch-pack: remove unused 'struct loose_object_iter'
2024-05-23 11:04:28 -07:00
Dr. David Alan Gilbert fc0202b0e9 fetch-pack: remove unused 'struct loose_object_iter'
'struct loose_object_iter' in fetch-pack.c is unused since commit
97b2fa08 (fetch-pack: drop custom loose object cache, 2018-11-12).

Remove it.

Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 16:55:20 -07:00
Patrick Steinhardt 2e5c4758b7 cocci: apply rules to rewrite callers of "refs" interfaces
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 10:06:59 -07:00
Junio C Hamano 65973e4e99 Merge branch 'sd/negotiate-trace-fix'
Tracing fix.

* sd/negotiate-trace-fix:
  push: region_leave trace for negotiate_using_fetch
2024-01-30 13:34:13 -08:00
Junio C Hamano 492ee03f60 Merge branch 'en/header-cleanup'
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-01-08 14:05:15 -08:00
Sam Delmerico ee9895b0ff push: region_leave trace for negotiate_using_fetch
There were two region_enter events for negotiate_using_fetch instead of
one enter and one leave. This commit replaces the second region_enter
event with a region_leave.

Signed-off-by: Sam Delmerico <delmerico@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-03 15:32:44 -08:00
Elijah Newren eea0e59ffb treewide: remove unnecessary includes in source files
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:31 -08:00
Jeff King d49cb162fa fsck: handle NULL value when parsing message config
When parsing fsck.*, receive.fsck.*, or fetch.fsck.*, we don't check for
an implicit bool. So any of:

  [fsck]
  badTree
  [receive "fsck"]
  badTree
  [fetch "fsck"]
  badTree

will cause us to segfault. We can fix it with config_error_nonbool() in
the usual way, but we have to make a few more changes to get good error
messages. The problem is that all three spots do:

  if (skip_prefix(var, "fsck.", &var))

to match and parse the actual message id. But that means that "var" now
just says "badTree" instead of "receive.fsck.badTree", making the
resulting message confusing. We can fix that by storing the parsed
message id in its own separate variable.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 08:24:47 +09:00