Commit Graph

74087 Commits (f037e607a875790dcadbba3f7a8fc185e2877792)

Author SHA1 Message Date
Jeff King 40d817875d t5500: fix mistaken $SERVER reference in helper function
The end of t5500 contains two tests which use a single helper function,
fetch_filter_blob_limit_zero(). It takes a parameter to point to the
path of the server repository, which we store locally as $SERVER. The
first caller uses the relative path "server", while the second points
into the httpd document root.

Commit 07ef3c6604 (fetch test: use more robust test for filtered
objects, 2019-12-23) refactored some lines, but accidentally switched
"$SERVER" to "server" in one spot. That means the second caller is
looking at the server directory from the previous test rather than its
own.

This happens to work out because the "server" directory from the first
test is still hanging around, and the contents of the two are identical.
But it was clearly not the intended behavior, and is fragile to cleaning
up the leftovers from the first test.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 11:06:45 -07:00
Johannes Schindelin 3c295c87c2 mingw: drop bogus (and unneeded) declaration of `_pgmptr`
In 08809c09aa (mingw: add a helper function to attach GDB to the
current process, 2020-02-13), I added a declaration that was not needed.
Back then, that did not matter, but now that the declaration of that
symbol was changed in mingw-w64's headers, it causes the following
compile error:

      CC compat/mingw.o
  compat/mingw.c: In function 'open_in_gdb':
  compat/mingw.c:35:9: error: function declaration isn't a prototype [-Werror=strict-prototypes]
     35 |         extern char *_pgmptr;
        |         ^~~~~~
  In file included from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/mm_malloc.h:27,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/xmmintrin.h:34,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:31,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:32,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/winnt.h:1658,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/minwindef.h:163,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/windef.h:9,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/windows.h:69,
                   from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/winsock2.h:23,
                   from compat/../git-compat-util.h:215,
                   from compat/mingw.c:1:
  compat/mingw.c:35:22: error: '__p__pgmptr' redeclared without dllimport attribute: previous dllimport ignored [-Werror=attributes]
     35 |         extern char *_pgmptr;
        |                      ^~~~~~~

Let's just drop the declaration and get rid of this compile error.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:59:42 -07:00
Jeff King 96a6621d25 fetch-pack: fix segfault when fscking without --lock-pack
The fetch-pack internals have multiple options related to creating
".keep" lock-files for the received pack:

  - if args.lock_pack is set, then we tell index-pack to create a .keep
    file. In the fetch-pack plumbing command, this is triggered by
    passing "-k" twice.

  - if the caller passes in a pack_lockfiles string list, then we use it
    to record the path of the keep-file created by index-pack. We get
    that name by reading the stdout of index-pack. In the fetch-pack
    command, this is triggered by passing the (undocumented) --lock-pack
    option; without it, we pass in a NULL string list.

So it's possible to ask index-pack to create the lock-file (using "-k
-k") but not ask to record it (by avoiding "--lock-pack"). This worked
fine until 5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22), but now it causes a segfault.

Before that commit, if pack_lockfiles was NULL, we wouldn't bother
reading the output from index-pack at all. But since that commit,
index-pack may produce extra output if we asked it to fsck. So even if
nobody cares about the lockfile path, we still need to read it to skip
to the output we do care about.

We correctly check that we didn't get a NULL lockfile path (which can
happen if we did not ask it to create a .keep file at all), but we
missed the case where the lockfile path is not NULL (due to "-k -k") but
the pack_lockfiles string_list is NULL (because nobody passed
"--lock-pack"), so we segfault trying to add to the NULL string-list.

We can fix this by skipping the append to the string list when either
the value or the list is NULL. In that case we must also free the
lockfile path to avoid leaking it when it's non-NULL.
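
For illustration, a minimal sketch of that guard (the local names
`pack_lockfiles` and `lockfile` stand in for the fetch-pack internals
and are not the exact code):

    if (pack_lockfiles && lockfile)
            /* record the .keep path only when a list was passed in */
            string_list_append_nodup(pack_lockfiles, lockfile);
    else
            /* nobody wants the path; free it instead of leaking it */
            free(lockfile);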

Nobody noticed the bug for so long because the transport code used by
"git fetch" always passes in a pack_lockfiles pointer, and remote-curl
(the main user of the fetch-pack plumbing command) always passes
--lock-pack.

Reported-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:58:00 -07:00
Elijah Newren f19b916535 merge-ort: convert more error() cases to path_msg()
merge_submodule() stores errors using path_msg(), whereas other call
sites make use of the error() function.  This is inconsistent, and
moving towards path_msg() seems more friendly for libification efforts
since it will allow the caller to determine whether the error messages
need to be printed.

Note that this deferred handling of error messages changes the error
message in a recursive merge from
  error: failed to execute internal merge
to
  From inner merge:  error: failed to execute internal merge
which provides a little more information about the error which may be
useful.  Since the recursive merge strategy still only shows the older
error, we had to adjust the new testcase introduced a few commits ago to
just search for the older message somewhere in the output.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:25 -07:00
Elijah Newren 14949d91b6 merge-ort: upon merge abort, only show messages causing the abort
When something goes wrong enough that we need to abort early and not
even attempt merging the remaining files, it probably does not make
sense to report conflicts messages for the subset of files we processed
before hitting the fatal error.  Instead, only show the messages
associated with paths where we hit the fatal error.  Also, print these
messages to stderr rather than stdout.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:25 -07:00
Elijah Newren c55c3f20b1 merge-ort: loosen commented requirements
The comment above type_short_descriptions claimed that the order had to
match what was found in the conflict_info_and_types enum.  Since
type_short_descriptions uses designated initializers, the order should
not actually matter; I am guessing that positional initializers may have
been under consideration when that comment was added, but the comment
was not updated when designated initializers were chosen.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:25 -07:00
Elijah Newren 5fadf1f933 merge-ort: clearer propagation of failure-to-function from merge_submodule
The 'clean' member variable is somewhat of a tri-state (1 = clean, 0 =
conflicted, -1 = failure-to-determine), but we often like to think of
it as binary (ignoring the possibility of a negative value) and use
constructs like '!clean' to reflect this.  However, these constructs
can make codepaths more difficult to understand, unless we handle the
negative case early and return pre-emptively; do that in
handle_content_merge() to make the code a bit easier to read.
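
A rough sketch of the shape of that early return (parameter names are
illustrative, not the actual code):

    int clean = merge_submodule(opt, path, o, a, b, &merged_oid);
    if (clean < 0)
            return clean; /* failure-to-determine: bail out early */
    /* from here on, 'clean' is effectively boolean: 1 or 0 */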

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:24 -07:00
Elijah Newren 9ed8e17d8a merge-ort: fix type of local 'clean' var in handle_content_merge()
handle_content_merge() returns an int.  Every caller of
handle_content_merge() expects an int.  However, we declare a local
variable 'clean' that we use for the return value to be unsigned.  To
make matters worse, we also assign 'clean' the return value of
merge_submodule() in one codepath, which is defined to return an int.
It seems that the only reason to have 'clean' be unsigned was to allow a
cutesy bit manipulation operation to be well-defined.  Fix the type of
the 'clean' local in handle_content_merge().

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:24 -07:00
Elijah Newren 0b4f726cde merge-ort: maintain expected invariant for priv member
The calling convention for the merge machinery is
   One call to          init_merge_options()
   One or more calls to merge_incore_[non]recursive()
   One call to          merge_finalize()
      (possibly indirectly via merge_switch_to_result())
Both merge_switch_to_result() and merge_finalize() expect
   opt->priv == NULL && result->priv != NULL
which is supposed to be set up by our move_opt_priv_to_result_priv()
function.  However, two codepaths dealing with error cases did not
execute this necessary logic, which could result in assertion failures
(or, if assertions were compiled out, could result in segfaults).  Fix
the oversight and add a test that would have caught one of these
problems.

While at it, also tighten an existing test for a non-recursive merge
to verify that it fails with appropriate status.  Most merge tests in
the testsuite check either for success or conflicts; those testing for
neither are rare and it is good to ensure they support the invariant
assumed by builtin/merge.c in this comment:
    /*
     * The backend exits with 1 when conflicts are
     * left to be resolved, with 2 when it does not
     * handle the given merge at all.
     */
So, explicitly check for the exit status of 2 in these cases.

Reported-by: Matt Cree <matt.cree@gearset.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:24 -07:00
Elijah Newren e79bdb426c merge-ort: extract handling of priv member into reusable function
In preparation for a subsequent commit which will ensure we do not
forget to maintain our invariants for the priv member in error
codepaths, extract the necessary functionality out into a separate
function.  This change is cosmetic at this point, and introduces no
changes beyond an extra assertion sanity check.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:35:24 -07:00
Xing Xin 63d903ff52 unbundle: extend object verification for fetches
The existing fetch.fsckObjects and transfer.fsckObjects configurations
were not fully applied to bundle-involved fetches, including direct
bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
object verification support for unbundle.

This commit extends object verification support in `bundle.c:unbundle`
by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
this option is enabled, we append the `--fsck-objects` flag to
`git-index-pack`.
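
Conceptually, the new option boils down to a fragment like the
following inside `bundle.c:unbundle` (the surrounding setup of the
index-pack child process `ip` is omitted and simplified):

    if (flags & VERIFY_BUNDLE_FSCK)
            strvec_push(&ip.args, "--fsck-objects");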

The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
to enable this option for `bundle.c:unbundle`, specifically in:

- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.

This addition ensures a consistent logic for object verification during
fetches. Tests have been added to confirm functionality in the scenarios
mentioned above.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:30:08 -07:00
Xing Xin d0cbc75680 fetch-pack: expose fsckObjects configuration logic
Currently, we can use "transfer.fsckObjects" and the more specific
"fetch.fsckObjects" to control checks for broken objects in received
packs during fetches. However, these configurations were only
acknowledged by `fetch-pack.c:get_pack` and did not take effect in
direct bundle fetches or fetches with _bundle-uri_ enabled.

This commit exposes the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. This
new function is used to replace the assignment for `fsck_objects` in
`fetch-pack.c:get_pack`. In the next commit, this function will also be
used to extend fsck support for bundle-involved fetches.
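
The precedence it encodes is roughly the following sketch, assuming
the existing file-local configuration variables where -1 means
"unset":

    int fetch_pack_fsck_objects(void)
    {
            if (fetch_fsck_objects >= 0)
                    return fetch_fsck_objects;    /* fetch.fsckObjects */
            if (transfer_fsck_objects >= 0)
                    return transfer_fsck_objects; /* transfer.fsckObjects */
            return 0;                             /* default: off */
    }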

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:30:07 -07:00
Xing Xin 3079026fc1 bundle-uri: verify oid before writing refs
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.

To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:

  git bundle create base.bundle main

Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":

  git bundle create incr.bundle A..main

Now, generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.

So why is "refs/bundles/main" not discovered? After some digging, I
found that:

1. Bundles in bundle list are downloaded to local files via
   `bundle-uri.c:download_bundle_list` or via
   `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
   heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
   is called by `bundle-uri.c:unbundle_all_bundles` or called within
   `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
   heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
   inside `bundle-uri.c:unbundle_from_file` by calling
   `bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
   `bundle.c:verify_bundle` to ensure the repository contains all the
   prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
   invokes `packfile.c:prepare_packed_git` or
   `packfile.c:reprepare_packed_git`, filling
   `raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
   `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
   bundle refs which can target arbitrary objects are written to the
   repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
   `fetch-pack.c:mark_complete_and_common_ref` and
   `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
   find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
   prevents `packfile.c:reprepare_packed_git` from being called,
   resulting in failures to parse OIDs that reside only in the latest
   bundle.

In the example above, when unbundling "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.

Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing
bundle refs. When `refs.c:refs_update_ref` is called to write the
corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
`transaction_prepare` of the refs storage backend. For files backend, it
is `files-backend.c:files_transaction_prepare`, and for reftable
backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.
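
The resulting ref write is essentially the same `refs_update_ref()`
call as before, only with a zero flags argument; a hedged sketch with
illustrative identifier names:

    refs_update_ref(get_main_ref_store(repo), "fetched bundle",
                    bundle_refname, &oid, NULL,
                    0 /* no REF_SKIP_OID_VERIFICATION */,
                    UPDATE_REFS_MSG_ON_ERR);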

A set of negotiation-related tests for cloning with bundle-uri has been
included to demonstrate that downloaded bundles are utilized to
accelerate fetching.

Additionally, another test has been added to show that bundles with
incorrect headers, where refs point to non-existent objects, do not
result in any bundle refs being created in the repository.

Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:30:07 -07:00
Eric Wong 75daa42ddf t1006: ensure cat-file info isn't buffered by default
While working on buffering changes to `git cat-file' in a
separate patch, I inadvertently made the output of --batch-check
and the `info' command of --batch-command buffered as if
opt->buffer_output is turned on by default.

Buffering by default breaks some 3rd-party Perl scripts using
cat-file, but this breakage was not detected anywhere in our
test suite.  Add a small Perl snippet to test this problem since
(AFAIK) other equivalent ways to test this behavior from Bourne
shell and/or awk would require racy sleeps, non-portable FIFOs
or tedious C code.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:28:46 -07:00
Peter Krefting 47d2691ae9 git-gui: sv.po: Update Swedish translation (576t0f0u)
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-06-20 19:27:35 +02:00
Kyle Zhao 2e5a636593 merge: avoid writing merge state when unable to write index
Writing the merge state after the index write fails is meaningless and
could potentially cause Git to lose changes.

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-18 08:13:35 -07:00
Junio C Hamano 66ac6e4bcd The fourteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17 15:55:59 -07:00
Junio C Hamano 4216329457 Merge branch 'ps/no-writable-strings'
Building with "-Werror -Wwrite-strings" is now supported.

* ps/no-writable-strings: (27 commits)
  config.mak.dev: enable `-Wwrite-strings` warning
  builtin/merge: always store allocated strings in `pull_twohead`
  builtin/rebase: always store allocated string in `options.strategy`
  builtin/rebase: do not assign default backend to non-constant field
  imap-send: fix leaking memory in `imap_server_conf`
  imap-send: drop global `imap_server_conf` variable
  mailmap: always store allocated strings in mailmap blob
  revision: always store allocated strings in output encoding
  remote-curl: avoid assigning string constant to non-const variable
  send-pack: always allocate receive status
  parse-options: cast long name for OPTION_ALIAS
  http: do not assign string constant to non-const field
  compat/win32: fix const-correctness with string constants
  pretty: add casts for decoration option pointers
  object-file: make `buf` parameter of `index_mem()` a constant
  object-file: mark cached object buffers as const
  ident: add casts for fallback name and GECOS
  entry: refactor how we remove items for delayed checkouts
  line-log: always allocate the output prefix
  line-log: stop assigning string constant to file parent buffer
  ...
2024-06-17 15:55:58 -07:00
Junio C Hamano 72576d139d Merge branch 'jk/imap-send-plug-all-msgs-leak'
A leak in "git imap-send" that somehow escapes LSan has been
plugged.

* jk/imap-send-plug-all-msgs-leak:
  imap-send: free all_msgs strbuf in "out" label
2024-06-17 15:55:58 -07:00
Junio C Hamano 42b8b5bfd0 Merge branch 'jk/am-retry'
"git am" has a safety feature to prevent it from starting a new
session when there already is a session going.  It reliably
triggers when an mbox is given on the command line, but it has to
rely on the tty-ness of the standard input.  Add an explicit way to
opt out of this safety with a command line option.

* jk/am-retry:
  test-terminal: drop stdin handling
  am: add explicit "--retry" option
2024-06-17 15:55:56 -07:00
Junio C Hamano cff3b034d5 Merge branch 'jc/varargs-attributes'
Varargs functions that are unannotated as printf-like or execl-like
have been annotated as such.

* jc/varargs-attributes:
  __attribute__: add a few missing format attributes
  __attribute__: mark some functions with LAST_ARG_MUST_BE_NULL
  __attribute__: remove redundant attribute declaration for git_die_config()
  __attribute__: trace2_region_enter_printf() is like "printf"
2024-06-17 15:55:55 -07:00
Junio C Hamano 40a163f217 Merge branch 'ps/ref-storage-migration'
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-17 15:55:55 -07:00
Junio C Hamano dfd668fa84 Merge branch 'ps/check-docs-fix'
"make check-docs" noticed problems and reported to its output but
failed to signal its findings with its exit status, which has been
corrected.

* ps/check-docs-fix:
  ci/test-documentation: work around SyntaxWarning in Python 3.12
  gitlab-ci: add job to run `make check-docs`
  Documentation/lint-manpages: bubble up errors
  Makefile: extract script to lint missing/extraneous manpages
2024-06-17 15:55:54 -07:00
Junio C Hamano 4551858c18 Merge branch 'ps/ci-fix-detection-of-ubuntu-20'
Fix for an embarrassing typo that prevented Python2 tests from running
anywhere.

* ps/ci-fix-detection-of-ubuntu-20:
  ci: fix check for Ubuntu 20.04
2024-06-17 15:55:53 -07:00
Junio C Hamano 7e2d0348d8 Merge branch 'ap/credential-clear-fix'
Upon expiration event, the credential subsystem forgot to clear
in-core authentication material other than password (whose support
was added recently), which has been corrected.

* ap/credential-clear-fix:
  credential: clear expired c->credential, unify secret clearing
2024-06-17 15:55:53 -07:00
Junio C Hamano 4d8ae4d3ca Merge branch 'jc/format-patch-with-range-diff'
The inter/range-diff output has been moved to the end of the patch
when format-patch adds it to a single patch, instead of writing it
before the patch text, to be consistent with what is done for a
cover letter for a multi-patch series.

* jc/format-patch-with-range-diff:
  format-patch: move range/inter diff at the end of a single patch output
  show_log: factor out interdiff/range-diff generation
2024-06-17 15:55:52 -07:00
Eric Wong 8270201971 Git.pm: use array in command_bidi_pipe example
command_bidi_pipe takes the git command and optional arguments as an
array, not a string.  Make sure the documentation example is usable
code.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17 13:41:51 -07:00
Kyle Lippincott 34d982caaf attr: fix msan issue in read_attr_from_index
Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:

    ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
    #1 0x5651f3415530 in read_attr git/attr.c
    #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
    #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
    #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
    #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
    #6 0x5651f34728da in convert_attrs git/convert.c:1320:2
    #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
    #8 0x5651f357a35e in index_fd git/object-file.c:2630:34
    #9 0x5651f357aa15 in index_path git/object-file.c:2657:7
    #10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
    #11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
    #12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
    #13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
    #14 0x5651f321d327 in run_builtin git/git.c:474:11
    #15 0x5651f321bc9e in handle_builtin git/git.c:729:3
    #16 0x5651f321a792 in run_argv git/git.c:793:4
    #17 0x5651f321a792 in cmd_main git/git.c:928:19
    #18 0x5651f33dde1f in main git/common-main.c:62:11

The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. Passing `size` to
`read_attr_from_buf` unconditionally may therefore read an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.

Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.
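
A sketch of the reordered code path (signatures abbreviated from
attr.c):

    buf = read_blob_data_from_index(istate, path, &size);
    if (!buf)
            return NULL; /* 'size' was never written; do not read it */
    return read_attr_from_buf(buf, size, path, flags);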

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17 13:32:42 -07:00
Taylor Blau a83e21de6b pack-bitmap.c: ensure pseudo-merge offset reads are bounded
After reading the pseudo-merge extension's metadata table, we allocate
an array to store information about each pseudo-merge, including its
byte offset within the .bitmap file itself.

This is done like so:

    pseudo_merge_ofs = index_end - 24 -
            (index->pseudo_merges.nr * sizeof(uint64_t));
    for (i = 0; i < index->pseudo_merges.nr; i++) {
            index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs);
            pseudo_merge_ofs += sizeof(uint64_t);
    }

But if the pseudo-merge table is corrupt, we'll keep calling get_be64()
past the end of the pseudo-merge extension, potentially reading off the
end of the mmap'd region.

Prevent this by ensuring that we have at least `table_size - 24` many
bytes available to read (adding 24 to the left-hand side of our
inequality to account for the length of the metadata component).

This is sufficient to prevent us from reading off the end of the
pseudo-merge extension, and ensures that all of the get_be64() calls
below are in bounds.
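
In code, the added guard is along the lines of the following sketch
(the exact error message is illustrative):

    if (st_mult(index->pseudo_merges.nr, sizeof(uint64_t)) > table_size - 24)
            return error(_("corrupted bitmap: pseudo-merge table too short"));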

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:19:27 -07:00
Taylor Blau 20c49432e4 Documentation/technical/bitmap-format.txt: add missing position table
While investigating a benign Coverity warning on the new pseudo-merge
implementation, I was struggling to understand the (paraphrased) below:

    ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t));
    for (i = 0; i < index->pseudo_merges.nr; i++) {
            index->pseudo_merges.v[i].at = get_be64(ofs);
            ofs += sizeof(uint64_t);
    }

, in pack-bitmap.c::load_bitmap_header(). Looking at the documentation,
the diagram describing the on-disk format (prior to this patch)
suggested that the optional extended lookup table immediately preceded
the trailing metadata portion.

If that were the case, that would make the above code from
load_bitmap_header() incorrect, as we'd be blindly reading into the
extended offset table.

But later on in the documentation there is a description of the
pseudo-merge position table as immediately preceding the trailing
metadata portion of the extension. And indeed, we do write the position
table in pack-bitmap-write.c:

    /* write positions for all pseudo merges */
    for (i = 0; i < writer->pseudo_merges_nr; i++)
            hashwrite_be64(f, pseudo_merge_ofs[i]);

    hashwrite_be32(f, writer->pseudo_merges_nr);
    hashwrite_be32(f, kh_size(writer->pseudo_merge_commits));
    hashwrite_be64(f, table_start - start);
    hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));

So this is purely a case of the diagram being out of sync with the
textual description and actual implementation of the format
specification.

Add the missing component back to the format diagram to avoid further
confusion in this area.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:19:26 -07:00
Patrick Steinhardt dc89b7d522 hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
Guard declarations of functions that implicitly use `the_repository`
with `USE_THE_REPOSITORY_VARIABLE` such that callers don't accidentally
rely on that global variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:35 -07:00
Patrick Steinhardt 912d4756cd t/helper: remove dependency on `the_repository` in "proc-receive"
The "proc-receive" test helper implicitly relies on `the_repository` via
`parse_oid_hex()`. This isn't necessary though, and in fact the whole
command does not depend on `the_repository` at all.

Stop setting up `the_repository` and use `parse_oid_hex_any()` to parse
object IDs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:35 -07:00
Patrick Steinhardt 8e9a1d0dc2 t/helper: fix segfault in "oid-array" command without repository
The "oid-array" test helper can supposedly work without a Git
repository, but will in fact crash because `the_repository->hash_algo`
is not initialized. This is because `oid_pos()`, which is used by
`oid_array_lookup()`, depends on `the_hash_algo->rawsz`.

Ideally, we'd adapt `oid_pos()` to not depend on `the_hash_algo`
anymore. That is a bigger undertaking though, so instead we fall back to
SHA1 when there is no repository.
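
The fallback amounts to something like this sketch in the helper's
setup code (assuming the usual gentle repository setup):

    int nongit_ok = 0;

    setup_git_directory_gently(&nongit_ok);
    if (nongit_ok)
            /* no repository: default to SHA-1 so oid_pos() has a rawsz */
            repo_set_hash_algo(the_repository, GIT_HASH_SHA1);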

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt fa9e009aa7 t/helper: use correct object hash in partial-clone helper
The `object_info()` function of the partial-clone helper is responsible
for checking the object ID of a repository other than `the_repository`.
We use `parse_oid_hex()` in this function though, which means that we
still depend on `the_repository->hash_algo`.

Fix this by using the object hash of the function-local repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 2a0e11479f compat/fsmonitor: fix socket path in networked SHA256 repos
The IPC socket used by the fsmonitor on Darwin is usually contained in
the Git repository itself. When the repository is hosted on a networked
filesystem though, we instead create the socket path in the user's home
directory or the socket directory. In that case, we derive the path by
hashing the repository path.

But while we always use SHA1 to hash the repository path, we then end up
using `hash_to_hex()` to append the computed hash to the socket path.
This is wrong because `hash_to_hex()` uses the hash algorithm configured
in `the_repository`, which may not be SHA1. The consequence is that we
may append uninitialized bytes to the path when operating in a SHA256
repository.

Fix this bug by using `hash_to_hex_algop()` with SHA1.
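
The change is essentially the following one-liner, pinning the
formatting helper to SHA-1 (the buffer name is illustrative):

    strbuf_addstr(&ipc_path,
                  hash_to_hex_algop(hash, &hash_algos[GIT_HASH_SHA1]));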

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 99cf4d6d35 replace-object: use hash algorithm from passed-in repository
In `register_replace_ref()`, we pass in a repository but then use
`get_oid_hex()` to parse passed-in object IDs, which implicitly uses
`the_repository`. Fix this by using the hash algorithm from the
passed-in repository instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 58650befd9 protocol-caps: use hash algorithm from passed-in repository
In `send_info()`, we pass in a repository but then use `get_oid_hex()`
to parse passed-in object IDs, which implicitly uses `the_repository`.
Fix this by using the hash algorithm from the passed-in repository
instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt f2c32a66f5 oidset: pass hash algorithm when parsing file
The `oidset_parse_file_carefully()` function implicitly depends on
`the_repository` when parsing object IDs. Fix this by having callers
pass in the hash algorithm to use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt afa2c6ddc8 http-fetch: don't crash when parsing packfile without a repo
The git-http-fetch(1) command accepts a `--packfile=` option, which
allows the user to specify that it shall fetch a specific packfile,
only. The parameter here is the hash of the packfile, which is specific
to the object hash used by the repository. This requirement is implicit
though via our use of `parse_oid_hex()`, which internally uses
`the_repository`.

The git-http-fetch(1) command allows for there to be no repository
though, which only exists such that we can show usage via the "-h"
option. In that case though, starting with c8aed5e8da (repository: stop
setting SHA1 as the default object hash, 2024-05-07), `the_repository`
does not have its object hash initialized anymore and thus we would
crash when trying to parse the object ID outside of a repository.

Fix this issue by dying immediately when we see a "--packfile="
parameter when outside a Git repository. This is not a functional
regression as we would die later on with the same error anyway.
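
A hedged sketch of that early check (the option variable, the
not-in-a-repository flag and the message wording are illustrative):

    if (packfile && nongit)
            die(_("--packfile can only be used inside a git repository"));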

Add a test to detect the segfault. We use the "nongit" function to do
so, which we need to allow-list in `test_must_fail ()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 8a676bdc5c hash-ll: merge with "hash.h"
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split
out of hash.h to remove dependency on repository.h, 2023-04-22) to make
explicit the split between hash-related functions that rely on the
global `the_repository`, and those that don't. This split is no longer
necessary now that we have removed the reliance on `the_repository`.

Merge "hash-ll.h" back into "hash.h". This causes some code units to not
include "repository.h" anymore, which requires us to add some forward
declarations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 36026a0f30 refs: avoid include cycle with "repository.h"
There is an include cycle between "refs.h" and "repository.h" via
"commit.h", "object.h" and "hash.h". This has the effect that several
definitions of structs and enums will not be visible once we merge
"hash-ll.h" back into "hash.h" in the next commit.

The only reason that "repository.h" includes "refs.h" is the definition
of `enum ref_storage_format`. Move it into "repository.h" and have
"refs.h" include "repository.h" instead to fix the cycle.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt e7da938570 global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from gaining new dependencies on it in future
changes, be they explicit or implicit.

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
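
The pattern looks roughly like this: a header hides the global behind
the guard, and a code unit that still relies on it opts in before
including any Git headers (a simplified sketch, not the exact
declarations):

    /* in a header such as repository.h */
    #ifdef USE_THE_REPOSITORY_VARIABLE
    extern struct repository *the_repository;
    #endif

    /* in a code unit that has not been converted yet */
    #define USE_THE_REPOSITORY_VARIABLE
    #include "git-compat-util.h"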

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "builtin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 7abbca0e74 hash: require hash algorithm in `empty_tree_oid_hex()`
The `empty_tree_oid_hex()` function uses `the_repository` to derive the
hash function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

While at it, remove the unused `empty_blob_oid_hex()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 9c34eb93fb hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to
derive the hash function that shall be used. Require callers to pass in
the hash algorithm to get rid of this implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 861e8c76f6 hash: make `is_null_oid()` independent of `the_repository`
The function `is_null_oid()` uses `oideq(oid, null_oid())` to check
whether a given object ID is the all-zero object ID. `null_oid()`
implicitly relies on `the_repository` though to return the correct null
object ID.

Get rid of this dependency by always comparing the complete hash array
for being all-zeroes. This is possible due to the refactoring of object
IDs so that their hash arrays are always fully initialized.
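
Conceptually, the check becomes a whole-array comparison along these
lines (a simplified sketch):

    int is_null_oid(const struct object_id *oid)
    {
            static const unsigned char null_hash[GIT_MAX_RAWSZ];

            return !memcmp(oid->hash, null_hash, GIT_MAX_RAWSZ);
    }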

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt d4d364b2c7 hash: convert `oidcmp()` and `oideq()` to compare whole hash
With the preceding commit, the hash array of object IDs is now fully
zero-padded even when the hash algorithm's output is smaller than the
array length. With that, we can now adapt both `oidcmp()` and `oideq()`
to unconditionally memcmp(3P) the whole array instead of depending on
the hash size.
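
With the padding guarantee, both helpers reduce to a fixed-size
memcmp(3P); roughly, as a sketch:

    int oidcmp(const struct object_id *a, const struct object_id *b)
    {
            return memcmp(a->hash, b->hash, GIT_MAX_RAWSZ);
    }

    int oideq(const struct object_id *a, const struct object_id *b)
    {
            return !memcmp(a->hash, b->hash, GIT_MAX_RAWSZ);
    }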

While it may feel inefficient to compare unused bytes for e.g. SHA-1, in
practice the compiler should now be able to produce code that is better
optimized both because we have no branch anymore, but also because the
size to compare is now known at compile time. Godbolt spits out the
following assembly on an x86_64 platform with GCC 14.1 for the old and
new implementations of `oidcmp()`:

    oidcmp_old:
            movsx   rax, DWORD PTR [rdi+32]
            test    eax, eax
            jne     .L2
            mov     rax, QWORD PTR the_repository[rip]
            cmp     QWORD PTR [rax+16], 32
            je      .L6
    .L4:
            mov     edx, 20
            jmp     memcmp
    .L2:
            lea     rdx, [rax+rax*2]
            lea     rax, [rax+rdx*4]
            lea     rax, hash_algos[0+rax*8]
            cmp     QWORD PTR [rax+16], 32
            jne     .L4
    .L6:
            mov     edx, 32
            jmp     memcmp

    oidcmp_new:
            mov     edx, 32
            jmp     memcmp

The new implementation gets rid of all the branches and effectively
only ends up setting up `edx` for `memcmp()` and then calling it.

And for `oideq()`:

    oideq_old:
            movsx   rcx, DWORD PTR [rdi+32]
            mov     rax, rdi
            mov     rdx, rsi
            test    ecx, ecx
            jne     .L2
            mov     rcx, QWORD PTR the_repository[rip]
            cmp     QWORD PTR [rcx+16], 32
            mov     rcx, QWORD PTR [rax]
            je      .L12
    .L4:
            mov     rsi, QWORD PTR [rax+8]
            xor     rcx, QWORD PTR [rdx]
            xor     rsi, QWORD PTR [rdx+8]
            or      rcx, rsi
            je      .L13
    .L8:
            mov     eax, 1
            test    eax, eax
            sete    al
            movzx   eax, al
            ret
    .L2:
            lea     rsi, [rcx+rcx*2]
            lea     rcx, [rcx+rsi*4]
            lea     rcx, hash_algos[0+rcx*8]
            cmp     QWORD PTR [rcx+16], 32
            mov     rcx, QWORD PTR [rax]
            jne     .L4
    .L12:
            mov     rsi, QWORD PTR [rax+8]
            xor     rcx, QWORD PTR [rdx]
            xor     rsi, QWORD PTR [rdx+8]
            or      rcx, rsi
            jne     .L8
            mov     rcx, QWORD PTR [rax+16]
            mov     rax, QWORD PTR [rax+24]
            xor     rcx, QWORD PTR [rdx+16]
            xor     rax, QWORD PTR [rdx+24]
            or      rcx, rax
            jne     .L8
            xor     eax, eax
    .L14:
            test    eax, eax
            sete    al
            movzx   eax, al
            ret
    .L13:
            mov     edi, DWORD PTR [rdx+16]
            cmp     DWORD PTR [rax+16], edi
            jne     .L8
            xor     eax, eax
            jmp     .L14

    oideq_new:
            mov     rax, QWORD PTR [rdi]
            mov     rdx, QWORD PTR [rdi+8]
            xor     rax, QWORD PTR [rsi]
            xor     rdx, QWORD PTR [rsi+8]
            or      rax, rdx
            je      .L5
    .L2:
            mov     eax, 1
            xor     eax, 1
            ret
    .L5:
            mov     rax, QWORD PTR [rdi+16]
            mov     rdx, QWORD PTR [rdi+24]
            xor     rax, QWORD PTR [rsi+16]
            xor     rdx, QWORD PTR [rsi+24]
            or      rax, rdx
            jne     .L2
            xor     eax, eax
            xor     eax, 1
            ret

Interestingly, the compiler decides to split the comparisons into two so
that it first compares the lower half of the object ID for equality and
then the upper half. If the first check shows a difference, then we
wouldn't even end up comparing the second half.

In both cases, the new generated code is significantly shorter and has
way less branches. While I didn't benchmark the change, I'd be surprised
if the new code was slower.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt c98d762ed9 global: ensure that object IDs are always padded
The `oidcmp()` and `oideq()` functions only compare the prefix length as
specified by the given hash algorithm. This mandates that the object IDs
have a valid hash algorithm set, or otherwise we wouldn't be able to
figure out that prefix. As we do not have a hash algorithm in many
cases, for example when handling null object IDs, this assumption cannot
always be fulfilled. We thus have a fallback in place that instead uses
`the_repository` to derive the hash function. This implicit dependency
is hidden away from callers and can be quite surprising, especially in
contexts where there may be no repository.

In theory, we can adapt those functions to always memcmp(3P) the whole
length of their hash arrays. But there exist a couple of sites where we
populate `struct object_id`s such that only the prefix of its hash that
is actually used by the hash algorithm is populated. The remaining bytes
are left uninitialized. The fact that those bytes are uninitialized also
leads to warnings under Valgrind in some places where we copy those
bytes.

Refactor callsites where we populate object IDs to always initialize all
bytes. This also allows us to get rid of `oidcpy_with_padding()`, for
one because the input is now fully initialized, and because `oidcpy()`
will now always copy the whole hash array.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt 9da95bda74 hash: require hash algorithm in `oidread()` and `oidclr()`
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.
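
A hedged sketch of an updated call site, assuming the algorithm is
passed as the trailing argument:

    oidclr(&oid, repo->hash_algo);
    oidread(&oid, hash_bytes, repo->hash_algo);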

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt f4836570a7 hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.

As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".

Note that both in this and subsequent commits in this series we always
just pass `the_repository->hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt 129cb1b99d hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
The functions `is_empty_{blob,tree}_sha1()` are mostly unused, except
for a single callsite in "read-cache.c". Most callsites have long since
been converted to use the equivalents that accept a `struct object_id`
instead of a string.

Adapt the remaining callsite and drop those functions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00