Commit Graph

74067 Commits (c3d034df1606beca8a4ba3bc8a8266f4c711f23c)

Author SHA1 Message Date
Junio C Hamano cff3b034d5 Merge branch 'jc/varargs-attributes'
Varargs functions that are unannotated as printf-like or execl-like
have been annotated as such.

* jc/varargs-attributes:
  __attribute__: add a few missing format attributes
  __attribute__: mark some functions with LAST_ARG_MUST_BE_NULL
  __attribute__: remove redundant attribute declaration for git_die_config()
  __attribute__: trace2_region_enter_printf() is like "printf"
2024-06-17 15:55:55 -07:00
Junio C Hamano 40a163f217 Merge branch 'ps/ref-storage-migration'
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-17 15:55:55 -07:00
Junio C Hamano dfd668fa84 Merge branch 'ps/check-docs-fix'
"make check-docs" noticed problems and reported to its output but
failed to signal its findings with its exit status, which has been
corrected.

* ps/check-docs-fix:
  ci/test-documentation: work around SyntaxWarning in Python 3.12
  gitlab-ci: add job to run `make check-docs`
  Documentation/lint-manpages: bubble up errors
  Makefile: extract script to lint missing/extraneous manpages
2024-06-17 15:55:54 -07:00
Junio C Hamano 4551858c18 Merge branch 'ps/ci-fix-detection-of-ubuntu-20'
Fix for an embarrassing typo that prevented Python2 tests from running
anywhere.

* ps/ci-fix-detection-of-ubuntu-20:
  ci: fix check for Ubuntu 20.04
2024-06-17 15:55:53 -07:00
Junio C Hamano 7e2d0348d8 Merge branch 'ap/credential-clear-fix'
Upon expiration event, the credential subsystem forgot to clear
in-core authentication material other than password (whose support
was added recently), which has been corrected.

* ap/credential-clear-fix:
  credential: clear expired c->credential, unify secret clearing
2024-06-17 15:55:53 -07:00
Junio C Hamano 4d8ae4d3ca Merge branch 'jc/format-patch-with-range-diff'
The inter/range-diff output has been moved to the end of the patch
when format-patch adds it to a single patch, instead of writing it
before the patch text, to be consistent with what is done for a
cover letter for a multi-patch series.

* jc/format-patch-with-range-diff:
  format-patch: move range/inter diff at the end of a single patch output
  show_log: factor out interdiff/range-diff generation
2024-06-17 15:55:52 -07:00
Eric Wong 8270201971 Git.pm: use array in command_bidi_pipe example
command_bidi_pipe takes the git command and optional arguments as an
array, not a string.  Make sure the documentation example is usable
code.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17 13:41:51 -07:00
Kyle Lippincott 34d982caaf attr: fix msan issue in read_attr_from_index
Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:

    ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
    #1 0x5651f3415530 in read_attr git/attr.c
    #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
    #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
    #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
    #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
    #6 0x5651f34728da in convert_attrs git/convert.c:1320:2
    #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
    #8 0x5651f357a35e in index_fd git/object-file.c:2630:34
    #9 0x5651f357aa15 in index_path git/object-file.c:2657:7
    #10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
    #11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
    #12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
    #13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
    #14 0x5651f321d327 in run_builtin git/git.c:474:11
    #15 0x5651f321bc9e in handle_builtin git/git.c:729:3
    #16 0x5651f321a792 in run_argv git/git.c:793:4
    #17 0x5651f321a792 in cmd_main git/git.c:928:19
    #18 0x5651f33dde1f in main git/common-main.c:62:11

The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. The read of `size` when
calling `read_attr_from_buf` unconditionally may read from an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.

Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17 13:32:42 -07:00
Taylor Blau a83e21de6b pack-bitmap.c: ensure pseudo-merge offset reads are bounded
After reading the pseudo-merge extension's metadata table, we allocate
an array to store information about each pseudo-merge, including its
byte offset within the .bitmap file itself.

This is done like so:

    pseudo_merge_ofs = index_end - 24 -
            (index->pseudo_merges.nr * sizeof(uint64_t));
    for (i = 0; i < index->pseudo_merges.nr; i++) {
            index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs);
            pseudo_merge_ofs += sizeof(uint64_t);
    }

But if the pseudo-merge table is corrupt, we'll keep calling get_be64()
past the end of the pseudo-merge extension, potentially reading off the
end of the mmap'd region.

Prevent this by ensuring that we have at least `table_size - 24` many
bytes available to read (adding 24 to the left-hand side of our
inequality to account for the length of the metadata component).

This is sufficient to prevent us from reading off the end of the
pseudo-merge extension, and ensures that all of the get_be64() calls
below are in bounds.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:19:27 -07:00
Taylor Blau 20c49432e4 Documentation/technical/bitmap-format.txt: add missing position table
While investigating a benign Coverity warning on the new pseudo-merge
implementation, I was struggling to understand the (paraphrased) below:

    ofs = index_end - 24 - (index->pseudo_merges.nr * sizeof(uint64_t));
    for (i = 0; i < index->pseudo_merges.nr; i++) {
            index->pseudo_merges.v[i].at = get_be64(ofs);
            ofs += sizeof(uint64_t);
    }

, in pack-bitmap.c::load_bitmap_header(). Looking at the documentation,
the diagram describing the on-disk format (prior to this patch)
suggested that the optional extended lookup table immediately preceded
the trailing metadata portion.

If that were the case, that would make the above code from
load_bitmap_header() incorrect, as we'd be blindly reading into the
extended offset table.

But later on in the documentation there is a description of the
pseudo-merge position table as immediately preceding the trailing
metadata portion of the extension. And indeed, we do write the position
table in pack-bitmap-write.c:

    /* write positions for all pseudo merges */
    for (i = 0; i < writer->pseudo_merges_nr; i++)
            hashwrite_be64(f, pseudo_merge_ofs[i]);

    hashwrite_be32(f, writer->pseudo_merges_nr);
    hashwrite_be32(f, kh_size(writer->pseudo_merge_commits));
    hashwrite_be64(f, table_start - start);
    hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));

So this is purely a case of the diagram being out of sync with the
textual description and actual implementation of the format
specification.

Add the missing component back to the format diagram to avoid further
confusion in this area.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 14:19:26 -07:00
Patrick Steinhardt dc89b7d522 hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
Guard declarations of functions that implicitly use `the_repository`
with `USE_THE_REPOSITORY_VARIABLE` such that callers don't accidentally
rely on that global variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:35 -07:00
Patrick Steinhardt 912d4756cd t/helper: remove dependency on `the_repository` in "proc-receive"
The "proc-receive" test helper implicitly relies on `the_repository` via
`parse_oid_hex()`. This isn't necessary though, and in fact the whole
command does not depend on `the_repository` at all.

Stop setting up `the_repository` and use `parse_oid_hex_any()` to parse
object IDs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:35 -07:00
Patrick Steinhardt 8e9a1d0dc2 t/helper: fix segfault in "oid-array" command without repository
The "oid-array" test helper can supposedly work without a Git
repository, but will in fact crash because `the_repository->hash_algo`
is not initialized. This is because `oid_pos()`, which is used by
`oid_array_lookup()`, depends on `the_hash_algo->rawsz`.

Ideally, we'd adapt `oid_pos()` to not depend on `the_hash_algo`
anymore. That is a bigger untertaking though, so instead we fall back to
SHA1 when there is no repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt fa9e009aa7 t/helper: use correct object hash in partial-clone helper
The `object_info()` function of the partial-clone helper is responsible
for checking the object ID of a repository other than `the_repository`.
We use `parse_oid_hex()` in this function though, which means that we
still depend on `the_repository->hash_algo`.

Fix this by using the object hash of the function-local repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 2a0e11479f compat/fsmonitor: fix socket path in networked SHA256 repos
The IPC socket used by the fsmonitor on Darwin is usually contained in
the Git repository itself. When the repository is hosted on a networked
filesystem though, we instead create the socket path in the user's home
directory or the socket directory. In that case, we derive the path by
hashing the repository path.

But while we always use SHA1 to hash the repository path, we then end up
using `hash_to_hex()` to append the computed hash to the socket path.
This is wrong because `hash_to_hex()` uses the hash algorithm configured
in `the_repository`, which may not be SHA1. The consequence is that we
may append uninitialized bytes to the path when operating in a SHA256
repository.

Fix this bug by using `hash_to_hex_algop()` with SHA1.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 99cf4d6d35 replace-object: use hash algorithm from passed-in repository
In `register_replace_ref()`, we pass in a repository but then use
`get_oid_hex()` to parse passed-in object IDs, which implicitly uses
`the_repository`. Fix this by using the hash algorithm from the
passed-in repository instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 58650befd9 protocol-caps: use hash algorithm from passed-in repository
In `send_info()`, we pass in a repository but then use `get_oid_hex()`
to parse passed-in object IDs, which implicitly uses `the_repository`.
Fix this by using the hash algorithm from the passed-in repository
instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt f2c32a66f5 oidset: pass hash algorithm when parsing file
The `oidset_parse_file_carefully()` function implicitly depends on
`the_repository` when parsing object IDs. Fix this by having callers
pass in the hash algorithm to use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt afa2c6ddc8 http-fetch: don't crash when parsing packfile without a repo
The git-http-fetch(1) command accepts a `--packfile=` option, which
allows the user to specify that it shall fetch a specific packfile,
only. The parameter here is the hash of the packfile, which is specific
to the object hash used by the repository. This requirement is implicit
though via our use of `parse_oid_hex()`, which internally uses
`the_repository`.

The git-http-fetch(1) command allows for there to be no repository
though, which only exists such that we can show usage via the "-h"
option. In that case though, starting with c8aed5e8da (repository: stop
setting SHA1 as the default object hash, 2024-05-07), `the_repository`
does not have its object hash initialized anymore and thus we would
crash when trying to parse the object ID outside of a repository.

Fix this issue by dying immediately when we see a "--packfile="
parameter when outside a Git repository. This is not a functional
regression as we would die later on with the same error anyway.

Add a test to detect the segfault. We use the "nongit" function to do
so, which we need to allow-list in `test_must_fail ()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:34 -07:00
Patrick Steinhardt 8a676bdc5c hash-ll: merge with "hash.h"
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split
out of hash.h to remove dependency on repository.h, 2023-04-22) to make
explicit the split between hash-related functions that rely on the
global `the_repository`, and those that don't. This split is no longer
necessary now that we we have removed the reliance on `the_repository`.

Merge "hash-ll.h" back into "hash.h". This causes some code units to not
include "repository.h" anymore, which requires us to add some forward
declarations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 36026a0f30 refs: avoid include cycle with "repository.h"
There is an include cycle between "refs.h" and "repository.h" via
"commit.h", "object.h" and "hash.h". This has the effect that several
definitions of structs and enums will not be visible once we merge
"hash-ll.h" back into "hash.h" in the next commit.

The only reason that "repository.h" includes "refs.h" is the definition
of `enum ref_storage_format`. Move it into "repository.h" and have
"refs.h" include "repository.h" instead to fix the cycle.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt e7da938570 global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 7abbca0e74 hash: require hash algorithm in `empty_tree_oid_hex()`
The `empty_tree_oid_hex()` function use `the_repository` to derive the
hash function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

While at it, remove the unused `empty_blob_oid_hex()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 9c34eb93fb hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to
derive the hash function that shall be used. Require callers to pass in
the hash algorithm to get rid of this implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 861e8c76f6 hash: make `is_null_oid()` independent of `the_repository`
The function `is_null_oid()` uses `oideq(oid, null_oid())` to check
whether a given object ID is the all-zero object ID. `null_oid()`
implicitly relies on `the_repository` though to return the correct null
object ID.

Get rid of this dependency by always comparing the complete hash array
for being all-zeroes. This is possible due to the refactoring of object
IDs so that their hash arrays are always fully initialized.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt d4d364b2c7 hash: convert `oidcmp()` and `oideq()` to compare whole hash
With the preceding commit, the hash array of object IDs is now fully
zero-padded even when the hash algorithm's output is smaller than the
array length. With that, we can now adapt both `oidcmp()` and `oideq()`
to unconditionally memcmp(3P) the whole array instead of depending on
the hash size.

While it may feel inefficient to compare unused bytes for e.g. SHA-1, in
practice the compiler should now be able to produce code that is better
optimized both because we have no branch anymore, but also because the
size to compare is now known at compile time. Goldbolt spits out the
following assembly on an x86_64 platform with GCC 14.1 for the old and
new implementations of `oidcmp()`:

    oidcmp_old:
            movsx   rax, DWORD PTR [rdi+32]
            test    eax, eax
            jne     .L2
            mov     rax, QWORD PTR the_repository[rip]
            cmp     QWORD PTR [rax+16], 32
            je      .L6
    .L4:
            mov     edx, 20
            jmp     memcmp
    .L2:
            lea     rdx, [rax+rax*2]
            lea     rax, [rax+rdx*4]
            lea     rax, hash_algos[0+rax*8]
            cmp     QWORD PTR [rax+16], 32
            jne     .L4
    .L6:
            mov     edx, 32
            jmp     memcmp

    oidcmp_new:
            mov     edx, 32
            jmp     memcmp

The new implementation gets ridi of all the branches and effectively
only ends setting up `edx` for `memcmp()` and then calling it.

And for `oideq()`:

    oideq_old:
            movsx   rcx, DWORD PTR [rdi+32]
            mov     rax, rdi
            mov     rdx, rsi
            test    ecx, ecx
            jne     .L2
            mov     rcx, QWORD PTR the_repository[rip]
            cmp     QWORD PTR [rcx+16], 32
            mov     rcx, QWORD PTR [rax]
            je      .L12
    .L4:
            mov     rsi, QWORD PTR [rax+8]
            xor     rcx, QWORD PTR [rdx]
            xor     rsi, QWORD PTR [rdx+8]
            or      rcx, rsi
            je      .L13
    .L8:
            mov     eax, 1
            test    eax, eax
            sete    al
            movzx   eax, al
            ret
    .L2:
            lea     rsi, [rcx+rcx*2]
            lea     rcx, [rcx+rsi*4]
            lea     rcx, hash_algos[0+rcx*8]
            cmp     QWORD PTR [rcx+16], 32
            mov     rcx, QWORD PTR [rax]
            jne     .L4
    .L12:
            mov     rsi, QWORD PTR [rax+8]
            xor     rcx, QWORD PTR [rdx]
            xor     rsi, QWORD PTR [rdx+8]
            or      rcx, rsi
            jne     .L8
            mov     rcx, QWORD PTR [rax+16]
            mov     rax, QWORD PTR [rax+24]
            xor     rcx, QWORD PTR [rdx+16]
            xor     rax, QWORD PTR [rdx+24]
            or      rcx, rax
            jne     .L8
            xor     eax, eax
    .L14:
            test    eax, eax
            sete    al
            movzx   eax, al
            ret
    .L13:
            mov     edi, DWORD PTR [rdx+16]
            cmp     DWORD PTR [rax+16], edi
            jne     .L8
            xor     eax, eax
            jmp     .L14

    oideq_new:
            mov     rax, QWORD PTR [rdi]
            mov     rdx, QWORD PTR [rdi+8]
            xor     rax, QWORD PTR [rsi]
            xor     rdx, QWORD PTR [rsi+8]
            or      rax, rdx
            je      .L5
    .L2:
            mov     eax, 1
            xor     eax, 1
            ret
    .L5:
            mov     rax, QWORD PTR [rdi+16]
            mov     rdx, QWORD PTR [rdi+24]
            xor     rax, QWORD PTR [rsi+16]
            xor     rdx, QWORD PTR [rsi+24]
            or      rax, rdx
            jne     .L2
            xor     eax, eax
            xor     eax, 1
            ret

Interestingly, the compiler decides to split the comparisons into two so
that it first compares the lower half of the object ID for equality and
then the upper half. If the first check shows a difference, then we
wouldn't even end up comparing the second half.

In both cases, the new generated code is significantly shorter and has
way less branches. While I didn't benchmark the change, I'd be surprised
if the new code was slower.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt c98d762ed9 global: ensure that object IDs are always padded
The `oidcmp()` and `oideq()` functions only compare the prefix length as
specified by the given hash algorithm. This mandates that the object IDs
have a valid hash algorithm set, or otherwise we wouldn't be able to
figure out that prefix. As we do not have a hash algorithm in many
cases, for example when handling null object IDs, this assumption cannot
always be fulfilled. We thus have a fallback in place that instead uses
`the_repository` to derive the hash function. This implicit dependency
is hidden away from callers and can be quite surprising, especially in
contexts where there may be no repository.

In theory, we can adapt those functions to always memcmp(3P) the whole
length of their hash arrays. But there exist a couple of sites where we
populate `struct object_id`s such that only the prefix of its hash that
is actually used by the hash algorithm is populated. The remaining bytes
are left uninitialized. The fact that those bytes are uninitialized also
leads to warnings under Valgrind in some places where we copy those
bytes.

Refactor callsites where we populate object IDs to always initialize all
bytes. This also allows us to get rid of `oidcpy_with_padding()`, for
one because the input is now fully initialized, and because `oidcpy()`
will now always copy the whole hash array.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt 9da95bda74 hash: require hash algorithm in `oidread()` and `oidclr()`
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt f4836570a7 hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.

As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".

Note that both in this and subsequent commits in this series we always
just pass `the_repository->hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Patrick Steinhardt 129cb1b99d hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
The functions `is_empty_{blob,tree}_sha1()` are mostly unused, except
for a single callsite in "read-cache.c". Most callsites have long since
been converted to use the equivalents that accept a `struct object_id`
instead of a string.

Adapt the remaining callsite and drop those functions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Jeff King aecd794fca remote: drop checks for zero-url case
Now that the previous commit removed the possibility that a "struct
remote" will ever have zero url fields, we can drop a number of
redundant checks and untriggerable code paths.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:39 -07:00
Jeff King ffce821880 remote: always require at least one url in a remote
When we return a struct from remote_get(), the result _almost_ always
has at least one url. In remotes_remote_get_1(), we do this:

        if (name_given && !valid_remote(ret))
                add_url_alias(remote_state, ret, name);
        if (!valid_remote(ret))
                return NULL;

So if the remote doesn't have a url, we give it one based on the name
(this is how unconfigured urls are used as remotes). And if that doesn't
work, we return NULL.

But there's a catch: valid_remote() checks that we have at least one url
_unless_ the remote.*.vcs field is set. This comes from c578f51d52 (Add
a config option for remotes to specify a foreign vcs, 2009-11-18), and
the whole idea was to support remote helpers that don't have their own
url.

However, that mode has been broken since 25d5cc488a (Pass unknown
protocols to external protocol handlers, 2009-12-09)! That commit
unconditionally looks at the url in get_helper(), causing a segfault
with something like:

  git -c remote.foo.vcs=bar fetch foo

We could fix that now, of course. But given that it has been broken for
almost 15 years and nobody noticed, there's a better option. This weird
"there might not be a url" special case requires checks all over the
code base, and it's not clear if there are other similar segfaults
lurking. It would be nice if we could drop that special case.

So instead, let's let the "the remote name is the url" code kick in. If
you have "remote.foo.vcs", then your url (unless otherwise configured)
is "foo". This does have a visible effect compared to what 25d5cc488a
was trying to do. The idea back then is that for a remote without a url,
we'd run:

   # only one command-line option!
   git-remote-bar foo

whereas with our default url, now we'll run:

  git-remote-bar foo foo

Again, in practice nobody can be relying on this because it has been
segfaulting for 15 years. We should consider just removing this "vcs"
config option entirely, but that would be a user-visible breakage. So by
fixing it this way, we can keep things working that have been working,
and simplify away one special case inside our code.

This fixes the segfault from 25d5cc488a (demonstrated by the test), and
we can build further cleanups on top.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King 7384e75618 t5801: test remote.*.vcs config
The usual way to trigger a remote helper is to use the "::" syntax from:
87422439d1 (Allow specifying the remote helper in the url, 2009-11-18).
Doing:

  git config remote.origin.url hg::https://example.com/repo

will run "git-remote-hg origin https://example.com/repo". Or you can
use the fallback handling from 25d5cc488a (Pass unknown protocols to
external protocol handlers, 2009-12-09):

  git config remote.origin.url "foo://bar"

which will run "git-remote-foo origin foo://bar".

But there's a third way, from c578f51d52 (Add a config option for
remotes to specify a foreign vcs, 2009-11-18):

  git config remote.origin.vcs foo
  git config remote.origin.url bar

which will run "git-remote-foo origin bar". This is mostly redundant
with the other methods, except that it is supposed to allow you to run
without a URL at all. So:

  git config remote.origin.vcs foo

would run "git-remote-foo origin" with no extra URL parameter (under the
assumption that the helper somehow knows how to access the remote repo).
However, this mode has been broken since 25d5cc488a, shortly after it
was added! That commit taught the transport code to always look at the
URL string to parse off the "foo::" bits, meaning it would always
segfault in the no-url case. You can see that with:

  git -c remote.foo.vcs=bar fetch foo

Nobody seems to have noticed in the almost 15 years since, so presumably
it's not a well-used feature. And without that, arguably the whole
remote.*.vcs feature could be removed entirely, as it isn't offering
anything you couldn't do with the "helper::" syntax. But it _does_ work
if you have a URL, and it has been advertised in the documentation for
all that time. So we shouldn't just remove it without warning.

Likewise, even if we were going to deprecate it, we should avoid
breaking it in the meantime. Since there are no tests for it at all,
let's add a few basic ones:

  - this syntax doesn't work well with "git clone" (another point
    against it versus "helper::"). But we can use "clone -c" to set up
    the config manually, passing the URL as usual to clone. This does
    work, though note that I had to use --no-local in the test to avoid
    broken interactions between the local code and the helper. In the
    real world this would be a non-issue, since the remote URL would
    generally not also be a local Git repo!

  - likewise, we should be able to set up the config manually and fetch
    into a repository. This also works.

  - we can simulate a vcs that has no URL support by stuffing the remote
    path into another environment variable. This should work, but
    doesn't (it hits the segfault mentioned above).

In the first two cases, I took the extra step of checking GIT_TRACE
output to confirm that we actually ran the helper (since the URL is a
valid Git repo, the clone/fetch would appear to work even if we
didn't use the helper at all!).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King e2269a2b59 t5801: make remote-testgit GIT_DIR setup more robust
Our tests use a fake helper that just imports from an existing Git
repository. We're fed the path to that repo on the command line, and
derive the GIT_DIR by tacking on "/.git".

This is wrong if the path is a bare repository, but that's OK since this
is just a limited test. But it's also wrong if the transport code feeds
us the actual .git directory itself (i.e., we expect "/path/to/repo" but
it gives us "/path/to/repo/.git"). None of the current tests do that,
but let's future-proof ourselves against adding a test that does.

We can instead ask "rev-parse" to set our GIT_DIR. Note that we have to
first unset other git variables from our environment. Coming into this
script, we'll have GIT_DIR set to the fetching repository, and we need
to "switch" to the remote one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King 9badf97c42 remote: allow resetting url list
Because remote.*.url is treated as a multi-valued key, there is no way
to override previous config. So for example if you have
remote.origin.url set to some wrong value, doing:

  git -c remote.origin.url=right fetch

would not work. It would append "right" to the list, which means we'd
still fetch from "wrong" (since subsequent values are used only as push
urls).

Let's provide a mechanism to reset the list, like we do for other
multi-valued keys (e.g., credential.helper, http.extraheaders, and
merge.suppressDest all use this "empty string means reset" pattern).

Reported-by: Mathew George <mathewegeorge@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King bd1b88dc7a config: document remote.*.url/pushurl interaction
The documentation for these keys gives a very terse definition and
points you to the fetch/push manpages. But from reading those pages it
was not at all obvious to me that:

  - these are keys that can be defined multiple times with meaningful
    behavior (especially remote.*.url)

  - the way that pushurl overrides url (the git-push page does mention
    that "pushurl defaults to url", but it is not immediately clear what
    a multi-valued url would do in that situation).

Let's try to summarize the current behavior.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King b68118d2e8 remote: simplify url/pushurl selection
When we want to know the push urls for a remote, there is some simple
logic:

  - if the user configured any remote.*.pushurl keys, then those make
    the complete set of push urls

  - otherwise we push to all urls in remote.*.url

Many spots implement this with a level of indirection, assigning to a
local url/url_nr pair. But since both arrays are now strvecs, we can
just use a pointer to select the appropriate strvec, shortening the code
a bit.

Even though this is now a one-liner, since it is application logic that
is present in so many places, it's worth abstracting a helper function.
In fact, we already have such a function, but it's local to
builtin/push.c. So we'll just make it available everywhere via remote.h.

There are two spots to pay special attention to here:

  1. in builtin/remote.c's get_url(), we are selecting first based on
     push_mode and then falling back to "url" when we're in push_mode
     but no pushurl is defined. The updated code makes that much more
     clear, compared to the original which had an "else" fall-through.

  2. likewise in that file's set_url(), we _only_ respect push_mode,
     sine the point is that we are adding to pushurl in that case
     (whether it is empty or not). And thus it does not use our helper
     function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King 8e804415fd remote: use strvecs to store remote url/pushurl
Now that the url/pushurl fields of "struct remote" own their strings, we
can switch from bare arrays to strvecs. This has a few advantages:

  - push/clear are now one-liners

  - likewise the free+assigns in alias_all_urls() can use
    strvec_replace()

  - we now use size_t for storage, avoiding possible overflow

  - this will enable some further cleanups in future patches

There's quite a bit of fallout in the code that reads these fields, as
it tends to access these arrays directly. But it's mostly a mechanical
replacement of "url_nr" with "url.nr", and "url[i]" with "url.v[i]",
with a few variations (e.g. "*url" could become "*url.v", but I used
"url.v[0]" for consistency).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:38 -07:00
Jeff King 52595c155a remote: transfer ownership of memory in add_url(), etc
Many of the internal functions in remote.c take const strings and store
them forever in instances of "struct remote". Since the functions are
internal and callers are aware of the convention, this seems to mostly
work and not cause leaks. But there are some issues:

  - it's impossible to clear any of the arrays, because the data
    dependencies between them are too muddled (if you free() a string,
    it might also be referenced from another array, causing a
    user-after-free; but if you don't, that might be the last reference,
    causing a leak).

    This is mostly of interest for further refactoring and features, but
    there's at least one spot that's already a problem. In alias_all_urls(),
    we replace elements of remote->url and remote->pushurl with their
    aliased forms, dropping references to the original.

  - sometimes strings from outside callers make their way in. For
    example, calling remote_get("foo") when there is no configured "foo"
    remote will create a remote struct with the single url "foo". But
    we'll do so by holding on to the string passed to remote_get()
    forever.

    In practice I think this works out because we'd usually pass in a
    string that lasts the length of the program (a string literal, or
    argv reference, or other data structure allocated in the main
    function). But it's a rather subtle requirement.

Instead, let's have remote->url and remote->pushurl own their string
memory. They'll copy the const strings that are passed in, and callers
can stop making their own copies. Likewise, when we overwrite an entry,
we can free the memory it points to, fixing the leak mentioned above.

We'll leave the struct members as "const" since they are visible to the
outside world, and shouldn't usually be touched. This requires casting
on free() for now, but we'll clean that further in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:37 -07:00
Jeff King aa0595fbd6 remote: refactor alias_url() memory ownership
The alias_url() function may return either a newly allocated string
(which the caller must take ownership of), or the original const "url"
parameter that was passed in.

This often works OK because callers are generally passing in a "url"
that they expect to retain ownership of anyway. So whether we got back
the original or a new string, we're always interested in storing it
forever. But I suspect there are some possible leaks here (e.g.,
add_url_alias() may end up discarding the original "url").

Whether there are active leaks or not, this is a confusing setup that
makes further refactoring of memory ownership harder. So instead of
returning the original string, return NULL, forcing callers to decide
what to do with it explicitly. We can then build further cleanups on top
of that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:37 -07:00
Jeff King 0295ce7cbf archive: fix check for missing url
Running "git archive --remote" checks that we have at least one url for
the remote. It does so by looking at remote.url[0], but that won't work;
if we have no url at all, then remote.url will be NULL, and we'll
segfault.

Check url_nr instead, which is a more direct way of asking what we
want.

You can trigger the segfault like this:

  git -c remote.foo.vcs=bar archive --remote=foo

but I didn't bother adding a test. This is the tip of the iceberg for
no-url remotes, and a later patch will improve that situation. I just
wanted to clean up this bug so it didn't make further refactoring of
this code more confusing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:34:37 -07:00
Patrick Steinhardt 028bb23a61 BreakingChanges: document that we do not plan to deprecate git-checkout
The git-checkout(1) command is seen by many as hard to understand
because it connects two somewhat unrelated features: switching between
branches and restoring worktree files from arbitrary revisions. In 2019,
we thus implemented two new commands git-switch(1) and git-restore(1) to
split out these separate concerns into standalone functions.

This "replacement" of git-checkout(1) has repeatedly triggered concerns
for our userbase that git-checkout(1) will eventually go away. This is
not the case though: the use of that command is still widespread, and it
is not expected that this will change anytime soon.

Document that all three commands will remain for the foreseeable future.
This decision may be revisited in case we ever figure out that most
everyone has given up on any of the commands.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:08:52 -07:00
Patrick Steinhardt fcf0f4801d BreakingChanges: document removal of grafting
The grafting mechanism for objects has been deprecated in e650d0643b
(docs: mark info/grafts as outdated, 2014-03-05), which is more than a
decade ago. The mechanism can lead to hard-to-debug issues and has a
superior replacement with replace refs.

Follow through with the deprecation and mark grafts for removal in Git
3.0.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:08:52 -07:00
Patrick Steinhardt 6ccf041d1d BreakingChanges: document upcoming change from "sha1" to "sha256"
Starting with 8e42eb0e9a (doc: sha256 is no longer experimental,
2023-07-31), the "sha256" object format is no longer considered to be
experimental. Furthermore, the SHA-1 hash function is actively
recommended against by for example NIST and FIPS 140-2, and attacks
against it are becoming more practical both due to new weaknesses
(SHAppening, SHAttered, Shambles) and due to the ever-increasing
computing power. It is only a matter of time before it can be considered
to be broken completely.

Let's plan for this event by being active instead of waiting for it to
happend and announce that the default object format is going to change
from "sha1" to "sha256" with Git 3.0.

All major Git implementations (libgit2, JGit, go-git) support the
"sha256" object format and are thus prepared for this change. The most
important missing piece in the puzzle is support in forges. But while
GitLab recently gained experimental support for the "sha256" object
format though, to the best of my knowledge GitHub doesn't support it
yet. Ideally, announcing this upcoming change will encourage forges to
start building that support.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:08:52 -07:00
Patrick Steinhardt 57ec9254eb docs: introduce document to announce breaking changes
Over time, Git has grown quite a lot. With this evolution, many ideas
that were sensible at the time they were introduced are not anymore and
are thus considered to be deprecated. And while some deprecations may be
noted in manpages, most of them are actually deprecated in the "hive
mind" of the Git community, only.

Introduce a new document that tracks such breaking changes, but also
deprecations which we are not willing to go through with, to address
this issue. This document serves multiple purposes:

  - It is a way to facilitate discussion around proposed deprecations.

  - It allows users to learn about deprecations and speak up in case
    they have good reasons why a certain feature should not be
    deprecated.

  - It states intent and documents where the Git project wants to go,
    both in the case where we want to deprecate, but also in the case
    where we don't want to deprecate a specific feature.

The document is _not_ intended to cast every single discussion into
stone. It is supposed to be a living document that may change over time
when there are good reasons for it to change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 09:08:52 -07:00
Junio C Hamano 10aa7c74a2 Merge branch 'gt/unit-test-oidtree' into ps/use-the-repository
* gt/unit-test-oidtree:
  t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c
2024-06-13 09:39:46 -07:00
Junio C Hamano 092b33da2b Merge branch 'ps/ref-storage-migration' into ps/use-the-repository
* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-13 09:39:08 -07:00
Derrick Stolee f1160393c1 commit-graph: increment progress indicator
This fixes a bug that was introduced by 368d19b0b7 (commit-graph:
refactor compute_topological_levels(), 2023-03-20): Previously, the
progress indicator was updated from `i + 1` where `i` is the loop
variable of the enclosing `for` loop. After this patch, the update used
`info->progress_cnt + 1` instead, however, unlike `i`, the
`progress_cnt` attribute was not incremented. Let's increment it.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: squashed in a test update from Patrick Steinhardt]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-12 13:52:14 -07:00
Junio C Hamano d63586cb31 The thirteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-12 13:37:18 -07:00
Junio C Hamano 2a061a62e2 Merge branch 'gt/decorate-unit-test'
A test helper that essentially is unit tests on the "decorate"
logic has been rewritten using the unit-tests framework.

* gt/decorate-unit-test:
  t/: migrate helper/test-example-decorate to the unit testing framework
2024-06-12 13:37:18 -07:00