Commit Graph

70 Commits (seen)

Author SHA1 Message Date
Junio C Hamano bb6aff8c20 Merge branch 'ps/object-file-wo-the-repository' into seen
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.

* ps/object-file-wo-the-repository:
  object-file: get rid of `the_repository` in index-related functions
  object-file: get rid of `the_repository` in `force_object_loose()`
  object-file: get rid of `the_repository` in `read_loose_object()`
  object-file: get rid of `the_repository` in loose object iterators
  object-file: remove declaration for `for_each_file_in_obj_subdir()`
  object-file: inline `for_each_loose_file_in_objdir_buf()`
  object-file: get rid of `the_repository` when writing objects
  odb: introduce `odb_write_object()`
  loose: write loose objects map via their source
  object-file: get rid of `the_repository` in `finalize_object_file()`
  object-file: get rid of `the_repository` in `loose_object_info()`
  object-file: get rid of `the_repository` when freshening objects
  object-file: inline `check_and_freshen()` functions
  object-file: get rid of `the_repository` in `has_loose_object()`
  object-file: stop using `the_hash_algo`
  object-file: fix -Wsign-compare warnings
2025-07-23 12:52:04 -07:00
Junio C Hamano d393f7769d Merge branch 'ps/object-store-midx' into jch
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository.  A
midx does span mulitple "object sources".

Comments?

* ps/object-store-midx:
  midx: remove now-unused linked list of multi-pack indices
  packfile: stop using linked MIDX list in `get_all_packs()`
  packfile: stop using linked MIDX list in `find_pack_entry()`
  packfile: refactor `get_multi_pack_index()` to work on sources
  midx: stop using linked list when closing MIDX
  packfile: refactor `prepare_packed_git_one()` to work on sources
  midx: start tracking per object database source
2025-07-22 18:28:13 -07:00
Junio C Hamano b5cbb2f1cc Merge branch 'tb/prepare-midx-pack-cleanup' into jch
Improvement on Multi-pack-index API.

* tb/prepare-midx-pack-cleanup:
  midx: return a `packed_git` pointer from `prepare_midx_pack()`
  midx-write.c: extract inner loop from fill_packs_from_midx()
  midx-write.c: guard against incremental MIDXs in want_included_pack()
  midx: access pack names through `nth_midxed_pack_name()`
2025-07-22 18:28:12 -07:00
Patrick Steinhardt cbb388f3e5 object-file: get rid of `the_repository` in `finalize_object_file()`
We implicitly depend on `the_repository` when moving an object file into
place in `finalize_object_file()`. Get rid of this global dependency by
passing in a repository.

Note that one might be pressed to inject an object database instead of a
repository. But the function doesn't really care about the ODB at all.
All it does is to move a file into place while checking whether there is
any collision. As such, the functionality it provides is independent of
the object database and only needs the repository as parameter so that
it can adjust permissions of the file we are about to finalize.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:14 -07:00
Patrick Steinhardt 736bb725eb packfile: refactor `get_multi_pack_index()` to work on sources
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.

Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15 12:07:29 -07:00
Patrick Steinhardt bd52ea343d odb: get rid of `the_repository` in `find_odb()`
Get rid of our dependency on `the_repository` in `find_odb()` by passing
in the object database in which we want to search for the source and
adjusting all callers.

Rename the function to `odb_find_source()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:35 -07:00
Taylor Blau 00fd49db6f midx: return a `packed_git` pointer from `prepare_midx_pack()`
The prepare_midx_pack() function is designed to convert a MIDX-specific
pack_int_id for a given pack into a pointer into an actual `packed_git`
structure.

In general, these calls look something like:

    struct packed_git *p;
    if (prepare_midx_pack(the_repository, some_midx, some_pack_int_id))
        die("could not load pack xyz");
    p = some_midx->packs[some_pack_int_id];

, and in a pre-incremental MIDX world, this pattern works well. However,
in a post-incremental MIDX world, this pattern is a little more prone to
errors.

These errors can happen when the given 'pack_int_id' is not usable as an
index into the 'm->packs' array. And this happens in all layers but the
bottom-most one in an incremental MIDX chain. Each layer stores only the
packs that are local to that layer of the chain, and offsets them by the
total number of packs in the base MIDX(s).

But there is other awkwardness here. Thinking back to the above snippet,
suppose that the pack with ID 'some_pack_int_id' is in a layer in the
middle of the MIDX chain. Then it is still invalid to do:

    p = some_midx->packs[some_pack_int_id - some_midx->num_packs_in_base];

, becuase the top-most layer (here 'some_midx') may not even have that
pack! So we would have to chase down the '->base_midx' pointer in order
to get the correct result. midx.c has a helper to do this (called
'midx_for_pack()'), but it is meant only for internal use.

That means that a full, working version of the above adjusted to handle
incremental MIDXs would look something like:

    struct packed_git *p;
    if (prepare_midx_pack(the_repository, some_midx, some_pack_int_id))
        die("could not load pack xyz");

    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

, which is far too verbose to access a single pack by its pack_int_id in
a MIDX chain.

Let's instead have prepare_midx_pack() return a pointer to the
packed_git structure itself, hiding the above as an implementation
detail of prepare_midx_pack(). This patch turns the above snippet into:

    struct packed_git *p = prepare_midx_pack(the_repository, some_midx,
                                             some_pack_int_id);
    if (!p)
        die("could not load pack xyz");

making it far easier and less error-prone to access packs by their
pack_int_id in a MIDX chain.

(In the future, we may want to consider similar treatment for, e.g., the
pack_names array. Likewise, it might make sense to rename the "packs"
member of the MIDX structure to suggest that it shouldn't be accessed
directly outside of midx.c.)

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-29 12:53:47 -07:00
Taylor Blau 5dff093ec0 midx-write.c: extract inner loop from fill_packs_from_midx()
The function fill_packs_from_midx() does relatively little, but ends up
in a doubly-nested loop because we're enumerating each pack within each
layer of the incremental MIDX chain.

Let's de-dent the inner loop of fill_packs_from_midx() by extracting its
contents into a separate function, and calling that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-29 12:53:47 -07:00
Taylor Blau 76ec9c2361 midx-write.c: guard against incremental MIDXs in want_included_pack()
The function want_included_pack() is used to determine whether or not a
the packfile corresponding to some given pack_int_id should be included
in a 'git multi-pack-index repack' operation.

This spot looks like it would be broken, particularly in:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;
    p = m->packs[pack_int_id];

, when pack_int_id is greater than m->num_pack_in_base (supposing that
m->num_packs_in_base is non-zero, or equivalently that m->base_midx is
non-NULL).

Suppose we have two MIDXs in an incremental MIDX chain, each having two
packs:

  - M0 = {packs: [P0, P1], base_midx: NULL}
  - M1 = {packs: [P2, P3], base_midx: M0}

noting that each pack is identified by its global pack_int_id within the
chain.

So if you do something like:

    want_included_pack(the_repository, M1, pack_kept_objects, 2);

that function will call prepare_midx_pack(), which is smart enough to
realize that the pack of interest is in the current layer (M1), and
knows how to adjust its global pack_int_id into an index into the
current layer's 'packs' array.

But the following line:

    p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */

looks broken, since each layer of the MIDX only maintains an array of
the packs stored within that layer, and 'm' wasn't adjusted back to
point at M1->base_midx (M0).

The right thing to do would be:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;

    /* open-code midx.c::midx_for_pack(), which is private */
    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

But that would be overkill, since this function never deals with
incremental MIDXs having more than one layer! To see why, observe that
want_included_pack() has two callers:

  - midx-write.c::fill_included_packs_all()
  - midx-write.c::fill_included_packs_batch()

and those functions' sole caller is in midx-write.c::midx_repack(),
which dispatches a call to one or the other depending on whether or not
the batch_size is non-zero.

And at the very top of midx_repack(), we have a guard against
non-trivial incremental MIDX chains:

    if (m->base_midx)
        die(_("cannot repack an incremental multi-pack-index"));

So want_included_pack() is OK because it will never encounter a
situation where it has to chase backwards through the '->base_midx'
pointer. But that is not immediately clear from reading the code, and is
too fragile for my comfort. Make this more clear by adding an ASSERT()
to the above effect.

Apply the same treatment to each of the fill_included_packs-related
functions as well, since those are deceptively OK by the same reasoning.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-29 12:53:47 -07:00
Phillip Wood 3aa98a61da midx: avoid negative array index
nth_midxed_pack_int_id() returns the index of the pack file in the multi
pack index's list of packfiles that the specified object. The index is
returned as a uint32_t. Storing this in an int will make the index
negative if the most significant bit is set. Fix this by using uint32_t
as the rest of the code does. This is unlikely to be a practical problem
as it requires the multipack index to reference 2^31 packfiles.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22 14:48:37 -07:00
Phillip Wood f874c0ed90 midx repack: avoid potential integer overflow on 64 bit systems
On a 64 bit system the calculation

    p->pack_size * pack_info[i].referenced_objects

could overflow. If a pack file contains 2^28 objects with an average
compressed size of 1KB then the pack size will be 2^38B. If all of the
objects are referenced by the multi-pack index the sum above will
overflow. Avoid this by using shifted integer arithmetic and changing
the order of the calculation so that the pack size is divided by the
total number of objects in the pack before multiplying by the number of
objects referenced by the multi-pack index. Using a shift of 14 bits
should give reasonable accuracy while avoiding overflow for pack sizes
less that 1PB.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22 14:48:36 -07:00
Phillip Wood b103881d4f midx repack: avoid integer overflow on 32 bit systems
On a 32 bit system "git multi-pack-index --repack --batch-size=120M"
failed with

    fatal: size_t overflow: 6038786 * 1289

The calculation to estimated size of the objects in the pack referenced
by the multi-pack-index uses st_mult() to multiply the pack size by the
number of referenced objects before dividing by the total number of
objects in the pack. As size_t is 32 bits on 32 bit systems this
calculation easily overflows. Fix this by using 64bit arithmetic instead.

Also fix a potential overflow when caluculating the total size of the
objects referenced by the multipack index with a batch size larger
than SIZE_MAX / 2. In that case

    total_size += estimated_size

can overflow as both total_size and estimated_size can be greater that
SIZE_MAX / 2. This is addressed by using saturating arithmetic for the
addition. Although estimated_size is of type uint64_t by the time we
reach this sum it is bounded by the batch size which is of type size_t
and so casting estimated_size to size_t does not truncate the value.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-22 14:48:36 -07:00
Junio C Hamano 36d8035d27 Merge branch 'ps/object-file-cleanup'
Code clean-up.

* ps/object-file-cleanup:
  object-store: merge "object-store-ll.h" and "object-store.h"
  object-store: remove global array of cached objects
  object: split out functions relating to object store subsystem
  object-file: drop `index_blob_stream()`
  object-file: split up concerns of `HASH_*` flags
  object-file: split out functions relating to object store subsystem
  object-file: move `xmmap()` into "wrapper.c"
  object-file: move `git_open_cloexec()` to "compat/open.c"
  object-file: move `safe_create_leading_directories()` into "path.c"
  object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-24 17:25:33 -07:00
Junio C Hamano ee847e0034 Merge branch 'ps/object-wo-the-repository'
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.

* ps/object-wo-the-repository:
  hash: stop depending on `the_repository` in `null_oid()`
  hash: fix "-Wsign-compare" warnings
  object-file: split out logic regarding hash algorithms
  delta-islands: stop depending on `the_repository`
  object-file-convert: stop depending on `the_repository`
  pack-bitmap-write: stop depending on `the_repository`
  pack-revindex: stop depending on `the_repository`
  pack-check: stop depending on `the_repository`
  environment: move access to "core.bigFileThreshold" into repo settings
  pack-write: stop depending on `the_repository` and `the_hash_algo`
  object: stop depending on `the_repository`
  csum-file: stop depending on `the_repository`
2025-04-15 13:50:15 -07:00
Patrick Steinhardt 1a99fe8010 object-file: move `safe_create_leading_directories()` into "path.c"
The `safe_create_leading_directories()` function and its relatives are
located in "object-file.c", which is not a good fit as they provide
generic functionality not related to objects at all. Move them into
"path.c", which already hosts `safe_create_dir()` and its relative
`safe_create_dir_in_gitdir()`.

"path.c" is free of `the_repository`, but the moved functions depend on
`the_repository` to read the "core.sharedRepository" config. Adapt the
function signature to accept a repository as argument to fix the issue
and adjust callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:35 -07:00
Taylor Blau 27afc272c4 midx: implement writing incremental MIDX bitmaps
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.

The details for doing so are mostly straightforward. The main changes
are as follows:

  - find_object_pos() now makes use of an extra MIDX parameter which is
    used to locate the bit positions of objects which are from previous
    layers (and thus do not exist in the current layer's pack_order
    field).

    (Note also that the pack_order field is moved into struct
    write_midx_context to further simplify the callers for
    write_midx_bitmap()).

  - bitmap_writer_build_type_index() first determines how many objects
    precede the current bitmap layer and offsets the bits it sets in
    each respective type-level bitmap by that amount so they can be OR'd
    together.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 04:34:16 -07:00
Patrick Steinhardt 2582846f2f pack-write: stop depending on `the_repository` and `the_hash_algo`
There are a couple of functions in "pack-write.c" that implicitly depend
on `the_repository` or `the_hash_algo`. Remove this dependency by
injecting the repository via a parameter and adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:18 -07:00
Patrick Steinhardt 74d414c9f1 object: stop depending on `the_repository`
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:18 -07:00
Patrick Steinhardt 228457c9d9 csum-file: stop depending on `the_repository`
There are multiple sites in "csum-file.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`.

Refactor the code to stop using `the_repository` by adapting functions
to receive required data as parameters. Adapt callsites accordingly by
either using `the_repository->hash_algo`, or by using a context-provided
hash algorithm in case the subsystem already got rid of its dependency
on `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10 13:16:18 -07:00
Patrick Steinhardt 028f618658 path: adjust last remaining users of `the_repository`
With the preceding refactorings we now only have a couple of implicit
users of `the_repository` left in the "path" subsystem, all of which
depend on global state via `calc_shared_perm()`. Make the dependency on
`the_repository` explicit by passing the repo as a parameter instead and
adjust callers accordingly.

Note that this change bubbles up into a couple of subsystems that were
previously declared as free from `the_repository`. Instead of marking
all of them as `the_repository`-dependent again, we instead use the
repository that is available in the calling context. There are three
exceptions though with "copy.c", "pack-write.c" and "tempfile.c".
Adjusting these would require us to adapt callsites all over the place,
so this is left for a future iteration.

Mark "path.c" as free from `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28 13:54:11 -08:00
Junio C Hamano b83a2f9006 Merge branch 'kn/pack-write-with-reduced-globals'
Code clean-up.

* kn/pack-write-with-reduced-globals:
  pack-write: pass hash_algo to internal functions
  pack-write: pass hash_algo to `write_rev_file()`
  pack-write: pass hash_algo to `write_idx_file()`
  pack-write: pass repository to `index_pack_lockfile()`
  pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-02-03 10:23:34 -08:00
Karthik Nayak 6b2aa7fd37 pack-write: pass hash_algo to `write_rev_file()`
The `write_rev_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above. Also modify children functions
`write_rev_file_order()` and `write_rev_header()` to accept
'the_hash_algo'.

Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.

However, in `midx-write.c`, since all usage of global variables is
removed, don't reintroduce them and instead use the `repo` available in
the context.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21 12:36:34 -08:00
Junio C Hamano 4156b6a741 Merge branch 'ps/build-sign-compare'
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23 09:32:11 -08:00
Patrick Steinhardt 1f7e6478dc progress: stop using `the_repository`
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.

Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 10:44:30 -08:00
Junio C Hamano 913a1e157c Merge branch 'ps/build-sign-compare' into ps/the-repository
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-18 10:43:16 -08:00
Patrick Steinhardt 41f43b8243 global: mark code units that generate warnings with `-Wsign-compare`
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06 20:20:02 +09:00
Karthik Nayak f59de71cf7 midx: pass down `hash_algo` to functions using global variables
The functions `get_split_midx_filename_ext()`, `get_midx_filename()` and
`get_midx_filename_ext()` use `hash_to_hex()` which internally uses the
`the_hash_algo` global variable.

Remove this dependency on global variables by passing down the
`hash_algo` through to the functions mentioned and instead calling
`hash_to_hex_algop()` along with the obtained `hash_algo`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:21 +09:00
Karthik Nayak 2fed09aa9b midx-write: pass down repository to `write_midx_file[_only]`
In a previous commit, we passed the repository field to all
subcommands in the `builtin/` directory. Utilize this to pass the
repository field down to the `write_midx_file[_only]` functions to
remove the usage of `the_repository` global variables.

With this, all usage of global variables in `midx-write.c` is removed,
hence, remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:20 +09:00
Karthik Nayak dfa7c68245 write-midx: add repository field to `write_midx_context`
The struct `write_midx_context` is used to pass context for creating
MIDX files. Add the repository field here to ensure that most functions
within `midx-write.c` have access to the field and can use that instead
of the global `the_repository` variable.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:20 +09:00
Karthik Nayak 20df8141f5 midx-write: use `revs->repo` inside `read_refs_snapshot`
The function `read_refs_snapshot()` uses `parse_oid_hex()`, which relies
on the global `the_hash_algo` variable. Let's instead use
`parse_oid_hex_algop()` and provide the hash algo via `revs->repo`.

Also, while here, fix a missing newline after the function's definition.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:20 +09:00
Karthik Nayak 3687a4b3e1 midx-write: pass down repository to static functions
In 'midx-write.c' there are a lot of static functions which use global
variables `the_repository` or `the_hash_algo`. In a follow up commit,
the repository variable will be added to `write_midx_context`, which
some of the functions can use. But for functions which do not have
access to this struct, pass down the required information from
non-static functions `write_midx_file` and `write_midx_file_only`.

This requires that the function `hash_to_hex` is also replaced with
`hash_to_hex_algop` since the former internally accesses the
`the_hash_algo` global variable.

This ensures that the usage of global variables is limited to these
non-static functions, which will be cleaned up in a follow up commit.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 10:32:20 +09:00
Karthik Nayak 2cf3fe63f6 packfile: add repository to struct `packed_git`
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.

We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently anyways use 'the_repository', we should be
OK with this change.

We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callee to pass the repository object down the
levels.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 08:21:53 +09:00
Patrick Steinhardt 2f0ee051dd pack-write: fix return parameter of `write_rev_file_order()`
While the return parameter of `write_rev_file_order()` is a string
constant, the function may indeed return an allocated string when its
first parameter is a `NULL` pointer. This makes for a confusing calling
convention, where callers need to be aware of these intricate ownership
rules and cast away the constness to free the string in some cases.

Adapt the function and its caller `write_rev_file()` to always return an
allocated string and adapt callers to always free the return value.

Note that this requires us to also adapt `rename_tmp_packfile()`, which
compares the pointers to packfile data with each other. Now that the
path of the reverse index file gets allocated unconditionally the check
will always fail. This is fixed by using strcmp(3P) instead, which also
feels way less fragile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:08 -07:00
Patrick Steinhardt 9d4855eef3 midx-write: fix leaking buffer
The buffer used to compute the final MIDX name is never released. Plug
this memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30 11:23:07 -07:00
Junio C Hamano 8c1c63d525 Merge branch 'ps/leakfixes-part-5'
Even more leak fixes.

* ps/leakfixes-part-5:
  transport: fix leaking negotiation tips
  transport: fix leaking arguments when fetching from bundle
  builtin/fetch: fix leaking transaction with `--atomic`
  remote: fix leaking peer ref when expanding refmap
  remote: fix leaks when matching refspecs
  remote: fix leaking config strings
  builtin/fetch-pack: fix leaking refs
  sideband: fix leaks when configuring sideband colors
  builtin/send-pack: fix leaking refspecs
  transport: fix leaking OID arrays in git:// transport data
  t/helper: fix leaking multi-pack-indices in "read-midx"
  builtin/repack: fix leaks when computing packs to repack
  midx-write: fix leaking hashfile on error cases
  builtin/archive: fix leaking `OPT_FILENAME()` value
  builtin/upload-archive: fix leaking args passed to `write_archive()`
  builtin/merge-tree: fix leaking `-X` strategy options
  pretty: fix leaking key/value separator buffer
  pretty: fix memory leaks when parsing pretty formats
  convert: fix leaks when resetting attributes
  mailinfo: fix leaking header data
2024-09-03 09:15:00 -07:00
Junio C Hamano 1f4d89dfce Merge branch 'tb/pseudo-merge-bitmap-fixes'
We created a useless pseudo-merge reachability bitmap that is about
0 commits, and attempted to include commits that are not in packs,
which made no sense.  These bugs have been corrected.

* tb/pseudo-merge-bitmap-fixes:
  pseudo-merge.c: ensure pseudo-merge groups are closed
  pseudo-merge.c: do not generate empty pseudo-merge commits
  t/t5333-pseudo-merge-bitmaps.sh: demonstrate empty pseudo-merge groups
  pack-bitmap-write.c: select pseudo-merges even for small bitmaps
  pack-bitmap: drop redundant args from `bitmap_writer_finish()`
  pack-bitmap: drop redundant args from `bitmap_writer_build()`
  pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()`
  pack-bitmap: initialize `bitmap_writer_init()` with packing_data
2024-08-26 11:32:21 -07:00
Patrick Steinhardt 8a7846383e midx-write: fix leaking hashfile on error cases
When writing the MIDX file we first create the `struct hashfile` used to
write the trailer hash, and then afterwards we verify whether we can
actually write the MIDX in the first place. When we decide that we
can't, this leads to a memory leak because we never free the hash file
contents.

We could fix this by freeing the hashfile on the exit path. There is a
better option though: we can simply move the checks for the error
condition earlier. As there is no early exit between creating the
hashfile and finalizing it anymore this is sufficient to fix the memory
leak.

While at it, also move around the block checking for `ctx.entries_nr`.
This change is not required to fix the memory leak, but it feels natural
to move together all massaging of parameters before we go with them and
execute the actual logic.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 09:18:04 -07:00
Junio C Hamano b9497848df Merge branch 'tb/incremental-midx-part-1'
Incremental updates of multi-pack index files.

* tb/incremental-midx-part-1:
  midx: implement support for writing incremental MIDX chains
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  midx: implement verification support for incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: add new fields for incremental MIDX chains
  Documentation: describe incremental MIDX format
2024-08-19 11:07:37 -07:00
Taylor Blau 11a08e8332 pack-bitmap: drop redundant args from `bitmap_writer_finish()`
In a similar fashion as the previous commit, drop a redundant argument
from the `bitmap_writer_finish()` function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15 11:23:15 -07:00
Taylor Blau f00dda4849 pack-bitmap: drop redundant args from `bitmap_writer_build()`
In a similar fashion as the previous commit, drop a redundant argument
from the `bitmap_writer_build()` function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15 11:22:27 -07:00
Taylor Blau 125ee4ae80 pack-bitmap: drop redundant args from `bitmap_writer_build_type_index()`
The previous commit ensures that the bitmap_writer's "to_pack" field is
initialized early on, so the "to_pack" and "index_nr" arguments to
`bitmap_writer_build_type_index()` are redundant.

Drop them and adjust the callers accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15 11:20:24 -07:00
Taylor Blau 01e9d12939 pack-bitmap: initialize `bitmap_writer_init()` with packing_data
In order to determine its object order, the pack-bitmap machinery keeps
a 'struct packing_data' corresponding to the pack or pseudo-pack (when
writing a MIDX bitmap) being written.

The to_pack field is provided to the bitmap machinery by callers of
bitmap_writer_build() and assigned to the bitmap_writer struct at that
point.

But a subsequent commit will want to have access to that data earlier on
during commit selection. Prepare for that by adding a 'to_pack' argument
to 'bitmap_writer_init()', and initializing the field during that
function.

Subsequent commits will clean up other functions which take
now-redundant arguments (like nr_objects, which is equivalent to
pdata->objects_nr, or pdata itself).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-15 11:18:04 -07:00
John Cai e8207717f1 refs: add referent to each_ref_fn
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09 08:47:34 -07:00
Taylor Blau fcb2205b77 midx: implement support for writing incremental MIDX chains
Now that the rest of the MIDX subsystem and relevant callers have been
updated to learn about how to read and process incremental MIDX chains,
let's finally update the implementation in `write_midx_internal()` to be
able to write incremental MIDX chains.

This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:

    $ git multi-pack-index write --incremental

The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:

  - The `compute_sorted_entries()` function is taught to reject objects
    which appear in any existing MIDX layer.

  - Functions like `write_midx_revindex()` are adjusted to write
    pack_order values which are offset by the number of objects in the
    base MIDX layer.

  - The end of `write_midx_internal()` is adjusted to move
    non-incremental MIDX files when necessary (i.e. when creating an
    incremental chain with an existing non-incremental MIDX in the
    repository).

There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).

The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:

  1. Most of the "interesting" behavior is already thoroughly covered in
     t5319-multi-pack-index.sh, which handles the core logic of reading
     objects through a MIDX.

     The new tests in t5334-incremental-multi-pack-index.sh are mostly
     focused on creating and destroying incremental MIDXs, as well as
     stitching their results together across layers.

  2. A new GIT_TEST environment variable is added called
     "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
     entire test suite to write incremental MIDXs after repacking when
     combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.

     This exercises the long tail of other interesting behavior that is
     defined implicitly throughout the rest of the CI suite. It is
     likewise added to the linux-TEST-vars job.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06 12:01:39 -07:00
Taylor Blau 97fd770ea1 midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
The function `midx_fanout_add_midx_fanout()` is used to help construct
the fanout table when generating a MIDX by reusing data from an existing
MIDX.

Prepare this function to work with incremental MIDXs by making a few
changes:

  - The bounds checks need to be adjusted to start object lookups taking
    into account the number of objects in the previous MIDX layer (i.e.,
    by starting the lookups at position `m->num_objects_in_base` instead
    of position 0).

  - Likewise, the bounds checks need to end at `m->num_objects_in_base`
    objects after `m->num_objects`.

  - Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier
    MIDX layers when dealing with an incremental MIDX chain by calling
    itself when given a MIDX with a non-NULL `base_midx`.

Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
`packs_to_include`, 2024-06-11), we do not use this function with an
existing MIDX (incremental or not) when generating a MIDX with
--stdin-packs, and likewise for incremental MIDXs.

But it is still used when adding the fanout table from an incremental
MIDX when generating a non-incremental MIDX (without --stdin-packs, of
course).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06 12:01:38 -07:00
Junio C Hamano ec9d46588e Merge branch 'ds/midx-write-repack-fix'
Repacking a repository with multi-pack index started making stupid
pack selections in Git 2.45, which has been corrected.

* ds/midx-write-repack-fix:
  midx-write: revert use of --stdin-packs
  t5319: add failing test case for repack/expire
2024-07-23 16:54:33 -07:00
Derrick Stolee 8fb6d11fad midx-write: revert use of --stdin-packs
This reverts b7d6f23a17 (midx-write.c: use `--stdin-packs` when
repacking, 2024-04-01) and then marks the test created in the previous
change as passing.

The fundamental issue with the reverted change is that the focus on
pack-files separates the object selection from how the multi-pack-index
selects a single pack-file for an object ID with multiple copies among
the tracked pack-files.

The change was made with the intention of improving delta compression in
the resulting pack-file, but that can be resolved with the existing
object list mechanism. There are other potential pitfalls of doing an
object walk at this time if the repository is a blobless partial clone,
and that will require additional testing on top of the one that changes
here.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-19 07:19:01 -07:00
Junio C Hamano 7b472da915 Merge branch 'ps/use-the-repository'
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.

* ps/use-the-repository:
  hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
  t/helper: remove dependency on `the_repository` in "proc-receive"
  t/helper: fix segfault in "oid-array" command without repository
  t/helper: use correct object hash in partial-clone helper
  compat/fsmonitor: fix socket path in networked SHA256 repos
  replace-object: use hash algorithm from passed-in repository
  protocol-caps: use hash algorithm from passed-in repository
  oidset: pass hash algorithm when parsing file
  http-fetch: don't crash when parsing packfile without a repo
  hash-ll: merge with "hash.h"
  refs: avoid include cycle with "repository.h"
  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
  hash: require hash algorithm in `empty_tree_oid_hex()`
  hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
  hash: make `is_null_oid()` independent of `the_repository`
  hash: convert `oidcmp()` and `oideq()` to compare whole hash
  global: ensure that object IDs are always padded
  hash: require hash algorithm in `oidread()` and `oidclr()`
  hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
  hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-07-02 09:59:00 -07:00
Junio C Hamano ffa47b75cf Merge branch 'tb/pseudo-merge-reachability-bitmap'
The pseudo-merge reachability bitmap to help more efficient storage
of the reachability bitmap in a repository with too many refs has
been added.

* tb/pseudo-merge-reachability-bitmap: (26 commits)
  pack-bitmap.c: ensure pseudo-merge offset reads are bounded
  Documentation/technical/bitmap-format.txt: add missing position table
  t/perf: implement performance tests for pseudo-merge bitmaps
  pseudo-merge: implement support for finding existing merges
  ewah: `bitmap_equals_ewah()`
  pack-bitmap: extra trace2 information
  pack-bitmap.c: use pseudo-merges during traversal
  t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  pack-bitmap: implement test helpers for pseudo-merge
  ewah: implement `ewah_bitmap_popcount()`
  pseudo-merge: implement support for reading pseudo-merge commits
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: scaffolding for reads
  pack-bitmap: extract `read_bitmap()` function
  pack-bitmap-write.c: write pseudo-merge table
  pseudo-merge: implement support for selecting pseudo-merge commits
  config: introduce `git_config_double()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap-write: support storing pseudo-merge commits
  ...
2024-06-24 16:39:13 -07:00
Patrick Steinhardt e7da938570 global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00