Commit Graph

26 Commits (8d71696686762c57e251f8a191ecad28861de6a7)

Author SHA1 Message Date
Patrick Steinhardt c2b5d1490a object-file: get rid of `the_repository` in `force_object_loose()`
The function `force_object_loose()` forces an object to become a loose
object in case it only exists in its packed form. To do so it implicitly
relies on `the_repository`.

Refactor the function by passing a `struct odb_source` as parameter.
While the check whether any such loose object exists already acts on the
whole object database, writing the loose object happens in one specific
source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:17 -07:00
Patrick Steinhardt 0df005353a object-file: get rid of `the_repository` in `read_loose_object()`
The function `read_loose_object()` takes a path to an object file and
tries to parse it. As such, the function does not depend on any specific
object database but instead acts as an ODB-independent way to read a
specific file. As such, all it needs as input is a repository so that we
can derive repo settings and the hash algorithm.

That repository isn't passed in as a parameter though, as we implicitly
depend on the global `the_repository`. Refactor the function so that we
pass in the repository as a parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:17 -07:00
Patrick Steinhardt d81712ce65 object-file: get rid of `the_repository` in loose object iterators
The iterators for loose objects still rely on `the_repository`. Refactor
them:

  - `for_each_loose_file_in_objdir()` is refactored so that the caller
    is now expected to pass an `odb_source` as parameter instead of the
    path to that source. Furthermore, it is renamed accordingly to
    `for_each_loose_file_in_source()`.

  - `for_each_loose_object()` is refactored to take in an object
    database now and calls the above function in a loop.

This allows us to get rid of the global dependency.

Adjust callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:17 -07:00
Patrick Steinhardt 83439299f1 object-file: remove declaration for `for_each_file_in_obj_subdir()`
The function `for_each_file_in_obj_subdir()` is declared in our headers,
but it is not used anywhere else than in the corresponding code file
itself. Drop the declaration and mark the function as file-local.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:16 -07:00
Patrick Steinhardt f2c40e51b2 object-file: inline `for_each_loose_file_in_objdir_buf()`
The function `for_each_loose_file_in_objdir_buf()` is declared in our
headers, but it is not used anywhere else than in the corresponding code
file itself. Drop the declaration and inline the function into its only
caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:16 -07:00
Patrick Steinhardt e7e952f5c2 object-file: get rid of `the_repository` when writing objects
The logic that writes loose objects still relies on `the_repository` to
decide where exactly the object shall be written to. Refactor it so that
the logic instead operates on a `struct odb_source` so that we can get
rid of this global dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:16 -07:00
Patrick Steinhardt ab1c6e1d12 odb: introduce `odb_write_object()`
We do not have a backend-agnostic way to write objects into an object
database. While there is `write_object_file()`, this function is rather
specific to the loose object format.

Introduce `odb_write_object()` to plug this gap. For now, this function
is a simple wrapper around `write_object_file()` and doesn't even use
the passed-in object database yet. This will change in subsequent
commits, where `write_object_file()` is converted so that it works on
top of an `odb_source`. `odb_write_object()` will then become
responsible for deciding which source an object shall be written to.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:15 -07:00
Patrick Steinhardt cbb388f3e5 object-file: get rid of `the_repository` in `finalize_object_file()`
We implicitly depend on `the_repository` when moving an object file into
place in `finalize_object_file()`. Get rid of this global dependency by
passing in a repository.

Note that one might be pressed to inject an object database instead of a
repository. But the function doesn't really care about the ODB at all.
All it does is to move a file into place while checking whether there is
any collision. As such, the functionality it provides is independent of
the object database and only needs the repository as parameter so that
it can adjust permissions of the file we are about to finalize.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:14 -07:00
Patrick Steinhardt 931e8c9f52 object-file: get rid of `the_repository` in `has_loose_object()`
We implicitly depend on `the_repository` in `has_loose_object()`.
Refactor the function to accept an `odb_source` as input that should be
checked for such a loose object.

This refactoring changes semantics of the function to not check the
whole object database for such a loose object anymore, but instead we
now only check that single source. Existing callers thus need to loop
through all sources manually now.

While this change may seem illogical at first, whether or not an object
exists in a specific format should be answered by the source using that
format. As such, we can eventually convert this into a generic function
`odb_source_has_object()` that simply checks whether a given object
exists in an object source. And as we will know about the format that
any given source uses it allows us to derive whether the object exists
in a given format.

This change also makes `has_loose_object_nonlocal()` obsolete. The only
caller of this function is adapted so that it skips the primary object
source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:13 -07:00
Patrick Steinhardt 18323f5b48 object-file: stop using `the_hash_algo`
There are a couple of users of the `the_hash_algo` macro, which
implicitly depends on `the_repository`. Adapt these callers to not do so
anymore, either by deriving it from already-available context or by
using `the_repository->hash_algo`. The latter variant doesn't yet help
to remove the global dependency, but such users will be adapted in the
following commits to not use `the_repository` anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16 22:16:13 -07:00
Patrick Steinhardt e989dd96b8 odb: rename `oid_object_info()`
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.

Introduce compatibility wrappers so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:37 -07:00
Patrick Steinhardt 8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Patrick Steinhardt a1e2581a1e object-store: rename `object_directory` to `odb_source`
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.

While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.

As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.

Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:

  - `odb_backend` was discarded because it led to the association that
    one object database has a single backend, but the model is that one
    alternate has one backend. Furthermore, "backend" is more about the
    actual backing implementation and less about the high-level concept.

  - `odb_alternate` was discarded because it is a bit of a stretch to
    also call the main object directory an "alternate".

Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.

In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Jeff King 141f8c8c05 object-file: drop support for writing objects with unknown types
Since "hash-object --literally" no longer supports objects with unknown
types, there are now no callers of write_object_file_literally() and its
helpers. Let's drop them to simplify the code.

In particular, this gets rid of some ugly copy-and-paste code from
write_object_file_literally(), which is a parallel implementation of
write_object_file(). When the split was originally made, the two weren't
that long, but commits like 63a6745a07 (object-file: update the loose
object map when writing loose objects, 2023-10-01) ended up having to
duplicate some tricky code.

This patch drops all of that duplication and should make things less
error-prone going forward.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:12 -07:00
Jeff King ae24b032a0 object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
Since cat-file dropped its "--allow-unknown-type" option in the previous
commit, there are no more uses of the internal flag that implemented it.
Let's drop it.

That in turn lets us drop the strbuf parameter of unpack_loose_header(),
which now is always NULL. And without that, we can drop all of the
additional code to inflate larger headers into the strbuf.

Arguably we could drop ULHR_TOO_LONG, as no callers really care about
the distinction from ULHR_BAD. But it's easy enough to retain, and it
does let us produce a slightly more specific message in one instance.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:10 -07:00
Jeff King 53eeed0a81 object-file.h: fix typo in variable declaration
This should be "compat", not "comapt".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:09 -07:00
Patrick Steinhardt 1a793261c5 object-store: move function declarations to their respective subsystems
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 10:08:12 -07:00
Patrick Steinhardt 56ef85e82f object-store: drop `loose_object_path()`
The function `loose_object_path()` is a trivial wrapper around
`odb_loose_path()`, with the only exception that it always uses the
primary object database of the given repository. This doesn't really add
a ton of value though, so let's drop the function and inline it at every
callsite.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29 10:08:12 -07:00
Patrick Steinhardt 70c0f9db4e object-file: split up concerns of `HASH_*` flags
The functions `hash_object_file()`, `write_object_file()` and
`index_fd()` reuse the same set of flags to alter their behaviour. This
not only adds confusion, but given that every function only supports a
subset of the flags it becomes very hard to see which flags can be
passed to what function. Last but not least, this entangles the
implementation of all three function families.

Split up concerns by creating separate flags for each of the function
families.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:36 -07:00
Patrick Steinhardt d9f517d051 object-file: split out functions relating to object store subsystem
While we have the "object-store.h" header, most of the functionality for
object stores is actually hosted in "object-file.c". This makes it hard
to find relevant functions and causes us to mix up concerns.

Split out functions relating to the object store subsystem into a new
"object-store.c" file.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:36 -07:00
Patrick Steinhardt 97dc141fd6 object-file: move `git_open_cloexec()` to "compat/open.c"
The `git_open_cloexec()` wrapper function provides the ability to open a
file with `O_CLOEXEC` in a platform-agnostic way. This function is
provided by "object-file.c" even though it is not specific to the object
subsystem at all.

Move the file into "compat/open.c". This file already exists before this
commit, but has only been compiled conditionally depending on whether or
not open(3p) may return EINTR. With this change we now unconditionally
compile the object, but wrap `git_open_with_retry()` in an ifdef.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:35 -07:00
Patrick Steinhardt 1a99fe8010 object-file: move `safe_create_leading_directories()` into "path.c"
The `safe_create_leading_directories()` function and its relatives are
located in "object-file.c", which is not a good fit as they provide
generic functionality not related to objects at all. Move them into
"path.c", which already hosts `safe_create_dir()` and its relative
`safe_create_dir_in_gitdir()`.

"path.c" is free of `the_repository`, but the moved functions depend on
`the_repository` to read the "core.sharedRepository" config. Adapt the
function signature to accept a repository as argument to fix the issue
and adjust callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:35 -07:00
Patrick Steinhardt d1fa670de0 object-file: move `mkdir_in_gitdir()` into "path.c"
The `mkdir_in_gitdir()` function is similar to `safe_create_dir()`, but
the former is hosted in "object-file.c" whereas the latter is hosted in
"path.c". The latter code unit makes way more sense though as the logic
has nothing to do with object files in particular.

Move the file into "path.c". While at it, we:

  - Rename the function to `safe_create_dir_in_gitdir()` so that the
    function names are similar to one another.

  - Remove the dependency on `the_repository` by making the callers pass
    the repository instead.

Adjust callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 08:24:34 -07:00
Taylor Blau b1b8dfde69 finalize_object_file(): implement collision check
We've had "FIXME!!! Collision check here ?" in finalize_object_file()
since aac1794132 (Improve sha1 object file writing., 2005-05-03). That
is, when we try to write a file with the same name, we assume the
on-disk contents are the same and blindly throw away the new copy.

One of the reasons we never implemented this is because the files it
moves are all named after the cryptographic hash of their contents
(either loose objects, or packs which have their hash in the name these
days). So we are unlikely to see such a collision by accident. And even
though there are weaknesses in sha1, we assume they are mitigated by our
use of sha1dc.

So while it's a theoretical concern now, it hasn't been a priority.
However, if we start using weaker hashes for pack checksums and names,
this will become a practical concern. So in preparation, let's actually
implement a byte-for-byte collision check.

The new check will cause the write of new differing content to be a
failure, rather than a silent noop, and we'll retain the temporary file
on disk. If there's no collision present, we'll clean up the temporary
file as usual after either rename()-ing or link()-ing it into place.

Note that this may cause some extra computation when the files are in
fact identical, but this should happen rarely.

Loose objects are exempt from this check, and the collision check may be
skipped by calling the _flags variant of this function with the
FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:

  - We don't treat the hash of the loose object file's contents as a
    checksum, since the same loose object can be stored using different
    bytes on disk (e.g., when adjusting core.compression, using a
    different version of zlib, etc.).

    This is fundamentally different from cases where
    finalize_object_file() is operating over a file which uses the hash
    value as a checksum of the contents. In other words, a pair of
    identical loose objects can be stored using different bytes on disk,
    and that should not be treated as a collision.

  - We already use the path of the loose object as its hash value /
    object name, so checking for collisions at the content level doesn't
    add anything.

    Adding a content-level collision check would have to happen at a
    higher level than in finalize_object_file(), since (avoiding race
    conditions) writing an object loose which already exists in the
    repository will prevent us from even reaching finalize_object_file()
    via the object freshening code.

    There is a collision check in index-pack via its `check_collision()`
    function, but there isn't an analogous function in unpack-objects,
    which just feeds the result to write_object_file().

    So skipping the collision check here does not change for better or
    worse the hardness of loose object writes.

As a small note related to the latter bullet point above, we must teach
the tmp-objdir routines to similarly skip the content-level collision
checks when calling migrate_one() on a loose object file, which we do by
setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
object shard.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 11:27:47 -07:00
Elijah Newren d1cbe1e6d8 hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Elijah Newren 87bed17907 object-file.h: move declarations for object-file.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00