The function `repo_set_gitdir()` is called in two situations:
- To initialize the repository with its discovered location. As part
of this we also set up the new object database.
- To update the repository's discovered location in case the process
changes its working directory so that we update relative paths. This
means we also have to update any relative paths that are potentially
used in the object database.
In the context of the object database we ideally wouldn't ever have to
worry about the second case: if all paths used by our object database
sources were absolute, then we wouldn't have to update them. But
unfortunately, the paths aren't only used to locate files owned by the
given source, but we also use them for reporting purposes. One such
example is `repo_get_object_directory()`, where we cannot just change
semantics to always return absolute paths, as that is likely to break
tooling out there.
One solution to this would be to have both a "display path" and an
"internal path". This would allow us to use internal paths for all
internal matters, but continue to use the potentially-relative display
paths so that we don't break compatibility. But converting the codebase
to honor this split is quite a messy endeavour, and it wouldn't even
help us with the goal to get rid of the need to update the display path
on chdir(3p).
Another solution would be to rework "setup.c" so that we never have to
update paths in the first place. In that case, we'd only initialize the
repository once we have figured out final locations for all directories.
This would be a significant simplification of that subsystem indeed, but
the current logic is so messy that it would take significant investments
to get there.
Meanwhile though, while object sources may still use relative paths, the
best thing we can do is to handle the reparenting of the object source
paths in the object database itself. This can be done by registering one
callback for each object database so that we get notified whenever the
current working directory changes, and we then perform the reparenting
ourselves.
Ideally, this wouldn't even happen on the object database level, but
instead handled by each object database source. But we don't yet have
proper pluggable object database sources, so this will need to be
handled at a later point in time.
The logic itself is rather simple:
- We register the callback when creating the object database.
- We unregister the callback when releasing it again.
- We split up `set_git_dir_1()` so that it becomes possible to skip
recreating the object database. This is required because the
function is called both when the current working directory changes,
but also when we set up the repository. Calling this function
without skipping creation of the ODB will result in a bug in case
it's already created.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to set up a new object database is currently distributed
across two functions in "repository.c":
- In `initialize_repository()` we initialize an empty object database.
This object database is not fully initialized and doesn't have any
sources attached to it.
- The primary object database source is then created in
`repo_set_gitdir()`.
Ideally though, the logic should be entirely self-contained so that we
can iterate more readily on how exactly the sources themselves get set
up.
Refactor `odb_new()` to handle both allocation and setup of the object
database. This ensures that the object database is always initialized
and ready for use, and it allows us to change how the sources get set up
eventually.
Note that `repo_set_gitdir()` still reaches into the sources when the
function gets called with an already-initialized object database. This
will be fixed in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our object database sources have a field `disable_ref_updates`. This
field can obviously be set to disable reference updates, but it is
somewhat curious that this logic is hosted by the object database.
The reason for this is that it was primarily added to keep us from
accidentally updating references while an ODB transaction is ongoing.
Any objects part of the transaction have not yet been committed to disk,
so new references that point to them might get corrupted in case we
never end up committing the transaction. As such, whenever we create a
new transaction we set up a new temporary ODB source and mark it as
disabling reference updates.
This has one (and only one?) upside: once we have committed the
transaction, the temporary source will be dropped and thus we clean up
the disabled reference updates automatically. But other than that, it's
somewhat misdesigned:
- We can have multiple ODB sources, but only the currently active
source inhibits reference updates.
- We're mixing concerns of the refdb with the ODB.
Arguably, the decision of whether we can update references or not should
be handled by the refdb. But that wouldn't be a great fit either, as
there can be one refdb per worktree. So we'd again have the same problem
that a "global" intent becomes localized to a specific instance.
Instead, move the setting into the repository. While at it, convert it
into a boolean.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_clear()` releases all resources allocated to an object
database and ensures that all fields become zero'd out. Despite its
naming though it doesn't really clear the object database so that it
becomes ready for reuse afterwards again -- the caller would first have
to reinitialize it, and that contradicts the terminology of "clearing"
as we have defined it in our coding guidelines.
There isn't really only a reason to have "clearing" semantics, either.
There's only a single caller of `odb_clear()`, and that caller also ends
up freeing the object database structure itself.
Refactor the function to have "freeing" semantics instead, so that the
structure itself is also freed, which allows us to drop some useless
boilerplate to zero out the structure's members.
This refactoring reveals that we're trying to close the commit graph
multiple times: once directly via `free_commit_graph()`, and once via
`odb_close()`. Drop the former call.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have three different locations where we create a new ODB source.
Deduplicate the logic via a new `odb_source_new()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"core.commentChar=auto" that attempts to dynamically pick a
suitable comment character is non-workable, as it is too much
trouble to support for little benefit, and is marked as deprecated.
* pw/3.0-commentchar-auto-deprecation:
commit: print advice when core.commentString=auto
config: warn on core.commentString=auto
breaking-changes: deprecate support for core.commentString=auto
As support for this setting was deprecated in the last commit print a
warning (or die when WITH_BREAKING_CHANGES is enabled) if it is set.
Avoid bombarding the user with warnings by only printing it (a) when
running commands that call "git commit" and (b) only once per command.
Some scaffolding is added to repo_read_config() to allow it to
detect deprecated config settings and warn about them. As both
"core.commentChar" and "core.commentString" set the comment
character we record which one of them is used and tailor the
warning message appropriately.
Note the odd combination of die_message() followed by die(NULL)
is to allow the next commit to insert a call to advise() in the middle.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object database sources are classified either as:
- Local, which means that the source is the repository's primary
source. This is typically ".git/objects".
- Non-local, which is everything else. Most importantly this includes
alternates and quarantine directories.
This locality is often computed ad-hoc by checking whether a given
object source is the first one. This works, but it is quite roundabout.
Refactor the code so that we store locality when creating the sources in
the first place. This makes it both more accessible and robust.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
The 'extensions.preciousObjects' setting when set true, prevents
operations that might drop objects from the object storage. This setting
is populated in the global variable
'repository_format_precious_objects'.
Move this global variable to repo scope by adding it to 'struct
repository and also refactor all the occurences accordingly.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent commits we'll get rid of our use of `the_repository` in
"odb.c" in favor of explicitly passing in a `struct object_database` or
a `struct odb_source`. In some cases though we'll need access to the
repository, for example to read a config value from it, but we don't
have a way to access the repository owning a specific object database.
Introduce parent pointers for `struct object_database` to its owning
repository as well as for `struct odb_source` to its owning object
database, which will allow us to adapt those use cases.
Note that this change requires us to pass through the object database to
`link_alt_odb_entry()` so that we can set up the parent pointers for any
source there. The callchain is adapted to pass through the object
database accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `raw_object_store` structure is the central entry point for reading
and writing objects in a repository. The main purpose of this structure
is to manage object directories and provide an interface to access and
write objects in those object directories.
Right now, many of the functions associated with the raw object store
implicitly rely on `the_repository` to get access to its `objects`
pointer, which is the `raw_object_store`. As we want to generally get
rid of using `the_repository` across our codebase we will have to
convert this implicit dependency on this global variable into an
explicit parameter.
This conversion can be done by simply passing in an explicit pointer to
a repository and then using its `->objects` pointer. But there is a
second effort underway, which is to make the object subsystem more
selfcontained so that we can eventually have pluggable object backends.
As such, passing in a repository wouldn't make a ton of sense, and the
goal is to convert the object store interfaces such that we always pass
in a reference to the `raw_object_store` instead.
This will expose the `raw_object_store` type to a lot more callers
though, which surfaces that this type is named somewhat awkwardly. The
"raw_" prefix makes readers wonder whether there is a non-raw variant of
the object store, but there isn't. Furthermore, we nowadays want to name
functions in a way that they can be clearly attributed to a specific
subsystem, but calling them e.g. `raw_object_store_has_object()` is just
too unwieldy, even when dropping the "raw_" prefix.
Instead, rename the structure to `object_database`. This term is already
used a lot throughout our codebase, and it cannot easily be mistaken for
"object directories", either. Furthermore, its acronym ODB is already
well-known and works well as part of a function's name, like for example
`odb_has_object()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't provide a way to clear a `struct repo_settings`, and instead
open-code this in `repo_clear()`. This is mixing up concerns and means
that developers have to touch multiple files whenever they add a new
field to the structure in case the associated resources need to be
released.
Provide a new `repo_settings_clear()` function to improve this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "worktree" family of
functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new extension, `relativeWorktrees`, is added to indicate that at least
one worktree in the repository has been linked with relative paths.
This ensures older Git versions do not attempt to automatically prune
worktrees with relative paths, as they would not not recognize the
paths as being valid.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* ps/environ-wo-the-repository: (21 commits)
environment: stop storing "core.notesRef" globally
environment: stop storing "core.warnAmbiguousRefs" globally
environment: stop storing "core.preferSymlinkRefs" globally
environment: stop storing "core.logAllRefUpdates" globally
refs: stop modifying global `log_all_ref_updates` variable
branch: stop modifying `log_all_ref_updates` variable
repo-settings: track defaults close to `struct repo_settings`
repo-settings: split out declarations into a standalone header
environment: guard state depending on a repository
environment: reorder header to split out `the_repository`-free section
environment: move `set_git_dir()` and related into setup layer
environment: make `get_git_namespace()` self-contained
environment: move object database functions into object layer
config: make dependency on repo in `read_early_config()` explicit
config: document `read_early_config()` and `read_very_early_config()`
environment: make `get_git_work_tree()` accept a repository
environment: make `get_graft_file()` accept a repository
environment: make `get_index_file()` accept a repository
environment: make `get_object_directory()` accept a repository
environment: make `get_git_common_dir()` accept a repository
...
The `get_git_work_tree()` function retrieves the path of the work tree
of `the_repository`. Make it accept a `struct repository` such that it
can work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_graft_file()` function retrieves the path to the graft file of
`the_repository`. Make it accept a `struct repository` such that it can
work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_index_file()` function retrieves the path to the index file
of `the_repository`. Make it accept a `struct repository` such that it
can work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_object_directory()` function retrieves the path to the object
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_git_common_dir()` function retrieves the path to the common
directory for `the_repository`. Make it accept a `struct repository`
such that it can work on arbitrary repositories and make it part of the
repository subsystem. This reduces our reliance on `the_repository` and
clarifies scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_git_dir()` function retrieves the path to the Git directory for
`the_repository`. Make it accept a `struct repository` such that it can
work on arbitrary repositories and make it part of the repository
subsystem. This reduces our reliance on `the_repository` and clarifies
scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not clear grafts part of the parsed object pool when clearing the
pool itself, which can lead to memory leaks when a repository is being
cleared.
Fix this by moving `reset_commit_grafts()` into "object.c" and making it
part of the `struct parsed_object_pool` interface such that we can call
it from `parsed_object_pool_clear()`. Adapt `parsed_object_pool_new()`
to take and store a reference to its owning repository, which is needed
by `unparse_commit()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/ref-storage-migration:
builtin/refs: new command to migrate ref storage formats
refs: implement logic to migrate between ref storage formats
refs: implement removal of ref storages
worktree: don't store main worktree twice
reftable: inline `merged_table_release()`
refs/files: fix NULL pointer deref when releasing ref store
refs/files: extract function to iterate through root refs
refs/files: refactor `add_pseudoref_and_head_entries()`
refs: allow to skip creation of reflog entries
refs: pass storage format to `ref_store_init()` explicitly
refs: convert ref storage format to an enum
setup: unset ref storage when reinitializing repository version
The ref storage format is tracked as a simple unsigned integer, which
makes it harder than necessary to discover what that integer actually is
or where its values are defined.
Convert the ref storage format to instead be an enum.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The base topic started to make it an error for a command to leave
the hash algorithm unspecified, which revealed a few commands that
were not ready for the change. Give users a knob to revert back to
the "default is sha-1" behaviour as an escape hatch, and start
fixing these breakages.
* jc/undecided-is-not-necessarily-sha1-fix:
apply: fix uninitialized hash function
builtin/hash-object: fix uninitialized hash function
builtin/patch-id: fix uninitialized hash function
t1517: test commands that are designed to be run outside repository
setup: add an escape hatch for "no more default hash algorithm" change
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.
* ps/refs-without-the-repository-updates:
refs/packed: remove references to `the_hash_algo`
refs/files: remove references to `the_hash_algo`
refs/files: use correct repository
refs: remove `dwim_log()`
refs: drop `git_default_branch_name()`
refs: pass repo when peeling objects
refs: move object peeling into "object.c"
refs: pass ref store when detecting dangling symrefs
refs: convert iteration over replace refs to accept ref store
refs: retrieve worktree ref stores via associated repository
refs: refactor `resolve_gitlink_ref()` to accept a repository
refs: pass repo when retrieving submodule ref store
refs: track ref stores via strmap
refs: implement releasing ref storages
refs: rename `init_db` callback to avoid confusion
refs: adjust names for `init` and `init_db` callbacks
Before discovering the repository details, We used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such an unwarranted
assumptions.
* ps/undecided-is-not-necessarily-sha1:
repository: stop setting SHA1 as the default object hash
oss-fuzz/commit-graph: set up hash algorithm
builtin/shortlog: don't set up revisions without repo
builtin/diff: explicitly set hash algo when there is no repo
builtin/bundle: abort "verify" early when there is no repository
builtin/blame: don't access potentially unitialized `the_hash_algo`
builtin/rev-parse: allow shortening to more than 40 hex characters
remote-curl: fix parsing of detached SHA256 heads
attr: fix BUG() when parsing attrs outside of repo
attr: don't recompute default attribute source
parse-options-cb: only abbreviate hashes when hash algo is known
path: move `validate_headref()` to its only user
path: harden validation of HEAD with non-standard hashes
Partially revert c8aed5e8 (repository: stop setting SHA1 as the
default object hash, 2024-05-07), to keep end-user systems still
broken when we have gap in our test coverage but yet give them an
escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
variable to "sha1" in order to revert to the previous behaviour, in
case we haven't done a thorough job in fixing the fallout from
c8aed5e8. After we build confidence, we should remove the escape
hatch support, but we are not there yet after only fixing three
commands (hash-object, apply, and patch-id) in this series.
Due to the way the end-user facing GIT_DEFAULT_HASH environment
variable is used in our test suite, we unfortunately cannot reuse it
for this purpose.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, the worktree ref stores are always
looked up via `the_repository`. Also, again, those ref stores are stored
in a global map.
Refactor the code so that worktrees have a pointer to their repository.
Like this, we can move the global map into `struct repository` and stop
using `the_repository`. With this change, we can now in theory look up
worktree ref stores for repositories other than `the_repository`. In
practice, the worktree code will need further changes to look up
arbitrary worktrees.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Looking up submodule ref stores has two deficiencies:
- The initialized subrepo will be attributed to `the_repository`.
- The submodule ref store will be tracked in a global map.
This makes it impossible to have submodule ref stores for a repository
other than `the_repository`.
Modify the function to accept the parent repository as parameter and
move the global map into `struct repository`. Like this it becomes
possible to look up submodule ref stores for arbitrary repositories.
Note that this also adds a new reference to `the_repository` in
`resolve_gitlink_ref()`, which is part of the refs interfaces. This will
get adjusted in the next patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAmYxBJ0ACgkQsLXohpav
5suE6A//RTmt/rsMCDvpHEYSvox0ln5oMWyXrqKiHLxesMc0uLWRHAUDrHGCg7JP
OoZkf1cV2yOcD4lhO4YrlcHR3n1xdAyGrhc5vyLI4DFAAxdOLl4VDHRazXm51u+p
8GLxQY/1xu9bvde1PDYL2qtjDMskMgqb2Rfvv6ULpfICJrioy+CO5wud7BYIX4qB
oFZQnFLrQnSW9XT3r2+hKJKP4cHXQX5tYY0mkiy3bjbscNGyjdrkqMjJ2QEIWqhj
SUCujS5Clx6WKr0uLxoKs1IemdV0lkg2IbsxMZ5yYxLH2P9O7jQHvjgOx5NgfRlu
NtYMWsrkYhylWUxLiTFgLbJ8DE6sjN+emYOqCDRlr7XPvsvVX6eucX9YRxS4C/XP
izoOhAHJOFRaI/nMuG7iOOmnobKJKy0PbVFgA4W8MtNKZ+4taKF24aSK3TZpArhX
Z3gMQwSWoO6KVPJ7+Et2x/WV5BmVAbpMMufX2ErwOhMDMO9jlvYy0q2OeCaiMg1c
xZGGxC441IsYPVwSrJFU/U+Pl190PEazgmclkaqdothbjeMPb/gBV4j46Rznjld4
68n3h1rW2S5AQbMKie+/Yygi0O087VAvTMsYPxDKsDmbeUHvCEd148dKgdeU59ct
IXkrf2UW7dUWwZv2lv8NMdLue2M5bB9Yeufg3GJkfOaTy+1S5TM=
=g/43
-----END PGP SIGNATURE-----
Sync with Git 2.45.1
* tag 'v2.45.1': (42 commits)
Git 2.45.1
Git 2.44.1
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
...
During the startup of Git, we call `initialize_the_repository()` to set
up `the_repository` as well as `the_index`. Part of this setup is also
to set the default object hash of the repository to SHA1. This has the
effect that `the_hash_algo` is getting initialized to SHA1, as well.
This default hash algorithm eventually gets overridden by most Git
commands via `setup_git_directory()`, which also detects the actual hash
algorithm used by the repository.
There are some commands though that don't access a repository at all, or
at a later point only, and thus retain the default hash function for
some amount of time. As some of the the preceding commits demonstrate,
this can lead to subtle issues when we access `the_hash_algo` when no
repository has been set up.
Address this issue by dropping the set up of the default hash algorithm
completely. The effect of this is that `the_hash_algo` will map to a
`NULL` pointer and thus cause Git to crash when something tries to
access the hash algorithm without it being properly initialized. It thus
forces all Git commands to explicitly set up the hash algorithm in case
there is no repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.44: (41 commits)
Git 2.44.1
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
...
* maint-2.43: (40 commits)
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
...
* maint-2.41: (38 commits)
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
...
* maint-2.40: (39 commits)
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
clone: prevent clashing git dirs when cloning submodule in parallel
t7423: add tests for symlinked submodule directories
has_dir_name(): do not get confused by characters < '/'
docs: document security issues around untrusted .git dirs
upload-pack: disable lazy-fetching by default
...
Now that we have dropped `the_index`, `initialize_the_repository()`
doesn't really do a lot anymore except for setting up the pointer for
`the_repository` and then calling `initialize_repository()`. The former
can be replaced by statically initializing the pointer though, which
basically makes this function moot.
Convert callers to instead call `initialize_repository(the_repository)`
and drop `initialize_thee_repository()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All users of `the_index` have been converted to use either a custom
`struct index_state *` or the index provided by `the_repository`. We can
thus drop the globally-accessible declaration of this variable. In fact,
we can go further than that and drop `the_index` completely now and have
it be allocated dynamically in `initialize_repository()` as all the
other data structures in it are.
This concludes the quest to make Git `the_index` free, which has started
with 4aab5b46f4 (Make read-cache.c "the_index" free., 2007-04-01).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Git starts, one of the first things it will do is to call
`initialize_the_repository()`. This function sets up both the global
`the_repository` and `the_index` variables as required. Part of that
setup is also to set `the_repository.index = &the_index` so that the
index can be accessed via the repository.
When calling `repo_init()` on a repository though we set the complete
struct to all-zeroes, which will also cause us to unset the `index`
pointer. And as we don't re-initialize the index in that function, we
will end up with a `NULL` pointer here.
This has been fine until now becaues this function is only used to
create a new repository. git-init(1) does not access the index at all
after initializing the repository, whereas git-checkout(1) only uses
`the_index` directly. We are about to remove `the_index` though, which
will uncover this partially-initialized repository structure.
Refactor the code and create a common `initialize_repository()` function
that gets called from `repo_init()` and `initialize_the_repository()`.
This function sets up both the repository and the index as required.
Like this, we can easily special-case when `repo_init()` gets called
with `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fsmonitor` repo-setting data is allocated lazily. It needs to be
released in `repo_clear()` along the rest of the allocated data in
`repository`.
This is needed because the next commit will merge v2.39.4, which will
add a `git checkout --recurse-modules` call (which would leak memory
without this here fix) to an otherwise leak-free test script.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
Similar to the preceding conversion of the AUTO_MERGE pseudo-ref, let's
convert the MERGE_AUTOSTASH ref to become a normal pseudo-ref as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 70c70de616 (refs: complete list of special refs, 2023-12-14) we have
inrtoduced a new `is_special_ref()` function that classifies some refs
as being special. The rule is that special refs are exclusively read and
written via the filesystem directly, whereas normal refs exclucsively go
via the refs API.
The intent of that commit was to record the status quo so that we know
to route reads of such special refs consistently. Eventually, the list
should be reduced to its bare minimum of refs which really are special,
namely FETCH_HEAD and MERGE_HEAD.
Follow up on this promise and convert the AUTO_MERGE ref to become a
normal pseudo-ref by using the refs API to both read and write it
instead of accessing the filesystem directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to discern which ref storage format a repository is supposed to
use we need to start setting up and/or discovering the format. This
needs to happen in two separate code paths.
- The first path is when we create a repository via `init_db()`. When
we are re-initializing a preexisting repository we need to retain
the previously used ref storage format -- if the user asked for a
different format then this indicates an error and we error out.
Otherwise we either initialize the repository with the format asked
for by the user or the default format, which currently is the
"files" backend.
- The second path is when discovering repositories, where we need to
read the config of that repository. There is not yet any way to
configure something other than the "files" backend, so we can just
blindly set the ref storage format to this backend.
Wire up this logic so that we have the ref storage format always readily
available when needed. As there is only a single backend and because it
is not configurable we cannot yet verify that this tracking works as
expected via tests, but tests will be added in subsequent commits. To
countermand this ommission now though, raise a BUG() in case the ref
storage format is not set up properly in `ref_store_init()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a configuration option to enable updating and reading from
compatibility hash maps when git accesses the reposotiry.
Call the helper function repo_set_compat_hash_algo with the value
that compatObjectFormat is set to.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>