Now that we have dropped `the_index`, `initialize_the_repository()`
doesn't really do a lot anymore except for setting up the pointer for
`the_repository` and then calling `initialize_repository()`. The former
can be replaced by statically initializing the pointer though, which
basically makes this function moot.
Convert callers to instead call `initialize_repository(the_repository)`
and drop `initialize_thee_repository()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All users of `the_index` have been converted to use either a custom
`struct index_state *` or the index provided by `the_repository`. We can
thus drop the globally-accessible declaration of this variable. In fact,
we can go further than that and drop `the_index` completely now and have
it be allocated dynamically in `initialize_repository()` as all the
other data structures in it are.
This concludes the quest to make Git `the_index` free, which has started
with 4aab5b46f4 (Make read-cache.c "the_index" free., 2007-04-01).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert git-clone(1) to use `the_repository->index` instead of
`the_index`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Git starts, one of the first things it will do is to call
`initialize_the_repository()`. This function sets up both the global
`the_repository` and `the_index` variables as required. Part of that
setup is also to set `the_repository.index = &the_index` so that the
index can be accessed via the repository.
When calling `repo_init()` on a repository though we set the complete
struct to all-zeroes, which will also cause us to unset the `index`
pointer. And as we don't re-initialize the index in that function, we
will end up with a `NULL` pointer here.
This has been fine until now becaues this function is only used to
create a new repository. git-init(1) does not access the index at all
after initializing the repository, whereas git-checkout(1) only uses
`the_index` directly. We are about to remove `the_index` though, which
will uncover this partially-initialized repository structure.
Refactor the code and create a common `initialize_repository()` function
that gets called from `repo_init()` and `initialize_the_repository()`.
This function sets up both the repository and the index as required.
Like this, we can easily special-case when `repo_init()` gets called
with `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert builtins to use `the_repository->index` instead of `the_index`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert test-helper tools to use `the_repository->index` instead of
`the_index`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This topic branch fixes two vulnerabilities:
- Recursive clones on case-insensitive filesystems that support symbolic
links are susceptible to case confusion that can be exploited to
execute just-cloned code during the clone operation.
- Repositories can be configured to execute arbitrary code during local
clones. To address this, the ownership checks introduced in v2.30.3
are now extended to cover cloning local repositories.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We will need to call this function from `hook.c` to be able to prevent
hooks from running that were written as part of a `clone` but did not
originate from the template directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When looking for a hook and not finding one, and when `STRIP_EXTENSION`
is available (read: if we're on Windows and `.exe` is the required
extension for executable programs), we want to look also for a hook with
that extension.
Previously, we added that handling into the conditional block that was
meant to handle when no hook was found (possibly providing some advice
for the user's benefit). If the hook with that file extension was found,
we'd return early from that function instead of writing out said advice,
of course.
However, we're about to introduce a safety valve to prevent hooks from
being run during a clone, to reduce the attack surface of bugs that
allow writing files to be written into arbitrary locations.
To prepare for that, refactor the logic to avoid the early return, by
separating the `STRIP_EXTENSION` handling from the conditional block
handling the case when no hook was found.
This commit is best viewed with `--patience`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When recursively cloning a repository with submodules, we must ensure
that the submodules paths do not suddenly contain symbolic links that
would let Git write into unintended locations. We just plugged that
vulnerability, but let's add some more defense-in-depth.
Since we can only keep one item on disk if multiple index entries' paths
collide, we may just as well avoid keeping a symbolic link (because that
would allow attack vectors where Git follows those links by mistake).
Technically, we handle more situations than cloning submodules into
paths that were (partially) replaced by symbolic links. This provides
defense-in-depth in case someone finds a case-folding confusion
vulnerability in the future that does not even involve submodules.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems, 2018-08-17) code was added to warn about index entries that
resolve to the same file system entity (usually the cause is a
case-insensitive filesystem).
In Git for Windows, where inodes are not trusted (because of a
performance trade-off, inodes are equal to 0 by default), that check
does not compare inode numbers but the verbatim path.
This logic works well when index entries' paths differ only in case.
However, for file/directory conflicts only the file's path was reported,
leaving the user puzzled with what that path collides.
Let's try ot catch colliding paths even if one path is the prefix of the
other. We do this also in setups where the file system is case-sensitive
because the inode check would not be able to catch those collisions.
While not a complete solution (for example, on macOS, Unicode
normalization could also lead to file/directory conflicts but be missed
by this logic), it is at least another defensive layer on top of what
the previous commits added.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The most critical vulnerabilities in Git lead to a Remote Code Execution
("RCE"), i.e. the ability for an attacker to have malicious code being
run as part of a Git operation that is not expected to run said code,
such has hooks delivered as part of a `git clone`.
A couple of parent commits ago, a bug was fixed that let Git be confused
by the presence of a path `a-` to mistakenly assume that a directory
`a/` can safely be created without removing an existing `a` that is a
symbolic link.
This bug did not represent an exploitable vulnerability on its
own; Let's make sure it stays that way.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Submodules are stored in subdirectories of their superproject. When
these subdirectories have been replaced with symlinks by a malicious
actor, all kinds of mayhem can be caused.
This _should_ not be possible, but many CVEs in the past showed that
_when_ possible, it allows attackers to slip in code that gets executed
during, say, a `git clone --recursive` operation.
Let's add some defense-in-depth to disallow submodule paths to have
anything except directories in them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 0060fd1511 (clone --recurse-submodules: prevent name squatting on
Windows, 2019-09-12), I introduced code to verify that a git dir either
does not exist, or is at least empty, to fend off attacks where an
inadvertently (and likely maliciously) pre-populated git dir would be
used while cloning submodules recursively.
The logic used `access(<path>, X_OK)` to verify that a directory exists
before calling `is_empty_dir()` on it. That is a curious way to check
for a directory's existence and might well fail for unwanted reasons.
Even the original author (it was I ;-) ) struggles to explain why this
function was used rather than `stat()`.
This code was _almost_ copypastad in the previous commit, but that
`access()` call was caught during review.
Let's use `stat()` instead also in the code that was almost copied
verbatim. Let's not use `lstat()` because in the unlikely event that
somebody snuck a symbolic link in, pointing to a crafted directory, we
want to verify that that directory is empty.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When creating a submodule path, we must be careful not to follow
symbolic links. Otherwise we may follow a symbolic link pointing to
a gitdir (which are valid symbolic links!) e.g. while cloning.
On case-insensitive filesystems, however, we blindly replace a directory
that has been created as part of the `clone` operation with a symlink
when the path to the latter differs only in case from the former's path.
Let's simply avoid this situation by expecting not ever having to
overwrite any existing file/directory/symlink upon cloning. That way, we
won't even replace a directory that we just created.
This addresses CVE-2024-32002.
Reported-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While it is expected to have several git dirs within the `.git/modules/`
tree, it is important that they do not interfere with each other. For
example, if one submodule was called "captain" and another submodule
"captain/hooks", their respective git dirs would clash, as they would be
located in `.git/modules/captain/` and `.git/modules/captain/hooks/`,
respectively, i.e. the latter's files could clash with the actual Git
hooks of the former.
To prevent these clashes, and in particular to prevent hooks from being
written and then executed as part of a recursive clone, we introduced
checks as part of the fix for CVE-2019-1387 in a8dee3ca61 (Disallow
dubiously-nested submodule git directories, 2019-10-01).
It is currently possible to bypass the check for clashing submodule
git dirs in two ways:
1. parallel cloning
2. checkout --recurse-submodules
Let's check not only before, but also after parallel cloning (and before
checking out the submodule), that the git dir is not clashing with
another one, otherwise fail. This addresses the parallel cloning issue.
As to the parallel checkout issue: It requires quite a few manual steps
to create clashing git dirs because Git itself would refuse to
initialize the inner one, as demonstrated by the test case.
Nevertheless, let's teach the recursive checkout (namely, the
`submodule_move_head()` function that is used by the recursive checkout)
to be careful to verify that it does not use a clashing git dir, and if
it does, disable it (by deleting the `HEAD` file so that subsequent Git
calls won't recognize it as a git dir anymore).
Note: The parallel cloning test case contains a `cat err` that proved to
be highly useful when analyzing the racy nature of the operation (the
operation can fail with three different error messages, depending on
timing), and was left on purpose to ease future debugging should the
need arise.
Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Submodule operations must not follow symlinks in working tree, because
otherwise files might be written to unintended places, leading to
vulnerabilities.
Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
There is a bug in directory/file ("D/F") conflict checking optimization:
It assumes that such a conflict cannot happen if a newly added entry's
path is lexicgraphically "greater than" the last already-existing index
entry _and_ contains a directory separator that comes strictly after the
common prefix (`len > len_eq_offset`).
This assumption is incorrect, though: `a-` sorts _between_ `a` and
`a/b`, their common prefix is `a`, the slash comes after the common
prefix, and there is still a file/directory conflict.
Let's re-design this logic, taking these facts into consideration:
- It is impossible for a file to sort after another file with whose
directory it conflicts because the trailing NUL byte is always smaller
than any other character.
- Since there are quite a number of ASCII characters that sort before
the slash (e.g. `-`, `.`, the space character), looking at the last
already-existing index entry is not enough to determine whether there
is a D/F conflict when the first character different from the
existing last index entry's path is a slash.
If it is not a slash, there cannot be a file/directory conflict.
And if the existing index entry's first different character is a
slash, it also cannot be a file/directory conflict because the
optimization requires the newly-added entry's path to sort _after_ the
existing entry's, and the conflicting file's path would not.
So let's fall back to the regular binary search whenever the newly-added
item's path differs in a slash character. If it does not, and it sorts
after the last index entry, there is no D/F conflict and the new index
entry can be safely appended.
This fix also nicely simplifies the logic and makes it much easier to
reason about, while the impact on performance should be negligible:
After this fix, the optimization will be skipped only when index
entry's paths differ in a slash and a space, `!`, `"`, `#`, `$`,
`%`, `&`, `'`, | ( `)`, `*`, `+`, `,`, `-`, or `.`, which should
be a rare situation.
Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For a long time our general philosophy has been that it's unsafe to run
arbitrary Git commands if you don't trust the hooks or config in .git,
but that running upload-pack should be OK. E.g., see 1456b043fc (Remove
post-upload-hook, 2009-12-10), or the design of uploadpack.packObjectsHook.
But we never really documented this (and even the discussions that led
to 1456b043fc were not on the public list!). Let's try to make our
approach more clear, but also be realistic that even upload-pack carries
some risk.
Helped-by: Filip Hejsek <filip.hejsek@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The upload-pack command tries to avoid trusting the repository in which
it's run (e.g., by not running any hooks and not using any config that
contains arbitrary commands). But if the server side of a fetch or a
clone is a partial clone, then either upload-pack or its child
pack-objects may run a lazy "git fetch" under the hood. And it is very
easy to convince fetch to run arbitrary commands.
The "server" side can be a local repository owned by someone else, who
would be able to configure commands that are run during a clone with the
current user's permissions. This issue has been designated
CVE-2024-32004.
The fix in this commit's parent helps in this scenario, as well as in
related scenarios using SSH to clone, where the untrusted .git directory
is owned by a different user id. But if you received one as a zip file,
on a USB stick, etc, it may be owned by your user but still untrusted.
This has been designated CVE-2024-32465.
To mitigate the issue more completely, let's disable lazy fetching
entirely during `upload-pack`. While fetching from a partial repository
should be relatively rare, it is certainly not an unreasonable workflow.
And thus we need to provide an escape hatch.
This commit works by respecting a GIT_NO_LAZY_FETCH environment variable
(to skip the lazy-fetch), and setting it in upload-pack, but only when
the user has not already done so (which gives us the escape hatch).
The name of the variable is specifically chosen to match what has
already been added in 'master' via e6d5479e7a (git: extend
--no-lazy-fetch to work across subprocesses, 2024-02-27). Since we're
building this fix as a backport for older versions, we could cherry-pick
that patch and its earlier steps. However, we don't really need the
niceties (like a "--no-lazy-fetch" option) that it offers. By using the
same name, everything should just work when the two are eventually
merged, but here are a few notes:
- the blocking of the fetch in e6d5479e7a is incomplete! It sets
fetch_if_missing to 0 when we setup the repository variable, but
that isn't enough. pack-objects in particular will call
prefetch_to_pack() even if that variable is 0. This patch by
contrast checks the environment variable at the lowest level before
we call the lazy fetch, where we can be sure to catch all code
paths.
Possibly the setting of fetch_if_missing from e6d5479e7a can be
reverted, but it may be useful to have. For example, some code may
want to use that flag to change behavior before it gets to the point
of trying to start the fetch. At any rate, that's all outside the
scope of this patch.
- there's documentation for GIT_NO_LAZY_FETCH in e6d5479e7a. We can
live without that here, because for the most part the user shouldn't
need to set it themselves. The exception is if they do want to
override upload-pack's default, and that requires a separate
documentation section (which is added here)
- it would be nice to use the NO_LAZY_FETCH_ENVIRONMENT macro added by
e6d5479e7a, but those definitions have moved from cache.h to
environment.h between 2.39.3 and master. I just used the raw string
literals, and we can replace them with the macro once this topic is
merged to master.
At least with respect to CVE-2024-32004, this does render this commit's
parent commit somewhat redundant. However, it is worth retaining that
commit as defense in depth, and because it may help other issues (e.g.,
symlink/hardlink TOCTOU races, where zip files are not really an
interesting attack vector).
The tests in t0411 still pass, but now we have _two_ mechanisms ensuring
that the evil command is not run. Let's beef up the existing ones to
check that they failed for the expected reason, that we refused to run
upload-pack at all with an alternate user id. And add two new ones for
the same-user case that both the restriction and its escape hatch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When cloning from somebody else's repositories, it is possible that,
say, the `upload-pack` command is overridden in the repository that is
about to be cloned, which would then be run in the user's context who
started the clone.
To remind the user that this is a potentially unsafe operation, let's
extend the ownership checks we have already established for regular
gitdir discovery to extend also to local repositories that are about to
be cloned.
This protection extends also to file:// URLs.
The fixes in this commit address CVE-2024-32004.
Note: This commit does not touch the `fetch`/`clone` code directly, but
instead the function used implicitly by both: `enter_repo()`. This
function is also used by `git receive-pack` (i.e. pushes), by `git
upload-archive`, by `git daemon` and by `git http-backend`. In setups
that want to serve repositories owned by different users than the
account running the service, this will require `safe.*` settings to be
configured accordingly.
Also note: there are tiny time windows where a time-of-check-time-of-use
("TOCTOU") race is possible. The real solution to those would be to work
with `fstat()` and `openat()`. However, the latter function is not
available on Windows (and would have to be emulated with rather
expensive low-level `NtCreateFile()` calls), and the changes would be
quite extensive, for my taste too extensive for the little gain given
that embargoed releases need to pay extra attention to avoid introducing
inadvertent bugs.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Cloning from a partial repository must not fetch missing objects into
the partial repository, because that can lead to arbitrary code
execution.
Add a couple of test cases, pretending to the `upload-pack` command (and
to that command only) that it is working on a repository owned by
someone else.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We observed a series of clone failures arose in a specific set of
repositories after we fully enabled the MIDX bitmap feature within our
Codebase service. These failures were accompanied with error messages
such as:
Cloning into bare repository 'clone.git'...
remote: Enumerating objects: 8, done.
remote: Total 8 (delta 0), reused 0 (delta 0), pack-reused 8 (from 1)
Receiving objects: 100% (8/8), done.
fatal: did not receive expected object ...
fatal: fetch-pack: invalid index-pack output
Temporarily disabling the MIDX feature eliminated the reported issues.
After some investigation we found that all repositories experiencing
failures contain replace references, which seem to be improperly
acknowledged by the MIDX bitmap generation logic.
A more thorough explanation about the root cause from Taylor Blau says:
Indeed, the pack-bitmap-write machinery does not itself call
disable_replace_refs(). So when it generates a reachability bitmap, it
is doing so with the replace refs in mind. You can see that this is
indeed the cause of the problem by looking at the output of an
instrumented version of Git that indicates what bits are being set
during the bitmap generation phase.
With replace refs (incorrectly) enabled, we get:
[2, 4, 6, 8, 13, 3, 6, 7, 3, 4, 6, 8]
and doing the same after calling disable_replace_refs(), we instead get:
[2, 5, 6, 13, 3, 6, 7, 3, 4, 6, 8]
Single pack bitmaps are unaffected by this issue because we generate
them from within pack-objects, which does call disable_replace_refs().
This patch updates the MIDX logic to disable replace objects within the
multi-pack-index builtin, and a test showing a clone (which would fail
with MIDX bitmap) is added to demonstrate the bug.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 850b6edefa (auto-gc: extract a reusable helper from "git fetch",
2020-05-06), we have introduced a helper function `run_auto_gc()` that
kicks off `git gc --auto`. The intent of this function was to pass down
the "--quiet" flag to git-gc(1) as required without duplicating this at
all callsites. In 7c3e9e8cfb (auto-gc: pass --quiet down from am,
commit, merge and rebase, 2020-05-06) we then converted callsites that
need to pass down this flag to use the new helper function. This has the
notable omission of git-receive-pack(1), which is the only remaining
user of `git gc --auto` that sets up the proccess manually. This is
probably because it unconditionally passes down the `--quiet` flag and
thus didn't benefit much from the new helper function.
In a95ce12430 (maintenance: replace run_auto_gc(), 2020-09-17) we then
replaced `run_auto_gc()` with `run_auto_maintenance()` which invokes
git-maintenance(1) instead of git-gc(1). This command is the modern
replacement for git-gc(1) and is both more thorough and also more
flexible because administrators can configure which tasks exactly to run
during maintenance.
But due to git-receive-pack(1) not using `run_auto_gc()` in the first
place it did not get converted to use git-maintenance(1) like we do
everywhere else now. Address this oversight and start to use the newly
introduced function `prepare_auto_maintenance()`. This will also make it
easier for us to adapt this code together with all the other callsites
that invoke auto-maintenance in the future.
This removes the last internal user of `git gc --auto`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `run_auto_maintenance()` function is responsible for spawning a new
`git maintenance run --auto` process. To do so, it sets up the `sturct
child_process` and then runs it by executing `run_command()` directly.
This is rather inflexible in case callers want to modify the child
process somewhat, e.g. to redirect stderr or stdout.
Introduce a new `prepare_auto_maintenance()` function to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, there's no specific way to determine whether a credential
helper or git credential itself supports a given set of capabilities.
It would be helpful to have such a way, so let's let credential helpers
and git credential take an argument, "capability", which has it list the
capabilities and a version number on standard output.
Specifically choose a format that is slightly different from regular
credential output and assume that no capabilities are supported if a
non-zero exit status occurs or the data deviates from the format. It is
common for users to write small shell scripts as the argument to
credential.helper, which will almost never be designed to emit
capabilities. We want callers to gracefully handle this case by
assuming that they are not capable of extended support because that is
almost certainly the case, and specifying the error behavior up front
does this and preserves backwards compatibility in a graceful way.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have full support in Git for the authtype capability, let's
add support to the cache credential helper.
When parsing data, we always set the initial capabilities because we're
the helper, and we need both the initial and helper capabilities to be
set in order to have the helper capabilities take effect.
When emitting data, always emit the supported capability and make sure
we emit items only if we have them and they're supported by the caller.
Since we may no longer have a username or password, be sure to emit
those conditionally as well so we don't segfault on a NULL pointer.
Similarly, when comparing credentials, consider both the password and
credential fields when we're matching passwords.
Adjust the partial credential detection code so that we can store
credentials missing a username or password as long as they have an
authtype and credential.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's helpful to have some basic tests for credential helpers supporting
the authtype and credential fields. Let's add some tests for this case
so that we can make sure newly supported helpers work correctly.
Note that we explicitly check that credential helpers can produce
different sets of authtype and credential values based on the username.
While the username is not used in the HTTP protocol with authtype and
credential, it can still be specified in the URL and thus may be part of
the protocol. Additionally, because it is common for users to have
multiple accounts on one service (say, both personal and professional
accounts), it's very helpful to be able to store different credentials
for different accounts in the same helper, and that doesn't become less
useful if one is using, say, Bearer authentication instead of Basic.
Thus, credential helpers should be expected to support this
functionality as basic functionality, so verify here that they do so.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over HTTP, NTLM and Kerberos require two rounds of authentication on the
client side. It's possible that there are custom authentication schemes
that also implement this same approach. Since these are tricky schemes
to implement and the HTTP library in use may not always handle them
gracefully on all systems, it would be helpful to allow the credential
helper to implement them instead for increased portability and
robustness.
To allow this to happen, add a boolean flag, continue, that indicates
that instead of failing when we get a 401, we should retry another round
of authentication. However, this necessitates some changes in our
current credential code so that we can make this work.
Keep the state[] headers between iterations, but only use them to send
to the helper and only consider the new ones we read from the credential
helper to be valid on subsequent iterations. That avoids us passing
stale data when we finally approve or reject the credential. Similarly,
clear the multistage and wwwauth[] values appropriately so that we
don't pass stale data or think we're trying a multiround response when
we're not. Remove the credential values so that we can actually fill a
second time with new responses.
Limit the number of iterations of reauthentication we do to 3. This
means that if there's a problem, we'll terminate with an error message
instead of retrying indefinitely and not informing the user (and
possibly conducting a DoS on the server).
In our tests, handle creating multiple response output files from our
helper so we can verify that each of the messages sent is correct.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some HTTP authentication schemes, such as NTLM- and Kerberos-based
options, require more than one round trip to authenticate. Currently,
these can only be supported in libcurl, since Git does not have support
for this in the credential helper protocol.
However, in a future commit, we'll add support for this functionality
into the credential helper protocol and Git itself. Because we don't
really want to implement either NTLM or Kerberos, both of which are
complex protocols, we'll want to test this using a fake credential
authentication scheme. In order to do so, update t5563 and its backend
to allow us to accept multiple sets of credentials and respond with
different behavior in each case.
Since we can now provide any number of possible status codes, provide a
non-specific reason phrase so we don't have to generate a more specific
one based on the response. The reason phrase is mandatory according to
the status-line production in RFC 7230, but clients SHOULD ignore it,
and curl does (except to print it).
Each entry in the authorization and challenge fields contains an ID,
which indicates a corresponding credential and response. If the
response is a 200 status, then we continue to execute git-http-backend.
Otherwise, we print the corresponding status and response. If no ID is
matched, we use the default response with a status of 401.
Note that there is an implicit order to the parameters. The ID is
always first and the creds or response value is always last, and
therefore may contain spaces, equals signs, or other arbitrary data.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We recently introduced a way for credential helpers to add arbitrary
state as part of the protocol. Set some limits on line length to avoid
helpers passing extremely large amounts of data. While Git doesn't have
a fixed parsing length, there are other tools which support this
protocol and it's kind to allow them to use a reasonable fixed-size
buffer for parsing. In addition, we would like to be moderate in our
memory usage and imposing reasonable limits is helpful for that purpose.
In the event a credential helper is incapable of storing its serialized
state in 64 KiB, it can feel free to serialize it on disk and store a
reference instead.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we've implemented the state capability, let's send it along by
default when filling credentials so we can make use of it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, our credential code has mostly deal with usernames and
passwords and we've let libcurl deal with the variant of authentication
to be used. However, now that we have the credential value, the
credential helper can take control of the authentication, so the value
provided might be something that's generated, such as a Digest hash
value.
In such a case, it would be helpful for a credential helper that gets an
erase or store command to be able to keep track of an identifier for the
original secret that went into the computation. Furthermore, some types
of authentication, such as NTLM and Kerberos, actually need two round
trips to authenticate, which will require that the credential helper
keep some state.
In order to allow for these use cases and others, allow storing state in
a field called "state[]". This value is passed back to the credential
helper that created it, which avoids confusion caused by parsing values
from different helpers.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have the credential helper code set up to handle arbitrary
authentications schemes, let's add support for this in the HTTP code,
where we really want to use it. If we're using this new functionality,
don't set a username and password, and instead set a header wherever
we'd normally do so, including for proxy authentication.
Since we can now handle this case, ask the credential helper to enable
the appropriate capabilities.
Finally, if we're using the authtype value, set "Expect: 100-continue".
Any type of authentication that requires multiple rounds (such as NTLM
or Kerberos) requires a 100 Continue (if we're larger than
http.postBuffer) because otherwise we send the pack data before we're
authenticated, the push gets a 401 response, and we can't rewind the
stream. We don't know for certain what other custom schemes might
require this, the HTTP/1.1 standard has required handling this since
1999, the broken HTTP server for which we disabled this (Google's) is
now fixed and has been for some time, and libcurl has a 1-second
fallback in case the HTTP server is still broken. In addition, it is
not unreasonable to require compliance with a 25-year old standard to
use new Git features. For all of these reasons, do so here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have new fields (authtype and credential), let's document
them for users and credential helper implementers.
Indicate specifically what common values of authtype are and what values
are allowed. Note that, while common, digest and NTLM authentication
are insecure because they require unsalted, uniterated password hashes
to be stored.
Tell users that they can continue to use a username and password even if
the new capability is supported.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have support for a wide variety of types of authentication,
it's important to indicate to other credential helpers whether they
should store credentials, since not every credential helper may
intuitively understand all possible values of the authtype field. Do so
with a boolean field called "ephemeral", to indicate whether the
credential is expected to be temporary.
For example, in HTTP Digest authentication, the Authorization header
value is based off a nonce. It isn't useful to store this value
for later use because reusing the credential long term will not result
in successful authentication due to the nonce necessarily differing.
An additional case is potentially short-lived credentials, which may
last only a few hours. It similarly wouldn't be helper for other
credential helpers to attempt to provide these much later.
We do still pass the value to "git credential store" or "git credential
erase", since it may be helpful to the original helper to know whether
the operation was successful.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We support the new credential and authtype fields, but we lack a way to
indicate to a credential helper that we'd like them to be used. Without
some sort of indication, the credential helper doesn't know if it should
try to provide us a username and password, or a pre-encoded credential.
For example, the helper might prefer a more restricted Bearer token if
pre-encoded credentials are possible, but might have to fall back to
more general username and password if not.
Let's provide a simple way to indicate whether Git (or, for that matter,
the helper) is capable of understanding the authtype and credential
fields. We send this capability when we generate a request, and the
other side may reply to indicate to us that it does, too.
For now, don't enable sending capabilities for the HTTP code. In a
future commit, we'll introduce appropriate handling for that code,
which requires more in-depth work.
The logic for determining whether a capability is supported may seem
complex, but it is not. At each stage, we emit the capability to the
following stage if all preceding stages have declared it. Thus, if the
caller to git credential fill didn't declare it, then we won't send it
to the helper, and if fill's caller did send but the helper doesn't
understand it, then we won't send it on in the response. If we're an
internal user, then we know about all capabilities and will request
them.
For "git credential approve" and "git credential reject", we set the
helper capability before calling the helper, since we assume that the
input we're getting from the external program comes from a previous call
to "git credential fill", and thus we'll invoke send a capability to the
helper if and only if we got one from the standard input, which is the
correct behavior.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the moment, our credential code wants to find a username and password
for access, which, for HTTP, it will pass to libcurl to encode and
process. However, many users want to use authentication schemes that
libcurl doesn't support, such as Bearer authentication. In these
schemes, the secret is not a username and password pair, but some sort
of token that meets the production for authentication data in the RFC.
In fact, in general, it's useful to allow our credential helper to have
knowledge about what specifically to put in the protocol header. Thus,
add a field, credential, which contains data that's preencoded to be
suitable for the protocol in question. If we have such data, we need
neither a username nor a password, so make that adjustment as well.
It is in theory possible to reuse the password field for this. However,
if we do so, we must know whether the credential helper supports our new
scheme before sending it data, which necessitates some sort of
capability inquiry, because otherwise an uninformed credential helper
would store our preencoded data as a password, which would fail the next
time we attempted to connect to the remote server. This design is
substantially simpler, and we can hint to the credential helper that we
support this approach with a simple new field instead of needing to
query it first.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we create one set of headers for all object requests and reuse
it. However, we'll need to adjust the headers for authentication
purposes in the future, so let's create a new set for each request so
that we can adjust them if the authentication changes.
Note that the cost of allocation here is tiny compared to the fact that
we're making a network call, not to mention probably a full TLS
connection, so this shouldn't have a significant impact on performance.
Moreover, nobody who cares about performance is using the dumb HTTP
protocol anyway, since it often makes huge numbers of requests compared
to the smart protocol.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we retry a post_rpc request, we currently reuse the same headers as
before. In the future, we'd like to be able to modify them based on the
result we get back, so let's reset them on each retry so we can avoid
sending potentially duplicate headers if the values change.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Git makes an HTTP request, it can negotiate the type of
authentication to use with the server provided the authentication scheme
is one of a few well-known types (Basic, Digest, NTLM, or Negotiate).
However, some servers wish to use other types of authentication, such as
the Bearer type from OAuth2. Since libcurl doesn't natively support
this type, it isn't possible to use it, and the user is forced to
specify the Authorization header using the http.extraheader setting.
However, storing a plaintext token in the repository configuration is
not very secure, especially if a repository can be shared by multiple
parties. We already have support for many types of secure credential
storage by using credential helpers, so let's teach credential helpers
how to produce credentials for an arbitrary scheme.
If the credential helper specifies an authtype field, then it specifies
an authentication scheme (e.g., Bearer) and the password field specifies
the raw authentication token, with any encoding already specified. We
reuse the password field for this because some credential helpers store
the metadata without encryption even though the password is encrypted,
and we'd like to avoid insecure storage if an older version of the
credential helper gets ahold of the data.
The username is not used in this case, but it is still preserved for the
purpose of finding the right credential if the user has multiple
accounts.
If the authtype field is not specified, then the password behaves as
normal and it is passed along with the username to libcurl.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing a local clone of a repository we end up either copying
or hardlinking the source repository into the target repository. This is
significantly more performant than if we were to use git-upload-pack(1)
and git-fetch-pack(1) to create the new repository and preserves both
disk space and compute time.
Unfortunately though, performing such a local clone of a repository that
is not owned by the current user is inherently unsafe:
- It is possible that source files get swapped out underneath us while
we are copying or hardlinking them. While we do perform some checks
here to assert that we hardlinked the expected file, they cannot
reliably thwart time-of-check-time-of-use (TOCTOU) style races. It
is thus possible for an adversary to make us copy or hardlink
unexpected files into the target directory.
Ideally, we would address this by starting to use openat(3P),
fstatat(3P) and friends. Due to platform compatibility with Windows
we cannot easily do that though. Furthermore, the scope of these
fixes would likely be quite broad and thus not fit for an embargoed
security release.
- Even if we handled TOCTOU-style races perfectly, hardlinking files
owned by a different user into the target repository is not a good
idea in general. It is possible for an adversary to rewrite those
files to contain whatever data they want even after the clone has
completed.
Address these issues by completely refusing local clones of a repository
that is not owned by the current user. This reuses our existing infra we
have in place via `ensure_valid_ownership()` and thus allows a user to
override the safety guard by adding the source repository path to the
"safe.directory" configuration.
This addresses CVE-2024-32020.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Introduce a new function `die_upon_dubious_ownership()` that uses
`ensure_valid_ownership()` to verify whether a repositroy is safe for
use, and causes Git to die in case it is not.
This function will be used in a subsequent commit.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When performing local clones with hardlinks we refuse to copy source
files which are symlinks as a mitigation for CVE-2022-39253. This check
can be raced by an adversary though by changing the file to a symlink
after we have checked it.
Fix the issue by checking whether the hardlinked destination file
matches the source file and abort in case it doesn't.
This addresses CVE-2024-32021.
Reported-by: Apple Product Security <product-security@apple.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When a user performs a local clone without `--no-local`, then we end up
copying the source repository into the target repository directly. To
optimize this even further, we try to hardlink files into place instead
of copying data over, which helps both disk usage and speed.
There is an important edge case in this context though, namely when we
try to hardlink symlinks from the source repository into the target
repository. Depending on both platform and filesystem the resulting
behaviour here can be different:
- On macOS and NetBSD, calling link(3P) with a symlink target creates
a hardlink to the file pointed to by the symlink.
- On Linux, calling link(3P) instead creates a hardlink to the symlink
itself.
To unify this behaviour, 36596fd2df (clone: better handle symlinked
files at .git/objects/, 2019-07-10) introduced logic to resolve symlinks
before we try to link(3P) files. Consequently, the new behaviour was to
always create a hard link to the target of the symlink on all platforms.
Eventually though, we figured out that following symlinks like this can
cause havoc when performing a local clone of a malicious repository,
which resulted in CVE-2022-39253. This issue was fixed via 6f054f9fb3
(builtin/clone.c: disallow `--local` clones with symlinks, 2022-07-28),
by refusing symlinks in the source repository.
But even though we now shouldn't ever link symlinks anymore, the code
that resolves symlinks still exists. In the best case the code does not
end up doing anything because there are no symlinks anymore. In the
worst case though this can be abused by an adversary that rewrites the
source file after it has been checked not to be a symlink such that it
actually is a symlink when we call link(3P). Thus, it is still possible
to recreate CVE-2022-39253 due to this time-of-check-time-of-use bug.
Remove the call to `realpath()`. This doesn't yet address the actual
vulnerability, which will be handled in a subsequent commit.
Reported-by: Apple Product Security <product-security@apple.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Update remaining GitHub Actions jobs to avoid warnings against
using deprecated version of Node.js.
* js/github-actions-update:
ci(linux32): add a note about Actions that must not be updated
ci: bump remaining outdated Actions versions
With this backport, `maint-2.39`'s CI builds are finally healthy again.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
* jc/maint-github-actions-update:
GitHub Actions: update to github-script@v7
GitHub Actions: update to checkout@v4
Yet another thing to help `maint-2.39`'s CI builds to become healthy
again.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
The Docker container used by the `linux32` job comes without Node.js,
and therefore the `actions/checkout` and `actions/upload-artifact`
Actions cannot be upgraded to the latest versions (because they use
Node.js).
One time too many, I accidentally tried to update them, where
`actions/checkout` at least fails immediately, but the
`actions/upload-artifact` step is only used when any test fails, and
therefore the CI run usually passes even though that Action was updated
to a version that is incompatible with the Docker container in which
this job runs.
So let's add a big fat warning, mainly for my own benefit, to avoid
running into the very same issue over and over again.
Backported-from: 20e0ff8835 (ci(linux32): add a note about Actions that must not be updated, 2024-02-11)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We seem to be getting "Node.js 16 actions are deprecated." warnings
for jobs that use github-script@v6. Update to github-script@v7,
which is said to use Node.js 20.
Backported-from: c4ddbe043e (GitHub Actions: update to github-script@v7, 2024-02-02)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>