* maint-2.46:
Git 2.46.3
Git 2.45.3
Git 2.44.3
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.45:
Git 2.45.3
Git 2.44.3
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.43:
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.42:
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
When asking the user interactively for credentials, we want to avoid
misleading them e.g. via control sequences that pretend that the URL
targets a trusted host when it does not.
While Git learned, over the course of the preceding commits, to disallow
URLs containing URL-encoded control characters by default, credential
helpers are still allowed to specify values very freely (apart from Line
Feed and NUL characters, anything is allowed), and this would allow,
say, a username containing control characters to be specified that would
then be displayed in the interactive terminal prompt asking the user for
the password, potentially sending those control characters directly to
the terminal. This is undesirable because control characters can be used
to mislead users to divulge secret information to untrusted sites.
To prevent such an attack vector, let's add a `git_prompt()` that forces
the displayed text to be sanitized, i.e. displaying question marks
instead of control characters.
Note: While this commit's diff changes a lot of `user@host` strings to
`user%40host`, which may look suspicious on the surface, there is a good
reason for that: this string specifies a user name, not a
<username>@<hostname> combination! In the context of t5541, the actual
combination looks like this: `user%40@127.0.0.1:5541`. Therefore, these
string replacements document a net improvement introduced by this
commit, as `user@host@127.0.0.1` could have left readers wondering where
the user name ends and where the host name begins.
Hinted-at-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there
is no longer a need to have that variable declared in all of our tests.
Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.
* jk/dumb-http-finalize:
packfile: use oidread() instead of hashcpy() to fill object_id
packfile: use object_id in find_pack_entry_one()
packfile: convert find_sha1_pack() to use object_id
http-walker: use object_id instead of bare hash
packfile: warn people away from parse_packed_git()
packfile: drop sha1_pack_index_name()
packfile: drop sha1_pack_name()
packfile: drop has_pack_index()
dumb-http: store downloaded pack idx as tempfile
t5550: count fetches in "previously-fetched .idx" test
midx: avoid duplicate packed_git entries
This patch fixes a regression in b1b8dfde69 (finalize_object_file():
implement collision check, 2024-09-26) where fetching a v1 pack idx file
over the dumb-http protocol would cause the fetch to fail.
The core of the issue is that dumb-http stores the idx we fetch from the
remote at the same path that will eventually hold the idx we generate
from "index-pack --stdin". The sequence is something like this:
0. We realize we need some object X, which we don't have locally, and
nor does the other side have it as a loose object.
1. We download the list of remote packs from objects/info/packs.
2. For each entry in that file, we download each pack index and store
it locally in .git/objects/pack/pack-$hash.idx (the $hash is not
something we can verify yet and is given to us by the remote).
3. We check each pack index we got to see if it has object X. When we
find a match, we download the matching .pack file from the remote
to a tempfile. We feed that to "index-pack --stdin", which
reindexes the pack, rather than trusting that it has what the other
side claims it does. In most cases, this will end up generating the
exact same (byte-for-byte) pack index which we'll store at the same
pack-$hash.idx path, because the index generation and $hash id are
computed based on what's in the packfile. But:
a. The other side might have used other options to generate the
index. For instance we use index v2 by default, but long ago
it was v1 (and you can still ask for v1 explicitly).
b. The other side might even use a different mechanism to
determine $hash. E.g., long ago it was based on the sorted
list of objects in the packfile, but we switched to using the
pack checksum in 1190a1acf8 (pack-objects: name pack files
after trailer hash, 2013-12-05).
The regression we saw in the real world was (3a). A recent client
fetching from a server with a v1 index downloaded that index, then
complained about trying to overwrite it with its own v2 index. This
collision is otherwise harmless; we know we want to replace the remote
version with our local one, but the collision check doesn't realize
that.
There are a few options to fix it:
- we could teach index-pack a command-line option to ignore only pack
idx collisions, and use it when the dumb-http code invokes
index-pack. This would be an awkward thing to expose users to and
would involve a lot of boilerplate to get the option down to the
collision code.
- we could delete the remote .idx file right before running
index-pack. It should be redundant at that point (since we've just
downloaded the matching pack). But it feels risky to delete
something from our own .git/objects based on what the other side has
said. I'm not entirely positive that a malicious server couldn't lie
about which pack-$hash.idx it has and get us to delete something
precious.
- we can stop co-mingling the downloaded idx files in our local
objects directory. This is a slightly bigger change but I think
fixes the root of the problem more directly.
This patch implements the third option. The big design questions are:
where do we store the downloaded files, and how do we manage their
lifetimes?
There are some additional quirks to the dumb-http system we should
consider. Remember that in step 2 we downloaded every pack index, but in
step 3 we may only download some of the matching packs. What happens to
those other idx files now? They sit in the .git/objects/pack directory,
possibly waiting to be used at a later date. That may save bandwidth for
a subsequent fetch, but it also creates a lot of weird corner cases:
- our local object directory now has semi-untrusted .idx files sitting
around, without their matching .pack
- in case 3b, we noted that we might not generate the same hash as the
other side. In that case even if we download the matching pack,
our index-pack invocation will store it in a different
pack-$hash.idx file. And the unmatched .idx will sit there forever.
- if the server repacks, it may delete the old packs. Now we have
these orphaned .idx files sitting around locally that will never be
used (nor deleted).
- if we repack locally we may delete our local version of the server's
pack index and not realize we have it. So we'll download it again,
even though we have all of the objects it mentions.
I think the right solution here is probably some more complex cache
management system: download the remote .idx files to their own storage
directory, mark them as "seen" when we get their matching pack (to avoid
re-downloading even if we repack), and then delete them when the
server's objects/info/refs no longer mentions them.
But since the dumb http protocol is so ancient and so inferior to the
smart http protocol, I don't think it's worth spending a lot of time
creating such a system. For this patch I'm just downloading the idx
files to .git/objects/tmp_pack_*, and marking them as tempfiles to be
deleted when we exit (and due to the name, any we miss due to a crash,
etc, should eventually be removed by "git gc" runs based on timestamps).
That is slightly worse for one case: if we download an idx but not the
matching pack, we won't retain that idx for subsequent runs. But the
flip side is that we're making other cases better (we never hold on to
useless idx files forever). I suspect that worse case does not even come
up often, since it implies that the packs are generated to match
distinct parts of history (i.e., in practice even in a repo with many
packs you're going to end up grabbing all of those packs to do a clone).
If somebody really cares about that, I think the right path forward is a
managed cache directory as above, and this patch is providing the first
step in that direction anyway (by moving things out of the objects/pack/
directory).
There are two test changes. One demonstrates the broken v1 index case
(it double-checks the resulting clone with fsck to be careful, but prior
to this patch it actually fails at the clone step). The other tweaks the
expectation for a test that covers the "slightly worse" case to
accommodate the extra index download.
The code changes are fairly simple. We stop using finalize_object_file()
to copy the remote's index file into place, and leave it as a tempfile.
We give the tempfile a real ".idx" name, since the packfile code expects
that, and thus we make sure it is out of the usual packs/ directory (so
we'd never mistake it for a real local .idx).
We also have to change parse_pack_index(), which creates a temporary
packed_git to access our index (we need this because all of the pack idx
code assumes we have that struct). It reads the index data from the
tempfile, but prior to this patch would speculatively write the
finalized name into the packed_git struct using the pack-$hash we expect
to use.
I was mildly surprised that this worked at all, since we call
verify_pack_index() on the packed_git which mentions the final name
before moving the file into place! But it works because
parse_pack_index() leaves the mmap-ed data in the struct, so the
lazy-open in verify_pack_index() never triggers, and we read from the
tempfile, ignoring the filename in the struct completely. Hacky, but it
works.
After this patch, parse_pack_index() now uses the index filename we pass
in to derive a matching .pack name. This is OK to change because there
are only two callers, both in the dumb http code (and the other passes
in an existing pack-$hash.idx name, so the derived name is going to be
pack-$hash.pack, which is what we were using anyway).
I'll follow up with some more cleanups in that area, but this patch is
sufficient to fix the regression.
Reported-by: fox <fox.gbr@townlong-yak.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We have a test in t5550 that looks at index fetching over dumb http. It
creates two branches, each of which is completely stored in its own
pack, then fetches the branches independently. What should (and does)
happen is that the first fetch grabs both .idx files and one .pack file,
and then the fetch of the second branch re-uses the previously
downloaded .idx files (fetching none) and grabs the now-required .pack
file.
Since the next few patches will be touching this area of the code, let's
beef up the test a little by checking that we're downloading the
expected items at each step.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The iconv library is used by Git to reencode files, commit messages and
other things. As such it is a rather integral part, but given that many
platforms nowadays use UTF-8 everywhere you can live without support for
reencoding in many situations. It is thus optional to build Git with
iconv, and some of our platforms wired up in "config.mak.uname" disable
it. But while we support building without it, running our test suite
with "NO_ICONV=Yes" causes many test failures.
Wire up a new test prerequisite ICONV that gets populated via our
GIT-BUILD-OPTIONS. Annotate failing tests accordingly.
Note that this commit does not do a deep dive into every single test to
assess whether the failure is expected or not. Most of the tests do
smell like the expected kind of failure though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The dumb-http walker code creates a "fake" packed_git list representing
packs we've downloaded from the remote (I call it "fake" because
generally that struct is only used and managed by the local repository
struct). But during our cleanup phase we don't touch those at all,
causing a leak.
There's no support here from the rest of the object-database API, as
these structs are not meant to be freed, except when closing the object
store completely. But we can see that raw_object_store_clear() just
calls free() on them, and that's enough here to fix the leak.
I also added a call to close_pack() before each. In the regular code
this happens via close_object_store(), which we do as part of
raw_object_store_clear(). This is necessary to prevent leaking mmap'd
data (like the pack idx) or descriptors. The leak-checker won't catch
either of these itself, but I did confirm with some hacky warning()
calls and running t5550 that it's easy to leak at least index data.
This is all much more intimate with the packed_git struct than I'd like,
but I think fixing it would be a pretty big refactor. And it's just not
worth it for dumb-http code which is rarely used these days. If we can
silence the leak-checker without creating too much hassle, we should
just do that.
This lets us mark t5550 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-http-fetch(1) command accepts a `--packfile=` option, which
allows the user to specify that it shall fetch a specific packfile,
only. The parameter here is the hash of the packfile, which is specific
to the object hash used by the repository. This requirement is implicit
though via our use of `parse_oid_hex()`, which internally uses
`the_repository`.
The git-http-fetch(1) command allows for there to be no repository
though, which only exists such that we can show usage via the "-h"
option. In that case though, starting with c8aed5e8da (repository: stop
setting SHA1 as the default object hash, 2024-05-07), `the_repository`
does not have its object hash initialized anymore and thus we would
crash when trying to parse the object ID outside of a repository.
Fix this issue by dying immediately when we see a "--packfile="
parameter when outside a Git repository. This is not a functional
regression as we would die later on with the same error anyway.
Add a test to detect the segfault. We use the "nongit" function to do
so, which we need to allow-list in `test_must_fail ()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb HTTP transport tries to read the remote HEAD reference by
downloading the "HEAD" file and then parsing it via `http_fetch_ref()`.
This function will either parse the file as an object ID in case it is
exactly `the_hash_algo->hexsz` long, or otherwise it will check whether
the reference starts with "ref :" and parse it as a symbolic ref.
This is broken when parsing detached HEADs of a remote SHA256 repository
because we never update `the_hash_algo` to the discovered remote object
hash. Consequently, `the_hash_algo` will always be the fallback SHA1
hash algorithm, which will cause us to fail parsing HEAD altogteher when
it contains a SHA256 object ID.
Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`.
While at it, let's make the expected SHA1 fallback explicit in our code,
which also addresses an upcoming issue where we are going to remove the
SHA1 fallback for `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're currently creating the reference database with a potentially
incorrect object format when the remote repository's object format is
different from the local default object format. This works just fine for
now because the files backend never records the object format anywhere.
But this logic will fail with any new reference backend that encodes
this information in some form either on-disk or in-memory.
The preceding commits have reshuffled code in git-clone(1) so that there
is no code path that will access the reference database before we have
detected the remote's object format. With these refactorings we can now
defer initialization of the reference database until after we have
learned the remote's object format and thus initialize it with the
correct format from the get-go.
These refactorings are required to make git-clone(1) work with the
upcoming reftable backend when cloning repositories with the SHA256
object format.
This change breaks a test in "t5550-http-fetch-dumb.sh" when cloning an
empty repository with `GIT_TEST_DEFAULT_HASH=sha256`. The test expects
the resulting hash format of the empty cloned repository to match the
default hash, but now we always end up with a sha1 repository. The
problem is that for dumb HTTP fetches, we have no easy way to figure out
the remote's hash function except for deriving it based on the hash
length of refs in `info/refs`. But as the remote repository is empty we
cannot rely on this detection mechanism.
Before the change in this commit we already initialized the repository
with the default hash function and then left it as-is. With this patch
we always use the hash function detected via the remote, where we fall
back to "sha1" in case we cannot detect it.
Neither the old nor the new behaviour are correct as we second-guess the
remote hash function in both cases. But given that this is a rather
unlikely edge case (we use the dumb HTTP protocol, the remote repository
uses SHA256 and the remote repository is empty), let's simply adapt the
test to assert the new behaviour. If we want to properly address this
edge case in the future we will have to extend the dumb HTTP protocol so
that we can properly detect the hash function for empty repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They are equivalents and the former still exists, so as long as the
only change this commit makes are to rewrite test_i18ngrep to
test_grep, there won't be any new bug, even if there still are
callers of test_i18ngrep remaining in the tree, or when merged to
other topics that add new uses of test_i18ngrep.
This patch was produced more or less with
git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' |
xargs perl -p -i -e 's/test_i18ngrep /test_grep /'
and a good way to sanity check the result yourself is to run the
above in a checkout of c4603c1c (test framework: further deprecate
test_i18ngrep, 2023-10-31) and compare the resulting working tree
contents with the result of applying this patch to the same commit.
You'll see that test_i18ngrep in a few t/lib-*.sh files corrected,
in addition to the manual reproduction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The CodingGuidelines says we should avoid \{m,n\} in BRE usage.
And their usages in our code base is limited, and subjectively
hard to read.
Replace them with ERE.
Except for "0\{40\}" which would be changed to "$ZERO_OID",
which is a better value for testing with:
GIT_TEST_DEFAULT_HASH=sha256
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, HTTP transport clients learned to tell the server side
what locale they are in by sending Accept-Language HTTP header, but
this was done only for some requests but not others.
* ll/curl-accept-language:
remote-curl: send Accept-Language header to server
Git server end's ability to accept Accept-Language header was introduced
in f18604bbf2 (http: add Accept-Language header if possible, 2015-01-28),
but this is only used by very early phase of the transfer, which is HTTP
GET request to discover references. For other phases, like POST request
in the smart HTTP, the server does not know what language the client
speaks.
Teach git client to learn end-user's preferred language and throw
accept-language header to the server side. Once the server gets this header,
it has the ability to talk to end-user with language they understand.
This would be very helpful for many non-English speakers.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/refs to explicitly create the
directory by using the "--template=" argument to "git init".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the http tests to use "test_hook" insteadd of
"write_script". In both cases we can get rid of sub-shelling. For
"t/t5550-http-fetch-dumb.sh" add a trivial helper which sets up the
hook and calls "update-server-info".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb HTTP protocol exposes ref storage details as part of the protocol,
so it only works with the FILES refstorage backend
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The approach to "fsck" the incoming objects in "index-pack" is
attractive for performance reasons (we have them already in core,
inflated and ready to be inspected), but fundamentally cannot be
applied fully when we receive more than one pack stream, as a tree
object in one pack may refer to a blob object in another pack as
".gitmodules", when we want to inspect blobs that are used as
".gitmodules" file, for example. Teach "index-pack" to emit
objects that must be inspected later and check them in the calling
"fetch-pack" process.
* jt/transfer-fsck-across-packs:
fetch-pack: print and use dangling .gitmodules
fetch-pack: with packfile URIs, use index-pack arg
http-fetch: allow custom index-pack args
http: allow custom index-pack args
This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.
The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This trick was performed via
$ (cd t &&
sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
-e 's/Master/Main/g' -e 's/retsam/niam/g' \
-- t55[4-9]*.sh t556x*)
This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.
Note that t5541 uses the reversed `master` name: `retsam`. We replace it
by the equivalent for `main`: `niam`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`)
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specificially, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; They were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
SHA-256 migration work continues.
* bc/sha-256-part-2: (44 commits)
remote-testgit: adapt for object-format
bundle: detect hash algorithm when reading refs
t5300: pass --object-format to git index-pack
t5704: send object-format capability with SHA-256
t5703: use object-format serve option
t5702: offer an object-format capability in the test
t/helper: initialize the repository for test-sha1-array
remote-curl: avoid truncating refs with ls-remote
t1050: pass algorithm to index-pack when outside repo
builtin/index-pack: add option to specify hash algorithm
remote-curl: detect algorithm for dumb HTTP by size
builtin/ls-remote: initialize repository based on fetch
t5500: make hash independent
serve: advertise object-format capability for protocol v2
connect: parse v2 refs with correct hash algorithm
connect: pass full packet reader when parsing v2 refs
Documentation/technical: document object-format for protocol v2
t1302: expect repo format version 1 for SHA-256
builtin/show-index: provide options to determine hash algo
t5302: modernize test formatting
...
When reading the info/refs file for a repository, we have no explicit
way to detect which hash algorithm is in use because the file doesn't
provide one. Detect the hash algorithm in use by the size of the first
object ID.
If we have an empty repository, we don't know what the hash algorithm is
on the remote side, so default to whatever the local side has
configured. Without doing this, we cannot clone an empty repository
since we don't know its hash algorithm. Test this case appropriately,
since we currently have no tests for cloning an empty repository with
the dumb HTTP protocol.
We anonymize the URL like elsewhere in the function in case the user has
decided to include a secret in the URL.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach http-fetch the ability to download packfiles directly, given a
URL, and to verify them.
The http_pack_request suite has been augmented with a function that
takes a URL directly. With this function, the hash is only used to
determine the name of the temporary file.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until "credential: refuse to operate when missing host or protocol",
Git's credential handling code interpreted URLs with empty scheme to
mean "give me credentials matching this host for any protocol".
Luckily libcurl does not recognize such URLs (it tries to look for a
protocol named "" and fails). Just in case that changes, let's reject
them within Git as well. This way, credential_from_url is guaranteed to
always produce a "struct credential" with protocol and host set.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
libcurl permits making requests without a URL scheme specified. In
this case, it guesses the URL from the hostname, so I can run
git ls-remote http::ftp.example.com/path/to/repo
and it would make an FTP request.
Any user intentionally using such a URL is likely to have made a typo.
Unfortunately, credential_from_url is not able to determine the host and
protocol in order to determine appropriate credentials to send, and
until "credential: refuse to operate when missing host or protocol",
this resulted in another host's credentials being leaked to the named
host.
Teach credential_from_url_gently to consider such a URL to be invalid
so that fsck can detect and block gitmodules files with such URLs,
allowing server operators to avoid serving them to downstream users
running older versions of Git.
This also means that when such URLs are passed on the command line, Git
will print a clearer error so affected users can switch to the simpler
URL that explicitly specifies the host and protocol they intend.
One subtlety: .gitmodules files can contain relative URLs, representing
a URL relative to the URL they were cloned from. The relative URL
resolver used for .gitmodules can follow ".." components out of the path
part and past the host part of a URL, meaning that such a relative URL
can be used to traverse from a https://foo.example.com/innocent
superproject to a https::attacker.example.com/exploit submodule.
Fortunately a leading ':' in the first path component after a series of
leading './' and '../' components is unlikely to show up in other
contexts, so we can catch this by detecting that pattern.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
In the 'did not use upload-pack service' test, we have a complicated
song-and-dance to ensure that there are no "/git-upload-pack" lines in
"$HTTPD_ROOT_PATH/access.log". Simplify this by just checking that grep
returns a non-zero exit code.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test framework update to more robustly clean up leftover files and
processes after tests are done.
* sg/test-atexit:
t9811-git-p4-label-import: fix pipeline negation
git p4 test: disable '-x' tracing in the p4d watchdog loop
git p4 test: simplify timeout handling
git p4 test: clean up the p4d cleanup functions
git p4 test: use 'test_atexit' to kill p4d and the watchdog process
t0301-credential-cache: use 'test_atexit' to stop the credentials helper
tests: use 'test_atexit' to stop httpd
git-daemon: use 'test_atexit` to stop 'git-daemon'
test-lib: introduce 'test_atexit'
t/lib-git-daemon: make sure to kill the 'git-daemon' process
test-lib: fix interrupt handling with 'dash' and '--verbose-log -x'
If the dumb-http walker encounters a 404 when fetching a loose object,
it then looks at any http-alternates for the object. The 404 check is
implemented by missing_target(), which checks not only the http code,
but also that we got an http error from the CURLcode.
That broke when we stopped using CURLOPT_FAILONERROR in 17966c0a63
(http: avoid disconnecting on 404s for loose objects, 2016-07-11), since
our CURLcode will now be CURLE_OK. As a result, fetching over dumb-http
from a repository with alternates could result in Git printing "Unable
to find abcd1234..." and aborting.
We could probably fix this just by loosening missing_target(). However,
there's other code which looks at the curl result, and it would have to
be tweaked as well. Instead, let's just normalize the result the same
way the smart-http code does.
There's a similar case in processing the alternates (where we failover
from "info/http-alternates" to "info/alternates"). We'll give it the
same treatment.
After this patch, we should be hitting all code paths that need this
normalization (notably absent here is the http_pack_request path, but it
does not use FAILONERROR, nor missing_target()).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use 'test_atexit' to run cleanup commands to stop httpd at the end of
the test script or upon interrupt or failure, as it is shorter,
simpler, and more robust than registering such cleanup commands in the
trap on EXIT in the test scripts.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a follow-up to a6c786fce8 (Mark http-fetch without -a as
deprecated, 2011-08-23). For more than six years, we have been warning
when `-a` is not provided, and the documentation has been saying that
`-a` will become the default.
It is a bit unclear what "default" means here. There is no such thing as
`http-fetch --no-a`. But according to my searches, no-one has been
asking on the mailing list how they should silence the warning and
prepare for overriding the flipped default. So let's assume that
everybody is happy with `-a`. They should be, since not using it may
break the repo in such a way that Git itself is unable to fix it.
Always behave as if `-a` was given. Since `-a` implies `-c` (get commit
objects) and `-t` (get trees), all three options are now unnecessary.
Document all of these as historical artefacts that have no effect.
Leave no-op code for handling these options in http-fetch.c. The
options-handling is currently rather loose. If someone tightens it, we
will not want these ignored options to accidentally turn into hard
errors.
Since `-a` was the only safe and sane usage and we have been pushing
people towards it for a long time, refrain from warning when it is used
"unnecessarily" now. Similarly, do not add anything scary-looking to the
man-page about how it will be removed in the future. We can always do so
later. (It is not like we are in desperate need of freeing up
one-letter arguments.)
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The post-update hooks created in t5550-http-fetch-dumb.sh is missing the
"!#/bin/sh" line which can cause issues with portability. Instead
create the hook using the 'write_script' function which includes the
proper "#!" line.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a redirected http transport gets an error during the
redirected request, we ignored the error we got from the server,
and ended up giving a not-so-useful error message.
* jt/http-base-url-update-upon-redirect:
http: attempt updating base URL only if no error
http.c supports HTTP redirects of the form
http://foo/info/refs?service=git-upload-pack
-> http://anything
-> http://bar/info/refs?service=git-upload-pack
(that is to say, as long as the Git part of the path and the query
string is preserved in the final redirect destination, the intermediate
steps can have any URL). However, if one of the intermediate steps
results in an HTTP exception, a confusing "unable to update url base
from redirection" message is printed instead of a Curl error message
with the HTTP exception code.
This was introduced by 2 commits. Commit c93c92f ("http: update base
URLs when we see redirects", 2013-09-28) introduced a best-effort
optimization that required checking if only the "base" part of the URL
differed between the initial request and the final redirect destination,
but it performed the check before any HTTP status checking was done. If
something went wrong, the normal code path was still followed, so this
did not cause any confusing error messages until commit 6628eb4 ("http:
always update the base URL for redirects", 2016-12-06), which taught
http to die if the non-"base" part of the URL differed.
Therefore, teach http to check the HTTP status before attempting to
check if only the "base" part of the URL differed. This commit teaches
http_request_reauth to return early without updating options->base_url
upon an error; the only invoker of this function that passes a non-NULL
"options" is remote-curl.c (through "http_get_strbuf"), which only uses
options->base_url for an informational message in the situations that
this commit cares about (that is, when the return value is not HTTP_OK).
The included test checks that the redirect scheme at the beginning of
this commit message works, and that returning a 502 in the middle of the
redirect scheme produces the correct result. Note that this is different
from the test in commit 6628eb4 ("http: always update the base URL for
redirects", 2016-12-06) in that this commit tests that a Git-shaped URL
(http://.../info/refs?service=git-upload-pack) works, whereas commit
6628eb4 tests that a non-Git-shaped URL
(http://.../info/refs/foo?service=git-upload-pack) does not work (even
though Git is processing that URL) and is an error that is fatal, not
silently swallowed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git ls-remote" and "git archive --remote" are designed to work
without being in a directory under Git's control. However, recent
updates revealed that we randomly look into a directory called
.git/ without actually doing necessary set-up when working in a
repository. Stop doing so.
* jn/remote-helpers-with-git-dir:
remote helpers: avoid blind fall-back to ".git" when setting GIT_DIR
remote: avoid reading $GIT_DIR config in non-repo
To push from or fetch to the current repository, remote helpers need
to know what repository that is. Accordingly, Git sets the GIT_DIR
environment variable to the path to the current repository when
invoking remote helpers.
There is a special case it does not handle: "git ls-remote" and "git
archive --remote" can be run to inspect a remote repository without
being run from any local repository. GIT_DIR is not useful in this
scenario:
- if we are not in a repository, we don't need to set GIT_DIR to
override an existing GIT_DIR value from the environment. If GIT_DIR
is present then we would be in a repository if it were valid and
would have called die() if it weren't.
- not setting GIT_DIR may cause a helper to do the usual discovery
walk to find the repository. But we know we're not in one, or we
would have found it ourselves. So in the worst case it may expend
a little extra effort to try to find a repository and fail (for
example, remote-curl would do this to try to find repository-level
configuration).
So leave GIT_DIR unset in this case. This makes GIT_DIR easier to
understand for remote helper authors and makes transport code less of
a special case for repository discovery.
Noticed using b1ef400e (setup_git_env: avoid blind fall-back to
".git", 2016-10-20) from 'next':
$ cd /tmp
$ git ls-remote https://kernel.googlesource.com/pub/scm/git/git
fatal: BUG: setup_git_env called without repository
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the error messages from the dumb-http client when it fails
to obtain loose objects; we used to give sensible error message
only upon 404 but we now forbid unexpected redirects that needs to
be reported with something sensible.
* jk/http-walker-limit-redirect:
http-walker: complain about non-404 loose object errors
http: treat http-alternates like redirects
http: make redirects more obvious
remote-curl: rename shadowed options variable
http: always update the base URL for redirects
http: simplify update_url_from_redirect
Finer-grained control of what protocols are allowed for transports
during clone/fetch/push have been enabled via a new configuration
mechanism.
* bw/transport-protocol-policy:
http: respect protocol.*.allow=user for http-alternates
transport: add from_user parameter to is_transport_allowed
http: create function to get curl allowed protocols
transport: add protocol policy config option
http: always warn if libcurl version is too old
lib-proto-disable: variable name fix
Transport with dumb http can be fooled into following foreign URLs
that the end user does not intend to, especially with the server
side redirects and http-alternates mechanism, which can lead to
security issues. Tighten the redirection and make it more obvious
to the end user when it happens.
* jk/http-walker-limit-redirect-2.9:
http: treat http-alternates like redirects
http: make redirects more obvious
remote-curl: rename shadowed options variable
http: always update the base URL for redirects
http: simplify update_url_from_redirect
The http-walker may fetch the http-alternates (or
alternates) file from a remote in order to find more
objects. This should count as a "not from the user" use of
the protocol. But because we implement the redirection
ourselves and feed the new URL to curl, it will use the
CURLOPT_PROTOCOLS rules, not the more restrictive
CURLOPT_REDIR_PROTOCOLS.
The ideal solution would be for each curl request we make to
know whether or not is directly from the user or part of an
alternates redirect, and then set CURLOPT_PROTOCOLS as
appropriate. However, that would require plumbing that
information through all of the various layers of the http
code.
Instead, let's check the protocol at the source: when we are
parsing the remote http-alternates file. The only downside
is that if there's any mismatch between what protocol we
think it is versus what curl thinks it is, it could violate
the policy.
To address this, we'll make the parsing err on the picky
side, and only allow protocols that it can parse
definitively. So for example, you can't elude the "http"
policy by asking for "HTTP://", even though curl might
handle it; we would reject it as unknown. The only unsafe
case would be if you have a URL that starts with "http://"
but curl interprets as another protocol. That seems like an
unlikely failure mode (and we are still protected by our
base CURLOPT_PROTOCOL setting, so the worst you could do is
trigger one of https, ftp, or ftps).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit made HTTP redirects more obvious and
tightened up the default behavior. However, there's another
way for a server to ask a git client to fetch arbitrary
content: by having an http-alternates file (or a regular
alternates file, which is used as a backup).
Similar to the HTTP redirect case, a malicious server can
claim to have refs pointing at object X, return a 404 when
the client asks for X, but point to some other URL via
http-alternates, which the client will transparently fetch.
The end result is that it looks from the user's perspective
like the objects came from the malicious server, as the
other URL is not mentioned at all.
Worse, because we feed the new URL to curl ourselves, the
usual protocol restrictions do not kick in (neither curl's
default of disallowing file://, nor the protocol
whitelisting in f4113cac0 (http: limit redirection to
protocol-whitelist, 2015-09-22).
Let's apply the same rules here as we do for HTTP redirects.
Namely:
- unless http.followRedirects is set to "always", we will
not follow remote redirects from http-alternates (or
alternates) at all
- set CURLOPT_PROTOCOLS alongside CURLOPT_REDIR_PROTOCOLS
restrict ourselves to a known-safe set and respect any
user-provided whitelist.
- mention alternate object stores on stderr so that the
user is aware another source of objects may be involved
The first item may prove to be too restrictive. The most
common use of alternates is to point to another path on the
same server. While it's possible for a single-server
redirect to be an attack, it takes a fairly obscure setup
(victim and evil repository on the same host, host speaks
dumb http, and evil repository has access to edit its own
http-alternates file).
So we could make the checks more specific, and only cover
cross-server redirects. But that means parsing the URLs
ourselves, rather than letting curl handle them. This patch
goes for the simpler approach. Given that they are only used
with dumb http, http-alternates are probably pretty rare.
And there's an escape hatch: the user can allow redirects on
a specific server by setting http.<url>.followRedirects to
"always".
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We instruct curl to always follow HTTP redirects. This is
convenient, but it creates opportunities for malicious
servers to create confusing situations. For instance,
imagine Alice is a git user with access to a private
repository on Bob's server. Mallory runs her own server and
wants to access objects from Bob's repository.
Mallory may try a few tricks that involve asking Alice to
clone from her, build on top, and then push the result:
1. Mallory may simply redirect all fetch requests to Bob's
server. Git will transparently follow those redirects
and fetch Bob's history, which Alice may believe she
got from Mallory. The subsequent push seems like it is
just feeding Mallory back her own objects, but is
actually leaking Bob's objects. There is nothing in
git's output to indicate that Bob's repository was
involved at all.
The downside (for Mallory) of this attack is that Alice
will have received Bob's entire repository, and is
likely to notice that when building on top of it.
2. If Mallory happens to know the sha1 of some object X in
Bob's repository, she can instead build her own history
that references that object. She then runs a dumb http
server, and Alice's client will fetch each object
individually. When it asks for X, Mallory redirects her
to Bob's server. The end result is that Alice obtains
objects from Bob, but they may be buried deep in
history. Alice is less likely to notice.
Both of these attacks are fairly hard to pull off. There's a
social component in getting Mallory to convince Alice to
work with her. Alice may be prompted for credentials in
accessing Bob's repository (but not always, if she is using
a credential helper that caches). Attack (1) requires a
certain amount of obliviousness on Alice's part while making
a new commit. Attack (2) requires that Mallory knows a sha1
in Bob's repository, that Bob's server supports dumb http,
and that the object in question is loose on Bob's server.
But we can probably make things a bit more obvious without
any loss of functionality. This patch does two things to
that end.
First, when we encounter a whole-repo redirect during the
initial ref discovery, we now inform the user on stderr,
making attack (1) much more obvious.
Second, the decision to follow redirects is now
configurable. The truly paranoid can set the new
http.followRedirects to false to avoid any redirection
entirely. But for a more practical default, we will disallow
redirects only after the initial ref discovery. This is
enough to thwart attacks similar to (2), while still
allowing the common use of redirects at the repository
level. Since c93c92f30 (http: update base URLs when we see
redirects, 2013-09-28) we re-root all further requests from
the redirect destination, which should generally mean that
no further redirection is necessary.
As an escape hatch, in case there really is a server that
needs to redirect individual requests, the user can set
http.followRedirects to "true" (and this can be done on a
per-server basis via http.*.followRedirects config).
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>