Commit Graph

23 Commits (next)

Author SHA1 Message Date
Junio C Hamano e1775c0646 Sync with 2.49.1 2025-06-15 21:54:23 -07:00
Taylor Blau 856b515a46 Sync with 2.47.3
* maint-2.47:
  Git 2.47.3
  Git 2.46.4
  Git 2.45.4
  Git 2.44.4
  Git 2.43.7
  wincred: avoid buffer overflow in wcsncat()
  bundle-uri: fix arbitrary file writes via parameter injection
  config: quote values containing CR character
  git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
  git-gui: do not mistake command arguments as redirection operators
  git-gui: introduce function git_redir for git calls with redirections
  git-gui: pass redirections as separate argument to git_read
  git-gui: pass redirections as separate argument to _open_stdout_stderr
  git-gui: convert git_read*, git_write to be non-variadic
  git-gui: override exec and open only on Windows
  gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
  git-gui: use git_read in githook_read
  git-gui: sanitize $PATH on all platforms
  git-gui: break out a separate function git_read_nice
  git-gui: assure PATH has only absolute elements.
  git-gui: remove option --stderr from git_read
  git-gui: cleanup git-bash menu item
  git-gui: sanitize 'exec' arguments: background
  git-gui: avoid auto_execok in do_windows_shortcut
  git-gui: sanitize 'exec' arguments: simple cases
  git-gui: avoid auto_execok for git-bash menu item
  git-gui: treat file names beginning with "|" as relative paths
  git-gui: remove unused proc is_shellscript
  git-gui: remove git config --list handling for git < 1.5.3
  git-gui: remove special treatment of Windows from open_cmd_pipe
  git-gui: remove HEAD detachment implementation for git < 1.5.3
  git-gui: use only the configured shell
  git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
  git-gui: make _shellpath usable on startup
  git-gui: use [is_Windows], not bad _shellpath
  git-gui: _which, only add .exe suffix if not present
  gitk: encode arguments correctly with "open"
  gitk: sanitize 'open' arguments: command pipeline
  gitk: collect construction of blameargs into a single conditional
  gitk: sanitize 'open' arguments: simple commands, readable and writable
  gitk: sanitize 'open' arguments: simple commands with redirections
  gitk: sanitize 'open' arguments: simple commands
  gitk: sanitize 'exec' arguments: redirect to process
  gitk: sanitize 'exec' arguments: redirections and background
  gitk: sanitize 'exec' arguments: redirections
  gitk: sanitize 'exec' arguments: 'eval exec'
  gitk: sanitize 'exec' arguments: simple cases
  gitk: have callers of diffcmd supply pipe symbol when necessary
  gitk: treat file names beginning with "|" as relative paths
2025-05-28 15:17:05 -04:00
Taylor Blau 199837cd4d Sync with 2.45.4
* maint-2.45:
  Git 2.45.4
  Git 2.44.4
  Git 2.43.7
  wincred: avoid buffer overflow in wcsncat()
  bundle-uri: fix arbitrary file writes via parameter injection
  config: quote values containing CR character
  git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
  git-gui: do not mistake command arguments as redirection operators
  git-gui: introduce function git_redir for git calls with redirections
  git-gui: pass redirections as separate argument to git_read
  git-gui: pass redirections as separate argument to _open_stdout_stderr
  git-gui: convert git_read*, git_write to be non-variadic
  git-gui: override exec and open only on Windows
  gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
  git-gui: use git_read in githook_read
  git-gui: sanitize $PATH on all platforms
  git-gui: break out a separate function git_read_nice
  git-gui: assure PATH has only absolute elements.
  git-gui: remove option --stderr from git_read
  git-gui: cleanup git-bash menu item
  git-gui: sanitize 'exec' arguments: background
  git-gui: avoid auto_execok in do_windows_shortcut
  git-gui: sanitize 'exec' arguments: simple cases
  git-gui: avoid auto_execok for git-bash menu item
  git-gui: treat file names beginning with "|" as relative paths
  git-gui: remove unused proc is_shellscript
  git-gui: remove git config --list handling for git < 1.5.3
  git-gui: remove special treatment of Windows from open_cmd_pipe
  git-gui: remove HEAD detachment implementation for git < 1.5.3
  git-gui: use only the configured shell
  git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
  git-gui: make _shellpath usable on startup
  git-gui: use [is_Windows], not bad _shellpath
  git-gui: _which, only add .exe suffix if not present
  gitk: encode arguments correctly with "open"
  gitk: sanitize 'open' arguments: command pipeline
  gitk: collect construction of blameargs into a single conditional
  gitk: sanitize 'open' arguments: simple commands, readable and writable
  gitk: sanitize 'open' arguments: simple commands with redirections
  gitk: sanitize 'open' arguments: simple commands
  gitk: sanitize 'exec' arguments: redirect to process
  gitk: sanitize 'exec' arguments: redirections and background
  gitk: sanitize 'exec' arguments: redirections
  gitk: sanitize 'exec' arguments: 'eval exec'
  gitk: sanitize 'exec' arguments: simple cases
  gitk: have callers of diffcmd supply pipe symbol when necessary
  gitk: treat file names beginning with "|" as relative paths

Signed-off-by: Taylor Blau <me@ttaylorr.com>
2025-05-28 14:57:08 -04:00
Taylor Blau a162459bf6 Sync with 2.43.7
* maint-2.43:
  Git 2.43.7
  wincred: avoid buffer overflow in wcsncat()
  bundle-uri: fix arbitrary file writes via parameter injection
  config: quote values containing CR character
  git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
  git-gui: do not mistake command arguments as redirection operators
  git-gui: introduce function git_redir for git calls with redirections
  git-gui: pass redirections as separate argument to git_read
  git-gui: pass redirections as separate argument to _open_stdout_stderr
  git-gui: convert git_read*, git_write to be non-variadic
  git-gui: override exec and open only on Windows
  gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
  git-gui: use git_read in githook_read
  git-gui: sanitize $PATH on all platforms
  git-gui: break out a separate function git_read_nice
  git-gui: assure PATH has only absolute elements.
  git-gui: remove option --stderr from git_read
  git-gui: cleanup git-bash menu item
  git-gui: sanitize 'exec' arguments: background
  git-gui: avoid auto_execok in do_windows_shortcut
  git-gui: sanitize 'exec' arguments: simple cases
  git-gui: avoid auto_execok for git-bash menu item
  git-gui: treat file names beginning with "|" as relative paths
  git-gui: remove unused proc is_shellscript
  git-gui: remove git config --list handling for git < 1.5.3
  git-gui: remove special treatment of Windows from open_cmd_pipe
  git-gui: remove HEAD detachment implementation for git < 1.5.3
  git-gui: use only the configured shell
  git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
  git-gui: make _shellpath usable on startup
  git-gui: use [is_Windows], not bad _shellpath
  git-gui: _which, only add .exe suffix if not present
  gitk: encode arguments correctly with "open"
  gitk: sanitize 'open' arguments: command pipeline
  gitk: collect construction of blameargs into a single conditional
  gitk: sanitize 'open' arguments: simple commands, readable and writable
  gitk: sanitize 'open' arguments: simple commands with redirections
  gitk: sanitize 'open' arguments: simple commands
  gitk: sanitize 'exec' arguments: redirect to process
  gitk: sanitize 'exec' arguments: redirections and background
  gitk: sanitize 'exec' arguments: redirections
  gitk: sanitize 'exec' arguments: 'eval exec'
  gitk: sanitize 'exec' arguments: simple cases
  gitk: have callers of diffcmd supply pipe symbol when necessary
  gitk: treat file names beginning with "|" as relative paths

Signed-off-by: Taylor Blau <me@ttaylorr.com>
2025-05-28 14:47:12 -04:00
Patrick Steinhardt' via Git Security 35cb1bb0b9 bundle-uri: fix arbitrary file writes via parameter injection
We fetch bundle URIs via `download_https_uri_to_file()`. The logic to
fetch those bundles is not handled in-process, but we instead use a
separate git-remote-https(1) process that performs the fetch for us. The
information about which file should be downloaded and where that file
should be put gets communicated via stdin of that process via a "get"
request. This "get" request has the form "get $uri $file\n\n". As may be
obvious to the reader, this will cause git-remote-https(1) to download
the URI "$uri" and put it into "$file".

The fact that we are using plain spaces and newlines as separators for
the request arguments means that we have to be extra careful with the
respective vaules of these arguments:

  - If "$uri" contained a space we would interpret this as both URI and
    target location.

  - If either "$uri" or "$file" contained a newline we would interpret
    this as a new command.

But we neither quote the arguments such that any characters with special
meaning would be escaped, nor do we verify that none of these special
characters are contained.

If either the URI or file contains a newline character, we are open to
protocol injection attacks. Likewise, if the URI itself contains a
space, then an attacker-controlled URI can lead to partially-controlled
file writes.

Note that the attacker-controlled URIs do not permit completely
arbitrary file writes, but instead allows an attacker to control the
path in which we will write a temporary (e.g., "tmp_uri_XXXXXX")
file.

The result is twofold:

  - By adding a space in "$uri" we can control where exactly a file will
    be written to, including out-of-repository writes. The final
    location is not completely arbitrary, as the injected string will be
    concatenated with the original "$file" path. Furthermore, the name
    of the bundle will be "tmp_uri_XXXXXX", further restricting what an
    adversary would be able to write.

    Also note that is not possible for the URI to contain a newline
    because we end up in `credential_from_url_1()` before we try to
    issue any requests using that URI. As such, it is not possible to
    inject arbitrary commands via the URI.

  - By adding a newline to "$file" we can inject arbitrary commands.
    This gives us full control over where a specific file will be
    written to. Potential attack vectors would be to overwrite hooks,
    but if an adversary were to guess where the user's home directory is
    located they might also easily write e.g. a "~/.profile" file and
    thus cause arbitrary code execution.

    This injection can only become possible when the adversary has full
    control over the target path where a bundle will be downloaded to.
    While this feels unlikely, it is possible to control this path when
    users perform a recursive clone with a ".gitmodules" file that is
    controlled by the adversary.

Luckily though, the use of bundle URIs is not enabled by default in Git
clients (yet): they have to be enabled by setting the `bundle.heuristic`
config key explicitly. As such, the blast radius of this parameter
injection should overall be quite contained.

Fix the issue by rejecting spaces in the URI and newlines in both the
URI and the file. As explained, it shouldn't be required to also
restrict the use of newlines in the URI, as we would eventually die
anyway in `credential_from_url_1()`. But given that we're only one small
step away from arbitrary code execution, let's rather be safe and
restrict newlines in URIs, as well.

Eventually we should probably refactor the way that Git talks with the
git-remote-https(1) subprocess so that it is less fragile. Until then,
these two restrictions should plug the issue.

Reported-by: David Leadbeater <dgl@dgl.cx>
Based-on-patch-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2025-05-23 17:09:48 -04:00
Scott Chacon 435b076ceb bundle-uri: add test for bundle-uri clones with tags
The change to the bundle-uri unbundling refspec now includes tags, so this
adds a very, very simple test to make sure that tags in a bundle are
properly added to the cloned repository and will be included in ref
negotiation with the subsequent fetch.

Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-25 13:36:45 -07:00
Scott Chacon c858c6442b bundle-uri: copy all bundle references ino the refs/bundle space
When downloading bundles via the bundle-uri functionality, we only copy the
references from refs/heads into the refs/bundle space. I'm not sure why this
refspec is hardcoded to be so limited, but it makes the ref negotiation on
the subsequent fetch suboptimal, since it won't use objects that are
referenced outside of the current heads of the bundled repository.

This change to copy everything in refs/ in the bundle to refs/bundles/
significantly helps the subsequent fetch, since nearly all the references
are now included in the negotiation.

The update to the bundle-uri unbundling refspec puts all the heads from a
bundle file into refs/bundle/heads instead of directly into refs/bundle/ so
the tests also need to be updated to look in the new heirarchy.

Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-25 13:36:45 -07:00
Andrew Kreimer f56f9d6c0b t: fix typos
Fix typos and grammar in documentation, comments, etc.

Via codespell.

Reported-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-24 12:45:53 -04:00
Xing Xin 63d903ff52 unbundle: extend object verification for fetches
The existing fetch.fsckObjects and transfer.fsckObjects configurations
were not fully applied to bundle-involved fetches, including direct
bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
object verification support for unbundle.

This commit extends object verification support in `bundle.c:unbundle`
by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
this option is enabled, we append the `--fsck-objects` flag to
`git-index-pack`.

The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
to enable this option for `bundle.c:unbundle`, specifically in:

- `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
- `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.

This addition ensures a consistent logic for object verification during
fetches. Tests have been added to confirm functionality in the scenarios
mentioned above.

Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:30:08 -07:00
Xing Xin 3079026fc1 bundle-uri: verify oid before writing refs
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the
"creationTokens" heuristic.

To reproduce this issue, consider a repository with a single branch
"main" pointing to commit "A". Firstly, create a base bundle with:

  git bundle create base.bundle main

Then, add a new commit "B" on top of "A", and create an incremental
bundle for "main":

  git bundle create incr.bundle A..main

Now, generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above should result in a reference
"refs/bundles/main" pointing to "B" in the new repository. However, git
would still download everything from the server, as if it had fetched
nothing locally.

So why the "refs/bundles/main" is not discovered? After some digging I
found that:

1. Bundles in bundle list are downloaded to local files via
   `bundle-uri.c:download_bundle_list` or via
   `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
   heuristic.
2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which
   is called by `bundle-uri.c:unbundle_all_bundles` or called within
   `bundle-uri.c:fetch_bundles_by_token` for the "creationToken"
   heuristic.
3. To get all prerequisites of the bundle, the bundle header is read
   inside `bundle-uri.c:unbundle_from_file` to by calling
   `bundle.c:read_bundle_header`.
4. Then it calls `bundle.c:unbundle`, which calls
   `bundle.c:verify_bundle` to ensure the repository contains all the
   prerequisites.
5. `bundle.c:verify_bundle` calls `parse_object`, which eventually
   invokes `packfile.c:prepare_packed_git` or
   `packfile.c:reprepare_packed_git`, filling
   `raw_object_store->packed_git` and setting `packed_git_initialized`.
6. If `bundle.c:unbundle` succeeds, it writes refs via
   `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here
   bundle refs which can target arbitrary objects are written to the
   repository.
7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions
   `fetch-pack.c:mark_complete_and_common_ref` and
   `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to
   find local tips for negotiation. The `OBJECT_INFO_QUICK` flag
   prevents `packfile.c:reprepare_packed_git` from being called,
   resulting in failures to parse OIDs that reside only in the latest
   bundle.

In the example above, when unbunding "incr.bundle", "base.pack" is added
to `packed_git` due to prerequisites verification. However, "B" cannot
be found for negotiation because it exists in "incr.pack", which is not
included in `packed_git`.

Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing
bundle refs. When `refs.c:refs_update_ref` is called to write the
corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
`transaction_prepare` of the refs storage backend. For files backend, it
is `files-backend.c:files_transaction_prepare`, and for reftable
backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
Both functions eventually call `object.c:parse_object`, which can invoke
`packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
that bundle refs point to valid objects and that all tips from bundle
refs are correctly parsed during subsequent negotiations.

A set of negotiation-related tests for cloning with bundle-uri has been
included to demonstrate that downloaded bundles are utilized to
accelerate fetching.

Additionally, another test has been added to show that bundles with
incorrect headers, where refs point to non-existent objects, do not
result in any bundle refs being created in the repository.

Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20 10:30:07 -07:00
Patrick Steinhardt 9159029329 builtin/clone: fix bundle URIs with mismatching object formats
We create the reference database in git-clone(1) quite early before
connecting to the remote repository. Given that we do not yet know about
the object format that the remote repository uses at that point in time
the consequence is that the refdb may be initialized with the wrong
object format.

This is not a problem in the context of the files backend as we do not
encode the object format anywhere, and furthermore the only reference
that we write between initializing the refdb and learning about the
object format is the "HEAD" symref. It will become a problem though once
we land the reftable backend, which indeed does require to know about
the proper object format at the time of creation. We thus need to
rearrange the logic in git-clone(1) so that we only initialize the refdb
once we have learned about the actual object format.

As a first step, move listing of remote references to happen earlier,
which also allow us to set up the hash algorithm of the repository
earlier now. While we aim to execute this logic as late as possible
until after most of the setup has happened already, detection of the
object format and thus later the setup of the reference database must
happen before any other logic that may spawn Git commands or otherwise
these Git commands may not recognize the repository as such.

The first Git step where we expect the repository to be fully initalized
is when we fetch bundles via bundle URIs. Funny enough, the comments
there also state that "the_repository must match the cloned repo", which
is indeed not necessarily the case for the hash algorithm right now. So
in practice it is the right thing to detect the remote's object format
before downloading bundle URIs anyway, and not doing so causes clones
with bundle URIs to fail when the local default object format does not
match the remote repository's format.

Unfortunately though, this creates a new issue: downloading bundles may
take a long time, so if we list refs beforehand they might've grown
stale meanwhile. It is not clear how to solve this issue except for a
second reference listing though after we have downloaded the bundles,
which may be an expensive thing to do.

Arguably though, it's preferable to have a staleness issue compared to
being unable to clone a repository altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12 11:16:54 -08:00
Derrick Stolee 25bccb4b79 fetch: download bundles once, even with --all
When fetch.bundleURI is set, 'git fetch' downloads bundles from the
given bundle URI before fetching from the specified remote. However,
when using non-file remotes, 'git fetch --all' will launch 'git fetch'
subprocesses which then read fetch.bundleURI and fetch the bundle list
again. We do not expect the bundle list to have new information during
these multiple runs, so avoid these extra calls by un-setting
fetch.bundleURI in the subprocess arguments.

Be careful to skip fetching bundles for the empty bundle string.
Fetching bundles from the empty list presents some interesting test
failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-31 10:07:33 -07:00
Derrick Stolee 026df9e047 bundle-uri: test missing bundles with heuristic
The creationToken heuristic uses a different mechanism for downloading
bundles from the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops, assigning invalid creation token values in config, but
also to be opportunistic as possible when downloading as many bundles as
seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee c429bed102 bundle-uri: store fetch.bundleCreationToken
When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 7f0cc04f2c fetch: fetch from an external bundle URI
When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 4074d3c7e1 clone: set fetch.bundleURI if appropriate
Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 7903efb717 bundle-uri: download in creationToken order
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 7bc73e7b61 t5558: add tests for creationToken heuristic
As documented in the bundle URI design doc in 2da14fad8f (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:47 -08:00
Derrick Stolee 8628a842bd bundle-uri: suppress stderr from remote-https
When downloading bundles from a git-remote-https subprocess, the bundle
URI logic wants to be opportunistic and download as much as possible and
work with what did succeed. This is particularly important in the "any"
mode, where any single bundle success will work.

If the URI is not available, the git-remote-https process will die()
with a "fatal:" error message, even though that error is not actually
fatal to the super process. Since stderr is passed through, it looks
like a fatal error to the user.

Suppress stderr to avoid these errors from bubbling to the surface. The
bundle URI API adds its own warning() messages on these failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 09:13:25 -07:00
Derrick Stolee 70334fc3eb bundle-uri: quiet failed unbundlings
When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. This is passed as
error messages to the user, even when they eventually succeed in later
attempts after their dependent bundles are unbundled.

Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
error messages from the missing prerequisite commits. The method still
returns the number of missing prerequisit commits, allowing callers to
unbundle() to notice that the bundle failed to apply.

Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 09:13:25 -07:00
Derrick Stolee c23f592117 bundle-uri: fetch a list of bundles
When the content at a given bundle URI is not understood as a bundle
(based on inspecting the initial content), then Git currently gives up
and ignores that content. Independent bundle providers may want to split
up the bundle content into multiple bundles, but still make them
available from a single URI.

Teach Git to attempt parsing the bundle URI content as a Git config file
providing the key=value pairs for a bundle list. Git then looks at the
mode of the list to see if ANY single bundle is sufficient or if ALL
bundles are required. The content at the selected URIs are downloaded
and the content is inspected again, creating a recursive process.

To guard the recursion against malformed or malicious content, limit the
recursion depth to a reasonable four for now. This can be converted to a
configured value in the future if necessary. The value of four is twice
as high as expected to be useful (a bundle list is unlikely to point to
more bundle lists).

To test this scenario, create an interesting bundle topology where three
incremental bundles are built on top of a single full bundle. By using a
merge commit, the two middle bundles are "independent" in that they do
not require each other in order to unbundle themselves. They each only
need the base bundle. The bundle containing the merge commit requires
both of the middle bundles, though. This leads to some interesting
decisions when unbundling, especially when we later implement heuristics
that promote downloading bundles until the prerequisite commits are
satisfied.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 09:13:25 -07:00
Derrick Stolee 59c1752ab6 bundle-uri: add support for http(s):// and file://
The previous change created the 'git clone --bundle-uri=<uri>' option.
Currently, <uri> must be a filename.

Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
and use git-remote-https as the way to download the data at that URI.
Otherwise, check to see if file:// is present and modify the prefix
accordingly.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 14:07:37 -07:00
Derrick Stolee 5556891961 clone: add --bundle-uri option
Cloning a remote repository is one of the most expensive operations in
Git. The server can spend a lot of CPU time generating a pack-file for
the client's request. The amount of data can clog the network for a long
time, and the Git protocol is not resumable. For users with poor network
connections or are located far away from the origin server, this can be
especially painful.

Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from
a bundle. If the user is aware of a bundle server, then they can tell
Git to bootstrap the new repository with these bundles before fetching
the remaining objects from the origin server.

Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 14:07:37 -07:00