Add a refetch option to fetch-pack to force a full fetch. Use when
applying a new partial clone filter to refetch all matching objects.
Signed-off-by: Robert Coup <robert@coup.net.nz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have established the command-line interface for the --[no-]filter
options for a while now, so we do not need a helper to make this
editable in the future.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future patch we plan to return the name of an unborn current branch
from deep in the callchain to a caller via a new pointer parameter that
points at a variable in the caller when the caller calls
get_remote_refs() and transport_get_remote_refs().
In preparation for that, encapsulate the existing ref_prefixes
parameter into a struct. The aforementioned unborn current branch will
go into this new struct in the future patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that Git has switched to using a subprocess to lazy-fetch missing
objects, remove the no_dependents code as it is no longer used.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated before the transaction is
complete, remote-curl will blindly forward the packets before waiting on
more input from fetch-pack. Meanwhile, fetch-pack will read the
transaction and continue reading, expecting more input to continue the
transaction. This results in a deadlock between the two processes.
This can be seen in the following command which does not terminate:
$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
whereas the v1 version does terminate as expected:
$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
fatal: the remote end hung up unexpectedly
Instead of blindly forwarding packets, make remote-curl insert a
response end packet after proxying the responses from the remote server
when using stateless_connect(). On the RPC client side, ensure that each
response ends as described.
A separate control packet is chosen because we need to be able to
differentiate between what the remote server sends and remote-curl's
control packets. By ensuring in the remote-curl code that a server
cannot send response end packets, we prevent a malicious server from
being able to perform a denial of service attack in which they spoof a
response end packet and cause the described deadlock to happen.
Reported-by: Force Charlie <charlieio@outlook.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.
Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).
I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need the caller of fetch_pack() to pass in "dest", which is the
remote URL. Since ba227857d2 (Reduce the number of connects when
fetching, 2008-02-04), the caller is responsible for calling
git_connect() itself, and our "dest" parameter is unused.
That commit also started passing us the resulting "conn" child_process
from git_connect(). But likewise, we do not need do anything with it.
The descriptors in "fd" are enough for us, and the caller is responsible
for cleaning up "conn".
We can just drop both parameters.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the scaffolding for protocol version 2 was initially added in
8f6982b4e1 ("protocol: introduce enum protocol_version value
protocol_v2", 2018-03-14). As seen in:
git log -p -G'support for protocol v2 not implemented yet' --full-diff --reverse v2.17.0..v2.20.0
Many of those scaffolding "die" placeholders were removed, but we
hadn't gotten around to fetch-pack yet.
The test here for "fetch refs from cmdline" is very minimal. There's
much better coverage when running the entire test suite under the WIP
GIT_TEST_PROTOCOL_VERSION=2 mode[1], we should ideally have better
coverage without needing to invoke a special test mode.
1. https://public-inbox.org/git/20181213155817.27666-1-avarab@gmail.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.
Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.
Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using GIT_SHA1_HEXSZ, use parse_oid_hex to compute a pointer
and use that in comparisons. This is both simpler to read and works
independent of the hash length. Update references to SHA-1 in the same
function to refer to object IDs instead.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When communicating with a v2 server, perform a fetch by requesting the
'fetch' command.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce protocol_v2, a new value for 'enum protocol_version'.
Subsequent patches will fill in the implementation of protocol_v2.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to prepare for the addition of protocol_v2 push the protocol
version discovery outside of 'get_remote_heads()'. This will allow for
keeping the logic for processing the reference advertisement for
protocol_v1 and protocol_v0 separate from the logic for protocol_v2.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach (partial) fetch to inherit the filter-spec used by
the partial clone. Extend --no-filter to override this
inheritance.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixup fetch-pack to accept --no-filter to be consistent with
rev-list and pack-objects.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.
The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.
However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.
In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
(1) functions in sha1_file.c that take in a hash, without the user
regarding how the object is stored (loose or packed)
(2) functions in packfile.c (because I need to check callers that know
about the loose/packed distinction and operate on both differently,
and ensure that they can handle the concept of objects that are
neither loose nor packed)
(1) is handled by the modification to sha1_object_info_extended().
For (2), I looked at for_each_packed_object and others. For
for_each_packed_object, the callers either already work or are fixed in
this patch:
- reachable - only to find recent objects
- builtin/fsck - already knows about missing objects
- builtin/cat-file - warning message added in this commit
Callers of the other functions do not need to be changed:
- parse_pack_index
- http - indirectly from http_get_info_packs
- find_pack_entry_one
- this searches a single pack that is provided as an argument; the
caller already knows (through other means) that the sought object
is in a specific pack
- find_sha1_pack
- fast-import - appears to be an optimization to not store a file if
it is already in a pack
- http-walker - to search through a struct alt_base
- http-push - to search through remote packs
- has_sha1_pack
- builtin/fsck - already knows about promisor objects
- builtin/count-objects - informational purposes only (check if loose
object is also packed)
- builtin/prune-packed - check if object to be pruned is packed (if
not, don't prune it)
- revision - used to exclude packed objects if requested by user
- diff - just for optimization
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.
This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).
Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.
This will be tested in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this structure handles an array of object IDs, rename it to struct
oid_array. Also rename the accessor functions and the initialization
constant.
This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:
perl -pi -E 's/struct sha1_array/struct oid_array/g'
perl -pi -E 's/\bsha1_array_/oid_array_/g'
perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to reuse this code in transport.c for "git fetch".
While we're here, internationalize the existing error message.
Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-fetch, --depth argument is always relative with the latest
remote refs. This makes it a bit difficult to cover this use case,
where the user wants to make the shallow history, say 3 levels
deeper. It would work if remote refs have not moved yet, but nobody
can guarantee that, especially when that use case is performed a
couple months after the last clone or "git fetch --depth". Also,
modifying shallow boundary using --depth does not work well with
clones created by --since or --not.
This patch fixes that. A new argument --deepen=<N> will add <N> more (*)
parent commits to the current history regardless of where remote refs
are.
Have/Want negotiation is still respected. So if remote refs move, the
server will send two chunks: one between "have" and "want" and another
to extend shallow history. In theory, the client could send no "want"s
in order to get the second chunk only. But the protocol does not allow
that. Either you send no want lines, which means ls-remote; or you
have to send at least one want line that carries deep-relative to the
server..
The main work was done by Dongcan Jiang. I fixed it up here and there.
And of course all the bugs belong to me.
(*) We could even support --deepen=<N> where <N> is negative. In that
case we can cut some history from the shallow clone. This operation
(and --depth=<shorter depth>) does not require interaction with remote
side (and more complicated to implement as a result).
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 58f2ed0 (remote-curl: pass ref SHA-1 to fetch-pack as well,
2013-12-05) added support for specifying a SHA-1 as well as a ref name.
Add support for specifying just a SHA-1 and set the ref name to the same
value in this case.
Signed-off-by: Gabriel Souza Franco <gabrielfrancosouza@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two variants of this function, one that takes a
string and one that takes a ptr/len combo. But we only call
the latter with the length of a NUL-terminated string, so
our first simplification is to drop it in favor of the
string variant.
Since we know we have a string, we can also replace the
manual memory computation with a call to alloc_ref().
Furthermore, we can rely on get_oid_hex() to complain if it
hits the end of the string. That means we can simplify the
check for "<sha1> <ref>" versus just "<ref>". Rather than
manage the ptr/len pair, we can just bump the start of our
string forward. The original code over-allocated based on
the original "namelen" (which wasn't _wrong_, but was simply
wasteful and confusing).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago. No useful caller that uses other value has emerged.
By now, it is clear that the interface is overly broad without a
good reason. Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.
This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them. The changes contained in this patch are:
* introduction of these two functions in strbuf.[ch]
* mechanical conversion of all callers to strbuf_getline() with
either '\n' or '\0' as the third parameter to instead call the
respective thin wrapper.
After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller. An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this function to use struct object_id. Express several
hardcoded constants in terms of GIT_SHA1_HEXSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Use struct object_id in three fields in struct ref and convert all the
necessary places that use it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:
- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cloning from a shallow repository does not follow the "8 steps for new
.git/shallow" because if it does we need to get through step 6 for all
refs. That means commit walking down to the bottom.
Instead the rule to create .git/shallow is simpler and, more
importantly, cheap: if a shallow commit is found in the pack, it's
probably used (i.e. reachable from some refs), so we add it. Others
are dropped.
One may notice this method seems flawed by the word "probably". A
shallow commit may not be reachable from any refs at all if it's
attached to an object island (a group of objects that are not
reachable by any refs).
If that object island is not complete, a new fetch request may send
more objects to connect it to some ref. At that time, because we
incorrectly installed the shallow commit in this island, the user will
not see anything after that commit (fsck is still ok). This is not
desired.
Given that object islands are rare (C Git never sends such islands for
security reasons) and do not really harm the repository integrity, a
tradeoff is made to surprise the user occasionally but work faster
everyday.
A new option --strict could be added later that follows exactly the 8
steps. "git prune" can also learn to remove dangling objects _and_ the
shallow commits that are attached to them from .git/shallow.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No callers pass a non-empty pointer as shallow_points at this
stage. As a result, all clients still refuse to talk to shallow
repository on the other end.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main purpose is to trace the URL parser called by git_connect() in
connect.c
The main features of the parser can be listed as this:
- parse out host and path for URLs with a scheme (git:// file:// ssh://)
- parse host names embedded by [] correctly
- extract the port number, if present
- separate URLs like "file" (which are local)
from URLs like "host:repo" which should use ssh
Add the new parameter "--diag-url" to "git fetch-pack", which prints
the value for protocol, host and path to stderr and exits.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is an extension of c6807a4 (clone: open a shortcut for
connectivity check - 2013-05-26) to reduce the cost of connectivity
check at clone time, this time with smart http protocol.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me. This structure is not
about the local ref used by sha1-name API to name local objects.
It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values. It belongs to "remote.h" together
with "struct refspec".
While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though "git fetch" has full infrastructure to parse refspecs to
be fetched and match them against the list of refs to come up with
the final list of refs to be fetched, the list of refs that are
requested to be fetched were internally converted to a plain list of
strings at the transport layer and then passed to the underlying
fetch-pack driver.
Stop this conversion and instead pass around an array of refs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we look up a sha1 object for reading via parse_object() =>
read_sha1_file() => read_object() callpath, we first check
packfiles, and then loose objects. If we still haven't found it, we
re-scan the list of packfiles in `objects/pack`. This final step
ensures that we can co-exist with a simultaneous repack process
which creates a new pack and then prunes the old object.
This extra re-scan usually does not have a performance impact for
two reasons:
1. If an object is missing, then typically the re-scan will find a
new pack, then no more misses will occur. Or if it truly is
missing, then our next step is usually to die().
2. Re-scanning is cheap enough that we do not even notice.
However, these do not always hold. The assumption in (1) is that the
caller is expecting to find the object. This is usually the case,
but the call to `parse_object` in `everything_local` does not follow
this pattern. It is looking to see whether we have objects that the
remote side is advertising, not something we expect to
have. Therefore if we are fetching from a remote which has many refs
pointing to objects we do not have, we may end up re-scanning the
pack directory many times.
Even with this extra re-scanning, the impact is often not noticeable
due to (2); we just readdir() the packs directory and skip any packs
that are already loaded. However, if there are a large number of
packs, even enumerating the directory can be expensive, especially
if we do it repeatedly.
Having this many packs is a good sign the user should run `git gc`,
but it would still be nice to avoid having to scan the directory at
all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lock is only used by fetch_pack, so move it into that function.
Signed-off-by: Matt Kraai <matt.kraai@amo.abbott.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch_pack() is used by transport.c, part of libgit.a while it stays
in builtin/fetch-pack.c. Move it to fetch-pack.c so that we won't get
undefined reference if a program that uses libgit.a happens to pull it
in.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
This helps removes the hack in fetch_pack() that copies my_args to args.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
It used to be that if "--all", "--depth", and also explicit references
were sought, then the explicit references were not handled correctly
in filter_refs() because the "--all --depth" code took precedence over
the explicit reference handling, and the explicit references were
never noted as having been found. So check for explicitly sought
references before proceeding to the "--all --depth" logic.
This fixes two test cases in t5500.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set the final value at initialization rather than initializing it then
sometimes changing it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>