Commit Graph

65 Commits (e52bec9469e6df752471d654fca56200d32a0392)

Author SHA1 Message Date
Michael Haggerty 5f0fc64513 fetch-pack: eliminate spurious error messages
It used to be that if "--all", "--depth", and also explicit references
were sought, then the explicit references were not handled correctly
in filter_refs() because the "--all --depth" code took precedence over
the explicit reference handling, and the explicit references were
never noted as having been found.  So check for explicitly sought
references before proceeding to the "--all --depth" logic.

This fixes two test cases in t5500.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:32 -07:00
Michael Haggerty b285668dd2 cmd_fetch_pack(): simplify computation of return value
Set the final value at initialization rather than initializing it then
sometimes changing it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:32 -07:00
Michael Haggerty 778e7543d2 fetch-pack: report missing refs even if no existing refs were received
This fixes a test in t5500.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:32 -07:00
Michael Haggerty 7418f1a037 cmd_fetch_pack(): return early if finish_connect() fails
This simplifies the logic without changing the behavior.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty f537cfa750 filter_refs(): simplify logic
Simplify flow within loop: first decide whether to keep the reference,
then keep/free it.  This makes it clearer that each ref has exactly
two possible destinies, and removes duplication of the code for
appending the reference to the linked list.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty 5096e48753 filter_refs(): build refs list as we go
Instead of temporarily storing matched refs to temporary array
"return_refs", simply append them to newlist as we go.  This changes
the order of references in newlist to strictly sorted if "--all" and
"--depth" and named references are all specified, but that usage is
broken anyway (see the last two tests in t5500).

This changes the last test in t5500 from segfaulting into just
emitting a spurious error (this will be fixed in a moment).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty 4ba159996f filter_refs(): delete matched refs from sought list
Remove any references that are available from the remote from the
sought list (rather than overwriting their names with NUL characters,
as previously).  Mark matching entries by writing a non-NULL pointer
to string_list_item::util during the iteration, then use
filter_string_list() later to filter out the entries that have been
marked.

Document this aspect of fetch_pack() in a comment in the header file.
(More documentation is obviously still needed.)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty 4c58f13ba6 fetch_pack(): update sought->nr to reflect number of unique entries
fetch_pack() removes duplicates from the "sought" list, thereby
shrinking the list.  But previously, the caller was not informed about
the shrinkage.  This would cause a spurious error message to be
emitted by cmd_fetch_pack() if "git fetch-pack" is called with
duplicate refnames.

Instead, remove duplicates using string_list_remove_duplicates(),
which adjusts sought->nr to reflect the new length of the list.

The last test of t5500 inexplicably *required* "git fetch-pack" to
fail when fetching a list of references that contains duplicates;
i.e., it insisted on the buggy behavior.  So change the test to expect
the correct behavior.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty 382a967114 filter_refs(): do not check the same sought_pos twice
Once a match has been found at sought_pos, the entry is zeroed and no
future attempts will match that entry.  So increment sought_pos to
avoid checking against the zeroed-out entry during the next iteration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty 8bee93dd24 Change fetch_pack() and friends to take string_list arguments
Instead of juggling <nr_heads,heads> (sometimes called
<nr_match,match>), pass around the list of references to be sought in
a single string_list variable called "sought".  Future commits will
make more use of string_list functionality.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty 63c694534b fetch_pack(): reindent function decl and defn
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Jeff King 36c60f7a08 fetch-pack: mention server version with verbose output
Fetch-pack's verbose mode is more of a debugging mode (and
in fact takes two "-v" arguments to trigger via the
porcelain layer). Let's mention the server version as
another possible item of interest.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-13 21:56:05 -07:00
Junio C Hamano 74991a98df fetch-pack: do not ask for unadvertised capabilities
In the same spirit as the previous fix, stop asking for thin-pack, no-progress
and include-tag capabilities when the other end does not claim to support them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-10 14:27:52 -07:00
Jeff King d50c387163 do not send client agent unless server does first
Commit ff5effdf taught both clients and servers of the git protocol
to send an "agent" capability that just advertises their version for
statistics and debugging purposes.  The protocol-capabilities.txt
document however indicates that the client's advertisement is
actually a response, and should never include capabilities not
mentioned in the server's advertisement.

Adding the unconditional advertisement in the server programs was
OK, then, but the clients broke the protocol.  The server
implementation of git-core itself does not care, but at least one
does: the Google Code git server (or any server using Dulwich), will
hang up with an internal error upon seeing an unknown capability.

Instead, each client must record whether we saw an agent string from
the server, and respond with its agent only if the server mentioned
it first.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-10 12:35:13 -07:00
Jeff King ff5effdf45 include agent identifier in capability string
Instead of having the client advertise a particular version
number in the git protocol, we have managed extensions and
backwards compatibility by having clients and servers
advertise capabilities that they support. This is far more
robust than having each side consult a table of
known versions, and provides sufficient information for the
protocol interaction to complete.

However, it does not allow servers to keep statistics on
which client versions are being used. This information is
not necessary to complete the network request (the
capabilities provide enough information for that), but it
may be helpful to conduct a general survey of client
versions in use.

We already send the client version in the user-agent header
for http requests; adding it here allows us to gather
similar statistics for non-http requests.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-03 13:03:34 -07:00
Junio C Hamano 12d7d15074 Merge branch 'jk/fetch-pack-remove-dups-optim'
The way "fetch-pack" that is given multiple references to fetch tried to
remove duplicates was very inefficient.

By Jeff King
* jk/fetch-pack-remove-dups-optim:
  fetch-pack: sort incoming heads list earlier
  fetch-pack: avoid quadratic loop in filter_refs
  fetch-pack: sort the list of incoming refs
  add sorting infrastructure for list refs
  fetch-pack: avoid quadratic behavior in remove_duplicates
  fetch-pack: sort incoming heads
2012-05-29 13:09:08 -07:00
Junio C Hamano 4dbfaee0c7 Merge branch 'mh/fetch-pack-constness'
Tighten constness of some local variables in a callchain.

By Michael Haggerty
* mh/fetch-pack-constness:
  cmd_fetch_pack(): respect constness of argv parameter
  cmd_fetch_pack(): combine the loop termination conditions
  cmd_fetch_pack(): handle non-option arguments outside of the loop
  cmd_fetch_pack(): declare dest to be const
2012-05-29 13:08:53 -07:00
Jeff King 3d2a33e57f fetch-pack: sort incoming heads list earlier
Commit 4435968 started sorting heads fed to fetch-pack so
that later commits could use more optimized algorithms;
commit 7db8d53 switched the remove_duplicates function to
such an algorithm.

Of course, the sorting is more effective if you do it
_before_ the algorithm in question.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24 10:02:37 -07:00
Jeff King a0de28805d fetch-pack: avoid quadratic loop in filter_refs
We have a list of refs that we want to compare against the
"match" array. The current code searches the match list
linearly, giving quadratic behavior over the number of refs
when you want to fetch all of them.

Instead, we can compare the lists as we go, giving us linear
behavior.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Jeff King 9e8e704f0b fetch-pack: sort the list of incoming refs
Having the list sorted means we can avoid some quadratic
algorithms when comparing lists.

These should typically be sorted already, but they do come
from the remote, so let's be extra careful. Our ref-sorting
implementation does a mergesort, so we do not have to care
about performance degrading in the common case that the list
is already sorted.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Jeff King 7db8d5370f fetch-pack: avoid quadratic behavior in remove_duplicates
We remove duplicate entries from the list of refs we are
fed in fetch-pack. The original algorithm is quadratic over
the number of refs, but since the list is now guaranteed to
be sorted, we can do it in linear time.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Jeff King 443596850f fetch-pack: sort incoming heads
There's no reason to preserve the incoming order of the
heads we're requested to fetch. By having them sorted, we
can replace some of the quadratic algorithms with linear
ones.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Michael Haggerty 57e6fc6958 cmd_fetch_pack(): respect constness of argv parameter
The old code cast away the constness of the strings passed to the
function in argument argv[], which could result in their being
modified by filter_refs().  Fix by copying reference names from argv
and putting them into our own array (similarly to how refnames passed
to stdin were already handled).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:20 -07:00
Michael Haggerty ff22ff9909 cmd_fetch_pack(): combine the loop termination conditions
If an argument that does not start with '-' is found, the loop is
terminated.  So move that check into the for-loop condition.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:20 -07:00
Michael Haggerty 4cc00fcf5d cmd_fetch_pack(): handle non-option arguments outside of the loop
This makes it more obvious that the code is always executed unless
there is an error, and that the first initialization of nr_heads is
unnecessary.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:19 -07:00
Michael Haggerty 9d19c6ea52 cmd_fetch_pack(): declare dest to be const
There is no need for it to be non-const, and this avoids the need
for casting away the constness of an argv element.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:19 -07:00
Junio C Hamano 77cab8af4a Merge branch 'it/fetch-pack-many-refs'
When "git fetch" encounters repositories with too many references, the
command line of "fetch-pack" that is run by a helper e.g. remote-curl,
may fail to hold all of them. Now such an internal invocation can feed
the references through the standard input of "fetch-pack".

By Ivan Todoroski
* it/fetch-pack-many-refs:
  remote-curl: main test case for the OS command line overflow
  fetch-pack: test cases for the new --stdin option
  remote-curl: send the refs to fetch-pack on stdin
  fetch-pack: new --stdin option to read refs from stdin
2012-04-24 14:40:51 -07:00
Ivan Todoroski 078b895fef fetch-pack: new --stdin option to read refs from stdin
If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.

This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.

Here is an easy way to trigger this problem:

	git init too-many-refs
	cd too-many-refs
	echo bla > bla.txt
	git add .
	git commit -m test
	sha=$(git rev-parse HEAD)
	tag=$(perl -e 'print "bla" x 30')
	for i in `seq 50000`; do
		echo $sha refs/tags/$tag-$i >> .git/packed-refs
	done

Then share this repo over the smart HTTP protocol and try cloning it:

	$ git clone http://localhost/.../too-many-refs/.git
	Cloning into 'too-many-refs'...
	fatal: cannot exec 'fetch-pack': Argument list too long

50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.

This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.

Four possible solutions were discussed on the Git mailing list (in no
particular order):

1) Call fetch-pack multiple times with smaller batches of refs.

This was dismissed as inefficient and inelegant.

2) Add option --refs-fd=$n to pass a an fd from where to read the refs.

This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.

3) Add option --refs-from=$tmpfile to pass the refs using a temp file.

This was not favored because of the temp file requirement.

4) Add option --stdin to pass the refs on stdin, one per line.

In the end this option was chosen as the most efficient and most
desirable from scripting perspective.

There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.

If we are going to sneak refs on stdin line by line, it would have to be
done very carefully in the presence of --stateless-rpc, because when
reading refs line by line we might read ahead too much data into our
buffer and eat some of the remote protocol data which is also coming on
stdin.

One way to solve this would be to refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.

In the end we settled on the following solution:

If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.

However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.

This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.

Signed-off-by: Ivan Todoroski <grnch@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-02 13:47:15 -07:00
Junio C Hamano 592d051759 Merge branch 'cb/transfer-no-progress'
* cb/transfer-no-progress:
  push/fetch/clone --no-progress suppresses progress output
2012-02-20 00:14:55 -08:00
Clemens Buchacher 01fdc21f6e push/fetch/clone --no-progress suppresses progress output
By default, progress output is disabled if stderr is not a terminal.
The --progress option can be used to force progress output anyways.
Conversely, --no-progress does not force progress output. In particular,
if stderr is a terminal, progress output is enabled.

This is unintuitive. Change --no-progress to force output off.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 13:06:53 -08:00
Michael Haggerty f257659132 everything_local(): mark alternate refs as complete
Objects in an alternate object database are already available to the
local repository and therefore don't need to be fetched.  So mark them
as complete in everything_local().

This fixes a test in t5700.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12 19:50:39 -08:00
Michael Haggerty c41a802fe9 fetch-pack.c: inline insert_alternate_refs()
The logic of the (single) caller is clearer without encapsulating this
one line in a function.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12 19:50:39 -08:00
Michael Haggerty 65385ef7d4 fetch-pack.c: rename some parameters from "path" to "refname"
The parameters denote reference names, which are no longer 1:1 with
filesystem paths.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12 19:50:39 -08:00
Jeff King 1e7ba0f9ca fetch-pack: match refs exactly
When we are determining the list of refs to fetch via
fetch-pack, we have two sets of refs to compare: those on
the remote side, and a "match" list of things we want to
fetch. We iterate through the remote refs alphabetically,
seeing if each one is wanted by the "match" list.

Since def88e9 (Commit first cut at "git-fetch-pack",
2005-07-04), we have used the "path_match" function to do a
suffix match, where a remote ref is considered wanted if
any of the "match" elements is a suffix of the remote
refname.

This enables callers of fetch-pack to specify unqualified
refs and have them matched up with remote refs (e.g., ask
for "A" and get remote's "refs/heads/A"). However, if you
provide a fully qualified ref, then there are corner cases
where we provide the wrong answer. For example, given a
remote with two refs:

   refs/foo/refs/heads/master
   refs/heads/master

asking for "refs/heads/master" will first match
"refs/foo/refs/heads/master" by the suffix rule, and we will
erroneously fetch it instead of refs/heads/master.

As it turns out, all callers of fetch_pack do provide
fully-qualified refs for the match list. There are two ways
fetch_pack can get match lists:

  1. Through the transport code (i.e., via git-fetch)

  2. On the command-line of git-fetch-pack

In the first case, we will always be providing the names of
fully-qualified refs from "struct ref" objects. We will have
pre-matched those ref objects already (since we have to
handle more advanced matching, like wildcard refspecs), and
are just providing a list of the refs whose objects we need.

In the second case, users could in theory be providing
non-qualified refs on the command-line. However, the
fetch-pack documentation claims that refs should be fully
qualified (and has always done so since it was written in
2005).

Let's change this path_match call to simply check for string
equality, matching what the callers of fetch_pack are
expecting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-13 10:17:50 -08:00
Jeff King afe7c5ff1f drop "match" parameter from get_remote_heads
The get_remote_heads function reads the list of remote refs
during git protocol session. It dates all the way back to
def88e9 (Commit first cut at "git-fetch-pack", 2005-07-04).
At that time, the idea was to come up with a list of refs we
were interested in, and then filter the list as we got it
from the remote side.

Later, 1baaae5 (Make maximal use of the remote refs,
2005-10-28) stopped filtering at the get_remote_heads layer,
letting us use the non-matching refs to find common history.

As a result, all callers now simply pass an empty match
list (and any future callers will want to do the same). So
let's drop these now-useless parameters.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-13 10:08:24 -08:00
Junio C Hamano 9bd500048d Merge branch 'mh/check-ref-format-3'
* mh/check-ref-format-3: (23 commits)
  add_ref(): verify that the refname is formatted correctly
  resolve_ref(): expand documentation
  resolve_ref(): also treat a too-long SHA1 as invalid
  resolve_ref(): emit warnings for improperly-formatted references
  resolve_ref(): verify that the input refname has the right format
  remote: avoid passing NULL to read_ref()
  remote: use xstrdup() instead of strdup()
  resolve_ref(): do not follow incorrectly-formatted symbolic refs
  resolve_ref(): extract a function get_packed_ref()
  resolve_ref(): turn buffer into a proper string as soon as possible
  resolve_ref(): only follow a symlink that contains a valid, normalized refname
  resolve_ref(): use prefixcmp()
  resolve_ref(): explicitly fail if a symlink is not readable
  Change check_refname_format() to reject unnormalized refnames
  Inline function refname_format_print()
  Make collapse_slashes() allocate memory for its result
  Do not allow ".lock" at the end of any refname component
  Refactor check_refname_format()
  Change check_ref_format() to take a flags argument
  Change bad_ref_char() to return a boolean value
  ...
2011-10-10 15:56:18 -07:00
Michael Haggerty 8d9c50105f Change check_ref_format() to take a flags argument
Change check_ref_format() to take a flags argument that indicates what
is acceptable in the reference name (analogous to "git
check-ref-format"'s "--allow-onelevel" and "--refspec-pattern").  This
is more convenient for callers and also fixes a failure in the test
suite (and likely elsewhere in the code) by enabling "onelevel" and
"refspec-pattern" to be allowed independently of each other.

Also rename check_ref_format() to check_refname_format() to make it
obvious that it deals with refnames rather than references themselves.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-05 13:45:29 -07:00
Junio C Hamano ca0c9764bf Merge branch 'jc/fetch-pack-fsck-objects'
* jc/fetch-pack-fsck-objects:
  test: fetch/receive with fsckobjects
  transfer.fsckobjects: unify fetch/receive.fsckobjects
  fetch.fsckobjects: verify downloaded objects

Conflicts:
	Documentation/config.txt
	builtin/fetch-pack.c
2011-10-05 12:36:20 -07:00
Junio C Hamano dab76d3aa6 transfer.fsckobjects: unify fetch/receive.fsckobjects
This single variable can be used to set instead of setting fsckobjects
variable for fetch & receive independently.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-04 12:39:32 -07:00
Junio C Hamano 5e838ea7aa fetch.fsckobjects: verify downloaded objects
This corresponds to receive.fsckobjects configuration variable added (a
lot) earlier in 20dc001 (receive-pack: allow using --strict mode for
unpacking objects, 2008-02-25).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-04 12:27:17 -07:00
Junio C Hamano 3400c222d9 Merge branch 'nd/decorate-grafts'
* nd/decorate-grafts:
  log: Do not decorate replacements with --no-replace-objects
  log: decorate "replaced" on to replaced commits
  log: decorate grafted commits with "grafted"
  Move write_shallow_commits to fetch-pack.c
  Add for_each_commit_graft() to iterate all grafts
  decoration: do not mis-decorate refs with same prefix
2011-08-28 21:22:58 -07:00
Nguyễn Thái Ngọc Duy ec099546a9 fetch-pack: check for valid commit from server
A malicious server can return ACK with non-existent SHA-1 or not a
commit. lookup_commit() in this case may return NULL. Do not let
fetch-pack crash by accessing NULL address in this case.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-18 12:25:54 -07:00
Nguyễn Thái Ngọc Duy 294e15fc19 Move write_shallow_commits to fetch-pack.c
This function produces network traffic and should be in fetch-pack. It
has been in commit.c because it needs to iterate (private) graft
list. It can now do so using for_each_commit_graft().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-18 11:01:18 -07:00
Junio C Hamano 01f9ffbd5d Merge branch 'jk/haves-from-alternate-odb'
* jk/haves-from-alternate-odb:
  receive-pack: eliminate duplicate .have refs
  bisect: refactor sha1_array into a generic sha1 list
  refactor refs_from_alternate_cb to allow passing extra data
2011-05-29 23:51:22 -07:00
Jeff King 114a6a889f refactor refs_from_alternate_cb to allow passing extra data
The foreach_alt_odb function triggers a callback for each
alternate object db we have, with room for a single void
pointer as data. Currently, we always call refs_from_alternate_cb
as the callback function, and then pass another callback (to
receive each ref individually) as the void pointer.

This has two problems:

  1. C technically forbids stuffing a function pointer into
     a "void *". In practice, this probably doesn't matter
     on any architectures git runs on, but it never hurts to
     follow the letter of the law.

  2. There is no room for an extra data pointer. Indeed, the
     alternate_ref_fn that refs_from_alternate_cb calls
     takes a void* for data, but we always pass it NULL.

Instead, let's properly stuff our function pointer into a
data struct, which also leaves room for an extra
caller-supplied data pointer. And to keep things simple for
existing callers, let's make a for_each_alternate_ref
function that takes care of creating the extra struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-19 20:01:10 -07:00
Jeff King ea5f220821 fetch: avoid repeated commits in mark_complete
We add every local ref to a list so that we can mark them
and all of their ancestors back to a certain cutoff point.
However, if some refs point to the same commit, we will end
up adding them to the list many times.

Furthermore, since commit_lists are stored as linked lists,
we must do an O(n) traversal of the list in order to find
the right place to insert each commit. This makes building
the list O(n^2) in the number of refs.

For normal repositories, this isn't a big deal. We have a
few hundreds refs at most, and most of them are unique. But
consider an "alternates" repo that serves as an object
database for many other similar repos. For reachability, it
needs to keep a copy of the refs in each child repo. This
means it may have a large number of refs, many of which
point to the same commits.

By noting commits we have already added to the list, we can
shrink the size of "n" in such a repo to the number of
unique commits, which is on the order of what a normal repo
would contain (it's actually more than a normal repo, since child repos
may have branches at different states, but in practice it tends
to be much smaller than the list with duplicates).

Here are the results on one particular giant repo
(containing objects for all Rails forks on GitHub):

  $ git for-each-ref | wc -l
  112514

  [before]
  $ git fetch --no-tags ../remote.git
  63.52user 0.12system 1:03.68elapsed 99%CPU (0avgtext+0avgdata 137648maxresident)k
  1856inputs+48outputs (11major+19603minor)pagefaults 0swaps

  $ git fetch --no-tags ../remote.git
  6.15user 0.08system 0:06.25elapsed 99%CPU (0avgtext+0avgdata 123856maxresident)k
  0inputs+40outputs (0major+18872minor)pagefaults 0swaps

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-19 18:41:44 -07:00
Junio C Hamano 96220d837c Merge branch 'jc/fetch-progressive-stride'
* jc/fetch-progressive-stride:
  Fix potential local deadlock during fetch-pack
2011-03-29 14:09:08 -07:00
Junio C Hamano 2eee1393f3 Merge branches 'sp/maint-fetch-pack-stop-early' and 'sp/maint-upload-pack-stop-early'
* sp/maint-fetch-pack-stop-early:
  enable "no-done" extension only when fetching over smart-http

* sp/maint-upload-pack-stop-early:
  enable "no-done" extension only when serving over smart-http
2011-03-29 14:09:02 -07:00
Junio C Hamano 4e10cf9a17 Revert two "no-done" reverts
Last night I had to make these two emergency reverts, but now we have a
better understanding of which part of the topic was broken, let's get rid
of the revert to fix it correctly.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-29 12:29:10 -07:00
Junio C Hamano 44d8dc54e7 Fix potential local deadlock during fetch-pack
The fetch-pack/upload-pack protocol relies on the underlying transport
(local pipe or TCP socket) to have enough slack to allow one window worth
of data in flight without blocking the writer.  Traditionally we always
relied on being able to have two windows of 32 "have"s in flight (roughly
3k bytes) to stream.

The recent "progressive-stride" change allows "fetch-pack" to send up to
1024 "have"s without reading any response from "upload-pack".  The
outgoing pipe of "upload-pack" can be clogged with many ACK and NAK that
are unread, while "fetch-pack" is still stuffing its outgoing pipe with
more "have"s, leading to a deadlock.

Revert the change unless we are in stateless rpc (aka smart-http) mode, as
using a large window full of "have"s is still a good way to help reduce
the number of back-and-forth, and there is no buffering issue there (it is
strictly "ping-pong" without an overlap).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-29 12:20:22 -07:00