With this patch we have a speedup and much lower IO when
importing trees with many branches. Instead of forcing
index re-population for each branch switch, we keep
many index files around, one per branch.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
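The per-branch index idea above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the BranchIndexes class and its file layout are hypothetical, and the only git feature relied on is the GIT_INDEX_FILE environment variable, which tells git commands which index file to use.

```python
import os
import tempfile


class BranchIndexes:
    """Hand out one index file path per branch, created on first use."""

    def __init__(self, directory=None):
        # Hypothetical layout: all index files live in one scratch directory.
        self.directory = directory or tempfile.mkdtemp(prefix="cvsimport-index-")
        self.paths = {}

    def path_for(self, branch):
        if branch not in self.paths:
            # Number the files instead of embedding the branch name, so that
            # odd characters in branch names never reach the filesystem.
            name = "index-%d" % len(self.paths)
            self.paths[branch] = os.path.join(self.directory, name)
        return self.paths[branch]

    def env_for(self, branch):
        """Environment for git commands that should use this branch's index."""
        env = dict(os.environ)
        env["GIT_INDEX_FILE"] = self.path_for(branch)
        return env
```

Switching branches then means switching environments rather than re-reading a tree into a single shared index.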
We now capture the output of cvsps to a tempfile, and then read it in.
cvsps 2.1 works largely in memory and only prints its patchset
info once it has finished talking with cvs, apparently holding on to
all of its allocated memory until then. With this patch, cvsps has
finished and been reaped before cvsimport starts working (and growing),
so the footprint of the whole process is much lower.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
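A minimal sketch of the capture-then-read pattern described above, assuming a generic child command stands in for cvsps (run_to_tempfile is a hypothetical helper, not a function from the script):

```python
import subprocess
import sys
import tempfile


def run_to_tempfile(argv):
    """Run argv to completion with stdout sent to a temp file; return the file."""
    out = tempfile.TemporaryFile(mode="w+b")
    # subprocess.run() waits for the child, so it has exited and been reaped
    # before the caller starts reading (and allocating memory of its own).
    subprocess.run(argv, stdout=out, check=True)
    out.seek(0)  # the child advanced the shared file offset; rewind it
    return out


if __name__ == "__main__":
    # Stand-in child process; a real import would run cvsps here.
    with run_to_tempfile([sys.executable, "-c", "print('PatchSet 1')"]) as fh:
        for line in fh:
            print(line.decode().rstrip())
```

The point of the design is that the two memory-hungry phases never overlap: the child's memory is released before the importer begins its own work.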
cvsps output often contains references to CVSPS_NO_BRANCH, commits
that it could not trace to a branch. Ignore that branch.
Additionally, cvsps will sometimes draw circular relationships
between branches -- where two branches are recorded as opening
from the other. In those cases, and where the ancestor branch
hasn't been seen, ignore it.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
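One way to picture the ancestor filtering described above, as a sketch under stated assumptions: usable_ancestor and the set of already-seen branches are hypothetical names, and the real script may structure this check differently.

```python
NO_BRANCH = "CVSPS_NO_BRANCH"  # placeholder branch name emitted by cvsps


def usable_ancestor(ancestor, seen):
    """Return ancestor if it can serve as a branch point, else None."""
    if ancestor is None or ancestor == NO_BRANCH:
        return None
    # With a circular relationship, one of the two branches necessarily
    # names an ancestor that has not been imported yet; ignore it.
    if ancestor not in seen:
        return None
    return ancestor
```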
Avoid "use POSIX qw(strftime dup2 :errno_h)"; it was reported
that a Perl installation on Mandrake 9.1 did not like it, even
though it understood "use POSIX qw(:errno_h)". Funny.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the server says "created this file whose length is empty",
we mistakenly said "oops, the server did not say a sensible
thing". Fix it.
Spotted and fixed by Linus, acked by Martin.
Signed-off-by: Junio C Hamano <junkio@cox.net>
File retrieval from the socket is now moved to _fetchfile() and we now
cap reads at 1MB. This should limit the memory growth of the cvsimport
process.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
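The capped-read idea above can be sketched like this. It is not the body of _fetchfile(); fetch_file and its parameters are illustrative names, and any file-like objects can stand in for the socket and the destination file.

```python
import io

CHUNK = 1024 * 1024  # cap each read at 1 MiB


def fetch_file(src, dst, size, chunk=CHUNK):
    """Copy exactly size bytes from src to dst, never holding more than chunk."""
    remaining = size
    while remaining > 0:
        buf = src.read(min(chunk, remaining))
        if not buf:
            raise EOFError("stream ended with %d bytes still expected" % remaining)
        dst.write(buf)
        remaining -= len(buf)
    return size


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 10 + b"next response")
    target = io.BytesIO()
    fetch_file(source, target, 10, chunk=4)
    print(target.getvalue())
```

Memory use is then bounded by the chunk size rather than by the size of the largest file in the repository.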
This change attempts to clean up the commit function to make it a bit
easier to read (or at least the first half of it). It also improves
robustness and performance. Specifically:
- report get_headref errors on opening ref unless the error is ENOENT
- use regex to check for sha1 instead of length
- use lexically scoped filehandles which get cleaned up automagically
- check for error on both 'print' and 'close' (since output is buffered)
- avoid "fork, do some perl, then exec" in commit(). It's not necessary,
and we probably end up COW'ing parts of the perl process. Plus the code
is much smaller because we can use open2()
- avoid calling strftime over and over (mainly a readability cleanup)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
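To illustrate why a regex check is stricter than a length check for a sha1, here is a small sketch (looks_like_sha1 is a hypothetical name, and the pattern is an assumption rather than the one used in the patch):

```python
import re

SHA1_RE = re.compile(r"[0-9a-f]{40}")


def looks_like_sha1(text):
    """True only for exactly 40 lowercase hex digits, with nothing else."""
    return SHA1_RE.fullmatch(text) is not None
```

A 40-character string of arbitrary characters passes a length test but fails this one, as does a valid id that still carries its trailing newline.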
This should reduce the number of git-update-index forks required per
commit. We now do adds/removes in one call, and we are no longer forced to
deal with argv limitations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
commit() does not need to be an anonymous subreference. Keep it simple.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Clean up @skipped after it's used. Close a filehandle.
Removing suspects one at a time.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Sometimes the pserver says "Removed" instead of "Remove-entry".
Signed-off-by: Elrond <elrond+kernel.org@samba-tng.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This simplifies code, and also fixes a subtle bug: when importing in a
shared repository, where another user last imported from CVS, cvsimport
used to complain that it could not open <branch> for update.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The updated code reads the tip of the current branch before and
after the import runs, but forgot to chomp what we read from the
command. The read-tree command did not like them with the trailing
LF.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Documentation says -i is "import only", so without it,
subsequent import should update the current branch and working
tree files in a sensible way.
"A sensible way" defined by this commit is "act as if it is a
git pull from foreign repository which happens to be CVS not
git". So:
- If importing into the current branch (note that cvsimport
requires the tracking branch is pristine -- you checked out
the tracking branch but it is your responsibility not to make
your own commits there), fast forward the branch head and
match the index and working tree using two-way merge, just
like "git pull" does.
- If importing into a separate tracking branch, update that
branch head, and merge it into your current branch, again,
just like "git pull" does.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The code which tried to update the master branch was somewhat broken.
=> People should do that manually, with "git merge".
Signed-off-by: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Fixed a couple of bugs in recovering from broken connections:
The _line() method now returns undef correctly when the connection
is broken instead of falling off the function and returning garbage.
Retries are now reported to stderr and the eventual partially
downloaded file is discarded instead of being appended to.
The "Server gone away" test has been removed, because it was
reachable only if the garbage return bug bit.
Signed-off-by: Martin Mares <mj@ucw.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A couple of things that seem to help importing broken CVS repos...
-S '<slash-delimited-regex>' skips files with a matching path
-v prints file name and version before fetching from cvs
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This fixes a minor bug, which caused the author email to be
doubly enclosed in a <> pair (the code gave enclosing <> to
GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL environment variable).
The read_author_info() subroutine is taught to also understand
the user list in CVSROOT/users format. This is primarily done
to ease migration for CVS users, who can use the -A option
to read from existing CVSROOT/users file. write_author_info()
always writes in the git-cvsimport's native format ('='
delimited and value without quotes).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch adds the option to specify an author name/email conversion
file in the format
exon=Andreas Ericsson <ae@op5.se>
spawn=Simon Pawn <spawn@frog-pond.org>
which will translate the ugly cvs authornames to the more informative
git style.
The info is saved in $GIT_DIR/cvs-authors, so that subsequent
incremental imports will use the same author-info even if no -A
option is specified. If an -A option *is* specified, the info in
$GIT_DIR/cvs-authors is appended/updated appropriately.
Docs updated accordingly.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
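A minimal sketch of reading the conversion file format shown above. This is not read_author_info(); parse_authors and the exact pattern are assumptions, and only the '=' delimited form from the example lines is handled.

```python
import re

# cvs-user=Full Name <email>
AUTHOR_LINE = re.compile(r"^([^=\s]+)\s*=\s*(.*?)\s*<([^<>]*)>\s*$")


def parse_authors(lines):
    """Map each cvs user name to a (name, email) pair; skip malformed lines."""
    authors = {}
    for line in lines:
        match = AUTHOR_LINE.match(line)
        if match:
            user, name, email = match.groups()
            authors[user] = (name, email)
    return authors
```

Because later entries overwrite earlier ones for the same user, feeding the saved file first and a newly supplied file second gives the append/update behaviour the message describes.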
'git cvsimport' changes "/" to "-" (or $opt_s) in branch names,
but not in tag names, which is inconsistent.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Fix git import script not to assume that .git/HEAD is a symlink.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Tell cvsps to be quiet, unless we've been told to be verbose.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
-P:: <cvsps-output-file>
Instead of calling cvsps, read the provided cvsps output file. Useful
for debugging or when cvsps is being handled outside cvsimport.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add error handling for cases where the cvs server goes away unexpectedly.
While I don't know why the cvs server is so erratic, we should definitely
exit here before committing bogus files.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Perl was warning that $opt_p was undefined in that case.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Detect whether the user passed --no-cvs-direct and, if so, do not force
the mode. This lets us support all the protocols that the standard cvs
client supports, at the snail's pace you should expect.
This only affects the rlog reading stage.
Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>
Matching and reporting merge parents happens in a subprocess.
Re-open stdout before redirecting stdout to the pipe, so that printing
verbose messages doesn't go to the wrong place.
Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de>
Alexey Nezhdanov updated CVSps to generate author-name and
author-email information in its output.
If the input looks like it has that already properly formatted,
use that without our own munging.
Signed-off-by: Junio C Hamano <junkio@cox.net>
As promised, this is the "big tool rename" patch. The primary differences
since 0.99.6 are:
(1) git-*-script are no more. The commands installed do not
have any such suffix so users do not have to remember if
something is implemented as a shell script or not.
(2) Many command names with 'cache' in them are renamed with
'index' if that is what they mean.
There are backward compatibility symbolic links so that you and
Porcelains can keep using the old names, but the backward
compatibility support is expected to be removed in the near
future.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This patch changes git-cvsimport-script so that it creates tag objects
instead of refs to commits, and adds an option, -u, to convert
underscores in branch and tag names to dots (since CVS doesn't allow
dots in branches and tags.)
Signed-off-by: Junio C Hamano <junkio@cox.net>
... in the newly introduced merge detection code.
Signed-off-by: Martin Langhoff <martin.langhoff@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Added -m and -M flags for git-cvsimport to detect merge commits in cvs.
While this trusts the commit message, in repositories where merge commits
indicate 'merged from FOOBRANCH' the import works surprisingly well.
Even if some merges from CVS are bogus or incomplete, the resulting
branches are in better state to go forward (and merge) than without any
merge detection.
Signed-off-by: Martin Langhoff <martin.langhoff@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
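Merge detection from commit messages can be pictured with the sketch below. The pattern is an assumption modelled on the 'merged from FOOBRANCH' wording mentioned above, not the default regex shipped with the script, and merged_branch is a hypothetical helper.

```python
import re

# Assumed convention: the log message names the source branch after
# the words "merged from".
MERGE_RE = re.compile(r"merged from ([-\w]+)", re.IGNORECASE)


def merged_branch(message):
    """Return the branch named in a merge-style log message, or None."""
    match = MERGE_RE.search(message)
    return match.group(1) if match else None
```

A commit whose message matches would get the tip of the named branch recorded as an extra parent; since this trusts the message, a misleading log entry yields a bogus merge.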
I track a CVS project which has a branch with a '/' in the branch name.
Since git wants the branch name to be a file name at the same time,
replace that character with a '-' by default (override with "-s <subst>").
This should work well, despite the fact that a division and a difference
are completely different :-)
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
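The substitution itself is small; a sketch, with sanitize_ref_name as a hypothetical name and the subst parameter standing in for the "-s <subst>" value:

```python
def sanitize_ref_name(name, subst="-"):
    """Replace every '/' in a CVS branch or tag name with subst."""
    return name.replace("/", subst)
```

For example, a CVS branch called "release/1.0" would be imported as "release-1.0" under the default.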
Early versions of git-cvsimport defaulted to using preexisting keyword
expansion settings. This change preserves compatibility with existing cvs
imports and allows new repository migrations to kill keyword expansion.
After exploration of the different -k modes in the cvs protocol, we use -kk
which kills keyword expansion wherever possible. Against the protocol
spec, -ko and -kb will sometimes expand keywords.
Should improve our chances of detecting merges and reduce imported
repository size.
Signed-off: Martin Langhoff <martin.langhoff@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The git-cvsimport-script had a couple of small bugs that prevented me
from importing a big CVS repository.
The first was that it didn't handle removed files with a multi-digit
primary revision number.
The second was that it was asking the CVS server for "F" messages,
although they were not handled.
I also updated the documentation for that script to correspond to
actual flags.
Signed-off-by: David Kågedal <davidk@lysator.liu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Problems found while importing dasher's CVS:
* Allow spaces in filenames.
* cvsps may create unnamed branches with revisions that don't really
exist, which causes the CVS server to return something we haven't
hitherto expected.
* Report deleted files when being verbose.
* Also, report the commit date.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Previously, git-cvsimport-script would fail
on revisions with more than one digit.
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
git-cvsimport-script: add "import only" option which tells the script
not to perform a checkout after importing.
This ensures that the working directory and cache remain untouched and
will not create them if they do not exist.
Acked-by: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the first half of write_sha1_file() and
index_fd() externally visible, to allow callers to compute the
object ID without actually storing it in the object database.
[JC demangled the whitespace himself because he liked the patch
so much, and reworked the interface to index_fd() slightly,
taking suggestions from Linus and adding some of his own.]
Signed-off-by: Bryan Larsen <bryan.larsen@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
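Computing an object ID without storing the object can be illustrated for the blob case. This sketch is in Python rather than the C touched by the patch, and blob_id is a hypothetical name; it relies only on the fact that a SHA-1 blob ID is the hash of a "blob <size>" header, a NUL byte, and the contents.

```python
import hashlib


def blob_id(data):
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID.
    print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Nothing is written to the object database; the caller only learns the name the object would have.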
If HEAD happened to point to a cvs branch, move the
working directory forward to the tip of the branch.
Additionally, if master and "origin" are equal,
move master forward to new origin first.