When the input mbox does not identify what encoding it is in,
and the header text has already had its RFC2047 encoding stripped
away, we cannot tell what encoding the header text is in. For body
text, when the message does not say what charset it is in, we fall
back to assuming latin-1 input when converting to utf8. The same
fallback should be applied consistently to the header as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It was pointed out that the current behaviour might misparse a patch
comment, so remove this behaviour for now.
[jc: this fixes "From: line in the middle" check in t5100 test.]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We exited prematurely from the header parsing loop when the header
field did not have a space after the colon, because we insisted on
it, and we got the check wrong because we forgot that we strip
the trailing whitespace before we do the check.
The space after the colon is not even required by RFC2822, so
stop requiring it. While we are at it, note that a header field name
is specified more strictly than "anything followed by a colon"
(there must be one or more characters before the colon, and they
must not be controls, SP or non US-ASCII), so implement that
check as well, lest we mistakenly treat something like:
Bogus not a header line: this is not.
as a header line.
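The field-name rule above can be sketched in C as follows. This is a minimal illustration, not the actual mailinfo.c code, and the function name `looks_like_header` is hypothetical. It accepts one or more printable US-ASCII characters other than ':' (bytes 33 to 126) followed by a colon, and does not require a space after the colon.

```c
#include <stddef.h>

/*
 * Hypothetical helper: does `line` start with an RFC 2822 field name
 * followed by ':'?  Controls, SP and non-US-ASCII bytes before the
 * colon disqualify the line; no space is needed after the colon.
 */
static int looks_like_header(const char *line)
{
	const unsigned char *p = (const unsigned char *)line;
	size_t len = 0;

	while (p[len] && p[len] != ':') {
		if (p[len] < 33 || p[len] > 126)
			return 0; /* control, SP or non-US-ASCII */
		len++;
	}
	/* at least one name character, and an actual colon */
	return len > 0 && p[len] == ':';
}
```

With this rule "Subject:no-space" is a header, while the "Bogus not a header line: this is not." example is rejected at its first space.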
Signed-off-by: Junio C Hamano <junkio@cox.net>
- handle_from is fixed to not mangle its input line.
- Then handle_inbody_header is allowed to look in
the body of a commit message for additional headers
that we haven't already seen.
This allows patches with all of the right information in
unfortunate places to be imported.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Only count lines of the form '^.*: ' and '^From ' as email
header lines.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This prepares for detecting non-email patches that don't have
mail headers, in which case we have already read the first
line, so handle_body should not ignore it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
- Move handle_info into main so it is called once
after everything has been parsed. This allows the removal
of a static variable and removes two duplicate calls.
- Move parsing of inbody headers into handle_commit.
This means we parse the in-body headers after we have decoded
the character set, and it removes code duplication between
handle_multipart_one_part and handle_body.
- Change the flag indicating that we have seen an in-body
prefix header into another bit in seen.
This is a little more general and allows the possibility of parsing
in-body headers after the body message has begun.
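A minimal sketch of the "another bit in seen" idea, combined with accepting in-body headers only for fields not already taken from the real mail header. The bit names, the field table and `handle_inbody_line` are hypothetical and do not reproduce git's actual code.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical bits recorded in `seen`. */
enum {
	SEEN_FROM    = 1 << 0,
	SEEN_DATE    = 1 << 1,
	SEEN_SUBJECT = 1 << 2,
	SEEN_PREFIX  = 1 << 3  /* an in-body header has been seen */
};

static const struct {
	const char *name;  /* field name including the colon */
	unsigned bit;
} inbody_fields[] = {
	{ "From:", SEEN_FROM },
	{ "Date:", SEEN_DATE },
	{ "Subject:", SEEN_SUBJECT },
};

/*
 * Accept `line` as an in-body header only if it names a field that is
 * not yet in `*seen`.  Returns the newly set bit, or 0 for body text.
 */
static unsigned handle_inbody_line(const char *line, unsigned *seen)
{
	size_t i;

	for (i = 0; i < sizeof(inbody_fields) / sizeof(inbody_fields[0]); i++) {
		size_t n = strlen(inbody_fields[i].name);

		/* header field names are case-insensitive */
		if (strncasecmp(line, inbody_fields[i].name, n))
			continue;
		if (*seen & inbody_fields[i].bit)
			return 0; /* already have it: treat as body text */
		*seen |= inbody_fields[i].bit | SEEN_PREFIX;
		return inbody_fields[i].bit;
	}
	return 0;
}
```

Keeping the "in-body prefix seen" state as one more bit means a single `seen` word answers both "which fields do we have?" and "have we started taking headers from the body?".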
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
B and Q decoding is not appropriate for in-body headers, so move
it up to where we explicitly know we have a real email header.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Currently we only use the return value from read_one_header_line
to tell whether the line we have read is a header or not, so make
it a flag. This paves the way for better email detection.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Sometimes people just include the whole format-patch output in
the commit e-mail. Detect it and skip the bogus ">From " line.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Quoted-Printable (RFC 2045) and the "Q" encoding (RFC 2047) are
subtly different; the latter is used on the mail header and an
underscore needs to be decoded to 0x20.
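The difference can be shown with a small C decoder that handles both encodings. This is a sketch, not mailinfo's implementation; `decode_q_or_qp` and its `is_q` switch are hypothetical names. It omits quoted-printable soft line breaks ("=" at the end of a line), which a full RFC 2045 decoder must also handle.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Decode `in` into `out`, which must hold at least strlen(in) + 1 bytes.
 * Both encodings turn "=XX" into the byte 0xXX.  Only the RFC 2047 "Q"
 * encoding (is_q set, used in headers) turns '_' into 0x20; in RFC 2045
 * quoted-printable body text '_' stays a literal underscore.
 * Returns the number of decoded bytes.
 */
static size_t decode_q_or_qp(char *out, const char *in, int is_q)
{
	size_t n = 0;

	while (*in) {
		int c = (unsigned char)*in++;
		int hi, lo;

		if (c == '=' && (hi = hexval(in[0])) >= 0 &&
		    (lo = hexval(in[1])) >= 0) {
			out[n++] = (char)(hi << 4 | lo);
			in += 2;
			continue;
		}
		if (c == '_' && is_q)
			c = ' ';
		out[n++] = (char)c;
	}
	out[n] = '\0';
	return n;
}
```

For example, "Hamano_=3D_J" decodes to "Hamano = J" in Q mode, while "snake_case=20x" keeps its underscore in quoted-printable mode.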
Signed-off-by: Junio C Hamano <junkio@cox.net>
Systems using some uClibc versions do not properly support
iconv. This patch allows Git to be built on those
systems by passing NO_ICONV=YesPlease to make. The only
drawback is that mailinfo won't do charset conversion on those
systems.
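A sketch of how a conversion helper might be guarded by this build knob, assuming the Makefile turns NO_ICONV=YesPlease into -DNO_ICONV for the compiler. It also applies the latin-1 fallback for a message that does not name its charset. `convert_charset` is a hypothetical name, and the 4x output buffer is an assumption that is sufficient when converting into UTF-8.

```c
#include <stdlib.h>
#include <string.h>
#ifndef NO_ICONV
#include <iconv.h>
#endif

/*
 * Return a malloc'ed copy of `in` converted from `from` to `to`, or
 * NULL if conversion is unavailable or fails (the caller then keeps
 * the original bytes).  from == NULL means the message did not say,
 * so latin-1 is assumed.
 */
static char *convert_charset(const char *in, const char *from, const char *to)
{
#ifdef NO_ICONV
	(void)in; (void)from; (void)to;
	return NULL; /* built with NO_ICONV: no conversion at all */
#else
	iconv_t conv;
	size_t insize, outsize, outalloc;
	char *out, *inpos, *outpos;

	if (!from)
		from = "ISO-8859-1";
	conv = iconv_open(to, from);
	if (conv == (iconv_t)-1)
		return NULL;
	insize = strlen(in);
	outalloc = insize * 4 + 1; /* UTF-8 needs at most 4 bytes per char */
	out = malloc(outalloc);
	if (!out) {
		iconv_close(conv);
		return NULL;
	}
	inpos = (char *)in;
	outpos = out;
	outsize = outalloc - 1; /* leave room for the NUL */
	/* convert, then flush any shift state */
	if (iconv(conv, &inpos, &insize, &outpos, &outsize) == (size_t)-1 ||
	    iconv(conv, NULL, NULL, &outpos, &outsize) == (size_t)-1) {
		free(out);
		iconv_close(conv);
		return NULL;
	}
	*outpos = '\0';
	iconv_close(conv);
	return out;
#endif
}
```

Returning NULL in the NO_ICONV build lets callers share one code path: they simply keep the unconverted text, which is the drawback described above.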
Signed-off-by: Fernando J. Pereda <ferdy@gentoo.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If the first part used quoted-printable to protect an iso8859-1
name in the commit log, and the second part was a plain ASCII text
patch file without even a Content-Transfer-Encoding header, we
incorrectly tried to decode the patch as quoted-printable.
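One way to avoid this class of bug is to reset the transfer encoding at every part boundary, so that a part without its own Content-Transfer-Encoding header falls back to the RFC 2045 default of 7bit instead of inheriting the previous part's setting. The types and function names below are hypothetical.

```c
#include <strings.h>

/* Hypothetical transfer-encoding values for one MIME part. */
enum transfer_encoding { TE_7BIT, TE_QP, TE_BASE64 };

struct part_state {
	enum transfer_encoding encoding;
};

/* Called at each multipart boundary: forget the previous part's encoding. */
static void start_new_part(struct part_state *st)
{
	st->encoding = TE_7BIT; /* RFC 2045 default when the header is absent */
}

/* Called for a Content-Transfer-Encoding header inside the current part. */
static void set_part_encoding(struct part_state *st, const char *value)
{
	if (!strcasecmp(value, "quoted-printable"))
		st->encoding = TE_QP;
	else if (!strcasecmp(value, "base64"))
		st->encoding = TE_BASE64;
	else
		st->encoding = TE_7BIT; /* 7bit, 8bit, binary: pass through */
}
```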
Signed-off-by: Junio C Hamano <junkio@cox.net>
This was a stupid typo that did not follow
http://www.iana.org/assignments/character-sets
Long noticed but neglected by JC, and finally reported by
Marco.
Signed-off-by: Junio C Hamano <junkio@cox.net>
An isolated developer could have a local-only e-mail address, which
mailinfo strips out because it lacks '@'. Define a
fallback parser to accommodate that.
At the same time, reject authorless patches in git-am.
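A fallback of this kind might look like the following sketch, which only handles the common "Name <address>" form. The function name and its exact rules are assumptions for illustration; the commit does not spell out how mailinfo's fallback parser works.

```c
#include <string.h>

/*
 * Hypothetical fallback: for "A U Thor <author>" with no '@' anywhere,
 * copy the text between '<' and '>' into `email` (of `size` bytes,
 * truncating if needed).  Returns 1 on success, 0 if the line contains
 * '@' (the normal parser's job) or has no non-empty <...> pair.
 */
static int local_email_fallback(const char *from, char *email, size_t size)
{
	const char *lt, *gt;
	size_t len;

	if (strchr(from, '@'))
		return 0;
	lt = strchr(from, '<');
	gt = lt ? strchr(lt, '>') : NULL;
	if (!gt || gt == lt + 1)
		return 0;
	len = (size_t)(gt - lt - 1);
	if (len >= size)
		len = size - 1;
	memcpy(email, lt + 1, len);
	email[len] = '\0';
	return 1;
}
```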
Signed-off-by: Junio C Hamano <junkio@cox.net>
Added an AIX clause in the Makefile; that clause likely
will be wrong for any AIX pre-5.2, but I can only test
on 5.3. mailinfo.c was missing the compat header file,
and convert-objects.c needs to define a specific
_XOPEN_SOURCE as well as _XOPEN_SOURCE_EXTENDED.
Signed-off-by: E. Jason Riedy <ejr@cs.berkeley.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This attempts to clean up the way various compatibility
functions are defined and used.
- A new header file, git-compat-util.h, is introduced. This
looks at the various NO_XXX settings and does the necessary function
name replacements, the equivalent of -Dstrcasestr=gitstrcasestr in
the Makefile.
- Those function name replacements are removed from the Makefile.
- Common features such as usage(), die(), xmalloc() are moved
from cache.h to git-compat-util.h; cache.h includes
git-compat-util.h itself.
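The replacement pattern can be sketched like this, assuming a NO_STRCASESTR build knob that selects the gitstrcasestr() fallback named in the -D flag above. The fallback body is an illustrative implementation, not git's compat code.

```c
#include <ctype.h>
#include <string.h>

/*
 * When the platform lacks strcasestr(), callers keep writing
 * strcasestr() and the preprocessor redirects them to the fallback.
 * Declaring it here also avoids implicit-declaration warnings.
 */
#ifdef NO_STRCASESTR
#define strcasestr gitstrcasestr
extern char *gitstrcasestr(const char *haystack, const char *needle);
#endif

/* Portable case-insensitive substring search. */
char *gitstrcasestr(const char *haystack, const char *needle)
{
	size_t nlen = strlen(needle);

	for (; *haystack; haystack++) {
		size_t i;

		/* stops at the haystack's NUL, since needle[i] is non-NUL */
		for (i = 0; i < nlen; i++)
			if (tolower((unsigned char)haystack[i]) !=
			    tolower((unsigned char)needle[i]))
				break;
		if (i == nlen)
			return (char *)haystack;
	}
	/* an empty needle matches even an empty haystack */
	return nlen ? NULL : (char *)haystack;
}
```

Keeping the knob-to-name mapping in one header, rather than in Makefile -D flags, means every file that includes it sees the same replacement and the same prototype.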
Signed-off-by: Junio C Hamano <junkio@cox.net>
Specifying the value of a single-letter, single-dash option
with an equal sign looked funny, and more importantly,
calling the flag that overrides the encoding from utf-8 to something
else "-u" (obviously abbreviated from "utf-8") did not make any
sense. So spell it out.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This uses i18n.commitencoding configuration item to pick up the
default commit encoding for the repository when converting from
e-mail encoding to commit encoding (the default is utf8).
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the message body does not identify what encoding it is in,
-u assumes it is in latin-1 and converts it to utf8, which is
the recommended encoding for git commit log messages.
With -u=<encoding>, the conversion is made into the specified
one, instead of utf8, to allow project-local policies.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Borrowing from the NO_MMAP patch by Johannes, squelch compiler
warnings by declaring gitstrcasestr() when we use it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Also make the platform-specific part more isolated. Currently we only
have Darwin defined, but I've taken a look at a SunOS-specific patch
(which I dropped on the floor for now) as well. Doing things this way
would make adding it easier.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This attempts to minimally cope with a subset of MIME "features" often
seen in patches sent to our mailing lists. Namely:
- People's names spelled in characters outside ASCII (both on the From:
header and the signed-off-by line).
- Content-transfer-encoding using quoted-printable (both in
multipart and non-multipart messages).
These MIME features are detected and decoded by "git mailinfo".
Optionally, with the '-u' flag, the output to .info and .msg is
transliterated from its original charset to utf-8. This is to
encourage people to use utf8 in their commit messages for
interoperability.
Applymbox accepts an additional flag '-u', which is passed to mailinfo.
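Detecting these features in headers starts with recognizing RFC 2047 encoded-words of the form =?charset?encoding?text?=. The parser below is a minimal sketch with hypothetical names; it only splits out the pieces, leaves B or Q decoding and charset conversion to other code, and ignores RFC 2231 language tags.

```c
#include <ctype.h>
#include <string.h>

/* Pieces of one RFC 2047 encoded-word: =?charset?encoding?text?= */
struct encoded_word {
	char charset[32];
	char encoding;      /* 'B' or 'Q', upper-cased */
	const char *text;   /* points into the input, not NUL-terminated */
	size_t text_len;
	size_t total_len;   /* bytes consumed, including "=?" and "?=" */
};

/* Returns 1 and fills *w if `s` begins with a well-formed encoded-word. */
static int parse_encoded_word(const char *s, struct encoded_word *w)
{
	const char *cs, *q1, *q2, *end;

	if (strncmp(s, "=?", 2))
		return 0;
	cs = s + 2;
	q1 = strchr(cs, '?');              /* end of charset */
	if (!q1 || q1 == cs || (size_t)(q1 - cs) >= sizeof(w->charset))
		return 0;
	if (!q1[1] || q1[2] != '?')        /* single-letter encoding */
		return 0;
	q2 = q1 + 2;
	end = strstr(q2 + 1, "?=");        /* end of encoded text */
	if (!end)
		return 0;
	w->encoding = (char)toupper((unsigned char)q1[1]);
	if (w->encoding != 'B' && w->encoding != 'Q')
		return 0;
	memcpy(w->charset, cs, (size_t)(q1 - cs));
	w->charset[q1 - cs] = '\0';
	w->text = q2 + 1;
	w->text_len = (size_t)(end - w->text);
	w->total_len = (size_t)(end + 2 - s);
	return 1;
}
```

For "=?iso-8859-1?q?caf=E9?= rest" the sketch yields charset "iso-8859-1", encoding 'Q' and text "caf=E9", which a Q decoder would then turn into latin-1 bytes before any transliteration to utf-8.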
Signed-off-by: Junio C Hamano / 濱野 純 <junkio@cox.net>
This corresponds to the -k flag of git format-patch --mbox. The
option should probably not be used when applying a real e-mail
patch, but is needed when the format-patch and applymbox pair is
used for cherry-picking.
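For context, git-mailinfo normally strips e-mail cruft such as "Re:" and bracketed tags like "[PATCH]" from the subject, and -k suppresses that. A minimal sketch of such a cleanup with a keep switch follows; `cleanup_subject` and its exact rules are assumptions, not the implementation from this commit.

```c
#include <string.h>
#include <strings.h>

/*
 * Hypothetical cleanup: skip leading whitespace, "Re:" and "[...]" tags
 * (such as "[PATCH 1/3]") unless keep_subject (the -k behaviour) is set.
 * Returns a pointer into `subject`; the string itself is not modified.
 */
static const char *cleanup_subject(const char *subject, int keep_subject)
{
	if (keep_subject)
		return subject;
	for (;;) {
		while (*subject == ' ' || *subject == '\t')
			subject++;
		if (!strncasecmp(subject, "re:", 3)) {
			subject += 3;
		} else if (*subject == '[') {
			const char *close = strchr(subject, ']');

			if (!close)
				break; /* unbalanced bracket: leave it alone */
			subject = close + 1;
		} else {
			break;
		}
	}
	return subject;
}
```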
Signed-off-by: Junio C Hamano <junkio@cox.net>
fix one 'should it be static?' warning and
two 'mixing declarations and code' warnings.
Signed-off-by: Alecs King <alecsk@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some people split their long E-mail address over two lines
using the RFC2822 header "folding". We can lose authorship
information this way, so make a minimum effort to deal with it,
instead of special casing only the "Subject:" field.
We could teach mailsplit to unfold the folded header, but
teaching mailinfo about folding would make more sense; a single
message can be fed to mailinfo without going through mailsplit.
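Unfolding itself is simple: per RFC 2822, a line that begins with a space or tab continues the previous header field, and unfolding just removes the line break. A sketch, assuming lines arrive with their newline already removed; `unfold_into` is a hypothetical helper that truncates rather than reallocating when the buffer is full.

```c
#include <string.h>

/*
 * If `next` starts with SP or HTAB it continues the header in `field`
 * (a NUL-terminated buffer of `size` bytes): append it, keeping the
 * leading whitespace, and return 1.  Otherwise return 0 so the caller
 * can start a new field with `next`.
 */
static int unfold_into(char *field, size_t size, const char *next)
{
	size_t len;

	if (next[0] != ' ' && next[0] != '\t')
		return 0;
	len = strlen(field);
	if (len < size - 1)
		strncat(field, next, size - 1 - len); /* truncate if too long */
	return 1;
}
```

A folded "From: A Very Long Name" followed by " <someone@example.com>" becomes a single From: line again, so the author's address is not lost.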
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- avoid duplicating [PATCH] in the commit message body if the
original commit has it already (happens for commits done from
mails via applymbox).
- check if the commit author is different from the one who is
running the script, and emit appropriate "From: " and
"Date: " lines to the output.
- with '--date', emit "Date: " line to preserve the original
author date even for the user's own commit.
- teach mailinfo to grok not just "From: " but also "Date: ".
The patch e-mail output by format-patch starts with the first
line from the original commit message, prefixed with [PATCH],
and optionally a From: line if you are reformatting a patch
obtained from somebody else, a Date: line from the original
commit if (1) --date is specified or (2) for somebody else's
patch, and the rest of the commit message body.
Expected use of this is to move the title line from the commit
to Subject: when sending it via an e-mail, and leave the From:
and the Date: lines as the first lines of your message.
The mailinfo command has been changed to read Date: (in addition
to From: it already understands) and do sensible things when
running applymbox.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename into a "tools" subdirectory, and change name of "dotest" to "applymbox".
Remove stripspace (which was already copied into git) and cvs2git (which
was likewise already copied into git, and then replaced by a much better
perl version).
All of this was brought on by Ryan Anderson shaming me into it. Thanks.
I guess.
This way we don't get it in the commit message, even if the patch had
been generated by cogito (or CVS, ugh) and people didn't add the proper
"---" marker.
...and git-apply does a much better job at it anyway.
Also, we break the comment/diff on a line that starts with "diff -", not
just on the "---" line. Especially for git diffs, we actually want that
line in the diff.
(We should probably also break on "Index: ..." followed by "=====")
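The break rule described above fits in a few lines of C. This sketch, with the hypothetical name `is_patch_start`, checks only the "---" and "diff -" cases; the "Index: ..." followed by "=====" case is left out, since the text only suggests it as a possible addition.

```c
#include <string.h>

/*
 * Does `line` (newline already stripped) start the patch part of the
 * message?  Everything before it is treated as commit message.
 */
static int is_patch_start(const char *line)
{
	if (!strncmp(line, "---", 3))
		return 1; /* the "---" separator */
	return !strncmp(line, "diff -", 6); /* e.g. "diff --git a/... b/..." */
}
```

Breaking at "diff -" keeps that line on the patch side, which matters for git diffs because the "diff --git" line carries information git-apply uses.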
Now that git does pretty reliable date parsing, we might as well get
the date from the email itself. Of course, it's still questionable
whether the date on the email is all that relevant, but it's certainly
no worse than taking the commit date.
I looked a bit at my old BK tools for the same thing, but they were
just so horrid in many ways that I largely rewrote it all and these
tools do things a bit differently. Instead of aggressively piping
data from one process to another (which was clever but very hard
to follow), this first just splits out the mbox into many smaller
email files, and then does some scripts on these temporary files.