The function fgets() has a big problem with NUL characters: it reads
them, but nobody will know if the NUL comes from the file stream, or
was appended at the end of the line.
So implement a custom read_line_with_nul() function.
Noticed by Tommy Thorn.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function is intended to be fed one logical line at a time to
inspect, but a QP encoded raw input line can have more than one
lines, just like BASE64 encoded one.
Quoting LF as =0A may be unusual but RFC2045 allows it.
The issue was noticed and fixed by Jay Soffian. JC added a test
to protect the fix from regressing later.
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes rebase/am keep the original commit log message
better, even when it does not conform to "single line paragraph
to say what it does, then explain and defend why it is a good
change in later paragraphs" convention.
This change is a two-edged sword. While the earlier behaviour
would make such commit log messages more friendly to readers who
expect to get the birds-eye view with oneline summary formats,
users who primarily use git as a way to interact with foreign
SCM systems would not care much about the convenience of oneline
git log tools, but care more about preserving their own
convention. This changes their commits less useful to readers
who read them with git tools while keeping them more consistent
with the foreign SCM systems they interact with.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For some reason, I got this error message. Maybe it does not make sense,
but then we should not really try to convert the text when it is not
necessary.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At least in the kernel development community, we're generally slowly
converting to UTF-8 everywhere, and the old default of Latin1 in emails is
being supplanted by UTF-8, and it doesn't necessarily show up as such in
the mail headers (because, quite frankly, when people send patches
around, they want the email client to do as little as humanly possible
about the patch)
Despite that, it's often the case that email addresses etc still have
Latin1, so I've seen emails where this is a mixed bag, with Signed-off
parts being copied from email (and containing Latin1 characters), and the
rest of the email being a patch in UTF-8.
So this suggests a very natural change: if the target character set is
utf-8 (the default), and if the source already looks like utf-8, just
assume that it doesn't need any conversion at all.
Only assume that it needs conversion if it isn't already valid utf-8, in
which case we (for historical reasons) will assume it's Latin1.
Basically no really _valid_ latin1 will ever look like utf-8, so while
this changes our historical behaviour, it doesn't do so in practice, and
makes the default behaviour saner for the case where the input was already
in proper format.
We could do a more fancy guess, of course, but this correctly handled a
series of patches I just got from Andrew that had a mixture of Latin1 and
UTF-8 (in different emails, but without any character set indication).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't translate the patch to UTF-8, instead preserve the data as
is. This also reverts a test case that was included in the
original patch series.
Also allow overwriting the authorship and title information we
gather from RFC2822 mail headers with additional in-body
headers, which was pointed out by Linus.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I have come across many emails that use long strings of '-'s as separators
for ideas. This patch below limits the separator to only 3 '-', with the
intent that long string of '-'s will stay in the commit msg and not in the
patch file.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I am working on a project that required parsing through regular
mboxes that didn't necessarily have patches embedded in them. I
started by creating my own modified copy of git-am and working
from there. Very quickly, I noticed git-mailinfo wasn't able to
handle a big chunk of my email.
After hacking up numerous solutions and running into more
limitations, I decided it was just easier to rewrite a big chunk
of it. The following patch has a bunch of fixes and features
that I needed in order for me do what I wanted.
Note: I'm didn't follow any email rfc papers but I don't think
any of the changes I did required much knowledge (besides the
boundary stuff).
List of major changes/fixes:
- can't create empty patch files fix
- empty patch files don't fail, this failure will come inside git-am
- multipart boundaries are now handled
- only output inbody headers if a patch exists otherwise assume those
headers are part of the reply and instead output the original headers
- decode and filter base64 patches correctly
- various other accidental fixes
I believe I didn't break any existing functionality or
compatibility (other than what I describe above, which is really
only the empty patch file).
I tested this through various mailing list archives and
everything seemed to parse correctly (a couple thousand emails).
[jc: squashed in another patch from Don's five patch series to
fix the test case, as this patch exposes the bug in the test.]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It basically considers all the continuation lines to be lines of their
own, and if the total line is bigger than what we can fit in it, we just
truncate the result rather than stop in the middle and then get confused
when we try to parse the "next" line (which is just the remainder of the
first line).
[jc: added test, and tightened boundary a bit per list discussion.]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily. Leftover from this will be fixed in a separate step, including
idiotic conversions like
if (!strncmp("foo", arg, 3))
=>
if (!(-prefixcmp(arg, "foo")))
This was done by using this script in px.perl
#!/usr/bin/perl -i.bak -p
if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
}
if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
}
and running:
$ git grep -l strncmp -- '*.c' | xargs perl px.perl
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8). Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.
The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends formats the commit log message.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
builtin-mailinfo.c has its own hexval implementaiton but it can
share the table-lookup one recently implemented in sha1_file.c
Signed-off-by: Junio C Hamano <junkio@cox.net>
[jc: I needed to hand merge the changes to the updated codebase,
so the result needs to be checked.]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Mail I get sometimes has multiple From lines, like this:
From Majordomo@vger.kernel.org Thu Jul 27 16:39:36 2006
>From mtsirkin Thu Jul 27 16:39:36 2006
Received: from yok.mtl.com [10.0.8.11]
...
which confuses git-mailinfo since that does not recognize >From
as a valid header line.
This patch makes it recognize >From XXX as a valid header line.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the input mbox does not identify what encoding it is in,
and already have RFC2047 stripped away, we cannot tell what
encoding the header text is in. For body text, when the message
does not say what charset it is in, we fall back to assume
latin-1 input when converting to utf8. This should be done
consistently to the header as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It was pointed out that the current behaviour might mispart a patch comment
so remove this behaviour for now.
[jc: this fixes "From: line in the middle" check in t5100 test.]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We exited prematurely from header parsing loop when the header
field did not have a space after the colon but we insisted on
it, and we got the check wrong because we forgot that we strip
the trailing whitespace before we do the check.
The space after the colon is not even required by RFC2822, so
stop requiring it. While we are at it, the header line is
specified to be more strict than "anything with a colon in it"
(there must be one or more characters before the colon, and they
must not be controls, SP or non US-ASCII), so implement that
check as well, lest we mistakenly think something like:
Bogus not a header line: this is not.
as a header line.
Signed-off-by: Junio C Hamano <junkio@cox.net>
- handle_from is fixed to not mangle it's input line.
- Then handle_inbody_header is allowed to look in
the body of a commit message for additional headers
that we haven't already seen.
This allows patches with all of the right information in
unfortunate places to be imported.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Only count lines of the form '^.*: ' and '^From ' as email
header lines.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This prepares for detecting non-email patches that don't have
mail headers. In which case we have already read the first
line so handle_body should not ignore it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
- Move handle_info into main so it is called once
after everything has been parsed. This allows the removal
of a static variable and removes two duplicate calls.
- Move parsing of inbody headers into handle_commit.
This means we parse the in-body headers after we have decoded
the character set, and it removes code duplication between
handle_multipart_one_part and handle_body.
- Change the flag indicating that we have seen an in body
prefix header into another bit in seen.
This is a little more general and allows the possibility of parsing
in body headers after the body message has begun.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
B and Q decoding is not appropriate for in body headers, so move
it up to where we explicitly know we have a real email header.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Currently we only use the return value from read_one_header line
to tell if the line we have read is a header or not. So make
it a flag. This paves the way for better email detection.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Sometimes people just include the whole format-patch output in
the commit e-mail. Detect it and skip the bogus ">From " line.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Quoted-Printable (RFC 2045) and the "Q" encoding (RFC 2047) are
subtly different; the latter is used on the mail header and an
underscore needs to be decoded to 0x20.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Systems using some uClibc versions do not properly support
iconv stuff. This patch allows Git to be built on those
systems by passing NO_ICONV=YesPlease to make. The only
drawback is mailinfo won't do charset conversion in those
systems.
Signed-off-by: Fernando J. Pereda <ferdy@gentoo.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If the first part uses quoted-printable to protect iso8859-1
name in the commit log, and the second part was plain ascii text
patchfile without even Content-Transfer-Encoding subheader, we
incorrectly tried to decode the patch as quoted printable.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This was a stupid typo that did not follow
http://www.iana.org/assignments/character-sets
Long noticed but neglected by JC, but finally reported by
Marco.
Signed-off-by: Junio C Hamano <junkio@cox.net>
An isolated developer could have a local-only e-mail, which will
be stripped out by mailinfo because it lacks '@'. Define a
fallback parser to accomodate that.
At the same time, reject authorless patch in git-am.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Added an AIX clause in the Makefile; that clause likely
will be wrong for any AIX pre-5.2, but I can only test
on 5.3. mailinfo.c was missing the compat header file,
and convert-objects.c needs to define a specific
_XOPEN_SOURCE as well as _XOPEN_SOURCE_EXTENDED.
Signed-off-by: E. Jason Riedy <ejr@cs.berkeley.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This attempts to clean up the way various compatibility
functions are defined and used.
- A new header file, git-compat-util.h, is introduced. This
looks at various NO_XXX and does necessary function name
replacements, equivalent of -Dstrcasestr=gitstrcasestr in the
Makefile.
- Those function name replacements are removed from the Makefile.
- Common features such as usage(), die(), xmalloc() are moved
from cache.h to git-compat-util.h; cache.h includes
git-compat-util.h itself.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Specifying the value for a single letter, single dash option
parameter with equal sign looked funny, and more importantly
calling the flag to override encoding from utf-8 to something
else "-u" (obviously abbreviated from "utf-8") did not make any
sense. So spell it out.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This uses i18n.commitencoding configuration item to pick up the
default commit encoding for the repository when converting form
e-mail encoding to commit encoding (the default is utf8).
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the message body does not identify what encoding it is in,
-u assumes it is in latin-1 and converts it to utf8, which is
the recommended encoding for git commit log messages.
With -u=<encoding>, the conversion is made into the specified
one, instead of utf8, to allow project-local policies.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Borrow from NO_MMAP patch by Johannes, squelch compiler warnings by
declaring gitstrcasestr() when we use it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Also make platform specific part more isolated. Currently we only
have Darwin defined, but I've taken a look at SunOS specific patch
(which I dropped on the floor for now) as well. Doing things this way
would make adding it easier.
Signed-off-by: Junio C Hamano <junkio@cox.net>