Callers of reencode_string() that re-encodes a string from one
encoding to another all used ad-hoc way to bypass the case where the
input and the output encodings are the same. Some did strcmp(),
some did strcasecmp(), yet some others when converting to UTF-8 used
is_encoding_utf8().
Introduce same_encoding() helper function to make these callers use
the same logic. Notably, is_encoding_utf8() has a work-around for
common misconfiguration to use "utf8" to name UTF-8 encoding, which
does not match "UTF-8" hence strcasecmp() would not consider the
same. Make use of it in this helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently "git am" does insane things if the mbox it is given contains
attachments with a MIME type that aren't "text/*".
In particular, it will still decode them, and pass them "one line at a
time" to the mail body filter, but because it has determined that they
aren't text (without actually looking at the contents, just at the mime
type) the "line" will be the encoding line (eg 'base64') rather than a
line of *content*.
Which then will cause the text filtering to fail, because we won't
correctly notice when the attachment text switches from the commit message
to the actual patch. Resulting in a patch failure, even if patch may be a
perfectly well-formed attachment, it's just that the message type may be
(for example) "application/octet-stream" instead of "text/plain".
Just remove all the bogus games with the message_type. The only difference
that code creates is how the data is passed to the filter function
(chunked per-pred-code line or per post-decode line), and that difference
is *wrong*, since chunking things per pre-decode line can never be a
sensible operation, and cannot possibly matter for binary data anyway.
This code goes all the way back to March of 2007, in commit 87ab799234
("builtin-mailinfo.c infrastrcture changes"), and apparently Don used to
pass random mbox contents to git. However, the pre-decode vs post-decode
logic really shouldn't matter even for that case, and more importantly, "I
fed git am crap" is not a valid reason to break *real* patch attachments.
If somebody really cares, and determines that some attachment is binary
data (by looking at the data, not the MIME-type), the whole attachment
should be dismissed, rather than fed in random-sized chunks to
"handle_filter()".
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Content-type: text/plain; charset=UTF-8" header should not appear
twice in the input, but it is always better to gracefully deal with
such a case. The current code concatenates the value to the values
we have seen previously, producing nonsense such as "utf8UTF-8".
Instead of concatenating, forget the previous value and use the last
value we see.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already strip the more common Re: and re:, and we do not often
see RE: from saner MUA, but this prefix does exist and gets used
from time to time.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a line in the message is not a valid utf-8, "git mailinfo"
attempts to convert it to utf-8 assuming the input is latin1 (and
punt if it does not convert cleanly). Using the same heuristics in
"git commit" and "git commit-tree" lets the editor output be in
latin1 to make the overall system more consistent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic for the -b mode, where [PATCH] is dropped but [foo] is not,
silently ate all spaces after the ].
Fix this by keeping the next isspace() character, if there is any.
Being more thorough is pointless, as the later cleanup_space() call
will normalize any sequence of whitespace to a single ' '.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without the "-k" option, mailinfo will convert a folded
subject header like:
Subject: this is a
subject that doesn't
fit on one line
into a single line. With "-k", however, we assumed that
these newlines were significant and represented something
that the sending side would want us to preserve.
For messages created by format-patch, this assumption was
broken by a1f6baa (format-patch: wrap long header lines,
2011-02-23). For messages sent by arbitrary MUAs, this was
probably never a good assumption to make, as they may have
been folding subjects in accordance with rfc822's line
length recommendations all along.
This patch now joins folded lines with a single whitespace
character. This treats header folding purely as a syntactic
feature of the transport mechanism, not as something that
format-patch is trying to tell us about the original
subject.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* builtin/commit.c: Replace block of code with a one-liner call to
logmsg_reencode().
* commit.c: new function for looking up a comit by name
* pretty.c: helper methods for getting output encodings
Add helpers get_log_output_encoding() and
get_commit_output_encoding() that eliminate some messy and duplicate
if-blocks.
Signed-off-by: Pat Notz <patnotz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
5.1 fails to compile git.
enum style is inconsistent already, with some enums declared on one
line, some over 3 lines with the enum values all on the middle line,
sometimes with 1 enum value per line... and independently of that the
trailing comma is sometimes present and other times absent, often
mixing with/without trailing comma styles in a single file, and
sometimes in consecutive enum declarations.
Clearly, omitting the comma is the more portable style, and this patch
changes all enum declarations to use the portable omitted dangling
comma style consistently.
Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle perforations found “in the wild” more robustly by recognizing
“%<” as an alternative scissors mark.
This feature is only meant to support old habits. Discourage new use
of the percent-based version by only documenting the 8< symbol so new
users’ perforations can still be recognized by old versions of Git.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are rebasing we know that the header lines in the
patch are good and that we don't need to pick up any headers
from the body of the patch.
This makes it possible to rebase commits whose commit message
start with "From" or "Date".
Test vectors by Jeff King.
Signed-off-by: Lukas Sandström <luksan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent "cut at scissors" implementation rewinds and truncates
the output file to store the message when it sees a scissors mark,
but it did not check if these library calls succeeded.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[sp: Use fseek as rewind returns void]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This teaches mailinfo the scissors -- >8 -- mark; the command ignores
everything before it in the message body.
For lefties among us, we also support -- 8< -- ;-)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It fed two arguments to override the corresponding global variables,
but the caller always assigned the values to the global variables
first and then passed those global variables to this function.
Stop pretending to be a proper API to confuse people.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There should be no functional change. Just the necessary changes and
simplifications associated with calling strbuf_getwholeline() rather
than an internal function or fgets.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, we remove leading [bracketed] [strings] from the Subject:
header when coming up with the summary of the patch. This is because
there are mailing lists etc that add their own headers to the subject, and
they know they can add things in brackets. The most obvious example is the
Linux kernel security list. Their emails look like
Subject: [Security] [patch] random: make get_random_int() more random
and other people mangle Subject: themselves in a similar way, e.g.:
Subject: [PATCH -rc] [BUGFIX] x86: fix kernel_trap_sp()
Subject: [BUGFIX][PATCH] fix bad page removal from LRU (Was Re: [RFC][PATCH] ..
even though "fix" is more than enough cue to mark it as a [BUGFIX].
Some projects however want to keep these bracketed strings. With this
option, we remove only [bracketed strings that contain word PATCH], so we
will turn things like these
[PATCH] [mailinfo] -b ...
[PATCH v2] [mailinfo] -b ...
[PATCH (v2) 1/4] [mailinfo] -b ...
into
[mailinfo] -b ...
This lacks tests and integration to the "git am" toolchain to be useful,
but it is a start.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 650d30d8a1.
Some mailing lists are configured add prefix "[listname] " to all their
messages, and also people hand-edit subject lines, be it an output from
format-patch or a patch generated by some other means.
We cannot stop people from mucking with the subject line, and with the
change, there always will be need for hand editing the subject when that
happens. People have depended on the leading [bracketed string] removal.
git-format-patch prepends patches with a [PATCH x/n] prefix, but
mailinfo used to remove any number of square-bracket pairs and
the content between them. This prevents one from using a commit
subject like this:
[ and ] must be allowed as input
Removing the square bracket pair from this rather clumsily
constructed subject line loses important information, so we must
take care not to.
This patch causes the subject stripping to stop after it has
encountered one pair of square brackets.
One possible downside of this patch is that the patch-handling
programs will now fail at removing author-added square-brackets
to be removed, such as
[RFC][PATCH x/n]
However, since format-patch only adds one set of square brackets,
this behaviour is quite easily undesrstood and defended while the
previous behaviour is not.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some platforms do not understand the character encoding "latin1" which is
another name for "ISO8859-1". So use "ISO8859-1" instead which all tested
platforms understand.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When converting between character encodings, git tests whether the "from"
encoding and the "to" encoding have the same name. git should perform this
test case insensitively so that e.g. utf-8 is not seen as a different
encoding than UTF-8.
Additionally, it is not necessary to call tolower() anymore on the encodings
extracted from the mail message.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some ancient platforms (Solaris 7, IRIX 6.5) do not understand 'utf-8', but
all tested implementations understand 'UTF-8'.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These variables were always overwritten or the assigned
value was unused:
builtin-diff-tree.c::cmd_diff_tree(): nr_sha1
builtin-for-each-ref.c::opt_parse_sort(): sort_tail
builtin-mailinfo.c::decode_header_bq(): in
builtin-shortlog.c::insert_one_record(): len
connect.c::git_connect(): path
imap-send.c::v_issue_imap_cmd(): n
pretty.c::pp_user_info(): filler
remote::parse_refspec_internal(): llen
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
currently for cases like
From: A U Thor <a.u.thor@example.com> (Comment)
mailinfo extracts the following 'Author:' field:
Author: A U Thor (Comment)
^^
which has two extra spaces left in there after removed email part.
I think this is wrong so here is a fix.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At present we do headers unfolding (see RFC822 3.1.1. LONG HEADER FIELDS) for
all fields except 'From' (always) and 'Subject' (when keep_subject is set)
Not unfolding 'From' is a bug -- see above-mentioned RFC link.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When native language (RU) is in use, subject header usually contains several
parts, e.g.
Subject: [Navy-patches] [PATCH]
=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
=?utf-8?b?0YHQsdC+0YDQutC4?=
This exposes several bugs in builtin-mailinfo.c:
1. decode_b_segment: do not append explicit NUL -- explicit NUL was preventing
correct header construction on parts concatenation via strbuf_addbuf in
decode_header_bq. Fixes:
-Subject: Изменён список пакетов необходимых для сборки
+Subject: Изменён список па
Then
2. Do not emit '\n' between "encoded-word" where RFC2046 says that linear
white space between them are ignored when displaying. Fixes:
-Subject: Изменён список пакетов необходимых для сборки
+Subject: Изменён список па кетов необходимых для сборки
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
LF at the end of format strings given to die() is redundant because
die already adds one on its own.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In handle_from, we calculate the end boundary of a section
to remove from a strbuf using strcspn like this:
el = strcspn(buf, set_of_end_boundaries);
strbuf_remove(&sb, start, el + 1);
This works fine if "el" is the offset of the boundary
character, meaning we remove up to and including that
character. But if the end boundary didn't match (that is, we
hit the end of the string as the boundary instead) then we
want just "el". Asking for "el+1" caught an out-of-bounds
assertion in the strbuf library.
This manifested itself when we got a 'From' header that had
just an email address with nothing else in it (the end of
the string was the end of the address, rather than, e.g., a
trailing '>' character), causing git-mailinfo to barf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent changes to is_multipart_boundary() caused git-mailinfo to segfault.
The reason was after handling the end of the boundary the code tried to look
for another boundary. Because the boundary list was empty, dereferencing
the pointer to the top of the boundary caused the program to go boom.
The fix is to check to see if the list is empty and if so go on its merry
way instead of looking for another boundary.
I also fixed a couple of increments and decrements that didn't look correct
relating to content_top.
The boundary test case was updated to catch future problems like this again.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After finding a MIME multi-part message boundary line, the handle_body()
function is supposed to first flush any accumulated contents from the
previous part to the output stream. However, the code mistakenly output
the boundary line it found.
The old code that used one global, fixed-length buffer line[] used an
alternate static buffer newline[] for keeping track of this accumulated
contents and flushed newline[] upon seeing the boundary; when 3b6121f
(git-mailinfo: use strbuf's instead of fixed buffers, 2008-07-13)
converted a fixed-length buffer in this program to use strbuf,these two
buffers were converted to "line" and "prev" (the latter of which now has a
much more sensible name) strbufs, but the code mistakenly flushed "line"
(which contains the boundary we have just found), instead of "prev".
This resulted in the first boundary to be output in front of the first
line of the message.
The rewritten implementation of handle_boundary() lost the terminating
newline; this would then result in the second line of the message to be
stuck with the first line.
The is_multipart_boundary() was designed to catch both the internal
boundary and the terminating one (the one with trailing "--"); this also
was broken with the rewrite, and the code in the handle_boundary() to
handle the terminating boundary was never triggered.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using git-rebase, author fields containing a ')' at the last position
had the close-parens character removed; the removal should be done only
when it is of this form:
user@host (User Name)
i.e. the remainder after stripping the e-mail address part is enclosed in
a parentheses pair as a whole, not for addresses like this:
User Name (me) <user@host>
Signed-off-by: Philippe Bruhat (BooK) <book@cpan.org>
Acked-by: Lukas Sandström <lukass@etek.chalmers.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A patch title "[PATCH] 1" was sanitized by the original code by stripping
the "[PATCH]" from the front, but after the conversion to use strbuf this
behaviour was broken due to a counting error.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Subject: " isn't in the static array "header", and thus
memcmp("Subject:", header[i], 7) will never match.
Even if it did so, hdr_data[] may not have been allocated if there weren't
a "Subject: " in-body when we process "[PATCH]" in the affected codepath.
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you misuse a git command, you are shown the usage string.
But this is currently shown in the dashed form. So if you just
copy what you see, it will not work, when the dashed form
is no longer supported.
This patch makes git commands show the dash-less version.
For shell scripts that do not specify OPTIONS_SPEC, git-sh-setup.sh
generates a dash-less usage string now.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When handling a MIME multipart message, multi-part boundary lines are eaten
by a call to handle_boundary() function from the main loop of handle_body(),
and after that happens, we should update the line length correctly, because
handle_boundary() udpates line[] with new data.
This was caused by a thinko in 9aa2309 (mailinfo: apply the same fix not
to lose NULs in BASE64 and QP codepaths, 2008-05-25).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function fgets() has a big problem with NUL characters: it reads
them, but nobody will know if the NUL comes from the file stream, or
was appended at the end of the line.
So implement a custom read_line_with_nul() function.
Noticed by Tommy Thorn.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function is intended to be fed one logical line at a time to
inspect, but a QP encoded raw input line can have more than one
lines, just like BASE64 encoded one.
Quoting LF as =0A may be unusual but RFC2045 allows it.
The issue was noticed and fixed by Jay Soffian. JC added a test
to protect the fix from regressing later.
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes rebase/am keep the original commit log message
better, even when it does not conform to "single line paragraph
to say what it does, then explain and defend why it is a good
change in later paragraphs" convention.
This change is a two-edged sword. While the earlier behaviour
would make such commit log messages more friendly to readers who
expect to get the birds-eye view with oneline summary formats,
users who primarily use git as a way to interact with foreign
SCM systems would not care much about the convenience of oneline
git log tools, but care more about preserving their own
convention. This changes their commits less useful to readers
who read them with git tools while keeping them more consistent
with the foreign SCM systems they interact with.
Signed-off-by: Junio C Hamano <gitster@pobox.com>