This tightens the parsing of a commit object in a couple of ways.
- The "tree " header must end with a LF (earlier we did not
check this condition).
- Make sure parsing of timestamp on the "committer " header
does not go beyond the buffer, even when (1) the "author "
header does not end with a LF (this means that the commit
object is malformed and lacks the committer information) or
(2) the "committer " header does not have ">" that is the end
of the e-mail address, or (3) the "committer " header does
not end with a LF.
We however still keep the existing behaviour to return a parsed
commit object even when non-structural headers such as committer
and author are malformed, so that tools that need to look at
commits to clean up a history with such broken commits can still
get at the structural data (i.e. the parents chain and the tree
object).
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the body of the commit log message contains a non-ASCII character,
format-patch correctly emitted the encoding header to mark the resulting
message as such. However, if the original message was fully ASCII, the
command line switch "-s" was given to add a new sign-off, and
the signer's name was not ASCII only, the resulting message would have
contained non-ASCII character but was not marked as such.
This was cherry-picked from the fix in 'master'
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file commit.c got quite large, but it does not have to be: the
code concerning pretty printing is pretty well contained. In fact,
this commit just splits it off into pretty.c, leaving commit.c with
just 672 lines.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
.. by not using quite so much indirection.
This currently grows the "struct commit" a bit, which could be avoided by
using a union for "util" and "indegree" (the topo-sort used to use "util"
anyway, so you cannot use them together), but for now the goal of this was
to simplify, not optimize.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the body of the commit log message contains a non-ASCII character,
format-patch correctly emitted the encoding header to mark the resulting
message as such. However, if the original message was fully ASCII, the
command line switch "-s" was given to add a new sign-off, and
the signer's name was not ASCII only, the resulting message would have
contained non-ASCII character but was not marked as such.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before this patch, clear_commit_marks() recursed for each parent. This
could be potentially very expensive in terms of stack space. Probably
the only reason that this did not lead to problems is the fact that we
typically call clear_commit_marks() after marking a relatively small set
of commits.
Use (sort of) a tail recursion instead: first recurse on the parents
other than the first one, and then continue the loop with the first
parent.
Noticed by Shawn Pearce.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.
strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.
as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.
API changes:
* strbuf_setlen now always works, so just make strbuf_reset a convenience
macro.
* strbuf_detatch takes a size_t* optional argument (meaning it can be
NULL) to copy the buffer's len, as it was needed for this refactor to
make the code more readable, and working like the callers.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This changes the interporate() to replace entries with NULL values
by the empty string, and uses it to interpolate missing fields in
custom format output used in git-log and friends. It is most useful
to avoid <unknown> output from %b format for a commit log message
that lack any body text.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add strbuf_remove, change strbuf_insert:
As both are special cases of strbuf_splice, implement them as such.
gcc is able to do the math and generate almost optimal code this way.
Add strbuf_swap:
Exchange the values of its arguments.
Use it in fast-import.c
Also fix spacing issues in strbuf.h
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Careful profiling shows that we spend more time guessing what pattern
allocation will have, whereas we can delay it only at the point where
add_rfc2047 will be used and don't allocate huge memory area for the many
cases where it's not.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Be more clever in how we search for "encoding ...\n": parse for real
instead of the sloppy strstr's.
* use strbuf_splice to do the substring replacements.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also remove the "len" parameter, as:
(1) it was used as a max boundary, and every caller used ~0u
(2) we check for final NUL no matter what, so it doesn't help for speed.
As a result most of the pp_* function takes 3 arguments less, and we need
a lot less local variables, this makes the code way more readable, and
easier to extend if needed.
This patch also fixes some spacing and cosmetic issues.
This patch also fixes (as a side effect) a memory leak intoruced in
builtin-archive.c at commit df4a394f (fmt was xmalloc'ed and not free'd)
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop the parameter "msg" of format_commit_message() (as it can be
inferred from the parameter "commit"), add a parameter "template"
in order to avoid accessing the static variable user_format
directly and export the result.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a commit message doesn't have encoding information
and encoding output is utf-8 (default) then an useless
xstrdup() of commit message is done.
If we assume most of users live in an utf-8 world, this
useless copy is the common case.
Performance issue found with KCachegrind.
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These days, show_date() takes a date_mode parameter to specify
the output format, and a separate specialized function for dates
in E-mails does not make much sense anymore.
This retires show_rfc2822_date() function and make it just
another date output format.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support output of full ISO 8601 style dates in e.g. git log
and other places that use interpolation for formatting.
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
add_user_info() possibly adds way more than just the commit header line.
In fact, it sometimes needs so much more space that there is a buffer
overrun, leading to an ugly crash. For example, the date is printed in its
own line, and usually takes up more space than the equivalent Unix epoch.
So, for good measure, add 80 characters (a full line) to the allocated
space, in addition to the header line length.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
so that an ugly commit message like this can be
handled sanely.
Currently, --pretty=oneline and --pretty=email (hence
format-patch) take and use only the first line of the commit log
message. This changes them to:
- Take the first paragraph, where the definition of the first
paragraph is "skip all blank lines from the beginning, and
then grab everything up to the next empty line".
- Replace all line breaks with a whitespace.
This change would not affect a well-behaved commit message that
adheres to the convention of "single line summary, a blank line,
and then body of message", as its first paragraph always
consists of a single line. Commit messages from different
culture, such as the ones imported from CVS/SVN, can however get
chomped with the existing behaviour at the first linebreak in
the middle of sentence right now, which would become much easier
to see with this change.
The Subject: and --pretty=oneline output would become very long
and unsightly for non-conforming commits, but their messages are
already ugly anyway, and thischange at least avoids the loss of
information.
The Subject: line from a multi-line paragraph is folded using
RFC2822 line folding rules at the places where line breaks were
in the original.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally we had 16kB limit when formatting log messages for
output, because it was easier to arrange for the caller to have
a reasonably big buffer and pass it down without ever worrying
about reallocating.
This changes the calling convention of pretty_print_commit() to
lift this limit. Instead of the buffer and remaining length, it
now takes a pointer to the pointer that points at the allocated
buffer, and another pointer to the location that stores the
allocated length, and reallocates the buffer as necessary.
To support the user format, the error return of interpolate()
needed to be changed. It used to return a bool telling "Ok the
result fits", or "Sorry, I had to truncate it". Now it returns
0 on success, and returns the size of the buffer it wants in
order to fit the whole result.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Encode ' ' using '=20' even though rfc2047 allows using '_' for
readability. Unfortunately, many programs do not understand this and
just leave the underscore in place. Using '=20' seems to work better.
[jc: with adjustment to t3901]
Signed-off-by: Kristian Høgsberg <hoegsberg@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Check if a line of the header has enough characters to possibly
contain the requested prefix.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This also improves the implementation to match how strndup is
specified (by GNU): if the length given is longer than the string,
only the string's length is allocated and copied, but the string need
not be null-terminated if it is at least as long as the given length.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds --date={local,relative,default} option to log family of commands,
to allow displaying timestamps in user's local timezone, relative time, or
the default format.
Existing --relative-date option is a synonym of --date=relative; we could
probably deprecate it in the long run.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When used with '--boundary A...B', this shows the -/</> marker
you would get with --left-right option to 'git-log' family.
When symmetric diff is not used, everybody is shown to be on the
"right" branch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The function replace_encoding_header is given the whole
commit buffer, including the commit message. When looking
for the encoding header, if none was found in the header, it
would locate any line in the commit message matching
"\nencoding " and remove it.
Instead, we now make sure to search only to the end of the
header.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It printed the header "encoding " instead of just showing
the encoding, as all other items do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
There are two breakages in the %P/%p interpolation. It appended
an excess SP at the end of the list, and it gave uninitialized
contents of a buffer on the stack for root commits.
This fixes it, while updating the t6006 test which expected the
wrong output.
Signed-off-by: Junio C Hamano <junkio@cox.net>
A pointer arithmetic error in fill_person caused random data
from the commit object to be included with the timestamp,
which looked something like:
$ git-rev-list --pretty=format:%ct origin/next | head
commit 98453bdb3db10db26099749bc4f2dc029bed9aa9
1174977948 -0700
Merge branch 'master' into next
* master:
Bisect: Use
commit c0ce981f5e
1174889646 -0700
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With this patch,
$ git show -s \
--pretty=format:' Ze komit %h woss%n dunn buy ze great %an'
shows something like
Ze komit 04c5c88 woss
dunn buy ze great Junio C Hamano
The supported placeholders are:
'%H': commit hash
'%h': abbreviated commit hash
'%T': tree hash
'%t': abbreviated tree hash
'%P': parent hashes
'%p': abbreviated parent hashes
'%an': author name
'%ae': author email
'%ad': author date
'%aD': author date, RFC2822 style
'%ar': author date, relative
'%at': author date, UNIX timestamp
'%cn': committer name
'%ce': committer email
'%cd': committer date
'%cD': committer date, RFC2822 style
'%cr': committer date, relative
'%ct': committer date, UNIX timestamp
'%e': encoding
'%s': subject
'%b': body
'%Cred': switch color to red
'%Cgreen': switch color to green
'%Cblue': switch color to blue
'%Creset': reset color
'%n': newline
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
--pretty=o is a valid abbreviation, --pretty=omfg is not
Noticed by: Nicolas Vilz
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The internal function in_merge_bases(A, B) is used to make sure
that commit A is an ancestor of commit B. This changes the
signature of it to take an array of B's and updates its current
callers.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked write() calls. Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes. Switch to using
the new write_in_full(). Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().
Note, the changes to config handling are much larger and handled
in the next patch in the sequence.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This modifies pretty_print_commit() to make the output of git-rev-list and
friends a bit more predictable.
A commit body starting with blank lines might be unheard-of, but still possible
to create using git-commit-tree (so is bound to appear somewhere, sometime).
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We special case the case where encoding recorded in the commit
and the output encoding are the same and do not call iconv().
But we should drop 'encoding' header for this case as well for
consistency.
Signed-off-by: Junio C Hamano <junkio@cox.net>
After re-coding the commit message into the encoding the user
specified (either with core.logoutputencidng or --encoding
option), this drops the "encoding" header altogether. The
output is after re-coding as the user asked (either with the
config or --encoding=<encoding> option), and the extra header
becomes redundant information.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Telling the git-log family not to do any character conversion is
done with --encoding=none, which sets log_output_encoding to an
empty string. However, logmsg_reencode() confused this with
log_output_encoding and commit_encoding set to NULL. The latter
means we should use the default encoding (i.e. utf-8).
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8). Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.
The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends formats the commit log message.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Updated commit objects record the encoding used in their
encoding header. This updates the log family to reencode it
into the encoding specified in i18n.commitencoding (or the
default, which is "utf-8") upon output.
To force a specific encoding that is different, log family takes
command line flag --encoding=<encoding>; giving --encoding=none
entirely disables the reencoding and lets you view log messges
in their original encoding.
Signed-off-by: Junio C Hamano <junkio@cox.net>