Previously a patch that records too large a line number caused the
offset matching code in git-apply to overstep its internal buffer.
Noticed by Johannes Schindelin.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.
If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes. Right
after committing you still have the original file in your work
tree and this file is not yet corrupted. You can explicitly tell
git that this file is binary and git will handle the file
appropriately.
Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished. In both cases CRLFs are removed
in an irreversible way. For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.
This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert. The
mechanism is controlled by the variable core.safecrlf, with the
following values:
- false: disable safecrlf mechanism
- warn: warn about irreversible conversions
- true: refuse irreversible conversions
The default is to warn. Users are only affected by this default
if core.autocrlf is set. But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.
The safecrlf mechanism's details depend on the git command. The
general principles when safecrlf is active (not false) are:
- we warn/error out if files in the work tree can modified in an
irreversible way without giving the user a chance to backup the
original file.
- for read-only operations that do not modify files in the work tree
we do not not print annoying warnings.
There are exceptions. Even though...
- "git add" itself does not touch the files in the work tree, the
next checkout would, so the safety triggers;
- "git apply" to update a text file with a patch does touch the files
in the work tree, but the operation is about text files and CRLF
conversion is about fixing the line ending inconsistencies, so the
safety does not trigger;
- "git diff" itself does not touch the files in the work tree, it is
often run to inspect the changes you intend to next "git add". To
catch potential problems early, safety triggers.
The concept of a safety check was originally proposed in a similar
way by Linus Torvalds. Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
This new error mode allows a line to have a carriage return at the
end of the line when checking and fixing trailing whitespace errors.
Some people like to keep CRLF line ending recorded in the repository,
and still want to take advantage of the automated trailing whitespace
stripping. We still show ^M in the diff output piped to "less" to
remind them that they do have the CR at the end, but these carriage
return characters at the end are no longer flagged as errors.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have more than one patch series, an earlier one of which
tries to introduce whitespace breakages and a later one of which
has such a new line in its context, "git-apply --whitespace=fix"
will apply and fix the whitespace breakages in the earlier one,
making the resulting file not to match the context of the later
patch.
A short demonstration is in the new test, t4125.
For example, suppose the first patch is:
diff a/hello.txt b/hello.txt
--- a/hello.txt
+++ b/hello.txt
@@ -20,3 +20,3 @@
Hello world.$
-How Are you$
-Today?$
+How are you $
+today? $
to fix broken case in the string, but it introduces unwanted
trailing whitespaces to the result (pretend you are looking at
"cat -e" output of the patch --- '$' signs are not in the patch
but are shown to make the EOL stand out). And the second patch
is to change the wording of the greeting further:
diff a/hello.txt b/hello.txt
--- a/hello.txt
+++ b/hello.txt
@@ -18,5 +18,5 @@
Greetings $
-Hello world.$
+Hello, everybody. $
How are you $
-today? $
+these days? $
If you apply the first one with --whitespace=fix, you will get
this as the result:
Hello world.$
How are you$
today?$
and this does not match the preimage of the second patch, which
demands extra whitespace after "How are you" and "today?".
This series is about teaching "git apply --whitespace=fix" to
cope with this situation better. If the patch does not apply,
it rewrites the second patch like this and retries:
diff a/hello.txt b/hello.txt
--- a/hello.txt
+++ b/hello.txt
@@ -18,5 +18,5 @@
Greetings$
-Hello world.$
+Hello, everybody.$
How are you$
-today?$
+these days?$
This is done by rewriting the preimage lines in the hunk
(i.e. the lines that begin with ' ' or '-'), using the same
whitespace fixing rules as it is using to apply the patches, so
that it can notice what it did to the previous ones in the
series.
A careful reader may notice that the first patch in the example
did not touch the "Greetings" line, so the trailing whitespace
that is in the original preimage of the second patch is not from
the series. Is rewriting this context line a problem?
If you think about it, you will realize that the reason for the
difference is because the submitter's tree was based on an
earlier version of the file that had whitespaces wrong on that
"Greetings" line, and the change that introduced the "Greetings"
line was added independently of this two-patch series to our
tree already with an earlier "git apply --whitespace=fix".
So it may appear this logic is rewriting too much, it is not
so. It is just rewriting what we would have rewritten in the
past.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is necessary to allow match_fragment() to attempt a match
with a preimage that is based on a version before whitespace
errors were fixed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "patch" parameter used to include leading '+' of an added
line in the patch, and the array was treated as 1-based. Make
it accept the contents of the line alone and simplify the code.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function apply_line() changed its behaviour depending on the
ws_error_action, whitespace_error and if the input was a context.
Make its caller responsible for such checking so that we can convert
the function to copy the contents of line while fixing whitespace
breakage more easily.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had two pointer variables pointing to the same buffer and an
integer variable used to index into its tail part that was
active (old, oldlines and oldsize for the preimage, and their
'new' counterparts for the postimage).
To help readability, use 'oldlines' as the allocated pointer,
and use 'old' as the pointer to the tail that advances while the
code builds up the contents in the buffer. The size 'oldsize'
can be computed as (old-oldines).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This updates the way preimage and postimage in a patch hunk is
parsed and prepared for applying. By looking at image->line[n].flag,
the code can tell if it is a common context line that is the
same between the preimage and the postimage.
This matters when we actually start applying a patch with
contexts that have whitespace breakages that have already been
fixed in the target file.
Wnen the caller knows the hunk needs to match at the beginning
or at the end, there is no point starting from the line number
that is found in the patch and trying match with increasing
offset. The logic to find matching lines was made more line
oriented with the previous patch and this optimization is now
trivial.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This changes the way git-apply internally works to be more line
oriented. The logic to find where the patch applies with offset
used to count line numbers by always counting LF from the
beginning of the buffer, but it is simplified because we count
the line length of the target file and the preimage snippet
upfront now.
The ultimate motivation is to allow applying patches
whose preimage context has whitespace corruption that has
already been corrected in the local copy. For that purpose, we
introduce a table of line-hash that allows us to match lines
that differ only in whitespaces.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the logic to force match at the beginning and/or at
the end of the buffer to the actual function that finds the
match from its caller. This is a necessary preparation for the
next step to allow matching disregarding certain differences,
such as whitespace changes.
We probably could optimize this even more by taking advantage of
the fact that match_beginning and match_end forces the match to
be at an exact location (anchored at the beginning and/or the
end), but that's for another commit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This restructures code to find matching location with offset
in find_offset() function, so that there is need for only one
call site of match_fragment() function. There still isn't a
change in the logic of the program.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves three "if" conditions out of line from find_offset()
function, which is responsible for finding the matching place in
the preimage to apply the patch. There is no change in the
logic of the program.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.
In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.
This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When running "git apply --check" while --whitespace=fix is
enabled (either from the command line or via the configuration),
we reported that "N line(s) applied after _fixing_", but --check
by itself does not apply and this message was alarming.
We could even reword the message to say "N line(s) would have
been applied after fixing...", but this patch does not go that
far. Instead, we just make it use the "N lines add whitespace
errors" warning, which happens to be a good diagnostic message a
user would expect from the --check option.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix any sequence of 8 spaces in initial indent, not just the case where
the 8 spaces are the first thing on the line.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use 0 instead of -1 for the case where not tabs or spaces are found; it
will make some later math slightly simpler.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For consistency, make the two tools report whitespace errors in the
same way (the output of "diff --check" has been tweaked to match
that of "git apply").
Note that although the textual content is basically the same only
"git diff --check" provides a colorized version of the problematic
lines; making "git apply" do colorization will require more extensive
changes (figuring out the diff colorization preferences of the user)
and so that will be a subject for another commit.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit unifies three separate places where whitespace checking was
performed:
- the whitespace checking previously done in builtin-apply.c is
extracted into a function in ws.c
- the equivalent logic in "git diff" is removed
- the emit_line_with_ws() function is also removed because that also
rechecks the whitespace, and its functionality is rolled into ws.c
The new function is called check_and_emit_line() and it does two things:
checks a line for whitespace errors and optionally emits it. The checking
is based on lines of content rather than patch lines (in other words, the
caller must strip the leading "+" or "-"); this was suggested by Junio on
the mailing list to allow for a future extension to "git show" to display
whitespace errors in blobs.
At the same time we teach it to report all classes of whitespace errors
found for a given line rather than reporting only the first found error.
Signed-off-by: Wincent Colaiuta <win@wincent.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `core.whitespace` configuration variable allows you to define what
`diff` and `apply` should consider whitespace errors for all paths in
the project (See gitlink:git-config[1]). This attribute gives you finer
control per path.
For example, if you have these in the .gitattributes:
frotz whitespace
nitfol -whitespace
xyzzy whitespace=-trailing
all types of whitespace problems known to git are noticed in path 'frotz'
(i.e. diff shows them in diff.whitespace color, and apply warns about
them), no whitespace problem is noticed in path 'nitfol', and the
default types of whitespace problems except "trailing whitespace" are
noticed for path 'xyzzy'. A project with mixed Python and C might want
to have:
*.c whitespace
*.py whitespace=-indent-with-non-tab
in its toplevel .gitattributes file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds description of core.whitespace to the manual page of git-config,
and updates the stale description of whitespace handling in the manual
page of git-apply.
Also demote "strip" to a synonym status for "fix" as the value of --whitespace
option given to git-apply.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We earlier introduced core.whitespace to allow users to tweak the
definition of what the "whitespace errors" are, for the purpose of diff
output highlighting. This teaches the same to git-apply, so that the
command can both detect (when --whitespace=warn option is given) and fix
(when --whitespace=fix option is given) as configured.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variables were somewhat misnamed.
* "What to do when whitespace errors are detected" is now called
"ws_error_action" (used to be called "new_whitespace");
* The constants to denote the possible actions are "nowarn_ws_error",
"warn_on_ws_error", "die_on_ws_error", and "correct_ws_error". The
last one used to be "strip_whitespace", but we correct whitespace
error in indent (SP followed by HT) and "strip" is not quite an
accurate name for it.
Other than the renaming of variables and constants, there is no
functional change in this patch. While we are at it, it also fixes
overly long lines and multi-line comment styles (which of course do
not affect the generated code at all).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solaris Workshop Compiler found a few unreachable statements.
Signed-off-by: Guido Ostkamp <git@ostkamp.fastmail.fm>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ce_match_stat() can be told:
(1) to ignore CE_VALID bit (used under "assume unchanged" mode)
and perform the stat comparison anyway;
(2) not to perform the contents comparison for racily clean
entries and report mismatch of cached stat information;
using its "option" parameter. Give them symbolic constants.
Similarly, run_diff_files() can be told not to report anything
on removed paths. Also give it a symbolic constant for that.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* make strbuf_read_file take a size hint (works like strbuf_read)
* use it in a couple of places.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.
strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.
as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.
API changes:
* strbuf_setlen now always works, so just make strbuf_reset a convenience
macro.
* strbuf_detatch takes a size_t* optional argument (meaning it can be
NULL) to copy the buffer's len, as it was needed for this refactor to
make the code more readable, and working like the callers.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-am used "git apply -z --index-info" to find the original versions
of the files touched by the diff, to be able to do an inexpensive
three-way merge.
This operation makes only sense in a repository, since the index
information in the diff refers to blobs, which have to be present in
the current repository.
Therefore, teach "git apply" a mode to write out the result as an
index file to begin with, obviating the need for scripts to do it
themselves.
The sole user for --index-info is "git am" is converted to
use --build-fake-ancestor in this patch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* quote_c_style works on a strbuf instead of a wild buffer.
* quote_c_style is now clever enough to not add double quotes if not needed.
* write_name_quoted inherits those advantages, but also take a different
set of arguments. Now instead of asking for quotes or not, you pass a
"terminator". If it's \0 then we assume you don't want to escape, else C
escaping is performed. In any case, the terminator is also appended to the
stream. It also no longer takes the prefix/prefix_len arguments, as it's
seldomly used, and makes some optimizations harder.
* write_name_quotedpfx is created to work like write_name_quoted and take
the prefix/prefix_len arguments.
Thanks to those API changes, diff.c has somehow lost weight, thanks to the
removal of functions that were wrappers around the old write_name_quoted
trying to give it a semantics like the new one, but performing a lot of
allocations for this goal. Now we always write directly to the stream, no
intermediate allocation is performed.
As a side effect of the refactor in builtin-apply.c, the length of the bar
graphs in diffstats are not affected anymore by the fact that the path was
clipped.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
If the gain is not obvious in the diffstat, the resulting code is more
readable, _and_ in checkout-index/update-index we now reuse the same buffer
to unquote strings instead of always freeing/mallocing.
This also is more coherent with the next patch that reworks quoting
functions.
The quoting function is also made more efficient scanning for backslashes
and treating portions of strings without a backslash at once.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
git-am used "git apply -z --index-info" to find the original versions
of the files touched by the diff, to be able to do an inexpensive
three-way merge.
This operation makes only sense in a repository, since the index
information in the diff refers to blobs, which have to be present in
the current repository.
Therefore, teach "git apply" a mode to write out the result as an
index file to begin with, obviating the need for scripts to do it
themselves.
The sole user for --index-info is "git am" is converted to
use --build-fake-ancestor in this patch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The algorithm isn't right here: it accumulates any set of 8 spaces into
tabs even if they're separated by tabs, so
<four spaces><tab><four spaces><tab>
is converted to
<tab><tab><tab>
when it should be just
<tab><tab>
So teach git-apply that a tab hides any group of less than 8 previous
spaces in a row.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" does not record index lines for pure mode changes (i.e. no
lines changed). Therefore, apply --index-info would call out a bogus
error.
Instead, fall back to reading the info from the current index.
Incidentally, this fixes an error where git-rebase would not rebase a
commit including a pure mode change, and changes requiring a threeway
merge.
Noticed and later tested by Chris Shoemaker.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparentely for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, add_file_to_index() invalidated the path in the cache-tree
but remove_file_from_cache() did not, and the user of the latter
needed to invalidate the entry himself. This led to a few bugs due to
missed invalidate calls already. This patch makes the management of
cache-tree less error prone by making more invalidate calls from lower
level cache API functions.
The rules are:
- If you are going to write the index, you should either maintain
cache_tree correctly.
- If you cannot, alternatively you can remove the entire cache_tree
by calling cache_tree_free() before you call write_cache().
- When you modify the index, cache_tree_invalidate_path() should be
called with the path you are modifying, to discard the entry from
the cache-tree structure.
- The following cache API functions exported from read-cache.c (and
the macro whose names have "cache" instead of "index")
automatically call cache_tree_invalidate_path() for you:
- remove_file_from_index();
- add_file_to_index();
- add_index_entry();
You can modify the index bypassing the above API functions
(e.g. find an existing cache entry from the index and modify it in
place). You need to call cache_tree_invalidate_path() yourself in
such a case.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add strbuf_rtrim to remove trailing spaces.
* Add strbuf_insert to insert data at a given position.
* Off-by one fix in strbuf_addf: strbuf_avail() does not counts the final
\0 so the overflow test for snprintf is the strict comparison. This is
not critical as the growth mechanism chosen will always allocate _more_
memory than asked, so the second test will not fail. It's some kind of
miracle though.
* Add size extension hints for strbuf_init and strbuf_read. If 0, default
applies, else:
+ initial buffer has the given size for strbuf_init.
+ first growth checks it has at least this size rather than the
default 8192.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>