kernel/git - git - PowerEL Git System

Commit Graph

Author	SHA1	Message	Date
Jeff King	06dec439a3	diff: do not reuse worktree files that need "clean" conversion When accessing a blob for a diff, we may try to reuse file contents in the working tree, under the theory that it is faster to mmap those file contents than it would be to extract the content from the object database. When we have to filter those contents, though, that assumption does not hold. Even for our internal conversions like CRLF, we have to allocate and fill a new buffer anyway. But much worse, for external clean filters we have to exec an arbitrary script, and we have no idea how expensive it may be to run. So let's skip this optimization when conversion into git's "clean" form is required. This applies whenever the "want_file" flag is false. When it's true, the caller actually wants the smudged worktree contents, which the reused file by definition already has (in fact, this is a key optimization going the other direction, since reusing the worktree file there lets us skip smudge filters). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Junio C Hamano	5580b271af	diff: disable compaction heuristic for now http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net reports that a change to add a new "function" with common ending with the existing one at the end of the file is shown like this: def foo do_foo_stuff() + common_ending() +end + +def bar + do_bar_stuff() + common_ending() end when the new heuristic is in use. In reality, the change is to add the blank line before "def bar" and everything below, which is what the code without the new heuristic shows. Disable the heuristics by default, and resurrect the documentation for the option and the configuration variables, while clearly marking the feature as still experimental. Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Stefan Beller	d634d61ed6	xdiff: implement empty line chunk heuristic In order to produce the smallest possible diff and combine several diff hunks together, we implement a heuristic from GNU Diff which moves diff hunks forward as far as possible when we find common context above and below a diff hunk. This sometimes produces less readable diffs when writing C, Shell, or other programming languages, ie: ... /* + * + * + / + +/ ... instead of the more readable equivalent of ... +/* + * + * + / + / ... Implement the following heuristic to (optionally) produce the desired output. If there are diff chunks which can be shifted around, shift each hunk such that the last common empty line is below the chunk with the rest of the context above. This heuristic appears to resolve the above example and several other common issues without producing significantly weird results. However, as with any heuristic it is not really known whether this will always be more optimal. Thus, it can be disabled via diff.compactionHeuristic. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Matthieu Moy	5404c116aa	diff: activate diff.renames by default Rename detection is a very convenient feature, and new users shouldn't have to dig in the documentation to benefit from it. Potential objections to activating rename detection are that it sometimes fail, and it is sometimes slow. But rename detection is already activated by default in several cases like "git status" and "git merge", so activating diff.renames does not fundamentally change the situation. When the rename detection fails, it now fails consistently between "git diff" and "git status". This setting does not affect plumbing commands, hence well-written scripts will not be affected. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Jeff King	b1ddfb9151	diff_populate_gitlink: use a strbuf We allocate 100 bytes to hold the "Submodule commit ..." text. This is enough, but it's not immediately obvious that this is the case, and we have to repeat the magic 100 twice. We could get away with xstrfmt here, but we want to know the size, as well, so let's use a real strbuf. And while we're here, we can clean up the logic around size_only. It currently sets and clears the "data" field pointlessly, and leaves the "should_free" flag on even after we have cleared the data. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Jeff King	96ffc06f72	convert trivial cases to FLEX_ARRAY macros Using FLEX_ARRAY macros reduces the amount of manual computation size we have to do. It also ensures we don't overflow size_t, and it makes sure we write the same number of bytes that we allocated. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Jeff King	a64e6a44c6	diff: clarify textconv interface The memory allocation scheme for the textconv interface is a bit tricky, and not well documented. It was originally designed as an internal part of diff.c (matching fill_mmfile), but gradually was made public. Refactoring it is difficult, but we can at least improve the situation by documenting the intended flow and enforcing it with an in-code assertion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Duy Nguyen	a97262c62f	diff: make -O and --output work in subdirectory Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Tobias Klauser	086ecab1a7	diff: remove unused #include "sigchain.h" After switching to use the tempfile module in commit `284098f1` (diff: use tempfile module), no declarations from sigchain.h are used in diff.c anymore. Thus, remove the #include. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Jeff King	d59f765ac9	use sha1_to_hex_r() instead of strcpy Before sha1_to_hex_r() existed, a simple way to get hex sha1 into a buffer was with: strcpy(buf, sha1_to_hex(sha1)); This isn't wrong (assuming the buf is 41 characters), but it makes auditing the code base for bad strcpy() calls harder, as these become false positives. Let's convert them to sha1_to_hex_r(), and likewise for some calls to find_unique_abbrev(). While we're here, we'll double-check that all of the buffers are correctly sized, and use the more obvious GIT_SHA1_HEXSZ constant. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	9 years ago
Jeff King	3efb988098	react to errors in xdi_diff When we call into xdiff to perform a diff, we generally lose the return code completely. Typically by ignoring the return of our xdi_diff wrapper, but sometimes we even propagate that return value up and then ignore it later. This can lead to us silently producing incorrect diffs (e.g., "git log" might produce no output at all, not even a diff header, for a content-level diff). In practice this does not happen very often, because the typical reason for xdiff to report failure is that it malloc() failed (it uses straight malloc, and not our xmalloc wrapper). But it could also happen when xdiff triggers one our callbacks, which returns an error (e.g., outf() in builtin/rerere.c tries to report a write failure in this way). And the next patch also plans to add more failure modes. Let's notice an error return from xdiff and react appropriately. In most of the diff.c code, we can simply die(), which matches the surrounding code (e.g., that is what we do if we fail to load a file for diffing in the first place). This is not that elegant, but we are probably better off dying to let the user know there was a problem, rather than simply generating bogus output. We could also just die() directly in xdi_diff, but the callers typically have a bit more context, and can provide a better message (and if we do later decide to pass errors up, we're one step closer to doing so). There is one interesting case, which is in diff_grep(). Here if we cannot generate the diff, there is nothing to match, and we silently return "no hits". This is actually what the existing code does already, but we make it a little more explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Jeff King	5096d4909f	convert trivial sprintf / strcpy calls to xsnprintf We sometimes sprintf into fixed-size buffers when we know that the buffer is large enough to fit the input (either because it's a constant, or because it's numeric input that is bounded in size). Likewise with strcpy of constant strings. However, these sites make it hard to audit sprintf and strcpy calls for buffer overflows, as a reader has to cross-reference the size of the array with the input. Let's use xsnprintf instead, which communicates to a reader that we don't expect this to overflow (and catches the mistake in case we do). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Heiko Voigt	851e18c385	submodule: use new config API for worktree configurations We remove the extracted functions and directly parse into and read out of the cache. This allows us to have one unified way of accessing submodule configuration values specific to single submodules. Regardless whether we need to access a configuration from history or from the worktree. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Michael Haggerty	284098f13f	diff: use tempfile module Also add some code comments explaining how the fields in "struct diff_tempfile" are used. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
René Scharfe	3f4f17b51b	diff: parse ws-error-highlight option more strictly Check if a matched token is followed by a delimiter before advancing the pointer arg. This avoids accepting composite words like "allnew" or "defaultcontext" and misparsing them as "new" or "context". Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
David Turner	076c98372e	log: add "log.follow" configuration variable People who work on projects with mostly linear history with frequent whole file renames may want to always use "git log --follow" when inspecting the life of the content that live in a single path. Teach the command to behave as if "--follow" was given from the command line when log.follow configuration variable is set and there is one (and only one) path on the command line. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Jeff King	8dbf3eb685	diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT The latter is a much more descriptive name (and we support "color.diff.context" now). This also updates the name of any local variables which were used to store the color. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Jeff King	74b15bfbf6	diff: accept color.diff.context as a synonym for "plain" The term "plain" is a bit ambiguous; let's allow the more specific "context", but keep "plain" around for compatibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Junio C Hamano	b8767f791c	diff.c: --ws-error-highlight=<kind> option Traditionally, we only cared about whitespace breakages introduced in new lines. Some people want to paint whitespace breakages on old lines, too. When they see a whitespace breakage on a new line, they can spot the same kind of whitespace breakage on the corresponding old line and want to say "Ah, those breakages are there but they were inherited from the original, so let's not touch them for now." Introduce `--ws-error-highlight=<kind>` option, that lets them pass a comma separated list of `old`, `new`, and `context` to specify what lines to highlight whitespace errors on. Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Junio C Hamano	0e383e185a	diff.c: add emit_del_line() and emit_context_line() Traditionally, we only had emit_add_line() helper, which knows how to find and paint whitespace breakages on the given line, because we only care about whitespace breakages introduced in new lines. The context lines and old (i.e. deleted) lines are emitted with a simpler emit_line_0() that paints the entire line in plain or old colors. Identify callers of emit_line_0() that show deleted lines and context lines, have them call new helpers, emit_del_line() and emit_context_line(), so that we can later tweak what is done to these two classes of lines. Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
René Scharfe	9a6f1287fb	zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw} Clear the git_zstream variable at the start of git_deflate_init() etc. so that callers don't have to do that. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Mårten Kongstad	ab27389aff	diff --shortstat --dirstat: remove duplicate output When --shortstat is used in conjunction with --dirstat=changes, git diff will output the dirstat information twice: first as calculated by the 'lines' algorithm, then as calculated by the 'changes' algorithm: $ git diff --dirstat=changes,10 --shortstat v2.2.0..v2.2.1 23 files changed, 453 insertions(+), 54 deletions(-) 33.5% Documentation/RelNotes/ 26.2% t/ 46.6% Documentation/RelNotes/ 16.6% t/ The same duplication happens for --shortstat together with --dirstat=files, but not for --shortstat together with --dirstat=lines. Limit output to only include one dirstat part, calculated as specified by the --dirstat parameter. Also, add test for this. Signed-off-by: Mårten Kongstad <marten.kongstad@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
Jeff King	f6c5a2968c	color_parse: do not mention variable name in error message Originally the color-parsing function was used only for config variables. It made sense to pass the variable name so that the die() message could be something like: $ git -c color.branch.plain=bogus branch fatal: bad color value 'bogus' for variable 'color.branch.plain' These days we call it in other contexts, and the resulting error messages are a little confusing: $ git log --pretty='%C(bogus)' fatal: bad color value 'bogus' for variable '--pretty format' $ git config --get-color foo.bar bogus fatal: bad color value 'bogus' for variable 'command line' This patch teaches color_parse to complain only about the value, and then return an error code. Config callers can then propagate that up to the config parser, which mentions the variable name. Other callers can provide a custom message. After this patch these three cases now look like: $ git -c color.branch.plain=bogus branch error: invalid color value: bogus fatal: unable to parse 'color.branch.plain' from command-line config $ git log --pretty='%C(bogus)' error: invalid color value: bogus fatal: unable to parse --pretty format $ git config --get-color foo.bar bogus error: invalid color value: bogus fatal: unable to parse default color value Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	10 years ago
René Scharfe	d318027932	run-command: introduce CHILD_PROCESS_INIT Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Nguyễn Thái Ngọc Duy	1aaf69e669	diff: shortcut for diff'ing two binary SHA-1 objects If we are given two SHA-1 and asked to determine if they are different (but not _what_ differences), we know right away by comparing SHA-1. A side effect of this patch is, because large files are marked binary, diff-tree will not need to unpack them. 'diff-index --cached' will not either. But 'diff-files' still does. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Nguyễn Thái Ngọc Duy	6bf3b81348	diff --stat: mark any file larger than core.bigfilethreshold binary Too large files may lead to failure to allocate memory. If it happens here, it could impact quite a few commands that involve diff. Moreover, too large files are inefficient to compare anyway (and most likely non-text), so mark them binary and skip looking at their content. Noticed-by: Dale R. Worley <worley@alum.mit.edu> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Nguyễn Thái Ngọc Duy	8e5dd3d654	diff.c: allow to pass more flags to diff_populate_filespec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
René Scharfe	cedc61a998	strbuf: use strbuf_addstr() for adding C strings Avoid code duplication and let strbuf_addstr() call strlen() for us. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	0539cc0038	stat_opt: check extra strlen call As in earlier commits, the diff option parser uses starts_with to find that an argument starts with "--stat-", and then adds strlen("stat-") to find the rest of the option. However, in this case the starts_with and the strlen are separated across functions, making it easy to call the latter without the former. Let's use skip_prefix instead of raw pointer arithmetic to catch such a case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	95b567c7c3	use skip_prefix to avoid repeating strings It's a common idiom to match a prefix and then skip past it with strlen, like: if (starts_with(foo, "bar")) foo += strlen("bar"); This avoids magic numbers, but means we have to repeat the string (and there is no compiler check that we didn't make a typo in one of the strings). We can use skip_prefix to handle this case without repeating ourselves. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	ae021d8791	use skip_prefix to avoid magic numbers It's a common idiom to match a prefix and then skip past it with a magic number, like: if (starts_with(foo, "bar")) foo += 3; This is easy to get wrong, since you have to count the prefix string yourself, and there's no compiler check if the string changes. We can use skip_prefix to avoid the magic numbers here. Note that some of these conversions could be much shorter. For example: if (starts_with(arg, "--foo=")) { bar = arg + 6; continue; } could become: if (skip_prefix(arg, "--foo=", &bar)) continue; However, I have left it as: if (skip_prefix(arg, "--foo=", &v)) { bar = v; continue; } to visually match nearby cases which need to actually process the string. Like: if (skip_prefix(arg, "--foo=", &v)) { bar = atoi(v); continue; } Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	9e1a5ebe52	parse_diff_color_slot: drop ofs parameter This function originally took a whole config variable name ("var") and an offset ("ofs"). It checked "var+ofs" against each color slot, but reported errors using the whole "var". However, since `8b8e862` (ignore unknown color configuration, 2009-12-12), it returns -1 rather than printing its own error, and therefore only cares about var+ofs. We can drop the ofs parameter and teach its sole caller to derive the pointer itself. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Brian Gesiak	1a4927c5c5	diff.c: rearrange xcalloc arguments xcalloc() takes two arguments: the number of elements and their size. diffstat_add() passes the arguments in reverse order, passing the size of a diffstat_file, followed by the number of diffstat_file to be allocated. Rearrange them so they are in the correct order. Signed-off-by: Brian Gesiak <modocache@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	dd63f169d9	move "--follow needs one pathspec" rule to diff_setup_done Because of the way "--follow" is implemented, we must have exactly one pathspec. "git log" enforces this restriction, but other users of the revision traversal code do not. For example, "git format-patch --follow" will segfault during try_to_follow_renames, as we have no pathspecs at all. We can push this check down into diff_setup_done, which is probably a better place anyway. It is the diff code that introduces this restriction, so other parts of the code should not need to care themselves. Reported-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	f3efe78782	run_external_diff: refactor cmdline setup logic The current logic makes it hard to see what gets put onto the command line in which cases. Pulling out a helper function lets us see that we have two sets of file data, and the second set either uses the original name, or the "other" renamed/copy name. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	0d4217d92e	run_external_diff: hoist common bits out of conditional Whether we have diff_filespecs to give to the diff command or not, we always are going to run the program and pass it the pathname. Let's pull that duplicated part out of the conditional to make it more obvious. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	5b88caa417	run_external_diff: drop fflush(NULL) This fflush was added in `d5535ec` (Use run_command() to spawn external diff programs instead of fork/exec., 2007-10-19), because flushing buffers before forking is a good habit. But later, `7d0b18a` (Add output flushing before fork(), 2008-08-04) added it to the generic run-command interface, meaning that our flush here is redundant. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	89294d143d	run_external_diff: clean up error handling When the external diff reports an error, we try to clean up and die. However, we can make this process a bit simpler: 1. We do not need to bother freeing memory, since we are about to exit. Nor do we need to clean up our tempfiles, since the atexit() handler will do it for us. So we can die as soon as we see the error. 3. We can just call die() rather than fprintf/exit. This does technically change our exit code, but the exit code of "1" is not meaningful here. In fact, it is probably wrong, since "1" from diff usually means "completed successfully, but there were differences". And while we're there, we can mark the error message for translation, and drop the full stop at the end to make it more like our other messages. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	ae049c955c	run_external_diff: use an argv_array for the environment We currently use static buffers and a static array for formatting the environment passed to the external diff. There's nothing wrong in the code, but it is much easier to verify that it is correct if we use a dynamic argv_array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	82fbf269b9	run_external_diff: use an argv_array for the command line We currently generate the command-line for the external command using a fixed-length array of size 10. But if there is a rename, we actually need 11 elements (10 items, plus a NULL), and end up writing a random NULL onto the stack. Rather than bump the limit, let's just use an argv_array, which makes this sort of error impossible. Noticed-by: Max L <infthi.inbox@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jiang Xin	d1d96a82bb	i18n: remove obsolete comments for translators in diffstat generation Since we do not translate diffstat any more, remove the obsolete comments. Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Kirill Smelkov	7195fbfaf5	combine-diff: speed it up, by using multiparent diff tree-walker directly As was recently shown in "combine-diff: optimize combine_diff_path sets intersection", combine-diff runs very slowly. In that commit we optimized paths sets intersection, but that accounted only for ~ 25% of the slowness, and as my tracing showed, for linux.git v3.10..v3.11, for merges a lot of time is spent computing diff(commit,commit^2) just to only then intersect that huge diff to almost small set of files from diff(commit,commit^1). In previous commit, we described the problem in more details, and reworked the diff tree-walker to be general one - i.e. to work in multiple parent case too. Now is the time to take advantage of it for finding paths for combine diff. The implementation is straightforward - if we know, we can get generated diff paths directly, and at present that means no diff filtering or rename/copy detection was requested(), we can call multiparent tree-walker directly and get ready paths. () because e.g. at present, all diffcore transformations work on diff_filepair queues, but in the future, that limitation can be lifted, if filters would operate directly on combine_diff_paths. Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log") and with `-c` ("git log -c") and with `-c --merges` ("git log -c --merges") before and after the patch are as follows: linux.git v3.10..v3.11 log log -c log -c --merges before 1.9s 16.4s 15.2s after 1.9s 2.4s 1.1s The result stayed the same. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Kirill Smelkov	72441af7c4	tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Previously diff_tree(), which is now named ll_diff_tree_sha1(), was generating diff_filepair(s) for two trees t1 and t2, and that was usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes a commit introduces. In Git, however, we have fundamentally built flexibility in that a commit can have many parents - 1 for a plain commit, 2 for a simple merge, but also more than 2 for merging several heads at once. For merges there is a so called combine-diff, which shows diff, a merge introduces by itself, omitting changes done by any parent. That works through first finding paths, that are different to all parents, and then showing generalized diff, with separate columns for +/- for each parent. The code lives in combine-diff.c . There is an impedance mismatch, however, in that a commit could generally have any number of parents, and that while diffing trees, we divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there is no special casing for multiple parents commits in e.g. revision-walker . That impedance mismatch hurts performance badly for generating combined diffs - in "combine-diff: optimize combine_diff_path sets intersection" I've already removed some slowness from it, but from the timings provided there, it could be seen, that combined diffs still cost more than an order of magnitude more cpu time, compared to diff for usual commits, and that would only be an optimistic estimate, if we take into account that for e.g. linux.git there is only one merge for several dozens of plain commits. That slowness comes from the fact that currently, while generating combined diff, a lot of time is spent computing diff(commit,commit^2) just to only then intersect that huge diff to almost small set of files from diff(commit,commit^1). That's because at present, to compute combine-diff, for first finding paths, that "every parent touches", we use the following combine-diff property/definition: D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths) where D(A,P1...Pn) is combined diff between commit A, and parents Pi and D(A,Pi) is usual two-tree diff Pi..A So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n 1-parent diffs and intersecting results will be slow. And usually, for linux.git and other topic-based workflows, that D(A,P2) is huge, because, if merge-base of A and P2, is several dozens of merges (from A, via first parent) below, that D(A,P2) will be diffing sum of merges from several subsystems to 1 subsystem. The solution is to avoid computing n 1-parent diffs, and to find changed-to-all-parents paths via scanning A's and all Pi's trees simultaneously, at each step comparing their entries, and based on that comparison, populate paths result, and deduce we could skip recursing into subdirectories, if at least for 1 parent, sha1 of that dir tree is the same as in A. That would save us from doing significant amount of needless work. Such approach is very similar to what diff_tree() does, only there we deal with scanning only 2 trees simultaneously, and for n+1 tree, the logic is a bit more complex: D(T,P1...Pn) calculation scheme ------------------------------- D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set) D(T,Pj) - diff between T..Pj D(T,P1...Pn) - combined diff from T to parents P1,...,Pn We start from all trees, which are sorted, and compare their entries in lock-step: T P1 Pn - - - \|t\| \|p1\| \|pn\| \|-\| \|--\| ... \|--\| imin = argmin(p1...pn) \| \| \| \| \| \| \|-\| \|--\| \|--\| \|.\| \|. \| \|. \| . . . . . . at any time there could be 3 cases: 1) t < p[imin]; 2) t > p[imin]; 3) t = p[imin]. Schematic deduction of what every case means, and what to do, follows: 1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓ 2) t > p[imin] 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓ 2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓ 3) t = p[imin] 3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate 3.2) pi = p[imin] -> investigate δ(t,pi) \| \| v 3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø -> ⎧δ(t,pi) - if pi=p[imin] -> D += ⎨ ⎩"+t" - if pi>p[imin] in any case t↓ ∀ pi=p[imin] pi↓ ~ For comparison, here is how diff_tree() works: D(A,B) calculation scheme ------------------------- A B - - \|a\| \|b\| a < b -> a ∉ B -> D(A,B) += +a a↓ \|-\| \|-\| a > b -> b ∉ A -> D(A,B) += -b b↓ \| \| \| \| a = b -> investigate δ(a,b) a↓ b↓ \|-\| \|-\| \|.\| \|.\| . . . . ~~~~~~~~ This patch generalizes diff tree-walker to work with arbitrary number of parents as described above - i.e. now there is a resulting tree t, and some parents trees tp[i] i=[0..nparent). The generalization builds on the fact that usual diff D(A,B) is by definition the same as combined diff D(A,[B]), so if we could rework the code for common case and make it be not slower for nparent=1 case, usual diff(t1,t2) generation will not be slower, and multiparent diff tree-walker would greatly benefit generating combine-diff. What we do is as follows: 1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be a paths generator (new name diff_tree_paths()), with each generated path being `struct combine_diff_path` with info for path, new sha1,mode and for every parent which sha1,mode it was in it. 2) From that info, we can still generate usual diff queue with struct diff_filepairs, via "exporting" generated combine_diff_path, if we know we run for nparent=1 case. (see emit_diff() which is now named emit_diff_first_parent_only()) 3) In order for diff_can_quit_early(), which checks DIFF_OPT_TST(opt, HAS_CHANGES)) to work, that exporting have to be happening not in bulk, but incrementally, one diff path at a time. For such consumers, there is a new callback in diff_options introduced: ->pathchange(opt, struct combine_diff_path ) which, if set to !NULL, is called for every generated path. (see new compat ll_diff_tree_sha1() wrapper around new paths generator for setup) 4) The paths generation itself, is reworked from previous ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation scheme" provided above: On the start we allocate [nparent] arrays in place what was earlier just for one parent tree. then we just generalize loops, and comparison according to the algorithm. Some notes(): 1) alloca(), for small arrays, is used for "runs not slower for nparent=1 case than before" goal - if we change it to xmalloc()/free() the timings get ~1% worse. For alloca() we use just-introduced xalloca/xalloca_free compatibility wrappers, so it should not be a portability problem. 2) For every parent tree, we need to keep a tag, whether entry from that parent equals to entry from minimal parent. For performance reasons I'm keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ. Not doing so, we'd need to alloca another [nparent] array, which hurts performance. 3) For emitted paths, memory could be reused, if we know the path was processed via callback and will not be needed later. We use efficient hand-made realloc-style path_appendnew(), that saves us from ~1-1.5% of potential additional slowdown. 4) goto(s) are used in several places, as the code executes a little bit faster with lowered register pressure. Also - we should now check for FIND_COPIES_HARDER not only when two entries names are the same, and their hashes are equal, but also for a case, when a path was removed from some of all parents having it. The reason is, if we don't, that path won't be emitted at all (see "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants all paths - with diff or without - to be emitted, to be later analyzed for being copies sources. The new check is only necessary for nparent >1, as for nparent=1 case xmin_eqtotal always =1 =nparent, and a path is always added to diff as removal. ~~~~~~~~ Timings for # without -c, i.e. testing only nparent=1 case `git log --raw --no-abbrev --no-renames` before and after the patch are as follows: navy.git linux.git v3.10..v3.11 before 0.611s 1.889s after 0.619s 1.907s slowdown 1.3% 0.9% This timings show we did no harm to usual diff(tree1,tree2) generation. From the table we can see that we actually did ~1% slowdown, but I think I've "earned" that 1% in the previous patch ("tree-diff: reuse base str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case, net timings stays approximately the same. The output also stayed the same. (*) If we revert 1)-4) to more usual techniques, for nparent=1 case, we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as "do no harm for nparent=1 case" rule. For linux.git, combined diff will run an order of magnitude faster and appropriate timings will be provided in the next commit, as we'll be taking advantage of the new diff tree-walker for combined-diff generation there. P.S. and combined diff is not some exotic/for-play-only stuff - for example for a program I write to represent Git archives as readonly filesystem, there is initial scan with `git log --reverse --raw --no-abbrev --no-renames -c` to extract log of what was created/changed when, as a result building a map {} sha1 -> in which commit (and date) a content was added that `-c` means also show combined diff for merges, and without them, if a merge is non-trivial (merges changes from two parents with both having separate changes to a file), or an evil one, the map will not be full, i.e. some valid sha1 would be absent from it. That case was my initial motivation for combined diffs speedup. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Justin Lebar	01689909eb	comments: fix misuses of "nor" Signed-off-by: Justin Lebar <jlebar@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Rohit Mani	2c5495f7b6	use strchrnul() in place of strchr() and strlen() Avoid scanning strings twice, once with strchr() and then with strlen(), by using strchrnul(). Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rohit Mani <rohit.mani@outlook.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Dmitry S. Dolzhenko	4c960a432c	diff.c: use ALLOC_GROW() Use ALLOC_GROW() instead of open-coding it in diffstat_add() and diff_q(). Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Nguyễn Thái Ngọc Duy	f34b205f6c	diff: do not quit early on stat-dirty files When QUICK is set (i.e. with --quiet) we try to do as little work as possible, stopping after seeing the first change. stat-dirty is considered a "change" but it may turn out not, if no actual content is changed. The actual content test is performed too late in the process and the shortcut may be taken prematurely, leading to incorrect return code. Assume we do "git diff --quiet". If we have a stat-dirty file "a" and a really dirty file "b". We break the loop in run_diff_files() and stop after "a" because we have got a "change". Later in diffcore_skip_stat_unmatch() we find out "a" is actually not changed. But there's nothing else in the diff queue, we incorrectly declare "no change", ignoring the fact that "b" is changed. This also happens to "git diff --quiet HEAD" when it hits diff_can_quit_early() in oneway_diff(). This patch does the content test earlier in order to keep going if "a" is unchanged. The test result is cached so that when diffcore_skip_stat_unmatch() is done in the end, we spend no cycles on re-testing "a". Reported-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Nguyễn Thái Ngọc Duy	fceb907225	diff.c: move diffcore_skip_stat_unmatch core logic out for reuse later Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Thomas Rast	aba4727281	diff: do not reuse_worktree_file for submodules The GIT_EXTERNAL_DIFF calling code attempts to reuse existing worktree files for the worktree side of diffs, for performance reasons. However, that code also tries to do the same with submodules. This results in calls to $GIT_EXTERNAL_DIFF where the old-file is a file of the form "Submodule commit $sha1", but the new-file is a directory in the worktree. Fix it by never reusing a worktree "file" in the submodule case. Reported-by: Grégory Pakosz <gregory.pakosz@gmail.com> Signed-off-by: Thomas Rast <tr@thomasrast.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago
Jeff King	428d52a5a5	diff_filespec: drop xfrm_flags field The only mention of this field in the code is by some debugging code which prints it out (and it will always be zero, since we never touch it otherwise). It was obsoleted very early on by `25d5ea4` ([PATCH] Redo rename/copy detection logic., 2005-05-24). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	11 years ago

1 2 3 4 5 ...

966 Commits (9b601eafd1437df2e11b032bfbfd1ac5d32d3290)