Commit Graph

8 Commits (f290089879501a855df2eb41db5b38cb0035a765)

Author SHA1 Message Date
Jeff King a64e6a44c6 diff: clarify textconv interface
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.

Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 10:40:35 -08:00
Jeff King 3813e69031 refactor get_textconv to not require diff_filespec
This function actually does two things:

  1. Load the userdiff driver for the filespec.

  2. Decide whether the driver has a textconv component, and
     initialize the textconv cache if applicable.

Only part (1) requires the filespec object, and some callers
may not have a filespec at all. So let's split them it into
two functions, and put part (2) with the userdiff code,
which is a better fit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-23 15:46:02 -07:00
Jeff King d9bae1a178 diff: cache textconv output
Running a textconv filter can take a long time. It's
particularly bad for a large file which needs to be spooled
to disk, but even for small files, the fork+exec overhead
can add up for something like "git log -p".

This patch uses the notes-cache mechanism to keep a fast
cache of textconv output. Caches are stored in
refs/notes/textconv/$x, where $x is the userdiff driver
defined in gitattributes.

Caching is enabled only if diff.$x.cachetextconv is true.

In my test repo, on a commit with 45 jpg and avi files
changed and a textconv to show their exif tags:

  [before]
  $ time git show >/dev/null
  real    0m13.724s
  user    0m12.057s
  sys     0m1.624s

  [after, first run]
  $ git config diff.mfo.cachetextconv true
  $ time git show >/dev/null
  real    0m14.252s
  user    0m12.197s
  sys     0m1.800s

  [after, subsequent runs]
  $ time git show >/dev/null
  real    0m0.352s
  user    0m0.148s
  sys     0m0.200s

So for a slight (3.8%) cost on the first run, we achieve an
almost 40x speed up on subsequent runs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 00:05:31 -07:00
Thomas Rast 80c49c3de2 color-words: make regex configurable via attributes
Make the --color-words splitting regular expression configurable via
the diff driver's 'wordregex' attribute.  The user can then set the
driver on a file in .gitattributes.  If a regex is given on the
command line, it overrides the driver's setting.

We also provide built-in regexes for the languages that already had
funcname patterns, and add an appropriate diff driver entry for C/++.
(The patterns are designed to run UTF-8 sequences into a single chunk
to make sure they remain readable.)

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-17 10:44:21 -08:00
Jeff King c7534ef4a1 userdiff: require explicitly allowing textconv
Diffs that have been produced with textconv almost certainly
cannot be applied, so we want to be careful not to generate
them in things like format-patch.

This introduces a new diff options, ALLOW_TEXTCONV, which
controls this behavior. It is off by default, but is
explicitly turned on for the "log" family of commands, as
well as the "diff" porcelain (but not diff-* plumbing).

Because both text conversion and external diffing are
controlled by these diff options, we can get rid of the
"plumbing versus porcelain" distinction when reading the
config. This was an attempt to control the same thing, but
suffered from being too coarse-grained.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-26 14:09:48 -07:00
Jeff King 9cb92c390c diff: add filter for converting binary to text
When diffing binary files, it is sometimes nice to see the
differences of a canonical text form rather than either a
binary patch or simply "binary files differ."

Until now, the only option for doing this was to define an
external diff command to perform the diff. This was a lot of
work, since the external command needed to take care of
doing the diff itself (including mode changes), and lost the
benefit of git's colorization and other options.

This patch adds a text conversion option, which converts a
file to its canonical format before performing the diff.
This is less flexible than an arbitrary external diff, but
is much less work to set up. For example:

  $ echo '*.jpg diff=exif' >>.gitattributes
  $ git config diff.exif.textconv exiftool
  $ git config diff.exif.binary false

allows one to see jpg diffs represented by the text output
of exiftool.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-18 08:02:55 -07:00
Jeff King 122aa6f9c0 diff: introduce diff.<driver>.binary
The "diff" gitattribute is somewhat overloaded right now. It
can say one of three things:

  1. this file is definitely binary, or definitely not
     (i.e., diff or !diff)
  2. this file should use an external diff engine (i.e.,
     diff=foo, diff.foo.command = custom-script)
  3. this file should use particular funcname patterns
     (i.e., diff=foo, diff.foo.(x?)funcname = some-regex)

Most of the time, there is no conflict between these uses,
since using one implies that the other is irrelevant (e.g.,
an external diff engine will decide for itself whether the
file is binary).

However, there is at least one conflicting situation: there
is no way to say "use the regular rules to determine whether
this file is binary, but if we do diff it textually, use
this funcname pattern." That is, currently setting diff=foo
indicates that the file is definitely text.

This patch introduces a "binary" config option for a diff
driver, so that one can explicitly set diff.foo.binary. We
default this value to "don't know". That is, setting a diff
attribute to "foo" and using "diff.foo.funcname" will have
no effect on the binaryness of a file. To get the current
behavior, one can set diff.foo.binary to true.

This patch also has one additional advantage: it cleans up
the interface to the userdiff code a bit. Before, calling
code had to know more about whether attributes were false,
true, or unset to determine binaryness. Now that binaryness
is a property of a driver, we can represent these situations
just by passing back a driver struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-18 08:02:26 -07:00
Jeff King be58e70dba diff: unify external diff and funcname parsing code
Both sets of code assume that one specifies a diff profile
as a gitattribute via the "diff=foo" attribute. They then
pull information about that profile from the config as
diff.foo.*.

The code for each is currently completely separate from the
other, which has several disadvantages:

  - there is duplication as we maintain code to create and
    search the separate lists of external drivers and
    funcname patterns

  - it is difficult to add new profile options, since it is
    unclear where they should go

  - the code is difficult to follow, as we rely on the
    "check if this file is binary" code to find the funcname
    pattern as a side effect. This is the first step in
    refactoring the binary-checking code.

This patch factors out these diff profiles into "userdiff"
drivers. A file with "diff=foo" uses the "foo" driver, which
is specified by a single struct.

Note that one major difference between the two pieces of
code is that the funcname patterns are always loaded,
whereas external drivers are loaded only for the "git diff"
porcelain; the new code takes care to retain that situation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-18 08:02:21 -07:00