The annotations emitted by chainlint to indicate detected problems are
overly terse, so much so that developers new to the project -- those who
should most benefit from the linting -- may find them baffling. For
instance, although the author of chainlint and seasoned Git developers
may understand that "?!AMP?!" is an abbreviation of "ampersand" and
indicates a break in the &&-chain, this may not be obvious to newcomers.
The "?!LOOP?!" case is particularly serious because that terse single
word does nothing to convey that the loop body should end with
"|| return 1" (or "|| exit 1" in a subshell) to ensure that a failing
command in the body aborts the loop immediately. Moreover, unlike
&&-chaining which is ubiquitous in Git tests, the "|| return 1" idiom is
relatively infrequent, thus may be harder for a newcomer to discover by
consulting nearby code.
Address these shortcomings by emitting human-readable messages which
both explain the problem and give a strong hint about how to correct it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While working on chainlint.pl recently, we introduced some bugs that
showed incorrect line numbers in the output. But it was hard to notice,
since we sanitize the output by removing all of the line numbers! It
would be nice to retain these so we can catch any regressions.
The main reason we sanitize is for maintainability: we concatenate all
of the test snippets into a single file, so it's hard for each ".expect"
file to know at which offset its test input will be found. We can handle
that by storing the per-test line numbers in the ".expect" files, and
then dynamically offsetting them as we build the concatenated test and
expect files together.
The changes to the ".expect" files look like tedious boilerplate, but it
actually makes adding new tests easier. You can now just run:
perl chainlint.pl chainlint/foo.test |
tail -n +2 >chainlint/foo.expect
to save the output of the script minus the comment headers (after
checking that it is correct, of course). Whereas before you had to strip
the line numbers. The conversions here were done mechanically using
something like the script above, and then spot-checked manually.
It would be possible to do all of this in shell via the Makefile, but it
gets a bit complicated (and requires a lot of extra processes). Instead,
I've written a short perl script that generates the concatenated files
(we already depend on perl, since chainlint.pl uses it). Incidentally,
this improves a few other things:
- we incorrectly used $(CHAINLINTTMP_SQ) inside a double-quoted
string. So if your test directory required quoting, like:
make "TEST_OUTPUT_DIRECTORY=/tmp/h'orrible"
we'd fail the chainlint tests.
- the shell in the Makefile didn't handle &&-chaining correctly in its
loops (though in practice the "sed" and "cat" invocations are not
likely to fail).
- likewise, the sed invocation to strip numbers was hiding the exit
code of chainlint.pl itself. In practice this isn't a big deal;
since there are linter violations in the test files, we expect it to
exit non-zero. But we could later use exit codes to distinguish
serious errors from expected ones.
- we now use a constant number of processes, instead of scaling with
the number of test scripts. So it should be a little faster (on my
machine, "make check-chainlint" goes from 133ms to 73ms).
There are some alternatives to this approach, but I think this is still
a good intermediate step:
1. We could invoke chainlint.pl individually on each test file, and
compare it to the expected output (and possibly using "make" to
avoid repeating already-done checks). This is a much bigger change
(and we'd have to figure out what to do with the "# LINT" lines in
the inputs). But in this case we'd still want the "expect" files to
be annotated with line numbers. So most of what's in this patch
would be needed anyway.
2. Likewise, we could run a single chainlint.pl and feed it all of the
scripts (with "--jobs=1" to get deterministic output). But we'd
still need to annotate the scripts as we did here, and we'd still
need to either assemble the "expect" file, or break apart the
script output to compare to each individual ".expect" file.
So we may pursue those in the long run, but this patch gives us more
robust tests without too much extra work or moving in a useless
direction.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition,
it instead dumps out a parsed token representation of the test. Since it
lacks comments, indentations, here-doc bodies, and so forth, this
tokenized representation can be difficult for the test author to digest
and relate back to the original test definition.
However, now that each parsed token carries positional information, the
location of a detected problem can be pinpointed precisely in the
original test definition. Therefore, take advantage of this information
to annotate the test definition itself rather than annotating the parsed
token stream, thus making it easier for a test author to relate a
problem back to the source.
Maintaining the positional meta-information associated with each
detected problem requires a slight change in how the problems are
managed internally. In particular, shell syntax such as:
msg="total: $(cd data; wc -w *.txt) words"
requires the lexical analyzer to recursively invoke the parser in order
to detect problems within the $(...) expression inside the double-quoted
string. In this case, the recursive parse context will detect the broken
&&-chain between the `cd` and `wc` commands, returning the token stream:
cd data ; ?!AMP?! wc -w *.txt
However, the parent parse context will see everything inside the
double-quotes as a single string token:
"total: $(cd data ; ?!AMP?! wc -w *.txt) words"
losing whatever positional information was attached to the ";" token
where the problem was detected.
One way to preserve the positional information of a detected problem in
a recursive parse context within a string would be to attach the
positional information to the annotation textually; for instance:
"total: $(cd data ; ?!AMP:21:22?! wc -w *.txt) words"
and then extract the positional information when annotating the original
test definition.
However, a cleaner and much simpler approach is to maintain the list of
detected problems separately rather than embedding the problems as
annotations directly in the parsed token stream. Not only does this
ensure that positional information within recursive parse contexts is
not lost, but it keeps the token stream free from non-token pollution,
which may simplify implementation of validations added in the future
since they won't have to handle non-token "?!FOO!?" items specially.
Finally, the chainlint self-test "expect" files need a few mechanical
adjustments now that the original test definitions are emitted rather
than the parsed token stream. In particular, the following items missing
from the historic parsed-token output are now preserved verbatim:
* indentation (and whitespace, in general)
* comments
* here-doc bodies
* here-doc tag quoting (i.e. "\EOF")
* line-splices (i.e. "\" at the end of a line)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The purpose of chainlint is to highlight problems it finds in test code
by inserting annotations at the location of each problem. Arbitrarily
eliding bits of the code it is checking is not helpful, yet this is
exactly what chainlint.sed does by cavalierly and unnecessarily dropping
the here-doc operator and tag; i.e. `cat <<TAG` becomes simply `cat` in
the output. This behavior can make it more difficult for the test writer
to align the annotated output of chainlint.sed with the original test
code. Address this by retaining here-doc tags.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed inserts a ">" annotation at the beginning of a line to
signal that its heuristics have identified an end-of-subshell. This was
useful as a debugging aid during development of the script, but it has
no value to test writers and might even confuse them into thinking that
the linter is misbehaving by inserting line-noise into the shell code it
is validating. Moreover, its presence also potentially makes it
difficult to reuse the chainlint self-test "expect" output should a more
capable linter ever be developed. Therefore, drop the ">" annotation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When chainlint.sed detects a broken &&-chain, it places an ?!AMP?!
annotation at the beginning of the line. However, this is an unusual
location for programmers accustomed to error messages (from compilers,
for instance) indicating the exact point of the problem. Therefore,
relocate the ?!AMP?! annotation to the end of the line in order to
better direct the programmer's attention to the source of the problem.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed swallows top-level here-docs to avoid being fooled by
content which might look like start-of-subshell. It likewise swallows
here-docs in subshells to avoid marking content lines as breaking the
&&-chain, and to avoid being fooled by content which might look like
end-of-subshell, start-of-nested-subshell, or other specially-recognized
constructs.
At the time of implementation, it was believed that it was not possible
to support arbitrary here-doc tag names since 'sed' provides no way to
stash the opening tag name in a variable for later comparison against a
line signaling end-of-here-doc. Consequently, tag names are hard-coded,
with "EOF" being the only tag recognized at the top-level, and only
"EOF", "EOT", and "INPUT_END" being recognized within subshells. Also,
special care was taken to avoid being confused by here-docs nested
within other here-docs.
In practice, this limited number of hard-coded tag names has been "good
enough" for the 13000+ existing Git test, despite many of those tests
using tags other than the recognized ones, since the bodies of those
here-docs do not contain content which would fool the linter.
Nevertheless, the situation is not ideal since someone writing new
tests, and choosing a name not in the "blessed" set could potentially
trigger a false-positive.
To address this shortcoming, upgrade chainlint.sed to handle arbitrary
here-doc tag names, both at the top-level and within subshells.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --chain-lint option uses heuristics and knowledge of shell syntax to
detect broken &&-chains in subshells by pure textual inspection. The
heuristics handle a range of stylistic variations in existing tests
(evolved over the years), however, they are still best-guesses. As such,
it is possible for future changes to accidentally break assumptions upon
which the heuristics are based. Protect against this possibility by
adding tests which check the linter itself for correctness.
In addition to protecting against regressions, these tests help document
(for humans) expected behavior, which is important since the linter's
implementation language ('sed') does not necessarily lend itself to easy
comprehension.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>