Build procedure update plus introduction of Meson based builds.
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
"git tag" has been taught to refuse to create refs/tags/HEAD
as such a tag will be confusing in the context of UI provided by
the Git Porcelain commands.
* jc/forbid-head-as-tagname:
tag: "git tag" refuses to use HEAD as a tagname
t5604: do not expect that HEAD can be a valid tagname
refs: drop strbuf_ prefix from helpers
refs: move ref name helpers around
In order to generate "gitweb.cgi" we have to replace various different
placeholders. This is done ad-hoc and is thus not easily reusable across
different build systems.
Introduce a new GITWEB-BUILD-OPTIONS.in template that we populate at
configuration time with the expected options. This script is then used
as input for a new "generate-gitweb.sh" script that generates the final
"gitweb.cgi" file. While this requires us to repeat the options multiple
times, it is in line to how we generate other build options like our
GIT-BUILD-OPTIONS file.
While at it, refactor how we replace the GITWEB_PROJECT_MAXDEPTH. Even
though this variable is supposed to be an integer, the source file has
the value quoted. The quotes are eventually stripped via sed(1), which
replaces `"@GITWEB_PROJECT_MAXDEPTH@"` with the actual value, which is
rather nonsensical. This is made clearer by just dropping the quotes in
the source file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of placeholders in our scripts that we replace at build
time, for example by using sed(1). These placeholders come in three
different formats: @PLACEHOLDER@, @@PLACEHOLDER@@ and ++PLACEHOLDER++.
Next to being inconsistent it also creates a bit of a problem with
CMake, which only supports the first syntax in its `configure_file()`
function. To work around that we instead manually replace placeholders
via string operations, which is a hassle and removes safeguards that
CMake has to verify that we didn't forget to replace any placeholders.
Besides that, other build systems like Meson also support the CMake
syntax.
Unify our codebase to consistently use the syntax supported by such
build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper functions (strbuf_branchname, strbuf_check_branch_ref,
and strbuf_check_tag_ref) are about handling branch and tag names,
and it is a non-essential fact that these functions use strbuf to
hold these names. Rename them to make it clarify that these are
more about "ref".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Perl 5.14, released in May 2011, the r modifier was added to the s///
operator to allow it to return the modified string instead of modifying
the string in place. This allows to write nicer, more succinct code in
several cases, so let's do that here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Our platform support policy states that we require "versions of
dependencies which are generally accepted as stable and supportable,
e.g., in line with the version used by other long-term-support
distributions". Of Debian, Ubuntu, RHEL, and SLES, the four most common
distributions that provide LTS versions, the version with mainstream
long-term security support with the oldest Perl is 5.26.0 in SLES 15.6.
This is a major upgrade, since Perl 5.8.1, according to the Perl
documentation, was released in September of 2003. It brings a lot of
new features that we can choose to use, such as s///r to return the
modified string, the postderef functionality, and subroutine signatures,
although the latter was still considered experimental until 5.36.
This change was made with the following one-liner, which intentionally
excludes modifying the vendored modules we include to avoid conflicts:
git grep -l 'use 5.008001' | grep -v 'LoadCPAN/' | xargs perl -pi -e 's/use 5.008001/require v5.26/'
Use require instead of use to avoid changing the behavior as the latter
enables features and the former does not.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
GitWeb update to use committer date consistently in rss/atom feeds.
* am/gitweb-feed-use-committer-date:
gitweb: rss/atom change published/updated date to committer date
The author date is used for published/updated date in the rss/atom
feed stream. Change it to the committer date that reflects the
"published/updated" definition better and makes rss/atom feeds more
linear. Gitlab/Github rss/atom feeds use the committer date.
Additionally, to be consistent, also use the committer date to
determine the date of the last commit to send in the feed
instead of the author date.
Signed-off-by: Jesús Ariel Cabello Mateos <080ariel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When given an existing but unreadable file as a configuration file,
gitweb behaved as if the file did not exist at all, but now it
errors out. This is a change that may break backward compatibility.
* mj/gitweb-unreadable-config-error:
gitweb: die when a configuration file cannot be read
Fix a possibility of a permission to access error go unnoticed.
Perl uses two different variables to manage errors from a "do
$filename" construct. One is $@, which is set in this case when do
is unable to compile the file. The other is $!, which is set in case
do cannot read the file. The current code only checks "$@", which
means a configuration file passed to GitWeb that is not readable by
the server process does not cause it to "die".
Make sure we also check and act on "$!" to fix this.
Signed-off-by: Marcelo Roberto Jimenez <marcelo.jimenez@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.
* js/update-urls-in-doc-and-comment:
doc: refer to internet archive
doc: update links for andre-simon.de
doc: switch links to https
doc: update links to current pages
These pages are no longer reachable from their original locations,
which makes things difficult for readers. Instead, switch to linking to
the Internet Archive for the content.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Beyond the fact that it's somewhat traditional to respect sites'
self-identification, it's helpful for links to point to the things
that people expect them to reference. Here that means linking to
specific pages instead of a domain.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These sites offer https versions of their content.
Using the https versions provides some protection for users.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following commit will make use of a Getopt::Long feature which is
only present in Perl >= 5.8.1. Document that as the minimum version we
support.
Many of our Perl scripts will continue to run with 5.8.0 but this change
allows us to adjust them as needed without breaking any promises to our
users.
The Perl requirement was last changed in d48b284183 (perl: bump the
required Perl version to 5.8 from 5.6.[21], 2010-09-24). At that time,
5.8.0 was 8 years old. It is now over 21 years old.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Those heuristics are way outdated and too specific to the kernel project
to be useful outside of kernel.org. Since kernel.org doesn't use gitweb
anymore and at least one project complained about incorrect behavior,
entirely remove them.
Signed-off-by: Julien Rouhaud <julien.rouhaud@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to the HTML Standard FAQ:
“What is the DOCTYPE for modern HTML documents?
In text/html documents:
<!DOCTYPE html>
In documents delivered with an XML media type: no DOCTYPE is required
and its use is generally unnecessary. However, you may use one if you
want (see the following question). Note that the above is well-formed
XML.”
Source: [1]
Gitweb uses an XHTML 1.0 DOCTYPE:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
While that DOCTYPE is still valid [2], it has several disadvantages:
1. It’s misleading. If an XML parser uses the DTD at the given link,
then the entities and ⋅ won’t get declared. Instead, the
parser has to use a DTD from the HTML Standard that has nothing to do
with XHTML 1.0 [2].
2. It’s obsolete. XHTML 1.0 was last revised in 2002 and was superseded in
2018 [3].
3. It’s unreliable. Gitweb uses and ⋅ but lets an external file
define them. “[…U]using entity references for characters in XML documents
is unsafe if they are defined in an external file (except for <, >,
&, ", and ').” [4]
[1]: <https://github.com/whatwg/html/blob/main/FAQ.md#what-is-the-doctype-for-modern-html-documents>
[2]: <https://html.spec.whatwg.org/multipage/xhtml.html#parsing-xhtml-documents>
[3]: <https://www.w3.org/TR/xhtml1/#xhtml>
[4]: <https://html.spec.whatwg.org/multipage/xhtml.html#writing-xhtml-documents>
Signed-off-by: Jason Yundt <jason@jasonyundt.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before this change, gitweb would generate pages which included:
<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8"/>
When a meta's http-equiv equals "content-type", the http-equiv is said
to be in the "Encoding declaration state". According to the HTML
Standard,
The Encoding declaration state may be used in HTML documents,
but elements with an http-equiv attribute in that state must not
be used in XML documents.
Source: <https://html.spec.whatwg.org/multipage/semantics.html#attr-meta-http-equiv-content-type>
This change removes that meta element since gitweb always generates XML
documents.
Signed-off-by: Jason Yundt <jason@jasonyundt.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tie-break branches that point at the same object in the list of
branches on GitWeb to show the one pointed at by HEAD early.
* gh/gitweb-branch-sort:
gitweb: use HEAD as secondary sort key in git_get_heads_list()
The "heads" section on the gitweb summary page shows heads in
`-committerdate` order (ie. the most recently-modified ones at the
top), tie-breaking equal-dated refs using the implicit `refname` sort
fallback. This recency-based ordering appears in multiple places in the
UI, such as the project listing, the tags list, and even the
shortlog and log views.
Given two equal-dated refs, however, sorting the `HEAD` ref before
the non-`HEAD` ref provides more useful signal than merely sorting by
refname. For example, say we had "master" and "trunk" both pointing at
the same commit but "trunk" was `HEAD`, sorting "trunk" first helps
communicate its special status as the default branch that you'll check
out if you clone the repo.
Add `-HEAD` as a secondary sort key to the `git for-each-ref` call
in `git_get_heads_list()` to provide the desired behavior. The most
recently committed refs will appear first, but `HEAD`-ness will be used
as a tie-breaker. Note that `refname` is the implicit fallback sort key,
which means that two same-dated non-`HEAD` refs will continue to be
sorted in lexicographical order, as they are today.
Signed-off-by: Greg Hurrell <greg@hurrell.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gitweb extracts content from the Git log and makes it accessible
over HTTP. As a result, e-mail addresses found in commits are
exposed to web crawlers and they may not respect robots.txt.
This can result in unsolicited messages.
Introduce an 'email-privacy' feature which redacts e-mail addresses
from the generated HTML content. Specifically, obscure addresses
retrieved from the the author/committer and comment sections of the
Git log. The feature is off by default.
This feature does not prevent someone from downloading the
unredacted commit log, e.g., by cloning the repository, and
extracting information from it. It aims to hinder the low-
effort, bulk collection of e-mail addresses by web crawlers.
Signed-off-by: Georgios Kontaxis <geko1702+commits@99rst.org>
Acked-by: Eric Wong <e@80x24.org>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit trailers like "Thanks-to:", "Fixes:", and "Closes:" are fairly
common, but gitweb didn't highlight them like other trailers.
Signed-off-by: Emma Brooks <me@pluvano.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
FCGI streams are implemented using the older stream API: TIEHANDLE,
therefore applying PerlIO layers using binmode() has no effect to them.
The solution in this patch is to redefine the FCGI::Stream::PRINT function
to use UTF-8 as output encoding, except within git_blob_plain() and git_snapshot()
which must still output in raw binary mode.
This problem and solution were previously reported back in 2012:
- http://git.661346.n2.nabble.com/Gitweb-running-as-FCGI-does-not-print-its-output-in-UTF-8-td7573415.html
- http://stackoverflow.com/questions/5005104
Signed-off-by: Julien Moutinho <julm+git@sourcephile.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths in "gitweb" that forgot to escape URLs generated
based on end-user input have been corrected.
* jk/gitweb-anti-xss:
gitweb: escape URLs generated by href()
t/gitweb-lib.sh: set $REQUEST_URI
t/gitweb-lib.sh: drop confusing quotes
t9502: pass along all arguments in xss helper
There's a cross-site scripting problem in gitweb, where it will print
URLs generated by its href() helper without further quoting. This allows
an attacker to point a victim to a specially crafted gitweb URL and
inject arbitrary HTML into the resulting page (which the victim sees as
coming from gitweb).
The base of the URL comes from evaluate_uri(), which pulls the value of
$REQUEST_URI via the CGI module. It tries to strip off $PATH_INFO, but
fails to do so in some cases (including ones that contain special
characters, like "+"). Most of the uses of the URL end up being passed
to "$cgi->a(-href = href())", which will get quoted properly by the CGI
module. But in a few places, we output them ourselves as part of
manually-generated HTML, and whatever was in the original URL will
appear unquoted in the output.
Given that all of the nearby variables placed into this manual HTML
_are_ quoted, it seems like the authors assumed that these URLs would
not need quoting. So it's possible that the bug is actually in
evaluate_uri(), which should be doing a more careful job of stripping
$PATH_INFO. There's some discussion in a comment in that function, as
well as the commit message in 81d3fe9f48 (gitweb: fix wrong base URL
when non-root DirectoryIndex, 2009-02-15). But I'm not sure I understand
it.
Regardless, it's a good idea to quote these values at the point of
insertion into the HTML output:
1. Even if there is a bug in evaluate_uri(), this would give us
belt-and-suspenders protection.
2. evaluate_uri() is only handling the base. Some generated URLs will
also mention arbitrary refs or filenames in the repositories, and
these should be quoted anyway.
3. It should never _hurt_ to quote (and that's what all of the
$cgi->a() calls are doing already).
So there may be further work here, but this patch at least prevents the
XSS vulnerability, and shouldn't make anything worse.
The test here covers the calls in print_feed_meta(), but I manually
audited every call to href() to see how its output was used, and quoted
appropriately. Most of them are esc_attr(), as they're used in tag
attributes, but I used esc_html() when the URLs were printed bare. The
distinction is largely academic, as one is implemented as a wrapper for
the other.
Reported-by: NAKAYAMA DAISUKE <nakyamad@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gitweb has several hard-coded 40 values throughout it to check for
values that are passed in or acquired from Git. To simplify the code,
introduce a regex variable that matches either exactly 40 or exactly 64
hex characters, and use this variable anywhere we would have previously
hard-coded a 40 in a regex.
Add some helper functions which allow us to write tighter regexes that
match exactly the number of hex characters we're expecting.
Similarly, switch the code that looks for deleted diffinfo information
to look for either 40 or 64 zeros, and update one piece of code to use
this function. Finally, when formatting a log line, allow an
abbreviated describe output to contain up to 64 characters.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from
5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to
conditionally require Digest::MD5 anymore. It was released with perl
v5.7.3[1]
The initial introduction of the dependency in
e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much,
this also undoes part of the later 2e9c8789b7 ("gitweb: Mention
optional Perl modules in INSTALL", 2011-02-04) since gitweb will
always be run on at least 5.8, so there's no need to mention
Digest::MD5 as a required module in the documentation, let's instead
say that we require perl 5.8.
1. $ corelist Digest::MD5
Data for 2015-02-14
Digest::MD5 was first released with perl v5.7.3
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"gitweb" checks if a directory is searchable with Perl's "-x"
operator, which can be enhanced by using "filetest 'access'"
pragma, which now we do.
* gc/gitweb-filetest-acl:
gitweb: use filetest to allow ACLs
In commit 46a1385 (gitweb: skip unreadable subdirectories, 2017-07-18)
we forgot to handle non-unix ACLs as well. Fix this.
Signed-off-by: Guillaume Castagnino <casta@xwing.info>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"gitweb" shows a link to visit the 'raw' contents of blbos in the
history overview page.
* js/gitweb-raw-blob-link-in-history:
gitweb: add 'raw' blob_plain link in history overview
When a directory is not readable, "gitweb" fails to build the
project list. Work this around by skipping such a directory.
It might end up hiding a problem under the rug and a better
solution might be to loudly complain to the administrator pointing
out the problematic directory, but this will at least make it
"work".
* hb/gitweb-project-list:
gitweb: skip unreadable subdirectories
For people that work with very large plain text files it may be easier
if one can bypass viewing the htmlized blob and instead click directly
to the raw file (rather then click through 'blob' and then to 'raw').
Signed-off-by: Job Snijders <job@instituut.net>
Reviewed-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a directory is not readable, "gitweb" fails to build the
project list. Work this around by skipping such a directory.
* hb/gitweb-project-list:
gitweb: skip unreadable subdirectories
gitweb terminates and shows no project list, if it can not access a
sub-directory in the project root directory while looking for projects
to show.
Work it around by skipping unreadable directories.
Signed-off-by: Hielke Christian Braun <hcb@unco.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to purely abbreviated commit object names, "gitweb"
learned to turn "git describe" output (e.g. v2.9.3-599-g2376d31787)
into clickable links in its output.
* ab/gitweb-abbrev-links:
gitweb: link to "git describe"'d commits in log messages
gitweb: link to 7-char+ SHA-1s, not only 8-char+
gitweb: fix a typo in a comment
Change the log formatting function to know about "git describe" output
such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
There are still many valid refnames that we don't link to
e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
similarly it's trivially possible to create some refnames like
"æ/var-gf6727b0" or which won't be picked up by this regex.
There's surely room for improvement here, but I just wanted to address
the very common case of sticking "git describe" output into commit
messages without trying to link to all possible refnames, that's going
to be a rather futile exercise given that this is free text, and it
would be prohibitively expensive to look up whether the references in
question exist in our repository.
There was on-list discussion about how we could do better than this
patch. Junio suggested to update parse_commits() to call a new
"gitweb--helper" command which would pass each of the revision
candidates through "rev-parse --verify --quiet". That would cut down
on our false positives (e.g. we'll link to "deadbeef"), and also allow
us to be more aggressive in selecting candidate revisions.
That may be too expensive to work in practice, or it may
not. Investigating that would be a good follow-up to this patch.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the minimum length of an abbreviated object identifier in the
commit message gitweb tries to turn into link from 8 hexchars to 7.
This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
SHA-1 in commit log message links to "object" view", 2006-12-10), but
the default abbreviation length is 7, and has been for a long time.
It's still possible to reference SHA-1s down to 4 characters in length,
see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
git actually produce that, so I doubt anyone is putting that into log
messages in practice, but people definitely do put 7 character SHA-1s
into log messages.
I think it's fairly dubious to link to things matching [0-9a-fA-F]
here as opposed to just [0-9a-f], that dates back to the initial
version of gitweb from 161332a ("first working version",
2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce
them as far as I can tell.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a typo'd MIME type in a comment. The Content-Type is
application/xhtml+xml, not application/xhtm+xml.
Fixes up code originally added in 53c4031 ("gitweb: Strip
non-printable characters from syntax highlighter output", 2011-09-16).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages. Make use of this autodetection for files
which syntax is not known by gitweb. In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).
Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, it
is used for escaping HTML. In the case that highlight is used, gitweb
calls sanitize() instead of esc_html(), and the latter is significantly
slower (it does more, being roughly a superset of sanitize()). Simple
benchmark comparing performance of 'blob' view of files without syntax
highlighting in gitweb before and after this change indicates ±1%
difference in request time for all file types. Benchmark was performed
on local instance on Debian, using Apache/2.4.23 web server and CGI.
Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.
Signed-off-by: Ian Kelling <ian@iankelling.org>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a case where an html link can be generated from unescaped input
resulting in invalid strict xhtml or potentially injected code.
An overview of a repo with a tag "1.0.0&0.0.1" would previously result
in an unescaped ampersand in the link body.
Signed-off-by: Andreas Brauchli <a.brauchli@elementarea.net>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some multi-byte encoding can have a backslash byte as a later part
of one letter, which would confuse "highlight" filter used in
gitweb.
* sk/gitweb-highlight-encoding:
gitweb: apply fallback encoding before highlight
Some multi-byte character encodings (such as Shift_JIS and GBK) have
characters whose final bytes is an ASCII '\' (0x5c), and they
will be displayed as funny-characters even if $fallback_encoding is
correct. This is because `highlight` command always expects UTF-8
encoded strings from STDIN.
$ echo 'my $v = "申";' | highlight --syntax perl | w3m -T text/html -dump
my $v = "申";
$ echo 'my $v = "申";' | iconv -f UTF-8 -t Shift_JIS | highlight \
--syntax perl | iconv -f Shift_JIS -t UTF-8 | w3m -T text/html -dump
iconv: (stdin):9:135: cannot convert
my $v = "
This patch prepare git blob objects to be encoded into UTF-8 before
highlighting in the manner of `to_utf8` subroutine.
Signed-off-by: Shin Kojima <shin@kojima.org>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>