CI has been taught to catch some Unicode directional formatting sequences that can be used in certain mischief.

* js/ci-no-directional-formatting:
  ci: disallow directional formatting

maint
gitster@pobox.com
2 changed files with 28 additions and 0 deletions
@@ -0,0 +1,27 @@
#!/bin/bash

# This script verifies that the non-binary files tracked in the Git index do
# not contain any Unicode directional formatting: such formatting could be used
# to deceive reviewers into interpreting code differently from the compiler.
# This is intended to run on an Ubuntu agent in a GitHub workflow.
#
# To allow translated messages to introduce such directional formatting in the
# future, we exclude the `.po` files from this validation.
#
# Neither GNU grep nor `git grep` (not even with `-P`) handle `\u` as a way to
# specify UTF-8.
#
# To work around that, we use `printf` to produce the pattern as a byte
# sequence, and then feed that to `git grep` as a byte sequence (setting
# `LC_CTYPE` to make sure that the arguments are interpreted as intended).
#
# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as
# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`,
# for example, would use a `printf` that does not understand that syntax.

# U+202a..U+202e: LRE, RLE, PDF, LRO and RLO
# U+2066..U+2069: LRI, RLI, FSI and PDI
regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'

! LC_CTYPE=C git grep -El "$(LC_CTYPE=C.UTF-8 printf "$regex")" \
	-- ':(exclude,attr:binary)' ':(exclude)*.po'
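
The byte-wise matching and the inverted exit status used by the script can be tried outside a repository. The following is only a rough sketch under several assumptions: it swaps `git grep` for plain `grep` so that no Git checkout is needed, it spells U+202E (RLO) as the octal UTF-8 bytes `\342\200\256` instead of relying on Bash's `\u` escapes and a UTF-8 locale, and the file names, sample contents and the `check_files` helper are all invented for illustration and are not part of the commit.

```shell
#!/bin/bash
# Illustrative sketch only: file names and sample content are invented.

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# U+202E (RLO) is E2 80 AE in UTF-8; octal escapes avoid relying on `\u`.
rlo=$(printf '\342\200\256')

printf 'plain ASCII line\n' >"$tmpdir/clean.txt"
printf 'role = "user%s"\n' "$rlo" >"$tmpdir/tainted.txt"

# check_files is a hypothetical helper: it succeeds only when no file
# contains the byte sequence, mirroring the leading `!` in the CI script.
# LC_ALL=C makes grep compare raw bytes; -l prints the names of matching files.
check_files () {
	! LC_ALL=C grep -l -e "$rlo" -- "$@"
}

if check_files "$tmpdir/clean.txt"
then
	clean_result=pass
else
	clean_result=fail
fi

# The command substitution captures the offending file names, and the
# assignment carries the (inverted) exit status of check_files.
if offenders=$(check_files "$tmpdir/clean.txt" "$tmpdir/tainted.txt")
then
	mixed_result=pass
else
	mixed_result=fail
fi

echo "clean only: $clean_result"
echo "with tainted file: $mixed_result (${offenders##*/})"
```

Run as written, the clean file alone passes, while adding the tainted file makes the check fail and names `tainted.txt` as the offender. The same inversion is what would let a CI step fail as soon as any tracked file matches.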