/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* (C) Copyright David Gibson <dwg@au1.ibm.com>, IBM Corporation. 2005.
*/
%option noyywrap nounput noinput never-interactive
%x BYTESTRING
%x PROPNODENAME
%s V1
PROPNODECHAR [a-zA-Z0-9,._+*#?@-]
PATHCHAR ({PROPNODECHAR}|[/])
LABEL [a-zA-Z_][a-zA-Z0-9_]*
STRING \"([^\\"]|\\.)*\"
CHAR_LITERAL '([^']|\\')*'
WS [[:space:]]
COMMENT "/*"([^*]|\*+[^*/])*\*+"/"
LINECOMMENT "//".*\n
%{
#include "dtc.h"
#include "srcpos.h"
#include "dtc-parser.tab.h"
extern bool treesource_error;
/* CAUTION: this will stop working if we ever use yyless() or yyunput() */
#define YY_USER_ACTION \
{ \
srcpos_update(&yylloc, yytext, yyleng); \
}
/*#define LEXDEBUG 1*/
#ifdef LEXDEBUG
#define DPRINT(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
#else
#define DPRINT(fmt, ...) do { } while (0)
#endif
static int dts_version = 1;
#define BEGIN_DEFAULT() DPRINT("<V1>\n"); \
BEGIN(V1);

static void push_input_file(const char *filename);
static bool pop_input_file(void);
static void PRINTF(1, 2) lexical_error(const char *fmt, ...);
%}
%%
<*>"/include/"{WS}*{STRING} {
char *name = strchr(yytext, '\"') + 1;
yytext[yyleng-1] = '\0';
push_input_file(name);
}
<*>^"#"(line)?[ \t]+[0-9]+[ \t]+{STRING}([ \t]+[0-9]+)* {
char *line, *fnstart, *fnend;
struct data fn;
/* skip text before line # */
line = yytext;
while (!isdigit((unsigned char)*line))
line++;
/* regexp ensures that first and last "
* in the whole yytext are those at
* beginning and end of the filename string */
fnstart = memchr(yytext, '"', yyleng);
for (fnend = yytext + yyleng - 1;
*fnend != '"'; fnend--)
;
assert(fnstart && fnend && (fnend > fnstart));
fn = data_copy_escape_string(fnstart + 1,
fnend - fnstart - 1);
/* Don't allow nuls in filenames */
if (memchr(fn.val, '\0', fn.len - 1))
lexical_error("nul in line number directive");
/* -1 since #line is the number of the next line */
srcpos_set_line(xstrdup(fn.val), atoi(line) - 1);
data_free(fn);
}
<*><<EOF>> {
if (!pop_input_file()) {
yyterminate();
}
}
<*>{STRING} {
DPRINT("String: %s\n", yytext);
yylval.data = data_copy_escape_string(yytext+1,
yyleng-2);
return DT_STRING;
}
<*>"/dts-v1/" {
DPRINT("Keyword: /dts-v1/\n");
dts_version = 1;
BEGIN_DEFAULT();
return DT_V1;
}
<*>"/plugin/" {
DPRINT("Keyword: /plugin/\n");
return DT_PLUGIN;
}
<*>"/memreserve/" {
DPRINT("Keyword: /memreserve/\n");
BEGIN_DEFAULT();
return DT_MEMRESERVE;
}
<*>"/bits/" {
DPRINT("Keyword: /bits/\n");
BEGIN_DEFAULT();
return DT_BITS;
}
<*>"/delete-property/" {
DPRINT("Keyword: /delete-property/\n");
DPRINT("<PROPNODENAME>\n");
BEGIN(PROPNODENAME);
return DT_DEL_PROP;
}
<*>"/delete-node/" {
DPRINT("Keyword: /delete-node/\n");
DPRINT("<PROPNODENAME>\n");
BEGIN(PROPNODENAME);
return DT_DEL_NODE;
}
<*>"/omit-if-no-ref/" {
DPRINT("Keyword: /omit-if-no-ref/\n");
DPRINT("<PROPNODENAME>\n");
BEGIN(PROPNODENAME);
return DT_OMIT_NO_REF;
}
<*>{LABEL}: {
DPRINT("Label: %s\n", yytext);
yylval.labelref = xstrdup(yytext);
yylval.labelref[yyleng-1] = '\0';
return DT_LABEL;
}
<V1>([0-9]+|0[xX][0-9a-fA-F]+)(U|L|UL|LL|ULL)? {
char *e;
DPRINT("Integer Literal: '%s'\n", yytext);
errno = 0;
yylval.integer = strtoull(yytext, &e, 0);
if (*e && e[strspn(e, "UL")]) {
lexical_error("Bad integer literal '%s'",
yytext);
}
if (errno == ERANGE)
lexical_error("Integer literal '%s' out of range",
yytext);
else
/* ERANGE is the only strtoull error triggerable
* by strings matching the pattern */
assert(errno == 0);
return DT_LITERAL;
}
<*>{CHAR_LITERAL} {
struct data d;
DPRINT("Character literal: %s\n", yytext);
d = data_copy_escape_string(yytext+1, yyleng-2);
if (d.len == 1) {
lexical_error("Empty character literal");
yylval.integer = 0;
} else {
yylval.integer = (unsigned char)d.val[0];
if (d.len > 2)
lexical_error("Character literal has %d"
" characters instead of 1",
d.len - 1);
}
data_free(d);
return DT_CHAR_LITERAL;
}
<*>\&{LABEL} { /* label reference */
DPRINT("Ref: %s\n", yytext+1);
yylval.labelref = xstrdup(yytext+1);
return DT_LABEL_REF;
}
<*>"&{/"{PATHCHAR}*\} { /* new-style path reference */
yytext[yyleng-1] = '\0';
DPRINT("Ref: %s\n", yytext+2);
yylval.labelref = xstrdup(yytext+2);
return DT_PATH_REF;
}
<BYTESTRING>[0-9a-fA-F]{2} {
yylval.byte = strtol(yytext, NULL, 16);
DPRINT("Byte: %02x\n", (int)yylval.byte);
return DT_BYTE;
}
<BYTESTRING>"]" {
DPRINT("/BYTESTRING\n");
BEGIN_DEFAULT();
return ']';
}
<PROPNODENAME>\\?{PROPNODECHAR}+ {
DPRINT("PropNodeName: %s\n", yytext);
yylval.propnodename = xstrdup((yytext[0] == '\\') ?
yytext + 1 : yytext);
BEGIN_DEFAULT();
return DT_PROPNODENAME;
}
"/incbin/" {
DPRINT("Binary Include\n");
return DT_INCBIN;
}
<*>{WS}+ /* eat whitespace */
<*>{COMMENT}+ /* eat C-style comments */
<*>{LINECOMMENT}+ /* eat C++-style comments */
<*>"<<" { return DT_LSHIFT; };
<*>">>" { return DT_RSHIFT; };
<*>"<=" { return DT_LE; };
<*>">=" { return DT_GE; };
<*>"==" { return DT_EQ; };
<*>"!=" { return DT_NE; };
<*>"&&" { return DT_AND; };
<*>"||" { return DT_OR; };
<*>. {
DPRINT("Char: %c (\\x%02x)\n", yytext[0],
(unsigned)yytext[0]);
if (yytext[0] == '[') {
DPRINT("<BYTESTRING>\n");
BEGIN(BYTESTRING);
}
if ((yytext[0] == '{')
|| (yytext[0] == ';')) {
DPRINT("<PROPNODENAME>\n");
BEGIN(PROPNODENAME);
}
return yytext[0];
}
%%
static void push_input_file(const char *filename)
{
assert(filename);
srcfile_push(filename);
yyin = current_srcfile->f;
yypush_buffer_state(yy_create_buffer(yyin, YY_BUF_SIZE));
}
static bool pop_input_file(void)
{
if (srcfile_pop() == 0)
return false;
yypop_buffer_state();
yyin = current_srcfile->f;
return true;
}
static void lexical_error(const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
srcpos_verror(&yylloc, "Lexical error", fmt, ap);
va_end(ap);
treesource_error = true;
}
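The linemarker rule accepts preprocessor output such as # 12 "board.dtsi" 1 3. It recovers the filename from the first and last double quote in the match. The C sketch below repeats that extraction outside the scanner. It assumes the input already matched the rule. parse_linemarker() is a hypothetical helper, not a dtc function. It copies the filename bytes verbatim, whereas dtc decodes escapes with data_copy_escape_string(). That function also counts a trailing NUL in .len, which is why the NUL check scans only fn.len - 1 bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of dtc): split a matched linemarker,
 * e.g.  # 12 "board.dtsi" 1 3  into its filename and line number.
 * Escape sequences are NOT decoded; filename bytes are copied as-is.
 * Assumes text matched the scanner's linemarker pattern.
 * Returns 0 on success, -1 if the filename does not fit in fname.
 */
static int parse_linemarker(const char *text, size_t len,
			    char *fname, size_t fname_size, int *lineno)
{
	const char *line = text;
	const char *fnstart, *fnend;
	size_t fnlen;

	/* Skip '#' and the optional "line" keyword up to the first digit. */
	while (!isdigit((unsigned char)*line))
		line++;

	/* The pattern guarantees that the first and last '"' in the match
	 * delimit the filename, so search from both ends. */
	fnstart = memchr(text, '"', len);
	for (fnend = text + len - 1; *fnend != '"'; fnend--)
		;
	assert(fnstart && fnend > fnstart);

	fnlen = (size_t)(fnend - fnstart - 1);
	if (fnlen >= fname_size)
		return -1;
	memcpy(fname, fnstart + 1, fnlen);
	fname[fnlen] = '\0';

	/* Same adjustment as dtc: the marker gives the number of the next
	 * line, and the newline ending the directive is counted later. */
	*lineno = atoi(line) - 1;
	return 0;
}
```

For example, # 12 "board.dtsi" 1 3 yields the filename board.dtsi and a stored line of 11.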
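The <V1> integer-literal rule lets strtoull() choose the base. Base 0 means decimal, a 0x prefix for hex, or a leading 0 for octal. Any leftover characters are accepted only if they are all U or L suffix letters. As a result, a literal such as 09 matches the pattern but is still reported as bad: base 0 reads the leading 0 as octal and stops at the 9. The sketch below reproduces those checks in isolation. check_int_literal() and the lit_status codes are hypothetical names. Unlike the scanner, the sketch returns a status instead of calling lexical_error().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for this sketch only. */
enum lit_status { LIT_OK, LIT_BAD, LIT_RANGE };

/*
 * Mirror of the checks in the <V1> integer-literal action: parse with
 * base 0, reject leftovers that are not purely 'U'/'L' suffix letters,
 * and report ERANGE overflow. text is assumed to match the pattern
 * ([0-9]+|0[xX][0-9a-fA-F]+)(U|L|UL|LL|ULL)?
 */
static enum lit_status check_int_literal(const char *text,
					 unsigned long long *out)
{
	char *e;

	assert(text && out);
	errno = 0;
	*out = strtoull(text, &e, 0);

	/* Leftovers are fine only if every remaining char is 'U' or 'L'. */
	if (*e && e[strspn(e, "UL")])
		return LIT_BAD;
	if (errno == ERANGE)
		return LIT_RANGE;
	return LIT_OK;
}
```

With these checks, 0x1fUL yields 31, 010 yields 8 (octal), and 09 is rejected.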
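A '[' switches the scanner into the BYTESTRING start condition. There, each DT_BYTE token is exactly two hex digits, so [deadbeef] and [de ad be ef] produce the same four bytes. The sketch below decodes such a body with the same two-digits-per-byte rule. decode_bytestring() is a hypothetical helper. It rejects anything other than whitespace and hex pairs. The real scanner instead passes other characters on as separate tokens, and the parser decides whether to accept them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of dtc): decode the text between '['
 * and ']' into out[]. Like the <BYTESTRING> rule, each byte is exactly
 * two hex digits, and whitespace between bytes is skipped.
 * Returns the number of bytes, or -1 on a stray character, an odd
 * trailing digit, or when out[] is full.
 */
static int decode_bytestring(const char *s, unsigned char *out, size_t cap)
{
	size_t n = 0;

	assert(s && out);
	while (*s) {
		if (isspace((unsigned char)*s)) {
			s++;
			continue;
		}
		/* s[1] is only read when s[0] is a hex digit, hence not NUL. */
		if (!isxdigit((unsigned char)s[0]) ||
		    !isxdigit((unsigned char)s[1]) || n == cap)
			return -1;

		char pair[3] = { s[0], s[1], '\0' };
		out[n++] = (unsigned char)strtol(pair, NULL, 16);
		s += 2;
	}
	return (int)n;
}
```

For example, both "deadbeef" and " de ad be ef " decode to the four bytes 0xde 0xad 0xbe 0xef.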