The set of tokens of the language is
{
+,
IDENT,
NUMBER,
[,
],
(,
),
.,
^,
,}
(mind the last token, which is a comma). The token
NUMBER represents
unsigned integers, i.e., non-empty sequences of digits. The token
IDENT
represents function identifiers, variables, and field identifiers. Such tokens
are non-empty sequences of alphanumeric characters and underscore, not starting
by a digit. A function receives 0 or more arguments, enclosed in parentheses
and separated by commas. A variable can be a numeric value, an array indexable
with
[], a struct with fields accessible with
., or a pointer
dereferenceable with
^. An expression (such as a number, the value
returned by a function call or a struct field) can be anything a variable can.
Thus, apparently semantically nonsensical expressions such as
0^ or
(1+2)[3] must be accepted. Function arguments and array indices can be
arbitrary expressions, whereas a struct field is always an
IDENT.
Examples of correct expressions are
- “1+2+x”,
- “f1,xy) + g(2+z)”,
- “a[1+f()]”,
- “a1[1][2][3]”,
- “st._m[1]._x”,
- “p^ + a[00]^ + st.me^ + ppp^^^ + k()^.k^[8]^”
- “f().m + f2()[1] + fun(4)^”
- “(f()) + g((2)) + arr[(5)] + (6 + 7)”
- “(p)^ + ((h)[3])^ + (str).field”
and examples of incorrect expressions are
- “1()”,
- “[2]”,
- “x[]”,
- “^x”,
- “y[3]()”,
- “z()()”,
- “f(())”,
The generated AST must correspond to an interpretation of the operator
+
as left-associative, and all of the following:
- Function calls must be represented by a subtree with the symbol (
as root, the function identifier as first child, and another special node named
params as second child. The params node has one child for each
parameter.
- Array access must be represented by a subtree with the symbol [ as
root, the array as first child and the index as second child. Note that the
implicit parenthesization of “a[1][2]” is “(a[1])[2]”.
- Deferences must be represented by a subtree with the symbol “^”
as root and the identifier being dereferenced as only child. Note that the
implicit parenthesization of “p^^” is “(p^)^”.
- Field access must be represented by a subtree with the symbol “.” as
root, the struct being accessed as first child and the field as second child.
Note that the implicit parenthesization of “st.fi.z” is
“(st.fi).z”.
For example, for input
f().m^[2]
the resulting AST must be
[(^(.(((f,params())
,m)),2)