The set of tokens of the language is
{
+,
-,
*,
/,
<,
>,
=,
not,
and,
or,
IDENT,
NUMBER,
[,
],
(,
),
.,
^,
,}
(mind the last token, which is a comma). The token
NUMBER represents
unsigned integers, i.e., non-empty sequences of digits. The token
IDENT
represents function identifiers, variables, and field identifiers. Such tokens
are non-empty sequences of alphanumeric characters and underscore, not starting
by a digit. A function can receive an arbitary number of arguments, possibly 0,
enclosed in parentheses and separated by commas. A variable can be an array
indexable with
[], a struct with fields accessible with
. or a
numeric value. An arbitrary expression, including the value returned by a
function call, an array elemement or a a struct field, is considered a variable
itself, and therefore can be any of the aformentioned things (but not function
names). Thus, semantically nonsensical expressions such as
1[2] or
(not 3).field must be accepted. Function parameters and array indices
can be arbitrary expressions, whereas a struct field is always an
IDENT.
Examples of correct expressions are
- “not not (1>2*pi or not e^3<4/x)”
- “-1=+x and +2=-y[3] and 4=f(x*5,6/z) or s.f=7”
- “f1(1,x) - f2(2*x)”
- “a[1+x()]”
- “a[1][2][3]”
- “s.a[1].b”
- “x^y”
- “f1().a / f2()[0] / g(1)”
- “(func()) + t((5)) * array[(5)] - (6 + 7)”
- “((name)[6]).field + ((namebis)).fieldbis”
and examples of incorrect expressions are
- “1<2=2”,
- “[1]”,
- “a[]”,
- “b^”,
- “a[7]()”,
- “f()()”,
- “f(())”,
The generated AST must correspond to an interpretation of all of the following:
- Unary operators +,-, and not have the highest
precedence.
- All arithmetic operators are left-associative except ^, which is
right-associative, and follow the usual precedence rules.
- Comparison operators are not associative (e.g., “1=2=3” is not
valid) and have less precedence than arithmetic operators.
- Logical operators have the lowest precedence, with and being more
sticky than or. The resulting AST must have at most one token or
(resp. and) for each disjunction (resp. conjunction) of boolean
expressions.
- Function calls must be represented by a subtree with the symbol (
as root, the function identifier as first child, and another special node named
params as second child. The params node has one child for each
parameter.
- Array access must be represented by a subtree with the symbol [ as
root, the array as first child and the index as second child. Note that the
implicit parenthesization of “a[1][2]” is “(a[1])[2]”.
- Field access must be represented by a subtree with the symbol “.” as
root, the struct being accessed as first child and the field as second child.
Note that the implicit parenthesization of “st.fi.z” is
“(st.fi).z”.
Note that expressions such as “
not 1”, “
1 < (not 2=2)”,
“
1 + (2<3)” and “
1.m + (1+2)[_variable1] + 3()” may be
semantically nonsensical, but are syntactically correct and should be
recognized. For example, for input
myfunc(not 1-2 or 3<k).mem
the resulting AST must be
.(((myfunc,params(or(-(not(1),2),<(3,k)))),mem)