The set of tokens is
{ IDENT, expr, decl, if, else, switch, while, do, for, goto, continue, break,
return, case, default, :, ;, {, }, (, ) }.
The token IDENT refers to an identifier (such as a label name), that is, a
non-empty sequence of alphanumeric characters and underscores that does not
start with a digit. We also have the tokens expr and decl, which denote
expressions and declarations respectively. Thus, instead of parsing those, we
use these ‘tokens’. Note that the expressions and declarations represented by
them are not empty.
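As an illustration of the token set and the IDENT rule, here is a minimal tokenizer sketch in Python. It assumes that the input is plain text in which expr and decl appear literally, and that keywords are case-sensitive; neither is stated above. The names tokenize, KEYWORDS, ABSTRACT and PUNCT are hypothetical.

```python
import re

KEYWORDS = {"if", "else", "switch", "while", "do", "for", "goto",
            "continue", "break", "return", "case", "default"}
ABSTRACT = {"expr", "decl"}   # placeholder tokens, never parsed further
PUNCT = set(":;{}()")


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise ValueError on bad input.

    ASSUMPTION: 'expr' and 'decl' appear literally in the input text.
    """
    tokens = []
    # A lexeme is either a run of word characters or one non-space character.
    for lexeme in re.findall(r"[A-Za-z0-9_]+|\S", text):
        if lexeme in KEYWORDS or lexeme in ABSTRACT or lexeme in PUNCT:
            tokens.append((lexeme, lexeme))
        elif re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", lexeme):
            # Non-empty, alphanumeric or underscore, not starting with a digit.
            tokens.append(("IDENT", lexeme))
        else:
            raise ValueError(f"invalid token: {lexeme!r}")
    return tokens
```

With this sketch, a word such as 9lives is rejected because it starts with a digit, while _x1 is accepted as an IDENT.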
The most basic statements are: nothing (the empty statement), an expression,
or a list of declarations and statements enclosed in curly brackets ({ and }).
All statements must end in a semicolon (;), except for the aforementioned list
in curly brackets, which does not need a semicolon after the closing bracket.
A statement does not need its own ending semicolon if it ends in another
statement, because that statement will either have its own semicolon or be a
block {...}, which does not require one. For instance, the statement “expr;”
has an ending semicolon, so the if statement “if(expr) expr;” does not require
another one.
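The semicolon rule for the three basic statements can be sketched as a small recognizer. This is only an illustration under two assumptions that the text does not state: tokens are separated by whitespace, and a declaration inside a block is terminated by its own semicolon, as in C. The function names parse_basic and is_basic_statement are hypothetical.

```python
def parse_basic(tokens, i=0):
    """Recognise one basic statement starting at tokens[i].

    Returns the index just past the statement; raises SyntaxError otherwise.
    """
    def expect(tok, j):
        if j >= len(tokens) or tokens[j] != tok:
            found = tokens[j] if j < len(tokens) else "end of input"
            raise SyntaxError(f"expected {tok!r}, found {found!r}")
        return j + 1

    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i] == ";":            # empty statement: just the semicolon
        return i + 1
    if tokens[i] == "expr":         # expression statement: expr ;
        return expect(";", i + 1)
    if tokens[i] == "{":            # block: no semicolon after '}'
        i += 1
        while i < len(tokens) and tokens[i] != "}":
            if tokens[i] == "decl":
                # ASSUMPTION: a declaration ends with ';', as in C.
                i = expect(";", i + 1)
            else:
                i = parse_basic(tokens, i)
        return expect("}", i)
    raise SyntaxError(f"unexpected token {tokens[i]!r}")


def is_basic_statement(text):
    """True if text is exactly one basic statement."""
    tokens = text.split()           # ASSUMPTION: whitespace-separated tokens
    try:
        return parse_basic(tokens) == len(tokens)
    except SyntaxError:
        return False
```

Under these assumptions, “expr ;” and “{ decl ; expr ; }” are accepted, “expr” is rejected for lacking its semicolon, and “{ expr ; } ;” is rejected as a single statement because the trailing semicolon is a second, empty statement.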
A statement can be labeled, in which case it is preceded by a label identifier
(IDENT) and the token “:”. Alternatively, instead of a label identifier we can
have the keyword case followed by an expression, or the keyword default (these
options are only semantically valid in the context of a switch statement, but
the parser does not need to check this).
There are also the following types of statements:
- An if statement starts with the keyword if, followed by
the condition (i.e., an expression) in parentheses, and a statement.
Optionally, this statement can be followed by the keyword else and
another statement.
- A switch statement starts with the keyword switch,
followed by an expression in parentheses and then a statement.
- A while statement starts with the keyword while, followed
by the condition (an expression) in parentheses and then a statement.
- A do-while statement starts with the keyword do, followed
by a statement, the keyword while and a condition (an expression) in
parentheses.
- A for statement starts with the keyword for, followed by
a list of three expressions enclosed in parentheses and separated by
semicolons, and then a statement. Any of the expressions can be missing, but
the separating semicolons must remain.
- A goto statement starts with the keyword goto, followed
by a label identifier.
- A return statement starts with the keyword return
followed by an optional expression.
Moreover, the statements continue and break consist of just the keywords
continue and break, respectively.
Remarks about AST construction:
- An empty statement must be represented by a special node named nop
(no operation).
- A statement consisting of an expression must be represented simply by the
expr literal token.
- A list of statements and declarations enclosed in curly brackets must be
represented by a tree with a special node named block as root, and the
elements of the list as children.
- A labeled statement must be represented by a tree with a special node
named label as root, the label identifier as first child and the
statement itself as second child.
- A case statement (a statement starting with case) must be
represented by a tree with the token case as root and the expression and
the statement as children.
- A default statement (a statement starting with default)
must be represented by a tree with the token default as root and the
statement as only child.
- An if statement must be represented by a tree with the token
if as root and the condition and statement as children. In case an else
clause is present, the corresponding statement must appear as third child.
- A switch statement must be represented by a tree with the token
switch as root and the expression and statement as children.
- A while statement must be represented by a tree with the token
while as root and the condition and statement as children.
- A do-while statement must be represented by a tree with the
token do as root and the statement and condition as children.
- A for statement must be represented by a tree with the token
for as root and four children, one for each expression of the list and
one for the statement. In case any of the expressions is missing, the
corresponding child must consist of a special node named nop.
- A goto statement must be represented by a tree with the token
goto as root and the label identifier as only child.
- A return statement must be represented by a tree with the token
return as root. If an expression is given, then it must appear as the only
child. Otherwise, a special node named nop must appear as the only child.
- A continue or break statement must be represented by a
node with just the corresponding keyword.
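The grammar and the AST remarks above can be combined into a small recursive-descent parser. The following Python sketch is one possible reading, not a reference solution. It represents trees as nested tuples whose first element is the root, and leaves as plain strings. It assumes whitespace-separated tokens and a semicolon after each declaration, neither of which is stated in the text. The names Parser, parse, is_ident and FIXED are hypothetical.

```python
import re

FIXED = set("expr decl if else switch while do for goto continue break "
            "return case default : ; { } ( )".split())


def is_ident(tok):
    return (tok is not None and tok not in FIXED
            and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", tok) is not None)


class Parser:
    def __init__(self, text):
        self.toks = text.split()  # ASSUMPTION: whitespace-separated tokens
        self.pos = 0

    def peek(self, offset=0):
        j = self.pos + offset
        return self.toks[j] if j < len(self.toks) else None

    def eat(self, *expected):
        for tok in expected:  # consume the given tokens in order
            if self.peek() != tok:
                raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
            self.pos += 1

    def opt_expr(self):
        if self.peek() != "expr":
            return "nop"  # a missing expression is represented by nop
        self.eat("expr")
        return "expr"

    def stmt(self):
        tok = self.peek()
        if tok in (";", "expr"):  # empty statement or expression statement
            node = self.opt_expr()
            self.eat(";")
            return node
        if tok == "{":
            self.eat("{")
            items = []
            while self.peek() not in ("}", None):
                if self.peek() == "decl":
                    self.eat("decl", ";")  # ASSUMPTION: ';' ends a declaration
                    items.append("decl")
                else:
                    items.append(self.stmt())
            self.eat("}")
            return ("block", *items)
        if tok == "case":
            self.eat("case", "expr", ":")
            return ("case", "expr", self.stmt())
        if tok == "default":
            self.eat("default", ":")
            return ("default", self.stmt())
        if tok in ("if", "switch", "while"):
            self.eat(tok, "(", "expr", ")")
            node = (tok, "expr", self.stmt())
            if tok == "if" and self.peek() == "else":
                self.eat("else")
                node += (self.stmt(),)  # else branch is the third child
            return node
        if tok == "do":
            self.eat("do")
            body = self.stmt()
            self.eat("while", "(", "expr", ")", ";")
            return ("do", body, "expr")
        if tok == "for":
            self.eat("for", "(")
            header = []
            for closer in (";", ";", ")"):
                header.append(self.opt_expr())
                self.eat(closer)
            return ("for", *header, self.stmt())
        if tok == "goto" and is_ident(self.peek(1)):
            target = self.peek(1)
            self.eat("goto", target, ";")
            return ("goto", target)
        if tok == "return":
            self.eat("return")
            node = ("return", self.opt_expr())
            self.eat(";")
            return node
        if tok in ("continue", "break"):
            self.eat(tok, ";")
            return tok
        if is_ident(tok) and self.peek(1) == ":":
            self.eat(tok, ":")
            return ("label", tok, self.stmt())
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.stmt()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree
```

For example, under these assumptions parse("for ( ; expr ; ) { decl ; expr ; }") returns ("for", "nop", "expr", "nop", ("block", "decl", "expr")), and parse("L : if ( expr ) expr ; else ;") returns ("label", "L", ("if", "expr", "expr", "nop")). An else is attached to the nearest preceding if, which is the usual convention for this kind of grammar but is a choice made by this sketch.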