The set of tokens of the language is {IDENT, expr, :, :=, ,, ;, (, ), goto,
begin, end, if, then, else, case, of, repeat, until, while, do, for, to,
downto, with}. The token IDENT recognizes valid identifiers, i.e., non-empty
sequences of lowercase alphanumeric characters and underscores that do not
start with a digit. We do not consider uppercase letters, since Pascal is
case-insensitive. The token expr represents an arbitrary expression; thus we
do not need to parse expressions, as this ‘token’ is already available to us.
It includes variable accesses, which are expressions such as “index”,
“v[6]”, and “node.child”.
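As an illustration of the IDENT rule, here is a minimal Python sketch of an identifier check. The names KEYWORDS, IDENT_RE and is_ident are hypothetical, and treating the keyword tokens as reserved words (so that they never match IDENT) is an assumption: the text lists them as separate tokens but does not say so explicitly.

```python
import re

# Keyword tokens from the token set. Treating them as reserved words is an
# assumption made for this sketch.
KEYWORDS = {
    "goto", "begin", "end", "if", "then", "else", "case", "of", "repeat",
    "until", "while", "do", "for", "to", "downto", "with",
}

# Lowercase letters, digits and underscores; the first character is not a digit.
IDENT_RE = re.compile(r"[a-z_][a-z0-9_]*")


def is_ident(text: str) -> bool:
    """Return True if text, once lowercased, is a valid IDENT lexeme."""
    lowered = text.lower()  # Pascal is case-insensitive
    return IDENT_RE.fullmatch(lowered) is not None and lowered not in KEYWORDS
```

Lowercasing the input first is one way to honor case-insensitivity while keeping the pattern itself restricted to lowercase characters.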
There are two types of statements: simple statements and structured
statements. Any statement may be prefixed by a label (an identifier), as in
“L: ...”. In that case, the token : must appear as the root of the AST.
Simple statements are statements that do not contain other statements. There are 4 kinds:
- Empty statements. In the AST, there should be a special node named
nop (no operation).
- Assignment statements, such as “i := j + k”. In the AST, the
token := should appear as root, with the variable access (which is
recognized by the token expr) and the assigned value as children.
- Procedure statements, such as “paint(color)”,
“showInfo”, or “max(x, y, z)”. The list of parameters is
optional. When it is present, in the AST the token ( should appear as
root, with the procedure identifier and the list of parameters (which are
expressions), grouped under a special node named param_list, as
children.
- Go-to statements, such as “goto L1”. In the AST, the token
goto should appear as root.
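The AST shapes described so far (labels and the four simple statements) can be sketched in Python as follows. This is only an illustration: the Node type and the helper names are hypothetical, expressions are kept as opaque strings because expr is already a token, and three details are assumptions not fixed by the text: the order of children, the label being the child of goto, and a procedure statement without parameters being represented by its bare identifier.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Node(NamedTuple):
    """AST node: a root label and an ordered tuple of child nodes."""
    root: str
    children: Tuple["Node", ...] = ()


def nop() -> Node:
    # Empty statement: the special node named nop.
    return Node("nop")


def assign(target: str, value: str) -> Node:
    # Root := with the variable access and the assigned value as children.
    return Node(":=", (Node(target), Node(value)))


def call(name: str, args: Optional[Sequence[str]] = None) -> Node:
    # Without a parameter list, the bare identifier is used (assumption).
    if args is None:
        return Node(name)
    params = Node("param_list", tuple(Node(a) for a in args))
    return Node("(", (Node(name), params))


def goto(label: str) -> Node:
    # Root goto; the target label as its only child is an assumption.
    return Node("goto", (Node(label),))


def labeled(label: str, statement: Node) -> Node:
    # Root : with the label and the prefixed statement as children.
    return Node(":", (Node(label), statement))


# "max(x, y, z)" from the list above
max_call = call("max", ["x", "y", "z"])
```

For instance, labeled("l1", assign("i", "j + k")) would stand for the statement “l1: i := j + k”, with : as root and := as the root of its second child.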
There are 4 kinds of structured statements:
- Compound statements, such as “begin x := y; ; x := z end”, where
the second statement is an empty statement. They consist of a list of
statements separated by semicolons between the begin and end
tags. In the AST, the token begin should appear as root, with the
statements as children.
- Conditional statements: there are 2 types:
- Repetitive statements. There are three types:
- While statements, such as
while b <> 0 do
begin t := b;
b := a mod b;
a := t
end
In the AST, the token while should appear as root, with the condition
and the statement to be executed while it is true as children.
- Repeat statements, such as
repeat
t := b;
b := a mod b;
a := t
until b = 0
Note that between the repeat and until keywords there can be
more than one statement, separated by semicolons. In other words, the
begin and end tags are not necessary. In the AST, the token
repeat should appear as root, with the list of statements, grouped under
a special node named statement_sequence, and the condition as children.
- For statements, such as
begin
for i := 10 to 30 do
cont := cont + i;
for j := 12*8 downto i/2 do
cont := cont - j
end
which is a compound statement with two for statements. In the AST, the token
for should appear as root, with the control variable (an
identifier, such as i in the example), the initial value (10),
the loop direction keyword (to or downto), the final value
(30) and the statement to be executed at each iteration as children.
- With statements, such as
with date do
if month = 12 then
begin month := 1; year := year + 1
end
else month := month + 1
The with statement gives access to the elements of a record without having
to specify the record variable’s name each time. Thus, the previous code
snippet is equivalent to:
if date.month = 12 then
begin date.month := 1; date.year := date.year + 1
end
else date.month := date.month + 1
The record variable (date in the example) is an expression. There can be
more than one record variable, as in with x, y, z do .... In the AST,
the with token should appear as root, with the list of record variables,
grouped under a special node named record_variable_list, and the
statement to be executed with these variables as children.
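The AST shapes of the compound, repetitive and with statements can be sketched in Python as follows. As before, this is a hedged illustration rather than a prescribed implementation: the Node type and helper names are hypothetical, expressions are opaque strings (expr is already a token), and the children are assumed to appear in the order in which the text lists them.

```python
from typing import Iterable, NamedTuple, Tuple


class Node(NamedTuple):
    """AST node: a root label and an ordered tuple of child nodes."""
    root: str
    children: Tuple["Node", ...] = ()


def assign(target: str, value: str) -> Node:
    return Node(":=", (Node(target), Node(value)))


def compound(statements: Iterable[Node]) -> Node:
    # Root begin, with the statements directly as children.
    return Node("begin", tuple(statements))


def while_stmt(condition: str, body: Node) -> Node:
    return Node("while", (Node(condition), body))


def repeat_stmt(statements: Iterable[Node], condition: str) -> Node:
    # Unlike begin, the statements are grouped under statement_sequence.
    sequence = Node("statement_sequence", tuple(statements))
    return Node("repeat", (sequence, Node(condition)))


def for_stmt(variable: str, start: str, direction: str, stop: str,
             body: Node) -> Node:
    if direction not in ("to", "downto"):
        raise ValueError("direction must be 'to' or 'downto'")
    return Node("for", (Node(variable), Node(start), Node(direction),
                        Node(stop), body))


def with_stmt(records: Iterable[str], body: Node) -> Node:
    record_list = Node("record_variable_list",
                       tuple(Node(r) for r in records))
    return Node("with", (record_list, body))


# The repeat example from the text.
gcd_loop = repeat_stmt(
    [assign("t", "b"), assign("b", "a mod b"), assign("a", "t")],
    "b = 0",
)

# The first for statement from the text.
first_for = for_stmt("i", "10", "to", "30", assign("cont", "cont + i"))
```

Note the asymmetry the sketch makes visible: a compound statement holds its statements directly under begin, whereas repeat needs the statement_sequence node to keep the statement list apart from the condition.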