Note: this problem is taken from the
Compilers subject at Barcelona School of Informatics.
We want to design the parser of a language to describe trade routes.
Routes can be defined literally. For instance:
Route1 := "Tarragona" "Barcelona" 100
Route2 := "Girona" "Barcelona" 105
Route3 := "Barcelona" "Igualada" 70
Route4 := "Igualada" "Lleida" 95
Route5 := "Lleida" "LaSeuDUrgell" 130
Route6 := "LaSeuDUrgell" "Andorra" 20
Route7 := "Barcelona" "Berga" 110
Route8 := "Berga" "Puigcerda" 51
Route9 := "Puigcerda" "LaSeuDUrgell" 48
In this example, there is a road that connects Tarragona and Barcelona with a
length of 100 kilometers, a road that connects Girona and Barcelona with a
length of 105km, and so on. New routes can be defined by composing other routes
by means of two operators: ‘
+’ represents the concatenation of routes,
whereas ‘
|’ denotes that two routes are alternative. For instance,
possible routes defined from the previous routes could be:
Route10 := Route3 + Route4
Route11 := Route1 + (Route10 + Route5 | Route7 + Route8 + Route9)
In other words, Route10 defines a path from Barcelona to Lleida through
Igualada, whereas Route11 defines two possible paths from Tarragona to La Seu
d’Urgell (LaSeuDUrgell). The operator
+ has more precedence than the
operator
|, but, as we can see in the example, priority can be changed
using parentheses.
The set of tokens of the language is
{IDENT,
:=,
NUMBER,
STRING,
+,
|,
(,
)}. The token
IDENT represents a route name, which can be any non-empty sequence of
alphanumeric characters and underscore, not starting by a digit. The token
NUMBER is any unsigned integer number, i.e., a non-empty sequence of
digits. The token
STRING refers to the string literals. Such literals
are delimited by double quotes (
") and contain any amount of characters
different from double quotes.
The constructed AST must have as root a special node named
routes, with
one child per route. In turn, each route must be represented by a subtree with
the route name as root and a child with the definition of the route. If it is a
direct route, the definition must have a special node named
direct as
root, with the origin, destination, and distance as children. A concatenation
of two or more routes must be represented by a subtree with the token
+
as root, and one child per route. Similarly, an alternative of two or more
routes must be represented by a subtree with the token
| as root, and
one child per alternative.