A declaration specifies the identifier, type, and other aspects of some
element, such as a variable or a function. Declarations in C have a very rich
syntax, but we only consider a subset.
The set of tokens is {IDENT, expr, ;, ,, =, (, ), [, ], {, }, *, enum, struct,
union, type, TYPE_QUALIFIER, STORAGE_CLASS}. The token
IDENT represents any identifier
(such as a variable or function name), that is, a non-empty sequence of
alphanumeric characters and underscores that does not start with a digit. The
token expr represents an arbitrary expression, such as 3+4 or i++. Thus, we do
not need to parse expressions: each one is already available as a single token.
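The token set above can be illustrated with a small lexer. This is only a sketch under stated assumptions: the function name tokenize is hypothetical, and it assumes that the input spells the grouped tokens (expr, type, TYPE_QUALIFIER, STORAGE_CLASS) as literal words, as the examples in this text do. Any other identifier-shaped word is reported as IDENT.

```python
import re

# Words that are tokens in their own right (assumed to appear literally in the input).
RESERVED = {"expr", "enum", "struct", "union", "type",
            "TYPE_QUALIFIER", "STORAGE_CLASS"}

# Either an identifier-shaped word (not starting with a digit) or one punctuation token.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|([;,=()\[\]{}*]))")


def tokenize(text):
    """Split text into token names; raise ValueError on any other character."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character near offset {pos}")
        word, punct = match.groups()
        if punct is not None:
            tokens.append(punct)
        elif word in RESERVED:
            tokens.append(word)
        else:
            tokens.append("IDENT")  # any other identifier-shaped word
        pos = match.end()
    return tokens


print(tokenize("type *my_var[expr];"))
# ['type', '*', 'IDENT', '[', 'expr', ']', ';']
```

A word such as 9lives is rejected because an identifier cannot start with a digit.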
A declaration has three parts: the declaration specifiers, the list of
declarators, and a terminating semicolon “;”.
Declaration specifiers: these are keywords or fragments of code that
specify some property of the declared variables. The declaration specifiers
part of a declaration consists of one or more declaration specifiers.
Semantically speaking, not all of them can appear together or in any order, but
we defer that check to the semantic analysis. They can be classified as
follows:
- Type: defines the type of the variables. It includes keywords
such as int, float, void, unsigned, and more. Most
of these keywords are grouped into the type token (the literal word
“type”, to simplify the exercise), with the exception of two
type specifiers that require more complex parsing:
- The enumerated type starts with the keyword enum, followed by an
identifier and the list of enumeration elements. Either the identifier or the
list can be missing, but not both. The list is delimited by curly brackets
({, }) and must contain one or more elements, separated by
commas. Each enumeration element is an identifier that can optionally be
initialized, in which case it is followed by the token = and an expression expr.
For example:
- enum IDENT
- enum {IDENT, IDENT=expr, IDENT}
- enum IDENT {IDENT=expr, IDENT}
- The struct/union type starts with either the keyword struct or
union, followed by an identifier and the list of struct elements.
Either the identifier or the list can be missing, but not both. The list is
delimited by curly brackets and must contain one or more elements. Each element
is an arbitrary declaration. For example:
- struct IDENT
- union {type IDENT;}
- struct IDENT {type IDENT; type IDENT; type IDENT;}
The grammar even accepts specifiers such as “struct {type IDENT = IDENT;}”,
although initializing struct fields is not semantically valid.
- Storage class: defines the scope and lifetime of the variables.
It can be any of the keywords typedef, extern, static,
auto, or register. They are grouped into the STORAGE_CLASS
token.
- Type qualifier: it can be any of the keywords const,
restrict, or volatile. They are grouped into the
TYPE_QUALIFIER token.
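As an illustration of the enum rules above (an identifier, a list, or both, and a non-empty comma-separated list), here is a minimal recognizer sketch. The function name parse_enum and the choice to work on an already tokenized list of strings are assumptions made for this example, not part of the exercise.

```python
def parse_enum(tokens, pos=0):
    """Return the index just past the enum specifier that starts at tokens[pos].

    Raises SyntaxError if the tokens do not form a valid enum specifier.
    """
    def peek(i):
        return tokens[i] if i < len(tokens) else None

    if peek(pos) != "enum":
        raise SyntaxError("expected 'enum'")
    pos += 1

    has_name = peek(pos) == "IDENT"
    if has_name:
        pos += 1

    if peek(pos) != "{":
        # No list: the identifier is then mandatory.
        if not has_name:
            raise SyntaxError("enum needs an identifier, a list, or both")
        return pos
    pos += 1  # consume '{'

    while True:
        if peek(pos) != "IDENT":
            raise SyntaxError("expected an enumeration element")
        pos += 1
        if peek(pos) == "=":  # optional initialization
            if peek(pos + 1) != "expr":
                raise SyntaxError("expected expr after '='")
            pos += 2
        if peek(pos) != ",":
            break
        pos += 1  # consume ',' and read the next element

    if peek(pos) != "}":
        raise SyntaxError("expected '}'")
    return pos + 1


tokens = "enum IDENT { IDENT = expr , IDENT }".split()
print(parse_enum(tokens) == len(tokens))  # True: the whole input is one specifier
```

Returning the end position rather than a boolean lets a caller continue parsing the remaining specifiers and declarators from that index.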
List of declarators: it consists of one or more declarators separated
by commas. A declarator is the part of a declaration that specifies the name
that is to be introduced into the program. Declarators can be initialized,
which means that they can be followed by the symbol = and an
initializer. An initializer is either an expression or an initializer
list, which is a list of one or more initializers delimited by curly brackets
and separated by commas. A declarator can be any of the following:
- An identifier.
- A pointer. It consists of a declarator preceded by the token *.
Pointers can be qualified, in which case the token * is followed
by a type qualifier. For instance, the fragments “TYPE_QUALIFIER *IDENT;”
and “* TYPE_QUALIFIER IDENT;” are both valid. In the first case, the type
qualifier is a declaration specifier, whereas in the second it qualifies the
pointer.
- An array. It consists of a declarator followed by the token [, an
optional expression, and the token ].
- A function. It consists of a declarator followed by a list of zero or
more parameter declarations, delimited by parentheses and separated by commas.
A parameter declaration, in turn, is a regular declaration, except that it has
exactly one declarator, cannot be initialized, and has no final semicolon.
- A declarator inside parentheses.
Examples of correct declarators are:
- IDENT
- *IDENT
- * TYPE_QUALIFIER * * TYPE_QUALIFIER IDENT
- IDENT[expr]
- IDENT()
- IDENT(type IDENT, type IDENT, type IDENT)
- IDENT()[expr](type IDENT)[expr]
- *(*IDENT()[expr])[expr]
- IDENT = expr
- IDENT = {expr, expr, {expr, expr}}
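The declarator rules can be checked with a small recursive-descent recognizer. The sketch below is one possible reading of the rules, not a reference solution: the names DeclaratorRecognizer and is_declarator are hypothetical, at most one type qualifier is accepted after each *, and parameter declarations are simplified to the specifier tokens type, TYPE_QUALIFIER, and STORAGE_CLASS (enum, struct, and union specifiers are left out).

```python
import re

SIMPLE_SPECIFIERS = {"type", "TYPE_QUALIFIER", "STORAGE_CLASS"}


class DeclaratorRecognizer:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, token):
        """Consume the next token if it equals token; report whether it did."""
        if self.peek() == token:
            self.pos += 1
            return True
        return False

    def expect(self, token):
        if not self.accept(token):
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")

    def init_declarator(self):
        self.declarator()
        if self.accept("="):
            self.initializer()

    def initializer(self):
        if self.accept("{"):  # initializer list: one or more initializers
            self.initializer()
            while self.accept(","):
                self.initializer()
            self.expect("}")
        else:
            self.expect("expr")

    def declarator(self):
        if self.accept("*"):  # pointer, optionally qualified
            self.accept("TYPE_QUALIFIER")
            self.declarator()
            return
        if self.accept("("):  # declarator inside parentheses
            self.declarator()
            self.expect(")")
        else:
            self.expect("IDENT")
        while True:  # any number of array or function suffixes
            if self.accept("["):
                self.accept("expr")  # the size expression is optional
                self.expect("]")
            elif self.accept("("):
                if self.peek() != ")":
                    self.parameter()
                    while self.accept(","):
                        self.parameter()
                self.expect(")")
            else:
                break

    def parameter(self):
        if self.peek() not in SIMPLE_SPECIFIERS:
            raise SyntaxError("a parameter needs a declaration specifier")
        while self.peek() in SIMPLE_SPECIFIERS:
            self.pos += 1
        self.declarator()


def is_declarator(text):
    # Characters outside the token set are silently skipped in this sketch.
    tokens = re.findall(r"[A-Za-z_]\w*|[;,=()\[\]{}*]", text)
    parser = DeclaratorRecognizer(tokens)
    try:
        parser.init_declarator()
    except SyntaxError:
        return False
    return parser.pos == len(tokens)


print(is_declarator("*(*IDENT()[expr])[expr]"))  # True
print(is_declarator("IDENT = {}"))               # False: the list is empty
```

Handling * before the identifier and the [ ] and ( ) suffixes after it is what gives arrays and functions priority over pointers.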
Remarks about the AST construction:
- A list of declaration specifiers must have a special node called
declarationSpecifiers as root, with all the specifiers as children.
- A list of declarators must have a special node named declarators
as root, with all the declarators as children.
- If a declarator is initialized, the token = must appear as root,
with the declarator and the initializer as children.
- A declaration must be represented by a special node named
declaration, with the list of declaration specifiers as first child and
the list of declarators as second child.
- The enumerated type must have the token enum as root and the
identifier and list of enumeration elements as children, when available. In turn,
the list of enumeration elements must have a special node named
enumerators as root, with all the elements as children. In case an
enumeration element is initialized, the token = must appear as root, with
the identifier and the expression as children.
- The struct/union type must have either the token struct or
union as root, and the identifier and list of declarations as children,
when available. In turn, the list of declarations must have a special node
named structDeclarations as root, with the declarations as children.
- A pointer declarator must be represented by a subtree with the token
“*” as root, and the type qualifier (if given) and the declarator as
children. Note that the implicit parenthesisation of “*TYPE_QUALIFIER *IDENT”
is “*(TYPE_QUALIFIER, *(IDENT))”.
- An array declarator must be represented by a subtree with the token
“[” as root, the declarator as first child and the size expression,
if given, as second child. Note that the implicit parenthesisation of
IDENT[][] is (IDENT[])[].
- A function declarator must be represented by a subtree with the token
“(” as root, the declarator as first child and the list of parameters
as second child. The list of parameters must have a special node called
parameters as root, even if there are no parameters, with one child per
parameter declaration. A parameter declaration must have a special node called
parameterDeclaration as root, with the list of declaration specifiers as
first child and the declarator (since there can be only one) as second child.
- Array and function declarators have priority over pointers. Thus, the
implicit parenthesisation of “*IDENT[]” is “*(IDENT[])”.
- An initializer consisting of an expression must be represented by the
expression itself. In turn, an initializer consisting of a list of initializers
must be represented by a special node called initializers with one child
per initializer.
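The tree shapes described in these remarks can be sketched for declarators as follows. This is a hedged illustration rather than the expected solution: the function name parse_declarator is hypothetical, trees are encoded as nested tuples of the form (root, child, ...), parentheses are assumed to group without producing a node of their own, and parameter declarations are limited to the specifier tokens type, TYPE_QUALIFIER, and STORAGE_CLASS.

```python
import re

SIMPLE_SPECIFIERS = {"type", "TYPE_QUALIFIER", "STORAGE_CLASS"}


def parse_declarator(text):
    """Parse one declarator (no initializer) and return its tree as nested tuples."""
    # Characters outside the token set are silently skipped in this sketch.
    tokens = re.findall(r"[A-Za-z_]\w*|[;,=()\[\]{}*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def declarator():
        if peek() == "*":
            take("*")
            children = []
            if peek() == "TYPE_QUALIFIER":
                children.append(take())
            children.append(declarator())
            return ("*", *children)  # qualifier (if given), then the declarator
        if peek() == "(":
            take("(")
            node = declarator()  # grouping only, no node of its own (assumption)
            take(")")
        else:
            node = take("IDENT")
        while peek() in ("[", "("):  # suffixes wrap the tree built so far
            if take() == "[":
                if peek() == "expr":
                    node = ("[", node, take("expr"))
                else:
                    node = ("[", node)
                take("]")
            else:
                params = []
                while peek() != ")":
                    if params:
                        take(",")
                    params.append(parameter())
                take(")")
                node = ("(", node, ("parameters", *params))
        return node

    def parameter():
        specifiers = []
        while peek() in SIMPLE_SPECIFIERS:
            specifiers.append(take())
        if not specifiers:
            raise SyntaxError("a parameter needs a declaration specifier")
        return ("parameterDeclaration",
                ("declarationSpecifiers", *specifiers),
                declarator())

    tree = declarator()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_declarator("*IDENT[]"))                # ('*', ('[', 'IDENT'))
print(parse_declarator("*TYPE_QUALIFIER *IDENT"))  # ('*', 'TYPE_QUALIFIER', ('*', 'IDENT'))
print(parse_declarator("IDENT[][]"))               # ('[', ('[', 'IDENT'))
```

Because each suffix wraps the node built so far, IDENT[][] nests as (IDENT[])[], and because the pointer branch recurses before returning, the * ends up above any array or function node.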