A Simple Compiler - Part 5: Variables

Introduction

In Part 4 we introduced the Symbol Table and Constants. This article will deal with variables.

Variables

Variables are used to store program state information. Unlike Constants, Variables can be modified during runtime. We are going to add simple integer variables to Teeny. Later, we will add booleans, characters, enumerations, and arrays of each of the simple types.

Variables also have to be stored somewhere. This implies that the size of the variable is important. In a compiler that generates code for a real machine, alignment issues must be dealt with.

Variables can also have a lifetime. There are global variables, which maintain their value throughout the programs run. There are local variables, which lose their value when when the routine or class they were declared in goes out of scope. A stack is a convenient structure to use for storing local variables.

Variables can also have visibility. Typically, visibility is limited to a routine, a class, or a module. In some instances, there can be variables that are always visible.

Types

Variables are often associated with a type. Typical types are character, integer, floating point, boolean, string and user defined. In Teeny .05, the only variable type we will consider is integer.

As an aside, there is a long ongoing debate on static typing vs. dynamic typing, and strong typing vs. weak typing. The argument goes that dynamic and weak typing make programming easier, which the other side says that static and strong typing makes programs more reliable.

For simplicity, we will only deal with static and strong typing.

Declarations:

How do you declare variables? We will follow a Pascal-like approach, in order to make parsing simple.

 'var' ident {',' ident} ';'

This says that a variable is declared by the keyword 'var', followed by one or more identifiers, separated by commas. The list is terminated with a semicolon. Since we only have a single type in Teeny v0.05 (integer), there is no need to also declare the variables type. When we add multiple types to Teeny, we will amend this production, adding the type.

Operations on Variables

What types of operations can you do with variables:

For integers, you can perform all the standard arithmetic operations upon them.

It turns out that a stack (as used in our Virtual Machine) is ideal for working with integers, and makes code generation for arithmetic operations easy.

Changes to the Teeny v0.04 Grammar

Lines marked with '!' are new or changed:

Teeny v0.05 grammar:

   Teeny     = 'program' ident ';' {Decl} CmpndStmt '.' .
   Decl      = 'const' Const {',' Const } ';'
 !           | 'var' ident {',' ident} ';'
             .
   Const     = ident '=' number .
   CmpndStmt = 'begin' [Stmt {';' Stmt }] 'end' .
   Stmt      = Writeln
 !           | ident ':=' Expr .
   Writeln   = 'writeln' '('
               [(string|Expr) {',' (string|Expr) }] ')' .
   Expr      = Term { ('+' | '-') Term } .
   Term      = Factor { ('*' | '/') Factor } .
   Factor    = number
             | ident
             | ('+' | '-') Factor
             | '(' Expr ')'
             .

As you can see, we have added the variable declaration production, and the assignment production.

Here is a sample Teeny v0.05 program:

 program five;
 var x, y, z;
 begin
     x:=3;
     y:=2;
     z:=x+y*4;
     writeln('z = ', z);
 end.

Changes to the Scanner

The identifier "var" is now recognized.

Changes to the Symbol Table

We already have a place to store the identifiers name and value in the symbol table, from our addition of constants. For variables, we also need a place to store the address of the variable.

A new routine, enter_var() was added to add variables (as opposed to constants) to the symbol table. enter_var() sets the variables address, and then increments the the total space used by global variables.

Changes to the Parser

Enumerated type var_t, and varsym added.

factor() updated to handle variables in expressions. This mainly consists of discovering that the identifier is a variable, by consulting the symbol table. Then, that symbol table reference is passed along to the code generator to generate code to load the variables value onto the stack.

We also have to account for new statements. The EBNF for declaring variables is:

 'var' ident {',' ident} ';'

This is handled in function decl(). In pseudo code:

 if accept(varsym) then
   repeat
     expect(identsym)
   until not accept(comma)
 end if

For semantic checking, we make sure that the variable name is not already defined. Finally, we need to enter this variable identifier into the symbol table, by calling enter_var().

The EBNF for our assignment statement is:

 ident ':=' Expr .

This is handled in function stmt(). In pseudo code:

 if accept(identsym) then
   expect(assign)
   expression()
 end if

We also need to verify that the variable on the left hand side of the assignment statement exists.

Expression() places the result on the stack, and we call the code generator to generate code to store the result in the variable, using the symbol table reference.

Changes to the Code Generator

Added load_value() to load the value of a variable on the stack.

Added store() to store the top value in the stack into a variable.

Changes to the Virtual Machine

The load (lod) and store (sto) instructions were added.

Load loads a value from a variable (in the stack) to the top of the stack.

Store stores the top value on the stack into a variable (also in the stack).

The interpreter now sets the initial stack pointer just above the allocated global data. This implies that global variables are stored at the bottom of the stack.

Summary

We have added variables to our simple compiler. And, it turned out to be amazingly easy. So, what do we have? We have a simple compiler, that can create and assign variables, and perform the standard arithmetic operations upon them. Not very exciting, except that by taking small steps, it has been very easy to get to where we are. Major reasons our journey has been so simple is that we have based our compiler on a concrete grammar, and have been careful to segregate the scanning from the parsing from the code generation. It also helps that we are generating code for an idealized virtual machine :-)

Next time we will add a few control statements, which will allow us to write some simple programs that do something a little more exciting.

Summary of Changes

References