COP 4020 Project 2: Recursive Descent Compilation

Educational Objectives Summary: After completing this assignment, the student should be able to do the following:

Draw parse trees for legal expressions

Given an LL(1) grammar, implement a recursive descent parser as a Java class:

Define functions (methods) for each non-terminal in the grammar

Use sequencing and recursion as defined in the productions of the grammar

Explain how legal expressions are parsed by the code

Explain why non-legal expressions are not fully parsed and where the parser detects the non-legality

Illustrate decorated parse trees for legal expressions in an augmented LL(1) grammar

Read and understand implementations of augmented LL(1) grammars

Expand augmented LL(1) grammars to include new language features

Modify and expand existing implementations of augmented LL(1) grammars to include additions to the grammar

Implement a compiler front end for a small LL(1) grammar that outputs an abstract syntax tree (AST)

Implement a compiler for a calculator language that produces an AST in which arithmetic on known (numeric) operands has already been carried out

Traverse an AST to produce a Lisp-like representation of a legal input expression

Deliverables summary: Parser2.java, Calc2.java, AST2.java, Calc2Lisp.java, log.txt

Part 1: Recursive Descent Parser

Educational Objectives: After completing this assignment, the student should be able to do the following:

Draw parse trees for legal expressions
Given an LL(1) grammar, implement a recursive descent parser as a Java class:

Define functions (methods) for each non-terminal in the grammar
Use sequencing and recursion as defined in the productions of the grammar

Explain how legal expressions are parsed by the code
Explain why non-legal expressions are not fully parsed and where the parser detects the non-legality

Operational Objectives: Implement a recursive descent parser in Java for a calculator language based on a BNF grammar.

Deliverables: One file Parser2.java

Arithmetic operators in a programming language are typically left associative with the notable exception of exponentiation (^) which is right associative. (However, this rule of thumb is not universal.)

Associativity can be captured in a grammar. For a left associative binary operator lop we can have a production of the form:
```
<expr> -> <term>
        | <expr> lop <term>
```
For example, a+b+c is evaluated from the left to the right by summing a and b first. Assuming that <term> represents identifiers, the parse tree of a+b+c with the grammar above is:
```
            <expr>
          /   |   \
      <expr>  +  <term>
    /   |   \      |
<expr>  +  <term>  c
  |          |
<term>       b
  |
  a
```
As you can see, the left subtree represents a+b, which is a subexpression of a+b+c, because a+b+c is parsed as (a+b)+c.
Note that the production for a left associative operator is left recursive. To eliminate left recursion, we can rewrite the grammar into:
```
<expr>      -> <term> <term_tail>
<term_tail> -> lop <term> <term_tail>
             | empty
```
This (part of the) grammar is LL(1) and therefore suitable for recursive descent parsing. However, the parse tree structure does not capture the left-associativity of the lop operator.
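
The remark above concerns only the shape of the tree. Left-associative evaluation can still be recovered from the right-recursive form by carrying the value computed so far into each <term_tail> call. The following is a minimal sketch under simplifying assumptions: the class TailDemo and its methods are hypothetical (they are not among the project files), the input is reduced to an array of integer operands, and every operator in the chain is assumed to be '-'.

```java
// Hypothetical illustration; TailDemo is not part of the project files.
class TailDemo {
  // <expr> -> <term> <term_tail>
  // 'terms' holds the operands of a chain such as 10 - 3 - 2.
  static int expr(int[] terms) {
    return termTail(terms, 1, terms[0]);
  }

  // <term_tail> -> '-' <term> <term_tail> | empty
  // 'subtotal' carries the value of everything to the left of position i.
  static int termTail(int[] terms, int i, int subtotal) {
    if (i == terms.length) return subtotal;             // empty production
    return termTail(terms, i + 1, subtotal - terms[i]); // apply '-', then recurse
  }

  // Naive evaluation that follows the right-nested tree shape instead.
  static int rightNested(int[] terms, int i) {
    if (i == terms.length - 1) return terms[i];
    return terms[i] - rightNested(terms, i + 1);
  }

  public static void main(String[] args) {
    int[] chain = {10, 3, 2};
    System.out.println(expr(chain));           // grouped as (10 - 3) - 2
    System.out.println(rightNested(chain, 0)); // grouped as 10 - (3 - 2)
  }
}
```

For the chain 10, 3, 2 the first call yields 5 and the second yields 9, so the two groupings really do differ for a non-associative operator such as subtraction.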

Draw the parse tree of a+b+c using the LL(1) grammar shown above. You may assume that <term> represents identifiers. Hint: draw the tree from the top down by simulating a top-down predictive parser.
For a right associative operator rop we can create a grammar production of the form:
```
<expr> -> <term>
        | <term> rop <expr>
```
An example right associative operator is exponentiation ^, so a^b^c is evaluated from the right to the left such that b^c is evaluated first.
Draw the parse tree of a^b^c. You may assume that <term> represents identifiers.
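
The effect of right associativity on evaluation order can be sketched in a few lines. This is an illustration only: PowDemo is a hypothetical class (not among the project files), the operands are supplied as an array rather than read from a tokenizer, and every operator in the chain is assumed to be ^.

```java
// Hypothetical illustration; PowDemo is not part of the project files.
class PowDemo {
  // <expr> -> <term> | <term> '^' <expr>
  // 'terms' holds the operands of a chain such as 2 ^ 3 ^ 2.
  static long expr(long[] terms, int i) {
    if (i == terms.length - 1) return terms[i];   // <expr> -> <term>
    long right = expr(terms, i + 1);              // evaluate the right operand first
    return Math.round(Math.pow(terms[i], right)); // then apply ^
  }

  public static void main(String[] args) {
    System.out.println(expr(new long[]{2, 3, 2}, 0)); // grouped as 2 ^ (3 ^ 2)
  }
}
```

Because the recursion descends to the rightmost operand before applying any operator, 2^3^2 is computed as 2^9 = 512 rather than (2^3)^2 = 64.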

The precedence of an operator indicates the priority of applying the operator relative to other operators. For example, multiplication has a higher precedence than addition, so a+b*c is evaluated by multiplying b and c first. In other words, multiplication groups more tightly compared to addition. The rules of operator precedence vary from one programming language to another.

The relative precedences between operators can be captured in a grammar as follows. A nonterminal is introduced for every group of operators with identical precedence. The nonterminal of the group of operators with lowest precedence is the nonterminal for the expression as a whole. Productions for (left associative) binary operators, ordered from lowest to highest precedence, are written in a form suitable for recursive descent parsing. Here is an outline:

```
<expr>    -> <e1> <e1_tail>
<e1>      -> <e2> <e2_tail>
<e1_tail> -> <lowest_op> <e1> <e1_tail>
           | empty
<e2>      -> <e3> <e3_tail>
<e2_tail> -> <second_lowest_op> <e2> <e2_tail>
           | empty
...
<eN>      -> '(' <expr> ')'
           | '-' <eN>
           | identifier
           | number
<eN_tail> -> <highest_op> <eN> <eN_tail>
           | empty
```

where <lowest_op> is a nonterminal denoting all operators with the same lowest precedence, etc.

The following Java program uses these concepts to implement a recursive descent parser for a calculator language:

/* Parser.java
   Implements a parser for a calculator language
   Uses java.io.StreamTokenizer and recursive descent parsing

   Compile:
   javac Parser.java
*/
import java.io.*;
/* Calculator language grammar:

   <expr>        -> <term> <term_tail>
   <term>        -> <factor> <factor_tail>
   <term_tail>   -> <add_op> <term> <term_tail>
                  | empty
   <factor>      -> '(' <expr> ')'
                  | '-' <factor>
                  | identifier
                  | number
   <factor_tail> -> <mult_op> <factor> <factor_tail>
                  | empty
   <add_op>      -> '+' | '-'
   <mult_op>     -> '*' | '/'
*/
public class Parser
{
  private static StreamTokenizer tokens;
  private static int token;
  public static void main(String argv[]) throws IOException
  {
    InputStreamReader reader;
    if (argv.length > 0)
      reader = new InputStreamReader(new FileInputStream(argv[0]));
    else
      reader = new InputStreamReader(System.in);
    // create the tokenizer:
    tokens = new StreamTokenizer(reader);
    tokens.ordinaryChar('.');
    tokens.ordinaryChar('-');
    tokens.ordinaryChar('/');
    // advance to the first token on the input:
    getToken();
    // check if expression:
    expr();
    // check if expression ends with ';'
    if (token == (int)';')
      System.out.println("Syntax ok");
    else
      System.out.println("Syntax error");
  }
  // getToken - advance to the next token on the input
  private static void getToken() throws IOException
  {
    token = tokens.nextToken();
  }
  // expr - parse <expr> -> <term> <term_tail>
  private static void expr() throws IOException
  {
    term();
    term_tail();
  }
  // term - parse <term> -> <factor> <factor_tail>
  private static void term() throws IOException
  {
    factor();
    factor_tail();
  }
  // term_tail - parse <term_tail> -> <add_op> <term> <term_tail> | empty
  private static void term_tail() throws IOException
  {
    if (token == (int)'+' || token == (int)'-')
    {
      add_op();
      term();
      term_tail();
    }
  }
  // factor - parse <factor> -> '(' <expr> ')' | '-' <factor> | identifier | number
  private static void factor() throws IOException
  {
    if (token == (int)'(')
    {
      getToken();
      expr();
      if (token == (int)')')
        getToken();
      else System.out.println("closing ')' expected");
    }
    else if (token == (int)'-')
    {
      getToken();
      factor();
    }
    else if (token == tokens.TT_WORD)
      getToken();
    else if (token == tokens.TT_NUMBER)
      getToken();
    else System.out.println("factor expected");
  }
  // factor_tail - parse <factor_tail> -> <mult_op> <factor> <factor_tail> | empty
  private static void factor_tail() throws IOException
  {
    if (token == (int)'*' || token == (int)'/')
    {
      mult_op();
      factor();
      factor_tail();
    }
  }
  // add_op - parse <add_op> -> '+' | '-'
  private static void add_op() throws IOException
  {
    if (token == (int)'+' || token == (int)'-')
      getToken();
  }
  // mult_op - parse <mult_op> -> '*' | '/'
  private static void mult_op() throws IOException
  {
    if (token == (int)'*' || token == (int)'/')
      getToken();
  }
}

Copy (and download if needed) this example parser program from:

~cop4020p/[semester]/proj2/

Compile and execute:

javac Parser.java
java Parser

Give the output of the program when you type 2*(1+3)/x; and explain why this expression is accepted by the parser by drawing the parse tree. Give the output of the program when you type 2x+1; and explain why it is not accepted. At what point in the program does the parser fail?
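
When investigating inputs such as these, it can help to look at the token stream on its own before reasoning about the parser. The sketch below is a hypothetical helper (TokenDump is not among the project files) that applies the same StreamTokenizer settings as Parser.java to a string and lists the tokens it produces. The labels NUM: and WORD: are an arbitrary choice made for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper; TokenDump is not part of the project files.
class TokenDump {
  // Returns the tokens of 'input' separated by single spaces, using the
  // same tokenizer configuration as Parser.java.
  static String dump(String input) {
    StreamTokenizer tokens = new StreamTokenizer(new StringReader(input));
    tokens.ordinaryChar('.');
    tokens.ordinaryChar('-');
    tokens.ordinaryChar('/');
    StringBuilder out = new StringBuilder();
    try {
      while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
        if (out.length() > 0) out.append(' ');
        if (tokens.ttype == StreamTokenizer.TT_NUMBER)
          out.append("NUM:").append((int) tokens.nval);   // numeric token
        else if (tokens.ttype == StreamTokenizer.TT_WORD)
          out.append("WORD:").append(tokens.sval);        // identifier token
        else
          out.append((char) tokens.ttype);                // single ordinary character
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Pass the expression as a command-line argument, e.g. java TokenDump "1+a;"
    System.out.println(dump(args.length > 0 ? args[0] : "1+a;"));
  }
}
```

Comparing the token sequence with the alternatives tested in factor(), term_tail() and factor_tail() shows which method is active when an unexpected token arrives.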

Extend the parser program to include syntax checking of function calls with one argument, given by the new production for <factor>:
```
<factor> -> '(' <expr> ')'
          | '-' <factor>
          | identifier '(' <expr> ')'
          | identifier
          | number
```
Test your implementation with 2*f(1+a);. Also draw the parse tree of 2*f(1+a);.
Extend the parser to include syntax checking of the exponentiation operator ^, so that expressions like -a^2 and -(a^b)^(c*d)^(e+f) can be parsed. Note that exponentiation is right associative and has the highest precedence, even higher than unary minus, so -a^2 is evaluated by evaluating a^2 first. To implement this, you must add a <power> nonterminal and also change the production of <factor> so that the parse tree of -a^2 is:
```
    <factor>
     /    \
    -   <power>
        /  |  \
       a   ^   <power>
                 |
                 2
```

Keep this step of the project as a working Java program named "Parser2.java". This will be collected using the standard submit script configured with the project 2 deliverables.sh. Your answers for the non-programming questions should be inserted as "Appendix 1" at the end of your log.txt. Use plain text to draw the trees as required.

Hint: The compiled "Parser2v.class" is distributed in area51. You can use this to see how an expression is processed through a correctly functioning parser. (This is a "verbose" version. Your Parser2.java should not be verbose.)

Part 2: Calculator with Assignment

Educational Objectives: After completing this assignment, the student should be able to do the following:

Illustrate decorated parse trees for legal expressions in an augmented LL(1) grammar
Read and understand implementations of augmented LL(1) grammars
Expand augmented LL(1) grammars to include new language features
Modify and expand existing implementations of augmented LL(1) grammars to include additions to the grammar

Operational Objectives: Implement a calculator with assignment in Java using an L-attributed grammar

Deliverables: One file Calc2.java

Consider the following augmented LL(1) grammar for an expression language:

<expr>         -> <term> <term_tail>           term_tail.subtotal := term.value;
                                               expr.value := term_tail.value
<term>         -> <factor> <factor_tail>       factor_tail.subtotal := factor.value;
                                               term.value := factor_tail.value
<term_tail1>   -> '+' <term> <term_tail2>      term_tail2.subtotal :=
                                                         term_tail1.subtotal+term.value;
                                               term_tail1.value := term_tail2.value
                | '-' <term> <term_tail2>      term_tail2.subtotal :=
                                                         term_tail1.subtotal-term.value;
                                               term_tail1.value := term_tail2.value
                | empty                        term_tail1.value := term_tail1.subtotal
<factor1>      -> '(' <expr> ')'               factor1.value := expr.value
                | '-' <factor2>                factor1.value := -factor2.value
                | number                       factor1.value := number
<factor_tail1> -> '*' <factor> <factor_tail2>  factor_tail2.subtotal :=
                                                         factor_tail1.subtotal*factor.value;
                                               factor_tail1.value := factor_tail2.value
                | '/' <factor> <factor_tail2>  factor_tail2.subtotal :=
                                                         factor_tail1.subtotal/factor.value;
                                               factor_tail1.value := factor_tail2.value
                | empty                        factor_tail1.value := factor_tail1.subtotal

Note: the indexing (1 and 2) used with nonterminals, such as <factor1> and <factor2>, is only relevant to the semantic rules to identify the specific occurrences of the nonterminals in a production. (See text.)

Draw the decorated parse tree for -2*3+1 that shows the attributes and their values.

The following calculator Java program implements the attribute grammar shown above to calculate the value of an expression. To this end, the synthesized value attributes are returned as integer values from the methods that correspond to nonterminals. Inherited subtotal attributes are passed to the methods as arguments:

/* Calc.java
   Implements a parser and calculator for simple expressions
   Uses java.io.StreamTokenizer and recursive descent parsing
	
   Compile:
   javac Calc.java

   Execute:
   java Calc
   or:
   java Calc <filename>
*/
import java.io.*;
public class Calc
{
  private static StreamTokenizer tokens;
  private static int token;
  public static void main(String argv[]) throws IOException
  {
    InputStreamReader reader;
    if (argv.length > 0)
      reader = new InputStreamReader(new FileInputStream(argv[0]));
    else
      reader = new InputStreamReader(System.in);
    // create the tokenizer:
    tokens = new StreamTokenizer(reader);
    tokens.ordinaryChar('.');
    tokens.ordinaryChar('-');
    tokens.ordinaryChar('/');
    // advance to the first token on the input:
    getToken();
    // parse expression and get calculated value:
    int value = expr();
    // check if expression ends with ';' and print value
    if (token == (int)';')
      System.out.println("Value = " + value);
    else
      System.out.println("Syntax error");
  }
  // getToken - advance to the next token on the input
  private static void getToken() throws IOException
  {
    token = tokens.nextToken();
  }
  // expr - parse <expr> -> <term> <term_tail>
  private static int expr() throws IOException
  {
    int subtotal = term();
    return term_tail(subtotal);
  }
  // term - parse <term> -> <factor> <factor_tail>
  private static int term() throws IOException
  {
    int subtotal = factor();
    return factor_tail(subtotal);
  }
  // term_tail - parse <term_tail> -> <add_op> <term> <term_tail> | empty
  private static int term_tail(int subtotal) throws IOException
  {
    if (token == (int)'+')
    {
      getToken();
      int termvalue = term();
      return term_tail(subtotal + termvalue);
    }
    else if (token == (int)'-')
    {
      getToken();
      int termvalue = term();
      return term_tail(subtotal - termvalue);
    }
    else
      return subtotal;
  }
  // factor - parse <factor> -> '(' <expr> ')' | '-' <factor> | identifier | number
  private static int factor() throws IOException
  {
    if (token == (int)'(')
    {
      getToken();
      int value = expr();
      if (token == (int)')')
        getToken();
      else
        System.out.println("closing ')' expected");
      return value;
    }
    else if (token == (int)'-')
    {
      getToken();
      return -factor();
    }
    else if (token == tokens.TT_WORD)
    {
      getToken();
      // ignore variable names
      return 0;
    }
    else if (token == tokens.TT_NUMBER)
    {
      getToken();
      return (int)tokens.nval;
    }
    else
    {
      System.out.println("factor expected");
      return 0;
    }
  }
  // factor_tail - parse <factor_tail> -> <mult_op> <factor> <factor_tail> | empty
  private static int factor_tail(int subtotal) throws IOException
  {
    if (token == (int)'*')
    {
      getToken();
      int factorvalue = factor();
      return factor_tail(subtotal * factorvalue);
    }
    else if (token == (int)'/')
    {
      getToken();
      int factorvalue = factor();
      return factor_tail(subtotal / factorvalue);
    }
    else
      return subtotal;
  }
}

Copy this example Calc.java program from [LIB]/proj2/, and compile and run it:

javac Calc.java
java Calc

Explain why the input 1/2; to this program produces the value 0. What are the relevant parts of the program involved in computing this result?

Extend the attribute grammar with two new productions and two new attributes for all nonterminals:

The inherited attribute "in" is a symbol table of identifier-value bindings that defines the bindings of identifiers in the scope (context) in which (part of) the expression is evaluated.
The synthesized attribute "out" is a symbol table of identifier-value bindings that holds the "in" bindings plus any new bindings introduced by (part of) the expression, as explained below.

The two new productions with corresponding semantic rules are as follows:

<expr1>   -> 'let' identifier '=' <expr2> expr2.in           := expr1.in;
                                          expr1.value        := expr2.value;
                                          expr1.out          := expr2.out.put(identifier=expr2.value)
           | <term> <term_tail>           term.in            := expr1.in;
                                          term_tail.in       := term.out;
                                          term_tail.subtotal := term.value;
                                          expr1.value        := term_tail.value;
                                          expr1.out          := term_tail.out
<factor1> -> '(' <expr> ')'               expr.in            := factor1.in;
                                          factor1.value      := expr.value;
                                          factor1.out        := expr.out
           | '-' <factor2>                factor2.in         := factor1.in;
                                          factor1.value      := -factor2.value;
                                          factor1.out        := factor2.out
           | identifier                   factor1.value      := factor1.in.get(identifier);
                                          factor1.out        := factor1.in
           | number                       factor1.value      := number;
                                          factor1.out        := factor1.in

The first production introduces an assignment construct as an expression, similar to the C/C++ assignment which can also be used within an expression, as in this example:

(let x = 3) + x;
Value = 6

The semantic rule expr2.in := expr1.in copies the symbol table of the context in which expr1 is evaluated to the context of expr2. The evaluation of expr2 may change the symbol table; the resulting table, extended with the binding of the identifier to expr2.value, is passed back to expr1 by the semantic rule expr1.out := expr2.out.put(identifier=expr2.value). For this part of the assignment, you have to change the semantic rules of all other productions in the grammar to include assignments for the in and out attributes to pass the symbol table. Write down the grammar with these new semantic rules.

Implement the two new productions and semantic rules in an updated Calc2.java program.

To implement a symbol table with identifier-value bindings, you can use the Java java.util.Hashtable class as follows:

import java.util.*;
...
public class Calc
{
  ...
  public static void main(String argv[]) throws IOException
  {
    ...
    Hashtable<String,Integer> exprin = new Hashtable<String,Integer>();
    Hashtable<String,Integer> exprout = new Hashtable<String,Integer>(); // must be initialized before use
    ...
    int value = expr(exprin, exprout);
    ...
    private static int expr
      (Hashtable<String,Integer> exprin, Hashtable<String,Integer> exprout) throws IOException
    {
      if (token == tokens.TT_WORD && tokens.sval.equals("let"))
      {
        getToken(); // advance to identifier
        String id = tokens.sval;
        getToken(); // advance to '='
        getToken(); // advance to <expr>
        int value = expr(exprin, exprout);
        exprout.put(id, new Integer(value));
        ... // return statement here
      }
      else
      {
        Hashtable<String,Integer> x = exprin; // Java likes references to be initialized
        int subtotal = term(exprin, x);
        return term_tail(subtotal, x, exprout);
      }
    }
    private static int factor
      (Hashtable<String,Integer> factorin, Hashtable<String,Integer> factorout) throws IOException
    {
      ...
      else if (token == tokens.TT_WORD)
      {
        String id = tokens.sval;
        getToken();
        factorout = factorin;
        return ((Integer)factorin.get(id)).intValue();
      }
      ...

The put method puts a key and value in the hashtable, where the value must be a class instance so an Integer instance is created. The get method returns the value of a key. The intValue method of Integer class returns an int. Test your new Calc2.java application. For example:

let x = 1;
Value = 1

(let x = 1) + x;
Value = 2

(let a = 2) + 3 * a;
Value = 8

1 + (let a = (let b = 1) + b) + a;
Value = 5
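
Independently of the calculator, the Hashtable operations used in the outline above can be tried in isolation. The sketch below uses a hypothetical class TableDemo (not among the project files). It also illustrates a general Java point that matters when a table is handed from method to method: calling put on a parameter changes the table the caller sees, whereas assigning a new table to the parameter does not. Treating an unbound identifier as 0 in lookup is an assumption made here for illustration; the assignment does not specify that case.

```java
import java.util.Hashtable;

// Hypothetical illustration; TableDemo is not part of the project files.
class TableDemo {
  // Mutates the table that the caller passed in, so the caller sees the binding.
  static void bind(Hashtable<String,Integer> table, String id, int value) {
    table.put(id, Integer.valueOf(value));
  }

  // Reassigns only the local parameter; the caller's table is left unchanged.
  static void rebind(Hashtable<String,Integer> table) {
    table = new Hashtable<String,Integer>();
    table.put("y", Integer.valueOf(2));
  }

  // Looks up an identifier. Returning 0 for an unbound name is an assumption
  // made for this sketch only.
  static int lookup(Hashtable<String,Integer> table, String id) {
    Integer v = table.get(id);              // get returns null when the key is absent
    return (v == null) ? 0 : v.intValue();
  }

  public static void main(String[] args) {
    Hashtable<String,Integer> table = new Hashtable<String,Integer>();
    bind(table, "x", 1);
    rebind(table);
    System.out.println(lookup(table, "x"));     // binding made by bind
    System.out.println(table.containsKey("y")); // rebind had no effect on this table
  }
}
```

Note that calling intValue directly on the result of get would throw a NullPointerException for an identifier that has no binding, which is why lookup checks for null first.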

Save this assignment as a working Java program named "Calc2.java". It will be collected using the standard submit script configured with the project 2 deliverables.sh. Your answers for the non-programming questions should be in "Appendix 2" at the end of your log.txt. Use plain text to draw trees and write grammars as required.

Suggestions on drawing trees. There are (at least) two basic ways to illustrate trees using ascii text. The first is "pyramidal":


                                  <expr>(-3)
                        --------------------------------
                       /                                \
                  <term>(-5)                             <term_tail1>[-5](-3)
                 --------                               -------------
                /        \                             /     |       \

           .................................................................

      Note: [] represents inherited attribute values
            () represents synthesized attribute values

A second way to represent the same tree is a squared off version:

<expr>(-3)
----------------------------------------------------
    |                                          |
<term>(-5)                             <term_tail1>[-5](-3)
---------------                        ----------------------------
    |        |                                 |        |        |

 .....................................................................

      Note: [] represents inherited attribute values
            () represents synthesized attribute values

The latter may be easier to use, especially for decorated parse trees where a lot of information is displayed for each node. Note that in either case we are using [] to enclose inherited attribute values and () to enclose synthesized attribute values.

Hint: The compiled "Calc2v.class" is distributed in area51. You can use this to see how an expression is processed through a correctly functioning calculator. (This is a "verbose" version. Your Calc2.java should not be verbose.)

Part 3: AST and Calc2Lisp

Educational Objectives: After completing this assignment, the student should be able to do the following:

Implement a compiler front end for a small LL(1) grammar that outputs an abstract syntax tree (AST)
Implement a compiler for a calculator language that produces an AST in which arithmetic on known (numeric) operands has already been carried out
Traverse an AST to produce a Lisp-like representation of a legal input expression

Operational Objectives: Implement a calculator-to-lisp application in Java using the abstract syntax tree class defined in AST.java

Deliverables: Two files AST2.java and Calc2Lisp.java

Copy (and download if needed) the CalcAST.java and AST.java source files from

[LIB]/proj2/

The CalcAST program constructs an abstract syntax tree (AST) representation of arithmetic expressions. For example, when the expression that you input is 1+2; the program constructs the following AST:

  +
 / \
1   2

This tree structure is constructed with the AST class, which has a tree node structure that contains an optional operator (e.g. +), an optional value (e.g. 1), and optional left and right subnodes for the operands to unary and binary operators. The AST class has a toLisp method. When invoked it will output the expression in Lisp form, such as (+ 1 2) for example.

Compile the sources on linprog with:

javac CalcAST.java

And run the resulting program:

java CalcAST

The program will wait for input from the command line, so type 1+2;<enter> for example. The program output will be the Lisp equivalent of this expression (+ 1 2). (Note that the toLisp method does a preorder traversal of the AST, implemented recursively. See COP 4530 Lecture Notes.)
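
The preorder traversal mentioned above can be sketched with a small stand-in node class. LispNode is hypothetical and is written only for this illustration; the AST class distributed in AST.java may be organized differently, so consult that file for the actual members and constructors.

```java
// Hypothetical stand-in for illustration only; the course's AST.java may differ.
class LispNode {
  final String op;            // operator, or null for a leaf
  final Object val;           // leaf value (Integer or identifier String), or null
  final LispNode left, right; // operands, or null when absent

  LispNode(Object val) { this(null, val, null, null); }
  LispNode(String op, LispNode left, LispNode right) { this(op, null, left, right); }
  private LispNode(String op, Object val, LispNode left, LispNode right) {
    this.op = op; this.val = val; this.left = left; this.right = right;
  }

  // Preorder traversal: operator first, then left operand, then right operand.
  String toLisp() {
    if (op == null) return String.valueOf(val);   // leaf: print the value itself
    StringBuilder sb = new StringBuilder("(").append(op);
    if (left != null) sb.append(' ').append(left.toLisp());
    if (right != null) sb.append(' ').append(right.toLisp());
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    LispNode sum = new LispNode("+", new LispNode(1), new LispNode(2));
    System.out.println(sum.toLisp()); // prints (+ 1 2)
  }
}
```

Visiting the operator before its operands is what produces the prefix form that Lisp uses.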

Modify the CalcAST.java program to pre-evaluate parts of expressions when possible. That is, all arithmetic operations are performed when the operands are numeric. When one of the operands is non-numeric (symbolic), an AST node is created. The output of the program will be partially evaluated expressions translated into Lisp.

In addition, add productions and code to implement the power operator ^ (see Part 1 above). For the implementation, you need to use the static Math.pow method of class Math to compute powers. This operator must be evaluated when possible, along with the other arithmetic operators.

You may find it convenient to strengthen the AST class. Whether you do or not, copy the file AST.java to AST2.java and rename the class to AST2. Your Calc2Lisp should be a client of AST2. Both files should be turned in (using the submit script as usual).

Examples:

java Calc2Lisp
2*(1+3)-2^3+xyz;
 xyz
java Calc2Lisp
2*(1+3)-2^3+x*y*z;
 (* (* x y) z)

The outputs are simplified Lisp expressions - xyz is an identifier while (* (* x y) z) is the product of x, y, z.

Note that the AST node structure includes a val member that can be used to store a node's value and to pass values as part of the AST instances that are returned from methods (as synthesized attribute values) and passed to methods (as inherited attribute values). The type of val is Object, so to create an AST node with an integer value, say 7, you need: new AST(new Integer(7)).

Here are some sample calculations you can use to test your calculator:

1+2+3;
 6

1*2*3;
 6

1*-2*(3-6);
 6

1+2+x+3;
 (+ (+ 3 x) 3)

x+1+2;
 (+ (+ x 1) 2)

x+0;
 x

1*x;
 x

x^1;
 x

--2;
 2

--x;
(-(- x))

2+3+x+4+5; 
 (+ (+ (+ 5 x) 4) 5)

2*3*x*4*5;
 (* (* (* 6 x) 4) 5)

2^3^x^4^5;
 (^ 2 (^ 3 (^ x 1024)))

Note that the semantic rules of the grammar enforce associativity, so 1+2+x+3 is evaluated from the left. The evaluation process does not consider commutativity, so the expression does not simplify to x+6.

The files Parser2.java, Calc2.java, AST2.java, Calc2Lisp.java, and log.txt will be collected by the submit script configured with deliverables.sh.