Copyright (C) R.A. van Engelen, FSU Department of Computer Science, 2000-2003

 
3. Compilers and Interpreters

Overview

  • Common compiler and interpreter configurations
  • Virtual machines
  • Integrated programming environments
  • Compiler phases
    • Lexical analysis
    • Syntax analysis
    • Semantic analysis
    • Code generation
Note: Study Chapter 1 Sections 1.4 to 1.6 of the textbook.

 

Compiling and Interpreting Programming Languages

  • The distinction between a compiler and an interpreter implementation is often fuzzy
    • One can view an interpreter as a virtual machine
    • A processor (CPU) can be viewed as a hardware implementation of a virtual machine
  • Some languages cannot be purely compiled into machine code
    • Some languages allow programs to rewrite or add code at run time
  • In general, compilers try to be as smart as possible: they fix at compile time every decision that can be made then, to avoid generating code that makes that decision at run time
  • Compilation generally leads to better performance
    • Variables are allocated at compile time, so no variable lookup is needed at run time
    • Aggressive code optimization exploits hardware features
  • Interpretation generally leads to better diagnostics of programming problems
    • Procedures can be invoked from the command line by a user
    • Variable values can be inspected and modified by a user

 

Compilation and Interpretation

  • Compilation (conceptual):
      Source Program → Compiler → Target Program
      Input → Target Program → Output
  • Interpretation (conceptual):
      Source Program, Input → Interpreter → Output
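The two configurations can be contrasted in a few lines of Python. This is a minimal sketch over a hypothetical expression language of nested tuples (only `+` and `*`), not a model of any real implementation. `interpret` walks the source on every run and looks variable names up in a dictionary. `compile_expr` translates the source once into a closure that stands in for the target program, and it resolves each name to a slot index at compile time, so the target program performs no name lookup at run time.

```python
# Hypothetical mini-language: expressions are nested tuples, e.g.
# ("+", ("var", "x"), ("num", 1)); only "+" and "*" are supported.

def interpret(expr, env):
    """Interpreter: walks the source tree on every run and looks
    variables up by name in a dictionary each time."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]                  # name lookup at run time
    left, right = interpret(expr[1], env), interpret(expr[2], env)
    return left + right if kind == "+" else left * right

def compile_expr(expr, slots):
    """Compiler: translates the tree once into a Python closure (the
    "target program"). Names are resolved to slot indices now, so the
    target program never looks a name up."""
    kind = expr[0]
    if kind == "num":
        value = expr[1]
        return lambda frame: value
    if kind == "var":
        index = slots[expr[1]]               # decided at compile time
        return lambda frame: frame[index]
    left, right = compile_expr(expr[1], slots), compile_expr(expr[2], slots)
    if kind == "+":
        return lambda frame: left(frame) + right(frame)
    return lambda frame: left(frame) * right(frame)

source = ("+", ("*", ("var", "x"), ("var", "x")), ("num", 1))   # x*x + 1

# Interpretation: source program + input -> output
print(interpret(source, {"x": 3}))            # 10

# Compilation: source -> target program, then target + input -> output
target = compile_expr(source, {"x": 0})
print(target([3]), target([4]))               # 10 17
```

The compiled `target` can be run on many inputs without repeating the translation work.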

 

Pure Compilation and Linking

  • Adopted by the typical Fortran implementation
  • Library routines are separately linked (merged) with the object code of the program
      Source Program → Compiler → Incomplete Object Code
      Incomplete Object Code, Library Routines → Linker → Object Code
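The core job of a linker can be sketched as follows. The sketch assumes a made-up object-code format: `code` is a list of `(opcode, operand)` pairs in which a string operand marks an unresolved reference, and `symbols` gives the offset of each routine a module defines. Real object formats carry relocation entries and much richer symbol tables. This sketch only shows the idea of merging modules and then resolving references.

```python
# Hypothetical object-code format: "code" is a list of (opcode, operand)
# pairs, where a str operand is a symbolic reference still to be resolved;
# "symbols" maps each routine the module defines to an offset in its code.
main_obj = {   # incomplete: calls writeln but does not define it
    "code": [("push", 42), ("call", "writeln"), ("halt", 0)],
    "symbols": {"main": 0},
}
library = {    # separately compiled library routine
    "code": [("print", 0), ("ret", 0)],
    "symbols": {"writeln": 0},
}

def link(modules):
    """Merge object modules and replace symbolic references by addresses."""
    program, table = [], {}
    # Pass 1: place each module at a base address; build a global symbol table
    for module in modules:
        base = len(program)
        for name, offset in module["symbols"].items():
            if name in table:
                raise ValueError(f"duplicate symbol {name!r}")
            table[name] = base + offset
        program.extend(module["code"])
    # Pass 2: resolve every symbolic operand
    linked = []
    for opcode, operand in program:
        if isinstance(operand, str):
            if operand not in table:
                raise ValueError(f"undefined symbol {operand!r}")
            operand = table[operand]
        linked.append((opcode, operand))
    return linked

print(link([main_obj, library]))
# [('push', 42), ('call', 3), ('halt', 0), ('print', 0), ('ret', 0)]
```

Linking `main_obj` alone fails with an undefined-symbol error, which is why its object code is called incomplete.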

 

Compilation, Assembly, and Linking

  • Adopted by most compilers
  • Facilitates debugging of the compiler
      Source Program → Compiler → Assembly
      Assembly → Assembler → Incomplete Object Code
      Incomplete Object Code, Library Routines → Linker → Object Code
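Assembly is a readable text format, which is why it makes compiler output easier to inspect. Turning it into object code is mechanical, as this sketch of a two-pass assembler shows. The instruction set (`OPCODES`, the mnemonics, and the layout of one address per instruction) is invented for illustration. Pass 1 records label addresses so that pass 2 can also resolve labels that are used before they are defined.

```python
# Hypothetical instruction set: each instruction is one (opcode, operand) pair
OPCODES = {"load": 1, "sub": 2, "jnz": 3, "halt": 4}

SOURCE = """
        load 5
loop:   sub 1
        jnz loop
        halt 0
"""

def parse_line(line):
    """Split 'label: mnemonic operand' into (label or None, mnemonic, operand)."""
    parts = line.split()
    label = None
    if parts[0].endswith(":"):
        label, parts = parts[0][:-1], parts[1:]
    return label, parts[0], parts[1]

def assemble(text):
    lines = [parse_line(line) for line in text.strip().splitlines()]
    # Pass 1: each instruction occupies one address; record label addresses
    labels = {label: addr for addr, (label, _, _) in enumerate(lines) if label}
    # Pass 2: translate mnemonics to opcodes and label operands to addresses
    code = []
    for _, mnemonic, operand in lines:
        value = labels[operand] if operand in labels else int(operand)
        code.append((OPCODES[mnemonic], value))
    return code

print(assemble(SOURCE))    # [(1, 5), (2, 1), (3, 1), (4, 0)]
```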

 

Mixed Compilation and Interpretation

  • Adopted by Pascal, Java, functional and logic languages, and most scripting languages
  • Pascal compilers generate P-code that can be interpreted or compiled into object code
  • Java compilers generate byte code that is interpreted by the Java virtual machine (or translated into machine code by a just-in-time (JIT) compiler)
  • Functional and logic languages are compiled, but also allow dynamically created code to be compiled at run time, in which case the virtual machine invokes the compiler
      Source Program → Translator → Intermediate Program
      Intermediate Program, Input → Virtual Machine → Output
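The split between translator and virtual machine can be sketched with a tiny stack machine. The instruction names `PUSH`, `LOAD`, and `BINOP` and the tuple encoding are assumptions for illustration. They are not P-code or Java byte code. The translator produces the intermediate program once, and the virtual machine can then run it on different inputs.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def translate(expr, code):
    """Translator: append stack-machine instructions for an expression
    tree such as ("-", ("var", "i"), ("num", 2)) and return the list."""
    kind = expr[0]
    if kind == "num":
        code.append(("PUSH", expr[1]))
    elif kind == "var":
        code.append(("LOAD", expr[1]))
    else:                                    # binary operator node
        translate(expr[1], code)
        translate(expr[2], code)
        code.append(("BINOP", kind))
    return code

def run(code, inputs):
    """Virtual machine: execute the intermediate program on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(inputs[arg])
        elif op == "BINOP":
            right = stack.pop()              # operands come off in reverse order
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()

program = translate(("-", ("var", "i"), ("num", 2)), [])
print(program)                               # [('LOAD', 'i'), ('PUSH', 2), ('BINOP', '-')]
print(run(program, {"i": 10}))               # 8
```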

 

Preprocessing

  • Compilers for C and C++ adopt a preprocessor
      Source Program → Preprocessor → Modified Source Program
      Modified Source Program → Compiler → Assembly
  • Early C++ compilers generated intermediate C code
      Source Program → Preprocessor → Modified Source Program
      Modified Source Program → C++ Compiler → C Code
      C Code → C Compiler → Assembly
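The following is a minimal sketch of how a preprocessor modifies the source before the compiler sees it. It handles only object-like `#define NAME VALUE` macros. The real C preprocessor also handles parameterized macros, `#include`, conditional compilation, rescanning of replacement text, and string literals. This sketch models none of those.

```python
import re

def preprocess(source):
    """Expand object-like macros (#define NAME VALUE) in C-like source.
    Simplified: no parameters, no #include/#if, no rescanning of values."""
    macros, output = {}, []
    for line in source.splitlines():
        if line.startswith("#define"):
            _, name, value = line.split(maxsplit=2)
            macros[name] = value
            continue                         # directives do not reach the compiler
        for name, value in macros.items():
            # \b...\b replaces whole identifiers only, so SIZE2 is left alone
            line = re.sub(rf"\b{re.escape(name)}\b", lambda m: value, line)
        output.append(line)
    return "\n".join(output)

src = """#define SIZE 10
int a[SIZE];
int SIZE2 = SIZE * 2;"""
print(preprocess(src))
```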

 

Integrated Programming Environments (IDEs)

  • Programming tools (editors, compilers/interpreters, debuggers, preprocessors, assemblers, linkers) work together in concert
  • Editors can help with formatting and cross-referencing
  • Trace facilities monitor the execution of the program
  • Upon a run-time error in compiled code, the editor is invoked with the cursor at the offending source line
  • IDEs are fundamental to Smalltalk-80
  • Examples: Java Studio, Visual Studio, Borland

 

Overview of Compilation

  • Compilation of a program proceeds through a series of phases, where each subsequent phase uses information found in an earlier phase or uses a form of the program produced by an earlier phase
  • Each phase may consist of a number of passes over the program representation
      Front end (analysis):
        Character Stream
          ↓ Scanner
        Token Stream
          ↓ Parser
        Parse Tree
          ↓ Semantic Analysis and Intermediate Code Generation
        Abstract Syntax Tree or Other Intermediate Form
      Back end (synthesis):
          ↓ Machine-Independent Code Improvement
        Modified Intermediate Form
          ↓ Target Code Generation
        Assembly or Object Code
          ↓ Machine-Specific Code Improvement
        Modified Assembly or Object Code

 

Lexical Analysis

  • Lexical analysis breaks up a program (e.g. in Pascal)
      program gcd (input, output);
      var i, j : integer;
      begin
        read (i, j);
        while i <> j do
          if i > j then i := i - j else j := j - i;
        writeln (i)
      end.
    into a stream of tokens
      program  gcd  (   input  ,   output   )     ;
      var      i    ,   j      :   integer  ;     begin
      read     (    i   ,      j   )        ;     while
      i        <>   j   do     if  i        >     j
      then     i    :=  i      -   j        else  j
      :=       j    -   i      ;   writeln  (     i
      )        end  .
  • This process is also known as scanning, performed by a scanner
  • A lexical error is produced when an unrecognized character is encountered
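A scanner for the tokens of the gcd example can be sketched with Python's `re` module. The token classes in `TOKEN_SPEC` are an assumption that covers only this Pascal subset. Keywords are returned as ordinary identifiers, and any character that matches no class produces a lexical error.

```python
import re

# Token classes for the Pascal subset used by the gcd example (assumption:
# keywords are scanned as identifiers; a parser can tell them apart later)
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),                 # must come before the single ":"
    ("RELOP", r"<>|<=|>=|<|>|="),      # longest alternatives first
    ("PUNCT", r"[-+*(),;:.]"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def scan(text):
    """Break a character stream into a list of token strings."""
    tokens, pos = [], 0
    while pos < len(text):
        match = PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"lexical error: unrecognized character {text[pos]!r}")
        if match.lastgroup != "WS":        # white space only separates tokens
            tokens.append(match.group())
        pos = match.end()
    return tokens

GCD = """program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j else j := j - i;
  writeln (i)
end."""

print(scan(GCD))
```

The result is the 51-token stream shown in the table above. Scanning `i := i # j` raises a lexical error at `#`.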

 

Context-Free Grammars

  • A context-free grammar defines the syntax of a programming language
  • The grammar defines syntactic categories
    • Statements
    • Expressions
    • Declarations
  • Categories are subdivided into more detailed categories
    • Loop-statement
    • If-statement
    • Logical-expression
    • ...
  • Some programming language manuals include language grammars
<statement> -> <loop-statement>
<statement> -> <if-statement>
<loop-statement> -> for (<expression>; <expression>; <expression>)
                       <statement>
<expression> -> <logical-expression>
...
     

 

Syntax Analysis

  • Parsing organizes tokens into a hierarchy called a parse tree
  • The grammar of the language, applied to the token stream, determines the structure of the parse tree
  • A syntax error is produced by a compiler when the parse tree cannot be constructed for a program (fragment)
  • Example (incomplete) Pascal grammar:
  • <Program> -> program <id> ( <id> <More_ids> ) ; <Block> .
    
    <Block> -> <Variables> begin <Stmt> <More_Stmts> end
    
    <More_ids> -> , <id> <More_ids>
                | ε
    
    <Variables> -> var <id> <More_ids> : <Type> ; <More_Variables>
                 | ε
    
    <More_Variables> -> <id> <More_ids> : <Type> ; <More_Variables>
                      | ε
    
    <Stmt> -> <id> := <Exp>
            | read ( <id> <More_ids> )
            | writeln ( <Exp> <More_Exps> )
            | if <Exp> then <Stmt> else <Stmt>
            | while <Exp> do <Stmt>
            | begin <Stmt> <More_Stmts> end
Note: See textbook pp. 20-21 for the parse tree of the gcd Pascal example program.
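A recursive-descent parser, with one method per nonterminal, is a common way to implement such a grammar. This sketch covers the `<Stmt>` productions above. Because the grammar is incomplete, the sketch makes two assumptions. The first is `<More_Stmts> -> ; <Stmt> <More_Stmts> | ε`. The second is a simple expression form, `<Exp> -> <Operand> [ <op> <Operand> ]`, which is enough for the gcd program body. The parser returns nested tuples that record the structure it found. This is a condensed form of the parse tree, not one node per grammar symbol.

```python
import re

KEYWORDS = {"begin", "end", "read", "writeln", "if", "then", "else", "while", "do"}

def tokenize(text):
    # Compact scanner; assumes only the tokens that the gcd program uses
    return re.findall(r":=|<>|[A-Za-z]\w*|\d+|[-+(),;<>=]", text)

class Parser:
    """Recursive-descent parser: one method per nonterminal of the grammar."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.take()

    def ident(self):
        token = self.peek()
        if token is None or not token[0].isalpha() or token in KEYWORDS:
            raise SyntaxError(f"expected identifier, got {token!r}")
        return self.take()

    def list_of(self, item):
        # <X> <More_Xs> where <More_Xs> -> , <X> <More_Xs> | ε
        items = [item()]
        while self.peek() == ",":
            self.take()
            items.append(item())
        return items

    def stmts(self):
        # Assumed: <More_Stmts> -> ; <Stmt> <More_Stmts> | ε
        result = [self.stmt()]
        while self.peek() == ";":
            self.take()
            result.append(self.stmt())
        return result

    def stmt(self):
        token = self.peek()
        if token in ("read", "writeln"):      # read ( <id> ... ) | writeln ( <Exp> ... )
            self.take()
            self.expect("(")
            args = self.list_of(self.ident if token == "read" else self.exp)
            self.expect(")")
            return (token, args)
        if token in ("if", "while"):          # if ... then ... else ... | while ... do ...
            self.take()
            cond = self.exp()
            self.expect("then" if token == "if" else "do")
            body = self.stmt()
            if token == "while":
                return ("while", cond, body)
            self.expect("else")
            return ("if", cond, body, self.stmt())
        if token == "begin":                  # begin <Stmt> <More_Stmts> end
            self.take()
            body = self.stmts()
            self.expect("end")
            return ("begin", body)
        target = self.ident()                 # <id> := <Exp>
        self.expect(":=")
        return (":=", target, self.exp())

    def exp(self):
        # Assumed: <Exp> -> <Operand> [ <op> <Operand> ]
        left = self.operand()
        if self.peek() in ("<>", "<", ">", "=", "+", "-"):
            return (self.take(), left, self.operand())
        return left

    def operand(self):
        # Assumed: <Operand> -> <id> | <num>
        token = self.peek()
        if token is not None and token.isdigit():
            return int(self.take())
        return self.ident()

def parse_stmt(text):
    parser = Parser(tokenize(text))
    tree = parser.stmt()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} after statement")
    return tree

BODY = ("begin read (i, j); while i <> j do "
        "if i > j then i := i - j else j := j - i; writeln (i) end")
print(parse_stmt(BODY))
```

If the body omits `do` after `while i <> j`, the parser cannot construct a tree and raises a syntax error.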

 

Semantic Analysis

  • Semantic analysis is applied by a compiler to discover the meaning of a program by analyzing its parse tree or abstract syntax tree (see later)
  • Static semantic checks are performed at compile time
    • Type checking
    • Every variable is declared before it is used
    • Identifiers are used in appropriate contexts
    • Check subroutine call arguments
    • Check labels
  • Dynamic semantic checks are performed at run time, and the compiler produces code that performs these checks
    • Array subscript values are within bounds
    • Arithmetic errors, e.g. division by zero
    • Pointers are not dereferenced unless they point to a valid object
    • A variable is not used before it has been initialized
    • When a check fails at run time, an exception is raised
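Here is a sketch of one static check and one dynamic check. The statement encoding `(":=", target, expr)` is hypothetical. `CheckedArray` only imitates the bounds test that a compiler might emit before each subscript of a Pascal-style `array[low..high]`. A failed test raises an exception.

```python
def uses(expr):
    """Yield the variable names read by an expression; expressions are
    names (str), integer constants, or (op, left, right) tuples."""
    if isinstance(expr, str):
        yield expr
    elif isinstance(expr, tuple):
        yield from uses(expr[1])
        yield from uses(expr[2])

def check_declared(stmts, declared):
    """Static check at compile time: every variable must be declared.
    Statements are hypothetical (":=", target, expr) tuples."""
    errors = []
    for _, target, expr in stmts:
        for name in [target, *uses(expr)]:
            if name not in declared:
                errors.append(f"undeclared variable {name!r}")
    return errors

class CheckedArray:
    """Dynamic check at run time: a bounds test of the kind a compiler may
    emit before every subscript, modelled on a Pascal array[low..high]."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.data = [0] * (high - low + 1)

    def _offset(self, index):
        if not self.low <= index <= self.high:
            raise IndexError(f"subscript {index} not in {self.low}..{self.high}")
        return index - self.low

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __setitem__(self, index, value):
        self.data[self._offset(index)] = value

program = [(":=", "i", ("-", "i", "j")), (":=", "k", ("+", "i", 1))]
print(check_declared(program, {"i", "j"}))   # ["undeclared variable 'k'"]

a = CheckedArray(1, 10)
a[10] = 5                                    # in bounds
try:
    a[11] = 0                                # out of bounds: exception raised
except IndexError as exc:
    print("run-time error:", exc)
```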

 

Strong Typing

  • A language is strongly typed "if (type) errors are always detected"
  • Such errors are listed on the previous slide
  • Errors are either detected at compile time or at run time
  • Strong typing makes a language safer and easier to use, but can make programs slower because of dynamic semantic checks
  • Languages that are strongly typed include
    • Ada
    • Java
    • ML, Haskell
  • Languages that are not strongly typed include
    • Fortran, Pascal, C
    • Lisp, C++
  • In some languages, most (type) errors are detected late, at run time, which is detrimental to reliability (e.g. early Basic, Lisp, Prolog, some scripting languages)

 

Intermediate Code Generation

  • A typical intermediate form of code produced by the semantic analyzer is an abstract syntax tree (AST)
  • The AST is annotated with useful information such as pointers to the symbol table entry of identifiers
  • Example AST for the gcd Pascal program:
[Figure: AST of the gcd Pascal program]
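The annotation idea can be sketched with Python dataclasses. `Symbol`, `Node`, and the node labels (`"id"`, `":="`, and so on) are assumptions for illustration and need not match the figure. The point is that every identifier node refers to the single symbol table entry created for its declaration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Symbol:
    """Symbol table entry for a declared identifier."""
    name: str
    type: str
    kind: str

@dataclass
class Node:
    """AST node; identifier nodes are annotated with their symbol entry."""
    op: str
    children: list = field(default_factory=list)
    symbol: Optional[Symbol] = None

# Symbol table built from the declaration  var i, j : integer;
symtab = {name: Symbol(name, "integer", "variable") for name in ("i", "j")}

def ident(name):
    # Semantic analysis resolves each use of a name to its table entry
    return Node("id", symbol=symtab[name])

def minus(a, b):
    return Node("-", [ident(a), ident(b)])

gcd_ast = Node("program", [
    Node("read", [ident("i"), ident("j")]),
    Node("while", [
        Node("<>", [ident("i"), ident("j")]),
        Node("if", [
            Node(">", [ident("i"), ident("j")]),
            Node(":=", [ident("i"), minus("i", "j")]),
            Node(":=", [ident("j"), minus("j", "i")]),
        ]),
    ]),
    Node("writeln", [ident("i")]),
])

# Every occurrence of i points to the same symbol table entry
print(gcd_ast.children[2].children[0].symbol is symtab["i"])   # True
```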

 

Target Code Generation and Optimization

  • The compiler traverses the AST and its annotations to generate a low-level intermediate form of code, close to assembly
  • This machine-independent intermediate form is optimized
  • The compiler then generates assembly or object code from the machine-independent form
  • This machine-specific code is optimized to exploit specific hardware features
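The first two steps can be sketched in Python. The sketch translates an expression AST into three-address code, then applies constant folding, which is one example of a machine-independent improvement. The instruction format `(dest, op, a, b)` is an assumption of this sketch, as are the use of `"="` for a copy and the temporaries `t1`, `t2`, and so on.

```python
import itertools
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def gen_expr(expr, code, temps):
    """Emit three-address instructions (dest, op, a, b) for an expression
    tree; return the operand (name or constant) holding its value."""
    if not isinstance(expr, tuple):        # variable name or integer constant
        return str(expr)
    op, left, right = expr
    a = gen_expr(left, code, temps)
    b = gen_expr(right, code, temps)
    dest = f"t{next(temps)}"               # fresh temporary
    code.append((dest, op, a, b))
    return dest

def gen_assign(target, expr):
    code = []
    result = gen_expr(expr, code, itertools.count(1))
    code.append((target, "=", result, None))   # copy into the variable
    return code

def is_const(x):
    return x is not None and x.lstrip("-").isdigit()

def fold_constants(code):
    """Machine-independent improvement: compute at compile time every
    instruction whose operands are constants, substituting the results."""
    known, out = {}, []
    for dest, op, a, b in code:
        a, b = known.get(a, a), known.get(b, b)
        if op in OPS and is_const(a) and is_const(b):
            known[dest] = str(OPS[op](int(a), int(b)))   # instruction removed
        else:
            out.append((dest, op, a, b))
    return out

code = gen_assign("x", ("+", ("*", 2, 3), "y"))     # x := 2 * 3 + y
print(code)
# [('t1', '*', '2', '3'), ('t2', '+', 't1', 'y'), ('x', '=', 't2', None)]
print(fold_constants(code))
# [('t2', '+', '6', 'y'), ('x', '=', 't2', None)]
```

A later machine-specific phase would map each remaining instruction onto the target's registers and instructions.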
Exercise 1: Name two languages in which a program can write new pieces of itself. Hint: which languages are said to be suitable for symbolic and logic processing?
Exercise 2: Which IDEs do you regularly use? If you do not use an IDE, describe the tools you use for programming projects.
Exercise 3: Describe the six tools that are commonly used with a compiler within an IDE.
Exercise 4: Using your favorite compiled programming language, give an example of
  • a lexical error detected by the scanner
  • a syntax error detected by the parser
  • a static semantic error detected by semantic analysis
  • a dynamic semantic error detected at run time by the code generated by the compiler