Copyright R.A. van Engelen,
FSU Department of Computer Science, 2000-2003
Background Information and Glossary
Programming languages:
Programming languages are central to Computer Science. They reflect
many aspects of Computer Science in a nutshell, such as language syntax,
programming language semantics (meaning), application of theorem proving
(for type checking and inference), abstract and virtual machines,
data structures, software engineering, computer architecture and hardware
issues, etc.
Programming languages follow simple syntactic conventions (as opposed
to natural languages), see also BNF syntax. Programming languages are
compiled into machine code (also known as object code) by a compiler and
linker, or interpreted by an interpreter, or executed by a hybrid
compiler/interpreter.
Programming languages can be classified as imperative or declarative.
This classification is further subdivided as follows:
declarative (implicit solution: "what the computer should do"):
    functional (e.g. Lisp, Scheme, ML, Haskell)
    logic (e.g. Prolog)
    dataflow
imperative (explicit solution: "how the computer should do it"):
    procedural ("von Neumann", e.g. Fortran, Pascal, Basic, C)
    object-oriented (e.g. Smalltalk, Eiffel, C++, C#, Java)
Imperative programming languages:
Programs written in imperative programming languages describe exactly
the computational steps necessary for the computer to obtain a result.
In contrast, declarative languages
allow a programming problem to be stated without certain explicit details
by which the calculation should proceed. Imperative languages are procedural
languages (e.g. Fortran, Pascal, Basic, C)
and (most) object-oriented languages (Smalltalk, Eiffel, C++, C#, Java).
Declarative programming languages:
Programs written in a declarative programming language lack explicit details
by which the calculation should proceed. Rather, a program is written in
a style that assumes a more implicit execution ordering. Typically, recursion
is used in declarative programming, possibly in combination with higher-order
functions.
Declarative languages are the functional
languages (e.g. Lisp, Scheme, ML, Haskell),
logic languages (e.g. Prolog),
and dataflow languages.
Functional programming languages:
The underlying machinery of functional programming languages is based
on Church's lambda calculus.
The computational model is based on recursive functions and a program is
considered a function that maps inputs to outputs. Through the process
of top-down refinement, a program is defined in terms of simpler functions.
Example languages in this category are Lisp, Scheme, ML, and Haskell.
Functional program example (Haskell):
gcd a b
| a == b = a
| a > b = gcd (a-b) b
| a < b = gcd a (b-a)
Dataflow programming languages:
Dataflow programming languages model computation as the flow of information
among primitive functional nodes. An example of this model of computation
is a spread-sheet program. The cells can be viewed as primitive computational
units that communicate with other cells to obtain values used to calculate
the value displayed in the cell from a formula.
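As a rough analogy (a minimal C sketch with made-up cell names, not an actual
dataflow language), each cell can be viewed as a value that is recomputed by
its formula whenever one of its inputs changes:
#include <stdio.h>

/* three spreadsheet-like cells: C1 = A1 + B1 */
float a1 = 1.0, b1 = 2.0, c1;

/* the formula attached to cell C1 */
void recompute_c1(void)
{ c1 = a1 + b1;
}

int main(void)
{ recompute_c1();               /* initial evaluation        */
  printf("C1 = %g\n", c1);      /* prints 3                  */
  b1 = 5.0;                     /* an input cell changes ... */
  recompute_c1();               /* ... so C1 is recomputed   */
  printf("C1 = %g\n", c1);      /* prints 6                  */
  return 0;
}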
Logic programming languages:
Logic programming languages are declarative languages that derive results
by logical inference. Prolog,
for example, is based on first-order predicate logic (Horn clauses). The computational model consists
of an inference process on a database to find values that satisfy certain
constraints and relationships.
Logic program example (Prolog):
gcd(A, A, A). % note: if the first two arguments are the same, the third argument (GCD) is A
gcd(A, B, G) :- A > B, N is A-B, gcd(N, B, G).
gcd(A, B, G) :- A < B, N is B-A, gcd(A, N, G).
Procedural ("von Neumann")
programming languages:
Although object-oriented programming
is gaining more popularity, the procedural languages are still the most
familiar and successful languages. The basic mode of operation is the modification
of variables, which is sometimes referred to as computing via side effects:
procedural languages are based on statements that influence subsequent
computation by changing the values in memory. The success of these languages
can mainly be attributed to the efficiency of their implementation
on current computer architectures, the so-called von Neumann
architectures. This common architecture consists of a central processing unit
(CPU) and memory connected by a bus.
Example procedural languages are Fortran 77, Basic, Pascal, Ada, and C.
Procedural program example (C):
int gcd(int a, int b)
{ while (a != b)
if (a > b) a = a-b; else b = b-a;
return a;
}
Object-oriented programming
languages:
Most object-oriented programming languages are closely related to the
procedural languages. The fundamental difference in programming style,
which is known as object-oriented programming (OOP),
is that object-oriented languages put objects and their interactions at
the forefront rather than computation as the operation of a processor on
a monolithic memory. Each object has an internal state and executable functions
to manage that state.
Example object-oriented languages are Smalltalk, Eiffel, C++, C#, and Java.
Safe programming languages:
Strong typing
is considered the most important safety aspect of a programming language,
because typing errors are always detected. Other safety issues concern
the use of a programming language on the Internet, which Java and C#
address with elaborate authentication schemes. Languages
such as C and C++ are not safe, because the compiler and runtime environment
cannot guarantee type safety. For example, pointer casts can be used to
change the type of the data pointed to without actually converting the
data.
Here is an example of a safe cast to convert an integer to a float:
int n = 5;
float f = (float)n;
This is an example of an unsafe cast, which is prohibited in type-safe
languages:
int n = 5;
float *fp = (float*)&n;
Such casts, whether explicit as above or implicit, can lead to disaster.
Type-safe languages usually do not support pointer arithmetic to prevent
accessing data of the ''wrong'' type.
Strong typing:
In a strongly typed programming language typing errors are always detected.
The detection can be at compile time or at run time. A strongly typed language
is considered more safe, because it prevents operations from being applied
to the wrong type of object which can cause unintended modifications to
the state of the program. For example, Ada,
Java,
and Haskell
are strongly typed languages. C
and C++
are not, e.g. because void pointers can point to any type of object that
can be manipulated; see the unsafe cast example under safe programming languages.
Pascal
is "almost" strongly typed. The exception is the use of a variant record
(union) without a discriminator. The variant record can hold alternative
types of objects during the execution of a program.
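For illustration, a minimal C sketch of the same loophole (a C union behaves
like a variant record without a discriminator): the same bits can be written
as one type and read back as another, and the type system cannot detect it:
#include <stdio.h>

union variant            /* like a Pascal variant record without a discriminator */
{ int   i;
  float f;
};

int main(void)
{ union variant v;
  v.i = 5;               /* store the bit pattern of an integer                  */
  printf("%g\n", v.f);   /* reinterpret the same bits as a float: prints garbage */
  return 0;              /* and no type error is reported                        */
}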
Relocatable:
When machine code is relocatable in memory it means that the code can
be moved from one location to another to make room for new or modified
routines in memory. Relative addressing is used in the code and/or the
absolute addresses in the code are converted before the code is executed.
This conversion can take place during the loading of an executable program
in memory by a loader.
Machine code or object code:
Machine code (also known as object code) consists of machine-specific
operations expressed in binary code. A central processing unit (CPU) of
a computer executes the binary machine code which is typically fetched
from the main memory of a machine. An executable program consists of object
code which contains a sequence of machine instructions. An executable program
is loaded (sometimes with a loader
and/or by an operating system (OS)) into main memory for execution. See
also assembly language.
Assembly is translated into machine code (object code) by an assembler.
Assembler:
Translator of assembly
programs (mnemonic instructions) to machine code (or object code).
Assembly language and machine/object code:
An assembly language is a processor-specific language that uses mnemonic
abbreviations to define low-level machine instructions. The mnemonic abbreviations
are translated into machine code (also called object code) by an assembler.
The abbreviations usually consist of the name for the instruction followed
by operands which are register names such as 'sp' (stack pointer) and 'a0'
(address register 0), memory references such as 'A' (local label) and 'putint'
(function label), and memory offsets such as '20(sp)' (20 bytes/words from
the location pointed to by the stack pointer).
Example MIPS assembly program to compute GCD (from textbook page 1):
addiu sp,sp,-32
sw ra,20(sp)
jal getint
nop
jal getint
sw v0,28(sp)
lw a0,28(sp)
move v1,v0
beq a0,v0,D
slt at,v1,a0
A: beq at,zero,B
nop
b C
subu a0,a0,v1
B: subu v1,v1,a0
C: bne a0,v1,A
slt at,v1,a0
D: jal putint
nop
lw ra,20(sp)
addiu sp,sp,32
jr ra
move v0,zero
Example MIPS R4000 machine code of the above assembly program (from textbook
page 1):
27bdffd0 afbf0014 0c1002a8 00000000 0c1002a8 afa2001c 8fa4001c
00401825 10820008 0064082a 10200003 00000000 10000002 00832023
00641823 1483fffa 0064082a 0c1002b2 00000000 8fbf0014 27bd0020
03e00008 00001025
Structured programming:
Considered a revolution in programming in the 70s (much like object-oriented
programming in the late 80s and early 90s). A programming technique that
emphasizes top-down design, modularization of code (large routines are
broken down into smaller, modular routines), structured types (e.g. records,
sets, pointers, and multi-dimensional arrays), descriptive variable and
constant names, and extensive commenting conventions. The use of the GOTO
statement is discouraged to avoid spaghetti code (code that exhibits a
criss-cross control flow behavior at run-time). Certain programming statements
are indented in order to make loops and other program logic easier to follow.
Structured languages, such as Pascal
and Ada,
force the programmer to write a structured program. However, unstructured
languages such as Fortran 77,
Cobol,
and Basic
require discipline on the part of the programmer to write structured code.
Here is an example of a non-structured program in C
that counts the number of goto's in a file whose filename is given
as an argument on the command line. This program can be used to measure
the "spaghettiness" of a C program:
#include <stdio.h>
#include <malloc.h>
main(togo,toog)
int togo;
char *toog[];
{char *ogto, tgoo[80];FILE *ogot; int oogt=0, ootg, otog=79,
ottg=1;if ( togo== ottg) goto gogo; goto goog; ggot:
if ( fgets( tgoo, otog, ogot)) goto gtgo; goto gott;
gtot: exit(); ogtg: ++oogt; goto ogoo; togg: if ( ootg > 0)
goto oggt; goto ggot; ogog: if ( !ogot) goto gogo;
goto ggto; gtto: printf( "%d goto \'s\n", oogt); goto
gtot; oggt: if ( !memcmp( ogto, "goto", 4)) goto otgg;
goto gooo; gogo: exit( ottg); tggo: ootg= strlen(tgoo);
goto tgog; oogo: --ootg; goto togg; gooo: ++ogto; goto
oogo; gott: fclose( ogot); goto gtto; otgg: ogto= ogto +3;
goto ogtg; tgog: ootg-=4;goto togg; gtgo: ogto= tgoo;
goto tggo; ogoo: ootg-=3;goto gooo; goog: ogot= fopen(
toog[ ottg], "r"); goto ogog; ggto: ogto= tgoo; goto
ggot;}
Fortran and Basic programs developed in the early days of computing were
difficult to read and understand, somewhat similar to this example in terms
of the choice of variable names (limited to 6 characters in Fortran and 2 in
Basic) and the frequent use of goto.
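For contrast, a structured C version of the same goto counter could be
sketched roughly as follows (a minimal sketch, not the original program):
#include <stdio.h>
#include <string.h>

/* structured version: count the goto's in the file named on the command line */
int main(int argc, char *argv[])
{ char line[80];
  char *p;
  int count = 0;
  FILE *fp;
  if (argc != 2)
    return 1;
  fp = fopen(argv[1], "r");
  if (fp == NULL)
    return 1;
  while (fgets(line, sizeof(line), fp) != NULL)   /* for each input line     */
  { p = line;
    while ((p = strstr(p, "goto")) != NULL)       /* for each "goto" on it   */
    { count++;
      p += 4;
    }
  }
  fclose(fp);
  printf("%d goto's\n", count);
  return 0;
}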
Block structured language:
A language that supports the local declaration of variables with a
limited scope
in a block or compound statement. For example, the following C fragment
declares a temporary integer variable n
to be used in the loop to copy a file from standard input to standard output:
{ int n;
while ((n = getchar()) != EOF)
putchar(n);
}
Variable n has a local scope
limited to the block and its value is only accessible within the block.
In C, C++, Java, and C#, a block is opened with {
and closed with }. Pascal and Ada
use begin and end keywords to delimit a block.
The use of blocks is so common today that we don't tend to think of
it as something special.
Object-oriented programming (OOP):
Object-oriented programming (OOP) is a programming style that puts objects
and their interactions at the forefront rather than computation as the
operation of a processor on a monolithic memory. Each object has an internal
state and executable functions to manage that state. This programming style
is naturally adopted in object-oriented programming
languages but can also be adopted in procedural
or functional
languages. In fact, most object-oriented languages were designed as an
extension of a procedural language (e.g. C++ and object-oriented Pascal
dialects) and the concept of a class is largely based on the concept of
an abstract data type.
Abstract data type (ADT):
The concept of an ADT is based on encapsulating data and a set of operations
on the data. An ADT declaration is in some respect similar to a class declaration
in an object-oriented programming language, except that the abstract data
type is typically declared in a module. Like a class, an abstract data
type has an internal state and a set of operations on its state. However,
the state is global, i.e. only one "instance" exists at any one time. Inheritance
is not supported.
The following Modula-2 example stack abstraction is from the textbook
page 124:
CONST stack_size = ...
TYPE element = ...
...
MODULE stack;
IMPORT element, stack_size;
EXPORT push, pop;
TYPE
stack_index = [1..stack_size];
VAR
s : ARRAY stack_index OF element;
top : stack_index;
PROCEDURE error; ...
PROCEDURE push (elem : element);
BEGIN
IF top = stack_size THEN
error;
ELSE
s[top] := elem;
top := top + 1;
END;
END push;
PROCEDURE pop () : element;
BEGIN
IF top = 1 THEN
error;
ELSE
top := top - 1;
RETURN s[top];
END;
END pop;
BEGIN
top := 1;
END stack;
Class:
The concept of a class extends the notion of abstract data types
(ADTs) with inheritance.
ADTs are limited to packages that encapsulate a data type declaration with
a set of operations.
The following is an example stack template class in C++:
template <class element> class stack
{ private:
    int top_of_stack;      // index of the next free slot
    element *s;            // dynamically allocated storage
  public:
    stack(int size)
    { s = new element[size]; top_of_stack = 0; }
    ~stack()
    { delete[] s; }
    void push(element elem)
    { s[top_of_stack++] = elem; }
    void pop(void)
    { top_of_stack--; }
    element top(void)
    { return s[top_of_stack-1]; }
};
Inheritance:
A class
inherits structure and properties from a base class. Some object-oriented
languages support multiple inheritance. Single inheritance is simpler to
implement and avoids possible ambiguity problems caused by multiple inheritance.
Therefore, newer languages such as Java and C# support only single inheritance (of classes).
Ada (Ada 83):
The development of the Ada language was history's largest language design effort.
Ada is primarily based on the design of Pascal.
Over 40 organizations outside the DoD, with over 200 participants, collaborated
(and competed with different designs) in the final design of Ada. Ada was originally
intended to be the standard language for all software commissioned by the
US Department of Defense. Prototypes were designed by teams at several sites; the
final '83 language was developed by a team at Honeywell's Systems and Research
Center in Minneapolis and Alsys Corp. in France, led by Jean Ichbiah.
Example program in Ada:
with TEXT_IO;
use TEXT_IO;
procedure AVEX is
package INT_IO is new INTEGER_IO (INTEGER);
use INT_IO;
type INT_LIST_TYPE is array (1..99) of INTEGER;
INT_LIST : INT_LIST_TYPE;
LIST_LEN, SUM, AVERAGE : INTEGER;
begin
SUM := 0;
-- read the length of the input list
GET (LIST_LEN);
if (LIST_LEN > 0) and (LIST_LEN < 100) then
-- read the input into an array
for COUNTER in 1 .. LIST_LEN loop
GET (INT_LIST(COUNTER));
SUM := SUM + INT_LIST(COUNTER);
end loop;
-- compute the average
AVERAGE := SUM / LIST_LEN;
-- write the input values > average
for COUNTER in 1 .. LIST_LEN loop
if (INT_LIST(COUNTER) > AVERAGE) then
PUT (INT_LIST(COUNTER));
NEW_LINE;
end if;
end loop;
else
PUT_LINE ("Error in input list length");
end if;
end AVEX;
Ada 95:
Ada 95 is a revision developed under government contract by a team
at Intermetrics, Inc. It fixes several subtle problems in the earlier language,
and adds objects, shared-memory synchronization, and several other features.
Algol 60:
The original block-structured
language. The design of Algol 60 is a landmark of clarity and conciseness
and made a first use of Backus-Naur form (BNF)
for formally defining the grammar. All subsequent imperative programming
languages are based on Algol 60, and these languages are sometimes referred
to as "Algol-like" languages. Strangely, it lacks input/output statements
and has no character set. Algol 60 never gained wide acceptance in the
US, partly because of the intrenchment of Fortran and lack of support by
IBM. Besides block-structures, Algol 60 has recursion
and stack-dynamic arrays
.
Example Algol 60 program:
comment avex program
begin
integer array intlist [1:99];
integer listlen, counter, sum, average;
sum := 0;
comment read the length of the input list
readint (listlen);
if (listlen > 0) and (listlen < 100) then
begin
comment read the input into an array
for counter := 1 step 1 until listlen do
begin
readint (intlist[counter]);
sum := sum + intlist[counter]
end;
comment compute the average
average := sum / listlen;
comment write the input values > average
for counter := 1 step 1 until listlen do
if intlist[counter] > average then
printint (intlist[counter])
end
else
printstring ("Error in input list length")
end
Algol 68:
Introduced user defined types with an attempt to design a language
that is orthogonal: a few primitive types and structures can be
combined to form new types and structures. The language also added new
programming constructs to Algol 60 which were already available in other
languages. Unfortunately, the Algol 68 documentation was unreadable and
Algol 68 never gained widespread acceptance. Includes (among other things)
structures and unions, expression-based syntax, reference parameters, a
reference model of variables, and concurrency.
Algol W:
A smaller, simpler alternative to Algol 68,
proposed by Niklaus Wirth and C. A. R. Hoare. The precursor to Pascal.
Introduced the case statement.
APL:
A functional language designed by Kenneth Iverson in the late 1950's
and early 1960's, primarily for the manipulation of numeric arrays. Extremely
concise language with a powerful set of operators. It employs an extended
character set to express operators with special symbols. Intended for interactive
use and ``throw-away'' programming (quick programming of a solution that is
not intended to be kept: the programming solution is hard to understand later!).
Example APL program:
(2=+/[1]0=(⍳N)∘.|⍳N)/⍳N
This program computes prime numbers in the range 1 to N.
BASIC:
BASIC (Beginner's All-purpose Symbolic Instruction Set) is a simple
imperative language that gained popularity because of its ease of use and
its interpreted execution, despite the fact that the early versions lacked
many language features found in modern languages (e.g. procedures). Many
dialects exist. The most widely used version of BASIC today is Microsoft's
Visual Basic. The structure of programs written in early Basic dialects
resembles the structure of Fortran programs, with similar limitations.
Example QuickBasic program:
REM avex program
DIM intlist(99)
sum = 0
REM read the length of the input list
INPUT listlen
IF listlen > 0 AND listlen < 100 THEN
REM read the input into an array
FOR counter = 1 TO listlen
INPUT intlist(counter)
sum = sum + intlist(counter)
NEXT counter
REM compute the average
average = sum / listlen
REM write the input values > average
FOR counter = 1 TO listlen
IF intlist(counter) > average THEN
PRINT intlist(counter);
END IF
NEXT counter
ELSE
PRINT "Error in input list length"
END IF
END
C:
C is one of the most successful imperative languages that was originally
defined as part of the development of the UNIX operating system. It is
still considered a systems programming language, for which certain features
such as pointers and the absence of dynamic semantic checks (e.g. array
bound checking) are very useful to manipulate memory. Two notably different
versions of C exist: the original K&R (Kernighan and Ritchie)
version and ANSI C.
Example program in C:
main()
{ int intlist[99], listlen, counter, sum, average;
sum = 0;
/* read the length of the list */
scanf("%d", &listlen);
if (listlen > 0 && listlen < 100)
{ /* read the input into an array */
for (counter = 0; counter < listlen; counter++)
{ scanf("%d", &intlist[counter]);
sum += intlist[counter];
}
/* compute the average */
average = sum / listlen;
/* write the input values > average */
for (counter = 0; counter < listlen; counter++)
if (intlist[counter] > average)
printf("%d\n", intlist[counter]);
}
else
printf("Error in input list length\n");
}
C++:
The most successful of several object-oriented successors of C. It
is a large and fairly complex language, in part because it supports both
procedural and object-oriented programming. The Standard Template Library
(STL) is an important library with common compound data types and operations.
Example C++ program using the STL:
#include <iostream>
#include <vector>

int main()
{ std::vector<int> intlist;
int listlen;
/* read the length of the list */
std::cin >> listlen;
if (listlen > 0 && listlen < 100)
{ int sum = 0;
/* read the input into an STL vector */
for (int counter = 0; counter < listlen; counter++)
{ int value;
std::cin >> value;
intlist.push_back(value);
sum += value;
}
/* compute the average */
int average = sum / listlen;
/* write the input values > average */
for (std::vector<int>::const_iterator it = intlist.begin(); it != intlist.end(); ++it)
if ((*it) > average)
std::cout << (*it) << std::endl;
}
else
std::cerr << "Error in input list length" << std::endl;
}
C#:
Pronounced ``C sharp''. A language developed by Microsoft that is very
similar to Java.
C# is part of Microsoft's Visual Studio .NET, a development environment
for Internet-based computing. C# uses the Common Language Runtime (CLR)
to manage objects that can be shared among different languages (C#,
Visual Basic, C++, Haskell). Objects can be exchanged over the Web and remote
methods can be invoked using SOAP (Simple Object Access Protocol).
COBOL:
COBOL (COmmon Business Oriented Language) was for a long time the most widely
used programming language in the world. COBOL is intended primarily for
business data processing, with elaborate input/output facilities. It supports
extensive numerical formatting features and a decimal number storage format.
COBOL introduced the concept of records and nested selection statements. It is
still the most widely used programming language for business applications
on mainframes and minis. It was originally developed under the auspices of the US Department of Defense.
The language is very wordy and adopts English names for arithmetic operators.
A COBOL program is structured into the following divisions:
Division name    Contains
IDENTIFICATION   Program identification.
ENVIRONMENT      Types of computers used.
DATA             Buffers, constants, work areas.
PROCEDURE        The processing parts (program logic).
Example COBOL program to convert Fahrenheit to Celsius:
IDENTIFICATION DIVISION.
PROGRAM-ID. EXAMPLE.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SOURCE-COMPUTER. IBM-370.
OBJECT-COMPUTER. IBM-370.
DATA DIVISION.
WORKING-STORAGE SECTION.
77 FAHR PICTURE 999.
77 CENT PICTURE 999.
PROCEDURE DIVISION.
DISPLAY 'Enter Fahrenheit ' UPON CONSOLE.
ACCEPT FAHR FROM CONSOLE.
COMPUTE CENT = (FAHR- 32) * 5 / 9.
DISPLAY 'Celsius is ' CENT UPON CONSOLE.
GOBACK.
CLOS:
The Common Lisp Object System is a set of object-oriented extensions
to Common Lisp, now incorporated into the ANSI standard language (see Common Lisp).
The leading notation for object-oriented functional programming.
Eiffel:
An object-oriented language developed by Bertrand Meyer and associates
at the Société des Outils du Logiciel, Paris. Includes (among other things)
multiple inheritance,
automatic garbage collection,
and powerful mechanisms for re-naming of data members and methods in derived
classes.
Euclid:
Imperative language developed by Butler Lampson and associates at the
Xerox Palo Alto Research Center in the mid 1970's. Designed to eliminate
many of the sources of common programming errors in Pascal,
and to facilitate formal verification of programs. Has closed scopes and
module types.
Fortran (I, II, IV, 77):
The first high-level programming language was Fortran (I) (FORmula
TRANslator),
developed in the mid-50s. It had a dramatic impact on computing in early
days when most of the programming took place in machine code or assembly
code for an assembler.
It was originally designed to express mathematical formulas. Fortran 77
is still widely used for scientific, engineering, and numerical problems,
mainly because very good compilers exist. These compilers are very effective
in optimizing code, because of the maturity of the compilers and due to
the lack of pointers and recursion
in Fortran 77. Fortran 77 has limited type checking and lacks records,
unions, dynamic allocation, case-statements, and while-loops.
Variable names are upper case and the name length is limited to 6 characters.
Fortran 77 is not structured
and not object-oriented.
More recent Fortran dialects such as Fortran 90
are better structured and support modern programming constructs.
Example Fortran 77 program:
PROGRAM AVEX
INTEGER INTLST(99)
C variable names that start with I,J,K,L,N,M are integers
ISUM = 0
C read the length of the list
READ (*, *) LSTLEN
IF ((LSTLEN .GT. 0) .AND. (LSTLEN .LT. 100)) THEN
C read the input in an array
DO 100 ICTR = 1, LSTLEN
READ (*, *) INTLST(ICTR)
ISUM = ISUM + INTLST(ICTR)
100 CONTINUE
C compute the average
IAVE = ISUM / LSTLEN
C write the input values > average
DO 110 ICTR = 1, LSTLEN
IF (INTLST(ICTR) .GT. IAVE) THEN
WRITE (*, *) INTLST(ICTR)
END IF
110 CONTINUE
ELSE
WRITE (*, *) 'ERROR IN LIST LENGTH'
END IF
END
Fortran (90, 95, HPF):
Fortran 90 is a major revision of the language. Recursion,
pointers, records, dynamic allocation, a module facility, and new control
flow constructs are added. Also array operations are added that operate
on arrays and array slices. Array operations on distributed arrays in HPF
can be parallelized.
Example Fortran 90 program:
PROGRAM AVEX
INTEGER INT_LIST(1:99)
INTEGER LIST_LEN, COUNTER, AVERAGE
C read the length of the list
READ (*, *) LIST_LEN
IF ((LIST_LEN > 0) .AND. (LIST_LEN < 100)) THEN
C read the input in an array
DO COUNTER = 1, LIST_LEN
READ (*, *) INT_LIST(COUNTER)
END DO
C compute the average
AVERAGE = SUM(INT_LIST(1:LIST_LEN)) / LIST_LEN
C write the input values > average
DO COUNTER = 1, LIST_LEN
IF (INT_LIST(COUNTER) > AVERAGE) THEN
WRITE (*, *) INT_LIST(COUNTER)
END IF
END DO
ELSE
WRITE (*, *) 'ERROR IN LIST LENGTH'
END IF
END
Haskell:
Haskell is currently the leading functional programming language. Descended
from Miranda.
Designed by a committee of researchers beginning in 1987. Includes curried
functions, higher-order functions,
non-strict semantics, static polymorphic typing, pattern matching, list
comprehensions, modules, monadic I/O, and layout (indentation)-based syntactic
grouping.
Example Haskell program:
import Prelude hiding (sum)
sum [] = 0
sum (a:x) = a + sum x
avex [] = []
avex (a:x) = [n | n <- a:x, n > sum (a:x) `div` length (a:x)]
Java:
Java is an object-oriented language based largely on C++, developed
at SUN Microsystems. The language is intended for the construction of highly
portable, machine-independent programs. Includes (among other things) a
reference model of (class-typed) variables, mix-in inheritance,
threads, and extensive pre-defined libraries for graphics, communication,
etc. Heavily used for transmission of program fragments, called applets,
over the Internet. The language is designed to be translated into intermediate
Java byte code that can be transmitted over the Internet. Java byte code
is executed by the Java virtual machine (JVM)
or compiled into native machine code by a just-in-time (JIT) compiler.
Java is a safe language.
Example Java program:
import java.io.*;
class Avex
{ public static void main(String args[]) throws IOException
{ DataInputStream in = new DataInputStream(System.in);
int listlen, counter, sum = 0, average;
int [] intlist = new int[100];
// read the length of the list
listlen = Integer.parseInt(in.readLine());
if (listlen > 0 && listlen < 100)
{ // read the input into an array
for (counter = 0; counter < listlen; counter++)
{ intlist[counter] = Integer.valueOf(in.readLine()).intValue();
sum += intlist[counter];
}
// compute the average
average = sum / listlen;
// write the input values > average
for (counter = 0; counter < listlen; counter++)
{ if (intlist[counter] > average)
System.out.println(intlist[counter] + "\n");
}
}
else
System.out.println("Error in input length\n");
}
}
Lisp:
Lisp (LISt Processing language) was developed by McCarthy as a realization
of Church's lambda calculus.
Many dialects exist, among which Common Lisp and Scheme
are the most popular. Lisp is the dominant language used in Artificial
Intelligence: the emphasis is on symbolic computation with lists rather than
numeric computation. As a functional language,
all control is performed by recursion
and conditional expressions. Lisp was the first language with implicit
memory management (automatic allocation and deallocation) by "garbage collection".
Lisp heavily influenced later functional programming languages (e.g. ML,
Miranda,
Haskell).
Miranda:
A purely functional language designed by David Turner in the mid 1980's.
Resembles ML
in several respects; has type inference and automatic currying. Unlike
ML, provides list comprehensions, and uses lazy evaluation for all arguments.
Like Haskell,
it uses indentation and line breaks for syntactic grouping.
ML:
A functional language with "Pascal-like" syntax. Originally designed
in the mid to late 1970's by Robin Milner and associates at the University
of Edinburgh as the meta-language for a program verification system. Pioneered
aggressive compile-time type inference and polymorphism. ML has a few imperative
features.
Modula-2:
The immediate successor to Pascal, developed by Niklaus Wirth. The
original Modula was an explicitly concurrent monitor-based language. Modula-2
was originally designed with coroutines,
but no real concurrency. Both languages provide mechanisms for module-as-manager
style data abstractions.
Modula-3:
A major extension to Modula-2
developed by Luca Cardelli, Jim Donahue, Mick Jordan, Bill Kalsow, and
Greg Nelson at the Digital Systems Research Center and the Olivetti Research
Center in the late 1980's. Intended to provide a level of support for large,
reliable, and maintainable systems comparable to that of Ada,
but in a simpler and more elegant form.
Oberon:
A deliberately minimal language designed by Niklaus Wirth. Essentially
a subset of Modula-2,
augmented with a mechanism for type extension.
Pascal:
A high-level programming language designed by Swiss professor Niklaus
Wirth (Wirth is pronounced "Virt") in the late 60s and named after the
French mathematician, Blaise Pascal. It was designed largely in reaction
to Algol 68,
which was widely perceived as bloated. It is noted for its structured programming
and was heavily used in the 70s and 80s, particularly for teaching. Pascal
has had a strong influence on subsequent high-level languages, such as Ada,
ML,
Modula-2,
and Modula-3.
Example Pascal program:
program avex(input, output);
type
intlisttype = array [1..99] of integer;
var
intlist : intlisttype;
listlen, counter, sum, average : integer;
begin
sum := 0;
(* read the length of the input list *)
readln(listlen);
if ((listlen > 0) and (listlen < 100)) then
begin
(* read the input into an array *)
for counter := 1 to listlen do
begin
readln(intlist[counter]);
sum := sum + intlist[counter]
end;
(* compute the average *)
average := sum div listlen;
(* write the input values > average *)
for counter := 1 to listlen do
if (intlist[counter] > average) then
writeln(intlist[counter])
end
else
writeln('Error in input list length')
end.
PL/I:
Developed by IBM and intended to displace Fortran,
COBOL,
and Algol.
A very complicated and poorly designed language that is kept alive by IBM.
PL/I was the first language that adopted exception handling
and pointer types.
Example PL/I program:
AVEX: PROCEDURE OPTIONS (MAIN);
DECLARE INTLIST (1:99) FIXED;
DECLARE (LISTLEN, COUNTER, SUM, AVERAGE) FIXED;
SUM = 0;
/* read the input list length */
GET LIST (LISTLEN);
IF (LISTLEN > 0) & (LISTLEN < 100) THEN
DO;
/* read the input into an array */
DO COUNTER = 1 TO LISTLEN;
GET LIST (INTLIST(COUNTER));
SUM = SUM + INTLIST(COUNTER);
END;
/* compute the average */
AVERAGE = SUM / LISTLEN;
/* write the input values > average */
DO COUNTER = 1 TO LISTLEN;
IF INTLIST(COUNTER) > AVERAGE THEN
PUT LIST (INTLIST(COUNTER));
END;
END;
ELSE
PUT SKIP LIST ('ERROR IN INPUT LIST LENGTH');
END AVEX;
Prolog:
Prolog is the most popular logic programming language. Most Prolog
systems conform to the ISO Prolog standard, but deviations make
it hard to write Prolog programs that are portable between different Prolog
systems. The language is based on formal logic and it can be summarized
as an intelligent database system that uses an inference process to infer
the truth of given queries.
Example Prolog program:
avex(IntList, GreaterThanAveList) :-
sum(IntList, Sum),
length(IntList, ListLen),
Average is Sum / ListLen,
filtergreater(IntList, Average, GreaterThanAveList).
% sum(+IntList, -Sum)
% recursively sums integers of IntList
sum([Int | IntList], Sum) :-
sum(IntList, ListSum),
Sum is Int + ListSum.
sum([], 0).
% filtergreater(+IntList, +Int, -GreaterThanIntList)
% recursively remove integers smaller or equal to Int from IntList
filtergreater([AnInt | IntList], Int, [AnInt | GreaterThanIntList]) :-
AnInt > Int, !,
filtergreater(IntList, Int, GreaterThanIntList).
filtergreater([AnInt | IntList], Int, GreaterThanIntList) :-
filtergreater(IntList, Int, GreaterThanIntList).
filtergreater([], Int, []).
The following example illustrates a more "traditional" use of Prolog to
infer information from a database of facts:
rainy(rochester). % fact: rochester is rainy
rainy(seattle). % fact: seattle is rainy
cold(rochester). % fact: rochester is cold
snowy(X) :- rainy(X), cold(X). % rule: X is snowy if X is rainy and cold
With this program loaded, we can query the system interactively:
?- rainy(X). % user question
X = rochester % system answer(s)
X = seattle
?- snowy(X).
X = rochester
Scheme:
Scheme is one of the most popular dialects of Lisp.
Developed in the mid 1970's by Guy Steele and Gerald Sussman. Standardized
by the IEEE and ANSI. Has static scoping and true first-class functions.
Scheme is widely used for teaching.
Example Scheme program:
(DEFINE (avex lis)
(filtergreater lis (/ (sum lis) (length lis)))
)
(DEFINE (sum lis)
(COND
((NULL? lis) 0)
(ELSE (+ (CAR lis) (sum (CDR lis))))
)
)
(DEFINE (filtergreater lis num)
(COND
((NULL? lis) '())
((> (CAR lis) num) (CONS (CAR lis) (filtergreater (CDR lis) num)))
(ELSE (filtergreater (CDR lis) num))
)
)
Simula 67:
Designed at the Norwegian Computing Centre, Oslo, in the mid 1960's
by Ole-Johan Dahl, Bjorn Myhrhaug, and Kristen Nygaard. Extends Algol 60
with classes and coroutines.
The name of the language reflects its suitability for discrete-event simulation.
Smalltalk-80:
The first full implementation of an object-oriented language, Smalltalk-80 is still
considered the quintessential object-oriented language. Developed at Xerox
PARC, it pioneered the use of graphical user interfaces.
Example Smalltalk-80 program:
class name Avex
superclass Object
instance variable names intlist
"Class methods"
"Create an instance"
new
^ super new
"Instance methods"
"Initialize"
initialize
intlist <- Array new: 0
"Add int to list"
add: n | oldintlist |
oldintlist <- intlist.
intlist <- Array new: intlist size + 1.
intlist replaceFrom: 1 to: oldintlist size with: oldintlist.
^ intlist at: intlist size put: n
"Calculate average"
average | sum |
sum <- 0.
1 to: intlist size do:
[:index | sum <- sum + intlist at: index].
^ sum // intlist size
"Filter greater than average"
filtergreater: n | oldintlist i |
oldintlist <- intlist.
i <- 1.
1 to: oldintlist size do:
[:index | (oldintlist at: index) > n
ifTrue: [oldintlist at: i put: (oldintlist at: index)]]
intlist <- Array new: oldintlist size.
intlist replaceFrom: 1 to: oldintlist size with: oldintlist
Example Smalltalk-80 session:
av <- Avex new
av initialize
av add: 1
1
av add: 2
2
av add: 3
3
av filtergreater: av average
av at: 1
3
Lambda calculus:
A very simple algebraic model of computation designed by Alonzo Church
in the 1930s. In its pure form, everything is a function (even primitive
types such as numbers and compound data structures such as lists). While
the syntax and the rewrite rules of lambda calculus are very primitive,
it has been shown that lambda calculus provides a theoretical model of
computation. This model is actually easier to program than a Turing machine,
the other well-known model of computation. Lisp
is a direct realization of lambda calculus as a programming language.
A lambda expression is recursively defined as
-
a name which is treated as a symbolic constant or function name
-
a variable
-
a lambda abstraction which is essentially a nameless function written
as λ var . lambda-expression, where var
is a variable denoting the formal argument (input value) and lambda-expression
is the body of the function (the return value)
-
a function application written as adjacent lambda expressions
-
a parenthesized lambda expression
Only two rewrite rules on lambda expressions are necessary to model universal
computation: alpha reduction and beta reduction. The difference
between a name and a variable becomes clear in the context in which they
are used. A variable is used in a lambda abstraction to represent the input
parameter, while a name is ``inert'', i.e. it has no value other than the
name itself.
Some examples:
f a denotes the application of a function symbol f to
an argument a.
λ v . v is a lambda abstraction that
denotes the identity function: argument v takes a value and the
function returns the (unchanged) value. The application of this function
to an expression a, for example, is written (λ v . v) a and this evaluates
to a (argument v takes a
and the function returns the value of v).
λ v . f v is a lambda abstraction that,
when applied to a value, applies f to it. For example, (λ v . f v) a results in f a.
λ f . f a is a lambda abstraction that,
when applied to a function name, applies this function to a. For
example, (λ f . f a) g results
in g a.
Coroutines:
A routine that runs concurrently with other coroutines. Coroutines
do not necessarily run in parallel. A coroutine can temporarily relinquish
control to another coroutine without involving the subroutine calling mechanism.
That is, control in a coroutine can jump to another coroutine and back,
possibly multiple times during the lifetime of a coroutine. Therefore,
it appears as if the coroutines are operating concurrently.
Garbage collection:
A routine that searches memory for program segments or data that are
no longer active or used in order to reclaim that space. It tries to make
as much memory available on the heap
as possible. Implicit garbage collection operates in the background of
a running program to clean up unused heap space.
Heap:
An area in memory for the dynamic creation of data during the lifetime
of a program. The heap contains application data that is not static or
stack-allocated.
Stack:
A lifo (last in first out) structure to hold temporary data. The implementation
of a programming language requires at least one stack data structure for
subroutine calling (and object oriented method invocation) in languages
that support recursion. The stack holds the return address of the caller
of a subroutine and the parameters passed to the subroutine. Local stack-allocated
data for the subroutine is also pushed on the stack.
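A minimal C sketch contrasting stack-allocated and heap-allocated data (the
variable names are made up):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{ int local = 42;                        /* stack-allocated: popped when main returns    */
  int *p = malloc(100 * sizeof(int));    /* heap-allocated: lives until explicitly freed */
  p[0] = local;
  printf("%d\n", p[0]);
  free(p);                               /* return the heap space to the allocator       */
  return 0;
}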
Regular expression:
Regular expressions describe the tokens
of a programming language. A regular expression is one of
-
a character
-
empty (denoted ε)
-
concatenation: sequence of regular expressions denoting a concatenation
-
alternation: regular expressions separated by a bar | are alternative forms
-
repetition: a regular expression followed by a star * means that the regular
expression is repeated zero, one, or more times
For example, the regular expression describing an identifier in C/C++ is:
identifier -> letter (letter | digit)*
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
letter -> a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
| _
For compiler design, tools exist that generate efficient scanners
automatically from regular expressions (e.g. flex).
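A hand-written scanner routine for this identifier token could be sketched in
C roughly as follows (match_identifier is a made-up helper name):
#include <stdio.h>
#include <ctype.h>

/* returns the length of the identifier at the start of s, or 0 if there is none */
int match_identifier(const char *s)
{ int n;
  if (!isalpha((unsigned char)s[0]) && s[0] != '_')              /* first: letter         */
    return 0;
  for (n = 1; isalnum((unsigned char)s[n]) || s[n] == '_'; n++)  /* then: (letter|digit)* */
    ;
  return n;
}

int main(void)
{ printf("%d\n", match_identifier("count1 = 0;"));               /* prints 6 */
  return 0;
}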
BNF:
Backus-Naur Form (BNF) is a form of context-free grammar frequently
used to describe a programming language syntax.
-
BNF grammar productions are of the form
<nonterminal> -> sequence of (non)terminals
and give a description of the syntax for the nonterminal
-
A terminal of
a grammar is a token,
e.g. a specific programming language keyword or identifier
-
A <nonterminal> denotes
a syntactic category, e.g. a collection of program statements.
For example, an assignment statement
<stmt> -> <id> := <expr>
-
The symbol | (bar) denotes alternative forms in a production, e.g. different
program statements are categorized.
For example:
<stmt> -> return | break | <id> := <expression>
-
The special symbol ε denotes empty, and is often
used in optional constructs.
For example:
<optional_static> -> static | ε
-
Extended BNF includes an explicit form for optional constructs with [ and
].
For example:
<stmt> -> for <id> := <expr> to <expr> [ step <expr> ] do <stmt>
-
Extended BNF includes a repetition construct * (star).
For example:
<decl> -> int <id> (, <id>)*
LL grammar:
An LL grammar is a grammar suitable for top-down parsing. If it is
not possible to write a recursive descent parser for a grammar, it is not
LL(1). An LL(n) grammar is a grammar suitable for top-down parsing
using n lookahead tokens.
An LL grammar cannot have left-recursive productions,
because a recursive descent parser would recursively call itself forever
without consuming any input characters.
The following grammar is not LL(1)
<A> -> <B> <C>
<A> -> a
<B> -> a b
<B> -> b
<C> -> c
It is not LL(1) because the subroutine for nonterminal A cannot decide
which production to use when it sees an a on the input:
proc A
    if next_token="a"
        ?? cannot decide whether the first or second production for <A> applies here ??
The grammar is LL(2), because the token after next token can be used to
determine which production should be applied:
proc A
    if next_token="a" and token_after_next_token="b"
        B()
        C()
    else if next_token="b"
        B()
        C()
    else
        match("a")
LR grammar:
An LR grammar is a grammar suitable for bottom-up parsing. An LR(n)
grammar is a grammar suitable for bottom-up parsing using n lookahead
tokens. The class of LR grammars includes the class of LL grammars.
Ambiguous grammar:
A grammar is ambiguous if a string exists that has more than one distinct
derivation resulting in distinct parse trees. See also ambiguous if-then-else.
The grammar for simple expressions below is ambiguous:
<expression> -> identifier
| unsigned_integer
| - <expression>
| ( <expression> )
| <expression> <operator> <expression>
<operator> -> + | - | * | /
because we find two distinct (left-most) derivations for the string a-b+1:
<expression>
=> <expression> <operator> <expression>
=> <expression> <operator> <expression> <operator> <expression>
=> identifier <operator> <expression> <operator> <expression>
=> identifier - <expression> <operator> <expression>
=> identifier - identifier <operator> <expression>
=> identifier - identifier + <expression>
=> identifier - identifier + unsigned_integer
(a) - (b) + (1)
and
<expression>
=> <expression> <operator> <expression>
=> identifier <operator> <expression>
=> identifier - <expression>
=> identifier - <expression> <operator> <expression>
=> identifier - identifier <operator> <expression>
=> identifier - identifier + <expression>
=> identifier - identifier + unsigned_integer
(a) - (b) + (1)
The simple expression grammar below is unambiguous:
<expression> -> <term>
| <expression> <add_op> <term>
<term> -> <factor>
| <term> <mult_op> <factor>
<factor> -> identifier | unsigned_integer
| - <factor> | ( <expression> )
<add_op> -> + | -
<mult_op> -> * | /
We find only one derivation for all strings in the language defined by
the grammar. For example, the left-most derivation of a-b+1 is:
<expression>
=> <expression> <add_op> <term>
=> <expression> <add_op> <term> <add_op> <term>
=> <term> <add_op> <term> <add_op> <term>
=> <factor> <add_op> <term> <add_op> <term>
=> identifier <add_op> <term> <add_op> <term>
=> identifier - <term> <add_op> <term>
=> identifier - <factor> <add_op> <term>
=> identifier - identifier <add_op> <term>
=> identifier - identifier + <term>
=> identifier - identifier + <factor>
=> identifier - identifier + unsigned_integer
(a) - (b) + (1)
Ambiguous if-then-else:
A problem with the if-then-else grammar for Pascal and C is the formulation
of unambiguous grammar productions for if-then-else. The grammar below
is ambiguous
<stmt> -> if <expr> then <stmt>
| if <expr> then <stmt> else <stmt>
because we find two distinct derivations of the string
if C1 then if C2 then S1 else S2
(where C1 and C2 are some expressions, S1 and S2 are some statements):
<stmt>
=> if <expr> then <stmt>
=> if <expr> then if <expr> then <stmt> else <stmt>
and another derivation
<stmt>
=> if <expr> then <stmt> else <stmt>
=> if <expr> then if <expr> then <stmt> else <stmt>
An unambiguous grammar for if-then-else is (you don't need to memorize
this):
<stmt> -> <balanced_stmt>
| <unbalanced_stmt>
<balanced_stmt> -> if <expr> then <balanced_stmt> else <balanced_stmt>
| <other_stmt>
<unbalanced_stmt> -> if <expr> then <stmt>
| if <expr> then <balanced_stmt> else <unbalanced_stmt>
which is an LR grammar,
but not an LL grammar,
so no pure top-down parser
can be used to parse program fragments with the unambiguous if-then-else
grammar.
Attribute grammar:
A grammar augmented with attributes for terminals and nonterminals
and semantic rules
that operate on the attribute values.
Semantic rule:
A rule with a grammar production that is used to operate on the values
of the attributes of terminals and nonterminals in the grammar.
Example:
grammar production                semantic rule
<number1> -> <number2> <digit>    number1.value := 10*number2.value + digit.value
<number> -> <digit>               number.value := digit.value
<digit> -> 0                      digit.value := 0
         | 1                      digit.value := 1
         | 2                      digit.value := 2
         | 3                      digit.value := 3
         | 4                      digit.value := 4
         | 5                      digit.value := 5
         | 6                      digit.value := 6
         | 7                      digit.value := 7
         | 8                      digit.value := 8
         | 9                      digit.value := 9
In this example, the nonterminals <number> and <digit> have an attribute
'value' that holds the value of the numeric representation defined by the
grammar. When the semantic rules are applied on the syntactic representation
of the input, the rules compute the value of the input. When the values
computed are used to check the validity of the input, then the semantic
rules are used to enforce semantic checks
.
Note: the nonterminal <number> has subscripts in the first production
to distinguish the nonterminal on the left hand side and right hand side
of the production.
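As an illustration, a minimal C sketch (number_value is a made-up helper name)
that mimics these semantic rules while scanning a digit string left to right:
#include <stdio.h>

/* computes number.value for a string of digits, following the rules
   number1.value := 10*number2.value + digit.value and digit.value := 0..9 */
int number_value(const char *digits)
{ int value = 0;
  while (*digits >= '0' && *digits <= '9')
    value = 10*value + (*digits++ - '0');
  return value;
}

int main(void)
{ printf("%d\n", number_value("493"));   /* prints 493 */
  return 0;
}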
Semantic checks:
Static semantic checks
are performed by the compiler at compile time. Dynamic semantic checks
are
performed at run time. A compiler cannot always ensure that certain constraints
on programming constructs are met at compile time, for example, whether
the index value of an array is out of bounds. A compiler may generate run
time checks in the target code to enforce these constraints at run time.
Static semantic
checks:
Static semantic checks are performed by a compiler at compile time
to ensure that variables are declared before they are used, variables are typed
correctly in expressions, labels have targets, etc.
Dynamic semantic
checks:
A compiler may generate run time checks in the target code to enforce
programming language specific constraints on programming constructs at run
time. An interpreter or virtual machine may enforce constraints immediately
while executing an instruction. Exceptions are raised when an error is detected.
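A minimal C sketch of the kind of bounds check a compiler might insert before
an array access (checked_read is a made-up helper name):
#include <stdio.h>
#include <stdlib.h>

int a[10];

/* the check that bounds-checked code performs before every access a[i] */
int checked_read(int i)
{ if (i < 0 || i >= 10)                        /* dynamic semantic check */
  { fprintf(stderr, "index %d out of bounds\n", i);
    exit(1);                                   /* or raise an exception  */
  }
  return a[i];
}

int main(void)
{ printf("%d\n", checked_read(3));             /* ok                     */
  return checked_read(15);                     /* aborts with an error   */
}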
Tokens:
Tokens are the indivisible units a scanner
of a compiler produces for further analysis by the parser. Example tokens
are programming language keywords, operators, identifiers, numbers, and
punctuation. Tokens are also called terminals
in the context of grammars.
Terminals:
A terminal of a grammar of a programming language is a token,
e.g. a keyword or operator.
Nonterminals:
A nonterminal of a grammar denotes a syntactic category of a language.
For example, a programming language statement as a syntactic category can
be one of many alternative statements.
Production:
BNF
grammar productions are of the form
<nonterminal> -> sequence of (non)terminals
Productions provide descriptions of the syntax for a syntactic category
denoted by a nonterminal.
A production is immediately left recursive if it is of the form
<A> -> <A> ...
and a production is immediately right recursive if it is of the
form
<A> -> ... <A>
where <A> is some nonterminal.
Productions can be left or right recursive through other productions.
For example
<A> -> <B> ...
<B> -> <A> ...
Derivation:
Parse tree:
A parse tree depicts a derivation
as a tree: the nodes are the nonterminals,
the children of a node are the symbols (terminals and nonterminals) of
a right-hand side of a production
for the nonterminal at the node, and the leaves are the terminals.
Given the grammar
<id_list> -> identifier <id_list_tail>
<id_list_tail> -> , identifier <id_list_tail>
| ;
The parse tree of "A,B,C;" is:
<id_list>
    identifier (A)
    <id_list_tail>
        ,
        identifier (B)
        <id_list_tail>
            ,
            identifier (C)
            <id_list_tail>
                ;
Abstract syntax tree (AST):
Associative:
An operator is left associative if the operations are performed from
the left to the right in an expression. Similarly, an operator is right
associative if the operations are performed from the right to the left
in an expression. For example, addition is left associative and in the
expression 1 + 2 + 3 the numbers 1 and 2 are added first, after which 3
is added. Note that for the addition of numbers the associativity of +
does not matter as the terms can be reordered in a formal system. However,
limited numeric precision in a computer restricts this reordering and an
overflow may occur when the terms are reordered. Also, if the terms are
functions with side effects
the result would be different after reordering. Arithmetic operators in
a programming language are typically left associative with the notable
exception of exponentiation (^) which is right associative. However, this
rule of thumb is not universal.
Associativity can be captured in a grammar. For a left associative binary
operator
op we have a production of the form
<expr> -> <term> | <expr> op <term>
and for a right associative operator <op> we have a production
of the form
<expr> -> <term> | <term> op <expr>
Note that the production for a left associative operator is left recursive
and therefore has to be rewritten for a recursive descent parser:
<expr> -> <term> <more_terms>
<more_terms> -> op <term> <more_terms> | ε
Precedence:
The precedence of an operator indicates the priority of applying the
operator relative to other operators. For example, multiplication has a
higher precedence than addition, so a+b*c is evaluated by multiplying b
and c first, after which a is added. That is, multiplication groups more
tightly compared to addition. The rules of operator precedence vary from
one programming language to another.
The relative precedences between operators can be captured in a grammar.
A nonterminal is introduced for every group of operators with identical
precedence. The nonterminal of the group of operators with lowest precedence
is the nonterminal for the expression as a whole. Productions for (left
associative) binary operators with lowest to highest precedences are written
of the form
<expr> -> <expr1> | <expr> <lowest_op> <expr1>
<expr1> -> <expr2> | <expr1> <one_but_lowest_op> <expr2>
...
<expr9> -> <term> | <expr9> <highest_op> <term>
<term> -> identifier | number | - <term> | ( <expr> )
where <lowest_op> is a nonterminal denoting all operators
with the same lowest precedence, etc.
Scanner:
A scanner of a compiler
breaks up the character stream of a source program into tokens.
The process of scanning comprises the lexical analysis phase of a compiler.
The purpose of scanning is to simplify the task of the parser of the compiler.
Comments and white space are removed, keywords are recognized and represented
as tokens,
and identifiers for names of variables and functions are stored in a symbol
table and tagged with source file and line numbers.
Example scanner written in Java:
import java.io.*;
public class Scanner
{ public static void main(String argv[]) throws IOException
{ FileInputStream stream = new FileInputStream(argv[0]);
InputStreamReader reader = new InputStreamReader(stream);
StreamTokenizer tokens = new StreamTokenizer(reader);
int next = 0;
while ((next = tokens.nextToken()) != tokens.TT_EOF)
{ switch (next)
{ case StreamTokenizer.TT_WORD:
System.out.println("WORD: " + tokens.sval);
break;
case StreamTokenizer.TT_NUMBER:
System.out.println("NUMBER: " + tokens.nval);
break;
default:
switch ((char)next)
{ case '"':
System.out.println("STRING: " + tokens.sval);
break;
case '\'':
System.out.println("CHAR: " + tokens.sval);
break;
default:
System.out.println("PUNCT: " + (char)next);
}
}
}
stream.close();
}
}
Save the Java source with file name "Scanner.java",
compile it with "javac Scanner.java", and run it with "java Scanner Scanner.java",
where the scanner is applied to itself.
Parser:
A parser of a compiler builds a parse tree
representation of a stream of tokens.
The grammar of a programming language defines the parse tree structure
produced by a parser given a syntactically valid program fragment.
Top-down parser:
Also called a predictive parser. This type of parser proceeds building
a parse tree from the root down. An example top-down parser is a recursive
descent parser.
Bottom-up parser:
This type of parser proceeds building a parse tree from the bottom
up.
Recursive descent
parser:
A top-down parser
based on recursive functions.
Consider for example, the following LL(1) grammar
<expr> -> <term> <term_tail>
<term_tail> -> <add_op> <term> <term_tail> | ε
<term> -> <factor> <factor_tail>
<factor_tail> -> <mult_op> <factor> <factor_tail> | ε
<factor> -> ( <expr> ) | - <factor> | identifier | unsigned_integer
<add_op> -> + | -
<mult_op> -> * | /
For this LL(1) grammar a recursive descent parser in Java is:
import java.io.*;
public class CalcParser
{ private static StreamTokenizer tokens;
private static int ahead;
public static void main(String argv[]) throws IOException
{ InputStreamReader reader = new InputStreamReader(System.in);
tokens = new StreamTokenizer(reader);
tokens.ordinaryChar('.');
tokens.ordinaryChar('-');
tokens.ordinaryChar('/');
get();
expr();
if (ahead == (int)'$')
System.out.println("Syntax ok");
else
System.out.println("Syntax error");
}
private static void get() throws IOException
{ ahead = tokens.nextToken();
}
private static void expr() throws IOException
{ term();
term_tail();
}
private static void term_tail() throws IOException
{ if (ahead == (int)'+' || ahead == (int)'-')
{ add_op();
term();
term_tail();
}
}
private static void term() throws IOException
{ factor();
factor_tail();
}
private static void factor_tail() throws IOException
{ if (ahead == (int)'*' || ahead == (int)'/')
{ mult_op();
factor();
factor_tail();
}
}
private static void factor() throws IOException
{ if (ahead == (int)'(')
{ get();
expr();
if (ahead == (int)')')
get();
else System.out.println("closing ) expected");
}
else if (ahead == (int)'-')
{ get();
factor();
}
else if (ahead == tokens.TT_WORD)
get();
else if (ahead == tokens.TT_NUMBER)
get();
else System.out.println("factor expected");
}
private static void add_op() throws IOException
{ if (ahead == (int)'+' || ahead == (int)'-')
get();
}
private static void mult_op() throws IOException
{ if (ahead == (int)'*' || ahead == (int)'/')
get();
}
}
This parser does not construct
a parse tree but verifies whether a string terminated with a $ is an expression.
A recursive descent
parser to evaluate simple expressions:
Get Java source
To run the example, save the source with file name "Calc.java", compile
it with "javac Calc.java" and run it with "java Calc".
A recursive descent
parser to translate simple expressions into Lisp expressions:
Get Java source of the CalcAST class
Get Java source of the AST class
To run the example, save the CalcAST class source with file name "Calc.java",
save the AST class source with file name "AST.java", compile it with "javac
AST.java CalcAST.java" and run it with "java CalcAST".
Compiler:
A compiler translates source programs into assembly code,
machine code, or code for a virtual machine.
Just-in-time compiler:
A translator of intermediate code (e.g. for a virtual machine)
into machine code for a particular platform. The translation is done just
before the program is executed. Just-in-time compilers are available for
many types of machines to translate Java byte code into native machine
code.
Virtual machine:
A virtual machine (VM) executes machine instructions in software.
The Java Virtual Machine is an abstract computing machine. Like a real
computing machine, it has an instruction set and uses various memory areas. It
is reasonably common to implement a programming language using a virtual
machine; the best-known virtual machine may be the P-Code machine of UCSD
Pascal.
For more information about the Java Virtual Machine, see the Java
Virtual Machine Specification.
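A minimal C sketch of the fetch-decode-execute loop at the heart of a virtual
machine (the tiny stack-based instruction set here is made up):
#include <stdio.h>

/* a tiny virtual machine with four instructions: PUSH n, ADD, PRINT, HALT */
enum { PUSH, ADD, PRINT, HALT };

int main(void)
{ int code[] = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };   /* the "byte code" */
  int stack[16], sp = 0, pc = 0;
  for (;;)                               /* fetch-decode-execute loop       */
  { switch (code[pc++])
    { case PUSH:  stack[sp++] = code[pc++]; break;
      case ADD:   sp--; stack[sp-1] += stack[sp]; break;
      case PRINT: printf("%d\n", stack[sp-1]); break;    /* prints 5        */
      case HALT:  return 0;
    }
  }
}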
Interpreter:
An interpreter is a virtual machine for a high-level language.
Loader:
Because the memory addressing in older systems was typically flat,
a loader was required to place a binary executable program in memory. The
loader modified the absolute addresses used for jumps and static data within
a program to reflect the change in addressing by the placement of the code
at a particular address in memory.
Linker:
A linker merges object code files
and static library routines together to produce a binary executable program.
Preprocessor:
A preprocessor applies macro expansion
to a source program. In C and C++ for example, #define macros
are expanded and header files are included in the source for the first phase
of compiler analysis (lexical analysis by the scanner).
Macro:
A definition of a name for a text fragment. Macro expansion is a repeated
process that replaces the occurrences of macro names in a text with the
textual content of the macro.
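A minimal C sketch of macro definition and expansion (PI and SQR are made-up
macro names):
#include <stdio.h>

#define PI      3.14159                  /* name for a text fragment */
#define SQR(x)  ((x) * (x))              /* macro with a parameter   */

int main(void)
{ double r = 2.0;
  /* after macro expansion the next line reads:
     double area = 3.14159 * ((r) * (r));                            */
  double area = PI * SQR(r);
  printf("%g\n", area);
  return 0;
}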
Exception:
Function:
Procedure:
Formal parameter:
A formal parameter is a parameter declared with a subroutine definition.
It is an identifier referring to the value of an actual parameter
when the subroutine is called. For example, in the C program fragment
main(int argc, char *argv[])
{ ... }
both argc and argv are formal parameters in main's
definition.
Formal parameters are also known as dummy arguments in Fortran.
Actual parameter:
A value or reference to an object which is passed to a function or
procedure.
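A minimal C sketch showing the difference between formal and actual parameters,
reusing the gcd function from the procedural language example:
/* a and b are the formal parameters of gcd */
int gcd(int a, int b)
{ while (a != b)
    if (a > b) a = a-b; else b = b-a;
  return a;
}

int main(void)
{ int x = 36;
  return gcd(x, 24);     /* x and 24 are the actual parameters of this call */
}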
Scope:
Static scope:
Dynamic Scope:
Exceptions and exception handling:
Inheritance:
Software development
environment:
An integrated software development environment (IDE) offers a source
code editor, compiler, linker, and debugger.
Side effect:
A side effect is an intentional modification of the value of a location
in memory to affect the global state of the machine. Side effects can change
the behavior of a function across multiple function calls. A function that
returns a value that depends solely on the values of the parameters passed
to it is called side-effect free.
Example of a function with a side effect:
int sum = 0;
int accumulate(int value)
{ sum += value;
return sum;
}
Referentially transparent:
A referentially transparent expression is composed of side-effect
free functions and operators. No side-effects may occur in the evaluation
of the expression. As a result, the expression evaluates to a value that
depends solely on the values of the variables used in the expression.
An example of a non-referentially transparent expression (where 'accumulate'
and 'sum' are defined in the side effect example above)
is:
accumulate(2) + sum
This expression uses a function with a side effect. The value of this expression
is undetermined in C and C++, because these languages allow different operator
evaluation orders, which means that the value of 'sum' may or may not have
been updated through the 'accumulate' call.
Recursion:
Higher-order function:
Functions that take other functions as input parameters or return newly
constructed functions.
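A minimal C sketch of the first case, passing a function as an input parameter
via a function pointer (map and square are made-up names):
#include <stdio.h>

/* apply a function f to every element of an array: f is an input parameter */
void map(int (*f)(int), int *a, int n)
{ int i;
  for (i = 0; i < n; i++)
    a[i] = f(a[i]);
}

int square(int x) { return x * x; }

int main(void)
{ int a[3] = { 1, 2, 3 };
  map(square, a, 3);                         /* a becomes { 1, 4, 9 } */
  printf("%d %d %d\n", a[0], a[1], a[2]);
  return 0;
}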