Robert van Engelen and Steven Bronson
Last update: March 29, 2012 9:46 AM
http://www.cs.fsu.edu/~engelen/courses/COT5315
Choice of additional topics
[SSPL] Syntax and Semantics of Programming Languages by Ken Slonneger and Barry Kurtz.
[TPL] Types and Programming Languages by Benjamin Pierce
[FP] Functional Programming by Anthony Field and Peter Harrison
All of the example programs used in this talk are available for download. The examples run with SWI-Prolog 5. Newer versions of SWI-Prolog broke the syntax that is used in these examples.
A metalanguage is a meta-level language and notation used to define a language. The commonly used BNF grammar notation is a meta-level notation for defining the syntax of a (programming) language. For example, consider the expression language NB (expressions over Booleans and natural numbers):
E ::= true | false | if E then E else E | 0 | succ E | pred E | iszero E
where E is a metavariable, namely the nonterminal that defines the syntactic category of NB expressions, and ::= and | are meta-operators in the metalanguage.
There are many kinds of metalanguages that can be used to define a (programming) language and we will describe some of them. First, let's distinguish concrete syntax from abstract syntax.
A concrete syntax defines the set of words of a language concretely, where words are strings (sequences) of tokens (or terminals) from a given alphabet of symbols (or signs). A grammar defines the concrete syntax of a language. Positional information may be relevant in the syntax, as well as punctuation symbols such as parentheses, commas, semicolons, and so on. A parser produces a concrete parse tree given a word that is syntactically correct.
For example, given the concrete syntax of NB expressions E defined by the BNF grammar above and the word "if iszero pred succ 0 then if true then 0 else succ 0 else 0", the resulting parse tree is:
[Figure: concrete parse tree. The root E expands to "if E then E else E"; the condition E derives "iszero pred succ 0" through a chain of nested E nodes, the then-branch E derives "if true then 0 else succ 0", and the else-branch E derives "0".]
An abstract syntax inductively defines the set of terms (or expressions) of a language by a finite set of abstract constructs over terms. A term is well-formed if it is derivable from the (abstract) syntax. A term is either an atom or a k-ary functor with k arguments that are terms. In the example abstract syntax for NB expressions E defined below, true, false, and 0 are atoms, if is a 3-ary functor, and succ, pred, and iszero are unary functors:
E ::= true | false | if(E, E, E) | 0 | succ(E) | pred(E) | iszero(E)
Because terms are composed of atoms and functors over terms, terms can be viewed as data structures, commonly referred to as abstract syntax trees (ASTs). An abstract syntax tree compactly represents a term without the unnecessary syntactic details found in concrete syntax trees, such as nonterminals and parentheses for grouping expressions.
For example, the abstract syntax tree of the term if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0) is:
[Figure: abstract syntax tree with root if and three subtrees: the chain iszero, pred, succ, 0; the inner if with children true, 0, and succ(0); and the leaf 0.]
Instead of defining the abstract syntax of E by a grammar as above, we can also define terms inductively using set theory as a metalanguage. The set of terms is the smallest set T such that:
1. {true, false, 0} ⊆ T;
2. if t_{1} ∈ T, then {succ(t_{1}), pred(t_{1}), iszero(t_{1})} ⊆ T;
3. if t_{1}, t_{2}, t_{3} ∈ T, then {if(t_{1}, t_{2}, t_{3})} ⊆ T.
A concrete definition of the set T is
S_{0} = ∅
S_{i+1} = {true, false, 0} ∪ {succ(t_{1}), pred(t_{1}), iszero(t_{1}) | t_{1} ∈ S_{i}} ∪ {if(t_{1}, t_{2}, t_{3}) | t_{1}, t_{2}, t_{3} ∈ S_{i}}
T = ∪_{i} S_{i}
Again, we are defining T as a set of terms that are trees, not strings.
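The construction of the sets S_{i} can be made tangible by counting their elements. The following is a minimal sketch, assuming only the definition of S_{i+1} given above; the predicate name s_size/2 is a hypothetical helper and not part of the course files. Because the three parts of the union are built from different functors they are disjoint, so the sizes simply add up.

```prolog
% s_size(+I, -N): N is the number of terms in the set S_I.
% Hypothetical helper for illustration only.
s_size(0, 0).                       % S_0 is the empty set
s_size(I, N) :-
    I > 0,
    J is I - 1,
    s_size(J, M),                   % M = |S_{I-1}|
    % 3 atoms, 3 unary functors over M terms, one 3-ary functor over M^3 triples
    N is 3 + 3*M + M*M*M.
```

For example, the query ?- s_size(3, N). gives N = 59439, since S_{1} has 3 elements and S_{2} has 3 + 9 + 27 = 39 elements.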
We can also define the set of terms T by inference rules in the "natural deduction style" presentation of logical Post systems, where each inference rule is of the form
premises
----------
conclusion
where the conclusion is a term t and the premises are n terms t_{i}:
t_{1} t_{2} ⋅⋅⋅ t_{n}
---------------------
t
When n=0 we will simply write
-----
t
which is an axiom (or a fact).
In the sequel, we will consider terms constructed over atoms, functors, and (meta)variables. A term is closed (or ground) when it contains no free variables. A term containing free variables is a theorem. A Post system metavariable in a theorem can be instantiated to any term. An instance of an inference rule is obtained by replacing each metavariable by the same term in the rule's conclusion and premises (if any).
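A proof is inductively defined as a finite set of inference rule instances such that, whenever
t_{1} t_{2} ⋅⋅⋅ t_{n}
---------------------
t
is an instance of an inference rule and Pr_{1}, Pr_{2}, ..., Pr_{n} are proofs of the premises t_{1}, t_{2}, ..., t_{n},
Pr_{1} Pr_{2} ⋅⋅⋅ Pr_{n}
------------------------
t
is a proof of t.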
A term is provable if a proof can be constructed.
Consider the axiom
-------
0
and inference rule
X
-------
succ(X)
where 0 is an atom, succ is a unary functor, and X is a variable. The axiom "proves" the existence of the atom 0 as a fact. The inference rule derives new terms of the form succ(X) given term X. Thus, given the instance
-------
0
of the axiom, the term succ(0) is provable by instantiating X=0 in the inference rule to obtain
0
-------
succ(0)
We generally write the complete proof as a derivation tree, oriented as an inverted tree, where the concluding provable term is at the bottom and the proofs of the premises branch out to the top as follows:
-------------
0
-------------
succ(0)
-------------
succ(succ(0))
In this case the derivation tree is "skinny", since we have only one premise in the rules. This is not always the case as we will see later.
To "connect" inference rule instances in a proof, we apply unification. In matching the concluding term of a rule to the term of a premise, the terms should be "structurally compatible". Unification means that the terms "are made equal" by instantiating their variables accordingly.
More precisely, unification is the process of finding the minimum number of substitutions for the variables in the two terms such that the two terms become equal.
When two terms are trees, all we need to do to unify these terms is traverse both trees in parallel and check if the nodes and leaves are identical. When a variable is encountered the variable is bound to the corresponding term in the other tree, including to other variables (which effectively become aliases).
Consider for example the two terms if(iszero(X), if(true, Y, succ(0)), 0) and if(iszero(pred(succ(U))), if(V, W, succ(W)), 0).
[Figure: the two terms depicted as trees, each with root if.]
Unification yields X = pred(succ(U)), V = true, and Y = W = 0. Note that Y and W are aliases and that variable U remains uninstantiated (remember that we should keep substitutions to a minimum, which means that we should not instantiate more variables than necessary to unify both terms). Unification is an equivalence relation, and is therefore symmetric (commutative), reflexive, and transitive. That is, t = t and if t_{1} = t_{2}, t_{2} = t_{3}, then t_{1} = t_{3} (though different variable instantiations may result as a side effect from unifications of terms t_{1} = t_{2}, t_{2} = t_{3}, and t_{1} = t_{3}).
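Prolog's built-in =/2 performs exactly this kind of unification, so the example can be replayed directly. The sketch below is only an illustration; the predicate name unify_demo/5 is a hypothetical wrapper chosen here to expose the variables of the two terms.

```prolog
% unify_demo(-X, ?U, -V, -W, -Y): unify the two example terms and
% expose the resulting variable bindings.
unify_demo(X, U, V, W, Y) :-
    T1 = if(iszero(X), if(true, Y, succ(0)), 0),
    T2 = if(iszero(pred(succ(U))), if(V, W, succ(W)), 0),
    T1 = T2.
```

Querying ?- unify_demo(X, U, V, W, Y). binds X to pred(succ(U)), V to true, and both Y and W to 0, while U stays unbound.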
Unification may create terms that are cyclic. For example, unifying succ(X) with succ(succ(X)) binds X = succ(X), thereby creating a cycle that represents the infinite term succ(succ(succ(succ(...)))). To avoid cycles, unification is applied with an occurs check. Normally, we do not assume that terms with cycles are produced in derivation trees for proofs. However, when cycles are allowed this will be explicitly stated. As we will see later, cycles can be useful in type checking.
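Plain =/2 in most Prolog systems skips the occurs check for speed. The ISO built-in unify_with_occurs_check/2 performs it, which the following minimal sketch uses; the two predicate names are hypothetical and serve only as an illustration.

```prolog
% Succeeds because the occurs check rejects the cyclic binding X = succ(X).
occurs_check_rejects :-
    \+ unify_with_occurs_check(succ(X), succ(succ(X))).

% An ordinary, acyclic unification is still accepted.
occurs_check_accepts(X) :-
    unify_with_occurs_check(succ(X), succ(succ(0))).
```

Here ?- occurs_check_accepts(X). binds X to succ(0).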
Exercise: show that plus(succ(succ(0)), succ(0)) = succ(succ(succ(0))) is a provable term using the four rules:
-------
0

X
-------
succ(X)

Y
--------------
plus(0, Y) = Y

plus(X, Y) = Z
--------------------------
plus(succ(X), Y) = succ(Z)
Our previous example of NB expressions can be defined by inference rules to define the set T of terms as follows:
--------
true ∈ T

---------
false ∈ T

-----
0 ∈ T

t_{1} ∈ T
---------------
succ(t_{1}) ∈ T

t_{1} ∈ T
---------------
pred(t_{1}) ∈ T

t_{1} ∈ T
-----------------
iszero(t_{1}) ∈ T

t_{1} ∈ T   t_{2} ∈ T   t_{3} ∈ T
---------------------------------
if(t_{1}, t_{2}, t_{3}) ∈ T
This defines the abstract syntax of expressions as provable terms. Provable terms are well formed with respect to the (abstract) syntax.
Exercise: show that if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0)
is a provable term.
We can directly implement the inference rules on NB terms in Prolog by defining a predicate is_term with seven clauses, consisting of three facts and four rules:
% PROLOG FILE: nbterms.pl
is_term(true).
is_term(false).
is_term(0).
is_term(succ(E1)) :- is_term(E1).
is_term(pred(E1)) :- is_term(E1).
is_term(iszero(E1)) :- is_term(E1).
is_term(if(E1, E2, E3)) :- is_term(E1), is_term(E2), is_term(E3).
Note that the predicate we defined, is_term(...), takes the place of the conclusion in the rule and that E1, E2, and E3 are variables. The premises, if any, appear at the right-hand side of the :- operator.
And indeed, we can query the Prolog system to prove that if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0) is a term, whereas if(a, 0, 0) is not:
?- [nbterms].
% nbterms compiled 0.00 sec, 1,960 bytes
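The two queries described above can be reproduced with the following self-contained sketch, which repeats the is_term/1 clauses of nbterms.pl so that it can be loaded on its own. The comments state only whether each goal succeeds or fails; the exact toplevel output depends on the Prolog version.

```prolog
% The seven clauses of is_term/1, repeated from nbterms.pl.
is_term(true).
is_term(false).
is_term(0).
is_term(succ(E1)) :- is_term(E1).
is_term(pred(E1)) :- is_term(E1).
is_term(iszero(E1)) :- is_term(E1).
is_term(if(E1, E2, E3)) :- is_term(E1), is_term(E2), is_term(E3).

% ?- is_term(if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0)).
%    succeeds: every subterm matches a fact or a rule.
% ?- is_term(if(a, 0, 0)).
%    fails: no clause of is_term/1 matches the atom a.
```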
When a Prolog goal succeeds, Prolog reports true and the bindings of variables of the solution are shown (if any). When a Prolog goal fails, Prolog reports false (or fail), obviously without any variable bindings. Thus, fail is not an error or an exception but rather a state. In a Prolog program, failure typically occurs as an internal state in the search for solutions.
To trace the rules and show the derivation tree of the proof:
?- [nbterms_rules].
% rule compiled 0.00 sec, 7,440 bytes
Prolog is a logic programming language based on logic deduction by rule inference using backward chaining. That is, in backward chaining we start with the final goal to prove (the term in the conclusion of a rule) and try to prove it by finding a matching rule. When a rule matches by unification, we then try to prove the subgoals (the premises), and so on.
Prolog uses term unification for rule matching and backtracking over rules to prove a goal. When a rule leads to a dead end and fails, backtracking finds another rule to try, which means that variable bindings established in the dead end must be undone.
Terms in Prolog form trees over atoms, functors and Prolog variables. Terms are inductively defined as follows:
- An atom is a name that starts with a lowercase letter (such as fooBar3), an operator or a concatenation of operators (such as +, =, ++, :=, =/=), a numeric constant (such as 12, 3.14), or any string of characters that is quoted (such as 'this is an atom').
- A functor applied to argument terms is a term (such as foo(12, 3.14, bar(+), =(12, X))).
- A variable is a name that starts with an uppercase letter (such as X, Foo, BAR).
In addition, the following conventions are used.
- Prefix and postfix operators are unary functors declared with op(precedence, fy, f) or op(precedence, yf, f), respectively. For example, +a represents the term +(a). Predefined prefix operators include +, -, \+, :-.
- Infix operators are binary functors that are left-associative, right-associative, or non-associative, declared with op(precedence, yfx, f), op(precedence, xfy, f), or op(precedence, xfx, f), respectively. For example, a+b+c represents the term +(+(a, b), c), which in tree form has root + with the subtree +(a, b) as its left child and c as its right child. Predefined infix operators include *, /, //, mod, +, -, <, =, =.., ==, >, >=, \=, \==, is, comma (,), ->, ;, |, and :-.
- A comma-separated sequence (a, b, c) represents the term ,(a, ,(b, c)), which in tree form has root , with a as its left child and the subtree ,(b, c) as its right child.
- Lists are written [t_{1}, t_{2}, ..., t_{n}] and are composed of 2-ary "dot" functors and the special atom [] to denote the empty list. That is, [a,b,c] represents .(a, .(b, .(c, []))), which in tree form is a right-nested chain of dot nodes ending in [].
- A list with tail t is written [... | t]. This allows for constructing new lists and for decomposing lists by unification. For example, we can unify [a,b,c,d] = [a,b | L] to get L = [c,d]. This is equivalent to unifying .(a, .(b, .(c, .(d, [])))) = .(a, .(b, L)), which gives L = .(c, .(d, [])) = [c,d].
- The underscore _ is a wildcard, meaning that each occurrence of _ is a new anonymous variable that can be instantiated to any term in unification.
- A variable whose binding is not used can be given a name that starts with _. A wildcard can be used as an alternative.
Useful built-in Prolog predicates are:
- N is expression: this infix predicate evaluates the arithmetic expression and assigns the value to variable N.
- var(X): succeeds if X is an unbound variable.
- atom(A): succeeds if A is an atom (a name).
- atomic(A): succeeds if A is an atom, a string, or a number.
- \+ goal: the logical "not", i.e. succeeds when goal fails, and vice versa.
- T = S: unifies terms T and S or fails.
- T == S: compares terms T and S and succeeds if they are equal (no unification).
- T =@= S: compares terms T and S and succeeds if they are structurally equal modulo variable names (no unification).
- T =.. Ts: the "univ" operator for term construction/splitting, constructs term T from a list Ts of terms starting with the functor name followed by its argument terms, for example foo(1,bar(+)) =.. [foo,1,bar(+)].
- N < M: number comparison.
- T @< S: term comparison.
- true: always succeeds.
- fail: always fails.
- !: the "cut" prevents (further) backtracking of the rules of the current predicate.
Common list predicates in Prolog are:
- append(Xs, Ys, Zs): appends lists Xs and Ys to give Zs.
- union(Xs, Ys, Zs): set union of Xs and Ys to give Zs (terms may occur only once in each list).
- member(X, Xs): finds term X in Xs.
- reverse(Xs, Ys): Ys is the reversed list of Xs.
- length(Xs, N): finds the length N of the list Xs.
- setof(X, goal, Xs): finds all solutions for variable X for which goal succeeds.
Most Prolog predicates are relational. That is, input and output are often (but not always) reversible.
Prolog predicates cannot be nested as terms as if they were functions. That is, the programming style is not a functional style, but somewhat comparable to imperative sequencing of statements, where typically the next predicate takes the result of the previous. When a predicate fails, Prolog backtracks to retry previous predicates, and so on. This makes it easy to implement generate-and-test solutions to problems.
For example:
?- member(b, [a,b,c]).
true.
?- member(X, [a,b,c]), b = X.
X = b .
?- member(X, [1,2,3,4,5]), X > 3.
X = 4 .
?- member(b, [a|Xs]).
Xs = [b|_G310] .
?- append([a,b], [c,d], Zs).
Zs = [a, b, c, d].
?- append(Xs, [c,d], [a,b,c,d]).
Xs = [a, b].
?- append(Xs, Ys, [a,b,c,d]), member(c, Xs).
Xs = [a, b, c], Ys = [d].
?- setof((Xs,Ys), append(Xs, Ys, [a,b,c,d]), Pairs).
Pairs = [ ([], [a, b, c, d]), ([a], [b, c, d]), ([a, b], [c, d]), ([a, b, c], [d]), ([a, b, c|...], [])].
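The operator notation and the term-inspection built-ins listed earlier can be tried in the same way. The following is a small sketch wrapped in predicates so that it can be loaded from a file; all predicate names ending in _demo and greater_than_three/2 are hypothetical and exist only for illustration.

```prolog
:- use_module(library(lists)).   % member/2

% Operator notation is only syntax: a+b+c is the term +(+(a, b), c).
operator_demo :- a+b+c == +(+(a, b), c).

% "univ" builds a term from a list and splits a term into a list.
univ_demo(T, Parts) :-
    T =.. [foo, 1, bar(+)],
    if(true, 0, succ(0)) =.. Parts.

% Decomposing a list with a tail pattern.
tail_demo(L) :- [a, b, c, d] = [a, b|L].

% =@= ignores variable names, whereas == does not.
variant_demo :- f(X, Y) =@= f(A, B), f(X, Y) \== f(A, B).

% Generate-and-test: member/2 generates candidates, X > 3 tests them.
greater_than_three(Xs, X) :- member(X, Xs), X > 3.
```

For instance, ?- univ_demo(T, Parts). binds T to foo(1, bar(+)) and Parts to [if, true, 0, succ(0)], and backtracking over ?- greater_than_three([1,2,3,4,5], X). yields X = 4 and then X = 5.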
Writing a program in Prolog amounts to defining a set of rules (Prolog clauses) for predicates, which are entered in a Prolog file with extension .pl. Multiple files are loaded from the Prolog command line with (a filename does not require the .pl extension):
?- [filename, filename, ...].
Definitions of inference rules (to be defined in files) are of the form:
head :- body.
where the head is a predicate (an atom or functor) and body is a conjunction of n subgoals:
head :- goal_{1}, goal_{2}, ..., goal_{n}.
If the body is simply true, we can omit the :- and state this as a fact:
head.
Predicates can be atoms, but that is not so useful, so we usually use functors for predicates. Predicates define properties of terms and relations between terms:
% PROLOG FILE: mary.pl
valuable(gold).
valuable(painting).
interesting(book).
interesting(painting).
father(john, mary).
mother(beth, mary).
gives(Parent, Object, Child) :- father(Parent, Child), valuable(Object), interesting(Object).
gives(Parent, Object, Child) :- mother(Parent, Child), valuable(Object), interesting(Object).
Suppose we need to determine what present mary receives from one of her parents:
?- [mary].
% mary compiled 0.00 sec, 2,888 bytes
true.
?- gives(P, X, mary).
P = john, X = painting .
When we trace the inference steps of the goal gives(P, X, mary) with trace/0, we see that backtracking over subgoals occurs (the _G### denote internal variables or new variables created by rule instantiations):
?- trace, gives(P, X, mary).
Tracers and debuggers are implemented in Prolog, as meta-level programs, for controlling and reasoning about logic programs.
To illustrate the use of the ! "cut", consider changing the second to last rule of the example:
gives(Parent, Object, Child) :- father(Parent, Child), !, valuable(Object), interesting(Object).
This cuts backtracking after father/2, which prevents the search for alternative matching rules for subgoals to the left of ! (backtracking over father/2 is cut in this case) and also cuts the backtracking over the current predicate (gives/3 in this case). Deeper backtrack points, if any, are not affected!
The "cut" is an extra-logical (or meta-logical) predicate, because it controls the logical inference process. Besides control predicates such as cut and negation \+, other extra-logical predicates are term inspection predicates such as var/1, which divert from the pure first-order Horn-clause logic programming paradigm. The reason to use "cut" is either for performance optimization or to limit solutions. A white cut is placed to prevent further matching of a predicate's clauses that will lead to non-matches or to failure anyway. A green cut is placed to limit backtracking to discard solutions that are not needed. A red cut is a cut that is incorrectly placed and causes the program to fail to produce solutions.
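As an illustration of a cut that limits solutions (a green cut in the classification used above), consider the following sketch. The predicate first_member/2 is a hypothetical example written for these notes; it behaves like member/2 but commits to the first match.

```prolog
% first_member(?X, +Xs): X is the first element of Xs that unifies with X.
% The cut discards the remaining alternatives once a match is found.
first_member(X, [X|_]) :- !.
first_member(X, [_|Xs]) :- first_member(X, Xs).
```

With an unbound X, ?- first_member(X, [a,b,c]). produces only X = a, whereas member(X, [a,b,c]) enumerates a, b, and c on backtracking. Whether discarding those solutions is acceptable depends on how the caller uses the predicate.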
Exercise: is the cut in the example above a white or a green cut?
Prolog is self-defining and allows terms to be executed as goals using call(T) for any term T that is not a variable.
The call predicate combined with cut and fail can be useful to implement meta-logical predicates:
if(G1, G2, G3) :- call(G1), !, call(G2).
if(G1, G2, G3) :- call(G3).
not(G) :- call(G), !, fail.
not(G).
and(G1, G2) :- call(G1), call(G2).
or(G1, G2) :- call(G1).
or(G1, G2) :- call(G2).
Note: the if has a built-in Prolog equivalent written as (G1 -> G2; G3), not has a built-in prefix operator "\+", and has a built-in "," (comma), and or has a built-in ";" (semicolon).
Exercise: what variables are instantiated when we query if(1=X, and(Z=Y, Z=2), Y=3)? What about not(and(X=1, X>2))?
Induction on terms provides a mechanism to determine various properties of terms. For our NB expression language, we can inductively define the set of constants appearing in a term:
% PROLOG FILE: nbterms_induc.pl
consts(true, Cs) :- Cs = [true].
consts(false, Cs) :- Cs = [false].
consts(0, Cs) :- Cs = [0].
consts(succ(E1), Cs) :- consts(E1, Cs).
consts(pred(E1), Cs) :- consts(E1, Cs).
consts(iszero(E1), Cs) :- consts(E1, Cs).
consts(if(E1, E2, E3), Cs) :- consts(E1, Cs1), consts(E2, Cs2), consts(E3, Cs3), union(Cs1, Cs2, Cs12), union(Cs12, Cs3, Cs).
to inductively define the size of a term:
% PROLOG FILE: nbterms_induc.pl
size(true, N) :- N = 1.
size(false, N) :- N = 1.
size(0, N) :- N = 1.
size(succ(E1), N) :- size(E1, K), N is K+1.
size(pred(E1), N) :- size(E1, K), N is K+1.
size(iszero(E1), N) :- size(E1, K), N is K+1.
size(if(E1, E2, E3), N) :- size(E1, K), size(E2, L), size(E3, M), N is K+L+M+1.
and to inductively define the depth of a term:
% PROLOG FILE: nbterms_induc.pl
depth(true, N) :- N = 1.
depth(false, N) :- N = 1.
depth(0, N) :- N = 1.
depth(succ(E1), N) :- depth(E1, K), N is K+1.
depth(pred(E1), N) :- depth(E1, K), N is K+1.
depth(iszero(E1), N) :- depth(E1, K), N is K+1.
depth(if(E1, E2, E3), N) :- depth(E1, K), depth(E2, L), depth(E3, M), N is max(max(K, L), M)+1.
For example:
?- [draw,nbterms_induc].
...
?- draw(if(iszero(pred(succ(0))),if(true,0,succ(0)),0)).
  ______if_______
 /       |       \
isz    _if___     0
 |    /  |   \
pre true 0   suc
 |            |
suc           0
 |
 0
true.
?- consts(if(iszero(pred(succ(0))),if(true,0,succ(0)),0), Cs).
Cs = [true, 0].
?- size(if(iszero(pred(succ(0))),if(true,0,succ(0)),0), N).
N = 11.
?- depth(if(iszero(pred(succ(0))),if(true,0,succ(0)),0), N).
N = 5.
There are three principles of induction on terms.
Induction on depth:
If, for each term s,
given P(r) for all r such that depth(r) < depth(s) we can show P(s),
then P(s) holds for all s.
Induction on size:
If, for each term s,
given P(r) for all r such that size(r) < size(s) we can show P(s),
then P(s) holds for all s.
Structural induction:
If, for each term s,
given P(r) for all immediate subterms r of s we can show P(s),
then P(s) holds for all s.
Exercise: prove that |consts(t)| ≤ size(t) for any NB expression t by induction on the depth of t. Answer: see TPL p.30.
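Before attempting the proof, the claim can be checked mechanically on all small terms. The sketch below is not a proof, only a sanity check. It restates consts/2 and size/2 in compact form so that it is self-contained, and it introduces two hypothetical helpers, nb_term/2 and lemma_holds/1, that are not part of the course files.

```prolog
:- use_module(library(lists)).   % member/2, union/3

consts(true, [true]).
consts(false, [false]).
consts(0, [0]).
consts(succ(E), Cs) :- consts(E, Cs).
consts(pred(E), Cs) :- consts(E, Cs).
consts(iszero(E), Cs) :- consts(E, Cs).
consts(if(E1, E2, E3), Cs) :-
    consts(E1, Cs1), consts(E2, Cs2), consts(E3, Cs3),
    union(Cs1, Cs2, Cs12), union(Cs12, Cs3, Cs).

size(true, 1).
size(false, 1).
size(0, 1).
size(succ(E), N) :- size(E, K), N is K+1.
size(pred(E), N) :- size(E, K), N is K+1.
size(iszero(E), N) :- size(E, K), N is K+1.
size(if(E1, E2, E3), N) :-
    size(E1, K), size(E2, L), size(E3, M), N is K+L+M+1.

% nb_term(+I, -T): enumerates on backtracking the terms T of the set S_I.
nb_term(I, T) :- I > 0, member(T, [true, false, 0]).
nb_term(I, T) :-
    I > 1, J is I-1,
    member(F, [succ, pred, iszero]),
    nb_term(J, A),
    T =.. [F, A].
nb_term(I, if(A, B, C)) :-
    I > 1, J is I-1,
    nb_term(J, A), nb_term(J, B), nb_term(J, C).

% lemma_holds(+I): |consts(T)| =< size(T) for every term T in S_I.
lemma_holds(I) :-
    forall(nb_term(I, T),
           ( consts(T, Cs), length(Cs, N), size(T, S), N =< S )).
```

The query ?- lemma_holds(2). succeeds after checking the 39 terms of S_{2}.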
Denotational semantics takes an abstract view of the meaning of a program by formalizing the semantics of a programming construct as a mathematical object. Semantic functions map a program's syntactic programming constructs to denotations, where the mappings are defined by a set of semantic equations. Denotations are mathematical objects from a semantic (value) domain. The mathematical object produced for a program is a function object. This function object maps the program's inputs to its outputs. The semantic domains of the inputs and outputs of the function object may be the same as that of the program. However, we can map the program's values to new value domains. The function object itself is expressed in a well-defined language of a calculus or logic.
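There are five components in a denotational semantics definition of a given language L:
- the syntactic domains (syntactic categories) of L;
- the abstract production rules that define the abstract syntax of L;
- the semantic (value) domains;
- the semantic functions that map syntactic domains to semantic domains;
- the semantic equations that define the semantic functions.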
Consider the abstract syntax of NB with one syntactic domain E of expressions with seven abstract production rules:
E ::= true | false | if(E, E, E) | 0 | succ(E) | pred(E) | iszero(E)
We assume that the values computed by NB expressions are Booleans and the natural numbers. This is formalized by defining the semantic value domain NB:
NB = {t, f} ∪ ℕ = {t, f, 0, 1, ...}
The signature of the semantic function D that maps constructs from the syntactic domain E to denotations is:
D : E → NB
The semantic equations are:
D⟦true⟧ = t
D⟦false⟧ = f
D⟦if(E1, E2, E3)⟧ = D⟦E2⟧ if D⟦E1⟧ = t
D⟦if(E1, E2, E3)⟧ = D⟦E3⟧ if D⟦E1⟧ = f
D⟦0⟧ = 0
D⟦succ(E)⟧ = D⟦E⟧ + 1
D⟦pred(E)⟧ = max(0, D⟦E⟧ − 1)
D⟦iszero(E)⟧ = t if D⟦E⟧ = 0
D⟦iszero(E)⟧ = f if D⟦E⟧ > 0
Emphatic brackets ⟦ ⟧ are used to separate the syntactic world (terms in the syntactic domain) from the semantic world (denotations in the semantic domain).
The powerful principle of compositionality can be exploited with denotational semantics because of the inductive structure of the semantic equations. As a consequence, semantic functions are homomorphisms, which means they respect operations. The function H is a homomorphism if H(f(x)) = g(H(x)).
x ------H------> H(x)
|                |
f                g
↓                ↓
f(x) ----H-----> H(f(x)) = g(H(x))
Clearly, function D is a homomorphism. An operation f in the syntactic domain has an equivalent operation g in the semantic domain.
Instead of mathematical objects, we can also use a higher-level programming language to define semantic functions and select certain types of values in the programming language for the semantic value domains.
For example, we can construct a Prolog program for the NB expression language and choose for the semantic value domains the set of Prolog atoms t, f, 0, 1, 2, ...:
% PROLOG FILE: nbterms_denot.pl
d(true, t).
Prolog predicates are relational (to a limited extent), so the input and output roles of a predicate's argument can be reversed, viz. succ(VT, V) and succ(V, VT) used in the clauses of predicate d/2 (predicates are often referred to by name/arity). Hence, we can apply the semantic function to an NB expression to compute its value, and vice versa:
?- [nbterms_denot].
...
?- d(succ(pred(succ(succ(0)))), V).
V = 2.
?- d(if(iszero(0),if(false,0,succ(0)),0), V).
V = 1.
?- d(T, t).
T = true .
?- d(T, 3).
T = succ(succ(succ(0))) .
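The semantic equations for D translate almost line by line into Prolog clauses. The following is a minimal sketch under stated assumptions: it is not the course's nbterms_denot.pl, the predicate name denote/2 is hypothetical, and it only works in the forward direction (from a ground NB term to its value), so it does not reproduce the reversible queries shown above.

```prolog
% denote(+E, -V): V is the denotation of the ground NB term E,
% following the semantic equations for D (forward direction only).
denote(true, t).
denote(false, f).
denote(0, 0).
denote(if(E1, E2, _E3), V) :- denote(E1, t), denote(E2, V).
denote(if(E1, _E2, E3), V) :- denote(E1, f), denote(E3, V).
denote(succ(E), V) :- denote(E, V0), integer(V0), V is V0 + 1.
denote(pred(E), V) :- denote(E, V0), integer(V0), V is max(0, V0 - 1).
denote(iszero(E), t) :- denote(E, 0).
denote(iszero(E), f) :- denote(E, V), integer(V), V > 0.
```

For the two forward queries of the transcript this sketch gives the same values: succ(pred(succ(succ(0)))) denotes 2 and if(iszero(0),if(false,0,succ(0)),0) denotes 1. The integer/1 tests make ill-typed terms such as succ(true) fail instead of raising an arithmetic error.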
However, Prolog is not purely relational and the use of control and meta-logical predicates (such as "cut", \+, var/1) and other non-relational predicates in clauses often prevents predicates from being "reversible". Another problem is nontermination of inference by backward chaining. The termination property is very sensitive to clause orderings. For example, reversing the two clause definitions:
d(succ(T), V) :- d(T, VT), integer(VT), succ(VT, V).
leads to nontermination of the goal d(T, 3), whereas d(T, V) for any term T still terminates. This makes it generally difficult to design and implement a reversible predicate.
More on denotational semantics later. See also Syntax and Semantics of Programming Languages Chapter 9.
Axiomatic semantics derives laws from the definitions of imperative programming language constructs. These laws define the meaning of the program by means of describing the initial and final state of a computation, and can be used to verify the correctness of a program or algorithm.
Axiomatic semantics was covered in COP4020.
Operational semantics defines the meaning of a language by the operations of an abstract machine. The machine operates on the abstract syntax tree of terms of the language by applying transition functions on terms. Hence, the state of the machine is just the term it is operating on. The operational semantic meaning of a term t of the language we define is the final state (term) that is reached when the machine halts after starting with the initial state (the term t).
The one-step evaluation relation on terms, written t → s and meaning "t evaluates to s", is a one-step transformation to modify term t into s. The one-step evaluation relation represents a transformation function from a term (state) t to another term (next state) s.
Consider the abstract syntax of the language NB of expressions:
E ::= true | false | if(E, E, E) | 0 | succ(E) | pred(E) | iszero(E)
where we want NB expressions to compute values v over Booleans true and false and natural numbers nv, expressed by the following abstract syntax of value terms of NB:
v ::= true | false | nv
nv ::= 0 | succ(nv)
The operational semantics one-step evaluation rules for NB are:
------------------------------- (E-IfTrue)
if(true, T_{2}, T_{3}) → T_{2}

------------------------------- (E-IfFalse)
if(false, T_{2}, T_{3}) → T_{3}

T_{1} → T'_{1}
-------------------------------------------------- (E-If)
if(T_{1}, T_{2}, T_{3}) → if(T'_{1}, T_{2}, T_{3})

T_{1} → T'_{1}
-------------------------- (E-Succ)
succ(T_{1}) → succ(T'_{1})

----------- (E-PredZero)
pred(0) → 0

--------------------------- (E-PredSucc)
pred(succ(nv_{1})) → nv_{1}

T_{1} → T'_{1}
-------------------------- (E-Pred)
pred(T_{1}) → pred(T'_{1})

---------------- (E-IsZeroZero)
iszero(0) → true

---------------------------- (E-IsZeroSucc)
iszero(succ(nv_{1})) → false

T_{1} → T'_{1}
------------------------------ (E-IsZero)
iszero(T_{1}) → iszero(T'_{1})
Axioms (E-IfTrue, E-IfFalse, E-PredZero, E-PredSucc, E-IsZeroZero, E-IsZeroSucc) are computation rules. Rules with premises (E-If, E-Succ, E-Pred, E-IsZero) are congruence rules and define an evaluation strategy expressing which parts of the term to evaluate. For example, the conditional in the if must be evaluated first.
A rule is satisfied by a relation if for each instance of the rule, either the conclusion is in the relation or one of the premises is not. Basically, the evaluation relation that we want t → s should be the smallest binary relation on terms satisfying the rules shown above. That is, the relation should only include pairs (t, s) of terms t and s that are derivable. More formally, when the pair (t, s) is in the evaluation relation (t, s) ∈ →, we say that "the evaluation statement (or judgement) t → s is derivable."
Basically, we are stating that all pairs (t, s) from the provable terms t → s are in the evaluation relation (t, s) ∈ →, no more, no less. Hence, we can consider the inference rules to define the smallest evaluation relation satisfying the rules.
A term t is in normal form (t is a normal form, or t is a canonical form) if no evaluation rule applies to it. That is, there is no term s such that t → s.
We will state some useful properties of NB.
THEOREM: Every NB value is in normal form.
Proof: The values true, false, and 0 are normal forms, because they do not appear on the left of the evaluation relation in any rule. Values succ(t) with t in normal form are normal forms, because t is in normal form and the premise in (E-Succ) is not provable. ∎
Note that we only consider well-formed terms. That is, terms defined by the (abstract) syntax NB for expressions and values. With this assumption we can state the following.
THEOREM [Completeness of NB]: If t is a well-formed NB term in normal form, then t is an NB value.
Proof: By structural induction on t. ∎
We enforce well-formedness to prevent admitting terms such as succ(true) that are normal forms but meaningless values.
In general, we may encounter terms that are stuck in normal form but are not a value. That is, the operational semantics has reached a "meaningless state" comparable to the notion of a runtime error. In a concrete implementation of the language these states might correspond to failures of various kinds: segmentation faults, exceptions, etc. For example, evaluating succ(n) might fail when n is the maximum machine representation of a number.
A common approach to formalize the notion of meaningless states in an abstract machine is to introduce a special term ⊥ called bottom. We can augment NB with the value ⊥ and additional evaluation rules:
-------------- (E-SuccTrue)
succ(true) → ⊥

--------------- (E-SuccFalse)
succ(false) → ⊥

-------------- (E-PredTrue)
pred(true) → ⊥

--------------- (E-PredFalse)
pred(false) → ⊥
and the following bottom-preserving evaluation rules
-------------------------- (E-IfBottom)
if(⊥, T_{2}, T_{3}) → ⊥

----------- (E-SuccBottom)
succ(⊥) → ⊥

----------- (E-PredBottom)
pred(⊥) → ⊥

------------- (E-IsZeroBottom)
iszero(⊥) → ⊥
A bottom-preserving function (or operation) is one that produces ⊥ when one of its operands evaluates to ⊥. This "propagates" the error as a result.
Another important property of NB is that we have no more than one choice of an evaluation rule for a given term.
THEOREM [Determinacy of the one-step evaluation of NB]: If t → s and t → r, then s = r in NB.
Proof: By structural induction. Base case: the property holds for all computation rules (axioms) of NB. For the congruence rules, starting with (E-If), we note that the conclusions of (E-If) and of (E-IfTrue, E-IfFalse) both match if(T_{1}, T_{2}, T_{3}). However, when T_{1}=true or T_{1}=false the (E-If) premise is not derivable (since T_{1} is a value) and only (E-IfTrue) or (E-IfFalse) is applicable. By the induction hypothesis, if T_{1} → T'_{1} is deterministic, so is if(T_{1}, T_{2}, T_{3}) → if(T'_{1}, T_{2}, T_{3}). The same conclusions can be made for the other congruence rules by noting that values are in normal form. ∎
In general, we are interested in evaluating terms t to a value u (a normal form) through multiple steps t → s → ... → u using the multi-step evaluation relation t →^{*} s defined as the reflexive, transitive closure of the one-step evaluation, where →^{*} satisfies
t →^{*} s if t → s
t →^{*} r if t →^{*} s and s →^{*} r
t →^{*} t
THEOREM [Consistency of NB]: Normal forms are unique, that is, if t →^{*} u and t →^{*} u' for normal forms u and u', then u = u'.
Proof: Immediately follows from the determinacy of the onestep evaluation. ∎
THEOREM [Termination of evaluation in NB]: For every term t there is some normal form u such that t →^{*} u.
A Prolog definition of the one-step and multi-step evaluation relations is straightforward:
% PROLOG FILE: nbterms_eval.pl
% Meta ops
:- op(950, xfx, :>). % One-step evaluation relation
:- op(950, xfx, *>). % Multi-step reflexive transitive closure of :>
T *> R :- T :> S, !, S *> R.
C *> C.
%_____________CONCLUSION_____________ :- ________RULE________, _PREMISES.
The rule/1 predicates are used for meta-level tracing, which is demonstrated as follows:
?- [nbterms_eval].
...
?- show, if(iszero(0),if(false,0,succ(0)),0) *> C.
 _______if________
/       |         \
isz   _if____      0
|    /  |    \
0  false 0   suc
              |
              0
[1,1]:E-IsZeroZero
iszero(0):>true
iszero(0):>true
[1]:E-If
if(iszero(0),if(false,0,succ(0)),0):>if(true,if(false,0,succ(0)),0)
 _______if_________
/        |         \
true   _if____      0
      /  |    \
   false 0    suc
               |
               0
[2]:E-IfTrue
if(true,if(false,0,succ(0)),0):>if(false,0,succ(0))
   _if____
  /  |    \
false 0   suc
           |
           0
[3]:E-IfFalse
if(false,0,succ(0)):>succ(0)
suc
 |
 0
C = succ(0).
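The rule clauses of nbterms_eval.pl are not reproduced in these notes. As a rough illustration of how the one-step rules map to Prolog, here is a minimal sketch without the tracing machinery. It uses ordinary predicates instead of the :> and *> operators, and the names nv/1, step/2, and eval/2 are assumptions made for this sketch only.

```prolog
% nv(T): T is a numeric value, nv ::= 0 | succ(nv).
nv(0).
nv(succ(NV)) :- nv(NV).

% step(T, S): T evaluates to S in one step, one clause per rule.
step(if(true, T2, _), T2).                                 % E-IfTrue
step(if(false, _, T3), T3).                                % E-IfFalse
step(if(T1, T2, T3), if(T1p, T2, T3)) :- step(T1, T1p).    % E-If
step(succ(T1), succ(T1p)) :- step(T1, T1p).                % E-Succ
step(pred(0), 0).                                          % E-PredZero
step(pred(succ(NV)), NV) :- nv(NV).                        % E-PredSucc
step(pred(T1), pred(T1p)) :- step(T1, T1p).                % E-Pred
step(iszero(0), true).                                     % E-IsZeroZero
step(iszero(succ(NV)), false) :- nv(NV).                   % E-IsZeroSucc
step(iszero(T1), iszero(T1p)) :- step(T1, T1p).            % E-IsZero

% eval(T, R): R is the normal form reached from T by repeated steps.
eval(T, R) :- step(T, S), !, eval(S, R).
eval(T, T).
```

The query ?- eval(if(iszero(0),if(false,0,succ(0)),0), C). yields C = succ(0), matching the trace above. A stuck term such as succ(true) is returned unchanged, because no rule applies to it even though it is not a value.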
We can also define the natural semantics (big-step semantics as opposed to the small-step style) for NB:
T_{1} ⇒ true   T_{2} ⇒ V_{2}
------------------------------- (B-IfTrue)
if(T_{1}, T_{2}, T_{3}) ⇒ V_{2}

T_{1} ⇒ false   T_{3} ⇒ V_{3}
------------------------------- (B-IfFalse)
if(T_{1}, T_{2}, T_{3}) ⇒ V_{3}

T_{1} ⇒ NV_{1}
-------------------------- (B-Succ)
succ(T_{1}) ⇒ succ(NV_{1})

T_{1} ⇒ 0
--------------- (B-PredZero)
pred(T_{1}) ⇒ 0

T_{1} ⇒ succ(NV_{1})
-------------------- (B-PredSucc)
pred(T_{1}) ⇒ NV_{1}

T_{1} ⇒ 0
-------------------- (B-IsZeroZero)
iszero(T_{1}) ⇒ true

T_{1} ⇒ succ(NV_{1})
--------------------- (B-IsZeroSucc)
iszero(T_{1}) ⇒ false

------ (B-Value)
V ⇒ V
The big-step evaluation relation t ⇒ u as defined by the rules above fully evaluates a term t to a value u.
The big-step rule (BValue) implicitly assumes that V is in normal form to "evaluate" it to V. That is, the choice of a metavariable V that ranges only over values helps control the order of evaluation. The rules are unordered, so this ensures that (BValue) is only applied when V is a value.
In an implementation of the rules we can select any rule ordering, but we want the (BValue) rule to be matched last so that V is in normal form, because the evaluation rules prior to (BValue) apply to terms that represent expressions that are evaluable.
The Prolog definition of the big-step evaluation rules is straightforward:
% PROLOG FILE: nbterms_eval.pl
% Meta ops
:- op(950, xfx, =>).   % Big-step evaluation relation
...
%________CONCLUSION_________ :- ________RULE________, ______PREMISES______.
It is clear that the (BValue) rule is to be defined last, unless we add a premise to verify that V is a value. Note that the ordering of premises is relevant for the efficiency of the implementation. Here, the condition of the if is evaluated first. For NB the order of the premises does not matter for correctness, because there is no state that can produce errors and all evaluations terminate (there are no loops and no recursion). However, a weak aspect of big-step operational semantics is that the order of the premises does become relevant when defining evaluation rules for a language with potentially nonterminating term evaluations, e.g. if(false, nonterminating_calculation, 0).
Given these definitions, the proof that if(iszero(0),if(false,0,succ(0)),0) ⇒ succ(0) is automatically derived as follows:
?- [nbterms_eval].
...
?- show, if(iszero(0),if(false,0,succ(0)),0) => C.
[1,1,1]:BValue 0=>0
0=>0
[1,1]:BIsZeroZero iszero(0)=>true
[1,2,1]:BValue false=>false
[1,2,2,1]:BValue 0=>0
0=>0
[1,2,2]:BSucc succ(0)=>succ(0)
false=>false,succ(0)=>succ(0)
[1,2]:BIfFalse if(false,0,succ(0))=>succ(0)
iszero(0)=>true,if(false,0,succ(0))=>succ(0)
[1]:BIfTrue if(iszero(0),if(false,0,succ(0)),0)=>succ(0)
C = succ(0) .
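The big-step rules can also be read as a recursive evaluator. The following C++ sketch is an illustration under assumptions, not the Prolog implementation: Term, mk, bigStep, and toInt are hypothetical names. The value case is handled last (the default branch), and, as in the derivation above, succ of a numeric value is rebuilt by BSucc rather than matched by BValue. A term for which no rule applies raises an exception.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical term representation for NB.
enum class Kind { True, False, Zero, Succ, Pred, IsZero, If };
struct Term;
using TermPtr = std::shared_ptr<const Term>;
struct Term {
    Kind kind;
    TermPtr a, b, c;  // subterms; unused ones stay null
};

TermPtr mk(Kind k, TermPtr a = nullptr, TermPtr b = nullptr, TermPtr c = nullptr) {
    return std::make_shared<const Term>(Term{k, a, b, c});
}

// Numeric values: 0, succ(0), succ(succ(0)), ...
bool isNumeric(const TermPtr& t) {
    return t->kind == Kind::Zero || (t->kind == Kind::Succ && isNumeric(t->a));
}

// Big-step evaluation t => v: each case evaluates its premises recursively.
TermPtr bigStep(const TermPtr& t) {
    assert(t);
    switch (t->kind) {
    case Kind::If: {
        TermPtr g = bigStep(t->a);                           // the condition is evaluated first
        if (g->kind == Kind::True)  return bigStep(t->b);    // BIfTrue
        if (g->kind == Kind::False) return bigStep(t->c);    // BIfFalse
        break;
    }
    case Kind::Succ: {
        TermPtr v = bigStep(t->a);
        if (isNumeric(v)) return mk(Kind::Succ, v);          // BSucc
        break;
    }
    case Kind::Pred: {
        TermPtr v = bigStep(t->a);
        if (v->kind == Kind::Zero) return v;                 // BPredZero
        if (v->kind == Kind::Succ && isNumeric(v)) return v->a;  // BPredSucc
        break;
    }
    case Kind::IsZero: {
        TermPtr v = bigStep(t->a);
        if (v->kind == Kind::Zero) return mk(Kind::True);    // BIsZeroZero
        if (v->kind == Kind::Succ && isNumeric(v)) return mk(Kind::False);  // BIsZeroSucc
        break;
    }
    default:
        return t;                                            // BValue: true, false, 0
    }
    throw std::runtime_error("no big-step rule applies");
}

// Converts a numeric value succ(...succ(0)) to an int by counting succ nodes.
int toInt(const TermPtr& v) {
    int n = 0;
    for (const Term* p = v.get(); p->kind == Kind::Succ; p = p->a.get()) ++n;
    return n;
}
```

Unlike the small-step version, no intermediate terms are built: the recursion stack holds the pending premises.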
Natural semantics (big-step operational semantics) closely resembles denotational semantics and can be viewed as a notational variant of it. By contrast, in small-step semantics a term is transformed step by step, where the terms represent the intermediate (machine) states of the evaluation. Natural semantics and denotational semantics do not transform the term being evaluated; rather, the (machine) state is explicitly described by a denotation or by a state value.
See Figure 8.4 p.241 from [SSPL Ch.8.4] for abstract syntax of Wren.
See Figure 8.6 p.247 from [SSPL Ch.8.5] for inference rules for the one-step evaluation relation → on Wren expressions.
THEOREM [Completeness of Wren expressions]: SSPL p.250 states that normal forms are values (numerals and Booleans) in Wren if the store contains bindings for all variables used in an expression and if the expression does not contain the division operation.
THEOREM [Consistency of Wren expressions]: SSPL p.251 states that normal forms are unique.
To implement the one-step evaluation relation for Wren, let's first simplify the abstract syntax a little by combining integer and Boolean expressions in one syntactic category Exp and by using a more familiar n-ary functor representation for Wren commands as terms:
Exp ::= true | false | NUM | Exp iop Exp | Exp rop Exp | Exp bop Exp | not(Exp)
The one-step evaluation relation for Wren expressions can be implemented as follows (similar rules are elided [...] for clarity):
% PROLOG FILE: wren.pl
...
%______________CONCLUSION______________ :- ___RULE___, ________PREMISES________.
(IE1 + IE2,STO) :> (IE1p + IE2,STO) :- rule('E1'), (IE1,STO) :> (IE1p,STO).
...
(IE1 < IE2,STO) :> (IE1p < IE2,STO) :- rule('E2'), (IE1,STO) :> (IE1p,STO).
...
(BE1 and BE2,STO) :> (BE1p and BE2,STO) :- rule('E3'), (BE1,STO) :> (BE1p,STO).
...
(IE1 + IE2,STO) :> (IE1 + IE2p,STO) :- rule('E4'), (IE2,STO) :> (IE2p,STO).
...
(IE1 < IE2,STO) :> (IE1 < IE2p,STO) :- rule('E5'), (IE2,STO) :> (IE2p,STO).
...
(BE1 and BE2,STO) :> (BE1 and BE2p,STO) :- rule('E6'), (BE2,STO) :> (BE2p,STO).
...
(N1 + N2,STO) :> (N,STO) :- rule('E7'), N is N1+N2.
...
(N1 < N2,STO) :> (B,STO) :- rule('E8'), (N1 < N2 -> B = true; B = false).
...
(B1 and B2,STO) :> (B,STO) :- rule('E9'), (B1 = true, B2 = true -> B = true; B = false).
...
(not(BE),STO) :> (not(BEp),STO) :- rule('E10'), (BE,STO) :> (BEp,STO).
...
(not(true),STO) :> (false,STO) :- rule('E11').
...
(ID,STO) :> (V,STO) :- rule('E12'), atom(ID), member(ID=V, STO).
Note that the store STO is a list of name=value bindings that is propagated along and needed when an identifier has to be looked up with member/2.
Example evaluation:
?- Exp = x+y+6, STO = [x=17,y=25], show, (Exp,STO) *> (Val,STO).
[term tree]
atom(x),member(x=17,[x=17,y=25])
[1,1,1]:E12 (x,[x=17,y=25]):> (17,[x=17,y=25])
(x,[x=17,y=25]):> (17,[x=17,y=25])
[1,1]:E1 (x+y,[x=17,y=25]):> (17+y,[x=17,y=25])
(x+y,[x=17,y=25]):> (17+y,[x=17,y=25])
[1]:E1 (x+y+6,[x=17,y=25]):> (17+y+6,[x=17,y=25])
[term tree]
atom(y),member(y=25,[x=17,y=25])
[2,1,1]:E12 (y,[x=17,y=25]):> (25,[x=17,y=25])
(y,[x=17,y=25]):> (25,[x=17,y=25])
[2,1]:E4 (17+y,[x=17,y=25]):> (17+25,[x=17,y=25])
(17+y,[x=17,y=25]):> (17+25,[x=17,y=25])
[2]:E1 (17+y+6,[x=17,y=25]):> (17+25+6,[x=17,y=25])
[term tree]
42 is 17+25
[3,1]:E7 (17+25,[x=17,y=25]):> (42,[x=17,y=25])
(17+25,[x=17,y=25]):> (42,[x=17,y=25])
[3]:E1 (17+25+6,[x=17,y=25]):> (42+6,[x=17,y=25])
[term tree]
48 is 42+6
[4]:E7 (42+6,[x=17,y=25]):> (48,[x=17,y=25])
[term tree]
Exp = x+y+6, STO = [x=17, y=25], Val = 48.
See Figure 8.8 p.254 from [SSPL Ch.8.6] for inference rules for the one-step evaluation relation → on Wren commands.
Commands are state transformers. That is, the execution of the commands of a program proceeds by a sequence <c_{0},st(in_{0},out_{0},sto_{0})> → <c_{1},st(in_{1},out_{1},sto_{1})> → <c_{2},st(in_{2},out_{2},sto_{2})> → ... where in is a queue of input values the program reads as input, out is a queue of output values the program writes, and sto is a list of name=value bindings.
Given this machine state model, we can say that two programs c_{1} and c_{2} are semantically equivalent if, for any input state s, they either produce the same final state s_{f} or both do not terminate on s. That is, <c_{1},s> →^{*} <skip,s_{f}> iff <c_{2},s> →^{*} <skip,s_{f}>, and <c_{1},s> → ∞ iff <c_{2},s> → ∞.
The one-step evaluation relation for Wren commands can be implemented as follows:
% PROLOG FILE: wren.pl
...
%_________________________CONCLUSION__________________________ :- ___RULE___, ______PREMISES_____.
(ID := E,st(IN,OUT,STO)) :> (ID := Ep,st(IN,OUT,STO)) :- rule('C1'), (E,STO) :> (Ep,STO).
(ID := V,st(IN,OUT,STO)) :> (skip,st(IN,OUT,STOp)) :- rule('C2'), STOp = [ID=V|STO].
(if(E,C1,C2),st(IN,OUT,STO)) :> (if(Ep,C1,C2),st(IN,OUT,STO)) :- rule('C3'), (E,STO) :> (Ep,STO).
(if(true,C1,_C2),STATE) :> (C1,STATE) :- rule('C4').
(if(false,_C1,C2),STATE) :> (C2,STATE) :- rule('C5').
(if(E,C),STATE) :> (if(E,C,skip),STATE) :- rule('C6').
(while(E,C),STATE) :> (if(E,(C;while(E,C))),STATE) :- rule('C7').
((C1;C2),STATE) :> ((C1p;C2),STATEp) :- rule('C8'), (C1,STATE) :> (C1p,STATEp).
((skip;C),STATE) :> (C,STATE) :- rule('C9').
(read(ID),st([V|IN],OUT,STO)) :> (skip,st(IN,OUT,STOp)) :- rule('C10'), STOp = [ID=V|STO].
(write(E),st(IN,OUT,STO)) :> (write(Ep),st(IN,OUT,STO)) :- rule('C11'), (E,STO) :> (Ep,STO).
(write(V),st(IN,OUT,STO)) :> (skip,st(IN,OUTp,STO)) :- rule('C12'), append(OUT, [V], OUTp).
We use functor st/3 to hold the state consisting of a list of input values, a list of output values, and a store. The store is a list that is populated by the assignment and read commands. These commands add a new binding name=value to the front of a new store STOp = [ID=V | STO]. Output for the write command is appended to the output list.
Example execution:
?- Cmd = (read(x); x := x+1; write(x)), STATE = st([5],[],[]), show, (Cmd,STATE) *> FINALSTATE.
[term tree]
[x=5]=[x=5]
[1,1]:C10 (read(x),st([5],[],[])):> (skip,st([],[],[x=5]))
(read(x),st([5],[],[])):> (skip,st([],[],[x=5]))
[1]:C8 ((read(x);x:=x+1;write(x)),st([5],[],[])):> ((skip;x:=x+1;write(x)),st([],[],[x=5]))
[term tree]
[2]:C9 ((skip;x:=x+1;write(x)),st([],[],[x=5])):> ((x:=x+1;write(x)),st([],[],[x=5]))
[term tree]
atom(x),member(x=5,[x=5])
[3,1,1,1]:E12 (x,[x=5]):> (5,[x=5])
(x,[x=5]):> (5,[x=5])
[3,1,1]:E1 (x+1,[x=5]):> (5+1,[x=5])
(x+1,[x=5]):> (5+1,[x=5])
[3,1]:C1 (x:=x+1,st([],[],[x=5])):> (x:=5+1,st([],[],[x=5]))
(x:=x+1,st([],[],[x=5])):> (x:=5+1,st([],[],[x=5]))
[3]:C8 ((x:=x+1;write(x)),st([],[],[x=5])):> ((x:=5+1;write(x)),st([],[],[x=5]))
[term tree]
6 is 5+1
[4,1,1]:E7 (5+1,[x=5]):> (6,[x=5])
(5+1,[x=5]):> (6,[x=5])
[4,1]:C1 (x:=5+1,st([],[],[x=5])):> (x:=6,st([],[],[x=5]))
(x:=5+1,st([],[],[x=5])):> (x:=6,st([],[],[x=5]))
[4]:C8 ((x:=5+1;write(x)),st([],[],[x=5])):> ((x:=6;write(x)),st([],[],[x=5]))
[term tree]
[x=6,x=5]=[x=6,x=5]
[5,1]:C2 (x:=6,st([],[],[x=5])):> (skip,st([],[],[x=6,x=5]))
(x:=6,st([],[],[x=5])):> (skip,st([],[],[x=6,x=5]))
[5]:C8 ((x:=6;write(x)),st([],[],[x=5])):> ((skip;write(x)),st([],[],[x=6,x=5]))
[term tree]
[6]:C9 ((skip;write(x)),st([],[],[x=6,x=5])):> (write(x),st([],[],[x=6,x=5]))
[term tree]
atom(x),member(x=6,[x=6,x=5])
[7,1]:E12 (x,[x=6,x=5]):> (6,[x=6,x=5])
(x,[x=6,x=5]):> (6,[x=6,x=5])
[7]:C11 (write(x),st([],[],[x=6,x=5])):> (write(6),st([],[],[x=6,x=5]))
[term tree]
append([], [6], [6])
[8]:C12 (write(6),st([],[],[x=6,x=5])):> (skip,st([],[6],[x=6,x=5]))
[term tree]
Cmd = (read(x);x:=x+1;write(x)), STATE = st([5], [], []), FINALSTATE = (skip, st([], [6], [x=6, x=5])).
See Figure 8.9 p.262 from [SSPL Ch.8.6] for inference rules for the natural semantics of Wren.
% PROLOG FILE: wren.pl
...
%_____________________CONCLUSION______________________ :- ___RULE___, _______________________PREMISES______________________________.
(IE1 + IE2,STO) => N :- rule('B1'), (IE1,STO) => N1, (IE2,STO) => N2, N is N1+N2.
...
(IE1 < IE2,STO) => B :- rule('B2'), (IE1,STO) => N1, (IE2,STO) => N2, (N1 < N2 -> B = true; B = false).
...
(BE1 and BE2,STO) => B :- rule('B3'), (BE1,STO) => B1, (BE2,STO) => B2, (B1 = true, B2 = true -> B = true; B = false).
...
(not(BE),STO) => B :- rule('B4'), (BE,STO) => B1, (B1 = true -> B = false; B = true).
(ID,STO) => V :- rule('B5'), atom(ID), member(ID=V, STO), !.
(V,_STO) => V :- rule('B6'), atomic(V).
(ID := E,st(IN,OUT,STO)) => st(IN,OUT,[ID=V|STO]) :- rule('B7'), (E,STO) => V.
(if(E,C1,C2),st(IN,OUT,STO)) => STATEp :- rule('B8'), (E,STO) => B,
    ( B = true -> (C1,st(IN,OUT,STO)) => STATEp
    ; B = false -> (C2,st(IN,OUT,STO)) => STATEp ).
(if(E,C),st(IN,OUT,STO)) => STATEp :- rule('B9'), (E,STO) => B,
    ( B = true -> (C,st(IN,OUT,STO)) => STATEp
    ; STATEp = st(IN,OUT,STO) ).
(while(E,C),st(IN,OUT,STO)) => STATEpp :- rule('B10'), (E,STO) => B,
    ( B = true -> (C,st(IN,OUT,STO)) => STATEp, (while(E,C),STATEp) => STATEpp
    ; STATEpp = st(IN,OUT,STO) ).
((C1;C2),STATE) => STATEpp :- rule('B11'), (C1,STATE) => STATEp, (C2, STATEp) => STATEpp.
(skip,STATE) => STATE :- rule('B12').
(read(ID),st([V|IN],OUT,STO)) => st(IN,OUT,[ID=V|STO]) :- rule('B13').
(write(E),st(IN,OUT,STO)) => st(IN,OUTp,STO) :- rule('B14'), (E,STO) => V, append(OUT, [V], OUTp).
Again, note that expression evaluation needs a store with name=value bindings. Commands are state transformers, i.e. we map a command and a state to a new state. The natural semantics does not modify the expression or program term in the evaluation process, but rather produces a value (for expressions) or an updated state (for commands). Note that the while-loop semantics is defined by recursion, where the while/2 term is evaluated again in the recursive step when the condition is true.
Exercise: modify the natural semantics of Wren expressions to include a where construct that locally binds a name to a value. For example:
(where(x+y, y, 2), [x=1]) => 3
This corresponds to the use of "where" in functional languages. For example, the expression in Haskell is:
x+y where y=2
That is, the name y is bound to 2 in the expression x+y.
Exercise: modify the natural semantics of the and and or operations in Wren to make them short-circuit logical operators. That is, the second operand is only evaluated when necessary. For example, x<>0 and 1/x==y does not evaluate expression 1/x==y when x==0.
Exercise: modify the natural semantics of Wren expressions and commands to have a state with memory consisting of a list of location=value bindings (memory cells). The store is changed to name=location bindings. Thus, values of variables are now physically stored in memory. Variables can be aliases, since two variables can have the same location binding. For example,
?- (x := x+y+z, st([],[],[x=0,y=0,z=1],[0=3,1=4])) => V.
V = st([],[],[x=0,y=0,z=1],[0=10,1=4]).
where x and y are aliases. To define the evaluation of the assignment command, use Prolog select/3 to remove a cell at a location LOC from memory MEM and construct a new memory MEMpp with a value V at location LOC:
select(LOC=_, MEM, MEMp), MEMpp = [LOC=V | MEMp]
and/or use delete(MEM, LOC=_, MEMp), which does not fail when the term LOC=_ to be removed is not in the list MEM.
Summary: "Neal Ford emphasizes the fact that functional programming uses a different way of solving a problem, thinking about the results rather than the steps to make."
Function pointers in C are rather primitive.
int f(int,int);        // a function
int (*pf)(int,int);    // a pointer to a function
...
pf = f;
int n = pf(1,2);       // call f(1,2)
Function pointers in C have no state. They are sometimes necessary to pass code along to other functions as callbacks, for example to the C library function qsort:
#include <stdlib.h>
/* qsort expects a comparator that takes two const void pointers */
int icomp(const void *a, const void *b)
{ int x = *(const int*)a, y = *(const int*)b;
  return x < y ? -1 : x > y ? 1 : 0;
}
int a[100];
...
qsort(a, 100, sizeof(int), icomp);
While passing function pointers as callbacks to other functions does not appear problematic, it is more interesting when we want to pass a function that has an internal state, for example a counter that is updated when the function is called:
#include <stdio.h>
#include <stdlib.h>
static int count = 0;
int icomp(const void *a, const void *b)
{ int x = *(const int*)a, y = *(const int*)b;
  count++;
  return x < y ? -1 : x > y ? 1 : 0;
}
int a[100];
...
qsort(a, 100, sizeof(int), icomp);
printf("Comparisons made = %d\n", count);
Ideally, we would like to make the function local, to avoid the static counter, as shown in the pseudo-C code (not valid in ANSI C):
int foo()
{ int count = 0;
  int icomp(const void *a, const void *b)
  { int x = *(const int*)a, y = *(const int*)b;
    count++;
    return x < y ? -1 : x > y ? 1 : 0;
  }
  int a[100];
  ...
  qsort(a, 100, sizeof(int), icomp);
  printf("Comparisons made = %d\n", count);
}
In this case the scope of the variable count extends or "bleeds" into function icomp, which is what we wanted. Programming languages that support functions as first-class objects offer this advantage. First-class means that functions can be declared anywhere a value-based variable can be declared and assigned, i.e. functions can be nested, can be passed to other functions, and can be returned from functions. Programming languages such as Ada support (almost) first-class functions. Functions in functional languages are always first class.
Returning a function from another function is a bit more interesting. Consider the following pseudo-C code with a function f declared locally in function fk (not valid in ANSI C):
typedef int (*F)(int,int);
F fk(int k)
{ int f(int a, int b) { return a + b + k; }  // a local function f
  return f;                                  // that is returned as a pointer
}
...
F pf = fk(7);     // pf = f, with f declared in fk() and k=7
int n = pf(1,2);  // call f(1,2)
Here we attempt to return a function f as a closure with an internal state parameterized with k=7 being part of the outer scope of f. So it is assumed that when pf(1,2) is called the value k=7 is used. But note that k is no longer in scope (and would be deallocated from the stack if this were valid C!). For functions to be closures, the state of the variables referenced in the outer scope must be preserved in the closure.
"Functors" or "function objects" in C++ are in essence just objects with nice functionlike syntax. An explicit state is kept in the class instance. Function objects without state are callbacks. The term "functor" usually refers to a function object that is not a function pointer (a callback).
For example, the state of k is maintained by the functor F:
class F
{ int k;
 public:
  int operator()(int a, int b) { return a + b + k; }
  F(int k) { F::k = k; }
};
...
F f(7);
int n = f(1,2);
Note that the state F::k=7 is explicitly set through the constructor, whereas with nested functions the state defined in the outer scope simply "bleeds" into the locally defined function, as was shown in the pseudo-C example above. That is, with function objects the variables in the nonlocal scope cannot be referenced, while with true closures they can.
Exercise: since we stated that function objects are not closures, can the following code work?
int foo()
{ int k=7;
  class F
  { public:
    int operator()(int a,int b) { return a + b + k; }
  } f;
  int n = f(1,2);
}
STL defines unary and binary function objects for arithmetic, comparison, logical operations, and selections. These operators are used with STL algorithms to iterate over objects for traversing, transforming, searching, and sorting container objects using their iterators.
For example, the transform "algorithm" iterates over objects to produce a result by applying a unary or binary operation. The operations are function objects. Here, we square the elements of a vector to produce a second vector, sort it, and then print it out:
#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>
...
struct isqr : public unary_function<int, int>
{ int operator()(int x) { return x*x; }
};
template<typename T>
struct print : public unary_function<T, void>
{ print(ostream& out) : os(out) {}
  void operator()(T x) { os << x << ' '; }
  ostream& os;
};
vector<int> V1(N), V2(N);
...
transform(V1.begin(), V1.end(), V2.begin(), isqr());
sort(V2.begin(), V2.end(), less<int>());
for_each(V2.begin(), V2.end(), print<int>(cout));
Note that the isqr and print function objects are derived from unary_function.
Exercise: replace isqr with a template sqr function object to compute the square of an int and a double.
Answer 1:
template<typename T> struct sqr : public unary_function<T, T> { T operator()(T x); }; template<> struct sqr 
template<typename T> class SquareTraits; 
We generally refer to an operation applied over a container to produce another container as a map. In the example above we used transform as a map.
The Boost Lambda Library (BLL) for C++ simplifies the definition of function objects "on the fly" (or "inline"):
transform(V1.begin(), V1.end(), V2.begin(), _1 * _1);
sort(V2.begin(), V2.end(), _1 < _2);
for_each(V2.begin(), V2.end(), cout << _1 << ' ');
This works by constructing function objects from the placeholder arguments _1, _2, _3, etc., which are bound to an operation such as * and <. The bind operation is used to bind arguments to a function. For example, bind(sin, _1) binds the sine function to one argument to create a function object.
The Lambda Library approximates the lambda abstraction mechanism of lambda calculus. BLL lambdas are function objects and not real closures, because variables in an outer scope cannot be referenced in a BLL lambda. Also, a minor inconvenience is that no C++ statements are allowed with BLL, only expressions.
The ability to introduce functions and code blocks in expressions is an essential part of lambda calculus and languages that implement closures such as Haskell, Scheme, Python, Java closures, and Ruby. We will see more about lambda calculus later.
The C++11 standard (formerly known as C++0x) introduces "lambda function" closures, in which one or more variables in the outer scope, say x, can be captured by value [x] or by reference [&x] (or all referenced variables at once with [=] or [&]):
#include <functional>
using namespace std;
int bar(function<int (int,int)> f)
{ return f(1,2);
}
int foo()
{ int k = 7, n = 0;
  function<int (int,int)> f = [k,&n](int a, int b) -> int { n++; return a + b + k; };
  return bar(f);
}
However, we have to be careful. The environment of variable bindings in which the closure was constructed is not saved, which means we get into trouble when we capture variables in the outer scope by reference and the outer scope is no longer valid when the function is executed, for example when we capture k by reference [&k] in the closure:
function<int (int,int)> fk(int k)
{ function<int (int,int)> f = [&k](int a, int b) -> int { return a + b + k; };
  return f;
}
...
function<int (int,int)> f = fk(7);
int n = f(1,2);  // OOPS: k no longer exists
Capturing k by value [k] in the closure is fine:
function<int (int,int)> fk(int k)
{ function<int (int,int)> f = [k](int a, int b) -> int { return a + b + k; };
  return f;
}
...
function<int (int,int)> f = fk(7);
int n = f(1,2);
A lambda function that does not reference any variables in the outer scope is essentially a function pointer to an anonymous function defined "inline" on the fly, for example:
sort(X.begin(), X.end(), [](double a, double b) -> bool { return a < b; });
At the other extreme, a lambda function that references variables in the outer scope but has no arguments is called a thunk. Thunks can be used to program Jensen's device (based on Algol 60 parameter passing by name):
double integrate(function<double ()> f, double &x, double a, double b, double h)
{ double sum = 0;
  for (x = a; x <= b; x += h)
    sum += f();   // f is a thunk: it re-reads x on every call
  return sum;
}
...
double x = 0.0, y = 1.0;
double z = integrate([&]() -> double { return 2*x + x*x + y; }, x, 0.0, 10.0, 0.5);
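To connect this back to the comparison counter that motivated closures, the following sketch counts comparisons with a lambda that captures a local variable by reference, so no static counter is needed. The function name sortCounting is hypothetical, and std::sort stands in for qsort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts v in place and returns how many comparisons std::sort made.
std::size_t sortCounting(std::vector<int>& v) {
    std::size_t count = 0;  // local state instead of a static counter
    std::sort(v.begin(), v.end(),
              [&count](int a, int b) -> bool {  // count is captured by reference
                  ++count;
                  return a < b;
              });
    assert(std::is_sorted(v.begin(), v.end()));
    return count;
}
```

The capture by reference is safe here because the closure is only used while count is still alive, which is exactly the condition that the fk example with [&k] violates.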
Recommended reading: "C++ Templates" by D. Vandevoorde and Nicolai Josuttis.
More on function pointers, delegates, and member function pointers for C++ experts: Member Function Pointers and the Fastest Possible C++ Delegates
For information on Haskell, see: a Gentle Introduction to Haskell. Other places to look: Haskell Tutorials.
In general, we refer to functions that take other functions as arguments as higher-order functions.
In the functional programming paradigm, higher-order functions typically operate over lists. There are many higher-order functions that can be used as building blocks to construct complex algorithms over lists. Here we will discuss the most common Haskell functions over lists.
First, we introduce the Haskell expression syntax, which has a legacy in prior functional programming languages such as ML and Miranda. The syntax is "clean" in the sense that we do not use parentheses and commas for arguments in function calls. Parentheses are solely used to group expressions.
We simply write
sqr x
instead of sqr(x) by dropping the parentheses. For functions with multiple arguments, we can drop the comma as well and write
power x 2
instead of power(x,2). When an argument is a complex expression we need parentheses, as in
power x (sqr 2)
whereas writing "power x sqr 2" applies power to three arguments, which is not intended.
The syntax may be baffling at first when you're used to C/C++ and Java, but it simply limits parentheses to those cases where you really need them to group expressions. This avoids the "syntax overloading" of parentheses and commas in C/C++/Java for other constructs, such as delimiting arguments in function calls and after keywords.
We now introduce many common (higher-order) functions over lists. A list is constructed with the cons (:) operator, meaning x:xs is the list with x as head and xs as tail.
Mapping a function f over a list:
map f [x_{1}, x_{2}, ..., x_{n}] = [f x_{1}, f x_{2}, ..., f x_{n}]
The map function satisfies
map f [] = []
map f (x:xs) = f x : map f xs
Haskell supports pattern matching in function arguments with list patterns [] and (x:xs), representing the empty list and the nonempty list with head x and tail xs, respectively. Using the pattern matching capability, the equations for map actually define the map function recursively. This leads to the following evaluation steps for map f [1,2]:
map f (1 : (2 : []))
= f 1 : map f (2 : [])
= f 1 : (f 2 : map f [])
= f 1 : (f 2 : [])
Note: pattern matching in Haskell is more restrictive than Prolog unification. There are only two patterns for lists, [] and (x:xs). Other Haskell patterns are 0 and n for integers, and patterns for variant records (constants and constructors in "tagged unions").
An alternative definition without pattern matching can be given using an if-then-else ternary function if':
map f xs = if' (null xs) [] (f (head xs) : map f (tail xs))
where the built-in functions head, tail, and null and the helper function if' satisfy:
head (x:xs) = x
tail (x:xs) = xs
null [] = True
null (x:xs) = False
if' x y z = case x of
  True -> y
  False -> z
Needless to say, pattern matching helps to reveal the true meaning of a function without the need to use any of the obfuscating primitive functions on lists.
Another example with a list pattern is:
length [] = 0
length (x:xs) = 1 + length xs
Evaluation of length [a,b] proceeds recursively as follows:
length (a : (b : []))
= 1 + length (b : [])
= 1 + (1 + length [])
= 1 + (1 + 0)
To append lists we define the ++ infix operator as follows:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
Note that the resulting list is a copy of the list xs followed by the shared list ys:
(1 : (2 : [])) ++ (3 : (4 : []))
= 1 : ((2 : []) ++ (3 : (4 : [])))
= 1 : (2 : ([] ++ (3 : (4 : []))))
= 1 : (2 : (3 : (4 : [])))
Filtering elements of a list results in a list of elements x such that p x holds. We can define filter with a case construct:
filter p [] = []
filter p (x:xs) = case (p x) of
  True -> x : filter p xs
  False -> filter p xs
Alternatively, we can define filter using a guard to test if p x is true, to either put the element x in the new list or skip over it:
filter p [] = []
filter p (x:xs) | p x = x : filter p xs
                | otherwise = filter p xs
Evaluation of filter odd [1,2] proceeds as follows:
filter odd (1 : (2 : []))
= 1 : filter odd (2 : [])
= 1 : filter odd []
= 1 : []
We make a small change to the definition of filter's third case by returning [] when p x is false to get
takeWhile p [] = []
takeWhile p (x:xs) | p x = x : takeWhile p xs
                   | otherwise = []
Likewise we make a small change to the second and third case to get
dropWhile p [] = []
dropWhile p (x:xs) | p x = dropWhile p xs
                   | otherwise = x:xs
Note that we have a bit of inefficiency in the above when we match x:xs and then create x:xs again. This can be avoided by using an "as pattern" s@(x:xs), where argument s contains the matched pattern (x:xs):
dropWhile p [] = []
dropWhile p s@(x:xs) | p x = dropWhile p xs
                     | otherwise = s
So, takeWhile p xs returns the longest initial segment of elements x in xs for which p x is true, and dropWhile p xs returns the list of remaining elements. Hence, we have that
xs = (takeWhile p xs) ++ (dropWhile p xs)
for any p and xs.
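The law xs = (takeWhile p xs) ++ (dropWhile p xs) can be checked on concrete data. The sketch below restates filter, takeWhile, and dropWhile over std::vector in C++, in the same spirit as the earlier use of transform as a map. The type aliases List and Pred and the helper append are hypothetical names, and the functions copy their results rather than share structure as Haskell lists do.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

using List = std::vector<int>;
using Pred = std::function<bool(int)>;

// takeWhile p xs: the longest prefix whose elements all satisfy p.
List takeWhile(const Pred& p, const List& xs) {
    auto mid = std::find_if_not(xs.begin(), xs.end(), p);
    return List(xs.begin(), mid);
}

// dropWhile p xs: what remains after that prefix.
List dropWhile(const Pred& p, const List& xs) {
    auto mid = std::find_if_not(xs.begin(), xs.end(), p);
    return List(mid, xs.end());
}

// filter p xs: all elements that satisfy p, in order.
List filter(const Pred& p, const List& xs) {
    List out;
    std::copy_if(xs.begin(), xs.end(), std::back_inserter(out), p);
    return out;
}

// xs ++ ys
List append(List xs, const List& ys) {
    xs.insert(xs.end(), ys.begin(), ys.end());
    return xs;
}
```

Both takeWhile and dropWhile locate the same split point with std::find_if_not, which is why appending their results reproduces xs.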
Reducing a list from the right to a single value is called a fold right:
foldr f a [x_{1}, x_{2}, ..., x_{n}] = f x_{1} (f x_{2} (... (f x_{n} a)))
where f is a binary operator ⊗, so we can write this more clearly as:
foldr (⊗) a [x_{1}, x_{2}, ..., x_{n}] = x_{1} ⊗ (x_{2} ⊗ (... (x_{n} ⊗ a)))
which is represented by a right-leaning expression tree with ⊗ at the root, x_{1} as its left operand, and a as the rightmost leaf.
The fold right is defined as
foldr f a [] = a
foldr f a (x:xs) = f x (foldr f a xs)
Evaluation of foldr (*) 3 [1,2] proceeds as follows:
foldr (*) 3 (1 : (2 : []))
= 1 * foldr (*) 3 (2 : [])
= 1 * (2 * foldr (*) 3 [])
= 1 * (2 * 3)
Reducing a list from the left to a single value is called a fold left:
foldl f a [x_{1}, x_{2}, ..., x_{n}] = f (... (f (f a x_{1}) x_{2}) ...) x_{n}
where f is a binary operator ⊗, so we can write this more clearly as:
foldl (⊗) a [x_{1}, x_{2}, ..., x_{n}] = (((a ⊗ x_{1}) ⊗ x_{2}) ⊗ ...) ⊗ x_{n}
which is represented by a left-leaning expression tree with ⊗ at the root, x_{n} as its right operand, and a as the leftmost leaf.
The fold left is defined as
foldl f a [] = a
foldl f a (x:xs) = foldl f (f a x) xs
Evaluation of foldl (*) 1 [2,3] proceeds as follows:
foldl (*) 1 (2 : (3 : []))
= foldl (*) (1 * 2) (3 : [])
= foldl (*) ((1 * 2) * 3) []
= (1 * 2) * 3
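The difference between the two folds is only the nesting of the operator applications, which can be made visible by folding with an operation that builds a parenthesized string. The C++ templates below are a sketch with hypothetical names foldl, foldr, and paren. They follow the two defining equations with loops instead of recursion, and the left fold is what std::accumulate computes.

```cpp
#include <functional>
#include <string>
#include <vector>

// foldl f a [x1,...,xn] = f (... (f (f a x1) x2) ...) xn
template <typename F, typename A, typename X>
A foldl(F f, A a, const std::vector<X>& xs) {
    for (const X& x : xs) a = f(a, x);  // accumulate from the left
    return a;
}

// foldr f a [x1,...,xn] = f x1 (f x2 (... (f xn a)))
template <typename F, typename A, typename X>
A foldr(F f, A a, const std::vector<X>& xs) {
    for (auto it = xs.rbegin(); it != xs.rend(); ++it) a = f(*it, a);  // innermost application first
    return a;
}

// Builds the text "(l*r)" so that the nesting of applications becomes visible.
std::string paren(const std::string& l, const std::string& r) {
    return "(" + l + "*" + r + ")";
}
```

With a non-associative operation such as subtraction the two folds give different results, for example 1-(2-(3-0)) = 2 for the fold right versus ((0-1)-2)-3 = -6 for the fold left.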
The fold operations are particularly interesting as building blocks for other functions:
xs ++ ys = foldr (:) ys xs
length xs = foldr oneplus 0 xs
  where oneplus x n = 1 + n
takeWhile p xs = foldr consifp [] xs
  where consifp x xs | p x = x:xs
                     | otherwise = []
concat xss = foldr (++) [] xss
reverse xs = foldl snoc [] xs
  where snoc xs x = x:xs
sum xs = foldr (+) 0 xs
prod xs = foldr (*) 1 xs
and xs = foldr (&&) True xs
or xs = foldr (||) False xs
Note that reverse defined above takes O(n) time with n = length xs to reverse the list, because foldl applies snoc n times. The naive implementation:
reverse [] = []
reverse (x:xs) = reverse xs ++ [x]
takes O(n^{2}) time.
We state some useful observations (laws).
THEOREM [First Duality Theorem]: Let (S,⊕) be a monoid with identity element a. Then foldr (⊕) a xs = foldl (⊕) a xs.
For example:
sum xs = foldr (+) 0 xs
       = foldl (+) 0 xs
concat xss = foldr (++) [] xss
           = foldl (++) [] xss
Exercise: which of the last two choices of fold in concat is the most efficient, given that xs ++ ys takes O(k) time with k = length xs to compute? Assume we take concat over a list of n lists each of length m.
Answer: foldr takes O(mn) time, whereas foldl takes O(mn^{2}) time. The fold right is more efficient because each application of xs ++ ys takes O(m) time, since each left operand xs is one of the lists of length m and ys is the concatenation of the lists to its right in xss. With the fold left, the left operand of ++ is the concatenation accumulated so far, so the i-th application takes O((i-1)m) time, for a total of O(m(0 + 1 + ... + (n-1))) = O(mn^{2}).
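The two costs can be measured directly under the stated cost model, in which xs ++ ys costs length xs. The C++ sketch below (hypothetical names append, costFoldr, and costFoldl) charges one unit per element of the left operand of each append and returns the total. For n lists of length m the right fold is charged nm units, while the left fold is charged m(0 + 1 + ... + (n-1)) units.

```cpp
#include <cstddef>
#include <vector>

using List = std::vector<int>;

// xs ++ ys, charging one unit of cost per element of xs (the part that is copied in Haskell).
List append(const List& xs, const List& ys, std::size_t& cost) {
    cost += xs.size();
    List out(xs);
    out.insert(out.end(), ys.begin(), ys.end());
    return out;
}

// concat via foldr: x1 ++ (x2 ++ (... ++ (xn ++ [])))
std::size_t costFoldr(const std::vector<List>& xss) {
    std::size_t cost = 0;
    List acc;
    for (auto it = xss.rbegin(); it != xss.rend(); ++it) acc = append(*it, acc, cost);
    return cost;
}

// concat via foldl: ((([] ++ x1) ++ x2) ++ ...) ++ xn
std::size_t costFoldl(const std::vector<List>& xss) {
    std::size_t cost = 0;
    List acc;
    for (const List& xs : xss) acc = append(acc, xs, cost);
    return cost;
}
```

For example, with n = 4 lists of length m = 3 the right fold is charged 12 units and the left fold 0 + 3 + 6 + 9 = 18 units; the gap grows quadratically in n.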
THEOREM [Second Duality Theorem]: Let ⊕ and ⊗ be operators such that x ⊕ (y ⊗ z) = (x ⊕ y) ⊗ z and x ⊕ a = a ⊗ x. Then foldr (⊕) a xs = foldl (⊗) a xs.
For example:
length xs = foldr oneplus 0 xs
  where oneplus x n = 1 + n
          = foldl plusone 0 xs
  where plusone n x = n + 1
since oneplus x1 (plusone n x2) = 1+(n+1) = (1+n)+1 = plusone (oneplus x1 n) x2 and oneplus x 0 = 1 = plusone 0 x.
THEOREM [Third Duality Theorem]: Let ⊕ and ⊗ be operators such that x ⊕ y = y ⊗ x. Then foldr (⊕) a xs = foldl (⊗) a (reverse xs).
For example:
xs = foldr (:) [] xs
   = foldl snoc [] (reverse xs)
  where snoc xs x = x:xs
Hence, by the fact that reverse (reverse xs) = xs we have that
reverse xs = foldl snoc [] xs
  where snoc xs x = x:xs
Folds over monoids can be segmented by partitioning the list into parts xs = ys ++ zs:
foldl (⊕) a xs = foldl (⊕) a (ys ++ zs) = (foldl (⊕) a ys) ⊕ (foldl (⊕) a zs)
foldr (⊕) a xs = foldr (⊕) a (ys ++ zs) = (foldr (⊕) a ys) ⊕ (foldr (⊕) a zs)
This observation leads to balanced reduction trees, for example in parallel sums.
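A balanced reduction tree for the monoid (int, +) with identity 0 can be sketched as follows. The function segmentedSum is a hypothetical name; it splits the range in half, folds each half on its own, and combines the two results with +. The sketch runs sequentially, but the two recursive calls are independent and are the parts a parallel sum would evaluate concurrently.

```cpp
#include <numeric>
#include <vector>

using It = std::vector<int>::const_iterator;

// Sums [first, last) by splitting xs = ys ++ zs and combining the two partial folds.
int segmentedSum(It first, It last) {
    auto n = last - first;
    if (n <= 2) return std::accumulate(first, last, 0);  // small segment: plain fold left
    It mid = first + n / 2;                              // ys = [first, mid), zs = [mid, last)
    return segmentedSum(first, mid) + segmentedSum(mid, last);
}
```

The result agrees with a single fold over the whole list because + is associative and 0 is its identity (ignoring integer overflow).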
Consider the evaluation relation ⇒ we used in defining the natural semantics of Wren. Suppose we define the evaluation function
eval state cmd = state'
  where (cmd,state) ⇒ state'
Then a fold left of eval over a list of commands starting with an initial state state_{0} executes the commands in order to produce the final state state_{f}:
state_{f} = foldl eval state_{0} cmds
For example
foldl eval st([5],[],[]) ["read(x)","x:=x+1","write(x)"]
= foldl eval (eval st([5],[],[]) "read(x)") ["x:=x+1","write(x)"]
= foldl eval st([],[],[x=5]) ["x:=x+1","write(x)"]
= foldl eval (eval st([],[],[x=5]) "x:=x+1") ["write(x)"]
= foldl eval st([],[],[x=6]) ["write(x)"]
= foldl eval (eval st([],[],[x=6]) "write(x)") []
= foldl eval st([],[6],[x=6]) []
= st([],[6],[x=6])
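The trace above can be reproduced with a small Haskell sketch. The data types State and Cmd, the helper update, and the three commands are assumptions made for illustration only: the notes pass commands as strings and define eval through the natural semantics of Wren, which is not reproduced here.

```haskell
-- State mirrors st(Input, Output, Store) from the trace above.
type Store = [(String, Int)]
data State = St [Int] [Int] Store deriving (Eq, Show)

-- A hypothetical three-command fragment, just enough for the example.
data Cmd = Read String | Incr String | Write String

update :: String -> Int -> Store -> Store
update x v store = (x, v) : filter ((/= x) . fst) store

eval :: State -> Cmd -> State
eval (St (i:is) out store) (Read x)  = St is out (update x i store)
eval (St inp out store)    (Incr x)  = case lookup x store of
  Just v  -> St inp out (update x (v + 1) store)
  Nothing -> St inp out store
eval (St inp out store)    (Write x) = case lookup x store of
  Just v  -> St inp (out ++ [v]) store
  Nothing -> St inp out store
eval st _ = st  -- assumption: reading from empty input leaves the state unchanged

-- Executing a command list is a left fold of eval over the commands.
run :: State -> [Cmd] -> State
run = foldl eval
```

With this sketch, run (St [5] [] []) [Read "x", Incr "x", Write "x"] yields St [] [6] [("x",6)], matching the final state of the trace.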
To insert an element in an ordered list:
insert x xs = takeWhile lesseq xs ++ [x] ++ dropWhile lesseq xs
where lesseq y = (y <= x)
We can use insert
for simple insertion sort using a fold right:
isort xs = foldr insert [] xs
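A runnable version of the insertion sort, as a sketch: the only changes are the type signatures and the name insert', which we use to keep the definition distinct from insert in Data.List.

```haskell
-- Insert x into an ordered list, keeping the list ordered.
insert' :: Ord a => a -> [a] -> [a]
insert' x xs = takeWhile lesseq xs ++ [x] ++ dropWhile lesseq xs
  where lesseq y = y <= x

-- Insertion sort: insert the elements one by one, starting from the right.
isort :: Ord a => [a] -> [a]
isort = foldr insert' []
```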
We end by defining two zip functions, zip and zipWith:
zip [x_{1}, x_{2}, ..., x_{n}] [y_{1}, y_{2}, ..., y_{n}] = [(x_{1},y_{1}), (x_{2},y_{2}), ..., (x_{n},y_{n})]
zipWith (⊕) [x_{1}, x_{2}, ..., x_{n}] [y_{1}, y_{2}, ..., y_{n}] = [(x_{1} ⊕ y_{1}), (x_{2} ⊕ y_{2}), ..., (x_{n} ⊕ y_{n})]
defined by
zip [] ys = []
zip (x:xs) [] = []
zip (x:xs) (y:ys) = (x,y) : (zip xs ys)
zipWith f [] ys = []
zipWith f (x:xs) [] = []
zipWith f (x:xs) (y:ys) = (f x y) : (zipWith f xs ys)
For example:
dotprod xs ys = foldr (+) 0 (zipWith (*) xs ys)
Say we take the dot product:
dotprod [3,1,4] [1,5,9] ⇒ foldr (+) 0 (zipWith (*) [3,1,4] [1,5,9]) ⇒ foldr (+) 0 [3,5,36] ⇒ 3+(5+(36+0)) ⇒ 44
Let's take a closer look at the expression syntax. Application of a function to multiple arguments associates to the left:
f a b = (f a) b
When we make the application operation explicit using an operator, say @
, this becomes more clear:
f @ a @ b = (f @ a) @ b
    @
   / \
  @   b
 / \
f   a
Note that argument expressions must be parenthesized:
f (g a) (h b)
      __@____
     /       \
   __@        @
  /   \      / \
 f     @    h   b
      / \
     g   a
Let's consider the following definition again:
sum xs = foldr (+) 0 xs
Because application associates to the left, we can rewrite this as:
sum xs = ((foldr (+)) 0) xs
which is depicted as:
   @                  @
  / \       =        / \
sum  xs             @   xs
                   / \
                  @   0
                 / \
            foldr   +
The simplified equation is:
sum = foldr (+) 0
Viewed as a definition, it eliminates the need to include the argument xs on the left-hand side, because the function in the body of the definition on the right-hand side is applied to xs. This principle is referred to as "Currying" in honor of Haskell Curry.
Consider for example:
add x y = x + y
inc = add 1
We can write (+)
to refer to the addition operation as a function, so:
inc = (+) 1
Haskell uses the following Currying rules for infix operators, called "sections" for partial application:
(⊗ a) x = x ⊗ a
(a ⊗) x = a ⊗ x
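Currying is particularly useful for higher-order functions:
map (subtract 1) [1,2,3] = [0,1,2]
map (2 *) [1,2,3] = [2,4,6]
filter (> 1) [1,2,3] = [2,3]
Note that Haskell reads (- 1) as the number -1 rather than as a section, hence subtract 1.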
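The partial applications discussed in this part can be tried directly. In the sketch below the function names are ours; subtract is the Prelude function with subtract 1 x = x - 1.

```haskell
-- Partial application of a curried function.
add :: Int -> Int -> Int
add x y = x + y

inc :: Int -> Int
inc = add 1                       -- the same function as (+) 1

-- Sections of infix operators used with higher-order functions.
decrementAll :: [Int] -> [Int]
decrementAll = map (subtract 1)   -- (subtract 1) x = x - 1

doubleAll :: [Int] -> [Int]
doubleAll = map (2 *)             -- left section: (2 *) x = 2 * x

greaterThanOne :: [Int] -> [Int]
greaterThanOne = filter (> 1)     -- right section: (> 1) x = x > 1
```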
List comprehensions form a convenient syntax to process lists. Several programming languages support list comprehensions, for example:
Haskell: [ 2*x | x <- [1..100], x^2 > 3 ]
Python: [2 * x for x in range(101) if x ** 2 > 3]
OCaml: [? List: 2 * x | x <- 1 -- 100 ; x * x > 3 ?]
Pure: [2*n | n=1..100; n*n > 3]
Javascript 1.8: [2*n for (n in (function(start,end){for (var i=start; i<=end; i++) yield i})(1,100)) if (n*n>3)]
Let's take a closer look at the Haskell list comprehension syntax, which is based on prior functional languages such as Miranda and ML.
The syntax of list comprehensions in Haskell is
[ expr | qualifier, qualifier, ... ]
where a qualifier is a generator of the form pattern <- list, a predicate for filtering, or a local variable binding let x = expr.
For example:
divisors n = [ d | d <- [1..n], n `mod` d == 0 ]
prime n = (divisors n == [1,n])
where the notation [a..b] denotes the list of integers from a to b.
A faster primality check for n > 1 is:
prime n = ([ d | d <- [2..isqrt n], n `mod` d == 0 ] == [])
where isqrt n denotes the integer square root of n (it is not a Prelude function).
Cartesian product:
cartesian xs ys = [ (x,y) | x <- xs, y <- ys ]
Sorting:
qsort [] = []
qsort (p:xs) = qsort [ x | x <- xs, x < p ] ++ [p] ++ qsort [ x | x <- xs, x >= p ]
In fact, any list comprehension can be translated to an expression with map, filter, and concat using the following rules (roughly):
[ x | x <- xs ] = xs
[ f x | x <- xs ] = map f xs
[ e | x <- xs, p x, ... ] = [ e | x <- filter p xs, ... ]
[ e | x <- xs, y <- ys, ... ] = concat [ [ e | y <- ys, ... ] | x <- xs ]
For example,
qsort [] = []
qsort (p:xs) = qsort (filter (< p) xs) ++ [p] ++ qsort (filter (>= p) xs)
by noting that (< p)
and (>= p)
are Curried functions.
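The translation can be checked on two of the examples of this part. In the sketch below each function is written once as a list comprehension and once with map, filter, and concat; the suffixes LC and HOF are our own naming.

```haskell
-- Quicksort with list comprehensions ...
qsortLC :: Ord a => [a] -> [a]
qsortLC []     = []
qsortLC (p:xs) = qsortLC [x | x <- xs, x < p] ++ [p] ++ qsortLC [x | x <- xs, x >= p]

-- ... and the same function after translating the comprehensions to filter.
qsortHOF :: Ord a => [a] -> [a]
qsortHOF []     = []
qsortHOF (p:xs) = qsortHOF (filter (< p) xs) ++ [p] ++ qsortHOF (filter (>= p) xs)

-- Cartesian product with two generators ...
cartesianLC :: [a] -> [b] -> [(a, b)]
cartesianLC xs ys = [(x, y) | x <- xs, y <- ys]

-- ... and its translation: one map per generator, flattened by concat.
cartesianHOF :: [a] -> [b] -> [(a, b)]
cartesianHOF xs ys = concat (map (\x -> map (\y -> (x, y)) ys) xs)
```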
Lazy evaluation in Haskell allows for defining "infinite" data structures. More about lazy evaluation later, when we review lambda calculus evaluation modes.
Basically, lazy evaluation allows us to safely define an infinite list, such as
from n = n : from (n+1)
and use from
to produce a list of integers in a range:
range a b = take (b-a+1) (from a)
where the Haskell take
function is defined as:
take 0 xs = []
take n [] = []
take n (x:xs) = x : take (n-1) xs
This works by evaluating from only to the point necessary (being lazy). Actual arguments in non-strict positions are not evaluated before they are passed to the function. Instead, the unevaluated expression is passed into the function body and evaluated only when needed (viz. pass-by-name). For example:
range 1 2
⇒ take 2 (from 1)
⇒ take 2 (1 : from 2)
⇒ 1 : take 1 (from 2)
⇒ 1 : take 1 (2 : from 3)
⇒ 1 : 2 : take 0 (from 3)
⇒ 1 : 2 : []
⇒ [1,2]
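The definitions of from and range run unchanged in Haskell once the minus signs are in place. The sketch below uses the Prelude take, which behaves like the definition given above.

```haskell
-- An infinite list of consecutive integers starting at n.
from :: Int -> [Int]
from n = n : from (n + 1)

-- Only the first b-a+1 elements of the infinite list are ever constructed.
range :: Int -> Int -> [Int]
range a b = take (b - a + 1) (from a)
```

For example, range 1 2 evaluates to [1,2] as in the trace above, and range 5 4 evaluates to the empty list.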
Another advantage of lazy evaluation is that general, potentially expensive higher-order functions can accomplish a simpler task efficiently, where a strict language would require recoding the task in a new special-purpose function.
For example, to search a list for a value we can define:
isin x [] = false
isin x (y:xs) | x == y = true
              | otherwise = isin x xs
Or we can use a fold right:
isin x xs = foldr match false xs
where match y b | x == y = true
                | otherwise = b
Due to the benefits of lazy evaluation, this latter definition will not traverse the entire list to find a match, but rather the search stops as soon as the element is found.
Why is this so? Suppose we evaluate isin 2 [1,2,3] = foldr match false [1,2,3]
, where match y b = true
when y == 2
else match y b = b
:
foldr match false [1,2,3]
⇒ match 1 (foldr match false [2,3])
⇒ foldr match false [2,3]
⇒ match 2 (foldr match false [3])
⇒ true
The same efficiency of lazy evaluation applies to takeWhile
, defined by:
takeWhile p xs = foldr consp [] xs
where consp x xs | p x = x:xs
                 | otherwise = []
With lazy evaluation it is safe to evaluate takeWhile
on an infinite list: takeWhile (< 10) (from 1) ⇒ [1,2,3,4,5,6,7,8,9]
.
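Both fold-based definitions can be exercised on infinite lists in Haskell. The sketch below uses the Haskell constants True and False and renames takeWhile to takeWhile' to avoid the Prelude function of the same name.

```haskell
-- Membership test as a right fold; match ignores b once the element is found,
-- so the rest of the list is never demanded.
isin :: Eq a => a -> [a] -> Bool
isin x = foldr match False
  where match y b | x == y    = True
                  | otherwise = b

-- takeWhile as a right fold; consp discards the folded tail when p x fails.
takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' p = foldr consp []
  where consp x xs | p x       = x : xs
                   | otherwise = []
```

For example, isin 2 [1..] terminates with True even though [1..] is infinite. A search for an element that does not occur in an infinite list still diverges.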
Finally, we leave it as an exercise to define an nth
function using a fold right to get the n^{th} element of a list.
Exercise: use a fold right to define a function nth
such that nth n a xs
returns the n
^{th} element of xs
, or if the list is too short returns a
.
Answer: we can use zip to generate [(1,x_{1}), (2,x_{2}), ..., (m,x_{m})]
from a list [x_{1}, x_{2}, ..., x_{m}]
and search for the n^{th} tuple (n,x_{n})
to find x_{n}
:
nth n a xs = foldr ith a (zip (from 1) xs)
where ith (i,x) y | i == n = x
                  | otherwise = y
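A runnable sketch of this answer follows. It uses the Haskell enumeration [1 ..] in place of from 1; otherwise it is the definition above with a type signature added.

```haskell
-- nth n a xs returns the n-th element of xs (counting from 1),
-- or the default a when the list is too short.
nth :: Int -> a -> [a] -> a
nth n a xs = foldr ith a (zip [1 ..] xs)
  where ith (i, x) y | i == n    = x
                     | otherwise = y
```

Because ith does not inspect y once the index matches, nth also works on infinite lists, for example nth 3 0 [10,20..] evaluates to 30.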
MapReduce, functional style with immutability vs. imperative/OOP to avoid shared state, concurrency.
[... Will move this up ...]
A function is a mapping from the elements of a domain set to the elements of a range set (codomain set) given by a rule, for example:
cube: ℤ → ℤ
cube(n) = n^{3}
Another commonlyused notation writes out the function as a mapping relation ↦:
cube: n ↦ n^{3}
Lambda calculus emphasizes the mapping relation by dropping the name of the function altogether, i.e. making the functions anonymous. A lambda abstraction is an anonymous function of the form
λx.E
where x
is a variable and E
an expression. For example:
λn.n^{3}
which represents the cube function. To accept multiple arguments we nest abstractions as follows:
λn.λm.n-m
Compare this to the notation used by the following selection of programming languages that support lambda abstractions ("lambda functions" and closures):
One argument | Multiple arguments | Language
(lambda (n) (* n n n)) | (lambda (n m) (- n m)) | Scheme
fn n => n*n*n | fn n => fn m => n-m | ML
fun n -> n*n*n | fun n m -> n-m | OCaml & F#
n => n*n*n | (n, m) => n-m | C#
\n -> n*n*n | \n m -> n-m | Haskell
[](int n) -> int { return n*n*n; } | [](int n, int m) -> int { return n-m; } | C++0x
lambda n: n*n*n | lambda n, m: n-m | Python
function(n) { return n*n*n; } | function(n, m) { return n-m; } | Javascript
{ n -> return n*n*n } | { n, m -> return n-m } | Groovy closures
(n: Int) => n*n*n | (n: Int, m: Int) => n-m | Scala
To apply an abstraction to one or more actual arguments, we write each of the arguments to the right of the abstraction. For example:
(λn.n^{3}) 2
and
(λn.λm.n-m) 3 1
The abstract syntax of a lambda term (or lambda expression) in the pure lambda calculus consists only of variables, abstractions, and applications:
E ::= x | λx.E | E E
where x
is a name of a variable. The variable x
in the abstraction λx.E
is called a bound variable and λx
is a binder for x
whose scope is E
. Applications are denoted E_{1} E_{2}
, where E_{1}
is the operator (an expression that evaluates to a function) called the rator and E_{2}
is the operand, or simply the rand.
In the non-pure lambda calculus, values v (such as constants, numbers, strings, data structures, and objects) are included in E:
E ::= v | x | λx.E | E E
We can also combine the language NB with lambda terms and obtain the language λNB:
E ::= true | false | if(E, E, E) | 0 | succ(E) | pred(E) | iszero(E) | x | λx.E | E E
Parentheses are used for grouping expressions. Abstractions are right associative:
λx.λy.z = λx.(λy.z)
Depicted as an abstract syntax tree:
[Figure: abstract syntax tree of λx.(λy.z), with λy.z as the right subtree of the outer abstraction]
Lambda application is left associative and syntactically binds more tightly than an abstraction, requiring parentheses for the abstraction when applied to argument expressions:
(λx.x x) y z = ((λx.(x x)) y) z
For clarity, in this part of the talk we will use ':' to explicitly denote the application operation in lambda terms and in abstract syntax trees (to avoid confusing the apply ':' with the cons operator (:) that we used for lists, we write 'cons' for lists). Therefore, the lambda term can also be written
(λx.x:x):y:z
Depicted as an abstract syntax tree :
[Figure: abstract syntax tree of ((λx.x:x):y):z]
Lambda abstractions are closures, since innermost lambda abstractions may use variables bound by outer abstractions:
(λk.λa.λb.a+b+k) 7
where k=7
in the innermost abstraction. This is written in C++0x lambda function form as
[](int k) -> function<function<int (int)> (int)> { return [k](int a) -> function<int (int)> { return [a,k](int b) -> int { return a + b + k; }; }; }(7)
At first inspection it seems that lambda terms are of little use beyond just defining anonymous functions in compact form. Furthermore, there are no constants, no data structures, no arithmetic, no Booleans, and no control flow.
However, lambda calculus has been shown to be Turing complete with the following semantic interpretation of lambda application by a one-step evaluation relation →_{β}, which is referred to as beta reduction:
(λx.E_{1}) E_{2} →_{β} [x ↦ E_{2}]E_{1}
The substitution [x ↦ E_{2}]E_{1} replaces all free occurrences of the variable x in term E_{1} (the body of the abstraction) by term E_{2} (the actual argument expression).
Nested abstractions naturally lead to Currying and partial evaluation, since we can always supply a single argument expression to an abstraction and either get a value (a normal form) or another abstraction to be applied to the next argument expression:
(λn.n^{3}) 2 → 8
(λn.λm.n-m) 3 → λm.3-m
The abstraction λm.3-m results from the partial application and can be applied to the next argument expression, when provided.
For example, let's define the two abstractions by name:
cube = λn.n^{3}
diff = λn.λm.n-m
And evaluate:
cube (diff 3 1)
→_{def} (λn.n^{3}) ((λn.λm.n-m) 3 1)
→_{β} (λn.n^{3}) (([n ↦ 3](λm.n-m)) 1)
= (λn.n^{3}) ((λm.3-m) 1)
→_{β} (λn.n^{3}) ([m ↦ 1](3-m))
= (λn.n^{3}) (3-1)
→_{δ} (λn.n^{3}) 2
→_{β} [n ↦ 2]n^{3}
= 2^{3}
→_{δ} 8
When a globally-defined name f occurs in an expression, it is simply replaced with its value, f →_{def} v. In case of functions f, the value v is an abstraction. Arithmetic operations and other built-in operations are evaluated by the relation →_{δ}, denoting the application of delta rules.
In the lambda abstraction
λx.x+y
variable x
is bound and y
is free. This concept of bound versus free variables in lambda terms is similar to the familiar scoping of function arguments in programming languages such as C:
f(int x){ return x+y; }
where x
is bound (the argument of f
) and y
is free (it is bound to something that is outside of the definition of f
).
Substitution respects variable bindings to ensure that the proper occurrences of variable x are replaced by term s while avoiding changing the meaning of other parts of E. This is the same problem as replacing one name with another in a C program, where we only want to replace the name where it is not locally defined (bound).
For example, suppose we want to replace name x
with y
in the C code
{ int x; ... x ... z ... } x ... { int z; ... x ... z ... }
then we want to avoid replacing the local x
declared in the first block. So we get:
{ int x; ... x ... z ... } y ... { int z; ... y ... z ... }
Likewise,
[x ↦ y]((λx.x z) x (λz.x z)) = ((λx.x z) y (λz.y z))
But what happens when we naively substitute x
by a different variable, say z
, in our C fragment? Then we get:
{ int x; ... x ... z ... } z ... { int z; ... z ... z ... }
Note that the second replacement of x
by z
is captured and z
becomes locally bound! This action changed the meaning of the program. To force substitution in these cases, we must rename the local z
to a new unused variable, say t
, and get:
{ int x; ... x ... z ... } x ... { int t; ... x ... t ... }
and then substitute as usual:
{ int x; ... x ... z ... } z ... { int t; ... z ... t ... }
Likewise, full substitution in lambda terms must avoid variable capture by renaming (λz.x z) to (λt.x t) first and then replacing the x:
[x ↦ z]((λx.x z) x (λz.x z)) = ((λx.x z) z (λt.z t))
Renaming (λz.x z) to (λt.x t) is referred to as alpha conversion (or alpha renaming) in lambda calculus. More formally, alpha conversion is the relation →_{α} defined by
(λx.E) →_{α} (λy.[x ↦ y]E) if y ∉ FV[E]
That is, we replace a name x with another, say y, but we should not rename to y if y occurs as a free variable in E, that is, y ∈ FV[E], since this would lead to capturing all free y in E.
The set of free variables FV[E]
of a lambda term E
is formally defined as:
FV[v] = ∅ if v is a value
FV[x] = {x}
FV[λx.E] = FV[E]\{x}
FV[E_{1} E_{2}] = FV[E_{1}] ∪ FV[E_{2}]
Substitution of x by a term s in expression E without alpha conversion, denoted [x ↦ s]E, is defined as:
[x ↦ s]v = v if v is a value
[x ↦ s]x = s
[x ↦ s]y = y if y ≠ x
[x ↦ s](λy.E) = λy.E if y = x
[x ↦ s](λy.E) = λy.[x ↦ s]E if y ≠ x and y ∉ FV[s]
[x ↦ s](E_{1} E_{2}) = ([x ↦ s]E_{1}) ([x ↦ s]E_{2})
Capture-avoiding substitution by alpha conversion includes the following clause:
[x ↦ s](λy.E) = λz.[x ↦ s][y ↦ z]E if y ≠ x and y ∈ FV[s]
where z is a new variable such that z ∉ FV[E] and z ∉ FV[s].
For example, capture-avoiding substitution is needed when operands in applications are not closed (recall that a term t is closed when it contains no free variables, i.e. FV[t] = ∅):
(λx.λy.x+y) y →_{β} [x ↦ y](λy.x+y) = λz.y+z
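The definitions of FV and capture-avoiding substitution translate almost clause by clause into Haskell. The sketch below is ours and is independent of the Prolog implementation used in this talk: the data type Term, the function names, and the scheme for generating a new variable (appending primes to the old name) are assumptions made for illustration. Values are omitted, and x+y is encoded as the application plus x y.

```haskell
import Data.List (union)

data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- FV[E]: the set of free variables, represented as a list without duplicates.
freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (Lam x e) = filter (/= x) (freeVars e)
freeVars (App a b) = freeVars a `union` freeVars b

-- A variable name based on x that does not occur in the list used.
fresh :: String -> [String] -> String
fresh x used
  | x `elem` used = fresh (x ++ "'") used
  | otherwise     = x

-- subst x s e computes [x ↦ s]e, renaming a bound variable when it would
-- capture a free variable of s.
subst :: String -> Term -> Term -> Term
subst x s (Var y)
  | y == x    = s
  | otherwise = Var y
subst x s (App a b) = App (subst x s a) (subst x s b)
subst x s (Lam y e)
  | y == x                 = Lam y e
  | y `notElem` freeVars s = Lam y (subst x s e)
  | otherwise              = Lam z (subst x s (subst y (Var z) e))
  where z = fresh y (x : freeVars s ++ freeVars e)
```

With this sketch, substituting y for x in λy.plus x y renames the bound y to y' and yields λy'.plus y y', which corresponds to λz.y+z in the example above.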
We saw that an important part of the operational semantics of lambda calculus is defined by beta reduction (see TPL Ch. 5):
(EAppAbs)  (λx.E_{1}) E_{2} →_{β} [x ↦ E_{2}]E_{1}
where (λx.E_{1}) E_{2} is called a beta redex. A redex is a reducible expression.
However, we have not yet defined an ordering to evaluate lambda applications. First, we should evaluate the rator of an application by reducing all redexes in the rator E_{1} by the rule:
E_{1} →_{β} E'_{1}
------------------------------ (EApp1)
E_{1} E_{2} →_{β} E'_{1} E_{2}
Then, we can choose to evaluate the rand of an application by reducing all redexes in the rand E_{2} by the rule:
E_{2} →_{β} E'_{2}
------------------------------ (EApp2)
E_{1} E_{2} →_{β} E_{1} E'_{2}
Furthermore, although as we see later this is rarely done or necessary, we could also choose to reduce redexes inside an abstraction by the rule:
E →_{β} E'
---------------------- (EAbs)
λx.E →_{β} λx.E'
The question is, do we really need E_{2}
in (EAppAbs) to be a value (normal form) produced by rule (EApp2)? How about requiring E_{1}
in (EAppAbs) to be reduced to normal form by (EAbs). Is this really necessary? Because we have multiple choices of beta reductions in a lambda expression, does it matter which one we pick and should we eventually reduce all beta redexes?
Consider
(λx.x^{2}) ((λy.y+1) 3)
We can either apply the leftmost abstraction, giving
((λy.y+1) 3)^{2}
and obtain
(3+1)^{2} = 16
Or we can apply the rightmost abstraction, giving
(λx.x^{2}) (3+1)
and then
(3+1)^{2} = 16
The results are identical. As it turns out, the order is immaterial when the evaluations terminate in a beta normal form. Termination of reduction is not guaranteed, and may depend on the redex we pick. A beta normal form is a lambda expression that has no beta redex, i.e. no subexpression of the form (λx.E_{1}) E_{2}. Values are in normal form, and so are abstractions that are not applied, such as λx.λy.x (x y).
First, we need to define an equivalence relation on lambda expressions to compare (partially) evaluated results and beta normal forms.
Observe that the reflexive, symmetric, and transitive closure of alpha conversion →_{α} satisfies
t ≡_{α} s if t →_{α} s
t ≡_{α} t
t ≡_{α} s if s ≡_{α} t
t ≡_{α} s if t ≡_{α} r and r ≡_{α} s
THEOREM [Alpha Equivalence]: The relations →_{α}^{*} and ≡_{α} are identical.
Proof: The alpha reduction relation →_{α} is symmetric: λx.E →_{α} λy.[x ↦ y]E →_{α} λx.[y ↦ x][x ↦ y]E = λx.E. Therefore, →_{α}^{*} is symmetric. ∎
The equivalence relation ≡_{α} partitions the lambda expressions into equivalence classes. That is, if t ≡_{α} s then t and s are identical up to the choice of names for the bound variables.
The rules (EApp1), (EApp2), and (EAbs) select a part of a lambda expression to evaluate that contains a beta redex. When there is more than one redex any one of the rules can be applied to pick a redex. This suggests the full betareduction scheme:
while there are beta redexes in t do
reduce one of the redexes in t
When the above loop terminates, t is in beta normal form.
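Full evaluation does not specify which redex to reduce. Several different evaluation strategies for lambda calculus exist. The most important strategies are:
Normal order reduction (NOR) reduces the leftmost outermost redex first, so that (EApp2) is applied only when E_{1} is not an abstraction:
(λx.g x x) ((λy.f y) a)
→_{β} g ((λy.f y) a) ((λy.f y) a)   by (EAppAbs)
→_{β} g (f a) ((λy.f y) a)   by (EApp1), (EApp2), (EAppAbs)
→_{β} g (f a) (f a)   by (EApp2), (EAppAbs)
Call-by-name is NOR without reduction inside abstractions, so that evaluation stops at weak head normal form (WHNF):
(λx.λy.x y) (λz.z)
→_{β} λy.(λz.z) y
Call-by-need is call-by-name with sharing, so that an argument is evaluated at most once:
(λx.g x x) ((λy.f y) a)
→_{β} g x x where x = (λy.f y) a
→_{β} g x x where x = f a
Applicative order reduction (AOR), and call-by-value as its variant without reduction inside abstractions, reduce the operand before the application:
(λx.g x x) ((λy.f y) a)
→_{β} (λx.g x x) (f a)   by (EApp2), (EAppAbs)
→_{β} g (f a) (f a)   by (EAppAbs)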
The call-by-value strategy is strict, in the sense that arguments to functions are always evaluated. By contrast, the non-strict (or lazy) strategies such as call-by-name and call-by-need evaluate only the arguments that are actually used.
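The difference between the strategies can be made concrete with two one-step reduction functions in Haskell. This is a sketch under stated assumptions, not the Prolog implementation of this talk: the type Term and the names stepNOR, stepCBV, and reduceWith are ours, free variables such as f, g, and a play the role of constants, and substitution is naive, which is adequate only because the examples never substitute a term with free variables under a binder of the same name.

```haskell
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- Naive substitution [x ↦ s]e without alpha conversion (see the caveat above).
subst :: String -> Term -> Term -> Term
subst x s (Var y)   = if y == x then s else Var y
subst x s (App a b) = App (subst x s a) (subst x s b)
subst x s (Lam y e) = if y == x then Lam y e else Lam y (subst x s e)

-- One step of normal order reduction: leftmost outermost redex first.
stepNOR :: Term -> Maybe Term
stepNOR (App (Lam x e) a) = Just (subst x a e)
stepNOR (App f a) = case stepNOR f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> stepNOR a
stepNOR (Lam x e) = Lam x <$> stepNOR e
stepNOR (Var _)   = Nothing

-- One step of call-by-value: reduce rator and rand until neither can step,
-- then apply; never reduce inside an abstraction.
stepCBV :: Term -> Maybe Term
stepCBV (App f a) = case stepCBV f of
  Just f' -> Just (App f' a)
  Nothing -> case stepCBV a of
    Just a' -> Just (App f a')
    Nothing -> case f of
      Lam x e -> Just (subst x a e)
      _       -> Nothing
stepCBV _ = Nothing

-- All terms visited when repeating a step function until no step applies.
reduceWith :: (Term -> Maybe Term) -> Term -> [Term]
reduceWith step t = t : maybe [] (reduceWith step) (step t)
```

For the term (λx.g x x) ((λy.f y) a), reduceWith stepNOR takes three steps and reduceWith stepCBV takes two, and both end in g (f a) (f a). For (λx.a) Ω, stepNOR reaches a in one step because the argument is never used, whereas repeating stepCBV does not terminate.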
Let's try this out and experiment with lambda calculus by defining the following abstract syntax of lambda expressions in Prolog:
E ::= v | x | \x.E | E:E
where values v
are Prolog atoms (numbers and names), x
is a Prolog atom name, \x.E
is an abstraction, and E:E
is an application.
% PROLOG FILE: beta.pl
:- op(900, xfy, .). % lambda abstraction
For example, we write (λx.g x x) ((λy.f y) a) as (\x.g:x:x):((\y.f:y):a) and its abstract syntax tree is drawn as follows:
?- [beta].
?- draw((\x.g:x:x):((\y.f:y):a)).
        ______:______
       /             \
   ____.             :____
  /     \           /     \
  \     :__       __.      a
  |    /   \     /   \
  x   :     x    \    :
     / \         |   / \
    g   x        y  f   y
We can now implement the operational semantics of lambda calculus as follows:
% PROLOG FILE: beta.pl
%_CONCLUS_ :- _________RULE_________, __PREMISES__.
There are two rules for (EAppAbs), only one of which is enabled at any given time depending on the evaluation strategy NOR or AOR. With NOR the rule (EAppAbs) should be tried first before (EApp2), hence the strategic rule ordering. Call-by-name (NOR to WHNF) and call-by-value strategies require disabling (EAbs).
We can experiment with these strategies by enabling and disabling rules with by_name, by_value, nor, and aor:
% PROLOG FILE: beta.pl
% Lazy call by name strategy = NOR to weak head normal form (WHNF):
by_name :- nor, disable(['EAbs', 'EApp2']).
% Lazy call by need strategy = NOR + WHNF + sharing:
by_need :- by_name, disable('EAppAbsNOR'), enable('EAppAbsshare').
% Strict call by value strategy = AOR w/o lambda abstraction body reduction:
by_value :- aor, disable('EAbs').
% Normal order reduction (NOR) strategy:
nor :- disable(['EAppAbsAOR', 'EAppAbsshare']), enable(['EAbs', 'EApp1', 'EApp2', 'EAppAbsNOR']).
% Applicative order reduction (AOR) strategy:
aor :- disable(['EAppAbsNOR', 'EAppAbsshare']), enable(['EAbs', 'EApp1', 'EApp2', 'EAppAbsAOR']).
For example, call-by-name reduces the lambda expression (λx.x) ((λy.y) (λz.(λu.u) z)) to WHNF by →_{β}^{*} using the reflexive, transitive relation *> of :>
?- [beta].
?- show, by_name, (\x.x):((\y.y):(\z.(\u.u):z)) *> C.
beta((\x.x): ((\y.y):(\z.(\u.u):z)),(\y.y):(\z.(\u.u):z))
[1]:EAppAbsNOR (\x.x): ((\y.y):(\z.(\u.u):z)):>(\y.y):(\z.(\u.u):z)
beta((\y.y):(\z.(\u.u):z),(\z.(\u.u):z))
[2]:EAppAbsNOR (\y.y):(\z.(\u.u):z):>(\z.(\u.u):z)
C = (\z.(\u.u):z).
[The output also draws the syntax tree of the initial term and of the term after each step.]
whereas AOR fully evaluates the lambda expression to a normal form:
?- show, aor, (\x.x):((\y.y):(\z.(\u.u):z)) *> C.
beta((\u.u):z,z)
[1,1,1,1]:EAppAbsAOR (\u.u):z:>z
(\u.u):z:>z
[1,1,1]:EAbs (\z.(\u.u):z):>(\z.z)
(\z.(\u.u):z):>(\z.z)
[1,1]:EApp2 (\y.y):(\z.(\u.u):z):>(\y.y):(\z.z)
(\y.y):(\z.(\u.u):z):>(\y.y):(\z.z)
[1]:EApp2 (\x.x): ((\y.y):(\z.(\u.u):z)):>(\x.x): ((\y.y):(\z.z))
beta((\y.y):(\z.z),(\z.z))
[2,1]:EAppAbsAOR (\y.y):(\z.z):>(\z.z)
(\y.y):(\z.z):>(\z.z)
[2]:EApp2 (\x.x): ((\y.y):(\z.z)):>(\x.x):(\z.z)
beta((\x.x):(\z.z),(\z.z))
[3]:EAppAbsAOR (\x.x):(\z.z):>(\z.z)
C = (\z.z).
[The output also draws the syntax tree of the initial term and of the term after each step.]
To simulate call-by-need with NOR to WHNF with sharing, we use a pair term~value in beta reduction of an argument by substituting E_{2}~v into the abstraction instead of just the term E_{2}:
(EAppAbsshare)  (λx.E_{1}) E_{2} →_{β} [x ↦ E_{2}~v]E_{1}
where v
is a new variable that is a placeholder for a shared value, so that for example
(λx.x x) ((λy.y) (λz.z))
→_{β} ((λy.y) (λz.z))~v ((λy.y) (λz.z))~v
→_{β} (λz.z) ((λy.y) (λz.z))~(λz.z)
→_{β} (λz.z) (λz.z)
→_{β} (λz.z)
where ((λy.y) (λz.z))~v →_{β} v with v = (λz.z) is evaluated with the following new rules:
E →_{β}^{*} v
----------------------  if v is an uninstantiated variable in E~v
E~v →_{β} v

E~v →_{β} v  if v is a value
To implement the callbyneed strategy in Prolog we just need to add:
% PROLOG FILE: beta.pl
%_CONCLUS_ :- _________RULE_________, __PREMISES__.
The first beta reduction shows the sharing via a Prolog variable, in this case highlighted as V
:
?- show, by_need, (\x.x:x):((\y.y):(\z.z)) *> C.
beta((\x.x:x): ((\y.y):(\z.z))~V, ((\y.y):(\z.z))~V: ((\y.y):(\z.z))~V)
[1]:EAppAbsshare (\x.x:x): ((\y.y):(\z.z)):> ((\y.y):(\z.z))~V: ((\y.y):(\z.z))~V
[The output also draws the syntax trees of both terms; in the second tree the two operands are ~ nodes that share the same variable V.]
where eventually (\y.y):(\z.z) :> \z.z
, so V = \z.z.
The big question is whether the NOR and AOR full-evaluation strategies terminate in a normal form and whether these normal forms are the same. If so, the normal form of a lambda expression is unique and can be considered the value computed by reducing the lambda expression. Note that call-by-name and call-by-value strategies may not always produce normal forms.
A relation → satisfies the diamond property if for all terms t, s, r such that t → s and t → r there exists a term u such that s → u and r → u.
THEOREM: If a relation → satisfies the diamond property, so does its transitive closure →^{*}.
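Unfortunately, the one-step →_{β} evaluation relation does not satisfy the diamond property.
Take for example (λx.x x) ((λy.y) a), then:
(λx.x x) ((λy.y) a) →_{β} ((λy.y) a) ((λy.y) a) →_{β} a ((λy.y) a) →_{β} a a
and
(λx.x x) ((λy.y) a) →_{β} (λx.x x) a →_{β} a a
After one step the two reducts are ((λy.y) a) ((λy.y) a) and (λx.x x) a. The former needs two more steps to reach the common term a a, so the diamond cannot be closed with a single step on each side.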
We can easily construct a new evaluation relation →_{◇} for which the transitive closure is the same as →_{β}^{*} as follows:
t →_{◇} t

t →_{◇} s
----------------------
λx.t →_{◇} λx.s

t →_{◇} r    s →_{◇} u
----------------------------
t s →_{◇} r u

t →_{◇} r    s →_{◇} u
----------------------------
(λx.t) s →_{◇} [x ↦ u]r
We can easily verify that →_{◇}^{*} is identical to →_{β}^{*}. The →_{◇} reduction relation satisfies the diamond property, and therefore so does its transitive closure →_{β}^{*}; we say that these relations are confluent. The one-step →_{β} reduction relation is only weakly confluent.
THEOREM [Church-Rosser]: If t ≡_{β} s then there is a u such that t →_{β}^{*} u and s →_{β}^{*} u.
Proof: By induction on the ≡_{β} relation:
Suppose t ≡_{β} s because t →_{β} s, then take u = s such that t →_{β}^{*} u and s →_{β}^{*} u.
Suppose t ≡_{β} s because t = s, then take u = t = s such that t →_{β}^{*} u and s →_{β}^{*} u.
Suppose t ≡_{β} s because s ≡_{β} t, then by the induction hypothesis there is a u such that t →_{β}^{*} u and s →_{β}^{*} u.
Suppose t ≡_{β} s because t ≡_{β} r and r ≡_{β} s, then by the induction hypothesis there exist u' and u'' such that t →_{β}^{*} u' and r →_{β}^{*} u', and also r →_{β}^{*} u'' and s →_{β}^{*} u''. By the diamond property there exists a u such that u' →_{β}^{*} u and u'' →_{β}^{*} u. ∎
It follows as a corollary from Church-Rosser that:
COROLLARY: Let t ≡_{β} u and u is in normal form, then t →_{β}^{*} u.
Proof: By Church-Rosser there is a u' such that t →_{β}^{*} u' and u →_{β}^{*} u'. Since u is in normal form, u' = u and hence t →_{β}^{*} u.
COROLLARY: A lambda expression can have at most one normal form.
Proof: Suppose u' and u'' are both beta normal forms of t. We have that t ≡_{β} u' and t ≡_{β} u'', so u' ≡_{β} u''. By Church-Rosser there is a u such that u' →_{β}^{*} u and u'' →_{β}^{*} u. Because u' and u'' are normal forms, u = u' = u''.
THEOREM [Standardization Theorem]: If a lambda expression has a normal form, then the NOR strategy guarantees reaching that normal form.
Note that NOR may terminate when AOR does not, since NOR does not evaluate arguments that are not needed.
Assuming we use the call-by-name (NOR to WHNF) reduction strategy, there is a very useful observation we can make: in the steps to reduce a closed lambda expression to WHNF we never encounter the variable capture problem because the operand argument that is substituted into a lambda abstraction is closed. To see why this is the case, consider:
(λx.λy. ...x...) (...y...)
However, because we started with a closed expression, there must be a binding for y
in an outer abstraction:
(λy. ... (λx.λy. ...x...) (...y...) ...)
But since we never reduce inside abstractions as per WHNF, the application (λx. ...) (...y...)
is never reduced before y
is bound to a value, say a
:
(λy. ... (λx.λy. ...x...) (...y...) ...) a
→_{β} ... (λx.λy. ...x...) (...a...) ...
Therefore, the variable capture problem never occurs when substituting in beta reduction under call-by-name (and consequently, under call-by-need).
The pure lambda calculus has no values other than lambda abstractions. This seems very limited. However, we can encode Booleans, natural numbers, and lists in the pure lambda calculus by Church encoding.
Church Booleans 'true' and 'false' are selector functions:
tru = λx.λy.x
fls = λx.λy.y
The idea here is that tru and fls select the first or second argument, respectively, which are used to select the then- and else-expressions in a conditional form:
tru thenexpr elseexpr →_{β} thenexpr
fls thenexpr elseexpr →_{β} elseexpr
test[condexpr, thenexpr, elseexpr] = condexpr thenexpr elseexpr
Logical operations are defined by
and = λx.λy.test[x,y,fls] = λx.λy.x y fls
or = λx.λy.test[x,tru,y] = λx.λy.x tru y
Let's verify these:
and tru tru = (λx.λy.x y fls) tru tru
→_{β} (λy.tru y fls) tru
→_{β} tru tru fls
→_{β} tru
and tru fls = (λx.λy.x y fls) tru fls
→_{β} (λy.tru y fls) fls
→_{β} tru fls fls
→_{β} fls
and fls any = (λx.λy.x y fls) fls any
→_{β} (λy.fls y fls) any
→_{β} fls any fls
→_{β} fls
Likewise, we verify that:
or tru any = (λx.λy.x tru y) tru any
→_{β} (λy.tru tru y) any
→_{β} tru tru any
→_{β} tru
or fls tru = (λx.λy.x tru y) fls tru
→_{β} (λy.fls tru y) tru
→_{β} fls tru tru
→_{β} tru
or fls fls = (λx.λy.x tru y) fls fls
→_{β} (λy.fls tru y) fls
→_{β} fls tru fls
→_{β} fls
not = λx.λy.λz.x z y
Exercise: verify that not tru = fls
and not fls = tru
.
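Church Booleans can also be tried in Haskell. The sketch below is ours: since Haskell is typed, the selector functions are wrapped in a newtype CBool with a rank-2 field (this needs the RankNTypes extension), the field name test plays the role of the conditional form, and toBool converts back to a Haskell Bool for inspection.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church Boolean selects one of two arguments of any type.
newtype CBool = CBool { test :: forall a. a -> a -> a }

tru, fls :: CBool
tru = CBool (\x _ -> x)
fls = CBool (\_ y -> y)

andC, orC :: CBool -> CBool -> CBool
andC x y = test x y fls      -- and = λx.λy.x y fls
orC  x y = test x tru y      -- or  = λx.λy.x tru y

notC :: CBool -> CBool
notC x = CBool (\y z -> test x z y)   -- not = λx.λy.λz.x z y

-- Convert to a Haskell Bool by selecting between True and False.
toBool :: CBool -> Bool
toBool b = test b True False
```

For example, toBool (andC tru fls) is False and toBool (notC fls) is True.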
Church numerals are formed by the abstractions
0 = λs.λz.z
1 = λs.λz.s z
2 =
λs.λz.s (s z)
3 =
λs.λz.s (s (s z))
4 = λs.λz.s (s (s (s z)))
...
n = λs.λz.s^{[n]} z
where
s^{[0]} z = z
s^{[n+1]} z = s^{[n]} (s z)
Thus, for any numeral n
we have the identity n
s z = s^{[n]} z
, which leads to the following definitions for zero, the successor, and iszero:
zero = λs.λz.z
succ = λn.λs.λz.s (n s z)
iszo = λn.n (λs.fls) tru
We verify that
succ n = λs.λz.s^{[n+1]} z =
λs.λz.s (s^{[n]} z)
iszo zero = zero (λs.fls) tru = (λs.λz.z) (λs.fls) tru = tru
iszo (n+1) = (n+1) (λs.fls) tru = (λs.λz.s^{[n+1]} z) (λs.fls) tru = (λs.fls)^{[n+1]} tru = (λs.fls) ((λs.fls)^{[n]} tru) = fls
The addition function uses the identity s^{[n+m]} z = s^{[n]} (s^{[m]} z):
plus = λn.λm.λs.λz.n s (m s z)
and multiplication uses the identity s^{[n*m]} z = (s^{[m]})^{[n]} z:
mult = λn.λm.λs.λz.n (m s) z
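The Church numerals and their arithmetic can be checked in Haskell as well. As a sketch under our own naming, a numeral is wrapped in a newtype Church with a rank-2 field (RankNTypes), and toInt and fromInt convert between Church numerals and Haskell integers so that results can be inspected.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church numeral n applies a successor function s n times to z.
newtype Church = Church { runChurch :: forall a. (a -> a) -> a -> a }

zero :: Church
zero = Church (\_ z -> z)

succC :: Church -> Church
succC n = Church (\s z -> s (runChurch n s z))

plus, mult :: Church -> Church -> Church
plus n m = Church (\s z -> runChurch n s (runChurch m s z))   -- n s (m s z)
mult n m = Church (\s z -> runChurch n (runChurch m s) z)     -- n (m s) z

-- iszo: any application of the constant function yields False.
isZero :: Church -> Bool
isZero n = runChurch n (const False) True

toInt :: Church -> Int
toInt n = runChurch n (+ 1) 0

fromInt :: Int -> Church
fromInt k = iterate succC zero !! k
```

For example, toInt (plus (fromInt 2) (fromInt 3)) is 5 and toInt (mult (fromInt 2) (fromInt 3)) is 6. fromInt assumes a non-negative argument.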
Church pairs form the basic structure to create aggregate data structures such as lists. The idea is to use Booleans tru
and fls
as selectors to get the head or tail value of a pair. That is, a pair
is a function that takes a Boolean b
:
pair[h,t] = λb.b h t
such that
pair[h,t] tru = h
pair[h,t] fls = t
This leads to the following definitions to construct a pair and decompose a pair in a head and tail:
pair = λh.λt.λb.b h t
head = λp.p tru
tail = λp.p fls
The empty list null and isempty test are functions:
null = λx.tru
empt = λx.x (λy.λz.fls)
such that
empt null = (λx.x (λy.λz.fls)) (λx.tru)
→_{β} (λx.tru) (λy.λz.fls)
→_{β} tru
empt (pair h t) = (λx.x (λy.λz.fls)) (λb.b h t)
→_{β} (λb.b h t) (λy.λz.fls)
→_{β} (λy.λz.fls) h t
→_{β} (λz.fls) t
→_{β} fls
An implementation of Church encodings:
?- [church].
?- noshow, aor, plus:zero:(succ:zero) *> C.
C = (\s.(\z.s:z)).
?- noshow, aor, mult:(succ:zero):(succ:(succ:zero)) *> C.
C = (\s.(\z.s: (s:z))).
Recursion in the pure lambda calculus is achieved with an operation that is divergent, meaning its evaluation never terminates. Consider the Ω-combinator (a combinator is a closed lambda expression)
Ω = (λx.x x) (λx.x x)
that never terminates
(λx.x x) (λx.x x)
→_{β} (λx.x x) (λx.x x)
→_{β} ...
We would like to harness the ability of infinite repetition to implement recursion. Let's consider a recursive function
fac n = if n=0 then 1 else n*fac (n-1)
which can be written in pure form
fac = λn.(iszo n) 1 (mult n (fac (pred n)))
By abstracting the function name away, we obtain
λf.λn.(iszo n) 1 (mult n (f (pred n)))
The only thing we have to do now is to bind variable f to the lambda abstraction's body itself, so that f represents the recursive function:
(λn.(iszo n) 1 (mult n (f (pred n))))
↖______________________↙
The fixpoint Y-combinator can be used to achieve this. The Y-combinator (often written 'fix') is a replicator that satisfies:
Y F = F (Y F)
Take for example
F = λf.λn.(iszo n) 1 (mult n (f (pred n)))
then we can define
fac = Y F
since
fac = Y F = F (Y F) = (λf.λn.(iszo n) 1 (mult n (f (pred n)))) (Y F)
→_{β} λn.(iszo n) 1 (mult n ((Y F) (pred n))) = λn.(iszo n) 1 (mult n (fac (pred n)))
which is what we wanted.
Recall that the Ω-combinator replicates itself. A generalization of this is the fixpoint Y-combinator, which in lambda form is:
Y = λf.(λx.f (λy.x x y)) (λx.f (λy.x x y))
By eta reduction, defined by the rule
λx.E x →_{η} E if x ∉ FV[E]
we find the alternative equivalent form
Y = λf.(λx.f (x x)) (λx.f (x x))
With this form we immediately see that
Ω = Y I
with identity combinator
I = λx.x
However, we do not use the alternative form of Y with AOR, because Y F diverges for any F due to strict evaluation under AOR.
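The difference between the two forms is easy to observe in any strict language. The sketch below uses Python, which evaluates arguments before a call much like AOR; the names Y_strict and Y_reduced are assumptions of this example, and F uses Python integers and a conditional expression in place of iszo, mult and pred.

```python
# The eta-expanded form: λf.(λx.f (λy.x x y)) (λx.f (λy.x x y))
Y_strict = lambda f: (lambda x: f(lambda y: x(x)(y)))(lambda x: f(lambda y: x(x)(y)))

# The eta-reduced form: λf.(λx.f (x x)) (λx.f (x x))
Y_reduced = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# F = λf.λn.(iszo n) 1 (mult n (f (pred n))), with Python integers as stand-ins.
# The conditional expression is non-strict, so only one branch is evaluated.
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

fac = Y_strict(F)
print(fac(5))  # 120

try:
    Y_reduced(F)          # x(x) is evaluated before f is ever entered
    reduced_diverges = False
except RecursionError:
    reduced_diverges = True
print(reduced_diverges)  # True
```

In the eta-expanded form the self-application x x sits under λy, so it is only evaluated when the recursive call actually receives an argument.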
We verify that Y is indeed a Y-combinator:
Y F = (λf.(λx.f (x x)) (λx.f (x x))) F
→_{β} (λx.F (x x)) (λx.F (x x))
→_{β} F ((λx.F (x x)) (λx.F (x x))) = F (Y F)
Assuming lazy evaluation by call-by-name or call-by-need, we can define
fix = λF.F (fix F)
which satisfies the fixpoint Y-combinator property fix F = F (fix F).
With lazy evaluation this definition works, because the recursive invocation will only be evaluated when F uses its operand (fix F).
?- [delta].
?- noshow, by_name, fix:(\f.(\n.if:(eq:n:0):1:(mul:n:(f:(sub:n:1))))):3 *> C.
C = 6.
?- [church].
?- noshow, by_name, fix:(\f.(\n.(iszo:n):(succ:zero):(mult:n:(f:(pred:n))))):(succ:(succ:(succ:zero))) *> C.
C = (\s.(\z.s:(s:(s:(s:(s:(s:z))))))).
Special forms are needed for AOR strategies such as call-by-value. Special forms are non-strict in certain operands.
When implementing the fixpoint Y-combinator for evaluation with call-by-value (AOR), we cannot use the definition of fix above, because fix F diverges for any F. We can use
fix = λf.λx.f (fix f) x
which is equivalent to the previous definition by eta reduction:
fix = λf.λx.f (fix f) x →_{η} λf.f (fix f).
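Both definitions of fix can be written down directly in a strict language to see the effect. A sketch in Python, where fix_lazy is a name introduced here for the lazy-only definition and F uses Python integers in place of the delta rules:

```python
def fix_lazy(f):
    # fix = λF.F (fix F): under strict evaluation the argument fix_lazy(f)
    # is evaluated before f is entered, so this call never returns.
    return f(fix_lazy(f))

def fix(f):
    # fix = λf.λx.f (fix f) x: the recursive call sits under λx and is only
    # evaluated once an argument x is supplied.
    return lambda x: f(fix(f))(x)

F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
print(fix(F)(3))  # 6
```

Calling fix_lazy(F) in Python ends in a RecursionError, which is the strict-evaluation counterpart of the divergence described above.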
Eta reduction generalizes the principle of currying, meaning that any partial application of a k-ary function f to k-1 arguments satisfies
f E_{1} E_{2} ... E_{k-1} = (λx.f E_{1} E_{2} ... E_{k-1} x)
For example:
?- [eta].
?- eta((\x.add:1:x):2, E).
E = add:1:2.
?- eta((\x.(\y.add:y):x):1:2, E).
E = add:1:2.
Exercise: Show that the correctness of eta reduction follows from beta reduction.
Proof: Let E be an expression such that x ∉ FV[E]. Then, for any argument a, (λx.E x) a →_{β} E a. So (λx.E x) →_{η} E is valid under beta reduction. ∎
Another special form is needed for conditionals under strict evaluation strategies such as call-by-value and AOR. That is, the Church Boolean test [cond-expr, then-expr, else-expr] evaluates all three operands under call-by-value. We extend the lambda expression syntax with the if/3 construct and use the operational semantics of NB expressions:
(EIfTrue)   if(true, T_{2}, T_{3}) → T_{2}
(EIfFalse)  if(false, T_{2}, T_{3}) → T_{3}
(EIf)       if(T_{1}, T_{2}, T_{3}) → if(T'_{1}, T_{2}, T_{3})   provided T_{1} → T'_{1}
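The three rules translate almost literally into a one-step reduction function. The following is a sketch in Python, not the notes' Prolog implementation; the tuple encoding ('if', t1, t2, t3) and the function names step and evaluate are assumptions of this example.

```python
def step(t):
    """One small-step reduction of an if-term, or None if no rule applies.
    Terms: True, False, other values, or ('if', t1, t2, t3)."""
    if isinstance(t, tuple) and t[0] == 'if':
        _, t1, t2, t3 = t
        if t1 is True:            # (EIfTrue)
            return t2
        if t1 is False:           # (EIfFalse)
            return t3
        t1_next = step(t1)        # (EIf): reduce the condition only
        if t1_next is not None:
            return ('if', t1_next, t2, t3)
    return None

def evaluate(t):
    """Apply step repeatedly until no rule applies."""
    while True:
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt

print(evaluate(('if', ('if', True, False, True), 'then', 'else')))  # else
```

Note that step never looks inside t2 or t3, which is exactly what makes if/3 non-strict in its second and third operands.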
We add the if/3 construct and tuples to the lambda calculus operational semantics in Prolog:
E ::= v | x | \x.E | E:E | if(E,E,E) | (E,E)
Values v include the atoms true and false.
We add the following operational semantics (delta rules) to define several "builtin" functions:
% PROLOG FILE: delta.pl
%________CONCLUSION_________ :- ________RULE_______, __PREMISES__.
add:N:M :> K :- rule('EAdd'), number(N), number(M), K is N+M.
add:T:S :> add:R:S :- rule('EAdd1'), T :> R.
add:T:S :> add:T:R :- rule('EAdd2'), S :> R.
sub:N:M :> K :- rule('ESub'), number(N), number(M), K is N-M.
sub:T:S :> sub:R:S :- rule('ESub1'), T :> R.
sub:T:S :> sub:T:R :- rule('ESub2'), S :> R.
mul:N:M :> K :- rule('EMul'), number(N), number(M), K is N*M.
mul:T:S :> mul:R:S :- rule('EMul1'), T :> R.
mul:T:S :> mul:T:R :- rule('EMul2'), S :> R.
div:N:M :> K :- rule('EDiv'), number(N), number(M), K is N/M.
div:T:S :> div:R:S :- rule('EDiv1'), T :> R.
div:T:S :> div:T:R :- rule('EDiv2'), S :> R.
eq:T:T :> true :- rule('EEqTrue').
eq:T:S :> eq:R:S :- rule('EEq1'), T :> R.
eq:T:S :> eq:T:R :- rule('EEq2'), S :> R.
eq:_:_ :> false :- rule('EEqFalse').
and:true:true :> true :- rule('EAndTrue').
and:false:_ :> false :- rule('EAndFalse').
and:T:S :> and:R:S :- rule('EAnd1'), T :> R.
and:T:S :> and:T:R :- rule('EAnd2'), S :> R.
if:true:T:_ :> T :- rule('ECondTrue').
if:false:_:T :> T :- rule('ECondFalse').
if:T:P:Q :> if:R:P:Q :- rule('ECond'), T :> R.
fix:F:T :> F:(fix:F):T :- rule('EFix').
if(true,T,_) :> T :- rule('EIfTrue').
if(false,_,T) :> T :- rule('EIfFalse').
if(T,P,Q) :> if(R,P,Q) :- rule('EIf'), T :> R.
pair:T:S :> (T,S) :- rule('EPair').
The addition of the if/3 special form allows the evaluation rules for this form to deviate from the evaluation rules for lambda calculus. This ensures we can safely use conditionals with AOR strategies (call-by-value). Without the special form, all operands to a conditional are evaluated, leading to potential errors. For example:
?- noshow, by_value, if:true:1:(div:1:0) *> C.
gives a division-by-zero error because (div:1:0) is always evaluated with call-by-value as an operand to the if function, whereas the special form if/3, for example:
?- noshow, by_value, if(true,1,(div:1:0)) *> C.
produces the result '1' without error, since we defined specific evaluation rules (EIfTrue), (EIfFalse), and (EIf) exactly as in our NB expression language operational semantics.
With non-strict evaluation strategies such as call-by-name the if function can be used without the need for an if/3 special form:
?- noshow, by_name, if:true:1:(div:1:0) *> C.
C = 1.
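The same contrast between a conditional function and a conditional special form exists in most call-by-value languages. A sketch in Python, where if_fn is a hypothetical ordinary function playing the role of the if function, and Python's built-in conditional expression plays the role of the if/3 special form:

```python
def if_fn(cond, then_value, else_value):
    # An ordinary function: Python evaluates all three arguments before the call.
    return then_value if cond else else_value

def strict_if():
    return if_fn(True, 1, 1 / 0)   # 1 / 0 is evaluated first: ZeroDivisionError

def special_if():
    return 1 if True else 1 / 0    # conditional expression: else branch is skipped

try:
    strict_if()
    strict_failed = False
except ZeroDivisionError:
    strict_failed = True

print(strict_failed)   # True
print(special_if())    # 1
```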
Because pairs (E,E) have no evaluation rules, pairs form lazy data structures. This may appear rather strange at first. It means that pairs can be placeholders for unevaluated expressions. Once the first or second expression in the pair is obtained with fst or snd, respectively, the expression is evaluated by the current evaluation strategy.
In NOR strategies (call-by-name and call-by-need) operands are only evaluated when needed, which demonstrates the laziness of the tuple constructor pair:
?- noshow, by_name, fst:(pair:1:(div:1:0)) *> C.
C = 1.
This produces the result without error, because the tuple lazily stores the operands 1 and (div:1:0), as shown here in more detail:
?- show, by_name, fst:(pair:1:(div:1:0)) *> C.
[expression tree of fst:(pair:1:(div:1:0))]
[1,1]:EPair   pair:1:((div):1:0) :> (1, (div):1:0)
[1]:EFirst   fst:(pair:1:((div):1:0)) :> fst:(1, (div):1:0)
[expression tree of fst:(1, (div):1:0)]
[2]:EFirstPair   fst:(1, (div):1:0) :> 1
1
C = 1.
Note that the operands T and S to pair:T:S are not evaluated with NOR strategies and are simply copied into the (T,S) tuple result.
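In a strict language the same behavior has to be made explicit by storing suspended computations in the pair. A minimal sketch in Python, assuming zero-argument lambdas as the suspensions; pair, fst and snd here are stand-ins for the notes' operators, not their implementation:

```python
def pair(first_thunk, second_thunk):
    # The pair just stores the two unevaluated suspensions.
    return (first_thunk, second_thunk)

def fst(p):
    return p[0]()    # evaluate only the first component

def snd(p):
    return p[1]()    # evaluate only the second component

p = pair(lambda: 1, lambda: 1 / 0)
print(fst(p))  # 1
```

Only snd(p) would trigger the division by zero, because the second component is not touched by fst.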
Call-by-need with lazy data structure constructors allows the construction of "infinite" data structures. For example, we can define from and take functions:
from :> \n.pair:n:(from:(add:n:1)).
take :> \n.(\xs.if(eq:n:0,[],if(eq:xs:[],[],pair:(fst:xs):(take:(sub:n:1):(snd:xs))))).
and observe that the infinite list (by tupling) generated by from is never created unless needed:
?- noshow, by_name, fst:(from:1) *> C.
C = 1.
Because tuples are lazy data structures, their content is never evaluated unless we explicitly fetch the first or second element by fst or snd:
?- noshow, by_name, take:2:(from:1) *> C1, fst:C1 *> C2, snd:C1 *> C3, fst:C3 *> C4.
C1 = (fst: (from:1), take: (sub:2:1): (snd: (from:1))),
Note the duplication of the snd:(from:1) subexpressions in C3, which is avoided in the call-by-need strategy by sharing.
The call-by-value strategy does not apply (EAbs) to reduce lambda expressions, meaning that we never evaluate inside abstractions. This allows us to define "thunks", "suspensions", or "delayed forms" to simulate call-by-name and lazy evaluation in call-by-value strategies. That is, we can "stuff" an expression E in an abstraction λx.E using a dummy variable x. We force evaluation of the delayed E later by applying the abstraction to a dummy value. We define the delay and force special forms as macros:
macro delay(E) = (λx.E)
macro force(D) = (D nil)
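The two macros carry over to any call-by-value language with lambdas. A sketch in Python, where nil is modeled by None and if_fn is a hypothetical strict conditional function; delay has to be written out by hand as a lambda at each use, since a Python function named delay would evaluate its argument before being called:

```python
nil = None  # dummy value passed when forcing

def force(delayed):
    # macro force(D) = (D nil)
    return delayed(nil)

def if_fn(cond, then_delayed, else_delayed):
    # A strict function: both branches arrive as delayed forms and only
    # the selected one is forced.
    return force(then_delayed) if cond else force(else_delayed)

# macro delay(E) = (λx.E), spelled `lambda _: E`
result = if_fn(True, lambda _: 1, lambda _: 1 / 0)
print(result)  # 1
```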
Recall that a similar approach was used in the C++0x example to pass an expression by name to an integration function by stuffing it into a lambda function:
#include <functional>
using std::function;

// f takes no arguments: it reads the current value of x by reference.
double integrate(function<double()> f, double &x, double a, double b, double h)
{
  double sum = 0;
  for (x = a; x <= b; x += h)
    sum += f();
  return sum;
}
...
double y = 1.0;
double z = integrate([&]() -> double { return 2*x + x*x + y; }, x, 0.0, 10.0, 0.5);
Suppose we construct lambda expressions such that each bound variable has a unique name and all free variables are unique. This property is referred to as Barendregt's variable convention (BVC).
When we use BVC for lambda abstractions with Prolog variables we ensure that a Prolog variable is bound to exactly one abstraction. With BVC we can implement beta reduction efficiently by instantiating Prolog variables in unit time rather than relying on substitution that takes O(n) time for terms of size n.
The following illustrates this idea by reducing (λx.mul x x) 2 →_{β} mul 2 2 by matching (X.mul:X:X):2 = F:T to the application of a function F to an operand expression T, where F = (X.S) matches an abstraction X.S with body S = mul:X:X, and then we set variable X to the operand T to obtain S = mul:2:2:
?- (X.mul:X:X):2 = F:T, F = (X.S), X = T.
which suggests a simple operational semantics rule for beta reduction in Prolog:
(X.S):T :> S :- rule('EAbsApp'), X = T.
or we can write this as:
(T.S):T :> S :- rule('EAbsApp').
To see why BVC is essential, consider:
?- (X.X:2):(X.add:1:X) :> C.
which fails, because we attempt the unification X = (X.add:1:X).
Unfortunately, we cannot automatically maintain BVC:
?- (X.X:(X:2)):(Y.add:1:Y) :> C.
C = (Y.add:1:Y):((Y.add:1:Y):2).
To maintain the BVC property we need alpha renaming.
The copy_term/2 predicate copies a term to a new term with the same structure but with fresh new Prolog variables. That is, variables are renamed. The copy_term/2 predicate performs the same copying operation as when a Prolog rule is copied from the program database of rules and instantiated. When we add the new rule:
% PROLOG FILE: beta.pl
...
%_CONCLUS_ :- _________RULE_________, __PREMISES__.
The rule above is a simplified version of the slightly more elaborate but more transparent implementation using explicit unification for the substitution:
(X.S):T :> R :- rule('EAbsApp'), copy_term((X.S), (Y.R)), Y = T.
and we get step by step:
?- [delta].
?- noshow, by_name, (X.X:(X:2)):(Y.add:1:Y) :> C1, C1 :> C2, C2 :> C3, C3 :> C4, C4 :> C5.
C1 = (Y.add:1:Y): ((Y.add:1:Y):2),
as desired. This approach works with call-by-name, call-by-need, and call-by-value strategies, which do not apply (EAbs).
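The role that copy_term/2 plays here, renaming the abstraction before its variable is bound to the operand, can be mimicked with explicit alpha renaming over a term data structure. This is only an analogy sketched in Python: the tuple encoding of terms and the helpers fresh, rename, subst and beta are all assumptions of this example, and substitution replaces Prolog's variable instantiation.

```python
import itertools

_counter = itertools.count()

def fresh(name):
    # Hypothetical naming scheme: append a unique number to the base name.
    return f"{name}_{next(_counter)}"

def rename(term, env=None):
    """Copy term, giving every bound variable a fresh name (alpha renaming)."""
    env = env or {}
    kind = term[0]
    if kind == 'var':
        return ('var', env.get(term[1], term[1]))
    if kind == 'lam':
        _, x, body = term
        x_new = fresh(x)
        return ('lam', x_new, rename(body, {**env, x: x_new}))
    if kind == 'app':
        return ('app', rename(term[1], env), rename(term[2], env))
    return term  # constants such as ('const', 2)

def subst(term, x, value):
    """Replace free occurrences of variable x in term by value.
    No capture check: this relies on binders in term having fresh names."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == x else term
    if kind == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, value))
    if kind == 'app':
        return ('app', subst(term[1], x, value), subst(term[2], x, value))
    return term

def beta(term):
    """Reduce a top-level redex (λx.S) T, renaming the abstraction first."""
    _, f, arg = term
    _, x, body = rename(f)
    return subst(body, x, arg)

# (λx.x (x 2)) (λy.add 1 y)
inc = ('lam', 'y', ('app', ('app', ('const', 'add'), ('const', 1)), ('var', 'y')))
term = ('app',
        ('lam', 'x', ('app', ('var', 'x'), ('app', ('var', 'x'), ('const', 2)))),
        inc)
expected = ('app', inc, ('app', inc, ('const', 2)))
print(beta(term) == expected)  # True
```

As in the Prolog rule, the operand is copied into the body unchanged, and each later application of one of the copies renames that copy again before its variable is bound.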
Exercise: With (EAbs) enabled, what can go wrong when evaluating lambda expressions in Prolog with Prolog variables in BVC using copy_term/2?
To implement call-by-need, we only have to change the new rule (EAbsApp) to include a shared form as follows:
F:T :> S :- rule('EAbsApp'), copy_term(F, ((T~_).S)).
Lambda calculus can be trivially embedded in pure Prolog restricted to Horn clauses.
We can avoid the non-pure copy_term/2 and implement lambda calculus with call-by-name, call-by-need, and call-by-value strategies in pure Prolog by converting each abstraction (X.S) in a lambda expression to a new predicate lam(I, X, S) and adopting a new lambda application rule:
lam(I):T :> S :- rule('EAbsApp'), lam(I, T, S).
For example, (X.X:(X:2)):(Y.add:1:Y) is converted to the term lam(1):lam(2) with indexed lam/1 terms that refer to the lam/3 Prolog facts:
lam(1, X, X:(X:2)).
lam(2, X, add:1:X).
Execution proceeds by instantiation of the indexed lam/3 rules:
?- [delta].
?- noshow, by_value, lam(1):lam(2) *> C.
[expression trees of the successive terms:]
lam(1):lam(2)
lam(2):(lam(2):2)
add:1:(lam(2):2)
add:1:(add:1:2)
add:1:3
4
C = 4.
This idea works in principle, but in general we need to make a small change to allow variables of outer scope to occur inside abstractions to create closures. Thus, we need an environment to carry along variable bindings into the closure. We can use a list of Prolog variables. For example, (X.Y.X) is translated to lam(1, []) where
lam(1, [], X, lam(2,[X])).
lam(2, [X], Y, X).
Note that the second lambda abstraction has a free variable X that forms the body of the inner abstraction Y.X of the nested abstractions X.Y.X.
We modify rule (EAbsApp) to include the environment as a list of free variables Xs:
lam(I,Xs):T :> S :- rule('EAbsApp'), lam(I, Xs, T, S).
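The translation is a form of closure conversion: each abstraction becomes an indexed piece of code plus a list of captured values. A sketch of the same idea in Python, where the CODE table and apply_closure are names invented for this example and closures are encoded as ('lam', index, environment) tuples:

```python
# Code table: index -> function taking (environment list, argument).
# Mirrors the facts lam(1, [], X, lam(2,[X])) and lam(2, [X], Y, X).
CODE = {
    1: lambda env, x: ('lam', 2, [x]),   # body of X.(Y.X): build the closure for Y.X
    2: lambda env, y: env[0],            # body of Y.X: look up X in the environment
}

def apply_closure(closure, arg):
    # Analogue of rule (EAbsApp): lam(I,Xs):T :> S provided lam(I, Xs, T, S).
    _, index, env = closure
    return CODE[index](env, arg)

k = ('lam', 1, [])                       # the translation of (X.Y.X)
print(apply_closure(apply_closure(k, 2), 3))  # 2
```

The intermediate result apply_closure(k, 2) is the closure ('lam', 2, [2]), which carries the binding of X along with it.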
The natural semantics formulation of the call-by-name evaluation strategy consists of just three rules:
% PROLOG FILE: pure.pl
...
:- op(950, xfx, =>). % big-step evaluation relation
...
val(A) => val(A).
lam(I,Xs) => lam(I,Xs).
where terms are precompiled to pure Prolog terms and lam/4 facts with the ~> relation defined by:
% PROLOG FILE: pure.pl
...
:- op(950, xfx, ~>). % compilation relation
...
V ~> (val(V),[]) :- atomic(V).
We use the compiler relation E ~> (C,B) to compile E to C with a list of free variables B, then evaluate C to a value V:
?- [pure].
?- (X.Y.X):2:3 ~> (C,B), C => V.
C = app(app(lam(1, []), val(2)), val(3)),
An additional advantage is that the compilation automatically implements local variable bindings, because variable bindings are part of the lam/4 facts. Thus, variables are no longer universally quantified after compilation and we do not have to start in BVC form. For example, (X.X:(X:2)):(X.add:1:X).
The pure lambda calculus is sufficiently powerful to express any calculation, but it would be syntactically awkward to use. Let's add some syntactic sugar to the lambda calculus notation and define convenient syntactic constructs for our "MiniMu" programming language, consisting of expressions E and function definitions D:
E ::= v | x | \x.E | E:E | if(E,E,E) | (E,E) | f(E,...,E) | E⊕E | ⊕E | E where x = E | E where f(x,...,x) = E
D ::= def f := E. | def f(x,...,x) := E. | def x⊕x := E. | def ⊕x := E.
where values v are atoms and constants, x denotes a variable name, f is a function name, and ⊕ is an infix or prefix operator.
The syntax of expressions E for the last five grammar productions is converted to lambda expressions by the following rules:
f(E_{1},E_{2},...,E_{k}) ⇒ f:E_{1}:E_{2}:...:E_{k}
E_{1}⊕E_{2} ⇒ (⊕):E_{1}:E_{2}
⊕E ⇒ (⊕):E
E_{1} where x = E_{2} ⇒ (\x.E_{1}):E_{2}
E_{1} where f(x_{1},x_{2},...,x_{k}) = E_{2} ⇒ (\f.E_{1}):(\x_{1}.(\x_{2}.(...(\x_{k}.E_{2}))))
Definitions D are normalized by the rules:
def f(x_{1},x_{2},...,x_{k}) := E ⇒ f := (\x_{1}.(\x_{2}.(...(\x_{k}.E)))).
def x_{1}⊕x_{2} := E ⇒ (⊕) := (\x_{1}.(\x_{2}.E)).
def ⊕x := E ⇒ (⊕) := (\x.E).
These rules suffice to reduce the syntax down to a normalized form consisting only of lambda expressions and definitions of constants (where functions are constants).
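The currying step shared by the where and def rules is a simple fold over the parameter list. A sketch in Python that works on strings rather than on parsed terms, which is a simplification made for this example; the function names curry, normalize_def and desugar_where are hypothetical.

```python
def curry(params, body):
    """Nest one abstraction per parameter: (\\x1.(\\x2.(...(\\xk.E))))."""
    for x in reversed(params):
        body = f"(\\{x}.{body})"
    return body

def normalize_def(name, params, body):
    # def f(x1,...,xk) := E.  =>  f := (\x1.(...(\xk.E))).
    return f"{name} := {curry(params, body)}."

def desugar_where(e1, name, params, e2):
    # E1 where f(x1,...,xk) = E2  =>  (\f.E1):(\x1.(...(\xk.E2)))
    # With no parameters this is the plain rule E1 where x = E2 => (\x.E1):E2.
    rhs = curry(params, e2) if params else e2
    return f"(\\{name}.{e1}):{rhs}"

print(normalize_def("fac", ["n"], "if(n=0, 1, n*fac(n-1))"))
print(desugar_where("map(sqr, 1..10)", "sqr", ["n"], "n*n"))
```

A real translator would apply these rules recursively to subexpressions after parsing; the string version only shows the shape of the output.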
To implement the common list functions we saw earlier, we define:
% PROLOG FILE: list.pl
...
% List operators
:- op(500, yfx, ++). % Append
:- op(700, xfx, ..). % Range
def length(xs) := if(nil(xs), 0, length(tl(xs))+1).
def xs++ys := if(nil(xs), ys, cons(hd(xs), tl(xs)++ys)).
def map(f, xs) := if(nil(xs), [], cons(f(hd(xs)), map(f, tl(xs)))).
def foldl(f, x, xs) := if(nil(xs), x, foldl(f, f(x, hd(xs)), tl(xs))).
def foldr(f, x, xs) := if(nil(xs), x, f(hd(xs), foldr(f, x, tl(xs)))).
def filter(p, xs) := if(nil(xs), [], if(p(hd(xs)), cons(hd(xs), filter(p, tl(xs))), filter(p, tl(xs)))).
def takeWhile(p, xs) := if(nil(xs), [], if(p(x), cons(x, takeWhile(p, tl(xs))), []) where x = hd(xs)).
def dropWhile(p, xs) := if(nil(xs), [], if(p(hd(xs)), dropWhile(p, tl(xs)), xs)).
def take(n, xs) := if(n=0, [], if(nil(xs), [], cons(hd(xs), take(n-1, tl(xs))))).
def drop(n, xs) := if(n=0, xs, if(nil(xs), [], drop(n-1, tl(xs)))).
def zipWith(f, xs, ys) := if(nil(xs), [], if(nil(ys), [], cons(f(hd(xs),hd(ys)), zipWith(f, tl(xs), tl(ys))))).
def zip(xs, ys) := zipWith(pair, xs, ys).
def nth(n, a, xs) := if(nil(xs), a, if(n=<1, hd(xs), nth(n-1, a, tl(xs)))).
def concat := foldr(++, []).
def a..b := if(b<a, [], cons(a, a+1..b)).
def from(n) := cons(n, from(n+1)).
A read-eval-print loop performs the syntax translation and evaluation of expressions using the call-by-need strategy:
?- [minimu,list].
?- loop.
> 1+2*3.
7
> def sum := foldr(+, 0).
ok
> sum(1..10).
55
> foldr(*, 1, 1..10).
3628800
> def fac(n) := if(n=0, 1, n*fac(n-1)).
ok
> fac(10).
3628800
> map(sqr, 1..10) where sqr(n) = n*n.
[tree display of the resulting partially evaluated list]
Note that lists are lazy, so the last evaluation produces a partially evaluated list of the form head.tail.
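The numeric results of this session can be cross-checked with a direct transcription of foldr, the range operator and fac. The sketch below is in Python and uses ordinary strict Python lists, so unlike MiniMu it evaluates the mapped list completely; rng is a name chosen here for the a..b operator.

```python
def foldr(f, x, xs):
    # def foldr(f, x, xs) := if(nil(xs), x, f(hd(xs), foldr(f, x, tl(xs)))).
    return x if not xs else f(xs[0], foldr(f, x, xs[1:]))

def rng(a, b):
    # def a..b := if(b<a, [], cons(a, a+1..b)).
    return [] if b < a else [a] + rng(a + 1, b)

def fac(n):
    # def fac(n) := if(n=0, 1, n*fac(n-1)).
    return 1 if n == 0 else n * fac(n - 1)

add = lambda a, b: a + b
mul = lambda a, b: a * b

print(foldr(add, 0, rng(1, 10)))     # 55
print(foldr(mul, 1, rng(1, 10)))     # 3628800
print(fac(10))                       # 3628800
print([n * n for n in rng(1, 10)])   # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```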
[...]
 End.