COT5315 Foundations of Programming Languages and Software Systems

Robert van Engelen and Steven Bronson

Last update: March 29, 2012 9:46 AM

Course Web Site

Course Outline (Tentative)

Choice of additional topics

Course Materials

[SSPL] Syntax and Semantics of Programming Languages by Ken Slonneger and Barry Kurtz.
[TPL] Types and Programming Languages by Benjamin Pierce
[FP] Functional Programming by Anthony Field and Peter Harrison

Program code to download.

Metatheory Basics

Metalanguages, terms, and metavariables

A metalanguage is a meta-level language and notation to define a language. The commonly-used BNF grammar notation is a meta-level notation to define the syntax of a (programming) language. For example, consider the expression language NB (expressions over Booleans and Natural numbers):

E ::= true
    | false
    | if E then E else E
    | 0
    | succ E
    | pred E
    | iszero E

where E is a metavariable, namely the nonterminal that defines the syntactic category of NB expressions, and ::= and | are meta-operators in the metalanguage.

There are many kinds of metalanguages that can be used to define a (programming) language and we will describe some of them. First, let's distinguish concrete syntax from abstract syntax.

A concrete syntax defines the set of words of a language concretely, where words are strings (sequences) of tokens (or terminals) from a given alphabet of symbols (or signs). A grammar defines the concrete syntax of a language. Positional information may be relevant in the syntax, as well as punctuation symbols such as parentheses, commas, semicolons, and so on. A parser produces a concrete parse tree given a word that is syntactically correct.

For example, given the concrete syntax of NB expressions E defined by the BNF grammar above and the word "if iszero pred succ 0 then if true then 0 else succ 0 else 0", the resulting parse tree is:

   ____________________________E____________________________
  /       |             |              |                | \
 |      __E__           |     _________E________        |  E
 |     /     \          |    /  |    |  |   |   \       |  |
 |    |       E_        |   |   |    |  |   |    |      |  |
 |    |      /  \       |   |   |    |  |   |    |      |  |
 |    |     |    E_     |   |   E    |  E   |    E_     |  |
 |    |     |    | \    |   |   |    |  |   |    | \    |  |
 |    |     |    |  E   |   |   |    |  |   |    |  E   |  |
 |    |     |    |  |   |   |   |    |  |   |    |  |   |  |
if iszero pred succ 0 then if true then 0 else succ 0 else 0

An abstract syntax inductively defines the set of terms (or expressions) of a language by a finite set of abstract constructs over terms. A term is well-formed if it is derivable from the (abstract) syntax. A term is either an atom or a k-ary functor with k arguments that are terms. In the example abstract syntax for NB expressions E defined below, true, false, and 0 are atoms, if is a 3-ary functor, and succ, pred, and iszero are unary functors:

E ::= true
    | false
    | if(E, E, E)
    | 0
    | succ(E)
    | pred(E)
    | iszero(E)

Because terms are composed of atoms and functors over terms, terms can be viewed as data structures, commonly referred to as abstract syntax trees (ASTs). An abstract syntax tree compactly represents a term without the unnecessary syntactic details found in concrete syntax trees, such as nonterminals and parentheses for grouping expressions.

For example, the abstract syntax tree of the term if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0) is:

         ____________if___________
        /             |           \
   iszero          __if___         0
      |           /   |   \
    pred       true   0   succ
      |                    |
    succ                   0
      |
      0

Instead of the abstract syntax definition for E given as a grammar above, we can also define terms inductively using set theory as a metalanguage. The set of terms is the smallest set T such that:

  1. {true, false, 0} ⊆ T
  2. if t1 ∈ T then {succ(t1), pred(t1), iszero(t1)} ⊆ T
  3. if t1 ∈ T, t2 ∈ T, and t3 ∈ T then {if(t1, t2, t3)} ⊆ T

A concrete definition of the set T is

S0 = ∅

Si+1 = {true, false, 0} ∪ {succ(t1), pred(t1), iszero(t1) | t1 ∈ Si} ∪ {if(t1, t2, t3) | t1, t2, t3 ∈ Si}

T = ∪i Si

Again, we are defining T as a set of terms that are trees, not strings.
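The cumulative construction of the sets Si transcribes directly into Prolog. The sketch below is not part of the course code; the predicate name in_s/2 is made up for illustration. in_s(I, T) holds when term T is a member of the set Si:

```prolog
:- use_module(library(lists)).            % member/2

% in_s(I, T): term T is a member of the cumulative set S_I.
% The three clauses mirror the three parts of the union defining S_{i+1}.
in_s(I, T) :-
    I >= 1,
    member(T, [true, false, 0]).          % {true, false, 0}
in_s(I, T) :-
    I >= 2,
    I1 is I - 1,
    (   T = succ(T1),   in_s(I1, T1)      % {succ(t1)   | t1 in S_{i-1}}
    ;   T = pred(T1),   in_s(I1, T1)      % {pred(t1)   | t1 in S_{i-1}}
    ;   T = iszero(T1), in_s(I1, T1)      % {iszero(t1) | t1 in S_{i-1}}
    ).
in_s(I, if(T1, T2, T3)) :-                % {if(t1,t2,t3) | t1,t2,t3 in S_{i-1}}
    I >= 2,
    I1 is I - 1,
    in_s(I1, T1), in_s(I1, T2), in_s(I1, T3).
```

A term t is then in T when in_s(I, t) holds for some index I; any index at least as large as the depth of t suffices.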

Post systems and inference rules

We can also define the set of terms T by inference rules in the "natural deduction style" presentation of logical Post systems, where each inference rule is of the form

t1    t2    ⋅⋅⋅    tn
---------------------
          t

where the conclusion is a term t and the premises are n terms ti. When n=0 we will simply write

---
 t

which is an axiom (or a fact).

In the sequel, we will consider terms constructed over atoms, functors, and (meta)variables. A term is closed (or ground) when it contains no free variables. A term containing free variables is a theorem. A Post system metavariable in a theorem can be instantiated to any term. An instance of an inference rule is obtained by replacing each metavariable by the same term in the rule's conclusion and premises (if any).

A proof is inductively defined as a finite set of inference rule instances such that

  1. An instance of an axiom is a proof of its conclusion.
  2. If Pr1, Pr2, ..., Prn are proofs of terms t1, t2, ..., tn respectively, and

     t1    t2    ⋅⋅⋅    tn
     ---------------------
               t

     is an inference rule instance, then

     Pr1    Pr2    ⋅⋅⋅    Prn
     ------------------------
                t

     is a proof of t.

A term is provable if a proof can be constructed.

Consider the axiom

---
 0

and inference rule

    X
---------
 succ(X)

where 0 is an atom, succ is a unary functor, and X is a variable. The axiom "proves" the existence of the atom 0 as a fact. The inference rule derives new terms of the form succ(X) given term X. Thus, given the instance

---
 0

of the axiom, the term succ(0) is provable by instantiating X=0 in the inference rule to obtain

    0
---------
 succ(0)

We generally write the complete proof as a derivation tree, oriented as an inverted tree, where the concluding provable term is at the bottom and the proofs of the premises branch out to the top as follows

---------
    0
---------
 succ(0)

In this case the derivation tree is "skinny", since we have only one premise in the rules. This is not always the case as we will see later.


To "connect" inference rule instances in a proof, we apply unification. In matching the concluding term of a rule to the term of a premise, the terms should be "structurally compatible". Unification means that the terms "are made equal" by instantiating their variables accordingly.

More precisely, unification is the process of finding a smallest substitution (a most general unifier) for the variables in the two terms such that the two terms become equal.

When two terms are trees, all we need to do to unify these terms is traverse both trees in parallel and check if the nodes and leaves are identical. When a variable is encountered the variable is bound to the corresponding term in the other tree, including to other variables (which effectively become aliases).

Consider for example the two terms if(iszero(X), if(true, Y, succ(0)), 0) and if(iszero(pred(succ(U))), if(V, W, succ(W)), 0) depicted as trees:

         ____________if___________
        /             |           \
   iszero          __if___         0
      |           /   |   \
      X        true   Y   succ
                           |
                           0

         ____________if___________
        /             |           \
   iszero          __if___         0
      |           /   |   \
    pred          V   W   succ
      |                    |
    succ                   W
      |
      U

Unification yields X = pred(succ(U)), V = true, and Y = W = 0. Note that Y and W are aliases and that variable U remains uninstantiated (remember that we should keep substitutions to a minimum, which means that we should not instantiate more variables than necessary to unify both terms). Unification is an equivalence relation, and is therefore reflexive, symmetric (commutative), and transitive. That is, t = t, if t1 = t2 then t2 = t1, and if t1 = t2 and t2 = t3, then t1 = t3 (though different variable instantiations may result as a side-effect from the unifications of terms t1 = t2, t2 = t3, and t1 = t3).

Unification may create terms that are cyclic. For example, unifying succ(X) with succ(succ(X)) binds X = succ(X) thereby creating a cycle that represents the infinite term succ(succ(succ(succ(...)))). To avoid cycles, unification is applied with an occurs check. Normally, we do not assume that terms with cycles are produced in derivation trees for proofs. However, when cycles are allowed this will be explicitly stated. As we will see later, cycles can be useful in type checking.
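In SWI-Prolog both behaviors can be observed directly: the standard =/2 unifies without an occurs check (and may build a cyclic term), while the ISO built-in unify_with_occurs_check/2 rejects cyclic bindings. A small sketch (the predicate names are ours):

```prolog
% Standard unification omits the occurs check: the goal X = succ(X)
% "succeeds" by constructing a cyclic term representing the infinite
% term succ(succ(succ(...))).
cyclic(X) :- X = succ(X).

% Sound unification with the occurs check: the goal fails on X = succ(X)
% instead of building a cycle.
sound_unify(T1, T2) :- unify_with_occurs_check(T1, T2).
```

Applied to the two if terms above, sound_unify yields X = pred(succ(U)), V = true, and Y = W = 0, with U left uninstantiated, exactly as in the hand unification.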

Exercise: show that plus(succ(succ(0)), succ(0)) = succ(succ(succ(0))) is a provable term using the axiom and inference rule below, together with the axiom for 0 and the rule for succ given earlier (four rules in total):

----------------
 plus(0, Y) = Y

       plus(X, Y) = Z
----------------------------
 plus(succ(X), Y) = succ(Z)

Our previous example of NB expressions can be defined by inference rules to define the set T of terms as follows:

----------
 true ∈ T

-----------
 false ∈ T

-------
 0 ∈ T

   t1 ∈ T
--------------
 succ(t1) ∈ T

   t1 ∈ T
--------------
 pred(t1) ∈ T

    t1 ∈ T
----------------
 iszero(t1) ∈ T

 t1 ∈ T    t2 ∈ T    t3 ∈ T
----------------------------
     if(t1, t2, t3) ∈ T

This defines the abstract syntax of expressions as provable terms. Provable terms are well formed with respect to the (abstract) syntax.

Exercise: show that if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0) is a provable term.

(Meta)programming with Prolog

We can directly implement the inference rules on NB terms in Prolog by defining a predicate is_term with seven clauses, consisting of three facts and four rules:

is_term(true).
is_term(false).
is_term(0).
is_term(succ(E1)) :- is_term(E1).
is_term(pred(E1)) :- is_term(E1).
is_term(iszero(E1)) :- is_term(E1).
is_term(if(E1, E2, E3)) :- is_term(E1), is_term(E2), is_term(E3).

Note that the predicate we defined is_term(...) takes the place of the conclusion in the rule and that E1, E2, and E3 are variables. The premises, if any, appear at the right-hand side of the :-.

And indeed, we can query the Prolog system to prove that if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0) is a term, whereas if(a, 0, 0) is not:

?- [nbterms].
% nbterms compiled 0.00 sec, 1,960 bytes
true.

?- is_term(if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0)).
true.

?- is_term(if(a, 0, 0)).
false.

When a Prolog goal succeeds, Prolog reports true and the bindings of variables of the solution are shown (if any). When a Prolog goal fails, Prolog reports false (or fail), obviously without any variable bindings. Thus, fail is not an error or an exception but rather a state. In a Prolog program, failure typically occurs as an internal state in the search for solutions.

To trace the rules and show the derivation tree of the proof:

?- [nbterms_rules].
%  rule compiled 0.00 sec, 7,440 bytes
% nbterms_rules compiled 0.00 sec, 10,656 bytes
true.

?- show, is_term(if(iszero(pred(succ(0))), if(true, 0, succ(0)), 0)).
[1,1,1,1,1]:--------------------------------------------------------------------I-Zero
is_term(succ(0)) is_term(succ(0))
is_term(pred(succ(0))) is_term(pred(succ(0)))
is_term(0) is_term(0)
is_term(succ(0)) is_term(true),is_term(0),is_term(succ(0))
is_term(if(iszero(pred(succ(0))),if(true,0,succ(0)),0)) true .

Prolog is a logic programming language based on logic deduction by rule inference using backward chaining. That is, in backward chaining we start with the final goal to prove (the term in the conclusion of a rule) and try to prove it by finding a matching rule. When a rule matches by unification, we then try to prove the subgoals (the premises), and so on.

Prolog uses term unification for rule matching and backtracking over rules to prove a goal. When a rule leads to a dead end and fails, backtracking finds another rule to try, which means that variable bindings established in the dead-end must be undone.

Terms in Prolog form trees over atoms, functors, and Prolog variables. Terms are inductively defined as follows: an atom or a number is a term; a variable is a term; and if f is a functor name and t1, ..., tk are terms, then the compound f(t1, ..., tk) is a term.

In addition, the following conventions are used: variable names start with an uppercase letter or an underscore (a single underscore _ denotes the anonymous variable), while atoms and functor names start with a lowercase letter or are enclosed in single quotes.

Useful built-in Prolog predicates include =/2 (unification), ==/2 (term identity), is/2 (arithmetic evaluation), =</2 and </2 (arithmetic comparison), var/1 and nonvar/1 (variable tests), and functor/3, arg/3, and =../2 (term inspection and construction).

Common list predicates in Prolog include member/2, append/3, length/2, reverse/2, and nth0/3.

Most Prolog predicates are relational. That is, input and output are often (but not always) reversible.

Prolog predicates cannot be nested as terms as if they were functions. That is, the programming style is not a functional style, but somewhat comparable to imperative sequencing of statements, where typically the next predicate takes the result of the previous. When a predicate fails, Prolog backtracks to retry previous predicates, and so on. This makes it easy to implement generate-and-test solutions to problems.
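A classic illustration of generate-and-test is a (deliberately naive) sort: permutation/2 generates permutations of the input list and sorted/1 tests them, with backtracking supplying the next candidate after each failed test. This sketch is ours, not part of the course code:

```prolog
:- use_module(library(lists)).   % permutation/2

% naive_sort(Xs, Ys): Ys is a sorted permutation of Xs.
% permutation/2 is the generator, sorted/1 is the tester;
% backtracking into permutation/2 drives the search.
naive_sort(Xs, Ys) :- permutation(Xs, Ys), sorted(Ys).

% sorted(Xs): the elements of Xs are in non-decreasing order.
sorted([]).
sorted([_]).
sorted([X, Y | Zs]) :- X =< Y, sorted([Y | Zs]).
```

The program is hopelessly inefficient (factorially many candidates), but it cleanly separates the logic of "what a sorted list is" from the search that finds one.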

For example:

?- member(b, [a,b,c]).

?- member(X, [a,b,c]), b = X.
X = b .

?- member(X, [1,2,3,4,5]), X > 3.
X = 4 .

?- member(b, [a|Xs]).
Xs = [b|_G310] .

?- append([a,b], [c,d], Zs).
Zs = [a, b, c, d].

?- append(Xs, [c,d], [a,b,c,d]).
Xs = [a, b].

?- append(Xs, Ys, [a,b,c,d]), member(c, Xs).
Xs = [a, b, c],
Ys = [d].

?- setof((Xs,Ys), append(Xs, Ys, [a,b,c,d]), Pairs).
Pairs = [ ([], [a, b, c, d]), ([a], [b, c, d]), ([a, b], [c, d]), ([a, b, c], [d]), ([a, b, c|...], [])].

Writing a program in Prolog amounts to defining a set of rules (Prolog clauses) for predicates, which are entered in a Prolog file with extension .pl. Multiple files are loaded from the Prolog command line with (filename does not require the .pl extension):

?- [filename, filename, ...].

Definitions of inference rules (to be defined in files) are of the form:

head :- body.

where the head is a predicate (an atom or functor) and body is a conjunction of n subgoals:

head :- goal1, goal2, ..., goaln.

If the body is simply true, we can omit the :- body part and state this as a fact:

head.
Predicates can be atoms, but that is not so useful so we usually use functors for predicates. Predicates define properties of terms and relations between terms:

father(john, mary).
mother(beth, mary).
valuable(gold).
valuable(painting).
interesting(painting).
gives(Parent, Object, Child) :- father(Parent, Child), valuable(Object), interesting(Object).
gives(Parent, Object, Child) :- mother(Parent, Child), valuable(Object), interesting(Object).

Suppose we need to determine what present mary receives from one of her parents:

?- [mary].
% mary compiled 0.00 sec, 2,888 bytes

?- gives(P, X, mary).
P = john,
X = painting .

When we trace the inference steps of the goal gives(P, X, mary) with trace/0, we see that backtracking over subgoals occurs (the _G### denote internal variables, i.e. new variables created by rule instantiations):

?- trace, gives(P, X, mary).
Call: (7) gives(_G231, _G232, mary) ? creep
Call: (8) father(_G231, mary) ? creep
Exit: (8) father(john, mary) ? creep
Call: (8) valuable(_G232) ? creep
Exit: (8) valuable(gold) ? creep
Call: (8) interesting(gold) ? creep
Fail: (8) interesting(gold) ? creep
Redo: (8) valuable(_G232) ? creep
Exit: (8) valuable(painting) ? creep
Call: (8) interesting(painting) ? creep
Exit: (8) interesting(painting) ? creep
Exit: (7) gives(john, painting, mary) ? creep
P = john,
X = painting .

Tracers and debuggers are implemented in Prolog, as meta-level programs, for controlling and reasoning about logic programs.

To illustrate the use of the ! "cut", consider changing the second to last rule of the example:

gives(Parent, Object, Child) :- father(Parent, Child), !, valuable(Object), interesting(Object).    

This cuts backtracking after father/2 succeeds: the search for alternative solutions of the subgoals to the left of the ! is pruned (backtracking over father/2 is cut in this case), and so is backtracking over the remaining clauses of the current predicate (gives/3 in this case). Any deeper backtrack points, if any, are not affected!

The "cut" is an extra- (or meta-)logical predicate, because it controls the logical inference process (beyond control predicates such as cut and negation \+, other extra-logical predicates are term inspection predicates such as var/1 that depart from the pure first-order Horn-clause logic programming paradigm). The reason to use "cut" is either for performance optimization or to limit solutions. A white cut is placed to prevent further matching of a predicate's clauses that would lead to nonmatches or to failure anyway. A green cut is placed to limit backtracking in order to discard solutions that are not needed. A red cut is a cut that is incorrectly placed and causes the program to fail to produce solutions.

Exercise: is the cut in the example above a white or a green cut?

Prolog is self-defining and allows terms to be executed as goals using call(T) for any term T that is not a variable.

Call combined with cut and fail can be useful to implement meta-logical predicates:

if(G1, G2, G3) :- call(G1), !, call(G2).
if(G1, G2, G3) :- call(G3).
not(G) :- call(G), !, fail.
and(G1, G2) :- call(G1), call(G2).
or(G1, G2) :- call(G1).
or(G1, G2) :- call(G2).

Note: the if has a built-in Prolog equivalent written as (G1 -> G2; G3), not has a built-in prefix operator "\+", and has a built-in "," (comma), and or has a built-in ";" (semicolon).

Exercise: what variables are instantiated when we query if(1=X, and(Z=Y, Z=2), Y=3)? What about not(and(X=1, X>2))?

Induction on terms provides a mechanism to determine various properties of terms. For our NB expression language, we can inductively define the set of constants appearing in a term:

consts(true, Cs) :- Cs = [true].
consts(false, Cs) :- Cs = [false].
consts(0, Cs) :- Cs = [0].
consts(succ(E1), Cs) :- consts(E1, Cs).
consts(pred(E1), Cs) :- consts(E1, Cs).
consts(iszero(E1), Cs) :- consts(E1, Cs).
consts(if(E1, E2, E3), Cs) :- consts(E1, Cs1), consts(E2, Cs2), consts(E3, Cs3), union(Cs1, Cs2, Cs12), union(Cs12, Cs3, Cs).

to inductively define the size of a term:

size(true, N) :- N = 1.
size(false, N) :- N = 1.
size(0, N) :- N = 1.
size(succ(E1), N) :- size(E1, K), N is K+1.
size(pred(E1), N) :- size(E1, K), N is K+1.
size(iszero(E1), N) :- size(E1, K), N is K+1.
size(if(E1, E2, E3), N) :- size(E1, K), size(E2, L), size(E3, M), N is K+L+M+1.

and to inductively define the depth of a term:

depth(true, N) :- N = 1.
depth(false, N) :- N = 1.
depth(0, N) :- N = 1.
depth(succ(E1), N) :- depth(E1, K), N is K+1.
depth(pred(E1), N) :- depth(E1, K), N is K+1.
depth(iszero(E1), N) :- depth(E1, K), N is K+1.
depth(if(E1, E2, E3), N) :- depth(E1, K), depth(E2, L), depth(E3, M), N is max(max(K, L), M)+1.

For example:

?- [draw,nbterms_induc].
?- draw(if(iszero(pred(succ(0))),if(true,0,succ(0)),0)).
 /       |       \ 
isz     _if___    0 
 |     /   |  \ 
pre true   0  suc 
 |             | 
suc            0 
?- consts(if(iszero(pred(succ(0))),if(true,0,succ(0)),0), Cs).
Cs = [true, 0].
?- size(if(iszero(pred(succ(0))),if(true,0,succ(0)),0), N).
N = 11.
?- depth(if(iszero(pred(succ(0))),if(true,0,succ(0)),0), N).
N = 5.  

There are three principles of induction on terms.

Induction on depth:

If, for each term s,

given P(r) for all r such that depth(r) < depth(s) we can show P(s),

then P(s) holds for all s.

Induction on size:

If, for each term s,

given P(r) for all r such that size(r) < size(s) we can show P(s),

then P(s) holds for all s.

Structural induction:

If, for each term s,

given P(r) for all immediate subterms r of s we can show P(s),

then P(s) holds for all s.

Exercise: prove that |consts(t)| ≤ size(t) for any NB expression t by induction on the depth of t. Answer: see TPL p.30.

Denotational semantics (introduction)

Denotational semantics takes an abstract view of the meaning of a program by formalizing the semantics of a programming construct as a mathematical object. Semantic functions map a program's syntactic programming constructs to denotations, where the mappings are defined by a set of semantic equations. Denotations are mathematical objects from a semantic (value) domain. The mathematical object produced for a program is a function object. This function object maps the program's inputs to its outputs. The semantic domains of the inputs and outputs of the function object may be the same as that of the program. However, we can map the program's values to new value domains. The function object itself is expressed in a well-defined language of a calculus or logic.

There are five components in a denotational semantics definition of a given language L:

  1. The abstract syntactic categories (or syntactic domains) of L, which are the nonterminals of L if the syntax is concrete.
  2. The (abstract) syntax of L in (abstract) production rules.
  3. The semantic value domains for the semantic functions.
  4. The signature of the semantic function for each syntactic category of L.
  5. The semantic equations, such that each semantic function maps the syntactic constructs in a category to a denotation (a mathematical object) in the semantic value domain.

Consider the abstract syntax of NB with one syntactic domain E of expressions with seven abstract production rules:

E ::= true
    | false
    | if(E, E, E)
    | 0
    | succ(E)
    | pred(E)
    | iszero(E)

We assume that the values computed by NB expressions are Booleans and the natural numbers. This is formalized by defining the semantic value domain NB:

NB = {t, f} ∪ ℕ = {t, f, 0, 1, ...}

The signature of the semantic function D that maps constructs from the syntactic domain E to denotations is:

D⟦·⟧ : E → NB
The semantic equations are:

D⟦true⟧ = t
D⟦false⟧ = f
D⟦if(E1, E2, E3)⟧ = D⟦E2⟧   if D⟦E1⟧ = t
D⟦if(E1, E2, E3)⟧ = D⟦E3⟧   if D⟦E1⟧ = f
D⟦0⟧ = 0
D⟦succ(E)⟧ = D⟦E⟧ + 1
D⟦pred(E)⟧ = max(0, D⟦E⟧ − 1)
D⟦iszero(E)⟧ = t   if D⟦E⟧ = 0
D⟦iszero(E)⟧ = f   if D⟦E⟧ > 0

Emphatic brackets ⟦ ⟧ are used to separate the syntactic world (terms in the syntactic domain) from the semantic world (denotations in the semantic domain).

The powerful principle of compositionality can be exploited with denotational semantics because of the inductive structure of the semantic equations. As a consequence, semantic functions are homomorphisms, which means they respect operations. The function H is a homomorphism if H(f(x)) = g(H(x)).

   x    ---H--->    H(x)
   |                 |
   f                 g
   ↓                 ↓
  f(x)  ---H--->  H(f(x)) = g(H(x))

Clearly, function D is a homomorphism. An operation f in the syntactic domain has an equivalent operation g in the semantic domain.

Instead of mathematical objects, we can also use a higher-level programming language to define semantic functions and select certain types of values in the programming language for the semantic value domains.

For example, we can construct a Prolog program for the NB expression language and choose for the semantic value domains the set of Prolog atoms t, f, 0, 1, 2, ...:

d(true,           t).
d(false,          f).
d(0,              0).
d(succ(T),        V) :- d(T, VT), integer(VT), succ(VT, V).
d(pred(T),        V) :- d(T, VT), (VT = 0 -> V = 0; integer(VT), succ(V, VT)).
d(iszero(T),      V) :- d(T, VT), (VT = 0 -> V = t; V = f).
d(if(T1, T2, T3), V) :- d(T1, VT1), (VT1 = t -> d(T2, V); VT1 = f -> d(T3, V)).

Prolog predicates are relational (to a limited extent), so the input and output roles of a predicate's arguments can be reversed, viz. succ(VT, V) and succ(V, VT) used in the clauses of predicate d/2 (predicates are often referred to by name/arity). Hence, we can apply the semantic function to an NB expression to compute its value, and vice versa:

?- [nbterms_denot].
?- d(succ(pred(succ(succ(0)))), V).
V = 2.

?- d(if(iszero(0),if(false,0,succ(0)),0), V).
V = 1.

?- d(T, t).
T = true .

?- d(T, 3).
T = succ(succ(succ(0))) .

However, Prolog is not purely relational and the use of control- and meta-logical predicates (such as "cut", \+, var/1) and other non-relational predicates in clauses often prevents predicates from being "reversible". Another problem is non-termination of inference by backward chaining. The termination property is very sensitive to clause orderings. For example, reversing the two clause definitions:

d(succ(T), V) :- d(T, VT), integer(VT), succ(VT, V).
d(0,       0).

leads to non-termination of the goal d(T, 3), whereas d(T, V) for any given ground term T still terminates. This makes it generally difficult to design and implement a reversible predicate.
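The same clause-order sensitivity can be seen on a smaller scale with a hypothetical predicate nat/1 (ours, not part of the course code) that defines the natural numbers in succ notation:

```prolog
% With the fact first, nat/1 both tests ground terms and fairly
% enumerates solutions: nat(X) yields X = 0, succ(0), succ(succ(0)), ...
nat(0).
nat(succ(N)) :- nat(N).

% Swapping the two clauses leaves the logical meaning unchanged, but the
% goal nat(X) with X unbound then recurses through nat(succ(N)) forever
% before ever reaching the fact, so backward chaining produces no answer.
```

Declaratively the two orderings are equivalent; operationally only the first terminates on open queries.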

More on denotational semantics later. See also Syntax and Semantics of Programming Languages Chapter 9.

Axiomatic semantics (definition)

Axiomatic semantics derives laws from the definitions of imperative programming language constructs. These laws define the meaning of the program by means of describing the initial and final state of a computation, and can be used to verify the correctness of a program or algorithm.

Axiomatic semantics was covered in COP4020.

Operational semantics

Operational semantics defines the meaning of a language by the operations of an abstract machine. The machine operates on the abstract syntax tree of terms of the language by applying transition functions to terms. Hence, the state of the machine is just the term it is operating on. The operational semantic meaning of a term t of the language we define is the final state (term) that is reached when the machine halts after starting in the initial state (the term t).

The one-step evaluation relation on terms, written t → s and meaning "t evaluates to s in one step", is a one-step transformation of term t into s. The one-step evaluation relation represents a transition function from a term (state) t to another term (next state) s.

Operational semantics of untyped arithmetic expressions

Consider the abstract syntax of the language NB of expressions:

E ::= true
    | false
    | if(E, E, E)
    | 0
    | succ(E)
    | pred(E)
    | iszero(E)

where we want NB expressions to compute values v over Booleans true and false and natural numbers nv, expressed by the following abstract syntax of value terms of NB:

v  ::= true | false | nv
nv ::= 0 | succ(nv)
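The abstract syntax of values transcribes directly into Prolog; a sketch (the predicate names nv/1 and value/1 are ours):

```prolog
% nv(T): T is a numeric value nv of NB.
nv(0).
nv(succ(NV)) :- nv(NV).

% value(T): T is a value v of NB, i.e. true, false, or a numeric value.
value(true).
value(false).
value(NV) :- nv(NV).
```

Note that value/1 rejects ill-formed terms such as succ(true) as well as unevaluated expressions such as iszero(0).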

The operational semantics one-step evaluation rules for NB are:

if(true, T2, T3) → T2                      (E-IfTrue)

if(false, T2, T3) → T3                     (E-IfFalse)

            T1 → T1'
--------------------------------           (E-If)
if(T1, T2, T3) → if(T1', T2, T3)

      T1 → T1'
--------------------                       (E-Succ)
succ(T1) → succ(T1')

pred(0) → 0                                (E-PredZero)

pred(succ(NV1)) → NV1                      (E-PredSucc)

      T1 → T1'
--------------------                       (E-Pred)
pred(T1) → pred(T1')

iszero(0) → true                           (E-IsZeroZero)

iszero(succ(NV1)) → false                  (E-IsZeroSucc)

       T1 → T1'
------------------------                   (E-IsZero)
iszero(T1) → iszero(T1')

Axioms (E-IfTrue, E-IfFalse, E-PredZero, E-PredSucc, E-IsZeroZero, E-IsZeroSucc) are computation rules. Rules with premises (E-If, E-Succ, E-Pred, E-IsZero) are congruence rules and define an evaluation strategy expressing which parts of the term to evaluate. For example, the condition of the if must be evaluated first.

A rule is satisfied by a relation if for each instance of the rule, either the conclusion is in the relation or one of the premises is not. Basically, the evaluation relation → that we want should be the smallest binary relation on terms satisfying the rules shown above. That is, the relation should only include pairs (t, s) of terms t and s that are derivable. More formally, when the pair (t, s) is in the evaluation relation, (t, s) ∈ →, we say that "the evaluation statement (or judgment) t → s is derivable."

Basically, we are stating that all pairs (t, s) from the provable terms t → s are in the evaluation relation, (t, s) ∈ →, no more, no less. Hence, we can consider the inference rules to define the smallest evaluation relation satisfying the rules.

A term t is in normal form (t is a normal form, or t is a canonical form) if no evaluation rule applies to it. That is, there is no term s such that t → s.

We will state some useful properties of NB.

THEOREM: Every NB value is in normal form.

Proof: The values true, false, and 0 are normal forms, because they do not appear on the left of the evaluation relation in any rule. Values succ(t) with t in normal form are normal forms, because t is in normal form and the premise in (E-Succ) is not provable. ∎

Note that we only consider well-formed terms. That is, terms defined by the (abstract) syntax NB for expressions and values. With this assumption we can state the following.

THEOREM [Completeness of NB]: If t is a well-formed NB term in normal form, then t is an NB value.

Proof: By structural induction on t. ∎

We enforce well-formedness to prevent admitting terms such as succ(true) that are normal forms but meaningless values.

In general, we may encounter terms that are stuck in normal form but are not a value. That is, the operational semantics has reached a "meaningless state" comparable to the notion of a run-time error. In a concrete implementation of the language these states might correspond to failures of various kinds: segmentation faults, exceptions, etc. For example, evaluating succ(n) might fail when n is the maximum machine representation of a number.

A common approach to formalize the notion of meaningless states in an abstract machine is to introduce a special term ⊥ called bottom. We can augment NB with the value ⊥ and additional evaluation rules:

succ(true) → ⊥
succ(false) → ⊥
pred(true) → ⊥
pred(false) → ⊥

and the following bottom-preserving evaluation rules

if(⊥, T2, T3) → ⊥
succ(⊥) → ⊥
pred(⊥) → ⊥
iszero(⊥) → ⊥

A bottom-preserving function (or operation) is one that produces ⊥ when one of its operands evaluates to ⊥. This "propagates" the error as a result.
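The augmented rules transcribe directly into clauses in the style of the one-step evaluation relation used later in these notes. A sketch, using the atom bottom to stand for ⊥ (both the operator declaration and the atom name are our choices):

```prolog
:- op(950, xfx, :>).            % one-step evaluation relation

% Error-introducing rules: meaningless redexes step to bottom.
succ(true)   :> bottom.
succ(false)  :> bottom.
pred(true)   :> bottom.
pred(false)  :> bottom.

% Bottom-preserving rules: bottom propagates outward through every
% construct, so an error in a subterm becomes the result of the whole term.
if(bottom, _T2, _T3) :> bottom.
succ(bottom)         :> bottom.
pred(bottom)         :> bottom.
iszero(bottom)       :> bottom.
```

These clauses would be added alongside the ordinary evaluation rules; the bottom-preserving clauses play the same role for errors that the congruence rules play for ordinary evaluation.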

Another important property of NB is that we have no more than one choice of an evaluation rule for a given term.

THEOREM [Determinacy of the one-step evaluation of NB]: If t → s and t → r, then s = r in NB.

Proof: By structural induction. Base case: the property holds for all computation rules (axioms) of NB. For the congruence rules starting with (E-If), we note that the conclusions of (E-If) and (E-IfTrue, E-IfFalse) all match if(T1, T2, T3). However, when T1=true or T1=false then the (E-If) premise is not derivable (since T1 is a value) and only (E-IfTrue) or (E-IfFalse) is applicable. By the induction hypothesis, if T1 → T1' is deterministic then so is if(T1, T2, T3) → if(T1', T2, T3). The same conclusions can be made for the other congruence rules by noting that values are in normal form. ∎

In general, we are interested in evaluating terms t to a value u (a normal form) through multiple steps t → s → ... → u using the multi-step evaluation relation t →* s defined as the reflexive, transitive closure of the one-step evaluation, where →* satisfies

t →* s if t → s
t →* r if t →* s and s →* r
t →* t

THEOREM [Consistency of NB]: Normal forms are unique, that is, if t →* u and t →* u' for normal forms u and u', then u = u'.

Proof: Immediately follows from the determinacy of the one-step evaluation. ∎

THEOREM [Termination of evaluation in NB]: For every term t there is some normal form u such that t →* u.

A Prolog definition of the one-step and multi-step evaluation relations is straightforward:

% Meta ops
:- op(950, xfx, :>).	% One-step evaluation relation
:- op(950, xfx, *>).	% Multi-step reflexive transitive closure of :>

T *> R :- T :> S, !, S *> R.
C *> C.
%_____________CONCLUSION_____________ :- ________RULE________, _PREMISES.
if(true, T2, _T3) :> T2 :- rule('E-IfTrue').
if(false, _T2, T3) :> T3 :- rule('E-IfFalse').
if(T1, T2, T3) :> if(T1p, T2, T3) :- rule('E-If'), T1 :> T1p.
succ(T1) :> succ(T1p) :- rule('E-Succ'), T1 :> T1p.
pred(0) :> 0 :- rule('E-PredZero').
pred(succ(NV1)) :> NV1 :- rule('E-PredSucc').
pred(T1) :> pred(T1p) :- rule('E-Pred'), T1 :> T1p.
iszero(0) :> true :- rule('E-IsZeroZero').
iszero(succ(_NV)) :> false :- rule('E-IsZeroSucc').
iszero(T1) :> iszero(T1p) :- rule('E-IsZero'), T1 :> T1p.

The rule/1 predicates are used for meta-level tracing, which is demonstrated as follows:

?- [nbterms_eval].
?- show, if(iszero(0),if(false,0,succ(0)),0) *> C.
  /        |        \  
isz      _if____     0 
 |      /    |  \      
 0   false   0  suc    



   /         |        \  
true       _if____     0 
          /    |  \      
       false   0  suc    


    /    |  \  
 false   0  suc


C = succ(0).

We can also define the natural semantics (big-step semantics, as opposed to the small-step style) for NB:

T1 ⇓ true    T2 ⇓ V2
---------------------                      (B-IfTrue)
 if(T1, T2, T3) ⇓ V2

T1 ⇓ false    T3 ⇓ V3
---------------------                      (B-IfFalse)
 if(T1, T2, T3) ⇓ V3

      T1 ⇓ NV1
--------------------                       (B-Succ)
succ(T1) ⇓ succ(NV1)

   T1 ⇓ 0
------------                               (B-PredZero)
pred(T1) ⇓ 0

T1 ⇓ succ(NV1)
---------------                            (B-PredSucc)
pred(T1) ⇓ NV1

      T1 ⇓ 0
-----------------                          (B-IsZeroZero)
iszero(T1) ⇓ true

  T1 ⇓ succ(NV1)
------------------                         (B-IsZeroSucc)
iszero(T1) ⇓ false

------                                     (B-Value)
V ⇓ V

The big-step evaluation relation t ⇓ u as defined by the rules above fully evaluates a term t to a value u.

The big-step rule (B-Value) implicitly assumes that V is in normal form to "evaluate" it to V. That is, the choice of the metavariable V, which ranges only over values, helps control the order of evaluation: since the rules are unordered, this ensures that (B-Value) only applies when V is a value.

In an implementation of the rules we can select any rule ordering, but we want the (B-Value) rule to be matched last so V is in normal form because the evaluation rules prior to (B-Value) apply to terms that represent expressions that are evaluable.

The Prolog definition of the big-step evaluation rules is straightforward:

% Meta ops
:- op(950, xfx, =>).	% Big step evaluation relation
%________CONCLUSION_________ :- ________RULE________, ______PREMISES______.
if(T1, T2, _T3) => V :- rule('B-IfTrue'), T1 => true, T2 => V.
if(T1, _T2, T3) => V :- rule('B-IfFalse'), T1 => false, T3 => V.
succ(T1) => succ(NV) :- rule('B-Succ'), T1 => NV.
pred(T1) => 0 :- rule('B-PredZero'), T1 => 0.
pred(T1) => NV :- rule('B-PredSucc'), T1 => succ(NV).
iszero(T1) => true :- rule('B-IsZeroZero'), T1 => 0.
iszero(T1) => false :- rule('B-IsZeroSucc'), T1 => succ(_NV).
V => V :- rule('B-Value').

Clearly the (B-Value) rule must be defined last, unless we add a premise that verifies that V is a value. Note that the ordering of premises is relevant for the efficiency of the implementation; here, the condition of the if is evaluated first. For this language the order of the premises does not matter for correctness, because there is no state that produces errors and every evaluation terminates (there are no loops and no recursion). However, a weak aspect of big-step operational semantics is that the order of the premises becomes relevant when defining evaluation rules for a language with potentially non-terminating term evaluations, e.g. if(false, nonterminating_calculation, 0).

Given these definitions, the proof that if(iszero(0),if(false,0,succ(0)),0) ⇒ succ(0) is automatically derived as follows:

?- [nbterms_eval].
?- show, if(iszero(0),if(false,0,succ(0)),0) => C.

(derivation trace output elided)

C = succ(0).

Natural semantics (big-step operational semantics) closely resembles denotational semantics and can be viewed as a notational variant of it. By contrast, small-step semantics transforms a term step by step, where the intermediate terms represent the (machine) states of the evaluation. Natural semantics and denotational semantics do not transform the term being evaluated; rather, the (machine) state is explicitly described by a denotation or by a state value.

Operational semantics of Wren

See Figure 8.4 p.241 from [SSPL Ch.8.4] for abstract syntax of Wren.

See Figure 8.6 p.247 from [SSPL Ch.8.5] for inference rules for the one-step evaluation relation → on Wren expressions.

THEOREM [Completeness of Wren expressions]: SSPL p.250 states that normal forms are values (numerals and Booleans) in Wren if the store contains bindings for all variables used in an expression and if the expression does not contain the division operation.

THEOREM [Consistency of Wren expressions]: SSPL p.251 states that normal forms are unique.

To implement the one-step evaluation relation for Wren, let's first simplify the abstract syntax a little by combining integer/Boolean expressions into one syntactic category Exp and using more familiar n-ary functor representations for Wren commands as terms:

Exp ::= true | false | NUM | Exp iop Exp | Exp rop Exp | Exp bop Exp | not(Exp)
iop ::= + | - | * | /
rop ::= < | =< | = | >= | > | <>
bop ::= and | or
Cmd ::= skip | ID := Exp | if(Exp,Cmd) | if(Exp,Cmd,Cmd) | Cmd;Cmd | while(Exp,Cmd) | read(ID) | write(Exp)

The one-step evaluation relation for Wren expressions can be implemented as follows (similar rules are elided [...] for clarity):

%______________CONCLUSION______________ :- ___RULE___, ________PREMISES________.
(IE1 +   IE2,STO) :> (IE1p +   IE2,STO) :- rule('E1'),  (IE1,STO) :> (IE1p,STO).
(IE1 <   IE2,STO) :> (IE1p <   IE2,STO) :- rule('E2'),  (IE1,STO) :> (IE1p,STO).
(BE1 and BE2,STO) :> (BE1p and BE2,STO) :- rule('E3'),  (BE1,STO) :> (BE1p,STO).
(IE1 +   IE2,STO) :> (IE1 +   IE2p,STO) :- rule('E4'),  (IE2,STO) :> (IE2p,STO).
(IE1 <   IE2,STO) :> (IE1 <   IE2p,STO) :- rule('E5'),  (IE2,STO) :> (IE2p,STO).
(BE1 and BE2,STO) :> (BE1 and BE2p,STO) :- rule('E6'),  (BE2,STO) :> (BE2p,STO).
(N1 +   N2,STO)   :> (N,STO)            :- rule('E7'),  N is N1+N2.
(N1 <   N2,STO)   :> (B,STO)            :- rule('E8'),  (N1 <  N2 -> B = true; B = false).
(B1 and B2,STO)   :> (B,STO)            :- rule('E9'),  (B1 = true,  B2 = true  -> B = true;  B = false).
(not(BE),STO)     :> (not(BEp),STO)     :- rule('E10'), (BE,STO) :> (BEp,STO).
(not(true),STO)   :> (false,STO)        :- rule('E11').
(ID,STO)          :> (V,STO)            :- rule('E12'), atom(ID), member(ID=V, STO).

Note that the store STO is a list of name=value bindings that is propagated along and needed when an identifier has to be looked up with member/2.

Example evaluation:

?- Exp = x+y+6, STO = [x=17,y=25], show, (Exp,STO) *> (Val,STO).
      /             \          
     +__         ____.__       
    /   \       /       \      
   +     6     =         .__   
  / \         / \       /   \  
 x   y       x  17     =    [] 
                      / \      
                     y  25     
        (x,[x=17,y=25]):> (17,[x=17,y=25])

      (x,[x=17,y=25]):> (17,[x=17,y=25])
      (x+y,[x=17,y=25]):> (17+y,[x=17,y=25])

    (x+y,[x=17,y=25]):> (17+y,[x=17,y=25])
    (x+y+6,[x=17,y=25]):> (17+y+6,[x=17,y=25])

      /             \          
     +__         ____.__       
    /   \       /       \      
   +     6     =         .__   
  / \         / \       /   \  
17   y       x  17     =    [] 
                      / \      
                     y  25     
        (y,[x=17,y=25]):> (25,[x=17,y=25])

      (y,[x=17,y=25]):> (25,[x=17,y=25])
      (17+y,[x=17,y=25]):> (17+25,[x=17,y=25])

    (17+y,[x=17,y=25]):> (17+25,[x=17,y=25])
    (17+y+6,[x=17,y=25]):> (17+25+6,[x=17,y=25])

      /             \          
     +__         ____.__       
    /   \       /       \      
   +     6     =         .__   
  / \         / \       /   \  
17  25       x  17     =    [] 
                      / \      
                     y  25     
      42 is 17+25
      (17+25,[x=17,y=25]):> (42,[x=17,y=25])

    (17+25,[x=17,y=25]):> (42,[x=17,y=25])
    (17+25+6,[x=17,y=25]):> (42+6,[x=17,y=25])

    /           \          
   +         ____.__       
  / \       /       \      
42   6     =         .__   
          / \       /   \  
         x  17     =    [] 
                  / \      
                 y  25     
    48 is 42+6
    (42+6,[x=17,y=25]):> (48,[x=17,y=25])

  /         \          
48       ____.__       
        /       \      
       =         .__   
      / \       /   \  
     x  17     =    [] 
              / \      
             y  25     
Exp = x+y+6,
STO = [x=17, y=25],
Val = 48.

See Figure 8.8 p.254 from [SSPL Ch.8.6] for inference rules for the one-step evaluation relation → on Wren commands.

Commands are state transformers. That is, the execution of the commands of a program proceeds by a sequence <c0,st(in0,out0,sto0)> → <c1,st(in1,out1,sto1)> → <c2,st(in2,out2,sto2)> → ... where in is a queue of input values the program reads as input, out is a queue of output values the program writes, and sto is a list of name=value bindings.

Given this machine state model, we can say that two programs c1 and c2 are semantically equivalent if they produce the same final state sf for any input state s, or both fail to terminate on s. That is, <c1,s> → <skip,sf> iff <c2,s> → <skip,sf>, and <c1,s> → ∞ iff <c2,s> → ∞.

The one-step evaluation relation for Wren commands can be implemented as follows:

%_________________________CONCLUSION__________________________ :- ___RULE___, ______PREMISES_____.
(ID := E,st(IN,OUT,STO))      :> (ID := Ep,st(IN,OUT,STO))     :- rule('C1'), (E,STO) :> (Ep,STO).

(ID := V,st(IN,OUT,STO))      :> (skip,st(IN,OUT,STOp))        :- rule('C2'), STOp = [ID=V|STO].

(if(E,C1,C2),st(IN,OUT,STO))  :> (if(Ep,C1,C2),st(IN,OUT,STO)) :- rule('C3'), (E,STO) :> (Ep,STO).

(if(true,C1,_C2),STATE)       :> (C1,STATE)                    :- rule('C4').

(if(false,_C1,C2),STATE)      :> (C2,STATE)                    :- rule('C5').

(if(E,C),STATE)               :> (if(E,C,skip),STATE)          :- rule('C6').

(while(E,C),STATE)            :> (if(E,(C;while(E,C))),STATE)  :- rule('C7').

((C1;C2),STATE)               :> ((C1p;C2),STATEp)             :- rule('C8'), (C1,STATE) :> (C1p,STATEp).

((skip;C),STATE)              :> (C,STATE)                     :- rule('C9').

(read(ID),st([V|IN],OUT,STO)) :> (skip,st(IN,OUT,STOp))        :- rule('C10'), STOp = [ID=V|STO].

(write(E),st(IN,OUT,STO))     :> (write(Ep),st(IN,OUT,STO))    :- rule('C11'), (E,STO) :> (Ep,STO).

(write(V),st(IN,OUT,STO))     :> (skip,st(IN,OUTp,STO))        :- rule('C12'), append(OUT, [V], OUTp).

We use functor st/3 to hold the state consisting of a list of input values, output values, and a store. The store is a list that is populated by the assignment and read commands. These commands add a new binding name=value to the front of a new store STOp = [ID=V | STO]. Output for the write command is appended to the output list.

Example execution:

?- Cmd = (read(x); x := x+1; write(x)), STATE = st([5],[],[]), show, (Cmd,STATE) *> FINALSTATE.
          /               \        
   ______;               _st____   
  /       \             /    |  \  
rea        ;____       .    []  [] 
 |        /     \     / \          
 x     _:=      wri  5  []         
      /   \      |                 
     x     +     x                 
          / \                      
         x   1                     
      (read(x),st([5],[],[])):> (skip,st([],[],[x=5]))

    (read(x),st([5],[],[])):> (skip,st([],[],[x=5]))
    ((read(x);x:=x+1;write(x)),st([5],[],[])):> ((skip;x:=x+1;write(x)),st([],[],[x=5]))

           /                  \          
    ______;_             _____st__       
   /        \           /  |      \      
skip         ;____    []  []       .__   
            /     \               /   \  
         _:=      wri            =    [] 
        /   \      |            / \      
       x     +     x           x   5     
            / \                          
           x   1                         

    ((skip;x:=x+1;write(x)),st([],[],[x=5])):> ((x:=x+1;write(x)),st([],[],[x=5]))

        /               \          
       ;____       _____st__       
      /     \     /  |      \      
   _:=      wri []  []       .__   
  /   \      |              /   \  
 x     +     x             =    [] 
      / \                 / \      
     x   1               x   5     
          (x,[x=5]):> (5,[x=5])

        (x,[x=5]):> (5,[x=5])
        (x+1,[x=5]):> (5+1,[x=5])

      (x+1,[x=5]):> (5+1,[x=5])
      (x:=x+1,st([],[],[x=5])):> (x:=5+1,st([],[],[x=5]))

    (x:=x+1,st([],[],[x=5])):> (x:=5+1,st([],[],[x=5]))
    ((x:=x+1;write(x)),st([],[],[x=5])):> ((x:=5+1;write(x)),st([],[],[x=5]))

        /               \          
       ;____       _____st__       
      /     \     /  |      \      
   _:=      wri []  []       .__   
  /   \      |              /   \  
 x     +     x             =    [] 
      / \                 / \      
     5   1               x   5     
        6 is 5+1
        (5+1,[x=5]):> (6,[x=5])

      (5+1,[x=5]):> (6,[x=5])
      (x:=5+1,st([],[],[x=5])):> (x:=6,st([],[],[x=5]))

    (x:=5+1,st([],[],[x=5])):> (x:=6,st([],[],[x=5]))
    ((x:=5+1;write(x)),st([],[],[x=5])):> ((x:=6;write(x)),st([],[],[x=5]))

      /             \          
     ;__       _____st__       
    /   \     /  |      \      
  :=    wri []  []       .__   
  / \    |              /   \  
 x   6   x             =    [] 
                      / \      
                     x   5     
      (x:=6,st([],[],[x=5])):> (skip,st([],[],[x=6,x=5]))

    (x:=6,st([],[],[x=5])):> (skip,st([],[],[x=6,x=5]))
    ((x:=6;write(x)),st([],[],[x=5])):> ((skip;write(x)),st([],[],[x=6,x=5]))

     /                \              
    ;_       _________st__           
   /  \     /  |          \          
skip  wri []  []       ____.__       
       |              /       \      
       x             =         .__   
                    / \       /   \  
                   x   6     =    [] 
                            / \      
                           x   5     

    ((skip;write(x)),st([],[],[x=6,x=5])):> (write(x),st([],[],[x=6,x=5]))

  /             \              
wri    _________st__           
 |    /  |          \          
 x  []  []       ____.__       
                /       \      
               =         .__   
              / \       /   \  
             x   6     =    [] 
                      / \      
                     x   5     
      (x,[x=6,x=5]):> (6,[x=6,x=5])

    (x,[x=6,x=5]):> (6,[x=6,x=5])
    (write(x),st([],[],[x=6,x=5])):> (write(6),st([],[],[x=6,x=5]))

  /             \              
wri    _________st__           
 |    /  |          \          
 6  []  []       ____.__       
                /       \      
               =         .__   
              / \       /   \  
             x   6     =    [] 
                      / \      
                     x   5     
    append([], [6], [6])
    (write(6),st([],[],[x=6,x=5])):> (skip,st([],[6],[x=6,x=5]))

   /                \                
skip     ___________st____           
        /    |            \          
      []     .         ____.__       
            / \       /       \      
           6  []     =         .__   
                    / \       /   \  
                   x   6     =    [] 
                            / \      
                           x   5     
Cmd = (read(x);x:=x+1;write(x)),
STATE = st([5], [], []),
FINALSTATE = (skip, st([], [6], [x=6, x=5])).

Natural semantics of Wren

See Figure 8.9 p.262 from [SSPL Ch.8.6] for inference rules for the natural semantics of Wren.

%_____________________CONCLUSION______________________ :- ___RULE___, _______________________PREMISES______________________________.
(IE1 +   IE2,STO)             => N                     :- rule('B1'), (IE1,STO) => N1, (IE2,STO) => N2, N is N1+N2.
(IE1 <   IE2,STO)             => B                     :- rule('B2'), (IE1,STO) => N1, (IE2,STO) => N2, (N1 <  N2 -> B = true; B = false).
(BE1 and BE2,STO)             => B                     :- rule('B3'), (BE1,STO) => B1, (BE2,STO) => B2, (B1 = true,  B2 = true  -> B = true;  B = false).
(not(BE),STO)                 => B                     :- rule('B4'), (BE,STO) => B1, (B1 = true -> B = false; B = true).

(ID,STO)                      => V                     :- rule('B5'), atom(ID), member(ID=V, STO), !.

(V,_STO)                      => V                     :- rule('B6'), atomic(V).

(ID := E,st(IN,OUT,STO))      => st(IN,OUT,[ID=V|STO]) :- rule('B7'), (E,STO) => V.

(if(E,C1,C2),st(IN,OUT,STO))  => STATEp                :- rule('B8'), (E,STO) => B,
                                                                      ( B = true  -> (C1,st(IN,OUT,STO)) => STATEp
                                                                      ; B = false -> (C2,st(IN,OUT,STO)) => STATEp
                                                                      ).

(if(E,C),st(IN,OUT,STO))      => STATEp                :- rule('B9'), (E,STO) => B,
                                                                      ( B = true -> (C,st(IN,OUT,STO)) => STATEp
                                                                      ; STATEp = st(IN,OUT,STO)
                                                                      ).

(while(E,C),st(IN,OUT,STO))   => STATEpp               :- rule('B10'), (E,STO) => B,
                                                                       ( B = true -> (C,st(IN,OUT,STO)) => STATEp,
                                                                                     (while(E,C),STATEp) => STATEpp
                                                                       ; STATEpp = st(IN,OUT,STO)
                                                                       ).
((C1;C2),STATE)               => STATEpp               :- rule('B11'), (C1,STATE) => STATEp, (C2, STATEp) => STATEpp.

(skip,STATE)                  => STATE                 :- rule('B12').

(read(ID),st([V|IN],OUT,STO)) => st(IN,OUT,[ID=V|STO]) :- rule('B13').

(write(E),st(IN,OUT,STO))     => st(IN,OUTp,STO)       :- rule('B14'), (E,STO) => V, append(OUT, [V], OUTp).

Again, note that expression evaluation needs a store with name=value bindings. Commands are state transformers, i.e. we map a command and a state to a new state. The natural semantics does not modify the expression/program term in the evaluation process, but rather produces a value (for expressions) or an updated state (for commands). Note that the while-loop semantics is defined by recursion: the while/2 term is evaluated again in the recursive step when the condition is true.

Exercise: modify the natural semantics of Wren expressions to include a where construct that locally binds a name to a value. For example:

(where(x+y, y, 2), [x=1]) => 3

This corresponds to the use of "where" in functional languages. For example, in Haskell we can write the definition:

z = x+y where y = 2

That is, the name y is bound to 2 in the expression x+y.

Exercise: modify the natural semantics of the and and or operations in Wren into short-circuit logical operators. That is, the second operand is only evaluated when necessary. For example, x<>0 and 1/x=y does not evaluate the expression 1/x=y when x=0.

Exercise: modify the natural semantics of Wren expressions and commands to have a state with memory consisting of a list of location=value bindings (memory cells). The store is changed to name=location bindings. Thus, values of variables are now physically stored in memory. Variables can be aliases, since two variables can have the same location binding. For example,

?- (x := x+y+z, st([],[],[x=0,y=0,z=1],[0=3,1=4])) => V.
V = st([],[],[x=0,y=0,z=1],[0=10,1=4]) .

where x and y are aliases. To define the evaluation of the assignment command, use Prolog select/3 to remove a cell at a location LOC from memory MEM and construct a new memory cell MEMpp with a value V at location LOC:

select(LOC=_, MEM, MEMp), MEMpp = [LOC=V | MEMp]

and/or use delete(MEM, LOC=_, MEMp), which does not fail when the term LOC=_ to be removed is not in the list MEM.

Programming With Functions

Functional thinking by Neal Ford

Summary: "Neal Ford emphasizes the fact that functional programming uses a different way of solving a problem, thinking about the results rather than the steps to make."

Functional concepts in modern programming languages

Functions in C

Function pointers in C are rather primitive.

int f(int,int);      // a function
int (*pf)(int,int);  // a pointer to a function
pf = f;
int n = pf(1,2);     // call f(1,2)

Function pointers in C have no state. They are sometimes necessary to pass code along to other functions as callbacks, for example to the C library function qsort, whose comparator takes const void* arguments:

int icomp(const void *p, const void *q)
{ int a = *(const int*)p, b = *(const int*)q;
  return a < b ? -1 : a > b ? 1 : 0;
}
int a[100];
qsort(a, 100, sizeof(int), icomp);

While passing function pointers as callbacks to other functions does not appear problematic, it is more interesting when we want to pass a function that has an internal state, for example a counter that is updated when the function is called:

static int count = 0;
int icomp(const void *p, const void *q)
{ count++;
  int a = *(const int*)p, b = *(const int*)q;
  return a < b ? -1 : a > b ? 1 : 0;
}
int a[100];
qsort(a, 100, sizeof(int), icomp);
printf("Comparisons made = %d\n", count);

Ideally, we would like to make the function local, to avoid the static counter, as shown in the pseudo-C code (not valid in ANSI C):

int foo()
{ int count = 0;
  int icomp(int *a,int *b)
  { count++;
    return *a < *b ? -1 : *a > *b ? 1 : 0;
  }
  int a[100];
  qsort(a, 100, sizeof(int), icomp);
  printf("Comparisons made = %d\n", count);
}

In this case the scope of the variable count extends or "bleeds" into the function icomp, which is what we wanted. Programming languages that support functions as first-class objects offer this advantage. First-class means that functions can be declared anywhere a variable can be declared and assigned, i.e. functions can be nested, can be passed to other functions, and can be returned from functions. Programming languages such as Ada support (almost) first-class functions. Functions in functional languages are always first class.

Returning a function from another function is a bit more interesting. Consider the following pseudo-C code with a function f declared locally in function fk (not valid in ANSI C):

typedef int (*F)(int,int);
F fk(int k)
{ int f(int a, int b) { return a + b + k; };  // a local function f
  return f;                                   // that is returned as a pointer
}
F pf = fk(7);     // pf = f, with f declared in fk() and k=7
int n = pf(1,2);  // call f(1,2)

Here we attempt to return a function f as a closure with an internal state parameterized with k=7 being part of the outer scope of f. So it is assumed that when pf(1,2) is called the value k=7 is used. But note that k is no longer in scope (and is deallocated from the stack if this were valid C!). For functions to be closures, the state of the variables referenced in the outer scope must be preserved in the closure.

Function Objects in C++

"Functors" or "function objects" in C++ are in essence just objects with nice function-like syntax. An explicit state is kept in the class instance. Function objects without state are callbacks. The term "functor" usually refers to a function object that is not a function pointer (a callback).

For example, the state of k is maintained by the functor F:

class F
{ int k;
public:
  F(int k) { F::k = k; }
  int operator()(int a, int b) { return a + b + k; }
};
F f(7);
int n = f(1,2);

Note that the state F::k=7 is explicitly set through the constructor, whereas with nested functions the state defined in the outer scope simply "bleeds" into the locally-defined function, as shown in the pseudo-C example above. That is, with function objects variables in the nonlocal scope cannot be referenced, while with true closures they can.

Exercise: since we stated that function objects are not closures, can the following code work?

int foo()
{ int k=7;
  class F
  { public:
      int operator()(int a,int b) { return a + b + k; }
  } f;
  int n = f(1,2);
}

STL defines unary and binary function objects for arithmetic, comparison, logical operations, and selections. These function objects are used with STL algorithms to traverse, transform, search, and sort container objects using their iterators.

For example, the transform "algorithm" iterates over objects to produce a result by applying a unary or binary operation. The operations are function objects. Here, we square the elements of a vector to produce a second vector, sort it, and then print it out:

#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>
using namespace std;
struct isqr : public unary_function<int, int>
{ int operator()(int x) { return x*x; }
};
template<typename T> struct print : public unary_function<T, void>
{ print(ostream& out) : os(out) {}
  void operator()(T x) { os << x << ' '; }
  ostream& os;
};
vector<int> V1(N), V2(N);
transform(V1.begin(), V1.end(), V2.begin(), isqr());
sort(V2.begin(), V2.end(), less<int>());
for_each(V2.begin(), V2.end(), print<int>(cout));

Note that isqr and print function objects are derived from unary_function.

Exercise: replace isqr with a template sqr function object to compute the square of an int and double.
Answer 1: with explicit specializations

template<typename T> struct sqr : public unary_function<T, T>
{ T operator()(T x);
};
template<> struct sqr<int> : public unary_function<int, int>
{ int operator()(int x) { return x*x; }
};
template<> struct sqr<double> : public unary_function<double, double>
{ double operator()(double x) { return x*x; }
};

Answer 2: with traits templates

template<typename T> class SquareTraits;
template<> class SquareTraits<int> { public: typedef int type; };
template<> class SquareTraits<double> { public: typedef double type; };
template<typename T> struct sqr : public unary_function<T, T>
{ typename SquareTraits<T>::type operator()(typename SquareTraits<T>::type x) { return x*x; }
};

We generally refer to an operation applied over a container to produce another container as a map. In the example above we used transform as a map.

The Boost Lambda Library (BLL) for C++ simplifies the definition of function objects "on the fly" (or "inline"):

transform(V1.begin(), V1.end(), V2.begin(), _1 * _1);
sort(V2.begin(), V2.end(), _1 < _2);
for_each(V2.begin(), V2.end(), cout << _1 << ' ');

This works by constructing function objects from the placeholder arguments _1, _2, _3, etc., which are bound to an operation such as * and <. The bind operation is used to bind arguments to a function. For example, bind(sin, _1) binds the sine function to one argument to create a function object.

The Lambda Library approximates the lambda abstraction mechanism of lambda calculus. BLL lambdas are function objects and not real closures, because variables in an outer scope cannot be referenced in a BLL lambda. Also, a minor inconvenience is that no C++ statements are allowed with BLL, only expressions.

The ability to introduce functions and code blocks in expressions is an essential part of lambda calculus and languages that implement closures such as Haskell, Scheme, Python, Java closures, and Ruby. We will see more about lambda calculus later.

The new C++0x standard introduces "lambda function" closures, in which one or more variables, say x, in the outer scope can be captured by value [x] or by reference [&x]:

int foo()
{ int k = 7, n = 0;
  function<int (int,int)> f = [k,&n](int a, int b) -> int { n++; return a + b + k; };
}

int bar(function<int (int,int)> f)
{ return f(1,2);
}

However, we have to be careful. The environment of variable bindings in which the closure was constructed is not saved, which means we get into trouble when we pass variables in the outer scope by reference and the outer scope is no longer valid when the function is executed, for example when we pass k by reference [&k] into the closure:

function<int (int,int)> fk(int k)
{ function<int (int,int)> f = [&k](int a, int b) -> int { return a + b + k; };
  return f;
}
function<int (int,int)> f = fk(7);
int n = f(1,2); // OOPS: k is a dangling reference here

Passing k by value [k] into the closure is fine:

function<int (int,int)> fk(int k)
{ function<int (int,int)> f = [k](int a, int b) -> int { return a + b + k; };
  return f;
}
function<int (int,int)> f = fk(7);
int n = f(1,2);

A lambda function that does not reference any variables in the outer scope is essentially a function pointer to an anonymous function defined "inline" on the fly, for example:

sort(X.begin(), X.end(), [](double a, double b) -> bool { return a < b; });

The other extreme, a lambda function that references variables in the outer scope but takes no arguments, is called a thunk. Thunks can be used to program Jensen's device (based on Algol 60 parameter passing by name):

double integrate(function<double ()> f, double &x, double a, double b, double h)
{ double sum = 0;
  for (x = a; x <= b; x += h)
    sum += f();
  return sum;
}
double x, y = 1.0;
double z = integrate([&]() -> double { return 2*x + x*x + y; }, x, 0.0, 10.0, 0.5);

Recommended reading: "C++ Templates" by D. Vandevoorde and Nicolai Josuttis.

More on function pointers, delegates, and member function pointers for C++ experts: Member Function Pointers and the Fastest Possible C++ Delegates

Higher-Order Functions in Haskell

For information on Haskell, see: a Gentle Introduction to Haskell. Other places to look: Haskell Tutorials.

In general, we refer to functions that take other functions as arguments as higher-order functions.

In the functional programming paradigm, higher-order functions typically operate over lists. There are many higher-order functions that can be used as building blocks to construct complex algorithms over lists. Here we will discuss the most common Haskell functions over lists.

First, we introduce the Haskell expression syntax, which has its roots in earlier functional programming languages such as ML and Miranda. The syntax is "clean" in the sense that we do not use parentheses and commas for arguments in function calls. Parentheses are solely used to group expressions.

We simply write

sqr x

instead of sqr(x) by dropping the parenthesis. For functions with multiple arguments, we can drop the comma as well and write

power x 2

instead of power(x,2). When an argument is a complex expression we need parentheses, as in

power x (sqr 2)

whereas writing "power x sqr 2" applies power to three arguments, which is not intended.

The syntax may be baffling at first when you're used to C/C++ and Java, but it simply limits parentheses to only those cases where you really need them to group expressions. This avoids the "syntax overloading" of parentheses and commas in C/C++/Java, where they also delimit arguments in function calls and keyword constructs.

We now introduce many common (higher-order) functions over lists. A list is constructed with the cons (:) operator, meaning x:xs is the list with x as head and xs as tail.

Mapping a function f over a list:

map f [x1, x2, ..., xn] = [f x1, f x2, ..., f xn]

The map function satisfies

map f []     = []
map f (x:xs) = f x : map f xs

Haskell supports pattern matching on function arguments with the list patterns [] and (x:xs), representing the empty list and a non-empty list with head x and tail xs, respectively. Using pattern matching, the equations for map actually define the map function recursively. This leads to the following evaluation steps for map f [1,2]:

  /     \      
 f     __:     
      /   \    
     1     :   
          / \  
         2  []
  /     \      
 f     map     
 |    /   \    
 1   f     :   
          / \  
         2  []
  /     \      
 f     __:     
 |    /   \    
 1   f    map  
     |    / \  
     2   f  []
  /   \    
 f     :   
 |    / \  
 1   f  [] 

Note: pattern matching in Haskell is more restrictive than Prolog unification. There are only two patterns for lists, [] and (x:xs). Other Haskell patterns include 0 and n for integers, and patterns for variant records (constants and constructors in "tagged unions").

An alternative definition without pattern matching can be given using the if-then-else ternary function if':

map f xs = if' (null xs) [] (f (head xs) : map f (tail xs))

where the following built-in functions are used:

head (x:xs) = x

tail (x:xs) = xs

null []     = True
null (x:xs) = False

if' x y z = case x of
              True  -> y
              False -> z

Needless to say, pattern matching helps to reveal the true meaning of a function without the need to use any of the obfuscating primitive functions on lists.

Another example with a list pattern is:

length []     = 0
length (x:xs) = 1 + length xs

Evaluation of length [a,b] proceeds recursively as follows:

  /   \    
 a     :   
      / \  
     b  []
  /   \    
 1  length 
      / \  
     b  []
  /   \    
 1     +   
      / \  
     1  length
  /   \    
 1     +   
      / \  
     1   0

To append lists we define the ++ infix operator as follows:

[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

Note that the resulting list is a copy of the list xs followed by the shared list ys:

      /         \      
   __:         __:     
  /   \       /   \    
 1     :     3     :   
      / \         / \  
     2  []       4  []
  /         \          
 1       ___++__       
        /       \      
       :       __:     
      / \     /   \    
     2  []   3     :   
                  / \  
                 4  []
  /         \          
 1     ______:         
      /       \        
     2     ___++       
          /     \      
        []     __:     
              /   \    
             3     :   
                  / \  
                 4  []
  /       \        
 1     ____:       
      /     \      
     2     __:     
          /   \    
         3     :   
              / \  
             4  [] 

Filtering elements of a list results in a list of elements x such that p x holds. We can define filter with a case construct:

filter p []     = []
filter p (x:xs) = case (p x) of
                    True  -> x : filter p xs
                    False -> filter p xs

Alternatively, we can define filter using a guard to test if p x is true to put the element x in the new list or to skip over it:

filter p []               = []
filter p (x:xs) | p x       = x : filter p xs
                | otherwise = filter p xs

   /      \      
 odd     __:     
        /   \    
       1     :   
            / \  
           2  []
  /      \       
 1     filter    
       /    \    
     odd     :   
            / \  
           2  []
  /    \     
 1   filter  
       /  \  
     odd  []
  / \  
 1  []

We make a small change to the definition of filter's third case by returning [] when p x is false to get

takeWhile p []           = []
takeWhile p (x:xs) | p x = x : takeWhile p xs
                   |     = []

Likewise we make a small change to the second and third case to get

dropWhile p []           = []
dropWhile p (x:xs) | p x = dropWhile p xs
                   |     = x:xs

Note that there is a bit of inefficiency in the above when we match x:xs and then construct x:xs again. We can avoid this by using an "as pattern" s@(x:xs), where argument s is bound to the matched pattern (x:xs):

dropWhile p []             = []
dropWhile p s@(x:xs) | p x = dropWhile p xs
                     |     = s

So, takeWhile p xs returns the initial list of elements x in xs for which p x is true, and dropWhile p xs returns the list of remaining elements. Hence, we have that

xs = (takeWhile p xs) ++ (dropWhile p xs)

for any p and xs.
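This law can be checked directly in GHC Haskell. A minimal sketch (the primed names and the lawHolds helper are ours; the notes' bare `|` fallback guard becomes `otherwise`):

```haskell
-- takeWhile/dropWhile as in the notes, in GHC syntax
takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' _ []     = []
takeWhile' p (x:xs) | p x       = x : takeWhile' p xs
                    | otherwise = []

dropWhile' :: (a -> Bool) -> [a] -> [a]
dropWhile' _ []       = []
dropWhile' p s@(x:xs) | p x       = dropWhile' p xs
                      | otherwise = s

-- The law: every list is its takeWhile prefix followed by its dropWhile suffix
lawHolds :: Eq a => (a -> Bool) -> [a] -> Bool
lawHolds p xs = takeWhile' p xs ++ dropWhile' p xs == xs
```

The Prelude's span function packages exactly this pair: span p xs = (takeWhile p xs, dropWhile p xs).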

To reduce a list from the right to a single value is called a fold right:

foldr f a [x1, x2, ..., xn] = f x1 (f x2 (f ... (f xn a)))

where f is a binary operator ⊕, so we can write this more clearly as:

foldr (⊕) a [x1, x2, ..., xn] = x1 ⊕ (x2 ⊕ (... ⊕ (xn ⊕ a)))

which is represented by the expression tree:

  /       \        
x1     ____⊗       
      /     \      
    x2       ⋱     
        ⋱     ⊗   
              / \  
            xn   a

The fold right is defined as

foldr f a []     = a
foldr f a (x:xs) = f x (foldr f a xs)

Evaluation of foldr (*) 3 [1,2] proceeds as follows:

  /  |      \      
 *   3     __:     
          /   \    
         1     :   
              / \  
             2  []
  /       \        
 1     __foldr     
      /  |    \    
     *   3     :   
              / \  
             2  []
  /       \        
 1     ____*       
      /     \      
     2     foldr   
          /  |  \  
         *   3  []
  /   \    
 1     *   
      / \  
     2   3

To reduce a list from the left to a single value is called a fold left:

foldl f a [x1, x2, ..., xn] = f (f (f (f a x1) x2) ...) xn

where f is a binary operator ⊕, so we can write this more clearly as:

foldl (⊕) a [x1, x2, ..., xn] = (((a ⊕ x1) ⊕ x2) ⊕ ...) ⊕ xn

which is represented by the expression tree:

        /       \  
       ⊗____    xn 
      /     \      
     ⋰      x3     
   ⊗    ⋰         
  / \              
 a  x1

The fold left is defined as

foldl f a []     = a
foldl f a (x:xs) = foldl f (f a x) xs

Evaluation of foldl (*) 1 [2,3] proceeds as follows:

  /  |      \      
 *   1     __:     
          /   \    
         2     :   
              / \  
             3  []
  /    |      \    
 *     *       :   
      / \     / \  
     1   2   3  []
  /      |      \  
 *       *__    [] 
        /   \      
       *     3     
      / \          
     1   2
    /   \  
   *     3 
  / \      
 1   2

The fold operations are particularly interesting to serve as building blocks for other functions:

xs ++ ys = foldr (:) ys xs

length xs = foldr oneplus 0 xs
              where oneplus x n = 1 + n

takeWhile p xs = foldr consifp [] xs
                   where consifp x xs | p x = x:xs
                                      |     = []

concat xss = foldr (++) [] xss

reverse xs = foldl snoc [] xs
               where snoc xs x = x:xs

sum xs = foldr (+) 0 xs

prod xs = foldr (*) 1 xs

and xs = foldr (&&) true xs

or xs = foldr (||) false xs
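These building blocks can be written directly in GHC Haskell. A sketch (primed names avoid clashing with the Prelude; note that prod must use the multiplicative identity 1, and Haskell spells the Booleans True/False):

```haskell
append :: [a] -> [a] -> [a]
append xs ys = foldr (:) ys xs          -- xs ++ ys

length' :: [a] -> Int
length' = foldr (\_ n -> 1 + n) 0       -- oneplus

concat' :: [[a]] -> [a]
concat' = foldr (++) []

reverse' :: [a] -> [a]
reverse' = foldl (flip (:)) []          -- snoc xs x = x : xs

sum', prod' :: [Int] -> Int
sum'  = foldr (+) 0
prod' = foldr (*) 1                     -- identity of (*) is 1, not 0

and', or' :: [Bool] -> Bool
and' = foldr (&&) True
or'  = foldr (||) False
```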

Note that reverse defined above takes O(n) time with n = length xs to reverse the list, because foldl applies snoc n times. The naive implementation:

reverse []     = []
reverse (x:xs) = reverse xs ++ [x]

takes O(n²) time.

We state some useful observations (laws).

THEOREM [First Duality Theorem]: Let (S,⊕) be a monoid with identity element a. Then foldr (⊕) a xs = foldl (⊕) a xs.

For example:

sum xs = foldr (+) 0 xs
       = foldl (+) 0 xs

concat xss = foldr (++) [] xss
           = foldl (++) [] xss

Exercise: which of the last two choices of fold in concat is the most efficient, given that xs ++ ys takes O(k) with k = length xs to compute? Assume we take concat over a list of n lists each of length m.
Answer: foldr takes O(mn) time, whereas foldl takes O(mn²) time. The fold right is more efficient because each application of xs ++ ys takes O(m) time, since each xs is of length m and ys is the concatenation of the lists to its right in xss; with the fold left the left operand of ++ grows with each step.

THEOREM [Second Duality Theorem]: Let ⊕ and ⊗ be operators such that x ⊕ (y ⊗ z) = (x ⊕ y) ⊗ z and x ⊕ a = a ⊗ x. Then foldr (⊕) a xs = foldl (⊗) a xs.

For example:

length xs = foldr oneplus 0 xs
              where oneplus x n = 1 + n

          = foldl plusone 0 xs
              where plusone n x = n + 1

since oneplus x1 (plusone n x2) = 1+(n+1) = (1+n)+1 = plusone (oneplus x1 n) x2.

THEOREM [Third Duality Theorem]: Let ⊕ and ⊗ be operators such that x ⊕ y = y ⊗ x. Then foldr (⊕) a xs = foldl (⊗) a (reverse xs).

For example:

xs = foldr (:) [] xs
   = foldl (snoc) [] (reverse xs)
       where snoc xs x = x:xs

Hence, by the fact that reverse (reverse xs) = xs we have that

reverse xs = foldl snoc [] xs
               where snoc xs x = x:xs

Folds over monoids can be segmented by partitioning the list into parts xs = ys ++ zs:

foldl (⊕) a xs = foldl (⊕) a (ys ++ zs) = (foldl (⊕) a ys) ⊕ (foldl (⊕) a zs)

foldr (⊕) a xs = foldr (⊕) a (ys ++ zs) = (foldr (⊕) a ys) ⊕ (foldr (⊕) a zs)

This observation leads to balanced reduction trees, for example in parallel sums.
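For instance, because (+) with identity 0 forms a monoid, a sum can be computed segment by segment and the partial sums combined afterwards. A sketch (chunkedSum and chunks are our own helper names):

```haskell
import Data.List (foldl')

-- Partition xs into segments of length k, fold each segment, then fold
-- the partial results. This is correct exactly because (+, 0) is a monoid;
-- in a parallel setting each inner fold could run on a separate worker.
chunkedSum :: Int -> [Int] -> Int
chunkedSum k = foldl' (+) 0 . map (foldl' (+) 0) . chunks k
  where
    chunks _ [] = []
    chunks n xs = let (ys, zs) = splitAt n xs in ys : chunks n zs
```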

Consider the evaluation relation ⇒ we used in defining the natural semantics of Wren. Suppose we define the evaluation function

eval state cmd = state'
                 where (cmd,state) ⇒ state'

Then a fold left of eval over a list of commands starting with an initial state state0 executes the commands in order to produce the final state statef:

statef = foldl eval state0 cmds

For example

foldl eval st([5],[],[]) ["read(x)","x:=x+1","write(x)"]
= foldl eval (eval st([5],[],[]) "read(x)") ["x:=x+1","write(x)"]
= foldl eval st([],[],[x=5]) ["x:=x+1","write(x)"]
= foldl eval (eval st([],[],[x=5]) "x:=x+1") ["write(x)"]
= foldl eval st([],[],[x=6]) ["write(x)"]
= foldl eval (eval st([],[],[x=6]) "write(x)") []
= foldl eval st([],[6],[x=6]) []
= st([],[6],[x=6])

To insert an element in an ordered list:

insert x xs = takeWhile lesseq xs ++ [x] ++ dropWhile lesseq xs
                where lesseq y = (y <= x)

We can use insert for simple insertion sort using a fold right:

isort xs = foldr insert [] xs
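In GHC syntax (using the Prelude's takeWhile and dropWhile, with a primed name to avoid the Prelude's insert), insertion sort reads:

```haskell
-- Insert x into an ordered list by splitting the list around x,
-- then sort by folding insert over the input list, as in the notes.
insert' :: Ord a => a -> [a] -> [a]
insert' x xs = takeWhile (<= x) xs ++ [x] ++ dropWhile (<= x) xs

isort :: Ord a => [a] -> [a]
isort = foldr insert' []
```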

We end by defining two zip functions, zip and zipWith:

zip [x1, x2, ..., xn] [y1, y2, ..., yn] = [(x1,y1), (x2,y2), ..., (xn,yn)]

zipWith (⊕) [x1, x2, ..., xn] [y1, y2, ..., yn] = [x1 ⊕ y1, x2 ⊕ y2, ..., xn ⊕ yn]

defined by

zip []     ys     = []
zip (x:xs) []     = []
zip (x:xs) (y:ys) = (x,y) : (zip xs ys)

zipWith f []     ys     = []
zipWith f (x:xs) []     = []
zipWith f (x:xs) (y:ys) = (f x y) : (zipWith f xs ys)

For example:

dotprod xs ys = foldr (+) 0 (zipWith (*) xs ys)

Say we take the dot product:

dotprod [3,1,4] [1,5,9] ⇒ foldr (+) 0 (zipWith (*) [3,1,4] [1,5,9]) ⇒ foldr (+) 0 [3,5,36] ⇒ 3+(5+(36+0)) ⇒ 44

Let's take a closer look at the expression syntax. Application of a function to multiple arguments associates to the left:

f a b = (f a) b

When we make the application operation explicit using an operator, say @, this becomes more clear:

f @ a @ b = (f @ a) @ b

    /   \  
   @     b 
  / \      
 f   a

Note that argument expressions must be parenthesized:

f (g a) (h b)

      /       \    
   __@         @   
  /   \       / \  
 f     @     h   b 
      / \          
     g   a

Let's consider the following definition again:

sum xs = foldr (+) 0 xs

Because application associates to the left, we can rewrite this as:

sum xs = ((foldr (+)) 0) xs

which is depicted as:

   /  \  
 sum  xs
        /       \  
       @____    xs 
      /     \      
     @__     0     
    /   \          
 foldr   +

The simplified equation is:

sum = foldr (+) 0

Viewed as a definition, it eliminates the need to include the argument xs on the left-hand side, because the function in the body of the definition on the right-hand side is applied to xs. This principle is referred to as "Currying" in honor of Haskell Curry.

Consider for example:

add x y = x + y

inc = add 1

We can write (+) to refer to the addition operation as a function, so:

inc = (+) 1

Haskell uses the following Currying rules for infix operators, called "sections" for partial application:

(⊕ a) x = x ⊕ a

(a ⊕) x = a ⊕ x

Currying is particularly useful for higher-order functions:

map (- 1) [1,2,3] = [0,1,2]

map (2 *) [1,2,3] = [2,4,6]

filter (> 1) [1,2,3] = [2,3]
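In GHC Haskell these section examples work almost verbatim; the one caveat is that (- 1) parses as the negative number, so the right section of subtraction is written with the Prelude's subtract:

```haskell
ex1, ex2, ex3 :: [Int]
ex1 = map (subtract 1) [1,2,3]   -- right section: x - 1, gives [0,1,2]
ex2 = map (2 *) [1,2,3]          -- left section: 2 * x, gives [2,4,6]
ex3 = filter (> 1) [1,2,3]       -- right section: x > 1, gives [2,3]
```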

List comprehensions form a convenient syntax to process lists. Several programming languages support list comprehensions, for example:

[ 2*x | x <- [1..100], x^2 > 3 ] Haskell
[2 * x for x in range(1, 101) if x ** 2 > 3] Python
[? List: 2 * x | x <- 1 -- 100 ; x * x > 3 ?] OCaml
[2*n | n=1..100; n*n > 3] Pure
[2*n for (n in (function(start,end){for (var i=start; i<=end; i++) yield i})(1,100)) if (n*n>3)] Javascript 1.8

Let's take a closer look at the Haskell list comprehension syntax, which is based on prior functional languages such as Miranda and ML.

The syntax of list comprehensions in Haskell is

[ expr | qualifier, qualifier, ... ]

where a qualifier is a generator expression of the form pattern <- list, a predicate for filtering, or a local variable binding x = expr.

For example:

divisors n = [ d | d <- [1..n], n mod d = 0 ]

prime n = (divisors n = [1,n])

where the notation [a..b] denotes a list of integers from a to b.

A faster primality check is:

prime n = ([ d | d <- [2..isqrt n], n mod d = 0] = [])

Cartesian product:

cartesian xs ys = [ (x,y) | x <- xs, y <- ys ]


Quicksort:

qsort []     = []
qsort (p:xs) = qsort [ x | x <- xs, x < p] ++ [p] ++ qsort [ x | x <- xs, x >= p ]
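Both the divisors-based primality test and quicksort run as-is in GHC Haskell once the notes' '=' equality test is written '==' and mod is used as an infix backtick operator:

```haskell
divisors :: Int -> [Int]
divisors n = [d | d <- [1..n], n `mod` d == 0]

prime :: Int -> Bool
prime n = divisors n == [1,n]

-- Quicksort with list comprehensions; duplicates of the pivot go right.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (p:xs) = qsort [x | x <- xs, x < p] ++ [p] ++ qsort [x | x <- xs, x >= p]
```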

In fact, any list comprehension can be translated to an expression with map, filter, and concat using the following rules (roughly):

[ x | x <- xs] = xs
[ f x | x <- xs ] = map f xs
[ e | x <- xs, p x, ... ] = [ e | x <- filter p xs, ... ]
[ e | x <- xs, y <- ys, ... ] = concat [ [ e | y <- ys, ... ] | x <- xs ]

For example,

qsort []     = []
qsort (p:xs) = qsort (filter (< p) xs) ++ [p] ++ qsort (filter (>= p) xs)

by noting that (< p) and (>= p) are Curried functions.

Lazy evaluation in Haskell allows for defining "infinite" data structures. More about lazy evaluation later, when we review lambda calculus evaluation modes.

Basically, lazy evaluation allows us to safely define an infinite list, such as

from n = n : from (n+1)

and use from to produce a list of integers in a range:

range a b = take (b-a+1) (from a)

where the Haskell take function is defined as:

take 0 xs     = []
take n []     = []
take n (x:xs) = x : take (n-1) xs

This works by evaluating from only to the point necessary (being lazy). Non-strict actual arguments are not evaluated before being passed to the function; rather, the argument expression is passed into the function body and evaluated only when needed (viz. pass-by-name). For example:

range 1 2 ⇒ take 2 (from 1)
  ⇒ take 2 (1 : from 2)
  ⇒ 1 : take 1 (from 2)
  ⇒ 1 : take 1 (2 : from 3)
  ⇒ 1 : 2 : take 0 (from 3)
  ⇒ 1 : 2 : []
  ⇒ [1,2]
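These definitions run unchanged in GHC Haskell, where laziness is the default: take forces from a only as far as needed.

```haskell
-- An infinite ascending list; safe to define under lazy evaluation.
from :: Int -> [Int]
from n = n : from (n + 1)

-- A finite range carved out of the infinite list.
range :: Int -> Int -> [Int]
range a b = take (b - a + 1) (from a)
```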

Another advantage of lazy evaluation is that a potentially expensive higher-order function can be reused to accomplish a simpler task efficiently, where otherwise we would have to recode the task efficiently in a new function.

For example, to search a list for a value we can define:

isin x []              = false
isin x (y:xs) | x == y = true
              |        = isin x xs

Or we can use a fold right:

isin x xs = foldr match false xs
              where match y b | x == y = true
                              |        = b

Due to the benefits of lazy evaluation, this latter definition will not traverse the entire list to find a match, but rather the search stops as soon as the element is found.

Why is this so? Suppose we evaluate isin 2 [1,2,3] = foldr match false [1,2,3], where match y b = true when y == 2 else match y b = b:

    /      |          \        
 match   false     ____:       
                  /     \      
                 1     __:     
                      /   \    
                     2     :   
                          / \  
                         3  []
  /             \              
 1       ______foldr____       
        /      |        \      
     match   false     __:     
                      /   \    
                     2     :   
                          / \  
                         3  []
    /      |        \      
 match   false     __:     
                  /   \    
                 2     :   
                      / \  
                     3  []
  /           \            
 2       ____foldr____     
        /      |      \    
     match   false     :   
                      / \  
                     3  []

The same efficiency of lazy evaluation applies to takeWhile, defined by:

takeWhile p xs = foldr consp [] xs
                   where consp x xs | p x = x:xs
                                    |     = []

With lazy evaluation it is safe to evaluate takeWhile on an infinite list: takeWhile (< 10) (from 1) ⇒ [1,2,3,4,5,6,7,8,9].
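Laziness even lets these foldr-based definitions work on infinite lists. A sketch in GHC Haskell (from as defined earlier; primed names avoid Prelude clashes):

```haskell
from :: Int -> [Int]
from n = n : from (n + 1)

-- Membership test as a lazy fold right: the fold stops at the first match,
-- so searching an infinite list terminates if the element occurs.
isin :: Eq a => a -> [a] -> Bool
isin x = foldr match False
  where match y b | x == y    = True
                  | otherwise = b

-- takeWhile as a lazy fold right, as in the notes.
takeWhile' :: (a -> Bool) -> [a] -> [a]
takeWhile' p = foldr consp []
  where consp x xs | p x       = x : xs
                   | otherwise = []
```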

Finally, we leave it as an exercise to define an nth function using a fold right to get the nth element of a list.

Exercise: use a fold right to define a function nth such that nth n a xs returns the nth element of xs, or if the list is too short returns a.

Answer: we can use zip to generate [(1,x1), (2,x2), ..., (m,xm)] from a list [x1, x2, ..., xm] and search for the nth tuple (n,xn) to find xn:

nth n a xs = foldr ith a (zip (from 1) xs)
               where ith (i,x) y | i == n = x
                                 |        = y
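In GHC Haskell the answer reads as follows (note the tuple pattern (i, x), and the built-in infinite list [1..] in place of from 1):

```haskell
-- nth n a xs: the nth element of xs (1-based), or a if xs is too short.
nth :: Int -> a -> [a] -> a
nth n a xs = foldr ith a (zip [1..] xs)
  where ith (i, x) y | i == n    = x
                     | otherwise = y
```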

Advanced programming techniques

MapReduce, functional style with immutability vs. imperative/OOP to avoid shared state, concurrency.


The Untyped Lambda Calculus


A function is a mapping from the elements of a domain set to the elements of a range set (codomain set) given by a rule, for example:

cube: ℤ → ℤ
cube(n) = n³

Another commonly-used notation writes out the function as a mapping relation ↦:

cube: n ↦ n³

Lambda calculus emphasizes the mapping relation by dropping the name of the function altogether, i.e. making the functions anonymous. A lambda abstraction is an anonymous function of the form

λx.E

where x is a variable and E an expression. For example:

λn.n³

which represents the cube function. To accept multiple arguments we nest abstractions as follows:

λn.λm.n-m
Compare this to the notation used by the following selection of programming languages that support lambda abstractions ("lambda functions" and closures):

One argument Multiple arguments Language
(lambda (n) (* n n n)) (lambda (n m) (- n m)) Scheme
fn n => n*n*n fn n m => n-m ML
fun n -> n*n*n fun n m -> n-m OCaml & F#
n => n*n*n (n, m) => n-m C#
\n -> n*n*n \n m -> n-m Haskell
[](int n) -> int { return n*n*n; } [](int n, int m) -> int { return n-m; } C++0x
lambda n: n*n*n lambda n, m: n-m Python
function(n) { return n*n*n; } function(n, m) { return n-m; } Javascript
{ n -> return n*n*n } { n, m -> return n-m } Groovy closures
(n: Int) => n*n*n (n: Int, m: Int) => n-m Scala

To apply an abstraction to one or more actual arguments, we write each of the arguments to the right of the abstraction. For example:

(λn.n³) 2


(λn.λm.n-m) 3 1

The abstract syntax of a lambda term (or lambda expression) in the pure lambda calculus consists only of variables, abstractions, and applications:

E ::= x
    | λx.E
    | E E

where x is a name of a variable. The variable x in the abstraction λx.E is called a bound variable and λx is a binder for x whose scope is E. Applications are denoted E1 E2, where E1 is the operator (an expression that evaluates to a function) called the rator and E2 is the operand, or simply the rand.

In the non-pure lambda calculus, values v (such as constants, numbers, strings, data structures, and objects) are included in E:

E ::= v
    | x
    | λx.E
    | E E

We can also combine the language NB with lambda terms and obtain the language λNB:

E ::= true
    | false
    | if(E, E, E)
    | 0
    | succ(E)
    | pred(E)
    | iszero(E)
    | x
    | λx.E
    | E E

Parentheses are used for grouping expressions. Abstractions are right associative:

λx.λy.z = λx.(λy.z)

Depicted as an abstract syntax tree:

/ \
λx .
/ \
λy z

Lambda application is left associative and syntactically binds more tightly than abstraction, requiring parentheses around an abstraction when it is applied to argument expressions:

(λx.x x) y z = ((λx.(x x)) y) z

For clarity, in this part of the notes we will use ':' to explicitly denote the application operation in lambda terms and in abstract syntax trees (to avoid confusing the apply ':' with the cons (:) that we used for lists, we write 'cons' for lists). Therefore, the lambda term above can also be written

(λx.x:x):y:z

Depicted as an abstract syntax tree:

/ \
:____ z
/ \
__. y
/ \
λx :
/ \
x x

Lambda abstractions are closures, since innermost lambda abstractions may use variables bound by outer abstractions:

(λk.λa.λb.a+b+k) 7

where k=7 in the innermost abstraction. This is written in C++0x lambda function form as

[](int k) -> function<function<int (int)> (int)> { return [k](int a) -> function<int (int)> { return [a,k](int b) -> int { return a + b + k; }; }; }(7)

At first inspection it seems that lambda terms are of little use beyond just defining anonymous functions in compact form. Furthermore, there are no constants, no data structures, no arithmetic, no Booleans, and no control flow.

However, lambda calculus has been shown to be Turing complete with the following semantic interpretation of lambda application by a one-step evaluation relation →β which is referred to as beta reduction:

(λx.E1) E2 →β [x↦E2]E1

The substitution [x↦E2]E1 replaces all free occurrences of the variable x in term E1 (the body of the abstraction) by term E2 (the actual argument expression).

Nested abstractions naturally lead to Currying and partial evaluation, since we can always supply a single argument expression to an abstraction and either get a value (a normal form) or another abstraction to be applied to the next argument expression:

(λn.n³) 2 →β 2³ →δ 8

(λn.λm.n-m) 3 →β λm.3-m

The abstraction λm.3-m results from the partial application and can be applied to the next argument expression, when provided.

For example, let's define the two abstractions by name:

cube = λn.n3

diff = λn.λm.n-m

And evaluate:

cube (diff 3 1) →def (λn.n³)((λn.λm.n-m) 3 1)
  →β (λn.n³)(([n↦3](λm.n-m)) 1)
  = (λn.n³)((λm.3-m) 1)
  →β (λn.n³)([m↦1](3-m))
  = (λn.n³)(3-1)
  →δ (λn.n³) 2
  →β [n↦2]n³
  = 2³
  →δ 8

When a globally-defined name f occurs in an expression, it is simply replaced with its value: f →def v. In the case of functions f, the value v is an abstraction. Arithmetic operations and other built-in operations are handled by the evaluation relation →δ, denoting the application of delta rules.

Free versus Bound Variables and Substitution

In the lambda abstraction

λx.x+y

variable x is bound and y is free. This concept of bound versus free variables in lambda terms is similar to the familiar scoping of function arguments in programming languages such as C:

f(int x){ return x+y; }

where x is bound (the argument of f) and y is free (it is bound to something that is outside of the definition of f).

Substitution respects variable bindings to ensure that only the proper occurrences of the variable x are replaced by term s, while avoiding changes to the meaning of other parts of E. The same problem arises when replacing one name with another in a C program, where we only want to replace the name when it is not locally defined (bound).

For example, suppose we want to replace name x with y in the C code

{ int x; ... x ... z ... } x ... { int z; ... x ... z ... }

then we want to avoid replacing the local x declared in the first block. So we get:

{ int x; ... x ... z ... } y ... { int z; ... y ... z ... }


The corresponding substitution on lambda terms is:

[x↦y]((λx.x z) x (λz.x z)) = ((λx.x z) y (λz.y z))

But what happens when we naively substitute x by a different variable, say z, in our C fragment? Then we get:

{ int x; ... x ... z ... } z ... { int z; ... z ... z ... }

Note that the second replacement of x by z is captured and z becomes locally bound! This changes the meaning of the program. To force substitution in these cases, we must first rename the local z to a new unused variable, say t:

{ int x; ... x ... z ... } x ... { int t; ... x ... t ... }

and then substitute as usual:

{ int x; ... x ... z ... } z ... { int t; ... z ... t ... }

Likewise, full substitution in lambda terms must avoid variable capture by renaming (λz.x z) to (λt.x t) first and then replace the x:

[x↦z]((λx.x z) x (λz.x z)) = ((λx.x z) z (λt.z t))

Renaming (λz.x z) to (λt.x t) is referred to as alpha conversion (or alpha renaming) in lambda calculus. More formally, alpha conversion is the relation →α defined by

(λx.E) →α (λy.[x↦y]E) if y ∉ FV[E]

That is, we replace a name x with another, say y, but we may not rename to y if y occurs as a free variable in E (y ∈ FV[E]), since this would capture all free y in E.

The set of free variables FV[E] of a lambda term E is formally defined as:

FV[v] = ∅ if v is a value
FV[x] = {x}
FV[λx.E] = FV[E]\{x}
FV[E1 E2] = FV[E1] ∪ FV[E2]

Substitution of x by a term s in expression E without alpha conversion, denoted [x↦s]E, is defined as:

[x↦s]v = v if v is a value
[x↦s]x = s
[x↦s]y = y if y ≠ x
[x↦s](λy.E) = λy.E if y = x
[x↦s](λy.E) = λy.[x↦s]E if y ≠ x and y ∉ FV[s]
[x↦s](E1 E2) = [x↦s]E1 [x↦s]E2

Capture-avoiding substitution by alpha conversion includes the following clause:

[x↦s](λy.E) = λz.[x↦s][y↦z]E if y ≠ x and y ∈ FV[s]

where z is a new variable such that z ∉ FV[E] and z ∉ FV[s].
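The definitions of FV and capture-avoiding substitution translate almost line by line into Haskell. A sketch (the Term constructors and the fresh-variable supply are our own choices; values are omitted for the pure calculus, and we additionally require z ≠ x so the renamed occurrences are not substituted):

```haskell
data Term = Var String | Abs String Term | App Term Term
  deriving (Eq, Show)

-- FV[E], the free variables of a term (duplicates are harmless here)
fv :: Term -> [String]
fv (Var x)     = [x]
fv (Abs x e)   = filter (/= x) (fv e)
fv (App e1 e2) = fv e1 ++ fv e2

-- subst x s e computes [x ↦ s]e, alpha-converting to avoid capture.
subst :: String -> Term -> Term -> Term
subst x s (Var y)
  | y == x           = s
  | otherwise        = Var y
subst x s (App e1 e2) = App (subst x s e1) (subst x s e2)
subst x s (Abs y e)
  | y == x           = Abs y e               -- x is shadowed by the binder
  | y `notElem` fv s = Abs y (subst x s e)   -- no capture possible
  | otherwise        = Abs z (subst x s (subst y (Var z) e))
  where
    -- a fresh z not equal to x and not free in e or s (our naming scheme)
    z = head [v | v <- supply, v /= x, v `notElem` fv e, v `notElem` fv s]
    supply = [c : show i | i <- [(0 :: Int) ..], c <- "tuvw"]
```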

For example, capture-avoiding substitution is needed when operands in applications are not closed (recall that a term t is closed when it contains no free variables, i.e. FV[t] = ∅):

(λx.λy.x+y) y →β [x↦y](λy.x+y) = λz.y+z

We saw that an important part of the operational semantics of lambda calculus is defined by beta reduction (see TPL Ch. 5):

(λx.E1) E2 →β [x↦E2]E1

where (λx.E1) E2 is called a beta redex. A redex is a reducible expression.

However, we have not yet defined an ordering to evaluate lambda applications. First, we should evaluate the rator of an application by reducing all redexes in the rator E1 by the rule:

E1 →β E'1 (E-App1)
E1 E2 →β E'1 E2

Then, we can choose to evaluate the rand of an application by reducing all redexes in the rand E2 by the rule:

E2 →β E'2 (E-App2)
E1 E2 →β E1 E'2

Furthermore, although as we see later this is rarely done or necessary, we could also choose to reduce redexes inside an abstraction by the rule:

E →β E' (E-Abs)
λx.E →β λx.E'

Evaluation Strategies

The question is: do we really need E2 in (E-AppAbs) to be a value (normal form) produced by rule (E-App2)? What about requiring E1 in (E-AppAbs) to be reduced to normal form by (E-Abs): is this really necessary? Because we have multiple choices of beta redexes in a lambda expression, does it matter which one we pick, and should we eventually reduce all beta redexes?


Consider for example:

(λx.x²) ((λy.y+1) 3)

We can either apply the leftmost abstraction, giving

((λy.y+1) 3)²

and obtain

(3+1)² = 16

Or we can apply the rightmost abstraction, giving

(λx.x²) (3+1)

and then

(3+1)² = 16

The results are identical. As it turns out the order is immaterial when the evaluations terminate into a beta normal form. Termination of reduction is not guaranteed, and may depend on the redex we pick. A beta normal form is a lambda expression that has no beta redex, i.e. no subexpression of the form (λx.E1) E2. Values are in normal form, and so are abstractions that are not applied such as λx.λy.x (x y).

First, we need to define an equivalence relation on lambda expressions to compare (partially) evaluated results and beta normal forms.

Observe that the reflexive, symmetric, and transitive closure ≡α of alpha conversion →α satisfies

t ≡α s if t →α s
t ≡α t
t ≡α s if s ≡α t
t ≡α s if t ≡α r and r ≡α s

THEOREM [Alpha Equivalence]: The relations →α* and ≡α are identical.

Proof: The alpha conversion relation →α is symmetric: λx.E →α λy.[x↦y]E →α λx.[y↦x][x↦y]E = λx.E. Therefore, →α* is reflexive, symmetric, and transitive, and coincides with ≡α. ∎

The equivalence relation ≡α partitions lambda expressions into equivalence classes. That is, if t ≡α s then t and s are identical up to the choice of names for the bound variables.

The rules (E-App1), (E-App2), and (E-Abs) select a part of a lambda expression to evaluate that contains a beta redex. When there is more than one redex any one of the rules can be applied to pick a redex. This suggests the full beta-reduction scheme:

while there are beta redexes in t do
   reduce one of the redexes in t

When the above loop terminates, t is in beta normal form.

Full evaluation does not specify which redex to reduce. Several different evaluation strategies for lambda calculus exist. The most important strategies are normal order reduction (NOR), applicative order reduction (AOR), call-by-value, call-by-name, and call-by-need.

The call-by-value strategy is strict, in the sense that arguments to functions are always evaluated. By contrast, the non-strict (or lazy) strategies such as call-by-name and call-by-need evaluate only the arguments that are actually used.

Let's try this out and experiment with lambda calculus by defining the following abstract syntax of lambda expressions in Prolog:

E ::= v | x | \x.E | E:E

where values v are Prolog atoms (numbers and names), x is a Prolog atom name, \x.E is an abstraction, and E:E is an application.

:- op(900, xfy, .).	% lambda abstraction
:- op(600, yfx, :). % apply operator
:- op(200, fy, \). % lambda abstraction

For example, we write (λx.g x x) ((λy.f y) a) as (\x.g:x:x):((\y.f:y):a) and its abstract syntax tree is drawn as follows:

?- [beta].
?- draw((\x.g:x:x):((\y.f:y):a)).
        /             \        
   ____.               :____   
  /     \             /     \  
 \       :__       __.       a 
 |      /   \     /   \        
 x     :     x   \     :       
      / \        |    / \      
     g   x       y   f   y

We can now implement the operational semantics of lambda calculus as follows:

%_CONCLUS_ :- _________RULE_________, __PREMISES__.
X.T :> X.S :- rule('E-Abs'), T :> S.
F:T :> S   :- rule('E-AppAbs-NOR'), beta(F:T, S).
F:T :> S   :- rule('E-AppAbs-share'), beta(F:(T~_), S).
F:T :> G:T :- rule('E-App1'), F :> G.
F:T :> F:S :- rule('E-App2'), T :> S.
F:T :> S   :- rule('E-AppAbs-AOR'), beta(F:T, S).

beta((X.T):S, R) :- subst(T, X=S, R).

There are multiple rules for (E-AppAbs), only one of which is enabled at any given time depending on the evaluation strategy. With NOR the rule (E-AppAbs-NOR) should be tried before (E-App2), hence the strategic rule ordering. The call-by-name (NOR to WHNF) and call-by-value strategies require disabling (E-Abs).

We can experiment with these strategies by enabling and disabling rules with by_name, by_need, by_value, nor, and aor:


% Lazy call by name strategy = NOR to weak head normal form (WHNF):
by_name :- nor, disable(['E-Abs', 'E-App2']).

% Lazy call by need strategy = NOR + WHNF + sharing:
by_need :- by_name, disable('E-AppAbs-NOR'), enable('E-AppAbs-share').

% Strict call by value strategy = AOR w/o lambda abstraction body reduction:
by_value :- aor, disable('E-Abs').

% Normal order reduction (NOR) strategy:
nor :- disable(['E-AppAbs-AOR', 'E-AppAbs-share']), enable(['E-Abs', 'E-App1', 'E-App2', 'E-AppAbs-NOR']).

% Applicative order reducton (AOR) strategy:
aor :- disable(['E-AppAbs-NOR', 'E-AppAbs-share']), enable(['E-Abs', 'E-App1', 'E-App2', 'E-AppAbs-AOR']).

For example, call-by-name reduces the lambda expression (λx.x) ((λy.y) (λz.(λu.u) z)) to WHNF by →β* using the reflexive, transitive closure *> of :>

?- [beta].
?- show, by_name, (\x.x):((\y.y):(\z.(\u.u):z)) *> C.
    /             \            
   .         ______:__         
  / \       /         \        
 \   x     .       ____.       
 |        / \     /     \      
 x       \   y   \       :__   
         |       |      /   \  
         y       z     .     z 
                      / \      
                     \   u     
    beta((\x.x): ((\y.y):(\z.(\u.u):z)),(\y.y):(\z.(\u.u):z))
    (\x.x): ((\y.y):(\z.(\u.u):z)):>(\y.y):(\z.(\u.u):z)

    /         \        
   .       ____.       
  / \     /     \      
 \   y   \       :__   
 |       |      /   \  
 y       z     .     z 
              / \      
             \   u     

  /     \      
 \       :__   
 |      /   \  
 z     .     z 
      / \      
     \   u     
C = (\z.(\u.u):z).

whereas AOR fully evaluates the lambda expression to a normal form:

?- show, aor, (\x.x):((\y.y):(\z.(\u.u):z)) *> C.
    /             \            
   .         ______:__         
  / \       /         \        
 \   x     .       ____.       
 |        / \     /     \      
 x       \   y   \       :__   
         |       |      /   \  
         y       z     .     z 
                      / \      
                     \   u     



    (\x.x): ((\y.y):(\z.(\u.u):z)):>(\x.x): ((\y.y):(\z.z))

    /         \        
   .         __:__     
  / \       /     \    
 \   x     .       .   
 |        / \     / \  
 x       \   y   \   z 
         |       |     
         y       z     

    (\x.x): ((\y.y):(\z.z)):>(\x.x):(\z.z)

    /     \    
   .       .   
  / \     / \  
 \   x   \   z 
 |       |     
 x       z     

  / \  
 \   z 
C = (\z.z).

To simulate call-by-need with NOR to WHNF with sharing, we use a pair term~value in the beta reduction of an application by substituting E2~v into the abstraction instead of just the term E2:

(λx.E1) E2 →β [x↦E2~v]E1

where v is a new variable that is a placeholder for a shared value, so that for example

(λx.x x) ((λy.y) (λz.z)) →β ((λy.y) (λz.z))~v ((λy.y) (λz.z))~v →β (λz.z) ((λy.y) (λz.z))~(λz.z) →β (λz.z) (λz.z) →β (λz.z)

where ((λy.y) (λz.z))~v →β v with v = (λz.z) is evaluated with the following new rules:

E →β* v (if v is an uninstantiated variable in E~v)
E~v →β v

(if v is a value)
E~v →β v

To implement the call-by-need strategy in Prolog we just need to add:

%_CONCLUS_ :- _________RULE_________, __PREMISES__.
...
F:T :> S :- rule('E-AppAbs-share'), beta(F:(T~_), S).
...
T~V :> V :- var(V), T *> V. % Produce shared value V
_~V :> V.                   % Use shared value V

The first beta reduction shows the sharing via a Prolog variable, in this case highlighted as V:

?- show, by_need, (\x.x:x):((\y.y):(\z.z)) *> C.
      /           \        
   __.           __:__     
  /   \         /     \    
 \     :       .       .   
 |    / \     / \     / \  
 x   x   x   \   y   \   z 
             |       |     
             y       z     
    beta((\x.x:x): ((\y.y):(\z.z))~V, ((\y.y):(\z.z))~V: ((\y.y):(\z.z))~V)
    (\x.x:x): ((\y.y):(\z.z)):> ((\y.y):(\z.z))~V: ((\y.y):(\z.z))~V

          /                 \          
         ~______             ~______   
        /       \           /       \  
     __:__       V       __:__       V 
    /     \             /     \        
   .       .           .       .       
  / \     / \         / \     / \      
 \   y   \   z       \   y   \   z     
 |       |           |       |         
 y       z           y       z

where eventually (\y.y):(\z.z) :> \z.z, so V = \z.z.

Church Rosser

The big question is whether the NOR and AOR full-evaluation strategies terminate in a normal form and, if so, whether these normal forms are the same. If so, the normal form of a lambda expression is unique and can be considered the value computed by reducing the lambda expression. Note that the call-by-name and call-by-value strategies may not always produce normal forms.

A relation → satisfies the diamond property if for all terms t, s, r such that t → s and t → r there exists a term u such that s → u and r → u.


THEOREM: If a relation → satisfies the diamond property, so does its transitive closure →*.

Unfortunately, the one-step →β evaluation relation does not satisfy the diamond property.

Take for example (λx.x x) ((λy.y) a), then:

(λx.x x) ((λy.y) a) →β ((λy.y) a) ((λy.y) a) →β a ((λy.y) a) →β a a

but also

(λx.x x) ((λy.y) a) →β (λx.x x) a →β a a

We can easily construct a new evaluation relation → (the parallel reduction relation) for which the transitive closure is the same as →β*, defined by the rules:

--------    (refl)
t → t

t → s
-----------    (abs)
λx.t → λx.s

t → r    s → u
--------------    (app)
t s → r u

t → r    s → u
-----------------    (beta)
(λx.t) s → [x↦u]r

We can easily verify that →* is identical to →β*. The relation → satisfies the diamond property, hence so does its transitive closure →* = →β*: the →β* relation is confluent. The one-step →β reduction relation itself is only weakly confluent.

THEOREM [Church-Rosser]: If t =β s then there is a u such that t →β* u and s →β* u.

Proof: By induction on the =β relation:
Suppose t =β s because t →β s, then take u = s such that t →β* u and s →β* u.
Suppose t =β s because t = s, then take u = t = s such that t →β* u and s →β* u.
Suppose t =β s because s =β t, then by the induction hypothesis there is a u such that s →β* u and t →β* u.
Suppose t =β s because t =β r and r =β s, then by the induction hypothesis there exist u' and u'' such that t →β* u' and r →β* u', and also r →β* u'' and s →β* u''. By the diamond property of →β* there exists a u such that u' →β* u and u'' →β* u, so t →β* u and s →β* u. ∎

It follows as a corollary from Church-Rosser that:

COROLLARY: Let t =β u and u is in normal form, then t →β* u.

Proof: By Church-Rosser we have that t →β* u' and u →β* u' for some u'. Since u is in normal form, u' = u and thus t →β* u. ∎

COROLLARY: A lambda expression has at most one normal form.

Proof: Suppose u' and u'' are both beta normal forms of t. We have that t =β u' and t =β u'', hence u' =β u''. By Church-Rosser there is a u such that u' →β* u and u'' →β* u. Because u' and u'' are normal forms, u = u' = u''. ∎

THEOREM [Standardization Theorem]: If a lambda expression has a normal form, then the NOR strategy guarantees reaching that normal form.

Note that NOR may terminate when AOR does not, since NOR does not evaluate arguments that are not needed.
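A strict host language behaves like AOR and loops on a divergent argument that is never needed; thunking the argument simulates NOR. A small Python illustration (omega is our own stand-in for a divergent term):

```python
# K = λx.λy.x discards its second argument.
K = lambda x: lambda y: x

def omega():            # stands in for the divergent term Ω
    return omega()

# Strict (AOR-like) evaluation of K(42)(omega()) would evaluate omega()
# first and never terminate. Passing a thunk instead simulates NOR:
# the divergent argument is dropped without ever being forced.
result = K(42)(lambda: omega())
print(result)  # 42
```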

Assuming we use call-by-name (NOR to WHNF) reduction strategy, there is a very useful observation we can make: in the steps to reduce a closed lambda expression to WHNF we never encounter the variable capture problem because the operand argument that is substituted into a lambda abstraction is closed. To see why this is the case, consider:

(λx.λy. ...x...) (...y...)

However, because we started with a closed expression, there must be a binding for y in an outer abstraction:

(λy. ... (λx.λy. ...x...) (...y...) ...)

But since we never reduce inside abstractions as per WHNF, the application (λx. ...) (...y...) is never reduced before y is bound to a value, say a:

(λy. ... (λx.λy. ...x...) (...y...) ...) a →β ... (λx.λy. ...x...) (...a...) ...

Therefore, the variable capture problem never occurs when substituting in beta reduction under call-by-name (and consequently, under call-by-need).

Church Encoding

The pure lambda calculus has no values other than lambda abstractions. This seems very limited. However, we can encode Booleans, natural numbers, and lists in the pure lambda calculus by Church encoding.

Church Booleans 'true' and 'false' are selector functions:

tru = λx.λy.x

fls = λx.λy.y

The idea here is that tru and fls select the first or second argument, respectively, which are used to select the then- and else-expressions in a conditional form:

tru thenexpr elseexpr →β* thenexpr

fls thenexpr elseexpr →β* elseexpr

so we can write an if-then-else conditional as:

test[condexpr, thenexpr, elseexpr] = condexpr thenexpr elseexpr

Logical operations are defined by

and = λx.λy.test[x,y,fls] = λx.λy.x y fls

or  = λx.λy.test[x,tru,y] = λx.λy.x tru y

Let's verify these:

and tru tru = (λx.λy.x y fls) tru tru →β (λy.tru y fls) tru →β tru tru fls →β* tru

and tru fls = (λx.λy.x y fls) tru fls →β (λy.tru y fls) fls →β tru fls fls →β* fls

and fls any = (λx.λy.x y fls) fls any →β (λy.fls y fls) any →β fls any fls →β* fls

Likewise, we verify that:

or tru any = (λx.λy.x tru y) tru any →β (λy.tru tru y) any →β tru tru any →β* tru

or fls tru = (λx.λy.x tru y) fls tru →β (λy.fls tru y) tru →β fls tru tru →β* tru

or fls fls = (λx.λy.x tru y) fls fls →β (λy.fls tru y) fls →β fls tru fls →β* fls

We implement logical negation by flipping arguments:

not = λx.λy.λz.x z y

Exercise: verify that not tru = fls and not fls = tru.
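The Church Booleans translate directly into any language with first-class functions. A Python sketch (the decode helper, which reads a Church Boolean back as a native bool, is our own addition):

```python
# Church Booleans as selector functions
tru = lambda x: lambda y: x   # tru = λx.λy.x
fls = lambda x: lambda y: y   # fls = λx.λy.y

and_ = lambda x: lambda y: x(y)(fls)           # and = λx.λy.x y fls
or_  = lambda x: lambda y: x(tru)(y)           # or  = λx.λy.x tru y
not_ = lambda x: lambda y: lambda z: x(z)(y)   # not = λx.λy.λz.x z y

decode = lambda b: b(True)(False)  # read back as a Python bool

print(decode(and_(tru)(fls)))      # False
print(decode(or_(fls)(tru)))       # True
print(decode(not_(tru)))           # False (i.e. not tru behaves as fls)
```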

Church numerals are formed by the abstractions

0 = λs.λz.z
1 = λs.λz.s z
2 = λs.λz.s (s z)
3 = λs.λz.s (s (s z))
4 = λs.λz.s (s (s (s z)))
n = λs.λz.s[n] z


where the n-fold application s[n] z of s to z is defined by:

s[0] z = z

s[n+1] z = s[n] (s z)

Thus, for any numeral n we have the identity n s z = s[n] z, which leads to the following definitions for zero, the successor, and is-zero:

zero = λs.λz.z

succ = λn.λs.λz.s (n s z)

iszo = λn.n (λs.fls) tru

We verify that

succ n →β λs.λz.s (n s z) = λs.λz.s (s[n] z) = λs.λz.s[n+1] z = n+1

iszo zero = zero (λs.fls) tru = (λs.λz.z) (λs.fls) tru →β* tru

iszo (n+1) = (n+1) (λs.fls) tru = (λs.λz.s[n+1] z) (λs.fls) tru →β* (λs.fls)[n+1] tru = (λs.fls) ((λs.fls)[n] tru) →β* fls

The addition function uses the identity s[n+m] z = s[n] (s[m] z):

plus = λn.λm.λs.λz.n s (m s z)

and multiplication uses the identity s[n*m] z = (s[m])[n] z:

mult = λn.λm.λs.λz.n (m s) z
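These numeral operations can be checked directly in Python, reading a Church numeral back as an integer by applying it to the native successor (the to_int helper is our own addition):

```python
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))               # succ = λn.λs.λz.s (n s z)
plus = lambda n: lambda m: lambda s: lambda z: n(s)(m(s)(z))  # plus = λn.λm.λs.λz.n s (m s z)
mult = lambda n: lambda m: lambda s: lambda z: n(m(s))(z)     # mult = λn.λm.λs.λz.n (m s) z

to_int = lambda n: n(lambda k: k + 1)(0)  # apply n to +1 starting at 0

one = succ(zero)
two = succ(one)
three = succ(two)
print(to_int(plus(two)(three)))  # 5
print(to_int(mult(two)(three)))  # 6
```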

Church pairs form the basic structure to create aggregate data structures such as lists. The idea is to use Booleans tru and fls as selectors to get the head or tail value of a pair. That is, a pair is a function that takes a Boolean b:

pair[h,t] = λb.b h t

such that

pair[h,t] tru →β* h

pair[h,t] fls →β* t

This leads to the following definitions to construct a pair and decompose a pair into a head and tail:

pair = λh.λt.λb.b h t

head = λp.p tru

tail = λp.p fls

The empty list null and is-empty test are functions:

null = λx.tru

empt = λx.x (λy.λz.fls)

such that

empt null = (λx.x (λy.λz.fls)) (λx.tru) →β (λx.tru) (λy.λz.fls) →β tru

empt (pair h t) = (λx.x (λy.λz.fls)) (λb.b h t) →β (λb.b h t) (λy.λz.fls) →β (λy.λz.fls) h t →β (λz.fls) t →β fls
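Pairs and the empty-list test carry over to Python the same way; this sketch reuses the Church Booleans (decode is again our own read-back helper):

```python
tru = lambda x: lambda y: x
fls = lambda x: lambda y: y
decode = lambda b: b(True)(False)

pair = lambda h: lambda t: lambda b: b(h)(t)   # pair = λh.λt.λb.b h t
head = lambda p: p(tru)                        # head = λp.p tru
tail = lambda p: p(fls)                        # tail = λp.p fls

null = lambda x: tru                           # null = λx.tru
empt = lambda x: x(lambda y: lambda z: fls)    # empt = λx.x (λy.λz.fls)

lst = pair(1)(pair(2)(null))   # the list [1, 2] as nested pairs
print(head(lst))               # 1
print(head(tail(lst)))         # 2
print(decode(empt(null)))      # True
print(decode(empt(lst)))       # False
```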

An implementation of Church encodings:

?- [church].
?- noshow, aor, plus:zero:(succ:zero) *> C.
C = (\s.(\z.s:z)).
?- noshow, aor, mult:(succ:zero):(succ:(succ:zero)) *> C.
C = (\s.(\z.s: (s:z))).

The Fixpoint Y-Combinator

Recursion in the pure lambda calculus is achieved with an operation that is divergent, meaning its evaluation never terminates. Consider the Ω-combinator (a combinator is a closed lambda expression)

Ω = (λx.x x) (λx.x x)

that never terminates

(λx.x x) (λx.x x) →β (λx.x x) (λx.x x) →β ...

We would like to harness this ability of infinite repetition to implement recursion. Let's consider the recursive function

fac n = if n=0 then 1 else n*fac (n-1)

which can be written in pure form

fac = λn.(iszo n) 1 (mult n (fac (pred n)))

By abstracting the function name away, we obtain

λf.λn.(iszo n) 1 (mult n (f (pred n)))

The only thing we have to do now is to bind variable f to the lambda abstraction's body itself, so that f represents the recursive function:

(λn.(iszo n) 1 (mult n (f (pred n))))

The fixpoint Y-combinator can be used to achieve this. The Y-combinator (often written 'fix') is a replicator that satisfies:

Y F = F (Y F)

Take for example

F = λf.λn.(iszo n) 1 (mult n (f (pred n)))

then we can define

fac = Y F


fac = Y F = F (Y F) = (λf.λn.(iszo n) 1 (mult n (f (pred n)))) (Y F) →β λn.(iszo n) 1 (mult n ((Y F) (pred n))) = λn.(iszo n) 1 (mult n (fac (pred n)))

which is what we wanted.

Recall that the Ω-combinator replicates itself. A generalization of this idea is the fixpoint Y-combinator, which in lambda form is:

Y = λf.(λx.f (λy.x x y)) (λx.f (λy.x x y))

By eta reduction, defined by the rule

λx.E x →η E    if x ∉ FV[E]

we find the alternative equivalent form

Y = λf.(λx.f (x x)) (λx.f (x x))

With this form we immediately see that

Ω = Y I

with identity combinator

I = λx.x

However, we do not use the alternative form of Y with AOR, because Y F diverges for any F due to strict evaluation under AOR.
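Python is strict, so it illustrates the point: the eta-expanded form of Y works, while the plain form loops. A sketch using native numbers and arithmetic for the factorial body (a simplification of the Church-encoded version):

```python
# Eta-expanded Y-combinator: Y = λf.(λx.f (λy.x x y)) (λx.f (λy.x x y))
Y = lambda f: (lambda x: f(lambda y: x(x)(y)))(lambda x: f(lambda y: x(x)(y)))

# F = λf.λn. if n = 0 then 1 else n * f (n - 1), with native arithmetic
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

fac = Y(F)
print(fac(5))  # 120

# The plain form λf.(λx.f (x x)) (λx.f (x x)) would evaluate x(x)
# eagerly and recurse forever before f is ever applied.
```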

We verify that Y is indeed a Y-combinator:

Y F = (λf.(λx.f (x x)) (λx.f (x x))) F →β (λx.F (x x)) (λx.F (x x)) →β F ((λx.F (x x)) (λx.F (x x))) = F (Y F)

Assuming lazy evaluation by call-by-name or call-by-need, we can define

fix = λF.F (fix F)

which satisfies the fixpoint Y-combinator property fix F = F (fix F).

With lazy evaluation this definition works, because the recursive invocation will only be evaluated when F uses its operand (fix F).

?- [delta].
?- noshow, by_name, fix:(\f.(\n.if:(eq:n:0):1:(mul:n:(f:(sub:n:1))))):3 *> C.
C = 6.
?- [church].
?- noshow, by_name, fix:(\f.(\n.(iszo:n):(succ:zero):(mult:n:(f:(pred:n))))):(succ:(succ:(succ:zero))) *> C.
C = (\s.(\z.s: (s: (s: (s: (s: (s:z))))))).

Special forms are needed for AOR strategies such as call-by-value. Special forms are non-strict in certain operands.

When implementing the fixpoint Y-combinator for evaluation with call-by-value (AOR), we cannot use the definition of fix above, because fix F diverges for any F. We can use

fix = λf.λx.f (fix f) x

which is equivalent to the previous definition by eta reduction:

fix = λf.λx.f (fix f) x →η λf.f (fix f).

Eta reduction generalizes the principle of Currying, meaning that any partial application of a k-ary function f applied to k-1 arguments satisfies

f E1 E2 ... Ek-1 = (λx.f E1 E2 ... Ek-1 x)

For example:

?- [eta].
?- eta((\x.add:1:x):2, E).
E = add:1:2.
?- eta((\x.(\y.add:y):x):1:2, E).
E = add:1:2.

Exercise: Show that the correctness of eta reduction follows from beta reduction.

Proof: Let E be an expression such that x ∉ FV[E]. Then (λx.E x) a →β E a for any a. So λx.E x →η E is valid under beta reduction. ∎


Another special form is needed for conditionals under strict evaluation strategies such as call-by-value and AOR. That is, the Church Boolean conditional test[condexpr, thenexpr, elseexpr] evaluates all three operands under call-by-value. We extend the lambda expression syntax with the if/3 construct and adopt the operational semantics of NB expressions:

if(true, T2, T3) → T2    (E-IfTrue)

if(false, T2, T3) → T3    (E-IfFalse)

T1 → T'1
--------------------------------    (E-If)
if(T1, T2, T3) → if(T'1, T2, T3)

We add the if/3 construct and tuples to the lambda calculus operational semantics in Prolog:

E ::= v | x | \x.E | E:E | if(E,E,E) | (E,E)

Values v include the atoms true and false.

We add the following operational semantics (delta rules) to define several "built-in" functions:

%________CONCLUSION_________ :- ________RULE_______, __PREMISES__.
add:N:M       :> K           :- rule('E-Add'),       number(N), number(M), K is N+M.
add:T:S       :> add:R:S     :- rule('E-Add1'),      T :> R.
add:T:S       :> add:T:R     :- rule('E-Add2'),      S :> R.
sub:N:M       :> K           :- rule('E-Sub'),       number(N), number(M), K is N-M.
sub:T:S       :> sub:R:S     :- rule('E-Sub1'),      T :> R.
sub:T:S       :> sub:T:R     :- rule('E-Sub2'),      S :> R.
mul:N:M       :> K           :- rule('E-Mul'),       number(N), number(M), K is N*M.
mul:T:S       :> mul:R:S     :- rule('E-Mul1'),      T :> R.
mul:T:S       :> mul:T:R     :- rule('E-Mul2'),      S :> R.
div:N:M       :> K           :- rule('E-Div'),       number(N), number(M), K is N/M.
div:T:S       :> div:R:S     :- rule('E-Div1'),      T :> R.
div:T:S       :> div:T:R     :- rule('E-Div2'),      S :> R.
eq:T:T        :> true        :- rule('E-EqTrue').
eq:T:S        :> eq:R:S      :- rule('E-Eq1'),       T :> R.
eq:T:S        :> eq:T:R      :- rule('E-Eq2'),       S :> R.
eq:_:_        :> false       :- rule('E-EqFalse').
and:true:true :> true        :- rule('E-AndTrue').
and:false:_   :> false       :- rule('E-AndFalse').
and:T:S       :> and:R:S     :- rule('E-And1'),      T :> R.
and:T:S       :> and:T:R     :- rule('E-And2'),      S :> R.
if:true:T:_   :> T           :- rule('E-CondTrue').
if:false:_:T  :> T           :- rule('E-CondFalse').
if:T:P:Q      :> if:R:P:Q    :- rule('E-Cond'),      T :> R.
fix:F:T       :> F:(fix:F):T :- rule('E-Fix').
if(true,T,_)  :> T           :- rule('E-IfTrue').
if(false,_,T) :> T           :- rule('E-IfFalse').
if(T,P,Q)     :> if(R,P,Q)   :- rule('E-If'),        T :> R.
pair:T:S      :> (T,S)       :- rule('E-Pair').
fst:(T,_)     :> T           :- rule('E-FirstPair').
fst:T         :> fst:R       :- rule('E-First'),     T :> R.
snd:(_,T)     :> T           :- rule('E-SecondPair').
snd:T         :> snd:R       :- rule('E-Second'),    T :> R.

The addition of the if/3 special form allows its evaluation rules to deviate from the evaluation rules of the lambda calculus. This ensures we can safely use conditionals with AOR strategies (call-by-value). Without the special form, all operands of a conditional are evaluated, leading to potential errors. For example:

?- noshow, by_value, if:true:1:(div:1:0) *> C.

gives a division-by-zero error, because (div:1:0) is always evaluated under call-by-value as an operand to the if function. By contrast, the special form if/3:

?- noshow, by_value, if(true,1,(div:1:0)) *> C.

produces the result '1' without error, since we defined specific evaluation rules (E-IfTrue), (E-IfFalse), and (E-If) exactly as in our NB expression language operational semantics.

With non-strict evaluation strategies such as call-by-name, the if function can be used without the need for an if/3 special form:

?- noshow, by_name, if:true:1:(div:1:0) *> C.
C = 1.
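The same issue arises in any strict language: a user-defined if-function evaluates all of its operands, while the built-in conditional is a special form. A Python illustration (if_fn and if_lazy are our own names):

```python
# An ordinary function: all operands are evaluated before the call
def if_fn(c, t, e):
    return t if c else e

# if_fn(True, 1, 1 / 0) raises ZeroDivisionError: the else-operand
# is evaluated even though it is never selected.

# Python's conditional expression is a special form and is safe:
print(1 if True else 1 / 0)  # 1

# Thunking the branches recovers special-form behavior with functions:
def if_lazy(c, t, e):
    return t() if c else e()

print(if_lazy(True, lambda: 1, lambda: 1 / 0))  # 1
```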

Lazy Datastructures

Because pairs (E,E) have no evaluation rules, pairs form lazy data structures. This may appear rather strange at first. It means that pairs can be placeholders for unevaluated expressions. Once the first or second expression in the pair is obtained with fst or snd, respectively, the expression is evaluated by the current evaluation strategy.

In NOR strategies (call-by-name and call-by-need) operands are only evaluated when needed, which demonstrates the laziness of the tuple constructor pair:

?- noshow, by_name, fst:(pair:1:(div:1:0)) *> C.
C = 1 .

which produces the result without error, due to the fact that the tuple lazily stores the operands 1 and (div:1:0), as shown here in more detail:

?- show, by_name, fst:(pair:1:(div:1:0)) *> C.
   /            \            
 fst        _____:___        
           /         \       
          :_          :___   
         /  \        /    \  
      pair   1      :_     0 
                   /  \      
                 div   1     

      pair:1: ((div):1:0):> (1, (div):1:0)

    pair:1: ((div):1:0):> (1, (div):1:0)
    fst: (pair:1: ((div):1:0)):>fst: (1, (div):1:0)

   /         \         
 fst     _____,        
        /      \       
       1        :___   
               /    \  
              :_     0 
             /  \      
           div   1     

    fst: (1, (div):1:0):>1

C = 1.

Note that the operands T and S to pair:T:S are not evaluated with NOR strategies and simply copied into the (T,S) tuple result.

Call-by-need with lazy data structure constructors allows the construction of "infinite" data structures. For example, we can define from and take functions:

from :> \n.pair:n:(from:(add:n:1)).
take :> \n.(\xs.if(eq:n:0,[],if(eq:xs:[],[],pair:(fst:xs):(take:(sub:n:1):(snd:xs))))).

and observe that the infinite list (by tupling) generated by from is never created unless needed:

?- noshow, by_name, fst:(from:1) *> C.
C = 1.

Because tuples are lazy data structures, their content is never evaluated unless we explicitly fetch the first or second element by fst or snd:

?- noshow, by_name, take:2:(from:1) *> C1, fst:C1 *> C2, snd:C1 *> C3, fst:C3 *> C4.
C1 = (fst: (from:1), take: (sub:2:1): (snd: (from:1))),
C2 = 1,
C3 = (fst: (snd: (from:1)), take: (sub: (sub:2:1):1): (snd: (snd: (from:1)))),
C4 = 2.

Note the duplication of the snd:(from:1) subexpressions in C3, which is avoided in the call-by-need strategy by sharing.
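The from/take pattern can be mimicked in a strict language by making the tail of each pair a thunk, so the rest of the list stays unevaluated until demanded. A Python sketch with our own helper names (from_, take, to_list), not the course's Prolog code:

```python
# Lazy lists: a pair of a head and a *thunked* tail; None is the empty list
def from_(n):
    return (n, lambda: from_(n + 1))        # infinite list n, n+1, ...

def take(n, xs):
    if n == 0 or xs is None:
        return None
    h, t = xs
    return (h, lambda: take(n - 1, t()))    # tail stays suspended

def to_list(xs):                            # force a finite lazy list
    out = []
    while xs is not None:
        h, t = xs
        out.append(h)
        xs = t()
    return out

print(to_list(take(3, from_(1))))  # [1, 2, 3]
```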

Delayed Execution in Call-by-Value

The call-by-value strategy does not apply (E-Abs) to reduce lambda expressions, meaning that we never evaluate inside abstractions. This allows us to define "thunks", "suspensions", or "delayed forms" to simulate call-by-name and lazy evaluation in call-by-value strategies. That is, we can "stuff" an expression E in an abstraction λx.E using a dummy variable x. We force evaluation of the delayed E later by applying the abstraction to a dummy value. We define the delay and force special forms as macros:

macro delay(E) = (λx.E)
macro force(D) = (D nil)
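Note that in a strict host language delay cannot be an ordinary function, since its argument would be evaluated at the call site; it must be a macro, i.e. the abstraction is written out by hand. In Python the delayed form is written literally as a lambda (a sketch with our own names):

```python
def force(d):
    return d()             # force(D): apply the delayed form

# delay(E) must be written out as a lambda at the call site, like the macro:
d = lambda: 1 / 0          # the division is *not* performed here
safe = lambda: 6 * 7

print(force(safe))         # 42
# force(d) would raise ZeroDivisionError only when the thunk is forced
```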

Recall that a similar approach was used in the C++0x example to pass an expression by name to an integration function by stuffing it into a lambda function:

double integrate(function<double (double)> f, double &x, double a, double b, double h)
{ double sum = 0;
  for (x = a; x <= b; x += h)
    sum += f();
  return sum;
}

double x;
double y = 1.0;
double z = integrate([&]() -> double { return 2*x + x*x + y; }, x, 0.0, 10.0, 0.5);

Barendregt's Variable Convention

Suppose we construct lambda expressions such that each bound variable has a unique name and all free variables are unique. This property is referred to as Barendregt's variable convention (BVC).

When we use BVC for lambda abstractions with Prolog variables we ensure that a Prolog variable is bound to exactly one abstraction. With BVC we can implement beta reduction efficiently by instantiating Prolog variables in unit time rather than relying on substitution that takes O(n) time for terms of size n.

The following illustrates this idea by reducing (λx.mul x x) 2 →β mul 2 2. We match (X.mul:X:X):2 = F:T, the application of a function F to an operand expression T, where F = (X.S) matches an abstraction X.S with body S = mul:X:X; we then set variable X to the operand T to obtain S = mul:2:2:

?- (X.mul:X:X):2 = F:T, F = (X.S), X = T.
F = (2.mul:2:2),
T = X, X = 2,
S = mul:2:2.

which suggests a simple operational semantics rule for beta reduction in Prolog:

(X.S):T :> S :- rule('E-AbsApp'), X = T.

or we can write this as:

(T.S):T :> S :- rule('E-AbsApp').

To see why BVC is essential, consider:

?- (X.X:2):(X.add:1:X) :> C.

which fails, because we attempt the unification X = (X.add:1:X).

Unfortunately, we cannot automatically maintain BVC:

?- (X.X:(X:2)):(Y.add:1:Y) :> C.
C = (Y.add:1:Y):((Y.add:1:Y):2).

To maintain the BVC property we need alpha renaming.

The copy_term/2 predicate copies a term to a new term with the same structure but with fresh new Prolog variables. That is, variables are renamed. The copy_term/2 predicate performs the same copying operation when a Prolog rule is copied from the program database of rules and instantiated. When we add the new rule:

%_CONCLUS_ :- _________RULE_________, __PREMISES__.                                 
F:T :> S :- rule('E-AbsApp'), copy_term(F, (T.S)). ...

the rule above is a simplified version of the slightly more elaborate but more transparent implementation using explicit unification for the substitution:

(X.S):T :> R :- rule('E-AbsApp'), copy_term((X.S), (Y.R)), Y = T.

and we get step by step:

?- [delta].
?- noshow, by_name, (X.X:(X:2)):(Y.add:1:Y) :> C1, C1 :> C2, C2 :> C3, C3 :> C4, C4 :> C5.
C1 = (Y.add:1:Y): ((Y.add:1:Y):2),
C2 = add:1: ((Y.add:1:Y):2),
C3 = add:1: (add:1:2),
C4 = add:1:3,
C5 = 4 .

as desired. This approach works with call-by-name, call-by-need, and call-by-value strategies, which do not apply (E-Abs).

Exercise: With (E-Abs) enabled, what can go wrong when evaluating lambda expressions in Prolog with Prolog variables in BVC using copy_term/2?

To implement call-by-need, we only have to change the new rule (E-AbsApp) to include a shared form as follows:

F:T :> S   :- rule('E-AbsApp'),       copy_term(F, ((T~_).S)).

Pure Lambda Calculus Embedded in Pure Prolog

Lambda calculus can be trivially embedded in pure Prolog restricted to Horn clauses.

We can avoid the non-pure copy_term/2 and implement lambda calculus with call-by-name, call-by-need, and call-by-value strategies in pure Prolog by converting each abstraction (X.S) in a lambda expression to a new predicate lam(i, X, S) and adopt a new lambda application rule:

lam(I):T :> S :- rule('E-AbsApp'), lam(I, T, S).    

For example, (X.X:(X:2)):(Y.add:1:Y) is converted to the term lam(1):lam(2) with indexed lam/1 terms that refer to the lam/3 Prolog facts:

lam(1, X, X:(X:2)).
lam(2, X, add:1:X).

Execution proceeds by instantiation of the indexed lam/3 rules:

?- [delta].
?- noshow, by_value, lam(1):lam(2) *> C.
  / \  
lam lam
 |   | 
 1   2 
  /   \    
lam    :   
 |    / \  
 2  lam  2 
     /      \    
    :_       :   
   /  \     / \  
 add   1  lam  2 
     /         \       
    :_          :___   
   /  \        /    \  
 add   1      :_     2 
             /  \      
           add   1     
     /    \  
    :_     3 
   /  \      
 add   1     

C = 4.

This idea works in principle, but in general we need to make a small change to allow variables of outer scope to occur inside abstractions to create closures. Thus, we need an environment to carry along variable bindings into the closure. We can use a list of Prolog variables. For example, (X.Y.X) is translated to lam(1, []) where

lam(1, [], X, lam(2,[X])).
lam(2, [X], Y, X).

Note that the second lambda abstraction has a free variable X that forms the body of the inner abstraction Y.X of the nested abstractions X.Y.X.

We modify rule (E-AbsApp) to include the environment as a list of free variables Xs:

lam(I,Xs):T :> S :- rule('E-AbsApp'), lam(I, Xs, T, S).    

The natural semantics formulation of the call-by-name evaluation strategy consists of just three rules:

:- op(950, xfx, =>).	% big-step evaluation relation
val(A)    => val(A).
lam(I,Xs) => lam(I,Xs).
app(F,T) => C :- F => lam(I,Xs), lam(I, Xs, T, S), S => C.

where terms are pre-compiled to pure Prolog terms and lam/4 facts with the ~> relation defined by:

:- op(950, xfx, ~>).	% compilation relation
V   ~> (val(V),[])    :- atomic(V).
X   ~> (X,[X])        :- var(X), !.
X.T ~> (lam(I,Xs),Xs) :- T ~> (S,Ys), ord_del_element(Ys, X, Xs), flag(lam, I, I+1), assertz(lam(I, Xs, X, S)).
F:T ~> (app(G,S),Zs)  :- F ~> (G,Xs), T ~> (S,Ys), ord_union(Xs, Ys, Zs).

We use the compilation relation E ~> (C,B) to compile E to C with list of free variables B, then evaluate C to a value V:

?- [pure].
?- (X.Y.X):2:3 ~> (C,B), C => V.
C = app(app(lam(1, []), val(2)), val(3)),
B = [],
V = val(2) .

An additional advantage is that the compilation automatically implements local variable bindings by the fact that variable bindings are part of the lam/4 facts. Thus, variables are no longer universally quantified after compilation and we do not have to start in BVC form. For example, (X.X:(X:2)):(X.add:1:X) now compiles and evaluates correctly even though both abstractions reuse the variable X.
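The closure idea (an indexed piece of code plus an environment, instead of substitution) can be sketched in Python with dictionaries as environments. This sketch uses call-by-value for brevity, whereas the Prolog version above is call-by-name; the term encoding and evaluate function are our own illustration:

```python
# Terms: ('val', v) | ('var', x) | ('lam', x, body) | ('app', f, a)
def evaluate(t, env):
    kind = t[0]
    if kind == 'val':
        return t[1]
    if kind == 'var':
        return env[t[1]]                   # look the variable up in the environment
    if kind == 'lam':
        return ('clo', t[1], t[2], env)    # a closure captures its environment
    if kind == 'app':                      # call-by-value application
        _, x, body, fenv = evaluate(t[1], env)
        a = evaluate(t[2], env)
        return evaluate(body, {**fenv, x: a})  # extend env; no substitution

# (X.Y.X):2:3 evaluates to 2, as in the compiled Prolog example
K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
t = ('app', ('app', K, ('val', 2)), ('val', 3))
print(evaluate(t, {}))  # 2
```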

MiniMu: a Minimalistic Functional Programming Language

The pure lambda calculus is sufficiently powerful to express any calculation. But it would be syntactically awkward to use. Let's add some syntactic sugar to the lambda calculus notation and define convenient syntactic constructs for our "MiniMu" programming language, consisting of expressions E and function definitions D:

E ::= v
    | x
    | \x.E
    | E:E
    | if(E,E,E)
    | (E,E)
    | f(E,...,E)
    | E⊕E
    | ⊕E
    | E where x = E
    | E where f(x,...,x) = E
D ::= def f := E.
    | def f(x,...,x) := E.
    | def x⊕x := E.
    | def ⊕x := E.

where values v are atoms and constants, x denotes a variable name, f is a function name, and ⊕ is an infix operator.

The syntax of expressions E for the last five grammar productions is converted to lambda expressions by the following rules:

f(E1,E2,...,Ek) ⇒ f:E1:E2:...:Ek

E1⊕E2 ⇒ (⊕):E1:E2

⊕E ⇒ (⊕):E

E1 where x = E2 ⇒ (\x.E1):E2

E1 where f(x1,x2,...,xk) = E2 ⇒ (\f.E1):(\x1.(\x2.(...(\xk.E2))))

and definitions D are normalized by the rules:

def f(x1,x2,...,xk) := E ⇒ f := (\x1.(\x2.(...(\xk.E)))).

def x1⊕x2 := E ⇒ (⊕) := (\x1.(\x2.E)).

def ⊕x := E ⇒ (⊕) := (\x.E).

These rules suffice to reduce the syntax down to a normalized form consisting only of lambda expressions and definitions of constants (where functions are constants).

To implement the common list functions we saw earlier, we define:

% List operators
:- op(500, yfx, ++).	% Append
:- op(700, xfx, ..).	% Range

def length(xs) :=
		if(xs=[], 0, 1+length(tl(xs))).

def xs++ys :=
		if(xs=[],
			ys,
			cons(hd(xs), tl(xs)++ys)).

def map(f, xs) :=
		if(xs=[],
			[],
			cons(f(hd(xs)), map(f, tl(xs)))).

def foldl(f, x, xs) :=
		if(xs=[],
			x,
			foldl(f, f(x, hd(xs)), tl(xs))).

def foldr(f, x, xs) :=
		if(xs=[],
			x,
			f(hd(xs), foldr(f, x, tl(xs)))).

def filter(p, xs) :=
		if(xs=[],
			[],
			if(p(hd(xs)),
			cons(hd(xs), filter(p, tl(xs))),
			filter(p, tl(xs)))).

def takeWhile(p, xs) :=
		if(xs=[],
			[],
			if(p(x),
			cons(x, takeWhile(p, tl(xs))),
			[])
		) where x = hd(xs).

def dropWhile(p, xs) :=
		if(xs=[],
			[],
			if(p(hd(xs)),
			dropWhile(p, tl(xs)),
			xs)).

def take(n, xs) :=
		if(n=0,
			[],
			if(xs=[],
			[],
			cons(hd(xs), take(n-1, tl(xs))))).

def drop(n, xs) :=
		if(n=0,
			xs,
			if(xs=[],
			[],
			drop(n-1, tl(xs)))).

def zipWith(f, xs, ys) :=
		if(xs=[],
			[],
			if(ys=[],
			[],
			cons(f(hd(xs),hd(ys)), zipWith(f, tl(xs), tl(ys))))).

def zip(xs, ys) := zipWith(pair, xs, ys).

def nth(n, a, xs) :=
		if(xs=[],
			a,
			if(n=0,
			hd(xs),
			nth(n-1, a, tl(xs)))).

def concat := foldr(++, []).

def a..b := if(b<a, [], cons(a, a+1..b)).

def from(n) := cons(n, from(n+1)).

A read-eval-print loop that performs the syntax translation and evaluation of expressions using the call-by-need strategy:

?- [minimu,list].
?- loop.
> 1+2*3.
> def sum := foldr(+, 0).
> sum(1..10).
> foldr(*, 1, 1..10).
> def fac(n) := if(n=0, 1, n*fac(n-1)).
> fac(10).
> map(sqr, 1..10) where sqr(n) = n*n.
                      /                                            \                         
         ____________:______                            ____________:_________               
        /                   \                          /                      \              
   ____.           __________:                  ______:_             __________:             
  /     \         /           \                /        \           /           \            
 \       :__    hd     ________.             map     ____.        tl     ________.           
 |      /   \         /         \                   /     \             /         \          
 x     :     x       1           :______           \       :__         1           :______   
      / \                       /       \          |      /   \                   /       \  
     *   x                 ____:        10         y     :     y             ____:        10 
                          /     \                       / \                 /     \          
                        ..       :__                   *   y              ..       :__       
                                /   \                                             /   \      
                               :     1                                           :     1     
                              / \                                               / \          
                             +   1                                             +   1

Note that lists are lazy, so the last evaluation produces a partially evaluated list of the form head.tail.


- End.