Dynamic Programming

We can introduce dynamic programming where we left off in Divide & Conquer, illustrating with the Fibonacci recursion. A Fibonacci calculator can be implemented just by translating the recurrence into a recursive function:

unsigned long fibrec(unsigned n)
{
  if (n == 0) return 0;
  if (n == 1) return 1;
  return (fibrec(n - 1) + fibrec(n - 2));
}

It is a simple matter to code up a driver calling this function, so we can submit and go home, right? If we do that, when our developer colleagues try to use it they find it extremely slow and return the code to us asking for some optimization.

Exercise 1. Code a driver for fibrec and experiment.

The problem, it turns out, is that fibrec(n) requires fibrec(n-1) and fibrec(n-2), while fibrec(n-1) requires fibrec(n-2) and fibrec(n-3) - so already at the first layer of recursive calls fibrec(n-2) is calculated (recursively) twice. Continuing, we see the following tree of calls to fibrec (showing only the argument):

                                           (n)
                     (n-1)                                     (n-2)
           (n-2)             (n-3)                   (n-3)             (n-4)
      (n-3)     (n-4)   (n-4)     (n-5)         (n-4)     (n-5)   (n-5)     (n-6)
       / \       / \     / \       / \           / \       / \     / \       / \
      .   .     .   .   .   .     .   .         .   .     .   .   .   .     .   .

The recursive call order is a pre-order traversal of the call tree, so, for example, when fibrec(n-3) requires fibrec(n-4) that value is not on the call stack and hence there is no alternative but to calculate the value. Each node of this (essentially complete) binary tree of calls must be evaluated to get fibrec(n), with the recursion stopping only when a base case is reached at the bottom of the tree. The call tree has on the order of 2^n nodes, and therefore:

Proposition. The runtime of fibrec(n) is Θ(2^n).

Exponential runtime is not acceptable - the runtime doubles when the input increases by 1.

1 Memoization

The culprit in the horrible runtime is the repeated re-calculation of values. A possible workaround is the memo pad: when a value is calculated, write it on a memo pad, and when that value is needed again, look it up instead of re-calculating it. Here is sample code for such memoization of fibrec:

// 1. Create a non-recursive executor function that creates a memo pad
typedef fsu::HashTable<unsigned, unsigned long> AA; // memo type = associative array
unsigned long fib_memo_AA(unsigned n);           // executor
unsigned long fib_memo_AA(unsigned n, AA& memo); // recursive call

unsigned long fib_memo_AA(unsigned n)
{
  unsigned b = 2;
  if (n > 2) b = n;
  AA memo(b);                 // <-- create memo pad
  memo[0] = 0;                // <-- supply base case values (optional)
  memo[1] = 1;                // <-- supply base case values (optional)
  return fib_memo_AA(n,memo); // <-- call recursive version, passing memo by reference
}

// 2. Implement the old "recursive" function with lookup added, passing memo by reference
unsigned long fib_memo_AA(unsigned n, AA& memo)
{
  unsigned long f;
  if (memo.Retrieve(n,f)) // memo lookup, includes base cases for recursion
  {
    return f;             // if successful lookup we're done, otherwise ...
  }
  f = fib_memo_AA(n-1,memo) + fib_memo_AA(n-2,memo); // ... calculate ...
  memo[n] = f;            // ... and store for future lookup ...
  return f;               // ... before returning value
}

Lemma. If n > 1, the call fib_memo_AA(n) triggers 2n - 1 calls to the recursive version fib_memo_AA(*,memo): n - 1 calls result in calculation and store, and n calls result in memo lookup.

Proof. The memo eventually stores all n + 1 values f_0, f_1, f_2, ... , f_n. Because the first two are stored by the executor function, the remaining n - 1 are stored by calls to the recursive function. All other calls must result in memo lookup. We count the recursive calls using mathematical induction. For the base case n = 2: Let r = 0 count the recursive calls. The executor calls fib_memo_AA(2,memo) (r = 1); the lookup fails, so fib_memo_AA(1,memo) is called (r = 2) and succeeds by lookup, and then fib_memo_AA(0,memo) is called (r = 3) and also succeeds by lookup.
This gives r = 3 = 2×2 - 1 as required. For the inductive step, suppose the result is true for all numbers less than n. Then the call fib_memo_AA(n,memo) itself results in a calculate-and-store (one call), its first recursive call fib_memo_AA(n-1,memo) triggers 2(n-1) - 1 = 2n - 3 calls by the inductive hypothesis, and after those calls complete the value for argument n-2 is already in the memo, so the second recursive call fib_memo_AA(n-2,memo) results in a single memo lookup.
This yields a total of 2 additional recursive calls, or 2n - 3 + 2 = 2n - 1. ∎

Proposition. The runtime of fib_memo_AA(n) is Θ(n).

Proof. By the Lemma, there are 2n - 1 recursive calls, n - 1 calculate/store operations and n table retrieve operations. Thus the cost is

  T(n) = (2n - 1)×(cost of a call) + (n - 1)×(cost of a store) + n×(cost of a retrieve).
The cost of each table operation is O(1), due to the way the table parameters (number of buckets) are set up in the executor. Therefore

  T(n) = (2n - 1)×O(1) + (n - 1)×O(1) + n×O(1) = Θ(n),
as required. ∎

Thus, for example, the runtime to calculate Fib(30) drops from more than 1,000,000,000 to less than 100 - a reduction by a factor of more than 10,000,000. The team will be pleased!

1.1 Time/Space tradeoff

The exponential runtime of Fib(n) has been reduced from Θ(2^n) to Θ(n) at the cost of an additional +Θ(n) space usage. (This is on top of the space required for the runtime stack to push the recursive calls.) Clearly any use of the memo technique on another algorithm will have a similar effect. If the runtime is reduced in degree this is almost always a reasonable tradeoff. (In some special circumstances this extra space requirement can also be reduced, as we shall see for Fib.)

1.2 Bottom-Up Re-Write

It is often the case that after memoization of an algorithm an iterative bottom-up version can be constructed with three nice properties:

1. The recursion - and hence the runtime stack overhead - is eliminated.
2. The memo structure is simplified, often to a plain vector or a few variables.
3. The runtime and runspace analysis becomes completely transparent.
Continuing with the Fibonacci example, we can replace the somewhat elaborate hash table implementation of memo with a simple vector, even in the top-down version:

// 1. Create a non-recursive executor function that creates a memo pad
unsigned long fib_memo_Vector(unsigned n)
{
  fsu::Vector<unsigned long> memo(1+n,0); // all elements initialized to 0
  if (n > 0) memo[1] = 1;
  return fib_memo_Vector(n,memo);
}

// 2. Implement the old "recursive" function with lookup added, passing memo by reference
unsigned long fib_memo_Vector(unsigned n, fsu::Vector<unsigned long>& memo)
{
  // base cases already in memo
  if (n < 2) return memo[n]; // handle separately because memo[0] == 0
  unsigned long f;
  if (memo[n] != 0) // any updated element is > 0, except memo[0]
  {
    f = memo[n];
  }
  else
  {
    f = fib_memo_Vector(n-1,memo) + fib_memo_Vector(n-2,memo);
    memo[n] = f;
  }
  return f;
}

The bottom-up version just builds the array in a loop:

unsigned long fib_bu_Vector(unsigned n)
{
  if (n == 0) return 0;
  if (n == 1) return 1;
  // can use an ordinary array for the ladder of values f_0 .. f_n:
  fsu::Vector<unsigned long> memo(1+n, 0);
  memo[1] = 1;
  for (unsigned i = 2; i <= n; ++i)
    memo[i] = memo[i-1] + memo[i-2];
  return memo[n];
}

We could export the vector and have a very fast calculator of f_0, f_1, f_2, ... , f_n. Note also that the runtime stack space is reduced to +Θ(1) and the runtime analysis is completely straightforward: the loop body executes n - 1 times and does O(1) work per iteration, so the runtime is Θ(n) (with the +Θ(n) space of the vector).
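For experimenting along the lines of Exercise 1, a minimal driver might look like the sketch below. This is an illustration only: the file name and the timing via clock() are illustrative choices, not part of the course library.

// fib_drive.cpp - hypothetical driver sketch for comparing the calculators
#include <iostream>
#include <cstdlib>
#include <ctime>

// prototypes for the calculators defined earlier in these notes
unsigned long fibrec(unsigned n);
unsigned long fib_bu_Vector(unsigned n);

int main(int argc, char* argv[])
{
  if (argc < 2)
  {
    std::cout << "usage: " << argv[0] << " n\n";
    return EXIT_FAILURE;
  }
  unsigned n = static_cast<unsigned>(atoi(argv[1]));

  clock_t t0 = clock();
  unsigned long slow = fibrec(n);
  clock_t t1 = clock();
  unsigned long fast = fib_bu_Vector(n);
  clock_t t2 = clock();

  std::cout << "fibrec(" << n << ")        = " << slow << "  [" << (t1 - t0) << " ticks]\n";
  std::cout << "fib_bu_Vector(" << n << ") = " << fast << "  [" << (t2 - t1) << " ticks]\n";
  return EXIT_SUCCESS;
}

Running such a driver for increasing n makes the contrast vivid: the recursive version becomes noticeably slow well before n reaches 50, while the bottom-up version remains essentially instantaneous.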
We will leave Fibo alone for now and return at the end of this chapter.
2 Knapsack Problem

The Knapsack Problem, technically the 0-1 Knapsack Problem, is a classic demonstration of the use of dynamic programming. It is also an important kind of problem in many domains: it maximizes the value obtained from available resources when there is a limit on how much can be consumed. Unfortunately, most explanations of this problem offer up a cheesy analogy of a burglar in an art gallery with a constrained carry capacity. I found one video that omits this dubious tradition and another that, once you get past a bit of profanity and the cheese, is also a good down-to-earth explanation. These are:

The "0-1" in the Knapsack Problem simply refers to the assumption that resources are discrete and indivisible - you have to take a resource or leave it, you can't take less than the whole. (There is a "Fractional Knapsack Problem" where the resources are divisible - for example, money, energy, fuel, and yes for the burglar analogy, gold dust. That version is easier to solve, and the greedy approach works in that case. See Greedy notes.)

Knapsack variables and notation:

  R = {r_1, r_2, ... , r_n}  - the set of available resources (items)
  n                          - the number of resources
  v_i                        - the value of resource r_i
  w_i                        - the weight of resource r_i
  W                          - the maximum total weight the knapsack can hold
  S ⊆ R                      - a selection of resources
We assume that weights are non-negative integers and values are non-negative real numbers. The weight (and value) of a subset S ⊆ R is defined as the sum of the weights (values) of the items in the subset:

  w(S) = Σ { w_i : r_i ∈ S }    and    v(S) = Σ { v_i : r_i ∈ S }
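These definitions translate directly into code. The helpers below are a sketch (hypothetical names, not part of the course library); they use the same 1-based item indexing as the Knapsack code in this section, with a std::vector<bool> of size n+1 marking which resources belong to S.

#include <vector>   // std::vector<bool> marks the selection

// weight of a selection: w(S) = sum of w_i over the selected resources
size_t WeightOf(size_t n, const std::vector<bool>& inS, const fsu::Vector<size_t>& w)
{
  size_t total = 0;
  for (size_t i = 1; i <= n; ++i)
    if (inS[i]) total += w[i];
  return total;
}

// value of a selection: v(S) = sum of v_i over the selected resources
double ValueOf(size_t n, const std::vector<bool>& inS, const fsu::Vector<double>& v)
{
  double total = 0.0;
  for (size_t i = 1; i <= n; ++i)
    if (inS[i]) total += v[i];
  return total;
}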
Knapsack Goal: Find a subset S of R such that v(S) is maximized subject to the constraint w(S) ≤ W. In other words we want to find:

  1. the optimal value V = max { v(S) : S ⊆ R and w(S) ≤ W }, and
  2. a subset S ⊆ R with w(S) ≤ W and v(S) = V.
For the present we will work on finding the optimal value V. Once we have algorithms for that, we will return to examine what is needed to produce the solution for goal 2. We can devise a recursive solution to Knapsack (which we call KS for the rest of this section) as follows: Let KS(n,W) denote the optimal value for resources r_1, ... , r_n and weight limit W, and assume we can solve the smaller problems KS(k,W') for k < n (and any W' ≤ W). Then we should be able to construct a solution to KS(n,W) in two cases:

  Case 1: r_n is not used in an optimal solution. Then KS(n,W) = KS(n-1,W).
  Case 2: r_n is used in an optimal solution (possible only when w_n ≤ W). Then KS(n,W) = v_n + KS(n-1, W - w_n).

Taking the better of the two cases: KS(n,W) = max( KS(n-1,W) , v_n + KS(n-1,W-w_n) ) whenever w_n ≤ W, and KS(n,W) = KS(n-1,W) otherwise.
The base case is KS(0,W) = 0 (and, considering W also a variable, KS(n,0) = 0). In other words, if there are no resources there is nothing to put in the knapsack, and if the max weight is zero the knapsack will not hold anything. This recursive solution can be coded as follows (using vectors to input the data):

double KS_rec(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w)
{
  double result;
  if (n == 0 || W == 0)
    result = 0;
  else if (w[n] > W)
    result = KS_rec(n-1,W,v,w);
  else
    result = fsu::Max(KS_rec(n-1,W,v,w), v[n] + KS_rec(n-1,W-w[n],v,w));
  return result;
}

This recursive function can be used with a simple driver and it does solve the problem. However it suffers from the same defect as the recursive Fibonacci calculator - many sub-problems get solved over and over, making the worst case runtime exponential in n.

Exercise 2. Code up a driver and test KS_rec. As we introduce improved versions, add them to the driver, ultimately testing them all.

2.1 Knapsack Memoized - HashTable

Following the process illustrated for Fibo in Section 1, we can memoize the Knapsack algorithm. Because it is not immediately clear how much memo will be needed, the most straightforward path to memoization is to use an associative array with an initial guess for memo size and an upgrade subroutine when expansion is needed:

// memoized recursive solution (AA version)
typedef fsu::HashTable<fsu::String,double> AA;
double KS_memo_AA(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w);            // executor
double KS_memo_AA(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, AA& memo);  // recursive call

double KS_memo_AA(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w)
{
  size_t b = fsu::Max(n,W); // <-- WAG
  AA memo(b);
  double rval = KS_memo_AA(n, W, v, w, memo);
  // memo.Analysis(std::cout); // <-- illustrating possible check on table efficiency
  return rval;
}

double KS_memo_AA(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, AA& memo)
{
  static float expansion_factor = 1.5;
  fsu::String key = fsu::ToDec(n) + "." + fsu::ToDec(W);
  double result;
  if (memo.Retrieve(key,result))
    return result;
  if (n == 0 || W == 0)
    result = 0;
  else if (w[n] > W)
    result = KS_memo_AA(n-1,W,v,w,memo);
  else
    result = fsu::Max(KS_memo_AA(n-1,W,v,w,memo), v[n] + KS_memo_AA(n-1,W-w[n],v,w,memo));
  memo[key] = result;
  if (memo.Size() > expansion_factor * memo.NumBuckets())
    memo.Rehash(expansion_factor * memo.Size());
  return result;
} // KS_memo_AA

To use a pair of integers as a hashtable key we convert them each to a digit string and concatenate with a dot between. This simple expedient can likely be improved.

2.2 Knapsack Memoized - Matrix

Associative arrays work as memo pads and (correctly parametrized) do not add to the asymptotic runtime. However if one can predict the memory needs exactly in advance, then a vector or matrix structure will be much simpler and have considerably smaller overhead associated with storage and retrieval. In the case of Knapsack, we predictably might need memory for all sub-problems KS(i,j) for 0 ≤ i ≤ n and 0 ≤ j ≤ W.
This storage requirement is met by an (n+1)×(W+1) matrix:

// memoized recursive solution (Matrix version)
typedef fsu::Matrix<double> Matrix;
double KS_memo_Matrix(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w);                // executor
double KS_memo_Matrix(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, Matrix& memo);  // recursive call

double KS_memo_Matrix(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w)
{
  fsu::Matrix<double> memo(n+1,W+1,0.0);
  double rval = KS_memo_Matrix(n, W, v, w, memo);
  return rval;
}

double KS_memo_Matrix(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, Matrix& memo)
{
  double result = memo[n][W];
  if ( result != 0 ) return result;
  if (n == 0 || W == 0)
    result = 0;
  else if (w[n] > W)
    result = KS_memo_Matrix(n-1,W,v,w,memo);
  else
    result = fsu::Max(KS_memo_Matrix(n-1,W,v,w,memo), v[n] + KS_memo_Matrix(n-1,W-w[n],v,w,memo));
  memo[n][W] = result;
  return result;
} // KS_memo_Matrix

Note that the memory requirement for KS_memo_Matrix is explicitly +Θ(nW) - a hefty amount when both n and W are large.

2.3 Knapsack Bottom-Up

The bottom-up ("brute force") algorithm can be easily derived from KS_memo_Matrix - just calculate the matrix left-to-right, top-to-bottom, with a pair of nested loops. This has all the advantages of simplicity and transparent analysis we obtained earlier for Fibo. However, note that it does commit us to calculating solutions to all sub-problems, which may be wasteful. The prototype is the same as for the executors above (a sketch of one possible implementation is given at the end of Section 2.4 below):

// bottom-up
double KS_bu(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w);

Proposition. KS_bu has runtime Θ(n×W) and runspace +Θ(n×W).

Proof. The memory used is an (n+1)×(W+1) matrix and the execution is essentially a traversal of that matrix. ∎

2.4 Storage requirement

The amount of storage a given Knapsack problem actually requires is highly data dependent. In the matrix versions, we have allocated a substantial amount of storage based on the worst case of requiring all possible sub-problem solutions. However, that seems to be rarely the case in practice. In many situations the amount of storage required is "sparse" compared to n×W. Note that the associative array version of memoization illustrated above begins by assuming only +O(b) space, where b is the larger of n and W, and checks for the need for expansion during the computation. That version may be the most efficient. Faster hashing of keys would speed it up. The hashing example used in the code above is likely not optimal.

Having excessive storage allocated also affects runtime. For example, the loop dimensions in KS_bu are determined by the matrix dimensions used for storage. While definitive results are not available, some randomly generated sample Knapsack problems yield:

  W = 500

   n    size of memo    (n+1)×(W+1) = worst-case storage
  ---   ------------    ---------------------------------
   10         702                5,511
   20       1,973               10,521
   30       3,990               15,531
   40       6,118               20,541
   50       5,831               25,551

Another short series of experiments with more items of smaller average weight yields:

  W = 500

   n    size of memo    (n+1)×(W+1) = worst-case storage
  ---   ------------    ---------------------------------
  100      43,255               50,601
  200      90,242              100,701
  300     136,118              150,801
  400     178,983              200,901
  500     226,371              251,001

These memory use measurements are much closer to the worst-case. Having many items with smaller average weight makes the search more difficult. In fact, it may be that the ratio of actual memory used to the worst-case maximum is a useful measure of the complexity of the particular knapsack problem. More investigation is needed.
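Returning to the bottom-up algorithm of Section 2.3: here is a sketch of how KS_bu might be implemented. It simply fills the same (n+1)×(W+1) matrix used by KS_memo_Matrix with a pair of nested loops, following the recurrence of Section 2; it is an illustration, not the course library's listing.

// bottom-up sketch: fill the (n+1) x (W+1) matrix of sub-problem values row by row
double KS_bu(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w)
{
  fsu::Matrix<double> m(n+1, W+1, 0.0);  // row 0 and column 0 stay 0 (base cases)
  for (size_t i = 1; i <= n; ++i)        // resources considered: r_1 .. r_i
  {
    for (size_t j = 0; j <= W; ++j)      // weight limits: 0 .. W
    {
      if (w[i] > j)                      // resource i cannot fit
        m[i][j] = m[i-1][j];
      else                               // better of "leave r_i" and "take r_i"
        m[i][j] = fsu::Max(m[i-1][j], v[i] + m[i-1][j - w[i]]);
    }
  }
  return m[n][W];
}

As claimed in the Proposition, the nested loops touch each matrix entry exactly once and do O(1) work per entry, so both the runtime and the extra space are Θ(n×W).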
2.5 Recovering the resource solution

Recovery of the actual solution to a Knapsack problem is necessary to make practical use of the optimization - it is all well and good to know what the maximum value is, but that is of little practical value without a recipe for actually using the resources optimally. We have a set of items, or resources, which we have treated as an ordered set

  R = { r_1 , r_2 , ... , r_n }
and we now have several algorithms for calculating the optimal value

  V = max { v(S) : S ⊆ R and w(S) ≤ W }.
What is missing is a specification of which resources from R should be selected to achieve the maximum value V. We will use a "bit set" (actually an fsu::BitVector object) to code this information. Discovering the "optimal packing" S that realizes the maximum value is done after the maximum is computed, by examining the history of selection during the run of the algorithm. One way to understand the process is with a recursive calculation on the matrix M that is computed by KS_bu:

void Reconstruct_rec (size_t i, size_t j, fsu::BitVector& bv)
{
  // assumes all bits in bv are unset
  if (i == 0)
  {
    return;
  }
  if (the solution to the (i,j) subproblem used resource i) // in matrix terms: M[i][j] != M[i-1][j]
  {
    bv.Set(i);
    Reconstruct_rec(i-1, j-w[i], bv);
  }
  else
  {
    bv.Unset(i);
    Reconstruct_rec(i-1, j, bv);
  }
}

used as a subroutine for KS_bu with modified prototype

double KS_bu(size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, fsu::BitVector& bv)

The result is a bit encoding for the selection of resources used to create the maximum value. Note however that the recursive subroutine suffers from the usual malady, hence:

Exercise 3. Devise iterative (non-recursive) subroutines for KS_bu, KS_memo_Matrix, and KS_memo_AA that return the optimal packing solution found by the original algorithm encoded in the BitVector bv. Be sure the subroutines have linear runtime O(n + W) and +O(1) space usage. Use these upgraded prototypes:

double KS_memo_AA     (size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, fsu::BitVector& bv)
double KS_memo_Matrix (size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, fsu::BitVector& bv)
double KS_bu          (size_t n, size_t W, const fsu::Vector<double>& v, const fsu::Vector<size_t>& w, fsu::BitVector& bv)

Note that only the executor requires the extra BitVector argument and that the subroutines run in the executor. See the course LIB/tests for a driver mainKS.cpp for these three implementations and a random knapsack data generator ranKS.cpp that is handy for testing. LIB/notes_support contains sample executable knapsack_i.x.

Exercise 4. Locate the directory LIB/notes_support in your course library and copy/execute these demonstration programs:

fib_demo_i.x  # demonstrates various versions of Fibonacci calculator
knapsack_i.x  # demonstrates all 3 dynamic programming solutions of Knapsack
ranKS_i.x     # random knapsack problem generator

As usual, the "_i" indicates Linux/Intel executable. Entering the command without arguments prints some documentation to screen.

3 When does Dynamic Programming work? (Optimal Substructure + Non-Proliferation)

Given a problem with a simple recursive solution, when is that likely not acceptable? In other words, when is a divide and conquer approach not likely to work out in practice? One way this happens is for the divide step to produce sub-problems that have deeper sub-problems in common. When the same subproblem crops up in many of the recursive calls, time is wasted re-computing a solution to that subproblem repeatedly. We saw this occur with Fibo and Knapsack. We found ways to use memoized recursion and ultimately bottom-up computation to solve these problems relatively efficiently. A complementary question is: what properties of a problem make it suitable for a dynamic programming solution? Again looking at our examples, it is clear that we rely on two properties. The first is named for the developer of Dynamic Programming:

  Optimal Substructure (Bellman): an optimal solution to the problem contains within it optimal solutions to its subproblems.
In the case of Fibo, we have the fact that the solution F(n) is obtained from solutions to two smaller problems: F(n-1) and F(n-2). In the case of Knapsack (KS), we have the formula

  Result = max( KS(n-1,W) , v_n + KS(n-1,W-w_n) )   (when w_n ≤ W)
and we rely on the optimal substructure property to prove that "Result" is indeed an optimal solution KS(n,W). The second important feature of problems that are susceptible to dynamic programming is reasonable growth of the number of subproblems:

  Non-Proliferation of Subproblems: the number of distinct subproblems that must be solved is small - typically polynomial in the size of the input.
In the case of Fibo we had subproblems F(0), F(1), ..., F(n-1) required to solve (calculate) F(n): the subproblem growth is O(n^d) with d = 1. For Knapsack the subproblems are held in an n×W matrix, so the subproblem growth is O(nW), which is again linear if W is held constant and quadratic (d = 2) if W is not allowed to grow faster than n. Given a problem of size n, the number S(n) of distinct sub-problems will ultimately determine both the time and space complexity of a dynamic programming solution.
4 Other Applications for DP

In this section we set up a few important problems whose solution by DP is fairly straightforward. The first two are explored in detail in [Cormen]. The second two are explored in the notes on Strings.

4.1 Parenthesization

Suppose we have four matrices to multiply:

  A × B × C × D
and suppose that A has dimension 1×n, B has dimension n×n, C has dimension n×n, and D has dimension n×1. (Thus A is a row vector, B and C are square n×n matrices, and D is a column vector.) Because matrix multiplication is associative, we have several choices of how to carry out the product:

  ((A × B) × C) × D
  (A × (B × C)) × D
  (A × B) × (C × D)
  A × ((B × C) × D)
  A × (B × (C × D))
The cost of these various recipes is not the same! To see why, note that:

  multiplying a p×q matrix by a q×r matrix produces a p×r matrix and requires about p×q×r scalar operations (p×r entries, each an inner product of length q).
Calculating the number of operations for various parenthesizations yields these examples:

  A × ((B × C) × D) : n×n×n + n×n×1 + 1×n×1 = n^3 + n^2 + n operations
  (A × B) × (C × D) : 1×n×n + n×n×1 + 1×n×1 = 2n^2 + n operations
clearly much less costly for large n. An optimal parenthesization for a matrix product

  A_1 × A_2 × ... × A_n
is a choice of parentheses that minimizes the cost of the computation, as measured by the number of individual arithmetic operations required. This problem has a DP solution with O(n^3) runtime. (See [Cormen] for details.)

4.2 Optimal Binary Search Trees

Suppose you want an index, based on Binary Search Tree technology, for a fixed set of keys, and that you know from historical data the frequency of lookup for each key. Goal: construct the index so that the average search time is minimized. Suppose the keys are

  k_1, k_2, ... , k_n
and that the set has been sorted, so that

  k_1 < k_2 < ... < k_n.
Let p_i be the probability that key k_i appears in a search. (Then Σ_i p_i = 1.) Finally we assume that no other searches are made - user errors being trapped by a RegEx before the search. The cost of a search can be measured by the number of tree operations required to find the key. To keep the problem simple (once a solution is found, the problem and solution are easily upgraded) we assume:

  1. Every search is successful - it terminates at the node containing the key sought.
  2. Each step down the search path costs three elementary operations (a failed equality test, an order comparison, and following a child pointer), and the final equality test at the node containing the key costs one more.
Based on the above, the cost of a (successful) search for a key k_i is 1 + 3d(i), where d(i) is the depth of k_i in the search tree. Therefore the expected cost of a search is given by

  E = Σ_i p_i × (1 + 3 d(i)) = 3 × Σ_i p_i × (d(i) + 1) - 2.
The problem is to structure the BST so that E is minimized. Clearly we can concentrate on minimizing the sum itself:

  S = Σ_i p_i × (d(i) + 1)
A critical ingredient of the DP solution is showing that if the subtree rooted at k_i is part of an optimal solution for the original problem then the subtree is an optimal solution to the subproblem for its subrange of keys. A recursive algorithm can then be devised that solves the optimization problem, and that algorithm can then be taken straight to a bottom-up solution. In pursuit of the latter, use the following notation: for indices 1 ≤ i ≤ j ≤ n, let S(i,j) denote the value of this sum for a binary search tree built on the keys k_i, k_{i+1}, ... , k_j (with depth counted from that subtree's own root), and take S(i,j) = 0 when j < i.
If the subproblem has root k_r (i ≤ r ≤ j) then we can write:

  S(i,j) = p_r + [ S(i,r-1) + (p_i + ... + p_{r-1}) ] + [ S(r+1,j) + (p_{r+1} + ... + p_j) ]
(The root k_r has cost 1 and depth 0, so it contributes p_r to the total S. The sums S(i,r-1) and S(r+1,j) are from the search trees for their respective subproblems, assuming we count the search from their roots. Making these two subtrees the children of k_r adds one to the depth of every other key, which adds the two remaining sums to S.) The formula simplifies to

  S(i,j) = S(i,r-1) + S(r+1,j) + (p_i + ... + p_j).
(This verifies the optimal substructure property: the left and right subtrees of an optimal tree must themselves be optimal for their key ranges.) The optimal S is then the minimum over all possible root choices:

  S(i,j) = min { S(i,r-1) + S(r+1,j) + (p_i + ... + p_j) : i ≤ r ≤ j },  with S(i,j) = 0 whenever j < i.
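For concreteness, here is a sketch of a bottom-up evaluation of this recurrence. It is an illustration only - the function and matrix names are hypothetical, not taken from [Cormen] or the course library - and it assumes cost and root have been created as (n+2)×(n+1) matrices of zeros, e.g. fsu::Matrix<double> cost(n+2,n+1,0.0):

// bottom-up sketch of the optimal BST recurrence
// p[1..n] are the key probabilities; on return cost[i][j] holds the optimal S(i,j)
// and root[i][j] records a best root choice for the subrange k_i .. k_j.
// Row/column 0 and the "empty range" entries cost[i][i-1] remain 0 (base cases).
void OptimalBST (size_t n, const fsu::Vector<double>& p,
                 fsu::Matrix<double>& cost, fsu::Matrix<size_t>& root)
{
  for (size_t len = 1; len <= n; ++len)        // length of the key subrange
  {
    for (size_t i = 1; i + len - 1 <= n; ++i)  // subrange k_i .. k_j
    {
      size_t j = i + len - 1;
      double psum = 0.0;                       // p_i + ... + p_j
      for (size_t l = i; l <= j; ++l) psum += p[l];
      for (size_t r = i; r <= j; ++r)          // try each root choice
      {
        double c = cost[i][r-1] + cost[r+1][j] + psum;
        if (r == i || c < cost[i][j])
        {
          cost[i][j] = c;
          root[i][j] = r;
        }
      }
    }
  }
}

The root matrix records, for each subrange, a best root choice; the backtracking step described next uses this information to rebuild the optimal tree.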
The last recurrence translates to a bottom-up computation of S in time Θ(n^3). A Θ(n) backtracking recurrence in the 3D solution space produces the insertion sequence of keys to create the optimal search tree found by the algorithm.

4.3 Longest Common Subsequence

This problem is taken up in the Strings notes.

4.4 Edit Distance

This problem is taken up in the Strings notes.

4.5 Bellman-Ford SSSP

This algorithm is taken up in the Weighted Graphs notes.

5* More on Fibonacci

We left Fibo after giving the original naive recursive calculator a full recursion→memoization→bottom-up makeover that, if phrased as a calculator of all Fibonacci numbers ≤ n, has run time Θ(n), has constant extra space requirements (space usage = +Θ(1)), and is simple to analyze. There are two more nuances we might wish to address for Fibo:
For the remainder of this section, the Fibonacci sequence is denoted by f_0, f_1, f_2, ... , f_n, ... with f_0 = 0, f_1 = 1, and f_n = f_{n-1} + f_{n-2} for n ≥ 2.

5.1 RunTime = Θ(n) , SpaceUse = +Θ(1)

Looking at the fib_bu_Vector algorithm we see that the final return value uses only the last two calculated numbers. The simple expedient of remembering only the previous two calculated values reduces the space requirements to Θ(1):

unsigned long fib_opt(unsigned n)
{
  if (n == 0) return 0;
  if (n == 1) return 1;
  if (n == 2) return 1;
  // reduce size of vector from n to 3:
  fsu::Vector<unsigned long> f(3,0);
  f[1] = 1;  f[2] = 1;
  for (unsigned i = 3; i <= n; ++i)
    f[i % 3] = f[(i-1) % 3] + f[(i-2) % 3];
  return f[n % 3];
}
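The vector of size 3 can be dispensed with entirely; the following equivalent sketch (not from the course library) keeps just the two most recent values in scalar variables:

// same idea with two scalar variables instead of a vector
unsigned long fib_opt2(unsigned n)
{
  if (n == 0) return 0;
  unsigned long prev = 0, curr = 1;    // f_0 , f_1
  for (unsigned i = 2; i <= n; ++i)
  {
    unsigned long next = prev + curr;  // f_i = f_{i-1} + f_{i-2}
    prev = curr;
    curr = next;
  }
  return curr;                         // f_n
}

Either way, the runtime remains Θ(n) while the extra space is +Θ(1).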
5.2 Fibonacci Identities

Proposition. The Fibonacci recursion satisfies the following identities for k ≥ 1:

  f_{2k+1} = f_k^2 + f_{k+1}^2
  f_{2k+2} = (2 f_k + f_{k+1}) × f_{k+1}

(For example, with k = 2: f_5 = 5 = 1 + 4 = f_2^2 + f_3^2, and f_6 = 8 = (2×1 + 2)×2 = (2 f_2 + f_3) × f_3.)
We will use matrix algebra (confined to 2×2 matrices). Define A to be the matrix

  A = | 0  1 |
      | 1  1 |
Claim 1. A^n is equal to

  A^n = | f_{n-1}  f_n     |
        | f_n      f_{n+1} |
Proof of Claim 1. We use mathematical induction. The base case n = 1 is apparent since A^1 = A and f_0 = 0, f_1 = 1, f_2 = 1. For the inductive step, assume the result is true for all positive integers less than n and consider the matrix A^n. Because matrix powers commute, A^n = A^{n-1} × A. Using the inductive hypothesis for A^{n-1}, the product is

  A^{n-1} × A = | f_{n-2}  f_{n-1} | × | 0  1 | = | f_{n-1}  f_{n-2} + f_{n-1} | = | f_{n-1}  f_n     |
                | f_{n-1}  f_n     |   | 1  1 |   | f_n      f_{n-1} + f_n     |   | f_n      f_{n+1} |
(applying the definition of matrix multiplication and the Fibonacci recursion). ∎

Proof of Proposition. Even though matrix multiplication is not in general commutative, it is for powers of a matrix: A^{m+n} = A^m × A^n. Looking at the (2,2) entry of both sides and applying Claim 1, we have:

  f_{m+n+1} = f_m × f_n + f_{m+1} × f_{n+1}
Substituting m = n = k we get

  f_{2k+1} = f_k^2 + f_{k+1}^2.
Substituting m = k + 1 and n = k we get

  f_{2k+2} = f_{k+1} × f_k + f_{k+2} × f_{k+1} = (f_k + f_{k+2}) × f_{k+1} = (2 f_k + f_{k+1}) × f_{k+1},
thus verifying both identities. ∎ 5.3 Memoized: RunTime = Θ(log n) , SpaceUse = +Θ(log n)The identities of 5.2 provide a completely straightforward path to memoization of Fib. It seems intuitively clear that number of intermediate values to be remembered in a memo is limited to O(log n) (we will prove that later) so it makes sense to declare the memo in the executor to have about log2 n buckets. Here is a working implementation: // top-down using table f[] typedef fsu::HashTable<unsigned, unsigned long> AA; unsigned long fiblog_memo_AA(unsigned n); // executor unsigned long fiblog_memo_AA(unsigned n, AA& f); // memoized recursive function unsigned long fiblog_memo_AA(unsigned n) { unsigned b = 2; if (n > 2) b = 1 + floor(log2(n)); AA f(b); f[0] = 0; f[1] = 1; f[2] = 1; return fiblog_memo_AA(n,f); } // fiblog_memo_AA executor unsigned long fiblog_memo_AA(unsigned n, AA& f) { unsigned long value; // cases already in memo - includes recursion base cases if (f.Retrieve(n,value)) return value; // 2 memoized recursive calls bool n_is_odd = ((n&1) != 0); unsigned k = n_is_odd ? (n+1)/2 : n/2; fiblog_memo_AA(k,f); // <- defines f[k]; fiblog_memo_AA(k-1,f); // <- defines f[k-1]; f[n] = n_is_odd ? f[k]*f[k] + f[k-1]*f[k-1] : (2*f[k-1] + f[k])*f[k]; return f[n]; } // fiblog_memo_AA memoized recursive The table lookups in the calculaton can be reduced by adding two variables to remember the recursive call return values: unsigned long fk = fiblog_memo_AA(k,f); // <- defines f[k]; unsigned long fk1 = fiblog_memo_AA(k-1,f); // <- defines f[k-1]; f[n] = n_is_odd ? fk*fk + fk1*fk1 : (2*fk1 + fk)*fk; One can show that the runtime and runspace requirements of fiblog_memo_AA are logarithmic. We will instead go directly to the conversion to bottom-up version, where both facts are transparent. 5.4 Bottom-Up: RunTime = Θ(log n) , SpaceUse = +Θ(log n)The last step in optimizing Fibo will be to convert to bottom-up. The main difficulty now is finding an appropriate scaffolding to support the computation that is size O(log n). The purpose of the scaffolding is to predict the parity bottom-up. These can be computed in reverse order by successively dividing the input n by 2 until reaching 1: ... // n >= 2 // 1: Calculate ceiling(log_2(n)) unsigned long log_2n = 0; { size_t l, p; for (l = 0, p = 1; p < n; ++l, p*= 2){} log_2n = l; } // 2: Calculate ladder fsu::Vector The ladder is used to make the even/odd decision required for Fib(n) in a bottom-up process. We will maintain 3 variables p0,p1,p2 that represent f[k-1], f[k], f[k+1] as we climb the ladder. Recall that the ladder is "upside down" with the bottom step ladder[i] = 1 and the top step ladder[0] = n: // 3: calculate recurrence unsigned long p2, p1, p0, sq; p2 = 1; // f(k+1) p1 = 1; // f(k) p0 = 0; // f(k-1) for(unsigned j = i; j > 0; ) { --j; if (1 == ladder[j] % 2) { sq = p2 * p2; // essential: p2 gets redefined p2 = sq + 2 * p2 * p1; p1 = sq + p1 * p1; } else { sq = p1 * p1; // efficient, but not essential p2 = p2 * p2 + sq; p1 = sq + 2 * p1 * p0; } p0 = p2 - p1; // f(k-1) } // for After step 3, p1 is the value of Fib(n). Putting all three components together we have: unsigned long fiblog_bu(unsigned n) { if (n == 0) return 0; if (n == 1) return 1; // 1: Calculate ceiling(log_2(n)) // 2: Calculate ladder // 3. Calculate recurrence return p1; }
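As a concrete sketch of step 2 (an illustration, not the course listing): the ladder is computed exactly as described, by successively halving n with integer division; this also defines the index i of the bottom rung, which is where the step 3 loop starts.

// 2: Calculate ladder (sketch) - the indices n, n/2, n/4, ... , 1
fsu::Vector<unsigned> ladder(1 + log_2n, 0);  // log_2n was computed in step 1
unsigned i = 0;
ladder[0] = n;                 // top step: ladder[0] = n
while (ladder[i] > 1)
{
  ladder[i+1] = ladder[i] / 2;
  ++i;
}
// bottom step: ladder[i] == 1; step 3 then processes ladder[i-1], ... , ladder[0]

For example, with n = 100 the ladder is 100, 50, 25, 12, 6, 3, 1 - only 7 rungs, so step 3 needs only 6 iterations of its loop.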
Proposition. Fiblog_bu(n) has runtime Θ(log n) and runspace +Θ(log n). Proof. There are 3 blocks of code run sequentially. Each block consists of a loop of length no greater than 1 + ⌈log2n⌉, and at least one of them has length exactly ⌊log2n⌋. Therefore the runtime is Θ(log2n). The non-constant space requirement is embodied in the ladder vector. This vector is declared to size 1 + ⌈log2n⌉. ∎ 5.4 Really BIG IntegersFibo values get large rapidly. The largest Fibonacci number that can be stored in a 64-bit integer is f90 = 2,880,067,194,370,816,120. To calculate fn for n > 90 we need a larger integer type in which to calculate and store results. There is such a type in LIB/cpp: /* integer.h Definition of the class Integer Integer is a class of arbitrary precision base 10 integers, the number of digits limited only by the maximum size of an array of char. The class uses a private fsu::Vector The API for these "infinite precision" integers is essentially the same as that of native C++ integer types: namespace fsu { class Integer { friend std::istream& operator >> (std::istream& s, Integer& n); friend std::ostream& operator << (std::ostream& s, const Integer& n); friend Integer operator + (const Integer& n, const Integer& m); friend Integer operator - (const Integer& n, const Integer& m); friend Integer operator - (const Integer& n); friend Integer operator * (const Integer& n, const Integer& m); friend Integer operator / (const Integer& n, const Integer& m); friend Integer operator % (const Integer& n, const Integer& m); friend bool operator == (const Integer& n, const Integer& m); friend bool operator != (const Integer& n, const Integer& m); friend bool operator <= (const Integer& n, const Integer& m); friend bool operator < (const Integer& n, const Integer& m); friend bool operator >= (const Integer& n, const Integer& m); friend bool operator > (const Integer& n, const Integer& m); public: Integer (); // Initialize an Integer to zero Integer (long n); // Initialize: typecast long to Integer Integer (const Integer& y); // copy constructor ~Integer (); // destructor Integer& operator = (const Integer& y); Integer& operator += (const Integer& y); Integer& operator -= (const Integer& y); Integer& operator *= (const Integer& y); Integer& operator /= (const Integer& y); Integer& operator %= (const Integer& y); Integer& operator ++ (); Integer operator ++ (int); Integer& operator -- (); Integer operator -- (int); // added to test for palindroms Integer Reverse () const; // return Integer with digits reversed bool Palindrom () const; // return 1 iff Integer is a palindrom ... }; // Integer } namespace fsu Integer algorithms can often be translated to use type fsu::Integer with only a few editorial changes. In particular, the fiblog_memo_AA and fiblog_bu algorithms translate simply. The "input" types can remain the C++ natives (unsigned and size_t), but the "output" types need to be upgraded to fsu::Integer. The first two parts of the fiblog_bu algorithm need no change as they involve numbers related to the input n. The ladder is also still a vector of unsigned ints. Only the third step, calculating the recurrence, requires the large integer type: fsu::Integer Fiblog_bu(unsigned n) // return type is changed { if (n == 0) return 0; if (n == 1) return 1; // 1: Calculate ceiling(log_2(n)) - no changes // 2: Calculate ladder - no changes // 3. 
Calculate recurrence - change types of p0, p1, p2 to fsu::Integer
  return p1;
}

The memoized version also translates simply:

typedef fsu::HashTable<unsigned,fsu::Integer> AAI;
fsu::Integer Fiblog_memo_AA(unsigned n);          // executor
fsu::Integer Fiblog_memo_AA(unsigned n, AAI& f);  // recursive call

fsu::Integer Fiblog_memo_AA(unsigned n)
{
  unsigned b = 2;
  if (n > 2) b = 1 + floor(log2(n));
  AAI f(b);
  f[0] = 0; f[1] = 1; f[2] = 1;
  return Fiblog_memo_AA(n,f);
}

fsu::Integer Fiblog_memo_AA(unsigned n, AAI& f)
{
  fsu::Integer value;
  if (f.Retrieve(n,value)) return value;
  unsigned k = (0 != (n&1)) ? (n+1)/2 : n/2;
  fsu::Integer fk  = Fiblog_memo_AA(k,f);
  fsu::Integer fk1 = Fiblog_memo_AA(k-1,f);
  return f[n] = (n & 1) ? fk*fk + fk1*fk1 : (2*fk1 + fk)*fk;
}

Here are some sample values and runtimes:

Fiblog_memo_AA(100)     = [21 digits] = 354224848179261915075
Fiblog_memo_AA(1000)    = [209 digits] =
Fiblog_memo_AA(10000)   = [2090 digits] =
Fiblog_memo_AA(100,000) = [more than 20,000 digits] =

A curve-fit estimate of the time to calculate fibrec(100,000) [i.e., using the original recursive algorithm] is: estimated elapsed time = 10^19481 years.
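To tie the pieces together, here is a minimal driver sketch (hypothetical - the actual demonstration program is the fib_demo_i.x executable mentioned in Exercise 4):

// fibbig_demo.cpp - hypothetical driver sketch; compile and link with the fsu library
#include <iostream>
#include <cstdlib>
#include "integer.h"                              // fsu::Integer

fsu::Integer Fiblog_memo_AA(unsigned n);          // executor defined above

int main(int argc, char* argv[])
{
  unsigned n = (argc > 1) ? static_cast<unsigned>(atoi(argv[1])) : 100;
  fsu::Integer f = Fiblog_memo_AA(n);
  std::cout << "Fib(" << n << ") = " << f << '\n';
  return EXIT_SUCCESS;
}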