Homework 2: Recursion

Exploration of recursive implementations - The Good, the Bad, and the Ugly

Note: This assignment is used to assess some of the required ABET outcomes for the degree program. The outcomes assessed here are:

(a) an ability to apply knowledge of computing and mathematics appropriate to the discipline (divide-and-conquer recurrences)

(c) an ability to design, implement, and evaluate a computer-based system, process, component, or program to meet desired needs

(i) an ability to use current techniques, skills, and tools necessary for computing practice

These will be assessed using the following specific outcomes and scoring rubric:

Rubric for Specific Outcomes i.-iv.

Key: I = ineffective, E = effective, H = highly effective

| Specific Outcome | I | E | H |
|---|---|---|---|
| i. Runtime/Runspace Analysis - Result | - | - | - |
| ii. Runtime/Runspace Analysis - Process | - | - | - |
| iii. Program Implementation - Base Case | - | - | - |
| iv. Program Implementation - Recursive Call | - | - | - |

In order to earn a course grade of C- or better, the assessment must result in Effective or Highly Effective for each specific outcome in the rubric.

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Define explicit and implicit recursion in C++
Explain how recursion is implemented in modern programming languages
Define recursive solutions to certain programming problems
Define binary search on an array or vector as a recursive solution
Define a recursive and a dynamic programming calculation of Fibonacci numbers
Convert a simple loop to an equivalent recursive call
Define the concepts of recursive implementation and iterative implementation of algorithms
Define the concept of a divide-and-conquer algorithm
Provide a recursive implementation of a generic divide-and-conquer algorithm
State the asymptotic runtime for a specific divide-and-conquer algorithm
Informally argue the correctness of your statement of the asymptotic runtime for a specific divide-and-conquer algorithm

Operational Objectives: Modify four distributed programs by supplying recursive and dynamic functions as specified. Write a report answering questions and reporting findings.

Deliverables: Five files fibo.cpp, loop.cpp, find.cpp, sort.cpp, and report.txt.

Recursion

Recursion is both a mathematical and a computational concept. I picked up the nearest "Discrete Mathematics for Computer Scientists" text and found the word "recursion" and its derivatives indexed on pages 1, 2, 27, 33, 39, 40, 43-46, 50, 92, 112, 204, 263, 264, 281, 282, 313, 316, 402, 404, 405, and 435, all in a text of only 515 pages. (This is a "lighter-weight" text than the one FSU uses for its Discrete Mathematics series.) Suffice it to say: You study recursion in discrete math.

Recursion is an important concept and tool in computer science as well, and you will encounter it in many significant ways and places, including:

In data structures, where some operations are conceptually simple when expressed recursively but extraordinarily complex when expressed iteratively.
In algorithms, where some are most naturally expressed and analyzed in recursive form.
In specific programming languages, notably Lisp and Scheme, that are "pure functional" languages and therefore do not have a loop structure and must rely on recursion for "repetitive" tasks.
In compilers, where the concept of a recursive descent parser is fundamental.

Note that at this point we have not said exactly what recursion is, but rather have pointed out that it is many things and that you will likely not have a complete understanding of recursion until later in your career.

For now, we will define recursion in C++: a function is said to be recursive if it calls itself in its implementation. Here is an example:

float Mystery (float * array, size_t arraySize)
{
  if (arraySize == 0)                                          // base case
    return 0;
  return array[arraySize - 1] + Mystery(array, arraySize - 1); // recursive call
}

Some important things to observe about this function are:

There is no loop structure in the body
There is a call to the function in its own body (the "recursive call")
There is a conditional branch in the body one leg of which does not have a recursive call (the "base case")
The code is laid out almost like a proof by "mathematical induction" - the question is, proof of what?
What does the function Mystery calculate?

We can begin to answer the question by tracing the call Mystery (a,3) for the array a = [4,5,6]:

Mystery(a,3)
  return a[2] + Mystery(a,2) // apply recursive call
         = 6 + Mystery(a,2)  // substitute 6 = a[2]
         = 6 + (a[1] + Mystery(a,1)) // apply recursive call
         = 6 + (5 + Mystery(a,1))    // substitute 5 = a[1]
         = 6 + (5 + (a[0] + Mystery(a,0)))  // apply recursive call
         = 6 + (5 + (4 + Mystery(a,0)))     // substitute 4 = a[0]
         = 6 + (5 + (4 + 0)) // apply base case
         = 6 + (5 + 4)  // return
         = 6 + 9   // return
         = 15   // value to return

If you follow this trace, you can begin to see that the function Mystery returns the sum of all the elements of the array. In fact, the function body can serve as a model of a proof (using the Principle of Mathematical Induction) that Mystery(a,n) returns the sum of the first n elements of a.

Recursion is not always good. There are recursive solutions to some problems that are rather ugly - perhaps the example above is one of these. Why would we want a recursive calculation of a sum when a simple loop is more transparent and easier to implement? And recursive solutions can be genuinely bad, in the sense that they are extremely inefficient, requiring far more computational time than other equivalent methods.
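For comparison, here is a sketch of the "simple loop" alternative next to the recursive version. The name IterativeSum is hypothetical (it is not part of the distributed code), and Mystery is repeated only so that the block stands on its own.

```cpp
#include <cstddef>

// Recursive version from the text, repeated so this sketch is self-contained.
float Mystery (float * array, std::size_t arraySize)
{
  if (arraySize == 0)                                          // base case
    return 0;
  return array[arraySize - 1] + Mystery(array, arraySize - 1); // recursive call
}

// Iterative equivalent (hypothetical name). It adds the elements in index
// order 0, 1, ..., arraySize - 1, which is the same order in which the
// recursive version performs its additions as the calls return.
float IterativeSum (const float * array, std::size_t arraySize)
{
  float sum = 0;
  for (std::size_t i = 0; i < arraySize; ++i)
    sum += array[i];
  return sum;
}
```

Both functions return 15 for the array a = [4,5,6] traced above, and both return 0 for an empty range.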

Procedural Requirements

Copy all of the files in [LIB]/hw2/ into your hw2 directory. You should now have these files:
```
fibo.distribute
loop.distribute
find.distribute
sort.distribute
makefile
```
The four files suffixed ".distribute" are code files. The makefile will build the four executables fibo.x, loop.x, find.x, and sort.x.
Copy the four distribute files onto files suffixed ".cpp":
```
cp fibo.distribute fibo.cpp
cp loop.distribute loop.cpp
cp find.distribute find.cpp
cp sort.distribute sort.cpp
```
These four code files are the starting point for your assignment. The code is correct as distributed and should compile to executables using the supplied makefile.
Enter the command "make" and make sure you get four executables fibo.x, loop.x, find.x, and sort.x. Run each of these programs. Look at the source code for each of these programs. Spend some time understanding what each of these programs does (and what it does not do...).
Modify fibo.cpp, loop.cpp, find.cpp, and sort.cpp according to the code requirements and specifications below. Make sure that your four programs are well written and perform as required, including "boundary" cases.
Write a brief report answering questions (given below) about these programs and your experience creating them.
Turn in five files fibo.cpp, loop.cpp, find.cpp, sort.cpp, and report.txt using the hw2submit.sh submit script.

Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive two confirmations, the second with the contents of your project, there has been a malfunction.

Code Requirements and Specifications: `fibo.cpp`

Redefine the body of the function RFib [for "Recursive Fibonacci"] so that it implements the classic recursive definition of Fibonacci numbers, to wit:
The Fibonacci numbers are the elements of the sequence f₀, f₁, f₂, f₃, ... of non-negative integers such that f₀ = 0, f₁ = 1, and for each n > 1, fₙ = fₙ₋₁ + fₙ₋₂.
Be sure that you have a base case, a recursive call, and no loop structure. (Note: You may have more than one base case and/or more than one recursive call. However, at least one of each is required for all recursive functions.) Be sure that you have covered the trivial boundary cases. You can compile with make and test your code. Be sure it is getting correct values for these cases:
```
RFib(0)  = 0
RFib(1)  = 1
RFib(2)  = 1
RFib(3)  = 2
RFib(4)  = 3
RFib(5)  = 5
RFib(6)  = 8
RFib(7)  = 13
RFib(8)  = 21
RFib(9)  = 34
RFib(10) = 55 
RFib(20) = 6765
RFib(30) = 832040
RFib(40) = 102334155
```
Of course you can test other input as well. It would be most unlikely for an incorrect program to generate the results above, however. Note that the results returned by DFib will not be correct at this point.
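While testing, it can be instructive to see how much work the classic recursive definition does. The following is a sketch only: CountedFib is a hypothetical name, and its signature (including the extra counter parameter) is an assumption that will not match RFib in the distributed fibo.cpp.

```cpp
// Classic recursive Fibonacci with a call counter added for illustration.
// Hypothetical helper; not the RFib signature from fibo.cpp.
unsigned long CountedFib (unsigned long n, unsigned long & calls)
{
  ++calls;                 // count every invocation, including base cases
  if (n < 2)               // base cases: f(0) = 0 and f(1) = 1
    return n;
  return CountedFib(n - 1, calls) + CountedFib(n - 2, calls); // two recursive calls
}
```

Starting with calls = 0, computing CountedFib(10, calls) returns 55 after 177 calls, and CountedFib(20, calls) returns 6765 after 21891 calls. The call count grows about as fast as the Fibonacci numbers themselves.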
Now supply a new body for the function DFib [for "Dynamic Fibonacci"] that is a so-called dynamic programming calculation. Dynamic programming in this case takes advantage of the recursive equation defining the Fibonacci numbers but also uses a loop and assignment statements to have the last three numbers in memory during each iteration of the loop. For example, we could have three variables, one for the current number, a second for the previous number, and a third for the number two places back:
```
size_t f,   // current fib number
       pf,  // previous fib number (1 back)
       ppf; // previous previous fib number (2 back)
```
Then after appropriate initialization the loop body would redefine each of these by updating one at a time:
```
ppf = pf;        // new previous previous becomes old previous
pf  = f;         // new previous becomes old current
f   = pf + ppf;  // new current becomes new previous plus new previous previous
```
Be sure that DFib uses a for loop and also that it takes care of the base cases outside the loop. Also make sure that DFib is not recursive. Test DFib on all the input you used to test RFib. Now these should be producing identical results.
The key observation that makes the dynamic programming approach work is that you don't need to calculate or remember the entire sequence just to get the next number; you only need the last two numbers in the sequence. The three variables inside the dynamic programming loop are often called a "ladder" that supports the calculation.
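One way the "ladder" might be put together is sketched below. LadderFib is a hypothetical name, and the signature and the choice of initial values are assumptions; the DFib header in fibo.cpp may differ.

```cpp
#include <cstddef>

// Loop-based Fibonacci using a three-variable ladder.
// Hypothetical sketch; not the DFib signature from fibo.cpp.
std::size_t LadderFib (std::size_t n)
{
  if (n == 0) return 0;       // base cases are handled outside the loop
  if (n == 1) return 1;
  std::size_t ppf = 0,        // f(0): two places back
              pf  = 1,        // f(1): one place back
              f   = 1;        // f(2): current
  for (std::size_t i = 2; i < n; ++i)  // each pass advances the ladder one step
  {
    ppf = pf;                 // new previous previous becomes old previous
    pf  = f;                  // new previous becomes old current
    f   = pf + ppf;           // new current
  }
  return f;                   // f now holds f(n)
}
```

The loop runs n - 2 times for n >= 2, so the work grows linearly in n and only three values are kept in memory.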

Code Requirements and Specifications: `loop.cpp`

Supply a new body for the function RLoop in this program. RLoop(n) should duplicate the results already correctly produced by ILoop(n), namely output n dots. RLoop should be a recursive refactoring of ILoop. (To refactor code is to rewrite the code in a manner that does not change what the code accomplishes - do the same thing but in a different way. When you wrote DFib, you refactored RFib.)
RLoop should have a base case, a recursive call, and no loop structure.
RLoop should behave identically to ILoop. Note that you should be able to compile with make and test to see identical output for ILoop and RLoop.
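The general pattern for turning a counting loop into a recursive call can be sketched as follows. The names DotsIterative, DotsRecursive, and SameOutput are hypothetical, and passing an output stream as a parameter is an assumption made here so the two versions can be compared; ILoop and RLoop in loop.cpp may be declared differently.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Iterative version: writes n dots to os.
void DotsIterative (std::size_t n, std::ostream & os)
{
  for (std::size_t i = 0; i < n; ++i)
    os << '.';
}

// Recursive refactoring: one dot now, the remaining n - 1 dots by recursion.
void DotsRecursive (std::size_t n, std::ostream & os)
{
  if (n == 0)                 // base case: nothing left to write
    return;
  os << '.';
  DotsRecursive(n - 1, os);   // recursive call on a smaller problem
}

// Check helper: true when both versions write the same characters for n.
bool SameOutput (std::size_t n)
{
  std::ostringstream iter, rec;
  DotsIterative(n, iter);
  DotsRecursive(n, rec);
  return iter.str() == rec.str();
}
```

The loop counter has become the function argument: each call handles one iteration and hands the rest to the next call, and the loop's exit test has become the base case.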

Code Requirements and Specifications: `find.cpp`

Supply a new body for the template functions LowerBound and UpperBound that recursively calculate the lower bound and upper bound of a search value in an array. The input is a range determined by two pointers and a search value. The output is a pointer. Here is the header for LowerBound:
```
template < typename T >
T* LowerBound (T* low, T* hih, T val)
// pre:    low and hih point into an array of T
//         low + n = hih for some n >= 0
//         low[0] ... low[n-1] are in non-decreasing order
// post:   no state is changed
// return: ptr = lower bound location of val in range; that is:
//         ptr = low + i, where low[i-1] < val <= low[i]; or
//         ptr = hih if no such i exists
{
  // recursive body goes here
}
```
Note that this is a template function parametrized by the type of the elements in the search space. The return value should be a pointer satisfying the conditions in the header documentation. A typical call would be
```
loc = LowerBound (a, a+size, val);
if (loc == a + size) // no element >= val, so val is not in a
else ...             // loc points to first element >= val (first occurrence of val, if present)
```
Note that (loc - a) (pointer arithmetic) is the array index of the location, if that is needed.
The recursive LowerBound body should have at least one base case, at least one recursive call, and NO LOOP. You can get ideas for this from the lecture notes on iterative algorithm lower_bound, which should produce identical results.
Similar requirements apply to UpperBound.
Test your implementations by invoking make to build executables. The type used in the test is char which makes it easy to visually check the correctness of the algorithm implementations. Be sure to test boundary cases ... the program should handle such things as empty search ranges with aplomb.
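A minimal sketch of how a recursive binary search for the lower bound can be organized is given below. The name LowerBoundSketch is hypothetical, and the midpoint strategy is one reasonable choice, not necessarily the one in the lecture notes.

```cpp
#include <cstddef>

// Recursive lower bound on the half-open range [low,hih).
// Returns a pointer to the first element that is not less than val,
// or hih if every element is less than val. Hypothetical sketch.
template < typename T >
T* LowerBoundSketch (T* low, T* hih, T val)
{
  if (low == hih)                   // base case: empty range
    return low;
  T* mid = low + (hih - low) / 2;   // low <= mid < hih
  if (*mid < val)                   // everything up to and including mid is too small
    return LowerBoundSketch(mid + 1, hih, val);
  return LowerBoundSketch(low, mid, val); // the answer is mid or lies to its left
}
```

For the char array a = ['a','c','c','e'], the call LowerBoundSketch(a, a + 4, 'c') returns a + 1, and searching for 'f' returns a + 4. An upper bound can be sketched the same way by moving right whenever val is not less than *mid.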

Code Requirements and Specifications: `sort.cpp`

Supply a new body for the function MergeSort(v, beg, end) that implements recursive merge sort on an object of type std::vector. Note that the function header and three parameters are as follows:
```
MergeSort
  (
    std::vector<int> & v, // a std::vector object whose elements have type int
    size_t beg,           // the beginning index of the range to be sorted
    size_t end            // the end index of the range to be sorted
  )
```
This header is designed to facilitate a recursive implementation: each recursive call specifies a different range. If a client wants to sort an entire vector, the call would be
```
MergeSort (v, 0, v.size()); // sorts entire vector v
```
Note also that the vector is passed by reference, so that anything the call to MergeSort does to the vector makes changes in the actual vector that is owned and passed by the calling process.
Your new body for MergeSort should follow and implement this algorithm:
1. Let mid be the middle index in the range [beg,end)
2. Sort each of the two ranges [beg,mid) and [mid,end)
3. Merge the two ranges back to [beg,end)
The recursive MergeSort body should have at least one base case (perhaps 2) and two recursive calls. In addition there should be a call to the function fsu::Merge that is supplied in the file sort.cpp. Do not change any code anywhere except inside the body of MergeSort.
The function fsu::Merge is set up to work with MergeSort - merge two sorted ranges into one (larger) sorted range. This function is correct and debugged - don't change it.
Test your new sort code by invoking make to build executables. There are two data files that you can use to test with, and you should make up some of your own. Also be sure to test boundary cases ... the program should handle such things as empty files and bad file reads with aplomb.
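The three-step algorithm above can be sketched as follows. This is an illustration under stated assumptions: MergeSortSketch is a hypothetical name, and because the signature of fsu::Merge is not shown here, the standard library's std::inplace_merge is used as a stand-in for the merge step. Your submission should call fsu::Merge instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Recursive merge sort on the half-open index range [beg,end) of v.
// Hypothetical sketch; std::inplace_merge stands in for fsu::Merge.
void MergeSortSketch (std::vector<int> & v, std::size_t beg, std::size_t end)
{
  if (end <= beg + 1)               // base case: 0 or 1 elements, already sorted
    return;
  std::size_t mid = beg + (end - beg) / 2;  // 1. middle index of [beg,end)
  MergeSortSketch(v, beg, mid);             // 2. sort [beg,mid)
  MergeSortSketch(v, mid, end);             //    and [mid,end)
  typedef std::vector<int>::difference_type diff_t;
  std::inplace_merge(v.begin() + static_cast<diff_t>(beg),  // 3. merge the two
                     v.begin() + static_cast<diff_t>(mid),  //    sorted halves
                     v.begin() + static_cast<diff_t>(end)); //    back into [beg,end)
}
```

Because both recursive calls work on half-open ranges that share the index mid, every element of [beg,end) belongs to exactly one half.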

Report

The report is a plain text file. Do not submit any file with special formatting in it, such as Word or rtf or pdf or html. The assessment process will be able to read text files only.

The report file must be named "report.txt" for the submit script.

Begin your report before you even start coding, because some of the questions pertain to the files as distributed and before modification. Keep your log entries in an appendix to the report named "work log".

Start your report with file header info as follows:

COP 4530 Homework 2: Recursion - The Good, the Bad, and the Ugly
<your name>
<your CS username>
<your FSU username>

Answer each of the following questions and/or supply evidence that you have performed the required tasks. Be sure to number the questions and repeat the question in the report prior to answering.

After copying the four .distribute files onto their respective .cpp files, build by issuing the make command. What executables are created?
Run each executable and describe its behavior.
Open the source code files and try to understand the code. Take special note of the use of command line arguments in function main.

What is "argc"?
What is "argv[]"?
What does "atoi" accomplish?
What function calls are made by main in fibo?
What function calls are made by main in loop?
What function calls are made by main in find?
What function calls are made by main in sort?

Answers provided by copy/paste screen shots are fine.

The remaining questions should be answered after completing the coding part of the assignment.

You have created four different recursive functions.

Which of these do you consider to be most elegant?
Which of these do you consider to be most efficient?
Are any of these a total waste of (human) time without any redeeming features?

(Give brief justifications for each answer.)
Compare RFib and DFib.

Which of these is more efficient?
Which of these is more elegant?
Rate these as "good", "bad", and "ugly":

RFib
RLoop
LowerBound
UpperBound
MergeSort

Use each rating at least once.

Hints

Don't get confused about ranges: in C++, a range always includes the beginning index and excludes the end index, expressed as the "half open" interval [beg,end). This matches the standard pattern for a for loop:
```
for (size_t i = beg; i < end; ++i)    // loops over the range [beg,end)
{ Whatever(); }
```
and for array/vector indexing:
```
for (size_t i = 0; i < v.size(); ++i) // loops over the entire vector
{ Whatever(); }
```
Note that an array or vector has valid indices in the range [0,size).
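The arithmetic of half-open ranges can be checked with a small sketch. RangeSize and Midpoint are hypothetical helper names introduced only for illustration; both assume beg <= end.

```cpp
#include <cstddef>

// Number of indices in the half-open range [beg,end); assumes beg <= end.
std::size_t RangeSize (std::size_t beg, std::size_t end)
{
  return end - beg;
}

// Middle index of [beg,end). The subranges [beg,mid) and [mid,end)
// cover the whole range with no overlap and no gap.
std::size_t Midpoint (std::size_t beg, std::size_t end)
{
  return beg + (end - beg) / 2;
}
```

For example, the range [0,5) has midpoint 2, and the subranges [0,2) and [2,5) have sizes 2 and 3, which add up to the original size 5. An empty range such as [3,3) has size 0.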
There are correct executables fibo_i.x, loop_i.x, find_i.x, sort_i.x, fibo_s.x, loop_s.x, find_s.x, sort_s.x distributed in LIB/area51. Here _i indicates an executable for Intel/Linux (e.g., linprog) and _s an executable for Sun/Unix (e.g., program).