Project 6: Stats Templates

Making the stats functions generic

Revision dated 03/31/17

Educational Objectives: After completing this assignment the student should have the following knowledge, ability, and skills:

Distinguish between a function template and a concrete function: definition, implementation, and file structure conventions
Demonstrate correct notation and syntax for defining and implementing a function template
State the rules compilers use to instantiate a function template
Test a function template for correct syntax
Test a function template for correct behavior
Test a function template for genericity
Understand the various native data types and their overloads for the input and output operators.

Operational Objectives: Implement and test the function templates Mean, Median, and InsertionSort.

Deliverables: Three files: stats.t, makefile, log.txt

Assessment Rubric

build 10 test executables             [1 pt each]:  xx
40 tests [1 point each]:
 stattestA.x data1, data2, data3, data4   x
 stattestB.x data1, data2, data3, data4   x
 stattestC.x data1, data2, data3, data4   x
 stattestD.x data1, data2, data3, data4   x
 stattestE.x data1, data2, data3, data4   x
 stattestF.x data1, data2, data3, data4   x
 stattestG.x data1, data2, data3, data4   x
 stattestH.x data1, data2, data3, data4   x
 stattestI.x data1, data2, data3, data4   x
 stattestJ.x data1, data2, data3, data4   x
total for tests                           [0..40]:  xx
log.txt                                  [-20..0]: ( x)
project specs                            [-20..0]: ( x)
code quality                             [-20..0]: ( x)
dated submission deduction            [2 pts per]: ( x)
                                                   ---
total                                     [0..50]:  xx

Notes: 1. input files may vary over time and from those distributed.
       2. selection of 10 tests may vary among the 21 choices

Code quality includes: 
  - conformance to assignment requirements and specifications
  - conformance to coding standards [see course organizer]
  - engineering and design, including appropriateness of name choices
  - readability

Background: See lecture notes Chapter 12. Templates.

Procedural Requirements

Copy all files from LIB/proj6/. You should see at least these:

stattest.cpp     # client program testing Stats Templates
makefile.start   # makefile stub 
runtests.sh      # script to build and execute all 21 tests on 4 different data files
deliverables.sh  # submission configuration file

Begin a log file named log.txt. This should be an ascii text file in cop3330/proj6 with the following header:
```
log.txt # log file for stats project
<date file created>
<your name>
<your CS username>
```
This file should document all work done by date and time, including all testing and test results.
Create the file stats.t defining and implementing function templates for Mean, Median, and InsertionSort. Be sure to make log entries for all work.
READ the test program LIB/stattest.cpp. Note that stattest.cpp is set up so that you can choose any one of 21 different numerical types by uncommenting one of 21 possible typedef statements defining NumberType. Start your testing by creating 21 different tests stattest1.cpp, ... , stattest21.cpp, where stattestX.cpp uses the number type X.

This is a good exercise in using basic unix commands and Emacs. Do this work yourself, on linprog. Be sure to keep your log up to date.
READ the script runtests.sh. Be sure you understand what it does and how to use it.
CREATE a makefile that builds all 21 versions of stattest.x. Begin by copying makefile.start to makefile. Be sure the makefile is set up so that you can
1. Build any specific stattestX.x target [X = 1,2,...,21]
2. Build all stattest1.x ... stattest21.x targets
3. Erase output data files created by runtests.sh
Test your implementation using the supplied runtests.sh script. There should be 21 versions of stattest.cpp, one for each numerical type listed in the program header documentation, named as in the distributed makefile. You also need four data files:
1. Integer data, odd number of entries
2. Integer data, even number of entries
3. Real number data, odd number of entries
4. Real number data, even number of entries
Again be sure to make log entries appropriately.
Turn in the files stats.t, makefile, and log.txt using the submit script system.

Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive two confirmations, the second with the contents of your project, there has been a malfunction.

Code Requirements and Specifications

In file stats.t define and implement function templates with these prototypes:

template < typename T >
long double Mean (const std::vector<T>& v);  // returns mean of elements of v

template < typename T >
long double Median (std::vector<T>& v);  // returns median of elements of v

template < typename T >
void InsertionSort (std::vector<T>& v);  // implements insertion_sort algorithm

The behavior and semantics are similar to the non-template functions from Project 2, using vector instead of array to contain data.
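As a reminder of the notation only, here is a minimal sketch of a function template written in the same style as the prototypes above. The name Total is hypothetical and is not one of the required functions; it simply shows a template that takes a vector<T> by const reference and returns long double regardless of T.

```cpp
#include <vector>

// Hypothetical helper, NOT part of the assignment: sums the elements of v.
// The sum is kept in long double, matching the return type used by the
// required prototypes, so it behaves the same for every numerical type T.
template < typename T >
long double Total (const std::vector<T>& v)
{
  long double sum = 0.0L;
  for (typename std::vector<T>::size_type i = 0; i < v.size(); ++i)
  {
    sum += static_cast<long double>(v[i]);
  }
  return sum;
}
```

A call such as Total(v) with v of type std::vector<int> causes the compiler to deduce T = int and instantiate Total<int>; the same source text serves std::vector<char>, std::vector<double>, and so on.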

Be sure your code conforms to the C++ Code Standards (available also through the Course Organizer).
Be sure that you have tested your code for syntax errors with the supplied test harness as well as your own test program, using the supplied makefile with warning flags set. All warnings should be eliminated.
Be sure that you have tested your code for logic errors with both the supplied test harness and your own test program.
Be sure that you have tested your code for genericity using all 21 numerical types available in stattest.cpp.

The Strange Case of 1-byte Data Types

You are developing function templates for Mean and Median. The peculiar way C and C++ treat char data makes the results for 1-byte types difficult to interpret.

The types char, unsigned char, and signed char (as well as uint8_t and int8_t) can all be treated as numerical types, and hence we can add and multiply in those types. But the I/O system tends to interpret 1-byte types as characters. So we have two ways to think of the mean and median.
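The two views of a 1-byte value can be seen with a small sketch. The helper name Shown is hypothetical, and the values in the note below assume an ASCII character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the text that operator<< writes for t.
// Which overload of operator<< is chosen depends on the type T.
template < typename T >
std::string Shown (const T& t)
{
  std::ostringstream oss;
  oss << t;
  return oss.str();
}
```

With this helper, Shown('A') yields "A" because the char overload is used, while Shown(static_cast<int>('A')) yields "65". Arithmetic on char promotes to int, so Shown('A' + 1) yields "66" rather than "B".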

Example: data1 = { APPLE } data2 = { BANANA }

The "numerical mean" and "numerical median" are calculated directly by your templates. These calculate using the numerical values of the characters, as bytes. An alternative is a kind of "in type" calculation, which stattest also presents. The results on these two data sets are:
Data type:        char
Data as entered:  A P P L E
Numerical Mean:   74
Numerical Median: 76
Mean in type:     J
Median in type:   L
Data after sort:  A E L P P
and
Data type:        char
Data as entered:  B A N A N A
Numerical Mean:   69.5
Numerical Median: 65.5
Mean in type:     E + 0.5
Median in type:   [A,B]
Data after sort:  A A A B N N

The presentation for 1-byte types in stattest.cpp is designed to be intuitive while also recognizing that there are no characters between, say, A and B, so that in the character set (A+B)/2 doesn't really make sense. The way these are stated is to indicate how "off point" the mean is and, in the case of even data count, where the median lies. For the "in type" mean with type = char, think of the characters lined up left to right with some of them underlined as being in the data set. The mean represents the center of mass of the underlined characters, and can be near a character that is not in the data set. The median is either the middle element or "half way between" the two middle elements, which is signalled by a bracketed pair.
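The numbers in the two sample outputs can be checked by hand or with a short sketch like the one below. The names ByteAverage and BaseChar are hypothetical; this is not the required Mean template and not the code stattest.cpp uses, only a way to confirm the arithmetic under an ASCII assumption.

```cpp
#include <cmath>
#include <string>

// Hypothetical check: averages the byte values of the characters in s.
// Assumes s is non-empty and the character set is ASCII.
long double ByteAverage (const std::string& s)
{
  long double sum = 0.0L;
  for (std::string::size_type i = 0; i < s.size(); ++i)
  {
    sum += static_cast<unsigned char>(s[i]);
  }
  return sum / s.size();
}

// Hypothetical check: the character at or just below a numerical mean,
// for example 69.5 maps to 'E' (byte value 69).
char BaseChar (long double mean)
{
  return static_cast<char>(std::floor(mean));
}
```

For APPLE the byte values are 65, 80, 80, 76, 69, which sum to 370, so the mean is 74 and BaseChar gives J. For BANANA the sum is 417 over 6 entries, so the mean is 69.5, which is reported as E + 0.5.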

If the type is anything other than a char type, the input operator is expecting to see digits. If it is a signed type it can handle a leading +/- sign. If it is a float type, it can handle a period (.) someplace in the string.

Note that these are all controlled by the overloads of the input and output operators whose prototypes are:

template < typename T >
std::istream& operator>> (std::istream& is, T& tref);

template < typename T >
std::ostream& operator<< (std::ostream& os, const T& tval);

for a given type T. When T = char, the input operator grabs a single character (other than whitespace). When T = unsigned int, it grabs as long a string of digits as it can find, stopping when whitespace or a non-digit is encountered. Note that these overloads of operator>> and operator<< are provided by istream and ostream, respectively, for all of the native types.

Example: Suppose we have a file with this content:
22 3 5 17 105 -56 13 16 A 65 P 22 21
and consider what happens for 3 different types: int, unsigned int, and char.

For type T = int these numbers will be read, stopping when the 'A' is encountered:
22 3 5 17 105 -56 13 16
For type T = unsigned int these numbers will be read, stopping when the '-' is encountered:
22 3 5 17 105
For type T = char these characters will be read, stopping when the end of file is encountered:
2 2 3 5 1 7 1 0 5 - 5 6 1 3 1 6 A 6 5 P 2 2 2 1
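The int and char cases above can be reproduced with a short sketch that reads from a string instead of a file. The helper name ReadAll is hypothetical and is not taken from stattest.cpp; it only shows the usual "read until extraction fails" loop.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts values of type T from text with operator>>
// until an extraction fails (bad input for T, or end of input).
template < typename T >
std::vector<T> ReadAll (const std::string& text)
{
  std::istringstream iss(text);
  std::vector<T> result;
  T item;
  while (iss >> item)
  {
    result.push_back(item);
  }
  return result;
}
```

Applied to the sample content, ReadAll<int> returns the 8 values ending in 16, because extraction fails at the 'A'. ReadAll<char> returns all 24 non-whitespace characters, since every one of them is a valid char.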

Hints

The behavior required is essentially the same as that from project 1 "Stats". Sample executables may be found in LIB/area51.
Copy LIB/proj6/runtests.sh. Read and understand it. How many tests does it run? (Hint: a famous futuristic novel is titled 19xx).
Change permissions on runtests.sh to 700. Create the four data files using the names that runtests.sh is expecting. Be sure your data files are varied, so that int and float types get exercised with both even and odd sizes. Execute runtests.sh. If you get no errors and no differences to screen, you are probably OK. It is best to redo the data sets a few times to be more certain.