Project 3: BitVector & Applications

Implementing BitVector and Discovering Prime Numbers

Revision dated 01/05/19

Educational Objectives. After successfully completing this assignment, the student should be able to accomplish the following:

Implement a class from a class definition
Implement class BitVector
Implement global operators for a class
Implement global operators for class BitVector
Correctly separate class definition and implementation using files
Create executables of class client programs using makefiles and the Make utility
Test a class using specs and an existing test platform
Create client applications of BitVector

Operational Objectives: Implement the class BitVector and clients Sieve and PrimeUnder along with a makefile for the supplied test clients.

Deliverables: bitvect.cpp, primes.h, primes.cpp, makefile, log.txt

Assessment Rubric


=====================================================
student build:                            [0..5]:   x
    fbitvect.x                                  
    prime_below.x
    all_primes_below.x
assess build:                             [0..5]:   x
    fbitvect.x
    prime_below.x
    all_primes_below.x
test:
    fbitvect.x         bv.com1            [0..5]:   x
    fbitvect.x         bv.com2            [0..5]:   x
    fbitvect.x         bv.com3            [0..5]:   x
    fbitvect.x         bv.com4            [0..5]:   x
    prime_below.x      (prime)            [0..5]:   x
    prime_below.x      (non-prime)        [0..5]:   x
    all_primes_below.x (prime)            [0..5]:   x
    all_primes_below.x (non-prime)        [0..5]:   x
code:
    constructor                          [-2..0]: ( x)
    copy constructor                     [-2..0]: ( x)
    destructor                           [-2..0]: ( x)
    assignment operator                  [-2..0]: ( x)
engineering etc:
    readability                         [-20..0]: (xx)
    requirements                        [-20..0]: (xx)
    coding standard                     [-20..0]: (xx)
dated submission deduction           [2 pts per]: (xx)
                                                   --
total                                    [0..50]:  xx
=====================================================

Note that points are added for test results and subtracted during code review.

Background

See lecture notes Chapter 4. Classes Part 1, Chapter 5. Pointers, Chapter 6. Classes Part 2, and Chapter 8. BitVectors.

The Sieve of Eratosthenes

Assume that b is a vector of bits indexed in the range [0 ... n). Denote the "value" of bit k by b[k]. The Sieve of Eratosthenes is a process that operates on a bit vector b, as follows:

Begin with a bitvector b indexed in the range 0 ≤ k < n. Our goal is to unset the bits for all composite numbers below n, so that b[k] = 1 if and only if k is prime.
Initialize b by setting all bits.
Unset b[0] and b[1] (because 0 and 1 are not prime).
For k between 2 and the square root of n, stepsize 1:
  if b[k] is set
    for j between k + k and n, stepsize k:
      unset b[j]
Stop.

In short, unset the bit of every proper multiple of each prime that does not exceed the square root of n.
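
The steps above can be sketched directly in C++. This is a minimal sketch, assuming std::vector<bool> as a stand-in for the bit vector; the name BasicSieve is hypothetical and is not part of the project files.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unoptimized Sieve of Eratosthenes on a std::vector<bool>
// (an assumed stand-in for BitVector).
// post: for all k < b.size(), k is prime iff b[k] == true
void BasicSieve(std::vector<bool>& b)
{
  const std::size_t n = b.size();
  b.assign(n, true);                          // set all bits
  if (n > 0) b[0] = false;                    // 0 is not prime
  if (n > 1) b[1] = false;                    // 1 is not prime
  for (std::size_t k = 2; k * k < n; ++k)     // k runs up to the square root of n
  {
    if (b[k])                                 // k is prime
    {
      for (std::size_t j = k + k; j < n; j += k)
      {
        b[j] = false;                         // j is a proper multiple of k
      }
    }
  }
}
```

For example, after calling BasicSieve on a vector of size 30, exactly the ten indices 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29 remain set.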

Assertion 1. After invoking the sieve algorithm, an integer k in the range [0 ... n) is prime iff b[k] = 1.

The assertion is proved by mathematical induction. The base cases k = 0, 1, 2 are each easily checked by following the first few lines of the process. For the inductive step, assume the assertion is true for all index values less than k. If b[k] = 0 then some step of the process unset b[k] because k = a×b with a prime and b ≥ 2, so clearly k is composite. If b[k] = 1 then there is no factorization k = a×b in which a is prime and a² ≤ k: any such a is smaller than the square root of n and b[a] is still set when the outer loop reaches a, so the inner loop for a would have unset b[k]. That is enough to prove that k is prime, because a composite number always has a factorization of the form a×b with 1 < a ≤ b (by just writing the smaller factor first), so that a² ≤ k, and then we would have k = p×q where p is a prime factor of a and q = b×a/p. Since p² ≤ a² ≤ k, this is exactly the kind of factorization that was just ruled out.

Remark. What was Eratosthenes thinking? Clearly, the big E did not use bitvectors. His approach went something like this: Imagine the numbers 1..n all written down in a list. We will cross all the composite numbers off of the list, so that those that are left must be all of the non-composite, that is, prime, numbers. The E-man went on to describe how to cross numbers off: first cross off 1, keep 2, and then cross off all multiples of 2. Go to the next number not crossed off (which must be prime) and cross off all of its multiples. Keep going until the list is exhausted.

Here is the sieve algorithm implemented (with some minor optimizations) in C++:

  void Sieve(BitVector& b, bool ticker)
  // pre:  b is a BitVector
  // post: for all odd k < b.Size(), 
  //       k is prime iff b.Test(k) == true
  {
    // set up timer
    fsu::Timer timer;
    fsu::Instant time;
    if (ticker)
    {
      timer.EventReset();
      std::cout << '.' << std::flush; // for p = 2
    }

    // calculate max and square root of max
    const size_t max = b.Size();
    size_t sqrt = 2;
    while (sqrt*sqrt < max)
    {
      ++sqrt;
    }

    // process b                                 <-- this is the actual sieve [for odd bits only]
    b.Set();
    b.Unset(0);  // 0 is not prime
    b.Unset(1);  // 1 is not prime

    // clear bits at odd multiples of all odd primes < sqrt(max) - ignore all even bits
    size_t jump;
    for (size_t i = 3; i < sqrt; i += 2)         // see note 1 below
    {
      if (b[i])  // i is prime
      {
        if (ticker) std::cout << '.' << std::flush;
        jump = 2*i;                              // see note 2 below
        for (size_t j = i*i; j < max; j+= jump)  // clear all odd multiples of i above i*i
        {                                        // see note 3 below
          b.Unset(j);
        }
      }
    }

    // output elapsed time
    if (ticker)
    {
      time = timer.EventTime();
      std::cout << '\n';
      std::cout << " Sieve time: ";
      time.Write_seconds(std::cout,2);
      std::cout << " sec\n";
    }
  }  // end Sieve()

This code contains a few "optimizations" that save work but still produce the correct outcome for all odd indices. Notes:

1. We start i at 3 and always jump by 2, ensuring we look only at odd numbers.
2. The original algorithm jumps by i each iteration, but half of these end up with j an even number (odd + odd = even). By jumping twice as far, we skip over the even case to the next odd case (odd + even = odd).
3. The original algorithm starts at j = i+i. We start at j = i*i because all smaller multiples of i have already been considered: a multiple m*i with 1 < m < i is either even (and ignored) or has an odd prime factor p < i, in which case it was unset as a multiple of p when p was processed.
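
The effect of these optimizations can be checked on a small scale. The following is a minimal sketch, assuming std::vector<bool> as a stand-in for fsu::BitVector and leaving out the timer; OddSieve and IsPrimeByTrialDivision are hypothetical names that are not part of the project files.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Odd-only sieve on std::vector<bool>, mirroring the loop structure above.
// post: for all odd k < b.size(), k is prime iff b[k] == true;
//       even-index bits other than bit 0 stay set and carry no information
void OddSieve(std::vector<bool>& b)
{
  const std::size_t max = b.size();
  b.assign(max, true);
  if (max > 0) b[0] = false;                        // 0 is not prime
  if (max > 1) b[1] = false;                        // 1 is not prime
  for (std::size_t i = 3; i * i < max; i += 2)      // odd candidates only
  {
    if (b[i])                                       // i is prime
    {
      for (std::size_t j = i * i; j < max; j += 2 * i)
      {
        b[j] = false;                               // odd multiples of i from i*i up
      }
    }
  }
}

// Reference check by trial division (hypothetical helper, for comparison only).
bool IsPrimeByTrialDivision(std::size_t k)
{
  if (k < 2) return false;
  for (std::size_t d = 2; d * d <= k; ++d)
  {
    if (k % d == 0) return false;
  }
  return true;
}
```

Comparing b[k] with IsPrimeByTrialDivision(k) for every odd k gives agreement, while an even bit such as b[4] is still set after OddSieve returns. That leftover is why clients have to treat even numbers separately.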

By ignoring the even-index bits, the process is sped up considerably. The client programs that use this must take into account that even bits have been ignored. This is simple to deal with in practice because 2 is the only even prime number and the clients can deal with that as a special case. Here is one of the clients:

  size_t PrimeBelow (size_t n, bool ticker)
  // returns largest prime number <= n
  {
    if (n <= 1) return 0;
    if (n == 2) return 2;
    if (n >= n+1)          // see note 4
    {
      std::cerr << " ** PrimeBelow: argument too large for implementation. Execution terminating.\n";
      exit (EXIT_FAILURE);
    }
    fsu::BitVector b(1+n); // see note 5
    Sieve(b, ticker);
    if (n%2 == 0) --n;     // make n odd
    while (n > 2)          // see note 6
    {
      if (b[n])
        return n;
      n -= 2;
    }
    return 2;
  } // PrimeBelow()

Notes:

4. This is a way of checking whether n is the largest value of its type. In that case n + 1 wraps around to the smallest value of the type (0 for the unsigned type size_t), so n >= n+1 is true.
5. We need bit n, so the BitVector must hold n + 1 bits.
6. We handle the even numbers greater than 2 by ignoring them, knowing they are not prime. We accomplish this by making sure n is odd and jumping down by 2 until we encounter a prime (which we know must be odd) or 2.
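
The overflow test `n >= n+1` in PrimeBelow relies on the fact that unsigned arithmetic in C++ wraps around instead of overflowing. A minimal sketch of that test in isolation follows; the function name IncrementWraps is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Returns true when n + 1 cannot be represented in size_t.
// Unsigned arithmetic wraps modulo 2^N, so for the maximum value n + 1 == 0.
bool IncrementWraps(std::size_t n)
{
  return n >= n + 1;
}
```

IncrementWraps returns true only for std::numeric_limits<std::size_t>::max(), which is the one argument for which a BitVector of n + 1 bits cannot be requested.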

Procedural Requirements:

Copy all of the files in LIB/proj3 into your cop3330/proj3 directory. Then copy the file LIB/cpp/bitvect.h ONTO THE FILE bitvect.api in your cop3330/proj3 directory. You should now see these files (and perhaps others) in your project directory:
```
all_primes_below.cpp
bitvect.api
bitvect.start
deliverables.sh
fbitvect.cpp
prime_below.cpp
```
Begin a log file named log.txt. This should be an ASCII text file in cop3330/proj3 with the following header:
```
log.txt # log file for Prime project
<date file created>
<your name>
<your CS username>
```
This file should log all work done by date and time. It should also include a discussion of testing - how was it done, what were the results, and how any modifications were made.
Familiarize yourself with the BitVector header code in your library: LIB/cpp/bitvect.h. Both the API and implementation are discussed in the class notes.
Review the make tutorial (U Hawaii College of Engineering) linked from the organizer.
Copy the file bitvect.start onto bitvect.cpp. This is a good starter file that illustrates good formatting, namespace placement, and inclusion of header file. It also has implementations of the global operators and the Dump method. Complete this file to implement fsu::BitVector (as defined in bitvect.h).
As always, when you use code from assignment docs, lecture slides, or other sanctioned sources, type the code using the eyes->brain->fingers loop (or equivalent). Do not copy/paste.
Begin a project makefile that builds fbitvect.x. You will need intermediate targets bitvect.o and fbitvect.o. Make bitvect.cpp a dependency for bitvect.o.
Test your BitVector implementation thoroughly, keeping records on procedures and results.
Create the two files primes.h and primes.cpp containing the prototypes and implementations of the three prime functions, respectively. These are:
```
  size_t PrimeBelow (size_t n, bool ticker = 0);
  void AllPrimesBelow (size_t n, std::ostream& os = std::cout, bool ticker = 0);
  void Sieve (fsu::BitVector& b, bool ticker = 0);
```
Be sure that these files are correctly structured with file level doc, good formatting, and, in the header file, multiple read protection. If you copy the code from this assignment doc, do so by typing, NOT by copy/paste. You want to understand the code as you type it.

Only one of these functions will require an implementation from whole cloth ... once you have read and typed the code for PrimeBelow and Sieve you should have a good understanding of what you need to do to implement AllPrimesBelow.
Add to your makefile targets prime_below.x and all_primes_below.x (along with the intermediate targets prime_below.o and all_primes_below.o). Explicitly list dependencies only for files that you have responsibility for.
Thoroughly test your prime calculators prime_below.x and all_primes_below.x.
Turn in bitvect.cpp, primes.h, primes.cpp, makefile, and log.txt using LIB/scripts/submit.sh and LIB/proj3/deliverables.sh, following the usual procedure.
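
The "multiple read protection" required for primes.h in the steps above is usually done with an include guard. The following is a minimal sketch of that structure only. The guard macro name PRIMES_H is an assumption (follow the course code standard), file level documentation is omitted, and the forward declaration of fsu::BitVector is used here only to keep the sketch self-contained where the real header would include bitvect.h.

```cpp
#ifndef PRIMES_H     // guard macro name is an assumption
#define PRIMES_H

#include <cstddef>   // size_t
#include <iostream>  // std::ostream, std::cout

// Forward declaration keeps this sketch self-contained;
// the real header would include bitvect.h instead.
namespace fsu { class BitVector; }

size_t PrimeBelow (size_t n, bool ticker = 0);
void AllPrimesBelow (size_t n, std::ostream& os = std::cout, bool ticker = 0);
void Sieve (fsu::BitVector& b, bool ticker = 0);

#endif
```

With the guard in place, a second inclusion of the header in the same translation unit is skipped, so the default arguments are not declared twice.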

Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Technical Requirements and Specifications - BitVector

BitVector should comply with the behavior outlined in the lecture notes.
BitVector should agree exactly with the behavior elicited by fbitvect.cpp as demonstrated by LIB/area51/fbitvect_i.x.
Code implementing BitVector should comply with the code standard found in the Code Stds document linked from the course organizer.
When in doubt about required behavior, consult the executable LIB/area51/fbitvect_i.x.

Technical Requirements and Specifications - prime calculators

PrimeBelow (n, ticker) should return the largest prime number that is less than or equal to n. The ticker variable is passed to Sieve.
AllPrimesBelow (n, os, ticker) should output through os all prime numbers less than or equal to n. The ticker variable is passed to Sieve.
Sieve (b, ticker) runs the Sieve of Eratosthenes algorithm on b (regular or optimized version, student choice). If ticker != 0 the timer (and screen ticker) are activated.
When in doubt about required behavior, consult the executables LIB/area51/prime_below_i.x and LIB/area51/all_primes_below_i.x.

Hints

Note that fbitvect accepts a command file, so you can devise tests, record them in a command file, and then repeat the test with a few keystrokes. Enter the executable name with no arguments to see what is expected.
Note that we have a timing device inserted in the two prime calculators. You activate the timer with a second command line argument (after the required number). Any visible character will do - the timer is activated when the second argument exists.
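
The command line convention in the last hint can be pictured as follows. This is a hypothetical sketch of such argument handling, not the code of the supplied clients, which may differ; the name ParseArgs is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical argument handling; the supplied prime_below.cpp may differ.
// argv[1] is the required number; the mere presence of argv[2] turns the ticker on.
bool ParseArgs(int argc, char* argv[], std::size_t& n, bool& ticker)
{
  if (argc < 2) return false;                       // required number is missing
  n = static_cast<std::size_t>(std::strtoull(argv[1], nullptr, 10));
  ticker = (argc > 2);                              // any second argument activates the timer
  return true;
}
```

Under this sketch, `prime_below.x 100` runs without the timer and `prime_below.x 100 t` runs with it.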

Sieve is often used as a benchmark program. Here is some timing data we obtained for the old and new linprog machines, using an optimized version of Sieve:

   n     PrimeBelow(n)    time on old linprog2      time on new linprog7
----     -------------    --------------------      --------------------
10^2                97          0.00 sec              0.00 sec
10^3               997          0.00                  0.00
10^4              9973          0.00                  0.00
10^5             99991          0.00                  0.00
10^6            999983          0.02                  0.02
10^7           9999991          0.21                  0.13
10^8          99999989          2.49                  0.98
10^9         999999937         33.66                 13.31
10^10       9999999967        394.17 (6.57 min)     169.80 (2.83 min)
10^11      99999999977       4398.74 (73.31 min)   1924.85 (32.08 min)

This data appears to show that the runtime of Sieve() grows slightly faster than Θ(n) but considerably slower than Θ(n²): for n ≥ 10^8, each tenfold increase in n multiplies the time by a factor of roughly 11 to 14. See the Wikipedia entry for much more on optimization, runtime, and other Sieve topics.
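
One way to see where the extra growth comes from is to count the work directly. The following is a minimal sketch, assuming the unoptimized algorithm and std::vector<bool> as a stand-in for BitVector; CountUnsets is a hypothetical name. The standard analysis of the Sieve of Eratosthenes gives Θ(n log log n) for this count, which is consistent with the timings above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the unoptimized sieve unsets a bit for the range [0, n).
// Bits 0 and 1 are never read by the loops, so they are left alone here.
std::size_t CountUnsets(std::size_t n)
{
  std::vector<bool> b(n, true);
  std::size_t count = 0;
  for (std::size_t k = 2; k * k < n; ++k)
  {
    if (b[k])                                   // k is prime
    {
      for (std::size_t j = k + k; j < n; j += k)
      {
        b[j] = false;
        ++count;                                // one unit of work per unset
      }
    }
  }
  return count;
}
```

For n = 100 the primes 2, 3, 5, and 7 contribute 48, 32, 18, and 13 unsets, so CountUnsets(100) returns 111. Dividing the count by n for larger and larger n shows the slowly growing factor on top of linear growth.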