Homework 6: Heaps and Heap Sort

Improving Heap Construction and Heap_Sort Performance

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Define and give examples of generic algorithms
Define the ADT Priority Queue
Define generic sort algorithms operating on a range of values determined by iterators
Implement heap algorithms, including build_heap with linear runtime
Implement and test alternate versions of heap sort as generic algorithms
Explain why the build_heap algorithm has runtime Θ(n) but heap_sort has runtime Θ(n log n).
Use namespaces to develop and test multiple implementations of an ADT or algorithm

Operational Objectives: Implement two new generic heap algorithms fsu::g_heap_repair and fsu::g_build_heap. Move the current implementation of g_heap_sort into the alt namespace and re-implement fsu::g_heap_sort using the new algorithms. Test both versions fsu::g_heap_sort and alt::g_heap_sort using sort_spy.cpp and tease out the differences in runtimes, providing details in log.txt.

Deliverables: Two files:

gheap.h      # fsu::g_heap_sort and alt::g_heap_sort, plus other algorithms
log.txt      # your project work log

Procedural Requirements

The official development | testing | assessment environment is gnu g++47 -std=c++11 -Wall -Wextra on the linprog machines.
Create and work within a separate subdirectory cop4530/hw6.
Do your own work. Variations of this project have been used in previous courses. You are not permitted to seek help from former students or their work products. For this and all other projects, it is a violation of course ethics and the student honor code to use, or attempt to use, code from any source other than that explicitly distributed in the course code library, or to give or receive help on this project from anyone other than the course instruction staff. See Introduction/Work Rules.
Begin by copying the entire contents of the directory LIB/hw6 into your cop4530/hw6/ directory. Then copy the file LIB/tcpp/gheap.h into cop4530/hw6/. At this point you should see these files in your directory:
```
gheap.h
hsort.cpp
hw6submit.sh
ranuint.cpp
sort_spy.cpp
```
Edit gheap.h to satisfy the requirements for that deliverable.
Test your algorithms thoroughly and put your notes and conclusions into your log.txt.
Submit the assignment using the script hw6submit.sh.

Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit assignments. If you do not receive the second confirmation with the contents of your assignment, there has been a malfunction.

Heap Algorithms

This section discusses the advanced heap algorithms you are implementing for the assignment. The basic heap algorithms push_heap, pop_heap, and the vanilla version of heap_sort are discussed in the lecture notes and are implemented in the file gheap.h that you have copied. This discussion assumes that the reader is familiar with those materials.

Begin with a review of the pop_heap algorithm and its implementing code (default order version):

  template <class I>
  void g_pop_heap (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n = end - beg - 1;
    size_t i = 0, left, right;
    bool finished = 0;
    g_XC(beg[0],beg[n]);
    do
    {
      left = 2*i + 1; right = left + 1;  // left and right child nodes
      if (right < n)                     // both child nodes exist
      {
        if (beg[left] < beg[right])      // ==> follow right subtree
        {
          if (beg[i] < beg[right])
          {
            g_XC(beg[right], beg[i]);
            i = right;
          }
          else
          {
            finished = 1;
          }
        }
        else // !(beg[left] < beg[right]) ==> follow left subtree
        {
          if (beg[i] < beg[left])
          {
            g_XC(beg[left], beg[i]);
            i = left;
          }
          else
          {
            finished = 1;
          }
        }
      }
      else if (left < n)       // only the left child node exists
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
        }
        finished = 1;          // no grandchild nodes exist
      } 
      else                     // no child nodes exist
      {
        finished = 1;
      }
    }
    while (!finished);
  }

This algorithm consists of a swap of the first and last elements of a presumed heap in the range [0..n] followed by a repair of the smaller heap in the range [0..n-1]. The repair works because the two children of the root are heaps, so the only place where the heap conditions might be violated is at the root. The repair portion of this code is everything that follows the initial swap g_XC(beg[0],beg[n]).

We can create another function called "repair" using that repair code:

  void repair (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n = end - beg;              // repair the whole range [beg,end)
    size_t i = 0, left, right;
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;  // left and right child nodes
      if (right < n)                     // both child nodes exist
      {
        if (beg[left] < beg[right])      // ==> follow right subtree
        {
          if (beg[i] < beg[right])
          {
            g_XC(beg[right], beg[i]);
            i = right;
          }
          else
          {
            finished = 1;
          }
        }
        else // !(beg[left] < beg[right]) ==> follow left subtree
        {
          if (beg[i] < beg[left])
          {
            g_XC(beg[left], beg[i]);
            i = left;
          }
          else
          {
            finished = 1;
          }
        }
      }
      else if (left < n)       // only the left child node exists
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
        }
        finished = 1;          // no grandchild nodes exist
      } 
      else                     // no child nodes exist
      {
        finished = 1;
      }
    }
    while (!finished);
  }

and we can refactor this code into a more compact form as:

  void repair (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n,i,left,right,largest;
    n = end - beg;
    i = 0;
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;
      largest = ((left < n && beg[i] < beg[left]) ? left : i);
      if (right < n && beg[largest] < beg[right])
        largest = right;
      // test order property at i; if bad, swap and repeat
      if (largest != i)
      {
        fsu::g_XC(beg[i],beg[largest]);
        i = largest;
      }
      else finished = 1;
    }
    while (!finished);
  }

Be sure to convince yourself that these two blocks of code implement "repair" in exactly the same way. We can take "repair" one step further, to repair any node in the tree under the assumption that its child nodes are heaps. The place where repair is needed is passed in as a third iterator:

  void repair (I beg, I loc, I end)
  {
    if (end - beg < 2)
      return;
    size_t n,i,left,right,largest;
    n = end - beg;
    i = loc - beg; // the only change in the body: start at loc instead of the root
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;
      largest = ((left < n && beg[i] < beg[left]) ? left : i);
      if (right < n && beg[largest] < beg[right])
        largest = right;

      // test order property at i; if bad, swap and repeat
      if (largest != i)
      {
        fsu::g_XC(beg[i],beg[largest]);
        i = largest;
      }
      else finished = 1;
    }
    while (!finished);
  }

This code defines our new generic algorithm g_heap_repair(I beg, I loc, I end). We can immediately refactor g_pop_heap as follows:

  void g_pop_heap (I beg, I end)
  {
    if (end - beg < 2)
      return;
    g_XC(*beg,*(end - 1));
    g_heap_repair(beg,beg,end - 1);
  }
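
To see the effect of the loc argument in isolation, here is a minimal self-contained sketch of the default order repair together with a heap checker. The names heap_repair_sketch and is_max_heap_sketch, and the use of std::swap in place of g_XC, are assumptions made for illustration; they are not part of the course library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for g_heap_repair (default order): sift the element
// at loc down until the max-heap property holds in the subtree rooted at loc.
// Assumes random access iterators and that both child subtrees are heaps.
template <class I>
void heap_repair_sketch(I beg, I loc, I end)
{
  assert(beg <= loc && loc <= end);
  std::size_t n = end - beg;
  std::size_t i = loc - beg;
  for (;;)
  {
    std::size_t left = 2*i + 1, right = left + 1, largest = i;
    if (left < n && beg[largest] < beg[left])   largest = left;
    if (right < n && beg[largest] < beg[right]) largest = right;
    if (largest == i) return;            // order property holds at i
    std::swap(beg[i], beg[largest]);     // otherwise swap and follow the child
    i = largest;
  }
}

// True when no child is greater than its parent.
template <class I>
bool is_max_heap_sketch(I beg, I end)
{
  std::size_t n = end - beg;
  for (std::size_t i = 1; i < n; ++i)
    if (beg[(i - 1)/2] < beg[i]) return false;
  return true;
}
```

For instance, in the range 9 1 8 4 5 7 6 the only violation is at index 1, and a call heap_repair_sketch(v.begin(), v.begin() + 1, v.end()) leaves 9 5 8 4 1 7 6, which is a heap.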

Moreover, we can use heap_repair as another way to create a heap from an arbitrary array (or other range):

  void fsu::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = size/2; i > 0; --i)
    {
      g_heap_repair(beg, beg + (i - 1), end);
    }
  }

The other way to build a heap from scratch is embodied in the first loop in the vanilla version of heap sort:

  void alt::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = 1; i < size; ++i)
    {
      g_push_heap(beg, beg + (i + 1));
    }
  }

The remarkable facts are:

alt::build_heap (using a loop of calls to push_heap) has runtime Θ(n log n)
fsu::build_heap (using a loop of calls to heap_repair) has runtime Θ(n)

These facts are all the more remarkable when you consider that the worst-case runtime of both push_heap and heap_repair is Θ(log n) and that the loop makes Θ(n) calls in both cases. We will come back to explaining why these are true later. For now, just contemplate the subtlety, and realize that this provides an opportunity to improve the performance of heap_sort by substituting fsu::build_heap for the first loop of calls to push_heap. This change does not affect the asymptotic runtime of heap_sort, because the second loop still runs in Θ(n log n), but it does reduce the number of comparisons made while building the heap.
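
The difference between the two constructions can be observed directly by counting comparisons. The following is a rough sketch, not the course implementation: the namespaces demo_fsu and demo_alt, the CountingLess predicate, and the function count_on_increasing_input are all hypothetical names, and the sift loops are written inline rather than calling g_heap_repair and g_push_heap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Predicate that counts how many times it is called (hypothetical helper).
struct CountingLess
{
  std::size_t count = 0;
  bool operator()(int a, int b) { ++count; return a < b; }
};

namespace demo_fsu   // build by repairing subtrees, from the last parent up to the root
{
  template <class I, class P>
  void build_heap(I beg, I end, P& p)
  {
    std::size_t n = end - beg;
    for (std::size_t start = n/2; start > 0; --start)
    {
      std::size_t i = start - 1;
      for (;;)                                   // sift beg[i] down
      {
        std::size_t left = 2*i + 1, right = left + 1, largest = i;
        if (left < n && p(beg[largest], beg[left]))   largest = left;
        if (right < n && p(beg[largest], beg[right])) largest = right;
        if (largest == i) break;
        std::swap(beg[i], beg[largest]);
        i = largest;
      }
    }
  }
}

namespace demo_alt   // build by pushing the elements one at a time
{
  template <class I, class P>
  void build_heap(I beg, I end, P& p)
  {
    std::size_t n = end - beg;
    for (std::size_t k = 1; k < n; ++k)
    {
      std::size_t i = k;                         // sift beg[k] up
      while (i > 0 && p(beg[(i - 1)/2], beg[i]))
      {
        std::swap(beg[(i - 1)/2], beg[i]);
        i = (i - 1)/2;
      }
    }
  }
}

// Comparison counts (repair-based, push-based) on the input 1, 2, ..., n.
std::pair<std::size_t, std::size_t> count_on_increasing_input(std::size_t n)
{
  std::vector<int> a(n);
  for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<int>(i + 1);
  std::vector<int> b(a);
  CountingLess pa, pb;
  demo_fsu::build_heap(a.begin(), a.end(), pa);
  demo_alt::build_heap(b.begin(), b.end(), pb);
  assert(pa.count <= 2*n);                       // linear bound on the repair-based build
  return std::make_pair(pa.count, pb.count);
}
```

Increasing input is a worst case for the push-based build, because every new element rises all the way to the root. With n = 1023 the push-based count is 8194, while the repair-based count stays below 2n = 2046.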

Code Requirements and Specifications

Begin by moving the current two versions of heap_sort into the namespace alt, keeping only the prototypes in namespace fsu. You will also need to add the namespace fsu:: resolution to the calls to push_heap and pop_heap. The effect is that the old "fsu::g_heap_sort" is now "alt::g_heap_sort".

Add these algorithm prototypes to namespace fsu (including the requisite template statements):

g_build_heap(I beg, I end, P& p);
g_build_heap(I beg, I end);
g_heap_repair(I beg, I loc, I end, P& p);
g_heap_repair(I beg, I loc, I end);

so that the totality of prototypes in the file is:

namespace fsu
{
  g_push_heap   (I beg, I end, P& p);
  g_pop_heap    (I beg, I end, P& p);
  g_heap_sort   (I beg, I end, P& p);
  g_build_heap  (I beg, I end, P& p);
  g_heap_repair (I beg, I loc, I end, P& p);

  g_push_heap   (I beg, I end);
  g_pop_heap    (I beg, I end);
  g_heap_sort   (I beg, I end);
  g_build_heap  (I beg, I end);
  g_heap_repair (I beg, I loc, I end);
} 

namespace alt
{
  g_heap_sort   (I beg, I end, P& p);
  g_heap_sort   (I beg, I end);
}

The implementations for the alt versions should also be already there in the file, where you changed the namespace to include them under alt.
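
As an illustration of what a prototype looks like once its template statement is written out, here is a small sketch for the two versions of heap repair. The namespace proto_sketch is a hypothetical stand-in for fsu, std::swap stands in for g_XC, and letting the default order version forward to the predicate version through std::less is one possible design, not a requirement of the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

namespace proto_sketch   // hypothetical namespace standing in for fsu
{
  // prototypes, each preceded by its template statement
  template <class I, class P>
  void g_heap_repair (I beg, I loc, I end, P& p);

  template <class I>
  void g_heap_repair (I beg, I loc, I end);

  // predicate version: p(a,b) plays the role of a < b
  template <class I, class P>
  void g_heap_repair (I beg, I loc, I end, P& p)
  {
    assert(beg <= loc && loc <= end);
    std::size_t n = end - beg;
    std::size_t i = loc - beg;
    for (;;)
    {
      std::size_t left = 2*i + 1, right = left + 1, largest = i;
      if (left < n && p(beg[largest], beg[left]))   largest = left;
      if (right < n && p(beg[largest], beg[right])) largest = right;
      if (largest == i) return;
      std::swap(beg[i], beg[largest]);
      i = largest;
    }
  }

  // default order version: forwards to the predicate version with operator <
  template <class I>
  void g_heap_repair (I beg, I loc, I end)
  {
    std::less<typename std::iterator_traits<I>::value_type> lt;
    proto_sketch::g_heap_repair(beg, loc, end, lt);
  }
}
```

Because the predicate is passed as P&, the caller must supply a named predicate object, for example std::greater<int> gt; followed by proto_sketch::g_heap_repair(beg, loc, end, gt); to repair a min-heap.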

Implement all of the fsu algorithms in the namespace fsu. The implementation of fsu::g_push_heap can be the same as it was in the old version. The new algorithms g_heap_repair and g_build_heap require new implementations, using the ideas outlined above in this document. Finally, g_pop_heap and g_heap_sort require the improved implementations, again as outlined above.
Test your various implementations using the supplied sort_spy.cpp. Note that this version of sort_spy has the additional feature of checking the results of each sort and reporting the number of order errors in the result. Zero is good. Anything else tells you the "sort" algorithm is misnamed.
Once you are sure you have the implementations correct, begin to pay attention to the comp_count data for the two versions of heap_sort. (Keep observations in your log.) Try to tease out a distinction between the two.
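
The overall shape of the improved sort, together with the kind of order check that sort_spy reports, can be sketched as follows. This is only an outline under stated assumptions: the namespace hsort_sketch and all function names in it are hypothetical, std::swap stands in for g_XC, only the default order version is shown, and order_errors is our own guess at a reasonable check, not the code used by sort_spy.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace hsort_sketch   // hypothetical names, not the course library
{
  template <class I>
  void heap_repair(I beg, I loc, I end)
  {
    assert(beg <= loc && loc <= end);
    std::size_t n = end - beg;
    std::size_t i = loc - beg;
    for (;;)
    {
      std::size_t left = 2*i + 1, right = left + 1, largest = i;
      if (left < n && beg[largest] < beg[left])   largest = left;
      if (right < n && beg[largest] < beg[right]) largest = right;
      if (largest == i) return;
      std::swap(beg[i], beg[largest]);
      i = largest;
    }
  }

  template <class I>
  void build_heap(I beg, I end)
  {
    std::size_t size = end - beg;
    for (std::size_t i = size/2; i > 0; --i)
      hsort_sketch::heap_repair(beg, beg + (i - 1), end);
  }

  // heap sort: one linear-time build, then a loop of "pop" steps
  template <class I>
  void heap_sort(I beg, I end)
  {
    if (end - beg < 2) return;
    hsort_sketch::build_heap(beg, end);
    while (end - beg > 1)
    {
      --end;                                  // shrink the heap by one
      std::swap(*beg, *end);                  // largest element goes to its final place
      hsort_sketch::heap_repair(beg, beg, end);
    }
  }

  // number of adjacent pairs that are out of order; zero means sorted
  template <class I>
  std::size_t order_errors(I beg, I end)
  {
    std::size_t errors = 0;
    if (beg == end) return errors;
    for (I next = beg + 1; next != end; ++beg, ++next)
      if (*next < *beg) ++errors;
    return errors;
  }
}
```

A quick check is to sort a small std::vector<int> with hsort_sketch::heap_sort and confirm that order_errors returns zero afterwards.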

Runtime of fsu :: build_heap

In the discussion of heap algorithms we asserted that the build_heap algorithm has runtime O(n), which is a surprising result given the organization of the algorithm as a loop of n/2 calls to a function whose worst-case runtime is clearly Θ(log n).

To gain some intuition on this fact, notice that the algorithm can be described as follows: For each subtree, starting with the smallest and progressing to the largest, repair the structure to be a heap. This process starts out at the bottom of the tree - i.e., the leaves of the tree, which are by default already heaps. So until we reach a node with a child, there is nothing to repair (which is why we can start the loop at n/2 - the leaves are the nodes with height 0). We first go through all of the nodes with height 1, repairing as we go; then all the nodes with height 2, and so on, until we hit the node with largest height, the root, which has height log₂ n = lg n. Notice that 1/2 of the nodes have height 0 with no repair needed. Also 1/2 of the remaining nodes have height 1, so the repair process requires at most one swap. As we get nearer the top of the tree, where the "tall" nodes are, there are very few of them to repair.

Let's say there are N(k) nodes with height k in the tree. Since heap_repair at one of these nodes requires at worst k comparisons, the total number of comparisons is no greater than the sum of k * N(k), taken over all possible heights k:

comp_count <= Σ_k k*N(k)

The number of nodes of height k can be calculated as no greater than the ceiling of n/2^(k+1). Substituting into the summation yields

comp_count <= Σ_k k*ceil(n/2^(k+1)) <= Σ_k k*[n/2^k] = n Σ_k [k/2^k]
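
The height counts N(k) and the ceiling bound can be checked numerically for small trees. Here is a minimal sketch; the function names height_counts, ceiling_bound and bound_holds are hypothetical, and the height of a node is computed by following left children, which gives the longest downward path in a complete binary tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[k] = N(k), the number of nodes of height k in a heap with n nodes
std::vector<std::size_t> height_counts(std::size_t n)
{
  std::vector<std::size_t> counts;
  for (std::size_t i = 0; i < n; ++i)
  {
    std::size_t h = 0;
    for (std::size_t j = i; 2*j + 1 < n; j = 2*j + 1)  // follow left children
      ++h;
    if (counts.size() <= h) counts.resize(h + 1, 0);
    ++counts[h];
  }
  return counts;
}

// ceil(n / 2^(k+1)) in integer arithmetic
std::size_t ceiling_bound(std::size_t n, std::size_t k)
{
  assert(k + 1 < 8*sizeof(std::size_t));   // keep the shift in range
  std::size_t d = std::size_t(1) << (k + 1);
  return (n + d - 1) / d;
}

// true when N(k) <= ceil(n / 2^(k+1)) for every height k
bool bound_holds(std::size_t n)
{
  std::vector<std::size_t> counts = height_counts(n);
  for (std::size_t k = 0; k < counts.size(); ++k)
    if (counts[k] > ceiling_bound(n, k)) return false;
  return true;
}
```

For n = 10 the counts are N(0) = 5, N(1) = 3, N(2) = 1, N(3) = 1, against the bounds 5, 3, 2, 1.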

A fact from Discrete Math is:

Σ_k k*a^k = a/(1 - a)^2, provided |a| < 1.

(The sum extends to infinity. See Rosen, Discrete Math, xxx.) Taking a = 1/2, we then have

Σ_k[k/2^k] <= 2
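
For readers who want to see where the value 2 comes from, the closed form of the sum follows from the geometric series by differentiating term by term. The derivation below is a standard one, written out here as a worked step:

```latex
\begin{align*}
\sum_{k=0}^{\infty} a^{k} &= \frac{1}{1-a}, \qquad |a| < 1 \\
\sum_{k=0}^{\infty} k\,a^{k-1} &= \frac{1}{(1-a)^{2}} && \text{(differentiate with respect to } a\text{)} \\
\sum_{k=0}^{\infty} k\,a^{k} &= \frac{a}{(1-a)^{2}} && \text{(multiply by } a\text{)} \\
\sum_{k=0}^{\infty} \frac{k}{2^{k}} &= \frac{1/2}{(1/2)^{2}} = 2 && \text{(take } a = 1/2\text{)}
\end{align*}
```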

Extending our sum from 0 to infinity and applying this fact, we have

comp_count <= n Σ_k[k/2^k] <= 2n

which verifies that

comp_count <= O(n) and therefore
fsu::build_heap <= O(n).

One final interesting note: Worst-case comp_count for the algorithm has been established exactly!
In [Suchenek, Marek A. (2012), "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program", Fundamenta Informaticae (IOS Press) 120 (1): 75] Suchenek shows that

comp_count = 2n - 2s₂(n) - e₂(n)

in the worst case, where s₂(n) is the number of 1's in the binary representation of n and e₂(n) is the exponent of 2 in the prime decomposition of n.
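
The formula is easy to evaluate, which gives a handy reference value when you inspect comp_count data. A small sketch follows, with hypothetical function names; it simply computes the expression quoted above and assumes n >= 1.

```cpp
#include <cassert>
#include <cstddef>

// s2(n): number of 1's in the binary representation of n
std::size_t binary_ones(std::size_t n)
{
  std::size_t count = 0;
  for (; n != 0; n >>= 1) count += (n & 1);
  return count;
}

// e2(n): exponent of 2 in the prime decomposition of n (requires n > 0)
std::size_t exponent_of_two(std::size_t n)
{
  assert(n > 0);
  std::size_t e = 0;
  for (; (n & 1) == 0; n >>= 1) ++e;
  return e;
}

// 2n - 2*s2(n) - e2(n), the worst-case comparison count quoted above
std::size_t worst_case_build_heap_comps(std::size_t n)
{
  assert(n > 0);
  return 2*n - 2*binary_ones(n) - exponent_of_two(n);
}
```

For example, n = 4 gives 8 - 2 - 2 = 4 and n = 7 gives 14 - 6 - 0 = 8, both below the cruder bound 2n.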

Runtime of alt :: build_heap

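The opposite conclusion holds for the basic or "alt" version, which builds a heap with a loop of calls to push_heap. In that algorithm, the call to push_heap on the sub-range [0,k+1) may require lg(k) comparisons in the worst case (for example, when the input is already in increasing order, so that every new element rises to the root), so the entire algorithm may require

comp_count >= Σ_k lg k

comparisons, the sum taken over k = 1, ..., n-1. Roughly half of these terms have k >= n/2 and so are at least lg(n/2), which makes the sum at least about (n/2) lg(n/2). Hence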

comp_count >= Ω(n log n) and therefore
alt::build_heap >= Ω(n log n).

The naming of this basic build_heap algorithm as "alt::build_heap" is not standard and should not be used outside this course.

Hints

By compiling hsort.cpp, you have a test of the default order version of your new heap_sort.
There is an executable you can use to check your comp_count data against what we think is correct: [LIB]/area51/sort_spy_all.x.
One optimization you want to use, after all code is debugged and tested, is lowering the cost of the control arithmetic used in the various algorithms. Here multiplying and dividing by 2 are used to calculate the child and parent indices. This integer arithmetic can be made much faster by using the observations:
1. Multiplying by 2 is the same as left shift by 1
2. Dividing by 2 is the same as right shift by 1
3. Adding 1 to a known multiple of 2 is the same as bitwise or with the mask 0x01
For example:
```
left = 2*i + 1;                 // uses integer arithmetic
left = (i << 1) | (size_t)0x01; // uses bitwise operations to get same result

parent = i/2;      // integer arithmetic
parent = (i >> 1); // same result using bitwise operations
```
Because integer multiplication and division are carried out by multi-step algorithms, a single operation may need quite a few clock cycles. The bitwise operations, on the other hand, have direct hardware support and may run in as little as one clock cycle.
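
Before substituting the bitwise forms into debugged code, it is worth confirming that they produce the same indices. The following sketch does that for a range of values; the function names are hypothetical, and the check assumes unsigned index arithmetic with size_t, as in the examples above.

```cpp
#include <cassert>
#include <cstddef>

std::size_t left_arith(std::size_t i) { return 2*i + 1; }                        // integer arithmetic
std::size_t left_bits (std::size_t i) { return (i << 1) | (std::size_t)0x01; }   // shift and mask
std::size_t half_arith(std::size_t i) { return i/2; }                            // integer arithmetic
std::size_t half_bits (std::size_t i) { return i >> 1; }                         // shift

// true when both forms agree for every i in [0, limit)
bool bitwise_forms_agree(std::size_t limit)
{
  assert(limit <= (std::size_t(-1) >> 1));   // 2*i + 1 must not overflow
  for (std::size_t i = 0; i < limit; ++i)
  {
    if (left_arith(i) != left_bits(i)) return false;
    if (half_arith(i) != half_bits(i)) return false;
  }
  return true;
}
```

The mask form of "add 1" is valid only because i << 1 is always even, so its lowest bit is known to be 0.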