Advanced Heaps and Heap Sort

Improving Heap Construction and Heap_Sort Performance

Version 09/15/16

Heap Algorithms

This section discusses the advanced heap algorithms implemented in gheap_advanced.h. The basic heap algorithms push_heap, pop_heap, and the vanilla version of heap_sort are discussed in the COP4530 lecture notes and are implemented in the file cop4530/LIB/tcpp/gheap.h. This discussion assumes that the reader is familiar with those materials.

Begin with a review of the pop_heap algorithm and its implementing code (default order version):

  template <class I>
  void g_pop_heap (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n = end - beg - 1;
    size_t i = 0, left, right;
    bool finished = 0;
    g_XC(beg[0],beg[n]);
    do
    {
      left = 2*i + 1; right = left + 1;  // left and right child nodes
      if (right < n)                     // both child nodes exist
      {
        if (beg[left] < beg[right])      // ==> follow right subtree
        {
          if (beg[i] < beg[right])
          {
            g_XC(beg[right], beg[i]);
            i = right;
          }
          else
          {
            finished = 1;
          }
        }
        else // !(beg[left] < beg[right]) ==> follow left subtree
        {
          if (beg[i] < beg[left])
          {
            g_XC(beg[left], beg[i]);
            i = left;
          }
          else
          {
            finished = 1;
          }
        }
      }
      else if (left < n)       // only the left child node exists
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
        }
        finished = 1;          // no grandchild nodes exist
      } 
      else                     // no child nodes exist
      {
        finished = 1;
      }
    }
    while (!finished);
  }

This algorithm consists of a swap of the first and last elements of a presumed heap in the range [0..n] followed by a repair of the smaller heap in the range [0..n-1]. The repair works because the two subtrees of the root are heaps, so the only place where the heap order property might be violated is at the root. The repair portion of this code is the do/while loop.
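
As a quick illustration, here is a minimal driver sketch for g_pop_heap (assuming gheap_advanced.h is on the include path and the fsu namespace is organized as described in the Code Organization section below; the array contents are chosen to form a max-heap):

  #include <iostream>
  #include "gheap_advanced.h"

  int main()
  {
    int a[] = {9, 7, 8, 3, 1, 4};           // a max-heap on [0..5]
    fsu::g_pop_heap(a, a + 6);              // swap a[0],a[5], then repair [0..4]
    for (int x : a) std::cout << x << ' ';  // prints: 8 7 4 3 1 9
    std::cout << '\n';
    return 0;
  }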

We can extract that repair loop into a standalone function called "repair":

  template <class I>
  void repair (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n = end - beg; // repair the full range [beg,end)
    size_t i = 0, left, right;
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;  // left and right child nodes
      if (right < n)                     // both child nodes exist
      {
        if (beg[left] < beg[right])      // ==> follow right subtree
        {
          if (beg[i] < beg[right])
          {
            g_XC(beg[right], beg[i]);
            i = right;
          }
          else
          {
            finished = 1;
          }
        }
        else // !(beg[left] < beg[right]) ==> follow left subtree
        {
          if (beg[i] < beg[left])
          {
            g_XC(beg[left], beg[i]);
            i = left;
          }
          else
          {
            finished = 1;
          }
        }
      }
      else if (left < n)       // only the left child node exists
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
        }
        finished = 1;          // no grandchild nodes exist
      } 
      else                     // no child nodes exist
      {
        finished = 1;
      }
    }
    while (!finished);
  }

and we can refactor this code into a more compact form as:

  template <class I>
  void repair (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n,i,left,right,largest;
    n = end - beg;
    i = 0;
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;
      largest = ((left < n && beg[i] < beg[left]) ? left : i);
      if (right < n && beg[largest] < beg[right])
        largest = right;
      // test order property at i; if bad, swap and repeat
      if (largest != i)
      {
        fsu::g_XC(beg[i],beg[largest]);
        i = largest;
      }
      else finished = 1;
    }
    while (!finished);
  }

Be sure to convince yourself that these two blocks of code implement "repair" in exactly the same way. We can take "repair" one step further, to repair any node in the tree under the assumption that its two subtrees are heaps. The node where repair is needed is passed in as an additional iterator loc:

  template <class I>
  void repair (I beg, I loc, I end)
  {
    if (end - beg < 2)
      return;
    size_t n,i,left,right,largest;
    n = end - beg;
    i = loc - beg; // the only change from the two-argument version
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;
      largest = ((left < n && beg[i] < beg[left]) ? left : i);
      if (right < n && beg[largest] < beg[right])
        largest = right;

      // test order property at i; if bad, swap and repeat
      if (largest != i)
      {
        fsu::g_XC(beg[i],beg[largest]);
        i = largest;
      }
      else finished = 1;
    }
    while (!finished);
  }

This code defines our new generic algorithm g_heap_repair(I beg, I loc, I end). We can immediately refactor g_pop_heap as follows:

  template <class I>
  void g_pop_heap (I beg, I end)
  {
    if (end - beg < 2)
      return;
    g_XC(*beg,*(end - 1));
    g_heap_repair(beg,beg,end - 1);
  }

Moreover, we can use heap_repair as another way to create a heap from an arbitrary array (or other range):

  template <class I>
  void fsu::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = size/2; i > 0; --i)
    {
      g_heap_repair(beg, beg + (i - 1), end);
    }
  }

The other way to build a heap from scratch is embodied in the first loop in the vanilla version of heap sort:

  template <class I>
  void alt::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = 1; i < size; ++i)
    {
      g_push_heap(beg, beg + (i + 1));
    }
  }

The remarkable facts are:

  1. alt::build_heap (using a loop of calls to push_heap) has worst-case runtime Θ(n log n)
  2. fsu::build_heap (using a loop of calls to heap_repair) has runtime Θ(n)

These facts are all the more remarkable when you consider that the worst-case runtimes of both push_heap and heap_repair are Θ(log n), and in both cases the loop makes Θ(n) calls. We will return to why these facts are true below. For now, just contemplate the subtlety, and realize that it provides an opportunity to improve the performance of heap_sort: substitute fsu::build_heap for the first loop of calls to push_heap. This change does not affect the asymptotic runtime of heap_sort, because the second loop still runs in Θ(n log n), but it measurably improves the build phase, as the counting sketch below illustrates.
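
To make the contrast concrete, here is a self-contained counting sketch, written for these notes (it is not taken from gheap_advanced.h, and the names lt, push_up, and sift_down are illustrative). It tallies element comparisons for both build strategies on ascending input, a worst case for the push_heap approach:

  #include <cstdio>
  #include <cstddef>
  #include <utility>
  #include <vector>

  size_t comp_count = 0;
  bool lt (int a, int b) { ++comp_count; return a < b; }

  // sift v[k] up toward the root: the push_heap strategy
  void push_up (std::vector<int>& v, size_t k)
  {
    while (k > 0)
    {
      size_t parent = (k - 1)/2;
      if (!lt(v[parent], v[k])) break;
      std::swap(v[parent], v[k]);
      k = parent;
    }
  }

  // sift v[i] down within [0..n): the heap_repair strategy
  void sift_down (std::vector<int>& v, size_t i, size_t n)
  {
    for (;;)
    {
      size_t left = 2*i + 1, right = left + 1, largest = i;
      if (left < n && lt(v[largest], v[left]))   largest = left;
      if (right < n && lt(v[largest], v[right])) largest = right;
      if (largest == i) return;
      std::swap(v[i], v[largest]);
      i = largest;
    }
  }

  int main()
  {
    const size_t n = 1 << 16;
    std::vector<int> a(n), b(n);
    for (size_t i = 0; i < n; ++i) a[i] = b[i] = (int)i;     // ascending input

    for (size_t k = 1; k < n; ++k) push_up(a, k);            // alt-style build
    std::printf("push_heap loop:   %zu comparisons\n", comp_count);

    comp_count = 0;
    for (size_t i = n/2; i > 0; --i) sift_down(b, i - 1, n); // fsu-style build
    std::printf("heap_repair loop: %zu comparisons\n", comp_count);
    return 0;
  }

The first count grows like n lg n while the second remains proportional to n.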

Code Organization

The vanilla versions of heap_sort are moved into the namespace alt. Although we no longer use a loop of calls to g_push_heap to build a heap, that algorithm is still useful, notably in implementing O(log n) priority queues, so both g_push_heap and g_pop_heap are retained in the fsu namespace. The algorithms are thus organized as follows in gheap_advanced.h:

namespace fsu
{
  // using a supplied order predicate:
  g_push_heap   (I beg, I end, P& p);
  g_pop_heap    (I beg, I end, P& p);
  g_build_heap  (I beg, I end, P& p);
  g_heap_repair (I beg, I loc, I end, P& p);
  g_heap_sort   (I beg, I end, P& p);

  // using default order:
  g_push_heap   (I beg, I end);
  g_pop_heap    (I beg, I end);
  g_build_heap  (I beg, I end);
  g_heap_repair (I beg, I loc, I end);
  g_heap_sort   (I beg, I end);
} 

namespace alt
{
  // vanilla versions
  g_heap_sort   (I beg, I end, P& p);
  g_heap_sort   (I beg, I end);
}

The implementations of heap_sort (default order versions) in gheap_advanced.h are:

namespace alt
{
  template <class I>
  void g_heap_sort (I beg, I end)
  {
    if (end - beg <= 1) return;
    size_t size = end - beg;
    size_t i; 
    for (i = 1; i < size; ++i)
    {
      fsu::g_push_heap(beg, beg + (i + 1));
    }
    for (i = size; i > 1; --i)
    {
      fsu::g_pop_heap(beg, beg + i);
    }
  }
}

namespace fsu
{
  template <class I>
  void g_heap_sort (I beg, I end)
  {
    if (end - beg <= 1) return;
    size_t size = end - beg;
    size_t i; 
    fsu::g_build_heap(beg, end);
    for (i = size; i > 1; --i)
    {
      g_pop_heap(beg, beg + i);
    }
  }
}

The Θ(n log n) loop of calls to g_push_heap in the alt version is replaced with the Θ(n) call to fsu::g_build_heap(beg, end) in the fsu version.
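
For completeness, a minimal driver sketch for the improved sort (same include-path assumption as in the earlier example):

  #include <iostream>
  #include "gheap_advanced.h"

  int main()
  {
    int a[] = {5, 2, 9, 1, 7, 3};
    fsu::g_heap_sort(a, a + 6);             // build heap in Θ(n), then pop n-1 times
    for (int x : a) std::cout << x << ' ';  // prints: 1 2 3 5 7 9
    std::cout << '\n';
    return 0;
  }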

Runtime of fsu::build_heap

In the discussion of heap algorithms we asserted that the fsu::build_heap algorithm has runtime O(n), a surprising result given that the algorithm is organized as a loop of n/2 calls to a function whose worst-case runtime is clearly Θ(log n).

To gain some intuition for this fact, notice that the algorithm can be described as follows: for each subtree, starting with the smallest and progressing to the largest, repair the structure to be a heap. This process starts at the bottom of the tree, with the leaves, which are already heaps by default. So until we reach a node with a child, there is nothing to repair (which is why we can start the loop at n/2 - the leaves are the nodes with height 0). We first go through all of the nodes with height 1, repairing as we go; then all the nodes with height 2, and so on, until we reach the node with largest height, the root, which has height lg n (log base 2). Notice that 1/2 of the nodes have height 0 and need no repair. Also 1/2 of the remaining nodes have height 1, so repairing each requires at most one swap. As we get nearer the top of the tree, where the "tall" nodes are, there are very few of them left to repair.

Let's say there are N(k) nodes with height k in the tree. Since heap_repair at one of these nodes requires at most 2k comparisons (two comparisons per level descended), the total number of comparisons is no greater than 2 times the sum of k*N(k), the sum taken over all possible heights k:

comp_count <= 2 Σ_k k*N(k)

The number of nodes of height k is no greater than ceil(n/2^(k+1)). Moreover, heights range only from 0 to lg n, so n/2^(k+1) >= 1/2 and hence ceil(n/2^(k+1)) <= n/2^k. Substituting into the summation yields

comp_count <= 2 Σ_k k*ceil(n/2^(k+1)) <= 2 Σ_k k*(n/2^k) = 2n Σ_k [k/2^k]
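
For example, with n = 15 (a perfect tree of height 3) we have N(0) = 8, N(1) = 4, N(2) = 2, and N(3) = 1, so Σ_k k*N(k) = 0*8 + 1*4 + 2*2 + 3*1 = 11 and comp_count <= 2*11 = 22 for this tree.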

A fact from Discrete Math is:

Σ_k k*a^k = a/(1 - a)^2, provided |a| < 1.

(The sum extends from k = 0 to infinity. See Rosen, Discrete Math, xxx.) Taking a = 1/2, we then have

Σ_k [k/2^k] = (1/2)/(1/2)^2 = 2
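
(As a sanity check, the first five terms give 1/2 + 2/4 + 3/8 + 4/16 + 5/32 = 57/32 ≈ 1.78, already close to the limit 2.)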

Extending our sum from 0 to infinity and applying this fact, we have

comp_count <= 2n Σ_k [k/2^k] <= 4n

which verifies that

comp_count = O(n) and therefore
fsu::build_heap has runtime O(n).

One final interesting note: Worst-case comp_count for the algorithm has been established exactly!
In [Suchenek, Marek A. (2012), "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program", Fundamenta Informaticae (IOS Press) 120 (1): 75] Suchenek shows that

comp_count = 2n - 2*s_2(n) - e_2(n)

in the worst case, where s_2(n) is the number of 1's in the binary representation of n and e_2(n) is the exponent of 2 in the prime decomposition of n.
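
For example, n = 12 = 1100 (binary) gives s_2(12) = 2, and 12 = 2^2 * 3 gives e_2(12) = 2, so the worst case is comp_count = 24 - 4 - 2 = 18, comfortably linear in n.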

Runtime of alt::build_heap

The opposite conclusion holds for the basic or "alt" version, which builds a heap with a loop of calls to push_heap. In that algorithm, the call to push_heap on the sub-range [0,k+1] may require lg(k) comparisons, so the entire algorithm may require

comp_count >= Σ_k lg k = lg((n-1)!) = Θ(n log n)

comparisons (the middle equality is just the log of a product, and the last step follows from Stirling's approximation). Therefore

alt::build_heap has worst-case runtime Ω(n log n).

The name "alt::build_heap" for this basic algorithm is not standard and will require some background explanation when discussing it with people outside this class.

Navigation optimizations

One optimization implemented in the code is lowering the cost of the control arithmetic used in the various algorithms, where multiplying and dividing by 2 are used to calculate the child and parent indices. This integer arithmetic can be made faster by using the observations:

  1. Multiplying by 2 is the same as left shift by 1
  2. Dividing by 2 is the same as right shift by 1
  3. Adding 1 to a known multiple of 2 is the same as bitwise or with the mask 0x01

For example:

left = 2*i + 1;                 // uses integer arithmetic
left = (i << 1) | (size_t)0x01; // uses bitwise operations to get same result

parent = (i - 1)/2;    // integer arithmetic (parent of node i, 0-based)
parent = (i - 1) >> 1; // same result using bitwise operations

Because integer multiplication and division are implemented as multi-step algorithms, quite a few clock cycles may be needed to perform one division or multiplication. The bitwise operations, on the other hand, have direct hardware support and may run in as little as one clock cycle.
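
Collected as inline helpers, the navigation arithmetic might look like this (hypothetical names, not part of gheap_advanced.h):

  #include <cstddef>

  inline size_t left_child  (size_t i) { return (i << 1) | (size_t)0x01; } // 2*i + 1
  inline size_t right_child (size_t i) { return (i << 1) + 2; }            // 2*i + 2
  inline size_t parent_node (size_t i) { return (i - 1) >> 1; }            // (i - 1)/2

With these, the first line of the repair loop becomes left = left_child(i); right = left + 1;.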

Cormen Algorithms

The Cormen text discusses most of the algorithms in the fsu namespace above. The two main differences are:

  1. Different names for some of the algorithms
  2. Recursive implementations

These are implemented in the library file gheap_cormen.h and are in the namespace cormen. While the recursive implementations are quite different at the source code level, the effects of the algorithms are identical to their counterparts in namespace fsu.
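
For flavor, here is a sketch of what a recursive sift-down in the Cormen style might look like (illustrative only; it uses std::swap in place of g_XC, and it is not the actual text of gheap_cormen.h):

  #include <cstddef>
  #include <utility>  // std::swap

  template <class I>
  void heapify (I beg, size_t i, size_t n) // repair node i in [0..n)
  {
    size_t left = 2*i + 1, right = left + 1, largest = i;
    if (left < n && beg[largest] < beg[left])   largest = left;
    if (right < n && beg[largest] < beg[right]) largest = right;
    if (largest != i)
    {
      std::swap(beg[i], beg[largest]);
      heapify(beg, largest, n);  // recurse on the subtree we just disturbed
    }
  }

The recursive call plays the role of the do/while loop in the iterative repair: both follow a single path from loc toward the leaves, so the effect (and the Θ(log n) worst case) is the same.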