Version 06/04/19 Notes Index ↑ 

Advanced Heaps, Heap Sort, and Priority Queues

Improving Heap Construction, Heap_Sort Performance, and Priority Queue algorithms

1 Heap Algorithms

This section discusses the advanced heap algorithms implemented in gheap_advanced.h. The basic heap algorithms push_heap, pop_heap, and the vanilla version of heap_sort are discussed in the COP4530 lecture notes and are implemented in the COP4530 LIB. This discussion assumes that the reader is familiar with those materials.

1.1 Heap Algorithms

Begin with a review of the pop_heap algorithm and its implementing code (default order version):


  // g_pop_heap: moves the maximum element of the heap [beg,end) to position
  // end-1, then sifts the displaced root value down until [beg,end-1) is
  // again a heap in default (operator<) order.
  // Pre:  [beg,end) is a heap;  Post: [beg,end-1) is a heap, *(end-1) == old max
  template <class I>
  void g_pop_heap (I beg, I end)
  {
    if (end - beg < 2)  // ranges of size 0 or 1 need no work
      return;
    size_t n = end - beg - 1;  // index of last element; heap shrinks to [0,n)
    size_t i = 0, left, right; // i = current position of the sift-down
    bool finished = 0;
    g_XC(beg[0],beg[n]);       // swap max to the end; root may now be out of order
    do
    {
      left = 2*i + 1; right = left + 1;  // left and right child nodes
      if (right < n)                     // both child nodes exist
      {
        if (beg[left] < beg[right])      // ==> follow right subtree
        {
          if (beg[i] < beg[right])
          {
            g_XC(beg[right], beg[i]);
            i = right;
          }
          else
          {
            finished = 1;
          }
        }
        else // !(beg[left] < beg[right]) ==> follow left subtree
        {
          if (beg[i] < beg[left])
          {
            g_XC(beg[left], beg[i]);
            i = left;
          }
          else
          {
            finished = 1;
          }
        }
      }
      else if (left < n)       // only the left child node exists
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
        }
        finished = 1;          // no grandchild nodes exist
      } 
      else                     // no child nodes exist
      {
        finished = 1;
      }
    }
    while (!finished);
  }

This algorithm consists of a swap of the first and last elements of a presumed heap in the range [0..n] followed by a repair of the smaller heap in the range [0..n-1]. The repair works because the two children of the root are heaps, so the only place where the heap conditions might be violated is at the root. The repair portion of this code is in blue.

We can create another function called "repair" using that blue code:


  // repair: re-establishes the heap property on the entire range [beg,end),
  // assuming both subtrees of the root are already heaps, by sifting the
  // root value down to its proper position.
  // NOTE: as a standalone function, repair operates on the full range, so
  // n = end - beg. (The "- 1" in the pop_heap version above accounts for the
  // element just swapped out of the heap and does not belong here; with it,
  // this function would silently ignore the last element of the range and
  // would NOT match the compact version below.)
  void repair (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n = end - beg;  // size of the range; child indices must stay < n
    size_t i = 0, left, right;
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;  // left and right child nodes
      if (right < n)                     // both child nodes exist
      {
        if (beg[left] < beg[right])      // ==> follow right subtree
        {
          if (beg[i] < beg[right])
          {
            g_XC(beg[right], beg[i]);
            i = right;
          }
          else
          {
            finished = 1;
          }
        }
        else // !(beg[left] < beg[right]) ==> follow left subtree
        {
          if (beg[i] < beg[left])
          {
            g_XC(beg[left], beg[i]);
            i = left;
          }
          else
          {
            finished = 1;
          }
        }
      }
      else if (left < n)       // only the left child node exists
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
        }
        finished = 1;          // no grandchild nodes exist
      } 
      else                     // no child nodes exist
      {
        finished = 1;
      }
    }
    while (!finished);
  }

and we can refactor this code into a more compact form as:


  // repair, compact form: locate the largest of {parent, left child, right
  // child}; if the parent is not the largest, swap it with the larger child
  // and continue the sift-down from that child's position.
  void repair (I beg, I end)
  {
    if (end - beg < 2)
      return;
    size_t n,i,left,right,largest;
    n = end - beg;
    i = 0;
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;
      largest = ((left < n && beg[i] < beg[left]) ? left : i);
      if (right < n && beg[largest] < beg[right])
        largest = right;
      // test order property at i; if bad, swap and repeat
      if (largest != i)
      {
        fsu::g_XC(beg[i],beg[largest]);
        i = largest;
      }
      else finished = 1;
    }
    while (!finished);
  }

Be sure to convince yourself that these two blocks of code implement "repair" in exactly the same way. We can take "repair" one step further, to repair any node in the tree under the assumption that its child nodes are heaps. The place where repair is needed is passed in as a third iterator:


  // repair, three-iterator version: repairs the subtree rooted at loc,
  // assuming both subtrees of loc are already heaps. This is the generic
  // g_heap_repair(beg, loc, end) used by pop_heap and build_heap.
  void repair (I beg, I loc, I end)
  {
    if (end - beg < 2)
      return;
    size_t n,i,left,right,largest;
    n = end - beg;
    i = loc - beg; // the only change from the two-iterator version: start the sift-down at loc instead of the root
    bool finished = 0;
    do
    {
      left = 2*i + 1; right = left + 1;
      largest = ((left < n && beg[i] < beg[left]) ? left : i);
      if (right < n && beg[largest] < beg[right])
        largest = right;

      // test order property at i; if bad, swap and repeat
      if (largest != i)
      {
        fsu::g_XC(beg[i],beg[largest]);
        i = largest;
      }
      else finished = 1;
    }
    while (!finished);
  }

This code defines our new generic algorithm g_heap_repair(I beg, I loc, I end). We can immediately refactor g_pop_heap as follows:


  // g_pop_heap, refactored: swap the max to the end of the range, then let
  // g_heap_repair restore the heap property on the one-element-shorter range,
  // starting the sift-down at the root.
  void g_pop_heap (I beg, I end)
  {
    if (end - beg < 2)
      return;
    g_XC(*beg,*(end - 1));
    g_heap_repair(beg,beg,end - 1);
  }

Moreover, we can use heap_repair as another way to create a heap from an arbitrary array (or other range):


  // g_build_heap, bottom-up (Floyd) construction: repair each subtree,
  // starting at the last internal node (index size/2 - 1) and working back
  // to the root. Leaves (indices >= size/2) are already heaps.
  void fsu::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = size/2; i > 0; --i) // i-1 runs from the last internal node down to 0
    {
      g_heap_repair(beg, beg + (i - 1), end);
    }
  }

The other way to build a heap from scratch is embodied in the first loop in the vanilla version of heap sort:


  // g_build_heap, top-down construction: grow the heap one element at a
  // time by pushing each successive element into the heap on its left.
  void alt::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = 1; i < size; ++i) // after iteration i, [0,i+1) is a heap
    {
      g_push_heap(beg, beg + (i + 1));
    }
  }

The remarkable facts are:

  1. alt::build_heap (using a loop of calls to push_heap) has runtime Θ(n log n)
  2. fsu::build_heap (using a loop of calls to heap_repair) has runtime Θ(n)

These facts are all the more remarkable when you consider that the worst-case runtime for both push_heap and heap_repair is Θ(log n) and the loop of calls has length Θ(n) in both cases. We will come back to explaining why these are true later. For now, just contemplate the subtlety. And realize that this provides an opportunity to improve the performance of heap_sort, by substituting fsu::build_heap for the first loop of calls to push_heap. This change does not affect the asymptotic runtime of heap_sort, because the second loop still runs in Θ(n log n). But it certainly improves the algorithm.

1.2 Code Organization

The vanilla versions of heap_sort are moved into the namespace alt. Although we no longer use a loop of calls to g_push_heap to build a heap, that algorithm is still useful, notably in implementing O(log n) priority queues, so both g_push_heap and g_pop_heap are retained in the fsu namespace. The algorithms are thus organized as follows in gheap_advanced.h:


namespace fsu
{
  // using a supplied order predicate:
  g_push_heap   (I beg, I end, P& p);
  g_pop_heap    (I beg, I end, P& p);
  g_build_heap  (I beg, I end, P& p);
  g_heap_repair (I beg, I loc, I end, P& p);
  g_heap_sort   (I beg, I end, P& p);

  // using default order:
  g_push_heap   (I beg, I end);
  g_pop_heap    (I beg, I end);
  g_build_heap  (I beg, I end);
  g_heap_repair (I beg, I loc, I end);
  g_heap_sort   (I beg, I end);
} 

namespace alt
{
  // vanilla versions
  g_heap_sort   (I beg, I end, P& p);
  g_heap_sort   (I beg, I end);
}

The implementations for heap_sort (default order versions) in gheap_advanced are:


namespace alt
{
  // vanilla heap_sort: phase 1 builds the heap with a loop of push_heap
  // calls (Θ(n log n)); phase 2 repeatedly pops the max to the end of the
  // shrinking range, leaving the range sorted in default order.
  void g_heap_sort (I beg, I end)
   {
    if (end - beg <= 1) return;
    size_t size = end - beg;
    size_t i; 
    for (i = 1; i < size; ++i)
    {
      fsu::g_push_heap(beg, beg + (i + 1));
    }
    for (i = size; i > 1; --i)
    {
      fsu::g_pop_heap(beg, beg + i);
    }
  }
}

namespace fsu
{
  // improved heap_sort: phase 1 uses the Θ(n) bottom-up build_heap; phase 2
  // (unchanged) repeatedly pops the max to the end of the shrinking range.
  void g_heap_sort (I beg, I end)
  {
    if (end - beg <= 1) return;
    size_t size = end - beg;
    size_t i; 
    fsu::g_build_heap(beg, end);
    for (i = size; i > 1; --i)
    {
      g_pop_heap(beg, beg + i);
    }
  }
}

The Θ(n log n) loop of calls to g_push_heap in the alt version is replaced with the Θ(n) call to fsu::g_build_heap(beg, end) in the fsu version.

2 Runtime of the build_heap algorithms

Comparing the fsu:: and alt:: versions of build_heap, it is clear they have structural similarity:


  // bottom-up construction, annotated for the runtime analysis below
  void fsu::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = size/2; i > 0; --i)       // <- loop length Θ(n);
    {
      g_heap_repair(beg, beg + (i - 1), end); // <- runtime O(log n);
    }
  }

  // top-down construction, annotated for the runtime analysis below
  void alt::g_build_heap (I beg, I end)
  {
    size_t size = end - beg;
    if  (size < 2) return;
    for (size_t i = 1; i < size; ++i)  // <- loop length Θ(n);
    {
      g_push_heap(beg, beg + (i + 1)); // <- runtime O(log n)
    }
  }

At the end of Section 1.1 above we asserted that the fsu::build_heap algorithm has runtime O(n), and we implied also that the runtime of alt::build_heap is Ω(n log n). Taken together these results are surprising, especially after seeing the structural similarity of the code implementing the two, where each is a loop of Θ(n) of calls to a function whose worst-case runtime is Θ(log n).

2.1 Runtime of fsu::build_heap

To gain some intuition on why fsu::build_heap is asymptotically faster than alt::build_heap, notice that the fsu version can be described as follows: For each subtree, starting with the smallest and progressing to the largest, repair the structure to be a heap. This process starts out at the bottom of the tree - i.e., the leaves of the tree, which are by default already heaps. So until we reach a node with a child, there is nothing to repair (which is why we can start the loop at n/2 - the leaves are the nodes with height 0). We first go through all of the nodes with height 1, repairing as we go; then all the nodes with height 2, and so on, until we hit the node with largest height, the root, which has height log2 n = lg n. Notice that 1/2 of the nodes have height 0 with no repair needed. Also 1/2 of the remaining nodes have height 1, so the repair process requires at most one swap. As we get nearer the top of the tree, where the "tall" nodes are, there are far fewer of them to repair.

Proposition 1. The asymptotic runtime of fsu::build_heap is ≤ O(n).

Proof. Let's say there are N(k) nodes with height k in the tree. Since heap_repair at one of these nodes requires at worst 2k comparisons, the total number of comparisons is no greater than 2 times the sum of k × N(k), the sum taken over all possible heights k:

comp_count ≤ 2 Σ_k k×N(k)

The number of nodes of height k can be calculated as no greater than the ceiling of n/2k+1. Substituting into the summation yields

comp_count ≤ 2 Σ_k k×⌈n/2^(k+1)⌉ ≤ Σ_k k×⌈n/2^k⌉ ≤ n Σ_k [k/2^k]

A fact from Discrete Math is:

Σ_k k×a^k = a/(1 − a)^2, provided |a| < 1.

(The sum extends from k = 0 to infinity. See Rosen, Discrete Mathematics and Its Applications, the material on infinite series and generating functions.) Taking a = 1/2, we then have

Σ_k [k/2^k] = (1/2)/(1 − 1/2)^2 = 2

Extending our sum from 0 to infinity and applying this fact, we have

comp_count ≤ n Σ_k [k/2^k] ≤ 2n

which verifies that

comp_count ≤ O(n) and therefore
fsu::build_heap ≤ O(n).

and the proof is complete. ∎

One final interesting note: Worst-case comp_count for the algorithm has been established exactly!
In [Suchenek, Marek A. (2012), "Elementary Yet Precise Worst-Case Analysis of Floyd's Heap-Construction Program", Fundamenta Informaticae (IOS Press) 120 (1): 75] Suchenek shows that

comp_count = 2n - 2s2(n) - e2(n)

in the worst case, where s2(n) is the number of 1's in the binary representation of n and e2(n) is the exponent of 2 in the prime decomposition of n.

2.2 Runtime of alt::build_heap

The opposite conclusion holds for the basic or "alt" version, which builds a heap with a loop of calls to push_heap.

Proposition 2. The asymptotic runtime of alt::build_heap is ≥ Ω(n log n).

Proof. In the alt::build_heap algorithm, the calls to push_heap on the sub-range [0,k+1] may require lg(k) comparisons, so the entire algorithm may require

comp_count ≥ Σ_k lg k

The sum on the right-hand side is an approximating sum for the integral ∫ lg x dx = x lg x − x/ln 2 = Θ(x log x), and hence comp_count ≥ Ω(n log n); therefore
alt::build_heap ≥ Ω(n log n). ∎

The naming of this basic build_heap algorithm as "alt::build_heap" is not standard and will require some background exposition when using it with people not associated with this class.

2.3 Navigation optimizations

Another optimization you may implement in the code is lowering the cost of the control arithmetic used in the various algorithms, where multiplying and dividing by 2 are used to calculate the child and parent indices. This integer arithmetic can be made faster by using the observations:

  1. Multiplying by 2 is the same as left shift by 1
  2. Dividing by 2 is the same as right shift by 1
  3. Adding 1 to a known multiple of 2 is the same as bitwise or with the mask 0x01

For examples:


left = 2*i + 1;                 // uses integer arithmetic
left = (i << 1) | (size_t)0x01; // uses bitwise operations to get same result

parent = i/2;      // integer arithmetic
parent = (i >> 1); // same result using bitwise operations

Because integer arithmetic follows an algorithm quite a few clock cycles may be needed to perform one division or multiplication. The bitwise operations on the other hand have hardware support and may run in as little as one clock cycle.

Most likely, the compiler you use will in-line these control optimizations without your help. It's still interesting to do.

2.4 Cormen Algorithms

The Cormen text discusses most of the algorithms in the fsu namespace above. The two main differences are:

  1. Different names for some of the algorithms
  2. Recursive implementations

These are implemented in the library file gheap_cormen.h and are in the namespace cormen. While the recursive implementations are quite different at the source code level, the effects of the algorithms are identical to their counterparts in namespace fsu.

3 Priority Queues

We have seen above that the heap algorithms facilitate the HeapSort algorithm - an important computing technology. Another well-known application, perhaps even more important (since there are alternatives to HeapSort), is facilitating priority queues.

A priority queue stores elements of typename T with a priority determined by an object of a predicate class P. The operations are syntactically queue-like but have associative semantics (rather than positional semantics, as in an ordinary queue). The operations for a priority queue PQ<T,P> and informal descriptions of them are as follows:

void   Push    (const T& t) // places t in the priority queue, position unspecified
void   Pop     ()           // removes the highest priority item from the priority queue
T      Front   () const     // returns (usually by const reference) the highest priority item in the priority queue 
void   Clear   ()           // makes the priority queue empty
bool   Empty   () const     // returns true iff the priority queue is empty
size_t Size    () const     // returns the number of elements in the priority queue

Priority queues are used in several important applications, including:

  • Operating system job schedulers
  • Communications Satellite schedulers
  • Discrete Event simulations
  • Embedded/RealTime Systems
  • Ordering structures used in certain greedy algorithms, such as the Prim and Kruskal minimum spanning tree algorithms and the Dijkstra shortest path algorithm.

3.1 Priority Queue Implementation Plans

Priority queues are traditionally built as adaptations of other data structures. For example, if s is a container and we prioritize with less-than (so that the larger element has higher priority):

 Priority Queue Implementation Strategies
Container s Operation      Implementation Runtime (Θ class)
 
1. ordered list Init() s.Sort() n log n
  Push(t) s.Insert(t) n
  Front() return s.Back() constant
  Pop() s.PopBack() constant
 
2. ordered vector/deque Init() g_heap_sort(s.Begin(),s.End()) n log n
  Push(t) s.Insert(t) n
  Front() return s.Back() constant
  Pop() s.PopBack() constant
 
3. unordered list Init() -- constant
  Push(t) s.PushBack(t) constant
  Front() return g_max_element(s.Begin(),s.End())      n
  Pop() s.Remove(Front()) n
 
4. unordered vector/deque      Init() -- constant
  Push(t) s.PushBack(t) constant
  Front() return g_max_element(s.Begin(),s.End()) n
  Pop() s.Remove(Front()) n
 
5. BST (balanced) Init() s.Insert(x) for all x n log n
  Push(t) s.Insert(t) log n
  Front() return s.Max() log n
  Pop() s.Remove(s.Max()) log n
 
6. unordered vector/deque      Init() g_build_heap(s.Begin(),s.End()) n
  Push(t) s.PushBack(t); g_push_heap(s.Begin(),s.End()) log n
  Front() return s.Front(); constant
  Pop() g_pop_heap(s.Begin(),s.End()); s.PopBack(); log n
 

Note that at least one Push/Pop/Front operation in each of implementations 1,2,3,4 has runtime Ω(n). The amortized runtime of an operation over a typical use will also be at least Ω(n).

Implementation 5, using a BST, does meet the goal of logarithmic runtime for the Priority Queue operations Push/Pop/Front. However it has superlinear runtime for Init. Moreover, it also has a heavy footprint and requires copying data into a BST to initialize.

Implementation 6 is very light, has linear Init time, logarithmic runtime for Push and Pop, and constant runtime for Front. And because it operates on a simple vector or deque, it can easily operate on a client-owned container with no space overhead.

3.2 Priority Queue Class Designs

Two possible designs for PriorityQueue class can be useful: a PriorityQueue object pq may own the data itself or it may operate directly on client data: pq is either an entrepreneur or a consultant. The "entrepreneur" version will own a container in which the objects being prioritized are stored. The "consultant" version will have permission to operate on a container already owned by the client. Either of these models can be useful. We will concentrate first on the "consultant" version and then discuss modifications needed for the other.

Assume that a client program needs a priority queue operating on data that is stored in a vector v. Consider the adaptor class:

// "Consultant" priority queue adaptor: operates directly on a client-owned
// container c (vector/deque-like API) ordered by a client-owned predicate p.
// T = element type, C = container type, P = predicate (priority) type.
template < typename T , class C , class P >
class PriorityQueue
{
public: // typedefs must be public, else clients cannot name PriorityQueue<T,C,P>::ValueType
  typedef T    ValueType;
  typedef C    ContainerType;
  typedef P    PredicateType;

private:
  ContainerType & c_; // reference to client-owned container - no data is copied
  PredicateType & p_; // reference to client-owned priority predicate

public:

  // stores (non-const) references to the client's container and predicate
  PriorityQueue(C& c, P& p) : c_(c), p_(p)
  {}

  // the four container-dependent operations (defined elsewhere)
  void             Init  ();
  void             Push  (const ValueType& t);
  void             Pop   ();
  const ValueType& Front () const;

  // the remaining operations only require that c_ supports the same names

  void   Clear ()
  {
    c_.Clear();
  }

  bool   Empty () const
  {
    return c_.Empty();
  }

  size_t Size () const
  {
    return c_.Size();
  }

};

The constructor receives references to the container c and predicate p from the client and stores (non-const) references to each. The operations Clear, Empty, and Size are also implemented, independent of any detailed knowledge of the nature of the container c - only requiring that the same names are implemented in c. Thus only the four operations highlighted in garnet require implementations that are dependent on the nature of c.

Exercise 1. For each implementation strategy 1-5 complete a class from the outline above by supplying appropriate code for the four operations Init, Push, Pop, and Front.

3.3 Priority Queue Implementation

We concentrate now on "plan 6" - the heap-based implementation, as described in the table:

// Init: (re)establishes the heap property on the client's container; call
// after bulk-loading data or after the container has been modified outside
// the PriorityQueue API.
template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Init()
{
  fsu::g_build_heap(c_.Begin(), c_.End(), p_); // <- Θ(n)
}

// Push: appends t to the container, then restores the heap property by
// sifting the new element up from the last position.
template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Push (const ValueType& t)
{
  c_.PushBack(t);                              // <- amortized Θ(1)
  fsu::g_push_heap(c_.Begin(), c_.End(), p_);  // <- Θ(log n)
}

// Pop: removes the highest priority element - pop_heap swaps it to the back
// of the container and repairs the heap, then PopBack discards it.
template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Pop ()
{
  fsu::g_pop_heap(c_.Begin(), c_.End(), p_);   // <- Θ(log n)
  c_.PopBack();                                // <- Θ(1)
}

template < typename T , class C , class P >
const PriorityQueue<T,C,P>::ValueType& Front () const
{
  return c_.Front();                           // <- Θ(1)
}

These implementations allow a client to supply data in any random order in a vector (or deque), along with a predicate object that captures the desired notion of priority, and use the PriorityQueue interface to manipulate data directly in the client-owned vector. Typical uses of PriorityQueue are "behind the scenes" in another algorithm, in similar fashion to the way Partition is used. For example, in the Kruskal Minimum Spanning Tree algorithm operating on an undirected weighted graph, both a Partition object and a PriorityQueue object are used in essential ways to guide the flow of the process.

3.4 Priority Queue Client Code

The following is sample client code designed to demonstrate how PriorityQueue is set up and also to provide transparency into the inner workings of the operations.

/*
    fpq.cpp
*/

// demo harness type aliases; swap the PredicateType typedef to convert the
// demo between max- and min-priority queue behavior
typedef uint32_t                       ElementType;
ElementType                            sentinal = 0; // keyboard-entry terminator value
typedef fsu::LessThan < ElementType >  PredicateType; // Max PQ
// typedef fsu::GreaterThan < ElementType >  PredicateType; // Min PQ
typedef fsu::Vector   < ElementType >  ContainerType;

// interactive test/demo harness: the PriorityQueue pq is a "consultant"
// operating directly on the client-owned vector v and predicate p
int main()
{
  ElementType   s;
  ContainerType v;  // client-owned container manipulated through pq
  PredicateType p;  // client-owned priority predicate
  fsu::PriorityQueue pq(v,p);
  ...
  do
  {
    std::cout << " Enter command ('Q' to quit): "; // fixed typo: "quite" -> "quit"
    std::cin >> command;
    switch(command)
    {
      case '>': // load data into v
        std::cin >> command;
        switch(command)
        {
          case 'f': case 'F': // load from file
            std::cout << "  Enter file name: ";
            std::cin >> filename;
            ifs.open(filename.Cstr());
            ...
            while (ifs >> s)
            {
              v.PushBack(s);
            }
            ifs.clear();
            ifs.close();
            break;
          case 'k': case 'K': // load from keyboard
            std::cout << "  Enter elements (\"" << sentinal << "\" to stop):\n";
            std::cin >> s;
            while (s != sentinal)
            {
              v.PushBack(s);
              std::cin >> s;
            }
            break;
          default:
            std::cout << " ** unknown load code\n";
        }
        break;
      case 'i': case 'I': // heapify the loaded data
        pq.Init();
        break;
      case '+': // push one element
        std::cin >> s;
        pq.Push(s);
        std::cout << "  " << s << " pq.Push(" << s << ")\n";
        break;
      case '-': // pop the highest priority element
        pq.Pop();
        std::cout << "  pq.Pop();\n";
        break;
      case 'f': case 'F': // report the highest priority element
        s = pq.Front();
        std::cout << "  " << s << " = pq.Front()\n";
        break;
      case 's': case 'S': // report the current size
        size = pq.Size();
        std::cout << "  " << size << " = pq.Size()\n";
        break;
      case 'd': case 'D': // developer helper: display the heap as a tree
        std::cout << "  pq.Dump():   // if Init has been called, this should be a max heap using predicate order\n";
        pq.Dump(std::cout, 1 + maxwidth);
        std::cout << '\n';    
        break;
      case 'o': case 'O': // developer helper: drain pq in priority order
        std::cout << "  pq.PopOut(): // if Init has been called, this should be sorted in reverse predicate order\n";
        pq.PopOut(std::cout, ' ');
        std::cout << '\n';    
        break;
      case 'm': case 'M':
        DisplayMenu(std::cout);
        break;
      case 'q': case 'Q':
        command = 'q';
        break;
      default:
        std::cout << "  ** command not found\n";
        break;
    }
  }
  while (command != 'q');
}

The declaration of the PriorityQueue object is highlighted and access to the PriorityQueue API is typical of a test/demo harness. There are two cases highlighted that require further explanation.

The operations Dump and PopOut are not official components of the PriorityQueue API. These are added for the dual purposes of development and to provide transparency into the heap structure. Assuming Init has been called:

  • Dump displays the contents of the container arranged as an index-based binary tree. This should be a heap.
  • PopOut empties the PQ in order of decreasing priority. This should be sorted in reverse order from the predicate.

Here are implementations of these developer helpers:

// Dump: development/demo helper - writes the container contents to os as an
// index-based binary tree, one level per line, each element in a field of
// width cw. Level d occupies indices [2^d - 1, 2^(d+1) - 1).
template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Dump (std::ostream& os, int cw) const
{
  size_t n = Size();
  size_t i = 0, k = 2; // k - 1 is the first index of the next level
  while (i < n)
  {
    // one level of tree
    os << ' '; // indent extra space
    while (i < n && i < k - 1)
    {
      os << std::setw(cw) << c_[i];
      ++i;
    }
    os << '\n';
    k *= 2; // advance the level boundary
  }
}

// PopOut: development/demo helper - empties the priority queue, sending the
// elements to os in priority order; if ofc is not '\0' it is written before
// each element as a separator.
// Fixed: the default argument (ofc = '\0') must appear only in the in-class
// declaration; a member function of a class template may not introduce a
// default argument in an out-of-class definition ([dcl.fct.default]).
template < typename T , class C , class P >
void PriorityQueue<T,C,P>::PopOut (std::ostream& os, char ofc)
{
  if (ofc == '\0') // no separator: elements written back to back
  {
    while (!Empty())
    {
      os << Front();
      Pop();
    }
  }
  else             // separator before each element
  {
    while (!Empty())
    {
      os << ofc << Front();
      Pop();
    }
  }
}

PriorityQueue as above is implemented in LIB/tcpp/pq.h. The test harness (somewhat elaborated) is implemented in LIB/tests/fpq.cpp.

Exercise 2. Locate the executable LIB/notes_support/fpq_i.x and copy into a directory where you have execute permissions. Execute fpq_i.x on various data input until you can comfortably predict the outcome of each selection choice for that data.