Project 4: Ordered Sets

Educational Objectives: On successful completion of this assignment, the student should be able to

Background Knowledge Required: Be sure that you have mastered the material in these chapters before beginning the project:
Trees 1: Definitions and Theory; Trees 2: Binary Trees, Navigators, and Iterators; Trees 3: Dynamic Binary Tree Construction; Trees 4: Associative Binary Trees; Generic Associative Containers; and Set and Map Abstractions.

Operational Objectives: Implement a generic container class THBT<T,P> based on the AVL tree algorithms for insertion and removal of elements and satisfying the conditions detailed in this document.

Deliverables: One file:

thbt.h   # contains the THBT definitions and implementations

AVL Trees

An AVL tree is a binary tree in which the heights of the left and right subtrees of each node differ by at most 1. Note that this definition is equivalent to the recursive version: an AVL tree is a binary tree in which the heights of the left and right subtrees of the root differ by at most 1 and in which the left and right subtrees are again AVL trees. The name "AVL" derives from the names of the two inventors of the technology, G.M. Adelson-Velskii and E.M. Landis [An algorithm for the organization of information, 1962]. We will adopt the more neutral terminology height-balanced tree (abbreviated HB tree or HBT) in discussing the abstract data type and its implementations, but retain the attributive names when discussing the underlying algorithms.
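The recursive form of the definition translates almost directly into a checker. The following is a minimal sketch, assuming a hypothetical bare-bones `Node` struct (not the course library's node type) and the convention that the empty tree has height -1, so a single node has height 0:

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical minimal node type, for illustration only.
struct Node
{
  int   value;
  Node* left;
  Node* right;
};

// Returns the height of the subtree rooted at n if that subtree is
// height-balanced, or -2 if some node in it violates the balance condition.
// The empty tree has height -1.
int BalancedHeight(const Node* n)
{
  if (n == nullptr) return -1;
  int hl = BalancedHeight(n->left);
  if (hl == -2) return -2;
  int hr = BalancedHeight(n->right);
  if (hr == -2) return -2;
  if (std::abs(hl - hr) > 1) return -2;  // subtree heights differ by more than 1
  return 1 + std::max(hl, hr);
}

// True if every node of the tree satisfies the AVL height condition.
bool IsHeightBalanced(const Node* root)
{
  return BalancedHeight(root) != -2;
}
```

Computing the height and the balance verdict in one pass keeps the check linear in the number of nodes. A checker like this can be handy when testing a rebalancing implementation, although it only verifies the height condition, not the search-tree ordering.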

Because an HB tree (HBT) is a binary search tree (BST), there is a well-defined binary search algorithm in an HBT that follows descending paths. An important feature of HBTs is that they have height bounded above by 1.5 log_2(n + 1), where n is the number of nodes in the HBT, so an HBT must be fairly "bushy". (In stark contrast, a BST can be severely "leggy", with height n - 1. We can define bushy and leggy asymptotically as having height O(log n) and Ω(n), respectively. Note that "sparse" is a synonym for "leggy" and "dense" is a synonym for "bushy". "Bushy" and "leggy" are terms from gardening; "dense" and "sparse" are from graph theory.)

Theorem 1. Suppose an HB tree has n vertices and height H. Then log_2 n < H + 1 and H <= A log_2 n + B, where A and B are constants that do not depend on n.

Proof. The first inequality is true for any binary tree: the maximum number of vertices a binary tree of height H can have is Σ_k 2^k, the sum ranging over all layers k = 0...H. This sum evaluates to 2^(H+1) - 1. Therefore n <= 2^(H+1) - 1 < 2^(H+1). Taking log_2 of both sides yields the first result.

We concentrate now on the second claim. Let n(H) be the minimum number of vertices an HB tree of height H can have. Clearly n(0) = 1, since a tree of height 0 consists exactly of the root vertex. Also n(1) = 2, by looking at all cases. As an inductive step, note that an HB tree of height H with minimal vertex count must have one subtree of height H - 1 and another subtree of height H - 2. Thus n(H) = n(H-1) + n(H-2) + 1. In summary, we have the following recurrence relation:

n(0) = 1
n(1) = 2
n(H) = n(H-1) + n(H-2) + 1

Consider the Fibonacci recursion given by:

f(0) = 0
f(1) = 1
f(H) = f(H-1) + f(H-2)

Assertion: n(H) > f(H+2) - 1

Proof:

Base cases:
n(0) = 1, f(2) - 1 = 1 - 1 = 0, so n(0) > f(2) - 1
n(1) = 2, f(3) - 1 = 2 - 1 = 1, so n(1) > f(3) - 1

Inductive case:
n(H + 1)
       = n(H) + n(H-1) + 1 # definition of n
       > (f(H+2) - 1) + (f(H+1) - 1) + 1 # inductive hypothesis
       = f(H+2) + f(H+1) - 1
       = f(H+3) - 1

Because both sides of the inequality are integers, we can rephrase the previous assertion as:

Assertion: n(H) >= f(H+2)
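As a sanity check on the induction, the two recurrences can be tabulated and compared directly. The sketch below is only an illustration; the function names `MinVertices`, `Fibonacci`, and `AssertionHolds` are made up for this example and do not come from the course library:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Table of n(0..maxH): minimum vertex counts of HB trees, from the recurrence
// n(0) = 1, n(1) = 2, n(H) = n(H-1) + n(H-2) + 1.
std::vector<std::uint64_t> MinVertices(int maxH)
{
  assert(maxH >= 0);
  std::vector<std::uint64_t> n(maxH + 1);
  for (int h = 0; h <= maxH; ++h)
    n[h] = (h == 0) ? 1 : (h == 1) ? 2 : n[h - 1] + n[h - 2] + 1;
  return n;
}

// Table of f(0..maxK): Fibonacci numbers with f(0) = 0, f(1) = 1.
std::vector<std::uint64_t> Fibonacci(int maxK)
{
  assert(maxK >= 0);
  std::vector<std::uint64_t> f(maxK + 1);
  for (int k = 0; k <= maxK; ++k)
    f[k] = (k < 2) ? static_cast<std::uint64_t>(k) : f[k - 1] + f[k - 2];
  return f;
}

// True if n(H) >= f(H+2) for every H in 0..maxH.
bool AssertionHolds(int maxH)
{
  const std::vector<std::uint64_t> n = MinVertices(maxH);
  const std::vector<std::uint64_t> f = Fibonacci(maxH + 2);
  for (int h = 0; h <= maxH; ++h)
    if (n[h] < f[h + 2]) return false;
  return true;
}
```

The first few values are n = 1, 2, 4, 7, 12, 20 against f(H+2) = 1, 2, 3, 5, 8, 13, so the gap widens as H grows. A finite table is of course no substitute for the inductive proof; it just makes the claim easy to spot-check.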

Exercise 4-5 in your textbook concludes that f(H + 2) >= φ^H / SQRT(5), where φ = (1 + SQRT(5))/2, the golden ratio. Whence we obtain:

Assertion: n(H) >= φ^H / SQRT(5)

It follows that

n(H) >= φ^H / SQRT(5)
SQRT(5) n(H) >= φ^H
log_φ(SQRT(5) n(H)) >= H
log_φ SQRT(5) + log_φ n(H) >= H

Thus we have proved the theorem with A = log_φ 2 and B = (log_φ 2)(log_2 SQRT(5)), using the change of base log_φ x = (log_φ 2)(log_2 x). Numerically, A ≈ 1.44 and B ≈ 1.67.
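The final inequality of the derivation, H <= log_φ SQRT(5) + log_φ n(H), can be checked numerically against the sparsest HB tree of each height. This is a rough floating-point sketch; `HeightBound`, `MinVertices`, and `BoundHolds` are hypothetical names chosen for the example:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Upper bound on the height of an HB tree with n >= 1 vertices, taken from
// the last line of the derivation: H <= log_phi(SQRT(5) * n).
double HeightBound(std::uint64_t n)
{
  assert(n >= 1);
  const double phi = (1.0 + std::sqrt(5.0)) / 2.0;  // golden ratio
  return std::log(std::sqrt(5.0) * static_cast<double>(n)) / std::log(phi);
}

// Minimum number of vertices in an HB tree of height H, from the recurrence
// n(0) = 1, n(1) = 2, n(H) = n(H-1) + n(H-2) + 1.
std::uint64_t MinVertices(int H)
{
  assert(H >= 0);
  std::uint64_t prev = 1, cur = 2;  // n(0), n(1)
  if (H == 0) return prev;
  for (int h = 2; h <= H; ++h)
  {
    const std::uint64_t next = cur + prev + 1;
    prev = cur;
    cur = next;
  }
  return cur;
}

// True if, for every height 0..maxH, the sparsest tree of that height
// still satisfies the bound.
bool BoundHolds(int maxH)
{
  for (int H = 0; H <= maxH; ++H)
    if (static_cast<double>(H) > HeightBound(MinVertices(H))) return false;
  return true;
}
```

Checking the sparsest tree of each height is enough, because any other HB tree of the same height has at least as many vertices and the bound is increasing in n.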

Theorem 2. BST search in an HBT has worst case run time <= O(log n), where n is the number of nodes.

The challenge is to make sure that the HBT properties are maintained as we insert and remove elements. It turns out that the HBT properties do not necessarily hold after an ordinary BST insert or remove operation, but that there are "repair" algorithms that bring the resulting BST back into compliance with the HBT definition. These algorithms restructure the BST by pruning and re-hanging subtrees and are called rotations.
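The "pruning and re-hanging" can be seen in a few lines. The sketch below uses a hypothetical bare `Node` struct without parent pointers. The course library provides its own rotation methods, which the project requires you to use, so this is only meant to show the idea:

```cpp
#include <cassert>

// Hypothetical minimal node type, for illustration only.
struct Node
{
  int   value;
  Node* left;
  Node* right;
};

// Rotate left about n: n's right child r becomes the root of the subtree.
// The left subtree of r is pruned and re-hung as the right subtree of n.
// Returns the new subtree root; the caller must re-attach it to n's old parent.
Node* RotateLeft(Node* n)
{
  assert(n != nullptr && n->right != nullptr);
  Node* r = n->right;
  n->right = r->left;
  r->left = n;
  return r;
}

// Rotate right about n: the mirror image of RotateLeft.
Node* RotateRight(Node* n)
{
  assert(n != nullptr && n->left != nullptr);
  Node* l = n->left;
  n->left = l->right;
  l->right = n;
  return l;
}
```

Each rotation touches a fixed number of pointers and preserves the inorder sequence of the elements, which is why it can be applied to a BST without breaking the search order.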

Rotations are constant time algorithms, and they are combined into repair algorithms that iterate along a descending path in the HBT. It follows that BST insert or remove, followed by HBT repair, has run time O(log n). Consequently

Theorem 3. HBT insert and remove have worst case run time <= O(log n).
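To make the insert-then-repair pattern concrete, here is a compact recursive sketch on a standalone node type. It is not the design the project asks for: it stores a subtree height in each node (rather than a balance factor), uses plain `int` elements with `<`, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical node type storing the height of its subtree (leaf = 0).
struct Node
{
  int   value;
  int   height;
  Node* left;
  Node* right;
};

int Height(const Node* n) { return n ? n->height : -1; }

void Update(Node* n)
{
  n->height = 1 + std::max(Height(n->left), Height(n->right));
}

Node* RotateLeft(Node* n)
{
  assert(n != nullptr && n->right != nullptr);
  Node* r = n->right;
  n->right = r->left;
  r->left = n;
  Update(n);  // n is now the lower node, so update it first
  Update(r);
  return r;
}

Node* RotateRight(Node* n)
{
  assert(n != nullptr && n->left != nullptr);
  Node* l = n->left;
  n->left = l->right;
  l->right = n;
  Update(n);
  Update(l);
  return l;
}

// Restore the height condition at n, assuming both subtrees already satisfy it.
Node* Rebalance(Node* n)
{
  Update(n);
  const int balance = Height(n->left) - Height(n->right);
  if (balance > 1)  // left-heavy
  {
    if (Height(n->left->left) < Height(n->left->right))
      n->left = RotateLeft(n->left);  // left-right case: double rotation
    return RotateRight(n);
  }
  if (balance < -1)  // right-heavy
  {
    if (Height(n->right->right) < Height(n->right->left))
      n->right = RotateRight(n->right);  // right-left case: double rotation
    return RotateLeft(n);
  }
  return n;
}

// Ordinary BST insert, followed by repair at each node on the way back up.
Node* Insert(Node* n, int v)
{
  if (n == nullptr) return new Node{v, 0, nullptr, nullptr};
  if (v < n->value)      n->left  = Insert(n->left, v);
  else if (n->value < v) n->right = Insert(n->right, v);
  else return n;  // value already present: no duplicate is stored
  return Rebalance(n);
}

// Release all nodes of the tree.
void Clear(Node* n)
{
  if (n == nullptr) return;
  Clear(n->left);
  Clear(n->right);
  delete n;
}
```

Inserting 1 through 7 in increasing order, which would produce a chain in a plain BST, yields a tree of height 2 rooted at 4. The repair work is done along the search path only, so the cost per insert stays proportional to the height.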

Procedural Requirements

  1. Begin by copying all of the files from the project distribution directory:

    proj4/thbt.partial    # contains a good start, including all "boiler plate"
    proj4/fcset.cpp       # test the ADT set implemented as HBT
    proj4/ftbst.cpp       # test the HBT as a tree
    proj4/sthbt.cpp       # sledge hammer test of insert and remove
    proj4/proj4submit.sh  # submit script
    

  2. Start your code file by copying thbt.partial to thbt.h.

  3. Define and implement the classes THBT<T,P> and THBT<T,P>::Iterator in the file thbt.h. Also place all supporting definitions and implementations, such as operator overloads, in this file.

  4. Be sure to fully cite all references used for code and ideas, including URLs for web-based resources. These citations should be in the file documentation and if appropriate detailed in relevant code locations.

  5. Test your classes using the distributed test harnesses fcset.cpp and ftbst.cpp.

  6. Write a brief description of your development and test methods and results and place this in the file header documentation of thbt.h.

  7. Submit the project using the script LIB/submitscripts/proj4submit.sh.

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications

  1. T is the element type stored in the container.

  2. P is the predicate type used to determine order in the container.

  3. THBT<T,P> should implement an HB/AVL binary search tree with unimodal semantics. In particular, Insert(t), Remove(t), and Includes(t) should have worst case runtime O(log n) where n is the number of elements in the tree.

  4. THBT<T,P>::Iterator should be a bidirectional iterator for THBT<T,P>.

  5. THBT<T,P>::Iterator should be a "const" iterator, meaning that it does not facilitate writing to a stored value.

  6. The increment and decrement operators of THBT<T,P>::Iterator should have amortized constant runtime.

  7. THBT<T,P> and THBT<T,P>::Iterator should support the standard traversal loops

    for (typename THBT<T,P>::Iterator i = t.Begin(); i != t.End(); ++i) // forward traversal
    {
      // loop body
    }
    

    and

    for (typename THBT<T,P>::Iterator i = t.rBegin(); i != t.rEnd(); --i) // reverse traversal
    {
      // loop body
    }
    

    (where t is an object of type THBT<T,P>) with the usual expectations of reciprocity.

  8. The standard forward traversal should encounter the elements of the tree in inorder order.

  9. THBT<T,P> and THBT<T,P>::Iterator should be in the fsu namespace.

  10. THBT<T,P> should function correctly with the following definition of SetType:

    typedef fsu::CSet < T , THBT < T , fsu::TLessThan < T > > > SetType;
    

    That is, objects of type SetType should exhibit correct behavior for a unimodal set of elements of type T, as defined in the file cset.h and tested with fcset.cpp.

  11. Re-use code and reduce the risk of memory leaks and other pointer-related problems by inheriting from TBST<T,P> as follows:

    template < typename T , class P = TLessThan < T > >
    class THBT : public TBST<T,P>
    {
    public:
      typedef T                              ValueType;
      typedef P                              PredicateType;
      typedef TBinaryTreeInorderIterator<T>  Iterator;
      typedef TBinaryTreeNavigator<T>        Navigator;
    
      // constructors, assignment, etc for proper type
    
      // inherit all methods except Insert and Remove
    
      // override the associative Insert and Remove 
      // insert and remove proceed with the classic BST algorithm, but follow with 
      // the AVL rebalance algorithms, which are isolated as private methods
    
      // privatize the locational insert and remove to prevent use OR:
    
      // You may find it convenient to implement the locational remove method
      // and call it for the associative remove method. This is an acceptable
      // alternative. Of course if you want to also implement the locational insert,
      // that's OK too.
    
      // use the "color" field of TNode to keep track of balance factors at each
      // node.
    
    private:
      // optional helper methods
      Navigator BalanceLeft (Navigator);
      Navigator BalanceRight (Navigator);
      // These call BinaryTree::RotateLeft and BinaryTree::RotateRight, which are
      // defined and should be used (not re-invented).
      // They are called in the implementations of Insert and Remove.
    };
    

    Refer to the notes on Sorted List and also the code file LIB/tcpp/tolist.h for some ideas and discussion.
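The requirements that traversal be inorder and that increment be amortized constant time can be illustrated with a successor function on nodes that carry parent pointers. This is a sketch under that assumption only; the course's own iterator and navigator classes may be organized differently, and the names below are hypothetical:

```cpp
#include <cassert>

// Hypothetical node type with a parent pointer, for illustration only.
struct Node
{
  int   value;
  Node* parent;
  Node* left;
  Node* right;
};

// Leftmost node of the subtree rooted at n (the first element in inorder),
// or nullptr if the subtree is empty.
const Node* Leftmost(const Node* n)
{
  while (n != nullptr && n->left != nullptr) n = n->left;
  return n;
}

// Inorder successor of n, or nullptr if n is the last element.
const Node* Next(const Node* n)
{
  assert(n != nullptr);
  if (n->right != nullptr) return Leftmost(n->right);
  // Otherwise climb until we arrive at a parent from its left child.
  const Node* p = n->parent;
  while (p != nullptr && n == p->right)
  {
    n = p;
    p = p->parent;
  }
  return p;
}
```

A single call to `Next` may walk a path as long as the height of the tree, but over a complete traversal each edge is followed at most once downward and once upward, so the total work is proportional to the number of nodes. That is the sense in which each increment is amortized constant time.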

Hints