Project 4: Implementing Ordered Associative Array

Using binary search tree, iterative implementation

Revision dated 01/05/18

Educational Objectives: On successful completion of this assignment, the student should be able to

===================================================
rubric to be used in assessment
---------------------------------------------------
builds                              [0..10]:   xx
foaa+.x foaa.com1 (operator[])       [0..5]:    x
foaa+.x foaa.com2 (Get, Put, Dump)   [0..5]:    x
foaa+.x foaa.com3 (Erase, Retrieve)  [0..5]:    x
foaa+.x foaa.com4 (Erase, Rehash)    [0..5]:    x 
foaa+.x foaa.com5 (Erase, Put)       [0..5]:    x 
ws2.x   on english.txt               [0..5]:    x
moaa.x 5                            [0..10]:   xx

log                                [-20..0]:  ( x)
requirements and specs             [-20..0]:  ( x)
software engineering               [-20..0]:  ( x)
dated submissions deduction    [2 pts each]:  ( x)
                                               --
total:                              [0..50]:   xx
===================================================

Background Knowledge Required: Be sure that you have mastered the material in these chapters before beginning the assignment:
Introduction to Sets, Introduction to Maps, Binary Search Trees, and Balanced BSTs.

Operational Objectives: Create an implementation of the Ordered Associative Array API using iteratively implemented binary search trees. Illustrate the use of the API by refactoring your WordSmith as a client of Ordered Associative Array API.

Deliverables:

oaa_bst.h       # the ordered associative array class template 
wordsmith2.h    # defines wordsmith refactored to use the OAA API
wordsmith2.cpp  # implements wordsmith2.h
cleanup.cpp     # from WordSmith project
makefile.ws2    # makefile for project - builds ws2.x, foaa.x, foaa+.x, and moaa.x
log.txt         # your standard work log

Procedural Requirements

  1. The official development/testing/assessment environment is specified in the Course Organizer.

  2. Keep a text file log of your development and testing activities in log.txt.

  3. Begin by copying all of the files from the assignment distribution directory LIB/proj4. Your directory should now include:

    main_ws2.cpp      # driver program for wordsmith2
    foaa.cpp          # functionality test for OAA
    moaa.cpp          # random hammer test for OAA
    oaa_bst.start     # starter kit for OAA<K,D,P>
    rantable.cpp      # random table file generator
    deliverables.ws2  # submit script configuration file
    

    By now you should have set up "submit.sh" as a command stored in your .bin directory.

  4. Define and implement the class template OAA<K,D,P>, placing the code in the file oaa_bst.h.

  5. Use the default value P = fsu::LessThan<K> for the third template parameter, so that OAA<K,D> is also automatically defined.

  6. Thoroughly test your OAA<> with the distributed test client programs foaa.cpp and moaa.cpp. Be sure to log all test activity.

  7. Define the application WordSmith, refactored as a client of OAA<fsu::String, size_t>, in the header file wordsmith2.h, and implement the refactored WordSmith in the file wordsmith2.cpp.

  8. Test your refactored WordSmith thoroughly to be certain that it is a true refactoring of the original. (Refactoring is defined to be re-coding without changing the program behavior.) Again, log all test activity.

  9. Be sure to fully cite all references used for code and ideas, including URLs for web-based resources. These citations should be in two places: (1) the code file documentation and if appropriate detailed in relevant code locations; and (2) in your log.

  10. Submit the assignment by entering the command submit.sh deliverables.ws2.

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit assignments. If you do not receive the second confirmation with the contents of your assignment, there has been a malfunction.
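
Procedural requirement 5 relies on a default template argument. The sketch below shows only that mechanism and is not part of the distributed code: LessThanDemo, GreaterThanDemo, OAADemo, and Precedes are hypothetical names standing in for fsu::LessThan and OAA.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for fsu::LessThan<K> and a reversed ordering.
template <typename T>
struct LessThanDemo
{
  bool operator()(const T& a, const T& b) const { return a < b; }
};

template <typename T>
struct GreaterThanDemo
{
  bool operator()(const T& a, const T& b) const { return b < a; }
};

// Skeleton only: the third parameter defaults to the "less than" predicate.
template <typename K, typename D, class P = LessThanDemo<K>>
class OAADemo
{
public:
  OAADemo() : pred_() {}
  explicit OAADemo(P p) : pred_(p) {}
  // True when a is ordered before b under the stored predicate.
  bool Precedes(const K& a, const K& b) const { return pred_(a, b); }
private:
  P pred_;
};

// OAADemo<K,D> and OAADemo<K,D,LessThanDemo<K>> name the same type.
static_assert(std::is_same<OAADemo<int, char>,
                           OAADemo<int, char, LessThanDemo<int>>>::value,
              "default predicate is LessThanDemo<K>");

inline void DefaultPredicateDemo()
{
  OAADemo<int, char> ascending;                         // default predicate
  OAADemo<int, char, GreaterThanDemo<int>> descending;  // explicit predicate
  assert(ascending.Precedes(1, 2));
  assert(descending.Precedes(2, 1));
}
```

Because the predicate is a template parameter with a default, client code such as WordSmith can write the two-parameter form and still obtain the usual ascending key order.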

Requirements - Ordered Associative Array

  1. Implement the Ordered Associative Array API as a binary search tree using the following class template:

      template < typename K , typename D , class P = LessThan<K> >
      class OAA
      {
      public:
    
        typedef K    KeyType;
        typedef D    DataType;
        typedef P    PredicateType;
    
                 OAA  ();
        explicit OAA  (P p);
                 OAA  (const OAA& a);
                 ~OAA ();
        OAA& operator=(const OAA& a);
    
        DataType& operator [] (const KeyType& k)        { return Get(k); }
    
        void   Put       (const KeyType& k , const DataType& d) { Get(k) = d; }
        D&     Get       (const KeyType& k);
        bool   Retrieve  (const KeyType& k, DataType& d) const;
    
        void   Erase     (const KeyType& k);
        void   Clear     ();
        void   Rehash    ();
    
        bool   Empty     () const { return root_ == nullptr; }
        size_t Size      () const { return RSize(root_); }     // counts alive nodes
        size_t NumNodes  () const { return RNumNodes(root_); } // counts nodes
        int    Height    () const { return RHeight(root_); }
    
        template <class F>
        void   Inorder(F& f) const   { RInorder(root_,f); }
    
        template <class F>
        void   Preorder(F& f) const  { RPreorder(root_,f); }
    
        template <class F>
        void   Postorder(F& f) const { RPostorder(root_,f); }
    
        void   Display (std::ostream& os, int kw, int dw,     // key, data widths
                        std::ios_base::fmtflags kf = std::ios_base::right, // key flag
                        std::ios_base::fmtflags df = std::ios_base::right // data flag
                       ) const;
    
        void   DumpBW (std::ostream& os) const;
        void   Dump (std::ostream& os) const;
        void   Dump (std::ostream& os, int kw) const;
        void   Dump (std::ostream& os, int kw, char fill) const;
    
      private: // definitions and relationships
    
        enum Flags { ZERO = 0x00 , DEAD = 0x01, RED = 0x02 , LEFT_THREAD = 0x04 , RIGHT_THREAD = 0x08 ,
                     THREADS = LEFT_THREAD | RIGHT_THREAD };
    
        struct Node
        {
          const KeyType   key_;
                DataType  data_;
          Node * lchild_, * rchild_;
          uint8_t flags_; //  bit1 = alive/dead, bit2 = red/black, bit3 = left thread/child, bit4 = right thread/child
    
          Node (const KeyType& k, const DataType& d, uint8_t flags)
            : key_(k), data_(d), lchild_(nullptr), rchild_(nullptr), flags_(flags)
          {}
    
          static const char* ColorMap (uint8_t flags)
          {
            flags &= 0x03; // last 2 bits only
            switch(flags)
            {
              case 0x00: return ANSI_BOLD_BLUE;        // bits 00 = !RED |  ALIVE 
              case 0x01: return ANSI_BOLD_BLUE_SHADED; // bits 01 = !RED | !ALIVE
              case 0x02: return ANSI_BOLD_RED;         // bits 10 =  RED |  ALIVE
              case 0x03: return ANSI_BOLD_RED_SHADED;  // bits 11 =  RED | !ALIVE
              default: return "unknown color";   // unknown flags
            }
          }
    
          static char BWMap (uint8_t flags)
          {
            flags &= 0x03; // last 2 bits only
            switch(flags)
            {
              case 0x00: return 'B'; // bits 00 = !RED |  ALIVE 
              case 0x01: return 'b'; // bits 01 = !RED | !ALIVE
              case 0x02: return 'R'; // bits 10 =  RED |  ALIVE
              case 0x03: return 'r'; // bits 11 =  RED | !ALIVE
              default: return 'X';   // unknown flags
            }
          }
        
          // support for color management
          bool IsRed    () const { return 0 != (RED & flags_); }
          bool IsBlack  () const { return !IsRed(); }
          bool IsDead   () const { return 0 != (DEAD & flags_); }
          bool IsAlive  () const { return !IsDead(); }
          void SetRed   ()       { flags_ |= RED; }
          void SetBlack ()       { flags_ &= ~RED; }
          void SetDead  ()       { flags_ |= DEAD; }
          void SetAlive ()       { flags_ &= ~DEAD; }
    
          // support for search
          bool HasLeftChild       () const { return (lchild_ != nullptr) && !(IsLeftThreaded()); }
          bool HasRightChild      () const { return (rchild_ != nullptr) && !(IsRightThreaded()); }
    
          // support for threaded iterators    
          bool IsLeftThreaded     () const { return 0 != (LEFT_THREAD & flags_); }
          bool IsRightThreaded    () const { return 0 != (RIGHT_THREAD & flags_); }
          void SetLeftThread      (Node* n) { lchild_ = n; flags_ |= LEFT_THREAD; }
          void SetRightThread     (Node* n) { rchild_ = n; flags_ |= RIGHT_THREAD; }
          void SetLeftChild       (Node* n) { lchild_ = n; flags_ &= ~LEFT_THREAD; }
          void SetRightChild      (Node* n) { rchild_ = n; flags_ &= ~RIGHT_THREAD; }
    
          // node dump - may be useful during development
          void Dump (std::ostream& os = std::cout) const
          { /* see .start file for implementation */ }
        }; // struct Node
    
        class PrintNode
        {
        public:
          PrintNode (std::ostream& os, int kw, int dw,
                     std::ios_base::fmtflags kf, std::ios_base::fmtflags df )
            : os_(os), kw_(kw), dw_(dw), kf_(kf), df_(df) {}
          void operator() (const Node * n) const
          {
            if (n->IsAlive())
            {
              os_.setf(kf_,std::ios_base::adjustfield);
              os_ << std::setw(kw_) << n->key_;
              os_.setf(df_,std::ios_base::adjustfield);
              os_ << std::setw(dw_) << n->data_;
              os_ << '\n';
            }
          }
        private:
          std::ostream& os_;
          int kw_, dw_;      // key and data column widths
          std::ios_base::fmtflags kf_, df_; // column adjustment flags for output stream
        };
    
      private: // data
        Node *         root_;
        PredicateType  pred_;
    
      private: // methods
        static Node * NewNode     (const K& k, const D& d, uint8_t flags = ZERO);
        static void   RRelease    (Node* n); // deletes all descendants of n
        static Node * RClone      (const Node* n); // returns deep copy of n
        static size_t RSize       (Node * n);
        static size_t RNumNodes   (Node * n);
        static int    RHeight     (Node * n);
    
        template < class F >
        static void   RInorder (Node * n, F& f);
    
        template < class F >
        static void   RPreorder (Node * n, F& f);
    
        template < class F >
        static void   RPostorder (Node * n, F& f);
    
      }; // class OAA<>
    

    Note that the implementations of all OAA methods are discussed in the lecture notes in one form or another.

  2. Some of the required implementations are already available in the file LIB/proj4/oaa_bst.start, including the "big 4" required for proper type, all of the Dump methods, Display, and Rehash.

  3. It is worth pointing out what is NOT in these requirements that would be in a "full" Map API:

    1. Object comparison operators == and !=
    2. Iterators and iterator support
    3. The Table API: Insert, Includes
    4. Remove

    The remaining "mutator" portion of the OAA API consists of Get, Put, Clear, Erase and Rehash -- arguably the minimal necessary for a useful general purpose container.

  4. Note that the AA bracket operator is in the interface and is implemented in-line above with a single call to Get. Also note that Put is implemented with a single call to Get, which leaves Get as the principal functionality requiring implementation in order to have the AA bracket operator. The AA bracket operator, in turn, is required for the refactoring of WordSmith.

  5. The various const methods measure useful characteristics of the underlying BST and provide output useful in the development process as well as offering client programs insight into the AA structure.

  6. The color system is outlined here just as in the lecture notes. The ColorMap is used by the Dump methods to color nodes at output. Color is manipulated by the four Node methods for detecting and changing node color. (This particular map colors red nodes red, black nodes blue, and shades the background of tombstones.) Note that the color system is used only in certain future developments such as Red-Black trees and Left-Leaning Red-Black trees.

  7. There are three privately declared in-class types:

    1. The enumerated type Flags - names the various flags and assigns them to distinct bits
    2. The struct Node - the primary structural component of the tree
    3. The class PrintNode - function class facilitating Display

  8. The various "private" statements are redundant, but they emphasize the various reasons for using that designation: (1) to have private in-class definitions, such as Node or typedef statements, and to record any friend relationships that might be needed; (2) private data in the form of variables; (3) private methods; and (4) things that are privatized to prevent their use.

  9. The three traversals Inorder, Preorder, and Postorder are supported by recursive implementations (as described in the lecture notes) and are special to the iterator-free "lite" implementation. These can be omitted in a full Map implementation and replaced with iterator-based traversals.

  10. Identical Output. Output from foaa.x and foaa+.x (screen and files) should be identical to that of the area51 examples.
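
Requirement 4 above leaves Get as the principal operation to implement, and the Size/NumNodes comments together with the tombstone remark in requirement 6 suggest that Erase marks a node dead instead of unlinking it. The following is a minimal sketch of that idea under stated assumptions, not the required implementation: ToyNode, FindLink, ToyGet, ToyErase, ToySize, ToyNumNodes, and ToyRelease are hypothetical names, keys are compared with operator< instead of a predicate object, thread flags are ignored, and reviving a tombstone with fresh data is an assumed behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stripped-down tree: only the DEAD flag from the Flags enum.
enum ToyFlags { TOY_ZERO = 0x00, TOY_DEAD = 0x01 };

struct ToyNode
{
  const std::string key_;
  int               data_;
  ToyNode *         lchild_, * rchild_;
  std::uint8_t      flags_;

  ToyNode (const std::string& k, int d)
    : key_(k), data_(d), lchild_(nullptr), rchild_(nullptr), flags_(TOY_ZERO)
  {}
  bool IsDead   () const { return 0 != (TOY_DEAD & flags_); }
  void SetDead  ()       { flags_ |= TOY_DEAD; }
  void SetAlive ()       { flags_ &= ~TOY_DEAD; }
};

// Iterative search: returns the address of the link holding k's node, or of
// the null link where such a node would be attached.
ToyNode** FindLink (ToyNode** link, const std::string& k)
{
  while (*link != nullptr)
  {
    if      (k < (*link)->key_) link = &(*link)->lchild_;  // go left
    else if ((*link)->key_ < k) link = &(*link)->rchild_;  // go right
    else break;                                            // keys equivalent
  }
  return link;
}

// Get: find-or-insert. A tombstone found under k is revived with default
// data (an assumption of this sketch).
int& ToyGet (ToyNode*& root, const std::string& k)
{
  ToyNode** link = FindLink(&root, k);
  if (*link == nullptr)
    *link = new ToyNode(k, int());     // new leaf, data value-initialized
  else if ((*link)->IsDead())
  {
    (*link)->SetAlive();
    (*link)->data_ = int();
  }
  return (*link)->data_;
}

// Erase: lazy deletion, the node stays in the tree as a tombstone.
void ToyErase (ToyNode* root, const std::string& k)
{
  ToyNode** link = FindLink(&root, k);
  if (*link != nullptr) (*link)->SetDead();
}

// Recursive counts in the spirit of RNumNodes (all nodes) and RSize (alive).
std::size_t ToyNumNodes (const ToyNode* n)
{ return n == nullptr ? 0 : 1 + ToyNumNodes(n->lchild_) + ToyNumNodes(n->rchild_); }

std::size_t ToySize (const ToyNode* n)
{
  if (n == nullptr) return 0;
  return (n->IsDead() ? 0 : 1) + ToySize(n->lchild_) + ToySize(n->rchild_);
}

void ToyRelease (ToyNode* n)
{
  if (n == nullptr) return;
  ToyRelease(n->lchild_);
  ToyRelease(n->rchild_);
  delete n;
}

inline void ToyTreeDemo ()
{
  ToyNode* root = nullptr;
  ToyGet(root, "m") = 10;
  ToyGet(root, "c") = 20;
  ++ToyGet(root, "m");                 // existing key: no new node
  assert(ToyGet(root, "m") == 11);
  ToyErase(root, "c");                 // tombstone, not unlinked
  assert(ToySize(root) == 1 && ToyNumNodes(root) == 2);
  assert(ToyGet(root, "c") == 0);      // revived with default data
  assert(ToySize(root) == 2 && ToyNumNodes(root) == 2);
  ToyRelease(root);
}
```

Walking a pointer to the link, instead of a pointer to the node, lets the same loop serve both the "found" and the "attach new leaf" cases without tracking a parent. The real Node also carries thread flags, so a search over it would test HasLeftChild and HasRightChild where this sketch tests for a null link.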

Requirements - WordSmith

  1. Here is a working header file for the refactored WordSmith:

    /*  
        wordsmith2.h
    */
    
    #include <xstring.h>
    #include <list.h>
    #include <oaa_bst.h>
    
    class WordSmith
    {
    public:
      WordSmith           ();
      virtual ~WordSmith  ();
      bool   ReadText     (const fsu::String& infile, bool showProgress = 0);
      bool   WriteReport  (const fsu::String& outfile,
                           unsigned short kw = 15,   // key col width 
                           unsigned short dw = 15,   // data col width
                           std::ios_base::fmtflags kf = std::ios_base::left, // key justify 
                           std::ios_base::fmtflags df = std::ios_base::right // data justify
                          )  const;
      void   ShowSummary  () const;
      void   ClearData    ();
    
    private:
      typedef fsu::String              KeyType;
      typedef size_t                   DataType;
    
      size_t                           globalCount_;
      fsu::OAA  < KeyType , DataType > frequency_;
      fsu::List < fsu::String >        infiles_;
      static void Cleanup  (fsu::String& s);
    } ;
    

    The set "wordset_" from the original design is replaced with the ordered associative array "frequency_".

  2. Note the private terminology is changed slightly. (Of course, the API is not changed.) The main storage OAA is called frequency_ which makes very readable code of this form:

        ...
        Cleanup(str);
        if (str.Length() != 0)
        {
          ++frequency_[str];
          ++numwords;
        } // end if
        ...
    

    This snippet is the inner core of the processing loop implementing ReadText. The main loop implementing ReadText is now only 5 lines of code. (You should be certain you understand how this works, both for a new word and for another encounter of an existing word. Note that a value-initialized size_t, written size_t(), has the value 0.)

  3. Another small change is that it is no longer possible to loop through the data to count the words, because we are not defining an Iterator class. We could work out a way to make this count using a traversal with a special function object that retrieves the specific frequencies, but it is simpler just to have a class variable globalCount_ that maintains the total number of words read (and is reset to 0 by ClearData()).

  4. Identical Output. Output from wordsmith2 (screen and files) should be identical to that of the area51 example.
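
The bracket-operator idiom in requirement 2 can be tried in isolation. The sketch below is an illustration only: it uses std::map as a stand-in for fsu::OAA (an assumption; std::map::operator[] likewise creates a value-initialized entry for a missing key), std::string in place of fsu::String, and the hypothetical names FrequencyDemo, CountWordsDemo, and WordCountDemo. Cleanup is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Stand-in (assumption) for fsu::OAA<fsu::String, size_t>.
typedef std::map<std::string, std::size_t> FrequencyDemo;

// Reads whitespace-separated words from text, updates frequency, and
// returns the number of words read (the role played by globalCount_).
std::size_t CountWordsDemo(const std::string& text, FrequencyDemo& frequency)
{
  std::istringstream in(text);
  std::string str;
  std::size_t numwords = 0;
  while (in >> str)            // the real loop would call Cleanup(str) here
  {
    if (str.length() != 0)
    {
      ++frequency[str];        // new word: 0 -> 1; known word: n -> n+1
      ++numwords;
    }
  }
  return numwords;
}

inline void WordCountDemo()
{
  FrequencyDemo frequency;
  std::size_t total = CountWordsDemo("the cat and the hat", frequency);
  assert(total == 5);              // every word read is counted
  assert(frequency["the"] == 2);   // second encounter incremented the entry
  assert(frequency["cat"] == 1);   // first encounter started from 0
  assert(frequency.size() == 4);   // four distinct words
}
```

Keeping the running total alongside the table, as requirement 3 describes for globalCount_, avoids a second pass over the stored frequencies.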

Requirements - Makefile

A makefile named makefile.ws2 is required to build the following components: ws2.x, foaa.x, foaa+.x, moaa.x, main_ws2.o, wordsmith2.o, xstring.o. Each target should have an appropriate dependency list and a build statement. The top target "all" should depend on all of the executables (.x files).

Hints