Project 3: WordSmith

Educational Objectives: On successful completion of this assignment, the student should be able to

Background Knowledge Required: Be sure that you have mastered the material in these chapters before beginning the project:
Binary Trees and Iterators, Binary Tree Construction, Associative Containers, Sets and Maps 1, Associative Binary Trees.

Operational Objectives: Create (1) a client WordSmith of CSet<> that serves as a text analysis application; and (2) an implementation of Binary Search Tree TBST<> that serves as an implementation platform for CSet<>.

Deliverables: wordsmith.h, wordsmith.cpp, tbst.h, makefile, log.txt.

Procedural Requirements

  1. Begin by copying all of the files from the project distribution directory:

    proj3/main.cpp        # driver program for wordsmith
    proj3/makefile        # makefile for project
    proj3/data?           # sample word files
    proj3/ftmolist.cpp    # ftoac.cpp with Container = TMOList
    proj3/ftuolist.cpp    # ftoac.cpp with Container = TUOList
    proj3/ftbst.cpp       # ftoac.cpp with Container = TBST
    proj3/proj3submit.sh  # submit script
    tests/ftoac.cpp       # functionality test for ordered associative containers
    tests/fcset.cpp       # functionality test for CSet 
    

  2. Define and implement the classes TBST<T,P> and TBST<T,P>::Iterator in the file tbst.h. Also place all supporting definitions and implementations, such as operator overloads, in this file.

  3. Be sure to fully cite all references used for code and ideas, including URLs for web-based resources. These citations should be in the file documentation and if appropriate detailed in relevant code locations.

  4. Test your classes using the distributed test harnesses fcset.cpp and ftbst.cpp.

  5. Write a brief description of your development and test methods and results and place this in the file header documentation of tbst.h.

  6. Submit the project using the script proj3submit.sh.

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Project Overview: The project consists of two orthogonal tasks: (1) creation of the WordSmith application, which is a client of fsu::CSet, and (2) creation of the BinarySearchTree associative container used to efficiently implement fsu::CSet. These tasks are discussed separately below.

  1. The WordSmith Client
    1. Functionality Requirements.
      1. WordSmith can read an arbitrary text file on command and extract all of the words in the file, maintaining the unique words, along with the frequency of occurrence of each word, in a set. Letters are converted to lower case before comparison and storage. A word is understood to be a string of letters and/or digits, with certain other symbols allowed. Most non-alpha-numeric characters are ignored. Exceptions are hyphens and apostrophes, which are considered part of the word, so that contractions and hyphenated constructs are counted as individual words. (Note: two adjacent apostrophes are not considered part of a word, since they represent closing of a quotation.)
      2. WordSmith can write an analysis of its current stored words. This analysis consists of a lexicographical listing of the unique words together with their frequencies, followed by a count of the total number of words and the vocabulary size (number of unique words). Note that this is a cumulative analysis over all of the input files read since starting up WordSmith (or since the last clearing operation).
      3. WordSmith must operate with the supplied driver program LIB/proj3/main.cpp which has a user interface with the following options:
        1. Read a file. Read the words of the file into the structure (and report summary to screen).
        2. Write an analysis of the current data (including input file names) to a file (and report summary to screen).
        3. Erase current data and clear all data from the structure.
        4. Show current size and send a data summary to the screen.
        5. display Menu.
        6. eXit program.
        Use the source code in the driver program main.cpp to determine the syntax requirements for the WordSmith public interface. Use the executables in area51 to model expected behavior.
      4. From any directory having access to the course library and containing your submission files, entering "make" should result in an executable called "wordsmith.x".
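The word rules in item 1 above can be sketched as a small token-cleaning function. This is only a sketch under stated assumptions: CleanWord is a hypothetical helper (it is not part of the course library or the required WordSmith interface), std::string stands in for fsu::String, and the input is assumed to be one whitespace-delimited token. The treatment of leading or trailing hyphens and of the "certain other symbols" is not specified here, so the area51 executables remain the reference for exact behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the course library: reduce one
// whitespace-delimited token to the word that would be stored.
// std::string stands in for fsu::String.
std::string CleanWord(const std::string& token)
{
  std::string word;
  for (std::size_t i = 0; i < token.size(); ++i)
  {
    unsigned char c = static_cast<unsigned char>(token[i]);
    if (std::isalnum(c))
    {
      // letters are stored in lower case; digits pass through unchanged
      word += static_cast<char>(std::tolower(c));
    }
    else if (c == '-')
    {
      // hyphens are part of the word
      word += '-';
    }
    else if (c == '\'')
    {
      // a lone apostrophe is part of the word; an apostrophe adjacent to
      // another apostrophe is treated as a closing quotation and dropped
      bool prevApos = (i > 0 && token[i - 1] == '\'');
      bool nextApos = (i + 1 < token.size() && token[i + 1] == '\'');
      if (!prevApos && !nextApos) word += '\'';
    }
    // every other character is ignored
  }
  return word;
}
```

With this sketch, the token Don't becomes don't, the token well-known, (with trailing comma) becomes well-known, and a token ending in two apostrophes loses both of them. An empty result means the token contained no word characters and should not be counted.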
    2. Implementation Requirements.
      1. You should define a class WordSmith, declared in the file wordsmith.h and implemented in the file wordsmith.cpp. An object of type WordSmith is used by the driver program to create the executable wordsmith.x.
      2. The primary data structure used for storing words and wordcounts should be an object of type
        fsu::CSet < EntryType , ContainerType >, where ContainerType is an associative container and EntryType is typedef'd as fsu::TPair<fsu::String, unsigned long>. Note that an EntryType object holds a word and a wordcount.
      3. Note that the fsu::TPair template class has comparison operators defined that emphasize the first coordinate of the pair (called the "key"), so that two pairs are considered equal, for example, if they have equal keys.
      4. The structure used for storing file names should be an object of type fsu::TList<fsu::String>.
      5. The application should function correctly in every respect using fsu::TUOList < EntryType > for ContainerType.
      6. Changing the structure used for ContainerType should be as simple as changing one typedef statement in the WordSmith class declaration.
      7. As usual, you should employ good software design practice. Your application should be completely robust and all classes you define should be thoroughly tested for correct function, robust behavior, and against memory leaks. Your wordsmith.x should mimic, or improve upon, the behavior illustrated in area51/wordsmith_?.x.
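The storage pattern described in items 2 and 3 above (entries compared by key only, one entry per unique word, count updated in place) might look roughly like the following. Everything here is a stand-in chosen for illustration: Entry imitates fsu::TPair<fsu::String, unsigned long> with assumed member names first_ and second_, a sorted std::vector imitates the unimodal ordered container, and AddWord and TotalWords are hypothetical helpers, not names required by the driver.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for fsu::TPair<fsu::String, unsigned long>. Member names are
// assumptions made for this sketch.
struct Entry
{
  std::string   first_;   // the word (key)
  unsigned long second_;  // the wordcount (data)
};

// Comparisons look only at the key, never at the count.
bool operator<(const Entry& a, const Entry& b)  { return a.first_ < b.first_; }
bool operator==(const Entry& a, const Entry& b) { return a.first_ == b.first_; }

// Stand-in for the set: a sorted vector holding one Entry per unique key.
// Swapping the underlying container would mean changing only this typedef.
typedef std::vector<Entry> WordSet;

// Record one occurrence of word: increment the count if the key is
// already present, otherwise insert (word, 1) at its sorted position.
void AddWord(WordSet& set, const std::string& word)
{
  Entry e = { word, 1 };
  WordSet::iterator i = std::lower_bound(set.begin(), set.end(), e);
  if (i != set.end() && *i == e)
    ++(i->second_);
  else
    set.insert(i, e);
}

// Total number of words is the sum of all counts; the vocabulary size
// is simply set.size().
unsigned long TotalWords(const WordSet& set)
{
  unsigned long total = 0;
  for (std::size_t k = 0; k < set.size(); ++k) total += set[k].second_;
  return total;
}
```

Because equality ignores the count, looking up a word needs only a probe entry with the right key; the count stored in the probe is irrelevant. Traversing the container in order yields the lexicographical listing required for the analysis.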
  2. The Binary Search Tree Container
    1. The container class TBST<T,P> should be declared and implemented in the file tbst.h.
    2. TBST<T,P> may be derived from TBinaryTree<T> [inheritance], may use a private TBinaryTree<T> object [adaptation], or may be developed stand-alone. The template parameter P should have a default value P = TLessThan<T>.
    3. TBST<T,P> should be a proper type and implement the interface illustrated in Chapter 16, Slide 5.
    4. The following methods should have runtime O(d) where d is the depth of the tree:
      Iterator Insert     (const T& t);              // insert in order
      Iterator LowerBound (const T& t) const;        //
      Iterator UpperBound (const T& t) const; 
      Iterator Includes   (const T& t) const;        // returns LowerBound() or End()
      size_t   Remove     (const T& t);              // remove (all copies of) t 
      
    5. The following methods should have constant runtime O(1):
      bool     Insert     (Iterator& i, const T& t); // insert only if location is correct
      bool     Remove     (Iterator& i);             // remove item at i
      
    6. All insertion methods should use type "U" semantics.
    7. TBST<T,P> should compile and function correctly with the client proj3/ftbst.cpp [tests/ftoac.cpp set to case Cd: uni-bst].
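To make the O(d) requirement concrete, here is a minimal stand-alone sketch of search-path insertion and lookup. It is not the required TBST<T,P> interface: SketchBST and its member names are illustrative, a raw Node* stands in for TBST<T,P>::Iterator (with a null pointer playing the role of End()), operator< is used in place of the predicate parameter P, and type "U" semantics are assumed to mean that inserting an element equal to one already stored overwrites it rather than adding a duplicate.

```cpp
#include <cstddef>

// Illustrative sketch only; names and return types are assumptions.
template <typename T>
class SketchBST
{
public:
  struct Node
  {
    T     value_;
    Node* left_;
    Node* right_;
    explicit Node(const T& t) : value_(t), left_(0), right_(0) {}
  };

  SketchBST() : root_(0), size_(0) {}
  ~SketchBST() { Clear(root_); }

  // Follows a single root-to-leaf path, so the cost is O(d).
  // Assumed type U behavior: an equal element is overwritten.
  Node* Insert(const T& t)
  {
    Node** link = &root_;
    while (*link != 0)
    {
      if (t < (*link)->value_)       link = &(*link)->left_;
      else if ((*link)->value_ < t)  link = &(*link)->right_;
      else { (*link)->value_ = t; return *link; }
    }
    *link = new Node(t);
    ++size_;
    return *link;
  }

  // First element not less than t, or null if there is none: O(d).
  Node* LowerBound(const T& t) const
  {
    Node* best = 0;
    for (Node* n = root_; n != 0; )
    {
      if (n->value_ < t) n = n->right_;
      else { best = n; n = n->left_; }
    }
    return best;
  }

  // LowerBound(t) if that element equals t, otherwise null.
  Node* Includes(const T& t) const
  {
    Node* n = LowerBound(t);
    return (n != 0 && !(t < n->value_)) ? n : 0;
  }

  std::size_t Size() const { return size_; }

private:
  SketchBST(const SketchBST&);             // copying disabled in this sketch
  SketchBST& operator=(const SketchBST&);

  // Post-order deletion so that no node is leaked.
  static void Clear(Node* n)
  {
    if (n == 0) return;
    Clear(n->left_);
    Clear(n->right_);
    delete n;
  }

  Node*       root_;
  std::size_t size_;
};
```

Each of the three operations descends at most one level per loop iteration, which is where the O(d) bound comes from. The sketch omits removal, upper bound, iterators, and a proper copy constructor and assignment operator, all of which a proper type would need.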
  3. Scoring
    1. Level 1 (80 points): WordSmith based on CSet < EntryType , TUOList < EntryType > >
    2. Level 2 (100 points): level 1 plus WordSmith based on CSet < EntryType , TBST < EntryType > >
    3. Level 3 (120 points): level 2 plus TBST < T , P > passes all tests using ftbst.cpp

Hints