This document is officially released and open for comment.

Project 2: Hash Analysis

Analysis Methodology for Hash Tables

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Operational Objectives: Add two methods to the class template THashTable<K,T,H>:

size_t  THashTable<K,T,H>::MaxBucketSize () const;
void    THashTable<K,T,H>::Analysis      (std::ostream& os) const;

conforming to the requirements and specifications given below.

Deliverables: One file:

thashtbl.cpp  # contains implementations of MaxBucketSize and Analysis

Note this is a slave file for thashtbl.h.

Procedural Requirements

  1. The official development | testing | assessment environment is gnu g++ on the linprog machines.

  2. Begin by copying the following files from the course directory into your proj2 directory:

    LIB/proj2/thashtbl-stub.cpp     # slave file - stubbed versions of Analysis and MaxBucketSize
    LIB/proj2/makefile              # builds most executables
    LIB/proj2/proj2submit.sh        # submit script
    LIB/tcpp/thashtbl.h             # THashTable<> and THashTableIterator<>, except Analysis and MaxBucketSize
    LIB/tcpp/thashtbl-nonprime.h    # number of buckets is not forced to be prime number
    LIB/tests/fthtbl.cpp            # test harness for hash tables
    LIB/tests/hashcalc.cpp          # calculates hash values interactively
    LIB/tests/hasheval.cpp          # test focusing specifically on Analysis
    LIB/tests/rantable.cpp          # creates random table data
    LIB/area51/fthtbl_i.x
    LIB/area51/fthtblSimple_i.x
    LIB/area51/rantable_i.x
    LIB/area51/hashcalc_i.x
    LIB/area51/hashevalKISS_i.x
    LIB/area51/hashevalMM_i.x
    LIB/area51/hashevalSimple_i.x
    

    The executables in area51 are distributed only for your information and experimentation. You have the source code for these (except for thashtbl.cpp) and can build them to test your own code.

    The file thashtbl.h is copied ONLY FOR YOUR CONVENIENCE. Note this file is NOT submitted to your portfolio, so any code you write must work with this file as it exists in the course library.

  3. Your file thashtbl.cpp should contain implementations of Analysis and MaxBucketSize.

  4. Copy the file proj2submit.sh into your project directory, change its permissions to executable, and submit the project by executing the script.

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications - MaxBucketSize and Analysis

  1. MaxBucketSize should return the size of the largest bucket in the hash table instance.

  2. Analysis should result in a display (to the std::ostream passed in) as follows:

    table size:           9997
    number of buckets:    9973
    nonempty buckets:     6326
    max bucket size:      7
    expected search time: 2.00
    actual search time:   2.58

    bucket size distributions
    -------------------------
    size  actual  theory (uniform random distribution)
    ----  ------  ------
       0    3647  3659.9
       1    3685  3669.0
       2    1846  1838.9
       3     608   614.4
       4     145   153.9
       5      37    30.9
       6       4     5.2
       7       1     0.7
       8             0.1

    This display shows the table size, the number of buckets, the number of non-empty buckets, the maximum bucket size, the expected search time [1 + (table size)/(number of buckets)], and the actual average search time [1 + (table size)/(number of non-empty buckets)]. A tabular printout of the bucket size distribution follows, showing each bucket size, the actual number of buckets of that size, and the number expected under simple uniform hashing. The table print terminates at bucket size n when there are no buckets of size > n and the theoretical count is < 0.1.

  3. Use the algorithms you developed in Assignment 4.

  4. Thoroughly test your implementation for correct functionality with the provided test clients fthtbl.cpp and hasheval.cpp, using a variety of tables you create with rantable.cpp. Be sure to test these variations:

    1. Tables of various sizes, small to very large (at least 1,000,000)
    2. Varieties of hash functions (three are provided: KISS, MM, and Simple)
    3. Ratio of (approximate) number of buckets to table size (0.01, 0.1, 1.0, 10.0 are suggested)
    4. Prime / nonprime number of buckets

    The test harnesses are easily changed via comment/uncomment of typedefs to accommodate these variations.

  5. Write a short summary paper giving your experience and lessons learned during the testing of variations as above. Turn this in as a pdf document under Assignment 5.
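
The quantities in the Analysis display (specifications 1 and 2 above) can be sketched as free functions. This is only an illustration under stated assumptions, not the required implementation: the bucket storage is modeled as a std::vector of std::list, which may not match the actual data members in thashtbl.h, and all function names here are hypothetical. The theory column is assumed to follow a binomial model (each of n keys lands independently and uniformly in one of b buckets); the sample figures above appear consistent with that model, but the algorithms from Assignment 4 remain the authority.

```cpp
#include <cmath>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for the table's bucket storage: a vector of lists.
// The real THashTable<K,T,H> data members are declared in thashtbl.h and may differ.
template <typename Entry>
std::size_t MaxBucketSizeSketch(const std::vector<std::list<Entry>>& buckets)
{
  std::size_t maxSize = 0;
  for (const auto& bucket : buckets)
  {
    if (bucket.size() > maxSize)
      maxSize = bucket.size();
  }
  return maxSize;
}

// 1 + (table size)/(number of buckets); numBuckets must be nonzero.
double ExpectedSearchTime(std::size_t tableSize, std::size_t numBuckets)
{
  return 1.0 + static_cast<double>(tableSize) / static_cast<double>(numBuckets);
}

// 1 + (table size)/(number of non-empty buckets); nonEmptyBuckets must be nonzero.
double ActualSearchTime(std::size_t tableSize, std::size_t nonEmptyBuckets)
{
  return 1.0 + static_cast<double>(tableSize) / static_cast<double>(nonEmptyBuckets);
}

// Expected number of buckets of each size 0..maxSize when n keys fall
// independently and uniformly into b buckets (binomial model, an assumption):
//   E[k] = b * C(n,k) * p^k * (1-p)^(n-k),  with p = 1/b
// computed by the recurrence E[k+1] = E[k] * (n-k)/(k+1) * p/(1-p).
// Requires b >= 2 so that p/(1-p) is finite.
std::vector<double> TheoreticalCounts(std::size_t n, std::size_t b, std::size_t maxSize)
{
  const double p = 1.0 / static_cast<double>(b);
  const double ratio = p / (1.0 - p);
  std::vector<double> expected;
  expected.reserve(maxSize + 1);
  double e = static_cast<double>(b) * std::pow(1.0 - p, static_cast<double>(n));  // E[0]
  for (std::size_t k = 0; k <= maxSize; ++k)
  {
    expected.push_back(e);
    // Sizes above n are impossible, so the expected count drops to zero there.
    e = (k < n) ? e * static_cast<double>(n - k) / static_cast<double>(k + 1) * ratio
                : 0.0;
  }
  return expected;
}

// One reading of the termination rule: the row for `size` is the last one
// printed when no bucket is larger than `size` and the theory value is < 0.1.
bool IsLastRow(std::size_t size, std::size_t maxBucketSize, double theory)
{
  return size >= maxBucketSize && theory < 0.1;
}
```

For the sample table above (9997 entries, 9973 buckets), TheoreticalCounts(9997, 9973, 8) yields approximately 3659.9, 3669.0, and 1838.9 for sizes 0, 1, and 2, in line with the theory column. For very large load factors the starting value E[0] can underflow to zero in double precision, so a production version may prefer to work with logarithms.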

Hints