Project 6: RabinKarp

Monte Carlo & Las Vegas Substring Search

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Operational Objectives: Design and implement the class template RabinKarp< Radix, Prime > that acts as a function object on input strings, returning the location of the first match, or the length of the input string when no match is found.
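As an illustration of the return convention described above, here is a minimal sketch, not the required design. The class name RabinKarpSketch, the constructor that takes the pattern, and the helpers Digit, Hash, and Verify are all assumptions made for this example; the authoritative outline is the one given in Strings 2. The sketch assumes 8-bit characters, Radix >= 256, and a Prime small enough that intermediate products fit in size_t, and it verifies every hash match before reporting it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch only: names and interface are assumptions, not the course outline.
template <std::size_t Radix, std::size_t Prime>
class RabinKarpSketch
{
  static_assert(Radix >= 256, "sketch assumes one base-Radix digit per 8-bit character");
  static_assert(Prime >= 2 && Prime <= static_cast<std::size_t>(-1) / Radix - 1,
                "intermediate products must fit in size_t");

public:
  explicit RabinKarpSketch(const std::string& pattern)
    : pattern_(pattern), patternHash_(Hash(pattern, pattern.size())), high_(1)
  {
    // high_ = Radix^(k-1) mod Prime, the weight of the leading character of a window
    for (std::size_t i = 1; i < pattern_.size(); ++i)
      high_ = (high_ * Radix) % Prime;
  }

  // Returns the index of the first occurrence of the pattern in text,
  // or text.size() when the pattern does not occur.
  std::size_t operator()(const std::string& text) const
  {
    const std::size_t n = text.size(), k = pattern_.size();
    if (k == 0) return 0;   // assumption: the empty pattern matches at index 0
    if (k > n) return n;
    std::size_t h = Hash(text, k);   // hash of the first window text[0..k)
    for (std::size_t i = 0; ; ++i)
    {
      if (h == patternHash_ && Verify(text, i)) return i;
      if (i + k >= n) return n;      // that was the last window
      // Slide the window: drop text[i], then append text[i + k].
      h = (h + Prime - (high_ * Digit(text[i])) % Prime) % Prime;
      h = (h * Radix + Digit(text[i + k])) % Prime;
    }
  }

private:
  static std::size_t Digit(char c)
  {
    return static_cast<std::size_t>(static_cast<unsigned char>(c));
  }

  // Horner evaluation of the first len characters of s, modulo Prime.
  static std::size_t Hash(const std::string& s, std::size_t len)
  {
    std::size_t h = 0;
    for (std::size_t i = 0; i < len; ++i)
      h = (h * Radix + Digit(s[i])) % Prime;
    return h;
  }

  // Character-by-character check of the window that starts at pos.
  bool Verify(const std::string& text, std::size_t pos) const
  {
    assert(pos + pattern_.size() <= text.size());
    return text.compare(pos, pattern_.size(), pattern_) == 0;
  }

  std::string pattern_;
  std::size_t patternHash_;
  std::size_t high_;
};
```

With this sketch, an object built as RabinKarpSketch<256, 1000003> rk("needle") gives rk("haystack with needle inside") == 14, and applying rk to a text that does not contain the pattern gives that text's length.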

Deliverables: Files:

rk.h      # definition & implementation of template< size_t Radix, size_t Prime > class RabinKarp
log.txt   # project log and commentary

Procedural Requirements

  1. The official development | testing | assessment environment is clang++ -std=c++11 on the linprog machines. Code should compile without error or warning.

  2. File log.txt should explain how the software was developed, how it was tested, and how it is expected to be operated.

  3. Copy the files LIB/scripts/submit.sh and LIB/proj6/deliverables.RK into your project directory. (submit.sh may be omitted if the submit system is executable in your ~/.bin.) Submit the project by executing the script: submit.sh deliverables.RK

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications

  1. Class RabinKarp < size_t R , size_t P > should conform to the outline given in Strings 2.

  2. Test Harness. The class should function correctly using the supplied test harness frk.cpp.

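  3. Monte Carlo Rule. The pattern is matched with high probability: a hash match is reported without verification, so a reported location is correct with high probability but not with certainty. The runtime is guaranteed to be O(n + k), where n is the length of the incoming string and k is the length of the pattern.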

  4. Las Vegas Rule. The pattern match is guaranteed: every hash match is verified against the text before it is reported. The runtime is O(n + k) with high probability, where n is the length of the incoming string and k is the length of the pattern. The worst-case runtime is Ω(n×k).

  5. Probability. There should be a method long double Probability() const that returns the probability estimate of success under either rule.

  6. Generalizations. Three directions for generalization should be discussed in log.txt.

    1. Discuss how the design might accommodate alphabets other than ASCII.
    2. Discuss design modifications that would accommodate very large search spaces, such as the text of a large book.
    3. Discuss how the design could be generalized (or perhaps revamped entirely) to accommodate multi-dimensional patterns of characters. The 2-dimensional case can be used for this discussion.
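The difference between the Monte Carlo and Las Vegas rules can be seen in a short sketch. The names Rule, Search, and NoCollisionEstimate are hypothetical and chosen only for this example; they are not the interface required by the outline in Strings 2. The estimate function rests on a loudly stated assumption: if window hashes behaved like uniformly distributed values modulo Prime, a single non-matching window would collide with the pattern hash with probability about 1/Prime. Whether that is the estimate intended for Probability() is not stated in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Rule { MonteCarlo, LasVegas };   // hypothetical names

// Returns the index of the first reported match, or text.size() if none is reported.
template <std::size_t Radix, std::size_t Prime>
std::size_t Search(const std::string& pattern, const std::string& text, Rule rule)
{
  static_assert(Radix >= 256, "sketch assumes one base-Radix digit per 8-bit character");
  static_assert(Prime >= 2 && Prime <= static_cast<std::size_t>(-1) / Radix - 1,
                "intermediate products must fit in size_t");
  const std::size_t n = text.size(), k = pattern.size();
  if (k == 0) return 0;   // assumption: the empty pattern matches at index 0
  if (k > n) return n;

  auto digit = [](char c) { return static_cast<std::size_t>(static_cast<unsigned char>(c)); };

  // high = Radix^(k-1) mod Prime; hp and ht hash the pattern and the first window.
  std::size_t high = 1, hp = 0, ht = 0;
  for (std::size_t i = 0; i < k; ++i)
  {
    if (i > 0) high = (high * Radix) % Prime;
    hp = (hp * Radix + digit(pattern[i])) % Prime;
    ht = (ht * Radix + digit(text[i])) % Prime;
  }

  for (std::size_t i = 0; ; ++i)
  {
    if (ht == hp)
    {
      // Monte Carlo: trust the hash, so a collision yields a wrong answer.
      if (rule == Rule::MonteCarlo) return i;
      // Las Vegas: verify, so a collision only costs extra time.
      assert(i + k <= n);
      if (text.compare(i, k, pattern) == 0) return i;
    }
    if (i + k >= n) return n;   // that was the last window
    ht = (ht + Prime - (high * digit(text[i])) % Prime) % Prime;   // drop text[i]
    ht = (ht * Radix + digit(text[i + k])) % Prime;                // append text[i + k]
  }
}

// Heuristic only (an assumption, not the assignment's formula): chance that one
// non-matching window does NOT collide with the pattern hash, under uniform hashing.
template <std::size_t Prime>
long double NoCollisionEstimate()
{
  return 1.0L - 1.0L / static_cast<long double>(Prime);
}
```

A deliberately tiny modulus makes the contrast visible. With Prime = 3, the windows "xx" and "ab" hash to the same value, so Search<256, 3>("ab", "xxab", Rule::MonteCarlo) reports index 0, which is a false positive, while the same call with Rule::LasVegas rejects that window after verification and returns 2. With Prime = 1000003 the three windows of "xxab" have distinct hashes and both rules return 2.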

Hints