Project 1: Word/letter usage statistics

Due: January 23, 2004

Educational Objectives: Gain experience with text processing techniques, with using makefiles to organize and compile applications, and with using namespaces.

Statement of Work: Implement a program that collects word and letter usage statistics for a file redirected to standard input. A word is defined as a maximal sequence of letters and digits; words are separated by any character that is neither a letter nor a digit. For example, 'aaa111:111bbb' contains two words, 'aaa111' and '111bbb'. Your program should record the number of times each word and each letter occurs in the file.
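The word definition above can be turned into a small reader function. The sketch below is an illustration, not the required design: the name next_word is hypothetical, and it assumes the default "C" locale, where std::isalnum accepts exactly the ASCII letters and digits.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, convenient for trying next_word on a string
#include <string>

// Reads the next word from `in` into `word` and returns true, or returns
// false when the input ends before another word starts. A word is taken to
// be a maximal run of characters for which std::isalnum is true; every other
// character is a separator and is skipped.
bool next_word(std::istream& in, std::string& word) {
    word.clear();
    char c;
    while (in.get(c)) {
        // isalnum requires a value representable as unsigned char (or EOF),
        // so cast before calling it.
        if (std::isalnum(static_cast<unsigned char>(c))) {
            word += c;
        } else if (!word.empty()) {
            return true;  // a separator ends the word in progress
        }
    }
    return !word.empty();  // the last word may end at end of input
}
```

Reading from std::istringstream("aaa111:111bbb") yields 'aaa111', then '111bbb', then false. Because next_word consumes characters itself, a complete program that also counts letters would likely fold letter counting into the same character loop so the input is read only once.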

Deliverables: Turn in the files program1.cpp and makefile using the project1submit.sh script, which you can copy from scripts/project1submit.sh.

Requirements:

  1. Create a subdirectory called proj1. At this point you should have six subdirectories of cop4530: cpp, tcpp, tests, examples, partials, and proj1. Make sure your code distribution directories are up to date by invoking your "update" command.
  2. For this project you need to create two files, program1.cpp and makefile, both placed in the proj1 directory.
  3. The file program1.cpp should contain the main function, int main(). In main(), the program should read the input until end of file, counting the number of times each word and each letter is used. The program should then output the ten most used letters and the ten most used words, together with the number of times each is used. Letters and words should be output in descending order of their usage counts. When two letters occur the same number of times, the letter with the smaller ASCII value is considered more frequently used. When two words occur the same number of times, the word whose first occurrence comes earlier in the file is considered more frequently used. An example executable for the program machines, 'proj1.x', is provided in the area51 directory; the output of your program should match that of 'proj1.x'. When printing letters, print a tab as '\t' and a newline as '\n'; print all other letters as they are.
  4. Write a makefile for your project that compiles an executable called program1.x.
  5. You may not use any C++ container class; you must implement a simple container (to record words and their corresponding counters) from scratch.
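Requirement 5 rules out the standard container classes, so the word counters need a hand-built structure. A minimal sketch follows: a growable array that keeps words in order of first occurrence, which is what the word tie-breaking rule in requirement 3 needs. The names WordEntry and WordTable are hypothetical, the linear search is chosen for brevity rather than speed, and the sketch assumes std::string itself is allowed, which the statement does not say explicitly.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// One record: a word and how many times it has been seen so far.
struct WordEntry {
    std::string word;
    int count;
};

// Hypothetical hand-written container (the name WordTable is illustrative).
// It is a growable array that keeps entries in order of first occurrence.
class WordTable {
public:
    WordTable() : data_(new WordEntry[4]()), size_(0), capacity_(4) {}
    ~WordTable() { delete[] data_; }
    WordTable(const WordTable&) = delete;             // copying is not needed here
    WordTable& operator=(const WordTable&) = delete;

    // Increments the counter for `w`, appending a new entry on first sight.
    // Linear search keeps the sketch short; a hash table or binary search
    // tree would scale better for large inputs.
    void add(const std::string& w) {
        for (std::size_t i = 0; i < size_; ++i) {
            if (data_[i].word == w) {
                ++data_[i].count;
                return;
            }
        }
        if (size_ == capacity_) grow();
        data_[size_].word = w;
        data_[size_].count = 1;
        ++size_;
    }

    std::size_t size() const { return size_; }
    const WordEntry& operator[](std::size_t i) const { return data_[i]; }

private:
    // Doubles the capacity and moves the existing entries into the new array.
    void grow() {
        std::size_t new_capacity = capacity_ * 2;
        WordEntry* bigger = new WordEntry[new_capacity]();
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = std::move(data_[i]);
        delete[] data_;
        data_ = bigger;
        capacity_ = new_capacity;
    }

    WordEntry* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

For example, after add("b"), add("a"), add("b"), the table holds 'b' with count 2 followed by 'a' with count 1.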
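For the word ranking in requirement 3, one option is to sort the recorded entries by descending count with a stable sort: entries with equal counts keep their relative order, so if they are stored in order of first occurrence, the earlier word stays ahead on ties. The sketch below uses a hypothetical WordEntry record and assumes that standard algorithms such as std::stable_sort are acceptable, since requirement 5 forbids only container classes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Record shape assumed for this sketch (a word and its occurrence count).
struct WordEntry {
    std::string word;
    int count;
};

// Sorts `n` entries by descending count. std::stable_sort preserves the
// relative order of entries that compare equal, so when the array is in
// order of first occurrence, the word seen earlier stays ahead on ties.
// Afterwards the ten most used words are the first min(n, 10) entries.
void rank_words(WordEntry* entries, std::size_t n) {
    std::stable_sort(entries, entries + n,
                     [](const WordEntry& a, const WordEntry& b) {
                         return a.count > b.count;
                     });
}
```

With counts x:1, y:3, z:1, w:3 stored in that order of first appearance, the ranked order is y, w, x, z.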
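Letter counts need no dynamic container at all: an array of 256 counters indexed by character code covers every byte value. Because the statement asks for tab and newline to be printed as '\t' and '\n', this sketch assumes every character read counts as a "letter", not only alphabetic ones. Selecting the top entries by scanning codes from 0 upward and replacing the current best only on a strictly larger count gives the smaller ASCII value priority on ties. The function names are hypothetical, and the exact output layout of proj1.x is not reproduced here.

```cpp
#include <string>

// One counter per possible byte value; index = character code as unsigned char.
constexpr int kAlphabet = 256;

// Adds each character of `text` to `counts` (in a full program this would be
// done for every character read from standard input).
void count_letters(const std::string& text, int counts[kAlphabet]) {
    for (char c : text) ++counts[static_cast<unsigned char>(c)];
}

// Writes up to `k` character codes into `top`, most frequent first, and
// returns how many were written. Codes are scanned from 0 upward and the best
// is replaced only on a strictly larger count, so on ties the smaller ASCII
// value wins. Characters that never occur are skipped (an assumption about how
// proj1.x behaves when fewer than k distinct characters appear).
int top_letters(const int counts[kAlphabet], int top[], int k) {
    bool taken[kAlphabet] = {};
    int found = 0;
    for (; found < k; ++found) {
        int best = -1;
        for (int c = 0; c < kAlphabet; ++c)
            if (!taken[c] && counts[c] > 0 && (best < 0 || counts[c] > counts[best]))
                best = c;
        if (best < 0) break;  // no unused, nonzero counter remains
        taken[best] = true;
        top[found] = best;
    }
    return found;
}

// Renders a character for printing: tab and newline become the two-character
// sequences \t and \n; every other character is printed as itself.
std::string printable(int c) {
    if (c == '\t') return "\\t";
    if (c == '\n') return "\\n";
    return std::string(1, static_cast<char>(c));
}
```

For the input "b\na\tab\n", the order is newline, 'a', 'b' (two occurrences each, ranked by ASCII values 10, 97, 98), followed by tab with one occurrence.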
Hints:

Extra credits: