Design hints
____________

Your program should be well designed. I give an outline of how I might design this program. Of course, you do not have to follow this design, and, in fact, you should not copy this directly. But it might give you an indication of how you might go about it. There might be mistakes in here; so use it just as a guide.

I will use one .c file for the implementation of each module, and one .h for each interface. The header file will only include the interface. Functions that others do not need to know about will be declared static in the .c file, and will not be prototyped in the header file.

I use a few conventions in my design:

1. If a module allocates some memory that is returned to the caller, then that module will also provide a function to free that memory.

2. Each module may contain a function that will check if the module works correctly. That way, if you port the code, or make changes, it will be relatively easy to check for correctness of the program. If the name of the module is 'xyz', then I might call this function 'Checkxyz', which will be prototyped as: int Checkxyz(), which returns 0 if the module is incorrect, and non-zero otherwise. The prototype will normally be declared in the header file. (But I do not show it in the examples here.) You can do something similar in your program if you have the time, though it is not strictly required.

3. Use typedefs to make the data type name reflect the type of entity. For example, I may use: 'typedef float *Vector', and declare objects of type Vector, instead of directly declaring them as 'float *'.

4. You will normally declare a structure in the header file, if it is needed by the interface functions. It is then difficult to hide fields that you want to be 'private'. It is possible to handle this quite easily, but I will not do that here. You can meet me if you want to see some examples of that.

5. Use guards and extern "C" (for C linkage in C++ programs) in each header file.
   I have not shown these here, since I am just giving an outline.

Modules for this program:

vocabulary.h
____________

#define NOT_IN_VOCABULARY (-1)
   or
static const int NOT_IN_VOCABULARY = -1;

typedef some structure Vocabulary;

Vocabulary *CreateVocabulary(char *filename);
/* Reads words from filename, allocates an efficient data structure to
   store the words, and returns a pointer to it. */

void FreeVocabulary(Vocabulary *);
/* Frees the structure allocated by the above function */

int NumberOfWords(Vocabulary *);

int GiveIndex(Vocabulary *, char *word);
/* Returns index of word, if it is present in the vocabulary.
   Otherwise it returns NOT_IN_VOCABULARY */

matrices.h
__________

typedef float **Matrix;
/* Alternatively, create a structure that stores the number of rows and
   number of columns too. That would have provided better interfaces to
   the remaining functions. */

typedef float *Vector;

Matrix CreateMatrix(int NumberOfRows, int NumberOfColumns);
/* Allocate matrix and initialize to 0 */

void FreeMatrix(Matrix, int NumberOfRows);

Vector CreateVector(int NumberOfElements);

void FreeVector(Vector);

Vector MatVecMultiply(Matrix, Vector, int NumberOfRows, int NumberOfColumns);
/* The result vector is allocated here, and must be freed by a call to
   FreeVector */

int MaxIndex(Vector, int NumberOfElements);
/* Returns the index of the largest component */

document.h
__________

void GiveDocumentRepresentation(char *filename, Vector DocumentRepresentation);
/* The user of this function allocates the vector and passes it as an
   argument. Alternatively, we could have typedefed Vector to a type
   called DocumentRepresentation, and returned this type. However,
   representation of a set of documents as a matrix is an essential
   feature of this computation, so I have just used the simpler
   approach. It is also slightly more efficient than allocating and
   deallocating an array each time.
*/

documentset.h
_____________

typedef struct {
    Matrix DocumentSetRepresentation;
    char **DocumentNames;
    int NumberOfDocuments;
    int VocabularySize;
} DocumentSet;

DocumentSet *CreateDocumentSetRepresentation(int NumberOfDocuments /* argc */,
                                             char **DocumentNames /* argv+1 */);

void FreeDocumentSet(DocumentSet *);

char *GiveDocumentName(DocumentSet *, int index);

query.h
_______

typedef Vector Query;

Query ReadQuery(Vocabulary *);
/* Reads a query and returns its representation as a vector. It
   sacrifices some efficiency by allocating for each query. It returns
   NULL on reading EOF. */

void FreeQuery(Query);

int NotDone(Query query);
/* Returns 0 if query is NULL (indicating EOF), or non-zero otherwise.
   For example, this could just test whether query is non-NULL. However,
   this function makes the code easier to read, and does not assume
   anything about the representation of the query. */

char *MatchDocument(DocumentSet *, Query);
/* Returns the name of the document that best matches the query from
   the document set */

util.h
______

Any utility routines. For example, I sometimes use my version of malloc which does the error checking and handling.

Each of the above files also has an associated .c file implementing the functions.

I may then write a main.c that does just the following
______________________________________________________

Find vocabulary file name using getenv();
voc = CreateVocabulary(filename);
docset = CreateDocumentSetRepresentation(argc, argv+1);
while (NotDone(query = ReadQuery(voc))) {
    printf("Best match = %s\n", MatchDocument(docset, query));
    FreeQuery(query);
}
FreeQuery(query); /* Make sure you handle NULL argument to this function */
FreeDocumentSet(docset);
FreeVocabulary(voc);
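
As an illustration of how one of these modules might be filled in, here is a minimal sketch of a possible matrices.c. It follows the matrices.h prototypes above, along with convention 1 (each Create function has a matching Free function) and convention 2 (a Checkmatrices function). CheckedMalloc is a hypothetical stand-in for the error-checking malloc mentioned under util.h, and the values used in Checkmatrices are made up for the test. Treat it as one way to go about it, not as the required design.

```c
/* matrices.c: a sketch only; names not in matrices.h are assumptions. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef float **Matrix;   /* normally these two typedefs come from matrices.h */
typedef float *Vector;

/* Hypothetical util.h helper: malloc that reports failure and exits. */
static void *CheckedMalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Allocates a vector and sets every component to 0. */
Vector CreateVector(int NumberOfElements)
{
    int i;
    Vector v;
    assert(NumberOfElements > 0);
    v = CheckedMalloc((size_t)NumberOfElements * sizeof(float));
    for (i = 0; i < NumberOfElements; i++)
        v[i] = 0.0f;
    return v;
}

void FreeVector(Vector v)
{
    free(v);   /* free(NULL) does nothing, so a NULL argument is safe */
}

/* Allocates one array of row pointers, then one zeroed vector per row. */
Matrix CreateMatrix(int NumberOfRows, int NumberOfColumns)
{
    int i;
    Matrix m;
    assert(NumberOfRows > 0 && NumberOfColumns > 0);
    m = CheckedMalloc((size_t)NumberOfRows * sizeof(float *));
    for (i = 0; i < NumberOfRows; i++)
        m[i] = CreateVector(NumberOfColumns);
    return m;
}

void FreeMatrix(Matrix m, int NumberOfRows)
{
    int i;
    if (m == NULL)
        return;
    for (i = 0; i < NumberOfRows; i++)
        FreeVector(m[i]);
    free(m);
}

/* Returns a newly allocated vector m*v; the caller frees it with FreeVector. */
Vector MatVecMultiply(Matrix m, Vector v, int NumberOfRows, int NumberOfColumns)
{
    int i, j;
    Vector result = CreateVector(NumberOfRows);
    for (i = 0; i < NumberOfRows; i++)
        for (j = 0; j < NumberOfColumns; j++)
            result[i] += m[i][j] * v[j];
    return result;
}

/* Returns the index of the largest component (the first one, on ties). */
int MaxIndex(Vector v, int NumberOfElements)
{
    int i, best = 0;
    assert(NumberOfElements > 0);
    for (i = 1; i < NumberOfElements; i++)
        if (v[i] > v[best])
            best = i;
    return best;
}

/* Convention 2: returns non-zero if the module behaves as expected. */
int Checkmatrices(void)
{
    Matrix m = CreateMatrix(2, 3);
    Vector v = CreateVector(3);
    Vector r;
    int ok;

    m[0][0] = 1.0f;                  /* row 0 = (1, 0, 0) */
    m[1][1] = 2.0f; m[1][2] = 1.0f;  /* row 1 = (0, 2, 1) */
    v[0] = 1.0f; v[1] = 2.0f; v[2] = 3.0f;

    r = MatVecMultiply(m, v, 2, 3);  /* expected result: (1, 7) */
    ok = (r[0] == 1.0f) && (r[1] == 7.0f) && (MaxIndex(r, 2) == 1);

    FreeVector(r);
    FreeVector(v);
    FreeMatrix(m, 2);
    return ok;
}
```

A main.c could call Checkmatrices() once at startup (or from a separate test program) and stop with an error message if it returns 0. MatchDocument could plausibly be built from MatVecMultiply followed by MaxIndex, though that depends on how you choose to represent documents and queries.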