Educational Objectives: After completing this assignment, the student should be able to accomplish the following:
Operational Objectives: Design and implement the class MovieMatch
Deliverables: Files:
moviematch.h report.txt
The Kevin Bacon game is this: given an actor by name, what is his/her Kevin Bacon number?
To solve this we first need a clear definition of the Kevin Bacon number for an actor, or more generally, the movie distance between two actors. The definition is much like the path distance between two vertices in a graph, except using movie chains instead of edges.
A movie chain from actor x to actor y is a sequence m1 m2 ... mk such that
- mj and mj+1 have an actor in common for 0 < j < k
- x is in movie m1
- y is in movie mk
The movie distance md(x,y) is defined to be the number of movies in a shortest movie chain from x to y. If there is no movie chain from x to y, we define md(x,y) = infinity.
The Kevin Bacon number of an actor x is the movie distance from x to Kevin Bacon.
Some consequences are:
To solve the Kevin Bacon game (or any other similar game based on another actor) we use graphs. Specifically, create a graph in which both actors and movies are vertices, and insert an edge whenever an actor is in a movie. Thus each edge has an actor for one vertex and a movie for the other.
A graph is said to be bipartite if the vertices can be colored with two colors, say red and blue, such that each edge has different colored vertices, that is, each edge goes between a blue vertex and a red vertex. Clearly the movie-actor graph is bipartite, with actors colored blue and movies colored red.
The following result is proved in discrete math courses and most books on graph theory:
Theorem. In a bipartite graph, a path whose ends have the same color has an even number of edges.
As a consequence, any path from one actor to another in the movie-actor graph has an even number of edges. If P is such a path, with length n, then n is even and n/2 is the number of movies passed through by P. If P is a shortest path from actor x to actor y, then n/2 is the movie distance from x to y. (Note, in passing, that the path P has an odd number of vertices.)
Thus to solve the Kevin Bacon game, we perform a Breadth-First survey from Kevin Bacon. The Breadth First Search Tree rooted at Kevin Bacon consists of shortest paths from Kevin Bacon to all other actors who have a finite Kevin Bacon number. Dividing the length of such a path by 2 yields the Kevin Bacon number for the actor at the other end of the path.
In practical terms, we start at an actor x and follow the parent vertices of the BFS tree back to Kevin Bacon, counting the steps. Then divide this count by 2 to get the number.
Note that the path itself provides documentation in the form of a list starting with x and then listing movie | actor in pairs until we are back to Kevin Bacon.
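The procedure described above can be sketched with standard containers. This is only an illustration: the names AdjList, BfsParents, and MovieDistanceFromParents are hypothetical, and the adjacency-list vector stands in for the course graph and survey classes, which the project itself is expected to use.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for the course graph: adjacency lists indexed by vertex.
using AdjList = std::vector<std::vector<std::size_t>>;

// Breadth-first search from base. Returns parent[v] for every vertex reached.
// Unreached vertices, and base itself, keep the sentinel value g.size().
std::vector<std::size_t> BfsParents(const AdjList& g, std::size_t base)
{
  const std::size_t none = g.size();
  std::vector<std::size_t> parent(g.size(), none);
  std::vector<bool> seen(g.size(), false);
  std::queue<std::size_t> q;
  seen[base] = true;
  q.push(base);
  while (!q.empty())
  {
    const std::size_t v = q.front();
    q.pop();
    for (std::size_t w : g[v])
    {
      if (!seen[w])
      {
        seen[w] = true;
        parent[w] = v;
        q.push(w);
      }
    }
  }
  return parent;
}

// Counts edges from actor x back to base along parent links, then halves.
// Assumes x is an actor vertex that was reached by the search from base.
long MovieDistanceFromParents(const std::vector<std::size_t>& parent,
                              std::size_t base, std::size_t x)
{
  long edges = 0;
  for (std::size_t v = x; v != base; v = parent[v])
  {
    ++edges;
  }
  return edges / 2;
}
```

For a chain actor 0, movie 1, actor 2, movie 3, actor 4 with base 0, the walk from vertex 4 crosses four edges, so the movie distance is 2.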
The official development | testing | assessment environment is g++47 -std=c++11 -Wall -Wextra on the linprog machines. Code should compile without error or warning.
Copy all files from LIB/proj5, including:
kb.cpp                  # client program plays Kevin Bacon game
line.cpp                # contains implementation of Line()
movies.txt              # movie DB
movies_abbreviated.txt
deliverables.sh         # submission specs
Copy the submit script LIB/scripts/submit.sh into your project directory, change its permissions to executable, and submit the project by executing the script.
Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.
MovieMatch should, at a minimum, provide services required by kb.cpp. This will require the following (partial) class definition:
// types used
typedef uint32_t                          Vertex;
typedef fsu::String                       Name;
typedef fsu::ALUGraph <Vertex>            Graph;
typedef fsu::BFSurvey <Graph>             BFS;
typedef hashclass::KISS<Name>             Hash;
typedef fsu::HashTable<Name,Vertex,Hash>  AA; // associative array
typedef fsu::Vector<Name>                 Vector;

class MovieMatch
{
public:
  MovieMatch ();
  bool Load          (const char* filename);
  bool Init          (const char* actor);
  long MovieDistance (const char* actor);
  void ShowPath      (std::ostream& os) const;
  void ShowStar      (Name name, std::ostream& os) const;
  void Hint          (Name name, std::ostream& os) const;
  void Dump          (std::ostream& os) const;
  ...
};
The underlying graph should be built from the "database" provided in the text file movies.txt. Each line of this file represents a movie and the actors in the movie. Forward slash '/' is used to delimit the strings representing movie titles and actor names in each line.
The following helper function makes reading a movie DB file somewhat straightforward:
private:
  static fsu::Vector<Name> Line(std::ifstream& is)
  {
    fsu::String line;
    char delim = '/';
    line.GetLine(is);
    char* name_buffer = new char [1+line.Size()];
    size_t pos = 0, next = 0;
    fsu::Vector<Name> movie;
    while (pos < line.Size())
    {
      next = line.Position(delim,pos);
      for (size_t i = pos; i < next; ++i)
      {
        name_buffer[i - pos]= line[i];
      }
      name_buffer[next-pos] = '\0';
      movie.PushBack(name_buffer);
      pos = ++next; // skip delimiter
    }
    delete [] name_buffer;
    return movie;
  }
  ...
This function consumes a line of text from the stream and returns a vector whose elements are the names delimited by '/'. The first element is then a movie title and all other elements are actors in that movie. Note that you are not required to use this - it can be optimized away - but it can be helpful in a draft to postpone read issues until the main functionality is built.
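For comparison, the same splitting idea can be sketched with the standard library alone. SplitLine is a hypothetical name and std::string replaces fsu::String here, so this is not a drop-in substitute for Line(), only a way to see the behavior in isolation.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one line from is and splits it on delim.
// Element 0 is the movie title; the remaining elements are actor names.
// Returns an empty vector when no line could be read or the line is empty.
std::vector<std::string> SplitLine(std::istream& is, char delim = '/')
{
  std::vector<std::string> fields;
  std::string line;
  if (!std::getline(is, line))
  {
    return fields;
  }
  std::istringstream ss(line);
  std::string field;
  while (std::getline(ss, field, delim))
  {
    fields.push_back(field);
  }
  return fields;
}
```

Given an illustrative line such as Apollo 13 (1995)/Bacon, Kevin/Hanks, Tom, the result holds three strings, with the title first.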
bool Load (const char* filename)
This method uses the data in the file to build the underlying symbol graph for
the game. The symbol graph consists of these private members:
private:
  ...
  Graph  g_;
  Vector name_;
  AA     vrtx_;
  ...
name_ is a mapping: {vertices} -> {names}, and vrtx_ is a mapping: {names} -> {vertices}. Even though Vector and AA are very different structurally, they perform as mappings in the abstract, each using its bracket operator as function evaluation. These mappings are required to be mutual inverses: for any vertex v, vrtx_[name_[v]] == v, and for any name n, name_[vrtx_[n]] == n.
Load must look at each name encountered (movie or actor) and, if and only if that name has not already been encountered, record it as a new vertex. Then Load must add an edge [a,m] to g_ whenever a is an actor in movie m.
It is advisable to allow Load to read the file twice: First to establish the vertices and the two mappings vrtx_ and name_; and second to insert all of the edges. Function Line() will be handy for these steps.
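A minimal sketch of the two-pass plan follows, assuming the lines have already been split into records. SymbolGraph and BuildSymbolGraph are hypothetical names, and std::vector and std::unordered_map stand in for the course Vector, AA, and Graph types. In this sketch the first pass fixes the vertex count, so the adjacency lists can be sized once before any edge is added.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the three private members g_, name_, and vrtx_.
struct SymbolGraph
{
  std::vector<std::string> name;                      // vertex -> name
  std::unordered_map<std::string, std::size_t> vrtx;  // name -> vertex
  std::vector<std::vector<std::size_t>> adj;          // adjacency lists
};

// Each record is one split line: record[0] is the movie, the rest are actors.
SymbolGraph BuildSymbolGraph(const std::vector<std::vector<std::string>>& records)
{
  SymbolGraph sg;
  // Pass 1: give every previously unseen name the next vertex number.
  for (const auto& record : records)
  {
    for (const auto& n : record)
    {
      if (sg.vrtx.find(n) == sg.vrtx.end())
      {
        sg.vrtx.emplace(n, sg.name.size());
        sg.name.push_back(n);
      }
    }
  }
  // Pass 2: the vertex count is known, so size the lists and add the edges.
  sg.adj.resize(sg.name.size());
  for (const auto& record : records)
  {
    if (record.empty()) continue;
    const std::size_t m = sg.vrtx.at(record[0]);
    for (std::size_t i = 1; i < record.size(); ++i)
    {
      const std::size_t a = sg.vrtx.at(record[i]);
      sg.adj[a].push_back(m);  // undirected edge [a,m]
      sg.adj[m].push_back(a);
    }
  }
  return sg;
}
```

Because a name is numbered only on its first appearance, the two mappings come out as mutual inverses by construction.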
bool Init(const char* actor)
This method establishes actor as the base actor in the game (i.e., the "Kevin
Bacon") and performs a BFS from the base actor in the graph. This BFS
searches only from the base_actor vertex (not a full survey) and records
all the parent info for use later during game play. The BFS survey data is
thus required to be persistent, so it is maintained as a BFSurvey object:
private: ... BFS bfs_; ...
long MovieDistance (const char* actor)
This method uses the pre-computed BFS tree to (1) determine whether actor is in
the DB and retrieve its vertex if so (if
(vrtx_.Retrieve(actor,v))), (2) determine whether actor is reachable
from the base actor (if (bfs_.Color()[v] == 'b')). In that case, it (3) computes
the path from actor to base actor (storing the path as it goes) and returns
the movie distance. The path is stored in the class member
private: ... fsu::List<Vertex> path_; ...
MovieDistance returns -3 when the entered name is not in the DB, -2 when the name is not reachable from the base actor, -1 when the name entered is a movie (not an actor), and otherwise the movie distance between actor and base_actor.
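The return-code logic can be sketched as a free function over plain containers. Everything here is illustrative: MovieDistanceSketch is a hypothetical name, the color and parent vectors stand in for the data held by the BFSurvey object, and detecting a movie by the parity of the path length is an assumption of this sketch (it follows from the bipartite theorem above, but the assignment does not prescribe how to detect a movie).

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// color[v] == 'b' is taken to mean "reached by the search from base".
// parent[v] is the BFS parent of v; it is never read for base itself.
long MovieDistanceSketch(const std::unordered_map<std::string, std::size_t>& vrtx,
                         const std::vector<char>& color,
                         const std::vector<std::size_t>& parent,
                         std::size_t base,
                         const std::string& name,
                         std::list<std::size_t>& path)
{
  path.clear();
  const auto it = vrtx.find(name);
  if (it == vrtx.end()) return -3;   // name is not in the DB
  std::size_t v = it->second;
  if (color[v] != 'b') return -2;    // not reachable from the base actor
  path.push_back(v);
  while (v != base)                  // follow parents back to the base
  {
    v = parent[v];
    path.push_back(v);
  }
  const long edges = static_cast<long>(path.size()) - 1;
  if (edges % 2 != 0) return -1;     // odd edge count: the name is a movie
  return edges / 2;                  // two edges per movie passed through
}
```

The stored path runs from the queried name to the base actor, which is the order needed to print the chain afterwards.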
void ShowPath (std::ostream& os) const
This method outputs the entire path as an actor-movie chain connecting actor_ to
baseActor_. This is used to document the movie distance number. See area51/kb.x
for suggested behavior.
void ShowStar (Name name, std::ostream& os) const
This method outputs the name stored in actor_ (which might be a movie...)
followed by the names of all vertices that are adjacent to actor_ in the
graph. This is implemented using an AdjIterator:
Graph::AdjIterator i; // Graph is the typedef for fsu::ALUGraph<Vertex>
See area51/kb.x for suggested behavior.
void Hint (Name name, std::ostream& os) const
This method provides hints intended to be helpful when a name is not found in
the DB. See area51/kb.x for one idea on behavior.
void Dump (std::ostream& os) const
This method, as expected, depicts the internal structure of the MovieMatch
objects. The demonstration program area51/kb.x uses this implementation:
void Dump(std::ostream& os)
{
  ShowAL(g_,os);
  WriteData(bfs_,os);
  vrtx_.Dump(os);
  for (size_t i = 0; i < name_.Size(); ++i)
  {
    os << "name_[" << i << "] = " << name_[i] << '\t';
    os << "vrtx_[" << name_[i] << "] = " << vrtx_[name_[i]] << '\n';
  }
  vrtx_.Analysis(std::cout);
}
ShowAL and WriteData are in graph_util.h and survey_util.h, respectively. vrtx_.Dump() and vrtx_.Analysis() are calls to the HashTable API. The for loop shows the two mappings. Every one of these has proved helpful tracking down a bug!
kb.cpp
This client program is supplied. Note that it utilizes the entire API discussed
above. The program #includes source code for all helpers in the library, so
it can be compiled with one call to g++.
It is highly recommended to construct some tiny fake movie files. Spend a few minutes creating these to model specific cases of graph structure, accessibility, and redundancy. Keep a hand drawing of the symbol graphs for these examples so that the Dump output can be hand-traced. Note that Dump is called by kb.cpp when there is a third command line argument:
kb.x m_test.1 name    # runs kb.x with DB = m_test.1 and base actor = name
kb.x m_test.1 name y  # same as above, with a call to Dump after Load and Init
It is also advisable to read the source code kb.cpp to understand what it is asking your MovieMatch object to do.
When you need a string with blanks in it to be read as a single command line argument, enclose it in single quotes:
kb.x movies.txt 'Bacon, Kevin' # runs kb.x with base actor = 'Bacon, Kevin'
Here is a partial list of technologies used in this project:
- graphs
- graph search & survey
- path computation in graphs
- associative arrays [hash tables]
- generic optimized quicksort
- generic binary search
(The last two are optionally used by Hint().)
Be careful to keep in mind the dual personality of the AA bracket operator: aa[key] behaves as "insert key" when key is not in the table. In a const environment you are protected. The const bracket operator will be called and fail if you accidentally use it in insert mode. You can use the const method HashTable::Retrieve to probe whether a name is already a key in the AA. Otherwise, it is advised to use the AA bracket operator for readability.
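The same dual behavior exists in std::unordered_map, which makes for a compact illustration. This is an analogy only: Table, Probe, and BracketLookup are hypothetical names, and the course HashTable has its own interface (Retrieve and the bracket operator).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

using Table = std::unordered_map<std::string, std::size_t>;

// Read-only probe: never changes the table.
// Returns true and sets v when key is present.
bool Probe(const Table& t, const std::string& key, std::size_t& v)
{
  const auto it = t.find(key);
  if (it == t.end()) return false;
  v = it->second;
  return true;
}

// Bracket lookup: silently inserts key (with value 0) when it is absent.
std::size_t BracketLookup(Table& t, const std::string& key)
{
  return t[key];
}
```

Probing for a missing name leaves the table size unchanged, while a bracket lookup of the same name grows the table by one entry.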
Load time can be an issue. kb.cpp has a built-in timer for the load operation, and we'll run a contest of sorts on this measure. In designing your Load function, be aware of runtime in every step of the plan. There are a lot of places where choosing one direction over another can have a dramatic effect on Load time.
The supplied executable area51/kb.x requires about 0.41 seconds to load movies_abbreviated.txt (190 movies and 10,190 actors) and 4.25 seconds to load movies.txt (4,188 movies and 115,241 actors). Note that the two ratios 4.25/0.41 and (4188 + 115241)/(190+10190) are approximately equal, informally indicating linear runtime growth.
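Working the two ratios out from the figures quoted above:

```latex
\frac{4.25}{0.41} \approx 10.4,
\qquad
\frac{4188 + 115241}{190 + 10190} = \frac{119429}{10380} \approx 11.5
```

The load time grows by roughly the same factor as the number of names, which is what linear growth predicts.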
Nevertheless there are aspects to the Load process (as implemented for kb.x) that can be optimized to reduce the load time.
The implementation of Hint that is illustrated in area51/kb.x uses yet another item we have worked on: g_quick_sort_3w_opt. This generic algorithm is used to sort a vector hint_ which is built during the first read loop and consists of all names (actor and movie names). This sort is done after the graph has been established. (We do the sort as part of Init so it doesn't add to Load runtime.)
Once hint_ is sorted, the generic binary search algorithms can be used to locate small ranges in the vector surrounding an input name.
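As a rough sketch of the idea, using std::lower_bound in place of the course generic binary search: locate where the input name would sit in the sorted vector and return a few entries on either side. Neighbors and its radius parameter are hypothetical, and area51/kb.x may choose its range differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// sorted must already be in ascending order.
// Returns up to radius names on each side of the position where name would
// be inserted, which includes an exact match when there is one.
std::vector<std::string> Neighbors(const std::vector<std::string>& sorted,
                                   const std::string& name,
                                   std::size_t radius)
{
  const std::size_t pos = static_cast<std::size_t>(
      std::lower_bound(sorted.begin(), sorted.end(), name) - sorted.begin());
  const std::size_t lo = pos > radius ? pos - radius : 0;
  const std::size_t hi = std::min(sorted.size(), pos + radius);
  return std::vector<std::string>(sorted.begin() + lo, sorted.begin() + hi);
}
```

With a small made-up list containing "Wayne, John (I)", a query for "Wayne, John" lands immediately before that entry, so the official spelling appears in the returned range.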
Hint() is needed because it is difficult to recall the exact name of an actor. For example, "Wayne, John" is not found in the DB ... Huh? ... ok, the hint shows us he is officially "Wayne, John (I)".
Note BTW that you can mouse-select an entire line of Hint output on screen and "paste selection" will pipe the selection directly into input for a running kb.x.
Aside from bragging rights for "best load time" (self-reported on Blackboard), style points can be awarded for "intuitive hints" (AI anyone?) and for finding an actor with maximal Kevin Bacon number (again, self-report to Blackboard).
The Kevin Bacon number of an actor using movies_abbreviated.txt is not necessarily the actual Kevin Bacon number. (Explain this.)