Project 4: Degrees of Separation

Implementing the Kevin Bacon game

Version 10/20/16

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Operational Objectives: Design and implement the class MovieMatch

Deliverables: Files:

moviematch.h
log.txt

Movie Distance and Kevin Bacon

The Kevin Bacon game is this: given an actor by name, what is his/her Kevin Bacon number?

To solve this we first need a clear definition of the Kevin Bacon number for an actor, or more generally, the movie distance between two actors. The definition is much like the path distance between two vertices in a graph, except using movie chains instead of edges.

A movie chain from actor x to actor y is a sequence of movies m1 m2 ... mk such that

  1. mj and mj+1 have an actor in common for 0 < j < k
  2. x is in movie m1
  3. y is in movie mk

The movie distance md(x,y) is defined to be the number of movies in a shortest movie chain from x to y. If there is no movie chain from x to y, we define md(x,y) = infinity.

The Kevin Bacon number of an actor x is the movie distance from x to Kevin Bacon.

Some consequences are:

  1. Kevin Bacon has Kevin Bacon number 0.
  2. In general, md(x,x) = 0 for any actor x.
  3. All other actors have Kevin Bacon number at least 1.
  4. In general, if x != y and x and y are actors in the same movie, then md(x,y) = 1
  5. Movie distance satisfies the triangle inequality: md(x,z) <= md(x,y) + md(y,z)

The actor-movie graph

To solve the Kevin Bacon game (or any other similar game based on another actor) we use graphs. Specifically, create a graph in which both actors and movies are vertices, and insert an edge whenever an actor is in a movie. Thus each edge has an actor for one vertex and a movie for the other.

A graph is said to be bipartite if the vertices can be colored with two colors, say red and blue, such that each edge has different colored vertices, that is, each edge goes between a blue vertex and a red vertex. Clearly the movie-actor graph is bipartite, with actors colored blue and movies colored red.

The following result is proved in discrete math courses and most books on graph theory:

Theorem. In a bipartite graph, a path whose ends have the same color has an even number of edges.

As a consequence, any path from one actor to another in the movie-actor graph has an even number of edges. If P is such a path, with length n, then n is even and n/2 is the number of movies passed through by P. If P is a shortest path from actor x to actor y, then n/2 is the movie distance from x to y. (Note, in passing, that the path P has an odd number of vertices.)

Thus to solve the Kevin Bacon game, we perform a breadth-first search from Kevin Bacon. The breadth-first search tree rooted at Kevin Bacon consists of shortest paths from Kevin Bacon to all other actors who have a finite Kevin Bacon number. Dividing the length of such a path by 2 yields the Kevin Bacon number for the actor at the other end of the path.

In practical terms, we start at an actor x and follow the parent vertices of the BFS tree back to Kevin Bacon, counting the steps. Dividing this count by 2 gives the Kevin Bacon number of x.

Note that the path itself provides documentation in the form of a list starting with x and then listing movie | actor in pairs until we are back to Kevin Bacon.
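The search-and-halve procedure described above can be sketched with standard containers only. This is an illustration under stated assumptions, not the fsu::BFSurvey interface: the names AdjList, AddEdge, BfsParents, MovieDistanceFromParents, and the NONE sentinel are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for the graph: adj[v] lists the neighbors of vertex v.
using AdjList = std::vector<std::vector<std::size_t>>;

// Hypothetical marker meaning "no parent recorded" (vertex not reached).
const std::size_t NONE = static_cast<std::size_t>(-1);

// Inserts the undirected edge [a,b].
void AddEdge(AdjList& adj, std::size_t a, std::size_t b)
{
  adj[a].push_back(b);
  adj[b].push_back(a);
}

// Breadth-first search from start only. In the result, parent[start] == start,
// parent[v] is the predecessor of v on a shortest path from start, and
// parent[v] == NONE for every vertex not reachable from start.
std::vector<std::size_t> BfsParents(const AdjList& adj, std::size_t start)
{
  assert(start < adj.size());
  std::vector<std::size_t> parent(adj.size(), NONE);
  std::queue<std::size_t> q;
  parent[start] = start;
  q.push(start);
  while (!q.empty())
  {
    std::size_t v = q.front();
    q.pop();
    for (std::size_t w : adj[v])
    {
      if (parent[w] == NONE) // first visit: record the tree edge
      {
        parent[w] = v;
        q.push(w);
      }
    }
  }
  return parent;
}

// Follows parent links from actor vertex v back to the root, counting edges,
// then halves the count. Returns -2 (a sentinel standing in for infinity)
// when v was not reached by the search.
long MovieDistanceFromParents(const std::vector<std::size_t>& parent, std::size_t v)
{
  assert(v < parent.size());
  if (parent[v] == NONE) return -2;
  long edges = 0;
  while (parent[v] != v)
  {
    v = parent[v];
    ++edges;
  }
  return edges / 2;
}
```

Because the search runs once from the base actor, every later query costs only a walk up the tree, which is why the parent information needs to persist between queries.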

Procedural Requirements

  1. The official development | testing | assessment environment is g++ -std=c++11 -Wall -Wextra on the linprog machines. Code should compile without error or warning.

  2. Maintain your work log in the text file log.txt as documentation of effort, testing results, and development history. This file may also be used to report on any relevant issues encountered during project development.

  3. Copy all files from LIB/proj4, including:

    kb.cpp          # client program plays Kevin Bacon game
    line.cpp        # contains implementation of Line()
    movies.txt      # movie DB
    movies_abbreviated.txt # smaller version for debugging and optimizing
    deliverables.sh # submit configuration file
    
  4. When logged in to shell or quake, submit the project by executing "submit.sh deliverables.sh". Read the screen and watch for processing errors.

    Warning: The submit process does not work on the program and linprog servers. Use shell or quake to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications - MovieMatch

  1. MovieMatch should, at a minimum, provide services required by kb.cpp. This will require the following (partial) class definition:

    // types used
    typedef uint32_t                           Vertex;
    typedef fsu::String                        Name;
    typedef fsu::ALUGraph <Vertex>             Graph;
    typedef fsu::BFSurvey <Graph>              BFS;
    typedef hashclass::KISS<Name>              Hash;
    typedef fsu::HashTable<Name,Vertex,Hash>   AA; // associative array
    typedef fsu::Vector<Name>                  Vector;
    
    class MovieMatch
    {
    public:
           MovieMatch    ();
      bool Load          (const char* filename);
      bool Init          (const char* actor);
      long MovieDistance (const char* actor);
      void ShowPath      (std::ostream& os) const;
      void ShowStar      (Name name, std::ostream& os) const;
      void Hint          (Name name, std::ostream& os) const;
      void Dump          (std::ostream& os) const;
      ...
    };
    

  2. The underlying graph should be built from the "database" provided in the text file movies.txt. Each line of this file represents a movie and the actors in the movie. Forward slash '/' is used to delimit the strings representing movie titles and actor names in each line.

  3. The following helper function makes reading a movie DB file somewhat straightforward:

    private:
      static fsu::Vector<Name> Line(std::ifstream& is)
      {
        fsu::String line;
        char delim = '/';
        line.GetLine(is);
        char* name_buffer = new char [1+line.Size()];
        size_t pos = 0, next = 0;
        fsu::Vector<Name> movie;
        while (pos < line.Size())
        {
          next = line.Position(delim,pos);
          for (size_t i = pos; i < next; ++i)
          {
            name_buffer[i - pos]= line[i];
          }
          name_buffer[next-pos] = '\0';
          movie.PushBack(name_buffer);
          pos = ++next; // skip delimiter
        }
        delete [] name_buffer;
        return movie;
      }
    ...
    
    

    This function consumes a line of text from the stream and returns a vector whose elements are the names delimited by '/' in the file. The first element is then a movie title and all other elements are actors in that movie. Note that you are not required to use this - it can be optimized away - but it can be helpful in a draft to postpone read issues until the main functionality is built.
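For comparison, the same splitting logic can be sketched with std::string alone. This is only an illustration of the delimiter handling, not a replacement for the supplied Line(); the function name SplitLine is hypothetical, and the project itself is specified in terms of fsu::String and fsu::Vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits one line of text into the fields separated by delim.
// Like Line(), it stops once the scan position reaches the end of the
// line, so a trailing delimiter does not produce an empty last field.
std::vector<std::string> SplitLine(const std::string& line, char delim = '/')
{
  // The caller is expected to pass a single line (no embedded newline).
  assert(line.find('\n') == std::string::npos);
  std::vector<std::string> fields;
  std::size_t pos = 0;
  while (pos < line.size())
  {
    std::size_t next = line.find(delim, pos);
    if (next == std::string::npos)
    {
      next = line.size(); // last field runs to the end of the line
    }
    fields.push_back(line.substr(pos, next - pos));
    pos = next + 1; // skip delimiter
  }
  return fields;
}
```

Applied to a made-up line such as "Movie A (2000)/Actor One/Actor Two", the first field is the movie title and the remaining fields are the actors.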

  4. bool Load (const char* filename)
    This method uses the data in the file to build the underlying symbol graph for the game. The symbol graph consists of these private members:

    private:
      ...
      Graph  g_;
      Vector name_;
      AA     vrtx_;
      ...
    

    name_ is a mapping: {vertices} -> {names}, and vrtx_ is a mapping: {names} -> {vertices}. Even though Vector and AA are very different structurally, they perform as mappings in the abstract, each using its bracket operator as function evaluation. These mappings are required to be mutual inverses: for any vertex v, vrtx_[name_[v]] == v, and for any name n, name_[vrtx_[n]] == n.

    Load must look at each name encountered (movie or actor) and, if and only if that name has not already been encountered, record it as a new vertex. Then Load must add an edge [a,m] to g_ whenever a is an actor in movie m.

    It is advisable to allow Load to read the file twice: First to establish the vertices and the two mappings vrtx_ and name_; and second to insert all of the edges. Function Line() will be handy for these steps.
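The two-pass idea can be sketched as follows, using standard containers in place of the fsu types. Everything named here (SymbolGraph, BuildSymbolGraph, and the in-memory records argument standing in for the file) is a hypothetical illustration, not the required design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the three private members g_, name_, vrtx_.
struct SymbolGraph
{
  std::vector<std::string> name;                     // vertex -> name
  std::unordered_map<std::string, std::size_t> vrtx; // name -> vertex
  std::vector<std::vector<std::size_t>> adj;         // adjacency lists
};

// Each record is one parsed DB line: record[0] is the movie title and the
// remaining entries are the actors in that movie.
SymbolGraph BuildSymbolGraph(const std::vector<std::vector<std::string>>& records)
{
  SymbolGraph sg;
  // Pass 1: assign a new vertex number to each name not seen before,
  // updating both mappings together so they stay mutual inverses.
  for (const auto& record : records)
  {
    for (const auto& n : record)
    {
      if (sg.vrtx.find(n) == sg.vrtx.end())
      {
        sg.vrtx[n] = sg.name.size();
        sg.name.push_back(n);
      }
    }
  }
  assert(sg.vrtx.size() == sg.name.size());
  // Pass 2: the vertex count is now known, so size the graph and
  // insert an edge between each movie and each of its actors.
  sg.adj.resize(sg.name.size());
  for (const auto& record : records)
  {
    if (record.empty()) continue;
    std::size_t m = sg.vrtx[record[0]];
    for (std::size_t i = 1; i < record.size(); ++i)
    {
      std::size_t a = sg.vrtx[record[i]];
      sg.adj[a].push_back(m);
      sg.adj[m].push_back(a);
    }
  }
  return sg;
}
```

Separating the passes means the number of vertices is fixed before any edge is inserted, which is convenient when the graph must be sized up front.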

  5. bool Init (const char* actor)
    This method establishes actor as the base actor in the game (i.e., the "Kevin Bacon") and performs a BFS from the base actor in the graph. This BFS searches only from the base_actor vertex (not a full survey) and records all the parent info for use later during game play. The BFS survey data is thus required to be persistent, so it is maintained as a BFSurvey object:

    private:
      ...
      BFS bfs_;
      ...
    

  6. long MovieDistance (const char* actor)
    This method uses the pre-computed BFS tree to (1) determine whether actor is in the DB, retrieving its vertex if so (if (vrtx_.Retrieve(actor,v))), and (2) determine whether actor is reachable from the base actor (if (bfs_.Color()[v] == 'b')). If both tests succeed, it (3) computes the path from actor to the base actor (storing the path as it goes) and returns the movie distance. The path is stored in the class member

    private:
      ...
      fsu::List<Vertex>  path_;
      ...
    

    MovieDistance returns -3 when the entered name is not in the DB, -2 when the name is not reachable from the base actor, -1 when the name entered is a movie (not an actor), and otherwise the movie distance between actor and base_actor.
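The return-code logic can be sketched as below, again with standard containers rather than the fsu types. The function name MovieDistanceSketch, the NONE sentinel, and the parent-array representation of the BFS tree are assumptions of this sketch. It also assumes one particular way of recognizing a movie: since the graph is bipartite, a vertex at an odd number of edges from the base actor must be a movie. Other ways of making that test are equally possible.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical marker for "not reached by the BFS".
const std::size_t NONE = static_cast<std::size_t>(-1);

// vrtx maps names to vertices; parent holds the BFS tree rooted at the base
// actor (parent[root] == root, parent[v] == NONE if v was not reached).
// On success, path holds the vertices from the named actor back to the root.
long MovieDistanceSketch(const std::unordered_map<std::string, std::size_t>& vrtx,
                         const std::vector<std::size_t>& parent,
                         const std::string& name,
                         std::list<std::size_t>& path)
{
  path.clear();
  auto it = vrtx.find(name);
  if (it == vrtx.end()) return -3;   // name is not in the DB
  std::size_t v = it->second;
  assert(v < parent.size());
  if (parent[v] == NONE) return -2;  // not reachable from the base actor
  path.push_back(v);
  while (parent[v] != v)             // climb the tree, storing the path
  {
    v = parent[v];
    path.push_back(v);
  }
  long edges = static_cast<long>(path.size()) - 1;
  if (edges % 2 != 0)                // odd distance from an actor: a movie
  {
    path.clear();
    return -1;
  }
  return edges / 2;
}
```

With this parity test, a movie that is not reachable from the base actor is reported as -2 rather than -1, because its distance is never computed.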

  7. void ShowPath (std::ostream& os) const
    This method outputs the entire path as an actor-movie chain connecting actor_ to baseActor_. This is used to document the movie distance number. See area51/kb.x for suggested behavior.

  8. void ShowStar (Name name, std::ostream& os) const
    This method outputs name (which might be a movie...) followed by the names of all vertices that are adjacent to name in the graph. This is implemented using an AdjIterator:

        Graph::AdjIterator i;
    

    Note that if name is an actor the star is a list of all movies in which the actor appears. If name is a movie, the star is a list of all actors in that movie. See area51/kb.x for suggested behavior.
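The shape of the traversal can be sketched with plain adjacency lists. This does not use the AdjIterator interface, and the output layout (name on one line, each neighbor indented below it) is invented for the sketch; area51/kb.x remains the reference for actual behavior. The names ShowStarSketch and StarAsString are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes the name of vertex v followed by the names of all its neighbors.
// name maps vertices to names; adj[v] lists the neighbors of v.
void ShowStarSketch(const std::vector<std::string>& name,
                    const std::vector<std::vector<std::size_t>>& adj,
                    std::size_t v, std::ostream& os)
{
  assert(adj.size() == name.size() && v < adj.size());
  os << name[v] << '\n';
  for (std::size_t w : adj[v]) // movies of an actor, or actors of a movie
  {
    os << "  " << name[w] << '\n';
  }
}

// Convenience wrapper that captures the same output in a string.
std::string StarAsString(const std::vector<std::string>& name,
                         const std::vector<std::vector<std::size_t>>& adj,
                         std::size_t v)
{
  std::ostringstream oss;
  ShowStarSketch(name, adj, v, oss);
  return oss.str();
}
```

The same loop serves both cases because the graph stores actors and movies uniformly as vertices; only the interpretation of the neighbor list changes.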

  9. void Hint (Name name, std::ostream& os) const
    This method provides hints intended to be helpful when a name is not found in the DB. See area51/kb.x for one idea on behavior.

  10. void Dump (std::ostream& os) const
    This method, as expected, depicts the internal structure of the MovieMatch object. The demonstration program area51/kb.x uses this implementation:

      void Dump(std::ostream& os)
      {
        ShowAL(g_,os);
        WriteData(bfs_,os);
        vrtx_.Dump(os);
        for (size_t i = 0; i < name_.Size(); ++i)
        {
          os << "name_[" << i << "] = " << name_[i] << '\t';
          os << "vrtx_[" << name_[i] << "] = " << vrtx_[name_[i]] << '\n';
        }
        vrtx_.Analysis(os);
      }
    

    ShowAL and WriteData are in graph_util.h and survey_util.h, respectively. vrtx_.Dump() and vrtx_.Analysis() are calls to the HashTable API. The for loop shows the two mappings. Every one of these has proved helpful tracking down a bug!

  11. kb.cpp
    This client program is supplied. Note that it utilizes the entire API discussed above. The program #includes source code for all helpers in the library, so it can be compiled with one call to g++.

Hints