Project 5: Kevin Bacon

Final Algorithms Project

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Operational Objectives: Design and implement the following classes:

  1. Symbol Graph
  2. Movie Match Game

You may work in teams of 2 or 3 people. The team should compose a brief summary of work that explains the responsibilities and work products of each team member. Each team member should also submit the project individually, and the submissions from all members of a team must be identical.

Deliverables: Files:

readme.txt     
symgraph.h     
moviematch.h   
kevinbacon.cpp # your client, may be same as kb.cpp or an elaboration
makefile       # builds all executables in project, including tests

Procedural Requirements

  1. The official development | testing | assessment environment is g++47 -std=c++11 -Wall -Wextra on the linprog machines. Code should compile without error or warning.

  2. Each member of a team submits all team deliverables.

  3. Deliverables submitted should be identical across all team members.

  4. The team makeup is listed in the file header documentation of each submitted file (see C++ Style link for standards).

  5. File readme.txt explains how the software was developed, what responsibilities each team member had, how it was tested, and how it is expected to be operated.

  6. Copy all files from LIB/proj5, including:

    kb.cpp # sample client program for MovieMatch
    movies.txt
    movies_abbreviated.txt
    
  7. Copy the file LIB/proj5/proj5submit.sh into your project directory, change its permissions to executable, and submit the project by executing the script.

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications - SymbolGraph

  1. Class SymbolGraph implements a graph whose vertices are symbols (typically strings). The API is largely the same as that of the abstract graph classes discussed in Project 4, with the additional ability to grow the vertex set "on the fly" using the Push() operation.

    namespace fsu
    {
      template < typename S , typename N >
      class SymbolGraph
      {
      public:
        typedef S      Vertex;
        typedef xxxxx  AdjIterator;
    
        void   SetVrtxSize  (N n);
        void   AddEdge      (Vertex from, Vertex to);
        size_t VrtxSize     () const;
        size_t EdgeSize     () const;
        size_t OutDegree    (Vertex x) const;
        size_t InDegree     (Vertex x) const;
        AdjIterator Begin   (Vertex x) const;
        AdjIterator End     (Vertex x) const;
    
        void   Push         (const S& s); // add s to the vertex set
    
        // access to underlying data
        const ALUGraph<N>&      GetAbstractGraph() const; // reference to g_
        const HashTable<S,N,H>& GetSymbolMap() const; // reference to s2n_
        const Vector<S>&        GetVertexMap() const; // reference to n2s_
    
        SymbolGraph ( );
        SymbolGraph ( N n );
        ...
      private:
        ALUGraph<N>      g_;
        HashTable<S,N,H> s2n_;
        Vector<S>        n2s_;
        ...
      };
    } // namespace fsu
    

    where xxxxx is the adjacency iterator type. There is a directed version SymbolDirectedGraph<S,N> whose implementation is almost identical to the undirected case, except that it uses ALDGraph<N> as the underpinning abstract graph type.

  2. The template arguments are S = SymbolType and N = IntegerType. S is the type for the names of vertices, and is typically some form of string. N is the parameter to instantiate the underpinning abstract graph.

  3. s2n_ is an associative array, or mapping, from symbols to vertices in the abstract graph g_. n2s_ is the inverse mapping from vertices in g_ to symbols. The symbol graph uses the two mappings to translate symbols to abstract vertices and calls operations in the abstract graph.
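
The two-way translation described above can be sketched as follows. This is a minimal illustration, not the required implementation: SymbolGraphSketch and Symbol() are hypothetical names, std::unordered_map and std::vector stand in for fsu::HashTable and fsu::Vector, and a plain adjacency list stands in for ALUGraph<N>. Having AddEdge() call Push() for unseen symbols is a choice made for the sketch, not something the specification requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for fsu::SymbolGraph<S,N>. Standard containers are
// assumed here in place of the fsu library classes named in the assignment.
template <typename S, typename N>
class SymbolGraphSketch
{
public:
  // Add s to the vertex set; the next unused index becomes its abstract vertex.
  void Push(const S& s)
  {
    if (s2n_.find(s) != s2n_.end()) return;  // already a vertex
    s2n_[s] = static_cast<N>(n2s_.size());   // symbol -> abstract vertex
    n2s_.push_back(s);                       // abstract vertex -> symbol
    adj_.emplace_back();                     // empty adjacency list
  }

  // Translate both symbols to abstract vertices, then add an undirected edge.
  // Assumption of this sketch: unseen symbols are pushed automatically.
  void AddEdge(const S& from, const S& to)
  {
    Push(from);
    Push(to);
    N u = s2n_[from];
    N v = s2n_[to];
    adj_[u].push_back(v);
    adj_[v].push_back(u);
    ++edges_;
  }

  std::size_t VrtxSize() const { return n2s_.size(); }
  std::size_t EdgeSize() const { return edges_; }

  std::size_t OutDegree(const S& x) const
  {
    auto it = s2n_.find(x);
    assert(it != s2n_.end());  // x must already be a vertex
    return adj_[it->second].size();
  }

  // Inverse mapping: the symbol attached to abstract vertex n.
  const S& Symbol(N n) const { return n2s_[n]; }

private:
  std::unordered_map<S, N>    s2n_;        // symbols to abstract vertices
  std::vector<S>              n2s_;        // abstract vertices to symbols
  std::vector<std::vector<N>> adj_;        // stand-in for the abstract graph g_
  std::size_t                 edges_ = 0;  // number of undirected edges
};
```

Every public operation takes symbols, looks them up in s2n_, and then works only with integer vertices; n2s_ is needed only when a result has to be reported as a symbol again.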

Code Requirements and Specifications - MovieMatch

  1. MovieMatch should, at a minimum, provide services required by kb.cpp. This will require the following (partial) class definition:

    class MovieMatch
    {
    public:
    
      MovieMatch (const char* baseActor) : baseActor_(0)
      {
        size_t length = strlen(baseActor);
        baseActor_ = new char [length + 1];
        baseActor_[length] = '\0';
        strcpy (baseActor_,baseActor);
      }
    
      void Load (const char* filename);
      // loads a movie/actor file
    
      unsigned long MovieDistance(const char* actor);
      // returns the number of movies required to get from actor to baseActor_
    
      ...
    
    private:
      char* baseActor_;
      SymbolGraph < fsu::String , size_t > sg_;
      ...
    };
    

    (The names can be your choice, except for those required by the distributed client program.)

    If you prefer you may build the symbol graph directly in MovieMatch.

  2. The underlying graph should be built from the "database" provided in the text file movies.txt. Each line of this file represents a movie and the actors in the movie. Forward slash '/' is used to delimit the strings representing movie titles and actor names in each line.

  3. It will be helpful to use either the cstring library or std::string to read entire lines and break them up into strings using the '/' delimiter, so that spaces are captured. We will distribute a client program for MovieMatch that illustrates this approach by allowing actor names (with blanks) to be entered through the keyboard.
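
One way to break a line into its '/'-delimited fields with std::string is sketched below. SplitLine is a hypothetical helper name, not part of the required interface; the sketch assumes, as described above, that the first field of a line is the movie title and the remaining fields are actor names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one line of the movie file into its '/'-delimited fields.
// Field 0 is taken to be the movie title; the rest are actor names.
std::vector<std::string> SplitLine(const std::string& line, char delim = '/')
{
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, delim))  // blanks inside a field are kept
    fields.push_back(field);
  return fields;
}
```

A Load() routine could read the file with std::getline(stream, line), call SplitLine(line), and add an edge between the title and each actor name that follows it.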

Movie Distance and Kevin Bacon

The Kevin Bacon game is this: given an actor by name, what is his/her Kevin Bacon number?

To solve this we first need a clear definition of the Kevin Bacon number for an actor, or more generally, the movie distance between two actors. The definition is much like the path distance between two vertices in a graph, except using movie chains instead of edges.

A movie chain from actor x to actor y is a sequence m_1, m_2, ..., m_k such that

  1. m_j and m_(j+1) have an actor in common for 0 < j < k
  2. x is in movie m_1
  3. y is in movie m_k

The movie distance md(x,y) is defined to be the number of movies in a shortest movie chain from x to y. If there is no movie chain from x to y, we define md(x,y) = infinity.

The Kevin Bacon number of an actor x is the movie distance from x to Kevin Bacon.

Some consequences are:

  1. Kevin Bacon has Kevin Bacon number 0.
  2. All other actors have Kevin Bacon number at least 1.
  3. If x != y and x and y are in the same movie, then md(x,y) = 1.
  4. md(x,z) <= md(x,y) + md(y,z).

The actor-movie graph

To solve the Kevin Bacon game (or any other similar game based on another actor) we use graphs. Specifically, create a graph in which both actors and movies are vertices, and insert an edge whenever an actor is in a movie. Thus each edge has an actor for one vertex and a movie for the other.

A graph is said to be bipartite if the vertices can be colored with two colors, say Red and Blue, such that the two ends of each edge have different colors, that is, each edge goes between a blue vertex and a red vertex. Clearly the movie-actor graph is bipartite, with actors colored blue and movies colored red.

The following result is proved in discrete math courses and most books on graph theory:

Theorem. In a bipartite graph, a path whose ends have the same color has an even number of edges.

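As a consequence, any path from one actor to another in the movie-actor graph has an even number of edges. The vertices along such a path alternate between actors and movies, so each movie on the path accounts for exactly two edges: one entering it and one leaving it. If P is such a path, with length n, then n is even and n/2 is the number of movies passed through by P. If P is a shortest path from actor x to actor y, then n/2 is the movie distance from x to y.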

Thus to solve the Kevin Bacon game, we perform a breadth-first survey from Kevin Bacon. The breadth-first tree rooted at Kevin Bacon consists of shortest paths from Kevin Bacon to all other actors who have a finite Kevin Bacon number. Dividing the length of such a path by 2 yields the Kevin Bacon number for the actor at the other end of the path.

In practical terms, we start at an actor x and follow the parent pointers of the BF tree back to Kevin Bacon, counting the steps. Then divide this count by 2 to get the number.
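
The procedure can be sketched as follows. This is an illustration under stated assumptions, not the required design: MovieDistanceSketch is a hypothetical free function, the graph is a plain map from vertex names to neighbor lists rather than a SymbolGraph, and the sketch returns -1 where the definition says infinity (the MovieDistance method required by kb.cpp returns unsigned long, so a real implementation needs its own convention for that case).

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// adj maps each vertex name (actor or movie) to its neighbors in the
// bipartite actor-movie graph. Returns the movie distance from base to
// actor, or -1 if either name is unknown or no path exists.
long MovieDistanceSketch(
    const std::unordered_map<std::string, std::vector<std::string>>& adj,
    const std::string& base, const std::string& actor)
{
  if (adj.find(base) == adj.end() || adj.find(actor) == adj.end()) return -1;

  // Breadth-first search from base, recording the parent of each vertex.
  std::unordered_map<std::string, std::string> parent;
  std::queue<std::string> q;
  parent[base] = base;  // the root is its own parent
  q.push(base);
  while (!q.empty())
  {
    std::string v = q.front();
    q.pop();
    auto it = adj.find(v);
    if (it == adj.end()) continue;  // vertex with no adjacency entry
    for (const std::string& w : it->second)
    {
      if (parent.find(w) == parent.end())  // w not yet discovered
      {
        parent[w] = v;
        q.push(w);
      }
    }
  }

  if (parent.find(actor) == parent.end()) return -1;  // unreachable from base

  // Follow parent pointers back to the root, counting edges.
  long edges = 0;
  for (std::string v = actor; v != base; v = parent[v]) ++edges;
  return edges / 2;  // two edges (actor-movie, movie-actor) per movie
}
```

Because the search does not depend on the actor being queried, a class such as MovieMatch could run it once after loading and keep the parent information, so that each query only walks the parent pointers.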

Hints