Project 4: Movie Match

Final Algorithms Project

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Operational Objectives: Design and implement the following classes:

  1. Breadth First Survey
  2. Depth First Survey (3-teams only)
  3. Symbol Graph
  4. Movie Match Game

Teams may have 2 or 3 members. Each team should write a brief summary of work that explains the responsibilities and work products of each member. Each team member should also submit the project individually; make certain that the submissions of all team members are identical.

Deliverables: Files:

readme.txt     # required for all
bfsurvey.h     # required for all
dfsurvey.h     # required for 3-teams only
symgraph.h     # required for all
moviematch.h   # required for all
makefile       # builds all executables in project, including tests

Procedural Requirements

  1. The official development | testing | assessment environment is gnu g++ on the linprog machines.

  2. Each member of a team submits all team deliverables.

  3. Deliverables submitted should be identical across all team members.

  4. The team makeup is listed in the file header documentation of each submitted file (see C++ Style link for standards).

  5. File readme.txt explains how the software was developed, what responsibilities each team member had, how it was tested, and how it is expected to be operated.

  6. Copy all of the test harnesses and graph files from LIB/proj4:

    fgraph.cpp     # general test for graph classes
    ftopsort.cpp   # another test for directed graphs
    fbfsurvey.cpp  # used for fbfsurvey_ug.cpp and fbfsurvey_dg.cpp
    fdfsurvey.cpp  # used for fdfsurvey_ug.cpp and fdfsurvey_dg.cpp
    KevinBacon.cpp # client program for MovieMatch
    

  7. Copy the file LIB/proj4/proj4submit.sh into your project directory, change its permissions to executable, and submit the project by executing the script.

    Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications - ALGraph

  1. Class ALUGraph implements the adjacency list representation of a graph whose vertices are assumed to be unsigned integers 0,1,...,n-1. The interface should conform to:

    namespace fsu
    {
      template < typename N >
      class ALUGraph
      {
      public:
        typedef N      Vertex;
        typedef xxxxx  AdjIterator;
    
        void   SetVrtxSize  (N n);
        void   AddEdge      (Vertex from, Vertex to);
        size_t VrtxSize     () const;
        size_t EdgeSize     () const;
        size_t OutDegree    (Vertex x) const;
        size_t InDegree     (Vertex x) const;
        AdjIterator Begin   (Vertex x) const;
        AdjIterator End     (Vertex x) const;
    
        ALUGraph ( );
        ALUGraph ( N n );
      ...
      };
    } // namespace fsu
    

    where xxxxx is a type that you define. This is an iterator for the adjacency list, which could be fsu::List<Vertex>::ConstIterator, std::list<Vertex>::const_iterator, or some other type. The directed graph API is exactly the same (but for the name of the class):

    namespace fsu
    {
      template < typename N >
      class ALDGraph
      {
      public:
        typedef N      Vertex;
        typedef xxxxx  AdjIterator;
    
        void   SetVrtxSize  (N n);
        void   AddEdge      (Vertex from, Vertex to);
        size_t VrtxSize     () const;
        size_t EdgeSize     () const;
        size_t OutDegree    (Vertex x) const;
        size_t InDegree     (Vertex x) const;
        AdjIterator Begin   (Vertex x) const;
        AdjIterator End     (Vertex x) const;
    
        ALDGraph ( );
        ALDGraph ( N n );
      ...
      };
    } // namespace fsu
    

    Much of the implementation code for the undirected and directed cases is identical, so it can be profitable to derive one of these from the other. In the derived class, only AddEdge, EdgeSize, and InDegree require re-definition.

  2. Begin(x) returns an AdjIterator, a forward ConstIterator that iterates through the adjacency list of the vertex x. End(x) returns the end iterator of that adjacency list. So, the loop

    for (typename GraphType::AdjIterator i = g.Begin(x); i != g.End(x); ++i)
    {/*   do something at the vertex *i   */}
    

    encounters all of the vertices adjacent from x in the (directed or undirected) graph g.

  3. The template argument is some unsigned integer type. We are using templates mainly as a convenience so that member functions will not be compiled (or even require implementation) if they are not called by client code.

  4. Test graph classes thoroughly using fgraph.cpp.
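
As an illustration of the derivation idea in item 1, here is a minimal sketch, not a reference solution. It assumes std::vector and std::list as the underlying containers, uses a hypothetical namespace sketch instead of fsu, and lets the directed class hide (rather than virtually override) the three functions named above. The protected member name al_ is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

namespace sketch // hypothetical namespace; the assignment itself uses fsu
{
  template < typename N >
  class ALUGraph
  {
  public:
    typedef N                                           Vertex;
    typedef typename std::list<Vertex>::const_iterator  AdjIterator;

    ALUGraph ( ) : al_() {}
    explicit ALUGraph ( N n ) : al_(n) {}

    void SetVrtxSize (N n) { al_.resize(n); }

    void AddEdge (Vertex from, Vertex to) // undirected: store both directions
    {
      assert(from < al_.size() && to < al_.size());
      al_[from].push_back(to);
      al_[to].push_back(from);
    }

    std::size_t VrtxSize () const { return al_.size(); }

    std::size_t EdgeSize () const // each undirected edge is stored twice
    {
      std::size_t total = 0;
      for (std::size_t v = 0; v < al_.size(); ++v) total += al_[v].size();
      return total / 2;
    }

    std::size_t OutDegree (Vertex x) const { return al_[x].size(); }
    std::size_t InDegree  (Vertex x) const { return al_[x].size(); }
    AdjIterator Begin (Vertex x) const { return al_[x].begin(); }
    AdjIterator End   (Vertex x) const { return al_[x].end(); }

  protected:
    std::vector< std::list<Vertex> > al_; // al_[v] = vertices adjacent from v
  };

  template < typename N >
  class ALDGraph : public ALUGraph<N>
  {
  public:
    typedef N                                  Vertex;
    typedef typename ALUGraph<N>::AdjIterator  AdjIterator;

    ALDGraph ( ) : ALUGraph<N>() {}
    explicit ALDGraph ( N n ) : ALUGraph<N>(n) {}

    void AddEdge (Vertex from, Vertex to) // directed: store one direction only
    {
      assert(from < this->al_.size() && to < this->al_.size());
      this->al_[from].push_back(to);
    }

    std::size_t EdgeSize () const // each directed edge is stored once
    {
      std::size_t total = 0;
      for (std::size_t v = 0; v < this->al_.size(); ++v)
        total += this->al_[v].size();
      return total;
    }

    std::size_t InDegree (Vertex x) const // count appearances of x in every list
    {
      std::size_t count = 0;
      for (std::size_t v = 0; v < this->al_.size(); ++v)
        for (AdjIterator i = this->al_[v].begin(); i != this->al_[v].end(); ++i)
          if (*i == x) ++count;
      return count;
    }
  };
} // namespace sketch
```

With this sketch, the loop from item 2 works unchanged on either class, since both expose the same Begin(x) and End(x). Note that the directed InDegree scans every adjacency list; storing in-degrees separately is an alternative design that trades memory for speed.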

Code Requirements and Specifications - Algorithms

  1. Algorithms should operate on ALGraph objects via the interface defined above, so that another team's version of ALGraph can be substituted without modification.

  2. Algorithms should be class templates (in line with the graph class template). See discussion of algorithm classes in the Graphs 1 Lecture Notes.

  3. Test algorithms (surveys) thoroughly using the supplied survey tests.
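
To make the "algorithm as class template" idea concrete, the following is a rough sketch of a breadth-first survey that touches the graph only through VrtxSize, Begin, and End. The names BFSurveySketch, Search, Distance, and Parent, and the stand-in TinyGraph, are hypothetical and chosen for illustration. The class described in the lecture notes may expose a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <queue>
#include <vector>

// Hypothetical stand-in graph exposing only the interface the survey needs.
struct TinyGraph
{
  typedef std::size_t                        Vertex;
  typedef std::list<Vertex>::const_iterator  AdjIterator;

  explicit TinyGraph (std::size_t n) : al_(n) {}
  void AddEdge (Vertex from, Vertex to)
  {
    al_[from].push_back(to);
    al_[to].push_back(from);
  }
  std::size_t VrtxSize () const { return al_.size(); }
  AdjIterator Begin (Vertex x) const { return al_[x].begin(); }
  AdjIterator End   (Vertex x) const { return al_[x].end(); }

  std::vector< std::list<Vertex> > al_;
};

template < class G >
class BFSurveySketch
{
public:
  typedef typename G::Vertex       Vertex;
  typedef typename G::AdjIterator  AdjIterator;

  // In this sketch the value VrtxSize() marks "unreached" and "no parent".
  explicit BFSurveySketch (const G& g)
    : g_(g),
      visited_(g.VrtxSize(), false),
      distance_(g.VrtxSize(), g.VrtxSize()),
      parent_(g.VrtxSize(), g.VrtxSize())
  {}

  void Search (Vertex start)
  {
    assert(start < g_.VrtxSize());
    std::queue<Vertex> q;
    visited_[start]  = true;
    distance_[start] = 0;
    q.push(start);
    while (!q.empty())
    {
      Vertex v = q.front();
      q.pop();
      for (AdjIterator i = g_.Begin(v); i != g_.End(v); ++i)
      {
        if (visited_[*i]) continue;       // already discovered
        visited_[*i]  = true;
        distance_[*i] = distance_[v] + 1; // one edge farther than v
        parent_[*i]   = v;                // tree edge back toward start
        q.push(*i);
      }
    }
  }

  const std::vector<std::size_t>& Distance () const { return distance_; }
  const std::vector<Vertex>&      Parent   () const { return parent_; }

private:
  const G&                  g_;
  std::vector<bool>         visited_;
  std::vector<std::size_t>  distance_;
  std::vector<Vertex>       parent_;
};
```

Because the survey is parameterized on the graph type, replacing TinyGraph with any class that conforms to the interface above should require no change to the survey itself.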

Code Requirements and Specifications - SymbolGraph

  1. Class SymbolGraph implements a graph whose vertices are symbols (typically strings). The API is largely the same as that of the abstract graph classes discussed above, with the additional ability to adjust the vertex size "on the fly" using the Push() operation.

    namespace fsu
    {
      template < typename S , typename N >
      class SymbolGraph
      {
      public:
        typedef S      Vertex;
        typedef xxxxx  AdjIterator;
    
        void   SetVrtxSize  (N n);
        void   AddEdge      (Vertex from, Vertex to);
        size_t VrtxSize     () const;
        size_t EdgeSize     () const;
        size_t OutDegree    (Vertex x) const;
        size_t InDegree     (Vertex x) const;
        AdjIterator Begin   (Vertex x) const;
        AdjIterator End     (Vertex x) const;
    
        void   Push         (const S& s); // add s to the vertex set
    
        // access to underlying data
        const ALUGraph<N>&      GetAbstractGraph() const; // reference to g_
        const HashTable<S,N,H>& GetSymbolMap() const; // reference to s2n_
        const Vector<S>&        GetVertexMap() const; // reference to n2s_
    
        SymbolGraph ( );
        SymbolGraph ( N n );
        ...
      private:
        ALUGraph<N>      g_;
        HashTable<S,N,H> s2n_;
        Vector<S>        n2s_;
        ...
      };
    } // namespace fsu
    

    where xxxxx is the adjacency iterator type. There is a directed version SymbolDirectedGraph<S,N> whose implementation is almost identical to the undirected case, except using ALDGraph<N> as the abstract graph underpinning.

  2. The template arguments are S = SymbolType and N = IntegerType. S is the type for the names of vertices, and is typically some form of string. N is the parameter to instantiate the underpinning abstract graph.

  3. s2n_ is an associative array, or mapping, from symbols to vertices in the abstract graph g_. n2s_ is the inverse mapping from vertices in g_ to symbols. The symbol graph uses the two mappings to translate symbols to abstract vertices and calls operations in the abstract graph.
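
The two mappings in item 3 can be pictured with a small sketch. It substitutes std::unordered_map and std::vector for fsu::HashTable and fsu::Vector, and the class name SymbolIndexSketch and the member functions ToVertex, ToSymbol, and Size are hypothetical. Unlike the Push() declared above, this version returns the vertex number, purely for convenience in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical miniature of the s2n_ / n2s_ pair described above.
class SymbolIndexSketch
{
public:
  typedef std::unordered_map<std::string, std::size_t> Map;

  // Add s to the vertex set if it is new; return its abstract vertex number.
  std::size_t Push (const std::string& s)
  {
    Map::const_iterator i = s2n_.find(s);
    if (i != s2n_.end()) return i->second; // already present
    std::size_t n = n2s_.size();           // next unused vertex number
    s2n_[s] = n;
    n2s_.push_back(s);
    return n;
  }

  // Symbol -> abstract vertex; the symbol must already have been pushed.
  std::size_t ToVertex (const std::string& s) const
  {
    Map::const_iterator i = s2n_.find(s);
    assert(i != s2n_.end());
    return i->second;
  }

  // Abstract vertex -> symbol (the inverse mapping).
  const std::string& ToSymbol (std::size_t n) const
  {
    assert(n < n2s_.size());
    return n2s_[n];
  }

  std::size_t Size () const { return n2s_.size(); }

private:
  Map                       s2n_; // symbol to number
  std::vector<std::string>  n2s_; // number to symbol
};
```

A symbol graph operation such as AddEdge(from, to) would then translate both symbols with the first mapping and forward the resulting numbers to the abstract graph.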

Code Requirements and Specifications - MovieMatch

  1. MovieMatch should provide services required by KevinBacon.cpp. This will require the following (partial) class definition:

    class MovieMatch
    {
    public:
    
      MovieMatch (const char* baseActor) : baseActor_(0)
      {
        size_t length = strlen(baseActor);
        baseActor_ = new char [length + 1];
        baseActor_[length] = '\0';
        strcpy (baseActor_,baseActor);
      }
    
      void Load (const char* filename);
      // loads a movie/actor table
    
      unsigned long MovieDistance(const char* actor);
      // returns the number of movies required to get from actor to baseActor_
    
      ...
    
    private:
      char* baseActor_;
      SymbolGraph < fsu::String , size_t > sg_;
      ...
    };
    

    (The names can be your choice, except for those required by the distributed client program.)

    If you prefer you may build the symbol graph directly in MovieMatch.

  2. The underlying graph should be built from the "database" provided in the text file movies.txt. Each line of this file represents a movie and the actors in the movie. Forward slash '/' is used to delimit the strings representing movie titles and actor names in each line.

  3. It will be helpful to use either the cstring library or std::string to read entire lines and break them up into strings using the '/' delimiter, so that spaces are captured. We will distribute a client program for MovieMatch that illustrates this approach by allowing actor names (with blanks) to be entered through the keyboard.
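
One possible way to break a line apart, sketched here with std::string and std::istringstream, is shown below. The function name SplitLine is hypothetical, and the sketch assumes the movie title is the first field on each line, which should be checked against the actual movies.txt.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split one line into its delimiter-separated fields, keeping embedded blanks.
// Assumption: field 0 is the movie title and the remaining fields are actors.
std::vector<std::string> SplitLine (const std::string& line, char delim = '/')
{
  assert(delim != '\n'); // a line read with getline never contains '\n'
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, delim)) // reads up to the next delimiter
    fields.push_back(field);
  return fields;
}
```

A loader would typically read the file with std::getline(file, line) and pass each line to a helper like this one. For a made-up line such as "Some Movie (2000)/Doe, Jane/Roe, Richard", the helper returns three strings with their blanks intact.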

Movie Distance and Kevin Bacon

The Kevin Bacon game is this: given an actor by name, what is his/her Kevin Bacon number?

To solve this we first need a clear definition of the Kevin Bacon number for an actor, or more generally, the movie distance between two actors. The definition is much like the path distance between two vertices in a graph, except using movie chains instead of edges.

A movie chain from actor x to actor y is a sequence m1 m2 ... mk such that

  1. mj and mj+1 have an actor in common for 0 < j < k
  2. x is in movie m1
  3. y is in movie mk

The movie distance md(x,y) is defined to be the number of movies in a shortest movie chain from x to y. If there is no movie chain from x to y, we define md(x,y) = infinity.

The Kevin Bacon number of an actor x is the movie distance from x to Kevin Bacon.

Some consequences are:

  1. Kevin Bacon has Kevin Bacon number 0.
  2. All other actors have Kevin Bacon number at least 1.
  3. If x != y and x and y are in the same movie, then md(x,y) = 1.
  4. md(x,z) <= md(x,y) + md(y,z)

The actor-movie graph

To solve the Kevin Bacon game (or any other similar game based on another actor) we use graphs. Specifically, create a graph in which both actors and movies are vertices, and insert an edge whenever an actor is in a movie. Thus each edge has an actor for one vertex and a movie for the other.

A graph is said to be bipartite if the vertices can be colored with two colors, say Red and Blue, such that the two vertices of each edge have different colors, that is, each edge goes between a blue vertex and a red vertex. Clearly the movie-actor graph is bipartite, with actors colored blue and movies colored red.

The following result is proved in discrete math courses and most books on graph theory:

Theorem. In a bipartite graph, a path whose ends have the same color has an even number of edges.

As a consequence, any path from one actor to another in the movie-actor graph has an even number of edges. If P is such a path, with length n, then n is even and n/2 is the number of movies passed through by P. If P is a shortest path from actor x to actor y, then n/2 is the movie distance from x to y.

Thus to solve the Kevin Bacon game, we perform a Breadth-First survey from Kevin Bacon. The Breadth First Tree rooted at Kevin Bacon consists of shortest paths from Kevin Bacon to all other actors who have a finite Kevin Bacon number. Dividing the length of such a path by 2 yields the Kevin Bacon number for the actor at the other end of the path.

In practical terms, we start at an actor x and follow the parent pointers of the BF tree back to Kevin Bacon, counting the steps. Dividing this count by 2 gives the Kevin Bacon number of x.
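
The procedure can be tried on a toy example. The sketch below is self-contained and does not use the project classes: the name MovieDistanceSketch, its member functions, and the NoChain() sentinel for "infinity" are all hypothetical. It also assumes that no movie title is identical to an actor name, since both share one symbol table here.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

class MovieDistanceSketch
{
public:
  typedef std::unordered_map<std::string, std::size_t> Map;

  // Hypothetical sentinel standing for md(x,y) = infinity.
  static unsigned long NoChain () { return static_cast<unsigned long>(-1); }

  // Insert an edge between a movie vertex and an actor vertex.
  void AddRole (const std::string& movie, const std::string& actor)
  {
    std::size_t m = Index(movie);
    std::size_t a = Index(actor);
    adj_[m].push_back(a);
    adj_[a].push_back(m);
  }

  // Movie distance from actor to base, computed as (BF tree path length) / 2.
  unsigned long Distance (const std::string& base, const std::string& actor) const
  {
    Map::const_iterator b = index_.find(base);
    Map::const_iterator a = index_.find(actor);
    if (b == index_.end() || a == index_.end()) return NoChain();

    std::vector<std::size_t> parent(adj_.size(), adj_.size());
    std::vector<bool> visited(adj_.size(), false);
    std::queue<std::size_t> q;
    visited[b->second] = true;
    q.push(b->second);
    while (!q.empty()) // breadth-first survey from base
    {
      std::size_t v = q.front();
      q.pop();
      for (std::size_t k = 0; k < adj_[v].size(); ++k)
      {
        std::size_t w = adj_[v][k];
        if (visited[w]) continue;
        visited[w] = true;
        parent[w]  = v;
        q.push(w);
      }
    }
    if (!visited[a->second]) return NoChain();

    unsigned long steps = 0; // follow parent pointers back to base
    for (std::size_t v = a->second; v != b->second; v = parent[v]) ++steps;
    assert(steps % 2 == 0);  // actor-to-actor paths have even length
    return steps / 2;
  }

private:
  std::size_t Index (const std::string& s) // symbol -> vertex, adding if new
  {
    Map::const_iterator i = index_.find(s);
    if (i != index_.end()) return i->second;
    std::size_t n = adj_.size();
    index_[s] = n;
    adj_.push_back(std::vector<std::size_t>());
    return n;
  }

  Map                                      index_;
  std::vector< std::vector<std::size_t> >  adj_;
};
```

With made-up data in which "Movie One" features Kevin Bacon and Actor A, and "Movie Two" features Actor A and Actor B, the path from Actor B back to Kevin Bacon has 4 edges, so the sketch reports a distance of 2. Actor A gets 1 and Kevin Bacon gets 0, in agreement with the consequences listed earlier.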

Hints