Graphs and Digraphs

Reference: Chapter B.4 of [Cormen].

Undirected Graphs - Theory and Terminology

Reference: Chapter B.4 of [Cormen].

Directed Graphs - Theory and Terminology

Reference: Chapter B.4 of [Cormen].

Graph Data Structures

There are three basic approaches to keeping track of graph data in programs. In each case, we keep track of the edges from one vertex to another. The first two assume that vertices are integers and keep up with edges by remembering which pair of vertices each edge connects. The third method actually stores vertices and edges as objects, which facilitates templatizing the graph classes.

We will discuss adjacency matrix and adjacency list representations in the context of IGraph, the abstract base class (ADT) for graphs with integer vertices.

The underlying premise for IGraph is that the vertices of the graph are (indexed by) the integers 0 .. n -1. The class provides the methods SetVrtxSize() and AddEdge() which facilitate the construction of any integer-vertexed graph. Access to graph elements is intended to be through an IGraph::Iterator object (discussed below).

Adjacency Matrix Representation

The adjacency matrix representation assumes the graph G has vertices v₀ ... v_{n -1} indexed on the integers 0 ... n - 1. An adjacency matrix M representing G sets M(i,j) = 1 to signify an edge from vertex v_i to vertex v_j in G and M(i,j) = 0 when G has no edge from v_i to v_j.

The adjacency matrix representation is attractive for its simplicity and for the fast random access to edge information in the graph it provides. It is the most widely used representation in mathematical settings. The adjacency matrix representation is straightforward to implement as follows:

class Ungraph
{
protected:
  TMatrix < int > adjMatrix;

public:
  void AddEdge(unsigned int x, unsigned int y)
  {
    adjMatrix[x][y] = 1;  // undirected: record the edge in both directions
    adjMatrix[y][x] = 1;
  }
} ;

class Digraph
{
protected:
  TMatrix < int > adjMatrix;

public:
  void AddEdge(unsigned int x, unsigned int y)
  {
    adjMatrix[x][y] = 1;  // directed: record only the edge from x to y
  }
} ;

Of course, other methods would be added to these classes. The above is just a bare-bones starting point.

The adjacency matrix representation offers very fast access to edge information through the matrix's double bracket operator, at the cost of a fixed storage size:

edge access time = Θ(1)
storage size = Θ(n²)

The fixed storage requirement of adjacency matrices can be a disadvantage for graphs with few edges, because the matrix stores "no information" in lots of places, thus using memory unnecessarily.

For example, suppose a graph has about as many edges as it does vertices, as is the case in many practical applications of graphs. (Such graphs are called sparse or sparsely connected.) Then most entries in an adjacency matrix are used to store 0 to indicate an edge does not exist. Note that a matrix with n rows and columns has n² entries. The adjacency matrix for a sparse graph would have Θ(n² - n) = Θ(n²) entries that are zero, corresponding to "no edge here".
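The trade-off described above can be made concrete with a minimal sketch. It assumes std::vector in place of the course's TMatrix, and the names MatrixDigraph, HasEdge() and ZeroEntries() are hypothetical, introduced only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix-based Digraph, built on std::vector
// rather than TMatrix (an assumption made so the sketch is self-contained).
class MatrixDigraph
{
public:
  explicit MatrixDigraph(std::size_t n)
    : adjMatrix_(n, std::vector<int>(n, 0))   // n * n entries, all "no edge"
  {}

  std::size_t VrtxSize() const { return adjMatrix_.size(); }

  void AddEdge(std::size_t x, std::size_t y)
  {
    assert(x < VrtxSize() && y < VrtxSize());
    adjMatrix_[x][y] = 1;
  }

  // Theta(1): a single indexed lookup, independent of the number of edges
  bool HasEdge(std::size_t x, std::size_t y) const
  {
    assert(x < VrtxSize() && y < VrtxSize());
    return adjMatrix_[x][y] != 0;
  }

  // Counts the matrix entries that record "no edge here"
  std::size_t ZeroEntries() const
  {
    std::size_t count = 0;
    for (const auto& row : adjMatrix_)
      for (int entry : row)
        if (entry == 0) ++count;
    return count;
  }

private:
  std::vector<std::vector<int>> adjMatrix_;
};
```

For a digraph on 4 vertices with the 4 edges 0→1, 1→2, 2→3, 3→0, this matrix holds 16 entries, of which 12 are zero.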

Adjacency List Representation

For sparse graphs, it is much more space conservative to store data only when an edge does exist. This is the idea behind the adjacency list representation: each vertex v keeps a list of all of the other vertices to which an edge from v goes. The adjacency list representation is implemented as follows:

class Ungraph
{
protected:
  TVector < TList < unsigned int > > adjList;

public:
  void AddEdge(unsigned int x, unsigned int y)
  {
    adjList[x].PushBack(y);  // undirected: each endpoint lists the other
    adjList[y].PushBack(x);
  }
} ;

class Digraph
{
protected:
  TVector < TList < unsigned int > > adjList;

public:
  void AddEdge(unsigned int x, unsigned int y)
  {
    adjList[x].PushBack(y);  // directed: only x lists y
  }
} ;

Again, these classes can be elaborated considerably.

The storage requirement for the adjacency list representation is dependent on the number of elements in the graph and thus for sparse graphs is much more conservative. The price paid for edge access time is small: the adjacency list of a vertex is accessed randomly, but to find a specific edge the list must be searched (sequentially):

storage size = Θ(eSize + vSize) = Θ(eSize + n)
edge access time = Θ(outDegree(vertex))

Theorem. For sparse graphs, the edge access time in the adjacency list representation is amortized constant time.

Proof. A sparse graph (or digraph) has the property that the number of edges is (roughly) equal to the number of vertices, that is, eSize = vSize. Using the "degree" theorem (slide 2 or 3), the average out degree of a vertex is

(Σ_v outDegree(v)) / vSize = eSize / vSize = vSize / vSize = 1.

Therefore amortized edge access time = Θ(average outDegree) = Θ(1).
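The cost figures and the average out degree computation above can be illustrated with a small sketch. It assumes std::vector and std::list in place of TVector and TList; the names ListDigraph, HasEdge() and AverageOutDegree() are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for a list-based Digraph with integer vertices.
class ListDigraph
{
public:
  explicit ListDigraph(std::size_t n) : adjList_(n), eSize_(0) {}

  std::size_t VrtxSize() const { return adjList_.size(); }
  std::size_t EdgeSize() const { return eSize_; }

  void AddEdge(std::size_t x, std::size_t y)
  {
    assert(x < VrtxSize() && y < VrtxSize());
    adjList_[x].push_back(y);
    ++eSize_;
  }

  std::size_t OutDegree(std::size_t x) const
  {
    assert(x < VrtxSize());
    return adjList_[x].size();
  }

  // Theta(outDegree(x)): random access to the list of x, then a sequential search
  bool HasEdge(std::size_t x, std::size_t y) const
  {
    assert(x < VrtxSize());
    const std::list<std::size_t>& out = adjList_[x];
    return std::find(out.begin(), out.end(), y) != out.end();
  }

  // (sum of all out degrees) / vSize, which equals eSize / vSize
  double AverageOutDegree() const
  {
    if (VrtxSize() == 0) return 0.0;
    return static_cast<double>(eSize_) / static_cast<double>(VrtxSize());
  }

private:
  std::vector<std::list<std::size_t>> adjList_;
  std::size_t eSize_;
};
```

With 4 vertices and the 4 edges 0→1, 1→2, 2→3, 3→0, AverageOutDegree() returns 1, matching the calculation in the proof.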

Advanced Representations

The ultimate in generality for graphical representation is to allow for any (proper) types for vertices and edges. We refer to these here as vertex_type and edge_type, respectively. The problem of representing a graph (or digraph) that has general vertex_type and edge_type is quite interesting and has many practical applications. One should not conclude however that there is a one-design-fits-all-needs solution to the graph/digraph representation problem. Rather, it is useful to explore options, any one of which may be appropriate for a particular client. Fortunately, the tools we have developed (and that have analogs in the STL) make the implementation of these options straightforward.

The edge list representation of a graph uses a list of <vertex_type, list < edge_type > > pairs. However, a list of lists sacrifices the ability to access an edge list randomly with respect to vertices, as is done in both the adjacency matrix and adjacency list representations. This change would slow down many graph algorithms by a factor of Θ(vSize) (or worse, when edge accesses are buried inside loops).

To maintain the random access ability while using a general type for vertices and edges, we turn to the associative array or map (which, recall, can be implemented as a version of hash table CHashMap or binary search tree CSortMap). An associative array indexed on key_type = vertex_type and accessing data_type = TList<edge_type> works well:

class Ungraph
{
protected:
  TAArray < vertex_type, TList < edge_type > > edgeList;

public:
  void AddEdge(const vertex_type& x, const vertex_type& y)
  {
    edge_type e(x, y);        // edge object, stored by value in both lists
    edgeList[x].PushBack(e);
    edgeList[y].PushBack(e);
  }
} ;

class Digraph
{
protected:
  TAArray < vertex_type, TList < edge_type > > edgeList;

public:
  void AddEdge(const vertex_type& x, const vertex_type& y)
  {
    edge_type e(x, y);        // edge object, stored by value
    edgeList[x].PushBack(e);
  }
} ;

The edge access time for this representation is the search time of the edge list, that is O(v.OutDegree()). This is an excellent choice for sparse graphs where the edge list can be expected to be small in size. For dense graphs, the edge list can be replaced by a faster access time structure such as a set, making the edge access time O(log(v.OutDegree())).

Note that many graph algorithms use edge access as an atomic operation buried inside one or more loops, making edge access time a critical point in choosing a representation.
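A minimal sketch of the map-based representation follows. It assumes std::map in the role of the associative array (closer to CSortMap than to CHashMap) and std::list in the role of TList. The vertex type std::string, the edge type LabeledEdge, and the method FindEdge() are hypothetical choices made only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>

// Hypothetical edge type: records both endpoints and a numerical label
struct LabeledEdge
{
  std::string from;
  std::string to;
  double weight;
};

// Hypothetical digraph keyed on string vertices
class MapDigraph
{
public:
  void AddEdge(const std::string& x, const std::string& y, double w)
  {
    edgeList_[x].push_back(LabeledEdge{x, y, w});
    edgeList_[y];   // records y as a vertex even if it has no outgoing edges
  }

  std::size_t VrtxSize() const { return edgeList_.size(); }

  // O(log vSize) to locate x in the map, then O(outDegree(x)) to search its list;
  // returns nullptr when there is no edge from x to y
  const LabeledEdge* FindEdge(const std::string& x, const std::string& y) const
  {
    auto it = edgeList_.find(x);
    if (it == edgeList_.end()) return nullptr;
    for (const LabeledEdge& e : it->second)
      if (e.to == y) return &e;
    return nullptr;
  }

private:
  std::map<std::string, std::list<LabeledEdge>> edgeList_;
};
```

Swapping std::map for std::unordered_map would correspond to the hash table option, trading ordered traversal of the vertices for expected constant-time vertex lookup.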

Graph Traversals

There are two basic strategies for systematically going to each vertex in an ungraph or digraph, depth-first and breadth-first. We have encountered both of these before in trees and backtracking problems. The graph cases are the most general form of these algorithms. Both depth-first and breadth-first strategies are most useful when captured in the form of iterators.

Depth-First Search Iterator

This version of DFS emphasizes vertices and is guaranteed to iterate, in depth-first order, through all vertices of a graph (or digraph) that are reachable from the starting vertex; for a connected graph that is every vertex. A similar version could be defined that emphasizes edges. The implementation of DFS uses a stack of vertices. (The stack could be eliminated from DFS.)

DFS uses a marking device for vertices that is left unspecified. The simplest implementation for a marker is a vector of bool values indexed on vertices. The order in which edges from a vertex are considered by DFS is also unspecified and is in practice dependent on the particular representation of the graph.

Note the similarity between DFSIterator and PreorderIterator for binary trees. If a binary tree is represented as a graph with child edges always listed in left to right order, then these two iterator types are equivalent on binary trees.

The following is C++ pseudocode defining a DFSIterator class for graphs or digraphs.

class DFSIterator
{
protected:
  Stack < vertex_type > S;  // top of S is current vertex

public:
  void Initialize(vstart)
  {
    S.Clear();
    Unmark all vertices;
    Mark(vstart);
    S.Push(vstart);
  }

  int Valid()
  {
    return !S.Empty();
  }

  vertex_type* Retrieve()
  {
    if (S.Empty())
      return 0;
    return &S.Top();
  }

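  DFSIterator& operator ++ ()
  {
    // backtrack past vertices whose edges all lead to marked vertices
    while (!S.Empty() and no edge from S.Top() goes to an unmarked vertex)
    {
      S.Pop();
    }
    // if S is not empty here, S.Top() has an edge to an unmarked vertex
    if (!S.Empty())
    {
      let v be an unmarked vertex with an edge from S.Top();
      Mark(v);
      S.Push(v);
    }
    return *this;
  }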
} ;
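The pseudocode above can be turned into a runnable sketch for integer vertices. This version makes several assumptions not fixed by the text: the digraph is a std::vector of adjacency vectors, the marker is a std::vector<bool>, and edges are tried in the order they are stored. The names DfsWalker and DfsOrder are hypothetical, and Retrieve() returns the vertex by value rather than by pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

class DfsWalker
{
public:
  using Graph = std::vector<std::vector<std::size_t>>;

  // The graph g must outlive the walker, which keeps a reference to it
  DfsWalker(const Graph& g, std::size_t vstart)
    : g_(g), marked_(g.size(), false), next_(g.size(), 0)
  {
    assert(vstart < g.size());
    marked_[vstart] = true;
    s_.push(vstart);
  }

  bool Valid() const { return !s_.empty(); }

  std::size_t Retrieve() const
  {
    assert(Valid());
    return s_.top();   // top of the stack is the current vertex
  }

  DfsWalker& operator++()
  {
    while (!s_.empty())
    {
      std::size_t top = s_.top();
      // skip edges from top that lead to already marked vertices
      while (next_[top] < g_[top].size() && marked_[g_[top][next_[top]]])
        ++next_[top];
      if (next_[top] < g_[top].size())
      {
        std::size_t v = g_[top][next_[top]];
        marked_[v] = true;
        s_.push(v);      // v becomes the current vertex
        return *this;
      }
      s_.pop();          // no unmarked neighbor left: backtrack
    }
    return *this;        // stack is empty: the iterator is no longer valid
  }

private:
  const Graph& g_;
  std::vector<bool> marked_;
  std::vector<std::size_t> next_;   // next_[v]: index of the next edge of v to try
  std::stack<std::size_t> s_;
};

// Collects the vertices in the order the walker visits them
std::vector<std::size_t> DfsOrder(const DfsWalker::Graph& g, std::size_t vstart)
{
  std::vector<std::size_t> order;
  for (DfsWalker i(g, vstart); i.Valid(); ++i)
    order.push_back(i.Retrieve());
  return order;
}
```

On the digraph with edges 0→1, 0→2, 1→3, 2→3 and start vertex 0, the visiting order is 0, 1, 3, 2. The per-vertex index next_ is a design choice that avoids rescanning an adjacency list from the beginning after each backtrack.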

Breadth-First Search

This version of BFS emphasizes vertices and is guaranteed to iterate, in breadth-first order, through all vertices of a graph (or digraph) that are reachable from the starting vertex; for a connected graph that is every vertex. A similar version could be defined that emphasizes edges. The implementation of BFS uses a queue of vertices. (It is impractical to eliminate the queue from BFS.)

BFS uses a marking device for vertices that is left unspecified. The simplest implementation for a marker is a vector of bool values indexed on vertices. The order in which edges from a vertex are considered by BFS is also unspecified and is in practice dependent on the particular representation of the graph.

Note the similarity between BFSIterator and LevelorderIterator for binary trees. If a binary tree is represented as a graph with child edges always listed in left to right order, then these two iterator types are equivalent on binary trees.

The following is C++ pseudocode defining a BFSIterator class for graphs or digraphs.

class BFSIterator
{
protected:
  Queue < vertex_type > Q; // front of Q  is current vertex

public:

  void Initialize(vstart)
  {
    Q.Clear();
    Unmark all vertices;
    Mark(vstart);
    Q.Push(vstart);
  }

  int Valid()
  {
    return !Q.Empty();
  }

  vertex_type* Retrieve()
  {
    if (Q.Empty())
      return 0;
    return &Q.Front();
  }

  BFSIterator& operator ++ ()
  {
    if (!Q.Empty())
    {
      for (all edges from Q.Front() to an unmarked vertex v)
      {
        Mark(v);
        Q.Push(v);
      }
      Q.Pop();
    }
    return *this;
  }
} ;
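A runnable counterpart of the pseudocode above, for integer vertices, might look like the following sketch. It assumes the digraph is a std::vector of adjacency vectors and the marker is a std::vector<bool>. The names BfsWalker and BfsOrder are hypothetical, and Retrieve() returns the vertex by value rather than by pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

class BfsWalker
{
public:
  using Graph = std::vector<std::vector<std::size_t>>;

  // The graph g must outlive the walker, which keeps a reference to it
  BfsWalker(const Graph& g, std::size_t vstart)
    : g_(g), marked_(g.size(), false)
  {
    assert(vstart < g.size());
    marked_[vstart] = true;
    q_.push(vstart);
  }

  bool Valid() const { return !q_.empty(); }

  std::size_t Retrieve() const
  {
    assert(Valid());
    return q_.front();   // front of the queue is the current vertex
  }

  BfsWalker& operator++()
  {
    if (!q_.empty())
    {
      // enqueue every unmarked neighbor of the current vertex, then leave it
      for (std::size_t v : g_[q_.front()])
      {
        if (!marked_[v])
        {
          marked_[v] = true;
          q_.push(v);
        }
      }
      q_.pop();
    }
    return *this;
  }

private:
  const Graph& g_;
  std::vector<bool> marked_;
  std::queue<std::size_t> q_;
};

// Collects the vertices in the order the walker visits them
std::vector<std::size_t> BfsOrder(const BfsWalker::Graph& g, std::size_t vstart)
{
  std::vector<std::size_t> order;
  for (BfsWalker i(g, vstart); i.Valid(); ++i)
    order.push_back(i.Retrieve());
  return order;
}
```

On the digraph with edges 0→1, 0→2, 1→3, 2→3 and start vertex 0, the visiting order is 0, 1, 2, 3: the start vertex, then the vertices one edge away, then those two edges away.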

Topological Sort

Algorithms that operate on graphs form one of the richest and most useful classes of non-numerical computer algorithms, with applications in virtually all aspects of computing.

One problem that arises in many application areas (from spreadsheet macro evaluation to learning in artificial neural networks) can be described as a "prerequisite" problem: given a set of items to be processed (such as courses in computer science that must be taken) and a set of pre/post relationships among the items (such as course prerequisite requirements), order the items in such a way that all of the prerequisite relations are respected by the ordering. For the course analogy, order the courses so that whenever a course is taken, all of its prerequisites have already been taken.

Another similar problem is that faced by the make utility: given a set of targets and their dependencies, typically encoded as

target: dependency1 dependency2 ...

find an ordering of the targets such that each target comes later in the ordering than its dependencies. (For a project build makefile, this is a compilation ordering for the targets.)

This problem is modelled as a digraph, in which vertices represent the set of items and directed edges represent the pre/post relations among items. A solution is an ordering of the vertices of the digraph so that all directed edges go "forward" in the ordering. More formally: A topological ordering of the vertices of a digraph D is a traversal of D such that if [x,y] is an edge of D then x is encountered before y in the traversal.

Note that if the digraph contains a directed cycle then we cannot find such an ordering: inevitably, as we follow the cycle, we must step from a newly encountered vertex to a previously encountered vertex, violating the topological sort condition. A consequence of the Topological Sort Algorithm is that the opposite is also true: A digraph has a topological ordering of its vertices iff it has no directed cycles.

The algorithm uses three data structures:

Queue Q of vertices: stores topological ordering of vertices (initialized empty)
Vector V of integers indexed on vertices: stores "unused" inDegree of each vertex (initialized to inDegree)
Stack S of vertices: stores vertices with no "unused" inDegree for processing (initialized with all "source" vertices)

Topological Sort Algorithm

Assume that D is a digraph defined within the IGraph framework.

Data structures: 

queue<int>  Q  // stores the topological order as it is discovered
               // initialized empty

vector<int> V  // maintains the number of vertices that are not yet in Q and
               // have an edge to vertex i
               // initialized V[i] = inDegree(i)

stack<int>  S  // temporary store of vertices i with 0 = V[i]
               // initialized with all "source" vertices (those with 0 = V[i])

Process:

while (!S.Empty())
{
   t = S.Top();
   S.Pop();
   for every neighbor n of t
   { 
      --V[n];
      if (V[n] == 0)
        S.Push(n);
   }
   Q.Push (t);
}

Conclusion:

if (Q.Size() == D.VrtxSize())
   Q contains a topological ordering of D;
else
   D has a cycle;
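The process and conclusion above can be sketched as a single runnable function. This is an illustration under stated assumptions: the digraph is a std::vector of adjacency vectors rather than an IGraph object, a std::vector named order plays the role of the queue Q so the result is easy to inspect, and the function name TopologicalSort and type alias TopoGraph are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

using TopoGraph = std::vector<std::vector<std::size_t>>;

// Fills order with a topological ordering of d and returns true,
// or returns false when d has a directed cycle.
bool TopologicalSort(const TopoGraph& d, std::vector<std::size_t>& order)
{
  const std::size_t n = d.size();
  order.clear();

  // V[i] = number of edges into i from vertices not yet placed in order
  std::vector<std::size_t> V(n, 0);
  for (std::size_t x = 0; x < n; ++x)
  {
    for (std::size_t y : d[x])
    {
      assert(y < n);
      ++V[y];
    }
  }

  // S holds vertices whose remaining in-degree is 0, starting with the sources
  std::stack<std::size_t> S;
  for (std::size_t i = 0; i < n; ++i)
    if (V[i] == 0) S.push(i);

  while (!S.empty())
  {
    std::size_t t = S.top();
    S.pop();
    for (std::size_t nbr : d[t])
    {
      --V[nbr];
      if (V[nbr] == 0) S.push(nbr);
    }
    order.push_back(t);
  }

  // every vertex was placed exactly when no cycle blocked it
  return order.size() == n;
}
```

On the digraph with edges 0→1, 0→2, 1→3, 2→3 this produces the ordering 0, 2, 1, 3. On the 3-cycle 0→1, 1→2, 2→0 there are no source vertices, so the function returns false with an empty ordering.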

Other Graph Algorithms

Graphs and digraphs model many kinds of problems that involve locations and connections between locations, and solving particular kinds of problems in these graphical models is often of both practical importance and theoretical interest.

Recall that a weighted graph or digraph is one in which a numerical weight is associated with each edge. A graph otherwise without weights can have weight 1 assigned to each edge, making the unweighted cases included in the weighted cases. A network is a digraph in which each directed edge has an associated numerical label, usually called a weight. Networks model systems as diverse as gas pipeline networks (where the weight of a pipeline is its capacity), the flow of money in an economy, and computer networks (where the weight of a connection is its bandwidth). We conclude by mentioning a few of the best known and most useful graph (and digraph) algorithms.

Kruskal's algorithm and Prim's algorithm are two methods for finding a minimal spanning tree of a connected weighted graph. A minimal spanning tree of a graph G is a subgraph T of G such that

T is a tree
T contains all the vertices of G
T has minimal total weight among all subgraphs satisfying the first two conditions.

Dijkstra's algorithm solves the single-source shortest path problem in a weighted digraph. That is, it finds the shortest directed path from a vertex to all other vertices reachable from that vertex.
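As a rough illustration of the single-source shortest path problem, here is one common way to write Dijkstra's algorithm, using a binary-heap priority queue with lazy deletion of stale entries. The representation (a std::vector of arc lists), the names WeightedArc, WeightedDigraph and ShortestDistances, and the requirement that weights be non-negative are assumptions of this sketch and are not taken from the notes:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct WeightedArc
{
  std::size_t to;
  double weight;    // assumed non-negative
};

using WeightedDigraph = std::vector<std::vector<WeightedArc>>;

// Returns dist, where dist[v] is the length of a shortest directed path from
// source to v, or infinity when v is not reachable from source.
std::vector<double> ShortestDistances(const WeightedDigraph& d, std::size_t source)
{
  assert(source < d.size());
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(d.size(), inf);

  using Entry = std::pair<double, std::size_t>;   // (tentative distance, vertex)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;

  dist[source] = 0.0;
  pq.push(Entry(0.0, source));

  while (!pq.empty())
  {
    Entry top = pq.top();
    pq.pop();
    const double dv = top.first;
    const std::size_t v = top.second;
    if (dv > dist[v]) continue;          // stale entry: v was already settled

    for (const WeightedArc& a : d[v])
    {
      assert(a.weight >= 0.0);
      if (dist[v] + a.weight < dist[a.to])
      {
        dist[a.to] = dist[v] + a.weight;  // found a shorter path to a.to
        pq.push(Entry(dist[a.to], a.to));
      }
    }
  }
  return dist;
}
```

With arcs 0→1 (weight 1), 0→2 (weight 4), 1→2 (weight 2) and 2→3 (weight 1), the distances from vertex 0 are 0, 1, 3 and 4: the path through vertex 1 reaches vertex 2 more cheaply than the direct arc.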

Several algorithms use linear programming to solve graph problems such as the maximum network flow problem.