Version 09/03/2018

Random Graphs

The study of random graphs began in 1960 with the publication of a remarkable paper by Paul Erdös and Alfréd Rényi that elucidated their discovery of a phase transition in the component structure of a random graph (the sudden emergence of a giant component) as the expected vertex degree passes through the value 1.0. This result was astonishing and unexpected. [See Footnote 1.]

It is not possible to convey the breadth, depth, and importance of the study of large-scale graphs in a few paragraphs. An entire book would be needed just for a complete bibliography on the subject. Nevertheless, some intuition can be obtained by thinking about the following:

Many important systems and phenomena are representable as graphs, including:
1. Road networks [vertices represent intersections, edges represent roads between intersections]
2. Airline schedules [vertices represent cities, edges represent direct flights between cities]
3. Social networks [vertices represent Facebook users, edges represent friend relationships]
4. Contagion networks [vertices represent individuals, edges represent contacts]
5. Customer/product networks [vertices = Amazon book titles and book customers, edges = book is purchased by customer]
6. Movie/actor graph [vertices = actors and movies, edges = actor is in that movie]
7. Human Sexual Contacts [vertices = humans, edges = ... (well, you know)]. This example was the subject of a 2001 paper in the journal Nature.
8. The World Wide Web [vertices represent web pages, edges represent hyperlinks between pages]
Properties of graph representations yield information about the context. For example:
1. An A-B path is a route. Routes can be optimized on any quality; many route planners provide at least 3: shortest by distance, shortest by driving time, most scenic.
2. Travel itineraries can be minimized on travel time, number of stopovers, or cost.
3. Large degree vertices may be good viral marketers.
4. A contagion may be an infectious disease among birds, knowledge of a chimpanzee predator, or a juicy piece of gossip. Graph properties will determine how fast it spreads.
5. If I bought the same book as you, and you later buy another book, Amazon can recommend your new book to me. More generally, clusters, components, and node degrees are important features for marketing.
6. Well ... we'll explore this one in detail in another project.
7. Who are the vertices with high degree? Who is an isolated vertex? How close is the graph to being bipartite? What is the significance of long paths? Connected components? What is the average path distance between vertices?
8. In the late 1990s, grad students Sergey Brin and Larry Page came up with a way to exploit properties of the web graph to optimize searches. Google is now one of the richest and most influential companies ever created.
Some of the most interesting graphs are too large and/or too dynamic to study directly. Features of these graphs can be built into randomly generated abstract graph models whose properties reflect their real counterparts.

We are going to dip our pinky toes into this research area by writing some random graph generators with certain properties, and then we will analyze some of their features. One of the things we will want to do is keep track of the component structure of the graphs as we generate them, because that is so much more tractable than doing component analysis after the graph is created. For this we need the union-find algorithm and a special analytic tool that gives the component size breakdown in descending order by size. We also want to analyze the degree sequence structure of these graphs.
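To make the idea of tracking components during generation concrete, here is a minimal union-find sketch. The class name DisjointSets and its methods are hypothetical and chosen only for illustration; this is not the course's Partition class, whose interface may differ. It assumes vertices are numbered 0 through n - 1, and it uses union by size with path halving, which is one common variant among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration only; NOT the Partition class from the course library.
class DisjointSets
{
public:
  // Start with n singleton components {0}, {1}, ..., {n-1}.
  explicit DisjointSets(std::size_t n) : parent_(n), size_(n, 1), components_(n)
  {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }

  // Return the representative of the component containing x (path halving).
  std::size_t Find(std::size_t x)
  {
    assert(x < parent_.size());
    while (parent_[x] != x)
    {
      parent_[x] = parent_[parent_[x]];  // point x at its grandparent
      x = parent_[x];
    }
    return x;
  }

  // Merge the components of x and y (union by size).
  // Returns true if two distinct components were merged.
  bool Union(std::size_t x, std::size_t y)
  {
    x = Find(x);
    y = Find(y);
    if (x == y) return false;
    if (size_[x] < size_[y]) std::swap(x, y);
    parent_[y] = x;            // attach the smaller tree under the larger
    size_[x] += size_[y];
    --components_;
    return true;
  }

  // Current number of components.
  std::size_t Components() const { return components_; }

  // Sizes of all components, in descending order.
  std::vector<std::size_t> ComponentSizes() const
  {
    std::vector<std::size_t> sizes;
    for (std::size_t v = 0; v < parent_.size(); ++v)
      if (parent_[v] == v) sizes.push_back(size_[v]);  // roots carry the size
    std::sort(sizes.begin(), sizes.end(), std::greater<std::size_t>());
    return sizes;
  }

private:
  std::vector<std::size_t> parent_;
  std::vector<std::size_t> size_;
  std::size_t components_;
};
```

A generator would call Union(u, v) each time it adds the edge {u, v}, so the component count and the size breakdown are available at any moment without a separate search of the finished graph.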

Our starting point consists of:

The Partition class that implements the union-find algorithms
The Graph class and associated search/survey algorithms already introduced and in LIB/graph/graph.h
Two example random graph generators that implement the classic cases studied by Erdös and Rényi

Let's abbreviate Erdös-Rényi to ER. ER studied two families of random graphs called G[n,e] and G(n,p) in the first reference below. A member of the G[n,e] family is obtained by starting with n vertices and repeatedly adding edges between randomly drawn vertex pairs until e edges have been added, while ensuring that the graph remains simple (i.e., there are no self-loops and at most one edge between any two vertices). Members of G(n,p) are obtained in a slightly different manner: for each vertex, add an edge to every other vertex with probability p, again ensuring the graph remains simple. The two ER families are very similar when we take

p = 2e/(n(n - 1)).

The formula above is obtained from the observation that the expected degree of a vertex v in G(n,p) is [d] = p(n - 1) and the "degree theorem" which states

Σ_v d(v) = 2e.

Substituting expected values yields

2[e] = Σ_v [d(v)] = Σ_v p(n - 1) = n × p × (n - 1) = pn(n - 1)

(taking [x] to mean the expected value of x). The subtle difference in the way the families are generated is that in G[n,e] there is a single random generator associated with the graph, used to pick the vertex pair at random, whereas in G(n,p) each vertex has its own independent Bernoulli generator. While the two families have very similar properties, the second one is more cumbersome to implement but is also amenable to generalizations in which the individual generators associated with vertices may vary in their properties.
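As a quick numeric check of the correspondence, n = 1000 and e = 500 give p = 2(500)/(1000 × 999) = 1/999, so the expected degree p(n - 1) is exactly 1. The two constructions can be sketched as follows. This is a minimal illustration and not the code behind the course executables: the names Edge, RandomGraphNE, RandomGraphNP, and Degrees are hypothetical, edges are kept in a std::set rather than in the Graph class, and for simplicity the G(n,p) sketch flips one shared coin per vertex pair instead of giving each vertex its own generator.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <utility>
#include <vector>

// An undirected edge {u,v}, stored with first < second. (Hypothetical type.)
using Edge = std::pair<std::size_t, std::size_t>;

// G[n,e] sketch: draw vertex pairs uniformly until e distinct edges exist.
std::set<Edge> RandomGraphNE(std::size_t n, std::size_t e, std::mt19937& rng)
{
  assert(n >= 2 && e <= n * (n - 1) / 2);  // otherwise the loop cannot finish
  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  std::set<Edge> edges;
  while (edges.size() < e)
  {
    std::size_t u = pick(rng), v = pick(rng);
    if (u == v) continue;        // reject self-loops
    if (u > v) std::swap(u, v);  // normalize so {u,v} and {v,u} coincide
    edges.insert({u, v});        // the set ignores a repeated edge
  }
  return edges;
}

// G(n,p) sketch: one independent coin flip for each of the n(n-1)/2 pairs.
std::set<Edge> RandomGraphNP(std::size_t n, double p, std::mt19937& rng)
{
  assert(p >= 0.0 && p <= 1.0);
  std::bernoulli_distribution coin(p);
  std::set<Edge> edges;
  for (std::size_t u = 0; u < n; ++u)
    for (std::size_t v = u + 1; v < n; ++v)  // v > u, so no loops or repeats
      if (coin(rng)) edges.insert({u, v});
  return edges;
}

// Degree of every vertex; the entries sum to twice the number of edges.
std::vector<std::size_t> Degrees(std::size_t n, const std::set<Edge>& edges)
{
  std::vector<std::size_t> degree(n, 0);
  for (const Edge& edge : edges)
  {
    ++degree[edge.first];
    ++degree[edge.second];
  }
  return degree;
}
```

Note the cost difference: the G[n,e] sketch does work proportional to e (plus rejections), while the G(n,p) sketch always examines all n(n - 1)/2 pairs, which becomes expensive for large sparse graphs.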

References

Joel Spencer, The Giant Component - Golden Anniversary, Notices of the AMS 57:6, 720-724 (2010).

Tom Britton, Maria Deijfen, and Anders Martin-Loef, Generating simple random graphs with prescribed degree distribution, Journal of Statistical Physics 124:6, 1377-1397 (2006) [arXiv.org > math > arXiv:1509.06985 23 Sep 2015]

Jure Leskovec, SNAP: The Stanford Network Analysis Project, Stanford University, 2009 - present.

Experiments

Find the following executables in LIB/notes_support:

rangraph_i.x            # G[n,e] family
rangraph_ER_i.x         # G(n,p) family
rangraph_bipartite_i.x  # G[n,p] 
rangraph_geo_i.x        # G(n,p) with geometric distribution of vertex probabilities

Once the code is copied and made executable, each student needs to conduct experiments as follows:

1. Use rangraph and rangraph_ER to tease out the phase change behavior as the expected degree passes the value 1.0. This serves to confirm the classical result of Erdös and Rényi. Please also observe the degree distributions and state what you conclude.
2. Is there an ER-like phase transition in bipartite graphs? If so, what is the critical value of expected degree? Please also observe the degree distributions and state what you conclude.
3. Is there an ER-like phase transition in graphs with geometrically distributed degree sequences? If so, what is the critical value of expected degree? Please also observe the degree distributions and state what you conclude.
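For readers who want to see what such an experiment measures, the following is a rough sketch of one trial for the G[n,e] case only. It is not the code behind rangraph_i.x, and the function name LargestComponentFraction is hypothetical. It assumes the expected degree c is converted to an edge count through e = cn/2 (from the degree theorem), and it tracks the largest component with an inline union-find while the edges are added.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: build one G[n,e] sample with e = c*n/2 edges and
// return (size of largest component) / n.
double LargestComponentFraction(std::size_t n, double c, std::mt19937& rng)
{
  assert(n >= 2 && c >= 0.0 && c <= static_cast<double>(n - 1));
  const std::size_t e = static_cast<std::size_t>(c * n / 2.0);

  // Inline union-find: parent links plus component sizes stored at roots.
  std::vector<std::size_t> parent(n), size(n, 1);
  std::iota(parent.begin(), parent.end(), std::size_t{0});
  auto find = [&parent](std::size_t x)
  {
    while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  };

  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  std::set<std::uint64_t> seen;  // keys u*n+v of edges already added
  std::size_t largest = 1;
  while (seen.size() < e)
  {
    std::size_t u = pick(rng), v = pick(rng);
    if (u == v) continue;        // keep the graph simple: no self-loops
    if (u > v) std::swap(u, v);
    if (!seen.insert(static_cast<std::uint64_t>(u) * n + v).second)
      continue;                  // keep the graph simple: no repeated edges
    std::size_t ru = find(u), rv = find(v);
    if (ru == rv) continue;      // edge lies inside one component
    if (size[ru] < size[rv]) std::swap(ru, rv);
    parent[rv] = ru;             // union by size
    size[ru] += size[rv];
    largest = std::max(largest, size[ru]);
  }
  return static_cast<double>(largest) / static_cast<double>(n);
}
```

Calling this for a range of c values on either side of 1.0, with a fixed n and several seeds per value, and tabulating the results is one way to look for the transition. A single trial is noisy near the critical value, so averaging over repeated trials is advisable.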

Footnote 1

Paul Erdös is the mathematician of the so-called "Erdös Number", which is the smallest number of co-authorships connecting a published mathematician to Erdös. (This is the math-nerd analog of the Kevin Bacon number which is famous in movie-nerd circles, defined as the smallest number of co-actors connecting an actor back to Kevin Bacon.) Lacher has Erdös number 3:

Kuratowski, K.; Lacher, R.C. (1969), "A theorem on the space of monotone mappings", Bull. Pol. Acad. Sci. 12 (1969) 797--800.
Kuratowski, K.; Ulam, St. (1932), "Quelques propriétés topologiques du produit combinatoire", Fundamenta Mathematicae, Institute of Mathematics Polish Academy of Sciences, 19 (1): 247--251.
Erdös, P.; Ulam, St. (1968), "On equations with sets as unknowns", Proceedings of the National Academy of Sciences of the United States of America 60: 1189--1195.

Erdös, Kuratowski, and Ulam are each incredibly famous. Kuratowski is one of the founders of both set-theoretic topology and graph theory - the notations K(n) for the complete graph on n vertices and K(p,q) for the complete bipartite graph use "K" in his honor. He was also Ulam's major professor. Among many other things, Ulam discovered the way to calculate how to start a mass particle chain reaction. This work, as part of the Manhattan Project, led to hydrogen fusion, hydrogen bombs, the threat of global destruction, and ultimately the end of the Cold War. Without Ulam's discovery, we could well be living in a Stalinist dictatorship.

Lacher feels thrilled and humbled to appear in the list above.