Project: Random Graphs

Generating and analyzing random graphs using Partition

Version 11/15/17

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Operational Objectives: Implement the ComponentSizeDistribution function defined on Partition objects. Write programs that generate random graphs and analyze their component and degree distributions, using various structural constraints on the graphs. Use the software to detect giant component phase transitions.

Deliverables:

partition_util.h       # contains void ComponentSizeDistribution ( ... )
rangraph_bipartite.cpp # generates and analyzes random bipartite graphs
rangraph_geo.cpp       # generates and analyzes random graphs with geometrically distributed vertex degrees

makefile               # builds all project object code and executables
manual.txt             # operating instructions for software [team document]
report.txt             # overview of team and project [team document]
log.txt                # personal log for team member [individual document]

Background

The study of random graphs began in 1960 with the publication of a remarkable paper by Paul Erdös and Alfréd Rényi that elucidated their discovery of a phase transition in the number of components of a random graph as the expected vertex degree passes through the value 1.0. This result was astonishing and unexpected. [See Footnote 1.]

It is not possible to convey the breadth, depth, and importance of the study of large-scale graphs in a few paragraphs. An entire book would be needed just for a complete bibliography on the subject. Nevertheless some intuition can be obtained by thinking about the following:

We are going to dip our pinky toes into this research area by writing some random graph generators with certain properties, and then we will analyze some of their features. One of the things we will want to do is keep up with the component structure of the graphs as we generate them, because that is so much more tractable than doing component analysis after the graph is created. For this we need the union-find algorithm and a special analytic tool that gives the component size breakdown in descending order by size. We also want to analyze the degree sequence structure of these graphs.

Our starting point consists of:

  1. The Partition class that implements the union-find algorithms
  2. The Graph class and associated search/survey algorithms already studied in the previous project
  3. Two example random graph generators that implement the classic cases studied by Erdös and Rényi.

Let's abbreviate Erdös-Rényi to ER. ER studied two families of random graphs called G[n,e] and G(n,p) in the first reference below. A member of the G[n,e] family is obtained by starting with n vertices and repeatedly adding edges between randomly drawn vertex pairs until e edges have been added, while ensuring that the graph remains simple (i.e., there are no self-loops and at most one edge between any two vertices). Members of G(n,p) are obtained in a slightly different manner: for each vertex, add an edge to every other vertex with probability p, again ensuring the graph remains simple. The two ER families are very similar when we take

p = 2e/(n(n - 1)).

The formula above is obtained from the observation that the expected degree of a vertex v in G(n,p) is [d] = p(n - 1) and the "degree theorem" which states

Σ_v d(v) = 2e.

Substituting expected values yields

2[e] = Σ_v [d(v)] = Σ_v p(n - 1) = n × p(n - 1) = pn(n - 1)

(taking [x] to mean the expected value of x). The subtle difference in the way the families are generated is that in G[n,e] there is a single random generator associated with the graph, used to pick the vertex pair at random, whereas in G(n,p) each vertex has its own independent Bernoulli generator. While the two families have very similar properties, the second is more cumbersome to implement but is also amenable to generalizations in which the individual generators associated with vertices vary in their properties.
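
The two constructions can be sketched as follows. This is a minimal illustration, not the distributed rangraph.cpp or rangraph_ER.cpp: the names SampleGne, SampleGnp, MatchingP, and Edge are hypothetical, and a single std::mt19937 engine performs all of the independent Bernoulli trials in the G(n,p) case, which is assumed here to be statistically equivalent to giving each vertex its own generator.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;  // stored with first < second

// G[n,e]: one generator for the whole graph draws vertex pairs until
// e distinct non-loop edges have been accepted.
// Precondition: n >= 2 and e <= n(n-1)/2, otherwise the loop cannot finish.
std::vector<Edge> SampleGne(std::size_t n, std::size_t e, std::mt19937& rng)
{
  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  std::set<Edge> seen;                  // rejects parallel edges
  while (seen.size() < e)
  {
    std::size_t x = pick(rng), y = pick(rng);
    if (x == y) continue;               // rejects self-loops
    if (x > y) std::swap(x, y);
    seen.insert({x, y});                // no effect if already present
  }
  return std::vector<Edge>(seen.begin(), seen.end());
}

// G(n,p): an independent Bernoulli trial for each unordered pair {x,y}.
// Visiting each pair once (x < y) keeps the graph simple by construction.
std::vector<Edge> SampleGnp(std::size_t n, double p, std::mt19937& rng)
{
  std::bernoulli_distribution coin(p);
  std::vector<Edge> edges;
  for (std::size_t x = 0; x < n; ++x)
    for (std::size_t y = x + 1; y < n; ++y)
      if (coin(rng)) edges.push_back({x, y});
  return edges;
}

// The matching probability p = 2e/(n(n - 1)) from the text.
double MatchingP(std::size_t n, std::size_t e)
{
  return 2.0 * static_cast<double>(e)
       / (static_cast<double>(n) * static_cast<double>(n - 1));
}
```

With p = MatchingP(n, e), SampleGne returns exactly e edges while SampleGnp returns e edges only on average. Note that the G(n,p) sketch examines all n(n - 1)/2 pairs, so it is quadratic in n.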

References

Joel Spencer, The Giant Component - Golden Anniversary, Notices of the AMS 57:6, 720-724 (2010).

Tom Britton, Maria Deijfen, and Anders Martin-Loef, Generating simple random graphs with prescribed degree distribution, Journal of Statistical Physics 124:6, 1377-1397 (2006) [arXiv.org > math > arXiv:1509.06985 23 Sep 2015]

Jure Leskovec, SNAP: The Stanford Network Analysis Project, Stanford University, 2009 - present.

Procedural Requirements

  1. The official development, testing, and assessment environment is g++ -std=c++11 -Wall -Wextra on the linprog machines. Code should compile without error or warning.

  2. Maintain your work log in the text file log.txt as documentation of effort, testing results, and development history. This file may also be used to report on any relevant issues encountered during project development.

  3. Begin by copying all files from LIB/proj8RG into your proj8RG directory. Familiarize yourself with the code in all of these files, in conjunction with reading the lecture notes.

    In addition you will want to copy the following executables:

    LIB/area51/rangraph.x
    LIB/area51/rangraph_ER.x
    LIB/area51/rangraph_BP.x
    LIB/area51/rangraph_geo.x
    LIB/area51/fpartition1.x
    LIB/area51/fpartition2.x
    

    After completing the project, you should be able to create these using the distributed makefile. Run all of the executables; they are important aids to understanding the project.

  4. Create the file partition_util.h by copying the "stub" version and completing the implementation of ComponentSizeDistribution(). Test your function using the supplied fpartition1.cpp and fpartition2.cpp.

  5. Create the files rangraph_bipartite.cpp and rangraph_geo.cpp. Test thoroughly and complete the experimental investigation discussed. Be sure to put your results in log.txt.

  6. When logged in to shell or quake, submit the project by executing "submit.sh deliverables.sh". Read the screen and watch for processing errors.

    Warning: The submit process does not work on the program and linprog servers. Use shell or quake to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications

  1. The stand-alone function template

    template < class P >
    void ComponentSizeDistribution ( const P& p , size_t maxToDisplay , std::ostream& os = std::cout )
    

    takes three arguments:

    1. const P& p is the Partition object under analysis, passed by const reference.
    2. size_t maxToDisplay is the number of component sizes to display.
    3. std::ostream& os is the output stream through which to display.

    The items to display are the sizes of the components of the partition object, ranked in descending order by size. For example, suppose there are 25 components, component 3 has size 4, component 5 has size 6, component 22 has size 3, and all other components have size 1. Then the display would be:

    rank      size
    ----      ----
    1         6
    2         4
    3         3
    4         1
    *         1 (the remaining 21 components have size 1)
    

    The display may be cut short by the "maxToDisplay" argument. All of the display boilerplate is supplied in the stub file. You only have to come up with the algorithm to calculate the distribution.

  2. The random graph generator rangraph_bipartite.cpp should generate random bipartite graphs with the inputs (1) name of file to store graph, (2) number of red vertices, (3) number of blue vertices, and (4) number of edges. An optional fifth argument determines the length of the tail of the component distribution to display. A good starting point for this code is rangraph.cpp which generates graphs in the family G[n,e]. These generators are simple enough that they can get by with the random number generator in fsu::xran.

  3. The random graph generator rangraph_geo.cpp should generate random graphs with expected vertex degrees geometrically distributed. The inputs are (1) name of file to store graph, (2) the number of vertices, and (3) the expected vertex degree. An optional fourth argument determines the length of the tail of the component distribution to display. A good starting point for this code is rangraph_ER.cpp which generates graphs in the family G(n,p). The generators rangraph_ER and rangraph_geo require the use of the C++ <random> library.

    The generator rangraph_ER can be thought of as having a Bernoulli generator with probability p = d/(n - 1) at each vertex. In rangraph_geo, the probabilities associated with the Bernoulli generators at the vertices must instead be distributed geometrically over the vertices, with mean d/(n - 1).
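
The ranking of component sizes described in item 1 can be sketched as below. This is one possible approach under stated assumptions, not the stub's required algorithm: ToyPartition is a hypothetical stand-in for the course Partition class, the member names Size, Find, and Union are assumed, and RankedComponentSizes is a hypothetical helper that returns the sizes rather than displaying them.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical minimal union-find; stands in for the course Partition class.
class ToyPartition
{
public:
  explicit ToyPartition(std::size_t n) : parent_(n)
  {
    std::iota(parent_.begin(), parent_.end(), std::size_t(0));
  }
  std::size_t Size() const { return parent_.size(); }
  std::size_t Find(std::size_t x) const      // no path compression, so const
  {
    while (parent_[x] != x) x = parent_[x];
    return x;
  }
  void Union(std::size_t x, std::size_t y) { parent_[Find(x)] = Find(y); }
private:
  std::vector<std::size_t> parent_;
};

// Returns the component sizes in descending order.
template <class P>
std::vector<std::size_t> RankedComponentSizes(const P& p)
{
  std::vector<std::size_t> count(p.Size(), 0);
  for (std::size_t x = 0; x < p.Size(); ++x)
    ++count[p.Find(x)];                       // tally elements per root
  std::vector<std::size_t> sizes;
  for (std::size_t c : count)
    if (c > 0) sizes.push_back(c);            // only roots have nonzero tallies
  std::sort(sizes.begin(), sizes.end(), std::greater<std::size_t>());
  return sizes;
}
```

For the 25-component example in item 1, the returned sequence would begin 6, 4, 3 and continue with 22 ones. A display routine would then print the first maxToDisplay entries with their ranks.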

Experiment

Once the code is written and working correctly, the team needs to conduct experiments as follows:

  1. Use rangraph and rangraph_ER to tease out the phase change behavior as the expected degree passes the value 1.0. This serves to confirm the classical result of Erdös and Rényi.

    Please also observe the degree distributions and conclude ??

  2. Is there an ER-like phase transition in bipartite graphs? If so, what is the critical value of expected degree?

    Please also observe the degree distributions and conclude ??

  3. Is there an ER-like phase transition in graphs with geometrically distributed degree sequences? If so, what is the critical value of expected degree?

    Please also observe the degree distributions and conclude ??

Discuss all experimental results in report.txt. Succinctly please! But backed up with computational results.
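
As an illustration of the kind of measurement involved in the first experiment, the sketch below estimates the fraction of vertices in the largest component of a G[n,e]-style graph with expected degree d, taking e = dn/2. It is an independent sketch and not a substitute for rangraph.x: the function name LargestComponentFraction is hypothetical, it uses its own inline union-find rather than the Partition class, and as a simplification it does not reject parallel edges.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Fraction of vertices in the largest component after adding e = d*n/2
// random edges to n isolated vertices. Precondition: n >= 2.
// Simplification: parallel edges are not rejected; they are rare when the
// graph is sparse and never change the component structure.
double LargestComponentFraction(std::size_t n, double d, std::mt19937& rng)
{
  std::vector<std::size_t> parent(n), size(n, 1);
  std::iota(parent.begin(), parent.end(), std::size_t(0));
  auto find = [&parent](std::size_t x) -> std::size_t
  {
    while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];   // path halving
      x = parent[x];
    }
    return x;
  };

  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  const std::size_t e =
      static_cast<std::size_t>(d * static_cast<double>(n) / 2.0);
  std::size_t largest = 1;
  for (std::size_t added = 0; added < e; )
  {
    std::size_t x = pick(rng), y = pick(rng);
    if (x == y) continue;              // skip self-loops
    ++added;
    std::size_t rx = find(x), ry = find(y);
    if (rx == ry) continue;            // already in the same component
    if (size[rx] < size[ry]) std::swap(rx, ry);
    parent[ry] = rx;                   // union by size
    size[rx] += size[ry];
    if (size[rx] > largest) largest = size[rx];
  }
  return static_cast<double>(largest) / static_cast<double>(n);
}
```

Calling this for a range of d values on either side of 1.0, with n held fixed, gives a table of largest-component fractions that can be compared against the output of the supplied executables.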

Hints

Footnote 1

Paul Erdös is the mathematician of the so-called "Erdös Number", which is the smallest number of co-authorships connecting a published mathematician to Erdös. (This is the math-nerd analog of the Kevin Bacon number which is famous in movie-nerd circles, defined as the smallest number of co-actors connecting an actor back to Kevin Bacon.) Lacher has Erdös number 3:

  1. Kuratowski, K.; Lacher, R.C. (1969), "A theorem on the space of monotone mappings", Bull. Pol. Acad. Sci. 12 (1969) 797--800.
  2. Kuratowski, K.; Ulam, St. (1932), "Quelques propriétés topologiques du produit combinatoire", Fundamenta Mathematicae, Institute of Mathematics Polish Academy of Sciences, 19 (1): 247--251.
  3. Erdös, P; Ulam, St. (1968), "On equations with sets as unknowns", Proceedings of the National Academy of Sciences of the United States of America 60: 1189-95.

Erdös, Kuratowski, and Ulam are each incredibly famous. Kuratowski is one of the founders of both set-theoretic topology and graph theory - the notations K(n) for the complete graph on n vertices and K(p,q) for the complete bipartite graph use "K" in his honor. He was also Ulam's major professor. Among many other things, Ulam discovered the way to calculate how to start a mass particle chain reaction. This work as part of the Manhattan Project led to hydrogen fusion, hydrogen bombs, the threat of global destruction, and ultimately the end of the Cold War. Without Ulam's discovery, we could well be living in a Stalinist dictatorship.