Project 1:  Analyzing Internet Network Connectivity

Due: 9/18/2025 (unofficial deadline for part I: 9/9/2025)


Educational Objectives:  Refresh C/C++ programming skills, practice text processing techniques, and gain experience using a makefile to organize and compile applications.

Statement of Work: Implement two programs to extract required information and analyze the properties of data in a data file. You may use any C++ features including STL containers and algorithms.

Project Description:

In this project you need to write two programs to collect and report some properties of the data contained in a text file. More specifically, this data file is a BGP routing table trace file. Note that it is not necessary for you to understand how BGP or the Internet works (of course, you are more than welcome to spend time and effort to learn them). Indeed, the only thing you really need to understand is the format of the text file in order for you to extract the required information and analyze some properties for this project. Some background on BGP and its routing table is provided below.

Background on BGP and Its Routing Table

The Internet is a collection of a large number of networks, or autonomous systems (ASes). Each AS owns a number of network prefixes (ranges of network addresses). Border Gateway Protocol (BGP) is the inter-domain routing protocol that ASes use on the Internet to exchange reachability information about one another (or rather, about their network prefixes). Each AS has a unique AS number. For example, the AS number of the FSU campus network is 2553, and the network prefix owned by the FSU campus network is 128.186.0.0/16. By considering each AS as a node, and the connection between neighboring ASes as a link/edge, we can consider the Internet as an AS-level graph. This AS-level graph can be obtained from the BGP routing table, the format of which is explained below.

The BGP routing table data trace is provided in the following file (compressed using bzip2).

Each line in the file contains one record of BGP routing information and follows a fixed format. The fields in a line are separated by a vertical bar (|). Two fields in a line, the 6th and the 7th, are of particular interest for Internet inter-domain routing. The 6th field is the network prefix that this record is about. The 7th field is the so-called ASPATH, which indicates the sequence of ASes that packets need to traverse to reach the corresponding destination network prefix. In this project, you only need to work on the 7th field (the ASPATH information). For example, the following line is taken from the data trace file.

TABLE_DUMP|1130191746|B|144.228.241.81|1239|128.186.0.0/16|1239 2914 174 11096 2553|IGP|144.228.241.81|0|-2|1239:321 1239:1000 1239:1011|NAG||

The 6th field is 128.186.0.0/16 (FSU campus network prefix), and the 7th field is 1239 2914 174 11096 2553 (ASPATH). The ASPATH field states that the rightmost AS (2553, FSU campus network) originates (i.e., owns) the corresponding network prefix (128.186.0.0/16), and that on the way from the leftmost AS (1239) to the rightmost AS (2553), a packet needs to traverse the intermediate ASes (2914 174 11096). The ASPATH information exposes the neighboring ASes, i.e., adjacent ASes in the ASPATH are neighbors on the Internet. For example, 1239 and 2914 are neighbors of each other, 2914 and 174 are neighbors of each other, and so are 174 and 11096, and 11096 and 2553.  In summary, based on the 7th field of this line, AS 1239 has one neighbor (2914); AS 2914 has two neighbors (1239 and 174); AS 174 has two neighbors (2914 and 11096); AS 11096 has two neighbors (174 and 2553); and AS 2553 has one neighbor (11096).
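As an illustration of the field layout described above, the following is a minimal sketch of one way to pull a single field out of a '|'-separated line. The function name nth_field is hypothetical and chosen only for this example; it is not part of any provided code, and other approaches (for example std::getline with '|' as the delimiter) would work equally well.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Return the n-th '|'-separated field of a line (fields are counted from 1).
// Returns an empty string if the line has fewer than n fields.
// NOTE: nth_field is a hypothetical helper name used only in this sketch.
std::string nth_field(std::string_view line, int n) {
    std::size_t start = 0;
    // Skip over the first n-1 separators.
    for (int i = 1; i < n; ++i) {
        std::size_t bar = line.find('|', start);
        if (bar == std::string_view::npos) return "";
        start = bar + 1;
    }
    // The field ends at the next separator, or at the end of the line.
    std::size_t end = line.find('|', start);
    if (end == std::string_view::npos) end = line.size();
    return std::string(line.substr(start, end - start));
}
```

Calling nth_field(line, 7) on the example line above would yield the ASPATH string, and nth_field(line, 6) would yield the network prefix.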

Unfortunately, an ASPATH may contain two special cases that you need to pay attention to. First, some ASPATHs contain a so-called ASSET, a set of ASes whose exact relationship we do not know. An ASSET is indicated by a pair of square brackets ([]). For example, the following is an ASPATH that contains an ASSET:

1239 1668 10796 [11060 12262]

which was taken from the following line in the data trace:

TABLE_DUMP|1130191716|B|144.228.241.81|1239|24.223.128.0/17|1239 1668 10796 [11060 12262]|IGP|144.228.241.81|0|-2|1239:321 1239:1000 1239:1006|AG|24.95.80.203|

In this example, 11060 and 12262 are in an ASSET. Fortunately, every ASSET occurs at the end (right side) of the ASPATH. In this project, when you analyze the neighboring ASes for each AS, you ignore the ASSET part of the ASPATH, but you still need to use the rest of the ASPATH. In the above example, you still need to use the portion of the 7th field that remains after removing the ASSET; that is, 1239 1668 10796 should be included in your analysis of the AS-level graph.
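The ASSET rule above can be sketched as follows. This is only one possible approach, and it relies on the statement that an ASSET always sits at the right end of the ASPATH; the function name strip_asset is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Remove a trailing ASSET ("[ ... ]") from an ASPATH string.
// Assumes, as stated in the text, that an ASSET only occurs at the right end,
// so everything from the first '[' onward can be dropped.
// NOTE: strip_asset is a hypothetical helper name used only in this sketch.
std::string strip_asset(std::string aspath) {
    std::size_t bracket = aspath.find('[');
    if (bracket != std::string::npos) aspath.erase(bracket);
    // Trim the spaces left behind in front of the removed ASSET.
    std::size_t last = aspath.find_last_not_of(' ');
    aspath.erase(last == std::string::npos ? 0 : last + 1);
    return aspath;
}
```

For the example above, strip_asset("1239 1668 10796 [11060 12262]") returns "1239 1668 10796"; a path without an ASSET is returned unchanged.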

Second, some AS numbers may appear multiple times in an ASPATH. For example, in the following ASPATH, AS number 7911 appears three times and 30033 appears twice:

1239 7911 7911 7911 30033 30033

which is taken from the following line in the data trace:

TABLE_DUMP|1130191714|B|144.228.241.81|1239|8.3.43.0/24|1239 7911 7911 7911 30033 30033|IGP|144.228.241.81|0|-2|1239:123 1239:1000 1239:1011|NAG||

Fortunately, if an AS number appears multiple times, its occurrences are consecutive (next to each other). When you analyze the data file, note that an AS is not a neighbor of itself. That is, you should ignore duplicate AS numbers in the ASPATH of a line.
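Because repeated AS numbers are always adjacent, collapsing them is a natural fit for std::unique, which removes consecutive equal elements. The following is a minimal sketch under that assumption; the function name dedup_aspath is hypothetical, and the AS numbers are kept as strings here purely for simplicity.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split an ASPATH on whitespace and collapse runs of identical AS numbers.
// Assumes, as stated in the text, that repeats of an AS number are adjacent.
// NOTE: dedup_aspath is a hypothetical helper name used only in this sketch.
std::vector<std::string> dedup_aspath(const std::string& aspath) {
    std::istringstream in(aspath);
    std::vector<std::string> ases;
    for (std::string as; in >> as; ) ases.push_back(as);
    // std::unique shifts the first element of each run forward and returns
    // the new logical end; erase() then drops the leftover tail.
    ases.erase(std::unique(ases.begin(), ases.end()), ases.end());
    return ases;
}
```

For the example above, dedup_aspath("1239 7911 7911 7911 30033 30033") yields the three elements 1239, 7911, and 30033.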

You need to handle both of these special cases when processing the data trace.

Project Requirement:

This project has two parts: you need to develop two programs that analyze BGP routing tables to obtain Internet connectivity information. More specifically, you need to determine the number of neighboring ASes for each AS.

Part I: First program (named as proj1_p1.cpp)

In Part I of the project, you will write a program to parse a given BGP routing table and extract the 7th field of each line. Note that you should handle the two special cases discussed above when you extract the 7th field: you should remove the ASSET, and you should remove duplicate AS numbers, if they appear in the 7th field. This program reads from standard input (the BGP data file will be redirected as the standard input to the program) and writes to standard output. Each output line contains the extracted 7th field of the corresponding line from the data file (which may have been processed to remove an ASSET and duplicate AS numbers, if they appear). An example run is given below:

$ proj1_p1.x < rib.20051024.2208_144.228.241.81 > aspath.rib.20051024.2208_144.228.241.81

which redirects BGP data file rib.20051024.2208_144.228.241.81 as the standard input to program proj1_p1.x, and redirects the standard output to aspath.rib.20051024.2208_144.228.241.81. Again, you should note that proj1_p1.x only handles standard input and standard output.

Part II: Second program (named as proj1_p2.cpp)

In Part II of the project, you will write a program to analyze the extracted (and cleaned) ASPATH information to determine the number of neighbors each AS has. Similarly, this program will read from standard input (the extracted ASPATH file from Part I of the project will be redirected as the standard input to the program) and write to standard output. The program will read the content of the extracted ASPATH file line by line. As it reads in one line, it will update the neighbor information of each AS appearing on the line. Note that a neighbor of an AS may appear multiple times on different lines. However, it should be counted as one neighbor. For example, 7911 may appear next to 1239 (that is, it is a neighbor of 1239) on multiple lines; however, 7911 is just one neighbor of 1239 no matter how many times it appears next to 1239. By the same token, 1239 is just one neighbor of 7911. After processing all the lines in the ASPATH file, the program has the neighbor information of all the ASes in the ASPATH file. At the end of the program, it will display on standard output the top 10 ASes in terms of the number of neighbors they have. The top 10 ASes should be printed in descending (non-increasing) order of the number of neighbors they have. If two ASes have the same number of neighbors, the AS with the smaller AS number (numerically) is treated as having more neighbors and is printed first. Let k denote the total number of ASes in the data file. If k < 10, the program should just display the number of neighbors for these k ASes (instead of the top 10 ASes) in the specified order.
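The requirement that a neighbor be counted only once, no matter how many lines it appears on, suggests storing each AS's neighbors in a set. The following is a minimal sketch of that idea, not a required design: the names NeighborMap and add_path are hypothetical, and the choice to record an AS that appears alone on a line (with zero neighbors) is an assumption of this sketch that the text above does not settle.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Maps an AS number to the set of its distinct neighbors.
// NOTE: NeighborMap and add_path are hypothetical names used only here.
using NeighborMap = std::map<unsigned long, std::set<unsigned long>>;

// Update the neighbor sets from one cleaned ASPATH line.
void add_path(NeighborMap& neighbors, const std::string& line) {
    std::istringstream in(line);
    unsigned long prev = 0, cur = 0;
    if (!(in >> prev)) return;  // empty line: nothing to record
    neighbors[prev];            // assumption: a lone AS still gets an entry
    while (in >> cur) {
        if (cur != prev) {      // defensive: an AS is not its own neighbor
            // std::set ignores repeated inserts, so a pair seen on many
            // lines is still counted once in each direction.
            neighbors[prev].insert(cur);
            neighbors[cur].insert(prev);
        }
        prev = cur;
    }
}
```

After add_path has been called for every input line, neighbors[as].size() gives the number of neighbors of a given AS.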

Each output line contains the AS number of a top 10 AS, followed by a colon, a white space, and then the number of neighbors of this AS. The following is an example output line.

701: 1780

which shows that ASN 701 has 1780 neighbors.
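The ordering and output rules can be sketched as below: sort by neighbor count in descending order, break ties by the smaller AS number, and keep at most 10 entries. The names Entry, top_ases, and format_entry are hypothetical and used only for illustration; the output format follows the example line 701: 1780.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (AS number, number of neighbors).
// NOTE: Entry, top_ases, and format_entry are hypothetical names.
using Entry = std::pair<unsigned long, std::size_t>;

// Return at most `limit` entries, ordered by neighbor count (descending),
// with ties broken by the numerically smaller AS number.
std::vector<Entry> top_ases(std::vector<Entry> entries, std::size_t limit = 10) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    // If there are fewer than `limit` ASes, all of them are kept.
    if (entries.size() > limit) entries.resize(limit);
    return entries;
}

// Format one output line as "<AS number>: <neighbor count>".
std::string format_entry(const Entry& e) {
    return std::to_string(e.first) + ": " + std::to_string(e.second);
}
```

Each returned entry can then be written to standard output on its own line using format_entry.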

Example run:

$ proj1_p2.x < aspath.rib.20051024.2208_144.228.241.81 > aspath.top10_as.txt

which redirects the ASPATH file aspath.rib.20051024.2208_144.228.241.81 (obtained from running proj1_p1.x) as the standard input to proj1_p2.x, and redirects the standard output of proj1_p2.x to the file aspath.top10_as.txt.

Provided files

Example runs to create the above output files:

$ proj1_p1.x < rib.20051024.2208_144.228.241.81 > aspath.rib.20051024.2208_144.228.241.81

$ proj1_p1.x < rib.20051024.2208_144.228.241.81_10000 > aspath.rib.20051024.2208_144.228.241.81_10000

$ proj1_p2.x < aspath.rib.20051024.2208_144.228.241.81 > aspath.top10_as.txt

$ proj1_p2.x < aspath.rib.20051024.2208_144.228.241.81_10000 > aspath.top10_as_10000.txt

Deliverables: For both parts of the project, please turn in all the source code files and the makefile in a single tar file online via the Canvas system. Note: You should not include any test data files, including the two BGP data traces provided by us.

To ensure that you have plenty of time to complete both parts of the project, we suggest an unofficial deadline of 9/9/2025 for part I. You do not need to submit part I by this date, but you should try to finish it by then.

Some helpful information