Project: Implementation of TCP/IP Protocol Stack in an Emulated Network Environment


 

In this project, you are asked to implement the TCP/IP protocol stack in an emulated network environment. In the following, we will first highlight the computer network model used in this project, and then we describe in detail how the components of the computer network should be implemented in the emulation environment. 
  1. Network model overview
  2. Network emulation
  3. Network components
  4. Phase 1 and 2 requirements
  5. A simplified version of TCP
  6. A simplified version of OSPF
  7. Provided code of the project
  8. Some guidelines on the project report
  9. Grading policy

Section I: Network model overview 

A LAN is (typically) a broadcast network, where stations (or routers) are attached to a common shared physical medium and a transmission from any one station is broadcast to and received by all other stations.

We consider a star-wired LAN topology where each station is directly connected to a common central node, called a transparent bridge, or bridge for short (or switch). Each station is connected to the bridge through one of its ports, so a bridge with n ports can connect n stations in total. When a bridge is first plugged in, it repeats each packet (Ethernet frame) it receives from one station (via one of its ports) to all the other stations attached to it, i.e., it retransmits the packet through all the ports except the one on which the packet arrived. This enables every station to hear the transmission from any other station connected to the bridge. Through self-learning, the bridge gradually learns the location of the stations on the LAN and directs each packet only to the necessary segment of the LAN.
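
The self-learning behavior described above can be sketched as a small table keyed by source address. This is only an illustrative model, not the required design: the class name LearningBridge, the integer port numbers, and the 60-second entry lifetime are assumptions made for the example.

```python
import time


class LearningBridge:
    """Toy model of a self-learning bridge; ports are plain integers."""

    def __init__(self, num_ports, ttl=60.0):
        self.ports = list(range(num_ports))
        self.ttl = ttl      # assumed lifetime of a learned entry, in seconds
        self.table = {}     # MAC address -> (port, time last seen)

    def handle_frame(self, src_mac, dst_mac, in_port, now=None):
        """Learn the sender's port and return the list of output ports."""
        now = time.monotonic() if now is None else now
        # Self-learning: the sender is reachable through the arrival port.
        self.table[src_mac] = (in_port, now)

        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] <= self.ttl:
            out_port = entry[0]
            # Destination sits behind the arrival port: nothing to forward.
            return [] if out_port == in_port else [out_port]

        # Unknown or expired destination: repeat on every other port.
        self.table.pop(dst_mac, None)
        return [p for p in self.ports if p != in_port]
```

In a real bridge the returned port numbers would be mapped to the TCP connections of the attached stations and routers.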

IP routers interconnect LANs to form a virtual IP network. Any pair of stations in this virtual network can communicate with each other using the Internet Protocols. Stations and routers (or more precisely, the interfaces through which they are attached to the internetwork) are assigned globally unique IP addresses. IP uses a table-driven, hop-by-hop datagram delivery model. An IP header with the appropriate source and destination IP addresses is attached to each datagram. Routers (and, for the very first hop, stations) are responsible for delivering datagrams to their next hops towards the destination. This hop-by-hop packet forwarding is based on a forwarding (routing) table. Each router in the internetwork runs a routing protocol to build its own local forwarding table.
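
As a rough illustration of table-driven forwarding, the sketch below looks up the next hop for a destination address. The function names, the tuple layout (destination, next hop, mask, interface), and the rule of preferring the entry with the longest mask are assumptions made for this example; the handout does not prescribe them.

```python
import ipaddress


def to_int(addr):
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


def lookup_route(routes, dst_ip):
    """Return (next_hop, interface) for dst_ip, or None if nothing matches.

    routes is a list of (destination, next_hop, mask, interface) tuples.
    Among matching entries the one with the longest mask wins (assumption).
    """
    dst = to_int(dst_ip)
    best = None
    for dest, next_hop, mask, iface in routes:
        if (dst & to_int(mask)) == to_int(dest):
            if best is None or to_int(mask) > to_int(best[2]):
                best = (dest, next_hop, mask, iface)
    return None if best is None else (best[1], best[3])
```

For example, with the hypothetical entries ("10.0.1.0", "0.0.0.0", "255.255.255.0", "eth0") and ("0.0.0.0", "10.0.2.2", "0.0.0.0", "eth1"), a datagram for 10.0.1.7 matches the first entry and any other destination falls through to the second.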

A station can transmit a packet to another station. Before it transmits the packet, it needs to add a packet header. The two most important fields in the header are the source and destination addresses. A station should also accept a packet sent by another station that is addressed to it, while discarding any packet that it receives but that is not addressed to it.

Figure 1 presents a simple IP network, where we have three LANs, realized through three bridges, Bridge 1, 2, 3, respectively. Router 1 interconnects LAN 1 and LAN 2, while Router 2 interconnects LAN 2 and LAN 3. On LAN 1, we have two stations, Station A and Station B. LAN 2 and LAN 3 each have a single station, Station C and Station D, respectively. In this configuration, Station A and Station B can communicate with each other directly through Bridge 1. When Station A sends a message to Station C, Router 1 will forward this message from LAN1 (Bridge 1) to LAN2 (Bridge 2) by consulting its routing table. Similarly, when Station A tries to send a message to Station D, Router 1 will forward it from LAN1 to LAN2, and Router 2 will forward it from LAN2 to LAN3 based on the packet destination IP address, i.e., the IP address of Station D.



Figure 1. A simple IP network.

Section II: Computer network emulation

In order to emulate the physical links of a LAN, we will make use of Unix BSD sockets. Moreover, all the stations, bridges, and routers are implemented in software. That is, our computer network is completely software based.

As you know, the paradigm used in socket programming is the client-server model. In our network emulation model, a bridge acts as a server and the stations (routers) attached to it act as its clients, and the physical link between the bridge and a station is emulated by a connection-oriented TCP socket connection. Note that you always have to run a bridge first before running a station (why?). A client has to be aware of the whereabouts of the server in any client-server model (a common approach is to have the server use a well-known port). Here we make lan-name one of the command line arguments of the bridge, so that stations and routers on the LAN know where to look for the bridge information. There can be only one bridge for a given lan-name. Once a bridge is ready to listen, it stores its IP address and port by creating two files (or symbolic links) in its directory, namely .lan-name.addr and .lan-name.port. We assume that both bridge and station are invoked from the same directory, which is shared across all the machines in the emulation; this is just one way of sharing information between a server and a client. Given a lan-name, a station knows the complete address of the bridge by reading these two files.
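
The rendezvous through the two files can be sketched as follows. This is a minimal example that assumes regular text files holding the address and the port number, one value per file; the function names and the file contents are illustrative only, and symbolic links would need a different encoding.

```python
import os


def publish_bridge_address(directory, lan_name, ip, port):
    """Bridge side: record where the bridge listens (assumed text format)."""
    with open(os.path.join(directory, f".{lan_name}.addr"), "w") as f:
        f.write(ip + "\n")
    with open(os.path.join(directory, f".{lan_name}.port"), "w") as f:
        f.write(str(port) + "\n")


def lookup_bridge_address(directory, lan_name):
    """Station or router side: read back (ip, port) for the given lan-name."""
    with open(os.path.join(directory, f".{lan_name}.addr")) as f:
        ip = f.read().strip()
    with open(os.path.join(directory, f".{lan_name}.port")) as f:
        port = int(f.read().strip())
    return ip, port
```

A station could then pass the returned pair to socket.create_connection to open the TCP connection that emulates its link to the bridge.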

Section III: Network components

As discussed above, our network model consists of three components namely bridge, station and router. The program interface and the basic functionalities for each of these components are described below. 
Bridge
Station
Router

NOTE:    In the first phase, a router gets the routing table from a file, i.e., the routing table is populated manually. In the second phase, a router needs to run a routing protocol to learn the network topology, compute the shortest paths to each destination network, and fill the routing table accordingly.

 

Section IV : Phase 1 and 2 Requirements

Phase 1

            In the first phase of the project, you are asked to implement transparent bridges, ARP, IP, and UDP.

Phase 2

            In this phase, you need to implement TCP and OSPF.

 

Section V: A simplified version of TCP

Section V: A simplified version of OSPF

After a user types the command "ip ospfd" on a router, the router starts to run a simple link-state routing protocol to learn the network topology and to compute the routing table used for forwarding packets. When a router detects or learns of a network topology change, it re-computes the routing table. In some sense, this is a simplified OSPF (Open Shortest Path First) protocol. For ease of explanation, in the following we use the term OSPF to refer to our simple link-state routing protocol.

To describe the OSPF protocol, let us consider it in two phases. The first is the LEARNING phase. When a router first starts, it is in the LEARNING phase and collects network topology information. In this phase, it has only limited routing table entries (local LAN information). At the end of this phase, the router runs the Dijkstra algorithm to compute the routing table from the learned topology. The second is the MAINTENANCE phase. During this phase, the router only periodically exchanges HELLO/HELLOACK messages (see below) with its neighbors to detect network topology changes (an old neighbor departs or a new neighbor joins the LAN). If there is a change, it re-computes the routing table and propagates the change to the other routers by flooding.

First, let us look at the LEARNING phase. There are five steps in this phase.
  1. Learning the neighbors. Neighbors are the other routers that are on the same subnet (in this project, connected to the same bridge).
  2. Exchanging the detailed connectivity information with the neighbors. The connectivity information includes: the distance to the neighbors and the directly connected sub-networks (subnet IP addresses and their corresponding network masks).
  3. Flooding the learned connectivity information to other routers.
  4. After a router gets all the network topology information, it applies the Dijkstra algorithm to compute the shortest-paths from this router to all the other routers.
  5. Each router connects to several subnets. From the results in the last step, we compute the shortest-paths to all the subnets. This shortest-path to all the subnet information is saved in the routing table for forwarding packets.

In the following, we will describe them in detail one by one.

Steps 1 & 2: Learning neighbors and exchanging connectivity information
Each router is assigned a unique router ID. Normally this is chosen to be the smallest or largest IP address of the router (recall that a router has multiple NICs and therefore multiple IP addresses). Three kinds of messages are used in OSPF; we discuss them one by one below. For clarity, we point out that each router should maintain at least two separate data structures: the neighbor list, which is the list of neighbors of this router (see below for details), and the link-state database, which maintains neighbor information for ALL the routers in the network (see below for details). The neighbor list is used by the router to detect topology changes. The link-state database is used for constructing the graph for the Dijkstra algorithm.
  1. THE HELLO MESSAGE

    After a router starts, it sends a Hello message to the all-routers IP address. For example, for a network with IP address 128.252.1.0/24, the all-routers IP address is 128.252.1.255. The Hello message includes (at least) the following information:

     ospf_msg_type
     router ID
     sequence number
     number of neighbors
     list of neighbors
    
    where ospf_msg_type is OSPF_TYPE_HELLO; router ID is 4 bytes (IP address); sequence number is used to indicate if there is something new at the sending router (see below); number of neighbors indicates how many neighbors the router knows now; list of neighbors is a list of the neighbors' router IDs.

    When a router receives an OSPF message, it checks the ospf message type. If it is a hello message, it reads out the router ID information, and checks to see if this is a new neighbor. For this purpose, each router maintains a neighbor list.

  2. THE HELLOACK MESSAGE

    When a router receives a HELLOACK message (see above for the format), it responds with an OSPF_TYPE_LSA_UPDATE message, which is discussed below.

  3. THE LSA UPDATE MESSAGE

    A router sends an OSPF_TYPE_LSA_UPDATE message to another router after it hears the HELLOACK message from that router. The format of the OSPF update message is

        ospf_msg_type
        router ID
        sequence number
        number of neighbors
        a neighbor router ID
        distance to the neighbor
        ...
        number of directly connected subnet (LAN)
        a subnet (network) IP address
        network mask
        distance to the subnet
        ... 
        optional field
    
    where ospf_msg_type is OSPF_TYPE_LSA_UPDATE; router ID and sequence number have the same meaning as before. Note that this router ID is the sending router's own identifier; the number of neighbors indicates how many neighbors follow in the message. Following this is the list of neighbors and their distances to this router. We assume the distances to be 1. Also included in the update message is the list of LANs that the router connects to, and their distances (assumed to be 1). The optional field has the following format:
        router ID
        sequence number
        number of neighbors
        a neighbor router ID
        distance to the neighbor
        ...
        number of directly connected subnet (LAN)
        a subnet (network) IP address
        network mask
        distance to the subnet
        ... 
        optional field
    
    Its format is the same as above; however, this set of information does not describe the sending router. Instead, it describes other routers in the network that are currently in the link-state database of the sending router. This optional field is attached only when a router detects a NEW router. Its purpose is to speed up the construction of the link-state database of the new router.

    After a router receives an OSPF_TYPE_LSA_UPDATE message, it records the information contained in the packet in its link-state database, including the router ID, its neighbors and their distances, and its directly connected subnets and their distances. It also floods this update message out to other routers.
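
One possible byte layout for the update message described in item 3 is sketched below, without the optional field. Only the 4-byte router ID comes from the description; the numeric value of OSPF_TYPE_LSA_UPDATE and the widths of the type, sequence number, counts, and distances are assumptions chosen for the example.

```python
import socket
import struct

OSPF_TYPE_LSA_UPDATE = 3   # assumed numeric value
HEADER = "!B4sIH"          # type, router ID, sequence number, neighbor count


def pack_lsa_update(router_id, seq, neighbors, subnets):
    """neighbors: [(router_id, distance)]; subnets: [(network, mask, distance)]."""
    msg = struct.pack(HEADER, OSPF_TYPE_LSA_UPDATE,
                      socket.inet_aton(router_id), seq, len(neighbors))
    for nid, dist in neighbors:
        msg += struct.pack("!4sH", socket.inet_aton(nid), dist)
    msg += struct.pack("!H", len(subnets))
    for net, mask, dist in subnets:
        msg += struct.pack("!4s4sH", socket.inet_aton(net),
                           socket.inet_aton(mask), dist)
    return msg


def unpack_lsa_update(msg):
    """Inverse of pack_lsa_update; returns (type, router_id, seq, neighbors, subnets)."""
    msg_type, rid, seq, n_nbrs = struct.unpack_from(HEADER, msg, 0)
    offset = struct.calcsize(HEADER)
    neighbors = []
    for _ in range(n_nbrs):
        nid, dist = struct.unpack_from("!4sH", msg, offset)
        neighbors.append((socket.inet_ntoa(nid), dist))
        offset += struct.calcsize("!4sH")
    (n_subnets,) = struct.unpack_from("!H", msg, offset)
    offset += struct.calcsize("!H")
    subnets = []
    for _ in range(n_subnets):
        net, mask, dist = struct.unpack_from("!4s4sH", msg, offset)
        subnets.append((socket.inet_ntoa(net), socket.inet_ntoa(mask), dist))
        offset += struct.calcsize("!4s4sH")
    return msg_type, socket.inet_ntoa(rid), seq, neighbors, subnets
```

Network byte order ("!") is used so that the layout has no padding and is the same on every machine in the emulation.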

In the LEARNING phase, the HELLO message is sent out periodically with a relatively short time interval, say every 1 second. After a certain time, say 10 seconds, the router assumes that it has all the network topology information, and it then goes to step 4 to compute the shortest paths to all the routers.
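
As an illustration of the HELLO message and its destination, the sketch below derives the all-routers address of a subnet and builds a Hello payload. The value of OSPF_TYPE_HELLO and all field widths other than the 4-byte router ID are assumptions, as are the function names.

```python
import ipaddress
import socket
import struct

OSPF_TYPE_HELLO = 1   # assumed numeric value


def all_routers_address(interface_ip, mask):
    """Broadcast address of the subnet that the interface is attached to."""
    network = ipaddress.ip_network(f"{interface_ip}/{mask}", strict=False)
    return str(network.broadcast_address)


def pack_hello(router_id, seq, neighbor_ids):
    """Encode type, router ID, sequence number, neighbor count, neighbor IDs."""
    msg = struct.pack("!B4sIH", OSPF_TYPE_HELLO,
                      socket.inet_aton(router_id), seq, len(neighbor_ids))
    for nid in neighbor_ids:
        msg += socket.inet_aton(nid)
    return msg
```

For an interface 128.252.1.7 with mask 255.255.255.0, all_routers_address gives 128.252.1.255, matching the example in the Hello message description.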

Note also that when a router detects a new router (i.e., hears a HELLO message from a new router), after it has sent back a HELLOACK and received the update message, it immediately sends a HELLO message to the new router instead of waiting for the timer for the next HELLO message to expire. In this way, the new router learns the network topology faster. (Note also that, as we will see later, the interval between HELLO messages in the MAINTENANCE phase is relatively long.)

Step 3: Flooding the connectivity information
After a router receives an OSPF_TYPE_LSA_UPDATE message, it forwards it out of all interfaces except the one on which it received the update message.

Note that a router needs to make sure that there is no loop. That is, if the update is about a new router, or its sequence number is "newer" than the one in the local link-state database, the router forwards it; otherwise, it should just ignore the update message. Note that if the optional field is present in the update message, the receiving router may change the content of the update message before forwarding it (namely, by removing the parts of the optional field that relate to routers already in the local link-state database).
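
The loop-avoidance rule can be sketched as a check against the link-state database before forwarding. In this example the database is reduced to a mapping from router ID to the last sequence number seen, and "newer" is taken to mean numerically larger; both are simplifying assumptions (a full implementation would also handle sequence number wrap-around and the optional field).

```python
def handle_lsa_update(lsdb_seq, router_id, seq, in_iface, ifaces):
    """Return the interfaces to flood the update on ([] if it is ignored).

    lsdb_seq maps router ID -> newest sequence number recorded so far.
    """
    known = lsdb_seq.get(router_id)
    if known is not None and seq <= known:
        # Already seen this update (or a newer one): ignore it, so that
        # the flood cannot circulate forever.
        return []
    lsdb_seq[router_id] = seq
    # Forward on every interface except the one the update arrived on.
    return [i for i in ifaces if i != in_iface]
```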

Step 4: Computing shortest-path to all the routers
After a router gets all the network topology information (as we said before, after a certain amount of time, say 10 seconds, the router assumes it has all the network topology information), it builds a directed network graph rooted at itself. In this graph, each node is a router, with its router ID as the node name. The Dijkstra algorithm is applied to get the shortest paths from this router to all the other routers. You can find information about the Dijkstra algorithm (and its implementation) in data structure and algorithm textbooks or by searching on Google. (Indeed, you can find implementations of the complete OSPF.)
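
A compact version of Step 4 is sketched below using a priority queue. The shape of the input (router ID mapped to a dictionary of neighbor distances) is an assumed view of the link-state database, and the function also records the first-hop router on each path, which is not required by the text but is convenient for Step 5.

```python
import heapq


def shortest_paths(lsdb, source):
    """Dijkstra over the link-state database.

    lsdb maps router ID -> {neighbor router ID: distance}.
    Returns {router ID: (distance, first-hop router ID)}; the first hop
    of the source itself is None.
    """
    best = {}
    heap = [(0, source, None)]
    while heap:
        dist, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue            # a shorter path was already finalized
        best[node] = (dist, first_hop)
        for nbr, cost in lsdb.get(node, {}).items():
            if nbr not in best:
                # Leaving the source, the neighbor itself is the first hop.
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (dist + cost, nbr, hop))
    return best
```

With three routers in a line (R1, R2, R3) and all distances equal to 1, R1 reaches R3 at distance 2 with R2 as the first hop.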
Step 5: Computing shortest-path to all the subnets
After a router gets the shortest paths to all other routers, it computes the distances to all the subnets and chooses the shortest path to each of them. This shortest-path information is written into the routing table, which has the following format:

destinationIPAddress nextHopIPAddress networkMask interface

where destinationIPAddress is the IP address of the destination (network); nextHopIPAddress is the IP address of the next hop (where the router should forward a packet to; remember that this information is contained in the neighbor list); networkMask is the network mask; and interface is the name of the interface from which the packet should be sent out.
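
Step 5 can be sketched as one pass over the subnets advertised by each router. The inputs are assumed shapes: paths maps each router ID to (distance, first-hop router ID), and subnets maps each router ID to the (network, mask, distance) entries from its update message. The result still names the first-hop router; turning that into nextHopIPAddress and interface requires the neighbor list and is left out here.

```python
def subnet_routes(paths, subnets):
    """Pick the shortest route to every subnet.

    Returns {(network, mask): (total distance, first-hop router ID)};
    a first hop of None means the subnet is directly connected.
    """
    routes = {}
    for router, (dist, first_hop) in paths.items():
        for network, mask, subnet_dist in subnets.get(router, []):
            total = dist + subnet_dist
            key = (network, mask)
            # Keep only the cheapest way of reaching each subnet.
            if key not in routes or total < routes[key][0]:
                routes[key] = (total, first_hop)
    return routes
```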

Now let us move on to the MAINTENANCE phase. After the LEARNING phase, a router has computed its routing table based on the link-state database and enters the MAINTENANCE phase. In this phase, it again periodically sends out HELLO messages to the all-routers IP address, with a relatively long time interval, say 10 seconds. If a router does not hear a HELLO or HELLOACK message from a neighbor for a certain time interval, say 30 seconds, it considers this neighbor dead. It then re-computes the routing table and floods this information out to other neighbors in an update message. Note that whenever a router detects a change in its neighbor information, it increases the sequence number by 1 (mod (the maximum sequence number + 1)) so that other routers know that this update is newer.
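
The two bookkeeping rules of the MAINTENANCE phase, neighbor expiry and sequence number wrap-around, can be sketched as follows. The 30-second dead interval is the example value from the text; the maximum sequence number and the function names are assumptions.

```python
MAX_SEQ = 2**32 - 1      # assumed maximum sequence number
DEAD_INTERVAL = 30.0     # seconds without HELLO/HELLOACK, per the text


def next_sequence(seq, max_seq=MAX_SEQ):
    """Increase by 1 modulo (maximum sequence number + 1)."""
    return (seq + 1) % (max_seq + 1)


def expire_neighbors(last_heard, now, dead_interval=DEAD_INTERVAL):
    """Remove and return the neighbors that have been silent for too long.

    last_heard maps neighbor router ID -> time of its last HELLO/HELLOACK.
    """
    dead = [rid for rid, t in last_heard.items() if now - t > dead_interval]
    for rid in dead:
        del last_heard[rid]
    return dead
```

If expire_neighbors returns a non-empty list, the router would bump its sequence number with next_sequence, re-compute the routing table, and flood an update message.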

 

Section VI: Provided code of the project

Section VII: Some guidelines on the project report

  1. Up to two pages
  2. What components/functionalities you have implemented
  3. Brief description of the major functions
  4. Journey of the implementation
  5. Difficulties encountered and how you solved them

Section VIII: Grading policy

We only grade the part of the code that works

Phase 1: 30% of final grade

  1. broadcasting of bridge: 10 points
  2. self-learning of bridge (including entry expiration): 10 points
  3. sending/receiving Ethernet frames (encapsulating/decapsulating Ethernet frames): 10 points
  4. ARP (including sending/receiving ARP packets and ARP caching): 10 points
  5. sending/receiving IP packet: 10 points
  6. IP packet forwarding (at a router): 10 points
  7. UDP: 10 points
  8. connecting/disconnecting stations/routers at any time: 10 points
  9. code readability: 10 points
  10. report: 10 points

Phase 2: 30% of final grade

  1. Emulation of lossy link (packet loss at routers/bridges): 5 points
    1. ip loss
  2. TCP connection establishment/close management (negotiating window size): 10 points
  3. TCP slow start and congestion avoidance algorithm, TCP Tahoe or Reno (choose one of them): 10 points
  4. TCP packet retransmission (go back N or selective repeat): 10 points
  5. FTP application: 10 points
  6. OSPF step 1: 5 points
  7. OSPF step 2: 10 points
  8. OSPF step 3: 10 points
  9. OSPF step 4: 10 points
  10. OSPF step 5: 10 points
    1. ip sh route
    2. ip rm route
  11. Code readability: 5 points
  12. report: 5 points