My Research at Florida State University
My recent research is mainly in wireless networks. It usually involves theoretical analysis as well as system implementations in various types of physical networks of practical interest. My research sponsors include NSF and Google. Since I joined FSU in August 2007, I have published in top venues such as ACM Mobihoc, ACM Mobisys, IEEE ICNP, IEEE Infocom, IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, and IEEE Transactions on Information Forensics and Security. I received the NSF CAREER Award in 2012, a Google Research Award in 2011, and a Best Paper Award at IEEE ICC 2009. I have graduated three PhD students and four Master's students.
The following is a list of our major research projects. Notes:
1. Unless specified as “collaborative project,” a project involves only my students and me.
2. Student names are shown in italic font; directly supervised students are further marked with superscript *.
Fully Centralized Wireless Network based on Analog Bloom Filter and Contention-Free Multi-Bit Simultaneous Query
In a typical wireless network, a base station or a router may be associated with multiple user devices such as mobile phones. When the network operates in a centralized manner, i.e., the base station or the router dictates the transmission schedule (which devices should transmit, at what time, and for how long), it needs to know the set of active devices in the network at any time, based on which it can calculate a data transmission schedule that meets various performance requirements and avoids collisions.
However, learning the set of active devices may incur very high overhead. The challenge is not with the devices that the base station or the router is currently communicating with, because it already knows they are active. The main challenge is with the devices that were previously idle and just turned active, i.e., just received data from the upper layer. The base station or the router cannot afford to query all devices one by one to find out which have just turned active, because there would be too many query messages when the number of associated devices is large. Therefore, some random access method is typically employed to allow the devices to inform the base station or the router.
Such random access methods face challenges under the requirements of 5G, in which:
· The number of associated devices may be very high, e.g., several thousand with the Internet of Things (IoT)
· The latency requirement is pushed to the 1 ms level, so the random access method must incur overhead much less than 1 ms
The Analog Bloom Filter (ABF) is a novel solution to this problem that may meet even the 5G requirements. In short, the base station or the router periodically designates part of the subcarriers in one OFDM symbol for ABF. All devices that just turned active may transmit signals during this symbol period simultaneously on randomly selected subcarriers; all other devices remain silent. The base station or the router then runs a demodulation algorithm to learn which devices transmitted signals in the ABF symbol period. As the overhead is only a relatively small number of subcarriers in one OFDM symbol (240 in the current implementation), the ABF query fits into a 1 ms frame while incurring very low overhead.
ABF is designed for cases in which the number of associated devices is very high but the number of new active devices is low, such as in the 5G scenario. ABF has very good performance; for example, using only 240 subcarriers, the error ratio is around 10^-4 when there are 512 associated devices among which 12 just turned active in typical settings. In contrast, if each new active device picks an access sequence at random from 240 sequences, the collision probability is higher than 10^-2.
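To make the comparison concrete, the following Monte Carlo sketch (illustrative only; it is not the ABF demodulator) estimates the collision probability when each of 12 new active devices picks one of 240 access sequences at random:

```python
import random

SEQUENCES = 240   # one access sequence per subcarrier, as in the comparison
NEW_ACTIVE = 12   # devices that just turned active
TRIALS = 100_000

def has_collision() -> bool:
    # Each new active device independently picks one of the 240 sequences;
    # a collision occurs if any two devices pick the same one.
    picks = [random.randrange(SEQUENCES) for _ in range(NEW_ACTIVE)]
    return len(set(picks)) < NEW_ACTIVE

hits = sum(has_collision() for _ in range(TRIALS))
print(f"collision probability ~ {hits / TRIALS:.3f}")   # well above 10^-2
# ABF avoids this trap: devices signal on several randomly chosen
# subcarriers simultaneously (a Bloom-filter-style superposition), and the
# demodulator is designed to decode despite overlaps.
```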
The main research challenge in ABF is to design the demodulation algorithm, such that the base station or the router can decode the information from the ABF signal even when collisions occur due to the random subcarrier selection, and without any channel state information (the devices just turned active), over highly frequency-selective fading channels. ABF has been implemented on the Microsoft SORA software-defined radio and has been shown to outperform existing solutions significantly.
Therefore, ABF can likely be used as the random access method in a cellular phone network protocol to allow the cellular base station to learn the set of active devices at much lower cost to meet the stringent requirements of 5G.
Along with ABF, this paper also proposes the Contention-Free Multi-Bit (CFM) simultaneous query, which operates in a similar manner to ABF but is used to let the base station or the router learn the number of data bytes in the buffers of the active devices. ABF and CFM, when combined, can be used to design a new Medium Access Control (MAC) protocol with very high efficiency, because the base station or the router has the queue length information from all devices and can calculate good data transmission schedules. Such a protocol has been implemented in the paper and shown to achieve a MAC layer data rate of more than 75% of the physical layer data rate, even under very challenging data traffic in which many devices alternate between the active mode and the idle mode with many small data packets, e.g., 40 bytes.
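As a toy illustration of why the queue length information matters, the sketch below computes a transmission schedule from CFM-style reports; the longest-queue-first policy is my illustrative choice, not the scheduling policy from the paper:

```python
def schedule(queue_bytes: dict[str, int], slot_bytes: int) -> list[str]:
    """Grant transmission slots from per-device queue lengths (as learned
    via CFM), serving the most backlogged device first.  Illustrative
    policy only."""
    grants = []
    for dev in sorted(queue_bytes, key=queue_bytes.get, reverse=True):
        slots = -(-queue_bytes[dev] // slot_bytes)   # ceiling division
        grants += [dev] * slots
    return grants

# Three active devices with small buffers (e.g., 40-byte packets):
print(schedule({"dev3": 120, "dev7": 40, "dev9": 80}, slot_bytes=40))
# -> ['dev3', 'dev3', 'dev3', 'dev9', 'dev9', 'dev7']
# Because the schedule is contention-free, no airtime is wasted on
# collisions, which is how the MAC layer rate can stay above 75% of the
# physical layer rate even with many short packets.
```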
Z. Zhang, “Analog Bloom Filter and Contention-Free Multi-Bit Simultaneous Query for Centralized Wireless Networks,” to appear in IEEE/ACM Transactions on Networking. 14 pages.
The paper will be made available.
mEEC – A Novel Error Estimation Code with Multi-Dimensional Feature
Error Estimation Code (EEC) can be used to estimate the number of errors in a packet transmitted over a wireless channel. Typically, such packets do not have many errors, and knowing the number of errors allows the wireless transmitter to use the most efficient method to recover them, such as transmitting just enough parity bits instead of retransmitting the entire packet.
An EEC typically works by taking samples from the packet, calculating a feature of the samples, and transmitting the feature as the code. The receiver locally calculates the feature from the received packet in the same manner; the result will differ from the transmitted feature if there are errors, and the difference can be used to estimate the number of errors in the packet. The actual code usually consists of multiple features calculated in the same manner.
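As a toy example of this sample-and-compare idea (my simplification; the actual EEC and mEEC constructions are more elaborate), each feature below is the parity of a random sample of bit positions:

```python
import random

def features(bits: list[int], groups: list[list[int]]) -> list[int]:
    # One feature per group: the XOR (parity) of the sampled bit positions.
    return [sum(bits[i] for i in g) % 2 for g in groups]

random.seed(1)
n, n_groups, group_size = 1500 * 8, 64, 32
groups = [random.sample(range(n), group_size) for _ in range(n_groups)]

sent = [random.randint(0, 1) for _ in range(n)]
recv = sent[:]
for i in random.sample(range(n), 20):        # inject 20 bit errors
    recv[i] ^= 1

# The receiver recomputes the features locally and compares them with the
# transmitted ones; a feature disagrees iff its group saw an odd number of
# errors, so the mismatch count can be inverted to estimate the error count.
mismatches = sum(a != b for a, b in zip(features(sent, groups),
                                        features(recv, groups)))
print(f"{mismatches} of {n_groups} features disagree")
```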
The main innovation of mEEC is to introduce a novel multi-dimensional feature and a color assignment as the code, exploiting the fact that the error ratios in partial packets are typically small. That is, mEEC divides the sampled data bits into blocks, then groups multiple blocks into a super-block, and uses only one number, i.e., the color, to represent all features of the blocks. The advantage of grouping is that it introduces useful dependencies among the blocks and allows them to share the cost of covering low probability events.
The evaluation shows that mEEC achieves smaller estimation errors than the state of the art on metrics such as the relative mean squared error, on average by 10%-20% depending on the packet size and sometimes by over 40%, while also having less bias.
Z. Zhang and P. Kumar, “mEEC: A novel error estimation code with multi-dimensional feature,” in Proc. of IEEE Infocom, Atlanta, GA, May 2017. 9 pages. Acceptance rate: 20.93% (292/1395).
A Matlab implementation of mEEC can be downloaded here. Please feel free to email me for any questions related to mEEC.
L2Relay – A Faster Wi-Fi Range Extender Protocol
Wi-Fi is a widely deployed technology that has changed the way people work and spend their leisure time. The Wi-Fi interface in smartphones, laptops, and tablets allows people to work anywhere, even while mobile. Wi-Fi therefore has a significant impact on our society.
It is well known that Wi-Fi has a limited range, a problem that can be solved to a degree by range extenders. However, a common consumer complaint about Wi-Fi range extenders is that they reduce the network speed, even though they do increase the network range. The root cause of this problem is that the range extender repeats every packet, which is not needed for a near client that can receive from the Access Point (AP) directly. For such clients, the signal from the range extender does no good; it merely congests the air and reduces the network speed.
We have developed a technology called L2Relay that can solve this problem. The main features of L2Relay are:
· L2Relay can be incorporated into existing range extenders as a firmware upgrade and boost their performance at little or no additional cost.
· L2Relay does not need any modifications to other devices in the network.
· With the help of L2Relay, the range extender can make optimal decisions for individual clients and repeat packets only when needed. As a result, the far client still gets service as usual, while the near client does not see an unnecessary speed reduction. A simplified sketch of this per-client decision follows the list.
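The sketch below captures the core of that per-client decision under stated assumptions: the relay maintains an estimated AP-to-client delivery ratio (e.g., by overhearing frames and ACKs) and repeats a packet only if the direct link is too weak. The threshold and the estimation method are illustrative, not L2Relay's exact logic:

```python
def should_repeat(dst: str, delivery_prob: dict[str, float],
                  threshold: float = 0.9) -> bool:
    """Repeat a packet only if the AP's direct link to the destination
    client is estimated to be too lossy.  Illustrative decision rule."""
    return delivery_prob.get(dst, 0.0) < threshold

links = {"near_client": 0.98, "far_client": 0.35}
for client in links:
    action = "repeat" if should_repeat(client, links) else "stay silent"
    print(f"{client}: {action}")
# near_client: stay silent  (repeating would only congest the air)
# far_client:  repeat       (the direct link is weak, so the relay helps)
```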
We solved problems such as link quality estimation, rate selection, and distributed coordination among multiple relayers, all under the constraint that the AP and the client devices cannot be modified. We implemented a prototype L2Relay extender on a commercial wireless card (BCM4306) by modifying its firmware. We compared our prototype with a popular range extender on the market (Netgear Universal Wi-Fi Range Extender WN3000RP-100NAS) and found that L2Relay can double the network speed for the near client while achieving similar speed for the far client. The details of L2Relay can be found in our IEEE ICNP 2013 paper.
1. S. Zhou* and Z. Zhang, “L2Relay: Design and implementation of a layer 2 Wi-Fi packet relay protocol,” in Proc. of IEEE ICNP, Göttingen, Germany, October 2013. 10 pages. Acceptance rate: 18% (46/251). [accepted] [IEEE final]
POR – Practical Opportunistic Routing for Wireless Mesh Networks
Our ACM Mobihoc 2013 paper describes POR, a packet forwarding protocol for wireless mesh networks, where POR stands for Practical Opportunistic Routing. In a wireless mesh network, wireless mesh routers are deployed to forward data packets over wireless links hop by hop to the end users; it is an attractive solution because it extends network coverage at low cost. Opportunistic routing improves the performance of the network by exploiting the overheard packets to reduce the number of transmissions.
POR is the first protocol that meets all the requirements for high-speed, multi-rate wireless mesh networks, including:
· running on commodity Wi-Fi interface,
· requiring no packet batching,
· low complexity,
· supporting multiple link layer data rates,
· and exploiting partial packets.
The key features of POR include:
· packet forwarding based on a per-packet feedback mechanism,
· block-based partial packet recovery,
· multi-hop link rate adaptation,
· and a novel path cost calculation that enables good path selection (a sketch of the idea follows this list).
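To illustrate the path cost intuition (an anypath-style calculation under my own assumptions, not POR's exact formula), the sketch below compares the expected number of transmissions over a lossy hop with and without candidate forwarders overhearing:

```python
from math import prod

def etx(p: float) -> float:
    """Expected transmissions over a single link with delivery ratio p."""
    return 1.0 / p

def opportunistic_cost(probs: list[float]) -> float:
    """Expected transmissions until at least one candidate forwarder
    receives the packet (candidates listed with their delivery ratios)."""
    return 1.0 / (1.0 - prod(1.0 - p for p in probs))

print(f"fixed next hop (p=0.4): {etx(0.4):.2f}")                         # 2.50
print(f"with overhearing (0.4, 0.3, 0.2): "
      f"{opportunistic_cost([0.4, 0.3, 0.2]):.2f}")                      # 1.51
# Crediting overheard receptions lowers the expected transmission count,
# and a path cost built from such hop costs steers packets onto routes
# that exploit overhearing.
```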
We have implemented a prototype of POR within the Click modular router, and our experiments on a 16-node wireless testbed confirm that POR achieves significantly better performance than the compared protocols, namely MIT's MORE and a traditional routing protocol without opportunistic routing.
The existing solutions for wireless mesh networks include traditional routing protocols and academic research prototypes adopting the opportunistic routing principle. However, they either suffer from low performance, cannot work seamlessly with upper-layer protocols such as TCP, have high complexity, or cannot support multiple data rates.
Traditional routing protocols forward packets along a fixed path. They cannot exploit packets that have been overheard; that is, a node still has to transmit a packet even when a downstream node has already received it by picking up the wireless signal. As a result, the performance of traditional routing protocols often cannot meet the needs of high-bandwidth applications.
There are many academic research prototypes, such as ExOR, MORE, MIXIT, Crelay, and SOAR, that employ the opportunistic routing principle to improve network performance. With opportunistic routing, a node will not transmit a packet if a downstream node has already obtained it by overhearing. However, existing solutions may require packet batching and thus cannot work with upper-layer protocols such as TCP, suffer from high complexity and cannot support high data rates, or work only in networks with a single link data rate. As a result, the existing academic research prototypes cannot be deployed in real-world wireless mesh networks.
We believe POR is the first opportunistic routing protocol that can work seamlessly with upper layer protocols at high speed in multi-rate networks. Therefore, it can be deployed to improve the performance of real-world wireless mesh networks.
Software Partial Packet Recovery in Wi-Fi Networks
Our ACM Mobisys 2011 paper describes UNITE, a software partial packet recovery scheme for Wi-Fi networks. On a Wi-Fi link, a packet may be received correctly in most parts but contain just a few errors. Such packets are usually referred to as partial packets; they do not pass the CRC check, so the Wi-Fi protocol discards them and schedules retransmissions. We note that it makes much more sense to repair such packets instead of retransmitting them, because the number of errors in them is usually very small. Partial packet recovery has been studied in recent years and significant gains over the original Wi-Fi protocol have been demonstrated. The existing repair methods include retransmitting the corrupted parts of the packet or correcting the errors with an error correction code.
We, in a sense, revisit this problem due to the following motivations. First, the repair methods are not mutually exclusive; a joint repair scheme may achieve better performance. Second, we propose an error estimator referred to as AMPS, which can estimate the number of errors in a packet and provide important information for partial packet recovery. Third, the existing methods assume that CPU usage is not a constraint, while our experiments show that it can be very high in many cases, so a recovery scheme that works under a CPU usage constraint is needed.
UNITE employs multiple repair methods to provide the best flexibility under various scenarios. Different repair methods have different pros and cons: retransmission does not consume CPU time but is more vulnerable to random errors; error correction consumes CPU time but is more resilient to random errors. A repair method may be ideal for one case but not for others. By supporting multiple repair methods, UNITE is more flexible than existing schemes, which support only one. In addition, UNITE supports a third repair method, referred to as Target Error Correction (TEC), which first locates the corrupted blocks and then uses an error correction code to correct the errors in those blocks. Compared to block-based retransmission, TEC requires less repair data because only the parity bytes are transmitted, which can be much smaller than the original data. Compared to packet-level error correction, TEC zooms in on the corrupted parts of the packet and requires fewer parity bytes and less CPU time for decoding. In practice, TEC can recover a large percentage of packets because many packets contain just a few errors.
UNITE uses AMPS as the error estimator to determine the number of errors in a packet, so that it can make intelligent decisions on packet recovery. An error estimator can roughly tell the number of errors in a packet, which can be used to determine the most efficient repair method. Error estimation has been studied only very recently; the other existing estimator is EEC (“Efficient Error Estimating Coding: Feasibility and Applications”), which received the Best Paper Award at ACM Sigcomm 2010. We use AMPS because it is comparable with or better than EEC in every aspect. On performance, our simulation comparison found that AMPS has smaller estimation errors than EEC. On overhead, AMPS requires an additional 8 bytes per packet while EEC requires 36 bytes for a 1500-byte packet, the most common packet size in Wi-Fi networks. On complexity, both AMPS and EEC can be implemented with a simple table lookup. AMPS achieves better performance than EEC because it was developed according to the Maximum A Posteriori criterion, while EEC is a simple algorithm.
UNITE is a practical packet recovery scheme because it checks its CPU usage and does not over-consume CPU time. Decoding an error correction code can take non-trivial time because such algorithms are highly complicated. Earlier papers, such as ZipTx (“ZipTx: Harnessing Partial Packets in 802.11 Networks,” ACM Mobicom 2008), argue that the CPU usage caused by the decoding algorithm is low. We note that the experiments in ZipTx were carried out on high-end desktop machines; on devices with weaker CPUs, such as smartphones and tablets, CPU usage becomes a key constraint. We confirmed with our experiments on a Toshiba laptop that the CPU usage can be very high under certain channel conditions. Therefore, UNITE takes a parameter from the system as the decoding time limit and uses an algorithm to select the best repair method to maximize the link throughput under the CPU time constraint.
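The following sketch shows the flavor of that selection; the cost models are made up for illustration (UNITE uses measured decoding times and its AMPS error estimate), but the structure, minimizing repair traffic subject to a CPU budget, is the same:

```python
def choose_repair(est_errors: int, pkt_len: int, cpu_budget_ms: float):
    """Pick the repair method that sends the fewest repair bytes while
    fitting within the decoding time budget.  Cost numbers are
    hypothetical placeholders, not UNITE's measurements."""
    block = 64
    bad_blocks = min(est_errors, pkt_len // block)    # pessimistic bound
    candidates = {
        # method: (repair bytes over the air, decoding CPU time in ms)
        "full_retransmit":  (pkt_len,            0.0),
        "block_retransmit": (bad_blocks * block, 0.0),
        "packet_level_ec":  (4 * est_errors + 8, 2.5),
        "targeted_ec":      (2 * est_errors + 8, 0.8),
    }
    feasible = {m: b for m, (b, t) in candidates.items()
                if t <= cpu_budget_ms}
    best = min(feasible, key=feasible.get)
    return best, feasible[best]

print(choose_repair(est_errors=6, pkt_len=1500, cpu_budget_ms=1.0))
# -> ('targeted_ec', 20): few errors plus CPU headroom favor TEC; with
#    cpu_budget_ms=0.0 the choice falls back to block retransmission.
```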
UNITE is a software-only solution that can be installed at the Access Point (AP) and the user devices running on commodity wireless cards and bring immediate benefits. We implemented UNITE and AMPS in the MadWifi driver, and our experiments confirm that UNITE achieves higher link throughput than all existing methods while keeping CPU time under the specified constraint. To date, UNITE is the state-of-the-art software partial packet recovery scheme.
1. J. Xie*, W. Hu* and Z. Zhang, “Revisiting partial packet recovery for 802.11 wireless LANs,” in Proc. of ACM Mobisys, Washington DC, USA, June 2011. 12 pages. Acceptance rate: 17.7% (25 / 141). [accepted] [ACM final]
2. J. Xie*, W. Hu* and Z. Zhang, “Efficient software partial packet recovery in 802.11 wireless LANs,” to appear in IEEE Transactions on Computers. [accepted] [IEEE final]
Multi-User Multiple-Input-Multiple-Output (MU-MIMO) for Wi-Fi
In our paper published in IEEE Infocom 2010, we propose to use Multi-User Multiple-Input-Multiple-Output (MU-MIMO) to improve the Wi-Fi downlink throughput. We refer to MU-MIMO on the downlink as One-Sender-Multiple-Receiver (OSMR) for clarity, because MU-MIMO can also refer to uplink transmissions. With OSMR, the network downlink throughput can potentially be increased several times, because the Access Point (AP) can send distinct data to multiple clients simultaneously.
We started working on OSMR in 2008; prior to that, OSMR had mainly been studied with theoretical analysis and simulations. Our focus was to apply OSMR in Wi-Fi networks, which are typically deployed indoors with highly dynamic traffic. In 2008, it was unclear whether OSMR was applicable to Wi-Fi networks, because the indoor wireless channel characteristics for OSMR were not well understood; in addition, the problem of scheduling OSMR transmissions under dynamic Wi-Fi data traffic had not yet been studied.
We were among the first to implement prototype OSMR transceivers and conduct indoor OSMR experiments. Our implementation was in C++ with GNU Software Defined Radio. Our experiments confirm that the indoor wireless channel can be stable and usually contains sufficient degrees of diversity to support OSMR.
We also study the problem of packet scheduling with OSMR. Basically, OSMR opportunities arise when the wireless channel permits; however, as an OSMR transmission involves multiple nodes, the AP may or may not have data to send to all of them. The key challenge is to cope with both the randomness in the wireless channel and the randomness in the traffic. To this end, we first prove that the optimal packet scheduling problem is NP-hard, and then propose a practical algorithm.
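A minimal sketch of the scheduling idea, under my own simplifications (the paper's algorithm and proof are more involved): among client groups whose channels are spatially separable, greedily pick the one that serves the most backlogged bytes:

```python
from itertools import combinations

def greedy_osmr_pick(backlog: dict[str, int],
                     separable: set[frozenset],
                     max_streams: int = 2):
    """Pick the client group to serve in the next OSMR transmission.
    backlog   -- bytes queued at the AP per client
    separable -- client sets the channel can support simultaneously
    Illustrative greedy heuristic, not the algorithm from the paper."""
    best, best_bytes = None, 0
    clients = [c for c, b in backlog.items() if b > 0]
    for k in range(1, max_streams + 1):
        for group in combinations(clients, k):
            if k > 1 and frozenset(group) not in separable:
                continue   # channel cannot separate these clients
            served = sum(backlog[c] for c in group)
            if served > best_bytes:
                best, best_bytes = group, served
    return best

backlog = {"a": 900, "b": 0, "c": 700, "d": 400}
separable = {frozenset({"a", "c"})}
print(greedy_osmr_pick(backlog, separable))   # -> ('a', 'c')
# Note the interplay of channel and traffic randomness: if client c had
# nothing queued, the AP would serve a single receiver even though the
# channel permits an OSMR transmission.
```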
We conduct simulations with wireless channel traces collected with our GNU Software Defined Radio implementation and real traffic traces from Wi-Fi networks. Our simulation results confirm the gain of OSMR; more importantly, they show that the gain is determined to a large degree by the nature of the traffic, a key distinction between our finding and those of other papers that focus only on the wireless channel and assume constant or saturated traffic, such as an ACM Mobicom 2010 paper (“Design and Experimental Evaluation of Multi-User Beamforming in Wireless LANs”) published shortly after ours. We used simulation for evaluation because GNU Software Defined Radio is a physical layer tool and does not efficiently support Medium Access Control layer functions. Today, the collective efforts of multiple research groups have convinced the protocol standardization organizations to consider adopting MU-MIMO in the 802.11 protocol; the future 802.11ac will likely support MU-MIMO.
Publications:
1. Z. Zhang, S. Bronson*, J. Xie* and W. Hu*, “Employing the one-sender-multiple-receiver technique in wireless LANs,” in Proc. of IEEE Infocom, San Diego, USA, March 2010. 9 pages. Acceptance rate: 18% (276/1575). [paper]
2. Z. Zhang, S. Bronson*, J. Xie* and W. Hu*, “Employing the one-sender-multiple-receiver technique in wireless LANs,” to appear in IEEE/ACM Transactions on Networking. [accepted] [IEEE final]
OpCut – High Performance Optical Packet Switch (Collaborative Project)
Optical technology will reshape the interconnection network design for high performance computing systems because of its huge capacity and relatively low power consumption. I have been working on optical networks since my PhD and a major part of my PhD thesis is about packet scheduling in optical networks. I have been continuing this line of research since joining FSU.
We have been working on a project on practical optical packet switching, supported by NSF since 2009. We propose a new switching architecture, called OpCut, which combines optical switching with electronic buffering. OpCut is an attractive solution for optical packet switching because practical optical memory still does not exist and electronic memory is the only viable way to provide the buffering capacity needed by a switch. Prior to OpCut, IBM, Corning, and the US Department of Energy started a project called OSMOSIS with the goal of building an optical packet switch, which also uses electronic buffers. The core difference between OpCut and OSMOSIS is that OpCut achieves lower packet latency, which is critical for high performance computing applications where the latency should ideally be on the order of microseconds.
The key feature that allows OpCut to achieve low latency is that it lets an optical packet cut through the switch whenever possible. That is, optical packets are transferred directly from the input side to the output side, so that packets experience only the propagation delay in most cases if the traffic load is not excessively high. The OSMOSIS switch, in contrast, converts every packet from the optical form to the electronic form, which introduces extra latency and increases power consumption.
After we conceived the idea of OpCut, we did an initial performance analysis with a Markov chain model, which was presented in IEEE IPDPS 2009. We wrote the proposal based on our preliminary results and submitted to NSF, which was funded in 2009. We have been working on the OpCut project and our results have been presented in two papers in IEEE Infocom 2011. An extended version of one of the IEEE Infocom 2011 papers has been published in IEEE Journal of Lightwave Technology. More journal papers are currently under preparation.
One of the key challenges in the OpCut switch is to maintain packet order, which we solve with a novel timestamp-based buffer. In the OpCut switch, if a packet cannot cut through, it is converted to the electronic form and buffered in the middle nodes. When the output ports are free, the middle nodes send the buffered packets to the output ports. Because packets that arrived at the same input port may be sent to different middle nodes, the packet order may be disrupted. On the other hand, upper-layer protocols such as TCP depend on the assumption that network switches do not, in most cases, send packets out of order. The packet ordering problem in the OpCut switch is similar to that in the two-stage switch; however, the existing solutions for two-stage switches are either too complicated or cannot be implemented in an optical switch. Our solution is to use a buffer indexed by the packet timestamp, and it is very simple in nature. Basically, packets are written to the buffer according to their timestamps. The output ports keep track of packet information and request only packets at the heads of their flows. If an output port can receive a packet, it sends the lower bits of the timestamp to the middle node, which can then locate the packet in constant time.
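A minimal sketch of the timestamp-indexed buffer (details such as timestamp wrap-around and the per-flow bookkeeping at the output ports are omitted):

```python
class TimestampBuffer:
    """Middle-node buffer indexed by packet arrival timestamp."""

    def __init__(self):
        self.slots = {}                     # timestamp -> packet

    def store(self, timestamp: int, packet: bytes):
        self.slots[timestamp] = packet      # write by arrival timestamp

    def fetch(self, timestamp: int) -> bytes:
        # The output port names the head-of-flow packet by (the low bits
        # of) its timestamp, so the middle node finds it in constant time.
        return self.slots.pop(timestamp)

buf = TimestampBuffer()
buf.store(17, b"first packet of flow")
buf.store(18, b"second packet of flow")
print(buf.fetch(17))    # the output port requests packets in timestamp
print(buf.fetch(18))    # order, so the flow leaves the switch in order
```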
We study the packet scheduling problem with a pipelined scheduler to reduce the length of a time slot. The performance of the OpCut switch, like that of the input-buffered switch, is to a large extent constrained by the efficiency of the scheduling algorithm, which determines which packets can be sent from the input ports to the output ports and which packets have to be buffered. Advanced algorithms lead to better performance; however, the algorithm must converge to a solution within one time slot, i.e., the time to send a data block or cell, because a decision is needed in every time slot. We therefore adopt a divide-and-conquer approach, breaking the scheduling problem into smaller problems where each small problem finishes within a time slot. Basically, we introduce multiple sub-schedulers, each taking the partial schedule calculated by the sub-scheduler of lower order and augmenting it with more packet transmissions. The advantage of such a pipelined scheduler is that advanced scheduling algorithms can be used without increasing the length of the time slot. We rigorously prove that duplicate scheduling, i.e., multiple sub-schedulers scheduling the same packet, does not occur with our algorithm when two sub-schedulers are used. We also propose a technique called “adaptive pipelining” to reduce the packet latency when the traffic load is low: when the traffic is light, the scheduling problem is simple and fewer sub-schedulers may suffice, so that packets experience less delay. This work was first presented in IEEE Infocom 2011 and an extended version has been published in the IEEE Journal of Lightwave Technology.
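The pipeline stage below conveys the augmentation idea in miniature (a simple greedy matching under my assumptions; the actual sub-schedulers are more sophisticated):

```python
def augment(schedule: dict[int, int], requests: dict[int, list[int]]):
    """One pipeline stage: extend the partial input->output matching left
    by the lower-order sub-scheduler with more packet transmissions,
    never double-booking an input or an output."""
    used_outputs = set(schedule.values())
    for inp, outs in requests.items():
        if inp in schedule:
            continue                        # already matched upstream
        for out in outs:
            if out not in used_outputs:
                schedule[inp] = out
                used_outputs.add(out)
                break
    return schedule

requests = {0: [1, 2], 1: [1], 2: [0, 1]}
partial = augment({}, {0: [1, 2]})   # sub-scheduler 1 sees early requests
final = augment(partial, requests)   # sub-scheduler 2 augments next slot
print(final)                         # {0: 1, 2: 0}; input 1 waits a slot
```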
We also study the packet scheduling problem when there are multiple wavelengths per optical fiber. The capacity of an optical packet switching network improves when the bandwidth of an optical fiber is better utilized, with multiple wavelengths carrying packets in parallel. The scheduling problem with multiple wavelengths is even more challenging than with a single wavelength, because multiple packets may arrive at an input fiber in the same time slot and the order of such packets must be maintained. We prove that the optimal scheduling problem is NP-hard by reducing the Set Packing problem to it. We further prove that the scheduling problem cannot be approximated within a constant factor. Therefore, we propose a set of practical scheduling algorithms, and our simulations show that they achieve good performance under typical conditions. This work was first presented in IEEE Infocom 2011 and an extended journal version is currently under major revision.
Bandwidth Puzzle – Securing the Incentives in P2P Networks (Collaborative Project)
Peer-to-Peer (P2P) technology has been widely deployed; it has been reported that over 65% of Internet traffic is P2P traffic. Our papers on incentive schemes in P2P networks address a critical problem: why should a peer be willing to contribute its resources and serve other peers, when serving other peers consumes network bandwidth, CPU cycles, and battery life? A selfish peer may choose not to serve any peer; the system will fail if it allows such peers to get free rides.
The key challenge in designing a good incentive scheme is that the peers may collude. A naïve approach, for example, is to collect data transfer logs from the nodes and give higher priority to peers who have uploaded more data to other peers. This naïve approach fails because such logs can be forged; Alice may send a report to the system claiming that she has uploaded a large file to Bob when she actually has not. It is also useless to confirm the claim with Bob, because Bob may be a friend of Alice and may lie to the system so that Alice gets free credit.
The key breakthrough on this problem is the puzzle challenge scheme first described in our ICISS 2009 paper. The idea is to challenge all peers simultaneously with puzzles and require everyone to send back the answers to the puzzles before a timeout. The puzzle has two key properties: 1) it takes time to solve, and 2) it can be solved only if the peer has the content that it claims to possess. Therefore, a peer cannot solve the puzzle if it does not have the content; neither can its friends help in solving it, because the friends have to solve their own puzzles before the timeout. As a result, fake transactions can be detected.
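The sketch below shows the content-binding idea in its simplest form (my simplification; the actual puzzle construction and its parameters are in the papers): the verifier asks for a hash over randomly chosen blocks of the claimed content, so only a peer holding the content can answer:

```python
import hashlib
import random

BLOCK = 64

def make_puzzle(content: bytes, k: int = 8):
    """Verifier side: pick k random block indices and precompute the
    expected answer.  Illustrative construction only."""
    idx = random.sample(range(len(content) // BLOCK), k)
    expected = hashlib.sha256(
        b"".join(content[i * BLOCK:(i + 1) * BLOCK] for i in idx)
    ).hexdigest()
    return idx, expected        # send idx to the peer, keep expected

def solve_puzzle(content: bytes, idx: list[int]) -> str:
    """Prover side: only possible if the peer actually holds the content."""
    return hashlib.sha256(
        b"".join(content[i * BLOCK:(i + 1) * BLOCK] for i in idx)
    ).hexdigest()

data = bytes(random.randrange(256) for _ in range(64 * 1024))
idx, expected = make_puzzle(data)
assert solve_puzzle(data, idx) == expected   # honest peer passes
# A colluding friend who lacks `data` cannot produce the hash, and is in
# any case busy answering its own puzzle before the timeout.
```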
The puzzle is a major contribution because it is the first real solution to the secure incentive problem; prior to it, no real solution existed. Free riders could potentially get service without doing any work, as demonstrated by several research papers on popular P2P networks such as BitTorrent. The puzzle challenge scheme can be adopted in such P2P networks to improve the security of their incentive mechanisms.
Another major contribution is the performance bound I proved for the puzzle. The theoretical proof of the bound appears in the April 2012 issue of IEEE Transactions on Information Forensics and Security. The bound gives the expected number of bits the peers must upload to be able to solve the puzzles with a certain probability. Such a bound is needed because it gives the system an assurance on the expected outcomes of applying the puzzle. The original ICISS 2009 paper provided a weaker bound; the new bound is much tighter because it can be approached asymptotically by a practical strategy. The major challenge in proving the bound is that the adversaries may collude and adopt any algorithm. The bound must be established against the optimal algorithm; however, the optimal algorithm is unknown due to the complexity of the puzzle challenge scheme. The breakthrough is an indirect proof technique: introducing an intermediate system that is easier to reason about and close to the real system in performance. Basically, I first bound the difference between the intermediate system and the real system, and then prove the bound for the intermediate system, which is combined with the difference to establish the bound for the real system.