My Research at Florida State University
My research is in the networking area, including wireless networks, optical networks, and incentive mechanisms in peer-to-peer networks. My research usually involves theoretical analysis as well as system implementations in various types of physical networks of practical interest.
My research sponsors include NSF and Google. Since I joined FSU in August 2007, I have published in top venues such as ACM Mobisys, IEEE Infocom, and IEEE Transactions on Information Forensics and Security. I received the NSF CAREER Award in 2012, a Google Research Award in 2011, and a Best Paper Award at IEEE ICC 2009. I have graduated three Master's students. I have been working with three PhD students since Summer 2008; one has passed the dissertation defense. I have supported all of my PhD students in attending conferences where they have papers accepted.
This page is devoted to my research at FSU; the highlights of my
PhD research can be found here.
Overview
Wi-Fi networks are becoming ubiquitous as they provide the convenience and flexibility that people need. Much of my recent work is on improving the performance of Wi-Fi networks. Our results have been published in ACM Mobisys 2011 and IEEE Infocom 2010, as well as in several papers in IEEE Globecom 2009, 2011, and 2012. We improve the efficiency of the Wi-Fi link with partial packet recovery (ACM Mobisys 2011, IEEE Globecom 2012), utilize advanced physical layer techniques to achieve simultaneous transmissions (IEEE Infocom 2010, IEEE Globecom 2009), design more efficient Medium Access Control layer protocols (IEEE Globecom 2011), and explore wireless mesh technology (IEEE Globecom 2011, IEEE Globecom 2012).
Optical technology will reshape interconnection network design for high performance computing systems because of its huge capacity and relatively low power consumption. A major part of my PhD thesis is about packet scheduling in optical networks, and I have continued this line of research since joining FSU. I obtained funding from NSF for the design of an ultra-low latency optical packet switching network. The results have been published in IEEE IPDPS 2009 and IEEE Infocom 2011; some journal papers are under preparation, and one journal paper will appear in IEEE Journal of Lightwave Technology.
Out of curiosity, and also driven by practical demand, I have worked on incentive mechanisms in peer-to-peer (P2P) networks. We designed the bandwidth puzzle, which ensures that peers cannot collude to get free credits from the system. The results first appeared in ICISS 2009, and a theoretical proof appeared in IEEE Transactions on Information Forensics and Security.
Note:
1. Unless specified as "collaborative project," a project involves only my students and me.
2. Student names are shown in italic font; directly supervised students are further marked with superscript *.
Multi-User Multiple-Input-Multiple-Output (MU-MIMO) for Wi-Fi
Wi-Fi is a widely deployed technology that has changed the way people work and spend their leisure time. The Wi-Fi interface in smartphones, laptops, and tablets allows people to work anywhere, even while mobile. Wi-Fi technology therefore has a significant impact on our society.
In our paper published in IEEE Infocom 2010, we propose to use Multi-User Multiple-Input-Multiple-Output (MU-MIMO) to improve Wi-Fi downlink throughput. We refer to MU-MIMO on the downlink as One-Sender-Multiple-Receiver (OSMR) for clarity, because MU-MIMO can also refer to uplink transmissions. With OSMR, the network downlink throughput can potentially be increased severalfold, because the Access Point (AP) can send distinct data to multiple clients simultaneously.
We started to work on OSMR in 2008; prior to that, OSMR had been studied mainly through theoretical analysis and simulations. Our focus was to apply OSMR in Wi-Fi networks, which are typically deployed indoors with highly dynamic traffic. In 2008, it was unclear whether OSMR was applicable to Wi-Fi networks, because the indoor wireless channel characteristics for OSMR were not well understood; in addition, the problem of scheduling OSMR transmissions under dynamic Wi-Fi data traffic had not yet been studied.
We were
among the first to implement prototype OSMR transceivers and conduct indoor
OSMR experiments. Our implementation
was in C++ with GNU Software Defined Radio.
Our experiments confirm that the indoor wireless channel can be stable
and usually contains sufficient degrees of diversity to support OSMR.
We also study the problem of packet scheduling with OSMR. Basically, OSMR opportunities arise when the wireless channel permits; however, as an OSMR transmission involves multiple nodes, the AP may or may not have data to send to all of them. The key challenge is to cope with both the randomness in the wireless channel and the randomness in the traffic. To this end, we first prove that the optimal packet scheduling problem is NP-hard, and then propose a practical algorithm.
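As a toy illustration only (a greedy heuristic sketched under assumed inputs, not the algorithm from the paper): given the client groups that the current channel state can support concurrently and the per-client backlog at the AP, a scheduler could pick the group that serves the most queued data.

    // Hypothetical greedy OSMR scheduler sketch; the candidate groups and
    // backlogs are assumed inputs, not structures from the paper.
    #include <cstdio>
    #include <vector>

    struct Group { std::vector<int> clients; };

    // Pick the candidate OSMR group that serves the most queued data.
    int pickGroup(const std::vector<Group>& groups,
                  const std::vector<int>& backlog) {
        int best = -1, bestData = 0;
        for (int g = 0; g < (int)groups.size(); ++g) {
            int data = 0;
            for (int c : groups[g].clients)
                data += backlog[c];              // count only queued traffic
            if (data > bestData) { bestData = data; best = g; }
        }
        return best;                             // -1: nothing to send
    }

    int main() {
        // Assumed example: the channel supports clients {0,1} together,
        // or client {2} alone.
        std::vector<Group> groups = {{{0, 1}}, {{2}}};
        std::vector<int> backlog = {3, 0, 5};    // queued packets per client
        std::printf("chosen group: %d\n", pickGroup(groups, backlog));
    }

A real scheduler must also weigh the channel quality of each group; this toy considers only the backlog.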
We conduct simulations with wireless channel traces collected with our GNU Software Defined Radio implementation and real traffic traces from Wi-Fi networks. Our simulation results confirm the gain of OSMR; more importantly, they show that the gain is determined to a large degree by the nature of the traffic, which is a key distinction between our findings and those of other papers that focus only on the wireless channel and assume constant or saturated traffic, such as an ACM Mobicom 2010 paper (Design and Experimental Evaluation of Multi-User Beamforming in Wireless LANs) published shortly after ours. We used simulation for evaluation because GNU Software Defined Radio is a physical layer tool and does not efficiently support Medium Access Control layer functions. Today, the collective efforts of multiple research groups have convinced the standardization organizations to consider adopting MU-MIMO in the 802.11 protocol; the future 802.11ac will likely support MU-MIMO.
Publication (journal version under review after major revision):
1. Z. Zhang, S. Bronson*, J. Xie* and W. Hu*, "Employing the one-sender-multiple-receiver technique in wireless LANs," in Proc. of IEEE Infocom, San Diego, USA, March 2010. 9 pages. Acceptance rate: 18% (276/1575). [paper]
Software Partial Packet Recovery in Wi-Fi Networks

Our ACM Mobisys 2011 paper describes UNITE, a software partial packet recovery scheme for Wi-Fi networks. On a Wi-Fi link, a packet may be received correctly in most parts but contain just a few errors. Such packets are usually referred to as partial packets; they fail the CRC check, so the Wi-Fi protocol discards them and schedules retransmissions. We note that it makes much more sense to repair such packets than to retransmit them, because the number of errors in them is usually very small. Partial packet recovery has been studied in recent years, and significant gains over the original Wi-Fi protocol have been demonstrated. The existing repair methods include retransmitting the corrupted parts of the packet or correcting the errors with an error correction code.
We, in a sense, revisit this problem with the following motivations. First, the repair methods are not mutually exclusive; a joint repair scheme may achieve better performance. Second, we propose an error estimator, referred to as AMPS, which can estimate the number of errors in a packet and provide important information for partial packet recovery. Third, the existing methods assume that CPU usage is not a constraint, while our experiments show that CPU usage can be very high in many cases, so a recovery scheme that works under a CPU usage constraint is needed.
UNITE employs multiple repair methods to provide the best flexibility under various scenarios. Different repair methods have different pros and cons: retransmission does not consume CPU time but is more vulnerable to random errors; error correction consumes CPU time but is more resilient to random errors. A repair method may be ideal for one case but not for others. By supporting multiple repair methods, UNITE is more flexible than existing schemes, which support only one. In addition, UNITE supports a third repair method, referred to as Target Error Correction (TEC), which first locates the corrupted blocks and then uses an error correction code to correct the errors in those blocks. Compared to block-based retransmission, TEC requires less repair data, because only the parity bytes are transmitted, and they can be much smaller than the original data. Compared to packet-level error correction, TEC zooms in on the corrupted parts of the packet and requires fewer parity bytes and less CPU time for decoding. In practice, TEC can recover a large percentage of packets because many packets contain just a few errors.
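As a hedged illustration of the block-locating step only (the block size, checksum, and per-block trailer layout are my assumptions, and the actual error correction over the flagged blocks, e.g., Reed-Solomon decoding, is omitted):

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Toy per-block checksum standing in for a real block CRC.
    static uint16_t checksum(const uint8_t* p, size_t n) {
        uint32_t s = 0;
        for (size_t i = 0; i < n; ++i) s += p[i];
        return (uint16_t)(s & 0xffff);
    }

    // Return the indices of blocks whose recomputed checksum disagrees
    // with the checksum carried in the (assumed) per-block trailer; only
    // these blocks need parity bytes from the sender.
    std::vector<size_t> locateCorruptBlocks(const std::vector<uint8_t>& pkt,
                                            const std::vector<uint16_t>& sums,
                                            size_t blockSize) {
        std::vector<size_t> bad;
        for (size_t b = 0; b * blockSize < pkt.size(); ++b) {
            size_t off = b * blockSize;
            size_t len = std::min(blockSize, pkt.size() - off);
            if (checksum(&pkt[off], len) != sums[b]) bad.push_back(b);
        }
        return bad;
    }

    int main() {
        std::vector<uint8_t> pkt(256, 0xAA);
        std::vector<uint16_t> sums(4, checksum(&pkt[0], 64)); // sender-side sums
        pkt[70] ^= 0xFF;                                      // corrupt block 1
        for (size_t b : locateCorruptBlocks(pkt, sums, 64))
            std::printf("block %zu corrupted\n", b);
    }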
UNITE uses AMPS as the error estimator to determine the number of errors in a packet, so that it can make intelligent decisions on packet recovery. An error estimator can roughly tell the number of errors in a packet, which can be used to determine the most efficient repair method. Error estimation has been studied only very recently; the only other estimator is EEC (Efficient Error Estimating Coding: Feasibility and Applications), which received the Best Paper Award at ACM Sigcomm 2010. We use AMPS because it is comparable with or better than EEC in every respect. On performance, we did a simulation comparison and found that AMPS has smaller estimation errors than EEC. On overhead, AMPS requires an additional 8 bytes per packet, while EEC requires 36 bytes per packet when the packet size is 1500 bytes, the most common packet size in Wi-Fi networks. On complexity, both AMPS and EEC can be implemented with a simple table lookup. AMPS achieves better performance than EEC because it was developed according to the Maximum A Posteriori criterion, while EEC is a simpler algorithm.
UNITE is a practical packet recovery scheme because it checks its CPU usage and will not over-consume CPU time. Decoding an error correction code can take non-trivial time because such algorithms are highly complex. Earlier papers, such as ZipTx (ZipTx: Harnessing Partial Packets in 802.11 Networks), published in ACM Mobicom 2008, argue that the CPU usage caused by the decoding algorithm is low. We note that the experiments in ZipTx were carried out on high-end desktop machines; on devices with weaker CPUs, such as smartphones and tablets, CPU usage becomes a key constraint. We confirm with our experiments on a Toshiba laptop that the CPU usage can be very high under certain channel conditions. Therefore, UNITE takes a decoding time limit as a parameter from the system and uses an algorithm to select the best repair method to maximize link throughput under the CPU time constraint.
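A minimal sketch of such a selection rule, assuming per-method estimates of decoding cost and expected gain (the candidate list, numbers, and rule are illustrative, not UNITE's actual algorithm):

    #include <cstdio>
    #include <string>
    #include <vector>

    struct Method {
        std::string name;
        double cpuMs;         // estimated decoding time for this packet
        double expectedGain;  // expected repaired bytes per unit of airtime
    };

    // Choose the highest-gain method whose decoding time fits the remaining
    // CPU budget; fall back to retransmission, which costs no CPU time.
    std::string pick(const std::vector<Method>& ms, double budgetMs) {
        std::string best = "retransmit";
        double bestGain = 0;
        for (const auto& m : ms)
            if (m.cpuMs <= budgetMs && m.expectedGain > bestGain) {
                bestGain = m.expectedGain;
                best = m.name;
            }
        return best;
    }

    int main() {
        std::vector<Method> ms = {
            {"packet-level ECC", 4.0, 1.5},
            {"TEC", 1.2, 1.3},
            {"block retransmission", 0.0, 1.0},
        };
        std::printf("picked: %s\n", pick(ms, 2.0).c_str());  // -> TEC
    }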
UNITE is a software-only solution that can be installed at the Access Point (AP) and on user devices running commodity wireless cards, bringing immediate benefits. We implement UNITE and AMPS in the MadWifi driver, and our experiments confirm that UNITE achieves higher link throughput than all existing methods while keeping its CPU time under the specified constraint. To date, UNITE is the state-of-the-art software partial packet recovery scheme.
Publication (journal version under preparation):
1. J. Xie*, W. Hu* and Z. Zhang, "Revisiting partial packet recovery for 802.11 wireless LANs," in Proc. of ACM Mobisys, Washington DC, USA, June 2011. 12 pages. Acceptance rate: 17.7% (25/141). [accepted] [ACM final]
OpCut – High Performance Optical Packet Switch (Collaborative Project)

Optical technology will reshape interconnection network design for high performance computing systems because of its huge capacity and relatively low power consumption. I have been working on optical networks since my PhD, and a major part of my PhD thesis is about packet scheduling in optical networks. I have continued this line of research since joining FSU.
We have been working on a project on practical optical packet switching, which has been supported by NSF since 2009. We propose a new switching architecture, called OpCut, which combines optical switching with electronic buffering. OpCut is an attractive solution for optical packet switching because practical optical memory still does not exist, and electronic memory is the only viable way to provide the buffering capacity needed by a switch. Prior to OpCut, IBM, Corning, and the US Department of Energy started a project called OSMOSIS with the goal of building an optical packet switch, which also uses electronic buffers. The core difference between OpCut and OSMOSIS is that OpCut achieves lower packet latency, which is critical for high performance computing applications, where the latency should ideally be on the order of microseconds.
The key feature by which OpCut achieves low latency is that it allows an optical packet to cut through the switch whenever possible. That is, optical packets are transferred directly from the input side to the output side, so packets experience only the propagation delay in most cases when the traffic load is not excessively high. The OSMOSIS switch, on the other hand, converts every packet from optical to electronic form, which introduces extra latency and increases power consumption.
After I conceived the idea of OpCut, I did an initial performance analysis with a Markov chain model, which was presented at IEEE IPDPS 2009. We wrote a proposal based on our preliminary results and submitted it to NSF, which funded it in 2009. We have been working on the OpCut project, and our results have been presented in two papers at IEEE Infocom 2011. An extended version of one of the IEEE Infocom 2011 papers has been accepted for publication in IEEE Journal of Lightwave Technology. More journal papers are currently under preparation.
One of the key challenges in the OpCut switch is to maintain packet order, which we solve with a novel timestamp-indexed buffer. In the OpCut switch, if a packet cannot cut through, it is converted to electronic form and buffered in the middle nodes. When the output ports are free, the middle nodes send the buffered packets to the output ports. Because packets arriving at the same input port may be sent to different middle nodes, the packet order may be disrupted. On the other hand, upper layer protocols such as TCP depend on the assumption that network switches do not, in most cases, send packets out of order. The packet ordering problem in the OpCut switch is similar to that in the two-stage switch; however, the existing solutions for two-stage switches are either too complicated or cannot be implemented in an optical switch. Our solution is a buffer indexed by the packet timestamp, and it is very simple in nature. Basically, packets are written to the buffer according to their timestamps. The output ports keep track of packet information and request only the packets at the heads of the flows. If an output port can receive a packet, it sends the lower bits of the packet's timestamp to the middle node, which can then locate the packet in constant time.
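A minimal sketch of the structure, with the slot count and the handling of slot collisions left as assumptions:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Packet { uint64_t ts; int output; /* payload omitted */ };

    // Buffer indexed by the low bits of the packet timestamp, so a middle
    // node can locate a requested packet in constant time.
    class TsBuffer {
        std::vector<Packet> slots_;
        std::vector<bool> used_;
        uint64_t mask_;
    public:
        explicit TsBuffer(unsigned bits)
            : slots_(1u << bits), used_(1u << bits, false),
              mask_((1u << bits) - 1) {}

        bool insert(const Packet& p) {            // indexed write by timestamp
            uint64_t i = p.ts & mask_;
            if (used_[i]) return false;           // collision: caller handles
            slots_[i] = p; used_[i] = true;
            return true;
        }
        // An output port requests a packet by the low bits of its timestamp.
        bool fetch(uint64_t lowBits, Packet* out) {  // O(1) lookup
            uint64_t i = lowBits & mask_;
            if (!used_[i]) return false;
            *out = slots_[i]; used_[i] = false;
            return true;
        }
    };

    int main() {
        TsBuffer buf(8);                          // 256 slots (assumed size)
        buf.insert({1234, 0});
        Packet p;
        if (buf.fetch(1234 & 0xff, &p))
            std::printf("got ts=%llu\n", (unsigned long long)p.ts);
    }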
We study the packet scheduling problem with a pipelined scheduler to reduce the length of a time slot. The performance of the OpCut switch, like that of the input-buffered switch, is to a large extent constrained by the efficiency of the scheduling algorithm, which determines which packets are sent from the input ports to the output ports and which packets are buffered. More advanced algorithms lead to better performance; however, the complexity of the algorithm is constrained by the requirement that it converge to a solution within one time slot, i.e., the time to send a data block or a cell, because a decision is needed in every time slot. We adopt a divide-and-conquer approach, breaking the scheduling problem into smaller problems, each of which finishes within a time slot. Basically, we introduce multiple sub-schedulers, each taking the partial schedule calculated by the sub-scheduler of lower order and augmenting the schedule with more packet transmissions. The advantage of such a pipelined scheduler is that advanced scheduling algorithms can be used without increasing the length of the time slot. We rigorously prove that duplicate scheduling, i.e., multiple sub-schedulers scheduling the same packet, does not occur with our algorithm when two sub-schedulers are used. We also propose a technique called "adaptive pipelining" to reduce packet latency when the traffic load is low. Basically, when the traffic is light, the scheduling problem is simple, and fewer sub-schedulers may suffice, so packets may experience less delay. This work was first presented at IEEE Infocom 2011, and an extended version has been accepted for publication in IEEE Journal of Lightwave Technology.
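An illustrative toy of the pipeline structure (the augmentation rule here is a plain greedy pass, not the paper's algorithm, and the duplicate-avoidance logic is omitted):

    #include <cstdio>
    #include <utility>
    #include <vector>

    using Matrix = std::vector<std::vector<int>>;  // queued cells: [input][output]

    struct Partial {
        std::vector<int> inUsed, outUsed;          // 0/1 flags per port
        std::vector<std::pair<int,int>> pairs;     // scheduled transmissions
    };

    // One pipeline stage: augment the partial schedule handed over by the
    // previous sub-scheduler with more compatible (input, output) pairs.
    void augment(const Matrix& q, Partial& s) {
        int n = (int)q.size();
        for (int i = 0; i < n; ++i) {
            if (s.inUsed[i]) continue;
            for (int j = 0; j < n; ++j)
                if (!s.outUsed[j] && q[i][j] > 0) {
                    s.inUsed[i] = s.outUsed[j] = 1;
                    s.pairs.push_back({i, j});
                    break;
                }
        }
    }

    int main() {
        Matrix q = {{1, 0, 2}, {0, 3, 0}, {1, 0, 0}};
        Partial s{std::vector<int>(3, 0), std::vector<int>(3, 0), {}};
        augment(q, s);   // sub-scheduler 1, time slot t
        augment(q, s);   // sub-scheduler 2 would run in slot t+1 on this state
        for (auto& p : s.pairs) std::printf("send %d -> %d\n", p.first, p.second);
    }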
We also study the packet scheduling problem when there are multiple wavelengths per optical fiber. The capacity of an optical packet switching network improves when the bandwidth of an optical fiber is better utilized, with multiple wavelengths carrying packets in parallel. The packet scheduling problem with multiple wavelengths is even more challenging than with a single wavelength, because multiple packets may arrive on an input fiber in one time slot, and the order of such packets must be maintained. We prove that the optimal scheduling problem is NP-hard by reducing the Set Packing problem to it. We further prove that the scheduling problem cannot be approximated within a constant factor. Therefore, we propose a set of practical scheduling algorithms, and our simulations show that they achieve good performance under typical conditions. This work was first presented at IEEE Infocom 2011, and an extended version is currently under preparation for journal submission.
Publications:
Bandwidth Puzzle – Securing the Incentives in P2P Networks (Collaborative Project)

Peer-to-Peer (P2P) technology has been widely deployed; it has been reported that over 65% of Internet traffic is P2P traffic. Our papers on incentive schemes in P2P networks address a critical problem: why should a peer be willing to contribute its resources and serve other peers, when serving other peers consumes network bandwidth, CPU cycles, and battery life? A selfish peer may choose not to serve any peer, and the system will fail if it allows such peers to get free rides.
The key challenge in designing a good incentive scheme is that the peers may collude. A naïve approach, for example, is to collect data transfer logs from the nodes and give higher priority to peers who have uploaded more data to other peers. This naïve approach fails because such logs can be forged; Alice may send a report to the system claiming that she has uploaded a large file to Bob when she actually has not. It is also useless to confirm this claim with Bob, because Bob may be a friend of Alice and may lie to the system so that Alice gets free credits.
The key breakthrough on this problem is the puzzle challenge scheme first described in our ICISS 2009 paper. The idea is to challenge all peers simultaneously with puzzles and require everyone to send back the answers before a timeout. The puzzle has two key properties: 1) it takes time to solve, and 2) it can be solved only if the peer has the content it claims to possess. Therefore, a peer cannot solve the puzzle if it does not have the content; neither can its friends help, because the friends have to solve their own puzzles before the timeout. As a result, fake transactions can be detected.
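A much-simplified toy of the content-dependent challenge idea (the real bandwidth puzzle construction and its parameters are in the ICISS 2009 paper; the hash and the sampling here are placeholders):

    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Toy FNV-1a hash standing in for a cryptographic hash.
    static uint64_t fnv1a(const std::vector<uint8_t>& bytes) {
        uint64_t h = 1469598103934665603ull;
        for (uint8_t b : bytes) { h ^= b; h *= 1099511628211ull; }
        return h;
    }

    // The verifier samples random offsets of the claimed content; the seed
    // is sent to the prover as the challenge.
    std::vector<size_t> makeChallenge(size_t contentLen, unsigned seed, int k) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<size_t> d(0, contentLen - 1);
        std::vector<size_t> offsets(k);
        for (auto& o : offsets) o = d(rng);
        return offsets;
    }

    // The prover can compute this only if it actually holds the content.
    uint64_t answer(const std::vector<uint8_t>& content,
                    const std::vector<size_t>& offsets) {
        std::vector<uint8_t> sampled;
        for (size_t o : offsets) sampled.push_back(content[o]);
        return fnv1a(sampled);
    }

    int main() {
        std::vector<uint8_t> content(4096, 7);  // content Alice claims to have sent
        auto offs = makeChallenge(content.size(), 42, 16);
        std::printf("expected answer: %llu\n",
                    (unsigned long long)answer(content, offs));
    }

A peer that lacks the content cannot produce the digest, and its colluders are occupied with their own puzzles before the timeout.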
The puzzle solution is a major contribution because it is the first real solution to the secure incentive problem. Prior to it, no solutions existed: a free rider could potentially get service without doing any work, which has been demonstrated by several research papers on popular P2P networks such as BitTorrent. The puzzle challenge scheme can be adopted in such P2P networks to improve the security of their incentive mechanisms.
Another major contribution is the performance bound I proved for the puzzle. The theoretical proof of the bound appears in the April 2012 issue of IEEE Transactions on Information Forensics & Security. The bound gives the expected number of bits the peers must upload to be able to solve the puzzles with a certain probability. Such a bound is needed because it gives the system an assurance about the expected outcome of applying the puzzle. The first ICISS 2009 paper provided a weaker bound; the new bound is much tighter because it can be approached asymptotically by a practical strategy. The major challenge in proving the bound is that the adversaries may collude and adopt any algorithm. The bound must be established against the optimal algorithm; however, the optimal algorithm is unknown due to the complexity of the puzzle challenge scheme. The breakthrough is an indirect proof technique, i.e., introducing an intermediate system that is easier to reason about and close to the real system in performance. Basically, I first bound the difference between the intermediate system and the real system, then prove the bound for the intermediate system, and combine the two to establish the bound for the real system.
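Schematically, with notation of my choosing rather than the paper's (Cost is the adversaries' expected upload, A a colluding strategy, B_int the bound proved for the intermediate system):

    % Indirect proof schematic (my notation, not the paper's).
    \begin{align*}
      &\bigl|\mathrm{Cost}_{\mathrm{real}}(\mathcal{A})
             - \mathrm{Cost}_{\mathrm{int}}(\mathcal{A})\bigr| \le \epsilon
        \quad \text{for every strategy } \mathcal{A}, \\
      &\mathrm{Cost}_{\mathrm{int}}(\mathcal{A}) \ge B_{\mathrm{int}}
        \quad\Longrightarrow\quad
        \mathrm{Cost}_{\mathrm{real}}(\mathcal{A}) \ge B_{\mathrm{int}} - \epsilon.
    \end{align*}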
Publications:
Coded Relay (Crelay) in Multi-hop Wireless Networks
Multi-hop wireless networks such as wireless mesh networks can extend wireless coverage at low cost. In this project, we design and implement a novel packet forwarding protocol called Coded Relay (Crelay) that exploits both overhearing and partial packets. The basic idea is to send only parity bytes to the downstream node when the downstream node has already overheard a corrupted version of the original packet, because it can then correct the errors with the parity bytes.
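A toy version of the forwarding decision (the feedback fields and the parity rule of thumb are assumptions for illustration, not Crelay's actual logic):

    #include <cstdio>

    struct Feedback {
        bool overheard;       // downstream holds a (possibly corrupted) copy
        int estimatedErrors;  // e.g., from an error estimator such as AMPS
    };

    // Reed-Solomon-style rule of thumb, assumed here for illustration:
    // correcting e symbol errors takes about 2*e parity symbols.
    int bytesToSend(int packetLen, const Feedback& fb) {
        if (!fb.overheard) return packetLen;      // no copy: send everything
        int parity = 2 * fb.estimatedErrors;
        return parity < packetLen ? parity : packetLen;
    }

    int main() {
        Feedback fb{true, 6};
        std::printf("send %d bytes instead of 1500\n", bytesToSend(1500, fb));
    }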
We design a novel protocol that exploits packet queuing delays as extra coordination time, allowing the upstream nodes to learn the states of the downstream nodes in a timely manner. The downstream node announces feedback about a packet it has overheard, and the upstream node can wait for the feedback before processing that packet, because it has other packets to send first. The main feature of the protocol is that it works seamlessly with upper layers such as TCP, because it processes individual packets, whereas other opportunistic routing protocols may require packet batching and cannot work with TCP.
Crelay is a departure from both traditional routing and existing opportunistic routing. By exploiting overhearing opportunities, Crelay achieves better performance than traditional routing. By exploiting the queuing delay, Crelay is a simpler and more upper-layer-friendly protocol than existing opportunistic routing protocols such as MORE. In addition, Crelay uses advanced error correction codes to recover partial packets in a multi-hop setting; in contrast, existing protocols may ignore partial packets, use only simple error correction codes, or rely on special hardware.
We implement the protocol in around 5,000 lines of C++ code in the Click modular router, and our experiments on an 11-node testbed show significant gains over existing protocols. This work will be presented at IEEE Globecom 2012.
arXiv technical report:
1. Z. Zhang, W. Hu* and J. Xie*, "Employing Coded Relay in Multi-hop Wireless Networks," http://arxiv.org/abs/1012.4136
Others
Network fault diagnosis. We estimate the most likely faulty link using only end-to-end path loss ratio observations. The novelty of this work is to exploit prior knowledge of the link loss ratios, which had previously been overlooked. This work won a Best Paper Award at IEEE ICC 2009; a toy sketch of the estimation idea follows the citation below.
· B. Sun* and Z. Zhang, "Probabilistic diagnosis of link loss using end-to-end path measurements and maximum likelihood estimation," in Proc. of IEEE ICC, Dresden, Germany, June 2009. 5 pages. Acceptance rate: 35% (1050/3000). Winner of a Best Paper Award. [paper]
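A toy sketch of the estimation idea (the loss model, scoring form, and numbers are my assumptions, not the paper's estimator):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Score the hypothesis "link l is faulty": paths through l should show
    // high loss, other paths their normal low loss; links with higher prior
    // loss ratios are considered more likely culprits.
    int mostLikelyFaultyLink(const std::vector<std::vector<int>>& paths,
                             const std::vector<double>& obsLoss,
                             const std::vector<double>& prior) {
        const double faultyLoss = 0.5, baseLoss = 0.01;  // assumed model values
        int best = -1;
        double bestScore = -1e300;
        for (int l = 0; l < (int)prior.size(); ++l) {
            double score = std::log(prior[l] + 1e-9);    // prior knowledge
            for (size_t p = 0; p < paths.size(); ++p) {
                bool through = false;
                for (int x : paths[p]) through = through || (x == l);
                double diff = obsLoss[p] - (through ? faultyLoss : baseLoss);
                score -= diff * diff;                    // penalize mismatch
            }
            if (score > bestScore) { bestScore = score; best = l; }
        }
        return best;
    }

    int main() {
        std::vector<std::vector<int>> paths = {{0, 1}, {1, 2}}; // links per path
        std::vector<double> obsLoss = {0.40, 0.45};             // both paths lossy
        std::vector<double> prior = {0.01, 0.02, 0.01};         // link loss priors
        std::printf("most likely faulty link: %d\n",
                    mostLikelyFaultyLink(paths, obsLoss, prior)); // -> link 1
    }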
A Medium Access Control (MAC) protocol combining polling and contention. We propose to combine polling with contention to improve network efficiency. We use a maximum likelihood estimate of the number of active nodes to determine the dynamic switching between the polling mode and the contention mode. This work was presented at IEEE Globecom 2011; a toy version of the estimation step is sketched after the citation below.
· S. Zhou* and Z. Zhang, "cMAC: A centralized MAC protocol for high speed wireless LANs," in Proc. of IEEE Globecom, Houston, USA, December 2011. 6 pages. Acceptance rate: 36.6% (1070/2923). [accepted] [IEEE final]
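A toy version of the estimation step, assuming a slotted contention model in which each of n active nodes transmits with probability p per slot (the likelihood model and the switching threshold are assumptions, not cMAC's):

    #include <cmath>
    #include <cstdio>

    // Under the assumed model:
    //   P(idle) = (1-p)^n, P(success) = n*p*(1-p)^(n-1), P(collision) = rest.
    double logLik(int n, double p, int idle, int success, int coll) {
        double pi = std::pow(1 - p, n);
        double ps = n * p * std::pow(1 - p, n - 1);
        double pc = 1 - pi - ps;
        if (pc <= 0) pc = 1e-12;                 // guard for tiny n
        return idle * std::log(pi) + success * std::log(ps)
             + coll * std::log(pc);
    }

    // Maximum likelihood search over n given observed slot outcome counts.
    int mlActiveNodes(double p, int idle, int success, int coll, int maxN) {
        int best = 1;
        double bestLL = -1e300;
        for (int n = 1; n <= maxN; ++n) {
            double ll = logLik(n, p, idle, success, coll);
            if (ll > bestLL) { bestLL = ll; best = n; }
        }
        return best;
    }

    int main() {
        // Counts observed over 100 contention slots (made-up numbers).
        int n = mlActiveNodes(0.1, 40, 35, 25, 64);
        const char* mode = (n > 8) ? "polling" : "contention"; // threshold assumed
        std::printf("estimated n=%d -> switch to %s mode\n", n, mode);
    }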
Enhancing random network coding with partial packet support. We enhance MORE, a packet forwarding protocol based on wireless random network coding, with partial packet support. This work was presented at IEEE Globecom 2011.
· W. Hu*, J. Xie* and Z. Zhang, "pMORE: Exploiting partial packets in opportunistic routing," in Proc. of IEEE Globecom, Houston, USA, December 2011. 6 pages. Acceptance rate: 36.6% (1070/2923). [accepted] [IEEE final]
A hybrid packet switch achieving 100% throughput. We propose the Distribute and Match switch, which employs a randomization stage and a matching stage and achieves 100% throughput. This work was presented at IEEE Globecom 2011.
Investigation of indoor localization with carrier phase difference (collaborative project). We measure the carrier phase difference with GNU Software Defined Radio and explore the possibility of accurate indoor localization. This work was presented at IEEE RFID 2010; the standard phase-distance relation underlying the approach is noted after the citation below.
· C. Hekimian, B. Grant, X. Liu, Z. Zhang and P. Kumar, "Accurate localization of RFID tags using phase difference," in Proc. of IEEE RFID, Orlando, USA, April 2010. 8 pages. Acceptance rate: 30% (39/130). [paper]
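For background, the standard relation between carrier phase and distance on which such approaches build (textbook material, not a result of the paper):

    % A carrier of wavelength \lambda accumulates phase over distance d:
    \[
      \phi = \frac{2\pi d}{\lambda} \;(\mathrm{mod}\; 2\pi)
      \qquad\Longrightarrow\qquad
      \Delta d = \frac{\lambda}{2\pi}\,\Delta\phi + k\lambda,
      \quad k \in \mathbb{Z},
    \]
    % so a measured phase difference fixes the range difference only up to
    % an integer number of wavelengths; this ambiguity must be resolved,
    % e.g., by combining measurements across frequencies or antennas.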