Publications by Year (* denotes my advisees)

2017


  1. [JCC'17] Zhuo Liu*, Bin Wang*, and W. Yu. HALO: a fast and durable disk write cache using phase change memory. Journal of Cluster Computing. 2017. In Press. Paper
  2. [OpenSHMEM'17] H. Fu*, M. Gorentla Venkata, N. Imam, and W. Yu. Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI. Fourth Workshop on OpenSHMEM and Related Technologies (OpenSHMEM'17). Annapolis, Maryland. August 2017.
  3. [CCGrid'17] H. Fu*, M. Gorentla Venkata, A. Roy Choudhury*, N. Imam, and W. Yu. High-Performance Key-Value Store On OpenSHMEM. 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Madrid, Spain. (Acceptance rate: 23%). May 2017.
  4. [IPDPS'17] T. Wang*, A. Moody, Y. Zhu*, K. Mohror, K. Sato, T. Islam, and W. Yu. MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers. 31st IEEE International Parallel and Distributed Processing Symposium. Orlando, FL. (Acceptance rate: 22%). May 2017.

2016


  1. [ParCo'16]: L. Shi*, Z. Wang, W. Yu, X. Meng. A Case Study of Tuning MapReduce for Efficient Bioinformatics in the Cloud. Journal of Parallel Computing. Accepted.
  2. [ParCo'16]: H. Fu*, H. Chen, Y. Zhu*, W. Yu. FARMS: Efficient MapReduce Speculation for Failure Recovery in Short Jobs. Journal of Parallel Computing. Accepted.
  3. [SC'16]: T. Wang*, K. Mohror, A. Moody, K. Sato, W. Yu. An Ephemeral Burst-Buffer File System for Scientific Applications. International Conference for High Performance Computing, Networking, Storage and Analysis. Salt Lake City, Utah. November 2016. (Acceptance rate: 18%).
  4. [PACT'16]: B. Wang*, Y. Zhu*, W. Yu. OAWS: Memory Occlusion Aware Warp Scheduling. International Conference on Parallel Architecture and Compilation Techniques (PACT 2016). September 2016. (Acceptance rate: 26%). Haifa, Israel. Paper.
  5. [TPDS'16]: C. Xu*, R. Goldstone, Z. Liu*, H. Chen*, B. Neitzel, W. Yu. Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers. IEEE Transactions on Parallel and Distributed Systems. Paper.
  6. [IJHPCA]: Teng Wang*, Kevin Vasko*, Zhuo Liu*, Hui Chen*, Weikuan Yu. Enhance Scientific Application I/O with Cross-Bundle Aggregation. International Journal of High Performance Computing Applications. Paper.

2015


  1. [NAS'15]: Fang Zhou*, Hai Pham*, Jianhui Yue*, Hao Zou* and Weikuan Yu. SFMapReduce: An Optimized MapReduce Framework for Small Files. IEEE International Conference on Network, Architecture and Storage (NAS). August 2015, Boston, MA. Paper.
  2. [MASCOTS'15]: X. Wang*, B. Wang*, Z. Liu*, W. Yu. Preserving Row Buffer Locality for PCM Wear-Leveling Under Massive Parallelism. 23rd International Conference on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. October 2015, Atlanta, GA. Paper.
  3. [Cluster'15]: T. Wang*, H.S. Oral, H. Pritchard*, B. Wang* and W. Yu. TRIO: Burst Buffer Based I/O Orchestration. IEEE International Conference on Cluster Computing. September 2015, Chicago, IL. Paper.
  4. [ICS'15]: B. Wang*, W. Yu, X.H. Sun, X. Wang. DaCache: Memory Divergence-Aware GPU Cache Management. 29th International Conference on Supercomputing, June 2015, Newport Beach, CA. Paper.
  5. [IPDPS'15]: Y. Wang*, H. Fu*, and W. Yu. Cracking Down MapReduce Failure Amplification through Analytics Logging and Migration. 29th IEEE International Parallel and Distributed Processing Symposium (Acceptance rate: 22%). Hyderabad, India. May 2015. Paper.
  6. [DATE'15]: B. Wang*, Z. Liu*, X. Wang*, and W. Yu. Eliminating Intra-Warp Conflict Misses in GPU. The 18th Conference on Design, Automation and Test in Europe. (Long paper. Acceptance rate: 22.4%). Grenoble, France. March 2015. Paper.
  7. [TC'15]: W. Yu, Y. Wang, X. Que, C. Xu. Virtual Shuffling for Efficient Data Movement in MapReduce. IEEE Transactions on Computers. Paper.
  8. [DISCS'15]: H. Fu, Y. Zhu, W. Yu. A Case Study of MapReduce Speculation for Failure Recovery. The 2015 International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15). Paper.
  9. [DISCS'15]: L. Shi, Z. Wang, W. Yu, X. Meng. Performance Evaluation and Tuning of BioPig for Genomic Analysis. The 2015 International Workshop on Data-Intensive Scalable Computing Systems (DISCS'15). Paper.

2014


  1. [BigData'14]: T. Wang*, S. Oral, Y. Wang*, B. Settlemyer, S. Atchley, W. Yu. BurstMem: A High-Performance Burst Buffer System for Scientific Applications. 2014 IEEE Conference on Big Data (Acceptance rate: 18.5%). Washington, DC. October 2014. Paper.
  2. [DISCS'14]: T. Wang*, K. Vasko*, Z. Liu*, H. Chen*, W. Yu. BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution. The 2014 International Workshop on Data-Intensive Scalable Computing Systems. New Orleans, LA. November 2014. Paper.
  3. [Sigmetrics'14]: J. Tan, Y. Wang, W. Yu, L. Zhang. Non-work-conserving effects in MapReduce: Diffusion Limit and Criticality. ACM SigMetrics 2014 (Acceptance rate: 17%). Austin, TX. June 2014. Paper.
  4. [IPDPS'14]: Y. Wang, R. Goldstone, W. Yu, T. Wang. Characterization and Optimization of Memory-Resident MapReduce on HPC Systems. 28th IEEE International Parallel and Distributed Processing Symposium (Acceptance rate: 21%). Tucson, AZ. May 2014. Paper.
  5. [TPDS'14]: W. Yu, Y. Wang and X. Que. Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration. IEEE Transactions on Parallel and Distributed Systems. Paper.
  6. [CPE'14]: G. F. Lofstead, Q. Liu, J. Logan, Y. Tian, H. Abbasi, N. Podhorszki, J. Y. Choi, S. Klasky, R. Tchoua, R. A. Oldfield, M. Parashar, N. Samatova, K. Schwan, A. Shoshani, M. Wolf, K. Wu, W. Yu. Hello ADIOS: The Challenges and Lessons of Developing Leadership Class I/O Frameworks. Concurrency and Computation: Practice and Experience, John Wiley and Sons. Paper.

2013


  1. [SC'13]: X. Li, Y. Wang, Y. Jiao, C. Xu, W. Yu. CooMR: Cross-Task Coordination for Efficient Data Management in MapReduce Programs. Li and Wang contributed equally to the paper. Denver, CO. (Acceptance Rate: 20%). November 2013. Paper.
  2. [PACT'13]: B. Wang, B. Wu, D. Li, X. Shen, W. Yu, Y. Jiao, J. Vetter. Exploring Hybrid Memory for GPU Energy Efficiency through Software-Hardware Co-Design. The 22nd International Conference on Parallel Architecture and Compilation Techniques (PACT'13). (Acceptance Rate: 17%). September 2013. Edinburgh, Scotland. Paper.
  3. [Cluster'13]: Z. Liu, J. Lofstead, T. Wang, W. Yu. A Case of System-Wide Power Management for Scientific Applications. IEEE International Conference on Cluster Computing. (Acceptance rate: 31%). Indianapolis, IN. September 2013. Paper.
  4. [MASCOTS'13]: B. Wang, Y. Jiao, W. Yu, X. Shen, D. Li, J. Vetter. A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory. Short paper. August 2013. San Francisco, CA. Paper.
  5. [WBDB'13]: Y. Wang, Y. Jiao, C. Xu, X. Li, T. Wang, X. Que, C. Cira, B. Wang, Z. Liu, B. Bailey, W. Yu. Assessing the Performance Impact of High-Speed Interconnects on MapReduce. Invited paper to the Proceedings of Workshop on Big Data Benchmarks. 2013. Paper.
  6. [ICAC'13]: Y. Wang, J. Tan, W. Yu, L. Zhang, X. Meng. Preemptive ReduceTask Scheduling for Fair and Fast Job Completion. 10th International Conference on Autonomic Computing (ICAC'13). (Acceptance Rate: 22%). June 2013. Paper.
  7. [MSST'13]: Y. Tian, Z. Liu, S. Klasky, B. Wang, H. Abbasi, S. Zhou, N. Podhorszki, T. Clune, J. Logan and W. Yu. A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics. IEEE Symposium on Massive Storage Systems and Technologies (MSST'13). (Acceptance Rate: 13%). May 2013. Paper.
  8. [NAS'13]: Yuan Tian, Scott Klasky, Weikuan Yu, Bin Wang, Hasan Abbasi, Norbert Podhorszki, Ray Grout. DynaM: Dynamic Multiresolution Data Representation for Large-Scale Scientific Analysis. NAS 2013. Xi'an, China. July 2013. Paper.
  9. [ICCCN'13]: Zhuo Liu, Bin Wang, Teng Wang, Yuan Tian, Cong Xu, Yandong Wang, Weikuan Yu, Carlos A. Cruz, Shujia Zhou, Tom Clune, Scott Klasky. Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application. ICCCN 2013. Paper.
  10. [JCC'13]: Yuan Tian, Cong Xu, Weikuan Yu, Jeffrey S. Vetter, Scott Klasky, Honggao Liu, Saad Biaz. neCODEC: nearline data compression for scientific applications. Journal of Cluster Computing. April 2013. Paper.
  11. [CCGrid'13]: C. Xu, R. Graham, M. Venkata, Y. Wang, Z. Liu, and W. Yu. SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems. International Conference on Cluster, Cloud and Grid Computing (CCGrid'13). (Acceptance Rate: 22%). May 2013. Delft, Netherlands. Paper.
  12. [IPDPS'13]: Y. Wang, C. Xu, X. Li and W. Yu. JVM-Bypass for Efficient Hadoop Shuffling. International Parallel and Distributed Processing Symposium (IPDPS'13). (Acceptance Rate: 21%). May 2013. Boston, MA. Paper.
  13. [IPDPS'13 PhD Forum]: Bin Wang, Weikuan Yu. Performance and Power Simulation for Versatile GPGPU Global Memory. 2013 IPDPS PhD Forum. Boston, MA.

2012


  1. [First Place, 2012 ACM SRC Grand Finals]: Y. Tian and W. Yu. Smart-IO: System-Aware Two-Level Data Organization for Efficient Scientific Analytics. 2012 ACM Grand Finals Student Research Competition. ACM Awards Ceremony, San Francisco, CA. June 2012. [ACM SRC Website].
  2. [HPDC'12]: Yuan Tian, Scott Klasky, Weikuan Yu, Hasan Abbasi, Bin Wang, Norbert Podhorszki, Ray W. Grout, Matthew Wolf. A system-aware optimized data organization for efficient scientific analytics. Poster. HPDC 2012. (Acceptance Rate: 23% combined with full papers). 125-126. Paper.
  3. [SC'12]: D. Li, J.S. Vetter, and W. Yu. Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool. SC'12. (Acceptance Rate: 21%). Salt Lake City, 2012. Paper.
  4. [CEE'12]: Y. Tian*, W. Yu, J.S. Vetter. RXIO: Design and Implementation of High Performance RDMA-capable GridFTP. Journal of Computers and Electrical Engineering. Vol 38 (2012) 772-784.
  5. [IJPP'12]: V. Tipparaju, E. Apra, W. Yu, Xinyu Que, J.S. Vetter. Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing. International Journal of Parallel Programming. 2012.
  6. [JPDC'12]: W. Yu, X. Que, V. Tipparaju, and J.S. Vetter. HiCOO: Hierarchical Cooperation for Scalable Communication in Global Address Space Programming Models on Cray XT Systems. Journal of Parallel and Distributed Computing (JPDC). 2012. Paper.
  7. [IPDPS'12]: D. Li, J.S. Vetter, G. Marin, C. McCurdy, C. Cira, Z. Liu, and W. Yu. Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications. International Parallel and Distributed Processing Symposium (IPDPS'12). (Acceptance Rate: 21%). May 2012. Shanghai, China. Paper.
  8. [MASCOTS'12]: Y. Tian, S. Klasky, W. Yu, H. Abbasi, B. Wang, N. Podhorszki, R. Grout, M. Wolf. SMART-IO: SysteM-AwaRe Two-Level Data Organization for Efficient Scientific Analytics. MASCOTS 2012. 181-188. Paper.
  9. [MASCOTS'12]: Z. Liu, B. Wang, P. Carpenter, D. Li, J.S. Vetter, W. Yu. PCM-Based Durable Write Cache for Fast Disk I/O. MASCOTS 2012. 451-458. Paper.

2011


  1. [SC'11]: Y. Wang, X. Que, W. Yu, D. Goldenberg, D. Sehgal. Hadoop Acceleration through Network Levitated Merging. SC11. (Acceptance Rate: 21%). Seattle, WA. Paper, Project Website, Code Download.
  2. [ICPP'11]: Weikuan Yu, Vinod Tipparaju, Xinyu Que, Jeffrey S. Vetter. Virtual Topologies for Scalable Resource Management and Contention Attenuation in a Global Address Space Model on the Cray XT5. ICPP 2011: 235-244. Paper.
  3. [Cluster'11]: Yuan Tian, Scott Klasky, Hasan Abbasi, Jay F. Lofstead, Ray W. Grout, Norbert Podhorszki, Qing Liu, Yandong Wang, Weikuan Yu. EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization. CLUSTER 2011: 93-102. Paper.
  4. [Cluster'11]: Weikuan Yu, K. John Wu, Wei-Shinn Ku, Cong Xu, Juan Gao. BMF: Bitmapped Mass Fingerprinting for Fast Protein Identification. CLUSTER 2011: 17-25. Paper.
  5. [CCGrid'11]: Xinyu Que*, Weikuan Yu, Vinod Tipparaju, Jeffrey S. Vetter, Bin Wang. Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems. CCGRID 2011: 352-361. Paper.
  6. [ERSS'11]: Z. Liu*, J. Zhou, W. Yu, F. Wu, X. Qin and C. Xie. MIND: A Black-Box Energy Consumption Model for Disk Arrays. In 1st International Workshop on Energy Consumption and Reliability of Storage Systems (ERSS'11). Orlando, Florida, July 2011. Paper.

2010


  1. [CF'10]: V. Tipparaju, E. Apra, W. Yu, J.S. Vetter. Enabling a highly-scalable global address space model for petascale computing. International Conference on Computing Frontiers. (Acceptance Rate: 25%). Bertinoro, Italy. May 2010. Paper.
  2. [ISC'10]: Weikuan Yu, Xinyu Que, Vinod Tipparaju, Richard L. Graham, Jeffrey S. Vetter. Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems. Computer Science - R&D 25(1-2): 57-64 (2010). Paper.

2009


  1. [IPDPS'09]: W. Yu, O. Drokin, J.S. Vetter. Design, Implementation, and Evaluation of Transparent pNFS on Lustre. 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS'09). (Acceptance Rate: 23%). Rome, Italy.

2008


  1. [SC'08]: N.S.V. Rao, W. Yu, S.W. Poole, W.R. Wing, J.S. Vetter. Wide-Area Performance Profiling of 10GigE and InfiniBand Technologies. SC08. (Acceptance Rate: 21%). Nov 2008. Austin, TX.
  2. [SC'08]: S. Alam, R. Barrett, M. Bast, M. R. Fahey, J. Kuehn, C. McCurdy, J. Rogers, P. Roth, R. Sankaran, J. S. Vetter, P. Worley, W. Yu. Early Evaluation of IBM BlueGene/P. SC08. (Acceptance Rate: 21%). Nov 2008. Austin, TX. Paper.
  3. [IPDPS'08]: W. Yu, J.S. Vetter, Sarp Oral. Performance Characterization and Optimization of Parallel I/O on the Cray XT. IPDPS 2008. (Acceptance Rate: 26%). April 2008. Miami, FL. Paper.
  4. [ICPP'08]: W. Yu, J.S. Vetter. ParColl: Partitioned Collective I/O on the Cray XT. International Conference on Parallel Processing (ICPP'08). (Acceptance Rate: 31%). Portland, OR. Paper.
  5. [CCGrid'08]: Weikuan Yu, Jeffrey S. Vetter. Xen-Based HPC: A Parallel I/O Perspective. CCGRID 2008. Paper.
  6. [NAS'08]: Weikuan Yu, Nageswara S. V. Rao, Jeffrey S. Vetter. Experimental Analysis of InfiniBand Transport Services on WAN. NAS 2008. Chongqing, China. Paper.
  7. [EuroPar'08]: Weikuan Yu, Sarp Oral, Shane Canon, Jeffrey S. Vetter, Ramanan Sankaran. Empirical Analysis of a Large-Scale Hierarchical Storage System. Euro-Par 2008: 130-140. Canary Islands, Spain. Paper.

2007


  1. [CCGrid'07]: Weikuan Yu, Jeffrey S. Vetter, Shane Canon, Song Jiang: Exploiting Lustre File Joining for Effective Collective IO. CCGRID 2007. Paper.
  2. [ICPP'07]: Feng Chen, Song Jiang, Weisong Shi, Weikuan Yu: FlexFetch: A History-Aware Scheme for I/O Energy Saving in Mobile Computing. ICPP 2007. Paper.

2006 and Before

  1. [ICPP'06]: Shuang Liang, Weikuan Yu, Dhabaleswar K. Panda: High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA. ICPP 2006: 391-398. Paper.
  2. [ICPP'06]: Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda: Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand. ICPP 2006. Paper.
  3. [IPDPS'06]: Weikuan Yu, Qi Gao, Dhabaleswar K. Panda: Adaptive connection management for scalable MPI over InfiniBand. IPDPS 2006. Paper.
  4. [IPDPS'05]: Weikuan Yu, Timothy S. Woodall, Richard L. Graham, Dhabaleswar K. Panda: Design and Implementation of Open MPI over Quadrics/Elan4. IPDPS 2005. Paper.
  5. [ICS'05]: Weikuan Yu, Shuang Liang, Dhabaleswar K. Panda: High performance support of parallel virtual file system (PVFS2) over Quadrics. ICS 2005. Paper.
  6. [Cluster'05]: Pavan Balaji, Wu-chun Feng, Qi Gao, Ranjit Noronha, Weikuan Yu, Dhabaleswar K. Panda. Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines. CLUSTER 2005. Paper.
  7. [IJHPCA'05]: Weikuan Yu, Sayantan Sur, Dhabaleswar K. Panda, Rob T. Aulwes, Richard L. Graham. High Performance Broadcast Support in LA-MPI over Quadrics. IJHPCA 19(4): 453-463 (2005). Paper.
  8. [Cluster'04]: Weikuan Yu, Dhabaleswar K. Panda, Darius Buntinas. Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM. CLUSTER 2004. Paper.
  9. [IEEE Micro'04]: Jiuxing Liu, B. Chandrasekaran, Weikuan Yu, Jiesheng Wu, Darius Buntinas, Sushmitha P. Kini, Dhabaleswar K. Panda, Pete Wyckoff: Microbenchmark Performance Comparison of High-Speed Cluster Interconnects. IEEE Micro 24(1): 42-51 (2004). Paper.
  10. [HiPC'04]: Weikuan Yu, Jiesheng Wu, Dhabaleswar K. Panda. Fast and Scalable Startup of MPI Programs in InfiniBand Clusters. HiPC 2004. Paper.
  11. [ICPP'03]: Weikuan Yu, Darius Buntinas, Dhabaleswar K. Panda. High Performance and Reliable NIC-Based Multicast over Myrinet/GM-2. ICPP 2003. Paper.
  12. [HotI'03]: Jiuxing Liu, Balasubramanian Chandrasekaran, Weikuan Yu, Jiesheng Wu, Darius Buntinas, Sushmitha Kini, Peter Wyckoff, Dhabaleswar K. Panda. Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects. Hot Interconnects 2003. Stanford, CA. Paper.
  13. [SC'03]: Jiuxing Liu, B. Chandrasekaran, Jiesheng Wu, Weihang Jiang, Sushmitha P. Kini, Weikuan Yu, Darius Buntinas, Pete Wyckoff, Dhabaleswar K. Panda: Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics. SC 2003. Paper.

[Complete List] from the PASL Publication Database.