The FSU MPI project
Overview
We investigate techniques for optimizing the performance of MPI
programs in cluster-of-workstations environments. We consider Ethernet
and InfiniBand clusters with single-CPU, SMP, and multi-core nodes. Our current
efforts include the following:
- Understanding important performance issues in MPI collective
operations
- Developing topology- and architecture-specific communication
algorithms
- Developing practical and accurate performance models for MPI
collective communication algorithms
- Investigating adaptive MPI library implementation techniques
- Investigating the integrated compiler and library approach (compiled
communication) for MPI optimizations.
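As a rough illustration of what a performance model for a collective communication algorithm looks like, the sketch below uses the well-known Hockney (alpha-beta) model, in which sending an m-byte message costs alpha + m*beta, to compare two textbook all-gather (all-to-all broadcast) algorithms. This is not the project's own model. The latency and bandwidth values are hypothetical, and the model ignores network contention and switch topology.

```python
import math


def ring_allgather_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Ring all-gather: p-1 steps; in each step every process forwards
    one m-byte block to its neighbor."""
    return (p - 1) * (alpha + m * beta)


def recursive_doubling_allgather_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Recursive-doubling all-gather: log2(p) steps; step k exchanges
    2**k blocks, so (p-1)*m bytes are sent per process in total."""
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two")
    steps = int(math.log2(p))
    return steps * alpha + (p - 1) * m * beta


if __name__ == "__main__":
    alpha = 50e-6       # hypothetical per-message latency (seconds)
    beta = 1 / 125e6    # seconds per byte at roughly 1 Gbps
    for m in (1024, 1024 * 1024):
        ring = ring_allgather_time(16, m, alpha, beta)
        rd = recursive_doubling_allgather_time(16, m, alpha, beta)
        print(f"m={m:>8} B  ring={ring * 1e6:10.1f} us  "
              f"recursive doubling={rd * 1e6:10.1f} us")
```

With these assumed numbers, recursive doubling wins for small messages because it needs only log2(p) message startups instead of p-1, while for large messages both times are dominated by the same (p-1)*m*beta bandwidth term.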
People
Faculty:
Current Students:
- Joshua Lawrence
- Wickus Nienaber
- Pitch Patarasuk
- Matthew Small
- Chi Zhang
Past students:
- Ahmad Faraj (Ph.D. 2006, currently at IBM)
Equipment
- draco.cs.fsu.edu (since Summer 2007)
- Number of nodes: 18
- Node: Dell PowerEdge 1950 (two 2.33 GHz quad-core Intel Xeon E5345
processors, 8 cores per node)
- Memory: 8GB per node
- Connectivity: (1) InfiniBand DDR (20 Gbps), (2) 1 Gbps managed Ethernet,
(3) 100 Mbps Ethernet
- cetus.cs.fsu.edu (since Fall 2004)
- Number of nodes: 36
- Node: Dell Dimension 2400 (2.8 GHz Pentium 4)
- Memory: 640MB per node
- Connectivity: 1 Gbps and 100 Mbps reconfigurable Ethernet (8 Ethernet
switches available)
Publications
Journal Publications
- A. Faraj, P. Patarasuk, and X. Yuan, "A Study of Process Arrival Patterns for MPI Collective Operations," International Journal of Parallel Programming, accepted.
- A. Faraj, P. Patarasuk, and X. Yuan, "Bandwidth Efficient All-to-all Broadcast on Switched Clusters," International Journal of Parallel Programming, accepted.
- P. Patarasuk, A. Faraj, and X. Yuan, "Techniques for Pipelined Broadcast on Ethernet Switched Clusters," Journal of Parallel and Distributed Computing, 68(6):809-824, June 2008.
- R. G. Lane, S. Daniels, and X. Yuan, "An Empirical Study of Reliable Multicast Protocols over Ethernet-Connected Networks," Performance Evaluation Journal, Vol. 64, No. 3, pages 210-228, March 2007.
- A. Faraj, X. Yuan, and P. Patarasuk, "A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters," IEEE Transactions on Parallel and Distributed Systems, Vol. 18, No. 2, pages 264-276, February 2007.
- A. Karwande, X. Yuan, and D. K. Lowenthal, "An MPI Prototype for Compiled Communication on Ethernet Switched Clusters," Journal of Parallel and Distributed Computing (JPDC), special issue on Design and Performance of Networks for Super-, Cluster-, and Grid-Computing, Vol. 65, No. 10, pages 1123-1133, October 2005.
- X. Yuan, R. Melhem, and R. Gupta, "Algorithms for Supporting Compiled Communication," IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 2, pages 107-118, February 2003.
Conference Publications
- P. Patarasuk and X. Yuan, "Efficient MPI_Bcast across Different Process Arrival Patterns," the 2008 International Parallel and Distributed Processing Symposium (IPDPS), Miami, FL, April 14-18, 2008.
- J. Lawrence and X. Yuan, "An MPI Tool for Automatically Discovering the Switch Level Topologies of Ethernet Clusters," the IPDPS Workshop on System Management Techniques, Processes, and Services, Miami, FL, April 14, 2008.
- W. Nienaber, X. Yuan, and Z. Duan, "On LID Assignment in Infiniband Networks," ACM/IEEE Symposium on Architectures for Networking and Communications Systems, pages 97-106, Orlando, FL, December 3-4, 2007.
- A. Faraj, P. Patarasuk, and X. Yuan, "A Study of Process Arrival Patterns for MPI Collective Operations," the 21st International Conference on Supercomputing (ICS), Seattle, WA, June 16-20, 2007.
- X. Yuan, W. Nienaber, Z. Duan, and R. Melhem, "Oblivious Routing for Fat-Tree Based System Area Networks with Uncertain Traffic Demands," (full paper), ACM SIGMETRICS, San Diego, CA, June 12-16, 2007.
- P. Patarasuk and X. Yuan, "Bandwidth Efficient All-reduce Operation
on Tree Topologies," IEEE IPDPS Workshop on High-Level Parallel
Programming Models and Supportive Environments (HIPS 2007),
Long Beach, CA, March 2007.
- A. Faraj, X. Yuan, and D. K. Lowenthal, "STAR-MPI: Self Tuned Adaptive Routines for MPI Collective Operations," the 20th ACM International Conference on Supercomputing (ICS), pages 199-208, June 2006.
- P. Patarasuk, A. Faraj, and X. Yuan, "Pipelined Broadcast on Ethernet Switched Clusters," the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Rhodes Island, Greece, April 25-29, 2006.
- A. Faraj, P. Patarasuk, and X. Yuan, "Bandwidth Efficient All-to-all Broadcast on Switched Clusters," the 2005 IEEE International Conference on Cluster Computing (Cluster 2005), Boston, MA, September 27-30, 2005.
- A. Faraj and X. Yuan, "Automatic Generation and Tuning of MPI Collective Communication Routines," the 19th ACM International Conference on Supercomputing (ICS), June 2005.
- A. Faraj and X. Yuan, "An Empirical Approach for Efficient All-to-All Personalized Communication on Ethernet Switched Clusters," the 2005 International Conference on Parallel Processing (ICPP), Oslo, Norway, June 14-17, 2005.
- A. Faraj and X. Yuan, "Message Scheduling for All-to-all Personalized Communication on Ethernet Switched Clusters," the 19th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Denver, Colorado, April 4-8, 2005.
- A. Karwande, X. Yuan, and D. Lowenthal, "CCMPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters," ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), San Diego, California, June 11-13, 2003.
- A. Faraj and X. Yuan, "Communication Characteristics in the NAS Parallel Benchmarks," Fourteenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS 2002), Cambridge, MA, November 4-6, 2002.
- X. Yuan, S. Daniels, A. Faraj, and A. Karwande, "Group Management Schemes for Implementing MPI Collective Communication over IP-Multicast," the 6th International Conference on Computer Science and Informatics, Durham, NC, March 8-14, 2002.
- R. G. Lane, D. Scott, and X. Yuan, "An Empirical Study of Reliable Multicast Protocols over Ethernet-Connected Networks," International Conference on Parallel Processing (ICPP'01), pages 553-560, Valencia, Spain, September 3-7, 2001.
Thesis
- Ahmad Faraj, "Automatic Empirical Techniques for Developing Efficient MPI Collective Communication Routines," Ph.D. Dissertation, Department of Computer Science, Florida State University, Fall 2006.
- Ahmad Faraj, "Communication Characteristics in the NAS Parallel Benchmarks," M.S. Thesis, Department of Computer Science, Florida State University, Fall 2002.
- Amit Karwande, "CC-MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters," M.S. Thesis, Department of Computer Science, Florida State University, Spring 2003.
Software
- STAR-MPI package
- STAGE_MPI package
- An MPI tool for discovering the switch-level topology of a homogeneous
Ethernet cluster.
- CCMPI version 1.0: an undocumented prototype MPI implementation that
optimizes a number of MPI collective communication routines for Ethernet
switched Linux clusters through compiled communication.
The README file describes how to use CCMPI-1.0.
Sponsors
This research is supported by NSF grants CCR-0073482, CCF-0342540, and CCF-0541096.
Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation.