STAR-ALLTOALL

[Email Ahmad Faraj at faraja@us.ibm.com or Xin Yuan at xyuan@cs.fsu.edu in case of any questions regarding STAR-ALLTOALL]

The adaptive MPI_Alltoall routine is part of the STAR-MPI library developed at Florida State University by Ahmad Faraj and Xin Yuan in the research project "Delayed Finalization of MPI Collective Routines", funded by the National Science Foundation (CCF-0541096). Although STAR-MPI includes routines for many MPI collective operations, this page provides only the source file(s) for the STAR_Alltoall routine and some benchmarks for testing it. The instructions below apply once the package STAR-MPI.tar.gz has been downloaded. NOTE: the code works with any MPI library (MPICH, LAM, MVAPICH, OPENMPI, etc.).

gunzip STAR-MPI.tar.gz
tar xvf STAR-MPI.tar

This will create a directory named STAR-MPI with these contents: algorithms (directory), benchmarks (directory), star-mpi.c, and star-mpi.h. The benchmarks directory contains the benchmarks used in the paper; they are discussed at the end. First we show how to set up and use the STAR_Alltoall routine.

Set the STAR_HOME environment variable to point to the STAR-MPI directory. For example, if STAR-MPI is under /home/faraj, issue:

setenv STAR_HOME /home/faraj/STAR-MPI

To use the STAR_Alltoall routine, add #include <star-mpi.c> to your source file.
Then replace every call to MPI_Alltoall with STAR_Alltoall and append one extra parameter at the end of the argument list, the call_site_number. STAR_Alltoall differentiates between call sites, so each call must tell the routine which call site it comes from. Consider the following code segment:

int main(int argc, char ** argv){
.....
STAR_Alltoall(sbuff, .., comm, 0);
if (x == 10)
......
STAR_Alltoall(temp, ....., comm, 1);
.....
}

The parameters 0 and 1 at the end of the two STAR_Alltoall calls tell the routine that the first call has call_site_number 0 and the second has call_site_number 1. Each call site must have a unique call_site_number. Note for FORTRAN users: the call_site_number parameter comes before the last argument "err".
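To see why each call site needs its own number, here is a minimal sketch of per-call-site bookkeeping. It is not taken from star-mpi.c: the names site_state, site_table, site_record_invocation, and MAX_CALL_SITES are hypothetical, and the sketch only assumes that the library keeps some separate state for each call_site_number.

```c
#include <assert.h>

/* Hypothetical sketch: none of these names come from star-mpi.c. */
#define MAX_CALL_SITES 16          /* assumed bound, chosen for illustration */

typedef struct {
    int invocations;               /* calls seen so far at this call site */
} site_state;

/* Static storage is zero-initialized, so every site starts at 0 invocations. */
site_state site_table[MAX_CALL_SITES];

/* Count one invocation at the given call site and return the new total. */
int site_record_invocation(int call_site_number)
{
    assert(call_site_number >= 0 && call_site_number < MAX_CALL_SITES);
    site_table[call_site_number].invocations += 1;
    return site_table[call_site_number].invocations;
}
```

In this sketch, two different calls that both passed 0 would share one counter, so whatever is recorded for one would be mixed with the other. Presumably that is the kind of confusion the unique call_site_number is meant to prevent.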

Assume your file is named file.c. To compile, issue the command:

mpicc -o exec file.c -DSTAR -I$STAR_HOME -I$STAR_HOME/algorithms

For FORTRAN, first compile star-mpi.c into an object file:

mpicc -c star-mpi.c -DSTAR -I$STAR_HOME -I$STAR_HOME/algorithms

Then link star-mpi.o with your file:

mpicc -o executable file.c star-mpi.o

*** Some FORTRAN compilers will complain about the number of underscores after the name of a routine being called in C. If that is the case, compile star-mpi.c with the extra flag -DSUNSCORE.

There are 8 algorithms in the STAR_Alltoall repository. By default, each algorithm is examined for 10 invocations, so STAR_Alltoall needs a minimum of 80 invocations to examine all algorithms. The assumption here is that your code invokes STAR_Alltoall sufficiently many times (more than 80). The number of invocations used to test each algorithm can be changed with a compile-time flag: for example, -DITER=5 examines each algorithm in 5 consecutive invocations.
There is also a -DDEBUG compile-time flag. It prints performance measurement results to a log file named "log-all.txt". Set this flag if you want to see the intermediate performance results.
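The invocation arithmetic above can be sketched in a few lines of C. This is only an illustration, not code from star-mpi.c: the function names min_invocations and algorithm_under_test are hypothetical, and the sketch assumes the algorithms are examined one after another in index order, which the text does not state.

```c
#include <assert.h>

#ifndef ITER
#define ITER 10            /* default from the text; override with -DITER=n */
#endif

#define NUM_ALGORITHMS 8   /* number of STAR_Alltoall algorithms per the text */

/* Minimum number of invocations needed to examine every algorithm. */
int min_invocations(int num_algorithms, int iter)
{
    assert(num_algorithms > 0 && iter > 0);
    return num_algorithms * iter;
}

/* Index of the algorithm examined at a 0-based invocation number,
 * or -1 once every algorithm has had its share of invocations.
 * Assumes algorithms are tried in index order (an assumption). */
int algorithm_under_test(int invocation, int num_algorithms, int iter)
{
    assert(invocation >= 0 && num_algorithms > 0 && iter > 0);
    if (invocation >= num_algorithms * iter)
        return -1;
    return invocation / iter;
}
```

With the defaults, min_invocations(NUM_ALGORITHMS, ITER) is 80. With -DITER=5 it drops to 40, so a program that calls STAR_Alltoall only a few dozen times at a call site may want a smaller ITER.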

Description of Benchmarks:

Under the benchmarks directory, there are two directories: micro-bench and applications. Let us start with micro-bench.

micro-bench:

Under this directory, there are a Makefile, test-native.c, and test-star.c. Running make produces two executables, micro-star and micro-native. Please refer to the paper to read more about these micro-benchmarks. Both executables take four parameters: msize (message size), comp_time (computation time in milliseconds), imf (imbalance factor), and ppn (number of processors per node). Assume msize = 8000, comp_time = 50, imf = 10, and ppn = 2. To run the executables on 16 processors, type:

mpirun -np 16 micro-star 8000 50 10 2

mpirun -np 16 micro-native 8000 50 10 2

applications:

There are three directories: ft, matrix, and vh. Together they contain four applications. To create the FT executables, cd to the ft directory. To make both the native and star versions of the ft benchmark, type:

make ft_star CLASS=C NPROCS=16

This creates a class "C" ft benchmark for 16 processors. The native version is built the same way:

make ft_native CLASS=C NPROCS=16

Both executables will be under the bin directory.

Under the matrix directory, there are two benchmarks: mt and fft2d. Just type make to create the different executables. For the fft2d benchmark, the problem size is set at compile time as a flag while the mt benchmark takes the problem size as a command line argument.

Finally, the vh directory contains the Virginia Hydrodynamics code. To create the star version of the executable, issue:

cp Makefile.star Makefile
make

Similarly, to get the native version, issue:

cp Makefile.native Makefile
make

The file zone.h defines the problem size.