CIS5930-07 Parallel Computing: Homeworks

Homework 1

Due date: Tuesday, 18 Sep 2001

How to turn in: Turn in a hardcopy to me in class, at 2 pm.

Note: You must justify all your answers.

  1. Design an efficient algorithm for the parallel prefix problem and give its time complexity, assuming that any processor can send a message to, or receive a message from, any other processor in constant time.

  2. Do Chapter 2, exercise # 1. Describe the algorithm precisely, giving each step. Also give the number of computational steps required in the worst case.

  3. Do Chapter 3, exercise # 10. You need not perform empirical studies.

Homework 2

(Was analysis of matrix-vector multiplication with row-wise striped distribution.)

Homework 3

Due date: Thursday, 25 Oct 2001

How to turn in: Email me a tar file containing your code and makefile, at ashoks@modi4.ncsa.uiuc.edu, by 3 pm on the due date.

Note: (i) Your code should run on the Origin 2000 at NCSA, and you must have a makefile. (ii) The tar file should not contain any executable or object files. (iii) The tar file should be called hw3.tar. On untarring (tar xvf hw3.tar), a directory called hw3 should be created, with the required files in it. (iv) Typing make in the hw3 directory should create the executable hw3.

The program: You will write a program in C to multiply two matrices (C = A*B) in parallel using MPI, with a block-checkerboard distribution of the matrices across the processes. You may use as many MPI features as you wish. You should write a file called f.c which defines a function called double f(char matrix, int i, int j). This function gives the value of the (i,j)th entry of the A matrix when its first argument is 'a', and the corresponding entry of the B matrix when the first argument is 'b'. In your code, let A be defined by A(i,j) = 0.5*i + j and B by B(i,j) = i + 0.5*j. (The first element is indexed (1,1), rather than (0,0).) The file f.c should not contain anything else; I will replace f.c with a few other function definitions in my tests.
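For example, with the A and B defined above, the entire contents of f.c would be:

```c
/* f.c: defines the input matrices.  A(i,j) = 0.5*i + j, B(i,j) = i + 0.5*j,
   with the first element indexed (1,1). */
double f(char matrix, int i, int j)
{
    if (matrix == 'a')
        return 0.5 * i + j;   /* entry (i,j) of A */
    else
        return i + 0.5 * j;   /* entry (i,j) of B */
}
```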

Your code should take a command line argument N; A, B, and C will then be N x N matrices. Your program should output C in row-major order to stdout, with each row separated by a '\n'. You may assume that the number of processors is a perfect square, and that the square root of the number of processors divides N. Your code should perform the multiplication using the systolic algorithm we discussed in class, not by directly applying a formula to the example matrices. Your code should work even when I change f. Furthermore, you should implement the matrix multiplication algorithm yourself, rather than using a library written by someone else!

Homework 4

Due date: Tuesday, 6 Nov 2001

How to turn in: Email me a C file containing your code (called hw4.c) at ashoks@modi4.ncsa.uiuc.edu, by 3 pm on the due date. You should also plot two speed-up curves, (i) with automatic parallelization and (ii) with OpenMP parallelization. Please turn in hardcopies of these figures at the beginning of class on 13 Nov 2001.

Note: Your code should run on the Origin 2000 at NCSA.

The program: You will write a program in C to multiply two matrices (C = A*B) and parallelize it (i) using automatic parallelization, and (ii) using OpenMP directives. The code you turn in should be the one with OpenMP directives. You should hardcode the definitions of the A and B matrices: let A be defined by A(i,j) = 0.5*i + j and B by B(i,j) = i + 0.5*j. (The first element is indexed (1,1), rather than (0,0).) Unlike in the previous homework, you will not have a separate file called f.c; just write the definitions of A and B inside your initialization loop.

Your code should take a command line argument N; A, B, and C will then be N x N matrices. Your program should output C in row-major order to stdout, with each row separated by a '\n'. (Unlike in the previous homework, you need not assume that the number of threads is a perfect square, or that the square root of the number of threads divides N.) You will be graded on the performance of your algorithm as well. Note that you may need to use a one-dimensional implementation of the matrices to get good performance, especially with automatic parallelization.