- Assignment 1
- Due: 17 Feb 2003.
- Assignment 2
- Due: 31 Mar 2003.

**Reading assignment:**
- Read Chapter 1: sections 1.5 and 1.8.
- Read Chapter 2: sections 2.2.1 - 2.2.4, 2.2.7, 2.4 - 2.7, and Table 2.1.

**Review questions:**
- How are computers classified, based on Flynn's taxonomy?
- Do you know what the following mean: SISD, SIMD, and MIMD?
- In which of the following classes does a traditional sequential computer fall: SISD, SIMD, or MIMD?
- In which of the following classes do most current commercial parallel computers fall: SISD, SIMD, or MIMD?
- In which of the following classes does a super-scalar, pipelined sequential computer fall: SISD, SIMD, or MIMD?
- How are MIMD computers classified, based on their address space organization?
- Do you know the following terminology: distributed memory, shared memory, distributed shared memory, SMP, UMA, NUMA, multicomputer, multiprocessor, centralized multiprocessor, distributed multiprocessor?

**Reading assignment:**
- Read Chapter 4: section 4.2.
- Read Chapter 7: sections 7.1-7.4.
- Read Chapter 17: sections 17.1-17.4.

**Review questions:**
- What is "false sharing"?
- Determine the diameter, bisection width, and cost for each of the following network topologies, as a function of the number of nodes, `n`: 2-D torus, 3-D torus, and hypercube.
- What are some parallel programming paradigms that we discussed in class?
- What is SPMD?
- What are the definitions of efficiency and speedup? What do they attempt to measure?
- What is a thread?
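The definitions of speedup and efficiency asked about above can be captured in a few lines. This is a minimal sketch (the function names are illustrative, not from the text): speedup is the ratio of sequential to parallel running time, and efficiency normalizes speedup by the processor count.

```python
def speedup(t_serial, t_parallel):
    """Speedup: ratio of sequential running time to parallel running time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency: speedup divided by the number of processors p.
    It measures how well the processors are utilized (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / p

# Example: a program takes 100 s sequentially and 25 s on 8 processors.
s = speedup(100.0, 25.0)        # 4.0
e = efficiency(100.0, 25.0, 8)  # 0.5 -- half the ideal utilization
```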

**Reading assignment:**
- Read Chapter 17: sections 17.1-17.7.
- Read the OpenMP standard: www.openmp.org.

**Review questions:**
- What are Amdahl's law and Gustafson's law?
- Given the sequential fraction, what are limits on speed-up obtained using Amdahl's law and Gustafson's law? Why do the two laws yield different results?
- Do you know the following? The concept of threads, the OpenMP execution model, compiling an OpenMP program on the SGI Origin 2000, compiler directives for creating a parallel region and work-sharing a `for` loop, reduction, and library calls to set the number of threads and to get the thread number.
- Can you give examples to demonstrate errors that can occur in a program when multiple threads execute a piece of code?
- What do you think is the most likely reason for the restrictions OpenMP places on the type of `for` loops?
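The two laws give different answers because they fix different things: Amdahl's law fixes the problem size and asks how much faster it runs; Gustafson's law fixes the running time and asks how much more work gets done as the problem scales with the processor count. A quick sketch (function names are illustrative), with `f` denoting the sequential fraction in each law's own sense:

```python
def amdahl(f, p):
    """Amdahl's law: speedup for a FIXED problem size, where f is the
    inherently sequential fraction of the work.  Bounded above by 1/f."""
    return 1.0 / (f + (1.0 - f) / p)

def gustafson(f, p):
    """Gustafson's law (scaled speedup): the problem grows with p, and f
    is the sequential fraction measured on the parallel machine."""
    return p - f * (p - 1)

# With f = 0.1 and p = 16 processors:
# amdahl(0.1, 16)    = 6.4   (can never exceed 1/0.1 = 10)
# gustafson(0.1, 16) = 14.5  (grows roughly linearly in p)
```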

**Reading assignment:**
- Read Chapter 17: sections 17.1-17.7.
- Read the OpenMP standard: www.openmp.org.

**Review questions:**
- Do you know the following? OpenMP constructs for reduction, avoiding barriers at the end of `for` loops, setting the scheduling policy, defining critical sections, and declaring variables private; and library calls for setting the number of threads and obtaining the number of processors.
- Do you know the semantics of reduction, private, firstprivate, and lastprivate?
- Can you give an example to demonstrate how changing the order of loops may enable more effective parallelization with OpenMP (see page 510 of the textbook)?

**Examples:** www.cs.fsu.edu/~asriniva/courses/hpcsa/examples/lec9.tar

**Reading assignment:**
- Read Chapter 3: section 3.5.
- Read the class notes.

**Review questions:**
- Do you know the following? The communication model that we are using, and parallel algorithms for reduction, prefix, and solving linear recurrences.
- Our communication model does not take into account the distance between processors. Is this justified? If we need to take it into account, how might you model the communication cost? Would the algorithms we discussed need to be modified to make them more effective? Would you choose certain architectures over others, in order to permit efficient implementation?
- What would you need to do to handle `n` that is not a power of two in the reduction and prefix algorithms? How would you change them if the number of processors is much greater than `n`? What are the speedup and efficiency of these algorithms?
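One data-parallel prefix algorithm can be simulated sequentially in a few lines. This sketch follows the Hillis-Steele style scan (which may differ in detail from the algorithm presented in class): in parallel step `d`, every element adds in the value `2^d` positions to its left, so an inclusive prefix sum finishes in about `log2(n)` steps, and nothing in it requires `n` to be a power of two.

```python
def inclusive_prefix_sum(x):
    """Data-parallel (Hillis-Steele style) inclusive scan, simulated
    sequentially.  Each pass over the array stands in for one parallel
    step; there are ceil(log2(n)) such steps."""
    a = list(x)
    n = len(a)
    step = 1
    while step < n:
        prev = a[:]  # in a real parallel run, a barrier separates the steps
        for i in range(step, n):
            a[i] = prev[i] + prev[i - step]
        step *= 2
    return a

# inclusive_prefix_sum([1, 2, 3, 4]) → [1, 3, 6, 10]
```

Note that the loop body reads only from the previous step's copy, which is exactly the write-after-read hazard a shared-memory implementation must guard against with a barrier or double buffering.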

**Review questions:**
- Do you know the following? The six "basic" MPI calls, `Isend`s and receives, collective communication calls, the semantics of the communication calls, the potential for deadlock, and how it can be avoided.

**Examples:** www.cs.fsu.edu/~asriniva/courses/hpcsa/examples/lec13.tar

**Review questions:**
- Do you know the following (in MPI)? Duplicating communicators, splitting communicators, defining topologies, and defining derived data types -- vectors and structures.

**Reading assignment:**
- Sections: 8.2, 8.3, 8.4.1, and 8.6.1 from the text.

**Review questions:**
- Do you know the following? Sequential matrix-vector multiplication, different ways of distributing matrices onto processors (terminology: striped, checkerboard, block, cyclic), and parallel matrix-vector multiplication with striped and checkerboard distributions.
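The row-striped distribution is easy to simulate sequentially. In this sketch (the function name and the loop over "processes" are illustrative), each of `p` processes owns a contiguous block of rows of `A`, holds a full copy of `x`, and computes its own slice of `y = Ax` with no communication:

```python
def matvec_row_striped(A, x, p):
    """Simulate row-striped (block-row) parallel matrix-vector multiply:
    process k owns a contiguous block of rows of A and computes the
    corresponding entries of y = A x.  Assumes every process holds a
    full copy of x (gathered beforehand in a real run)."""
    n = len(A)
    rows_per_proc = (n + p - 1) // p  # ceiling, so n need not divide evenly
    y = [0] * n
    for k in range(p):                   # each iteration stands in for one process
        lo = k * rows_per_proc
        hi = min(lo + rows_per_proc, n)
        for i in range(lo, hi):          # purely local computation
            y[i] = sum(A[i][j] * x[j] for j in range(len(x)))
    return y

# matvec_row_striped([[1, 2], [3, 4]], [1, 1], 2) → [3, 7]
```

In a real MPI implementation, the only communication is gathering `x` (or the result) across processes; the checkerboard distribution trades that for row/column sub-communications.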

**Miscellaneous:**
- Please select project and paper presentation topics, and get my approval.

**Reading assignment:**
- Sections: 8.4.1, 8.6.1, 11.1, and 11.2 from the text.

**Review questions:**
- Do you know the following? How to multiply two matrices, the time complexity of sequential matrix multiplication, the cache model that we used, and how the loop order can affect the number of cache misses in matrix multiplication.
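The loop-order effect can be seen by comparing two orderings that compute the same product. In a row-major layout, the `i-j-k` order strides down a column of `B` in the inner loop, while the `i-k-j` order streams along a row of `B`, touching consecutive memory locations. This sketch (function names are illustrative) shows both; Python lists don't expose cache behavior directly, but the access patterns mirror the C case:

```python
def matmul_ijk(A, B):
    """Naive i-j-k order: the innermost loop walks a COLUMN of B,
    striding across rows -- poor spatial locality in row-major storage."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B):
    """i-k-j order: the innermost loop walks ROW k of B contiguously,
    so consecutive accesses fall on the same cache lines."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = A[i][k]  # invariant across the inner loop
            for j in range(n):
                C[i][j] += aik * B[k][j]
    return C
```

Both orders perform the same `n^3` multiply-adds; only the miss count differs, which is why the loop interchange matters on real hardware.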

**Reading assignment:**
- Sections: 11.3 and 11.4 from the text.

**Review questions:**
- Do you know the following? Cache-aware and cache-oblivious algorithms for sequential matrix multiplication, how to multiply two matrices using one-dimensional and two-dimensional decompositions, and why we change the initial distribution of matrix blocks in Cannon's algorithm.
- If the matrices are initially distributed in a 1-D manner, is it worthwhile to change them to 2-D in order to perform the multiplication more efficiently? Determine the time taken for the redistribution, and the conditions under which the savings from the matrix multiplication make the redistribution worthwhile.
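The role of the initial redistribution in Cannon's algorithm can be seen in a sequential simulation. In this sketch, each "processor" on an `n x n` grid holds a single element (rather than a block, for brevity): the initial skew shifts row `i` of `A` left by `i` and column `j` of `B` up by `j`, so that after every subsequent one-step rotation each processor holds a matching `A`/`B` pair to multiply:

```python
def cannon_matmul(A, B):
    """Cannon's algorithm simulated sequentially, one matrix element per
    'processor' on an n x n grid.  The initial skew aligns the operands;
    each of the n iterations is one multiply-accumulate plus a single-step
    rotation of A (left) and B (up)."""
    n = len(A)
    # Initial alignment: row i of A shifted left by i, column j of B up by j.
    a = [[A[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                C[i][j] += a[i][j] * b[i][j]   # local multiply-accumulate
        # Rotate: a moves left by one, b moves up by one (wrapping around).
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C

# cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) → [[19, 22], [43, 50]]
```

Without the initial skew, processor `(i, j)` would repeatedly multiply mismatched operands; the skew is a one-time communication cost that makes all later steps nearest-neighbor shifts.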

**Reading assignment:**
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps.

**Review questions:**
- Do you know the following? The basic aims of domain decomposition, how the domain decomposition issue is modeled as a graph partitioning problem, the justification for the edge-cut metric as a measure of the communication cost (and its shortcomings), whether the graph partitioning problem is NP-complete, and the three geometric partitioning techniques that we studied, along with their advantages and disadvantages.
- Can you define a measure of communication cost that is better than the edge-cut metric?
- Can you define a metric to measure the communication cost so that the domain decomposition problem has a polynomial time algorithm?
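For concreteness, the edge-cut metric itself is a one-liner: count the edges whose endpoints fall in different partitions. This sketch (names are illustrative) also hints at its shortcoming: it counts edges, not the number of distinct remote vertices or messages, which is what actually drives communication cost.

```python
def edge_cut(edges, part):
    """Edge-cut metric: the number of edges whose two endpoints are
    assigned to different partitions.  `part[v]` maps vertex v to its
    partition number."""
    return sum(1 for (u, v) in edges if part[u] != part[v])

# A 4-vertex path 0-1-2-3 split as {0,1} | {2,3} cuts exactly one edge.
edges = [(0, 1), (1, 2), (2, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
# edge_cut(edges, part) → 1
```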

**Reading assignment:**
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps (same as in Lec 19).

**Review questions:**
- Do you know the following? The two combinatorial partitioning techniques that we studied, and their advantages and disadvantages; the fundamental difference between geometric and combinatorial techniques for graph partitioning; which of the two combinatorial techniques is used to refine an existing partition, and which can be used to create a partition from scratch; and the spectral method.

**Reading assignment:**
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps (same as in Lec 20).

**Review questions:**
- Do you know the following? The three basic steps in multilevel methods, algorithms that you can use in each of those steps, and the effectiveness of multilevel methods relative to the other methods that we have studied, in terms of computational effort and quality of the partitioning.

**Reading assignment:**
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps (same as in Lec 20).

- We completed the discussion of multilevel methods.

**Reading assignment:**
- Read sections 10.1, 10.2, and 10.3 from the text, and class notes.

**Review questions:**
- Do you know the following? What is the fundamental idea behind Monte Carlo methods? What is the basic idea behind Monte Carlo integration? What advantages does Monte Carlo integration have over traditional numerical quadrature? How does the error of Monte Carlo integration decrease with the number of samples? How is Monte Carlo traditionally parallelized, and how does it scale with the number of processes?
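The basic idea of Monte Carlo integration fits in a few lines: average the integrand at uniformly random sample points and scale by the interval length. This sketch (the function name is illustrative) is one-dimensional, but the key advantage over quadrature is that the `1/sqrt(n)` error behavior does not degrade with dimension:

```python
import random

def mc_integrate(f, a, b, n, seed=0):
    """Monte Carlo estimate of the integral of f over [a, b]: the mean
    of f at n uniform random points, scaled by the interval length.
    The statistical error decreases as 1/sqrt(n)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n

# Integral of x^2 over [0, 1] is 1/3; the estimate converges slowly.
est = mc_integrate(lambda x: x * x, 0.0, 1.0, 100000)  # close to 1/3
```

The traditional parallelization follows directly: give each process its own independent random stream, let it accumulate a partial sum, and combine the partial sums with a single reduction at the end, which is why Monte Carlo scales so well.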

**Reading assignment:**
- Read sections 10.1, 10.2, and 10.3 from the text, class notes, and the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/pprng.ps.

**Review questions:**
- Do you understand the terminology used in random number generation, such as 'cycle', 'seed', 'iteration function', and 'period'? What are low-discrepancy sequences (quasi-random numbers), how do they differ from pseudo-random numbers, and how does the integration error decrease with the number of samples when they are used? What are the two broad classes of random number parallelization techniques? Can you mention two methods under each class, and their relative advantages and disadvantages?
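Two common ways of splitting one random stream among processes can be sketched on a plain list standing in for the generator's output (the function names are illustrative, and a real generator would produce these subsequences directly rather than materialize the full stream):

```python
def leapfrog(stream, rank, nprocs):
    """Leapfrog partitioning: process `rank` takes every nprocs-th
    number from the shared stream, starting at offset `rank`."""
    return stream[rank::nprocs]

def blocking(stream, rank, nprocs):
    """Block (sequence-splitting) partitioning: process `rank` takes a
    contiguous slice of the stream."""
    size = len(stream) // nprocs
    return stream[rank * size:(rank + 1) * size]

nums = list(range(12))          # stand-in for a generator's output
# leapfrog(nums, 1, 3) → [1, 4, 7, 10]
# blocking(nums, 1, 3) → [4, 5, 6, 7]
```

The trade-off in brief: leapfrog needs no advance knowledge of how many numbers each process will consume, but can harm the statistical quality of some generators; blocking preserves each subsequence's properties but requires jumping ahead in the stream and bounding each process's consumption.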

- Paper presentation.

- Paper presentation.

- Paper presentation.

- Paper presentation.

- Project presentations.

Last modified: 28 April 2003