Lecture 1: introduction
- What is a parallel computer?
- Why parallel computing?

Lecture 2: architecture classification
- Flynn's taxonomy: SISD, SIMD, MIMD, MISD
- modern classification: data parallelism and function parallelism
- performance metrics: MIPS, FLOPS
- peak performance vs. sustained performance
- measuring performance, benchmarks

Lecture 3: optimizing single-thread performance
- dependence: true/output/anti dependence
- loop-carried dependence
- loop optimizations
  * unimodular transformations
  * loop interchange
  * loop permutation
  * loop reversal
  * loop tiling
  * ...

Lecture 4: SSE
- What is it?
- SSE data types
- SSE operations
- intrinsic functions
- write simple SSE programs
- memory alignment issues in SSE programs

Lecture 5: shared memory architectures
- UMA, NUMA
- the cache coherence problem
- snooping-bus cache coherence protocols, the MSI protocol
- directory-based cache coherence protocols
- dealing with multiple levels of cache

Lecture 6: shared memory consistency models
- the shared memory consistency issue
- sequential consistency
- the sequential consistency requirement
- weak ordering
- relaxed memory models
- how memory models can affect the behavior of a program

Lectures 7 and 8: pthread
- write simple pthread programs
- pthread synchronization mechanisms

Lecture 9: OpenMP
- write OpenMP programs
- OpenMP execution and memory models
- OpenMP directives
- compile and run OpenMP programs

Lecture 10: scalable multiprocessors
- the scalable computer concept
- low-level and high-level communication abstractions
- understand each term in the latency metric

Lecture 11: interconnection networks
- components
- crossbar, bus/memory, multi-stage networks
- network topology metrics: diameter, nodal degree, bisection bandwidth
- the construction of regular topologies: linear array, ring, 2D array (mesh), 2D torus, d-dimensional array/torus, hypercubes, k-ary n-cube, trees; irregular topologies
- direct and indirect networks
- multistage networks, Clos network, fat tree