Final covers all lectures during the semester.
See the midterm review for materials in Lecture 1 to Lecture 11.

Lecture 12: Interconnection network topology
- network topology metrics: diameter, nodal degree, bisection bandwidth
- the construction of regular topologies: linear array, ring, 2D array (mesh), 2D-torus, d-dimensional array/torus, hypercubes, k-ary n-cube, trees, irregular topology
- direct and indirect networks
- multistage networks, Clos network, fat-tree

Lecture 14: Switching, flow control and routing
- packet switching, circuit switching, store-and-forward, cut-through switching
- packet delay analysis
- wormhole routing
- tree saturation problem in lossless networks
- deadlock
- dimension order routing for k-ary n-cube
- deadlock-free routing

Lecture 15: Network interface design
- What is the problem?
- messaging layer

Lecture 16: Case study: IBM BlueGene and InfiniBand
- BlueGene nodes
- BlueGene networks
- InfiniBand architecture components
- InfiniBand link layer, transport layer, and Verbs

Lecture 17/18: MPI
- Write basic MPI programs
- MPI p2p and collective routines

Lecture 19: GPU overview
- GPGPU concept
- Difference between CPU and GPU, and why GPU has large computing power
- Fermi architecture
- Warp and SIMT
- CPU-GPU system bandwidth limits

Lecture 20/21: CUDA programming
- CUDA model
- CUDA program structure
- CUDA C extensions: specifying kernel, kernel launch
- special variables: threadIdx, blockIdx, blockDim, gridDim
- CUDA thread organization:
  * grid/block structure
  * computing global ID
- CUDA memory hierarchy: where are the registers, shared memory, local memory, constant memory, global memory
- CUDA synchronization: __syncthreads()

Lecture 22: CUDA programming best practices
- techniques to deal with long host/device memory transfers
- memory coalescing
- shared memory bank conflicts
- control flow divergence
- occupancy

Lecture 23: PRAM
- Describe the model
- EREW, CREW, CRCW (common), CRCW (random), CRCW (priority)
- developing PRAM algorithms

Lecture 24: LogP and BSP
- Describe the models
- Analyze communication algorithms with LogP
- Relation between PRAM, LogP, BSP
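
As a practice aid for the "analyze communication algorithm with LogP" item, the sketch below estimates the completion time of a one-to-all broadcast under the LogP parameters L (latency), o (per-message send/receive overhead), g (minimum gap between consecutive sends from one processor), and P (number of processors). It assumes a simple greedy schedule in which every processor that already holds the message keeps forwarding it to uninformed processors as early as it can. The function names and the parameter values in the usage lines are illustrative choices, not taken from the lectures.

```python
import heapq


def point_to_point_time(L, o):
    """Time for one small message under LogP: send overhead + latency + receive overhead."""
    return L + 2 * o


def broadcast_time(P, L, o, g):
    """Completion time of a greedy one-to-all broadcast under LogP (a sketch).

    Assumption: a processor may start a new send every max(g, o) time units,
    and a receiver may begin forwarding as soon as its receive overhead ends.
    """
    if P <= 1:
        return 0  # nothing to send

    interval = max(g, o)          # spacing between consecutive sends of one processor
    ready = [(0, 0)]              # heap of (earliest next send start, processor id)
    informed = 1                  # processor 0 (the root) holds the message at time 0
    finish = 0

    while informed < P:
        # Use the sender that can start earliest; its message arrives earliest.
        start, sender = heapq.heappop(ready)
        arrival = start + point_to_point_time(L, o)

        # The sender can send again after the gap; the receiver becomes a sender.
        heapq.heappush(ready, (start + interval, sender))
        heapq.heappush(ready, (arrival, informed))

        informed += 1
        finish = max(finish, arrival)

    return finish


if __name__ == "__main__":
    # Illustrative parameters: P = 8, L = 6, o = 2, g = 4.
    print(point_to_point_time(6, 2))   # 10
    print(broadcast_time(8, 6, 2, 4))  # 24
```

With these values the root starts sends at times 0, 4, 8, and 12, and the first receiver (informed at time 10) starts forwarding immediately, so the seven receive times are 10, 14, 18, 20, 22, 24, and 24. Tracing a schedule like this by hand is a reasonable way to check an exam answer.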