The final covers all lectures from the semester.
See the midterm review for material from Lecture 1 to Lecture 11.

Lecture 13: Case study: IBM BlueGene and InfiniBand
- BlueGene nodes
- BlueGene networks
- InfiniBand architecture components
- InfiniBand link layer, transport layer, and Verbs

Lecture 17, 18, 19, 20: MPI
- Write basic MPI programs
- MPI p2p and collective routines

Lecture 21: GPU overview
- GPGPU concept
- Differences between CPU and GPU, and why the GPU has large computing power
- Fermi architecture
- Warp and SIMT
- CPU-GPU system bandwidth limits

Lecture 22/23: CUDA programming
- CUDA model
- CUDA program structure
- CUDA C extensions: specifying a kernel, kernel launch
- Special variables: threadIdx, blockIdx, blockDim, gridDim
- CUDA thread organization:
  * grid/block structure
  * computing global ID
- CUDA memory hierarchy: where the registers, shared memory, local memory, constant memory, and global memory are located
- CUDA synchronization: __syncthreads()

Lecture 24: CUDA programming best practices
- Techniques for dealing with long host/device memory transfers
- Memory coalescing
- Shared memory bank conflicts
- Control flow divergence
- Occupancy

Lecture 25: PRAM
- Describe the model
- EREW, CREW, CRCW (common), CRCW (random), CRCW (priority)
- Developing PRAM algorithms

Lecture 26: LogP and BSP
- Describe the models
- Analyze communication algorithms with LogP
- Relation between PRAM, LogP, BSP
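
As practice for the Lecture 26 item on analyzing a communication algorithm with LogP, here is a minimal Python sketch of a single-item broadcast. It is not taken from the lectures. It assumes the usual LogP parameters (L = latency, o = per-message send/receive overhead, g = gap between consecutive messages at one processor, P = number of processors), small fixed-size messages, and a greedy schedule in which every informed processor keeps sending to uninformed ones; it ignores the network capacity limit. The function name logp_broadcast_time and the sample parameter values are hypothetical, chosen only for illustration.

```python
import heapq


def logp_broadcast_time(L, o, g, P):
    """Time until all P processors hold the item under a greedy LogP broadcast.

    Assumptions (illustrative, not from the lecture notes):
    - one message costs o (send) + L (network) + o (receive) = L + 2*o
    - a processor can start a new send every max(g, o) time units
    - a processor may start sending as soon as its own receive completes
    """
    if P <= 1:
        return 0

    step = max(g, o)   # spacing between consecutive sends from one processor
    msg = L + 2 * o    # end-to-end cost of a single message

    # Each heap entry is (arrival_time, send_start) for the next message
    # that some already-informed processor would deliver.
    heap = [(msg, 0)]  # the root starts its first send at time 0
    finish = 0

    for _ in range(P - 1):  # P - 1 processors still need the item
        arrival, start = heapq.heappop(heap)
        finish = arrival    # arrivals are popped in nondecreasing order

        # The same sender can start its next message one step later.
        heapq.heappush(heap, (start + step + msg, start + step))
        # The newly informed processor starts sending immediately.
        heapq.heappush(heap, (arrival + msg, arrival))

    return finish


if __name__ == "__main__":
    print(logp_broadcast_time(L=6, o=2, g=4, P=8))  # 24
```

With L=6, o=2, g=4, the root's messages arrive at times 10, 14, 18, 22; the processor informed at time 10 delivers at 20 and 24, and the one informed at time 14 delivers at 24, so the seventh and last receiver is reached at time 24. A single point-to-point message (P=2) costs L + 2o = 10.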