CDA5125 Introduction to Parallel and Distributed Systems, Spring 2022


Syllabus, Code Examples


Lecture 1 (01/05): Syllabus, Introduction

Lecture 2 (01/07): PDS basics

Lecture 3 (01/10): PDS basics (II)

Lecture 4 (01/12): PDS basics (III), Zoom recording

Homework 1, naive_mm.c. Due January 17, 11:59pm.

Lecture 5 (01/14): CPU core architecture and single thread performance, Zoom recording

Lecture 6 (01/19): Affine loops and dependence analysis, Zoom recording

Lecture 7 (01/21, 01/24): Loop optimizations, Zoom recording

Lecture 8 (01/26, 01/28): Deep neural networks from scratch

Programming assignment 1: Deep Neural Network for Hand-Written Digit Recognition, a sample training output, Due: 02/07 (Part 1) and 02/14 (Part 2).

Lecture 9 (01/31, 02/02): x86 SIMD extensions, Zoom Recording (01/31), Zoom Recording (02/02)

Lecture 10 (02/04): Shared Memory Architectures, Zoom recording

Programming assignment 2: Improving Deep Neural Network Code with x86 Vector Extensions, Due: 02/21.

Lecture 11 (02/07, 02/09, 02/11): Introduction to OpenMP, Zoom Recording (02/11)

Lecture 12 (02/11, 02/14): OpenMP for NUMA Architectures, Zoom Recording (02/14)

Lecture 13 (02/16): Scalable Computers

Lecture 14 (02/18): Interconnection Networks

Programming assignment 3: Parallelizing Deep Neural Network Code with OpenMP, Due: 03/07.

Homework 2: Read the following paper and write a critique of it (you can use the template). Due: March 2.

J. Kim, W. J. Dally, S. Scott and D. Abts, "Technology-Driven, Highly-Scalable Dragonfly Topology," 2008 International Symposium on Computer Architecture, 2008, pp. 77-88, doi: 10.1109/ISCA.2008.19.

Lecture 15 (02/21, 02/23, 02/25, 02/28): Interconnect Topology, Zoom recording (02/21), Zoom Recording (02/28)

Course project information

Lecture 16 (02/28, 03/02, 03/04): Routing, Switching, and Flow Control, Zoom Recording (03/02), Zoom Recording (03/04)

Lecture 17 (03/07): State of the art Interconnect Design: Slingshot and its analysis, Slides (Saptarshi Bhowmik and Rubayet Rahman Rongon)

Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, and Torsten Hoefler. 2020. An in-depth analysis of the slingshot interconnect. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20). IEEE Press, Article 35.

Lecture 18 (03/09): State of the art Interconnect Design: 3D-Hyper-FleX-LION, Slides (Ram Chaulagain and Tusher Chandra Mondol), Zoom recording

Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, and S. J. Ben Yoo. 2020. Architecture and performance studies of 3D-Hyper-FleX-LION for reconfigurable all-to-all HPC networks. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20). IEEE Press, Article 26, 1-16.

Lecture 19 (03/11): Programming Distributed Memory Systems: Message Passing Interface 1, Zoom Recording

Homework 3, Due: March 25.

Lecture 20 (03/21): Programming Distributed Memory Systems: Message Passing Interface 2, Zoom Recording

Lecture 21 (03/23): Programming Distributed Memory Systems: Message Passing Interface 3 - Domain decomposition

Lecture 22 (03/25, 03/28): MPI implementation, Zoom Recording (03/25), Zoom Recording (03/28)

Programming assignment 4: Parallelizing Deep Neural Network Code with MPI, Due: 04/08.

Lecture 23 (03/30): State of the art - Security issues in Parallel and Distributed Computing - Side channel attacks and defenses (Kazi), Slides, Zoom Recording

Lecture 24 (04/04): State of the art - topology aware job scheduling (Patrick and Zach), Slides, Zoom Recording

Staci A. Smith and David K. Lowenthal, "Jigsaw: A High-Utilization, Interference-Free Job Scheduler for Fat-Tree Clusters" ACM HPDC 2021.

Lecture 25 (04/01): GPU overview, Zoom Recording

Lecture 26 (04/06): CUDA programming I, Zoom Recording

Lecture 27 (04/08): CUDA programming II, Zoom Recording

Lecture 28 (04/11): CUDA programming III, Zoom Recording

Lecture 29 (04/13): State of the art - CUDA Unified Memory (Jack and Luiz), Slides, vecadd_um.cu, overload_um.cu, Zoom Recording

CUDA Unified Memory Tutorial

Programming assignment 5: GPU Deep Neural Network Code (optional), Due: 04/22.

Some information about project presentation and report

Lecture 30 (04/15): State of the art - MPI-3 Neighborhood Collective (Mohsen), Zoom Recording

S. Mahdieh Ghazimirsaeed, Qinghua Zhou, Amit Ruhela, and Mohammadreza Bayatpour. 2020. A hierarchical and load-aware design for large message neighborhood collectives. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20). IEEE Press, Article 34, 1-13.

Term project presentation (Monday, April 18): Zoom Recording

Term project presentation (Wednesday, April 20):

Term project presentation (Friday, April 22):

The final exam will be a take-home, open-everything exam (no discussion), 7:30am-11:59am, Monday, April 25.