Lecture 5

Learning objectives

After this class, you should be able to:

  1. Explain how use of shared memory can ameliorate the memory bandwidth bottleneck.
  2. Explain the purpose of the following CUDA keywords and API call: __shared__, __constant__, __syncthreads().
  3. Give the approximate latencies for accessing data in (i) registers, (ii) shared memory, and (iii) global memory. Also give the lifetime and scope of data in each.
  4. Give the typical sizes of shared memory and L1 cache.
  5. Use shared memory to reduce the data transfer overhead in GPU code.
  6. Given a problem, calculate limits on performance based on the memory bandwidth bottleneck.
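
As a rough illustration of objectives 2 and 5, the sketch below uses __shared__, __constant__, and __syncthreads() in a tiled matrix multiplication. It is a minimal example under stated assumptions, not code from the slides or the text: the matrices are square, n is a multiple of TILE_WIDTH, and the names tiledMatMul, d_scale, maxAbsError, and TILE_WIDTH are hypothetical. Error checking of CUDA calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define TILE_WIDTH 16  // assumed tile size; n must be a multiple of it

// Read-only scalar in constant memory: application lifetime, visible to all threads.
__constant__ float d_scale;

// Computes C = d_scale * (A * B) for square n x n row-major matrices.
__global__ void tiledMatMul(const float* A, const float* B, float* C, int n)
{
    // Shared memory: one copy per block, lifetime of the block.
    __shared__ float As[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Bs[TILE_WIDTH][TILE_WIDTH];

    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
    float acc = 0.0f;  // register: private to the thread

    for (int t = 0; t < n / TILE_WIDTH; ++t) {
        // Each thread loads one element of A and one of B from global memory.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE_WIDTH + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE_WIDTH + threadIdx.y) * n + col];
        __syncthreads();  // wait until the whole tile is loaded

        for (int k = 0; k < TILE_WIDTH; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // wait before the tile is overwritten
    }
    C[row * n + col] = d_scale * acc;
}

// Hypothetical host helper: runs the kernel and returns the largest
// absolute difference from a CPU reference result.
float maxAbsError(int n)
{
    const float scale = 2.0f;
    std::vector<float> A(n * n), B(n * n), C(n * n);
    for (int i = 0; i < n * n; ++i) {
        A[i] = (i % 7) * 0.5f;
        B[i] = (i % 5) * 0.25f;
    }

    size_t bytes = sizeof(float) * n * n;
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_scale, &scale, sizeof(float));

    dim3 block(TILE_WIDTH, TILE_WIDTH);
    dim3 grid(n / TILE_WIDTH, n / TILE_WIDTH);
    tiledMatMul<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    float err = 0.0f;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k)
                ref += A[i * n + k] * B[k * n + j];
            err = fmaxf(err, fabsf(C[i * n + j] - scale * ref));
        }
    }
    return err;
}

int main()
{
    printf("max abs error: %g\n", maxAbsError(64));
    return 0;
}
```

In this sketch, each value loaded from global memory is reused TILE_WIDTH times from shared memory, so global memory traffic drops by a factor of TILE_WIDTH compared with a kernel in which every thread reads all of its operands directly from global memory.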

Reading assignment

  1. Read the UIUC Lec-5 slides.
  2. Read Chapter 5 of the text.

Exercises and review questions


Last modified: 24 Jan 2013