Learning objectives and review

Lecture 2

Learning objectives

After this class, you should be able to:

Explain how a CUDA program is compiled.
Explain the purpose of the following variables, APIs, and keywords: blockIdx, blockDim, threadIdx, cudaMalloc, cudaFree, cudaMemcpy, __global__, __device__, __host__, cudaThreadSynchronize (deprecated in later CUDA releases in favor of cudaDeviceSynchronize).
Explain how multiple threads on the GPU are partitioned into blocks and used to perform data parallel computation.
Use the above features to write simple CUDA programs.
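
As a rough illustration of how the names in these objectives fit together, here is a minimal sketch that multiplies every element of an array by a constant on the GPU. The names scale, scale_kernel, and scale_on_gpu are hypothetical and chosen only for this example, 256 threads per block is an arbitrary choice, and cudaDeviceSynchronize is used in place of the deprecated cudaThreadSynchronize. It is one possible arrangement under those assumptions, not the course's reference solution.

```cuda
#include <vector>
#include <cuda_runtime.h>

// __device__: callable only from code that is already running on the GPU.
__device__ float scale(float x, float factor) { return x * factor; }

// __global__: a kernel, launched from the host and executed by many GPU threads.
__global__ void scale_kernel(float *data, float factor, int n) {
    // Global index = offset of this thread's block + position within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the array
        data[i] = scale(data[i], factor);
    }
}

// __host__: runs on the CPU (this is also the default for unmarked functions).
// Returns true if the allocation and the copy back to the host both succeeded.
__host__ bool scale_on_gpu(std::vector<float> &v, float factor) {
    int n = static_cast<int>(v.size());
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);

    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_data, v.data(), bytes, cudaMemcpyHostToDevice);

    // Partition the n elements into blocks, rounding up so every element is covered.
    int threads_per_block = 256;  // arbitrary choice for this sketch
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    scale_kernel<<<blocks, threads_per_block>>>(d_data, factor, n);

    // Wait for the kernel to finish before reading the results back.
    cudaDeviceSynchronize();

    cudaError_t err = cudaMemcpy(v.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess;
}
```

Calling scale_on_gpu on a std::vector<float> from a main function and compiling the file with nvcc should be enough to try it out. For 1000 elements, the launch above uses 4 blocks of 256 threads, and the bounds check keeps the 24 surplus threads in the last block idle.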

Reading assignment

Read the UIUC Lec-2 slides.
Search online for information on cudaMallocHost and cudaFreeHost.

Exercises and review questions

Exercises and review questions on the current lecture's material

Write CUDA code to compute the squares of the first N integers.
Write CUDA code to determine the following: (i) data transfer bandwidth from host to device, (ii) data transfer bandwidth from device to host, (iii) data transfer bandwidth from host to device using pinned memory, (iv) data transfer bandwidth from device to host using pinned memory, and (v) kernel creation overhead. (Post your answer on the discussion board.)

Preparation for the next lecture

What is pinned memory?
Give an example of two 3 x 3 matrices and show their product.

Last modified: 12 Jan 2014