Lecture 1
Lecture 2 Lecture 3

Your code should meet the following requirements: (1) it should produce correct results for all matrix sizes (0 points otherwise; compile with -DCHECK to produce output for checking); (2) it should achieve a speedup of 4 for most reasonably large matrix sizes; and (3) for a 1024x1024 matrix, the multiplication time should be less than 2.1 seconds. The program should be compiled with the -O3 flag when testing performance.
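
The requirements do not say which optimization is expected or what the speedup is measured against. As one possible illustration only, the sketch below assumes square row-major matrices of double and shows loop tiling next to a plain triple loop, plus a printing helper that is active only when the file is compiled with -DCHECK. The function names, the element type, and the tile size BLOCK are all hypothetical and not taken from the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Tile edge length. 32 is an untuned guess, not a value from the assignment. */
#define BLOCK 32

/* Reference triple loop: c = a * b for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Tiled variant. The tile bounds are clamped to n, so sizes that are
 * not a multiple of BLOCK are handled as well. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    assert(BLOCK > 0);
    memset(c, 0, n * n * sizeof *c);   /* all-zero bytes are 0.0 for IEEE doubles */

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = (kk + BLOCK < n) ? kk + BLOCK : n;
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop walks a row of b and a row of c. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}

/* Prints the result only when compiled with -DCHECK, so that timing
 * runs are not dominated by output. */
void print_if_check(size_t n, const double *c)
{
#ifdef CHECK
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            printf("%g ", c[i * n + j]);
        printf("\n");
    }
#else
    (void)n;
    (void)c;
#endif
}
```

Under these assumptions, a correctness run might be built with something like `cc -O3 -DCHECK` and its output compared against the reference loop, while timing runs would be built with -O3 alone. Whether tiling by itself reaches the required speedup depends on the machine and the baseline, which the text does not specify.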
Lecture 10
Lecture 11 Lecture 12 Lecture 13 Lecture 14 Lecture 15