This course details how to use GPU compute APIs, in particular the CUDA C API, and introduces students to the massively parallel world of compute programming. Programmers coming from a sequential programming background will be given concrete examples to consolidate their understanding of the parallel programming paradigm. Students will be introduced to the NVIDIA CUDA toolset in a practical setting. Topics range from an introduction to the parallel programming model, CUDA kernels and their compilation and launch parameters, and the various CUDA memory spaces, to asynchronous kernel launches and inter-thread communication and optimization through shared memory.

Each section concludes with a practical use case. Examples range from basic 2D image processing (in both the spatial and frequency domains) to advanced smoke and fluid solvers. Interoperability between CUDA C and the OpenGL graphics API is also covered, so that students can apply massively parallel processing in practical real-time game and simulation scenarios. Each student is expected to complete one project, chosen from the list of projects below, that applies the knowledge gained during the course.

Course Outline:

- Introduction to GPU Compute Programming
- Introduction to the CUDA Programming Model and the CUDA C API
- CUDA Memory Models
- Parallel Thread Execution
- Control Flow/Precision
- Introduction to the CUDA Computing Tools
- Use case: Matrix-Vector Multiplication
- Use case: 2D Image Processing
- Shared Memory Optimization
- Use case: 2D/3D Cloth Simulation
- Use case: Volume Rendering

- Introduction to CUDA/OpenGL Interoperability
- Section Demo: Simple Ray Tracing/Path Tracing
- CUDA Threading and Memory Models In-depth
- Section Demo: 3D Soft Body Simulation Using the TLED (Total Lagrangian Explicit Dynamics) Algorithm
- Shared Memory Bank Conflicts
- Use case: Optimizing the 2D/3D Cloth Simulation
- Parallel Patterns
- High-Level CUDA Wrappers (Introduction to Thrust)
- Sparse Matrix-Vector Multiplication
- Use case: Implicit Cloth Simulation
- Use case: 2D Smoke and Fluid Simulation in CUDA
- Optimization Tips
- Conclusion

Total Assignments: 5-7

Total Quizzes: 5-7

Reference Books:

- Programming Massively Parallel Processors: A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu
- CUDA by Example by Jason Sanders and Edward Kandrot
- The CUDA Handbook: A Comprehensive Guide to GPU Programming by Nicholas Wilt
- CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (Applications of GPU Computing Series) by Shane Cook
- GPU Computing Gems Emerald Edition, edited by Wen-mei W. Hwu
- GPU Computing Gems Jade Edition, edited by Wen-mei W. Hwu
- Heterogeneous Computing with OpenCL, 2nd Edition by Benedict Gaster, Lee Howes, David R. Kaeli and Perhaad Mistry
- OpenCL in Action: How to Accelerate Graphics and Computations by Matthew Scarpino
- OpenCL Programming Guide by Aaftab Munshi, Benedict Gaster, Timothy G. Mattson

- Image processing in the spatial domain
- Image processing in the frequency domain
- Volume rendering in CUDA
- Physically based volume rendering in CUDA
- Smoke and fluid simulation in CUDA
- Physically based animation in CUDA
- Soft body simulation in CUDA
- Implicit cloth solver in CUDA
- Continuous collision detection in CUDA
- Efficient self-collision and tearing for cloth simulation
- Fast and efficient convex decomposition on CUDA

- Portable 3D scanner (using Microsoft Kinect v2 and VisionWorks/OpenCV)
- Real-time path/ray tracing renderer for previewing architectural walkthroughs
- Particle systems test bed
- GPU based real-time physics test bed
- Fluid/Smoke solver
- Any project suggested by NVIDIA R&D teams