High Performance Computing using CUDA (CG-464)

This course details how to use GPU compute APIs, in particular the CUDA C API, and introduces students to the massively parallel world of compute programming. Programmers coming from a sequential programming background will be given concrete examples to consolidate their understanding of the parallel programming paradigm. Students will be introduced to the NVIDIA CUDA toolset in a practical setting. Topics range from an introduction to the parallel programming model, CUDA kernels and their compilation and launch parameters, and the various CUDA memory models to asynchronous kernel launches and inter-thread communication through shared memory, along with the associated optimizations.

Each section will conclude with a practical use case. Examples range from basic 2D image processing (in both the spatial and frequency domains) to advanced smoke/fluid solvers. Interoperability between CUDA C and the OpenGL graphics API will also be covered, so that students can apply the GPU's massively parallel processing capabilities in a practical real-time game or simulation scenario. Each student will be expected to complete one project, chosen from the project lists below, that applies the knowledge gained during the course.

Tentative Course Outline:

  • Introduction to GPU Compute Programming
  • Introduction to the CUDA Programming Model and the CUDA C API
  • CUDA Memory Models
  • Parallel Thread Execution
  • Control Flow/Precision
  • Introduction to the CUDA Computing Tools
    • Use case: Matrix-Vector Multiplication
    • Use case: 2D image processing
  • Shared Memory Optimization
    • Use case: 2D/3D Cloth Simulation
    • Use case: Volume rendering
  • Introduction to CUDA/OpenGL Interoperability
    • Section Demo: Simple Ray Tracing/Path Tracing
  • CUDA Threading and Memory Models In-depth
    • Section Demo: 3D soft body simulation using TLED algorithm
  • Shared Memory Bank Conflicts
    • Use case: Optimizing the 2D/3D Cloth Simulation
  • Parallel Patterns
  • High-Level CUDA Wrappers (Introduction to Thrust)
  • Sparse Matrix-Vector Multiplication
    • Use case: Implicit Cloth Simulation
    • Use case: 2D smoke and fluid simulation in CUDA
  • Optimization Tips
  • Conclusion

Total Lectures: 40
Total Assignments: 5-7
Total Quizzes: 5-7

Reference Books:

Projects List (including but not limited to):
  1. Image processing in spatial domain
  2. Image processing in frequency domain
  3. Volume rendering in CUDA
  4. Physically based volume rendering in CUDA
  5. Smoke and fluid simulation in CUDA
  6. Physically based animation in CUDA
  7. Soft body simulation in CUDA
  8. Implicit cloth solver in CUDA
  9. Continuous collision detection in CUDA
  10. Efficient self-collision and tearing for cloth simulation
  11. Fast and efficient convex decomposition in CUDA

NVIDIA Tegra K1-Specific Projects List:
  1. Portable 3D scanner (using Microsoft Kinect v2 and VisionWorks/OpenCV)
  2. Real-time path/ray tracing renderer for previewing architectural walkthroughs
  3. Particle systems test bed
  4. GPU-based real-time physics test bed
  5. Fluid/Smoke solver
  6. Any project suggested by NVIDIA R&D teams