🤖 AI Summary
NVIDIA and Oak Ridge National Laboratory are running a 13-part CUDA C++ training series aimed at both new and experienced GPU programmers. Each session pairs a one-hour lecture with a one-hour hands-on exercise session, with exercise code hosted on GitHub, to reinforce core CUDA concepts. Topics run from introductory material through advanced techniques: shared memory; atomics, reductions, and warp shuffle; managed memory; CUDA concurrency; performance analysis; cooperative groups; streams and multithreading; Multi-Process Service (MPS); debugging; and CUDA Graphs. Together, the sessions offer a structured pathway to writing and optimizing GPU code for HPC and ML workloads.
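To give a flavor of the intermediate material, here is a minimal sketch of a warp-shuffle sum reduction of the kind covered in the atomics/reductions/warp shuffle session. It is written for this summary, not taken from the series' GitHub exercises; the kernel name, launch configuration, and use of managed memory are our own choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Warp-level sum reduction using __shfl_down_sync: each iteration
// halves the number of lanes holding live partial sums.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

// Grid-stride kernel: every thread accumulates a private partial sum,
// the warp reduces it, and lane 0 of each warp adds its result atomically.
__global__ void sumKernel(const float *in, float *out, int n) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        sum += in[i];
    sum = warpReduceSum(sum);
    if ((threadIdx.x & 31) == 0)  // one atomic per warp, not per thread
        atomicAdd(out, sum);
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));  // managed memory, another series topic
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;
    sumKernel<<<256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("sum = %.0f (expected %d)\n", *out, n);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The warp-shuffle approach avoids shared-memory traffic for the final reduction stage and cuts atomic contention to one `atomicAdd` per warp, which is exactly the kind of optimization trade-off the lecture-plus-lab format is designed to teach.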
The series is significant because it combines focused lectures with practical labs to accelerate developer proficiency in GPU programming and performance tuning, skills crucial for scalable ML training and HPC simulations. Remote participants can view the broadcasts and access the exercises, but temporary compute access is limited: Cori-GPU access will be added for current NERSC users, while temporary Summit and Theta GPU access is not available to remote attendees. Attendees must register for each session individually. For practitioners seeking to improve throughput, reduce kernel bottlenecks, and adopt modern CUDA features such as cooperative groups and graphs, the curriculum provides a compact, practice-oriented roadmap.
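Of the modern features mentioned, CUDA Graphs are the most mechanical to demo: a recorded sequence of launches can be replayed with a single call, amortizing launch overhead. Below is a minimal sketch of graph capture on a stream, again our own illustration rather than series material; the kernel, sizes, and the CUDA 10/11-era `cudaGraphInstantiate` signature are assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int n = 1 << 16;
    float *x;
    cudaMalloc(&x, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture a short sequence of kernel launches into a graph,
    // then instantiate it into an executable form.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int step = 0; step < 4; ++step)
        addOne<<<(n + 255) / 256, 256, 0, stream>>>(x, n);
    cudaStreamEndCapture(stream, &graph);
    // CUDA 10/11 signature; CUDA 12 replaced it with (&graphExec, graph, flags)
    cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

    cudaGraphLaunch(graphExec, stream);  // replays all four captured launches
    cudaStreamSynchronize(stream);

    float first;
    cudaMemcpy(&first, x, sizeof(float), cudaMemcpyDeviceToHost);
    printf("x[0] = %.0f (expected 4)\n", first);

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(x);
    return 0;
}
```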