Learn the basics of OpenACC, a high-level, directive-based programming model for GPU computing. This course is for anyone with some C/C++ experience who is interested in accelerating the performance of their applications beyond the limits of CPU-only programming. In this course, you'll learn:
• Four simple steps to accelerating your existing application with OpenACC
• How to profile and optimize your OpenACC codebase
• How to program multi-GPU systems by combining OpenACC with MPI
Upon completion, you'll be able to build and optimize accelerated heterogeneous applications on multi-GPU clusters using a combination of OpenACC, CUDA-aware MPI, and NVIDIA profiling tools.
In this lab, you'll learn about a number of memory optimization techniques for programming an NVIDIA GPU with CUDA Fortran. You'll be working with a basic matrix transpose example. The only prerequisite is basic knowledge of programming with CUDA Fortran. Please read the instructions at the bottom of this page before clicking the Start Lab button!
This lab teaches you how to use the Computational Network Toolkit (CNTK) from Microsoft for training and testing neural networks to recognize handwritten digits. You will work through a series of examples that will allow you to design, create, train and test a neural network to classify the MNIST handwritten digit dataset, illustrating the use of convolutional, pooling and fully connected layers as well as different types of activation functions. By the end of the lab you will have basic knowledge of convolutional neural networks, which will prepare you to move to more advanced usage of CNTK.
In this lab, you'll learn about a number of memory optimization techniques when programming with CUDA C/C++ for an NVIDIA GPU. You'll be working with a basic matrix transpose example. Users should have basic knowledge of programming with CUDA C/C++.
Thrust is a parallel algorithms library loosely based on the C++ Standard Template Library. Thrust provides a number of building blocks, such as sorts, scans, transforms, and reductions, to enable developers to quickly embrace the power of parallel computing. In addition to targeting the massive parallelism of NVIDIA GPUs, Thrust supports multiple system back-ends such as OpenMP and Intel's Threading Building Blocks. This means that it's possible to compile your code for different parallel processors with a simple flick of a compiler switch. In 90 minutes, you will work through a number of exercises, including:
• Basic iterators, containers, and functions
• Built-in and custom functors
• Fancy iterators
• Portability to CPU processing
• Exception and error handling
• A case study implementing all of the above
Learn about shared memory, generalized ufuncs, and GPU dataframes, intermediate topics for CUDA Python programming with Numba.
Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs. OpenACC is a directive-based approach to computing where you provide compiler hints to accelerate your code, instead of writing the accelerator code yourself. In 90 minutes, you will experience a four-step process for accelerating applications using OpenACC:
1. Characterize and profile your application
2. Add compute directives
3. Add directives to optimize data movement
4. Optimize your application using kernel scheduling
Leverage the NVIDIA Command-Line Profiler and an understanding of Unified Memory to iteratively optimize CUDA C/C++ accelerated applications.
Learn how to accelerate your C/C++ application using drop-in libraries to harness the massively parallel power of NVIDIA GPUs. In about two hours, you will work through three exercises, including:
• Use cuBLAS to accelerate a basic matrix multiply
• Combine libraries by adding some cuRAND API calls to the previous cuBLAS calls
• Use nvprof to profile code and optimize with some CUDA Runtime API calls