38 results
Quest

Fundamentals of Accelerated Computing with OpenACC

Learn the basics of OpenACC, a high-level, directive-based programming model for GPU programming. This course is for anyone with some C/C++ experience who is interested in accelerating the performance of their applications beyond the limits of CPU-only programming. In this course, you'll learn:
• Four simple steps to accelerate your existing application with OpenACC
• How to profile and optimize your OpenACC codebase
• How to program multi-GPU systems by combining OpenACC with MPI
Upon completion, you'll be able to build and optimize accelerated heterogeneous applications on multi-GPU clusters using a combination of OpenACC, CUDA-aware MPI, and NVIDIA profiling tools.

Hands-On Lab

GPU Memory Optimizations with Fortran

In this lab, you'll learn about a number of memory optimization techniques for programming with CUDA Fortran on an NVIDIA GPU. You'll be working with a basic matrix transpose example. The only prerequisite for this lab is basic knowledge of programming with CUDA Fortran.

Hands-On Lab

Image Classification with Microsoft Cognitive Toolkit

This lab teaches you how to use the Microsoft Cognitive Toolkit (CNTK), formerly known as the Computational Network Toolkit, for training and testing neural networks to recognize handwritten digits. You will work through a series of examples that will allow you to design, create, train, and test a neural network to classify the MNIST handwritten digit dataset, illustrating the use of convolutional, pooling, and fully connected layers as well as different types of activation functions. By the end of the lab you will have a basic knowledge of convolutional neural networks, which will prepare you to move on to more advanced usage of CNTK.

Hands-On Lab

GPU Memory Optimizations with C/C++

In this lab, you'll learn about a number of memory optimization techniques when programming with CUDA C/C++ for an NVIDIA GPU. You'll be working with a basic matrix transpose example. Users should have basic knowledge of programming with CUDA C/C++.

Hands-On Lab

Using Thrust to Accelerate C++

Thrust is a parallel algorithms library loosely based on the C++ Standard Template Library. Thrust provides a number of building blocks, such as sorts, scans, transforms, and reductions, to enable developers to quickly embrace the power of parallel computing. In addition to targeting the massive parallelism of NVIDIA GPUs, Thrust supports multiple system back-ends such as OpenMP and Intel's Threading Building Blocks, so it's possible to compile your code for different parallel processors with a simple flick of a compiler switch. In 90 minutes, you will work through a number of exercises, including:
• Basic iterators, containers, and functions
• Built-in and custom functors
• Fancy iterators
• Portability to CPU processing
• Exception and error handling
• A case study implementing all of the above

Hands-On Lab

Intermediate Techniques for CUDA Python Programming with Numba

Learn about shared memory, generalized ufuncs, and GPU dataframes, intermediate topics for CUDA Python programming with Numba.

Hands-On Lab

OpenACC - 2X in 4 Steps

Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs. OpenACC is a directive-based approach to computing in which you provide compiler hints to accelerate your code, instead of writing the accelerator code yourself. In 90 minutes, you will experience a four-step process for accelerating applications using OpenACC:
1. Characterize and profile your application
2. Add compute directives
3. Add directives to optimize data movement
4. Optimize your application using kernel scheduling

Hands-On Lab

Managing Accelerated Application Memory with CUDA C/C++ Unified Memory and nvprof

Leverage the NVIDIA Command-Line Profiler (nvprof) and an understanding of Unified Memory to iteratively optimize CUDA C/C++ accelerated applications.

Hands-On Lab

Accelerating Applications with GPU-Accelerated Libraries in C/C++

Learn how to accelerate your C/C++ application using drop-in libraries to harness the massively parallel power of NVIDIA GPUs. In about two hours, you will work through three exercises, including:
• Use cuBLAS to accelerate a basic matrix multiply
• Combine libraries by adding some cuRAND API calls to the previous cuBLAS calls
• Use nvprof to profile code and optimize with some CUDA Runtime API calls

Hands-On Lab

Deployment for Intelligent Video Analytics using TensorRT

