NVIDIA DLI: Fundamentals of Accelerated Computing with CUDA Python - UALR

By Arkansas High Performance Computing Center

Overview

Learn how to use Numba—the just-in-time Python function compiler—to accelerate Python programs to run on massively parallel NVIDIA GPUs.

This NVIDIA Deep Learning Institute (DLI) course explores how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to:

Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
Use Numba to create and launch custom CUDA kernels.
Apply key GPU memory management techniques.

Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.

Dr. Recep Erol, one of two NVIDIA Deep Learning Institute ambassadors for the state of Arkansas, will lead this workshop. NVIDIA DLI certificates are awarded to all participants who pass the assessment test at the end of the workshop. There are 80 seats available for this workshop.

Learning Objectives

At the conclusion of the workshop, you’ll have an understanding of the fundamental tools and techniques for GPU-accelerated Python applications with CUDA and Numba:

GPU-accelerate NumPy ufuncs with a few lines of code.
Configure code parallelization using the CUDA thread hierarchy.
Write custom CUDA device kernels for maximum performance and flexibility.
Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.

Topics Covered

Numba

Location

Donaghey College of Engineering & Information Technology (EIT) building auditorium, University of Arkansas at Little Rock, 2801 S University Ave, Little Rock, AR 72204

Parking

Please park in Lot 8 adjacent to the Donaghey College of Engineering & Information Technology (EIT) building.

Course Outline

Introduction to CUDA Python with Numba

Begin working with the Numba compiler and CUDA programming in Python.
Use Numba decorators to GPU-accelerate numerical Python functions.
Optimize host-to-device and device-to-host memory transfers.

Custom CUDA Kernels in Python with Numba

Learn CUDA’s parallel thread hierarchy (threads, blocks, and grids) and how it extends what parallel programs can do.
Launch massively parallel custom CUDA kernels on the GPU.
Utilize CUDA atomic operations to avoid race conditions during parallel execution.
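
As a rough preview of the thread hierarchy covered in this module, the sketch below mimics on the CPU, in plain Python, how a one-dimensional kernel launch gives each thread its own global index. It is not Numba code and does not touch a GPU. The names `blocks_needed`, `simulate_launch`, and `double_kernel` are hypothetical helpers made up for illustration, and the block size of 4 is an arbitrary assumption.

```python
def blocks_needed(n_items, threads_per_block):
    """Ceiling division: the smallest block count that covers n_items."""
    return (n_items + threads_per_block - 1) // threads_per_block


def simulate_launch(kernel, n_blocks, threads_per_block, *args):
    """Call `kernel` once per (block, thread) pair, one after another.

    A GPU would run these calls in parallel; this loop only imitates
    the indexing, not the parallelism.
    """
    for block_idx in range(n_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def double_kernel(block_idx, block_dim, thread_idx, src, dst):
    """Each simulated thread handles exactly one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(src):  # guard: the last block may have surplus threads
        dst[i] = 2 * src[i]


src = list(range(10))
dst = [0] * len(src)
threads_per_block = 4  # assumed block size, chosen only for the example
n_blocks = blocks_needed(len(src), threads_per_block)
simulate_launch(double_kernel, n_blocks, threads_per_block, src, dst)
print(n_blocks, dst)
```

Ten elements with four threads per block need three blocks, so the last block has two threads with nothing to do. The bounds check in the kernel is what keeps those extra threads from writing out of range.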

Multidimensional Grids and Shared Memory for CUDA Python with Numba

Learn multidimensional grid creation and how to work in parallel on 2D matrices.
Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
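
To give a feel for the idea behind tiled 2D work, here is a CPU-only sketch in plain Python of a matrix transpose done tile by tile. Each simulated block first copies its tile into a small scratch buffer, which stands in for on-device shared memory, and then writes the tile back out transposed. This is an assumption-laden illustration rather than the workshop's own material: `transpose_tiled` and the tile size of 2 are hypothetical, and nothing here runs on a GPU.

```python
def transpose_tiled(a, tile=2):
    """Transpose a list-of-lists matrix one tile at a time."""
    rows, cols = len(a), len(a[0])
    out = [[None] * rows for _ in range(cols)]
    grid_y = (rows + tile - 1) // tile  # blocks along the row axis
    grid_x = (cols + tile - 1) // tile  # blocks along the column axis

    for by in range(grid_y):
        for bx in range(grid_x):
            # Per-block scratch buffer, standing in for shared memory.
            shared = [[None] * tile for _ in range(tile)]

            # Phase 1: each simulated thread reads one input element,
            # walking along input rows.
            for ty in range(tile):
                for tx in range(tile):
                    r, c = by * tile + ty, bx * tile + tx
                    if r < rows and c < cols:
                        shared[ty][tx] = a[r][c]

            # A real kernel would synchronize the block's threads here.

            # Phase 2: each simulated thread writes one output element,
            # walking along output rows and reading the tile transposed.
            for ty in range(tile):
                for tx in range(tile):
                    r, c = bx * tile + ty, by * tile + tx
                    if r < cols and c < rows:
                        out[r][c] = shared[tx][ty]
    return out


a = [[1, 2, 3],
     [4, 5, 6]]
print(transpose_tiled(a))
```

The point of the two phases is that both the reads from the input and the writes to the output proceed along rows, while the swap of row and column happens inside the small scratch buffer.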

Final Review

Review key takeaways and address any remaining questions.
Complete the assessment to earn a certificate.
Take the workshop survey.

Category: Science & Tech, Science

Good to know

Highlights

7 hours
In person

Free

Jan 9 · 10:00 AM CST