In-network Computing with SmartNICs for Parallel Applications

Organizations: University of South Carolina, BES Engineering, RENCI, NVIDIA, Georgia Tech, NSF, ONR

40th IEEE International Parallel & Distributed Processing Symposium

May 25-29, 2026
Marriott on Canal Street
New Orleans, USA


Overview

Data Processing Units (DPUs) are programmable processors designed to offload and accelerate infrastructure workloads and data processing. This tutorial introduces the NVIDIA BlueField-3 DPU and examines its programming models, including the DOCA SDK, P4, and DPDK. It also demonstrates high-performance computing (HPC) workloads that can be offloaded to the DPU.

Audience

This tutorial is intended for HPC users, application developers, researchers and developers of programming models and communication libraries, as well as tool developers who are interested in leveraging next-generation SmartNICs for HPC.

Tutorial Goals

By participating in this comprehensive tutorial, attendees will gain:

  • An understanding of asynchronous programmable engines, such as SmartNICs, and their evolution in HPC architectures, including an overview of current efforts by major vendors such as NVIDIA, Intel, and AMD.
  • Familiarity with SmartNIC programming models, including vendor-supported frameworks such as P4 and DOCA, OpenMP offloading, and MPI communication offloading.
  • Practical knowledge of leveraging SmartNICs for in-line packet processing, communication offload optimizations, storage optimizations, and algorithmic changes in applications.
  • Real-world application experiences and mini-app case studies that leverage SmartNICs and DPUs.
  • Hands-on experience with exercises covering a variety of application examples, including tutorials on P4 and DOCA features, blocking and nonblocking MPI collective offload operations, OpenMP offload for DPUs, and using accelerators like Data Path Accelerators (DPAs).
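The in-line packet processing and P4 items above revolve around the match-action abstraction: a table matches on packet header fields and runs the action bound to the matching entry, or a default action on a miss. The Python sketch below models that idea on the host CPU for illustration only; it is not P4 or DOCA code, and the names `Packet`, `ExactMatchTable`, `forward`, and `drop` are hypothetical. It assumes an exact match on the destination IPv4 address, whereas P4 also defines longest-prefix (`lpm`) and `ternary` match kinds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Packet:
    """Hypothetical packet model carrying only the fields this sketch uses."""
    dst_ip: str
    ttl: int
    egress_port: Optional[int] = None
    dropped: bool = False


# An action mutates the packet in place (rewrite fields, choose a port, or drop).
Action = Callable[[Packet], None]


def forward(port: int) -> Action:
    """Build an action that decrements TTL and sends the packet out of `port`."""
    def act(pkt: Packet) -> None:
        if pkt.ttl <= 1:
            pkt.dropped = True  # TTL would expire at this hop, so drop instead
            return
        pkt.ttl -= 1
        pkt.egress_port = port
    return act


def drop(pkt: Packet) -> None:
    """Action that marks the packet as dropped."""
    pkt.dropped = True


class ExactMatchTable:
    """Exact-match table keyed on destination IP, with a default action on a miss."""

    def __init__(self, default_action: Action) -> None:
        self.entries: Dict[str, Action] = {}
        self.default_action = default_action

    def add_entry(self, dst_ip: str, action: Action) -> None:
        # Control plane: install or overwrite the entry for this key.
        self.entries[dst_ip] = action

    def apply(self, pkt: Packet) -> Packet:
        # Data plane: look up the key, then run the hit action or the default.
        action = self.entries.get(pkt.dst_ip, self.default_action)
        action(pkt)
        return pkt


if __name__ == "__main__":
    table = ExactMatchTable(default_action=drop)
    table.add_entry("10.0.0.2", forward(1))

    hit = table.apply(Packet(dst_ip="10.0.0.2", ttl=64))
    miss = table.apply(Packet(dst_ip="10.0.0.9", ttl=64))
    print(hit.egress_port, hit.ttl)  # prints: 1 63
    print(miss.dropped)              # prints: True
```

The split between `add_entry` (installing rules) and `apply` (per-packet lookup) mirrors the control-plane/data-plane division that P4 programs assume; on a SmartNIC the lookup and action run on the device's packet pipeline rather than in Python on the host.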

Pre-requisites

An Internet connection and a web browser to access the online virtual platform. Attendees will be provided with an account on USC’s NETLAB system: https://netlab.cec.sc.edu/

Agenda

Monday, May 25

Time (CDT) | Topic | Description
8:30-8:40 | Introduction | Attendee survey
8:40-9:20 | Communication Offloading | Data offloading (CPU→GPU), SmartNIC overview, examples (DPUs, IPUs), concepts: packet processing, computation offloading
9:20-10:00 | SmartNIC Use Cases | Packet processing, HPC offload, AI/gRPC, cybersecurity
10:00-10:30 | Break |
10:30-11:15 | Infrastructure Software | DOCA and P4 frameworks
11:15-12:00 | Hands-on | DOCA and P4 demo
12:00-1:00 | Lunch |
1:00-1:45 | HPC Programming | MPI collective offload, OpenMP offload
1:45-2:30 | Hands-on HPC | MPI and OpenMP demo
2:30-3:00 | Storage Acceleration | Vendor use cases, NVMe offload, Virtio-FS, microservice offload (checksums, erasure coding)
3:00-3:30 | Break |
3:30-4:00 | Future Uses | AI acceleration, quantum networking, TBD
4:00-4:05 | Tutorial Survey |
4:05-5:00 | Hands-on | Additional demos
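The Storage Acceleration session lists checksums and erasure coding as microservices that can be offloaded. To show what that computation involves, here is a minimal host-side Python sketch, assuming a single XOR parity block over equal-length data blocks (the RAID 5 style scheme, which survives the loss of any one block) and CRC-32 from Python's standard `zlib` module as the checksum. Production erasure coding commonly uses Reed-Solomon codes that tolerate multiple failures, this sketch does not use any DPU or DOCA API, and the function names are hypothetical.

```python
import zlib
from functools import reduce
from typing import List


def checksum(block: bytes) -> int:
    """CRC-32 of a block, as computed by zlib (unsigned in Python 3)."""
    return zlib.crc32(block)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_blocks: List[bytes]) -> bytes:
    """Compute one XOR parity block over k equal-length data blocks."""
    if not data_blocks:
        raise ValueError("need at least one data block")
    if len({len(b) for b in data_blocks}) != 1:
        raise ValueError("all data blocks must have the same length")
    return reduce(xor_blocks, data_blocks)


def recover_block(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block: XOR the parity with every survivor."""
    return reduce(xor_blocks, surviving, parity)


if __name__ == "__main__":
    blocks = [b"AAAA", b"BBBB", b"CCCC"]
    parity = encode_parity(blocks)
    sums = [checksum(b) for b in blocks]  # stored alongside the data

    # Simulate losing blocks[1], rebuild it, and verify it against its checksum.
    rebuilt = recover_block([blocks[0], blocks[2]], parity)
    print(rebuilt, checksum(rebuilt) == sums[1])  # prints: b'BBBB' True
```

Rebuilding a block needs every surviving block plus the parity, so a single-parity layout trades low storage overhead (one extra block per stripe) for tolerance of only one failure.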
Slides