In-network Computing with SmartNICs for Parallel Applications

40th IEEE International Parallel & Distributed Processing Symposium

May 25-29, 2026
Marriott on Canal Street
New Orleans, USA

Overview

Data Processing Units (DPUs) are programmable processors designed to offload and accelerate infrastructure workloads and data processing. This tutorial introduces the NVIDIA BlueField-3 DPU and examines its programming models, including the DOCA SDK, P4, and DPDK. It also demonstrates high-performance computing (HPC) workloads that can be offloaded to the DPU.

Audience

This tutorial is intended for HPC users, application developers, researchers and developers of programming models and communication libraries, as well as tool developers who are interested in leveraging next-generation SmartNICs for HPC.

Tutorial Goals

By participating in this comprehensive tutorial, attendees will gain:

An understanding of asynchronous programmable engines, such as SmartNICs, and their evolution in HPC architectures, including an overview of current efforts by major vendors such as NVIDIA, Intel, and AMD.
Familiarity with programming models for SmartNICs, including vendor-supported frameworks such as P4 and DOCA, OpenMP offloading, and communication offloading with MPI.
Practical knowledge of leveraging SmartNICs for in-line packet processing, communication offload optimizations, storage optimizations, and algorithmic changes in applications.
Real-world application experiences and mini-app case studies that leverage SmartNICs and DPUs.
Hands-on experience with exercises covering a variety of application examples, including tutorials on P4 and DOCA features, blocking and nonblocking MPI collective offload operations, OpenMP offload for DPUs, and the use of accelerators such as Data Path Accelerators (DPAs).

Pre-requisites

Attendees need an Internet connection and a web browser to access the online virtual platform. Each attendee will be provided with an account on USC’s NETLAB system: https://netlab.cec.sc.edu/

Agenda

Monday, May 25

Time (CDT)	Topic	Description
8:30-8:40	Introduction	Attendee Survey
8:40-9:20	Communication Offloading	Data offloading (CPU→GPU), SmartNIC overview, examples (DPUs, IPUs), concepts: packet processing, computation offloading
9:20-10:00	SmartNIC Use Cases	Packet processing, HPC offload, AI/gRPC, cybersecurity
10:00-10:30	BREAK
10:30-11:15	Infrastructure SW	DOCA and P4 frameworks
11:15-12:00	Hands-on	DOCA and P4 demo
12:00-1:00	LUNCH
1:00-1:45	HPC Programming	MPI collective offload, OpenMP offload
1:45-2:30	Hands-on HPC	MPI and OpenMP demo
2:30-3:00	Storage Acceleration	Vendor use cases, NVMe offload, Virtio-FS, microservice offload (checksums, erasure coding)
3:00-3:30	BREAK
3:30-4:00	Future Uses	AI acceleration, quantum networking, TBD
4:00-4:05	Tutorial Survey
4:05-5:00	Hands-on	Added demos
Slides