In-network Computing with SmartNICs for Parallel Applications
40th IEEE International Parallel & Distributed Processing Symposium
May 25-29, 2026
Marriott on Canal Street
New Orleans, USA
Overview
Data Processing Units (DPUs) are programmable processors designed to offload and accelerate infrastructure workloads and data processing. This tutorial introduces the NVIDIA BlueField-3 DPU and examines its programming models, including the DOCA SDK, P4, and DPDK. It also demonstrates high-performance computing (HPC) workloads that can be offloaded to the DPU.
Audience
This tutorial is intended for HPC users, application developers, researchers and developers of programming models and communication libraries, as well as tool developers who are interested in leveraging next-generation SmartNICs for HPC.
Tutorial Goals
By participating in this comprehensive tutorial, attendees will gain:
- An understanding of asynchronous programmable engines, such as SmartNICs, and their evolution in HPC architectures, including an overview of current efforts by major vendors such as NVIDIA, Intel, and AMD.
- Familiarity with programming models for SmartNICs, including vendor-supported frameworks such as P4 and DOCA, OpenMP offloading, and communication offloading with MPI.
- Practical knowledge of leveraging SmartNICs for in-line packet processing, communication offload optimizations, storage optimizations, and algorithmic changes in applications.
- Real-world application experiences and mini-app case studies that leverage SmartNICs and DPUs.
- Hands-on experience with exercises covering a variety of application examples, including tutorials on P4 and DOCA features, blocking and nonblocking MPI collective offload operations, OpenMP offload for DPUs, and using accelerators like Data Path Accelerators (DPAs).
Pre-requisites
Attendees need an Internet connection and a web browser to access the online virtual platform. Each attendee will be provided with an account on USC’s NETLAB system: https://netlab.cec.sc.edu/
Agenda
Monday, May 25
| Time (CDT) | Topic | Description |
|---|---|---|
| 8:30-8:40 | Introduction | Attendee Survey |
| 8:40-9:20 | Communication Offloading | Data offloading (CPU→GPU), SmartNIC overview, examples (DPUs, IPUs), concepts: packet processing, computation offloading |
| 9:20-10:00 | SmartNIC Use Cases | Packet processing, HPC offload, AI/gRPC, cyber-security |
| 10:00-10:30 | BREAK | |
| 10:30-11:15 | Infrastructure SW | DOCA and P4 frameworks |
| 11:15-12:00 | Hands-on | DOCA and P4 demo |
| 12:00-1:00 | LUNCH | |
| 1:00-1:45 | HPC Programming | MPI collective offload, OpenMP offload |
| 1:45-2:30 | Hands-on HPC | MPI and OpenMP demo |
| 2:30-3:00 | Storage Acceleration | Vendor use cases, NVMe offload, Virtio-FS, microservice offload (checksums, erasure coding) |
| 3:00-3:30 | BREAK | |
| 3:30-4:00 | Future Uses | AI acceleration, quantum networking, TBD |
| 4:00-4:05 | Tutorial Survey | |
| 4:05-5:00 | Hands-on | Additional demos |
| Slides | | |