Accelerating Container Networking: Fun with Fastpathing

Abstract

InfiniBand, GPUDirect, RoCE: modern AI workloads demand intense networking solutions. At Runpod, we orchestrate millions of GPU containers whose interconnectivity must prove dependable, scalable, and adaptable to today’s AI demands.

The Linux kernel offers a remarkable variety of drivers for building bespoke virtual networking stacks. We’ll go over bridges, bonds, VLANs/VXLANs, MACVLANs/IPVLANs, and more, and have fun stacking interfaces and drivers on top of one another while maintaining end-to-end hardware offloads and RDMA fastpathing.


Speaker

Murat Magomedov

Software Engineer @Runpod

Murat is a software engineer on Runpod's cloud team, where he designs and builds systems spanning fleets of thousands of machines across globally distributed datacenters. 
 

Session Sponsored By

AI infrastructure developers trust. Everything you need to train, deploy and scale AI all in one place.

Date

Tuesday Mar 17 / 02:45PM GMT (50 minutes)

Location

Westminster (4th Fl.)

From the same track

Session

Engineering for the Long Haul: Operating Open Source at Scale

Tuesday Mar 17 / 10:35AM GMT

Open-source software powers the world’s most innovative enterprises, but keeping it secure, compliant, and reliable over the long haul is where complexity (and unexpected risk) quietly creeps in.

Ari Karkkainen

Sales Consultant @TuxCare

Session

Avoid AI Concept Creep with a Knowledge Graph Created in Hours

Tuesday Mar 17 / 03:55PM GMT

Modern data lakes and streaming architectures are optimized for data Volume, Variety and Velocity, not for preserving the Meaning of the data.

Atanas Kiryakov

President @Graphwise

Peio Popov

FSI Vertical Lead @Graphwise

Session

Sponsored Session powered by ING

Tuesday Mar 17 / 10:35AM GMT

Details coming soon!

Session

Multi-Cloud Distributed databases for Agentic AI

Tuesday Mar 17 / 11:45AM GMT

Distributed databases offer the strong consistency, performance, and familiarity of a relational database combined with the scale and resilience of NoSQL databases. Multi-cloud is an increasingly popular option for platforms supporting mission-critical services.

David Roberts

Senior Solutions Architect @Yugabyte

Session

Fix Nothing, Build Boldly: The Rise of Agentic Mobile Workflows

Tuesday Mar 17 / 01:35PM GMT

Mobile apps are no longer just a channel; they are the business. With 88% of mobile time spent in apps and 71% of uninstalls caused by crashes, the stakes have never been higher.

Daniel Day

CMO @Luciq

Session

Learning to Trust your AI Agents

Tuesday Mar 17 / 05:05PM GMT

AI agents are flooding codebases with far more code than humans can possibly review – so how do you verify the integrity and reliability of your software? 

Lawrie Green

Software Developer @Antithesis