Abstract
InfiniBand, GPUDirect, RoCE: modern AI workloads demand high-performance networking. At Runpod, we orchestrate millions of GPU containers whose interconnectivity must be dependable, scalable, and adaptable to those workloads.
The Linux kernel offers a remarkable variety of drivers for building bespoke virtual networking stacks. We’ll go over bridges, bonds, VLANs/VXLANs, MACVLANs/IPVLANs, and more, and have fun stacking interfaces and drivers on top of one another while maintaining end-to-end hardware offloads and RDMA fast paths.
Speaker
Murat Magomedov
Software Engineer @Runpod
Murat is a software engineer on Runpod's cloud team, where he designs and builds systems spanning fleets of thousands of machines across globally distributed datacenters.
Session Sponsored By
AI infrastructure developers trust. Everything you need to train, deploy and scale AI all in one place.