Performance Engineering

Session: Machine Learning Infrastructure

From S3 to GPU in One Copy: Rethinking Data Loading for ML Training

Tuesday Mar 17 / 11:45AM GMT

ML training pipelines treat data as static. Teams spend weeks preprocessing datasets into WebDataset or TFRecords, and when they want to experiment with curriculum learning or data mixing, they reprocess everything from scratch.

Onur Satici

Staff Engineer @SpiralDB & a Core Maintainer of Vortex (LF AI & Data), Previously Building Distributed Systems @Palantir