How to Unlock Insights and Enable Discovery Within Petabytes of Autonomous Driving Data

For autonomous vehicle companies, finding valuable insights within millions of hours of video data is essential yet challenging. This talk explores how we at Wayve are leveraging foundation models and embeddings to build scalable search tools that make data discovery faster and labeling more efficient.

Attendees will learn how we use vision-language models (VLMs) to retrieve relevant scenarios at scale, pinpointing the scenes needed to meet safety standards or to evaluate particular driving behaviors. By training classifiers directly on embeddings, we can detect specific driving competencies, and an active learning loop refines these classifiers until they can efficiently label similar scenarios across the entire dataset. This embedding-based approach is fast and scalable, and it also helps us spot “bad data” clusters, such as images with droplets on the lens or scenes from test tracks.
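To make the classifier-plus-active-learning loop concrete, here is a minimal sketch in Python. It assumes precomputed frame embeddings and uses scikit-learn's LogisticRegression with uncertainty sampling; the function name, the model choice, and the binary-label setup are all illustrative assumptions, not Wayve's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_round(embeddings, labeled_idx, labels, budget=50):
    """One round: fit a lightweight classifier on the labeled embeddings,
    then pick the most uncertain unlabeled frames to annotate next.

    embeddings  : (N, D) array of precomputed frame embeddings
    labeled_idx : indices of frames already labeled by hand
    labels      : binary labels (1 = frame shows the target competency)
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings[labeled_idx], labels)

    unlabeled_idx = np.setdiff1d(np.arange(len(embeddings)), labeled_idx)
    probs = clf.predict_proba(embeddings[unlabeled_idx])[:, 1]

    # Uncertainty sampling: probabilities nearest 0.5 mark the frames the
    # classifier is least sure about, hence the most informative to label.
    query_idx = unlabeled_idx[np.argsort(np.abs(probs - 0.5))[:budget]]
    return clf, query_idx
```

Each round, the frames returned in query_idx would be sent for human annotation, added to the labeled set, and the classifier retrained; once its precision is acceptable, it can cheaply label the remainder of the dataset.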

The presentation will delve into the technical infrastructure behind these tools, from vector databases that enable rapid similarity search to Flyte workflows that orchestrate scalable processing across distributed systems. We’ll also explore how query generation compensates for the weak positional awareness of text embeddings, enabling more precise search across video datasets. Finally, the talk will close with a look toward future possibilities, such as on-device edge filtering, which would use embeddings to reduce storage costs by capturing only the most interesting scenarios in real time.
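As a rough illustration of the similarity-search piece, here is a minimal sketch of text-to-frame retrieval over a FAISS index. The talk does not name a specific vector database, so FAISS, the random stand-in embeddings, and the embed_text placeholder are assumptions made purely for illustration.

```python
import numpy as np
import faiss  # pip install faiss-cpu

# Stand-in corpus: in practice these would be CLIP-style embeddings of
# camera frames, L2-normalized so inner product equals cosine similarity.
rng = np.random.default_rng(0)
frame_embeddings = rng.standard_normal((100_000, 512)).astype(np.float32)
faiss.normalize_L2(frame_embeddings)

index = faiss.IndexFlatIP(512)  # exact inner-product search
index.add(frame_embeddings)

def embed_text(query: str) -> np.ndarray:
    # Placeholder for a real text encoder that shares the image space.
    v = rng.standard_normal((1, 512)).astype(np.float32)
    faiss.normalize_L2(v)
    return v

def search(query: str, k: int = 20):
    scores, ids = index.search(embed_text(query), k)
    return list(zip(ids[0].tolist(), scores[0].tolist()))

# e.g. search("pedestrian crossing at night in the rain")
```

A production system would swap the stand-ins for real image/text encoders and a managed vector database, but the query path stays the same: embed the text, then search by cosine similarity.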

Designed for engineers and data scientists, this session provides a deep dive into the power of embeddings and VLMs for labeling and retrieving data at scale, making it possible to unlock insights and drive advancements in autonomous vehicle technology.


Speaker

Kyra Mozley

Machine Learning Engineer @Wayve

With a background in computer vision and deep learning, Kyra leads the development of tools that leverage foundation models and embeddings to efficiently process and understand vast amounts of autonomous vehicle driving data.


From the same track

Session

Deploy MultiModal RAG Systems with vLLM

While text-based RAG systems have been everywhere for the past year and a half, data is much more than text. Images, audio, and documents often need to be processed together to provide meaningful insights, yet most RAG implementations focus solely on text.


Stephen Batifol

Developer Advocate @Zilliz, founding member of the MLOps Community Berlin, previously Machine Learning Engineer @Wolt and Data Scientist @Brevo

Session

AI for Food Image Generation in Production: How & Why

In this talk, we will present a technical overview of a client-facing Food Image Generation solution developed at Delivery Hero.


Iaroslav Amerkhanov

Senior Data Scientist @Delivery Hero