For autonomous vehicle companies, finding valuable insights within millions of hours of video data is essential yet challenging. This talk explores how we at Wayve are leveraging foundation models and embeddings to build scalable search tools that make data discovery faster and labeling more efficient.
Attendees will learn how we use vision-language models (VLMs) to retrieve relevant scenarios at scale, which is invaluable for pinpointing the scenes needed to meet safety standards or to evaluate specific driving behaviors. Using embeddings, we train classifiers to detect specific driving competencies, then refine them through an active learning loop until they can efficiently label similar scenarios across the entire dataset. This embedding-based approach is fast and scalable, and it also helps us spot “bad data” clusters, like images with droplets on the lens or scenes from test tracks.
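To make that loop concrete, here is a minimal sketch of an embedding-based classifier refined by uncertainty sampling. It is illustrative only: the random embeddings and the `human_labels` helper are stand-ins for precomputed CLIP features and a human-in-the-loop labelling tool, included so the example runs end to end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins so the sketch runs; replace with real per-clip
# embeddings (e.g., mean-pooled CLIP features) and real labels.
embeddings = rng.normal(size=(10_000, 512))
labels = np.full(len(embeddings), -1)  # -1 == unlabelled

def human_labels(idx: np.ndarray) -> np.ndarray:
    """Hypothetical human-in-the-loop step; returns 0/1 labels."""
    return rng.integers(0, 2, size=len(idx))

# Seed the loop with a small hand-labelled set.
seed = rng.choice(len(embeddings), size=100, replace=False)
labels[seed] = human_labels(seed)

for _ in range(5):
    known = labels != -1
    clf = LogisticRegression(max_iter=1000).fit(embeddings[known], labels[known])

    # Uncertainty sampling: route the clips nearest the decision
    # boundary back to a human for labelling.
    probs = clf.predict_proba(embeddings[~known])[:, 1]
    uncertain = np.argsort(np.abs(probs - 0.5))[:50]
    query_idx = np.flatnonzero(~known)[uncertain]
    labels[query_idx] = human_labels(query_idx)

# The refined classifier then labels the full dataset cheaply.
predicted = clf.predict(embeddings)
```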
The presentation will delve into the technical infrastructure behind these tools, from vector databases that enable rapid similarity search to Flyte workflows that orchestrate scalable processing across distributed systems. We’ll also explore how query generation compensates for the weak positional awareness of text embeddings (for example, distinguishing a pedestrian on the left from one on the right), allowing for more precise search across video datasets. Finally, the talk will close with a look toward future possibilities, such as on-device edge filtering, which would use embeddings to reduce storage costs by capturing only the most interesting scenarios in real time.
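As a rough sketch of the retrieval path, the snippet below embeds a natural-language query with an open CLIP model and searches a FAISS flat index standing in for a production vector database. The model name, synthetic embeddings, and flat index are illustrative assumptions, not the exact stack used at Wayve.

```python
import faiss
import numpy as np
import torch
import open_clip

# Stand-in for precomputed, L2-normalised frame embeddings;
# in practice these come from the offline perception pipeline.
frames = np.random.default_rng(0).normal(size=(100_000, 512)).astype("float32")
frames /= np.linalg.norm(frames, axis=1, keepdims=True)

# Inner product on unit vectors == cosine similarity. A real deployment
# would use a vector database / ANN index, not an exhaustive flat index.
index = faiss.IndexFlatIP(frames.shape[1])
index.add(frames)

model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def search(query: str, k: int = 10):
    """Embed a natural-language query and return the k nearest frames."""
    with torch.no_grad():
        q = model.encode_text(tokenizer([query]))
        q = (q / q.norm(dim=-1, keepdim=True)).cpu().numpy().astype("float32")
    scores, ids = index.search(q, k)
    return list(zip(ids[0].tolist(), scores[0].tolist()))

print(search("pedestrian crossing at night in heavy rain"))
```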
Designed for engineers and data scientists, this session provides a deep dive into the power of embeddings and VLMs for labeling and retrieving data at scale, making it possible to unlock insights and drive advancements in autonomous vehicle technology.
Interview:
What is the focus of your work?
My work focuses on building scalable pipelines for running perception models at scale (e.g., segmentation, cuboids, CLIP) and enhancing semantic search capabilities to enable users to search and retrieve relevant video data using natural language. I’m also experimenting with the latest vision-language models (VLMs) for video understanding to address these challenges and perform data mining at scale.
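As a toy illustration of what such a pipeline can look like with Flyte (the orchestrator mentioned in the abstract), here is a minimal flytekit sketch that fans an embedding task out over video clips and indexes the results. The task bodies, names, and types are placeholders, not production code.

```python
from typing import List
from flytekit import task, workflow, map_task

@task
def embed_clip(clip_uri: str) -> List[float]:
    """Placeholder: load frames from clip_uri, run a CLIP-style image
    encoder, and mean-pool into one embedding per clip."""
    return [0.0] * 512  # stand-in vector

@task
def index_embeddings(embeddings: List[List[float]]) -> str:
    """Placeholder: upsert embeddings into the vector store and
    return an index identifier."""
    return "embeddings-index-v1"

@workflow
def embedding_pipeline(clip_uris: List[str]) -> str:
    # map_task fans embed_clip out across clips; Flyte handles
    # scheduling, retries, and caching on the cluster.
    vectors = map_task(embed_clip)(clip_uri=clip_uris)
    return index_embeddings(embeddings=vectors)
```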
What’s the motivation for your talk?
The motivation stems from the growing need for efficient data mining and discovery in the autonomous vehicle industry. With petabytes of data collected from our fleet, we need to surface valuable insights across diverse teams—from safety validation to offline evaluation—while addressing specific challenges like dataset coverage, behavioural evaluation, and bad data removal. This talk aims to highlight how we leverage foundation models and embeddings to solve these challenges, showcasing how scalable search and retrieval tools can transform data understanding and accelerate innovation.
Who is your talk for?
This talk is designed for data scientists, machine learning engineers, and anyone working on video understanding, large-scale ML pipelines, search and retrieval, or autonomous driving technology. It’s particularly relevant for teams dealing with vast datasets and looking to leverage open foundation models for smarter data processing and retrieval.
What do you want someone to walk away with from your presentation?
I want attendees to leave with a clear understanding of how foundation models and embeddings can revolutionise data retrieval and labelling at scale. They’ll learn how to design robust pipelines for large-scale data processing, use VLMs for scene retrieval, and apply active learning loops to scale classifier training. Additionally, I hope to inspire exploration of future possibilities, like on-device edge filtering, to optimise storage and processing costs.
What do you think is the next big disruption in software?
The next big disruption will likely come from the intersection of edge computing and foundation models. Real-time, on-device filtering and decision-making will enable systems to process and prioritise data at the source, dramatically reducing storage and compute costs while enhancing the speed and accuracy of downstream tasks. This will be particularly transformative in fields like autonomous vehicles, IoT, and video understanding, where real-time insights are critical.
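As a thought experiment, an on-device filter might look something like the sketch below: keep a clip when its embedding either matches a curated bank of interesting scenarios or is novel relative to what has already been kept. Every name, threshold, and shape here is a hypothetical assumption, not a description of a real system.

```python
import numpy as np

class EdgeFilter:
    """Hypothetical on-device filter: keep a clip if its embedding matches
    a curated bank of 'interesting' scenario embeddings, or if it is novel
    relative to everything the device has already kept."""

    def __init__(self, interesting: np.ndarray,
                 match_thresh: float = 0.8, novelty_thresh: float = 0.75):
        # `interesting` is an (M, D) array of L2-normalised embeddings
        # curated offline (e.g., near-misses, rare weather).
        self.interesting = interesting
        self.kept: list[np.ndarray] = []  # unbounded here; bound it in practice
        self.match_thresh = match_thresh
        self.novelty_thresh = novelty_thresh

    def keep(self, emb: np.ndarray) -> bool:
        emb = emb / np.linalg.norm(emb)
        # Close to a target scenario: always keep.
        if (self.interesting @ emb).max() >= self.match_thresh:
            self.kept.append(emb)
            return True
        # Far from everything kept so far: treat as novel and keep.
        if not self.kept or max(float(k @ emb) for k in self.kept) < self.novelty_thresh:
            self.kept.append(emb)
            return True
        return False

# Toy usage with random data in place of real embeddings.
rng = np.random.default_rng(0)
bank = rng.normal(size=(8, 512))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
filt = EdgeFilter(bank)
print(filt.keep(rng.normal(size=512)))
```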
Speaker
Kyra Mozley
Machine Learning Engineer @Wayve
With a background in computer vision and deep learning, Kyra leads the development of tools that leverage foundation models and embeddings to efficiently process and understand vast amounts of autonomous vehicle driving data.