For autonomous vehicle companies, finding valuable insights within millions of hours of video data is essential yet challenging. This talk explores how we at Wayve are leveraging foundation models and embeddings to build scalable search tools that make data discovery faster and labeling more efficient.
Attendees will learn how we leverage vision-language models (VLMs) to retrieve relevant scenarios at scale, which is invaluable for pinpointing scenes needed to meet safety standards or evaluate specific driving behaviors. By using embeddings, we can train classifiers to detect specific driving competencies. Through an active learning loop, we refine these classifiers, enabling them to label similar scenarios across the entire dataset with high efficiency. This embedding-based approach is both fast and scalable, and it also helps us spot “bad data” clusters, like images with droplets on the lens or scenes from test tracks.
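As a rough illustration of the active learning idea (a minimal sketch, not Wayve's actual implementation), the snippet below scores unlabeled embeddings against the centroids of a few labeled positives and negatives, then picks the most ambiguous items for a human to label next. The nearest-centroid scorer, the function names, and the toy 2-D vectors are all assumptions made for this example; in practice the embeddings would come from a foundation model and the classifier would likely be more capable.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def margin(embedding, pos_centroid, neg_centroid):
    """Positive values lean toward the positive class, negative toward the other."""
    return cosine(embedding, pos_centroid) - cosine(embedding, neg_centroid)

def select_for_labeling(unlabeled, positives, negatives, k):
    """Uncertainty sampling: return the ids of the k items with the smallest |margin|.

    `unlabeled` is a list of (item_id, embedding) pairs (hypothetical layout).
    """
    pos_c, neg_c = centroid(positives), centroid(negatives)
    ranked = sorted(unlabeled, key=lambda item: abs(margin(item[1], pos_c, neg_c)))
    return [item_id for item_id, _ in ranked[:k]]

# Toy data: two labeled examples per class and three unlabeled clips.
positives = [[1.0, 0.0], [0.9, 0.1]]
negatives = [[0.0, 1.0], [0.1, 0.9]]
unlabeled = [("clip_a", [1.0, 0.0]), ("clip_b", [0.5, 0.5]), ("clip_c", [0.0, 1.0])]

to_label = select_for_labeling(unlabeled, positives, negatives, k=1)
```

Each round, the newly labeled items would be added to the positive or negative set and the selection repeated, so labeling effort concentrates where the classifier is least certain.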
The presentation will delve into the technical infrastructure behind these tools, from vector databases that enable rapid similarity search to Flyte workflows that orchestrate scalable processing across distributed systems. We’ll also explore how query generation helps bridge the gap in positional awareness within text embeddings, allowing for more precise search across video datasets. Finally, the talk will close with a look toward future possibilities, such as on-device edge filtering, which would use embeddings to reduce storage costs by capturing only the most interesting scenarios in real time.
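To make the similarity-search step concrete, here is a minimal brute-force sketch of what a vector database does conceptually: rank stored embeddings by cosine similarity to a query embedding and return the top matches. The clip ids, the toy vectors, and the `top_k` helper are hypothetical; a production vector database would typically use an approximate nearest-neighbor index rather than scanning every vector, and the query embedding would come from the text side of a VLM.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def top_k(query, index, k):
    """Return the k (similarity, clip_id) pairs most similar to `query`.

    `index` maps a clip id to its embedding (hypothetical layout).
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), clip_id)
        for clip_id, vec in index.items()
    )
    return heapq.nlargest(k, scored)

# Toy index of three clips with 2-D embeddings.
index = {
    "rainy_junction": [1.0, 0.0],
    "night_motorway": [0.0, 1.0],
    "wet_roundabout": [0.9, 0.1],
}

results = top_k([1.0, 0.0], index, k=2)
result_ids = [clip_id for _, clip_id in results]
```

Normalizing both sides means the dot product equals cosine similarity, which is why many embedding stores keep unit-length vectors.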
Designed for engineers and data scientists, this session provides a deep dive into the power of embeddings and VLMs for labeling and retrieving data at scale, making it possible to unlock insights and drive advancements in autonomous vehicle technology.
Speaker
Kyra Mozley
Machine Learning Engineer @Wayve
With a background in computer vision and deep learning, Kyra leads the development of tools that use foundation models and embeddings to efficiently process and understand vast amounts of autonomous vehicle driving data.