Building Embedding Models for Large-Scale Real-World Applications

Embedding models are at the core of search, recommendation, and retrieval-augmented generation (RAG) systems, transforming data into meaningful representations. We can adapt state-of-the-art large language models (LLMs) into embedding models that generate high-quality embeddings, but deploying these models in large-scale applications presents significant challenges.

This talk explores the end-to-end lifecycle of embedding systems, including:

  • Leveraging LLMs for high-quality embeddings and adapting them for domain-specific use cases using contrastive learning.
  • Designing custom architectures optimized for use-case-specific serving requirements.
  • Distilling large embedding models into smaller, production-friendly sizes.
  • Serving embeddings efficiently with optimization strategies like variable batch sizes and post-training quantization.
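
As a rough illustration of the contrastive learning mentioned in the first point, the sketch below computes an in-batch contrastive (InfoNCE-style) loss over paired query and document embeddings. The function names, the toy vectors, and the temperature of 0.05 are assumptions made for this example; the talk abstract does not specify the loss or training setup actually used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE-style loss (hypothetical helper, not from the talk).

    documents[i] is treated as the positive for queries[i]; every other
    document in the batch acts as a negative.
    """
    q = [l2_normalize(v) for v in queries]
    d = [l2_normalize(v) for v in documents]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, dj) / temperature for dj in d]
        # Numerically stable log-sum-exp over all documents in the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Cross-entropy with the matching document as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)


# Toy batch: each query points in roughly the same direction as its paired document.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(queries, [[0.9, 0.1], [0.1, 0.9]])
# Same documents with the pairing swapped, so each positive is a poor match.
swapped = in_batch_contrastive_loss(queries, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)
```

The loss is small when each query sits closest to its own document and grows when the pairing is wrong, which is the signal used to pull matching pairs together in embedding space.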

Attendees will leave with practical strategies for scaling embedding models from research to production, ensuring high performance and efficiency in real-world applications such as retrieving the best-matching documents, passages, or images; de-duplicating data; generating personalized recommendations; clustering content; and grounding GenAI responses with a RAG approach.
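
To make the serving-side ideas a little more concrete, here is a minimal sketch of post-training quantization applied to stored embeddings, followed by a best-match lookup using cosine similarity. The symmetric per-vector int8 scheme, the helper names, and the three-dimensional toy vectors are all assumptions for illustration; production systems typically quantize model weights or use a vector index, and the abstract does not say which approach the talk covers.

```python
import math


def quantize_int8(vec):
    """Symmetric per-vector quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [int(round(x / scale)) for x in vec], scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes and their scale."""
    return [c * scale for c in codes]


def cosine(a, b):
    numerator = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return numerator / (norm_a * norm_b)


def best_match(query, corpus):
    """Index of the corpus vector with the highest cosine similarity to the query."""
    return max(range(len(corpus)), key=lambda i: cosine(query, corpus[i]))


# Hypothetical document embeddings; real embeddings have hundreds of dimensions.
corpus = [[0.12, -0.80, 0.35], [0.90, 0.10, -0.20], [-0.40, 0.40, 0.75]]
quantized = [quantize_int8(v) for v in corpus]
restored = [dequantize(codes, scale) for codes, scale in quantized]

query = [0.85, 0.05, -0.10]
# Compare the retrieval result before and after quantization.
print(best_match(query, corpus), best_match(query, restored))
```

Storing one signed byte per dimension plus a single scale per vector cuts memory roughly fourfold relative to 32-bit floats, at the cost of a rounding error of at most half a quantization step per dimension.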


Speaker

Sahil Dua

Senior Software Engineer, Machine Learning @Google, Stanford AI, Co-Author of “The Kubernetes Workshop”, Open-Source Enthusiast

Sahil Dua is a Tech Lead focused on developing and adapting large language models (LLMs) with an expertise in Representation Learning. He oversees the full LLM lifecycle, from designing data pipelines and model architectures to optimizing models for highly efficient serving. Before Google, Sahil worked on the ML platform at Booking.com to scale machine learning model development and deployment.

A co-author of “The Kubernetes Workshop” book and an open-source enthusiast, Sahil has contributed to projects like Git, Pandas, and Linguist. As a frequent speaker at global conferences, he shares insights on AI, machine learning, and tech innovation, inspiring professionals across the industry.

From the same track

Session

Deploy MultiModal RAG Systems with vLLM

While text-based RAG systems have been everywhere in the last year and a half, there is so much more than text data. Images, audio, and documents often need to be processed together to provide meaningful insights, yet most RAG implementations focus solely on text.

Stephen Batifol

Developer Advocate @Zilliz, Founding Member of the MLOps Community Berlin, Previously Machine Learning Engineer @Wolt, and Data Scientist @Brevo

Session

How to Unlock Insights and Enable Discovery Within Petabytes of Autonomous Driving Data

For autonomous vehicle companies, finding valuable insights within millions of hours of video data is essential yet challenging.

Kyra Mozley

Machine Learning Engineer @Wayve

Session

AI for Food Image Generation in Production: How & Why

In this talk, we will conduct a technical overview of a client-facing Food Image Generation solution developed at Delivery Hero.

Iaroslav Amerkhanov

Senior Data Scientist @Delivery Hero

Session

Foundation Models for Recommenders: Challenges, Successes, and Lessons Learned

Recommender systems are an integral part of most products nowadays and are often a key driver of discovery for users of the product.

Moumita Bhattacharya

Senior Research Scientist @Netflix, Previously @Etsy

Session

Lessons Learned From Building LinkedIn’s First Agent: Hiring Assistant

In October 2024, we announced LinkedIn’s first agent, Hiring Assistant, to a select group of LinkedIn customers.

Karthik Ramgopal

Distinguished Engineer & Tech Lead of the Product Engineering Team @LinkedIn, 15+ Years of Experience in Full-Stack Software Development

Daniel Hewlett

Principal AI Engineer & Technical Lead for AI @LinkedIn, 12+ Years of Experience in ML and AI Engineering, Previously @Google