Reliable Retrieval for Production AI Systems

Abstract

Search is central to many AI systems. Everyone is building RAG and agents right now, but few are building reliable retrieval systems.

Drawing on our real-world RAG system, built on 10K+ documents and used by 300+ users, we found that most RAG failures can be traced back to two things: indexing and retrieval. This talk shares what matters when building production retrieval systems, from effective document parsing, chunking, and indexing to a reliable search and retrieval pipeline. I will present specific implementation details, tooling, and how we solved the challenges we encountered.

Here is one truth: you know who the real boss of NLP is? A PDF! Retrieval is only as good as the documents you index. We will cover how to handle document layout nightmares and the decisions around parsing and chunking strategies. Your system might not even need chunking; adding it could hurt performance. I will show when to chunk and how to find the "chunking sweet spot."

Once your documents are indexed, the next challenge is search itself. Many retrieval systems see the entire world as just strings. But real user queries carry non-textual signals, such as time or numbers, which also need to be encoded. We demonstrate how search can be combined with temporal scoring to capture the user's time intent, and when agentic search can help retrieval.
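
As a rough illustration of combining search with temporal scoring, here is a minimal Python sketch. The abstract does not specify a formula, so everything here is an assumption: the exponential decay, the linear blend, the function names (temporal_score, combined_score, rerank), the half_life_days and weight parameters, and the hit format with "score" and "date" keys. It also assumes the text relevance score is already normalized to [0, 1] and that the date the query refers to has been extracted upstream.

```python
from datetime import date


def temporal_score(doc_date, target_date, half_life_days=30.0):
    """Score in (0, 1] that halves every `half_life_days` of distance
    between the document date and the date the query is about.
    (Hypothetical scoring function, not taken from the talk.)"""
    gap_days = abs((doc_date - target_date).days)
    return 0.5 ** (gap_days / half_life_days)


def combined_score(text_score, doc_date, target_date,
                   weight=0.3, half_life_days=30.0):
    """Linear blend of a text relevance score (assumed to lie in [0, 1])
    with the temporal score; `weight` is the share given to time."""
    time_part = temporal_score(doc_date, target_date, half_life_days)
    return (1.0 - weight) * text_score + weight * time_part


def rerank(hits, target_date, weight=0.3, half_life_days=30.0):
    """Sort hits (dicts with hypothetical 'score' and 'date' keys)
    by the combined score, best first."""
    return sorted(
        hits,
        key=lambda h: combined_score(
            h["score"], h["date"], target_date, weight, half_life_days
        ),
        reverse=True,
    )


hits = [
    {"id": "old", "score": 0.8, "date": date(2023, 1, 1)},
    {"id": "new", "score": 0.7, "date": date(2024, 1, 1)},
]
ranked = rerank(hits, target_date=date(2024, 1, 1))
```

In this example the slightly weaker text match dated on the target date ranks above the stronger match from a year earlier. A real system would tune the weight and half-life against an evaluation set.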

Finally, let's not forget evals. If you can't trust your evals, how can you trust your AI system? I will share our approach to building good evaluation sets, from working with stakeholders to capture real failure modes to using bootstrapping to determine how many samples you actually need.
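
One way bootstrapping can inform the size of an evaluation set is sketched below in Python. This is an assumed procedure, not necessarily the one presented in the talk: resample pass/fail outcomes from a pilot evaluation at several candidate sizes, measure how wide the percentile confidence interval of the pass rate is at each size, and pick the smallest size whose interval is narrow enough. The function names, the 95% interval, and the max_width threshold are all hypothetical choices.

```python
import random


def bootstrap_ci_width(outcomes, n, n_resamples=1000, alpha=0.05, seed=0):
    """Width of the percentile bootstrap interval for the mean pass rate
    when `n` examples are drawn with replacement from pilot `outcomes`
    (a list of 0/1 results). Hypothetical helper for illustration."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return upper - lower


def smallest_sufficient_n(outcomes, candidate_sizes, max_width=0.1, **kwargs):
    """Smallest candidate size whose interval width is at most `max_width`,
    or None if no candidate is narrow enough."""
    for n in sorted(candidate_sizes):
        if bootstrap_ci_width(outcomes, n, **kwargs) <= max_width:
            return n
    return None


# Made-up pilot data for illustration only: 80 passes, 20 failures.
pilot = [1] * 80 + [0] * 20
width_small = bootstrap_ci_width(pilot, n=25)
width_large = bootstrap_ci_width(pilot, n=400)
chosen = smallest_sufficient_n(pilot, [25, 100, 400], max_width=0.1)
```

The interval narrows as the sample size grows, so the chosen size reflects how much uncertainty in the metric you are willing to accept.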

Speaker

Lan Chu

AI Tech Lead and Senior Data Scientist

Lan Chu is an AI Tech Lead and Senior Data Scientist with 7+ years of experience building production data and machine learning pipelines and 3+ years building GenAI products. She specializes in designing and implementing data and AI pipelines and in responsible AI practices. Lan has a background in Data Science and deep expertise in Natural Language Processing. She works with AI production systems powered by 10,000+ documents.

From the same track

Session AI

Beyond Context Windows: Building Cognitive Memory for AI Agents

Tuesday Mar 17 / 02:45PM GMT

AI agents are rapidly changing how users interact with software, yet most agentic systems today operate with little to no intelligent memory, relying instead on brittle context-window heuristics or short-term state.

Karthik Ramgopal

Distinguished Engineer & Tech Lead of the Product Engineering Team @LinkedIn, 15+ Years of Experience in Full-Stack Software Development

Session AI/ML

Refreshing Stale Code Intelligence

Tuesday Mar 17 / 01:35PM GMT

Coding models are helping software developers move faster than ever, but weirdly, the models themselves are not keeping up. They are trained on months-old snapshots of open source code. They have never seen your internal codebase, let alone the code you wrote yesterday.

Jeff Smith

CEO & Co-Founder @Neoteny AI, AI Engineer, Researcher, Author, Ex-Meta/FAIR

Session AI

Rewriting All of Spotify's Code Base, All the Time

Tuesday Mar 17 / 11:45AM GMT

We don't need LLMs to write new code. We need them to clean up the mess we already made. In mature organizations, we have to maintain and migrate the existing codebase. Engineers are constantly balancing new feature development with endless software upkeep.

Jo Kelly-Fenton

Engineer @Spotify

Aleksandar Mitic

Software Engineer @Spotify

Session applied ai

Building an AI Gateway Without Frameworks: One Platform, Many Agents

Tuesday Mar 17 / 03:55PM GMT

Early AI integrations often start small: wrap an inference API, add a prompt, ship a feature. At Zoox, that approach grew into Cortex, a production AI gateway supporting multiple model providers, multiple modalities, and agentic workflows with dozens of tools, serving over 100 internal clients.

Amit Navindgi

Staff Software Engineer @Zoox

Jatin Aneja

Leading Developer Experience @Zoox, Previously Director of Site Reliability Engineering @AppLovin

Session

Async Agents in Production: Failure Modes and Fixes

Tuesday Mar 17 / 05:05PM GMT

As models improve, we are starting to build long-running, asynchronous agents such as deep research agents and browser agents that can execute multi-step workflows autonomously. These systems unlock new use cases, but they fail in ways that short-lived agents do not.

Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist
