Abstract
Search is central to many AI systems. Everyone is building RAG and agents right now, but few are building reliable retrieval systems.
Drawing on our real-world RAG system, built on 10K+ documents and used by 300+ users, we found that most RAG failures can be traced back to two things: indexing and retrieval. This talk shares what matters when building production retrieval systems, from effective document parsing, chunking, and indexing to a reliable search and retrieval pipeline. I will present specific implementation details, tooling, and how we solved the challenges we encountered.
Here is one truth: you know who the real boss of NLP is? A PDF! Retrieval is only as good as the documents you index. We will cover how to handle document layout nightmares and the decisions around parsing and chunking strategies. Your system might not even need chunking; adding it could hurt performance. I will show when to chunk and how to find the "chunking sweet spot."
Once your documents are indexed, the next challenge is search itself. Many retrieval systems see the entire world as just strings. But real user queries carry non-textual signals, such as time or numbers, which also need to be encoded. We demonstrate how search can be combined with temporal scoring to capture the user's time intent, and when agentic search can help retrieval.
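To make the idea of temporal scoring concrete, here is a minimal sketch, not the implementation from the talk. It assumes each document already has a text relevance score in [0, 1] and a date, and that the user's time intent has been resolved to a single target date. The function names, the exponential half-life decay, and the `weight` parameter are all illustrative assumptions.

```python
from datetime import date


def temporal_score(doc_date: date, target_date: date, half_life_days: float = 30.0) -> float:
    """Hypothetical decay: 1.0 on the target date, halving every `half_life_days` away from it."""
    gap_days = abs((doc_date - target_date).days)
    return 0.5 ** (gap_days / half_life_days)


def combined_score(text_score: float, doc_date: date, target_date: date,
                   weight: float = 0.3, half_life_days: float = 30.0) -> float:
    """Linear blend of text relevance and temporal closeness (the weight is an assumption)."""
    t = temporal_score(doc_date, target_date, half_life_days)
    return (1 - weight) * text_score + weight * t


def rank(docs, target_date: date, weight: float = 0.3):
    """Sort (doc_id, text_score, doc_date) tuples by combined score, best first."""
    return sorted(
        docs,
        key=lambda d: combined_score(d[1], d[2], target_date, weight),
        reverse=True,
    )


# Two documents with equal text relevance: the one closer to the target date ranks first.
docs = [
    ("old_report", 0.8, date(2023, 1, 1)),
    ("new_report", 0.8, date(2024, 1, 1)),
]
ranked = rank(docs, target_date=date(2024, 1, 15))
```

A linear blend keeps the two signals easy to tune separately; setting `weight=0` falls back to pure text relevance for queries with no time intent.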
Finally, let's not forget evals. If you can't trust your evals, how can you trust your AI system? I will share our approach to building good evaluation sets, from working with stakeholders to capture real failure modes to using bootstrapping to determine how many samples you actually need.
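As one possible reading of the bootstrapping idea, the sketch below estimates how the uncertainty of an accuracy metric shrinks as the evaluation set grows. It assumes you already have a pilot set of per-example outcomes (1 for correct, 0 for incorrect); the function names, the percentile interval, and the width threshold are assumptions for illustration, not the procedure presented in the talk.

```python
import random


def bootstrap_ci_width(outcomes, n, n_resamples=1000, alpha=0.05, seed=0):
    """Width of the percentile bootstrap interval for mean accuracy at sample size n."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n  # resample n outcomes with replacement
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return upper - lower


def smallest_sufficient_n(outcomes, candidate_sizes, max_width=0.1, **kwargs):
    """Smallest candidate size whose interval width is at most max_width, else None."""
    for n in sorted(candidate_sizes):
        if bootstrap_ci_width(outcomes, n, **kwargs) <= max_width:
            return n
    return None


# Hypothetical pilot results: 70 correct answers out of 100.
pilot = [1] * 70 + [0] * 30
width_small = bootstrap_ci_width(pilot, 25)
width_large = bootstrap_ci_width(pilot, 400)
```

If the interval at your current sample size is wider than the difference you want to detect between two system versions, the evaluation set is too small to support that comparison.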
Speaker
Lan Chu
AI Tech Lead and Senior Data Scientist
Lan Chu is an AI Tech Lead and Senior Data Scientist with 7+ years of experience building production data and machine learning pipelines and 3+ years building GenAI products. She specializes in designing and implementing data and AI pipelines and responsible AI practices. Lan has a background in Data Science and deep expertise in Natural Language Processing. She works with production AI systems powered by 10,000+ documents.