While text-based RAG systems have been everywhere over the past year and a half, there is far more to real-world data than text. Images, audio, and documents often need to be processed together to provide meaningful insights, yet most RAG implementations focus solely on text. Think of automated visual inspection systems that understand both manufacturing logs and production-line images, or robotics systems that correlate sensor data with visual feedback. These multimodal scenarios demand RAG systems that go beyond text-only processing.
In this talk, we'll show how to build a MultiModal RAG system that solves this problem. We'll explore the architecture that makes such a system possible and demonstrate how to build one using Milvus, LlamaIndex, and vLLM to deploy open-source LLMs on your own infrastructure; a rough sketch of that stack follows below.
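To make the stack concrete, here is a minimal, hypothetical sketch (not the talk's actual demo code) of a multimodal pipeline built on LlamaIndex's multimodal index, Milvus as the vector store, and a self-hosted model served by vLLM behind its OpenAI-compatible endpoint. The Milvus and vLLM addresses, collection names, embedding dimensions, data folder, and model tag are all placeholder assumptions.

```python
# Sketch only: assumes a Milvus instance at localhost:19530, a vLLM server
# exposing an OpenAI-compatible API at localhost:8000, and the packages
# llama-index, llama-index-vector-stores-milvus, llama-index-embeddings-clip,
# and llama-index-llms-openai-like installed.
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.llms.openai_like import OpenAILike

# Two Milvus collections: one for text embeddings, one for image (CLIP) embeddings.
text_store = MilvusVectorStore(
    uri="http://localhost:19530", collection_name="rag_text", dim=1536, overwrite=True
)
image_store = MilvusVectorStore(
    uri="http://localhost:19530", collection_name="rag_images", dim=512, overwrite=True
)
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# Index a folder that mixes text logs and images; LlamaIndex embeds each
# modality with the matching embedding model.
documents = SimpleDirectoryReader("./mixed_data").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Point generation at the open-source model served by vLLM on your own
# infrastructure (model tag is a placeholder).
llm = OpenAILike(
    model="Qwen/Qwen2-VL-7B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="unused",
)

# Retrieve the top text chunks and images for a query; the retrieved context
# would then be passed to the vLLM-served model for answer synthesis.
retriever = index.as_retriever(similarity_top_k=3, image_similarity_top_k=3)
nodes = retriever.retrieve("Which production-line images match the defect in the logs?")
```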
Through a live demo, we'll showcase a real-world application that handles both image and text queries. Whether you're looking to reduce API costs, maintain data privacy, or simply gain more control over your AI infrastructure, this session will give you actionable insights for implementing MultiModal RAG in your organization.
Speaker
Stephen Batifol
Developer Advocate @Zilliz, Founding Member of the MLOps Community Berlin, previously Machine Learning Engineer @Wolt and Data Scientist @Brevo
Stephen Batifol is a Developer Advocate at Zilliz. He previously worked as a Machine Learning Engineer at Wolt, where he created and built out the ML Platform, and before that as a Data Scientist at Brevo. Stephen studied Computer Science and Artificial Intelligence.
He is a founding member of the MLOps.community Berlin group, where he organizes meetups and hackathons. He enjoys boxing and surfing.