Navigating LLM Deployment: Tips, Tricks, and Techniques

Self-hosted language models are going to power the next generation of applications in critical industries like financial services, healthcare, and defence. Self-hosting LLMs, as opposed to using API-based models, comes with its own host of challenges: as well as solving business problems, engineers need to wrestle with the intricacies of model inference, deployment, and infrastructure. In this talk, we are going to discuss best practices in model optimisation, serving, and monitoring, with practical tips and real case studies.

What's the focus of your work these days?

At TitanML, our focus is on making Generative AI applications easier to develop, deploy, and serve. A large part of our recent work is making it easier to build applications that involve both RAG and JSON-constrained outputs.

What's the motivation for your talk at QCon London 2024?

Almost every business is trying to build and deploy LLM applications at the moment, but very few have successfully got these applications into production. Our teams are experts in deploying and serving LLM apps, so we have a lot of tips and tricks to help other developers avoid common pitfalls.

How would you describe your main persona and target audience for this session?

This session is for those working with, or thinking of building with, Generative AI, especially self-hosted open-source AI. It is not a 'code-along' session, although there may be some technical concepts.

Is there anything specific that you'd like people to walk away with after watching your session?

I want attendees to realize that deploying LLMs within their own environment is a viable option and is not as scary as it might appear!


Meryem Arik

Co-Founder @TitanML

Meryem co-founded TitanML with the vision of creating a seamless and secure infrastructure for enterprise LLM deployments. Meryem's training was in Theoretical Physics and Philosophy at the University of Oxford. Beyond her contributions to TitanML, Meryem is dedicated to sharing her insights on the practical and ethical adoption of AI in enterprise.


Monday Apr 8 / 11:45AM BST (50 minutes)


Mountbatten (6th Fl.)


AI/ML LLM Deployment Inference Infrastructure


From the same track

Session AI/ML

Retrieval-Augmented Generation (RAG) Patterns and Best Practices

Monday Apr 8 / 10:35AM BST

The rise of LLMs that coherently use language has led to an appetite to ground the generation of these models in facts and private collections of data.

Jay Alammar

Director & Engineering Fellow @Cohere & Co-Author of "Hands-On Large Language Models"

Session AI/ML

Reach Next-Level Autonomy with LLM-Based AI Agents

Monday Apr 8 / 01:35PM BST

Generative AI has emerged rapidly since the release of ChatGPT, yet the industry is still at its very early stage with unclear prospects and potential.

Tingyi Li

Enterprise Solutions Architect @AWS

Session AI/ML

LLM and Generative AI for Sensitive Data - Navigating Security, Responsibility, and Pitfalls in Highly Regulated Industries

Monday Apr 8 / 02:45PM BST

As large language models (LLMs) become more prevalent in highly regulated industries, dealing with sensitive data and ensuring the security and ethical design of machine learning (ML) models is paramount.

Stefania Chaplin

Solutions Architect @GitLab

Azhir Mahmood

Research Scientist @PhysicsX

Session AI/ML

How Green is Green: LLMs to Understand Climate Disclosure at Scale

Monday Apr 8 / 05:05PM BST

Assessment of the validity of climate finance claims requires a system that can handle significant variation in language, format, and structure present in climate and financial reporting documentation, and knowledge of the domain-specific language of climate science and finance.

Leo Browning

First ML Engineer @ClimateAligned

Session AI/ML

The AI Revolution Will Not Be Monopolized: How Open-Source Beats Economies of Scale, Even for LLMs

Monday Apr 8 / 03:55PM BST

With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big t

Ines Montani

Co-Founder & CEO @Explosion, Core Developer of spaCy