You are viewing content from a past/completed QCon -

Track: Operationalizing Microservices: Design, Deliver, Operate

Location: Fleming, 3rd flr.

Building and operating distributed systems is hard, and microservices are no different. The source of the challenge is not necessarily the services themselves but rather managing the space in between the services. Here, you enter a world of non-determinism, where all bets are off—half the time you can’t tell if your system is up, down, or partially working, and now every outage is a murder mystery.   

Sounds scary? Join us on this track, where we explore what it takes to create Operational Microservices through a continuous, iterative process: from designing a system for observability, resilience, predictability, and continuous delivery to exploring the tools, practices, and processes that help you tackle the complexities of operating microservices at scale.

Track Host: Jonas Bonér

Founder & CTO @Lightbend / Creator of Akka

Jonas Bonér is Founder and CTO of Lightbend, inventor of the Akka project, initiator and co-author of the Reactive Manifesto, and a Java Champion. Learn more at

10:35am - 11:25am

Complex Event Flows in Distributed Systems

Event-driven architectures enable nicely decoupled microservices and are fundamental for decentralized data management. However, using peer-to-peer event chains to implement complex end-to-end logic that crosses service boundaries can accidentally increase coupling. Extracting such business logic into dedicated services reduces coupling and lets you keep sight of larger-scale flows - without violating bounded contexts, harming service autonomy, or introducing god services. Service boundaries get clearer and service APIs get smarter by focusing on their potentially long-running nature. I will demonstrate how the new generation of lightweight and highly scalable state machines eases the implementation of long-running services. Based on my real-life experiences, I will share how to handle complex logic and flows that require proper reactions to failures, timeouts, and compensating actions, and I will provide guidance backed by code examples to illustrate alternative approaches.

Bernd Ruecker, Co-founder and chief technologist @Camunda

11:50am - 12:40pm

Reactive Systems Architecture

Reactive systems architecture promises resilience and scalability, but building and maintaining a globally distributed system introduces considerable challenges. Jan and Matt will share the most important aspects of building systems that spread over multiple data centres as well as multiple AWS regions. You will learn about the evolution of the system's architecture, including some of the more interesting mistakes made, the protocols and APIs that its microservices use to communicate with each other, the challenges of eventual consistency in a system that spans continents, and the hard-learned lessons in keeping the system's components running in production. Moreover, Matt and Jan will present an overview of an analysis process to discover just what makes the biggest impact on a distributed system’s resilience, together with the results of applying this process to several production projects.


In short, Matt and Jan will give you the answer to the click-baity headline “4 things that make the biggest impact in distributed systems”, together with architectural and code examples to help you to avoid repeating the speakers’ mistakes.

Jan Machacek, Senior Principal Engineer @waltdisneyco & Founder @muvrhq
Matthew Squire, Technical Team Leader @BamtechMedia

1:40pm - 2:30pm

What Lies Between: The Challenge of Operationalising Microservices

The biggest challenge in operationalising microservices is managing the space between them. This is the land of distributed systems: uncertainty and non-determinism. I will present practical approaches that you can use to take microservices into production or increase the value provided by existing systems. I will explore how to integrate microservices at scale, including asset management, security considerations, and representing uncertainty in data. I will examine approaches that can be used to debug, monitor, adapt, and control microservices, and I will expand on why I do not like calling this observability. Finally, I will detail tools and models for handling failure, including how to avoid making reliability worse, rather than better. I will conclude with some thoughts on how the space between microservices, and the challenges it introduces, depends on scale and perspective.

Colin Breck, Sr. Staff Software Engineer @Tesla

2:55pm - 3:45pm

Lessons From 300k+ Lines of Infrastructure Code

This talk is a concise masterclass on how to write infrastructure code. I’ll share key lessons from the “Infrastructure Cookbook” we developed at Gruntwork while creating and maintaining a library of over 300,000 lines of infrastructure code that’s used in production by hundreds of companies. Come and hear our war stories, laugh about all the mistakes we’ve made along the way, and learn what Terraform, Packer, Docker, and Go look like in the wild. Topics include how to design infrastructure APIs, automated tests for infrastructure code, patterns for reuse and composition, refactoring, namespacing, versioning, CI / CD for infrastructure code, and more.

Yevgeniy Brikman, Co-founder @gruntwork_io

5:25pm - 6:15pm

Cultivating Production Excellence - Taming Complex Distributed Systems

Taming the complex distributed systems we're responsible for requires changing not just the tools and technical approaches we use; it also requires changing who is involved in production, how they collaborate, and how we measure success. 


In this talk, you'll learn about several practices core to production excellence: giving everyone a stake in production, collaborating to ensure observability, measuring with Service Level Objectives, and prioritizing improvements using risk analysis.

Liz Fong-Jones, Site Reliability Engineer

