You are viewing content from a past/completed QCon

Presentation: Pitfalls in Measuring SLOs

Track: SPONSORED SOLUTIONS TRACK II

Location: Henry Moore, 4th flr.

Time: 4:10pm - 5:00pm

Day of week: Monday

Abstract

We built support for SLOs (Service Level Objectives) against our event store so we could monitor operations for our own complex distributed system. In the process, we learned about a number of important aspects that even a careful reading of the SRE workbook had not prepared us for.

This talk is the story of the missing pieces, unexpected pitfalls, and how we solved those problems. We’d like to share what we learned and how we iterated on our SLO adventure.

As part of the design process, we collected user feedback through iterative deployments to learn what challenges users were running into. This talk will cover how we iterated on our design based on user feedback; how we deployed, learned, and redeployed; and how we collected information from our users and from the alerts our system fired.

In this talk, we will discuss how we brought the theory of SLOs to practice, and what we learned in the process that we hadn’t expected. We’ll discuss implementing the SLO feature and burn alerts, and our experiences working with the SRE team who started using the alerts. Our hope is that when you buy or build your SLO tools, you’ll know what to look for and how to get started; that implementors will be able to start on more solid ground; and that we will be able to advance the state of SLO support for all teams that wish to implement them.
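For context on the burn alerts mentioned above: a burn alert fires when a service consumes its error budget faster than the SLO allows. A minimal sketch of the underlying arithmetic, following the approach described in the SRE workbook — the function name and numbers are illustrative, not Honeycomb’s actual implementation:

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How fast the error budget is being consumed, relative to plan.

    A burn rate of 1.0 means the budget lasts exactly the SLO window;
    a rate of 14.4 would exhaust a 30-day budget in about two days.
    """
    if total_events == 0:
        return 0.0
    error_rate = bad_events / total_events
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_rate / budget

# Example: a 99.9% SLO, with 50 failures out of 10,000 recent events.
rate = burn_rate(bad_events=50, total_events=10_000, slo_target=0.999)
print(rate)  # ~5.0: budget is burning five times faster than allowed
```

In practice, tools evaluate this over several window lengths at once (for example, a fast 1-hour window and a slower 6-hour window) so that alerts are both responsive and resistant to brief spikes.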

The major design points will be broken into a discussion of what we actually built; a number of unexpected technical features; and ways that we had to educate users beyond the standard SLO guidelines. The talk is largely conceptual: no live code will be shown, although some innocent servers may well die in the process of being visualized.

Speaker: Danyel Fisher

Principal Design Researcher @honeycombio

Danyel Fisher is a Principal Design Researcher for Honeycomb.io. He focuses his passion for data visualization on helping SREs understand their complex systems quickly and clearly. Before he started at Honeycomb, he spent thirteen years at Microsoft Research, studying ways to help people gain insights faster from big data analytics.
