Presentation: Do You Really Know Your Response Times?

Duration: 11:50am - 12:40pm

Key Takeaways

  • Learn how to make your response time metrics simpler and easier to understand
  • Acquire ideas for getting insight into the performance of microservices
  • Analyze examples of how you can aggregate and visualize performance data

Abstract

With the recent surge in highly available microservices handling heavy incoming traffic, it is becoming more and more important to know how your service is performing right now and to be able to diagnose issues in production quickly. It took us a while to understand how to produce meaningful graphs and alerts that help us truly understand our application performance.

We initially found that most developers did not understand what they were measuring and that many of the graphs caused confusion. In this talk I show how we collect application performance metrics at Sky.

I focus on the use of histogram metrics to monitor response times, explain how reservoir sampling can help, and show the trade-offs among reservoir types. Finally, I illustrate, with real-world examples, some good and bad practices when monitoring response times.
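To make the reservoir trade-off concrete, here is a minimal Python sketch assuming two common reservoir types: a uniform reservoir (the classic Algorithm R), which keeps a fixed-size random sample of every value seen so far, and a sliding-window reservoir, which keeps only the most recent values. The class names, reservoir sizes, simulated latencies and the nearest-rank `percentile` helper are hypothetical illustrations, not the code used at Sky.

```python
import math
import random
from collections import deque


class UniformReservoir:
    """Hypothetical uniform reservoir: a fixed-size random sample of all values seen (Algorithm R)."""

    def __init__(self, size, rng=None):
        self.size = size
        self.count = 0
        self.values = []
        self.rng = rng if rng is not None else random.Random()

    def update(self, value):
        self.count += 1
        if len(self.values) < self.size:
            self.values.append(value)
        else:
            # Keep the new value with probability size / count, evicting a random slot.
            j = self.rng.randrange(self.count)
            if j < self.size:
                self.values[j] = value


class SlidingWindowReservoir:
    """Hypothetical sliding-window reservoir: only the most recent `size` values survive."""

    def __init__(self, size):
        self.values = deque(maxlen=size)

    def update(self, value):
        self.values.append(value)


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Simulated traffic: 10,000 fast responses (20 ms) followed by 1,000 slow ones (500 ms).
uniform = UniformReservoir(100, random.Random(42))
window = SlidingWindowReservoir(100)
for ms in [20] * 10_000 + [500] * 1_000:
    uniform.update(ms)
    window.update(ms)

print(percentile(window.values, 50))   # 500
print(percentile(uniform.values, 50))
```

Because the uniform reservoir samples across the whole history, most of its 100 samples still come from the fast period, so its median hides the recent slowdown. The sliding window reflects the slow traffic immediately but forgets everything older than its last 100 values.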

Interview

Question: 
What’s the motivation for your talk?
Answer: 

When I joined the team at Sky, I saw that there were monitors, graphs, and lots of information. Looking further, it turned out that some of the graphs contained mistakes or were inaccurate, and people had started to distrust them.

In my talk, I will show an example of reporting data about the different kinds of platforms that our clients are using. It turned out that by breaking the data down too finely we didn’t have enough data for each platform, and aggregating it incorrectly produced overall metrics that were nonsense.
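The aggregation pitfall can be illustrated with a small Python sketch. The platform names and response times below are made up: a busy `web` platform and a `tv` platform that received only ten requests. Averaging per-platform 99th percentiles is not the same as taking the 99th percentile of all requests, and a percentile computed from ten samples is effectively just the maximum.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in ms, broken down per client platform.
samples = {
    "web": [100] * 995 + [2000] * 5,  # 1,000 requests, 0.5% slow
    "tv": [100] * 9 + [3000] * 1,     # only 10 requests
}

# Per-platform p99: with 10 samples, the tv "p99" is simply its slowest request.
per_platform_p99 = {name: percentile(v, 99) for name, v in samples.items()}

# Wrong: averaging percentiles does not produce a percentile of anything.
naive = sum(per_platform_p99.values()) / len(per_platform_p99)

# Right: compute the percentile over the combined raw samples.
combined = percentile([x for v in samples.values() for x in v], 99)

print(per_platform_p99)  # {'web': 100, 'tv': 3000}
print(naive)             # 1550.0
print(combined)          # 100
```

Here the average of the per-platform p99 values is 1550 ms, while the p99 over all 1,010 requests is 100 ms: a single slow request on a sparsely used platform dominates the naive figure.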

My talk focuses on response time as an example of how you can make metrics simpler and easier to understand.

Question: 
What will people walk away from your talk with?
Answer: 

It’s worth knowing when to aggregate on the host and when not to, and why, so that’s something I will talk about. I also want to get architects thinking about the data that their systems provide, and to consider what developers and managers will do with that data.
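One reason host-side aggregation matters is that some summaries can be merged and others cannot. A percentile computed on each host cannot be combined correctly with another host’s percentile, but histogram bucket counts simply add. The Python sketch below assumes hypothetical bucket bounds in milliseconds and estimates a percentile only as the upper bound of the bucket that contains it; it illustrates the general idea and is not the approach described in the talk.

```python
import bisect
from collections import Counter

# Hypothetical bucket upper bounds in ms; the last bucket catches everything slower.
BOUNDS = [50, 100, 250, 500, 1000, 2500, float("inf")]


def bucketize(latencies_ms):
    """Aggregate raw latencies on one host into per-bucket counts."""
    counts = Counter()
    for ms in latencies_ms:
        counts[BOUNDS[bisect.bisect_left(BOUNDS, ms)]] += 1
    return counts


def merge(*host_counts):
    """Bucket counts are additive, so hosts can be combined centrally."""
    total = Counter()
    for counts in host_counts:
        total.update(counts)
    return total


def percentile_upper_bound(counts, q):
    """Upper bound of the bucket containing the q-th percentile, q in (0, 100]."""
    target = q * sum(counts.values()) / 100
    seen = 0
    for bound in BOUNDS:
        seen += counts[bound]
        if seen >= target:
            return bound
    return BOUNDS[-1]


# Two hypothetical hosts with 100 requests each.
host_a = bucketize([40] * 98 + [900] * 2)
host_b = bucketize([40] * 50 + [2000] * 50)
merged = merge(host_a, host_b)

print(percentile_upper_bound(host_a, 99))  # 1000
print(percentile_upper_bound(host_b, 99))  # 2500
print(percentile_upper_bound(merged, 99))  # 2500
print(percentile_upper_bound(merged, 50))  # 50
```

The trade-off is resolution: the merged answer is only as precise as the bucket bounds chosen up front.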

Question: 
What do you feel is the most disruptive tech in IT right now?
Answer: 

I think functional programming, which is now going mainstream with Scala, is disrupting how developers think about and develop software.

Speaker: Daniel Rolls

Collecting and Interpreting Large-Scale Data Collected @SkyUK

Daniel Rolls is a senior developer at Sky, where he is responsible for building web services for over-the-top (OTT) delivery of video streams. Prior to joining Sky, Daniel did a PhD in Computer Science and worked for various organisations, including Xerox and the University of Hertfordshire. Daniel questioned the accuracy and interpretation of various metric dashboards and started investigating.

Conference for Professional Software Developers