Presentation: Observability, Event Sourcing and State Machines

Duration: 5:25pm - 6:15pm

Key Takeaways

  • Learn how to handle terabytes of data with gigabytes of heap in a JVM.
  • Learn how to architect applications to use event sourcing for reliability and persistence.
  • Compare log monitoring and event sourcing.

Abstract

How can we gain complete transparency into the state of a service? Ideally we would record everything - the inputs, outputs and timings - in order to capture highly reproducible and transparent state changes. However, is it possible to record every event or message in and out of a service without hurting performance? Join Peter for an exploration of the use cases and practicalities of having downstream services consume all the state changes of an upstream service in order to provide automated, real-time insight into services.

Interview

Question: 
What’s the focus of the talk?
Answer: 

It’s a call to arms for event sourcing and a record-everything model - you can do it even if you have aggressive low-latency requirements, so there isn’t a performance reason not to do it. If you have performance problems, you’re doing it wrong.

The talk will be presented in general terms and applies to a number of different problems and products, such as Kafka, JMS and Aeron; but when referring to implementation details, the product used as an example will be our company’s open-source product, Chronicle Queue.

Question: 
What are you going to talk about?
Answer: 

Two clients have over 100TB of buffers, going back three years - and they can replay anything from that time.

It’s like data-driven tests on steroids: it’s possible to replay a day’s worth of events in the exact order they came in, so a problem triggered by a specific ordering of events can be investigated and debugged.
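
As a rough sketch of what that replay can look like with Chronicle Queue - the OrderListener interface and queue path are invented for illustration, and exact package names vary between Chronicle versions:

    import net.openhft.chronicle.bytes.MethodReader;
    import net.openhft.chronicle.queue.ChronicleQueue;

    public class ReplayTest {
        // Hypothetical listener interface; in production, inbound events
        // would be recorded through a methodWriter for this same interface.
        interface OrderListener {
            void onOrder(String order);
        }

        public static void main(String[] args) {
            // Open the recorded queue and replay every event, in the original
            // order, through the same handler the production service uses.
            try (ChronicleQueue queue = ChronicleQueue.singleBuilder("orders-day-1").build()) {
                OrderListener handler = order -> System.out.println("replayed: " + order);
                MethodReader reader = queue.createTailer().methodReader(handler);
                while (reader.readOne()) {
                    // each readOne() dispatches one recorded event to the handler
                }
            }
        }
    }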

Question: 
What are some of the challenges of logging in production?
Answer: 

Logging presents a challenge in your application: you don’t want to slow down the critical path, but you do want to record what is happening - and if you log everything, you can slow down your application. The key is to log the right amount: not so much that the application gets slowed down, but enough to be useful.

So it is necessary to decide up front that you will record enough information to reproduce any problem, and that it is recorded by default.

Question: 
Why use event sourcing as a means of driving data through the application?
Answer: 

Event sourcing allows the application to be viewed as a series of state machines.
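
As a minimal illustration of that view - the account example is invented, not from the talk - a state machine’s state is nothing more than the result of applying its events in order:

    import java.util.Arrays;
    import java.util.List;

    public class AccountDemo {
        enum Type { DEPOSIT, WITHDRAW }

        static class Event {
            final Type type;
            final long amount;
            Event(Type type, long amount) { this.type = type; this.amount = amount; }
        }

        // State is derived solely by applying events in order, so replaying
        // the same event log always reproduces the same state.
        static class Account {
            private long balance;
            void apply(Event e) {
                switch (e.type) {
                    case DEPOSIT:  balance += e.amount; break;
                    case WITHDRAW: balance -= e.amount; break;
                }
            }
            long balance() { return balance; }
        }

        public static void main(String[] args) {
            List<Event> log = Arrays.asList(
                    new Event(Type.DEPOSIT, 100),
                    new Event(Type.WITHDRAW, 30));
            Account account = new Account();
            log.forEach(account::apply);            // replay the event log
            System.out.println(account.balance());  // prints 70
        }
    }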

With event sourcing, you can have confidence that you can reproduce any problem, because there is enough information recorded to do so - and with that, the confidence to know you can fix the bug. If you can reproduce the problem, you can demonstrate whether the fix works as expected with that data, rather than making a change and hoping that it works.

The queue is one or more files. Effectively the application is in replay mode the whole time: when the application starts, it follows the file and plays back the log.

Being able to record a stream of events is separate from being able to replay those events afterwards. Applications that are put into production are often not tested for replay. In addition, the application may need supporting infrastructure that isn’t necessarily easy to set up on a development machine.

By writing events to a log and having the application tail that log to listen for new events, the same setup is easy both in production and on a development machine.
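
A hedged sketch of that append-and-tail pattern using the Chronicle Queue API - the path and payload are placeholders:

    import net.openhft.chronicle.queue.ChronicleQueue;
    import net.openhft.chronicle.queue.ExcerptAppender;
    import net.openhft.chronicle.queue.ExcerptTailer;

    public class AppendAndTail {
        public static void main(String[] args) {
            try (ChronicleQueue queue = ChronicleQueue.singleBuilder("events").build()) {
                // Producer side: append each event to the memory-mapped log.
                ExcerptAppender appender = queue.acquireAppender();
                appender.writeText("order-created id=42");

                // Consumer side: tail the same log. In production this would
                // loop, picking up new events as they are appended.
                ExcerptTailer tailer = queue.createTailer();
                String event = tailer.readText(); // returns null once caught up
                System.out.println(event);
            }
        }
    }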

Question: 
How does event sourcing relate to back-pressure?
Answer: 

Back-pressure is worth talking about: one of the differentiators between messaging systems is how they implement back-pressure, and this can determine whether a given system is appropriate for your problem. By way of contrast, a very good solution is Reactive Streams, which Akka supports and which has been added to Java 9 as the Flow API. One of its key elements is that the consumer pushes back on the producer in a fairly simple yet elegant way: it says give me another 5, another 2, and so on. While it’s receiving data, it is granting the producer permission to send more data while applying back-pressure, instead of polling for more information, but without the latency downsides of blocking. It works particularly well with GUI clients, because they are running on different machines and on different networks - you don’t have a level of control that allows you to send (or consume) messages at a fixed rate.

With Reactive Streams you continually say how much you can handle, which you can then adjust over time.
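
As a small, self-contained illustration of that request-n protocol using the Java 9 Flow API - the item type and request counts here are arbitrary:

    import java.util.concurrent.Flow;
    import java.util.concurrent.SubmissionPublisher;

    public class FlowBackPressureDemo {
        public static void main(String[] args) throws InterruptedException {
            try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
                publisher.subscribe(new Flow.Subscriber<Integer>() {
                    private Flow.Subscription subscription;

                    public void onSubscribe(Flow.Subscription subscription) {
                        this.subscription = subscription;
                        subscription.request(2); // "give me another 2"
                    }

                    public void onNext(Integer item) {
                        System.out.println("consumed " + item);
                        subscription.request(1); // permission for one more
                    }

                    public void onError(Throwable t) { t.printStackTrace(); }

                    public void onComplete() { System.out.println("done"); }
                });

                for (int i = 0; i < 5; i++)
                    publisher.submit(i); // blocks when the subscriber's buffer is full
            }
            Thread.sleep(500); // give the asynchronous delivery time to finish
        }
    }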

In Chronicle Queue we take a different approach, focusing on upstream systems like market data or compliance. Instead of utilising back-pressure, we have an enormous buffer. The assumption is that you can run for a week without filling it, and as part of your weekly maintenance cycle you can rotate it and delete, compress or archive the old files.
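
A sketch of how such rotation might be configured with Chronicle Queue’s roll cycles - a daily roll is shown, and the path and payload are placeholders:

    import net.openhft.chronicle.queue.ChronicleQueue;
    import net.openhft.chronicle.queue.RollCycles;

    public class RollingQueue {
        public static void main(String[] args) {
            // Roll to a new queue file each day; older files can then be
            // deleted, compressed or archived on a scheduled maintenance cycle.
            try (ChronicleQueue queue = ChronicleQueue.singleBuilder("market-data")
                                                      .rollCycle(RollCycles.DAILY)
                                                      .build()) {
                queue.acquireAppender().writeText("tick EURUSD 1.0842");
            }
        }
    }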

Question: 
What performance can you drive with this architectural pattern?
Answer: 

So, for example, on a machine with 128 GB of memory, we test what happens if you give it a terabyte of data over 3 hours. The difference between the first 64 GB and the last 64 GB is only about 20%: it takes 0.9s to write the first GB and 1.1s to write the last GB, so while it slows down, it doesn’t have a dramatic impact.
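
For scale, a terabyte over 3 hours works out to an average of roughly 90-100 MB/s of sustained writes.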

Some clients are taking peaks of 30 million messages per second without pushing back on the market data provider. Compliance is another compelling case: regulatory requirements are placed on existing systems that no longer have development teams, so they may not be in a position to add back-pressure support.

This allows the pattern to be used in a number of back-end systems, although GUI clients will still need something reactive to be able to display updates.

Speaker: Peter Lawrey

Gold Badges Java, JVM, Memory, & Performance @StackOverflow / Lead developer of the OpenHFT project

Peter Lawrey likes to inspire developers to improve the craftsmanship of their solutions, engineer their systems for simplicity and performance, and enjoy their work more by being creative and innovative. He has a popular blog, “Vanilla Java”, which gets 120K page views per month; he is 3rd on StackOverflow.com for [Java] and 2nd for [concurrency]; and he is lead developer of the OpenHFT project, which includes support for off-heap memory, thread pinning, and low-latency persistence and IPC (as low as 100 nanoseconds).
