Conference: March 6-8, 2017
Workshops: March 9-10, 2017
Presentation: Observability, Event Sourcing and State Machines
Location:
- Whittle, 3rd floor
Day of week:
- Wednesday
Level:
- Intermediate
Persona:
- Architect
Key Takeaways
- Learn how to handle terabytes of data with gigabytes of heap in a JVM.
- Discuss how to architect applications to use event sourcing for reliability and persistence.
- Compare the differences between log monitoring and event sourcing.
Abstract
How can you have complete transparency into the state of a service? Ideally we would record everything - the inputs, outputs and timings - in order to capture highly reproducible and transparent state changes. However, is it possible to record every event or message in and out of a service without hurting performance? Join Peter for an exploration of the use cases and practicalities of having downstream services consume all of the state changes of an upstream service in order to provide automated, real-time insight into services.
Interview
It’s a call to arms for event sourcing and a record-everything model: it can be done even with aggressive low-latency requirements, so there isn’t a performance reason not to do it. If you have performance problems, you’re doing it wrong.
The talk will be presented in general terms and is applicable to a number of different problems and products, including Kafka, JMS and Aeron. However, when referring to implementation details, the product used as an example will be our company’s open-source product, Chronicle Queue.
Two clients have over 100 TB of buffers going back three years, and they can replay anything from that time.
It’s like data-driven tests on steroids. It’s possible to replay a day’s worth of events in the specific order in which they arrived. This allows performance tests to be investigated and debugged when a particular ordering of events causes a problem.
Logging presents a challenge in your application: you don’t want to slow down the critical path, but you do want to record what is happening. However, if you log everything you can slow down your application. The key is to log the right amount: little enough that the application isn’t slowed down, yet enough to be useful.
So it is necessary to decide up front that you will record enough information to reproduce any problem, and that it is recorded by default.
Event sourcing allows the application to be viewed as a series of state machines.
With event sourcing, you can have confidence that you can reproduce any problem, because enough information is recorded to replay it. If you can reproduce the problem, you can demonstrate that the fix works as expected with that data, rather than making a change and hoping that it works.
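To make the state-machine view concrete, here is a minimal sketch (the order domain, class and event names are illustrative, not from the talk): the service’s state changes only by applying events, so replaying the same event stream always rebuilds the same state.

    import java.util.List;

    // Hypothetical sketch: a service modelled as a deterministic state
    // machine. Replaying the same event stream always produces the same
    // state, which is what makes problems reproducible.
    public class OrderStateMachine {
        enum State { NEW, PARTIALLY_FILLED, CANCELLED }

        // An event is an immutable record of something that happened.
        static final class OrderEvent {
            final String type;   // e.g. "FILL" or "CANCEL"
            final long quantity;
            OrderEvent(String type, long quantity) {
                this.type = type;
                this.quantity = quantity;
            }
        }

        private State state = State.NEW;
        private long filled = 0;

        // State changes only by applying events, never by external mutation.
        public void onEvent(OrderEvent event) {
            if ("FILL".equals(event.type)) {
                filled += event.quantity;
                state = State.PARTIALLY_FILLED;
            } else if ("CANCEL".equals(event.type)) {
                state = State.CANCELLED;
            }
            // Unknown event types are ignored, deterministically.
        }

        // Rebuilding current state is just replaying recorded events in order.
        public static OrderStateMachine replay(List<OrderEvent> events) {
            OrderStateMachine machine = new OrderStateMachine();
            for (OrderEvent e : events) {
                machine.onEvent(e);
            }
            return machine;
        }
    }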
The queue is one or more files. Effectively the application is in replay mode the whole time: when the application starts, it follows the file and plays back the log.
Being able to record a stream of events is separate from being able to replay those events afterwards. Applications that are put into production are often not tested for replay. In addition, the application may need supporting infrastructure that isn’t necessarily easy to set up on a development machine.
By writing events to a log and having the application tail that log to listen for new events, it’s easy to set up both in production and on a development machine.
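As a rough sketch of that pattern with Chronicle Queue (method names vary between versions of the library, so treat this as indicative rather than definitive): one side appends events to the file-backed queue, the other tails it, and the tailer behaves the same whether it is consuming live events or replaying history.

    import net.openhft.chronicle.queue.ChronicleQueue;
    import net.openhft.chronicle.queue.ExcerptAppender;
    import net.openhft.chronicle.queue.ExcerptTailer;
    import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

    public class TailTheLog {
        public static void main(String[] args) {
            // The queue is just one or more memory-mapped files in this directory.
            try (ChronicleQueue queue = SingleChronicleQueueBuilder.binary("queue-dir").build()) {
                // Producer side: append events to the log.
                ExcerptAppender appender = queue.acquireAppender();
                appender.writeText("order-created id=42");
                appender.writeText("order-filled id=42 qty=100");

                // Consumer side: tail the log from the start. Starting the
                // application is just replaying the log from some position.
                ExcerptTailer tailer = queue.createTailer();
                for (String event; (event = tailer.readText()) != null; ) {
                    System.out.println("replayed: " + event);
                }
            }
        }
    }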
Back-pressure is worth talking about: one of the differentiators between messaging systems is how they implement back-pressure, and this can determine whether a system is appropriate for your problem or not. By way of contrast, a very good solution is Reactive Streams, which is what Akka supports and which has been added to Java 9 as the Flow API. One of its key elements is that the consumer pushes back on the producer in a simple yet elegant way, saying “give me another 5”, “another 2” and so on. While it’s receiving data, it’s giving the producer permission to send more data while applying back-pressure, instead of polling for more information, but without the latency downsides of blocking. It works particularly well with GUI clients, because they are running on different machines and different networks - you don’t have a level of control that allows you to send (or consume) messages at a fixed rate.
With Reactive Streams you continually say how much you can handle, and you can adjust that over time.
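A minimal sketch of this request-n style of back-pressure using the Java 9 Flow API (the class name and items are illustrative): the subscriber grants the publisher permission to send a bounded number of items and tops that permission up as it consumes.

    import java.util.List;
    import java.util.concurrent.Flow;
    import java.util.concurrent.SubmissionPublisher;

    public class BackPressureDemo {
        public static void main(String[] args) throws InterruptedException {
            try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
                publisher.subscribe(new Flow.Subscriber<String>() {
                    private Flow.Subscription subscription;

                    @Override public void onSubscribe(Flow.Subscription s) {
                        subscription = s;
                        subscription.request(2); // "give me another 2"
                    }

                    @Override public void onNext(String item) {
                        System.out.println("consumed: " + item);
                        subscription.request(1); // permission for one more
                    }

                    @Override public void onError(Throwable t) { t.printStackTrace(); }
                    @Override public void onComplete() { System.out.println("done"); }
                });

                // Delivery to the subscriber never exceeds what it has requested.
                List.of("tick-1", "tick-2", "tick-3").forEach(publisher::submit);
            }
            Thread.sleep(500); // crude wait for the asynchronous consumer in this demo
        }
    }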
In Chronicle Queue we take a different approach, focusing on upstream data such as market data or compliance feeds. Instead of utilising back-pressure, we have an enormous buffer. The assumption is that you can run for a week without it filling, and as part of your weekly maintenance cycle you can rotate it and delete, compress or archive it.
So, for example, on a machine with 128 GB of memory we test what happens if you give it a terabyte of data in three hours. The difference between the first 64 GB and the last 64 GB is only about 20%, so while it slows down, it doesn’t have a dramatic impact: it takes 0.9 s to write the first GB and 1.1 s to write the last GB.
Some clients take peaks of 30 million messages per second without pushing back on the market data provider. Compliance is another compelling case: regulatory requirements are placed on existing systems that no longer have development teams, so they may not be in a position to add back-pressure support.
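As a back-of-envelope illustration of sizing such a week-long buffer (the sustained rate and message size below are assumptions for the sake of the arithmetic, not figures from the talk):

    public class BufferSizing {
        public static void main(String[] args) {
            // Assumed sustained rate and average message size - illustrative only.
            long messagesPerSecond = 1_000_000;
            long bytesPerMessage   = 256;
            long secondsPerWeek    = 7L * 24 * 60 * 60; // 604,800 s

            long bytesPerWeek = messagesPerSecond * bytesPerMessage * secondsPerWeek;
            System.out.printf("~%.0f TB per week%n", bytesPerWeek / 1e12);
            // ~155 TB per week at these rates: the budget is disk, not heap,
            // which is why a weekly rotate/compress/archive cycle matters.
        }
    }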
This allows the pattern to be used in a number of back-end systems, although GUI clients will still need something reactive to be able to display updates.
Tracks
- Architecting for Failure
  Building fault-tolerant systems that are truly resilient
- Architectures You've Always Wondered About
  QCon classic track. You know the names. Hear their lessons and challenges.
- Modern Distributed Architectures
  Migrating, deploying, and realizing modern cloud architecture.
- Fast & Furious: Ad Serving, Finance, & Performance
  Learn some of the tips and techniques of high-speed, low-latency systems in Ad Serving and Finance
- Java - Performance, Patterns and Predictions
  Skills embracing the evolution of Java (multi-core, cloud, modularity) and reinforcing core platform fundamentals (performance, concurrency, ubiquity).
- Performance Mythbusting
  Performance myths that need busting and the tools & techniques to get there
- Dark Code: The Legacy/Tech Debt Dilemma
  How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
- Modern Learning Systems
  Real-world use of the latest machine learning technologies in production environments
- Practical Cryptography & Blockchains: Beyond the Hype
  Looking past the hype of blockchain technologies; alternate title: Weasel-free Cryptography & Blockchain
- Applied JavaScript - Atomic Applications and APIs
  Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
- Containers - State Of The Art
  What is the state of the art, what's next, & other interesting questions on containers.
- Observability Done Right: Automating Insight & Software Telemetry
  Tools, practices, and methods to know what your system is doing
- Data Engineering: Where the Rubber Meets the Road in Data Science
  Science does not imply engineering. Engineering tools and techniques for Data Scientists
- Modern CS in the Real World
  Applied, practical, & real-world dive into industry adoption of modern CS ideas
- Workhorse Languages, Not Called Java
  Workhorse languages not called Java.
- Security: Lessons Learned From Being Pwned
  How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
- Engineering Culture @{{cool_company}}
  Culture, Organization Structure, Modern Agile War Stories
- Softskills: Essential Skills for Developers
  Skills for the developer in the workplace