Presentation: Staying in Sync: From Transactions to Streams

Key Takeaways

  • Learn approaches to keeping datastores in sync in the face of failure and latency.
  • Explore event streams and Kafka use cases.
  • Understand consistency guarantees and trade-offs when keeping multiple datastores in sync.

Abstract

For the very simplest applications, a single database is sufficient, and then life is pretty good. But as your application needs to do more, you often find that no single technology can do everything you need to do with your data. And so you end up having to combine several databases, caches, search indexes, message queues, analytics tools, machine learning systems, and so on, into a heterogeneous infrastructure...

Now you have a new problem: your data is stored in several different places, and if it changes in one place, you have to keep it in sync in the other places, too. It's not too bad if all your systems are up and running smoothly, but what if some parts of your systems have failed, some are running slow, and some are running buggy code that was deployed by accident?

It's not easy to keep data in sync across different systems in the face of failure. Distributed transactions and 2-phase commit have long been seen as the "correct" solution, but they are slow and have operational problems, and so many systems can't afford to use them.

In this talk we'll explore using event streams and Kafka for keeping data in sync across heterogeneous systems, and compare this approach to distributed transactions: what consistency guarantees can it offer, and how does it fare in the face of failure?

Interview

Question: 
What is your main role today?
Answer: 

At the moment, I spend half of my time writing "Designing Data-Intensive Applications", which has been quite a long project, but I’m gradually getting towards the end of it. In the other half of my time, I am at the University of Cambridge, working on a research project that is figuring out how to bring together databases and security research.

What we are trying to do is the following: imagine you wanted to build Google Docs, but in such a way that you don’t have to trust Google’s servers. We can do this using end-to-end encryption between the devices that are collaborating on a document, while still allowing the same kind of real-time collaboration that you get with Google Docs. We are trying to figure out how to build the fundamental infrastructure that would make it easy for people to write this kind of real-time collaborative app with end-to-end security.

Question: 
What is the main motivation for your talk?
Answer: 

The problem I want to address is the issue of keeping several datastores in sync with each other. The traditional way of handling that, if you want any sort of consistency guarantees, has been to use two-phase commit: distributed transactions across different stores. However, such transactions have all sorts of operational problems. The alternative has been to use eventual consistency everywhere, which has better performance, but has failure modes that are very hard to reason about. It’s easy to end up in a situation where your data is wrong and you don’t even realise it.

What I see in event-driven systems is a way to get fairly strong guarantees by making sure that the writes to different data stores always happen in the same order. It’s fundamentally a very simple idea, although actually putting it into practice is still a fair bit of work.

Question: 
How deep are you diving into it, and how are you shaping that discussion?
Answer: 

I will introduce it using Kafka as an implementation method. This is not intended as a pitch for Kafka specifically, but it happens to be one of the best-suited tools for the purpose. I will summarize Kafka briefly, the architecture, how it works internally for those who have not seen it before, because it is very different from traditional message queues like RabbitMQ or ActiveMQ.

In Kafka, once you’ve published your messages to a topic and you have got several people subscribing or consuming from it, all subscribers are going to see the same messages for a particular partition in the same order. That is a guarantee that Kafka provides, and this ordering guarantee has a whole range of consequences. Failure handling becomes a lot simpler, because the consumers can just keep a checkpoint of their offset, indicating which messages they have seen and which they haven’t seen. Since the ordering stays the same, you can replay the history when recovering from a failure.
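The checkpoint-and-replay idea described above can be sketched without a real broker. The following is only an illustration: PartitionLog and Consumer are hypothetical in-memory stand-ins written for this example, not Kafka client APIs. It shows two consumers reading the same partition in the same order, and a consumer that stops partway resuming from its saved offset.

```python
class PartitionLog:
    """Hypothetical in-memory stand-in for one partition (not a Kafka API)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # Messages are only ever appended; the offset is the list index.
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset):
        # Return (offset, message) pairs in log order, starting at `offset`.
        return [(i, self._messages[i]) for i in range(offset, len(self._messages))]


class Consumer:
    """Hypothetical consumer that tracks its own position in the log."""

    def __init__(self, log, checkpoint=0):
        self.log = log
        self.checkpoint = checkpoint  # offset of the next message to process
        self.processed = []

    def poll(self, limit=None):
        batch = self.log.read_from(self.checkpoint)
        if limit is not None:
            batch = batch[:limit]
        for offset, message in batch:
            self.processed.append(message)
            self.checkpoint = offset + 1  # advance only after processing
        return len(batch)


log = PartitionLog()
for message in ["a", "b", "c", "d"]:
    log.append(message)

fast = Consumer(log)
fast.poll()

slow = Consumer(log)
slow.poll(limit=2)              # processes "a" and "b", then stops
saved_offset = slow.checkpoint  # assume this is stored somewhere durable

# Simulated restart: a fresh consumer resumes from the saved offset.
restarted = Consumer(log, checkpoint=saved_offset)
restarted.poll()

print(fast.processed)
print(slow.processed + restarted.processed)
```

Note that in this sketch the log itself holds no per-consumer state; each consumer carries a single integer, which is what makes recovery simple.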

And there are performance benefits, because now there is much less work for the message brokers to do. They don’t need to keep track of the state of every single consumer and which messages they have acknowledged and which they haven’t, so you actually get much better throughput on the brokers as well. But most importantly, if you have this ordering guarantee, then you can write the messages to different datastores and make sure that once all the messages have been processed, the end result will be consistent with each other across all of these different stores.
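To make the consistency argument concrete, here is a minimal sketch under stated assumptions: the "database" is a plain dict, the "search index" is a dict mapping each value to the set of keys holding it, and the event tuples and function names are all invented for this example. Applying the same ordered log to both stores leaves them in agreement, whereas a store that sees the same writes in a different order ends up with a different result.

```python
def apply_to_database(db, event):
    """Apply one event to a key-value store (a plain dict here)."""
    op, key, value = event
    if op == "set":
        db[key] = value
    elif op == "delete":
        db.pop(key, None)


def apply_to_search_index(index, event):
    """Apply one event to an inverted index: value -> set of keys."""
    op, key, value = event
    # Drop the key from whatever entry it was previously filed under.
    for keys in index.values():
        keys.discard(key)
    if op == "set":
        index.setdefault(value, set()).add(key)


def index_view(index):
    """Flatten the inverted index back to key -> value for comparison."""
    return {key: value for value, keys in index.items() for key in keys}


# One ordered log of events, as every consumer of a partition would see it.
log = [
    ("set", "user:1", "alice"),
    ("set", "user:2", "carol"),
    ("set", "user:1", "bob"),
    ("delete", "user:2", None),
]

database, search_index = {}, {}
for event in log:
    apply_to_database(database, event)
for event in log:
    apply_to_search_index(search_index, event)

# Contrast: a store that receives the two writes to user:1 in swapped order,
# as can happen when an application writes to each store independently.
reordered = [log[2], log[1], log[0], log[3]]
diverged_index = {}
for event in reordered:
    apply_to_search_index(diverged_index, event)

print(database)
print(index_view(search_index))
print(index_view(diverged_index))
```

The two stores have different internal shapes, but because each one applies the events deterministically and in the same order, their final contents agree once the whole log has been processed.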

You could see this as a scalable implementation of event sourcing. That is one way of phrasing it for people who are already familiar with event sourcing.

Speaker: Martin Kleppmann

Software Engineer, Author, & Committer to Samza and Avro

Martin Kleppmann is a researcher at the University of Cambridge Computer Laboratory, where he works at the intersection of databases, distributed systems and information security. He previously founded and sold two startups, and worked on data infrastructure at LinkedIn. Martin's work-in-progress book "Designing Data-Intensive Applications" (O'Reilly) has received reviews such as "This is by far the best book on distributed systems and data engineering I'm aware of", even though it's not even finished yet.


Conference for Professional Software Developers