You are viewing content from a past/completed QCon

Presentation: Databases and Stream Processing: A Future of Consolidation

Track: Streaming Data Architectures

Location: Churchill, G flr.

Duration: 1:40pm - 2:30pm

Day of week: Monday



This presentation is now available to view on InfoQ.com


What You’ll Learn

  1. Hear about the similarities and differences between databases and stream processors.
  2. Find out about a new type of database for data that moves.

Abstract

Are databases and stream processors wholly different things, or are they really two sides of the same coin? Certainly, stream processors feel very different from traditional databases when you use them. In this talk, we’ll explore why this is true, but maybe more importantly why it's likely to be less true in the future: a future where consolidation seems inevitable.  

So what advantage is there in merging these two fields? To understand this, we will dig into why both stream processors and databases are necessary, both from a technical standpoint and by exploring industry trends that make future consolidation far more likely. Finally, we'll examine how these trends map onto common approaches, from active databases like MongoDB to streaming solutions like Flink, Kafka Streams, or ksqlDB.  

By the end of this talk, you should have a clear idea of how stream processors and databases relate and why there is an emerging new category of databases that focus on data that moves.

Question: 

Tell us a little bit about yourself and what you are doing today.

Answer: 

I work at Confluent, which is one of the companies behind Apache Kafka. I originally worked on Kafka Core, where I contributed to a number of features, including the latest version of the replication protocol, as well as throttling and a few other things. These days I run what we call the Office of the CTO, which is a strategic function: we look at different parts of the industry, and also internally across the company, and try to work out what we should be doing next. This involves a number of initiatives across the company, including the subject of this session, which originally came from a thought experiment in which we created a fictitious stream processor that, unusually, didn't use streams.

Question: 

What are the goals for your talk, Databases and Stream Processing?

Answer: 

Most of this talk is about how these two things relate and, at the same time, how they differ. Databases have been around forever, and they all have pretty much the same shape: you make a request of a database that holds your data, the database calculates your answer, and it gives it back to you. It has been that way for a long time. Then stream processors came along, over the last five or so years, and they take a very different approach: data isn't locked up as it is in a database; it is in motion. But there are lots of similarities between databases and stream processors. Both have tables and both speak SQL, but the interaction model is very different. When you look at what stream processors have become, you can argue that a stream processor is a special type of database for data in motion: data in event streams. That makes it no more unusual than the other database variants we see these days, such as Cassandra, which specializes in large datasets held on disk, or Neo4J, which specializes in questions about relationships. We will then talk about why that's the case at a technical level.
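To make the difference in interaction model concrete, here is a minimal, hypothetical Python sketch. It is not the API of ksqlDB, Kafka Streams, or any real database; the `Table` and `StreamProcessor` classes and their methods are illustrative assumptions. The `Table` evaluates a query only when a client asks (request/response), while the `StreamProcessor` registers a standing query once and pushes a result to its subscriber each time a matching event arrives.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, int]
Predicate = Callable[[Event], bool]
Handler = Callable[[Event], None]


class Table:
    """Hypothetical database-style store: data sits at rest and a query
    is evaluated only when a client makes a request."""

    def __init__(self) -> None:
        self.rows: List[Event] = []

    def insert(self, row: Event) -> None:
        self.rows.append(row)

    def query(self, predicate: Predicate) -> List[Event]:
        # The answer reflects only the data present at the moment of the request.
        return [row for row in self.rows if predicate(row)]


class StreamProcessor:
    """Hypothetical stream-processor-style engine: the query is registered
    up front and results are pushed out as each event flows past."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[Predicate, Handler]] = []

    def subscribe(self, predicate: Predicate, on_match: Handler) -> None:
        self.subscriptions.append((predicate, on_match))

    def publish(self, event: Event) -> None:
        # Every standing query reacts to the new event immediately.
        for predicate, on_match in self.subscriptions:
            if predicate(event):
                on_match(event)


def is_large(order: Event) -> bool:
    return order["amount"] > 100


orders: List[Event] = [{"id": 1, "amount": 50}, {"id": 2, "amount": 250}]

# Pull: store the data first, then ask a question and get an answer back.
table = Table()
for order in orders:
    table.insert(order)
print(table.query(is_large))  # [{'id': 2, 'amount': 250}]

# Push: ask the question first, then let answers arrive as the data moves.
alerts: List[Event] = []
processor = StreamProcessor()
processor.subscribe(is_large, alerts.append)
for order in orders:
    processor.publish(order)
print(alerts)  # [{'id': 2, 'amount': 250}]
```

Both sides use the same predicate; only the direction of the interaction changes. In the first case the client drives the computation, and in the second the arrival of data does.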

Question: 

Can you also give us a little preview of how stream processors and databases relate to each other?

Answer: 

Both of them have tables, which is a strong similarity. The main difference is that the interaction model is very different. If you use something like ksqlDB, to take one example, it still feels quite different from a database. You don't ask a question and get an answer back; instead, the database reacts to events as they happen in real time. Despite this difference in interaction model, the underlying technologies are quite similar: both support predicates, joins, aggregations, and the like. But a database can optimize queries in a very different way, because it knows everything about all of the data the query might return. In a stream processor, you don't know what's going to turn up next. When you put these things together, you get a Venn diagram with a region of overlap between the two. We'll look closely at this overlap and at how you can think of a stream processor as an extension of the database rather than something completely different.
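The point that both systems share the same operators, but evaluate them under different knowledge of the data, can be sketched in a few lines of Python. This is a hypothetical illustration, not how ksqlDB, Flink, or any database actually executes queries: the `customers` table, `batch_totals` function, and `StreamingTotals` class are assumptions made for the example, and real stream processors also deal with windowing, out-of-order events, and fault-tolerant state, all omitted here. Both versions run the same query (a predicate, a join against a table, and an aggregation); the batch version sees the complete dataset at once, while the streaming version processes one event at a time, keeps state between events, and emits an updated result after each one.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Order = Tuple[str, int]  # (customer_id, amount)

# Hypothetical reference table used for the join: customer_id -> region.
customers: Dict[str, str] = {"alice": "EU", "bob": "US", "carol": "EU"}


def batch_totals(orders: List[Order], min_amount: int) -> Dict[str, int]:
    """Database-style: the whole dataset is available when the query runs."""
    totals: Dict[str, int] = defaultdict(int)
    for customer_id, amount in orders:
        if amount >= min_amount:              # predicate
            region = customers[customer_id]   # join against a table
            totals[region] += amount          # aggregation
    return dict(totals)


class StreamingTotals:
    """Stream-processor-style: the same query, evaluated incrementally
    because future events are unknown."""

    def __init__(self, min_amount: int) -> None:
        self.min_amount = min_amount
        self.totals: Dict[str, int] = defaultdict(int)

    def on_event(self, order: Order) -> Optional[Tuple[str, int]]:
        customer_id, amount = order
        if amount < self.min_amount:
            return None                       # filtered out: no update emitted
        region = customers[customer_id]
        self.totals[region] += amount
        return region, self.totals[region]    # updated row for this region


orders: List[Order] = [("alice", 30), ("bob", 5), ("carol", 20), ("bob", 40)]

print(batch_totals(orders, min_amount=10))  # {'EU': 50, 'US': 40}

stream = StreamingTotals(min_amount=10)
updates = [u for u in map(stream.on_event, orders) if u is not None]
print(updates)  # [('EU', 30), ('EU', 50), ('US', 40)]
```

Both approaches arrive at the same final totals. The difference is that the streaming version cannot wait to see all the data, so it emits a revised answer after every relevant event rather than a single answer when asked.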

Question: 

What do you want people to leave the talk with?

Answer: 

I'd like to think they'll leave with a good understanding of what a stream processor is, not just in terms of how you use it, but why it is technically different from a database. Most people already understand databases, so I'll focus on the differences in a technical sense. Finally, I'd like folks to leave thinking about whether we should fundamentally rethink what a database is. We are all a little indoctrinated into a particular notion of what a database is; we are all very familiar with it. Databases are the basis of pretty much every application we've built over the last 60 years, so we understand them really well. Hopefully people will leave thinking, "I never thought of databases in this way. I need to think about this some more."

Speaker: Benjamin Stopford

Author of “Designing Event Driven Systems” & Senior Director @confluentinc

Ben is a Senior Director at Confluent (a company that backs Apache Kafka) where he runs the Office of the CTO. He's worked on a wide range of projects from implementing the latest version of Kafka’s replication protocol through to assessing and shaping Confluent's strategy. His earlier career spanned projects at Thoughtworks and UK-based enterprise companies. He is also the author of the book “Designing Event Driven Systems”, O’Reilly, 2018. Find out more at http://benstopford.com.

