Track:

Location:

Windsor, 5th flr.

Duration:

5:25pm - 6:15pm

Day of week:

Wednesday

Level:

Intermediate

Persona:

Developer

Key Takeaways

Understand how the Log Structured Merge Tree algorithm works (and hence how HBase, Cassandra, RocksDB, etc. work under the covers)
Hear why LSM works, despite being somewhat counter-intuitive.
Rethink how such log-centric approaches apply to a variety of use cases right up to patterns for building distributed systems.

Abstract

This talk is about the beauty of sequential access and append-only data structures. We'll do this in the context of a little-known paper entitled “Log Structured Merge Trees”. LSM describes a surprisingly counterintuitive approach to storing and accessing data in a sequential fashion. It came to prominence in Google's Bigtable paper and today the use of logs, LSM and append-only data structures drives many of the world's most influential storage systems: Cassandra, HBase, RocksDB, Kafka and more. Finally, we'll look at how the beauty of sequential access goes beyond database internals, right through to how applications communicate, share data and scale.
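As a rough illustration of the idea in the abstract, here is a minimal in-memory sketch of the LSM pattern: writes go to a small mutable buffer, full buffers are flushed as immutable sorted runs, reads check the newest data first, and compaction merges runs. The class name `TinyLSM`, the `memtable_limit` parameter and the use of Python lists in place of on-disk files are all assumptions made for illustration; this is not taken from the talk or from any of the databases named above.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable in-memory buffer plus immutable sorted runs.

    Hypothetical sketch only. Real systems write runs to disk sequentially
    and add a write-ahead log, bloom filters, tombstones and more.
    """

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # most recent writes, mutable
        self.runs = []       # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes never touch existing runs; they only fill the buffer.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Emit the buffer as one sorted run (a sequential write on disk
        # in a real system), then start a fresh buffer.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the buffer, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, letting newer values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The counterintuitive part is visible in `get`: a read may have to consult several runs, which is the price paid for making every write a cheap append. Compaction keeps that read cost bounded by periodically merging runs.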

Interview

Question:

You say sequential access goes beyond database internals, can you elaborate?

Answer:

Sure - sequential patterns are very much the grain of a software application. Whether you're pulling data from disk, SSDs, RAM or over a network, if you can arrange for sequential access, performance will always improve. Kafka is a great example of this because it forces you to program against a stream of sequential events.

Question:

What do you want someone focused on developing distributed systems to leave your talk with?

Answer:

The core takeaway is this interesting LSM algorithm. They don't teach LSM in most CS courses, despite its recent popularity. Whilst we won't be getting into the maths, hopefully everyone will understand how this algorithm works by the end of the talk, and hence how a large number of contemporary databases work under the covers. This has a few uses. It helps explain the different performance profiles we see in different storage engines. It also makes it easier to understand how to tune such systems.

Beyond LSM, we'll touch on how being "log-centric" helps with a number of distributed systems problems as we move to an increasingly decentralised, coordination-free world.

Speaker: Ben Stopford

Core Kafka team @Confluent

Ben is an engineer working on core Apache Kafka at Confluent Inc (the company behind Apache Kafka). He's worked with distributed data infrastructure for over a decade, switching between engineering products and helping companies use them. Before Confluent he designed the central data infrastructure for a large investment bank. His earlier career spanned a variety of projects at Thoughtworks and UK-based enterprise companies.

Find Ben Stopford at

Speaker page

http://benstopford.com

@benstopford

Engineer at Confluent

Chief Scientist @Neo4j

Jim Webber

Effective Data Pipelines: Data Mngmt from Chaos

Python engineer, Founder @kjamistan

Katharine Jarmul

Deliver Docker Containers Continuously on AWS

Lead Software Developer @AutoScout24

Philipp Garbe

Creating Space To Be Awesome

CTO who understands the science around helping people do their best

Meri Williams

Thinking Strategically About IoT

Senior Software Engineer @IBM, Committer on Apache Aries

Holly Cummins

In-Memory Caching: Curb Tail Latency with Pelikan

Distributed Systems Engineer Working on Cache @Twitter

Yao Yue

Observability, Event Sourcing and State Machines

Gold Badges Java, JVM, Memory, & Performance @StackOverflow / Lead developer of the OpenHFT project

Peter Lawrey

Microservices At The Heart of BBC iPlayer

Software Engineer @BBC iPlayer

Cem Staveley

The Hitchhiker's Guide to Serverless Javascript

Director of Engineering @Bustle

Steve Faulkner

Tracks

Architecting for Failure

Building fault-tolerant systems that are truly resilient
Architectures You've Always Wondered about

QCon classic track. You know the names. Hear their lessons and challenges.
Modern Distributed Architectures

Migrating, deploying, and realizing modern cloud architecture.
Fast & Furious: Ad Serving, Finance, & Performance

Learn some of the tips and techniques of high-speed, low-latency systems in Ad Serving and Finance
Java - Performance, Patterns and Predictions

Skills embracing the evolution of Java (multi-core, cloud, modularity) and reinforcing core platform fundamentals (performance, concurrency, ubiquity).
Performance Mythbusting

Performance myths that need busting and the tools & techniques to get there

Dark Code: The Legacy/Tech Debt Dilemma

How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
Modern Learning Systems

Real world use of the latest machine learning technologies in production environments
Practical Cryptography & Blockchains: Beyond the Hype

Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain
Applied JavaScript - Atomic Applications and APIs

Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
Containers - State Of The Art

What is the state of the art, what's next, & other interesting questions on containers.
Observability Done Right: Automating Insight & Software Telemetry

Tools, practices, and methods to know what your system is doing

Data Engineering : Where the Rubber meets the Road in Data Science

Science does not imply engineering. Engineering tools and techniques for Data Scientists
Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas
Workhorse Languages, Not Called Java

Workhorse languages not called Java.
Security: Lessons Learned From Being Pwned

How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
Engineering Culture @{{cool_company}}

Culture, Organization Structure, Modern Agile War Stories
Softskills: Essential Skills for Developers

Skills for the developer in the workplace

Presentation: Power of the Log: LSM & Append Only Data Structures
