Conference: March 6-8, 2017
Workshops: March 9-10, 2017
Presentation: Power of the Log: LSM & Append Only Data Structures
Location:
- Windsor, 5th flr.
Duration
Day of week:
- Wednesday
Level:
- Intermediate
Persona:
- Developer
Key Takeaways
- Understand how the Log Structured Merge (LSM) Tree algorithm works, and hence how HBase, Cassandra, RocksDB, etc. work under the covers.
- Hear why LSM works, despite being somewhat counterintuitive.
- Rethink how such log-centric approaches apply to a variety of use cases, right up to patterns for building distributed systems.
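To make the first takeaway concrete, here is a minimal sketch of the core LSM idea, assuming a toy in-memory setting: writes go into a mutable "memtable", which is sorted and flushed as an immutable run once it reaches a size limit; reads check the memtable and then the runs from newest to oldest. The class name `TinyLSM`, the `memtable_limit` parameter and the `TOMBSTONE` marker are hypothetical names for this illustration, and real engines such as RocksDB or Cassandra add a write-ahead log, on-disk files, Bloom filters and compaction on top of this.

```python
import bisect

TOMBSTONE = object()  # hypothetical marker recording that a key was deleted


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # most recent writes, held in memory
        self.runs = []                # flushed sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        # A delete is just another write; nothing is modified in place.
        self.put(key, TOMBSTONE)

    def _flush(self):
        # Sort once, then emit the whole run in one sequential write.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key, default=None):
        if key in self.memtable:
            value = self.memtable[key]
            return default if value is TOMBSTONE else value
        # Search runs newest first, so the latest version of a key wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return default if value is TOMBSTONE else value
        return default
```

Note that updates and deletes never touch existing runs: every write is an append, which is what keeps writes sequential, while reads may have to probe several runs.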
Abstract
This talk is about the beauty of sequential access and append-only data structures. We'll explore this in the context of a little-known paper entitled “Log Structured Merge Trees”. LSM describes a surprisingly counterintuitive approach to storing and accessing data in a sequential fashion. It came to prominence in Google's Bigtable paper, and today logs, LSM and append-only data structures drive many of the world's most influential storage systems: Cassandra, HBase, RocksDB, Kafka and more. Finally, we'll look at how the beauty of sequential access goes beyond database internals, right through to how applications communicate, share data and scale.
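As a small illustration of an append-only data structure, the sketch below stores every write by appending a length-prefixed record to a byte log and keeps an in-memory index from each key to the offset of its latest record. This is a hash-indexed log, similar in spirit to Bitcask, rather than the LSM design itself; the class name, the record layout and the `bytearray` standing in for a file are all assumptions made for the example.

```python
import struct


class AppendOnlyStore:
    """Toy key-value store: writes append to a byte log; an in-memory
    index remembers the offset of each key's most recent record."""

    HEADER = struct.Struct(">II")  # assumed layout: key length, value length

    def __init__(self):
        self.log = bytearray()   # stands in for an append-only file
        self.index = {}          # key -> offset of its latest record

    def put(self, key: str, value: str) -> int:
        k, v = key.encode(), value.encode()
        offset = len(self.log)
        # Old records are never overwritten; a new version is simply appended.
        self.log += self.HEADER.pack(len(k), len(v)) + k + v
        self.index[key] = offset
        return offset

    def get(self, key: str):
        offset = self.index.get(key)
        if offset is None:
            return None
        klen, vlen = self.HEADER.unpack_from(self.log, offset)
        start = offset + self.HEADER.size + klen
        return self.log[start:start + vlen].decode()

    def replay(self):
        """Scan the whole log sequentially, yielding (offset, key, value)."""
        offset = 0
        while offset < len(self.log):
            klen, vlen = self.HEADER.unpack_from(self.log, offset)
            body = offset + self.HEADER.size
            key = self.log[body:body + klen].decode()
            value = self.log[body + klen:body + klen + vlen].decode()
            yield offset, key, value
            offset = body + klen + vlen
```

Because every record is kept, `replay()` can rebuild the index, or any other view of the data, from a single sequential scan.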
Interview
Sure. Sequential patterns are very much the grain of a software application. Whether you're pulling data from disk, SSDs, RAM or over a network, if you can arrange for sequential access, performance will always improve. Kafka is a great example of this because it forces you to program against a stream of sequential events.
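The style of programming described here can be sketched as a log that only supports appends and contiguous reads, with each consumer tracking its own offset. This is not the Kafka client API; `Log`, `Consumer`, `poll` and `max_records` are hypothetical names chosen for the illustration, and a real partition lives on disk and is shared over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """In-memory stand-in for a partition: records are only ever appended."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1   # offset of the new record

    def read(self, offset: int, max_records: int):
        # A sequential read: one contiguous slice starting at `offset`.
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer owns its position; the log keeps no per-reader state."""
    log: Log
    offset: int = 0

    def poll(self, max_records: int = 10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch
```

Two consumers can read the same events at different speeds without interfering, because each one simply remembers how far along the log it has got.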
The core takeaway is this interesting LSM algorithm. They don't teach LSM in most CS courses, despite its recent popularity. Whilst we won't be getting into the maths, hopefully everyone will understand how this algorithm works by the end of the talk, and hence how a large number of contemporary databases work under the covers. This has a few uses. It helps explain the different performance profiles we see in different storage engines. It also makes it easier to understand how to tune such systems.
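Much of the performance profile and tuning of an LSM engine comes down to compaction: merging sorted runs so that reads have fewer runs to probe (lower read amplification), at the cost of rewriting data that was already written (higher write amplification). The sketch below shows a full merge of sorted runs using a single sequential pass with `heapq.merge`. Using `None` as the tombstone and listing runs oldest first are assumptions of this example, not conventions of any particular engine.

```python
import heapq
from itertools import groupby

TOMBSTONE = None  # assumption: None marks a deleted key in this sketch


def compact(runs):
    """Merge sorted runs (oldest first) into a single sorted run.

    Each run is a list of (key, value) pairs sorted by key. For duplicate
    keys the newest run wins; keys whose newest value is a tombstone are
    dropped entirely.
    """
    # Tag entries with -age so that, within a key, newer runs sort first.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    merged = heapq.merge(*tagged)  # one sequential pass over every run
    out = []
    for key, group in groupby(merged, key=lambda entry: entry[0]):
        _, _, newest_value = next(group)  # first entry is the newest version
        if newest_value is not TOMBSTONE:
            out.append((key, newest_value))
    return out
```

Dropping tombstones is only safe here because every run takes part in the merge; a partial compaction has to keep them, or older versions of a deleted key in untouched runs would become visible again.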
Beyond LSM, we'll touch on how being "log-centric" helps with a number of distributed-systems problems as we move to an increasingly decentralised, coordination-free world.
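One reason a log helps in distributed systems is that replicas which apply the same ordered log with the same deterministic logic end up in the same state, without coordinating with each other, even if they read at different times. The sketch below assumes a hypothetical event shape of `(operation, account, amount)` and a hypothetical `Replica` class; it illustrates the replay idea only and leaves out how the shared log itself is ordered and replicated.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, int]  # hypothetical shape: (operation, account, amount)


def apply(state: Dict[str, int], event: Event) -> None:
    """Deterministically fold one event into a replica's state."""
    op, account, amount = event
    if op == "deposit":
        state[account] = state.get(account, 0) + amount
    elif op == "withdraw":
        state[account] = state.get(account, 0) - amount
    else:
        raise ValueError(f"unknown operation {op!r}")


class Replica:
    """Builds its state purely by replaying the shared log from its own offset."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0

    def catch_up(self, log: List[Event]) -> None:
        for event in log[self.offset:]:
            apply(self.state, event)
        self.offset = len(log)
```

A replica that falls behind simply resumes from its last offset; once every replica has consumed the same prefix of the log, their states are identical.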
Tracks
- Architecting for Failure
  Building fault-tolerant systems that are truly resilient
- Architectures You've Always Wondered about
  QCon classic track. You know the names. Hear their lessons and challenges.
- Modern Distributed Architectures
  Migrating, deploying, and realizing modern cloud architecture.
- Fast & Furious: Ad Serving, Finance, & Performance
  Learn some of the tips and techniques of high-speed, low-latency systems in ad serving and finance
- Java - Performance, Patterns and Predictions
  Skills embracing the evolution of Java (multi-core, cloud, modularity) and reinforcing core platform fundamentals (performance, concurrency, ubiquity).
- Performance Mythbusting
  Performance myths that need busting and the tools & techniques to get there
- Dark Code: The Legacy/Tech Debt Dilemma
  How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
- Modern Learning Systems
  Real-world use of the latest machine learning technologies in production environments
- Practical Cryptography & Blockchains: Beyond the Hype
  Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain
- Applied JavaScript - Atomic Applications and APIs
  Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
- Containers - State Of The Art
  What is the state of the art, what's next, & other interesting questions on containers.
- Observability Done Right: Automating Insight & Software Telemetry
  Tools, practices, and methods to know what your system is doing
- Data Engineering: Where the Rubber meets the Road in Data Science
  Science does not imply engineering. Engineering tools and techniques for Data Scientists
- Modern CS in the Real World
  Applied, practical, & real-world dive into industry adoption of modern CS ideas
- Workhorse Languages, Not Called Java
  Workhorse languages not called Java.
- Security: Lessons Learned From Being Pwned
  How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
- Engineering Culture @{{cool_company}}
  Culture, Organization Structure, Modern Agile War Stories
- Softskills: Essential Skills for Developers
  Skills for the developer in the workplace