Track:

Location:

Churchill, G flr.

Duration

Duration:

5:25pm - 6:15pm

Day of week:

Monday

Level:

Advanced

Persona:

Developer

Key Takeaways

Realize the importance of tail latency, particularly at scale.
Understand design decisions and architectural patterns used in the implementation of Twitter’s replacement for Redis and Memecachd
Identify operations that "may" introduce indeterministic delays, and understand how they present at tail.

Abstract

In-memory caching in a distributed setting is widely used these days, and users usually expect it to be "fast and cheap". One important but often overlooked aspect of cache is tail latency– for many real-time applications, tail latency plays an outsized role in determining the overall performance and reliability.

Controlling tail latency, however, is tricky business, and much harder to achieve than good average latency. It requires understanding the details and subtleties in low-level system behavior, including memory management, threading model, locking, network I/O, etc.

Pelikan is a framework to implement distributed caches such as Memcached and Redis. This talk discusses the system aspects that are important to the performance, especially tail latency, of such services. It also covers the decisions we made in Pelikan, and how they help us reason about performance even before we run benchmarks.

Interview

Question:

What is Pelikan?

Answer:

Pelikan is the in memory caching framework that I have been working on to replace the caching use cases at Twitter.

People know that Twitter is, historically, a very large Memcached shop. It is also a very large Redis shop, and we have done our own forks based on those both Memcached and Redis. As we accumulate more experience with these existing caching services, we have a good idea of how to do it well. We realized that instead of having two solutions working toward the same purpose, we can unify them in a single architecture that is optimized for the use cases Twitter needs.

In short, Pelikan is the replacement for Memcached and Redis and everything they are responsible for at Twitter.

Question:

So is it kind of a ground up replacement for cache tools like Memcached and Redis?

Answer:

I would describe it as two fold. From an architectural (or design) point of view, it is a clean slate design. We come at it from first principles asking questions like ‘What is the problem we are trying to solve using caching in memory storage?’ and ‘What are the requirements?’ If we pose this problem as a new question today, how are we going to design an architecture that works for the scale and works for the datacenter that we have or that is available on the market? So from a design point of view, it is a clean slate.

However, new code is usually bad code (or if not bad, it’s ugly code). There is significant risk in using new coding in a critical piece of the architecture. In terms of implementation, we are not so strict. We actually prefer using existing code if it fits into the design well. So let’s say you have an existing networking library that works really well, there is no need to write your own library to handle connections. You can just copy the code that handles connections and drop that into your design. You can do that, because it serves exactly the same purpose. With implementation, we are much more practical and utilitarian in examining solutions that we have, including Memcached and Twemcached (which is our fork Redis). We will take a look at whoever has a good library out there that does something similar. gRPC is another one. There are actually a lot of code that does more or less the same thing, so we would just look at all the codebases we can find. If there is something that we can use, we will use it, and we will give credit to open source project.

In reality, what happened is we started by copying code and writing new code about one to one. We were re-using about 50% of our code from existing open source projects and I lost track of the ratio over time because then we started re-factoring and polishing the code. But we started with 50% re-use. The new code ratio has gone up slightly since then, because, as we add more features and as we re-write, we tend to bring more code of our own into the project.

Question:

How would you describe target audience core persona?

Answer:

The talk is best suited for people who are designing and operating large-scale distributed systems, or have some experience developing and maintaining high-performance services.

Question:

What’s the goal of your talk?

Answer:

My goal is to make a concrete case for solid performance engineering. I want to use my extensive experience with cache to bring attention to performance "corner cases" that are less visible in smaller operations, or problems that take a long time to surface. Our solution to these issues is summarized in the OSS project, Pelikan cache. I believe the best practices that we came up to solve these challenges have broader appeal outside of the immediate scope of cache, and hopefully can help other builders avoid performance headache down the line.

Speaker: Yao Yue

Distributed Systems Engineer Working on Cache @Twitter

Distributed Systems Engineer Working on Cache at Twitter

Find Yao Yue at

Speaker page

@thinkingfish

Core Kafka team @Confluent

Ben Stopford

Coding for High Frequency Trading

VP of High Frequency Engineering @Barclays

Richard Croucher

Causal Consistency For Large Neo4j Clusters

Chief Scientist @Neo4j

Jim Webber

Reliable & Scalable Data Infra Eco-System At Uber

Staff Engineer @Uber

Sudhir Mallem

Effective Data Pipelines: Data Mngmt from Chaos

Python engineer, Founder @kjamistan

Katharine Jarmul

Deliver Docker Containers Continuously on AWS

Lead Software Developer @AutoScout24

Philipp Garbe

Creating Space To Be Awesome

CTO who understands the science around helping people do their best

Meri Williams

Thinking Strategically About IoT

Senior Software Engineer @IBM, Committer on Apache Aries

Holly Cummins

Observability, Event Sourcing and State Machines

Gold Badges Java, JVM, Memory, & Performance @StackOverflow / Lead developer of the OpenHFT project

Peter Lawrey

Tracks

Architecting for Failure

Building fault tolerate systems that are truly resilient
Architectures You've Always Wondered about

QCon classic track. You know the names. Hear their lessons and challenges.
Modern Distributed Architectures

Migrating, deploying, and realizing modern cloud architecture.
Fast & Furious: Ad Serving, Finance, & Performance

Learn some of the tips and technicals of high speed, low latency systems in Ad Serving and Finance
Java - Performance, Patterns and Predictions

Skills embracing the evolution of Java (multi-core, cloud, modularity) and reenforcing core platform fundamentals (performance, concurrency, ubiquity).
Performance Mythbusting

Performance myths that need busting and the tools & techniques to get there

Dark Code: The Legacy/Tech Debt Dilemma

How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
Modern Learning Systems

Real world use of the latest machine learning technologies in production environments
Practical Cryptography & Blockchains: Beyond the Hype

Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain
Applied JavaScript - Atomic Applications and APIs

Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
Containers - State Of The Art

What is the state of the art, what's next, & other interesting questions on containers.
Observability Done Right: Automating Insight & Software Telemetry

Tools, practices, and methods to know what your system is doing

Data Engineering : Where the Rubber meets the Road in Data Science

Science does not imply engineering. Engineering tools and techniques for Data Scientists
Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas
Workhorse Languages, Not Called Java

Workhorse languages not called Java.
Security: Lessons Learned From Being Pwned

How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
Engineering Culture @{{cool_company}}

Culture, Organization Structure, Modern Agile War Stories
Softskills: Essential Skills for Developers

Skills for the developer in the workplace

LAST YEAR'S SCHEDULE

Location:

Duration

Day of week:

Level:

Persona:

Key Takeaways

Abstract

Interview

Find Yao Yue at

Similar Talks

Tracks

Conference for Professional Software Developers

Follow QCon

Contact

Menu

QCons around the World

Presentation: In-Memory Caching: Curb Tail Latency with Pelikan

Location:

Duration

Day of week:

Level:

Persona:

More talks on:

Key Takeaways

Abstract

Interview

Find Yao Yue at

Similar Talks

Tracks

Conference for Professional Software Developers

Follow QCon

Contact

Menu

QCons around the World