Track:

Location:

Fleming, 3rd flr.

Duration:

4:10pm - 5:00pm

Day of week:

Wednesday

Level:

Intermediate

Persona:

Developer

Key Takeaways

Learn that it's important to optimize for acceptable and predictable worst-case behavior in complex systems
Understand that being able to rely on a system automatically "just doing the right thing" in all cases is a qualitative change that often enables entirely new technology
Explore how applying techniques such as auto-scaling and dynamic work rebalancing limits the negative impact of missing or inaccurate information

Abstract

One of the main causes of performance problems in distributed data processing systems (from the original MapReduce to modern Spark and Flink) is "stragglers." Stragglers are parts of the input that take an unexpectedly long time to process, delaying the completion of the whole job, and wasting resources that stay idle. Stragglers can happen due to imbalance of data distribution or processing complexity, hardware/networking anomalies, and a variety of other factors.

Google Cloud Dataflow is the first system to address the problem of stragglers in a fully general way. It dynamically redistributes parts of already launched work from straggler workers onto idle workers to maximize utilization, while preserving data consistency and minimizing re-execution.

This talk describes the theory and practice behind Cloud Dataflow's approach to straggler elimination, as well as the associated non-obvious challenges, benefits, and implications of the technique.
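
To make the idea of redistributing already launched work concrete, here is a minimal single-process sketch. It is not Cloud Dataflow's implementation: the `RangeTask` class, its `try_claim` and `try_split` methods, and the split fraction are all hypothetical names chosen for illustration, and a real system would also need synchronization between the worker and the service that requests the split. The sketch only shows the core invariant: a running task gives away the tail of its unprocessed range, so no position is processed twice and none is lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeTask:
    """Hypothetical work item covering integer positions [start, end)."""
    start: int
    end: int
    claimed: int = 0  # number of positions this worker has already claimed

    def try_claim(self) -> Optional[int]:
        """Return the next position to process, or None if the range is done."""
        pos = self.start + self.claimed
        if pos >= self.end:
            return None
        self.claimed += 1
        return pos

    def try_split(self, fraction_of_remainder: float) -> Optional["RangeTask"]:
        """Keep roughly this fraction of the unclaimed remainder; hand off the rest.

        Returns the residual task for an idle worker, or None if nothing
        can be handed off.
        """
        first_unclaimed = self.start + self.claimed
        remaining = self.end - first_unclaimed
        split_at = first_unclaimed + int(remaining * fraction_of_remainder)
        if split_at >= self.end:
            return None
        residual = RangeTask(split_at, self.end)
        self.end = split_at  # the primary task shrinks; claimed work is untouched
        return residual


# A straggler has processed 10 of 100 positions when a split is requested.
task = RangeTask(0, 100)
done = [task.try_claim() for _ in range(10)]
residual = task.try_split(0.5)
print(task.end, residual.start, residual.end)
```

Because the split point always lies at or after the first unclaimed position, work that has already been done is never re-executed, which is the property the abstract describes.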

Interview

Question:

What is the focus of your work today?

Answer:

Right now I'm working on introducing a new primitive into the Apache Beam programming model, https://s.apache.org/splittable-do-fn. It is only marginally related to the topic of the current talk, but it enables a large number of new use cases and greatly simplifies some existing ones, in particular, making it much easier to integrate Apache Beam with new data sources.

I am also staying involved in work related to auto-scaling and dynamic work rebalancing. In particular, we are planning to explore applying these techniques in the face of noisy information about the progress of shards, limiting the negative impact of missing or even completely inaccurate information.

Question:

What’s the motivation for your talk?

Answer:

It is twofold.

First, I want to advocate for the general practical software design principles on which our solution rests - for example, the importance of looking for a perfect, "holy grail" type solution, rather than settling for solutions addressing individual common cases; the importance of optimizing the worst-case behavior; the challenges of optimizing dynamic system behavior in an unpredictable environment.

Second, I would like to emphasize how important the problem of stragglers is and how enormous the impact of having a good solution for it can be. I really would like to see authors of other data processing systems realize this and address the problem in their systems as well.

Question:

How would you describe the persona of the target audience of this talk?

Answer:

It describes a high-level architecture and its underlying principles, so it's targeted at tech leads who may be able to apply similar principles to their architectures. The talk is not specific to any particular language.

Question:

Can you give me a glimpse of what you’ll dive into with your talk?

Answer:

To get an idea of what I plan to discuss, a good place to start is a blog post we wrote called No Shard Left Behind. The basic idea sounds really simple in principle: we repartition data as it's processed (Dynamic Work Rebalancing). It's really that simple. This talk dives into why it's also challenging and worth the effort. The talk also explores other approaches that just don't work in all use cases.

Question:

How would you rate the level of this talk?

Answer:

Intermediate - anyone marginally familiar with parallel or distributed processing should be able to enjoy the talk.

Question:

QCon targets advanced architects and senior development leads. What do you feel will be the actionable takeaway that type of persona will walk away from your talk with?

Answer:

I hope they will walk away with an inspiration to keep looking for "holy grail"-type solutions to architectural problems they face, and an appreciation for systems that implement such solutions, such as Cloud Dataflow.

Question:

What do you feel is the most disruptive tech in IT right now?

Answer:

I have to admit I'm in a bit of a bubble, having been so excited about my project for the past few years that I haven't looked around very much. But from what I see, two directions look particularly promising:

- The "cloudification" of everything - the move from libraries and on-premises hardware to high-level, zero-configuration cloud services (not just allocation of resources), serverless apps etc., hiding all the operational and scaling complexity from the developer. I'm hopeful that the entire concepts of "deployment" and "configuration" will become a thing of the past.
- Tying into the previous point, the recent strides made in deep learning. I'm especially excited about how cloud services offering very high-quality ML will greatly lower the bar to entry into the market for smart apps integrated with the physical world: using sensor data, images, video, sound, etc.

Speaker: Eugene Kirpichov

Cloud Dataflow Sr SE @Google

Eugene is a Senior Software Engineer on the Cloud Dataflow team at Google, working primarily on the autoscaling and work rebalancing secret sauce as well as the Apache Beam programming model. He is also very interested in functional programming languages, data visualization (especially visualization of behavior of parallel/distributed systems), and machine learning.

Find Eugene Kirpichov at

Speaker page

@jkff

Senior Software Engineer, distributed data processing at Google

Core Kafka team @Confluent

Ben Stopford

Reliable & Scalable Data Infra Eco-System At Uber

Staff Engineer @Uber

Sudhir Mallem

Effective Data Pipelines: Data Mngmt from Chaos

Python engineer, Founder @kjamistan

Katharine Jarmul

Deliver Docker Containers Continuously on AWS

Lead Software Developer @AutoScout24

Philipp Garbe

Creating Space To Be Awesome

CTO who understands the science around helping people do their best

Meri Williams

Thinking Strategically About IoT

Senior Software Engineer @IBM, Committer on Apache Aries

Holly Cummins

In-Memory Caching: Curb Tail Latency with Pelikan

Distributed Systems Engineer Working on Cache @Twitter

Yao Yue

Observability, Event Sourcing and State Machines

Gold Badges Java, JVM, Memory, & Performance @StackOverflow / Lead developer of the OpenHFT project

Peter Lawrey

Assuring Crypto Code with Automated Reasoning

Research Lead, Software Correctness @Galois

Aaron Tomb

Tracks

Architecting for Failure

Building fault-tolerant systems that are truly resilient
Architectures You've Always Wondered about

QCon classic track. You know the names. Hear their lessons and challenges.
Modern Distributed Architectures

Migrating, deploying, and realizing modern cloud architecture.
Fast & Furious: Ad Serving, Finance, & Performance

Learn some of the tips and technicals of high speed, low latency systems in Ad Serving and Finance
Java - Performance, Patterns and Predictions

Skills embracing the evolution of Java (multi-core, cloud, modularity) and reinforcing core platform fundamentals (performance, concurrency, ubiquity).
Performance Mythbusting

Performance myths that need busting and the tools & techniques to get there

Dark Code: The Legacy/Tech Debt Dilemma

How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
Modern Learning Systems

Real world use of the latest machine learning technologies in production environments
Practical Cryptography & Blockchains: Beyond the Hype

Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain
Applied JavaScript - Atomic Applications and APIs

Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
Containers - State Of The Art

What is the state of the art, what's next, & other interesting questions on containers.
Observability Done Right: Automating Insight & Software Telemetry

Tools, practices, and methods to know what your system is doing

Data Engineering : Where the Rubber meets the Road in Data Science

Science does not imply engineering. Engineering tools and techniques for Data Scientists
Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas
Workhorse Languages, Not Called Java

Workhorse languages not called Java.
Security: Lessons Learned From Being Pwned

How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
Engineering Culture @{{cool_company}}

Culture, Organization Structure, Modern Agile War Stories
Softskills: Essential Skills for Developers

Skills for the developer in the workplace

Conference for Professional Software Developers


Presentation: Straggler Free Data Processing in Cloud Dataflow
