Lessons From a ~Yearly Re-Write of a Data Pipeline | Software Development Conference QCon London 2018

Abstract

Every year, we’ve set ourselves a goal of dramatically improving the performance and efficiency of our core data pipelines. We’ve done this by re-writing, effectively from scratch, the streaming pipelines that are responsible for processing over 120,000 events per second to deliver realtime personalisation to millions of web and mobile clients.
From our initial custom ETL system to the latest generation powered by Apache Beam, we’ve learnt to both respect and ignore the common wisdom of not re-writing software that works.
In this talk, we walk through multiple generations of our multi-tenant, high-performance streaming data pipelines. We’ll compare the different approaches and frameworks, and highlight the lessons we’ve learnt from building performant data pipelines that deal with messy real-world data collection and aggregation.

Speaker: Jibran Saithi

Lead Architect @Qubit

Jibran is Lead Architect at Qubit. He has an unhealthy interest in plumbing data pipelines.

Find Jibran Saithi at

Speaker page

@jibz

Similar Talks

High Performance Java AMA w/ Gil Tene

CTO and co-founder @AzulSystems

Gil Tene

Performance Management in the Wild

Infrastructure Engineer @CapitalOne

Ivan Merrill

Tasty Topics

Mic Hussey

Serverless Spring

Senior Consulting Engineer @Pivotal

Dave Syer

JDK 9: Mission Accomplished. What Next for Java?

Deputy CTO @Azul

Simon Ritter

Understanding Geospatial Processing

Principal Solutions Architect @SAP

Vitaliy Rudnytskiy

Responsibly Smashing Pandora’s Box

Software Development Manager @metaswitch

Yanqing Cheng

XDP in Practice: DDoS Mitigation @Cloudflare

System Engineer @Cloudflare London

Gilberto Bertin

Serverless and Java in the Real World

Cloud Technology Consultant with an expertise in Serverless Computing

John Chapin

Tracks

Microservices/Serverless: Patterns and Practices

Stories of success and failure building modern service and function-based applications, including event sourcing, reactive, decomposition, & more.
Distributed Stateful Systems

Architecting and leveraging NoSQL revisited
Evolving Java and the JVM: Mobile, Micro and Modular

Although the Java language is holding strong as a developer favourite, new languages and paradigms are being embraced on the JVM.
The Practice & Frontiers of AI

Learn about machine learning in practice and on the horizon
Operating Systems: LinuxKit, Unikernels, & Beyond

Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualisation, including Linux on Windows, LinuxKit, and Unikernels
Stream Processing in the Modern Age

Compelling applications of stream processing & recent advances in the field

Leading Edge Backend Languages

Code the future! How cutting-edge programming languages and their more-established forerunners can help solve today’s and tomorrow’s server-side technical problems.
Modern CS in the Real World

Applied trends in Computer Science that are likely to affect Software Engineers today.
DevEx: The Next Evolution of DevOps

Removing friction from the developer experience.
Bare Knuckle Performance

Killing latency and getting the most out of your hardware
Tech Ethics in Action

Learning from the experiences of real-world companies driving technology decisions from ethics as much as technology.
Security: Red XOR Blue Team

Security from the defender's AND the attacker's point of view

Architecting for Failure

If you're not architecting for failure you're heading for failure
Architectures You've Always Wondered About

Topics like next-gen architecture mixed with applied use cases found in today's large-scale systems, self-driving cars, network routing, scale, robotics, cloud deployments, and more.
Observability: Logging, Alerting and Tracing

Observability in modern large distributed computer systems
Speaker AMAs (Ask Me Anything)
Building Great Engineering Cultures & Organizations

Stories of cultural change in organizations

This Year's Schedule

Track: Stream Processing in the Modern Age

Location: Whittle, 3rd flr.

Duration: 11:50am - 12:40pm

Day of week: Monday

Level: Intermediate

Learn trends from innovator and early adopter companies that you can bring home to your team
