Presentation: Lessons From a ~Yearly Re-Write of a Data Pipeline
Share this on:
Abstract
Every year, we’ve set ourselves a goal of dramatically improving the performance and efficiency of our core data pipelines. We’ve done this by re-writing, effectively from scratch, the streaming pipelines that are responsible for processing over 120,000 events per second to deliver realtime personalisation to millions of web and mobile clients.
From our initial custom ETL system to the latest generation powered by Apache Beam, we’ve learnt to both respect and ignore the common wisdom of not re-writing software that works.
In this talk, we walk through multiple generations of our multi-tenant and high performance streaming data pipelines. We’ll compare the different approaches and frameworks, and highlight the lessons we’ve learnt from building perform data pipelines dealing with messy real-world data collection and aggregation.
Similar Talks
Tracks
-
Microservices/ Serverless: Patterns and Practices
Stories of success and failure building modern service and function-based applications, including event sourcing, reactive, decomposition, & more.
-
Distributed Stateful Systems
Architecting and leveraging NoSQL revisitied
-
Evolving Java and the JVM: Mobile, Micro and Modular
Although the Java language is holding strong as a developer favourite, new languages and paradigms are being embraced on JVM.
-
The Practice & Frontiers of AI
Learn about machine learning in practice and on the horizon
-
Operating Systems: LinuxKit, Unikernels, & Beyond
Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualisation, including Linux on Windows, LinuxKit, and Unikernels
-
Stream Processing in the Modern Age
Compelling applications of stream processing & recent advances in the field
-
Leading Edge Backend Languages
Code the future! How cutting-edge programming languages and their more-established forerunners can help solve today and tomorrow’s server-side technical problems.
-
Modern CS in the Real World
Applied trends in Computer Science that are likely to affect Software Engineers today.
-
DevEx: The Next Evolution of DevOps
Removing friction from the developer experience.
-
Bare Knuckle Performance
Killing latency and getting the most out of your hardware
-
Tech Ethics in Action
Learning from the experiences of real-world companies driving technology decisions from ethics as much as technology.
-
Security: Red XOR Blue Team
Security from the defender's AND the attacker's point of view
-
Architecting for Failure
If you're not architecting for failure you're heading for failure
-
Architectures You've Always Wondered About
Topics like next-gen architecture mixed with applied use cases found in today's large-scale systems, self-driving cars, network routing, scale, robotics, cloud deployments, and more.
-
Observability: Logging, Alerting and Tracing
Observability in modern large distributed computer systems
-
Speaker AMAs (Ask Me Anything)
-
Building Great Engineering Cultures & Organizations
Stories of cultural change in organizations
-
Speaker AMAs (Ask Me Anything)