Presentation: Lessons From a ~Yearly Re-Write of a Data Pipeline
Share this on:
Abstract
Every year, we’ve set ourselves a goal of dramatically improving the performance and efficiency of our core data pipelines. We’ve done this by re-writing, effectively from scratch, the streaming pipelines that are responsible for processing over 120,000 events per second to deliver realtime personalisation to millions of web and mobile clients.
From our initial custom ETL system to the latest generation powered by Apache Beam, we’ve learnt to both respect and ignore the common wisdom of not re-writing software that works.
In this talk, we walk through multiple generations of our multi-tenant and high performance streaming data pipelines. We’ll compare the different approaches and frameworks, and highlight the lessons we’ve learnt from building perform data pipelines dealing with messy real-world data collection and aggregation.
Similar Talks
Tracks
Monday, 5 March
-
Leading Edge Backend Languages
Code the future! How cutting-edge programming languages and their more-established forerunners can help solve today and tomorrow’s server-side technical problems.
-
Security: Red XOR Blue Team
Security from the defender's AND the attacker's point of view
-
Microservices/ Serverless: Patterns and Practices
Stories of success and failure building modern service and function-based applications, including event sourcing, reactive, decomposition, & more.
-
Stream Processing in the Modern Age
Compelling applications of stream processing & recent advances in the field
-
DevEx: The Next Evolution of DevOps
Removing friction from the developer experience.
-
Modern CS in the Real World
Applied trends in Computer Science that are likely to affect Software Engineers today.
-
Speaker AMAs (Ask Me Anything)
Tuesday, 6 March
-
Next Gen Banking: It’s not all Blockchains and ICOs
Great technologies like Blockchain, smartphones and biometrics must not be limited to just faster banking, but better banking.
-
Observability: Logging, Alerting and Tracing
Observability in modern large distributed computer systems
-
Building Great Engineering Cultures & Organizations
Stories of cultural change in organizations
-
Architectures You've Always Wondered About
Topics like next-gen architecture mixed with applied use cases found in today's large-scale systems, self-driving cars, network routing, scale, robotics, cloud deployments, and more.
-
The Practice & Frontiers of AI
Learn about machine learning in practice and on the horizon
-
JavaScript and Beyond: The Future of the Frontend
Exploring the great frontend frameworks that make JavaScript so popular and theg JavaScript-based languages revolutionising frontend development.
-
Speaker AMAs (Ask Me Anything)
Wednesday, 7 March
-
Distributed Stateful Systems
Architecting and leveraging NoSQL revisitied
-
Operating Systems: LinuxKit, Unikernels, & Beyond
Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualisation, including Linux on Windows, LinuxKit, and Unikernels
-
Architecting for Failure
If you're not architecting for failure you're heading for failure
-
Evolving Java and the JVM: Mobile, Micro and Modular
Although the Java language is holding strong as a developer favourite, new languages and paradigms are being embraced on JVM.
-
Tech Ethics in Action
Learning from the experiences of real-world companies driving technology decisions from ethics as much as technology.
-
Bare Knuckle Performance
Killing latency and getting the most out of your hardware
-
Speaker AMAs (Ask Me Anything)