Presentation: Lessons From a ~Yearly Re-Write of a Data Pipeline

Track: Stream Processing in the Modern Age

Location: Whittle, 3rd flr.

Duration: 11:50am - 12:40pm

Day of week: Monday

Level: Intermediate

Abstract

Every year, we’ve set ourselves a goal of dramatically improving the performance and efficiency of our core data pipelines. We’ve done this by re-writing, effectively from scratch, the streaming pipelines that are responsible for processing over 120,000 events per second to deliver real-time personalisation to millions of web and mobile clients.
From our initial custom ETL system to the latest generation powered by Apache Beam, we’ve learnt to both respect and ignore the common wisdom of not re-writing software that works.
In this talk, we walk through multiple generations of our multi-tenant, high-performance streaming data pipelines. We’ll compare the different approaches and frameworks, and highlight the lessons we’ve learnt from building performant data pipelines that deal with messy real-world data collection and aggregation.

Speaker: Jibran Saithi

Lead Architect @Qubit

Jibran is Lead Architect at Qubit. He has an unhealthy interest in plumbing data pipelines.
