Building High-Fidelity Data Streams

Low-latency data streaming remains a hot topic among data engineers. At its core, it promises to deliver data in near real time to power snappy, data-driven user experiences. These experiences take many forms, including low-latency updates to social news feeds, near-real-time payment fraud prevention, time-relevant recommender systems used in flash sales, self-driving car route planning, and more. Our need to stay engaged has made low-latency data streams a critical part of modern data architectures.

While it may seem trivial to get a data streaming POC up and running, productionizing such a system under strict SLAs with a lean engineering team requires not only making the right choices but also learning from mistakes along the way. At Datazoom, we built a lossless streaming data system that guarantees sub-second (p95) event delivery at scale with better than three nines availability, where availability is measured as the on-time delivery of events. Come to this talk to learn how you can build such a system soup-to-nuts.
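As a rough illustration of how the two SLA figures above could be computed from per-event delivery latencies, here is a minimal Python sketch. It is not Datazoom's implementation: the function names, the 1-second deadline, the nearest-rank percentile method, and the convention of recording a lost event as None are all assumptions made for the example.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of delivered-event latencies (assumed method)."""
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n) to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def availability(latencies_ms, deadline_ms=1000):
    """Fraction of all events delivered on time.

    A lost event is recorded as None (hypothetical convention) and counts
    against availability, as does any event delivered after the deadline.
    """
    on_time = sum(
        1 for latency in latencies_ms
        if latency is not None and latency <= deadline_ms
    )
    return on_time / len(latencies_ms)


# Example: 10,000 events, of which 3 were lost and 2 arrived late.
events = [120] * 9995 + [None] * 3 + [1500] * 2
delivered = [latency for latency in events if latency is not None]

meets_latency_sla = p95(delivered) < 1000           # sub-second at p95
meets_availability_sla = availability(events) > 0.999  # better than three nines
```

Measured this way, a late event hurts availability just as much as a lost one, which is what tying availability to on-time delivery implies.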

What is the focus of your work these days?

I currently serve as the Chief Architect and Head of Engineering at Datazoom, a company that offers a video data platform that captures video playback telemetry data. This data can be used to understand how customers experience and interact with video. At Datazoom, we build both client SDKs and a cloud-based analytics platform.

What’s the motivation for your talk?

In this talk, I explain how engineers can build a low-latency, high-fidelity data streaming system using open source software and public cloud technologies combined with recommended best practices. My talk focuses on the non-functional requirements (i.e., the -ilities) of such a system, including but not limited to scalability, performance, reliability, observability, and availability.

How would you describe the persona and level of the target audience?

This talk will take a ground-up approach to building such a system. It requires little background knowledge beyond basic familiarity with various AWS technologies and Apache Kafka. The ideal audience is engineers, from beginner to intermediate level, who are interested in building a high-fidelity streaming system.

What do you want this persona to walk away with from your presentation?

This talk will serve as an architect’s guide to building a high-fidelity streaming system. While it may leave out specific details for lack of time, it will provide enough information to get an architect 80% of the way to building a similar system.

What do you think is the next big disruption in software?

AI-managed data infra – it is sorely needed in order to reduce the onerous task of operating data infrastructure at scale.


Speaker

Sid Anand

Chief Architect and Head of Engineering @Datazoom

Sid Anand currently serves as the Chief Architect and Head of Engineering for Datazoom, where he and his team build autonomous streaming data systems for Datazoom's high-fidelity, low latency streaming analytics needs. Prior to joining Datazoom, Sid served as PayPal's Chief Data Engineer, focusing on ways to realize the value of PayPal's hundreds of petabytes of data. Prior to joining PayPal, he held several positions including Agari's Data Architect, a Technical Lead in Search @ LinkedIn, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. Outside of work, Sid is a maintainer/committer on Apache Airflow and advises early-stage companies and several conferences (QCon, Data Council, and conferences under Skills Matter).

Date

Monday Mar 27 / 01:40PM BST (50 minutes)

Location

Whittle (3rd Fl.)

From the same track

Session Microservices

Banking on Thousands of Microservices

Monday Mar 27 / 05:25PM BST

Monzo has built an entire banking platform from scratch composed of many microservices; it serves over 7 million customers daily with an organisationally lean engineering team. All aspects of the bank are deployed hundreds of times a day (even on Fridays!).

Suhail Patel

Staff Engineer @Monzo

Session Scalability

Scaling Google's Global Cloud L7 Load Balancer

Monday Mar 27 / 10:35AM BST

We'll take a look at Google's Global Cloud L7 Balancer, how it's put together and how we've scaled it to meet the reliability and performance demands of our Cloud customers.

James Spooner

Principal Engineer, Load Balancing @Google

Session Scalability

Zoom: Why Does It Work?

Monday Mar 27 / 04:10PM BST

During the pandemic, Zoom had to scale massively to support the big move from working in the office every day to meeting online for both business and private use. How did Zoom manage this scaling dilemma? And when you join a Zoom call, how does that actually work?

Ian Sleebe

Senior Solutions Architect @Zoom

Session Microservices

Tales of Kafka @Cloudflare: Lessons Learnt on the Way to 1 Trillion Messages

Monday Mar 27 / 02:55PM BST

Cloudflare uses Kafka to decouple microservices and communicate the creation, change or deletion of various resources via a common data format in a fault-tolerant manner.

Andrea Medda

Senior Systems Engineer @Cloudflare

Matt Boyle

Engineering Manager @Cloudflare

Session

Unconference: Architectures You've Always Wondered About

Monday Mar 27 / 11:50AM BST

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Shane Hastie

Global Delivery Lead @SoftEd, Lead Editor for Culture & Methods @InfoQ