Distributed Stateful Systems

Location: Churchill, G flr.

Day of week: Wednesday

Tackling the difficult problem of distributed state management

Track Host:
Sid Anand
Chief Data Engineer @PayPal

Sid Anand currently serves as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Prior to joining PayPal, he held several positions, including Data Architect at Agari, Technical Lead in Search at LinkedIn, Cloud Data Architect at Netflix, VP of Engineering at Etsy, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent speaker at conferences. When not working, Sid spends time with his wife, Shalini, and their two kids.

10:35am - 11:25am

by Matthew Bates
Co-founder at UK Kubernetes Company Jetstack

by James Munnelly
Solutions Engineer @Jetstack

So you've mastered Kubernetes for scheduling and scaling your stateless applications. Your pager has been quieter and life is good. But what about the carefully configured database clusters running on expensive dedicated infrastructure? (And the expensive sysadmin you're paying to maintain them!)
Kubernetes now provides many of the building blocks needed to help herd database ‘Pets’ alongside the stateless applications in your cluster. In this talk, we'll explain how we use these...

11:50am - 12:40pm

by Danny Yuan
Real-time Streaming Lead @Uber

Uber's Marketplace is the algorithmic brain behind Uber's ride-sharing services, and the brain needs an immense amount of real-time data to make timely and sound decisions. Uber's Marketplace Intelligence team has been using Elasticsearch as a real-time OLAP database to serve thousands of internal users and dozens of services for a wide range of workloads. The system currently stores more than 800 billion documents, scanning billions of documents for thousands of queries every second, while...

1:40pm - 2:30pm

by Allen Wang
Senior Software Engineer - Cloud Platform @Netflix

Kafka, as a distributed stateful service, faces serious stability and scalability challenges in a cloud environment, which favors stateless services. As cluster size grows with traffic, it runs into problems with data balancing, high consumer data fan-out, and a time-consuming process for scaling up or updating. Failover is necessary to deal with cluster disasters but is hard to do right.

At Netflix, we address these issues by having many smaller...

2:55pm - 3:45pm

by Carlos Garcia
Ocado Smart Platform Fraud Team Lead

by Przemyslaw Pastuszka
ML Engineer @Ocado

Ocado Technology is providing a full solution to put the world’s retailers online using the cloud, robotics, AI and IoT. Processing tens of thousands of orders every day, we generate millions of events every minute, leading to a huge amount of data to be managed. We will present how this Big Data is handled in Google Cloud Platform to build an end-to-end machine learning pipeline: how data is stored and processed in BigQuery, post-processed and copied with Dataflow...

4:10pm - 5:00pm

by Sumedh Pathak
VP Engineering & Co-Founder @CitusData

Years ago, while working at Amazon on shopping cart infrastructure and the precursor to DynamoDB, my co-founder and I realized that although distributed key-value stores were useful for a few use cases, we missed many of the benefits of relational databases: transactions, joins, and the power of SQL, the lingua franca of RDBMSs. So we challenged ourselves to modernize the traditional relational database, to take a robust open...