Presentation: Scaling Uber's Elasticsearch Clusters

Track: Distributed Stateful Systems

Location: Churchill, G flr.

Duration: 11:50am - 12:40pm

Day of week: Wednesday

Level: Intermediate - Advanced

Share this on:


Uber's Marketplace is the algorithmic brain behind Uber's ride-sharing services, and the brain needs immense amount of real-time data to make timely and sound decisions. Uber's Marketplace Intelligence team has been using Elasticsearch as a real-time OLAP database to serve thousands of internal users and dozens of services for a wide range of workload. The system is currently storing more than 800 billion documents, scanning billions of documents for thousands of queries every second, while sustaining more than 1.5 million document writes per second in the same time.
This talk will discuss in depth how Uber scaled its Elasticsearch clusters as well as its ingestion pipelines for ingestions, queries, data storage, and operations by mere three-person team, who also manage over 100 ingestion jobs. The talk will cover topics like federation, query optimization, caching, failure recovery, data fidelity, transition from Lambda architecture to Kappa architecture, and improvements on Elasticsearch internals.


How you you describe the persona and level of the target audience?


The target audience are software engineers or SREs who are interested in scaling out Elasticsearch for OLAP workload. The audience should have basic understanding of Elasticsearch and OLAP.


What do you want “that” persona to walk away from your talk knowing that they might not have known 50 minutes before?


The audience will know how to scale out Elsaticsearch as an efficient real-time OLAP system in three dimensions: data ingestion, query, and operations.


What trend in the next 12 months would you recommend an early adopter/early majority SWE to pay particular attention to?


It is quite hard to optimize a stateful distributed system. There are just too many configurations to tweak and too many knobs to turn. A promising solution is to automatically tune system configurations through machine learning, such as the OtterTune system by Aken et al. Expect that similar ideas are applied to other stateful distributed systems.

Speaker: Danny Yuan

Real-time Streaming Lead @Uber

Danny Yuan is a software engineer in Uber. He’s currently working on streaming systems for Uber’s marketplace platform. Prior to joining Uber, he worked on building Netflix’s cloud platform. His work includes predictive autoscaling, distributed tracing service, real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix’s low-latency crypto services.

Find Danny Yuan at

Similar Talks

Executive vice president @ScaleFlux
Head of Technology - Products @ThoughtWorks
Developer Advocate @JFrog
Principal Solutions Architect @SAP
Cloud Technology Consultant with an expertise in Serverless Computing