Presentation: Scaling Uber's Elasticsearch Clusters
Abstract
Uber's Marketplace is the algorithmic brain behind Uber's ride-sharing services, and the brain needs an immense amount of real-time data to make timely and sound decisions. Uber's Marketplace Intelligence team has been using Elasticsearch as a real-time OLAP database to serve thousands of internal users and dozens of services for a wide range of workloads. The system currently stores more than 800 billion documents and scans billions of documents for thousands of queries every second, while sustaining more than 1.5 million document writes per second at the same time.
This talk will discuss in depth how Uber scaled its Elasticsearch clusters and its ingestion pipelines across ingestion, queries, data storage, and operations with a mere three-person team, which also manages over 100 ingestion jobs. The talk will cover topics such as federation, query optimization, caching, failure recovery, data fidelity, the transition from Lambda architecture to Kappa architecture, and improvements to Elasticsearch internals.
How would you describe the persona and level of the target audience?
The target audience is software engineers or SREs who are interested in scaling out Elasticsearch for OLAP workloads. The audience should have a basic understanding of Elasticsearch and OLAP.
What do you want “that” persona to walk away from your talk knowing that they might not have known 50 minutes before?
The audience will know how to scale out Elasticsearch as an efficient real-time OLAP system in three dimensions: data ingestion, query, and operations.
What trend in the next 12 months would you recommend an early adopter/early majority SWE to pay particular attention to?
It is quite hard to optimize a stateful distributed system. There are just too many configurations to tweak and too many knobs to turn. A promising solution is to automatically tune system configurations through machine learning, as in the OtterTune system by Van Aken et al. Expect similar ideas to be applied to other stateful distributed systems.
Tracks
Monday, 5 March
-
Leading Edge Backend Languages
Code the future! How cutting-edge programming languages and their more-established forerunners can help solve today and tomorrow’s server-side technical problems.
-
Security: Red XOR Blue Team
Security from the defender's AND the attacker's point of view
-
Microservices/Serverless: Patterns and Practices
Stories of success and failure building modern service and function-based applications, including event sourcing, reactive, decomposition, & more.
-
Stream Processing in the Modern Age
Compelling applications of stream processing & recent advances in the field
-
DevEx: The Next Evolution of DevOps
Removing friction from the developer experience.
-
Modern CS in the Real World
Applied trends in Computer Science that are likely to affect Software Engineers today.
-
Speaker AMAs (Ask Me Anything)
Tuesday, 6 March
-
Next Gen Banking: It’s not all Blockchains and ICOs
Great technologies like Blockchain, smartphones and biometrics must not be limited to just faster banking, but better banking.
-
Observability: Logging, Alerting and Tracing
Observability in modern large distributed computer systems
-
Building Great Engineering Cultures & Organizations
Stories of cultural change in organizations
-
Architectures You've Always Wondered About
Topics like next-gen architecture mixed with applied use cases found in today's large-scale systems, self-driving cars, network routing, scale, robotics, cloud deployments, and more.
-
The Practice & Frontiers of AI
Learn about machine learning in practice and on the horizon
-
JavaScript and Beyond: The Future of the Frontend
Exploring the great frontend frameworks that make JavaScript so popular and the JavaScript-based languages revolutionising frontend development.
-
Speaker AMAs (Ask Me Anything)
Wednesday, 7 March
-
Distributed Stateful Systems
Architecting and leveraging NoSQL revisited
-
Operating Systems: LinuxKit, Unikernels, & Beyond
Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualisation, including Linux on Windows, LinuxKit, and Unikernels
-
Architecting for Failure
If you're not architecting for failure you're heading for failure
-
Evolving Java and the JVM: Mobile, Micro and Modular
Although the Java language is holding strong as a developer favourite, new languages and paradigms are being embraced on JVM.
-
Tech Ethics in Action
Learning from the experiences of real-world companies driving technology decisions from ethics as much as technology.
-
Bare Knuckle Performance
Killing latency and getting the most out of your hardware
-
Speaker AMAs (Ask Me Anything)