Conference: March 6-8, 2017
Workshops: March 9-10, 2017
Presentation: Reliable & Scalable Data Infra Eco-System At Uber
Location:
- Windsor, 5th flr.
Duration
Day of week:
- Monday
Level:
- Intermediate
Persona:
- Data Scientist
Abstract
Uber's vision is to make transportation as reliable as running water everywhere, for everyone. Data is key to Uber's 24x7 global business operations, and making that data available for different use cases across the company in a reliable, scalable, and performant way is often challenging.
In this talk, we will discuss the overall data analytics eco-system at Uber and how Uber shapes its data from raw form to modeled form by leveraging various in-house and open source technologies such as Hadoop, Hive on Tez/MR, Spark, Presto, and Airflow, as well as enterprise technology such as HPE Vertica. Consumers of this data include machine learning and data science, city operations, experimentation, fraud, marketplace, and growth analytics.
We will also discuss a different aspect: going back to basics with traditional data modeling, and how it has helped us scale analytical and ad hoc interactive queries while retaining the same standard SQL interface offered by SQL-on-Hadoop technologies like Hive, Presto, and Spark. We will also discuss how we built and orchestrate ETL and data processing pipelines using Piper (forked from Airflow).
Finally, we will discuss a couple of real-time use cases that leverage this framework and how it has helped us power key business operations.
Tracks
-
Architecting for Failure
Building fault-tolerant systems that are truly resilient
-
Architectures You've Always Wondered about
QCon classic track. You know the names. Hear their lessons and challenges.
-
Modern Distributed Architectures
Migrating, deploying, and realizing modern cloud architecture.
-
Fast & Furious: Ad Serving, Finance, & Performance
Learn some of the tips and techniques of high-speed, low-latency systems in Ad Serving and Finance
-
Java - Performance, Patterns and Predictions
Skills embracing the evolution of Java (multi-core, cloud, modularity) and reinforcing core platform fundamentals (performance, concurrency, ubiquity).
-
Performance Mythbusting
Performance myths that need busting and the tools & techniques to get there
-
Dark Code: The Legacy/Tech Debt Dilemma
How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
-
Modern Learning Systems
Real world use of the latest machine learning technologies in production environments
-
Practical Cryptography & Blockchains: Beyond the Hype
Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain
-
Applied JavaScript - Atomic Applications and APIs
Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
-
Containers - State Of The Art
What is the state of the art, what's next, & other interesting questions on containers.
-
Observability Done Right: Automating Insight & Software Telemetry
Tools, practices, and methods to know what your system is doing
-
Data Engineering: Where the Rubber Meets the Road in Data Science
Science does not imply engineering. Engineering tools and techniques for Data Scientists
-
Modern CS in the Real World
Applied, practical, & real-world dive into industry adoption of modern CS ideas
-
Workhorse Languages, Not Called Java
Workhorse languages not called Java.
-
Security: Lessons Learned From Being Pwned
How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
-
Engineering Culture @{{cool_company}}
Culture, Organization Structure, Modern Agile War Stories
-
Softskills: Essential Skills for Developers
Skills for the developer in the workplace