Presentation: Reliable & Scalable Data Infra Eco-System At Uber

Duration: 
2:55pm - 3:45pm

Abstract

Uber's vision is to make transportation as reliable as running water everywhere, for everyone. Data is key to Uber's 24x7 global business operations, and making it available for different use cases across the company in a reliable, scalable, and performant way is often challenging.

In this talk, we will discuss the overall data analytics eco-system at Uber and how Uber shapes its data from a raw form to a modeled form by leveraging various in-house and open source technologies such as Hadoop, Hive on Tez/MR, Spark, Presto, and Airflow, as well as enterprise technology such as HPE Vertica. Consumers of this data include machine learning and data science, city operations, experimentation, fraud, marketplace, and growth analytics.

We will also discuss a different aspect: going back to basics with traditional data modeling, and how it has helped us scale analytical and ad hoc interactive queries while retaining the same standard SQL interface offered by SQL-on-Hadoop technologies like Hive, Presto, and Spark. We will also discuss how we build and orchestrate ETL and data processing pipelines using Piper (forked from Airflow).

Finally, we will discuss a couple of real-time use cases that leverage this framework and how it helped us power key business operations.

Speaker: Sudhir Mallem

Staff Engineer @Uber

Sudhir Mallem is a Staff Engineer at Uber working on the data infrastructure team. He was previously a Staff Engineer and an early member of the data infra team at LinkedIn, where he built and maintained a massively scalable enterprise and analytical warehouse that powered business operations, data science, and decision making at LinkedIn, leveraging both open source and enterprise software.

Similar Talks

Distributed Systems Engineer Working on Cache @Twitter
Principal Engineer @ Sky Betting & Gaming
Software Engineer @Instagram
Elm Pioneer & Software Engineer @noredink
Responsible for Data Quality @Uber Communications
Engineering Manager @Uber - Marketplace Data & Forecasting

Conference for Professional Software Developers