Track: Data Engineering: Where the Rubber Meets the Road in Data Science


Day of week:

Data Science is a discipline full of brilliant minds employing cutting-edge research. However, science does not imply engineering. The Data Engineering: Where the Rubber Meets the Road in Data Science track is all about advancing the engineering side of the profession. The track discusses patterns and practices with core tooling like Jupyter Notebooks, big data cloud migrations, and lessons from Data Scientists who have been there.

Track Host:
Andreas Gertsch Grover
Head of Data Science @JinnApp
Andreas is the Head of Data Science at Jinn (an on-demand delivery service based in the London area). Prior to joining Jinn in July of 2016, he was a senior Data Scientist at Skipjaq and a consultant at the Advisory House.

10:35am - 11:25am

by Katharine Jarmul
Python engineer, Founder @kjamistan

Creating automated, efficient and accurate data pipelines out of the (often) noisy, disparate and busy data flows used by today's enterprises is a difficult task. Data science teams and engineering teams may be asked to work together to create a management platform (or install one) that helps funnel these streams into the company's so-called data lake. But how are these pipelines managed? Who is in charge of maintaining services and reducing costs? How do we...

11:50am - 12:40pm

by Casey Stella
Committer and PMC member on the Apache Metron project

Any data scientist who works with real data will tell you that the hardest part of any data science task is the data preparation. From cleaning dirty data to understanding where your data is missing and how it is shaped, the care and feeding of your data is a prime task for the working data scientist.

I will describe my experiences in the field and present some useful open source software to automate some of...

1:40pm - 2:30pm

Open Space

2:55pm - 3:45pm

by Sudhir Mallem
Staff Engineer @Uber

Uber's vision is to make transportation as reliable as running water everywhere, for everyone. Data is key for Uber's 24x7 global business operations, and making data available for different use cases across the company in a reliable, scalable and performant way is often challenging.

In this talk, we will discuss the overall data analytics ecosystem at Uber and learn how Uber shapes its data from a raw form to a modeled form...

4:10pm - 5:00pm

by Victor Hu
Head of Data Science @QBE

This talk will cover the challenges, both technical and cultural, of building a data science team and capability in a large, global company. It will discuss best practices, lessons learned, and rewards of leveraging data effectively in the next frontier of data science: commercial insurance.

5:25pm - 6:15pm

by Marco Bonzanini
Data Scientist & Co-Organiser of PyData London Meetup

This talk discusses the process of building data pipelines (e.g. extraction, cleaning, integration and pre-processing of data): in general, all the steps that are necessary to prepare your data for your data-driven product. In particular, the focus is on data plumbing and on the practice of going from prototype to production.

Starting from some common anti-patterns, we'll highlight the need for a workflow manager for any non-trivial...