Presentation: Building Data Pipelines in Python

Duration: 5:25pm - 6:15pm

Abstract

This talk covers the process of building data pipelines: extraction, cleaning, integration and pre-processing of data, in short all the steps needed to prepare your data for a data-driven product. The focus is on data plumbing and on the practice of going from prototype to production.
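As a rough illustration of the steps named above, here is a minimal sketch of a pipeline written as a chain of small functions. The function names, the toy records and the lookup table are all hypothetical and chosen only for this example; they are not taken from the talk.

```python
def extract(raw_lines):
    """Extraction: split each raw comma-separated line into fields."""
    return [line.split(",") for line in raw_lines]


def clean(rows):
    """Cleaning: strip whitespace and drop rows with an empty field."""
    stripped = [[field.strip() for field in row] for row in rows]
    return [row for row in stripped if all(row)]


def integrate(rows, countries):
    """Integration: enrich each row with data from a second source."""
    return [row + [countries.get(row[0], "unknown")] for row in rows]


def preprocess(rows):
    """Pre-processing: turn rows into typed records for downstream use."""
    return [
        {"user": user, "score": float(score), "country": country}
        for user, score, country in rows
    ]


def run_pipeline(raw_lines, countries):
    """Apply the four stages in order."""
    return preprocess(integrate(clean(extract(raw_lines)), countries))


# Hypothetical input: the second line has a missing score and is dropped.
raw = ["alice, 3.5", "bob, ", " carol ,4"]
lookup = {"alice": "UK", "carol": "IT"}
records = run_pipeline(raw, lookup)
```

Keeping each stage as its own function makes it easier to test stages in isolation, which tends to help when moving from prototype to production.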

Starting from some common anti-patterns, we'll highlight the need for a workflow manager for any non-trivial project.
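To make the idea of a workflow manager concrete, the sketch below shows two things such a tool typically provides: running dependencies before the task that needs them, and skipping tasks whose output already exists. It uses only the standard library; the `Task` class, the `run` helper and the file names are assumptions made for this example, not the API of any particular tool.

```python
import os
import tempfile


class Task:
    """A pipeline step that writes one output file and may depend on other tasks."""

    def __init__(self, name, output_path, requires=(), action=None):
        self.name = name
        self.output_path = output_path
        self.requires = list(requires)
        self.action = action

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output_path)


def run(task, log):
    """Run dependencies first, then the task, skipping anything already complete."""
    if task.complete():
        return
    for dependency in task.requires:
        run(dependency, log)
    task.action(task)
    log.append(task.name)


def write_raw(task):
    with open(task.output_path, "w") as f:
        f.write("3\n1\n2\n")


def sort_lines(task):
    with open(task.requires[0].output_path) as src:
        lines = sorted(src)
    with open(task.output_path, "w") as dst:
        dst.writelines(lines)


workdir = tempfile.mkdtemp()
extract = Task("extract", os.path.join(workdir, "raw.txt"), action=write_raw)
clean = Task("clean", os.path.join(workdir, "sorted.txt"),
             requires=[extract], action=sort_lines)

first_run = []
run(clean, first_run)    # runs extract, then clean
second_run = []
run(clean, second_run)   # nothing to do: the output already exists
```

A real workflow manager adds much more on top of this (atomic writes, parameters, scheduling, failure handling), which is the main argument against hand-rolling one for a non-trivial project.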

We'll make the case for Luigi as an interesting option, and look at where it fits in the bigger picture of deploying a data product.

Speaker: Marco Bonzanini

Data Scientist & Co-Organiser of PyData London Meetup

I'm a Data Science consultant based in London, UK, and the author of "Mastering Social Media Mining with Python", published by Packt Publishing. I also co-organise the PyData London meetup. Backed by a PhD in Information Retrieval, I specialise in search and text analytics applications, and I've enjoyed working on a broad range of information management and data science projects.
