You are viewing content from a past/completed QCon

Presentation: Data Inferno: 9 Circles of Data Tests With Apache Airflow

Track: Solutions Track IV

Location: Westminster, 4th flr.

Duration: 10:35am - 11:25am

Day of week: Wednesday


Abstract

Continuous delivery is a given nowadays, and it goes hand in hand with a lot of automated testing. For 'normal' applications, such testing is well known and documented in the form of unit tests, integration tests, regression tests, etc. For big data applications, however, another dimension of complexity is added: that of the data itself. The truth is that real data sucks; it always surprises you by how much it differs from what you expect. Unreliable data, in turn, can result in unreliable applications, which makes for unhappy users. In this talk, we'll take you on a journey through our Nine Circles of Data Tests, which ensure that the data is correct and makes sense. We use Airflow to do this, testing our data and logic at several steps in the pipeline so that we don't have to debug such issues over the weekend.

Topics include:

  • CI tests for your data deployments
  • Integrating data tests into your DAG
  • DTAP-ing your data deployments
  • Integrating data science models into this engineering world
  • How we went nuclear with Git
  • How Chuck Norris keeps us honest
  • Local Airflow in Docker
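
As a rough illustration of what "integrating data tests into your DAG" can look like, here is a minimal sketch of a data test written as a plain Python function. The names `check_batch` and `DataTestError`, and the specific checks (required columns present, minimum row count), are assumptions made for this example and are not taken from the talk. The idea is only that a check raises an exception when the data looks wrong, so the step running it fails before bad data moves on.

```python
from typing import Any, Iterable, Mapping, Sequence


class DataTestError(ValueError):
    """Raised when a batch of records fails a data test (hypothetical name)."""


def check_batch(
    rows: Iterable[Mapping[str, Any]],
    required: Sequence[str],
    min_rows: int = 1,
) -> int:
    """Validate a batch of records and return the number of rows checked.

    Raises DataTestError if any required column is missing or None in a
    row, or if the batch contains fewer than `min_rows` rows.
    """
    count = 0
    for index, row in enumerate(rows):
        for column in required:
            # A missing key and an explicit None are treated the same way.
            if row.get(column) is None:
                raise DataTestError(f"row {index}: missing value for {column!r}")
        count += 1
    if count < min_rows:
        raise DataTestError(f"expected at least {min_rows} rows, got {count}")
    return count


if __name__ == "__main__":
    batch = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 12.5}]
    print(check_batch(batch, required=["id", "amount"]))
```

In an Airflow pipeline, a function like this could be used as the callable of a Python task placed between a load step and a publish step. A task that raises an exception is marked as failed, and with the default trigger rule its downstream tasks do not run.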

Speaker: John Müller

Data Engineer WB Advanced Analytics @ING_news (ING Bank)

John works as a Data Engineer at WB Advanced Analytics of ING Bank. Working with loads of data from all kinds of source systems makes you intimately familiar with good practices in data engineering, because you are going to need all of them to handle that data.

