
Presentation: Data Inferno: 9 Circles of Data Tests With Apache Airflow

Track: Solutions Track IV

Location: Westminster, 4th flr.

Duration: 10:35am - 11:25am

Day of week: Wednesday


Continuous delivery is a given nowadays, and it goes hand in hand with a lot of automated testing. For 'normal' applications, such testing is well known and documented in the form of unit tests, integration tests, regression tests, etc. For big data applications, however, another dimension of complexity is added: that of the data itself. The truth is that real data sucks: it always surprises you by how much it differs from what you expect. Unreliable data, in turn, can result in unreliable applications, which makes for unhappy users. In this talk, we'll take you on a journey through our Nine Circles of Data Tests, which ensure the data is correct and makes sense. We use Airflow to do this, testing our data and logic at several steps, in order to avoid having to debug such issues over the weekend.

Topics include:

  • CI tests for your data deployments
  • Integrating data tests into your DAG
  • DTAP-ing your data deployments
  • Integrating data science models into this engineering world
  • How we went nuclear with Git
  • How Chuck Norris keeps us honest
  • Local Airflow in Docker
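
To give a feel for what "integrating data tests into your DAG" could look like, here is a minimal sketch. It is not taken from the talk: the names `DataTestError` and `check_rows` and the example columns are hypothetical, and the check is kept in plain Python so it can run without Airflow installed.

```python
from typing import Any, Iterable, Mapping, Sequence


class DataTestError(Exception):
    """Raised when a data check fails, so the calling task fails too."""


def check_rows(
    rows: Iterable[Mapping[str, Any]],
    required_columns: Sequence[str],
    min_rows: int = 1,
) -> int:
    """Fail loudly if the data is empty or has missing/null required values.

    Returns the number of rows checked when all checks pass.
    """
    materialized = list(rows)
    if len(materialized) < min_rows:
        raise DataTestError(
            f"expected at least {min_rows} rows, got {len(materialized)}"
        )
    for index, row in enumerate(materialized):
        for column in required_columns:
            # A missing key and an explicit None are both treated as failures.
            if row.get(column) is None:
                raise DataTestError(
                    f"row {index}: column {column!r} is missing or null"
                )
    return len(materialized)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the output of a load step.
    sample = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
    print(check_rows(sample, required_columns=["id", "amount"]))
```

In an Airflow DAG, a function like this could be passed as the `python_callable` of a `PythonOperator` placed between a load task and the tasks that consume its output. Because a raised exception marks the task as failed, downstream tasks do not run under the default trigger rule, which keeps bad data from propagating.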

Speaker: John Müller

Data Engineer WB Advanced Analytics @ING_news (ING Bank)

John works as a Data Engineer at WB Advanced Analytics of ING Bank. Working with loads of data from all kinds of source systems gets you intimately familiar with good practices in Data Engineering, because you are going to need every one of them.

