Cloud Dataflow

Google Cloud Dataflow is a cloud-based data processing service for both batch and real-time data streaming applications. It enables developers to set up processing pipelines for integrating, preparing, and analyzing large data sets.
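The following plain-Python sketch gives a rough idea of what a processing pipeline looks like. It does not use the Dataflow or Beam API. It chains three hypothetical stages, `split_words` (prepare), `drop_short` (filter), and `count` (analyze), over a small in-memory list that stands in for a large data set. In the real service, the source, transforms, and output would instead be defined with an Apache Beam SDK and executed by Dataflow.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical model: a stage is a function from one collection to another.
Stage = Callable[[Iterable], Iterable]


def run_pipeline(source: Iterable, stages: List[Stage]) -> list:
    """Apply each stage in order to the output of the previous stage."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)


def split_words(lines: Iterable[str]) -> Iterable[str]:
    # Prepare: break raw text lines into lowercase words.
    for line in lines:
        for word in line.lower().split():
            yield word


def drop_short(words: Iterable[str]) -> Iterable[str]:
    # Filter: discard tokens of two characters or fewer.
    return (w for w in words if len(w) > 2)


def count(words: Iterable[str]) -> list:
    # Analyze: count occurrences per word, sorted for deterministic output.
    return sorted(Counter(words).items())


lines = ["the quick brown fox", "a lazy dog and the fox"]
result = run_pipeline(lines, [split_words, drop_short, count])
print(result)
# [('and', 1), ('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 2)]
```

Keeping each stage as a small, independent function mirrors the idea that a pipeline is a graph of transforms, which is what lets a service like Dataflow distribute the work.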

The programming model and SDK portion of Google Cloud Dataflow has been moved into Apache Beam, a project that entered the Apache Software Foundation incubator and has since graduated to a top-level Apache project.

According to Tyler Akidau and Frances Perry (both software engineers working on Cloud Dataflow/Beam), Dataflow "is unique amongst data parallel systems in that it is built upon a comprehensive model for out-of-order processing: one designed to meet the challenges of real-time data processing without compromising correctness, motivated by our years of experience with production batch and streaming systems at Google."
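The core idea of out-of-order processing can be sketched in plain Python. This is a simplified, hypothetical model and not Beam code. Events arrive as `(event_time, value)` pairs in arbitrary order. Each event is assigned to a fixed window by its event time, not its arrival time, and a window's total is emitted once a watermark passes the window's end. The watermark heuristic used here (largest event time seen minus `MAX_DELAY`) and the constants `WINDOW_SIZE` and `MAX_DELAY` are assumptions for illustration only. In Dataflow, watermarks are estimated by the system, and late data can be handled with triggers and allowed lateness instead of simply being dropped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SIZE = 60  # hypothetical: fixed 60-second event-time windows
MAX_DELAY = 30    # hypothetical: watermark lags the max event time by 30 s


def window_start(event_time: int) -> int:
    # Assign an event to a fixed window by its event time.
    return event_time - event_time % WINDOW_SIZE


def process(arrivals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sum values per event-time window, emitting each window once the
    watermark passes its end. `arrivals` is in arrival (processing) order.
    Returns (window_start, total) pairs in the order they were emitted."""
    pending: Dict[int, int] = defaultdict(int)
    emitted: List[Tuple[int, int]] = []
    watermark: float = float("-inf")
    for event_time, value in arrivals:
        start = window_start(event_time)
        if start + WINDOW_SIZE <= watermark:
            # The window has already fired; this sketch drops late data.
            continue
        pending[start] += value
        # Assumed heuristic: watermark = max event time seen - MAX_DELAY.
        watermark = max(watermark, event_time - MAX_DELAY)
        # Emit every window whose end is at or before the watermark.
        for s in sorted(pending):
            if s + WINDOW_SIZE <= watermark:
                emitted.append((s, pending.pop(s)))
    # End of input: the watermark effectively advances to infinity.
    for s in sorted(pending):
        emitted.append((s, pending.pop(s)))
    return emitted


arrivals = [(10, 1), (70, 1), (50, 1), (130, 1), (20, 1), (95, 1), (200, 1), (65, 1)]
print(process(arrivals))  # [(0, 2), (60, 2), (120, 1), (180, 1)]
```

With this input, the event at time 50 is still counted even though it arrives after the event at time 70, because its window has not yet fired. The events at times 20 and 65 arrive after their windows were emitted, so this sketch discards them.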

Position on the Adoption Curve

Presentations about Cloud Dataflow