Modern data workloads are pushing the limits of our streaming systems. Row-oriented streaming pipelines often send every event (and every field) over the network, even if consumers only need a fraction of that data. With data volumes skyrocketing in the AI era – where machine learning and real-time analytics consume more data than ever – these inefficiencies become a serious bottleneck. In this talk, we explore a new approach that treats all data as tables, seamlessly integrating streams as continuously updating tables to make streaming more efficient and scalable.
What you'll learn:
Filtering Data at the Source: See how combining Apache Paimon with Apache Arrow (a columnar format) enables predicate pushdown and columnar storage. This allows data to be filtered and pruned at the source, significantly cutting down unnecessary network traffic (often by up to 50%). By avoiding the row-by-row transfer of irrelevant data, we reduce network overhead and boost throughput (see the first sketch after this list).
Unified Batch and Stream Queries: Learn how treating streams as tables lets you run the same queries across static data lakes and live streaming data without special-case code. We’ll discuss how Apache Arrow’s in-memory columnar capabilities, integrated with Apache Paimon’s table format, make it possible to query historical data and real-time events in a unified way. This streaming-lakehouse approach simplifies architecture by using one table-oriented model for both batch and streaming workloads (see the second sketch after this list).
Scalable Pipeline Architecture: Discover strategies for chaining multiple processing jobs (with engines like Apache Flink and even Apache Spark) while maintaining data integrity and low latency. We’ll cover how the open-source Fluss project serves as a real-time streaming storage layer that works hand-in-hand with Flink. Drawing on real-world use cases – including how Alibaba’s platforms handle massive, continuous data streams – we’ll illustrate how this architecture supports billions of events without compromising performance (see the third sketch after this list).
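To make the first point concrete, here is a minimal sketch of column pruning and predicate pushdown using PyArrow's dataset API. The path, file format, and column names are assumptions for illustration; Paimon's own readers expose the same idea through their engine connectors rather than this exact call.

```python
import pyarrow.dataset as ds

# Hypothetical Parquet-backed table; any columnar dataset works here.
orders = ds.dataset("warehouse/orders", format="parquet")

# Only the projected columns are decoded, and the filter is evaluated
# against file/row-group statistics first, so irrelevant rows and fields
# are skipped instead of being shipped to the consumer.
high_value = orders.to_table(
    columns=["order_id", "amount"],
    filter=ds.field("amount") > 100,
)
print(high_value.num_rows)
```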
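For the second point, here is a sketch of one Flink SQL query served by a Paimon table in either streaming or batch mode (PyFlink shown; the catalog options, warehouse path, and the orders table are placeholders, not the exact setup from the session).

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming mode; EnvironmentSettings.in_batch_mode() would run the very
# same SQL as a bounded query over the table's current snapshot.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register a Paimon catalog (warehouse path is a placeholder).
t_env.execute_sql("""
    CREATE CATALOG paimon WITH (
        'type' = 'paimon',
        'warehouse' = 'file:///tmp/paimon'
    )
""")
t_env.execute_sql("USE CATALOG paimon")

# One table-oriented query serves both historical and real-time reads.
t_env.execute_sql(
    "SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id"
).print()
```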
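And for the third point, a sketch of two chained Flink SQL jobs sharing a table managed by Fluss as the real-time storage layer. The catalog type and bootstrap address follow Fluss's documented Flink catalog setup, but every name and value here is a placeholder rather than the architecture presented in the talk.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Point Flink at a Fluss cluster acting as the streaming storage layer
# (address is a placeholder for your own deployment).
t_env.execute_sql("""
    CREATE CATALOG fluss WITH (
        'type' = 'fluss',
        'bootstrap.servers' = 'localhost:9123'
    )
""")
t_env.execute_sql("USE CATALOG fluss")

# Stage 1: a continuous job that cleans raw events into a shared table.
t_env.execute_sql("""
    INSERT INTO enriched_events
    SELECT event_id, user_id, amount FROM raw_events WHERE amount > 0
""")

# Stage 2: typically a separate Flink job that consumes enriched_events
# as a changelog stream and maintains running per-user totals.
t_env.execute_sql("""
    INSERT INTO user_totals
    SELECT user_id, SUM(amount) FROM enriched_events GROUP BY user_id
""")
```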
This session is designed for data engineers and architects looking to build scalable, cost-effective data pipelines that blend streaming and batch processing. You’ll come away with practical insights into why our data architectures must evolve to support AI-driven demand, and how reimagining streams as tables can simplify your stack while delivering substantial performance gains and cost savings.
Speaker

Ben Gamble
Field CTO @Ververica
A longtime builder of AI-powered games, simulations, and collaborative user experiences, Ben has previously built a global logistics company, large-scale online games, and augmented reality apps. He currently works to make fast data and AI a reality for everyone as the Field CTO of Ververica.
Session Sponsored By

The Unified Streaming Data Platform powered by VERA, from the original creators of Apache Flink®