Streaming Really Large Data With Flink and Fluss

Modern data workloads are pushing the limits of our streaming systems. Row-oriented streaming pipelines often send every event (and every field) over the network, even if consumers only need a fraction of that data. With data volumes skyrocketing in the AI era – where machine learning and real-time analytics consume more data than ever – these inefficiencies become a serious bottleneck. In this talk, we explore a new approach that treats all data as tables, seamlessly integrating streams as continuously updating tables to make streaming more efficient and scalable.

What you'll learn:
Filtering Data at the Source: See how combining Apache Paimon with Apache Arrow (a columnar format) enables predicate pushdown and columnar storage. This allows data to be filtered and pruned at the source, cutting unnecessary network traffic by up to 50%. By avoiding the row-by-row transfer of irrelevant data, we reduce network overhead and boost throughput.

Unified Batch and Stream Queries: Learn how treating streams as tables lets you run the same queries across static data lakes and live streaming data without special-case code. We’ll discuss how Apache Arrow’s in-memory columnar capabilities, integrated with Apache Paimon’s table format, make it possible to query historical data and real-time events in a unified way. This streaming-lakehouse approach simplifies architecture by using one table-oriented model for both batch and streaming workloads.

Scalable Pipeline Architecture: Discover strategies for chaining multiple processing jobs (with engines like Apache Flink and even Apache Spark) while maintaining data integrity and low latency. We’ll cover how the open-source Fluss project serves as a real-time streaming storage layer that works hand-in-hand with Flink. Drawing on real-world use cases – including how Alibaba’s platforms handle massive, continuous data streams – we’ll illustrate how this architecture supports billions of events without compromising performance.

This session is designed for data engineers and architects looking to build scalable, cost-effective data pipelines that blend streaming and batch processing. You’ll come away with practical insights into why our data architectures must evolve to support AI-driven demand, and how reimagining streams as tables can simplify your stack while delivering substantial performance gains and cost savings.
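
As a rough illustration of the source-side filtering idea described above, the sketch below compares shipping every field of every event with applying a predicate and a column projection before anything leaves the source. It is plain Python and does not use the Fluss, Paimon, or Arrow APIs; the event fields, the function names, and the JSON-based size measure are all assumptions made for the example.

```python
import json

# Hypothetical events; the field names are illustrative, not from the talk.
EVENTS = [
    {
        "user_id": i,
        "country": "GB" if i % 4 == 0 else "US",
        "amount": i * 10,
        "payload": "x" * 100,  # a wide field the consumer never reads
    }
    for i in range(1000)
]


def row_oriented_scan(events):
    """Ship every field of every event; the consumer filters afterwards."""
    return [dict(e) for e in events]


def pushdown_scan(events, columns, predicate):
    """Apply the predicate and keep only the requested columns at the source."""
    return [{c: e[c] for c in columns} for e in events if predicate(e)]


def wire_size(rows):
    """Rough stand-in for network cost: bytes in the JSON encoding."""
    return len(json.dumps(rows).encode("utf-8"))


full = row_oriented_scan(EVENTS)
pruned = pushdown_scan(
    EVENTS,
    columns=["user_id", "amount"],
    predicate=lambda e: e["country"] == "GB",
)

# Filtering on the consumer side yields the same rows, only after
# paying to transfer everything.
consumer_side = [
    {"user_id": e["user_id"], "amount": e["amount"]}
    for e in full
    if e["country"] == "GB"
]
assert consumer_side == pruned

print("bytes without pushdown:", wire_size(full))
print("bytes with pushdown:   ", wire_size(pruned))
```

The saving in this toy case depends entirely on the made-up data shape (one wide unused field, a predicate that keeps a quarter of the rows), so it should not be read as a benchmark of any real system.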


Speaker

Ben Gamble

Field CTO @Ververica

A long-time builder of AI-powered games, simulations, and collaborative user experiences, Ben has previously built a global logistics company, large-scale online games, and augmented reality apps. Ben currently works to make fast data and AI a reality for everyone. He is the Field CTO of Ververica.

Session Sponsored By

The Unified Streaming Data Platform powered by VERA, from the original creators of Apache Flink®

Date

Tuesday Apr 8 / 02:45PM BST (50 minutes)

Location

Westminster (4th Fl.)

From the same track

Session

Building a Streaming Agentic AI Pipeline with Redpanda and Snowflake

Tuesday Apr 8 / 01:35PM BST

In this technical talk for developers, architects and the technically curious, Paul will cover recent developments within Redpanda Connect.

Paul Wilkinson

Principal Solutions Architect @Redpanda

Session

From Concept to Code: Navigating Agentic AI Services

Tuesday Apr 8 / 11:45AM BST

Those who embrace agentic AI will reap the rewards. Building on the strategic insights from the first session (“A Blueprint for Agentic AI Services”), this presentation delves into the technical intricacies of harnessing agentic AI. Attendees will explore practical code examples that

Alan Klikic

Senior Solutions Architect @Akka

Session

Engineering Excellence at ING: Balance Autonomy with Standardization

Tuesday Apr 8 / 10:35AM BST

ING is committed to empowering its engineers to maximize their impact and create more value for customers. To achieve this, ING continuously seeks innovative ways to accelerate development and enhance productivity.

Daniele Tonella

CTO @ING

Session

ICSAET Cohort Special Session

Tuesday Apr 8 / 03:55PM BST

Only available to attendees with a “Conference (3 days) + Certification (half day)” ticket.

Wes Reisz

Technical Principal @EqualExperts, ex-Thoughtworker & ex-VMWare, 16-Time QCon Chair, Creator/Co-host of The InfoQ Podcast