Modern data processing systems—databases, analytics engines, vector stores, and stream processors—hide an extraordinary amount of performance engineering beneath their abstractions. In this talk, we “open the lid” on data systems and reveal the low‑level performance tricks they rely on to achieve staggering throughput: vectorized execution, branch‑aware operators, cache‑optimized data layouts, late materialization, compressed‑buffer processing, adaptive execution paths, and microarchitectural tuning. We’ll break down what really happens between a high‑level query and the CPU pipeline—and why systems spend so much energy manipulating data before doing any useful computation.

Drawing on data processing systems research, we explore the surprising bottlenecks hidden inside operators, memory hierarchies, and parallelization strategies. We abstract these insights into concrete techniques you can apply directly in your own codebases: when to favor columnar layouts, how to reduce branch mispredictions, when compression accelerates computation, how to leverage vectorization safely, and how modern profiling tools can expose CPU‑level inefficiencies that traditional metrics miss.
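As one illustration of "when compression accelerates computation," here is a minimal sketch of aggregating directly over a run-length-encoded column without decompressing it. It is not taken from the talk; the function names (`rle_encode`, `rle_sum`, `rle_count_equal`) and the sample column are hypothetical, chosen only to show the idea.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_sum(runs):
    """Sum the column on its compressed form: one multiply-add per run."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """Count matching rows by testing each run once instead of each row."""
    return sum(length for value, length in runs if value == target)


# Hypothetical column with long runs, e.g. a sorted or low-cardinality attribute.
column = [7] * 1000 + [3] * 500 + [7] * 250
runs = rle_encode(column)

total = rle_sum(runs)
sevens = rle_count_equal(runs, 7)
```

With long runs, the aggregate touches three pairs instead of 1,750 values. The benefit depends on the data: a column with few repeated neighbors would gain nothing from this encoding.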

Whether you build databases, ML pipelines, backend services, or high‑performance libraries, this talk will give you a new mental model of how data really flows through modern hardware, and show you how to “liberate” the best ideas from the fastest systems in the world to make your code faster.

Holger Pirk is an Associate Professor in the Large‑Scale Data and Systems group at Imperial College London and an avid runner. His research spans all things data: analytics, transactions, systems, algorithms, data structures, processing models, and everything in between. While some of his work targets “traditional” relational databases, his broader aim is to expand the applicability of data management techniques. To this end, Holger studies Composable Database Systems—systems that are extensible to support heterogeneous workloads, data models, and hardware. This naturally leads to research at the intersection of data management, compilers, and computer architecture, with applications in areas ranging from generative modeling and graph processing to classic analytical workloads. Before joining Imperial, Holger was a Postdoctoral Associate in the Database Group at MIT CSAIL, a PhD student in the Database Architectures Group at CWI in Amsterdam, and an undergraduate in Computer Science at Humboldt‑Universität zu Berlin. Holger knows how to speak and write, as evidenced, respectively, by a CIDR Gong Show Award and a VLDB Best Paper Award.

From the same track

Session AI/ML

Machine Learning at the Edge of Scale and Speed: Nanosecond Inference at the CERN Large Hadron Collider

Wednesday Mar 18 / 10:35AM GMT

The CERN Large Hadron Collider (LHC) produces O(10,000) exabytes of raw data annually from high-energy proton collisions. Handling this volume under strict compute and storage limits requires real-time event filtering capable of processing millions of collisions per second.

Thea Klaeboe Aarrestad

Particle Physics and Real-Time ML @CERN @ETH Zürich

Session compilers

Automatically Retrofitting JIT Compilers

Wednesday Mar 18 / 03:55PM GMT

We as a community have attempted, multiple times, to speed up languages such as Lua, Python, and Ruby by hand-writing JIT compilers. Sometimes we've had short-term success, but the size and pace of change of their standard implementations have proven difficult to keep up with over time.

Laurence Tratt

Shopify / Royal Academy of Engineering Research Chair in Language Engineering @King's College London

Session architecture

Not Just I/O: Using Async/Await for Computational Scheduling

Wednesday Mar 18 / 01:35PM GMT

In the past two years I have developed a new query execution engine for Polars, which not only tries to execute as much of your query in parallel as possible, but in a streaming fashion as well, such that you can process data sets which do not fit in memory.

Orson Peters

Senior Engineer of Query Execution @Polars, (Co-)Author of Stdlib Sort in Rust & Go

Session Data Systems

Vector Search on Columnar Storage

Wednesday Mar 18 / 11:45AM GMT

Managing vector data entails storing, updating, and searching collections of large, multi-dimensional pieces of data. Some believe that this justifies the creation of a new class of data systems specialized for the task.

Peter Boncz

Professor @CWI, Co-Creator of MonetDB, VectorWise and MotherDuck, Database Systems Researcher, and Entrepreneur

Looking Under the Hood: Data Processing Systems Performance Tricks (and How to Apply Them to Your Code)
