Looking Under the Hood: Data Processing Systems Performance Tricks (and How to Apply Them to Your Code)

Abstract

Modern data processing systems—databases, analytics engines, vector stores, and stream processors—hide an extraordinary amount of performance engineering beneath their abstractions. In this talk, we “open the lid” on data systems and reveal the low‑level performance tricks they rely on to achieve staggering throughput: vectorized execution, branch‑aware operators, cache‑optimized data layouts, late materialization, compressed‑buffer processing, adaptive execution paths, and microarchitectural tuning. We’ll break down what really happens between a high‑level query and the CPU pipeline—and why systems spend so much energy manipulating data before doing any useful computation.

Drawing on data processing systems research, we explore the surprising bottlenecks hidden inside operators, memory hierarchies, and parallelization strategies. We abstract these insights into concrete techniques you can apply directly in your own codebases: when to favor columnar layouts, how to reduce branch mispredictions, when compression accelerates computation, how to leverage vectorization safely, and how modern profiling tools can expose CPU‑level inefficiencies that traditional metrics miss.

Whether you build databases, ML pipelines, backend services, or high‑performance libraries, this talk will give you a new mental model of how data really flows through modern hardware, and show you how to “liberate” the best ideas from the fastest systems in the world to make your code faster.


Speaker

Holger Pirk

Associate Professor for Data Management Systems at Imperial College London and Avid Runner — Minimizing Cache Misses, Thread Divergence and Aerobic Decoupling

Holger Pirk is an Associate Professor in the Large‑Scale Data and Systems group at Imperial College London and an avid runner. His research spans all things data: analytics, transactions, systems, algorithms, data structures, processing models, and everything in between. While some of his work targets “traditional” relational databases, his broader aim is to expand the applicability of data management techniques. To this end, Holger studies Composable Database Systems—systems that are extensible to support heterogeneous workloads, data models, and hardware. This naturally leads to research at the intersection of data management, compilers, and computer architecture, with applications in areas ranging from generative modeling and graph processing to classic analytical workloads. Before joining Imperial, Holger was a Postdoctoral Associate in the Database Group at MIT CSAIL, a PhD student in the Database Architectures Group at CWI in Amsterdam, and an undergraduate in Computer Science at Humboldt‑Universität zu Berlin. Holger knows how to speak and write, as evidenced, respectively, by a CIDR Gong Show Award and a VLDB Best Paper Award.

From the same track

Session AI/ML

Navigating the Edge of Scale and Speed for Physics Discovery

Wednesday Mar 18 / 10:35AM GMT

Details coming soon.

Thea Klaeboe Aarrestad

Particle Physics and Real-Time ML @CERN @ETH Zürich

Session compilers

Automatically Retrofitting JIT Compilers

Wednesday Mar 18 / 03:55PM GMT

We as a community have attempted, multiple times, to speed up languages such as Lua, Python, and Ruby by hand-writing JIT compilers. Sometimes we've had short-term success, but the size and pace of change of their standard implementations have proven difficult to keep up with over time.

Laurence Tratt

Shopify / Royal Academy of Engineering Research Chair in Language Engineering @King's College London

Session architecture

Not Just I/O: Using Async/Await for Computational Scheduling

Wednesday Mar 18 / 01:35PM GMT

In the past two years I have developed a new query execution engine for Polars, which tries to execute as much of your query as possible not only in parallel but also in a streaming fashion, so that you can process data sets that do not fit in memory.

Orson Peters

Senior Engineer of Query Execution @Polars, (Co-)Author of Stdlib Sort in Rust & Go

Session

Vector Search on Columnar Storage

Wednesday Mar 18 / 11:45AM GMT

Managing vector data entails storing, updating, and searching collections of large, multi-dimensional pieces of data. Some believe that this justifies creating a new class of data systems specialized for the task.

Peter Boncz

Professor @CWI, Co-Creator of MonetDB, VectorWise and MotherDuck, Database Systems Researcher, and Entrepreneur