Abstract
Modern data processing systems—databases, analytics engines, vector stores, and stream processors—hide an extraordinary amount of performance engineering beneath their abstractions. In this talk, we “open the lid” on data systems and reveal the low‑level performance tricks they rely on to achieve staggering throughput: vectorized execution, branch‑aware operators, cache‑optimized data layouts, late materialization, compressed‑buffer processing, adaptive execution paths, and microarchitectural tuning. We’ll break down what really happens between a high‑level query and the CPU pipeline—and why systems spend so much energy manipulating data before doing any useful computation.
Drawing on data processing systems research, we explore the surprising bottlenecks hidden inside operators, memory hierarchies, and parallelization strategies. We distill these insights into concrete techniques you can apply directly in your own codebases: when to favor columnar layouts, how to reduce branch mispredictions, when compression accelerates computation, how to leverage vectorization safely, and how modern profiling tools can expose CPU‑level inefficiencies that traditional metrics miss.
Whether you build databases, ML pipelines, backend services, or high‑performance libraries, this talk will give you a new mental model of how data really flows through modern hardware, and show you how to “liberate” the best ideas from the fastest systems in the world to make your code faster.
Speaker
Holger Pirk
Associate Professor for Data Management Systems at Imperial College London and Avid Runner — Minimizing Cache Misses, Thread Divergence and Aerobic Decoupling
Holger Pirk is an Associate Professor in the Large‑Scale Data and Systems group at Imperial College London and an avid runner. His research spans all things data: analytics, transactions, systems, algorithms, data structures, processing models, and everything in between. While some of his work targets “traditional” relational databases, his broader aim is to expand the applicability of data management techniques. To this end, Holger studies Composable Database Systems—systems that are extensible to support heterogeneous workloads, data models, and hardware. This naturally leads to research at the intersection of data management, compilers, and computer architecture, with applications in areas ranging from generative modeling and graph processing to classic analytical workloads. Before joining Imperial, Holger was a Postdoctoral Associate in the Database Group at MIT CSAIL, a PhD student in the Database Architectures Group at CWI in Amsterdam, and an undergraduate in Computer Science at Humboldt‑Universität zu Berlin. Holger knows how to speak and write, as evidenced, respectively, by a CIDR Gong Show Award and a VLDB Best Paper Award.