Abstract

Analytical data management systems have long been monolithic monsters far removed from the action by ancient protocols. Redesigning them to move into the application process greatly streamlines data transfer, deployment, and management. This new class of systems enables a whole new class of use cases, for example in-browser or edge OLAP, running SQL queries in lambdas, and Big Data on laptops.

DuckDB is a new analytical data management system that is built for an in-process use case. DuckDB speaks SQL, is trivially integrated as a library, and uses state-of-the-art query processing techniques with vectorized execution and lightweight compression. DuckDB is Free and Open Source software that is distributed under the permissive MIT license. In my talk, I will explain the rationale and design decisions behind DuckDB and give a tour of the internals.

Interview:

What's the focus of your work these days?

I spend most of my time working on DuckDB. It's the open source project that I co-founded, and we have spun out from the research institute that I worked at into a separate company, which is called DuckDB Labs, which I'm also leading.

What's the motivation for your talk at QCon London 2023?

The motivation for the talk came from interacting with data practitioners some years ago; we found out that they really hated using data systems. As somebody who builds data systems, I was a bit concerned that the world hated us, so we were starting to rethink how data systems could work. We came up with this idea that they should be running in process, and that's what I want to talk about. I want to show people how powerful this new way of thinking about data systems is.

How would you describe your main persona and target audience for this session?

I think there are two groups. The first group consists of data analysts and data scientists that are interested in processing large data sets with SQL. The other group is data engineers that are trying to build data pipelines, as DuckDB does it. I'm going to talk about this and it's going to be very useful for those with a more embedded role. These are the two groups that I think would be most interested.

Is there anything specific that you'd like people to walk away with after watching your session?

My first motivation is for them to have heard of us. We are a fast-growing project, but I'm told there are still some people out there that haven't heard of us. I think the way DuckDB works can really open up new possibilities and dimensions for people to think about how to build data pipelines and how to analyze data. So I think for them to walk away with that insight would be great.

Speaker

Hannes Mühleisen

Co-founder and CEO @duckdblabs

Prof. Dr. Hannes Mühleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs, a consulting company providing services around DuckDB. He is also a senior researcher of the Database Architectures group at the Centrum Wiskunde & Informatica (CWI), the Dutch national research lab for Mathematics and Computer Science in Amsterdam. Hannes is also Professor of Data Engineering at Radboud Universiteit Nijmegen. His main interest is analytical data management systems.

In-Process Analytical Data Management with DuckDB
