Abstract
Managing vector data entails storing, updating, and searching collections of large, multi-dimensional pieces of data. Some believe that this justifies creating a new class of data systems specialized for vector workloads. Others contend that such systems would eventually need to offer the services that database systems already provide, such as transaction management, role-based access control, and integration of vector search predicates into complex queries.
Recent research (PDX, Partition Dimensions Across) has shown that even highly optimized vector search kernels can profit from columnar storage. This talk gives a sneak preview of our ongoing work in this area, including optimized vector ingest, tailored vector indexing, and integrated evaluation of queries and vector predicates in the DuckDB system.
Speaker
Peter Boncz
Professor @CWI, Co-Creator of MonetDB, Vectorwise and MotherDuck, Database Systems Researcher, and Entrepreneur
Peter Boncz holds appointments as tenured researcher at CWI and professor at VU University Amsterdam. His academic background is in database systems, and the open-source column store MonetDB was the outcome of his PhD. He has a track record of bridging the gap between academia and commercial application, having founded multiple startups. In 2008 he co-founded Vectorwise around the analytical database system of the same name, which pioneered vectorized query execution and lightweight data compression, both of which have been adopted broadly in analytical database systems.
Recent work to make data (de)compression data-parallel and AI/GPU-friendly led to the FastLanes data format. In recent years he has collaborated closely with both Databricks and MotherDuck, a startup that is connecting DuckDB to the cloud. DuckDB originates from the Database Architectures research group, which he leads at CWI (the Amsterdam research institute where Python was also created).