Rockset - Building a Modern Analytics Database on Top of RocksDB

RocksDB, a key-value store built on Log-Structured Merge-Tree data structures and originally open-sourced by Facebook, has played a significant role in shaping data systems over the past decade. However, it hasn’t seen widespread adoption in analytics databases, mainly due to the absence of native support for tight columnar encoding formats.

This talk will explore the journey of building a modern analytical database, Rockset, on top of RocksDB. We’ll discuss a key insight that enabled us to bring columnar encoding into RocksDB, achieving not only performance parity with column-oriented databases but also support for real-time updates. Additionally, we will highlight the architectural advantages of deploying RocksDB in the cloud, showing how we achieved compute-storage and compute-compute separation by using cloud object storage for durability and a multi-tenant hot storage layer for performance. Finally, we will share learnings from operating Rockset and RocksDB in production.

What's the focus of your work these days?

I am an engineer at Rockset, a search and analytics database, and my primary responsibility these days is query performance. My work ranges from low-level optimizations of hot inner loops to thinking about the performance of the system on a higher level and, finally, building tooling that helps us debug performance issues more quickly.

What's the motivation for your talk at QCon London 2024?

There are a number of challenges that we solved while building Rockset that I am happy to share with the audience. We picked RocksDB as our underlying key-value store, which brings important architectural advantages when deployed in the cloud and makes it easy to build compute-storage and compute-compute separation. However, off-the-shelf RocksDB is not performant for analytical queries due to the lack of tight columnar encodings. This talk will explore both aspects: why RocksDB is great for the cloud, and how we made it perform well for analytical workloads.

How would you describe your main persona and target audience for this session?

This talk will be technical and will assume attendees have good knowledge of the architecture and design of data-intensive systems. The target audience will be builders and people who spend a lot of their time thinking about data, systems, and performance.

Is there anything specific that you'd like people to walk away with after watching your session?

I hope they'll gain new insights into how to build data-intensive systems in the cloud, a deeper understanding of RocksDB, and some tricks for using it as a building block of a system where performance matters.


Speaker

Igor Canadi

Founding Engineer and Architect @Rockset, Previously at RocksDB and Facebook

Igor Canadi is a Founding Engineer at Rockset, a modern cloud-native search and analytics database, where he is responsible for the data indexing and distributed SQL engine. Previously, Igor was a Software Engineer at Facebook, where he developed RocksDB, an open-source key-value store widely deployed in the data industry, and contributed to Facebook's core GraphQL infrastructure. In his free time, he enjoys sailing and snowboarding.


Date

Tuesday Apr 9 / 03:55PM BST (50 minutes)

Location

Whittle (3rd Fl.)

Topics

Building Databases, Data Architecture, performance, database

From the same track

Session ML Feature Store

The Harsh Reality of Building a Realtime ML Feature Platform

Tuesday Apr 9 / 11:45AM BST

In a world where AI and ML are rapidly evolving, the need for efficient Realtime Feature Platforms has never been greater. But the journey to create one is far from straightforward.

Ivan Burmistrov

Principal Software Engineer @ShareChat

Session Apache Iceberg

Open Formats: The Happy Accident Disrupting the Data Industry

Tuesday Apr 9 / 01:35PM BST

Analytic databases are quietly going through an unprecedented transformation. Open table formats, like Apache Iceberg, enable multiple query engines to share one central copy of a table.

Ryan Blue

Co-Founder and CEO @Tabular, Co-creator of Apache Iceberg

Session database

Powering User Experiences with Streaming Dataflow

Tuesday Apr 9 / 10:35AM BST

Streaming dataflow provides a unique solution to scaling OLTP applications by allowing for an efficient cache implementation that does not diverge from the relational model of the underlying data store.

Alana Marzoev

Founder & CEO @ReadySet

Session architecture

High Performance Time-Series Database Design With QuestDB

Tuesday Apr 9 / 05:05PM BST

In this talk, we will explore the world of time series and the unique set of problems that time series present to developers. We will discuss the engineering principles behind QuestDB's design, focusing on high performance.

Vlad Ilyushchenko

Co-Founder & CTO @QuestDB, OG Author of PSY-Probe, Geek

Session architecture

How Xata Improved the Way Developers Work With Data and Solved Some Tough Problems Along the Way

Tuesday Apr 9 / 02:45PM BST

Validating your code against actual production data can be challenging. We have all, at least once, been on the receiving end of a "test1" email subject because somebody somewhere ran a test against the production database.

Noémi Ványi

Senior Software Engineer @Xata

Simona Pencea

Staff Software Engineer @Xata