Rockset - Building a Modern Analytics Database on Top of RocksDB

RocksDB, a key-value store built on Log-Structured Merge-Tree data structures and originally open-sourced by Facebook, has played a significant role in shaping data systems over the past decade. However, it hasn’t seen widespread adoption in analytics databases, mainly because it lacks native support for tight columnar encoding formats.

This talk will explore the journey of building a modern analytical database, Rockset, on top of RocksDB. We’ll discuss a key insight that enabled us to bring columnar encoding into RocksDB, achieving performance parity with column-oriented databases while also allowing real-time updates. Additionally, we will highlight the architectural advantages of deploying RocksDB in the cloud, showing how we achieved compute-storage and compute-compute separation by using cloud object storage for durability and a multi-tenant hot storage layer for performance. Finally, we will share learnings from operating Rockset and RocksDB in production.

Interview:

What's the focus of your work these days?

I am an engineer at Rockset, a search and analytics database, and my primary responsibility these days is query performance. My work ranges from low-level optimizations of hot inner loops to thinking about the performance of the system on a higher level and, finally, building tooling that helps us debug performance issues more quickly.

What's the motivation for your talk at QCon London 2024?

There are a number of challenges that we solved while building Rockset that I am happy to share with the audience. We picked RocksDB as our underlying key-value store, which brings important architectural advantages when deployed in the cloud and makes it easy to build compute-storage and compute-compute separation. However, off-the-shelf RocksDB is not performant for analytical queries due to the lack of tight columnar encodings. This talk will explore both aspects: why RocksDB is great for the cloud, and how we made it perform well for analytical workloads.

How would you describe your main persona and target audience for this session?

This talk will be technical and will assume attendees have good knowledge of the architecture and design of data-intensive systems. The target audience will be builders and people who spend a lot of their time thinking about data, systems, and performance.

Is there anything specific that you'd like people to walk away with after watching your session?

I hope they'll gain new insights into how to build data-intensive systems in the cloud, a deeper understanding of RocksDB and some tricks on how to use it as a building block of a system where performance matters.


Speaker

Igor Canadi

Founding Engineer and Architect @Rockset, Previously at RocksDB and Facebook

Igor Canadi is a Founding Engineer at Rockset, a modern cloud-native search and analytics database, where he is responsible for the data indexing and distributed SQL engine. Previously, Igor was a Software Engineer at Facebook, where he developed RocksDB, an open-source key-value store widely deployed in the data industry, and contributed to Facebook's core GraphQL infrastructure. In his free time, he enjoys sailing and snowboarding.

Date

Tuesday Apr 9 / 03:55PM BST (50 minutes)

Location

Windsor (5th Fl.)

Topics

Building Databases, Data Architecture, performance, database

From the same track

Session ML Feature Store

The Harsh Reality of Building a Realtime ML Feature Platform

Tuesday Apr 9 / 11:45AM BST

In a world where AI and ML are rapidly evolving, the need for efficient Realtime Feature Platforms has never been greater. But the journey to create one is far from straightforward.

Ivan Burmistrov

Principal Software Engineer @ShareChat

Session database

Powering User Experiences with Streaming Dataflow

Tuesday Apr 9 / 10:35AM BST

Streaming dataflow provides a unique solution to scaling OLTP applications by allowing for an efficient cache implementation that does not diverge from the relational model of the underlying data store.

Alana Marzoev

Founder & CEO @ReadySet

Session architecture

High Performance Time-Series Database Design With QuestDB

Tuesday Apr 9 / 01:35PM BST

In this talk, we will explore the world of time series and the unique set of problems they present to developers. We will discuss the engineering principles behind QuestDB's design, focusing on high performance.

Vlad Ilyushchenko

Co-Founder & CTO @QuestDB, OG Author of PSY-Probe, Geek

Session architecture

Improving Developer Experience Using Automated Data CI/CD Pipelines

Tuesday Apr 9 / 02:45PM BST

Validating your code against actual production data can be challenging. We have all, at least once, been on the receiving end of a "test1" email subject because somebody somewhere ran a test against the production database.

Noémi Ványi

Senior Software Engineer @Xata

Simona Pencea

Staff Software Engineer @Xata

Session

Unconference: Innovations in Data Engineering

Tuesday Apr 9 / 05:05PM BST

An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.