The Harsh Reality of Building a Realtime ML Feature Platform

In a world where AI and ML are rapidly evolving, the need for efficient Realtime Feature Platforms has never been greater. But the journey to create one is far from straightforward.

In this talk, Ivan Burmistrov will share how ShareChat, the largest social network in India, built its own Realtime Feature Platform serving more than 1 billion features per second, and how the team made it cost-efficient.

Ivan will cover the challenges the team faced along the way, how they overcame them, and which ones are still not fully resolved. The talk will also cover the team's experience of using relatively new technologies such as ScyllaDB and Redpanda, and why such technologies are crucial for building a cost-efficient system. Additionally, Ivan will share how the system leverages Apache Flink at the very core of the data pipeline.

This talk will provide insights for anyone interested in real-time data pipelines, and in Realtime Feature Platforms in particular.

Interview:

What's the focus of your work these days?

I'm a Principal Software Engineer at ShareChat, working on infrastructure for a recommendation system, with a particular focus on a realtime feature store and other data-related systems.

What's the motivation for your talk at QCon London 2024?

Building a realtime feature store has been a lot of fun, and I've learned a lot of real-life lessons that are hard to find online. So, the motivation is to share something worthwhile.

How would you describe your main persona and target audience for this session?

Anyone who is curious about realtime data processing and low-latency systems.

Is there anything specific that you'd like people to walk away with after watching your session?

I'd like attendees to walk away with an understanding of what it takes to build a realtime feature store, and generic tips and tricks on realtime data processing.
 


Speaker

Ivan Burmistrov

Principal Software Engineer @ShareChat

Ivan is an experienced software engineer with a passion for building large-scale distributed systems, realtime data processing, and low-latency systems.

Currently, Ivan is working at India's largest social network, ShareChat, where he is leading the work on the Realtime ML Feature Store that powers ShareChat's recommendation system.

Prior to ShareChat, Ivan worked on the Ads Experimentation system at Meta, one of the largest and most sophisticated experimentation systems in the world.

Outside of work, Ivan dedicates his free time to his 3-year-old daughter, whom he playfully describes as the most challenging 'system' he's ever encountered.

Date

Tuesday Apr 9 / 11:45AM BST (50 minutes)

Location

Windsor (5th Fl.)

Topics

ML Feature Store, Apache Flink, ScyllaDB, Realtime Infrastructure, Low-latency systems

From the same track

Session Building Databases

Rockset - Building a Modern Analytics Database on Top of RocksDB

Tuesday Apr 9 / 03:55PM BST

RocksDB, a key-value store built on the foundation of Log-Structured Merge-Tree data structures and originally open-sourced by Facebook, has played a significant role in shaping data systems over the past decades.

Igor Canadi

Founding Engineer and Architect @Rockset, Previously at RocksDB and Facebook

Session database

Powering User Experiences with Streaming Dataflow

Tuesday Apr 9 / 10:35AM BST

Streaming dataflow provides a unique solution to scaling OLTP applications by allowing for an efficient cache implementation that does not diverge from the relational model of the underlying data store.

Alana Marzoev

Founder & CEO @ReadySet

Session architecture

High Performance Time-Series Database Design With QuestDB

Tuesday Apr 9 / 01:35PM BST

In this talk, we will explore the world of time series and the unique set of problems they present to developers. We will discuss the engineering principles behind QuestDB's design, focusing on high performance.

Vlad Ilyushchenko

Co-Founder & CTO @QuestDB, OG Author of PSY-Probe, Geek

Session architecture

Improving Developer Experience Using Automated Data CI/CD Pipelines

Tuesday Apr 9 / 02:45PM BST

Validating your code against actual production data can be challenging. We have all, at least once, been on the receiving end of a "test1" email subject because somebody somewhere ran a test against the production database.

Noémi Ványi

Senior Software Engineer @Xata

Simona Pencea

Staff Software Engineer @Xata

Session

Unconference: Innovations in Data Engineering

Tuesday Apr 9 / 05:05PM BST

An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.