Innovations in Data Engineering

Data engineering has become an indispensable function in most software engineering organizations today. As a discipline, it has broadened to encompass all practices, systems, and architectures involved in storing and serving data for a myriad of needs. From OLTP systems that power user experiences to the analytics systems that power business & user insights to all of the connective tissue that keeps data consistent between these systems, data engineers have their hands full managing complex systems and architectures. The promise of the modern data stack was to simplify these architectures and reduce the operational burden many of us still wrestle with today. But what really works? Which technologies and practices live up to their promises? What patterns and technologies have stood the test of time? What are some pitfalls that you need to be aware of? Come to this track to learn from data engineers facing & solving these problems today.


From this track

Session database

Powering User Experiences with Streaming Dataflow

Tuesday Apr 9 / 10:35AM BST

Streaming dataflow provides a unique solution to scaling OLTP applications by allowing for an efficient cache implementation that does not diverge from the relational model of the underlying data store.

Alana Marzoev

Founder & CEO @ReadySet

Session ML Feature Store

The Harsh Reality of Building a Realtime ML Feature Platform

Tuesday Apr 9 / 11:45AM BST

In a world where AI and ML are rapidly evolving, the need for efficient Realtime Feature Platforms has never been greater. But the journey to create one is far from straightforward.

Ivan Burmistrov

Principal Software Engineer @ShareChat

Session architecture

High Performance Time-Series Database Design With QuestDB

Tuesday Apr 9 / 01:35PM BST

In this talk, we will explore the world of time series and the unique set of problems they present to developers. We will discuss the engineering principles behind QuestDB's design, focusing on high performance.

Vlad Ilyushchenko

Co-Founder & CTO @QuestDB, OG Author of PSY-Probe, Geek

Session architecture

Improving Developer Experience Using Automated Data CI/CD Pipelines

Tuesday Apr 9 / 02:45PM BST

Validating your code against actual production data can be challenging. We have all, at least once, been on the receiving end of a "test1" email subject because somebody somewhere ran a test against the production database.

Noémi Ványi

Senior Software Engineer @Xata

Simona Pencea

Staff Software Engineer @Xata

Session Building Databases

Rockset - Building a Modern Analytics Database on Top of RocksDB

Tuesday Apr 9 / 03:55PM BST

RocksDB, a key-value store built on the foundation of Log-Structured Merge-Tree data structures and originally open-sourced by Facebook, has played a significant role in shaping data systems over the past decade.

Igor Canadi

Founding Engineer and Architect @Rockset, Previously at RocksDB and Facebook

Session

Unconference: Innovations in Data Engineering

Tuesday Apr 9 / 05:05PM BST

An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Track Host

Sid Anand

Fellow, Cloud & Data Platform @Walmart, Apache Airflow Committer/PMC, Ex-Netflix, LinkedIn, eBay, Etsy, & PayPal

Sid recently joined Walmart (i.e. Walmart Global Tech) as a fellow to work on all things data. Prior to joining Walmart Global Tech, Sid served as the Chief Architect and Head of Engineering for Datazoom, where he and his team built high-fidelity, low-latency data streaming systems. Prior to joining Datazoom, Sid served as PayPal's Chief Data Engineer, where he helped build systems, platforms, teams, and processes, all with the aim of building access to the hundreds of petabytes of data under PayPal's management. Prior to joining PayPal, Sid held senior technical positions at Netflix, LinkedIn, eBay, & Etsy, to name a few. He earned his BS and MS degrees in CS from Cornell University, focusing on Distributed Systems.

Outside of work, Sid advises early-stage companies and several conferences. Once an active committer on Apache Airflow, he is now mostly a fan.

Sid's body of work includes, but is not limited to:

  • The world's first cloud-based streaming video service -- he was the first engineer to work on the cloud at Netflix
  • LinkedIn's Federated Search Typeahead (a.k.a. auto-complete)
  • LinkedIn's (Big Data) Self-service Marketing Analytics tool
  • PayPal's DBaaS - an internal self-service system to provision & manage heterogeneous databases
  • PayPal's CDC - an internal self-service CDC system to stream DB updates to nearline applications
  • eBay-over-Skype : Following the Skype acquisition, he built a P2P version of eBay offers
  • eBay's Best Match Search Ranking Engine powered by an In-Memory Database
  • eBay's Fuzzy-match name/email Search
  • Agari's Data Platform : Batch & Streaming Predictive Data Platform as a Service
  • Datazoom's Platform : High-fidelity, Low-latency Streaming Data Platform as a Service