Presentation: Automatic Clustering At Snowflake

Track: Modern CS in the Real World

Location: Windsor, 5th flr.

Duration: 1:40pm - 2:30pm

Day of week: Wednesday


Abstract

For partitioned tables, maintaining good clustering properties for frequently filtered dimensions is critical for partition pruning and query performance. Naive methods of maintaining good clustering are usually expensive, especially when the clustering dimensions differ from the natural dimension along which the data is loaded. Usually, the query-time benefit of reorganizing the data tapers off relative to its cost after a certain point. Approximate clustering is cheaper to maintain while still resulting in good pruning performance. In this talk, I will present Snowflake’s clustering capabilities, including our algorithm for incremental maintenance of approximate clustering of partitioned tables, as well as our infrastructure to perform such maintenance automatically. I will also cover some real-world problems we ran into and our solutions.
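To make the pruning argument concrete, here is a minimal sketch in Python. It is not Snowflake's algorithm: it assumes each partition stores only a (min, max) pair for the filtered column, and it stands in for "approximate clustering" with a coarse bucketing by `value // 1000`. The names `make_partitions` and `partitions_scanned` and all sizes are hypothetical and chosen only for illustration.

```python
import random

def make_partitions(values, size):
    """Split values into fixed-size partitions and keep (min, max) per partition."""
    chunks = (values[i:i + size] for i in range(0, len(values), size))
    return [(min(chunk), max(chunk)) for chunk in chunks]

def partitions_scanned(partitions, lo, hi):
    """Count partitions whose [min, max] range overlaps the filter lo <= x <= hi."""
    return sum(1 for pmin, pmax in partitions if pmax >= lo and pmin <= hi)

# Assumption: 10,000 distinct keys arriving in an order unrelated to the filter column.
rng = random.Random(0)
values = list(range(10_000))
rng.shuffle(values)

# Load order as-is: every partition tends to span almost the whole key range.
unclustered = make_partitions(values, 100)

# Full clustering: a complete sort, the expensive option.
clustered = make_partitions(sorted(values), 100)

# Approximate clustering (illustrative stand-in): group keys into coarse buckets
# of 1,000 without ordering the rows inside each bucket. The stable sort keeps
# the shuffled order within a bucket.
approximate = make_partitions(sorted(values, key=lambda v: v // 1000), 100)

lo, hi = 2500, 2599
unclustered_scanned = partitions_scanned(unclustered, lo, hi)
approx_scanned = partitions_scanned(approximate, lo, hi)
clustered_scanned = partitions_scanned(clustered, lo, hi)

print("unclustered:", unclustered_scanned)
print("approximate:", approx_scanned)
print("clustered:  ", clustered_scanned)
```

In this toy setup the fully sorted layout scans a single partition for the filter, and the coarsely bucketed layout scans at most 10 of the 100 partitions, while the unclustered layout can prune very little. That is the shape of the tradeoff the abstract describes: most of the pruning benefit arrives well before the data is perfectly sorted.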

Speaker: Prasanna Rajaperumal

Developer @SnowflakeDB

Prasanna Rajaperumal is a senior engineer at Snowflake, working on the Snowflake database's query engine. Before Snowflake, he worked on building the next-generation data infrastructure at Uber. Over the last decade, he has been building data systems that scale at Cloudera, Cisco, and a few other companies before that. Prasanna graduated with a B.E. in Computer Science from BITS Pilani, India.

