You are viewing content from a past/completed QCon

Presentation: Automatic Clustering At Snowflake

Track: Modern CS in the Real World

Location: Windsor, 5th flr.

Duration: 1:40pm - 2:30pm

Day of week: Wednesday

For partitioned tables, maintaining good clustering on frequently filtered dimensions is critical for partition pruning and query performance. Naive methods of maintaining good clustering are usually expensive, especially when the clustering dimensions differ from the natural dimension along which the data is loaded. Beyond a certain point, the query-time benefit of further reorganizing the data usually no longer justifies its cost. Approximate clustering is cheaper to maintain while still yielding good pruning performance. In this talk, I will present Snowflake’s clustering capabilities, including our algorithm for incremental maintenance of approximate clustering of partitioned tables, as well as our infrastructure for performing such maintenance automatically. I will also cover some real-world problems we have run into and our solutions.
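To make the pruning argument concrete, here is a minimal Python sketch, assuming each partition keeps only the min/max of a single clustering column. The `Partition` class and the `prune` and `overlap_depth` functions are hypothetical illustrations, not Snowflake's API or its clustering algorithm. The sketch shows why the same key range prunes far better once partition ranges stop overlapping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """Hypothetical per-partition metadata: min/max of the clustering column."""
    name: str
    min_val: int
    max_val: int


def prune(partitions: List[Partition], lo: int, hi: int) -> List[Partition]:
    """Keep only partitions whose [min, max] range can overlap the filter [lo, hi]."""
    return [p for p in partitions if p.max_val >= lo and p.min_val <= hi]


def overlap_depth(partitions: List[Partition], value: int) -> int:
    """Count partitions whose range contains `value`; lower means better clustered."""
    return sum(1 for p in partitions if p.min_val <= value <= p.max_val)


# Data loaded along a different dimension: every partition spans almost the whole key range.
unclustered = [Partition(f"u{i}", i, 90 + i) for i in range(4)]
# The same key range after approximate reclustering: neighboring ranges overlap only slightly.
clustered = [Partition(f"c{i}", i * 25, i * 25 + 26) for i in range(4)]

print(len(prune(unclustered, 40, 45)))  # 4 partitions must be scanned
print(len(prune(clustered, 40, 45)))    # 1 partition must be scanned
```

In this sketch, the overlap depth at a value counts how many partitions a point lookup must scan. The `clustered` ranges are deliberately allowed to overlap a little (depth 2 at key 25), which illustrates how an approximate layout can still prune well without a full sort of the table.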

Speaker: Prasanna Rajaperumal

Developer @SnowflakeDB

Prasanna Rajaperumal is a senior engineer at Snowflake, working on Snowflake's query engine. Before Snowflake, he worked on building the next-generation data infrastructure at Uber. Over the last decade, he has built data systems that scale at Cloudera, Cisco, and a few other companies before that. Prasanna graduated with a B.E. in Computer Science from BITS Pilani, India.