Liquid Clustering


Tahseen Firoz

Oct 12, 2023, 12:41:31 PM
to Delta Lake Users and Developers
Has anyone done a demo of Liquid Clustering? I need some stats on whether it is really helping.

Regards,
Tahseen

Mich Talebzadeh

Oct 13, 2023, 12:40:28 PM
to Delta Lake Users and Developers
There was a reference to some initial findings on Liquid Clustering on LinkedIn.

See below link


I put a comment on it that you can see here

Activity | Mich Talebzadeh (Ph.D.) | LinkedIn

These were my views on this feature, borrowed heavily from Hive external tables:

Clustering is an established concept in data management that has been in use for a considerable period. In essence, clustering enables a DW to organize data by similarity, optimizing storage and query performance. This is achieved by arranging the data based on the values within a chosen column. In the case of Delta Lake, I suspect the same pattern applies: it typically automates the sorting and storage decisions, often on top of storage such as gs, s3, HDFS, or other Hadoop-compatible file systems (HCFS).
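To make the point about arranging data by column value concrete, here is a minimal pure-Python sketch (no Spark or Delta Lake involved). It assumes a reader that skips files using per-file min/max statistics, and it uses a plain sort as a stand-in for whatever layout algorithm Delta Lake actually applies. The helper names `split_into_files` and `files_to_scan` are hypothetical.

```python
import random

def split_into_files(values, rows_per_file):
    """Chunk values into 'files' and record each file's min/max statistics."""
    files = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        files.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return files

def files_to_scan(files, lo, hi):
    """Count files whose [min, max] range overlaps the filter lo <= v <= hi."""
    return sum(1 for f in files if f["max"] >= lo and f["min"] <= hi)

random.seed(42)
# A high-cardinality column: 100,000 rows drawn from 1,000,000 possible values.
values = [random.randrange(1_000_000) for _ in range(100_000)]

# Same rows, two layouts: arrival order versus ordered by the column.
unclustered = split_into_files(values, rows_per_file=1_000)
clustered = split_into_files(sorted(values), rows_per_file=1_000)

# A selective range filter on the column.
unclustered_scanned = files_to_scan(unclustered, 500_000, 510_000)
clustered_scanned = files_to_scan(clustered, 500_000, 510_000)

print(f"files scanned without clustering: {unclustered_scanned} of {len(unclustered)}")
print(f"files scanned with clustering:    {clustered_scanned} of {len(clustered)}")
```

In the unordered layout nearly every file's min/max range spans the whole domain, so the filter cannot rule files out; in the ordered layout only the few files covering the requested range need to be read.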

Clustering works most effectively when applied to columns with high cardinality, meaning columns that have a large number of distinct values. It is important to note that the performance benefits of clustering may not be significant for tables smaller than 1 GB in size. Therefore, your mileage may vary, depending on the specific use case. Combining clustering with partitioning can lead to even better performance, although I am not sure that is the case in this set-up.

One advantage of this clustering is the potential to reduce data skew, which can occur when certain values are overrepresented in a column. Distributing data evenly into clusters may reduce the skew problem, say, in Spark.
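As a rough illustration of the skew point, the sketch below compares two layouts of the same made-up column: one partition per distinct value versus equal-sized files cut from the ordered data. The country codes, row counts, and the `skew_ratio` helper are all assumptions for the example, not measurements from Delta Lake.

```python
from collections import Counter

# Hypothetical column where one value dominates.
rows = ["GB"] * 9_000 + ["FR"] * 500 + ["DE"] * 300 + ["US"] * 200

# Layout 1: one partition per distinct value, so sizes follow the value counts.
partition_sizes = sorted(Counter(rows).values(), reverse=True)

# Layout 2: order the rows by the column, then cut fixed-size files.
rows_per_file = 2_500
ordered = sorted(rows)
file_sizes = [
    len(ordered[i:i + rows_per_file])
    for i in range(0, len(ordered), rows_per_file)
]

def skew_ratio(sizes):
    """Largest unit divided by the average unit size; 1.0 means perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))

print("per-value partitions:", partition_sizes, "skew ratio:", skew_ratio(partition_sizes))
print("fixed-size files:    ", file_sizes, "skew ratio:", skew_ratio(file_sizes))
```

With per-value partitions, the task that reads the dominant value does several times the average work; with equal-sized files the work per unit is even, although the dominant value is then spread across more than one file.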

HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom




 https://en.everybodywiki.com/Mich_Talebzadeh

