Liquid Clustering


Tahseen Firoz

Oct 12, 2023, 12:41:31 PM

to Delta Lake Users and Developers

Has anyone done a demo of Liquid Clustering? I need some stats on whether it's really helping.

Regards,

Tahseen

Mich Talebzadeh

Oct 13, 2023, 12:40:28 PM

to Delta Lake Users and Developers

There was a reference on LinkedIn to some initial findings on Liquid Clustering.

See the link below:

Delta Lake Liquid Clustering: First Impressions | Closer Consulting (medium.com)

I put a comment on it, which you can see here:

Activity | Mich Talebzadeh (Ph.D.) | LinkedIn

These were my views on this feature, borrowed heavily from Hive external tables:

Clustering is an established concept in data management that has been in use for a considerable period. In essence, clustering enables a data warehouse (DW) to organize data by similarity, optimizing storage and query performance. This is achieved by arranging the data based on the values within a chosen column. In the case of Delta Lake, I suspect the same pattern applies: it typically automates the sorting and storage decisions on top of storage such as GCS (gs://), S3, HDFS, or other Hadoop-compatible file systems (HCFS).
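To make the "arrange the data by the values in a chosen column" idea concrete, here is a minimal pure-Python sketch of why that helps queries. It assumes a reader that keeps a min/max value per file and skips files whose range cannot match a filter. The column name, file size, and filter range are made up for illustration, and this is not how Delta Lake itself lays out or selects files.

```python
import random

def split_into_files(values, rows_per_file):
    """Chunk values into fixed-size 'files' and keep each file's (min, max)."""
    files = [values[i:i + rows_per_file]
             for i in range(0, len(values), rows_per_file)]
    return [(min(f), max(f)) for f in files]

def files_to_scan(file_stats, lo, hi):
    """Count files whose [min, max] range overlaps the filter range [lo, hi]."""
    return sum(1 for fmin, fmax in file_stats if fmax >= lo and fmin <= hi)

random.seed(42)
# Hypothetical high-cardinality column: 10,000 rows, ids drawn from 0..99,999.
customer_ids = [random.randrange(100_000) for _ in range(10_000)]

# Same rows, two layouts: arrival order versus sorted by the chosen column.
unclustered = split_into_files(customer_ids, rows_per_file=1_000)
clustered = split_into_files(sorted(customer_ids), rows_per_file=1_000)

# Filter: customer_id BETWEEN 5000 AND 6000
scanned_unclustered = files_to_scan(unclustered, 5_000, 6_000)
scanned_clustered = files_to_scan(clustered, 5_000, 6_000)
print("files scanned, unclustered:", scanned_unclustered)
print("files scanned, clustered:  ", scanned_clustered)
```

With the rows in arrival order, nearly every file spans the whole id range, so the filter cannot rule any of them out. After sorting, each file covers a narrow slice and most files can be skipped.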

Clustering works most effectively when applied to columns with high cardinality, meaning columns that have a large number of distinct values. Note that the performance benefits of clustering may not be significant for tables smaller than 1 GB. Your mileage will therefore vary depending on the specific use case. Combining clustering with partitioning can lead to even better performance, although I am not sure that is possible in this set-up.

One advantage of this clustering is the potential to reduce data skew, which can occur when certain values are overrepresented in a column. Distributing data evenly across clusters may reduce the skew problem in, say, Spark.
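The skew point can be illustrated with a small pure-Python sketch. It compares one bucket per distinct value (as with partitioning on the column) against sorted rows cut into equal-sized chunks. The sample data, the chunk size, and the `skew_ratio` helper are assumptions made for the example and say nothing about how Delta Lake actually sizes its files.

```python
from collections import Counter

def skew_ratio(sizes):
    """Largest bucket divided by the mean bucket size (1.0 = perfectly even)."""
    return max(sizes) / (sum(sizes) / len(sizes))

# Hypothetical skewed column: one value dominates the rows.
countries = ["UK"] * 700 + ["FR"] * 100 + ["DE"] * 100 + ["ES"] * 100

# Layout A: one bucket per distinct value.
by_value = list(Counter(countries).values())

# Layout B: sort the rows, then cut into equal-sized chunks regardless of value.
rows = sorted(countries)
chunk = 250
by_size = [len(rows[i:i + chunk]) for i in range(0, len(rows), chunk)]

print("bucket sizes by value:", by_value, "skew:", skew_ratio(by_value))
print("bucket sizes by chunk:", by_size, "skew:", skew_ratio(by_size))
```

In layout A the dominant value produces one bucket far larger than the rest, which is the kind of imbalance that leaves a single Spark task doing most of the work. In layout B the rows stay grouped by value but every chunk carries the same number of rows.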

HTH

Mich Talebzadeh,

Distinguished Technologist, Solutions Architect & Engineer

London

United Kingdom

View my LinkedIn profile

https://en.everybodywiki.com/Mich_Talebzadeh
