Sparse columns


Maxim Fridental

Feb 14, 2022, 9:22:51 AM
to ClickHouse
Hello there.

I have a table with around 1000 columns of type Nullable(Float64). Most of them are NULL for most of the rows. Whether a column contains only NULLs or actual values depends on the sensor_type column in the same table. I'm partitioning on YYYYMM and on sensor_type, so that within a given part most of the columns contain only NULL values. Still, these column files are (altogether) very large. I've experimented a little with different codecs, but unfortunately they don't save much. If I removed the "all-NULL" files from the parts, I could reduce the size of this table on disk from 500 GB to 6 GB.

Is there a way to prevent ClickHouse from writing "all-NULL" or "all-zero" columns into the part, as a way to reduce storage costs and replication latency? If not, would you consider it as a future improvement?

Best, 
Maxim

Denis Zhuravlev

Feb 14, 2022, 9:50:01 AM
to ClickHouse

name:        ratio_of_defaults_for_sparse_serialization
value:       1
changed:     0
description: Minimal ratio of number of default values to number of all values in column to store it in sparse serializations. If >= 1, columns will be always written in full serialization.
type:        Float
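
To make the description above concrete, here is a toy Python model of the threshold rule it states. This is only a sketch, not ClickHouse code: the function names are hypothetical, the column is modeled as a plain list with 0.0 as its default value, and the strict comparison at the exact boundary is an assumption (the description only says "minimal ratio").

```python
from typing import Sequence


def default_ratio(values: Sequence[float], default: float = 0.0) -> float:
    """Fraction of entries in the column that equal its default value."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == default) / len(values)


def would_use_sparse(values: Sequence[float],
                     ratio_threshold: float,
                     default: float = 0.0) -> bool:
    """Toy decision: sparse if the share of defaults exceeds the threshold."""
    # Per the setting description, a threshold >= 1 means the column is
    # always written in full serialization.
    if ratio_threshold >= 1:
        return False
    # Assumption: strict comparison; the real boundary behavior may differ.
    return default_ratio(values, default) > ratio_threshold


mostly_default = [0.0] * 99 + [21.5]         # 99% default values
dense = [float(i) for i in range(1, 101)]    # no default values

print(would_use_sparse(mostly_default, 1.0))  # False: value 1 keeps full serialization
print(would_use_sparse(mostly_default, 0.9))  # True: 0.99 is above 0.9
print(would_use_sparse(dense, 0.9))           # False: ratio of defaults is 0.0
```

With the value of 1 shown in the output above, the model never picks sparse serialization, which matches the "If >= 1" clause of the description. A value below 1 would presumably have to be set on the table for mostly-default columns to be written sparsely.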

Maxim Fridental

Feb 14, 2022, 12:03:41 PM
to ClickHouse
Amazing! Thank you.