zstd compression


Hiroaki Nakamura

Aug 24, 2017, 2:33:11 AM
to ClickHouse
First of all, thanks for creating and sharing great software like ClickHouse!

I have questions about zstd compression.

Q1)
The comment says "Don't do that if you just started using ClickHouse."
I'd like to know the reason.
Is it because "zstd compression library is highly experimental"?

Q2) Does anything go wrong if you change the compression config for a new database?

Q3) Can you change the compression config for an existing database?
If not, can you stop ClickHouse, compress the existing data, and restart ClickHouse to use that database?

man...@gmail.com

Sep 4, 2017, 12:10:36 AM
to ClickHouse
Hi. Sorry for the delay.

A1:

It is because zstd is somewhat slower than lz4.
This matters when you query hot data (data that resides in the page cache) or when you are using a high-speed disk subsystem (NVMe, for example),
and when your queries are computationally cheap.
Here are more details: https://groups.google.com/d/msg/clickhouse/QXUXHCtRN90/f6T9lhGXCQAJ
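
The trade-off in A1 can be sketched with a toy model, assuming that disk reads and decompression overlap, so the slower of the two stages limits scan speed. Every number below (compression ratios, decompression speeds, disk speeds) is a hypothetical round figure chosen for illustration, not a measurement of ClickHouse, lz4, or zstd, and the function names are made up for this sketch.

```python
# Toy model: which stage limits a scan, the disk or the decompressor?
# All figures are hypothetical round numbers, NOT benchmarks.

def scan_throughput(disk_mb_s, ratio, decompress_mb_s):
    """Uncompressed MB/s delivered to the query.

    disk_mb_s:        compressed bytes read from storage per second
    ratio:            uncompressed size / compressed size
    decompress_mb_s:  decompressor output speed in uncompressed MB/s
    Assumes reading and decompression overlap, so the slower stage wins.
    """
    return min(disk_mb_s * ratio, decompress_mb_s)

# Hypothetical codec profiles: zstd compresses better but decompresses slower.
CODECS = {
    "lz4": {"ratio": 2.0, "decompress_mb_s": 2000.0},
    "zstd": {"ratio": 4.0, "decompress_mb_s": 600.0},
}

# Hypothetical storage speeds in MB/s of compressed data.
DISKS = {"hdd": 150.0, "nvme": 2500.0}

def compare():
    """Return {(disk, codec): uncompressed MB/s} for every combination."""
    return {
        (disk, codec): scan_throughput(speed, c["ratio"], c["decompress_mb_s"])
        for disk, speed in DISKS.items()
        for codec, c in CODECS.items()
    }

if __name__ == "__main__":
    for (disk, codec), mb_s in sorted(compare().items()):
        print(f"{disk:5s} {codec:5s} {mb_s:7.0f} MB/s uncompressed")
```

Under these assumed figures the slow disk is the bottleneck for both codecs, so the better ratio of zstd wins there, while on the fast disk the decompressor becomes the bottleneck and lz4 wins. The model ignores the CPU cost of the query itself, which is why the effect is strongest for cheap queries.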

A2:

Nothing wrong will happen. ClickHouse will apply the new compression scheme to new data parts and to existing data parts as they are merged.
Don't forget to use an identical compression configuration on all replicas.

(If replicas have different compression configurations, nothing wrong will happen, except that a replica will detect the inconsistency after doing a merge and will download the merged part from another replica instead of using its locally merged part.)

A3:

Yes, you can. (You have to restart ClickHouse for the new configuration to take effect.)
The new compression scheme takes effect lazily: ClickHouse applies it to old data only while merging.
Data will not be forcefully re-merged, and it is possible that some old data will never be re-merged at all.

To force a merge of some data so that the new compression takes effect, you could run:
OPTIMIZE TABLE table PARTITION yyyymm FINAL

This will re-merge the specified partition (you need enough free disk space for the merge to proceed).

Hiroaki Nakamura

Sep 4, 2017, 5:03:34 AM
to ClickHouse
Hi, thanks for your detailed reply!
I appreciate your answers.

Since then, I have actually tried a configuration that always uses zstd.
I used a custom-built ClickHouse version with my pull request merged:
"Update zstd to 1.3.1 by hnakamur · Pull Request #1144 · yandex/ClickHouse"

In my load test, zstd needed about half the storage that lz4 did.

zstd: 0.75 GB/h
lz4: 1.4 GB/h

Very impressive!
I will always use zstd on machines with enough CPU power.
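
To make "about half" concrete, here is a small worked calculation using only the two rates reported in this post (0.75 GB/h and 1.4 GB/h). The function names are made up for this sketch, and the per-day projection assumes the load stays constant, which the load test does not guarantee.

```python
# Worked arithmetic for the storage rates reported in the post.
ZSTD_GB_PER_HOUR = 0.75  # reported for zstd
LZ4_GB_PER_HOUR = 1.4    # reported for lz4

def relative_size(zstd_rate, lz4_rate):
    """Size of the zstd data as a fraction of the lz4 data."""
    return zstd_rate / lz4_rate

def gb_per_day(rate_gb_per_hour):
    """Project an hourly rate to one day, assuming a constant load."""
    return rate_gb_per_hour * 24

if __name__ == "__main__":
    ratio = relative_size(ZSTD_GB_PER_HOUR, LZ4_GB_PER_HOUR)
    print(f"zstd uses {ratio:.0%} of the lz4 storage")
    print(f"saving: {1 - ratio:.0%}")
    print(f"zstd: {gb_per_day(ZSTD_GB_PER_HOUR):.1f} GB/day")
    print(f"lz4:  {gb_per_day(LZ4_GB_PER_HOUR):.1f} GB/day")
```

The ratio is 0.75 / 1.4 ≈ 0.54, so zstd needed roughly 54% of the space lz4 did in this test, a saving of about 46%.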

Thanks!

On Monday, September 4, 2017 at 13:10:36 UTC+9, man...@gmail.com wrote: