Status of LowCardinality feature

Etienne Champetier

Aug 20, 2018, 3:56:27 PM
to ClickHouse

Hi ClickHouse team,

I tried to read the design document of the LowCardinality feature (https://gist.github.com/alexey-milovidov/ebdc6a0b8731fdba9438bda3ec6e8ca4) using Google Translate.

I tested it and saw the nice size improvements, so I'm wondering why it's still behind the "experimental" flag:

- Do you have some missing parts to implement? (performance improvements?)
- Are you waiting to iron out the remaining bugs?
- Do you have doubts about the design?
- Doubts about the on-disk format?
- ...?

I'm just trying to get an idea of when I can try it in production ;)

Thanks a lot!

man...@gmail.com

Aug 20, 2018, 3:59:10 PM
to ClickHouse
> Do you have some missing parts to implement? (performance improvements?)

The implementation is complete. There are no big missing parts.
We are going to do some renames. The `StringWithDictionary` alias will probably be removed in favor of `LowCardinality(String)`.


> Are you waiting to iron out the remaining bugs?

Yes, this is one of the main reasons. We need to test it under a near-production workload before we can remove the "experimental" flag.


> Do you have doubts about the design?
> Doubts about the on-disk format?

No doubts, the design looks solid.
We still have the option to change or tune the on-disk format in an incompatible way before removing the "experimental" flag.


On Monday, August 20, 2018 at 22:56:27 UTC+3, Etienne Champetier wrote:

man...@gmail.com

Aug 20, 2018, 4:21:23 PM
to ClickHouse
I recommend that you start experimenting in your testing environment and report as much as possible (performance comparisons, bugs).

On Monday, August 20, 2018 at 22:59:10 UTC+3, man...@gmail.com wrote:

Vadim Tkachenko

Aug 21, 2018, 3:35:52 PM
to ClickHouse
Hello,

Will it work with Nested and Array types? For example, we want to store Enum-like data in an array but still be able to sort it by its String values.

ste...@enrich-data.io

Oct 2, 2018, 10:26:10 AM
to ClickHouse
Hi,

I thought that ClickHouse automatically converted low-cardinality columns into dictionaries. Am I wrong?
Both Parquet and ORC work that way, so perhaps I'm just assuming.

If that is not the case, can you please tell me when StringWithDictionary will be released?

Regards,
 -Stefán

Артем Зуйков

Oct 3, 2018, 12:52:29 PM
to ClickHouse
Hi,

It's in early alpha in current releases. You can enable it with:
set allow_experimental_low_cardinality_type = 1;

If you run into any trouble, you're welcome to report bugs to GitHub issues:

On Tuesday, October 2, 2018 at 17:26:10 UTC+3, ste...@enrich-data.io wrote:

Etienne Champetier

Oct 3, 2018, 1:00:05 PM
to ClickHouse


On Tuesday, October 2, 2018 at 10:26:10 UTC-4, ste...@enrich-data.io wrote:
> Hi,
>
> I thought that Clickhouse automatically converted low cardinal columns into dictionaries, am I wrong?
> Both Parquet and ORC work that way so perhaps I'm just assuming.

Right now there is no automatic conversion.

> If that is not the case can you then please tell me when StringWithDictionary will be released?

man...@gmail.com

Oct 4, 2018, 5:35:12 PM
to ClickHouse
ClickHouse does not automatically apply dictionary encoding to low-cardinality values.
(One could also regard compression as a kind of dictionary encoding, but it is quite different.)

We want to make this option available in explicit form first, and only then consider making it a transparent feature.
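
For readers unfamiliar with the term, here is a minimal, generic sketch of what dictionary encoding of a column means. It is an illustration only, not ClickHouse's implementation or on-disk format; the function names `dictionary_encode` and `dictionary_decode` and the sample column are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer code pointing into a dictionary.

    Generic illustration only; not ClickHouse's actual implementation.
    """
    dictionary: List[str] = []      # distinct values, in order of first appearance
    positions: Dict[str, int] = {}  # value -> its code (index in `dictionary`)
    codes: List[int] = []           # one code per row of the column
    for value in values:
        code = positions.get(value)
        if code is None:
            code = len(dictionary)
            positions[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Hypothetical low-cardinality column: many rows, few distinct values.
column = ["GET", "POST", "GET", "GET", "PUT", "POST"]
dictionary, codes = dictionary_encode(column)

# A filter such as value == "GET" can compare integer codes instead of strings.
get_code = dictionary.index("GET")
matching_rows = [row for row, code in enumerate(codes) if code == get_code]
```

Unlike general-purpose block compression, each row's code stays individually addressable here, so comparisons like the filter above can work on the codes without decoding the strings.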

On Tuesday, October 2, 2018 at 17:26:10 UTC+3, ste...@enrich-data.io wrote: