What is performance impact of the Nullable columns?

1,777 views
Skip to first unread message

kriticar

unread,
Sep 3, 2019, 10:34:09 AM9/3/19
to ClickHouse
Hi,

from the experience, what is the performance impact of using Nullable columns?
In documentation (https://clickhouse.yandex/docs/en/data_types/nullable/) it is stated: "Using Nullable almost always negatively affects performance, keep this in mind when designing your databases."

What is your experience?

Regards.

Denis Zhuravlev

unread,
Sep 3, 2019, 11:07:09 PM9/3/19
to ClickHouse
Up to 2 times.

create table XxX(A Int64, B Int64, C Nullable(Int64)) Engine=MergeTree order by A;
insert into XxX select number id, id % 9973 x, x from  numbers (10000000);

select count(), B from XxX group by B format Null;
9973 rows in set. Elapsed: 0.043 sec. Processed 10.00 million rows, 80.00 MB (229.89 million rows/s., 1.84 GB/s.)

select count(), C from XxX group by C format Null;
9973 rows in set. Elapsed: 0.101 sec. Processed 10.00 million rows, 90.00 MB (98.87 million rows/s., 889.79 MB/s.)

kriticar

unread,
Sep 4, 2019, 3:17:37 AM9/4/19
to ClickHouse
Thanks Denis,

your selects prove everything.

Because of the performance impact, i am considering few options for fields that will not be part of sorting key:

1. for fields of UInt type, I will change them to Int type, and replace all Null values with value -1. This will be OK for fields whose original values are always positive. It is kind of impractical, but it could work.

2. for fields of Int type, maybe I can find a value (positive or negative) that is not likely to show up among values, or use Nullable and pay the consequences.

3. for fields of String type I can provide default value for an example 'n/a'.

If fields has to be part of the sorting key, the problem is even bigger.

Are better handling of Nullable columns on Clickhouse roadmap in the near future?

Regards.

Denis Zhuravlev

unread,
Sep 4, 2019, 9:30:25 AM9/4/19
to ClickHouse
>Are better handling of Nullable columns on Clickhouse roadmap in the near future?
I am not a developer and I can't speak for developer's team (because I am just DBA and I don't work at Yandex).
But AFAIK it's impossible to speed up even in theory.
Currently CH does double work because nulls stored in different file.
If CH team will decide to use bit from type itself (MAXINT for UInt64 for example) as you (1,2,3) it will slow down math for UInt64 because there is no CPU instructions which know such math.

kriticar

unread,
Sep 5, 2019, 4:24:59 AM9/5/19
to ClickHouse
What do you mean with "there is no CPU instructions which know such math"?

Regards.

kutrapali

unread,
May 31, 2021, 5:55:08 AM5/31/21
to ClickHouse
Hello! Then you will have to replace the "coalesce" function with "if" in the calculations. Will there be the same request processing time if you use the default value, and in the "if" calculations, as when using "Nullable"
среда, 4 сентября 2019 г. в 06:07:09 UTC+3, denis.z...@gmail.com:

Denis Zhuravlev

unread,
May 31, 2021, 11:53:16 AM5/31/21
to ClickHouse
> Then you will have to replace the "coalesce" function with "if" in the calculations.
In 99% cases you don't need, for example for empty String values. And in the most cases you can do this replacing in your application not in CH.
But it's up to you. There are pros and cons for both solutions.
I started to use Clickhouse when Nullable types did not exist I don't see a problem here.

Sébastien Estienney

unread,
May 31, 2021, 2:16:20 PM5/31/21
to ClickHouse
Using placeholder like -1 could be a good idea but keep in mind that null value are viral and provide some handy feature. for exemple "A+B*C" return NULL if A or B or C is null, so you can just coalesce le result ; doing the same logic with placeholder is a little more tricky.

Denis Zhuravlev

unread,
May 31, 2021, 2:29:42 PM5/31/21
to ClickHouse
 it's up to you. There are pros and cons for both solutions.
I started to use Clickhouse when Nullable types did not exist I don't see a problem here.

Right now I have ~100 tables, some tables with 500 columns. 10 PT of data. ZERO nullable columns.
Yes, sometimes it's cumbersome, and not obvious,  for example I store -700 Fahrenheit degrees instead of Null.
Reply all
Reply to author
Forward
0 new messages