1000 of countIf statements very slow - should I generally avoid so many columns?

759 views
Skip to first unread message

Maxim Fridental

unread,
Feb 15, 2017, 8:48:50 AM2/15/17
to ClickHouse
Hi there,

given a table with float values, I'm calculating a histogram.

This query runs 800ms:

select round(MyValue / 100, 0), count()
from MyTable
group by round(MyValue / 100, 0)

And this query runs 3400ms:
select
  countIf(MyValue < 100) bar1,
  countIf(MyValue >= 100 and MyValue < 200) bar2,
  countIf(MyValue >= 200 and MyValue < 300) bar3,
   ...
  countIf(MyValue > 1000) bar10
from MyTable

I think both queries should process the same amounts of data, so why is the time difference so substantial?

Thanks!
Maxim

Vitaliy Lyudvichenko

unread,
Feb 15, 2017, 2:45:46 PM2/15/17
to ClickHouse
GROUP BY clause has heavily optimized implementation inside ClickHouse.
It is should be a priori faster than manual SQL implementation.

The second query contains 10 columns, the first only single one.
Each column is computed on the same amount of data.
So the second query should call countIf function 10 * rowsOf(MyTable) times, the second one - rowsOf(MyTable) times.

But you can speedup the second query.
Just replace countIf(boolean_variable) to sum(boolean_variable), it will be equal transformation.
Result of all logical functions is Int8 and can be only either 0 either 1.

You could also speedup the first query if turn on dynamic code generation. See docs here (rus version).
To do so just add <compile>1</compile> string in default user profile in users.xml.
You need to run your query several (three) times, it runs dynamic code generation in background.
When code generation is finished (it takes up to 30 sec. and requires installed clang libs), ClickHouse will use compiled version for such queries.

BTW insted of round(MyValue / 100, 0) you could use round(MyValue, -2):

clickhouse-local -q "SELECT round(152, -2)"
Read 1 rows, 1.00 B in 0.000 sec., 2306 rows/sec., 2.25 KiB/sec.
200


среда, 15 февраля 2017 г., 16:48:50 UTC+3 пользователь Maxim Fridental написал:
Reply all
Reply to author
Forward
0 new messages