ClickHouse and Vectorized Execution

Alvin Hom

Mar 16, 2017, 7:15:37 PM
to ClickHouse
I was wondering how vectorized execution works in ClickHouse.  For a normal query, I would imagine that vectorization can be applied to the following:

-  Filtering operations
-  General functions (math, ..)
-  Aggregation functions

I scanned through the code and searched for where SSE is used.  It seems to be used in the filtering and the general math functions.  I did not see it in the aggregation functions.  Maybe I misread the code.

Also, have you guys done any performance benchmarks on how much vectorization improves query performance?  Like with vectorization and without vectorization.  Have you guys looked at using the AVX instruction sets on newer processors?

- Alvin

Vitaliy Lyudvichenko

Mar 17, 2017, 11:14:58 AM
to ClickHouse

Let me say some words about vectorization.


Vectorization can be manual (the developer writes SSE instructions) or automatic (the compiler unrolls loops and inserts SSE instructions).

Modern compilers can generate quite efficient vectorized code.

Manual vectorization requires a lot of work and doesn't always guarantee a significant speedup.
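
To make the manual vs. automatic distinction concrete, here is a minimal sketch of a "greater than" filter over one column, written both ways. It is an illustration only, not ClickHouse code: the function names are made up for this example, and the SSE2 path is guarded so the file still builds on targets without SSE2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Automatic: a plain loop over contiguous data. The compiler is free to
// turn this into SIMD code on its own when optimizations are enabled.
// (Hypothetical helper, not a ClickHouse function.)
void greaterThanAuto(const std::vector<int32_t> & column, int32_t threshold,
                     std::vector<uint8_t> & mask)
{
    mask.resize(column.size());
    for (size_t i = 0; i < column.size(); ++i)
        mask[i] = column[i] > threshold;
}

// Manual: the developer writes the SSE2 intrinsics, comparing four
// 32-bit values per instruction. (Hypothetical helper as well.)
void greaterThanManual(const std::vector<int32_t> & column, int32_t threshold,
                       std::vector<uint8_t> & mask)
{
    mask.resize(column.size());
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i limit = _mm_set1_epi32(threshold);
    for (; i + 4 <= column.size(); i += 4)
    {
        __m128i values = _mm_loadu_si128(reinterpret_cast<const __m128i *>(&column[i]));
        // Each 32-bit lane becomes all ones where value > threshold.
        __m128i cmp = _mm_cmpgt_epi32(values, limit);
        // Collect the top bit of every lane into a 4-bit integer.
        int bits = _mm_movemask_ps(_mm_castsi128_ps(cmp));
        for (int lane = 0; lane < 4; ++lane)
            mask[i + lane] = (bits >> lane) & 1;
    }
#endif
    // Scalar tail (or the whole column when SSE2 is unavailable).
    for (; i < column.size(); ++i)
        mask[i] = column[i] > threshold;
}
```

Both functions produce the same mask; the manual one is noticeably longer and tied to one instruction set, which is the extra work mentioned above.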


But both kinds of vectorization are possible only if the data processing algorithms have local memory access patterns.

So, with a good data structure design, you get automatic vectorization without extra effort.

Column-oriented processing inherently assumes vectorized code execution, but there are many details in its implementation...
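
As a rough illustration of what "local memory access" means here, the sketch below sums one field in a row-oriented layout and in a column-oriented layout. The `Row` struct and both function names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-oriented layout: the field we want is interleaved with the others,
// so consecutive prices are 16 bytes apart in memory.
struct Row
{
    int64_t id;
    int32_t price;
    int32_t quantity;
};

int64_t sumPriceRows(const std::vector<Row> & rows)
{
    int64_t sum = 0;
    for (const Row & row : rows)
        sum += row.price;   // strided access, unused fields are loaded too
    return sum;
}

// Column-oriented layout: all prices sit next to each other, so the loop
// reads memory sequentially and is an easy target for auto-vectorization.
int64_t sumPriceColumn(const std::vector<int32_t> & price)
{
    int64_t sum = 0;
    for (size_t i = 0; i < price.size(); ++i)
        sum += price[i];
    return sum;
}
```

Both functions compute the same sum; only the memory layout differs, and that layout is what decides how well the compiler can vectorize the loop.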


ClickHouse indeed doesn't have explicit SSE instructions in aggregate functions.

However, their memory access patterns remain local.

But there is an overhead for each add() call (it adds a new element to a group).

To avoid this overhead, ClickHouse supports runtime compilation of aggregate functions with clang (the docs are in Russian; install clang and set <compile>1</compile> in the profile settings).

ClickHouse generates code for all aggregate functions of a query; clang compiles it, unrolls loops, inlines all add() calls and tries to insert SSE code.

This can speed up GROUP BY by up to 2x.
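
The add() overhead can be sketched roughly as follows. This is a simplified analogy, not the actual ClickHouse interface: `IAggregate`, `SumAggregate` and both driver functions are assumptions made up for the example. The generic path pays one indirect call per row; the specialized path knows the concrete type at compile time, which is roughly the situation that code generated for a specific query puts the compiler in.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aggregate function interface with a per-row add().
struct IAggregate
{
    virtual ~IAggregate() = default;
    virtual void add(int64_t value) = 0;
    virtual int64_t result() const = 0;
};

struct SumAggregate final : IAggregate
{
    int64_t sum = 0;
    void add(int64_t value) override { sum += value; }
    int64_t result() const override { return sum; }
};

// Generic path: the concrete type is unknown here, so every row costs
// an indirect call that the compiler usually cannot inline.
int64_t aggregateGeneric(IAggregate & agg, const std::vector<int64_t> & column)
{
    for (int64_t value : column)
        agg.add(value);
    return agg.result();
}

// Specialized path: the concrete type is a template parameter, so add()
// can be inlined and the loop becomes a plain, vectorizable summation.
template <typename Agg>
int64_t aggregateSpecialized(const std::vector<int64_t> & column)
{
    Agg agg;
    for (int64_t value : column)
        agg.add(value);
    return agg.result();
}
```

Calling `aggregateSpecialized<SumAggregate>(column)` returns the same value as the generic path; the difference is only in how much the compiler can see and optimize.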


We don't have a benchmark for the "CH with vectorization on" vs. "CH with vectorization off" case, for the reasons described above.

But we have compared the performance of ClickHouse with several row-oriented databases (which lack vectorization).

You can see the results on the benchmark page.


We don't compile the code with AVX by default, since some production servers still don't support it.

You can add a compiler flag and recompile ClickHouse to turn on AVX, but the manually vectorized code will still use SSE only.
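
As a small aside on what such a flag changes: GCC and Clang predefine macros such as `__AVX__` when a translation unit is built with `-mavx`, so a program can report which instruction set the compiler was allowed to target. The helper below is a hypothetical sketch, not part of ClickHouse, and it does not say how the ClickHouse build passes the flag.

```cpp
#include <string>

// Reports the widest x86 SIMD extension this translation unit was
// compiled for, based on macros predefined by GCC and Clang.
// (Hypothetical helper for illustration.)
std::string compiledVectorIsa()
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "scalar";
#endif
}
```

Only auto-vectorized loops benefit from the wider target; code written with SSE intrinsics keeps using SSE regardless of the flag.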


On Friday, March 17, 2017 at 2:15:37 UTC+3, Alvin Hom wrote:

Alvin Hom

Mar 17, 2017, 1:16:50 PM
to ClickHouse
Vitaliy,

Thanks for the detailed answer.  I will take a look at the compilation of aggregate functions and do some benchmarking.  Cheers.

- Alvin