Let me say a few words about vectorization.
Vectorization can be manual (the developer writes SSE instructions) or automatic (the compiler unrolls loops and inserts SSE instructions).
Modern compilers can generate quite efficient vectorized code.
Manual vectorization requires a lot of work and doesn't always guarantee a significant speedup.
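To make the manual/automatic distinction concrete, here is a minimal sketch (not ClickHouse code; the function names `sum_auto` and `sum_manual` are made up for illustration). The first function is a plain loop that a compiler may vectorize on its own at a suitable optimization level; the second uses SSE2 intrinsics by hand and falls back to a scalar loop when SSE2 is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Automatic: a plain loop over contiguous data.
// The compiler is free to unroll it and emit SIMD instructions.
std::uint32_t sum_auto(const std::uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += data[i];
    return s;
}

// Manual: the developer processes four 32-bit lanes per iteration.
std::uint32_t sum_manual(const std::uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    std::uint32_t s = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        // Unaligned load of four consecutive values.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        acc = _mm_add_epi32(acc, v);  // lane-wise add, wraps like unsigned
    }
    std::uint32_t lanes[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (or the whole array if SSE2 is unavailable).
    for (; i < n; ++i)
        s += data[i];
    return s;
}
```

Both functions return the same result; whether the manual version is actually faster depends on the compiler and the flags used, which is the point made above.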
But both kinds of vectorization are possible only if the data processing algorithms have local memory access patterns.
So a good data structure design gives you automatic vectorization almost for free.
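As a rough illustration of what "local memory access" means here, the sketch below sums the same values stored in two layouts. The `Row` struct and both function names are hypothetical and only serve the example. In the row layout each value sits 64 bytes from the next one, so the loop strides through memory; in the contiguous layout the values are adjacent, which is the shape compilers vectorize most readily.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row layout: the field we want is surrounded by other data.
struct Row {
    std::uint64_t id;
    std::uint32_t value;
    char payload[52];  // other fields of the record
};

// Strided access: each iteration touches a different 64-byte record.
std::uint64_t sum_rows(const std::vector<Row>& rows) {
    std::uint64_t s = 0;
    for (const Row& r : rows)
        s += r.value;
    return s;
}

// Contiguous access: values are packed back to back in one array.
std::uint64_t sum_column(const std::vector<std::uint32_t>& values) {
    std::uint64_t s = 0;
    for (std::uint32_t v : values)
        s += v;
    return s;
}
```

The two functions compute the same sum; only the memory layout differs, and that layout is what decides how much the compiler can do automatically.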
Column-oriented processing assumes vectorized code execution a priori, but there are many details in its implementation...
Indeed, ClickHouse doesn't have explicit SSE instructions in its aggregate functions.
However, their memory access patterns remain local.
But there is overhead from calling the add() procedure (it adds a new element to a group).
To avoid this overhead, ClickHouse supports runtime compilation of aggregate functions with clang (the docs are in Russian; install clang and set <compile>1</compile> in the profile settings).
ClickHouse generates code for all aggregate functions of a query; clang compiles it, unrolls loops, inlines all add() calls and tries to insert SSE code.
This can speed up GROUP BY by up to 2x.
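The add() overhead can be sketched like this. Everything below is a simplified, hypothetical model (the `IAggregate` interface, `SumAggregate`, and both loop functions are not ClickHouse's actual API): the generic path makes one indirect call per row, while the specialized path knows the concrete function type at compile time, so the compiler can inline add() into the loop. Code generated for a specific query is in the same position as the specialized path.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aggregate function interface (illustration only).
struct IAggregate {
    virtual ~IAggregate() = default;
    virtual void add(std::uint64_t& state, std::uint32_t value) const = 0;
};

struct SumAggregate final : IAggregate {
    void add(std::uint64_t& state, std::uint32_t value) const override {
        state += value;
    }
};

// Generic path: the concrete function is unknown here,
// so every row costs an indirect call to add().
void aggregate_generic(const IAggregate& f,
                       const std::vector<std::uint32_t>& keys,
                       const std::vector<std::uint32_t>& values,
                       std::vector<std::uint64_t>& states) {
    for (std::size_t i = 0; i < values.size(); ++i)
        f.add(states[keys[i]], values[i]);
}

// Specialized path: Agg is a concrete (final) type,
// so the call can be resolved statically and inlined.
template <typename Agg>
void aggregate_specialized(const Agg& f,
                           const std::vector<std::uint32_t>& keys,
                           const std::vector<std::uint32_t>& values,
                           std::vector<std::uint64_t>& states) {
    for (std::size_t i = 0; i < values.size(); ++i)
        f.add(states[keys[i]], values[i]);
}
```

Here `keys[i]` is assumed to be a precomputed group index for row `i`; a real GROUP BY would look the group up in a hash table first.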
We don't have a benchmark for the "CH with vectorization on" vs "CH with vectorization off" case, for the reasons described above.
But we did compare the performance of ClickHouse with several row-oriented databases (which lack vectorization).
You can see the results on the benchmark page.
We don't compile the code with AVX by default, since some production servers still don't support it.
You can add a compiler flag and recompile ClickHouse to turn on AVX, but the manually vectorized code will still use SSE only.