Hello. Sorry for the delay.
Scalability of query execution time depends on how much work is done on each shard in parallel, and how much work is not parallelized across shards.
Usually, what is not parallelized well is sending intermediate data over the network.
So, query scalability depends on how much intermediate data must be sent over the network, and on network speed.
For simple GROUP BY queries, this depends on the number of result rows and on the total size of the states of the aggregate functions
(states of aggregate functions need to be merged to get the final result).
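To make the merging step concrete, here is a sketch using the -State and -Merge combinators, which expose the same partial-state mechanism that distributed GROUP BY uses internally. The table name, column x, and shard_no are placeholders:

```sql
-- Each inner row carries a serialized aggregate state (the kind of data
-- that shards send over the network); the outer query merges those
-- partial states into the final result.
SELECT sumMerge(s) AS total
FROM
(
    SELECT sumState(x) AS s  -- one partial state per group
    FROM table
    GROUP BY shard_no        -- hypothetical column standing in for "per shard"
)
```

The larger each serialized state is, the more data has to cross the network before merging.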
For example, if you use an aggregate function as simple as 'sum' and your query returns just a few rows, you will, sure enough, get linear scalability.
Counting unique values is more difficult. The uniqExact function keeps all distinct values in its state, so the state can be quite large.
The uniq function has a state of up to 64K values; it does not grow further. But 64K values per state can still be a lot, especially if you have a large number of such states.
You could try the 'uniqCombined' function. It has the same or better accuracy than 'uniq', but a more compact state.
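You can compare accuracy, speed, and memory usage of the variants directly on your own data; 'table' and 'x' here are placeholders for your table and column:

```sql
-- uniqExact is exact but has an unbounded state; uniq and uniqCombined
-- are approximate with bounded, more compact states.
SELECT
    uniqExact(x),
    uniq(x),
    uniqCombined(x)
FROM table
```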
Also, you could try to use a better network: 10 GBit instead of 1 GBit if possible; and locate shards near each other (ideally on the same switch, or at least in the same region).
Sometimes you can limit the amount of intermediate data for a query. Consider, for example, the following query:
SELECT key, uniq(x), count() FROM table GROUP BY key ORDER BY count() DESC LIMIT 10
It will calculate uniq for all keys, send the states over the network, merge the intermediate values, and then select the top 10.
You could rewrite it as follows:
SELECT key, uniq(x), count() FROM table
WHERE key GLOBAL IN (SELECT key FROM table GROUP BY key ORDER BY count() DESC LIMIT 10)
GROUP BY key ORDER BY count() DESC LIMIT 10
This query will calculate the top 10 keys inside the subquery and then calculate uniq values only for those top 10 keys.
There are also some trickier ways:
SELECT key, uniq(x), count() FROM table
WHERE key IN (SELECT key FROM table WHERE rand() % 100 = 0)
GROUP BY key ORDER BY count() DESC LIMIT 10
This query calculates uniq only for keys that appear in a roughly 1% random sample of rows, i.e. keys that are likely to be frequent enough. Note that the result is approximate: sampling may miss a frequent key or let a rare one through.
You could also use the max_rows_to_group_by setting with group_by_overflow_mode = 'any'.
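For example (a sketch; the threshold value here is arbitrary):

```sql
-- With group_by_overflow_mode = 'any', once max_rows_to_group_by distinct
-- keys have been seen, new keys are ignored, but aggregation continues
-- for the keys already in the set. This bounds the size of the
-- intermediate states at the cost of an approximate result.
SELECT key, uniq(x), count()
FROM table
GROUP BY key
ORDER BY count() DESC
LIMIT 10
SETTINGS max_rows_to_group_by = 100000, group_by_overflow_mode = 'any'
```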
Also, if you shard your data by user_id, you can avoid merging intermediate aggregation states altogether with the 'distributed_group_by_no_merge = 1' setting. This is also quite tricky; see here for details:
https://github.com/yandex/ClickHouse/issues/332
To check that your query is really bounded by network speed, run the dstat Linux tool (for example, 'dstat -cn' shows CPU and network columns). If you see high network usage and low CPU usage during query execution, then you know that performance is network-bound.