Hello.
Depending on your sharding configuration, you should use one of following:
1.
select count(*) from
default.graphite_d where Path in (select Path FROM default.graphite
WHERE Path like 'network.%.sFlow.all_pktsIn' GROUP BY Path order by
max(Value) desc limit 5) AND Date >= toDate(toDateTime(1470346140))
AND Date <= toDate(toDateTime(1470432540)) limit 10;
Or SET distributed_product_mode = 'local' and leave query as is, it will be rewritten in same way automatically.
2.
select count(*) from
default.graphite_d where Path global in (select Path FROM default.graphite_d
WHERE Path like 'network.%.sFlow.all_pktsIn' GROUP BY Path order by
max(Value) desc limit 5) AND Date >= toDate(toDateTime(1470346140))
AND Date <= toDate(toDateTime(1470432540)) limit 10;
Or SET distributed_product_mode = 'global' and leave query as is, it will be rewritten in same way automatically.
First variant is suitable, if corresponding data is co-located on same nodes. In your case - if all data for same Path are in one node: when data is sharded by Path. It works as follows: on each server, data will be filtered by result of subquery, that run over local table.
Second variant is suitable in all cases, but more expensive. It works as follows: first, execute distributed subquery, get result on initiating server. Then, broadcast this result to every server, into temporary table in memory. Then execute outer query distributed, using temporary table on each server. So, every server will see full result of subquery.
Also third variant is possible: on each server, process subquery distributed way. For example, if you have 10 shards, then query is send to 10 servers, and on each of them, to execute subquery, this subquery is sent to each of 10 servers again, resulting in 100 elementary queries. It is too bad, and we denied it by default, and introduced '
distributed_product_mode' to control this behaviour and to provide possibility to select one of two better options ('local' and 'global') without need of query rewrite.
All methods of processing distributed subqueries described in documentation:
https://clickhouse.yandex/reference_en.html#Distributed%20subqueries(But it does not cover '
distributed_product_mode' setting.)