Double-distributed IN/JOIN subqueries is denied (distributed_product_mode = 'deny')

1,638 views
Skip to first unread message

kriticar

unread,
Nov 8, 2018, 4:18:06 AM11/8/18
to ClickHouse
Hi,

why this select is not working?

select         *
from         query.cm_downstream_quality
where       crm_FiberNodeName in (
                select        distinct crm_FiberNodeName
                from         query.cm_downstream_quality
                limit         10
            )


query.cm_downstream_quality is a distributed table.

Regards.
Message has been deleted

Denis Zhuravlev

unread,
Nov 8, 2018, 1:59:19 PM11/8/18
to ClickHouse
Because you need distributed_product_mode = 'allow' or 'local'
(all shards need to execute 2 distributed queries (select from B), get result into R1, (select * from A where in R1)
CH does not know that  query.cm_downstream_quality is one table in both. In your case 'local' is enough

Imagine you have two sharded (and two distributed) tables A & B (one column one row at each shard)

sharded independently
A_shard0 = 1   B_shard0 = 2
A_shard0 = 2   B_shard0 = 1


distributed_product_mode = 'allow'
select * from A where in (select from B)
--
1
2


distributed_product_mode = 'local'
select * from A where in (select from B)
--
zero rows

distributed_product_mode = 'deny'

select * from A where in (select from B)
--
error

-----------------------------------------------------------------
sharded dependently
A_shard0 = 1   B_shard0 = 1
A_shard0 = 2  B_shard0 = 2

distributed_product_mode = 'allow'

select * from A where in (select from B)
--
1
2

distributed_product_mode = 'local'
select * from A where in (select from B)
--
1
2

select * from A where in (select from B_shard)
--
1
2

distributed_product_mode = 'deny'

select * from A where in (select from B)
--
error

select * from A where in (select from B_shard)
--
1
2
Reply all
Reply to author
Forward
0 new messages