Distributed table over system tables

355 views
Skip to first unread message

Amit Sharma

unread,
Jun 4, 2020, 12:34:21 AM6/4/20
to ClickHouse
Hi, Is it possible to create distributed table over a system table such as query_log so that I can query aggregated query_log from all the nodes in my cluster? 

Thanks, 
Amit

Denis Zhuravlev

unread,
Jun 4, 2020, 12:42:57 AM6/4/20
to ClickHouse
yes. You can create additional cluster description (remote_serves) and place all your nodes as single shards, then create a Distributed table using with new cluster.

Or you can use clusterAllReplicas table function and existing cluster description select FQDN(), * from clusterAllReplicas({your_cluster{, system, query_log);

Amit Sharma

unread,
Jun 4, 2020, 9:43:38 AM6/4/20
to ClickHouse
Denis, Which version has this function clusterAllReplicas? I couldn't find any documentation around it. 

Thanks, 
Amit

Denis Zhuravlev

unread,
Jun 4, 2020, 12:36:02 PM6/4/20
to ClickHouse
Not sure. Check changelog.

20.4.4.18 works
select FQDN(), * from clusterAllReplicas(xxx, system, metrics);

Divyanshu Jimmy

unread,
Jun 9, 2020, 12:35:35 PM6/9/20
to ClickHouse
@Denis

I tried to run the same query using cluster() method and it gave me the same result on my localhost.
Here I have two instances of click house on the same server and on different ports with each instance has different databases. 
I only have 1 replica in each of the two shards so I am guessing maybe this is the reason I am getting same results.


So, can you please tell me the difference between cluster() and clusterAllReplicas() and what is the use case? feel free to share document link as I was not able to find in the changelog for the cluster(); 

I come to know about this request https://github.com/ClickHouse/ClickHouse/issues/7262 and this solution

and line number 214 in TableFunctionRemote.cpp has an OR condition 
is_cluster_function = (name == "cluster" || name == "clusterAllReplicas"); which means both invokes same function.

But,  line number 144 in the same file, we have an if block which is invoking the different function for getting clusters.
So I am a bit confused.


With cluster() : 

SELECT name

FROM cluster('test_cluster_two_shards_localhost', 'system', 'databases')


┌─name───────────────────────────┐

│ _temporary_and_external_tables │

│ default                       

│ dist                          

│ system                        

└────────────────────────────────┘

┌─name───────────────────────────┐

│ _temporary_and_external_tables │

│ default                       

│ demo                          

│ system                        

│ tutorial                      

└────────────────────────────────┘


9 rows in set. Elapsed: 0.019 sec. 



with clusterAllReplicas()


:) select name from clusterAllReplicas('test_cluster_two_shards_localhost','system','databases') ;


SELECT name

FROM clusterAllReplicas('test_cluster_two_shards_localhost', 'system', 'databases')


┌─name───────────────────────────┐

│ _temporary_and_external_tables │

│ default                       

│ dist                          

│ system                        

└────────────────────────────────┘

┌─name───────────────────────────┐

│ _temporary_and_external_tables │

│ default                       

│ demo                          

│ system                        

│ tutorial                      

└────────────────────────────────┘


9 rows in set. Elapsed: 0.007 sec. 

Denis Zhuravlev

unread,
Jun 9, 2020, 9:03:07 PM6/9/20
to ClickHouse
both cluster() and clusterAllReplicas() return the same result if a cluster with has only shards and each shard only one replica

is_cluster_function just a flag

Divyanshu Jimmy

unread,
Jun 9, 2020, 10:06:52 PM6/9/20
to Denis Zhuravlev, ClickHouse
Thanks, Denis,

I was able to confirm and understand it.
I also observed that irrespective of the order of replicas in the shard in  <remote_server> configuration, while invoking cluster() the first available replica is chosen as expected.

Best regards


--
You received this message because you are subscribed to the Google Groups "ClickHouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clickhouse+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clickhouse/b93b830e-5e6f-40ed-a8bc-52287c3a2f8fo%40googlegroups.com.

Divyanshu Jimmy

unread,
Jun 10, 2020, 10:24:13 AM6/10/20
to Denis Zhuravlev, ClickHouse
Just a follow-up question:

Is it possible to get the same information for all the replicas present across all the clusters?

Something like this (pseudocode)

SELECT name

FROM clusterAllReplicas('[SELECT name from system.cluster], 'system', 'databases');


Denis Zhuravlev

unread,
Jun 10, 2020, 11:16:33 AM6/10/20
to ClickHouse

SELECT FQDN(), hostName(), *

FROM clusterAllReplicas('[SELECT name from system.cluster], 'system', 'databases');



To unsubscribe from this group and stop receiving emails from it, send an email to click...@googlegroups.com.

Denis Zhuravlev

unread,
Jun 10, 2020, 11:18:22 AM6/10/20
to ClickHouse
>same information for all the replicas present across all the clusters?

aaa, you have many clusters. No it's not possible.

On Wednesday, 10 June 2020 11:24:13 UTC-3, Divyanshu Jimmy wrote:
To unsubscribe from this group and stop receiving emails from it, send an email to click...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages