Watching task /clickhouse/task_queue/ddl/query-0000000005 is executing longer than distributed_ddl_task_timeout


ervin...@gmail.com

Dec 11, 2020, 9:47:03 AM
to ClickHouse
Hey guys,
hope you're doing great.

I've been using ClickHouse with Docker for about a year now without any problems. It's a simple cluster with 3 replicas and 1 shard. No problems at all with this cluster; I can execute DDL queries and do everything.

A few days ago I needed to create a new cluster. The cluster creation went well, and ClickHouse and ZooKeeper are running as expected, but I'm stuck on an issue that is not letting me run "on cluster" queries, like
"create database mydb on cluster my_cluster"

The query is timing out and the job is failing on all hosts.


Clickhouse error:

2020.12.11 14:25:00.727536 [ 90 ] {00707b04-f3ef-4eee-bb03-cda2ab014017} <Error> DynamicQueryHandler: Code: 159, e.displayText() = DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000006 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently active), they are going to execute the query in background, Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x110c8ef0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x998893d in /usr/bin/clickhouse
2. DB::DDLQueryStatusInputStream::readImpl() @ 0xdcad7d7 in /usr/bin/clickhouse
3. DB::IBlockInputStream::read() @ 0xdae814d in /usr/bin/clickhouse
4. void DB::copyDataImpl<std::__1::function<bool ()> const&, std::__1::function<void (DB::Block const&)> const&>(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::__1::function<bool ()> const&, std::__1::function<void (DB::Block const&)> const&) @ 0xdb1bfcb in /usr/bin/clickhouse
5. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) @ 0xe1fdda6 in /usr/bin/clickhouse
6. DB::HTTPHandler::processQuery(DB::Context&, Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&) @ 0x9a6ce9c in /usr/bin/clickhouse
7. DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) @ 0x9a7095e in /usr/bin/clickhouse
8. Poco::Net::HTTPServerConnection::run() @ 0x10f798a3 in /usr/bin/clickhouse
9. Poco::Net::TCPServerConnection::start() @ 0x10fb4c6b in /usr/bin/clickhouse
10. Poco::Net::TCPServerDispatcher::run() @ 0x10fb50fb in /usr/bin/clickhouse
11. Poco::PooledThread::run() @ 0x11162cc6 in /usr/bin/clickhouse
12. Poco::ThreadImpl::runnableEntry(void*) @ 0x1115df80 in /usr/bin/clickhouse
13. start_thread @ 0x76db in /lib/x86_64-linux-gnu/libpthread-2.27.so
14. clone @ 0x121a3f in /lib/x86_64-linux-gnu/libc-2.27.so
 (version 20.4.8.99 (official build))
2020.12.11 14:25:00.728261 [ 90 ] {00707b04-f3ef-4eee-bb03-cda2ab014017} <Error> DynamicQueryHandler: Cannot send exception to client: Code: 24, e.displayText() = DB::Exception: Cannot write to ostream at offset 664, Stack trace (when copying this message, always include the lines below):


Zookeeper error:

2020-12-11 14:17:33,081 [myid:5] - INFO [CommitProcessor:5:ZooKeeperServer@694] - Established session 0x500717ef9a00000 with negotiated timeout 30000 for client /10.244.4.76:47616
2020-12-11 14:17:33,086 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x500717ef9a00000 type:create cxid:0x1 zxid:0x200000004 txntype:-1 reqpath:n/a Error Path:/clickhouse Error:KeeperErrorCode = NodeExists for /clickhouse
2020-12-11 14:17:33,088 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x500717ef9a00000 type:create cxid:0x2 zxid:0x200000005 txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue
2020-12-11 14:17:33,089 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x500717ef9a00000 type:create cxid:0x3 zxid:0x200000006 txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue/ddl Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue/ddl
2020-12-11 14:17:38,199 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x1 zxid:0x200000008 txntype:-1 reqpath:n/a Error Path:/clickhouse Error:KeeperErrorCode = NodeExists for /clickhouse
2020-12-11 14:17:38,203 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x2 zxid:0x200000009 txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue
2020-12-11 14:17:38,207 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x3 zxid:0x20000000a txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue/ddl Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue/ddl
2020-12-11 14:19:34,348 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x10 zxid:0x20000000b txntype:-1 reqpath:n/a Error Path:/clickhouse Error:KeeperErrorCode = NodeExists for /clickhouse
2020-12-11 14:19:34,350 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x11 zxid:0x20000000c txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue
2020-12-11 14:19:34,351 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x12 zxid:0x20000000d txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue/ddl Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue/ddl
2020-12-11 14:22:00,094 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x154 zxid:0x200000010 txntype:-1 reqpath:n/a Error Path:/clickhouse Error:KeeperErrorCode = NodeExists for /clickhouse
2020-12-11 14:22:00,096 [myid:5] - INFO [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x4002893c4ff0000 type:create cxid:0x155 zxid:0x200000011 txntype:-1 reqpath:n/a Error Path:/clickhouse/task_queue Error:KeeperErrorCode = NodeExists for /clickhouse/task_queue


Cluster topology: 2 clickhouse nodes, 2 zookeeper nodes

metrika.xml:

<yandex>
    <clickhouse_remote_servers> <!-- name of substitution -->
        <single_shard_3_replicas>
            <shard>
                <replica>
                    <host>clickhouse4</host>
                    <port>9000</port>
                    <!--
                    <user>user</user>
                    <password>pass</password>
                    -->
                </replica>
                <replica>
                    <host>clickhouse5</host>
                    <port>9000</port>
                    <!--
                    <user>user</user>
                    <password>pass</password>
                    -->
                </replica>
            </shard>
        </single_shard_3_replicas>
    </clickhouse_remote_servers>

    <zookeeper-servers>
        <node>
            <host>zoo4</host>
            <port>2181</port>
        </node>
        <node>
            <host>zoo5</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <macros>
        <shard>01</shard>
        <replica>serv4</replica>
    </macros>
</yandex>


I'm obviously able to create databases and tables without using "on cluster", but all queries with "on cluster" are failing.

The only difference between this setup and the working setup that I have is that this one has 2 ClickHouse/ZooKeeper nodes while the other one has 3.
I also have SQL user access enabled on this cluster, and I have tried to include the user info in metrika.xml, but it did not make any difference.

Any tip/help would be highly appreciated.

Denis Zhuravlev

Dec 11, 2020, 5:15:28 PM
to ClickHouse

The most probable cause is the nodes' Docker-internal IPs/hostnames.

The initiator node (where the 'on cluster' query is executed) puts a task for clickhouse4 and clickhouse5 into ZK.

All nodes constantly check the task queue and pull their own tasks. If their IPs/hostnames are 127.0.0.1 / localhost, they never find their tasks, because localhost != clickhouse4.


https://stackoverflow.com/a/64947962/11644308
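The matching step Denis describes can be sketched as a tiny shell simulation (the values are hypothetical, modelled on this thread: the task stores the short names from the cluster definition, while the node only knows itself as localhost or its pod name):

```shell
# Sketch of the DDL worker's matching step (hypothetical names).
# The ZK task lists short host names; if the node resolves itself to
# something else, no entry ever matches and the task is never picked up.
own="localhost"                                  # what this node thinks it is
task_hosts="clickhouse4:9000 clickhouse5:9000"   # host list stored in the ZK task

match=no
for h in $task_hosts; do
  name=${h%:*}                                   # strip the :port suffix
  [ "$name" = "$own" ] && match=yes
done
echo "$match"    # prints "no": the node never finds its own task
```

With no match, the task just sits in the queue until the initiator gives up after distributed_ddl_task_timeout, which is exactly the error above.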


ervin...@gmail.com

Dec 12, 2020, 4:05:28 PM
to ClickHouse

Hi Denis, yes, you're right. The task in zookeeper is being created using clickhouse4 and clickhouse5 as host names.

[zk: localhost:2181(CONNECTED) 13] get /clickhouse/task_queue/ddl/query-0000000000 
version: 1
query: CREATE DATABASE optimizer_mobile ON CLUSTER single_shard_3_replicas
hosts: ['clickhouse4:9000','clickhouse5:9000']
initiator: clickhouse4%2D0%2Eclickhouse4%2Eclickhouse%2Dcluster%2Dprod%2Esvc%2Ecluster%2Elocal:9000

cZxid = 0x100000006
ctime = Fri Dec 11 16:29:21 UTC 2020
mZxid = 0x100000006
mtime = Fri Dec 11 16:29:21 UTC 2020
pZxid = 0x100000007
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 233
numChildren = 2
[zk: localhost:2181(CONNECTED) 14]
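As an aside, the initiator field in that task is URL-encoded; decoding it (only %2D '-' and %2E '.' occur in the string) shows the pod's FQDN, which indeed does not match the short names in the hosts list:

```shell
# Decode the URL-encoded initiator name from the ZK task above.
# Only two escapes appear in it: %2D ('-') and %2E ('.').
enc='clickhouse4%2D0%2Eclickhouse4%2Eclickhouse%2Dcluster%2Dprod%2Esvc%2Ecluster%2Elocal:9000'
dec=$(printf '%s\n' "$enc" | sed -e 's/%2D/-/g' -e 's/%2E/./g')
echo "$dec"   # clickhouse4-0.clickhouse4.clickhouse-cluster-prod.svc.cluster.local:9000
```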

However, I'm not sure how to check which host name ClickHouse uses when looking for its task. Will that be the hostname of the container?
When I do "cat /etc/hostname" in the containers I see clickhouse4-0 and clickhouse5-0 respectively.

ervin...@gmail.com

Dec 12, 2020, 5:24:46 PM
to ClickHouse
Found the solution; sharing it here for others who might need it.

In my case the cluster was deployed in Kubernetes, and it was necessary to add a host alias to the YAML:

spec:
  hostAliases:
  - ip: 127.0.0.1
    hostnames:
    - clickhouse4

This basically creates a new entry in the container's /etc/hosts file by adding "127.0.0.1 clickhouse4".
This way the ClickHouse node is able to identify itself as "clickhouse4" and not only as "clickhouse4-0", which was added automatically by Kubernetes.
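The effect of that hostAliases entry can be sanity-checked with a small simulation of the hosts-file lookup the container performs (the temp file stands in for the container's /etc/hosts):

```shell
# Simulate the lookup after hostAliases adds "127.0.0.1 clickhouse4"
# to the container's /etc/hosts (a temp file stands in for it here).
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n127.0.0.1 clickhouse4\n' > "$hosts_file"

# Resolve "clickhouse4" the way a hosts-file lookup would.
ip=$(awk -v name=clickhouse4 '$2 == name { print $1; exit }' "$hosts_file")
echo "$ip"    # 127.0.0.1

rm -f "$hosts_file"
```

Inside the real pod, "getent hosts clickhouse4" should show the same thing once the alias is in place.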


Thanks again to Denis for pointing this out so clearly.

Denis Zhuravlev

Dec 12, 2020, 5:27:31 PM
to ClickHouse
> The task in zookeeper is being created using clickhouse4 and clickhouse5 as host names.

But I wrote about another problem.

The initiator node (where the 'on cluster' query is executed) puts a task for clickhouse4 and clickhouse5 into ZK.

All nodes constantly check the task queue and pull their own tasks. If their OWN IPs/hostnames are 127.0.0.1 / localhost, they never find their tasks, because localhost != clickhouse4.


It's important that each shard node's understanding of its own name/IP (hostname -f / ip a) matches the names/IPs used in the cluster definition.


