shunned node not recovering even when the node/db does come up


arun chakrapani rao

Jun 11, 2025, 12:34:16 PM
to codership
Hi,
I am testing write failover on a 5-node cluster on Rocky Linux.
Nodes 1 and 2 were chosen for writes, and nodes 3 and 4 were chosen for selects.
I brought the first node down and the writes moved to the 2nd node.
I brought the first node back up; after a few minutes of the shunned message it turned online. I then brought the 2nd node down and the writes moved to the first node.
Now I have brought the 2nd node back up, and the shunned message is not going away at all, even though mysql_servers shows the 2nd node as online.
Could anybody please let me know how to bring this node out of shunned status?


Thanks in advance
Arun

arun chakrapani rao

Jun 11, 2025, 12:41:29 PM
to codership
I see the following messages in the logs:

2025-06-11 19:01:51 MySQL_Monitor.cpp:7328:monitor_ping_process_ready_tasks(): [ERROR] Error after 0ms on server sangviclstr1:3306 : Lost connection to server during query
2025-06-11 19:02:07 MySQL_Session.cpp:4564:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (10,sangviclstr1,3306,79) , FD (Conn:78 , MyDS:78) , user tungsten , last_used 47813ms ago : 2013, Lost connection to server during query
2025-06-11 19:02:07 MySQL_Session.cpp:4578:handler_minus1_ClientLibraryError(): [WARNING] Retrying query.
2025-06-11 19:02:07 mysql_connection.cpp:1238:handler(): [ERROR] Failed to mysql_real_connect() on 10:sangviclstr1:3306 , FD (Conn:74 , MyDS:74) , 2013: Lost connection to server at 'handshake: reading initial communication packet', system error: 111.
2025-06-11 19:02:07 mysql_connection.cpp:1238:handler(): [ERROR] Failed to mysql_real_connect() on 10:sangviclstr1:3306 , FD (Conn:74 , MyDS:74) , 2013: Lost connection to server at 'handshake: reading initial communication packet', system error: 111.
2025-06-11 19:02:07 mysql_connection.cpp:1238:handler(): [ERROR] Failed to mysql_real_connect() on 10:sangviclstr1:3306 , FD (Conn:74 , MyDS:74) , 2013: Lost connection to server at 'handshake: reading initial communication packet', system error: 111.
2025-06-11 19:02:07 MySrvC.cpp:127:connect_error(): [ERROR] Shunning server sangviclstr1:3306 with 5 errors/sec. Shunning for 10 seconds
2025-06-11 20:48:22 MySQL_Monitor.cpp:7328:monitor_ping_process_ready_tasks(): [ERROR] Error after 0ms on server sangviclstr2:3306 : Lost connection to server during query
2025-06-11 20:48:42 MySQL_Monitor.cpp:3307:monitor_ping(): [ERROR] Server sangviclstr2:3306 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.
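The log above contains two different kinds of shunning message: one from connect_error() with a stated duration ("Shunning for 10 seconds"), and one from monitor_ping() after missed heartbeats, which says other checks are disabled until the node comes back online. A minimal Python sketch for pulling both kinds out of a log file and labelling them is below. The regular expressions are written only against the two lines pasted here, and the names `shun_events`, `CONNECT_ERR`, and `MISSED_HB` are hypothetical, so other ProxySQL versions may need different patterns.

```python
import re

# Two lines copied verbatim from the ProxySQL log excerpt above.
LOG = """\
2025-06-11 19:02:07 MySrvC.cpp:127:connect_error(): [ERROR] Shunning server sangviclstr1:3306 with 5 errors/sec. Shunning for 10 seconds
2025-06-11 20:48:42 MySQL_Monitor.cpp:3307:monitor_ping(): [ERROR] Server sangviclstr2:3306 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.
"""

# Patterns are assumptions derived from the two sample lines only.
CONNECT_ERR = re.compile(
    r"^(\S+ \S+) .*Shunning server (\S+):(\d+) with (\d+) errors/sec\. "
    r"Shunning for (\d+) seconds"
)
MISSED_HB = re.compile(
    r"^(\S+ \S+) .*Server (\S+):(\d+) missed (\d+) heartbeats, shunning it"
)


def shun_events(text):
    """Return one dict per shunning line, tagged with the cause seen in the log."""
    events = []
    for line in text.splitlines():
        m = CONNECT_ERR.search(line)
        if m:
            events.append({
                "time": m.group(1),
                "host": m.group(2),
                "port": int(m.group(3)),
                "cause": "connect_errors",
                "shun_seconds": int(m.group(5)),
            })
            continue
        m = MISSED_HB.search(line)
        if m:
            events.append({
                "time": m.group(1),
                "host": m.group(2),
                "port": int(m.group(3)),
                "cause": "missed_heartbeats",
                "shun_seconds": None,  # no duration is stated in this message
            })
    return events


for event in shun_events(LOG):
    print(event["time"], event["host"], event["cause"], event["shun_seconds"])
```

In this excerpt the first node was shunned for connection errors with a fixed duration, while the second node was shunned by the monitor ping check, which is the case that is not clearing.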




But the 2nd node is up and running, and it accepts writes and reads from other clients.

0: 123b5eb6-46d8-11f0-9626-bfb975c0ca5c, sangviclstrdb2
1: 4072924b-46c9-11f0-97c9-576d6dac95fd, sangviclstrdb1
2: d3393643-4679-11f0-8c44-3f5149967a54, sangviclstrdb3
3: eabf828c-4679-11f0-ae9f-2bfd3395fdd2, sangviclstrdb4
4: fa75505f-4679-11f0-a9ea-c74e7ee65a4c, sangviclstrdb5
=================================================
2025-06-11 20:53:49 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2025-06-11 20:53:49 1 [Note] WSREP: Lowest cert index boundary for CC from group: 3239
2025-06-11 20:53:49 1 [Note] WSREP: Min available from gcache for CC from group: 1
2025-06-11 20:53:51 0 [Note] WSREP: (4072924b-97c9, 'tcp://0.0.0.0:4567') turning message relay requesting off
2025-06-11 20:53:54 0 [Note] WSREP: Member 0.0 (sangviclstrdb2) requested state transfer from '*any*'. Selected 4.0 (sangviclstrdb5)(SYNCED) as donor.
2025-06-11 20:53:55 0 [Note] WSREP: 4.0 (sangviclstrdb5): State transfer to 0.0 (sangviclstrdb2) complete.
2025-06-11 20:53:55 0 [Note] WSREP: Member 4.0 (sangviclstrdb5) synced with group.
2025-06-11 20:53:57 0 [Note] WSREP: 0.0 (sangviclstrdb2): State transfer from 4.0 (sangviclstrdb5) complete.
2025-06-11 20:53:57 0 [Note] WSREP: Member 0.0 (sangviclstrdb2) synced with group.
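To confirm the ordering, here is a small Python sketch that reads the "synced with group" lines from the Galera log above and compares the sync time of sangviclstrdb2 with the time of the ProxySQL monitor shunning line (20:48:42). It assumes the ProxySQL host and the database hosts log in the same time zone with synchronized clocks; the names `synced_members` and `SHUNNED_AT` are hypothetical.

```python
import re
from datetime import datetime

# Lines copied verbatim from the Galera log excerpt above.
WSREP_LOG = """\
2025-06-11 20:53:55 0 [Note] WSREP: Member 4.0 (sangviclstrdb5) synced with group.
2025-06-11 20:53:57 0 [Note] WSREP: Member 0.0 (sangviclstrdb2) synced with group.
"""

# Timestamp of the "missed 3 heartbeats, shunning it" line in the ProxySQL log.
SHUNNED_AT = datetime(2025, 6, 11, 20, 48, 42)

# Pattern is an assumption derived from the sample lines only.
SYNCED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"WSREP: Member \S+ \((\S+)\) synced with group"
)


def synced_members(text):
    """Map member name -> time of its last 'synced with group' message."""
    members = {}
    for line in text.splitlines():
        m = SYNCED.search(line)
        if m:
            members[m.group(2)] = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return members


members = synced_members(WSREP_LOG)
gap = members["sangviclstrdb2"] - SHUNNED_AT
print("sangviclstrdb2 synced", gap, "after the monitor shunned it")
```

Under that clock assumption, the node rejoined and synced a little over five minutes after it was shunned.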



Is this a bug, or is there something I am missing that is needed for the node to come back from shunned status?

Please do let me know

thanks in advance.

Arun