WSREP: exception from gcomm, backend must be restarted: bytes_transferred == 0:


TAO ZHOU

Aug 16, 2017, 2:11:16 AM
to codership
I converted my existing MariaDB server (40 GB of data) into a Galera multi-master cluster.
Here's how I did it:

1. Added the wsrep parameters to my.cnf.
2. Restarted the existing database with --wsrep-new-cluster.
3. Started a second node.
4. Started a garbd daemon.


It had been running well for 2 days.
The first node suddenly crashed this morning, with the following error logs:

2017-08-15 14:35:16 34650013696 [Note] WSREP: (5813fff3, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-15 17:22:22 44802375680 [Warning] Aborted connection 6055867 to db: 'zabbix' user: 'zabbix' host: '203.29.62.201' (Got timeout reading communication packets)
2017-08-15 23:31:53 44450496000 [Warning] Aborted connection 5601458 to db: 'unconnected' user: 'mysqldump' host: 'localhost' (Got timeout reading communication packets)
2017-08-16  1:34:42 34650013696 [ERROR] WSREP: exception from gcomm, backend must be restarted: bytes_transferred == 0:  (FATAL)
         at gcomm/src/asio_tcp.cpp:write_handler():265
2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: terminating thread
2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: joining thread
2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: closing backend
2017-08-16  1:34:42 34650013696 [Note] WSREP: Forced PC close
2017-08-16  1:34:43 34650013696 [Warning] WSREP: discarding 4 messages from message index
2017-08-16  1:34:43 34650013696 [Note] WSREP: gcomm: closed
2017-08-16  1:34:43 34650014976 [Note] WSREP: Received self-leave message.
2017-08-16  1:34:43 34650014976 [Note] WSREP: comp msg error in core 53
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closing send monitor...
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closed send monitor.
2017-08-16  1:34:43 34650017536 [Note] WSREP: New cluster view: global state: 00000000-0000-0000-0000-000000000000:0, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version -1
2017-08-16  1:34:43 34650017536 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closing replication queue.
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closing slave action queue.
2017-08-16  1:34:43 34650014976 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 1939074)
2017-08-16  1:34:43 34650014976 [Note] WSREP: RECV thread exiting -53: Software caused connection abort
2017-08-16  1:34:43 34650017536 [Note] WSREP: applier thread exiting (code:6)
2017-08-16  2:34:06 44777171968 [Warning] Aborted connection 4876891 to db: 'zabbix' user: 'zabbix' host: '

Any insights into what's going on here?

Thanks

Tao

TAO ZHOU

Aug 16, 2017, 9:03:19 PM
to codership
Galera crashed again today. I just found that the node that crashed has Galera 25.3.20_2 installed, while the other node has Galera 25.3.21. I'm not sure whether that's the cause.
I have just upgraded the older node. Hopefully that solves the problem.
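
For anyone who wants to check for the same kind of mismatch, here is a minimal Python sketch. It assumes you have already collected each node's Galera version by hand, for example from the package manager or from the `wsrep_provider_version` status variable. The node names and both helper functions are hypothetical and only for illustration; the two version strings are the ones reported above. The comparison looks only at the leading major.minor.patch numbers, so a packaging suffix such as `_2` is ignored.

```python
import re

# Matches the leading "major.minor.patch" part of a Galera version string.
# Anything after it (a packaging suffix like "_2", or a revision in
# parentheses) is ignored for the comparison.
_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)\.(\d+)")


def parse_galera_version(text):
    """Return (major, minor, patch) as a tuple of ints."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized Galera version: {text!r}")
    return tuple(int(part) for part in match.groups())


def find_outdated_nodes(versions_by_node):
    """Return {node: version} for every node older than the newest version seen."""
    parsed = {
        node: parse_galera_version(version)
        for node, version in versions_by_node.items()
    }
    if not parsed:
        return {}
    newest = max(parsed.values())
    return {
        node: versions_by_node[node]
        for node, version in parsed.items()
        if version < newest
    }


if __name__ == "__main__":
    # Node names are hypothetical; the versions are the ones from this thread.
    reported = {"node1": "25.3.20_2", "node2": "25.3.21"}
    for node, version in find_outdated_nodes(reported).items():
        print(f"{node} runs an older Galera version: {version}")
```

An empty result means all nodes report the same major.minor.patch version. That only rules out a version mismatch, not other causes of the crash.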

alexey.y...@galeracluster.com

Aug 17, 2017, 6:44:11 AM
to TAO ZHOU, codership
Most likely it is. A similar bug was fixed in 3.21.