WSREP: exception from gcomm, backend must be restarted: bytes

TAO ZHOU

Aug 16, 2017, 2:11:16 AM

to codership

I converted my existing MariaDB instance, which holds 40 GB of data, to a Galera multi-master cluster.

Here's how I did it.

1. Added the wsrep parameters to my.cnf.

2. Restarted the existing db with --wsrep-new-cluster.

3. Started a second node.

4. Started a garbd daemon.
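
For readers following along, step 1 might look roughly like the output of this minimal Python sketch. The option names are standard MariaDB/Galera settings, but the node addresses, cluster name, and provider path are placeholders chosen for illustration. They are assumptions, not the configuration used in this post.

```python
import configparser
import io


def render_wsrep_section(node_ips, cluster_name, provider_path):
    """Render a [mysqld] fragment with commonly used Galera settings.

    All argument values are supplied by the caller; nothing here is taken
    from the original poster's my.cnf.
    """
    cfg = configparser.ConfigParser()
    cfg["mysqld"] = {
        "binlog_format": "ROW",
        "default_storage_engine": "InnoDB",
        "innodb_autoinc_lock_mode": "2",
        "wsrep_on": "ON",
        "wsrep_provider": provider_path,
        "wsrep_cluster_name": cluster_name,
        # gcomm:// followed by a comma-separated list of cluster members
        "wsrep_cluster_address": "gcomm://" + ",".join(node_ips),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Placeholder values (assumptions): adjust to the actual hosts and library path.
fragment = render_wsrep_section(
    ["10.0.0.1", "10.0.0.2"],
    "example_cluster",
    "/usr/lib/galera/libgalera_smm.so",
)
print(fragment)
```

The printed fragment would be merged into my.cnf on each node before the first node is started with --wsrep-new-cluster.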

It had been running well for 2 days.

The first node suddenly crashed this morning, with the following error logs:

2017-08-15 14:35:16 34650013696 [Note] WSREP: (5813fff3, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-15 17:22:22 44802375680 [Warning] Aborted connection 6055867 to db: 'zabbix' user: 'zabbix' host: '203.29.62.201' (Got timeout reading communication packets)
2017-08-15 23:31:53 44450496000 [Warning] Aborted connection 5601458 to db: 'unconnected' user: 'mysqldump' host: 'localhost' (Got timeout reading communication packets)
2017-08-16  1:34:42 34650013696 [ERROR] WSREP: exception from gcomm, backend must be restarted: bytes_transferred == 0:  (FATAL)
         at gcomm/src/asio_tcp.cpp:write_handler():265
2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: terminating thread
2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: joining thread
2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: closing backend
2017-08-16  1:34:42 34650013696 [Note] WSREP: Forced PC close
2017-08-16  1:34:43 34650013696 [Warning] WSREP: discarding 4 messages from message index
2017-08-16  1:34:43 34650013696 [Note] WSREP: gcomm: closed
2017-08-16  1:34:43 34650014976 [Note] WSREP: Received self-leave message.
2017-08-16  1:34:43 34650014976 [Note] WSREP: comp msg error in core 53
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closing send monitor...
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closed send monitor.
2017-08-16  1:34:43 34650017536 [Note] WSREP: New cluster view: global state: 00000000-0000-0000-0000-000000000000:0, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version -1
2017-08-16  1:34:43 34650017536 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closing replication queue.
2017-08-16  1:34:43 34650014976 [Note] WSREP: Closing slave action queue.
2017-08-16  1:34:43 34650014976 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 1939074)
2017-08-16  1:34:43 34650014976 [Note] WSREP: RECV thread exiting -53: Software caused connection abort
2017-08-16  1:34:43 34650017536 [Note] WSREP: applier thread exiting (code:6)
2017-08-16  2:34:06 44777171968 [Warning] Aborted connection 4876891 to db: 'zabbix' user: 'zabbix' host: '
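
To pull only the WSREP error entries out of a long error log like the one above, a minimal Python sketch follows. The regular expression is an assumption based on the line layout shown in this post (timestamp, thread id, bracketed level, message), not an official log format.

```python
import re

# Assumed layout: "<date> <time> <thread id> [<Level>] <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}\s+\d{1,2}:\d{2}:\d{2})\s+\d+\s+\[(\w+)\]\s+(.*)$"
)


def wsrep_errors(lines):
    """Return (timestamp, message) pairs for [ERROR] lines emitted by WSREP."""
    found = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation lines such as "at gcomm/src/..." are skipped
        timestamp, level, message = match.groups()
        if level == "ERROR" and message.startswith("WSREP:"):
            found.append((timestamp, message))
    return found


sample = [
    "2017-08-16  1:34:42 34650013696 [ERROR] WSREP: exception from gcomm, "
    "backend must be restarted: bytes_transferred == 0:  (FATAL)",
    "         at gcomm/src/asio_tcp.cpp:write_handler():265",
    "2017-08-16  1:34:42 34650013696 [Note] WSREP: gcomm: terminating thread",
]
for timestamp, message in wsrep_errors(sample):
    print(timestamp, message)
```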

Any insights into what's going on here?

Thanks

Tao

TAO ZHOU

Aug 16, 2017, 9:03:19 PM

to codership

Galera crashed again today. I just found that the node that crashed has Galera 25.3.20_2 installed, while the other node has Galera 25.3.21. I'm not sure whether that's the cause.
I have just upgraded the older node and hope that solves the problem.
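
For anyone wanting to check for this kind of mismatch, each node reports its Galera library version through `SHOW STATUS LIKE 'wsrep_provider_version'`. Below is a minimal Python sketch that compares version strings once they have been collected. The node names are hypothetical and the parsing is an assumption: it reads only the leading `major.minor.patch` digits and ignores any packaging or revision suffix.

```python
import re


def parse_provider_version(text):
    """Extract (major, minor, patch) from a Galera version string.

    Suffixes such as '_2' or a revision in parentheses are ignored.
    """
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def find_outdated_nodes(versions_by_node):
    """Return the nodes whose version is older than the newest one seen."""
    parsed = {
        node: parse_provider_version(version)
        for node, version in versions_by_node.items()
    }
    newest = max(parsed.values())
    return sorted(node for node, version in parsed.items() if version < newest)


# Hypothetical node names; the version strings are the ones mentioned above.
nodes = {"node1": "25.3.20_2", "node2": "25.3.21"}
print(find_outdated_nodes(nodes))
```

An empty list means every node reports the same numeric version.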

alexey.y...@galeracluster.com

Aug 17, 2017, 6:44:11 AM

to TAO ZHOU, codership

Most likely it is. A similar bug was fixed in 3.21.
