I upgraded my existing MariaDB server (about 40 GB of data) to a Galera multi-master cluster.
Here's how I did it:
1. Added the wsrep parameters to my.cnf.
2. Restarted the existing database with --wsrep-new-cluster.
3. Started a second node.
4. Started a garbd (Galera Arbitrator) daemon.
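For context, step 1 usually means adding settings along these lines to the [mysqld] section of my.cnf. This is a generic sketch rather than the exact configuration in use here: the provider path, cluster name, IP addresses, and SST method are all placeholder assumptions and will differ per installation.

```ini
[mysqld]
# Galera requires row-based replication and InnoDB tables
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

# Enable write-set replication (the library path is an assumption; it varies by distribution)
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Placeholder cluster name and node addresses (documentation-range IPs)
wsrep_cluster_name=example_cluster
wsrep_cluster_address=gcomm://192.0.2.11,192.0.2.12
wsrep_node_address=192.0.2.11

# State transfer method used when a node joins (assumed; other methods exist)
wsrep_sst_method=rsync
```

The second node would carry the same settings with its own wsrep_node_address.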
The cluster ran well for 2 days.
Then the first node suddenly crashed this morning, leaving the following in its error log:
2017-08-15 14:35:16 34650013696 [Note] WSREP: (5813fff3, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-08-15 17:22:22 44802375680 [Warning] Aborted connection 6055867 to db: 'zabbix' user: 'zabbix' host: '203.29.62.201' (Got timeout reading communication packets)
2017-08-15 23:31:53 44450496000 [Warning] Aborted connection 5601458 to db: 'unconnected' user: 'mysqldump' host: 'localhost' (Got timeout reading communication packets)
2017-08-16 1:34:42 34650013696 [ERROR] WSREP: exception from gcomm, backend must be restarted: bytes_transferred == 0: (FATAL)
at gcomm/src/asio_tcp.cpp:write_handler():265
2017-08-16 1:34:42 34650013696 [Note] WSREP: gcomm: terminating thread
2017-08-16 1:34:42 34650013696 [Note] WSREP: gcomm: joining thread
2017-08-16 1:34:42 34650013696 [Note] WSREP: gcomm: closing backend
2017-08-16 1:34:42 34650013696 [Note] WSREP: Forced PC close
2017-08-16 1:34:43 34650013696 [Warning] WSREP: discarding 4 messages from message index
2017-08-16 1:34:43 34650013696 [Note] WSREP: gcomm: closed
2017-08-16 1:34:43 34650014976 [Note] WSREP: Received self-leave message.
2017-08-16 1:34:43 34650014976 [Note] WSREP: comp msg error in core 53
2017-08-16 1:34:43 34650014976 [Note] WSREP: Closing send monitor...
2017-08-16 1:34:43 34650014976 [Note] WSREP: Closed send monitor.
2017-08-16 1:34:43 34650017536 [Note] WSREP: New cluster view: global state: 00000000-0000-0000-0000-000000000000:0, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version -1
2017-08-16 1:34:43 34650017536 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-16 1:34:43 34650014976 [Note] WSREP: Closing replication queue.
2017-08-16 1:34:43 34650014976 [Note] WSREP: Closing slave action queue.
2017-08-16 1:34:43 34650014976 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 1939074)
2017-08-16 1:34:43 34650014976 [Note] WSREP: RECV thread exiting -53: Software caused connection abort
2017-08-16 1:34:43 34650017536 [Note] WSREP: applier thread exiting (code:6)
2017-08-16 2:34:06 44777171968 [Warning] Aborted connection 4876891 to db: 'zabbix' user: 'zabbix' host: '
Any insights into what's going on here?