mysql failed to start after reboot

Paras pradhan

Sep 27, 2013, 1:51:46 PM
to codersh...@googlegroups.com
Hi,

I am very new to Galera. I set up a three-node Galera cluster and started MySQL; everything looked fine and replication worked. I then had to reboot the nodes, and after that MySQL failed to start.

This is what I see in the logs:

Sep 27 12:45:50 control01 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Read nil XID from storage engines, skipping position init
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: wsrep_load(): Galera 23.2.1(r129) by Codership Oy <in...@codership.com> loaded succesfully.
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Passing config to GCS: base_host = 192.168.0.10; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: wsrep_sst_grab()
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Start replication
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: protonet asio version 0
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: backend: asio
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: GMCast version 0
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: (a5e05564-279c-11e3-0800-517430f98b6c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: (a5e05564-279c-11e3-0800-517430f98b6c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: EVS version 0
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: PC version 0
Sep 27 12:45:50 control01 mysqld: 130927 12:45:50 [Note] WSREP: gcomm: connecting to group 'controller_cluster', peer '192.168.0.12:4567'
Sep 27 12:45:53 control01 mysqld: 130927 12:45:53 [Warning] WSREP: no nodes coming from prim view, prim not possible
Sep 27 12:45:53 control01 mysqld: 130927 12:45:53 [Note] WSREP: view(view_id(NON_PRIM,a5e05564-279c-11e3-0800-517430f98b6c,1) memb {
Sep 27 12:45:53 control01 mysqld: #011a5e05564-279c-11e3-0800-517430f98b6c,
Sep 27 12:45:53 control01 mysqld: } joined {
Sep 27 12:45:53 control01 mysqld: } left {
Sep 27 12:45:53 control01 mysqld: } partitioned {
Sep 27 12:45:53 control01 mysqld: })
Sep 27 12:45:53 control01 mysqld: 130927 12:45:53 [Warning] WSREP: last inactive check more than PT1.5S ago, skipping check
Sep 27 12:45:54 control01 mysqld: 130927 12:45:54 [Note] WSREP: declaring a84e93c1-279c-11e3-0800-704b6289c47e stable
Sep 27 12:45:54 control01 mysqld: 130927 12:45:54 [Warning] WSREP: no nodes coming from prim view, prim not possible
Sep 27 12:45:54 control01 mysqld: 130927 12:45:54 [Note] WSREP: view(view_id(NON_PRIM,a5e05564-279c-11e3-0800-517430f98b6c,2) memb {
Sep 27 12:45:54 control01 mysqld: #011a5e05564-279c-11e3-0800-517430f98b6c,
Sep 27 12:45:54 control01 mysqld: #011a84e93c1-279c-11e3-0800-704b6289c47e,
Sep 27 12:45:54 control01 mysqld: } joined {
Sep 27 12:45:54 control01 mysqld: } left {
Sep 27 12:45:54 control01 mysqld: } partitioned {
Sep 27 12:45:54 control01 mysqld: })
Sep 27 12:46:01 control01 mysqld: 130927 12:46:01 [Note] WSREP: declaring a84e93c1-279c-11e3-0800-704b6289c47e stable
Sep 27 12:46:01 control01 mysqld: 130927 12:46:01 [Note] WSREP: declaring ac8fb116-279c-11e3-0800-eb86ba48cea2 stable
Sep 27 12:46:01 control01 mysqld: 130927 12:46:01 [Warning] WSREP: no nodes coming from prim view, prim not possible
Sep 27 12:46:01 control01 mysqld: 130927 12:46:01 [Note] WSREP: view(view_id(NON_PRIM,a5e05564-279c-11e3-0800-517430f98b6c,3) memb {
Sep 27 12:46:01 control01 mysqld: #011a5e05564-279c-11e3-0800-517430f98b6c,
Sep 27 12:46:01 control01 mysqld: #011a84e93c1-279c-11e3-0800-704b6289c47e,
Sep 27 12:46:01 control01 mysqld: #011ac8fb116-279c-11e3-0800-eb86ba48cea2,
Sep 27 12:46:01 control01 mysqld: } joined {
Sep 27 12:46:01 control01 mysqld: } left {
Sep 27 12:46:01 control01 mysqld: } partitioned {
Sep 27 12:46:01 control01 mysqld: })
Sep 27 12:46:03 control01 /etc/init.d/mysql[10291]: 0 processes alive
Sep 27 12:46:03 control01 /etc/init.d/mysql[10291]: 
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Sep 27 12:46:23 control01 mysqld: #011 at gcomm/src/pc.cpp:connect():148
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'controller_cluster' at 'gcomm://192.168.0.12:4567': -110 (Connection timed out)
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [ERROR] WSREP: gcs connect failed: Connection timed out
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [ERROR] WSREP: wsrep::connect() failed: 6
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [ERROR] Aborting
Sep 27 12:46:23 control01 mysqld: 
Sep 27 12:46:23 control01 mysqld: 130927 12:46:23 [Note] WSREP: Service disconnected.
Sep 27 12:46:24 control01 mysqld: 130927 12:46:24 [Note] WSREP: Some threads may fail to exit.
Sep 27 12:46:24 control01 mysqld: 130927 12:46:24 [Note] /usr/sbin/mysqld: Shutdown complete
Sep 27 12:46:24 control01 mysqld: 
Sep 27 12:46:24 control01 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

Any advice?

Thanks!

Paras.

Alex Yurchenko

Sep 29, 2013, 11:37:04 AM
to codersh...@googlegroups.com
> Sep 27 12:45:53 control01 mysqld: 130927 12:45:53 [Warning] WSREP: no nodes coming from prim view, prim not possible

Oh, you have to learn the very important concept of the "Primary Component"
(PC for short).

In short, once the cluster ceases to exist (all nodes down), you need to
bootstrap it again, e.g. by starting one of the nodes with the
--wsrep-new-cluster parameter. To avoid this, don't shut down the whole
cluster; restart the nodes one by one.
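
Editorial sketch, not part of the original reply: before bootstrapping, it can help to decide which node to start with --wsrep-new-cluster. A common rule of thumb is to pick the node whose saved state has the highest seqno. The Python below is only a rough illustration under that assumption. It reads the "Found saved state" note that appears in the log above; the function names and the node names other than control01 are hypothetical, and it does not check that the nodes share the same cluster UUID.

```python
import re

# Matches the saved-state note seen in the log above, e.g.
# "WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1"
SAVED_STATE = re.compile(r"WSREP: Found saved state: ([0-9a-f-]{36}):(-?\d+)")


def saved_state(log_text):
    """Return (uuid, seqno) from the last saved-state line, or None if absent."""
    matches = SAVED_STATE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def pick_bootstrap_node(logs_by_node):
    """Return the node name with the highest known seqno.

    A seqno of -1 is treated as "unknown" and skipped. Returns None when no
    node reports a usable seqno, in which case the logs alone cannot decide.
    """
    best = None
    for node, text in logs_by_node.items():
        state = saved_state(text)
        if state is None or state[1] < 0:
            continue
        if best is None or state[1] > best[1]:
            best = (node, state[1])
    return best[0] if best else None


if __name__ == "__main__":
    # Hypothetical input: only control01's line is taken from the log above.
    logs = {
        "control01": "WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1",
    }
    print(pick_bootstrap_node(logs))
```

With the log posted above, the saved state is the all-zero UUID with seqno -1, so this sketch returns None: the log alone does not say which node is the most advanced, and the choice has to be made some other way.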

And, you're using a very old (and pretty buggy) release. The latest GA
releases are:
https://launchpad.net/galera/2.x/24.2.7
https://launchpad.net/codership-mysql/5.5/5.5.33-24.8

Regards,
Alex
--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011