Crash in 3-node cluster


Luis Fernando Planella Gonzalez

Aug 11, 2011, 9:53:05 AM
to codership
Hi all (again!) :)
I've successfully deployed Galera on 3 nodes.
However, after ~1h of very light db usage (no more than 10 requests
in the app), all 3 mysql nodes crashed.
In /etc/mysql/conf.d/wsrep.cnf I've set the initial cluster URL on
each node to the IP of some other node, so that a restarting node
rejoins the cluster automatically.
This approach requires manual intervention on one of the nodes:
temporarily setting the URL to gcomm:// to bootstrap the cluster
(BTW, any better suggestions for getting the cluster to restart
automatically after rebooting all machines?).
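
For reference, here's a minimal sketch of the two wsrep.cnf variants involved (wsrep_cluster_address is the standard option name; the address below is a placeholder from this setup):

```ini
# Normal operation on every node: point at one or more peers so a
# restarting node rejoins the existing cluster automatically.
[mysqld]
wsrep_cluster_address="gcomm://192.168.12.2"

# Bootstrap (first node only, set temporarily, then revert):
# an empty gcomm:// makes the node form a brand-new primary component.
# wsrep_cluster_address="gcomm://"
```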
I'm attaching the log from before doing that: all nodes were offline
after the crash.
Any help?
Thanks.

Here's the log:
...
110811 10:37:40 [Note] WSREP: GMCast::handle_stable_view: view(view_id(PRIM,78372883-c418-11e0-0800-04f541689ba3,13) memb {
78372883-c418-11e0-0800-04f541689ba3,
7abaf0ea-c418-11e0-0800-ae1fc9c3d7c6,
b5b26d3c-c41d-11e0-0800-4615aedbfb68,
} joined {
} left {
} partitioned {
})
110811 10:37:40 [Note] WSREP: New COMPONENT: primary = yes, my_idx = 2, memb_num = 3
110811 10:37:40 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
110811 10:37:40 [Note] WSREP: declaring 78372883-c418-11e0-0800-04f541689ba3 stable
110811 10:37:40 [Note] WSREP: declaring 7abaf0ea-c418-11e0-0800-ae1fc9c3d7c6 stable
110811 10:37:40 [Note] WSREP: STATE EXCHANGE: sent state msg: 158a2ddb-c41f-11e0-0800-50f42779c2ce
110811 10:37:40 [Note] WSREP: STATE EXCHANGE: got state msg: 158a2ddb-c41f-11e0-0800-50f42779c2ce from 0 (server2)
110811 10:37:40 [Note] WSREP: STATE EXCHANGE: got state msg: 158a2ddb-c41f-11e0-0800-50f42779c2ce from 1 (server3)
110811 10:37:40 [Note] WSREP: STATE EXCHANGE: got state msg: 158a2ddb-c41f-11e0-0800-50f42779c2ce from 2 (server1)
110811 10:37:40 [Warning] WSREP: Quorum: No node with complete state:

Version : 0
Flags : 1
Protocols : 0 / 1 / 0
State : NON-PRIMARY
Prim state : SYNCED
Prim UUID : c7709851-c41d-11e0-0800-b25958df6cd9
Prim JOINED : 1
Prim seqno : 9
Global seqno : 17
State UUID : 158a2ddb-c41f-11e0-0800-50f42779c2ce
Group UUID : d214b6c5-c41a-11e0-0800-9ec4a040a0e1
Name : 'server2'
Incoming addr: '192.168.12.2'

Version : 0
Flags : 0
Protocols : 0 / 1 / 0
State : NON-PRIMARY
Prim state : SYNCED
Prim UUID : b5f96b3e-c41d-11e0-0800-1d6122f3fd5e
Prim JOINED : 2
Prim seqno : 8
Global seqno : 17
State UUID : 158a2ddb-c41f-11e0-0800-50f42779c2ce
Group UUID : d214b6c5-c41a-11e0-0800-9ec4a040a0e1
Name : 'server3'
Incoming addr: '192.168.12.3'

Version : 0
Flags : 0
Protocols : 0 / 1 / 0
State : NON-PRIMARY
Prim state : SYNCED
Prim UUID : c7709851-c41d-11e0-0800-b25958df6cd9
Prim JOINED : 1
Prim seqno : 9
Globa
110811 10:37:40 [Note] WSREP: Complete merge of primary c7709851-c41d-11e0-0800-b25958df6cd9 found: 1 of 1.
110811 10:37:40 [ERROR] WSREP: gcs/src/gcs_state_msg.c:state_quorum_remerge():511: Found more than one re-merged primary component candidate.
110811 10:37:40 [ERROR] WSREP: gcs/src/gcs_state_msg.c:gcs_state_msg_get_quorum():537: Failed to establish quorum.
110811 10:37:40 [Note] WSREP: Quorum results:
version = 4294967295,
component = NON-PRIMARY,
conf_id = -1,
members = 1/3 (joined/total),
act_id = -1,
last_appl. = 0,
protocols = -1/-1/-1 (gcs/repl/appl),
group UUID = 00000000-0000-0000-0000-000000000000
110811 10:37:40 [Note] WSREP: Flow-control interval: [14, 28]
110811 10:37:40 [Note] WSREP: Received NON-PRIMARY.
110811 10:37:40 [ERROR] WSREP: Unsupported application protocol version: -1
110811 10:37:40 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=10
max_threads=151
threads_connected=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 346566 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x147cf70
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fb4c28d9ea8 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x88e6a5]
/usr/sbin/mysqld(handle_segfault+0x344)[0x5cbad4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfc60)[0x7fb4c6bf0c60]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb4c5fa4d05]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x186)[0x7fb4c5fa8ab6]
/usr/sbin/mysqld[0x746c7b]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM17process_view_infoEPvRK15wsrep_view_infoNS_10Replicator5StateEl+0x202)[0x7fb4c4cdd752]
/usr/lib/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvPKvm12gcs_act_typell+0x37e)[0x7fb4c4cd53be]
/usr/lib/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPv+0x81)[0x7fb4c4cd5551]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x118)[0x7fb4c4cd8558]
/usr/lib/galera/libgalera_smm.so(galera_recv+0x23)[0x7fb4c4ceb953]
/usr/sbin/mysqld(_Z25wsrep_replication_processP3THD+0x10e)[0x5dcd8e]
/usr/sbin/mysqld(start_wsrep_THD+0x28a)[0x5d1e2a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c)[0x7fb4c6be7d8c]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb4c605704d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query ((nil)): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
110811 10:37:40 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
110811 10:37:40 [Note] WSREP: wsrep_load(): Galera 0.8.1(r57) by Codership Oy <in...@codership.com> loaded succesfully.
110811 10:37:40 [Note] WSREP: Passing config to GCS: gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; replicator.commit_order = 3
110811 10:37:40 [Note] WSREP: wsrep_sst_grab()
110811 10:37:40 [Note] WSREP: Start replication
110811 10:37:40 [Note] WSREP: Found saved state: d214b6c5-c41a-11e0-0800-9ec4a040a0e1:17
110811 10:37:40 [Note] WSREP: Assign initial position for certification: 17, protocol version: 1
110811 10:37:40 [Note] WSREP: Setting initial position to d214b6c5-c41a-11e0-0800-9ec4a040a0e1:17
110811 10:37:40 [Note] WSREP: protonet asio version 0
110811 10:37:40 [Note] WSREP: backend: asio
110811 10:37:40 [Note] WSREP: GMCast version 0
110811 10:37:40 [Note] WSREP: (1598b0e5-c41f-11e0-0800-171bd22bc8c5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
110811 10:37:40 [Note] WSREP: (1598b0e5-c41f-11e0-0800-171bd22bc8c5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
110811 10:37:40 [Note] WSREP: EVS version 0
110811 10:37:40 [Note] WSREP: PC version 0
110811 10:37:40 [Note] WSREP: gcomm: connecting to group 'cyclos', peer '192.168.12.2:'
110811 10:37:41 [Warning] WSREP: no nodes coming from prim view, prim not possible

Alex Yurchenko

Aug 11, 2011, 12:18:34 PM
to codersh...@googlegroups.com
Whoa!

There we have two whole bugs! They happened due to a very particular
partitioning scenario which I guess can be reproduced only by pulling
cables with just the right timing. Have you been pulling cables
there?

Anyway, the probability of such a condition should be rather low (unless
you've made a habit of pulling cables every second :) ), so you should be
able to continue safely (the nodes still have consistent states). If this
happens again, there must be something odd with your network.

Thanks for reporting,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Alex Yurchenko

Aug 11, 2011, 2:21:20 PM
to codersh...@googlegroups.com
On Thu, 11 Aug 2011 20:02:16 +0200, admin extremeshok.com wrote:
> What about a faulty switch, that can cause connections to disconnect
> randomly.

Ok, I reread the subject and put it in a proper context ;)

Well, it could be a faulty switch, but in that case little can be done
besides fixing the switch or waiting for 0.8.2.

A faulty switch could also cause other issues and reveal more bugs, and
it should do so rather consistently. Unfortunately we don't have one,
and getting one is rather hard (or expensive), so if you want to make
sure that Galera works reliably with your faulty switch, send it to us!

Regards,
Alex

Tuure Laurinolli

Aug 12, 2011, 4:11:37 AM
to codersh...@googlegroups.com

You could probably make a "faulty switch" by running Galera on VMs under a Linux host, so that the VM NICs look like TAP interfaces to the host, and connecting the TAPs with a bridge on the host. You could then simulate cable disconnections by adding and removing TAP interfaces to/from the bridge, and also add packet loss, delay, etc. with netem. I haven't actually tried this yet, but I was planning on doing something along those lines soon(tm).
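
Not tested here, but the host-side plumbing might look roughly like this (a sketch: the interface names tap0/tap1, the bridge name br0, and the netem numbers are all placeholders; on a 2011-era host you'd use brctl instead of `ip link ... master`):

```shell
# Bridge the VMs' TAP interfaces together; the bridge acts as the "switch".
ip link add br0 type bridge
ip link set tap0 master br0
ip link set tap1 master br0
ip link set br0 up

# "Pull the cable" on one node by detaching its TAP from the bridge...
ip link set tap0 nomaster
# ...and later "plug it back in".
ip link set tap0 master br0

# Or emulate a flaky link instead: 100 ms delay and 5% packet loss.
tc qdisc add dev tap0 root netem delay 100ms loss 5%
tc qdisc del dev tap0 root    # remove the impairment again
```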

Alexey Yurchenko

Aug 15, 2011, 4:48:03 AM
to codership
This time a proper unit test did the job.

Thanks for suggestions, everybody.


Luis Fernando Planella Gonzalez

Aug 15, 2011, 9:02:09 AM
to codership
Sorry for asking again, but now I've hit another issue with the setup...
In the same 3-node cluster, I forced the shutdown of one of the 3 nodes
(not by pulling cables, but using SysRq+R, SysRq+E, SysRq+S, SysRq+U,
SysRq+O in Linux).
The problem is that the other 2 nodes both disconnected and no longer
accepted connections.
They seem to be in a loop.
Here's the last 485KB :-) from error.log: http://pastebin.com/SGzX2YYv
Is it the same bug?
Best regards.


Alex Yurchenko

Aug 16, 2011, 5:06:06 AM
to codersh...@googlegroups.com
On Mon, 15 Aug 2011 06:02:09 -0700 (PDT), Luis Fernando Planella Gonzalez wrote:
> Sorry for asking again, but now I had another issue with the setup...
> In the same 3-node cluster, I've forced the shutdown of one of those 3
> (not by pulling cables, but using SysRq+R, SysRq+E, SysRq+S, SysRq+U, SysRq+O in Linux).
> The problem is that the other 2 nodes both disconnected, and no longer accepted connections.
> They seem to be in a loop.
> Here's the last 485KB :-) from error.log: http://pastebin.com/SGzX2YYv
> Is it the same bug?
> Best regards.
>
Hi Luis.

No, it is a totally different issue. It happens at a lower layer and may
not be a bug at all (although we'll be looking into it), but a settings
limitation.

It seems like you're changing the cluster configuration faster than it can
track it:

> 110815 9:42:13 [Note] WSREP: remote endpoint tcp://192.168.12.3:4567 changed identity 9df39267-c735-11e0-0800-0033cc1fbf1c -> 070daefb-c73c-11e0-0800-963a64d2fc2a

> 110815 9:42:28 [Note] WSREP: remote endpoint tcp://192.168.12.2:4567 changed identity 6950244e-c737-11e0-0800-efff3989f381 -> 0bc30e8a-c73c-11e0-0800-cdab91fdf848

Which means that two nodes were restarted faster than failure detection
could detect it; moreover, it all happened within 15 seconds, whereas the
default suspect timeout is 5 seconds, the inactive timeout is 15 seconds,
the consensus timeout is 30 seconds, and the keepalive period is 1 second.
So you're changing the cluster dangerously fast: timeouts may not fire on
some nodes, and then they won't be able to come to consensus later.

(This is not exactly what you reported you did, but this is what
happened nevertheless, and this is where it all started.)

If you want faster cluster reactions, you need to adjust keepalive
period and timeouts accordingly.
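
For example, something along these lines (a sketch only: the evs.* option names follow the Galera provider-options convention, and the PT... duration syntax and the exact values are assumptions; check them against your version's documentation):

```ini
# wsrep.cnf sketch: tighten failure detection for faster reactions.
# Keep keepalive well below suspect, and suspect < inactive <= consensus.
wsrep_provider_options="evs.keepalive_period=PT0.5S;evs.suspect_timeout=PT2S;evs.inactive_timeout=PT6S;evs.consensus_timeout=PT12S"
```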

Regards,
Alex

Luis Fernando Planella Gonzalez

Aug 16, 2011, 8:40:48 AM
to codership
Hi Alex.

> It seems like you're changing cluster configuration faster than it can
> track it:
Well, actually I had just halted one of the cluster nodes.
Anyway, I'll keep evaluating / testing the setup.
Thanks.
