Failed update compromised 3 out of 4 nodes and made them abort!


Ilias Bertsimas

Oct 22, 2012, 6:35:10 AM
to codersh...@googlegroups.com
Hello,

I got the error below on 3 out of the 4 nodes, and those 3 nodes aborted and left the cluster.


121021 19:53:58 [Note] WSREP: Member 2 (node1) synced with group.
121022 11:24:11 [ERROR] Slave SQL: Could not execute Update_rows event on table data.users; Can't find record in 'users', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 237, Error_code: 1032
121022 11:24:11 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 311133546
121022 11:24:11 [Note] WSREP: (6abe38ee-0e32-11e2-0800-8058fe2de605, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.66:4567
121022 11:24:11 [ERROR] WSREP: Failed to apply trx: source: 282642e0-1ba7-11e2-0800-f2df0496c154 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 436280 trx_id: 37413409828 seqnos (l: 302479118, g: 311133546, s: 311133544, d: 311133285, ts: 1350897851536060528)
121022 11:24:11 [ERROR] WSREP: Failed to apply app buffer: seqno: 311133546, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():50
         at galera/src/replicator_smm.cpp:apply_trx_ws():121
121022 11:24:11 [ERROR] WSREP: Node consistency compromized, aborting...
121022 11:24:11 [Note] WSREP: Closing send monitor...
121022 11:24:11 [Note] WSREP: Closed send monitor.
121022 11:24:11 [Note] WSREP: gcomm: terminating thread
121022 11:24:11 [Note] WSREP: gcomm: joining thread
121022 11:24:11 [Note] WSREP: gcomm: closing backend

Any idea what it was that compromised the whole cluster?

Kind Regards,
Ilias.

Henrik Ingo

Oct 22, 2012, 1:46:18 PM
to Ilias Bertsimas, codersh...@googlegroups.com
Hi Ilias

For some reason your table data.users is different on different nodes.
This is not good.

Now an update that modified one row on the master fails, because the
same row is not found on the slave. In this case Galera shuts down the
slave node. This is by design: unlike MySQL replication, Galera
enforces that your nodes do not diverge. In your case it seems this
happened on all 3 slave nodes.

So the mystery is, how can the tables be different? Some guesses to look at:

- What is your SST method? Are you sure it succeeded without errors?
  (Check the error logs on all servers.)
- Are you, or have you been, using binlog-do-db et al.? (Check my.cnf /
  SHOW VARIABLES on all servers.)
- Have you at some point executed transactions with wsrep_on=OFF?
- Have you recently added auto_increment columns with ALTER TABLE?
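The checks above can be sketched roughly like this; run them on every node and compare the output (the variable and command names are standard MySQL/wsrep, but adjust to your version):

```sql
-- SST method in use; should be identical on all nodes:
SHOW VARIABLES LIKE 'wsrep_sst_method';

-- The Binlog_Do_DB / Binlog_Ignore_DB columns should be empty under Galera:
SHOW MASTER STATUS;

-- Should be ON unless replication was deliberately bypassed at some point:
SHOW VARIABLES LIKE 'wsrep_on';
```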


henrik


--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

Ilias Bertsimas

Oct 22, 2012, 2:03:58 PM
to codersh...@googlegroups.com, Ilias Bertsimas, henri...@avoinelama.fi
Hi Henrik,

I use the xtrabackup SST method, and yes, every SST ever run completed without errors.
None of what you mention applies in my case, which is why I am trying to figure out how the 3 slaves could have diverged from the master in terms of data.

I had some network issues last week, with lots of cluster renegotiations and configuration changes; maybe something happened there that made the cluster inconsistent.

Another question: why did they not try to perform an SST instead of all aborting together? Does it have to do with the fact that only 1 node was left, which lost PRIMARY status as a result?


Kind Regards,
Ilias. 

Henrik Ingo

Oct 22, 2012, 4:33:35 PM
to Ilias Bertsimas, codersh...@googlegroups.com
On Mon, Oct 22, 2012 at 9:03 PM, Ilias Bertsimas <awar...@gmail.com> wrote:
> I had some network issues last week and lots of cluster negotiations and
> configuration changes, maybe something happened there that made the cluster
> inconsistent.

Possibly. One thing I forgot to ask: could you start up the failed
nodes in isolated mode (i.e. not connected to the cluster, and with
the application not writing to them) to investigate how they really
differ? Is one row missing? Is the whole table missing? Are the rows
there but the primary keys out of sync? Have other tables diverged
too, or just this one?
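A minimal sketch of such a comparison, run on each isolated node (this assumes the node was started without joining the cluster, e.g. with wsrep_provider=none, and with application traffic blocked; the `id` column is only an example, so adjust to the real schema):

```sql
-- Cheap first-pass comparison of data.users across nodes:
SELECT COUNT(*) FROM data.users;
CHECKSUM TABLE data.users;

-- If counts match but contents may still differ, compare key ranges too:
SELECT MIN(id), MAX(id) FROM data.users;  -- `id` is an assumed primary key
```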


> Another question is why they did not try to perform an SST and just aborted
> all together, does it have to do with that only 1 node was left and lost
> PRIMARY mode due to that ?

No, this is the intended behavior whether it happens to one node or
all slaves. The point is just to shut down the node and leave it to
you to decide what to do. For instance, you may want to look at the
nodes to investigate what has happened, and you may want to manually
salvage some data. Note that Galera only knows that the databases have
diverged, it cannot know which database is "better", since this is a
subjective judgement by you. So the action is just to shut down the
failed node and leave it to you to investigate and take action.

If you then decide that the single node that is still alive has the
"good" database, you can just restart the failed nodes: they will wipe
their current database and do a full SST, after which they should be
identical to the donor node and everything will be fine again.
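One way to watch a rejoining node go through SST is to poll the standard wsrep status variables (a sketch; the exact state names vary slightly between versions):

```sql
-- On the rejoining node; expect roughly Joining -> Joined -> Synced:
SHOW STATUS LIKE 'wsrep_local_state_comment';

-- On any node; the size should grow back as members rejoin:
SHOW STATUS LIKE 'wsrep_cluster_size';
```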

Hint: you may want to take a copy of the databases of the failed nodes
and keep them for some months. If a user complains that their data has
gone missing, you can go into your saved copies to see if it is there.
This could provide clues to what happened and when, or at least allow
you to manually "copy-paste" the lost data into the running cluster.

henrik

Ilias Bertsimas

Oct 22, 2012, 6:25:46 PM
to codersh...@googlegroups.com, Ilias Bertsimas, henri...@avoinelama.fi
Unfortunately I had to bring everything back up, as it was a production cluster.
I had the node that did not abort as the master, so all writes happened there, and I assumed it to be the most consistent one.
I have backups in case I need to restore data that was lost at some point.
I tried starting the other nodes, but they failed to connect to the remaining node, timed out, and aborted.
The only thing that worked was starting the cluster anew on the remaining node.

Ilias.

Henrik Ingo

Oct 23, 2012, 3:38:47 AM
to Ilias Bertsimas, codersh...@googlegroups.com
Well, if you were only writing to that one master, then this is of
course the correct fix.

henrik

seppo....@codership.com

Oct 23, 2012, 3:45:41 AM
to codersh...@googlegroups.com
The reason for the inconsistency in the data.users table should be figured out.

Could you send me (seppo....@codership.com), for troubleshooting:

1. the full mysql error log from any of the crashed nodes
2. the output of SHOW CREATE TABLE data.users;
3. from the mysql data directory of any of the crashed nodes, all files
with names like GRA_#_#.log

-seppo


Alex Yurchenko

Oct 24, 2012, 1:29:38 PM
to codersh...@googlegroups.com
On 2012-10-23 01:25, Ilias Bertsimas wrote:
> Unfortunately I had to bring everything back up as it was a production cluster.
> I had the node that did not abort as the master so all writes happened there, so I assumed it to be the most consistent one.
> I have backups in case I need to restore some data that was lost at some point.
> I tried starting the nodes but they were failing to connect to the remaining node and they were timing out, making them abort and shutdown.

Yes, because the node they were connecting to was not a part of primary
component (and they learned it by actually connecting to it: they
connected to the node but timed out connecting to primary component).
This is done to indicate to you that the node you're trying to connect
to might not be the right one.

> The only thing that worked was starting the cluster anew on the remaining node.

If a node, or a group of nodes, loses primary component status due to
a catastrophic failure of the majority, it can be restored by running

mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=1';

on one of the nodes in question.
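After bootstrapping, it is worth verifying that the node reports Primary again (a sketch using the standard wsrep status variables):

```sql
-- Expect: Primary
SHOW STATUS LIKE 'wsrep_cluster_status';

-- Should match the number of members currently in the cluster:
SHOW STATUS LIKE 'wsrep_cluster_size';
```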

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Ilias Bertsimas

Oct 24, 2012, 1:49:55 PM
to codersh...@googlegroups.com
Thanks Alexey,
Good to know about the bootstrap option!

Amol Kedar

Apr 23, 2013, 11:53:44 PM
to codersh...@googlegroups.com, Ilias Bertsimas, henri...@avoinelama.fi
Hi Henrik,
you mentioned in your reply:

Have you recently added auto_increment columns with ALTER TABLE? 

Yes, I did that this morning on my cluster DB in the development env, like this:

ALTER table Token add column ID int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

and my cluster DB crashed, with node 1 losing primary status and the other 2 nodes crashing with this error:

130423 23:03:37 [ERROR] Slave SQL: Could not execute Delete_rows event on table mobiledb.Token; Can't find record in 'Token', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 171, Error_code: 1032
130423 23:03:37 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 405361
130423 23:03:37 [ERROR] WSREP: Failed to apply trx: source: 449ce03a-a790-11e2-0800-f4ff527607ed version: 2 local: 0 state: APPLYING flags: 1 conn_id: 564139 trx_id: 4253473 seqnos (l: 1566, g: 405361, s: 405360, d: 405268, ts: 1366772620124592598)
130423 23:03:37 [ERROR] WSREP: Failed to apply app buffer: seqno: 405361, status: WSREP_FATAL
     at galera/src/replicator_smm.cpp:apply_wscoll():52
     at galera/src/replicator_smm.cpp:apply_trx_ws():118
130423 23:03:37 [ERROR] WSREP: Node consistency compromized, aborting...
130423 23:03:37 [Note] WSREP: Closing send monitor...
130423 23:03:37 [Note] WSREP: Closed send monitor.
130423 23:03:37 [Note] WSREP: gcomm: terminating thread

So I started node 1 with the bootstrap option and the other nodes recovered via SST.

My concern is: what did I do wrong with that command/statement?

Please let me know.

Alex Yurchenko

Apr 24, 2013, 1:01:37 AM
to codersh...@googlegroups.com
https://bugs.launchpad.net/percona-xtradb-cluster/+bug/587170


Amol Kedar

Apr 24, 2013, 12:53:11 PM
to codersh...@googlegroups.com
Hi Alexey,
thanks for the link.

How would you suggest we perform such an operation? Is the only way out to create a new table, migrate the data, drop the old table, and rename the new one?
Or is there a less arduous process?

Alex Yurchenko

Apr 25, 2013, 2:05:53 AM
to codersh...@googlegroups.com
On 2013-04-24 19:53, Amol Kedar wrote:
> how would you suggest we perform such operation, is the only way out to
> create a new table, migrate data, drop old table, rename the new table?
> or is there a less arduous process?

Disable wsrep_auto_increment_control (on ALL nodes) and ALTER the table
(on one node). Re-enable wsrep_auto_increment_control afterwards.
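Applied to Amol's Token table, the procedure would look roughly like this (a sketch; the ALTER statement is the one from earlier in the thread):

```sql
-- 1. On ALL nodes:
SET GLOBAL wsrep_auto_increment_control = OFF;

-- 2. On ONE node only (the DDL replicates to the others):
ALTER TABLE Token
  ADD COLUMN ID INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- 3. On ALL nodes again:
SET GLOBAL wsrep_auto_increment_control = ON;
```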

Laurent MINOST

Apr 25, 2013, 4:32:47 AM
to codersh...@googlegroups.com
Hi all,

Maybe I'm not understanding everything about this problem properly, and if that's the case, sorry, but this is now the umpteenth time I have seen this bug occur, and it is pretty old (reported by Alex Yurchenko on 2010-05-29). Don't you think this bug should now be fixed with a higher priority? The impact can be critical on a running cluster: losing 2 nodes of a 3-node cluster can affect traffic and response time on a production cluster, since the cluster will run on one node only for a while, from the time the other nodes are restarted until they resync with SST...

IMO, this should normally not happen in a cluster environment; it goes against the logic of having a cluster if, in the end, all nodes but one go down!?

Regards,

Laurent

Alex Yurchenko

Apr 25, 2013, 11:06:39 AM
to codersh...@googlegroups.com
I agree that this is totally awkward and embarrassing, but not everyone
here does - and they must have good reasons for it. So go and vote for
it in LP, we got a democracy here.

https://bugs.launchpad.net/codership-mysql/+bug/587170


Jay Janssen

Apr 25, 2013, 11:10:11 AM
to Alex Yurchenko, codersh...@googlegroups.com

On Apr 25, 2013, at 8:06 AM, Alex Yurchenko <alexey.y...@codership.com> wrote:

> I agree that this is totally awkward and embarrassing, but not everyone here does - and they must have good reasons for it. So go and vote for it in LP, we got a democracy here.

I'm a bit scared about what crazy hack it would take to fix it (or not).  

How does this work on a dual-master setup when auto_increment_* settings are used?  Is it some hint in the replication stream?

Jay Janssen, MySQL Consulting Lead, Percona
Percona Live in Santa Clara, CA  April 22nd-25th 2013

Alex Yurchenko

Apr 26, 2013, 12:41:09 AM
to Jay Janssen, codersh...@googlegroups.com
On 2013-04-25 18:10, Jay Janssen wrote:
> On Apr 25, 2013, at 8:06 AM, Alex Yurchenko
> <alexey.y...@codership.com> wrote:
>
>> I agree that this is totally awkward and embarrassing, but not
>> everyone here does - and they must have good reasons for it. So go and
>> vote for it in LP, we got a democracy here.
>
>
> I'm a bit scared about what crazy hack it would take to fix it (or
> not).
>
> How does this work on a dual-master setup when auto_increment_*
> settings are used? Is it some hint in the replication stream?

Most likely, yes. Most likely wsrep support for DDL replication is
simply incomplete: regular STATEMENT replication gets events from the
IO cache, but wsrep replicates before execution, so nothing is in the
cache yet; it needs to forge the events itself and probably does not
do that well.

