Entire cluster breaks down when 'Error_code: 1062' occurs.


Galleria

Jun 26, 2014, 1:56:02 PM
to codersh...@googlegroups.com
Hello all.  We are running into a scenario where our entire 3-node cluster becomes unusable.  We don't use multi-master: host 01 takes all writes, while all three hosts are intended to serve reads.  The problem starts when hosts 02 and 03 log the following:

[ERROR] Slave SQL: Could not execute Write_rows event on table mydb.mytbl; Duplicate entry '0' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 35362, Error_code: 1062

These two instances then terminate, while the 'master' keeps running but logs 'Received NON-PRIMARY', after which any query against it returns 'unknown command', possibly because it has lost quorum.
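To illustrate why a lone survivor would go NON-PRIMARY, here is a minimal sketch of the quorum arithmetic. It assumes every node has equal weight (1) and that 02 and 03 drop out without leaving gracefully; the function name `has_quorum` is made up for this example and is not part of Galera.

```python
def has_quorum(reachable_weight, previous_weight, gracefully_left_weight=0):
    """Return True if the surviving component would stay PRIMARY.

    Sketch of the rule that a component needs strictly more than half of
    the previous primary component's weight, not counting nodes that
    announced their departure gracefully.
    """
    return reachable_weight * 2 > (previous_weight - gracefully_left_weight)


# Assumption: three nodes of weight 1 each, as in the cluster described here.
print(has_quorum(1, 3))  # only 01 left after 02 and 03 crash
print(has_quorum(2, 3))  # only one node lost
print(has_quorum(1, 3, gracefully_left_weight=2))  # 02 and 03 shut down cleanly
```

Under these assumptions the first case fails the check, which would match 01 refusing queries until it is bootstrapped again.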

We're then forced to restart 01 in bootstrap mode before starting the two fallen nodes.  We're running the following on CentOS:

Percona-XtraDB-Cluster-server-56-5.6.15-25.5.759.rhel6.x86_64
Percona-XtraDB-Cluster-galera-2-2.8-1.157.rhel6.x86_64


The pertinent config file and detailed logs are attached.  Any suggestions to deal with this scenario will be greatly appreciated.  Thanks!


my.cnf
01-error.log
02-error.log
03-error.log

Daniel Black

Jun 26, 2014, 5:51:19 PM
to Galleria, codersh...@googlegroups.com

> [ERROR] Slave SQL: Could not execute Write_rows event on table mydb.mytbl;
> Duplicate entry '0' for key 'PRIMARY', Error_code: 1062; handler error
> HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 35362,
> Error_code: 1062


It looks like 01 is inserting a row into mydb.mytbl while 02 and 03 already have a row with 0 as the primary key (which must be unique).

There wasn't a 0 primary key entry in the table on 01.

So you have two different database contents, or perhaps table definitions, between the 01 and 02/03 machines.
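One way to check the table-definition theory is to run `SHOW CREATE TABLE mydb.mytbl` on each node and compare the output. The sketch below groups nodes whose definitions match after dropping the `AUTO_INCREMENT=<n>` table option, since that counter can differ without the schema being different. The helper names and the sample DDL are made up for illustration; the real columns of mydb.mytbl are not known from this thread.

```python
import re
from collections import defaultdict


def normalize_ddl(ddl):
    """Strip details that can differ between nodes with identical schemas."""
    # Remove the table-level AUTO_INCREMENT=<n> counter (not the column attribute).
    ddl = re.sub(r"\s*AUTO_INCREMENT=\d+", "", ddl)
    # Collapse runs of whitespace so formatting differences are ignored.
    return " ".join(ddl.split())


def group_nodes_by_ddl(ddl_by_node):
    """Return lists of node names that share the same normalized definition."""
    groups = defaultdict(list)
    for node, ddl in sorted(ddl_by_node.items()):
        groups[normalize_ddl(ddl)].append(node)
    return list(groups.values())


# Hypothetical output collected by hand from each node.
sample = {
    "01": "CREATE TABLE `mytbl` (`id` int NOT NULL AUTO_INCREMENT, "
          "PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=42",
    "02": "CREATE TABLE `mytbl` (`id` int NOT NULL DEFAULT '0', "
          "PRIMARY KEY (`id`)) ENGINE=InnoDB",
    "03": "CREATE TABLE `mytbl` (`id` int NOT NULL DEFAULT '0', "
          "PRIMARY KEY (`id`)) ENGINE=InnoDB",
}
print(group_nodes_by_ddl(sample))
```

More than one group in the result means the definitions have diverged. For the contents rather than the definition, running `CHECKSUM TABLE mydb.mytbl` on each node is a quick first comparison.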

--
Daniel Black, Engineer @ Open Query (http://openquery.com.au)
Remote expertise & maintenance for MySQL/MariaDB server environments.

Galleria

Jun 26, 2014, 6:25:23 PM
to codersh...@googlegroups.com, ba...@axelabs.com, daniel...@openquery.com
From what our development team says, it seems there is a transaction on the master that gets rolled back when the duplicate is encountered, yet it is somehow still replicated to the other cluster members.  We are looking into whether this bug is related:

The shutdown of the write master was due to https://bugs.launchpad.net/galera/+bug/1217225, which we got around by updating to Percona-XtraDB-Cluster-galera-2-2.10-1.188.rhel6.x86_64.
