Galera node shuts down with an UPDATE query error

Oopos

Jan 19, 2012, 4:12:44 AM

to codership

Does this mean the writeset is not working normally?

Version:
I compiled Galera from source (galera/2.x, revision 107); MySQL is 5.5.17.
120115 11:36:47 [Note] WSREP: wsrep_load(): Galera 2.0(r107) by Codership Oy <in...@codership.com> loaded succesfully.

Error info 1:
120117 10:40:58 [ERROR] Slave SQL: Could not execute Update_rows event on table box1.files79; Can't find record in 'files79', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 332, Error_code: 1032
120117 10:40:58 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 56966694
120117 10:40:58 [ERROR] WSREP: Failed to apply trx: source: 4e7c13d0-3e46-11e1-0800-536548e34260 version: 1 local: 0 state: CERTIFYING flags: 1 conn_id: 66102253 trx_id: 463250376 seqnos (l: 36123969, g: 56966694, s: 56966693, d: 56950271, ts: 1326768044390968968)
120117 10:40:58 [ERROR] WSREP: Failed to apply app buffer: [unprintable bytes], seqno: 56966694, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():51
at galera/src/replicator_smm.cpp:apply_trx_ws():122
120117 10:40:58 [ERROR] WSREP: Node consistency compromized, aborting...
120117 10:40:58 [Note] WSREP: Closing send monitor...
120117 10:40:58 [Note] WSREP: Closed send monitor.
120117 10:40:58 [Note] WSREP: gcomm: terminating thread
120117 10:40:58 [Note] WSREP: gcomm: joining thread
120117 10:40:58 [Note] WSREP: gcomm: closing backend

Seppo Jaakola

Jan 24, 2012, 4:51:31 AM

to codership

This case is still open. I have been investigating this issue with
help from Oopos, and so far no obvious explanation for the node crash
has been found. We suspect that updates on a unique key column
could cause the problem. I'm running a few long-running tests that
simulate this use case.

-seppo


Oopos

Jan 30, 2012, 9:12:14 PM

to codership

It seems that all Galera versions (1.x, 2.x, etc.) have this bug: they
cannot work with a table that has a unique key on a column, such as
"UNIQUE KEY `name` (`name`)".

Alex Yurchenko

Nov 28, 2012, 9:03:26 AM

to codersh...@googlegroups.com

On 2012-11-28 14:59, Ruud G. wrote:
> Hello,
>
> I've just caught the same.

Probably not. Does the table in question lack a primary key but still
have a unique key defined?
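
For anyone wanting to answer that question quickly from a `SHOW CREATE TABLE` dump, here is a minimal sketch. It is only a text check, not anything Galera provides: the function name `key_profile` and the sample table `files_example` are made up for illustration, and a plain keyword match like this can be fooled by comments or string defaults that happen to contain the same words.

```python
import re


def key_profile(create_table_sql: str) -> dict:
    """Report whether a CREATE TABLE statement mentions a primary key
    and/or a unique key. Purely textual; it does not parse SQL."""
    has_primary = re.search(
        r"\bPRIMARY\s+KEY\b", create_table_sql, re.IGNORECASE
    ) is not None
    has_unique = re.search(
        r"\bUNIQUE\b", create_table_sql, re.IGNORECASE
    ) is not None
    return {
        "has_primary_key": has_primary,
        "has_unique_key": has_unique,
        # The combination asked about in this thread.
        "unique_without_primary": has_unique and not has_primary,
    }


# Hypothetical table, modelled on the UNIQUE KEY example quoted earlier.
ddl = """CREATE TABLE `files_example` (
  `name` varchar(64) NOT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB"""

print(key_profile(ddl))
```

A table that reports `unique_without_primary` as true matches the pattern being asked about; anything else points to a different issue.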

> Is this problem not solved yet?

It is solved. You're most likely seeing a different issue. Please post
the exact error message, the definition of the table involved, and other
relevant information (such as the Galera and MySQL versions, how many
nodes are in the cluster, how many of them had this error, etc.)

> On Tuesday, January 31, 2012, 6:12:14 UTC+4, Oopos wrote:

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Ruud G.

Nov 28, 2012, 10:31:52 AM

to codersh...@googlegroups.com

It seems this is actually not my case, because I have a primary key and my unique key is composite, not single-column.

I hit this error only once and have not been able to reproduce it since, despite many attempts.

But I'd be grateful if you would have a look at my case:

121128 15:58:36 [ERROR] Slave SQL: Could not execute Update_rows event on table db.g_character; Can't find record in 'g_character', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 138, Error_code: 1032
121128 15:58:36 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 2846616
121128 15:58:36 [ERROR] WSREP: Failed to apply trx: source: 7750bfd2-3951-11e2-0800-6ac93456152e version: 2 local: 0 state: APPLYING flags: 1 conn_id: 219 trx_id: 49650137 seqnos (l: 2871111, g: 2846616, s: 2846615, d: 2846564, ts: 1354103916789135518)
121128 15:58:36 [ERROR] WSREP: Failed to apply app buffer: l..P^S, seqno: 2846616, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():49
at galera/src/replicator_smm.cpp:apply_trx_ws():120
121128 15:58:36 [ERROR] WSREP: Node consistency compromized, aborting...

Create Table: CREATE TABLE `g_character` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `guid` int(10) unsigned NOT NULL,
  `gid` smallint(5) unsigned NOT NULL,
  `wid` int(10) unsigned NOT NULL,
  `fid` smallint(5) unsigned NOT NULL,
  `val` int(10) unsigned NOT NULL,
  `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `atom` (`guid`,`fid`,`gid`,`wid`),
  KEY `updated` (`updated`),
  KEY `fid` (`fid`),
  KEY `gid` (`gid`),
  KEY `wid` (`wid`)
) ENGINE=InnoDB

Software:
percona-xtradb-cluster-5.5-28
3-node cluster, 1 master (for writes)

On Wednesday, November 28, 2012, 18:03:26 UTC+4, Alexey Yurchenko wrote:

Alex Yurchenko

Nov 28, 2012, 1:36:51 PM

to codersh...@googlegroups.com

On 2012-11-28 17:31, Ruud G. wrote:
> It seems this is actually not my case, because I have a primary key
> and my unique key is composite, not single-column.
> I hit this error only once and have not been able to reproduce it
> since, despite many attempts.

After you restarted the node, did it have IST or SST?

Ruud G.

Nov 28, 2012, 2:00:06 PM

to codersh...@googlegroups.com

It was SST:

121128 17:02:07 [Note] WSREP: State transfer required:

Ruud G.

Nov 28, 2012, 2:02:48 PM

to codersh...@googlegroups.com

Sorry, I think the right log entry confirming SST is:

121128 17:02:09 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) do
at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.

On Wednesday, November 28, 2012, 23:00:06 UTC+4, Ruud G. wrote:

Alex Yurchenko

Nov 28, 2012, 2:27:28 PM

to codersh...@googlegroups.com

On 2012-11-28 21:02, Ruud G. wrote:
> Sorry, I think the right log entry confirming SST is:
>
> 121128 17:02:09 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) do
> at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.

Hm, yes, I guess Galera disables IST in case of detected data
inconsistency.
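
To check a node's error log for the same symptom, a minimal sketch follows. It assumes the message text is exactly as quoted in this thread; other Galera versions may word it differently, and the function name `ist_unavailable_lines` is made up for illustration.

```python
def ist_unavailable_lines(log_lines):
    """Return log lines suggesting the node could not use IST.

    The marker strings are copied from the log excerpt quoted in this
    thread; they are not an official or exhaustive list.
    """
    markers = (
        "Failed to prepare for incremental state transfer",
        "IST will be unavailable",
    )
    return [line for line in log_lines if any(m in line for m in markers)]


# Sample lines taken from the messages above (truncated as posted).
sample = [
    "121128 17:02:07 [Note] WSREP: State transfer required:",
    "121128 17:02:09 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) do",
    "at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.",
]

for line in ist_unavailable_lines(sample):
    print(line)
```

An all-zero local state UUID next to one of these lines is what was seen here after the crash; on its own it does not say why the state was reset.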

Unfortunately, this makes it harder to distinguish between a bug in the
applier code and general data inconsistency, which can have many
causes, some of them user error. Given that it no longer reproduces
for you, I would not rule out the latter.

In any case, to be able to diagnose it if it happens again, the
following things can be done:

1) Set the core-file option for mysqld.
2) Enable the binlog and log_slave_updates on the slaves.
3) Back up the crashed server's data directory before restarting it.
4) Start the crashed server in standalone mode and compare its database
with that of the remaining cluster. An exact comparison is hard to
suggest without knowing the application logic, but the point is to
determine whether all rows that were supposed to be in the database
before the crash are the same. If they are (or only a very small
fraction differs), this is a bug in the applier code. If there are
massive differences, even in slow-changing tables, the states most
likely diverged long before the crash, and most likely that was user
error.
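
The comparison in step 4 can be sketched roughly as below, assuming the rows of one table have already been exported from the crashed node and from a healthy node into dictionaries keyed by primary key. How the rows are exported, and what counts as a "very small fraction", are left open here; the function name `diff_fraction` and the sample values are made up for illustration.

```python
_MISSING = object()  # sentinel for a key present on only one side


def diff_fraction(crashed_rows: dict, cluster_rows: dict) -> float:
    """Fraction of primary keys whose rows differ between the two nodes,
    counting rows that exist on only one side as differing."""
    all_keys = set(crashed_rows) | set(cluster_rows)
    if not all_keys:
        return 0.0
    differing = sum(
        1
        for key in all_keys
        if crashed_rows.get(key, _MISSING) != cluster_rows.get(key, _MISSING)
    )
    return differing / len(all_keys)


# Hypothetical rows: primary key -> remaining column values.
crashed = {1: (10, 5, 7), 2: (11, 5, 7), 3: (12, 6, 8)}
cluster = {1: (10, 5, 7), 2: (11, 5, 9), 3: (12, 6, 8), 4: (13, 6, 8)}

print(diff_fraction(crashed, cluster))
```

Running this per table, and looking at the slow-changing tables first, gives a rough picture of whether the difference is confined to the rows touched around the crash or spread across the data set.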
