Drupal and Galera cluster: node always fails with " [ERROR] Slave SQL: Could not execute Delete_rows event..."


Erik

Mar 26, 2014, 11:14:44 AM
to codersh...@googlegroups.com
Hi all,

I installed a 2-node + garbd cluster based on the Severalnines configurator script.
I use this cluster database for a Drupal 7 website. Initially, only one Galera node is used by the website (the other doesn't receive any read/write requests).

After a while, the second node always stops with errors:
...
 [ERROR] Slave SQL: Could not execute Delete_rows event on table NIEUWSBERICHTEN_P.cache; Can't find record in 'cache', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 191, Error_code: 1032
2014-03-26 14:41:04 35362 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 5497
2014-03-26 14:41:04 35362 [Warning] WSREP: Failed to apply app buffer: seqno: 5497, status: 1
         at galera/src/trx_handle.cpp:apply():340
Retrying 2th time
...

Several retries give the same error on different tables (not only the cache table as above).

I tried to tweak the Galera config so that it resembles the config of another cluster which successfully handles another Drupal 7 use case (that cluster is MySQL v5.5.29, wsrep_23.7.3.r3853, 2 nodes + 1 garbd), but no luck.

This failing cluster is:
MySQL v5.6.14
MySQL Community Server (GPL), wsrep_25.1.r4019

Could someone please help me with this? Thanks!
(Of course I need help urgently because our go-live date is very close, and I thought it wouldn't cause much trouble, based on the experience with the other cluster...)

Kind regards,
Erik

Here is the mysql config:
# All files in this package is subject to the GPL v2 license
# More information is in the COPYING file in the top directory of this package.
# Copyright (C) 2011 severalnines.com

[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=mysqld.pid
port=3306
log-error=/var/log/mysqld.log
log-warnings=2
#log-output=FILE
### INNODB OPTIONS 
innodb-buffer-pool-size=5591M
innodb-flush-log-at-trx-commit=2
innodb-file-per-table=1
innodb-data-file-path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
#innodb-read-io-threads=4 #commented out against my first version
#innodb-write-io-threads=4 #commented out against my first version
#innodb-doublewrite=1 #commented out against my first version
innodb-doublewrite=0
innodb-log-file-size=512M
innodb-log-buffer-size=64M
#innodb-buffer-pool-instances=4 #commented out against my first version
innodb-buffer-pool-instances=1
innodb-log-files-in-group=2
innodb-thread-concurrency=0
#innodb-file-format=barracuda
innodb-flush-method = O_DIRECT
#innodb-locks-unsafe-for-binlog=1
innodb-autoinc-lock-mode=2
## avoid statistics update when doing e.g show tables
innodb-stats-on-metadata=0
default-storage-engine=innodb

# CHARACTER SET
#collation-server = utf8_unicode_ci
#init-connect='SET NAMES utf8'
#character-set-server = utf8

# REPLICATION SPECIFIC - GENERAL
#server-id must be unique across all mysql servers participating in replication.
#server-id=SERVERID
# REPLICATION SPECIFIC
binlog_format=ROW
#log-bin=binlog
#relay-log=relay-bin
#expire_logs_days=7
#log-slave-updates=1
#gtid_mode=ON
#log-bin=binlog
#enforce-gtid-consistency=1
# OTHER THINGS, BUFFERS ETC
key_buffer_size = 24M
tmp_table_size = 64M
max_heap_table_size = 64M
max-allowed-packet = 512M
#sort-buffer-size = 256K
#read-buffer-size = 256K
#read-rnd-buffer-size = 512K
#myisam-sort-buffer_size = 8M
skip-name-resolve
memlock=0
sysdate-is-now=1
max-connections=200
thread-cache-size=512
query-cache-type = 0
query-cache-size = 0
table-open_cache=1024
lower-case-table-names=0
# 5.6 backwards compatibility
explicit_defaults_for_timestamp=1
##
## WSREP options
##

# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/lib64/galera/libgalera_smm.so

wsrep_node_address=10.22.75.186
# Provider specific configuration options
wsrep_provider_options="gcache.size=128M; gmcast.segment=0; pc.ignore_sb = yes"

# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="my_wsrep_cluster"

# Group communication system handle
wsrep_cluster_address=gcomm://10.22.75.***:4567,10.22.75.***:4567

# Human-readable node name (non-unique). Hostname by default.
#wsrep_node_name=

# Address for incoming client connections. Autodetect by default.
#wsrep_node_incoming_address=

# How many threads will process writesets from other nodes
wsrep_slave_threads=4

# DBUG options for wsrep provider
#wsrep_dbug_option

# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1

# Location of the directory with data files. Needed for non-mysqldump
# state snapshot transfers. Defaults to mysql_real_data_home.
#wsrep_data_home_dir=

# Maximum number of rows in write set
wsrep_max_ws_rows=131072

# Maximum size of write set
wsrep_max_ws_size=1073741824

# to enable debug level logging, set this to 1
wsrep_debug=0

# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0

# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1

# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1

# replicate myisam
wsrep_replicate_myisam=1
# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_282555_workaround=1

# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0

# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status  - new status of this node
# --uuid    - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index   - index of this node in the list
#wsrep_notify_cmd=

##
## WSREP State Transfer options
##

# State Snapshot Transfer method
# ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
wsrep_sst_method=rsync

# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_sst_receive_address=

# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_auth=root:***********

# Desired SST donor name.
#wsrep_sst_donor=

# Protocol version to use
# wsrep_protocol_version=
[MYSQL]
socket=/var/lib/mysql/mysql.sock
#default-character-set=utf8
[client]
socket=/var/lib/mysql/mysql.sock
#default-character-set=utf8
[mysqldump]
max-allowed-packet = 512M
#default-character-set=utf8
[MYSQLD_SAFE]
pid-file=mysqld.pid
log-error=/var/log/mysqld.log
basedir=/usr/
datadir=/var/lib/mysql

Ilias Bertsimas

Mar 26, 2014, 12:48:10 PM
to codersh...@googlegroups.com
Hello,


On Wednesday, March 26, 2014 3:14:44 PM UTC, Erik wrote:


This means the nodes are inconsistent.
I see you have enabled MyISAM replication, which is experimental.

Do you have any tables that are MyISAM? If so, that might explain the issues you are facing.
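If it helps, one quick way to check (a sketch; the schema name is taken from your error log, so adjust as needed) is to ask information_schema for any non-InnoDB tables:

```sql
-- List any tables in the Drupal schema that are not InnoDB
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'NIEUWSBERICHTEN_P'
  AND engine <> 'InnoDB';
```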

After the issue, does the aborted node take a full snapshot of the dataset via SST and then fail again in a similar way?


Kind Regards,

Ilias.

seppo....@codership.com

Mar 26, 2014, 1:00:59 PM
to codersh...@googlegroups.com
Drupal caching does not work well in a multi-master topology. You had better
switch to the memcached module for caching.
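For Drupal 7 that is typically done in settings.php with the contrib memcache module (a sketch, assuming the module is installed under sites/all/modules and memcached runs locally on the default port):

```php
// Route Drupal's cache bins to memcached instead of the database.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// The form cache should stay in the database (memcache module's own advice).
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
```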

-seppo
> --
> You received this message because you are subscribed to the Google
> Groups "codership" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to codership-tea...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.


--
http://www.codership.com seppo....@codership.com
tel: +358 40 510 5938 skype: seppo_jaakola


Erik

Mar 26, 2014, 1:20:21 PM
to codersh...@googlegroups.com
Hi Seppo,

Currently, two other Galera cluster setups of ours have been running Drupal successfully for two years (one with MySQL 5.1 and wsrep 2.1, another with MySQL 5.5 and wsrep 23.2).
Also, the errors occur not only on the cache table but on other tables as well. So it seems the new combination with MySQL 5.6 "breaks" our Drupal use case.

eg.
 10920 [ERROR] Slave SQL: Could not execute Delete_rows event on table NIEUWSBERICHTEN_P.field_data_body; Can't find record in 'field_data_body', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 657, Error_code: 1032
2014-03-26 15:10:41 10920 [Warning] WSREP: RBR event 7 Delete_rows apply warning: 120, 6495
2014-03-26 15:10:41 10920 [Warning] WSREP: Failed to apply app buffer: seqno: 6495, status: 1
         at galera/src/trx_handle.cpp:apply():340

Should I downgrade to a MySQL 5.5 installation, then?


On Wednesday, March 26, 2014 at 18:00:59 UTC+1, Seppo Jaakola wrote:

Erik

Mar 26, 2014, 1:24:08 PM
to codersh...@googlegroups.com
Hi Ilias,


All tables in the relevant database are InnoDB, so the error has nothing to do with MyISAM.

And indeed, after the abort, the node fails again in a similar way on another table.

Thanks for your suggestions; do you have any others? ;-)

regards,
Erik

Ilias Bertsimas

Mar 26, 2014, 2:00:18 PM
to codersh...@googlegroups.com
Hi,

About wsrep_replicate_myisam: I would personally disable it, as it is *experimental* and could cause problems down the road.
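That would be a one-line change in the [MYSQLD] section of the config you posted:

```
# Do not replicate MyISAM DML (experimental feature; off by default)
wsrep_replicate_myisam=0
```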

Just to be safe, I would also ask you to disable Drupal caching, as Seppo suggested.

Can you confirm whether you write to both nodes?

Also, can you start over with a fresh dataset on both nodes, if possible, and provide us with the logs when the issue happens again?

Can you look for GRA_xx_xxxxxxxx.log files in the MySQL data dir of the failing node and attach them as well?
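If you want to inspect those files yourself before attaching them: GRA_*.log files are raw replication events without the standard binlog file header, so mysqlbinlog cannot read them as-is. A common workaround (a sketch; the file names and the 120-byte header size for 5.6 are assumptions to verify against your server version) is to prepend a header taken from a binlog created by the same MySQL version:

```
# GRA_*.log lacks the binlog magic bytes + format description event.
# Take the first 120 bytes (the MySQL 5.6 header size) from any binlog
# written by the same server version, then decode the combined file.
head -c 120 /path/to/mysql-bin.000001 > gra.header
cat gra.header GRA_5_3476.log > GRA_5_3476.binlog
mysqlbinlog -vv GRA_5_3476.binlog
```

The decoded output shows which row event failed to apply on the node.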

Kind Regards,

Ilias.

Erik

Mar 28, 2014, 3:52:16 AM
to codersh...@googlegroups.com
Hi,

Only one node receives writes (node1). Node2 fails.

As you suggested, I started from scratch, imported a dump file, and started up Drupal. After logging in to Drupal, the second node had already failed. I attached the log files of both nodes and a GRA*.log file of node 2 (the failing node).


Hopefully you can spot the source of the error, thx !

regards,
Erik

On Wednesday, March 26, 2014 at 19:00:18 UTC+1, Ilias Bertsimas wrote:
node1-mysqld.log
node2-mysqld.log
GRA_5_3476.log

Joe

Mar 28, 2014, 4:03:13 AM
to Erik, codersh...@googlegroups.com
Why not use SST to create a copy of the DB on node2?

Sent from my iPhone

Ilias Bertsimas

Mar 28, 2014, 5:27:52 AM
to codersh...@googlegroups.com
Hi,

It seems this is the table where it fails: "NIEUWSBERICHTEN_P.cache".
Can you provide the schema for that table?

Also, since it keeps failing on that table, which is related to the Drupal cache, I would suggest disabling the cache and retrying from scratch.

Kind Regards,

Ilias.

Erik

Mar 28, 2014, 8:45:51 AM
to codersh...@googlegroups.com
Hi,

Here are the schemas of the tables the node failed on during the last few tests:

CREATE TABLE `cache` (
  `cid` varchar(255) NOT NULL DEFAULT '' COMMENT 'Primary Key: Unique cache ID.',
  `data` longblob COMMENT 'A collection of data to cache.',
  `expire` int(11) NOT NULL DEFAULT '0' COMMENT 'A Unix timestamp indicating when the cache entry should expire, or 0 for never.',
  `created` int(11) NOT NULL DEFAULT '0' COMMENT 'A Unix timestamp indicating when the cache entry was created.',
  `serialized` smallint(6) NOT NULL DEFAULT '0' COMMENT 'A flag to indicate whether content is serialized (1) or not (0).',
  PRIMARY KEY (`cid`),
  KEY `expire` (`expire`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Generic cache table for caching things not separated out...';


CREATE TABLE `sessions` (
  `uid` int(10) unsigned NOT NULL COMMENT 'The users.uid corresponding to a session, or 0 for anonymous user.',
  `sid` varchar(128) NOT NULL COMMENT 'A session ID. The value is generated by Drupal’s session handlers.',
  `ssid` varchar(128) NOT NULL DEFAULT '' COMMENT 'Secure session ID. The value is generated by Drupal’s session handlers.',
  `hostname` varchar(128) NOT NULL DEFAULT '' COMMENT 'The IP address that last used this session ID (sid).',
  `timestamp` int(11) NOT NULL DEFAULT '0' COMMENT 'The Unix timestamp when this session last requested a page. Old records are purged by PHP automatically.',
  `cache` int(11) NOT NULL DEFAULT '0' COMMENT 'The time of this user’s last post. This is used when the site has specified a minimum_cache_lifetime. See cache_get().',
  `session` longblob COMMENT 'The serialized contents of $_SESSION, an array of name/value pairs that persists across page requests by this session ID. Drupal loads $_SESSION from here at the start of each request and saves it at the end.',
  PRIMARY KEY (`sid`,`ssid`),
  KEY `timestamp` (`timestamp`),
  KEY `uid` (`uid`),
  KEY `ssid` (`ssid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Drupal’s session handlers read and write into the...';
/*!40101 SET character_set_client = @saved_cs_client */;

CREATE TABLE `field_data_body` (
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `body_value` longtext,
  `body_summary` longtext,
  `body_format` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `language` (`language`),
  KEY `body_format` (`body_format`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field 2 (body)';
/*!40101 SET character_set_client = @saved_cs_client */;


regards, Erik

Erik

Mar 28, 2014, 1:36:05 PM
to codersh...@googlegroups.com
Hi all,

I downgraded MySQL and galera on the 2 nodes to:
MySQL-server-5.5.29_wsrep_23.7.3-1.rhel6.x86_64.rpm
galera-23.2.4-1.rhel6.x86_64.rpm.

And now I get no errors at all, after a day of testing...

So it seems the above errors are related to MySQL version 5.6 or the newer Galera version...

regards,
Erik