Hello,
I'm facing strange behavior on a MariaDB 5.5 replication setup managed with PRM. I'm wondering whether my corosync/Pacemaker configuration is correct, and whether someone could give me some pointers to resolve the issue.
Here is the scenario. I have this configuration:
node master1 \
    attributes p_mysql_mysql_master_IP="192.168.33.31"
node master2 \
    attributes p_mysql_mysql_master_IP="192.168.33.32"
primitive p_mysql ocf:percona:mysql \
    params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user="replication" replication_passwd="password" max_slave_lag="60" evict_outdated_slaves="false" binary="/usr/sbin/mysqld" test_user="test_user" test_passwd="password" \
    op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="60s"
primitive reader_vip_1 ocf:heartbeat:IPaddr2 \
    params ip="192.168.33.101" nic="eth2" \
    op monitor interval="10s" \
    meta target-role="Started"
primitive reader_vip_2 ocf:heartbeat:IPaddr2 \
    params ip="192.168.33.102" nic="eth2" \
    op monitor interval="10s"
primitive writer_vip ocf:heartbeat:IPaddr2 \
    params ip="192.168.33.100" nic="eth2" \
    op monitor interval="10s"
ms ms_MySQL p_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Started" is-managed="true"
location loc-No-reader-vip-2 reader_vip_2 \
    rule $id="rule-no-reader-vip-2" -inf: readable gt 0
location loc-no-reader-vip-1 reader_vip_1 \
    rule $id="rule-no-reader-vip-1" -inf: readable gt 0
colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master
order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    last-lrm-refresh="1395333299"
property $id="mysql_replication" \
    p_mysql_REPL_INFO="192.168.33.31|mariadb-bin.000190|245" \
    p_mysql_REPL_STATUS="mariadb-bin.000190|245|104857600"
When I start corosync, for a few seconds I can see that the reader VIPs are up, but the writer VIP is not:
reader_vip_1 (ocf::heartbeat:IPaddr2): master1
reader_vip_2 (ocf::heartbeat:IPaddr2): master2
writer_vip (ocf::heartbeat:IPaddr2): Stopped
Then the reader VIPs stop and become unusable, and only the writer VIP is running:
reader_vip_1 (ocf::heartbeat:IPaddr2): Stopped
reader_vip_2 (ocf::heartbeat:IPaddr2): Stopped
writer_vip (ocf::heartbeat:IPaddr2): Started master1
If I look at master2, it is correctly acting as a slave server, so I don't see why the cluster refuses to put a read IP on it.
Both MySQL instances are started and replication seems to be working. However, I do not get any VIP for reads. The logs report a problem in check_slave, even though I have confirmed that replication is correct. Here is a sample of the logs:
Mar 20 21:40:06 master1 mysql[10951]: ERROR: check_slave invoked on an instance that is not a replication slave.
Mar 20 21:40:06 master1 mysql[10951]: WARNING: Attempted to unset the replication master on an instance that is not configured as a replication slave
...
Mar 20 21:40:06 master1 crm_resource: [10916]: info: Invoked: /usr/sbin/crm_resource --list
Mar 20 21:40:06 master1 mysql[10895]: ERROR: check_slave invoked on an instance that is not a replication slave.
Mar 20 21:40:06 master1 lrmd: [10450]: info: operation notify[10] on p_mysql:0 for client 10453: pid 10895 exited with return code 0
Mar 20 21:40:06 master1 crmd: [10453]: info: process_lrm_event: LRM operation p_mysql:0_notify_0 (call=10, rc=0, cib-update=0, confirmed=true) ok
Mar 20 21:40:06 master1 lrmd: [10450]: info: rsc:p_mysql:0 promote[11] (pid 10951)
Mar 20 21:40:06 master1 crm_resource: [10972]: info: Invoked: /usr/sbin/crm_resource --list
Mar 20 21:40:06 master1 mysql[10951]: ERROR: check_slave invoked on an instance that is not a replication slave.
Mar 20 21:40:06 master1 mysql[10951]: WARNING: Attempted to unset the replication master on an instance that is not configured as a replication slave
Mar 20 21:40:06 master1 attrd: [10451]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p_mysql:0
Does anybody have an idea?
Thanks in advance,
Pierre