PRM does not prevent failover to a broken replica

aurelien lemaire

Jul 25, 2014, 6:09:20 AM

to prm-d...@googlegroups.com, hos...@smile.fr

Hi folks,

I hope you're all doing as great as the work you produce.

Short Story
While testing your Percona resource agent, I noticed an odd decision when a replica gets broken (for instance, by a duplicate key error): this slave can still be promoted to Master if the current master crashes (power outage, kernel panic, whatever the reason is). You end up with a database that is missing a lot of data becoming the Master. Ouch!

IMHO, a broken slave should never be promotable until its replication is fixed.

Long story / how to reproduce

Knowing the relative fragility of MySQL replication, we often get integrity differences (thanks, pt-table-checksum, by the way) that sometimes result in a broken replica due to a 1062 duplicate key error.

1- Set up a 2-node Master <-> Slave cluster with Percona 5.5, SBR replication, 1 VIP, corosync 1.4.2, pacemaker 1.1.7, and the Percona RA from today's GitHub branch.
Here is the conf:

pacemaker-prm2:~# crm configure show
node pacemaker-prm1 \
    attributes IP="x.x.x.1" 
node pacemaker-prm2 \
    attributes IP="x.x.x.2" 
primitive p_mysql ocf:percona:mysql3 \
    params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user="MYREPUSER" replication_passwd="MYREPPWD" max_slave_lag="60" evict_outdated_slaves="true" binary="/usr/bin/mysqld_safe" test_user="MYUSER" test_passwd="MYPASSWORD" test_table="test.example" \
    op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="3s" role="Slave" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="300s" \
    op stop interval="0" timeout="300s"
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="x.x.x.3" cidr_netmask="25" nic="eth0" iflabel="vip-mysql" \
    op monitor interval="10s"
ms ms_mysql p_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master"
colocation c_vip_on_master inf: p_vip ms_mysql:Master
order o_ms_mysql_promote_before_p_vip inf: ms_mysql:promote p_vip:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    stonith-enabled="false" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1406209717"
property $id="mysql_replication" \
    p_mysql_REPL_INFO="pacemaker-prm1|mysql-bin.000298|107" \
    p_mysql_REPL_STATUS="mysql-bin.000298|4334|104857600"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"

From here, prm1 is the Master and prm2 is the slave.

2- Break your replication
Note: I use the duplicate key method to break my replication.

pacemaker-prm2 > mysql test -e "insert into example_autoincrement(id,data) values('5','test5')"
pacemaker-prm1 > mysql test -e "insert into example_autoincrement(id,data) values('5','test5')"

3- Notice the broken replication:
MySQL level:

pacemaker-prm2:~# mysql -e "show slave status\G" | egrep "Last_Errno|Last_Error|Seconds_Behind_Master|Running"
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into example_autoincrement(id,data) values('5','test5')'
        Seconds_Behind_Master: NULL
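
The same check can be scripted. The sketch below is only an illustration, not code from the resource agent: `replica_is_healthy` is a hypothetical helper name, and it assumes the status text is piped in exactly as `mysql -e "show slave status\G"` prints it. It treats the replica as healthy only when both replication threads report Yes.

```shell
# Hypothetical helper (not part of the Percona RA).
# Reads "show slave status\G" text on stdin and exits 0 only if both the
# IO thread and the SQL thread report "Yes". Empty input (for example, a
# server that is not a replica at all) also counts as not healthy.
replica_is_healthy() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = ($2 ~ /^Yes/) }
        /^ *Slave_SQL_Running:/ { sql = ($2 ~ /^Yes/) }
        END { exit !(io && sql) }
    '
}

# Usage:
#   mysql -e "show slave status\G" | replica_is_healthy || echo "replication is broken"
```

With the status shown above (Slave_SQL_Running: No), the helper exits non-zero.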

Pacemaker level: crm_mon -A1

Online: [ pacemaker-prm1 pacemaker-prm2 ]

p_vip   (ocf::heartbeat:IPaddr2):       Started pacemaker-prm1
 Master/Slave Set: ms_mysql [p_mysql]
     Masters: [ pacemaker-prm1 ]
     Slaves: [ pacemaker-prm2 ]

Node Attributes:
* Node pacemaker-prm1:
    + IP                                : x.x.x.1
    + master-p_mysql:0                  : 2147483647
    + readable                          : 1
* Node pacemaker-prm2:
    + IP                                : x.x.x.2
    + master-p_mysql:1                  : 0
    + readable                          : 0

Notice that readable on prm2 is now "0". That is good for keeping readers away from the broken replica, but it is not enough to prevent failover to it.

crm_simulate -sL

Allocation scores:
clone_color: ms_mysql allocation score on pacemaker-prm1: 100
clone_color: ms_mysql allocation score on pacemaker-prm2: 0
clone_color: p_mysql:0 allocation score on pacemaker-prm1: INFINITY
clone_color: p_mysql:0 allocation score on pacemaker-prm2: 0
clone_color: p_mysql:1 allocation score on pacemaker-prm1: 0
clone_color: p_mysql:1 allocation score on pacemaker-prm2: 100
native_color: p_mysql:0 allocation score on pacemaker-prm1: INFINITY
native_color: p_mysql:0 allocation score on pacemaker-prm2: 0
native_color: p_mysql:1 allocation score on pacemaker-prm1: -INFINITY
native_color: p_mysql:1 allocation score on pacemaker-prm2: 100
p_mysql:0 promotion score on pacemaker-prm1: INFINITY
p_mysql:1 promotion score on pacemaker-prm2: 0
native_color: p_vip allocation score on pacemaker-prm1: INFINITY
native_color: p_vip allocation score on pacemaker-prm2: -INFINITY

Nothing here prevents the broken slave, which may have lost 4 hours of unreplicated data, from taking the Master role if the Master crashes. Disaster pending here!
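
To keep an eye on just the promotion scores, the relevant lines can be pulled out of the `crm_simulate -sL` output. This is a small sketch, assuming the output format shown above; `promotion_scores` is a made-up helper name.

```shell
# Hypothetical helper: reads `crm_simulate -sL` text on stdin and prints
# "<node> <score>" for every "promotion score" line.
promotion_scores() {
    awk '/ promotion score on / {
        node = $(NF - 1)        # e.g. "pacemaker-prm1:"
        sub(/:$/, "", node)     # drop the trailing colon
        print node, $NF
    }'
}

# Usage:
#   crm_simulate -sL | promotion_scores
```

On the output above, the broken replica shows up with a score of 0 rather than -INFINITY.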

Fix suggestion
Could you please add an automatic colocation or a -INFINITY score on the broken replica?
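
Until the agent does this by itself, a manual stopgap could be a location rule that forbids the Master role on the broken node. The sketch below only prints such a constraint so it can be reviewed before being added with `crm configure`. The helper name and the `l_no_master_on_` id prefix are made up, and the rule syntax should be checked against the installed crm shell version. The constraint has to be removed by hand once replication is repaired, otherwise the node can never be promoted again.

```shell
# Hypothetical helper: prints a crm location constraint that gives the
# Master role a -inf score on one node.
#   $1 = master/slave resource (e.g. ms_mysql), $2 = node name
no_master_constraint() {
    rsc="$1"
    node="$2"
    printf 'location l_no_master_on_%s %s rule $role="Master" -inf: #uname eq %s\n' \
        "$node" "$rsc" "$node"
}

# Usage:
#   no_master_constraint ms_mysql pacemaker-prm2
```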

Overall:
Due to this issue, I cannot use your Percona RA in a prod env.
I am keen to provide any more info, debug output, or tests.

Regards, Aurélien

Yves Trudeau

Jul 29, 2014, 4:05:55 PM

to prm-d...@googlegroups.com

Hi Aurélien,

Good catch. I'll modify the code to set the master score of a node with broken replication to -INF. That will prevent its promotion.
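
As a rough illustration of that idea (not the actual commit), the mapping from replication state to master score could look like the sketch below. The function name and the healthy score of 1000 are assumptions chosen for the example.

```shell
# Hypothetical sketch: map the two replication thread states to a master score.
#   $1 = Slave_IO_Running value, $2 = Slave_SQL_Running value
master_score_for() {
    io_running="$1"
    sql_running="$2"
    if [ "$io_running" = "Yes" ] && [ "$sql_running" = "Yes" ]; then
        # Assumed value for a healthy replica; override with HEALTHY_SCORE.
        printf '%s\n' "${HEALTHY_SCORE:-1000}"
    else
        # Broken replication: make the node ineligible for promotion.
        printf '%s\n' "-INFINITY"
    fi
}

# OCF agents commonly publish such a score with crm_master, for example:
#   crm_master -l reboot -v "$(master_score_for "$io" "$sql")"
# Whether the real fix does it this way is not confirmed here.
```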

Regards,

Yves


aurelien lemaire

Aug 13, 2014, 9:55:22 AM

to prm-d...@googlegroups.com

Hi Yves,

Thanks a bunch for the reply.

I saw your git commit related to my bug report: https://github.com/percona/percona-pacemaker-agents/commit/f9236d59b8d0d748ee646b27e4eb5cae9e3d96d7#diff-76ee4ad3b12ba3471b5273d26f48b51b

I found some other buggy (but less disruptive) behavior in your RA. Would you prefer that I post it here or send it directly to you?

Regards, Aurélien

Yves Trudeau

Aug 13, 2014, 10:39:40 AM

to prm-d...@googlegroups.com

Hi Aurélien,
Here is fine.

Regards,

Yves
