Hi Group.
I have a 5-node cluster that has been working perfectly for the last few weeks. Today one node failed (network issues), which resulted in a failover. The problem is that the writer VIP (writer_vip) did not move: I got an error and the application was not able to write to MySQL. Take a look:
This is the output of crm status during the incident:
[root@server01 ~]# crm status
============
Last updated: Wed Jun 11 07:23:23 2014
Last change: Wed Jun 11 07:11:42 2014 via crm_attribute on server02.domain
Stack: openais
Current DC: server05.domain - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
5 Nodes configured, 5 expected votes
9 Resources configured.
============
Online: [ server01.domain server02.domain server03.domain server04.domain server05.domain ]

 Master/Slave Set: ms_MySQL [p_mysql]
     Masters: [ server02.domain ]
     Slaves: [ server01.domain server05.domain server03.domain server04.domain ]
 reader_vip_1 (ocf::heartbeat:IPaddr2): Started server02.domain
 reader_vip_2 (ocf::heartbeat:IPaddr2): Started server04.domain
 reader_vip_3 (ocf::heartbeat:IPaddr2): Started server03.domain
 writer_vip (ocf::heartbeat:IPaddr2): Started server01.domain (unmanaged) FAILED

Failed actions:
    writer_vip_stop_0 (node=server01.domain, call=1002, rc=-2, status=Timed Out): unknown exec error
    p_mysql:0_monitor_2000 (node=server01.domain, call=1009, rc=-2, status=Timed Out): unknown exec error

Any idea why this happened, and how can I avoid it next time?
Thanks.
Alon