Correct defaults for start, stop timeouts and monitor timeout and intervals

Kumar Pandey

Aug 17, 2017, 3:25:28 AM

to PRM-discuss

Hello,

I am new to PRM and am trying to build a setup that I can validate for production: one master and two slaves on EC2. I have been following the PRM setup guide and it has been very useful. However, I have a question about the start and stop action timeouts and the monitor action's intervals and timeouts. The default values for these, as given in the RA code, are:

<action name="start" timeout="120" />
<action name="stop" timeout="120" />
<action name="monitor" role="Master" depth="0" timeout="30" interval="10" />
<action name="monitor" role="Slave" depth="0" timeout="30" interval="30" />

However, the PRM setup guide gives the following sample declaration (https://github.com/Percona-Lab/pacemaker-replication-agents/blob/master/doc/PRM-setup-guide.rst#the-mysql-resource-primitive):

primitive p_mysql ocf:percona:mysql \
      ...
      op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
      op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1" \
      op start interval="0" timeout="60s" \
      op stop interval="0" timeout="60s"

The monitor intervals in this example seem very aggressive compared to the defaults, and the start and stop timeouts are only half as long. So my question is whether there are any good recommendations for these values. What values do people typically use in production, especially on EC2? Are there any tests that I should run to determine the right intervals?

P.S. I set the corosync token timeout to 10 seconds instead of 1 second, because the cluster used to fail over even on momentary network latency surges.

Thanks in advance!

Yves Trudeau

Aug 18, 2017, 4:18:15 PM

to prm-d...@googlegroups.com

Hi Kumar,

The start and stop timeouts relate to the following issues:

Start: upon start-up, MySQL may have to perform InnoDB recovery. If you have very large InnoDB log files, that may take a long time and you may need a bigger start timeout. Start-up can also take a long time when a database has hundreds of thousands of tables.

Stop: the stop time is also related to the InnoDB log file size. If you have a server with a lot of memory, slow storage and large InnoDB log files, MySQL may have to flush a large number of dirty InnoDB pages when it stops. Again, the default stop timeout (60s in the setup guide example, 120s in the RA) may not be sufficient.

In most cases, MySQL starts and stops in a few seconds, so 60s is more than sufficient.
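
As a rough illustration only, a server with large InnoDB log files could keep the setup guide's declaration and simply raise the two timeouts. The 300s values below are assumptions for the sake of the example, not recommendations; size them from how long your own server actually takes to recover and to shut down.

primitive p_mysql ocf:percona:mysql \
      ...
      op start interval="0" timeout="300s" \
      op stop interval="0" timeout="300s"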

--
You received this message because you are subscribed to the Google Groups "PRM-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prm-discuss+unsubscribe@googlegroups.com.
To post to this group, send email to prm-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/prm-discuss.
For more options, visit https://groups.google.com/d/optout.

Vaibhaw Pandey

Aug 18, 2017, 11:20:43 PM

to prm-d...@googlegroups.com

Thanks for replying Yves! :)

It sounds like we should keep an eye on the start and stop timeouts and adjust them as our database grows.

Could you also shed some light on the monitoring intervals, please? What values are close to optimal on a cross-AZ EC2 cluster? Are the default values better, or would the aggressive values of 2 and 5 seconds be better? I realize that it depends on one's fault tolerance level, but we certainly want to avoid being too aggressive. What values do people generally use in production environments?

Thanks
