I've written a custom condition that tries to make a request via
ActiveRecord to a MySQL server, and restarts the server if it throws
an exception, in order to detect overloading problems which were
causing our DB to grind to a halt without actually stopping.
I also have a flapping condition in a lifecycle block to stop
monitoring the server if it keeps failing, as per the usual examples.
(See attached file mysql.god)
As I understand it, a condition like the one I used:
    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 10.minutes
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
should cause the server to go unmonitored if it is started or
restarted 5 times in 10 minutes.
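
To spell out how I'm reading the times/within settings, here is a
plain-Ruby sketch of the rule. It is only my assumption about the
intended behaviour, not god's actual implementation; FlapWindow and
its method names are made up for illustration, and I've written the
window in seconds (600) rather than 10.minutes so it runs without god
loaded.

```ruby
# Sketch of the flapping rule as I understand it (NOT god's real code).
# FlapWindow is a hypothetical name used only for this illustration.
class FlapWindow
  def initialize(to_states:, times:, within:)
    @to_states = to_states # states that count as a flap, e.g. [:start, :restart]
    @times     = times     # number of counted transitions that triggers the rule
    @within    = within    # window length in seconds
    @stamps    = []        # timestamps of counted transitions still in the window
  end

  # Record a transition into `state` at time `at` (in seconds).
  # Returns true once `times` counted transitions fall inside the window.
  def record(state, at)
    return false unless @to_states.include?(state)

    @stamps << at
    # Drop transitions that are older than the window.
    @stamps.reject! { |t| at - t > @within }
    @stamps.size >= @times
  end
end

# Five restarts one minute apart: the fifth should trip the rule.
flap = FlapWindow.new(to_states: [:start, :restart], times: 5, within: 600)
results = (0...5).map { |i| flap.record(:restart, i * 60) }
```

On that reading, the fifth restart inside the window is the point at
which I'd expect the watch to move to :unmonitored.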
However, if I simulate a problem with the server by removing the table
it's checking, which isn't fixed by restarting the server, it
continues to monitor (and restart) indefinitely. Looking at the log
(see attached god.log) I can see the following line at the appropriate
time:
I [2010-03-21 11:44:20] INFO: mysqld auto-reenable monitoring in 600 seconds
but as you can see from the log, the monitoring and restarting
continues at the same rate.
(By the way, ignore the mailer warnings, no MTA on the box I used.)
Does anybody know what I'm doing wrong? Has anyone got a working
example of flapping detection via a custom condition?
Many thanks!
Andrew.
--
:: http://biotext.org.uk/ ::