Can't get flapping detection to work properly with custom condition

Andrew Clegg

Mar 21, 2010, 8:43:42 AM

to god...@googlegroups.com

Hi,

I've written a custom condition that makes a request via ActiveRecord
to a MySQL server and restarts the server if the request raises an
exception. The aim is to detect overloading problems, which were
causing our DB to grind to a halt without actually stopping.

I also have a flapping condition in a lifecycle block to stop
monitoring the server if it keeps failing, as per the usual examples.

(See attached file mysql.god)

As I understand it, a condition like the one I used:

  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 10.minutes
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end

should cause the server to go unmonitored if it is started or
restarted 5 times in 10 minutes.
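
That reading can be sketched in plain Ruby as a sliding window over
transition timestamps. This is only a model of the behaviour described
above, not god's actual code; FlapWindow and record are hypothetical
names, and times are plain seconds rather than 10.minutes.

```ruby
# Hypothetical model of the flapping rule as I read it; NOT god's own code.
class FlapWindow
  def initialize(times:, within:)
    @times = times    # number of transitions that counts as flapping
    @within = within  # window length in seconds
    @timeline = []    # timestamps of recent start/restart transitions
  end

  # Record a transition at time `now` (in seconds) and report whether the
  # threshold has been reached inside the window.
  def record(now)
    @timeline << now
    # Drop transitions that have fallen out of the window.
    @timeline.reject! { |t| now - t > @within }
    @timeline.size >= @times
  end
end

# Five restarts, one minute apart, with times = 5 and within = 600 seconds.
window = FlapWindow.new(times: 5, within: 600)
results = [0, 60, 120, 180, 240].map { |t| window.record(t) }
# results => [false, false, false, false, true]
```

Under this model the fifth restart inside the window is the point at
which I would expect the transition to :unmonitored.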

However, if I simulate a problem by removing the table the condition
checks (a failure that restarting the server doesn't fix), god
continues to monitor (and restart) the server indefinitely. Looking at
the log (see attached god.log) I can see the following line at the
appropriate time:

I [2010-03-21 11:44:20] INFO: mysqld auto-reenable monitoring in 600 seconds

but as you can see from the log, the monitoring and restarting
continues at the same rate.

(By the way, ignore the mailer warnings, no MTA on the box I used.)

Does anybody know what I'm doing wrong? Has anyone got a working
example of flapping detection via a custom condition?

Many thanks!

Andrew.

--
:: http://biotext.org.uk/ ::

god.log

mysql.god
