I am using the following Monit script. Besides the usual checks for
CPU load and memory usage, I have created a web page called alive;
all it does is run a very simple select against the database. Every 30
seconds Monit checks whether that link is still OK. If the check
fails twice in a row, Monit restarts Apache and I receive an alert
message. It is a very basic check, but so far it has served its
purpose. Our hardware load balancer checks the same page.
On my database server I use Monit as well to check the stats of MySQL.
Hope this helps,
Marco
PS One thing that I still need to add to my Monit script is an
automatic action for when one of my Ruby processes goes to 100% CPU,
which happens maybe once every two weeks. At the moment I still kill
that process manually.
##############################################################################
## Monit control file
###############################################################################
set daemon 30
set mailserver localhost
set logfile /var/log/monit.log
set alert <your e-mail address>
set httpd port 2812
    allow localhost
    allow <your ip address>
    allow <your user-name>:<your password>

check system localhost
    if loadavg (1min) > 8 then alert
    if loadavg (5min) > 4 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% for 8 cycles then alert
    if cpu usage (system) > 30% for 8 cycles then alert
    if cpu usage (wait) > 20% for 8 cycles then alert

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 20 for 8 cycles then alert
    if failed url http://localhost/alive
        and content == 'Site is alive!'
        for 2 cycles
    then restart
    group server
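
For the runaway Ruby process mentioned in the PS, a check along these
lines might do the job. This is only an untested sketch: the service
name ruby_app, the pid file path, the start/stop commands and the
group name are placeholders, and the 90% threshold and cycle counts
are guesses that need tuning. Depending on the Monit version and the
number of cores, the reported CPU figure may be scaled, so a process
pinning one core will not necessarily show up as 100%.

```monit
## Hypothetical check for one Ruby process (all names are placeholders)
check process ruby_app with pidfile <path to your Ruby pid file>
    start program = "<your start command>"
    stop program = "<your stop command>"
    # Restart instead of killing by hand when the process stays busy
    if cpu > 90% for 4 cycles then restart
    # Stop trying if restarts keep failing, to avoid a restart loop
    if 5 restarts within 5 cycles then timeout
    group ruby
```

With "set daemon 30", 4 cycles means the process has to stay above
the threshold for about two minutes before Monit restarts it.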