God Stops Monitoring Processes but Doesn't Die.


Linda

Nov 10, 2009, 5:10:55 PM
to god.rb
I have a weird issue that I'm trying to track down the solution for.
We have god monitoring memory and cpu usage on mongrel_rails
processes. When god starts up it does its job, but at some point it
stops monitoring the processes and the god process doesn't die.

God processes that have been running since Oct 31st have already
stopped monitoring, and no events from god are in the retained syslog
files. I'm just now tracking this down, and we only keep our syslog
files for a week.
I've got one god process that has been running since the evening of
the 3rd so hopefully the monitoring will kick the bucket soon. It
would be helpful if there is a triggering event logged.

We are running an old version of god (0.7.6), and I've upgraded to a
newer version (0.7.18) on one of our testing servers to see if it will
exhibit the same behavior.

Has anyone run into this before?


Here's the config file we're running:

mongrel_list = Array.new
Dir["/etc/sv/mongrel-*"].each do |file|
  file =~ /mongrel-(.+)-(\d+)/
  group = $1
  port = $2
  mongrel = Hash.new
  mongrel[:name] = "#{group}-#{port}"
  mongrel[:start] = "/usr/bin/sv start #{file}"
  mongrel[:stop] = "/usr/bin/sv stop #{file}"
  mongrel[:restart] = "/usr/bin/sv restart #{file}"
  mongrel[:pidfile] = File.join(file, "supervise", "pid")
  mongrel[:port] = port
  mongrel_list << mongrel
end

mongrel_list.each do |mongrel|
  God.watch do |w|
    w.name = mongrel[:name]
    w.interval = 30.seconds # default
    w.start = mongrel[:start]
    w.stop = mongrel[:stop]
    w.restart = mongrel[:restart]
    w.start_grace = 10.seconds
    w.restart_grace = 10.seconds
    w.pid_file = mongrel[:pidfile]

    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 150.megabytes
      end

      restart.condition(:cpu_usage) do |c|
        c.above = 50.percent
        c.times = 5
      end
    end
  end
end
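
One thing that may be worth ruling out is a service directory whose name does not fit the mongrel-<group>-<port> pattern, since the loop above would then build a watch named "-" with an empty port. Below is a minimal sketch of a stricter version of the directory parsing, written in plain Ruby so it can be checked outside of god. The helper names parse_mongrel_dir and mongrel_list_from are hypothetical (they are not part of god), and matching on the directory's basename with anchors is an assumption that is slightly stricter than the original regex. This is not a confirmed cause of the monitoring problem, only a way to eliminate one variable.

```ruby
# Hypothetical helper (not part of god): turn a runit service directory
# such as "/etc/sv/mongrel-app-8000" into the settings hash used by the
# config above. Returns nil when the name does not match the pattern.
def parse_mongrel_dir(dir)
  # Assumption: match only the last path component, anchored at both ends.
  match = File.basename(dir).match(/\Amongrel-(.+)-(\d+)\z/)
  return nil unless match

  group, port = match.captures
  {
    :name    => "#{group}-#{port}",
    :start   => "/usr/bin/sv start #{dir}",
    :stop    => "/usr/bin/sv stop #{dir}",
    :restart => "/usr/bin/sv restart #{dir}",
    :pidfile => File.join(dir, "supervise", "pid"),
    :port    => port
  }
end

# Hypothetical helper: build the list, dropping directories that do not parse.
def mongrel_list_from(dirs)
  dirs.map { |dir| parse_mongrel_dir(dir) }.compact
end
```

In the config, mongrel_list = mongrel_list_from(Dir["/etc/sv/mongrel-*"]) would replace the first loop; the God.watch loop stays the same.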