"live" CPU monitoring with god


pogodan

Dec 6, 2011, 3:46:50 PM
to god.rb
We're trying to use God to monitor some processes that occasionally
crash and peg the CPU. In top we see CPU% go to 100 or 200%, but God
still reports usage of only about 15-20%.

Poking through the code and some Linux forums, it seems the CPU stats
returned by `ps` are averaged over the life of the process, while
calculating realtime stats requires polling, waiting an interval, and
re-polling to compare the "jiffies" used over that interval. It looks
like both God's SlashProcPoller and PortablePoller are basically
calculating the historical CPU use, which is not very helpful for
processes that only occasionally crash.
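
To make the poll / wait / re-poll idea concrete, here is a minimal sketch in plain Ruby. It is not part of God: the method names are hypothetical, and HERTZ = 100 is an assumption (the usual jiffies-per-second value on Linux, but it can differ).

```ruby
# Sketch only, not God's API. All names here are hypothetical.
HERTZ = 100 # assumed jiffies per second; check your system's value

# Sum utime + stime (fields 14 and 15) from one /proc/<pid>/stat line.
def cpu_jiffies(stat_line)
  # The command name (field 2) may contain spaces, so split only the
  # part after the closing parenthesis. That part starts at field 3.
  fields = stat_line[(stat_line.rindex(')') + 2)..-1].split
  fields[11].to_i + fields[12].to_i # field n sits at index n - 3
end

# CPU percent used between two samples taken elapsed_seconds apart.
def interval_cpu_percent(jiffies_before, jiffies_after, elapsed_seconds, hertz = HERTZ)
  return 0.0 if elapsed_seconds <= 0
  (jiffies_after - jiffies_before) * 100.0 / (hertz * elapsed_seconds)
end

# Poll, wait, re-poll. Linux only, since it reads /proc.
def sample_cpu_percent(pid, interval = 1.0)
  before = cpu_jiffies(File.read("/proc/#{pid}/stat"))
  sleep interval
  after = cpu_jiffies(File.read("/proc/#{pid}/stat"))
  interval_cpu_percent(before, after, interval)
end
```

Calling sample_cpu_percent(Process.pid) on a Linux box should return a number close to what top shows for that process, and it can exceed 100 for a multi-threaded process on several cores.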

Has anyone taken a stab at writing a "realtime" CPU checker?

-
Paul | http://pogodan.com

captainf

Dec 13, 2011, 6:29:23 AM
to god.rb
Here is a small patch to SlashProcPoller
implementing the strategy you mention.

The CPU usage is still averaged, but only over the polling
interval set on the cpu_usage condition.
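
To illustrate that remaining averaging, here is a small arithmetic sketch. The helper name is hypothetical and the numbers are made up for illustration: a process that is pegged for only part of a polling interval reports the average over the whole interval.

```ruby
# Hypothetical helper: CPU percent as seen over one polling interval,
# given how many seconds of CPU time were used inside that interval.
def averaged_cpu_percent(busy_seconds, interval_seconds)
  return 0.0 if interval_seconds <= 0
  busy_seconds * 100.0 / interval_seconds
end

averaged_cpu_percent(5, 30) # about 16.7: pegged for 5 s of a 30 s interval
averaged_cpu_percent(5, 5)  # 100.0: pegged for the whole 5 s interval
```

So a shorter interval on the condition reacts faster to a spike, at the cost of polling more often.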

Good luck, and thanks for pointing out the issue.
With the current cpu_usage implementation, it really takes a long time
for a long-running process to get noticed if it misbehaves on CPU usage.
To test, put the code in a file on your god config's include path and
require the file in your config.

module God
  module System
    class SlashProcPoller < PortablePoller
      # The poller object is recreated for each poll, so the
      # bookkeeping must be done at class level.
      @@last_cpu_stats = {} # pid => {:run_time, :total_time}

      def percent_cpu
        stats = stat
        total_time = stats[:utime].to_i + stats[:stime].to_i # in jiffies
        run_time = uptime - stats[:starttime].to_i / @@hertz # in seconds
        if run_time == 0
          0
        else
          # On the first poll there is no previous sample, so the deltas
          # fall back to the totals over the life of the process.
          last = (@@last_cpu_stats[@pid] ||= {:run_time => 0, :total_time => 0})

          delta_run_time = run_time - last[:run_time]
          delta_total_time = total_time - last[:total_time]

          last[:run_time] = run_time
          last[:total_time] = total_time

          # No time has passed since the last sample: avoid dividing by zero.
          return 0 if delta_run_time <= 0

          ((delta_total_time * 1000 / @@hertz) / delta_run_time) / 10
        end
      rescue # This shouldn't fail if there's an error (or proc doesn't exist)
        0
      end
    end
  end
end
