God event-based monitoring does not restart process when it dies


Mike Fellows

May 3, 2013, 8:38:07 PM
to god...@googlegroups.com
Apologies if the answer to this is obvious, but I am having trouble getting event-based monitoring to work on either OS X 10.7.5 or a RHEL Linux variant (the latest AWS Linux AMI).

In both cases, god check reports success for event-based monitoring, and on Linux I have confirmed that the kernel was compiled with the CONFIG_PROC_EVENTS option set to 'y'. Below is the god check output from OS X. On Linux, god check returns the same output, except that it uses the 'netlink' event system.

$ rvmsudo god check
using event system: kqueue
starting event handler
forking off new process
forked process with pid = 1504
killing process
[ok] process exit event received
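On Linux, the CONFIG_PROC_EVENTS check mentioned above can be scripted by searching the running kernel's build config for the line `CONFIG_PROC_EVENTS=y`. This is a sketch under assumptions: it expects the config at `/boot/config-<release>` (typical on RHEL-style systems, but some kernels only expose `/proc/config.gz`), and the helper names are hypothetical.

```ruby
require "etc"

# Hypothetical helper: true if the kernel config text enables process events.
def proc_events_enabled?(config_text)
  config_text.each_line.any? { |line| line.strip == "CONFIG_PROC_EVENTS=y" }
end

# Assumption: the running kernel's config is installed as /boot/config-<release>.
def running_kernel_config_path
  "/boot/config-#{Etc.uname[:release]}"
end

if __FILE__ == $PROGRAM_NAME
  path = running_kernel_config_path
  if File.readable?(path)
    state = proc_events_enabled?(File.read(path)) ? "enabled" : "not enabled"
    puts "#{path}: CONFIG_PROC_EVENTS #{state}"
  else
    puts "#{path} not readable; the config may be at /proc/config.gz instead"
  end
end
```

This only confirms the option was compiled in. It says nothing about whether god's netlink handler actually receives events at runtime.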


I've reduced the problem to a very simple test case, a simplification of the example from http://godrb.com. I also tried the full sample file from the site, with no luck. Here is the simplified god file.

root = File.dirname(File.expand_path(__FILE__))

God.watch do |w|
  w.name     = "simple_test"
  w.interval = 30.seconds
  w.start    = "#{root}/simple_test.rb"
  w.log = "#{root}/simple_test.log"
  w.uid = 'mike'
  w.dir = root

  # determine the state on startup
  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_exits)
  end
end
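To make the intended flow explicit, here is a toy Ruby model of the transitions this god file declares. The method name `next_state` is hypothetical, this is not god's driver, and the `tries` failsafe is left out. It highlights that once the watch is `up`, only a `process_exits` event moves it back to `start`. If that event never arrives, the watch simply stays `up`.

```ruby
# Toy model of the transitions declared in the god file above.
# Hypothetical: this is not god's driver, and the :tries failsafe is omitted.
def next_state(state, running:, exit_event: false)
  case state
  when :init
    running ? :up : :start      # process_running with running = true
  when :start, :restart
    running ? :up : state       # stay until the process is seen running
  when :up
    exit_event ? :start : :up   # only a process_exits event leaves :up
  else
    raise ArgumentError, "unknown state: #{state.inspect}"
  end
end

# Replay of the debug log: not running at init, then running after start.
state = next_state(:init, running: false)  # => :start
state = next_state(state, running: true)   # => :up
# After a kill with no exit event delivered, the model stays in :up.
state = next_state(state, running: false)  # => :up
```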


And here is the output from god when it is started in foreground mode and the god file is loaded.

$ rvmsudo god -D --no-syslog --log-level debug
I [2013-05-03 17:28:13]  INFO: Syslog disabled.
I [2013-05-03 17:28:13]  INFO: Using pid file directory: /var/run/god
I [2013-05-03 17:28:13]  INFO: Started on drbunix:///tmp/god.17165.sock
I [2013-05-03 17:28:40]  INFO: simple_test Loaded config
I [2013-05-03 17:28:40]  INFO: simple_test move 'unmonitored' to 'init'
D [2013-05-03 17:28:40] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007f806081d570> in 0 seconds
I [2013-05-03 17:28:40]  INFO: simple_test moved 'unmonitored' to 'init'
I [2013-05-03 17:28:40]  INFO: simple_test [trigger] process is not running (ProcessRunning)
D [2013-05-03 17:28:40] DEBUG: simple_test ProcessRunning [false] {true=>:up, false=>:start}
I [2013-05-03 17:28:40]  INFO: simple_test move 'init' to 'start'
I [2013-05-03 17:28:40]  INFO: simple_test start: /Users/mike/code/STAT-HQ/test/god/simple_test/simple_test.rb
D [2013-05-03 17:28:40] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x007f806081fc58> in 0 seconds
D [2013-05-03 17:28:40] DEBUG: driver schedule #<God::Conditions::Tries:0x007f806081f8e8> in 0 seconds
I [2013-05-03 17:28:40]  INFO: simple_test moved 'init' to 'start'
D [2013-05-03 17:28:40] DEBUG: simple_test ProcessRunning [true] {true=>:up}
I [2013-05-03 17:28:40]  INFO: simple_test move 'start' to 'up'
I [2013-05-03 17:28:40]  INFO: simple_test registered 'proc_exit' event for pid 1659
I [2013-05-03 17:28:40]  INFO: simple_test moved 'start' to 'up'


If I then kill the process that god is supposed to be monitoring, god produces no output and simple_test.rb is not restarted. Here is the terminal session, from loading the god file through killing the process.

sparrow:simple_test mike$ rvmsudo god load simple_test.god 
Sending 'load' command with action 'leave'

The following tasks were affected:
  simple_test
$ ps -ef | grep simple_test.rb
  501  1659     1   0  5:28pm ??         0:00.03 /usr/bin/ruby /Users/mike/code/STAT-HQ/test/god/simple_test/simple_test.rb
  501  1663   875   0  5:28pm ttys004    0:00.00 grep simple_test.rb
$ kill 1659
$ ps -ef | grep simple_test.rb
  501  1665   875   0  5:28pm ttys004    0:00.00 grep simple_test.rb
$ ps -ef | grep simple_test.rb
  501  1667   875   0  5:29pm ttys004    0:00.00 grep simple_test.rb
$ ps -ef | grep simple_test.rb
  501  1669   875   0  5:30pm ttys004    0:00.00 grep simple_test.rb
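The `ps` output shows that the watched process has parent PID 1, so god is not its parent and cannot simply `wait` for it. That is presumably why god relies on kqueue or netlink exit events here. Without those events, an outside observer can only poll. Below is a minimal sketch of the same existence check as the `ps | grep` runs above, using `Process.kill(0, pid)`, which delivers no signal but raises `Errno::ESRCH` when no such process exists. The helper name `pid_alive?` is hypothetical.

```ruby
# Hypothetical helper: true if a process with this pid currently exists.
# Signal 0 runs the kernel's existence/permission checks without delivering anything.
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

if __FILE__ == $PROGRAM_NAME
  pid = Integer(ARGV.fetch(0, Process.pid.to_s))
  puts "#{pid}: #{pid_alive?(pid) ? 'running' : 'gone'}"
end
```

For example, `ruby pid_alive.rb 1659`. If the process is your own unreaped child, a zombie still passes this check, so reap it with `Process.wait` first.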


I've read all the documentation and gone through some of god's source code, but I haven't found any obvious clues. Does my god file look reasonable? Does anyone have suggestions for other troubleshooting steps?

As a postscript: I can get god to monitor and restart processes using polling, but I would prefer to use the event-based technique.
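For comparison, polling amounts to periodically checking whether the process is still there and starting it again if not. Below is a hypothetical plain-Ruby watchdog that illustrates the idea. `poll_and_restart` and its parameters are made up for illustration, and this is not how god's poll conditions are implemented. Because the watchdog spawns the command itself, it can reap the child with a non-blocking `Process.wait(pid, Process::WNOHANG)`.

```ruby
# Hypothetical polling watchdog: starts `cmd`, checks every `interval`
# seconds whether it has exited, and restarts it up to `max_restarts` times.
# Returns the number of restarts performed.
def poll_and_restart(cmd, interval: 1.0, max_restarts: 3)
  restarts = 0
  pid = Process.spawn(*cmd)
  loop do
    sleep interval
    # WNOHANG: returns nil immediately if the child is still running.
    next unless Process.wait(pid, Process::WNOHANG)
    break if restarts >= max_restarts
    restarts += 1
    warn "process #{pid} exited; restarting (#{restarts}/#{max_restarts})"
    pid = Process.spawn(*cmd)
  end
  restarts
end
```

With a long-running command, the loop keeps polling for as long as the process stays up. For a process you did not spawn, an existence check with `Process.kill(0, pid)` would take the place of the `wait` call.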

Thanks for any help.

Regards,
Mike

Prathan Thananart

Jan 6, 2014, 6:51:09 AM
to god...@googlegroups.com, mike.f...@shaw.ca
Hi Mike,

I wasted a whole day tracking down what I believe is the same bug. This is the only discussion thread about the issue that I could find, so I'm posting my findings here.

If you start god without specifying a config file, the events module doesn't initialize properly, and god isn't notified when the watched processes exit.

The fix is as simple as adding `-c <config file>` to the command line you use to start god. I'll leave it to people who understand god's internals to explain why this is the case.
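Applied to the commands at the start of the thread, this means passing the god file when god starts instead of loading it afterwards with `god load`. Below is a small Ruby sketch that builds that invocation. The helper name `god_command` is hypothetical, and the flags are the ones that appear in this thread. Prefix the command with `rvmsudo` or `sudo`, as in the original commands.

```ruby
# Hypothetical helper: builds the argv for starting god in the foreground
# with the config file supplied up front via -c.
def god_command(config_path, debug: true)
  argv = ["god", "-c", File.expand_path(config_path), "-D"]
  argv += ["--no-syslog", "--log-level", "debug"] if debug
  argv
end

if __FILE__ == $PROGRAM_NAME
  cmd = god_command("simple_test.god")
  puts cmd.join(" ")
  # exec(*cmd)  # uncomment to replace this Ruby process with god
end
```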

Prathan

P.S.

$ sudo god -V
Version: 0.13.3
Polls: enabled
Events: netlink
