god should clean up the PID file when you stop a watch

Stephen George

Feb 21, 2014, 1:03:48 PM
to god...@googlegroups.com
Let's start with an example that has nothing to do with god. When you stop a service such as crond, its PID file is removed as well:

$ cat /var/run/crond.pid
32484

$ service crond status
crond (pid  32484) is running...

$ service crond stop
Stopping crond:  [  OK  ]

$ cat /var/run/crond.pid
cat: /var/run/crond.pid: No such file or directory

In contrast, god does not do this for your watches. It also doesn't clean up /var/run/god.pid when god itself is shut down cleanly.

There are a few important benefits to cleaning up a PID file when a planned stop/restart occurs:
  1. Process IDs come from a finite space and are recycled after a process exits. Just because process ID "123" appears both in the process table and in a stale PID file doesn't mean that our original service is still running. In other words, checking for your process via `ps -p $(cat /var/run/my-service.pid)` is only a _weak_ guarantee that the process you expected is still running, unless the process has been continuously monitored by god.
  2. The current behavior relies heavily on the assumption that you never restart god or unmonitor watches. If you do, god's re-init is forced to rely on the _weak_ guarantee above to convince itself that watches were not killed while it was away.
  3. If the absence of a PID file reliably signals a planned stop, then requests like #156 become easy to implement.
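
To make points 1 and 3 concrete, here is a small Ruby sketch of how a supervisor could interpret a PID file on re-init, assuming PID files were always removed on a planned stop. The method name `pid_file_state` and its return symbols are hypothetical illustrations, not part of god. `Process.kill(0, pid)` is Ruby's equivalent of the `ps -p` check and has the same weakness.

```ruby
# Hypothetical helper (not god's API): classify what a PID file tells a
# supervisor that is re-examining a watch, e.g. after god was restarted.
# Assumes the convention proposed here: PID files are removed on planned stops.
def pid_file_state(pid_file)
  # No PID file: under the proposed convention, this means a planned stop.
  return :stopped_cleanly unless File.exist?(pid_file)

  pid = Integer(File.read(pid_file).strip)
  begin
    # Signal 0 only checks whether the PID exists; nothing is delivered.
    Process.kill(0, pid)
    # Some process holds this PID, but it may be an unrelated process
    # that received a recycled PID: only a weak guarantee (point 1).
    :maybe_running
  rescue Errno::ESRCH
    # No such process, yet the file is still there: the service died
    # without a planned stop, i.e. it crashed or was killed.
    :died_unexpectedly
  rescue Errno::EPERM
    # The PID exists but belongs to another user; still only a weak signal.
    :maybe_running
  end
end
```

Note that `:maybe_running` is exactly the weak case from point 1: the check cannot tell whether the process holding that PID is the original service. Only the other two outcomes are unambiguous, and `:stopped_cleanly` is only trustworthy if every planned stop removes the file.
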

`w.behavior(:clean_pid_file)` is similar, but it cleans up the PID file during start-up. We need an equivalent behavior for stop. Unfortunately, `:clean_pid_file` is such a generic name that adding a second, similarly named behavior would certainly lead to confusion.

Personally, I can't think of a case where you _wouldn't_ want to clean up the PID file upon a planned "stop" action. That's why I'm advocating that PID file cleanup on stop be an inherent behavior rather than an option.
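
A minimal sketch of what "stop, then clean up the PID file" could look like, written as a plain Ruby helper rather than god's actual stop path. The method name `stop_and_clean_pid_file`, the 5-second default timeout, and the TERM-then-poll strategy are all assumptions for illustration, not god's API.

```ruby
# Hypothetical helper (not god's API): stop the process named in a PID file,
# wait for it to exit, then remove the file. Returns true once the process
# is gone and the file is removed, or false if it outlived the timeout.
def stop_and_clean_pid_file(pid_file, timeout: 5.0)
  return true unless File.exist?(pid_file) # nothing to stop

  pid = Integer(File.read(pid_file).strip)
  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    # Already dead: fall through and remove the stale file.
  end
  # Other errors (e.g. Errno::EPERM, no permission to signal) propagate.

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      Process.kill(0, pid) # still alive?
    rescue Errno::ESRCH
      break # the process has exited
    end
    # Process ignored TERM: leave the PID file in place and report failure.
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep 0.05
  end

  # Only delete after the process is confirmed gone.
  File.delete(pid_file) if File.exist?(pid_file)
  true
end
```

Removing the file only after the process is confirmed gone keeps the convention honest: a missing PID file then really does mean a completed, planned stop, which is what point 3 relies on.
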

But I'll open this up to the group: do you agree that PID file cleanup on stop should be 1) available and 2) an inherent behavior for all watches that have a `w.pid_file` property? Thanks for your thoughts!