god should clean PID file when you stop a watch


Stephen George

Feb 21, 2014, 1:03:48 PM

to god...@googlegroups.com

Let's start with an example that has nothing to do with god... When you stop a service, such as crond, its pid file is also removed:

$ cat /var/run/crond.pid

32484

$ service crond status

crond (pid 32484) is running...

$ service crond stop

Stopping crond: [ OK ]

$ cat /var/run/crond.pid

cat: /var/run/crond.pid: No such file or directory

In contrast, god does not do this for your watches. It also doesn't clean up its own /var/run/god.pid when you shut it down cleanly.

There are a few important benefits to cleaning up a PID file when a planned stop/restart occurs:

- Process IDs come from a finite space and are recycled after a process ends. Just because process ID "123" exists both in the process table and in a stale PID file doesn't mean that our original service is still running. In other words, checking for your process via `ps -p $(cat /var/run/my-service.pid)` is a _weak_ guarantee that the process you expected is still running, unless god has been monitoring it continuously.
- The current behavior relies heavily on the idea that you won't restart god or unmonitor watches. If you do, god's re-init is forced to rely on the _weak_ guarantee above to assure itself that watches were not killed while it was away.
- If we can correlate the absence of a PID file with a planned down/stop, then it makes requests like #156 easy.
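
To make the _weak_ guarantee concrete, here is a rough Ruby sketch of the check a leftover PID file forces on you. The helper name `pid_file_claims_running?` is made up for illustration and is not part of god. It only assumes the PID file holds a single integer.

```ruby
# Hypothetical helper (not part of god): returns true when the PID recorded
# in pid_file belongs to *some* live process. It cannot tell whether that
# process is the service that originally wrote the file.
def pid_file_claims_running?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  return false unless pid > 0   # 0 and negatives would address process groups

  Process.kill(0, pid)          # signal 0 only probes; nothing is delivered
  true
rescue Errno::ENOENT            # no PID file at all
  false
rescue ArgumentError            # file content is not an integer
  false
rescue Errno::ESRCH             # no process currently has that PID
  false
rescue Errno::EPERM             # a process exists, but it belongs to another user
  true
end
```

A `true` here means "a process with that number exists", nothing more. If the PID file had been removed on a planned stop, a missing file would be an unambiguous answer and this guess would not be needed.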

w.behavior(:clean_pid_file) is similar to this, but it cleans up during start-up. We need a behavior for stop. Unfortunately, :clean_pid_file is such a generic name that adding another behavior will certainly lead to confusion.
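
For the sake of discussion, the stop-side cleanup I have in mind could look roughly like the plain Ruby below. This is only a sketch and not god's actual implementation. The names `process_alive?` and `stop_and_clean`, the use of SIGTERM, and the 5-second grace period are all assumptions of mine.

```ruby
# Hypothetical helper: true while some process with this PID still exists.
def process_alive?(pid)
  Process.kill(0, pid)   # signal 0 only probes for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true
end

# Hypothetical sketch of a planned stop: ask the process to terminate, wait
# up to `grace` seconds for it to go away, and only then remove the PID file.
# Returns true if the process ended and the file was removed, false otherwise.
def stop_and_clean(pid_file, grace: 5.0)
  pid = Integer(File.read(pid_file).strip)

  begin
    Process.kill("TERM", pid)
  rescue Errno::ESRCH
    # Already gone; we still want to remove the stale file below.
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  while process_alive?(pid) &&
        Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    sleep 0.05
  end

  return false if process_alive?(pid)   # leave the file; the stop did not finish

  File.delete(pid_file)
  true
end
```

The point of deleting the file only after the process is confirmed gone is that an absent PID file then reliably means "this was stopped on purpose".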

Personally, I can't think of a case where you _wouldn't_ want to clean up the PID file upon a planned "stop" action. That's why I advocate that PID file cleanup on stop should be an inherent behavior, and not an option.

But I open this up to the group... do you agree that this stop-pid-cleanup should be 1.) available and 2.) an inherent behavior for all watches that have a w.pid_file property? Thanks for your thoughts!
