successful restart followed by failed processrunning triggers

thoraxe

Jan 8, 2009, 11:22:53 AM
to god.rb
Hi all, welcome to my first post!

A quick search turned up some variations of what I'm experiencing, but
no messages that were exactly the same.

I'm running god 0.7.12 on CentOS 4.7 to monitor an app we just deployed
that is having some memory issues (we will fix them later, but we need
the app up now). My god configuration can be found here:
http://pastie.org/355756

I'm forcing my app to hog memory so that god will restart it, and I'm
seeing the following:
I [2009-01-08 11:11:58] INFO: ridingresource-mongrel-12002 [trigger] memory out of bounds [39728kb, 85004kb, *127152kb, *127152kb, *127152kb] (MemoryUsage)
I [2009-01-08 11:11:58] INFO: ridingresource-mongrel-12002 move 'up' to 'restart'
I [2009-01-08 11:11:58] INFO: ridingresource-mongrel-12002 restart: mongrel_rails restart -P /home/riding/railsapps/equine/log/mongrel.pid
I [2009-01-08 11:12:10] INFO: ridingresource-mongrel-12002 moved 'up' to 'up'
I [2009-01-08 11:12:11] INFO: ridingresource-mongrel-12002 [trigger] process is not running (ProcessRunning)
I [2009-01-08 11:12:11] INFO: ridingresource-mongrel-12002 move 'up' to 'start'
I [2009-01-08 11:12:11] INFO: ridingresource-mongrel-12002 before_start: no pid file to delete (CleanPidFile)
I [2009-01-08 11:12:11] INFO: ridingresource-mongrel-12002 start: mongrel_rails start -c /home/riding/railsapps/equine -p 12002 -P /home/riding/railsapps/equine/log/mongrel.pid -e production -d
I [2009-01-08 11:12:23] INFO: ridingresource-mongrel-12002 moved 'up' to 'up'
I [2009-01-08 11:12:24] INFO: ridingresource-mongrel-12002 [trigger] process is not running (ProcessRunning)
I [2009-01-08 11:12:24] INFO: ridingresource-mongrel-12002 move 'up' to 'start'
I [2009-01-08 11:12:24] INFO: ridingresource-mongrel-12002 before_start: no pid file to delete (CleanPidFile)
I [2009-01-08 11:12:24] INFO: ridingresource-mongrel-12002 start: mongrel_rails start -c /home/riding/railsapps/equine -p 12002 -P /home/riding/railsapps/equine/log/mongrel.pid -e production -d

You can see that once god sees the memory usage is out of bounds, it
does a restart. This restart works fine -- watching the mongrel
process in htop shows that it has been restarted and memory usage
falls.

For some reason, on the next test for processrunning, god thinks that
the process isn't running, even though it is, and even though the pid
is there. It then continually starts the mongrel until it decides
that it is flapping.

I don't see anything egregiously wrong with my configuration file, so
I'm not sure what's going on.

Any suggestions?

woahdae

Jan 8, 2009, 10:37:57 PM
to god.rb
I didn't look at your post super closely, but I had the same issue in a
different setting, and it boiled down to the w.behavior(:clean_pid_file)
declaration. If I remember right, something bad happens if it doesn't
find a pidfile to clean... OK, I don't remember exactly, but here's what
I did to solve it. Instead of having god clean pids, use
mongrel_cluster. From my mongrel watch:

w.start = "mongrel_rails cluster::start -C #{RAILS_ROOT}/config/mongrel_cluster.yml --only #{port} --clean"
w.stop = "mongrel_rails cluster::stop -C #{RAILS_ROOT}/config/mongrel_cluster.yml --only #{port}"
w.restart = "mongrel_rails cluster::restart -C #{RAILS_ROOT}/config/mongrel_cluster.yml --only #{port} --clean"

Hope this helps, or at least rules out one more thing in the game of
"where is the problem not".

Matt Davies

Jan 9, 2009, 3:37:31 AM
to god...@googlegroups.com
Morning all

I had exactly the same problem and, by a process of elimination, came to the same solution as woody: use the mongrel_rails cluster::start command instead. The problem went away.

Does anyone know exactly what the problem is with the w.behavior(:clean_pid_file) declaration?

It might save other developers a whole heap of time if that could be fixed. Even just letting people know about the problem in the docs would help.

thoraxe

Jan 9, 2009, 9:53:34 AM
to god.rb
Hi! Thanks for the quick responses.

mongrel_rails does not accept a --clean option in mongrel 1.1.5, and
I am not using mongrel_cluster.

Any suggestions?

woahdae

Jan 9, 2009, 11:39:14 AM
to god.rb
Well, you'll have to either figure out what's wrong with god's method
of cleaning pids or start using mongrel_cluster. mongrel_cluster is a
simple and useful gem, so I'd go with installing it, although god
would be improved if someone figured out the pid cleaning issue.

woahdae

Jan 9, 2009, 11:56:04 AM
to god.rb
This topic's been mentioned elsewhere:

http://groups.google.com/group/god-rb/browse_thread/thread/3239f2d018...

thoraxe

Jan 9, 2009, 2:26:36 PM
to god.rb
Well, I would love to troubleshoot god's problem, but I'm not savvy
enough to know what to look for. I'm open to suggestions.

mongrel_cluster may well be useful, but I think in this instance
fixing the actual problem will probably be more beneficial to the
community :) So I'm down to try!

On Jan 9, 11:56 am, woahdae <woody.peter...@gmail.com> wrote:
> This topic's been mentioned elsewhere:
>
> http://groups.google.com/group/god-rb/browse_thread/thread/3239f2d018...

thoraxe

Jan 13, 2009, 11:24:10 PM
to god.rb
I actually don't think that god has a problem with cleaning the pids,
because the restart works successfully.

I think the problem lies in the actual checking of the pid. My log
doesn't show it, but from a fresh start, checking whether the process
is running (via the pid) works just fine.

Things only go wrong after the memory trigger fires and causes a
restart. Restart uses the existing pid so that mongrel knows what to
restart, and the restart itself works fine.

The problem happens on the next processrunning check: god thinks that
the process is NOT running and tries to start it again. Start
obviously fails because the pid really does still exist (it never got
deleted). It sounds as if, when the restart happens, god's record of
where the pid is gets erased or changed, and therefore it can't find
the pid anymore.
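
For anyone following along, the kind of pid-based check being discussed can be sketched in plain Ruby. This is only a minimal sketch under assumptions, not god's actual ProcessRunning code: the helper name `process_running?` is hypothetical, and it assumes the check reads the pid from the pid file and probes it with signal 0. It shows how such a check reports "not running" whenever the pid file is missing or unreadable at the moment of the check, even if the process itself is alive.

```ruby
# Hypothetical helper, not part of god: reports whether the pid stored
# in pid_file belongs to a live process.
def process_running?(pid_file)
  pid = Integer(File.read(pid_file).strip, 10)
  return false unless pid > 0

  Process.kill(0, pid) # signal 0 sends nothing; it only tests for existence
  true
rescue Errno::ENOENT, ArgumentError, Errno::ESRCH
  # No pid file, unparsable contents, or no process with that pid.
  false
rescue Errno::EPERM
  # The process exists but belongs to another user.
  true
end
```

If a check like this runs while the pid file is absent, it returns false. That would be consistent with the "no pid file to delete" lines in the log, though that is a guess rather than something the log proves.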

Unfortunately, I am not smart enough to understand how to troubleshoot
this behavior.

thoraxe

Jan 14, 2009, 9:29:47 AM
to god.rb
Now I'm not so sure there is a problem. I've gotten it to run
successfully for a while now. I think there may have been an issue
with the scan interval and the checks I was running. It has since
restarted on over-memory successfully without flapping on
processrunning, so it looks like it might have been a configuration
problem.

However, this does raise a question: does god understand that it
takes a finite amount of time to actually perform an action (start,
restart, whatever)? Perhaps there should be a delay between the time
it performs an action and the next check.
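
The delay asked about here can be sketched as a small grace window. This is a minimal sketch in plain Ruby, not god's implementation: the class `GraceWindow` and its methods are hypothetical names, and the two timestamps are taken from the log earlier in this thread. For what it's worth, god's watch configuration has `w.start_grace` and `w.restart_grace` settings meant for this kind of delay; how they interact with the interval in 0.7.12 is worth checking against the docs.

```ruby
# Hypothetical sketch: hold off liveness checks for a fixed number of
# seconds after the last start/restart action.
class GraceWindow
  def initialize(grace_seconds)
    @grace_seconds = grace_seconds
    @last_action_at = nil
  end

  # Record the moment an action (start, restart, ...) was issued.
  def record_action(at = Time.now)
    @last_action_at = at
  end

  # True once the grace period since the last action has elapsed,
  # or if no action has been recorded yet.
  def check_allowed?(now = Time.now)
    @last_action_at.nil? || (now - @last_action_at) >= @grace_seconds
  end
end

restart_at = Time.local(2009, 1, 8, 11, 11, 58) # restart issued (from the log)
check_at   = Time.local(2009, 1, 8, 11, 12, 11) # ProcessRunning trigger fired

short = GraceWindow.new(10)
short.record_action(restart_at)
long = GraceWindow.new(20)
long.record_action(restart_at)

puts short.check_allowed?(check_at) # true: 13 seconds have passed
puts long.check_allowed?(check_at)  # false: still inside the 20 second window
```

With a window longer than the time mongrel needs to come back up, the check that fired 13 seconds after the restart would simply have been skipped.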