bluepill not detecting started processes on a node

Denis Haskin

Feb 20, 2012, 10:08:03 PM
to Bluepill
I have one (EC2) Ubuntu server where bluepill is working just fine to
start and monitor resque processes (and it has done so on other
nodes in the past).

I'm setting up a new node, and for some reason on this node bluepill
does not recognize that the processes have started and are running,
and so keeps creating new ones. I'm a little baffled by what's
causing this. The 2 nodes are almost identical; they're both EC2
servers provisioned by the same chef scripts. It is true that the one
not working is 'production' and the other 'staging', but there's
almost no difference due to that.

Any thoughts or suggestions before I fork the github project and start
inserting more monitoring, to try and figure out what's going on?
There's been discussion on this list in the past about troubles w/
bluepill and resque, but as I said this is working fine on my staging
server, and has worked fine on earlier production servers (although I
will note that this new production server is ruby 1.9.3 (vs 1.9.2) and
rails 3.2 (vs. 3.1)).

Thanks!

dwh

Akzhan Abdulin

Feb 21, 2012, 6:31:44 AM
to bluep...@googlegroups.com
I don't have time to research this myself right now.

Try updating bluepill, and use the :foreground option to get output that describes the problem.

Pull requests are always appreciated.

Denis Haskin

Feb 21, 2012, 7:52:23 AM
to bluep...@googlegroups.com
No problem.  I'll try to see if I can figure out what's going on.  I did already try with :foreground and VVERBOSE, without any change in behavior.  Here's my .pill file (note I had start grace time up to 240 secs, and it was still doing it); below is the log from running in foreground.

ENV["RAILS_ENV"] = "production"
ENV["QUEUE"] = "*"
ENV["VVERBOSE"] = "1"

Bluepill.application("zmx_app", :foreground => true) do |app|
  app.working_dir = "/srv/zmx/current"
  app.uid = "root"
  app.gid = "root"
  2.times do |i|
    app.process("resque-#{i}") do |process|
      process.group = "resque"
      process.start_command = "rake resque:work"
      process.pid_file = "/srv/zmx/current/tmp/pids/resque_workers-#{i}.pid"
      process.stop_command = "kill -QUIT {{PID}}"
      process.daemonize = true
      process.stdout = process.stderr = "/tmp/bluepill-resque-#{i}.log"
      process.start_grace_time = process.stop_grace_time = process.restart_grace_time = 240.seconds
    end
  end
end

Output from running in foreground:

$ sudo bluepill load /etc/bluepill/zmx_app.pill 
[warning]: [zmx_app:resque:resque-0] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-0] Going from down => starting
[warning]: [zmx_app:resque:resque-1] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-1] Going from down => starting
[info]: [zmx_app:resque:resque-0] Going from starting => down
[info]: [zmx_app:resque:resque-1] Going from starting => down
[warning]: [zmx_app:resque:resque-0] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-0] Going from down => starting
[warning]: [zmx_app:resque:resque-1] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-1] Going from down => starting
^CTerminating...

Thanks,

--
Denis Haskin

Ryan Mohr

Feb 21, 2012, 2:12:05 PM
to bluep...@googlegroups.com
Have you tried an approach that doesn't use rake?  It's probably forking the worker processes in a way that's not supported by bluepill.  
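
One way to test the forking theory is to check, while bluepill still reports a worker as "starting", whether the pid recorded in the pid file belongs to a live process. The following is a minimal sketch of such a check, not part of bluepill; the helper names (read_pid, pid_alive?, pid_file_status) are hypothetical, and it assumes the pid file contains a single integer.

```ruby
# Hypothetical diagnostic helpers; these are NOT part of bluepill.
# They report whether the pid stored in a pid file is a live process.

# Returns the integer pid stored in the file, or nil if the file is
# missing, empty, or does not contain a positive integer.
def read_pid(path)
  return nil unless File.file?(path)
  text = File.read(path).strip
  return nil unless text =~ /\A\d+\z/
  pid = text.to_i
  pid > 0 ? pid : nil
end

# Signal 0 only checks for existence and permission; nothing is delivered.
def pid_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but is owned by another user
end

# Summarises a pid file as :no_pid, :alive, or :stale.
def pid_file_status(path)
  pid = read_pid(path)
  return :no_pid if pid.nil?
  pid_alive?(pid) ? :alive : :stale
end

if __FILE__ == $PROGRAM_NAME
  # Pass one or more pid file paths on the command line.
  ARGV.each do |path|
    puts "#{path}: #{pid_file_status(path)}"
  end
end
```

Running it against /srv/zmx/current/tmp/pids/resque_workers-0.pid on both servers shortly after the start command executes would show whether the two nodes differ in what ends up in the pid file. A result of :stale or :no_pid on the failing node would be consistent with the started process exiting or re-forking under a different pid.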

Give resque-pool a try.  I've successfully been using it in production for a few months now and don't have any complaints.

Denis Haskin

Feb 21, 2012, 2:33:02 PM
to bluep...@googlegroups.com
Not yet re: not using rake.  And I will take a look at resque-pool, which is frequently suggested.

What I find baffling is why it's behaving differently on this server than on others...


--
Denis Haskin




Ryan Mohr

Feb 21, 2012, 2:42:19 PM
to bluep...@googlegroups.com
I've found that any time I encounter a "baffling" situation with my code it's always due to something simple I've overlooked.  Run a diff between the working and non-working pills and make sure both servers are using the same version of resque.
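
For the version comparison, a small script run on each server can print the Ruby version and the installed versions of the relevant gems, so the two outputs can be diffed. This is only a sketch: the gem list (resque, rake, bluepill) is an assumption about what matters here, the function names are hypothetical, and it reports every installed version rather than the one the app's bundle actually loads.

```ruby
require "rubygems"

# Assumed list of gems worth comparing between servers; adjust as needed.
GEMS_TO_CHECK = %w[resque rake bluepill].freeze

# All installed versions of a gem as sorted strings ([] if not installed).
def installed_versions(name)
  Gem::Specification.find_all_by_name(name).map(&:version).sort.map(&:to_s)
end

# Builds a flat hash describing the Ruby and gem versions on this machine.
def environment_fingerprint(gem_names = GEMS_TO_CHECK)
  fingerprint = { "ruby" => RUBY_VERSION }
  gem_names.each do |name|
    fingerprint["gem:#{name}"] = installed_versions(name).join(", ")
  end
  fingerprint
end

if __FILE__ == $PROGRAM_NAME
  # One key=value line per entry, sorted so the output diffs cleanly.
  environment_fingerprint.sort.each do |key, value|
    puts "#{key}=#{value.empty? ? '(not installed)' : value}"
  end
end
```

Saving the output from each node to a file and diffing the two files alongside the two pill files would surface version drift such as the 1.9.2 vs 1.9.3 difference mentioned earlier in the thread.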