bluepill not detecting started processes on a node

Denis Haskin

Feb 20, 2012, 10:08:03 PM

to Bluepill

I have one (EC2) Ubuntu server where bluepill is working just fine to
start and monitor resque processes (and it has done so on other
nodes in the past).

I'm setting up a new node, and for some reason on this node bluepill
does not recognize that the processes have started and are running,
and so keeps creating new ones. I'm a little baffled by what's
causing this. The 2 nodes are almost identical; they're both EC2
servers provisioned by the same chef scripts. It is true that the one
not working is 'production' and the other 'staging', but there's
almost no difference due to that.

Any thoughts or suggestions before I fork the github project and start
inserting more monitoring, to try and figure out what's going on?
There's been discussion on this list in the past about troubles w/
bluepill and resque, but as I said this is working fine on my staging
server, and has worked fine on earlier production servers (although I
will note that this new production server is ruby 1.9.3 (vs 1.9.2) and
rails 3.2 (vs. 3.1)).
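
Before adding instrumentation to bluepill itself, one quick manual check is whether the pid files bluepill watches actually point at live processes. The Ruby sketch below is not bluepill's own code; the helper names `pid_from_file` and `process_alive?` are made up for illustration, and it assumes the pid-file paths from the .pill file later in this thread. Run it with the same privileges bluepill uses (root here).

```ruby
# Hypothetical debugging helper, not part of bluepill: report whether each
# pid file points at a process that is currently running.

# Returns the integer pid stored in the file, or nil if the file is
# missing, unreadable, or does not contain a bare number.
def pid_from_file(path)
  contents = File.read(path).strip
  contents =~ /\A\d+\z/ ? contents.to_i : nil
rescue SystemCallError
  nil # e.g. Errno::ENOENT or Errno::EACCES
end

# Signal 0 performs only the existence/permission check; nothing is delivered.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but is owned by another user
end

if __FILE__ == $0
  # Assumed paths, copied from the pid_file setting in the .pill file.
  (0..1).each do |i|
    path = "/srv/zmx/current/tmp/pids/resque_workers-#{i}.pid"
    pid = pid_from_file(path)
    status = if pid.nil? then "missing or unreadable"
             elsif process_alive?(pid) then "pid #{pid} is running"
             else "pid #{pid} is not running"
             end
    puts "#{path}: #{status}"
  end
end
```

If the files exist but name pids that are not running while `ps` still shows `rake resque:work` processes, the pid bluepill recorded is not the one that stayed alive.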

Thanks!

dwh

Akzhan Abdulin

Feb 21, 2012, 6:31:44 AM

to bluep...@googlegroups.com

I don't have time to research this myself right now.

Try updating bluepill and running it with the :foreground option, then describe what you see.

Pull requests are always appreciated.

Denis Haskin

Feb 21, 2012, 7:52:23 AM

to bluep...@googlegroups.com

No problem. I'll try to figure out what's going on. I had already tried :foreground and VVERBOSE, with no change in behavior. Here's my .pill file (note that I had the start grace time up to 240 seconds and it still happened); below it is the log from running in the foreground.

ENV["RAILS_ENV"] = "production"
ENV["QUEUE"] = "*"
ENV["VVERBOSE"] = "1"

Bluepill.application("zmx_app", :foreground => true) do |app|
  app.working_dir = "/srv/zmx/current"
  app.uid = "root"
  app.gid = "root"

  # Two resque workers (resque-0 and resque-1), each with its own pid file and log.
  2.times do |i|
    app.process("resque-#{i}") do |process|
      process.group = "resque"
      process.start_command = "rake resque:work"
      process.pid_file = "/srv/zmx/current/tmp/pids/resque_workers-#{i}.pid"
      process.stop_command = "kill -QUIT {{PID}}"
      process.daemonize = true
      process.stdout = process.stderr = "/tmp/bluepill-resque-#{i}.log"
      process.start_grace_time = process.stop_grace_time = process.restart_grace_time = 240.seconds
    end
  end
end

Output from running in foreground:

$ sudo bluepill load /etc/bluepill/zmx_app.pill
[warning]: [zmx_app:resque:resque-0] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-0] Going from down => starting
[warning]: [zmx_app:resque:resque-1] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-1] Going from down => starting
[info]: [zmx_app:resque:resque-0] Going from starting => down
[info]: [zmx_app:resque:resque-1] Going from starting => down
[warning]: [zmx_app:resque:resque-0] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-0] Going from down => starting
[warning]: [zmx_app:resque:resque-1] Executing start command: rake resque:work
[info]: [zmx_app:resque:resque-1] Going from down => starting
^CTerminating...

Thanks,

--

Denis Haskin

cell: 781-258-7414

Ryan Mohr

Feb 21, 2012, 2:12:05 PM

to bluep...@googlegroups.com

Have you tried an approach that doesn't use rake? Rake is probably forking the worker processes in a way that bluepill doesn't support.

Give resque-pool a try. I've successfully been using it in production for a few months now and don't have any complaints.
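
Ryan's explanation is a hypothesis rather than a confirmed cause. The generic Ruby sketch below (Unix-only because it uses `fork`; the helper names `alive?` and `launch_and_exit` are made up for illustration, and nothing here is rake- or resque-specific) shows the kind of mismatch he describes: a "launcher" forks a long-running worker and exits, so a monitor that tracks only the launcher's pid sees it as down while the work continues.

```ruby
# Generic illustration, not rake/resque/bluepill code.

# Signal 0 checks that the process exists; nothing is delivered.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true
end

# Forks a launcher that forks a worker (sleeping `seconds`) and then exits.
# Returns [launcher_pid, worker_pid].
def launch_and_exit(seconds)
  reader, writer = IO.pipe
  launcher = fork do
    reader.close
    worker = fork { sleep(seconds); exit!(0) } # grandchild does the "work"
    writer.puts(worker)                        # report the worker's pid
    writer.close
    exit!(0)                                   # launcher exits immediately
  end
  writer.close
  worker = reader.gets.to_i
  reader.close
  Process.wait(launcher) # reap the launcher so its pid is really gone
  [launcher, worker]
end

if __FILE__ == $0
  launcher, worker = launch_and_exit(5)
  puts "launcher #{launcher} alive? #{alive?(launcher)}" # what a pid file of the launcher would report
  puts "worker #{worker} alive? #{alive?(worker)}"
  Process.kill("KILL", worker) # clean up the orphaned worker
end
```

In this sketch the launcher reports as not alive while the worker is still running, which matches the "starting => down" transitions in the log above; whether `rake resque:work` actually behaves this way on the new server is not established in this thread.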

Denis Haskin

Feb 21, 2012, 2:33:02 PM

to bluep...@googlegroups.com

Not yet re: not using rake. And I will take a look at resque-pool, which is frequently suggested.

What I find baffling is why it's behaving differently on this server than on others...

Ryan Mohr

Feb 21, 2012, 2:42:19 PM

to bluep...@googlegroups.com

I've found that any time I encounter a "baffling" situation with my code it's always due to something simple I've overlooked. Run a diff between the working and non-working pills and make sure both servers are using the same version of resque.
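
For the version check, a small Ruby script can print the Ruby version and the installed versions of the relevant gems on each server, so the two outputs can be compared with `diff`. This is a sketch; the helper name `gem_versions` and the gem list are illustrative choices. It reports every installed version of each gem, not necessarily the one the app loads; if the app runs under Bundler, diffing the two `Gemfile.lock` files is the more direct comparison.

```ruby
# Hypothetical helper: list the Ruby version and installed gem versions
# so the output from two servers can be diffed.
require "rubygems"

# Returns { "gem-name" => ["1.2.3", ...] }, with nil for gems not installed.
def gem_versions(names)
  names.each_with_object({}) do |name, acc|
    specs = Gem::Specification.find_all_by_name(name)
    acc[name] = specs.empty? ? nil : specs.map(&:version).sort.map(&:to_s)
  end
end

if __FILE__ == $0
  puts "ruby #{RUBY_VERSION}"
  gem_versions(%w[bluepill resque rake rails]).each do |name, versions|
    puts "#{name}: #{versions ? versions.join(', ') : 'not installed'}"
  end
end
```

Redirect the output to a file on each server (for example `ruby gem_report.rb > versions-staging.txt`, where the filename is arbitrary) and diff the two files alongside the two .pill files.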