shutdown worker.

Eric

Feb 10, 2011, 11:31:20 AM
to ruote
I am sure that this is covered somewhere in the documentation, but
I can't find it. How do you properly shut down a worker? We are running
with ruotekit and an FsStorage. Our worker is running inside a rake job.

When we deploy a new build we want to stop and start the worker along
with the web server, but short of killing the worker's process, I don't
see how we would tell the worker to "give up". I thought
RuoteKit.engine.shutdown would do it, but it does not seem to have any
effect on the worker.

Thanks
Eric Smith

John Mettraux

Feb 10, 2011, 9:31:46 PM
to openwfe...@googlegroups.com

Hello Eric,

Sorry, it's not covered by the documentation.

To ensure a proper shutdown of a worker, you could add something like this to its script:

---8<---
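# Shut the worker down cleanly when the process receives SIGTERM.
# $worker is assumed to be the global holding this script's worker.
Signal.trap('TERM') do
  puts "worker shutdown"
  $worker.context.shutdown
  exit 0
end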
--->8---
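
If the worker must never be stopped in the middle of a step, one common pattern is to let the signal handler only raise a flag and have the loop check that flag between steps. The following is a minimal sketch of that idea, not ruote's own worker loop: SketchWorker, request_stop and install_term_handler are hypothetical names, and the array stands in for the messages a real worker would fetch from the storage.

```ruby
# Hypothetical stand-in for a worker loop; this is NOT Ruote::Worker.
class SketchWorker
  attr_reader :processed

  def initialize(messages)
    @messages = messages       # stands in for messages fetched from the storage
    @stop_requested = false
    @processed = []
  end

  # Only flips a flag, so it is safe to call from a signal handler.
  def request_stop
    @stop_requested = true
  end

  # Handles one message at a time and checks the flag between messages,
  # so a message that has been started is always finished.
  def run
    until @stop_requested || @messages.empty?
      msg = @messages.shift
      yield msg if block_given?
      @processed << msg
    end
  end
end

# Wires SIGTERM to the flag instead of exiting right away.
def install_term_handler(worker)
  Signal.trap('TERM') { worker.request_stop }
end
```

With something like this in place, a TERM signal lets the current message complete before run returns, after which the script can shut the context down and exit.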

Now if you need a way to shut down all the workers at once, please tell me.

At a quick glance, I see two variants.

A : a message (much like a launch message) is passed to the storage and tells the workers to shut down. Works with workers scattered across multiple hosts.

The issue with A is that the message has to remain until all workers are shut down, and then it has to be removed.
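
A minimal sketch of variant A, under loud assumptions: FlagStorage, FlagWorker and shutdown_all are hypothetical names, the storage is a plain in-memory Hash rather than a ruote storage, and the workers are stepped by hand where real workers would poll on their own.

```ruby
# Hypothetical in-memory stand-in for a shared storage; not ruote's storage API.
class FlagStorage
  def initialize
    @docs = {}
  end

  def put(key, value)
    @docs[key] = value
  end

  def get(key)
    @docs[key]
  end

  def delete(key)
    @docs.delete(key)
  end
end

# Hypothetical worker that looks for the shutdown flag before each step.
class FlagWorker
  attr_reader :running

  def initialize(storage)
    @storage = storage
    @running = true
  end

  # One turn of the worker loop. Returns :stopped once the flag has been seen.
  def step
    return :stopped unless @running
    if @storage.get('shutdown')
      @running = false
      return :stopped
    end
    :worked
  end
end

# Raises the flag, waits until every known worker has stopped, then lowers it.
def shutdown_all(storage, workers)
  storage.put('shutdown', true)
  workers.each(&:step) while workers.any?(&:running)
  storage.delete('shutdown')
end
```

The awkward part shows up in shutdown_all: something has to know the full list of workers in order to decide when the flag can be lowered.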

B : at startup, workers register their pids in the storage; you can then fetch the pid list and kill them. Only works with workers on the same system.
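
A minimal sketch of variant B, again with hypothetical names (PidRegistry, the 'worker_pids' key) and a plain Hash standing in for the storage. Process.kill is the standard Ruby call; it is passed in as a parameter only so that the sketch can be tried without real worker processes.

```ruby
# Hypothetical pid registry kept in a shared storage (here just a Hash).
class PidRegistry
  def initialize(store = {})
    @store = store
  end

  # Called by each worker at startup.
  def register(pid = Process.pid)
    (@store['worker_pids'] ||= []) << pid
  end

  def pids
    @store.fetch('worker_pids', []).dup
  end

  # Sends the signal to every registered pid and clears the list.
  # Returns the pids that were signalled; pids that no longer exist are skipped.
  def terminate_all(signal = 'TERM', killer: Process.method(:kill))
    signalled = pids.select do |pid|
      begin
        killer.call(signal, pid)
        true
      rescue Errno::ESRCH
        false
      end
    end
    @store['worker_pids'] = []
    signalled
  end
end
```

Sending TERM rather than KILL gives each worker the chance to run a handler like the one above before it exits.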

Maybe a C is worth exploring : workers register with an id and, at shutdown time, one shutdown message per worker is emitted so that only the targeted workers get shut down.
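
A minimal sketch of variant C. WorkerMailbox and AddressedWorker are hypothetical names, and the mailbox is an in-memory stand-in for whatever the storage would offer.

```ruby
# Hypothetical storage holding registered worker ids and per-worker messages.
class WorkerMailbox
  def initialize
    @registered = []
    @messages = {}   # worker id => pending message
  end

  def register(worker_id)
    @registered << worker_id
  end

  def registered
    @registered.dup
  end

  # Emits one shutdown message per target worker (all workers by default).
  def request_shutdown(worker_ids = registered)
    worker_ids.each { |id| @messages[id] = 'shutdown' }
  end

  # Called by a worker between steps. Removing the message on read means
  # nothing is left behind to clean up afterwards.
  def take_message(worker_id)
    @messages.delete(worker_id)
  end

  def pending
    @messages.keys
  end
end

# Hypothetical worker that only reacts to messages addressed to its own id.
class AddressedWorker
  attr_reader :id, :running

  def initialize(id, mailbox)
    @id = id
    @mailbox = mailbox
    @running = true
    mailbox.register(id)
  end

  # One turn of the worker loop. Returns whether the worker is still running.
  def step
    @running = false if @mailbox.take_message(@id) == 'shutdown'
    @running
  end
end
```

Because each worker consumes its own message, this sidesteps the cleanup problem of variant A and also allows stopping a single worker.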

A could also be leveraged to pause the engine: "shutdown flag" is on, "pause flag" is on, ... The flag has to be lowered at some point...


Am I going too far? How can we refine that?


Best regards,

--
John Mettraux - http://jmettraux.wordpress.com

Torsten Schönebaum

Feb 11, 2011, 3:23:34 AM
to openwfe...@googlegroups.com

I think you're going too far: shutting down workers is up to the ruote
integrator. There are far too many different possible setups for running
the worker; you can't find a shutdown solution which suits them all.
Apart from that, it should be possible to stop just a single worker
rather than all of them.

@Eric: I'd suggest a look at the daemons gem. Here is an example
launcher/launch controller script: https://gist.github.com/822065
Start the worker with
start_ruote_worker_ctl start
or, if it should not be daemonized,
start_ruote_worker_ctl run

Cheers,
Torsten

Eric

Feb 11, 2011, 9:24:46 AM
to ruote
So to be clear, the objective is to safely shut down a worker while it
is not in the middle of a consume, and to prevent it from picking up any
new workitems. Of the solutions you outlined, A and C look like they
solve the issue. Pausing the engine would be a good solution, but the
workers would need to confirm that they are really paused.

I do take Torsten's point about the shutdown process being
implementation dependent; however, I don't think it alleviates the need
for a best practice. Ruote is an exceptional tool and so far it has
been able to take every task that we can dream up for it (no pun
intended). As is often the case with configurable software, the answer
to every question is "it depends".

I think this type of problem will continue to cause issues around
fault tolerance and instrumentation. You should be able to ask the
engine how many workers are running and how many are consuming. You
should be able to pause or stop the workers. I have been trying to
figure out how to hook ruote up to newrelic so that we can monitor the
performance of individual participants. We have something working, but
it still does not let us know if the worker is happy and healthy.
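
As an illustration of the kind of reporting described here, the following is a minimal sketch in which each worker writes its state and a timestamp to a shared place and a monitor counts them. WorkerStatusBoard, its state names and the 60-second staleness threshold are all hypothetical; nothing here is an existing ruote API.

```ruby
# Hypothetical status board kept in the shared storage (here just a Hash).
class WorkerStatusBoard
  STATES = %w[idle consuming paused stopped].freeze

  def initialize(stale_after = 60)
    @stale_after = stale_after   # seconds without a report before a worker is ignored
    @reports = {}
  end

  # Called by a worker whenever its state changes, and periodically as a heartbeat.
  def report(worker_id, state, at = Time.now)
    raise ArgumentError, "unknown state #{state}" unless STATES.include?(state)
    @reports[worker_id] = { state: state, at: at }
  end

  # Counts workers per state, ignoring workers whose last report is too old.
  def summary(now = Time.now)
    counts = Hash.new(0)
    @reports.each_value do |r|
      next if now - r[:at] > @stale_after
      counts[r[:state]] += 1
    end
    counts
  end

  # True when every live worker has confirmed that it is paused.
  def all_paused?(now = Time.now)
    live = summary(now)
    !live.empty? && live.keys == ['paused']
  end
end
```

A worker that has stopped reporting simply drops out of the summary, which gives a rough "is it still alive" signal as well as the running and consuming counts.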

I have looked at the daemons gem before but was not going to tackle
that until we implemented a queue. I will take another look.

Thanks
Eric Smith


John Mettraux

Feb 11, 2011, 9:52:02 AM
to openwfe...@googlegroups.com

On Fri, Feb 11, 2011 at 06:24:46AM -0800, Eric wrote:
>
> So to be clear, the objective is to safely shut down a worker while it
> is not in the middle of a consume, and to prevent it from picking up any
> new workitems. Of the solutions you outlined, A and C look like they
> solve the issue. Pausing the engine would be a good solution, but the
> workers would need to confirm that they are really paused.
>
> I do take Torsten's point about the shutdown process being
> implementation dependent; however, I don't think it alleviates the need
> for a best practice. Ruote is an exceptional tool and so far it has
> been able to take every task that we can dream up for it (no pun
> intended).

Hello Eric,

Saturday night's beer will taste extra good with such praise!

Thanks for your trust and all your help so far!

> As is often the case with configurable software the answer
> to every question is "it depends".

> I think this type of problem will continue to cause issues around
> fault tolerance and instrumentation.

> You should be able to ask the
> engine how many workers are running, how many are consuming. You
> should be able to pause or stop the workers.

+1

> I have been trying to
> figure out how to hook ruote up to newrelic so that we can monitor the
> performance of individual participants. We have something working, but
> it still does not let us know if the worker is happy and healthy.

I will implement your suggestion (not the newrelic hooking ;-))

Give me a bit of time, I'm working on tidying up ruote-dm (and ruote-sequel) as well as ruote-couch, plus bugs like

http://groups.google.com/group/openwferu-users/browse_thread/thread/ee493bdf8d8cdb37

I also have to finish the work with Raphael on the filters. Then it will be 2.2.0.

Looking forward to your feedback.


Many thanks for the suggestion, have a nice weekend,
