best effort to gracefully shut down?

Jan Brdo

May 13, 2013, 12:53:19 PM
to cellulo...@googlegroups.com
Is there any reasonable way (a hack or monkeypatch is fine) to shut down gracefully? I know the big refactor will address this, but I am looking for a workaround for the current stable release.

What I want is:
1. Process receives SIGTERM
2. Because my tasks are short (only a couple of seconds), I want to wait for all currently running tasks to finish.
3. Terminate after all tasks completed (without being forcefully terminated)
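
For reference, the three steps above can be sketched in plain Ruby with ordinary threads. This is not Celluloid code and not a fix for the current stable: DrainingPool, submit, busy_count and shutdown are hypothetical names made up for the illustration. It only shows the intended order: receive SIGTERM, stop accepting work, wait for running tasks, then exit.

```ruby
# Hypothetical stand-in for a worker pool; none of these names are Celluloid API.
class DrainingPool
  def initialize
    @lock = Mutex.new
    @busy = 0
    @accepting = true
    @threads = []
  end

  # Runs the task on its own thread unless shutdown has begun.
  # Returns true if the task was accepted, false otherwise.
  def submit(&task)
    @lock.synchronize do
      return false unless @accepting

      @busy += 1
      @threads << Thread.new do
        begin
          task.call
        ensure
          @lock.synchronize { @busy -= 1 }
        end
      end
    end
    true
  end

  def busy_count
    @lock.synchronize { @busy }
  end

  # Steps 2 and 3: refuse new work, then block until running tasks finish.
  def shutdown
    threads = @lock.synchronize do
      @accepting = false
      @threads.dup
    end
    threads.each(&:join)
  end
end

if __FILE__ == $PROGRAM_NAME
  pool = DrainingPool.new
  term_received = false

  # Step 1: the handler only sets a flag; a Mutex cannot be locked in trap context.
  Signal.trap("TERM") { term_received = true }

  3.times { |i| pool.submit { sleep 0.2; puts "task #{i} finished" } }

  Process.kill("TERM", Process.pid) # simulates an external `kill <pid>`
  sleep 0.01 until term_received

  puts "SIGTERM received, waiting for #{pool.busy_count} task(s)"
  pool.shutdown
  puts "all tasks finished, exiting"
end
```

Joining the task threads replaces the polling loop on the busy count; the counter is kept only for logging.
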

What happens now is:
1. Process receives SIGTERM
2. I try to use Supervisor.root to prevent forcefully killing my actors:

class WorkSupervisor
  include Celluloid
  def self.run
    loop do
      supervisor = new
      Supervisor.root = supervisor.wrapped_object
      # Take five, toplevel supervisor
      sleep 5 while supervisor.alive?
      log.error "!!! WorkSupervisor crashed."
    end
  end
  def terminate
    log.info "Waiting for #{worker_pool.busy_workers.count} to finish..."
    wait_on_all_workers
  end
  def wait_on_all_workers
    sleep 0.05 while worker_pool.busy_workers.count > 0
  end
end

However, my actors get killed before I can check anything. From a quick look at Celluloid's source, it seems they should only be terminated after Supervisor.root.terminate, but I have not yet figured out the exact order in which this happens or why (perhaps the threads themselves act on the received SIGTERM?). How could I solve this? I want to wait for the actors to finish instead of killing them.

Subproblem 1
I get a "Celluloid::Task::TerminatedError: task was terminated" error when I try to check whether my actors are still alive and busy (busy? is a method I defined myself), and sometimes a "Celluloid::DeadActorError: attempted to call a dead actor" error instead. I check whether an actor is still busy like this, although the errors themselves show that the actors are being forcefully killed before I can reach them:

  worker.alive? && worker.busy?
I also remember alive? being deprecated or changed in some way, but I forget the details. What is the correct way to do this?

Subproblem 2
I am also using a timer. I know it is a separate gem now (timers), but the issue is again shutdown. The timer schedules tasks to run on the actors, so I first want to make sure the timer has finished and cannot be rescheduled, and only then wait for any busy actors to finish processing. I use the after method like this (the body can take quite some time to run, which is why I don't want a normal interval/every):
  def start_timer
    periodic_task_proc = Proc.new do
      periodic_task

      @periodic_timer = after(5, &periodic_task_proc)
    end

    @periodic_timer = after(0.1, &periodic_task_proc)
  end
So ideally I would like to wait for the periodic_task method to end without scheduling any new execution, and then wait for the busy actors to complete. Should I just use instance variables as a mutex (if I understand correctly, access to those is synchronized), or is there a more elegant way? A check-then-set on such a variable would probably not be atomic, so I am not sure it would work 100%.
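
One way to picture the "finish the current run, but never schedule another" behaviour is the plain-Ruby sketch below. It swaps Celluloid's after for an ordinary thread with a Mutex and ConditionVariable, so it is an illustration of the idea rather than something to drop into an actor. PeriodicRunner and its methods are hypothetical names, and unlike start_timer above the first run happens immediately rather than after 0.1 seconds.

```ruby
# Hypothetical helper built on plain threads, not on Celluloid's `after`.
class PeriodicRunner
  def initialize(interval, &task)
    @interval = interval
    @task = task
    @lock = Mutex.new
    @wake = ConditionVariable.new
    @stopping = false
    @thread = nil
  end

  # Runs the task, waits `interval` seconds after it finishes, and repeats.
  def start
    @thread = Thread.new do
      until stopping?
        @task.call # a run in progress is never interrupted by stop
        wait_for_next_run
      end
    end
    self
  end

  # Prevents any further run, then waits for a run in progress to finish.
  def stop
    @lock.synchronize do
      @stopping = true
      @wake.signal # cut the wait between runs short
    end
    @thread.join if @thread
  end

  private

  def stopping?
    @lock.synchronize { @stopping }
  end

  # Sleeps until the interval has elapsed or stop is requested.
  def wait_for_next_run
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @interval
    @lock.synchronize do
      until @stopping
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        break if remaining <= 0

        @wake.wait(@lock, remaining)
      end
    end
  end
end
```

Because the stop flag is read and written only while holding the lock, the check and the decision to run again cannot interleave with stop, which is the atomicity concern raised above. A shutdown would call stop on the runner first and only then wait for the busy workers.
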

Thanks! 

Tim Carey-Smith

May 13, 2013, 5:33:26 PM
to cellulo...@googlegroups.com
The Supervisor.root.terminate call is basically never actually used.
It was added and never really embraced.

We are improving the supervision hierarchy.
We hope to have a tree-based supervision structure.
This will allow for the termination to happen in a sequence.

I also believe we will add hooks to allow you to safely handle the termination of your actor.
As to whether we would defer the termination until any tasks have completed, this is something which we might keep in userspace for now.

>
> *Subproblem 1*
> I get a "Celluloid::Task::TerminatedError: task was terminated" error when
> I try to check if my actors are still alive and busy (busy? method I
> defined by myself), or sometimes I get a "Celluloid::DeadActorError:
> attempted to call a dead actor" error also. Just for the fun of it I check
> if the actor is still busy like this, although the error itself illustrates
> that the actors are being forcefully killed before I try to "reach" them:
>
> worker.alive? && worker.busy?
>
> I also remember something about alive? being deprecated or something
> changed, I forgot, but what is the correct way to do this anyway?

Asking the "actor" (really this is the object proxy) if it is "alive?" is valid.
This will return false as soon as the mailbox is shut down.

>
> *Subproblem 2*
> I am also using a timer, I know it is a separate gem now (timers), but if
> anybody knows, again the issue is when shutting down. This timer basically
> schedules tasks to run on the actors, so of course I want to first make
> sure that the timer has finished and prevent scheduling it, and then wait
> for any busy actors to finish processing. I am using the after method like
> this (because the body can take quite some time to run and that's why I
> don't want a normal interval/every):
>
> def start_timer
>   periodic_task_proc = Proc.new do
>     periodic_task
>
>     @periodic_timer = after(5, &periodic_task_proc)
>   end
>
>   @periodic_timer = after(0.1, &periodic_task_proc)
> end
>
> So ideally I would like to wait for the periodic_task method to end, but
> not schedule any new execution after that, and then wait on the busy actors
> to complete. Should I just use instance variables as a mutex (if I
> understand correctly access is synced to those), or is there a more elegant
> way? But that would probably not be atomic so not sure if that would work
> 100%.

A timer is run inside a Task, so this falls into the same category as waiting for all tasks to complete.

I'd like to see some gists of complete code surrounding this behavior.
This would give us some examples of how we could make this work better in future.

Ciao,
Tim

MrBrdo

May 13, 2013, 5:42:03 PM
to cellulo...@googlegroups.com
Hey Tim,

I know Supervisor.root is not really used, but it was the only thing that caught my eye as something that might help with my graceful shutdown problem (it didn't). I also know there are plans to provide options for this in the future, but I was wondering whether any 'hacks' are possible right now.

Do you know where Celluloid::Task::TerminatedError is being raised from and how to prevent it if possible?

Is there any way to atomically ask my actor whether it is alive? and busy? (my own method)? Or should I just call busy? and rescue the Celluloid::DeadActorError exception?
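
The second option can be sketched as follows. Checking alive? and then busy? is a check-then-act sequence, so the actor can die between the two calls; making the single call and rescuing avoids that window. The sketch makes some assumptions: worker_state and FakeWorker are hypothetical names, and a stand-in error class is used when Celluloid is not loaded so the snippet runs on its own. It does not cover Celluloid::Task::TerminatedError.

```ruby
# Use Celluloid's error class if the gem is loaded; otherwise a stand-in
# so that this sketch runs without Celluloid.
DeadActorError =
  if defined?(Celluloid::DeadActorError)
    Celluloid::DeadActorError
  else
    Class.new(StandardError)
  end

# Asks the worker once and treats a dead worker as its own state,
# instead of calling alive? first and busy? second.
def worker_state(worker)
  worker.busy? ? :busy : :idle
rescue DeadActorError
  :dead
end

# Hypothetical fake worker used only to exercise worker_state.
FakeWorker = Struct.new(:state) do
  def busy?
    raise DeadActorError, "attempted to call a dead actor" if state == :dead

    state == :busy
  end
end

workers = [FakeWorker.new(:busy), FakeWorker.new(:idle), FakeWorker.new(:dead)]
still_busy = workers.count { |w| worker_state(w) == :busy }
puts "#{still_busy} worker(s) still busy"
```
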

Not sure what you meant regarding the timer.

I can provide gists. Since this is part of a bigger project I can't share all the code, but I can provide some examples. Is that what you meant: something you can execute that demonstrates my issues? If you describe what you would like, I will provide it.

Regards,
Jan







Tim Carey-Smith

May 13, 2013, 5:49:15 PM
to cellulo...@googlegroups.com
Hi,

I think it best to jump on IRC or have a video chat.
I can give you a run down of the current workings and perhaps show the planned future.

More below: vvvv

On May 14, 2013, at 9:42 AM, MrBrdo <mrb...@gmail.com> wrote:

> Hey Tim,
>
> I know the Supervisor.root is not really used, but it was the only thing
> that caught my eye, that I thought could help me with my graceful shutdown
> problem (- it didn't). Also I know there are plans to provide options for
> this in the future but was also wondering if there are any 'hacks' possible
> right now?
>
> Do you know where Celluloid::Task::TerminatedError is being raised from and
> how to prevent it if possible?

It is raised inside the Task implementation, which is called from the termination code. Check Actor#cleanup.

> Is there any way of me atomically asking my actor if it's alive? and busy?
> (my method)? Or should I just do busy? and rescue the
> Celluloid::DeadActorError exception?

You should listen to the termination events from your subordinate actors.
This gives you the best way to determine their liveness.

Our overhaul will improve this situation greatly.

> Not sure what you meant regarding the timer.

A timer runs in a Task, so if you were to implement a "wait for all tasks" hack, you would catch this timer.

>
> I can provide gists, since this is part of a bigger project I can't really
> share all the code but I can provide some examples, is that what you meant
> - you want something that you can execute to demonstrate my issues? If you
> describe what you would like I will provide.

Yes, having a few good demonstrations rather than incomplete snippets would be amazing!
I do understand that it can take time to extract pieces and that exposing internal code is sometimes not possible.

Cheers,
Tim

MrBrdo

May 13, 2013, 6:02:02 PM
to cellulo...@googlegroups.com
Ok thanks, at the moment it is kind of late here so I have to run, but I will come by on IRC tomorrow, and will start preparing the examples.

Thank you



Tony Arcieri

May 13, 2013, 7:50:14 PM
to cellulo...@googlegroups.com
On Mon, May 13, 2013 at 2:33 PM, Tim Carey-Smith <g...@spork.in> wrote:
> The Supervisor.root.terminate call is basically never actually used.
> It was added and never really embraced.

It's used if you call MySupervisionGroup.run

--
Tony Arcieri

Tim Carey-Smith

May 13, 2013, 11:33:44 PM
to cellulo...@googlegroups.com
That uses the root *name* registry, not the root supervisor.