Supervisor and simple_one_on_one termination


Bastien Barcaioni

May 1, 2014, 11:03:20 AM
to elixir-l...@googlegroups.com
Hi,
I'm having trouble stopping supervised children. Here is my Supervisor module:


defmodule Supervisor.Process.Supervisor do
  use Supervisor.Behaviour

  def start_link() do
    :supervisor.start_link({:local,  :process_supervisor}, __MODULE__, [])
  end

  def init(_options) do
 
    children = [
      supervisor(Supervisor.Process.Single.Supervisor, [] )
    ]

    supervise(children, strategy: :simple_one_for_one)
  end

  def start_child(id, data) do
    :supervisor.start_child(:process_supervisor,  [id,data])
  end

  def stop_child(pid) do
    :supervisor.terminate_child(:process_supervisor, pid)
  end
 
end

And here is the supervised supervisor:

defmodule Supervisor.Process.Single.Supervisor do
  use Supervisor.Behaviour

  def gen_name(id) do
      binary_to_atom("single.process.supervisor.#{id}")
  end

  def start_link(id,data) do
    :supervisor.start_link({:local,  gen_name(id)}, __MODULE__, [id,data])
  end

  def init([id,data]) do 
    children = [
      worker(Supervisor.Process.Single.Responsible, [id, data] , [restart: :transient] ),
      worker(Supervisor.Process.Single.Rabbit, [id], [restart: :transient] )
    ]
    supervise(children, strategy: :one_for_one)
  end
 
end


Starting a child supervisor works, and it in turn starts a few other processes. The job gets done nicely. But terminate_child doesn't seem to work as well: my started processes seem to keep running forever, since I never get a call to terminate on them.
Is there something I can do to make sure the supervised supervisor gets terminated properly? I guess I could have the supervised processes stopped, but then the worker will stay up forever, which seems a bit sad.

Thanks for your advice!

Bastien

Alex Shneyderman

May 1, 2014, 11:22:31 AM
to elixir-l...@googlegroups.com
It looks like the supervision tree you are creating does not
make sense. You probably meant something else.

Basically, your top supervisor creates the supervisor
Supervisor.Process.Single.Supervisor as its child. This top
supervisor uses the simple_one_for_one strategy to restart its children.

I am not sure what the use case is for simple_one_for_one with
a child that is itself a supervisor, but my initial reaction is that it is
probably not what you want here.

Then your Supervisor.Process.Single.Supervisor has a restart
strategy of one_for_one, meaning that if any of its children dies
it is restarted on its own (with no effect on the rest
of the tree). So when you terminate a worker child, it
gets restarted. You can run :appmon.start() and see your
tree in graphical form. You can also easily see the PIDs
changing as you terminate the children.
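
The restart behaviour described above can also be checked without a GUI. Here is a minimal sketch, assuming a current Elixir release (the `Supervisor` module rather than the `Supervisor.Behaviour` used in this thread). `RestartDemo` and the `:counter` child id are made-up names for illustration only.

```elixir
# Sketch only: current Supervisor API, not the 2014 Supervisor.Behaviour.
defmodule RestartDemo do
  # Starts a :one_for_one supervisor with one permanent Agent child,
  # kills the child, and returns {old_pid, new_pid}.
  def run do
    children = [
      %{id: :counter, start: {Agent, :start_link, [fn -> 0 end]}, restart: :permanent}
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    old_pid = child_pid(sup)
    Process.exit(old_pid, :kill)
    new_pid = wait_for_restart(sup, old_pid)

    {old_pid, new_pid}
  end

  # Returns whatever the supervisor currently reports for the child:
  # a pid, or :restarting / :undefined while no process is running.
  defp child_pid(sup) do
    [{:counter, child, :worker, _modules}] = Supervisor.which_children(sup)
    child
  end

  # Polls until the supervisor reports a pid different from the killed one.
  defp wait_for_restart(sup, old_pid) do
    case child_pid(sup) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _not_restarted_yet ->
        Process.sleep(10)
        wait_for_restart(sup, old_pid)
    end
  end
end
```

Calling `RestartDemo.run()` should return two different PIDs: the killed child and the replacement the supervisor started for it.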

Cheers,
Alex.

Bastien Barcaioni

May 1, 2014, 1:05:58 PM
to elixir-l...@googlegroups.com
Hi, 

Thanks Alex for the information! I didn't know about this :appmon app, I'll give it a try tomorrow!

As for what this tree means, I should have given a better explanation earlier, sorry about that. I'll try to explain it now:

In reaction to some event, I need to start a "Process" that is composed of several distinct Erlang processes. One is dedicated to handling MQ events, another handles TCP / Port interactions. The point was that if the process handling Rabbit went down for some reason, it could be restarted independently without any problem for the rest of the tree. The same goes for the other part. On the other hand, if the TCP or Port processes were to crash or get terminated for some reason, that would need some other handling, which is why they don't show up in the second supervisor and are monitored by Responsible.

The whole (Rabbit + Responsible) represents one "unit", so to speak; that made sense to me, at least. So I decided to have them supervised. This supervisor was to be watched, just in case the whole tree went down, and automatically get killed.

I was probably misled in my usage of supervisors, I'll have to reread some doc chapters ^^

Still, second supervisor aside, is this the right way to handle termination with the simple_one_for_one strategy? (I guess I misspelled the title as well -_- )

Thanks again Alex!

Bastien

Alex Shneyderman

May 1, 2014, 3:14:59 PM
to elixir-l...@googlegroups.com


On Thursday, May 1, 2014 1:05:58 PM UTC-4, Bastien Barcaioni wrote:
In reaction to some event, I need to start a "Process" that is composed of several distinct Erlang processes. One is dedicated to handling MQ events, another handles TCP / Port interactions. The point was that if the process handling Rabbit went down for some reason, it could be restarted independently without any problem for the rest of the tree. The same goes for the other part. On the other hand, if the TCP or Port processes were to crash or get terminated for some reason, that would need some other handling, which is why they don't show up in the second supervisor and are monitored by Responsible.

The whole (Rabbit + Responsible) represents one "unit", so to speak; that made sense to me, at least. So I decided to have them supervised. This supervisor was to be watched, just in case the whole tree went down, and automatically get killed.

OK, now this makes sense.

You are missing a setting in your top supervisor declaration.

This line:
supervisor(Supervisor.Process.Single.Supervisor, [])

should be:
supervisor(Supervisor.Process.Single.Supervisor, [], [restart: :temporary])

A :temporary child is never restarted by its supervisor, so once it is terminated it stays down.

Here is a fully functional example of what I think you want to achieve:
defmodule App do
    use Application.Behaviour
    def start do
        :application.start(:elixir)
    end
    def start(_, _) do
        Top.Super.start_link()
    end
end

defmodule Top.Super do
    use Supervisor.Behaviour

    def start_link() do
        :supervisor.start_link({:local,  Top.Super}, __MODULE__, [])
    end

    def init(_options) do
        children = [
            supervisor(Top.Coord, [], [restart: :temporary] )
        ]
        supervise(children, strategy: :simple_one_for_one)
    end

    def start_child() do
        :supervisor.start_child(Top.Super, [])
    end

    def stop_child(pid) do 
        :supervisor.terminate_child(Top.Super, pid)
    end
end

defmodule Top.Coord do
    use Supervisor.Behaviour

    def start_link() do
        :supervisor.start_link(__MODULE__, [])
    end

    def init(_options) do
        children = [
            worker(Worker.Mq, []),
            worker(Worker.Tcp, [])
        ]
        supervise(children, strategy: :one_for_one)
    end
end

defmodule Worker.Mq do
    use GenServer.Behaviour
    
    defrecord Config, []
              
    def start_link() do
        :gen_server.start_link(__MODULE__, [], [])
    end

    def init(_args) do
        IO.puts "Wroker.MQ initializing ..."
        IO.inspect Process.self()
        {:ok, Config.new()}
    end

    def terminate(State) do
        IO.puts "Terminating Worker.Mq"
        IO.inspect Process.self()
        :ok
    end
end

defmodule Worker.Tcp do
    use GenServer.Behaviour
    
    defrecord Config, []
              
    def start_link() do
        :gen_server.start_link(__MODULE__, [], [])
    end
    def init(_args) do
        IO.puts "Worker.Tcp initializing ..."
        IO.inspect Process.self()
        {:ok, Config.new()}
    end
    def terminate(State) do
        IO.puts "Terminating Worker.Tcp"
        IO.inspect Process.self()
        :ok
    end
end

iex -S mix

iex(11)> {:ok, p} = Top.Super.start_child()
Wroker.MQ initializing ...
#PID<0.149.0>
Worker.Tcp initializing ...
#PID<0.150.0>
{:ok, #PID<0.148.0>}
iex(12)> Top.Super.stop_child(p)           
:ok
iex(13)> Process.alive?(p)
false
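
For readers on a recent Elixir release: as far as I know, `Supervisor.Behaviour` no longer exists, and `DynamicSupervisor` is the usual replacement for the `:simple_one_for_one` strategy. The sketch below is an assumed translation of the same tree to that API, not the code from this thread. All `Demo.*` module names are hypothetical.

```elixir
# Sketch only: a modern-API equivalent of the Top.Super / Top.Coord tree.
defmodule Demo.Worker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  @impl true
  def init(name), do: {:ok, name}
end

defmodule Demo.Coord do
  # restart: :temporary means the parent never restarts this supervisor.
  use Supervisor, restart: :temporary

  def start_link(_arg), do: Supervisor.start_link(__MODULE__, [])

  @impl true
  def init(_arg) do
    # Two workers from the same module need distinct ids.
    children = [
      Supervisor.child_spec({Demo.Worker, :mq}, id: :mq),
      Supervisor.child_spec({Demo.Worker, :tcp}, id: :tcp)
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

defmodule Demo.Top do
  # Children are added at runtime, one Demo.Coord subtree per call.
  def start_link do
    DynamicSupervisor.start_link(strategy: :one_for_one, name: __MODULE__)
  end

  def start_child, do: DynamicSupervisor.start_child(__MODULE__, {Demo.Coord, []})

  def stop_child(pid), do: DynamicSupervisor.terminate_child(__MODULE__, pid)
end
```

After `Demo.Top.start_link()`, `{:ok, p} = Demo.Top.start_child()` starts one subtree and `Demo.Top.stop_child(p)` shuts it down together with both of its workers.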

If you use :appmon, you can use its super feature, "kill by clicking": click the Kill button and then click the process you want to kill :-)

Cheers,
Alex.

Bastien Barcaioni

May 6, 2014, 4:59:20 AM
to elixir-l...@googlegroups.com
Appmon, or Observer as it is called in R17, is just THE TOOL I've been dreaming of ^^ Thanks a million Alex for sharing that app :)

Sadly, your solution does only half the work, though I'm investigating right now and you gave me some clues that will help :)

iex(1)> {:ok, p} = Top.Super.start_child()
Wroker.MQ initializing ...
#PID<0.108.0>
Worker.Tcp initializing ...
#PID<0.109.0>
{:ok, #PID<0.107.0>}
iex(2)> Top.Super.stop_child(p)
:ok
iex(3)>

The first point is great: the spawned supervisor gets killed ^^ (the same happens with my own code now, thanks to you :p )

But if I read the code correctly, both the TCP and MQ workers should print something like "Terminating". But they don't. YET they do get killed (Observer says so).
If I read the docs correctly, they say I should use :erlang.process_flag(:trap_exit, true). But that doesn't work either. I'll dig into that a bit ^^

Well, that problem aside, your solution is great!

Bastien Barcaioni

May 6, 2014, 5:19:33 AM
to elixir-l...@googlegroups.com
According to http://www.erlang.org/doc/design_principles/gen_server_concepts.html:

If it is necessary to clean up before termination, the shutdown strategy must be a timeout value and the gen_server must be set to trap exit signals in the init function. When ordered to shutdown, the gen_server will then call the callback function terminate(shutdown, State):

And according to http://elixir-lang.org/docs/stable/Supervisor.Behaviour.html:
:shutdown - defines how a child process should be terminated. Defaults to 5000 for a worker and :infinity for a supervisor;
 

Shutdown values

The following shutdown values are supported:

  • :brutal_kill - the child process is unconditionally terminated using exit(child, :kill);

  • :infinity - if the child process is a supervisor, it is a mechanism to give the subtree enough time to shutdown. It can also be used with workers with care;

  • Finally, it can also be any integer meaning that the supervisor tells the child process to terminate by calling exit(child, :shutdown) and then waits for an exit signal back. If no exit signal is received within the specified time (in milliseconds), the child process is unconditionally terminated using exit(child, :kill);
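
The three kinds of shutdown value listed above can be written out as child specs. This is a minimal sketch assuming a current Elixir release, where `Supervisor.child_spec/2` overrides fields of a module's default spec. `ShutdownDemo` and the child ids are hypothetical names, and `Agent` just stands in for a real worker.

```elixir
# Sketch only: one child spec per kind of :shutdown value.
defmodule ShutdownDemo do
  def child_specs do
    [
      # Killed unconditionally with exit(child, :kill), no grace period.
      Supervisor.child_spec({Agent, fn -> :a end}, id: :killed_at_once, shutdown: :brutal_kill),
      # Sent exit(child, :shutdown), then killed if still alive after 10 s.
      Supervisor.child_spec({Agent, fn -> :b end}, id: :ten_seconds, shutdown: 10_000),
      # Sent exit(child, :shutdown), then waited for without a time limit.
      Supervisor.child_spec({Agent, fn -> :c end}, id: :waits_forever, shutdown: :infinity)
    ]
  end

  def start_link do
    Supervisor.start_link(child_specs(), strategy: :one_for_one)
  end
end
```

The value only matters when the supervisor stops the child, for example through `Supervisor.stop/1` or a terminate_child call.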


First point:
In your example, the terminate function didn't have the right signature. The right signature is:
def terminate( reason, state) do
...
end

Second point:
The workers do indeed need to have the :trap_exit flag set:
:erlang.process_flag(:trap_exit, true)

I don't know if using this flag has other side effects... I'll check that ^^
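
Putting the two points together, here is a minimal sketch of a worker whose terminate/2 runs when its supervisor shuts it down. It assumes a current Elixir release (`use GenServer` instead of `GenServer.Behaviour`). `CleanupWorker` and the `notify` pid are hypothetical names added so the cleanup can be observed. One known consequence of trapping exits is that exit signals from other linked processes arrive as `{:EXIT, pid, reason}` messages in handle_info/2 instead of killing the worker.

```elixir
# Sketch only: terminate/2 with the right arity, plus trap_exit in init/1.
defmodule CleanupWorker do
  use GenServer

  # `notify` is the pid that gets a message when cleanup runs.
  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  @impl true
  def init(notify) do
    # Without this flag, the supervisor's exit(child, :shutdown) signal
    # kills the process before terminate/2 gets a chance to run.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(reason, notify) do
    # Cleanup would go here; this sketch only reports the reason.
    send(notify, {:terminated, self(), reason})
    :ok
  end
end
```

With `{:ok, sup} = Supervisor.start_link([{CleanupWorker, self()}], strategy: :one_for_one)` followed by `Supervisor.stop(sup)`, the calling process should receive a `{:terminated, pid, :shutdown}` message.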

Thanks for your support Alex!