_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
I am by no means an expert on the topic, but I would like to point out
that the only reason you get the {already_started, ...} error is that
you attempt to register the helper process with {local, ...}. If it is
a helper, there should be no reason for it to be globally accessible.
And if it wasn't registered, the gen_server would be restarted without
issues, creating a new helper process. The old helper would die
eventually, just as you expect it to.
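A minimal sketch of that unregistered-helper approach (module, handler and record names here are illustrative, not from the thread):

```erlang
%% Hypothetical sketch: the gen_server starts its helper linked but
%% UNregistered, keeping the pid in its state. A restart then never
%% collides with a doomed-but-still-alive old helper's registered name.
-module(my_server).
-behaviour(gen_server).
-export([init/1]).

-record(state, {event_mgr}).

init([]) ->
    %% gen_event:start_link/0 registers no name, so no {already_started, _}
    {ok, Pid} = gen_event:start_link(),
    ok = gen_event:add_handler(Pid, my_handler, []),
    {ok, #state{event_mgr = Pid}}.
```

Because the pid lives only in the server's state, each restart gets a fresh, anonymous helper.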
Ladislav Lenart
On 2.6.2011 09:15, Steve Strong wrote:
> Yeah, that makes perfect sense and would obviously solve the problem.
>
> The reason we'd gone down this path was that we had a number of "sub" processes (the gen_event just being one example) that we felt would be "polluting" the supervisor; these sub-processes were just
> helpers of the primary gen_servers that the supervisor was controlling - using a start_link in the primary gen_servers felt like a very clean and easy way of spinning up these other processes in a way
> that (we thought) would still be resilient to failures.
>
> The thing that bit us was that we naively thought that, due to the sub-process being linked, it would die when the parent died. Of course, it does, but its death is asynchronous to the notification
> that the supervisor receives and hence it may well still be alive (doomed, but alive) when the supervisor begins the restart cycle. Our servers don't crash that often, and when they do this race
> condition was rarely seen, which reinforced our misconceptions. The only thing that does surprise me is how many times the supervisor can go round the restart loop before the doomed process
> finally exits - we have seen it thrash round this loop about 1000 times before the supervisor itself finally fails; I guess it's just down to how things are being scheduled by the VM, and in those
> cases we were just getting unlucky.
>
> Sounds like best-practice within the OTP world is to have everything started via a supervisor - is that a fair comment?
>
> Cheers,
>
> Steve
>
> --
> Steve Strong, Director, id3as
> twitter.com/srstrong
>
> On Wednesday, 1 June 2011 at 23:57, Ahmed Omar wrote:
>
>> Agree with Roberto, you should put it under a supervisor. Regarding your case, I would guess you are trapping exits in your init in my_gen_event?
>>
>> On Wed, Jun 1, 2011 at 11:15 PM, Roberto Ostinelli <rob...@widetag.com <mailto:rob...@widetag.com>> wrote:
>>> hi steve,
>>>
>>> your gen_event should be started by your supervisor too. in this case, since you specified a one_for_all behaviour, when gen_server crashes, gen_event will be restarted too.
>>>
>>> r.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Best Regards,
>> - Ahmed Omar
>> http://nl.linkedin.com/in/adiaa
>> Follow me on twitter
>> @spawn_think <http://twitter.com/#!/spawn_think>
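Roberto's suggestion might look roughly like this (a hedged sketch: the restart intensity, timeouts and child names are made up, not from the thread):

```erlang
%% Hypothetical supervisor with both processes as siblings under
%% one_for_all: a crash in either child restarts both, so the
%% gen_event can never be left behind in a stale state.
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_all, 5, 10},
          [{my_gen_event,
            {gen_event, start_link, [{local, my_gen_event}]},
            permanent, 5000, worker, dynamic},
           {my_gen_server,
            {my_gen_server, start_link, []},
            permanent, 5000, worker, [my_gen_server]}]}}.
```

With one_for_all the supervisor itself stops the surviving child before restarting both, which removes the race described above.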
My bad. By "globally accessible" I meant that the locally
registered process will be available to all processes on
the local node.
Ladislav Lenart
> Steve, let's put it this way: it's better to start processes under a supervisor, especially if you want to benefit from the standard restart strategies; it keeps your application cleaner.
> (As a hack, your case could also be solved with a monitor in the gen_server's init before starting the new gen_event:
> Ref = erlang:monitor(process, my_gen_event),
> receive
>     {'DOWN', Ref, process, _Pid, _Reason} ->
>         ok
> end,
> )
> But using a supervisor is much cleaner and safer, and easier to design with, in my opinion.
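Spelled out, that monitor hack might look like this (a sketch; it assumes the old helper was registered as my_gen_event, and relies on the fact that monitoring a name that no longer exists delivers an immediate 'DOWN' with reason noproc):

```erlang
%% Hedged sketch of the workaround above: before (re)registering a new
%% helper, block until the previous incarnation is really dead. If the
%% old process is already gone, the monitor fires at once with noproc,
%% so this does not hang in the common case.
wait_for_death(Name) ->
    Ref = erlang:monitor(process, Name),
    receive
        {'DOWN', Ref, process, _Pid, _Reason} -> ok
    end.
```

Calling wait_for_death(my_gen_event) at the top of init/1 would serialise the restart behind the old helper's exit, at the cost of blocking the supervisor for that long.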
I wouldn't say that you are wrong. I think your reasoning about not
putting the gen_event module under a supervisor is sound, because
*that is what links are for*. Just because you have a supervisor
doesn't mean that you should shove everything underneath it! If the
gen_server and the gen_event are truly linked (meaning: the gen_server
doesn't act as a "supervisor" that keeps track of its gen_event process
and restarts it all the time, but rather the two really are linked
and crash together), then your approach, in my opinion, is good.
There are great benefits in doing it that way. Many will claim that
it is best practice to put *everything* under a supervisor, but this is
simply not true. In 90% of cases it *is* the best thing to do, and many
times it is more about how you designed your application than about
where to put the supervisors and their children, but doing it the way
you did is not necessarily wrong.
The only problem I see with your approach is that you have registered
the gen_event process, which clearly isn't useful (since only the
gen_server should know about it; after all, it started it). Other than
that, this approach is extremely helpful and a nice way to clean
things up after they die/shut down (again: assuming truly linked).
There is a big misconception in the community that everything
should/must look like the supervisor-tree model, which shows how
gen_servers are put under supervisors and more supervisors under the
"top" supervisor, but that is not enforced, and the design principles
don't take into account the many cases where this setup actually brings
more headache to the table than simply exiting and cleaning up via
linked processes (because such cases do exist).
/M
You might consider gproc for these kinds of use cases. It provides a
great deal of simplification around synchronising startups and
registering names etc.
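For illustration, a hedged gproc sketch (it assumes the gproc application is running; the name my_helper is made up). gproc:await/2 blocks until the name is registered, which sidesteps the restart race entirely:

```erlang
%% Registration side: the helper claims a unique local name
%% ({n, l, Term} means "name, local scope") during init.
init([]) ->
    true = gproc:reg({n, l, my_helper}),
    {ok, no_state}.

%% Client side: block (up to 5s) until the name (re)appears,
%% instead of racing whereis/1 against a restart.
lookup_helper() ->
    {Pid, _Value} = gproc:await({n, l, my_helper}, 5000),
    Pid.
```

A gproc registration also disappears atomically when its owner dies, so the {already_started, ...} collision from earlier in the thread cannot occur.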
This is a serious point to consider if you ever plan on going the way of releases/appups if the workers you use are to be long-lived (you don't want them to be killed during a purge). I'm not saying you didn't know this, but I felt I should point it out for the sake of having the arguments clear on the mailing list.
--
Fred Hébert
http://www.erlang-solutions.com
FWIW couldn't agree more with this. For completeness (it's obvious and you're
no doubt aware of it): 'normal' exits don't kill linked peers, which takes a
little getting used to, but is trivial to manage.
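A tiny illustration of that point (a sketch, safe to paste into a module):

```erlang
%% A 'normal' exit does not propagate over a link; any other exit
%% reason would kill the linked peer (unless it traps exits).
demo() ->
    spawn_link(fun() -> exit(normal) end),   % linked peer exits normally
    timer:sleep(50),                         % give it time to terminate
    true = is_process_alive(self()),         % ...and we are still here
    ok.
```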
As a more general point, designing sensible supervision trees was probably
the most difficult engineering aspect of OTP for me to learn, so I guess
people shouldn't feel too bad if it feels intimidating initially. :-)
BR,
-- Jachym
Personally I have very rarely used the live-upgrade tools of a node
(relup/appup/release_handler etc.), so I don't really know the downside
of not putting everything under a supervision tree. But then again, I
simply don't think the fuss of specifying every single thing to
reload/change is worth the "uptime" mark.
The strategy I prefer is to have an architecture which enables me to:
take down a node gracefully (detaching it from the cluster),
manually install a release (i.e. untar the release and change
start_erl.data to point to it), and start the node up again. This
should not affect the system, which should still be operational (say
you have 10 nodes and you do this upgrade one by one). Should the new
release not work, or something unexpected turn up, just change the
start_erl.data file to point to the old release and bounce the node
(the version handling of your applications should support this,
meaning v1.32.424 in this release has *exactly* the same code as
v1.32.424 in the previous release).
This way of working has proven very successful for me (and the
systems I took part in building). Specifying relups and appups for
this kind of work is, in my opinion, tedious, but some seem to think it
is worth the effort. However, you do have a very important point to
consider when not hanging everything under a supervision tree. If I had
only 2 nodes to consider, maybe I'd want them up at all times, but then
again they would be built to handle one of them going down (e.g.
when I upgrade them).
2011/6/2 Frédéric Trottier-Hébert <fred....@erlang-solutions.com>: