Hi Bernard,
2010/5/12 Bernard Duggan <ber...@m5net.com>:
> Hi Mazen,
> Our setup kind of has the opposite requirement - if either the worker or
> the gen_server that spawned it crash, both should crash out (sorry, that
> probably wasn't very clear in my initial post). That's why initially a
> straight spawn_link to create the worker looked so attractive - errors
> propagate between the gen_server and worker, both are cleaned up, and the
> gen_server's supervisor just restarts the gen_server (which is what we
> want). That, though, would require us to handle the EXIT message from the
> worker in the gen_server if the latter turned on trap_exit (which it seems
> like it needs to). It's not a huge problem, just seems like something we
> should be able to avoid.
I'm currently facing a similar situation (not exactly alike, but alike
enough) and got tripped up by the need to trap_exit if you want to run
terminate/2 on shutdown.
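
For reference, a minimal sketch of what I mean by trapping exits so that terminate/2 runs. The module name trap_demo is made up for illustration, and stopping on every 'EXIT' message is just one possible policy, not the only one:

```erlang
-module(trap_demo).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Without this flag, a 'shutdown' exit signal from the supervisor
    %% kills the process outright and terminate/2 is never called.
    process_flag(trap_exit, true),
    {ok, no_state}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% With trap_exit on, the exit of a linked (non-parent) process shows up
%% here as a message. Stopping with the same reason keeps the crash
%% propagating, which is the extra handling discussed above.
handle_info({'EXIT', _Pid, Reason}, State) ->
    {stop, Reason, State};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    %% Cleanup would go here.
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```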
What I'm starting to realize is that linking should, as much as
possible, ONLY be used within the OTP supervision tree context... and
monitors should be used for everything else. I've not seen this
written anywhere, but I've come to discover that there are OTP ways to
work around the scenario you're describing. Managing it any other way
will break in some way within OTP.
Now I'm experimenting with relying on the various supervisor restart
strategies, child restart types and restart frequencies, monitors,
etc.
- You want a gen_server to spawn a worker, and both should be
restarted if either of them goes down... What to do is this (forgive
the ascii art!):
[root_sup]                % root_sup should use a one_for_all restart strategy
    |
[gen_server, worker_sup]  % worker_sup should be simple_one_for_one,
                 |        % MaxR = 0, MaxT = 10
            [worker(s)]   % this should be a permanent child.
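
A rough sketch of the two supervisors in that tree, kept in one module by dispatching init/1 on its argument. The names my_server and my_worker are hypothetical placeholders for your gen_server and worker modules (each assumed to export start_link/0), the root intensity of 5 restarts in 10 seconds is my own pick, and I start worker_sup before the gen_server on the assumption that the gen_server wants to call start_child from its init/1:

```erlang
-module(root_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker_sup/0]).
-export([init/1]).

%% Starts the top-level supervisor, registered as root_sup.
start_link() ->
    supervisor:start_link({local, root_sup}, ?MODULE, root).

%% Starts the worker supervisor, registered as worker_sup.
start_worker_sup() ->
    supervisor:start_link({local, worker_sup}, ?MODULE, workers).

%% one_for_all: if either child dies, both are terminated and restarted.
init(root) ->
    WorkerSup = {worker_sup, {?MODULE, start_worker_sup, []},
                 permanent, infinity, supervisor, [?MODULE]},
    %% my_server is a hypothetical gen_server module.
    Server = {my_server, {my_server, start_link, []},
              permanent, 5000, worker, [my_server]},
    %% Intensity 5/10 is an assumed value, tune to taste.
    {ok, {{one_for_all, 5, 10}, [WorkerSup, Server]}};

%% simple_one_for_one with MaxR = 0, MaxT = 10: workers are added with
%% supervisor:start_child(worker_sup, []) and never restarted in place.
init(workers) ->
    %% my_worker is a hypothetical worker module.
    Worker = {my_worker, {my_worker, start_link, []},
              permanent, 5000, worker, [my_worker]},
    {ok, {{simple_one_for_one, 0, 10}, [Worker]}}.
```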
To complete the above, establish a monitor (erlang:monitor/2) between
the gen_server and the worker pid after the
supervisor:start_child(worker_sup, []) call.
The above tree will work as you want, though it introduces root_sup
and worker_sup.
- If gen_server crashes, root_sup (one_for_all) will also terminate and
restart worker_sup, thus taking down the worker(s) with it; the
restarted gen_server then starts a fresh worker.
- Since the gen_server is monitoring the worker, it will receive the
'DOWN' message when the worker goes down and you can then return
{stop, Reason, State} from the gen_server's handle_info/2. Also, the
worker restart frequency is zero times in 10 seconds, so it will not
be restarted by its supervisor until the gen_server has also died,
taking them all down, and they're all restarted by the root_sup.
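
The gen_server side of this could look roughly like the following. It is only a sketch: my_server is a hypothetical module name, it assumes a supervisor registered as worker_sup is already running, and the {worker_down, Reason} stop reason is my own choice:

```erlang
-module(my_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {worker, ref}).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Ask worker_sup (assumed to be running already) for a new worker,
    %% then monitor it instead of linking to it.
    {ok, Pid} = supervisor:start_child(worker_sup, []),
    Ref = erlang:monitor(process, Pid),
    {ok, #state{worker = Pid, ref = Ref}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The monitored worker died: stop too, so root_sup restarts everything.
handle_info({'DOWN', Ref, process, Pid, Reason},
            #state{worker = Pid, ref = Ref} = State) ->
    {stop, {worker_down, Reason}, State};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Note that no trap_exit is needed here, since the gen_server is not linked to the worker and only sees the 'DOWN' message.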
I have tried it out, and it works nicely too.
The question I have for you though, is:
Does every gen_server just spawn a single worker process or can it
spawn multiple workers? If multiple, what happens to the rest if one
dies? (I'm assuming the answer to this is one gen_server, one worker)
Hope that helps.
cheers,
Essien