[erlang-questions] gen_server cleanup handling


Bernard Duggan

May 10, 2010, 9:58:19 PM
to Erlang
Hi list,
This is something of a follow-up to my previous question about
supervision trees. Basically we have a gen_server that, in the case of
a standard app shutdown, needs to do some cleanup. A standard app
shutdown seems to take the form of supervisors sending 'shutdown' exit
signals to their children. Fine so far. The catch is that in order
for the terminate() callback to be invoked on our server, we have to turn
on trap_exit in every gen_server that needs to do cleanup. This seems
non-ideal for a few reasons:

* Elsewhere in the docs, we're cautioned against using trap_exit except
when unavoidable (though I'm happy to accept it if this is one such case).

* It means that we don't get automatic propagation of crashes from any
worker processes we might spawn. Instead we have to write a
handle_info({'EXIT', ...}) clause in every gen_server that might ever link to
another process to ensure those crashes are propagated properly. This
seems like it could be solved by instead spawning any worker processes
as children of our supervisor (which is what Garrett suggested we should
do anyway) - if that's a good reason to set things up that way, I'm
likewise happy to accept it and rearrange our code accordingly.

* Most concerning, though, is the possibility of some library call
creating a link to a temporary worker process (or any process, for that
matter) whose crash should propagate through us - in this case we'd
still have to have the handle_info({'EXIT', ...}) clause set up as a catch-all,
which seems like a fiddly, repetitive bit of code we'd rather avoid if possible.

So what's the thinking about this? Am I missing something obvious?
Should I just turn on trap_exit willy-nilly wherever I need shutdown
cleanup? Should I just suck it up and write the 'EXIT' message handlers
in all such gen_servers?
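For reference, the pattern in question looks roughly like this (a minimal sketch; the module name cleanup_server is made up):

```erlang
%% Minimal sketch: trap exits so terminate/2 runs on supervisor shutdown.
-module(cleanup_server).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Without this, the 'shutdown' exit signal from the supervisor
    %% kills the process outright and terminate/2 is never called.
    process_flag(trap_exit, true),
    {ok, #{}}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Because we trap exits, exits from any linked process arrive here as
%% messages and have to be re-propagated by hand if they should kill us.
handle_info({'EXIT', _Pid, normal}, State) ->
    {noreply, State};
handle_info({'EXIT', _Pid, Reason}, State) ->
    {stop, Reason, State};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    %% cleanup goes here
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

The catch-all 'EXIT' clauses are exactly the repetitive code complained about above.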

Cheers,

Bernard

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe: mailto:erlang-questio...@erlang.org

--
You received this message because you are subscribed to the Google Groups "Erlang Programming" group.
To post to this group, send email to erlang-pr...@googlegroups.com.
To unsubscribe from this group, send email to erlang-programm...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/erlang-programming?hl=en.

Mazen Harake

May 11, 2010, 2:58:03 AM
to Bernard Duggan, Erlang
Spawn your worker processes from a simple_one_for_one supervisor which is
under your supervisor. The shutdown exit will then propagate to the worker
processes through their own supervisor. Use the gen_server under the first
Sup to start a worker child in the simple_one_for_one Sup.

In other words: Garrett seems to have gotten it right.

This also eliminates point 3 because the crash will not propagate
through the gen_server.

I'm assuming from your setup that the gen_server doesn't care whether the
worker crashed before it finished; otherwise you might as well
just trap exits and handle all the exit messages anyway, without a
supervisor.
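A minimal sketch of such a supervisor (the names worker_sup and my_worker are invented; it assumes an existing my_worker module with a start_link/1):

```erlang
-module(worker_sup).
-behaviour(supervisor).
-export([start_link/0, start_worker/1, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Called from the gen_server to start one worker under this sup.
start_worker(Args) ->
    supervisor:start_child(?MODULE, [Args]).

init([]) ->
    %% temporary: finished or crashed workers are not restarted here.
    {ok, {{simple_one_for_one, 0, 1},
          [{my_worker, {my_worker, start_link, []},
            temporary, 5000, worker, [my_worker]}]}}.
```

On app shutdown the top supervisor shuts worker_sup down, and worker_sup in turn shuts down its children, so the gen_server never has to manage the workers' exits itself.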

Hope this makes sense; it is morning here and my coffee cup is still
full (didn't have a chance to drink it yet :))

Good luck

/Mazen
---------------------------------------------------

WE'VE CHANGED NAMES!

Since January 1st 2010 Erlang Training and Consulting Ltd. has become ERLANG SOLUTIONS LTD.

www.erlang-solutions.com

Bernard Duggan

May 11, 2010, 8:47:15 PM
to Mazen Harake, Erlang
Hi Mazen,
Our setup kind of has the opposite requirement - if either the
worker or the gen_server that spawned it crashes, both should crash out
(sorry, that probably wasn't very clear in my initial post). That's why
a straight spawn_link to create the worker initially looked so
attractive - errors propagate between the gen_server and worker, both
are cleaned up, and the gen_server's supervisor just restarts the
gen_server (which is what we want). That, though, would require us to
handle the 'EXIT' message from the worker in the gen_server if the latter
turned on trap_exit (which it seems like it needs to). It's not a huge
problem, just seems like something we should be able to avoid.

The alternative we thought of (that I forgot to mention) is that
this whole mess could be neatly avoided if supervisors
also knew how to shut down their children - I'm thinking an optional extra
function in the child_spec that is called before the exit(Child,
shutdown) call is made. That would allow the gen_server to do its
cleanup properly and exit with 'stop'. There may be a very good (or
historical, or both) reason that that doesn't exist, but it seems to me
to be the 'right' way to solve the problem...maybe.
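Something like this, perhaps - note that this seventh element is purely hypothetical and does not exist in OTP child specs; it is only a sketch of the proposal:

```erlang
%% HYPOTHETICAL: OTP child specs have no pre-shutdown hook. The extra
%% tuple at the end sketches the proposal from the text; the supervisor
%% would call it before sending exit(Child, shutdown).
{my_server,
 {my_server, start_link, []},
 permanent, 5000, worker, [my_server],
 {pre_shutdown, fun(Pid) -> my_server:cleanup(Pid) end}}
```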

Cheers,

Bernard

Essien Essien

May 12, 2010, 7:29:50 PM
to Bernard Duggan, Mazen Harake, Erlang
Hi Bernard,

2010/5/12 Bernard Duggan <ber...@m5net.com>:
> Hi Mazen,
>    Our setup kind of has the opposite requirement - if either the worker or
> the gen_server that spawned it crashes, both should crash out (sorry, that
> probably wasn't very clear in my initial post).  That's why initially a
> straight spawn_link to create the worker looked so attractive - errors
> propagate between the gen_server and worker, both are cleaned up, and the
> gen_server's supervisor just restarts the gen_server (which is what we
> want).  That, though, would require us to handle the EXIT message from the
> worker in the gen_server if the latter turned on trap_exit (which it seems
> like it needs to).  It's not a huge problem, just seems like something we
> should be able to avoid.

I'm currently facing a similar situation (not exactly alike, but alike
enough) and got tripped up by the need to trap_exit if you want to run
terminate/2 on shutdown.

What I'm starting to realize is that linking should, as much as
possible, ONLY be used within the OTP supervision tree context... and
monitors should be used for everything else. I've not seen this
written anywhere, but I've come to discover that there are OTP ways to
work around the scenario you're describing. Managing it any other way
will break in some way within OTP.

Now I'm experimenting with depending on the various supervisor restart
strategies, child restart types and restart frequencies, monitors,
etc.



- You want a gen_server to spawn a worker, and both should be
restarted if either of them goes down... What to do is this (forgive
the ascii art!):

[root_sup] % root_sup should use a one_for_all restart strategy
|
[gen_server, worker_sup] % worker_sup should be simple_one_for_one,
MaxR = 0, MaxT=10
|
[worker(s)] % this should be a permanent child.

To complete the above, establish a monitor (erlang:monitor/2) b/w
gen_server and worker pid after the supervisor:start_child(worker_sup,
[]) call.

The above tree will work as you want, though it introduces root_sup
and worker_sup.

- If the gen_server crashes, root_sup (one_for_all) will also take
down and restart worker_sup, and with it the worker(s).

- Since the gen_server is monitoring the worker, it will receive the
'DOWN' message when the worker goes down, and you can then {stop,
Reason} from the gen_server. Also, the worker restart frequency is
zero times in 10 seconds, so it will not be restarted by its
supervisor; instead the gen_server dies as well, everything is taken
down, and it all gets restarted by the root_sup.

I have tried it out, and it works nicely too.
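A sketch of that tree in code - all module names here (root_sup, worker_sup, my_server) are invented for illustration, and worker_sup is assumed to be a simple_one_for_one supervisor with MaxR = 0, MaxT = 10 as described:

```erlang
%% root_sup: one_for_all, so if either child dies, both are restarted.
-module(root_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_all, 3, 10},
          [{my_server, {my_server, start_link, []},
            permanent, 5000, worker, [my_server]},
           {worker_sup, {worker_sup, start_link, []},
            permanent, infinity, supervisor, [worker_sup]}]}}.
```

And the monitoring side, as fragments of the my_server callback module (assuming a #state{} record with a worker field):

```erlang
%% Start a worker under worker_sup and monitor it - no link, no
%% trap_exit needed in the gen_server.
start_worker(Args, State) ->
    {ok, Pid} = supervisor:start_child(worker_sup, [Args]),
    MRef = erlang:monitor(process, Pid),
    State#state{worker = {Pid, MRef}}.

%% 'DOWN' arrives as an ordinary message; stopping here lets
%% root_sup's one_for_all restart the whole subtree.
handle_info({'DOWN', MRef, process, Pid, Reason},
            #state{worker = {Pid, MRef}} = State) ->
    {stop, Reason, State};
handle_info(_Info, State) ->
    {noreply, State}.
```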

The question I have for you, though, is:

Does every gen_server spawn just a single worker process, or can it
spawn multiple workers? If multiple, what happens to the rest if one
dies? (I'm assuming the answer to this is one gen_server, one worker.)

Hope that helps.

cheers,
Essien

Bernard Duggan

May 12, 2010, 8:29:01 PM
to Essien Essien, Mazen Harake, Erlang
Hi Essien,

On 13/05/10 09:29, Essien Essien wrote:
> - You want a gen_server to spawn a worker, and both should be
> restarted if either of them goes down... What to do is this (forgive
> the ascii art!):
>
> [root_sup] % root_sup should use a one_for_all restart strategy
> |
> [gen_server, worker_sup] % worker_sup should be simple_one_for_one,
> MaxR = 0, MaxT=10
> |
> [worker(s)] % this should be a permanent child.
>
> To complete the above, establish a monitor (erlang:monitor/2) b/w
> gen_server and worker pid after the supervisor:start_child(worker_sup,
> []) call.
>
> The above tree will work as you want, though it introduces root_sup
> and worker_sup.
>
> - If the gen_server crashes, root_sup (one_for_all) will also take
> down and restart worker_sup, and with it the worker(s).
>
> - Since the gen_server is monitoring the worker, it will receive the
> 'DOWN' message when the worker goes down, and you can then {stop,
> Reason} from the gen_server. Also, the worker restart frequency is
> zero times in 10 seconds, so it will not be restarted by its
> supervisor; instead the gen_server dies as well, everything is taken
> down, and it all gets restarted by the root_sup.
>
Yeah, that arrangement was one I considered; I'm just not sure that it's
better than doing a straight spawn_link and catching and
propagating the non-normal 'EXIT' message from the worker. If you have
to catch and propagate the 'DOWN', the only difference in terms of code
size is the extra supervisor and the explicit monitoring call, which both
count against that plan. (Not, of course, that code size should be the
sole determining factor, but trying to avoid unnecessary code was what
got me onto this train of thought in the first place.) I do make use of
monitoring elsewhere, though, and I think you're quite right that it's
underused at the expense of linking.
> I have tried it out, and it works nicely too.
>
> The question I have for you, though, is:
>
> Does every gen_server spawn just a single worker process, or can it
> spawn multiple workers? If multiple, what happens to the rest if one
> dies? (I'm assuming the answer to this is one gen_server, one worker.)
>
You assume correctly. I have a gen_server that needs to be quite
responsive. Occasionally it also kicks off a single longer-running
operation which can be done in parallel, but is only transient.

You know, it's just occurred to me that the "easy" and (probably)
"right" solution for this particular case is to have the worker as "just
another gen_server" under a one_for_all supervisor with the original
gen_server. It can then sit idle quite happily until required - I don't
know why I had it fixed in my head that it had to be spawned only when
it was needed and terminated when it was done. Duh.
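Sketched out, that might look like this (module names are invented for illustration):

```erlang
%% one_for_all: if either server dies, both are taken down and
%% restarted together, giving the "both crash out" behaviour.
-module(pair_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_all, 3, 10},
          [{main_server, {main_server, start_link, []},
            permanent, 5000, worker, [main_server]},
           {worker_server, {worker_server, start_link, []},
            permanent, 5000, worker, [worker_server]}]}}.
```

The worker gen_server sits idle between jobs and the main server hands it work with an ordinary gen_server:cast/2; no links, monitors, or trap_exit needed in either server.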

Thanks for your thoughtful response :)

Cheers,

Bernard


