[erlang-questions] gen_server message queue length increasing


Zvi

Nov 3, 2011, 5:14:15 PM
to erlang-q...@erlang.org
Hi,

I have a locally registered gen_server [1], which is trapping exits and
monitoring a list of processes (which are saved in an ETS table held in its state).
The priority of this gen_server is set to high.
We also use fullsweep_after = 0.
The only job of this gen_server is to spawn gen_servers of another type
and update the ETS table with their pids.
The priority of a spawned gen_server process is normal.

For some reason the message queue length of this gen_server keeps
increasing, with messages which supposedly should be processed by
the gen_server's handle_info/2 callback [2].
Any ideas?

[2]

(mynode@myhost)9> whereis(myserver).
<0.194.0>
(mynode@myhost)10> i(0,194,0).
[{registered_name,myserver},
{current_function,{proc_lib,sync_wait,2}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,43388},
{messages,[{'EXIT',<0.17263.1>,normal},
{'DOWN',#Ref<0.0.220.23782>,process,<0.17263.1>,normal},
{'EXIT',<0.19870.0>,normal},
{'DOWN',#Ref<0.0.7.128134>,process,<0.19870.0>,normal},
{'EXIT',<0.19945.0>,normal},
{'DOWN',#Ref<0.0.7.183474>,process,<0.19945.0>,normal},
{'EXIT',<0.19927.0>,normal},
{'DOWN',#Ref<0.0.7.166242>,process,<0.19927.0>,normal},
{'EXIT',<0.19847.0>,normal},
{'DOWN',#Ref<0.0.7.119123>,process,<0.19847.0>,normal},
{'EXIT',<0.19935.0>,normal},
{'DOWN',#Ref<0.0.7.174779>,process,<0.19935.0>,normal},
{'EXIT',<0.17267.1>,normal},
{'DOWN',#Ref<0.0.220.24915>,process,<0.17267.1>,normal},
{'EXIT',<0.19833.0>,normal},
{'DOWN',#Ref<0.0.7.109092>,process,<0.19833.0>,normal},
{'EXIT',<0.19895.0>,normal},
{'DOWN',#Ref<0.0.7.135024>,process,...},
{'EXIT',<0.19906.0>,...},
{'DOWN',...},
{...}|...]},
{links,[<0.14463.0>,<0.1452.1>,<0.28537.1>,<0.6041.2>,
<0.11523.2>,<0.13320.2>,<0.13691.2>,<0.14031.2>,<0.14312.2>,
<0.14363.2>,<0.14502.2>,<0.14514.2>,<0.14518.2>,<0.14523.2>,
<0.14516.2>,<0.14509.2>,<0.14512.2>,<0.14506.2>,<0.14471.2>,
<0.14490.2>|...]},
{dictionary,[{'$ancestors',[router_core_sup,router_sup,
<0.189.0>]},
{'$initial_call',{myserver,init,1}}]},
{trap_exit,true},
{error_handler,error_handler},
{priority,high},
{group_leader,<0.188.0>},
{total_heap_size,1346269},
{heap_size,1346269},
{stack_size,26},
{reductions,3818768},
{garbage_collection,[{min_bin_vheap_size,46368},
{min_heap_size,233},
{fullsweep_after,0},
{minor_gcs,0}]},
{suspending,[]}]

--------------------------------------

[1]

-module(myserver).
-behaviour(gen_server).

...

handle_info({'EXIT', Pid, _}, State) ->
    delete_by_pid(Pid, State),
    {noreply, State};

handle_info({'DOWN', _, process, Pid, _}, State) ->
    delete_by_pid(Pid, State),
    {noreply, State};

handle_info(Info, State) ->
    {stop, {unknown_info, Info}, State}.
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Zvi

Nov 3, 2011, 5:15:52 PM
to erlang-q...@erlang.org
I forgot to mention that we use R14B04 with HiPE disabled and kernel
poll enabled.
The OS is Ubuntu 11.10.


Allen Kim

Nov 3, 2011, 5:21:31 PM
to Zvi, erlang-q...@erlang.org
From my limited knowledge, it seems your delete_by_pid/2 is waiting for
something.

Allen

Zvi

Nov 3, 2011, 5:27:53 PM
to erlang-q...@erlang.org
The only thing it's waiting for is an ETS lookup:

delete_by_pid(Pid, State) ->    case ets:lookup(State#state.pid2id,
Pid) of        [{_, ID, Ref}] ->             erlang:demonitor(Ref),   
        ets:delete(State#state.pid2id, Pid),           
ets:delete(State#state.id2pid, ID);         _ ->            ignore   
end.


Zvi

Nov 3, 2011, 5:31:10 PM
to erlang-q...@erlang.org
Sorry, hopefully better formatting now:


delete_by_pid(Pid, State) ->
    case ets:lookup(State#state.pid2id, Pid) of
        [{_, ID, Ref}] ->
            erlang:demonitor(Ref),
            ets:delete(State#state.pid2id, Pid),
            ets:delete(State#state.id2pid, ID);
        _ ->
            ignore
    end.

Allen Kim

Nov 3, 2011, 5:43:35 PM
to Zvi, erlang-q...@erlang.org
You could put a timer:sleep/1 call between spawns.
I think your spawning speed is much faster than the gen_server's queue
processing speed.

Allen


Wang Wei

Nov 3, 2011, 10:36:52 PM
to erlang-q...@erlang.org
Hi, is it OK to let the spawned gen_servers delete themselves from the ETS table
when they terminate? If you spawn and destroy processes too fast, that will cause a lock
condition in the main gen_server.

Magnus Klaar

Nov 4, 2011, 10:22:47 AM
to Wang Wei, erlang-q...@erlang.org
Hi!

@Wang

It is not OK to let a gen_server delete itself, because it is not guaranteed to delete the reference associated with it from the shared table. This approach will leak memory whenever the child processes don't exit under ideal conditions.

@Zvi

When you both link to and monitor the child process, you double the amount of work your server needs to keep up with. Since you are already linked, you can skip monitoring.

There is also one possible issue with your server: you are starting the child processes directly from the same server that receives the EXIT and DOWN messages. In the process info you included, you can see that the current function is proc_lib:sync_wait. This call happens to be inefficient on processes with large inboxes, and it will likely turn out to be much more of a bottleneck than the ETS calls in delete_by_pid.

MVH Magnus
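
The link-only variant suggested above could look roughly like the sketch below. It is not the original poster's code: the module name link_only_registry, the start_worker/2 API and the zero-arity worker fun are all made up for illustration, and plain spawn_link/1 stands in for whatever the real server uses to start its children. The point is only that a server which traps exits and is linked to its children gets one 'EXIT' message per child and needs no monitor.

```erlang
-module(link_only_registry).
-behaviour(gen_server).

-export([start_link/0, start_worker/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Two ETS tables, named after the fields used in the thread.
-record(state, {pid2id, id2pid}).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Hypothetical API: Fun is a zero-arity fun run by the worker.
start_worker(Id, Fun) ->
    gen_server:call(?MODULE, {start_worker, Id, Fun}).

init([]) ->
    %% Trapping exits turns a linked child's death into an 'EXIT' message.
    process_flag(trap_exit, true),
    {ok, #state{pid2id = ets:new(pid2id, [set]),
                id2pid = ets:new(id2pid, [set])}}.

handle_call({start_worker, Id, Fun}, _From, State) ->
    %% Link only, no erlang:monitor/2: one message per child death.
    Pid = spawn_link(Fun),
    ets:insert(State#state.pid2id, {Pid, Id}),
    ets:insert(State#state.id2pid, {Id, Pid}),
    {reply, {ok, Pid}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({'EXIT', Pid, _Reason}, State) ->
    case ets:lookup(State#state.pid2id, Pid) of
        [{Pid, Id}] ->
            ets:delete(State#state.pid2id, Pid),
            ets:delete(State#state.id2pid, Id);
        [] ->
            ok
    end,
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.
```

Because spawn_link/1 returns immediately, this sketch also never sits in proc_lib:sync_wait. If the children have to be gen_servers started with gen_server:start_link, the wait comes back, and moving the spawning into a separate process from the one receiving the exit messages is one way to keep the two concerns apart.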

Jon Watte

Nov 4, 2011, 7:44:47 PM
to Magnus Klaar, erlang-q...@erlang.org
> It is not OK to let a gen_server delete itself, because it is not guaranteed to delete the reference associated with it from the shared table. This approach will leak memory whenever the child processes don't exit under ideal conditions.


Is that *actually* true? Isn't the whole point of gen_server, and the Erlang VM in general, that you always have control, and thus can always run code, no matter what the fault?

Specifically, except for the brutal_kill termination kind, is there any case where a gen_server terminate/2 callback that does an ETS delete on a public table would ever fail to run?

Sincerely,

jw

Magnus Klaar

Nov 4, 2011, 10:54:37 PM
to Jon Watte, erlang-q...@erlang.org
Hi!

On the verge of very off topic.

First, even if you are not using the brutal_kill shutdown strategy, your server may still be sent a kill signal by the supervisor if it does not terminate in a timely fashion, unless you are using an infinite shutdown time.

So now we have created a design that relies on workers trapping exits, and on a shutdown strategy that may hang the supervisor the workers are running under indefinitely if we ever need to stop a worker or restart the supervisor. Is the neatness of the terminate/2 callback worth this?

Second, even if one decides to go forward with this, there is a second requirement for the terminate/2 callback to be called: all of your other callbacks must be _guaranteed_ to return control to the internal gen_server loop at some point. Only if that loop receives an exit message from the parent process will your terminate/2 callback be called. Enter a deadlock somewhere, or wait for a message that never arrives, and this assumption breaks.

This counterexample is a worst-case scenario. Even if it should be rare, I see little sense in making the patient responsible for performing its own post-mortem.

MVH Magnus
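
The first point can be reproduced in a few lines. The sketch below is only an illustration with made-up names (kill_demo, the registry table, some_id): a plain process stands in for the worker gen_server, and its receive clause stands in for terminate/2. The worker traps exits and would delete its own ETS entry on an exit message, but a kill signal cannot be trapped, so the entry is still there when the monitoring side sees the 'DOWN' message.

```erlang
-module(kill_demo).
-export([run/0]).

run() ->
    %% Public table so the worker can insert and delete its own entry.
    Tab = ets:new(registry, [public, set]),
    Parent = self(),
    Pid = spawn(fun() ->
        process_flag(trap_exit, true),
        ets:insert(Tab, {self(), some_id}),
        Parent ! ready,
        receive
            %% Stand-in for terminate/2: clean up our own entry.
            {'EXIT', _From, _Reason} -> ets:delete(Tab, self())
        end
    end),
    receive ready -> ok end,
    Ref = erlang:monitor(process, Pid),
    %% An untrappable kill, like the one a supervisor sends after
    %% the shutdown timeout expires.
    exit(Pid, kill),
    receive {'DOWN', Ref, process, Pid, Reason} -> ok end,
    %% The worker never ran its cleanup, so its entry is still here.
    Leaked = ets:lookup(Tab, Pid),
    ets:delete(Tab),
    {Reason, Leaked}.
```

Calling kill_demo:run() should return the exit reason killed together with the one leaked entry for the dead pid, which is the case an outside monitor or link owner has to clean up.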

Ulf Wiger

Nov 5, 2011, 8:52:55 AM
to Magnus Klaar, erlang-q...@erlang.org
On 4 Nov 2011, at 15:22, Magnus Klaar wrote:

> There is also one possible issue with your server which is that you are starting the child processes directly from the same server receiving the EXIT and DOWN messages. In the process info you included you can see that the current function is proc_lib:sync_wait, this call happens to be inefficient on processes with large inboxes, it'll likely turn out to be much more of a bottleneck than the ETS calls in delete_by_pid.

That's true. Actually, it ought to be possible to make proc_lib:start_link() insensitive to mailbox length in the same way as gen_server:call(), since the spawn_link() function produces a pid() that cannot possibly be in the message queue already.

Even more interesting, perhaps, is to find out whether the gen_server is blocking, waiting for a child to respond with proc_lib:init_ack/2. This is the very purpose of being in proc_lib:sync_wait(), and would also explain why it isn't picking up other messages from the queue.

BR,
Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.
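
One way to check the blocking hypothesis is to sample the server a few times and compare the results. The helper below is a hypothetical sketch (the module queue_probe and its function names are not from the thread); it relies only on whereis/1, erlang:process_info/2 and timer:sleep/1.

```erlang
-module(queue_probe).
-export([probe/1, sample/3]).

%% Name is a locally registered process name, e.g. myserver.
probe(Name) ->
    case whereis(Name) of
        undefined ->
            not_running;
        Pid ->
            %% Returns undefined if the process died in the meantime.
            erlang:process_info(Pid, [current_function,
                                      message_queue_len,
                                      status])
    end.

%% Take N samples, IntervalMs milliseconds apart.
sample(_Name, 0, _IntervalMs) ->
    [];
sample(Name, N, IntervalMs) when N > 0 ->
    S = probe(Name),
    timer:sleep(IntervalMs),
    [S | sample(Name, N - 1, IntervalMs)].
```

For example, queue_probe:sample(myserver, 5, 1000) takes five samples one second apart. If every sample shows proc_lib:sync_wait as the current function while message_queue_len keeps growing, the server is most likely stuck waiting for one child's init_ack rather than merely being slow. Passing a start timeout (proc_lib:start_link/4, or the {timeout, Time} option of gen_server:start_link) would bound that wait.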



Jon Watte

Nov 5, 2011, 9:33:31 PM
to Magnus Klaar, erlang-q...@erlang.org
That's what I had understood, too, but some people who I would trust were suggesting that I did too much work myself by keeping track of children in a separate process using links, and managing an ets table in that process -- the advice was that processes removing themselves in terminate() would be cleaner.
Thanks for your description.

Sincerely,

jw


--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.