[erlang-questions] Timeout Erlang GenServer Crash Loop


Code Box

Oct 12, 2012, 12:03:13 AM
to erlang-q...@erlang.org
** Reason for termination ==
** {timeout,{gen_server,call,[thetime,gettime]}}

=CRASH REPORT==== 2012-10-09 05:37:04 UTC ===
  crasher:
    initial call: process_listener:-init/1-fun-2-/0
    pid: <0.12376.513>
    registered_name: []
    exception exit: {timeout,{gen_server,call,[thetime,gettime]}}
      in function  gen_server:terminate/6
    ancestors: [incoming_req_processor,incoming_sup,top_process_sup,
                  <0.52.0>]
    messages: []
    links: []
    dictionary: [{random_seed,{23375,22820,17046}}]
    trap_exit: true
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 1646842
  neighbours:

I am seeing a lot of these messages in my crash reports. Once this starts, the system goes into a crash loop for quite a while. I am not sure how to debug this error, and the timeouts are really annoying. Can someone help me understand the root cause?

Why are my gen_server calls timing out? Is my Erlang VM slow, and if so, why? How can I debug this to get to the root cause?

Michael Truog

Oct 12, 2012, 12:14:05 AM
to Code Box, erlang-q...@erlang.org
If you look at gen_server:call/2 at http://www.erlang.org/doc/man/gen_server.html
it shows the default Timeout is 5000 milliseconds (5 seconds).  Your gen_server process must have been processing for longer than 5 seconds while a gen_server:call/2 message was waiting in the process message queue, to cause the timeout exception.  So, it isn't the Erlang VM being slow, it is just an Erlang process that is overloaded (i.e., the "thetime" registered process).
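
One way to check whether the registered process really is backed up is to look at the length of its message queue. The following is only a minimal sketch: the module name queue_probe and the function probe/1 are made up for illustration, while whereis/1 and erlang:process_info/2 are standard BIFs.

```erlang
-module(queue_probe).
-export([probe/1]).

%% Returns the number of messages waiting in the mailbox of the
%% process registered as Name, or undefined if there is no such process.
probe(Name) ->
    case whereis(Name) of
        undefined ->
            undefined;
        Pid ->
            case erlang:process_info(Pid, message_queue_len) of
                {message_queue_len, Len} -> Len;
                undefined -> undefined   % process exited in the meantime
            end
    end.
```

Calling queue_probe:probe(thetime) repeatedly while the crash loop is happening should show a steadily growing number if the process cannot keep up with its callers.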



Code Box

Oct 12, 2012, 1:05:41 AM
to Michael Truog, erlang-q...@erlang.org
If the process is overloaded, wouldn't that show up in my host's CPU or memory stats? I see CPU usage at just 50%.

Michael Truog

Oct 12, 2012, 1:18:27 AM
to Code Box, erlang-q...@erlang.org
Well, a common problem is for the process to be blocked on its own synchronous call, which keeps CPU usage low because it spends most of its time idle, waiting for one or more responses from other processes.  The best way I have seen to deal with this type of timeout problem is to always pass the timeout in the message, like this:
gen_server:call(<process>, {<message>, Timeout - DELTA}, Timeout)
Where DELTA can be 100 milliseconds.  Then the (Timeout-DELTA) value the handle_call sees can be used for any internally synchronous calls.  However, then the problem becomes understanding what the cumulative delay might be, if there are multiple synchronous calls used within the process.  Ideally, the process is kept simpler, so it doesn't need to try and track many synchronous calls.
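
As a rough sketch of that idea, the module below passes Timeout - DELTA to the server inside the request. The module name deadline_server, the fetch/2 function, and the {fetch, Budget} message are all hypothetical; only the gen_server calls are real OTP API.

```erlang
-module(deadline_server).
-behaviour(gen_server).
-export([start_link/0, fetch/2]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(DELTA, 100).  % safety margin in milliseconds

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Client API: the caller waits Timeout ms, and the server is told it
%% has Timeout - DELTA ms for any synchronous work of its own.
fetch(Server, Timeout) when Timeout > ?DELTA ->
    gen_server:call(Server, {fetch, Timeout - ?DELTA}, Timeout).

init([]) ->
    {ok, no_state}.

handle_call({fetch, Budget}, _From, State) ->
    %% Budget is what would be passed as the timeout of any nested
    %% gen_server:call/3. This sketch simply echoes it back.
    {reply, {ok, Budget}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With a 1000 ms caller timeout, handle_call sees a budget of 900 ms, so a nested call made with that budget should give up before the outer caller does.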

I am not entirely sure this is your problem, since the latency could also come from function calls, for example function calls blocking the schedulers, or something strange like code loading locking the schedulers.  Those issues usually aren't as common a concern, though.

Code Box

Oct 12, 2012, 3:15:12 AM
to Michael Truog, erlang-q...@erlang.org
Thanks for your reply. I really appreciate it. I do have a lot of load on my server, on the order of a few thousand requests per second. But the process that is timing out is not waiting on any other process; that call just does a . So I definitely think the process is overloaded and the other requests waiting in its message queue are timing out. I am trying to prove this by looking at the server metrics for CPU, memory, and IO. On the IO side, I do see a big spike. That could be why other processes are blocked while the IO is happening, which can cause CPU contention.

Michael Truog

Oct 12, 2012, 3:31:13 AM
to Code Box, erlang-q...@erlang.org
Ok, if you are experiencing latency with file io, make sure you have async thread pool threads set on the Erlang VM with something like:
erl +A 5

If it is related to the async thread pool, the job queue is not shared between the threads... it is a queue per thread, so the size of the async thread pool can impact the wait time... meaning that file io can take longer if the async thread pool is smaller, but you normally don't need a large number of async threads started.
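
To confirm what the running VM was actually started with, the async thread pool size can be read back from a shell on that node. This is just a quick check, not a tuning recommendation:

```erlang
%% Evaluate in the Erlang shell of the running node.
%% Returns the number of async threads set with the +A flag.
erlang:system_info(thread_pool_size).
```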

If it is socket stuff, it might be related to the encoding, but there are many possibilities down that road.

Matthew Evans

Oct 12, 2012, 12:40:22 PM
to mjt...@gmail.com, codei...@gmail.com, erlang-q...@erlang.org
It's hard to answer without knowing what your code is doing (i.e. maybe there is an inefficiency somewhere). However, a common design pattern if your gen_server is doing complex work is to spawn another process to do this task. If you are running multicore the work will be distributed over the different cores:

e.g.

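handle_call({some_operation,Data}, From, State) ->
    %% Do the heavy work in a separate process so the gen_server
    %% can keep serving its message queue.
    spawn(fun() ->
         Rsp = do_lots_of_work(Data),
         %% Reply to the original caller from the worker process.
         gen_server:reply(From,Rsp)
    end),
    %% No reply here; the worker sends it via gen_server:reply/2.
    {noreply,State};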
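
For reference, here is the same pattern as a small self-contained module that can be compiled and tried out. It is only a sketch: the module name offload_server, the client function some_operation/2, and the body of do_lots_of_work/1 (a short sleep plus a sum) are stand-ins invented for illustration.

```erlang
-module(offload_server).
-behaviour(gen_server).
-export([start_link/0, some_operation/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Client API: blocks until the worker replies or the default
%% 5000 ms gen_server:call/2 timeout expires.
some_operation(Server, Data) ->
    gen_server:call(Server, {some_operation, Data}).

init([]) ->
    {ok, no_state}.

handle_call({some_operation, Data}, From, State) ->
    %% Hand the work to a short-lived process so this server can
    %% return to its mailbox immediately.
    spawn(fun() ->
        Rsp = do_lots_of_work(Data),
        gen_server:reply(From, Rsp)
    end),
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Hypothetical stand-in for the expensive computation.
do_lots_of_work(Data) ->
    timer:sleep(50),
    lists:sum(Data).
```

One thing to keep in mind with plain spawn: if the worker crashes before calling gen_server:reply/2, the caller gets no answer and still runs into the call timeout, so some form of monitoring or linking may be worth adding in real code.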

