[erlang-questions] Frequent crashes in inets http client (R12B-5)


Chris Newcombe

Jun 5, 2009, 10:43:54 AM
to erlang-questions Questions, erlan...@erlang.org
Is there a patch for the following issue?

It was reported a while ago (but I didn't see any replies):
http://groups.google.com/group/erlang-programming/browse_thread/thread/4c497978c75ed6a9/18d9a242df81ba3a?lnk=gst&q=badrecord#18d9a242df81ba3a

Here's a bit more detail:

httpc_handler is crashing with

{badrecord,request}

(BTW it would be great if badrecord errors also contained the incorrect
term, not just the name of the expected record type)

It’s crashing in

httpc_handler,handle_info,2

The last message received by the gen_server is

{timeout,#Ref<0.0.0.9038>}

The gen_server #state is


{state,undefined,{tcp_session,{{"my-test-url",8080},<0.709.0>},false,http,#Port<0.1351>,1},undefined,undefined,undefined,undefined,{[],[]},pipeline,[#Ref<0.0.0.5834>],nolimit,nolimit,{options,{undefined,[]},20000,1,100,disabled,enabled,false},{timers,[],#Ref<0.0.0.19293>}}

I think the relevant element is the first one (request), i.e. request == undefined.

Given the message, it seems almost certain that the crash is in the second
timeout clause of handle_info (marked below with ***). This clause will fire
even if request == undefined, but it will still try to use
Request#request.from, which crashes with {badrecord,request}.

%%% Timeouts

%% Internaly, to a request handling process, a request time out is
%% seen as a canceld request.
handle_info({timeout, RequestId},
            State = #state{request = Request = #request{id = RequestId}}) ->
    httpc_response:send(Request#request.from,
                        httpc_response:error(Request,timeout)),
    {stop, normal,
     State#state{canceled = [RequestId | State#state.canceled],
                 request = Request#request{from = answer_sent}}};

%% *** the crashing clause ***
handle_info({timeout, RequestId}, State = #state{request = Request}) ->
    httpc_response:send(Request#request.from,
                        httpc_response:error(Request,timeout)),
    {noreply, State#state{canceled = [RequestId | State#state.canceled]}};

handle_info(timeout_pipeline, State = #state{request = undefined}) ->
    {stop, normal, State};
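
(As an aside, the failure mode is easy to reproduce in isolation. The module
below is a minimal sketch, not inets code: accessing a record field on a term
that is not actually a #request{} record, e.g. undefined, fails with exactly
the error term seen in the crash report.)

-module(badrecord_demo).
-export([demo/0]).

-record(request, {id, from}).

demo() ->
    demo(undefined).

%% Field access on a non-record term fails at runtime with
%% {badrecord,request} in this OTP generation (and, as noted above, the
%% error does not include the offending term itself).
demo(Request) ->
    try
        Request#request.from
    catch
        error:Reason -> Reason    % => {badrecord,request}
    end.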

It looks like State#state.request is being set to undefined without
cancelling an in-progress request timer.


I've only glanced at the code, but both of the following clauses appear to
do that.

(But it could easily be something else.)

%% On a redirect or retry the current request becomes
%% obsolete and the manager will create a new request
%% with the same id as the current.
{redirect, NewRequest, Data} ->
    ok = httpc_manager:redirect_request(NewRequest, ProfileName),
    handle_pipeline(State#state{request = undefined}, Data);

{retry, TimeNewRequest, Data} ->
    ok = httpc_manager:retry_request(TimeNewRequest, ProfileName),
    handle_pipeline(State#state{request = undefined}, Data);

thanks,

Chris

Oscar Hellström

Jun 5, 2009, 10:42:34 AM
to Chris Newcombe, erlang-questions Questions, erlan...@erlang.org
Not to start a flame war, but I would stay away from the inets http
client if I were trying to build something serious. You can find my
reasons here:
http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:43806:200905:gocblgddeplfolmoleep

Best regards


--
Oscar Hellström, os...@erlang-consulting.com
Office: +44 20 7655 0337
Mobile: +44 798 45 44 773
Erlang Training and Consulting Ltd
http://www.erlang-consulting.com/


________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Steve Davis

Jun 5, 2009, 1:13:03 PM
to erlang-q...@erlang.org
Hi Oscar,

Do you happen to know whether ibrowse suffers the same limitations?

regs,
/s

On Jun 5, 9:42 am, Oscar Hellström <os...@erlang-consulting.com> wrote:

> Not to start a flame war, but I would stay away from the inets http
> client if I were trying to build something serious. You can find my
> reasons here:
> http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:43806:200905:gocblgddeplfolmoleep
>
> Best regards

Oscar Hellström

Jun 5, 2009, 1:27:26 PM
to Steve Davis, erlang-q...@erlang.org
Hi Steve,

I have a bone to pick with ibrowse as well. I did discuss some
shortcomings with Chandru, but I never posted anything to the list. I
guess it's time for that now. I also think Chandru is working on
correcting some of these.

1. Reading of responses
ibrowse reads responses using active sockets. This means that any packet
that comes in is sent to the process managing the connection as a
message. Each message is then added to the front of a list, which is
reversed when the complete response has been received, and then
flattened. Our response bodies were quite large, sometimes up to 2MB (or
even larger, but the largest I have actually measured was ~2MB), and at
peak times we had quite high latency over the Internet. I think latency
makes this worse, since the data can be split into more "active
packets", but I'm not sure. I've done some tests with ibrowse, receiving
a 2MB, 4MB and 8MB file over a slow network. The Erlang OS process would
use over 800MB of resident RAM when fetching the 8MB file. These tests
were on a 64-bit machine, which admittedly makes matters worse.
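
To make that concrete, here is a rough sketch (not ibrowse's actual code;
the socket options and the 30-second timeout are assumptions): an
active-mode socket delivers every TCP packet as a separate message, the
fragments are accumulated in reverse, and the final step either flattens
a list (the behaviour described above) or builds one binary from the same
fragments.

-module(recv_sketch).
-export([recv_body/1]).

%% Collect a response body from a socket opened in active mode.
recv_body(Socket) ->
    recv_body(Socket, []).

recv_body(Socket, Acc) ->
    receive
        {tcp, Socket, Packet} ->
            %% each incoming packet arrives as its own message
            recv_body(Socket, [Packet | Acc]);
        {tcp_closed, Socket} ->
            %% list-based variant (memory-heavy for multi-MB bodies):
            %%     lists:flatten(lists:reverse(Acc))
            %% binary-based variant: one contiguous binary, no flat list
            {ok, iolist_to_binary(lists:reverse(Acc))}
    after 30000 ->
            {error, timeout}
    end.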

2. Copying of data between processes
The request is first copied from the requesting process to the manager,
which then copies it again to the process handling the connection. When
the process handling the connection is done, it copies the response to
the manager, which in turn copies it again to the process that made the
request. This copying creates a lot of garbage and is probably part of
the reason why we see so much memory being used in the previous test. It
also makes the manager process's heap grow quite large if there is a lot
of traffic. Another quite important issue here is the response being
sent from the process handling the connection to the manager. This is
done with a gen_server:call, which will time out after some time. Under
very high CPU load (most likely from flattening many very large lists)
this call would time out and we would see lots of very big crash
reports, which would also make the error_logger use *a lot* of memory.
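
The copying cost is easy to observe directly. The sketch below is not
ibrowse code, just an illustration: the same ~1MB payload is sent to a
fresh process once as a list and once as a binary. The list is copied
element by element into the receiver's heap, while a binary larger than
64 bytes is reference-counted and only a small handle is copied.

-module(copy_demo).
-export([run/0]).

run() ->
    List = lists:duplicate(1024 * 1024, $x),
    Bin  = list_to_binary(List),
    io:format("receiver total_heap_size with list:   ~p words~n",
              [receiver_heap(List)]),
    io:format("receiver total_heap_size with binary: ~p words~n",
              [receiver_heap(Bin)]).

%% Spawn a process, send it Payload, and report its total heap size
%% (including heap fragments) while it still holds the message.
receiver_heap(Payload) ->
    Parent = self(),
    Pid = spawn(fun() ->
                        receive Msg ->
                                Parent ! {self(), got_it},
                                receive stop -> Msg end   % keep Msg live
                        end
                end),
    Pid ! Payload,
    receive {Pid, got_it} -> ok end,
    {total_heap_size, Words} = process_info(Pid, total_heap_size),
    Pid ! stop,
    Words.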

3. Timeouts
The timeout handling is done in the process handling the connection, and
uses some weird calculations to come up with reasonable timeouts for
connect. Also note that gen_tcp:send/2 and ssl:send/2 can block for some
time on a congested network. Anyway, from our point of view, ibrowse
doesn't respect the actual timeout handed to the send_req call, and we
had lots of internal timeouts instead of external ones.
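
For what it's worth, one way to get a hard external deadline regardless
of a client library's internal timeout logic is to run the blocking call
in a helper process and give up after a wall-clock limit. This is only a
sketch; some_http_client:request/1 is a placeholder, not a real ibrowse
or inets function.

-module(deadline_sketch).
-export([request_with_deadline/2]).

%% Enforce a wall-clock deadline around any blocking request call.
request_with_deadline(Req, DeadlineMs) ->
    {Pid, MRef} =
        spawn_monitor(fun() -> exit({ok, some_http_client:request(Req)}) end),
    receive
        {'DOWN', MRef, process, Pid, {ok, Result}} ->
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    after DeadlineMs ->
            erlang:demonitor(MRef, [flush]),
            exit(Pid, kill),                 % abandon the late request
            {error, timeout}
    end.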

4. Memory usage / garbage collection
I don't really think this is an ibrowse issue, but it's interesting
anyhow. Since data was copied between lots of processes, and since
processes are recycled between requests (to enable pipelining, I guess,
and to keep monitoring their sockets), they would keep a lot of the
memory they had allocated. We tried to get around this by adding calls
to garbage_collect after each request in the client process, but I don't
think this is a good way to do it. Another approach is to use one
process per request, and let it die when it's done, which would free the
memory; but that doesn't work very well if you want to support
pipelining (and why would you want to do that, btw?).
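
The first workaround mentioned above looks roughly like this (a sketch
only; do_request/1 is a hypothetical stand-in for the real client call):

-module(gc_sketch).
-export([fetch_and_collect/1]).

%% Force a collection in the calling process after each request so the
%% per-request garbage is released immediately instead of sitting on a
%% grown heap until the next natural GC.
fetch_and_collect(Url) ->
    Result = do_request(Url),
    true = erlang:garbage_collect(),
    Result.

%% hypothetical stand-in for the actual HTTP client call
do_request(_Url) ->
    {ok, <<"response body">>}.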

Hope this helps

Oscar Hellström

Jun 5, 2009, 1:30:27 PM
to Steve Davis, erlang-q...@erlang.org
Hi All,

As usual, I type too much or too little. The short answer would be:
ibrowse also has issues (see my previous email), but not the blocking
behaviour of the inets http client. It can however be a real CPU and
memory drag, and in some cases its timeout handling is questionable.

Best regards

Chris Newcombe

Jun 5, 2009, 2:34:50 PM
to erlang-questions Questions, erlan...@erlang.org
My first (weak) hypothesis was wrong -- cancelling the request timer in the
redirect and retry clauses did not fix the issue.

I tried the obvious mindless suppression of the symptom, and that appears to
work.
See hack below. (This is not a production-worthy patch.)

Of course this does nothing to address the root cause, or other symptoms of
the bug. In a short test with 100 sessions (pipeline depth of 1), I didn't
see any obvious resource leaks, but absence of evidence is not evidence of
absence and all that.

It would be great to get an official fix for this bug, and the other issues
identified by Oscar.

Finally, this hack also applies Mats Cronqvist's suggested patch to avoid
SASL error reports every time an HTTP server closes a connection. (As I
mention in a comment below, I don't think a SASL report should be
generated if a TCP connection happens to break and cause a tcp_error
either, but I didn't make that change as I have no easy way to test it.)

Chris

This is against R12B-5

==== erlang/lib/inets/src/http_client/httpc_handler.erl ====
@@ -361,9 +361,15 @@
 %%% Error cases
 handle_info({tcp_closed, _}, State) ->
-    {stop, session_remotly_closed, State};
+    %% cnewcom: was: {stop, session_remotly_closed, State};
+    {stop, shutdown, State}; % cnewcom: per http://groups.google.com/group/erlang-programming/browse_thread/thread/4c497978c75ed6a9/18d9a242df81ba3a?lnk=gst&q=badrecord#18d9a242df81ba3a
+
 handle_info({ssl_closed, _}, State) ->
-    {stop, session_remotly_closed, State};
+    %% cnewcom: was: {stop, session_remotly_closed, State};
+    {stop, shutdown, State};
+
+%% cnewcom: TODO: do we really want a SASL error report if a connection is broken?
+%%
 handle_info({tcp_error, _, _} = Reason, State) ->
     {stop, Reason, State};
 handle_info({ssl_error, _, _} = Reason, State) ->
@@ -379,6 +385,14 @@
     {stop, normal,
      State#state{canceled = [RequestId | State#state.canceled],
                  request = Request#request{from = answer_sent}}};
 
+%% cnewcom: try to work around a request timeout arriving when State#state.request == undefined
+%% cnewcom: this appears to successfully suppress the symptom (and a stress test doesn't show any obvious resource leak)
+%% cnewcom: but there may be other symptoms
+%%
+handle_info({timeout, RequestId}, State = #state{request = undefined}) ->
+    {noreply, State#state{canceled = [RequestId | State#state.canceled]}};
+
 handle_info({timeout, RequestId}, State = #state{request = Request}) ->
     httpc_response:send(Request#request.from,

Chris Newcombe

Jun 5, 2009, 2:49:57 PM
to Oscar Hellström, erlang-questions Questions
Hi Oscar,

I did see your earlier post -- very useful, thanks very much. (It would be
great to hear from the OTP team on this.)

I normally use ibrowse, but as ibrowse does not yet use binaries internally
it's not working well for my current project. I'm hitting exactly the issue
you mentioned in your post (on ibrowse) right now -- I need to concurrently
receive many multi-MB response bodies, and due to ibrowse's use of lists the
memory overhead is crippling -- I too am running on 64-bit. My current
project has to co-exist with other important applications, so memory usage
(and the much higher CPU from all of the
reversal/flattening/copying/garbage collection) is a real practical issue.

A couple of days ago I asked Chandru how much work it would be to change
ibrowse to use binaries, and he very kindly said he would look at it.
(Thank you again, Chandru!)

If other users of ibrowse would like this to happen, it might be useful
to reply publicly to this post and say so. (e.g. I heard that the CouchDB
team are also very interested.) I'm guessing that Chandru would appreciate
assistance with testing, and perhaps even with code (although I have not
asked him).

regards,

Chris


2009/6/5 Oscar Hellström <os...@erlang-consulting.com>

> Not to start a flame war, but I would stay away from the inets http
> client if I were trying to build something serious. You can find my
> reasons here:
>
> http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:43806:200905:gocblgddeplfolmoleep
>
> Best regards
>

Oscar Hellström

Jun 5, 2009, 2:55:28 PM
to Chris Newcombe, erlang-questions Questions
Hi Chris,

Actually, the OTP team has responded to this, but privately. I made the
mistake of sending it to erlang-bugs and CC'ing erlang-questions. This
somehow seems to have made the mail end up in the erlang-questions
archive, but it was never sent out. An email was, however, sent out to
erlang-bugs' members. The response from the OTP team is quoted below:

> Hi,
>
> Thanks for the input (good analyzis). I will add it
> to my inets todo-list (which is getting quite long :)
> There is no time to deal with these issues in the
> R13B01 release, but hopefully R13B02.
>
> Regards,
> /BMK


--
Oscar Hellström, os...@erlang-consulting.com
Office: +44 20 7655 0337
Mobile: +44 798 45 44 773
Erlang Training and Consulting Ltd
http://www.erlang-consulting.com/

Chandru

Jun 14, 2009, 2:49:50 AM
to Oscar Hellström, Steve Davis, erlang-q...@erlang.org
Hi Oscar,

Thanks for the feedback. I've hacked ibrowse to use binaries internally
instead of lists. I have an experimental version in case anyone is
interested. I'll announce an update as soon as I'm confident nothing is
broken.

cheers
Chandru

