Running out of sockets (filehandles) with 1.4.1 -- a known issue?


Tatu Saloranta

Jan 28, 2011, 11:40:31 PM
to asyncht...@googlegroups.com
I noticed that I can fairly easily reproduce an issue on our
production host, where after fetching about 4k data objects, the system
chokes with errors like:

---
97069 [pool-4-thread-16] ERROR
com.ning.http.client.providers.netty.NettyAsyncHttpProvider -
bootstrap.connect
org.jboss.netty.channel.ChannelException: Failed to open a socket.
at org.jboss.netty.channel.socket.nio.NioClientSocketChannel.newSocket(NioClientSocketChannel.java:49)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannel.<init>(NioClientSocketChannel.java:83)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.newChannel(NioClientSocketChannelFactory.java:139)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.newChannel(NioClientSocketChannelFactory.java:86)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:218)
at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:188)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.doConnect(NettyAsyncHttpProvider.java:735)
at com.ning.http.client.providers.netty.NettyAsyncHttpProvider.execute(NettyAsyncHttpProvider.java:647)
at com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:476)
at com.ning.http.client.AsyncHttpClient$BoundRequestBuilder.execute(AsyncHttpClient.java:231)
---

For some reason this does not happen on my desktop; possibly this is
related to the lower request rate (from my desktop I can only get ~10
requests per second, in production ~50, as latency is much lower).
The code I use is pretty simple, so I don't see how I could be leaking
resources (and there aren't close() methods to call anyway): I just
get a Future<Response>, wait for it, and process the result. All requests are
for different (virtual) hosts, however, if that matters.
I guess this could be related to the underlying limit on open files, but
it would seem odd that so many filehandles were kept open.
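For reference, here is a rough sketch of the fetch pattern described above (assuming AHC 1.4.x); the loop, variable names, and `process` helper are illustrative, not the actual production code:

```java
// Basically synchronous use of the async client: execute, block on the
// Future, process. There is no per-request close() to call.
AsyncHttpClient client = new AsyncHttpClient(); // default configuration
for (String url : urlsToFetch) {                // ~4k different (virtual) hosts
    Future<Response> f = client.prepareGet(url).execute();
    Response r = f.get();                       // block until complete
    process(r);                                 // application-side handling
}
client.close();
```

This requires the AHC 1.x jar (com.ning.http.client) and a reachable server, so it is a usage fragment rather than a standalone program.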

Has anyone seen something like this?

-+ Tatu +-

ps. Jean-Francois, feel free to ping me for details if necessary :)

Tatu Saloranta

Jan 28, 2011, 11:45:03 PM
to asyncht...@googlegroups.com
Oh, one more thing: I wish the code did not do log.error() on this -- it's
best to get an exception that I can handle; errors in the log are just
noise. In this case the problem is that the console is flooded with big
stack traces, which makes troubleshooting a bit difficult. So in cases
where an exception is thrown, logging is redundant (I assume this does
throw an exception too, which I handle as a potentially transient
error).

-+ Tatu +-

Hubert Iwaniuk

Jan 30, 2011, 4:36:41 AM
to asyncht...@googlegroups.com
Hi Tatu,

You can check the number of file descriptors being used with 'lsof'.
This will tell you which FDs are in use, and how many of them.

Hope this helps in troubleshooting,
Hubert.


jfarcand

Jan 30, 2011, 7:28:29 PM
to asyncht...@googlegroups.com

Salut,

first, have you limited the number of connections using the
AsyncHttpClientConfig? If not, it means all connections are getting
cached, which would explain the current failure IMO. Just curious: if you
switch the provider to the JDKAsyncHttpProvider (or use Netty's blocking
I/O), does it make a difference?

On 11-01-28 11:45 PM, Tatu Saloranta wrote:
> Oh, one more thing: I wish code did not do log.error() on this -- it's
> best to get an exception that I can handle; errors to log are just
> noise. In this case the problem is that console is flooded with big
> stack traces, which makes troubleshooting bit difficult. So cases
> where exception is thrown, logging is redundant (I assume this does
> throw an exception too, which I handle as a potentially transient
> error).

Let me first try to reproduce and I will fix that.

Thanks!

-- jeanfrancois

Tatu Saloranta

Jan 30, 2011, 7:35:53 PM
to asyncht...@googlegroups.com
On Sun, Jan 30, 2011 at 4:28 PM, jfarcand <jfarca...@gmail.com> wrote:
>
> Salut,
>

Hi there! I can check out lsof later on; for what it's worth, the failures
are on Linux (not sure which distro or version), but everything works on my macbook.

> first, have you limited to the number of connections using the
> AsyncHttpClientConfig? If not, it means all connections are getting cached
> which explain the current failure IMO. Just curious, if you switch the

Ok. No, I have not changed the configuration; I vaguely remember there
being some limits (I guess on a per-host basis, and since these are
different hosts, probably essentially unlimited).

> provider to the JDKAsyncHttpProvider (or use Netty's blocking I/O), does it
> makes a difference?

I can try that -- I actually did try the code with a basic stock JDK URL
connection, which works without issues. So I suspect this
would also work. I can test other settings tomorrow and hopefully have
more information to share.

Thanks!

-+ Tatu +-

Hubert Iwaniuk

Jan 31, 2011, 3:33:11 AM
to asyncht...@googlegroups.com
Hi Tatu,

On Jan 31, 2011, at 1:35 AM, Tatu Saloranta wrote:

> On Sun, Jan 30, 2011 at 4:28 PM, jfarcand <jfarca...@gmail.com> wrote:
>>
>> Salut,
>>
>
> Hi there! I can check out lsof later on; for what it's worth, fails
> are on linux (not sure which distro, version); but work on my macbook.
>
>> first, have you limited to the number of connections using the
>> AsyncHttpClientConfig? If not, it means all connections are getting cached
>> which explain the current failure IMO. Just curious, if you switch the
>
> Ok. No, I have not changed configuration; I vaguely remember there
> being some limits (I guess on per-host basis, and since these are
> different hosts, probably essentially unlimited).

Please use the following: http://asynchttpclient.github.com/async-http-client/apidocs/com/ning/http/client/AsyncHttpClientConfig.Builder.html#setMaximumConnectionsTotal(int)


>
>> provider to the JDKAsyncHttpProvider (or use Netty's blocking I/O), does it
>> makes a difference?
>
> I can try that -- I did actually did the code with basic stock JDK URL
> connection, which actually works without issues. So I suspect this
> would also work. I can test other settings tomorrow and hopefully have
> more information to share.

If you are running 32-bit Linux, the system can run out of LOWMEM; but without a limit on the number of connections, you are sure to run out of them.
Also, if you don't query the same hosts more than once, set http://asynchttpclient.github.com/async-http-client/apidocs/com/ning/http/client/AsyncHttpClientConfig.Builder.html#setAllowPoolingConnection(boolean) to false.
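Put together, the two settings above look roughly like this (a sketch for AHC 1.4.x; method names are taken from the javadoc links, the values are illustrative):

```java
// Cap total open connections and disable pooling, since each
// (virtual) host is only hit once in this workload.
AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
    .setMaximumConnectionsTotal(100)   // hard cap on open connections
    .setAllowPoolingConnection(false)  // no keep-alive/reuse across requests
    .build();
AsyncHttpClient client = new AsyncHttpClient(config);
```

This is a configuration fragment; it needs the com.ning.http.client jar on the classpath.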

HTH,
Hubert.

>
> Thanks!
>
> -+ Tatu +-

jfarcand

Jan 31, 2011, 8:30:09 AM
to asyncht...@googlegroups.com
Salut,

On 11-01-28 11:45 PM, Tatu Saloranta wrote:

> Oh, one more thing: I wish code did not do log.error() on this -- it's
> best to get an exception that I can handle; errors to log are just
> noise.

OK, I've removed the log... are you using an AsyncHandler? You should get the
exception there.

A+

-- Jeanfrancois

Tatu Saloranta

Jan 31, 2011, 2:15:30 PM
to asyncht...@googlegroups.com
On Mon, Jan 31, 2011 at 12:33 AM, Hubert Iwaniuk <neo...@kungfoo.pl> wrote:
> Hi Tatu,
>
> On Jan 31, 2011, at 1:35 AM, Tatu Saloranta wrote:
>
>> On Sun, Jan 30, 2011 at 4:28 PM, jfarcand <jfarca...@gmail.com> wrote:
>>>
>>> Salut,
>>>
>>
>> Hi there! I can check out lsof later on; for what it's worth, fails
>> are on linux (not sure which distro, version); but work on my macbook.
>>
>>> first, have you limited to the number of connections using the
>>> AsyncHttpClientConfig? If not, it means all connections are getting cached
>>> which explain the current failure IMO. Just curious, if you switch the
>>
>> Ok. No, I have not changed configuration; I vaguely remember there
>> being some limits (I guess on per-host basis, and since these are
>> different hosts, probably essentially unlimited).
>
> Please use following: http://asynchttpclient.github.com/async-http-client/apidocs/com/ning/http/client/AsyncHttpClientConfig.Builder.html#setMaximumConnectionsTotal(int)

Ah. I actually missed a piece of existing code that was setting this
value to 10000, which would explain the problem (the ulimit for file
handles was 1k).

However: if I reduce this value to, say, 100, I get 91 successful calls
and then 99k failures (haven't yet checked which ones; I will enable
failure output for now).
Also: what is the default setting? The code seems to assume -1, which
might mean "unlimited"? That would seem like a risky default, so I must
be missing a setting.

> If you are running linux 32 bit than system can run out of LOWMEM, but not having limit of connections, you sure are going to run out of them.
> Also if you don't query same hosts more than once than also set
> http://asynchttpclient.github.com/async-http-client/apidocs/com/ning/http/client/AsyncHttpClientConfig.Builder.html#setAllowPoolingConnection(boolean)
> to false.

Thanks, that does make sense in this case.

I am still a bit puzzled about the logic: only getting 91 success cases
is probably ok (I assume 9 fail for some other reason; I get about a 12%
failure rate due to the service itself failing). And it sounds like there
was some additional leakage; but I don't see what additional calls I
could/should make to close resources... the API seems to assume it can
handle closing transparently. But in error cases (http response not
200 etc.) I am not reading the response message -- should I? I remember
the Jakarta client assumed this needed to be done (otherwise the
connection could not be reused).

-+ Tatu +-

Tatu Saloranta

Jan 31, 2011, 2:18:09 PM
to asyncht...@googlegroups.com
On Mon, Jan 31, 2011 at 5:30 AM, jfarcand <jfarca...@gmail.com> wrote:
> Salut,
>
> On 11-01-28 11:45 PM, Tatu Saloranta wrote:
>>
>> Oh, one more thing: I wish code did not do log.error() on this -- it's
>> best to get an exception that I can handle; errors to log are just
>> noise.
>
> OK I've removed the log...are you using AsyncHandler? You should get the
> exception there.

Thanks!

I am using a simple execute + get to do a basically synchronous call (not
enough concurrency for it to really matter, and slightly simpler code than
using handlers), and I think I am getting an exception anyway. I am
working on an initial pass, and not recording failures yet, to reduce
noise on the client side (this is a batch process).
So failures get reported only the second or third time they occur per destination.

-+ Tatu +-

jfarcand

Jan 31, 2011, 2:27:04 PM
to asyncht...@googlegroups.com

On 11-01-31 2:15 PM, Tatu Saloranta wrote:
> On Mon, Jan 31, 2011 at 12:33 AM, Hubert Iwaniuk<neo...@kungfoo.pl> wrote:
>> Hi Tatu,
>>
>> On Jan 31, 2011, at 1:35 AM, Tatu Saloranta wrote:
>>
>>> On Sun, Jan 30, 2011 at 4:28 PM, jfarcand<jfarca...@gmail.com> wrote:
>>>>
>>>> Salut,
>>>>
>>>
>>> Hi there! I can check out lsof later on; for what it's worth, fails
>>> are on linux (not sure which distro, version); but work on my macbook.
>>>
>>>> first, have you limited to the number of connections using the
>>>> AsyncHttpClientConfig? If not, it means all connections are getting cached
>>>> which explain the current failure IMO. Just curious, if you switch the
>>>
>>> Ok. No, I have not changed configuration; I vaguely remember there
>>> being some limits (I guess on per-host basis, and since these are
>>> different hosts, probably essentially unlimited).
>>
>> Please use following: http://asynchttpclient.github.com/async-http-client/apidocs/com/ning/http/client/AsyncHttpClientConfig.Builder.html#setMaximumConnectionsTotal(int)
>
> Ah. I actually missed piece of existing code that was setting this
> value to 10000, which would explain the problem (ulimit for file
> handles was 1k).
>
> However: if reducing this value to, say, 100, I get 91 succesful calls
> and then 99k failures (haven't yet checked which ones; will enable
> failure output for now).
> Also: what is the default setting? Code seems to assume -1, which
> might be "unlimited"? That would seem like a risky default so I must
> be missing a setting.

This is what we decided between 1.0 -> 1.1, because many users reported
"issues" with the limit. Hence the reason to make it unlimited.


>
>> If you are running linux 32 bit than system can run out of LOWMEM, but not having limit of connections, you sure are going to run out of them.
>> Also if you don't query same hosts more than once than also set
>> http://asynchttpclient.github.com/async-http-client/apidocs/com/ning/http/client/AsyncHttpClientConfig.Builder.html#setAllowPoolingConnection(boolean)
>> to false.
>
> Thanks, that does make sense in this case.
>
> I am still bit puzzled about logics: only getting 91 success cases
> probably is ok (I assume 9 fail for some other reason; I get about 12%
> failure rate due to service itself failing). And it sounds like there
> was some additional leakage; but I don't see what additional calls I
> could/should make to close resources... API seems to assume it can
> handle closures transparently. But in error cases (http response not
> 200 etc) I am not reading response message, should I?

No, you should not. The AsyncHandler (the default one in your case) just
accumulates the data. If an exception occurs, this handler is never
populated (or it gets interrupted). You may want to install an
AsyncHandler#onThrowable to see if you get something interesting.
Technically, using the blocking Future.get() should do the same, but
we may have a bug. One thing you can try is to invoke
Future.cancel() when an exception occurs... if that helps, it means we
have a bug.
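A minimal sketch of hooking onThrowable, assuming the stock AsyncCompletionHandler from com.ning.http.client (the URL variable and the logging are illustrative):

```java
// Handler-based variant of the request: onCompleted fires on normal
// completion, onThrowable fires instead of it when the request fails.
client.prepareGet(url).execute(new AsyncCompletionHandler<Response>() {
    @Override
    public Response onCompleted(Response response) {
        return response; // body has already been accumulated
    }

    @Override
    public void onThrowable(Throwable t) {
        // Surface the failure instead of relying on the provider's log
        t.printStackTrace();
    }
});
```

This is a fragment requiring the AHC 1.x jar; `client` is an already-built AsyncHttpClient.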

The way I test the library is by using the following project, which
blindly loads a remote server:

https://github.com/jfarcand/java-http-client-benchmark

I would be interested to see how you do it on your side.

A+

-- Jeanfrancois

Tatu Saloranta

Jan 31, 2011, 4:26:48 PM
to asyncht...@googlegroups.com
On Mon, Jan 31, 2011 at 11:27 AM, jfarcand <jfarca...@gmail.com> wrote:
...

>> Also: what is the default setting? Code seems to assume -1, which
>> might be "unlimited"? That would seem like a risky default so I must
>> be missing a setting.
>
> This is what we decided between 1.0 -> 1.1 because many user reported
> "issues" with the limit. Hence the reason to make it unlimited.

Ah.

Yeah, users sometimes ask for things that are not necessarily good for them. :-)
(but it seems like they want it at a given point in time).

This is not a problem now that I know of the setting, and my use case is
probably not a very common one.

But at the same time, after testing out lower values, I did realize
that the original author of the code (that I am trying to upgrade) most
likely had hit the same issue, and thought that raising the value to
10k made sense.
I think that only worked around the problem: the connection pool
timeout made sure that the actual number of pooled connections was nowhere
near 10k (due to many requests being rather slow).

I was able to resolve the immediate issues by just disabling
connection reuse, as it is not needed in this case. Lowering the
connection pool timeout setting would probably also have prevented the
buildup.

...


>> I am still bit puzzled about logics: only getting 91 success cases
>> probably is ok (I assume 9 fail for some other reason; I get about 12%
>> failure rate due to service itself failing). And it sounds like there
>> was some additional leakage; but I don't see what additional calls I
>> could/should make to close resources... API seems to assume it can
>> handle closures transparently. But in error cases (http response not
>> 200 etc) I am not reading response message, should I?
>
> No you shoud not. The AsyncHandler (the default one in your case) just
> accumulate the data. If an exception occurs this handler is never populated

Ok. That makes sense.

> (or it gets interrupted). You may want to install an
> AsyncHandler#onThrowable to see if you get something interesting.

Yeah, I think the errors are of a small number of types: 404, 503, regular timeout.

> Technically using the Future.get() (blocking) should do the same but I we
> may have a possible bug. One thing you can try is to invoke the
> Future.cancel() when an exception occurs...if that help that means we have a

I can try that out, will let you know if it helps.

> The way I test the library is by using the following project, which blindly
> load a remote server
>
>  https://github.com/jfarcand/java-http-client-benchmark
>
> I would be interested to see how you do it on your side.

Thanks. I'll see if I can get that to work at some point.

-+ Tatu +-

Tatu Saloranta

Jan 31, 2011, 8:12:19 PM
to asyncht...@googlegroups.com
On Mon, Jan 31, 2011 at 1:26 PM, Tatu Saloranta <tsalo...@gmail.com> wrote:
> On Mon, Jan 31, 2011 at 11:27 AM, jfarcand <jfarca...@gmail.com> wrote:
> ...
>> Technically using the Future.get() (blocking) should do the same but I we
>> may have a possible bug. One thing you can try is to invoke the
>> Future.cancel() when an exception occurs...if that help that means we have a

This did not make any difference. It really looks like connections
were left dangling somehow.
Fortunately I can work around the issue by disabling connection pooling.

-+ Tatu +-

jfarcand

Feb 1, 2011, 8:52:35 AM
to asyncht...@googlegroups.com
Salut,

OK, I'm still not sure if it is a bug or a configuration issue. Have you
set AsyncHttpClientConfig.setIdleConnectionInPoolTimeoutInMs to some
value? This is how connections get cleared from the connection pool.
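For the record, that setting goes on the same config builder (a sketch; the 30-second value is arbitrary, not a recommendation):

```java
// Evict pooled connections after they have been idle for 30 seconds,
// so a slow workload does not accumulate dangling sockets.
AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
    .setIdleConnectionInPoolTimeoutInMs(30 * 1000)
    .build();
AsyncHttpClient client = new AsyncHttpClient(config);
```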

A+

-- Jeanfrancois

> -+ Tatu +-
>

Tatu Saloranta

Feb 1, 2011, 1:55:55 PM
to asyncht...@googlegroups.com

The way I was assuming things work is that eviction works as with
caches: individual connections can go stale (timeout), or maximum
capacity is limited, in which case one or more connections should be
forcibly closed. Is this a correct understanding?

The reason I haven't changed the timeout is that it could mask
other underlying problems; although by preventing reuse I effectively
set the timeout to zero.

Does this make sense? Thanks for all the help,

-+ Tatu +-

jfarcand

Feb 1, 2011, 2:04:19 PM
to asyncht...@googlegroups.com

On 11-02-01 1:55 PM, Tatu Saloranta wrote:
> On Tue, Feb 1, 2011 at 5:52 AM, jfarcand<jfarca...@gmail.com> wrote:
>> Salut,
>>
>> On 11-01-31 8:12 PM, Tatu Saloranta wrote:
>>>
>>> On Mon, Jan 31, 2011 at 1:26 PM, Tatu Saloranta<tsalo...@gmail.com>
>>> wrote:
>>>>
>>>> On Mon, Jan 31, 2011 at 11:27 AM, jfarcand<jfarca...@gmail.com>
>>>> wrote:
>>>> ...
>>>>>
>>>>> Technically using the Future.get() (blocking) should do the same but I
>>>>> we
>>>>> may have a possible bug. One thing you can try is to invoke the
>>>>> Future.cancel() when an exception occurs...if that help that means we
>>>>> have a
>>>
>>> This did not make any difference. It really looks like connections
>>> were left dangling somehow.
>>> Fortunately I can work around the issue we disabling connection pooling.
>>>
>>
>> OK I'm still not sure if it is a bug or a configuration issue. Have you set
>> AsyncHttpClientConfig.setIdleConnectionInPoolTimeoutInMs to some value? This
>> is how connection get cleared from the connection pool.
>

> The way I was assuming things work is that eviction works as with
> caches: individual connections can go stale (timeout),

Yes, that's the method above.

> or maximum capacity is limited,

Yup, AsyncHttpClientConfigBuilder#setMaximumConnectionsTotal

> in which case one or more connections should be
> forcibly closed. Is this correct understanding?

This is not how it is implemented. Right now AHC will throw an
IOException if all the connections are in use and you ask for a new one.
That means the connection pool is empty, but all its connections are in use.

>
> The reason I haven't changed the timeout is because that could mask
> other underlying problems; although by preventing reuse I effectively
> set timeout to zero.

Zero and -1 disable the mechanism (yes, yes, I need to improve the
documentation on that :-):-)). This is handled by Netty under the hood.

>
> Does this make sense? Thanks for all the help,

Hey, we are collaborating again...and not on a PHP puzzle :-) :-)

Tatu Saloranta

Feb 1, 2011, 2:15:57 PM
to asyncht...@googlegroups.com
On Tue, Feb 1, 2011 at 11:04 AM, jfarcand <jfarca...@gmail.com> wrote:
>
>
> On 11-02-01 1:55 PM, Tatu Saloranta wrote:
...

>> The way I was assuming things work is that eviction works as with
>> caches: individual connections can go stale (timeout),
>
> Yes, that's the method above.
>
>> or maximum capacity is limited,
>
> Yup, AsyncHttpClientConfigBuilder#setMaximumConnectionsTotal
>
>> in which case one or more connections should be
>> forcibly closed. Is this correct understanding?
>
> This is not how it is implemented. Right now AHC will throw an IOException
> if all the connections are in use and you ask for a new one. That means the
> connections pool is empty, but all its connection in use.

This is what I was suspecting. But I did not think I was actually
using this many connections, as the number of concurrent threads in use
is much lower (35) than the number of connections. Which would suggest
that connections either are not logically closed (i.e. put back in the
pool for reuse), or are somehow leaked.

But what seems odd is that slowing down the system seemed to solve the
issue, the assumption being that this allows pooled connections to time
out. Also: preventing connection reuse fully solves the issue, which
would not make sense if the problem were that sending a request did not
(logically) close the connection.

I hope it's still something in my code, as it would be easier to fix!

...


>> Does this make sense? Thanks for all the help,
>
> Hey, we are collaborating again...and not on a PHP puzzle :-) :-)

Yeah, I like this puzzle much better! :-)

-+ Tatu +-

jfarcand

Feb 1, 2011, 2:20:06 PM
to asyncht...@googlegroups.com

On 11-02-01 2:15 PM, Tatu Saloranta wrote:
> On Tue, Feb 1, 2011 at 11:04 AM, jfarcand<jfarca...@gmail.com> wrote:
>>
>>
>> On 11-02-01 1:55 PM, Tatu Saloranta wrote:
> ...
>>> The way I was assuming things work is that eviction works as with
>>> caches: individual connections can go stale (timeout),
>>
>> Yes, that's the method above.
>>
>>> or maximum capacity is limited,
>>
>> Yup, AsyncHttpClientConfigBuilder#setMaximumConnectionsTotal
>>
>>> in which case one or more connections should be
>>> forcibly closed. Is this correct understanding?
>>
>> This is not how it is implemented. Right now AHC will throw an IOException
>> if all the connections are in use and you ask for a new one. That means the
>> connections pool is empty, but all its connection in use.
>
> This is what I was suspecting. But I did not think I was actually
> using this many connections, as number of concurrent threads to use is
> much lower (35) than number of connections. Which would suggest that
> either are not logically closed (i.e. put back in pool for reuse), or
> they were somehow leaked.

OK, I will double check again. Sounds like a bug in AHC. I know 1.4.1
leaked connections when the pool was full, but that has been fixed in 1.5.0.


>
> But what seems odd is that slowing down system seemed to solve the
> issue, assumption being that this allows pooled connections to time
> out. Also: preventing connection reuse fully solves the issue, which
> would not make sense if problems were due to request sending did not
> (logically) close connection.
>
> I hope it's still something with my code, as it would be easier to fix!

I'm starting to think we have an issue. If you can privately share the
code snippet you use, I can try to reproduce the issue locally (jfarcand
[at] apache [dot] org).

A+

-- Jeanfrancois
