RestS3Service connection pool exhaustion under high load


edevi...@hotmail.com

Jun 26, 2008, 4:23:38 PM
to JetS3t Users
Hi,

I'm having trouble with the JetS3t RestS3Service. When I make more
requests in parallel than s3service.max-thread-count, threads hang and
I have to restart my application.
I've tried setting httpclient.connection-timeout-ms, but it does not
seem to work: the timeout shows up as 0 in the logs when a connection
is requested.

Thu Jun 26 20:02:03 2008 GMT [DEBUG] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager: HttpConnectionManager.getConnection: config = HostConfiguration[host=http://s3.amazonaws.com], timeout = 0
Thu Jun 26 20:02:03 2008 GMT [TRACE] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager: enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
Thu Jun 26 20:02:03 2008 GMT [DEBUG] org.apache.commons.httpclient.MultiThreadedHttpConnectionManager: Unable to get a connection, waiting..., hostConfig=HostConfiguration[host=http://s3.amazonaws.com]

Any idea what could be wrong?

Thanks

Manu

James Murty

Jun 26, 2008, 5:08:34 PM
to jets3t...@googlegroups.com
Hi Manu,

Check that the jets3t.properties setting "httpclient.max-connections" is at least as large as the larger of the settings "s3service.max-thread-count" and "s3service.admin-max-thread-count". The "httpclient.max-connections" property tells the HttpClient library how many HTTP/S connections to make available, while the other two properties control the number of threads JetS3t will run. If there are fewer connections than threads, the library can become deadlocked.

Hope this helps,
James

---
http://www.jamesmurty.com

edevi...@hotmail.com

Jun 28, 2008, 4:18:22 PM
to JetS3t Users
Hi James,

Thanks for your suggestion but I had no luck setting properties as
follows:
s3service.max-thread-count=4
httpclient.max-connections=4
s3service.admin-max-thread-count=4

I threw 10 requests at my server and only got one response back; the
other 9 requests timed out.
I looked at the JetS3t code and, even though it seems to initialize the
HttpClient with the JetS3t timeout and retry property values, these don't
seem to be used when the HttpClient requests a connection:
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager: HttpConnectionManager.getConnection: config = HostConfiguration[host=http://s3.amazonaws.com], timeout = 0

If I hack the RestS3Service code to set the timeout another way, so that a
non-zero value shows up in the logs, then the connection times out as
expected but the retry does not seem to kick in and the request fails
anyway.

Any other suggestion?

Thanks,

Manu


James Murty

Jun 28, 2008, 8:04:38 PM
to jets3t...@googlegroups.com
Sorry Manu, I misunderstood your original question. If you are using only RestS3Service for parallel connections and not one of the multi-threaded classes in the JetS3t library, the "s3service.max-thread-count" and "s3service.admin-max-thread-count" properties will have no effect.

I think the problem is due to the way the HttpClient connection manager MultiThreadedHttpConnectionManager, as used in the RestS3Service, handles multiple simultaneous connections -- if there are more connections in parallel than there are available HTTP connections (as set by "httpclient.max-connections"), this class seems to deadlock.

There is another discussion about this issue here: http://www.nabble.com/RestS3Service-locking-up-td17644333.html#a17644333

In the linked thread, I mention a couple of options for dealing with this issue. Aside from increasing the number of available HTTP connections to cope with any potential demands placed on your server, your best option may be to try a different connection manager as mentioned in my last comment in that thread.
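
For reference, the size of the HttpClient connection pool comes down to a couple of HttpClient 3.x parameters. In JetS3t 0.6 the supported way to change them is the "httpclient.max-connections" property, so the raw calls below are only a rough sketch of what is being controlled, not the exact code JetS3t runs:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

// Sketch only: the HttpClient 3.x settings that govern the connection pool size.
MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
connectionManager.getParams().setMaxTotalConnections(20);          // overall pool size
connectionManager.getParams().setDefaultMaxConnectionsPerHost(20); // per-host limit (S3 is a single host here)
HttpClient httpClient = new HttpClient(connectionManager);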

I'm keen to track this problem down and reach a proper resolution, so please let me know whether the suggestions from that other thread are helpful.

Cheers,
James

---
http://www.jamesmurty.com

James Murty

Jun 28, 2008, 8:34:59 PM
to jets3t...@googlegroups.com
By the way, what S3 operation are you performing when the deadlock occurs? And which version of JetS3t are you using?

I have been doing some testing by performing more parallel connections than there are HTTP connections and I have not been able to replicate the deadlock with the #putObject, #getObjectDetails, or #deleteObject operations. I wonder if you are calling getObject without consuming and closing the resultant input stream?

James

---
http://www.jamesmurty.com

edevi...@hotmail.com

Jun 30, 2008, 7:36:42 AM
to JetS3t Users
Hi James,

Thanks a lot for following up on this.
My case matches the discussion you referred me to quite closely:
I'm using JetS3t 0.6 and my application runs under Tomcat as well.
I'm doing gets when the deadlock occurs.
I run my test with the following setup
s3service.max-thread-count=4
httpclient.max-connections=4
s3service.admin-max-thread-count=4

I restart my application to reset everything.

Then I simultaneously start 10 clients that each make one getObject call
on an S3 object (always the same one, but that should not matter). The
object returned to the client is written to a file, so I can make sure I
get all of it when the server replies.
When I run this, I get different results. Sometimes all requests get
through (rarely, though), sometimes I get 1, 3, and so on. But whenever I
get fewer than 10 results, I can be sure the deadlock is there, and if I
try to make another request it fails.
Here is the stack trace for a locked thread:

0xb75ce27d __pthread_cond_wait + 0x7d
0x0630b169 _ZN2os13PlatformEvent4parkEv + 0xb9
0x06362bf5 _ZN13ObjectMonitor4waitExbP6Thread + 0x615
0x0636045f _ZN18ObjectSynchronizer4waitE6HandlexP6Thread + 0x3f
0x0625de4a JVM_MonitorWait + 0x1da
0xb539c66e * java.lang.Object.wait(long) bci:0 (Interpreted frame)
0xb5394edd * org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(org.apache.commons.httpclient.HostConfiguration, long) bci:325 line:509 (Interpreted frame)
0xb5394d77 * org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(org.apache.commons.httpclient.HostConfiguration, long) bci:74 line:394 (Interpreted frame)
0xb5395253 * org.apache.commons.httpclient.HttpMethodDirector.executeMethod(org.apache.commons.httpclient.HttpMethod) bci:180 line:152 (Interpreted frame)
0xb5394edd * org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HostConfiguration, org.apache.commons.httpclient.HttpMethod, org.apache.commons.httpclient.HttpState) bci:115 line:396 (Interpreted frame)
0xb5394db9 * org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HttpMethod) bci:14 line:324 (Interpreted frame)
0xb5394db9 * org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(org.apache.commons.httpclient.HttpMethodBase, int) bci:93 line:308 (Interpreted frame)
0xb5394edd * org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(java.lang.String, java.lang.String, java.util.Map, java.util.Map) bci:51 line:605 (Interpreted frame)
0xb5394d77 * org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(boolean, java.lang.String, java.lang.String, java.util.Calendar, java.util.Calendar, java.lang.String[], java.lang.String[], java.lang.Long, java.lang.Long) bci:517 line:1214 (Interpreted frame)
0xb5394d77 * org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(java.lang.String, java.lang.String, java.util.Calendar, java.util.Calendar, java.lang.String[], java.lang.String[], java.lang.Long, java.lang.Long) bci:15 line:1157 (Interpreted frame)
0xb5394d77 * org.jets3t.service.S3Service.getObject(org.jets3t.service.model.S3Bucket, java.lang.String, java.util.Calendar, java.util.Calendar, java.lang.String[], java.lang.String[], java.lang.Long, java.lang.Long) bci:24 line:1496 (Interpreted frame)
0xb5394d77 * org.jets3t.service.S3Service.getObject(org.jets3t.service.model.S3Bucket, java.lang.String) bci:16 line:1133 (Interpreted frame)

I tried to do the same with a putObject and I could not reproduce the
issue, so there might be something specific to getObject going on.
I'll have another look at my code, but I doubt it's the root cause of
this issue.
Looking at the JetS3t code, I suspect there might be a connection leak
when an S3ServiceException is thrown from the performRequest method, but
I don't see such an exception being thrown in my case.
I tried the following config:
httpclient.connection-timeout-ms=10000
httpclient.socket-timeout-ms=11000
httpclient.retry-max=3

But these don't seem to be used at all unless I hack the existing
RestS3Service.java (see my previous post in this thread).

Any ideas on this point too?

Thanks,

Manu



James Murty

Jun 30, 2008, 8:26:20 AM
to jets3t...@googlegroups.com
Hi,
 
> Then I simultaneously start 10 clients that each make one getObject call
> on an S3 object (always the same one, but that should not matter). The
> object returned to the client is written to a file, so I can make sure I
> get all of it when the server replies.

Can you check that you are closing the input streams from the S3Objects when you read them to a file? If you don't explicitly close the input stream, the underlying HTTP connection will not be released. This could explain the deadlocks and the unpredictable behaviour, because JetS3t will try to release connections if you forget to close the input stream, but it cannot do this in all cases.
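
To be concrete, the pattern I have in mind looks roughly like this (the s3Service, bucket and key names are just placeholders for whatever your servlet uses):

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.jets3t.service.model.S3Object;

// Download one object and copy it to a file, always closing the S3 data
// stream so the underlying pooled HTTP connection is released.
S3Object object = s3Service.getObject(bucket, "my-key");    // placeholder bucket/key
InputStream in = object.getDataInputStream();
try {
    OutputStream out = new FileOutputStream("my-key.dat");  // placeholder file name
    try {
        byte[] buffer = new byte[4096];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
    } finally {
        out.close();
    }
} finally {
    in.close();  // releases the HTTP connection back to the pool
}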
 
> Looking at the JetS3t code, I suspect there might be a connection leak
> when an S3ServiceException is thrown from the performRequest method, but
> I don't see such an exception being thrown in my case.

Ah, I see what you mean. I'll fix that potential connection leak, but as you say it doesn't explain your current issue.
 
> I tried the following config:
> httpclient.connection-timeout-ms=10000
> httpclient.socket-timeout-ms=11000
> httpclient.retry-max=3
>
> But these don't seem to be used at all unless I hack the existing
> RestS3Service.java (see my previous post in this thread).
>
> Any ideas on this point too?

By default JetS3t only passes on a few HttpClient-specific settings from the jets3t.properties file. If you need more control over the HttpClient settings, you currently have to modify the code. Any ideas about how this could be better handled?

Cheers,
James

edevi...@hotmail.com

Jun 30, 2008, 2:25:11 PM
to JetS3t Users
Hi James,


On Jun 30, 1:26 pm, "James Murty" <ja...@misterm.org> wrote:
> Hi,
>
> > Then I simultaneously start 10 clients that each make one getObject call
> > on an S3 object (always the same one, but that should not matter). The
> > object returned to the client is written to a file, so I can make sure I
> > get all of it when the server replies.
>
> Can you check that you are closing the input streams from the S3Objects when
> you read them to a file? If you don't explicitly close the input stream, the
> underlying HTTP connection will not be released. This could explain the
> deadlocks and the unpredictable behaviour, because JetS3t will try to
> release connections if you forget to close the input stream, but it cannot
> do this in all cases.

I checked the code and the stream is closed once it is fully read.
Double-checked: instead of running my 10 requests in parallel, I ran them
sequentially, and it works fine with no connection leak.

> > Looking at the JetS3t code, I suspect there might be a connection leak
> > when an S3ServiceException is thrown from the performRequest method, but
> > I don't see such an exception being thrown in my case.
>
> Ah, I see what you mean. I'll fix that potential connection leak, but as you
> say it doesn't explain your current issue.

Thanks.

> > I tried the following config:
> > httpclient.connection-timeout-ms=10000
> > httpclient.socket-timeout-ms=11000
> > httpclient.retry-max=3
>
> > But these don't seem to be used at all unless I hack the existing
> > RestS3Service.java (see my previous post in this thread).
>
> > Any ideas on this point too?
>
> By default JetS3t only passes on a few HttpClient-specific settings from the
> jets3t.properties file. If you need more control over the HttpClient
> settings you currently have to modify the code. Any ideas about how this
> could be better handled?

To me, it looks like some of the JetS3t HttpClient params may not be
passed down to the HttpClient correctly, which is why they don't have
the expected effect when requesting a connection from the pool.

HttpClient creates an HttpMethodDirector and passes it its
HttpClientParams "params". The HttpClient instantiated in the
RestS3Service class passes the timeout settings as part of the
connectionManager object instead. If you set the timeout in the
clientParams, it gets applied when a connection is requested from the
pool.

Does this make sense?

Manu


James Murty

Jul 1, 2008, 6:59:34 AM
to jets3t...@googlegroups.com
Hi Manu,
 
> I checked the code and the stream is closed once it is fully read.
> Double-checked: instead of running my 10 requests in parallel, I ran them
> sequentially, and it works fine with no connection leak.

Can you turn on some extra debugging by adding the following lines to configs/log4j.properties:

log4j.logger.org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream=DEBUG
log4j.logger.org.apache.commons.httpclient=DEBUG

If the HTTP connections are being released properly you should see something like the following message after each file has been downloaded:

2008-07-01 20:41:50,474 [org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream] Released HttpMethod as its response data stream is fully consumed

If you don't see one of these messages for each S3 operation that is performed, something is preventing the HttpClient objects from being released. If you do see these messages but you still end up with a deadlock, this would indicate that the problem is located deeper in the HttpClient library.
 
> > By default JetS3t only passes on a few HttpClient-specific settings from the
> > jets3t.properties file. If you need more control over the HttpClient
> > settings, you currently have to modify the code. Any ideas about how this
> > could be better handled?
>
> To me, it looks like some of the JetS3t HttpClient params may not be
> passed down to the HttpClient correctly, which is why they don't have
> the expected effect when requesting a connection from the pool.

I believe the HttpClient properties supported by JetS3t are being initialised correctly. There are two different timeout settings that may be confusing the issue.

JetS3t sets two timeout values for HTTP socket connections, but these both apply to the underlying communication connections, in other words how long a connection is held open when no data is being sent along the wire. On the HttpClient project's web site (http://hc.apache.org/httpclient-3.x/preference-api.html) you can find the documentation for these two settings: "http.connection.timeout" and "http.socket.timeout"

I think you are referring to a different timeout value, one that controls how long JetS3t will wait for HttpClient to provide it with a connection object (see the documentation for "http.connection-manager.timeout"). JetS3t deliberately avoids setting this second type of timeout value because if HttpClient fails to provide a connection, the S3 operation will fail anyway so it is better to wait until a connection becomes available. Well, unless it's deadlocked, that is...
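
To make the distinction concrete, here is roughly how those three settings map onto the HttpClient 3.x parameter API (values are only examples, and this is not necessarily how JetS3t wires them up internally):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
// "http.connection.timeout": how long to wait while establishing the TCP connection.
connectionManager.getParams().setConnectionTimeout(10000);
// "http.socket.timeout": how long to wait for data on an established connection.
connectionManager.getParams().setSoTimeout(11000);

HttpClient httpClient = new HttpClient(connectionManager);
// "http.connection-manager.timeout": how long to wait for a free connection from
// the pool -- the value JetS3t deliberately leaves at 0 (wait forever).
httpClient.getParams().setConnectionManagerTimeout(10000);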

If you can run your tests with the extra logging described above, we should at least gain a better idea of where the root problem is. If you are able to attach the log messages to a follow-up post that would be helpful.

Cheers,
James

edevi...@hotmail.com

Jul 3, 2008, 10:36:46 AM
to JetS3t Users
Hi James,

You're right, I misunderstood that part of the code.
In my quest to find an answer to this problem, I found the following:
http://markmail.org/message/nkqzobomibp5hbkr#query:HTTPCLIENT-675+page:1+mid:eb7jkdngtaogcbqg+state:results

I tried using HttpClient 3.1 instead of 3.0.1, but without luck. I'm
now running with at most 2 connections and 2 threads.
I've already turned the log level up to TRACE on both JetS3t and
HttpClient to figure out what could be wrong.
I've identified a few suspicious things; for example,
HttpMethodReleaseInputStream.finalize() seems to kick in a bit
unexpectedly. Based on the code, I suspect that if it kicks in while
the stream is not yet consumed, it will not release the connection.
Also, in case of an IO exception in any of the
HttpMethodReleaseInputStream methods related to the stream, the
connection should be released before the IO exception is forwarded.
This way, a network glitch will not result in a deadlock in the long
run.
But that does not seem to be enough to fix the problem.
I've been sucked into other issues for now, but I'll get back to it as
soon as I can.
In the meantime, I'll leave you to mull over my proposal above to be
more aggressive about releasing the connection.

Manu

James Murty

Jul 5, 2008, 12:52:09 AM
to jets3t...@googlegroups.com
Hi Manu,

> I've already turned the log level up to TRACE on both JetS3t and
> HttpClient to figure out what could be wrong.
> I've identified a few suspicious things; for example,
> HttpMethodReleaseInputStream.finalize() seems to kick in a bit
> unexpectedly.

This is a very bad sign. If you are seeing the warning message about HttpMethodReleaseInputStream.finalize() being invoked before the S3 data stream has been consumed and closed, it shows that your server is not always properly cleaning up when it downloads data from S3. There are no guarantees in Java that the finalize() method will complete successfully, so JetS3t prints the warning message to show it is performing a last-ditch effort to clean up but it may not be able to do so. If the finalize() method doesn't complete, an HTTP connection will be left un-released.

As your code is written to consume and close the S3 download data streams, these warning messages may indicate a deeper problem in the way your code interacts with the Tomcat engine. I wonder if Tomcat automatically kills off threads in some circumstances without letting them finish their work? This might happen if the client's connection is dropped. If Tomcat is killing the thread that runs your code before it can close the S3 download streams, this would explain why the underlying HTTP connections to S3 are not being released.

Perhaps you could try modifying your code to ensure that S3Object input streams are always closed, even if an InterruptedException occurs in the middle of a download? Even better, you could put your code in a try/finally block that closes the input stream in the finally section.
 
> Based on the [JetS3t] code, I suspect that if
> [HttpMethodReleaseInputStream#releaseConnection] kicks in while the stream
> is not yet consumed, it will not release the connection.

No, if the releaseConnection() method is invoked it will always release the underlying HTTP connection. If the connection's data stream has not already been consumed, it will perform the extra step of aborting the connection, but the actual release should always occur.
 
> Also, in case of an IO exception in any of the
> HttpMethodReleaseInputStream methods related to the stream, the
> connection should be released before the IO exception is forwarded.
> This way, a network glitch will not result in a deadlock in the long
> run.

This is a fair point. Ideally, users of JetS3t should ensure that their response data input streams are always closed by placing their code in a try/catch/finally block. However, by catching IOExceptions in the HttpMethodReleaseInputStream class itself, JetS3t can clean up after exceptions even if the API user forgets to do so. I have added your recommended changes to the latest version.

Cheers,
James

edevi...@hotmail.com

Jul 15, 2008, 7:08:32 PM
to JetS3t Users
Hi James,

I found a couple of potential vulnerabilities in my own code which I
believe could lead to streams not being fully read or closed. One last
question:
When do you think you will release a new version of JetS3t including
the change to HttpMethodReleaseInputStream discussed above?

Thanks,

Manu

James Murty

Jul 16, 2008, 7:08:43 AM
to jets3t...@googlegroups.com
Hi Manu,
 
> When do you think you will release a new version of JetS3t including
> the change to HttpMethodReleaseInputStream discussed above?

Realistically I won't be releasing a new public version of JetS3t any time soon. I'm reluctant to make a release until I have updated the documentation, and unfortunately that process is tedious and time-consuming so I keep putting it off.

The codebase in CVS is fairly stable and there are at least a few people using it, so a CVS checkout is a reasonable option if you need the latest code. In reality, the only difference between the codebase as it currently stands and a full public release would be the extra documentation and some (minimal) extra testing.
 
Cheers,
James

P.S. Can you please follow up on this thread if/when you reach a resolution with your thread starvation issues? Both myself and future forum readers will be interested to know what you discover.

edevi...@hotmail.com

Jul 22, 2008, 6:09:14 PM
to JetS3t Users
Hi James,


Here are the methods I changed in HttpMethodReleaseInputStream.java:

public int read() throws IOException {
    try {
        int read = inputStream.read();
        if (read == -1) {
            underlyingStreamConsumed = true;
            if (!alreadyReleased) {
                releaseConnection();
                log.debug("Released HttpMethod as its response data stream is fully consumed");
            }
        }
        return read;
    } catch (IOException io) {
        releaseConnection();
        log.warn("Released connection due to io exception: " + io);
        throw io;
    }
}

/**
 * Standard input stream read method, except it calls {@link #releaseConnection}
 * when the underlying input stream is consumed.
 */
public int read(byte[] b, int off, int len) throws IOException {
    try {
        int read = inputStream.read(b, off, len);
        if (read == -1) {
            underlyingStreamConsumed = true;
            if (!alreadyReleased) {
                releaseConnection();
                log.debug("Released HttpMethod as its response data stream is fully consumed");
            }
        }
        return read;
    } catch (IOException io) {
        releaseConnection();
        log.warn("Released connection due to io exception: " + io);
        throw io;
    }
}

public int available() throws IOException {
    try {
        return inputStream.available();
    } catch (IOException io) {
        releaseConnection();
        log.warn("Released connection due to io exception: " + io);
        throw io;
    }
}

No need to change the close() method since the connection is released
before touching the InputStream.

Also, I changed the following in RestS3Service around line 444:

} catch (S3ServiceException e) {
    httpMethod.releaseConnection();
    log.warn("Releasing connection due to exception: " + e);
    throw e;
}

to release the connection in case of an S3 error.

Do you agree with these changes?

I've fixed a couple of vulnerabilities in my code as well, to ensure
the stream is released if anything goes wrong before it is returned to
the client. Last but not least, I've upgraded from Jakarta HttpClient
3.0.1 to 3.1, which fixes a few concurrency issues and resource leaks in
MultiThreadedHttpConnectionManager. I've been running a test for over 36
hours now and the heap memory usage is stable, with no connection leak.
All looks good so far.

Thanks again for your help with this.

Manu

James Murty

Jul 23, 2008, 8:15:50 AM
to jets3t...@googlegroups.com
Hi Manu,

I'm glad to hear your application is now running smoothly.

I have compared your code changes with the changes I applied to the codebase recently and they match very closely, so the next release of JetS3t will perform almost exactly as your code does. The only differences are that my code logs IOExceptions as debug messages rather than warnings to keep the logging volume down, and in the REST service it now cleans up after any Throwable event rather than just S3ServiceExceptions.

The latest version of JetS3t in CVS also uses HttpClient version 3.1, so it's good to know that this version's resource leak fixes seem to work well.

For the benefit of others who may experience similar issues, could you summarize the steps you took in your application to ensure it cleans up after any errors?


Cheers,
James

---
http://www.jamesmurty.com

