Too many ConnectionPoolTimeout exceptions after some time.


Deepak Batra

Sep 6, 2017, 5:03:03 AM
to Alluxio Users
2017-09-05 07:14:48,993 INFO  http.AmazonHttpClient (AmazonHttpClient.java:executeHelper) - Unable to execute HTTP request: Timeout waiting for connection from pool
alluxio.underfs.s3a.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
        at alluxio.underfs.s3a.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:292)
        at alluxio.underfs.s3a.org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at alluxio.underfs.s3a.com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
        at alluxio.underfs.s3a.com.amazonaws.http.conn.$Proxy39.get(Unknown Source)
        at alluxio.underfs.s3a.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
        at alluxio.underfs.s3a.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at alluxio.underfs.s3a.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at alluxio.underfs.s3a.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at alluxio.underfs.s3a.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at alluxio.underfs.s3a.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
        at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:787)
        at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:630)
        at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:405)
        at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:367)
        at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:318)
        at alluxio.underfs.s3a.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3787)
        at alluxio.underfs.s3a.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1007)
        at alluxio.underfs.s3a.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:982)
        at alluxio.underfs.s3a.S3AUnderFileSystem.getObjectStatus(S3AUnderFileSystem.java:482)
        at alluxio.underfs.ObjectUnderFileSystem.isFile(ObjectUnderFileSystem.java:428)
        at alluxio.underfs.UnderFileSystemWithLogging$17.call(UnderFileSystemWithLogging.java:307)
        at alluxio.underfs.UnderFileSystemWithLogging$17.call(UnderFileSystemWithLogging.java:304)
        at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:515)
        at alluxio.underfs.UnderFileSystemWithLogging.isFile(UnderFileSystemWithLogging.java:304)
        at alluxio.worker.block.UnderFileSystemBlockReader.init(UnderFileSystemBlockReader.java:127)
        at alluxio.worker.block.UnderFileSystemBlockReader.create(UnderFileSystemBlockReader.java:97)
        at alluxio.worker.block.UnderFileSystemBlockStore.getBlockReader(UnderFileSystemBlockStore.java:227)
        at alluxio.worker.block.DefaultBlockWorker.readUfsBlock(DefaultBlockWorker.java:417)
        at alluxio.worker.netty.DataServerBlockReadHandler.openBlock(DataServerBlockReadHandler.java:209)
        at alluxio.worker.netty.DataServerBlockReadHandler.getDataBuffer(DataServerBlockReadHandler.java:141)
        at alluxio.worker.netty.DataServerReadHandler$PacketReader.runInternal(DataServerReadHandler.java:440)
        at alluxio.worker.netty.DataServerReadHandler$PacketReader.run(DataServerReadHandler.java:407)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2017-09-05 07:14:48,929 ERROR netty.DataServerReadHandler (DataServerReadHandler.java:runInternal) - Failed to read data.
alluxio.exception.BlockDoesNotExistException: Ufs path s3a://path/y=2017/m=08/d=25/h=19/min=30/part-r-00068-da987792-0d58-4a3e-bdc2-b16be246263f.zlib.orc does not exist
        at alluxio.worker.block.UnderFileSystemBlockReader.init(UnderFileSystemBlockReader.java:128)
        at alluxio.worker.block.UnderFileSystemBlockReader.create(UnderFileSystemBlockReader.java:97)
        at alluxio.worker.block.UnderFileSystemBlockStore.getBlockReader(UnderFileSystemBlockStore.java:227)
        at alluxio.worker.block.DefaultBlockWorker.readUfsBlock(DefaultBlockWorker.java:417)
        at alluxio.worker.netty.DataServerBlockReadHandler.openBlock(DataServerBlockReadHandler.java:209)
        at alluxio.worker.netty.DataServerBlockReadHandler.getDataBuffer(DataServerBlockReadHandler.java:141)
        at alluxio.worker.netty.DataServerReadHandler$PacketReader.runInternal(DataServerReadHandler.java:440)
        at alluxio.worker.netty.DataServerReadHandler$PacketReader.run(DataServerReadHandler.java:407)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


We are running Presto over Alluxio on S3. The setup works fine for around two days, but then we start seeing exceptions like these. Any query that has to fetch data from S3 fails with this exception, and each time it is a different file that Alluxio reports it cannot find. The max-threads properties for reading from and writing to S3 are set to their defaults. I checked the TCP connections to S3 in the CLOSE_WAIT state on all the machines, and they hover around 20-30 each time. The problem disappears when either Presto or Alluxio is restarted. I suspect the CLOSE_WAIT connections are the cause. Is there any way to increase the maximum number of concurrent connections to S3, similar to fs.s3a.connection.maximum in Impala/Spark?

CLOSE_WAIT is generally an application-level issue: the remote end has sent a FIN (which the local TCP stack has already ACKed), but the local application has not yet closed its side of the socket. Has anybody faced such an issue? Can anything be done about it?
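One quick way to confirm a leak like this is to count CLOSE_WAIT sockets per remote peer over time. A minimal sketch — the netstat sample below is fabricated for illustration; in practice, pipe the live output of `netstat -tan` (or `ss -tan`) instead:

```shell
# Fabricated `netstat -tan` sample for illustration; column 6 is the TCP state.
sample='tcp 0 0 10.0.0.5:43210 52.216.8.1:443 CLOSE_WAIT
tcp 0 0 10.0.0.5:43211 52.216.8.1:443 CLOSE_WAIT
tcp 0 0 10.0.0.5:43212 52.216.8.2:443 ESTABLISHED'

# Count connections stuck in CLOSE_WAIT, i.e. the remote end has closed
# but the local application has not closed its side of the socket.
closewait=$(printf '%s\n' "$sample" | awk '$6 == "CLOSE_WAIT"' | wc -l | tr -d ' ')
echo "CLOSE_WAIT connections: $closewait"
```

If this count grows steadily between restarts rather than hovering, that points to connections being leased from the pool and never released.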

Calvin Jia

Sep 6, 2017, 1:52:44 PM
to Alluxio Users
Hi,

Do you start seeing these issues after a period of time (2 days) regardless of workload, or only after running a specific job? I'm wondering whether it is a resource leak or just a bad job not closing connections. After a restart, do you still see 20-30 connections in the CLOSE_WAIT state? Another thing you can do is jstack the Alluxio worker process when you run into this problem.
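When reading the thread dump, one useful signal is how many threads are blocked waiting on the HTTP connection pool. A minimal sketch — the jstack excerpt below is fabricated for illustration; in practice, capture a real dump first (e.g. `jstack <alluxio-worker-pid> > worker.jstack`):

```shell
# Fabricated excerpt of a jstack thread dump, for illustration only.
dump='"data-server-4" #42 waiting on condition
   at sun.misc.Unsafe.park(Native Method)
   at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection
"data-server-5" #43 runnable
   at java.net.SocketInputStream.socketRead0(Native Method)'

# Count stack frames parked inside the connection pool's leaseConnection;
# many such threads suggest the pool is exhausted, matching the
# ConnectionPoolTimeoutException in the log above.
blocked=$(printf '%s\n' "$dump" | grep -c 'PoolingHttpClientConnectionManager.leaseConnection')
echo "threads waiting on connection pool: $blocked"
```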

You can increase the maximum number of concurrent connections to S3 through the following properties:

alluxio.underfs.s3.upload.threads.max
alluxio.underfs.s3.threads.max

For details, you can take a look at the configuration documentation.
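For example, these could be set in conf/alluxio-site.properties on the workers — the values below are illustrative only, not recommendations; check the configuration documentation for the defaults in your Alluxio version:

```properties
# Illustrative values; tune to your workload. Workers generally need a
# restart to pick up site-property changes.
alluxio.underfs.s3.threads.max=60
alluxio.underfs.s3.upload.threads.max=20
```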

Hope this helps,
Calvin

Calvin Jia

Oct 13, 2017, 2:01:22 PM
to Alluxio Users
Hi,

Were you able to resolve the issue?

Thanks,
Calvin