2017-09-05 07:14:48,993 INFO http.AmazonHttpClient (AmazonHttpClient.java:executeHelper) - Unable to execute HTTP request: Timeout waiting for connection from pool
alluxio.underfs.s3a.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at alluxio.underfs.s3a.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:292)
at alluxio.underfs.s3a.org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:269)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at alluxio.underfs.s3a.com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
at alluxio.underfs.s3a.com.amazonaws.http.conn.$Proxy39.get(Unknown Source)
at alluxio.underfs.s3a.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
at alluxio.underfs.s3a.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at alluxio.underfs.s3a.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at alluxio.underfs.s3a.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at alluxio.underfs.s3a.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at alluxio.underfs.s3a.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:787)
at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:630)
at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:405)
at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:367)
at alluxio.underfs.s3a.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:318)
at alluxio.underfs.s3a.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3787)
at alluxio.underfs.s3a.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1007)
at alluxio.underfs.s3a.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:982)
at alluxio.underfs.s3a.S3AUnderFileSystem.getObjectStatus(S3AUnderFileSystem.java:482)
at alluxio.underfs.ObjectUnderFileSystem.isFile(ObjectUnderFileSystem.java:428)
at alluxio.underfs.UnderFileSystemWithLogging$17.call(UnderFileSystemWithLogging.java:307)
at alluxio.underfs.UnderFileSystemWithLogging$17.call(UnderFileSystemWithLogging.java:304)
at alluxio.underfs.UnderFileSystemWithLogging.call(UnderFileSystemWithLogging.java:515)
at alluxio.underfs.UnderFileSystemWithLogging.isFile(UnderFileSystemWithLogging.java:304)
at alluxio.worker.block.UnderFileSystemBlockReader.init(UnderFileSystemBlockReader.java:127)
at alluxio.worker.block.UnderFileSystemBlockReader.create(UnderFileSystemBlockReader.java:97)
at alluxio.worker.block.UnderFileSystemBlockStore.getBlockReader(UnderFileSystemBlockStore.java:227)
at alluxio.worker.block.DefaultBlockWorker.readUfsBlock(DefaultBlockWorker.java:417)
at alluxio.worker.netty.DataServerBlockReadHandler.openBlock(DataServerBlockReadHandler.java:209)
at alluxio.worker.netty.DataServerBlockReadHandler.getDataBuffer(DataServerBlockReadHandler.java:141)
at alluxio.worker.netty.DataServerReadHandler$PacketReader.runInternal(DataServerReadHandler.java:440)
at alluxio.worker.netty.DataServerReadHandler$PacketReader.run(DataServerReadHandler.java:407)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-09-05 07:14:48,929 ERROR netty.DataServerReadHandler (DataServerReadHandler.java:runInternal) - Failed to read data.
alluxio.exception.BlockDoesNotExistException: Ufs path s3a://path/y=2017/m=08/d=25/h=19/min=30/part-r-00068-da987792-0d58-4a3e-bdc2-b16be246263f.zlib.orc does not exist
at alluxio.worker.block.UnderFileSystemBlockReader.init(UnderFileSystemBlockReader.java:128)
at alluxio.worker.block.UnderFileSystemBlockReader.create(UnderFileSystemBlockReader.java:97)
at alluxio.worker.block.UnderFileSystemBlockStore.getBlockReader(UnderFileSystemBlockStore.java:227)
at alluxio.worker.block.DefaultBlockWorker.readUfsBlock(DefaultBlockWorker.java:417)
at alluxio.worker.netty.DataServerBlockReadHandler.openBlock(DataServerBlockReadHandler.java:209)
at alluxio.worker.netty.DataServerBlockReadHandler.getDataBuffer(DataServerBlockReadHandler.java:141)
at alluxio.worker.netty.DataServerReadHandler$PacketReader.runInternal(DataServerReadHandler.java:440)
at alluxio.worker.netty.DataServerReadHandler$PacketReader.run(DataServerReadHandler.java:407)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
We are running Presto over Alluxio on S3. The setup works fine for about two days, but then we suddenly start seeing exceptions like the ones above. Every query that has to fetch data from S3 fails with this exception, and each time Alluxio reports a different file that it is unable to find. The max-threads properties for reading from and writing to S3 are set to their defaults. I checked the TCP connections to S3 in CLOSE_WAIT state on all the machines, and they hover around 20-30 at any given time. The problem disappears when either Presto or Alluxio is restarted. I suspect the CLOSE_WAIT sockets are the cause. Is there a way to increase the maximum number of concurrent connections to S3, similar to Impala/Spark's fs.s3a.connection.maximum?
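For what it's worth, the closest Alluxio equivalent I'm aware of is alluxio.underfs.s3.threads.max, which bounds the thread/connection count the S3 under-filesystem uses; the exact property name and default should be verified against the Alluxio version in use, so treat this as a sketch rather than a confirmed fix:

```properties
# alluxio-site.properties -- hypothetical tuning sketch, not verified
# against this deployment's Alluxio version.
#
# Upper bound on threads (and, by extension, HTTP connections) used to
# talk to S3 from the under-filesystem. Raising it can relieve
# "Timeout waiting for connection from pool" if the pool is genuinely
# undersized rather than leaking.
alluxio.underfs.s3.threads.max=80
```

Note that if connections are being leaked (never returned to the pool), raising the limit only delays the timeout rather than fixing it.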
CLOSE_WAIT is generally an application-level issue: the remote end has sent a FIN (which the local TCP stack ACKs automatically), but the local application has not yet closed its side of the socket. Has anybody faced such an issue? Can anything be done about it?
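To watch whether CLOSE_WAIT sockets grow over the two days before the failures start, a quick per-state count can be taken on each machine (assuming a Linux host with net-tools installed; the :443 filter narrows the view to HTTPS endpoints such as S3):

```shell
# Count TCP sockets grouped by state; a steadily growing CLOSE_WAIT
# count suggests the application is leaking sockets it never close()s.
netstat -tan | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn

# List only CLOSE_WAIT sockets whose remote end is port 443 (e.g. S3).
netstat -tan | awk '$6 == "CLOSE_WAIT" && $5 ~ /:443$/'
```

Sampling this periodically (e.g. via cron) would show whether the 20-30 figure is stable or creeping upward toward a pool limit.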