We have a 200 GB set of RCFiles in S3.
We have a Presto cluster consisting of a single machine, the largest instance Amazon offers - 32 cores.
When we issue an SQL command (say, "select count(*) from table") it begins extremely quickly - about 800 MB/sec - and then tails off to zero.
What seems to be happening is that about a dozen connections are opened and their files are read, but those connections then never close, and Hive/Presto (whichever it is) will not open any more connections (presumably due to a limit having been reached).
As such, the per-second bandwidth value tails off to zero over time.
After a minute or so, the Apache HTTP client times out the query.
Currently we cannot use s3 with Presto.
It all seemed okay when using an eight-node x1.large cluster; we've found this problem with a single large node.
Currently we are running a Hadoop cluster and using HDFS. This however is problematic, as we now need a data loading phase, and it also costs money to run the cluster.
--
You received this message because you are subscribed to the Google Groups "Presto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to presto-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Yes :-)
> * net.ipv4.tcp_tw_reuse (and optionally tcp_tw_recycle): you can enable them using sysctl command. If "netstat -tna" shows many TIME_WAIT connections, the parameters may fix the issue.
> * "ulimit -n" (max open files): Default 1024 is too small.
I shall look at both.
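Concretely, I understand the suggestion to mean something like the following (the values here are my own illustrative guesses, not recommendations from the thread):

```
# /etc/sysctl.conf -- allow reuse of sockets stuck in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1

# /etc/security/limits.conf -- raise the max-open-files limit
# well past the 1024 default
*  soft  nofile  65536
*  hard  nofile  65536
```

After editing, "sysctl -p" applies the kernel setting, and a fresh login session picks up the new file limit.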
> Or less likely following problems might be happening:
>
> * S3 occasionally hangs for several minutes under heavy load.
> Read timeout (ClientConfiguration.setSocketTimeout) is
> necessary as well as connection timeout for better performance.
I don't think it's this, because it's consistent - our problem always happens.
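For what it's worth, if we did want the read timeout the suggestion refers to, later Presto releases expose it through the hive connector's S3 properties; I'm not certain these keys exist in 0.74, so treat this as a sketch to verify against the docs for the release in use:

```
# etc/catalog/hive.properties -- hypothetical additions; confirm
# these keys are supported by your Presto version before relying
# on them
hive.s3.connect-timeout=2m
hive.s3.socket-timeout=2m
hive.s3.max-connections=500
```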
> * Reading data from sockets occasionally hang. JVM can't
> recover the thread. It happens very very rarely, though. It's
> probably due to a bug in JVM: https://gist.github.com/frsyuki
> /3ebc4b56b32cbdc614bc.
Not impossible it could be this. The 32-core machine ran Ubuntu, at the request of the user. Prior to that, I set up an eight-node x1.large cluster with Amazon Linux. There may be Ubuntu-specific issues at play.
I have brought up an r3.8xlarge, a 32 core machine.
I've installed the current presto (0.74).
The full and only configuration steps follow;
===start===
sudo su -
yum -y update
cd
wget http://central.maven.org/maven2/com/facebook/presto/presto-server/0.74/presto-server-0.74.tar.gz
tar -zxf presto-server-0.74.tar.gz
wget http://central.maven.org/maven2/com/facebook/presto/presto-cli/0.74/presto-cli-0.74-executable.jar
mv presto-cli-0.74-executable.jar presto
chmod +x presto
cd presto-server-0.74
mkdir -p etc/catalog
cd etc
mkdir -p /var/presto/data/
nano node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data
nano jvm.config
-server
-Xmx16G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:PermSize=150M
-XX:MaxPermSize=150M
-XX:ReservedCodeCacheSize=150M
-Xbootclasspath/p:/root/presto-server-0.74/lib/floatingdecimal-0.1.jar
nano config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://172.31.42.156:8080
nano log.properties
stdout=DEBUG
apache=DEBUG
cd catalog
nano hive.properties
connector.name=hive-hadoop2ex
hive.metastore.uri=thrift://n.n.n.n:9083
cd /root/presto-server-0.74/bin
===end===
Note the log.properties;
stdout=DEBUG
apache=DEBUG
Having done this, I start the presto server with "./launcher start". Logging is going to "/var/presto/data/var/log/". I issued "tail -f server.log", to follow events as they occurred.
I then ran the presto client with "--server localhost:8080 --catalog hive --schema default".
I then issued "select count(*) from standard_feed_s3_rcfile_fixed;"
The table standard_feed_s3_rcfile_fixed is an 81 column table, stored in RCFile format on s3. There are 823 files, coming to 202 GB. The command "select * from standard_feed_s3_rcfile_fixed limit 1" works.
So, now - what happens when I issue the count(*)?
So, the server started up okay. The query begins to run - data is being processed in the client, I can see in the progress info - and there's no output yet in the server.log.
Then after a few seconds, we get this;
===start===
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection
at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232) ~[httpclient-4.2.jar:4.2]
at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199) ~[httpclient-4.2.jar:4.2]
at sun.reflect.GeneratedMethodAccessor263.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_65]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_65]
at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70) ~[aws-java-sdk-1.7.1.jar:na]
at com.amazonaws.http.conn.$Proxy131.getConnection(Unknown Source) ~[na:na]
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456) ~[httpclient-4.2.jar:4.2]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.jar:4.2]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.jar:4.2]
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:380) [aws-java-sdk-1.7.1.jar:na]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:229) [aws-java-sdk-1.7.1.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3384) [aws-java-sdk-1.7.1.jar:na]
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1093) [aws-java-sdk-1.7.1.jar:na]
at com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream$2.call(PrestoS3FileSystem.java:567) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream$2.call(PrestoS3FileSystem.java:562) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:111) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.getS3Object(PrestoS3FileSystem.java:557) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:585) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.PrestoS3FileSystem$PrestoS3InputStream.seek(PrestoS3FileSystem.java:487) [presto-hive-0.74.jar:0.74]
at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:89) [hadoop-apache2-0.1.jar:na]
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:50) [hadoop-apache2-0.1.jar:na]
at org.apache.hadoop.hive.ql.io.RCFile$Reader.seek(RCFile.java:1544) [hive-apache-0.9.jar:na]
at org.apache.hadoop.hive.ql.io.RCFile$Reader.sync(RCFile.java:1576) [hive-apache-0.9.jar:na]
at org.apache.hadoop.hive.ql.io.RCFileRecordReader.<init>(RCFileRecordReader.java:110) [hive-apache-0.9.jar:na]
at org.apache.hadoop.hive.ql.io.RCFileInputFormat.getRecordReader(RCFileInputFormat.java:57) [hive-apache-0.9.jar:na]
at com.facebook.presto.hive.HiveRecordSet$1.call(HiveRecordSet.java:158) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.HiveRecordSet$1.call(HiveRecordSet.java:153) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:111) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.HiveRecordSet.createRecordReader(HiveRecordSet.java:152) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.HiveRecordSet.cursor(HiveRecordSet.java:111) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.hive.HiveRecordSet.cursor(HiveRecordSet.java:57) [presto-hive-0.74.jar:0.74]
at com.facebook.presto.spi.classloader.ClassLoaderSafeRecordSet.cursor(ClassLoaderSafeRecordSet.java:48) [presto-spi-0.74.jar:0.74]
at com.facebook.presto.operator.RecordProjectOperator.<init>(RecordProjectOperator.java:45) [presto-main-0.74.jar:0.74]
at com.facebook.presto.split.RecordSetDataStreamProvider.createNewDataStream(RecordSetDataStreamProvider.java:40) [presto-main-0.74.jar:0.74]
at com.facebook.presto.split.DataStreamManager.createNewDataStream(DataStreamManager.java:58) [presto-main-0.74.jar:0.74]
at com.facebook.presto.operator.TableScanOperator.addSplit(TableScanOperator.java:132) [presto-main-0.74.jar:0.74]
at com.facebook.presto.operator.Driver.processNewSource(Driver.java:251) [presto-main-0.74.jar:0.74]
at com.facebook.presto.operator.Driver.processNewSources(Driver.java:216) [presto-main-0.74.jar:0.74]
at com.facebook.presto.operator.Driver.access$300(Driver.java:52) [presto-main-0.74.jar:0.74]
at com.facebook.presto.operator.Driver$DriverLockResult.close(Driver.java:491) [presto-main-0.74.jar:0.74]
at com.facebook.presto.operator.Driver.updateSource(Driver.java:194) [presto-main-0.74.jar:0.74]
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.createDriver(SqlTaskExecution.java:585) [presto-main-0.74.jar:0.74]
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunnerFactory.access$1700(SqlTaskExecution.java:552) [presto-main-0.74.jar:0.74]
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:673) [presto-main-0.74.jar:0.74]
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:444) [presto-main-0.74.jar:0.74]
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:578) [presto-main-0.74.jar:0.74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
===end===
In fact, we don't just get one - we get a whole rush of them at once, and then a pause. Then after a bit, another rush of them, and another pause.
This repeats, until eventually the client says;
"Query 20140728_161350_00002_9j5eu failed: Unable to execute HTTP request: Timeout waiting for connection"
And the server stops generating output into server.log.
While this was going on, I ran netstat in another window, thus;
"netstat -t -n"
The output is this;
===start===
tcp 0 0 172.31.42.156:22 217.42.156.150:50477 ESTABLISHED
tcp 0 256 172.31.42.156:22 217.42.156.150:50501 ESTABLISHED
tcp 0 148104 172.31.42.156:22 217.42.156.150:50513 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55231 ESTABLISHED
tcp 56055 0 ::ffff:172.31.42.156:38440 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56054 0 ::ffff:172.31.42.156:38472 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38404 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55223 0 ::ffff:172.31.42.156:38418 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38439 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55228 ESTABLISHED
tcp 56763 0 ::ffff:172.31.42.156:38495 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56635 0 ::ffff:172.31.42.156:38464 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57580 0 ::ffff:172.31.42.156:38431 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56071 0 ::ffff:172.31.42.156:38473 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:127.0.0.1:52684 ::ffff:127.0.0.1:8080 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:34675 ::ffff:172.31.27.7:9083 TIME_WAIT
tcp 0 0 ::ffff:172.31.42.156:55230 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:34684 ::ffff:172.31.27.7:9083 TIME_WAIT
tcp 55158 0 ::ffff:172.31.42.156:38477 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56299 0 ::ffff:172.31.42.156:38433 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55223 0 ::ffff:172.31.42.156:38398 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56969 0 ::ffff:172.31.42.156:38457 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55227 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 55109 0 ::ffff:172.31.42.156:38445 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38460 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 54594 0 ::ffff:172.31.42.156:38465 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56071 0 ::ffff:172.31.42.156:38425 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55228 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55222 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55221 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 56055 0 ::ffff:172.31.42.156:38434 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55227 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55229 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38417 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55287 0 ::ffff:172.31.42.156:38420 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55221 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38409 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55230 ESTABLISHED
tcp 57131 0 ::ffff:172.31.42.156:38489 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 54597 0 ::ffff:172.31.42.156:38474 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 53714 0 ::ffff:172.31.42.156:38421 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38392 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:34682 ::ffff:172.31.27.7:9083 TIME_WAIT
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55229 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:34683 ::ffff:172.31.27.7:9083 TIME_WAIT
tcp 55671 0 ::ffff:172.31.42.156:38396 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56635 0 ::ffff:172.31.42.156:38419 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55223 0 ::ffff:172.31.42.156:38462 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56054 0 ::ffff:172.31.42.156:38466 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55223 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55223 ESTABLISHED
tcp 53698 0 ::ffff:172.31.42.156:38456 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57515 0 ::ffff:172.31.42.156:38455 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56667 0 ::ffff:172.31.42.156:38441 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56376 0 ::ffff:172.31.42.156:38448 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56055 0 ::ffff:172.31.42.156:38423 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57515 0 ::ffff:172.31.42.156:38399 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56683 0 ::ffff:172.31.42.156:38490 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56569 0 ::ffff:172.31.42.156:38488 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 53714 0 ::ffff:172.31.42.156:38412 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55977 0 ::ffff:172.31.42.156:38426 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 59424 0 ::ffff:172.31.42.156:38500 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57147 0 ::ffff:172.31.42.156:38494 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38446 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38482 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:55226 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 56953 0 ::ffff:172.31.42.156:38458 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38452 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55222 ESTABLISHED
tcp 54210 0 ::ffff:172.31.42.156:38492 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 58144 0 ::ffff:172.31.42.156:38450 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55226 ESTABLISHED
tcp 0 0 ::ffff:127.0.0.1:8080 ::ffff:127.0.0.1:52684 ESTABLISHED
tcp 57929 0 ::ffff:172.31.42.156:38461 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57531 0 ::ffff:172.31.42.156:38467 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:34685 ::ffff:172.31.27.7:9083 TIME_WAIT
tcp 0 0 ::ffff:172.31.42.156:55231 ::ffff:172.31.42.156:8080 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38501 ::ffff:54.231.10.177:443 ESTABLISHED
===end===
And if I reduce this to only the connections to port 443 (which I think is S3), I have this...
===start===
tcp 56055 0 ::ffff:172.31.42.156:38440 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56054 0 ::ffff:172.31.42.156:38472 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38404 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55223 0 ::ffff:172.31.42.156:38418 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38439 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56763 0 ::ffff:172.31.42.156:38495 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56635 0 ::ffff:172.31.42.156:38464 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57580 0 ::ffff:172.31.42.156:38431 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56071 0 ::ffff:172.31.42.156:38473 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38477 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56299 0 ::ffff:172.31.42.156:38433 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55223 0 ::ffff:172.31.42.156:38398 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56969 0 ::ffff:172.31.42.156:38457 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55109 0 ::ffff:172.31.42.156:38445 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38460 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 54594 0 ::ffff:172.31.42.156:38465 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56071 0 ::ffff:172.31.42.156:38425 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56055 0 ::ffff:172.31.42.156:38434 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38417 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55287 0 ::ffff:172.31.42.156:38420 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38409 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57131 0 ::ffff:172.31.42.156:38489 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 54597 0 ::ffff:172.31.42.156:38474 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 53714 0 ::ffff:172.31.42.156:38421 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38392 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38396 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56635 0 ::ffff:172.31.42.156:38419 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55223 0 ::ffff:172.31.42.156:38462 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56054 0 ::ffff:172.31.42.156:38466 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 53698 0 ::ffff:172.31.42.156:38456 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57515 0 ::ffff:172.31.42.156:38455 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56667 0 ::ffff:172.31.42.156:38441 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56376 0 ::ffff:172.31.42.156:38448 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56055 0 ::ffff:172.31.42.156:38423 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57515 0 ::ffff:172.31.42.156:38399 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56683 0 ::ffff:172.31.42.156:38490 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56569 0 ::ffff:172.31.42.156:38488 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 53714 0 ::ffff:172.31.42.156:38412 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55977 0 ::ffff:172.31.42.156:38426 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 59424 0 ::ffff:172.31.42.156:38500 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57147 0 ::ffff:172.31.42.156:38494 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38446 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38482 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 56953 0 ::ffff:172.31.42.156:38458 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55158 0 ::ffff:172.31.42.156:38452 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 54210 0 ::ffff:172.31.42.156:38492 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 58144 0 ::ffff:172.31.42.156:38450 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57929 0 ::ffff:172.31.42.156:38461 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 57531 0 ::ffff:172.31.42.156:38467 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 55671 0 ::ffff:172.31.42.156:38501 ::ffff:54.231.10.177:443 ESTABLISHED
===end===
Which you will note is 50 connections.
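Rather than filtering and counting by eye, a short pipeline can do it; this is a sketch run against a three-line captured sample rather than the live socket table (the field positions assume the "netstat -tn" layout shown above, where field 5 is the foreign address):

```shell
# Count sockets whose remote endpoint is port 443 (S3) in a
# captured netstat sample.
sample='tcp 56055 0 ::ffff:172.31.42.156:38440 ::ffff:54.231.10.177:443 ESTABLISHED
tcp 0 0 ::ffff:172.31.42.156:8080 ::ffff:172.31.42.156:55231 ESTABLISHED
tcp 55687 0 ::ffff:172.31.42.156:38404 ::ffff:54.231.10.177:443 ESTABLISHED'

printf '%s\n' "$sample" | awk '$5 ~ /:443$/ {n++} END {print n+0}'   # prints 2
```

Against the live system the same filter would be "netstat -tn | awk '$5 ~ /:443$/ {n++} END {print n+0}'".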
After the client gives up, these connections *immediately* go away - the client does not need to quit for this to occur. I specifically checked for this just now, running netstat during and then the instant after the client returns its error message.
Now, what's interesting is that this - ESTABLISHED - is different to the problem I described before, with CLOSE_WAIT.
It's the correct thread.
The performance tail-off is really caused by the S3 connection pool being exhausted: you start with tons of performance because everything is working, so you see 1 GB/sec or whatever for a short time; then no more connections open, so elapsed time keeps increasing while data transferred does not, and the per-second bandwidth statistic tapers off towards zero.
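The taper is just arithmetic. With made-up numbers: if roughly 50,000 MB arrive in the first minute and the stalled pool then delivers nothing, the running average looks like this:

```shell
# Average MB/sec over time when all 50,000 MB arrived in the first
# 60 seconds and nothing arrives afterwards.
for t in 60 120 300 600; do
  echo "${t}s elapsed: $((50000 / t)) MB/sec average"
done
# prints 833, 416, 166 and 83 MB/sec -- a curve that tails toward zero
```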