We're looking for some help from the community: we've recently seen queries failing due to nodes crashing. It's predictable in that a specific set of queries can reliably trigger it. Any advice on tuning our settings or VM sizes would be greatly appreciated.
We operate a 25-node Presto 0.179 cluster in Azure.
Each VM has 8 cores and 110 GB of memory.
The cluster is currently used for ad-hoc queries against external Hive tables (Parquet and some ORC), and we have recently started to see nodes crashing with the following error:
com.facebook.presto.spi.PrestoException: Could not communicate with the remote task. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (10.115.1.62:8080)
at com.facebook.presto.server.remotetask.ContinuousTaskStatusFetcher.updateTaskStatus(ContinuousTaskStatusFetcher.java:239)
at com.facebook.presto.server.remotetask.ContinuousTaskStatusFetcher.success(ContinuousTaskStatusFetcher.java:170)
at com.facebook.presto.server.remotetask.ContinuousTaskStatusFetcher.success(ContinuousTaskStatusFetcher.java:53)
at com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:49)
at com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1135)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Our usage patterns vary:
* Table sizes range from roughly 1 TB up to the 100 TB range (some queries are not filtered by our partitioning scheme).
* Max concurrent queries: typically 15-30, often with none running, and periodic peaks of around 70.
* Note: I've seen as few as 10 concurrent queries against a relatively small (500 GB) table cause a crash.
Our config settings are:
config.properties:
query.max-memory-per-node=50GB
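(One thing we wonder about: query.max-memory-per-node is only the per-node cap; there is also a cluster-wide query.max-memory cap, which, if left at its default, is far below what 25 nodes x 50 GB would suggest. For reference, a fuller worker config.properties along the lines of the deployment docs would look something like the following, where every value besides query.max-memory-per-node is a placeholder rather than our actual setting:)

```
coordinator=false
http-server.http.port=8080
# placeholder: cluster-wide cap across all nodes for a single query
query.max-memory=500GB
query.max-memory-per-node=50GB
# placeholder coordinator address
discovery.uri=http://coordinator.example:8080
```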
jvm.config:
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=512M
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=xxxx
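One thing we noticed while writing this up: the file above doesn't set an explicit heap size. For comparison, the example jvm.config in the Presto deployment documentation looks roughly like this (the -Xmx value is the docs' small example, not something sized for our 110 GB VMs; presumably we'd want a much larger heap, and query.max-memory-per-node has to fit comfortably inside it):

```
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
```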
hive.properties:
hive.metastore.uri=thrift://x.x.x.x:xxxx,thrift://x.x.x.x:xxxx
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.allow-drop-table=true
hive.storage-format=PARQUET
hive.immutable-partitions=true
hive.metastore-cache-ttl=5m
hive.metastore-refresh-interval=10s
hive.parquet.use-column-names=true
/etc/security/limits.conf:
* soft nproc 1024000
* hard nproc 1024000
* hard nofile 1024000
* soft nofile 1024000
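To put numbers on what worries us, here is a quick back-of-the-envelope in Python. The heap size is hypothetical, since our jvm.conf doesn't pin -Xmx; the other figures are from our setup above:

```python
# Back-of-the-envelope memory check for a single worker.
vm_memory_gb = 110                  # per-VM memory in our cluster
assumed_heap_gb = 80                # hypothetical -Xmx we might set
max_query_memory_per_node_gb = 50   # query.max-memory-per-node

# Heap left over for everything Presto does NOT count against query
# memory (exchange buffers, ORC/Parquet readers, internal caches, ...).
untracked_heap_gb = assumed_heap_gb - max_query_memory_per_node_gb
print(untracked_heap_gb)  # 30

# Memory left to the OS and off-heap allocations outside the JVM heap.
os_headroom_gb = vm_memory_gb - assumed_heap_gb
print(os_headroom_gb)  # 30
```

If the heap is left at a small default instead, the 50 GB per-node query cap alone would exceed it, which seems consistent with workers dying under concurrent load.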