"No worker nodes available" error

smi...@twitter.com

Mar 10, 2016, 3:30:45 PM
to Presto
Hi all,

We are seeing some queries fail with "No nodes available to run query", followed by "No worker nodes available". The condition lasts about 10-15 seconds before new queries run normally again. The cluster has about 190 workers, all with 72G of RAM; worker heaps are 64G and the coordinator heap is 36G.

We are running 0.139. This is what our coordinator's jvm.config looks like:

-server
-XX:+PreserveFramePointer
-XX:-UseBiasedLocking
-XX:+UnlockExperimentalVMOptions
-XX:G1MaxNewSizePercent=20
-XX:G1HeapRegionSize=32M
-XX:+UseG1GC
-XX:+ExplicitGCInvokesConcurrent
-XX:+UseGCOverheadLimit
-XX:ReservedCodeCacheSize=250M
-Djava.library.path=/usr/lib64
-XX:+ExplicitGCInvokesConcurrent
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:PrintFLSStatistics=2
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
-verbose:gc
-Xms36G
-Xmx36G
-XX:-UseGCLogFileRotation
-Xloggc:/home/presto/presto/data/var/log/gc.log

I checked gc.log, http-request.log, and server.log on the coordinator and a couple of workers, but nothing stands out. GC pauses are in the millisecond range, and all HTTP requests have 2xx return codes.

I added some logging in DiscoveryNodeManager.java to narrow it down to a discovery issue (https://github.com/twitter-forks/presto/pull/29/files).
I see the number of alive nodes drop to 1 (just the coordinator itself), with the rest reported as dead nodes. All the worker nodes run fine during the "outage", and our stats do not show any network problems either (which the logs confirm).

Has anyone seen this issue, or have any ideas what could be wrong?

Exception for the running query that failed:
com.facebook.presto.spi.PrestoException: No nodes available to run query
at com.facebook.presto.execution.scheduler.SimpleNodeSelector.computeAssignments(SimpleNodeSelector.java:120)
at com.facebook.presto.execution.scheduler.DynamicSplitPlacementPolicy.computeAssignments(DynamicSplitPlacementPolicy.java:42)
at com.facebook.presto.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:97)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.schedule(SqlQueryScheduler.java:322)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Exception for the new queries that fail:
com.facebook.presto.spi.PrestoException: No worker nodes available
at com.facebook.presto.util.Failures.checkCondition(Failures.java:76)
at com.facebook.presto.sql.planner.SystemPartitioningHandle.getNodePartitionMap(SystemPartitioningHandle.java:149)
at com.facebook.presto.sql.planner.NodePartitioningManager.getNodePartitioningMap(NodePartitioningManager.java:96)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.lambda$null$157(SqlQueryScheduler.java:121)
at java.util.HashMap.computeIfAbsent(HashMap.java:1118)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.lambda$new$158(SqlQueryScheduler.java:121)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.createStages(SqlQueryScheduler.java:212)
at com.facebook.presto.execution.scheduler.SqlQueryScheduler.<init>(SqlQueryScheduler.java:112)
at com.facebook.presto.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:305)
at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:211)
at com.facebook.presto.execution.QueuedExecution.lambda$start$140(QueuedExecution.java:68)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)




Thanks,
Sailesh

smi...@twitter.com

Mar 14, 2016, 1:02:40 PM
to Presto
Can someone help with any ideas?

Kamil Bajda-Pawlikowski

Mar 14, 2016, 1:31:31 PM
to Presto

Never saw that myself. How often does this happen?
Do you observe anything unusual before those errors, like the system struggling to keep up with the load (high CPU, RAM, disk, or network usage)?

 How many concurrent queries are you running? 

Sailesh Mittal

Mar 14, 2016, 2:19:47 PM
to presto...@googlegroups.com
This happens almost every hour to one or two queries.

Nothing is unusual - CPU, RAM, GC, network, or disk. The memory pool drops to 0 because the coordinator cannot find any workers.

The number of concurrent queries does not generally exceed 5-6. Sometimes this error occurs with just one running query, and sometimes all 5-6 queries fail with it.


Kamil Bajda-Pawlikowski

Mar 14, 2016, 2:55:03 PM
to Presto

Shooting in the dark here... is it at all possible that your Coordinator periodically but briefly disappears from the network and that causes all Workers to re-register themselves?

Schlussel, Rebecca T

Mar 14, 2016, 3:31:27 PM
to presto...@googlegroups.com
I agree that regular periods where the worker nodes aren't registering themselves at the expected frequency look like momentary network outages. There are some discovery server properties that control how long the discovery server keeps cached node information: https://github.com/airlift/discovery/blob/master/discovery-server/src/main/java/io/airlift/discovery/server/DiscoveryConfig.java. You could try setting discovery.store-cache-ttl to a higher value so the coordinator can keep using the cached worker information for longer when assigning work. Though if the coordinator and worker nodes are actually disconnected, you'll still see an error when it tries to send the work.
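
For example, something along these lines on the coordinator (a sketch assuming you run the embedded discovery server there; the 30s value is only an illustration, not a recommendation):

# etc/config.properties on the coordinator (embedded discovery server)
# the default TTL is 1s; a larger value lets the coordinator keep assigning
# work from cached worker announcements during brief discovery gaps
discovery.store-cache-ttl=30s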



Sailesh Mittal

Mar 14, 2016, 3:43:24 PM
to presto...@googlegroups.com
@Kamil The coordinator does not drop off the network. All HTTP requests have 2xx return codes, and the network graphs do not show any strange behavior.

@Rebecca, I didn't know that the TTL was just 1 second. I will try increasing it to 10 seconds to match the announcement frequency.

Kamil Bajda-Pawlikowski

Mar 15, 2016, 10:14:07 AM
to Presto
Do you see any improvement?

Sailesh Mittal

Mar 15, 2016, 11:55:57 AM
to presto...@googlegroups.com
I updated the coordinator yesterday evening, and there hasn't been much traffic yet. I will update this thread this evening on how it went.

smi...@twitter.com

Mar 16, 2016, 5:54:36 PM
to Presto
Hi Kamil, we are still seeing the "No worker nodes available" error. Increasing the cache TTL didn't seem to help.

smi...@twitter.com

Mar 21, 2016, 2:44:49 PM
to Presto
So here is an update: the issue was indeed the network. Announcements were not getting through because the network was completely saturated by tasks reading data from HDFS. We have a 1GbE network and are working on getting 10GbE, but in the meantime, is there a way to limit network usage per worker?

We tried reducing "task.max-worker-threads" from 96 to 32 to limit it, and that seemed to work in most cases, but we saw a large query that ran for more than 30 minutes and ultimately caused the "No worker nodes available" error once a few more queries were issued.
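
For reference, this is the change we made on each worker (a sketch assuming the standard etc/config.properties layout; 96 was just our previous value, not a default we recommend):

# etc/config.properties on each worker
# fewer task threads means fewer splits reading from HDFS at once,
# which indirectly caps each worker's network usage
task.max-worker-threads=32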

Any more configs that might help in this case?

Thanks

Dain Sundstrom

Mar 21, 2016, 4:02:47 PM
to presto...@googlegroups.com
That is what we typically do to limit network traffic.

-dain

smi...@twitter.com

Mar 21, 2016, 4:30:09 PM
to Presto
Thanks, we are trying different values to see what a good number would be.

Kamil Bajda-Pawlikowski

Mar 23, 2016, 8:51:54 AM
to Presto
Is your HDFS colocated with Presto? If so, are you using "hive.force-local-scheduling=true"?
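
For reference, that setting goes in the Hive connector config, roughly like this (it only helps when the Presto workers run on the HDFS datanodes):

# etc/catalog/hive.properties
hive.force-local-scheduling=true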

Sailesh Mittal

Mar 23, 2016, 1:43:33 PM
to presto...@googlegroups.com
On Wed, Mar 23, 2016 at 5:51 AM, Kamil Bajda-Pawlikowski <kba...@gmail.com> wrote:
Is your HDFS colocated with Presto? If so, are you using "hive.force-local-scheduling=true"?

No and No.

Yesterday at the meetup we discussed this, and it seems we are on the right track. Limiting the threads is currently the only way to throttle network usage. A good value is 1-2 threads per core, and since we have 24 cores, 24 threads looks like a good value (32 still gave us this error for a very large query).

wang...@conew.com

Dec 7, 2016, 3:49:12 AM
to Presto
Are you using S3 as the storage for your Hive tables? I had the same problem and solved it by using the parameter mentioned in https://github.com/prestodb/presto/issues/5375. You can check it!