Internal Error - encountered too many errors talking to a worker node

Yuchen Mao
Mar 24, 2017, 6:25:54 PM
to Presto
Hi, 

I've been benchmarking large queries, and sometimes a query causes Presto to fail with an error. The stack trace for one example is below.
Please let me know what further information or config settings I should supply to figure out the cause.

Here is the actual query:

explain analyze
select count(1)
from fact.tableA log
join fact.tableB act
  on log.udid = act.udid
  and log.ad_view_id = act.ad_view_id
  and log.offerid = act.offer_id
where log.d between '2017-03-19' and '2017-03-23'
  and act.d between '2017-03-19' and '2017-03-23'

tableA is around 0.5 TB per day and tableB is around 3 GB per day.
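For a sense of scale, here is the arithmetic for how much input the date filters select. It assumes `d` is a daily partition column and that the per-day figures above are on-disk sizes; SQL `BETWEEN` is inclusive, so the range covers five days. How much of this actually crosses the network during the join depends on the file format and which columns are read, so treat it as a rough ceiling, not a measurement of exchanged data.

```python
from datetime import date

# Approximate per-day sizes quoted in the post.
TABLE_A_TB_PER_DAY = 0.5
TABLE_B_GB_PER_DAY = 3.0


def days_inclusive(start: date, end: date) -> int:
    """Number of daily partitions matched by `d BETWEEN start AND end` (inclusive)."""
    return (end - start).days + 1


days = days_inclusive(date(2017, 3, 19), date(2017, 3, 23))
table_a_tb = days * TABLE_A_TB_PER_DAY  # total tableA input in TB
table_b_gb = days * TABLE_B_GB_PER_DAY  # total tableB input in GB
print(f"{days} days: tableA ~{table_a_tb} TB, tableB ~{table_b_gb} GB")
```

This prints `5 days: tableA ~2.5 TB, tableB ~15.0 GB`.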

Thanks

com.facebook.presto.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (http://10.10.9.29:8080/v1/task/20170324_220235_00271_x6dsg.3.18/results/18/1462 - 62 failures, time since last success 99.92s)
	at com.facebook.presto.operator.HttpPageBufferClient$1.onFailure(HttpPageBufferClient.java:379)
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: io.airlift.http.client.RuntimeIOException: java.net.SocketTimeoutException: Connect Timeout
	at io.airlift.http.client.ResponseHandlerUtils.propagate(ResponseHandlerUtils.java:20)
	at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:514)
	at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:508)
	at io.airlift.http.client.jetty.JettyHttpClient$JettyResponseFuture.failed(JettyHttpClient.java:876)
	at io.airlift.http.client.jetty.JettyHttpClient$BufferingResponseListener.onComplete(JettyHttpClient.java:1114)
	at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:193)
	at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:185)
	at org.eclipse.jetty.client.HttpExchange.notifyFailureComplete(HttpExchange.java:269)
	at org.eclipse.jetty.client.HttpExchange.abort(HttpExchange.java:240)
	at org.eclipse.jetty.client.HttpConversation.abort(HttpConversation.java:141)
	at org.eclipse.jetty.client.HttpRequest.abort(HttpRequest.java:708)
	at org.eclipse.jetty.client.HttpDestination.abort(HttpDestination.java:267)
	at org.eclipse.jetty.client.PoolingHttpDestination.failed(PoolingHttpDestination.java:90)
	at org.eclipse.jetty.client.DuplexConnectionPool$1.failed(DuplexConnectionPool.java:159)
	at org.eclipse.jetty.util.Promise$Wrapper.failed(Promise.java:84)
	at org.eclipse.jetty.client.HttpClient$1$1.failed(HttpClient.java:572)
	at org.eclipse.jetty.client.AbstractHttpClientTransport.connectFailed(AbstractHttpClientTransport.java:152)
	at org.eclipse.jetty.client.AbstractHttpClientTransport$ClientSelectorManager.connectionFailed(AbstractHttpClientTransport.java:195)
	at org.eclipse.jetty.io.ManagedSelector$Connect.failed(ManagedSelector.java:661)
	at org.eclipse.jetty.io.ManagedSelector$Connect.access$1300(ManagedSelector.java:628)
	at org.eclipse.jetty.io.ManagedSelector$ConnectTimeout.run(ManagedSelector.java:683)
	... 7 more
Caused by: java.net.SocketTimeoutException: Connect Timeout
	... 8 more

Bill Graham
Mar 24, 2017, 6:35:36 PM
to Presto
We've seen this before, caused by network communication problems between nodes. I'd recommend checking whether you're saturating your NICs, which would certainly cause these exceptions. If that's not it, check for GC pressure causing long pauses.
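One way to check for NIC saturation on a Linux worker is to sample the interface byte counters in `/proc/net/dev` twice while the query runs and compare the rate with the link speed. This is a minimal sketch, not a Presto tool. The interface name (`eth0`) and link speed (10 Gbit/s) are assumptions; check yours with `ethtool <iface>`. GC pauses can be checked separately on the worker JVM, for example with the JDK's `jstat -gcutil <pid>`.

```python
import time


def parse_proc_net_dev(text: str) -> dict:
    """Return {interface: (rx_bytes, tx_bytes)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, interval_s: float, link_bits_per_s: float):
    """Fraction of link capacity used (rx, tx) between two counter samples."""
    rx = (after[0] - before[0]) * 8 / interval_s / link_bits_per_s
    tx = (after[1] - before[1]) * 8 / interval_s / link_bits_per_s
    return rx, tx


def sample(iface: str = "eth0", interval_s: float = 5.0, link_bits_per_s: float = 10e9):
    """Sample /proc/net/dev twice; iface and link speed are assumed defaults."""
    with open("/proc/net/dev") as f:
        before = parse_proc_net_dev(f.read())[iface]
    time.sleep(interval_s)
    with open("/proc/net/dev") as f:
        after = parse_proc_net_dev(f.read())[iface]
    return utilization(before, after, interval_s, link_bits_per_s)
```

Running `sample("eth0")` on each worker during the failing query and seeing values near 1.0 in either direction would point to the network as the bottleneck.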