Query exceeded local memory limit


Ankur Goel

Aug 19, 2016, 4:37:49 PM
to Presto
Hi Guys,
         I am running into this issue again, but this time I have more info to share.

I am using a SQL window function (PARTITION BY col1, col2 ORDER BY col3)
on 30GB / 25M rows of data, and one of the nodes executing this particular
plan fragment fails with the error.

I have 25 workers running Presto 0.147

Looking closer at the plan I see that 8 buckets were chosen to partition the data,
which is clearly insufficient in this case, and searching the PrestoDB group led me
to the suggestion of tweaking

query.initial-hash-partitions

1. Is this property still being used?

2. Can I set this property in the query session?

3. How do we control partitioning/parallelism for different
    parts of a query plan (like in Pig/Hive/MR)?

4. Is the option of spill-to-disk available in 0.147?

For BIG queries involving WINDOW functions or COUNT(DISTINCT),
spill-to-disk / over-partitioning may be a better choice than failing the
query with 'Exceeded local memory limit'.  My 2 cents :-)

Thanks
-Ankur

Kamil Bajda-Pawlikowski

Aug 19, 2016, 4:49:24 PM
to Presto
Hi Ankur,

1. Yes
2. Yes (hash_partition_count)
3. See task.concurrency property and some tuning advice at http://teradata.github.io/presto/docs/148t/admin/tuning.html
4. No

Our team at Teradata is currently working on spill to disk. Aggregations and joins are first, then we will work on additional operators. No firm date yet.

What is your query.max-memory-per-node value? How many nodes do you have? 
You should be able to just add nodes and increase the hash partition count to get your query working in the meantime.
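
For example, from the client that could look like this (a sketch; the exact value is your call and should roughly match the number of workers):

SET SESSION hash_partition_count = 25;
-- then re-run the window-function query in the same session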

Thanks,
Kamil

Ankur Goel

Aug 19, 2016, 5:52:37 PM
to Presto

Thanks for your response Kamil,

Here are my Presto worker JVM settings
-server
-Xmx12G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

query.max-memory-per-node = 6G
Number of nodes = 25 (workers) 1 (coordinator)

So I tried setting the session property hash_partition_count=17
and did 3 runs, all of which failed with the stack trace below.

The query made more progress than last time (hash_partition_count=8)
but failed every time with this:

com.facebook.presto.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (http://host:8080/v1/task/20160819_212743_00167_b5rbk.3.7/results/20160819_212743_00167_b5rbk.2.4/1 - requests failed for 61.49s)
	at com.facebook.presto.operator.HttpPageBufferClient$1.onFailure(HttpPageBufferClient.java:375)
	at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
	at com.google.common.base.Throwables.propagate(Throwables.java:160)
	at io.airlift.http.client.ResponseHandlerUtils.propagate(ResponseHandlerUtils.java:22)
	at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:540)




I have seen Jetty throw this error when network bandwidth is saturated.

Any idea how to prevent this?

Should we try forcing local split scheduling?

Can node-scheduler.network-topology be set in a query session?

Thanks
-Ankur

Kamil Bajda-Pawlikowski

Aug 19, 2016, 5:56:20 PM
to presto...@googlegroups.com

Please try making hash_partition_count=25 so that all workers participate.

If this doesn't help, try reducing the value of initial_splits_per_node by 2 (to display the current value, please run "show session;").
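
As a rough sketch (the number to set depends on what "show session;" currently reports; the value 4 below is just an assumed example):

show session;   -- check the current initial_splits_per_node value
-- if it currently shows 4, for example, then:
SET SESSION initial_splits_per_node = 2;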

Thanks,
Kamil



Ankur Goel

Aug 19, 2016, 9:07:08 PM
to Presto
I did set hash_partition_count=24 (no. of workers)
and reduced initial_splits_per_node by 6,
and the query now finishes successfully.

Thank you for your help! 
Much appreciated :-)

The query performs poorly, though; it takes
13 minutes to process 44 GB of data.

Let me read the Teradata tuning page to see
what more I can find.

Thanks again
-Ankur

Kamil Bajda-Pawlikowski

Aug 21, 2016, 9:56:39 PM
to presto...@googlegroups.com
OK, great. Let's look a bit deeper...
Could you please share your query & explain plan?




Ankur Goel

Aug 22, 2016, 10:13:31 PM
to Presto
Here is the query


SELECT approx_distinct(col5) uniq_col_5, 
       approx_distinct(col6) uniq_col6,
       approx_distinct(col7) uniq_col7,
       sum(total_col_4) total_col_4,
       sum(if(col77 <> '-1' AND (cast(date_diff('SECOND', cast(col77 AS TIMESTAMP),cast(col40 AS TIMESTAMP)) AS bigint) < 2), total_col_4, 0)) total_dupe_col_4,
       sum(total_col_4) - sum(if(col77 <> '-1' AND (cast(date_diff('SECOND', cast(col77 AS TIMESTAMP),cast(col40 AS TIMESTAMP)) AS bigint) < 2), total_col_4, 0)) total_deduped_col_4
FROM
  ( SELECT substr(col40,1,10) dt ,
      col5, 
      col6,
      case when cardinality(mylist)>0 then mylist[1].col7 else null end col7,
      case when col11  = 'string1' then 1 else 0 end total_col_4,
      col40,
      COALESCE(lead(col40) over (partition BY col5,col6
                                                   ORDER BY col40 DESC),'-1') col77
   FROM schema.table_old
   WHERE substr(col22.subcol1,10,4) = 'string22'
     AND dt >= '2016-06-08'
     AND col22.subcol2 IN ('string2', 'string3')
     AND col33.isTest = FALSE
     AND NOT CONTAINS (col44,'U13')
     AND NOT CONTAINS (col44,'T13')
   UNION ALL
   SELECT substr(col40,1,10) dt ,
      col5,
      col6,
      case when cardinality(mylist)>0 then mylist[1].col7 else null end col7,
      case when col11  = 'string1' then 1 else 0 end total_col_4,
      col40,
      COALESCE(lead(col40) over (partition BY col5,col6
                                                   ORDER BY col40 DESC),'-1') col77
   FROM schema.table
   WHERE substr(col22.subcol1,10,4) = 'string22'
     AND dt <= '2016-08-17'
     AND col22.subcol2 IN ('string2', 'string3')
     AND col33.isTest = FALSE
     AND NOT CONTAINS (col44,'U13')
     AND NOT CONTAINS (col44,'T13')) a

Kamil Bajda-Pawlikowski

Aug 23, 2016, 1:20:35 PM
to Presto
I suggest looking at EXPLAIN and EXPLAIN ANALYZE. The Presto UI also displays a nicely visualized query plan with stats that are worth analyzing.
If you are comfortable posting the results here, someone can probably take a look.
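
For reference, both are just prefixes on the statement. A minimal sketch (count(*) is only a stand-in; substitute the full query from your previous message):

EXPLAIN SELECT count(*) FROM schema.table;          -- prints the plan without running the query
EXPLAIN ANALYZE SELECT count(*) FROM schema.table;  -- runs the query and adds per-operator stats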

One performance debugging technique is to materialize certain stages of the original query on disk (in your case, the result of the UNION ALL) and measure the time of the broken-down queries to identify which one is the bottleneck. After that, you can also break down the UNION ALL itself and see what you get.
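
A rough sketch of that approach, using one branch of the UNION ALL (tmp_schema is a placeholder writable schema, not something from your environment; adapt the column list to the real subquery):

CREATE TABLE tmp_schema.stage1 AS
SELECT col5,
       col6,
       col40,
       COALESCE(lead(col40) OVER (PARTITION BY col5, col6 ORDER BY col40 DESC), '-1') AS col77
FROM schema.table
WHERE dt <= '2016-08-17';

-- then run the outer aggregation (or a simple probe like this) against the materialized table and time it:
SELECT count(*) FROM tmp_schema.stage1;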

At the cluster level, are you observing high CPU utilization? If not, please consider increasing the session property "task_concurrency" (defaults to 1) to maybe 4 or 8, depending on the number of CPU cores per node.
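
In a session that would be, for example (a sketch; 4 assumes each node has at least that many cores):

SET SESSION task_concurrency = 4;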

Thanks,
Kamil