Hi,
I've been trying to run large self-join queries on Presto, deployed on a cluster running Hadoop 2.4.0, Hive 0.13.1, and Presto 0.77.
For one reason or another, as the number of rows in the self-joined table grows, I fairly quickly start getting errors like the following in the
server logs:
----------
2014-10-13T02:00:17.249+0300 ERROR SplitRunner-20-75 com.facebook.presto.execution.TaskExecutor Error processing Split 20141012_230000_00000_pyz92.10.1-0 (start = 1413154813122, wall = 4107 ms, cpu = 17 ms, calls = 5)
java.lang.IndexOutOfBoundsException: end index (16391) must not be greater than size (16386)
at io.airlift.slice.Preconditions.checkPositionIndexes(Preconditions.java:80) ~[slice-0.7.jar:0.7]
at io.airlift.slice.Slice.checkIndexLength(Slice.java:984) ~[slice-0.7.jar:0.7]
at io.airlift.slice.Slice.getLong(Slice.java:366) ~[slice-0.7.jar:0.7]
at com.facebook.presto.spi.block.AbstractVariableWidthBlock.getLong(AbstractVariableWidthBlock.java:59) ~[presto-spi-0.77.jar:0.77]
at com.facebook.presto.spi.type.BigintType.compareTo(BigintType.java:72) ~[presto-spi-0.77.jar:0.77]
at com.facebook.presto.spi.block.SortOrder.compareBlockValue(SortOrder.java:62) ~[presto-spi-0.77.jar:0.77]
at com.facebook.presto.$gen.PagesIndexComparator_139.compareTo(Unknown Source) ~[na:na]
at com.facebook.presto.operator.PagesIndex$1.compare(PagesIndex.java:241) ~[presto-main-0.77.jar:0.77]
at com.facebook.presto.operator.WindowOperator.getOutput(WindowOperator.java:281) ~[presto-main-0.77.jar:0.77]
at com.facebook.presto.operator.Driver.process(Driver.java:330) ~[presto-main-0.77.jar:0.77]
at com.facebook.presto.operator.Driver.processFor(Driver.java:272) ~[presto-main-0.77.jar:0.77]
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:541) ~[presto-main-0.77.jar:0.77]
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:444) ~[presto-main-0.77.jar:0.77]
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:578) ~[presto-main-0.77.jar:0.77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
----------
The threshold number of rows at which the problem starts to appear is somewhere between 10,000 and 100,000, so the tables aren't that big. The rows in
the table (events) look like this:
----------
3980444,614,2010-06-21 15:16:04.000
3980730,614,2010-06-21 15:16:04.000
3985097,614,2010-06-14 11:42:41.000
3985261,614,2010-06-14 11:42:41.000
----------
For tables with fewer than 10,000 rows, everything seems to work without problems.
The query I'm trying to run looks like this:
----------
WITH events_with_rank_asc AS
(
SELECT
evt_case,
evt_event,
evt_timestamp,
ROW_NUMBER() OVER (PARTITION BY evt_case ORDER BY evt_timestamp ASC, evt_event ASC) AS evt_asc_rank
FROM
events
),
events_with_rank_desc AS
(
SELECT
evt_case,
evt_event,
evt_timestamp,
ROW_NUMBER() OVER (PARTITION BY evt_case ORDER BY evt_timestamp DESC, evt_event DESC) AS evt_desc_rank
FROM
events
)
SELECT *
FROM
(
SELECT
count(1) AS tra_count,
x.evt_event AS tra_event,
y.evt_event AS tra_next_event,
CAST((SUM(y.evt_timestamp - x.evt_timestamp)) AS BIGINT) AS tra_duration
FROM
events_with_rank_asc x
JOIN events_with_rank_asc y ON
((x.evt_case = y.evt_case) AND (x.evt_asc_rank = y.evt_asc_rank-1))
GROUP BY
x.evt_event, y.evt_event
UNION ALL
SELECT
count(1) AS tra_count,
'_START' AS tra_event,
evt_event AS tra_next_event,
CAST(0.0 AS BIGINT) AS tra_duration
FROM
events_with_rank_asc
WHERE
evt_asc_rank = 1
GROUP BY
evt_event
UNION ALL
SELECT
count(1) AS tra_count,
evt_event AS tra_event,
'_END' AS tra_next_event,
CAST(0.0 AS BIGINT) AS tra_duration
FROM
events_with_rank_desc
WHERE
evt_desc_rank = 1
GROUP BY
evt_event
) unionResult
ORDER BY
tra_count DESC, tra_event ASC, tra_next_event ASC;
----------
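To make the intended semantics of the query clear (independent of Presto), here is a minimal plain-Python sketch over hypothetical events: within each case, rows are ordered by timestamp then event (the ROW_NUMBER ranking), each rank-k row is paired with rank k+1 to count transitions and sum durations, and the rank-1 and last-rank rows contribute the `_START` and `_END` edges. The data and names here are made up for illustration only.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical events: (evt_case, evt_event, evt_timestamp as integer seconds)
events = [
    ("c1", "A", 0),
    ("c1", "B", 10),
    ("c1", "C", 25),
    ("c2", "A", 5),
    ("c2", "C", 12),
]

transitions = {}  # (tra_event, tra_next_event) -> (tra_count, tra_duration)

def add(key, duration):
    cnt, dur = transitions.get(key, (0, 0))
    transitions[key] = (cnt + 1, dur + duration)

for case, group in groupby(sorted(events), key=itemgetter(0)):
    # ORDER BY evt_timestamp ASC, evt_event ASC within the case partition
    seq = sorted(group, key=lambda r: (r[2], r[1]))
    add(("_START", seq[0][1]), 0)         # asc rank 1 -> _START edge
    add((seq[-1][1], "_END"), 0)          # desc rank 1 -> _END edge
    for prev, nxt in zip(seq, seq[1:]):   # join asc rank k with rank k+1
        add((prev[1], nxt[1]), nxt[2] - prev[2])
```

On the hypothetical data above this yields, e.g., two `_START -> A` transitions and an `A -> B` transition with a total duration of 10 seconds, matching what the UNION ALL branches of the query should produce.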
Configurations look like this:
jvm.config:
---
-server
-Xmx7G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:PermSize=150M
-XX:MaxPermSize=150M
-XX:ReservedCodeCacheSize=150M
-Xbootclasspath/p:<PRESTO_SERVER_INSTALLATION_PATH>/lib/floatingdecimal-0.1.jar
---
config.properties:
---
http-server.http.port=<PRESTO_PORT>
task.max-memory=4GB
discovery-server.enabled=true
discovery.uri=http://<PRESTO_DISCOVERY_HOST>:<PRESTO_PORT>
---
hive.properties:
---
connector.name=hive-hadoop2
hive.metastore.uri=thrift://<NAMENODE>:<HIVE_PORT>
hive.config.resources=<HADOOP_CONFIG_PATH>/core-site.xml,<HADOOP_CONFIG_PATH>/hdfs-site.xml
---
There is also a somewhat related question (which has since been resolved) that I posted earlier about a previous Presto version:
https://groups.google.com/d/msg/presto-users/B5YVhiddkdM/8WmvM0qnLDEJ
Does anyone have any idea what could be causing the failed queries, or how to proceed with the investigation? I'll gladly provide more information if needed. I wouldn't want to
revert to 0.60, since I'm trying to benchmark the latest versions of several different frameworks.
- Markku