Hadoop indexing task fails when partitioning by dimension


Nikita Salnikov-Tarnovski

Sep 7, 2017, 2:24:10 AM
to Druid User
I have a datasource in Druid that I want to repartition by one of its dimensions. So I run a Hadoop indexing task that reads from the existing datasource and applies a different segment granularity and partitionsSpec. The task completes successfully for some time periods but fails for others with the following exception:
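For reference, a single-dimension partitionsSpec in a Hadoop indexing task's tuningConfig looks roughly like the sketch below (the dimension name and target size here are illustrative, not the actual values from my task):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "partitionsSpec": {
      "type": "dimension",
      "partitionDimension": "accountId",
      "targetPartitionSize": 5000000
    }
  }
}
```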

2017-09-06T20:54:23,450 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Determining partitions for interval: 2017-08-19T00:00:00.000Z/2017-08-20T00:00:00.000Z
2017-09-06T20:54:23,451 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 30,790 rows and 1 unique values:SingleDimensionShardSpec{dimension='accountId', start='null', end='reductedAccount1', partitionNum=0}
2017-09-06T20:54:23,451 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 5,069,343 rows and 1 unique values:SingleDimensionShardSpec{dimension='accountId', start='reductedAccount1', end='reductedAccount2', partitionNum=1}
2017-09-06T20:54:23,456 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 3,992,123 rows and 55 unique values:SingleDimensionShardSpec{dimension='accountId', start='reductedAccount2', end='reductedAccount3', partitionNum=2}
2017-09-06T20:54:23,458 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 2,473,194 rows and 16 unique values:SingleDimensionShardSpec{dimension='accountId', start='reductedAccount3', end='null', partitionNum=3}
2017-09-06T20:54:23,458 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Completed dimension[accountId]: 4 possible shards with 73 unique values
2017-09-06T20:54:23,461 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob - Chosen partitions:
2017-09-06T20:54:23,461 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob -   {"type":"single","dimension":"accountId","start":null,"end":"reductedAccount1","partitionNum":0}
2017-09-06T20:54:23,461 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob -   {"type":"single","dimension":"accountId","start":"reductedAccount1","end":"reductedAccount2","partitionNum":1}
2017-09-06T20:54:23,461 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob -   {"type":"single","dimension":"accountId","start":"reductedAccount2","end":"reductedAccount3","partitionNum":2}
2017-09-06T20:54:23,461 INFO [pool-22-thread-1] io.druid.indexer.DeterminePartitionsJob -   {"type":"single","dimension":"accountId","start":"reductedAccount3","end":null,"partitionNum":3}
2017-09-06T20:54:23,461 INFO [pool-22-thread-1] org.apache.hadoop.mapred.Task - Task:attempt_local1020340908_0002_r_000006_0 is done. And is in the process of committing
2017-09-06T20:54:23,462 INFO [pool-22-thread-1] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2017-09-06T20:54:23,462 INFO [pool-22-thread-1] org.apache.hadoop.mapred.Task - Task 'attempt_local1020340908_0002_r_000006_0' done.
2017-09-06T20:54:23,462 INFO [pool-22-thread-1] org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local1020340908_0002_r_000006_0
2017-09-06T20:54:23,462 INFO [Thread-248] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2017-09-06T20:54:23,670 WARN [Thread-248] org.apache.hadoop.mapred.LocalJobRunner - job_local1020340908_0002
java.lang.Exception: com.metamx.common.ISE: No suitable partitioning dimension found!
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: com.metamx.common.ISE: No suitable partitioning dimension found!
        at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionReducer.innerReduce(DeterminePartitionsJob.java:753) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
        at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:496) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
        at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:470) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
2017-09-06T20:54:24,089 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local1020340908_0002 failed with state FAILED due to: NA


Any ideas what this means?