Persist index error and ArrayIndexOutOfBoundsException during batch ingestion...

prinull

May 20, 2016, 3:11:32 AM
to Druid User
I am using the Hadoop indexer (java -Xmx12G -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:conf/druid/_common io.druid.cli.Main index hadoop specfile.json) because the other indexer gives errors that I cannot understand.

Here is the end of my log file:

2016-05-18T15:12:51,908 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-05-18T15:12:52,184 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - Maximum number of rows [50000] reached
2016-05-18T15:12:52,185 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - 50,001 lines to 50,000 rows in 1,402 millis
2016-05-18T15:12:52,185 INFO [index00020-persist] io.druid.segment.IndexMerger - Starting persist for interval[2016-02-21T00:00:00.000Z/2016-02-22T00:00:00.000Z], rows[50,000]
2016-05-18T15:12:52,232 INFO [index00020-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00020/v8-tmp] completed index.drd in 1 millis.
2016-05-18T15:12:52,258 INFO [index00020-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00020/v8-tmp] completed dim conversions in 26 millis.
2016-05-18T15:12:52,262 ERROR [index00020-persist] io.druid.indexer.IndexGeneratorJob - persist index error
2016-05-18T15:12:53,635 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - Maximum number of rows [50000] reached
2016-05-18T15:12:53,635 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - 50,001 lines to 50,000 rows in 1,370 millis
2016-05-18T15:12:53,636 INFO [index00021-persist] io.druid.segment.IndexMerger - Starting persist for interval[2016-02-21T00:00:00.000Z/2016-02-22T00:00:00.000Z], rows[50,000]
2016-05-18T15:12:53,709 INFO [index00021-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00021/v8-tmp] completed index.drd in 0 millis.
2016-05-18T15:12:53,745 INFO [index00021-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00021/v8-tmp] completed dim conversions in 36 millis.
2016-05-18T15:12:53,750 ERROR [index00021-persist] io.druid.indexer.IndexGeneratorJob - persist index error
2016-05-18T15:12:54,908 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-05-18T15:12:55,019 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - Maximum number of rows [50000] reached
2016-05-18T15:12:55,019 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - 50,006 lines to 50,000 rows in 1,267 millis
2016-05-18T15:12:55,019 INFO [index00022-persist] io.druid.segment.IndexMerger - Starting persist for interval[2016-02-21T00:00:00.000Z/2016-02-22T00:00:00.000Z], rows[50,000]
2016-05-18T15:12:55,084 INFO [index00022-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00022/v8-tmp] completed index.drd in 0 millis.
2016-05-18T15:12:55,126 INFO [index00022-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00022/v8-tmp] completed dim conversions in 42 millis.
2016-05-18T15:12:55,131 ERROR [index00022-persist] io.druid.indexer.IndexGeneratorJob - persist index error
2016-05-18T15:12:56,474 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - Maximum number of rows [50000] reached
2016-05-18T15:12:56,474 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - 50,000 lines to 50,000 rows in 1,341 millis
2016-05-18T15:12:56,474 INFO [index00023-persist] io.druid.segment.IndexMerger - Starting persist for interval[2016-02-21T00:00:00.000Z/2016-02-22T00:00:00.000Z], rows[50,000]
2016-05-18T15:12:56,536 INFO [index00023-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00023/v8-tmp] completed index.drd in 0 millis.
2016-05-18T15:12:56,567 INFO [index00023-persist] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/index00023/v8-tmp] completed dim conversions in 31 millis.
2016-05-18T15:12:56,572 ERROR [index00023-persist] io.druid.indexer.IndexGeneratorJob - persist index error
2016-05-18T15:12:56,816 INFO [pool-10-thread-1] io.druid.indexer.IndexGeneratorJob - 1,209,683 lines completed.
2016-05-18T15:12:56,816 INFO [pool-10-thread-1] io.druid.segment.IndexMerger - Starting persist for interval[2016-02-21T00:00:00.000Z/2016-02-22T00:00:00.000Z], rows[9,628]
2016-05-18T15:12:56,832 INFO [pool-10-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/final/v8-tmp] completed index.drd in 0 millis.
2016-05-18T15:12:56,849 INFO [pool-10-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base880361292817636286flush/final/v8-tmp] completed dim conversions in 17 millis.
2016-05-18T15:12:56,854 INFO [Thread-232] org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2016-05-18T15:12:57,267 WARN [Thread-232] org.apache.hadoop.mapred.LocalJobRunner - job_local1298338564_0002
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 39
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.lang.ArrayIndexOutOfBoundsException: 39
at io.druid.segment.IndexMerger$10.apply(IndexMerger.java:1057) ~[druid-processing-0.9.0.jar:0.9.0]
at io.druid.segment.IndexMerger$10.apply(IndexMerger.java:1038) ~[druid-processing-0.9.0.jar:0.9.0]
at com.google.common.collect.Iterators$8.transform(Iterators.java:794) ~[guava-16.0.1.jar:?]
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) ~[guava-16.0.1.jar:?]
at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) ~[guava-16.0.1.jar:?]
at com.google.common.collect.Iterators$PeekingImpl.next(Iterators.java:1145) ~[guava-16.0.1.jar:?]
at com.metamx.common.guava.MergeIterator.next(MergeIterator.java:75) ~[java-util-0.27.7.jar:?]
at com.google.common.collect.Iterators$PeekingImpl.next(Iterators.java:1145) ~[guava-16.0.1.jar:?]
at io.druid.collections.CombiningIterator.next(CombiningIterator.java:75) ~[druid-common-0.9.0.jar:0.9.0]
at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:791) ~[druid-processing-0.9.0.jar:0.9.0]
at io.druid.segment.IndexMerger.merge(IndexMerger.java:421) ~[druid-processing-0.9.0.jar:0.9.0]
at io.druid.segment.IndexMerger.persist(IndexMerger.java:193) ~[druid-processing-0.9.0.jar:0.9.0]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.persist(IndexGeneratorJob.java:501) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:676) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_74]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_74]
2016-05-18T15:12:57,531 INFO [main] org.apache.hadoop.mapreduce.Job - Job job_local1298338564_0002 failed with state FAILED due to: NA
2016-05-18T15:12:57,558 INFO [main] org.apache.hadoop.mapreduce.Job - Counters: 38
File System Counters
FILE: Number of bytes read=1352583638495
FILE: Number of bytes written=626472806399
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=0
HDFS: Number of read operations=0
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Map-Reduce Framework
Map input records=8592036
Map output records=8535605
Map output bytes=6084709857
Map output materialized bytes=6118858955
Input split bytes=42771
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=6118858955
Reduce input records=1146208
Reduce output records=0
Spilled Records=8535605
Shuffled Maps =1113
Failed Shuffles=0
Merged Map outputs=1113
GC time elapsed (ms)=4982
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=877354024960
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=56
2016-05-18T15:12:57,573 INFO [main] io.druid.indexer.JobHelper - Deleting path[file:/tmp/testdata/2016-05-18T150008.063Z/27fd216025c941a1a096775f5ddc5fdd]
2016-05-18T15:12:57,584 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_74]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_74]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_74]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_74]
at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:115) [druid-services-0.9.0.jar:0.9.0]
at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.0.jar:0.9.0]
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:323) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:130) ~[druid-services-0.9.0.jar:0.9.0]
at io.druid.cli.Main.main(Main.java:105) ~[druid-services-0.9.0.jar:0.9.0]


How can I overcome these errors? (In Druid 0.8.3 I was getting Java heap space errors with both the Overlord indexing task and Hadoop indexing. I handled that by reducing maxRowsInMemory to 50000, and I am using the same setting now.)
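For reference, the setting I mean sits in the tuningConfig, roughly like this (just a sketch of the shape; in some 0.9.x versions the Hadoop tuningConfig names the equivalent field rowFlushBoundary rather than maxRowsInMemory):

  "tuningConfig": {
    "type": "hadoop",
    "maxRowsInMemory": 50000
  }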

Thanks! 


prinull

May 20, 2016, 4:21:58 AM
to Druid User
Addition: using the Indexing Service with 10 GB peons, I got the exact same error.

prinull

May 20, 2016, 6:22:59 AM
to Druid User
Another addition: I have a lot of metrics (50+). After reducing them to 3 in the spec file, the ingestion now completes. Is there an explanation for this?
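To be concrete, by "reduced them to 3" I mean the metricsSpec now only contains something of this shape (the metric and field names here are made up, purely to illustrate):

  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "doubleSum", "name": "value_sum", "fieldName": "value" },
    { "type": "doubleMax", "name": "value_max", "fieldName": "value" }
  ]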

Nishant Bangarwa

May 20, 2016, 8:14:05 AM
to Druid User
Hi prinull,
There is a 2 GB maximum byte-size limit per column in Druid. I wonder if one of your metric columns is reaching that limit and causing this error.
Try reducing your targetPartitionSize in order to produce smaller segments.
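Roughly speaking, that means something like the following in the Hadoop tuningConfig (a sketch only; the default targetPartitionSize is around 5,000,000 rows, so try a noticeably smaller value):

  "tuningConfig": {
    "type": "hadoop",
    "partitionsSpec": {
      "type": "hashed",
      "targetPartitionSize": 1000000
    }
  }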


Fatih Mermerkaya

Nov 5, 2016, 1:09:16 PM
to Druid User
Hi

I have the same issue. Did you find a resolution? If I reduce the metrics it works; if I keep the metrics (25), then even if the task runs for only 1 minute, it fails to write to disk upon completion.

David Lim

Nov 7, 2016, 2:42:16 PM
to Druid User
Hi Fatih, can you post your task log?

Fatih Mermerkaya

Nov 17, 2016, 2:19:09 AM
to Druid User
Hi David

I believe it is due to null fields in the CSV format; after moving to JSON it works without problems.
Somehow the parser doesn't like the nulls and shifts the fields: before the segment is handed over to deep storage, I can see from the peons that the dimension fields have shifted.
Does the parser have an enclosure option for parsing? I couldn't find one in the documentation.
(attachment: log.log)
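To illustrate the shifting: with a CSV parseSpec the values are matched to the columns list purely by position, so when a row's fields do not line up with that list (for example a missing field or an unquoted delimiter inside a value), values can end up under the wrong column names. A sketch of the kind of parseSpec I mean (column names are invented for the example):

  "parseSpec": {
    "format": "csv",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "columns": ["timestamp", "country", "device", "value"],
    "dimensionsSpec": { "dimensions": ["country", "device"] }
  }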

David Lim

Dec 15, 2016, 7:09:46 PM
to Druid User
The CSV parser can only be configured with a list delimiter for multi-valued dimensions.
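Concretely, the list delimiter is the only extra knob, e.g. (a sketch with made-up column names; it only controls how a multi-valued dimension cell is split, there is no quote/enclosure setting):

  "parseSpec": {
    "format": "csv",
    "listDelimiter": "|",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "columns": ["timestamp", "tags", "value"],
    "dimensionsSpec": { "dimensions": ["tags"] }
  }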

Have you moved from CSV to JSON now? Is everything working alright?