error on hadoopIndexer over indexingService


Andres Gomez

Sep 7, 2015, 4:59:22 AM
to Druid User
Hi all,

We are currently using the HadoopIndexer through the indexing service to reindex data with new dimensions added. Our setup is:

Reindex nodes:
  • 3 druid_middleManagers (12 CPUs and 64GB RAM)
  • 1 druid_overlord
  • 1 hadoop_namenode
  • 2 hadoop_datanode
Common nodes:
  • 2 druid_historicals
  • 2 druid_coordinator
  • 2 druid_brokers
  • 2 druid_realtime
Our segment granularity is 1 hour, and we are trying to reindex 1 month of data. We are using Hadoop reindexing with static input because we do not have partitioned data on HDFS; we are using static files with raw data.
We launch hadoopIndexer tasks that each reindex 4 hours of data (4 one-hour segments). Some tasks end with status "SUCCESS", but others end with status "FAILED". I have seen this exception in the task log:

2015-09-06T11:01:49,916 WARN [Thread-125] org.apache.hadoop.mapred.LocalJobRunner - job_local394600922_0003
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433) ~[hadoop-common-2.3.0.jar:?]
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399) ~[hadoop-common-2.3.0.jar:?]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.writeSegmentDescriptor(IndexGeneratorJob.java:645) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.renameIndexFiles(IndexGeneratorJob.java:633) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:545) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:449) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:295) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:1.7.0_03]
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) ~[?:1.7.0_03]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_03]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_03]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_03]
at java.lang.Thread.run(Unknown Source) ~[?:1.7.0_03]

I have searched for this exception on the Internet and in other forums, and all I found is that it may be a problem with the S3 credentials. I don't think that is the cause, though, because other tasks work fine and all the tasks run with the same configuration.

I have attached the task log file to this post. I hope someone can help me with this issue.

Regards and thanks,

Andres


task.log

Gian Merlino

Sep 9, 2015, 12:03:43 AM
to Druid User
Hey Andres,

IIRC this can happen if you try to read an empty file, or an S3 "directory". Can you try running again with any empty files removed from your pathSpec?
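
One way to act on this suggestion is to drop zero-byte files and directory-like entries from the list of input paths before submitting the task. The Python below is a minimal sketch, not part of Druid: `usable_input_paths`, the `(path, size)` listing format, and the example bucket name are all hypothetical. It assumes that S3 "directories" show up either as keys ending in `/` or as the `_$folder$` marker keys that Hadoop's s3n filesystem writes.

```python
from typing import Iterable, List, Tuple

# Suffix that Hadoop's s3n filesystem uses for directory marker keys
# (assumption: this is how "directories" appear in the bucket listing).
S3N_FOLDER_SUFFIX = "_$folder$"


def usable_input_paths(listing: Iterable[Tuple[str, int]]) -> List[str]:
    """Return the paths from `listing` that look like real, non-empty files.

    `listing` is a hypothetical pre-fetched sequence of
    (path, size_in_bytes) pairs, e.g. gathered from a bucket listing.
    """
    kept = []
    for path, size in listing:
        if size == 0:
            # Skip empty files.
            continue
        if path.endswith("/") or path.endswith(S3N_FOLDER_SUFFIX):
            # Skip entries that represent a directory rather than a file.
            continue
        kept.append(path)
    return kept


# Hypothetical listing: one real file, one empty file, two directory entries.
listing = [
    ("s3n://example-bucket/raw/2015-09-06T10.json", 1048576),
    ("s3n://example-bucket/raw/2015-09-06T11.json", 0),
    ("s3n://example-bucket/raw_$folder$", 0),
    ("s3n://example-bucket/raw/", 0),
]
print(usable_input_paths(listing))
```

Only the first entry survives the filter here. Whether this makes the failing tasks succeed would still need to be confirmed by rerunning them with the filtered paths.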

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/2b996a7b-172b-4d84-bac1-c540744282a8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andres Gomez

Sep 9, 2015, 3:22:13 AM
to Druid User
Hi Gian, when you say "pathSpec", do you mean the same thing as "inputSpec"?

Regards,

Andres

Gian Merlino

Sep 10, 2015, 2:57:38 PM
to Druid User
Ah, yeah, I mean "inputSpec". I get those mixed up sometimes because in the code the object is called a PathSpec :)
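
For orientation, this is roughly where the "inputSpec" sits in a Hadoop index task. The Python below is a sketch under assumptions: the `ioConfig` layout and the comma-separated `paths` string follow the Druid batch ingestion documentation as best recalled and should be checked against the 0.7.1.1 docs. `static_input_spec` and the example paths are hypothetical.

```python
import json
from typing import Dict, List


def static_input_spec(paths: List[str]) -> Dict[str, str]:
    """Build a static inputSpec from a list of input file paths.

    Assumption: a static inputSpec takes its inputs as a single
    comma-separated string under the "paths" key.
    """
    if not paths:
        raise ValueError("at least one input path is required")
    return {"type": "static", "paths": ",".join(paths)}


# Hypothetical input files that have already been checked to be non-empty.
spec = static_input_spec([
    "s3n://example-bucket/raw/2015-09-06T10.json",
    "s3n://example-bucket/raw/2015-09-06T12.json",
])

# Only the ioConfig fragment of the task is shown; the rest is omitted.
print(json.dumps({"ioConfig": {"type": "hadoop", "inputSpec": spec}}, indent=2))
```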

Andres Gomez Ferrer

Sep 10, 2015, 3:20:17 PM
to druid...@googlegroups.com
hahaha thanks Gian :) I will try to check this tomorrow!

Regards,



