error on hadoopIndexer over indexingService

Andres Gomez

Sep 7, 2015, 4:59:22 AM

to Druid User

Hi all,

We are currently using the HadoopIndexer through the indexing service to reindex data and add new dimensions. Our setup is:

Reindex nodes:

3 druid_middleManagers (12 CPUs and 64GB RAM)
1 druid_overlord
1 hadoop_namenode
2 hadoop_datanode

Common nodes:

2 druid_historicals
2 druid_coordinator
2 druid_brokers
2 druid_realtime

Our segment granularity is 1 hour, and we are trying to reindex 1 month of data. We are using hadoop-static reindexing because we don't have partitioned data on HDFS; we are using static files with raw data.

We are launching hadoopIndexer tasks that each reindex 4 hours of data (4 segments of 1 hour). Some tasks end with status "SUCCESS", but others end with status "FAILED". I have seen this exception in the task log:

2015-09-06T11:01:49,916 WARN [Thread-125] org.apache.hadoop.mapred.LocalJobRunner - job_local394600922_0003
java.lang.Exception: java.lang.NullPointerException
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433) ~[hadoop-common-2.3.0.jar:?]
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399) ~[hadoop-common-2.3.0.jar:?]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.writeSegmentDescriptor(IndexGeneratorJob.java:645) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.renameIndexFiles(IndexGeneratorJob.java:633) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:545) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:449) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:295) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:1.7.0_03]
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) ~[?:1.7.0_03]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_03]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_03]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_03]
	at java.lang.Thread.run(Unknown Source) ~[?:1.7.0_03]

I have searched for this exception on the Internet and in other forums, and all I found is that it may be a problem with the S3 credentials. I don't think that is the cause, though, because other tasks work fine and all the tasks run with the same configuration.

I have attached the task log file to this post. I hope someone can help me with this issue.

Regards and thanks,

Andres

task.log

Gian Merlino

Sep 9, 2015, 12:03:43 AM

to Druid User

Hey Andres,

IIRC this can happen if you try to read an empty file, or an S3 "directory". Can you try running again with any empty files removed from your pathSpec?
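
For illustration, here is a minimal Python sketch of that filtering step, assuming you already have a listing of your input files with their sizes (for example from `hadoop fs -ls` or an S3 bucket listing). The function name `non_empty_paths`, the bucket name, and the file names are all hypothetical; only the idea of dropping empty files and "directory" entries comes from the suggestion above.

```python
def non_empty_paths(listing):
    """Return the paths that are neither empty files nor directory markers.

    `listing` is an iterable of (path, size_in_bytes) pairs. This input
    shape is an assumption made for the sketch, not a Druid or Hadoop API.
    """
    kept = []
    for path, size in listing:
        if path.endswith("/"):
            # Skip S3 "directory" placeholder keys.
            continue
        if size == 0:
            # Skip empty files.
            continue
        kept.append(path)
    return kept


# Hypothetical listing: one real file, one empty file, one "directory" key.
listing = [
    ("s3n://example-bucket/raw/2015-09-06T10.json", 1048576),
    ("s3n://example-bucket/raw/2015-09-06T11.json", 0),
    ("s3n://example-bucket/raw/", 0),
]

# A comma-separated string of the surviving paths.
paths = ",".join(non_empty_paths(listing))
print(paths)
```

Only the first entry survives here, so the task would be given a single path.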

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/2b996a7b-172b-4d84-bac1-c540744282a8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andres Gomez

Sep 9, 2015, 3:22:13 AM

to Druid User

Hi Gian, when you say "pathSpec", is that the same as "inputSpec"?

Regards,

Andres

Gian Merlino

Sep 10, 2015, 2:57:38 PM

to Druid User

Ah, yeah, I mean "inputSpec". I get those mixed up sometimes because in the code the object is called a PathSpec :)
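
As a rough sketch of what that object looks like for static files: to my understanding of the Druid batch ingestion docs, a static inputSpec is a JSON object with `"type": "static"` and a comma-separated `"paths"` string. Please check this against the docs for your version (0.7.1.1 in this thread). The helper name `static_input_spec` and the example paths are hypothetical.

```python
import json


def static_input_spec(paths):
    """Build a static inputSpec dict from a list of input file paths.

    Assumes the inputSpec shape {"type": "static", "paths": "a,b,c"};
    verify this against the Druid docs for the version you run.
    """
    return {"type": "static", "paths": ",".join(paths)}


# Hypothetical input files; replace with your own non-empty files.
spec = static_input_spec([
    "s3n://example-bucket/raw/2015-09-06T10.json",
    "s3n://example-bucket/raw/2015-09-06T12.json",
])

# Print the fragment as it would appear inside the task JSON.
print(json.dumps({"inputSpec": spec}, indent=2))
```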


Andres Gomez Ferrer

Sep 10, 2015, 3:20:17 PM

to druid...@googlegroups.com

hahaha thanks Gian :) I will try to check this tomorrow!

Regards,

Andrés Gómez

Developer

redborder.net / ago...@redborder.net

Phone: +34 955 60 11 60

