Hadoop Indexing Not Working with Azure Deep Storage

Pravesh Gupta

Apr 6, 2018, 2:41:25 AM
to Druid User
Hi,

We are migrating our Druid cluster from AWS to Azure. As part of that, we ran our Hadoop indexing job to ingest data into Druid on Azure, with Azure Blob Storage as deep storage.
It looks like this is not supported, even in Druid 0.12.

Could anyone please confirm this, and tell us what other restrictions apply when migrating a Druid cluster from AWS to Azure?
If it is not supported, could you briefly explain why?
And what are the ways to make it work? We do want to use the Hadoop indexing job to ingest data.

```
2018-04-05 14:05:02.073+0000 *INFO* CAMP [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask Starting flush of map output
2018-04-05 14:05:02.081+0000 *INFO* CAMP [Thread-78] org.apache.hadoop.mapred.LocalJobRunner map task executor complete.
2018-04-05 14:05:02.082+0000 *WARN* CAMP [Thread-78] org.apache.hadoop.mapred.LocalJobRunner job_local1436102658_0001
java.lang.Exception: java.lang.NullPointerException: segmentOutputPath
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.7.3.jar:?]
Caused by: java.lang.NullPointerException: segmentOutputPath
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229) ~[guava-16.0.1.jar:?]
        at io.druid.indexer.HadoopDruidIndexerConfig.verify(HadoopDruidIndexerConfig.java:589) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
        at io.druid.indexer.HadoopDruidIndexerConfig.fromConfiguration(HadoopDruidIndexerConfig.java:211) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
        at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:51) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
        at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:225) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
        at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:280) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]
2018-04-05 14:05:02.762+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Job job_local1436102658_0001 running in uber mode : false
2018-04-05 14:05:02.763+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job  map 0% reduce 0%
2018-04-05 14:05:02.765+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Job job_local1436102658_0001 failed with state FAILED due to: NA
2018-04-05 14:05:02.773+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Counters: 0
2018-04-05 14:05:02.773+0000 *ERROR* CAMP [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob Job failed: job_local1436102658_0001
2018-04-05 14:05:02.774+0000 *INFO* CAMP [task-runner-0-priority-0] io.druid.indexer.JobHelper Deleting path[/tmp/druid-indexing/wikiticker/2018-04-05T140453.615Z_c8d08b4bb74141a2ad94d2956b41defc]
2018-04-05 14:05:02.793+0000 *ERROR* CAMP [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2018-04-05T14:04:53.617Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.0.jar:0.12.0]
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:184) ~[druid-indexing-service-0.12.0.jar:0.12.0]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.0.jar:0.12.0]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.0.jar:0.12.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.lang.reflect.InvocationTargetException
```

Thanks,
Pravesh Gupta
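
A note on reading the trace above: the root failure is the `NullPointerException: segmentOutputPath` thrown from Guava's `Preconditions.checkNotNull` inside `HadoopDruidIndexerConfig.verify`. In other words, the job configuration reached the mapper with no segment output path set. The log does not show why the value was null. The sketch below only illustrates how a named null check produces that exact message. The class and method names are hypothetical, and `Objects.requireNonNull` stands in for the Guava call so the example needs no dependencies.

```java
import java.util.Objects;

public class SegmentOutputPathCheck {
    // Hypothetical stand-in for the check seen in the stack trace.
    // Druid 0.12.0 uses Guava's Preconditions.checkNotNull; here
    // Objects.requireNonNull plays the same role: it throws a
    // NullPointerException whose message is the supplied name.
    static String verify(String segmentOutputPath) {
        return Objects.requireNonNull(segmentOutputPath, "segmentOutputPath");
    }

    public static void main(String[] args) {
        try {
            // Simulates a job configuration with no output path set.
            verify(null);
        } catch (NullPointerException e) {
            // Prints: java.lang.NullPointerException: segmentOutputPath
            System.out.println(e);
        }
    }
}
```

The message after the colon is just the name passed to the check, so it identifies which setting was missing rather than describing a fault in Hadoop itself.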

Jonathan Wei

Apr 6, 2018, 4:04:50 PM
to druid...@googlegroups.com
Hi Pravesh,

I think you'll need this patch which was merged after 0.12.0:


Thanks,
Jon

Pravesh Gupta

Apr 7, 2018, 3:27:06 AM
to Druid User
Thanks Jonathan.

Any idea when we can expect the next Druid release?
We depend on this PR.
Also, is there a release-candidate Druid tarball available with the PR merged? If not, how can we get one?

Thanks,
Pravesh Gupta