Hadoop Indexing Not Working with Azure Deep Storage

Pravesh Gupta

Apr 6, 2018, 2:41:25 AM

to Druid User

Hi,

We are migrating our Druid cluster from AWS to Azure. As part of that, we ran our Hadoop indexing job to ingest data into Druid on Azure, with Azure Blob Storage as deep storage.

But it looks like this is not supported in Druid 0.12 either.

Could anyone please confirm this, and tell us what else is restricted when migrating a Druid cluster from AWS to Azure?

Also, why is this not supported? A quick explanation would help.

And what are the ways to make this work? We do want to use the Hadoop indexing job to ingest data.

```
2018-04-05 14:05:02.073+0000 *INFO* CAMP [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask Starting flush of map output
2018-04-05 14:05:02.081+0000 *INFO* CAMP [Thread-78] org.apache.hadoop.mapred.LocalJobRunner map task executor complete.
2018-04-05 14:05:02.082+0000 *WARN* CAMP [Thread-78] org.apache.hadoop.mapred.LocalJobRunner job_local1436102658_0001
java.lang.Exception: java.lang.NullPointerException: segmentOutputPath
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.7.3.jar:?]
Caused by: java.lang.NullPointerException: segmentOutputPath
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229) ~[guava-16.0.1.jar:?]
    at io.druid.indexer.HadoopDruidIndexerConfig.verify(HadoopDruidIndexerConfig.java:589) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.HadoopDruidIndexerConfig.fromConfiguration(HadoopDruidIndexerConfig.java:211) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:51) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:225) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:280) ~[druid-indexing-hadoop-0.12.0.jar:0.12.0]
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_161]
2018-04-05 14:05:02.762+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Job job_local1436102658_0001 running in uber mode : false
2018-04-05 14:05:02.763+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job map 0% reduce 0%
2018-04-05 14:05:02.765+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Job job_local1436102658_0001 failed with state FAILED due to: NA
2018-04-05 14:05:02.773+0000 *INFO* CAMP [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job Counters: 0
2018-04-05 14:05:02.773+0000 *ERROR* CAMP [task-runner-0-priority-0] io.druid.indexer.DetermineHashedPartitionsJob Job failed: job_local1436102658_0001
2018-04-05 14:05:02.774+0000 *INFO* CAMP [task-runner-0-priority-0] io.druid.indexer.JobHelper Deleting path[/tmp/druid-indexing/wikiticker/2018-04-05T140453.615Z_c8d08b4bb74141a2ad94d2956b41defc]
2018-04-05 14:05:02.793+0000 *ERROR* CAMP [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2018-04-05T14:04:53.617Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.0.jar:0.12.0]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:184) ~[druid-indexing-service-0.12.0.jar:0.12.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.0.jar:0.12.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.0.jar:0.12.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.lang.reflect.InvocationTargetException
```

Thanks,

Pravesh Gupta

Jonathan Wei

Apr 6, 2018, 4:04:50 PM

to druid...@googlegroups.com

Hi Pravesh,

I think you'll need this patch which was merged after 0.12.0:

https://github.com/druid-io/druid/pull/5221

Thanks,

Jon
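A note on reading the trace in the first post: the root cause is thrown from a null check inside `HadoopDruidIndexerConfig.verify`, and the exception message is just the name of the value that was null, `segmentOutputPath`. In other words, the mapper received a job config with no segment output path set. The sketch below is not Druid's code. It uses `java.util.Objects.requireNonNull` as a stand-in for Guava's `Preconditions.checkNotNull`, and the class name, method names, and the `/some/output/path` value are made up for illustration. It only shows how a labelled null check yields that exact message.

```java
import java.util.Objects;

public class SegmentOutputPathCheck {

    // Hypothetical stand-in for the validation step seen in the trace:
    // fail fast if the path is missing, using the field name as the message.
    static String verify(String segmentOutputPath) {
        return Objects.requireNonNull(segmentOutputPath, "segmentOutputPath");
    }

    // Runs the check and reports either "ok" or the exception as it would be logged.
    static String describeFailure(String segmentOutputPath) {
        try {
            verify(segmentOutputPath);
            return "ok";
        } catch (NullPointerException e) {
            // Throwable.toString() is "<class name>: <message>"
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // "/some/output/path" is an illustrative value, not a real Druid setting.
        System.out.println(describeFailure("/some/output/path"));
        System.out.println(describeFailure(null));
    }
}
```

With a null argument the second line printed has the same shape as the `Caused by:` line in the log, which is why the trace points at a missing output path rather than at the input data.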

Pravesh Gupta

Apr 7, 2018, 3:27:06 AM

to Druid User

Thanks, Jonathan.

Any idea when we can expect the next Druid release?

We are dependent on this PR.

Also, is there a release-candidate Druid tarball available with this PR merged? If not, how can we get one?

Thanks,

Pravesh Gupta
