Rackspace Cloudfiles extension for deep storage not working

Manish Deora

Jul 15, 2016, 8:38:50 AM
to Druid User
Hello, I am trying to run an ingestion task with deep storage configured as Cloudfiles, and the task is failing with a NullPointerException on segmentOutputPath.

The ingestion task I am running is the default wikiticker-index.json example; I have not made any changes to it.


Any ideas why it's failing? As I understand it, the value of segmentOutputPath should be computed internally based on the deep storage type.


I am using the Imply 1.3.0 package and running Druid in local mode, with deep storage pointed at Cloudfiles.
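
For context, the Cloudfiles deep storage wiring lives in common.runtime.properties. A minimal sketch, assuming the property names documented for the community druid-cloudfiles-extensions module at the time; the region, container, and credential values below are placeholders, not values from this thread:

druid.extensions.loadList=["druid-cloudfiles-extensions"]

# Deep storage settings (placeholder values)
druid.storage.type=cloudfiles
druid.storage.region=ORD
druid.storage.container=druid-segments
druid.storage.basePath=segments

# Rackspace credentials (placeholders)
druid.cloudfiles.userName=<your-rackspace-user>
druid.cloudfiles.apiKey=<your-rackspace-api-key>
druid.cloudfiles.provider=rackspace-cloudfiles-us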


Fangjin Yang

Jul 17, 2016, 1:53:26 PM
to Druid User
Cloudfiles is a community extension and is not supported by Imply or the Druid committers. You'll have the most luck finding the original author and getting support there. If you post the full stack trace of your error, we might be able to help.

Manish Deora

Jul 21, 2016, 9:12:45 AM
to Druid User
Hi, I've attached the failed task log.
failed_task_log.txt

Fangjin Yang

Jul 25, 2016, 9:35:19 PM
to Druid User
Remove segmentOutputPath from your indexing spec.

Manish Deora

Jul 28, 2016, 8:41:49 AM
to Druid User
Hi Fang, 

I am not specifying segmentOutputPath in the indexing spec. I've attached the ingestion task spec for your reference.

Is there anything that needs to be specified in the jobProperties?
wikiticker-index.json
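
For context on the jobProperties question: when the Hadoop indexer writes segments to Cloudfiles, it goes through the hadoop-openstack Swift filesystem (visible in the stack trace later in this thread), which needs its own fs.swift.* settings. A hedged sketch, where the service name "rackspace" and all credential values are assumptions for illustration only:

"tuningConfig" : {
    "type" : "hadoop",
    "jobProperties" : {
        "fs.swift.impl" : "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
        "fs.swift.service.rackspace.auth.url" : "https://identity.api.rackspacecloud.com/v2.0/tokens",
        "fs.swift.service.rackspace.username" : "<your-rackspace-user>",
        "fs.swift.service.rackspace.apikey" : "<your-rackspace-api-key>",
        "fs.swift.service.rackspace.public" : "true"
    }
}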

Benjamin Angelaud

Jul 28, 2016, 8:58:46 AM
to Druid User
Hey Manish,

It seems like your deep storage is not configured.
You don't set segmentOutputPath in the indexing spec yourself; when the task is printed in the logs, segmentOutputPath is null, whereas it should have been replaced by your deep storage path.
Try reconfiguring it and reading the task logs again; segmentOutputPath can't be null.
Hope this helps, let me know.

Ben

Manish Deora

Jul 28, 2016, 9:06:13 AM
to druid...@googlegroups.com
Hi Benjamin,

There is nothing unusual in the log; I've attached the task log for your reference.
The log says the Cloudfiles deep storage is configured; see line 191 of the log.


log.txt

Benjamin Angelaud

Jul 28, 2016, 9:17:32 AM
to Druid User
See this in the log: "2016-07-28T12:37:53,631 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Running with task:"
The submitted task is printed right after it, and in that task definition segmentOutputPath is set to null:

"ioConfig" : {
   
"type" : "hadoop",
   
"inputSpec" : {
       
"type" : "static",
       
"paths" : "quickstart/wikiticker-2016-06-27-sampled.json"
   
},
   
"metadataUpdateSpec" : null,
   
"segmentOutputPath" : null
},

When the deep storage is defined correctly, your segmentOutputPath is not null.
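
For comparison, a correctly resolved task would print a real location there; an illustrative value only (this exact path is made up, and its form depends on the deep storage type):

"ioConfig" : {
    "type" : "hadoop",
    "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2016-06-27-sampled.json"
    },
    "metadataUpdateSpec" : null,
    "segmentOutputPath" : "var/druid/segments/wikiticker"
},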


Manish Deora

Jul 28, 2016, 9:21:36 AM
to Druid User
I understand that, Benjamin, but the logs don't give any clue about what might be going wrong in the deep storage definition.

Benjamin Angelaud

Jul 28, 2016, 9:27:37 AM
to Druid User
You should investigate in that direction; maybe check the overlord logs when it starts?

Manish Deora

Jul 28, 2016, 9:37:22 AM
to druid...@googlegroups.com
No abnormalities in the overlord and coordinator logs. Attached for your reference.

overlord.log
coordinator.log

Fangjin Yang

Jul 30, 2016, 1:17:53 AM
to Druid User
Hi Manish, this is a bug in the Cloudfiles extension; it doesn't actually work with Hadoop indexing.

The problem is here if you want to fix it:

That needs to return an actual valid directory.

I should also add that Cloudfiles is not a module officially supported by the Druid committers.
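
The link above was not preserved by the archive. From the description, the problem sits in the extension's segment pusher, where the output path handed to Hadoop indexing comes back null. A rough Java sketch of the shape of such a fix; the swift:// URI layout, the "rackspace" service name, and the config getters are all assumptions for illustration, not the actual patch:

// Sketch only: in Druid 0.9.x a DataSegmentPusher implements
// getPathForHadoop(dataSource); the Cloudfiles pusher must hand the
// Hadoop indexer a real, resolvable output location instead of null.
@Override
public String getPathForHadoop(String dataSource)
{
    // Assumed layout: a swift:// URI built from the configured container,
    // the hadoop-openstack service name, and the configured base path.
    return String.format(
        "swift://%s.%s/%s",
        config.getContainer(),  // hypothetical getter
        "rackspace",            // assumed hadoop-openstack service name
        config.getBasePath()    // hypothetical getter
    );
}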


Manish Deora

Aug 1, 2016, 6:21:34 AM
to druid...@googlegroups.com
Hi Fang, thanks for confirming the bug. I will see if I can open a pull request for it.

I am aware of that; it's not an officially supported module. Can it become officially supported?




Manish Deora

Aug 1, 2016, 7:06:48 AM
to Druid User

Manish Deora

Aug 2, 2016, 7:21:37 AM
to Druid User
Hi,

I am trying out the Cloudfiles fix and am getting the exception below; any ideas?
I've attached the job spec. Also note that I can see files getting created in the Cloudfiles container (index.zip.0).

2016-08-02T11:12:17,657 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-08-02T11:14:16,498 ERROR [pool-23-thread-1] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.NullPointerException
	at org.apache.hadoop.fs.swift.snative.SwiftNativeOutputStream.flush(SwiftNativeOutputStream.java:102) ~[hadoop-openstack-2.3.0.jar:?]
	at java.io.FilterOutputStream.flush(FilterOutputStream.java:140) ~[?:1.8.0_73]
	at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[?:1.8.0_73]
	at io.druid.indexer.JobHelper$4.push(JobHelper.java:375) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_73]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_73]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_73]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_73]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) [hadoop-common-2.3.0.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
	at com.sun.proxy.$Proxy229.push(Unknown Source) [?:?]
	at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:386) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) [hadoop-mapreduce-client-common-2.3.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
2016-08-02T11:14:29,917 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[00000.smoosh]
2016-08-02T11:14:30,422 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[meta.smoosh]
2016-08-02T11:14:30,431 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[version.bin]
2016-08-02T11:14:35,769 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
job.txt

Fangjin Yang

Aug 15, 2016, 6:42:30 PM
to Druid User
Hi Manish, look at the stack trace. Which variable is null? I'm not sure which version of Druid you are on, but maybe you can include the Druid line the stack trace is complaining about.