Rackspace Cloudfiles extension for deep storage not working

Manish Deora

Jul 15, 2016, 8:38:50 AM
to Druid User
Hello, I am trying to run an ingestion task with deep storage configured as Cloudfiles, and the task is failing with a NullPointerException on segmentOutputPath.

The ingestion task I am running is the default wikiticker-index.json example; I have not made any changes to it.


Any ideas why it's failing? As I understand it, the value of segmentOutputPath should be computed internally based on the deep storage type.


I am using the Imply 1.3.0 package and running Druid in local mode, with deep storage pointed at Cloudfiles.
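
For context, the Cloudfiles deep storage wiring lives in common.runtime.properties. A minimal sketch, assuming the property names documented for the community druid-cloudfiles-extensions module at the time; the region, container, and credential values below are placeholders, not values from this thread:

druid.extensions.loadList=["druid-cloudfiles-extensions"]

# Deep storage settings (placeholder values)
druid.storage.type=cloudfiles
druid.storage.region=ORD
druid.storage.container=druid-segments
druid.storage.basePath=segments

# Rackspace credentials (placeholders)
druid.cloudfiles.userName=<your-rackspace-user>
druid.cloudfiles.apiKey=<your-rackspace-api-key>
druid.cloudfiles.provider=rackspace-cloudfiles-us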


Fangjin Yang

Jul 17, 2016, 1:53:26 PM
to Druid User
Cloudfiles is a community extension and is not supported by Imply or the Druid committers. You'll have the most luck finding the original author and getting support there. If you post the full stack trace of your error, we might be able to help.

Manish Deora

Jul 21, 2016, 9:12:45 AM
to Druid User
Hi, I've attached the failed task log.
failed_task_log.txt

Fangjin Yang

Jul 25, 2016, 9:35:19 PM
to Druid User
Remove segmentOutputPath from your indexing spec.

Manish Deora

Jul 28, 2016, 8:41:49 AM
to Druid User
Hi Fang, 

I am not specifying segmentOutputPath in the indexing spec. I've attached the ingestion task spec for your reference.

Is there anything that needs to be specified in the jobProperties?
wikiticker-index.json
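
For context on the jobProperties question: when the Hadoop indexer writes segments to Cloudfiles, it goes through the hadoop-openstack Swift filesystem (visible in the stack trace later in this thread), which needs its own fs.swift.* settings. A hedged sketch, where the service name "rackspace" and all credential values are assumptions for illustration only:

"tuningConfig" : {
    "type" : "hadoop",
    "jobProperties" : {
        "fs.swift.impl" : "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
        "fs.swift.service.rackspace.auth.url" : "https://identity.api.rackspacecloud.com/v2.0/tokens",
        "fs.swift.service.rackspace.username" : "<your-rackspace-user>",
        "fs.swift.service.rackspace.apikey" : "<your-rackspace-api-key>",
        "fs.swift.service.rackspace.public" : "true"
    }
}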

Benjamin Angelaud

Jul 28, 2016, 8:58:46 AM
to Druid User
Hey Manish,

It seems like your deep storage is not configured.
You don't set segmentOutputPath in the indexing spec yourself; when the task is printed in the logs, segmentOutputPath is null, whereas it should have been replaced by your deep storage path.
Try reconfiguring it and reading the task logs again; segmentOutputPath can't be null.
Hope this helps, let me know.

Ben

Manish Deora

Jul 28, 2016, 9:06:13 AM
to druid...@googlegroups.com
Hi Benjamin,

There is nothing unusual in the log; I've attached the task log for your reference.
The log says the Cloudfiles deep storage is configured; see line 191 of the log.


log.txt

Benjamin Angelaud

Jul 28, 2016, 9:17:32 AM
to Druid User
See this in the log: "2016-07-28T12:37:53,631 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Running with task:"
The submitted task is printed right after it, and in that task definition segmentOutputPath is set to null:

"ioConfig" : {
   
"type" : "hadoop",
   
"inputSpec" : {
       
"type" : "static",
       
"paths" : "quickstart/wikiticker-2016-06-27-sampled.json"
   
},
   
"metadataUpdateSpec" : null,
   
"segmentOutputPath" : null
},

When the deep storage is defined correctly, your segmentOutputPath is not null.
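
For comparison, a correctly resolved task would print a real location there; an illustrative value only (this exact path is made up, and its form depends on the deep storage type):

"ioConfig" : {
    "type" : "hadoop",
    "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2016-06-27-sampled.json"
    },
    "metadataUpdateSpec" : null,
    "segmentOutputPath" : "var/druid/segments/wikiticker"
},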


Manish Deora

Jul 28, 2016, 9:21:36 AM
to Druid User
I understand that, Benjamin, but the logs don't give any clue about what might be going wrong in the deep storage definition.

Benjamin Angelaud

Jul 28, 2016, 9:27:37 AM
to Druid User
You should investigate in that direction; maybe check the overlord logs when it starts?

Manish Deora

Jul 28, 2016, 9:37:22 AM
to druid...@googlegroups.com
No abnormalities in the overlord and coordinator logs. Attached for your reference.

overlord.log
coordinator.log

Fangjin Yang

Jul 30, 2016, 1:17:53 AM
to Druid User
Hi Manish, this is a bug in the Cloudfiles extension; it doesn't actually work with Hadoop indexing.

The problem is here if you want to fix it:

That needs to return an actual valid directory.

I should also add that Cloudfiles is not a module officially supported by the Druid committers.
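
The link above was not preserved by the archive. From the description, the problem sits in the extension's segment pusher, where the output path handed to Hadoop indexing comes back null. A rough Java sketch of the shape of such a fix; the swift:// URI layout, the "rackspace" service name, and the config getters are all assumptions for illustration, not the actual patch:

// Sketch only: in Druid 0.9.x a DataSegmentPusher implements
// getPathForHadoop(dataSource); the Cloudfiles pusher must hand the
// Hadoop indexer a real, resolvable output location instead of null.
@Override
public String getPathForHadoop(String dataSource)
{
    // Assumed layout: a swift:// URI built from the configured container,
    // the hadoop-openstack service name, and the configured base path.
    return String.format(
        "swift://%s.%s/%s",
        config.getContainer(),  // hypothetical getter
        "rackspace",            // assumed hadoop-openstack service name
        config.getBasePath()    // hypothetical getter
    );
}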


Manish Deora

Aug 1, 2016, 6:21:34 AM
to druid...@googlegroups.com
Hi Fang, thanks for confirming the bug. I will see if I can open a pull request for it.

I am aware of that; it's not an officially supported module. Can it become officially supported?




Manish Deora

Aug 1, 2016, 7:06:48 AM
to Druid User

Manish Deora

Aug 2, 2016, 7:21:37 AM
to Druid User
Hi,

I am trying out the Cloudfiles fix and am getting the exception below; any ideas?
I've attached the job spec. Also note that I can see files getting created in the Cloudfiles container (index.zip.0).

2016-08-02T11:12:17,657 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-08-02T11:14:16,498 ERROR [pool-23-thread-1] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.NullPointerException
	at org.apache.hadoop.fs.swift.snative.SwiftNativeOutputStream.flush(SwiftNativeOutputStream.java:102) ~[hadoop-openstack-2.3.0.jar:?]
	at java.io.FilterOutputStream.flush(FilterOutputStream.java:140) ~[?:1.8.0_73]
	at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[?:1.8.0_73]
	at io.druid.indexer.JobHelper$4.push(JobHelper.java:375) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_73]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_73]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_73]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_73]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) [hadoop-common-2.3.0.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
	at com.sun.proxy.$Proxy229.push(Unknown Source) [?:?]
	at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:386) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) [hadoop-mapreduce-client-common-2.3.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
2016-08-02T11:14:29,917 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[00000.smoosh]
2016-08-02T11:14:30,422 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[meta.smoosh]
2016-08-02T11:14:30,431 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[version.bin]
2016-08-02T11:14:35,769 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
job.txt

Fangjin Yang

Aug 15, 2016, 6:42:30 PM
to Druid User
Hi Manish, look at the stack trace. Which variable is null? I'm not sure which version of Druid you are on, but maybe you can include the Druid line the stack trace is complaining about.