Druid Setup with HDFS


tarun gulyani

Jul 24, 2014, 4:59:35 AM
to druid-de...@googlegroups.com
Hi,

Please help me out with how to set up Druid with HDFS, so that data is stored at an HDFS path instead of /tmp/druid/localStorage.

One more thing: how can I change the local storage path as well, e.g. store at some other path like /tmp/druid/localStorageNew instead of /tmp/druid/localStorage?

I have tried putting these entries in the Broker, Realtime, and Historical configs, but it doesn't work.

druid.pusher.local=true
druid.pusher.local.storageDirectory=/tmp/druid/localStorage1
druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}]

I have tried different ways of writing the path, such as druid.pusher.local.storageDirectory=/tmp/druid/localStorage1 and druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}], but nothing changes; all data is still stored at /tmp/druid/localStorage.


Nishant Bangarwa

Jul 25, 2014, 1:32:49 AM
to druid-de...@googlegroups.com
See Inline


On Thu, Jul 24, 2014 at 2:29 PM, tarun gulyani <tarung...@gmail.com> wrote:
Hi,

Please help me out with how to set up Druid with HDFS, so that data is stored at an HDFS path instead of /tmp/druid/localStorage.
You can set your deep storage to HDFS by setting these properties in runtime.properties for realtime:

druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:<druid_version>"]
druid.storage.type=hdfs
druid.storage.storageDirectory=<directory for storing segments>
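Once the extension is loaded and segments hand off, you can sanity-check that they landed in HDFS, for example (the path here is just an example; use whatever you set storageDirectory to):

```shell
# List segments under the configured deep storage directory (example path)
hadoop fs -ls hdfs://namenode:9000/druid/segments
```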
  

One more thing: how can I change the local storage path as well, e.g. store at some other path like /tmp/druid/localStorageNew instead of /tmp/druid/localStorage?

I have tried putting these entries in the Broker, Realtime, and Historical configs, but it doesn't work.

druid.pusher.local=true
druid.pusher.local.storageDirectory=/tmp/druid/localStorage1
druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}]
the correct properties to set for the different deep storages are documented here -

I have tried different ways of writing the path, such as druid.pusher.local.storageDirectory=/tmp/druid/localStorage1 and druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}], but nothing changes; all data is still stored at /tmp/druid/localStorage.


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/dee4ac79-0cde-48dd-8dc1-7b4150db0047%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




tarun gulyani

Jul 25, 2014, 2:06:26 AM
to druid-de...@googlegroups.com

Thanks Nishant for replying,

1) To change the storage path, I found the way: we have to add these entries in "config/overlord/runtime.properties":
    druid.storage.type=local
    druid.storage.storageDirectory=/tmp/druid/localStorage1
This is working properly.

2) Regarding HDFS, I tried the same way in "config/overlord/runtime.properties":
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:54310/home/tarun/hadoop-workDir/druid

But it doesn't work. The indexer prints the path for the task as "hdfs:/localhost:54310/home/tarun/hadoop-workDir/druid"; it seems to treat "/" as an escape sequence.

If I add the entry druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"] to the runtime properties, the indexer hangs. This parameter doesn't work either.

Command for indexer :
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/tarun/hadoop/conf:config/overlord io.druid.cli.Main server overlord

Task Command :
curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_hadoop_task.json localhost:8087/druid/indexer/v1/task

Nishant Bangarwa

Jul 25, 2014, 4:21:02 AM
to druid-de...@googlegroups.com
When you say the indexer hangs, I wonder if it was downloading the extension and loading it at startup?
The indexer logs will have more info on this.
To use HDFS as deep storage you will need to add druid-hdfs-storage as an extension.




tarun gulyani

Jul 25, 2014, 6:44:16 AM
to druid-de...@googlegroups.com
Hi Nishant,

I am able to run using HDFS now, and segments are being stored at HDFS. But I am only able to run the task "wikipedia_hadoop_config.json", not "wikipedia_index_hadoop_task.json".

wikipedia_hadoop_config.json gives the correct path on the indexer console and also stores to HDFS properly. But when I run the task "wikipedia_index_hadoop_task.json", it fails. The log shows the exception below:
java.lang.Exception: java.lang.IllegalArgumentException: Pathname /druid/wikipedia/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2014-07-25T09:50:22.991Z/0 from hdfs://localhost:9000/druid/wikipedia/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2014-07-25T09:50:22.991Z/0 is not a valid DFS filename.

tarun gulyani

Jul 25, 2014, 7:26:46 AM
to druid-de...@googlegroups.com
One more thing, Nishant:
    The indexer run for the hadoop task "wikipedia_index_hadoop_task.json" is this way: "java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhadoop.mapred.job.queue.name=HI -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:config/overlord io.druid.cli.Main server overlord"
Or do I need to run the indexer a different way?

Nishant Bangarwa

Jul 25, 2014, 8:18:13 AM
to druid-de...@googlegroups.com
Hi,
To run the indexer you need to add the hadoop jars and config to the classpath.
Also, if you are running a version of hadoop other than the default (2.3.0), you will need to specify hadoopCoordinates for your hadoop version.
Which version of hadoop are you running with, and did you add the hadoop config files to the classpath?
Can you share the full stack trace of the exception?
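If you do need to pin the hadoop version, hadoopCoordinates goes in the hadoop index task spec itself; a fragment like this (the coordinate string below assumes hadoop 2.4.0, and the rest of the task spec is omitted):

```json
{
  "type": "index_hadoop",
  "hadoopCoordinates": "org.apache.hadoop:hadoop-client:2.4.0"
}
```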




tarun gulyani

Jul 25, 2014, 1:30:01 PM
to druid-de...@googlegroups.com
Hi Nishant,

I am using apache hadoop 2.4 and have added the Hadoop jars to the classpath. The command for the indexer run is:
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8  -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:config/overlord io.druid.cli.Main server overlord

I have copied all the jars of Hadoop 2.4 into this hadoopJars folder and included this path so the Druid indexer can find any hadoop jar it needs.

This task, "curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_task.json localhost:8087/druid/indexer/v1/task", works perfectly and stores segments at the hdfs path mentioned in "config/overlord/runtime.properties".

But this task, "curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_hadoop_task.json localhost:8087/druid/indexer/v1/task", fails. I have attached the complete stack trace as well.
index_hadoop_wikipedia_2014-07-25T11:15:54.304Z.log

Nishant Bangarwa

Jul 28, 2014, 9:59:39 AM
to druid-de...@googlegroups.com
Hi Tarun, 

Druid checks the default file system when replacing ":" with "_" to make a valid DFS file path.
What is the value of fs.defaultFS in your hadoop config files?
Can you try pointing it to the HDFS filesystem, if it's not already?
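One quick way to check what default filesystem the client actually resolves (assuming the hadoop CLI picks up the same config files):

```shell
# Print the default filesystem from the client's hadoop configuration
hdfs getconf -confKey fs.defaultFS
```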




tarun gulyani

Jul 28, 2014, 1:38:03 PM
to druid-de...@googlegroups.com
Hi Nishant,

I have added that entry too. Previously there was only an entry for "fs.default.name"; now I have added the "fs.defaultFS" entry as well. Still getting the same error.

Configuration :
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

tarun gulyani

Jul 29, 2014, 11:35:40 AM
to druid-de...@googlegroups.com

Hi Nishant,

By adding the hadoop configuration folder to the classpath and including "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid" for the hadoop working directory, I was able to resolve most of the errors, and now intermediate files are being generated in HDFS.

But the final task segment is not being saved to HDFS and the task status is FAILED. The exception from the log is:

2014-07-29 09:08:47,249 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Running job: job_1406612167702_0009
2014-07-29 09:09:10,352 INFO [task-runner-0] org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2014-07-29 09:09:10,365 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikipedia_2014-07-29T09:08:36.566Z, type=index_hadoop, dataSource=wikipedia}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:206)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:219)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:198)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: Job status not available
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.DeterminePartitionsJob.run(DeterminePartitionsJob.java:246)
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
        at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:86)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:303)
        ... 11 more
Caused by: java.io.IOException: Job status not available
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599)
        at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306)
        at io.druid.indexer.DeterminePartitionsJob.run(DeterminePartitionsJob.java:151)
        ... 14 more

Deepak Jain

Jul 29, 2014, 11:06:09 PM
to druid-de...@googlegroups.com

tarun gulyani

Jul 31, 2014, 3:26:32 PM
to druid-de...@googlegroups.com
Hi Deepak,

Thanks for the reply. It didn't help. I have run the history server again; still facing the same issue with the index_hadoop task:
...

Gian Merlino

Jul 31, 2014, 10:58:47 PM
to druid-de...@googlegroups.com
What sort of storage properties have you set? From what you've said so far it sounds like you should at least have:

    druid.storage.type=hdfs
    druid.storage.storageDirectory=hdfs://localhost:9000/druid_segments_go_here/

If you don't already have something like that (especially the hdfs://localhost:9000/ part) then please try again after setting those.

Otherwise, you said intermediate files are generated on hdfs but the final segment is not. If you go into the hadoop web ui, do you see any of the failed jobs? Can you see which jobs are failing, and whether it's mapper or reducer tasks that fail? Can you pull logs for one of the failed tasks? There should be some exceptions in there that will help figure out what is going on.

Btw- the "Job status not available" error is not likely the root cause of your problem, although it indicates that your job history server is having issues. Getting that working right would help a lot with debugging. Usually it's enough to have the job history daemon running, to have the properties mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address set in mapred-site.xml, and to have yarn.log.server.url set in yarn-site.xml.
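For reference, that usually looks something like the following (hostnames and ports here are the common defaults; adjust them for your setup):

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>localhost:19888</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.log.server.url</name>
  <value>http://localhost:19888/jobhistory/logs</value>
</property>
```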

...

Deepak Jain

Jul 31, 2014, 11:24:40 PM
to druid-de...@googlegroups.com
Verify these settings (a working setup).

Overlord node
echo "druid.host=`hostname -f`" >> config/overlord/runtime.properties

echo "druid.port=8087" >> config/overlord/runtime.properties
echo "druid.service=overlord" >> config/overlord/runtime.properties

echo "druid.zk.service.host=druid-zookeeper-251252.slc01.dev.ebayc3.com" >> config/overlord/runtime.properties

echo "druid.db.connector.connectURI=jdbc:mysql://druid-mysql-255225.slc01.dev.ebayc3.com:3306/druid" >> config/overlord/runtime.properties
echo "druid.db.connector.user=druid" >> config/overlord/runtime.properties
echo "druid.db.connector.password=diurd" >> config/overlord/runtime.properties

echo "druid.selectors.indexing.serviceName=overlord" >> config/overlord/runtime.properties
echo "druid.indexer.queue.startDelay=PT0M" >> config/overlord/runtime.properties
echo "druid.indexer.runner.javaOpts=\"-server -Xmx2g\"" >> config/overlord/runtime.properties
echo "druid.indexer.runner.startPort=8089" >> config/overlord/runtime.properties
echo "druid.indexer.fork.property.druid.computation.buffer.size=268435456" >> config/overlord/runtime.properties
echo "druid.indexer.fork.property.druid.processing.numThreads=1" >> config/overlord/runtime.properties

echo "druid.extensions.coordinates=[\"io.druid.extensions:druid-hdfs-storage:0.6.99\"]" >> config/overlord/runtime.properties
echo "druid.storage.type=hdfs" >> config/overlord/runtime.properties
echo "druid.storage.storageDirectory=hdfs://namenode-284133.slc01.dev.com:8020/tmp/trackingstorage" >> config/overlord/runtime.properties

Run
------
export DRUID_HOME=/home/hdfs/druid-services-0.6.109-SNAPSHOT
java -Xmx12g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath $DRUID_HOME/lib/*:/etc/hadoop/conf/:$DRUID_HOME/config/overlord io.druid.cli.Main server overlord


Historical
echo "druid.host=`hostname -f`" >> config/historical/runtime.properties
echo "druid.port=8081" >> config/historical/runtime.properties
echo "druid.service=historical" >> config/historical/runtime.properties

echo "druid.zk.service.host=druid-zookeeper-251252.slc01.dev.ebayc3.com" >> config/historical/runtime.properties

echo "druid.db.connector.connectURI=jdbc:mysql://druid-mysql-255225.slc01.dev.ebayc3.com:3306/druid" >> config/historical/runtime.properties
echo "druid.db.connector.user=druid" >> config/historical/runtime.properties
echo "druid.db.connector.password=diurd" >> config/historical/runtime.properties

echo "druid.extensions.coordinates=[\"io.druid.extensions:druid-hdfs-storage:0.6.99\"]" >> config/historical/runtime.properties
echo "druid.storage.type=hdfs" >> config/historical/runtime.properties
echo "druid.storage.storageDirectory=hdfs://namenode-284133.slc01.dev.com:8020/tmp/trackingstorage" >> config/historical/runtime.properties

echo "druid.server.maxSize=11000000000" >> config/historical/runtime.properties
echo "druid.segmentCache.locations=[{\"path\": \"/tmp/druid/indexCache\", \"maxSize\"\: 11000000000}]" >> config/historical/runtime.properties

echo "druid.monitoring.monitors=[\"io.druid.server.metrics.ServerMonitor\", \"com.metamx.metrics.SysMonitor\",\"com.metamx.metrics.JvmMonitor\"]" >> config/historical/runtime.properties

echo "# Change these to make Druid faster" >> config/historical/runtime.properties
echo "druid.processing.buffer.sizeBytes=512000000" >> config/historical/runtime.properties
echo "druid.processing.numThreads=7" >> config/historical/runtime.properties
echo "druid.query.groupBy.maxResults=1000000" >> config/historical/runtime.properties

Run
------
java -Xmx6g -Xms6g -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath $DRUID_HOME/lib/*:$DRUID_HOME/config/historical:/etc/hadoop/conf:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.4.0.2.1.1.0-385.jar:/usr/lib/hadoop/hadoop-common-2.4.0.2.1.1.0-385.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-mapreduce/hadoop-auth.jar io.druid.cli.Main server historical

1) The storage settings in runtime.properties (the druid-hdfs-storage extension plus druid.storage.type and druid.storage.storageDirectory) make sure segments are stored in HDFS by the indexing service and read from HDFS by historical nodes.
2) Make sure you include the hadoop conf directory in the classpath as shown above, and for the historical node include all the jars mentioned above in the classpath (or run "hadoop classpath" and include those jars).
3) What kind of hadoop cluster is yours? Is it single node or multi node? Who set up the cluster for you, or did you use Ambari?
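For example, the historical node's classpath could be built from the output of "hadoop classpath" instead of listing jars by hand (DRUID_HOME as above):

```shell
# Build the classpath from the local hadoop install rather than listing jars
HADOOP_CP=$(hadoop classpath)
java -Xmx6g -Xms6g -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath "$DRUID_HOME/lib/*:$DRUID_HOME/config/historical:$HADOOP_CP" \
  io.druid.cli.Main server historical
```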

Regards,
Deepak 

For indexing you must use the indexing REST service instead of a standalone java program.

Nishant
Software Engineer | METAMARKETS

tarun gulyani

Aug 1, 2014, 1:56:25 PM
to druid-de...@googlegroups.com
Hi Gian,

Thanks for replying. I have already set those storage properties in the indexer's runtime.properties. Below are the details of "config/overlord/runtime.properties":
############################################config/overlord/runtime.properties######################################
druid.host=localhost
druid.port=8087
druid.service=overlord

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.121"]
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"]

druid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.db.connector.user=root
druid.db.connector.password=root

druid.selectors.indexing.serviceName=overlord
druid.indexer.queue.startDelay=PT0M
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
#druid.storage.type=local
#druid.storage.storageDirectory=/tmp/druid/localStorage1

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid

druid.pusher.hdfs=true
druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
druid.indexer.fork.property.druid.indexer.task.baseTaskDir=hdfs://localhost:9000/tmp/persistent
druid.indexer.fork.property.druid.indexer.task.baseDir=hdfs://localhost:9000/tmp
druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid

#############################################################################################################

Regarding the job failure: the map job is failing; the log of the failed job is:
/job_1406914384450_0001/job_1406914384450_0001_1.jhist to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist_tmp
2014-08-01 23:10:26,539 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist_tmp
2014-08-01 23:10:26,542 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying file:/tmp/hadoop-yarn/staging/tarun/.staging/job_1406914384450_0001/job_1406914384450_0001_1_conf.xml to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml_tmp
2014-08-01 23:10:26,558 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml_tmp
2014-08-01 23:10:26,561 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001.summary_tmp to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001.summary
2014-08-01 23:10:26,561 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml_tmp to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml
2014-08-01 23:10:26,562 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist_tmp to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist
2014-08-01 23:10:26,584 INFO [Thread-58] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2014-08-01 23:10:26,588 INFO [Thread-58] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to Task failed task_1406914384450_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
##################################################################################################################
Regarding the intermediate directory: whenever I submit the index_hadoop task, this blank directory is created at the HDFS path:
hdfs://localhost:9000/druid/wikipedia/2014-08-01T173931.647Z/groupedData

I doubt the HDFS path is wrong; if it were, Druid should not be able to create the directory at HDFS at all.

 

tarun gulyani

Aug 1, 2014, 2:18:00 PM
to druid-de...@googlegroups.com
Hi Deepak,

Thanks for replying. Most of the properties you mentioned I had already added in runtime.properties, except the HDFS storage properties on the historical node. Previously I was adding them only in "config/overlord/runtime.properties"; now I have added them in "config/historical/runtime.properties" as well. Even then the index_hadoop task fails.

Please go through the properties and commands mentioned below. If I am doing anything wrong, let me know.

1)  ###################################"config/overlord/runtime.properties"#################################################
druid.host=localhost
druid.port=8087
druid.service=overlord

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.121"]
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"]

druid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.db.connector.user=root
druid.db.connector.password=root

druid.selectors.indexing.serviceName=overlord
druid.indexer.queue.startDelay=PT0M
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
#druid.storage.type=local
#druid.storage.storageDirectory=/tmp/druid/localStorage1

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid

druid.pusher.hdfs=true
druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
druid.indexer.fork.property.druid.indexer.task.baseTaskDir=hdfs://localhost:9000/tmp/persistent
druid.indexer.fork.property.druid.indexer.task.baseDir=hdfs://localhost:9000/tmp
druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
###########################################################################################################

Command to run Indexer : 
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:/home/tarun/hadoop-yarn/etc/hadoop:config/overlord io.druid.cli.Main server overlord

2)  ###################################"config/historical/runtime.properties"#################################################

druid.host=localhost
druid.service=historical
druid.port=8091

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.121"]
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"]
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ

druid.server.maxSize=10000000000

# Change these to make Druid faster
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1

druid.segmentCache.locations=[{"path": "/tmp/druid/indexCacheNew", "maxSize"\: 10000000000}]
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid

druid.pusher.hdfs=true
druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
druid.indexer.fork.property.druid.indexer.task.baseTaskDir=hdfs://localhost:9000/tmp/persistent
druid.indexer.fork.property.druid.indexer.task.baseDir=hdfs://localhost:9000/tmp
##########################################################################################################################

Command to run historical : 
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:/home/tarun/hadoop-yarn/etc/hadoop:config/historical io.druid.cli.Main server historical
 
3) Procedure to call the index_hadoop task : 
curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_hadoop_task.json localhost:8087/druid/indexer/v1/task


############################################################################################################################

1)  When we set up apache hadoop-2.4, all the conf files are present in the etc/hadoop folder; there is no conf folder. Therefore in all the commands above I have used the etc/hadoop folder for configuration. But in your command the conf folder path is "/etc/hadoop/conf/". Do you separately create a conf folder and put all the configuration files there?

2)  I am using apache hadoop 2.4 in "pseudo distributed mode" on my laptop. The setup was done by myself, not Ambari. I am using a bunch of Big Data and Machine Learning tools, and all of them work perfectly in this setup. No idea what is going wrong for Druid.

I am looking at the Druid setup again to see what is causing the hadoop task to fail. Please let me know if you find anything wrong on my side in the Druid setup I have described above.

Gian Merlino

Aug 1, 2014, 5:51:21 PM
to druid-de...@googlegroups.com
That exception looks like the exception from the hadoop client. The actual task node should have a more interesting exception, which you should be able to find by clicking through the hadoop web ui. It's interesting that a map failed rather than a reduce: maybe the job is having trouble reading your input data, possibly because it's expecting a different format. The exception from the mapper would help a lot.