Druid Setup with HDFS


tarun gulyani

Jul 24, 2014, 4:59:35 AM
to druid-de...@googlegroups.com
Hi,

Please help me out with how to set up Druid with HDFS, so that data is stored at an HDFS path instead of /tmp/druid/localStorage.

One more thing: how can I change the local storage path as well, e.g. store at some other path like /tmp/druid/localStorageNew instead of /tmp/druid/localStorage?

I have tried putting these entries in the Broker, Realtime, and Historical configs, but it doesn't work.

druid.pusher.local=true
druid.pusher.local.storageDirectory=/tmp/druid/localStorage1
druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}]

I have tried different ways of writing the path, such as druid.pusher.local.storageDirectory=/tmp/druid/localStorage1 and druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}], but nothing changes; all data is still stored at /tmp/druid/localStorage.


Nishant Bangarwa

Jul 25, 2014, 1:32:49 AM
to druid-de...@googlegroups.com
See Inline


On Thu, Jul 24, 2014 at 2:29 PM, tarun gulyani <tarung...@gmail.com> wrote:
Hi,

Please help me out with how to set up Druid with HDFS, so that data is stored at an HDFS path instead of /tmp/druid/localStorage.
You can set your deep storage to HDFS by setting these properties in runtime.properties for realtime:

druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:<druid_version>"]
druid.storage.type=hdfs
druid.storage.storageDirectory=<directory for storing segments>
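Once the extension is loaded and segments hand off, you can sanity-check that they landed in HDFS, for example (the path here is just an example; use whatever you set storageDirectory to):

```shell
# List segments under the configured deep storage directory (example path)
hadoop fs -ls hdfs://namenode:9000/druid/segments
```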
  

One more thing: how can I change the local storage path as well, e.g. store at some other path like /tmp/druid/localStorageNew instead of /tmp/druid/localStorage?

I have tried putting these entries in the Broker, Realtime, and Historical configs, but it doesn't work.

druid.pusher.local=true
druid.pusher.local.storageDirectory=/tmp/druid/localStorage1
druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}]
the correct properties to set for the different deep storages are documented here -

I have tried different ways of writing the path, such as druid.pusher.local.storageDirectory=/tmp/druid/localStorage1 and druid.storage.local.storageDirectory=[{"path": "/tmp/druid/indexCache1", "maxSize"\: 10000000000}], but nothing changes; all data is still stored at /tmp/druid/localStorage.


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/dee4ac79-0cde-48dd-8dc1-7b4150db0047%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




tarun gulyani

Jul 25, 2014, 2:06:26 AM
to druid-de...@googlegroups.com

Thanks Nishant for replying,

1) To change the storage path, I found the way: we have to add these entries in "config/overlord/runtime.properties":
    druid.storage.type=local
    druid.storage.storageDirectory=/tmp/druid/localStorage1
This is working properly.

2) Regarding HDFS, I tried the same way in "config/overlord/runtime.properties":
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:54310/home/tarun/hadoop-workDir/druid

But it doesn't work. The indexer prints the path for the task as "hdfs:/localhost:54310/home/tarun/hadoop-workDir/druid"; it seems to treat "/" as an escape sequence.

If I add the entry druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"] to the runtime properties, the indexer hangs. This parameter doesn't work either.

Command for indexer :
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/tarun/hadoop/conf:config/overlord io.druid.cli.Main server overlord

Task Command :
curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_hadoop_task.json localhost:8087/druid/indexer/v1/task

Nishant Bangarwa

Jul 25, 2014, 4:21:02 AM
to druid-de...@googlegroups.com
When you say the indexer hangs, I wonder if it was downloading the extension and loading it at startup?
The indexer logs will have more info on this.
To use HDFS as deep storage you will need to add druid-hdfs-storage as an extension.




tarun gulyani

Jul 25, 2014, 6:44:16 AM
to druid-de...@googlegroups.com
Hi Nishant,

I am able to run using HDFS now, and segments are being stored at HDFS. But I am only able to run the task "wikipedia_hadoop_config.json", not "wikipedia_index_hadoop_task.json".

wikipedia_hadoop_config.json gives the correct path on the indexer console and also stores to HDFS properly. But when I run the task "wikipedia_index_hadoop_task.json", it fails. The log shows the exception below:
java.lang.Exception: java.lang.IllegalArgumentException: Pathname /druid/wikipedia/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2014-07-25T09:50:22.991Z/0 from hdfs://localhost:9000/druid/wikipedia/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2014-07-25T09:50:22.991Z/0 is not a valid DFS filename.

tarun gulyani

Jul 25, 2014, 7:26:46 AM
to druid-de...@googlegroups.com
One more thing, Nishant:
    The indexer run for the hadoop task "wikipedia_index_hadoop_task.json" is this way: "java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhadoop.mapred.job.queue.name=HI -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:config/overlord io.druid.cli.Main server overlord"
Or do I need to run the indexer a different way?

Nishant Bangarwa

Jul 25, 2014, 8:18:13 AM
to druid-de...@googlegroups.com
Hi,
To run the indexer you need to add the hadoop jars and config to the classpath.
Also, if you are running a version of hadoop other than the default (2.3.0), you will need to specify hadoopCoordinates for your hadoop version.
Which version of hadoop are you running with, and did you add the hadoop config files to the classpath?
Can you share the full stack trace of the exception?
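If you do need to pin the hadoop version, hadoopCoordinates goes in the hadoop index task spec itself; a fragment like this (the coordinate string below assumes hadoop 2.4.0, and the rest of the task spec is omitted):

```json
{
  "type": "index_hadoop",
  "hadoopCoordinates": "org.apache.hadoop:hadoop-client:2.4.0"
}
```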




tarun gulyani

Jul 25, 2014, 1:30:01 PM
to druid-de...@googlegroups.com
Hi Nishant,

I am using apache hadoop 2.4 and have added the Hadoop jars to the classpath. The command for the indexer run is:
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8  -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:config/overlord io.druid.cli.Main server overlord

I have copied all the jars of Hadoop 2.4 into this hadoopJars folder and included this path so the Druid indexer can find any hadoop jar it needs.

This task, "curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_task.json localhost:8087/druid/indexer/v1/task", works perfectly and stores segments at the hdfs path mentioned in "config/overlord/runtime.properties".

But this task, "curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_hadoop_task.json localhost:8087/druid/indexer/v1/task", fails. I have attached the complete stack trace as well.
index_hadoop_wikipedia_2014-07-25T11:15:54.304Z.log

Nishant Bangarwa

Jul 28, 2014, 9:59:39 AM
to druid-de...@googlegroups.com
Hi Tarun, 

Druid checks the default file system when replacing ":" with "_" to make a valid DFS file path.
What is the value of fs.defaultFS in your hadoop config files?
Can you try pointing it to the HDFS filesystem, if it's not already?
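One quick way to check what default filesystem the client actually resolves (assuming the hadoop CLI picks up the same config files):

```shell
# Print the default filesystem from the client's hadoop configuration
hdfs getconf -confKey fs.defaultFS
```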




tarun gulyani

Jul 28, 2014, 1:38:03 PM
to druid-de...@googlegroups.com
Hi Nishant,

I have added that entry too. Previously there was only an entry for "fs.default.name"; now I have added the "fs.defaultFS" entry as well. Still getting the same error.

Configuration :
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

tarun gulyani

Jul 29, 2014, 11:35:40 AM
to druid-de...@googlegroups.com

Hi Nishant,

By adding the hadoop configuration folder to the classpath and including "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid" for the hadoop working directory, I was able to resolve most of the errors, and now intermediate files are being generated in HDFS.

But the final task segment is not being saved to HDFS and the task status is FAILED. The exception from the log is:

2014-07-29 09:08:47,249 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Running job: job_1406612167702_0009
2014-07-29 09:09:10,352 INFO [task-runner-0] org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2014-07-29 09:09:10,365 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikipedia_2014-07-29T09:08:36.566Z, type=index_hadoop, dataSource=wikipedia}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:206)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:219)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:198)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: Job status not available
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.DeterminePartitionsJob.run(DeterminePartitionsJob.java:246)
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:135)
        at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:86)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:303)
        ... 11 more
Caused by: java.io.IOException: Job status not available
        at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
        at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599)
        at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306)
        at io.druid.indexer.DeterminePartitionsJob.run(DeterminePartitionsJob.java:151)
        ... 14 more

Deepak Jain

Jul 29, 2014, 11:06:09 PM
to druid-de...@googlegroups.com

tarun gulyani

Jul 31, 2014, 3:26:32 PM
to druid-de...@googlegroups.com
Hi Deepak,

Thanks for the reply. It didn't help. I have run the history server again; still facing the same issue with the index_hadoop task:
...

Gian Merlino

Jul 31, 2014, 10:58:47 PM
to druid-de...@googlegroups.com
What sort of storage properties have you set? From what you've said so far it sounds like you should at least have:

    druid.storage.type=hdfs
    druid.storage.storageDirectory=hdfs://localhost:9000/druid_segments_go_here/

If you don't already have something like that (especially the hdfs://localhost:9000/ part) then please try again after setting those.

Otherwise, you said intermediate files are generated on hdfs but the final segment is not. If you go into the hadoop web ui, do you see any of the failed jobs? Can you see which jobs are failing, and whether it's mapper or reducer tasks that fail? Can you pull logs for one of the failed tasks? There should be some exceptions in there that will help figure out what is going on.

Btw- the "Job status not available" error is not likely the root cause of your problem, although it indicates that your job history server is having issues. Getting that working right would help a lot with debugging. Usually it's enough to have the job history daemon running, to have the properties mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address set in mapred-site.xml, and to have yarn.log.server.url set in yarn-site.xml.
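For reference, that usually looks something like the following (hostnames and ports here are the common defaults; adjust them for your setup):

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>localhost:19888</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.log.server.url</name>
  <value>http://localhost:19888/jobhistory/logs</value>
</property>
```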

...

Deepak Jain

Jul 31, 2014, 11:24:40 PM
to druid-de...@googlegroups.com
Verify these settings (a working setup).

Overlord node
echo "druid.host=`hostname -f`" >> config/overlord/runtime.properties

echo "druid.port=8087" >> config/overlord/runtime.properties
echo "druid.service=overlord" >> config/overlord/runtime.properties

echo "druid.zk.service.host=druid-zookeeper-251252.slc01.dev.ebayc3.com" >> config/overlord/runtime.properties

echo "druid.db.connector.connectURI=jdbc:mysql://druid-mysql-255225.slc01.dev.ebayc3.com:3306/druid" >> config/overlord/runtime.properties
echo "druid.db.connector.user=druid" >> config/overlord/runtime.properties
echo "druid.db.connector.password=diurd" >> config/overlord/runtime.properties

echo "druid.selectors.indexing.serviceName=overlord" >> config/overlord/runtime.properties
echo "druid.indexer.queue.startDelay=PT0M" >> config/overlord/runtime.properties
echo "druid.indexer.runner.javaOpts=\"-server -Xmx2g\"" >> config/overlord/runtime.properties
echo "druid.indexer.runner.startPort=8089" >> config/overlord/runtime.properties
echo "druid.indexer.fork.property.druid.computation.buffer.size=268435456" >> config/overlord/runtime.properties
echo "druid.indexer.fork.property.druid.processing.numThreads=1" >> config/overlord/runtime.properties

echo "druid.extensions.coordinates=[\"io.druid.extensions:druid-hdfs-storage:0.6.99\"]" >> config/overlord/runtime.properties
echo "druid.storage.type=hdfs" >> config/overlord/runtime.properties
echo "druid.storage.storageDirectory=hdfs://namenode-284133.slc01.dev.com:8020/tmp/trackingstorage" >> config/overlord/runtime.properties

Run
------
export DRUID_HOME=/home/hdfs/druid-services-0.6.109-SNAPSHOT
java -Xmx12g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath $DRUID_HOME/lib/*:/etc/hadoop/conf/:$DRUID_HOME/config/overlord io.druid.cli.Main server overlord


Historical
echo "druid.host=`hostname -f`" >> config/historical/runtime.properties
echo "druid.port=8081" >> config/historical/runtime.properties
echo "druid.service=historical" >> config/historical/runtime.properties

echo "druid.zk.service.host=druid-zookeeper-251252.slc01.dev.ebayc3.com" >> config/historical/runtime.properties

echo "druid.db.connector.connectURI=jdbc:mysql://druid-mysql-255225.slc01.dev.ebayc3.com:3306/druid" >> config/historical/runtime.properties
echo "druid.db.connector.user=druid" >> config/historical/runtime.properties
echo "druid.db.connector.password=diurd" >> config/historical/runtime.properties

echo "druid.extensions.coordinates=[\"io.druid.extensions:druid-hdfs-storage:0.6.99\"]" >> config/historical/runtime.properties
echo "druid.storage.type=hdfs" >> config/historical/runtime.properties
echo "druid.storage.storageDirectory=hdfs://namenode-284133.slc01.dev.com:8020/tmp/trackingstorage" >> config/historical/runtime.properties

echo "druid.server.maxSize=11000000000" >> config/historical/runtime.properties
echo "druid.segmentCache.locations=[{\"path\": \"/tmp/druid/indexCache\", \"maxSize\"\: 11000000000}]" >> config/historical/runtime.properties

echo "druid.monitoring.monitors=[\"io.druid.server.metrics.ServerMonitor\", \"com.metamx.metrics.SysMonitor\",\"com.metamx.metrics.JvmMonitor\"]" >> config/historical/runtime.properties

echo "# Change these to make Druid faster" >> config/historical/runtime.properties
echo "druid.processing.buffer.sizeBytes=512000000" >> config/historical/runtime.properties
echo "druid.processing.numThreads=7" >> config/historical/runtime.properties
echo "druid.query.groupBy.maxResults=1000000" >> config/historical/runtime.properties

Run
------
java -Xmx6g -Xms6g -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath $DRUID_HOME/lib/*:$DRUID_HOME/config/historical:/etc/hadoop/conf:/usr/lib/hadoop-hdfs/hadoop-hdfs-2.4.0.2.1.1.0-385.jar:/usr/lib/hadoop/hadoop-common-2.4.0.2.1.1.0-385.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-mapreduce/hadoop-auth.jar io.druid.cli.Main server historical

1) The storage settings in runtime.properties (the druid-hdfs-storage extension plus druid.storage.type and druid.storage.storageDirectory) make sure segments are stored in HDFS by the indexing service and read from HDFS by historical nodes.
2) Make sure you include the hadoop conf directory in the classpath as shown above, and for the historical node include all the jars mentioned above in the classpath (or run "hadoop classpath" and include those jars).
3) What kind of hadoop cluster is yours? Is it single node or multi node? Who set up the cluster for you, or did you use Ambari?
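For example, the historical node's classpath could be built from the output of "hadoop classpath" instead of listing jars by hand (DRUID_HOME as above):

```shell
# Build the classpath from the local hadoop install rather than listing jars
HADOOP_CP=$(hadoop classpath)
java -Xmx6g -Xms6g -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath "$DRUID_HOME/lib/*:$DRUID_HOME/config/historical:$HADOOP_CP" \
  io.druid.cli.Main server historical
```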

Regards,
Deepak 

For indexing you must use the indexing REST service instead of a standalone java program.

Nishant
Software Engineer | METAMARKETS

tarun gulyani

Aug 1, 2014, 1:56:25 PM
to druid-de...@googlegroups.com
Hi Gian,

Thanks for replying. I have already set those storage properties in the indexer's runtime.properties. Below are the details of "config/overlord/runtime.properties":
############################################config/overlord/runtime.properties######################################
druid.host=localhost
druid.port=8087
druid.service=overlord

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.121"]
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"]

druid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.db.connector.user=root
druid.db.connector.password=root

druid.selectors.indexing.serviceName=overlord
druid.indexer.queue.startDelay=PT0M
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
#druid.storage.type=local
#druid.storage.storageDirectory=/tmp/druid/localStorage1

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid

druid.pusher.hdfs=true
druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
druid.indexer.fork.property.druid.indexer.task.baseTaskDir=hdfs://localhost:9000/tmp/persistent
druid.indexer.fork.property.druid.indexer.task.baseDir=hdfs://localhost:9000/tmp
druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid

#############################################################################################################

Regarding the job failure: the map job is failing; the log of the failed job is:
/job_1406914384450_0001/job_1406914384450_0001_1.jhist to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist_tmp
2014-08-01 23:10:26,539 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist_tmp
2014-08-01 23:10:26,542 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying file:/tmp/hadoop-yarn/staging/tarun/.staging/job_1406914384450_0001/job_1406914384450_0001_1_conf.xml to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml_tmp
2014-08-01 23:10:26,558 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml_tmp
2014-08-01 23:10:26,561 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001.summary_tmp to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001.summary
2014-08-01 23:10:26,561 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml_tmp to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001_conf.xml
2014-08-01 23:10:26,562 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist_tmp to file:/tmp/hadoop-yarn/staging/history/done_intermediate/tarun/job_1406914384450_0001-1406914786322-tarun-wikipedia%2Ddetermine_partitions_groupby%2DOptional.of-1406914826342-0-0-FAILED-default-1406914798528.jhist
2014-08-01 23:10:26,584 INFO [Thread-58] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2014-08-01 23:10:26,588 INFO [Thread-58] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to Task failed task_1406914384450_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
##################################################################################################################
Regarding the intermediate directory: whenever I submit the index_hadoop task, this blank directory is created at the HDFS path:
hdfs://localhost:9000/druid/wikipedia/2014-08-01T173931.647Z/groupedData

I doubt the HDFS path is wrong; if it were, Druid should not be able to create the directory at HDFS at all.

 

tarun gulyani

Aug 1, 2014, 2:18:00 PM
to druid-de...@googlegroups.com
Hi Deepak,

Thanks for replying. Most of the properties you mentioned I had already added in runtime.properties, except the HDFS storage properties on the historical node. Previously I was adding them only in "config/overlord/runtime.properties"; now I have added them in "config/historical/runtime.properties" as well. Even then the index_hadoop task fails.

Please go through the properties and commands mentioned below. If I am doing anything wrong, let me know.

1)  ###################################"config/overlord/runtime.properties"#################################################
druid.host=localhost
druid.port=8087
druid.service=overlord

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.121"]
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"]

druid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.db.connector.user=root
druid.db.connector.password=root

druid.selectors.indexing.serviceName=overlord
druid.indexer.queue.startDelay=PT0M
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
#druid.storage.type=local
#druid.storage.storageDirectory=/tmp/druid/localStorage1

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid

druid.pusher.hdfs=true
druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
druid.indexer.fork.property.druid.indexer.task.baseTaskDir=hdfs://localhost:9000/tmp/persistent
druid.indexer.fork.property.druid.indexer.task.baseDir=hdfs://localhost:9000/tmp
druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
###########################################################################################################

Command to run Indexer : 
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:/home/tarun/hadoop-yarn/etc/hadoop:config/overlord io.druid.cli.Main server overlord

2)  ###################################"config/historical/runtime.properties"#################################################

druid.host=localhost
druid.service=historical
druid.port=8091

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.6.121"]
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.121"]
# Dummy read only AWS account (used to download example data)
druid.s3.secretKey=QyyfVZ7llSiRg6Qcrql1eEUG7buFpAK6T6engr1b
druid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ

druid.server.maxSize=10000000000

# Change these to make Druid faster
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=1

druid.segmentCache.locations=[{"path": "/tmp/druid/indexCacheNew", "maxSize"\: 10000000000}]
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://localhost:9000/druid

druid.pusher.hdfs=true
druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=hdfs://localhost:9000/druid
druid.indexer.fork.property.druid.indexer.task.baseTaskDir=hdfs://localhost:9000/tmp/persistent
druid.indexer.fork.property.druid.indexer.task.baseDir=hdfs://localhost:9000/tmp
##########################################################################################################################

Command to run historical : 
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:/home/tarun/hadoop-yarn/hadoopJars/*:/home/tarun/hadoop-yarn/etc/hadoop:config/historical io.druid.cli.Main server historical
 
3) Procedure to call the index_hadoop task : 
curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/wikipedia_index_hadoop_task.json localhost:8087/druid/indexer/v1/task


############################################################################################################################

1)  When we set up apache hadoop-2.4, all the conf files are present in the etc/hadoop folder; there is no conf folder. Therefore in all the commands above I have used the etc/hadoop folder for configuration. But in your command the conf folder path is "/etc/hadoop/conf/". Do you separately create a conf folder and put all the configuration files there?

2)  I am using apache hadoop 2.4 in "pseudo distributed mode" on my laptop. The setup was done by myself, not Ambari. I am using a bunch of Big Data and Machine Learning tools, and all of them work perfectly in this setup. No idea what is going wrong for Druid.

I am looking at the Druid setup again to see what is causing the hadoop task to fail. Please let me know if you find anything wrong on my side in the Druid setup I have described above.

Gian Merlino

Aug 1, 2014, 5:51:21 PM
to druid-de...@googlegroups.com
That exception looks like the exception from the hadoop client. The actual task node should have a more interesting exception, which you should be able to find by clicking through the hadoop web ui. It's interesting that a map failed rather than a reduce: maybe the job is having trouble reading your input data, possibly because it's expecting a different format. The exception from the mapper would help a lot.