I have another problem: I am trying to use HDFS as Deep Storage for indexing and can't get it to work with Druid 0.6.0. Local storage works fine. I always get the following exception during the reduce phase:
java.lang.IllegalArgumentException: Wrong FS: file://foobar/dummydata_2_10_10/dummydata_2_10_10/20130102T000000.000+0100_20130103T000000.000+0100/2013-10-25T13_12_33.426+02_00/0, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:390)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:340)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:492)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:394)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:374)
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:237)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
The path is obviously wrong, and I think Druid is falling back to local storage. I set the path in config/overlord/runtime.properties (all other configs are left at the tutorial defaults):
druid.storage.type=hdfs
druid.storage.storageDirectory=foobar
Using a fully qualified URI like "hdfs://hadoop-master:54310/foobar" for storageDirectory produces wrong paths too; they come out as file://hdfs:/hadoop-master:54310/foobar.
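For reference, the fully qualified variant I tried looked like this (hadoop-master:54310 being the NameNode address):

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://hadoop-master:54310/foobar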
I start the indexing service with this command:

java -Xmx2g -Duser.timezone=Europe/Berlin -Dfile.encoding=UTF-8 -classpath lib/*:`hadoop classpath`:config/overlord io.druid.cli.Main server overlord
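If it helps narrow things down: as far as I can tell, Hadoop parses the "foobar" part of file://foobar/... as a URI authority (a host name), not as a directory, and FileSystem.checkPath() then rejects it because the local filesystem has no authority. This standalone sketch (my own test code, not Druid's) throws the same exception:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WrongFsRepro {
    public static void main(String[] args) throws Exception {
        // Default filesystem is the local one (fs.default.name = file:///)
        Configuration conf = new Configuration();
        FileSystem localFs = FileSystem.get(conf);

        // In "file://foobar/segments", "foobar" is parsed as the URI
        // authority (a host), not as a path component. checkPath() compares
        // it against the local filesystem's empty authority and throws:
        //   java.lang.IllegalArgumentException:
        //     Wrong FS: file://foobar/segments, expected: file:///
        localFs.mkdirs(new Path("file://foobar/segments"));
    }
}

So it looks like my storageDirectory value is being prefixed with file:// somewhere instead of being resolved against HDFS.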
My indexer configuration:
{
"config": {
"rollupSpec": {
"rollupGranularity": "day",
"aggs": [
{
"name": "event_1",
"fieldName": "event_1",
"type": "longSum"
},
{
"name": "event_2",
"fieldName": "event_2",
"type": "longSum"
},
{
"name": "event_3",
"fieldName": "event_3",
"type": "longSum"
},
{
"name": "event_4",
"fieldName": "event_4",
"type": "longSum"
},
{
"name": "event_5",
"fieldName": "event_5",
"type": "longSum"
},
{
"name": "event6",
"fieldName": "event_6",
"type": "longSum"
},
{
"name": "event7",
"fieldName": "event_7",
"type": "longSum"
},
{
"name": "event8",
"fieldName": "event_8",
"type": "longSum"
},
{
"name": "event9",
"fieldName": "event_9",
"type": "longSum"
},
{
"name": "event10",
"fieldName": "event_10",
"type": "longSum"
},
{
"name": "value_1",
"fieldName": "value_1",
"type": "doubleSum"
},
{
"name": "value_2",
"fieldName": "value_2",
"type": "doubleSum"
},
{
"name": "value_3",
"fieldName": "value_3",
"type": "doubleSum"
},
{
"name": "value_4",
"fieldName": "value_4",
"type": "doubleSum"
},
{
"name": "value_5",
"fieldName": "value_5",
"type": "doubleSum"
},
{
"name": "value_6",
"fieldName": "value_6",
"type": "doubleSum"
},
{
"name": "value_7",
"fieldName": "value_7",
"type": "doubleSum"
},
{
"name": "value_8",
"fieldName": "value_8",
"type": "doubleSum"
},
{
"name": "value_9",
"fieldName": "value_9",
"type": "doubleSum"
},
{
"name": "value_10",
"fieldName": "value_10",
"type": "doubleSum"
}
]
},
"pathSpec": {
"paths": "csv/dummydata_2_10_10.csv",
"type": "static"
},
"granularitySpec": {
"gran": "DAY",
"intervals": [
"2013-01-01TZ/2013-01-08TZ"
],
"type": "uniform"
},
"dataSpec": {
"dimensions": [
"dim_1",
"dim_2"
],
"columns": [
"ts",
"dim_1",
"dim_2",
"event_1",
"event_2",
"event_3",
"event_4",
"event_5",
"event_6",
"event_7",
"event_8",
"event_9",
"event_10",
"value_1",
"value_2",
"value_3",
"value_4",
"value_5",
"value_6",
"value_7",
"value_8",
"value_9",
"value_10"
],
"format": "csv"
},
"timestampFormat": "posix",
"timestampColumn": "ts",
"dataSource": "dummydata_2_10_10",
"targetPartitionSize" : 5000000
},
"type": "index_hadoop"
}
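For completeness, the input CSV rows look like this (a made-up sample row; ts is a posix timestamp, here 2013-01-01T00:00:00Z, followed by dim_1, dim_2, the ten long event counters and the ten double values):

1356998400,de,mobile,1,0,3,2,5,0,1,4,2,7,0.5,1.25,3.0,0.0,2.75,1.5,0.25,4.0,0.125,2.5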
Thank you!