Hi,
I am trying to use HadoopDruidIndexer for batch ingestion from a CSV file. I have installed Hadoop on Ubuntu in single-node cluster mode. According to the wiki, we can leave the "partitionDimension" option blank for the indexer node. But when I run HadoopDruidIndexer I see the error below:
attempt_201305151342_0016_r_000000_0: com.metamx.common.ISE: No suitable partitioning dimension found!
attempt_201305151342_0016_r_000000_0: at com.metamx.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionReducer.innerReduce(DeterminePartitionsJob.java:652)
attempt_201305151342_0016_r_000000_0: at com.metamx.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:444)
attempt_201305151342_0016_r_000000_0: at com.metamx.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:417)
attempt_201305151342_0016_r_000000_0: at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
attempt_201305151342_0016_r_000000_0: at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
attempt_201305151342_0016_r_000000_0: at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
attempt_201305151342_0016_r_000000_0: at org.apache.hadoop.mapred.Child.main(Child.java:170)
attempt_201305151342_0016_r_000000_0: 2013-05-15 14:52:13,016 INFO [main] org.apache.hadoop.mapred.TaskRunner - Runnning cleanup for the task
2013-05-15 14:52:28,660 INFO [main] org.apache.hadoop.mapred.JobClient - Task Id : attempt_201305151342_0016_r_000000_1, Status : FAILED
attempt_20
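For context, each line of my CSV has the timestamp first, then the other columns, in the same order as the dataSpec in the config below. A sample row would look roughly like this (the values here are invented just to show the layout, not my real data):

    2013-05-10T12:30:00Z,Austin,TX,73301,US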
Here is the config that I am using:
{
  "dataSource": "customer",
  "timestampColumn": "ts",
  "timestampFormat": "auto",
  "dataSpec": {
    "format": "csv",
    "columns": ["ts", "City", "State", "ZipCode", "Country"],
    "dimensions": ["City", "State"]
  },
  "granularitySpec": {
    "type": "uniform",
    "intervals": ["2013-05-10/2013-05-11"],
    "gran": "day"
  },
  "pathSpec": {
    "type": "granularity",
    "dataGranularity": "year",
    "inputPath": "hdfs://localhost:54310/home/test/app/hadoop/tmp/customer",
    "filePattern": ".*"
  },
  "rollupSpec": {
    "aggs": [
      { "type": "count", "name": "event_count" },
      { "type": "count", "fieldName": "ZipCode", "name": "revenue" }
    ],
    "rollupGranularity": "minute"
  },
  "workingPath": "/home/sharmin/app/hadoop/tmp",
  "segmentOutputPath": "hdfs://localhost:54310/home/test/app/hadoop/tmp/customer/output",
  "leaveIntermediate": "false",
  "partitionsSpec": {
    "targetPartitionSize": 5000000
  },
  "updaterJobSpec": {
    "type": "db",
    "user": "root",
    "password": "root",
    "segmentTable": "prod_segments"
  }
}
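For what it is worth, my understanding from the wiki is that naming the dimension explicitly would look roughly like this (using "City" purely as an example), but the wiki also says the option can be left blank:

    "partitionsSpec": {
      "partitionDimension": "City",
      "targetPartitionSize": 5000000
    }

I would prefer to leave it blank and let the indexer pick the dimension, if that is supposed to work.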
I am not sure what I am missing here. Any help would be appreciated.
Thanks,
Sharmin