OK, it seems we have moved a bit further on solving this.
We created a batch indexing job (correct me if I'm wrong, but this is a batch indexing job, not a batch re-indexing job, right?) that looks like this:
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "new-datasource-index-1h",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["aaa", "bbb", "ccc", "ddd", "eee"],
            "dimensionExclusions": ["xxx", "yyy"],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        { "name": "event_qty", "type": "longSum", "fieldName": "count" },
        { "name": "sum_value", "type": "doubleSum", "fieldName": "val" },
        { "name": "min_value", "type": "doubleMin", "fieldName": "val" },
        { "name": "max_value", "type": "doubleMax", "fieldName": "val" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "hour",
        "rollup": true,
        "intervals": ["2017-12-01T00:00:00.000/2017-12-01T04:00:00.000"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "existing-druid-datasource",
          "intervals": ["2017-12-01T00:00:00.000/2017-12-01T04:00:00.000"],
          "metrics": ["count", "val"]
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "maxRowsInMemory": 15000000,
      "numBackgroundPersistThreads": 0,
      "jobProperties": {
        "mapreduce.job.classloader": "true"
      }
    }
  }
}
We can confirm that this job usually runs OK (though it sometimes crashes with a "Java heap space" error) and indeed creates a new datasource from the existing one with the new granularity settings.
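Regarding the heap-space crashes: my current guess is that maxRowsInMemory = 15000000 is simply too high for the default MapReduce task heap. A tuningConfig along these lines (the heap sizes and row limit below are guesses we have not actually verified) is what I'd try next:

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "maxRowsInMemory": 500000,
    "numBackgroundPersistThreads": 0,
    "jobProperties": {
      "mapreduce.job.classloader": "true",
      "mapreduce.map.java.opts": "-Xmx4g",
      "mapreduce.reduce.java.opts": "-Xmx4g"
    }
  }
}
```

Lowering maxRowsInMemory makes the indexer persist to disk more often instead of accumulating rows in heap, and the java.opts properties raise the heap of the MR tasks themselves.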
But there's a problem: the job does not seem to pick up the metrics. After it completes, all metrics in the new datasource are equal to 0, as if no metrics had been found in the existing datasource, even though they are listed in the ingestionSpec under ioConfig.
For example, here's one row from the existing-druid-datasource datasource:
{
  "__time": "2017-12-01T01:05:00.000Z",
  "aaa": "some-aaa-value",
  "bbb": "some-bbb-value",
  "ccc": "some-ccc-value",
  "ddd": "some-ddd-value",
  "eee": "some-eee-value",
  "xxx": "151128900",
  "yyy": "some irrelevant text",
  "count": 1,
  "val": 59287810
}
And here's what was created in the new-datasource-index-1h datasource:
{
  "__time": "2017-12-01T01:00:00.000Z",
  "aaa": "some-aaa-value",
  "bbb": "some-bbb-value",
  "ccc": "some-ccc-value",
  "ddd": "some-ddd-value",
  "eee": "some-eee-value",
  "event_qty": 0,
  "sum_value": 0,
  "min_value": 0,
  "max_value": 0
}
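In case it helps, this is roughly the native timeseries query I plan to run to double-check that existing-druid-datasource really holds non-zero count / val values over the same interval (sketched from memory, so details may be off):

```json
{
  "queryType": "timeseries",
  "dataSource": "existing-druid-datasource",
  "granularity": "hour",
  "intervals": ["2017-12-01T00:00:00.000/2017-12-01T04:00:00.000"],
  "aggregations": [
    { "type": "longSum", "name": "total_count", "fieldName": "count" },
    { "type": "doubleSum", "name": "total_val", "fieldName": "val" }
  ]
}
```

If that query returns non-zero sums while the re-indexed datasource shows zeros, the problem is in the indexing job itself rather than in the source data.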
Any clue why this happens?
Thanks in advance!