"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "DAY",
"queryGranularity" : "DAY",
"intervals" : [ "2016-08-03T01:00:00.000Z/2016-08-03T02:00:00.000Z" ]
}--
- If you *really* want to do this hourly DAY-granularity ingestion, it should be possible to submit a batch indexing job every hour with DAY segment granularity, but instead of specifying an hour-long interval like 2016-08-03T02:00/2016-08-03T03:00, specify the interval for the full day, like 2016-08-03/2016-08-04 (see the spec sketch after this list). You would then trigger this job with the same day-long interval each hour, and it would generate successively larger segments containing the data for the past hour plus all the earlier hours of that day, so the 24th run of the day would produce a segment containing all of your data. You just have to be sure you retain the input data for all the previous hours. You'd also want some segment-killing tasks set up, otherwise you'll use an excessive amount of deep storage.
- Having said that, I'm not totally clear about your setup, but probably one of the following is what you actually want to do:
- have a realtime ingestion pipeline that generates HOUR segments combined with a batch ingestion job that takes those segments and merges them together into DAY granularity every 24 hours or so. Your batch ingestion job can generate the DAY segment either from the raw data that was fed into the realtime indexers or by reading the segments generated by the realtime indexers and re-indexing them with a different schema (i.e. DAY segment granularity).
- or, if you're getting bursts of data from your pipeline every hour instead of a continuous stream, setting up a realtime pipeline may be overkill. In that case, it'd probably make sense to submit hourly ingestion tasks with HOUR segment granularity for the past hour of data, and then once per day run another ingestion task to generate a segment with DAY granularity that again can either source the input from the original data fed into Druid or from the completed segments generated by the hourly ingestion tasks. This aggregated DAY segment will overshadow the previous HOUR segments and will be used when responding to queries. Again, Druid has no problem running queries with segments of different granularities.
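For reference, the full-day interval approach would only mean changing the intervals in the granularitySpec you posted; something along these lines (the dates here are just an example):

"granularitySpec" : {
  "type" : "uniform",
  "segmentGranularity" : "DAY",
  "queryGranularity" : "DAY",
  "intervals" : [ "2016-08-03/2016-08-04" ]
}

For the hourly-HOUR-segments alternative, you'd instead set "segmentGranularity" : "HOUR" and keep the hour-long interval (e.g. "2016-08-03T01:00:00.000Z/2016-08-03T02:00:00.000Z") in the hourly tasks, then use "segmentGranularity" : "DAY" with the day-long interval in the once-per-day re-indexing task.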
"tuningConfig" : {
"type" : "hadoop",
"jobProperties" : {
"fs.s3.awsAccessKeyId" : "foo",
"fs.s3n.awsAccessKeyId" : "foo",
"fs.s3.awsSecretAccessKey" : "bar",
"fs.s3n.awsSecretAccessKey" : "bar"
}