Thanks for the explanation, Gian.
I have a follow-up question:
We now have a hot tier with r3.8xlarge nodes serving one month of data, and we are setting up a cold tier with i2.8xlarge nodes to serve another two or more months of data.
Currently we have an in-tier replication level of 2 for both tiers.
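For reference, our coordinator load rules currently look roughly like this (tier names and periods simplified, not copied verbatim from our cluster):

  [
    { "type": "loadByPeriod", "period": "P1M", "tieredReplicants": { "hot": 2 } },
    { "type": "loadByPeriod", "period": "P3M", "tieredReplicants": { "cold": 2 } },
    { "type": "dropForever" }
  ]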
Given that most users tend to query the last month of data, the cold tier's CPU resources would mostly sit idle until a user happens to query older data. Is there a way to make better use of the cold tier?
I was wondering whether the cold tier could be configured to host replicas of the hot tier's segments in such a way that, if a given query needed to scan more segments than there are CPU cores in the hot tier, the cold tier would "help out" by scanning the remaining segments (something like the rule sketch below). Is such a setup doable, practical, or even recommended?
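Concretely, the kind of rule setup I had in mind would look something like this (just a sketch; I don't know whether the broker would actually fan a single query out across both tiers this way):

  [
    { "type": "loadByPeriod", "period": "P1M", "tieredReplicants": { "hot": 2, "cold": 1 } },
    { "type": "loadByPeriod", "period": "P3M", "tieredReplicants": { "cold": 2 } },
    { "type": "dropForever" }
  ]

i.e. the last month would get an extra replica on the cold tier that the hot tier could overflow onto when a query touches many segments.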
I guess, for such a setup, Druid would ideally need some sort of cost-based planner: if the number of segments to be scanned for a query meant that the hot tier's CPUs had to make 10 sequential passes, it might make sense to let the cold tier participate in the scanning, provided its disk-to-memory mapping could be assumed to take less time than 10 sequential in-memory scans on the hot tier.
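Back-of-envelope, what I mean is roughly that the cold tier should only join in when

  t_map(cold) + t_scan(cold) < passes(hot) * t_scan(hot)

where t_map is the cost of paging the relevant cold-tier segments from disk into memory and passes(hot) is the number of sequential passes the hot tier would otherwise need (the names are just made up for illustration).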
If something like that were configurable, it might reduce query latency for long-period queries, wouldn't it?
thanks
Sascha