Hi Team,
We are new to Druid. In our cluster setup, we are using HDFS as deep storage. From the documentation and related posts, our understanding is that Druid copies segments from deep storage (HDFS) onto the data nodes' local storage (the Historicals' segment cache) to make them available for queries.
My question: if I plan to ingest 100 TB of data into Druid with a replication factor of 2 in both HDFS and Druid, do I need a minimum of 200 TB of storage in both the Druid data nodes and HDFS to hold that 100 TB of data?
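For reference, here is the back-of-the-envelope calculation behind those numbers. It assumes the Historicals cache 100% of the segments and that the segment footprint is roughly equal to the raw ingested size (in practice, columnar compression and rollup usually shrink it), so the figures are only illustrative:

```python
# Rough storage estimate for our scenario (illustrative numbers only).
# Assumption: every segment is cached locally on the Historical tier,
# i.e. nothing is served purely from deep storage at query time.
raw_segment_data_tb = 100       # total segment size after ingestion (assumed ~ raw size)
hdfs_replication = 2            # dfs.replication on the HDFS cluster
druid_replicants = 2            # replicas per segment across the Historical tier

hdfs_storage_tb = raw_segment_data_tb * hdfs_replication         # 200 TB in HDFS
historical_storage_tb = raw_segment_data_tb * druid_replicants   # 200 TB across Historicals

print(f"Deep storage (HDFS):      {hdfs_storage_tb} TB")
print(f"Historical local storage: {historical_storage_tb} TB")
```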
In other words, if the replication factor is the same in HDFS and Druid, do I need the same amount of space on the Druid data nodes as I have in HDFS? If so, is there any way to reduce the storage capacity required on the Druid data nodes, for example by fetching uncached segments from HDFS on demand at query time?
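For context, this is roughly the kind of load rule we assume is producing the 2x replication on the Historical tier. The Coordinator host, datasource name, and tier name are placeholders, and the snippet is only a sketch of our current understanding, not something we have finalized:

```python
import json
import requests  # assumes plain HTTP access to the Coordinator API

COORDINATOR = "http://coordinator-host:8081"   # placeholder Coordinator address
DATASOURCE = "example_datasource"              # placeholder datasource name

# Load rule keeping 2 replicas of every segment on the default Historical tier.
# We believe this is what drives the 2x local-storage requirement we are asking about.
rules = [
    {
        "type": "loadForever",
        "tieredReplicants": {"_default_tier": 2},
    }
]

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/rules/{DATASOURCE}",
    headers={"Content-Type": "application/json"},
    data=json.dumps(rules),
)
resp.raise_for_status()
```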
FYI, we are using Druid version 28.0.1.
Thanks,
Ananthan.