Druid data server: why EC2 i3?


Ellen Shen

Jun 23, 2022, 7:47:55 PM
to Druid User
https://druid.apache.org/docs/latest/tutorials/cluster.html#data-server

Is there a reason i3 is recommended? The SSD that comes with i3 is an instance store. If we stop and start the i3 instance, the data on the instance store volume is lost, which means stopping and starting the EC2 instance loses the entire local disk cache.

Tijo Thomas

Jun 24, 2022, 8:51:31 AM
to druid...@googlegroups.com
For Druid it's good to have a high-speed SSD (e.g. NVMe) for the segment cache, along with good network bandwidth. i3 instances have both and are also very cost-effective compared to other instance types.

Regarding your second point: as I understand it, there is an option you can enable when creating the instance to persist the disk when the node is stopped. Another option is to format and mount the disk when the node starts. This is even better in the case of Druid, since Druid can pull segments from deep storage. In fact, the latter option will save some money if your use case calls for keeping nodes down while they are not in use.
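To make the "format and mount at startup" option concrete, here is a minimal sketch of what a boot-time step might prepare. The device name `/dev/nvme0n1`, the mount point `/mnt/druid`, the ext4 filesystem, and the cache size are all assumptions for illustration; only `druid.segmentCache.locations` is an actual Druid Historical property. The script only builds the commands and the property line, it does not run anything.

```python
import json


def boot_commands(device, mount_point):
    """Return the commands that format and mount a blank instance store volume.

    The volume comes back empty after a stop/start, so it has to be
    formatted again before Druid can use it as a segment cache.
    """
    return [
        ["mkfs.ext4", device],           # create a fresh filesystem
        ["mkdir", "-p", mount_point],    # make sure the mount point exists
        ["mount", device, mount_point],  # mount the volume
    ]


def segment_cache_property(mount_point, max_size_bytes):
    """Build the runtime.properties line pointing the segment cache at the mount."""
    locations = [{"path": f"{mount_point}/segment-cache", "maxSize": max_size_bytes}]
    return "druid.segmentCache.locations=" + json.dumps(locations, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical device, mount point, and size; adjust for the actual instance.
    for cmd in boot_commands("/dev/nvme0n1", "/mnt/druid"):
        print(" ".join(cmd))
    print(segment_cache_property("/mnt/druid", 300_000_000_000))
```

Once the Historical starts with an empty cache directory, it reloads its assigned segments from deep storage.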

--
Tijo Thomas
Solutions Architect | Imply, Bangalore, India

Doug Byrne

Jun 24, 2022, 2:28:30 PM
to Druid User
Yeah, I agree that the high disk performance is the reason to use those instance types. We are currently using i3en instances.

Instance store volumes can't be persisted after the instance is stopped. They survive a reboot, but there is no option to keep the volume after a stop or termination. We therefore rely on deep storage to persist the data. There is one danger: if too many instances are lost at once, some data may be unavailable while it is reloaded from deep storage.
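For a rough feel of how long that unavailability window could last, here is a back-of-the-envelope sketch. The segment volume, the network bandwidth, and the `efficiency` factor are illustrative assumptions, not measurements from any cluster; actual reload time also depends on deep storage throughput and Druid's segment loading settings.

```python
def reload_seconds(segment_bytes, network_gbits_per_s, efficiency=0.5):
    """Estimate the time for one node to re-download its segments.

    efficiency is an assumed fraction of the nominal network bandwidth
    that the download actually achieves.
    """
    bytes_per_s = network_gbits_per_s * 1e9 / 8 * efficiency
    return segment_bytes / bytes_per_s


if __name__ == "__main__":
    # Hypothetical node: 1 TB of segments, 10 Gbit/s network, 50% efficiency.
    seconds = reload_seconds(1e12, 10)
    print(f"about {seconds / 60:.0f} minutes")
```

With these assumed numbers the estimate is 1600 seconds, a little under half an hour per node.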

Ellen Shen

Jun 24, 2022, 3:21:58 PM
to Druid User
Thanks, Doug and Tijo!