Hi,
We (Bannerflow) have been running Druid on Azure for about half year. I realize that this reply is a bit late, I just saw this question today and I think the topic is still relevant.
Our setup is like this
1. Data comes in to Event Hub
2. A spark streaming jobs reads from EH and put it to druid for near realtime ingestion
3. A spark job does hourly batch processing of data from EH, put it to Azure Blob Storage and finally triggers druid Hadoop indexing job to ingest it from the blob storage
- Druid itself is running on custom VMs (just noticed a couple of days ago that the latest HDInsights version provides a preview of Druid, maybe we will utilize that in the future)
- We use HDInsights cluster for Spark & Hadoop
- Azure Blob storage for storage. (Data Lake only recently became available in the region where we have druid when we started, but it would be interesting to try now since Blob have some problems)
In order to get it to work we had to do some small patches to the azure extension and the Hadoop indexing, otherwise it wouldnt work with the wasb protocol used by Azure Blob Storage. Our druid version is 9.1.1, and because of our custom changes we havent had the time to try upgrading to the latest version yet. I hope the later versions will work out of the box without our patches :)
In general Druid itself has been running very stable and without any problems. However, spark streaming from Event Hub has required a lot of work to run smoothly.
/Victor