Has anyone tried running Druid with Azure Data Lake or Azure HDInsight?


Imtiaz Ahmed

Apr 25, 2017, 5:38:49 PM
to Druid User
Hi,

I was wondering whether anyone has tried running Druid with Azure Data Lake or Azure HDInsight. We are trying to build out a Druid cluster on Azure, and my searches turned up nothing on the topic. The only thing I found was azure-extensions, which uses Azure Blob Storage.

Any pointers are appreciated.

Thanks,
Imtiaz

Victor Blomqvist

Jun 30, 2017, 7:56:31 AM
to Druid User
Hi,

We (Bannerflow) have been running Druid on Azure for about half a year. I realize this reply is a bit late; I only saw the question today, but I think the topic is still relevant.

Our setup looks like this:
1. Data comes into Event Hub
2. A Spark Streaming job reads from Event Hub and sends the data to Druid for near-real-time ingestion
3. A Spark job does hourly batch processing of the data from Event Hub, writes it to Azure Blob Storage, and finally triggers a Druid Hadoop indexing job to ingest it from Blob Storage

- Druid itself is running on custom VMs (I noticed a couple of days ago that the latest HDInsight version provides a preview of Druid; maybe we will use that in the future)
- We use an HDInsight cluster for Spark and Hadoop
- Azure Blob Storage for storage. (Data Lake had only recently become available in the region where we run Druid when we started, but it would be interesting to try it now, since Blob Storage has some problems)

To get it to work, we had to make some small patches to the Azure extension and the Hadoop indexing; otherwise it wouldn't work with the wasb protocol used by Azure Blob Storage. Our Druid version is 0.9.1.1, and because of our custom changes we haven't had time to try upgrading to the latest version yet. I hope the later versions will work out of the box without our patches :)
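
As a rough illustration of step 3 above (not Bannerflow's actual code), the sketch below builds a minimal Druid `index_hadoop` task spec whose static input path points at Azure Blob Storage through a `wasb://` URI. The container, storage account, datasource name, timestamp column, and interval are all made-up placeholders, and the helper functions `wasb_uri` and `hadoop_index_task` are hypothetical names. As noted above, Druid 0.9.1.1 needed patches before wasb paths worked, so a spec like this may not be sufficient on its own.

```python
import json


def wasb_uri(container, account, path):
    """Build a wasb:// URI for a path in an Azure Blob Storage container."""
    return "wasb://{}@{}.blob.core.windows.net/{}".format(
        container, account, path.lstrip("/")
    )


def hadoop_index_task(datasource, input_paths, interval):
    """Return a minimal index_hadoop task spec as a plain dict.

    The schema details (JSON rows, a 'timestamp' column, a single count
    metric, hourly segments) are assumptions chosen for illustration.
    """
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "parser": {
                    "type": "hadoopyString",
                    "parseSpec": {
                        "format": "json",
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": []},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "hadoop",
                # The static inputSpec takes a comma-separated string of paths.
                "inputSpec": {"type": "static", "paths": ",".join(input_paths)},
            },
            "tuningConfig": {"type": "hadoop"},
        },
    }


# Placeholder values: container "raw", account "myaccount", datasource "events".
task = hadoop_index_task(
    "events",
    [wasb_uri("raw", "myaccount", "/events/2017-06-30-07/")],
    "2017-06-30T07:00:00Z/2017-06-30T08:00:00Z",
)
print(json.dumps(task, indent=2))
```

The resulting JSON would then be POSTed to the overlord's task endpoint (`/druid/indexer/v1/task`) by whatever job finishes the hourly batch.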

In general, Druid itself has been very stable and trouble-free. However, Spark Streaming from Event Hub has required a lot of work to run smoothly.

/Victor

vinayak sudheer

Nov 21, 2017, 11:47:16 PM
to Druid User
Hi Victor,

Can you explain the patches you made to the Azure extension and Hadoop indexing?

We are working on something similar and are facing issues with the wasb protocol.

Thanks
Vinayak

Victor Blomqvist

Nov 22, 2017, 4:34:15 AM
to Druid User
Hello Vinayak,

The fork with our changes is available here: https://github.com/nordicfactory/druid in the branch wasb-integration
Note that, as I wrote, it's based on Druid 0.9.1.1, and we haven't had time to look into what's needed for newer versions of Druid.

/Victor