Druid as managed service in Cloudera CDH


Kenji Noguchi

Nov 4, 2016, 6:36:25 PM
to Druid Development
I started working on integrating Cloudera CDH5.8 and Druid.   Is there anyone working on the same idea?

I basically need to create two things: a Parcel and a CSD.

The Parcel is just a repackaging of the Druid distribution with metadata. It's already done.

The CSD is a custom service descriptor that adds a Druid service to CDH. It should:
- assign Druid roles to cluster hosts (overlord, coordinator, historical, middlemanager)
- configure Druid for the Metadata storage, Zookeeper, HDFS, Kafka
- control start / stop
- upgrade, migration (I'm not familiar with this part though)
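
To make the role-assignment part concrete, here is a rough sketch of how a descriptor skeleton for the four roles above could be generated. It is not the actual CSD. The field names (`name`, `label`, `pluralLabel`, `runAs`, `roles`, `startRunner`) follow my recollection of Cloudera's service descriptor format and should be checked against the CSD reference; the `scripts/control.sh` path, the `druid` user, and the `build_sdl` helper are made up for illustration.

```python
import json

# Role names come from the list above; everything else is illustrative.
DRUID_ROLES = ["overlord", "coordinator", "historical", "middlemanager"]


def build_sdl(version="0.1.0"):
    """Return a minimal, hypothetical service descriptor as a dict."""
    roles = []
    for role in DRUID_ROLES:
        roles.append({
            "name": "DRUID_" + role.upper(),
            "label": role.capitalize(),
            "pluralLabel": role.capitalize() + "s",
            "startRunner": {
                # Hypothetical control script shipped inside the CSD.
                "program": "scripts/control.sh",
                "args": ["start", role],
            },
        })
    return {
        "name": "DRUID",
        "label": "Druid",
        "description": "Druid service (sketch only)",
        "version": version,
        "runAs": {"user": "druid", "group": "druid"},  # assumed account
        "roles": roles,
    }


if __name__ == "__main__":
    # Print the descriptor as JSON, the format a descriptor file would use.
    print(json.dumps(build_sdl(), indent=2))
```

Configuration parameters (metadata storage, Zookeeper, HDFS, Kafka) would be added per role on top of this skeleton.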



Cheers,
Kenji Noguchi





Fangjin Yang

Nov 23, 2016, 1:56:41 PM
to Druid Development
Hi Kenji, I'm not too familiar with CDH, but I know some folks who have integrated Druid with Ambari, so hopefully there are some similarities there. Hopefully those folks can weigh in.

Kenji Noguchi

Mar 11, 2017, 12:20:11 AM
to Druid Development
Last week I resumed my Cloudera CSD (custom service descriptor) development for Druid, since our Cloudera Hadoop servers got a RAM upgrade and finally have headroom to run extra apps.

Here is my first attempt of the .  It's beta quality. I was able to set up a 30-node Druid cluster in 15 minutes.

Attached are some screenshots for the status of the services, configuration, and list of instances.

I haven't looked at the Ambari integration, but I'm interested in what it manages and how it validates configurations, especially the JVM heap size and the number of threads. My CSD does not have any safeguards at the moment.
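
As a rough idea of what such a safeguard could check, here is a minimal sketch. It relies on the sizing rule from the Druid documentation that a process needs at least `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)` bytes of direct memory, and on the documented default of cores minus one for the thread count. The function names and the 20% reserve for the OS and other services are my own assumptions, not anything the CSD or Ambari does.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def required_direct_memory(buffer_size_bytes, num_threads, num_merge_buffers):
    """Direct memory needed for processing and merge buffers."""
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


def check_druid_sizing(host_ram_bytes, host_cores, heap_bytes,
                       buffer_size_bytes, num_threads, num_merge_buffers,
                       os_reserve_fraction=0.2):
    """Return a list of human-readable problems; an empty list means OK.

    os_reserve_fraction is an assumed share of RAM left for the OS and
    the other services on the host.
    """
    problems = []
    direct = required_direct_memory(
        buffer_size_bytes, num_threads, num_merge_buffers)
    budget = int(host_ram_bytes * (1 - os_reserve_fraction))
    if heap_bytes + direct > budget:
        problems.append(
            "heap + direct memory (%d bytes) exceeds budget (%d bytes)"
            % (heap_bytes + direct, budget))
    if num_threads > max(host_cores - 1, 1):
        problems.append(
            "numThreads=%d exceeds cores - 1 (%d)"
            % (num_threads, max(host_cores - 1, 1)))
    return problems


if __name__ == "__main__":
    # Example: 16 GiB host with 4 cores and an oversized configuration.
    for problem in check_druid_sizing(16 * GIB, 4, 8 * GIB, 512 * MIB, 7, 2):
        print("WARNING:", problem)
```

On a shared Hadoop host the reserve would probably need to be much larger than 20%, since the DataNode and YARN containers compete for the same RAM.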

Thanks,
Kenji





services.png
configuration.png
instances.png

Dongkyu Hwangbo

Apr 3, 2017, 9:17:01 PM
to Druid Development
I already tested your stuff. Great job!
However, it would be even more helpful to users if it could collect Druid's metrics.

On Saturday, November 5, 2016 at 7:36:25 AM UTC+9, Kenji Noguchi wrote:

Kenji Noguchi

Apr 4, 2017, 3:22:29 PM
to Druid Development
# I accidentally replied privately.  Reposting.

Thank you for trying it out, Dongkyu.

Yes, the metrics are definitely on my TODO list. I will see if I can generate an .mdl (metric descriptor language) file that takes in the Druid metrics via Druid emitters.

I haven't really explored the metrics capabilities of CDH, but judging from the Kafka example it shouldn't be difficult.
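
For reference, here is a small sketch of the Druid side of that pipeline. Druid emitters produce JSON events with fields such as `feed`, `timestamp`, `service`, `host`, `metric`, and `value`, and the HTTP emitter posts them in batches as a JSON array. The `summarize` helper and the sample values are hypothetical; how the result would be mapped onto CDH metrics is exactly the part I have not worked out.

```python
import json
from collections import defaultdict


def summarize(events_json):
    """Average each metric per service from a JSON array of Druid events.

    Only events on the "metrics" feed are counted; alerts are skipped.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in json.loads(events_json):
        if event.get("feed") != "metrics":
            continue
        key = (event["service"], event["metric"])
        totals[key]["count"] += 1
        totals[key]["sum"] += float(event["value"])
    return {key: t["sum"] / t["count"] for key, t in totals.items()}


if __name__ == "__main__":
    # Made-up sample batch, shaped like emitter output.
    sample = json.dumps([
        {"feed": "metrics", "timestamp": "2017-04-04T00:00:00.000Z",
         "service": "druid/historical", "host": "host1:8083",
         "metric": "query/time", "value": 100},
        {"feed": "metrics", "timestamp": "2017-04-04T00:00:01.000Z",
         "service": "druid/historical", "host": "host1:8083",
         "metric": "query/time", "value": 300},
    ])
    for (service, metric), mean in summarize(sample).items():
        print(service, metric, mean)
```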


Other TODO items are:
- expose more configuration properties
- add configuration validators and safeguards for memory and thread settings
- add pull-deps command for plugins
- add rolling restart
- test Druid upgrade
- clean up the CSD and parcel build tools for production use
- set up a repo

Some configuration values are not user-configurable for now. Deep storage support is pretty much HDFS-only.
Nonetheless, I was able to ingest via the Kafka Indexing Service as well as run batch ingestion using Hadoop.

At the moment I'm working on an ETL extension. I will improve the CSD after the ETL work.


Thank you,
Kenji Noguchi

Jason Heo

Jul 19, 2017, 5:49:55 AM
to Druid Development
Hi Kenji,

Great job!

By the way, I'm wondering whether I should use CDH 5.8 or a higher version.

Regards,

Jason


On Saturday, November 5, 2016 at 7:36:25 AM UTC+9, Kenji Noguchi wrote:
I started working on integrating Cloudera CDH5.8 and Druid.   Is there anyone working on the same idea?