Druid on Kubernetes


Ramachandran Ramesh

Oct 4, 2016, 1:52:58 AM
to Druid Development
Hi Team,
I just wanted to check whether there is any interest in deploying Druid on a Kubernetes cluster. I have managed to get it working, so if there is interest I can make it more generic and open-source it. The image is built on top of the Docker image already present in git.

Eric Tschetter

Oct 4, 2016, 9:35:27 AM
to druid-de...@googlegroups.com
How are you handling updating the historical nodes? Do they stay on the same node so that they can load up the same segments they had already downloaded, or do they have to download their segments over again when you deploy new code?

--Eric



Ramachandran Ramesh

Oct 5, 2016, 7:12:29 AM
to Druid Development
They have to be downloaded again on a new node. Kubernetes does not give us control of the storage, so we rely on knowing the flavour of node used in the cluster and cap each historical's segment cache maxSize so that the maximum number of historicals that can run on one node still fits on that node's disk. I still haven't looked at network-attached storage to see if it can help with persisting the segments.
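
For anyone following along, here is a rough sketch of the sizing arithmetic described above: divide the node's disk among the largest number of historicals that could land on it. The function names, the 10% headroom, and the cache path are assumptions made for illustration, not what is actually deployed; `druid.server.maxSize` and `druid.segmentCache.locations` are the standard Druid historical properties.

```python
def per_historical_max_size(node_disk_bytes, max_historicals_per_node, headroom_percent=10):
    """Return a segment-cache size (bytes) that is safe for one historical.

    Assumption: every historical scheduled on a node shares the same local
    disk, so the node's usable space is split evenly between them.
    """
    if max_historicals_per_node < 1:
        raise ValueError("max_historicals_per_node must be at least 1")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    # Integer arithmetic keeps the result exact for large byte counts.
    usable = node_disk_bytes * (100 - headroom_percent) // 100
    return usable // max_historicals_per_node


def historical_properties(max_size, cache_path="/var/druid/segment-cache"):
    """Render the two historical properties that carry the limit.

    cache_path is a hypothetical location chosen for this example.
    """
    return [
        "druid.server.maxSize=%d" % max_size,
        'druid.segmentCache.locations=[{"path":"%s","maxSize":%d}]' % (cache_path, max_size),
    ]


if __name__ == "__main__":
    # Example: a 400 GB node that may host up to 4 historicals.
    size = per_historical_max_size(400 * 10**9, 4)
    print("\n".join(historical_properties(size)))
```

With these example numbers each historical would be capped at 90 GB, so even four of them on one node stay inside the 400 GB disk.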

-Ram


Charles Allen

Oct 5, 2016, 12:26:00 PM
to Druid Development
I'm using Marathon with a hostname:UNIQUE constraint and just jailbreak the cache path (hacky, I know). I have not tinkered with persistent storage yet, but that is an aspect of Mesos/Marathon that is supposed to solve this scenario.
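
A minimal sketch of what such a Marathon app definition could look like, generated from Python so it can be printed as JSON. The `constraints` entry is Marathon's standard way to place at most one instance per host. Reading "jailbreak the cache path" as a host-path volume mounted into the container is an assumption, and the app id, image name, and paths are hypothetical placeholders.

```python
import json


def historical_app(image, host_cache_path, container_cache_path, instances):
    """Build a Marathon app definition (as a dict) for Druid historicals."""
    return {
        "id": "/druid/historical",  # hypothetical app id
        "instances": instances,
        # At most one historical per agent host, so a host's cache has one owner.
        "constraints": [["hostname", "UNIQUE"]],
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
            # Assumption: the segment cache lives on the host and is mounted in,
            # so a restarted task on the same host can find its old segments.
            "volumes": [
                {
                    "hostPath": host_cache_path,
                    "containerPath": container_cache_path,
                    "mode": "RW",
                }
            ],
        },
    }


if __name__ == "__main__":
    app = historical_app(
        image="example/druid-historical",  # placeholder image name
        host_cache_path="/mnt/druid/segment-cache",
        container_cache_path="/var/druid/segment-cache",
        instances=3,
    )
    print(json.dumps(app, indent=2))
```

The printed JSON is the kind of document one would submit to Marathon; cpus, mem, ports, and the rest of a real app definition are left out here.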

Charles Allen

Oct 5, 2016, 12:26:27 PM
to Druid Development
Not exactly what you were asking, but containerization of druid can be done.

Ramachandran Ramesh

Oct 5, 2016, 9:53:00 PM
to Druid Development
Right. Kubernetes 1.3 supports PetSet, which could potentially solve this, but I haven't explored it yet. My question was not whether containerisation and orchestration are possible but rather whether there is interest in having such work be part of the community itself: a production-ready Docker container that could get you up and running with a production cluster in GCE in a few minutes.

Eric Tschetter

Oct 5, 2016, 10:18:02 PM
to druid-de...@googlegroups.com
Yes, there's definitely appetite.

TBH, though, I'd have some issues recommending people use something
that doesn't handle the persistence of segments as a first-class
citizen. It can definitely make sense for some use cases and if
that's the limitation and it's clearly called out in whatever docs
exist, I think it will be useful to people. I just worry about how we
would endorse it and it would require certain caveats/explanations
about persistent storage. Those caveats are not always clear to a
user who didn't actually set it all up on their own.

--Eric


Ramachandran Ramesh

Oct 6, 2016, 8:51:41 PM
to Druid Development
That's perfect. I understand your concerns about persistent storage. Let me look at the newer version of Kubernetes to see if any of the new features can address the concerns that you have stated.

Thanks,
Ram

jakob....@codecentric.de

Nov 17, 2016, 3:06:23 AM
to Druid Development
I created a Helm chart (Helm is the Kubernetes package manager) for Druid. So far it's only been tested on minikube and it's still a work in progress (many rough edges, bad configs, etc.), but it somewhat works.

You can find the code here: https://github.com/krallistic/druid-kubernetes and follow the few steps in the readme to run it.

I will try to refine it in the future but if people want to help, there is still a lot to do.