Cassandra on Kubernetes with some kind of persisted volume


Rafael Chacón

Sep 13, 2015, 4:59:06 PM
to Containers at Google
Hi all,

I'm playing with the Cassandra example in https://github.com/kubernetes/kubernetes/tree/master/examples/cassandra and I was wondering what volume type I can use other than emptyDir. At first I considered gcePersistentDisk, but that can only be mounted read-write by a single pod. Has anyone tried this before?

Best,

Rafael. 

Tim Hockin

Sep 13, 2015, 5:39:16 PM
to Containers at Google
You can use NFS now.

We're still considering how to do replicated compute with storage
properly. It's a fun problem, but sort of tricky.

Brendan Burns

Sep 14, 2015, 11:45:36 AM
to Containers at Google
You really shouldn't need anything other than emptyDir in general. Cassandra takes care of its own replication.

If you are worried about persistence, you can make the very first member of the Cassandra cluster a special snowflake and have it mount a GCE persistent disk, while all of the other Cassandra replicas just use their local emptyDir for storage. That's probably better than using NFS.

Personally, I'd use emptyDir with a sidecar container that periodically snapshots the directory up to Google Cloud Storage (or some other cloud storage).

You could use something like:
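(Untested sketch; the Cassandra image, bucket name, and backup interval are placeholders, and the node's service account needs scopes that let gsutil write to the bucket.)

apiVersion: v1
kind: Pod
metadata:
  name: cassandra
spec:
  containers:
  - name: cassandra
    image: gcr.io/google_containers/cassandra:v6   # whichever Cassandra image you already run
    volumeMounts:
    - name: data
      mountPath: /cassandra_data
  - name: backup
    image: google/cloud-sdk                        # any image with gsutil installed
    command:
    - /bin/sh
    - -c
    - "while true; do gsutil -m rsync -r /cassandra_data gs://my-cassandra-backups/$(hostname); sleep 3600; done"
    volumeMounts:
    - name: data
      mountPath: /cassandra_data
  volumes:
  - name: data
    emptyDir: {}

Both containers mount the same emptyDir, so the backup sidecar sees exactly what Cassandra writes.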


Rafael Chacon

Sep 14, 2015, 12:57:09 PM
to google-c...@googlegroups.com
Oh, that makes sense. Thanks. Do you mean something like: the first node in a replicationController of size 1 with the GCE persistent disk as its volume, and then the other nodes in a different one with emptyDir?

Rafael.

Ward Harold

Sep 15, 2015, 5:41:53 PM
to Containers at Google
We've tried a couple of approaches.

Initially we created persistent disks in GCE, mounted them on selected VMs in our GKE cluster, and then used nodeSelectors to make the Cassandra pods, which used hostPath mounts, run on those nodes. That worked fine but the manual mounts were going to be a problem if we wanted to do an automated GKE node upgrade.

Next we created persistent disks and used persistent volumes and claims to avoid the manual mounts. That worked as advertised, but it forced us to create a separate replication controller for each of our Cassandra nodes, which isn't a terrible thing. The big win, we hope, is that it will allow us to do an automated GKE node upgrade when the time comes.
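For reference, each node's PV/claim pair is shaped roughly like this (a sketch rather than our exact config; the disk name and size are placeholders, and the PD is created ahead of time with gcloud):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cassandra-data-1
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    pdName: cassandra-data-1   # gcloud compute disks create cassandra-data-1 --size 200GB
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data-1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi

Each Cassandra replication controller then mounts its own claim through a persistentVolumeClaim volume, which is why we end up with one controller per node.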

Brendan's solution is simple and clean, and basically what we do in our dev and qa namespaces, but it wouldn't handle "big data" unless you made very large root partitions on your nodes.

Rafael Chacón

Sep 24, 2015, 1:58:46 AM
to Containers at Google
Thanks for the feedback!

I ended up doing something like what Ward mentioned, and it has been working nicely.

Best,

George Antoniadis

Oct 13, 2015, 11:31:12 AM
to Containers at Google
I'm currently trying to work around the same issue.
I get that the RC does not have access to AWS/GCE to create volumes, but I assumed there would be a way to pick an available PV from a pool so that the RC could still scale.

Having a "master" cassandra node in a kubernetes environment seems to me missing the forest for the trees.
Is there a best practice for data storage for dbs etc, or is it just using the emptyDir?

Rafael, would it be possible for you to share your solution/configs? I would be very interested to see how you decided to solve this.

Thank you all very much in advance! :)
George

Brendan Burns

Oct 13, 2015, 11:49:19 AM
to Containers at Google

Yes, this is too hard/awkward right now, and it *should* work out of the box, but we haven't had a chance to implement it. The basic idea is to have a generic PV claim on a pool of PVs and then have a replication-controller-style loop manage the size of the pool of PVs.

There are relevant designs in github.  If someone has time to implement it...

Otherwise, I'd guess we'll get to it in early 2016.

Sorry!
Brendan


David Oppenheimer

Oct 13, 2015, 4:10:34 PM
to google-c...@googlegroups.com
https://github.com/kubernetes/kubernetes/pull/14095 has our latest thinking on this, but as Brendan said, it's not fully implemented yet.

is a description of what's currently implemented.

Rafael Chacón

Oct 13, 2015, 11:07:35 PM
to google-c...@googlegroups.com
Hi George,

Right now I'm not using the PV claims. Here are some gists with an example of the way I'm setting things up:


It's not ideal, as I have to manually create both the volume and a new replication controller for every Cassandra container I want to add, but for now it's doing the job.
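The rough shape is one replication controller of size 1 per node, each pointing at its own pre-created GCE PD (placeholder names below, not the actual gists):

apiVersion: v1
kind: ReplicationController
metadata:
  name: cassandra-1
spec:
  replicas: 1
  selector:
    app: cassandra
    node: "1"
  template:
    metadata:
      labels:
        app: cassandra
        node: "1"
    spec:
      containers:
      - name: cassandra
        image: gcr.io/google_containers/cassandra:v6   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /cassandra_data
      volumes:
      - name: data
        gcePersistentDisk:
          pdName: cassandra-data-1                     # one pre-created PD per controller
          fsType: ext4

Scaling one of these controllers above 1 would try to attach the same PD read-write to a second pod, so each stays at replicas: 1.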

I also found some other issues when using emptyDir with Cassandra. Sometimes, after the cluster had been running for a long time, when some pods died and got restarted by the replicationController, Cassandra thought a new node was joining the ring and that another node was down. Depending on the replication factor of the keyspace, this caused trouble. Not sure if anyone else has run into similar issues.

Hope it helps!

Best,








Lakhindr

Feb 19, 2016, 5:43:51 PM
to Containers at Google
Could labels be used, with some regexp-like matching criterion, to select PVs for a certain PVC? Additionally, this labeling might help with geographic affinity, which could deliver better network utilization.
Matching on multiple labels would give fine-grained control without any special instrumentation.

BTW, what is the "special snowflake" that you referred to in your reply above?

Thanks.

Alex Sabadyr

May 24, 2016, 9:04:31 PM
to Containers at Google
Any news on this topic?
Should we really create 3 controllers to set up a cluster of 3 Cassandra nodes, each with its own persistent storage?

Rodrigo Campos

May 24, 2016, 9:54:09 PM
to google-c...@googlegroups.com

On Tuesday, May 24, 2016, Alex Sabadyr <alexande...@gmail.com> wrote:
Any news for this topic?
Should we really create 3 controllers to set up a cluster of 3 cassandra nodes each having a separate persistent storage?

Regarding the volume matching in the mail you quote, a proposal has been merged recently (I'm on my phone, can't look for the link. You can see the proposals dir in kubernetes master).

And support for "pet sets" is on its way with Kubernetes 1.3 (I think it will be alpha in 1.3).
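From what I remember of the proposal, the rough shape is a pet set with a per-pod volume claim template, something like this (alpha API, written from memory, so field names may well differ):

apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra        # headless service giving each pet a stable DNS name
  replicas: 3
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: gcr.io/google_containers/cassandra:v9   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /cassandra_data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 200Gi

The controller stamps out one claim per pet, so you get a persistent disk per node without hand-writing N replication controllers.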