What is the best distributed file system solution for CoreOS?


Fernando Neto

Jun 30, 2015, 7:34:40 AM6/30/15
to coreo...@googlegroups.com
Hi everyone. 

I need to configure a CoreOS cluster, but I'm facing a big problem: I need to share data between all CoreOS nodes in the cluster and keep it persistent.
I would like to know if any of you have a suggestion.



Seán C. McCord

Jun 30, 2015, 8:46:00 AM6/30/15
to Fernando Neto, coreo...@googlegroups.com
I am biased, of course, but I think Ceph is a fine choice.  I run it on my own CoreOS clusters.



--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Seán C McCord
CyCore Systems, Inc

Félix Barbeira

Jul 1, 2015, 2:05:54 PM7/1/15
to coreo...@googlegroups.com, fernando.n...@gmail.com
I'm in the same situation, wondering about the same question :(

These are the options I'm considering right now:

=== CEPH ===

Using ceph you have the following options:

- Ceph block device (RBD): you can mount a block device per CoreOS machine (by loading the "rbd" kernel module), but I think this option does not offer the possibility of sharing data between CoreOS nodes, because a block device can only be mounted on one node. Correct me if I'm wrong, but the same block device cannot be simultaneously mounted with RW permissions on two CoreOS nodes. Maybe in the near future, when the Ceph people release the "RBD mirroring" feature, this option could be a good choice. For now it's still in progress: http://tracker.ceph.com/issues/8569
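For reference, the single-writer workflow described above looks roughly like this on a CoreOS host. This is a sketch: the pool and image names are hypothetical, and it assumes a reachable Ceph cluster with a valid /etc/ceph/ceph.conf and keyring on the host.

```shell
# Load the RBD kernel module on the CoreOS host.
sudo modprobe rbd

# Create a 10 GiB image (done once, from any Ceph client).
rbd create mypool/shared-data --size 10240

# Map it as a local block device on this node (e.g. /dev/rbd0).
sudo rbd map mypool/shared-data

# Format and mount it. ext4 is a single-node filesystem, which is
# exactly why only ONE node may mount the image read-write at a time.
sudo mkfs.ext4 /dev/rbd0
sudo mkdir -p /mnt/shared-data
sudo mount /dev/rbd0 /mnt/shared-data
```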

- CephFS: it looks like it's not yet stable, and I think it requires running Docker containers in privileged mode in order to access "/dev/fuse". I haven't really investigated this option much because, like I said, it's not stable yet.

=== GLUSTERFS ===

It requires running Docker containers in privileged mode and installing the GlusterFS client in every single container, because CoreOS does not ship with the GlusterFS client installed.
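A container along those lines might be launched as follows. This is only a sketch: the client image name, server address, and volume name are hypothetical, and it assumes the image bundles the GlusterFS FUSE client.

```shell
# Run a container with the GlusterFS FUSE client, since the CoreOS
# host has no glusterfs binaries. Privileged mode (or at least
# --device /dev/fuse) is needed for the FUSE mount inside.
docker run --privileged \
  --device /dev/fuse \
  some/glusterfs-client-image \
  mount -t glusterfs gluster-server.example.com:/myvolume /mnt/gluster
```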

You can also use this solution, but in my opinion it's not appropriate for production environments:



Conclusion: I'm really undecided about what to do. Does anybody have new ideas?

Jeffrey Ollie

Jul 1, 2015, 3:35:29 PM7/1/15
to Félix Barbeira, coreos-user, fernando.n...@gmail.com
On Wed, Jul 1, 2015 at 1:05 PM, Félix Barbeira <fbar...@gmail.com> wrote:
> I'm in the same situation wondering the same question :(
>
> These are the options I manage right now:
>
> === CEPH ===
>
> Using ceph you have the following options:
>
> - ceph block device: you are able to mount only a block device per coreos machine (loading the "rbd" kernel module), I think this option does not offer the possibility to share data between coreos nodes because a block device can only be mounted in one node. Correct me if I'm wrong but the same block device cannot be simultaneously mounted with RW perms in two coreos nodes. Maybe in a near future when the ceph people release the "RBD mirroring" feature this options could be a good choice. Nowadays it's "in process": http://tracker.ceph.com/issues/8569

The limitation of only one host accessing an RBD at a time has more to do with the filesystem you put on it than with RBD itself. Most filesystems were designed under the assumption that only one kernel is accessing the underlying device. I mean, you wouldn't hook a SATA drive up to two different servers at once and expect your ext4 filesystem to turn into anything but gibberish, would you?

A single host can access multiple RBDs, but one RBD shouldn't be accessed by multiple hosts.
 
> - cephfs: it looks like it's not yet stable and I think requires to run docker containers in privileged mode in order to access to "/dev/fuse". I really not investigate a lot this option because like I said before it's not stable yet.

I wouldn't mount CephFS from inside the container. Mount CephFS on the host and then bind-mount parts of it into the container using --volume.
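That pattern might look like the following, assuming the host mounts CephFS with the kernel client (so no /dev/fuse or privileged containers are involved). The monitor address, secret file, and paths are hypothetical.

```shell
# Mount CephFS once on the CoreOS host via the kernel client.
sudo mkdir -p /mnt/cephfs
sudo mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
  -o name=admin,secretfile=/etc/ceph/admin.secret

# Each container gets only the subtree it needs, bind-mounted in;
# no privileged mode is required for the container itself.
docker run -v /mnt/cephfs/app-data:/data my-app
```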

I personally really like Ceph but it's not something you can really "set and forget".  You'll need to have someone dedicate at least a portion of their time to managing and monitoring your Ceph system, and they'll want to keep a close eye on the Ceph mailing lists.  You can avoid a lot of pain by learning from the pain of others.

--
Jeff Ollie

Seán C. McCord

Jul 1, 2015, 5:42:51 PM7/1/15
to Jeffrey Ollie, Félix Barbeira, coreos-user, fernando.n...@gmail.com
I agree with everything Jeffrey said, but I would add that the value of mixing in Ceph's RGW (RADOS Gateway; think Amazon S3) should not be understated.

Also, one of the chief focus areas for the next release of Ceph is to get CephFS production-ready.  I use it already, for various special shared data systems, but yes, it is definitely the weakest link.

It is perfectly fine to use RBD in a read-only multi-mount configuration, or as backing for various CoW volumes (as with qcow2).

In the end, though, it is always best to separate your data properly as much as possible and use the existing cluster-aware data storage systems.


John Griessen

Jul 3, 2015, 11:03:49 AM7/3/15
to coreo...@googlegroups.com
On 07/01/2015 04:42 PM, Seán C. McCord wrote:
> it is always best to properly separate your data as much as possible and use the existing cluster-aware data storage systems.

By separating, do you mean, "Keep non-shared data in per-app backups, and shared data such as databases in separate storage systems"?

Would you please tell us more about the existing cluster-aware data storage systems you know?

Seán C. McCord

Jul 3, 2015, 1:42:22 PM7/3/15
to John Griessen, coreo...@googlegroups.com
I just mean that, while Ceph and distributed filesystems allow you to ignore proper container architecture where necessary, you _should_ architect your data appropriately (according to container theory), as much as possible.

This includes such things as:
  * keep your data separated from your containers
  * use cluster-aware databases (and generally, use databases instead of file storage)
  * use database-level replication
  * use cluster-scoped backing stores

When I need cluster-wide storage, I use Ceph.  However, I will always try _first_ to construct the system in a container-oriented fashion.  Examples:
  * instead of distributing configuration files, use etcd and/or confd
  * instead of bundling static assets with a container, use S3 or RGW
  * instead of having specific database instances, write replication management components into the container's start/stop routines
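The first example above (etcd instead of distributed configuration files) can be sketched as follows, using etcd's v2 CLI. The key names and values are hypothetical.

```shell
# Push configuration into the cluster-wide etcd store instead of
# baking config files into each container.
etcdctl set /myapp/db/host "db1.example.com"
etcdctl set /myapp/db/port "5432"

# Inside a container's start script, read the values back at launch,
# so every instance picks up the current cluster-wide configuration.
DB_HOST=$(etcdctl get /myapp/db/host)
DB_PORT=$(etcdctl get /myapp/db/port)
```

A tool like confd takes this one step further by watching those keys and regenerating config files from templates when they change.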



Andrew Webber

Jul 5, 2015, 11:18:50 AM7/5/15
to coreo...@googlegroups.com
We deploy Couchbase Docker containers in a cluster as our main database. There is a community project called cbfs (Couchbase File System) that is architecturally similar to BeeGFS and Ceph, although simplified. It also provides infrastructure consolidation, because it reuses the Couchbase (memcached) databases as the "metadata databases" that all of these storage solutions require. The advantage of this is that memcached/Couchbase has a masterless database architecture, unlike other storage-based solutions.

The API for cbfs is an S3-like interface, perfect for non-legacy, startup-like applications. It's written in Go, so deployment is just copying a binary and giving it a drive to write to. Replication and, more importantly, atomically replicated transactional uploads are also supported. It's not for everyone, especially if you're looking for a traditional POSIX-style mount, but we aren't, due to our microservices architecture.