Question regarding Torus


Jelle Holtkamp

Jun 3, 2016, 10:34:41 AM
to CoreOS Dev
Hi all,

I am interested in your new product Torus. I have no experience with CoreOS yet, so I do not know the full capabilities or restrictions it has. Anyway, the blog post states that it uses 'multiple ways' to expose itself to user applications. I am looking into numerous distributed storage software solutions, which have to be able to:

  • Provide constant high speed remote storage
  • Provide highly available PXE boot capabilities for Linux clients using NFS
  • Preferably be able to provide these services in a two-node cluster

The PXE boot does not necessarily need to be extremely fast, as long as the OS can boot in a reasonable time (at most 3 minutes). So some added overhead from running NFS over a block-level distributed solution would not be a big problem, because once the OS is loaded it does not really do much besides writing data to the high speed remote storage. Of course, if the NFS storage can provide the necessary I/O, it would be perfect to use for both PXE boot and high speed remote storage. It is also a strong preference that this can be done in a two-node cluster, meaning that there is no external NFS server that mounts the storage devices and exports them to the clients.


Anyway, my question is:


Does Torus provide native NFS support, or

Would it be possible to run an NFS server on the storage nodes and use ip failover (pacemaker/corosync?) to make it highly available?


Unless Torus has built-in failover capabilities for an active/passive setup, one possible way I see is to create an active/active setup: have the storage nodes themselves mount the NBD devices, format them as OCFS2, export them over NFS on all nodes, and use corosync+pacemaker to provide a floating IP to which the clients can connect.


Looking forward to the response, as Torus seems very promising!

barak.m...@coreos.com

Jun 3, 2016, 4:54:58 PM
to CoreOS Dev
Hi!

So no, there is no native NFS support; that's a bit orthogonal. You could run NFS on the storage nodes; for booting purposes this is probably fine, but it's not high-speed storage yet. OCFS2 is a trick we haven't tried, and it wouldn't work currently: block devices get locked when mounted to prevent people from doing the wrong thing. Using something like OCFS2 is possible, but we need to do a relatively small amount of work to allow folks like yourself to relax that restriction.

Since high availability is your goal, and it's explicitly two nodes, I suppose you could use Torus as effectively a replication tool, such that if one of the hosts goes down, the other still has the data (and can serve it). There's no built in failover for the mount point -- that's more in the realm of Kubernetes or another scheduler, same with the floating IP notion. 
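To make the "replication tool" idea concrete, here is a minimal toy sketch in Python. It is not Torus code and does not use any Torus API; the `Node` and `ReplicatedVolume` names are hypothetical, and storage is just an in-memory dict. It only shows the property described above: every block is written to both hosts, so a read still succeeds after one host goes down.

```python
class Node:
    """Hypothetical storage host; blocks live in an in-memory dict."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.blocks = {}


class ReplicatedVolume:
    """Toy volume that mirrors every block onto all live nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, index, data):
        live = [n for n in self.nodes if n.up]
        if not live:
            raise IOError("no storage node available")
        for node in live:
            node.blocks[index] = data
        # Number of replicas written. A node that was down misses this
        # write and would need a re-sync step, which is omitted here.
        return len(live)

    def read(self, index):
        # Serve the block from the first live node that holds it.
        for node in self.nodes:
            if node.up and index in node.blocks:
                return node.blocks[index]
        raise IOError(f"block {index} unavailable")


node_a, node_b = Node("a"), Node("b")
volume = ReplicatedVolume([node_a, node_b])
volume.write(0, b"kernel image")
node_a.up = False          # simulate losing one host
data = volume.read(0)      # still served by the surviving host
```

What the sketch leaves out is exactly what was called out as out of scope: nothing here moves the mount point or a floating IP to the surviving host.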

I'm glad you're interested, and welcome giving it a try and finding bugs, but it's also a brand new prototype project, so don't expect too much from it. ;) PXE booting is within scope for what it might be able to help you with, though. 

--Barak

Jeffrey Ollie

Jun 3, 2016, 6:48:24 PM
to coreo...@googlegroups.com
On Fri, Jun 3, 2016 at 3:54 PM, <barak.m...@coreos.com> wrote:

I'm glad you're interested, and welcome giving it a try and finding bugs, but it's also a brand new prototype project, so don't expect too much from it. ;) PXE booting is within scope for what it might be able to help you with, though.

So how does this compare to Ceph? From what I've read so far it looks like Torus is competing feature-wise with a Ceph cluster that's providing RADOS block devices. Just curious where Torus is going to differentiate itself.

--
Jeff Ollie

Seán C. McCord

Jun 4, 2016, 10:30:37 AM
to coreo...@googlegroups.com
Ceph is a much larger, more generalized project. It was designed, first, to provide highly scalable storage on commodity hardware. In more recent years, it has tried to position itself as a software-defined storage solution. In a number of ways, it is. However, it was built back in the days of real, physical, defined and deployed everything. That doesn't translate well into the modern cluster era where everything is dynamic. It defines components primarily by IP address. Its clustering technology is old, very complicated, and hopelessly intertwined with the storage stack as a whole. This is not to say it doesn't work or is a bad solution... it does work, and it runs in many truly enormous, production-grade clusters. Its main problem is that it just doesn't fit into a modern, container-oriented, cloud-native, dynamic cluster.

Torus is, well, brand new. It comes at the problem from a different perspective, and with different tools. For one, it leverages the clustering tools of etcd. It cannot be overstated how important this abstraction is. etcd _handles_ consistency. It _handles_ locking. It handles these features in a way that only truly specialized pieces of software can do. It makes a huge swathe of logic unnecessary, because it is already implemented as primitives in etcd. That means what is left is the business logic of storing bits in ways which are accessible, redundant, scalable, and performant. While this is no small feat, being able to assume and trust the consistency of your metadata makes the storage logic much easier. It allows abstracted and complex storage profiles to be overlaid onto a common set of manipulations. It allows endpoints and stores themselves to be abstracted and proxied (Ceph does this, too, but they had to roll their own, and as a result, it is not nearly so flexible).
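As a rough illustration of why consistent primitives shrink the storage logic, here is a toy sketch in Python. `ToyKV` is an in-memory stand-in for a consistent key-value store; it is not the etcd client API, and the `/locks/...` key layout and the `mount`/`unmount` helpers are hypothetical, not how Torus actually stores its metadata. Given an atomic "create only if absent" operation, the mount lock on a block volume mentioned earlier in the thread takes only a few lines.

```python
import threading


class ToyKV:
    """In-memory stand-in for a consistent key-value store (not etcd's API)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def put_if_absent(self, key, value):
        # Atomic create: succeeds only if the key does not exist yet.
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete_if_equal(self, key, value):
        # Atomic compare-and-delete: only the current holder can release.
        with self._guard:
            if self._data.get(key) != value:
                return False
            del self._data[key]
            return True


def mount(kv, volume, host):
    """Try to take the mount lock; False means another host holds it."""
    return kv.put_if_absent(f"/locks/{volume}", host)


def unmount(kv, volume, host):
    """Release the mount lock if this host is the holder."""
    return kv.delete_if_equal(f"/locks/{volume}", host)


store = ToyKV()
first = mount(store, "vol1", "node-a")    # node-a takes the lock
second = mount(store, "vol1", "node-b")   # node-b is refused
```

In a real cluster the store's consensus protocol provides the atomicity that the `threading.Lock` fakes here, and a lease or TTL would be needed so that a crashed host does not hold the lock forever.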

There are many other differences in the details, but primarily, while they're both providing scalable, redundant block storage, Torus is built for the modern cluster.


--
Seán C McCord
CyCore Systems, Inc

Jelle Holtkamp

Jun 8, 2016, 5:56:25 AM
to CoreOS Dev
Thank you for your replies. We actually use Kubernetes as well and have another project pending for using distributed storage for that, so I was hoping that maybe we could kill two birds with one stone. I understand high availability NFS is outside of the scope of Torus (for now at least), but Torus still looks very promising for Kubernetes so we might still end up using it in the near future.

Regards