Has anyone used ceph or tahoe-lafs with one of these storage pods?


Charles Smith

unread,
May 25, 2012, 5:51:07 PM
to OpenStoragePod
http://ceph.com/
https://tahoe-lafs.org/trac/tahoe-lafs

Would love to hear about any implementations of any other distributed
storage systems deployed with these pods.

Thanks in advance.

John wallace

unread,
Mar 26, 2013, 2:36:06 PM
to opensto...@googlegroups.com
Hi Charles,

We are looking at using Ceph this year and have been looking at the storage pods.  We've been running a distributed file system (AFS) for 20+ years, and one of the issues with storage nodes is that you want them to be relatively small compared to your total storage: from time to time storage nodes either go offline or are taken down for updates, and when that happens the data needs to be moved (or restored) to another node.  So for distributed file systems, more smaller nodes work better than fewer large nodes.
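
To put rough numbers on that, here is a back-of-the-envelope drain-time calculation. The node sizes, link speeds and 70% link efficiency below are purely illustrative assumptions, not figures from our environment:

# Back-of-the-envelope drain time for a storage node. All figures here are
# illustrative assumptions (node sizes, link speeds, 70% link efficiency).

def drain_hours(node_tb: float, link_gbit: float, efficiency: float = 0.7) -> float:
    """Hours to move node_tb TB off a node over a link_gbit Gbit/s link."""
    bytes_per_sec = link_gbit * 1e9 / 8 * efficiency
    return node_tb * 1e12 / bytes_per_sec / 3600

for node_tb in (10, 45, 67):          # hypothetical node sizes in TB
    for link_gbit in (1, 10):         # GbE vs 10 GbE uplink
        print(f"{node_tb:3d} TB node over {link_gbit:2d} Gbit/s: "
              f"{drain_hours(node_tb, link_gbit):6.1f} h to evacuate")

A fully loaded pod on a single GbE uplink takes on the order of a week to evacuate, which is why smaller nodes (or fatter uplinks) make maintenance much less painful.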

Let me know how your project worked out.

maximilie...@gmail.com

unread,
Jun 16, 2013, 5:43:37 PM
to opensto...@googlegroups.com
We're also looking at using Ceph in a few weeks. Did you end up testing Ceph, or anything else?
The software side seems like the biggest problem with these pods to me. I'd like to hear about it.

Michael Sage

unread,
Jun 17, 2013, 1:49:43 AM
to opensto...@googlegroups.com

We used Gluster on ours; it worked really well.

Michael


Hareem Haque

unread,
Jun 18, 2013, 9:48:11 AM
to opensto...@googlegroups.com
Hi Michael,

Using GlusterFS, how did you manage to handle failover and recovery?

jason google

unread,
Jun 18, 2013, 6:58:30 PM
to opensto...@googlegroups.com

G'day Michael,

And further to that, do you use GlusterFS across multiple pods so that you can deal with the loss of a whole pod, or only with the loss of individual volumes created within a pod?

As per a previous comment, I've been thinking about how to deal with losing a 'large' chunk of storage in one hit, where you have to take a whole pod down.

regards,

-jason

Michael Sage

unread,
Jun 19, 2013, 1:55:43 AM
to opensto...@googlegroups.com

It handles the data in "blocks", so we just made sure that enough of them were running on different pods.
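
For anyone curious what that looks like in practice, here is a minimal sketch. The hostnames, brick paths and volume name are made up, and it just drives the stock gluster CLI:

# Sketch: a replica-2 GlusterFS volume whose two bricks sit on different pods,
# so losing a whole pod still leaves a complete copy on the other one.
# The hostnames, brick paths and volume name are hypothetical.
import subprocess

bricks = ["pod1:/export/brick1", "pod2:/export/brick1"]

subprocess.run(["gluster", "volume", "create", "vmstore",
                "replica", "2", *bricks], check=True)
subprocess.run(["gluster", "volume", "start", "vmstore"], check=True)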

Kent Kovac

unread,
Jun 26, 2013, 2:05:59 PM
to opensto...@googlegroups.com
I use ZFS/AoE/S.M.A.R.T. to run my pods.  Easy, and it supports our mini-cloud architecture nicely.  We have between 10 and 50 users on the cloud and have no issues with I/O over the network to the pods.  In point of fact, we use eight 1 GbE connections per box, trunked, into our cloud.
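
If it helps to picture the export side, it's roughly this per exported volume. This is a sketch only; the pool, zvol, shelf/slot and NIC names are stand-ins, using the plain vblade/vbladed tools from aoetools:

# Sketch: carve a zvol out of a ZFS pool and export it over AoE with vblade.
# Pool/zvol names, shelf/slot numbers and the NIC are hypothetical.
import subprocess

POOL, ZVOL, SIZE = "tank", "vm01", "1T"
SHELF, SLOT, NIC = "0", "1", "eth0"

subprocess.run(["zfs", "create", "-V", SIZE, f"{POOL}/{ZVOL}"], check=True)
# vbladed daemonizes vblade; initiators then see the target as e0.1
subprocess.run(["vbladed", SHELF, SLOT, NIC, f"/dev/zvol/{POOL}/{ZVOL}"],
               check=True)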

Kent

Ouroboros

unread,
Jun 26, 2013, 9:41:14 PM
to opensto...@googlegroups.com
Out of curiosity, what do you have running on your pods, and roughly what is the overall storage layout for your cloud system? Are you running FreeBSD, an OpenSolaris flavor, or Linux with the separate ZFS kernel driver, or are you simply dealing out AoE from the pods and using a ZFS front end?


Kent Kovac

unread,
Jun 27, 2013, 3:15:35 PM
to opensto...@googlegroups.com
I have provisioned the datastore into five ZFS pools of 8 + 1 drives each.  This setup gives me ~11 TB per slice and ~55 TB total.  That is less than the capacity Backblaze advertises, but I am using 1.5 TB drives and am more concerned with long-term storage.

I export each of the ZFS pools to the cloud stack and then run a mirror between two pods on the VM host.  This way I am duplicating my data (not running any dedup, as everything I have is image-based and unique) and reads can be pulled from both servers.  The OS is a CentOS 6.4 minimal install with AoE and ZFS added.  It runs on a private network with a complete iptables lockout except for SSH.

I run smartd against each drive: quick, high-level checks at roughly 30-minute intervals and a low-level check each night.  Failing disks are marked degraded and ZFS rebuilds onto the hot-spare drive.  I am planning on adding some SSDs to speed up ZFS and dropping in 64 GB of RAM (for a final total).  We also speed things up by putting our archival data on the slower inner part of the HDD platters and letting the high-use data sit on the faster outer part.
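
If anyone wants to check the arithmetic, it works out roughly like this (ZFS metadata overhead ignored, so real usable space is a bit lower):

# Rough capacity math for five 8 + 1 RAIDZ vdevs of 1.5 TB drives.
# Ignores ZFS metadata overhead, so real usable space is a bit lower.
DRIVE_TB = 1.5        # marketing terabytes (1e12 bytes) per drive
DATA_DRIVES = 8       # 8 data drives + 1 parity drive per pool
POOLS = 5

per_pool_tib = DATA_DRIVES * DRIVE_TB * 1e12 / 2**40
print(f"per pool: ~{per_pool_tib:.1f} TiB")          # ~10.9, the '~11TB per slice'
print(f"total:    ~{POOLS * per_pool_tib:.1f} TiB")  # ~54.6, the '~55TB total'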

jason google

unread,
Jun 27, 2013, 6:46:36 PM
to opensto...@googlegroups.com
G"day Kent,

Do you have any hard performance numbers you can share on this configuration ?

iozone or bonnie++ stuff or even simply 'dd' numbers with a variety of block sizes to establish streaming performance.
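
Something as simple as this would do for a first pass. It's just a rough sketch that drives dd with O_DIRECT at a few block sizes; the target path is a placeholder:

# Rough streaming-write sweep: drive dd with O_DIRECT at several block sizes
# and report MB/s. The target path is a placeholder; note that /dev/zero data
# compresses away if the pool has compression enabled, so treat the numbers
# as ballpark only.
import os
import subprocess
import time

TEST_FILE = "/mnt/pod/ddtest.bin"    # somewhere on the storage being tested
TOTAL_MB = 4096                      # write roughly 4 GiB per run

for bs_kb in (64, 256, 1024, 4096):
    count = TOTAL_MB * 1024 // bs_kb
    start = time.monotonic()
    subprocess.run(["dd", "if=/dev/zero", f"of={TEST_FILE}",
                    f"bs={bs_kb}k", f"count={count}", "oflag=direct"],
                   check=True, stderr=subprocess.DEVNULL)
    elapsed = time.monotonic() - start
    print(f"bs={bs_kb:5d}k  {TOTAL_MB / elapsed:8.1f} MB/s")
    os.remove(TEST_FILE)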

(or if someone else has better ideas on how to do performance metrics?)

For example, it would be really interesting to know whether going to 64 GB of RAM (because of ZFS?) makes a bigger difference than using SSDs for your L2ARC and/or ZIL.

I've always wondered as well: do people share one SSD between L2ARC and ZIL, or put in one for each to separate out the I/O profiles?

Why AoE instead of iSCSI, by the way? Just better latency from your OSP?

regards,

-jason

Ouroboros

unread,
Jun 27, 2013, 10:27:53 PM
to opensto...@googlegroups.com
My environment uses the free ESXi 5.1 with iSCSI, and I don't mind using a VM-based measurement, so I use the VMware IO Analyzer OVA appliance, which is basically IOmeter packaged up nicely. I had originally wanted to build a proper pod for the environment, but the stars didn't line up to make it happen.

I am now using a single Supermicro 847 chassis loaded with Nexenta (an OpenSolaris variant) which uses ZFS, exporting iSCSI over 4x 1 Gbit with SSD ZIL/L2ARC and a single pool of 12 disks (2x RAIDZ2). Nexenta has plugin packages for IOmeter and bonnie++ internally, but since this is network storage I would rather see the performance at the VM. The VMware analyzer appliance is clocking a max of 60K read/write IOPS, though to be fair, there is some VAAI acceleration along with iSCSI ATS command support improving performance. The new 847D chassis (OEM only, unfortunately, and single-path SAS at that) that holds 72 disks in 4U seems to be a monster, but I bet it runs really hot.

While ZFS loves main memory, it never hurts to have good, internal-capacitor-backed SSDs for the ZIL if you are on a ZFS version high enough to allow removing a log device (use enterprise SATA/SAS SSDs, and even then read the spec sheet for power-protection info); a dedicated ZIL speeds up and smooths out the HDD writes. L2ARC is usually very helpful provided you can hold its mapping info fully in main memory (if you can't, performance can drop). As a general rule you shouldn't combine ZIL and L2ARC on the same SSD via partitions; dedicate a whole device to each. (If you were going to partition, partition an SSD for multiples of the same usage type: say, two SSDs, with multiple ZIL slices on one device and multiple L2ARC slices on the other.) Using whole devices typically means at least two SSDs per ZFS pool, though you can specialize the SSD types: smaller/faster SSDs for ZIL, typically 40 GB or less, and bigger/slower/non-enterprise SSDs for L2ARC, typically 200 GB and up.
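
Concretely, dedicating whole devices looks something like this; the pool name and the /dev/disk/by-id paths are placeholders:

# Sketch: attach a dedicated log (ZIL/SLOG) device and a dedicated cache
# (L2ARC) device to an existing pool. The pool name and device paths are
# placeholders; use power-loss-protected SSDs for the log device.
import subprocess

POOL = "tank"
LOG_DEV = "/dev/disk/by-id/ata-SMALL_FAST_SSD"     # small, fast, capacitor-backed
CACHE_DEV = "/dev/disk/by-id/ata-BIG_CHEAP_SSD"    # larger SSD for read cache

subprocess.run(["zpool", "add", POOL, "log", LOG_DEV], check=True)
subprocess.run(["zpool", "add", POOL, "cache", CACHE_DEV], check=True)
# A log device can later be removed again on pool versions that support it:
#   zpool remove tank /dev/disk/by-id/ata-SMALL_FAST_SSD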

I can see the merit of AoE export to the hypervisors, which RAID mirror the AoE feeds, as an alternative to trying to run HA clustering software on the pods themselves. ESXi doesn't natively have the capability of RAID mirroring network storage at the hypervisor, and there is no generic AoE driver (Coraid has an AoE driver but that is tied to their HBA network cards).
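
On a Linux VM host (not ESXi, for the reasons above) the mirroring half of that is straightforward. A sketch, with the AoE target names below being hypothetical shelf.slot values, one exported from each pod:

# Sketch: mirror two AoE targets (one from each pod) at the Linux VM host
# with mdadm, so losing a whole pod degrades the mirror instead of the data.
# The AoE device names below (shelf.slot) are hypothetical.
import subprocess

AOE_FROM_POD1 = "/dev/etherd/e0.1"
AOE_FROM_POD2 = "/dev/etherd/e1.1"

subprocess.run(["mdadm", "--create", "/dev/md0", "--level=1",
                "--raid-devices=2", AOE_FROM_POD1, AOE_FROM_POD2],
               check=True)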

