Coming from OSX software raid...

kareldc

Sep 2, 2012, 3:18:56 PM

to zfs-...@googlegroups.com

Hi,

I'm interested in adopting ZFS but I had a few questions.

Right now I have a Hackintosh and I've set up a pair of HDs as a mirror with the OS X software RAID. However, I have had a few bad experiences... apparently when you move the drives to a different SATA port, your mirror RAID volume gets degraded. One of the drives kinda gets lost, apparently.

Can anyone tell me if ZFS is better in that respect? Can I pop drives in and out, grow RAID structures, and move them around?

I don't intend to move them around, but the native OS X software RAID just seems way too fragile. It would be nice to be able to do some maintenance without nerve-wracking fear.

Also, should I do a clean install of the OS, do I just need to reinstall ZFS to get things running again? Is it that simple?

Cheers,

K.

Daniel Bethe

Sep 2, 2012, 3:29:03 PM

to zfs-...@googlegroups.com

Hello and welcome! Yes you are absolutely correct -- all your wildest dreams are about to come true....

....and more!

:) ZFS maintains a unique ID per device, so they won't even get lost if you pack one up inside a USB enclosure! :-o But don't do that unless absolutely forced ;) They also won't get lost if you put them in another machine, even one running another OS.

Yeah just reinstall ZFS and kaboom, there you are. I have it installed on my Mac OS installer USB stick. Just be sure to follow the Getting Started guide if you want them to automatically load upon boot. I am wondering if we can optimize that guide using the 'gpt' command but we'll see. And read the FAQ and the rest of the wiki if you like.

Please let us know how it goes and what your system specs are.

Boyd Waters

Sep 2, 2012, 3:38:26 PM

to zfs-...@googlegroups.com

Important caveat:

ZFS doesn't let you add devices to an existing "RAID" unless you follow some rules.

Or at least, that used to be a restriction. Before I get into the rules that I know about, can someone with current Mac ZFS experience chime in? Can you expand a RAIDZ1 to RAIDZ2, and then on to adding more storage to an existing RAIDZ?

The rule that I follow, for my concatenation of mirrored 2TB drives, is that I expand the pool with same-sized mirrored pairs. (I think that this policy of mine is more restrictive than what ZFS requires.)

Sorry if this is confusing.

But I do recall running into some limits when I wanted to go RAIDZ, and then later wanted to grab that disk drive on sale and slap it in there.

ZFS is way more capable than Apple's "RAID", but less flexible than (for example) a Drobo or ReadyNAS. Or previous versions of Windows Home Server.

Daniel Bethe

Sep 2, 2012, 3:57:32 PM

to zfs-...@googlegroups.com

>Or at least, that used to be a restriction. Before I get into the rules that I know about, can someone with current Mac ZFS experience chime in? Can you expand a RAIDZ1 to RAIDZ2, and then on to adding more storage to an existing RAIDZ?

As with any ZFS implementation, no you can't change a raidz1 to raidz2, but you can concatenate any devices. You could take that raidz1 and add any other device -- most sensibly, another equivalent raidz1. Then they're two, striped together, as one, automagically.

>The rule that I follow, for my concatenation of mirrored 2TB drives, is that I expand the pool with same-sized mirrored pairs. (I think that this policy of mine is more restrictive than what ZFS requires.)

Yeah but it'll make them into effectively same-sized pairs. The smallest size prevails.

>But I do recall running into some limits when I wanted to go RAIDZ, and then later wanted to grab that disk drive on sale and slap it in there.

Sadly that is true.

>ZFS is way more capable than Apple's "RAID", but less flexible than (for example) a Drobo or ReadyNAS. Or previous versions of Windows Home Server.

Yeah, that is true, unless you count it as flexibility that an incident of data corruption on ZFS doesn't necessarily force you to restore from backups or send your array in to a professional data recovery service. Some people have different ideas of flexibility, but yeah, most of us do wish that we could just toss around whatever resources we feel like. The word from the upstream ZFS engineers is that they roughly know how to do it, but it's really hard.

Björn Kahl

Sep 2, 2012, 4:27:27 PM

to zfs-...@googlegroups.com

Hi kareldc,

welcome to maczfs!

On 02.09.12 21:18, kareldc wrote:

> Hi,
>
> I'm interested in adopting ZFS but I had a few questions.
>
> Right now I have a Hackintosh and I've set up a pair of HDs as a mirror
> with the OS X software RAID. However, I have had a few bad experiences...
> apparently when you move the drives to a different SATA port, your mirror
> RAID volume gets degraded. One of the drives kinda gets lost, apparently.
>
> Can anyone tell me if ZFS is better in that respect? Can I pop drives in and
> out, grow RAID structures, and move them around?

Yes and no.

ZFS organizes drives in so-called vdevs ("Virtual Devices"). You can
pop drives in and out of top-level vdevs *if* the vdev is a redundant
one. *But* there is absolutely no way to ever remove a top-level vdev
from a ZFS pool. It is plain impossible.

So *please* be very careful what you type when you add drives. It is
way too easy to accidentally add another striped top-level vdev,
instead of adding the drive to an existing mirror or making two new
drives into a new top-level mirror, and that vdev then cannot be removed
again without a backup-destroy-restore cycle.

(However, you can promote an accidentally added single-drive top-level
vdev to a mirror, to at least have redundancy. And you can exchange
the accidentally added disk with another one of same or bigger size)

> I don't intend to move them around, but the native OS X software RAID just
> seems way too fragile. It would be nice to be able to do
> some maintenance without nerve-wracking fear.
>
> Also, should I do a clean install of the OS, do I just need to reinstall ZFS
> to get things running again? Is it that simple?

No need to reinstall the OS, just grab one of our installers and you
are set. Strictly speaking, it is not even necessary to reboot after
install, but it is still recommended for safety. It is also
quite simple to uninstall, again without a reboot being strictly required.

Best

Björn

--
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |

Richard Elling

Sep 2, 2012, 4:40:35 PM

to zfs-...@googlegroups.com

On Sep 2, 2012, at 1:27 PM, Björn Kahl <googl...@bjoern-kahl.de> wrote:

> Hi kareldc,
>
> welcome to maczfs!
>
> On 02.09.12 21:18, kareldc wrote:
>> Hi,
>>
>> I'm interested in adopting ZFS but I had a few questions.
>>
>> Right now I have a Hackintosh and I've set up a pair of HDs as a mirror
>> with the OS X software RAID. However, I have had a few bad experiences...
>> apparently when you move the drives to a different SATA port, your mirror
>> RAID volume gets degraded. One of the drives kinda gets lost, apparently.
>>
>> Can anyone tell me if ZFS is better in that respect? Can I pop drives in and
>> out, grow RAID structures, and move them around?
>
> Yes and no.
>
> ZFS organizes drives in so-called vdevs ("Virtual Devices"). You can
> pop drives in and out of top-level vdevs *if* the vdev is a redundant
> one. *But* there is absolutely no way to ever remove a top-level vdev
> from a ZFS pool. It is plain impossible.

This, as written, is not the correct use of the terminology. Logs, spares, and
cache devices are also top-level vdevs that can be added to or removed
from a pool.

A more precise description is: you cannot remove a top-level vdev from the
dynamic stripe of the main pool.

> So *please* be very careful what you type when you add drives. It is
> way too easy to accidentally add another striped top-level vdev,
> instead of adding the drive to an existing mirror or making two new
> drives into a new top-level mirror, and that vdev then cannot be removed
> again without a backup-destroy-restore cycle.

This is the definition of "add." The risk is that you will use "add" when you
meant "attach." I blame the English language for part of this confusion ;-).

-- richard

> (However, you can promote an accidentally added single-drive top-level
> vdev to a mirror, to at least have redundancy. And you can exchange
> the accidentally added disk with another one of same or bigger size)
>
>> I don't intend to move them around, but the native OS X software RAID just
>> seems way too fragile. It would be nice to be able to do
>> some maintenance without nerve-wracking fear.
>>
>> Also, should I do a clean install of the OS, do I just need to reinstall ZFS
>> to get things running again? Is it that simple?
>
> No need to reinstall the OS, just grab one of our installers and you
> are set. Strictly speaking, it is not even necessary to reboot after
> install, but it is still recommended for safety. It is also
> quite simple to uninstall, again without a reboot being strictly required.
>
> Best
>
>    Björn
>
> --
> | Bjoern Kahl +++ Siegburg +++ Germany |
> | "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
> | Languages: German, English, Ancient Latin (a bit :-)) |

--

illumos Day & ZFS Day, Oct 1-2, 2012 San Francisco

www.zfsday.com

Richard...@RichardElling.com

+1-760-896-4422

Daniel Bethe

Sep 2, 2012, 4:42:12 PM

to zfs-...@googlegroups.com

>> Can anyone tell me if ZFS is better in that respect? Can I pop drives in and
>> out, grow RAID structures, and move them around?
>
> Yes and no.

Oh goodness, I misread that. You were talking about moving your hard drives around, and yes you can do that. But like everyone has said, you can't arbitrarily restructure.

You can plan forward strategies. Like I said, you can expand size by concatenating a new vdev. But you ideally don't want to lose redundancy that way. You ideally wouldn't want to add a raidz1 to a raidz2 to a mirror, just randomly. But if you can plan ahead, you could start a 4x1TB raidz1 now (3TB usable) and then add another 4x1TB raidz1 later (6TB usable). And so on. Or you could buy two drives at a time, adding them as mirrors, and that would be totally redundant and much faster.
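The capacity arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the function names are made up for illustration, metadata and formatting overhead are ignored, and it assumes the "smallest size prevails" rule for every vdev type.

```python
def usable_tb(kind, disk_sizes_tb):
    """Rough usable capacity of one vdev, ignoring metadata overhead."""
    smallest = min(disk_sizes_tb)   # the smallest member sets the size for all
    if kind == "mirror":
        return smallest             # every member holds the same data
    parity = {"raidz1": 1, "raidz2": 2}[kind]
    return smallest * (len(disk_sizes_tb) - parity)


def pool_usable_tb(vdevs):
    """Top-level vdevs are striped together, so their capacities add up."""
    return sum(usable_tb(kind, sizes) for kind, sizes in vdevs)


pool = [("raidz1", [1, 1, 1, 1])]       # 4x1TB raidz1 now
print(pool_usable_tb(pool))             # 3
pool.append(("raidz1", [1, 1, 1, 1]))   # a second 4x1TB raidz1 later
print(pool_usable_tb(pool))             # 6
```

The same function shows why mismatched mirror pairs waste space: `usable_tb("mirror", [2, 3])` comes out as 2.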

You could do the old switcheroo, by building a raidz now and then later reclaiming your parity drive(s) for temporary storage as you copy your data to additional drives and then reassimilate the parity back in. I've done that.

These are generic ZFS concepts, so you can google around for them. And you can experiment using plain files in place of real disks.
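For the file-based experiments, something like the following Python sketch could set up a playground. The file names, the 128 MiB size, and the pool name `testpool` are all assumptions for illustration, and the script only prints a `zpool create` command rather than running it, so nothing touches a real pool.

```python
import os
import tempfile


def make_backing_files(directory, count, size_bytes):
    """Create `count` empty files to stand in for disks; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"vdev{i}.img")
        with open(path, "wb") as f:
            f.truncate(size_bytes)  # extend the new file to the requested size
        paths.append(path)
    return paths


workdir = tempfile.mkdtemp(prefix="zfs-playground-")
files = make_backing_files(workdir, 4, 128 * 1024 * 1024)

# Printed only, never executed here. Check it, then run it yourself if it looks right.
print("zpool create testpool raidz1 " + " ".join(files))
```

File-backed pools are for learning and testing only. When you are done, destroy the test pool and delete the directory.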

You're limited by your ability to plan and purchase, and by the saturation of your surrounding hardware resources -- number of ports, speed of controllers, etc. But if you're building a custom Mac clone, you're probably intending to be superior on those fronts compared to any Apple hardware anyway!

If you find any web sites which discuss such forward capacity planning strategies, let me know and I'll put em in the FAQ.

Gregg Wonderly

Sep 2, 2012, 6:30:18 PM

to zfs-...@googlegroups.com

The important thing is to make sure you really think about "add" vs "attach". Also, if you start building with "pairs" of disks, mirrored as vdevs in a single pool, you can slowly extend your space. You can grow the size of the disks, pair by pair, to expand your space, and that gives you a lot of the behavior of the Drobo overall, though it is not quite as flexible in what you can swap in.
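To make the "add" vs "attach" difference concrete, here is a toy Python model. It is not ZFS code: a pool is just a list of top-level vdevs, each vdev is a list of disks that mirror one another, and the disk names are invented. `add` mimics what `zpool add` does to the layout and `attach` mimics `zpool attach`.

```python
def add(pool, *devices):
    """Like 'zpool add': the devices become a NEW top-level vdev in the stripe."""
    return pool + [list(devices)]


def attach(pool, existing, new):
    """Like 'zpool attach': `new` joins the vdev that already holds `existing`."""
    return [vdev + [new] if existing in vdev else vdev for vdev in pool]


pool = [["disk0", "disk1"]]             # one two-way mirror

print(attach(pool, "disk0", "disk2"))   # [['disk0', 'disk1', 'disk2']]
print(add(pool, "disk2"))               # [['disk0', 'disk1'], ['disk2']]
```

The second result is the accident to avoid: `disk2` is now an unmirrored top-level vdev that cannot be taken out of the stripe. As mentioned earlier in the thread, the damage can at least be limited by attaching a partner to it, which in this model is `attach(add(pool, "disk2"), "disk2", "disk3")`.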

The most important thing to understand about Drobo is that it does not do media validation as ZFS does, and the end result is that when tracks become marginal on one disk and you lose another disk completely, you are SOL.

With ZFS, you do get feedback from scrub that things are going bad, and you will get repairs to data that is backed by recoverable media. In the case of mirrored pairs, the other disk can usually provide the correct version.

There are lots of things that could be done differently in ZFS to make it feel a lot like Drobo in terms of flexible vdev add, resize and remove. But even without that, for data integrity, ZFS still provides the best protection you can get on disk drives, it seems to me.

Gregg

Raoul Callaghan

Sep 2, 2012, 6:59:18 PM

to zfs-...@googlegroups.com

Hi guys,

Are there any zfs.sysctl parameters that can be tuned like there are for other BSD distros? (I'm assuming they exist in the Solaris src as well)

eg:

parameter: sysctl vfs.zfs.txg.write_limit_override=n
description:
You tune this variable to the max MB/s your HD can handle. This way there are a lot fewer write stalls.
Before that you would tune the txg wait time to 4s or 5s, but that is only convenient if you're writing at full speed.
So now the txg gets written when a. 30 sec have passed or b. when x amount of data is in the txg (x = the max your hard disk can handle)

(source: http://forums.freebsd.org/showthread.php?t=15647)

I've also seen things for BSD like:

If you use AHCI (have loaded ahci.ko and drives are recognized as using AHCI):
/boot/loader.conf
Code:
vfs.zfs.vdev.min_pending=1 #default=4
vfs.zfs.vdev.max_pending=1 #default = 35

If you don't use AHCI (didn't load ahci.ko and/or the drives are not recognized as using AHCI):
/boot/loader.conf
Code:
vfs.zfs.vdev.min_pending=4 #default=4
vfs.zfs.vdev.max_pending=8 #default = 35

(source: http://forums.freebsd.org/showthread.php?t=16445)

Then there's stuff out there about NCQ and TCQ

(source: http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/)

If this facility to tune parameters after installing ZFS does not exist, does this mean that such variables are hard-coded in our binaries?

If so, does anyone know how many there are in the code that are hardcoded?

Like postgres, do you think we should craft a sysctl.conf file to help macZFS along?

I understand that playing with ZFS' internals is not necessarily recommended, (http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance#eviltuning) but surely something can be mentioned about this?

Time to grab the src myself I think, but I'm not 100% sure what I'm actually searching for... 8(

Cheers,

Raoul.

Björn Kahl

Sep 2, 2012, 7:34:26 PM

to zfs-...@googlegroups.com

On 03.09.12 00:59, Raoul Callaghan wrote:

> Hi guys,
>
> Are there any zfs.sysctl parameters that can be tuned like there are for other BSD distros? (I'm assuming they exist in the Solaris src as well)
>
> eg:
> parameter: sysctl vfs.zfs.txg.write_limit_override=n
> description:
> You tune this variable to the max MB/s your HD can handle. This way there are a lot fewer write stalls.
> Before that you would tune the txg wait time to 4s or 5s, but that is only convenient if you're writing at full speed.
> So now the txg gets written when a. 30 sec have passed or b. when x amount of data is in the txg (x = the max your hard disk can handle)
>
> (source: http://forums.freebsd.org/showthread.php?t=15647)

Not really.

Please keep in mind, our ZFS is much older than the FreeBSD version.
So tunable parameters that came into existence in later versions do not
exist in our pool version 8 code.

> If this facility to tune parameters after installing ZFS does not exist, does this mean that such variables are hard-coded in our binaries?

We do have a sysctl interface, but it is currently not really used.
You can find it in usr/src/uts/common/fs/zfs/zfs_vfsops.c

> If so, does anyone know how many there are in the code that are hardcoded?

The txg sync time is hard coded in usr/src/uts/common/fs/zfs/txg.c

> Time to grab the src myself I think, but I'm not 100% sure what I'm actually searching for... 8(

Raoul Callaghan

Sep 2, 2012, 8:33:23 PM

to zfs-...@googlegroups.com

Thanks Björn,

So one would expect ZEVO's efforts to have these parameters considering it got to build 28?

Alex Bowden

Sep 3, 2012, 3:58:43 AM

to zfs-...@googlegroups.com

A number of issues are getting confused here.

The graphical Disk Utility application should NEVER be used to set up MacOS's own software raid. It is an absolute disaster and has been for many years. They should be embarrassed. However the underlying software raid is ROCK SOLID. Just use the diskutil command line facilities 'appleRAID' options.

This has the advantage that it works perfectly well to boot from.

Note that from Lion onward, the auto-installed recovery disk option is only available when installing to raw disks. And maybe the Mac Pro RAID Card. But it's easy enough to build a recovery disk.

Moving a SATA disk about breaks your mirror. That is not so surprising, as SATA disks don't have unique IDs, but it is not a flaw of the software mirroring itself: moving disks around works fine with software mirroring on FCAL disks, which do have built-in unique IDs. ZFS adds virtual unique IDs to the data stored on the disk, so the two halves of a mirror are no longer truly identical copies.

Truly ZFS is a great improvement. ZFS is a brilliant concept. ZFS is ROCK SOLID. MacZFS Hmmm. Are you expecting production ready? Are you wanting confidence that it won't break with each change to MacOS? Are you needing the latest features of ZFS or MacOS on a "same millennium" basis?

You are talking about a small group of enthusiastic amateurs with attitude, swimming against the tide.

The easy way is to just run Solaris on your server. Solaris has been bootable from ZFS for years.

Boot up Solaris from the Solaris live distribution DVD. It boots from a single-disk degraded ZFS mirror. Add a couple of hard disks to the zpool. Then drop the DVD.

Et voilà. Solaris installed and running without a single reboot.

Not actually the best way to install Solaris, but a brilliant demonstration of the power of ZFS.

Jason

Sep 3, 2012, 8:21:38 AM

to zfs-...@googlegroups.com

Gregg is so right on the points about Drobo; I tested these against ZFS for some of my clients. Did the ole 'yank out the drive' test, so funny watching people's faces turn to masks of horror. Drobo doesn't seem to recover well from several common issues; ZFS really does, and it has saved a lot of data over the last few years for clients.

Jason
Sent from my iPad

Jason

Sep 3, 2012, 8:28:14 AM

to zfs-...@googlegroups.com

Yes, use the command line. I have several OS X servers booting from dual-SSD RAID (striped). If you clone the Recovery partition before beginning and test it, you can use it easily. I just partition each disk with an extra 1GB partition after the EFI and Main partitions. Once the RAID is built from the Main partitions, you can restore the Recovery to either or both 1GB partitions. Then all data, including user space, goes to ZFS pools connected through eSATA or Thunderbolt, depending on the age of the system.

Jason
Sent from my iPad

Boyd Waters

Sep 3, 2012, 4:32:05 PM

to zfs-...@googlegroups.com

Perhaps I'm belaboring the point, but:

I use a Solaris 11 Express installation on some of the cheapest hardware I could find on NewEgg -- a "consumer" AMD CPU and motherboard, additional Silicon Image 3132 PCIe x1 SATA controllers ($14 each), 6 Seagate "Green" 2TB SATA drives. Slightly more expensive power supply, $75 and a case with way too many cooling fans...

3 mirrored pairs; one side of each mirror is on the motherboard's SATA controller, while each of the remaining mirrored disks is on its own Sil3132 SATA controller. (This was an inexpensive experiment of mine, to see if this would improve performance or reliability. Not really. These cheap controllers could not keep up if they were asked to push more than one hard disk. Don't do this.)

I built this thing four (!) years ago for a total cost of $500 shipped besides the hard disks.

Performance is horrible. :-)

BUT -- those disks, which keep spinning at constant speed and temperature, haven't fared too well. Four of the original six have gone bad. Don't use "Green" drives in a NAS.

Here's the punchline: in every case, ZFS *told* me that the disk was starting to die. Since ZFS checksums every block that it writes and verifies that checksum whenever the block is read back, it could tell when something didn't add up.

Bad blocks on a drive were repaired by verifying the block checksum with the still-good mirror of the pair. After a huge number of such errors, ZFS faulted the drive, removed it from the array, and I got an email on my iPhone.
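The repair-from-the-good-mirror idea can be illustrated with a toy Python sketch. This is not how ZFS is implemented: SHA-256 stands in for whatever checksum the pool actually uses, the "disks" are just byte strings in a list, and the function names are invented. The only point is the logic: the checksum is kept apart from the data, a copy that fails verification is never trusted, and a copy that passes is used to overwrite the bad one.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_block(copies, expected):
    """Return verified data plus the number of bad copies repaired in place."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("every copy failed checksum verification")
    repaired = 0
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good        # rewrite the bad copy from the good one
            repaired += 1
    return good, repaired


block = b"important data"
expected = checksum(block)              # recorded when the block was written
mirror = [b"importent data", block]     # the first copy has silently gone bad

data, repaired = read_block(mirror, expected)
print(data == block, repaired, mirror[0] == block)   # True 1 True
```

A plain mirror without checksums cannot do this, because when two copies disagree it has no way of knowing which one is right.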

I pulled the drive; for my cheap hardware, I shut down my NAS server before doing so, but I could have just pulled the drive after removing it from the pool. I think. I hooked up the offending disk in a Dell box that I inherited from somewhere, and booted it with the Seagate SeaTools CD. Yep, it was toast. Copy those error codes into the online form on Seagate web site, pay $20 and they overnighted me a replacement drive.

I put the new drive in there, booted up Solaris, and told the pool to replace the old device with this new one. The array was online, serving up data, while the new disk was populated with the mirrored data. After that I did another scrub and (10 hours later) got no errors.

I booted the Dell box with the SystemRescueCD, which I like because I'm at home with Gentoo Linux. I added some entropy from http://www.random.org/ and then filled the dying disk with noise. Put it in the box and dropped it in the mail. Done.

I could have done this with *no* downtime on the server. But I'm the only "user" of this NAS.

No data loss. I am confident that I didn't lose data.

I have offline backups, of course.

Wow. Long. I'll add scripts and whatnot to a blog post someday.

Be careful out there!

Gregg Wonderly

Sep 3, 2012, 4:43:19 PM

to zfs-...@googlegroups.com

Boyd, a much better controller, from a performance perspective, yet okay price is

http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157

add one of these for each of the two "lanes" and you have 8 drives with great performance. I got better than double performance, in general by switching to this card from the old Sil3132 cards I had. Costs more, but now I can get scrub of 4 pairs of 2TBs mostly overnight, instead of the whole weekend.

Gregg

Darik Horn

Sep 3, 2012, 5:06:26 PM

to zfs-...@googlegroups.com

On Mon, Sep 3, 2012 at 3:32 PM, Boyd Waters <water...@gmail.com> wrote:
>
> BUT -- those disks, which keep spinning at constant speed and temperature,
> haven't fared too well. Four of the original six have gone bad. Don't use
> "Green" drives in a NAS.

In the small booklet that comes with these disks, look for fine print
that reads something like "Annualized Failure Rate (AFR) and Mean Time
Between Failures (MTBF)".

Notice how the product lifetime is usually expressed in terms of a 6 hour
work day and assumes an average temperature near 25 degrees centigrade. If you
run these disks under moderate load, such that they heat up and stay
hot, then they are used up after a year.

--
Darik Horn <daj...@vanadac.com>

Boyd Waters

Sep 3, 2012, 5:45:44 PM

to zfs-...@googlegroups.com

On Sep 3, 2012, at 2:43 PM, Gregg Wonderly <greg...@gmail.com> wrote:

> Boyd, a much better controller, from a performance perspective, yet okay price is
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157

You are absolutely right, Gregg! This is the controller that has been in my NewEgg "saved" list for the past two years. It's a re-branded LSI controller; you can upgrade the firmware and it works very well with Solaris.

It's also available at Amazon.

Boyd Waters

Sep 3, 2012, 6:17:11 PM

to zfs-...@googlegroups.com

On Sep 3, 2012, at 2:43 PM, Gregg Wonderly <greg...@gmail.com> wrote:

> 8 drives with great performance

The primary source of my performance woes, in addition to the horrid IOPS of my hardware choices, is ZFS deduplication.

Awesome idea, and a slam-dunk for my use case, because the dedup factor on my pool is about 3x -- That is, I'm hosting about 13 TB of data on my 6TB pool with room to spare.

But in order to determine whether an incoming block is already on the pool, ZFS has to maintain a table in memory that holds the content checksum (and some additional metadata) for every block. There are a lot of blocks. I need more than 24 GB of RAM to hold this table.
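A rough way to see where numbers like that come from is one table entry per unique block. In the Python sketch below, the 320 bytes per entry is a commonly quoted rule of thumb and the average block sizes are guesses, so both are assumptions rather than measurements from this pool. Only the 13 TB and 3x figures come from the post.

```python
def ddt_ram_bytes(unique_bytes, avg_block_bytes, bytes_per_entry=320):
    """Estimate RAM needed to hold one dedup-table entry per unique block."""
    blocks = -(-unique_bytes // avg_block_bytes)   # ceiling division
    return blocks * bytes_per_entry


logical_bytes = 13 * 10**12     # about 13 TB of data, from the post
dedup_ratio = 3                 # about 3x, from the post
unique_bytes = logical_bytes // dedup_ratio

for block_kib in (128, 64, 32):     # assumed average block sizes
    need = ddt_ram_bytes(unique_bytes, block_kib * 1024)
    print(f"{block_kib:>3} KiB blocks: about {need / 2**30:.1f} GiB of table")
```

The estimate doubles every time the average block size halves, which is why pools holding many small blocks are so much harder on RAM than the raw capacity suggests.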

Since that's more than my cheap server can hold (or that I could afford at the time), that table gets swapped out. This is worst-case scenario for ZFS performance. It's awful.

So my next $100 purchase was an SSD, which I added to the ZFS pool as a "Level 2 Read Cache", an L2ARC.

This is a neat feature of ZFS: you can specify some SSDs as a cache for reading (the L2ARC) and some as a write cache (the "intent log", or ZIL). The read cache can fail safely; if a read fails, ZFS will try again, and eventually fault the SSD. If the *write* cache fails, you are hosed, so pros usually use a mirrored pair of "enterprise" SSDs for the ZIL. For home use, I'd recommend Intel "Cherryville" SSDs, which are (I think) Intel's latest 20nm NAND with firmware that Intel worked on for 6 months. The same hardware is OEM'd to other vendors, but Intel still holds onto their firmware for the next year, and it's supposed to have significant bug fixes.

The SSD L2ARC helped a bit, but not a lot. I should have spent the money on the Intel SATA controller.

(I haven't yet used ZFS on-disk encryption, but I'm almost certain that it would *not* impact performance, although I don't quite understand how (or if) encrypted volumes change all of these in-memory data structures, or change memory requirements... I don't think there is any change at that level, that encryption is perhaps the last transformation applied to the bits as they are written to the spinning hard disks.) ((But avoid ZFS encryption if you want to later use the devices on your Mac, because the MacZFS version does not yet support this feature.))(((I think)))

Dang. Another long post. And this stuff isn't on-topic for a Mac port of ZFS. But for people getting started with ZFS, I hope it is of interest.

Richard Elling

Sep 3, 2012, 6:40:55 PM

to zfs-...@googlegroups.com

On Sep 3, 2012, at 3:17 PM, Boyd Waters <water...@gmail.com> wrote:

> On Sep 3, 2012, at 2:43 PM, Gregg Wonderly <greg...@gmail.com> wrote:
>
>> 8 drives with great performance
>
> The primary source of my performance woes, in addition to the horrid IOPS of my hardware choices, is ZFS deduplication.
>
> Awesome idea, and a slam-dunk for my use case, because the dedup factor on my pool is about 3x -- That is, I'm hosting about 13 TB of data on my 6TB pool with room to spare.
>
> But in order to determine whether an incoming block is already on the pool, ZFS has to maintain a table in memory that holds the content checksum (and some additional metadata) for every block. There are a lot of blocks. I need more than 24 GB of RAM to hold this table.

The dedup table is metadata. Did you adjust the metadata limit to be larger (default = arc_c_max/4)?

A simple change like this can make a huge impact.

> Since that's more than my cheap server can hold (or that I could afford at the time), that table gets swapped out. This is worst-case scenario for ZFS performance. It's awful.
>
> So my next $100 purchase was an SSD, which I added to the ZFS pool as a "Level 2 Read Cache", an L2ARC.
>
> This is a neat feature of ZFS: you can specify some SSDs as a cache for reading (the L2ARC) and some as a write cache (the "intent log", or ZIL). The read cache can fail safely; if a read fails, ZFS will try again, and eventually fault the SSD. If the *write* cache fails, you are hosed, so pros usually use a mirrored pair of "enterprise" SSDs for the ZIL. For home use, I'd recommend Intel "Cherryville" SSDs, which are (I think) Intel's latest 20nm NAND with firmware that Intel worked on for 6 months. The same hardware is OEM'd to other vendors, but Intel still holds onto their firmware for the next year, and it's supposed to have significant bug fixes.
>
> The SSD L2ARC helped a bit, but not a lot. I should have spent the money on the Intel SATA controller.
>
> (I haven't yet used ZFS on-disk encryption, but I'm almost certain that it would *not* impact performance, although I don't quite understand how (or if) encrypted volumes change all of these in-memory data structures, or change memory requirements... I don't think there is any change at that level, that encryption is perhaps the last transformation applied to the bits as they are written to the spinning hard disks.) ((But avoid ZFS encryption if you want to later use the devices on your Mac, because the MacZFS version does not yet support this feature.))(((I think)))

ZFS-internal encryption is only available in Solaris 11, today.

-- richard

> Dang. Another long post. And this stuff isn't on-topic for a Mac port of ZFS. But for people getting started with ZFS, I hope it is of interest.

Daniel Becker

Sep 3, 2012, 10:19:57 PM

to zfs-...@googlegroups.com

These controllers (LSI1068E) used to have issues with losing disks if SMART commands were issued under certain conditions, which essentially made running smartd infeasible. Have these issues been resolved?

Alex Wasserman

Sep 4, 2012, 12:11:51 AM

to zfs-...@googlegroups.com

> ZFS is way more capable than Apple's "RAID", but less flexible than (for example) a Drobo or ReadyNAS. Or previous versions of Windows Home Server.

I actually moved my disks from my Drobo into my main tower case and added them to my ZFS pool.

The Drobo had awful performance, even over FW800 using a decent FW800 add-in card. The performance has always been awful. Pushing it too hard caused it to drop and hang. Not a good situation.

ZFS has been rock solid so far.

The Drobo didn't report disks dying. After I pulled them, I could hear clicking, etc. SeaTools instantly reported an issue when running the checks, and Seagate has replaced the drive under warranty. Drobo never mentioned an issue.

ZFS has flagged issues for me, Drobo hasn't.

Drobo performance sucks. ZFS is faster than raw drives.

Drobo is good as an external backup system. Large disks, slow performance, etc. However, I just don't feel I can really rely on it. It's part of a disk system for me, including Time Capsule for recent (last couple of months) file changes, nightly system drive duplication, etc.

The SSD system drive dupes to an HD nightly, as well as to a sparse image on ZFS. Whenever I upgrade I keep the final image before upgrading. I can roll back to where I was on Lion, or Snow Leopard, if necessary.

Similarly, important stuff gets backed up using Time Capsule and Dropbox, as well as a nightly copy to the ZFS pool.

All less critical big-data stuff goes onto the ZFS pool.

ZFS pool then gets duped to Drobo as a distinct backup.

The ZFS pool is currently 5.5 TB, with 2.15 TB used.

ZFS just seems the most stable all over. I've messed with the disks, moved them around, etc. It always comes up fine. I've pulled them and put in new ones, and it resilvers. Scrubs fix data quality on disk.

Add in the snapshots and it's a really great system.

That was more than I wanted to write about ZFS beating Drobo, but for me, it's no contest. Drobo is nothing but a fancy, and pretty expensive backup box. Mine sits basically unused now, and without it, I have more faith in my data than with it.

Roddi

Sep 4, 2012, 4:59:50 AM

to zfs-...@googlegroups.com

On Tuesday, September 4, 2012 at 06:11, Alex Wasserman wrote:

> Drobo is nothing but a fancy, and pretty expensive backup box. Mine sits basically unused now, and without it, I have more faith in my data than with it.

I have exactly the same situation here!

Roddi

Gregg Wonderly

Sep 5, 2012, 1:26:29 AM

to zfs-...@googlegroups.com

I have seen some of these things discussed, and I have, on occasion, rebooted to recover a drive which seemed to disappear when another drive failed during a scrub or resilver. I use mirrored pairs on my machines, and I try to make sure that I have them mirrored with 1 drive from each of the 2 "sides" of the controller. So, something like

c2t0d0-c2t0d4
c2t0d1-c2t0d5
c2t0d2-c2t0d6
c2t0d3-c2t0d7

pairing is what I do, so that if something "falls" offline, I have always still had a fully functional pool that is just missing some mirror devices. I can then replace the drive taken out by the scrub, reboot to recover the other drive(s), and just let ZFS resilver whatever is missing on the dropped drives.
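
For anyone who wants to script that layout, here is a minimal Python sketch of the pairing. It only builds the argument list for `zpool create` and prints it; it does not run anything. The pool name `tank` and the helper names are hypothetical, and it assumes the first half of the device list sits on one side of the controller and the second half on the other, as in the listing above.

```python
def mirror_pairs(devices):
    """Pair device i from the first half with device i from the second half.

    Assumes the first half of the list is cabled to one side of the
    controller and the second half to the other side.
    """
    if len(devices) % 2:
        raise ValueError("need an even number of devices")
    half = len(devices) // 2
    return list(zip(devices[:half], devices[half:]))


def zpool_create_args(pool, pairs):
    """Build (but do not run) a 'zpool create' command with one mirror per pair."""
    args = ["zpool", "create", pool]
    for first, second in pairs:
        args += ["mirror", first, second]
    return args


# Device names taken from the listing above; "tank" is a placeholder pool name.
devices = [f"c2t0d{n}" for n in range(8)]
pairs = mirror_pairs(devices)
print(" ".join(zpool_create_args("tank", pairs)))
```

Printing the command rather than executing it leaves room to check the device names against the actual cabling before creating the pool.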

This happens about every 2-3 months as I lose another drive.

Gregg Wonderly

Boyd Waters

Sep 5, 2012, 4:33:06 AM

to zfs-...@googlegroups.com

Gregg Wonderly <greg...@gmail.com> wrote:

> I try to make sure that I have them mirrored with 1 drive from each
> of the 2 "sides" of the controller

I know that the LSI controller has 2 multilane ports on it, and you
get 4 SATA connections per port. So your mirroring scheme could
protect you from a port or cable problem.

I've never experienced a controller failure. But I suppose it does
happen.

Jason

Sep 5, 2012, 7:47:24 AM

to zfs-...@googlegroups.com

I have something similar to Gregg's setup at home, but if one is losing drives as fast as Gregg is, I really recommend checking the power supply and power connectors. I usually only lose a drive in this system once every year or so, and always the same location/connector which will hopefully go better this year as I've split the load between two power supplies.

Jason

Boyd Waters

Sep 5, 2012, 11:09:19 AM

to zfs-...@googlegroups.com

On Wed, Sep 5, 2012 at 5:47 AM, Jason <jason...@belecmartin.com> wrote:
> I usually only lose a drive in this system once every year or so, and always
> the same location/connector which will hopefully go better this year as I've
> split the load between two power supplies.

Yep. A single, small-business or home NAS with fewer than 12 drives --
you shouldn't expect to lose more than one drive per year on average.
That *doesn't* mean that you shouldn't plan for it.
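
A rough way to sanity-check that rule of thumb is the small Python sketch
below. The annual failure rates (AFR) are illustrative assumptions, not
measurements from this thread, and the calculation assumes drives fail
independently, which a bad power supply would violate.

```python
def expected_failures(drives, afr):
    """Expected number of drive failures per year for a given AFR."""
    return drives * afr


def prob_at_least_one(drives, afr):
    """Chance of at least one failure in a year, assuming independent drives."""
    return 1.0 - (1.0 - afr) ** drives


# Assumed AFR values for illustration only; real rates vary by model and age.
for afr in (0.02, 0.05, 0.08):
    print(
        f"AFR {afr:.0%}: expect {expected_failures(12, afr):.2f} failures/yr, "
        f"P(at least one) = {prob_at_least_one(12, afr):.0%}"
    )
```

Even where the expected count stays below one per year, the chance of
seeing at least one failure is far from negligible, which is the reason
to plan for it.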

There is a certain element of randomness to drive lifespan, but
certainly a bad power supply can make short (ha) work of destroying
many drives. When I moved across country, I lost 5 drives in 6 weeks
-- and a USB hub (smoke!) and a MacBook (sparks!) due to lousy cables
and a terrible power supply.

(Now I use an Antec 900 case with an embarrassing number of case fans,
and a high-quality power supply. The temperature of the disks stays
about ambient, between 24-30°C with seasonal variation, as reported by
smartmontools. It pulls about 40 Watts at idle, 250 Watts at cold boot
drive spin-up. No drive loss in 18 months.)

Gregg Wonderly

Sep 5, 2012, 2:11:32 PM

to zfs-...@googlegroups.com

On Sep 5, 2012, at 12:26 AM, Gregg Wonderly <greg...@gmail.com> wrote:

> I have seen some of these things discussed, and I have, on occasion, rebooted to recover a drive which seemed to disappear when another drive failed during a scrub or resilver. I use mirrored pairs on my machines, and I try to make sure that each mirror has 1 drive from each of the 2 "sides" of the controller. So something like
>
> c2t0d0-c2t0d4
> c2t0d1-c2t0d5
> c2t0d2-c2t0d6
> c2t0d3-c2t0d7
>
> pairing is what I do, so that if something "falls" offline, I have always still had a fully functional pool that is just missing some mirror devices. I can then replace the drive taken out by the scrub, reboot to recover the other drive(s), and just let ZFS resilver whatever is missing on the dropped drives.
>
> This happens about every 2-3 months as I lose another drive.

I guess I should qualify this statement. What happened was that I had the drives in too small a system without enough airflow, plus the slow controller. With a scrub sometimes running for 36-50 hours, the heat on the drives was quite sustained. SMART didn't show overly extreme temps, but they were about 9-12 C higher than what I have on the drives now. So some of the drives are going to fail faster than usual. Hopefully I am going to experience fewer problems over time as I replace the drives. I am buying some new drives as replacements to try to keep at least one side of the mirror on "new" drives, with the replacement drives on the other side.

I replaced 6 different drives in 5 different devices last month: 1 on my iMac (4 year old 500GB drive), 1 on my old HP home server (5 year old 750GB drive) that I still used for some Windows backup, 2 on my Drobo S (4 year old 1.5TB WD and 2 year old 2TB Seagate), one 2TB Seagate on my primary Solaris server, and one 1.5TB WD and one 2TB Seagate on my backup Solaris server.

Gregg
