pros/cons of multiple zfs filesystems


roemer

Mar 15, 2014, 6:52:20 PM
to zfs-...@googlegroups.com
When one creates a new zpool, this automatically creates a root filesystem too - and even mounts it.
What then is the advantage (or disadvantage) of creating further sub-filesystems inside the pool using zfs?
And how does that differ from simply creating sub-directories under the zpool root?

Two advantages that I can see are separate compression and quota settings.
But what about general performance? Is there a performance penalty for having multiple zfs filesystems inside one pool, perhaps even with different settings?
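
For concreteness, such per-dataset settings look roughly like this - a minimal sketch with made-up pool and dataset names, assuming a ZFS build that supports lz4:

  zfs create -o compression=lz4 tank/documents   # compressed dataset
  zfs set quota=50G tank/documents               # per-dataset quota
  zfs create -o compression=off tank/media       # uncompressed sibling dataset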


Bjoern Kahl

Mar 15, 2014, 7:34:52 PM
to zfs-...@googlegroups.com


Hi roemer,

On 15.03.14 at 23:52, roemer wrote:
Not really.

Technically, a file system (or, in ZFS language: a dataset) is very
similar to a directory and one can have thousands of these without
noticeable performance impacts as far as the ZFS core is concerned.

Under Mac OS X, a mounted file system comes at a higher cost than on
other Unix-like operating systems, due to the Finder and MDS services,
so I would not suggest trying to have hundreds of file systems
mounted at the same time. But any reasonable number (ten or so) goes
without noticeable performance impact.


One additional advantage not in your list is the ability to take
snapshots, including cloning them as new and then (almost) independent
read-write file systems, or using the snapshots as lightweight
backups against user error / application misbehavior. Of course,
these cannot replace a true off-site backup, but they are nevertheless
useful. For example, I used to have parts of my User directory on ZFS
and had it automatically snapshotted every 15 minutes as a cheap
versioning solution. Snapshots can also easily be used for real
off-site backups by the zfs send / receive mechanism.
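
A minimal sketch of those operations (pool and dataset names are made up):

  zfs snapshot tank/projects@2014-03-15
  # clone the snapshot as an (almost) independent read-write file system
  zfs clone tank/projects@2014-03-15 tank/projects-experiment
  # or roll the dataset back to its most recent snapshot after a mistake
  zfs rollback tank/projects@2014-03-15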


Best regards

Björn

- --
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |

roemer

Mar 16, 2014, 9:16:42 AM
to zfs-...@googlegroups.com
Thanks for the response, Björn.
The hint regarding dataset-specific snapshots is good, though I have to first think about how I would best make use of them.

However another point that you raised is interesting:

On Sunday, 16 March 2014 10:34:52 UTC+11, Bjoern Kahl wrote:
[...]

 Under Mac OS X, a mounted file system comes at a higher cost than on
 other Unix-like operating systems, due to the Finder and MDS services,
 so I would not suggest trying to have hundreds of file systems
 mounted at the same time. But any reasonable number (ten or so) goes
 without noticeable performance impact.

I would need about 10 separate mount points / datasets, so I guess this would be fine.
MDS services, however, means Spotlight - and the MacZFS wiki and several other posts on the web advise switching off Spotlight for ZFS with
mdutil -i off mountPoint

Why is Spotlight thought to be evil for ZFS?
Or does your comment imply that this advice is outdated, and that MDS indexing of ZFS mount points is OK nowadays?
Note that I am mainly aiming to store static 'archival' data and documents on ZFS, not my main user directory.
 
[...] Snapshots can also easily be used for real
 off-site backups by the zfs send / receive mechanism.

Haven't looked at send/receive yet, but if they require network connections, I am afraid classical ADSL speeds with a maximum of 1 MBit/s upload will not be much fun...
And for periodic backups to an external HDD I was thinking about ChronoSync or simply rsync.
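
For what it's worth, send/receive does not need a network at all - it can just as well target a pool on a locally attached external disk. A minimal sketch (pool names are made up):

  zfs snapshot tank/docs@2014-03-16
  zfs send tank/docs@2014-03-16 | zfs receive backup/docs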

roemer

Simon Casady

Mar 16, 2014, 3:40:02 PM
to zfs-...@googlegroups.com
An advantage of snapshots is with active filesystems such as those used by a database.  For a consistent database backup you of course need to stop the program, then back up, then restart (or use some database tool if available).  The time to create a snapshot is essentially zero, so the above stop - start cycle is actually practical.  Then you use your backup software of choice on the snapshot, not the active file system.
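
A minimal sketch of that pattern (the launchd service label and dataset name are made up):

  sudo launchctl stop org.example.mydb        # hypothetical label for the database service
  sudo zfs snapshot tank/db@nightly-2014-03-16
  sudo launchctl start org.example.mydb
  # then back up from the snapshot (e.g. via its .zfs/snapshot directory)
  # while the live database keeps running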



Jason Belec

Mar 16, 2014, 4:20:30 PM
to zfs-...@googlegroups.com
Snapshots also only store the differences since the previous snapshot, and when combined with send/receive they are very efficient for replication to remote servers.
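
A minimal sketch of such an incremental replication (the host and dataset names are made up):

  # send only the changes between two snapshots to a pool on another machine
  zfs send -i tank/data@monday tank/data@tuesday | \
      ssh backuphost zfs receive backup/data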


--
Jason Belec
Sent from my iPad

Dave Cottlehuber

Mar 16, 2014, 4:34:45 PM
to zfs-...@googlegroups.com
I've been a happy maczfs and also zfsosx user for several years now.

TL;DR
- the team are amazing at nailing fixes when I've reported issues. I
use zfs 100% of the time for my work and sanity
- while I can't get a clean shutdown atm, it's rare that I need one, and
zfs has my data anyway once sync has completed
- the zfs compatibility across OS is a huge win
- performance is not a constraint for me, and I'm a very heavy user
- datasets and snapshots are almost as nice as openafs vols for management

I'm a heavy user of snapshots and pools; for some inspiration, the long version:

3 main systems, 2x OSX, 1x large FreeBSD physical hosted server.

My main work laptop is a 16GB early 2011 MBP with a small 256GB SSD
for the OS (one partition for each of 4 operating systems) and a large
native-ZFS 512GB SSD. Now that I've been using this for a while, I could
have survived with a 64GB OS disk and a 256GB zfs SSD, but hey. If I
could fit more RAM in, I would. The other boxes are bigger (32GB iMac,
64GB FreeBSD box with ECC RAM, dual disks in mirrored ZFS). I use an
ashift=12 zpool, which has made a noticeable difference in performance
on all the systems I've implemented.

I keep my iTunes collection in a zfs filesystem (formD normalisation,
noatime) and use the snapshots to keep an up-to-date read-only zfs
mirror on the other 2 systems. Movies are the reverse: after watching
one on the laptop it gets shuffled off to the larger boxes for
permanent storage. zfs send is a very easy way to do a very trustable
backup, once you get past the first potentially large transfers.

All my source code & work lives in a zfs case sensitive noatime
copies=2 filesystem, and I replicate that regularly to my other boxes
as required.

For most customer projects I will have 3 or more VMs running different
configs or operating systems under VMWare Fusion. These each live in
their own zfs filesystem, compressed lz4 noatime case sensitive. I
snapshot these after creation using vagrant install, again after
config, and the changes are replicated using zfs snapshots again to
the other OSX system, and also to the remote FreeBSD box.
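
A dataset with those properties might be created like this (a minimal sketch; the dataset name is made up, and casesensitivity can only be set at creation time):

  zfs create -o compression=lz4 -o atime=off \
      -o casesensitivity=sensitive tank/vms/customer-project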

Where I can, I spin up these VMs in a zpool on a ramdisk (with
compression), which means I can fit a 20GB disk image into 16GB of RAM
and still work effectively. I don't confess to knowing how that
actually works, but it does. And it's very, very fast. The specific
config for that image is stored on the main SSD, and as I'm not writing
continuously to it while running the VM, things are peachy.
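
One way such a ramdisk pool is sometimes set up on OS X (a sketch only; it assumes hdiutil's ram:// device, and the size and pool name are made up):

  # create a ram-backed block device; ram:// takes 512-byte blocks (~12 GiB here)
  DEV=$(hdiutil attach -nomount ram://$((12 * 1024 * 2048)))
  # build a throwaway compressed pool on it
  sudo zpool create -O compression=lz4 ramtank $DEV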

At the end of the project, I can remove the local snapshots as
required, and I archive them onto 32GB SD cards (yup) with a zpool and
copies=2. They're a nice easy archival format, so long as you have
another copy stashed safely too.

A couple of months ago, I had a number of hardware failures on the
MBP, and each time I was able to guarantee that my data was intact,
with full integrity, despite the travesties worked upon it each time
it went to the factory for repair. I'd never have been certain with
HFS+.

I don't have my ~ homedir in zfs just yet, but I've no particular
reason not to move it now other than time constraints. With
normalisation and case insensitivity I don't think I will see the
issues I did under prior versions with less support. Spotlight is not
important to me, and Finder behaves itself now under Mavericks and the
new ZFSOSX builds.

In summary, I'm more than happy with the performance once I used
ashift=12 and moved past 8GB ram. Datasets once you get used to them
are extraordinarily useful -- snapshot your config just before a
critical upgrade.

A+
Dave

roemer

Mar 16, 2014, 8:28:36 PM
to zfs-...@googlegroups.com
On Monday, 17 March 2014 06:40:02 UTC+11, cap wrote:
An advantage of snapshots is with active filesystems such as those used by a database.  For a consistent database backup you of course need to stop the program, then back up, then restart (or use some database tool if available).  The time to create a snapshot is essentially zero, so the above stop - start cycle is actually practical.  Then you use your backup software of choice on the snapshot, not the active file system.

This is only fine if your database is read-only or you have control over the update workload.
Most database systems use a combination of no-force/steal buffering and WAL logging (e.g. MySQL InnoDB or PostgreSQL, and basically all commercial RDBMS).
Taking a file-system-level snapshot underneath does not guarantee that you get a consistent snapshot of the database log and data pages.
Together with high update rates, this can be dangerous. Better to use the database system's snapshot facility too before you take the ZFS snapshot. Granted, open source systems are a bit weak in that regard...
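
A minimal sketch of combining the two, using PostgreSQL's backup-mode hooks of that era as the example (the dataset name is made up):

  psql -U postgres -c "SELECT pg_start_backup('zfs-snapshot');"
  sudo zfs snapshot tank/pgdata@nightly
  psql -U postgres -c "SELECT pg_stop_backup();"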
  

Jason Belec

Mar 16, 2014, 8:35:26 PM
to zfs-...@googlegroups.com
Yeah but that's databases! Whole different game. ;)

Jason
Sent from my iPhone 5S
--

Alex Wasserman

Mar 16, 2014, 10:20:56 PM
to zfs-...@googlegroups.com
Some examples of how I've divvied up the pool -

/Users gets mounted as /Users. This makes it easy for OSX as there's a filesystem there when it creates a user, and I don't have to retroactively move users onto ZFS. I also keep an admin account off ZFS, just in case. This is regularly snapshotted by a script that runs nightly through launchd and keeps a configurable history period, currently 31 days (see the sketch at the end of this message). This is also compressed, as it's mostly text files, documents, etc.

/Media - Stores iPhoto/iTunes/Movies, etc. Not compressed as everything here is already compressed in one way or another. Also, easier to share between users when mounted up in a shared space, not under a single user.

/Backups - a separate filesystem to allow easier cloning of others into here from multiple sources, e.g. the system disk gets a nightly sync into a backup image in this space.

/Apps lives on my system SSD, but /Apps/Games comes from another ZFS filesystem, as games these days are huge. With a 100GB SSD, and games weighing in at ~10GB each, I don't have the space to house them. ZFS takes them easily. Again, mounting into /Apps/Games means it's a standard location for the OS, and everybody on the system can use them.
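
A minimal sketch of the kind of nightly snapshot-and-prune script mentioned above (not the actual script; the pool name and retention are made up, and tail -r is the BSD/OS X way to reverse lines):

  #!/bin/sh
  # snapshot tank/Users nightly and keep only the newest 31 nightly snapshots
  DS=tank/Users
  KEEP=31
  zfs snapshot "$DS@nightly-$(date +%Y-%m-%d)"
  zfs list -H -t snapshot -o name -s creation -d 1 "$DS" | grep "@nightly-" | \
      tail -r | tail -n +$((KEEP + 1)) | \
      while read snap; do zfs destroy "$snap"; done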

Alex

roemer

Mar 17, 2014, 12:00:22 AM
to zfs-...@googlegroups.com
Thanks for the detailed example!

On Monday, 17 March 2014 07:34:45 UTC+11, dch wrote:
I've been a happy maczfs and also zfsosx user for several years now.
[...]

zfs send is a very easy way to do a very trustable
backup, once you get past the first potentially large transfers.

Can this happen bi-directionally? Or is it only applicable for creating 'read-only' replicas of a master filesystem onto some clients?
I mean, what happens once you have cloned one file system, sent it to your laptop, and then edit on both the laptop and your ZFS server?
 
All my source code & work lives in a zfs case sensitive noatime
copies=2 filesystem, and I replicate that regularly to my other boxes
as required.

How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even RAIDZ2) pool?
RAIDZ would have all data stored redundantly already, so would 'copies=2'
not end up in quadrupling the storage requirement if used on a raidz pool?
 
For most customer projects I will have 3 or more VMs running different
configs or operating systems under VMWare Fusion. These each live in
their own zfs filesystem, compressed lz4 noatime case sensitive. I
snapshot these after creation using vagrant install, again after
config, and the changes are replicated using zfs snapshots again to
the other OSX system, and also to the remote FreeBSD box.

I can see that zfs is really good for handling multiple virtual machines.
 
[...]
In summary, I'm more than happy with the performance once I used 
ashift=12 and moved past 8GB ram. Datasets once you get used to them
are extraordinarily useful -- snapshot your config just before a
critical upgrade.

I am starting to see the potential in snapshots. In fact, I just realised that I have been doing manual 
'snapshots' of some of my recurring projects for quite some time, with annual 
clones of the previous directory structure. So ZFS snapshots would be a natural fit here.

But regarding the memory consumption:
What makes ZFS so memory hungry in your case?
Do you use deduplication?

roemer

Mar 17, 2014, 12:15:21 AM
to zfs-...@googlegroups.com
Thanks for sharing this info. Very interesting.
I am currently developing a very similar idea on how zfs could help me.
And dedicated Media and Documents (in your case: Users) filesystems / datasets would certainly make a lot of sense, especially with the separate compression and snapshotting settings.

How do iTunes and especially iPhoto cope with their working set being stored on ZFS?
Is the zfs pool mounted locally on the same machine, or does it come from a file server?

Another interesting question is how laptops fit into the picture.
Once you have a file server and at least one laptop, you can't guarantee that it is always able to connect to the file server, nor that there is only one modifiable copy of shared data (such as work documents - or your music ;)...

Dave Cottlehuber

Mar 17, 2014, 3:35:38 AM
to zfs-...@googlegroups.com
On 17 March 2014 at 05:00:25, roemer (uwe....@gmail.com) wrote:
> Thanks for the detailed example!
>
> On Monday, 17 March 2014 07:34:45 UTC+11, dch wrote:
> >
> > I've been a happy maczfs and also zfsosx user for several years now.
> > [...]
> > zfs send is a very easy way to do a very trustable
> > backup, once you get past the first potentially large transfers.
> >
> > Can this happen bi-directionally? Or is it only applicable for creating
> 'read-only' replicas of a master filesystem onto some clients?
> I mean, what happens once you have cloned one file system, sent it to your
> laptop, and then edit on both the laptop and your ZFS server?

Then you’re screwed :-). It’s not duplicity or some other low-level sync
tool. I find it works best when you have a known master that you’re working
off.

Slightly OT, but in FreeBSD with HAST you can do some gonzo crazy stuff:
 http://www.aisecure.net/2012/02/07/hast-freebsd-zfs-with-carp-failover/

> > All my source code & work lives in a zfs case sensitive noatime
> > copies=2 filesystem, and I replicate that regularly to my other boxes
> > as required.
> >
> > How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even
> RAIDZ2) pool?
> RAIDZ would have all data stored redundantly already, so would 'copies=2'
> not end up in quadrupling the storage requirement if used on a raidz pool?

Yes, but in this case, the laptop isn’t redundant, and my data is precious.
IIRC the whole repos dataset, even with history, is < 40 GB, so that's
reasonable IMO.

> > For most customer projects I will have 3 or more VMs running different
> > configs or operating systems under VMWare Fusion. These each live in
> > their own zfs filesystem, compressed lz4 noatime case sensitive. I
> > snapshot these after creation using vagrant install, again after
> > config, and the changes are replicated using zfs snapshots again to
> > the other OSX system, and also to the remote FreeBSD box.
> >
> > I can see that zfs is really good for handling multiple virtual machines.

Yup, zfs rollback for testing deployments or upgrades is simply bliss.

> In summary, I'm more than happy with the performance once I used
> > ashift=12 and moved past 8GB ram. Datasets once you get used to them
> > are extraordinarily useful -- snapshot your config just before a
> > critical upgrade.
> >
> > I start seeing the potential in snapshots. In fact, I just realised that I
> do manual
> 'snapshots' on some of my repeating projects already for quite some time
> with annual
> clones of the previous directory structure. So ZFS snapshots would be a
> natural fit here.
>
> But regarding the memory consumption:
> What makes ZFS so memory hungry in your case?

I don’t think it’s very hungry actually. 4GB (under the old MacZFS 74.1)
simply wasn’t enough and I’d get crashes. With 8GB that went away. Bearing
in mind with 16GB RAM I can run a web browser (oink at least 1GB), a 20GB VM
that’s been compressed into a 10GB RAMdisk, +1 GB RAM for the VM, that seems
pretty reasonable. That would leave 4GB for ZFS and the normal OSX baseline
stuff roughly.

I’m happy to report back with RAM usage if somebody tells me what z* 
incantation is needed.
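
One possible incantation (an assumption: the ARC kstats are exposed via sysctl as they are on FreeBSD; the exact path may differ in the OS X builds):

  # current ARC size in bytes
  sysctl kstat.zfs.misc.arcstats.size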

> Do you use deduplication?

Never. But I do use cloned datasets a fair bit, which probably helps the
 situation a bit.

The 2nd law of ZFS is not to use deduplication, even if you think you need it.
IIRC the rough numbers are 1GB RAM / TB storage, and I’d want ECC RAM for that.

BTW pretty sure the 1st law of ZFS is not to trust USB devices with your data.

--
Dave Cottlehuber
Sent from my PDP11



Geoff Smith

Mar 17, 2014, 4:15:53 AM
to zfs-...@googlegroups.com
My iTunes library is stored on ZFS; all you have to do is point iTunes to a pre-existing library and it figures itself out. Works really well.

Sent from my iPhone

Dave Cottlehuber

Mar 17, 2014, 6:49:55 AM
to zfs-...@googlegroups.com
On 17 March 2014 at 09:15:55, Geoff Smith (lucidi...@gmail.com) wrote:
> My iTunes library is stored in ZFS, all you have to do is point iTunes to a pre-existing
> library and it figures itself out, works really well.
>
> Sent from my iPhone

+1

Just make sure you use formD normalisation & case insensitivity for the dataset :-)
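
A minimal sketch of creating such a dataset (the name is made up; both properties can only be set at creation time):

  zfs create -o normalization=formD -o casesensitivity=insensitive \
      -o atime=off tank/itunes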


Philip Robar

Mar 17, 2014, 7:23:18 AM
to zfs-...@googlegroups.com
On Mon, Mar 17, 2014 at 3:35 AM, Dave Cottlehuber <d...@jsonified.com> wrote:
On 17 March 2014 at 05:00:25, roemer (uwe....@gmail.com) wrote:

> > How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even
> > RAIDZ2) pool? RAIDZ would have all data stored redundantly already, so
> > would 'copies=2' not end up in quadrupling the storage requirement if used
> > on a raidz pool?

Yes

No, RAIDZ does not store your data redundantly. It splits your data across multiple drives and uses space equivalent to one drive to store parity information about the data so that it can be mathematically made whole if one drive goes missing. RAIDZ2 or RAIDZ3 just raise the level of parity, i.e. the number of disk failures that can happen before data is lost, to two or three respectively.

So the amount of space lost to parity is a constant: disk size x RAIDZ level. The space consumed by copies, on the other hand, scales with the dataset size times the copies setting. One of the nice things about using copies as opposed to mirroring is that you can set it per file system (i.e. per dataset), whereas mirroring affects the entire vdev.

On the other hand, if you're using mirroring, then yes, turning on copies=2 does cut your usable storage space to pool size / 4 (assuming all datasets in the pool have it set).

RAIDZ vs mirroring vs copies all comes down to trading off performance vs Reliability, Availability and Serviceability vs space. There are formulas for figuring all of this out. Start at Serve the Home's RAID Reliability calculator*, which takes everything into account except increased file redundancy. For that there's this article: ZFS, Copies, and Data Protection. And for RAIDZ vs mirroring performance see When To (And Not To) Use RAID-Z.


Phil

* Note that the Mean Time to Data Loss calculated at this site, while being an industry standard, is essentially useless other than for getting a relative comparison of different configurations. For details see: Mean time to meaningless: MTTDL, Markov models, and storage system reliability.

Jason Belec

Mar 17, 2014, 7:46:11 AM
to zfs-...@googlegroups.com
Good man.


--
Jason Belec
Sent from my iPad

Jason Belec

Mar 17, 2014, 9:05:36 AM
to zfs-...@googlegroups.com
My wife and I have both our iTunes libraries on ZFS on the basement server, and each of our systems' user data is also on ZFS, which backs up every 20 minutes to the basement server. This has been running for years under OSX and the current/stable and old MacZFS. That server then forwards all the snapshots to another location just in case - losing family photos is bad!

Currently, anything that must have HFS+ is being tested in a ZVOL (development builds), formatted as HFS+ with ZFS underneath. So far this has been quite good for Mail and seems to be Spotlight-friendly - no guarantees yet. For those who want to try it.
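
A minimal sketch of that kind of setup (the name and size are made up; the block device the zvol appears as varies by ZFS port, e.g. /dev/diskN on OS X):

  # create a 50 GB zvol, then format it as journaled HFS+
  sudo zfs create -V 50G tank/mailvol
  sudo newfs_hfs -J -v MailVol /dev/diskN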


--
Jason Belec
Sent from my iPad

Philip Robar

Mar 17, 2014, 2:17:22 PM
to zfs-...@googlegroups.com
I admit to being one whose eyes glaze over when the discussion turns to i18n/l10n. So why should I use formD normalization?

I've been using case-sensitive filesystems on my Mac for as long as there has been a choice (I grew up in UNIX land, where this was done correctly from the start) and I've never had a problem, especially since I don't use poorly written Adobe programs and the like. So why would I use a case-insensitive dataset for iTunes?

Phil

Alex Wasserman

Mar 17, 2014, 2:48:04 PM
to zfs-...@googlegroups.com

Media:
ZFS mounted locally - iPhoto/Aperture and iTunes are both quite happy.

Laptops/Syncing:

I have Dropbox running just fine on my desktop. No reason I couldn't run it on the laptop and use it to sync documents. It wouldn't be as good for serious volume, but for just work docs it handles that just fine and would keep things in sync. Alternatively, you could set up some rsync scripts (or CarbonCopyCloner, etc.) to duplicate your directories when your laptop is back home. The nice thing about Dropbox is that it'll work from anywhere without requiring a VPN back home.

Jason Belec

Mar 17, 2014, 3:02:02 PM
to zfs-...@googlegroups.com
Well technically, setting up your own Dropbox, Box, AWS, etc., is not hard. But hey, people can pay someone for the service so they do. ;)



--
Jason Belec
Sent from my iPad

Dave Cottlehuber

Mar 17, 2014, 3:40:00 PM
to zfs-...@googlegroups.com
On 17 March 2014 at 19:17:23, Philip Robar (philip...@gmail.com) wrote:
> I admit to being one whose eyes glaze over when the discussion turns to
> i18n/l10n. So why should I use formD normalization?

Because (as you point out ;-) poorly written software won’t work.

iTunes is one of them, sadly.

FWIW my work repos, git, VMware Fusion stuff, etc. are all standard zfs, and it works brilliantly when I shift between OSes.

<3 that zfs.

A+
Dave


Philip Robar

Mar 17, 2014, 3:56:37 PM
to zfs-...@googlegroups.com
On Mon, Mar 17, 2014 at 3:40 PM, Dave Cottlehuber <d...@jsonified.com> wrote:
On 17 March 2014 at 19:17:23, Philip Robar (philip...@gmail.com) wrote:
> I admit to being one whose eyes glaze over when the discussion turns to
> i18n/l10n. So why should I use formD normalization?

Because (as you point out ;-) poorly written software won’t work.

iTunes is one of them, sadly.

OK, let me try again. I read a description of the various normalization forms, and despite being a native speaker of English I couldn't find any meaning in the words. (Something that is, unfortunately, all too common when it comes to standards docs.) So can you explain, for the naive and mildly interested, what "formD" means?

Phil
  

Chris Ridd

Mar 17, 2014, 6:25:09 PM
to zfs-...@googlegroups.com
Unicode has multiple ways of representing certain characters, e.g. characters with accents. Each way is defined in the Unicode standard. You may have filenames with these characters in them, and Apple expects these characters to be encoded one way, Linux may (for argument's sake) assume another way, and so on. These are (AIUI) the different normalization forms.

ZFS lets you choose the normalization form when you create each filesystem.

Chris

Bjoern Kahl

Mar 17, 2014, 7:31:18 PM
to zfs-...@googlegroups.com


I apologize for this being a bit long, but I tried to really clarify
what normalization is all about and how it affects ZFS on OS X.

On 17.03.14 at 20:56, Philip Robar wrote:
The two normalization forms "formD" and "formC" mandate how certain
characters outside the standard ASCII range (A-Z, a-z, 0-9 and a few
punctuation characters ".,-;" and some others) are represented.


For example (note: the following is not fully technically correct, but
illustrates the idea), the German letter "ö", named o_umlaut, could
be represented as-is, that is, as a single entity with Unicode code point
number 246.

However, the "ö" could also be seen as a plain "o" with two dots (in
printed text and modern German hand writing since 1978) or two short
downward lines (in some German hand writing scripts, for example the
Sütterlin script or other Kurrent scripts and hand writing taught
before 1978).

Similarly, the "ö" can be encode in Unicode by a two character
sequence, a plain "o" and a modifier '"' with the meaning "put two
dots above the previous character" (note: '"' is not such a modifier,
it serves here as a visualization of the actual modifier).


Now, a text in normalization "formC", or "composed form", would have all
characters that can be represented by a single entity encoded as
that single character.

A text in "formD" or "decomposed form" normalization would have all
characters that have some dots, accents, or other "additions" encoded
using the plain base character followed by one or more modifiers.

It is normalized formD if the modifiers come in a defined order; for
example, if a character has a dot above and below, the modifier for
"dot below" always comes first.

It is in irregular formD if all characters are decomposed but the
modifiers do not come in the defined order; in the example of a dot
below and above a character, having the modifier for "dot above"
come before the modifier for "dot below" makes the string irregular.


This whole mess is important, because it affects how sorting works.
For example, two strings "o" + "dot_below" + "dot_above" and "o" +
"dot_above" + "dot_below" should compare equal, because they carry
the same information, despite the fact that they differ in their
binary representation.

Normalizing makes comparing and sorting easier.
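
A minimal sketch of the difference at the byte level, using the "ö" example from above (plain shell; the comments show the UTF-8 bytes one would expect):

  # composed form (formC): the single code point U+00F6 -> bytes c3 b6
  printf '\xC3\xB6' | xxd
  # decomposed form (formD): "o" plus combining diaeresis U+0308 -> bytes 6f cc 88
  printf 'o\xCC\x88' | xxd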


Normalization and ZFS and OSX
=============================


Why should we care?

Because Finder wants to sort directory listings, and for this it needs to
know how the byte sequence it gets from the VFS maps to script
symbols and how these symbols are ordered.

Finder expects text like filenames to be in formD.

For file systems like ZFS this means, they need to

(a) simple case: ignore encoding altogether and just deal with byte
sequences. Since names are stored and returned as they arrive from
Finder & Co., no problem arises. (In practice, problems arise
when using the Terminal or applications that don't follow Apple's
encoding rules, because names in the wrong encoding can end up on
the file system.)

(b) complex case: convert the internal form to and from formD when
communicating with the VFS (and, through it, with higher levels like
Finder).

In case of (b) we have two implementation choices:

(1) stick to the rules and really do the conversion in both
directions, verifying that whatever we get from the VFS is
actually in formD (it might not be, when using the Terminal or 3rd-party
applications that don't follow Apple's encoding rules). In that case, the
setting of the normalization property doesn't matter, because it
controls how names are recorded *on* *disk*, and this encoding would
*never* be exposed to the VFS.

(2) be lazy and essentially do (a), that is, present the names to the VFS
in the form mandated by the normalization property when reading (i.e.
pass-through), but still make a best effort to force names received from
the VFS into the form mandated by the normalization property when writing.



I hope this answers the question and sheds some light on the problem
of filename encoding.


Best regards

Björn

- --
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |

Bjoern Kahl

Mar 17, 2014, 8:18:01 PM
to zfs-...@googlegroups.com


Too late in the night, hit "send" too early :-(

On 18.03.14 at 00:31, Bjoern Kahl wrote:
That should have read:

(2) be lazy and essentially do (a), but require the user to set "formD"
as the value for the normalization property, and then present the names to
the VFS in the form found on disk, while still making a best effort to force
names received from the VFS into the form mandated by the normalization
property when writing, in order not to taint a ZFS pool originating
from some other system.

Obviously (b.2) isn't a real option.



> I hope this answers the question and sheds some light on the
> problem of filename encoding.
>
>
> Best regards
>
> Björn

- --
| Bjoern Kahl +++ Siegburg +++ Germany |
| "googlelogin@-my-domain-" +++ www.bjoern-kahl.de |
| Languages: German, English, Ancient Latin (a bit :-)) |

Peter Lai

Mar 17, 2014, 9:51:38 PM
to zfs-...@googlegroups.com
Why do I get the feeling Apple made everything worse by not sticking
with either UTF-16 or UTF-8 encoding and POSIX collation for Finder
etc.?

Alex Blewitt

Mar 18, 2014, 5:18:29 AM
to zfs-...@googlegroups.com
These are using UTF-8. The problem is that there are multiple ways of referring to ö in UTF-8.

Alex

Sent from my iPhone 5

> On 18 Mar 2014, at 01:51, Peter Lai <cow...@gmail.com> wrote

roemer

Mar 18, 2014, 7:52:49 AM
to zfs-...@googlegroups.com
Excellent post, many thanks for the links, especially about RAIDZ and the MTTDL metric problem. 

Now back to the question about RAIDZ and/or copies=X:
Both protect against data corruption on disk. RAIDZ does it with parity information at the whole-disk level; copies=X does it via internal file copies.
If the goal is to protect against whole-disk failures in a multi-disk setting, I would assume RAIDZ is more natural.
It doesn't discriminate, though, about which files it protects - everything gets stored with additional parity information.

The interesting bit that I read in the 'ZFS, Copies, and Data Protection' article is that, in contrast to traditional RAID, it seems not to give improved performance due to striping - at least not in terms of disk operation rate; it rather 'just' improves fail-safety. The article leaves out, though, whether the disk operations actually deal with more data in the RAIDZ case, as one logical RAIDZ I/O still affects N-1 disk blocks, so I would assume data throughput per file access still increases with the number of disks N (for N>2).
Is my reasoning here correct?

The 'copies=X' parameter of zfs file systems seems to target settings with just a single disk, say a ZFS-formatted partition on a laptop drive, where RAID is not applicable. This is fine and IMHO makes a lot of sense. But I do not see the point of combining the two, as the parity information of RAIDZ would already protect against data corruption and even disk loss.

One interesting question is whether copies=X (X>=2) alone could do the same as RAIDZ on a purely striped disk pool.
I read in Oracle's zfs documentation that copies=X tries to store copies on different disks - but wouldn't a striped disk pool use all disks anyway?
Or am I incorrectly mixing my understanding of traditional RAID0 setups with the mechanics of zfs?

Perhaps some background information on why I am asking all this:
I am playing with the idea of formatting a 4-disk 'JBOD' enclosure using zfs with a RAIDZ or even RAIDZ2 setup to protect against disk failures.
In my understanding this should also protect against single-file corruption and the ominous 'bit rot' - especially with RAIDZ2.
I would lose one or two disks' worth of capacity from the beginning, which I would be fine with.
If I could gain some space back using a different tactic, I would be fine with that too, as the enclosure has only 4 bays.
I am also not sure whether the performance is still higher due to parallel I/O (see the comment above about the constant number of disk ops per RAIDZ)...
At least it should be high enough to saturate a gigabit ethernet link (i.e. 100 - 110 MB/s).
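
A minimal sketch of creating such a pool (the device names are made up; -o ashift=12 matches 4K-sector disks and assumes a build that accepts the ashift pool property, as the OpenZFS-based ones discussed here do):

  sudo zpool create -o ashift=12 tank raidz2 \
      /dev/disk2 /dev/disk3 /dev/disk4 /dev/disk5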


On Monday, 17 March 2014 22:23:18 UTC+11, Philip Robar wrote:
On Mon, Mar 17, 2014 at 3:35 AM, Dave Cottlehuber <d...@jsonified.com> wrote:
On 17 March 2014 at 05:00:25, roemer (uwe....@gmail.com) wrote:

> > How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even
> > RAIDZ2) pool? RAIDZ would have all data stored redundantly already, so
> > would 'copies=2' not end up in quadrupling the storage requirement if used
> > on a raidz pool?

Yes

So the amount of space lost to parity is a constant: disk size x RAIDZ level. The space consumed by copies, on the other hand, scales with the dataset size times the copies setting. One of the nice things about using copies as opposed to mirroring is that you can set it per file system (i.e. per dataset), whereas mirroring affects the entire vdev.

Daniel Becker

Mar 18, 2014, 9:13:27 AM
to zfs-...@googlegroups.com
On Mar 18, 2014, at 4:52 AM, roemer <uwe....@gmail.com> wrote:

I am also not sure now whether the performance is still higher due to parallel I/O (see comment above about constant number of disk ops per RAIDZ)...
At least it should be so high to saturate a gigabit ethernet link (i.e. 100 - 110 MB/s).

For sequential transfers, RAIDZ with n disks will give you (n-1) times the throughput of a single disk. It’s only IOPS (i.e., performance in seek-dominated workloads / random I/O) where it doesn’t buy you anything, and with those kinds of workloads you typically won’t reach that kind of throughput anyway, at least not with conventional hard drives.

roemer

Mar 18, 2014, 10:04:07 AM
to zfs-...@googlegroups.com
That's what I assumed too when reading that zfs article, but it didn't become clear to me.
Thanks for the clarification. For my typical SOHO usage pattern, I wouldn't expect IOPS to be a problem.

David Cantrell

Mar 18, 2014, 11:08:41 AM
to zfs-...@googlegroups.com
On Mon, Mar 17, 2014 at 09:51:38PM -0400, Peter Lai wrote:

> why do I get the feeling apple made everything worse by not sticking
> with either UTF-16 or UTF-8 encodings and posix collation for Finder
> etc.?

If you're going to allow funny foreign characters in filenames then you
can't use POSIX collation. Why? Because POSIX collation sorts by
character code byte values. However, to pick a language at random - in
Icelandic, Ð is sorted between D and E, despite the character codes
being, in order, 0x44, 0xD0 and 0x45, and their UTF-8 encodings being
0x44, 0xC3 0x90, 0x45.

No doubt my editor/MUA combination will have bollocksed up the funny
foreign character in this email.

--
David Cantrell | even more awesome than a panda-fur coat

People from my sort of background needed grammar schools to
compete with children from privileged homes like ... Tony Benn
-- Margaret Thatcher

Daniel Jozsef

Mar 19, 2014, 12:26:53 PM
to zfs-...@googlegroups.com
Finder is one of them.

When I first migrated my Linux-created ZFS mirror pool over to ZEVO (after tearing down my NAS box, and housing my data in a pair of simple Firewire 800 enclosures), I noticed that files that were there in the command line were missing in Finder. Sometimes Finder would flash a folder full of files upon opening, and then just make most of the icons (files) disappear.

These disappearing files were ones with native characters in their names. Since the pool was already created, I had no option of setting normalization at a ZFS level, so I just wrote a script to go through all the files and rename them to a FormD name. Finder suddenly started working normally.
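
A bash sketch of that kind of rename pass (not the actual script; it assumes an iconv that supports the decomposed UTF-8-MAC encoding, as the libiconv shipped with OS X does):

  # rename everything under /tank/data to the decomposed (formD-style) form
  find /tank/data -depth -print0 | while IFS= read -r -d '' p; do
      d=$(dirname "$p"); b=$(basename "$p")
      nb=$(printf '%s' "$b" | iconv -f UTF-8 -t UTF-8-MAC)
      [ "$b" != "$nb" ] && mv "$p" "$d/$nb"
  done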

Matt Elliott

Mar 19, 2014, 2:44:48 PM
to zfs-...@googlegroups.com
Could you share your script?

Dave Cottlehuber

Mar 19, 2014, 4:28:51 PM
to zfs-...@googlegroups.com
On 19 March 2014 at 17:26:55, Daniel Jozsef (daniel...@gmail.com) wrote:
> *Finder* is one of them.

aah yes, you’re right! I’ve found that some of these files were not accessible via terminal either in my case.

> When I first migrated my Linux-created ZFS mirror pool over to ZEVO (after
> tearing down my NAS box, and housing my data in a pair of simple Firewire
> 800 enclosures), I noticed that files that were there in the command line
> were missing in Finder. Sometimes Finder would flash a folder full of files
> upon opening, and then just make most of the icons (files) disappear.
>
> These disappearing files were ones with native characters in their names.
> Since the pool was already created, I had no option of setting
> normalization at a ZFS level, so I just wrote a script to go through all
> the files and rename them to a FormD name. Finder suddenly started working
> normally.

Can you elaborate on how you did that? BTW normalisation can be set at the dataset (zfs filesystem) level, not just the pool.

To fix mine, I exported the pool, booted a FreeBSD VM (mfsBSD actually) with raw disk access, created new datasets, and used rsync to copy them over. All was well after rebooting back to OS X again.