More benchmark results coming


David Abrahams

May 18, 2009, 9:49:28 AM
to zfs-...@googlegroups.com
I finally got a reasonable OpenSolaris (0906) installation on my
server, so I'm going to compare the speed of ZFS-Fuse with the in-
kernel stuff. The tests I'm planning to run are essentially "iozone -
a -g 16G" over two different configurations:

1. A regular pool of 8 whole drives, no redundancy
2. RAIDZ2 over those same 8 drives.
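
For concreteness, the two setups amount to something like this (the cXtYdZ device names are placeholders for the 8 drives), with iozone then run from the pool's mountpoint:

$ zpool create tank c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
$ zpool destroy tank    # between runs
$ zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
$ cd /tank && iozone -a -g 16G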

Anyone wanting to make more specific requests for results, now is the
time to speak up!

Cheers,

--
David Abrahams
BoostPro Computing
http://boostpro.com


Jonathan Schmidt

May 18, 2009, 1:28:13 PM
to zfs-...@googlegroups.com
David Abrahams wrote:
> I finally got a reasonable OpenSolaris (0906) installation on my
> server, so I'm going to compare the speed of ZFS-Fuse with the in-
> kernel stuff. The tests I'm planning to run are essentially "iozone -
> a -g 16G" over two different configurations:
>
> 1. A regular pool of 8 whole drives, no redundancy
> 2. RAIDZ2 over those same 8 drives.
>
> Anyone wanting to make more specific requests for results, now is the
> time to speak up!
It might be good to have baseline numbers to compare it to, such as ext3
single disk performance under linux, and possibly an 8-way software
RAID0. Actually running something like HD Tune will give you the raw
disk performance independent of the filesystem (might require Windows).
That way we can see how much speed is lost under each configuration.

Also if you have time, a pareto of the various zfs settings would be
interesting. Most importantly (to me), on the 8 drive pool without
redundancy, test a filesystem that has "copies=2" set. Also I'd be
interested in the performance loss going to "checksum=sha256" and
"compression=on/gzip".

Thanks!
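
P.S. For reference, those are all just per-dataset properties, so the setup would be something like this (dataset name is made up):

$ zfs create tank/bench
$ zfs set copies=2 tank/bench
$ zfs set checksum=sha256 tank/bench
$ zfs set compression=on tank/bench     # or compression=gzip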

David Abrahams

May 18, 2009, 4:17:27 PM
to zfs-...@googlegroups.com

on Mon May 18 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:

> David Abrahams wrote:
>> I finally got a reasonable OpenSolaris (0906) installation on my
>> server, so I'm going to compare the speed of ZFS-Fuse with the in-
>> kernel stuff. The tests I'm planning to run are essentially "iozone -
>> a -g 16G" over two different configurations:
>>
>> 1. A regular pool of 8 whole drives, no redundancy
>> 2. RAIDZ2 over those same 8 drives.
>>
>> Anyone wanting to make more specific requests for results, now is the
>> time to speak up!
> It might be good to have baseline numbers to compare it to, such as ext3
> single disk performance under linux,

ext3 is such a poorly-performing filesystem that I don't think it's
worth much as a baseline. I'd be happy to test JFS or XFS for that
purpose.

> and possibly an 8-way software RAID0.

OK, could do that. It will take some time. With 8G of RAM in this
machine, any testing that exceeds the cache and really tests the disk
throughput takes hours.

> Actually running something like HD Tune will give you the raw
> disk performance independent of the filesystem (might require Windows).
> That way we can see how much speed is lost under each configuration.

I hadn't planned to run Windows on bare metal ever again. Hoping not to
start now. Maybe it works under Wine?

> Also if you have time, a pareto of the various zfs settings would be
> interesting. Most importantly (to me), on the 8 drive pool without
> redundancy, test a filesystem that has "copies=2" set.

Yes, I'm interested in that too.

> Also I'd be interested in the performance loss going to
> "checksum=sha256" and "compression=on/gzip".

Wow, we'll be at this for days! Let's see how patient I can be... ;-)

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

sghe...@hotmail.com

May 18, 2009, 4:57:36 PM
to zfs-...@googlegroups.com
David Abrahams wrote:
> Also if you have time, a pareto of the various zfs settings would be
> interesting. Most importantly (to me), on the 8 drive pool without
> redundancy, test a filesystem that has "copies=2" set. Also I'd be
> interested in the performance loss going to "checksum=sha256" and
> "compression=on/gzip".
>
Ermmm.. Bias? I'd be interested in the performance characteristics and
impact of different checksumming and compression options. Legend (and my
experience in low-end 2way mirror setups) has it that compression can
typically boost throughput, at the expense of more CPU usage.

Perhaps you meant the performance loss of gzip vs. lzo compression, and
performance loss of sha256 vs. fletcher? In those comparisons it would
be safe to predict a loss of performance[1]

Anyhow, I'd specify what I like measured, not the desired outcome :_P

Seth

[1] in the absence of specialized hardware or fluked measurements :)

sghe...@hotmail.com

May 18, 2009, 5:07:13 PM
to zfs-...@googlegroups.com

> ext3 is such a poorly-performing filesystem that I don't think it's
> worth much as a baseline. I'd be happy to test JFS or XFS for that
> purpose.
>
>
Wow - this seems based on some assumptions. In my opinion, these are not
givens.

my thoughts here:

(a) if the baseline is intended to show cost/improvement over *regular*
linux setups then ext3 is perfect choice
(b) if the baseline is intended to give some foothold to correlate the
findings with other published benchmarks, then ext3 will *still* be a
very good choice, because of the high volume of benchmark data available
that includes ext3 (this includes all of the jfs/xfs figures I ever came
across).

(c) if the baseline is intended to provide comparisons to other
tuned/optimized raid setups then ext3 might be a poor choice indeed.

$0.02
----
I personally like ext3 as a baseline: to me its more or less the
'English' of linux filesystems. It has few dials/knobs and they are
pretty well-understood. Therefore it seems quite likely that with a
'default ext3 setup' you'll have figures that are recognizable to
readers of the benchmark. Contrast that with jfs, xfs?


Jonathan Schmidt

May 18, 2009, 5:12:17 PM
to zfs-...@googlegroups.com
sghe...@hotmail.com wrote:
> David Abrahams wrote:
>
>> Also if you have time, a pareto of the various zfs settings would be
>> interesting. Most importantly (to me), on the 8 drive pool without
>> redundancy, test a filesystem that has "copies=2" set. Also I'd be
>> interested in the performance loss going to "checksum=sha256" and
>> "compression=on/gzip".
>>
>>
> Ermmm.. Bias?
I'm not sure what you're getting at -- I'm certainly biased towards my
own interests and opinions, but which are you referring to?

> I'd be interested in the performance characteristics and
> impact of different checksumming and compression options. Legend (and my
> experience in low-end 2way mirror setups) has it that compression can
> typically boost throughput, at the expense of more CPU usage.
>
> Perhaps you meant the performance loss of gzip vs. lzo compression, and
> performance loss of sha256 vs. fletcher? In those comparisons it would
> be safe to predict a loss of performance[1]
>
I can run the tests myself under zfs-fuse. Mostly I'm interested to see
it under solaris. I've heard claims that ZFS runs "at platter speed"
which is several hundred MB/s.

> Anyhow, I'd specify what I like measured, not the desired outcome :_P
>
Oh maybe I get it. I suppose I asked for the performance "loss" by
enabling those features, which could be interpreted as me predicting the
outcome of the measurements. Well, sha256 isn't going to be faster than
fletcher for sure (and yes that's the comparison I want), but it might
be exactly the same speed (with higher CPU use). Compression can
theoretically boost throughput as you mentioned, but it's going to be
data dependent. I suppose it could fall either way.

As an engineer I've learned to guess outcomes, and my instincts tend to
be very accurate, so I apologise if I'm making undue assumptions. I
love being wrong though, so get the data and make my day :)

> Seth
>
> [1] in the absence of specialized hardware or fluked measurements :)
>

I'd love to repurpose my GPU to do ZFS calculations :D

Jonathan

sghe...@hotmail.com

May 18, 2009, 5:19:49 PM
to zfs-...@googlegroups.com

> Oh maybe I get it. I suppose I asked for the performance "loss" by
> enabling those features, which could be interpreted as me predicting the
> outcome of the measurements. Well, sha256 isn't going to be faster than
> fletcher for sure (and yes that's the comparison I want), but it might
> be exactly the same speed (with higher CPU use). Compression can
> theoretically boost throughput as you mentioned, but it's going to be
> data dependent. I suppose it could fall either way.
>
> As an engineer I've learned to guess outcomes, and my instincts tend to
> be very accurate, so I apologise if I'm making undue assumptions. I
> love being wrong though, so get the data and make my day :)
>
>
I share your analysis and expectations. My personal experience with
zfs-fuse on fast CPU and cheap disks is that compression increases
throughput. Which is why I triggered at the word 'loss' :) Let's find
out what a proper system would do!

Though, unless your system would be CPU-bound (not a very interesting
case for benchmarking then) then compression could hardly decrease the
performance. Further, I might have read (need to refresh memory) that
zfs[-fuse] sports a smart compression detection algorithm that tries to
avoid compressing 'uncompressible' (ergo: compressed) data.[1] This
could alleviate even the CPU load issues.

[1] where is my memory these days


Jonathan Schmidt

May 18, 2009, 5:30:29 PM
to zfs-...@googlegroups.com
>
> Though, unless your system would be CPU-bound (not a very interesting
> case for benchmarking then) then compression could hardly decrease the
> performance. Further, I might have read (need to refresh memory) that
> zfs[-fuse] sports a smart compression detection algorithm that tries to
> avoid compressing 'uncompressible' (ergo: compressed) data.[1] This
> could alleviate even the CPU load issues.
>

Yeah I've seen that feature while trolling through the code. It does a
test compression with a very fast compressor and if it doesn't get at
least a certain ratio then it doesn't run the more expensive one.

> [1] where is my memory these days
>

Who needs memory when DRAM is so cheap??

sghe...@hotmail.com

May 18, 2009, 5:36:57 PM
to zfs-...@googlegroups.com
My mind is so old-fashioned. It still only gets the dual-inline pin
layouts, hehehe. Well, fortunately these are cheap now.

David Abrahams

May 18, 2009, 6:36:13 PM
to zfs-...@googlegroups.com

On May 18, 2009, at 5:07 PM, sghe...@hotmail.com wrote:

>
>
>> ext3 is such a poorly-performing filesystem that I don't think it's
>> worth much as a baseline. I'd be happy to test JFS or XFS for that
>> purpose.
>>
>>
> Wow - this seems based on some assumptions.

Not assumptions, my boy. Hearsay. There's a big difference ;-)
http://tinyurl.com/which-linux-filesystem
http://www.debian-administration.org/articles/388
http://linuxgazette.net/102/piszcz.html

I don't have time to do the tests myself, what with each iozone run
taking something like 8 hours. I have to go based on what I've read.

> In my opinion, these are not
> givens.
>
> my thoughts here:
>
> (a) if the baseline is intended to show cost/improvement over
> *regular*
> linux setups then ext3 is perfect choice

Well, I don't know what other people want it for, but I'd like to know
how ZFS is doing compared to another filesystem I'd actually *use* if
I found something to be really unworkable about ZFS.

> (b) if the baseline is intended to give some foothold to correlate the
> findings with other published benchmarks, then ext3 will *still* be a
> very good choice, because of the high volume of benchmark data
> available
> that includes ext3 (this includes all of the jfs/xfs figures I ever
> came
> across).

By that logic, I guess it doesn't matter much which one we use as a
baseline, since we can always deduce one from the other given known
numbers.

> (c) if the baseline is intended to provide comparisons to other
> tuned/optimized raid setups then ext3 might be a poor choice indeed.


> $0.02
> ----
> I personally like ext3 as a baseline: to me its more or less the
> 'English' of linux filesystems. It has few dials/knobs and they are
> pretty well-understood. Therefore it seems quite likely that with a
> 'default ext3 setup' you'll have figures that are recognizable to
> readers of the benchmark. Contrast that with jfs, xfs?


Understood; it would only be less useful to me, personally. However,
I did solicit requests...

David Abrahams

May 18, 2009, 6:37:38 PM
to zfs-...@googlegroups.com

On May 18, 2009, at 5:12 PM, Jonathan Schmidt wrote:

> I'd love to repurpose my GPU to do ZFS calculations :D


Now *that* is a lovely idea.

sghe...@hotmail.com

May 18, 2009, 7:46:11 PM
to zfs-...@googlegroups.com
David Abrahams wrote:
> On May 18, 2009, at 5:07 PM, sghe...@hotmail.com wrote:
>
>>> ext3 is such a poorly-performing filesystem that I don't think it's
>>> worth much as a baseline.  I'd be happy to test JFS or XFS for that
>>> purpose.
>>>
>> Wow - this seems based on some assumptions.
>>
> Not assumptions, my boy.  Hearsay.  There's a big difference ;-)
> http://tinyurl.com/which-linux-filesystem
> http://www.debian-administration.org/articles/388
> http://linuxgazette.net/102/piszcz.html
>
Clarifying: I did not mean 'is such a poorly-performing' is an assumption. 'I don't think its worth much as a baseline', however, contained a big one: the assumption is "what is the baseline for".
...
> Well, I don't know what other people want it for, but I'd like to know
> how ZFS is doing compared to another filesystem I'd actually *use* ...
So that basically makes the assumption "you" :) No problem, all my bullets contain multiple (unintended) subjectivities as well :) If anyone cares, feel free to point them out.[1]

> ... if I found something to be really unworkable about ZFS.

>> (b) if the baseline is intended to give some foothold to correlate the
>> findings with other published benchmarks, then ext3 will *still* be a
>> very good choice, because of the high volume of benchmark data available
>> that includes ext3 (this includes all of the jfs/xfs figures I ever came
>> across).
>>
> By that logic, I guess it doesn't matter much which one we use as a
> baseline, since we can always deduce one from the other given known
> numbers.
>
No, because some fs-es have significantly less published results and significantly less predictable results for varying configurations. The type of fs used as a baseline matters. Much for the same reason Napolean cast the first standard (now ISO) meter in platinum-iridium, not wood. [2]

> Understood; it would only be less useful to me, personally.  However,
> I did solicit requests...
>
Thx for sharing your expertise and experience!

[1] I detect nervous smiley-ness, sorry :)
[2] I feel I might need to clarify this big hand-wavy statement a little? I know for a fact that running XFS on your regular linux setup may yield wildly varying results depending on (subtle or not-so-subtle) configuration differences e.g. in partitioning or the types of (virtual) block devices used. I've had some spotty results with JFS, and I know for a fact that JFS on linux has vastly different characteristics than JFS on a recent AIX. That makes me wary of interpreting any published JFS numbers - they just might not be relevant to my case(s) for too many reasons. Of course I could use more expertise to fix these wholes in my ability to judge, but then again, like you, I don't have the time to do all of that

Fajar A. Nugraha

May 18, 2009, 10:52:00 PM
to zfs-...@googlegroups.com
On Tue, May 19, 2009 at 5:36 AM, David Abrahams <da...@boostpro.com> wrote:
> On May 18, 2009, at 5:07 PM, sghe...@hotmail.com wrote:
>>> ext3 is such a poorly-performing filesystem that I don't think it's
>>> worth much as a baseline.
>>>
>> Wow - this seems based on some assumptions.
>
> Not assumptions, my boy.  Hearsay.  There's a big difference ;-)
> http://tinyurl.com/which-linux-filesystem
> http://www.debian-administration.org/articles/388
> http://linuxgazette.net/102/piszcz.html
>
> I don't have time to do the tests myself, what with each iozone run
> taking something like 8 hours.  I have to go based on what I've read.

Last time I tested with a simple dd (bs=1M, size twice available
memory) on two servers with the same hardware, ext3 on Linux had twice
the throughput of ZFS on OpenSolaris. Please do a dd test as well if you
can; I'd be interested in your results.
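
Something along these lines is what I mean (file path is just an example, and 16G assumes 8G of RAM):

$ time dd if=/dev/zero of=/tank/ddtest bs=1M count=16384    # sequential write
$ time dd if=/tank/ddtest of=/dev/null bs=1M                # sequential read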

Regards,

Fajar

David Abrahams

May 19, 2009, 1:04:53 PM
to zfs-...@googlegroups.com

on Mon May 18 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:

> Most importantly (to me), on the 8 drive pool without
> redundancy, test a filesystem that has "copies=2" set

Here are results from the two tests that have completed so far: one is
the 8-drive pool with no redundancy; the other is the same thing, but
with copies=2.

I'd appreciate it greatly if someone else could invest a little in
analyzing and/or graphing these results; I've got my hands a bit full
with the testing itself. Thanks!

------

osol-zfs.txt
osol-zfs-copies-2.txt

David Abrahams

May 19, 2009, 1:23:01 PM
to zfs-...@googlegroups.com

on Mon May 18 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:

> Also I'd be interested in the performance loss going to
> "checksum=sha256" and "compression=on/gzip".

"compression=gzip" is legal, but "compression=on/gzip" is not, so I
assume you meant the former. I'm doing that test now. I'm uncertain
whether iozone writes anything to its test files that can give us
meaningful results in this case, but I've asked Don Capps about it, so
we'll see.

Jonathan Schmidt

May 19, 2009, 2:09:22 PM
to zfs-...@googlegroups.com
>> Also I'd be interested in the performance loss going to
>> "checksum=sha256" and "compression=on/gzip".
>
> "compression=gzip" is legal, but "compression=on/gzip" is not, so I
> assume you meant the former. I'm doing that test now. I'm uncertain
> whether iozone writes anything to its test files that can give us
> meaningful results in this case, but I've asked Don Capps about it, so
> we'll see.

Sorry, I meant those as two separate settings. "compression=on" uses
LZJB and "compression=gzip" uses gzip. Both settings are going to be
data dependent so zero-filled files will skew the results significantly.
Random data will skew the results as well (pure entropy is not very
compressible). So I'm not sure what to suggest...

David Abrahams

May 19, 2009, 2:12:53 PM
to zfs-...@googlegroups.com

On May 19, 2009, at 2:09 PM, Jonathan Schmidt wrote:

>
>>> Also I'd be interested in the performance loss going to
>>> "checksum=sha256" and "compression=on/gzip".
>>
>> "compression=gzip" is legal, but "compression=on/gzip" is not, so I
>> assume you meant the former. I'm doing that test now. I'm uncertain
>> whether iozone writes anything to its test files that can give us
>> meaningful results in this case, but I've asked Don Capps about it,
>> so
>> we'll see.
>
> Sorry, I meant those as two separate settings. "compression=on" uses
> LZJB and "compression=gzip" uses gzip.

OK, I'll try them both if I can muster the patience.

> Both settings are going to be
> data dependent so zero-filled files will skew the results
> significantly.
> Random data will skew the results as well (pure entropy is not very
> compressible). So I'm not sure what to suggest...

Right, that's what I meant.

Luke Marsden

May 20, 2009, 8:25:30 AM
to zfs-...@googlegroups.com, Kieran Simkin, Robin Haswell
Hi everyone,

In case it helps, we recently did some similar benchmarks with iozone, except in this case it was Linux ZFS-FUSE vs. FreeBSD 7.2 vs. OpenSolaris 11/08. We even made some pretty graphs:

http://lukemarsden.net/hl/Linux%20vs.%20FreeBSD%20ZFS%20Performance%20Report%202.pdf

Headlines: OpenSolaris runs about 10% above FreeBSD. ZFS-FUSE write performance is terrible (we got only 3% of the write performance that we got out of FreeBSD) but it does compete when it comes to reads.

All the OSes were running on the metal, and on the same hardware (Core 2 Quad with 8GB RAM and a single Samsung SpinPoint F1 HD103UJ 1TB Hard Drive w/32MB cache). We used the latest version of ZFS-FUSE which was available packaged for Ubuntu 9.04. The performance figures on the Y axis (vertical) are in kb/sec, the X axis shows dataset size in kb and the Z axis I believe shows the block size used per request.

It looks like the cost of going from kernel -> userspace and back for each filesystem request really adds to the latency and the jitter of the results, not to mention throughput. Furthermore, ZFS's intensive use of a large RAM cache really shows. The ARC cache on  OpenSolaris was explicitly set to 2GB (we realised this afterwards), whereas FreeBSD was left free to use whatever it could, and I've no idea what ZFS-FUSE does regarding memory allocation, but it seems to stay pretty limited (the fuse process never took up more than a few hundred megs of RAM).

If we've got anything badly wrong regarding ZFS-FUSE here I'd be very happy to know about ways to improve the performance, because for our application it looks like it's presently not an option (although we'd love it to be).

Cheers!
Luke Marsden
Hybrid Logic Ltd.

Mobile: 07791750420

sghe...@hotmail.com

May 20, 2009, 8:40:18 AM
to zfs-...@googlegroups.com
Very nice!

I'll give it a look, although I must admit I stopped reading at the
hardware specs: "a single Samsung SpinPoint...". A single HD? No mirror, no
striping. No redundancy. Why ZFS?

I'll still read it because it compares the different OSes (sweet) and
you are running on almost exactly my hardware (at least in important areas).

David Abrahams

May 20, 2009, 9:33:52 AM
to zfs-...@googlegroups.com, Kieran Simkin, Robin Haswell

on Wed May 20 2009, Luke Marsden <luke.marsden-AT-gmail.com> wrote:

> Hi everyone,
>
> In case it helps, we recently did some similar benchmarks with iozone,
> except in this case it was Linux ZFS-FUSE vs. FreeBSD 7.2 vs. OpenSolaris
> 11/08. We even made some pretty graphs:
>
> http://lukemarsden.net/hl/Linux%20vs.%20FreeBSD%20ZFS%20Performance%20Report%202.pdf

Wow, I wish I'd known about this one earlier.

> Headlines: OpenSolaris runs about 10% above FreeBSD. ZFS-FUSE write
> performance is terrible (we got only 3% of the write performance that we got
> out of FreeBSD) but it does compete when it comes to reads.

How competitive _is_ it?

> All the OSes were running on the metal, and on the same hardware (Core 2
> Quad with 8GB RAM and a single Samsung SpinPoint F1 HD103UJ 1TB Hard Drive
> w/32MB cache). We used the latest version of ZFS-FUSE which was available
> packaged for Ubuntu 9.04.

Without the big_writes patch it's not surprising you got poor write
performance, I think. See below.

> The performance figures on the Y axis (vertical) are in kb/sec, the X
> axis shows dataset size in kb and the Z axis I believe shows the block
> size used per request.
>
> It looks like the cost of going from kernel -> userspace and back for each
> filesystem request really adds to the latency and the jitter of the results,
> not to mention throughput. Furthermore, ZFS's intensive use of a large RAM
> cache really shows. The ARC cache on OpenSolaris was explicitly set to 2GB
> (we realised this afterwards), whereas FreeBSD was left free to use whatever
> it could, and I've no idea what ZFS-FUSE does regarding memory allocation,
> but it seems to stay pretty limited (the fuse process never took up more
> than a few hundred megs of RAM).

I found that ZFS-Fuse "takes up" *lots* of memory, but not much of it
stays resident:
http://groups.google.com/group/zfs-fuse/browse_thread/thread/911370fa54cde008

> If we've got anything badly wrong regarding ZFS-FUSE here I'd be very happy
> to know about ways to improve the performance, because for our application
> it looks like it's presently not an option (although we'd love it to be).

Well, you might try doing what I've documented here, which includes the
aforementioned big_writes patch:

http://techarcana.net/hydra/zfs-installation/

David Abrahams

May 20, 2009, 9:36:20 AM
to zfs-...@googlegroups.com, Kieran Simkin, Robin Haswell

on Wed May 20 2009, Luke Marsden <luke.marsden-AT-gmail.com> wrote:

Just a thought: it might be really useful to make a graph of the
*differences* in performance of the three platforms. That's what I'm
planning to do if nobody else takes up the data analysis challenge.

Luke

May 20, 2009, 1:54:41 PM
to zfs-fuse, Mike Smithson, Robin Haswell, Kieran Simkin
Hi Dave,

Thanks for this hint. When we revisit doing a Linux port later I will
definitely test with these patches.

If you want the raw data to do subtractive graphs or such, here it is:
http://lukemarsden.net/zfs-benchmarks/ (results.txt is the OpenSolaris
one).

I didn't realise that ZFS-FUSE sets the ARC cache to 128MB by default.
That certainly helps explain the results.

Do you / does anyone here use ZFS-FUSE for production? If so, what
kind of stability do you get with it?

Cheers!
Luke Marsden
Hybrid Logic

Mobile: 07791750420


On May 20, 2:36 pm, David Abrahams <d...@boostpro.com> wrote:
> on Wed May 20 2009, Luke Marsden <luke.marsden-AT-gmail.com> wrote:
>
> > We even made some pretty graphs:
>
> >http://lukemarsden.net/hl/Linux%20vs.%20FreeBSD%20ZFS%20Performance%2...

Jonathan Schmidt

May 20, 2009, 1:59:17 PM
to zfs-...@googlegroups.com
> Do you / does anyone here use ZFS-FUSE for production? If so, what
> kind of stability do you get with it?

If by "production" you mean my home file server, then yes, I do use it.
I haven't had a crash or anything so I suppose it's been pretty stable
for me.

sghe...@hotmail.com

May 20, 2009, 2:29:10 PM
to zfs-...@googlegroups.com
In my setups it is pretty stable as well.

Most notable quirks on one of my setups (which I have reported):
* Consistent crashing (core dumps) on zfs umount -a.
* On this same system I need to manually (u)mount nested mountpoints in
the correct order.
* I can never snapshot all filesystems recursively (-r) while unmounted
('filesystem is busy'). The standard Solaris workaround of mount+umount
does not fix that.
* Filesystems with spaces in the name will not properly unmount.

The system is Intrepid,
Package: zfs-fuse
Pin: version 0.5.1-1ubuntu1
Pin-Priority: 1000

Even that system is pretty stable otherwise (and has been since
2008-12-24, rebooting every workday and with an uptime of approx. 8hrs/day).

I must say that I haven't come round to testing upgrades (because the
machine is remote) and I don't have any of these issues on my other setups.

My other systems are for long-term backup/history. They don't have any
notable uptime, but they do see high data and snapshot volumes. Also, the
usage pattern is one of frequently exporting/importing the pools across
machines (USB storage).
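
(That usage pattern is just the usual export/import cycle, e.g. with a made-up pool name:)

$ zpool export backups     # on the machine the USB disks are leaving
$ zpool import backups     # on the machine they are plugged into next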

I must add, these are *dev* machines, except for the one with the
quirks, which is unfortunately one of the few machines I administer for
a family relation (who runs a small accountancy business from that
desktop). I still vouch for the use of ZFS(-fuse) with an eye on
data reliability (corruption detection), but I'd rather not have had the
spotty quirks mentioned.

Regards,
Seth

sghe...@hotmail.com

May 20, 2009, 2:48:06 PM
to zfs-...@googlegroups.com
Addition/correction:

I haven't seen any of these segmentation faults/general protection faults in zfs-fuse since May 1st, the day when I upgraded the flaky system to Jaunty, with 0.5.1-1ubuntu2. There must have been an issue with the Ubuntu kernel (or perhaps my config, as I use some nonstandard tweaks like CONCURRENCY=shell in /etc/init.d/rc[1]).

I'm still using the same mods to mount/umount in the correct order manually and still cannot take a clean recursive snapshot before mounting any of the filesystems.

Cheers,
Seth

[1] I just checked: I reverted that setting on 2009-02-03, so the issue must be unrelated to this particular tweak, because I sure received at least 62 more failures/coredumps between Feb 4th and May 1st:

Maildir/.sysadmin.<hostname># find -ctime -105 | xargs grep zfs-fuse -h|sort | uniq -c
      1     [ 1054.643591] zfs-fuse :  1 Time(s)
      1     [  113.469890] zfs-fuse :  1 Time(s)
      1     [  132.743156] zfs-fuse :  1 Time(s)
      1     [ 1403.548224] zfs-fuse :  1 Time(s)
      1     [ 1535.047036] zfs-fuse :  1 Time(s)
      1     [ 2015.671443] zfs-fuse :  1 Time(s)
      1     [  203.893585] zfs-fuse :  1 Time(s)
      1     [ 2053.720550] zfs-fuse :  1 Time(s)
      1     [ 2090.624382] zfs-fuse :  1 Time(s)
      1     [ 3070.767025] zfs-fuse :  1 Time(s)
      1     [ 3394.546600] zfs-fuse :  1 Time(s)
      1     [ 3888.801543] zfs-fuse :  1 Time(s)
      1     [  452.826720] zfs-fuse :  1 Time(s)
      2     [ 4798.378729] zfs-fuse :  1 Time(s)
      1     [ 4909.921736] zfs-fuse :  1 Time(s)
      1     [ 5347.063889] zfs-fuse :  1 Time(s)
      1     [ 5420.223617] zfs-fuse :  1 Time(s)
      1     [ 5791.004344] zfs-fuse :  1 Time(s)
      1     [ 6364.610262] zfs-fuse :  1 Time(s)
      1     [  638.930126] zfs-fuse :  1 Time(s)
      1     [  642.738407] zfs-fuse :  1 Time(s)
      1     [ 6603.571389] zfs-fuse :  1 Time(s)
      1     [ 7071.407203] zfs-fuse :  1 Time(s)
      1     [ 7089.269916] zfs-fuse :  1 Time(s)
      1     [ 7159.041495] zfs-fuse :  1 Time(s)
      1     [ 7291.711998] zfs-fuse :  1 Time(s)
      1     [ 7502.083117] zfs-fuse :  1 Time(s)
      1     [  856.132054] zfs-fuse :  1 Time(s)
      1     [ 8643.400377] zfs-fuse :  1 Time(s)
      1     [ 9535.088880] zfs-fuse :  1 Time(s)
      1     [ 9728.585187] zfs-fuse :  1 Time(s)
      1  /etc/init.d/zfs-fuse - 1 Times.
      1  /etc/init.d/zfs-fuse - 3 Times.
      1     zfs-fuse 0.5.1-1ubuntu1 => 0.5.1-1ubuntu2
     24     zfs-fuse :  1 Time(s)
      3     zfs-fuse :  2 Time(s)

David Abrahams

May 20, 2009, 4:52:59 PM
to zfs-...@googlegroups.com
Here are the sha256 results

osol-zfs-sha256.txt

David Abrahams

May 21, 2009, 11:10:24 AM
to zfs-...@googlegroups.com

On May 20, 2009, at 4:52 PM, David Abrahams wrote:

> Here are the sha256 results

...and here are the results for raidz2

osol-raidz2.txt

Luke Marsden

May 21, 2009, 12:03:39 PM
to zfs-...@googlegroups.com
Hey,

We used a single HD because we were comparing the performance between the operating system implementations, not raw disk speed. All that mattered about the disk setup is that it stayed constant between tests.

And why ZFS? Because our application makes heavy use of differential snapshots being sent and received between nodes in a cluster, and as far as I know ZFS is the only filesystem which supports this mode of operation efficiently enough to send/recv a whole filesystem every 10-20 seconds. You don't need lots of HDs to find that useful ;)
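
(In ZFS terms that loop is basically incremental send/receive, roughly like this, with made-up dataset and host names:)

$ zfs snapshot tank/app@t2
$ zfs send -i tank/app@t1 tank/app@t2 | ssh node2 zfs receive -F tank/app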

Cheers,

Luke Marsden
Hybrid Logic Ltd.


David Abrahams

May 21, 2009, 1:16:15 PM
to zfs-...@googlegroups.com

On May 21, 2009, at 12:03 PM, Luke Marsden wrote:

> Hey,
>
> We used a single HD because we were comparing the performance
> between the operating system implementations, not raw disk speed.
> All that mattered about the disk setup is that it stayed constant
> between tests.

But unless the disk is wicked fast, you'll learn mostly about the disk
bottleneck and not the OS speed, right?

> And why ZFS? Because our application makes heavy use of differential
> snapshots being sent and received between nodes in a cluster, and as
> far as I know ZFS is the only filesystem which supports this mode of
> operation efficiently enough to send/recv a whole filesystem every
> 10-20 seconds. You don't need lots of HDs to find that useful ;)


Nice application!

David Abrahams

May 21, 2009, 1:20:51 PM
to zfs-...@googlegroups.com, Mike Smithson, Robin Haswell, Kieran Simkin

On May 20, 2009, at 1:54 PM, Luke wrote:

>
> Hi Dave,
>
> Thanks for this hint. When we revisit doing a Linux port later I will
> definitely test with these patches.

Good on ya.

> If you want the raw data to do subtractive graphs or such, here it is:
> http://lukemarsden.net/zfs-benchmarks/ (results.txt is the OpenSolaris
> one).

I think my tests are giving a better picture of the OS potential (more
disks) and the differences as applied to my hardware, natch. But I
may check that out anyway.

Interestingly, I did a quick difference graph on my OpenSolaris
tests, and RAIDZ2 over 8 disks is notably faster overall than a
regular pool over 8 disks with copies=2, which surprised me.

> I didn't realise that ZFS-FUSE sets the ARC cache to 128Mb by default.
> That certainly helps explain the results.
>
> Do you / does anyone here use ZFS-FUSE for production? If so, what
> kind of stability do you get with it?


It was working fine until I allocated the same set of partitions to
ZFS and mdRAID simultaneously...

which is why I'm now trying to use a simpler setup with ZFS on whole
disks.

David Abrahams

May 21, 2009, 3:00:03 PM
to zfs-...@googlegroups.com


OpenSolaris, raidz2 across 8 7200 RPM SATA disks:

$ time dd bs=1M count=16K if=/dev/zero of=/tank/bigfile
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 313.916 s, 54.7 MB/s

real 5m13.942s
user 0m0.061s
sys 0m24.385s

$ time dd bs=1M count=16K if=/tank/bigfile of=/dev/null

16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 127.308 s, 135 MB/s

real 2m7.313s
user 0m0.018s
sys 0m15.675s

OpenSolaris, "flat" pool across the same 8 disks:

$ time dd bs=1M count=16K if=/dev/zero of=/tank/bigfile
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 290.356 s, 59.2 MB/s

real 4m50.362s
user 0m0.030s
sys 0m17.264s

$ time dd bs=1M count=16K if=/tank/bigfile of=/dev/null
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 61.328 s, 280 MB/s

real 1m1.333s
user 0m0.024s
sys 0m13.499s

!! Wow, reads are less than half as fast with raidz2 by these
measurements.

Jonathan Schmidt

May 21, 2009, 3:07:45 PM
to zfs-...@googlegroups.com

Which doesn't make any sense, does it? I've suspected the same thing of
copies=2, but neither should cause any overhead during reads. Thoughts?

David Abrahams

May 21, 2009, 4:48:09 PM
to zfs-...@googlegroups.com

On May 21, 2009, at 3:07 PM, Jonathan Schmidt wrote:

>> !! Wow, reads are less than half as fast with raidz2 by these
>> measurements.
>
> Which doesn't make any sense, does it? I've suspected the same
> thing of
> copies=2, but neither should cause any overhead during reads.
> Thoughts?


I dunno; I guess this is a question for a pure ZFS forum where non-
FUSE people hang out.

sghe...@hotmail.com

May 22, 2009, 3:49:03 PM
to zfs-...@googlegroups.com
Luke Marsden wrote:
> Hey,
>
> We used a single HD because we were comparing the performance between the operating system implementations, not raw disk speed. All that mattered about the disk setup is that it stayed constant between tests.
Ok useful enough. However, the striping implementation (and therefore the performance impact) may vary between the OS *driver* implementations as well. Especially zfs-fuse being userspace may kill the benefit. I'd like to know [1]

Cheers

[1] FWIW: I've *seen* the increase in speed when using a set of disk over using a single disk. I'd still be interested in the performance differences across OS-es.


> And why ZFS? Because our application makes heavy use of differential snapshots being sent and received between nodes in a cluster, and as far as I know ZFS is the only filesystem which supports this mode of operation efficiently enough to send/recv a whole filesystem every 10-20 seconds. You don't need lots of HDs to find that useful ;)
Yup. I agree.

David Abrahams

May 22, 2009, 3:54:39 PM
to zfs-...@googlegroups.com

on Fri May 22 2009, "sgheeren-AT-hotmail.com" <sgheeren-AT-hotmail.com> wrote:

> Luke Marsden wrote:
>> Hey,
>>
>> We used a single HD because we were comparing the performance between

>> the /operating system implementations/, not raw disk speed. All that


>> mattered about the disk setup is that it stayed constant between tests.
> Ok useful enough. However, the striping implementation (and therefore
> the performance impact) may vary between the OS *driver* implementations
> as well. Especially zfs-fuse being userspace may kill the benefit. I'd
> like to know [1]
>
> Cheers
>
> [1] FWIW: I've *seen* the increase in speed when using a set of disk
> over using a single disk. I'd still be interested in the performance
> differences across OS-es.

Soon, my friend. My first test on Linux is in the last phase (16G files,
which takes forever).

sghe...@hotmail.com

May 22, 2009, 4:11:16 PM
to zfs-...@googlegroups.com
David Abrahams wrote:
>
>
> Soon, my friend. My first test on Linux is in the last phase (16G files,
> which takes forever).
>
>
>
Bows!

I've only started getting workable results with OpenSolaris this week
(as I finally got my head around making the NIC work - muhahaha. It
appears if the NIC doesn't work, OS is dead in the water. Surprisingly.
It can't even shutdown <gawk/>). I'm motivated to switch to OpenSolaris
if I get the basics working. I might do virtual machines. Still
pondering xen or branded zones (which I have zero experience with).

Seth

David Abrahams

May 22, 2009, 5:15:56 PM
to zfs-...@googlegroups.com

On May 22, 2009, at 4:11 PM, sghe...@hotmail.com wrote:

>
> David Abrahams wrote:
>>
>>
>> Soon, my friend. My first test on Linux is in the last phase (16G
>> files,
>> which takes forever).
>>
>>
>>
> Bows!
>
> I've only started getting workable results with OpenSolaris this week
> (as I finally got my head around making the NIC work - muhahaha. It
> appears if the NIC doesn't work, OS is dead in the water.
> Surprisingly.
> It can't even shutdown <gawk/>). I'm motivated to switch to
> OpenSolaris
> if I get the basics working.

I find OSOL quite foreign and difficult, not to mention lacking in
flexibility and easily available software. If Linux competes on ZFS
speed, I'll use it. If not, I'll stick with Solaris for the fileserver.

Ironically, I'm getting a Sun server here for running VMs... on which
I'm planning to run Linux.

> I might do virtual machines. Still
> pondering xen


I still find Xen mysterious, and fear it would be a huge time sink for
lack of broad support and documentation. Do you *have* to have
separate partitions for each domU? I still don't know.


> or branded zones (which I have zero experience with).

Hmm, didn't know about that one. Looks interesting, but again, quite
limited ATM.

I'll probably end up with KVM. I hear VBox is pretty good, but I
found out the hard way that it can't virtualize 64-bit guests on 64-
bit hosts without hardware virtualization support, which still baffles
me.

David Abrahams

May 22, 2009, 6:12:10 PM
to zfs-...@googlegroups.com

on Fri May 22 2009, "sgheeren-AT-hotmail.com" <sgheeren-AT-hotmail.com> wrote:

> David Abrahams wrote:
>>
>>
>> Soon, my friend. My first test on Linux is in the last phase (16G files,
>> which takes forever).
>>
>>
>>
> Bows!

Preliminary analysis shows that Solaris ZFS is the overall winner on all
tests except these two, where ZFS-Fuse not surprisingly wins until we
blow past the system RAM size:

Fread: This test measures the performance of reading a file using the
library function fread(). This is a library routine that performs
buffered & blocked read operations. The buffer is within the user’s
address space. If an application were to read in very small size
transfers then the buffered & blocked I/O functionality of fread() can
enhance the performance of the application by reducing the number of
actual operating system calls and increasing the size of the transfers
when operating system calls are made.

Freread: This test is the same as fread above except that in this test
the file that is being read was read in the recent past. This should
result in higher performance as the operating system is likely to have
the file data in cache.

David Abrahams

May 23, 2009, 1:40:50 AM
to zfs-...@googlegroups.com

et voilà. Not sure what to make of these yet. Linux often wins on
smaller files, maybe because I've set the ARC cache so very large. It
really loses on the large files.

linux-zfs.txt

Fajar A. Nugraha

May 23, 2009, 9:51:30 AM
to zfs-...@googlegroups.com
On Sat, May 23, 2009 at 4:15 AM, David Abrahams <da...@boostpro.com> wrote:
> On May 22, 2009, at 4:11 PM, sghe...@hotmail.com wrote:
>> I've only started getting workable results with OpenSolaris this week
>> (as I finally got my head around making the NIC work - muhahaha. It
>> appears if the NIC doesn't work, OS is dead in the water.

It's a known problem :)
They use nwam (kinda like network-manager in Linux) whose purpose is
good, but at this point it still has some bugs (like what you
mentioned). Since I'm using it on server only, I simply disable
svc:/network/physical:nwam and enable svc:/network/physical:default
(the old-style solaris networking config)
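
(i.e. something like this, using those service names:)

$ pfexec svcadm disable svc:/network/physical:nwam
$ pfexec svcadm enable svc:/network/physical:default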

> I find OSOL quite foreign and difficult,

It's actually quite good. If you're familiar with Solaris, you can
make it behave the same way with a little modification (like the
network part I mentioned above). Add pkg (the software management),
crossbow (the new bridge/vlan framework), and XVM (Xen), it has some
of the best things from both Solaris and Linux.

> not to mention lacking in
> flexibility and easily available software.

pkg is similar to apt-get, with more and more packages coming. Third
party IPS repositories are also available.

>  If Linux competes on ZFS
> speed, I'll use it.  If not, I'll stick with Solaris for the fileserver.

(Open)Solaris also has the benefits of zvol, something not available
in zfs-fuse (yet). Very handy to create iscsi SAN.
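
(Roughly like this; names are made up, and on newer builds you may want COMSTAR instead of the legacy shareiscsi property:)

$ pfexec zfs create -V 20G tank/vol1         # a 20G zvol
$ pfexec zfs set shareiscsi=on tank/vol1     # export it as an iSCSI target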

> I still find Xen mysterious, and fear it would be a huge time sink for
> lack of broad support and documentation.

Is it? xen-...@lists.xensource.com and xen-d...@opensolaris.org
is a good place to start.

> Do you *have* to have
> separate partitions for each domU?

Not really. file-backed domU is also possible, although for optimum
performance you need block device backend (partition, LV, or ZVOL).

Personally I use zfs-fuse when the application runs most optimally
(well-tested, binary availability, etc.) on Linux.
I use Opensolaris mostly for its ZVOL capability, which I find very
useful when testing new configuration on top of Xen.

--
Fajar

David Abrahams

May 23, 2009, 10:26:00 AM
to zfs-...@googlegroups.com

on Sat May 23 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:

> On Sat, May 23, 2009 at 4:15 AM, David Abrahams <da...@boostpro.com> wrote:
>> On May 22, 2009, at 4:11 PM, sghe...@hotmail.com wrote:
>>> I've only started getting workable results with OpenSolaris this week
>>> (as I finally got my head around making the NIC work - muhahaha. It
>>> appears if the NIC doesn't work, OS is dead in the water.
>
> It's a known problem :)
> They use nwam (kinda like network-manager in Linux) whose purpose is
> good, but at this point it still has some bugs (like what you
> mentioned). Since I'm using it on server only, I simply disable
> svc:/network/physical:nwam and enable svc:/network/physical:default
> (the old-style solaris networking config)
>
>> I find OSOL quite foreign and difficult,
>
> It's actually quite good.

That's reassuring.

> If you're familiar with Solaris,

I'm not :(

> you can
> make it behave the same way with a little modification (like the
> network part I mentioned above). Add pkg (the software management),
> crossbow (the new bridge/vlan framework), and XVM (Xen), it has some
> of the best things from both Solaris and Linux.

Got installation pointers for all those things?

>> not to mention lacking in
>> flexibility and easily available software.
>
> pkg is similar to apt-get, with more and more packages coming. Third
> party IPS repositories are also available.

When I did a Solaris installation it gave me options of a whole mess of
packages to install, but apparently had no automatic dependency
management, with the result that it was pretty much impossible to
install anything other than a default configuration.  That sort of
thing scares me; makes me think it's miles behind apt[itude].

>>  If Linux competes on ZFS
>> speed, I'll use it.  If not, I'll stick with Solaris for the fileserver.
>
> (Open)Solaris also has the benefits of zvol, something not available
> in zfs-fuse (yet).

At the pace development is going, I'm not holding my breath ;-)

> Very handy to create iscsi SAN.

Good point; I'll probably try that.

>> I still find Xen mysterious, and fear it would be a huge time sink for
>> lack of broad support and documentation.
>
> Is it? xen-...@lists.xensource.com and xen-d...@opensolaris.org
> is a good place to start.
>
>> Do you *have* to have
>> separate partitions for each domU?
>
> Not really. file-backed domU is also possible, although for optimum
> performance you need block device backend (partition, LV, or ZVOL).

That's a good reason to use OpenSolaris as the dom0 right there.

> Personally I use zfs-fuse when the application runs most optimally
> (well-tested, binary availability, etc.) on Linux.
> I use Opensolaris mostly for its ZVOL capability, which I find very
> useful when testing new configuration on top of Xen.

If I could get Ubuntu to run in a domU on top of OpenSolaris, I think
I'd be interested. But it looks like there are lots of iffy edges to
that picture, not least that the latest Ubuntu doesn't have a Xen
kernel.

Fajar A. Nugraha

May 23, 2009, 12:51:10 PM
to zfs-...@googlegroups.com
On Sat, May 23, 2009 at 9:26 PM, David Abrahams <da...@boostpro.com> wrote:
> on Sat May 23 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:
>> If you're familiar with Solaris,
>
> I'm not :(

That would make it a little harder then :D
Going from Solaris 10 -> Opensolaris is easy, only small adaptations required.
Going from Linux -> Opensolaris is somewhat harder, since there are
some new concepts to learn. Since you already installed the preview
release of osol 0906, and familiar-enough with zfs, it shouldn't be
too hard.

>
>> you can
>> make it behave the same way with a little modification (like the
>> network part I mentioned above). Add pkg (the software management),
>> crossbow (the new bridge/vlan framework), and XVM (Xen), it has some
>> of the best things from both Solaris and Linux.
>
> Got installation pointers for all those things?

Start from opensolaris.com. In particular :
- installing : http://www.opensolaris.com/use/. At this point I
suggest do NOT use osol 2008.11, but use latest 2009.06 preview
version from http://genunix.org/ (but I guess you know about this
already :P )
- IPS package manager : installed by default.
http://www.opensolaris.com/use/update/#packagingsystem
- crossbow : installed by default. One of the first things I did was
use it to rename "bnx0" to "eth0" :D (one-liner below). Try "man dladm" or
http://opensolaris.org/os/project/crossbow/
- xVM : This one's a bit tricky. Start from
http://trevoro.ca/blog/2008/05/07/getting-xvm-to-work-in-opensolaris-200805/.
I hope there will be official updated docs from Sun when opensolaris
2009.06 finally comes out.
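
(The rename I mentioned above is a one-liner; "bnx0" is just whatever your NIC shows up as:)

$ pfexec dladm rename-link bnx0 eth0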

>> pkg is similar to apt-get, with more and more packages coming. Third
>> party IPS repositories are also available.
>
> When I did a Solaris installation it gave me options of a whole mess of
> packages to install, but apparently had no automatic dependency
> management, with the result that it was pretty much impossible to
> install anything other than a default configuration.  Thank sort of
> thing scares me; makes me think it's miles behind apt[itude].

Opensolaris installation today is similar to Ubuntu install from live
CD, you start with what you have on the CD. After that you have both
command line (pkg) and GUI (packagemanager) package management with
automatic depsolving capability. The old "pkgadd" still works though,
in case you need it :)

>> Personally I use zfs-fuse when the application runs most optimally
>> (well-tested, binary availablilty, etc.) on Linux.
>> I use Opensolaris mostly for its ZVOL capability, which I find very
>> useful when testing new configuration on top of Xen.
>
> If I could get Ubuntu to run in a domU on top of OpenSolaris, I think
> I'd be interested.  But it looks like there are lots of iffy edges to
> that picture, not least that the latest Ubuntu doesn't have a Xen
> kernel.

Sort of. The main problem is Ubuntu doesn't have a Xen kernel, and
opensolaris' xen version can't run pv_ops kernel yet. There's a
workaround though: install Ubuntu as HVM domU, and convert it to PV
domU using Debian's xen kernel :
http://lists.xensource.com/archives/html/xen-users/2009-05/msg00536.html

--
Fajar

David Abrahams

May 23, 2009, 5:27:24 PM
to zfs-...@googlegroups.com

[Sorry to all for how off-topic this has drifted for the zfs-fuse list.
If anyone would like to object, I'll contact fajar off-list]

on Sat May 23 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:

> On Sat, May 23, 2009 at 9:26 PM, David Abrahams <da...@boostpro.com> wrote:
>> on Sat May 23 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:
>>> If you're familiar with Solaris,
>>
>> I'm not :(
>
> That would make it a little harder then :D Going from Solaris 10 ->
> Opensolaris is easy, only small adaptations required.

> Going from Linux -> Opensolaris is somewhat harder, since there are
> some new concepts to learn. Since you already installed the preview
> release of osol 0906, and familiar-enough with zfs, it shouldn't be
> too hard.

OK, I might try that again, in that case. Linux grub installation
confusion/problems "forced" me to blow away OpenSolaris when I installed
Linux, but I could sure try again.

>>> you can make it behave the same way with a little modification (like
>>> the network part I mentioned above).

I'm not sure what any of that meant; I wasn't having any networking
problems with OpenSolaris, but one thing I don't know how to do is
network bonding a la
http://techarcana.net/hydra/miscellanea/#network-bonding

Pointers?

>>> Add pkg (the software
>>> management), crossbow (the new bridge/vlan framework), and XVM
>>> (Xen), it has some of the best things from both Solaris and Linux.
>>
>> Got installation pointers for all those things?
>
> Start from opensolaris.com. In particular :
> - installing : http://www.opensolaris.com/use/. At this point I
> suggest do NOT use osol 2008.11, but use latest 2009.06 preview
> version from http://genunix.org/ (but I guess you know about this
> already :P )

Didn't know about it, actually. Thanks.

> - IPS package manager : installed by default.
> http://www.opensolaris.com/use/update/#packagingsystem

That's important, thanks.

> - crossbow : installed by default. One of the first thing I did was
> use it to rename "bnx0" to "eth0" :D Try "man dladm" or
> http://opensolaris.org/os/project/crossbow/

Fancy. Not sure I need it, but will look.

> - xVM : This one's a bit tricky.

Yes, that's exactly what I meant about Xen :-)

> Start from
> http://trevoro.ca/blog/2008/05/07/getting-xvm-to-work-in-opensolaris-200805/.

I'll look at that, too. Again, thanks.

> I hope there will be official updated docs from Sun when opensolaris
> 2009.06 finally comes out.
>
>>> pkg is similar to apt-get, with more and more packages coming. Third
>>> party IPS repositories are also available.
>>
>> When I did a Solaris installation it gave me options of a whole mess of
>> packages to install, but apparently had no automatic dependency
>> management, with the result that it was pretty much impossible to
>> install anything other than a default configuration.  That sort of
>> thing scares me; makes me think it's miles behind apt[itude].
>
> Opensolaris installation today is similar to Ubuntu install from live
> CD, you start with what you have on the CD.

Yeah, but the expert mode, which gives a similar list of packages from
which to choose during installation, handles the dependency management
by (duh!) using apt.

> After that you have both command line (pkg) and GUI (packagemanager)
> package management with automatic depsolving capability. The old
> "pkgadd" still works though, in case you need it :)

Why would I need it? pkg doesn't always work?

>>> Personally I use zfs-fuse when the application runs most optimally
>>> (well-tested, binary availability, etc.) on Linux.
>>> I use Opensolaris mostly for its ZVOL capability, which I find very
>>> useful when testing new configuration on top of Xen.
>>
>> If I could get Ubuntu to run in a domU on top of OpenSolaris, I think
>> I'd be interested.  But it looks like there are lots of iffy edges to
>> that picture, not least that the latest Ubuntu doesn't have a Xen
>> kernel.
>
> Sort of. The main problem is Ubuntu doesn't have a Xen kernel, and
> opensolaris' xen version can't run pv_ops kernel yet.

pv_ops?
http://wiki.xen.prgmr.com/xenophobia/2008/08/tell-me-about-pv-ops.html

I'm not the only one! This (among other similar things) is why Xen is
still a mystery to me.

> There's a
> workaround though: install Ubuntu as HVM domU,
> and convert it to PV domU using Debian's xen kernel :
> http://lists.xensource.com/archives/html/xen-users/2009-05/msg00536.html

The box in question doesn't have hardware virtualization support, so
IIUC HVM is off the table.

sghe...@hotmail.com

May 24, 2009, 8:41:23 AM
to zfs-...@googlegroups.com
No harm done. I benefited :) I was pulling hard at making 2008.11 work
as we spoke. I will restart using 2009.06 now since it has many of those
goodies I *knew* should be feasible but *knew not* how to do, or at
least more easily available. Great. I won't, however discuss my
questions further on this list because, indeed, it is not exactly on topic!

Fajar A. Nugraha

May 24, 2009, 10:10:34 PM
to zfs-...@googlegroups.com
(my last reply on this topic. promise! :D )

On Sun, May 24, 2009 at 7:41 PM, sghe...@hotmail.com
<sghe...@hotmail.com> wrote:
> I was pulling hard at making 2008.11 work
> as we spoke. I will restart using 2009.06 now since it has many of those
> goodies I *knew* should be feasible but *knew not* how to do, or at
> least more easily available.

For those interested in deploying opensolaris, I'd suggest joining
indiana-discuss list at opensolaris.org. Sun engineers are usually
there as well, so you'll get (semi) authoritative response as well as
the usual response from fellow users. There are also specialized lists
available : http://mail.opensolaris.org/mailman/listinfo

> David Abrahams wrote:
>> OK, I might try that again, in that case.  Linux grub installation
>> confusion/problems "forced" me to blow away OpenSolaris when I installed
>> Linux, but I could sure try again.

For dual-boot LInux-Solaris on the same system, the easiest way to do
so is by treating opensolaris like Windows, and install its grub on
the solaris partition (not on MBR). Let Linux have the MBR (or the
active partition). It's also handy having the live CD for recovery
purposes.

If you want to share data between OpenSolaris and zfs-fuse, "zpool
create -o version=13" is your friend. OpenSolaris now uses v14 while
zfs-fuse still uses v13. You cannot modify the zpool version for the
OpenSolaris root pool (rpool) though (at least not easily), so it's
best to put shared data on a different pool.
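
(e.g. for a shared data pool; the device name is a placeholder:)

$ pfexec zpool create -o version=13 shared c2t0d0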

>> but one thing I don't know how to do is
>> network bonding a la
>> http://techarcana.net/hydra/miscellanea/#network-bonding
>>
>> Pointers?

Solaris has supported network bonding for a long time. With crossbow, the
instructions are slightly different though. "man dladm" is your friend
(look for dladm create-aggr).

>>> - xVM : This one's a bit tricky.
>> Yes, that's exactly what I meant about Xen :-)

Some features (like a "usable" xVM and crossbow) is available post
2008.11, so the instructions are still scattered everywhere. The lists
are a good place to start, and 2009.06 should have better
documentation.

>>> Opensolaris installation today is similar to Ubuntu install from live
>>> CD, you start with what you have on the CD.
>>
>> Yeah, but the expert mode, which gives a similar list of packages from
>> which to choose during installation, handles the dependency management
>> by (duh!) using apt.

As I recall, the actual customization (package addition or removal) is
done after the entire live CD contents are copied to your HD. This is
actually similar to opensolaris, only you have to reboot first :)

>>> The old
>>> "pkgadd" still works though, in case you need it :)
>> Why would I need it?  pkg doesn't always work?

Function-wise, pkg is similar to apt-get while pkgadd is similar to dpkg.
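
Roughly (package names here are just examples):

    # IPS: pulls the package and its dependencies from the repository,
    # like apt-get
    pkg install SUNWgcc

    # SVR4: installs a single package file you already have, like dpkg -i
    pkgadd -d ./SUNWsomething.pkg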

--
Fajar

David Abrahams

unread,
May 25, 2009, 6:22:12 AM5/25/09
to zfs-...@googlegroups.com

on Sun May 24 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:

> (my last reply on this topic. promise! :D )
>
> On Sun, May 24, 2009 at 7:41 PM, sghe...@hotmail.com
> <sghe...@hotmail.com> wrote:
>> I was pulling hard at making 2008.11 work
>> as we spoke. I will restart using 2009.06 now since it has many of those
>> goodies I *knew* should be feasible but *knew not* how to do, or at
>> least more easily available.
>
> For those interested in deploying opensolaris, I'd suggest joining the
> indiana-discuss list at opensolaris.org. Sun engineers are usually
> there as well, so you'll get (semi-)authoritative responses as well as
> the usual responses from fellow users. There are also specialized lists
> available: http://mail.opensolaris.org/mailman/listinfo

Thanks for the pointer.

>> David Abrahams wrote:
>>> OK, I might try that again, in that case.  Linux grub installation
>>> confusion/problems "forced" me to blow away OpenSolaris when I installed
>>> Linux, but I could sure try again.
>
> For dual-boot Linux-Solaris on the same system, the easiest way to do
> so is by treating opensolaris like Windows and installing its grub on
> the solaris partition (not on the MBR). Let Linux have the MBR (or the
> active partition). It's also handy having the live CD around for
> recovery purposes.

Yeah, that wasn't the problem. It was grub quietly mapping /dev/sdi to
(hd0) when the machine was booted from the installation CD, but not
otherwise.

> If you want to share data between opensolaris and zfs fuse, "zpool
> create -o version=13" is your friend.

Oh, this is good to know, thanks!

> opensolaris now uses v14 while
> zfs-fuse still uses v13. You cannot modify the zpool version of the
> opensolaris root pool (rpool) though (at least not easily), so it's
> best to put shared data on a different pool.
>
>>> but one thing I don't know how to do is
>>> network bonding a la
>>> http://techarcana.net/hydra/miscellanea/#network-bonding
>>>
>>> Pointers?
>
> Solaris has supported network bonding for a long time. With crossbow,
> the instructions are slightly different though. "man dladm" is your
> friend (look for dladm create-aggr).

Nice.

>>>> - xVM : This one's a bit tricky.
>>> Yes, that's exactly what I meant about Xen :-)
>
> Some features (like a "usable" xVM and crossbow) only became available
> post-2008.11, so the instructions are still scattered everywhere. The
> lists are a good place to start, and 2009.06 should have better
> documentation.
>
>>>> Opensolaris installation today is similar to Ubuntu install from live
>>>> CD, you start with what you have on the CD.
>>>
>>> Yeah, but the expert mode, which gives a similar list of packages from
>>> which to choose during installation, handles the dependency management
>>> by (duh!) using apt.
>
> As I recall, the actual customization (package addition or removal) is
> done after the entire live CD contents are copied to your HD. This is
> actually similar to opensolaris, only you have to reboot first :)

Well, it gives big, scary warnings, anyway.

>>>> The old
>>>> "pkgadd" still works though, in case you need it :)
>>> Why would I need it?  pkg doesn't always work?
>
> Function-wise, pkg is similar to apt-get while pkgadd is similar to dpkg.

Thanks again for all the info. Now back to analyzing my data...

David Abrahams

unread,
May 25, 2009, 1:24:26 PM5/25/09
to zfs-...@googlegroups.com

Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
the same way as the first, except that zfs-fuse was built with debug=0
on the scons command line. IIUC zfs-fuse was already optimized; this
change only removes debug symbols.
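
Concretely that's just something along these lines (the directory
layout being however your zfs-fuse checkout is arranged):

    cd zfs-fuse/src
    scons -c        # throw away the previous objects
    scons debug=0   # rebuild with debug disabled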

Just for kicks, to see how different the results were, I plotted the
difference divided by the average:

2(X-Y)/(X+Y)

Where X was the one with debug=0 and Y was the earlier build.
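
(To make that concrete with made-up numbers: if the debug=0 build did
120 MB/s on some test and the earlier build did 100 MB/s, the value is
2*20/220 ≈ 0.18, i.e. about an 18% relative difference, and the sign
tells you which build came out ahead.)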

The graph is enclosed. Those spikes are pretty surprising, and I'm not
sure what they mean.

linux-raidz2.txt
variance.png
linux-raidz2-nodebug.txt

Jonathan Schmidt

unread,
May 25, 2009, 1:41:28 PM5/25/09
to zfs-...@googlegroups.com
> Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
> the same way as the first, except that zfs-fuse was built with debug=0
> on the scons command line. IIUC zfs-fuse was already optimized; this
> change only removes debug symbols.
>
> Just for kicks, to see how different the results were, I plotted the
> difference divided by the average:
>
> 2(X-Y)/(X+Y)
>
> Where X was the one with debug=0 and Y was the earlier build.
>
> The graph is enclosed. Those spikes are pretty surprising, and I'm not
> sure what they mean.

To be sure that it isn't just randomness, try comparing two runs of
raidz2 under identical conditions (remove the debug=0 perturbation).

Also, is there any way to label the axes?

Thanks for all your hard work benchmarking, BTW.

Jonathan

David Abrahams

unread,
May 25, 2009, 2:56:55 PM5/25/09
to zfs-...@googlegroups.com

on Mon May 25 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:

>> Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
>> the same way as the first, except that zfs-fuse was built with debug=0
>> on the scons command line. IIUC zfs-fuse was already optimized; this
>> change only removes debug symbols.
>>
>> Just for kicks, to see how different the results were, I plotted the
>> difference divided by the average:
>>
>> 2(X-Y)/(X+Y)
>>
>> Where X was the one with debug=0 and Y was the earlier build.
>>
>> The graph is enclosed. Those spikes are pretty surprising, and I'm not
>> sure what they mean.
>
> To be sure that it isn't just randomness, try comparing two runs of
> raidz2 under identical conditions (remove the debug=0 perturbation).

I'll try; I'm running out of cycles to spend on this. I'm doing a flat
mdRAID0 ext3 test now; any other requests?
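
(For anyone who wants to reproduce it, the setup is nothing fancier
than roughly this -- substitute whatever your eight drives are called,
and the mount point is arbitrary:)

    mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/sd[b-i]
    mkfs.ext3 /dev/md0
    mount /dev/md0 /mnt/bench
    cd /mnt/bench && iozone -a -g 16G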

> Also, is there any way to label the axes?

Of course; that graph was just quick-and-dirty. I have been
researching ways to get good images and I think I'm settling on
http://code.enthought.com/projects/mayavi

That's an open-source data visualization framework/application, so in
principle once I've put something out there, anyone else can explore the
graphs and do their own analysis.

David Abrahams

unread,
May 26, 2009, 7:11:38 AM5/26/09
to zfs-...@googlegroups.com

on Mon May 25 2009, David Abrahams <dave-AT-boostpro.com> wrote:

> on Mon May 25 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:
>
>>> Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
>>> the same way as the first, except that zfs-fuse was built with debug=0
>>> on the scons command line. IIUC zfs-fuse was already optimized; this
>>> change only removes debug symbols.
>>>
>>> Just for kicks, to see how different the results were, I plotted the
>>> difference divided by the average:
>>>
>>> 2(X-Y)/(X+Y)
>>>
>>> Where X was the one with debug=0 and Y was the earlier build.
>>>
>>> The graph is enclosed. Those spikes are pretty surprising, and I'm not
>>> sure what they mean.
>>
>> To be sure that it isn't just randomness, try comparing two runs of
>> raidz2 under identical conditions (remove the debug=0 perturbation).
>
> I'll try; I'm running out of cycles to spend on this. I'm doing a flat
> mdRAID0 ext3 test now;

...which is taking truly forever. I guess you can all feel good about
picking ZFS-Fuse instead of EXT3 unless your application spends a great
deal of time rewriting the same spot within a given file. The "record
rewrite" test is the only one on which EXT3 is winning; otherwise it's
losing bigtime when the file size exceeds memory.

Now I'm _really_ running out of time to work on this. With another
machine coming on which I can virtualize 64-bit OSes, I don't really
have a strong incentive to keep this one running Linux. I think I have
the information *I* need, which tells me that in addition to continuing
to be maintained and improved, OpenSolaris ZFS is beating ZFS-Fuse on
performance often by a factor of 5 or more.

I will post the numbers I have, but as for making pretty graphs, I've
already spent too long trying to set that up. If anyone else would like
to pick up where I left off, I'd be very happy to help. I have a bunch
of Python scripts for parsing the iozone output, and even something that
will use mayavi to draw a surface.

> any other requests?

*** Really, last call for requests. ***

>> Also, is there any way to label the axes?
>
> Of course; that graph was just quick-and-dirty. I have been
> researching ways to get good images and I think I'm settling on
> http://code.enthought.com/projects/mayavi
>
> That's an open-source data visualization framework/application, so in
> principle once I've put something out there, anyone else can explore the
> graphs and do their own analysis.

Sorry about that; I think I've overpromised on that score. Again, I
really would be very glad if someone else would take up the job of
making graphs and doing some analysis -- it would be good to publish
these results in a digestible form on the web.

Regards,

sghe...@hotmail.com

unread,
May 26, 2009, 7:21:37 AM5/26/09
to zfs-...@googlegroups.com
Hi David,

thanks for all the efforts. The numbers (so far) are good enough for
me... pretty graphs are nice but not as valuable as the data itself.
Will you be documenting your setup like with Hydra the Beast? Sounds
like a very interesting setup *and* I have a box capable of virtualizing
64-bit OSes already.

I might want to go the same route (although the download of
osol-0906-111a-x86 is taking forever)

David Abrahams

unread,
May 26, 2009, 10:00:32 AM5/26/09
to zfs-...@googlegroups.com

on Tue May 26 2009, "sgheeren-AT-hotmail.com" <sgheeren-AT-hotmail.com> wrote:

> Hi David,
>
> thanks for all the efforts. The numbers (so far) are good enough for
> me... pretty graphs are nice but not as valuable as the data itself.
> Will you be documenting your setup like with Hydra the Beast?

Yep, I need to keep track of everything, and FWIW this is the same
physical piece of hardware as Hydra was. Hydra is dead; long live
Hydra!

I'll try to document everything... but are you interested in the
ZFS-Fuse setup that I have now, or the Solaris, or...?

> Sounds like a very interesting setup *and* I have a box capable of
> virtualizing 64-bit OSes already.

> I might want to go the same route (although the download of
> osol-0906-111a-x86 is taking forever)

Yes, I noticed that it is a bit of a monster.

David Abrahams

unread,
May 27, 2009, 8:09:44 AM5/27/09
to zfs-...@googlegroups.com

on Tue May 26 2009, David Abrahams <dave-AT-boostpro.com> wrote:

>> I'll try; I'm running out of cycles to spend on this. I'm doing a flat
>> mdRAID0 ext3 test now;
>
> ...which is taking truly forever.

et voila.

Last chance for requests; I'm going to close this party down. Do I owe
somebody a 'dd' test before I do that?
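
(If so, I was thinking of nothing fancier than the following -- sizes
and paths open to suggestions:)

    # sequential write then read-back; conv=fdatasync keeps the write
    # timing from being flattered by the page cache
    dd if=/dev/zero of=/mnt/bench/ddtest bs=1M count=16384 conv=fdatasync
    dd if=/mnt/bench/ddtest of=/dev/null bs=1M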

linux-ext3-raid0.txt

David Abrahams

unread,
Jun 10, 2009, 9:34:49 PM6/10/09
to zfs-...@googlegroups.com, fa...@fajar.net

on Sun May 24 2009, "Fajar A. Nugraha" <fajar-gqapnpqMBQ1eoWH0uzbU5w-AT-public.gmane.org> wrote:

> For those interested in deploying opensolaris, I'd suggest joining
> indiana-discuss list at opensolaris.org. Sun engineers are usually
> there as well, so you'll get (semi) authoritative response as well as
> the usual response from fellow users. There are also specialized lists
> available : http://mail.opensolaris.org/mailman/listinfo

Hi Fajar,

Just a quick thank you for encouraging me w.r.t. OpenSolaris. The more
I use it, the more impressed I am with the technology, especially the
way ZFS is integrated into Boot Environments and Zones, not to mention
time slider and who-knows-what else.

Regards,

sghe...@hotmail.com

unread,
Jun 11, 2009, 4:28:07 AM6/11/09
to zfs-...@googlegroups.com
It seems Dave and I are at the same step on the Way To Solaris :)
When we meet Buddha, remember to kill him.

On a slightly less positive note: I managed to kill my most promising
Solaris install by installing Sun Studio (probably); somehow the package
manager has become wedged. It keeps telling me 'this package cannot be
installed on its own, please update all'; then, when I try to do
'update all', it says my system has already been updated. End of
story. [1]

In a desperate attempt to remedy the situation, I deleted my other
boot environments (I should have known better; now I've even lost the
ability to go back to one of them...).

So, although the machine runs and is all shiny (I love the features Dave
mentions), I know it is a dead end, and I'll have to start over.
At least then I'll be better able to leverage boot environments in case
something bad like this happens again.

FTM, I'm looking into Blastwave packages - these still work, but I'm not
exactly clear on how (well) they integrate with the regular Sun package
management and updates. I'm still having fun finding all this out. To
me, it seems obvious I'll need to have Blastwave working for me in a
durable way before I make the switch. I'm using all that stuff on a
day-to-day basis and don't want to have to virtualize everything.

Oh, and I'm looking into buying one of the Tranquil PCs (T7 series)
as a fanless server, because the mini-ITX mobo apparently supports
OpenSolaris. That would be crazy fun as a replacement for my web/mail
server!

[1] PS: I should mention that googling any of these error messages
returns ZERO hits. ZERO! BTW, I'm writing this mail offline, so I might
not recall the exact wording of the messages correctly.
