1. A regular pool of 8 whole drives, no redundancy
2. RAIDZ2 over those same 8 drives.
Anyone wanting to make more specific requests for results, now is the
time to speak up!
Cheers,
--
David Abrahams
BoostPro Computing
http://boostpro.com
Also if you have time, a pareto of the various zfs settings would be
interesting. Most importantly (to me), on the 8 drive pool without
redundancy, test a filesystem that has "copies=2" set. Also I'd be
interested in the performance loss going to "checksum=sha256" and
"compression=on/gzip".
Thanks!
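For a sense of scale, the requested combinations can be enumerated (a sketch; the pool-layout labels are invented for illustration, and the setting values are the ones mentioned in this thread):

```python
from itertools import product

# Sketch of the requested test matrix. The layout labels are made up;
# the setting values are those discussed in the thread.
layouts   = ["flat-8drive", "raidz2-8drive"]
copies    = ["copies=1", "copies=2"]
checksums = ["checksum=fletcher4", "checksum=sha256"]
compress  = ["compression=off", "compression=on", "compression=gzip"]

matrix = list(product(layouts, copies, checksums, compress))
print(len(matrix))  # 24 configurations, each an hours-long iozone run
```

Twenty-four hours-long runs is why patience comes up later in the thread.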
> David Abrahams wrote:
>> I finally got a reasonable OpenSolaris (0906) installation on my
>> server, so I'm going to compare the speed of ZFS-Fuse with the
>> in-kernel stuff. The tests I'm planning to run are essentially
>> "iozone -a -g 16G" over two different configurations:
>>
>> 1. A regular pool of 8 whole drives, no redundancy
>> 2. RAIDZ2 over those same 8 drives.
>>
>> Anyone wanting to make more specific requests for results, now is the
>> time to speak up!
> It might be good to have baseline numbers to compare it to, such as ext3
> single disk performance under linux,
ext3 is such a poorly-performing filesystem that I don't think it's
worth much as a baseline. I'd be happy to test JFS or XFS for that
purpose.
> and possibly an 8-way software RAID0.
OK, could do that. It will take some time. With 8G of RAM in this
machine, any testing that exceeds the cache and really tests the disk
throughput takes hours.
> Actually running something like HD Tune will give you the raw
> disk performance independent of the filesystem (might require Windows).
> That way we can see how much speed is lost under each configuration.
I hadn't planned to run Windows on bare metal ever again. Hoping not to
start now. Maybe it works under Wine?
> Also if you have time, a pareto of the various zfs settings would be
> interesting. Most importantly (to me), on the 8 drive pool without
> redundancy, test a filesystem that has "copies=2" set.
Yes, I'm interested in that too.
> Also I'd be interested in the performance loss going to
> "checksum=sha256" and "compression=on/gzip".
Wow, we'll be at this for days! Let's see how patient I can be... ;-)
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Perhaps you meant the performance loss of gzip vs. LZJB compression, and
the performance loss of sha256 vs. fletcher? In those comparisons it
would be safe to predict a loss of performance.[1]
Anyhow, I'd specify what I'd like measured, not the desired outcome :P
Seth
[1] in the absence of specialized hardware or fluked measurements :)
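For context on why that prediction is safe: a fletcher-style checksum is just a couple of integer additions per word, while sha256 runs many mixing rounds per 64-byte block. A rough Python sketch (simplified; ZFS's real fletcher4 is C code using 64-bit accumulators over 32-bit words, and the overflow handling here is illustrative only):

```python
import hashlib
import struct

def fletcher4(data: bytes):
    # Simplified fletcher4-style checksum: four running sums over
    # little-endian 32-bit words. Two adds per word is why it is so
    # much cheaper than sha256's dozens of rounds per 64-byte block.
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a += word
        b += a
        c += b
        d += c
    mask = 2**64 - 1
    return (a & mask, b & mask, c & mask, d & mask)

block = b"\x00\x01\x02\x03" * 1024            # a 4 KiB record
weak = fletcher4(block)                       # cheap, collision-prone
strong = hashlib.sha256(block).hexdigest()    # expensive, cryptographic
```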
my thoughts here:
(a) if the baseline is intended to show cost/improvement over *regular*
Linux setups then ext3 is a perfect choice
(b) if the baseline is intended to give some foothold to correlate the
findings with other published benchmarks, then ext3 will *still* be a
very good choice, because of the high volume of benchmark data available
that includes ext3 (this includes all of the jfs/xfs figures I ever came
across).
(c) if the baseline is intended to provide comparisons to other
tuned/optimized raid setups then ext3 might be a poor choice indeed.
$0.02
----
I personally like ext3 as a baseline: to me it's more or less the
'English' of Linux filesystems. It has few dials/knobs and they are
pretty well-understood. Therefore it seems quite likely that with a
'default ext3 setup' you'll have figures that are recognizable to
readers of the benchmark. Contrast that with jfs, xfs?
As an engineer I've learned to guess outcomes, and my instincts tend to
be very accurate, so I apologise if I'm making undue assumptions. I
love being wrong though, so get the data and make my day :)
> Seth
>
> [1] in the absence of specialized hardware or fluked measurements :)
>
I'd love to repurpose my GPU to do ZFS calculations :D
Jonathan
Though, unless your system is CPU-bound (not a very interesting case
for benchmarking), compression could hardly decrease performance.
Further, I might have read (need to refresh my memory) that zfs[-fuse]
sports a smart compression-detection algorithm that tries to avoid
compressing 'uncompressible' (ergo: already-compressed) data.[1] This
could alleviate even the CPU load issues.
[1] where is my memory these days
Yeah I've seen that feature while trawling through the code. It does a
test compression with a very fast compressor and if it doesn't get at
least a certain ratio then it doesn't run the more expensive one.
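That logic is easy to sketch in Python, with zlib level 1 standing in for the fast compressor and level 9 for the expensive one; the 12.5% threshold is my assumption for illustration, not the actual constant from the code:

```python
import os
import zlib

def smart_compress(block: bytes, min_saving: float = 0.125):
    # Probe with a cheap, fast pass first; only if it saves at least
    # `min_saving` of the block do we run the expensive compressor.
    # Compressor choices and threshold are illustrative, not ZFS's.
    probe = zlib.compress(block, 1)
    if len(probe) > len(block) * (1 - min_saving):
        return block, False               # store uncompressed
    return zlib.compress(block, 9), True  # worth the expensive pass

text_like = b"the quick brown fox jumps over the lazy dog " * 200
random_like = os.urandom(8000)            # stands in for compressed data

small, did_compress = smart_compress(text_like)     # (compressed, True)
as_is, did_compress2 = smart_compress(random_like)  # (original, False)
```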
> [1] where is my memory these days
>
Who needs memory when DRAM is so cheap??
>
>
>> ext3 is such a poorly-performing filesystem that I don't think it's
>> worth much as a baseline. I'd be happy to test JFS or XFS for that
>> purpose.
>>
>>
> Wow - this seems based on some assumptions.
Not assumptions, my boy. Hearsay. There's a big difference ;-)
http://tinyurl.com/which-linux-filesystem
http://www.debian-administration.org/articles/388
http://linuxgazette.net/102/piszcz.html
I don't have time to do the tests myself, what with each iozone run
taking something like 8 hours. I have to go based on what I've read.
> In my opinion, these are not
> givens.
>
> my thoughts here:
>
> (a) if the baseline is intended to show cost/improvement over
> *regular*
> Linux setups then ext3 is a perfect choice
Well, I don't know what other people want it for, but I'd like to know
how ZFS is doing compared to another filesystem I'd actually *use* if
I found something to be really unworkable about ZFS.
> (b) if the baseline is intended to give some foothold to correlate the
> findings with other published benchmarks, then ext3 will *still* be a
> very good choice, because of the high volume of benchmark data
> available
> that includes ext3 (this includes all of the jfs/xfs figures I ever
> came
> across).
By that logic, I guess it doesn't matter much which one we use as a
baseline, since we can always deduce one from the other given known
numbers.
> (c) if the baseline is intended to provide comparisons to other
> tuned/optimized raid setups then ext3 might be a poor choice indeed.
> $0.02
> ----
> I personally like ext3 as a baseline: to me its more or less the
> 'English' of linux filesystems. It has few dials/knobs and they are
> pretty well-understood. Therefore it seems quite likely that with a
> 'default ext3 setup' you'll have figures that are recognizable to
> readers of the benchmark. Contrast that with jfs, xfs?
Understood; it would only be less useful to me, personally. However,
I did solicit requests...
> I'd love to repurpose my GPU to do ZFS calculations :D
Now *that* is a lovely idea.
On May 18, 2009, at 5:07 PM, sghe...@hotmail.com wrote:
> Not assumptions, my boy. Hearsay. There's a big difference ;-)
So that basically makes the assumption "you" :) No problem, all my
bullets contain multiple (unintended) subjectivities as well :) If
anyone cares, feel free to point them out.[1]
Last time I tested with a simple dd (bs=1M, size twice the available
memory) on two servers with the same hardware, ext3 on Linux had twice
the throughput of ZFS on OpenSolaris. Please do a dd test as well if
you can; I'd be interested in your results.
Regards,
Fajar
> Most importantly (to me), on the 8 drive pool without
> redundancy, test a filesystem that has "copies=2" set
Here are results from the two tests that have completed so far: one is
the 8-drive pool with no redundancy; the other is the same thing, but
with copies=2.
I'd appreciate it greatly if someone else could invest a little in
analyzing and/or graphing these results; I've got my hands a bit full
with the testing itself. Thanks!
------
> Also I'd be interested in the performance loss going to
> "checksum=sha256" and "compression=on/gzip".
"compression=gzip" is legal, but "compression=on/gzip" is not, so I
assume you meant the former. I'm doing that test now. I'm uncertain
whether iozone writes anything to its test files that can give us
meaningful results in this case, but I've asked Don Capps about it, so
we'll see.
Sorry, I meant those as two separate settings. "compression=on" uses
LZJB and "compression=gzip" uses gzip. Both settings are going to be
data-dependent, so zero-filled files will skew the results
significantly. Random data will skew the results as well (pure entropy
is not compressible). So I'm not sure what to suggest...
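One possible middle ground, sketched below: generate blocks that are part random bytes, part constant padding, so the achievable compression ratio is bounded rather than ~0% or ~100%. The `mixed_block` helper and the 50/50 split are my invention, not something iozone provides:

```python
import os
import zlib

def mixed_block(size: int, incompressible_fraction: float) -> bytes:
    # Hypothetical helper: the leading fraction is incompressible
    # random bytes, the remainder zero padding, so a compressor has
    # real but bounded work to do.
    n = int(size * incompressible_fraction)
    return os.urandom(n) + bytes(size - n)

block = mixed_block(1 << 20, 0.5)
saving = 1 - len(zlib.compress(block, 1)) / len(block)
# `saving` comes out at roughly half, by construction
```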
>
>>> Also I'd be interested in the performance loss going to
>>> "checksum=sha256" and "compression=on/gzip".
>>
>> "compression=gzip" is legal, but "compression=on/gzip" is not, so I
>> assume you meant the former. I'm doing that test now. I'm uncertain
>> whether iozone writes anything to its test files that can give us
>> meaningful results in this case, but I've asked Don Capps about it,
>> so
>> we'll see.
>
> Sorry, I meant those as two separate settings. "compression=on" uses
> LZJB and "compression=gzip" uses gzip.
OK, I'll try them both if I can muster the patience.
> Both settings are going to be
> data dependent so zero-filled files will skew the results
> significantly.
> Random data will skew the results as well (pure entropy is not very
> compressible). So I'm not sure what to suggest...
Right, that's what I meant.
> Hi everyone,
>
> In case it helps, we recently did some similar benchmarks with iozone,
> except in this case it was Linux ZFS-FUSE vs. FreeBSD 7.2 vs. OpenSolaris
> 11/08. We even made some pretty graphs:
>
> http://lukemarsden.net/hl/Linux%20vs.%20FreeBSD%20ZFS%20Performance%20Report%202.pdf
Wow, I wish I'd known about this one earlier.
> Headlines: OpenSolaris runs about 10% above FreeBSD. ZFS-FUSE write
> performance is terrible (we got only 3% of the write performance that we got
> out of FreeBSD) but it does compete when it comes to reads.
How competitive _is_ it?
> All the OSes were running on the metal, and on the same hardware (Core 2
> Quad with 8GB RAM and a single Samsung SpinPoint F1 HD103UJ 1TB Hard Drive
> w/32MB cache). We used the latest version of ZFS-FUSE which was available
> packaged for Ubuntu 9.04.
Without the big_writes patch it's not surprising you got poor write
performance, I think. See below.
> The performance figures on the Y axis (vertical) are in kb/sec, the X
> axis shows dataset size in kb and the Z axis I believe shows the block
> size used per request.
>
> It looks like the cost of going from kernel -> userspace and back for each
> filesystem request really adds to the latency and the jitter of the results,
> not to mention throughput. Furthermore, ZFS's intensive use of a large RAM
> cache really shows. The ARC cache on OpenSolaris was explicitly set to 2GB
> (we realised this afterwards), whereas FreeBSD was left free to use whatever
> it could, and I've no idea what ZFS-FUSE does regarding memory allocation,
> but it seems to stay pretty limited (the fuse process never took up more
> than a few hundred megs of RAM).
I found that ZFS-Fuse "takes up" *lots* of memory, but not much of it
stays resident:
http://groups.google.com/group/zfs-fuse/browse_thread/thread/911370fa54cde008
> If we've got anything badly wrong regarding ZFS-FUSE here I'd be very happy
> to know about ways to improve the performance, because for our application
> it looks like it's presently not an option (although we'd love it to be).
Well, you might try doing what I've documented here, which includes the
aforementioned big_writes patch:
http://techarcana.net/hydra/zfs-installation/
> We even made some pretty graphs:
>
> http://lukemarsden.net/hl/Linux%20vs.%20FreeBSD%20ZFS%20Performance%20Report%202.pdf
Just a thought: it might be really useful to make a graph of the
*differences* in performance of the three platforms. That's what I'm
planning to do if nobody else takes up the data analysis challenge.
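For what it's worth, once two iozone runs are parsed into (file size, record size) → throughput maps, the difference surface is a one-liner (the numbers below are made up purely for illustration):

```python
# Made-up throughput figures (KB/s) keyed by (file KB, record KB),
# standing in for two parsed iozone result grids.
opensolaris = {(16384, 64): 210_000, (16384, 128): 220_000}
zfs_fuse    = {(16384, 64):  90_000, (16384, 128): 110_000}

# Difference surface over the cells measured in both runs.
diff = {cell: opensolaris[cell] - zfs_fuse[cell]
        for cell in opensolaris.keys() & zfs_fuse.keys()}
```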
If by "production" you mean my home file server, then yes, I do use it.
I haven't had a crash or anything so I suppose it's been pretty stable
for me.
> Hey,
>
> We used a single HD because we were comparing the performance
> between the operating system implementations, not raw disk speed.
> All that mattered about the disk setup is that it stayed constant
> between tests.
But unless the disk is wicked fast, you'll learn mostly about the disk
bottleneck and not the OS speed, right?
> And why ZFS? Because our application makes heavy use of differential
> snapshots being sent and received between nodes in a cluster, and as
> far as I know ZFS is the only filesystem which supports this mode of
> operation efficiently enough to send/recv a whole filesystem every
> 10-20 seconds. You don't need lots of HDs to find that useful ;)
Nice application!
>
> Hi Dave,
>
> Thanks for this hint. When we revisit doing a Linux port later I will
> definitely test with these patches.
Good on ya.
> If you want the raw data to do subtractive graphs or such, here it is:
> http://lukemarsden.net/zfs-benchmarks/ (results.txt is the OpenSolaris
> one).
I think my tests are giving a better picture of the OS potential (more
disks) and the differences as applied to my hardware, natch. But I
may check that out anyway.
Interestingly, I did a quick difference graph on my OpenSolaris
tests, and RAIDZ2 over 8 disks is notably faster overall than a
regular pool over 8 disks with copies=2, which surprised me.
> I didn't realise that ZFS-FUSE sets the ARC cache to 128Mb by default.
> That certainly helps explain the results.
>
> Do you / does anyone here use ZFS-FUSE for production? If so, what
> kind of stability do you get with it?
It was working fine until I allocated the same set of partitions to
ZFS and mdRAID simultaneously...
which is why I'm now trying to use a simpler setup with ZFS on whole
disks.
OpenSolaris, raidz2 across 8 7200 RPM SATA disks:
$ time dd bs=1M count=16K if=/dev/zero of=/tank/bigfile
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 313.916 s, 54.7 MB/s
real 5m13.942s
user 0m0.061s
sys 0m24.385s
$ time dd bs=1M count=16K if=/tank/bigfile of=/dev/null
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 127.308 s, 135 MB/s
real 2m7.313s
user 0m0.018s
sys 0m15.675s
OpenSolaris, "flat" pool across the same 8 disks:
$ time dd bs=1M count=16K if=/dev/zero of=/tank/bigfile
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 290.356 s, 59.2 MB/s
real 4m50.362s
user 0m0.030s
sys 0m17.264s
$ time dd bs=1M count=16K if=/tank/bigfile of=/dev/null
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 61.328 s, 280 MB/s
real 1m1.333s
user 0m0.024s
sys 0m13.499s
!! Wow, reads are less than half as fast with raidz2 by these
measurements.
Which doesn't make any sense, does it? I've suspected the same thing of
copies=2, but neither should cause any overhead during reads. Thoughts?
>> !! Wow, reads are less than half as fast with raidz2 by these
>> measurements.
>
> Which doesn't make any sense, does it? I've suspected the same
> thing of
> copies=2, but neither should cause any overhead during reads.
> Thoughts?
I dunno; I guess this is a question for a pure ZFS forum where
non-FUSE people hang out.
Hey,
We used a single HD because we were comparing the performance between the operating system implementations, not raw disk speed. All that mattered about the disk setup is that it stayed constant between tests.
And why ZFS? Because our application makes heavy use of differential snapshots being sent and received between nodes in a cluster, and as far as I know ZFS is the only filesystem which supports this mode of operation efficiently enough to send/recv a whole filesystem every 10-20 seconds. You don't need lots of HDs to find that useful ;)
> Luke Marsden wrote:
>> Hey,
>>
>> We used a single HD because we were comparing the performance between
>> the /operating system implementations/, not raw disk speed. All that
>> mattered about the disk setup is that it stayed constant between tests.
> Ok useful enough. However, the striping implementation (and therefore
> the performance impact) may vary between the OS *driver* implementations
> as well. Especially zfs-fuse being userspace may kill the benefit. I'd
> like to know [1]
>
> Cheers
>
> [1] FWIW: I've *seen* the increase in speed when using a set of disk
> over using a single disk. I'd still be interested in the performance
> differences across OS-es.
Soon, my friend. My first test on Linux is in the last phase (16G files,
which takes forever).
I've only started getting workable results with OpenSolaris this week
(as I finally got my head around making the NIC work - muhahaha. It
appears if the NIC doesn't work, OS is dead in the water. Surprisingly.
It can't even shutdown <gawk/>). I'm motivated to switch to OpenSolaris
if I get the basics working. I might do virtual machines. Still
pondering xen or branded zones (which I have zero experience with).
Seth
>
> David Abrahams wrote:
>>
>>
>> Soon, my friend. My first test on Linux is in the last phase (16G
>> files,
>> which takes forever).
>>
>>
>>
> Bows!
>
> I've only started getting workable results with OpenSolaris this week
> (as I finally got my head around making the NIC work - muhahaha. It
> appears if the NIC doesn't work, OS is dead in the water.
> Surprisingly.
> It can't even shutdown <gawk/>). I'm motivated to switch to
> OpenSolaris
> if I get the basics working.
I find OSOL quite foreign and difficult, not to mention lacking in
flexibility and easily available software. If Linux competes on ZFS
speed, I'll use it. If not, I'll stick with Solaris for the fileserver.
Ironically, I'm getting a Sun server here for running VMs... on which
I'm planning to run Linux.
> I might do virtual machines. Still
> pondering xen
I still find Xen mysterious, and fear it would be a huge time sink for
lack of broad support and documentation. Do you *have* to have
separate partitions for each domU? I still don't know
> or branded zones (which I have zero experience with).
Hmm, didn't know about that one. Looks interesting, but again, quite
limited ATM.
I'll probably end up with KVM. I hear VBox is pretty good, but I
found out the hard way that it can't virtualize 64-bit guests on
64-bit hosts without hardware virtualization support, which still
baffles me.
> David Abrahams wrote:
>>
>>
>> Soon, my friend. My first test on Linux is in the last phase (16G files,
>> which takes forever).
>>
>>
>>
> Bows!
Preliminary analysis shows that Solaris ZFS is the overall winner on all
tests except these two, where ZFS-Fuse not surprisingly wins until we
blow past the system RAM size:
Fread: This test measures the performance of reading a file using the
library function fread(). This is a library routine that performs
buffered & blocked read operations. The buffer is within the user’s
address space. If an application were to read in very small size
transfers then the buffered & blocked I/O functionality of fread() can
enhance the performance of the application by reducing the number of
actual operating system calls and increasing the size of the transfers
when operating system calls are made.
Freread: This test is the same as fread above except that in this test
the file that is being read was read in the recent past. This should
result in higher performance as the operating system is likely to have
the file data in cache.
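As a sanity check on that description, here is a small Python sketch of the same idea: a user-space buffer (the role fread() plays in these tests) turns many tiny application reads into a few large OS reads. The file name and sizes are arbitrary:

```python
import io
import os
import tempfile

class CountingFile(io.FileIO):
    # Raw file object that counts reads actually issued to the OS.
    syscalls = 0
    def readinto(self, b):
        self.syscalls += 1
        return super().readinto(b)

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 65536)                 # 64 KiB test file

raw = CountingFile(path, "rb")
buffered = io.BufferedReader(raw, buffer_size=8192)
while buffered.read(64):                  # 1024 tiny application reads
    pass
os_reads = raw.syscalls                   # ~9: 64 KiB / 8 KiB + EOF probe
buffered.close()
os.unlink(path)
```

Without the buffer (open the raw file directly), every 64-byte read would be its own system call; that per-call overhead is exactly what fread() hides.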
It's a known problem :)
They use nwam (kinda like NetworkManager in Linux) whose purpose is
good, but which at this point still has some bugs (like the one you
mentioned). Since I'm using it on a server only, I simply disable
svc:/network/physical:nwam and enable svc:/network/physical:default
(the old-style Solaris networking config).
> I find OSOL quite foreign and difficult,
It's actually quite good. If you're familiar with Solaris, you can
make it behave the same way with a little modification (like the
network part I mentioned above). Add pkg (the software management),
crossbow (the new bridge/vlan framework), and XVM (Xen), and it has
some of the best things from both Solaris and Linux.
> not to mention lacking in
> flexibility and easily available software.
pkg is similar to apt-get, with more and more packages coming. Third
party IPS repositories are also available.
> If Linux competes on ZFS
> speed, I'll use it. If not, I'll stick with Solaris for the fileserver.
(Open)Solaris also has the benefit of zvol, something not available
in zfs-fuse (yet). Very handy for creating an iSCSI SAN.
> I still find Xen mysterious, and fear it would be a huge time sink for
> lack of broad support and documentation.
Is it? xen-...@lists.xensource.com and xen-d...@opensolaris.org
is a good place to start.
> Do you *have* to have
> separate partitions for each domU?
Not really. A file-backed domU is also possible, although for optimum
performance you need a block-device backend (partition, LV, or ZVOL).
Personally I use zfs-fuse when the application runs most optimally
(well-tested, binary availability, etc.) on Linux.
I use Opensolaris mostly for its ZVOL capability, which I find very
useful when testing new configuration on top of Xen.
--
Fajar
> On Sat, May 23, 2009 at 4:15 AM, David Abrahams <da...@boostpro.com> wrote:
>> On May 22, 2009, at 4:11 PM, sghe...@hotmail.com wrote:
>>> I've only started getting workable results with OpenSolaris this week
>>> (as I finally got my head around making the NIC work - muhahaha. It
>>> appears if the NIC doesn't work, OS is dead in the water.
>
> It's a known problem :)
> They use nwam (kinda like network-manager in Linux) whose purpose is
> good, but at this point still have some bugs (like what you
> mentioned). Since I'm using it on server only, I simply disable
> svc:/network/physical:nwam and enable svc:/network/physical:default
> (the old-style solaris networking config)
>
>> I find OSOL quite foreign and difficult,
>
> It's actually quite good.
That's reassuring.
> If you're familiar with Solaris,
I'm not :(
> you can
> make it behave the same way with a little modification (like the
> network part I mentioned above). Add pkg (the software management),
> crossbow (the new bridge/vlan framework), and XVM (Xen), it has some
> of the best things from both Solaris and Linux.
Got installation pointers for all those things?
>> not to mention lacking in
>> flexibility and easily available software.
>
> pkg is similar to apt-get, with more and more packages coming. Third
> party IPS repositories are also available.
When I did a Solaris installation it gave me options for a whole mess
of packages to install, but apparently had no automatic dependency
management, with the result that it was pretty much impossible to
install anything other than a default configuration. That sort of
thing scares me; makes me think it's miles behind apt[itude].
>> If Linux competes on ZFS
>> speed, I'll use it. If not, I'll stick with Solaris for the fileserver.
>
> (Open)Solaris also has the benefits of zvol, something not available
> in zfs-fuse (yet).
At the pace development is going, I'm not holding my breath ;-)
> Very handy to create iscsi SAN.
Good point; I'll probably try that.
>> I still find Xen mysterious, and fear it would be a huge time sink for
>> lack of broad support and documentation.
>
> Is it? xen-...@lists.xensource.com and xen-d...@opensolaris.org
> is a good place to start.
>
>> Do you *have* to have
>> separate partitions for each domU?
>
> Not really. file-backed domU is also possible, although for optimum
> performance you need block device backend (partition, LV, or ZVOL).
That's a good reason to use OpenSolaris as the dom0 right there.
> Personally I use zfs-fuse when the application runs most optimally
> (well-tested, binary availability, etc.) on Linux.
> I use Opensolaris mostly for its ZVOL capability, which I find very
> useful when testing new configuration on top of Xen.
If I could get Ubuntu to run in a domU on top of OpenSolaris, I think
I'd be interested. But it looks like there are lots of iffy edges to
that picture, not least that the latest Ubuntu doesn't have a Xen
kernel.
That would make it a little harder then :D
Going from Solaris 10 -> Opensolaris is easy, only small adaptations required.
Going from Linux -> Opensolaris is somewhat harder, since there are
some new concepts to learn. Since you already installed the preview
release of osol 0906 and are familiar enough with zfs, it shouldn't be
too hard.
>
>> you can
>> make it behave the same way with a little modification (like the
>> network part I mentioned above). Add pkg (the software management),
>> crossbow (the new bridge/vlan framework), and XVM (Xen), it has some
>> of the best things from both Solaris and Linux.
>
> Got installation pointers for all those things?
Start from opensolaris.com. In particular :
- installing : http://www.opensolaris.com/use/. At this point I
suggest you do NOT use osol 2008.11, but the latest 2009.06 preview
version from http://genunix.org/ (but I guess you know about this
already :P )
- IPS package manager : installed by default.
http://www.opensolaris.com/use/update/#packagingsystem
- crossbow : installed by default. One of the first things I did was
use it to rename "bnx0" to "eth0" :D Try "man dladm" or
http://opensolaris.org/os/project/crossbow/
- xVM : This one's a bit tricky. Start from
http://trevoro.ca/blog/2008/05/07/getting-xvm-to-work-in-opensolaris-200805/.
I hope there will be official updated docs from Sun when opensolaris
2009.06 finally comes out.
>> pkg is similar to apt-get, with more and more packages coming. Third
>> party IPS repositories are also available.
>
> When I did a Solaris installation it gave me options of a whole mess of
> packages to install, but apparently had no automatic dependency
> management, with the result that it was pretty much impossible to
> install anything other than a default configuration. That sort of
> thing scares me; makes me think it's miles behind apt[itude].
Opensolaris installation today is similar to Ubuntu install from live
CD, you start with what you have on the CD. After that you have both
command line (pkg) and GUI (packagemanager) package management with
automatic depsolving capability. The old "pkgadd" still works though,
in case you need it :)
>> Personally I use zfs-fuse when the application runs most optimally
>> (well-tested, binary availability, etc.) on Linux.
>> I use Opensolaris mostly for its ZVOL capability, which I find very
>> useful when testing new configuration on top of Xen.
>
> If I could get Ubuntu to run in a domU on top of OpenSolaris, I think
> I'd be interested. But it looks like there are lots of iffy edges to
> that picture, not least that the latest Ubuntu doesn't have a Xen
> kernel.
Sort of. The main problem is Ubuntu doesn't have a Xen kernel, and
opensolaris' xen version can't run pv_ops kernel yet. There's a
workaround though: install Ubuntu as HVM domU, and convert it to PV
domU using Debian's xen kernel :
http://lists.xensource.com/archives/html/xen-users/2009-05/msg00536.html
--
Fajar
on Sat May 23 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:
> On Sat, May 23, 2009 at 9:26 PM, David Abrahams <da...@boostpro.com> wrote:
>> on Sat May 23 2009, "Fajar A. Nugraha" <fajar-AT-fajar.net> wrote:
>>> If you're familiar with Solaris,
>>
>> I'm not :(
>
> That would make it a little harder then :D Going from Solaris 10 ->
> Opensolaris is easy, only small adaptations required.
> Going from Linux -> Opensolaris is somewhat harder, since there are
> some new concepts to learn. Since you already installed the preview
> release of osol 0906 and are familiar enough with zfs, it shouldn't be
> too hard.
OK, I might try that again, in that case. Linux grub installation
confusion/problems "forced" me to blow away OpenSolaris when I installed
Linux, but I could sure try again.
>>> you can make it behave the same way with a little modification (like
>>> the network part I mentioned above).
I'm not sure what any of that meant; I wasn't having any networking
problems with OpenSolaris, but one thing I don't know how to do is
network bonding a la
http://techarcana.net/hydra/miscellanea/#network-bonding
Pointers?
>>> Add pkg (the software
>>> management), crossbow (the new bridge/vlan framework), and XVM
>>> (Xen), it has some of the best things from both Solaris and Linux.
>>
>> Got installation pointers for all those things?
>
> Start from opensolaris.com. In particular :
> - installing : http://www.opensolaris.com/use/. At this point I
> suggest do NOT use osol 2008.11, but use latest 2009.06 preview
> version from http://genunix.org/ (but I guess you know about this
> already :P )
Didn't know about it, actually. Thanks.
> - IPS package manager : installed by default.
> http://www.opensolaris.com/use/update/#packagingsystem
That's important, thanks.
> - crossbow : installed by default. One of the first things I did was
> use it to rename "bnx0" to "eth0" :D Try "man dladm" or
> http://opensolaris.org/os/project/crossbow/
Fancy. Not sure I need it, but will look.
> - xVM : This one's a bit tricky.
Yes, that's exactly what I meant about Xen :-)
> Start from
> http://trevoro.ca/blog/2008/05/07/getting-xvm-to-work-in-opensolaris-200805/.
I'll look at that, too. Again, thanks.
> I hope there will be official updated docs from Sun when opensolaris
> 2009.06 finally comes out.
>
>>> pkg is similar to apt-get, with more and more packages coming. Third
>>> party IPS repositories are also available.
>>
>> When I did a Solaris installation it gave me options of a whole mess of
>> packages to install, but apparently had no automatic dependency
>> management, with the result that it was pretty much impossible to
>> install anything other than a default configuration. That sort of
>> thing scares me; makes me think it's miles behind apt[itude].
>
> Opensolaris installation today is similar to Ubuntu install from live
> CD, you start with what you have on the CD.
Yeah, but the expert mode, which gives a similar list of packages from
which to choose during installation, handles the dependency management
by (duh!) using apt.
> After that you have both command line (pkg) and GUI (packagemanager)
> package management with automatic depsolving capability. The old
> "pkgadd" still works though, in case you need it :)
Why would I need it? pkg doesn't always work?
>>> Personally I use zfs-fuse when the application runs most optimally
>>> (well-tested, binary availability, etc.) on Linux.
>>> I use Opensolaris mostly for its ZVOL capability, which I find very
>>> useful when testing new configuration on top of Xen.
>>
>> If I could get Ubuntu to run in a domU on top of OpenSolaris, I think
>> I'd be interested. But it looks like there are lots of iffy edges to
>> that picture, not least that the latest Ubuntu doesn't have a Xen
>> kernel.
>
> Sort of. The main problem is Ubuntu doesn't have a Xen kernel, and
> opensolaris' xen version can't run pv_ops kernel yet.
pv_ops?
http://wiki.xen.prgmr.com/xenophobia/2008/08/tell-me-about-pv-ops.html
I'm not the only one! This (among other similar things) is why Xen is
still a mystery to me.
> There's a
> workaround though: install Ubuntu as HVM domU,
> and convert it to PV domU using Debian's xen kernel :
> http://lists.xensource.com/archives/html/xen-users/2009-05/msg00536.html
The box in question doesn't have hardware virtualization support, so
IIUC HVM is off the table.
On Sun, May 24, 2009 at 7:41 PM, sghe...@hotmail.com
<sghe...@hotmail.com> wrote:
> I was pulling hard at making 2008.11 work
> as we spoke. I will restart using 2009.06 now since it has many of those
> goodies I *knew* should be feasible but *knew not* how to do, or at
> least more easily available.
For those interested in deploying opensolaris, I'd suggest joining the
indiana-discuss list at opensolaris.org. Sun engineers are usually
there as well, so you'll get (semi-)authoritative responses alongside
the usual responses from fellow users. There are also specialized lists
available: http://mail.opensolaris.org/mailman/listinfo
> David Abrahams wrote:
>> OK, I might try that again, in that case. Linux grub installation
>> confusion/problems "forced" me to blow away OpenSolaris when I installed
>> Linux, but I could sure try again.
For dual-boot Linux-Solaris on the same system, the easiest approach is
to treat opensolaris like Windows: install its grub on the solaris
partition (not on the MBR), and let Linux have the MBR (or the active
partition). It's also handy to have the live CD around for recovery
purposes.
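On the Solaris side that boils down to something like the following (a
sketch; the disk/slice names are placeholders, and the menu.lst snippet
assumes GRUB legacy, which both OSes used at the time):

```shell
# Install Solaris grub into the Solaris slice's boot area, NOT the MBR.
# c0t0d0s0 is a placeholder -- use the actual Solaris root slice.
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0

# Then, in Linux's /boot/grub/menu.lst, chainload that partition:
# title OpenSolaris
#   rootnoverify (hd0,1)
#   chainloader +1
```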
If you want to share data between opensolaris and zfs-fuse, "zpool
create -o version=13" is your friend. opensolaris now uses v14 while
zfs-fuse still uses v13. You cannot modify the zpool version for the
opensolaris root pool (rpool) though (at least not easily), so it's
best to put shared data on a different pool.
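As a sketch of that recipe (the pool name and device names below are
placeholders, not anything from this thread):

```shell
# On the OpenSolaris side: create a pool pinned at on-disk version 13,
# the highest version zfs-fuse understands.
zpool create -o version=13 shared c1t2d0 c1t3d0
zpool get version shared   # should report 13

# To hand the pool to the zfs-fuse box:
zpool export shared
# ...and on the Linux side: zpool import shared

# Do NOT run "zpool upgrade shared" -- that bumps the pool to v14,
# after which zfs-fuse can no longer import it.
```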
>> but one thing I don't know how to do is
>> network bonding a la
>> http://techarcana.net/hydra/miscellanea/#network-bonding
>>
>> Pointers?
Solaris has supported network bonding (link aggregation) for a long
time. With crossbow, the instructions are slightly different though.
"man dladm" is your friend (look for dladm create-aggr).
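A hedged sketch of those instructions using the crossbow-era dladm
syntax (the NIC names and the address are assumptions; check "dladm
show-link" for the real link names):

```shell
# Aggregate two physical links into aggr0:
dladm create-aggr -l e1000g0 -l e1000g1 aggr0
# Plumb and address the aggregation (static address as an example):
ifconfig aggr0 plumb 192.168.1.10/24 up
# Verify:
dladm show-aggr
```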
>>> - xVM : This one's a bit tricky.
>> Yes, that's exactly what I meant about Xen :-)
Some features (like a "usable" xVM and crossbow) are only available
post-2008.11, so the instructions are still scattered everywhere. The
lists are a good place to start, and 2009.06 should have better
documentation.
>>> Opensolaris installation today is similar to Ubuntu install from live
>>> CD, you start with what you have on the CD.
>>
>> Yeah, but the expert mode, which gives a similar list of packages from
>> which to choose during installation, handles the dependency management
>> by (duh!) using apt.
As I recall, the actual customization (package addition or removal) is
done after the entire live CD contents are copied to your HD. This is
actually similar to opensolaris, only you have to reboot first :)
>>> The old
>>> "pkgadd" still works though, in case you need it :)
>> Why would I need it? pkg doesn't always work?
Function-wise, pkg is similar to apt-get while pkgadd is similar to dpkg.
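In other words (the package names here are made-up examples):

```shell
# pkg: talks to a repository and resolves dependencies, like apt-get:
pkg install SUNWgcc
# pkgadd: installs a local package file with no dependency resolution,
# like dpkg -i:
pkgadd -d ./some-local-package.pkg
```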
--
Fajar
> (my last reply on this topic. promise! :D )
>
> On Sun, May 24, 2009 at 7:41 PM, sghe...@hotmail.com
> <sghe...@hotmail.com> wrote:
>> I was pulling hard at making 2008.11 work
>> as we spoke. I will restart using 2009.06 now since it has many of those
>> goodies I *knew* should be feasible but *knew not* how to do, or at
>> least more easily available.
>
> For those interested in deploying opensolaris, I'd suggest joining the
> indiana-discuss list at opensolaris.org. Sun engineers are usually
> there as well, so you'll get (semi-)authoritative responses alongside
> the usual responses from fellow users. There are also specialized lists
> available: http://mail.opensolaris.org/mailman/listinfo
Thanks for the pointer.
>> David Abrahams wrote:
>>> OK, I might try that again, in that case. Linux grub installation
>>> confusion/problems "forced" me to blow away OpenSolaris when I installed
>>> Linux, but I could sure try again.
>
> For dual-boot Linux-Solaris on the same system, the easiest way to do
> so is by treating opensolaris like Windows, and install its grub on
> the solaris partition (not on MBR). Let Linux have the MBR (or the
> active partition). It's also handy having the live CD for recovery
> purposes.
Yeah, that wasn't the problem. It was grub quietly mapping /dev/sdi to
(hd0) when the machine was booted from installation CD, but not
otherwise.
> If you want to share data between opensolaris and zfs fuse, "zpool
> create -o version=13" is your friend.
Oh, this is good to know, thanks!
> opensolaris now uses v14 while
> zfs-fuse still uses v13. You cannot modify the zpool version for the
> opensolaris root pool (rpool) though (at least not easily), so it's
> best to put shared data on a different pool.
>
>>> but one thing I don't know how to do is
>>> network bonding a la
>>> http://techarcana.net/hydra/miscellanea/#network-bonding
>>>
>>> Pointers?
>
> Solaris has supported network bonding for a long time. With crossbow, the
> instructions are slightly different though. "man dladm" is your friend
> (look for dladm create-aggr)
Nice.
>>>> - xVM : This one's a bit tricky.
>>> Yes, that's exactly what I meant about Xen :-)
>
> Some features (like a "usable" xVM and crossbow) are available post
> 2008.11, so the instructions are still scattered everywhere. The lists
> are a good place to start, and 2009.06 should have better
> documentation.
>
>>>> Opensolaris installation today is similar to Ubuntu install from live
>>>> CD, you start with what you have on the CD.
>>>
>>> Yeah, but the expert mode, which gives a similar list of packages from
>>> which to choose during installation, handles the dependency management
>>> by (duh!) using apt.
>
> As I recall the actual customization (package addition or removal) is
> done after the entire live CD contents are copied to your HD. This is
> actually similar to opensolaris, only you have to reboot first :)
Well, it gives big, scary warnings, anyway.
>>>> The old
>>>> "pkgadd" still works though, in case you need it :)
>>> Why would I need it? pkg doesn't always work?
>
> Function-wise, pkg is similar to apt-get while pkgadd is similar to dpkg.
Thanks again for all the info. Now back to analyzing my data...
Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
the same way as the first, except that zfs-fuse was built with debug=0
on the scons command line. IIUC zfs-fuse was already optimized; this
change only removes debug symbols.
Just for kicks, to see how different the results were, I plotted the
difference divided by the average:
2(X-Y)/(X+Y)
where X is the run built with debug=0 and Y is the earlier build.
The graph is enclosed. Those spikes are pretty surprising, and I'm not
sure what they mean.
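For anyone wanting to redo the comparison, the metric is a one-liner;
here is a self-contained sketch (the file names and sample numbers are
hypothetical; each file is assumed to hold one iozone throughput figure
per line, in matching order):

```shell
# Symmetric relative difference 2(X-Y)/(X+Y) of paired numbers.
# Sample data stands in for the two iozone result columns:
# X = debug=0 run, Y = earlier build.
printf '10\n20\n' > debug0.txt      # X values (made up)
printf '5\n20\n'  > baseline.txt    # Y values (made up)
paste debug0.txt baseline.txt \
  | awk '{ printf "%.4f\n", 2*($1-$2)/($1+$2) }'
# -> 0.6667 and 0.0000: the first pair differs by ~67% of the mean.
```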
To be sure that it isn't just randomness, try comparing two runs of
raidz2 under identical conditions (remove the debug=0 perturbation).
Also, is there any way to label the axes?
Thanks for all your hard work benchmarking, BTW.
Jonathan
>> Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
>> the same way as the first, except that zfs-fuse was built with debug=0
>> on the scons command line. IIUC zfs-fuse was already optimized; this
>> change only removes debug symbols.
>>
>> Just for kicks, to see how different the results were, I plotted the
>> difference divided by the average:
>>
>> 2(X-Y)/(X+Y)
>>
>> Where X was the one with debug=0 and Y was the earlier build.
>>
>> The graph is enclosed. Those spikes are pretty surprising, and I'm not
>> sure what they mean.
>
> To be sure that it isn't just randomness, try comparing two runs of
> raidz2 under identical conditions (remove the debug=0 perturbation).
I'll try; I'm running out of cycles to spend on this. I'm doing a flat
mdRAID0 ext3 test now; any other requests?
> Also, is there any way to label the axes?
Of course; that graph was just quick-and-dirty. I have been
researching ways to get good images and I think I'm settling on
http://code.enthought.com/projects/mayavi
That's an open-source data visualization framework/application, so in
principle once I've put something out there, anyone else can explore the
graphs and do their own analysis.
> on Mon May 25 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:
>
>>> Here are two test runs of raidz2 on ZFS-Fuse. The second one was done
>>> the same way as the first, except that zfs-fuse was built with debug=0
>>> on the scons command line. IIUC zfs-fuse was already optimized; this
>>> change only removes debug symbols.
>>>
>>> Just for kicks, to see how different the results were, I plotted the
>>> difference divided by the average:
>>>
>>> 2(X-Y)/(X+Y)
>>>
>>> Where X was the one with debug=0 and Y was the earlier build.
>>>
>>> The graph is enclosed. Those spikes are pretty surprising, and I'm not
>>> sure what they mean.
>>
>> To be sure that it isn't just randomness, try comparing two runs of
>> raidz2 under identical conditions (remove the debug=0 perturbation).
>
> I'll try; I'm running out of cycles to spend on this. I'm doing a flat
> mdRAID0 ext3 test now;
...which is taking truly forever. I guess you can all feel good about
picking ZFS-Fuse instead of EXT3 unless your application spends a great
deal of time rewriting the same spot within a given file. The "record
rewrite" test is the only one on which EXT3 is winning; otherwise it's
losing big time when the file size exceeds memory.
Now I'm _really_ running out of time to work on this. With another
machine coming on which I can virtualize 64-bit OSes, I don't really
have a strong incentive to keep this one running Linux. I think I have
the information *I* need, which tells me that in addition to continuing
to be maintained and improved, OpenSolaris ZFS beats ZFS-Fuse on
performance, often by a factor of 5 or more.
I will post the numbers I have, but as for making pretty graphs, I've
already spent too long trying to set that up. If anyone else would like
to pick up where I left off, I'd be very happy to help. I have a bunch
of Python scripts for parsing the iozone output, and even something that
will use mayavi to draw a surface.
> any other requests?
*** Really, last call for requests. ***
>> Also, is there any way to label the axes?
>
> Of course; that graph was just quick-and-dirty. I have been
> researching ways to get good images and I think I'm settling on
> http://code.enthought.com/projects/mayavi
>
> That's an open-source data visualization framework/application, so in
> principle once I've put something out there, anyone else can explore the
> graphs and do their own analysis.
Sorry about that; I think I've overpromised on that score. Again, I
really would be very glad if someone else would take up the job of
making graphs and doing some analysis -- it would be good to publish
these results in a digestible form on the web.
Regards,
Thanks for all the efforts. The numbers (so far) are good enough for
me... pretty graphs are nice, but not as valuable as the data itself.
Will you be documenting your setup as you did with Hydra the Beast? It
sounds like a very interesting setup, *and* I have a box capable of
virtualizing 64-bit OSes already.
I might want to go the same route (although the download of
osol-0906-111a-x86 is taking forever).
>> I'll try; I'm running out of cycles to spend on this. I'm doing a flat
>> mdRAID0 ext3 test now;
>
> ...which is taking truly forever.
et voila.
Last chance for requests; I'm going to close this party down. Do I owe
somebody a 'dd' test before I do that?
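In case anyone does want one, a minimal dd-style pass might look like
this (a sketch only; the sizes and target path are my own arbitrary
choices, not anything agreed on-list):

```shell
# Minimal sequential write/read pass with GNU dd. For a meaningful
# result the file must exceed RAM (8 GiB on the box discussed here),
# so scale count accordingly; 64 MiB is just for illustration.
dd if=/dev/zero of=./ddtest.bin bs=1M count=64 conv=fsync
dd if=./ddtest.bin of=/dev/null bs=1M
rm -f ./ddtest.bin   # tidy up
```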
> For those interested in deploying opensolaris, I'd suggest joining
> indiana-discuss list at opensolaris.org. Sun engineers are usually
> there as well, so you'll get (semi) authoritative response as well as
> the usual response from fellow users. There are also specialized lists
> available : http://mail.opensolaris.org/mailman/listinfo
Hi Fajar,
Just a quick thank you for encouraging me w.r.t. OpenSolaris. The more
I use it, the more impressed I am with the technology, especially the
way ZFS is integrated into Boot Environments and Zones, not to mention
time slider and who-knows-what else.
Regards,
On a slightly less positive note: I managed to kill my most promising
Solaris install by installing Sun Studio (probably); somehow the package
manager has become wedged. It keeps telling me 'this package cannot be
installed on its own, please update all'; then, when I try to do
'update all', it says my system has already been updated. End of
story[1].
In a desperate attempt to remedy the situation, I deleted my other
boot environments (I should have known better; now I've even lost the
ability to go back to one of them...)
So, although the machine runs and is all shiny (I love the features Dave
mentions), I know it is a dead end, and I'll have to start over.
At least then, I'll be better able to leverage boot environments in case
something bad like this happens again.
FTM, I'm looking into Blastwave packages - these still work, but I'm not
exactly clear on how (well) this integrates with regular Sun package
management and updates. I'm still having fun finding all this out. To
me, it seems obvious I'll need to have Blastwave working for me in a
durable way before making the switch. I'm using all that stuff on a
day-to-day basis and don't want to have to virtualize everything.
Oh, and I'm looking into buying one of the Tranquil PCs (T7 series) as
a fanless server, because its mini-ITX motherboard apparently supports
OpenSolaris. That would be crazy fun as a replacement for my
web/mail server!
[1] PS. I should mention that googling any of these error messages
returns ZERO hits. ZERO! Btw, I'm writing this mail offline, so I might
not recall the exact wording of the messages correctly.