Raidz Performance


Jeff

Feb 26, 2011, 5:55:48 PM
to KQStor ZFS Discussion
I have 4x 1.5TB disks in a raidz configuration and it has terrible
write speeds: typically 8-12 MB/s. All the disks are plugged into
onboard SATA III ports. A single disk gives read/write of 120/130 MB/s.
4GB ECC RAM, AMD 735 3.0GHz triple-core, on an AMD 890GX board.
Are there any tweaks I can make to improve performance?
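(For anyone wanting to reproduce this: a local write test plus per-vdev monitoring along these lines should show where the time goes; the pool name 'tank' is a placeholder.)

<code>
# Watch per-vdev bandwidth, refreshed every 5 seconds, while a write runs
zpool iostat -v tank 5

# Local sequential write test that takes the network out of the picture (20 GiB of zeros)
dd if=/dev/zero of=/tank/testfile bs=1M count=20480
</code>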

<code>
uname -a
Linux userver 2.6.35-22-server #35-Ubuntu SMP Sat Oct 16 22:02:33 UTC 2010 x86_64 GNU/Linux
</code>


<code>
lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge Alternate
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:05.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 1)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:15.0 PCI bridge: ATI Technologies Inc Device 43a0
00:16.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc RS880 [Radeon HD 4290]
01:05.1 Audio device: ATI Technologies Inc RS880 Audio Device [Radeon HD 4200]
02:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
04:07.0 PCI bridge: Pericom Semiconductor PI7C8140A PCI-to-PCI Bridge
04:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
05:08.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
05:09.0 Multimedia video controller: Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder (rev 01)
06:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
06:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
</code>

Hugues Talbot

Feb 26, 2011, 7:50:08 PM
to kqstor-zf...@googlegroups.com
Hello all,

By comparison, I have a 3x2TB raidz configuration on FreeBSD (FreeNAS, actually) on a 1000Mbps network. The CPU is a single-core 2GHz Athlon64.

On this system, average ZFS write speed over the network is on the order of 15MB/s... It doesn't seem to be limited by the CPU but by an I/O bottleneck: everything is hanging off an old-fashioned PCI bus.

I don't think ZFS is known for its write performance.

xyzzyx

Feb 27, 2011, 11:15:25 AM
to KQStor ZFS Discussion
This is very odd... I am running a 7x1TB raidz and getting close to
100MB/s write and ~85MB/s read over the network. Are you using dedup?
That can slow writes down to a ridiculous level.
I am running an AMD 1090T on an 890FX chipset with 16GB of RAM.
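One command will tell (pool name 'tank' is a guess):

<code>
# Shows whether dedup and compression are enabled on the pool
zfs get dedup,compression tank
</code>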

Hugues Talbot

Feb 27, 2011, 1:08:00 PM
to kqstor-zf...@googlegroups.com
I guess the six cores of the 1090T are helping a little.

Jeff

Feb 28, 2011, 12:10:15 PM
to KQStor ZFS Discussion
On RAID5 I get much better read and write speeds:
http://openbenchmarking.org/result/1102262-IV-RAID5723587
I've gone back to that for now for my main storage. I'll keep playing
around with ZFS on other drives, but until it can saturate my network
on read and write, it's just for testing. I can't find the results for
the raidz (maybe they were deleted), but they weren't good. I'll add
new ones when I get the next array up.

Hugues Talbot

Feb 28, 2011, 12:42:28 PM
to kqstor-zf...@googlegroups.com
Sure, RAID5 is faster, but are you interested in keeping your data?

http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

ZFS and raidz were designed to solve the problem outlined in this article.

I have actually had this problem: one disk died, and I replaced it. While the array was rebuilding itself, it hit one bad sector on another disk that had never been detected. The array was unrebuildable. I lost everything (I had a backup).

With raidz, I would have lost only one sector's worth of data. And with ZFS, if you scrub from time to time, you can detect bit rot and failing disks in advance.
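For anyone who has not tried it, a scrub is a single command, and any errors it finds show up in the status output. A minimal sketch, assuming a pool named tank:

<code>
# Walk every block in the pool and verify its checksum in the background
zpool scrub tank

# Check scrub progress and any checksum errors found so far
zpool status -v tank
</code>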

ZFS is not for speed, it is for data integrity.

===
Hugues Talbot
9 allées des cornouillers, 77420 Champs sur Marne, France
+33 6 72 07 51 26


Jeff

Mar 1, 2011, 9:36:48 PM
to KQStor ZFS Discussion
I do understand that ZFS is more for data integrity, and I have read
that article. I would like to be able to saturate a Gb Ethernet
connection, though. From reading around today, write performance for a
raidz should be on par with a single disk or better, which would be
just enough to saturate the Ethernet link. The problem is that I am
getting 10% of those speeds. I am going to give it another shot with
some different disks and on another system to see if I can get the
speeds up. I am also going to try to run some tests on OpenIndiana and
FreeBSD for comparison, if they will install, because I am interested
in whether or not this is a ZFS-on-Linux issue.

Overall, though, from a lot of reading, it seems that while raidz is
nice, most people are turning to mirror sets and adding disks to the
pool in mirrored pairs (see the sketch below). This gives increased
performance and safety, at the cost of storage capacity. Using mirrors
should hopefully give me the performance I want with ZFS, and then I
can add cool features like dedup and compression and still have enough
performance. Seems I have a lot of testing to do. Thanks for your input.
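For reference, the striped-mirror layout I mean would be built something like this (the device names are placeholders, not my actual disks):

<code>
# Pool of two mirrored pairs; writes stripe across both mirrors
zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

# Later, grow the pool by adding another mirrored pair
zpool add tank mirror /dev/sdf /dev/sdg
</code>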

timo viitanen

Mar 2, 2011, 3:26:28 AM
to kqstor-zf...@googlegroups.com
Do not bother testing dedup. It is not ready for any real use: you need so much memory on the server that you could buy a lot more disks for the same price and have a much better system.
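If you want an estimate before committing, zdb can simulate the dedup table against a pool's existing data (pool name 'tank' is a placeholder):

<code>
# Print a simulated DDT histogram and projected dedup ratio
# without actually enabling dedup on the pool
zdb -S tank
</code>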

/Timo

Khushil Dep

Mar 2, 2011, 3:32:31 AM
to kqstor-zf...@googlegroups.com, timo viitanen
I have done a number of large ZFS installs on Solaris and NexentaStor, and I can say that dedup is well worth it. You do, however, have to provide at least 48GB of RAM. That's not really a large amount on a server these days, and if you shop around you'll get a good deal; I always use SuperMicro kit. In a number of virtualisation scenarios, I've used compression on the VMs and dedup on the NFS datastore for general storage. Make sure you have a couple of 5540s in the head node, though, and use 6Gb/s SAS drives.

You can indeed saturate a gigabit pipe, but remember to use jumbo frames, increase the TCP buffer sizes, and ensure that your processor can keep up with the data. Turn off caches as well if you really want to test the disks.
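As a rough Linux-side sketch of that tuning (the interface name and buffer sizes are example values, not recommendations):

<code>
# Enable jumbo frames on the storage interface (the switch must support them too)
ip link set dev eth0 mtu 9000

# Raise the maximum TCP send/receive buffer sizes (example values)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
</code>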

---
W. A. Khushil Dep - khush...@nixiphite.net - 07905374843
Windows - Linux - Solaris - ZFS - XenServer - FreeBSD - C/C++ - PHP/Perl - LAMP - Nexenta - Development - Consulting & Contracting

Hugues Talbot

Mar 2, 2011, 3:56:03 AM
to kqstor-zf...@googlegroups.com, KQStor ZFS Discussion
This is a great plan. My own testing with FreeBSD and raidz is that it is significantly slower than a single disk, at least with a CPU that is relatively weak by today's standards.

Further tests with ZFS and mirroring under Linux indicate that the speed is fine.

Please keep us posted with your results and conclusions.

If you have references for us to read, that would be helpful too.

Hugues Talbot (from mobile)

Jeff

Mar 9, 2011, 11:24:26 PM
to KQStor ZFS Discussion
I have done some quick tests of a ZFS 3-disk raidz on OpenIndiana
148. The disks are Samsung F4EG 2TB drives. I am getting average
writes of 110MB/s, which is on par with single-disk performance.
Compression is even better. With these results I might have to learn
how to use Solaris properly.

Raidz 3-disk: 110MB/s
Raidz, compression=on: 210MB/s
Raidz, compression=on, dedup=on: 220MB/s

I know it's not a good comparison to my last setup, as I couldn't get
the other hardware to work with Solaris yet, so the hardware is all
different. I've read about the performance hit for 3- vs. 4-disk raidz,
but this seems like too much. ZFSguru has a lot of benchmarks of
different configurations. I'll post again once there is more.

<code>
root@openindiana:/tank# sudo zfs set compression=off tank
root@openindiana:/tank# time mkfile 20g test2

real 3m12.771s   # 20480 MiB / 192.8 s ~ 106 MB/s
user 0m0.027s
sys 0m4.047s
root@openindiana:/tank# sudo zfs set compression=on tank
root@openindiana:/tank# time mkfile 20g test3

real 1m37.034s   # 20480 MiB / 97.0 s ~ 211 MB/s
user 0m0.027s
sys 0m3.880s
root@openindiana:/tank# sudo zfs set dedup=on tank
root@openindiana:/tank# time mkfile 20g test4

real 1m32.619s   # 20480 MiB / 92.6 s ~ 221 MB/s
user 0m0.027s
sys 0m3.855s
</code>



k...@ironsoftware.de

Mar 10, 2011, 4:14:51 AM
to kqstor-zf...@googlegroups.com
Very interesting test. More representative would be a test where you pipe RANDOM data through a network socket into a file.
Netcat is your friend.

Basically, with mkfile you are compressing zeros at CPU speed and dedup'ing already-known block hashes straight from RAM.
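A minimal sketch of such a test (the host name, port, and sizes are placeholders; note that the listen flag differs between netcat variants):

<code>
# On the ZFS box: listen on a port and write whatever arrives into the pool
# (some netcat variants need 'nc -l -p 3333' instead)
nc -l 3333 > /tank/nettest

# On the client: pre-generate incompressible data once, then stream it,
# so the random generator's own speed does not limit the test
dd if=/dev/urandom of=/tmp/rand.bin bs=1M count=2048
nc zfsbox 3333 < /tmp/rand.bin
</code>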

Kind Regards

---
Dipl.-Ing. Christian Kendi
Iron Software GbR
Gärtnerstr. 62b
80992 Munich, Germany
mailto: k...@ironsoftware.de
mobile: +49 (0) 177 / 55 - 31 33 7
phone: +49 (0) 89 42 09 56 319
spain: +34 (637) 12 43 49
*****************************************
Managing Director: Christian Kendi
Tax no.: 114/235/50572
District court: Erding

Jeff

Mar 10, 2011, 12:40:31 PM
to KQStor ZFS Discussion
Ah, that makes sense. Not a very good test then, although I would hope
my CPU can compress/dedup zeros faster than 220MB/s. When I tried to
use dd if=/dev/random I was getting writes of 0.3KB/s. I will try
netcat when I have some more time. Interestingly, though, when I had
dedup=on, zpool list showed no deduplication for the mkfile. Once I
get more comfortable with Solaris I'd like to run a comparison using
iostat like my other tests, but I was just excited I had something
mounted.
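For the record, the figures I was looking at come straight from the pool and dataset properties:

<code>
# The DEDUP column shows the pool-wide deduplication ratio
zpool list tank

# The compression ratio achieved on the dataset
zfs get compressratio tank
</code>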

Stone

Mar 10, 2011, 12:44:36 PM
to kqstor-zf...@googlegroups.com
try bonnie++ too.
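A typical invocation might be something like the following (directory, size, and user are placeholders; the -s size is in megabytes and should be at least twice RAM so the cache cannot hide the disks):

<code>
# Sequential and random I/O benchmark against the pool's mount point
bonnie++ -d /tank -s 8192 -u root
</code>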

stone


Hugues Talbot

Mar 10, 2011, 4:51:45 PM
to kqstor-zf...@googlegroups.com
Hello,

/dev/random can be very slow because it outputs truly random numbers gathered from various hardware sources, and it will block if not enough entropy is available in the system.

/dev/urandom is usually much faster, but it outputs only pseudorandom numbers.

This is under Linux; I don't know what the behaviour is under Solaris.
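A quick way to measure the generator itself, so that disk speed is not conflated with entropy starvation (Linux syntax; the sizes are arbitrary):

<code>
# How fast can the pseudorandom source produce data, independent of any disk?
dd if=/dev/urandom of=/dev/null bs=1M count=1024
</code>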
