I have two separate systems, and with ext4 as the filesystem on top of a
RAID-5 or RAID-0 array I cannot get speeds greater than ~350MiB/s.
Is this a bug in ext4, or is it just that ext4 is slower for this test?
Each system runs 2.6.33 x86_64.
Can someone please confirm?
Here is ext4:
# dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 29.8556 s, 360 MB/s
The result is the same regardless of the RAID level (RAID-5 or RAID-0).
Note that this is not a bandwidth problem; the raw md device is much faster:
# dd if=/dev/zero of=/dev/md0 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 17.6871 s, 607 MB/s
With XFS:
p63:~# mkfs.xfs -f /dev/md0
p63:~# mount /dev/md0 /r1
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 17.6078 s, 610 MB/s
NOTE: With a HW RAID controller (or with XFS) I can get > 500 MiB/s;
this problem only occurs with SW RAID (Linux/mdadm) and ext4.
Example (3ware 9650SE-16PML, RAID-6, 15 drives, using ext4):
$ dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.1729 s, 507 MB/s
Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
> Hello,
>
> I have two separate systems and with ext4 I cannot get speeds greater than
> ~350MiB/s when using ext4 as the filesystem on top of a raid5 or raid0.
> It appears to be a bug with ext4 (or it's just that ext4 is slower for this
> test)?
>
> Each system runs 2.6.33 x86_64.
Could be related to the recent implementation of IO barriers in md.
Can you try mounting your filesystem with
-o barrier=0
and see how that changes the result?
NeilBrown
On Sun, 28 Feb 2010, Neil Brown wrote:
> On Sat, 27 Feb 2010 08:47:48 -0500 (EST)
> Justin Piszcz <jpi...@lucidpixels.com> wrote:
>
>> Hello,
>>
>> I have two separate systems and with ext4 I cannot get speeds greater than
>> ~350MiB/s when using ext4 as the filesystem on top of a raid5 or raid0.
>> It appears to be a bug with ext4 (or it's just that ext4 is slower for this
>> test)?
>>
>> Each system runs 2.6.33 x86_64.
>
> Could be related to the recent implementation of IO barriers in md.
> Can you try mounting your filesystem with
> -o barrier=0
>
> and see how that changes the result.
>
> NeilBrown
Hi Neil,
Thanks for the suggestion, it has been used here:
http://lkml.org/lkml/2010/2/27/66
Looks like an ext4 issue, as XFS does ~600MiB/s?
It's strange though: on a single hard disk I get approximately the same
speed from XFS and ext4, but when scaling across multiple disks in
RAID-0 or RAID-5 (both tested), ext4 hits a wall at ~350MiB/s. I tried
multiple chunk sizes (64KiB and 1024KiB) but nothing made a difference:
XFS performs at 500-600MiB/s no matter what, and ext4 does not exceed
~350MiB/s.
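In case anyone wants to repeat the chunk-size comparison, something like the
following is what I mean (device names and the 4-disk RAID-5 layout here are
just examples, not my exact setup; --create destroys data on the member disks):

```shell
# Recreate the test array with a given chunk size (64 here; repeat with
# --chunk=1024 to compare), then format, mount, and rerun the dd test.
# /dev/sd[b-e] and the 4-disk RAID-5 layout are placeholder values.
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 /dev/sd[b-e]
mkfs.ext4 /dev/md0
mount /dev/md0 /r1 -o noatime
dd if=/dev/zero of=/r1/file bs=1M count=10240
```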
Is there anyone on any of the lists who gets > 350MiB/s on an mdadm/SW RAID
with ext4?
A single raw disk, no partitions:
p63:~# dd if=/dev/zero of=/dev/sdm bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 92.4249 s, 116 MB/s
A single raw disk formatted with XFS, no partitions:
p63:~# mkfs.xfs /dev/sdm > /dev/null 2>&1
p63:~# mount /dev/sdm -o nobarrier,noatime /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 85.7782 s, 125 MB/s
A single raw disk formatted with EXT4, no partitions:
p63:~# mkfs.ext4 /dev/sdm > /dev/null 2>&1
p63:~# mount /dev/sdm -o nobarrier,noatime /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 85.1501 s, 126 MB/s
p63:/r1#
EXT2 vs. EXT3 vs. EXT4
http://lkml.org/lkml/2010/2/27/77
XFS tests: (550-600MiB/s)
http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-02/msg10572.html
Justin.
--
I hate to say it, but I don't think this measures anything useful. When
I was doing similar things I got great variability in my results until I
learned about the fdatasync option, so that you measure the actual speed
to the destination and not the disk cache. After that my results were
far slower, but reproducible.
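For reference, a sketch of what I mean (the count is reduced here for
illustration; the tests in this thread used count=10240):

```shell
# Same style of test as in the thread, but with conv=fdatasync so dd
# calls fdatasync() on the file before reporting the rate -- the timing
# then includes flushing dirty pages to disk, not just filling the page
# cache.
dd if=/dev/zero of=bigfile bs=1M count=256 conv=fdatasync
```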
--
Bill Davidsen <davi...@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
Did you use any of the options with ext4? I found about a 15-20%
improvement with options, but I didn't take good enough notes to quote
now. :-(
That doesn't mean there wasn't more; I tested on FC9, when ext4 was
still experimental.
--
How did you format the ext3 and ext4 filesystems?
Did you use mkfs.ext[34] -E stride and stripe-width accordingly?
AFAIK even older versions of mkfs.xfs will probe for this info but
older mkfs.ext[34] won't (though new versions of mkfs.ext[34] will,
using the Linux "topology" info).
On Sun, 28 Feb 2010, Bill Davidsen wrote:
> Justin Piszcz wrote:
>>
[ .. ]
>> fdatasync:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1002.3/01507.html
>>
> I wasn't expecting a huge change in value, your data size is large. But
> thanks, the total time without sync can be off by at least seconds, making it
> hard to duplicate results. You missed nothing this time.
>
> Did you use any of the options with ext4? I found about 15-20% with options,
> but I didn't take good enough notes to quote now. :-(
> That doesn't mean there wasn't more, I tested on FC9, ext4 was experimental
> then.
Yes, I tried nearly every option in the ext4 documentation. More results:
p63:~# tune2fs -o journal_data_writeback /dev/md0
tune2fs 1.41.10 (10-Feb-2009)
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr
p63:~#
p63:~# cd /r1
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 35.7193 s, 301 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 30.5846 s, 351 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,max_batch_time=0
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 30.8501 s, 348 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,min_batch_time=10000
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 31.0127 s, 346 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,journal_ioprio=0
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 31.1559 s, 345 MB/s
p63:/r1# cd
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,journal_ioprio=7
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 31.4713 s, 341 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,journal_async_commit
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 30.7633 s, 349 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,journal_async_commit,oldalloc
p63:~#
p63:/r1# dd if=/dev/zero of=file bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 30.7607 s, 349 MB/s
p63:/r1#
p63:~# mount /dev/md0 /r1 -o noatime,barrier=0,data=writeback,nobh,commit=100,nouser_xattr,nodelalloc,journal_async_commit,stripe=1024
p63:~#
Justin.
On Sun, 28 Feb 2010, Mike Snitzer wrote:
> On Sun, Feb 28, 2010 at 4:45 AM, Justin Piszcz <jpi...@lucidpixels.com> wrote:
[ .. ]
>
> How did you format the ext3 and ext4 filesystems?
>
> Did you use mkfs.ext[34] -E stride and stripe-width accordingly?
> AFAIK even older versions of mkfs.xfs will probe for this info but
> older mkfs.ext[34] won't (though new versions of mkfs.ext[34] will,
> using the Linux "topology" info).
Yes and it did not make any difference:
http://lkml.org/lkml/2010/2/27/77
In case anyone else wants to try too: you can calculate the values by hand,
or, if you are in a hurry, I found this useful:
http://busybox.net/~aldot/mkfs_stride.html
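The arithmetic itself is simple; a sketch (the chunk size, block size, and
disk count below are example values, not my exact layout):

```shell
# stride       = md chunk size / filesystem block size
# stripe-width = stride * number of data disks
# Example: 64 KiB chunk, 4 KiB block, RAID-5 over 4 disks (3 data disks).
chunk_kb=64; block_kb=4; data_disks=3
stride=$((chunk_kb / block_kb))
stripe_width=$((stride * data_disks))
echo "mkfs.ext4 -E stride=${stride},stripe-width=${stripe_width} /dev/md0"
# prints: mkfs.ext4 -E stride=16,stripe-width=48 /dev/md0
```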
I believe there is something fundamentally wrong with ext4 when performing
large sequential writes, especially after Ted's comments.
Justin.
More later.
--
Thanks, let me know how it goes. I see the same thing: on a single hard
drive there is little difference between ext4 and XFS:
http://permalink.gmane.org/gmane.linux.kernel/955357
However, when multiple disks are involved, it is a different story.
Justin.
--