ZFS raidz and 4k sector disks

Alexey Tarasov

Mar 28, 2010, 3:06:29 AM
to freeb...@freebsd.org
Hello.

I have read a lot of discussion about the new WD 4k-sector disks (...EARS).
I have a RAIDZ pool of these disks with very bad performance. At the moment my GPT ZFS partitions don't start on a 4k boundary (they start at sector 162).
Some people have pointed out that aligning the ZFS partitions as recommended wouldn't help at all, because RAIDZ uses a variable stripe size.
So where is the bottleneck in this configuration: 1) in ZFS, which doesn't know about the 4k sector size? 2) somewhere in the FreeBSD VFS or disk driver code? 3) somewhere else?

If you don't want to use ZFS, there are two ways to deal with this problem:
1) Use GNOP with a virtual 4k sector size.
2) Use UFS on a properly aligned partition, with some tuning.
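
For reference, the alignment condition is simple arithmetic. On a drive that reports 512-byte logical sectors but uses 4096-byte physical sectors, a partition is 4k-aligned when its byte offset (start LBA × 512) is a multiple of 4096, i.e. when the start LBA is divisible by 8. A minimal Python sketch, assuming exactly those two sector sizes; `is_4k_aligned` is a hypothetical helper, not part of any FreeBSD tool:

```python
LOGICAL_SECTOR = 512    # sector size the drive reports to the OS (assumed)
PHYSICAL_SECTOR = 4096  # real media sector on a 4k drive (assumed)


def is_4k_aligned(start_lba: int,
                  logical: int = LOGICAL_SECTOR,
                  physical: int = PHYSICAL_SECTOR) -> bool:
    """True if a partition starting at start_lba begins on a physical-sector boundary."""
    return (start_lba * logical) % physical == 0


if __name__ == "__main__":
    for lba in (34, 40, 63, 64, 162, 168):
        print(lba, is_4k_aligned(lba))
```

With these defaults, sector 162 is not aligned (162 × 512 = 82944, which is 20.25 × 4096), while 168 is.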

--
Alexey Tarasov

(\__/)
(='.'=)
E[: | | | | :]З
(")_(")

Ivan Voras

Mar 28, 2010, 6:44:35 AM
to freeb...@freebsd.org
Alexey Tarasov wrote:
> Hello.
>
> I have read a lot of discussion about the new WD 4k-sector disks (...EARS).
> I have a RAIDZ pool of these disks with very bad performance. At the moment my GPT ZFS partitions don't start on a 4k boundary (they start at sector 162).
> Some people have pointed out that aligning the ZFS partitions as recommended wouldn't help at all, because RAIDZ uses a variable stripe size.

Yes, it doesn't group data into aligned "clusters"; the basic unit of data
alignment is one sector.
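
To illustrate the "variable stripe size" point: as far as I can tell from the raidz vdev code (vdev_raidz_asize() in the ZFS sources), the space a block takes on a RAIDZ vdev is counted in whole sectors of 2^ashift bytes, where ashift comes from the sector size the disks report. The function below is my own Python transcription of that arithmetic, a sketch rather than ZFS code; it shows that with ashift=9 allocations are only rounded to 512 bytes, so nothing keeps them on 4 KiB boundaries.

```python
def raidz_asize(psize: int, ashift: int, cols: int, nparity: int = 1) -> int:
    """Bytes allocated for a psize-byte block on a RAIDZ vdev (sketch).

    Mirrors the arithmetic of ZFS's vdev_raidz_asize(); cols is the number
    of disks in the vdev and nparity is 1 for raidz1.
    """
    # Data sectors: psize rounded up to whole 2**ashift-byte sectors.
    sectors = ((psize - 1) >> ashift) + 1
    # Parity: nparity sectors for each row of (cols - nparity) data sectors.
    sectors += nparity * ((sectors + cols - nparity - 1) // (cols - nparity))
    # Pad to a multiple of (nparity + 1) sectors, as vdev_raidz_asize() does.
    sectors = -(-sectors // (nparity + 1)) * (nparity + 1)
    return sectors << ashift


if __name__ == "__main__":
    # 3-disk raidz1, one 4 KiB block: 512-byte vs 4 KiB sectors.
    print(raidz_asize(4096, ashift=9, cols=3))
    print(raidz_asize(4096, ashift=12, cols=3))
```

For a 4 KiB block on a 3-disk raidz1 this gives 6144 bytes with ashift=9 and 8192 bytes with ashift=12, so a 4k ashift also costs some space on small blocks.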

> So where is the bottleneck in this configuration: 1) in ZFS, which doesn't know about the 4k sector size? 2) somewhere in the FreeBSD VFS or disk driver code? 3) somewhere else?

AFAIK (I don't have actual experience with them), current 4k drives
emulate 512-byte-sector drives and suffer a performance penalty in the
above scenario. Because of this emulation, ZFS doesn't know you have a 4k
drive. If you can, try disabling the emulation so the drive presents itself
to the operating system as a true 4k drive. Of course, this will make the
data on it unreadable: you will have to reformat it, and the drive will
be unbootable.

Good luck and report what you find.
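
The penalty described above is read-modify-write: when a write doesn't cover whole 4 KiB physical sectors, the drive must read each partially covered physical sector, merge in the new data and write it back. The sketch below just counts those partially covered sectors for a write at a given byte offset; `partial_physical_sectors` is a hypothetical name and 4096-byte physical sectors are assumed:

```python
def partial_physical_sectors(offset: int, length: int, physical: int = 4096) -> int:
    """Count physical sectors that a write of `length` bytes at byte `offset`
    covers only partially (each one costs the drive a read-modify-write)."""
    if length <= 0:
        return 0
    end = offset + length
    first = offset // physical      # physical sector holding the first byte
    last = (end - 1) // physical    # physical sector holding the last byte
    partial = set()
    if offset % physical:           # write starts mid-sector
        partial.add(first)
    if end % physical:              # write ends mid-sector
        partial.add(last)
    return len(partial)


if __name__ == "__main__":
    print(partial_physical_sectors(0, 4096))          # aligned 4 KiB write
    print(partial_physical_sectors(162 * 512, 4096))  # 4 KiB write at sector 162
    print(partial_physical_sectors(512, 512))         # single 512-byte write
```

A 4 KiB write starting at sector 162 straddles two physical sectors, so both need a read-modify-write, whereas the same write at an aligned offset needs none.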

Alexey Tarasov

Mar 28, 2010, 9:03:38 AM
to Ivan Voras, freeb...@freebsd.org
According to the mailing lists, these drives can't be configured to present 4k sectors to the OS.
The emulation jumper only shifts the LBA numbering by one, so that the first partition Windows XP creates at sector 63 actually lands on sector 64 (63+1).
Maybe a new firmware update will solve this problem.
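
In other words, the jumper just adds 1 to every LBA the OS sends. A sketch of what that does to alignment, using a hypothetical helper `aligned_with_jumper` and assuming 512-byte logical sectors, 4096-byte physical sectors and the +1 offset described above:

```python
LOGICAL = 512    # reported sector size (assumed)
PHYSICAL = 4096  # media sector size (assumed)


def aligned_with_jumper(start_lba: int, jumper: bool) -> bool:
    """Is a partition at start_lba on a 4 KiB boundary once the drive
    applies the jumper's +1 LBA offset (when the jumper is fitted)?"""
    media_lba = start_lba + 1 if jumper else start_lba
    return (media_lba * LOGICAL) % PHYSICAL == 0


if __name__ == "__main__":
    print(aligned_with_jumper(63, jumper=True))   # the Windows XP layout
    print(aligned_with_jumper(64, jumper=True))   # an already aligned layout
    print(aligned_with_jumper(162, jumper=True))
```

So the jumper helps only the Windows XP layout that starts at sector 63; a partition that already starts on a multiple of 8 becomes misaligned, and the drive still reports 512-byte sectors either way.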

On 28.03.2010, at 14:44, Ivan Voras wrote:

> AFAIK (I don't have actual experience with them) current 4k drives emulate 512b drives and have a performance penalty in the above scenario. Because of this emulation, ZFS doesn't know you have a 4k drive. If you can, try disabling this emulation and make it present to the operating system as a true 4k drive. Of course, this will make the data on it unreadable - you will have to reformat it, and the drive will be unbootable.
>
> Good luck and report what you find.
>

> _______________________________________________
> freeb...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"

Thomas Zander

Mar 28, 2010, 12:49:18 PM
to freeb...@freebsd.org
On Sun, Mar 28, 2010 at 15:03, Alexey Tarasov <m...@lexasoft.ru> wrote:
> According to the mailing lists, these drives can't be configured to present 4k sectors to the OS.

Maybe that's a dumb idea, but I am curious.
As far as I understand, the following points apply:
- EARS drives with RAIDZ are slow, but
- if you make ZFS aware of the 4k sector size (gnop), performance is okay

If so, what happens if you configure geli for the whole drive with 4k
sectors? Maybe you get AES-128 for free :-)
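
Some background on why presenting 4k sectors helps: when a vdev is created, ZFS records an ashift, the base-2 logarithm of the smallest unit it will write to the device, derived from the sector size the provider reports. A 512-byte-emulating EARS drive therefore gets ashift=9, while a gnop -S 4096 provider, or a geli provider created with 4k sectors, gets ashift=12. A small Python sketch of that mapping; `ashift_for` is hypothetical and only does the arithmetic:

```python
def ashift_for(sector_size: int) -> int:
    """Return log2(sector_size), the ashift matching that sector size.

    Raises ValueError unless sector_size is a power of two of at least 512.
    """
    if sector_size < 512 or sector_size & (sector_size - 1):
        raise ValueError(f"unsupported sector size: {sector_size}")
    return sector_size.bit_length() - 1


if __name__ == "__main__":
    print(ashift_for(512))   # sector size an EARS drive reports
    print(ashift_for(4096))  # sector size of a 4k gnop or geli provider
```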

Riggs

Alexey Tarasov

Mar 28, 2010, 1:53:44 PM
to Thomas Zander, freeb...@freebsd.org

What do you mean "for free"? =)

Simun Mikecin

Mar 29, 2010, 3:36:58 AM
to Alexey Tarasov, freeb...@freebsd.org
----- Original Message ----
> From: Alexey Tarasov <m...@lexasoft.ru>
> To: Ivan Voras <ivo...@freebsd.org>; freeb...@freebsd.org
> Sent: Sun, March 28, 2010 3:03:38 PM
> Subject: Re: ZFS raidz and 4k sector disks
>
> According to the mailing lists, these drives can't be configured to present 4k
> sectors to the OS. The emulation jumper only shifts the LBA numbering by one, so
> that the first partition Windows XP creates at sector 63 actually lands on sector 64 (63+1).
> Maybe a new firmware update will solve this problem.


I don't think they will ever release a firmware update like that, because WD already has the major operating systems (like Windows 7) covered.
It could be solved by asking the drive for its physical sector size (such an ATA command exists: see http://www.wdc.com/wdproducts/library/WhitePapers/ENG/2579-771430.pdf) and setting the sector size of the device (/dev/[a]da*) accordingly. That would be a permanent solution for ZFS, UFS, swap or any other consumer of this device.
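
The ATA mechanism referred to here is, to my understanding, word 106 of the IDENTIFY DEVICE data (added in ATA-7): when bits 15:14 read 01 the word is valid, bit 13 says that one physical sector holds several logical sectors, and bits 3:0 give that count as a power of two. A sketch of decoding it, assuming 512-byte logical sectors (it ignores the words that describe larger logical sectors); `physical_sector_size` is a hypothetical helper:

```python
def physical_sector_size(word106: int, logical: int = 512) -> int:
    """Decode the physical sector size from ATA IDENTIFY DEVICE word 106.

    Assumes 512-byte logical sectors; falls back to the logical size when
    the word is invalid or reports one logical sector per physical sector.
    """
    # Bits 15:14 must be 0b01, otherwise the word carries no information.
    if (word106 >> 14) & 0b11 != 0b01:
        return logical
    # Bit 13: multiple logical sectors per physical sector.
    if word106 & (1 << 13):
        # Bits 3:0: log2 of logical sectors per physical sector.
        return logical << (word106 & 0xF)
    return logical


if __name__ == "__main__":
    print(physical_sector_size(0x6003))  # 2**3 = 8 logical sectors per physical
    print(physical_sector_size(0x4000))  # valid word, one logical per physical
```

This only helps if the firmware actually reports its 4k physical sectors in word 106; a drive that leaves bit 13 clear decodes as plain 512-byte sectors.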


Thomas Zander

Mar 29, 2010, 3:33:34 AM
to Alexey Tarasov, freeb...@freebsd.org
On Sun, Mar 28, 2010 at 19:53, Alexey Tarasov <m...@lexasoft.ru> wrote:
>> If so, what happens if you configure geli for the whole drive with 4k
>> sectors? Maybe you get AES-128 for free :-)
>
> What do you mean "for free"? =)

I mean that (4K-sector drives + geli + RAIDZ) could be faster than (4K-sector drives + RAIDZ).

Riggs

Alexey Tarasov

Mar 29, 2010, 4:40:21 AM
to freeb...@freebsd.org
Cool.
Maybe someone here can implement this solution?

On 29.03.2010, at 11:36, Simun Mikecin wrote:

> I don't think they will ever release a firmware update like that, because WD already has the major operating systems (like Windows 7) covered.
> It could be solved by asking the drive for its physical sector size (such an ATA command exists: see http://www.wdc.com/wdproducts/library/WhitePapers/ENG/2579-771430.pdf) and setting the sector size of the device (/dev/[a]da*) accordingly. That would be a permanent solution for ZFS, UFS, swap or any other consumer of this device.

Pascal Stumpf

Mar 29, 2010, 10:16:27 AM
to freeb...@freebsd.org
I have exactly that setup (3 drives, GPT partitions aligned to 4k
boundaries, geli with a 4k sector size, and RAID-Z on top of that), and
the write performance is the same as with a single ZFS partition on
a regular 512-byte-sector disk. So yes, it does work. (I don't have a
benchmark of how geli affects RAID-Z performance, though, since
all my ZFS partitions are encrypted.)

Alexey Tarasov

Mar 29, 2010, 10:40:37 AM
to Pascal Stumpf, freeb...@freebsd.org
Thank you for your reply; I will try to realign my partitions as soon as possible.
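
Realigning amounts to moving each partition's start up to the next LBA that is a multiple of 8 (8 × 512 = 4096 bytes). A minimal sketch of that rounding, assuming 512-byte logical sectors; `align_up` is a hypothetical helper:

```python
def align_up(lba: int, logical: int = 512, boundary: int = 4096) -> int:
    """Round lba up to the next LBA whose byte offset is a multiple of boundary."""
    step = boundary // logical       # 8 LBAs per 4 KiB with 512-byte sectors
    return -(-lba // step) * step    # ceiling division, then scale back up


if __name__ == "__main__":
    print(align_up(162))  # the misaligned start from this thread
    print(align_up(34))   # first usable LBA on a typical GPT disk
    print(align_up(40))   # already aligned, unchanged
```

The result is the start value to use (for example with gpart add -b) when recreating the partition.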

Alexey Tarasov

Mar 29, 2010, 11:26:39 AM
to Ivan Voras, freeb...@freebsd.org
There is one problem with this solution: GNOP can't be used with a root-on-ZFS configuration.

On 29.03.2010, at 19:18, Ivan Voras wrote:

> On 03/29/10 16:40, Alexey Tarasov wrote:
>> Thank you for your reply; I will try to realign my partitions as soon as possible.
>
> Another possible solution is gnop, which I think somebody already
> mentioned. It too can create sector sizes that are a multiple of the base sector size.

Ivan Voras

Mar 29, 2010, 11:18:35 AM
to freeb...@freebsd.org
On 03/29/10 16:40, Alexey Tarasov wrote:

> Thank you for your reply; I will try to realign my partitions as soon as possible.

Another possible solution is gnop, which I think somebody already
mentioned. It too can create sector sizes that are a multiple of the base sector size.
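
The constraint mentioned here, that the virtual sector size must be a multiple of the underlying one, can be sketched as follows. `sectors_per_virtual` is a hypothetical helper, not gnop's own code; it returns how many underlying sectors make up one virtual sector and rejects sizes that aren't a multiple:

```python
def sectors_per_virtual(base: int, requested: int) -> int:
    """How many base sectors form one virtual sector of `requested` bytes.

    Raises ValueError if `requested` is not a positive multiple of `base`.
    """
    if base <= 0 or requested <= 0 or requested % base:
        raise ValueError(f"{requested} is not a multiple of {base}")
    return requested // base


if __name__ == "__main__":
    print(sectors_per_virtual(512, 4096))  # 4096-byte virtual sector on a 512-byte drive
```

For the EARS case that is 8 × 512-byte sectors per 4096-byte virtual sector, which matches the drive's physical sector.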

Alexey Tarasov

Apr 8, 2010, 9:55:33 AM
to freeb...@freebsd.org
Hello.

I've tried all the methods and found that, unfortunately, gnop is the only one that works. So for now you can't use these disks with ZFS at all.

On 29.03.2010, at 19:18, Ivan Voras wrote:

> Another possible solution is gnop, which I think somebody already
> mentioned. It too can create sector sizes that are a multiple of the base sector size.

Andriy Gapon

Apr 9, 2010, 4:28:35 AM
to Alexey Tarasov, freeb...@freebsd.org
on 08/04/2010 16:55 Alexey Tarasov said the following:

> Hello.
>
> I've tried all the methods and found that, unfortunately, gnop is the only
> one that works. So for now you can't use these disks with ZFS at all.

Why? And what are you actually trying to do?
My understanding was that even with 512-byte sectors ZFS still aligns its
on-disk data with > 4K alignment.
Do you see otherwise? What problem do you have?

--
Andriy Gapon

Alexey Tarasov

Apr 9, 2010, 7:15:17 AM
to freeb...@freebsd.org
Hello.

I see considerably better performance when I create the pool on a gnop -S 4096 virtual disk. Even when I create the zpool on the whole raw disks, performance is very bad and concurrent writes stall. With gnop, ZFS works VERY fast!

Btw, here is another discussion; maybe there is a bug in a mav@ commit, since he added support for sector sizes > 512 bytes:
http://lists.freebsd.org/pipermail/freebsd-current/2010-April/016495.html

Greetings to Ukraine! =)

Alexey Tarasov

Apr 9, 2010, 7:36:17 AM
to Andrey V. Elsukov, freeb...@freebsd.org

On 09.04.2010, at 15:24, Andrey V. Elsukov wrote:

> On 09.04.2010 15:15, Alexey Tarasov wrote:
>> Btw, here is another discussion; maybe there is a bug in a mav@ commit, since he added
>> support for sector sizes > 512 bytes:
>
> First of all, can you look at the commit log and understand what it does?

http://svn.freebsd.org/viewvc/base?view=revision&revision=198897

- Add support for sector size > 512 bytes and physical sector of several
logical sectors, introduced by ATA-7 specification.

Maybe I have misunderstood this log message?

--
Alexey Tarasov

Andrey V. Elsukov

Apr 9, 2010, 7:24:11 AM
to Alexey Tarasov, freeb...@freebsd.org
On 09.04.2010 15:15, Alexey Tarasov wrote:
> Btw, here is another discussion; maybe there is a bug in a mav@ commit, since he added
> support for sector sizes > 512 bytes:

First of all, can you look at the commit log and understand what it does?

--
WBR, Andrey V. Elsukov

Alexey Tarasov

Apr 9, 2010, 8:25:05 AM
to Šimun Mikecin, freeb...@freebsd.org

> This commit is for HEAD (are you using HEAD?), and I suppose it doesn't work if you are using the ata driver (disk name adX) instead of ahci (disk name adaX).

It was MFC'ed to 8-STABLE.
I will try ahci with a fresh STABLE later.

--
Alexey Tarasov

Šimun Mikecin

Apr 9, 2010, 8:11:02 AM
to Alexey Tarasov, freebsd-fs

> http://svn.freebsd.org/viewvc/base?view=revision&revision=198897
>
> - Add support for sector size > 512 bytes and physical sector of several
> logical sectors, introduced by ATA-7 specification.
>
> Maybe I have misunderstood this log message?
