
Poor mfi driver performance on LSI MegaRAID SAS 9260-8i


Jason Zhang

Jun 17, 2016, 6:55:15 AM
Hi,

I am working on a storage service based on FreeBSD. I expected good results because many professional storage companies use FreeBSD as their OS, but I am disappointed with the poor performance. I tested the performance of the LSI MegaRAID 9260-8i and got the following poor results:

1. Test environment:
(1) OS: FreeBSD 10.0-RELEASE
(2) Memory: 16 GB
(3) RAID adapter: LSI MegaRAID 9260-8i
(4) Disks: 9 SAS hard drives (10,000 rpm); each drive individually performs as expected
(5) Test tool: fio with iodepth=1, 32 threads, and a block size of 64 KB or 1 MB (an example invocation is sketched below)
(6) RAID configuration: RAID 5, stripe size 1 MB

2. Test results:
(1) Write performance is very poor: 20 MB/s throughput and 200 random-write IOPS
(2) Read performance is as expected: 700 MB/s throughput and 1500 random-read IOPS
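
For reference, a rough fio invocation matching these parameters might look like the following (a sketch only; the /dev/mfid0 device name and exact option set are assumptions, and writing to the raw device destroys its data):

fio --name=randwrite --filename=/dev/mfid0 --rw=randwrite \
    --bs=64k --iodepth=1 --numjobs=32 --thread --direct=1 \
    --runtime=60 --time_based --group_reporting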


I tested the same hardware configuration with CentOS Linux, and its write performance was five times better than FreeBSD's.


Has anyone encountered the same performance problem? Does the mfi driver have a performance issue, or should I give up on FreeBSD?

张京城 Jason

Cyphy Technology (Xiamen) Co., Ltd.
Headquarters: Units 901-904, Building A, 55 Wanghai Road, Xiamen Software Park, Xiamen
R&D office: 2 Daqudeng Hutong, Meishuguan Houjie, Dongcheng District, Beijing
Hotline: 4008798066
Switchboard: 0592-2936100
Email: jason...@cyphytech.com
Website: http://www.cyphytech.com




Julian Elischer

Jun 20, 2016, 2:58:46 AM
On 17/06/2016 3:16 PM, Jason Zhang wrote:
> Hi,
>
> I am working on storage service based on FreeBSD. I look forward to a good result because many professional storage company use FreeBSD as its OS. But I am disappointed with the Bad performance. I tested the the performance of LSI MegaRAID 9260-8i and had the following bad result:
>
> 1. Test environment:
> (1) OS: FreeBSD 10.0 release
> (2) Memory: 16G
> (3) RAID adapter: LSI MegaRAID 9260-8i
> (4) Disks: 9 SAS hard drives (10000 rpm), performance is expected for each hard drive
> (5) Test tools: fio with io-depth=1, thread num is 32 and block size is 64k or 1M
> (6) RAID configuration: RAID 5, stripe size is 1M
>
> 2. Test result:
> (1) write performance too bad: 20Mbytes/s throughput and 200 random write IOPS
> (2) read performance is expected: 700Mbytes/s throughput and 1500 random read IOPS
>
>
> I tested the same hardware configuration with CentOS linux and Linux's write performance is 5 times better than FreeBSD.
>
>
> Anyone encountered the same performance problem? Does the mfi driver have performance issue or I should give up on FreeBSD?
>
>
Unfortunately, issues related to performance can often be very specific.
We use the LSI cards with great success under FreeBSD 8 in our product
at work, but it is impossible to say what specifically is wrong in your
setup.

Some years ago I did discover that fio needed to have completely
different arguments to get good performance under FreeBSD, so please
check that first.

What does performance look like with a single large write stream?

Also look at the handling of interrupts (systat -vmstat) to ensure
that interrupts are being handled correctly. That can vary greatly
from motherboard to motherboard and BIOS to BIOS (even between
revisions). Sometimes Linux copes differently with these issues
because they have better support from the motherboard makers
themselves (sometimes we cope better too).
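
A quick way to check is something along these lines (a sketch; the mfi interrupt name is an assumption):

vmstat -i | grep mfi     # per-device interrupt counts for the controller
systat -vmstat 1         # live view of interrupt rates and disk activity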

One final thought: make sure you have partitioned your drives and
filesystems so that all the block boundaries agree and line up.
At one place I worked we found we had accidentally partitioned all our
drives starting 63 sectors into the drive.
That did NOT work well. :-) 8k RAID stripe writes were always 2
writes (and sometimes a read).
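
For example, something like this shows whether the partition offsets line up (a sketch; the mfid0 device name and partition type are assumptions):

gpart show -p mfid0                    # offsets should be multiples of the stripe size
gpart create -s gpt mfid0              # if no partition table exists yet
gpart add -t freebsd-ufs -a 1m mfid0   # when repartitioning, force 1 MB alignment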


Mark Felder

Jun 21, 2016, 12:30:02 PM


On Fri, Jun 17, 2016, at 02:17, Jason Zhang wrote:
> Hi,
>
> I am working on storage service based on FreeBSD. I look forward to a
> good result because many professional storage company use FreeBSD as its
> OS. But I am disappointed with the Bad performance. I tested the the
> performance of LSI MegaRAID 9260-8i and had the following bad result:
>
> 1. Test environment:
> (1) OS: FreeBSD 10.0 release

10.0-RELEASE is no longer supported. Can you reproduce this on
10.3-RELEASE?


--
Mark Felder
fe...@feld.me

Mark Felder

Jun 21, 2016, 12:37:09 PM


On Fri, Jun 17, 2016, at 02:17, Jason Zhang wrote:
> Hi,
>
> I am working on storage service based on FreeBSD. I look forward to a
> good result because many professional storage company use FreeBSD as its
> OS. But I am disappointed with the Bad performance. I tested the the
> performance of LSI MegaRAID 9260-8i and had the following bad result:
>
> 1. Test environment:
> (1) OS: FreeBSD 10.0 release

10.0-RELEASE is no longer supported. Can you test this on 10.3-RELEASE?

Have you confirmed that both servers are using identical RAID controller
settings? It's possible the CentOS install has enabled write caching but
it's disabled on your FreeBSD server. Are you using UFS or ZFS on
FreeBSD? Do you have atime enabled? I believe CentOS is going to have
"relatime" or "nodiratime" by default to mitigate the write penalty on
each read access.

We need more data :-)
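
For instance, something along these lines would help compare the two setups (a sketch; the volume name and mount point are assumptions):

mfiutil cache mfid0           # show the controller's cache policy for the volume
mount | grep -e ufs -e zfs    # confirm filesystem type and mount options
mount -u -o noatime /data     # e.g. disable atime updates on a UFS mount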


--
Mark Felder
ports-secteam member
fe...@FreeBSD.org

Jason Zhang

Jun 21, 2016, 10:14:48 PM
Mark,

Thanks

We have the same RAID settings on both FreeBSD and CentOS, including the cache settings. On FreeBSD, I enabled the write cache, but the performance is the same.

We don't use ZFS or UFS; we test the performance on the raw GEOM disk "mfidX" exported by the mfi driver. We looked at the "gstat" output and found that the write latency is too high. When we "dd" the disk with 8 KB writes, latency is below 1 ms, but it is 6 ms for 64 KB writes. It seems that each single write operation is very slow, but I don't know whether it is a driver problem or not.
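
The comparison was done roughly like this (a sketch; the device name is an assumption, and the dd writes destroy data on the volume):

gstat -f mfid0                                       # in one terminal, watch the ms/w column
dd if=/dev/zero of=/dev/mfid0 bs=8k count=100000     # < 1 ms per write observed
dd if=/dev/zero of=/dev/mfid0 bs=64k count=100000    # ~ 6 ms per write observed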


Jason

Doros Eracledes

Jun 22, 2016, 1:44:48 AM
As a side note, we also use this controller with FreeBSD 10.1, but we configured each drive as a JBOD and then created raidz ZFS pools, and that was much faster than letting the LSI do RAID 5.
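
A rough sketch of that sort of layout (a guess at the commands; the pool name and device names are assumptions, and they may appear as mfidN rather than daN depending on how the controller exposes them):

zpool create tank raidz da0 da1 da2 da3 da4 da5 da6 da7 da8
zpool status tank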

Best
Doros

Borja Marcos

Jun 22, 2016, 3:05:33 AM

> On 22 Jun 2016, at 04:08, Jason Zhang <jason...@cyphytech.com> wrote:
>
> Mark,
>
> Thanks
>
> We have same RAID setting both on FreeBSD and CentOS including cache setting. In FreeBSD, I enabled the write cache but the performance is the same.
>
> We don’t use ZFS or UFS, and test the performance on the RAW GEOM disk “mfidx” exported by mfi driver. We observed the “gstat” result and found that the write latency
> is too high. When we “dd" the disk with 8k, it is lower than 1ms, but it is 6ms on 64kb write. It seems that each single write operation is very slow. But I don’t know
> whether it is a driver problem or not.

There is an option you can use (I do it all the time!) to make the card behave as a plain HBA so that the disks are handled by the “da” driver.

Add this to /boot/loader.conf

hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"

And do the tests accessing the disks as "da". To avoid confusion, it's better to make sure the disks are not part of a "JBOD" or logical volume configuration.
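
After rebooting with those settings, something like this confirms the drives are visible to CAM (a sketch; the device name is an example):

camcontrol devlist       # the SAS drives should now show up as daN devices
camcontrol inquiry da0   # hypothetical device name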


Borja.

Harry Schmalzbauer

Jul 31, 2016, 12:52:15 PM
Regarding Jason Zhang's message of 17.06.2016 09:16 (localtime):

> Hi,
>
> I am working on storage service based on FreeBSD. I look forward to a good result because many professional storage company use FreeBSD as its OS. But I am disappointed with the Bad performance. I tested the the performance of LSI MegaRAID 9260-8i and had the following bad result:
>
> 1. Test environment:
> (1) OS: FreeBSD 10.0 release
> (2) Memory: 16G
> (3) RAID adapter: LSI MegaRAID 9260-8i
> (4) Disks: 9 SAS hard drives (10000 rpm), performance is expected for each hard drive

Were the drives completely initialized?
I remember that at least one vendor had implemented read-past-write for
every sector when written first.
It was with 15k 3.5" spindles and I'm really not sure which vendor it
was, so I won't name any. But a "slow init" solved a similar problem
for me back then...

-Harry

O. Hartmann

Aug 1, 2016, 4:57:44 AM
On Wed, 22 Jun 2016 08:58:08 +0200
Borja Marcos <bor...@sarenet.es> wrote:

> > On 22 Jun 2016, at 04:08, Jason Zhang <jason...@cyphytech.com> wrote:
> >
> > Mark,
> >
> > Thanks
> >
> > We have same RAID setting both on FreeBSD and CentOS including cache
> > setting. In FreeBSD, I enabled the write cache but the performance is the
> > same.
> >
> > We don’t use ZFS or UFS, and test the performance on the RAW GEOM disk
> > “mfidx” exported by mfi driver. We observed the “gstat” result and found
> > that the write latency is too high. When we “dd" the disk with 8k, it is
> > lower than 1ms, but it is 6ms on 64kb write. It seems that each single
> > write operation is very slow. But I don’t know whether it is a driver
> > problem or not.
>
> There is an option you can use (I do it all the time!) to make the card
> behave as a plain HBA so that the disks are handled by the “da” driver.
>
> Add this to /boot/loader.conf
>
> hw.mfi.allow_cam_disk_passthrough=1
> mfip_load=“YES"
>
> And do the tests accessing the disks as “da”. To avoid confusions, it’s
> better to make sure the disks are not part of a “jbod” or logical volume
> configuration.
>
>
>
>
> Borja.

[...]

How is this supposed to work when ALL disks (including the boot device) are attached to the mfi controller itself (in our case a Fujitsu CP400i, based on the LSI3008 and detected by FreeBSD 11-BETA and 12-CURRENT)?

I did not find any way to force the CP400i into a mode where it acts as an HBA (we intend to use all drives with ZFS and let the FreeBSD kernel/ZFS control everything).

The boot device is a 256 GB Samsung SSD for enterprise use, and writing the UEFI loader onto an EFI partition from 11-CURRENT-ALPHA4 is even worse: dd takes up to almost a minute to put the image onto the SSD. The SSD activity LED is blinking all the time, indicating activity. Caches are off. I tried to enable the cache via the mfiutil command with 'mfiutil cache mfid0 enable', but it failed ... It also failed on all other attached drives.

I haven't investigated further right now, since the experience with the EFI boot loader makes me suspect bad performance, and that is harsh, so to speak. Glad to have found this thread anyway.

I am cross-posting this to CURRENT as well, since it might be an issue with CURRENT ...

Kind regards,

Oliver Hartmann

Borja Marcos

Aug 1, 2016, 5:57:14 AM

> On 01 Aug 2016, at 08:45, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
>
> On Wed, 22 Jun 2016 08:58:08 +0200
> Borja Marcos <bor...@sarenet.es> wrote:
>
>> There is an option you can use (I do it all the time!) to make the card
>> behave as a plain HBA so that the disks are handled by the “da” driver.
>>
>> Add this to /boot/loader.conf
>>
>> hw.mfi.allow_cam_disk_passthrough=1
>> mfip_load=“YES"
>>
>> And do the tests accessing the disks as “da”. To avoid confusions, it’s
>> better to make sure the disks are not part of a “jbod” or logical volume
>> configuration.
>>
>>
>>
>>
>> Borja.
> [...]
>
> How is this supposed to work when ALL disks (including boot device) are settled
> with the mfi (in our case, it is a Fujitsu CP400i, based upon LSI3008 and
> detected within FreeBSD 11-BETA and 12-CURRENT) controller itself?
>
> I did not find any solution to force the CP400i into a mode making itself
> acting as a HBA (we intend to use all drives with ZFS and let FreeBSD
> kernel/ZFS control everything).

Have you tried that particular option?

With fairly recent LSI-based cards you have three options:

- The most common and definitely NOT RECOMMENDED option is to define a logical volume per disk,
which is what LSI Logic used to call JBOD mode. It's not recommended at all if you want to run ZFS.

- Recent cards, I think I saw this first on the LSI3008, have a JBOD mode that exposes the drives as “mfisyspd” devices.
I don’t recommend it either, because the syspd drives are a sort of limited version of a disk device. With SSDs, especially, you
don’t have access to the TRIM command.

- The third option is to make the driver expose the SAS devices like a HBA would do, so that they are visible to the
CAM layer, and disks are handled by the stock “da” driver, which is the ideal solution.

However, this third option might not be available in some custom firmware versions for certain manufacturers? I don't
know. And I would hesitate to make the conversion on a production machine unless you have a complete and reliable
full backup of all the data in case you need to rebuild it.

In order to do it you need a couple of things. You need to set the variable hw.mfi.allow_cam_disk_passthrough=1
and to load the mfip.ko module.

When booting installation media, enter command mode and use these commands:

-----
set hw.mfi.allow_cam_disk_passthrough=1
load mfip
boot
-----


Remember that after installation you need to update /boot/loader.conf in the system you just installed with the
following contents:

hw.mfi.allow_cam_disk_passthrough=1
mfip_load="YES"


A note regarding CAM and MFI visibility: on some old firmware versions for the LSI2008 I've even seen the disks
available both as "mfi" and "da" devices. If possible, you should try to set them up as "unconfigured good" in the RAID
firmware. Use the RAID firmware setup utility or maybe mfiutil(8).
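
A hypothetical mfiutil(8) example (the device ID is an assumption; check the drive list first):

mfiutil show drives      # list physical drives with their device IDs and state
mfiutil good 4           # mark the drive with device ID 4 as unconfigured good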

Also, make sure you don’t create any logical volumes on the disks you want exposed to CAM. You should delete
the logical volumes so that the MFI firmware doesn’t do anything with them.

AND BEWARE: Doing these changes to a system in production with valuable data is dangerous. Make sure you have a full
and sound backup before making these changes.

As a worst case, the card could expose the devices both as "syspd" and CAM (i.e., "da") drives, but as long as you don't
touch the syspd devices the card won't do anything to them as far as I know. It could be a serious problem, however, if you
access a drive that is part of a logical volume through CAM, as RAID cards tend to do "patrol reads" and other stuff on them.

Provided it’s safe to do what I recommended, try it and follow up by email.

Borja.

O. Hartmann

Aug 1, 2016, 9:12:43 AM
On Mon, 1 Aug 2016 11:48:30 +0200
Borja Marcos <bor...@sarenet.es> wrote:

Hello.

First, thanks for responding so quickly.

> > On 01 Aug 2016, at 08:45, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
> >
> > On Wed, 22 Jun 2016 08:58:08 +0200
> > Borja Marcos <bor...@sarenet.es> wrote:
> >
> >> There is an option you can use (I do it all the time!) to make the card
> >> behave as a plain HBA so that the disks are handled by the “da” driver.
> >>
> >> Add this to /boot/loader.conf
> >>
> >> hw.mfi.allow_cam_disk_passthrough=1
> >> mfip_load=“YES"
> >>
> >> And do the tests accessing the disks as “da”. To avoid confusions, it’s
> >> better to make sure the disks are not part of a “jbod” or logical volume
> >> configuration.
> >>
> >>
> >>
> >>
> >> Borja.
> > [...]
> >
> > How is this supposed to work when ALL disks (including boot device) are
> > settled with the mfi (in our case, it is a Fujitsu CP400i, based upon
> > LSI3008 and detected within FreeBSD 11-BETA and 12-CURRENT) controller
> > itself?
> >
> > I did not find any solution to force the CP400i into a mode making itself
> > acting as a HBA (we intend to use all drives with ZFS and let FreeBSD
> > kernel/ZFS control everything).
>
> Have you tried that particular option?

I have, indeed, used the "JBOD" function of the PRAID CP400i controller, and the intention of my posting was the suspicion that this is, as mentioned in many posts concerning RAID controllers and ZFS, the reason for the poor performance. As far as I can see, that has sadly been confirmed.

>
> With kinda recent LSI based cards you have three options:
>
> - The most usual and definitely NOT RECOMMENDED option is to define a logical
> volume per disk which actually LSI Logic called before JBOD mode. It’s not
> recommended at all if you want to run ZFS.

This is the only way to expose each disk as-is to the OS with the PRAID CP400i built into our RX1330-M2 server (Xeon Skylake based). I ordered that specific box with an HBA-capable controller. Searching the net reveals that there is another one, called PSAS CP400i, which is also based on the LSI/Avago SAS3008, and the possibility of exposing drives as-is is explicitly mentioned. I do not know whether this is a software feature - as I suspect - or something which has been hardwired into the controller.

>
> - Recent cards, I think I saw this first on the LSI3008, have a JBOD mode
> that exposes the drives as “mfisyspd” devices. I don’t recommend it either,
> because the syspd drives are a sort of limited version of a disk device. With
> SSDs, especially, you don’t have access to the TRIM command.

They expose the drives as "mfidX" if set up as JBOD.

>
> - The third option is to make the driver expose the SAS devices like a HBA
> would do, so that they are visible to the CAM layer, and disks are handled by
> the stock “da” driver, which is the ideal solution.

I didn't find any switch which offers me the opportunity to put the PRAID
CP400i into a simple HBA mode.



>
> However, this third option might not be available in some custom firmware
> versions for certain manufacturers? I don´t know. And I would hesitate to
> make the conversion on a production machine unless you have a complete and
> reliable full backup of all the data in case you need to rebuild it.

The boxes are empty and ready for installation, so I am not worried. More worrying is this stupid software-based strangulation of options by Fujitsu - if any. I do not want to blame them before I have double-checked.


>
> In order to do it you need a couple of things. You need to set the variable
> hw.mfi.allow_cam_disk_passthrough=1 and to load the mfip.ko module.
>
> When booting installation media, enter command mode and use these commands:
>
> -----
> set hw.mfi.allow_cam_disk_passthrough=1
> load mfip
> boot
> ———

Well, I'm truly aware of the problem and the solution (now), but I run into a chicken-and-egg problem, literally. As long as I can boot off the installation medium, I have a kernel which deals with the setting. But the boot medium is supposed to be an SSD attached to the PRAID CP400i controller itself! So I will never be able to boot the system without crippling the ability to have the full-speed ZFS configuration I expect to get with HBA mode, but not with any of the forced RAID modes offered by the controller.


I will check with Fujitsu for a solution. Maybe the PRAID CP400i is somehow also capable of being a PSAS CP400i, even if that is not exposed by the currently installed firmware.

Kind regards,
Oliver


>
>
> Remember that after installation you need to update /boot/loader.conf in the
> system you just installed with the following contents:
>
> hw.mfi.allow_cam_disk_passthrough=1
> mfip_load=“YES”
>
>
> A note regarding CAM and MFI visibility: On some old firmware versions for
> the LSI2008 I’ve even seen the disks available both as “mfi” and “da”
> drivers. If possible, you should try to set them up as “unconfigured good” on
> the RAID firmware. Use the RAID firmware set up or maybe mfiutil(8)
>
> Also, make sure you don’t create any logical volumes on the disks you want
> exposed to CAM. You should delete the logical volumes so that the MFI
> firmware doesn’t do anything with them.
>
> AND BEWARE: Doing these changes to a system in production with valuable data
> is dangerous. Make sure you have a full and sound backup before making these
> changes.
>
> As a worst case, the card could expose the devices both as “syspd” and CAM
> (i.e., “da” drives) but as long as you don’t touch the syspd devices the card
> won’t do anything to them as far as I know. It could be a serious problem,
> however, if you access a drive part of a logical volume through CAM, as RAID
> cards tend do to “patrol reads” and other stuff on them.
>
> Provided it’s safe to do what I recommended, try it and follow up by email.
>
>
>
>
>
> Borja.
>
>
>
>

Borja Marcos

Aug 1, 2016, 9:30:54 AM

> On 01 Aug 2016, at 15:12, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
>
> First, thanks for responding so quickly.
>
>> - The third option is to make the driver expose the SAS devices like a HBA
>> would do, so that they are visible to the CAM layer, and disks are handled by
>> the stock “da” driver, which is the ideal solution.
>
> I didn't find any switch which offers me the opportunity to put the PRAID
> CP400i into a simple HBA mode.

The switch is in the FreeBSD mfi driver, the loader tunable I mentioned, regardless of what the card
firmware does or pretends to do.

It's not visible in "sysctl -a" output, but it exists, and it's even unique. It's defined here:

https://svnweb.freebsd.org/base/stable/10/sys/dev/mfi/mfi_cam.c?revision=267084&view=markup
(line 93)

>> In order to do it you need a couple of things. You need to set the variable
>> hw.mfi.allow_cam_disk_passthrough=1 and to load the mfip.ko module.
>>
>> When booting installation media, enter command mode and use these commands:
>>
>> -----
>> set hw.mfi.allow_cam_disk_passthrough=1
>> load mfip
>> boot
>> ———
>
> Well, I'm truly aware of this problemacy and solution (now), but I run into a
> henn-egg-problem, literally. As long as I can boot off of the installation
> medium, I have a kernel which deals with the setting. But the boot medium is
> supposed to be a SSD sitting with the PRAID CP400i controller itself! So, I
> never be able to boot off the system without crippling the ability to have a
> fullspeed ZFS configuration which I suppose to have with HBA mode, but not
> with any of the forced RAID modes offered by the controller.

Been there plenty of times, even argued quite strongly about the advantages of ZFS against hardware based RAID
5 cards. :) I remember when the Dell salesmen couldn’t possibly understand why I wanted a “software based RAID rather than a
robust, hardware based solution” :D

At worst, you can set up a simple boot from a thumb drive or, even better, a SATADOM installed inside the server. I guess it will
have SATA ports on the mainboard. That's what I usually do. FreeNAS uses a similar approach as well. And some modern servers
can also boot from an SD card, which you can use just to load the kernel.

Depending on the number of disks you have, you can also sacrifice two to set up a mirror with a "normal" boot system, and use
the rest of the disks for ZFS. Actually I've got an old server I set up in 2012. It has 16 disks, and I created a logical volume (mirror)
with 2 disks for boot and used the other 14 disks for ZFS.

If I installed this server now I would do it differently, booting off a thumb drive. But I was younger and naiver :)


Borja.

Michelle Sullivan

Aug 1, 2016, 4:34:32 PM

There are reasons for using either...

Nowadays it seems the conversations have degenerated into ones like
Windows vs Linux vs Mac, where everyone thinks their answer is the right
one (just as you suggested you (Borja Marcos) did with the Dell
salesman), while in reality each has its own advantages and
disadvantages. E.g.: I'm running 2 ZFS servers on 'LSI 9260-16i's... big
mistake! (the ZFS, not the LSIs)... one is a 'movie server', the other a
'postgresql database' server... Most would agree the latter is a bad
use of ZFS; the die-hards won't, but then they don't understand database
servers and how they work on disk. The former gets mixed views; some
argue that ZFS is the only way to ensure the movies will always work.
Personally, I think of all the years before ZFS when my data on disk
worked without failure until the disks themselves failed... and RAID
stopped that happening... what suddenly changed, are disks and RAM
suddenly not reliable at transferring data? Anyhow, back to the issue:
there is another aspect of this particular hardware that people just
throw away...

The LSI 9260-* controllers have been designed to provide hardware
RAID. The caching, whether using the CacheCade SSD or just the onboard ECC
memory, is *ONLY* used when running some sort of RAID set and LVs... this
is why LSI recommend 'MegaCli -CfgEachDskRaid0', because it does enable
caching. A good read on how to set up something similar is here:
https://calomel.org/megacli_lsi_commands.html (disclaimer: I haven't
parsed it all, so the author could be clueless, but it seems to give
generally good advice). Going the way of 'JBOD' is a bad thing to do,
just don't; performance sucks. As for the recommended command above, I
can't comment because currently I don't use it, nor will I need to in the
near future... but...

If you (O Hartmann) want to use or need to use ZFS with any OS, including
FreeBSD, don't go with the LSI 92xx series controllers; it's just the
wrong thing to do. Pick an HBA that is designed to give you direct
access to the drives, not one you have to kludge and cajole. That includes
LSI controllers with caches that use the mfi driver, just not those that
are not designed to work in a non-RAID mode (with or without the
passthrough command/mode above).

>
> At worst, you can set up a simple boot from a thumb drive or, even better, a SATADOM installed inside the server. I guess it will
> have SATA ports on the mainboard. That’s what I use to do. FreeNAS uses a similar approach as well. And some modern servers
> also can boot from a SD card which you can use just to load the kernel.
>
> Depending on the number of disks you have, you can also sacrifice two to set up a mirror with a “nomal” boot system, and using
> the rest of the disks for ZFS. Actually I’ve got an old server I set up in 2012. It has 16 disks, and I created a logical volume (mirror)
> with 2 disks for boot, the other 14 disks for ZFS.
>
> If I installed this server now I would do it different, booting off a thumb drive. But I was younger and naiver :)
>
>

If I installed mine now I would do them differently as well... neither
would run ZFS; both would use their on-card hardware RAID with UFS on top
of it... ZFS would be reserved for the multi-user NFS file servers.
(And trust me here, when it comes to media servers - where the media is
just stored, not changed/updated/edited - the 16i with a good high-speed
SSD as 'CacheCade' really performs well... and on a moderately powerful
MB/CPU combo with good RAM and several gigabit interfaces it's
surprising how many unicast transcoded media streams it can handle...
read: my twin fibres are saturated before the machine reaches anywhere
near full load, and I can still write at 13 MBps from my old Mac Mini
over NFS... which is about all it can do without any load either.)

So, the moral of the story: don't go with ZFS because people tell
you it's best, because it isn't; go with ZFS if it suits your hardware
and application, and if ZFS suits your application, get hardware for it.

Regards,

--
Michelle Sullivan
http://www.mhix.org/

Ultima

Aug 1, 2016, 11:23:13 PM
If anyone is interested, as Michelle Sullivan just mentioned, one problem I
found when looking for an HBA is that they are not so easy to find. Scouring
the internet for a backup HBA, I came across these:
http://www.avagotech.com/products/server-storage/host-bus-adapters/#tab-12Gb1

I can only speak for the SAS 9305-24i. All 24 bays are occupied, and I am
quite pleased with the performance compared to its predecessor. It was
originally going to be a backup unit; however, that changed after running a
scrub and seeing the time to complete cut roughly in half (from around 30
hours to 15 for 35 TB). And of course, the reason for this post: it replaced
a RAID card in passthrough mode.

Another note, because it is an HBA, the ability to flash firmware is once
again possible! (yay!)

+1 to HBAs + ZFS; if possible, replace the RAID card with an HBA.

On Mon, Aug 1, 2016 at 1:30 PM, Michelle Sullivan <mich...@sorbs.net>
wrote:

Borja Marcos

Aug 2, 2016, 4:27:54 AM

> On 01 Aug 2016, at 19:30, Michelle Sullivan <mich...@sorbs.net> wrote:
>
> There are reasons for using either…

Indeed, but my decision was to run ZFS. And getting an HBA in some configurations can be difficult because vendors insist on using
RAID adapters. After all, that's what most of their customers demand.

Fortunately, at least some Avago/LSI cards can work as HBAs pretty well. An example is the now venerable LSI2008.

> Nowadays its seems the conversations have degenerated into those like Windows vs Linux vs Mac where everyone thinks their answer is the right one (just as you suggested you (Borja Marcos) did with the Dell salesman), where in reality each has its own advantages and disadvantages.

I know, but this is not the case. It's quite frustrating to try to order a server with an HBA rather than a RAID card and receive an answer such as
"the HBA option is not available". That's why people are zapping, flashing and, generally, torturing HBA cards rather cruelly ;)

So, in my case, it’s not about what’s better or worse. It’s just a simpler issue. Customer (myself) has made a decision, which can be right or wrong. Manufacturer fails to deliver what I need. If it was only one manufacturer, well, off with them, but the issue is widespread in industry.

> Eg: I'm running 2 zfs servers on 'LSI 9260-16i's... big mistake! (the ZFS, not LSI's)... one is a 'movie server' the other a 'postgresql database' server... The latter most would agree is a bad use of zfs, the die-hards won't but then they don't understand database servers and how they work on disk. The former has mixed views, some argue that zfs is the only way to ensure the movies will always work, personally I think of all the years before zfs when my data on disk worked without failure until the disks themselves failed... and RAID stopped that happening... what suddenly changed, are disks and ram suddenly not reliable at transferring data? .. anyhow back to the issue there is another part with this particular hardware that people just throw away…

Well, silent corruption can happen. I've seen it once, caused by a flaky HBA, and ZFS saved the cake. Yes, there were reliable replicas. Still, rebuilding would be a pain in the ass.

> The LSI 9260-* controllers have been designed to provide on hardware RAID. The caching whether using the Cachecade SSD or just oneboard ECC memory is *ONLY* used when running some sort of RAID set and LVs... this is why LSI recommend 'MegaCli -CfgEachDskRaid0' because it does enable caching.. A good read on how to setup something similar is here: https://calomel.org/megacli_lsi_commands.html (disclaimer, I haven't parsed it all so the author could be clueless, but it seems to give generally good advice.) Going the way of 'JBOD' is a bad thing to do, just don't, performance sucks. As for the recommended command above, can't comment because currently I don't use it nor will I need to in the near future... but…

Actually it’s not a good idea to use heavy disk caching when running ZFS. Its reliability depends on being able to commit metadata to disk. So I don’t care about that caching option. Provided you have enough RAM, ZFS is very effective caching data itself.

> If you (O Hartmann) want to use or need to use ZFS with any OS including FreeBSD don't go with the LSI 92xx series controllers, its just the wrong thing to do.. Pick an HBA that is designed to give you direct access to the drives not one you have to kludge and cajole.. Including LSI controllers with caches that use the mfi driver, just not those that are not designed to work in a non RAID mode (with or without the passthru command/mode above.)

As I said, the problem is, sometimes it’s not so easy to find the right HBA.

> So moral of the story/choices. Don't go with ZFS because people tell you its best, because it isn't, go with ZFS if it suits your hardware and application, and if ZFS suits your application, get hardware for it.

Indeed, I second this. But really, "hardware for it" covers a rather broad category ;) ZFS can even manage to work on hardware _against_ it.


Borja.
