SATA host adaptors - experiences, options, throughput etc.

Tim Small

Jul 28, 2014, 6:13:28 AM
to opensto...@googlegroups.com
Hello,

I don't operate any pods, but I do have a couple of custom-made systems which are similar (although not on the same scale - 26 drives per machine, half of which are hot-swappable via the front panel).

The Silicon Image 3132 (AKA SiI3132 or Sil3132) chips which Backblaze formerly used are quite nice, but they are now quite dated and are a bottleneck, as they only support a single PCIe 1.0 lane ("x1").

I've had poor experience with SATA controllers from Marvell and universally poor experience with SAS controllers (especially chips made by LSI): e.g. SAT pass-through is broken on LSI (so hdparm / SMART can't be used reliably) plus general flakiness; NCQ is broken on Marvell AHCI, FIS-based port multiplier support is broken on Marvell, and there is other general flakiness on Marvell too.

The Intel motherboard SATA controllers have been rock-solid.

Linux kernel support for port multipliers has been patchy, especially with respect to hot-plug (plus there is an erratum on the Silicon Image Sil3132 ...).

I've recently come across a problem with the Syba / IOCrest SY-PEX40039 controller cards which Backblaze have been recommending under newer Linux kernels - they lock up under heavy load on my systems (reported to the linux-ide list recently).

I've recently started experimenting with ASMedia (owned by Asus) 1061 / 1062 based AHCI controllers (only direct to drives so far, not with port multipliers).  Results have been positive so far - I was wondering if anyone else has had any experience with these chips?

Ultimately I think it would be nice to come up with a decent cheap high-bandwidth SATA host adaptor design, using a PCI Express switch chip to put multiple reliable PCIe SATA controller chips into a single slot.

e.g.
[PCI Express 3.0 x1 slot] (985 MB/sec)
|
[PCI Express 3.0 switch chip]
|
[four individual 2 SATA port to PCI Express host adaptors - e.g. Sil3132, etc.]
|
[8x SATA ports - max simultaneous bandwidth per drive = 123 MB/s minus protocol overheads]


Six of these per machine would do a storage pod (alternatively use five, plus all the motherboard SATA controller ports), and then you can stick a nice fast interconnect on the machine too.


Another option would be:

[PCI Express 2.0 x8 lanes] (4000 MB/sec)
|
[PCI Express switch chip]
|
[twelve individual 2 SATA port to PCI Express host adaptors - e.g. Sil3132, etc.]
|
[24x SATA ports - max simultaneous bandwidth per drive = 166 MB/s minus protocol overheads]

2 of these per machine giving 48 ports.
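
To sanity-check the per-drive figures for both layouts, here's a quick Python sketch (the link rates are the nominal effective figures assumed above - PCIe 3.0 x1 ≈ 985 MB/s, PCIe 2.0 ≈ 500 MB/s per lane, Sil3132 upstream PCIe 1.0 x1 ≈ 250 MB/s - not measurements):

# Worst case when every drive streams at once: the tighter of the card's
# upstream link share and each controller's own upstream link share.
def per_drive(card_uplink_mbps, total_drives, ctrl_uplink_mbps, drives_per_ctrl):
    return min(card_uplink_mbps / total_drives, ctrl_uplink_mbps / drives_per_ctrl)

# Option 1: PCIe 3.0 x1 uplink, four Sil3132 (PCIe 1.0 x1, 2 ports each)
print(per_drive(985, 8, 250, 2))     # ~123 MB/s per drive

# Option 2: PCIe 2.0 x8 uplink, twelve 2-port controllers
print(per_drive(4000, 24, 500, 2))   # ~166 MB/s per drive with ASM1061 (PCIe 2.0 x1)
print(per_drive(4000, 24, 250, 2))   # ~125 MB/s per drive with Sil3132 (PCIe 1.0 x1)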

The main ICs for the second option would cost:

e.g. PLX PEX 8625 = $77 (qty of 100+)
Sil3132 = $7.50 each
ASMedia 1061 = $2.50 each

so roughly $107 for the second option implemented with ASMedia, or $167 implemented with Silicon Image.
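
In full, using the prices quoted above:

pex8625  = 77.00   # PLX PEX 8625, qty 100+
asm1061  = 2.50
sil3132  = 7.50
n_ctrl   = 12      # twelve 2-port controllers for 24 ports

print(pex8625 + n_ctrl * asm1061)   # 107.0 - ASMedia build
print(pex8625 + n_ctrl * sil3132)   # 167.0 - Silicon Image build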

OK, you've got to design and build the board, but the cards Backblaze are currently using are $750 each, and these will outperform them heavily...

Tim.

Tim Lossen

Jul 28, 2014, 4:14:48 PM
to opensto...@googlegroups.com
i like your high-bandwidth SATA host adaptor designs! there is also a commercial version from highpoint:

http://www.highpoint-tech.com/USA_new/cs-series_DC7280.htm

i’ve seen them offered for around 400 euro in germany. not quite as cheap as yours, but fully assembled :)

on the other hand, reviews are not so hot:

http://www.newegg.com/Product/Product.aspx?Item=N82E16816115110

tim

Tim Small

Jul 29, 2014, 5:59:42 AM
to opensto...@googlegroups.com
A couple of corrections to my original post.  The cards I've been having problems with under newer Linux kernels are the Sil3124-based SY-PEX40008 cards (which have a PCIe-to-PCI-X bridge chip)...

Not the SY-PEX40039 which I originally stated.  Those are ASM1061-based cards with which I haven't seen any problems under limited testing - nor can I find any trace of problems on Google, with the exception of an ATAPI bug which seems to hit some DVD writers, so probably not a concern.  I have one of these in production on trial at the moment and a couple more on order - has anyone else used them?


Also, I claimed that my designs would have a lot more bandwidth available than the cards which Backblaze are using in the 4.0 pods.  This isn't quite true - they only have a bit more useful bandwidth, as the Backblaze designs have the following bottlenecks:

Backblaze 3.0 design (Sil3132 + PMP):
PCIe 1.0 single lane has 250 MB/s - split between 15 drives (I think that's right?) this gives a minimum of 17 MB/s per drive (minus protocol overheads) if all 15 drives are used simultaneously.
Each 300 MB/s SATA channel split 5 ways using a port multiplier gives 60 MB/s per drive (minus overheads) if all 5 drives on one port are used simultaneously.

Backblaze 4.0 design (RocketRAID 750):
PCIe 2.0 x8 has 4000 MB/s - split 40 ways this gives a minimum of 100 MB/s per drive (minus protocol overheads) if all 40 drives are used simultaneously.
Each 600 MB/s SATA channel split 5 ways using a port multiplier gives 150 MB/s per drive (minus overheads) if all 5 drives on one port are used simultaneously.

The 4TB SATA drives I have do about 180 MB/s sequential reads on the Intel motherboard controllers, so both of these designs are constrained by the I/O card arrangement when loaded heavily with sequential I/O across multiple drives.  The Pod 3.0 is very constrained (9% of drive performance in the worst case), and the 4.0 less so (83% in the worst case).


The design examples I gave at the start of this thread sustain a minimum of 125 MB/s per drive with all drives active if Sil3132 controllers are used (or 166 MB/s if a PCIe 2.0 controller like the ASM1061 is used instead).


Interestingly, just dropping ASM1061 cards (e.g. the SY-PEX40039) into the Pod 3.0 design would ease the bottleneck quite a bit.  I haven't tried these with the Silicon Image SATA port multipliers at all, and haven't used them much in general, but I can't find any relevant negative reports about them - then again, neither could I for some of the Marvell-based cards before I started using them!  This would require a motherboard with at least 5 PCIe slots (these do exist; in fact I've seen at least one with 8 PCIe slots).

PCIe 2.0 single lane has 500 MB/s - split between 10 drives this gives a minimum of 50 MB/s per drive (minus protocol overheads) if all 10 drives are used simultaneously.
Each 300 MB/s SATA channel split 5 ways using a port multiplier gives 60 MB/s per drive (minus overheads) if all 5 drives on one port are used simultaneously.

So we've gone from 17 MB/s per drive up to 50 MB/s per drive in the most constrained case...
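
The same worst-case sums as a small Python sketch (the link rates, drive counts, and port-multiplier fan-out are the figures assumed above, not measurements):

# Worst-case per-drive throughput when every drive in the group streams at once:
# the tighter of (host link / total drives) and (SATA channel / PMP fan-out).
def worst_case_per_drive(host_link_mbps, total_drives, sata_channel_mbps, pmp_fan_out):
    return min(host_link_mbps / total_drives, sata_channel_mbps / pmp_fan_out)

# Pod 3.0 style: Sil3132 on PCIe 1.0 x1 (250 MB/s), 15 drives via 5-way PMPs on 300 MB/s links
print(worst_case_per_drive(250, 15, 300, 5))   # ~17 MB/s per drive

# Same backplanes behind an ASM1061 on PCIe 2.0 x1 (500 MB/s), 10 drives per card
print(worst_case_per_drive(500, 10, 300, 5))   # 50 MB/s per drive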

Tim Small

Jul 29, 2014, 6:27:28 AM
to opensto...@googlegroups.com
On Monday, 28 July 2014 21:14:48 UTC+1, Tim Lossen wrote:
there is also a commercial version from highpoint:

http://www.highpoint-tech.com/USA_new/cs-series_DC7280.htm

This is based around a single Marvell SATA controller, with a load of Marvell SATA port multipliers.


It seems to me that whilst there are multiple issues with a SATA port multiplier based design, there are fewer with PCIe switches:

+ More vendors
+ Far simpler designs from a software point of view
+ You end up using better-debugged software - every PC has a built-in PCIe switch, whereas very few have SATA PMPs
+ More flexible designs are possible

(About the only issue I can think of is BIOS bugs which could stop the devices being recognised or enumerated; however, there are many motherboards on the market, and even this could be worked around with some PCIe switch designs by hiding the controllers from the BIOS but exposing them to Linux - that would require a bit of driver writing under Linux, but nothing complex or drastic.)
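
As an aside, it's easy to confirm from Linux what link each controller has actually negotiated, via the standard PCI sysfs attributes - a quick sketch (the PCI address below is just a placeholder; substitute whatever lspci reports for your SATA controller):

# Read the negotiated PCIe link speed and width for a PCI device from sysfs.
from pathlib import Path

def pcie_link(pci_addr):
    dev = Path("/sys/bus/pci/devices") / pci_addr
    speed = (dev / "current_link_speed").read_text().strip()
    width = (dev / "current_link_width").read_text().strip()
    return f"{speed}, x{width}"

# Placeholder address - use the one lspci shows for your controller.
print(pcie_link("0000:01:00.0"))   # e.g. "5 GT/s, x1" for an ASM1061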

Some routes to getting one made:

. Select a controller chip to use (e.g. gain confidence in the ASM1061, or just use the lower-performance but tried-and-trusted Sil3132)

Then either:

. Someone comes up with an open hardware design and we get some hardware manufacturers to produce it
. We get enough people together to commit to a minimum order quantity, and get an existing hardware manufacturer to design and produce it
. A Kickstarter-type project to do the previous option

I found a few companies with relevant experience:

Amfeltec.com - produce PCIe switch based hardware
ioi.com.tw - produce PCIe switch based hardware, and also both Sil3124 and ASM1061 based SATA cards already
sintech.cn - produce PCIe switch based hardware, and also both Marvell and ASM1061 based SATA cards already

... there are probably a few others ...

Any big Storage Pod 4.0 users (anyone from Backblaze listening ? :o) ) fancy getting a second source of SATA HBAs? 

Tim Small

Jul 29, 2014, 6:38:42 AM
to opensto...@googlegroups.com
On 29/07/14 11:27, Tim Small wrote:
On Monday, 28 July 2014 21:14:48 UTC+1, Tim Lossen wrote:
there is also a commercial version from highpoint:

http://www.highpoint-tech.com/USA_new/cs-series_DC7280.htm

This is based around a single Marvell SATA controller, with a load of Marvell SATA port multipliers.

I meant to also say - since this DC7280 is a better fit for the Storage Pod 4.0 design in terms of SATA channels available, and is also a lot cheaper than the RR750, I'd guess that Backblaze evaluated it and rejected it...

Gotta be something to be said for using lots of simple AHCI controllers which are used on squillions of motherboards and third party adapters, instead of a weird expensive high-end controller which next-to-no-one uses...

Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.  
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309

Allan Pickett

Jan 23, 2015, 8:57:17 AM
to opensto...@googlegroups.com
Hi,

We are in the process of assessing a storage pod, focusing on low cost per GiB rather than performance.

MB - Asus P8C WS
HBA - 3x LSI 9201-16i (do not use the P20 firmware, acknowledged as bad by LSI/Avago; P19 is OK)
FC-AL target - QLE2642
HDD - WD40EFRX

Currently running Openfiler 2.99.2, but we hope to move to a more generic, CentOS-based solution.
We will probably also move to a server motherboard which supports some form of remote management, such as the Supermicro X10SRL-F.

The simple SATA HBA approach appeals as a more solid solution than SATA port multipliers.
We could get the LSI cards cheaper than the RocketRAID.
We have 8 systems using Openfiler with LSI 9211-8i HBAs and 15 HDDs each, and they run well.

Still evaluating, so not much more to report yet, but please ask any questions or offer critique or suggestions - all welcome.

Regards
Allan