10 PB rack

Rick Peralta

Oct 19, 2015, 4:14:39 AM
to OpenStoragePod
Given available technology, it is feasible to build a 10 PB rack, based on a modified 3U POD design, using 120 2.5" drives and powered from a domestic power outlet.
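One way the numbers work out, as a quick back-of-envelope (my assumptions, not a spec: 120 drives per 3U pod, a 42U rack filled with pods, 8 TB per drive):

    # Back-of-envelope rack capacity (assumed figures: 42U rack of 3U pods,
    # 120 2.5" drives per pod, 8 TB per drive - none of these are a spec).
    RACK_U, POD_U = 42, 3
    DRIVES_PER_POD, TB_PER_DRIVE = 120, 8

    pods = RACK_U // POD_U                  # 14 pods per rack
    drives = pods * DRIVES_PER_POD          # 1,680 drives
    print(drives * TB_PER_DRIVE / 1000)     # ~13.4 PB raw

which leaves headroom over 10 PB for redundancy and a reserved U or two.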

The media would be SSDs, which are almost cost competitive with rotating media now.  This convergence has been inevitable for many years and is expected to continue to accelerate, with SSDs ultimately being much less expensive per bit than rotating media.  The prospect of a kilowatt per rack should be appealing for the IT center folks, especially when the idle power will be a small fraction of active power.  A nice side effect of the SSD technology is long shelf life.  SSDs do tend to fatigue with use, but if used for archive and retrieval (write once, read many), they can outlive most other forms of media (e.g. rotating disk, DVD, CD, floppy, tape, et cetera).  The storage will also be fast and could provide terabits of storage bandwidth.

This is all based on proven technology that exists now and will be rolled out in the short term, with good prospects of more and better to come.  4 TB SSDs are here now and 8 TB drives are going into production.  Announcements of 16 TB have been made and 32 TB is in the lab, so we can expect to see them all before long.  The price trends show SSDs reaching cost parity with rotating media soon, and the truth is that at those prices they are more profitable than rotating media.  The core of the price performance is that the cost to produce silicon is largely based on die area, so increases in density do not really increase costs.  And power consumption tracks with usage, not density, so higher densities do not drive power up.

So, building a 10 PB rack that can provide lots of low latency, low power, high reliability, high bandwidth storage with largely COTS components at cost parity with existing technology is within reach.

jason andrade

Oct 19, 2015, 7:34:23 AM
to opensto...@googlegroups.com

G'day Rick,

No dispute with your premise and the indicative (hypothetical?) numbers but.. some thoughts:

- An SSD-based system aimed at low(er) power, longer life, and higher read throughput will solve one set of problems but still leaves others open that have to be addressed

- This is where the mechanical disk based system has worked as well as it has for so long - it combines some parts of 'fast', 'cheap', and 'good' for values of 'read' and 'write', and depending on how you lay it all out, you can balance somewhat equally between all those values (and especially the read and write bits)

I came up with a figure of around A$5 million to do this using the 1T SSDs. If I assume that you can get 4T SSDs for say only twice the price, you're looking at a bit over A$2.5m.
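The rough arithmetic behind those figures, with assumed A$ unit prices (placeholders, not quotes):

    # Sketch of the cost figures above (assumed A$ prices, not quotes).
    TARGET_TB = 10_000  # 10 PB raw

    def rack_cost(tb_per_drive, price_aud):
        drives = TARGET_TB // tb_per_drive
        return drives, drives * price_aud

    for tb, price in [(1, 500), (4, 1000)]:   # 4 TB assumed at ~2x the 1 TB price
        drives, cost = rack_cost(tb, price)
        print(tb, "TB:", drives, "drives, ~A$%.1fM" % (cost / 1e6))
    # 1 TB: 10,000 drives, ~A$5.0M; 4 TB: 2,500 drives, ~A$2.5M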

So not quite down to where disk is at the moment (e.g. 2015/2016 pricing) when you factor in all the other things - but yeah it seems to be headed in that direction.

Has anyone run (and published) numbers, though, on SSD reliability as well as cost under the same workload that spinning disk handles, to work out if it stands up? I know a bunch of places have done work on individual SSDs, but I don't think I've seen as much in the way of analysis on using them in clusters..

Now I need to go work out the actual DC costs of power/cooling and do some modelling on 10PB over 5 years compared to the same amount of rust..
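The skeleton of the model I have in mind looks something like this (every input below is a placeholder to be replaced with real DC numbers):

    # Skeleton 5-year TCO comparison, SSD vs spinning disk.
    # Every input below is a placeholder, not a measured figure.
    YEARS, KWH_PRICE, PUE = 5, 0.25, 1.6     # A$/kWh and cooling overhead, assumed
    HOURS = YEARS * 365 * 24
    DRIVES = 10_000                          # 10 PB of 1 TB devices, say

    def tco(capex_per_drive_aud, watts_per_drive):
        energy_kwh = DRIVES * watts_per_drive / 1000 * HOURS * PUE
        return DRIVES * capex_per_drive_aud + energy_kwh * KWH_PRICE

    print("SSD:  ~A$%.2fM" % (tco(500, 0.5) / 1e6))   # ~A$5.09M
    print("Disk: ~A$%.2fM" % (tco(100, 5.0) / 1e6))   # ~A$1.88M

At those placeholder prices the rust still wins on 5-year TCO, which is roughly the gap the price trend has to close.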

regards,

-jason
-----
M: +61 402 489 637 E: jason....@gmail.com

Rick Peralta

Oct 20, 2015, 3:12:52 PM
to OpenStoragePod
Hi Jason,

I agree that today the SSDs are not at cost parity with rotating media.

The premise of the proposal is that SSDs will come to price parity with rotating media, and that they may achieve total cost of ownership parity very soon - perhaps they already have.

The power issues are significant.  Backblaze can provide real data, but I would expect rotating media to average about 5 watts and SSDs one tenth of that.  Knocking a few kW off each rack helps with the total cost of ownership, and longevity should help with amortization.
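A minimal sketch of that saving, using the estimates above (the per-rack drive count and both wattages are assumptions, pending real data):

    # Per-rack power saving, assuming ~1,680 2.5" drives per rack and the
    # wattage estimates above (5 W rotating vs 0.5 W SSD - both assumptions).
    drives = 1680
    saving_kw = drives * (5.0 - 0.5) / 1000
    print("~%.1f kW saved per rack" % saving_kw)   # ~7.6 kW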

Current production FLASH provides a Tb of storage per device, so 8 devices make a terabyte.  64 devices can fit in the space of a 2.5" drive - so 8 TB is feasible with available production devices.  Those densities are expected to continue to increase, and the cost of making the silicon does not significantly increase for the newer processes.
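Spelled out (8 bits per byte; figures from the paragraph above):

    # Density arithmetic: 1 Tb FLASH packages, 8 Tb = 1 TB.
    packages = 64                      # fit in a 2.5" drive's volume
    tb_per_drive = packages * 1 / 8    # terabits -> terabytes
    print(tb_per_drive)                # 8 TB per 2.5" drive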

As a practical issue, the silicon costs about 50 cents per chip, so the cost of the silicon is relatively small compared to the cost of the finished device.

The true economics of SSDs are based not on the cost of the technology, but rather on its value.  The market should find balance over time.

The point of the post is to prepare.  The economics and performance need to be fully investigated.

MTBF of SSDs is generally in the millions of hours, where rotating media is maybe 15% of that.

FLASH silicon has data retention measured in decades.  Conversely, storage that is hammered on will generally fail in 5 - 10 years.  Many of the manufacturers provide 10 year warranties.

Backblaze has done a great job at collecting and sharing actuarial data on hard drives and given a nod to COTS hardware reliability.

Such data should be collected on the newer technologies - perhaps by populating a current POD with SSDs and seeing how it fares over time in regular use.

Cheers,

 - Rick

Ouroboros

Oct 20, 2015, 7:32:46 PM
to opensto...@googlegroups.com
The key driver of cost here is the number of devices within a single chassis. Most HBA cards will top out at 24 ports. I am not offhand aware of a cheap backplane for 2.5 inch devices like the SATA port multiplier backplanes currently used for the pods, which are repurposed backplanes from 5-bay external disk chassis for home use. That pushes you into buying many high port count HBA cards, with their attendant PCI Express lane requirements, and/or committing at some point to designing and manufacturing a custom backplane.

For a rough estimation, current pods have 3 bays of 15 3.5 inch drives, underlaid with 9 port multiplier backplanes in blocks of 5. Chassis from other server makers show 24 2.5 inch drives in a wide config. With a vertical orientation, this gives roughly 4 bays for the same rough dimensions, so 96 drives. If I remember correctly, high port count HBA chips are similar to port multiplier chips, so internally they have 32 "ports", typically with 24 reserved for downstream and 8 for upstream. So a custom 24 port backplane to cover one bay is feasible. However, many server makers use an expander backplane board on their servers such that 2.5 inch devices are managed in 8-device blocks. Those backplanes likely feature a cheaper chip with 16 or fewer internal "ports". If doing a custom backplane based on an 8-device reference expander backplane design, the cost is likely lower, since the number of backplanes goes up (mass production quantity cost reduction) and a reference design can be reused. That makes 12 backplanes for a 4 bay, 96 device chassis. Splitting the bays off onto individual HBA cards at the motherboard means 4 24-port HBA cards if direct connecting.
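To make the counting concrete (a sketch of the layout described above, not a real design):

    # Device/backplane/HBA counting for the layout sketched above.
    BAYS, DRIVES_PER_BAY = 4, 24
    drives = BAYS * DRIVES_PER_BAY          # 96 devices per chassis

    # Option A: one custom 24-port backplane per bay, direct to 24-port HBAs
    backplanes_24p, hbas = BAYS, BAYS       # 4 backplanes, 4 HBA cards

    # Option B: reference-design 8-device expander backplanes
    backplanes_8dev = drives // 8           # 12 backplanes
    print(drives, backplanes_24p, hbas, backplanes_8dev)   # 96 4 4 12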

One starts to wonder if the OpenCompute storage platform reference designs start to make some sense in that context.
