From the beginning it has been professed that CC needs mega data centers for economies of scale.
It certainly makes sense from an economies-of-scale perspective, but are physical mega data centers the only way to achieve those economies of scale?
Why not make mini-clouds from reasonably sized data centers (say 10,000+ as opposed to 100,000+) and connect them up in such a way that they form a virtual mega data center?
If the mini-data centers run out of resource elasticity, they seamlessly borrow / migrate resources from other nearby mini data centers.
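To make the borrowing idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the `MiniDC` record, the site names, the spare-capacity and round-trip figures, and the simple greedy "nearest first" policy. It is not how any real provider does this, just one way the idea could look.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MiniDC:
    """A neighboring mini data center (hypothetical record for illustration)."""
    name: str
    free_servers: int  # spare capacity this site could lend
    rtt_ms: float      # round-trip time from the requesting site (made-up figure)


def borrow(needed: int, neighbors: List[MiniDC]) -> List[Tuple[str, int]]:
    """Greedily borrow servers from the lowest-latency neighbors first."""
    plan = []
    for dc in sorted(neighbors, key=lambda d: d.rtt_ms):
        if needed <= 0:
            break
        take = min(needed, dc.free_servers)
        if take > 0:
            plan.append((dc.name, take))
            needed -= take
    if needed > 0:
        # The nearby sites together could not cover the shortfall.
        raise RuntimeError(f"short by {needed} servers")
    return plan


# Hypothetical neighbors of a mini data center that has run out of capacity.
neighbors = [
    MiniDC("dc-b", free_servers=300, rtt_ms=12.0),
    MiniDC("dc-c", free_servers=500, rtt_ms=4.0),
    MiniDC("dc-d", free_servers=1000, rtt_ms=30.0),
]
print(borrow(700, neighbors))
```

Sorting by round-trip time is only a stand-in for "nearby"; a real scheduler would also have to weigh the WAN bandwidth between the sites.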
Of course, one major issue here is: can WAN bandwidth support this?
That is why I asked in an earlier post some time back: why spend $75 - $100 billion on healthcare? Why not Internet2?
By increasing the WAN/Internet2 bandwidth, you not only help the CC industry but also the internet/web industry.
Now, which has better ROI?...investing in Healthcare or the Internet2?....
Dean’s talk is very good. My only quibble with it is his citation of disk failures on pages 9 & 10. Instead of 1-5% disk AFR, it should be 0.01-0.05% disk AFR, and on page 10, O(1000s) should be O(10s) in a year, using best practices and storage elements.
The reason disk AFR is so important is this: disk failures are permanent, not ephemeral. CPU failures (servers) are one thing, since they hold very little data; disk failures, OTOH, are critical to avoid. If a design assumes high disk AFR, it’s forced into replicating data at least 2x (if not 3x or more, as in some designs today) just to overcome it. Plus, we know that over half of reported disk failures, upon close examination (e.g. running drive-level diagnostics), are false positives, which makes the problem much worse. That is non-optimal in both technical and economic terms.
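For a feel of the arithmetic behind the two sets of AFR figures, here is a small Python sketch. The fleet size of 100,000 disks and the 500 TB of usable data are assumptions picked only to make the numbers easy to read; they are not taken from the slides.

```python
def expected_failures(disks: int, afr: float) -> float:
    """Expected disk failures in one year, with AFR given as a fraction."""
    return disks * afr


def raw_capacity_tb(usable_tb: float, replicas: int) -> float:
    """Raw capacity needed to hold usable_tb when every block is stored `replicas` times."""
    return usable_tb * replicas


FLEET = 100_000  # hypothetical number of disks, not from the talk

for label, afr in [
    ("slides, low end", 0.01),
    ("slides, high end", 0.05),
    ("proposed, low end", 0.0001),
    ("proposed, high end", 0.0005),
]:
    failures = expected_failures(FLEET, afr)
    print(f"{label}: AFR {afr:.2%} -> about {failures:.0f} failures per year")

# Cost of replicating purely to ride out disk failures (500 TB usable is made up).
for replicas in (1, 2, 3):
    print(f"{replicas}x replication: {raw_capacity_tb(500, replicas):.0f} TB raw")
```

Under that assumed fleet size, the two AFR ranges differ by a factor of 100 in yearly failures, which is the gap between O(1000s) and O(10s).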
Rob
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.698 / Virus Database: 270.14.53/2487 - Release Date: 11/08/09 01:37:00
Bob,
Yes, I agree that latency is definitely a problem. But in life we all make compromises, especially when it comes to economics. When we cannot buy a mansion, we settle for a large house. Our expectations are adjusted to reality. We do not expect to drive cars that go 200+ mph or fly planes that go 1000+ mph (at this time). So, considering that hundreds if not thousands of hosting providers across the country and the world already run mini-cloud-size facilities, is it not a lot more economical (especially after what was almost a great depression) to make virtual mega data centers out of them than to build these huge mega data centers? Yes, you certainly need to “get better at sharding workloads and storage,” as you say. Also, thanks for the great slides.
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Bob Sutterfield
Sent: Sunday, November 08, 2009 10:12 AM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Are Mega Data Centers Necessary?...
It's not just the WAN bandwidth (which isn't itself a trivial problem), it's the latency between the workload and its storage. You can light the fiber as fat as you like with DWDM or whatever, but physics still controls the round-trip travel time. Different workloads are differently suited to different distances from their storage.
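The physics point can be put into rough numbers. The Python sketch below computes the best-case round trip over fiber for a given distance. It assumes a refractive index of about 1.47 for the fiber and ignores routing detours, switching, and queuing, so real round trips will be longer; the distances are arbitrary examples.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_INDEX = 1.47           # assumed typical refractive index of optical fiber


def min_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time over fiber, in milliseconds (propagation only)."""
    one_way_s = distance_km / (C_VACUUM_KM_S / FIBER_INDEX)
    return 2 * one_way_s * 1000


def max_sequential_round_trips_per_s(distance_km: float) -> float:
    """Upper bound on strictly sequential request/response pairs per second."""
    return 1000 / min_rtt_ms(distance_km)


# Arbitrary example distances between a workload and its storage.
for km in (10, 100, 1000, 4000):
    print(f"{km:>5} km: RTT >= {min_rtt_ms(km):.2f} ms, "
          f"at most {max_sequential_round_trips_per_s(km):.0f} sequential round trips/s")
```

More bandwidth changes none of these figures, which is why a workload that makes many dependent round trips to its storage tolerates distance much worse than one that streams data in bulk.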
@Bob
Can’t disagree with your last paragraph. But, meeting that goal given massive increases (non-linear) in data using high disk AFR is increasingly difficult as more FTE labor has to be applied just to keep the disk farm running. Disks may be cheap, but humans are expensive.
Plus, if the true goal is overall system availability, at some point the concept of reliable s/w on top of unreliable h/w falls apart, unless you have an endless supply of cheap humans for break/fix. What is really needed is reliable s/w on top of reliable h/w; then there are far fewer scaling issues. If the disk h/w is autonomic, you have a shot at scalability.
Of course, all this is predicated on the assumption that optimizing for low FTE cost is a goal. It may very well not be in Google’s case.
Rob
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Bob Sutterfield
Sent: Sunday, November 08, 2009 3:59 PM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Are Mega Data Centers Necessary?...
Robert Peglar wrote:
meeting that goal [overall availability] given massive increases (non-linear) in data using high disk AFR is increasingly difficult as more FTE labor has to be applied just to keep the disk farm running. Disks may be cheap, but humans are expensive... all this is predicated on the assumption that optimizing for low FTE cost is a goal. It may very well not be in Google’s case.
“Another reason why you may want multiple data centers is the concept of "computing at the edge", which means to bring computing as close to the customer as possible. While this is theoretically possible, I personally think there are a few technologies that need to be baked into the applications on "DAY 1".”
There are a lot of hosting providers who are already computing at the edge, close to their customers. This infrastructure has been put in place over the last 15 years (ever since the internet/web became ubiquitous). These facilities and their proximity to customers should alleviate the latency problem. They have also worked out the BW problems over the last 15 years. But I do agree with Bob that if you go this route, the inter-miniCloud/datacenter latency would be a problem, and consequently you need to do workload/processing proximity optimization. In addition, this distributed architecture spreads out the risk and the BW congestion associated with mega data centers. Even with mega data centers we will come across BW and latency issues, because in order to run efficiently they will have to aggregate hundreds of clients from across the country and across continents. This in itself will cause both BW and latency issues.
Another issue also comes to mind: considering that the CC industry is headed down the private cloud route, at least for the next 3 to 5 years, the private clouds are distributed across the country. So in the next phase, the hybrid clouds will supply the extra capacity for these private clouds, and they will possibly also be closer to their customers, the private clouds. So economics is driving the architecture/topology of the clouds to be distributed rather than consolidated, monolithic mega data centers. It will be interesting to see what happens in the next 5 years.
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Alan Ho
Sent: Sunday, November 08, 2009 11:19 PM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Are Mega Data Centers Necessary?...
Another reason why you may want multiple data centers is the concept of "computing at the edge", which means to bring computing as close to the customer as possible. While this is theoretically possible, I personally think there are a few technologies that need to be baked into the applications on "DAY 1".