From the beginning it has been professed that CC needs mega data centers for economies-of-scale reasons.
It certainly makes sense from an economies-of-scale perspective, but are physical mega data centers the only way to achieve those economies of scale?
Why not make mini-clouds from reasonably sized data centers (say 10,000+ as opposed to 100,000+) and connect them up in such a way that they form a virtual mega data center?
If the mini data centers run out of resource elasticity, they seamlessly borrow / migrate resources from other nearby mini data centers.
Of course, one major issue here is whether WAN bandwidth can support this.
That is why I asked in an earlier post some time back: why spend $75 - $100 billion on healthcare? Why not Internet2?
By increasing the WAN/Internet2 bandwidth, you help not only the CC industry but also the internet/web industry.
Now, which has the better ROI: investing in healthcare or in Internet2?
-- Ray DePeña
Director, Stealth Startups
Strategic Business Advisor
> If no one has called it yet, I call dibs on CAN (Cloud Area Network) ;-)
> On Sat, Nov 7, 2009 at 1:32 PM, Rao Dronamraju <rao.dronamr...@sbcglobal.net
> > wrote:
It's not just the WAN bandwidth (which isn't itself a trivial problem), it's the latency between the workload and its storage. You can light the fiber as fat as you like with DWDM or whatever, but physics still controls the round-trip travel time. Different workloads are differently suited to different distances from their storage.
For many workloads, it's important to be very near their storage. This leads even to re-architecting the datacenter network from a routed hierarchy to a flat switched mesh, just to reduce the number of switching and routing networking devices in the path. In the public cloud (e.g. AWS) there's a strong performance incentive to use ephemeral instance storage (e.g. EC2) for primary processing, rather than pay the latency penalty for access to permanent storage services (e.g. S3).
So yes, mega data centers are necessary for mega workloads and mega storage, until we either solve the speed of light problem or get better at sharding workloads and storage.
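As a rough illustration of the "physics still controls the round-trip travel time" point, here is a minimal sketch of the lower bound that fiber distance puts on latency. It assumes light in fiber travels at about two thirds of its vacuum speed (refractive index around 1.47) and ignores switching, queuing, and protocol overhead, so real round trips will be longer. The function name and the sample distances are illustrative only.

```python
# Assumption: typical optical fiber has a refractive index of about 1.47,
# so signals propagate at roughly two thirds of the vacuum speed of light.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX
    return 2 * distance_km / speed_in_fiber * 1000


# Hypothetical distances between a workload and its storage.
for label, km in [("same building", 0.1), ("metro area", 50), ("cross-country", 4000)]:
    print(f"{label:>14}: {min_round_trip_ms(km):8.3f} ms minimum round trip")
```

No amount of extra bandwidth changes these numbers; only moving the workload closer to its storage does.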
Dean's talk is very good. Only quibble I have with it is his citation
of disk failures - pages 9 & 10. Instead of 1-5% disk AFR, it should be
0.01-0.05% disk AFR, and on page 10, O(1000s) should be O(10s) in a
year, using best practices and storage elements.
The reason that disk AFR is very important is this: disks are permanent
storage, not ephemeral. Server (CPU) failures are one thing, since servers
hold very little data; disk failures, OTOH, are critical to avoid. If a
design assumes high disk AFR, it's forced into replicating data at least
2x (if not 3x or more, present in some designs today) just to overcome
it. Plus, we know that over half of reported disk failures, upon
close examination (e.g. running drive-level diagnostics), are false
positives, which makes the problem much worse. That is non-optimal both
in technical and economic terms.
Rob
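To make the O(1000s) versus O(10s) comparison concrete, the arithmetic can be sketched as below. The fleet size of 100,000 disks is an assumption chosen only for illustration (it is not a figure from the talk or from this thread); the AFR values are the two ranges quoted above.

```python
def expected_failures_per_year(disk_count: int, afr: float) -> float:
    """Expected number of disk failures in one year for a given AFR (as a fraction)."""
    return disk_count * afr


FLEET_SIZE = 100_000  # hypothetical fleet size, assumed for illustration

# AFR ranges discussed above: 1-5% versus 0.01-0.05%.
for afr in (0.01, 0.05, 0.0001, 0.0005):
    failures = expected_failures_per_year(FLEET_SIZE, afr)
    print(f"AFR {afr:.2%}: about {failures:,.0f} failures per year")
```

Under that assumed fleet size, the higher range gives failures in the thousands per year and the lower range gives failures in the tens.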
From: cloud-computing@googlegroups.com
[mailto:cloud-computing@googlegroups.com] On Behalf Of Bob Sutterfield
Sent: Sunday, November 08, 2009 10:12 AM
To: cloud-computing@googlegroups.com
Subject: [ Cloud Computing ] Re: Are Mega Data Centers Necessary?...
> It's not just the WAN bandwidth (which isn't itself a trivial problem), it's the latency between the workload and its storage. [...]
Yes, I agree that latency is definitely a problem. But in life we all make
compromises, especially when it comes to economics. When we cannot buy a
mansion we settle for a large house. Our expectations are adjusted to
reality. We do not expect to drive cars that go 200+ mph or fly planes
that go 1000+ mph (at this time). So considering that there are
hundreds if not thousands of hosting providers across the country and the
world in existence today with mini-cloud size facilities, is it not a lot more
economical (especially after what was almost a great depression) to make
virtual mega data centers out of them than to build these huge mega data
centers? Yes, you certainly need to "get better at sharding workloads and
storage," as you say. Also, thanks for the great slides.
Robert Peglar wrote:
> The reason that disk AFR is very important is this; they are permanent,
> not ephemeral. CPU failures (servers) are one thing, since they hold very
> little data; disk failures, OTOH, are critical to avoid.
In Google's designs, all hardware at every level of scope and scale is assumed to be ephemeral. This runs from memory chip bit error rates to data center power grid availability.
> If a design assumes high disk AFR, it's forced into replicating data at
> least 2x (if not 3x or more, present in some designs today) just to overcome
> same. Plus, we know that over half of reported disk failures, upon close
> examination (e.g. running drive-level diagnostics) are false positive, which
> makes the problem much worse. That is non-optimal both in technical and
> economic terms.
If the highest goal of the design is overall system availability and the second highest goal is low latency to each request, it makes sense to pay more in replication multipliers and in maintenance costs (diagnostic and inventory and time to return-to-service). Also, by now Google doesn't assume disk (or any other) failure rates, they measure their fleet's historical experience and project future expectations. So their architecture and software designs and their staffing levels are well informed by data.
--
Bob Sutterfield
b...@sutterfields.us
http://www.linkedin.com/in/BobSutterfield
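The idea of measuring a fleet's failure rate and projecting from it, rather than assuming a rate, can be sketched roughly as follows. This is only an illustration of the general approach, not a description of Google's actual tooling; the class, the function names, and all sample numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FleetObservation:
    """Hypothetical record of what was observed over a measurement period."""
    drive_days: float  # total days in service, summed over all drives
    failures: int      # confirmed drive failures in the same period


def annualized_failure_rate(obs: FleetObservation) -> float:
    """AFR as a fraction: failures per drive-year of observed service."""
    drive_years = obs.drive_days / 365.0
    return obs.failures / drive_years


def projected_failures(afr: float, fleet_size: int, years: float = 1.0) -> float:
    """Expected failures for a planned fleet, assuming the measured AFR holds."""
    return afr * fleet_size * years


# Invented sample data: 10,000 drive-years of service with 200 failures.
history = FleetObservation(drive_days=3_650_000, failures=200)
afr = annualized_failure_rate(history)
print(f"measured AFR: {afr:.2%}")
print(f"projected failures for 50,000 drives: {projected_failures(afr, 50_000):,.0f}")
```

A projection like this is what would feed spare inventory and staffing decisions; it is only as good as the assumption that future drives behave like the measured ones.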
Another reason why you may want multiple data centers is the concept of
"computing at the edge", which means bringing computing as close to the
customer as possible. While this is theoretically possible, I personally
think there are a few technologies that need to be baked into the
applications on "DAY 1".
1. Cost-based routing / cost-based allocation - without cost-based routing,
requests end up going to the wrong datacenter, hence you lose the benefit of
distributed datacenters.
2. Intelligent replication - Assuming that it is impossible to have high
bandwidth between each data center, high-read applications (e.g. Amazon)
need to have the ability to read from a local datacenter, and have the data
replicated to other datacenters as necessary.
3. Multi-master systems - Assuming that it is impossible to have high
bandwidth between each data center, high-write applications (e.g. gmail)
need to have the ability to write to the local datacenter, and have the
data replicated to other datacenters as necessary. The traditional database
does not support this!
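As a minimal sketch of what the cost-based routing in item 1 could look like, the example below picks a datacenter by combining client latency with a load penalty. The cost function, its weight, and the site names are all assumptions made up for illustration; a real system would use whatever cost signals it actually measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datacenter:
    name: str
    latency_ms: float   # measured latency from the client to this site
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def routing_cost(dc: Datacenter, load_weight: float = 100.0) -> float:
    """Hypothetical cost: latency plus a penalty that grows sharply near saturation."""
    if dc.utilization >= 1.0:
        return float("inf")  # a saturated site is never chosen
    return dc.latency_ms + load_weight * dc.utilization / (1.0 - dc.utilization)


def pick_datacenter(candidates: list[Datacenter]) -> Datacenter:
    """Route the request to the candidate with the lowest cost."""
    return min(candidates, key=routing_cost)


sites = [
    Datacenter("near-busy", latency_ms=10, utilization=0.95),
    Datacenter("near-idle", latency_ms=15, utilization=0.20),
    Datacenter("far-idle", latency_ms=120, utilization=0.10),
]
print(pick_datacenter(sites).name)
```

The closest site is not chosen here because it is nearly saturated, which is the kind of "wrong datacenter" decision that plain nearest-site routing would make.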
My opinion is that high-bandwidth interconnects between data centers are not
a tractable problem for public clouds. It's much better if the data locality
issue is tackled upfront instead. Now latency between data centers is an
interesting problem - it might make sense to ask "what type of latency" is
important to you. For many applications, it is the TP99 (99th percentile of
latency) that matters for each request. In those cases, the latency is
typically dominated by bandwidth between data centers. That's why if you
want good latency, it makes sense to build datacenter locality into your
application, but the more datacenters you have, the harder it is to keep
the data local.
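For readers unfamiliar with TP99, here is a small sketch of how it differs from the typical case. It uses the nearest-rank percentile definition (one of several common definitions), and the latency samples are invented: most requests are served locally and a few have to cross to another datacenter.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented request latencies in milliseconds: 98 local hits, 2 cross-datacenter fetches.
latencies_ms = [10.0] * 98 + [500.0, 900.0]

print("median:", percentile(latencies_ms, 50), "ms")
print("TP99:  ", percentile(latencies_ms, 99), "ms")
```

The median looks fine while the TP99 is set entirely by the small share of requests whose data was not local, which is why locality matters so much for tail latency.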
In my opinion, it is far too early for the average developer to tackle these
issues, so it makes sense to go with mega data centers for each "region"
(Europe / NA / Asia / etc). 3 mega data centers per region is probably the
sweet spot for application builders because it balances the need for
redundancy and the ease of developing applications with datacenter locality.
Can't disagree with your last paragraph. But, meeting that goal given
massive increases (non-linear) in data using high disk AFR is
increasingly difficult as more FTE labor has to be applied just to keep
the disk farm running. Disks may be cheap, but humans are expensive.
Plus, if the true goal is overall system availability, at some point the
concept of reliable s/w on top of unreliable h/w falls apart, unless you
have an endless supply of cheap humans for break/fix. What is really
needed is reliable s/w on top of reliable h/w; then there are far fewer
scaling issues. If the disk h/w is autonomic, you have a shot at
scalability.
Of course, all this is predicated on the assumption that optimizing for
low FTE cost is a goal. It may very well not be in Google's case.
Rob
From: cloud-computing@googlegroups.com
[mailto:cloud-computing@googlegroups.com] On Behalf Of Bob Sutterfield
Sent: Sunday, November 08, 2009 3:59 PM
To: cloud-computing@googlegroups.com
Subject: [ Cloud Computing ] Re: Are Mega Data Centers Necessary?...
> In Google's designs, all hardware at every level of scope and scale is assumed to be ephemeral. [...]
---
Robert Peglar Vice President, Technology, Storage Systems Group
Email: mailto:Robert_Peg...@xiotech.com
Office: 952 983 2287
Mobile:314 308 6983
Fax: 636 532 0828
Xiotech Corporation
1606 Highland Valley Circle
Wildwood, MO 63005 http://www.xiotech.com/ : Toll-Free 866 472 6764
Robert Peglar wrote:
> meeting that goal [overall availability] given massive increases
> (non-linear) in data using high disk AFR is increasingly difficult as more
> FTE labor has to be applied just to keep the disk farm running. Disks may
> be cheap, but humans are expensive... all this is predicated on the
> assumption that optimizing for low FTE cost is a goal. It may very well not
> be in Google's case.
One way to reduce human costs is to staff for average needs, not peak, which means leveling the demand. That means building resilient systems that can survive a while, meeting SLAs, even with broken components. So a spindle that packs in at 2:00am isn't an emergency requiring immediate attention. It can be left in place until the day shift arrives, and replaced as part of their (optimally sorted) batch repair process.
--
Bob Sutterfield
b...@sutterfields.us
http://www.linkedin.com/in/BobSutterfield
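A back-of-the-envelope sketch of why a 2:00am disk failure can wait for the day shift: with replicated data, loss requires every remaining replica to fail before the repair happens. The model below assumes independent failures at a constant rate derived from the AFR, which is a simplification (correlated failures are ignored), and the AFR, window lengths, and replica count are illustrative values only.

```python
import math

HOURS_PER_YEAR = 8760


def failure_probability_in_window(afr: float, window_hours: float) -> float:
    """Chance that one disk fails within the window, assuming a constant failure rate."""
    hazard_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-hazard_per_hour * window_hours)


def loss_probability_during_repair(afr: float, window_hours: float,
                                   remaining_replicas: int) -> float:
    """Chance that all remaining replicas fail before the broken disk is replaced.

    Assumes failures are independent, which real fleets only approximate.
    """
    return failure_probability_in_window(afr, window_hours) ** remaining_replicas


# Illustrative: 5% AFR, 3x replication (2 replicas left after one failure).
for hours in (1, 8, 72):
    p = loss_probability_during_repair(0.05, hours, remaining_replicas=2)
    print(f"repair delayed {hours:>2} h: loss probability about {p:.2e}")
```

Under these assumptions, stretching the repair window from an hour to a working shift raises the risk but leaves it very small, which is the trade being made when staffing for average rather than peak demand.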
"Another reason why you may want multiple data centers is the concept of
"computing at the edge", which means to bring computing as close to the
customer as possible. While this is theoretically possible, I personally
think there are a few technologies that need to be baked into the
applications on "DAY 1"."
There are a lot of hosting providers who are already computing close to the
edge, near their customers. This infrastructure has been put in place over
the last 15 years (ever since the internet/web became ubiquitous). These
facilities and their proximity to customers should alleviate the latency
problem. They also have worked out the BW problems over the last 15 years.
But I do agree with Bob that if you go this route, the
inter-miniCloud/datacenter latency would be a problem and consequently you
need to do workload/processing proximity optimization. In addition, this
distributed architecture distributes the risk and BW congestion associated
with mega data centers. Even with mega data centers we will come across BW
and latency issues, because in order to run efficiently, they will have to
aggregate hundreds of clients from across the country and continents. This
in itself will cause both BW and latency issues.
Another issue also comes to mind: the CC industry is headed down the private
cloud route, at least for the next 3 to 5 years, and the private clouds
are distributed across the country. So in the next phase, the hybrid clouds
will augment the capacity of these private clouds and they will also
possibly be closer to their customers, the private clouds. So economics is
driving the architecture/topology of the clouds to be distributed rather
than consolidated into monolithic mega data centers. It will be interesting
to see what happens in the next 5 years.
_____
From: cloud-computing@googlegroups.com
[mailto:cloud-computing@googlegroups.com] On Behalf Of Alan Ho
Sent: Sunday, November 08, 2009 11:19 PM
To: cloud-computing@googlegroups.com
Subject: [ Cloud Computing ] Re: Are Mega Data Centers Necessary?...
> Another reason why you may want multiple data centers is the concept of "computing at the edge", which means to bring computing as close to the customer as possible. [...]
Regards,
Alan Ho
Two comments:
- SSDs offer an improvement of almost two orders of magnitude in random read performance (most relevant to a search-engine type of application). And despite the flattening of the Zipfian distribution (which appears to be stabilizing now?), a quantitative analysis of real web search workloads (well, not Google's; there aren't public traces for those that I am aware of) shows significant amenability to dynamic tiering across hybrid SSD & HDD storage.
I.e. the close synergy between multi-replication for performance and multi-replication for availability which served Google so well in a 2000-2003 architecture needs to be revisited IMHO. It was quite appropriate (perhaps) then with HDD being your only medium, but there are more tools to be brought to bear in 2009-2012. (We don't publicly know what GOOG is doing at the moment, probably not standing still is my guess).
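To give a feel for why a Zipf-like access pattern is amenable to SSD/HDD tiering, the sketch below computes what share of reads would land on the SSD tier if it held only the most popular items. It assumes an idealized Zipf popularity curve and a perfectly chosen hot set; the catalog size, tier size, and exponents are arbitrary illustrative values, not measurements from any real search workload.

```python
def zipf_hit_rate(catalog_size: int, hot_items: int, exponent: float = 1.0) -> float:
    """Share of reads served by a tier holding the `hot_items` most popular items.

    Assumes item popularity follows an ideal Zipf law with the given exponent.
    """
    if not 0 <= hot_items <= catalog_size:
        raise ValueError("hot_items must be between 0 and catalog_size")
    weights = [1.0 / rank ** exponent for rank in range(1, catalog_size + 1)]
    return sum(weights[:hot_items]) / sum(weights)


# Illustrative: SSD tier holds the hottest 10% of 10,000 items.
for s in (1.0, 0.8):  # a smaller exponent means a flatter distribution
    share = zipf_hit_rate(10_000, 1_000, exponent=s)
    print(f"exponent {s}: SSD tier serves {share:.0%} of reads")
```

A flatter distribution lowers the share served from SSD, but in this idealized model a small hot tier still absorbs a disproportionate fraction of the random reads.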
- Not frequently mentioned is that one weakness of the "nine fives" architecture of hordes of modest performance servers is power consumption. In some ways Google type architecture is trading off a lot of capex for a sub-optimal opex.
It is hard for me to believe that the optimal solutions are exclusively at either end point (i.e. the most vanilla servers on one side and only the best EMC/NetApp/HP/IBM gear that money can buy on the other) and not somewhere in between.
This aligns with my first comment. In some ways, it is worse to keep tossing thousands of servers at a problem (because it burns up a lot of power and that is actually more precious to us all than money) and replicating like crazy than to spend lots of (and certainly even a bit of) money on doing something less brute force.
Jay