5 Key Lessons for Customers of the Cloud

CloudSigma

Apr 23, 2011, 1:09:10 PM

to Cloud Computing

Following the major outage that AWS suffered at its US east coast
facility this week, what lessons can customers actually learn once
the dust settles?

Here are the five key lessons we've highlighted to customers:

Lesson 1: Both Cloud and Dedicated Computing Have Single Points of
Failure

Lesson 2: Size is No Protection from Outages without Redundancy

Lesson 3: All Data Centres Are Not Equal

Lesson 4: The Price-Performance-Reliability Metric

Lesson 5: Achieving a highly robust set-up is cheaper and easier in
the Cloud

Customers need openness from vendors about their infrastructure
choices and locations in order to create price-performance-reliability
comparisons between clouds. This is a key development needed if people
are to make the right decisions and create the appropriate strategies
in line with their computing needs in the cloud.

Best wishes,

Robert
CTO
CloudSigma
The full text of our blog post on this subject can be found at
http://www.cloudsigma.com/en/blog/2011/04/23/21-cloud-outages-lessons-learned

Khazret Sapenov

Apr 23, 2011, 2:46:52 PM

to cloud-c...@googlegroups.com

On Sat, Apr 23, 2011 at 1:09 PM, CloudSigma <rob...@cloudsigma.com> wrote:

> Customers need openness from vendors about their infrastructure
> choices and locations in order to create price-performance-reliability
> comparisons between clouds. This is a key development needed if people
> are to make the right decisions and create the appropriate strategies
> in line with their computing needs in the cloud.

Customers already have a wide choice of locations within Amazon EC2.

If you looked at their availability status, only one location of many was affected.

So if one has to engineer one's apps for redundancy and resilience, the first quick and easy way is to implement such DR functionality within one cloud (be it AWS or another provider, if they have multiple locations at all), where you have uniform interfaces and formats, so that re-engineering efforts do not sacrifice the reliability and interoperability of the final solution.


--
~~~~~
3rd Annual Cloud Slam 2011 Conference * April 18-22, 2011 * Mountain View, CA * http://cloudslam.org
UP 2010 Conference: http://www.up-con.com
Posting guidelines: http://bit.ly/bL3u3v
Follow us on Twitter @cloudcomp_group
Post Job/Resume at http://cloudjobs.net
Get hundreds of conference sessions and panels on cloud computing on DVD at
http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B004L1755W, http://www.amazon.com/gp/product/B002H0IW1U or GET instant access to downloadable versions at
- http://www.up-con.com/register
- http://cloudslam09.com/content/registration-5.html
- http://cloudslam10.com/content/registration

~~~~~
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
To post to this group, send email to cloud-c...@googlegroups.com
To unsubscribe from this group, send email to cloud-computi...@googlegroups.com

Ray

Apr 23, 2011, 2:57:55 PM

to cloud-c...@googlegroups.com

You forgot the single most important lesson: design your applications for
mechanical failure.

No IT service based on the mechanically bound unit of a datacenter can
overcome the eventual failure of the mechanics. Power will be lost, networks
interrupted, equipment will fail, cloud providers will go out of business.
Building apps that are datacenter bound is a GUARANTEE that you will have
downtime at some point in the app's lifecycle. If it's an HR app it's
probably no big deal. If it's an ecommerce app generating a million dollars
a minute in revenue, it will be a very big deal.

The lesson learned by the AWS victims and onlookers alike this week should
be that applications designed from the ground up to survive mechanical
failure will have less downtime than their hardware-bound cousins. Netflix
is the most visible of the applications built like this and, in fact, its
most recent outage was not cloud related but, rather, a failure in one of its
few remaining datacenters. Netflix survived this week's AWS outage because it
redesigned its application to remove its hardware dependency. To take this
fault-tolerant strategy a step further, Netflix runs a continual set of
tests to ensure its capability to deal with infrastructure failure.

If you are going to build applications in the cloud, at a minimum, you need
to have a cloud provider with more than one datacenter in more than one
geography. Otherwise you might as well accept that, sometime in the future,
you will have a significant outage.
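
To make the "design for failure" point concrete, here is a minimal sketch of the kind of check a continual failure test might run: take each zone out of a simulated fleet in turn and confirm the remaining capacity still meets demand. This is not Netflix's actual tooling, which the thread does not describe; the zone names, instance counts and function names are all hypothetical.

```python
import random

def surviving_capacity(instances_by_zone, failed_zone):
    """Total instances left when one whole zone is lost."""
    return sum(n for zone, n in instances_by_zone.items() if zone != failed_zone)

def survives_any_single_zone_failure(instances_by_zone, required):
    """True if losing any one zone still leaves at least `required` instances."""
    return all(
        surviving_capacity(instances_by_zone, zone) >= required
        for zone in instances_by_zone
    )

def inject_random_zone_failure(instances_by_zone, rng):
    """Pick one zone at random and return (zone, remaining capacity)."""
    zone = rng.choice(sorted(instances_by_zone))
    return zone, surviving_capacity(instances_by_zone, zone)

# Hypothetical fleets: one datacenter-bound, one spread over three zones.
single_site = {"zone-a": 12}
three_zones = {"zone-a": 4, "zone-b": 4, "zone-c": 4}
```

With a requirement of 8 instances, the single-site fleet fails the check (losing its only zone leaves nothing), while the three-zone fleet passes because any two zones still provide 8.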

Best wishes,

Miha Ahronovitz

Apr 23, 2011, 2:58:52 PM

to Cloud Computing

Khazret, to give credibility to what you say, see http://status.aws.amazon.com/
California EC2 is up, N. Virginia is down.
But why should it be the customers' worry to move things
around?
AWS should have mechanisms to move automatically to other facilities.
It doesn't.

Miha


Khazret Sapenov

Apr 23, 2011, 3:06:58 PM

to cloud-c...@googlegroups.com

On Sat, Apr 23, 2011 at 2:58 PM, Miha Ahronovitz <myinne...@gmail.com> wrote:

> Khazret, to give credibility to what you say, see http://status.aws.amazon.com/
> California EC is up, N. Virginia is down.

Thanks, Miha, for enhancing the credibility of my post, but I already posted this link yesterday and it's referenced everywhere.

> But why it should be the worry of the customers to move things
> around?
> AWS should have mechanisms to move automatically to other facilities.
> It doesn't

Your statement contradicts your employer's notion of IaaS providing only the bare minimum, with all 'non-relevant' functionality outsourced to external parties.

Now you want your competition to take an extra step that sounds logical but is not yet implemented by many IaaS providers.

Nice.


Ray

Apr 23, 2011, 3:10:01 PM

to cloud-c...@googlegroups.com

The other key is to avoid using cloud-provider-specific features like RDS. This is killing some AWS customers, as RDS seems to be having an especially hard time recovering and no other cloud provider offers an RDS equivalent to fail over to.

Miha Ahronovitz

Apr 23, 2011, 4:04:50 PM

to cloud-c...@googlegroups.com

The opinions I express are mine. AWS is both a competitor and not a competitor, as is every company in the cloud space. AWS is a player, albeit the largest one, in what we preach on this forum: cloud computing.

They started it all, and their failure can compromise the future adoption of cloud computing in the narrow minds that unfortunately abound. This makes us all friends of AWS. However, their SLAs, discussed here in depth, are not consumer friendly.

Sure, I think Amazon should tolerate third-party cloud disaster recovery of some sort. These third parties may use tools to migrate to safer locations of the provider (like from AWS East to West) or may have multiple providers under a single management.

But knowing how Amazon does things, they will probably develop another service in-house, let's name it ADR (Amazon Disaster Recovery), and charge for it accordingly.

This is not "non-relevant" functionality. After the AWS incident, it is highly relevant to offer services for cloud disaster recovery.

Miha


Miha Ahronovitz

Apr 23, 2011, 4:05:52 PM

to cloud-c...@googlegroups.com

Well said Ray.

Miha


Robert Jenkins

Apr 24, 2011, 2:22:49 AM

to cloud-c...@googlegroups.com

The military often gets accused of fighting the last war, or at least of being in danger of doing so, and we can look at the cloud in the same light right now. People's knee-jerk reaction to the outage this week was simply 'use multiple AWS zones'; however, this doesn't fully address the issue, because AWS remains a single point of failure in that case. On this point we very much 'eat our own dog food', by the way, and encourage all our customers to run secondary or balanced operations with another cloud provider.

There are myriad scenarios where not being reliant on one cloud provider (even in multiple locations) makes a lot of sense. Companies go under, get bought, change policies, change products, do things we don't like, etc. There are therefore many practical, technical and ethical reasons why it is wise to rely on more than one cloud vendor. That is the next single point of failure that people will talk about.

Cheers,

Robert

CTO

CloudSigma

http://www.cloudsigma.com


--
Robert Jenkins
Co-Founder
CloudSigma AG
E: rob...@cloudsigma.com
T: www.twitter.com/CloudSigma
W: www.cloudsigma.com


kowsik

Apr 23, 2011, 8:31:02 PM

to cloud-c...@googlegroups.com

On Sat, Apr 23, 2011 at 12:06 PM, Khazret Sapenov <sap...@gmail.com> wrote:
> On Sat, Apr 23, 2011 at 2:58 PM, Miha Ahronovitz <myinne...@gmail.com>
> wrote:
>>
>> Khazret, to give credibility to what you say, see
>> http://status.aws.amazon.com/
>> California EC is up, N. Virginia is down.
>
> Thanks, Miha for enhancing credibility of my post, but I've posted this link
> already yesterday and it's referenced everywhere.
>
>>
>> But why it should be the worry of the customers to move things
>> around?
>> AWS should have mechanisms to move automatically to other facilities.
>> It doesn't
>
> Your statement contradicts your employer's notion of IaaS providing only
> bare minimum with all 'non-relevant' functionality outsourced to external
> parties.
> Now you want your competition to make an extra step, that sounds logical,
> but not implemented by many [IaaS] yet.
> Nice.

+1. AWS is the single largest multi-region IaaS vendor that provides a
consistent API (across regions). If the apps were designed for failing
over to an alternate region, all the bare metal capabilities do
already exist to solve this problem (ELB, anycast DNS). Netflix, for
example, runs in 3 different regions (see @adrianco's tweets).
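
A minimal sketch of what "designed for failing over to an alternate region" could look like at the application or DNS layer: try regions in preference order and route to the first one whose health check passes. The function name is hypothetical, the health check is a stand-in supplied by the caller rather than a real AWS API call, and the region identifiers are only illustrative.

```python
from typing import Callable, Iterable, Optional

def pick_region(regions: Iterable[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first region, in preference order, that passes its health check.

    Returns None when every region is unhealthy, so the caller can decide
    whether to serve an error page, queue work, or retry later.
    """
    for region in regions:
        if is_healthy(region):
            return region
    return None

# Illustrative health snapshot: the preferred region is down.
PREFERENCE = ["us-east-1", "us-west-1"]
health = {"us-east-1": False, "us-west-1": True}
target = pick_region(PREFERENCE, lambda r: health.get(r, False))
```

The hard part in practice is not this selection logic but making sure the data the application needs is already present in the alternate region when the switch happens.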

I wouldn't say the same to PaaS vendors (built on top of AWS) that put
all their apps in a single location. That was fail.

K.
---
http://blitz.io
http://twitter.com/pcapr

Robert Jenkins

Apr 24, 2011, 10:43:35 AM

to cloud-c...@googlegroups.com

I don't know of any multi-location IaaS vendor that doesn't provide a consistent API across their availability areas. This can't be described as a special characteristic of AWS; it is the default position of IaaS vendors. When would an IaaS vendor actually have different APIs by location? I can't find any use case or example.

AWS may have more locations due to its size, but on the flip side it is one of the most proprietary platforms. You are into a sort of Apple versus Android debate here. If you use Apple a lot it's great, but its compatibility with non-Apple products isn't good. The question is: does AWS make it easy to migrate OFF AWS, or to be compatible with other clouds and standards? The answer is clearly no. That's fine, but customers need to clearly understand that approach and its implications for their own cloud infrastructure strategies, especially when outages occur.

Kind regards,

Robert

CTO

http://www.cloudsigma.com


Ray

Apr 24, 2011, 12:21:48 PM

to cloud-c...@googlegroups.com

Netflix, I believe, runs in multiple Availability Zones (AZ) which are
separate physical buildings in the same region - in this case US-East in
Virginia. Today ELB does not span regions so you would need an outboard
solution hosted in another datacenter.

Ray


Geoff Arnold

Apr 24, 2011, 1:03:27 PM

to cloud-c...@googlegroups.com

>
> AWS may have more locations due to its size but on the flip side it is one of the most proprietary platforms. You are into a sort of Apple versus Android debate here. If you use Apple a lot its great but its compatibility with non-Apple isn't good. The question is does AWS make it easy to migrate OFF AWS or to be compatible with other clouds and standards? The answer is clearly no. That's fine but customers need to clearly understand that approach and its implications for their own cloud infrastructure strategies especially when outages occur.

Since there are no fully-baked open source solutions or ratified standards documents, all cloud platforms are "proprietary" today. Saying that AWS is "one of the most proprietary" makes no sense.

The AWS API represents the nearest thing to a de facto standard, however. It's supported by OpenStack, Eucalyptus, and others. So if you define "proprietary" as "leading to lock-in", AWS is (relatively) non-proprietary!

In practice, none of the leading public cloud operators has much incentive to make it easy for customers to leave. This means that interoperation and high-level abstraction are going to be driven by customer groups (like TM Forum) and orchestration vendors like RightScale.

Personally, I'd like to see a two-tier API standard emerge: EC2/EBS/... for the lower, procedural interface, and something like the Oracle Cloud Resource Model (http://www.oracle.com/us/corporate/press/184426) for high-level declarative deployments. I think that the Oracle - formerly Sun - model is more powerful and expressive than either vCloud or CloudFormation, but there's still a lot of work to be done in this area.

Miha Ahronovitz

Apr 24, 2011, 1:08:53 PM

to cloud-c...@googlegroups.com, kowsik

On 4/23/2011 5:31 PM, kowsik wrote:
> Netflix, for
> example, runs in 3 different regions (see @adrianco's tweets).

Sure, this is a good thing, but how much does it cost? Netflix's footprint
on AWS is huge and costs a huge X (at each meetup where their top technical
people spoke, Netflix repeatedly refused to disclose how much they pay AWS
for hosting their data center).

Now if they host in 3 regions, quite simply this means the cost is 3X
versus 1X. Say they got a deal. They still pay, say, 2X, and it should
be two HUGE Xs.

Netflix has a group of rock stars who claim credit for the success of
Netflix's port to AWS and now for its survival of the AWS failure. But not
every company has the same capabilities. And even if they have the same
capabilities (like Zencoder and Quora, both with superb engineering
talent) they do not have the same $ to spend.

The Zencoder blog entry is a must-read:
http://blog.zencoder.com/2011/04/22/skynet-ec2-and-zencoder/

As a trainee engineer, I was told the following definition of an
engineer: "An engineer is someone who can build with $1M the same
thing that any fool can with $100M"

Cheers,

Miha


Flávio R. C. Sousa

Apr 24, 2011, 1:54:13 PM

to cloud-c...@googlegroups.com

More lessons. http://agilesysadmin.net/ec2-outage-lessons

Prof. Flávio R. C. Sousa
Software Engineering Coordinator
Federal University of Ceará, Quixadá, Brazil
http://www.es.ufc.br/~flavio

AdrianC

Apr 24, 2011, 6:18:45 PM

to Cloud Computing

Miha,

If you think it costs 3X to be in three availability zones then you do
need better engineers...

We normally run a third of our systems in each zone; to avoid the bad
zone we moved to running half our systems in each of two zones.
The total number of systems is about the same, and the cost of the
systems is the dominant cost.

Since the cloud is elastic (remember, that was the point :-) we don't
need to pre-allocate capacity in each zone to take the extra load,
unlike a datacenter DR solution.

The cost of running in multiple zones is a minor increase in network
cost, slightly more latency, and that you have to decide to do it up
front. This works for any scale, nothing to do with running large
scale. If you don't have enough instances to spread three ways, use
smaller instances.
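
The arithmetic behind this can be sketched as follows. The fleet size of 300 and the zone names are made-up illustrative numbers, not Netflix figures, and the fixed-capacity formula is an assumption about how a pre-provisioned datacenter DR set-up would be sized (each surviving site must be able to carry the full load without the failed one).

```python
import math

def spread_evenly(total_instances, zones):
    """Spread a fixed fleet as evenly as possible over the given zones."""
    base, extra = divmod(total_instances, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}

def preprovisioned_fleet(total_instances, sites):
    """Fleet size if capacity is fixed in advance and must survive losing one site.

    Assumption: the remaining (sites - 1) sites must carry the full load alone,
    so every site is sized at ceil(total / (sites - 1)).
    """
    per_site = math.ceil(total_instances / (sites - 1))
    return per_site * sites

# Elastic case: the same 300 instances before and after dropping zone "b".
normal = spread_evenly(300, ["a", "b", "c"])   # 100 per zone
degraded = spread_evenly(300, ["a", "c"])      # 150 per zone, still 300 in total

# Fixed-capacity case: 3 sites need 450 instances, 2 sites need 600.
fixed_three = preprovisioned_fleet(300, 3)
fixed_two = preprovisioned_fleet(300, 2)
```

Under these assumptions the elastic fleet stays at 1X through the failure, while fixed capacity costs 1.5X with three sites or 2X with two, which is where the "3X versus 1X" intuition comes from.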

Adrian


Miha Ahronovitz

Apr 24, 2011, 10:32:23 PM

to cloud-c...@googlegroups.com

Adrian, thanks. What you guys did is magic. Thanks for teaching us that it does not cost much more to run in three zones.

However, as you tweeted, Netflix was lucky that the AWS failure happened during the night. It appears that during peak daytime hours, the reduction from 3 zones to 2 might not have gone so smoothly.

Also, in the immense flow of explanations and articles coming from all directions, how come such smart teams as Zencoder, reddit or Quora did not do the same as Netflix? Actually, reddit said: "EBS also has reliability issues. Even before AWS fail, we had random disks degrading multiple times a week".

These EBS blues are also the subject of your blog entry from March 18, "Understanding and using Amazon EBS - Elastic Block Store":
http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html

You write

" The problem with EBS is that it doesn't have a particularly steady state. To explain why we need to look at the underlying architecture. I don't know the details of how EBS is implemented, but there is enough information available to explain how it behaves."

You recommend some pragmatic tests to "collect response time and throughput and plot your data over time. You need to run long enough that the performance shows steady state behavior." How many enterprises have teams to do these experiments with a product offered by the most prestigious IaaS and PaaS provider in the known Universe?
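
For readers wondering what such a test involves, here is a minimal sketch of one way to decide that collected response times have settled: average the samples in fixed-size windows and compare the last two windows. This is an illustrative heuristic, not the method from the quoted blog entry; the function names, the window size and the 5% tolerance are all assumptions.

```python
from statistics import mean

def window_means(samples, window):
    """Mean of each full, non-overlapping window of `window` samples."""
    return [
        mean(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]

def reached_steady_state(samples, window, tolerance=0.05):
    """True if the last two window means differ by at most `tolerance` (relative).

    Returns False when there are fewer than two full windows, since there is
    nothing to compare yet and the test should keep running.
    """
    means = window_means(samples, window)
    if len(means) < 2:
        return False
    prev, last = means[-2], means[-1]
    return abs(last - prev) <= tolerance * prev

# Hypothetical response times in ms: a fast warm-up phase, then a slower plateau.
still_changing = [10, 10, 10, 10, 20, 20, 20, 20]
settled = still_changing + [20, 20, 20, 20]
```

A short run that stops during the warm-up phase would report the 10 ms figure; only the longer run shows that the volume settles at 20 ms.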

So my remark on the definition of an engineer best fits the team who designed the EBS product to begin with. This is not a finished product. There must be a way to "provide a reliable place to store data that doesn't go away when EC2 instances are dropped" without mounting EBS volumes on a single EC2 instance until it crashes.

BTW, other providers have persistent storage. You know them.

I do recognize your team's contribution; you passed an extraordinary test. But AWS has some intrinsic problems, EBS being one of them, that were not fixed in spite of warnings coming from all directions. We are all human and we err; sure, we understand that.

It is not engineering. If AWS opens up and invites third parties to develop solutions for persistent storage on EC2 instances, we will be surprised at what the world can come up with. But if they keep it in-house, they are invoking a monopoly. People who believe they can do everything themselves are punished by the Divine, whom we cannot ever replace. They lose their gift of prophecy, and may fall down like an apple from a tree.

2 cents and thanks for keeping the debate interesting.

Miha


Ray

Apr 25, 2011, 12:26:16 AM

to cloud-c...@googlegroups.com

Miha, the problem with EBS is that, like all other parts of the AWS architecture, it’s shared. So the question is, if you know this and don’t design your app or at least your DR strategy with this in mind, what’s broken: AWS or your app’s architecture?

I would argue it’s the latter, and it reminds me of the first move from mainframe to client/server apps. There was much experimentation and invention before multi-tiered, distributed application stacks came to be. That’s what’s happening now with the cloud. The learning curve is steep and teams like reddit and Quora are getting hard lessons now.

The lesson here is that lessons will be the norm for some time until popular design patterns emerge and people understand how to build apps for the cloud.

Ray


Miha Ahronovitz

Apr 25, 2011, 12:10:07 PM

to cloud-c...@googlegroups.com

Ray, why should I, as a matter of principle, have to know the insides and the misbehavior of an element of my provider's architecture, which is designed to serve customers who do not work there?

If you remember, a long time ago we defined the cloud from the customer's perspective. Customers have the "illusion of infinite resources" (elasticity, meaning constant quality of service), they pay only for what they use (predictable billing), and they have no idea of the cloud internals, nor do they want to climb a steep learning curve to learn them.

AWS, because of EBS, violates this definition of a cloud many times over. A well-designed cloud does not require its customers to learn anything new. They must simply take their apps, as they run them today on their physical servers, and place them on the cloud. Are there providers like that? Yes, there are, but they are not as big as AWS. For now.

Miha



Jeanne Morain

Apr 25, 2011, 2:34:42 PM

to cloud-c...@googlegroups.com

Kowsik,
Very well said. One key lesson is that customers should not auto-magically believe that, because they have ported their apps and data to an IaaS or PaaS provider, the basic best practices for DR, redundancy and failover no longer apply to achieving continuous system operations in a public cloud.

On the point about automatically moving things around to other facilities: that is a slippery slope, given that Amazon is an IaaS provider with PoPs in several countries. There would need to be more workflows and tools for customers (for audit, etc.) to ensure that regulated apps and data were not moved to the wrong PoP. As a company, they have a responsibility to ensure applications are compliant not just with technical SLAs but also with business directives (regulatory, security, or people).
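
A minimal sketch of the kind of guard this implies: before any automatic move, filter the candidate locations down to those in jurisdictions the workload is allowed to run in. The location names, jurisdiction labels and function name are hypothetical, and real compliance rules are far richer than a single jurisdiction tag.

```python
def allowed_failover_targets(permitted_jurisdictions, locations):
    """Return the locations whose jurisdiction the workload may run in.

    `locations` maps a location name to its jurisdiction label;
    `permitted_jurisdictions` is the set of labels the workload is cleared for.
    """
    return [
        name
        for name, jurisdiction in locations.items()
        if jurisdiction in permitted_jurisdictions
    ]

# Hypothetical provider footprint and a workload restricted to the EU.
LOCATIONS = {"virginia": "US", "california": "US", "ireland": "EU"}
eu_only_targets = allowed_failover_targets({"EU"}, LOCATIONS)
us_targets = allowed_failover_targets({"US"}, LOCATIONS)
```

An empty result is itself meaningful: it says there is no compliant place to move the workload, so automatic relocation must not happen.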

Nice post.

Regards,
Jeanne


Robert Jenkins

Apr 26, 2011, 8:33:50 AM

to cloud-c...@googlegroups.com

Adrian makes an excellent point that replication doesn't necessarily mean like-for-like increases in cost. Yes, you'll have a degree of base load maintaining data across multiple locations, but you can then elastically move workloads between locations. You can have this baked into a good set-up so that, whether it is relative customer demand or a location going down, the system reacts automatically and accordingly to maintain service. This was the final point in our post: this is a lot easier to achieve with a good cloud set-up than with dedicated hardware. Actually, with dedicated hardware you would likely see the sort of replication that Miha was talking about, because you don't have the cloud elasticity to play with, so you need full redundancy of capacity.

This outage will likely encourage many users of IaaS to look at adopting such strategies, which can only be a good thing, but in doing so they would be in danger of fighting the last war, not the next, to use an old military adage. Single points of failure, be that a location or a system, aren't good. I think the next big disruption will be vendor specific, meaning an issue with a vendor, be it software or corporate. That's another single point of failure and, regardless of size, absolutely no company is beyond failure or problems. To deny this is, IMHO, hubris. That's a lesson I think everyone should take from the recent financial collapse, the Enron scandal before it, etc. Given that there are many credible choices in the IaaS space, a company large enough to want a proper multi-location strategy should also implement a multi-vendor strategy.

For example, what happens if a company you are doing business with has their assets frozen? Or perhaps more likely, an exploit is found to some software they are using. That could bring down ALL their locations and potentially destroy all their data too in an extreme case. It isn't likely but with so much data and computing moving to the cloud, there will definitely be exploits found in some of the plethora of cloud management software suites being pushed out currently and that is the sort of vendor specific multi-location problem you could see.

Don't get me wrong, the cloud is in my opinion far more robust than a dedicated deployment and I don't wish to be accused of fear mongering because I think the track record of IaaS to date speaks for itself. My point is, doesn't it make sense to address all single points of failure that can be identified and understood, especially when they can be relatively easily addressed?

Cheers,

Robert

Pietrasanta, Mark

Apr 26, 2011, 12:15:49 PM

to cloud-c...@googlegroups.com

And… it all comes back to the databases, as it always seems to.

Multi-site is great in theory. But whether it’s hot-hot-load-balanced, or hot-cold-cut-over, or something in the middle, the issues of data replication and data integrity continue to be the core problem.

Amazon would likely do what some here have suggested, and automagically move people around when there’s an outage like this, except they likely can’t replicate all the data, maintain integrity, and keep up with the volume of changes on a second-to-second basis. SAN-to-SAN in a single data center is feasible, but from East-to-West coast is just not practical, today, on the scale of what Amazon is dealing with. And even so, this doesn’t really deal with the database/transactional integrity issue, which adds a whole additional layer.
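
A rough back-of-the-envelope sketch of why coast-to-coast synchronous replication is hard. It assumes a path of about 4,000 km between coasts and light travelling through fibre at roughly 200,000 km/s, and it ignores routing, queuing and equipment delays, so real round trips are longer. Both function names are hypothetical.

```python
def min_round_trip_ms(distance_km, fibre_speed_km_per_s=200_000):
    """Lower bound on round-trip time over fibre for a given one-way distance."""
    return 2 * distance_km * 1000 / fibre_speed_km_per_s

def max_serial_sync_commits_per_s(round_trip_ms):
    """Upper bound on strictly sequential commits when each one waits for a
    single remote acknowledgement before the next can start."""
    return 1000 / round_trip_ms

# Assumed distances: ~4,000 km coast to coast, ~1 km within one data center campus.
cross_country_rtt = min_round_trip_ms(4000)   # 40 ms at best
local_rtt = min_round_trip_ms(1)              # 0.01 ms at best
cross_country_rate = max_serial_sync_commits_per_s(cross_country_rtt)
```

Under these assumptions a single stream of dependent transactions cannot exceed about 25 synchronous commits per second across the country, which is why wide-area replication tends to be asynchronous and brings the integrity questions raised here.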

So we agree that IaaS vendors don’t give us free, transparent DR/COOP, and we need to build that in just like we did before. But what about the increasing number of PaaS/SaaS (real) cloud vendors, where we have no insight or control over the infrastructure? Don’t those PaaS/SaaS (real) cloud vendors have to build in their own DR/COOP? And shouldn’t they disclose that to us?

I hope that products like NimbusDB (based on that great description a few weeks back) come to fruition and finally free us from location-centric-with-hacked-replication DB models. But it seems we’re still not there, and so multi-site cloud will still suffer from this currently not-intractable-but-really-pain-in-the-neck problem.

Sassa

Apr 26, 2011, 11:36:05 AM

to Cloud Computing

The same argument works the other way around. As a metaphor, how can a
service provider know some client uses a key-value store as a
transactional database?

Perhaps, it is ok to observe "finite" reliability of an "infinite"
resource.

(yep, and SLAs are just promises - which clients have no way to even
estimate - because they don't know the cloud internals)

Sassa

On Apr 25, 5:10 pm, Miha Ahronovitz <mij...@sbcglobal.net> wrote:
> Ray, why should I know, as a principle, the insides and the misbehavior of an
> element of my provider's architecture, designed to give service to a customer who
> does not work there?
>
> If you remember, we defined the cloud a long time ago from the customer perspective.
> Customers have the "illusion of infinite resources" (elasticity, meaning
> constant quality of service), they pay only for what they use (predictable billing),
> and they have no idea - and want to have no idea, or steep learning curves -
> about the cloud internals.
>
> AWS - because of EBS - violates this definition of a cloud many times over. A
> well-designed cloud does not require its customers to learn anything new. They
> simply take their apps, as they run them today on their physical servers,
> and place them on the cloud. Are there providers like that? Yes, there are, but they
> are not as big as AWS. For now.
>
> Miha
>
> ________________________________

> From: Ray <rnug...@yahoo.com>

> http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-eb...

> 3rd Annual Cloud Slam 2011 Conference * April 18-22, 2011 * Mountain View, CA * http://cloudslam.org

> UP 2010 Conference: http://www.up-con.com
> Posting guidelines: http://bit.ly/bL3u3v
> Follow us on Twitter @cloudcomp_group

> Post Job/Resume at http://cloudjobs.net
> Get hundreds of conference sessions and panels on cloud computing on DVD at http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B004L1755W, http://www.amazon.com/gp/product/B002H0IW1U or GET instant access to
> downloadable versions at
>
> - http://www.up-con.com/register
> - http://cloudslam09.com/content/registration-5.html
> - http://cloudslam10.com/content/registration

>
> ~~~~~
> You received this message because you are subscribed to the Google Groups "Cloud
> Computing" group.
> To post to this group, send email to cloud-c...@googlegroups.com
> To unsubscribe from this group, send email to
> cloud-computi...@googlegroups.com

Jim Peters

Apr 26, 2011, 9:34:48 PM

to cloud-c...@googlegroups.com

http://www.amazon.com/Breaking-Availability-Barrier-Survivable-Enterprise/dp/1410792323/ref=sr_1_2?ie=UTF8&qid=1303868062&sr=1-2-catcorr

Jim Peters
+1-415-608-0851 (Cell)
+1-416-466-9790 (Home)
+1-415-508-8651 (Google Voice)

Ian Mills

Apr 27, 2011, 4:57:48 AM

to cloud-c...@googlegroups.com

Totally agree, with one exception: I believe the laws of physics, specifically the speed of light, mean there can never be a full solution to the problem. Computers get ever faster, so the inevitable two-way speed-of-light delay to a remote site every time you update your data is becoming more significant, not less.
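
A rough back-of-the-envelope sketch of that delay, in Python. The distance (about 4,000 km coast to coast) and the fibre propagation speed (roughly two thirds of the speed of light in vacuum) are illustrative assumptions, not measurements, and real networks add routing and equipment delay on top.

```python
# Lower bound on the round-trip delay imposed by the speed of light.
# Distances and the fibre factor are illustrative assumptions, not measurements.

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum (exact by definition)
FIBRE_FACTOR = 2 / 3              # assumed: light in optical fibre travels at ~2/3 c


def min_round_trip_ms(distance_km: float, factor: float = FIBRE_FACTOR) -> float:
    """Return the minimum two-way propagation delay in milliseconds."""
    one_way_s = distance_km / (C_VACUUM_KM_PER_S * factor)
    return 2 * one_way_s * 1000


def max_sync_updates_per_s(distance_km: float) -> float:
    """Upper bound on strictly sequential, synchronously acknowledged updates."""
    return 1000 / min_round_trip_ms(distance_km)


if __name__ == "__main__":
    coast_to_coast_km = 4_000     # assumed straight-line distance, US East to West
    rtt = min_round_trip_ms(coast_to_coast_km)
    rate = max_sync_updates_per_s(coast_to_coast_km)
    print(f"minimum round trip: {rtt:.1f} ms")
    print(f"sequential synchronous updates: at most {rate:.0f} per second")
```

Under these assumptions the floor is about 40 ms per round trip, so a single chain of updates that each wait for remote acknowledgement cannot exceed roughly 25 per second, however fast the machines at either end become.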

Ian Mills
0755 394 6958
Service to the customer, the pursuit of excellence, respect for the individual.

Jeanne Morain

Apr 28, 2011, 1:36:59 PM

to cloud-c...@googlegroups.com

Jim - thanks for the book tip; it looks very interesting.

Mark,
Thank you - we are definitely on the same page. Forgive me, as I should have been clearer: when I define systems, I am looking not just at the servers or applications but also at the data. At the end of the day it boils down to system redundancy. Part of a good DR/COOP strategy is to provide redundancy for the entire system (including the database), whether hot or cold, in a different location.

No, it is not easy, but I have seen customers do some very creative things to ensure that they have data redundancy. Highly regulated ones in particular, such as banks, hospitals, and government bodies, replicate gigabytes of data to various PoPs for DR/COOP. Some of these were not pretty solutions, mind you, but the point here is that this requirement is not new, and until the IaaS/PaaS providers have something, IT should build this requirement into its cloud strategy.

The real lesson for IT is that there needs to be a system of checks with both IaaS and PaaS providers to ensure that, while they are taking on the responsibility of providing the "infrastructure" for the cloud, they have systems in place to meet minimum requirements for app and data retention in their infrastructure, security, and auditing. The market is still too nascent for anyone to assume that this is handled.

There will be more bumps like this. The key to ensuring they don't upset the apple cart is to have a checklist and ensure that whoever the provider is, they meet the minimum requirements to play.

Great posts - thanks.

Regards,
Jeanne
www.universalclient.blogspot.com
Twitter: @JeanneMorain
Visible Ops Private Cloud available on Amazon
http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=Jeanne+Morain&x=0&y=0

--- On Tue, 4/26/11, Jim Peters <j...@jamesgpeters.com> wrote:

Barr, Bill

Apr 28, 2011, 1:25:32 PM

to cloud-c...@googlegroups.com

Actually, it comes down to realistic requirements. Does the application really require instantaneous replication? Is 100% or five 9s availability really necessary? Etc.

So very many systems are completely over-spec'd.
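
For a sense of what each extra 9 actually buys, here is a short Python sketch that converts an availability target into allowed downtime per year. It assumes a 365-day year and treats availability as simple uptime fraction; how a given SLA counts downtime is not something this sketch knows about.

```python
# Allowed downtime per year for a given availability target.
# Assumes a 365-day year; how an SLA measures downtime may differ.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability_from_nines(nines: int) -> float:
    """Map a count of nines to an availability fraction, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be down while still meeting the target."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 6):
        a = availability_from_nines(n)
        print(f"{n} nines: {downtime_minutes_per_year(a):,.1f} minutes/year")
```

Five 9s leaves a little over five minutes of downtime a year, while two 9s leaves about 87.6 hours, which is the gap an application's real requirements need to justify.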

From: cloud-c...@googlegroups.com [cloud-c...@googlegroups.com] On Behalf Of Ian Mills [i.c.r...@googlemail.com]
Sent: Wednesday, April 27, 2011 1:57 AM

Robert Jenkins

Apr 28, 2011, 2:14:30 PM

to cloud-c...@googlegroups.com

@Bill

You are absolutely right. And of course, the impossibility of perfect replication isn't a valid reason to forgo a practically acceptable level of replication (if near-perfect isn't possible or cost-effective); that is still better than a non-redundant set-up.

Jeanne Morain

Apr 28, 2011, 2:16:23 PM

to cloud-c...@googlegroups.com

http://www.cioinsight.com/c/a/Security/ISACA-Security-Study-Compliance-Governance-Risk-Are-Top-Concerns-103335/?kc=CIOMINEPNL04272011

Regulatory compliance is only getting more complex. With the recent breach at Sony and the data loss at Amazon, it will be even more interesting to see the impact on the cloud.

We cover some of the issues cited in the appendices of our book, Visible Ops Private Cloud. I thought this article might be interesting for this group as a source of a little more insight, particularly as compliance was cited as one of the primary barriers to cloud bursting or implementing a hybrid cloud model (which is why we stopped at Private)....

Regards,
Jeanne

Visible Ops Private Cloud - available on Amazon
http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=Jeanne+Morain&x=0&y=0

Flávio R. C. Sousa

Apr 29, 2011, 8:29:47 PM

to cloud-c...@googlegroups.com

Lessons Netflix Learned from the AWS Outage

http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
