Get a life and build redundancy/resiliency in your apps


Ali, Saqib

Apr 21, 2011, 9:42:35 PM
to cloud-c...@googlegroups.com
Poor Amazon has gotten a lot of bad press today. But as if that's not enough, the so-called "analysts" are targeting the whole cloud computing paradigm. While the hybrid cloud vendors are touting their offerings, the naysayers are rejoicing.

I think these so-called experts need to go back to school and retake Application Design Principles 101. Any recent computer science student knows the importance of building resiliency and redundancy into an app. A good design accounts for the datastore being unavailable by including replicated datastores in the design. You would do the same even if the application were hosted in your own datacenter, so why treat Amazon any differently? Datacenters go down, whether they are internal or in the cloud. It's the reality. Build your applications accordingly.
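
To make the replicated-datastore point concrete, here is a minimal Python sketch of a read that falls back to a replica when the primary is unavailable. The names (`DatastoreUnavailable`, `read_with_failover`, and the two stand-in readers) are hypothetical and not taken from any particular client library; a real design would also have to deal with writes and replication lag.

```python
from typing import Callable, Optional, Sequence


class DatastoreUnavailable(Exception):
    """Raised by a (hypothetical) datastore client when it cannot be reached."""


def read_with_failover(
    key: str,
    replicas: Sequence[Callable[[str], Optional[str]]],
) -> Optional[str]:
    """Try each replica in order and return the first successful read."""
    last_error: Optional[Exception] = None
    for read in replicas:
        try:
            return read(key)
        except DatastoreUnavailable as exc:
            last_error = exc  # remember the failure and try the next replica
    raise DatastoreUnavailable("all replicas failed") from last_error


# Stand-ins for real clients: the primary is down, the replica answers.
def primary(key: str) -> Optional[str]:
    raise DatastoreUnavailable("primary datastore is down")


def replica(key: str) -> Optional[str]:
    return {"user:1": "alice"}.get(key)


print(read_with_failover("user:1", [primary, replica]))
```

The caller only sees an error when every replica has failed, which is the behaviour you would want whether the datastores live in your own datacenter or in someone else's.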

saqib
@weaselese

Tim Crawford

Apr 21, 2011, 11:12:11 PM
to cloud-c...@googlegroups.com
Agreed. I blogged about this very issue today.
--
~~~~~
3rd Annual Cloud Slam 2011 Conference * April 18-22, 2011 * Mountain View, CA * http://cloudslam.org
UP 2010 Conference: http://www.up-con.com
Posting guidelines: http://bit.ly/bL3u3v
Follow us on Twitter @cloudcomp_group
Post Job/Resume at http://cloudjobs.net
Get hundreds of conference sessions and panels on cloud computing on DVD at
http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B004L1755W, http://www.amazon.com/gp/product/B002H0IW1U or GET instant access to downloadable versions at
- http://www.up-con.com/register
- http://cloudslam09.com/content/registration-5.html
- http://cloudslam10.com/content/registration
 
~~~~~
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
To post to this group, send email to cloud-c...@googlegroups.com
To unsubscribe from this group, send email to cloud-computi...@googlegroups.com

Robert Jenkins

Apr 22, 2011, 1:36:24 AM
to cloud-c...@googlegroups.com
Agree with these comments. What is shocking isn't that AWS has had a significant outage, but the number of quite large web operations seemingly relying on one cloud vendor, and one location no less. If it wasn't obvious before, it is obvious now: don't build your web platform on one vendor and one location. A multi-vendor, multi-location strategy needs to be employed in order to provide independence and the sort of reliability people are looking for.

It is worth noting that if any of these companies were managing their own clouds they would likely suffer such outages themselves from time to time and no-one would be that surprised.

Best wishes,

Robert

CTO
CloudSigma
--
Robert Jenkins
Co-Founder
CloudSigma AG
E: rob...@cloudsigma.com
T: www.twitter.com/CloudSigma
W: www.cloudsigma.com
Create a cloud server in 2 minutes: http://bit.ly/g3UuTN

====================
This email is from CLOUDSIGMA AG. The contents of this email and any attachments are confidential to the intended recipient. They may not be disclosed to or used by or copied in any way by anyone other than the intended recipient. If this email is received in error, please contact CLOUDSIGMA AG on +41 (0)44 585 39 07 quoting the name of the sender and the email address to which it has been sent and then delete it. Please note that neither CLOUDSIGMA AG nor the sender accepts any responsibility for viruses and it is your responsibility to scan or otherwise check this email and any attachments. CLOUDSIGMA AG is a public limited company registered in Canton Zürich, Switzerland (registered number CH-020.3.034.422-0) with registered offices at Sägereistrasse 29, 8152 Glattbrugg, Switzerland. For further information, please refer to www.cloudsigma.com .
====================

Pietrasanta, Mark

Apr 22, 2011, 7:41:36 AM
to cloud-c...@googlegroups.com
The issue is that one outage affected too many people.  It's a pretty strong argument against moving to the cloud – you've got a lot of eggs in one basket across many companies.

And, if you have to build in full hot DR across vendors, a lot of the cost benefits start to trickle away.

Agreed, this is a wake-up call for people using clouds, but it's also a wake-up call for people considering clouds.

Zul Kagalwalla

Apr 22, 2011, 8:46:08 AM
to cloud-c...@googlegroups.com
Agreed on the apps. But coming back to the main issue, the SLA: did Amazon meet its SLA? I have not read enough yet, and I will, but does anyone know if the SLA question has been answered or discussed?

It has to be the customer's responsibility to build redundancy and availability into their apps, unless they are a SaaS customer.

An IaaS provider would provide infrastructure DR if it is part of the SLA.
A PaaS provider, again, would provide platform DR if it is part of the SLA.
 
 



Robert Jenkins

Apr 22, 2011, 8:45:55 AM
to cloud-c...@googlegroups.com
@Mark

I am afraid outages have nothing to do with the cloud and everything to do with the reliability of data centres, carriers etc. which are no more reliable whether used in conjunction with a cloud or your own dedicated infrastructure. There is no evidence that the cloud is any more prone to outages than dedicated solutions. A dedicated solution is no more reliable than a cloud if you only have one location (i.e. single point of failure).

The fact is, it is significantly easier with a cloud deployment to manage multiple locations than if you are dealing with physical hardware yourself. That doesn't of course mean that people using the cloud don't take one single location set-up and port it to the cloud. Often they do and that is why they now see down time with one location going down. Is that really a cloud problem or a single location problem? I think it clearly is the latter.

Kind regards,

Robert

Pietrasanta, Mark

Apr 22, 2011, 9:34:23 AM
to cloud-c...@googlegroups.com
Right – my point was more about the consolidation of people into a smaller number of very large clouds.  I'm still not aware of what the specific EC2 problem was, but if it was in Amazon's core cloud management infrastructure, then the same thing would be less likely to happen with such a broad impact in a more traditional environment.  If it was a facility issue like power or cooling, then perhaps it's a wash.

I'm a huge proponent of cloud, and it's what we're moving all our customers to.

But this outage highlights the issues with the massive public clouds – one core outage and you have thousands of customers out of commission.  In a more traditional hosting model, outages tended to be more isolated because much of the infrastructure was replicated for each customer.  Private clouds are sort of in the middle, with some core infrastructure but also some replicated infrastructure – and the costs are also in the middle.

And you're right that if you want to stay up, you need a DR/COOP plan.  The cloud is not DR/COOP (although it can make it easier/cheaper).  And this is a big wake up call for folks thinking they can just avoid paying for DR/COOP – those costs need to be factored into the move to the cloud, whether it's cross-cloud, or just multi-site within the same cloud.

Abhishekpamecha

Apr 22, 2011, 10:27:04 AM
to cloud-c...@googlegroups.com, cloud-c...@googlegroups.com
I think the more pressing point is: who gets impacted as a result?

The chance of all of yesterday's impacted businesses failing together is higher in the cloud than when they are hosted individually. And like someone said before, deploying to a different vendor for reliable DR defeats the point of having a cloud in the first place. Two vendors, two standards, two relationships, ...

Analogy: your own backyard generator fails, or PG&E causes a brownout?

Abhishek 



iSent from my iPhone with iMstakes. 

Tim Crawford

Apr 22, 2011, 10:48:48 AM
to cloud-c...@googlegroups.com
Mark,
I think you're hitting on the core issue at the end of your note. But I think folks are looking at this the wrong way.

Today, many people don't have a full DR/BC plan because of cost and complexity. They may have some of the basics in place, but they're not prepared for catastrophic failure, mainly because of cost. And that's in their own data center(s).

Moving to the cloud should provide you with economic benefits in most cases. One could apply that new-found economic value to better DR/BC options. For example, compare the cost of running an app in your own data center vs. running it in two Amazon data centers. The benefit: you can get DR for less than you could on your own.

Tim

Pietrasanta, Mark

Apr 22, 2011, 9:27:14 AM
to cloud-c...@googlegroups.com
Unless people negotiated a special SLA, Amazon's SLA is basically "we promise nothing, and can do anything we want, at any time".  So yes, they met their core SLA.


Scott Herson

Apr 22, 2011, 11:27:05 AM
to cloud-c...@googlegroups.com
Amazon was fortunate to become the first real cloud IaaS.  We all know EC2 was just a solution to utilize excess hardware early on, as Amazon never *intended* to be a force in hosting. The downside of that fast and unpredictable growth, however, is the cause of their ongoing challenges, including yesterday's and other common downtime.  The fact is, the growth they experienced early on and over the last few years was a surprise, and that surprise prevented them from being able to plan, scale, and architect their footprint with best practices in mind. So what you effectively have at Amazon is a spaghetti of systems, networks and data centers.  Any customer knows this because of the internal latency they normally experience.  If you have been a customer of Amazon for a while, you usually have a server here, a server there and another one over there... often hops away from each other.

The good thing is that today's more modern IaaS cloud models have created a more localized, robust and honeycombed type of cloud infrastructure.  




On Fri, Apr 22, 2011 at 6:12 AM, Tim Crawford <tim.cr...@me.com> wrote:
Agreed. I blogged about this very issue today.
wp.me/p515c-6Z

Tim
@tcrawford


Scott Herson
Joyent Cloud / node.js 
Market Development Manager - Americas




Geoff Arnold

Apr 22, 2011, 11:50:24 AM
to cloud-c...@googlegroups.com
Apart from the first sentence, almost every assertion in this first paragraph is completely false.


Raj Badarinath

Apr 22, 2011, 12:38:17 PM
to cloud-c...@googlegroups.com
Something to think about...

A point came up in discussion yesterday with the CIO of a ~1B process manufacturing company who refuses to go into the cloud. His explanation: "All datacenters will fail, whether on-prem or in the cloud. The difference is that I can go to my business and tell them, either you can hold your breath for 10 minutes every week, or for over an hour once every year. Which will it be?"

Additionally, his other point was that SLAs compensate for the price one pays for the actual service itself, not the hard costs of lost revenues which may run into the millions of dollars.

It does come back to the point that Robert is making below - this is a DR failure, and companies that do not consider multi-sourcing in the cloud world risk running into this scenario over and over - one that no single cloud provider can solve.
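
For a rough sense of scale, the CIO's two options can be turned into yearly downtime and availability figures. This is only a back-of-the-envelope Python sketch: it assumes a 365-day year and treats "over an hour" as exactly 60 minutes, so the second availability figure is an upper bound.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def availability(downtime_minutes_per_year: float) -> float:
    """Availability in percent for a given amount of yearly downtime."""
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


weekly_blips = 10 * 52  # ten minutes every week
annual_outage = 60      # one hour once a year (assumed lower bound)

print(f"weekly blips:  {weekly_blips} min/year, {availability(weekly_blips):.3f}% available")
print(f"annual outage: {annual_outage} min/year, {availability(annual_outage):.3f}% available")
```

Ten minutes a week adds up to 520 minutes a year, so the single long outage has to run well past an hour before the two options cost the same total downtime. Which one hurts the business more is a separate question that the arithmetic does not answer.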

-Raj


Follow me on Twitter:@rajmatazz

sudhendra Seshachala

Apr 22, 2011, 4:32:23 PM
to cloud-c...@googlegroups.com
All great posts, and I do appreciate the knowledge sharing that goes on in this group.

So in a nutshell what are the lessons we could learn?
1. Leverage multiple cloud providers and each in different zones?
2. Have a disaster management plan at PaaS layer?
3. Is on-premise data center better than public cloud such as IaaS?
4. Build/leverage a hybrid cloud setup?

In the case of EC2, it was a failure in a switch or something, right? Stuff like that happens all the time in a data center.

Would be interested to learn more...


Sudhi
Thanks & Regards
Sudhi
Founder & Chief Architect
Hooduku Inc
Plexicloud
1.888.262.8389
http://www.hooduku.com
http://www.plexicloud.com



Jeanne Morain

Apr 22, 2011, 1:51:45 PM
to cloud-c...@googlegroups.com
There are a lot of great points on this thread.  Like any disruptive paradigm shift, the movement to the "Cloud" will take its bumps and bruises.  The unfortunate thing about Amazon's Cloud being the one to take the first hit is that the branding and overall name recognition of the Amazon Cloud itself will lead to more fear, uncertainty and doubt.

Delivering applications and hosting websites on the Internet is not new - we all know that.  Companies like Akamai, Limelight, and IP Services have been providing hosting services for well over a decade.  Application vendors like Intuit (Quicken, TurboTax), EA, etc. have had this model for well over 15 years.  The outage is not a reflection of the "Cloud" per se but of the overall capacity of the specific data center in which those sites were hosted.  All of us who have been in the IT industry for more than a handful of years have seen failure at some point.

While the media is hyping up a technical glitch - we should all ask - were there any user errors involved?  As noted in the multiple threads on the topic - did the users build out a DR site at a different location (best practice) or not?

Amazon may have expanded more quickly than they anticipated - as with any highly publicized glitch - they will analyze, address and move forward.  Yes there are other providers but let us not forget - having been one of the largest retailers for quite a while - they have achieved operational excellence, built in redundancy, and built a great brand because of it. 

Most of the companies I interviewed for my book are considering Amazon or a 3rd party ISV (Akamai and Limelight) over PaaS providers to circumvent vendor lock in and control future costs.  

For those interested - Visible Ops Private Cloud is available on AMAZON :-)
http://www.amazon.com/Visible-Ops-Private-Cloud-Virtualization/dp/0975568639/ref=sr_1_1?ie=UTF8&s=books&qid=1303489642&sr=8-1

This thread is very insightful and an interesting read. 
Cheers,
Jeanne Morain
www.universalclient.blogspot.com
twitter: JeanneMorain



Pietrasanta, Mark

Apr 22, 2011, 7:13:27 PM
to cloud-c...@googlegroups.com
I think that's probably overkill.  DR is good, but multi-site/zone is probably enough.  The problem with multi-vendor, especially with today's vendor lock-in, is that it makes it a lot harder to do things like data replication and transaction load balancing, which you'd typically want across your main/DR sites.

Basically, have a DR plan that:
  1. You're willing to pay for
  2. You're willing to accept the risk of
You can spend as much money as you want on DR, so it's really about the risk you're willing to accept versus the money you're willing to spend on it.
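
One rough way to frame that trade-off is expected annual cost: what you spend on DR plus the outage losses you still expect to absorb. The Python sketch below uses entirely made-up numbers (spend, outage probability, outage length, hourly loss) purely to show the shape of the comparison; plug in your own.

```python
def expected_annual_cost(dr_spend, outage_probability, outage_hours, loss_per_hour):
    """DR spend plus the expected loss from an outage in a given year."""
    return dr_spend + outage_probability * outage_hours * loss_per_hour


# Hypothetical inputs, for illustration only.
LOSS_PER_HOUR = 10_000
options = {
    "no DR": expected_annual_cost(0, 0.5, 24, LOSS_PER_HOUR),
    "multi-zone": expected_annual_cost(50_000, 0.5, 1, LOSS_PER_HOUR),
    "multi-vendor": expected_annual_cost(150_000, 0.5, 0.25, LOSS_PER_HOUR),
}

# Print the options from cheapest to most expensive.
for name, cost in sorted(options.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:,.0f}")
```

With different inputs the ordering changes, which is the point: the answer depends on the risk you accept and the money you are willing to spend.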

Strangely, a lot of big names didn't seem willing to spend anything on DR/COOP.

Pietrasanta, Mark

Apr 22, 2011, 7:11:03 PM
to cloud-c...@googlegroups.com
"Most of the companies I interviewed for my book are considering Amazon or a 3rd party ISV (Akamai and Limelight) over PaaS providers to circumvent vendor lock in and control future costs."

I'm confused by that statement:
  • Amazon does kind of lock you in, in that it's not super simple to get your VMs out.  You can, but it's not like the private providers (e.g. Terremark) where it's basically a normal VHD you can just grab and run.
  • Akamai/Limelight are CDNs, which you'd use on top of your IaaS provider (or use the CDN built into the IaaS provider if they offer one, like Amazon does).  But you can't use Akamai instead of Amazon; you'd still need some place for your infrastructure.

Ken North

Apr 22, 2011, 9:00:56 PM
to cloud-c...@googlegroups.com
Mark Pietrasanta wrote:

>> Unless people negotiated a special SLA, amazon's SLA is basically
"we promise nothing, and can do anything we want, at any time".

From the Amazon EC2 SLA at https://aws.amazon.com/ec2-sla/:

"AWS will use commercially reasonable efforts to make Amazon EC2
available with an Annual Uptime Percentage (defined below) of at least
99.95% during the Service Year."

In normal usage, 99.95% uptime means 4.38291 hours of downtime in a
year.

The Amazon SLA states:

"“Annual Uptime Percentage” is calculated by subtracting from 100% the
percentage of 5 minute periods during the Service Year in which Amazon
EC2 was in the state of “Region Unavailable.”

“Region Unavailable” and “Region Unavailability” means that more than
one Availability Zone in which you are running an instance, within the
same Region, is “Unavailable” to you."
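
The arithmetic behind these figures can be checked with a short Python sketch. The function names are made up for illustration; the 365.2425-day year is an assumption that happens to reproduce the 4.38291-hour figure, and the 365-day Service Year and the 12-hour example are assumptions too, not figures from the SLA or from the actual outage.

```python
HOURS_PER_DAY = 24
PERIODS_PER_HOUR = 12  # the SLA counts 5-minute periods


def allowed_downtime_hours(uptime_percent: float, days_per_year: float = 365.2425) -> float:
    """Hours per year that may be down while still meeting uptime_percent."""
    return (100.0 - uptime_percent) / 100.0 * days_per_year * HOURS_PER_DAY


def annual_uptime_percentage(unavailable_periods: int, days_in_service_year: int = 365) -> float:
    """100% minus the percentage of 5-minute periods that were unavailable."""
    total_periods = days_in_service_year * HOURS_PER_DAY * PERIODS_PER_HOUR
    return 100.0 - 100.0 * unavailable_periods / total_periods


print(f"{allowed_downtime_hours(99.95):.5f} hours of downtime allowed at 99.95%")

# Hypothetical: 12 hours of "Region Unavailable" in an otherwise clean year.
periods = 12 * PERIODS_PER_HOUR
print(f"{annual_uptime_percentage(periods):.3f}% annual uptime")
```

Under these assumptions, anything beyond 52 five-minute periods (about 4 hours 20 minutes) of "Region Unavailable" time in a year would put the Annual Uptime Percentage below 99.95%. Per the definition quoted above, only periods in which more than one Availability Zone is unavailable count toward that total.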


Ken North
________________
www.kncomputing.com
@knorth2
kncomputing.tumblr.com

 

Robert Jenkins

Apr 23, 2011, 3:27:29 AM
to cloud-c...@googlegroups.com
I wouldn't agree with the CIO that using the cloud somehow implies less frequent but longer outages. As with anything, it really depends on the implementation of the cloud versus the in-house implementation; it could be either way around. I do appreciate that many people feel better when they have an outage with their own infrastructure because they are in control (sort of!). As I outlined before, there is nothing intrinsic linking the cloud and outages. Actually the main difference is that when a cloud has an outage, everyone knows. How many times do internal systems go down in a company? Externally, such outages are often not known, and certainly the company doesn't post something on its website and issue a press release. My point is, there are many psychological and perceptual aspects to the move to the cloud which come very much to the fore when you see an outage like the one AWS had recently.

One more point: AWS takes a strategy of using relatively low-tier data centres with limited networking and other redundancy, with the idea that you have multiple locations. Other cloud vendors are in higher-specification centres and are less likely to suffer outages, but of course they eventually will. So customers should also look at the location strategy of the cloud vendor and make sure it fits with their own set-up plans. That needs openness from the IaaS vendor, and sadly many are not willing to provide the sort of detail that the customer needs. What customers need is effectively a price/performance/reliability metric in order to make an informed decision.

Cheers,

Robert
CTO
CloudSigma