In regards to failover, does Google App Engine have some sort of a LoadBalancer API?
thanks
Wayne Yim
From: stuartc...@gmail.comSubject: Re: The Business of Building CloudsDate: Thu, 5 Jun 2008 09:07:12 -0700On 5-Jun-08, at 8:35 AM, Alan Ho wrote:Picking a provider that has data center failover is critical - but it does mean that you write your application in a way that can failover gracefully. Cloud providers need to provide the base infrastructure to do so OR constrain the user to a particular programming paradigm (like the limitations of google app engine)
That's a very astute observation, Alan. Constraining an architecture to induce certain properties (guarantees?) is likely the right approach.Though I wonder if AppEngine is a bit too "Nanny-ish" that limit its audience in ways that don't really impact the big picture qualities.For example, the choice of Python was easy because it was a standard Google language, but that doesn't seem to be inherently a more applicable language than say C#, Java or Ruby.I expect in the future that cloud computing systems will provide the concept of "cloud events" in case of major datacenter failures. I just don't see any way round it.
I wonder if Google actually provides this sort of failover for AppEngine today. Certainly, they could, though they provide no such guarantees at the moment.As for "cloud events" - yup. In the traditional data centre, it's likely SNMP or JMX traps. On the cloud, it's not entirely clear if/where SNMP would play. Or WS-Man. Or something newer (?).CheersStu
Things get really interesting when you need to do election leader decisions across data centers. E.g. If you are doing a big map-reduce task in one data center, it goes down, so you want to finish the task in another data center.
How does one transfer the task ? Is it even worth solving ?
Alan Ho
----------------
From: Reuven Cohen <r...@enomaly.com>
Sent: June 05, 2008 10:03 AM
To: cloud-c...@googlegroups.com
Subject: Re: The Business of Building Clouds
From what I've seen of Google App Engine, they distribute your python code to dozens of servers and then use some kind of round robin to spread the load. Nothing ground breaking.
r/c
On Thu, Jun 5, 2008 at 12:59 PM, wyim wyim <wingm...@hotmail.com <mailto:wingmanyim@hotmailcom> > wrote:
In regards to failover, does Google App Engine have some sort of a LoadBalancer API?
thanks
Wayne Yim
----------------
From: stuartc...@gmail.com <mailto:stuartc...@gmail.com>
To: cloud-c...@googlegroups.com <mailto:cloud-c...@googlegroups.com>
Subject: Re: The Business of Building Clouds
Date: Thu, 5 Jun 2008 09:07:12 -0700
On 5-Jun-08, at 8:35 AM, Alan Ho wrote:
Picking a provider that has data center failover is critical - but it does mean that you write your application in a way that can failover gracefully. Cloud providers need to provide the base infrastructure to do so OR constrain the user to a particular programming paradigm (like the limitations of google app engine)
That's a very astute observation, Alan. Constraining an architecture to induce certain properties (guarantees?) is likely the right approach.
Though I wonder if AppEngine is a bit too "Nanny-ish" that limit its audience in ways that don't really impact the big picture qualities.
For example, the choice of Python was easy because it was a standard Google language, but that doesn't seem to be inherently a more applicable language than say C#, Java or Ruby.
I expect in the future that cloud computing systems will provide the concept of "cloud events" in case of major datacenter failures. I just don't see any way round it.
I wonder if Google actually provides this sort of failover for AppEngine today. Certainly, they could, though they provide no such guarantees at the moment.
As for "cloud events" - yup. In the traditional data centre, it's likely SNMP or JMX traps. On the cloud, it's not entirely clear if/where SNMP would play. Or WS-Man. Or something newer (?).
Cheers
Stu
--
--
Reuven Cohen
Founder & Chief Technologist, Enomaly Inc.
www.enomaly.com <http://www.enomaly.com> :: 416 848 6036 x 1
skype: ruv.net <http://ruv.net> // aol: ruv6
blog > www.elasticvapor.com <http://wwwelasticvapor.com>
-
Get Linked in> http://linkedin.com/pub/0/b72/7b4 <http://linkedin.com/pub/0/b72/7b4>
I guessed that about google app engine too.
Things get really interesting when you need to do election leader decisions across data centers. E.g. If you are doing a big map-reduce task in one data center, it goes down, so you want to finish the task in another data center.
How does one transfer the task ? Is it even worth solving ?
Alan Ho
From: Reuven Cohen <r...@enomaly.com>
Sent: June 05, 2008 10:03 AM
To: cloud-c...@googlegroups.com
Subject: Re: The Business of Building Clouds
From what I've seen of Google App Engine, they distribute your python code to dozens of servers and then use some kind of round robin to spread the load. Nothing ground breaking.
r/c
On Thu, Jun 5, 2008 at 12:59 PM, wyim wyim <wingm...@hotmail.com> wrote:
In regards to failover, does Google App Engine have some sort of a LoadBalancer API?
thanks
Wayne Yim
From: stuartc...@gmail.com
Subject: Re: The Business of Building Clouds
Date: Thu, 5 Jun 2008 09:07:12 -0700On 5-Jun-08, at 8:35 AM, Alan Ho wrote:Picking a provider that has data center failover is critical - but it does mean that you write your application in a way that can failover gracefully. Cloud providers need to provide the base infrastructure to do so OR constrain the user to a particular programming paradigm (like the limitations of google app engine)
That's a very astute observation, Alan. Constraining an architecture to induce certain properties (guarantees?) is likely the right approach.Though I wonder if AppEngine is a bit too "Nanny-ish" that limit its audience in ways that don't really impact the big picture qualities.For example, the choice of Python was easy because it was a standard Google language, but that doesn't seem to be inherently a more applicable language than say C#, Java or Ruby.I expect in the future that cloud computing systems will provide the concept of "cloud events" in case of major datacenter failures. I just don't see any way round it.
I wonder if Google actually provides this sort of failover for AppEngine today. Certainly, they could, though they provide no such guarantees at the moment.As for "cloud events" - yup. In the traditional data centre, it's likely SNMP or JMX traps. On the cloud, it's not entirely clear if/where SNMP would play. Or WS-Man. Or something newer (?).CheersStu
--
--
Reuven Cohen
Founder & Chief Technologist, Enomaly Inc.
Start task 1, and start task 2 doing the same, but later, _not_
simultaneously. Cancel task 2, when you receive result from task 1.
Cancel task 1, when you receive the result from task 2.
The benefit is that task 2 will not spend the same amount of
time/CPU/resources, as task 1, when everything goes OK.
This will work only if cancelling tasks is supported. The timing of
starting task 2 is related to the amount of time you are prepared to
wait after the expected time of finishing task 1.
Sassa
2008/6/6 <ian...@gmail.com>:
To solve this problem one of the team, Dmitriy Samovskiy, created a
cross-VPN routing layer which we then open sourced as 'VCUBEV'. It's
a really simple and neat idea, and anyone else is welcome to use it.
There are a one or two gotchas to watch out for, such as latency
between sites, but we've found that for a capable administrator these
are manageable.
A good place to start is:
http://highscalability.com/manage-downtime-risk-connecting-multiple-data-centers-secure-virtual-lan
More details and downloadable code are here:
http://www.cohesiveft.com/Developer/VcubeV/VcubeV_-_Usecases/
Enjoy :-)
alexis
--
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)
Visit our website at http://www.nyse.com
*****************************************************************************
Note: The information contained in this message and any attachment to it is privileged, confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to the message, and please delete it from your system. Thank you. NYSE Euronext, Inc.
I can see how it is useful in some use cases.
----- Original Message ----
From: Alexis Richardson <alexis.r...@gmail.com>
To: cloud-c...@googlegroups.com
Sent: Friday, June 6, 2008 1:09:17 AM
Subject: Re: Reliability in of the cloud
http://highscalability.com/manage-downtime-risk-connecting-multiple-data-centers-secure-virtual-lan
http://www.cohesiveft.com/Developer/VcubeV/VcubeV_-_Usecases/
Enjoy :-)
alexis
--
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)
__________________________________________________________________
Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/
Visit our website at http://www.nyse.com
*****************************************************************************
Note: The information contained in this message and any attachment to it is privileged, confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to the message, and please delete it from your system. Thank you. NYSE Euronext, Inc.
The topic of reliability on the cloud can be fairly broad.
Just in this thread were discussing the issues of:
Many others refers to reliability in the cloud with reference to Amazon S3 and the complexity involved in setting up databases on the cloud.
Other would refer to reliability from a different perspective i.e. what happens if the cloud is down?
Each of those items deserve an entire discussion.
As Gavan noted many of those issues has been address by some of the data-grid providers (I happen to represent GigaSpaces). The reason is simple: The nature of the grid applications that are using those products in the financial industry have the requirements that I would view as a superset to those of cloud. The demand in this type of applications is to address reliability, consistency, performance and scalability at the same time. Unlike google and other solution that is commonly used for addressing similar requirements compromising on consistency and/or reliability was not an option.
As I noted earlier I'm not going to be able to cover how all this is achieved, instead I'll cover some of the core principles.
How do we address consistency and avoid split-brain scenario between separate clusters or data-centers:
Split-brain scenario starts when we maintain more then one copy of the data across separate network segments (data centers is just an example for such scenario).
The question is how do we ensure consistency when an update can happen simultaneously on each of the copies. If there is a network failure both can succeed and once the network connection is re-established we can get into inconsistent situation that we can't recover from.
There are various patterns for dealing with this scenario:
So what is the solution?
Take the master site and partition it between the sites – in this way each site will act as the master for certain part of the data.
All sites can maintain local copy of the data for read purposes. That means that the data is always available in case of a network failure.
In this way we can ensure consistency (there is central owner per data item through the entire cluster), scalability (through replication and partitioning), performance (replication can be made local).
Now let's look at the Amazon S3 issues.
One of the simplest solution that fits nicely with the cloud is to decouple the persistency layer from the application and use memory resources as the system of records.
I wrote a lengthy post targeted more specifically to MySQL but the same principles that I decided in that post applies here as well.
http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html
There is also an interesting opensource project that is using this pattern and built an In-memory data-grid (GigaSpaces in this specific case) synchronized with Amazon SimpleDB.
http://www.openspaces.org/display/EDS/External+Data+Source+by+Amazon+SimpleDB
You could use the same pattern to load data from your own local-site i.e. keep the data persistent at your own site and use the data-grid as the system of record for applications running in the cloud.
The data-grid will be responsible for keeping the data-grid in-sync with the local database. You can control the rate in which those two entities will be synchronized based on the performance, network latency and reliability requirements through configuration.
From obvious reasons I can speak more on GigaSpaces then I can speak of the other data-grid products such as Gemstone and Coherence but I believe that the principles are similar while the underlying implementation can still be very different. What I can say safely at this stage is that unlike pure data-grid products we see data-grid as a component in broader solution which we refer to as a Scale-out-application server (The equivalent of google AppEngine but for Java,.Net and C++).
Nati Shalom
CTO GigaSpaces
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Gavan Corr
Sent: Friday, June 06, 2008 3:58
PM
To: cloud-c...@googlegroups.com
Subject: Re: Reliability in of the
cloud
There are a number of commercial data caching solutions in the market, Gemstone, Coherence (now from Oracle) and Gigaspaces, and to a lesser extent terracotta. of those, Gemstone is the only one I have seen successfully deployed in a large scale multi site environment to ensure consistency of data between multiple sites, and to do reliable failover if a node or a center fails. Hadoop is gaining interest but not there yet...
Gavan
"Talip Ozturk"
<ozt...@gmail.com>
Sent by: cloud-c...@googlegroups.com 09-06-08 10:21 AM
|
|
Gavan Corr <gc...@nyx.com>
Sent by: cloud-c...@googlegroups.com 10-06-08 05:07 PM |