Cloud Computing and Multi-tenancy


Jim Starkey

Apr 12, 2009, 12:19:24 PM4/12/09
to cloud-c...@googlegroups.com
When you get down to brass tacks, cloud computing is all about
multi-tenancy -- the ability of multiple applications from multiple
organizations to share computing resources. The economic model and
ownership of the resources is irrelevant. Cloud computing is about sharing.

The history of multi-tenancy is something like this. In the beginning
of computing, there was batch -- serial multi-tenancy. Robust, but
ludicrously wasteful of meager resources. Then came timesharing (too
weak for production work) and transaction monitors (rigid and unsuitable
for interactive applications). The advent of cheaper computers led to
the decomposition of applications into client and server components, and
server computing emerged. Initially the clients were purpose-built ad hoc
programs, but the trend was towards generic clients such as browsers. The
client side, though it still did some computing, became primarily the human
interface. Behind the scenes, servers evolved.

The first generations of servers evolved from timesharing systems
initially designed for multi-tenancy, systems such as Unix and VMS built
by disciplined professional engineers. The operating environment
developers worked for companies that competed on stability, and binary
compatibility across upgrades was the golden rule. New versions meant
new capabilities and new bugs, but running applications were expected to
continue to run without change. Big switcheroos were hidden by
coexisting versioned libraries and the like.

As PCs grew up, got faster, larger, and more reliable, the traditional
"soup to nuts" vertically integrated computer companies were gradually
replaced by an industry of independent hardware and software companies.
Rather than buying a VAX and getting VMS, companies bought computers
from A and operating software from B.

Operating systems, however, are large, expensive to develop and
maintain, and are remarkably unglamorous. Operating system development
was once funded from the high margins of proprietary hardware.
Commoditization of computers cut deeply into margins. And, more
significantly, differentiation in operating systems went from a plus to
a minus as strong standards evolved. A few companies tried to ride the
emerging server wave -- SCO, OSF, System V, and Netware (ugh) -- but
none succeeded. What did succeed were open source operating systems --
Linux, BSD, FreeBSD, NetBSD, xBSD, yBSD, zBSD (the BSD proliferation
demonstrates the Achilles' heel of open source -- the inability of
hormonally challenged, angry, white, 27-year-old males to agree on
anything but a clone or a standard).

So, after much evolution and many die-offs, we have an industry of
commodity hardware, one mega-successful operating system vendor, and
an equally successful free operating system available in a dozen tints.

Linux is wonderfully reliable and incredibly efficient. But it is also
a wretched vehicle for multi-tenancy. Linux is a monotheistic
operating system: There is one god, and his name is "root". To install
and manage an application or do anything of significance, you must be
root. But if you're root, you can do anything. So it just isn't
feasible to delegate rights to manage a single application. A second,
equally serious problem is that open source developers, unlike their
disciplined predecessors, pride themselves on version-to-version
incompatibility, believing that all true software is built from source.
This means that upgrading, say, Apache or MySQL for application A is
almost guaranteed to take out application B.

It didn't take very long for organizations to realize that although
Linux was free, Linux administration was not, and something had to
give. What gave was multi-application tenancy on Linux servers. It was
much cheaper to buy a box for each application than to administer two or
more applications on a single box. This was wildly successful in solving
the problem and, incidentally, a great boon to Silicon Valley. This was
server sprawl, and while inefficient, it had a solid economic
justification. The bean counters, however, discovered minuscule machine
utilization, which they abhorred.

Enter machine virtualization and hypervisors. Hypervisors address the
problems of server sprawl and under-utilization (I do hope the bean
counters don't discover how little of the oxygen we inhale is actually used).
They support multi-tenancy on servers while providing a degree of
administrative control not possible with physical servers. On the other
hand, they do nothing positive for scalability. Machine virtualization
is a dandy way to consolidate low volume applications that require more
presence than performance onto a single piece of hardware. Applications
that exceed the capabilities of a single box are another story.
Virtualization gives them nothing but overhead, leaving the problem of
scaling to the application developers. Virtualization also does nothing
to simplify the administration of individual applications; in many ways
it makes it worse, since an administrator must manage both the application
and the guest operating system that runs within the VM. Finally,
virtualization is relatively poor at resource sharing, requiring memory
for each of the guest operating systems, having schedulers fighting
schedulers, and inducing overhead on every I/O operation (even dumb
operating systems know how to share executables and libraries).

There are other forms of multi-tenancy besides transaction monitors,
time sharing, and machine virtualization. The basic rules are simple:

1. One tenant should not be able to detect the presence of another
tenant.
2. One tenant should not be able to adversely affect another tenant.
3. Each tenant must see a stable environment.

There are many ways to achieve this. Java comes very close in a single
JVM and succeeds completely with multiple JVMs. Google App Engine
does it, as does Azure. The common denominator is that in each
system an application runs in a rigorously defined sandbox from which
there is no escape. Google App Engine and Azure go a step further,
however, and support a sufficiently high level application definition to
permit automatic instantiation and failover. All absolve
application administrators from system administration, and all enable
application-transparent infrastructure upgrades.
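To make the multiple-JVMs point concrete, here is a minimal sketch (paths, jar names, and heap sizes are all hypothetical) of the crude process-per-tenant approach: each tenant gets its own JVM, so class names, static state, and heap are isolated by the operating system rather than by the language runtime.

// PerTenantJvm.java -- launch each tenant application in its own JVM process.
import java.io.File;

public class PerTenantJvm {
    public static void main(String[] args) throws Exception {
        launch("tenant-a", "/apps/tenant-a/app.jar", "256m");
        launch("tenant-b", "/apps/tenant-b/app.jar", "512m");
    }

    static Process launch(String name, String jar, String maxHeap) throws Exception {
        // A per-tenant heap cap is the simplest guard for rule 2 above.
        ProcessBuilder pb = new ProcessBuilder("java", "-Xmx" + maxHeap, "-jar", jar);
        pb.redirectErrorStream(true);
        pb.directory(new File("/apps/" + name)); // per-tenant working directory
        return pb.start();
    }
}

This is only isolation, of course; it does none of the packaging, instantiation, or failover work discussed below.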

The Google and Microsoft models are just the vanguard of a whole class of
controlled application infrastructures. All will have sandboxes,
rigorous APIs, load balancing, automatic instantiation, and failover.
All will compete to reduce the total cost of applications -- development,
hosting, and administration. Some will be hosted services and others
will be available for private and small scale clouds. And some will
prevail and others will die.

Does virtualization have a future in this world? During the transition,
certainly. But in a data center where 90% of the servers are running a
common application infrastructure hosting sandboxed applications, it has
nothing to offer.


Miha Ahronovitz

Apr 12, 2009, 1:24:11 PM4/12/09
to cloud-c...@googlegroups.com
Jim, multi-tenancy is a design requirement and virtualization is one of the tools to achieve it.
What is the problem here? If we have something better to replace virtualization, so be it!

Also, multi-tenancy (like security) is a corollary of the way grids are perceived by end users:

> 1. A user will always have all the resources s/he needs
> 2. A user will pay only for what s/he uses
> 3. The applications are delivered as an easy-to-use service
> 4. The users do not want to know what is going on inside the cloud

Your post is excellent; I cannot add contributions except to listen carefully to what you and others propose to ultimately achieve the four objectives above.

Cheers

Miha
http://www.sun.com/offers/details/sun_grid_engine.html



From: Jim Starkey <jsta...@NimbusDB.com>
To: cloud-c...@googlegroups.com
Sent: Sunday, April 12, 2009 9:19:24 AM
Subject: [ Cloud Computing ] Cloud Computing and Multi-tenancy


Rao Dronamraju

Apr 12, 2009, 1:47:08 PM4/12/09
to cloud-c...@googlegroups.com

Yes, it is about sharing resources, there is no doubt about it but....

"The economic model and ownership of the resources is irrelevant."

Not at all, it is about sharing resources PROFITABLY....otherwise, we might
as well leave it to Fannie Mae, Freddie Mac, or Medicare and Medicaid to do
it...

Profitable multi-tenancy is all around us....apartments.....car
rentals...when you fly in a plane it is multi-tenancy/pay-per-use....but
none of them are DEVOID of an ECONOMIC MODEL....

So I do not understand what you mean by saying the economic model and
ownership are irrelevant.

Ownership is equally relevant, because the economic model, and
consequently the profits, the very survivability of multi-tenancy, and
the RIGHTS to the profits, all depend on the ownership model.


-----Original Message-----
From: cloud-c...@googlegroups.com
[mailto:cloud-c...@googlegroups.com] On Behalf Of Jim Starkey
Sent: Sunday, April 12, 2009 11:19 AM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Cloud Computing and Multi-tenancy


Rao Dronamraju

Apr 12, 2009, 1:56:58 PM4/12/09
to cloud-c...@googlegroups.com

Multi-tenancy is an economic model to maximize profits…..

 

Virtualization is a technological innovation providing the means (multi-tenancy) to an end (profits)

 


From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Miha Ahronovitz


Sent: Sunday, April 12, 2009 12:24 PM
To: cloud-c...@googlegroups.com

Alexis Richardson

Apr 12, 2009, 5:56:58 PM4/12/09
to cloud-c...@googlegroups.com
Jim

Excellent post.

You say that:

"in a data center where 90% of the servers are running a common
application infrastructure hosting sandboxed applications, it has
nothing to offer."

You also say that:

"cloud computing is all about multi-tenancy -- the ability of multiple
applications from multiple organizations to share computing resources.
The economic model and ownership of the resources is irrelevant.
Cloud computing is about sharing."

I take it that you mean the business and revenue models for *how*
resources are shared are irrelevant. The economic model is -- surely --
"the economics of sharing". At least, that is, for "public" clouds,
to which I assume you refer.

So you seem to be saying that the (beneficial) economics of sharing
can ultimately be provided without multi-tenancy requiring
virtualization. Is that what you are saying? You give examples such
as the JVM as an alternative; i.e., that a cloud provider hosting JVMs
can deliver sharing at scale, and consequently economies of scale for
shared apps on JVMs.

Is that a correct version of your view?

alexis

Jim Starkey

Apr 13, 2009, 7:18:33 AM4/13/09
to cloud-c...@googlegroups.com
Rao Dronamraju wrote:
> Yes, it is about sharing resources, there is no doubt about it but....
>
> "The economic model and ownership of the resources is irrelevant."
>
> Not at all, it is about sharing resources PROFITABLY....otherwise, we might
> as well leave it to Fannie Mae, Freddie Mac, or Medicare and Medicaid to do
> it...
>
>
There are many economic models that can support cloud computing.
Rent-a-VM a la EC2 is only one. Utilizing existing corporate assets is
another. Google can find ad revenue anywhere, so count that as a
third. Charging for application usage as opposed to hosting is
another. There are dozens more.

There has to be an economic basis. Hosting charges are simply the most
obvious of many schemes.

Jim Starkey

Apr 13, 2009, 7:31:24 AM4/13/09
to cloud-c...@googlegroups.com
The business model and hosting technology have to be synergistic. But
that's it. Public cloud, private cloud, application specific cloud --
it's all the same. The technology doesn't care how the meters work.

Yes, I am arguing that multi-tenancy without virtualization is a better,
cheaper, more efficient technology than virtualization.

Java can be part of a solution, but it isn't sufficient by itself. An
application is more than code. An abstract application platform
requires a comprehensive packaging mechanism that includes code,
templates, any local files, and policy. Everything required, in short,
to instantiate and execute the application.

Technically, Java falls a little short. Two applications in a single
JVM can collide on class names and can interact through public class
variables. I admire Java, but separate JVMs are necessary, which takes
some of the lustre off.
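To make the collision concrete, here is a minimal sketch (all class names hypothetical) of the interference: two tenants co-resident in one JVM that share a class loaded by a common classloader can observe and clobber each other through a public static field.

// CollisionDemo.java -- cross-tenant interference via a shared static field.
public class CollisionDemo {
    // Loaded once by a common classloader; visible to every tenant in the JVM.
    static class SharedConfig {
        public static String datasourceUrl = "jdbc:default";
    }

    static class TenantA {
        static void run() { SharedConfig.datasourceUrl = "jdbc:tenant-a"; }
    }

    static class TenantB {
        static void run() {
            // Tenant B silently observes tenant A's change: one tenant has
            // both detected and affected another.
            System.out.println("B sees: " + SharedConfig.datasourceUrl);
        }
    }

    public static void main(String[] args) {
        TenantA.run();
        TenantB.run(); // prints "B sees: jdbc:tenant-a"
    }
}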

Jan Klincewicz

Apr 13, 2009, 9:23:22 AM4/13/09
to cloud-c...@googlegroups.com
Aren't JVMs "virtualization" by definition??  I am PRETTY sure that is what the "V" stands for ....
--
Cheers,
Jan

Pietrasanta, Mark

Apr 13, 2009, 10:03:05 AM4/13/09
to cloud-c...@googlegroups.com

No, JVMs are virtualization of the Java environment, not virtualization of the platform.  JVMs are interpreter engines, nothing more.

 

This underscores why everyone is calling things “cloud”, since apparently just using a word blurs the reality for most people.

 

From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Jan Klincewicz
Sent: Monday, April 13, 2009 9:23 AM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Cloud Computing and Multi-tenancy

 

Aren't JVMs "virtualization" by definition??  I am PRETTY sure that is what the "V" stands for ....

Jim Starkey

Apr 13, 2009, 10:16:01 AM4/13/09
to cloud-c...@googlegroups.com
Jan Klincewicz wrote:
> Aren't JVMs Virtualization" by definition ?? I am PRETTY sure that is
> what the "V" stands for ....
>
Not in the same sense as hypervisor-scheduled VMs. They are virtual
machines in the sense that a JVM implements the Java byte code
instruction set. A Java Virtual Machine is just a process -- or even
just a library -- running inside of an ordinary operating system.

--
Jim Starkey
President, NimbusDB, Inc.
978 526-1376

dave corley

Apr 13, 2009, 11:22:26 AM4/13/09
to cloud-c...@googlegroups.com
Interesting thread. Thanks, Jim, for the important discussion and the historical context. You said...


"in a data center where 90% of the servers are running a common
application infrastructure hosting sandboxed applications, it has
nothing to offer."

I agree. Expected max resource utilization during peak utilization periods dictates overall capex for the data center. Any demonstrated inefficiencies and the bean counters get angry. If the application shared across the aggregate platforms is common, the peak usage typically occurs at the same time for the "average" user.

An analogy is the public phone system. Phone service companies build out and deploy their application platforms (switches) and their transport infrastructure (trunks) to limit the probability of dropped calls to an acceptable, non-zero value. Because the application is the same (voice), the average-to-peak resource utilization ratio is relatively low. There is some probability that calls to Mom on Mother's Day (the peak public phone system utilization day in the US) will not be completed. But the user simply waits a few seconds and re-dials. Callers are happy enough with the service on these peak days, and the phone company bean counters can point to usage stats and dropped call rates to validate their financial models that argue against massive over-provisioning of resources to accommodate the peak.

The phone company is restricted in the capabilities of its infrastructure. The switch can only do one thing -- switch and route voice calls. Its transport infrastructure can only do one thing -- transport voice. While the phone service companies have attempted to use the same resources for data, the resource utilization, particularly for the voice switch, is a one-trick pony.

Cloud computing's unrealized benefit is the potential to host multiple applications simultaneously whose utilization profiles mix in-sync, out-of-sync, and contra-sync characteristics in the time domain. As the number and distribution of these diverse application types expands, the statistical models will predict better average-to-peak utilization ratios. So, while cloud service providers will still have to plan for peak utilization and an acceptable probability of lost service, they can do so with lower cost of infrastructure (capex) and lower cost of operations (opex). Another analogy is the diversification of personal wealth portfolios. This statistical model optimizes the economies of scale and favors large public cloud service providers or large enterprises with large, diverse private clouds.

But there's a catch: the statistical model does not account for the punctuated equilibrium effect. Catastrophic events (a 500-year flood, a Tunguska event) that are chaotic in their frequency and unpredictable in their magnitude can overwhelm the best of resource optimization planning. But if chaotic events are eliminated from the discussion, Jim's point is the crux of cloud computing's potential value: economies of scale lead to better average-to-peak utilization ratios, so capex and opex can drop if the sample size of applications in a particular cloud is large and their utilization profiles are diverse.
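To put toy numbers behind that argument: a shared cloud must provision for the peak of the summed load curve, while siloed servers must provision for the sum of the individual peaks. The sketch below uses invented load figures for three applications with offset peaks.

// UtilizationSketch.java -- illustrative numbers only.
public class UtilizationSketch {
    public static void main(String[] args) {
        // Hourly load for three hypothetical apps whose peaks do not coincide.
        double[][] apps = {
            {10, 80, 20, 10},
            {10, 20, 80, 10},
            {80, 10, 10, 20},
        };
        double sumOfPeaks = 0, peakOfSum = 0;
        for (double[] app : apps) {
            double peak = 0;
            for (double v : app) peak = Math.max(peak, v);
            sumOfPeaks += peak;  // capacity needed by per-app silos
        }
        for (int hour = 0; hour < apps[0].length; hour++) {
            double total = 0;
            for (double[] app : apps) total += app[hour];
            peakOfSum = Math.max(peakOfSum, total);  // capacity needed when shared
        }
        // Prints: silos: 240, shared: 110
        System.out.printf("silos: %.0f, shared: %.0f%n", sumOfPeaks, peakOfSum);
    }
}

With perfectly in-sync workloads the two numbers converge, which is the phone-system case above; diversity is what opens the gap.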

Dave

Tarry Singh

Apr 13, 2009, 11:58:19 AM4/13/09
to cloud-c...@googlegroups.com
Nice write-up, Jim. Do remember that virtualization means a lot of things to a lot of people. I don't want to lecture you guys here on all those forms of virtualization; I'm sure you have heard/preached enough of that yourselves.

You need to slice it at the consumer end, and for now 1-0-1 or x-0-x slicing with virtualization, representing the siloed multi-tenant version at the data center, might be helpful. But I doubt it will be the norm once apps move to the next generation and all become metered RIA/RWA apps. Then all that unnecessary and worthless encapsulation and trapping of instruction sets of unnecessary baggage (read: OS and all other crapware like AVs, patches and backup, all bundled in a virt-appliance) would be history.
--
Kind Regards,

Tarry Singh
______________________________________________________________
Founder, Avastu Blog: Research-Analysis-Ideation
"Do something with your ideas!"
Business Cell: +31630617633
Private Cell: +31629159400
LinkedIn: http://www.linkedin.com/in/tarrysingh
Blogs: http://www.ideationcloud.com

John D. Mitchell

Apr 13, 2009, 2:14:07 PM4/13/09
to cloud-c...@googlegroups.com
Hi Jim,

On Apr 13, 2009, at 04:31 , Jim Starkey wrote:
[...]


> Yes, I am arguing that multi-tenancy without virtualization is a
> better,
> cheaper, more efficient technology than virtualization.

Depends on what you're doing and what you're counting in the total cost.

I.e., virtualizing MS Windows pays off big for lots of reasons even if
it is less efficient in terms of hardware utilization.

On Unix-ish systems, I've done it both ways and there are goods and
bads to both approaches. It really depends on lots of different
tradeoffs, not the least of which is people's time, experience, and
competencies.

It's interesting to see how the old-school apps are much more often
well-behaved sharing the bare metal, while the big, fancy, and all too
over-complicated modern monstrosities tend to have so much crap baked
in that virtualization ends up being the easiest way to manage them.

> Java can be part of a solution, but it isn't sufficient by itself. An
> application is more than code. An abstract application platform
> requires a comprehensive packaging mechanism that include code,
> templates, any local files, and policy. Everything required, in
> short,
> to instantiate and execute the application.
>
> Technically, Java falls a little short. Two applications in a single
> JVM can collide on class name and can interact through public class
> variables. I admire Java, but separate JVMs are necessary, which
> takes
> some of the lustre off.

Indeed. The sillinesses involving perm-gen space and classloaders are
a real PITA. The biggest performance constraint on multi-tenant Java
applications is the GC.

Take care,
John

John D. Mitchell

Apr 13, 2009, 2:17:02 PM4/13/09
to cloud-c...@googlegroups.com
On Apr 13, 2009, at 07:03 , Pietrasanta, Mark wrote:
[...]

> JVMs are interpreter engines, nothing more.

Um, er, no... the JVM is a virtual runtime environment. Things like
HotSpot are a lot more sophisticated than just being an interpreter.

> This underscores why everyone is calling things “cloud”, since
> apparently just using a word blurs the reality for most people.

So true.

Take care,
John

Christopher Steel

Apr 13, 2009, 2:54:33 PM4/13/09
to cloud-c...@googlegroups.com
You can avoid naming issues (and should do so) by using different class
loaders within the same VM. Granted, this is not as straightforward as you
would expect it to be, but it is not rocket science either. I agree that the
GC is a large performance consideration; it always has been. On the positive
side, Java offers a rigorous security model that makes it a serious contender
in this space.
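A minimal sketch of the technique (jar paths and class names are hypothetical): giving each tenant its own classloader, with no shared application parent, means identically named classes load as distinct Class objects with distinct static state.

// LoaderIsolation.java -- per-tenant classloaders in one JVM.
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderIsolation {
    public static void main(String[] args) throws Exception {
        // null parent: delegate only to the bootstrap loader, so tenants
        // cannot see each other's classes (or the host's classpath).
        URLClassLoader tenantA = new URLClassLoader(
                new URL[] { new URL("file:/apps/tenant-a/app.jar") }, null);
        URLClassLoader tenantB = new URLClassLoader(
                new URL[] { new URL("file:/apps/tenant-b/app.jar") }, null);

        // Both jars may contain com.example.Main; each loader defines its own.
        Class<?> mainA = tenantA.loadClass("com.example.Main");
        Class<?> mainB = tenantB.loadClass("com.example.Main");
        System.out.println(mainA == mainB); // false: distinct classes, distinct statics

        Object appA = mainA.newInstance();
        mainA.getMethod("run").invoke(appA);
    }
}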
John,
I am not sure what you meant in regards to the perm-gen space
(perhaps that it is shared)?

-Chris

-----Original Message-----
From: John D. Mitchell [mailto:jdmit...@gmail.com]
Sent: Monday, April 13, 2009 2:14 PM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Cloud Computing and Multi-tenancy


Rao Dronamraju

Apr 13, 2009, 5:05:37 PM4/13/09
to cloud-c...@googlegroups.com
"Yes, I am arguing that multi-tenancy without virtualization is a better,
cheaper, more efficient technology than virtualization."

I think people are getting confused with two different cloud concepts here.

1. Virtualization
2. Multi-tenancy

Multi-tenancy is achieved through virtualization simply because
virtualization isolates the resources of one tenant from another tenant. But
the primary function of virtualization is to virtualize the resources so
that multiple entities can share those resources. It happens that in the
case of public clouds, the entities are tenants. In the case of
private clouds, the entities are probably departments and divisions of the
same company. So multi-tenancy is a feature that "happens" because of
virtualization.

Multi-tenancy can also be achieved through non-hypervisor virtualization
technologies. For instance, the Grid Computing folks have been scheduling
jobs on resources in a non-hypervisor way for 10 to 15 years.

In Y2000-2004, there were some companies that did provisioning of resources
without hypervisors, i.e., provisioning on bare metal dynamically.

But what makes the hypervisor technology far better suited for cloud
computing especially is the live migration of a running image from one
system to another -- in other words, the very foundation for elasticity
(although no live migration is necessary for storage and network resources).

Now this can be done by bare metal provisioning also. But think of the
latency involved in the bare-metal provisioning technique. You need to move
the OS image from a SAN or a disk, boot it, configure it and commission it
into the network to make it a functioning system.

With hypervisors you do not have this latency, at least not this much.
Hypervisors are pretty efficient at moving a running system, and with elastic
IP you will have a running system very quickly, seamlessly up and running
under the covers. Why does this matter?...

If you have an e-commerce business and you have thousands or hundreds of
thousands of customers performing transactions over the web/internet, and if
you need resource elasticity or failover, hypervisor-based
virtualization has a distinct performance advantage in terms of latency over
bare-metal provisioning.

I am not even sure why we are discussing the JVM in this context.

The JVM is only a CPU/instruction-set virtualizer. It is basically an
interpreter that executes byte code, which is nothing but an intermediate
language that is re-mapped to different machine architectures by the JRE,
which is specific to a machine architecture. So the JVM facilitates
platform independence for code portability.

I guess in a way you can say the JVM also facilitates live migration,
because if you move the Java byte code of an application autonomically and
dynamically, based on resource requirements, from one platform to another,
you are doing pretty much what a hypervisor does, except the purposes are
different.

With the JVM you can only live migrate applications and middleware,
whereas with hypervisors you can live migrate the entire OS and the
stack above it.

It will be interesting to hear from the Grid folks what the latency and
performance characteristics of grid applications are, especially those that
are latency sensitive, during live migration of jobs from one grid cluster
to another.





-----Original Message-----
From: cloud-c...@googlegroups.com
[mailto:cloud-c...@googlegroups.com] On Behalf Of Jim Starkey
Sent: Monday, April 13, 2009 6:31 AM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Cloud Computing and Multi-tenancy


John D. Mitchell

Apr 13, 2009, 6:47:35 PM4/13/09
to cloud-c...@googlegroups.com
On Apr 13, 2009, at 11:54 , Christopher Steel wrote:
[...]

> You can avoid naming issues (and should do so) using different class
> loaders
> within the same VM. Granted this is not as straightforward as you
> would
> expect it to be, but it is not rocket science ether.

Yes and no (I've written a few articles about it somewhere :-). There
are plenty of nasty problems with (over-)retention when one tries to get
rid of old versions (when, for instance, loading new versions). Trying
to robustly do "multi-tenants" in a single JVM is asking for trouble...

> I agree that the GC is
> a large performance consideration, it always has been.

It's not just performance. In a multi-tenant situation, a badly
behaving tenant can easily destroy all of the utility for all of the
tenants just by being a bad citizen w.r.t. GC.

> On the positive side,
> Java offers a rigorous security model that makes it a serious
> contender in
> this space.

Yes and no. There is all of the help in terms of things like
references, but by the time one takes care of all of the various
issues, you're basically right back to separate instances of the JVM.
Of course, there are lots of security issues that people don't always
(know enough to) care about, YMMV (such as timing attacks and eating up
randomness).

> I am not sure what you meant you meant in regards to the perm-gen
> space
> (perhaps that it is shared)?

Perm-gen space is indeed shared on a per-JVM-instance basis, and
it's a fixed size. There was one simple bug in a library that was
required by a library used by a framework that one of our contractors
used that ate perm-gen space, and it ended up taking forever to track
down (for a variety of reasons, including crappy tools and crappy
people). Classloader issues have also come up that can eat perm-gen
space.
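For what it's worth, the retention pattern behind that kind of leak usually looks something like this sketch (names hypothetical): any long-lived static reference to a tenant's Class pins that tenant's classloader, and every class the loader ever defined, in perm-gen.

// PermGenLeak.java -- how a cache can pin classloaders in perm-gen.
import java.util.HashMap;
import java.util.Map;

public class PermGenLeak {
    // Lives for the life of the JVM; nothing reachable from here is collectable.
    static final Map<String, Class<?>> CACHE = new HashMap<String, Class<?>>();

    static void deployTenant(ClassLoader loader, String mainClass) throws Exception {
        Class<?> cls = loader.loadClass(mainClass);
        // Caching the Class keeps its ClassLoader -- and every class that
        // loader defined -- alive, even after the tenant is "undeployed".
        CACHE.put(mainClass, cls);
    }

    static void undeployTenant(String mainClass) {
        // Forgetting this line is the leak; repeated redeploys then exhaust
        // the fixed-size perm-gen (java.lang.OutOfMemoryError: PermGen space).
        CACHE.remove(mainClass);
    }
}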

Have fun,
John

Jan Klincewicz

Apr 13, 2009, 7:00:29 PM4/13/09
to cloud-c...@googlegroups.com
We've been down the Grid vs. Hypervisor path many times here (I know Jim and I have.)  My conclusion is that Grid has some advantages, especially in that redundancy is at the software level, allowing for bare-bones (cheap) servers as a platform.

Hypervisors, on the other hand, while requiring robust hardware to accommodate failures (too late for live migrations), can handle nearly all off-the-shelf software right now.

Both Grid and hypervisor-based VMs seem to be able to abstract from (or nearly ignore) hardware pretty well, which is a boon to rapid deployments (because drivers cease to be a huge issue.)  "Bare Metal" is a lot easier on a vanilla VM than on a proprietary server with vendor-supplied drivers.

Grids with which I am familiar (granted, from the old days of Beowulf clusters) tended to run mostly diskless, with master nodes spawning images over the wire to slaves (which made them pretty elastic.)  VMs can be deployed (and re-deployed) pretty quickly by cloning but still require boot time similar to physical boxes.

I suspect both approaches will have their respective places in CC near-term.  Both seem well-suited for multi-tenancy as well as elasticity.
--
Cheers,
Jan

John D. Mitchell

Apr 13, 2009, 7:12:36 PM4/13/09
to cloud-c...@googlegroups.com
On Apr 13, 2009, at 14:05 , Rao Dronamraju wrote:
[...]

> "Yes, I am arguing that multi-tenancy without virtualization is a
> better,
> cheaper, more efficient technology than virtualization."
>
> I think people are getting confused with two different cloud
> concepts here.
>
> 1. Virtualization
> 2. Multi-tenancy
>
> Multi-tenancy is achieved through virtualization simply because
> virtualization isolates the resources of one tenant from another
> tenant. But
> the primary function of virtualization is to virtualize the
> resources so
> that multiple entities can share those resources. It happens that in
> the
> case of public clouds, the entity happens to be tenants. In the case
> of
> private clouds the entities are probably departments and divisions
> of the
> same company. So multi-tenancy is a feature that "happens" because of
> virtualization.

OS instance virtualization is simplistic, and so it works well for
humans, at the cost of performance and hardware efficiency.

> Multi-tenancy can also be achieved through non-hypervisor
> virtualization
> technologies. For instance, the Grid Computing folks have been
> scheduling
> jobs on resources in a non-hypervisor way for over 10 to 15 years.

And process migration has been around for a lot longer in various
guises. Hint: you might want to check out some of the names on the old
Sprite OS papers from the 80's and 90's. :-)

> In Y2000-2004, there were some companies that did provisioning of
> resources
> without hypervisors, i.e provisioning on bare-metal dynamically.
>
> But what makes the hypervisor technology far better suited for cloud
> computing especially is the live migration of a running image from one
> system to another, in other words the very foundation for elasticity.
> (although no live migration is necessary for storage and network
> resources).

Sprite OS did that in the late 80's, and it wasn't the only system to
allow for that. Plan 9 did too, but IIRC it was more in the early 90's.

> Now this can be done by bare metal provisioning also. But think of the
> latency involved in the bare-metal provisioning technique. You need
> to move
> the OS image from a SAN or a disk, boot it, configure it and
> commission it
> into the network to make it a functioning system.
>
> With hypervisors you do not have this latency, not at least this
> much. The
> Hypervisors are pretty efficient in moving a running system and with
> elastic
> IP you will have a running system very quickly, seamlessly up and
> running
> under the covers. Why does this matter?...
>
> If you have an e-commerce business and you have thousands or
> hundreds of
> thousands of customers performing transactions over the web/internet
> and if
> you need resource elasticity or failover, the hypervisor based
> virtualization has a distinct performance advantage in terms of
> latency over
> the bare-metal provisioning.

You seem to be comparing different parts of different things: the
bare-metal provisioning of, e.g., the entire OS plus applications with the
provisioning and movement of applications on top of a base OS, and with
OS-virtual-image provisioning (on top of a base/host OS).

With a good base OS, you can "provision" an app on top of it as fast,
if not a lot faster, and just as easily as doing an OS-virtual-image
(since all that is, really, is just another application). Using
chroot jails/containers/etc. for stricter isolation is old, mature
technology, and the performance can be quite good (especially w.r.t.
the heavier-weight virtualization approaches).

As I've noted before, I do agree that if you have huge, brittle
"stacks" of libraries, frameworks, and applications, virtualizing the
entire OS is a good way to manage that complexity. But it's precisely
because it's a complexity management tool that I'd use it for that
situation.


FWIW, just as we learned 20 years ago with Sprite, process migration
is sexy, but it's not as useful as people think and it has real costs.
I.e., it's not a silver bullet. Process/instance migration is a big
deal to the people selling virtualization as THE approach, precisely
because they sell the notion that physical machines should be run with
lots of virtual instances and then we (or the system, magically) will
move processes/images around for us when the load spikes. As with any
distributed systems problem, that works great in some cases and
horribly in others.

Take care,
John

Rao Dronamraju

Apr 13, 2009, 8:23:03 PM4/13/09
to cloud-c...@googlegroups.com

"And process migration has been around for a lot longer in various
guises. Hint: you might want to check out some of the names on the old
Sprite OS papers from the 80's and 90's. :-)"

Yes, process migration has been around for quite some time, and I remember
working on process migration in the late 80s or early 90s. In fact, if I
remember right, there was TCF (Transparent Computing Facility) that tried
process migration.

But process migration never progressed far enough to be compared with
hypervisor-based migration. The technological success of hypervisor-based
migration far exceeds any techniques tried before. The reason, if I
remember right, that process migration did not work very well at that time
was that the process state was too difficult to capture/handle during the
migration. But hypervisors have the advantage of capturing all the process
state, as they transfer the entire OS address space, and they do it very
gracefully.

"FWIW, just as we learned 20 years ago with Sprite, process migration
is sexy but it's not as useful as people think and it has real costs.
I.e., it's not a silver bullet. Process/instance migration is a big
deal to the people selling virtualization as THE approach precisely
because they sell the notion that physical machines should be run with
lots of virtual instances and then we (or the system will magically)
move processes/images around for us when the load spikes. As with any
distributed sytems problem, that works great in some cases and
horribly in others."

I agree with you, but the hypervisor technology has progressed and matured
quite a bit, although it has never been tried in internet/web-scale
migrations.



-----Original Message-----
From: cloud-c...@googlegroups.com
[mailto:cloud-c...@googlegroups.com] On Behalf Of John D. Mitchell
Sent: Monday, April 13, 2009 6:13 PM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Cloud Computing and Multi-tenancy


Jim Houghton

Apr 13, 2009, 8:47:09 PM4/13/09
to cloud-c...@googlegroups.com
Rao,

To answer your questions comparing grids to hypervisor-based
virtualization...the short answer is you have fundamentally different
computing architectures in play and there is no equivalent to 'live
migration'.

With hypervisor systems you essentially have 1 application to 1 partition,
along with n other partitions on a server. If the server fails, if you want
to rebalance workload, or if you need to do maintenance, you clone that
partition to another server and shut down the original partition. If you've
done everything correctly and the software performs as it should, you have a
minimal blip in availability.

With grids it is completely different; you have some intelligence about the
various workloads and the resources and what you're allowed to do with
what/when (policies), and then you have many, many 'dumb' servers doing
whatever the intelligence tells them to do. [Note: I'm being very
non-specific with 'intelligence' because depending on your choice of
software it can be very simple deterministic scheduling or highly
sophisticated where it dynamically reacts to changes (sudden abnormal
workload spike, or an outage reducing the worker pool by half so it needs to
re-prioritize all work).]

When most people think of grid, they think of HPC applications such as crash
test simulators, or seismic analysis, or gene sequencing, or risk
modeling... the point is they are not stateful workloads. Work is sent to the
grid, the intelligence figures out the best place to run it, and then it
gets done... generally across many servers. If an individual server fails,
its task is restarted on another server, and unless you happen to be
poring through the logs you would never know that the work 'migrated'
during execution. The consideration that typically needs to be made about
shifting work around is network constraints/data affinities (don't ship a
job from NYC to London if every compute node requires a 50MB input file
before executing a 60 second task), or whether a particular workload needs a
certain set of libraries. Those things are typically handled by the policies
mentioned earlier.
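As a toy illustration (the node attributes, thresholds, and the policy itself are invented for the example), that data-affinity check boils down to something like:

// AffinityScheduler.java -- a toy data-affinity policy, not a real grid scheduler.
import java.util.List;

public class AffinityScheduler {
    static class Task {
        String dataSite;     // where the input file lives
        long inputBytes;     // size of the input
        long estRuntimeSecs; // expected execution time
    }

    static class Node {
        String site;
        boolean idle;
    }

    // Prefer an idle node at the data's site; only go remote when the task
    // runs long enough (or the input is small enough) to amortize the transfer.
    static Node pick(Task t, List<Node> nodes) {
        for (Node n : nodes)
            if (n.idle && n.site.equals(t.dataSite)) return n;
        boolean worthShipping = t.estRuntimeSecs > 300 || t.inputBytes < 1000000;
        if (worthShipping)
            for (Node n : nodes)
                if (n.idle) return n;
        return null; // queue the task until a suitable node frees up
    }
}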

Now, lest I perpetuate the common (and untrue) myth that applications have to
be custom written to take advantage of grids, permit me to expand the
thinking to non-HPC workloads that can run very nicely in a grid
environment. These include, but are not limited to, print services, file
services, image conversion services ... things that may at times require a
lot of servers but most of the time require very few. These are excellent
'filler' applications ... things that aren't necessarily parallel in nature
but consume a lot of servers in a typical enterprise. That grid
'intelligence' can appropriately prioritize those services, spin them up/down
on more or fewer servers as demand warrants, and basically allow you to
significantly reduce your server count (a pool of 100 servers can do a lot
more work than 10 pools of 10 servers -- so you can either take on more work
or cut your server footprint).

So, to conclude this lengthy monologue (sorry), let me make my perspective
clear: I happen to know a few things about grids, but I am not saying one
is better than the other. They are different, and each should be used under
the conditions that best suit its strengths. Any good carpenter has more
than one saw ... you wouldn't want to use a jigsaw to frame a house.


Jim

Rao Dronamraju

Apr 13, 2009, 10:13:26 PM4/13/09
to cloud-c...@googlegroups.com

No, I am not talking in the context of Beowulf clusters or anything else…

I am talking more specifically w.r.t. OGSA, Globus, SGE, LSF, PBS, Condor, etc., and their way of handling job migrations and scheduling.

Based on my memory, most of these grid systems were batch oriented, and batch orientation generally precludes the low-latency user interaction you see with web/internet applications. But since I am not completely knowledgeable about the wide spectrum of applications used in the grid area, I thought there might be some instances of grid applications that have such requirements.

 

 


John D. Mitchell

Apr 13, 2009, 11:06:35 PM4/13/09
to cloud-c...@googlegroups.com
On Apr 13, 2009, at 17:23 , Rao Dronamraju wrote:
[...]
> "And process migration has been around for a lot longer in various
> guises. Hint: you might want to check out some of the names on the old
> Sprite OS papers from the 80's and 90's. :-)"
>
> Yes, process migration has been around for quite some time and I
> remember
> working on process migration in late 80s or early 90s. Infact, if I
> remember
> right, there was TCF (Transparent Computing Facility) that tried
> process
> migration.
>
> But process migration never progressed far enough to be compared with
> hypervisor based migration. The technological success of hypervisor
> based
> migration is far exceeds any techniques tried before. The reason, if I
> remember right, for process migration to not work very well at that
> time as
> the process state was too difficult to capture/handle during the
> migration
> process at that time. But the hypervisors have the advantage of
> capturing
> all the process(s) state as they are transferring the entire OS
> address
> space and they do it very gracefully.

You should definitely go back and look at e.g., Sprite and Plan 9.
Both did it quite well -- they were after all the actual OS doing the
work and so had all of the information/hooks/etc. to do that.

As with many things, the biggest factor was that the (relatively
inexpensive) hardware just wasn't as powerful back then and so running
lots of jobs concurrently really impacted the local/other user of a
machine.

VMWare, for example, partly grew out of the desire to bring a lot of
the cool and powerful capabilities discovered in Sprite to the masses
and for general X (client) on Y (host) combinations.

Take care,
John

Rao Dronamraju

Apr 14, 2009, 12:19:42 AM4/14/09
to cloud-c...@googlegroups.com

Jim

Thanks for the excellent explanation about grids.

"Now lest I persist the common (and untrue) myth that applications have to
be custom written to take advantage of grids, permit me to expand the
thinking about non-HPC workloads that can run very nicely in a grid
environment. These include, but are not limited to, print services, file
services, image conversion services ... things that may at times require a
lot of servers but most of the time require very few. These are excellent
'filler' applications ... things that aren't necessarily parallel in nature
but consume a lot of servers in a typical enterprise. That grid
'intelligence' can appropriately prioritize those services, spin it up/down
on more or less servers as demand warrants and basically allow you to
significantly reduce your server count (a pool of 100 servers can do a lot
more work than 10 pools of 10 servers - so you can either take on more work
or cut your server footprint)."

I was thinking about the type of workloads that you mentioned above. In the
case of such non-parallel, non-HPC workloads, if the cluster workload reaches
the max for that cluster, how does the scheduler re-direct the incoming work
to other clusters? Does it instantiate an instance of the server on another
cluster? Or is it a static set of servers that have been pre-configured,
with no dynamic capability? And also, if a server in a cluster fails, how
does the failover happen? Is it as in the traditional "heart-beat" failover,
or some kind of dynamic instantiation of an instance again?

Matthew Zito

Apr 14, 2009, 12:44:15 AM4/14/09
to cloud-c...@googlegroups.com

There are a couple of workload distribution models, depending on the nature of the application.  If the application is low-overhead when idle, then you just run instances of the application on all nodes in the grid, and it just doesn't use a significant amount of resources when there's no work to be done.  In other scenarios, say if an app preallocates a significant amount of RAM for its own nefarious purposes on startup, then the scheduler will look at the current workload, see that there's a need for additional horsepower, and only then fire up instances of the app on additional servers.

I think it's important that you not unnecessarily conflate "cluster" with "grid".  In a traditional grid environment with plain vanilla apps, the failure of a node is handled by another node getting assigned that node's workload.  It is not uncommon that the failure event requires a restart from a checkpoint or from the beginning of a job; hence failures really are "data loss events" -- it's just that you can start from scratch.  For example, you might have a batch job to convert 200,000 Word documents to PDFs, and 100 servers.  Each server gets 2,000 items to convert.  You can either have checkpointing after every conversion, after a set of conversions, or no checkpointing.  If node 46 fails after document 1,587, the scheduler will add the appropriate number of conversions to the queues of the surviving nodes, or add another node.
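A toy model of that requeue path (names hypothetical; checkpointing shown at per-document granularity): when a node dies, everything after its last checkpoint simply goes back onto the surviving nodes' queues.

// RequeueSketch.java -- redistribute a failed node's unfinished conversions.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class RequeueSketch {
    static class NodeQueue {
        final Queue<Integer> docs = new ArrayDeque<Integer>(); // pending doc ids
        int lastCheckpointed = -1; // highest doc id known converted; work up to
                                   // this point is never redone
    }

    // On failure, the dead node's remaining queue is dealt round-robin to the
    // survivors (a real scheduler could instead spin up a fresh node).
    static void handleFailure(NodeQueue dead, List<NodeQueue> survivors) {
        List<Integer> orphaned = new ArrayList<Integer>(dead.docs);
        dead.docs.clear();
        for (int i = 0; i < orphaned.size(); i++)
            survivors.get(i % survivors.size()).docs.add(orphaned.get(i));
    }
}

With no checkpointing at all, the orphaned list would be the node's entire original assignment rather than just the unfinished tail.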

Generally, the reasons people partition their grids are business/compliance/security/administrative rather than technical.  One business unit is concerned about sharing the same OS images as another business unit, or they can't agree on the scheduling priorities, or chargeback, etc.

Incidentally, you see the same problems with virtualization, in terms of sharing resources - nice to know that some problems remain constant.

Thanks,
Matt



--
Matthew Zito
Chief Scientist
GridApp Systems
P: 646-452-4090
mz...@gridapp.com
http://www.gridapp.com

Jim Houghton

Apr 14, 2009, 11:24:57 AM4/14/09
to cloud-c...@googlegroups.com

Matt, thanks for jumping in … permit me to add a few more points:

 

Realizing I run the risk of rekindling a definition debate far older than “what is a cloud”, it is important to reiterate Matt’s point that grids are not necessarily clusters.  Clusters can be part of the resource pool for grids (makes sense for optimization purposes), but TYPICALLY when you talk about clusters it’s a pool of identical hardware resources, whereas when you talk about grids you’re talking about the software layer that controls the execution of work across that pool.  There is no requirement for all nodes to be identical in a grid environment; the intelligence that controls workload distribution just needs to be aware of the attributes of each resource.  In that manner, if you understand the requirements of the workload (data affinities, must-complete-by, relative priority, prerequisites, etc.) and you understand the available resources, you can dynamically allocate work to the resource that can best complete the work at that point in time.


This goes to your question below – if something fails, the work can be restarted elsewhere, but the work that was underway on that node must start over from scratch.  Matt references checkpointing; if you have jobs that are going to take a long time to complete, that makes sense, but if you have a bunch of short-lived tasks it’s not worth the delay (while results are written) to have a checkpoint.  Of course there are other tricks that my buddies selling Coherence (or similar products from GigaSpaces, Gemstone, etc.) can teach you to mitigate that delay.  For long-running services, as I mentioned previously, your guess is correct – typically the ‘intelligence’ will spawn a new instance of that service on another node and it will take over the work.  Best practice for deployment of those types of services is to always have at least 2 instances running, so in general a failure is about as noticeable as what a user would experience with a hypervisor-based live migration.

 

Now, before Jan and others jump on me … true cluster systems (IBM HACMP, etc) handle failures at the hardware level … none of this icky software in the way.  A failure of a node is 100% transparent to the end user.  But you will pay a lot for that level of availability, so I’ll close this the same way as my previous post … use the right tool (architecture) for the right situation.

 

Jim

Kevin Apte

Apr 14, 2009, 11:45:31 AM4/14/09
to cloud-c...@googlegroups.com
Virtualization using VMware or Xen allows isolation of system resources between different virtual machines. JVMs run in the same OS and can potentially impact each other by causing crashes at the kernel level, or by starving each other of system resources like memory, CPU, file handles, network bandwidth, etc.

Kevin

Jan Klincewicz

Apr 14, 2009, 11:52:22 AM4/14/09
to cloud-c...@googlegroups.com
Well, I may as well jump on you anyway (but I promise to be gentle.)  There ARE a few software solutions out there that can mirror Windows instances across two identical machines (or VMs) so that one instance of the workload appears downstream: Marathon, Quorum, Stratus (which had a cloudy name before it was popular!)

Of course, this does require 2x the hardware (but this is somewhat mitigated when you can lockstep multiple VMs.)

The downside is that only single-CPU instances are protected at the full system level.
--
Cheers,
Jan

Randall Minter

Apr 14, 2009, 1:02:59 PM4/14/09
to cloud-c...@googlegroups.com
What happens when an application needs to scale beyond the limits of the physical server? What happens to the other applications on a machine if one of the VMs needs to go from 8GB of RAM to 16GB of RAM and only 4GB of physical memory are available? How long does this take, and can it be automated?

These are the immediate questions that come to mind when comparing virtualization and grid environments. If the above scenario happens in a grid, you just add a few more servers to the cluster. In the status quo server world, scaling beyond a single instance requires a lot more work by humans. I'm not sure whether the process is different in a VM world or not; that's why I ask.

- Randall

Christopher Steel

Apr 14, 2009, 3:19:42 PM4/14/09
to cloud-c...@googlegroups.com

 

 

From: Kevin Apte [mailto:technicalar...@gmail.com]
Sent: Tuesday, April 14, 2009 11:46 AM
To: cloud-c...@googlegroups.com
Subject: [ Cloud Computing ] Re: Cloud Computing and Multi-tenancy

 

Virtualization using VMWare or Xen allows isolation of system resources between different  Virtual Machines. JVMs run in the same OS, and can potentially impact each other by causing crashes at the kernel level, causing each other to get starved of system resources like Memory, CPU, File Handles, Network bandwidth etc.


Kevin

On Mon, Apr 13, 2009 at 6:47 PM, John D. Mitchell <jdmit...@gmail.com> wrote:


On Apr 13, 2009, at 11:54 , Christopher Steel wrote:
[...]

> You can avoid naming issues (and should do so) using different class
> loaders
> within the same VM. Granted this is not as straightforward as you
> would
> expect it to be, but it is not rocket science ether.

Yes and no (I've written a few articles about it somewhere :-).  There
are plenty of nasty problems in (over-)retention when one tries to get
rid of old versions (when, for instance loading new versions).  Trying
to robustly do "multi-tenants" in a single JVM is asking for trouble...

We have had a lot of success with a custom classloader that handled versioning without trouble. I recall that there were a significant number of initial hurdles, but we have not had any trouble since.


> I agree that the GC is
> a large performance consideration, it always has been.

It's not just performance.  In a multi-tenant situation, a badly
behaving tenant can easily destroy all of the utility for all of the
tenants just by being a bad citizen w.r.t. to GC.

That’s very true, but it is no different than a runaway process hogging CPU. And since most VM installations out there do not have assigned CPUs, a runaway process in one VM can affect processes in other VMs on the same box/blade.


> On the positive side,
> Java offers a rigorous security model that makes it a serious
> contender in
> this space.

Yes and no.  There is all of the help in terms of things like
references but by the time one takes care of all of the various
issues, you're basically right back to separate instances of the JVM.
Of course, there are lots of security issues that people don't always
(know enough to) care about YMMV (such as timing attacks and eating up
randomness).

I am not sure which timing attacks are specific to the Java security model, and I do agree you need a lot of work to take care of the various issues, but I disagree that you are back to separate instances of the JVM.


> I am not sure what you meant you meant in regards to the perm-gen
> space
> (perhaps that it is shared)?

Perm-gen space is indeed shared on a per-instance of the JVM basis and
it's a fixed size. There was one simple bug in a library that was
required by a library used by a framework that one of our contractors
used that ate perm-gen space and it ended up taking forever to track
down (for a variety of reasons including crappy tools and crappy
people).  Classloader issues have also come up that can eat perm-gen
space.

Anybody can create bad code that eats up memory, whether in a JVM or in an OS instance. With Java, you have to have an understanding of the heap space, how it is allocated, and what the memory requirements are of the applications that will run within a VM. This is no different than understanding the memory requirements when sizing a VM or even a box with a single OS instance.

At the end of the day, I still believe the JVM provides significant resource cost savings over an OS-hosted VM instance.

 

-Chris



Have fun,

John




dave corley

Apr 14, 2009, 1:45:26 PM4/14/09