I would like to hear from others what effect the "Cloud Chip"
will have on Virtualization and Cloud Computing...
Thanks,
GregO
Moore’s Law: The Future of Cloud Computing from the Bottom Up
I'm a serial entrepreneurial leader. It's an art/science, left/right
brain thing. I have to say that one of the most challenging parts of
creating a compelling strategy, leading a company or building products
is getting people to see the possibilities, transitions and tipping
points. Imagineering the future calls me to look back at what made
companies great -- specifically, how they capitalized on paradigm
shifts while the rest missed them. Reading the recent bestseller,
Outliers, it struck me that, not only do you have to be smart, but you
have to be in the right place with the experience to see and grab the
brass ring.
Moore's Law is one of those history lessons that has traditionally
served as a touchpoint pointing the way to the future. Simply put,
Moore's law describes a long-term trend in the history of computing
hardware, in which the number of transistors that can be placed
inexpensively on an integrated circuit has doubled approximately every
two years.
Translation: compute power has reliably doubled at a decreased cost
every two years.
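To put numbers on that, here is an idealized model (real hardware only approximates a clean two-year doubling):

```python
# Idealized Moore's Law: transistor count doubles every two years.
def moores_law(base_count, years, doubling_period=2):
    """Projected count after `years` of doubling every `doubling_period` years."""
    return base_count * 2 ** (years / doubling_period)

# Ten years of ideal doubling is a 32x increase in transistor budget.
growth_10y = moores_law(1, 10)
```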
In a recent announcement, Intel gave a glimpse of what the future will
look like. The "Cloud" chip will have 48 cores, is available to
Intel's ISV partners today and will be shipping in volume in less than
18 months. The quote from the Intel dude stated that it will increase
the power of what is available today by 10-20 times. Oh my.... Buckle
your seatbelt .... Moore's law just took a giant step up the paradigm.
<the rest at http://blog.appzero.com/ >
GregO wrote:
> Below is a snippet from my latest blog..
>
> I would like to hear from others what the effect of the "Cloud Chip"
> will have on Virtualization and Cloud Computing...
>
> Thanks,
> GregO
>
> Moore's Law: The Future of Cloud Computing from the Bottom Up
--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376
I'm skeptical, very skeptical. I see the system cost of a large number of cores -- memory contention and memory bandwidth contention -- but I don't see the benefit unless there is an application that needs memory shared among a large number of threads. 48 cores with 36 stalled waiting on the memory controller doesn't strike me as a good architecture.
In the absence of such an application (and one that doesn't also require scale-out), a more useful configuration is a server sled -- single board, single power supply, on board switch but a half dozen servers each with dedicated memory and maybe a dedicated local disk. James Hamilton has written about these. A sled gives the server density of a massive-core system without the memory contention, and is probably cheaper.
Intel, I think, is pushing what they think they know how to build. Whether there is any market pull for this, I don't know, but I doubt it.
The future doesn't belong to scale-up (bigger, faster machines) but to scale-out (more, cheaper machines). Maybe Intel is just looking for their next boat to miss, but cloud computing will always be happier with more, cheaper machines (Jan will insist on a high-powered logo on each one, though).
--
~~~~~
Register Today for Cloud Slam 2010 at official website - http://cloudslam10.com
Posting guidelines: http://groups.google.ca/group/cloud-computing/web/frequently-asked-questions
Follow us on Twitter http://twitter.com/cloudcomp_group or @cloudcomp_group
Post Job/Resume at http://cloudjobs.net
Buy 88 conference sessions and panels on cloud computing on DVD at http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B002H0IW1U or get instant access to downloadable versions at http://cloudslam09.com/content/registration-5.html
~~~~~
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
To post to this group, send email to cloud-c...@googlegroups.com
To unsubscribe from this group, send email to cloud-computi...@googlegroups.com
Something like the Pregel model from Google... Here is the model I'm thinking of:
http://horicky.blogspot.com/2010/02/nosql-graphdb.html
Rgds,
Ricky
Are you sure about this statistic? With millions of web servers deployed
in the last 10 to 15 years during the internet/web era, and given my
understanding that the majority of these web servers are Linux/Apache/Tomcat
servers, does MS really have such a whopping 80-to-15 edge over Linux? Even if
you include application and database servers, the claimed ratio seems heavily
skewed in favor of MS. Also, most DNS, DHCP, etc. run on Linux more than MS.
About the 48 cores and making use of them, I agree that there are two models
for fully utilizing the cores: 1. the load-balancing model and 2. the
parallelism model. The LB model is easy: all you have to do is make sure the
scheduler keeps the cores fully scheduled. The parallelism model is more
difficult, as we all know that parallelization at this scale is difficult from
many different perspectives.
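The LB model can be sketched in a few lines, assuming fully independent workloads (the easy case; this greedy sketch ignores preemption, caches and NUMA, which real schedulers must handle):

```python
import heapq

# Load-balancing model: place each independent workload on the currently
# least-loaded core, so the scheduler keeps every core busy.
def balance(workloads, n_cores):
    """Returns total assigned load per core."""
    cores = [0.0] * n_cores
    heap = [(0.0, i) for i in range(n_cores)]  # (load, core index)
    heapq.heapify(heap)
    for w in sorted(workloads, reverse=True):  # largest first
        load, i = heapq.heappop(heap)
        cores[i] = load + w
        heapq.heappush(heap, (cores[i], i))
    return cores
```

With enough small independent tasks the per-core load evens out on its own; the parallelism model is the hard case, where a single task must be split across cores.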
It appears to me that you are talking about Solaris Containers for MS
Windows, is that right?
I think the biggest challenge would be not only leveraging the cores through
parallelization techniques but also managing such mega sprawl (millions of
VMs). If a cloud has 10,000 servers and each one hosts 240 VMs, that is
2,400,000 VMs in a 10,000-server data center. A 10,000-server data center is
not considered a big cloud, based on what was being built: MS with a
500,000-server cloud in San Antonio and/or Chicago, Amazon already close to
50,000 I heard, and Rackspace 25,000. So managing all these millions of
VMs is going to be a challenge of galactic proportions.
I disagree with your prediction about MS Windows being at 75%. I think Linux
will be the dominant VM OS in the clouds. In fact, it is my guess that the
reason MS is building its own mega data centers is that it knows most clouds
might use Linux, and hence it needs to drive the MS Windows-based clouds
itself.
My prediction is that 2012 is when public clouds will see major growth, as by
then the industry will have solved the security and management problems that
we are seeing today.
Thanks,
GregO
--
This stat ignores the RISC server market, which is roughly 50% AIX and
25/25 for each of Solaris and HP-UX. But the unit shipments of these
servers are far below x86 servers.
As for the OSes/hypervisors running in cloud compute centers, I would
agree that the majority of them will be Linux/VMware with some Hyper-V
and Citrix. Not a great amount of native Windows, but never
underestimate Microsoft's ability to take market share as time goes by.
Rob
Robert Peglar
Vice President, Technology, Storage Systems Group
Xiotech Corporation | Toll-Free: 866.472.6764
o 952 983 2287 m 314 308 6983 f 636 532 0828
Robert...@xiotech.com | www.xiotech.com
@Erik,
Good post. I agree on your point about running VM farms using shared storage (SAN) – but alas, some folks insist on running VM farms on DAS, believing it to be cheaper. It’s not, over the long term, but the perception persists. Also, those folks would rather transfer risk to the user, i.e. VM outages, than construct optimal farms.
As for cheap whiteboxes replacing mainframes, it hasn’t happened. The venerable machine is still around, still running, still a terrific platform for running tons of Linux instances, for example. It won’t go away, in my lifetime at least.
One thing, though; VM failover isn’t clustering, it’s failover. The difference is simple; one instance which can move between N platforms, versus N instances (on N platforms) communicating with each other and managing shared resources. Both are HA tactics, but the method is different.
Rob
There are plenty of VMs deployed w/o shared storage, absolutely. This has led to data movers such as VMware SRM, which moves the data for the VM from one disk (or array) to another. Without this piece, VM failover would not be possible. The mere existence of s/w bits like this indicates the demand for time/space tradeoffs, sadly.
Rob
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Jan Klincewicz
Sent: Wednesday, February 10, 2010 8:22 AM
To: cloud-c...@googlegroups.com
Subject: Re: [ Cloud Computing ] Moore's Law: The Future of Cloud Computing from the Bottom Up
Rob:
Clustering is defined differently in the Windows world than in the Linux world. I'd guess the majority of Windows clusters are two-node, active-passive. I understand the distinction, but unfortunately the definition evolves to reflect whatever the marketers can get the public to accept. As even a 2-node cluster requires a quorum disk, shared storage is a must. Do you really see many users deploying VMs without at least a NAS? I have seen a few "one-offs", but you pretty much give up all the great benefits of virtualization when you skip a common storage platform.
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Erik Sliman
Sent: Tuesday, February 09, 2010 9:49 PM
To: cloud-c...@googlegroups.com
Subject: Re: [ Cloud Computing ] Moore's Law: The Future of Cloud Computing from the Bottom Up
Jim and Jan,
Your points are all valid, and are primary concerns. I understand the concerns with memory contention and failover. Given that this was once an argument for why cheap Wintel boxes could never displace the mainframe, my gut tells me that these high-density multi-core chips do have a real chance in the cloud, if for no other reason than their ability to save space and reduce power consumption. So, let's play devil's advocate with the concerns.
Memory contention: imagine 48 cores and 12-channel memory. What is the REAL % of wait time for the threads? It seems potentially contentious, but without intricate knowledge of what % of a core's time requires an open memory channel, I don't really know how contentious it will really be, particularly considering potential core idle time.
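One back-of-envelope stab at that question, under the purely assumed simplification that demand on a channel is just the sum over the cores sharing it (no queueing, caching or prefetch effects):

```python
# 48 cores sharing 12 memory channels: offered load per channel if each
# core wants a channel a fraction f of the time. Crude model by assumption.
def channel_utilization(n_cores=48, n_channels=12, f=0.10):
    cores_per_channel = n_cores / n_channels  # 4 cores per channel
    return cores_per_channel * f

# f = 10% gives 0.4 offered load per channel (comfortable);
# f = 25% hits 1.0, the point where every extra request queues.
```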
Failover: Why does a VM have to depend on a CPU housing for this? Let's assume it is using a fiber-optic SAN and the memory state of a VM requiring high availability is mirrored. Why can't the VM fail over to another box if the box it is in fails? This is, fundamentally, the type of clustering that made cheap hardware capable of being used to build data centers and supercomputers.
Xen does it
http://sheepy.org/node/65
Erik
OpenStandards.net <http://openstandards.net/>
--
Cheers,
Jan
--
~~~~~
Register Today for Cloud Slam 2010 at official website - http://cloudslam10.com
Posting guidelines: http://groups.google.ca/group/cloud-computing/web/frequently-asked-questions
Follow us on Twitter http://twitter.com/cloudcomp_group or @cloudcomp_group
Post Job/Resume at http://cloudjobs.net
Buy 88 conference sessions and panels on cloud computing on DVD at
http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B002H0IW1U or get instant access to downloadable versions at http://cloudslam09.com/content/registration-5.html
~~~~~
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
To post to this group, send email to cloud-c...@googlegroups.com
To unsubscribe from this group, send email to cloud-computi...@googlegroups.com
"* Microsoft Windows server revenue was $4.5 billion in 3Q09 showing a 12.8%
year-over-year decline and comprising 43.0% of all server revenue in the
quarter. Windows servers account for the single largest segment, by
operating system, in the worldwide server market.
* Linux server revenue declined 12.6% year over year to $1.5 billion in the
quarter. Linux servers now represent 14.8% of all server revenue, up
slightly from 14.0% a year ago."
http://www.idg.com/www/pr.nsf/0/02732ED3B5E320328525768000659F41
Although the numbers are quarterly, year over year, the overall annual ratio
would be close.
--
OK, it's not SRM (Site Recovery Manager) you're talking about; it's Storage VMotion.
Rob
Darren, you have the history correct, yes. But remember that many VMs on VMFS 2 were on direct-attached storage, hence the DMotion utility to move them along. So the tool really has multiple uses.
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Darren Sykes
Sent: Wednesday, February 10, 2010 9:51 AM
To: cloud-c...@googlegroups.com
Subject: RE: [ Cloud Computing ] Moore's Law: The Future of Cloud Computing from the Bottom Up
My understanding was that the VMware technology to move running machines between storage devices was a direct result of the work they put in to allow their customers to move from VMFS 2 to 3. At the time it was called DMotion (D for data) internally at VMware, but it was renamed when they decided to make it a production feature.
We (and most people I've spoken to about this) use the feature when reorganizing data or moving between storage platforms. While I suppose it would be useful for DAS VMware environments, I'm certain that's not why it was originally conceived.
I agree, but in the absence of real numbers one can derive some inference.
Since Linux is free, the cost (license + service) per unit (server) of Linux should be lower than the cost (license + service) per Windows server, should it not?
If the cost per unit is lower, the unit counts must be higher for the same revenue. That should mean there are more Linux VMs in deployment (those directly attributable to revenue plus those installed via free downloads).
So while the revenue share is 15+% for Linux and 43% for Windows, I would think the deployed-unit ratio could be something like 30% to 45%.
But that is my guesstimate, as I have not been keeping track of Linux vs. Windows (deployed) market share for 3+ years.
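That inference can be made explicit; the 3x price ratio below is purely an assumed figure for illustration, not measured data:

```python
# Revenue share -> implied unit share, given an assumed per-unit price ratio.
def implied_units(revenue_share, price_per_unit):
    return revenue_share / price_per_unit

linux_units = implied_units(0.15, 1.0)    # Linux: cheap per-unit cost (baseline)
windows_units = implied_units(0.43, 3.0)  # Windows: assumed ~3x per-unit cost
# At a 3x price gap the implied unit counts come out roughly even -- and
# paid-license revenue ignores free Linux downloads entirely.
```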
I echo Jim's comment that it's engineers saying "here's something
we know how to do" ... but I note that "know" is limited,
since it's a lab prototype, not a product. So, OK, they got their
QuickConnect multilink memory interconnect, so now they can string
cores out to as many as they can cram on as big a chip as they can
make. Good test of the cache coherence algorithms.
After that, they say "here are a random bunch of hot topics we think
maybe it might be good for," topics chosen by hardware engineers and
not software guys.
Time will tell whether anybody uses it for anything like what they
speculate. At least this time they're not saying it will enable brain
simulation or something like that.
I'd want to know the memory latency and bandwidth, and the IO latency
and bandwidth, before saying it's good for anything at all.
And I just *love* their equation of 48-way parallel performance to 2-
or 4-way parallel performance, undoubtedly achieved by just multiplying
by the number of processors. Perfect scaling, anybody? Out to 48
cores? ROTFLMAO.
(See my recent Perils of Parallel post about Larrabee memory
bandwidth. Do you want to write code that will be memory starved if on
average it accesses more than 1 byte per instruction?)
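That starvation threshold is easy to compute; the bandwidth and clock figures below are illustrative, not Intel's published specs:

```python
# Sustainable bytes per instruction given aggregate memory bandwidth.
# Illustrative figures: 48 cores at 1 GHz, IPC of 1, 25 GB/s of bandwidth.
def bytes_per_instruction(bandwidth_gbs=25.0, n_cores=48, ghz=1.0, ipc=1.0):
    instr_per_sec = n_cores * ghz * 1e9 * ipc
    return bandwidth_gbs * 1e9 / instr_per_sec

# ~0.52 bytes/instruction: code that averages more than that stalls on memory.
```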
Greg Pfister
http://perilsofparallel.blogspot.com/
You have cheap boxes of adequate power because somebody's pushing the
envelope -- like not everyone buys a racing car, but any given Escort
gets cheaper because of the people who do. So you can't say the future
belongs only to scale-out.
Sassa
I think you guys have completely missed the point about what is happening in the virtualization market, in the data center and in particular on the Windows platform.
@jim & @jan: Both of you talk about an application. @jim: "In the absence of such an application..." @jan: "There are very few bits of code out there used for commercial purposes today that can take advantage of a modern Quad processor".
I surely have done a poor job of communicating in this blog.
The point is that hypervisors and OSes can run multiple OSes or apps and partition the workload across all these cores. It is not about a single app -- unless, of course, we are talking about the hypervisor or OS as an app that schedules, partitions and utilizes the silicon.
It is about a lot of apps that, quite frankly, are not doing a lot of stuff concurrently.
Here is the data from Gartner, "Dataquest Insight: Virtualization Market Size Driven by Cost Reduction, Resource Utilization and Management Advantages," 5 January 2009:
                    2008     2009     2010     2011
Windows (Server)  $5,419   $5,952   $6,457   $6,907
Sun Solaris       $1,362   $1,366   $1,377   $1,383
Linux (Server)    $1,407   $1,568   $1,771   $1,980
IBM AIX           $1,010   $1,021   $1,042   $1,050
IBM System z        $973     $975     $978     $980
HP-UX               $821     $829     $844     $855
Total            $10,993  $11,711  $12,469  $13,155
If you assume all the boxes cost the same, then Windows has roughly a 50% share of the number of boxes that ship. Clearly they have more on a unit basis, since Windows boxes on average cost less than the UNIX ones listed above. I will see if I can get the unit-shipment numbers that go along with this report.
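Under that equal-cost assumption, the share falls straight out of the 2008 column:

```python
# Revenue share per OS from the 2008 column of the Gartner table above,
# under the rough assumption that every box costs the same.
revenue_2008 = {
    "Windows (Server)": 5419, "Sun Solaris": 1362, "Linux (Server)": 1407,
    "IBM AIX": 1010, "IBM System z": 973, "HP-UX": 821,
}
total = sum(revenue_2008.values())                        # 10,992 (~ the table's $10,993)
windows_share = revenue_2008["Windows (Server)"] / total  # ~49%
```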
Let me go on record that I agree with your points.
- Memory contention will be a challenge (the big point of the blog)
- I/O bandwidth is as well (no help here offered)
- Throughput does not scale up as fast as the number of cores
- There are *few* SINGLE applications that can take advantage of multi-core
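The throughput point above is just Amdahl's Law; a two-line sketch makes the ceiling concrete:

```python
# Amdahl's Law: speedup on n cores when a fraction p of the work parallelizes.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even 95%-parallel code tops out around 14.3x on 48 cores.
```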
I was at a large bank in NYC discussing this with them last month. They have 70,000 server machines (physical). Close to 50,000 run Windows. 98% of those Windows machines run 1 application -- actually less than one, because an app is often 3 tiers and spread across multiple machines. Most of these "apps" run at <10% CPU utilization. There is a huge cost to running 25,000 apps that do very little work compared to a month-end process or a risk-analysis app. I asked how much it cost to run and maintain 25,000 apps that do very little; they would not say…
Can you imagine the pain and cost of managing and running all the little apps in a huge data center? This argument is not about clustering, HPC or failover; it is about the tons of little apps that consume way too much of the budget.
I would bet that 20% of the apps account for 80% of the total CPU and I/O consumption of a Fortune 1000 data center.
The cloud chip is for the other 80% of the apps, the long tail. Stacking them up and running 200+ copies of the same operating system is silly and will make the memory contention issues even worse. @jim, I have seen many a post from you about how wasteful an OS on top of an OS is; why not this time?
It is interesting to me that public clouds are mostly Linux; heck, Rackspace doesn’t even have a Windows offering yet. Clearly public clouds are about developing new applications. As the cloud market matures, I am hopeful that it will deal with the long tail of applications, not just the 10-20% of apps that get the most visibility.
I also agree that everyone just sticks in the word cloud to be part of the movement.
I most likely can’t produce a Gartner report that says the long tail of business applications runs on the Windows platform, but if there were one I would be willing to bet my kids’ college tuition that it is true. I have 5 boys; that is a lot of money!
One more time:
- Any person who wants to keep their job at a Fortune 1000 runs only one app per OS on a Windows server
- The number of VMs per box running Windows-only applications will increase dramatically from 10+ today over the coming years
- In many cases the VMs will be running the same version of the OS, consuming tons of memory
- Memory access is one of the challenges in utilizing all the cores in a multi-core chip
- A giant reduction in all these copies of the OS can be attained if you run more than one app per OS
- Windows needs better app isolation to run more than one app at a time
- A good way to reduce the 80% of the apps that do only 20% of the work is to consolidate them
Any better?
It is the boring unwashed mass of applications that are not the least bit technically challenging. I should have stated that right up front… My bad.
Cheers
GregO
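A back-of-the-envelope sketch of the consolidation math in the bank example above (the server count and the <10% utilization figure are from the post; the post-consolidation target utilization is a hypothetical assumption):

```python
# Bank example from the post: ~25,000 lightly loaded Windows app instances,
# each effectively on its own box at under 10% CPU utilization.
servers = 25_000
avg_utilization = 0.10          # upper bound quoted in the post
target_utilization = 0.60       # hypothetical safe ceiling after consolidation

# Total useful work, expressed in "fully busy server" equivalents.
busy_equivalents = servers * avg_utilization            # 2,500
consolidated_boxes = busy_equivalents / target_utilization

print(f"~{consolidated_boxes:,.0f} boxes could host the same work")  # ~4,167
```

Even with these rough numbers the long tail could shrink by roughly a factor of six, which is the whole argument for consolidating the little apps rather than giving each its own OS image.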
“I was at a large bank in NYC discussing this with them last month. They have 70,000 physical server machines. Close to 50,000 run Windows. 98% of those Windows machines run 1 application, actually less than one, because an app is often 3 tiers and spread across multiple machines. Most of these “apps” run at <10% CPU utilization. There is a huge cost to running 25,000 apps that do very little work compared to a month-end process or a risk-analysis app. I asked how much it costs to run and maintain 25,000 apps that do very little; they would not say…”
Greg, did you ask them why, even after virtualization has been around for nearly 10 years, with some 50,000+ servers running 1 application each at <10% CPU utilization, the bank has not thought of server consolidation as a first step?... The bankers are probably too busy giving each other $140 Billion!!! in bonuses from the TARP money :)
“The cloud chip is for the other 80% of the apps, the long tail. Stacking them up and running 200+ copies of the same operating system is silly and will make the memory contention issues even worse. @jim, I have seen many a post from you about how wasteful an OS on top of an OS is; why not this time?”
Yes, I agree with your point that 200+ copies of the OS are not necessary. But regarding “OS on top of OS”: a hypervisor is not an OS in the traditional sense. There is a LOT of OS functionality that is NOT in hypervisors. If you look at KVM, it is only a Linux module plus QEMU, not a lot of duplicated OS function like other hypervisor folks have built, especially in the area of I/O. Although I must add that the KVM folks had the luxury of learning from the other hypervisor folks and improvising/innovating on it. Most hypervisors are around 100,000 lines of code. In addition, if the workload is application-centric, then most of the pages in memory would be the application plus those pages of the OS that are absolutely needed. But your point about why run 200 copies of the OS is well taken. So it appears to me that you are proposing something like Solaris Containers on Windows, is that right?...
Your Gartner table below seems to suggest that Windows server has the highest cost reduction, resource utilization and management advantages?... Is this because they have the highest cost, lowest resource utilization and most management disadvantages, so that when they are virtualized you get the most out of them? Does this also mean Linux & Unix have very little inefficiency built into them, hence very little ROI in cost reduction, resource utilization and management advantages?...
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Greg O'Connor
Sent: Thursday, February 11, 2010 8:10 AM
To: cloud-c...@googlegroups.com
The cloud computing environment has some inherent constraints and limitations, such as high latency and eventual consistency. The Cloud MapReduce implementation has some interesting tricks to get around these constraints, and I think it will be very useful for architects designing cloud-based apps.
http://horicky.blogspot.com/2010/02/cloud-mapreduce-tricks.html
Of course, Cloud Map Reduce itself is a strong alternative choice to Hadoop, which may also be of interest to some members of this group.
Comments and feedback are welcome.
Rgds,
Ricky
As others have said, you have lots of good data here, and it
corresponds to what I also know. I happened to see, two years ago,
utilization data collected from 1000s of server systems, over many
months, that showed the mean, median, and mode of the utilization to
all be 10%-15%. This was not just Wintel or Lintel systems; this
included big *nix boxes, too.
So, I AGREE WITH YOU. Virtualization is good and an appropriate way
forward and a good way -- a slam-dunk -- to use multiple cores. See my
blog post of a while ago on "Why IT departments should not fear
multicore."
(http://bit.ly/buT1gW)
But that's not the point. The point for me is: What does a 48-core
chip have to do with fixing this mess of single-app/single machine?
(Assuming agreement that it is a mess; some would disagree.)
The answer is: Nothing. Zero. Not one blessed thing.
*If* everybody were _already_ happily virtualized or partitioned and
running multiple apps on single boxes, *then* a gazillion cores per
chip would be relevant as a way to reduce the number of boxes.
But they're not already virtualized. As you said. And having this
hardware won't help them do so; the inhibitors have nothing to do with
the number of cores. And we don't even know it's any good, really.
It's just a "here's what I know what to do" from Intel Labs.
Greg Pfister
http://perilsofparallel.blogspot.com/
On Feb 11, 7:10 am, "Greg O'Connor" <oc.g...@gmail.com> wrote:
> I think you guys have completely missed the point about what is happening in
> the virtualization market, in the data center and in particular on the
> Windows platform.
>
> @jim & @jan – Both of you talk about an application @jim “In the absence of
> such an application..” @jan “There are very few bits of code out there used
> for commercial purposes today that can take advantage of a modern Quad
> processor”.
>
> I surely have done a poor job of communicating in this blog.
>
> The point is that hypervisors and OSes can run multiple OSes or apps and
> partition the workload across all these cores. It is not about a single app!
> Unless of course we are talking about the hypervisor or OS as an app that
> schedules, partitions and utilizes the silicon.
>
> It is about a lot of apps that quite frankly are not doing a lot of stuff
> concurrently.
>
> Here is the data from *Gartner Dataquest Insight: Virtualization Market Size
> Driven by Cost Reduction, Resource Utilization and Management Advantages,
> 5 January 2009*
--
~~~~~
Register Today for Cloud Slam 2010 at official website - http://cloudslam10.com
Posting guidelines: http://groups.google.ca/group/cloud-computing/web/frequently-asked-questions
Follow us on Twitter http://twitter.com/cloudcomp_group or @cloudcomp_group
Post Job/Resume at http://cloudjobs.net
Buy 88 conference sessions and panels on cloud computing on DVD at
http://www.amazon.com/gp/product/B002H07SEC, http://www.amazon.com/gp/product/B002H0IW1U or get instant access to downloadable versions at http://cloudslam09.com/content/registration-5.html
~~~~~
You received this message because you are subscribed to the Google Groups "Cloud Computing" group.
To post to this group, send email to cloud-c...@googlegroups.com
To unsubscribe from this group, send email to cloud-computi...@googlegroups.com
> *Gottfrid:* I’ve been working with Hadoop for the last three years.
> Back in 2007, the New York Times decided to make all the public domain
> articles from 1851-1922 available free of charge in the form of images
> scanned from the original paper. That’s eleven million articles
> available as images in PDF format. The code to generate the PDFs was
> fairly straightforward, but to get it to run in parallel across
> multiple machines was an issue. As I wrote about in detail back then,
> I came across the MapReduce paper from Google. That, coupled with what
> I had learned about Hadoop, got me started on the road to tackle this
> huge data challenge.
http://saviorodrigues.wordpress.com/2009/09/11/whats-the-nyt-doing-with-hadoop/
I am wondering how many customer organizations in this world would need to implement similar, if not identical, projects, among thousands more equally straightforward problems which could change the information and decision-making game forever. Whom shall they call? Cloudera? They are a handful of people, and they are Apache Hadoop, not Cloud MR.
After reading your post, Ricky, Amazon should do something by hiring people with the same know-how as Huan Liu and Dan Orban and offering an easy-to-use Amazon MR API. But then, you are talking of “HBase, Hive, Pig, Mahout ... etc.”. What are those, who uses them, and why? Why not Java? When should one use a public cloud, and when in-house tools, for MR? Can you write a post with your thoughts?
I am not sure how many people in this group really know what Map Reduce is. Sure, they know it is a fantastic tool that fits the concept of cloud, but they have no idea whom to call or what to learn first.
There is a huge space to be filled by new start-ups who not only know technically what they are doing, but have a solid business case for MR as well.
Thanks Rick,
Miha
> ------------------------------------------------------------------------
> *From:* Miha Ahronovitz <mij...@sbcglobal.net>
> *To:* cloud-c...@googlegroups.com
> *Sent:* Fri, February 12, 2010 1:28:19 AM
> *Subject:* Re: [ Cloud Computing ] Cloud Map Reduce Tricks
> ------------------------------------------------------------------------
> *From:* Ricky Ho <rickyp...@yahoo.com>
> *To:* cloud-c...@googlegroups.com
> *Sent:* Thu, February 11, 2010 5:04:37 PM
> *Subject:* [ Cloud Computing ] Cloud Map Reduce Tricks
You are probably right. It looks like the majority of the audience in this group are biz folks, not tech folks.
Regarding cloud computing and Map Reduce, from the email thread I feel that the cloud computing audience focuses more on infrastructure operation aspects (virtualization, economics), while the Map Reduce audience focuses more on algorithmic aspects (how to structure the processing in an easily parallelizable fashion). Their focuses are quite different at the moment.
Although it is technically possible to run a large Map/Reduce job in the cloud (e.g. run your Hadoop cluster in EC2, or use Amazon's Elastic Map/Reduce), I don't know of any large enterprise doing this in production yet. I honestly think the bandwidth cost (and the time for data upload into the cloud) is prohibitive for doing large-scale parallel processing in the cloud. There are some mitigation techniques that I mention in an earlier blog at http://horicky.blogspot.com/2009/08/skinny-straw-in-cloud-shake.html, but this is a hard problem in my opinion. I am also looking forward to Amazon publishing some large-scale reference customer cases using Elastic Map/Reduce.
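To put the "skinny straw" point in numbers, here is a sketch of how long a bulk data upload into the cloud takes; the dataset size and link speed are hypothetical round numbers, not figures from any provider:

```python
# Hypothetical example: uploading 10 TB over a dedicated 100 Mbit/s link.
dataset_bytes = 10 * 10**12          # 10 TB
link_bits_per_sec = 100 * 10**6      # 100 Mbit/s, assumed fully utilized

seconds = dataset_bytes * 8 / link_bits_per_sec
days = seconds / 86_400

print(f"Upload time: {days:.1f} days")  # roughly 9 days, before any overhead
```

At these rates, just getting the input data into the cloud can dwarf the Map/Reduce job itself, which is why the bandwidth cost (in both money and time) dominates the decision.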
I am not advocating that we should use Cloud MR; as I said, the community behind Hadoop is much bigger (and the presence of the strong Cloudera consulting team is another consideration as well). I am trying to articulate the simplicity (and hence elegance) of Cloud MR's architecture, because it is built on top of a cloud OS.
"Why not Java?" I don't know how to answer this question. Programming language is just a tool, and you use different tools at different levels of abstraction. For example, most of the time when designing a parallel algorithm I will use a higher-level language like Pig / Hive. And when the design looks right, we can rewrite the algorithm in Java (if you want more control than what the Pig / Hive compiler gives you). But using Java or not is an implementation decision, not important at the design phase.
But hearing your advice, going down the Pig/Hive route may cause even more confusion. For those who are interested, there is another mail alias in the Apache Hadoop project.
Sure, I'll write more blogs on this as I learn more along the way. And thanks for your detailed feedback and comments.
Rgds,
Ricky
----- Original Message ----
From: Miha Ahronovitz <mij...@sbcglobal.net>
To: cloud-c...@googlegroups.com
Cc: Miha Ahronovitz <mij...@sbcglobal.net>
Sent: Sat, February 13, 2010 5:59:00 PM
Subject: Re: [ Cloud Computing ] Cloud Map Reduce Tricks
Ricky, there is a definite need to apply MR on a commercial scale right now. If Hadoop entered places like the New York Times, it was because it had employees who were visionaries and whiz kids who took destiny into their own hands. See the interview with Derek Gottfrid: he attended the world Hadoop conference because they gave him a discount, went back to NYT and implemented it. See what Derek says:
> *Gottfrid:* I’ve been working with Hadoop for the last three years. Back in 2007, the New York Times decided to make all the public domain articles from 1851-1922 available free of charge in the form of images scanned from the original paper. That’s eleven million articles available as images in PDF format. The code to generate the PDFs was fairly straightforward, but to get it to run in parallel across multiple machines was an issue. As I wrote about in detail back then, I came across the MapReduce paper from Google. That, coupled with what I had learned about Hadoop, got me started on the road to tackle this huge data challenge.
Miha and Ricky,
A lot of people on this forum know about MR.
It is a "niche" area right now. Just like cloud databases are new, MR-based
applications are also new, and there is still plenty of time, 2 to 3 years or
more down the line, for MR to become mainstream.
A lot of people for some reason think that what Google is doing or did is
everything with respect to cloud. What Google does is a very specialized
niche search application.
The other 90%+ of the world doesn't give a damn about search, although they may
use it to a lesser extent in the form of intranet search provided by
applications like SharePoint etc., but ERP, CRM, billing systems, messaging,
collaboration, e-commerce etc. are the REAL applications of the REAL IT
world.
Right now, it is the time of IaaS, PaaS and SaaS in the context of migrating
existing enterprises, SMBs and startups to the cloud as much AS IS as
possible, so that there is minimal cost to the customers. Think of how many
data centers are out there in the WORLD in the above market segments. It is a
HUGE market. MR is a relatively small market at this time.
Exotic applications are going to be on the back burner for a while,
especially given the bad economy and investment.
> MR is relatively small market at this time.
> Exotic applications are going to be on the back burner for a while,
> especially given the bad economy and investment.
You think of Google as an "exotic" application, while the DC IaaS and SaaS
are more important because of the recession? I do agree the DC cloud is very
important.
I know many people know how Map Reduce works, but few understand the
business potential of MR. MR is equally important to PaaS, SaaS etc.
Hadoop runs in a disruptive environment (relative to a DC), and the
biggest hurdle is that it requires a separate environment from the data
center, and a separate set of skills. As Cloudera's CTO said, Hadoop (and
MR and the rest) are big hammers in search of nails; every conceivable
business, from oil and gas to analyzing business data (quotes and invoices
in massive volumes), is a potential nail. This is why SGE (a DRM) was
integrated with Hadoop.
See this customer quote:
> Sun Grid Engine 6.2 Update 5 allows us to run Hadoop jobs within
> exactly the same scheduling and submission environment we use for
> traditional scalar and parallel loads.
>
> Before Sun Grid Engine 6.2 Update 5 we were forced to either dedicate
> specialized clusters or to make use of convoluted, ad-hoc, integration
> schemes; solutions that were both expensive to maintain and
> inefficient to run. Now we have the best of both worlds: high
> flexibility within a single, consistent and robust scheduling system."
You can read the whole quote at http://www.sun.com/software/sge/ (which
is now part of Oracle).
Ricky's post presented another intriguing idea: what about a Map Reduce
on top of Amazon, as we have so many complementary services to the cloud
APIs already provided? Assuming Amazon manages to deliver the
technology Ricky comments on, we have another way to make MR mainstream,
with HUGE impact on the way we get information to run a business. If,
in a business negotiation, we have asymmetric information, meaning
one party knows much more than the other party, guess who is the winner?
If one day MR is as easy to run as any service offered in the data
center, MR applications will inundate the enterprise, literally. I will
go as far as to say that organizations that ignore Hadoop and MR will no
longer be in business 10 years from now.
Miha
MR is data-intensive by definition. After all, one of the 'tricks' in
MR is to take huge amounts of data and split it up into smaller
individual files so the M routines can ingest it.
Once again, it's not cloud compute that is 'hard', it's storage. The
mere scheduling of CPU resources is very simple, and we've been doing it
in various formats/fashions for decades now. In fact, one can argue
that cloud compute is a return to batch jobs.
But unless the cloud storage mechanisms get to a point where they are
standardized - via CDMI - and onerous charges for storage and retrieval
aren't in the business model - MR won't take off as a viable model for
the cloud.
Data is central, compute is peripheral.
Rob
Robert Peglar
Vice President, Technology, Storage Systems Group
Xiotech Corporation | Toll-Free: 866.472.6764
o 952 983 2287 m 314 308 6983 f 636 532 0828
Robert...@xiotech.com | www.xiotech.com
Miha,
Running Hadoop on Amazon EC2 is available today. You can either do it yourself and install Hadoop on EC2, or pay about 15% more than the EC2 charges to use Elastic MapReduce.
At the time Hadoop (also GFS and Google MR) was designed, there was no cloud concept out there. Hadoop (pretty much based on the Google model) is fundamentally designed to run in a data center where everything is under your full control, which is very different from the cloud environment we have today.
1) Hadoop has focused a lot on optimizing disk performance (e.g. large files, sequential access ... etc). This is no longer as important in the cloud, as disk I/O has turned into network I/O.
2) Hadoop has focused a lot on optimizing network I/O (e.g. replica placement, data colocation ... etc). This again is no longer as important, as you have no control over the location of data placement.
3) Hadoop assumes a static infrastructure (highly distributed, a large number of commodity machines) but doesn't take advantage of elasticity, which is a major strength of the cloud environment. For example, in Hadoop you cannot add more Mappers or Reducers to speed up execution after the job has started.
Therefore, even though you can run Hadoop in the Cloud today, I seriously doubt the overall architecture is optimized.
So what do we need? A specially designed SCHEDULER tailored for a) the Map/Reduce load characteristics and b) the cloud environment characteristics.
Hadoop's current scheduler does (a) very well, but fails at (b).
I haven't looked at SGE in much detail, but I would be surprised by any general-purpose scheduler that can do (a) very well. I also think (b) is quite different from the federated grid that SGE was originally designed for. I'd love to be proved wrong.
Cloud MR, to me, seems to be closer, because its design definitely has both (a) and (b) in mind.
However, I think figuring out "What application can I restructure to run in Map/Reduce?" is even more important than "How do I run Map/Reduce efficiently in the infrastructure environment?". I am referring to the transformation of a sequential algorithm into a parallel one.
And yes, I completely agree with you. We are at the beginning of an important phase.
I was not saying anything against Ricky's idea or the discussion that you
both were having.
I was referring to two statements in your discussion:
"I am not sure how many people in this group really know what Map Reduce is. Sure, they know it is a fantastic tool that fits the concept of cloud, but they have no idea whom to call or what to learn first."
"You are probably right. It looks like the majority of the audience in this group are biz folks, not tech folks."
Neither of the above statements is right...
-----Original Message-----
From: cloud-c...@googlegroups.com
[mailto:cloud-c...@googlegroups.com] On Behalf Of Miha Ahronovitz
Sent: Sunday, February 14, 2010 1:16 AM
To: cloud-c...@googlegroups.com; Miha Ahronovitz
As was said a long, long time ago by one of the guys involved in the
founding of this here whole internet thingy:
Distributed computing, FOO. Tell me where the data is, and I'll tell
you where the computing must be.
Greg Pfister
http://perilsofparallel.blogspot.com/
On Feb 14, 6:28 am, "Peglar, Robert" <Robert_Peg...@xiotech.com>
wrote:
> The only thing that is holding back MR implementations in cloud is our
> old friend, data.
>
> MR is data-intensive by definition. After all, one of the 'tricks' in
> MR is to take huge amounts of data and split it up into smaller
> individual files so the M routines can ingest it.
>
> Once again, it's not cloud compute that is 'hard', it's storage. The
> mere scheduling of CPU resources is very simple, and we've been doing it
> in various formats/fashions for decades now. In fact, one can argue
> that cloud compute is a return to batch jobs.
>
> But unless the cloud storage mechanisms get to a point where they are
> standardized - via CDMI - and onerous charges for storage and retrieval
> aren't in the business model - MR won't take off as a viable model for
> the cloud.
>
> Data is central, compute is peripheral.
>
> Rob
>
> Robert Peglar
> Vice President, Technology, Storage Systems Group
> Xiotech Corporation | Toll-Free: 866.472.6764
> o 952 983 2287 m 314 308 6983 f 636 532 0828
> Robert_Peg...@xiotech.com |www.xiotech.com
A very interesting statement!
If you look at it from the perspective of the most fundamental of all
processing units (the human brain) and data (the world around us), is data
peripheral or is processing peripheral?...
From your own (brain/processing) perspective, data/the world becomes peripheral.
But from others' perspective, you become data, hence you (your brain, which is
data now) also become peripheral to others.
So the bottom line seems to be that processing and data are both central and
peripheral at the same time? Is this the duality of the nature of
data?... or of processing?...
Just food for thought!
Many people do not stop and ask one simple question: what is the value that
Google provides in terms of search?... Can you live without Google
search?... Of course, 90%+ of the people in this world CAN LIVE without Google search.
Most of Google's usage is because it is FREE. It is no different from the
100s of unimportant channels that you get in a TV package. If the cable
company's charges go up, immediately people will drop all the unnecessary
channels. Same with most web surfing stuff. It has not become an essential
service like email, ERP, billing systems etc. in a business.
MR is a very niche application. You have to have MAP and REDUCE as your
fundamental abstractions. Sure, you will have MAP as your fundamental
abstraction in many data-intensive applications. But how many have
REDUCE?... Do genomic applications have REDUCE?...
Do financial applications have REDUCE?... You can always justify it by
saying you always DERIVE a PATTERN or an INFERENCE from large amounts of
data, so it is a REDUCE abstraction. But where are MAP and REDUCE in an
e-commerce application?... Where are MAP and REDUCE in the FICO, MM, SCM
modules of ERP?...
"I know many people know how Map Reduce works, but few understand the
business potential of MR. MR is equally important to PaaS, SaaS etc.."
At this time the business potential is limited, unless they convert all
applications and their data models to MR. Remember, we solve the world's
problems. You do not create problems to fit certain abstractions and then
say here is the solution to the problem because we have an algorithm. You
have a solution for the problem, not a problem for a solution.
"As Cloudera CTO said, Hadoop ( and MR and the rest), are big hammers in
search for nails"
It doesn't matter whether you have a big hammer and/or nails, you need to
have something that needs nailing. If there is nothing that needs nailing,
there is no need for the hammer and nails.
"If one day MR is as easy to run as any service offered in the Data
Center, MR applications will inundate the Enterprise, literally. I will
go as far as to say that organizations that ignore Hadoop and MR will no
longer be in business 10 years from now."
I totally disagree... it is like saying the internet/web is all about surfing.
Although surfing is a large segment consisting primarily of consumers,
remember the B2B, B2C, E2E market segments of the internet/web; they make up
the backbone of the VALUE delivered, not surfing. Similarly, Google's search
and Facebook do not make up the VALUE of the internet/web/cloud; it is the
business services on top of these that deliver the real VALUE.
The brain is absolutely fascinating, no doubt, as it is its own little
'cloud', I suppose, with all the elements necessary for both compute and
storage. Plus, it commands an entire network (the nervous system) to
get data and send commands. Great design, no question.
Put another way ("data is central, compute is peripheral"), there is
plenty of data without compute, but there is no compute without data.
In the current thread about MapReduce, if the MR codes can't ingest data
fast enough to be efficient, they are not particularly viable. This is
where the rubber hits the cloud road. MR compute in the cloud is
exciting, but getting the MR data efficiently _into_ the cloud is the
trick.
Rob
100 years ago, there was no way to travel from America to Europe in one
day. The reason is that commercial aviation did not exist. But to leave
the metaphors, here is a New York Times article about Cloudera, published
last year:
http://www.nytimes.com/2009/03/17/technology/business-computing/17cloud.html?_r=1
Look how everyone is smiling in the photo: Christophe Bisciglia, Amr
Awadallah, Jeff Hammerbacher and Mike Olson. What is great about them is
that they see what other people cannot see. This is what the Google founders
did: they saw what others did not notice. By the time we, the "masses",
understand, our opportunity will be to apply, hat in hand, for jobs in
their billion-$ companies...
Also read "What You Didn't Know About Cloudera":
http://gigaom.com/2010/02/10/what-you-didnt-know-about-cloudera/
> But Olson delivered a surprise when he said that it's wrong to assume
> that his company is solely focused on open source software. On the
> contrary, Cloudera will diversify out of a strategy focused solely on
> it. "Either this quarter or next we will offer an enterprise software
> bundle consisting of proprietary enhancements for Hadoop users," Olson
> said. "Our proprietary apps will complement the open source core, and,
> like Facebook and Yahoo, we continue to have core committers to Hadoop."
Enterprise? Hadoop? Proprietary? We cannot ignore the Map Reduce
contributions to the cloud... After all, this is the cloud group on Google.
Cheers,
Miha
It's the idea that the data resides at a single location that is the
problem.
The data needs to be wherever it's needed. There isn't a world
bandwidth shortage, so the challenge is how to use it wisely. (Hint: a
distributed file system synchronized by an inter-galactic lock manager
isn't the answer.)
--
Jim Starkey
NimbusDB, Inc.
978 526-1376
1) Create the data at the Cloud in the first place
2) Move the code to the data
3) Partition the data according to processing patterns to maximize data collocation
4) Push more processing at the data source
Please share your bag of tricks as well.
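Trick 3 (partitioning for data collocation) can be sketched as a simple hash partitioner; the key scheme, partition count, and sample records here are illustrative assumptions, not from any particular system:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a stable partition, so records that are
    processed together always land on the same node."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Records sharing a customer ID always hash to the same partition,
# so a per-customer computation never needs to move data between nodes.
records = [("cust-42", "order-1"), ("cust-42", "order-2"), ("cust-7", "order-3")]
placement = {}
for key, value in records:
    placement.setdefault(partition_for(key, 8), []).append(value)

print(placement)
```

The point of hashing (rather than, say, round-robin assignment) is exactly the collocation property in trick 3: the placement is a pure function of the key, so any node can compute where the data lives without a central lookup.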
Rgds,
Ricky
----- Original Message ----
From: "Peglar, Robert" <Robert...@xiotech.com>
To: cloud-c...@googlegroups.com
Sent: Sun, February 14, 2010 4:10:04 PM
Inventing a problem for a solution is also not a bad idea. Sometimes we don't recognize a need because we cannot imagine beyond what we believe is possible.
In my opinion, we should always look at things from both directions. You are absolutely right that we should look at our problem and pick the best technology solution. But we should also look at new technologies and imagine what kind of opportunities that it has enabled.
Map/Reduce, in my opinion, belongs to the latter. If you are completely happy with your existing application, there is no point in transforming it to Map/Reduce. But on the other hand, there are many new opportunities around you that Map/Reduce has enabled. Of course, you can choose to just focus on what you have and ignore these opportunities. And your competitors will be very happy that you do.
I am surprised to find that quite a lot of basic algorithms (sort, search, statistical calculation, matrix, query joins, graph, machine learning ... etc) can be represented using Map/Reduce and hence enjoy the power of parallelism.
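As a minimal illustration of that point, here is a statistical calculation (a mean) expressed in map/reduce form. This is an in-memory sketch of the programming model, not a Hadoop job:

```python
from functools import reduce

def mapper(value):
    # Emit a (partial_sum, count) pair for each input value.
    return (value, 1)

def reducer(a, b):
    # Combine partial (sum, count) pairs. The operation is associative,
    # so the reduce can run in parallel over any grouping of the inputs.
    return (a[0] + b[0], a[1] + b[1])

values = [4, 8, 15, 16, 23, 42]
total, count = reduce(reducer, map(mapper, values))
print(total / count)  # 18.0
```

The trick is to pick an intermediate representation, here (sum, count), whose combine step is associative; that is what lets the framework split the data across mappers and merge the partial results in any order.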
There are two types of applications: transaction systems (most of the examples you mentioned) and analytic systems. A transaction system is about many concurrent users making simple transactions; Map/Reduce is not relevant for those. But for analytic systems, where you want to look across large amounts of raw information to extract insight, Map/Reduce is a key enabler.
Rgds,
Ricky
----- Original Message ----
From: Rao Dronamraju <rao.dro...@sbcglobal.net>
To: cloud-c...@googlegroups.com
“Put another way ("data is central, compute is peripheral"), there is plenty of data without compute, but there is no compute without data.”
Data is converted into information through processing (even if that processing happens in the brain).
Data has no value. Information has value.
So data without processing is useless.
Please note that data is first processed by perception, and at that stage it has no value because of the lack of semantics. Only when it is processed through cognition does it acquire semantics, become information, and hence gain value.
Information/Value = Cognition(Perception(Data)), where Perception(Data) alone has no value.
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Ray Nugent
Sent: Sunday, February 14, 2010
9:30 PM
To: cloud-c...@googlegroups.com
Ricky, it is not just computer science. All human learning is about
generalization of the world/universe. This is the only way human beings can
learn. In fact, this is the essence and foundation of machine learning.
Machines cannot learn unless human beings make them learn, and the only way
human beings can make a machine learn is to duplicate the human learning
process in machines. So generalization is the foundation and essence of human
learning, machine learning, and learning in all fields of study, not just
computer science.
"In my opinion, we should always look at things from both directions. You
are absolutely right that we should look at our problem and pick the best
technology solution. But we should also look at new technologies and
imagine what kind of opportunities that it has enabled."
Sure, I agree; especially with new and highly disruptive areas you need to do
that. In fact, we had a discussion on this forum about Intel's 48-core systems
and how they can be applied in ML and AI areas.
"There are two types of applications. Transaction system (most of the
example that you mentioned about) as well as Analytic system. Transaction
system is about many concurrent users making simple transaction. Map/Reduce
is not relevant for this one. But for analytic system where you want to
look across large amount of raw information to extract insight. Map/reduce
is a key enabler for this."
Yes, I agree with you. In fact, MR is a great candidate for future
applications like analytics, ML, AI, and so on. But the real world is still
largely transactional: the existing data centers are heavily transactional
in nature. So MR is great for future applications but not well suited to the
large transaction-based IT industry that exists today.
Having "the data come to you" is indeed the trick, or as you say
creating the data inside the cloud in the first place. This works
fairly well if the data is highly fragmented, can be assembled over long
periods of time and has little or no cost involved in the gathering (and
waiting). For example, O(millions) of people typing a few sentences on
keyboards, which is transmitted over relatively low-speed links to a
cloud. Over time, the cloud contains many TB of data. That's
'creating' the data at the cloud in the first place.
However, many commercial compute jobs - that IT directors want to move
into the cloud - already have petabytes of data at rest to be analyzed
and/or used in compute jobs. Moving those datasets into a cloud is
cost-prohibitive at best and logistically impossible at worst. Plus,
moving a small portion of that data into a cloud, computing, and moving the
results back to the commercial institution is also a non-starter, mostly
due to the transport time involved. Large datasets are going to stay
put and private clouds will have to be built around them, and/or
non-cloud compute resources put in place (such as those already present
in major datacenters worldwide).
The only way these commercial jobs will migrate into a cloud is through
sufficient bandwidth and sufficiently high-speed links, such as the
overlay Internet2, where one can put up a 10 Gb/s virtual circuit between
two points. This kind of connectivity and bandwidth is essential to
public cloud's success for anything other than small datasets.
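To put rough numbers on that, here is a back-of-envelope transfer-time estimate (my own arithmetic, assuming a dedicated, fully utilized link and ignoring protocol overhead):

```python
# How long does a large at-rest dataset take to move over a given link?
def transfer_days(dataset_tb, link_gbps):
    bits = dataset_tb * 1e12 * 8          # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9)    # ideal, fully utilized link
    return seconds / 86400

# 1 PB over a 10 Gb/s virtual circuit (e.g. an Internet2 overlay path)
print(round(transfer_days(1000, 10), 1))   # ~9.3 days
```

Even under these best-case assumptions a petabyte occupies the circuit for over a week, which is why the "ship it or keep it local" argument keeps coming up.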
There is also promising R&D now in the areas of data reduction
(compression, incrementalization, deduplication) that will also help.
So far, though, no major breakthroughs. I remain hopeful.
There was a discussion in this group arguing about the actual law governing
occurrences of failures. I personally found it very interesting that the
failures were shown NOT to be independent events. If you search the
archives, you'll find a reference to a statistical study showing that CPU
and disk failures in the same room had a short lag between them. Mirroring
the state of VMs to a box powered from a different grid gives you power
supply fault tolerance. I can't tell if that's a bigger deal than a "$2.00
component" failure. Can you? I'm just curious how concerned I should be.
If 1 X-core box is N times cheaper than X/Y Y-core boxes, I can afford
to waste N times more resources (which automatically assumes I am not
reaching capacity on either resource). Alternatively, I can afford N
times more guests (with the overhead guests being passive, waiting for
failover), and so reduce time to recovery. Powering off the cores
without tasks is also easier to achieve in one larger box than in several
smaller ones (6 x quad-core @ 70% = 6 cores idle = can be powered off;
1 x 24-core @ 70% = 7 cores idle = can be powered off).
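The core-idling arithmetic can be sketched as follows (a toy model of my own: it assumes idle capacity fragments per box and only whole cores can be powered off):

```python
import math

def poweroff_idle_cores(boxes, cores_per_box, utilization):
    # Whole idle cores per box, summed across boxes. Idle capacity
    # cannot be pooled across boxes, so fractions are lost per box.
    idle_per_box = math.floor(cores_per_box * (1 - utilization))
    return boxes * idle_per_box

# 6 quad-core boxes at 70% busy: 4 * 0.3 = 1.2 idle -> 1 whole core each
print(poweroff_idle_cores(6, 4, 0.7))    # 6
# one 24-core box at 70% busy: 24 * 0.3 = 7.2 idle -> 7 whole cores
print(poweroff_idle_cores(1, 24, 0.7))   # 7
```

The big box wins slightly because the fractional idle capacity stranded in each small box adds up.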
Then the failover cost - yes, it is more expensive to migrate the
whole 24-core box than one quad-core box (kind of a "stop the world" for
more vs. fewer VMs at once). But does this mean you need to do it at the
same rate? Of course not.
Several people banged on about "memory bandwidth". I buy this as a
real concern. What is a reliable way to estimate it for any given
application?
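There is no universally reliable formula, but one crude empirical approach is to measure the machine's bulk-copy ceiling and compare it against the application's observed memory traffic (e.g. from hardware counters). Here is a stdlib-only sketch of the first half; the helper name is mine, and a serious measurement would use something like the STREAM benchmark instead:

```python
import time

def copy_bandwidth_gbs(size_mb=256, reps=5):
    # Crude estimate of achievable memory copy bandwidth: time a large
    # bytearray slice copy (reads size bytes, writes size bytes).
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        dst = src[:]                      # memcpy-like bulk copy
        best = min(best, time.perf_counter() - t0)
        del dst
    moved = 2 * size_mb / 1024            # GB read + GB written
    return moved / best

# Compare this ceiling with the app's working-set churn (bytes touched
# per second under load) to judge how much bandwidth headroom remains.
print(f"~{copy_bandwidth_gbs():.1f} GB/s copy bandwidth")
```

If the application's measured traffic approaches a meaningful fraction of this ceiling per socket, adding cores will mostly add stalls.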
Sassa
On Feb 10, 2:05 pm, Jan Klincewicz <jan.klincew...@gmail.com> wrote:
> Erik:
>
> Failover refers to the ability of an OS instance to RESTART after failure.
> Failover still means sufficient downtime to re-boot an OS and load apps.
>
> Mirroring the memory state (and CPU state) is what Marathon EverRun
> does.(I mentioned them in my last post) Marathon can do this for physical
> as well as Virtual Xen servers. I suppose you could call this a two-node
> cluster, but the real term in "Fault Tolerant" as opposed to "Highly
> Available".
>
> A configuration where every VM in a 48-core box was running in lockstep with
> its twin on a machine powered by a separate grid would be a pretty reliable
> design.
>
> On Tue, Feb 9, 2010 at 10:49 PM, Erik Sliman <erikslima...@gmail.com> wrote:
> > Jim and Jan,
>
> > Your points are all valid, and raise primary concerns. I understand the
> > concerns with memory contention and failover. Given that this was once an
> > argument for why cheap Wintel boxes could never displace the mainframe, my
> > gut tells me that these high-density-core chips do have a real chance in the
> > cloud, if for no other reason than their ability to save space and
> > reduce power consumption. So, let's play devil's advocate with the
> > concerns.
>
> > Memory contention: imagine 48 cores and 12-channel memory. What is the
> > REAL % of wait time for the threads? It seems potentially contentious, but
> > without intricate knowledge of what % of a core's time requires an open
> > memory channel, I don't really know how contentious it will really be,
> > particularly considering potential core idle time.
>
> > Failover: Why does a VM have to depend on a CPU housing for this? Let's
> > assume it is using a fiber optic SAN and the memory state of a VM's
> > requiring high availability is mirrored. Why can't the VM fail over to
> > another box if the box it is in fails? This is, fundamentally, the type of
> > clustering that made cheap hardware capable of being used to build data
> > centers and supercomputers.
>
> > Xen does it
> >http://sheepy.org/node/65
>
> > Erik
> > OpenStandards.net
>
> > On Tue, Feb 9, 2010 at 7:57 PM, Jan Klincewicz <jan.klincew...@gmail.com>wrote:
>
> >> Fundamentally, I agree with Jim. There are very few bits of code out
> >> there used for commercial purposes today that can take advantage of a modern
> >> Quad processor. Once a chip has gone multi-core, adding additional cores is
> >> not that big a deal, and we see at this point the move from quad core to hex
> >> core gives about a 30% increase. This is not linear. Also, servers are not
> >> CPU bound, thus we get diminishing returns pumping them full of cores.
> >> Right now they are out of sync with I/O and Storage. Memory has become much
> >> less expensive, but we have precious few applications that can use more than
> >> 4GB since most of the world runs 32-bit code. I quoted 4GB of server memory
> >> to a customer today at $208.00 U.S.
>
> >> HPTC outliers : Please refrain from reminding me how your protein folding
> >> app can chew up a TB of RAM and 64 cores ... What's your email server
> >> running ?
>
> >> Virtualization is a paradigm that already shifted 10 years ago. It IS a
> >> good way to squeeze more resources out of a single box, but very few truly
> >> Fault Tolerant solutions exist for SMP VMs. Check out Marathon Technologies
> >> http://www.marathontechnologies.com/ for that. Running 80 VMs on a host
> >> that will inevitably fail sometime is not what most people do in Production
> >> environments.
>
> >> Jim likes to talk about "sleds." It must be snowing where he is too. I
> >> find the concept very similar to blade servers (which we also called a
> >> paradigm shift five years ago.) Sharing components like power supplies is
> >> a great idea, likewise virtualized I/O. More efficient servers, though, are
> >> still an evolutionary and not revolutionary accomplishment.
>
> >> A company called 3Leaf http://www.3leafsystems.com/ seems to have a
> >> means of cobbling together a huge SMP box out of smaller ones and sharing
> >> memory, but again, aside from HPTC applications, few programs scale up this
> >> well in the commodity space.
>
> >> I am having my cojones busted for suggesting that running high densities
> >> of VMs on dirt-cheap white boxes is ill-advised, but I stand by my assertion
> >> that if all your eggs are in one basket, don't cheap out on the basket.
>
> >> Moore's Law has stood the test of quite a bit of time, but at the end of
> >> the day, what are the practical ramifications of mega-powerful CPUs without
> >> highly available apps to run on them ? If I were to predict the NEXT real
> >> paradigm shift, I would look for a more grid-oriented software architecture,
> >> where the loss of a single server is insignificant. Google seems to
> >> operate its search this way, but they are an "outlier" with the capacity and
> >> finances to focus on a specific area of compute.
>
> >> IMO, Intel is looking to attach the "Cloud" mojo to a product to jump on
> >> the bandwagon just like Compaq called itself the "Non-Stop Internet Company"
> >> back in 1999.
>
> >> On Tue, Feb 9, 2010 at 5:56 PM, Jim Starkey <jstar...@nimbusdb.com>wrote:
>
> >>> I'm skeptical, very skeptical. I see the system cost of large number of
> >>> cores -- memory contention and memory bandwidth contention, but I don't see
> >>> the benefit unless there is an application that needs memory shared between
> >>> a large number of threads. 48 cores with 36 stalled waiting on the memory
> >>> controller doesn't strike me as a good architecture.
>
> >>> In the absence of such an application (and one that doesn't also require
> >>> scale-out), a more useful configuration is a server sled -- single board,
> >>> single power supply, on board switch but a half dozen servers each with
> >>> dedicated memory and maybe a dedicated local disk. James Hamilton has
> >>> written about these. A sled gives the server density of a massive-core
> >>> system without the memory contention, and is probably cheaper.
>
> >>> Intel, I think, is pushing what they think they know how to build.
> >>> Whether there is any market pull for this, I don't know, but I doubt it.
>
> >>> The future doesn't belong to scale-up (bigger, faster machines) but to
> >>> scale-out (more, cheaper machines). Maybe Intel is just looking for their
> >>> next boat to miss, but cloud computing will always be happier with more,
> >>> cheaper machines (Jan will insist on a high-powered logo on each one,
> >>> though).
>
> >>> GregO wrote:
>
> >>>> Below is a snippet from my latest blog..
>
> >>>> <snip: blog excerpt quoted in full earlier in the thread; the rest at http://blog.appzero.com/>
>
> >>> --
> >>> Jim Starkey
> >>> Founder, NimbusDB, Inc.
> >>> 978 526-1376
>
> >>> --
> >>> ~~~~~
> >>> Register Today for Cloud Slam 2010 at official website -
> >>> http://cloudslam10.com
> >>> Posting guidelines:
> >>> http://groups.google.ca/group/cloud-computing/web/frequently-asked-qu...
> >>> Follow us on Twitter http://twitter.com/cloudcomp_group or
> >>> @cloudcomp_group
> >>> Post Job/Resume at http://cloudjobs.net
> >>> Buy 88 conference sessions and panels on cloud computing on DVD at
> >>> http://www.amazon.com/gp/product/B002H07SEC,
> >>> http://www.amazon.com/gp/product/B002H0IW1U or get instant access to
> >>> downloadable versions at
> >>> http://cloudslam09.com/content/registration-5.html
>
> >>> ~~~~~
> >>> You received this message because you are subscribed to the Google Groups
> >>> "Cloud Computing" group.
> >>> To post to this group, send email to cloud-c...@googlegroups.com
> >>> To unsubscribe from this group, send email to
> >>> cloud-computi...@googlegroups.com
>
> >> --
> >> Cheers,
> >> Jan
>
>
>
> --
> Cheers,
> Jan
That's not even close to true. Failover can only occur when another
node knows enough cluster state to recognize the necessity, and it would
have to be up to know that. I was working on HA clusters as long ago as
1992 which could fail over *much* faster than the nodes could boot, and
the state of the art hasn't regressed since then. The limiting factor
tends to be not the speed with which anything can be brought up, since
nodes can be in varying states of readiness from "barely booted" to
"actively processing the same input but with output disabled" depending
on need, but the time necessary to keep transient problems from causing
premature failover.
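The debounce trade-off described above might be sketched like this (an illustrative toy of my own, not any particular product's mechanism):

```python
import time

class FailoverMonitor:
    """Heartbeat watcher: a standby triggers failover only after the
    primary has missed several consecutive heartbeats, so a transient
    network blip does not cause premature failover."""

    def __init__(self, interval_s=1.0, missed_threshold=3):
        self.interval_s = interval_s
        self.missed_threshold = missed_threshold
        self.last_beat = time.monotonic()

    def heartbeat(self):
        # Called whenever the primary's heartbeat arrives.
        self.last_beat = time.monotonic()

    def should_fail_over(self, now=None):
        now = time.monotonic() if now is None else now
        missed = (now - self.last_beat) / self.interval_s
        return missed >= self.missed_threshold

mon = FailoverMonitor(interval_s=1.0, missed_threshold=3)
mon.heartbeat()
print(mon.should_fail_over(now=mon.last_beat + 1.5))  # False: one blip
print(mon.should_fail_over(now=mon.last_beat + 3.5))  # True: declare failover
```

Raising the threshold buys immunity to transient problems at the cost of a longer detection window, which is exactly the limiting factor described above.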
Let's say that, to conduct weekly analytic processing, you need to gather the last 5 years of data, which is in the multi-petabyte range.
This huge volume of data can be partitioned into 2 kinds ...
A = What you have produced over the last 5 years.
B = What you just produced over the last week.
You ship the hard drives containing A to Amazon once, and create B in the cloud from then onwards.
Will this work?
I assume ...
1) The output of the analytic result is small and downloading it is not a concern.
2) There is a storage cost to keep petabytes of data in the Amazon cloud long term, which I think is acceptable.
3) There is no security concern to store the data in Amazon Cloud long term, which I think is reasonable because otherwise you won't consider running Map/Reduce in the cloud anyway.
One of the problems is that joining datasets A and B is not trivial.
You'd have to not just collect set B (e.g. transaction logs, ordered
updates, etc.) but also build a mechanism to play the I/O back into set A
in order. In other words, it's not just a pile of data; the ordering
of the I/O counts as well. Now, if you have purely unstructured
data, a bit pile, it's much easier to perform union(A, B).
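A toy illustration of why ordering matters when folding set B back into set A (the key-value shape and sequence numbers here are hypothetical, for illustration only):

```python
def apply_updates(baseline, log):
    """Replay an ordered update log (set B) into a baseline key-value
    snapshot (set A). Later writes must win, which is exactly why B is
    not 'just a pile of data' that can be unioned in."""
    state = dict(baseline)
    for seq, key, value in sorted(log):   # replay in sequence order
        state[key] = value
    return state

baseline = {"acct1": 100, "acct2": 50}
log = [(2, "acct1", 80), (1, "acct1", 120), (3, "acct2", 0)]
print(apply_updates(baseline, log))
# {'acct1': 80, 'acct2': 0} -- replaying out of order would leave acct1=120
```

With unstructured data there is no such dependency between records, so a plain union of A and B suffices.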
1) Agree, analytic results are usually small and not a concern.
2) Disagree, the current business models are very skewed towards small
datasets. I've already posted about how it's more cost-efficient to buy
and operate your own terabyte than it is to upload it to S3, keep it,
and download it (once) in a given year. When you are charged 10
cents/month per GB just for pure storage and also 10 cents per GB
transferred, in or out, that's cost-inefficient. In order to compete
with efficient onsite storage, the pricing should be at least an order
of magnitude cheaper. In terabyte terms, it's $100/TB/month and another
$100 to transfer it one way, one time. That's $1,200/TB/year just for
storage. I know plenty of enterprise storage vendors that would be
delighted to sell you a terabyte that runs for 5 years for $6,000 or a
petabyte for $6 million. At that rate, they'd pay your co-lo bill too,
and be _way_ money ahead.
3) There are certainly security concerns - unless you store data
encrypted at rest, which alleviates some of them. More importantly,
there are huge integrity concerns in large public stores because they
typically implement storage devices that do not perform DIF, the T10 Data
Integrity Field (e.g. SATA disk drives). No DIF, no integrity at rest.
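Plugging in the figures from point 2 above (2010-era list prices as quoted; my own arithmetic, not a vendor quote):

```python
def s3_style_cost(tb, years, store_per_gb_month=0.10, xfer_per_gb=0.10):
    # Storage for the whole period, plus one upload and one download
    # of the full dataset (in once, out once).
    gb = tb * 1000
    storage = gb * store_per_gb_month * 12 * years
    transfer = gb * xfer_per_gb * 2
    return round(storage + transfer, 2)

# 1 TB held for 1 year at 10 cents/GB/month plus 10 cents/GB transferred:
print(s3_style_cost(1, 1))        # 1400.0  ($1,200 storage + $200 transfer)
# 1 TB for 5 years, vs. the quoted $6,000 on-premises terabyte:
print(s3_style_cost(1, 5))        # 6200.0
```

At these rates the five-year rented terabyte costs about as much as the owned one before counting any compute, which is the order-of-magnitude gap the post describes.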
Cheers
For point (2), the cost is not just the hardware cost but also the administrator cost. If my company were Google-sized, then of course I wouldn't consider running my business on Amazon. But what if I am a small online store? I think the fundamental question is: where is the break-even point (in terms of data volume and processing need) beyond which the public cloud no longer makes sense? But this is a general cloud computing question, not specific to Map/Reduce.
My thought process is ...
a) If the security risk concern is high enough that you are willing to pay the difference between purchasing and renting disk, go for it. But don't ignore that you need to hire a DBA, and you may still need to pay the bandwidth cost of loading the data into the cloud.
b) If you cannot tolerate the data upload latency, or don't want to pay the upload bandwidth cost, to the point that you are willing to pay the difference between purchasing and renting CPU, go for it. But don't ignore that you need to hire a system admin, as well as set up the data center.
c) And if you actually don't have that much to process, you don't have to invent a new application just to use the cloud and Map/Reduce. Stay where you are.
As for system admins, you don't necessarily need to pay for those, no.
But again, I am looking at enterprise scale, which still needs an admin
regardless of whether the compute is in the cloud or not. Expertise is
always needed. What isn't needed is infrastructure babysitters, to be
colloquial. Unfortunately, many commercial datacenters have those out
of a lack of regard for good datacenter design, so the perception is
there. They've designed their datacenters inefficiently and poorly, so
they have to throw FTEs at the problem.
I agree for SMB the public cloud may make sense, no question. But I am
trying to solve petabyte-sized problems and the sheer physics of latency
and distance - never mind the outrageous costs - just don't permit much
use of public cloud. This is not to say it's not useful in several use
cases, as you correctly state. I think MR has great potential in the
cloud for small datasets.
> Before you veer so close to calling someone else a liar, <<That's not
> even close to true.>> consider that clusters with which you have had
> experience do not represent the entire universe of clusters.
If you don't want people to point out that your claims are untrue, don't
make untrue claims. The clusters or virtual infrastructures with which
you've had experience don't represent the entire universe either.
> Also,
> there are many different ways of achieving "fail-over."
Indeed. You referred to "cluster-type" failover. What does
"cluster-type" mean? You didn't specify, so I applied the definition
that would be most intuitive to people who've worked with clusters since
before virtualization became all the rage, and according to that
definition your claim remains nowhere close to true. It's entirely
possible to do the same kind of clustering between virtual machines as
was previously done between physical ones, instead of relying on the VM
hosts to do it, and by doing so one can achieve failover times much
lower than boot times.
> This conversation is about fail-over of virtual servers, and when a host
> in a resource pool fails, the VMs which were running on that host need
> to re-load from scratch on a surviving host.
So you *assume*, but you know what they say about assumptions. Just
because you hadn't already thought of other ways to do failover doesn't
mean those ways don't exist.