Business Intelligence solution in Cloud Computing

SRINIVASAN GANESAN

Jun 17, 2008, 3:40:14 PM

to cloud-c...@googlegroups.com

Folks,
Thanks for sharing valuable points. Just by reading the postings I have picked up quite a bit of information.
I was wondering if any of you have experience (or know a vendor) in running a data warehouse based business intelligence solution in a cloud.
For instance: accept data through FTP, run it through an ETL tool to load the dimensional model, and point the reports, dashboards and whatnot against the model.
Do the cloud vendors support this model?
Thanks
Ramesh.

Khazret Sapenov

Jun 17, 2008, 4:09:05 PM

to cloud-c...@googlegroups.com

Ramesh,

I know of a similar solution from NASDAQ.

quote:

NASDAQ Market Replay provides a NASDAQ-validated replay and analysis of the activity in the stock market. The application is built using the Adobe Flex and AIR platform, and utilizes the Amazon Simple Storage Service (S3) for persisting historical market data.

sources:
https://data.nasdaq.com/mr.aspx and

http://www.infoq.com/articles/nasdaq-case-study-air-and-s3

salut,

Khaz Sapenov

Naren Chawla

Jun 17, 2008, 4:35:34 PM

to cloud-c...@googlegroups.com

You can also look at LucidEra - http://www.lucidera.com/solutions/index.php

--Naren

--- On Tue, 6/17/08, Khazret Sapenov <sap...@gmail.com> wrote:

Subhasis Dasgupta

Jun 18, 2008, 3:43:24 AM

to cloud-c...@googlegroups.com

Here are a couple of links I have seen, though I have not used them; they provide BI solutions on EC2:
Pentaho
http://blog.vmdatamine.com/2007/08/pentaho-business-intelligence-suite-on.html
Weka
http://blog.vmdatamine.com/2008/02/gridweka-on-ec2.html

-Subhasis

2008/6/18 Naren Chawla <naren_...@yahoo.com>:

--
Subhasis Dasgupta
Indian Representative
Kaavo Inc
Stamford
CT, USA
www.kaavo.com
Phone : +919830282548
skype : subhasis.dasgupta

Dilli Babu

Jun 18, 2008, 8:59:32 AM

to Cloud Computing

For data-intensive requirements such as clickstream analysis, call
data reports, etc., there is a cloud edition of Vertica available on
Amazon Web Services.

Check here for the details:

http://solutions.amazonwebservices.com/connect/entry.jspa?externalID=1469

If you have huge volumes of data and have trouble generating
data-intensive reports, Vertica's columnar, on-the-cloud architecture
will be a good option.
--
Best Regards,
Dilli Babu
On-line Computing Architect,
DataSisar,
5 & 6 Walton road,
Bangalore-560001
E-mail: dill...@datasisar.com
Mobile:+919449191299
Visit:http://www.datasisar.com


Utpal Datta

Jun 18, 2008, 11:15:40 AM

to cloud-c...@googlegroups.com

Hi All

I am *very* new to this group, but I am really excited by the quality
of postings in the group. I am learning a lot, quickly.

I have a couple of questions. Maybe someone has some answers.

1. I think "data in the cloud" is so far a big block to widespread
adoption and to using the cloud for large, sensitive and mission-critical
applications (especially for financial organizations). Is someone
thinking of a way to leave the data on the user's premises and do
just the computing in the cloud? Kind of a reverse connection back to
the user datacenter.

That way the conventional data repositories can still be used. The
users will not have to worry about the reliability, availability and
(to a large part) security of the data. We still have to worry about
the security of the data travelling back and forth between the
cloud and the user data center.

This probably is more relevant for medium to large scale users with
"sensitive" data.

Comments? tips?

2. Considering that cloud computing is at the beginning of its
adoption curve, the user data center will, for a long time, have a
mixture of its own physical and virtual devices within the datacenter
along with "virtual" datacenters in one or more clouds (possibly
from different vendors).

The user will obviously look for a management portal that seamlessly
crosses the boundaries of physical, virtual and cloud devices (for
discovery and monitoring, at the very least).

Is there any talk of, or thought on, standardizing the "cloud management
actions" and "cloud management data" interfaces?

Comments? tips?

Thanks

--utpal

Khazret Sapenov

Jun 18, 2008, 1:50:41 PM

to cloud-c...@googlegroups.com

On Wed, Jun 18, 2008 at 11:15 AM, Utpal Datta <utpa...@gmail.com> wrote:

> 1. I think "data in the cloud" is so far a big block to widespread
> adoption and using cloud for large, sensitive and mission critical
> applications (especially for financial organizations). Is someone
> thinking of a way to leave the data within the user-premises and do
> just the computing in the cloud? Kind of a reverse connection back to
> the user datacenter.
>
> That way the conventional data repositories can still be used. The
> users will not have to worry about the reliability, availability and
> (to a large part) security of the data. We still have to worry about
> the security of the data travelling back and forth to and from the
> cloud to the user data center.
>
> This probably is more relevant for medium to large scale users with
> "sensitive" data.
>
> Comments? tips?

I was also thinking about some kind of staged, DMZ-like data island on premises (with enforced access policies) that has a protected communication/transport channel to various compute cloud providers.

As a simple example, I had a use case with a Maya3D render job using NFS/SMB shares for input and output files, where the NFS server was located on premises and rendering was done by multiple remote nodes on Amazon Elastic Compute Cloud, orchestrated by LSF.

salut,

Khaz Sapenov

Chris K Wensel

Jun 19, 2008, 11:08:15 AM

to cloud-c...@googlegroups.com

On Jun 18, 2008, at 8:15 AM, Utpal Datta wrote:

> 1. I think "data in the cloud" is so far a big block to widespread
> adoption and using cloud for large, sensitive and mission critical
> applications (especially for financial organizations). Is someone
> thinking of a way to leave the data within the user-premises and do
> just the computing in the cloud? Kind of a reverse connection back to
> the user datacenter.
>
> That way the conventional data repositories can still be used. The
> users will not have to worry about the reliability, availability and
> (to a large part) security of the data. We still have to worry about
> the security of the data travelling back and forth to and from the
> cloud to the user data center.
>
> This probably is more relevant for medium to large scale users with
> "sensitive" data.
>
> Comments? tips?

I've been processing large historical data sets for a Financial
company I'm consulting with using Cascading/Hadoop on EC2/S3.

The biggest bottleneck has been getting data to the compute
infrastructure.

The obvious pattern is to have datacenter processes push data to S3,
then have the temporary cluster spin up and pull data from S3, do
something interesting, then push the results to S3, notify the
datacenter the job is complete (SQS), have the datacenter pull down
the results from S3.
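The push/pull/notify pattern above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions. A plain dict stands in for an S3 bucket and a `queue.Queue` stands in for an SQS queue. The function names and keys are hypothetical, and real code would use an AWS SDK instead of these stand-ins.

```python
import queue

# Hypothetical in-memory stand-ins: a dict plays the S3 bucket and a
# queue.Queue plays the SQS queue. Real code would call an AWS SDK instead.
bucket = {}
done_queue = queue.Queue()


def datacenter_push(key: str, data: bytes) -> None:
    """Datacenter pushes raw input into the object store."""
    bucket[key] = data


def cluster_run(input_key: str, output_key: str) -> None:
    """Temporary cluster: pull input, compute, push results, notify."""
    data = bucket[input_key]                        # pull input from the store
    rows = data.decode("utf-8").splitlines()
    result = f"rows={len(rows)}".encode("utf-8")    # placeholder for real work
    bucket[output_key] = result                     # push results to the store
    done_queue.put(output_key)                      # notify the datacenter


def datacenter_collect(timeout: float = 1.0) -> bytes:
    """Datacenter waits for the notification, then pulls the results."""
    output_key = done_queue.get(timeout=timeout)
    return bucket[output_key]


datacenter_push("input/day1.csv", b"a,1\nb,2\nc,3\n")
cluster_run("input/day1.csv", "output/day1.txt")
print(datacenter_collect())  # b'rows=3'
```

In the pattern as described, the cluster is temporary. Only the object store and the queue persist between the datacenter side and the compute side.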

Because of the need to support both well-defined daily processes and
ad-hoc processes, my client's data generally needs to stay on S3.
Having it pulled from a remote datacenter on duplicate runs would be
extraordinarily slow and expensive, considering Amazon charges for
bandwidth in and out. Plus, it is a bit cheaper just to keep data on
S3 than to buy a NAS for storage.

That said, with bandwidth being the bottleneck in the face of the
ability to spin up 100 or 1000 nodes to crunch numbers, larger pipes
into a vendor's cloud would be very welcome. Otherwise your cloud
solution is only as fast as getting data in and out of it.
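The point that a cloud solution is only as fast as getting data in and out can be made concrete with a back-of-the-envelope model. The sketch makes three simplifying assumptions: a single link of constant bandwidth, perfectly parallel compute, and decimal units. All numbers in the usage example are hypothetical, not figures from this thread.

```python
def end_to_end_hours(data_gb: float, link_mbps: float, compute_node_hours: float,
                     nodes: int, result_gb: float = 0.0) -> float:
    """Upload time + parallel compute time + download time, in hours.

    Assumes one link of constant bandwidth and work that splits perfectly
    across nodes; both are simplifications.
    """
    megabits_per_gb = 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    upload_h = data_gb * megabits_per_gb / link_mbps / 3600
    download_h = result_gb * megabits_per_gb / link_mbps / 3600
    compute_h = compute_node_hours / nodes
    return upload_h + compute_h + download_h


# Hypothetical job: 100 GB over a 100 Mbit/s link, 100 node-hours of work.
for n in (10, 100, 1000):
    print(n, round(end_to_end_hours(100, 100, 100, n), 2))
# 10 12.22
# 100 3.22
# 1000 2.32
```

Going from 100 to 1,000 nodes saves less than an hour in this example. The roughly 2.2-hour upload is a floor that no number of nodes removes.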

chris

--
Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/
http://www.cascading.org/

timothy norman huber

Jun 19, 2008, 11:18:21 AM

to cloud-c...@googlegroups.com

On Jun 19, 2008, at 8:08 AM, Chris K Wensel wrote:

> That said, with bandwidth being the bottleneck in the face of the
> ability to spin up 100 or 1000 nodes to crunch numbers

How big are the datasets you're working with? Random or linear access?

Timothy Huber
Strategic Account Development

tim....@metaram.com
cell 310 795.6599

MetaRAM Inc.
181 Metro Drive, Suite 400
San Jose, CA 95110

Utpal Datta

Jun 19, 2008, 12:16:01 PM

to cloud-c...@googlegroups.com

You make all the right points on speed, bandwidth, Amazon charging for
bandwidth, etc. But consider the needs of a user (say, a large
financial company with a sensitive, business-critical application):

1. Who will guarantee that the data in S3 is secure from physical and
logical access?

2. Who will guarantee that the data is always available using a
multi-site recovery system (that is what they would have in their own
data center) that meets their RPO (Recovery Point Objective) and RTO
(Recovery Time Objective) guidelines?
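To make the RPO/RTO requirement concrete, here is a hedged sketch of how one might check a simple two-site plan against those objectives. It assumes periodic asynchronous replication to a second site. The `RecoveryPlan` fields are hypothetical, and real multi-site systems have many more failure modes than this models.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    replication_interval_min: float  # how often data is copied to a second site
    replication_lag_min: float       # how long each copy takes to land remotely
    failover_min: float              # time to bring the second site into service


def worst_case_data_loss_min(plan: RecoveryPlan) -> float:
    # With periodic asynchronous copies, a failure just before a copy finishes
    # loses everything written since the last completed copy began.
    return plan.replication_interval_min + plan.replication_lag_min


def meets_objectives(plan: RecoveryPlan, rpo_min: float, rto_min: float) -> bool:
    return worst_case_data_loss_min(plan) <= rpo_min and plan.failover_min <= rto_min


plan = RecoveryPlan(replication_interval_min=15, replication_lag_min=5, failover_min=30)
print(meets_objectives(plan, rpo_min=30, rto_min=60))  # True
print(meets_objectives(plan, rpo_min=15, rto_min=60))  # False
```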

Either Amazon or other cloud providers will make these available with
EC2 with S3 (or some other storage mechanism with more robust
security and availability characteristics), or the users will have to
build something similar on their own using EC2 as their basic building
block.

This will be a *very* non-trivial task for any user to do on their own,
and they will have to decide whether to put resources into building this
on a cloud or to invest more in their own datacenter.

So I guess a lot will depend on the level of maturity of the clouds.
I am not sure whether all this work belongs in a middle layer outside of the
original cloud, leaving the cloud providers to provide just the basic
building blocks.

--utpal

Chris K Wensel

Jun 19, 2008, 12:27:51 PM

to cloud-c...@googlegroups.com

> On Jun 19, 2008, at 8:08 AM, Chris K Wensel wrote:
>
>> That said, with bandwidth being the bottleneck in the face of the
>> ability to spin up 100 or 1000 nodes to crunch numbers
>
> how big are the datasets you're working with? Random or linear
> access ?
>

Total data is hundreds of GB. Individual workloads are ~10 GB. All linear
(this being Hadoop), but there is much joining, binning, and crunching
between the multiple input datasets (the actual workload translates to
~60 MapReduce jobs, all rendered and managed by Cascading).

So it kinda sucks to have uploads of data to the cluster take longer
than it does to compute on it. Worse, since my client then has to fetch
the derived data back.

ckw

Marc Evans

Jun 19, 2008, 12:46:44 PM

to cloud-c...@googlegroups.com

I have been contemplating this issue of safe storage in the cloud. My
opinion is that what I need is at least four in-the-cloud storage vendors,
on top of which I can layer RAID5 behavior, combined with a
loopback encryption file system. Even with that, pulling the data into
the compute cloud places the data in danger of being observed and
possibly tampered with. This all ignores latency problems, which I am
certain will be a problem, as well as transit costs.
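The idea of layering RAID5-style redundancy over several storage vendors rests on XOR parity. Here is a minimal sketch, assuming four hypothetical vendors holding three data shards plus one parity shard, in a single stripe. Any one vendor can disappear and its shard is rebuilt from the other three.

Two simplifications apply. Real RAID5 also rotates the parity position across stripes, which this single-stripe sketch does not do. The encryption layer is also omitted; shards would need to be encrypted before they leave your premises.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, n_data: int = 3) -> list:
    """Split data into n_data equal shards plus one XOR parity shard."""
    shard_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(shard_len * n_data, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(n_data)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]  # one entry per vendor; the last one holds parity


def rebuild(shards: list, original_len: int) -> bytes:
    """Recover the data when at most one shard (vendor) is missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost vendor")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        recovered = present[0]
        for shard in present[1:]:
            recovered = xor_bytes(recovered, shard)
        shards[missing[0]] = recovered
    return b"".join(shards[:-1])[:original_len]  # drop parity and padding


data = b"quarterly ledger"
vendors = split_with_parity(data)
vendors[1] = None  # vendor B is unreachable
print(rebuild(vendors, len(data)) == data)  # True
```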

I personally would like my application-at-the-edge software to also span
a number of in-the-cloud vendors, so that I don't experience vendor
lock-in problems. In particular, I am concerned that my public-facing
services will be targets of DDoS attacks and that, as a result, vendors will
consider abruptly discontinuing service.

For these reasons, I have not been able to consider much of what
in-the-cloud providers can offer to date, though I continue to build
proof-of-concept packages in preparation for the point in time when the
industry evolves enough to meet my needs. I am very curious whether others
have similar concerns and whether plausible solutions are being found...

- Marc

Chaz.

Jun 19, 2008, 12:57:55 PM

to cloud-c...@googlegroups.com

While data access and recovery is a very important aspect of cloud
computing, I'm curious as to the legal issues surrounding the movement
of data across national boundaries or even across company boundaries.

How does the "cloud" protect data going from the owner to the computing
service without being compromised (read that as sniffed)? Will a
computing service in country A have the right to impose restrictions on
data from another country (even if the results of the computing don't
affect the citizens of country A)? And so on.

Chuck Wegrzyn

Chaz.

Jun 19, 2008, 1:00:04 PM

to cloud-c...@googlegroups.com

Hi Marc! Those aren't the only problems. I'd be worried about trans-country laws
governing the data. After all, once it is in country A, the laws of that
country would hold.

Chuck Wegrzyn

Marc Evans

Jun 19, 2008, 1:06:35 PM

to cloud-c...@googlegroups.com

Hey Chuck!

I agree with your concerns. Thus far I have been using vendors within
single governance regions, and then having a policy engine at my
application layer to govern where data is allowed to be operated upon.
So, EU data stays in the EU for example. As the vendors grow to span
multiple boundaries, if they are not providing programmatic interfaces
to allow application layer control of these issues, I may need to avoid
those vendors.
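An application-layer policy engine along the lines Marc describes ("EU data stays in the EU") can be sketched as a filter applied before any job is placed. The jurisdiction labels, region names and vendor entries below are hypothetical. The default is to refuse placement when no rule matches.

```python
# Hypothetical policy: which provider regions may process data from each jurisdiction.
POLICY = {
    "EU": {"eu-west", "eu-central"},
    "US": {"us-east", "us-west"},
}


def allowed_sites(jurisdiction: str, sites: list) -> list:
    permitted = POLICY.get(jurisdiction, set())  # unknown jurisdiction -> nothing allowed
    return [s for s in sites if s["region"] in permitted]


def place_job(jurisdiction: str, sites: list) -> dict:
    candidates = allowed_sites(jurisdiction, sites)
    if not candidates:
        raise PermissionError(f"no site may process {jurisdiction} data")
    return min(candidates, key=lambda s: s["price_per_hour"])  # cheapest compliant site


SITES = [
    {"name": "vendor-a-us", "region": "us-east", "price_per_hour": 0.08},
    {"name": "vendor-b-eu", "region": "eu-west", "price_per_hour": 0.12},
    {"name": "vendor-c-eu", "region": "eu-central", "price_per_hour": 0.10},
]
print(place_job("EU", SITES)["name"])  # vendor-c-eu
```

The policy fails closed: if no compliant site exists, the job is refused rather than silently placed somewhere cheaper.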

- Marc

On SaaS

Jun 19, 2008, 1:10:54 PM

to cloud-c...@googlegroups.com

Data locality is definitely a huge issue in the cloud. My company works with a lot of multi-nationals with huge data sets in various countries. Many countries, especially in the EU but also others like Mexico, have fairly strict laws around private data (e.g., data with personal info). Some of these multi-national companies have to architect their on-premise software around these restrictions (e.g., putting on-premise software in each country) and restrict data movement. One of them took several months to study the laws on data location and movement before implementing their solution.

So the location of the cloud and data is definitely going to be very important to these multi-nationals. That's part of the reason why Amazon has an EU cloud and Salesforce is building a cloud in Singapore. Some countries are also wary of putting any data inside the U.S. due to concerns about the Patriot Act. In general, the country where the data resides has jurisdiction over it.

--

OnSaaS.net - Blogging about the SaaS and cloud computing world

OnSaaS.info - Providing a continuous stream of SaaS and cloud computing news

Follow on http://twitter.com/onsaas, http://friendfeed.com/onsaas

Pittard, Rick

Jun 19, 2008, 1:13:22 PM

to cloud-c...@googlegroups.com

One big concern is compliance with the data privacy laws in the EU and
other countries, which require that personal data be protected and
not be transmitted to locations with weaker protections. Since the
laws in the US are generally less protective than those in the EU,
additional controls/agreements need to be in place to legally move the
data from the EU to the US.

Rick

Chaz.

Jun 19, 2008, 1:25:33 PM

to cloud-c...@googlegroups.com

I think privacy is one aspect of data movement but what I see as a
bigger problem is that it might become a national security issue. How
about one country not allowing the data to leave once "it" has
possession? Or organizations like the NSA mining the data as it passes
through the borders.

Chuck Wegrzyn


Chaz.

Jun 19, 2008, 1:22:10 PM

to cloud-c...@googlegroups.com

That probably works well now. In the future I would expect compute
clouds to be available in 'cheaper' locales (think of Washington
State...lol) or Finland; at that point it becomes a real issue.

Chuck

Chaz.

Jun 19, 2008, 1:40:20 PM

to cloud-c...@googlegroups.com

While I think trans-national data movement will be an area that requires
governance of some kind I think that companies can get around the
problem in other ways. I think it just requires looking at the problem
in a different way.

I'd think the approach is to keep the data still and move the computing
to it. The idea is to see the thousands of machines it takes to hold the
petabytes' worth of data as the compute cloud. What needs to move to it
is the programs that can process the data. I've been working on this
approach for the last 3 years (Twisted Storage).

Chuck Wegrzyn

Khazret Sapenov

Jun 19, 2008, 1:51:55 PM

to cloud-c...@googlegroups.com

On Thu, Jun 19, 2008 at 1:40 PM, Chaz. <eprpa...@gmail.com> wrote:

> [snip]
>
> I'd think the approach is to keep the data still and move the computing
> to it. The idea is to see the thousands of machines it takes to hold the
> petabytes worth of data as the compute cloud. What needs to move to it
> is the programs that can process the data. I've been working on this
> approach for the last 3 years (Twisted Storage).
>
> Chuck Wegrzyn

This is a valid approach, which I personally call the "Plumber Pattern": an application, encapsulated in some kind of container (e.g., a virtual machine image), is marshalled to secure data islands to do its unique work iteratively (say, matching on some criterion in Interpol, FBI, CIA, MI5 and other databases, all distributed across continents). Due to the utterly confidential nature of these types of data, it is impossible to move them to public storage (at least at this time). The above-mentioned case might be extrapolated to some lines of business with reduced privacy/security requirements as well.
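The "Plumber Pattern" can be illustrated in-process. The same small job visits each data island and runs next to the records, and only its result (here, matching record IDs) comes back.

This is a toy sketch. The `DataIsland` class, the record fields and the sample data are hypothetical. A real deployment would ship a container or VM image rather than a Python function. In Python, `_records` is private only by convention; real isolation would come from the island's infrastructure.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Job = Callable[[List[Record]], List[str]]


class DataIsland:
    """Hypothetical secure site: holds records and runs visiting jobs locally."""

    def __init__(self, name: str, records: List[Record]):
        self.name = name
        self._records = list(records)  # never returned to the caller

    def run(self, job: Job) -> List[str]:
        # Only the job's small result leaves the island, never the records.
        return [str(item) for item in job(self._records)]


def make_matcher(field: str, value: str) -> Job:
    def job(records: List[Record]) -> List[str]:
        return [r["id"] for r in records if r.get(field) == value]
    return job


def plumber(islands: List[DataIsland], job: Job) -> Dict[str, List[str]]:
    # Visit each island in turn and collect only the per-island results.
    return {island.name: island.run(job) for island in islands}


islands = [
    DataIsland("site-1", [{"id": "a1", "name": "Doe"}, {"id": "a2", "name": "Roe"}]),
    DataIsland("site-2", [{"id": "b7", "name": "Doe"}]),
]
print(plumber(islands, make_matcher("name", "Doe")))  # {'site-1': ['a1'], 'site-2': ['b7']}
```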

Khaz Sapenov

Stuart Altenhaus

Jun 19, 2008, 1:44:28 PM

to cloud-c...@googlegroups.com

I think Chaz is right. There are privacy issues regarding use and exposure of data that vary country by country. If the cloud computes the data, there is no control on where that data is moved for computation, right?

R/s,
Stu Altenhaus

Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: "Chaz." <eprpa...@gmail.com>

Date: Thu, 19 Jun 2008 13:40:20
To:cloud-c...@googlegroups.com
Subject: Re: Issues of data in the cloud...

On SaaS

Jun 19, 2008, 1:57:18 PM

to cloud-c...@googlegroups.com

That depends on how the cloud is architected, no?

And I would think the cloud providers will have to start answering these questions if they want large enterprises to adopt the cloud. There may be no control over which server in the cloud is doing the computation, but service providers may offer options to restrict it to particular geographic domains.

We have quite a few people here from the cloud providers, maybe they can share some insight?

thx

--

OnSaaS.net - Blogging about the SaaS and cloud computing world

OnSaaS.info - Providing a continuous stream of SaaS and cloud computing news

Chaz.

Jun 19, 2008, 2:00:55 PM

to cloud-c...@googlegroups.com

I know from my work that many firms are reluctant to let their data "out
the door" since they see it as their edge in the market. But even setting that
aside for a minute, it seems to make more sense to move "small" programs
(relative to the size of the data) than to move massive amounts of data.

So my question is as follows: what makes a good "storage cloud"?

Chuck Wegrzyn

Chaz.

Jun 19, 2008, 2:20:08 PM

to cloud-c...@googlegroups.com

That is one approach - again it seems to indicate the model is the data
moving to the compute resources. The other approach is to look at it
from the data perspective - can the data sit some place and the compute
come to it?

Chuck Wegrzyn


Ray Nugent

Jun 19, 2008, 2:17:13 PM

to cloud-c...@googlegroups.com

Chris, couple of thoughts -

1) is it possible to have the app run on AWS so that the derived data does not need to traverse back down in real time (that way you could use a lazy download in the background to archive it in their DC while their app accesses the copy in real time on AWS.)?

2) I've been thinking about the problem of upload times as well (in the context of large DNA data sets). The cost of loading into AWS is not that prohibitive, so if one were to pre-process the data such that it could be uploaded by a bunch of parallel processes to AWS, you could reduce the bottleneck considerably. In theory.

Ray


Marc Evans

Jun 19, 2008, 2:45:44 PM

to cloud-c...@googlegroups.com

In my experience, there are cases where having the data / computation
as close to the customer edge as possible is what is required for an
acceptable user experience. In other cases, the relationship of the user
/ data / computation is not important. Most often, there is a mix of
both. One of the ideas behind Hadoop, as I understand it, is to bring the
computation to the data location, while also providing for the data to
be in several locations. The scheduler is critical to making good use of
data locality. So yes, I believe that what you are looking for does
exist within Hadoop at a minimum, though I also believe that there is
a lot of room to evolve the techniques that it uses.
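To illustrate the data-locality idea attributed to Hadoop here, below is a greedy toy scheduler. It prefers a free node that already holds a replica of each input block and falls back to a remote node otherwise. It is not Hadoop's actual scheduler, which also weighs rack locality, queues and more. Node and block names are hypothetical.

```python
def schedule(tasks, replicas, free_slots):
    """Assign each block to a node, preferring nodes that already hold a replica.

    tasks:      block ids to process, in order
    replicas:   block id -> set of node names holding a copy of that block
    free_slots: node name -> number of free task slots
    Returns a list of (block, node, is_local) tuples.
    """
    slots = dict(free_slots)
    plan = []
    for block in tasks:
        # Sort for deterministic tie-breaking, since set order is arbitrary.
        local = [n for n in sorted(replicas.get(block, ())) if slots.get(n, 0) > 0]
        candidates = local or [n for n, k in slots.items() if k > 0]
        if not candidates:
            raise RuntimeError("no free slots left")
        node = max(candidates, key=lambda n: slots[n])  # node with most free slots
        slots[node] -= 1
        plan.append((block, node, bool(local)))
    return plan


replicas = {"b1": {"n1", "n2"}, "b2": {"n2"}, "b3": {"n3"}}
free = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
print(schedule(["b1", "b2", "b3"], replicas, free))
# [('b1', 'n1', True), ('b2', 'n2', True), ('b3', 'n4', False)]
```

A purely greedy pass can still choose badly. Had `b1` gone to `n2`, `b2` would have had to run remotely, which shows why there is room to evolve these techniques.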

- Marc

Jim Peters

Jun 19, 2008, 2:40:35 PM

to cloud-c...@googlegroups.com

Even if the cloud providers come up with excellent answers to the security and reliability questions, who's going to trust them? Credit card numbers are one thing, but cloud data is something else entirely.

+J

--
Jim Peters
+415-608-0851

Chaz.

Jun 19, 2008, 3:10:33 PM

to cloud-c...@googlegroups.com

Jim,

I definitely agree with your point. I can't think of very many
multi-nationals that would let their data out to wander around. I'd
think they would want to protect their data and move the computing
resources close to it....

Chuck


Stuart Altenhaus

Jun 19, 2008, 2:39:03 PM

to cloud-c...@googlegroups.com

If the programs are moved to the data, then what is the distinction between cloud computing and CORBA? Seems like the same basic tenets would have to be in place.

(I'm new to the concept of cloud computing, but I do see the opportunities for advancing a network of computers that renders geo-location trivial. Surely, by enhancing existing network clouds so that computing power is placed at each node, a net-centric approach is achieved... The telcos do that today, right?)

Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: "Chaz." <eprpa...@gmail.com>

Date: Thu, 19 Jun 2008 14:00:55
To:cloud-c...@googlegroups.com
Subject: Re: Issues of data in the cloud...

Utpal Datta

Jun 19, 2008, 3:43:50 PM

to cloud-c...@googlegroups.com

Maybe this is a redundant question, but where is this protected data
residing? In the cloud or in the user's data center?

If it is in the cloud, then we are still dealing with security,
availability and recoverability issues (that everyone agrees on).

If it is in the user's data center, then how will the computing resources
offered (and controlled) by Amazon be brought to that specific user's
datacenter?

--utpal

Lynne VanArsdale

Jun 19, 2008, 3:57:37 PM

to cloud-c...@googlegroups.com

Just joined cloud-computing and this is the first conversation I've received.

A couple of weeks ago I attended Gartner Security where Neil MacDonald spoke on "Adaptive Security." In a nutshell, this approach builds a resilient system for secure data, acting much like the human immune system. It involves whitelisting as the foundation, blacklisting as a mid-tier and learned/adaptive mechanisms at the top. In such an environment, elements would be "autonomic" and self-managing to a large degree, and would share and communicate with other elements to protect workloads and information (as opposed to endpoints). There is a lot more to this vision, and it is probably a number of years away, but it may be a reasonable approach to address the concerns about data security being discussed here.

In any case, does anyone know of any product or standards efforts for the industry to collaborate on a more cohesive architecture for security in the cloud?

Chris K Wensel

Jun 19, 2008, 4:03:21 PM

to cloud-c...@googlegroups.com

CORBA isn't about mobility; it's just typesafe OO RPC. There was work
done by ObjectSpace and General Magic in the 90's on agent-based
computing (move code to the data), but that movement died off.

if the Cloud is a collection of compute resources, and you need to
apply them to lots of your data, you have little choice but to move
your data. you can't move the compute power. (unless you order a
shipping container of servers I guess)

ckw

--

Chris K Wensel

Jun 19, 2008, 3:55:05 PM

to cloud-c...@googlegroups.com

> 1) is it possible to have the app run on AWS so that the derived data does not need to traverse back down in real time (that way you could use a lazy download in the background to archive it in their DC while their app accesses the copy in real time on AWS.)?

The pattern is roughly this:

-- load dataset to S3 from datacenter (in small pieces, in parallel), repeat

- identify current dataset

- boot hadoop cluster

- start job on given dataset

- head of job pulls down parts from S3 in parallel (very natural with Hadoop)

- complete middle of job

- tail of job stuffs results sets into S3 in parallel (again fairly natural with Hadoop)

-- repeat above concurrently as datasets become available (easy to have concurrent Hadoop clusters in EC2).

-- pull data from S3 in parts in parallel

note 'job' above means a given data processing flow. in terms of Hadoop, the 'job' could be dozens of MapReduce jobs on the cluster.

> 2) I've been thinking about the problem of upload times as well (in the context of large DNA data sets). The cost of loading into AWS is not that prohibitive, so if one were to pre-process that data such that it could be uploaded in a bunch of parallel processes to AWS you could reduce the bottleneck considerably. In theory.

You will see a boost if you open multiple connections from one location to S3. It seems (it was clearly the case in the past; I'm unsure as of today) that individual connections were throttled, and, up to a point, bandwidth from a given IP was throttled. So doing things in parallel by breaking your big data into small parts gives you a boost. I can't remember the numbers, else I'd share; it's been a couple of months since that project.

One benefit of using small parts is that a given part will be available before the 'whole' is available. S3 won't show things for download that haven't finished uploading. So this also improves things (especially when coupled with SQS).
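To illustrate why small parts pair well with a queue, here is a sketch that uses Python's `queue.Queue` as a local stand-in for SQS (an assumption; no AWS API is used). The uploader announces each part as it finishes, and a worker starts on it before the whole dataset has landed. A `None` sentinel marks the end of the upload.

```python
import queue
import threading

def uploader(parts: list[bytes], notifications: queue.Queue) -> None:
    """Announce each part as soon as it has 'finished uploading'."""
    for i, part in enumerate(parts):
        # A real uploader would PUT the part to S3 here, then send a message.
        notifications.put((f"part-{i:05d}", part))
    notifications.put(None)  # sentinel: no more parts are coming

def worker(notifications: queue.Queue, results: list) -> None:
    """Process parts in arrival order without waiting for the whole dataset."""
    while True:
        msg = notifications.get()
        if msg is None:
            break
        key, part = msg
        results.append((key, len(part)))  # placeholder 'processing': record sizes

def run(parts: list[bytes]) -> list:
    notifications: queue.Queue = queue.Queue()
    results: list = []
    consumer = threading.Thread(target=worker, args=(notifications, results))
    consumer.start()
    uploader(parts, notifications)
    consumer.join()
    return results
```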

By 'parts' I mean: I may have 10 GB of data locally. I will break it into n-MB pieces (compressed) and push them up to S3 (in parallel). Having a manifest (*.parts file) is great when you need to manage the integrity of individual parts (MD5) and of the whole (all listed parts available, MD5 on the parts file). This in part guarantees you aren't running a job on partial data (because the upload failed and no one noticed).
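A minimal sketch of the manifest idea, assuming a simple text layout of one `name md5` line per part (the actual *.parts format isn't described, so this layout is hypothetical). MD5 is used here, as in the post, only to catch accidental corruption or missing parts, not as a security control.

```python
import gzip
import hashlib

def make_parts(data: bytes, part_size: int) -> dict[str, bytes]:
    """Split data into gzip-compressed parts keyed by a sortable name."""
    return {
        f"part-{i // part_size:05d}.gz": gzip.compress(data[i:i + part_size])
        for i in range(0, len(data), part_size)
    }

def make_manifest(parts: dict[str, bytes]) -> str:
    """One 'name md5' line per part; the manifest's own MD5 guards the whole."""
    return "".join(f"{name} {hashlib.md5(blob).hexdigest()}\n"
                   for name, blob in sorted(parts.items()))

def verify(manifest: str, manifest_md5: str, available: dict[str, bytes]) -> bool:
    """True only if the manifest is intact and every listed part is present and matches."""
    if hashlib.md5(manifest.encode()).hexdigest() != manifest_md5:
        return False
    for line in manifest.splitlines():
        name, digest = line.split()
        blob = available.get(name)
        if blob is None or hashlib.md5(blob).hexdigest() != digest:
            return False  # missing or corrupt part: don't run the job on partial data
    return True
```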

Chaz.

unread,

Jun 19, 2008, 4:18:45 PM6/19/08

to cloud-c...@googlegroups.com

Security is a funny issue. Can you ever use a cloud computing complex
and know for certain your data is protected? I'm betting there is no
foolproof way that it can be. So the only real way is to fall back to
what we know today: maintain physical control of it, for once that is
gone you are on your own, baby.

Chaz.

unread,

Jun 19, 2008, 4:30:30 PM6/19/08

to cloud-c...@googlegroups.com

I don't believe it is possible to have data security in the "cloud"
without having physical security of the data. After all, whenever I use a
cloud computer I hope that no one has hacked it to replace the security
modules, or to map memory and look into a running program, etc.

Now, if you have to build out an autonomic system, we will never have
secure cloud computing. No system today is so tight that it can't be
hacked. Just look at all the attempts to protect DVDs or Blu-ray discs...

Chuck Wegrzyn

Lynne VanArsdale wrote:
> Just joined cloud-computing and this is the first conversation I've
> received.
>
> A couple of weeks ago I attended Gartner Security where Neil MacDonald
> spoke on "Adaptive Security." In a nutshell, this approach builds a
> resilient system for secure data, acting much like the human immune
> system. It involves whitelisting as the foundation, blacklisting as a
> mid-tier and learned/adaptive mechanisms at the top. In such an
> environment, elements would be "autonomic" and self-managing to a large
> degree, and would share and communicate with other elements to protect
> workloads and information (as opposed to endpoints). There is a lot
> more to this vision, and it is probably a number of years away, but it
> may be a reasonable approach to address the concerns about data security
> being discussed here.
>
> In any case, does anyone know of any product or standards efforts for
> the industry to collaborate on a more cohesive architecture for security
> in the cloud?
>
>

> On 6/19/08, *Chaz.* <eprpa...@gmail.com> wrote:
>
>
> Jim,
>
> I definitely agree with your point. I can't think of very many
> multinationals that would let their data out to wander around. I'd
> think they would want to protect their data and move the computing
> resources close to it....
>
> Chuck
>
> Jim Peters wrote:
> > Even if the cloud providers come up with excellent answers to the
> > security and reliability questions, who's going to trust them? Credit
> > card numbers are one thing, but cloud data is something else
> entirely.
> >
> > +J
> >

Chaz.

unread,

Jun 19, 2008, 4:34:38 PM6/19/08

to cloud-c...@googlegroups.com

And CORBA isn't what I am thinking of, or even Hadoop, but things like
JavaSpaces (?).

I'm not sure I would agree you have to ship your data somewhere else.
After all, a "cloud data provider" could create just the secure
environment for holding the data and processing it (isn't that really
what S3 is all about?). The only thing the using company needs to do is
write the program and have it installed, more or less automagically, on
the machines that hold the user's data.

Chuck Wegrzyn

Chris K Wensel

unread,

Jun 19, 2008, 4:39:58 PM6/19/08

to cloud-c...@googlegroups.com

If you are deploying an application in EC2, you must architect it to
survive failure, because it will fail in varying degrees. AWS has
features that roughly let you do that (booting a preconfigured Xen VM,
SimpleDB, SQS, S3, etc.).

I suggest you do the same regarding security: just assume it's a
hostile environment.

The question is, what features of AWS support you in this? Shared
keychains/stores, encrypted volumes, a CA, Kerberos...? Or will this
always be left to the user? Or could you ever really trust those
services the same way you trust them not to lose data?

That said, I'm not a security person. What 'cloud security services'
could a provider provide? Or should they even bother?

ckw

http://chris.wensel.net/
http://www.cascading.org/

Ray Nugent

unread,

Jun 19, 2008, 4:51:48 PM6/19/08

to cloud-c...@googlegroups.com

Hey Chuck,

I think the front page of the Wall Street Journal proves that even having physical security of your data does not provide security! :-)

Security is really a business issue. Each layer of security should cost no more than the data is worth. So the concept of "secure enough" becomes important. What security is appropriate for a given type of data, and is it more or less secure in the cloud than in the corporate DC? Is data inherently "less secure" by virtue of being in the cloud than, say, on an employee's laptop or flash dongle or "on the wire"? I don't think corporate data centers are as secure as you're suggesting they are...

Ray

----- Original Message ----
From: Chaz. <eprpa...@gmail.com>
To: cloud-c...@googlegroups.com
Sent: Thursday, June 19, 2008 1:30:30 PM
Subject: Re: Issues of data in the cloud...


Ray Nugent

unread,

Jun 19, 2008, 4:55:41 PM6/19/08

to cloud-c...@googlegroups.com

Chris, it's the last step I wonder about. If you leave the resultant data on S3 and run whatever app they have that operates against that data on EC2, it seems you could save some time?

Ray

----- Original Message ----
From: Chris K Wensel <ch...@wensel.net>
To: cloud-c...@googlegroups.com

Sent: Thursday, June 19, 2008 12:55:05 PM
Subject: Re: Business Intelligence solution in Cloud Computing

--

Chris K Wensel

ch...@wensel.net

http://chris.wensel.net/

http://www.cascading.org/

Chris K Wensel

unread,

Jun 19, 2008, 5:47:57 PM6/19/08

to cloud-c...@googlegroups.com

If there were a next processing step, then yes, it would save time. But those jobs represent all the work being done that isn't done by the clients/customers of my client.

Chaz.

unread,

Jun 19, 2008, 6:52:32 PM6/19/08

to cloud-c...@googlegroups.com

Ray,

You are absolutely correct. Once you have a person involved it can be
compromised. It is all about risk and how to make it so small it would
take an act of God (or a really large budget) to breach it!

Chuck


Chaz.

unread,

Jun 19, 2008, 6:50:51 PM6/19/08

to cloud-c...@googlegroups.com

I think you missed the point. Even with shared keychains, use of X.509,
etc., there is no guarantee your data is safe. Once it is in the hands of
a 3rd party you had better assume it is compromised.

Perhaps the real solution is to carefully architect your solution to
provide "bulk" services outside the company and leave the critical
things - those that are absolutely vital - to inside the company.

Chuck Wegrzyn

Alan Ho

unread,

Jun 20, 2008, 2:00:07 AM6/20/08

to cloud-c...@googlegroups.com

Hi Chris,

I've looked at this issue quite a bit too. There are a few ways that I think the problem can be "relieved"

1. Don't encourage your clients to download the entire data set. As long as you provide URLs to the "crunched data", they should only have to pull the data as needed. You can also index the data using SDB, a nice convenience for searching the data.

2. See if the customers can split the dataset into sub-datasets, each reachable via some sort of URL. When you run your Map job, each of the Map nodes will be responsible for downloading the data from your clients; you might get some benefit from the parallelization of the download.

3. Use S3 more as a backing store. If you don't have many clients consuming the data, or you think that the clients will download the data soon after the MapReduce job is complete, they can download it directly from HDFS (http://hadoop.apache.org/core/docs/r0.17.0/hdfs_design.html#Browser+Interface)

I don't know if that helps.

Regards,
Alan Ho

----- Original Message ----
From: Ray Nugent <rnu...@yahoo.com>
To: cloud-c...@googlegroups.com
Sent: Thursday, June 19, 2008 11:17:13 AM
Subject: Re: Business Intelligence solution in Cloud Computing

Chris, couple of thoughts -

1) is it possible to have the app run on AWS so that the derived data does not need to traverse back down in real time (that way you could use a lazy download in the background to archive it in their DC while their app accesses the copy in real time on AWS.)?

2) I've been thinking about the problem of upload times as well (in the context of large DNA data sets). The cost of loading into AWS is not that prohibitive, so if one were to pre-process that data such that it could be uploaded in a bunch of parallel processes to AWS, you could reduce the bottleneck considerably. In theory.

Ray

----- Original Message ----
From: Chris K Wensel <ch...@wensel.net>
To: cloud-c...@googlegroups.com
Sent: Thursday, June 19, 2008 9:27:51 AM
Subject: Re: Business Intelligence solution in Cloud Computing

> On Jun 19, 2008, at 8:08 AM, Chris K Wensel wrote:
>
>> That said, with bandwidth being the bottleneck in the face of the
>> ability to spin up 100 or 1000 nodes to crunch numbers
>
> How big are the datasets you're working with? Random or linear
> access?
>

Total data is hundreds of GB. Individual workloads are ~10 GB. All
access is linear (this being Hadoop), but there is much joining, binning,
and crunching between the multiple input datasets (the actual workload
translates to ~60 MapReduce jobs, all rendered and managed by Cascading).

So it kinda sucks to have uploads of data to the cluster take longer
than it does to compute on it. Worse, since my client then has to fetch
the derived data back.

ckw

--
Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/
http://www.cascading.org/


Nati Shalom

unread,

Jun 20, 2008, 3:33:55 AM6/20/08

to cloud-c...@googlegroups.com

A practical way of dealing with data in the cloud, IMO, is to decouple
the way we persist the data from the application.
What that basically means is that the application loads the data into an
in-memory cloud, and that memory cloud keeps the data synchronized with
persistent storage asynchronously.
There is an open-source version that does that with Amazon SimpleDB and
GigaSpaces as the data grid:
http://www.openspaces.org/display/EDS/External+Data+Source+by+Amazon+SimpleDB
It basically means that the data is stored in Amazon S3 as the
persistent storage. When the application boots, the data grid loads the
data from S3, using the SimpleDB interface, into the memory of the cloud
resources.
The application uses the in-memory data. Updates to memory are
propagated asynchronously back to S3 through the same data grid and
SimpleDB interface. You can do pretty much the same thing with MySQL
instead of SimpleDB, as I noted in one of my previous posts:
http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html
Once the persistent storage is decoupled from our application, we can
easily use the same model for keeping our persistent data outside of the
cloud, i.e. in our local IT.

The nice thing is that we can be very flexible with our strategy as it
relates to where the data will reside, how it will be stored, and at
what rate it will be synchronized from the application. We can change it
over time to best fit our application scenario and constraints without
touching our application code.

" And CORBA isn't what I am thinking of, or even HADOOP but things like
JavaSpaces (?)."

JavaSpaces is indeed more relevant for this type of scenario.
What's unique about JavaSpaces, IMO, is that it can be used for handling
both the compute side and the data storage. The references above show
how you could use space-based storage for handling the data side.
Once the data is stored in the in-memory space cluster, you can easily use
the same space to route business logic to those in-memory instances in
parallel. There's a nice way to abstract that from the user using a
remoting abstraction; see more details on how that works here:
http://uri-cohen.blogspot.com/2008/02/openspaces-svf-remoting-on-steroids.html

Nati S.
GigaSpaces

-----Original Message-----
From: cloud-c...@googlegroups.com
[mailto:cloud-c...@googlegroups.com] On Behalf Of Chaz.

Oscar Koeroo

unread,

Jun 20, 2008, 4:00:55 AM6/20/08

to cloud-c...@googlegroups.com

Isn't the discussion relative to the level of assurance (in terms of
security here) that is supplied and demanded per use case?

For absolute control, you should stay absolutely in control of the
resources and I don't think Cloud Computing is something for you.

If you want a secured environment you should understand that the
administrator of the resource can read the memory. If you want to
prevent that, then you should look into secured techniques to hide the
memory contents. Ultimately you must get your instructions through a
(virtual) CPU which can also be obscured on what it is doing, but that's
the only threat left in that solution.

Most cloud computing solutions don't allow for that level of security in
the cloud.

The other, more generic good thing to do IMHO is to encrypt all your data
that resides in the Cloud, regardless of whether this is absolutely
needed or just a tiny wish. This would also solve issues where some
transfer protocols are inadvertently unencrypted.

For some bio-medical use cases in Grid computing (more my main field)
this approach is also being used. Decryption happens just prior to the
actual processing. A more advanced solution is the sliding decrypted
window approach, where the dataset is decrypted per section or block.
CPU usage goes up, but most of the file/database stays encrypted, and
the opportunity to snoop around on the resource is very limited.

cheers,

Oscar Koeroo


Ray Nugent

unread,

Jun 20, 2008, 10:54:59 AM6/20/08

to cloud-c...@googlegroups.com

Nati, I agree that decoupling will help. However your point here confuses me -

"Once the persistent storage is decoupled from our application we can easily use the same model for keeping our persistent data outside of the cloud i.e. in our local IT."

Keeping persistence that far away is bound to have a pretty significant impact on your performance, isn't it?

Ray

-----Original Message-----
From: cloud-c...@googlegroups.com
[mailto:cloud-c...@googlegroups.com] On Behalf Of Chaz.
Sent: Thursday, June 19, 2008 11:35 PM
To: cloud-c...@googlegroups.com
Subject: Re: Issues of data in the cloud...

>> -----Original Message-----
>> From: "Chaz." <eprpa...@gmail.com>
>>

>> Date: Thu, 19 Jun 2008 14:00:55
>> To:cloud-c...@googlegroups.com

>> Subject: Re: Issues of data in the cloud...
>>
>>
>>

>> I know from my work that many firms are reluctant to let their data
>> "out the door", since they see that as their edge in the market. But
>> even setting that aside for a minute, it seems to make more sense to
>> move "small" programs (relative to the size of the data) than to move
>> massive amounts of data.
>>
>> So my question is as follows: what makes a good "storage cloud"?
>>
>> Chuck Wegrzyn
>>
>> Khazret Sapenov wrote:
>>>
>>> On Thu, Jun 19, 2008 at 1:40 PM, Chaz. <eprpa...@gmail.com
>>> <mailto:eprpa...@gmail.com>> wrote:
>>>
>>> [snip]

>>> I'd think the approach is to keep the data still and move the
>>> computing to it. The idea is to see the thousands of machines it
>>> takes to hold the petabytes worth of data as the compute cloud. What
>>> needs to move to it is the programs that can process the data. I've
>>> been working on this approach for the last 3 years (Twisted Storage).
>>>
>>> Chuck Wegrzyn
>>>
>>>

>>> This is a valid approach, which I personally called the "Plumber
>>> Pattern": an application, encapsulated in some kind of container
>>> (e.g. a virtual machine image), is marshalled to secure data islands
>>> to iteratively do its unique work (say, matching on some criterion
>>> in Interpol, FBI, CIA, MI5 and other databases, all distributed
>>> across continents). Due to the utterly confidential nature of these
>>> types of data, it is impossible to move them to public storage (at
>>> least at this time). The above-mentioned case might be extrapolated
>>> to some lines of business as well, with reduced privacy/security
>>> requirements.
>>>
>>> Khaz Sapenov
>>>
>>>
>>
>>
>>
>

> --
> Chris K Wensel
> ch...@wensel.net

> http://chris.wensel.net/
> http://www.cascading.org/

Chris K Wensel

unread,

Jun 20, 2008, 11:15:29 AM6/20/08

to cloud-c...@googlegroups.com

Thanks for the comments, Alan. My previous post should outline how we have parallelized much of the infrastructure to alleviate my client's issues to a reasonable degree. In short, we employed the patterns you suggest, but not the specific technologies, for various reasons. I'd be happy to go into a little more detail offline.

The gist of my comments in this thread is to complain that, unfortunately, you currently can't scale bandwidth into a cloud to match the relative scale of the compute resources. Many hours to upload and relatively few minutes to crunch is an annoying imbalance.

For the analytics-in-the-cloud space, there is an opportunity for a vendor to offer services (many introduced in this thread by others) to alleviate the imbalance.
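A back-of-the-envelope calculation makes the upload-versus-crunch imbalance concrete. The ~10 GB workload size comes from earlier in the thread; the 20 Mbit/s uplink and the 2 Mbit/s per-connection throttle are assumptions chosen only for illustration.

```python
def transfer_hours(size_gb: float, rate_mbit_s: float) -> float:
    """Hours to move size_gb (decimal gigabytes) at rate_mbit_s megabits per second."""
    bits = size_gb * 8e9
    return bits / (rate_mbit_s * 1e6) / 3600

def effective_rate(streams: int, per_stream_mbit_s: float, uplink_mbit_s: float) -> float:
    """Parallel streams help only until the shared uplink is saturated."""
    return min(streams * per_stream_mbit_s, uplink_mbit_s)

# Assumed figures: a ~10 GB workload, 2 Mbit/s per throttled connection, 20 Mbit/s uplink.
one_stream = transfer_hours(10, effective_rate(1, 2, 20))     # about 11.1 hours
eight_streams = transfer_hours(10, effective_rate(8, 2, 20))  # about 1.4 hours
saturated = transfer_hours(10, effective_rate(16, 2, 20))     # about 1.1 hours, uplink-bound
```

Under these assumptions, parallel parts recover most of the loss from per-connection throttling, but the floor is set by the uplink itself, which is the imbalance being described.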

cheers,

ckw


Alan Ho

unread,

Jun 20, 2008, 12:11:41 PM6/20/08

to cloud-c...@googlegroups.com

Yeah. The whole issue with SOA as it is today is that you are expected to move the data to where it is processed. What we really need is the ability to move the processing to where the data is (which is kinda the point of Hadoop).
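A toy sketch of moving the processing to the data: each hypothetical "data island" keeps its records local, runs a function it is handed, and ships back only a small aggregate. The islands here are in-process objects; in a real system the function would travel as a job, VM image, or similar, and the record layout shown is invented for illustration.

```python
from collections import Counter
from typing import Callable, Iterable

class DataIsland:
    """Holds records that never leave; accepts code and returns only results."""

    def __init__(self, name: str, records: list[dict]):
        self.name = name
        self._records = records

    def run(self, task: Callable[[Iterable[dict]], Counter]) -> Counter:
        return task(self._records)

def count_by_symbol(records: Iterable[dict]) -> Counter:
    """Example task: summarize trades per symbol instead of exporting them."""
    return Counter(r["symbol"] for r in records)

def scatter_gather(islands: list[DataIsland], task) -> Counter:
    """Send the task to every island and merge the small partial results."""
    total: Counter = Counter()
    for island in islands:
        total += island.run(task)  # only the aggregate crosses the boundary
    return total
```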

Cheers,
Alan Ho

----- Original Message ----
From: Chris K Wensel <ch...@wensel.net>
To: cloud-c...@googlegroups.com

Sent: Friday, June 20, 2008 8:15:29 AM
Subject: Re: Business Intelligence solution in Cloud Computing


Gavan Corr

unread,

Jun 20, 2008, 12:32:42 PM6/20/08

to cloud-c...@googlegroups.com

Hadoop: fantastic idea (it would be great if it worked...)

If you need a production-ready environment in finance, it's a long way off. The distributed caching products, GemFire, Oracle's Coherence and Nati's GigaSpaces, are all miles ahead of Hadoop at this point, some more than others ;-)

Visit our website at http://www.nyse.com
*****************************************************************************
Note: The information contained in this message and any attachment to it is privileged, confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to the message, and please delete it from your system. Thank you. NYSE Euronext, Inc.

Chris K Wensel

unread,

Jun 20, 2008, 1:28:49 PM6/20/08

to cloud-c...@googlegroups.com

Hadoop works fine if you have proper expectations with respect to its architecture. It is not intended for real-time processing. Coherence or GigaSpaces, as you point out, are great for that.

But if you have dozens or more things that need to get done reliably against reasonably large datasets, it will get them done. One user recently commented he hadn't noticed part of his production cluster was down (hardware failure) since the cluster just kept running his scheduled jobs.

You have to apply the appropriate tools to the problem. I've used Hadoop to process historical stock trade and equity data. It was a perfect fit for the requirements.

ckw

Chris Marino

unread,

Jun 20, 2008, 2:13:01 PM6/20/08

to cloud-c...@googlegroups.com

Somewhat related to point #1 below, there is a new class of BI tool/client, sometimes called a Data Browser, that relies on data being exposed as data services. These clients have local DBs and can perform analytics as well as reports, etc. In the purest form, there is no data warehouse, but practically speaking, the data warehouse is exposed via data services, which can be incrementally supplemented by other data sources.

Kirix has a DataBrowser (www.kirix.com).

This whole BI area is sometimes called BI 2.0. Good article here:

http://www.intelligententerprise.com/showArticle.jhtml?articleID=197002610

One other point, I came across another cloud BI provider the other day: Good Data (www.gooddata.com).

__________________________________
Chris Marino
SnapLogic, Inc.

Really Simple Integration
www.snaplogic.com
650-655-7200

Michael Moran

unread,

Jun 20, 2008, 2:26:13 PM6/20/08

to cloud-c...@googlegroups.com

Nati,

I am intrigued with the idea of decoupling you mention below.

Question: Is the scenario you described any different from the one you presented at the Spring Experience conference in Miami FL, back in December 2007?

Thanks,

Michael

Nati Shalom

unread,

Jun 20, 2008, 2:49:33 PM6/20/08

to cloud-c...@googlegroups.com

" Question: Is the scenario you described any different from the one you presented at the Spring Experience conference in Miami FL, back in December 2007?"

The principles are quite the same; this pattern becomes even more applicable and relevant in the cloud simply because the cloud emphasizes the implicit contradiction between dynamic scaling and persistence.

It's very easy to add more memory resources dynamically; it's much harder to add persistence resources on demand or to dynamically partition them. It's easier to handle continuous high availability with a memory-based cluster than with failure on the persistent layer, etc.

HTH

Nati S.


Dennis Reedy

unread,

Jun 20, 2008, 3:31:52 PM6/20/08

to cloud-c...@googlegroups.com

On Jun 20, 2008, at 2:49 PM, Nati Shalom wrote:

" Question: Is the scenario you described any different from the one you presented at the Spring Experience conference in Miami FL, back in December 2007?"

The principles are quite the same – this pattern becomes even more applicable and relevant in the cloud simply since cloud emphasize the implicit contradiction between dynamic scaling and persistency.

Its very easy to add more memory resources dynamically, its much harder to add persistence resources on demand or dynamically partition them. Its easier to handle continues high availability with memory based cluster then with failure on the persistent layer etc.

What you are proposing seems to make sense: adding memory resources is certainly easier than adding persistent resources (or dynamically partitioning them). However, if adding more memory resources requires those memory resources to first sync from the persistent resources at startup, possibly resulting in a re-partitioning of the already cached data set, then scaling the data grid becomes equally challenging.

What we have seen with the approach(es) you are bringing up here is that once the data is loaded from the persistent resource into distributed shared memory, access to the cached data is certainly faster. However, scaling that in-memory data grid is not as straightforward. While the application components processing the data can easily scale, the underlying data grid may not scale as easily.

Dennis

William Newport

unread,

Jun 20, 2008, 3:42:49 PM6/20/08

to cloud-c...@googlegroups.com

Can you clarify this? Most datagrid products including ours (IBM WebSphere eXtreme Scale) can scale to a couple of thousand JVMs pretty easily and can be expanded while they are running transparently. Expanding a grid provides it with more CPU, network and memory.

Which scalability aspect have you had a problem with?

Dennis Reedy

unread,

Jun 20, 2008, 4:12:47 PM6/20/08

to cloud-c...@googlegroups.com

On Jun 20, 2008, at 3:42 PM, William Newport wrote:

> Can you clarify this?

The issue I was raising is that in some cases when data is partitioned
adding more data grid "members" may result in re-partitioning of the
cached data. If that is indeed the case then scaling the data grid may
become difficult. If the cluster members need to resync on the new
partitioning scheme, this presents an issue for existing clients, and
may result in delays accessing the cached data.

Simply put, if you have to re-partition, scaling the data-grid is not
that simple.
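To see why re-partitioning can hurt, and why hashing into a fixed number of logical partitions (the scheme William Newport describes later in the thread) limits the damage, here is a small comparison sketch. The partition count of 64 and the "take whole partitions from the busiest server" rule are assumptions for illustration, not any product's algorithm.

```python
import hashlib
from collections import Counter

PARTITIONS = 64  # fixed number of logical partitions (assumed value)

def stable_hash(key: str) -> int:
    """Deterministic hash; Python's built-in hash() of str is salted per process."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def naive_owner(key: str, servers: int) -> int:
    """Key -> server directly: changing the server count remaps most keys."""
    return stable_hash(key) % servers

def partition_of(key: str) -> int:
    """Key -> logical partition: independent of how many servers exist."""
    return stable_hash(key) % PARTITIONS

def rebalance(owners: dict[int, int], new_server: int) -> dict[int, int]:
    """Give a new server its fair share by moving whole partitions from the busiest servers."""
    owners = dict(owners)
    servers = len(set(owners.values())) + 1
    for _ in range(len(owners) // servers):
        load = Counter(owners.values())
        donor = max(load, key=load.get)
        victim = next(p for p, s in owners.items() if s == donor)
        owners[victim] = new_server
    return owners

def moved_fraction_naive(keys: list[str], before: int, after: int) -> float:
    return sum(naive_owner(k, before) != naive_owner(k, after) for k in keys) / len(keys)

def moved_fraction_partitioned(keys: list[str], before: dict[int, int],
                               after: dict[int, int]) -> float:
    return sum(before[partition_of(k)] != after[partition_of(k)] for k in keys) / len(keys)
```

Growing from 4 to 5 servers, roughly 80% of keys change owner under naive modulo placement, while the partitioned scheme moves only the keys in the 12 reassigned partitions (about 12/64 of them). Clients still need the new partition map, but existing servers keep most of their cached data.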

> Most datagrid products including ours (IBM WebSphere eXtreme Scale)
> can scale to a couple of thousand JVMs pretty easily and can be
> expanded while they are running transparently. Expanding a grid
> provides it with more CPU, network and memory.

Yep.

>
> Which scalability aspect have you had a problem with?

We have been very successful scaling application services using a
declarative SLA approach. This approach is based on an open source
project called Rio (http://www.rio-project.org).

HTH

Dennis

Ray Nugent

unread,

Jun 20, 2008, 7:33:17 PM6/20/08

to cloud-c...@googlegroups.com

I would tend to agree with Chris; Hadoop seems to work pretty well, particularly as of the 0.17 release. There are also some interesting supporting projects taking shape around Hadoop. Gavan, what was the experience you had with Hadoop that led you to feel it was not ready yet?

Ray

----- Original Message ----
From: Chris K Wensel <ch...@wensel.net>
To: cloud-c...@googlegroups.com
Sent: Friday, June 20, 2008 10:28:49 AM
Subject: Re: Business Intelligence solution in Cloud Computing

Hadoop works fine, if you have proper expectations in respect to its architecture. It is not intended for real time processing. Coherence or GigSpaces, as you point out are great for that.

But if you have dozens or more things that need to get done reliably against reasonably large datasets, it will get them done. One user recently commented he hadn't noticed part of his production cluster was down (hardware failure) since the cluster just kept running his scheduled jobs.

You have to apply the appropriate tools to the problem. I've used Hadoop to process historical stock trade and equity data. It was a perfect fit for the requirements.

ckw

On Jun 20, 2008, at 9:32 AM, Gavan Corr wrote:

Hadoop, fantastic idea (it would be great if it worked...)

if you need a production ready environment in Finance, it's a long way off. The distributed caching products, Gemfire, Oracle's Coherence and Nati's gigaspaces are all miles ahead of hadoop at this point, some more than others ;-)

On Jun 20, 2008, at 12:11 PM, Alan Ho wrote:

Yeah. The whole issue with SOA as it is today is that you are expected to move the data to where the data is processed. What we really need is the ability to move the processing to where the data is (Which is kinda the point of Hadoop)

Cheers,

Alan Ho

----- Original Message ----
From: Chris K Wensel <ch...@wensel.net>
To: cloud-c...@googlegroups.com
Sent: Friday, June 20, 2008 8:15:29 AM
Subject: Re: Business Intelligence solution in Cloud Computing

Thanks for the comments Alan. My previous post should outline how we have parallelized much of the infrastructure to alleviate my clients issues to a reasonable degree. In short, we employed the patterns you suggest, but not the specific technologies for various reason. I'd be happy to go into a little more detail offline.

The gist of my comments in this thread are to complain that you can't unfortunately scale bandwidth into a cloud to match the relative scale of the compute resources, currently. many hours to upload, and relatively few minutes to crunch, is an annoying imbalance.

For the analytics in the cloud space, there is an opportunity for a vendor to offer whatever services (many introduced in this thread by others) to alleviate the imbalance.

cheers,

ckw

On Jun 19, 2008, at 11:00 PM, Alan Ho wrote:

Hi Chris,

I've looked at this issue quite a bit too. There are a few ways that I think the problem can be "relieved"

1. Don't encourage your clients to download the entire data-set. As long as you provide URLs to the "crunched data", they should only have to pull the data as needed. You can index the data too using SDB too - a nice convenience function for searching the data.

2. See if the customers can split the dataset into sub-datasets, each reachable via some sort of URL. When you run your Map job, each of the Map nodes will be responsible for downloading the data from your clients - you might get some benefits from the parallelization of the download.

3. Use S3 for more of a backing store. If you don't have many clients consuming the data, or you think that the clients will download the data soon after the MapReduce job is complete, they can download it directly from HDFS (http://hadoop.apache.org/core/docs/r0.17.0/hdfs_design.html#Browser+Interface)
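Point 2 might look something like this on the worker side. This is a minimal Python sketch, assuming each sub-dataset is reachable at its own plain HTTP URL; the URLs, the worker count and the `fetch_splits` helper are hypothetical. A thread pool lets the downloads overlap instead of running one after another. In a real MapReduce job each map task would fetch only its own split.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def http_fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET via the standard library."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_splits(
    urls: List[str],
    fetch: Callable[[str], bytes] = http_fetch,
    workers: int = 8,
) -> Dict[str, bytes]:
    """Download every sub-dataset concurrently; returns {url: contents}.

    `fetch` is injectable so the sketch can be exercised without a network.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its payload.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Whether this actually helps depends on where the bottleneck is: parallel streams can make better use of a fat pipe, but they cannot widen a thin one.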

I don't know if that helps.

Regards,
Alan Ho

--
Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/
http://www.cascading.org/

Visit our website at http://www.nyse.com

William Newport

Jun 20, 2008, 6:00:51 PM

to cloud-c...@googlegroups.com

I guess there are two ways data is laid out in a grid: hash-based or
range-based partitioning. Hash-based partitioning is what all the
datagrid products use today. There is some fixed number of partitions
(physical or logical), and each key is hashed and taken modulo that
number to find its home partition. When new servers are added, enough
partitions are taken from the existing set to put the average number on
the new ones. This results in network copying and CPU burn on the
existing servers, but this shouldn't take too long.
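A minimal Python sketch of the hash scheme described above. It assumes a fixed count of 12 logical partitions and a simple "take from the most loaded server" rebalancing rule; both are illustrative choices, not how any particular datagrid product does it. It uses `hashlib` rather than Python's built-in `hash()` because the latter is randomized per process for strings, so it would not give every client the same home for a key.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 12  # fixed up front: keys never change partition, only partitions move


def partition_of(key: str) -> int:
    """Stable hash of the key, taken modulo the fixed partition count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def add_server(assignment: Dict[int, str], new_server: str) -> List[int]:
    """Move whole partitions from the most loaded servers to `new_server` until it
    holds the (floored) average. Mutates `assignment`; returns the partitions moved."""
    servers = set(assignment.values()) | {new_server}
    target = len(assignment) // len(servers)
    moved: List[int] = []
    while sum(1 for s in assignment.values() if s == new_server) < target:
        load: Dict[str, List[int]] = defaultdict(list)
        for p, s in assignment.items():
            load[s].append(p)
        donor = max((s for s in load if s != new_server), key=lambda s: len(load[s]))
        p = load[donor][0]
        assignment[p] = new_server  # in a real grid this is the network copy
        moved.append(p)
    return moved
```

For example, with 12 partitions spread 4/4/4 over three servers, adding a fourth moves exactly 3 partitions and leaves every server with 3; all other keys stay where they are.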

Range-based partitioning isn't used by anyone except Google (for now).
Here you start with a single partition representing A-Z and fill it
until it splits into A-D and E-Z, and so on, splitting as it fills. New
servers can force a split or migration of an existing range. The cost of
moving is similar, but can be higher than with hash-based partitioning:
with hash-based, each partition is independent of all others, whereas
with range-based, indexes need to be split etc., which makes it more
expensive. Range-based looks very attractive to me, though, moving
forward with IBM eXtreme Scale's evolution.
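A correspondingly minimal Python sketch of range-based partitioning: start with one range covering the whole key space and split a range at its median key once it holds more than `max_keys` keys. The threshold, the median split and the `RangePartitioner` class are assumptions for illustration; keys are assumed unique, and moving ranges between servers is left out.

```python
import bisect
from typing import List


class RangePartitioner:
    """Keys live in sorted ranges; an overfull range splits at its median key,
    so a single initial range covering everything becomes many."""

    def __init__(self, max_keys: int = 4):
        self.max_keys = max_keys
        self.lower_bounds: List[str] = [""]  # lower bound of each range; "" = start of key space
        self.ranges: List[List[str]] = [[]]  # sorted keys held by each range

    def _index(self, key: str) -> int:
        # Rightmost range whose lower bound is <= key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key: str) -> None:
        i = self._index(key)
        keys = self.ranges[i]
        bisect.insort(keys, key)
        if len(keys) > self.max_keys:
            mid = len(keys) // 2
            left, right = keys[:mid], keys[mid:]
            self.ranges[i:i + 1] = [left, right]  # replace one range with two
            self.lower_bounds.insert(i + 1, right[0])
```

Inserting "date", "apple", "eel", "banana", "cherry" with `max_keys=4` leaves two ranges, `["apple", "banana"]` and `["cherry", "date", "eel"]`, split at "cherry". Adjacent keys stay together, which is what makes range scans cheap, and also why splitting is more work than handing a hash partition to a new server.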

But either way, adding servers is handled up to large numbers of
servers using this mechanism, and we haven't seen any issues with it
except for the CPU and network burn on the existing servers whose data
is being moved to the new servers; the other servers carry on
independently.

Andrew Rogers

Jun 22, 2008, 12:26:15 PM

to cloud-c...@googlegroups.com

--- On Fri, 6/20/08, William Newport <bnew...@mac.com> wrote:
> Range based partitioning isn't used by anyone except google
> (for now) and here you start with a single partition
> representing A-Z and fill it until it splits to A-D,E-Z and
> so on splitting as it fills. New servers can force a split
> or migration of an existing range. Cost of moving is similar
> but can be more expensive than hash based because with hash
> based, each partition is independant of all others but with
> range based indexes need to be split etc which makes it more
> expensive. Range based look very attractive to me though moving
> forward with IBM eXtreme Scales evolution.

We are working with something conceptually similar to range-based partitioning as well, in our case novel high-dimensional spaces (not yet a public product), and I would like to add a few additional reasons it is a superior mechanism in many use cases relative to simple hash distribution:

1.) Semi-adaptive runtime resource balancing across partitions is trivially possible, which helps smooth over hotspots, whether CPU, I/O, storage, etc. This is particularly true if the cloud platform is using data structures that span many nodes for a single end-user instance, e.g. a database, as in our case.

2.) Preservation of spatial relationships in the key range, e.g. in the Google case you cite above. For some simple cases like session objects there may be no meaningful spatial relationships in the key set worth preserving, and hashing is simpler, but for cases like Google (and us) the lack of locality and order in the distribution of keys in the cluster has very negative performance implications.

3.) For more clever clouds, you can start doing cost-based optimizations and computing more efficient and complex access patterns with the metadata when using range partitioning. However, I don't think any current production cloud platform can really take advantage of this so it is not very important at this point (but will be in the future).

Also, I would point out that the network burn can be managed so that the impact is quite modest when repartitioning. There is no requirement to have one partition per physical node as opposed to several, and if you look at e.g. Google's architecture you find that a single partition can be moved across the network in about a second assuming wire speed. There are a lot of advantages to using several smaller logical partitions per physical node, not the least of which is that partitioning events are much smaller, more distributed, and you can keep the physical node hovering closer to max capacity most of the time. Repartitioning should generally be a smaller resource event on the network than e.g. the very regular transparent recovery going on from node failures in the case of a large cluster.
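To put rough numbers on the "about a second at wire speed" remark: assuming (this is not from the post) partitions of around 100 MB and a 1 Gbit/s link, one logical partition moves in under a second, whereas treating a node's entire 50 GB as a single partition would tie up the link for minutes.

```python
def transfer_seconds(partition_mb: float, link_gbit_per_s: float) -> float:
    """Wire-speed time to move one partition (decimal units: 1 MB = 8 Mbit)."""
    return partition_mb * 8 / (link_gbit_per_s * 1_000)


if __name__ == "__main__":
    # Assumed sizes: many ~100 MB logical partitions vs one 50 GB partition per node.
    print(f"one 100 MB logical partition: {transfer_seconds(100, 1):.1f} s")
    print(f"one 50 GB monolithic partition: {transfer_seconds(50_000, 1):.0f} s")
```

The same total data moves either way when a node is drained, but with small logical partitions each move is a short, independent event that can be spread across many source nodes.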

As you can see, range partitioning reflects a different basic use case than hash partitioning. Hash partitioning is simple and effective when access is only going to be affecting one logical node, but range partitioning becomes valuable when a single access may span multiple nodes. A lot of clouds currently use the former model because it is simple and adequate given the current state of what the clouds can do.
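A small Python sketch of that difference, assuming string keys and range partitions described by their sorted lower bounds (the function names are hypothetical). Because hashing scatters adjacent keys, a range query can't be narrowed to particular partitions and has to fan out to all of them; with range partitioning, a bisect over the boundaries finds just the partitions that overlap the query.

```python
import bisect
from typing import List, Set


def partitions_for_scan_hash(num_partitions: int) -> Set[int]:
    """Hashing destroys key order, so a range predicate can't be mapped to
    particular partitions: the scan has to be sent to every one of them."""
    return set(range(num_partitions))


def partitions_for_scan_range(lo: str, hi: str, lower_bounds: List[str]) -> Set[int]:
    """With range partitioning, only ranges overlapping [lo, hi] are touched.
    lower_bounds[i] is the smallest key partition i may hold (sorted, starts with "")."""
    first = bisect.bisect_right(lower_bounds, lo) - 1
    last = bisect.bisect_right(lower_bounds, hi) - 1
    return set(range(first, last + 1))
```

With boundaries `["", "f", "m", "t"]`, a scan from "g" to "k" touches only partition 1, while the same scan over four hash partitions touches all four.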

Andrew

jurq...@yahoo.com

Jun 23, 2008, 1:46:22 AM

to Cloud Computing

Somebody may already have replied to this, but I have to ask: how is
maintaining physical control over the hardware that stores the data
guaranteed to be any more secure than having a business whose
lifeblood depends on its providing security to customers do it? I
should note that I think "maintain physical control of [the data]" is
completely nonsensical.

James

On Jun 19, 1:18 pm, "Chaz." <eprparad...@gmail.com> wrote:
> Security is a funny issue. Can you ever use a cloud computing complex
> and know for certain your data is protected? I'm betting there is no
> fool proof way that it can be. So the only real way is to fall back to
> what we know today: maintain physical control of it for once that is
> gone you are on your own baby.
>

> Chuck Wegrzyn
>
> Utpal Datta wrote:

> > May be this is a redundant question, where is this protected data
> > residing? In the cloud or in the user's data center?
>
> > If it is in the cloud then we are still dealing with Security,
> > Availability and Recoverability isues (that everyone agrees on).
>
> > If is in the users data center then how will the computing resources
> > offered (and controlled by Amazon) be brought to that specific user's
> > datacenter?
>
> > --utpal
>

> > On Thu, Jun 19, 2008 at 3:10 PM, Chaz. <eprparad...@gmail.com> wrote:
> >> Jim,
>
> >> I definitely agree with your point. I can't think of very many
> >> multi-nationsls that would let there data out to wander around. I'd
> >> think they would want to protect their data and move the computing
> >> resources close to it....
>
> >> Chuck
>
> >> Jim Peters wrote:
> >>> Even if the cloud providers come up with excellent answers to the
> >>> security and reliability questions, who's going to trust them? Credit
> >>> card numbers are one thing, but cloud data is something else entirely.
>
> >>> +J
>
> >>> On Thu, Jun 19, 2008 at 10:57 AM, On SaaS <ons...@gmail.com> wrote:
>
> >>> That depends on how the cloud is architected, no?
>
> >>> And I would think the cloud providers will have to start answering
> >>> these questions if they want large enterprises to start adopting the
> >>> cloud. There maybe no control of which server in the cloud is doing
> >>> the computation, but service providers may provide options to
> >>> restrict based on geographic domains.
>
> >>> We have quite a few people here from the cloud providers, maybe they
> >>> can share some insight?
>
> >>> thx
>
> >>> On Jun 19, 2008, at 10:44 AM, Stuart Altenhaus wrote:
>
> >>>> I think Chaz is right. There are privacy issues regarding use and
> >>>> exposure of data that vary country by country. If the cloud
> >>>> computes the data, there is no control on where that data is moved
> >>>> for computation, right?
>
> >>>> R/s,
> >>>> Stu Altenhaus
>

> >>>> Sent from my Verizon Wireless BlackBerry
>
> >>>> -----Original Message-----

> >>>> From: "Chaz." <eprparad...@gmail.com>
> >>>> Date: Thu, 19 Jun 2008 13:40:20
> >>>> To: cloud-c...@googlegroups.com
> >>>> Subject: Re: Issues of data in the cloud...
>

> >>>> While I think trans-national data movement will be an area that requires
> >>>> governance of some kind I think that companies can get around the
> >>>> problem in other ways. I think it just requires looking at the problem
> >>>> in a different way.
>

> >>>> I'd think the approach is to keep the data still and move the computing
> >>>> to it. The idea is to see the thousands of machines it takes to hold the
> >>>> petabytes worth of data as the compute cloud. What needs to move to it
> >>>> is the programs that can process the data. I've been working on this
> >>>> approach for the last 3 years (Twisted Storage).
>
> >>>> Chuck Wegrzyn
>

> >>>> Pittard, Rick wrote:
> >>>>> One big concern are compliance with the data privacy laws in the EU and
> >>>>> other countries which require protection of personal data and that it
> >>>>> not be transmitted to locations that have less protections. Since the
> >>>>> laws in the US are generally less protective than those in the EU, then
> >>>>> additional controls/agreements need to be in place to legally move the
> >>>>> data from the EU to the US.
>
> >>>>> Rick
>

> >>>>> -----Original Message-----
> >>>>> From: cloud-c...@googlegroups.com On Behalf Of Chaz.

> >>>>>> On Thu, Jun 19, 2008 at 11:08 AM, Chris K Wensel

> >>>>>>> That said, with bandwidth being the bottleneck in the face of the
> >>>>>>> ability to spin up 100 or 1000 nodes to crunch numbers, larger pipes
> >>>>>>> into a vendors Cloud would be very welcome. Otherwise your Cloud
> >>>>>>> solution is only as fast as getting data in and out of it.
>
> >>>>>>> chris
>

> >>>>>>> --
> >>>>>>> Chris K Wensel
> >>>>>>> ch...@wensel.net
> >>>>>>> http://chris.wensel.net/
> >>>>>>> http://www.cascading.org/
>

> >>> --
> >>> OnSaaS.net - /Blogging about the SaaS and cloud computing world/
> >>> OnSaaS.info - Providing a continuous stream of SaaS and cloud
> >>> computing news
> >>> /Follow on
>

Chaz.

Jun 23, 2008, 9:45:44 AM

to cloud-c...@googlegroups.com

It all comes down to trust, pure and simple.

Chuck Wegrzyn

Blakley, Jim R

Jun 23, 2008, 11:38:11 AM

to cloud-c...@googlegroups.com

Trust and economies of scale. To James' point, think about private
building security firms. Why would I outsource something as critical as
the person who determines who gets into my building? Because the cost
savings (and reduction in headaches) from having a professional do it is
substantial and outweighs any concerns I have with trusting that firm.
Especially after the security firm provides me with assurances and data
that supports their trustability (e.g., independent bonding and
background checks on their employees).

The cloud is going to be no different. I already know of firms who are
looking at differentiating based on their ability to provide trusted
cloud data services and believe that their economies of scale in
offering those services will be far better than an individual IT shop's
cost structure for meeting data security and compliance requirements.
Most data security breaches come from insiders anyway (either
carelessness or maliciousness).

-- Jim Blakley

Brian Cinque

Jun 23, 2008, 12:50:17 PM

to cloud-c...@googlegroups.com

What Jim says is interesting, as it maps back to impact analysis with regard to security, regardless of corporation size. Each company has to classify its applications/services at some point, and within those classification tiers are implied many different dependent and independent variables. The full list of variables is too long to give here; a few have been raised in this thread, but they are not the entirety of what goes into classifying apps/services.

> The cloud is going to be no different.

I find this to be true in relation to this discussion as it relates to "cost of downtime". Jim's example of outsourcing a service like building security based on CAPEX/OPEX is interesting, but is it valid? What I mean is that in certain industries, like hospitals, if critical services are down, the cost of downtime is death: an extreme example, but true. So my open-ended question is: does Jim's example speak to the 20% of the Pareto principle or to the 80%?

Information security and compliance will always be there (again, to Jim's point), but the question is who can utilize the benefits of the Cloud? I am of the camp that Chaz raised: you have to be able to trust, but you also have to know what you're trusting out and what the impact of failure is. If the impact of failure exceeds the benefits, then maybe that app/service is in the 20% camp, for now. In the end, utilizing the benefits of the cloud will all depend on each corporation's business requirements.

What I find to be a fox-in-the-henhouse question is this: if I utilize a GRC SaaS solution to monitor the applications I utilize from the cloud, where are the lines drawn? Are today's SaaS providers ensuring levels of GRC, and can I utilize 3rd-party or even internal compliance groups to keep checks and balances?

brian

randall

Jun 23, 2008, 1:25:57 PM

to Cloud Computing

You don't really believe it's "nonsensical" do you? Seriously?

randall

Jun 23, 2008, 1:37:50 PM

to Cloud Computing

Introducing more variables into the security equation is the precise
reason that giving up physical control to move data into the cloud is
inherently less secure than keeping the data behind firewalls in
corporate data centers that, in my experience, are many times just as
-- if not more -- secure than hosting providers in the cloud.

I think the rampant disregard among cloud vendors of this fact
compromises our credibility. A lot of the issue is simply about
trust, but how can we expect potential customers to trust us when we
completely miss the mark on this point?

On Jun 23, 11:50 am, Brian Cinque <brian.cin...@gmail.com> wrote:
> What Jim says is interesting as it maps back to impact analysis in regards to security; regardless of corporation size. Each company has to classify their application/services at some point and within those classification tiers is implied many different dependent and independent variables. Those variables are too long to list but in this thread a few are raised but they themselves are not the entirety of the components of apps/service classification.
>
> The cloud is going to be no different.

Jim Peters

Jun 23, 2008, 2:08:21 PM

to cloud-c...@googlegroups.com

It all comes down to risk.

Say you're a big company with valuable data. You decide to let a 3rd party manage that data for you, because it's significantly cheaper than doing it yourself.
If the 3rd party mis-handles your data, you can probably get your money back. But that's it. All you can recover is what you paid them. I can't see a 3rd party being able to cover the true value of the data if it gets mis-handled. So I think big cos. will be very reluctant to go this route.

--
Jim Peters
+415-608-0851

Brian Cinque

Jun 23, 2008, 2:20:33 PM

to cloud-c...@googlegroups.com

Randall,

> giving up physical control to move data into the cloud is inherently less secure than keeping the data behind firewalls in corporate data centers

That statement is an opinion... Some of the security vendors who provide services from the Cloud might take umbrage at it. In my experience I have seen the network/security technologists behind the firewall provide some amazing service, and on the flip side I have seen some major user error. Which makes me wonder: is the issue a technology issue, a training issue, a funding issue, or ???

> in my experience, are many times just as -- if not more -- secure than hosting providers in the cloud.

You nailed my point. Each service/application requires a different level of security, and businesses will pay accordingly for it. Does it make sense for services/applications with minimal security requirements to be held to the same security standards as financial institutions? What's the cost or impact to a business if it needs secure access from the Cloud? Is the cost, risk, etc. lower than keeping it in-house? It all depends on the individual business and its specific service/application. I think there are enough potential businesses out there that will deem in-house costs (OPEX/CAPEX) too expensive to support a secure information access model via the Cloud. Again, this discussion is about secure access; there are a whole bunch of other variables to discuss about the economics behind the Cloud.

> when we completely miss the mark on this point?

Can you elaborate a bit more on your views of the fundamental points? For me, the takeaway from this thread has been calling out the elephant in the room that no one wants to discuss, and understanding the full impact, from a security and compliance standpoint, of accessing information from the Cloud.

Ian Rae

Jun 23, 2008, 2:52:09 PM

to cloud-c...@googlegroups.com

Which brings up a great point: if I'm a big company and do this in-house
and mishandle the data in an equivalently bad way or worse, can I do a
better job of recovery? Maybe. The balance of risks may be slowly shifting.

I suspect that over a long period of time the cloud providers will simply
get much better at doing this, and so much cheaper that the balance of
this decision will shift for the majority of the market.

There may also be premium priced verticalized clouds that specialize, for
example on HIPAA compliance, and give a better level of regulation and
controls while still saving $.

One thing's for certain: none of this is likely to happen overnight.

Ian @
infreemation.net

On Mon, 23 Jun 2008 14:08:21 -0400, Jim Peters <jazzm...@gmail.com>
wrote:

Ray Nugent

Jun 23, 2008, 2:56:46 PM

to cloud-c...@googlegroups.com

Big Businesses don't keep all their cash in a safe in the basement at HQ, they use banks. Why do they need to keep their data on servers in the HQ data center rather than a secured third party repository?

Ray

Reuven Cohen

Jun 23, 2008, 3:00:04 PM

to cloud-c...@googlegroups.com

Ray, I think you nailed it. Problem is convincing enterprises to think
the same way about the cloud.

reuven

--

Reuven Cohen
Founder & Chief Technologist, Enomaly Inc.
www.enomaly.com :: 416 848 6036 x 1
skype: ruv.net // aol: ruv6

blog > www.elasticvapor.com
-
Get Linked in> http://linkedin.com/pub/0/b72/7b4

Khazret Sapenov

Jun 23, 2008, 3:04:58 PM

to cloud-c...@googlegroups.com

I wouldn't agree. Banks are regulated/audited by the state, while cloud providers are not at this point (if ever).

Chaz.

Jun 23, 2008, 3:06:18 PM

to cloud-c...@googlegroups.com

Cash in the bank is much different from data. You and I can both agree
that I have a certain amount of it. We have a contract - a passbook or a
deposit slip - that verifies you got so much from me. Once I have that
paper in hand you have to give me my cash back.

Data is much, much different. Unlike cash, it can be shared without my
knowing it. As such, there are few guarantees that it will remain private.

When it comes right down to it, there are a fixed number of risks in
securing data, both in and out of the corporation. What I think is true
is that there is more inherent risk in farming out the data than in
keeping it inside.

But that said, there is no absolute certainty of security.

Chuck Wegrzyn

Ray Nugent wrote:
> Big Businesses don't keep all their cash in a safe in the basement at
> HQ, they use banks. Why do they need to keep their data on servers in
> the HQ data center rather than a secured third party repository?
>
> Ray
>
> ----- Original Message ----
> From: Ian Rae <ian...@gmail.com>
> To: cloud-c...@googlegroups.com
> Sent: Monday, June 23, 2008 11:52:09 AM
> Subject: Re: Issues of data in the cloud...
>
>
> Which brings up a great point, if I'm a big company and do this inhouse
> and mishandle the data in an equivalently bad way or worse, can I do a
> better job of recovery? Maybe. The balance of risks may be slowly shifting.
>
> I suspect that over a long period of time the cloud providers will simply
> get much better at doing this, and so much cheaper that the balance of
> this decision will shift for the majority of the market.
>
> There may also be premium priced verticalized clouds that specialize, for
> example on HIPAA compliance, and give a better level of regulation and
> controls while still saving $.
>
> One things for certain, none of this is likely to happen overnight.
>
> Ian @
> infreemation.net
>
> On Mon, 23 Jun 2008 14:08:21 -0400, Jim Peters <jazzm...@gmail.com> wrote:
>
> > It all comes down to risk.
> >
> > Say you're a big company with valuable data. You decide to let a 3rd party
> > manage that data for you, because it's significantly cheaper than doing it
> > yourself.
> > If the 3rd party mis-handles your data, you can probably get your money
> > back. But that's it. All you can recover is what you paid them. I can't see
> > a 3rd party being able to cover the true value of the data if it gets
> > mis-handled. So I think big cos. will be very reluctant to go this route.
> >
> > On Sun, Jun 22, 2008 at 10:46 PM, <jurq...@yahoo.com> wrote:
> >
> >>
> >> Somebody may already have replied to this, but I have to ask: how is
> >> maintaining physical control over the hardware that stores the data
> >> guaranteed to be any more secure than having a business who's
> >> lifeblood depends on its providing security to customers do it? I
> >> should note that I think "maintain physical control of [the data]" is
> >> completely nonsensical.
> >>
> >> James
> >>
> >> On Jun 19, 1:18 pm, "Chaz." <eprparad...@gmail.com> wrote:
> >> > Security is a funny issue. Can you ever use a cloud computing complex
> >> > and know for certain your data is protected? I'm betting there is no
> >> > fool proof way that it can be. So the only real way is to fall back to
> >> > what we know today: maintain physical control of it for once that is
> >> > gone you are on your own baby.
> >> >
> >> > Chuck Wegrzyn
> >> >
> >> > Utpal Datta wrote:
> >> > > May be this is a redundant question, where is this protected data
> >> > > residing? In the cloud or in the user's data center?
> >> >
> >> > > If it is in the cloud then we are still dealing with Security,
> >> > > Availability and Recoverability isues (that everyone agrees on).
> >> >
> >> > > If is in the users data center then how will the computing resources
> >> > > offered (and controlled by Amazon) be brought to that specific user's
> >> > > datacenter?
> >> >
> >> > > --utpal
> >> >
> >> > > On Thu, Jun 19, 2008 at 3:10 PM, Chaz. <eprparad...@gmail.com> wrote:
> >> > >> Jim,
> >> >
> >> > >> I definitely agree with your point. I can't think of very many
> >> > >> multi-nationsls that would let there data out to wander around. I'd
> >> > >> think they would want to protect their data and move the computing
> >> > >> resources close to it....
> >> >
> >> > >> Chuck
> >> >
> >> > >> Jim Peters wrote:
> >> > >>> Even if the cloud providers come up with excellent answers to the
> >> > >>> security and reliability questions, who's going to trust them? Credit
> >> > >>> card numbers are one thing, but cloud data is something else entirely.
> >> >
> >> > >>> +J
> >> >
> >> > >>> On Thu, Jun 19, 2008 at 10:57 AM, On SaaS <ons...@gmail.com>

Oscar Koeroo

Jun 23, 2008, 3:42:46 PM

to cloud-c...@googlegroups.com

Doesn't this depend on the scenario? If the cloud's purpose is to hold
the data, then you can encrypt it and query it from a different (more
controlled) environment. To scale, keep indexes and encrypt those too.

Replication to the cloud might be very simple, depending on volume of
course, but it is possible to keep that very secure.
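One way to read "keep indexes and encrypt those" is a blind index: index terms are replaced by a keyed HMAC computed in the controlled environment, so the cloud can match tokens without ever seeing the terms. A minimal, standard-library-only Python sketch follows; the key handling and helper names are assumptions. The record bodies themselves would still have to be encrypted separately with a vetted crypto library, since Python's standard library has no block cipher.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: this key lives only in the controlled environment, never in the cloud.
INDEX_KEY = b"replace-with-a-secret-key-kept-on-premises"


def blind_token(term: str, key: bytes = INDEX_KEY) -> str:
    """Keyed HMAC-SHA256 of a normalized index term; the cloud stores only this."""
    return hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_blind_index(
    records: Dict[str, List[str]], key: bytes = INDEX_KEY
) -> Dict[str, List[str]]:
    """Map token -> record ids. Built locally, then uploaded alongside the
    (separately encrypted) records."""
    index: Dict[str, List[str]] = {}
    for record_id, terms in records.items():
        for term in terms:
            index.setdefault(blind_token(term, key), []).append(record_id)
    return index
```

To query, the controlled side computes `blind_token("alice")` and sends only the token. Equal terms still map to equal tokens, so the provider can see how often a token occurs: this hides the terms, not the access pattern.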

Oscar

Blakley, Jim R

Jun 23, 2008, 3:51:25 PM

to cloud-c...@googlegroups.com

Yes, IMO, both Pareto and the normal technology adoption lifecycle (http://en.wikipedia.org/wiki/Technology_Adoption_LifeCycle) apply. Amazon would be happy to demonstrate that there are plenty of companies at the innovator end of the curve for whom S3 and SimpleDB are more than adequate. Hospitals and financial service firms are very likely to be the laggards in cloud-based storage. (Although you have some innovators even there: http://googlesystem.blogspot.com/2008/02/cleveland-clinic-to-pilot-google-health.html)

There’s a huge range of companies between the innovators and laggards and the solutions will evolve and improve with time to meet most companies’ needs. Each one will make an individual choice around whether keeping the data housed in their own data center or in the cloud is more or less costly or more or less secure.

I can remember a time when many people said they would never trust a web site with their credit card number. Ultimately, convenience and increasing levels of trust in Amazon, PayPal, et al. made the concern moot for the 80%. It would be a mistake to think that there won’t be a PayPal equivalent for cloud-based data storage just because one hasn’t emerged yet.

-- Jim Blakley

randall

Jun 23, 2008, 4:02:30 PM

to Cloud Computing

I'm not saying the cloud vendors' facilities are inherently less
secure than a corporate data center, but I am saying that moving data
between the two locations introduces additional security concerns.
Even if it can be assumed that your data is better protected in the
cloud than your own IT professionals can secure it, you still have all
the same requirements to protect your internal computers that have
access to the information in the cloud, plus you have to worry about
all the security issues of the cloud itself.

Assume your data is on a server with just a few computers physically
networked to it -- or none. Would you suggest it is less secure there
than in the cloud?

On Jun 23, 1:20 pm, Brian Cinque <brian.cin...@gmail.com> wrote:
> Randall,
>
> > giving up physical control to move data into the cloud is inherently less secure than keeping the data behind firewalls in corporate data centers
>
> That statement is an opinion... For some of the security vendors who provide services from the Cloud they might take umbrage to those statements. In my experience I have seen behind the network/security technologists provide some amazing service and on the flip side I have seen some major user error. Which makes me wonder is the issue a technology issue or a training issue? or a funding issue? or ???
>
> > in my experience, are many times just as -- if not more -- secure than hosting providers in the cloud.
>
> You nailed my point.. For each service/application they require different levels of security so they will pay accordingly for it. Does it make more sense for services/applications that require minimal security requirements to be held to the same security standards as financial institutions? Whats the cost or impact to a business if they need secure access from the Cloud? Is the cost, risk, etc lower then keeping it in-house.. It all depends on the individual business and their specific service/application. I think there is enough potential businesses out there that will deem in-house costs too expensive (from OPEX/CAPEX) to support a secure information access model via Cloud. Again this discussion is on secure access there are a whole bunch of other variables to discuss about the economics behind the Cloud.
>
> > when we completely miss the mark on this point?
>
> Can you elaborate a bit more on your views of the fundamental points? For me, my take away from this thread has been calling out the elephant in the room that no one wants to discuss and understanding the entire impacts from a security and compliance standpoint of accessing information from the Cloud.

Jim Peters

Jun 23, 2008, 4:46:39 PM

to cloud-c...@googlegroups.com

Bad metaphor. If the bank loses the cash, the chances are good that BigCo can get it all back.

If Clouds "R" Us loses (or worse, exposes) some data, all BigCo can do is get their money back. The damage is likely to be much greater than the piddly-ish amounts that BigCo will have paid Clouds "R" Us. Maybe they can slap Clouds "R" Us with a lawsuit that would put Clouds "R" Us out of business, but that doesn't remedy the situation.

+J

Jim Peters
+415-608-0851

Khazret Sapenov

Jun 23, 2008, 5:20:38 PM

to cloud-c...@googlegroups.com

On Mon, Jun 23, 2008 at 4:46 PM, Jim Peters <jazzm...@gmail.com> wrote:

> Bad metaphor. If the bank loses the cash, the chances are good that BigCo can get it all back.
>
> If some Clouds "R" Us. loses (or worse exposes) some data, all BigCo can do is get their money back. The damage is likely to be much greater than the piddly-ish amounts that BigCo will have paid Clouds R Us. Maybe they can slap Clouds "R" Us with a lawsuit that would put Clouds "R" Us out of business, but that doesn't remedy the situation.
>
> +J

For every Clouds'R'Us there almost always exists a Cloud Life Insurance (inevitable?) that might cover the risk of losing customer data. :)

The other thing is that such loss might have irrecoverable consequences like those related to industrial espionage cases.

KS

Chaz.

Jun 23, 2008, 5:35:52 PM6/23/08

to cloud-c...@googlegroups.com

It is a mistake to assume that encrypting data makes it secure. While
encrypted data can't be read, encryption doesn't guarantee that once the
data is decrypted for use it can't be viewed (or stolen) by others.

Think of how simple it would be to write a wrapper DLL or library that
looks and acts like some encryption library but also diverts the contents.
It would be very, very simple to do this.

Yes, the same thing could happen in a corporation under my control,
but there I can audit for it or lock the system down (well, almost).
But at a remote site holding my data, how can I be sure?

Chuck Wegrzyn
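To make Chuck's wrapper point concrete, here is a minimal Python sketch. The names `toy_encrypt`, `diverting_shim`, and `diverted` are hypothetical, and `toy_encrypt` is a placeholder XOR routine standing in for a real cipher; it is not real encryption. The shim returns exactly the same ciphertext as the function it wraps, so callers see nothing unusual, yet it quietly keeps a copy of every plaintext it is handed.

```python
import functools

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a real cipher: repeating-key XOR, NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

diverted = []  # where the shim silently stashes every plaintext it sees

def diverting_shim(func):
    """Wrap an 'encrypt' function so it behaves identically but leaks its input."""
    @functools.wraps(func)  # keep the original name/docstring so it looks genuine
    def wrapper(plaintext: bytes, key: bytes) -> bytes:
        diverted.append(plaintext)   # copy the data before it is encrypted
        return func(plaintext, key)  # then return the normal ciphertext
    return wrapper

# Callers use `encrypt` exactly as before; the ciphertext is unchanged.
encrypt = diverting_shim(toy_encrypt)

ciphertext = encrypt(b"customer records", b"k3y")
print(diverted)  # [b'customer records']
```

Encryption alone says nothing about who controls the code that encrypts and decrypts; the auditing and lock-down Chuck mentions are what address that, and they are harder to verify on a remote host.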

timothy norman huber

Jun 23, 2008, 5:48:13 PM6/23/08

to cloud-c...@googlegroups.com

On Jun 23, 2008, at 11:56 AM, Ray Nugent wrote:

Big Businesses don't keep all their cash in a safe in the basement at HQ, they use banks. Why do they need to keep their data on servers in the HQ data center rather than a secured third party repository?

Great analogy, Ray. I think the "problem" for data in the cloud may lie in the word choice. Clouds evoke floaty, puffy images. Banks (supposedly) do not, although we've had many banks that were both floaty and puffy in the recent past.

Timothy Huber

Strategic Account Development

tim....@metaram.com

cell 310 795.6599

MetaRAM Inc.

181 Metro Drive, Suite 400

San Jose, CA 95110

randall

Jun 23, 2008, 6:34:03 PM6/23/08

to Cloud Computing

The analogies of electricity and cash to cloud computing both ignore
the fact that there is no information in electricity or cash.

Pratap Subrahmanyam

Jun 23, 2008, 6:21:30 PM6/23/08

to cloud-c...@googlegroups.com

Some people call that DRM!

Pratap

Subra K

Jun 23, 2008, 9:47:34 PM6/23/08

to cloud-c...@googlegroups.com

IMO, security in the cloud will be driven by customer demand for "transparency" of the controls employed by the cloud service provider. Today, cloud users do not have full transparency into how security is handled behind the scenes by the provider. IMO, given that providers are currently focused on the SMB market segment, and that SMBs are more interested in economics than in regulatory or compliance concerns, the providers are getting a free ride on this issue. Once the enterprise market becomes viable (or is this a catch-22 situation?), cloud providers will have no choice but to provide transparency as well as the APIs required for security policy and compliance management. Security management will range from identity & access management to data protection in the cloud.

If businesses are forced to do a lot more work (compensating controls) to reduce risk in the cloud, they will have to bake that cost into the economics of cloud computing. This is very similar to the offshoring trend we saw in software development, where economics was the key business driver pushing development to low-cost countries. But businesses soon realized that certain high-value applications (intellectual property) cannot be offshored or outsourced if they are to retain a competitive edge, while commodity business applications can be. IMHO, we'll see a similar situation with cloud computing, where commodity computation and non-core applications will move to the cloud while highly complex applications with extensive security and compliance requirements may still reside within the firewall for the foreseeable future.

Initially, competing cloud services may differentiate themselves on the security and compliance features they offer, but eventually these may become just a checklist item, like the seat belts and airbags that come standard in any automobile.

Here is an interesting article on this subject - http://www.intelligententerprise.com/blog/archives/2008/06/demystifying_cl.html

Cheers

--Subra

Ray Nugent

Jun 23, 2008, 11:43:51 PM6/23/08

to cloud-c...@googlegroups.com

Well it's certainly a great deal less useful...:-)

Chaz.

Jun 24, 2008, 7:16:52 AM6/24/08

to cloud-c...@googlegroups.com

Some people call what "DRM"? We aren't talking about DRM, per se, but
security.

Chuck Wegrzyn

Chaz.

Jun 24, 2008, 7:21:14 AM6/24/08

to cloud-c...@googlegroups.com

Randall, that is very, very true. I'd also like to say that cash has a
value that everyone can agree on - when we see a one dollar bill we can
agree on it being one dollar.

Information is more like "trash" - what one person sees as worthless
garbage, another might see as worth its weight in gold. lol. But security
is different from the item being secured.

No one would trust a bank without security. So the question is: what makes
data secure? When it hits the "cloud" there are more ways to subvert it
than if you maintain possession of it. The amount of subversion is the
"risk", and the risk is what we are talking about.

Chuck Wegrzyn

Chaz.

Jun 24, 2008, 7:25:08 AM6/24/08

to cloud-c...@googlegroups.com

Very true, and depending on the computer(s), maybe more secure. You can't
say data is secure just because it is on a single unnetworked computer.
After all, how much information has been stolen from laptops recently?

Chuck Wegrzyn

pboo...@gmail.com

Jun 24, 2008, 11:25:50 AM6/24/08

to Cloud Computing

I have to agree that this should be a focal point. Brands that have lost
people's personal information were tainted over what was, size-wise, very
little data. A brand is generally worth more than insurance can or will
give credit for. As people care more about their personal information,
and the definition of personal information expands, not having policy
enforcement in the cloud will be a detriment to using it. IMHO

On Jun 23, 3:20 pm, "Khazret Sapenov" <sape...@gmail.com> wrote:

randall

Jun 24, 2008, 2:00:37 PM6/24/08

to Cloud Computing

Touché!

randall

Jun 24, 2008, 2:14:28 PM6/24/08

to Cloud Computing

I want to support what Subra K said about transparency. The lack of
transparency is the very reason we can't determine whether the cloud is
or is not safer than hosting ourselves. Without transparency in the
cloud, we'll never know. It is currently impossible for us to compare
the cloud to our DMZ, and likewise impossible to compare, say, the
security of Amazon's cloud to Mosso's cloud.

I would argue that data may actually have an advantage in a properly
secured DMZ because fewer hackers are targeting your particular DMZ.
There are fewer viruses on Macs than on Windows, not only because Macs
are arguably stronger, but because more effort is expended building
viruses for PCs. We can't discount that the damage to the community
when S3 or FPS is ultimately hacked will be much more significant than
if someone hacks Acme Widget's DMZ, and that is enough motivation for
many attackers.

randall

Jun 24, 2008, 2:40:36 PM6/24/08

to Cloud Computing

Right! When we say, "our cloud is safer than your network," the
customer thinks, "You don't understand the value of my data."

How do we calculate this risk and hedge against it? Insurance, as
someone mentioned earlier, may be the answer, but there isn't yet
enough empirical data for insurance companies to offer credits or
specialized coverage for cloud services, though I suspect it is only a
matter of time. How insurance companies reflect this in their policies
will be the test of our claims.
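One conventional way to put numbers on "calculate this risk and hedge against it" is annualized loss expectancy from classic risk analysis: ALE = SLE × ARO, the cost of a single incident times the expected number of incidents per year. The following Python sketch compares hosting options this way. Every figure and the `HostingOption` class itself are invented for illustration, and the `premium`/`coverage` fields are an assumed simplification of how an insurance policy might pay out.

```python
from dataclasses import dataclass


@dataclass
class HostingOption:
    """Hypothetical cost/risk profile for one hosting choice (all figures invented)."""
    name: str
    annual_cost: float       # hosting and operations per year
    single_loss: float       # SLE: estimated cost of one breach
    annual_rate: float       # ARO: expected breaches per year
    premium: float = 0.0     # yearly insurance premium, if any
    coverage: float = 0.0    # assumed fraction of a loss the insurer pays (0..1)

    def ale(self) -> float:
        """Annualized loss expectancy the business still carries after insurance."""
        return self.single_loss * self.annual_rate * (1.0 - self.coverage)

    def expected_annual_total(self) -> float:
        """Running cost plus premium plus the expected uninsured loss."""
        return self.annual_cost + self.premium + self.ale()


options = [
    HostingOption("in-house DMZ", annual_cost=400_000, single_loss=2_000_000, annual_rate=0.05),
    HostingOption("cloud, uninsured", annual_cost=150_000, single_loss=2_000_000, annual_rate=0.08),
    HostingOption("cloud, insured", annual_cost=150_000, single_loss=2_000_000,
                  annual_rate=0.08, premium=40_000, coverage=0.5),
]

for opt in options:
    print(f"{opt.name:18} ALE={opt.ale():>10,.0f}  total={opt.expected_annual_total():>10,.0f}")
```

As the post says, the hard part is the inputs: without empirical breach data (and with brand damage that insurance may not cover), SLE and ARO are guesses, so the comparison is only as good as those estimates.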

Khazret Sapenov

Jun 24, 2008, 3:31:11 PM6/24/08

to cloud-c...@googlegroups.com

On Mon, Jun 23, 2008 at 12:50 PM, Brian Cinque <brian....@gmail.com> wrote:

What Jim says is interesting as it maps back to impact analysis with regard to security, regardless of corporation size. Each company has to classify its applications/services at some point, and within those classification tiers are implied many different dependent and independent variables. Those variables are too many to list; a few are raised in this thread, but they are not the entirety of the components of app/service classification.
The cloud is going to be no different.

I find this to be true in this discussion as it relates to "cost of downtime". Jim's example of a security firm(s) sourcing out a service based on CAPEX/OPEX expenditures is interesting, but is it valid? What I mean is that in certain industries, like hospitals, if critical services are down then the cost of downtime is death - an extreme example, but true. So my open-ended question is: does Jim's example speak to the 20% of Pareto's law or to the 80%?

Information security and compliance will always be there (again to Jim's point), but the question is who can utilize the benefits of the Cloud. I am in the camp that Chaz raised: you have to be able to trust, but you also have to know what you are entrusting and what the impact of failure is. If the impact of failure exceeds the benefits, then maybe that app/service is in the 20% camp - for now. In the end, utilizing the benefits of the cloud will depend on each corporation's/business's requirements.

...
brian

My estimate of the distribution would be 30/70, which might fit a successor of Pareto, a long-tail type distribution, where I introduce a security-requirements scale from 1 to 10 as just one of several dimensions (like annual revenue, etc.).

This guess is based just on intuition so far, but I think Amazon, SUN, IBM et al. might already have some historical data, optimistically extrapolated to the future, that supports this conjecture.

Such an approximation helps cloud providers position their services, target certain segments of the market, or make other important decisions about shaping their line of business (LoB).

Khazret Sapenov
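Brian's rule (an app/service stays in-house if the impact of its failure exceeds the benefits) and Khazret's 1 to 10 security-requirements scale can be combined into a simple screening pass over an application portfolio. This Python sketch is hypothetical throughout: the `App` fields, the cut-off of 6 on the scale, and the ten sample applications and their dollar figures are invented for illustration and do not reproduce Khazret's 30/70 estimate.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical profile of one application (all values invented)."""
    name: str
    security_level: int    # 1 (minimal) .. 10 (strictest), per Khazret's scale
    failure_impact: float  # estimated cost if the data is lost or exposed
    cloud_benefit: float   # estimated yearly savings from moving it to a cloud


def cloud_candidate(app: App, max_level: int = 6) -> bool:
    """Assumed rule: moderate security needs, and failure impact within the benefit."""
    return app.security_level <= max_level and app.failure_impact <= app.cloud_benefit


portfolio = [
    App("marketing site", 2, 10_000, 50_000),
    App("build/test farm", 1, 5_000, 80_000),
    App("clickstream analytics", 4, 40_000, 120_000),
    App("document archive", 5, 60_000, 90_000),
    App("HR portal", 7, 300_000, 60_000),
    App("internal wiki", 3, 20_000, 30_000),
    App("payment processing", 10, 5_000_000, 200_000),
    App("patient records", 10, 10_000_000, 150_000),
    App("corporate email", 6, 150_000, 100_000),
    App("batch rendering", 2, 5_000, 70_000),
]

candidates = [app.name for app in portfolio if cloud_candidate(app)]
share = len(candidates) / len(portfolio)
print(f"{share:.0%} cloud candidates: {', '.join(candidates)}")
```

Changing `max_level` or the cost estimates shifts the split, which is the kind of segmentation Khazret suggests providers could use when targeting part of the market.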

Popp, Nicolas

Jun 24, 2008, 5:50:20 PM6/24/08

to cloud-c...@googlegroups.com

An alternative to insurance is security accreditation (more of a complement, actually).

You could envision industry-wide security specifications and policies that cloud service providers could adhere to. Cloud providers would then be accredited and regularly audited at different levels by a neutral, trusted entity.

That would create a common foundation for customers to understand the level of protection being enforced, and provide a baseline for trust with customers.

Not too different from things such as PCI compliance standards in the credit card world for example...

Nico
.
Dive into my Blue Ocean: http://blogs.verisign.com/innovation

----- Original Message -----
From: cloud-c...@googlegroups.com <cloud-c...@googlegroups.com>
To: Cloud Computing <cloud-c...@googlegroups.com>
Sent: Tue Jun 24 11:40:36 2008
Subject: Re: Issues of data in the cloud...

Renee Martin

Jun 24, 2008, 6:39:13 PM6/24/08

to cloud-c...@googlegroups.com

Certifications such as SAS 70 Type II compliance, PCI Level 1 Service Provider, and Safe Harbor (EMEA), plus 100% uptime/security SLAs and mitigation plans should there be a breach or downtime.

An interesting article published by the New York Times in April 2008 clearly outlines the confusion in the space:

nytimes/idg/IDG_002570DE00740E180025742400363509.html?ref=technology

RM

Ben Cherian

Jun 24, 2008, 7:59:14 PM6/24/08

to cloud-c...@googlegroups.com

Good point, Nico. I’m sure that as the cloud computing industry matures, we’ll start to see extensions to existing accreditations that cover this paradigm. It’ll take some time, though, since the accreditation organizations are pretty mature and strict in their ratification procedures.

Ben

C. Y.

Jun 25, 2008, 3:19:19 PM6/25/08

to cloud-c...@googlegroups.com

On Tue, Jun 24, 2008 at 1:14 PM, randall <ran...@qrimp.com> wrote:

I want to support what Subra K said about transparency. The lack of
transparency is the very reason we can't determine whether the cloud is
or is not safer than hosting ourselves. Without transparency in the
cloud, we'll never know. It is currently impossible for us to compare
the cloud to our DMZ, and likewise impossible to compare, say, the
security of Amazon's cloud to Mosso's cloud.

And how could this ever be possible? "Transparency" is usually little more than marketing literature when dealing with remote facilities.

cy

Suman Chaudhuri

Jun 25, 2008, 4:08:43 PM6/25/08

to cloud-c...@googlegroups.com

I have been reading this and other related cloud computing threads in this group, and I wanted to pose a question to everyone here. Enterprises, ISVs, etc. are all trying to see how to use cloud computing to lower their infrastructure and maintenance costs, but at the same time they are wary of issues such as data security and privacy and integration challenges (integrating your enterprise apps with cloud-based apps).

If you were partnering with an IT services provider (Accenture, Deloitte, Wipro, etc.), what kinds of services would you want this company to provide in terms of guidance, best practices, methodologies, etc. to help you either build your product in the cloud (if you are an ISV) or integrate a cloud-based product into your IT environment (if you are an enterprise)?

What are the core areas where you would want guidance and help? What would these service offerings look like?

Suman

Date: Wed, 25 Jun 2008 14:19:19 -0500
From: cty...@gmail.com

To: cloud-c...@googlegroups.com
Subject: Re: Issues of data in the cloud...

Suman Chaudhuri

Jun 25, 2008, 4:25:06 PM6/25/08

to cloud-c...@googlegroups.com

I posted this by mistake as a response to another thread, so I thought I'd re-post it here as a brand new topic:

I have been reading this and other related cloud computing threads in this group, and I wanted to pose a question to everyone here. Enterprises, ISVs, etc. are all trying to see how to use cloud computing to lower their infrastructure and maintenance costs, but at the same time they are wary of issues such as data security and privacy and integration challenges (integrating your enterprise apps with cloud-based apps).

If you were partnering with an IT services provider (Accenture, Deloitte, Wipro, etc.), what kinds of services would you want this company to provide in terms of guidance, best practices, methodologies, etc. to help you either build your product in the cloud (if you are an ISV) or integrate a cloud-based product into your IT environment (if you are an enterprise)?

What are the core areas where you would want guidance and help? What would these service offerings look like?

Suman

Alexis Richardson

Jun 25, 2008, 5:11:20 PM6/25/08

to cloud-c...@googlegroups.com

Suman

Surely the first service offering you would want to see is 'cloud
enablement'. Say I have an application - how do I move it to a cloud
deployment and what are the pros and cons of that?

alexis

--
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)

Suman Chaudhuri

Jun 25, 2008, 5:21:45 PM6/25/08

to cloud-c...@googlegroups.com

Alexis,

Thanks for the response. However, what you are describing is very high-level and generic - there are so many things one needs to consider to "move to the cloud": hosting issues, data issues, integration issues, etc. What you mention is also more along the lines of SaaS than just cloud computing. In other words, if you already have a product and want to use the cloud to serve it to consumers, what are your issues? That is a mixture of SaaS issues (multi-tenancy, etc.) and cloud issues.

My intention in this thread is to break down each issue based on whether you are a consumer or provider of the application and see what the real, granular issues are and what sort of offerings would make sense in your case.

Does that make sense?

Suman

> Date: Wed, 25 Jun 2008 22:11:20 +0100
> From: alexis.r...@gmail.com
> To: cloud-c...@googlegroups.com
> Subject: Re: Cloud Computing Service Offerings