You can also look at LucidEra - http://www.lucidera.com/solutions/index.php
--Naren
I am *very* new to this group, but I am really excited by the quality
of postings in the group. I am learning a lot, quickly.
I have a couple of questions. Maybe someone has some answers.
1. I think "data in the cloud" is so far a big obstacle to widespread
adoption of the cloud for large, sensitive and mission-critical
applications (especially for financial organizations). Is someone
thinking of a way to leave the data on the user's premises and do
just the computing in the cloud? Kind of a reverse connection back to
the user datacenter.
That way the conventional data repositories can still be used. The
users will not have to worry about the reliability, availability and
(to a large part) security of the data. We still have to worry about
the security of the data travelling back and forth between the
cloud and the user data center.
This probably is more relevant for medium to large scale users with
"sensitive" data.
Comments? tips?
2. Considering that "cloud computing" is at the beginning of its
adoption curve, the user data center will, for a long time, have a
mixture of its own physical and virtual devices within the datacenter
along with "virtual" datacenters in one or more clouds (maybe
from different vendors).
The user will obviously look for a management portal that seamlessly
crosses the boundaries of physical, virtual and cloud devices (for
discovery and monitoring at the very least).
Is there any talk or thought on standardizing the "cloud management
actions" and "cloud management data" interfaces?
Comments? tips?
Thanks
--utpal
> 1. I think "data in the cloud" is so far a big block to widespread
> adoption and using cloud for large, sensitive and mission critical
> applications (especially for financial organizations). Is someone
> thinking of a way to leave the data within the user-premises and do
> just the computing in the cloud? Kind of a reverse connection back to
> the user datacenter.
>
> That way the conventional data repositories can still be used. The
> users will not have to worry about the reliability, availability and
> (to a large part) security of the data. We still have to worry about
> the security of the data travelling back and forth to and from the
> cloud to the user data center.
>
> This probably is more relevant for medium to large scale users with
> "sensitive" data.
>
> Comments? tips?
I've been processing large historical data sets for a Financial
company I'm consulting with using Cascading/Hadoop on EC2/S3.
The biggest bottleneck has been getting data to the compute
infrastructure.
The obvious pattern is to have datacenter processes push data to S3,
then have the temporary cluster spin up and pull data from S3, do
something interesting, then push the results to S3, notify the
datacenter the job is complete (SQS), have the datacenter pull down
the results from S3.
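The push / process / notify / pull cycle described above can be sketched roughly as follows. This is only an illustration of the control flow: `ObjectStore` is a hypothetical in-memory stand-in for S3, a plain `queue.Queue` stands in for SQS, and the "interesting" work is just a sum. None of these names come from the actual setup.

```python
import queue


class ObjectStore:
    """In-memory stand-in for a bucket store such as S3 (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def datacenter_push(store, key, records):
    # Step 1: the datacenter pushes its input data to the store.
    store.put(key, list(records))


def cluster_job(store, notifications, in_key, out_key):
    # Step 2: the temporary cluster pulls the input, does its work,
    # pushes the result back, and announces completion on the queue.
    records = store.get(in_key)
    result = sum(records)          # placeholder for the real computation
    store.put(out_key, result)
    notifications.put(out_key)     # completion message (SQS stand-in)


def datacenter_collect(store, notifications):
    # Step 3: the datacenter waits for the message and pulls the result.
    out_key = notifications.get(timeout=1)
    return store.get(out_key)


def run_once(records):
    store = ObjectStore()
    notifications = queue.Queue()
    datacenter_push(store, "input/day1", records)
    cluster_job(store, notifications, "input/day1", "output/day1")
    return datacenter_collect(store, notifications)


if __name__ == "__main__":
    print(run_once([1, 2, 3]))
```

Note that every step moves the full data set across the wire at least once, which is where the bottleneck comes from.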
Because of the need to support both well-defined daily processes and
ad-hoc processes, my client's data generally needs to stay on S3.
Having it pulled from a remote datacenter on duplicate runs would be
extraordinarily slow and expensive considering Amazon charges for
bandwidth in and out. Plus, it is a bit cheaper just to keep data on
S3 than to buy a NAS for storage.
That said, with bandwidth being the bottleneck in the face of the
ability to spin up 100 or 1000 nodes to crunch numbers, larger pipes
into a vendor's cloud would be very welcome. Otherwise your cloud
solution is only as fast as getting data in and out of it.
chris
--
Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/
http://www.cascading.org/
> That said, with bandwidth being the bottleneck in the face of the
> ability to spin up 100 or 1000 nodes to crunch numbers
How big are the datasets you're working with? Random or linear access?
Timothy Huber
Strategic Account Development
tim....@metaram.com
cell 310 795.6599
MetaRAM Inc.
181 Metro Drive, Suite 400
San Jose, CA 95110
1. Who will guarantee that the data in S3 is secure from physical and
logical access?
2. Who will guarantee that the data is always available using a
multi-site recovery system (that is what they would have in their own
data center) that meets their RPO (Recovery Point Objective) and RTO
(Recovery Time Objective) guidelines?
Either Amazon or other cloud providers will make these available with
EC2 with S3 (or some other storage mechanism with more robust
security and availability characteristics) or the users will have to
build something similar on their own using EC2 as their basic building
block.
This will be a *very* non-trivial task for any user to do on their own
and they will have to decide whether to put resources into building this
on a cloud or to invest more in their own datacenter.
So I guess a lot will depend on the level of maturity of the clouds.
Not sure if all this work belongs in a mid-layer outside of the
original cloud, leaving the cloud providers just to provide the basic
building blocks.
--utpal
Total data is 100s of GB. Individual workloads are ~10 GB. All linear
(this being Hadoop), but there is much joining, binning, and crunching
between the multiple input datasets (the actual workload translates to
~60 MapReduce jobs, all rendered and managed by Cascading).
So it kinda sucks to have uploads of data to the cluster take longer
than it does to compute on it. Worse since my client then has to fetch
the derived data back.
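To put rough numbers on that imbalance, here is a back-of-the-envelope calculation. The ~10 GB workload size is from the post above; the link speeds are purely assumed for illustration and are not measurements from this setup.

```python
def transfer_seconds(gigabytes: float, megabits_per_second: float) -> float:
    """Time to move `gigabytes` (decimal GB) over a link of the given speed."""
    megabits = gigabytes * 8 * 1000  # 1 GB = 8000 megabits in decimal units
    return megabits / megabits_per_second


if __name__ == "__main__":
    for link in (10, 100, 1000):  # assumed uplink speeds in Mbit/s
        hours = transfer_seconds(10, link) / 3600
        print(f"10 GB at {link} Mbit/s: {hours:.2f} h")
```

At the lower assumed speeds the transfer alone takes hours, regardless of how many nodes are waiting on the other side.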
ckw
I personally would like my application-at-the-edge software to also span
a number of in-the-cloud vendors, so that I don't experience vendor
lock-in problems. In particular, I am concerned that my public-facing
services will be targets of DDoS attacks and that as a result vendors will
consider abruptly discontinuing service.
For these reasons, I have not been able to consider much of what
in-the-cloud providers can offer to date, though I continue to build
proof-of-concept packages in preparation for the point in time when the
industry evolves enough to meet my needs. I am very curious if others have
similar concerns and if plausible solutions are being found...
- Marc
How does the "cloud" protect data going from the owner to the computing
service without being compromised (read that as sniffed)? Will a
computing service in country A have the right to impose restrictions on
data from another country (even if the results of the computing don't
affect the citizens of country A)? And so on.
Chuck Wegrzyn
I agree with your concerns. Thus far I have been using vendors within
single governance regions, and then having a policy engine at my
application layer to govern where data is allowed to be operated upon.
So, EU data stays in the EU for example. As the vendors grow to span
multiple boundaries, if they are not providing programmatic interfaces
to allow application layer control of these issues, I may need to avoid
those vendors.
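A minimal sketch of what such an application-layer policy check might look like. The policy table, region names and function names are all hypothetical; this is not the actual engine, only the deny-by-default idea.

```python
# Hypothetical policy table: data origin -> regions where it may be processed.
POLICY = {
    "eu": {"eu"},          # EU data stays in the EU
    "us": {"us", "eu"},
}


def may_process(data_origin, compute_region, policy=POLICY):
    """Deny by default: unknown origins may not be processed anywhere."""
    return compute_region in policy.get(data_origin, set())


def choose_region(data_origin, available_regions, policy=POLICY):
    """Return the first available region the policy allows, or None."""
    for region in available_regions:
        if may_process(data_origin, region, policy):
            return region
    return None


if __name__ == "__main__":
    print(choose_region("eu", ["us", "eu"]))
```

A check like this only works if the vendor exposes which region a resource lives in, which is the programmatic interface asked for above.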
- Marc
On SaaS wrote:
> Data locality is definitely a huge issue in the cloud. My company works
> with a lot of multinationals with huge data sets in various countries.
> Many countries, especially the EU ones as well as others like Mexico, have
> some fairly strict laws around private data (e.g., data with personal
> info). Some of these multinational companies have to architect
> their on-premise software around these restrictions (e.g., putting
> on-premise software in each country) and restrict the data movement. One
> of them took several months to study the laws and legality of data
> location and movement before implementing their solution.
>
> So the location of the cloud and data is definitely going to be very
> important to these multinationals. That's part of the reason why
> Amazon has an EU cloud and Salesforce is building a cloud in
> Singapore. Some of the countries are also wary of putting any data
> inside the U.S. due to concerns about the Patriot Act. In general the country
> where the data resides has jurisdiction over it.
>
> --
> OnSaaS.net - /Blogging about the SaaS and cloud computing world/
> OnSaaS.info - Providing a continuous stream of SaaS and cloud computing news
> /Follow on http://twitter.com/onsaas, http://friendfeed.com/onsaas/
Chuck
I'd think the approach is to keep the data still and move the computing
to it. The idea is to see the thousands of machines it takes to hold the
petabytes worth of data as the compute cloud. What needs to move to it
is the programs that can process the data. I've been working on this
approach for the last 3 years (Twisted Storage).
Chuck Wegrzyn
So my question is as follows: what makes a good "storage cloud"?
Chuck Wegrzyn
On SaaS wrote:
> That depends on how the cloud is architected, no?
>
> And I would think the cloud providers will have to start answering these
> questions if they want large enterprises to start adopting the
> cloud. There may be no control over which server in the cloud is doing the
> computation, but service providers may provide options to restrict it based
> on geographic domains.
>
> We have quite a few people here from the cloud providers, maybe they can
> share some insight?
>
> thx
>
> On Jun 19, 2008, at 10:44 AM, Stuart Altenhaus wrote:
>
>> I think Chaz is right. There are privacy issues regarding use and
>> exposure of data that vary country by country. If the cloud computes
>> the data, there is no control on where that data is moved for
>> computation, right?
>>
>> R/s,
>> Stu Altenhaus
>>
>> Sent from my Verizon Wireless BlackBerry
>>
>> -----Original Message-----
>> From: "Chaz." <eprpa...@gmail.com <mailto:eprpa...@gmail.com>>
>>
>> Date: Thu, 19 Jun 2008 13:40:20
>> To:cloud-c...@googlegroups.com
>> <mailto:cloud-c...@googlegroups.com>
- Marc
I definitely agree with your point. I can't think of very many
multinationals that would let their data out to wander around. I'd
think they would want to protect their data and move the computing
resources close to it....
Chuck
If it is in the cloud then we are still dealing with security,
availability and recoverability issues (that everyone agrees on).
If it is in the user's data center then how will the computing resources
offered (and controlled by Amazon) be brought to that specific user's
datacenter?
--utpal
If the cloud is a collection of compute resources, and you need to
apply them to lots of your data, you have little choice but to move
your data. You can't move the compute power (unless you order a
shipping container of servers, I guess).
ckw
--
1) Is it possible to have the app run on AWS so that the derived data does not need to traverse back down in real time? That way you could use a lazy download in the background to archive it in their DC while their app accesses the copy in real time on AWS.
2) I've been thinking about the problem of upload times as well (in the context of large DNA data sets). The cost of loading into AWS is not that prohibitive, so if one were to pre-process that data such that it could be uploaded in a bunch of parallel processes to AWS, you could reduce the bottleneck considerably. In theory.
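As a rough sketch of the idea in 2), the data can be cut into fixed-size parts that are uploaded concurrently and reassembled by part number. `upload_part` is a hypothetical callback standing in for whatever transfer call is actually used; a plain dict plays the role of the remote store here, and threads are used rather than separate processes for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int):
    """Cut `data` into consecutive pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_in_parallel(data: bytes, chunk_size: int, upload_part, workers: int = 4):
    """Upload every chunk via `upload_part(part_number, chunk)` concurrently.

    Returns the number of parts so the receiver can reassemble them in order.
    """
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any upload error.
        list(pool.map(lambda item: upload_part(*item), enumerate(chunks)))
    return len(chunks)


if __name__ == "__main__":
    remote = {}  # stand-in for the remote store
    parts = upload_in_parallel(b"ACGT" * 10, 7, remote.__setitem__)
    restored = b"".join(remote[i] for i in range(parts))
    print(parts, restored == b"ACGT" * 10)
```

Whether this helps in practice depends on where the bottleneck sits: parallel streams can make better use of an uplink, but they cannot exceed its capacity.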
Now if you have to build out an autonomic system we will never have
secure cloud computing. No system today is so tight that it can't be
hacked. Just look at all the attempts to protect DVDs or BD disks...
Chuck Wegrzyn
Lynne VanArsdale wrote:
> Just joined cloud-computing and this is the first conversation I've
> received.
>
> A couple of weeks ago I attended Gartner Security where Neil MacDonald
> spoke on "Adaptive Security." In a nutshell, this approach builds a
> resilient system for secure data, acting much like the human immune
> system. It involves whitelisting as the foundation, blacklisting as a
> mid-tier and learned/adaptive mechanisms at the top. In such an
> environment, elements would be "autonomic" and self-managing to a large
> degree, and would share and communicate with other elements to protect
> workloads and information (as opposed to endpoints). There is a lot
> more to this vision, and it is probably a number of years away, but it
> may be a reasonable approach to address the concerns about data security
> being discussed here.
>
> In any case, does anyone know of any product or standards efforts for
> the industry to collaborate on a more cohesive architecture for security
> in the cloud?
>
>
> On 6/19/08, *Chaz.* <eprpa...@gmail.com
> <mailto:eprpa...@gmail.com>> wrote:
>
>
> Jim,
>
> I definitely agree with your point. I can't think of very many
> multinationals that would let their data out to wander around. I'd
> think they would want to protect their data and move the computing
> resources close to it....
>
> Chuck
>
> Jim Peters wrote:
> > Even if the cloud providers come up with excellent answers to the
> > security and reliability questions, who's going to trust them? Credit
> > card numbers are one thing, but cloud data is something else
> entirely.
> >
> > +J
> >
I'm not sure I would agree you have to ship your data to somewhere else.
After all a "cloud data provider" could create just the secure
environment for holding the data and processing it (isn't that really
what S3 is all about?). The only thing the using company needs to do is
write the program and have it installed, more or less automagically, on
the machines that hold the user's data.
Chuck Wegrzyn
I suggest you do the same regarding security, just assume it's a
hostile environment.
The question is, what features of AWS support you in this? Shared
keychains/stores, encrypted volumes, CA, Kerberos, ...? Or will this
always be left to the user? And could you ever really trust those
services the same way you trust them to not lose data?
That said, I am not a security person. What 'cloud security services'
could a provider provide? Or should they even bother?
ckw
You are absolutely correct. Once you have a person involved it can be
compromised. It is all about risk and how to make it so small it would
take an act of God (or a really large budget) to breach it!
Chuck
Perhaps the real solution is to carefully architect your solution to
provide "bulk" services outside the company and leave the critical
things - those that are absolutely vital - to inside the company.
Chuck Wegrzyn
The nice thing is that we can be very flexible with our strategy as it
relates to where the data will reside, how it will be stored, and at
what rate it will be synchronized from the application. We can change it
over time to best fit our application scenario and constraints without
touching our application code.
"And CORBA isn't what I am thinking of, or even HADOOP but things like
JavaSpaces (?)."
JavaSpaces is indeed more relevant for this type of scenario.
What's unique about JavaSpaces IMO is that it can be used for handling
both the compute side and the data storage. The references above show
how you could use space-based storage for handling the data side.
Now that it's stored in the in-memory space cluster you can easily use
the same space to route business logic to those in-memory instances in
parallel. There's a nice way to abstract that from the user using a
remoting abstraction - see more details on how that works here:
http://uri-cohen.blogspot.com/2008/02/openspaces-svf-remoting-on-steroids.html
Nati S.
GigaSpaces
For absolute control, you should stay absolutely in control of the
resources and I don't think Cloud Computing is something for you.
If you want a secured environment you should understand that the
administrator of the resource can read the memory. If you want to
prevent that, then you should look into secured techniques to hide the
memory contents. Ultimately you must get your instructions through a
(virtual) CPU which can also be obscured on what it is doing, but that's
the only threat left in that solution.
Most cloud computing solutions don't allow for that level of security in
the cloud.
The other, more generic good thing to do IMHO is to encrypt all your data
that resides in the cloud, regardless of whether this is somewhere between
absolutely needed and a tiny wish. This would also solve issues where
some transfer protocols are inadvertently unencrypted.
For some bio-medical use-cases in Grid computing (more my main field)
this approach is also being used. Decryption happens just prior to the
actual processing. A more advanced solution is the sliding decrypted
window approach, where the dataset is decrypted per section or block.
CPU usage goes up, but most of the file/database stays encrypted and
the opportunity to snoop around on the resource is very limited.
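A minimal sketch of the per-block idea, to make the "window" concrete. The cipher here is a toy XOR keystream derived from SHA-256, chosen only so the example runs with the standard library; it is an assumption for illustration and must not be used as real encryption. The block size and function names are likewise hypothetical.

```python
import hashlib

BLOCK = 32  # bytes per block; matches the SHA-256 digest length


def _keystream(key: bytes, index: int) -> bytes:
    # Toy keystream: one SHA-256 digest per block index. NOT real crypto.
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()


def _xor_block(key: bytes, index: int, block: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(block, _keystream(key, index)))


def encrypt(key: bytes, plaintext: bytes):
    """Encrypt block by block so each block can be decrypted on its own."""
    return [
        _xor_block(key, i, plaintext[offset:offset + BLOCK])
        for i, offset in enumerate(range(0, len(plaintext), BLOCK))
    ]


def process_windowed(key: bytes, blocks, handle):
    """Decrypt one block at a time; only the current block is in the clear."""
    for i, block in enumerate(blocks):
        clear = _xor_block(key, i, block)
        handle(clear)
        del clear  # drop the reference before moving the window on


if __name__ == "__main__":
    data = bytes(range(100))
    stored = encrypt(b"secret", data)
    seen = []
    process_windowed(b"secret", stored, seen.append)
    print(b"".join(seen) == data)
```

The trade-off is the one named above: more CPU per block in exchange for never holding the whole data set decrypted on the remote resource.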
cheers,
Oscar Koeroo
Hi Chris,
I've looked at this issue quite a bit too. There are a few ways that I think the problem can be "relieved":
1. Don't encourage your clients to download the entire data-set. As long as you provide URLs to the "crunched data", they should only have to pull the data as needed. You can also index the data using SDB - a nice convenience function for searching the data.
2. See if the customers can split the dataset into sub-datasets, each reachable via some sort of URL. When you run your Map job, each of the Map nodes will be responsible for downloading the data from your clients - you might get some benefits from the parallelization of the download.
3. Use S3 for more of a backing store - If you don't have many clients consuming the data, or you think that the clients will download the data soon after the mapreduce job is complete, they can download it directly from the HDFS (http://hadoop.apache.org/core/docs/r0.17.0/hdfs_design.html#Browser+Interface)
I don't know if that helps.
Regards,
Alan Ho
Chris Marino
SnapLogic, Inc.
Really Simple Integration
www.snaplogic.com
650-655-7200
" Question: Is the scenario you described any different from the one you presented at the Spring Experience conference in Miami FL, back in December 2007?"
The principles are quite the same - this pattern becomes even more applicable and relevant in the cloud simply because the cloud emphasizes the implicit contradiction between dynamic scaling and persistency.
It's very easy to add more memory resources dynamically; it's much harder to add persistence resources on demand or dynamically partition them. It's easier to handle continuous high availability with a memory-based cluster than with failure on the persistent layer, etc.
HTH
Nati S.
> Can you clarify this?
The issue I was raising is that in some cases when data is partitioned
adding more data grid "members" may result in re-partitioning of the
cached data. If that is indeed the case then scaling the data grid may
become difficult. If the cluster members need to resync on the new
partitioning scheme, this presents an issue for existing clients, and
may result in delays accessing the cached data.
Simply put, if you have to re-partition, scaling the data-grid is not
that simple.
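To illustrate the concern, here is a small sketch of the worst case: a naive scheme that assigns each key to `hash(key) % members`. This is an assumed strawman for illustration, not a description of how any particular data grid product places data.

```python
import hashlib


def member_for(key: str, members: int) -> int:
    """Naive placement: stable hash of the key modulo the member count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % members


def fraction_moved(keys, before: int, after: int) -> float:
    """Share of keys whose owning member changes when the grid is resized."""
    moved = sum(1 for k in keys if member_for(k, before) != member_for(k, after))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(1000)]
    print(fraction_moved(keys, 4, 5))
```

Going from 4 to 5 members under this scheme relocates roughly four fifths of the keys, which is the kind of re-sync that would stall existing clients.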
> Most datagrid products including ours (IBM WebSphere eXtreme Scale)
> can scale to a couple of thousand JVMs pretty easily and can be
> expanded while they are running transparently. Expanding a grid
> provides it with more CPU, network and memory.
Yep.
>
> Which scalability aspect have you had a problem with?
We have been very successful scaling application services using a
declarative SLA approach. This approach is based on an open source
project called Rio (http://www.rio-project.org).
HTH
Dennis
Hadoop, fantastic idea (it would be great if it worked...)
If you need a production-ready environment in finance, it's a long way off. The distributed caching products, GemFire, Oracle's Coherence and Nati's GigaSpaces, are all miles ahead of Hadoop at this point, some more than others ;-)
On Jun 20, 2008, at 12:11 PM, Alan Ho wrote:
Yeah. The whole issue with SOA as it is today is that you are expected to move the data to where the data is processed. What we really need is the ability to move the processing to where the data is (which is kinda the point of Hadoop).
Cheers,
Alan Ho
----- Original Message ----
From: Chris K Wensel <ch...@wensel.net>
To: cloud-c...@googlegroups.com
Sent: Friday, June 20, 2008 8:15:29 AM
Subject: Re: Business Intelligence solution in Cloud Computing
Thanks for the comments Alan. My previous post should outline how we have parallelized much of the infrastructure to alleviate my client's issues to a reasonable degree. In short, we employed the patterns you suggest, but not the specific technologies, for various reasons. I'd be happy to go into a little more detail offline.
The gist of my comments in this thread is to complain that you unfortunately can't currently scale bandwidth into a cloud to match the relative scale of the compute resources. Many hours to upload, and relatively few minutes to crunch, is an annoying imbalance.
For the analytics-in-the-cloud space, there is an opportunity for a vendor to offer whatever services (many introduced in this thread by others) to alleviate the imbalance.
cheers,
ckw
Range-based partitioning isn't used by anyone except Google (for now).
Here you start with a single partition representing A-Z and fill
it until it splits into A-D and E-Z, and so on, splitting as it fills. New
servers can force a split or migration of an existing range. The cost of
moving is similar but can be higher than with hash-based partitioning:
with hash-based, each partition is independent of all others, but with
range-based, indexes need to be split etc., which makes it more
expensive. Range-based partitioning looks very attractive to me, though,
moving forward with IBM eXtreme Scale's evolution.
But either way, adding servers is handled up to large numbers of
servers using this mechanism, and we haven't seen any issues with it
except for the CPU and network burn on existing servers whose data is
being moved to the new servers; the other servers carry on
independently.
We are working with something conceptually similar to range-based partitioning as well, in our case novel high-dimensional spaces (not yet a public product), and I would like to add a few additional reasons it is a superior mechanism in many use cases relative to simple hash distribution:
1.) Semi-adaptive runtime resource balancing across partitions is trivially possible that helps smooth over hotspots, whether CPU, I/O, storage, etc. This is particularly true if the cloud platform is using data structures that span many nodes for a single end-user instance e.g. a database, as in our case.
2.) Preservation of spatial relationships in the key range e.g. in the Google case you cite above. For some simple cases like session objects there may be no meaningful spatial relationships in the key set worth preserving and hashing is simpler, but for cases like Google (and us) the lack of locality and order in the distribution of keys in the cluster has very negative performance implications.
3.) For more clever clouds, you can start doing cost-based optimizations and computing more efficient and complex access patterns with the metadata when using range partitioning. However, I don't think any current production cloud platform can really take advantage of this so it is not very important at this point (but will be in the future).
Also, I would point out that the network burn can be managed so that the impact is quite modest when repartitioning. There is no requirement to have one partition per physical node as opposed to several, and if you look at e.g. Google's architecture you find that a single partition can be moved across the network in about a second assuming wire speed. There are a lot of advantages to using several smaller logical partitions per physical node, not the least of which is that partitioning events are much smaller, more distributed, and you can keep the physical node hovering closer to max capacity most of the time. Repartitioning should generally be a smaller resource event on the network than e.g. the very regular transparent recovery going on from node failures in the case of a large cluster.
As you can see, range partitioning reflects a different basic use case than hash partitioning. Hash partitioning is simple and effective when access is only going to be affecting one logical node, but range partitioning becomes valuable when a single access may span multiple nodes. A lot of clouds currently use the former model because it is simple and adequate given the current state of what the clouds can do.
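The split-as-it-fills behaviour and the locality benefit discussed in this thread can be sketched in a few lines. This is a single-process toy under assumed names (`RangePartitions`, `capacity`); it says nothing about how Google or any product actually implements it.

```python
import bisect


class RangePartitions:
    """Toy range-partitioned key set that splits a partition when it overfills."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lower_bounds = [""]   # partition i holds keys >= lower_bounds[i]
        self.partitions = [[]]     # each partition is a sorted list of keys

    def _index_for(self, key: str) -> int:
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key: str) -> None:
        i = self._index_for(key)
        part = self.partitions[i]
        bisect.insort(part, key)
        if len(part) > self.capacity:
            # Split in the middle; the upper half becomes a new partition.
            mid = len(part) // 2
            self.partitions[i:i + 1] = [part[:mid], part[mid:]]
            self.lower_bounds.insert(i + 1, part[mid])

    def scan(self, start: str, end: str):
        """Return (keys in [start, end), number of partitions touched)."""
        found, touched = [], 0
        for part in self.partitions[self._index_for(start):]:
            if part and part[0] >= end:
                break
            touched += 1
            found.extend(k for k in part if start <= k < end)
        return found, touched


if __name__ == "__main__":
    rp = RangePartitions(capacity=4)
    for key in "abcdefghij":
        rp.insert(key)
    print(rp.partitions)
    print(rp.scan("c", "f"))
```

Because neighbouring keys stay in neighbouring partitions, a range scan touches only a contiguous few of them; with hash placement the same scan would have to ask every partition.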
Andrew
Chuck Wegrzyn
The cloud is going to be no different. I already know of firms who are
looking at differentiating based on their ability to provide trusted
cloud data services and believe that their economies of scale in
offering those services will be far better than an individual IT shop's
cost structure for meeting data security and compliance requirements.
Most data security breaches come from insiders anyway (either
carelessness or maliciousness).
-- Jim Blakley
The cloud is going to be no different.
giving up physical control to move data into the cloud is inherently less secure than keeping the data behind firewalls in corporate data centers
in my experience, are many times just as -- if not more -- secure than hosting providers in the cloud.
when we completely miss the mark on this point?
I suspect that over a long period of time the cloud providers will simply
get much better at doing this, and so much cheaper that the balance of
this decision will shift for the majority of the market.
There may also be premium-priced verticalized clouds that specialize, for
example, in HIPAA compliance, and give a better level of regulation and
controls while still saving $.
One thing's for certain: none of this is likely to happen overnight.
Ian @
infreemation.net
reuven
--
Reuven Cohen
Founder & Chief Technologist, Enomaly Inc.
www.enomaly.com :: 416 848 6036 x 1
skype: ruv.net // aol: ruv6
blog > www.elasticvapor.com
-
Get Linked in> http://linkedin.com/pub/0/b72/7b4
Data is much, much different. Unlike cash, it can be shared without my
knowing it. As such, there are few guarantees that it remains private.
When it comes right down to it, there are a fixed number of risks in
securing data, both in and out of the corporation. What I think is true
is that there is more inherent risk in farming out the data than in
keeping it inside.
But that said, there is no absolute certainty of security.
Chuck Wegrzyn
Ray Nugent wrote:
> Big Businesses don't keep all their cash in a safe in the basement at
> HQ, they use banks. Why do they need to keep their data on servers in
> the HQ data center rather than a secured third party repository?
>
> Ray
>
> ----- Original Message ----
> From: Ian Rae <ian...@gmail.com>
> To: cloud-c...@googlegroups.com
> Sent: Monday, June 23, 2008 11:52:09 AM
> Subject: Re: Issues of data in the cloud...
>
>
> Which brings up a great point, if I'm a big company and do this inhouse
> and mishandle the data in an equivalently bad way or worse, can I do a
> better job of recovery? Maybe. The balance of risks may be slowly shifting.
>
> I suspect that over a long period of time the cloud providers will simply
> get much better at doing this, and so much cheaper that the balance of
> this decision will shift for the majority of the market.
>
> There may also be premium priced verticalized clouds that specialize, for
> example on HIPAA compliance, and give a better level of regulation and
> controls while still saving $.
>
> One things for certain, none of this is likely to happen overnight.
>
> Ian @
> infreemation.net
>
> On Mon, 23 Jun 2008 14:08:21 -0400, Jim Peters <jazzm...@gmail.com> wrote:
>
> > It all comes down to risk.
> >
> > Say you're a big company with valuable data. You decide to let a 3rd
> > party manage that data for you, because it's significantly cheaper than
> > doing it yourself.
> > If the 3rd party mis-handles your data, you can probably get your money
> > back. But that's it. All you can recover is what you paid them. I can't
> > see a 3rd party being able to cover the true value of the data if it gets
> > mis-handled. So I think big cos. will be very reluctant to go this route.
> >
> > On Sun, Jun 22, 2008 at 10:46 PM, <jurq...@yahoo.com> wrote:
> >
> >>
> >> Somebody may already have replied to this, but I have to ask: how is
> >> maintaining physical control over the hardware that stores the data
> >> guaranteed to be any more secure than having a business whose
> >> lifeblood depends on its providing security to customers do it? I
> >> should note that I think "maintain physical control of [the data]" is
> >> completely nonsensical.
> >>
> >> James
> >>
> >> On Jun 19, 1:18 pm, "Chaz." <eprparad...@gmail.com> wrote:
> >> > Security is a funny issue. Can you ever use a cloud computing complex
> >> > and know for certain your data is protected? I'm betting there is no
> >> > foolproof way that it can be. So the only real way is to fall back to
> >> > what we know today: maintain physical control of it, for once that is
> >> > gone you are on your own, baby.
> >> >
> >> > Chuck Wegrzyn
> >> >
> >> > Utpal Datta wrote:
> >> > > Maybe this is a redundant question: where is this protected data
> >> > > residing? In the cloud or in the user's data center?
> >> >
> >> > > If it is in the cloud then we are still dealing with Security,
> >> > > Availability and Recoverability issues (that everyone agrees on).
> >> >
> >> > > If it is in the user's data center then how will the computing resources
> >> > > offered (and controlled by Amazon) be brought to that specific user's
> >> > > datacenter?
> >> >
> >> > > --utpal
> >> >
> >> > > On Thu, Jun 19, 2008 at 3:10 PM, Chaz. <eprparad...@gmail.com> wrote:
> >> > >> Jim,
> >> >
> >> > >> I definitely agree with your point. I can't think of very many
> >> > >> multinationals that would let their data out to wander around. I'd
> >> > >> think they would want to protect their data and move the computing
> >> > >> resources close to it....
> >> >
> >> > >> Chuck
> >> >
> >> > >> Jim Peters wrote:
> >> > >>> Even if the cloud providers come up with excellent answers to the
> >> > >>> security and reliability questions, who's going to trust them?
> >> > >>> Credit card numbers are one thing, but cloud data is something else
> >> > >>> entirely.
> >> >
> >> > >>> +J
> >> >
> >> > >>> On Thu, Jun 19, 2008 at 10:57 AM, On SaaS <ons...@gmail.com> wrote:
Replication to the cloud might be very simple, depending on volume of
course, but it is possible to keep that very secure.
Oscar
Yes, IMO, both Pareto and the normal technology adoption lifecycle (http://en.wikipedia.org/wiki/Technology_Adoption_LifeCycle) apply. Amazon would be happy to demonstrate that there are plenty of companies at the innovator end of the curve for whom S3 and SimpleDB are more than adequate. Hospitals and financial service firms are very likely to be the laggards in cloud-based storage. (Although you have some innovators even there: http://googlesystem.blogspot.com/2008/02/cleveland-clinic-to-pilot-google-health.html)
There’s a huge range of companies between the innovators and laggards and the solutions will evolve and improve with time to meet most companies’ needs. Each one will make an individual choice around whether keeping the data housed in their own data center or in the cloud is more or less costly or more or less secure.
I can remember a time when many people said they would never trust a web site with their credit card number. Ultimately, convenience and increasing levels of trust in Amazon, PayPal, et al. made the concern moot for the 80%. It would be a mistake to think that there won’t be a PayPal equivalent for cloud-based data storage just because one hasn’t emerged yet.
-- Jim Blakley
Bad metaphor. If the bank loses the cash, the chances are good that BigCo can get it all back.
If some Clouds "R" Us loses (or, worse, exposes) some data, all BigCo can do is get their money back. The damage is likely to be much greater than the piddly-ish amounts that BigCo will have paid Clouds "R" Us. Maybe they can slap Clouds "R" Us with a lawsuit that would put Clouds "R" Us out of business, but that doesn't remedy the situation.
+J
Think of how simple it would be to write a wrapper DLL or library that
looks and acts like an encryption library but also diverts the contents.
It would be very, very simple to do this.
Yes, it is possible to have this same thing happen in a corporation
under my control, but I can audit for it or lock the system down (well,
almost). But at a remote site holding my data, how can I be sure?
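To make the concern concrete, here is a toy sketch of the kind of wrapper described above. The encrypt function is a stand-in XOR routine (an assumption for illustration, not real encryption and not any real library's API), and DivertingWrapper is a hypothetical name. The point it shows is that the caller gets byte-for-byte the same result either way, so checking outputs alone cannot reveal the diversion.

```python
def encrypt(data, key):
    """Stand-in for a real library call: toy XOR, NOT real encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class DivertingWrapper:
    """Offers the same encrypt() call but quietly keeps every plaintext."""

    def __init__(self, real_encrypt):
        self._real_encrypt = real_encrypt
        self.captured = []  # copies of the plaintext the caller never sees

    def encrypt(self, data, key):
        self.captured.append(data)             # divert a copy
        return self._real_encrypt(data, key)   # then behave normally


wrapper = DivertingWrapper(encrypt)
direct = encrypt(b"account 1234", b"k3y")
wrapped = wrapper.encrypt(b"account 1234", b"k3y")
```

Here direct and wrapped are identical, while wrapper.captured holds the plaintext. Detecting this takes inspection of what code is actually loaded on the host, which is the kind of audit that is hard to perform on a machine someone else controls.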
Chuck Wegrzyn
Big Businesses don't keep all their cash in a safe in the basement at HQ, they use banks. Why do they need to keep their data on servers in the HQ data center rather than a secured third party repository?
Pratap
Chuck Wegrzyn
Information is more like "trash" - what one person sees as garbage and
worthless might be worth its weight in gold to another. lol. But security
is different from the item being secured.
No one would trust a bank without security. So the point is: what makes
data secure? When it hits the "cloud" there are more ways to subvert it
than if you maintain possession of it. The amount of subversion is the
"risk", and the risk is what we are talking about.
Chuck Wegrzyn
Chuck Wegrzyn
What Jim says is interesting, as it maps back to impact analysis with regard to security, regardless of corporation size. Each company has to classify its applications/services at some point, and those classification tiers imply many different dependent and independent variables. The variables are too many to list; a few are raised in this thread, but they are not the entirety of what goes into app/service classification.
The cloud is going to be no different.
I find this to be true as it relates to "cost of downtime". Jim's example of a security firm(s) sourcing out a service based on CAPEX/OPEX expenditures is interesting, but is it valid? What I mean is that in certain industries, like hospitals, if critical services are down then the cost of downtime is death - an extreme example, but true. So my open-ended question is: does Jim's example speak to the 20% of the Pareto principle or to the 80%?
Information security and compliance will always be there (again to Jim's point), but the question is who can utilize the benefits of the Cloud? I am of the camp that Chaz raised: you have to be able to trust, but you also have to know what you're trusting out and what the impact of failure is. If the impact of failure exceeds the benefits, then maybe that app/service is in the 20% camp - for now. In the end, utilizing the benefits of the cloud will depend on each corporation's business requirements.
...
brian
An alternative to insurance is security accreditation (more of a complement, actually).
You could envision industry-wide security specifications and policies that cloud service providers could adhere to. Cloud providers would then be accredited and regularly audited for different levels by a neutral trusted entity.
That would create a common foundation for customers to understand the level of protections that are being enforced and provide a baseline for trust with customers.
Not too different from things such as PCI compliance standards in the credit card world for example...
Nico
.
Dive into my Blue Ocean: http://blogs.verisign.com/innovation
----- Original Message -----
From: cloud-c...@googlegroups.com <cloud-c...@googlegroups.com>
To: Cloud Computing <cloud-c...@googlegroups.com>
Sent: Tue Jun 24 11:40:36 2008
Subject: Re: Issues of data in the cloud...
Certifications such as SAS 70 Type II compliance, PCI Level 1 Service Provider, Safe Harbor (EMEA), and 100% uptime/security SLAs, along with mitigation plans should there be a breach or downtime.
An interesting article published in 4/2008 by the New York Times clearly outlines the confusion in the space:
nytimes/idg/IDG_002570DE00740E180025742400363509.html?ref=technology
RM
"
From: cloud-c...@googlegroups.com [mailto:cloud-c...@googlegroups.com] On Behalf Of Popp, Nicolas
Sent: Tuesday, June 24, 2008 2:50 PM
To: cloud-c...@googlegroups.com
Subject: Re: Issues of data in the cloud...
An alternative to insurance is security accreditation (more of a complement, actually).
You could envision industry wide security specifications and policies that
cloud service providers could adhere to. Cloud providers would then be
accredited and regularly audited for different levels by a neutral trusted
entity.
That would create a common foundation for customers to understand the level of
protections that are being enforced and provide a base line for trust with
customers..
Not too different from things such as PCI compliance standards in the credit
card world for example...
Nico
.
Dive into my Blue Ocean: blogs.verisign/innovation
Good point Nico. I’m sure as the cloud computing industry matures, we’ll start to see extensions to existing accreditations that will include this paradigm. It’ll take some time though since the accreditation organizations are pretty mature and strict in their ratification procedures.
Ben
I want to support what Subra K said about transparency. The lack of
transparency is the very reason we can't determine if the cloud is or
is not safer than hosting ourselves. Without transparency in the
cloud, we'll never know. It is currently impossible for us to compare
the cloud to our DMZ, and likewise impossible to compare, say,
the security of Amazon's cloud to Mosso's cloud.
Date: Wed, 25 Jun 2008 14:19:19 -0500
From: cty...@gmail.com
To: cloud-c...@googlegroups.com
Subject: Re: Issues of data in the cloud...
Surely the first service offering you would want to see is 'cloud
enablement'. Say I have an application - how do I move it to a cloud
deployment and what are the pros and cons of that?
alexis
--
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)