1 application, multiple datastores

GTako

Dec 19, 2008, 10:05:52 AM

to Google App Engine

Hi, is it possible to maintain multiple datastores under one
application, so that each datastore behaves as if it belonged to a
different App Engine account?
For example: I have a web application that should serve two companies,
A and B. I would like to open a Google App Engine account for the web
application files. The datastores for A and B could be two different
deployments under the same App Engine account or under separate
accounts. Now assume I have N companies. What should I do?
The reason for the separation is that I don't want the datastores to be
dependent on each other and under the same account in case something
happens. Please advise.

Bill

Dec 19, 2008, 12:47:24 PM

to Google App Engine

It's currently not possible to address multiple datastores. Just
looking at the API, it looks like addressing datastores should be
possible because the keys include an app name, etc, but the App Engine
team has said this feature is not coming any time soon. Cross-app
datastore queries complicate the business model when you offer free
apps. I think, though, that this is an important feature and should
be supported under the pay-as-you-go option, i.e., if you want your
datastore to be available cross-app, you elect to forfeit your free
quota.

Feel free to star this enhancement request:
http://code.google.com/p/googleappengine/issues/detail?id=106

-Bill

Ben Bishop

Dec 20, 2008, 12:17:43 AM

to Google App Engine

Not sure what you mean by "in case something happens" - your app and
its datastore are served by the same network of servers that serve
other apps, so separate accounts won't help (unless you're going
against the Terms of Service, running the risk of having an account
banned).

One App Engine account can have 10 apps, each with its own datastore
and quota. You could deploy a single app's codebase to multiple app
slots, simply by changing the app name in the app.yaml for each
instance. That way you could test on a production "test" app or one of
your client apps before rolling out updates to your other client apps.

You still maintain a single codebase, each client app has its own
datastore, and you can control updates.
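Ben's one-codebase, many-slots setup can be sketched as a small script that renders one app.yaml per client slot. This is a minimal sketch, assuming the app.yaml layout of the 2008 Python SDK, where the `application:` line names the target app; the client app IDs are hypothetical, and the 10-app cap is the per-account limit mentioned above rather than anything the SDK enforces here.

```python
# Hypothetical app IDs: one App Engine app slot per client, plus a test slot.
CLIENT_APP_IDS = ["acme-test", "acme-client-a", "acme-client-b"]
MAX_APPS_PER_ACCOUNT = 10  # per-account limit cited in this thread (Dec 2008)

# Everything except the `application:` line is shared by all deployments.
APP_YAML_TEMPLATE = """application: {app_id}
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: main.py
"""


def render_app_yaml(app_id: str) -> str:
    """Return app.yaml text that points the shared codebase at `app_id`."""
    return APP_YAML_TEMPLATE.format(app_id=app_id)


def render_all(app_ids: list[str]) -> dict[str, str]:
    """Render one app.yaml per app slot, refusing to exceed the account limit."""
    if len(app_ids) > MAX_APPS_PER_ACCOUNT:
        raise ValueError(
            f"{len(app_ids)} apps requested; an account holds at most "
            f"{MAX_APPS_PER_ACCOUNT}"
        )
    return {app_id: render_app_yaml(app_id) for app_id in app_ids}


if __name__ == "__main__":
    for app_id, text in render_all(CLIENT_APP_IDS).items():
        print(f"--- {app_id} ---")
        print(text)
```

Rendering the test slot first lets you try an update on `acme-test` before pushing the same code to the client slots.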

hawkett

Dec 20, 2008, 8:09:48 AM

to Google App Engine

This is a required feature for a commercial SaaS/PaaS offering, and is
not the same as the issue Bill raised in the previous entry (Issue 106).
This discussion can help you understand why -

http://blogs.zdnet.com/service-oriented/?p=1236

as can bugs like this

http://forum.assembla.com/forums/3/topics/256

We need it to be as close to impossible for one customer's data to be
made available to another customer, without having to deploy a new
instance of the application.

Let's call it data segregation. A concept of 'virtual instances'
would be a possible approach - so we can aggregate billing & quota
stats across multiple instances, and also identify individual instance
billing and quota.

Use Case:
1. Customer comes to my site
2. Clicks the 'Sign up now' button
3. Enters their details
4. Starts using the system

You can't get a more 'core' use-case than that for a SaaS/PaaS
platform. Notice there is no requirement to deploy a new version of
the app for this customer. The system spawns a virtual instance of
the app - or at least allows mapping a single datastore partition to
the authenticated entity. You could extend it by allowing multiple
datastores per authenticated entity and choosing the appropriate one
at authentication time.

The key requirement is that we can on-board a customer without manual
intervention, and accurately understand a single customer's usage
profile. Data corruption for one customer does not equal data
corruption for another customer.
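The "virtual instance" idea can be sketched in-process: sign-up creates a partition with its own usage counter, and application code only ever receives a handle bound to one partition. This is a hypothetical illustration (the `Platform`, `Partition` and `ScopedStore` names are invented, and a dict stands in for the datastore), not an App Engine API. Plain Python cannot truly hide `ScopedStore._p`, which is exactly why the boundary is wanted below the application layer.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One customer's isolated data space plus its own usage counter."""
    tenant_id: str
    entities: dict = field(default_factory=dict)
    ops: int = 0  # per-customer usage, for billing and quota


class Platform:
    """Hypothetical platform layer: owns every partition, hands out one at a time."""

    def __init__(self):
        self._partitions = {}
        self._user_tenant = {}

    def sign_up(self, user: str, tenant_id: str) -> None:
        # The 'Sign up now' use case: no redeploy, just a new partition.
        if tenant_id in self._partitions:
            raise ValueError(f"tenant {tenant_id!r} already exists")
        self._partitions[tenant_id] = Partition(tenant_id)
        self._user_tenant[user] = tenant_id

    def open_for(self, user: str) -> "ScopedStore":
        # The platform, not app code, maps the authenticated user to a partition.
        return ScopedStore(self._partitions[self._user_tenant[user]])

    def usage(self) -> dict:
        # Aggregate view for the operator; each customer's figure stays separate.
        return {t: p.ops for t, p in self._partitions.items()}


class ScopedStore:
    """The only handle app code sees; its methods take no tenant argument."""

    def __init__(self, partition: Partition):
        self._p = partition

    def put(self, key: str, value) -> None:
        self._p.ops += 1
        self._p.entities[key] = value

    def get(self, key: str):
        self._p.ops += 1
        return self._p.entities.get(key)

    def all(self) -> list:
        self._p.ops += 1
        return list(self._p.entities.values())
```

Because no `ScopedStore` method accepts a tenant ID, an application bug can at worst mishandle the current customer's data; it has no call through which to name another customer.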

This feature is in some ways the *opposite* of the feature request
identified by the previous poster - we *do not* want to be able to
access data in another partition - even if we tried to, and especially
via a bug in our code.

Here it is, please star it :)

http://code.google.com/p/googleappengine/issues/detail?id=945

Any chance someone at Google has something to say about it?

Thanks,

Colin

Roberto Saccon

Dec 20, 2008, 10:21:57 AM

to Google App Engine

You can just use separate tables for each customer. I haven't tried it
myself yet, so I don't know all the problems you will have to deal
with. Of course, then the dynamic handling of class names for the
models gets more complex (and I am not aware of any existing framework
for that) and porting existing python web apps to appengine also gets
much more complicated.

An easier approach is just to prepend keys with a customer identifier,
but then you still have the possibility to query on all customers, and
if things go wrong, possibly breaking the segregation.
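The key-prefix approach, and the way it can leak, might look like the sketch below. It assumes a single dict standing in for a shared datastore, and all names are hypothetical; the point is that `query_all` remains expressible, so segregation depends on every query remembering the prefix filter.

```python
SEP = ":"  # separator between customer id and the entity's own key


def make_key(customer_id: str, local_key: str) -> str:
    """Prefix every key with its customer id."""
    if SEP in customer_id:
        # Otherwise customer "a" could match keys of customer "a:x".
        raise ValueError("customer id must not contain the separator")
    return f"{customer_id}{SEP}{local_key}"


class SharedStore:
    """One shared data space holding every customer's rows."""

    def __init__(self):
        self.rows = {}

    def put(self, customer_id: str, local_key: str, value) -> None:
        self.rows[make_key(customer_id, local_key)] = value

    def query_for(self, customer_id: str) -> list:
        """Correct: only rows whose key carries this customer's prefix."""
        prefix = customer_id + SEP
        return [v for k, v in self.rows.items() if k.startswith(prefix)]

    def query_all(self) -> list:
        """Still possible: nothing stops code from skipping the prefix filter."""
        return list(self.rows.values())
```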

I do not know how Google has solved this problem at "Google Apps for
your domain", but of course would love to hear how they have done it.

regards
Roberto

Andy Freeman

Dec 20, 2008, 7:30:42 PM

to Google App Engine

Neither of the cited discussions nor your comments explain why it's
different from Bill's "access to separate datastore" request. In
fact, his request is essentially "at least allows mapping a single
datastore partition to the authenticated entity".

There are some issues with accounting, but if your app can do its
accounting in the user's datastore, you get that too.

hawkett

Dec 20, 2008, 10:30:35 PM

to Google App Engine

Andy - they are essentially mutually exclusive. One suggests it
should be impossible for the same piece of code to access separate
datastore instances, the other suggests that this is a desirable
feature. I don't see how you consider them the same - are you saying
that you can't see how the cited bug is caused by multiple customers
sharing the same data space? I don't understand your perspective -
the difference seems utterly obvious to me.

I *can* see that depending on the use case, one or the other would be
good. In most cases I would say access between different customer
data spaces is better modelled through an API accessible by HTTP.

Perhaps you have a different use case where you have the same app
deployed multiple times and do not have the customer data segregation
issue, but that is not what the original poster is talking about. The
original poster is *clearly* and *unambiguously* talking about
avoiding bugs like the one cited, and doing so through a low level
data partition.

Andy Freeman

Dec 21, 2008, 12:55:42 PM

to Google App Engine

> One suggests it
> should be impossible for the same piece of code to access separate
> datastore instances, the other suggests that this is a desirable
> feature. I don't see how you consider them the same - are you saying
> that you can't see how the cited bug is caused by multiple customers
> sharing the same data space?

Right now, separate applications have separate code and separate
datastores. If management issues are the only obstacle to using
separate applications for different users, that tells us that separate
datastores do not share the same data space for these purposes.

Yes, there is the issue that application code has to manage the
customer-specific datastores, but if multiple customers are hosted on
the same hardware, someone's code has to do that work and it's unclear
why application code can't be part of that process. If the response
is that application code isn't trusted by customers to maintain
separation, I'm going to ask how you do maintenance and fixes on their
behalf.

Note that customers don't write application code in this model,
whether they use separate applications or one that uses customer-
specific datastores.

Here's how it would work. Customer accesses system, system figures
out which datastore to use, system acts upon datastore on customer's
behalf using application code.

Note that this is exactly the same way that any scheme with shared
hardware would accomplish the same separation. The only difference is
whether the "figure out" is done by Google or by you.
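The three steps above (identify the customer, pick the datastore, act only on it) can be sketched with the routing done in application code. The `Datastore` stand-in, the user-to-datastore table and the function names are hypothetical; under the platform-level proposal earlier in the thread, the `datastore_for` step would move out of the application.

```python
class Datastore:
    """Stand-in for one customer-specific datastore (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.entities = {}


DATASTORES = {"company-a": Datastore("company-a"), "company-b": Datastore("company-b")}
USER_TO_DATASTORE = {"alice@a.example": "company-a", "bob@b.example": "company-b"}


class NotAuthorized(Exception):
    pass


def datastore_for(user: str) -> Datastore:
    """Step 2: figure out which datastore this customer's request may use."""
    try:
        return DATASTORES[USER_TO_DATASTORE[user]]
    except KeyError:
        raise NotAuthorized(user) from None


def handle_request(user: str, key: str, value=None):
    """Step 3: act on exactly one datastore on the customer's behalf."""
    ds = datastore_for(user)
    if value is not None:
        ds.entities[key] = value
    return ds.entities.get(key)
```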

hawkett

Dec 21, 2008, 5:13:40 PM

to Google App Engine

> Yes, there is the issue that application code has to manage the
> customer-specific datastores, but if multiple customers are hosted on
> the same hardware, someone's code has to do that work and it's unclear
> why application code can't be part of that process. If the response
> is that application code isn't trusted by customers to maintain
> separation, I'm going to ask how you do maintenance and fixes on their
> behalf.

If data segregation is a fundamental feature of the platform, then it
is inherently more trustable than N pieces of application code all
attempting the same thing. Me saying 'My code will keep your data
private' carries nothing like the weight that Google saying 'It is not
possible to run a query across two data stores' does. I would only
need to say 'Your data will be stored in a separate partition', and
that has tangible meaning to the customer from a data security
perspective. They are then placing their trust more in Google for
this feature than in my application.

From a maintenance, reliability, trustability, transparency etc.
perspective, moving a common feature (especially a security feature)
from the application layer to platform layer is a major advantage, and
something a good architecture should always try to achieve.

I want as little application code as possible to express my
application. This is already one of the key wins of the GAE platform,
and moving something as fundamental as data partitioning out of the
application platform will enhance this capability.

hawkett

Dec 21, 2008, 5:16:18 PM

to Google App Engine

Apologies - the following fragment

'...moving something as fundamental as data partitioning out of the
application platform will enhance this capability.'

should read

'...moving something as fundamental as data partitioning out of the
application layer will enhance this capability.'

Andy Freeman

Dec 21, 2008, 5:49:04 PM

to Google App Engine

As I promised, now I'm going to ask how you plan to do maintenance and
fixes on behalf of your customers if you can't get to their data.

If you have access to the customer's data, they're trusting your code
and Google is not protecting their data.

hawkett

Dec 21, 2008, 6:15:32 PM

to Google App Engine

Via the admin console. Google provides this application code, and it
is common - part of the platform offering. This is one possibility.
Another is that an admin user for that customer is made available to
you for administration purposes. You could initialise the customer
data space with this user profile. It may depend on how you map the
authenticated entity to logical identities in your application.
Whichever, you do not have application code capable of querying across
customer data stores, because the platform does not allow it.

Andy Freeman

Dec 21, 2008, 6:53:15 PM

to Google App Engine

The distinction between "application code that can access multiple
datastores" and "code that can access multiple datastores" seems
strained at best.

If there's code that can get to a user's data (and both the admin
console and an admin user are code that can get to the user's data),
does it really matter what you call it?

hawkett

Dec 21, 2008, 7:13:32 PM

to Google App Engine

Who are you quoting?

The Google admin console should not be capable of querying across
multiple customer data stores. I repeat - application code can not
execute a query across multiple customer data stores - did I offer a
distinction somewhere? Admin console *would* allow you to run queries
against each of your customer data stores in isolation. I expect it
would use a common, non-public, platform API (i.e. making data
security part of the platform) to access the logical partitions.

What is your use-case? You use the example of maintenance and fixes
on behalf of customers - when would that require querying across two
customers' data stores? It's a recipe for disaster.

Andy Freeman

Dec 21, 2008, 10:03:29 PM

to Google App Engine

I'm paraphrasing you. You've written repeatedly that a feature that
allows an application to choose the datastore on which it operates can
not be used for your purposes. The argument appears to be that an
application that uses such a feature can theoretically access multiple
datastores and is therefore unacceptable, even if that application is
written so it validates the user and then chooses which datastore to
access and only accesses one datastore after doing so.

However, you're happy if a user's data can be accessed through a
google admin console or via an admin user.

The reason that I find that distinction strained is that GAE
applications and the google admin console can be driven
programmatically. As a result, one can easily write code using those
facilities that simultaneously accesses multiple datastores, which is
your reason for rejecting the "choose which datastore to access"
feature.

> You use the example of maintenance and fixes
> on behalf of customers - when would that require querying across two
> customer's data stores?

I never said or implied that it did.

hawkett

Dec 22, 2008, 7:25:45 AM

to Google App Engine

> > You use the example of maintenance and fixes
> > on behalf of customers - when would that require querying across two
> > customer's data stores?
>
> I never said or implied that it did.

Issue 106 proposes '...cross app queries using the db APIs only' -
which to me means you can easily introduce a bug like the one
originally posted - i.e. querying across two customers' data stores.
Apologies if I understood your responses to be in support of this
approach when they were not. Perhaps you could elaborate your use
case in a little more detail.

Are you ok with the constraint that a query can not be run across
multiple data stores? If we can agree on that, then I'd say we are
doing pretty well.

For accessing another application's data store from your code, I would
(and have) recommended exposing an API that you can access via HTTP.
I believe this is what Google has suggested in this post

http://groups.google.com/group/google-appengine/browse_thread/thread/12eb676e98a25293/f5cfaad4e0d79ac8

which is quoted in Issue 106.
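An HTTP API of the kind recommended here could be as small as one WSGI callable that exposes read-only JSON, so other apps never touch the datastore directly. This is a sketch under stated assumptions: the `/orders/<id>` path, the bearer-token check and the `ORDERS` data are hypothetical, and the `call` helper only drives the app in-process using the standard library's `wsgiref`.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical data owned by this app; other apps see it only through HTTP.
ORDERS = {"1001": {"item": "widget", "qty": 3}}
API_TOKENS = {"secret-token-for-app-b"}  # hypothetical shared secret per caller


def api_app(environ, start_response):
    """Minimal read-only JSON endpoint: GET /orders/<id>."""
    def respond(status, body):
        payload = json.dumps(body).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    if environ.get("HTTP_AUTHORIZATION") not in {f"Bearer {t}" for t in API_TOKENS}:
        return respond("401 Unauthorized", {"error": "unauthorized"})
    if environ["REQUEST_METHOD"] != "GET":
        return respond("405 Method Not Allowed", {"error": "read-only"})
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "orders" and parts[1] in ORDERS:
        return respond("200 OK", ORDERS[parts[1]])
    return respond("404 Not Found", {"error": "not found"})


def call(path, token=None):
    """Drive api_app in-process and return (status, decoded JSON body)."""
    environ = {"PATH_INFO": path}
    if token:
        environ["HTTP_AUTHORIZATION"] = f"Bearer {token}"
    setup_testing_defaults(environ)  # fills REQUEST_METHOD=GET, wsgi.* keys, etc.
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(api_app(environ, start_response))
    return captured["status"], json.loads(body)
```

The calling app gets only what the endpoint chooses to expose, which keeps the datastore boundary intact on both sides.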

If you do have a use case where you do want/need to run queries across
customer data stores, then I would have that customer data in the same
data store - i.e. what do you need the partition for in the first
place?

Unfortunately the idea of a data partition and an application
partition are the same thing at the moment with GAE, so perhaps you
need the partition for quota and billing purposes, which forces you to
have separate data stores when you don't want them. In that case I
would raise a feature request for multiple applications to be able to
share a single data store - would this satisfy what you are trying to
achieve?

Andy Freeman

Dec 22, 2008, 11:47:05 AM

to Google App Engine

> Are you ok with the constraint that a query can not be run across
> multiple data stores? If we can agree on that, then I'd say we are
> doing pretty well.

I'm okay with that constraint. My point is that if the application
has an admin console or an admin user, one can write a query that runs
across multiple datastores by writing code that accesses said
datastores through their admin consoles and/or users.

No, such a query doesn't run in the application itself. However, a
query in an application that validates the user, determines which
datastore to use, and then runs all queries within that datastore also
doesn't access multiple datastores even if it does use an API feature
that could be used to access multiple datastores if said application
were written differently.

I still have no interest in running a query across multiple datastores
and have never suggested otherwise.

I'm trying to understand why a feature that lets the application
programmer determine which datastore to use is an unacceptable way to
support "one code base, customer-specific datastores" if it's okay to
have an admin console and/or applications that have an admin user.

Yes, it's convenient to have google manage all login stuff, but that
means that you don't have any control. If they're your customers....

hawkett

Dec 22, 2008, 1:21:32 PM

to Google App Engine

> I'm okay with that constraint. My point is that if the application
> has an admin console or an admin user, one can write a query that runs
> across multiple datastores by writing code that accesses said
> datastores through their admin consoles and/or users.

For the admin console, I'm saying you can only use this feature to run
against each datastore in isolation. Pick the datastore, run the
query. It's fair to say the admin console security model is another
problem that GAE needs to sort out (e.g.
http://code.google.com/p/googleappengine/issues/detail?id=91), but I
would hope that when it is sorted out, I can assign admin rights on
different data stores to different users in my organisation.

For the admin user option, I am expecting that the admin user is
unique to each data store, not one admin user for all customers. The
picture I am painting is that you administer your customer data
instances individually, not as an aggregate.

I want to be able to make a statement like this about my application
running on GAE (note especially the data security section at the
bottom)

http://www.rallydev.com/products/deployment_solutions/security/

And I can't do that if administrators can run ad-hoc unsecured queries
across customer data stores. (well, maybe they are secured, but only
by your application code)

And finally - I am looking for features that allow me to give my
customers confidence, not erode it. Saying that their data is
partitioned from other customers' data achieves that goal. That doesn't
mean their data is perfectly safe - there would be any number of other
means by which their data could be exposed to their competitors, but I
can guarantee them that their business plan is not going to
suddenly appear on the welcome screen of a competitor due to a bug in
my application code. The cited bug is a perfect example of this sort
of thing actually happening, and of a situation that would be closed
off with effective data partitioning.

Do you agree that the cited bug would not occur with strict data
partitioning, and could occur if issue 106 was actioned? If you are
looking for a distinction, then this is it. To be perfectly clear, I
see this bug as an example of multiple customers having their data
exposed to multiple other customers - this is not a bug that would
occur by someone making a mistake in admin console (when you can only
query customer datastores in isolation).

It seems to me you are saying that if there is *any* mechanism that
could compromise customer data, then why bother worrying about it?

There is a *lot* of work for GAE to do to get to the point where an
app on their infrastructure can make a claim like that - e.g. I can't
believe only 6 people have starred this issue for example -

http://code.google.com/p/googleappengine/issues/detail?id=501

I suspect it is because people only look at the first page of issues -
which completely debunks the idea that Google should be using stars in
its issues list to prioritise its work schedule.

Andy Freeman

Dec 22, 2008, 2:30:56 PM

to Google App Engine

> For the admin console, I'm saying you can only use this feature to run
> against each datastore in isolation.

My point is that that's not true. If I have access to multiple admin
consoles (for maintenance reasons), I can combine the results that I
get from each of the consoles, effectively giving me the ability to
query against multiple datastores. I can do this with a program that
"runs" the admin consoles or I can do it by hand.
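The argument above can be shown in a few lines: each query below runs against one datastore only, yet a trivial loop merges the answers into a cross-customer result. The in-memory lists are hypothetical stand-ins for what each admin console would return; no admin console API is implied (a later reply asks whether one exists).

```python
# Hypothetical per-customer results, each obtainable only in isolation.
CONSOLE_A = [{"customer": "a", "deal": "Acme merger", "value": 5}]
CONSOLE_B = [{"customer": "b", "deal": "Beta buyout", "value": 7}]


def query_one(datastore: list, min_value: int) -> list:
    """What a single admin console allows: a query against one datastore."""
    return [row for row in datastore if row["value"] >= min_value]


def query_across(datastores: list, min_value: int) -> list:
    """Merge per-datastore answers; each call stays isolated, the result does not."""
    merged = []
    for ds in datastores:
        merged.extend(query_one(ds, min_value))
    return sorted(merged, key=lambda row: row["value"], reverse=True)
```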

> And I can't do that if administrators can run ad-hoc unsecured queries
> across customer data stores. (well, maybe they are secured, but only
> by your application code)

Since the console is application code....

> Do you agree that the cited bug would not occur with strict data
> partitioning, and could occur if issue 106 was actioned?

No, I don't agree. Even if we ignore the admin console hole, "strict
data partitioning" is a fantasy in an environment where data lives on
the same hardware. The google code for handling multiple datastores
could go wonky. Or, their user login code could do the wrong thing.

Since the risk of your login code doing the wrong thing is
unacceptable, it's unclear why the risk of their code doing the wrong
thing is any more acceptable.

> And finally - I am looking for features that allow me to give my
> customers confidence, not erode it.

That's nice, but the feature in question doesn't affect the real
security of your customer's data.

If multiple customers have data on the same piece of hardware, some
code has to manage the separation. If it's unacceptable for your code
to do so....

> It seems to me you are saying that if there is *any* mechanism that
> could compromise customer data, then why bother worrying about it?

Not at all. I'm saying that if a given mechanism is an unacceptable
risk under one name, it's an unacceptable risk under all names. I'm
also saying that if you're putting in a screen door (admin consoles),
it's somewhat silly to worry about weatherstripping said door.

hawkett

Dec 22, 2008, 5:03:40 PM

to Google App Engine

> No, I don't agree. Even if we ignore the admin console hole, "strict
> data partitioning" is a fantasy in an environment where data lives on
> the same hardware. The google code for handling multiple datastores
> could go wonky. Or, their user login code could do the wrong thing.

I think you are making a number of mistakes -

1. Believing that the risk profile of platform code is the same as
your application code (and the 1000's of other developers that roll
their own data security solution because it isn't part of the
platform). If Google offers it as part of the platform, then it has
been tested by those 1000's of developers, by their customers and by
Google as a major part of a strategic platform offering by a company
with enormous resources. You state - 'Since the risk of your login
code doing the wrong thing is unacceptable, it's unclear why the risk
of their code doing the wrong thing is any more acceptable' - it is
*absolutely* clear that the risks are not even in the same ballpark.

2. Not seeing it from the customer's perspective. What they see is
that every app on GAE is a roll your own data security effort - what a
nightmare - how are they to tell which app was written well and which
wasn't? How would they even begin to assess the risk profile - do
they have to audit your company's development practices? If Google
offers it as part of the platform the customer knows that every app
shares the same implementation, and (assuming you agree with the first
point) a far less risky one. Maybe you can write code as stable as
Google's, or more so, but the customer has no way of knowing that
- I'll bet you that 100% of customers that approach you for the first
time would rather hear that data security is supplied as part of the
Google platform than by your application code and not because "that's
nice" - but because this feature *does* affect the real security of
their data.

3. Thinking strict data partitioning "is a fantasy in an environment
where data lives on the same hardware". I did say strict, not
physical. I think you are missing the value of the software platform
again - it is not the same thing as application code that runs on the
platform. You appear to be making a simple distinction between
hardware and software. The application platform is inherently more
robust than your application code. In fact, given that Google already
have a data partitioning mechanism for applications, I wouldn't be
surprised if it was even lower level than the GAE platform, and part
of the BigTable implementation. That would make it even more robust
than the GAE platform code. Which is ludicrously robust compared to
your application code.

4. Thinking risk reduction is not valuable unless it results in total
risk removal. Security is all about the management and mitigation of
risk - not necessarily the removal of it. Remove it if you can,
obviously - but generally this is an unlikely outcome. We've
identified three threats (admin console error, external app doing
admin, bug in application code) - and you are saying that unless we
remove the first two, why remove the third. It's false logic, and
poor security.

5. Confusing an admin error or malicious attack with a software bug.
You state - "My point is that that's not true. If I have access to
multiple admin consoles (for maintenance reasons), I can combine the
results that I get from each of the consoles, effectively giving me
the ability to query against multiple datastores. I can do this with a
program that 'runs' the admin consoles or I can do it by hand". Can you
point me to the admin console API you'd use to 'run' it via a program -
or are you talking about screen scraping? Both situations are totally
different scenarios, and risk profiles, from being able to introduce a
bug into your application code that exposes customer data.

6. Thinking that with strict data partitioning you *will* be able to
introduce a bug into *your* application code that exposes multiple
customers' data to multiple other customers via the Datastore API. And
this is the key - from the customer's perspective - yes they have to
worry about an admin error, or some secondary application you might
write, or a malicious attack, or a defect in the platform code, or a
disgruntled employee - but they don't have to worry about the
application code they use every day, and that is one of their biggest
risk points eliminated - the largest part of your company offering.
Security is about risk mitigation and management. If you still
disagree, can you please explain to me how the bug would manifest in
your application code (I'm talking about code that runs on GAE, with
strict data partitioning).

In the end, I think it all comes down to point 1, and an understanding
that software security is all about risk mitigation and management.
Control what you can, have contingency for what you can't. If you
agree with point 1, then you understand my position. If you don't
agree with point 1, well, good luck to you. I think I've just about
exhausted the ways I have of expressing what I feel is a very obvious
point.

Geoffrey Spear

Dec 23, 2008, 11:51:08 AM

to Google App Engine

On Dec 22, 5:03 pm, hawkett <hawk...@gmail.com> wrote:
> 2. Not seeing it from the customer's perspective. What they see is
> that every app on GAE is a roll your own data security effort - what a
> nightmare - how are they to tell which app was written well and which
> wasn't? How would they even begin to assess the risk profile - do
> they have to audit your company's development practices?

They're trusting you with their data. Why should they trust your code
not to email all of the data they input to malicious people if they
can't trust you to write code that keeps their data separate enough
from other people's data? If they don't trust you but use your
product anyway, they're stupid, whether Google provides you the tools
to do better security than you can do yourself or not.

Andy Freeman

Dec 23, 2008, 2:22:31 PM

to Google App Engine

> In fact, given that Google already
> have a data partitioning mechanism for applications, I wouldn't be
> surprised if it was even lower level than the GAE platform, and part
> of the BigTable implementation.

You're hoping that the partitioning for a given datastore depends on
how google allows access to said datastore. In particular, you're
hoping that the partitioning for datastores using a feature where a
given application can pick between a set of datastores is different
than the partitioning when a given application has access to exactly
one datastore.

That's unlikely. If google decides to implement such a feature, it
would be silly to also introduce a different mechanism for
partitioning datastores.

> How would they even begin to assess the risk profile - do
> they have to audit your company's development practices?

If your customers are serious, they must, regardless of how your
application is deployed, regardless of who handles login/access
management. Login code isn't the only risk.

And, if you're serious about login code, you must validate the login
result. That is, once it is determined that a given user running your
application should use a given datastore, the application then must
look at the datastore that it is trying to use and verify that it is
actually the correct datastore for that user. Platform login code
can't do that check. And, the platform's login doesn't provide much
information to the application for it to do such a check.
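The "validate the login result" step could be sketched by stamping each datastore with its owner when it is provisioned and checking that stamp before use. The `__owner__` marker entity and the helper names are assumptions for illustration, not something App Engine provides.

```python
OWNER_KEY = "__owner__"  # hypothetical marker entity stored in every datastore


class WrongDatastore(Exception):
    pass


def provision(tenant_id: str) -> dict:
    """Create a (dict-backed) datastore stamped with its owner at creation."""
    return {OWNER_KEY: tenant_id}


def checked_open(datastores: dict, user_tenant: dict, user: str) -> dict:
    """Resolve the user's datastore, then verify it really belongs to them."""
    tenant = user_tenant[user]
    ds = datastores[tenant]
    if ds.get(OWNER_KEY) != tenant:
        raise WrongDatastore(
            f"{user!r} routed to a datastore owned by {ds.get(OWNER_KEY)!r}"
        )
    return ds
```

The check is cheap, but it is one more piece of security logic living in application code, which is the trade-off the next reply disputes.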

Yes, I realize that customers have different risk and cost
sensitivities so there must be some right around the points that you
like. However, that's a long way from saying that such points
dominate.

hawkett

Dec 24, 2008, 1:17:07 PM

to Google App Engine

> You're hoping that the partitioning for a given datastore depends on
> how google allows access to said datastore

Exactly - that is the feature request I am proposing. It seems
likely to me that GAE uses a data partitioning feature of BigTable
(maybe not, I don't know, but to me it seems the right place to
implement a data partitioning function) - they should expand the way
GAE uses that BigTable feature to offer the functionality I am
requesting.

> If your customers are serious, they must, regardless of how your
> application is deployed, regardless of who handles login/access
> management. Login code isn't the only risk.

Perhaps, but the threshold is significantly lowered - customers are
more likely to undertake an audit (rather than go to a competitor) if
they can see you are using platform features for security - I stand by
the assertion that 100% of customers who engage you for the first time
will prefer you to be using the platform over custom application code
- especially for security.

> And, if you're serious about login code, you must validate the login
> result. That is, once it is determined that a given user running your
> application should use a given datastore, the application then must
> look at the datastore that it is trying to use and verify that it is
> actually the correct datastore for that user

I don't agree. You should trust your authentication mechanism - this
is a trust relationship. If you don't trust it, then you need to
address that problem, not write additional application code which adds
to the complexity of your security implementation. Complexity in your
security implementation increases risk, not decreases it. Note this
is not an argument against defense in depth - it is an argument for
simplicity in each implementation layer. We are talking about the
authentication layer, and the db access layer, and both should be
platform concerns, not application concerns (at least from my
perspective) - certainly they are currently platform concerns in GAE,
and I would like them to stay that way.

It is very important to note that the functionality is *nearly* there
already - i.e. restricting access to users from a google apps account
- it has strict data partitioning, authentication and db access are
platform concerns, user provisioning administration etc. is already
there in google apps. The only thing missing is a method of
automatically spawning a new application in response to a customer
registration (and the 10 app limit).

The architecture of GAE right now is totally in line with what I am
talking about, and I have no doubt that this is for all the reasons I
have listed, and many I haven't even thought of. Consequently I doubt
that you will ever be given the functionality you are looking for -
i.e. accessing multiple datastores from the same application instance.

I'll ask again - would a feature that allowed you to map the same
datastore to multiple application instances satisfy your use-case? It
does stretch the data partitioning thing a bit, but might be workable
from a platform configuration perspective.

Andy Freeman

Dec 24, 2008, 6:45:42 PM12/24/08

to Google App Engine

>> You're hoping that the partitioning for a given datastore depends on
>> how google allows access to said datastore

> Exactly - that is the feature request I am proposing.

Huh? You were requesting the ability to spawn a new datastore and to
have the login scheme for a given pile of application code pick the
datastore. The above is about methods for separating datastores and
whether the method for separating them should depend on how the
datastore is chosen.

> I don't agree. You should trust your authentication mechanism - this
> is a trust relationship. If you don't trust it, then you need to
> address that problem, not write additional application code which adds
> to the complexity of your security implementation. Complexity in your
> security implementation increases risk, not decreases it. Note this
> is not an argument against defense in depth - it is an argument for
> simplicity in each implementation layer.

Let's look at these alternatives.

With no post-login check, the application runs using whatever
datastore the login procedure finds acceptable. If the login
procedure fails or the datastore layer serves up the wrong datastore,
the application still does its thing.

Post-validate may catch either of those errors. (Of course, the post-
validate could fail as well and allow access when it shouldn't, but
that just leaves you no worse off than you were without the check.)
Yes, the post-validate may block execution when it shouldn't, but
that's likely to be because the datastore layer is misbehaving,
delivering wrong data. The application may have failed eventually
anyway when running with a misbehaving datastore layer, but detection
during validation is better because the application doesn't get a
chance to corrupt user-data.

hawkett

Dec 26, 2008, 8:50:20 AM12/26/08

to Google App Engine

> Huh? You were requesting the ability to spawn a new datastore and to
> have the login scheme for a given pile of application code pick the
> datastore. The above is about methods for separating datastores and
> whether the method for separating them should depend on how the
> datastore is chosen.

I assume you are talking about this statement from my first post? -

'The system spawns a virtual instance of the app - or at least allows
mapping a single datastore partition to the authenticated entity. You
could extend it by allowing multiple datastores per authenticated
entity and choosing the appropriate one at authentication time.'

I haven't mentioned application code at all. If you have interpreted
'the system' to mean my application code, then I think you are being
disingenuous. What's the point of a feature request for my own
application code? The feature request has the term 'data segregation'
in its title, and doesn't include the proposed extension (as this
would add significant additional complexity). Anyway, when I request
functionality in 'the system' in a GAE feature request, I am talking
about GAE, not my own application code. If you're talking about some
other statement I made, then please say what it is.

> With no post-login check, the application runs using whatever
> datastore the login procedure finds acceptable.

Yes, it does. This is not our application code, and we trust it. If
you don't trust it, modify it or choose a different authentication
mechanism that you do trust.

> If the login
> procedure fails or the datastore layer serves up the wrong datastore,
> the application still does its thing.

Raise and fix the bug in the authentication/db layer.

What is your actual position Andy? Do you support request 106? Do
you oppose 945? At the moment, I am getting the idea you support 106,
but not the implication that it would support queries across
datastores. I am also understanding that you oppose the data
segregation from 945 because you think it doesn't serve a purpose.
This is despite the fact that the entire security architecture of GAE
is based on trustable external authentication, data partitioning,
mapping that data partition to the authenticated entity, and not
allowing cross data store queries. Are you saying the current GAE
security architecture is wrong? Or just that they should get rid of
the data partitioning to deliver feature 106? If this is your
position, then it seems totally unsustainable to me.

hawkett

Dec 26, 2008, 8:52:24 AM12/26/08

to Google App Engine

> They're trusting you with their data. Why should they trust your code
> not to email all of the data they input to malicious people if they
> can't trust you to write code that keeps their data separate enough
> from other people's data?

Because one is a malicious attack, and one is a bug. From a security
perspective you address them in totally different ways.

> If they don't trust you but use your
> product anyway, they're stupid, whether Google provides you the tools
> to do better security than you can do yourself or not.

You're kidding, right?

Andy Freeman

Dec 29, 2008, 7:53:47 PM12/29/08

to Google App Engine

> 'The system spawns a virtual instance of the app - or at least allows
> mapping a single datastore partition to the authenticated entity. You
> could extend it by allowing multiple datastores per authenticated
> entity and choosing the appropriate one at authentication time.'
>
> I haven't mentioned application code at all. If you have interpreted
> 'the system' to mean my application code, then I think you are being
> disingenuous.

"The system" in this case is the combination of the GAE platform and
an application running on said platform.

> What's the point of a feature request for my own application code?

Oh really? The reason that this requires a feature request is that it
isn't (currently) possible for an application running on GAE to
request the creation of another datastore. (One could call an outside
agent to request another application, but ....)

> Do you support request 106?

Yes.

> Do you oppose 945?

Not sure.

> At the moment, I am getting the idea you support 106,
> but not the implication that it would support queries across
> datastores.

106 allows an application to access multiple datastores, so why would
I think that it doesn't?

Note that the ability of an application to access multiple datastores
does not imply the ability to access arbitrary datastores. Note also
that the ability to access multiple datastores could be satisfied via
a "datastore login" API used by the application which would be as
secure as anything done by the platform before the application starts.
(Both schemes can be exploited by malicious code. Both are only as
secure as the platform's login.)

> I am also understanding that you oppose the data
> segregation from 945 because you think it doesn't serve a purpose.

I'm skeptical of 945 because it's a lot of mechanism. There are many
ways to get data segregation using the existing partitioning.
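One reading of "data segregation using the existing partitioning" is to keep every customer inside one application's datastore and scope each key to a customer. The sketch below assumes that reading. `TenantStore` and the `tenant:key` prefix scheme are illustrative only, and a plain dict stands in for the datastore.

```python
class TenantStore:
    """Scopes every read and write to one customer's key prefix."""

    def __init__(self, backing, tenant):
        if ":" in tenant:
            # The prefix scheme would break if tenant ids could contain ':'.
            raise ValueError("tenant id must not contain ':'")
        self._backing = backing  # shared dict standing in for the datastore
        self._prefix = tenant + ":"

    def put(self, key, value):
        self._backing[self._prefix + key] = value

    def get(self, key):
        return self._backing.get(self._prefix + key)

    def keys(self):
        # Only keys carrying this tenant's prefix are ever visible.
        return sorted(k[len(self._prefix):] for k in self._backing
                      if k.startswith(self._prefix))
```

Here the segregation lives entirely in application code. That is exactly the trade-off the rest of this thread argues about.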

> This is despite the fact that the entire security architecture of GAE
> is based on trustable external authentication, data partitioning,
> mapping that data partition to the authenticated entity, and not
> allowing cross data store queries.

The GAE security architecture is not based on "not allowing cross data
store queries". It's based on authenticated access to partitioned
datastores, which is a very different thing. One could have
authenticated access to partitioned datastores AND cross datastore
queries. One could have authenticated access to choice of partitioned
datastore but not have cross datastore queries. One could have
authenticated access to choice of partitioned datastores and allow
cross datastore queries. One could even have an "authenticated
choice" mechanism that allowed cross datastore queries for some
datastores and not others.
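Authenticated choice of datastore and cross-datastore queries are independent properties, and a small policy model can show this. The model is a hypothetical illustration, not a description of how GAE works. `may_open` lists which datastores a principal may open. `cross_query_pairs` separately lists which datastores may be queried together.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class AccessPolicy:
    # Datastores each authenticated principal is allowed to open.
    may_open: dict = field(default_factory=dict)
    # Unordered pairs of datastores that may be combined in one query.
    cross_query_pairs: set = field(default_factory=set)

    def can_open(self, principal, store):
        return store in self.may_open.get(principal, set())

    def can_query(self, principal, stores):
        stores = set(stores)
        if not all(self.can_open(principal, s) for s in stores):
            return False
        # A query over a single datastore needs no cross-query permission.
        return all(frozenset(pair) in self.cross_query_pairs
                   for pair in combinations(sorted(stores), 2))
```

If `cross_query_pairs` is left empty, the policy allows authenticated choice without cross-datastore queries. Filling it in for some pairs gives the mixed "some datastores and not others" case.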

> Are you saying the current GAE security architecture is wrong?

No.

> Or just that they should get rid of the data partitioning to deliver feature 106?

No.

hawkett

Dec 30, 2008, 9:17:23 AM12/30/08

to Google App Engine

> "The system" in this case is the combination of the GAE platform and
> an application running on said platform.

No, I would prefer GAE to implement the system completely, using
existing elements. How? In app.yaml, you specify that your
application supports mapping multiple google apps user spaces to your
app. Currently it only allows one. This is an application
marketplace type concept. When my app is added, a new data partition
is created for their users. Most importantly I am the administrator
of the app and all of the data partitions - this is different to
deploying my app to their GAE account - it still resides in my
account, and the customer has no administrative rights beyond what I
give them in my application code. I need only have one app deployed.
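This is a rough sketch of the routing that the proposal would push into the platform. It assumes each customer is identified by its Google Apps domain. `PartitionRouter` and its methods are hypothetical names, and nothing here is a real GAE API.

```python
import itertools


class PartitionRouter:
    """Hypothetical platform component: one data partition per customer domain."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_domain = {}

    def register_domain(self, domain):
        # Run when a customer adds the app; creates a partition only once.
        domain = domain.lower()
        if domain not in self._by_domain:
            self._by_domain[domain] = f"partition-{next(self._ids)}"
        return self._by_domain[domain]

    def partition_for(self, user_email):
        # The partition follows from the authenticated user's domain, so the
        # application never names a partition itself.
        domain = user_email.rsplit("@", 1)[-1].lower()
        if domain not in self._by_domain:
            raise PermissionError(f"{domain} has not added this application")
        return self._by_domain[domain]
```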

With this model, registration, user provisioning, authentication and
data partitioning are all handled external to my application code
using building blocks that are already present in the GAE offering.
The only change to my application code from right now is an entry in
app.yaml. It's not even particularly complicated - especially for me,
the application developer. I can imagine implementations that don't
even require the app.yaml entry.

I'll admit (as I'm sure you will) that this thread has led me to think
more deeply about the implementation of the use case from my original
post, but the above is not excessive, and is much preferable to
application code. It would allow some great additions, such as a
common billing and payment engine - something most app developers
would love to have taken off their plate.

Yet another feature this would allow - version migration for
customers. I deploy separate versions of my app, and have the ability
to move customer data partitions between app deployments. An obvious
use case is that some customers may be happy to try new features in
beta, others may want to wait for release versions. It is worth
noting that google apps essentially supports this feature currently
with the checkbox indicating that you want the latest features.
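The version-migration idea amounts to a per-customer pointer from a data partition to a deployed version. Below is a minimal sketch; `VersionRouter` and the version labels are invented for illustration.

```python
class VersionRouter:
    """Hypothetical mapping of customer partitions to deployed app versions."""

    def __init__(self, versions, default):
        if default not in versions:
            raise ValueError("default version must be deployed")
        self.versions = set(versions)
        self.default = default
        self._assigned = {}

    def move(self, partition, version):
        # Migrate one customer's partition to another deployment, e.g. a beta.
        if version not in self.versions:
            raise ValueError(f"unknown version {version}")
        self._assigned[partition] = version

    def version_for(self, partition):
        # Customers who never opted in stay on the default release.
        return self._assigned.get(partition, self.default)
```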

These are all major development efforts that carry significant risks
to your customers, and are mostly diversions from core creative
application development. Common use cases should be moved to the
platform layer, freeing the developer to actually build their
application. This, I think, is a good summary of the stated goals of
GAE platform.

I'll add the above implementation as a suggestion to 945 to clear up
any misunderstanding about platform vs application.

> The GAE security architecture is not based on "not allowing cross data
> store queries". It's based on authenticated access to partitioned
> datastores, which is a very different thing.

I did say -

'...security architecture of GAE is based on trustable external
authentication, data partitioning, mapping that data partition to the
authenticated entity, and not allowing cross data store queries'

I realise they are different things, that's why I listed them
separately. As it stands GAE does not allow cross data store queries,
and from my perspective that is an aspect of the security
architecture. 106 wants that aspect 'relaxed'.

While I don't think GAE will implement cross data store queries using
the data API (I still think exposing an application API to access said
data, or supporting one data partition for many apps is the right
choice), a possible implementation that would be acceptable to me is
adding an entry to app.yaml specifying how strict data partitioning
should be for an application. For my use case I would choose the
strictest option, and for yours something less so. It's not ideal, as
an error in app.yaml could lead to the cited bug, but the risk profile
is much lower, and more easily auditable.
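The sketch below shows what a single declared strictness setting could control. The level names, their string values and the `allowed` check are assumptions for illustration; app.yaml has no such entry.

```python
from enum import Enum


class Strictness(Enum):
    STRICT = "strict"      # each partition sees only its own data
    READ_SHARED = "read"   # other partitions may be read but not written
    OPEN = "open"          # any partition may be read or written


def allowed(level, own_partition, target_partition, write):
    """Decide whether code running for own_partition may touch target_partition."""
    if target_partition == own_partition:
        return True
    if level is Strictness.STRICT:
        return False
    if level is Strictness.READ_SHARED:
        return not write
    return True
```

The whole policy reduces to one value, which makes it easy to audit. For the same reason, a mistyped value (say "open" instead of "strict") would reproduce the cited bug.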

Andy Freeman

Dec 30, 2008, 10:14:56 AM12/30/08

to Google App Engine

> No, I would prefer GAE to implement the system completely, using
> existing elements.

I was unaware of the weight that your preferences have.

I note that your implementation requires new elements, namely
additions to app.yaml.

> It would allow some great additions, such as a
> common billing and payment engine - something most app developers
> would love to have taken off their plate.

There are lots of other implementations that have that property, as
well as the others described below.

> As it stands GAE does not allow cross data store queries,
> and from my perspective that is an aspect of the security
> architecture. 106 wants that aspect 'relaxed'.

How do you know how the current GAE code actually works?

One possible implementation that satisfies every currently observable
behavior involves an "open datastore" routine that is passed the name
of the relevant datastore and called by Google code that lives in
application space. This routine returns a token that is used by every
datastore access routine. (A given process may access the datastore
on behalf of urls that require login as well as ones that don't so
whatever mechanism connects a process to a datastore probably does not
require any user credentials. However, "open datastore" may use app-
specific credentials baked into the application by google's set up
code.) There are a number of places where "open datastore" could be
called.
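Below is a minimal model of the hypothetical "open datastore" routine described above. It assumes the routine checks app-specific credentials set up by Google and hands back an opaque token, which every access routine then requires. `DatastoreService`, `open_datastore`, `put` and `get` are illustrative names, not Google's code.

```python
import secrets


class DatastoreService:
    """Hypothetical model of an "open datastore" routine and its tokens."""

    def __init__(self, app_credentials):
        self._app_credentials = app_credentials  # datastore name -> secret set up by Google
        self._tokens = {}                        # issued token -> datastore name
        self._data = {}                          # datastore name -> stored entities

    def open_datastore(self, name, credential):
        # App-specific credentials, not user credentials, gate the open call.
        expected = self._app_credentials.get(name)
        if expected is None or credential != expected:
            raise PermissionError(f"not allowed to open {name}")
        token = secrets.token_hex(16)
        self._tokens[token] = name
        self._data.setdefault(name, {})
        return token

    def put(self, token, key, value):
        self._data[self._resolve(token)][key] = value

    def get(self, token, key):
        return self._data[self._resolve(token)].get(key)

    def _resolve(self, token):
        # Every access routine goes through the token; there is no other path.
        if token not in self._tokens:
            raise PermissionError("invalid datastore token")
        return self._tokens[token]
```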

106 or any of the variants that I've mentioned would merely make "open
datastore" available through some appropriate safeguards and would be
just as secure as the current system.

I don't know Google's code either, but it is generally believed that
BigTable is used in many internal Google applications. The easy way
to make BigTable available to applications is via such a routine
called by application-space code. To the extent that GAE's datastore
is "just" a BigTable wrapper....

hawkett

Jan 5, 2009, 9:34:29 AM1/5/09

to Google App Engine

On Dec 30 2008, 3:14 pm, Andy Freeman <ana...@earthlink.net> wrote:
> > No, I would prefer GAE to implement the system completely, using
> > existing elements.
>
> I was unaware of the weight that your preferences have.

Isn't that what a feature request is? Should I raise feature requests
for other people's preferences? What a strange statement.

> I note that your implementation requires new elements, namely
> additions to app.yaml.

And you fail to note that I said it was not a requirement. e.g. you
could achieve the same thing when you deploy the app (e.g. when you
choose to tie it to a domain or not), or via configuration in admin
console.

> > As it stands GAE does not allow cross data store queries,
> > and from my perspective that is an aspect of the security
> > architecture. 106 wants that aspect 'relaxed'.
>
> How do you know how the current GAE code actually works?

I read the API docs - how do you manage it?

> 106 or any of the variants that I've mentioned would merely make "open
> datastore" available through some appropriate safeguards and would be
> just as secure as the current system.

Let's examine the token idea - and assume you have obtained N tokens
securely. You can easily introduce a bug in your application code
that uses the wrong token for the wrong end-user. Secure access,
buggy exposure of customer data. Your idea does not prevent the cited
bug, because it is not an alternative to strict data partitioning.
This is not a solution to the concerns of the original poster.
Perhaps I have misunderstood your implementation?
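A few lines of code are enough to show the objection. The sketch assumes two tokens were obtained securely, one per customer, and that application code picks one per request. The names and the off-by-one bug below are hypothetical.

```python
# Two tokens, each obtained securely for one customer (hypothetical values).
VALID_TOKENS = {"tok-A": "companyA", "tok-B": "companyB"}
CUSTOMERS = ["companyA", "companyB"]
TOKENS = ["tok-A", "tok-B"]  # TOKENS[i] was opened for CUSTOMERS[i]


def platform_accepts(token):
    # The platform can check that a token is genuine, not whom it is for.
    return token in VALID_TOKENS


def token_for(customer):
    return TOKENS[CUSTOMERS.index(customer)]


def buggy_token_for(customer):
    # Off-by-one in application code: for companyA, 0 - 1 is -1, which Python
    # treats as the last element, so companyA silently gets companyB's token.
    return TOKENS[CUSTOMERS.index(customer) - 1]
```

The platform's check passes in both cases. It can verify that a token is genuine, but not that the token belongs to the customer making the request.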

> I don't know Google's code either, but it is generally believed that
> BigTable is used in many internal Google applications. The easy way
> to make BigTable available to applications is via such a routine
> called by application-space code. To the extent that GAE's datastore
> is "just" a BigTable wrapper....

I think you are probably over-simplifying the meaning of BigTable.
BigTable is indeed used by many internal applications (as I understand
it), and as previously stated, I would expect (don't know) that the
data segregation required to achieve this would not be implemented by
each of those internal applications, but by lower level features in
BigTable. Move common use cases to the platform level.

Andy Freeman

Jan 5, 2009, 10:40:59 PM1/5/09

to Google App Engine

> > > As it stands GAE does not allow cross data store queries,
> > > and from my perspective that is an aspect of the security
> > > architecture. 106 wants that aspect 'relaxed'.
>
> > How do you know how the current GAE code actually works?
>
> I read the API docs - how do you manage it?

I'm not the one asserting that there are hard boundaries between GAE
datastores that the GAE run-time can't pierce.

It is generally believed that GAE is built on top of BigTable, which
has a lot of internal Google users. I don't know that all of them can
work with only one datastore; I'd guess that several need to access
multiple datastores simultaneously. So, if there is a BigTable-level
"only one datastore" and/or "can't switch" restriction, I'd be very
surprised if it was universal or could only be pierced by suid
applications.

Why are you so certain that there are enough Google internal
applications that require "just one datastore" and/or "can't switch"
that they'd bake that option into BigTable? If there aren't any, you
get to argue why they'd add it just for GAE even though GAE can
provide that segregation in its run-time....

FWIW, while I haven't seen Google's BigTable API, the published info at
labs.google.com/papers/bigtable-osdi06.pdf mentions an "open" call.
Yes, there are probably access restrictions on said open call, but
what are the odds that there's a user per GAE application and said
application's datastore is only accessible to said user?

> > 106 or any of the variants that I've mentioned would merely make "open
> > datastore" available through some appropriate safeguards and would be
> > just as secure as the current system.
>
> Let's examine the token idea - and assume you have obtained N tokens
> securely.

Not so fast. Who said anything about application visible tokens? In
fact, it could be just "change_to_application_userstore", where a
userstore is an ordinary GAE datastore. This could easily be written
so it doesn't take any parameters from application code, which makes
it just as secure as an "open datastore" call done at process startup.

Or, it could support one token, so the application has access to the
"default" datastore and a datastore determined by such a call. Again,
that call need not take parameters from application code.
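The following sketches the parameterless variant, assuming the platform already knows the authenticated user. `Runtime`, `_platform_login` and the store names are hypothetical. `change_to_application_userstore` takes nothing from application code, and `default_store` stays available, which matches the "one token" variant.

```python
class Runtime:
    """Hypothetical runtime in which only the platform knows the user's store."""

    def __init__(self, default_store, userstores):
        self.default_store = default_store  # the app's ordinary datastore
        self._userstores = userstores       # authenticated user -> userstore name
        self._user = None
        self.user_store = None

    def _platform_login(self, user):
        # Stand-in for the platform's own login handling, not app code.
        self._user = user

    def change_to_application_userstore(self):
        # No arguments: the target is derived only from the verified identity.
        if self._user is None:
            raise PermissionError("no authenticated user")
        self.user_store = self._userstores[self._user]
        return self.user_store
```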

hawkett

Jan 6, 2009, 7:57:55 AM1/6/09

to Google App Engine

> > > How do you know how the current GAE code actually works?
>
> > I read the API docs - how do you manage it?
>
> I'm not the one asserting that there are hard boundaries between GAE
> datastores that the GAE run-time can't pierce.

Neither am I - I am asserting that there are hard boundaries that you
or I can't pierce, and that is a feature of the security
architecture. The API docs bear out that assertion. I do *expect*
that data partitioning is a DB layer feature, but as I said
previously, I don't know that.

> It is generally believed that GAE is built on top of BigTable, which
> has a lot of internal Google users. I don't know that all of them can
> work with only one datastore; I'd guess that several need to access
> multiple datastores simultaneously. So, if there is a BigTable-level
> "only one datastore" and/or "can't switch" restriction, I'd be very
> surprised if it was universal or could only be pierced by suid
> applications.

I guess one of us will be surprised then :) - I would be surprised if
Gmail, Sites, Blogger, Picasa, Orkut etc. all operated in an open
space and avoided data exposure through code implemented in each of
those applications. That seems a ludicrous architecture to me - which
is my point in this thread I guess. It makes much more sense to me to
have the partitioning logic at the DB level (like a standard database
tablespace), and for those applications to leverage that. Then they
expose APIs to access their data at the application level - not use
the DB APIs.

Google does, in fact, expose APIs for data access - http://code.google.com/apis/gdata/
- and does not give DB level access to it. So I think just by
observing google's current architecture, it makes sense that they
wouldn't break with that tradition at the application level for GAE.
And not just because it's tradition, but because it is rooted in sound
architectural principles.

> Not so fast. Who said anything about application visible tokens? In
> fact, it could be just "change_to_application_userstore", where a
> userstore is an ordinary GAE datastore. This could easily be written
> so it doesn't take any parameters from application code, which makes
> it just as secure as an "open datastore" call done at process startup.
>
> Or, it could support one token, so the application has access to the
> "default" datastore and a datastore determined by such a call. Again,
> that call need not take parameters from application code.

I think this is getting away from the 106 proposal now, which states -
'This feature request is about allowing cross app queries using the db
APIs only'

And regardless, you can easily introduce the cited bug based on your
clarification. Simply make the wrong call to 'change_to_datastore',
and you still have the exposure problem. When your code is
responsible for selecting the datastore, you can introduce the bug.
This is fairly obvious.
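A toy sketch of the exposure bug described here, assuming a hypothetical change_to_datastore(name) call whose argument comes from application code. None of these names are real App Engine APIs; each "datastore" is just a list of records in a dict.

```python
# Toy model of application-selected datastores. change_to_datastore() is a
# hypothetical call from this discussion, not a real App Engine API.

class ParameterizedRuntime:
    """Runtime that lets application code pick the datastore by name."""

    def __init__(self, datastores):
        self._datastores = datastores  # datastore name -> list of records
        self._current = None

    def change_to_datastore(self, name):
        # The runtime trusts whatever name the application passes in.
        self._current = self._datastores[name]

    def query_all(self):
        return list(self._current)


def handle_request(runtime, logged_in_customer, customer_to_store):
    # The application maps customer -> datastore name itself, so one wrong
    # entry in customer_to_store is enough to show a customer another's data.
    runtime.change_to_datastore(customer_to_store[logged_in_customer])
    return runtime.query_all()


stores = {"store_A": ["A's invoice"], "store_B": ["B's invoice"]}
runtime = ParameterizedRuntime(stores)
correct = {"A": "store_A", "B": "store_B"}
buggy = {"A": "store_A", "B": "store_A"}  # copy/paste slip in app config
print(handle_request(runtime, "B", correct))
print(handle_request(runtime, "B", buggy))
```

Nothing in the runtime can tell that the second lookup is wrong; the platform only sees a valid datastore name.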

> This could easily be written so it doesn't take any parameters from application code, which makes
> it just as secure as an "open datastore" call done at process startup.

You are still asserting that application code carries the same
robustness profile as platform code. This is clearly not the case.
If there are N applications implementing the application API, vs just
the platform implementing the platform API, then it is a simple matter
of statistics to show that you will get at least N times as many
bugs. In fact it will be much more than N, because the volume of
testing on the platform will be N times greater, and the
implementation process will be much more rigorous than in most
applications. Without doing the analysis, I would expect the platform
fragility (e.g. fragility = defects per month) to decrease
exponentially as N increases. Using the application API, I expect
fragility would remain roughly constant, and unrelated to N. But
there is a hidden, bigger problem - if fragility remains constant on a
per-app basis, then customers see App Engine as a minefield - which
apps are well implemented? The one they choose could be a broken
one. How would they know?
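A rough way to put numbers on the application-side half of this argument, under the simplifying assumption (not from the thread) that each app independently has the same probability p of shipping a partitioning bug. The value of p is purely illustrative.

```python
# Toy model only: each application independently ships a partitioning bug
# with probability p. P below is an illustrative assumption, not a measurement.

def expected_buggy_apps(n_apps, p):
    """Expected number of apps with a partitioning bug; grows linearly in n_apps."""
    return n_apps * p


def prob_at_least_one_buggy(n_apps, p):
    """Chance that at least one app somewhere on the platform is broken."""
    return 1 - (1 - p) ** n_apps


def risk_for_one_customer(p):
    """A customer who picks a single app faces risk p, however large n_apps is."""
    return p


P = 0.05
for n in (1, 10, 100):
    print(n, expected_buggy_apps(n, P),
          round(prob_at_least_one_buggy(n, P), 3),
          risk_for_one_customer(P))
```

The per-customer risk stays at p for every N, which is the "minefield" point; the model deliberately says nothing about how low a single platform implementation's defect rate would be.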

This means across the board, the risk of data exposure _from the
customer perspective_ is much worse if partitioning logic is performed
in application code.

What do you think of the possibility of being able to decide when you
deploy your app how strict the data partitioning should be? In the
marketplace concept, the customer could be made aware of the
strictness of data partitioning when they sign up. My main concern is
protecting customer data, and giving customers confidence in the data
security of the GAE platform. This is how I read the intent of the
original poster as well.

Andy Freeman

Jan 6, 2009, 9:46:24 AM1/6/09

to Google App Engine

> I guess one of us will be surprised then :) - I would be surprised if
> gmail, sites, blogger, picassa, orkut etc. all operated in an open
> space and avoided data exposure through code implemented in each of
> those applications.

If the separation is by name and ordinary "file" access control, the
"code implemented" consists of the name of the datastore for the
application plus some application configuration that has to happen
regardless. I'm pretty sure that Google thinks that their folks can
open an application-specific datastore name reliably. And, if they
fail, they're talking to a datastore with the wrong structure.

Or, are you thinking that those applications use a different datastore
per external user? (If "separate datastore per user" is the usage
pattern, bigtable requires far less concurrency support than the
report mentions.)

> - and does not give DB level access to it. So I think just by
> observing google's current architecture, it makes sense that they
> wouldn't break with that tradition at the application level for GAE.
> And not just because its tradition, but because it is rooted in sound
> architectural principles

What "db level access" are you talking about? The result of that open
call is used by every other bigtable operation, including all db
operations performed at the datastore. Unless GAE works differently,
the runtime has access to that result.

> > Not so fast. Who said anything about application visible tokens? In
> > fact, it could be just "change_to_application_userstore", where a
> > userstore is an ordinary GAE datastore. This could easily be written
> > so it doesn't take any parameters from application code, which makes
> > it just as secure as an "open datastore" call done at process startup.

> And regardless, you can easily introduce the cited bug based on your
> clarification. Simply make the wrong call to 'change_to_datastore',
> and you still have the exposure problem. When your code is
> responsible for selecting the datastore, you can introduce the bug.
> This is fairly obvious.

Huh? How can you make a "wrong call" that doesn't have any
parameters?

Here's the application code:
    # operations on application-wide datastore
    change_to_application_userstore()  # note - no parameters
    # operations on user-specific datastore
    # return to user

The runtime knows what user and the mapping from said user to an
application-specific datastore. The application doesn't specify the
user and doesn't even know the name of the datastore.
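A minimal sketch of this arrangement, assuming the runtime learns the authenticated user before the handler runs. change_to_application_userstore() is the call proposed in this thread, not a real App Engine API, and datastores are plain dicts here.

```python
# Hypothetical sketch of the proposed change_to_application_userstore() call.
# Not a real App Engine API; datastores are plain dicts.

class Runtime:
    """Toy runtime: it, not the application, knows the user and the mapping."""

    def __init__(self, app_store, user_stores):
        self._app_store = app_store      # application-wide datastore
        self._user_stores = user_stores  # user id -> that user's datastore
        self._user = None
        self._current = app_store

    @property
    def current(self):
        # Application code can read through this, but cannot pick a store.
        return self._current

    def begin_request(self, authenticated_user):
        # Called by the platform after login, never by application code.
        self._user = authenticated_user
        self._current = self._app_store

    def change_to_application_userstore(self):
        # No parameters: the target comes from the runtime's own state.
        self._current = self._user_stores[self._user]


def application_handler(runtime):
    """Application code: it never names a user or a datastore."""
    greeting = runtime.current["greeting"]     # application-wide data
    runtime.change_to_application_userstore()  # note - no parameters
    return greeting, runtime.current["notes"]  # user-specific data


runtime = Runtime({"greeting": "hello"},
                  {"alice": {"notes": ["alice's note"]},
                   "bob": {"notes": ["bob's note"]}})
runtime.begin_request("bob")
print(application_handler(runtime))
```

Python's leading-underscore privacy is only a convention, so a real runtime would have to keep the user-to-datastore mapping somewhere application code cannot reach.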

There are only two mistakes that the application writer can make -
calling change_to_application_userstore too early or too late.

If the change_to_application_userstore() call is too late, the
application will try to perform some user-specific operations on the
application-wide datastore, but those will likely fail because its
structure is completely different. Note that the application doesn't
have access to any data from the user's datastore at that point.

If the change_to_application_userstore() call is too early, the
application will try to perform some application-generic operations on
the user's datastore, but those will likely fail for the same reason
as above. Moreover, this can't leak user data because the application
only has access to the user's datastore at that point.
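A toy illustration of the "too early" and "too late" cases, assuming (as the post does) that the application-wide and user datastores have different structures; here they share no kind names. All names are hypothetical.

```python
# Stores are keyed by kind name. Because the two stores share no kinds, a
# query against the wrong one fails instead of returning someone else's data.

class SchemaMismatch(Exception):
    pass


class Store:
    def __init__(self, kinds):
        self._kinds = kinds  # kind name -> list of entities

    def query(self, kind):
        if kind not in self._kinds:
            raise SchemaMismatch("no kind %r in this datastore" % kind)
        return list(self._kinds[kind])


def run(ops, switch_before, app_store, user_store):
    """Query kinds in order, switching to user_store before ops[switch_before]."""
    store = app_store
    results = []
    for i, kind in enumerate(ops):
        if i == switch_before:
            store = user_store  # stands in for change_to_application_userstore()
        results.append(store.query(kind))
    return results


app_store = Store({"Config": [{"theme": "blue"}]})
user_store = Store({"Note": [{"text": "this user's note"}]})
OPS = ["Config", "Note"]

print(run(OPS, 1, app_store, user_store))  # switch at the right point
for label, switch in (("too late", 2), ("too early", 0)):
    try:
        run(OPS, switch, app_store, user_store)
    except SchemaMismatch as err:
        print(label, "->", err)
```

If the schemas happened to overlap, a mistimed switch could fail silently instead, but the user store still only holds that one user's data.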

> You are still asserting that application code carries the same
> robustness profile as a platform code.

No, I'm not. I'm pointing out that the platform includes the run-time
and that run-time can provide meaningful services in this area. If
it's already providing related services, and I'm pretty sure that it
is calling "open_application_datastore" with some application-specific
key on startup, this doesn't change the risk profile.

Do you really want to argue that the platform code in the run-time has
a significantly different "robustness profile" than platform code
running on a different server? (If I'm correct about it already
providing related services, you're actually arguing about the relative
robustness of related run-time code.) Would platform code running in
a different process on the same machine have yet another robustness
profile?

> Google does, in fact, expose API's for data access - http://code.google.com/apis/gdata/

hawkett

Jan 7, 2009, 8:28:56 AM1/7/09

to Google App Engine

> Huh? How can you make a "wrong call" that doesn't have any
> parameters?
>
> Here's the application code:
> {operations on application-wide datastore}
> change_to_application_userstore() # note - no parameters
> {operations on user-specific datastore}
> {return to user}

OK - I understand (maybe), but I don't think it matches what 106 is asking
for - none of these data stores appear to be accessible between
applications; they all appear to be tied to a single application. Or
are you saying the user-specific data store is portable between
applications? i.e. my application can access it via db APIs, and so
can yours, provided the user is logged in?

If you don't intend portability of the user store, I agree that the
risk is different, and much lower, because the partitioning mechanism
does at least exist, and the chance of a bug is *much* lower because
the actual db query is likely to be different. When we were talking
about cross-app queries, the db schemas in each data store were likely
to be the same, which made the risk of data exposure very high. In
the implementation you now describe, the user data store and the
application data store probably have substantially different schemas.
The datastores with the same schema (user) are partitioned. I can see
value in this approach, although it does add complexity.

Essentially you are recommending strict data partitioning (aka 945)
plus a shared application datastore?

If you intend for the user data store to be portable between apps,
then I have problems with that approach. I think it should use a
specific data API, and not db level access. There's too much
unwarranted trust involved between the apps - i.e. you have to trust
that I read/write the db properly, as does everyone else - I imagine
over time such a shared database would get very 'dirty'. If you use
an API then it can enforce structure and data integrity through
validation. The portable user datastore (if that is what you are
suggesting) is a good idea, but I think it is something that Google
has already implemented to some degree with their social data API -
i.e. a bunch of data attached to your identity. I guess it depends on
your implementation how useful this is.
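A minimal sketch of the "specific data API" approach: other apps call methods that validate input instead of writing to the shared store directly. SharedProfileAPI and its fields are made up for illustration and are not a Google API.

```python
# Hypothetical shared-user-data service; callers never touch _store directly.

class ValidationError(ValueError):
    pass


class SharedProfileAPI:
    # Whitelist of fields other apps may write, with the type each must have.
    ALLOWED_FIELDS = {"display_name": str, "timezone": str}

    def __init__(self):
        self._store = {}  # user id -> dict of validated fields

    def update_profile(self, user_id, **fields):
        # Validate everything first so a bad call writes nothing at all.
        for name, value in fields.items():
            expected = self.ALLOWED_FIELDS.get(name)
            if expected is None:
                raise ValidationError("unknown field %r" % name)
            if not isinstance(value, expected):
                raise ValidationError("%s must be a %s" % (name, expected.__name__))
        self._store.setdefault(user_id, {}).update(fields)

    def get_profile(self, user_id):
        # Return a copy so callers cannot mutate the shared store in place.
        return dict(self._store.get(user_id, {}))


api = SharedProfileAPI()
api.update_profile("alice", display_name="Alice", timezone="UTC")
print(api.get_profile("alice"))
```

Because every write passes through update_profile, one careless app cannot leave malformed records behind for the others.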

To me, the portability of data and data partitioning should be treated
separately.

The other thing to note is that in order to map users to data
partitions, you need one of two things -
1. An API that your application can use to do so - accidentally map the
wrong user to the wrong data store = data exposure problem.
2. Some form of platform-supplied user provisioning - aka 945

Which of the above are you proposing?

bowman...@gmail.com

Jan 7, 2009, 9:17:36 AM1/7/09

to Google App Engine

Guys, I think you need to take a step back and look at this from a
higher level.

App Engine supplies you with an instance in a cloud that includes a
customized Python runtime and a BigTable backend. It does not support
multiple BigTable backends, and design-wise I doubt it ever will. There
comes a time when you have to look at your application and determine
what is the right environment for it to be built in to meet your
business requirements. In this case, it does not sound like appengine
in and of itself is going to meet those requirements.

Generally business requirements dictate the speed at which your
product must become available for use. Google has a published roadmap
for appengine, and support for multiple BigTable instances per
application is not on it, and they have not even implied it's
something they have any interest in implementing.

So, at this point, I'd suggest you look at other alternatives in order
to meet your business requirements.

- Separate it by table within BigTable as has been suggested.
- Pull everything back in-house and build server(s) capable of
supporting your application with the requirements you have. Such as
with MySQL running a different database for each of your users.
- Examine other cloud db storage options to see if they can meet your
requirements, such as the offering from Amazon. Though, while you
could use appengine combined with that solution, I would question how
quickly you'd hit the urlfetch quota limits.
- Examine all the offerings at Amazon and other cloud providers such
as Aptana to see if any of them are a better fit for your
requirements.
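A minimal sketch of the first option, "separate it by table within BigTable": one wrapper prefixes every kind name with the tenant id, so handlers never build kind names themselves. TenantScopedStore is hypothetical and backed by a plain dict rather than the App Engine datastore API. Isolation still depends on every handler going through this one wrapper, which is exactly the application-level risk raised earlier in the thread.

```python
# Hypothetical tenant-scoped wrapper over a dict standing in for the datastore.

from collections import defaultdict


class TenantScopedStore:
    SEPARATOR = "__"

    def __init__(self, backing, tenant_id):
        # Alphanumeric ids only, so "tenant__kind" always splits unambiguously
        # and two tenants can never produce the same prefixed kind name.
        if not tenant_id.isalnum():
            raise ValueError("invalid tenant id: %r" % tenant_id)
        self._backing = backing
        self._prefix = tenant_id + self.SEPARATOR

    def _kind(self, kind):
        return self._prefix + kind

    def put(self, kind, entity):
        self._backing[self._kind(kind)].append(entity)

    def all(self, kind):
        return list(self._backing[self._kind(kind)])


backing = defaultdict(list)
company_a = TenantScopedStore(backing, "companyA")
company_b = TenantScopedStore(backing, "companyB")
company_a.put("Invoice", {"total": 100})
print(company_a.all("Invoice"), company_b.all("Invoice"))
```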

Sometimes you have to stop and realize that business/security
requirements will dictate the technology you need to use, rather than
personal preference/comfort with technology you know. Over a decade of
preaching Linux while supporting Exchange and Citrix on Windows has
pounded this into my head.

hawkett

Jan 8, 2009, 8:27:14 AM1/8/09

to Google App Engine

Hi Joseph,

I've previously made this point here -
http://groups.google.com/group/google-appengine/browse_thread/thread/6319dceae6ec73e7/4d4d464c25537bda?lnk=gst&q=#4d4d464c25537bda
- so Google - people really do recommend against GAE because your
roadmap is so impenetrable. Joseph, you may recognise a little of Bob
in yourself :)

I am a C -> C++ -> Java -> J2EE guy over many years, so python is
not my language of choice - I only started using python with app
engine. I'm not here because of comfort with a technology I know, I'm
here because of business requirements. I think you'll find as the
next few years progress that IT houses everywhere will be pushing back
against massive in-house platforms. Amazon manages the hardware side
of that pushback; Google manages hardware *and* software. The killer
feature with GAE is the platform offering - with Amazon you have to
roll your own software platform, which is time consuming and
maintenance intensive. Yes, Amazon provides a hardware virtualisation
service that lets you do that, and I use it for a number of things.
But for a lean architecture, which leads to agility, which makes
business happy, you want as much as possible in the not-my-problem
basket - i.e. in GAE.

If you've spent much time in the world of relational databases,
then BigTable is an absolute killer feature. It collapses
Scalability, Performance, Fault Tolerance, and ORM into a simple
offering with sensible mechanisms for transaction management. All of
this stuff ends up as not-my-problem. With Amazon, most of it is my
problem to some degree.

To a degree, I think the argument that if it isn't on the
(laughably undefined) roadmap then you have to go elsewhere is a bit
hasty for a beta software platform. Granted, Google is always beta,
but GAE is in early beta by their standard - you can't even pay for it
yet. This is an ideal time to try and influence the architectural
direction of the platform. That said, I think Google should realise
that if they don't publish a proper roadmap (especially for must-have
platform features - e.g. SLA, asynch etc.), then people are going to
recommend dumping GAE in favour of something else.

The only feature request (that I am proposing) here is customer
data partitioning. It is not a massive big deal, especially
considering that this is already in place, you just have to deploy one
app per customer, which is a bit tedious. It doesn't seem like a
sufficient reason to jump ship and roll my own everything at Amazon.
It seems to me like an opportunity to say 'Hey, could you make this
aspect, which already exists, a little easier?'
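A sketch of the existing "one app per customer" workaround: the same codebase is deployed under a different application id per customer, so each customer gets a separate datastore. The id scheme, product name, and main.py handler are assumptions for illustration; each generated file would go into its own copy of the app directory and be uploaded separately (e.g. with the SDK's appcfg.py update).

```python
# Generate one app.yaml per customer from a shared template.
# The "myproduct-<customer>" id scheme is hypothetical.

from string import Template

APP_YAML = Template("""\
application: $app_id
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: main.py
""")


def app_yaml_for(customer, product="myproduct"):
    # Application ids are assumed to be lowercase, so normalise the name.
    app_id = "%s-%s" % (product, customer.lower())
    return APP_YAML.substitute(app_id=app_id)


for customer in ("CompanyA", "CompanyB"):
    print(app_yaml_for(customer))
```

This is the "a bit tedious" part: N customers means N generated configs, N uploads, and N quotas to watch.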

It seems to me like a ludicrously obvious step for Google to set up
an application marketplace, like the iPhone App Store, and open that
market to all of their Google Apps customers. It is a license for
Google to print money - more so than the iPhone, because we are
talking business customers, not joe-public. If it isn't on their
(secret) roadmap, then my name is Fred. This is the direction I have
suggested for the customer based data partitioning in
http://code.google.com/p/googleappengine/issues/detail?can=2&q=945

Finally, it is obvious that Andy and I are coming at this from
different (strongly held) positions. That is usually a good thing, as
long as both parties are honest in their attempts to understand the
other.

Colin
(not Fred)

On Jan 7, 2:17 pm, "bowman.jos...@gmail.com" <bowman.jos...@gmail.com>
wrote:

Andy Freeman

Jan 9, 2009, 4:22:41 PM1/9/09

to Google App Engine

> Ok - I understand (maybe), I don't think it matches what 106 is asking
> for though

It doesn't support 106, but that wasn't the goal.

The goal was to show that one could support application-driven
datastore choice with an appropriate amount of security.

The call to support 106 would be different, but its existence would
not mean that an application using change_to_application_userstore()
was any less secure.

Both (and others) require different application configuration as well.

For sharing a datastore between apps, I'd go through an app that
managed said shared datastore, but that's something best left up to
the designer - it isn't a platform level decision.

> Which of the above are you proposing?

I'm still not proposing anything. I'm pointing out that GAE can
reasonably support a wide range of application to datastore access
patterns.
