DataBase in GoogleApp

Thegremlin

Jun 17, 2009, 5:12:53 AM
to Google App Engine
Hello all,

I am trying to get started with Google App Engine and I am reading as
much as I can about it.
I have a few questions.

1. I understand that the DB is not a traditional DB. The example
(shoutout) shows how to store and get data from the DB, but what if I
need to store different data in different DBs?

Take, for example, an application that handles the expenses of any
Google user.
User "A" uses it, then come user "B", user "C", etc. After a while user
"A" logs in again and wants to get his data. Does this mean I need to
go over all the data in the datastore to see which records belong to
user "A"? Is there a way to open a "DB" per user, so that when the user
runs the application the system only "knows" user "A"'s data?

2. I don't come from a "webapp" programming background, and I find the
tutorials and docs very hard to understand. Are there any other
resources?

Thanks,

Nick Johnson (Google)

Jun 17, 2009, 12:00:39 PM
to google-a...@googlegroups.com
Hi Thegremlin,

On Wed, Jun 17, 2009 at 10:12 AM, Thegremlin <eddie....@gmail.com> wrote:

> Hello all,
>
> I am trying to get started with Google App Engine and I am reading as
> much as I can about it.
> I have a few questions.
>
> 1. I understand that the DB is not a traditional DB. The example
> (shoutout) shows how to store and get data from the DB, but what if I
> need to store different data in different DBs?

Only one datastore is available per application. You can shard your data by adding fields to your models, and querying based on them.
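As a rough illustration of that suggestion, here is a minimal pure-Python sketch. It is not the App Engine API: the `Expense` record, its `owner` field, the `STORE` list and the `expenses_for` helper are all hypothetical names, and the in-memory list only stands in for the datastore, which would answer the same filter from an index rather than by scanning.

```python
from dataclasses import dataclass


@dataclass
class Expense:
    owner: str        # the extra field that records whose data this is
    description: str
    amount: float


# Hypothetical in-memory stand-in for the application's single datastore.
STORE = [
    Expense("A", "train ticket", 12.50),
    Expense("B", "lunch", 8.00),
    Expense("A", "hotel", 90.00),
    Expense("C", "taxi", 15.25),
]


def expenses_for(owner):
    """Return only the records tagged with the given owner."""
    return [e for e in STORE if e.owner == owner]


if __name__ == "__main__":
    for expense in expenses_for("A"):
        print(expense.description, expense.amount)
```

All users' records live in the same store; the `owner` field is what lets the application ask for one user's records only.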



> Take, for example, an application that handles the expenses of any
> Google user.
> User "A" uses it, then come user "B", user "C", etc. After a while user
> "A" logs in again and wants to get his data. Does this mean I need to
> go over all the data in the datastore to see which records belong to
> user "A"? Is there a way to open a "DB" per user, so that when the user
> runs the application the system only "knows" user "A"'s data?
>
> 2. I don't come from a "webapp" programming background, and I find the
> tutorials and docs very hard to understand. Are there any other
> resources?

You may like http://www.appenginelearn.com/. The author, Charles Severance, has recently published a very good book based on this content, called Using Google App Engine.

-Nick Johnson



> Thanks,

--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Eddie Harari

Jun 17, 2009, 2:35:44 PM
to google-a...@googlegroups.com

Can anyone point me to the best practice with regard to the DB?

Adding some kind of field to my data models is possible, so that I can tell which data belongs where.

But it seems to me that the more data you have in that single application, the less efficient queries will be.

If I have stored 40,000 objects in my DB and one user has only 500 of them, will I need to go over all 40,000 objects to find his data? Is that correct?

Would it be more correct to upload several instances of the application, one for each user?

Eddie.

Wooble

Jun 17, 2009, 4:22:22 PM
to Google App Engine


On Jun 17, 2:35 pm, "Eddie Harari" <eddie.har...@gmail.com> wrote:
> But it seems to me that the more data you have on that single application
> the more queries will be inefficient.
>
> Cause if I have stored 40000 objects in my DB and one user has only 500
> objects in the DB , I will need to go over all the 40,000 objects to find
>
> His data ? is that correct ?

No. The datastore answers queries from indexes; it doesn't scan the
entire table. The time it takes to fetch the entities returned by a
query depends on the number of entities you fetch, not on the size of
your database.

Eddie Harari

Jun 17, 2009, 6:19:02 PM
to google-a...@googlegroups.com
1. Suppose I have the USA phone directory as my DB, and NY has 10,000,000 records in it. Or suppose I have a DB with only the 10,000,000 NY records and none of the rest of the USA. Will searching for all NY records take the same time in both cases?

2. The other reason to split the DB is security. Should a bug or a Google App Engine exploit become available, you don't want customer "A"'s records to be viewed by customer "B". The application runs in some kind of sandbox, but if there is only one DB, any small error in the program can expose everything to everyone.

Eddie.

Federico Builes

Jun 17, 2009, 6:29:55 PM
to google-a...@googlegroups.com
Eddie Harari writes:
> 1. If i have the USA phone directory as my DB, and NY has 10,000,000 records
> in it. or i have a DB with only NY 10,000,000 records without the rest of
> the USA. searching for all NY records will take same time at both cases ?

As I understand it, both will take the same time, since the data is sharded by index and not
by "database" or "table". In this case the index entries for NY are the same, so the query
should run in the same time.

> 2. The other reason to split the DB is security, should a "bug" or a google
> app exploit will be available you dont want
> your customer "A" records be viewed by customer "B". the application
> runs in somekind of sandbox but if the DB is one
> any small error in program can expose everything to everyone.

I don't have any real comments about this, but it feels to me like you're seeing this as a typical
DB server isolated in one part, with all the data in the same place. One of the advantages provided
by the Datastore is that that idea is no longer true.

--
Federico Builes

Eddie Harari

Jun 18, 2009, 5:54:56 AM
to google-a...@googlegroups.com
I am sorry if this is off topic or of no interest to the rest of the
group, but I do think this is an important issue with Google App Engine.

>> From my understanding, they will both run in the same time since the sharding on data is based on
>> the indexes and not on the "databases"/"tables". In this case the index is the same for NY so it
>> should run in the same time.

I don't see how this is possible. Even if you have indexed objects, the
"CPU" will still need to go over all records, compare their index and
see whether it is equal to the index you're looking for.
So this should be READING 10,000,000 records vs. READING 400,000,000
records; how can it take the same time?
Even if you fetch only 10,000,000 of them, you still need to go over all
of them (the 400,000,000).
Is there an index property inside the record that the Google DB knows to
treat as an index? Or is the index just another field in your data
object?


> 2. The other reason to split the DB is security, should a "bug" or a google
> app exploit will be available you dont want
> your customer "A" records be viewed by customer "B". the application
> runs in somekind of sandbox but if the DB is one
> any small error in program can expose everything to everyone.

>> I don't have any real comments about this, but it feels to me like you're seeing this as a typical
>> DB server isolated in one part, with all the data in the same place. One of the advantages provided
>> by the Datastore is that that idea is no longer true.

I have to tell you, you are wrong here: with the current Google App
Engine configuration there is a potential security risk.
Your application may be separated from other applications in a
"virtual" separate environment, but within your application the DB is
ONE and only ONE, and there is no separation between the "clients" that
use your DB to store and retrieve objects. So it is not up to Google to
secure your application; it is up to you.
Within your application, across your different customers, your
application is still one and has access to all records in the DB.
In standard web development you can simply split the data into separate
DBs (even physically). On Google App Engine the problem is that a single
ERROR inside "YOUR" application will lead to everyone seeing everything,
including data that is NOT THEIR OWN, because all the data is stored in
the same DB within your application.
So Google may provide a secure virtual sandbox, but this has nothing to
do with the internal issue.

Do you see my point?

>> --
>> Federico Builes


Nick Johnson (Google)

Jun 18, 2009, 6:04:31 AM
to google-a...@googlegroups.com
On Thu, Jun 18, 2009 at 10:54 AM, Eddie Harari <eddie....@gmail.com> wrote:

> I am sorry if this is off topic or of no interest to the rest of the
> group, but I do think this is an important issue with Google App Engine.
>
> >> From my understanding, they will both run in the same time since the sharding on data is based on
> >> the indexes and not on the "databases"/"tables". In this case the index is the same for NY so it
> >> should run in the same time.
>
> I don't see how this is possible. Even if you have indexed objects, the
> "CPU" will still need to go over all records, compare their index and
> see whether it is equal to the index you're looking for.
> So this should be READING 10,000,000 records vs. READING 400,000,000
> records; how can it take the same time?
> Even if you fetch only 10,000,000 of them, you still need to go over all
> of them (the 400,000,000).
> Is there an index property inside the record that the Google DB knows to
> treat as an index? Or is the index just another field in your data
> object?

By 'index', we mean a database index - that is, a sorted list of keys and values - not an index as in 'array index'. App Engine automatically indexes individual properties, and constructs composite indexes as specified in your index.yaml file. Using these, it only needs to retrieve the keys that pertain to your query, regardless of how many records there are in total.

Note that every single entity for every App Engine app is stored in a single Bigtable table! It's due to indexing that it's able to satisfy queries efficiently, regardless of the size of that table.
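To make the "sorted list of keys and values" idea concrete, here is a minimal pure-Python sketch. It only models the principle and is not how Bigtable or the datastore is implemented; `build_index` and `query_equal` are hypothetical helpers. A binary search finds the first matching index entry, and the scan stops at the first entry that no longer matches, so the work grows with the number of results plus the logarithm of the index size.

```python
import bisect


def build_index(entities, prop):
    """Return a sorted list of (property value, entity key) pairs."""
    return sorted((entity[prop], key) for key, entity in entities.items())


def query_equal(index, value):
    """Return the keys whose indexed value equals `value`."""
    # (value,) sorts before every (value, key) pair, so this lands on the
    # first matching entry without looking at unrelated rows.
    i = bisect.bisect_left(index, (value,))
    keys = []
    while i < len(index) and index[i][0] == value:
        keys.append(index[i][1])
        i += 1
    return keys


if __name__ == "__main__":
    people = {
        1: {"name": "Ann", "state": "NY"},
        2: {"name": "Bob", "state": "CA"},
        3: {"name": "Cy", "state": "NY"},
        4: {"name": "Di", "state": "TX"},
    }
    state_index = build_index(people, "state")
    print(query_equal(state_index, "NY"))  # [1, 3]
```

Adding millions of non-NY rows makes the index longer, but the NY entries stay contiguous, so the same short scan returns them.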


It's up to you to ensure your app is secure and doesn't permit one customer to see another customer's data. The low level API in both the Python and Java runtimes permits installation of 'pre' and 'post' hooks for API calls. You can use these to perform sanity checks on requests, and to ensure that all requests pertain only to the data that applies to the current user. A multi-tenancy library that makes use of this is forthcoming for the Python runtime, but we don't have an ETA for it right now.
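The pre-hook idea can be sketched in plain Python as follows. This is only an illustration of the pattern, not the App Engine hook API: `HookedStore`, `add_pre_hook`, `require_owner_filter` and `OwnershipError` are hypothetical names, and the store is a toy in-memory list.

```python
class OwnershipError(Exception):
    """Raised when a query is not restricted to the current user's data."""


class HookedStore:
    """Toy store that runs every registered pre-hook before each query."""

    def __init__(self, records):
        self._records = records
        self._pre_hooks = []

    def add_pre_hook(self, hook):
        self._pre_hooks.append(hook)

    def query(self, current_user, filters):
        for hook in self._pre_hooks:
            hook(current_user, filters)  # a hook may raise to veto the call
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]


def require_owner_filter(current_user, filters):
    """Pre-hook: reject any query that is not limited to the current user."""
    if filters.get("owner") != current_user:
        raise OwnershipError("query must filter on owner == current user")


if __name__ == "__main__":
    store = HookedStore([
        {"owner": "A", "item": "hotel"},
        {"owner": "B", "item": "lunch"},
    ])
    store.add_pre_hook(require_owner_filter)
    print(store.query("A", {"owner": "A"}))
    try:
        store.query("B", {"owner": "A"})
    except OwnershipError as err:
        print("rejected:", err)
```

Because the check sits in one place in front of every query, a handler that forgets its owner filter fails loudly instead of leaking another customer's records.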

-Nick Johnson


> So Google may provide a secure virtual sandbox, but this has nothing to
> do with the internal issue.
>
> Do you see my point?
>
> >> --
> >> Federico Builes



