Small tutorial on ER-modeling with GAE

Joscha Feth

Apr 15, 2008, 10:55:41 AM

to Google App Engine

Hi there,

I wrote a small tutorial about ER-modeling with Google App Engine:
http://daily.profeth.de/2008/04/er-modeling-with-google-app-engine.html

I hope this article will help people coming from a relational
background migrate faster :-)

I'd love to hear what you think and/or if you have any suggestions/
improvements!

regards,
Joscha

Michael Brunton-Spall

Apr 15, 2008, 12:31:19 PM

to google-a...@googlegroups.com

Joscha,

While it is informative on how to write a nice ER modelled database, and covers how one might use relationships between models, I don't think you adequately covered why it is that you don't want to do your modelling like that on the AppEngine.

--
Michael Brunton-Spall
http://www.mibgames.co.uk

DocDay

Apr 15, 2008, 1:54:13 PM

to Google App Engine

Joscha, I found it very informative. Thank You!

Doc

On Apr 15, 12:31 pm, "Michael Brunton-Spall" <michael.bruntonsp...@gmail.com> wrote:
> Joscha,
>
> While it is informative on how to write a nice ER modelled database, and
> covers how one might use relationships between models, I don't think you
> adequately covered why it is that you don't want to do your modelling like
> that on the AppEngine.
>
> --
> Michael Brunton-Spall
> http://www.mibgames.co.uk
>
> On Tue, Apr 15, 2008 at 3:55 PM, Joscha Feth <jos...@feth.com> wrote:
>
> > Hi there,
>
> > I wrote a small tutorial about ER-modeling with Google App Engine:
> > http://daily.profeth.de/2008/04/er-modeling-with-google-app-engine.html
>
> > I hope this article will help people coming from a relational
> > background migrate faster :-)
>
> > I'd love to hear what you think and/or if you have any suggestions/
> > improvements!
>
> > regards,
> > Joscha

Joscha Feth

Apr 15, 2008, 5:11:36 PM

to Google App Engine

Hello Michael,

maybe I am misunderstanding you, but: who said I didn't want to do my
modeling like that? I think the examples I gave are the right way to
do it?!

greets,
Joscha

On 15 Apr., 18:31, "Michael Brunton-Spall" <michael.bruntonsp...@gmail.com> wrote:
> Joscha,
>
> While it is informative on how to write a nice ER modelled database, and
> covers how one might use relationships between models, I don't think you
> adequately covered why it is that you don't want to do your modelling like
> that on the AppEngine.
>
> --
> Michael Brunton-Spall
> http://www.mibgames.co.uk

Joscha Feth

Apr 15, 2008, 5:12:31 PM

to Google App Engine

Thanks!

Michael Brunton-Spall

Apr 15, 2008, 5:35:48 PM

to google-a...@googlegroups.com

Joscha,
Just to confirm, I did find it very interesting and it's a good example of how to write traditional database applications using the AppEngine.

However, if you do your modelling like that, you will have performance issues further down the line when your application becomes more popular, or there is more data in the BigTable.

However, well written and informative for what you set out to do.

Michael

Ben the Indefatigable

Apr 15, 2008, 5:59:31 PM

to Google App Engine

> However, if you do your modelling like that, you will have performance
> issues further down the line

I disagree. I think this is exactly how to do the data modelling.
Maybe what you are referring to is that you don't want too many
entities to be in the same group. Well, you are only going to have 4
or 5 wheels on a car, so it is exactly correct to have them in the
same group.

However, the purpose of having them in the same group, more than just
to access them via parent relationships, is to combine modifications
into a transaction so that two users who happen to be modifying the
same car at the same time do not create an inconsistent state.

Nice article.

Ben the Indefatigable

Apr 15, 2008, 6:03:40 PM

to Google App Engine

Oh - I just noticed he did give somebook the parent books. That would
be a no-no.
So I agree with Michael Brunton-Spall that this is a performance design
issue.

Ben the Indefatigable

Apr 15, 2008, 6:13:59 PM

to Google App Engine

What you do in the "cascading relationship" case is use a reference to
the category, as was done in the M:N case, not a parent relationship.
You can use parent relationships between the categories if it is a
limited hierarchy, but you must use a reference for the books, so as
not to have all the books in one entity group. A reference will still
allow you to select all books in a category just fine if you need to,
or to use book.category to go straight to the category of a particular
book.
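The difference between a parent and a reference can be sketched in plain Python. This is only an illustration, not the datastore API: the `Entity` class, its `group_root` method and the `catalog` list are made-up stand-ins, and the "entity group" is modelled simply as the root of the ancestor chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Entity:
    """Hypothetical stand-in for a datastore entity (not the db.Model API)."""
    name: str
    parent: Optional["Entity"] = None    # ancestor link: joins the parent's group
    category: Optional["Entity"] = None  # plain reference: no effect on grouping

    def group_root(self) -> "Entity":
        """Walk up the ancestor chain; the root identifies the entity group."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


root = Entity("Products")
books = Entity("Books", parent=root)
fantasy = Entity("Fantasy", parent=books)

# Parent relationship: the book lands in the same group as every category.
lotr_as_child = Entity("Lord Of The Rings", parent=fantasy)

# Reference: the book is its own root, yet it still knows its category.
lotr_as_ref = Entity("Lord Of The Rings", category=fantasy)

# Selecting all books in a category still works with references only.
catalog = [lotr_as_ref, Entity("Some book", category=books)]
fantasy_books = [b.name for b in catalog if b.category is fantasy]
```

With references, every book stays a root of its own group, while the small category hierarchy can keep using parents.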

Brett Morgan

Apr 15, 2008, 6:22:05 PM

to google-a...@googlegroups.com

My biggest mind change with coming to GAE is to understand that:
1) For all my read pages there should be 1 entity that contains all
the information needed to render that page (which implies data
redundancy)
2) For all my update pages, I'm going to need to use ajax style
techniques to cope with slow updates over multiple entities without
showing the user a white screen of loading. Showing the user some form
of feedback that progress is happening is better than white screen of
loading. Allowing the user to keep on working while the updates are
happening is even better again.

brett
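Point 1 above, one entity per read page at the price of redundancy, might look roughly like this. It is a plain-Python sketch, not datastore code: `CategoryPage` and `rebuild_page` are hypothetical names, and the idea is only that the page entity carries a pre-rendered copy of the product data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str
    price: float


@dataclass
class CategoryPage:
    """Hypothetical read-side entity: everything one page needs, in one place."""
    category_name: str
    product_lines: List[str] = field(default_factory=list)


def rebuild_page(category_name, products):
    """Recompute the redundant copy whenever the underlying products change."""
    lines = ["%s: %.2f" % (p.name, p.price) for p in products]
    return CategoryPage(category_name, lines)


# Rendering the page now needs only this one object.
page = rebuild_page("Fantasy", [Product("Lord Of The Rings", 29.99)])
```

The cost moves to the update path, which has to refresh every redundant copy; that is where the slow multi-entity updates from point 2 come from.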

Joscha Feth

Apr 15, 2008, 6:28:32 PM

to Google App Engine

I see - so assuming I have only a few categories, but a lot of
products, I would do something like this:

from google.appengine.ext import db

class Category(db.Model):
    name = db.StringProperty(required=True)
    products = db.ListProperty(db.Key)

class Product(db.Model):
    name = db.StringProperty(required=True)
    price = db.FloatProperty()
    categories = db.ListProperty(db.Key)

root = Category(name="Products")
root.put()
tech = Category(parent=root,name="Tech stuff").put()
books = Category(parent=root,name="Books")
books.put()
fantasy = Category(parent=books,name="Fantasy")
fantasy.put()
scifi = Category(parent=books,name="Science Fiction").put()

somebook= Product(name="Some book")
somebook.categories.append(books.key())
somebook.price = 9.99
books.products.append(somebook.put())
books.put()

lotr = Product(parent=fantasy,name="Lord Of The Rings")
lotr.categories.append(books.key())
lotr.price = 29.99
fantasy.products.append(lotr.put())
fantasy.put()

correct?

regards,
Joscha

Joscha Feth

Apr 15, 2008, 6:29:38 PM

to Google App Engine

actually:

lotr = Product(parent=fantasy,name="Lord Of The Rings")

must read:
lotr = Product(name="Lord Of The Rings")

Ben the Indefatigable

Apr 15, 2008, 11:14:57 PM

to Google App Engine

no, just:

lotr = Product(category=fantasy,name="Lord Of The Rings")

and no category.products list property!

Ben the Indefatigable

Apr 15, 2008, 11:20:57 PM

to Google App Engine

> 1) For all my read pages there should be 1 entity that contains all
> the information needed to render that page (which implies data
> redundancy)

okay, so even if it is a small fixed number of entities, and/or a
group (as in Joscha's hierarchy), you would try to get it into a
single entity? I was not aware that the cost of each entity retrieved
is so significant. I was thinking that grabbing 100 entities for
each page draw would be no big deal.

Miguel Sanchez

Apr 16, 2008, 4:04:18 AM

to Google App Engine

Joscha,

sorry, but this is NOT a 1:1 relationship. It is a 1:M:

jack = Human(name="Jack")
mike = Human(name="Mike")

mercedes = Car(brand="Mercedes")
mercedesid= mercedes.put()

jack.drives = mercedesid
jack.put()

mike.drives = mercedesid
mike.put()

Joscha Feth

Apr 16, 2008, 6:09:50 AM

to Google App Engine

Hi Michael,

it is a 1:M if you model it like that, yes. Also, if you look closer at
your code, this might not even be a 1:m but an n:m relationship, unless
you make sure that nowhere in your code an entity (like the
Car) can be used to add another reference. I guess this just depends on
your code design - but it is the same in a relational database - you
need to make sure by defining specific keys that your entity can only
be referenced once - same with GAE. But thanks for the hint, I'll add a
note to the article!

greets,
Joscha

If you want to make sure that an entity belongs to exactly one other
entity, you need to use the parent relationship
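The "make sure in your own code" part could look roughly like this. It is a plain-Python sketch with a dict standing in for the datastore; `assign_driver` and `CarAlreadyDriven` are made-up names. On the real datastore the check and the write would presumably have to happen atomically to be safe against two concurrent requests.

```python
class CarAlreadyDriven(Exception):
    """Raised when a second driver is assigned to a car (hypothetical name)."""


def assign_driver(drives, human, car):
    """Record that `human` drives `car`, keeping the relationship 1:1.

    `drives` maps each human to the single car they drive.
    """
    for other, taken in drives.items():
        if taken == car and other != human:
            raise CarAlreadyDriven("%s is already driven by %s" % (car, other))
    drives[human] = car


drives = {}
assign_driver(drives, "Jack", "Mercedes")

# A second driver for the same car is refused instead of silently
# turning the relationship into 1:M.
try:
    assign_driver(drives, "Mike", "Mercedes")
    rejected = False
except CarAlreadyDriven:
    rejected = True
```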

Joscha Feth

Apr 16, 2008, 6:41:16 AM

to Google App Engine

All right - I updated it...thanks for pointing this out!

Aprigio Vasconcelos

Apr 16, 2008, 9:46:26 AM

to Google App Engine

Hi Joscha Feth,

There's still another way to model relationships with GAE.
I'll use part of your classes as an example:

from google.appengine.ext import db

# Human is defined first so that Car can reference it.
class Human(db.Model):
    name = db.StringProperty(required=True)

class Car(db.Model):
    brand = db.StringProperty(required=True)
    owner = db.ReferenceProperty(Human, required=True)

Let's insert some rows:

jack = Human(name="Jack")

jack.put()

jacks_bmw = Car(brand="BMW", owner=jack)
jacks_bmw.put()

jacks_mercedes = Car(brand="Mercedes", owner=jack)
jacks_mercedes.put()

Now, if we want to know what cars jack owns:

import sys

# car_set is the implicit back-reference created by Car.owner
jacks_cars = jack.car_set
print >> sys.stdout, "Jack's cars: "
for car in jacks_cars:
    print >> sys.stdout, "-"+car.brand

Thanks to back-references.

Joscha Feth

Apr 16, 2008, 10:10:17 AM

to Google App Engine

Hi Aprigio,

Thanks a lot for your input - I added a small section where I picked
up your example with some minor changes!

Aprigio Vasconcelos

Apr 16, 2008, 10:26:52 AM

to Google App Engine

Thanks Joscha,

But, actually, I want to illustrate this part of the code:

jacks_cars = jack.car_set

The Human model has an attribute called car_set by default (in your
case ownedcar_set) that is a Query object referring to all OwnedCar
entities that point to that Human. So, you don't have to create another
GQL query to get the cars.

cheers
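The naming rule behind car_set and ownedcar_set, as described above, is the lowercased model class name plus "_set". A tiny sketch of that rule in plain Python; the helper name `default_backref_name` is made up for illustration and is not part of the SDK:

```python
def default_backref_name(model_class_name):
    """Return the default back-reference attribute name for a model class."""
    return model_class_name.lower() + "_set"


car_attr = default_backref_name("Car")
owned_car_attr = default_backref_name("OwnedCar")
```

So a reference from Car to Human shows up on Human as `car_set`, and one from OwnedCar shows up as `ownedcar_set`.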

Joscha Feth

Apr 16, 2008, 10:34:56 AM

to Google App Engine

I missed this part - I thought it was a typo - this is really great (I
updated the article once more) - is this functionality noted somewhere
in the documentation? Can't find it...

btoc

Apr 16, 2008, 11:12:09 AM

to Google App Engine

You need to be careful with some of your assumptions here. First of
all, in your "Cascading relations" you use the parent property to
implement an ancestor relationship. While this is fine for the use
case you provide, it would not be very efficient in the case where
there are a lot of children associated with a parent. This is because
this results in a very large entity group. The more entity groups your
application has--that is, the more root entities there are--the more
efficiently the datastore can distribute the entity groups across
datastore nodes. Thus for efficiency you should avoid the case where
there are a lot of children. According to the docs, "A good rule of
thumb for entity groups is that they should be about the size of a
single user's worth of data or smaller.".

Now to address the "Many-to-Many (m:n)" section. It might be a better
idea to implement a different entity to represent the relationship
between a human and the cars they own. Something like:

class CarOwner(db.Model):
    car = db.Reference(Car, required=True)
    owner = db.Reference(Human, required=True)

The reason for this is that you could add more fields to this which
may be beneficial later in a query. Let's say we add a bought field
that contains the date the car was bought. Then one could get cars
owned by Jack that he bought after a certain date.

Now you could have a static method on Car ...

@staticmethod
def get_owner_cars(human, bought):
    """Returns the cars that the given human bought after the given
    date."""
    if not human:
        return []
    # 'bought >' matches the "bought after a certain date" use case
    carsowned = db.Query(CarOwner).filter('owner =', human).filter('bought >', bought)
    return [entry.car for entry in carsowned]

A useful non-static method on Car may be ...

def human_owns(self, human):
    """Returns True if the given human owns this car."""
    if not human:
        return False
    query = db.Query(CarOwner)
    query.filter('car =', self)
    query.filter('owner =', human)
    # get() returns the first match or None
    return query.get() is not None
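To see what the extra bought field buys you, here is the same idea in plain Python with a list standing in for the datastore. This is only a sketch: the `CarOwner` dataclass mirrors the model above, while `cars_bought_after` and the sample dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CarOwner:
    """One row of the m:n relationship, plus data about the relationship."""
    car: str
    owner: str
    bought: date


def cars_bought_after(links, owner, cutoff):
    """Cars that `owner` bought strictly after `cutoff`."""
    return [link.car for link in links
            if link.owner == owner and link.bought > cutoff]


links = [
    CarOwner("BMW", "Jack", date(2006, 5, 1)),
    CarOwner("Mercedes", "Jack", date(2008, 1, 15)),
    CarOwner("Mercedes", "Mike", date(2008, 3, 2)),
]

recent = cars_bought_after(links, "Jack", date(2007, 1, 1))
```

The purchase date belongs to neither the car nor the human, which is why it needs the separate relationship entity.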

Filip

Apr 16, 2008, 11:20:24 AM

to Google App Engine

I think it would be useful to add the output to the code samples in
the tutorial. Otherwise, either the user has to predict correctly the
outcome, or actually run the code.

Filip.


Aprigio Vasconcelos

Apr 16, 2008, 11:21:07 AM

to Google App Engine

Just search for '_set' in here http://code.google.com/appengine/docs/datastore/entitiesandmodels.html

Miguel Sanchez

Apr 16, 2008, 12:40:51 PM

to Google App Engine

On Apr 16, 12:09 pm, Joscha Feth <jos...@feth.com> wrote:
> Hi Michael,
>
> it is a 1:M if you model it like that, yes. Also, if you look closer at
> your code, this might not even be a 1:m but an n:m relationship, unless
> you make sure that nowhere in your code an entity (like the
> Car) can be used to add another reference.

It can't be a n:m because an owner has only one reference to car, so:

1 owner -> 1 car

but, as you can see in my example, two or more owners can have a
mercedes.

The best solution is an extra table with two references (to car and
owners) and unique values.

Joscha Feth

Apr 16, 2008, 6:14:43 PM

to Google App Engine

Hi Michael,

I added hints to the relevant paragraphs and also added your input on
a mapping entity to the n:m chapter.

Thanks!

Joscha Feth

Apr 16, 2008, 6:15:41 PM

to Google App Engine

Thanks - added this link as well to the article.

On 16 Apr., 17:21, Aprigio Vasconcelos <apri...@gmail.com> wrote:
> Just search for '_set' in here http://code.google.com/appengine/docs/datastore/entitiesandmodels.html

Joscha Feth

Apr 16, 2008, 6:16:37 PM

to Google App Engine

Hi Brian,
thanks for the explanation - I added this to the article together with
your nice example. Thank you very much.

Joscha Feth

Apr 16, 2008, 6:17:32 PM

to Google App Engine

Hi Filip,

On 16 Apr., 17:20, Filip <filip.verhae...@gmail.com> wrote:
> I think it would be useful to add the output to the code samples in
> the tutorial. Otherwise, either the user has to predict correctly the
> outcome, or actually run the code.

done. Thanks for your input.

canopus

Apr 17, 2008, 6:04:30 AM

to google-a...@googlegroups.com

> Now to address the "Many-to-Many (m:n)" section. It might be a better
> idea to implement a different entity to represent the relationship
> between a human and the cars they own. Something like:
>
> class CarOwner(db.Model):
>     car = db.Reference(Car, required=True)
>     owner = db.Reference(Human, required=True)
>
> The reason for this is that you could add more fields to this which
> may be beneficial later in a query. Let's say we add a bought field
> that contains the date the car was bought. Then one could get cars
> owned by Jack that he bought after a certain date.
>

This is the classic m:n relation table in a relational database... does
it work well with BigTable? (scaling issues?)

The other approach (if you don't need those extra fields) will be a
list in both entities you need to be related m:n, like:

class User(db.Model):
    product = db.ListProperty(db.Key)
    .....

class Product(db.Model):
    user = db.ListProperty(db.Key)
    ......

So a user uses multiple products and a product is used by multiple users...

Yes, there is redundant data and it's not normalized, just to speed up
queries... but...

Does this one scale better than the other solution?

Both list properties could be really huge... is this a problem for BigTable?

Which one do you think is the best solution using BigTable?

Joaquin.

Filip

Apr 17, 2008, 7:15:38 AM

to Google App Engine

That's an interesting question:
How do lists get processed by BigTable?
And in particular, how are long lists of keys stored?
Is there an optimal number of keys you can put in a list, after which
performance degrades?

Anyone have educated guesses?

btoc

Apr 17, 2008, 11:16:49 AM

to Google App Engine

All good questions. I do not have an account yet so I cannot really do
some analysis. The main issue I can think of is that to add a key to
the list you would need to first load the complete list before you
could add one. Imagine if you have 1,000,000 references in there.

The advantage of using an entity to represent an m:n
relationship is that you can limit your queries. This makes it
possible to paginate your results. So you can get the top 10
relationships and then use db.get(keys) to load all the entities in
one swoop. There is no way to do this with lists. You would have to
load the entity that contains the list and then manipulate that list
in code which seems like it would be disastrous.
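The paging idea above can be sketched in plain Python. Nothing here is the datastore API: `fetch_page` stands in for a limited query over the relationship entities, `batch_get` for a single db.get(keys) call, and the dict and list are invented sample data.

```python
def fetch_page(links, limit, offset=0):
    """Stand-in for a limited query: return one page of relationship rows."""
    return links[offset:offset + limit]


def batch_get(store, keys):
    """Stand-in for one batch lookup of many keys."""
    return [store[key] for key in keys]


# 25 cars and 25 (owner, car key) relationship rows for Jack.
cars = {"car%d" % i: {"brand": "Brand %d" % i} for i in range(25)}
links = [("Jack", "car%d" % i) for i in range(25)]

# Page 1: read only ten relationship rows, then load those ten cars at once.
page = fetch_page(links, limit=10)
keys = [car_key for _owner, car_key in page]
entities = batch_get(cars, keys)
```

With a key list stored on the entity itself, the whole list would have to be loaded before the first ten keys could even be picked out.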

This model should also be used instead of the back-references feature.
Again there is no way (at least from my reading of the docs) to filter
appropriately. So let's say you have an entity that is referenced by a
million entities. When you get the back-reference set you would get
one million entities. Of course this wouldn't work with the current
limitations and there is no way to filter the set. Overall I think the
back-reference feature should be avoided. Implement the feature
yourself .... it will give you much better flexibility.

Also what is up with the way they implemented the back-reference
syntax? To make the implementation be a dynamically changing method
name is just ridiculous. Why didn't they do something like
obj1.getReferences(Model)? This would mean get me all the Model
instances that refer to the instance obj1.

Brian

Cary Palmer

Apr 17, 2008, 11:38:32 AM

to google-a...@googlegroups.com

Is it possible to update the App Engine datastore tables via the
Google Base Data API or something similar? I want to update my tables
serverside in a separate environment.

Thanks

btoc

Apr 17, 2008, 11:59:12 AM

to Google App Engine

Just create Python pages to do it and call them via HTTP from your
server.

Cary Palmer

Apr 17, 2008, 12:02:37 PM

to google-a...@googlegroups.com

of course...

Aprigio Vasconcelos

Apr 17, 2008, 12:05:38 PM

to Google App Engine

Back-reference is a Query object, which you can order and filter before
you fetch data. So you don't have to get one million entities if you don't
want to.
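Why a query object avoids the one-million-entities problem can be shown with a small plain-Python stand-in. This is not the SDK's Query class: `LazyQuery`, `load_cars` and the sample rows are made up, and the filters are plain callables rather than datastore filter strings. The point is only that nothing is loaded until fetch is called.

```python
from itertools import islice


class LazyQuery:
    """Hypothetical stand-in for a query object: it records filters and
    ordering, and touches the data source only when fetch() is called."""

    def __init__(self, load):
        self._load = load          # callable that returns the rows
        self._predicates = []
        self._sort_key = None

    def filter(self, predicate):
        self._predicates.append(predicate)
        return self

    def order(self, key):
        self._sort_key = key
        return self

    def fetch(self, limit):
        rows = (row for row in self._load()
                if all(p(row) for p in self._predicates))
        if self._sort_key is not None:
            rows = iter(sorted(rows, key=self._sort_key))
        return list(islice(rows, limit))


loads = []  # records every time the data source is actually read


def load_cars():
    loads.append(1)
    return [
        {"brand": "Mercedes", "year": 2008},
        {"brand": "BMW", "year": 2006},
        {"brand": "Audi", "year": 2007},
    ]


car_set = LazyQuery(load_cars)
query = car_set.filter(lambda c: c["year"] >= 2007).order(lambda c: c["brand"])
loads_before_fetch = len(loads)   # still 0: building the query reads nothing
result = query.fetch(1)
```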

btoc

Apr 17, 2008, 1:06:21 PM

to Google App Engine

Thanks for that Aprigio
