GQL

200 views
Skip to first unread message

Francesco G

unread,
Apr 15, 2008, 8:21:01 AM4/15/08
to Google App Engine
Hi....
I was thinking about the "no joins" design in DataStore.

Suppose I have the following entities:

Customer:
- Name
- Country

Product:
- Code
- Name
- Description

Purchases:
- Reference to Product Entity
- Reference to Customer Entity
- Date of order

I cannot think so far to a GQL query that will answer the question:
Give me all the purchases of Product "Acme Dynamite" sold to "Wile E.
Coyote".

What you think about it?

Michael Brunton-Spall

unread,
Apr 15, 2008, 8:46:43 AM4/15/08
to google-a...@googlegroups.com
Francesco,

You don't design your tables like that, you are still thinking about your data in an RDBMS style.  When it comes to Google AppEngine, and Google BigTable the key is "denormalise, denormalise, denormalise"
You want to design your database not around "What bits of data should be grouped together", but around the principle, "What bits of data will I need to read concurrently".
In your case, what you want to get out is
Purchase information, Customer Name, Product Name.

Therefore your table should be.
Purchase:
Customer Name
Customer Country
Product Code
Product Name
Purchase Order Number
Date Of Order

Now your query becomes
Purchase.all().filter("customer_name =",customer).filter("product_name =", product).fetch(100)
Easy huh?  It's all one big table.
Say you want to show a customer what items they ordered for a given purchase order...
Purchase.all().filter("purchase_order =",order_num).fetch(100)

Updating becomes a little harder, but is still pretty easy.
The question then comes, how do I update a product name, if I change my widgets from Acme Widget to "Acme Superwidget"?
Well firstly, really, how often do you do that?  I bet it's a number of orders of magnitude less often than you execute the queries above.

If you must, it is something like:
items = Purchase.all().filter("product_name =",product).fetch(1000)
for item in items:
    item.product_name = new_name
    item.put()

Hope that helps


--
Michael Brunton-Spall
http://www.mibgames.co.uk

Francesco G

unread,
Apr 15, 2008, 8:57:52 AM4/15/08
to Google App Engine
Thanks for the clear answer!
Francesco

On Apr 15, 2:46 pm, "Michael Brunton-Spall"
> Michael Brunton-Spallhttp://www.mibgames.co.uk
>
> On Tue, Apr 15, 2008 at 1:21 PM, Francesco G <
>

Michael Brunton-Spall

unread,
Apr 15, 2008, 9:05:25 AM4/15/08
to google-a...@googlegroups.com
Francesco,

Not a problem, I've just blogged about it too, and hopefully that will help anyone else searching for the same kind of advice.
Cheers for prompting the blog entry.

--
Michael Brunton-Spall
http://www.mibgames.co.uk

gnz

unread,
Apr 15, 2008, 1:37:03 PM4/15/08
to Google App Engine
Mmm... I really don't think that denormalising like this is a proper
solution.
The updating loop will only fix the first 1000 records, and the rest
will never get updated!
Besides, this does not scale to a large e-commerce application, what
would you
do if you have millions of purchases on this table?

I guess the way to go in this case is to use ReferenceProperty:
http://code.google.com/appengine/docs/datastore/entitiesandmodels.html#References

These properties seem to be lazy-loaded, and assuming you will display
the purchases
on a paged list/grid of 10 to 50 elements, performance should be
fairly decent.

Anyway, some BigTable-modeling samples/tutorials/best practices from
Google would be
very helpful...!

bye!

On Apr 15, 3:05 pm, "Michael Brunton-Spall"
<michael.bruntonsp...@gmail.com> wrote:
> Francesco,
>
> Not a problem, I've just blogged about it too, and hopefully that will help
> anyone else searching for the same kind of advice.
> Cheers for prompting the blog entry.
>
> --
> Michael Brunton-Spallhttp://www.mibgames.co.uk
>
> On Tue, Apr 15, 2008 at 1:57 PM, Francesco G <

Cousa1

unread,
Apr 15, 2008, 5:46:29 PM4/15/08
to Google App Engine
why don't you iterate through the back reference?

when you create the Reference property Product on the entity Purchases
there is a automatic back reference created on the Product entity
called purchases_set. If you have the Product entity you can do some
thing like the following:

for purch_obj in prod_obj.purchases_set:
# do your thing!

think objects, not tables.



On Apr 16, 12:21 am, Francesco G <francesco.gianferrarip...@gmail.com>
wrote:

Michael Brunton-Spall

unread,
Apr 15, 2008, 5:50:21 PM4/15/08
to google-a...@googlegroups.com
The code demonstrated was obviously quickly written code.  A better solution would be to loop, attempting to get a few hundred rows from the table, updating those, and doing it again until the get doesn't return any rows.  that way you are done.

It appears that Google's AppEngine is not designed or streamlined for inserting or updating lots of data.  If you are writing an application in which there is a lot of updating the data, modifying records and stuff, then maybe you are using the wrong framework.

Using the reference property is a way to provide a "virtual" join.
The solution using a reference would be to do:
product = Product.all().filter("name =", product_name).get()
product.name = "New Name"

The get would however become
Purchase.all().filter("product =", product).filter(customer =", customer).fetch(1000)
which again looks fine, but what about when you are looking at a single customer's order history?
purchases = Purchase.all().filter("customer =",customer).fetch(1000)
for purchase in purchases:
    output(purchase.product.name,...)

At this point, everytime you want to display the name, your call to purchase.product does a lookup in the BigTable to get the product out, and that is going to take time. 
Because of the way BigTable is organised, that data may be on a completely differant server, in a different tablet, giving you a high cost to get that data out.  That might be fine for one or two properties, but as your data model gets more expensive, so will each call, and your application will not scale.

The viewpoint for building stuff on Google's AppEngine is that you should be expecting to do millions of page views for every update or edit.  Thats what the underlying datastore is designed to do.

Michael

Cousa1

unread,
Apr 15, 2008, 6:25:39 PM4/15/08
to Google App Engine
I don't see why you are doing this?

if you have a customer you have all their purchases in the property
purchases_set. This back reference is maintained by big table.

so if you are looking at a single customer purchases wouldn't be
better to write

for purchase in purchases_set:
output(purchase.productname, ...)



On Apr 16, 9:50 am, "Michael Brunton-Spall"
> >http://code.google.com/appengine/docs/datastore/entitiesandmodels.htm...

Ben the Indefatigable

unread,
Apr 15, 2008, 6:38:07 PM4/15/08
to Google App Engine
Denormalize X3? That strikes me as very bad. If a customer's name is
corrected later, you would need to search down and change other
entities, or lose the connection.

Instead, you can select based on the references, so just:

I don't know why people say there is no joins; this is an inner join,
and then you can just follow the references to grab the product and
customer fields for each purchase in the result.

first find acmeDynamite and wileyE, then:
db.GqlQuery("select * from Purchase where product=:1 and customer=:2",
acmeDynamite, wileyE)

Brett Morgan

unread,
Apr 15, 2008, 6:44:27 PM4/15/08
to google-a...@googlegroups.com
On Wed, Apr 16, 2008 at 8:38 AM, Ben the Indefatigable
<bcbr...@gmail.com> wrote:
>
> Denormalize X3? That strikes me as very bad. If a customer's name is
> corrected later, you would need to search down and change other
> entities, or lose the connection.

It's only bad if you can't write correct code.

You can maintain a core of relational entities that form your editable
database, and a published database that all the public read pages load
from. Yes, data redundancy. Denormalisation.

You can't scale without it.

The simple truth is that normalisation only makes sense when the
read:write ratio is close to 1:1. The web ain't nothing like that. Not
by a very long shot.

Ben the Indefatigable

unread,
Apr 15, 2008, 6:47:20 PM4/15/08
to Google App Engine
>that data may be on a completely differant server, in a different tablet, giving you a high cost to get that data out.

Being on a different server is not costly in this architecture, it is
only that your program will have to initiate additional grabs for each
result row that is costly. Since an RDBMS join can decide whether it
needs to pull the relevant records for non-indexed columns *during*
the query it gains efficiencies there. But I doubt this is a big deal
here, because the main purpose of the query is narrowing down the
information before pulling it for display. With the reference to the
entities in question, you are doing the quickest operation the system
is built for and the beauty is it doesn't matter which server it is
on.

Ben the Indefatigable

unread,
Apr 15, 2008, 11:10:38 PM4/15/08
to Google App Engine
Michael and Brett, that was well explained. Thanks. I see you were way
ahead of me.

Brett Morgan

unread,
Apr 16, 2008, 1:26:44 AM4/16/08
to google-a...@googlegroups.com
Heh. It's an expected part of the learning process dude. All good. =)
Reply all
Reply to author
Forward
0 new messages