Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Many-to-many JOIN with the Datastore
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  20 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
arbingersys@gmail.com  
View profile  
 More options Apr 29 2008, 11:19 am
From: "arbinger...@gmail.com" <arbinger...@gmail.com>
Date: Tue, 29 Apr 2008 08:19:30 -0700 (PDT)
Local: Tues, Apr 29 2008 11:19 am
Subject: Many-to-many JOIN with the Datastore
It turns out my assumption was right, that there is a need for better
relational to GAE Datastore examples in the docs.

Following on my one-to-many JOIN post --

http://blog.arbingersys.com/2008/04/google-app-engine-one-to-many-joi...

-- I've created a second tutorial that demonstrates how to, in effect,
do a many-to-many JOIN using the Datastore. It's up on my blog:

http://blog.arbingersys.com/

Permalink:

http://blog.arbingersys.com/2008/04/google-app-engine-many-to-many-jo...

You can download the source and run it in your SDK. Hope you find it
useful.

James


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Yagiz Erkan  
View profile  
 More options Apr 29 2008, 11:44 am
From: "Yagiz Erkan" <yagizer...@gmail.com>
Date: Tue, 29 Apr 2008 16:44:09 +0100
Local: Tues, Apr 29 2008 11:44 am
Subject: Re: [google-appengine] Many-to-many JOIN with the Datastore
James,

Thanks for the blog entries!

Regards,

 - Yagiz -


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeremey Barrett  
View profile  
 More options Apr 29 2008, 11:50 am
From: "Jeremey Barrett" <jeremey.barr...@gmail.com>
Date: Tue, 29 Apr 2008 10:50:03 -0500
Local: Tues, Apr 29 2008 11:50 am
Subject: Re: [google-appengine] Many-to-many JOIN with the Datastore
Out of curiosity, why not use a join model?

class Library(db.Model):
    name = db.StringProperty()
    address = db.StringProperty()

    def books(self):
        return (x.book for x in self.book_set)

class Book(db.Model):
    title = db.StringProperty()

    def libraries(self):
        return (x.library for x in self.librarybook_set)

class LibraryBook(db.Model):
    library = db.ReferenceProperty(Library)
    book = db.ReferenceProperty(Book)

Then you can CRUD the LibraryBook and store quantities there, etc.

Regards,
Jeremey.

On Tue, Apr 29, 2008 at 10:19 AM, arbinger...@gmail.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dado  
View profile  
 More options Apr 29 2008, 12:16 pm
From: Dado <edoardo.marc...@gmail.com>
Date: Tue, 29 Apr 2008 09:16:21 -0700 (PDT)
Local: Tues, Apr 29 2008 12:16 pm
Subject: Re: Many-to-many JOIN with the Datastore
I too think using a join model for many-to-many relationship is more
appropriate (see http://groups.google.com/group/google-appengine/msg/d8b3901937c7765d).
Reference property effectively model many-to-one (or belongs_to)
relationships. From the other side (using the automagically generated
collection) you get a one-to-many (or has_many) relationship. Using
the join model you get many-to-many relationships (or
has_and_belongs_to_many). It would be nice to have a one-to-one
relationship also, but that's easy to code over the one-to-many
relationship.

On Apr 29, 8:50 am, "Jeremey Barrett" <jeremey.barr...@gmail.com>
wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
arbingersys@gmail.com  
View profile  
 More options Apr 29 2008, 12:50 pm
From: "arbinger...@gmail.com" <arbinger...@gmail.com>
Date: Tue, 29 Apr 2008 09:50:39 -0700 (PDT)
Local: Tues, Apr 29 2008 12:50 pm
Subject: Re: Many-to-many JOIN with the Datastore
That does look a little better -- at least more analogous to the SQL
counterpart. I'll play around with it maybe and compare it to what I
came up with.

James

On Apr 29, 4:50 pm, "Jeremey Barrett" <jeremey.barr...@gmail.com>
wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben the Indefatigable  
View profile  
 More options Apr 29 2008, 2:06 pm
From: Ben the Indefatigable <bcbry...@gmail.com>
Date: Tue, 29 Apr 2008 11:06:45 -0700 (PDT)
Local: Tues, Apr 29 2008 2:06 pm
Subject: Re: Many-to-many JOIN with the Datastore
Using the reference property has also been called a "virtual join" on
this forum. You can actually do the select on the intermediary entity
without pulling the referenced entities. This is good if you de-
normalize the data that you need for your query out of those
referenced entities into your intermediary entity (such as LibraryBook
above) to avoid pulling the additional entities in your common read
cases. A good discussion of many to many joins in the datastore should
mostly discuss how to avoid them and find scalable alternatives. :)

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
arbingersys@gmail.com  
View profile  
 More options Apr 29 2008, 2:19 pm
From: "arbinger...@gmail.com" <arbinger...@gmail.com>
Date: Tue, 29 Apr 2008 11:19:40 -0700 (PDT)
Local: Tues, Apr 29 2008 2:19 pm
Subject: Re: Many-to-many JOIN with the Datastore
Ben: Can you post a code snippet that might demonstrate the
intermediate query sans referenced entities? I think I understand what
you're saying, but something visual would definitely help.

James

On Apr 29, 7:06 pm, Ben the Indefatigable <bcbry...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
arbingersys@gmail.com  
View profile  
 More options Apr 29 2008, 4:26 pm
From: "arbinger...@gmail.com" <arbinger...@gmail.com>
Date: Tue, 29 Apr 2008 13:26:11 -0700 (PDT)
Local: Tues, Apr 29 2008 4:26 pm
Subject: Re: Many-to-many JOIN with the Datastore
I went ahead and played with your "join model" example, Jeremey, and
it's definitely an improvement over what I had. Obtaining books from
libraries and libraries from books becomes quite intuitive:

    # Books by library
    for book in lib1.books():
        self.response.out.write(book.title + ' ' + book.author + '<br /

>')

    # Libraries by book
    for lib in book1.libraries():
        self.response.out.write(book1.title +
            ' ' + book1.author + '-' + lib.name + '<br />')

I still wanted to demonstrate outputting all data in total (notice
from the above that I'm getting all the books at library 'lib1' and
all the libraries that contain 'book1) and so I did the following,
using Libraries and Books Models as single collection points for each
entity type:

class Libraries(db.Model):
    notes = db.StringProperty()

class Books(db.Model):
    notes = db.StringProperty()

class Library(db.Model):
    name = db.StringProperty()
    address = db.StringProperty()
    city = db.StringProperty()
    libscol = db.ReferenceProperty(Libraries,
collection_name='libscol')

    def books(self):
        return (x.book for x in self.librarybook_set)

class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()
    bookscol = db.ReferenceProperty(Books, collection_name='bookscol')

    def libraries(self):
        return (x.library for x in self.librarybook_set)

class LibraryBook(db.Model):
    library = db.ReferenceProperty(Library)
    book = db.ReferenceProperty(Book)

Then, I was able to output everything from the library perspective and
the book perspective:

    # All libraries, all books
    for lib in libs.libscol.order('-name'):
        for book in lib.books():
            self.response.out.write(
                lib.name + '-' + book.title + ' ' + book.author +
'<br />')

    # All books, all libraries
    for book in books.bookscol.order('author').order('-title'):
        for lib in book.libraries():
            self.response.out.write(
                book.title + ' ' + book.author + '-' + lib.name +
'<br />')

I really just wanted to demonstrate being able to do this. I plan to
post an update to my blog using this code. Is there a better way to do
the last part, which is full, ordered output?

Thanks for all the help. James


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben the Indefatigable  
View profile  
 More options Apr 29 2008, 8:01 pm
From: Ben the Indefatigable <bcbry...@gmail.com>
Date: Tue, 29 Apr 2008 17:01:53 -0700 (PDT)
Local: Tues, Apr 29 2008 8:01 pm
Subject: Re: Many-to-many JOIN with the Datastore
Thanks for the response James. I was talking in terms of scalability.
Since the datastore is not written for this kind of join access on a
page load, it might be made more feasible by denormalizing at least
the fields you will need as you are displaying all the books in a
library (booktitle) and/or all the libraries a book is in
(libraryname):

class LibraryBook(db.Model):
    library = db.ReferenceProperty(Library)
    book = db.ReferenceProperty(Book)
    booktitle = db.StringProperty()
    libraryname = db.StringProperty()

So if your page shows a whole list of librarynames without actually
pulling the library entities, then you could include links for
drilling down into more details about a particular library such as the
address and only at that point would you access the particular library
entity with:

librarybook.library.address

Since until you follow that reference, the entity will not be pulled.
Denormalizing goes against the intuition at first, I know.

On Apr 29, 2:19 pm, "arbinger...@gmail.com" <arbinger...@gmail.com>
wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben the Indefatigable  
View profile  
 More options Apr 29 2008, 9:33 pm
From: Ben the Indefatigable <bcbry...@gmail.com>
Date: Tue, 29 Apr 2008 18:33:53 -0700 (PDT)
Local: Tues, Apr 29 2008 9:33 pm
Subject: Re: Many-to-many JOIN with the Datastore
actually, librarybook.library.address confuses what I was saying. You
would have a key (likely in the URL of the link) in order to go
directly to the library entity on the new request.

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
arbingersys@gmail.com  
View profile  
 More options Apr 30 2008, 10:23 am
From: "arbinger...@gmail.com" <arbinger...@gmail.com>
Date: Wed, 30 Apr 2008 07:23:08 -0700 (PDT)
Local: Wed, Apr 30 2008 10:23 am
Subject: Re: Many-to-many JOIN with the Datastore
On Apr 30, 1:01 am, Ben the Indefatigable <bcbry...@gmail.com> wrote:

> Since the datastore is not written for this kind of join access on a
> page load, it might be made more feasible by denormalizing at least
> the fields you will need as you are displaying all the books in a
> library (booktitle) and/or all the libraries a book is in
> (libraryname):

> class LibraryBook(db.Model):
>     library = db.ReferenceProperty(Library)
>     book = db.ReferenceProperty(Book)
>     booktitle = db.StringProperty()
>     libraryname = db.StringProperty()

> ...

> Denormalizing goes against the intuition at first, I know.

Ben, thanks, that's helpful. The whole idea of "disk and CPU are
cheap, stop worrying about them and scale" is starting to gel, I
think. I do find it amazing how resistant a mind is to change, though.
I look at the code above, and even knowing why denormalization is
useful in GAE, I still want to worry about the wastefulness of having
both librarybook.booktitle and book.title.

James


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brett Morgan  
View profile  
 More options Apr 30 2008, 8:36 pm
From: "Brett Morgan" <brett.mor...@gmail.com>
Date: Thu, 1 May 2008 10:36:33 +1000
Local: Wed, Apr 30 2008 8:36 pm
Subject: Re: [google-appengine] Re: Many-to-many JOIN with the Datastore
On Thu, May 1, 2008 at 12:23 AM, arbinger...@gmail.com

Habits take time to change. We, as an industry, have been focusing on
speed and size just about forever. It's only when you pull back from
the micro world and focus on what we are attempting to achieve that
you realise that you can achieve real world usability doing things
that seem counter-intuitive on first pass.

Intuitive, of course, is another way of saying familiar. =)

> James

--

Brett Morgan http://brett.morgan.googlepages.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
cyj  
View profile  
 More options May 3 2008, 9:27 am
From: cyj <chenyue...@gmail.com>
Date: Sat, 3 May 2008 06:27:48 -0700 (PDT)
Local: Sat, May 3 2008 9:27 am
Subject: Re: Many-to-many JOIN with the Datastore
Hi, I am a newbie to both GAE and python.
I have a question about how to query the m:n model simulated by GAE?
e.g. get a library has book1.

On May 1, 8:36 am, "Brett Morgan" <brett.mor...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
arbingersys@gmail.com  
View profile  
 More options May 8 2008, 12:17 pm
From: "arbinger...@gmail.com" <arbinger...@gmail.com>
Date: Thu, 8 May 2008 09:17:08 -0700 (PDT)
Local: Thurs, May 8 2008 12:17 pm
Subject: Re: Many-to-many JOIN with the Datastore
On May 3, 2:27 pm, cyj <chenyue...@gmail.com> wrote:

> Hi, I am a newbie to both GAE and python.
> I have a question about how to query the m:n model simulated by GAE?
> e.g. get a library has book1.

That's described in this post:

http://blog.arbingersys.com/2008/04/google-app-engine-better-many-to-...

Specifically, I pass

'libs_by_book': book1.libraries(),

to the template, which in turn does:

  <h4>All libraries with book "{{ forbook }}"</h4>
  <table border="1">
      {% for lib in libs_by_book %}
        <tr>
            <td>{{ lib.name }}</td>
            <td>{{ lib.address }}</td>
            <td>{{ lib.city }}</td>
        </tr>
        {% endfor %}
    </table>


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
joh...@easypublisher.com  
View profile  
 More options May 12 2008, 11:59 am
From: joh...@easypublisher.com
Date: Mon, 12 May 2008 17:59:12 +0200
Local: Mon, May 12 2008 11:59 am
Subject: Re: [google-appengine] Re: Many-to-many JOIN with the Datastore
On 5/8/08, arbinger...@gmail.com <arbinger...@gmail.com> wrote:

> On May 3, 2:27 pm, cyj <chenyue...@gmail.com> wrote:
> > Hi, I am a newbie to both GAE and python.
> > I have a question about how to query the m:n model simulated by GAE?
> > e.g. get a library has book1.

> That's described in this post:

> http://blog.arbingersys.com/2008/04/google-app-engine-better-many-to-...

Hi,
Here is my modification of James example.py.
I've removed cleaned it up a bit for clarity (removing the Libraries
and Books datamodel).
(this is just the example.py replacement)

   http://cluebin.appspot.com/pasted/25

Regards,
/Johan

--
Johan Carlsson
Colliberty Easy Publisher
http://www.easypublisher.com


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Andrew Fong  
View profile  
 More options May 14 2008, 7:15 pm
From: Andrew Fong <FongAnd...@gmail.com>
Date: Wed, 14 May 2008 16:15:27 -0700 (PDT)
Local: Wed, May 14 2008 7:15 pm
Subject: Re: Many-to-many JOIN with the Datastore
I still have issues with denormalization. It's not just a space issue.
The reason normalized databases don't repeat records is to avoid some
confusion down the road. For example, what happens if, in the
LibraryBook example, the Library changes its name? In a normalized
database, you would only have to update one record. Under a
denormalized database, would that entail finding every LibraryBook
that referenced that particular Library and updating it?

It so, it seems that the more denormalized a database is, the more
expensive updates are (even if the reads are fast).

Furthermore, it would require anyone trying to update an entity to
understand the structure of all the entities that referenced this
entity. In the LibraryBook example, updating the name attribute for
Library requires knowing that there is a libraryname attribute in
LibraryBook. Not a big deal for one model, but as the number of models
increases, it's going to get difficult keeping track of which entities
referencing Library have a libraryname attribute, which have a
libraryaddress attribute, and which ones might not have any such
attribute at all -- especially on a multi-person project.

Am I missing something?

-- Andrew

On Apr 30, 7:23 am, "arbinger...@gmail.com" <arbinger...@gmail.com>
wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brett Morgan  
View profile  
 More options May 14 2008, 8:01 pm
From: "Brett Morgan" <brett.mor...@gmail.com>
Date: Thu, 15 May 2008 10:01:09 +1000
Local: Wed, May 14 2008 8:01 pm
Subject: Re: [google-appengine] Re: Many-to-many JOIN with the Datastore

Yes, all of the above concerns are valid. Yes, denormalisation hurts,
both on disk space, and on correctness.

The reason we are doing this is to achieve scale. At scale you wind up
doing a bunch of things that seem wrong, but that are required by the
numbers we are running. Go watch the EBay talks. Or read the posts
about how many database instances FaceBook is running.

The simple truth is, what we learned about in uni was great for the
business automation apps of small to medium enterprise applications,
where the load was predictable, and there was money enough to buy the
server required to handle the load of 50 people doing data entry into
an accounts or business planning and control app.

On the web, we are in a different world. If you get successful, you'll
get slashdotted. Well, these days it's probably more correct to call
it reddited. Or boing boinged. And suddenly you have to go from 4
servers to fourty, to four hundred, to four thousand. Read up the
story about the iLike guys. They wrote an app that went viral on FB.
And they melted. Needed servers. Yesterday.

What GAE gives you is the ability to handle this, easily. All the
things that GAE makes you do is done with this end game in mind. You
have to write your code such that it can run on 400 app servers spread
across the globe, on google's infrastructure. You have to deal with
the fact that the transaction engine is distributed. You have to deal
with the fact that queries are slow, and you should really be
publishing entities that match one to one with your popular pages. And
that you need to hide your updates using ajax. It's better to give the
user a progress bar than a white screen of death, anyways.

If you aren't interested in serving millions of customers, then this
is likely overkill for you. But if you are, then you have to go
through this world view change. And yes it hurts. I'm not going to say
it's easy. It hurt me when I had to go through it back in 2000. It
actually took me about four attempts (aka, webapps that melted
underload) before i got it. But, once you make the leap, and
understand that we are breaking rules for a reason, then you'll
understand where and when to do it. Every choice has costs and
benefits. Understanding when GAE makes sense is part of this journey
of discovery.

And if any of the above doesn't make sense, feel free to come back
with more questions. =)

--

Brett Morgan http://brett.morgan.googlepages.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Andrew Fong  
View profile  
 More options May 15 2008, 4:15 am
From: Andrew Fong <FongAnd...@gmail.com>
Date: Thu, 15 May 2008 01:15:00 -0700 (PDT)
Subject: Re: Many-to-many JOIN with the Datastore
Hmmm, so maybe the proper way to approach the datastore is think of it
as a pseudo-cache. Let's say we start with a more or less normalized
datastore and we do all the joins through a ReferenceProperty -- and
if we notice we're frequently using that reference, we "cache" the the
referenced values in the referencing entity. And we treat updates to
the referenced attribute using the same strategies we treat updates to
any item that's cached -- e.g. wait for the values to propagate via
some background task (speaking of which, how are people doing
background tasks in GAE?), whether that's one that runs periodically
or whenever certain kinds of entities are updated.

It seems to me that a large part of this could be automated though. I
really like how the datastore indices are automatically generated in
the index.yaml file without any action on the developers part. I'm new
to python and GAE but how feasible would it be to write a plugin that
automatically does this sort of "caching"?

-- Andrew

On May 14, 5:01 pm, "Brett Morgan" <brett.mor...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brett Morgan  
View profile  
 More options May 15 2008, 7:43 am
From: "Brett Morgan" <brett.mor...@gmail.com>
Date: Thu, 15 May 2008 21:43:15 +1000
Local: Thurs, May 15 2008 7:43 am
Subject: Re: [google-appengine] Re: Many-to-many JOIN with the Datastore

On Thu, May 15, 2008 at 6:15 PM, Andrew Fong <FongAnd...@gmail.com> wrote:

> Hmmm, so maybe the proper way to approach the datastore is think of it
> as a pseudo-cache. Let's say we start with a more or less normalized
> datastore and we do all the joins through a ReferenceProperty -- and
> if we notice we're frequently using that reference, we "cache" the the
> referenced values in the referencing entity. And we treat updates to
> the referenced attribute using the same strategies we treat updates to
> any item that's cached -- e.g. wait for the values to propagate via
> some background task (speaking of which, how are people doing
> background tasks in GAE?), whether that's one that runs periodically
> or whenever certain kinds of entities are updated.

Yes, splitting things so that you have a primary data source, and the
published cache entities makes sense. It's a strategy i've suggested
several times in various situations.

As for background tasks, you could always star my issue:
http://code.google.com/p/googleappengine/issues/detail?id=6

> It seems to me that a large part of this could be automated though. I
> really like how the datastore indices are automatically generated in
> the index.yaml file without any action on the developers part. I'm new
> to python and GAE but how feasible would it be to write a plugin that
> automatically does this sort of "caching"?

Probably. I haven't thought that far. =)

--

Brett Morgan http://brett.morgan.googlepages.com/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Filip  
View profile  
 More options May 15 2008, 8:01 am
From: Filip <filip.verhae...@gmail.com>
Date: Thu, 15 May 2008 05:01:51 -0700 (PDT)
Local: Thurs, May 15 2008 8:01 am
Subject: Re: Many-to-many JOIN with the Datastore
Another method is to unwind links into a single Expando model. Often,
you need an record which has a fixed set of references in a list, and
each referenced record has a set of properties. Instead of retrieving
the set of referenced records for the record every time, you could try
to include the referenced records in the parent record (Expando) by
including ref1_prop1, ref1_prop2, etc, where prop1 and prop2 are
replaced by actual property names. That way, you can substantially
reduce the number of searches in the database to build a single
webpage.

Now, I wonder if it would be possible to write a plugin script that
automatically convert those ref1, ref2, etc. back into a list you can
loop through as if there were real references. I'll try that.

Is anybody aware of limitations on the number of fields an Expando
model can have?

Filip.

On 15 mei, 10:15, Andrew Fong <FongAnd...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »