A GQL query cannot perform a SQL-like "join" query, how should I work around this?


Rif

Apr 8, 2008, 6:24:04 PM
to Google App Engine
Hi Google,

Per your doc "A GQL query cannot perform a SQL-like "join" query" how
should I design around this? Do you have a doc that goes into that?

Regards

RK

redsox2005

Apr 9, 2008, 3:23:05 PM
to Google App Engine
Great question. I've got a fairly simple set of tables in a
relational database, and I'd like to build a sample application with a
basic datastore and give this Google App Engine and datastore a nice
little test run. But, looking through the API reference, there is no
mention of GQL supporting INNER JOIN.

I can see doing some iterative GQL queries to get a collection of
objects I need. But, before I do that, is there something I've
perhaps missed? Are GQL INNER JOIN queries simply not at
our disposal... or perhaps an undocumented feature of some sort?


Bruno Patini Furtado

Apr 9, 2008, 3:30:56 PM
to google-a...@googlegroups.com
According to the videos released so far, there are no joins available in the query language.
--
"I am not young enough to know everything" --Oscar Wilde.
"I know nothing except the fact of my ignorance." --Socrates.
Bruno Patini Furtado
Software Developer
webpage: http://bpfurtado.net
text adventures suite: http://bpfurtado.net/tas
software development blog: http://bpfurtado.livejournal.com

binaryjesus

Apr 9, 2008, 3:31:40 PM
to Google App Engine
[...doing some iterative GQL queries to get a collection of
objects I need....]

That's a pretty bad way of doing it. Getting all those objects and THEN
doing in Python what should have been done in SQL is not at all
impressive.

All those objects are going to eat up memory, but then it's running on
Google's infrastructure, so they won't care *wink*

Ryan Mulligan

Apr 9, 2008, 3:56:10 PM
to google-a...@googlegroups.com
On Wed, Apr 9, 2008 at 2:30 PM, Bruno Patini Furtado <bpfu...@gmail.com> wrote:
> According to the released videos so far, there's no joins available on the query language.

The creator of barbound.appspot.com talked about how his users HAVE_MANY friends, so there's some way to do joins. I don't know if it's efficient or not.

Bruno Patini Furtado

Apr 9, 2008, 3:58:33 PM
to google-a...@googlegroups.com
Joins as we know them from the database world are clearly not supported.

It's probably some manual workaround... but still interesting to see :)

Digital Logic

Apr 9, 2008, 4:21:01 PM
to Google App Engine
A lot of people are going to be looking for this (myself included).
Is anyone else interested in implementing a version of the Python DB-API
that translates SQL queries into GQL commands? It would make it A
LOT easier to port existing Python web applications.

-Mark
Paul Jobs

Apr 9, 2008, 4:23:04 PM
to google-a...@googlegroups.com
Is the 500 limit for the DB? I think it is more for the code, am I right?

Because if it is for the DB, there's no point in scaling our app, since we
can't store a million user rows.

Ryan Mulligan

Apr 9, 2008, 4:26:11 PM
to google-a...@googlegroups.com
I believe that if you are scaling your application you will be paying money to Google. Some of that money will be for increasing your storage limits.

redsox2005

Apr 9, 2008, 4:38:31 PM
to Google App Engine
I agree, certainly not the optimal way of doing this. But, it looks
like the only option for now.

A lot of "lists", sifted and spliced with Python logic for sure...


On Apr 9, 3:31 pm, binaryjesus <coolman.gu...@gmail.com> wrote:
> [...doing some interative GQL queries to get a collection of
> objects I need....]
>
> thats a pretty bad way of doing it. getting all those objects and THEN
> doing in python what should have been done in sql it not at all
> impressive.
>
> all those objects r going to eat up memory bet then its runnin on
> google's infrastructure they wont care *wink*
>
> On Apr 10, 12:23 am, redsox2005 <mganley2...@gmail.com> wrote:
>
>
>
> > Great question.  I've got a fairly simple set of tables in a
> > relational database, and I'd like to build a sample application with a
> > basic datastore and give this Google App Engine and datastore a nice
> > little test run.  But, looking through the API reference, there is no
> > mention of the GQL query using INNERJOIN.
>
> > I can see doing some interative GQL queries to get a collection of
> > objects I need.  But, before I do that, is there something I've
> > perhaps missed?   Are GQL INNERJOINqueries something simply not at
> > our disposal...perhaps an undocumented feature of some sort?
>
> > On Apr 8, 6:24 pm, Rif <rif.kia...@gmail.com> wrote:
>
> > > Hi Google,
>
> > > Per your doc "AGQLquery cannot perform a SQL-like "join" query" how
> > > should I design around this? Do you have a doc that goes into that?
>
> > > Regards
>
> > > RK- Hide quoted text -
>
> - Show quoted text -

Lee O

Apr 9, 2008, 9:52:33 PM
to google-a...@googlegroups.com
Yeah, I believe you can scale in all respects, but any expansion would be paid.
--
Lee Olayvar
http://www.leeolayvar.com

Lee O

Apr 9, 2008, 10:20:58 PM
to google-a...@googlegroups.com
This does seem odd; I'm sure there is a way, or a better way. I doubt they are simply giving us a poor methodology to work with, knowing full well that it will hog their resources due to the code we're forced to write.

paul jobs

Apr 10, 2008, 3:34:43 AM
to google-a...@googlegroups.com
No, but what about until Google discloses the prices?

Jens Scheffler

Apr 10, 2008, 11:35:53 AM
to Google App Engine
I haven't really worked with this yet, but check out these sections
from the documentation:

http://code.google.com/appengine/docs/datastore/entitiesandmodels.html#References
http://code.google.com/appengine/docs/datastore/typesandpropertyclasses.html#ReferenceProperty

I know that those are not really joins, but they allow you to model entity
relationships. I don't know about their capabilities or limitations,
though (that probably also depends on your specific use case).

If anyone has played with this, please post an article :-)

Cameron Singe

Apr 10, 2008, 6:21:04 PM
to Google App Engine
Just throwing it out there, but could you do this?

Select * from Person,Contact
Where Person.ContactID = Contact.ID



Brett Morgan

Apr 10, 2008, 7:04:50 PM
to google-a...@googlegroups.com
Why split person and contact?

Ben the Indefatigable

Apr 10, 2008, 7:15:03 PM
to Google App Engine
> his users HAVE_MANY friends, so there's some way to do joins

This is not necessarily a join; an entity can contain a list of keys
to other entities.
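
Roughly, that pattern might look like this (a sketch only; the model names are made up for illustration):

from google.appengine.ext import db

class Friend(db.Model):
    name = db.StringProperty(required=True)

class Person(db.Model):
    name = db.StringProperty(required=True)
    # A list of keys pointing at Friend entities; no join needed at read time.
    friend_keys = db.ListProperty(db.Key)

def get_friends(person):
    # One batch get for all referenced entities instead of a join.
    return db.get(person.friend_keys)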

BigTable has some join capability so I suspect they will eventually
introduce something.

Brett Morgan

Apr 10, 2008, 7:25:05 PM
to google-a...@googlegroups.com

You are looking for read-time functionality. Everything about how
Google works is trading request time for disk space. Push your effort
into pre-computing things at write time and you will be going with the
grain of BigTable.

Lee O

Apr 10, 2008, 8:51:22 PM
to google-a...@googlegroups.com
What exactly do you mean by precomputing the data to go with BigTable's methodology?

In the previous example of:

Select * from Person,Contact
Where Person.ContactID = Contact.ID


What exactly would you do?

Brett Morgan

Apr 10, 2008, 9:07:48 PM
to google-a...@googlegroups.com
On Fri, Apr 11, 2008 at 10:51 AM, Lee O <lee...@gmail.com> wrote:
> What exactly do you mean by precomputing the data to go with BigTables
> methodology?
>
> In the previous example of:
>
> Select * from Person,Contact
> Where Person.ContactID = Contact.ID
>
> What exactly would you do?

Merge the concepts of Person and Contact, for starters.

In the dim dark past when relational databases came to the fore, disk
was expensive. So we made sure to slice and dice things such that
there was no wasted space. Thus, instead of optional fields, you
created separate tables such that the optional fields could be pulled
in using a join.

In this new world of disk space being free, merge these previously
split concepts such that the optional fields are in the main object.
Thus the reason why you keep seeing denormalisation being bandied
about this group as a tactic for dealing with BigTable. Make few,
large entities with optional fields, instead of lots of small
entities.
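
As a rough sketch of what that merge might look like (a hypothetical model, not anyone's real schema):

from google.appengine.ext import db

class Person(db.Model):
    name = db.StringProperty(required=True)
    # What used to live in a separate Contact table is now just optional fields.
    email = db.EmailProperty()
    phone = db.PhoneNumberProperty()
    address = db.PostalAddressProperty()

person = Person(name="Ada", email="ada@example.com")
person.put()
# A single fetch returns the person together with their contact details; no join.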

This is the same lesson we had to learn with RPC. With normal
procedure calls, having lots of small calls with a few parameters made
sense, because stack space needed to be conserved, and the latency of
a local call is almost nothing. With RPC, each individual call is
expensive, both in computational terms at each end for serialisation
and deserialisation and also in raw network latency. Suddenly you had
to change the shape of the functions. Instead of lots of little calls,
you suddenly had a few calls that returned lots of data. Cheaper to
return a copy of the world than make fifty calls to get the small part
of the world you were interested in.

Cameron Singe

Apr 11, 2008, 1:46:55 AM
to Google App Engine
It was just an example; of course they would be merged in a real app.

I swapped this around, which might make things a little clearer

Select * from Person,Contact
Where Contact.PersonID = Person.ID

Instead of traditional SQL:

Select * from Person inner join Contact on Contact.PersonID = Person.ID

Mike Axiak

Apr 11, 2008, 1:53:25 AM
to Google App Engine
On Apr 11, 1:46 am, Cameron Singe <csi...@gmail.com> wrote:
> [...]
> Select * from Person,Contact
> Where Contact.PersonID = Person.ID

This is called an "implicit join". It's still a join, and thus doesn't
fit in the BigTables model.
To fit with this new model, you have to give up normalization for ease
of scalability. This can be tricky in some circumstances, but surely
there's no need for a separate Contact table/model in this case? This
seems like a trivial "denormalization" to fit the BigTables model.

To read more about the BigTables technology, the whitepaper discusses
the theory and practice nicely [1].

1: http://labs.google.com/papers/bigtable-osdi06.pdf

-Mike
http://mike.axiak.net/

Justin

Apr 11, 2008, 9:31:10 AM
to Google App Engine
As Brett said, you have to stop thinking in terms of a relational
database. Denormalize your data, define your schema in your models,
embrace dynamic properties, and set yourself free.
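
For the dynamic-properties part, a minimal sketch using db.Expando (the model and property names here are just illustrative):

from google.appengine.ext import db

class Article(db.Expando):
    # Only the always-present properties are declared up front.
    title = db.StringProperty(required=True)

article = Article(title="Schema-less by default")
# Anything else can be attached on the fly, per entity.
article.rating = 5
article.tags = ['gae', 'datastore']
article.put()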

- Justin

David

Apr 11, 2008, 11:23:24 AM
to Google App Engine


Can't you use some sort of sub-queries? Just pull an initial result
set back from one query and use that as a filter for a second query.

Trad SQL would be: select * from table_a where table_a.field1 in
(select field1 from table_b). True, you aren't getting any data back
from table_b, but it should work for some results. Anyone see why that
wouldn't work? Haven't tried it yet, though.
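
Since GQL has no IN, the closest manual equivalent I can think of is the naive two-step version below (model names invented, and, as noted above, untested):

from google.appengine.ext import db

class TableA(db.Model):
    field1 = db.StringProperty()

class TableB(db.Model):
    field1 = db.StringProperty()

# Step 1: the "inner query" -- collect the candidate values.
allowed = set(b.field1 for b in TableB.all().fetch(1000))

# Step 2: the "outer query", filtered in Python because GQL has no IN.
matches = [a for a in TableA.all().fetch(1000) if a.field1 in allowed]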

Leap

Apr 23, 2008, 9:59:50 PM
to Google App Engine
Add to that the fact that BigTable is schema-less: if a value is zero,
null, false, etc., then instead of defining that value within the
entity, simply don't, and check for it when the results are iterated.
The only consistent values that need to be in a table are your keys;
otherwise additions and modifications are fair game. So instead of
doing a join and a count with GROUP BY to see how many widgets you have in
each color, simply keep a widget-count field within your table and
update it whenever inventory changes. To see if images are associated
with said widgets, pre-process an image array within the widget
table; if no images exist, don't bother defining them. This is actually
a great prospect because, I don't know about anyone else, but wading
through pages of SQL with thousands of joins is not my idea of a good
time.
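
A rough sketch of that write-time counter idea (names invented; using a transaction so concurrent updates don't clobber each other):

from google.appengine.ext import db

class ColorCount(db.Model):
    # key_name is the color; the count is maintained at write time.
    count = db.IntegerProperty(default=0)

def add_widget(color):
    def txn():
        counter = ColorCount.get_by_key_name(color)
        if counter is None:
            counter = ColorCount(key_name=color)
        counter.count += 1
        counter.put()
    db.run_in_transaction(txn)

# "How many blue widgets?" is now a single get, no join or GROUP BY.
blue_count = ColorCount.get_by_key_name('blue')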

What we are missing for this type of computing is the ability to
schedule tasks, because pre-processing is often not desirable on
update. Actually, what I am missing is a Google App Engine account so
I can try these things out properly; the SDK is great and all.

- Ryan

Edoardo Marcora

Apr 23, 2008, 11:58:38 PM
to Google App Engine
When you define a property of type ReferenceProperty (you can think of
it as a foreign key, or a BELONGS_TO association in RoR parlance) in
your model, you automagically get "join-like" behavior (or a HAS_MANY
association in RoR parlance) on the associated model.

For example:

from google.appengine.ext import db

class Book(db.Model):
    title = db.StringProperty(required=True)
    isbn = db.StringProperty()

class Chapter(db.Model):
    title = db.StringProperty(required=True)
    book = db.ReferenceProperty(Book, required=True, collection_name='chapters')

gae_book = Book(title="The ultimate guide to Google App Engine")
gae_book.put()

intro_chapter = Chapter(title="A short introduction to GAE", book=gae_book)
intro_chapter.put()

gae_book.chapters  # => [intro_chapter, ...]

This should cover a lot of use cases where SQL joins are required with
SQL dbs. For example, you can create MANY_TO_MANY relationships by
defining a "join" model that holds the references to the associated
models.
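
A minimal sketch of such a "join" model (Book and Tag are made-up examples):

from google.appengine.ext import db

class Book(db.Model):
    title = db.StringProperty(required=True)

class Tag(db.Model):
    name = db.StringProperty(required=True)

class BookTag(db.Model):
    # The "join" entity: one instance per (book, tag) pair.
    book = db.ReferenceProperty(Book, collection_name='tag_links')
    tag = db.ReferenceProperty(Tag, collection_name='book_links')

def tags_for(book):
    # One query for the link entities, then a dereference per tag -- no SQL join.
    return [link.tag for link in book.tag_links]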

Just my $.02

Edoardo "Dado" Marcora


On Apr 9, 12:56 pm, "Ryan Mulligan" <r...@ryantm.com> wrote:
> On Wed, Apr 9, 2008 at 2:30 PM, Bruno Patini Furtado <bpfurt...@gmail.com>

Edoardo Marcora

Apr 24, 2008, 12:00:54 AM
to Google App Engine
There is no IN operator in GQL (I wish it was there!!!).

Dado

Brett Morgan

Apr 24, 2008, 12:05:59 AM
to google-a...@googlegroups.com
I'm curious as to what design problems you are facing with the lack of
the IN operator. I'd love to help you re-arrange your solution. =)

Edoardo Marcora

Apr 24, 2008, 1:26:15 AM
to Google App Engine
Well, I need to fetch a subset of "articles" given an array of unique
identifiers. I am now doing this by using the unique identifier as a
key name and passing the array into Model.get_by_key_name([array of
key names])... it is a bit of a hack, but it works (it actually also
helps "solve" the lack of a "unique=True" argument in property
constructors). But I am curious as to how you would solve the problem
(without, of course, looping over the array and performing a GQL query
for each identifier).
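
For reference, that key-name trick looks roughly like this (the Article model is just an example):

from google.appengine.ext import db

class Article(db.Model):
    # The article's unique identifier is used as the key_name, so a batch
    # lookup needs no query at all.
    title = db.StringProperty()

def fetch_articles(uuids):
    # One batch round trip; identifiers with no stored article come back as None.
    return Article.get_by_key_name(uuids)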

Dado


Brett Morgan

Apr 24, 2008, 1:49:11 AM
to google-a...@googlegroups.com
Ok, I think I understand what you are doing, but I'm going to have to
make assumptions about why. Feel free to correct me where I goof it.

I guess you are driving a magazine style site with content in
articles, and then summary pages with lists of articles, authors, and
a pull quote? Maybe with the articles dressed up into genres, or
related article groups? Or issues?

So looking at this, I can divide the site into layers, the main
landing page with article teasers, genre pages, individual article
pages, and then possibly comment threads and trackback links per
article.

We actually have a bunch of information that would be good to keep in
a normalised fashion for ease of editing. Authors, articles, genres,
commentors, et al. We also want to cache this information in
denormalised fashion for speed, i.e. pre-rendered to html.

So I'd keep both.

Have the CMS editing side interact with the normalised data, with a
nice big fat "publish" button that takes all this nicely normalised
data and generates all the rendered html for the landing page, the
genre pages, and the individual article pages with cross links to
related pages.

The publish button is going to take a while - it has a bunch of work
to do. So do it piecemeal via AJAX so that you can report progress
back to the user via a progress bar or something.

So, in a bunch of ways, I've completely avoided your question. I think
the technique you are using is probably quite good. I just wouldn't do
it outside of the "publish" phase. I'm centralising work in the cms
part such that the end user using the site sees a faster site, because
it's effectively a flat published site by then.

Does that help?

Gene Conroy-Jones

Apr 24, 2008, 4:02:54 AM
to Google App Engine
This is the only post here which appears to be on the right track.
Because of the shared computing environment, Google has opted not to
allow joins; however, what would be good is an example of a scenario
completed in the RDBMS way and how one might complete the
same scenario in BigTable. Do you have a good example to share?

Regards
G

Edoardo Marcora

Apr 24, 2008, 4:15:34 AM
to Google App Engine
> So, in a bunch of ways, I've completely avoided your question. I think
> the technique you are using is probably quite good. I just wouldn't do
> it outside of the "publish" phase. I'm centralising work in the cms
> part such that the end user using the site sees a faster site, because
> it's effectively a flat published site by then.
>
> Does that help?

Thanx for the lengthy reply, but I am not designing a CMS. Actually,
what I am trying to implement is a JSONP service that acts in the
background to serve data for a Firefox plugin. The information sent by
the service is used by the plugin to "augment" an existing government
website with a bunch of useful features via AJAX. The gov web site is
basically a search site... when the plugin sees the search results
(biomedical articles), it scrapes their UUIDs and sends the service a
list thereof. The service has to be able to fetch the model instances
associated with these UUIDs very rapidly and send the data back to the
plugin. How would you approach this?

Moreover, I filed a ticket regarding the need for "transactional
callbacks" throughout the model instance lifecycle... which would help
tremendously in cascading changes from the normalized models to the
flat records you're talking about... that said, I don't think this
solves my problem.

Dado

Brett Morgan

Apr 24, 2008, 4:34:31 AM
to google-a...@googlegroups.com
On Thu, Apr 24, 2008 at 6:15 PM, Dado <edoardo...@gmail.com> wrote:
>
> > So, in a bunch of ways, I've completely avoided your question. I think
> > the technique you are using is probably quite good. I just wouldn't do
> > it outside of the "publish" phase. I'm centralising work in the cms
> > part such that the end user using the site sees a faster site, because
> > it's effectively a flat published site by then.
> >
> > Does that help?
>
> Thanx for the lenghty reply but I am not designing a CMS. Actually
> what I trying to implement is a jsonp service that acts in the
> background to serve data for a Firefox plugin. The information sent by
> the service is used by the plugin to "augment" an existing government
> website with a bunch of useful features via ajax. The gov web site is
> basically a search site... when the plugin sees the search results
> (biomedical articles) it scrapes their UUIDs and send the service a
> list thereof. The service has to be able to fetch the model instances
> associated with this UUIDs very rapidly and send back the data to the
> plugin. How would you approach this?

How many connections can you run in parallel from the Firefox plugin?

Edoardo Marcora

Apr 24, 2008, 4:51:48 AM
to Google App Engine
I think it caps at 4 max concurrent connections. If I am thinking what
you're thinking, we are going back to the "one query per uuid" issue
at the datastore level. Wouldn't it be great if, instead, we could do
something like Article.gql("WHERE uuid IN :uuids ", uuids=[list of
uuids])???!!!

Dado

"On Apr 24, 1:34 am, "Brett Morgan" <brett.mor...@gmail.com> wrote:

Brett Morgan

Apr 24, 2008, 7:53:59 AM
to google-a...@googlegroups.com
Actually, what I was wondering is if you have any need for server side
intelligence at all.

You have an intelligent client that is parsing the page, and figuring
out a list of unique identifiers. You then want information keyed by
each identifier to augment the page. GreaseMonkey style, I'm guessing.

So, what intelligence do you need server side?

I'd be thinking the most important thing server side is to get the
bits shipped quickly. What I'd do is publish information about each
unique id in its own file, formatted as JSON for ease of use on the
client side, plus a JSON-formatted manifest file that you can use
client side to map from the UUIDs to server-side URLs. Then spread
them out across a bunch of virtual servers, www[1..50].host.com. That
means you can fire as many parallel requests (across the different
virtual hosts) as you need to pull it all down. It looks like you have a
question of byte shipping.

Amusingly enough, I was just reading in jwz's LiveJournal that the
original Netscape code had a magic hostname that did exactly the
above trick to get load balancing. Ahh, nostalgia. Client-side load
balancing. Heh.

But even if you do want to stick with GAE for whatever set of
reasons, I kinda doubt that query performance is going to hurt you.
Where I expect you to hurt is the 500 MB disk space limitation...

Filip

Apr 24, 2008, 8:18:22 AM
to Google App Engine
Has anybody tried this on a reasonable scale? Does it scale?

Filip

Brett Morgan

Apr 24, 2008, 8:43:26 AM
to google-a...@googlegroups.com
On Thu, Apr 24, 2008 at 10:18 PM, Filip <filip.v...@gmail.com> wrote:
>
> Has anybody tried this on a reasonable scale? Does it scale?
>
> Filip
>

Depends what you mean by perform. =)

Each new entity you reference is another instance you are pulling from
DataStore, with the overhead of finding it on the cluster and moving
it across the wire to your appserver.

Filip

Apr 24, 2008, 10:50:25 AM
to Google App Engine
Exactly, so the approach doesn't scale at all.


Edoardo Marcora

Apr 24, 2008, 1:35:25 PM
to Google App Engine
> You have an intelligent client that is parsing the page, and figuring
> out a list of unique identifiers. You then want information keyed by
> each identifier to augment the page. GreaseMonkey style, i'm guessing.

Greasemonkey-style... exactly right! But with data coming cross-domain
from a GAE web app.

> So, what intelligence do you need server side?

Server-side I need, first and foremost, persistence of user-specific
data entered by the user through the plugin about a specific article
on the "augmented" gov site.
The interaction is all ajax-based and thus needs to be quick. Using
data from all users, the server periodically calculates Euclidean
distances between users and uses this information to recommend
articles of interest to users... among other things. Not having
background tasks is another severe limitation of GAE in this respect.

> It looks like you have a question of byte shipping.

Caching the data in static files and using load-balancing client-side
is not an option, since the app is write-intensive... the users
interact with each article and thus with the datastore record
frequently (adding ratings, annotations, assigning tags etc).

> But even if you do want to stick with gae for what ever set of
> reasons, I kinda doubt that query performance is going to hurt you.
> Where I expect you to hurt is the 500meg disk space limitation...

The reason I am playing around with GAE is because of scalability
issues (I assume GAE scales, but I have not seen real-world benchmarking
yet). The govt site contains, as of today, more than 18 million
articles and there will be, hopefully, millions of users (biomedical
scientists and doctors all around the world). Being able to scale is
important. But what I gather from your answers is that the only
solution (besides the one that I've already found using key names) to
do this in GAE is to perform a query for each client-supplied UID in a
for loop... kinda ugly. How difficult would it be to implement an IN
operator in GQL? Ensuring uniqueness of the value of a property across
the datastore is another very much lacking feature that has not been
transferred over from SQL (UNIQUE indices) and Django (the unique=True
property constructor argument).

Dado

Brett Morgan

Apr 24, 2008, 6:31:43 PM
to google-a...@googlegroups.com
>
> > So, what intelligence do you need server side?
>
> Server-side I need, first and foremost, persistence of user-specific
> data entered by the user through the plugin about a specific article
> on the "augmented" gov site.
> The interaction is all ajax-based and thus needs to be quick. Using
> data from all users the server, periodically, calculates euclidean
> distances between users and uses this information to recommend to
> users articles of interest... among other things. Not having
> background tasks is another severe limitation of GAE in this respect.

You have starred issue 6, right? =)

http://code.google.com/p/googleappengine/issues/detail?id=6

> > It looks like you have a question of byte shipping.
>
> Caching the data in static files and using load-balancing client-side
> is not an option, since the app is write-intensive... the users
> interact with each article and thus with the datastore record
> frequently (adding ratings, annotations, assigning tags etc).

I'm guessing you are heading in the direction of having popular-articles
pages, recently-annotated-articles pages, and tag pages?
These ajaxy interactions can be slower because the user isn't
navigating away from the page. So you can update pre-generated HTML
content on these requests.

> > But even if you do want to stick with gae for what ever set of
> > reasons, I kinda doubt that query performance is going to hurt you.
> > Where I expect you to hurt is the 500meg disk space limitation...
>
> The reason I am playing around with GAE is because of scalability
> issue (I assume GAE scales but have not seen real-world benchmarking
> yet). The govt site contains, as of today, more than 18 millions
> articles and there will be, hopefully, millions of users (biomedical
> scientists and doctors all around the world).

Yeah, I understand why you are using GAE now.

> Being able to scale is
> important. But what I gather from your answers is that the only
> solution (beside the one that I've already found using key names) to
> do this in GAE is to perform a query for each client-supplied uid in a
> for loop... kinda ugly. How difficult would it be to implemente a IN
> operator in GQL? Ensuring uniqueness of the value of a property across
> the datastore is another very much lacking feature that has not been
> transferred over from SQL (UNIQUE indices) and Django (unique = True
> property constructor argument).

I'm still figuring out what we can and can't do with the back-end
store. And so I can't answer for the App Engine team on how hard it
would be to implement functionality. You could raise a request on the
issue tracker, though...

Brett Morgan

Apr 24, 2008, 7:58:40 PM
to google-a...@googlegroups.com
Just thinking out loud for a second. You have enough control over the
clients to install a Firefox plugin; can you also install Google Gears
while you are at it? This would buy you the ability to cache
annotations, ratings, et al., client side. This would deal with speed
issues on needing to mash up the query-results pages with the extra
annotations and star ratings. Then you could use GAE as a
communications storehouse that the star ratings and annotations pass
through. You could even run the data-mining code client side in Google
Gears background threads.

Thoughts?

Edoardo Marcora

Apr 24, 2008, 8:08:23 PM
to Google App Engine
> >  Server-side I need, first and foremost, persistence of user-specific
> >  data entered by the user through the plugin about a specific article
> >  on the "augmented" gov site.
> >  The interaction is all ajax-based and thus needs to be quick. Using
> >  data from all users the server, periodically, calculates euclidean
> >  distances between users and uses this information to recommend to
> >  users articles of interest... among other things. Not having
> >  background tasks is another severe limitation of GAE in this respect.
>
> You have stared issue 6, right? =)
>
> http://code.google.com/p/googleappengine/issues/detail?id=6

Yes, I did a while back!!! :)

> I'm still figuring out what we can and can't do with the back end
> store. And I so can't answer for the AppEngine team on how hard it
> would be to implement functionality. You could raise an request on the
> issue tracker tho...

I thought you were on the Google Datastore team ;)

Dado

Brett Morgan

Apr 24, 2008, 8:18:18 PM
to google-a...@googlegroups.com
On Fri, Apr 25, 2008 at 10:08 AM, Dado <edoardo...@gmail.com> wrote:
>
> > I'm still figuring out what we can and can't do with the back end
> > store. And I so can't answer for the AppEngine team on how hard it
> > would be to implement functionality. You could raise an request on the
> > issue tracker tho...
>
> I thought you were on the Google Datastore team ;)

Bwahaha, I wish. All those toys to play with? I'd be in heaven =)

> Dado

Edoardo Marcora

Apr 24, 2008, 8:46:58 PM
to Google App Engine

Brett Morgan

Apr 24, 2008, 8:59:57 PM
to google-a...@googlegroups.com


I starred 223 some time back. I still don't understand 178, but I'm
being slow this morning.

Edoardo Marcora

Apr 24, 2008, 9:36:56 PM
to Google App Engine
178 relates to unique constraints in property values. Something like
UNIQUE indices in SQL.

For example, say you have a User model (not GAE's built-in one) with an
email property that you want to make sure is unique across your user
base (but that can also change from time to time, so that it can't be
embedded in the key_name to ensure uniqueness, since the key_name is
immutable after creation). How would you ensure that? It would be nice
to have a Django-like unique attribute added to property constructors
to ensure uniqueness (e.g., email = db.EmailProperty(unique=True,
required=True)). Since email in the aforementioned example becomes a
unique identifier for user instances, one should also be given dynamically
generated static methods for unique properties such as these:
User.get_by_email('some...@somewhere.com') and
User.get_or_insert_by_email('some...@somewhere.com', **kwattrs).
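
One possible approximation I can think of (a sketch, not a proper solution) is an auxiliary entity whose key_name is the email address, claimed via get_or_insert, which runs in a transaction:

from google.appengine.ext import db

class User(db.Model):
    email = db.EmailProperty(required=True)

class EmailIndex(db.Model):
    # key_name is the email address; the entity's existence "claims" it.
    owner = db.ReferenceProperty(User)

def claim_email(user, email):
    # get_or_insert is transactional, so two users can't both create the
    # entity for the same address.
    entry = EmailIndex.get_or_insert(email, owner=user)
    return entry.owner.key() == user.key()

It still leaves the stale-claim problem when an email changes, but it is about as close to a unique index as the datastore seems to allow.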

A problem still remains open, though: how would one go about ensuring
uniqueness across multiple properties, like SQL composite unique
indices?

Overall, the mechanisms that we are given to ensure data integrity in
the datastore are still a bit lacking in my opinion.

Dado


Ben the Indefatigable

Apr 25, 2008, 8:22:00 AM
to Google App Engine

On Apr 24, 9:36 pm, Dado <edoardo.marc...@gmail.com> wrote:
> 178 relates to unique constraints in property values. Something like
> UNIQUE indices in SQL.
>

I doubt 178 is possible in BigTable; it seems to imply a transaction.

Edoardo Marcora

Apr 25, 2008, 11:32:36 AM
to Google App Engine
I think it would just require an index, but I am a neuroscientist...
not a software engineer!