Datastore query limitations

131 views
Skip to first unread message

Peter

unread,
Sep 6, 2008, 9:03:57 AM9/6/08
to Google App Engine
Hi folks,
I'm new to google app engine so I'm probably missing something. I'm
trying to write an app that lets people do a search by longitude and
latitude. Basically the user will pick a point and I want to return
all results within a few km of that point.

On a standard db I'd just do ' WHERE lat between 2 and 4 AND lng
between 2 and 4' and the job is done. The datastore doesn't seem to
support this. Only one inequality operator is allowed. How should I
approach this problem the 'google' way?

Cheers,
Pete

baron

unread,
Sep 6, 2008, 4:27:55 PM9/6/08
to Google App Engine
You can't use inequality operators on two fields as in SQL. The data
store works in a much different way. I highly recommend the 2 data
store videos so you can see how it is all put together on the backend
which will help you model your applications. That being said, one way
to do bounding box location queries within the limitations of the data
store is to use geohashes (http://pypi.python.org/pypi/Geohash/
1.0rc1). You can then do 2 inequality checks on just ONE field which
is perfectly legal. ex: 'WHERE gHash > boundBoxGHashLowerLeftPoint
AND gHash < boundBoxGHashUpperRightPoint'

Peter

unread,
Sep 6, 2008, 7:24:52 PM9/6/08
to Google App Engine
Cheers for that Baron. Geohash does look interesting. So a geo hash
value for point A will always be less than the geo hash value of point
B if point B has a smaller latitude or a smaller longitude? That
sounds like the goods for me. I'll do some testing, and see if I can
track down those datastore vids as well.

Peter

unread,
Sep 7, 2008, 6:25:56 AM9/7/08
to Google App Engine
Thinking further on my application, I also want to filter my search
results by date. How should I go about writing something that would
let me query on both location and date at the same time?

Should I be trying to jam the date into the same field as well or
something?

Cheers,
Pete

Nash

unread,
Sep 7, 2008, 7:42:53 AM9/7/08
to Google App Engine
This is the part of the appengine that starts to get frustrating, the
more criteria you will want to add, the more impossible you will find
it to be.
If all you are looking for is to find data for a particular date then
I suggest adding three fields to your entity: year, month, day. If you
want to get results for a particular day, just provide values for
these attributes. By breaking the date unto three parts, you can also
get results by month, by year etc. But if you have want to pull data
between two points and between two dates, then good luck!

David Symonds

unread,
Sep 7, 2008, 10:37:21 AM9/7/08
to google-a...@googlegroups.com
On Sun, Sep 7, 2008 at 9:42 PM, Nash <nasr...@gmail.com> wrote:
>
> This is the part of the appengine that starts to get frustrating, the
> more criteria you will want to add, the more impossible you will find
> it to be.
> If all you are looking for is to find data for a particular date then
> I suggest adding three fields to your entity: year, month, day. If you
> want to get results for a particular day, just provide values for
> these attributes. By breaking the date unto three parts, you can also
> get results by month, by year etc. But if you have want to pull data
> between two points and between two dates, then good luck!

In that case, you would use multiple properties: one for the whole
date, and then split it out Y/M/D. Remember: BigTable is designed for
lots of redundant data that makes it really easy and fast to index and
find what you want. Don't be afraid to drop in redundant properties.


Dave.

Peter

unread,
Sep 7, 2008, 2:53:00 PM9/7/08
to Google App Engine
Hi,
Redundant data is fine by me. I still don't see how I can search on
both location and date though.

There must be a way...

Cheers,
Peter

On 7 Sep, 15:37, "David Symonds" <dsymo...@gmail.com> wrote:

iceanfire

unread,
Sep 7, 2008, 3:21:07 PM9/7/08
to Google App Engine
That's not really true. In my experience, the more redundant data I
have, the more likely I am to go over the CPU limit for accessing a
datastore with lots of rows.

On Sep 7, 9:37 am, "David Symonds" <dsymo...@gmail.com> wrote:

Peter

unread,
Sep 7, 2008, 4:18:18 PM9/7/08
to Google App Engine
Seems like the datastore is a real sticking point. People saying it
flat out can't do some stuff and that it's bad at doing the stuff it
should be good at. Are we all ignorant? Is google app engine just
not ready yet?

David Symonds

unread,
Sep 7, 2008, 5:48:52 PM9/7/08
to google-a...@googlegroups.com
On Mon, Sep 8, 2008 at 5:21 AM, iceanfire <icea...@gmail.com> wrote:

> That's not really true. In my experience, the more redundant data I
> have, the more likely I am to go over the CPU limit for accessing a
> datastore with lots of rows.

So there's a cost in decoding a row, but if you typically only load a
small number then you can get the whole set indexed properly if you
have redundant data. It's a tradeoff.


Dave.

Peter

unread,
Sep 8, 2008, 3:31:27 PM9/8/08
to Google App Engine
I'm not at the point of caring about cost. As far as I can see it's
just flat out impossible to write a standard put data in, search on
various fields and return results sort of application.

So it seems to me that of course anything you write that uses
datatable will scale well, because it's not possible to write an
application of even moderate complexity....

On Sep 7, 10:48 pm, "David Symonds" <dsymo...@gmail.com> wrote:

Nash

unread,
Sep 8, 2008, 4:09:32 PM9/8/08
to Google App Engine
The primary workaround is that you architecture your data in some way
that makes the query compatible with the way appengine stores and
indexes data.

Other than that, all workarounds involve re-inventing a DB at the app
layer. As an example, in your case; you can have another Entity; which
has a date and a reference property to the main entity with the geo
information. Query both entities and get two sets, then perform an in-
memory intersection. You have your result set.

Of course, this means exhuming both data sets for a complete result.
Given that you have 2 seconds before your process dies; good luck!

It's one of those issues where you spend a lot of times figuring out
what you did wrong; the fact is, the functionality just does not
exist. You have to reinvent the wheel and reinvent it very poorly.
I've studied the data store enough to know why it works like this but
it doesn't help with Real Life Applications. It would be swell if a
Googler would take a complex case, and run through the a design
process of how to handle a query.

Good luck.

Pierre

unread,
Sep 8, 2008, 4:50:52 PM9/8/08
to Google App Engine
Hi Peter,

"Geohash does look interesting. So a geo hash
value for point A will always be less than the geo hash value of point
B if point B has a smaller latitude or a smaller longitude?"

That sounded good to me.... Unfortunately (I may be wrong) geohash
don't give you accurate results when asking geohash(SouthWest corner
point) < geohash(point) < geohash(NorthEast corner point).
For sure, it'll give you points inside a bounding box but this
bounding box may generally be much more wider than the one accurately
set to LAT/LNG of SW corner and NE corner.
IMHO best would be to use some kind of "real" spatial DB.

Regards,

Pierre

On Sep 7, 1:24 am, Peter <pe...@thinksmarthosting.com> wrote:
> Cheers for that Baron. Geohashdoes look interesting. So a geo hash

Peter

unread,
Sep 8, 2008, 6:47:22 PM9/8/08
to Google App Engine
Yeah,
My search for a solution led me to http://catherinedevlin.blogspot.com/2008/09/bigtable-blues.html.
Apparently a search on just 5000 rows is not possible.

I think I'll shelve the idea till someone (from google maybe....)
sticks up some sort of guideline/tutorial on how we are meant to
approach this common problem.

Cheers,
Pete

ready...@gmail.com

unread,
Sep 9, 2008, 11:30:46 AM9/9/08
to Google App Engine
Dave,

Thank you for this post. Your description of how to handle date
ranges led me to a solution on the map lat/long issue that has gotten
so much discussion in various threads. I've posted the following in a
couple of other threads as well, since it seems to be a real issue.
But the solution is redundant properties, as you indicate. Basically,
the idea is finding a way to tokenize a region so that an infinite
number of points becomes a block...

What I do is that I break down the grid into sub-degree squares by
truncating after the first decimal point, and then when I save
something that I need to find on the map later, I save metadata with
that point indicating the surrounding grid squares.

So if I have a point with long -122.123123123 and lat 35.56565, that's
in a grid square called -122.1x35.5, and it's surrounded as follows:

[-122.2x35.6][-122.1x35.6][-122.0x35.6]
[-122.2x35.5][-122.1x35.5][-122.0x35.5]
[-122.2x35.4][-122.1x35.4][-122.0x35.4]

Those are all represented in my object as a list
(db.StringListProperty, so you have to do the right permutations to
make them into strings), and because of the way lists work, if a point
that you're searching on is in any of the grid squares associated with
a saved point, that saved point will come up.

To wit, if you have saved that above point, and someone comes in
searching on an address that corresponds to:

LONG -122.0857
LAT 35.69999

that corresponds to grid square '-122.0x35.6', which is in your upper
right hand corner. Thus, if your code is something like...

square = '-122.0x35.6' # based on whatever normalizing function
you use
points = (SELECT * FROM Locations WHERE gridList=:1", square)

...you'll find that the original point we saved above will return. Of
course, this is a simple example. Each grid square can be any size
and shape, and you can cluster as many as you like in whatever shape
you like. For me, a 3x3 grid of 0.1 degree squares is good enough.
0.1 degrees is approximately 7 miles in the north/south direction, and
up to that big in the east-west dimension, at least at the equator.
It's a blunt tool, but once I've got a reasonable set of map data
back, then I can do further operations to throw out what I don't need.

It's all about metadata. Don't think in terms of inequalities and
boundary conditions, think in terms of inclusive ranges. With a
little creativity, you don't need inequalities, at least for this
problem.

Best,

Ben



On Sep 7, 7:37 am, "David Symonds" <dsymo...@gmail.com> wrote:

Peter

unread,
Sep 9, 2008, 2:19:12 PM9/9/08
to Google App Engine
Hi Ben,
Interesting approach, but only works if you know how you want to
break things up ahead of time. I'd like to be able to pick a point
and say 'give me everything within 1km' or 'give me everything within
100km'.

Cheers,
Pete

On Sep 9, 4:30 pm, "readyass...@gmail.com" <readyass...@gmail.com>
wrote:

ready...@gmail.com

unread,
Sep 9, 2008, 3:42:57 PM9/9/08
to Google App Engine
True, but there's no reason that you couldn't make an approximation
for each range within reason. For my app, which is about finding
garage sales, I assume that nobody wants to go more than 50 miles from
their starting point. I can certainly imagine applications where the
range might be a lot more free-floating, but this might resolve the
problem for a lot of cases.

-B
Reply all
Reply to author
Forward
0 new messages