--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
How wide and variable is the query window? How variable are the
occurrences of car journey data? Once a car starts having journeys,
will it continue having journeys for a (relatively) long time? You
said there would be up to 2,000 cars, are new cars frequently
introduced and old cars removed?
Depending on those questions, perhaps you could build your own
index. Think about an entity that stores a (non-indexed) list of cars
/ car journeys. If you make the key_name for that entity "lower
resolution" than the start time, you store the most recent journey for
each car. If you construct these entities properly, then you could
easily build a list of the keys needed to completely enclose a
particular query window by fetching a small number of these lists. If
you also create higher-level groups, you can also efficiently query
over very large periods of time.
The key names might look something like:
201102151430: 15 Feb 2011 14:30: Journeys that occurred between
14:25 and 14:30.
2011021514: 15 Feb 2011 14: Journeys that occurred from 13:01 to 14:00
20110215: 15 Feb 2011: Journeys that occurred on 15 Feb 2011
etc...
You might use a list of tuples on the entity that stores the car id,
journey id, and start time. That will allow you to efficiently fetch
the cars and the journeys and correctly combine and filter the results
in memory when you need more than one index entity.
You would need to construct the key names based on your query
windows. With some planning, the technique work very well even for
'continuous' query windows that are highly variable in size. I've
used this technique for reporting and am very pleased with it. It is
also relatively easy to implement. You just need a background process
that maintains your indexes, it can be (almost) real-time or an
offline background process.
Robert
(I'm by no means expert, so feel free to discount the following)
In my somewhat limited experience, I've identified only three ways of
making queries faster:
1. denormalizing data (pushing more work into the writes)
2. deferring work (typically via task queues)
3. admitting inconsistency (typically reconciling with cron jobs)
and I try to apply these techniques in that order. Disregarding (3)
(under the assumption that users need accurate results), you've
already explained that the data goes back to the client on the same
request, so (2) is only an option if the client is able to page the
results of queries over smaller lists of Cars, or poll for the
completion of a task. Perhaps this is an option.
As I see it, the crux of your problem is that you are trying to
retrieve the CarJourneys based on a list of Keys. Since App Engine
doesn't provide an equivalent to the SQL IN operator (one that won't
just repeat the query N times on your behalf), no amount of
denormalization is going to help you here, because you're always going
to be stuck making separate queries against each Car key OR plucking
out the small set of Cars you want from all the results.
IF there is something that identifies the Cars in your list (eg. that
they are all associated with the same Company say) then it might be
possible to denormalize that (eg. adding the Company key to every
CarJourney entity would allow you to narrow down the dataset returned
even if some application level filtering was still necessary). Without
that, I can't see a direct way of significantly improving the query.
Tom.