"Ancestor is" performance

349 views
Skip to first unread message

Tony Arkles

unread,
Jan 19, 2009, 10:10:24 AM1/19/09
to Google App Engine
Hi everyone!

In a thread [1], and in the documentation [2], it says that setting
ancestors doesn't affect performance, but I'm not sure that this is
the case.

I set up two queries, one using "WHERE locationKey = :1" (locationKey
is a db.StringProperty), and one using "WHERE ANCESTOR IS :1" (the
ancestor is an entity created based on the locationKey).

The measured "ms-cpu" in the request logs comes out WAY smaller for
the "ANCESTOR IS" query (roughly 3,000ms-cpu vs. 30,000 ms-cpu for
1,000 entities, and roughly this same ratio for smaller queries)

Does anyone have any thoughts on this? Did I mess something up, or is
there something from the documentation, or is it something else
entirely?

Cheers
Tony



[1]
http://groups.google.com/group/google-appengine/browse_thread/thread/29836be55ad7491d/f4039f9f27f31617?lnk=gst&q=ancestor+is#f4039f9f27f31617

Tony Arkles

unread,
Jan 19, 2009, 10:52:47 AM1/19/09
to Google App Engine

Alexander Kojevnikov

unread,
Jan 19, 2009, 6:42:11 PM1/19/09
to Google App Engine
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)
>
> Does anyone have any thoughts on this?  Did I mess something up, or is
> there something from the documentation, or is it something else
> entirely?
>
From your second link:

All entities in a group are stored in the same datastore node.

I guess this means that entities from the same group are stored close
to each other. When your query uses "ANCESTOR IS", the query engine
can take advantage of this. Just a speculation though...

Tony Arkles

unread,
Jan 20, 2009, 11:39:05 AM1/20/09
to Google App Engine


On Jan 19, 5:42 pm, Alexander Kojevnikov <alexan...@kojevnikov.com>
wrote:
> From your second link:
>
>   All entities in a group are stored in the same datastore node.
>
> I guess this means that entities from the same group are stored close
> to each other. When your query uses "ANCESTOR IS", the query engine
> can take advantage of this. Just a speculation though...

Yeah, one idea we tossed around at the office was that it knows
*which* node to go to to fetch the entities. It'd be awesome to get
clarification about this though, it looks like a pretty big
performance boost.

Additionally, the ms-cpu advantage is lost if you add an "ORDER BY"
clause to the GQL.

ryan

unread,
Jan 20, 2009, 4:54:28 PM1/20/09
to Google App Engine
On Jan 19, 7:10 am, Tony Arkles <t.ark...@gmail.com> wrote:
>
> In a thread [1], and in the documentation [2], it says that setting
> ancestors doesn't affect performance, but I'm not sure that this is
> the case.
...
> The measured "ms-cpu" in the request logs comes out WAY smaller for
> the "ANCESTOR IS" query (roughly 3,000ms-cpu vs. 30,000 ms-cpu for
> 1,000 entities, and roughly this same ratio for smaller queries)

the important distinction to make here is between performance and
cost. the thread and doc you cited are correct in that ancestor
queries don't differ in performance from non-ancestor queries.
however, they may differ in their CPU cost, which is what you noticed.

On Jan 19, 3:42 pm, Alexander Kojevnikov <alexan...@kojevnikov.com>
wrote:
>
> All entities in a group are stored in the same datastore node.
>
> I guess this means that entities from the same group are stored close
> to each other. When your query uses "ANCESTOR IS", the query engine
> can take advantage of this. Just a speculation though...

this is definitely correct. the query engine can take advantage of
this in some kinds of queries, but most ancestor queries use a
composite (ie developer-defined) index. those queries don't take
advantage of the entity group locality.

another point i'd reiterate is that regardless of the filters,
ancestor, and sort orders you use, query performance should generally
be the same. a query consists of a single, bounded, bigtable scan over
an index table, along with individual row lookups for each result
entity. details in http://snarfed.org/space/datastore_talk.html . this
means that query performance will depend on the number and size of the
result entities fetched, but not (generally) on the query itself.

the one exception is merge join queries, ie queries that include only
equality filters, and maybe an ancestor, but no sort orders. those
queries are implemented using a somewhat more expensive algorithm.
it's still roughly O(n) in the number of result entities fetched, but
it has a higher constant factor. this is described in the slides
linked above, and has been discussed in more detail in other threads
on this group.

Tony Arkles

unread,
Jan 21, 2009, 10:09:56 AM1/21/09
to Google App Engine
Thanks for the excellent reply Ryan!



On Jan 20, 3:54 pm, ryan <ryanb+appeng...@google.com> wrote:
> entity. details inhttp://snarfed.org/space/datastore_talk.html. this
Reply all
Reply to author
Forward
0 new messages