Watch out for this subtle performance killer

dburns

Nov 21, 2009, 9:41:46 AM
to Google App Engine
I thought I'd share this, since I'm sure there are others that have
fallen into the same trap using this very common pattern (in this
sample, Pix derives from db.Model; get_pics is called on every page
load):

    def get_pics(self):
        pics = memcache.get("pics")
        if pics is None:
            pics = Pix.gql("LIMIT 100")
            memcache.add("pics", pics, 300)  # Good for 5 minutes
        return pics

See the bug? Here, memcache is actually HURTING performance since the
overhead of memcache is there but it saves nothing at all. The query
is still executed on every page load when the calling code iterates
through the result.

http://code.google.com/appengine/docs/python/datastore/queryclass.html#Introduction
mentions this by saying "creating a new iterator from the Query object
will re-execute the query", but it doesn't highlight this pitfall.
The issue here is that entities are not fetched on the Pix.gql line.
Instead, that simply returns a Query object. The results are actually
fetched when the calling code begins to iterate (in Python-speak, the
__iter__() method on the Query is what actually fetches entities).
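
To make the laziness concrete without needing the SDK, here is a minimal sketch. LazyQuery is a hypothetical stand-in I made up for illustration (it is not db.Query or GqlQuery), and the class-level counter simulates datastore round trips. It only assumes the behaviour the docs describe: building the query fetches nothing, and each new iterator runs the query again.

```python
class LazyQuery(object):
    """Hypothetical stand-in for a lazy query object (not the App Engine API)."""

    executions = 0  # simulated count of datastore round trips

    def __init__(self, limit):
        # Building the query only records what to fetch; no data is touched.
        self.limit = limit

    def __iter__(self):
        # Every new iterator re-executes the query.
        LazyQuery.executions += 1
        return iter(["pic%d" % i for i in range(self.limit)])


query = LazyQuery(3)
runs_after_build = LazyQuery.executions   # nothing has been fetched yet

first = [p for p in query]                # first loop executes the query
second = [p for p in query]               # second loop executes it again
runs_after_two_loops = LazyQuery.executions
```

Both loops see the same entities, but the counter goes up once per loop, which is exactly what happens on every page load when the cached value is the query rather than its results.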

To fix this, you'd change the gql line to:

    pics = list(Pix.gql("LIMIT 100"))

Putting a list() around the Pix.gql forces the query to happen at that
moment. Then the list of entities is stored in memcache, not the
Query object itself.
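
Here is a rough way to see the difference between the two versions outside App Engine. Everything in it is an assumption for illustration: FakeQuery and FakeMemcache are made-up stand-ins, copy.deepcopy stands in for the serialization round trip a real cache performs, and the expiry argument is accepted but ignored. The materialize flag switches between caching the query (the bug) and caching list(query) (the fix).

```python
import copy

stats = {"datastore_runs": 0}  # simulated count of datastore round trips


class FakeQuery(object):
    """Hypothetical lazy query: it only runs when iterated."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        stats["datastore_runs"] += 1
        return iter(["pic%d" % i for i in range(self.limit)])


class FakeMemcache(object):
    """Dict-backed stand-in; deepcopy mimics storing a serialized copy."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        value = self._store.get(key)
        return None if value is None else copy.deepcopy(value)

    def add(self, key, value, time=0):
        if key in self._store:
            return False  # add() does not overwrite an existing key
        self._store[key] = copy.deepcopy(value)
        return True


def get_pics(cache, materialize):
    pics = cache.get("pics")
    if pics is None:
        query = FakeQuery(100)
        # The fix: force the query to run now and cache the entities.
        pics = list(query) if materialize else query
        cache.add("pics", pics, 300)
    return pics


def count_runs(materialize, page_loads=5):
    """Simulate several page loads and return how often the query ran."""
    stats["datastore_runs"] = 0
    cache = FakeMemcache()
    for _ in range(page_loads):
        for _pic in get_pics(cache, materialize):
            pass  # the page iterates over the result
    return stats["datastore_runs"]


buggy_runs = count_runs(materialize=False)
fixed_runs = count_runs(materialize=True)
```

With the query object cached, the simulated datastore is hit on every one of the five page loads. With the list cached, it is hit only on the first.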

I'm not sure if this applies to the Java API too, but it's worth a
heads-up.

Comments welcome...

Sharp-Developer.Net

Nov 21, 2009, 2:03:31 PM
to Google App Engine
pics = Pix.gql("something").fetch(100)


Ikai L (Google)

Nov 23, 2009, 2:26:13 PM
to google-a...@googlegroups.com
Yes, the Query object is lazy. This is so that it's possible to allow chaining of filters. It's actually a pretty powerful abstraction, but you do need to realize that the Query isn't executed until the last possible moment. When you store a Query object into Memcache, it hasn't executed yet.


--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=.





--
Ikai Lan
Developer Programs Engineer, Google App Engine

dreadjr

Nov 23, 2009, 10:42:04 PM
to Google App Engine
So what is the best practice if something is added to the data model
after you have already cached the results? Just delete the memcache key
and reload?


Ikai L (Google)

Nov 24, 2009, 1:12:31 PM
to google-a...@googlegroups.com
When you need to expire a cached query, you can either:

1. Delete the cache entry
2. Regenerate the cache entry with updated data
3. Use a key scheme that lets you lazily expire that model, e.g. setting your key to pix:id:version. The challenge with this scheme is knowing what the newest version is.
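
The third option can be sketched roughly as below. This is one possible arrangement, not a prescribed one: a plain dict stands in for memcache, and the helper names (versioned_key, get_cached, invalidate) and the idea of keeping the current version in the cache under "<name>:version" are assumptions made for illustration. Bumping the version leaves the old entry unreachable, so it simply ages out.

```python
def versioned_key(cache, name):
    # Look up the current version (0 if none recorded) and build the data key.
    version = cache.get("%s:version" % name, 0)
    return "%s:%d" % (name, version)


def get_cached(cache, name, load):
    # Return the entry for the current version, loading it on a miss.
    key = versioned_key(cache, name)
    if key not in cache:
        cache[key] = load()
    return cache[key]


def invalidate(cache, name):
    # Lazy expiry: bump the version instead of deleting the old entry.
    version_key = "%s:version" % name
    cache[version_key] = cache.get(version_key, 0) + 1


cache = {}                  # dict standing in for memcache (assumption)
pics_db = ["a", "b"]        # pretend datastore contents

first = get_cached(cache, "pix", lambda: list(pics_db))
pics_db.append("c")         # data changes behind the cache
stale = get_cached(cache, "pix", lambda: list(pics_db))
invalidate(cache, "pix")    # writer bumps the version
fresh = get_cached(cache, "pix", lambda: list(pics_db))
```

Against the real service the version counter could itself be evicted, so this sketch sidesteps rather than solves the "knowing the newest version" problem; keeping the counter somewhere durable would be one way around that.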


