NDB Caching Question


Richard Arrano

Apr 24, 2012, 1:21:26 AM
to Google App Engine
Hello,
I'm switching from db to ndb and I have a question regarding caching:

In the old db, I would have a class X that contains a reference to a
class Y. The Y instance would be accessed frequently and rarely
change. So when I would query an X and retrieve the Y it points to, I
would store X in memcache with the actual Y instance rather than the
key. If X is invalidated in memcache, then so is the Y instance, but
otherwise I would skip the step of querying Y upon re-retrieving X
from memcache. Is there any way to do this in ndb? Or must I re-query
each Y even if X comes from memcache or the context cache?

Thanks,
Richard

Guido van Rossum

Apr 24, 2012, 3:59:25 PM
to google-a...@googlegroups.com
If you leave the caching to NDB, you probably needn't worry about this much. It's going to be an extra API call to retrieve Y (e.g. y = x.yref.get()) but that will generally be a memcache roundtrip. If you are retrieving a lot of Xes in one query, there's a neat NDB idiom to prefetch all the corresponding Ys in one roundtrip:

xs = MyModel.query(...).fetch()
_ = ndb.get_multi([x.yref for x in xs])

This effectively throws away the ys, but populates them in the context cache. After this, for any x in xs, the call x.yref.get() will use the context cache, which is a Python dict in memory. (Its lifetime is one incoming HTTP request.)
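
For instance, after the get_multi() call above, a loop like the
following (a minimal sketch, assuming yref is declared as an
ndb.KeyProperty on MyModel) is served entirely from that in-memory
dict:

for x in xs:
    y = x.yref.get()  # resolved from the context cache; no memcache or datastore roundtrip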

You can even postpone waiting for the ys, using an async call:

xs = MyModel.query(...).fetch()
_ = ndb.get_multi_async([x.yref for x in xs])

Now the first time you reference some x.yref.get() it will block for the get_multi_async() call to complete, and after that all subsequent x.yref.get() calls will be satisfied from memory (no server roundtrip at all).
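
Sketched out (again assuming yref is an ndb.KeyProperty on MyModel,
and with do_other_work() standing in for whatever else the handler
does), the overlap looks roughly like this:

xs = MyModel.query().fetch()
ndb.get_multi_async([x.yref for x in xs])   # kick off the batched gets without waiting

do_other_work()                             # hypothetical: other handler work runs while the gets are in flight
first_y = xs[0].yref.get()                  # blocks until the async batch has completed
other_ys = [x.yref.get() for x in xs[1:]]   # satisfied from the context cache, no roundtrips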

Kaan Soral

Apr 24, 2012, 4:54:12 PM
to google-a...@googlegroups.com
Your answer was very enjoyable to read, as it provides great insight into ndb. Thanks!

Offtopic: is there any reason why one should store a KeyProperty instead of a StringProperty holding the string key name (assuming all keys are strings; that's how I store things)?
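
For concreteness, the two alternatives look roughly like this
(hypothetical model, assuming the Y keys use string ids):

class X(ndb.Model):
    y_key = ndb.KeyProperty(kind='Y')  # stores the full key
    y_id = ndb.StringProperty()        # stores only the string id; rebuild the key with ndb.Key('Y', x.y_id)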

On topic: Richard, I think you can use a PickleProperty, put the Y inside that pickle, and occasionally refresh its contents (or never, if Y stays the same). This way you can achieve what you were doing with db even more easily with ndb (thanks to its automatic caching).
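
A minimal sketch of that idea (the property and helper names here are
made up):

class Y(ndb.Model):
    name = ndb.StringProperty()

class X(ndb.Model):
    yref = ndb.KeyProperty(kind=Y)
    y_cached = ndb.PickleProperty()   # pickled copy of the Y entity stored inside X

    def get_y(self, refresh=False):   # hypothetical helper
        if refresh or self.y_cached is None:
            self.y_cached = self.yref.get()
            self.put()                # persist the refreshed copy
        return self.y_cached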

Richard Arrano

Apr 24, 2012, 6:07:35 PM
to Google App Engine
Thank you for the quick and very informative reply. I wasn't even
aware this was possible with NDB. How would those x.yref.get() calls
show up in Appstats? Or would they show up at all if it's just
pulling from memory?

Thank you Kaan as well; I will actually experiment with the
PickleProperty and see what's faster. I like that solution because
the X kind is not one I expect to be heavily cached, so I don't mind
actually caching the pickled instance, as I expect them to be evicted
within a relatively short amount of time.

I also wanted to ask: I saw someone did a speed test with NDB and I
noticed he was pulling 500 entities of 40K, and in the worst-case 0%
cache hit scenario it took something like 8-10 seconds. I was
actually planning to have a piece of my application regularly query
and cache ~2500 entities (out of 2500) and sort on them to avoid a
huge number of indices (and a NOT IN filter that would really slow
things down). Is this feasible, or would you expect his results to
scale, i.e. 500 entities with 0% cache hits * 5 ~= 40-50s in my usage
scenario? Or was there something unique to his situation with his
indices and large amount of data? In mine, each entity has about 10
properties with zero indices. If that's the case, I'll probably copy
the entities into a JsonProperty that occasionally gets updated and
simply query/cache that, since I don't expect the 2500 entities to
change very often.

Thanks,
Richard

Guido van Rossum

Apr 25, 2012, 3:14:44 PM
to google-a...@googlegroups.com
On Tuesday, April 24, 2012 3:07:35 PM UTC-7, Richard Arrano wrote:
> Thank you for the quick and very informative reply. I wasn't even
> aware this was possible with NDB. How would those x.yref.get() calls
> show up in Appstats? Or would they show up at all if it's just
> pulling from memory?

If they pull from memory they don't show up in Appstats at all. Otherwise they'll probably look like a memcache Get possibly followed by a datastore Get.
 
> Thank you Kaan as well; I will actually experiment with the
> PickleProperty and see what's faster. I like that solution because
> the X kind is not one I expect to be heavily cached, so I don't mind
> actually caching the pickled instance, as I expect them to be evicted
> within a relatively short amount of time.

If you're considering storing a pickled entity, you should look into LocalStructuredProperty, which is a little bit more efficient (but doesn't store the key).
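
A minimal sketch for comparison (the property names are hypothetical):

class Y(ndb.Model):
    name = ndb.StringProperty()

class X(ndb.Model):
    yref = ndb.KeyProperty(kind=Y)
    y_copy = ndb.LocalStructuredProperty(Y)  # Y serialized inline inside X; unindexed, and Y's own key is not stored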
 
> I also wanted to ask: I saw someone did a speed test with NDB and I
> noticed he was pulling 500 entities of 40K, and in the worst-case 0%
> cache hit scenario it took something like 8-10 seconds. I was
> actually planning to have a piece of my application regularly query
> and cache ~2500 entities (out of 2500) and sort on them to avoid a
> huge number of indices (and a NOT IN filter that would really slow
> things down). Is this feasible, or would you expect his results to
> scale, i.e. 500 entities with 0% cache hits * 5 ~= 40-50s in my usage
> scenario? Or was there something unique to his situation with his
> indices and large amount of data? In mine, each entity has about 10
> properties with zero indices. If that's the case, I'll probably copy
> the entities into a JsonProperty that occasionally gets updated and
> simply query/cache that, since I don't expect the 2500 entities to
> change very often.

There are too many unknown variables here. You're best off benchmarking this yourself... 
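
A rough way to measure it in place (MyKind is a stand-in for your
actual model):

import logging
import time

start = time.time()
entities = MyKind.query().fetch(2500)  # worst case: nothing in memcache or the context cache yet
logging.info('fetched %d entities in %.2fs', len(entities), time.time() - start)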

Alexander Trakhimenok

Apr 25, 2012, 4:17:05 PM
to Google App Engine
Richard, I would advise going with the JSON property. In our project
we use JSON intensively and update it in task queues and backends.
Actually, we have a rule: every page should make just 3-5 DB
requests. In the future we may consider moving from JSON to ProtoBuf,
but not for now.
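
Applied to Richard's case, a sketch of that pattern might look like
this (the kind, property, and function names are all hypothetical,
and it assumes the entity properties are JSON-serializable):

class Snapshot(ndb.Model):
    rows = ndb.JsonProperty()          # list of dicts, one per entity

def rebuild_snapshot():                # run from a task queue or cron job
    rows = [e.to_dict() for e in MyKind.query().fetch(2500)]
    Snapshot(id='main', rows=rows).put()

def top_by_score(n=50):                # sort in memory, return only what the user needs
    snap = Snapshot.get_by_id('main')
    return sorted(snap.rows, key=lambda r: r['score'], reverse=True)[:n]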

Also, we've moved some rarely changed dictionaries (like geo
locations, e.g. all cities in the world) into the Python code. That
pushed us to use F2 instances due to the higher memory demand, but it
resulted in lower latency and almost the same costs. It's cheaper to
upload a new version of the app when needed.
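
The in-code dictionaries are just module-level constants, e.g.
(illustrative only):

# geo_data.py -- deployed with the app; loaded into instance memory at import time
CITIES = {
    'nyc': {'name': 'New York', 'country': 'US'},
    'lon': {'name': 'London', 'country': 'GB'},
}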
--
Alexander Trakhimenok
Dev lead at http://www.myclasses.org/ project

Richard Arrano

Apr 25, 2012, 9:34:03 PM
to Google App Engine
Do you mean that rather than pulling my 2500 entities, I should use a
task to keep the 2500 updated in a single JSON property and then use
it to sort on a desired property as necessary? I was considering
doing this as an alternative. It seemed wasteful in my usage scenario
to pull 2500 entities just to give the user back 50 or so, but doing
it with indexes caused a huge explosion in storage costs. Did you
guys do any experiments to see what was faster in your case?

Thanks,
Richard
