caching and in-transaction queries

Amy

Jan 10, 2012, 4:48:03 AM
to appengine-ndb-discuss
hi,

I'm curious about something I'm seeing with in-transaction object
caching, and want to check that it's the intended behaviour.

More detailed code is in this gist: https://gist.github.com/8bc6bdeb8c4734b54bc9

Suppose we have two models, Game and Participant, where participant
entities are children of a game entity, and there is a method of Game,
participants(), to fetch all participants of a game via an ancestor
query on Participant:

class Game(model.Model):

    def participants(self):
        return Participant.query(
            ancestor=self.key).fetch()

If, from within a transaction, I call game.participants(), modify the
returned participant objects, 'put' them, and then (still in the
transaction) call game.participants() again, the second call does not
return the modified objects.

On the other hand, if in the same transaction I fetch the participant
objects directly by key (after modification and a 'put'), the fetched
objects *do* reflect the modification. So presumably they are being
fetched from the cache by key.
The TestHandler.get() method in the gist
(https://gist.github.com/8bc6bdeb8c4734b54bc9) steps through these tests.

This seems a bit inconsistent, as caching is being used in-transaction
in one case but not the other. However, this might be how it is
intended to work? I know that normally (without caching), queries
made in a transaction do access the pre-transaction state.

--Amy

Guido van Rossum

Jan 10, 2012, 10:28:26 AM
to appengine-...@googlegroups.com, Google App Engine
Let me widen the scope of this question, since I'm not convinced that
this is NDB-specific (though I'm also not convinced that it isn't).
First, can you rerun the tests with NDB's caching turned off? INSIDE
tx(), do

ctx = tasklets.get_context()
ctx.set_cache_policy(False)
ctx.set_memcache_policy(False)

(You also have to add from ndb import tasklets to the top.)

Let me know if that changes the outcome, and how. If it does, it is an
NDB issue; if it doesn't, it is not specific to NDB. If it isn't an
NDB issue, there are all sorts of subtle semantics associated with
transactions that I can't quite enumerate, and it could well be that
ancestor queries inside transactions always query the state at the
start of the transaction.

--Guido

--
--Guido van Rossum (python.org/~guido)

Stephen Lewis

Jan 10, 2012, 1:57:46 PM
to appengine-...@googlegroups.com
[Apologies for the multipost - I wasn't a member of this group when I posted my reply in the App Engine group]

Under the heading "Isolation and Consistency" in



it says "Unlike with most databases, queries and gets inside a datastore transaction do not see the results of previous writes inside that transaction." Unless I've misunderstood the question, I think this is the behaviour you're seeing.

Stephen

Stephen Lewis

Jan 10, 2012, 4:52:55 PM
to appengine-...@googlegroups.com
Now I realise that I just restated what Amy already said at the bottom of her original post. Sorry about the noise - I'll read more carefully in future!

Stephen

Guido van Rossum

Jan 10, 2012, 5:36:35 PM
to appengine-...@googlegroups.com, a...@infosleuth.net
Ah. And you make me realize that I didn't quite get that part either. So, this does sound like it's being caused by the NDB cache. (Still, to be sure, I'd like to hear the outcome of the experiment I proposed earlier.)

--Guido





Amy

Jan 10, 2012, 6:27:49 PM
to Guido van Rossum, appengine-...@googlegroups.com
On Wed, Jan 11, 2012 at 9:36 AM, Guido van Rossum <gu...@google.com> wrote:
> Ah. And you make me realize that I didn't quite get that part either. So,
> this does sound like it's being caused by the NDB cache. (Still, to be sure,
> I'd like to hear the outcome of the experiment I proposed earlier.)

The outcome: Interestingly, with caching off, the direct fetch by key
does *not* reflect the in-txn modifications either (in contrast to the
cache-enabled behaviour, where it did). So the inconsistency does
seem to be cache-related.
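
For readers following along, the two outcomes reported so far can be reproduced with a toy model. This is only a sketch, not NDB code: ToyDatastore and ToyTransaction are hypothetical names, and the model assumes that (a) datastore reads inside a transaction see the state as of the transaction's start, and (b) the in-context cache is consulted for get-by-key but never for queries.

```python
# Toy model only: hypothetical classes, not the real NDB or datastore API.

class ToyDatastore:
    """Holds committed entities, keyed by a path-like string."""
    def __init__(self, entities):
        self.committed = dict(entities)


class ToyTransaction:
    def __init__(self, datastore, use_context_cache=True):
        # Assumption: reads in a transaction see the state at its start.
        self.snapshot = dict(datastore.committed)
        self.use_context_cache = use_context_cache
        self.context_cache = {}
        self.pending = {}

    def put(self, key, value):
        # The write is buffered until commit; the cache (if on) sees it now.
        self.pending[key] = value
        if self.use_context_cache:
            self.context_cache[key] = value

    def get(self, key):
        # Get-by-key checks the in-context cache first, then the snapshot.
        if self.use_context_cache and key in self.context_cache:
            return self.context_cache[key]
        return self.snapshot.get(key)

    def ancestor_query(self, prefix):
        # Queries are answered from the snapshot and bypass the cache.
        return [v for k, v in sorted(self.snapshot.items())
                if k.startswith(prefix)]


store = ToyDatastore({"game1/p1": "waiting", "game1/p2": "waiting"})

cached = ToyTransaction(store, use_context_cache=True)
cached.put("game1/p1", "ready")
cached_get = cached.get("game1/p1")
cached_query = cached.ancestor_query("game1/")

uncached = ToyTransaction(store, use_context_cache=False)
uncached.put("game1/p1", "ready")
uncached_get = uncached.get("game1/p1")
uncached_query = uncached.ancestor_query("game1/")
```

In this model only the cached get-by-key sees the in-transaction write; the ancestor query returns the pre-transaction values whether or not the cache is enabled.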

Guido van Rossum

Jan 10, 2012, 6:35:50 PM
to Amy, appengine-...@googlegroups.com
So, the question is, what do we want to happen here? Maybe the cache should just be turned off? (Looking at the code, memcache is already ignored on get(), but the in-context cache is explicitly kept on.) Let me know and I'll file an issue.

Guido van Rossum

Jan 10, 2012, 6:57:21 PM
to Amy, appengine-...@googlegroups.com
OTOH it would be somewhat sad if the in-context cache was disabled inside transactions, since it can provide some good optimizations (e.g. the prefetch pattern in NDB depends on it). Maybe we just need to warn about this in the docs?

Amy

Jan 10, 2012, 7:26:57 PM
to Guido van Rossum, appengine-...@googlegroups.com
On Wed, Jan 11, 2012 at 10:57 AM, Guido van Rossum <gu...@google.com> wrote:
> OTOH it would be somewhat sad if the in-context cache was disabled inside
> transactions, since it can provide some good optimizations (e.g. the
> prefetch pattern in NDB depends on it). Maybe we just need to warn about
> this in the docs?

Yes, just documenting this makes sense to me too. I think the
distinction would be clear.

Tijmen Roberti

Jan 11, 2012, 3:53:43 AM
to appengine-...@googlegroups.com
So what do I have to do if I want to get instances from the cache with an ancestor query during a transaction? I suppose doing a keys-only ancestor query and then fetching by key would return the instances from the cache, but it feels a bit of a workaround.

The instance cache during transactions is really useful (saves me from having to write it myself), but the behavior of the ancestor query is not what I would expect. I also expected that modified instances would be returned.

Tijmen

Guido van Rossum

Jan 11, 2012, 11:41:19 AM
to appengine-...@googlegroups.com
On Wed, Jan 11, 2012 at 00:53, Tijmen Roberti <trob...@gmail.com> wrote:
> So what do I have to do if I want to get instances from the cache with an ancestor query during a transaction? I suppose doing a keys-only ancestor query and then fetching by key would return the instances from the cache, but it feels a bit of a workaround.

That wouldn't quite work, because the set of keys returned would be computed by the datastore without knowledge of what's in your cache or what you have written since the transaction started. If that doesn't matter (e.g. you're not updating properties that figure in the queries), it's easy enough to code:

results = ndb.get_multi(ModelClass.query(<filters>).fetch(keys_only=True))

There's no other way; NDB doesn't actually cache queries for you -- it's too hard to figure out which cached query results should be invalidated due to a given write.
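
The caveat about the key set can be made concrete with a small sketch. It uses plain dicts rather than NDB, and keys_only_query and get_multi here are hypothetical stand-ins written for illustration, on the assumption that the query is evaluated against the pre-transaction snapshot while the fetch by key prefers the in-context cache.

```python
# Toy sketch only: plain dicts stand in for the datastore snapshot and the
# in-context cache; none of these names are part of the real NDB API.

def keys_only_query(snapshot, predicate):
    """Return keys whose *snapshot* value matches; the cache is not consulted."""
    return [key for key in sorted(snapshot) if predicate(snapshot[key])]


def get_multi(keys, context_cache, snapshot):
    """Fetch by key, preferring entities written earlier in the transaction."""
    return [context_cache[key] if key in context_cache else snapshot.get(key)
            for key in keys]


# State as of the start of the transaction.
snapshot = {
    "game1/p1": {"active": True, "score": 0},
    "game1/p2": {"active": True, "score": 0},
}

# Entities 'put' since the transaction started.
context_cache = {
    "game1/p1": {"active": True, "score": 10},   # score changed
    "game1/p2": {"active": False, "score": 0},   # no longer matches the filter
    "game1/p3": {"active": True, "score": 5},    # newly created
}

keys = keys_only_query(snapshot, lambda entity: entity["active"])
results = get_multi(keys, context_cache, snapshot)
```

Here the fetched p1 carries the updated score, but p2 is still returned even though it no longer matches the filter, and the new p3 is missed entirely. The workaround is safe only when the properties being written do not figure in the query.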

> The instance cache during transactions is really useful (saves me from having to write it myself), but the behavior of the ancestor query is not what I would expect. I also expected that modified instances would be returned.

Funnily, an earlier version of NDB replaced the query result with a result from the in-context cache, but this was found to cause other problems -- see http://code.google.com/p/appengine-ndb-experiment/issues/detail?id=117 .