NDB strategy for keeping caches in sync with the datastore

Dan

Nov 6, 2013, 3:09:31 PM
to google-a...@googlegroups.com
Would someone be able to explain to me the strategy that NDB uses to keep its memory, memcache and datastore entities in sync and consistent (especially during transactions)?

I can't quite figure out from the code what goes on.

For example, before the start of a transaction, does NDB delete memory and memcache entities that will be affected by the transaction and then repopulate them if the transaction succeeds?

I see reference to a _LOCKED value to lock memcache. What is this used for and what happens if the unlock operation fails?

Best wishes,
Dan

Dan

Nov 11, 2013, 10:25:24 AM
to google-a...@googlegroups.com
Having thought about this a bit, I think I understand why _LOCKED is needed to keep the datastore and memcache in sync during a transaction. FYI, NDB looks up an entity in local memory (the in-context cache) first, then memcache, then the datastore.
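The three-level lookup can be sketched roughly like this (a toy model, not NDB's actual code; the three dicts are stand-ins for the in-context cache, memcache, and the datastore):

```python
# Toy model of NDB's read path: in-context cache -> memcache -> datastore.
context_cache = {}
memcache = {}
datastore = {"MyEntity": {"int_property": 0}}

def get(key):
    # 1. Per-request in-context cache (plain Python memory).
    if key in context_cache:
        return context_cache[key]
    # 2. Shared memcache; populate the context cache on a hit.
    if key in memcache:
        context_cache[key] = memcache[key]
        return memcache[key]
    # 3. Authoritative datastore; populate both caches on a hit.
    value = datastore.get(key)
    if value is not None:
        memcache[key] = value
        context_cache[key] = value
    return value
```

The second call for the same key is then served from memory without touching the datastore.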

Suppose I implemented the naive approach mentioned above. I clear out all memcached entities affected at the start of a transaction and then after the transaction succeeds I repopulate memcache with the updated entities. This method would give stale data if memcache fails to repopulate at the end of the transaction.

For example, if I have an entity MyEntity{int_property: 0} and I want to increment int_property by 1 transactionally.

  1. Starting point: MyEntity{int_property: 0}  is in memcache and the datastore.
  2. Transaction begins.
  3. Delete MyEntity{int_property: 0} from memcache.
  4. Get MyEntity{int_property: 0} from datastore.
  5. Update the entity and Put MyEntity{int_property: 1}.
  6. Transaction succeeds.
  7. Place MyEntity{int_property: 1} into memcache.
What happens if, between the start and end of the transaction, an external Get request repopulates memcache with MyEntity{int_property: 0}? That is fine, because step 7 will overwrite that memcache entry when the transaction succeeds. However, what if step 7 fails? Everyone will read the stale value (MyEntity{int_property: 0}) from memcache despite the transaction succeeding.
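The failure mode can be reproduced in a few lines (again a toy model: the dicts stand in for memcache and the datastore, and the failed step 7 is simulated by simply never performing the final set):

```python
memcache = {}
datastore = {"MyEntity": {"int_property": 0}}

# Steps 2-5: the transaction clears memcache and prepares the update.
memcache.pop("MyEntity", None)               # step 3: delete from memcache
entity = dict(datastore["MyEntity"])         # step 4: get from datastore
entity["int_property"] += 1                  # step 5: increment

# Meanwhile, an external Get repopulates memcache with the OLD value.
memcache["MyEntity"] = dict(datastore["MyEntity"])

datastore["MyEntity"] = entity               # step 6: commit succeeds
# Step 7 fails: the memcache set never happens.

# Every subsequent cached read now sees the stale value.
assert memcache["MyEntity"] == {"int_property": 0}
assert datastore["MyEntity"] == {"int_property": 1}
```

The cache and the datastore have permanently diverged, even though every individual operation behaved "reasonably".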

I imagine _LOCKED is used to prevent this from happening:
  1. Starting point: MyEntity{int_property: 0}  is in memcache and the datastore.
  2. Transaction begins.
  3. Lock the MyEntity key in memcache (set it to _LOCKED).
  4. Get MyEntity{int_property: 0} from datastore.
  5. Update the entity and Put MyEntity{int_property: 1}.
  6. Transaction succeeds.
  7. Unlock the MyEntity key in memcache and place MyEntity{int_property: 1} into it.
With this method, a Get request external to the transaction will go straight to the datastore for its entity, since it sees that the memcache entry is locked. The important bit is that if the transaction succeeds in step 6 but the memcache Set fails in step 7, there is no consistency problem: all Get requests will still see the memcache lock and fall through to the underlying datastore. The unfortunate side effect is that memcache is out of service for that entity; however, I see a _LOCK_TIME variable that expires the lock after a reasonable period to put the cache back into action.

I know I could probably just use pdb and step through NDB but it is more fun to figure it out for myself. Anyone know if I am on the right track with this?

My motivation is that the Go SDK saves me money on instances but loses me money on datastore access. There are several Go libraries around, but none of them seem as rigorous or useful as NDB. After trying several of them I am back to using "appengine/datastore" and hand-tuning each datastore hotspot, which is painful and error prone.

Alex Burgel

Nov 11, 2013, 1:20:24 PM
to google-a...@googlegroups.com
Thanks for writing this up. I had been trying to figure this out myself.

I think your reasoning is correct. There is also the case where one client deletes an entity while a read from another client adds it back to memcache; _LOCKED should help there too. I think the key to all this is the timeout duration, because a single very slow client could still put old data into the cache.
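The slow-client hazard can be sketched as a toy timeline (lock expiry is modelled by the transaction simply finishing and refilling the cache before the slow client's write lands):

```python
memcache = {}
datastore = {"MyEntity": {"int_property": 0}}

# A slow client reads the old value from the datastore...
slow_read = dict(datastore["MyEntity"])

# ...then an entire transaction runs to completion and refills the cache.
datastore["MyEntity"] = {"int_property": 1}
memcache["MyEntity"] = {"int_property": 1}

# Much later (past any lock window), the slow client finally writes
# its stale value back into memcache, clobbering the fresh entry.
memcache["MyEntity"] = slow_read
assert memcache["MyEntity"]["int_property"] == 0   # stale again
```

This is why the lock timeout has to comfortably exceed the longest plausible client delay, and why compare-and-swap (mentioned later in the thread) is a more robust fix than a timer.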

I came across some NDB issue-tracker entries that discuss these problems:


I also found this Facebook post on how they replaced memcache; it discusses similar issues:


--Alex

Dan

Nov 11, 2013, 2:51:09 PM
to google-a...@googlegroups.com
Great references. I had not seen them before, and they explain a lot. I was puzzled by why _LOCK_TIME was so long at 32 seconds, but now I know: it covers the maximum datastore retry length of 30 seconds, plus a little extra.

It looks like Guido had a difficult time of it before memcache compare-and-swap was available in the Python runtime. Thanks for the tips.
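For anyone following along, the compare-and-swap pattern can be sketched with a toy class (a pure-Python stand-in mirroring the shape of App Engine's memcache Client.gets()/Client.cas(); the version counter here is an illustration, not the real wire protocol):

```python
class ToyMemcache:
    """Minimal stand-in for memcache with compare-and-swap.

    gets() returns the value and remembers its version; cas() only
    writes if the version is unchanged since the last gets().
    """
    def __init__(self):
        self._data = {}      # key -> (version, value)
        self._seen = {}      # key -> version observed by gets()

    def gets(self, key):
        version, value = self._data.get(key, (0, None))
        self._seen[key] = version
        return value

    def set(self, key, value):
        version, _ = self._data.get(key, (0, None))
        self._data[key] = (version + 1, value)

    def cas(self, key, value):
        version, _ = self._data.get(key, (0, None))
        if self._seen.get(key) != version:
            return False     # someone else wrote in between: refuse
        self._data[key] = (version + 1, value)
        return True

mc = ToyMemcache()
mc.set("k", "old")
mc.gets("k")                          # remember the current version
mc.set("k", "interloper")             # a concurrent writer sneaks in
assert mc.cas("k", "new") is False    # CAS detects the conflict
```

With CAS, the slow client's late write simply fails instead of silently clobbering fresh data, which is exactly the hole the timeout-based lock cannot fully close.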