Having thought about this a bit, I think I understand why
_LOCKED needs to be used with NDB memcache to keep the datastore and memcache in sync within a transaction. For reference, when fetching an entity NDB checks its in-context cache first, then memcache, then the datastore.
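A rough sketch of that three-tier lookup order, with plain dicts standing in for each tier (this is an illustration of the idea, not NDB's actual code):

```python
# Three-level read path: in-context cache, then memcache, then datastore.
# Plain dicts stand in for each tier; names here are illustrative.

def ndb_style_get(key, context_cache, memcache, datastore):
    if key in context_cache:          # 1. per-request, in-process cache
        return context_cache[key]
    if key in memcache:               # 2. shared memcache
        context_cache[key] = memcache[key]
        return context_cache[key]
    value = datastore[key]            # 3. authoritative datastore
    memcache[key] = value             # backfill the faster tiers
    context_cache[key] = value
    return value
```

A second get for the same key is then served from the context cache without touching memcache or the datastore.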
Suppose I implemented the naive approach mentioned above: clear all affected entities out of memcache at the start of a transaction, then repopulate memcache with the updated entities after the transaction succeeds. This approach can serve stale data if the repopulation at the end of the transaction fails.
For example, suppose I have an entity MyEntity{int_property: 0} and want to increment int_property by 1 transactionally:
1. Starting point: MyEntity{int_property: 0} is in memcache and the datastore.
2. Transaction begins.
3. Delete MyEntity{int_property: 0} from memcache.
4. Get MyEntity{int_property: 0} from the datastore.
5. Put the updated MyEntity{int_property: 1} to the datastore.
6. Transaction succeeds.
7. Place MyEntity{int_property: 1} into memcache.
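The naive write path above can be sketched like this, with plain dicts standing in for memcache and the datastore (not the real App Engine APIs):

```python
# Naive approach: evict at transaction start, repopulate after commit.
# Dicts stand in for memcache and the datastore; illustrative only.

datastore = {"MyEntity": {"int_property": 0}}
memcache = {"MyEntity": {"int_property": 0}}

def naive_increment(key):
    memcache.pop(key, None)           # step 3: evict the cached copy
    entity = dict(datastore[key])     # step 4: read from the datastore
    entity["int_property"] += 1
    datastore[key] = entity           # step 5: write the update
    # Step 7: repopulate memcache after commit. If this write fails --
    # and a reader re-cached the old value in the meantime -- everyone
    # keeps reading the stale entity from memcache.
    memcache[key] = entity

naive_increment("MyEntity")
```

The fragile part is that last line: nothing guards the window between eviction and repopulation.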
What happens if, between the start and end of the transaction, an external Get request repopulates memcache with MyEntity{int_property: 0}? That's fine on its own, because step 7 will overwrite that memcache entry when the transaction succeeds. However, what if step 7 fails? Everyone will then read the stale MyEntity{int_property: 0} from memcache even though the transaction succeeded.
I imagine _LOCKED is used to prevent this from happening:
1. Starting point: MyEntity{int_property: 0} is in memcache and the datastore.
2. Transaction begins.
3. Lock the MyEntity key in memcache.
4. Get MyEntity{int_property: 0} from the datastore.
5. Put the updated MyEntity{int_property: 1} to the datastore.
6. Transaction succeeds.
7. Unlock the MyEntity key in memcache and place MyEntity{int_property: 1} into it.
With this method, a Get request external to the transaction will go straight to the datastore for its entity, since it will see that the memcache entry is locked. The important bit is that if the transaction succeeds in step 6 but the memcache set fails in step 7, there is no consistency problem: all Get requests will still see the lock and fall through to the datastore. The unfortunate side effect is that memcache stays out of service for that entity, but I see a _LOCK_TIME variable that expires the lock after a reasonable period in order to put memcache back in action.
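A sketch of how a lock with an expiry keeps memcache from being locked forever. The name _LOCK_TIME mirrors the NDB constant, but the value, the dict-based memcache, and the explicit `now` parameter are assumptions made for illustration:

```python
import time

# Lock with expiry: an orphaned lock times out after _LOCK_TIME seconds,
# after which the key becomes cacheable again. Illustrative only.

_LOCKED = "__locked__"
_LOCK_TIME = 32   # seconds; an assumed "reasonable period"

datastore = {"MyEntity": {"int_property": 1}}
memcache = {}     # key -> (value, expiry timestamp)

def lock(key, now=None):
    now = time.time() if now is None else now
    memcache[key] = (_LOCKED, now + _LOCK_TIME)

def get(key, now=None):
    now = time.time() if now is None else now
    item = memcache.get(key)
    if item is not None:
        value, expires = item
        if now >= expires:
            del memcache[key]      # lock timed out: memcache usable again
        elif value == _LOCKED:
            return datastore[key]  # locked: read the datastore instead
        else:
            return value
    return datastore[key]
```

While the lock is live, every read hits the datastore; once _LOCK_TIME passes, the orphaned lock is discarded and memcache can be repopulated on the next read.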
I know I could probably just use pdb and step through NDB, but it is more fun to figure it out for myself. Does anyone know if I am on the right track with this?
My motivation is that the Go SDK saves me money on instances but costs me money on datastore access. There are several Go caching libraries around, but none of them seem as rigorous or useful as NDB. After trying several of them I am back to using "appengine/datastore" and hand-crafting caching for each datastore hotspot, which is painful and error prone.