Is Datastore lookup-by-key strongly consistent when write is done in Python module and read is done in Java module?

Jonathan Munson

Oct 31, 2016, 9:41:07 AM
to Google App Engine
Hi,

We have an app wherein a Datastore entity is written to by a Python module (using NDB), and then in an immediately following request, read from, using the entity's key, by a Java module. In a situation where the app was heavily loaded, it seemed like the Java module was reading stale data. The Datastore documentation says that lookup-by-key requests are strongly consistent, but does that apply even when writes are from a Python module and reads are from a Java module?

Thanks,

--Jon

Chad Vincent

Oct 31, 2016, 3:05:06 PM
to Google App Engine
Yes.  Consistency is handled by the database layer, not the application runtime.

HOWEVER, if you are caching entities (Objectify, etc.) you either need to ensure your Memcache keys are identical or disable caching for those entities.  Otherwise the Java cache may return a stale result because the Python module didn't clear/update the entity on write.

Chad Vincent

Oct 31, 2016, 3:07:43 PM
to Google App Engine
Also, make sure you aren't doing deferred writes or starting the Java request before your Python transaction closes.

Jonathan Munson

Oct 31, 2016, 5:01:21 PM
to Google App Engine
Thanks, Chad. Re your comment about caching, I am using NDB on the Python side, which I thought used memcache automatically. So I guess I am using caching, but I thought that modules in the same project (mine are) shared memcache. Is that not true if one module is Python and the other Java?

Jeff Schnitzer

Oct 31, 2016, 8:56:18 PM
to Google App Engine
There is no standard way of storing entities in memcache. Objectify uses its own namespace and uses the string version of Keys as the cache key. I don’t know what NDB does.

Cache invalidation is already a hard problem (that and naming things, as they say). If you want to access data from both Python and Java, best to disable the memcache behavior of both NDB and Objectify (don’t put @Cache on anything shared).

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscribe@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/3d0c204d-f2d4-4cb7-8db0-26c9db942e6a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jonathan Munson

Oct 31, 2016, 9:33:19 PM
to Google App Engine, je...@infohazard.org
I now think that caching shouldn't be an issue. When I write an entity on the Python side, NDB invalidates the entity's cache entry (according to the docs), forcing (strongly consistent) reads to get the value from the Datastore service. So a lookup-by-key read from the Java side should get the most recently written value, according to Chad.

I think I'm going to have to put a version counter on the entity, which I'll increment on the Python side, then pass to the Java module so it can check the value of it when it gets the entity from the Datastore. "Trust but verify."
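The "trust but verify" check on the Java side could look something like the sketch below. It is plain Java under stated assumptions: the property name `version`, the expected value arriving with the request, and the `VersionCheck` class are all hypothetical, and the `Map` stands in for the entity's properties (for example what `Entity.getProperties()` returns). The counter is expected as a `java.lang.Long`, which is how the low-level Java API surfaces integer properties.

```java
import java.util.Map;

// Hypothetical reader-side check for the version-counter idea.
// The "version" property name and the expected value passed in by the
// browser are assumptions; nothing here is part of the Datastore API.
class VersionCheck {
    enum Result { MATCH, STALE, NEWER, MISSING }

    // props:    the entity's properties as read by the Java module
    // expected: the version the Python module reported after its put()
    static Result check(Map<String, Object> props, long expected) {
        Object raw = props.get("version");
        if (!(raw instanceof Long)) {
            return Result.MISSING; // property absent or of an unexpected type
        }
        long actual = (Long) raw;
        if (actual == expected) {
            return Result.MATCH;
        }
        // Lower than promised: the read did not see the write it was told about.
        // Higher than promised: some other request updated the entity in between.
        return actual < expected ? Result.STALE : Result.NEWER;
    }
}
```

Separating `STALE` from `NEWER` is the useful part: `STALE` would point at a consistency or caching problem, while `NEWER` means another request (a double click, say) wrote after the one the browser was waiting on.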

--Jon


Chad Vincent

Oct 31, 2016, 11:40:18 PM
to Google App Engine, je...@infohazard.org
> When I write an entity on the Python side, NDB invalidates the entity's cache entry (according to the docs),

I think you missed the key part of Jeff's response.  Objectify and NDB do not use the same entries in Memcache.  So while NDB on Python invalidated *its* cache entry, Objectify used a different namespace and thus the Objectify cached entry is still there.

Chad Vincent

Oct 31, 2016, 11:47:12 PM
to Google App Engine, je...@infohazard.org
Just to be 100% clear (sorry I keep double-posting, it's a bad habit):

Java and Python modules in the same app share Memcache, which is just a giant Key-Object map.

Objectify uses different Memcache keys than NDB.  I don't remember Ofy's format, and can't find NDB's, so let's just say [ofy-agR0ZXN0cgkLEgNGb28YGQw] vs. [ndb-agR0ZXN0cgkLEgNGb28YGQw].  So when you do an NDB write, [ndb-agR0ZXN0cgkLEgNGb28YGQw] gets invalidated and [ofy-agR0ZXN0cgkLEgNGb28YGQw] is stale.

You need to either add a wrapper to NDB to ensure you're invalidating the Objectify cache, too, or disable *at least* Objectify's caching for that entity type.
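The failure described here can be reproduced with a toy model in plain Java. This is not App Engine code: one `HashMap` stands in for the app's shared memcache, another for the Datastore, and the `ndb-`/`ofy-` prefixes are the made-up ones from the message above, not the libraries' real key formats. `SharedCacheDemo` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: one memcache shared by a Python module and a Java module.
// The "ndb-" and "ofy-" prefixes are illustrative only.
class SharedCacheDemo {
    // One map for the whole app: both runtimes see the same entries.
    static final Map<String, String> memcache = new HashMap<>();
    // Stand-in for the Datastore itself (the source of truth).
    static final Map<String, String> datastore = new HashMap<>();

    // NDB-style write: update the Datastore, then invalidate NDB's own entry only.
    static void ndbPut(String key, String value) {
        datastore.put(key, value);
        memcache.remove("ndb-" + key); // the "ofy-" entry is left untouched
    }

    // Objectify-style read: try its own cache entry first, then the Datastore.
    static String ofyGet(String key) {
        String cached = memcache.get("ofy-" + key);
        if (cached != null) {
            return cached; // may be stale: the Python side never removed it
        }
        String fresh = datastore.get(key);
        if (fresh != null) {
            memcache.put("ofy-" + key, fresh);
        }
        return fresh;
    }

    // Read with caching disabled: always goes to the Datastore.
    static String uncachedGet(String key) {
        return datastore.get(key);
    }
}
```

After a second `ndbPut`, `ofyGet` still returns the first value while `uncachedGet` sees the new one. The two remedies map directly onto the sketch: also remove the `"ofy-" + key` entry in `ndbPut`, or read through an uncached path on the Java side.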

Jonathan Munson

Oct 31, 2016, 11:54:28 PM
to Google App Engine, je...@infohazard.org
I don't believe I use Objectify, unless it's used transparently by the Java Datastore API. Here's a snippet of code that shows how I use the API:

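    import com.google.appengine.api.datastore.*;
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    Key key = KeyFactory.stringToKey(serKey); // serKey: the key string passed in the request
    try {
        Entity entity = datastore.get(key); // lookup by key, then processing
    } catch (EntityNotFoundException e) {
        // no entity exists with this key
    }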

Is there any caching involved here, on the Java side? If there is, then what you and Jeff say is definitely the problem.

Chad Vincent

Nov 1, 2016, 1:37:49 PM
to Google App Engine, je...@infohazard.org
Then no, you aren't caching on the Java side.

Are you (or NDB) using transactions? If so, make sure you schedule your hand-off to Java after the transaction has committed.
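Why the hand-off has to wait for the commit can be shown with a toy model in plain Java (not the Datastore API; `TxnHandOffDemo` and its nested `Txn` are hypothetical): writes made inside a transaction are buffered and only become visible to other readers when `commit()` runs, so a reader started before the commit still sees the previous value.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: transactional writes are buffered until commit.
class TxnHandOffDemo {
    // Committed state, i.e. what any other module can read.
    static final Map<String, String> store = new HashMap<>();

    static final class Txn {
        // Writes made inside the transaction, invisible to other readers.
        private final Map<String, String> pending = new HashMap<>();

        void put(String key, String value) {
            pending.put(key, value);
        }

        // Only now do the buffered writes become visible.
        void commit() {
            store.putAll(pending);
            pending.clear();
        }
    }

    // What another module (e.g. the Java servlet) would see on a lookup by key.
    static String read(String key) {
        return store.get(key);
    }
}
```

Calling `read` between `put` and `commit` returns the old value; only after `commit()` does the new value appear, which is why the browser should be told to call the Java module only once the commit has returned.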

Jonathan Munson

Nov 1, 2016, 5:12:12 PM
to Google App Engine, je...@infohazard.org
Nope, not using transactions. On the Python side, we read the entity, update a few of its properties, put(), then return to the browser, which then invokes a request on a Java servlet, which reads the entity using lookup-by-key.

Thanks Chad, thanks Jeff. I'm going to do that versioning scheme I mentioned above, put some load on it, and make sure I'm always getting the same version.

--Jon

pdknsk

Nov 1, 2016, 6:13:13 PM
to Google App Engine
If more than one user can update the same entity, the bug may be that you're not updating the entity atomically (as in a transaction).

Jonathan Munson

unread,
Nov 1, 2016, 8:42:42 PM11/1/16
to Google App Engine
Good thought, but in this case only one user can update the entity.

Jeff Schnitzer

Nov 1, 2016, 11:58:38 PM
to Google App Engine
Users are clever and insidious when it comes to breaking software. If you aren’t using a transaction in a get/update/put cycle, there are all manner of ways that updates could get screwed up or lost. Consider that requests might be sitting for many seconds at a cold start and therefore come in out of order… users may click buttons many times, launching multiple ajax requests… and since you only see this under heavy traffic, you’re probably seeing some weird 0.01% edge behavior.

Transactions are the only way to guarantee that get/update/put cycles have the effect you think they do. This is the first thing I would fix before trying anything else.
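To illustrate what a transaction buys in a get/update/put cycle, here is a toy single-entity store in plain Java. It uses a version stamp to mimic the optimistic concurrency behind Datastore transactions (a commit fails if the data changed after it was read, and the caller retries); it is not the Datastore API, and `OptimisticStore` with all its methods is hypothetical.

```java
import java.util.function.LongUnaryOperator;

// Toy store holding one numeric "entity" plus a version stamp.
class OptimisticStore {
    // What a get() returns: the value and the version it was read at.
    static final class Snapshot {
        final long value;
        final long version;

        Snapshot(long value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private long value = 0;
    private long version = 0;

    synchronized Snapshot get() {
        return new Snapshot(value, version);
    }

    // Plain put: overwrites whatever is there (last writer wins).
    synchronized void blindPut(long newValue) {
        value = newValue;
        version++;
    }

    // Conditional put: rejected if anyone wrote after `seen` was read.
    synchronized boolean putIfUnchanged(Snapshot seen, long newValue) {
        if (seen.version != version) {
            return false; // another request committed first
        }
        value = newValue;
        version++;
        return true;
    }

    // get/update/put with retry, the guarantee a transaction provides.
    long update(LongUnaryOperator change) {
        while (true) {
            Snapshot seen = get();
            long next = change.applyAsLong(seen.value);
            if (putIfUnchanged(seen, next)) {
                return next;
            }
            // conflict: re-read and try again
        }
    }
}
```

If two requests both call `get()` before either writes, two `blindPut(value + 1)` calls leave the counter at 1 and one increment is silently lost. With `putIfUnchanged` the second write is rejected, and `update` re-reads and retries instead.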

Jeff

Jonathan Munson

Nov 2, 2016, 8:01:20 PM
to Google App Engine, je...@infohazard.org
I'll have to think more about this. At the moment I can't see how to apply transactions. Here is the sequence of operations:

- Browser sends update request to Python module, including key of entity to update.
- Python module performs update using entity key, and sends response.
- Upon receiving response, browser sends request to Java module, including key of entity to do processing on.
- Java module reads entity using entity key, does processing.

But I think you are on to something with the multiple Ajax requests. I'm disabling the button until a request returns, and starting to use an update sequence number.
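One way an update sequence number could work is sketched below in Java, for consistency with the thread's other code (the real check would sit in the Python handler that does the write). The browser would number each update, and the handler ignores any request whose number is not higher than the last one applied, so a delayed or repeated Ajax request cannot overwrite newer data. `SequenceGuard` and its fields are hypothetical.

```java
// Hypothetical guard: apply an update only if its sequence number is
// higher than the last one applied.
class SequenceGuard {
    private long lastSeq = 0;    // highest sequence number applied so far
    private String value = null; // the stored data being overwritten

    // Returns true if the update was applied, false if it was stale or a duplicate.
    synchronized boolean apply(long seq, String newValue) {
        if (seq <= lastSeq) {
            return false; // out-of-order or repeated request: keep the newer state
        }
        lastSeq = seq;
        value = newValue;
        return true;
    }

    synchronized String value() {
        return value;
    }

    synchronized long lastSeq() {
        return lastSeq;
    }
}
```

The `synchronized` keyword only protects a single JVM. App Engine requests can land on different instances, so in practice the comparison and the write would need to happen inside one Datastore transaction, with the sequence number stored as a property of the entity.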

pdknsk

Nov 2, 2016, 8:52:17 PM
to Google App Engine, je...@infohazard.org
> At the moment I can't see how to apply transactions.

It depends on whether update means "read, change value, write back" or "overwrite with new value without reading".

Jonathan Munson

Nov 2, 2016, 9:19:36 PM
to Google App Engine, je...@infohazard.org
In this case, it means "overwrite with new value without reading". 