How long does it take for query results to be updated? (HR Datastore)


Leandro Rezende

Feb 14, 2012, 8:28:56 AM2/14/12
to google-a...@googlegroups.com
I have the following situation:

in the same function I will Get PM, Begin Transaction, GetObjectById, Update Values, MakePersistent, Commit Transaction, Close PM, and then make a Query.

Sometimes the Query returns the updated value, sometimes not. But if I GetObjectById, the value is always updated.

So, is this common?


Mahron

Feb 14, 2012, 8:49:18 AM2/14/12
to Google App Engine
Yes, it's one of the limitations of App Engine: indexes have a lag (and this is simulated in the dev server). A query right after a put will sometimes miss the write because the index is not yet up to date.

But I am not aware of a situation where the entity itself, when returned, is not up to date.

Leandro Rezende

Feb 14, 2012, 11:36:05 AM2/14/12
to google-a...@googlegroups.com
Thanks for the answer, Mahron.

Can you confirm whether GetObjectById always returns the updated result, or can it lag too?






Mahron

Feb 14, 2012, 12:15:29 PM2/14/12
to Google App Engine
get_by_id or GetObjectById is supposed to always be consistent.
Basically, all items returned by query or by id are up to date. If one
is not, there is a serious problem.

"Sometimes the Query returns the value updated, sometimes not." By
that I suppose it does not return the entity at all. If it returns a
version different from the one after the last put(), then I am pretty
sure that's a bug.

Jeff Schnitzer

Feb 14, 2012, 12:57:49 PM2/14/12
to google-a...@googlegroups.com
Officially, there is no guaranteed upper bound for queries to become consistent.  In practice it's usually a second or two, but infrastructure failures could in theory extend the period to minutes or hours or (gasp) days.

One thing worth noting:  If you access the datastore in ReadPolicy.Consistency.EVENTUAL mode, get-by-id will be eventually consistent just like queries.  And it's *dramatically* faster, ~1/5 latency in my recent experiments.
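[Editor's note: for readers wondering how the read policy Jeff mentions is actually set, here is a minimal sketch against the App Engine low-level Java datastore API of that era. The class and method names are from the SDK; everything else is illustrative.]

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceConfig;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.ReadPolicy;

public class EventualReadExample {
    public static DatastoreService eventualService() {
        // Opt in to eventually consistent reads. Gets and queries made
        // through this service may return slightly stale data, in
        // exchange for the lower latency Jeff describes.
        DatastoreServiceConfig config = DatastoreServiceConfig.Builder
                .withReadPolicy(new ReadPolicy(ReadPolicy.Consistency.EVENTUAL));
        return DatastoreServiceFactory.getDatastoreService(config);
    }
}
```

Without this explicit configuration, get-by-key uses the default (strong) read policy.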

Jeff

Leandro Rezende

Feb 14, 2012, 1:51:44 PM2/14/12
to google-a...@googlegroups.com
I'm confused now =(

So even GetObjectById can return old data?

Jeff said "If you access the datastore in ReadPolicy.Consistency.EVENTUAL mode...".

Is this something I have to "set" so that all access becomes ReadPolicy.Consistency.EVENTUAL, or is it just the way I'm coding?

Thanks again.



Robert Kluin

Feb 15, 2012, 12:00:24 AM2/15/12
to google-a...@googlegroups.com
You have to explicitly set eventual consistency to get that behavior.

Note that ancestor queries are also strongly consistent. This is
detailed in the docs:
http://code.google.com/appengine/docs/java/datastore/hr/#Data_Storage_Options_Compared
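[Editor's note: to make the ancestor-query point concrete, here is a sketch with the low-level Java API. The `Account` and `Statistic` kinds and the key name are hypothetical.]

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Query;

public class AncestorQueryExample {
    public static void printStats() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        // Hypothetical parent key: Statistic entities created as
        // children of this Account share its entity group.
        Key accountKey = KeyFactory.createKey("Account", "acct-123");

        // Because it has an ancestor filter, this query is strongly
        // consistent on the HR datastore.
        Query q = new Query("Statistic").setAncestor(accountKey);
        for (Entity stat : ds.prepare(q).asIterable()) {
            System.out.println(stat.getKey() + " = " + stat.getProperty("value"));
        }
    }
}
```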


Robert

Leandro Rezende

Feb 15, 2012, 6:50:18 AM2/15/12
to google-a...@googlegroups.com
=)
I feel relieved now, knowing that I can trust Gets.
Thanks for the docs link, Robert.


stephenp

Feb 15, 2012, 8:14:43 AM2/15/12
to google-a...@googlegroups.com
I have been confused by this lately as well. The Java documentation says:

Queries and gets inside a transaction are guaranteed to see a single, consistent snapshot of the Datastore as of the beginning of the transaction. Entities and index rows in the transaction's entity group are fully updated so that queries return the complete, correct set of result entities, without the false positives or false negatives described in Transaction Isolation that can occur in queries outside of transactions.

The "without the false positives or false negatives" comment led me to believe that as long as I did my query within a transaction, I would see results that were previously added within a different transaction (except in the case of an unapplied write, which requires manual admin intervention). For example, in my app I need to do some post-processing after a new item is added. I do this by enqueuing a transactional task. That task runs a query that should return the item previously added. Is there no way to ensure my query will include the item, i.e. avoid the false negative? If not, any suggestions on how to implement this sort of post-processing once the item's indexes do get updated? My post-processing queries lots of different data and stitches together a bunch of statistics and statistical summaries.

Thanks,

Stephen




Leandro Rezende

Feb 15, 2012, 12:43:39 PM2/15/12
to google-a...@googlegroups.com
Stephen

I guess we have to query with keys only and then Get each item by ID in a loop.
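[Editor's note: a sketch of this keys-only-then-get pattern with the low-level Java API, with a hypothetical `Statistic` kind. One caveat worth stating: the key list returned by a global query can itself lag behind recent puts, so this only guarantees that the entities you do fetch are fresh, not that the query saw every recent write.]

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeysOnlyThenGet {
    public static Map<Key, Entity> freshStatistics(DatastoreService ds) {
        // Keys-only query: cheap, but the key list may lag recent puts.
        Query q = new Query("Statistic").setKeysOnly();
        List<Key> keys = new ArrayList<Key>();
        for (Entity e : ds.prepare(q).asIterable()) {
            keys.add(e.getKey());
        }
        // Batch get-by-key is strongly consistent, so every entity
        // returned here is the latest committed version.
        return ds.get(keys);
    }
}
```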



Jeff Schnitzer

Feb 15, 2012, 2:17:57 PM2/15/12
to google-a...@googlegroups.com
On Wed, Feb 15, 2012 at 8:14 AM, stephenp <slp...@gmail.com> wrote:

> The "without the false positives or false negatives" comment led me to believe that as long as I did my query within a transaction I would see results that were previously added within a different transaction [...] Is there no way to ensure my query will include the item - e.g. avoid the false negative?

Queries inside of transactions must be ancestor queries.  GAE won't let you run a global query in a transaction.

I'm not sure, but it sounds like you're confusing "enlisting the add-task operation in your current transaction" with "running a task within a transaction".  When your task runs, there is no transaction unless you create one.

In answer to your general question, which I'll roughly summarize as "how do you model and query data to minimize the impact of eventuality on your business?", the answer is "very carefully".  Pick and choose your entity group structures carefully.  Allow for eventuality when you can.  In the handful of places where you need exact numbers, use sharded counters and memcache CAS operations for performance.

But usually when you're talking about statistics you aren't worried about realtime operations, so why do you care?  Just let eventuality rule.  Statistics are old milliseconds after you generate them anyway; does it matter if the stats are tens of seconds stale because of eventual consistency?  Obviously in some situations it can matter, but it's certainly worth re-examining your real requirements.  Anything "realtime" is 100X more effort no matter what platform you're on.

Jeff

stephenp

Feb 15, 2012, 4:16:17 PM2/15/12
to google-a...@googlegroups.com
Jeff,

Thanks for the reply! You've really, really helped. I understand that a task will have to create a transaction and that it doesn't run in the context of the transaction that may have enqueued it. A re-phrasing of my question: 
Request 1 -> txn.begin() -> add entity "statistic" to "account"  -> commit() ... where the commit enqueues the next request/task
Request 2 (task) -> txn.begin() -> query "account" for all "statistics" -> txn.commit()  

It sounds like what you're saying is that as long as "account" and "statistic" are in the same entity group, then I'll see the added statistic. If they're not in the same entity group, then it'll be eventually consistent, right? In our case, we're using JDO and the relationship between statistic and account is modeled as a "Key" field, but they aren't in the same entity group. In "Request 2" we simply query by kind "statistic" for all statistics with a filter for the current account. We don't get any error like "ancestor required in transaction".

Last question: does the use of the transaction in Request 2 make any difference? 

As for our app, you can think of it as a finance app with credits and debits and statistics like "total spent", "average spent", etc. And we roll up stats so we have a hierarchy of summaries for different periods: weeks, months, years, all years. We currently kick off tasks to recalculate stat summaries whenever a credit or debit is added. Call it "near-realtime". When the user navigates to the "all-years" page we've likely recalculated the hierarchy of stat summaries. We don't really want to wait for them to request this page to go try and calculate all the summaries because it could be a long wait. I also wasn't too fond of having a cron job every minute looking for changes, but maybe that's what I need to do.

Again, thanks. 

Stephen
Jeff Schnitzer

Feb 15, 2012, 4:36:53 PM2/15/12
to google-a...@googlegroups.com
On Wed, Feb 15, 2012 at 4:16 PM, stephenp <slp...@gmail.com> wrote:

> It sounds like what you're saying is that as long as "account" and "statistic" are in the same entity group, then I'll see the added statistic. If they're not in the same entity group, then it'll be eventually consistent, right? In our case, we're using JDO and the relationship between statistic and account is modeled as a "Key" field, but they aren't in the same entity group. In "Request 2" we simply query by kind "statistic" for all statistics with a filter for the current account. We don't get any error like "ancestor required in transaction".


Sounds like you're on the right track.  If you put account-related statistics in the same entity group as the account entity, you can always do an ancestor query (strongly consistent) to get statistics for the account.  This is why it's very important to pick and choose entity group structures on the HRD.  It's much more important than it was back in the M/S datastore.

BTW there is another "trick" here which you might find useful someday (probably not here).  Let's say you have two entities, A and B in different entity groups.  Since get-by-key is strongly consistent, if you store a Key<B> in A, you can always get a strongly consistent view of B when you have an A.
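[Editor's note: a sketch of the trick Jeff describes, with hypothetical kind and property names - entity A stores B's key in a property called "bKey".]

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;

public class KeyReferenceExample {
    public static Entity lookupB(DatastoreService ds, Key aKey)
            throws EntityNotFoundException {
        Entity a = ds.get(aKey);                 // strongly consistent get
        Key bKey = (Key) a.getProperty("bKey");  // hypothetical property name
        // Get-by-key is strongly consistent even across entity groups,
        // so this B is always the latest committed version.
        return ds.get(bKey);
    }
}
```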

> Last question: does the use of the transaction in Request 2 make any difference?

None whatsoever, except that it's probably slightly slower (extra RPCs to set up the transaction).
 
> As for our app, you can think of it as a finance app with credits and debits and statistics like "total spent", "average spent", etc. And we roll up stats so we have a hierarchy of summaries for different periods: weeks, months, years, all years. We currently kick off tasks to recalculate stat summaries whenever a credit or debit is added.

Assuming you generate the statistics for each account linearly (i.e., not firing off a dozen parallel tasks to add up the data for a single account), it sounds like putting statistics in the same entity group as Account is the ideal solution.

Jeff

stephenp

Feb 15, 2012, 5:49:25 PM2/15/12
to google-a...@googlegroups.com
Jeff,

Again, thank you. BTW, I just noticed this in the docs:
An app can perform a query during a transaction, but only if it includes an ancestor filter. (You can actually perform a query without an ancestor filter, but the results won't reflect any particular transactionally consistent state). 

That parenthetical is kind of important :) You won't get an error or warning, but you'll quietly lose consistency. I went looking for this because you kept saying "you can only do ancestor queries in a transaction" and I knew I was doing queries in a transaction without an ancestor filter; glad I found it.

I feel like I am keeping you from real work :) I do appreciate it.

Stephen

Jeff Schnitzer

Feb 15, 2012, 9:44:41 PM2/15/12
to google-a...@googlegroups.com
On Wed, Feb 15, 2012 at 5:49 PM, stephenp <slp...@gmail.com> wrote:

> I feel like I am keeping you from real work :) I do appreciate it.

I'll send you my bill later ;-)

Glad you're getting it worked out.

Jeff