Transaction usage

Amir Naor

Jun 9, 2017, 12:40:53 AM
to Google App Engine
I'm struggling with a data inconsistency issue (something that is supposed to be saved atomically is not), and it might have to do with how I use transactions.

I'd love to clarify whether my usage is flawed in any way:

Use case
I'm making changes to a couple of entities and need to ensure they are saved atomically. In other words, if one entity fails to save, all entities should roll back their changes.

Environment
Java standard env using ofy()

Data model
User (an entity cached via ofy()) and Task (a child of User; each User can have multiple Task instances)

Usage:

1. Load a couple of User and Task entities (User by key, Task(s) by query).
2. Make changes to the entities fetched in #1.
3. Start a transaction that only calls save() on all entities from #1 (they all belong to the same entity group).

I expect #3 to save all entities atomically, but at times some entities still hold old values while others in the same transaction do not.

Does the above usage make sense, or is there a requirement that #1 and #2 reside within the same transaction as #3?

Jeff Schnitzer

Jun 9, 2017, 1:33:48 AM
to Google App Engine
If all you wanted to do was make sure that all writes happen together and you otherwise don’t care about data consistency, then sure. However, 9 times out of 10 when people ask this question, they’re making a terrible mistake and what they really want to do is load the entities in the transaction.

The problem with your code is that when you write in your transaction, you may be writing stale data. Someone else may have changed the underlying data in between your load and your commit. The only way to guarantee a consistent, serializable view of the order of operations is to load _and_ save the entity in a transaction. If you’re seeing old values for some entities and not others, perhaps you’re seeing the result of write conflicts?
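A minimal sketch of the stale-write problem, assuming a toy single-threaded model in plain Java rather than Objectify: a `HashMap` stands in for the datastore, and `saveOnlyTransaction` is a hypothetical helper mimicking step #3 of the original post. Both requests load the counter before either one saves, so the second atomic save overwrites the first and one increment disappears, even though each individual save was atomic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a HashMap stands in for the datastore. Nothing here is
// Objectify's API; saveOnlyTransaction is a hypothetical helper.
class StaleWriteDemo {

    // Mimics step #3 of the original post: the "transaction" only saves.
    // The write itself is atomic, but nothing was loaded inside it, so
    // there is nothing to check for conflicts: it blindly overwrites.
    static void saveOnlyTransaction(Map<String, Integer> store, String key, int value) {
        synchronized (store) {
            store.put(key, value);
        }
    }

    // Two requests each try to add 1 to the same counter.
    static int lostUpdate() {
        Map<String, Integer> store = new HashMap<>();
        store.put("counter", 0);

        // Steps #1 and #2 happen outside any transaction:
        // both requests load the value before either one saves.
        int loadedByA = store.get("counter");
        int loadedByB = store.get("counter");

        // Step #3: each request saves its modified (and now stale) copy.
        saveOnlyTransaction(store, "counter", loadedByA + 1);
        saveOnlyTransaction(store, "counter", loadedByB + 1);

        return store.get("counter"); // 1, not 2: request A's update was lost
    }

    public static void main(String[] args) {
        System.out.println("expected 2, got " + lostUpdate());
    }
}
```

In this toy model, a transaction that begins after the loads has no record of what was read, so it has nothing to validate the saved values against; loading inside the transaction is what makes a collision detectable.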

Jeff


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscribe@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/2657da22-f904-44eb-9d0e-a61826407226%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Amir Naor

Jun 9, 2017, 12:34:38 PM
to Google App Engine, je...@infohazard.org
Thanks Jeff. That makes a lot of sense and is probably the cause of the issue.

A couple of follow-up questions:
  1. Does my following understanding of transactions seem right?
    1. A load(), within a transaction or without, will succeed even if another concurrent transaction is changing the same object. The load will just read the latest fields stored in the datastore/cache.
    2. A non-transactional save() will succeed even if another transaction changes the same object concurrently. Only a concurrent save() by another transaction will fail.
  2. The reason I was trying to avoid having both load() and save() in one transaction is that I have some calculations that require loading other entity groups (which may exceed the cross-group limit). I read that I could use transactionless() for this purpose, but I'm not sure how it works:
    1. Is transactionless() limited to loads only?
    2. Will loads in transactionless() return the values that existed before the transaction started? That is, if the transaction modified an entity before calling transactionless(), will loads inside transactionless() not see those changes?
  3. Is it possible to define retry behavior for failed transaction commits with ofy()? I'd like to control the retry count.
Thanks!

Jeff Schnitzer

Jun 10, 2017, 3:06:37 AM
to Google App Engine
This is sort of a fundamental database question that isn’t directly related to Objectify or the datastore, but I’ll have at it:

If you want a series of operations to occur as if they are executed in serial, you need to use transactions and you need *each* operation to be wrapped in a transaction. If you do this, the database (datastore, postgres*, oracle*, etc.) will ensure that the world works as if each transaction was executed in serial, even if it wasn’t. The datastore has an “optimistic concurrency model,” which means that instead of locking and blocking, it lets each transaction execute in full and detects collisions. Objectify (and, fwiw, GAE/Python transactions) automatically retries transactions with collisions. If you have massive contention around a single piece of data, you’ll see timeouts as various transactions retry beyond reasonable bounds, just as with pessimistic concurrency models you’ll see timeouts waiting for locks to be acquired. The difference is that locks are expensive and potentially cause deadlocks; retries just go until a fixed timeout.

You can try to cheat the system by loading data outside of a transaction, and sometimes this is appropriate. If you didn’t really understand that last paragraph at a deep and intuitive level, DO NOT attempt to cheat the system; you’re probably doing it wrong.
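The optimistic concurrency model Jeff describes can be sketched as a toy in-memory model in plain Java. This is not Objectify or the datastore: `transact(maxTries, work)`, `forceWrite`, and the version counter are hypothetical names for illustration. Each unit of work loads and saves inside the transaction without holding a lock; the commit succeeds only if the version it read is still current, otherwise the work is rerun on fresh data, up to a bounded number of tries.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntUnaryOperator;

// Toy model of optimistic concurrency. Not Objectify or datastore code:
// transact(maxTries, work) and forceWrite are hypothetical helpers.
class OptimisticDemo {

    // Immutable snapshot of one entity: its value plus a version number
    // that is bumped on every successful commit.
    private static final class Snapshot {
        final int value;
        final long version;
        Snapshot(int value, long version) { this.value = value; this.version = version; }
    }

    private Snapshot current = new Snapshot(0, 0);

    private synchronized Snapshot read() { return current; }

    // No lock is held while the work runs; collisions are detected here.
    // The commit succeeds only if nobody else committed since our read.
    private synchronized boolean commit(Snapshot readAt, int newValue) {
        if (current.version != readAt.version) return false; // collision
        current = new Snapshot(newValue, readAt.version + 1);
        return true;
    }

    // Simulates a competing writer that commits unconditionally.
    synchronized void forceWrite(int newValue) {
        current = new Snapshot(newValue, current.version + 1);
    }

    // Load AND save inside the unit of work; on collision, reload and
    // rerun the work, giving up after maxTries attempts.
    int transact(int maxTries, IntUnaryOperator work) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            Snapshot snapshot = read();                    // load inside the txn
            int updated = work.applyAsInt(snapshot.value); // compute from fresh data
            if (commit(snapshot, updated)) return updated; // save, or detect collision
        }
        throw new ConcurrentModificationException("gave up after " + maxTries + " tries");
    }

    int value() { return read().value; }

    // Many threads increment the same entity; no update is lost.
    static int concurrentIncrements(int threads, int perThread) {
        OptimisticDemo db = new OptimisticDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    db.transact(Integer.MAX_VALUE, v -> v + 1);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return db.value();
    }

    // Every attempt collides with a competing write, so the retry budget runs out.
    static boolean givesUpUnderConstantContention() {
        OptimisticDemo db = new OptimisticDemo();
        try {
            db.transact(3, v -> { db.forceWrite(v + 100); return v + 1; });
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("final value: " + concurrentIncrements(4, 1000));
        System.out.println("gave up: " + givesUpUnderConstantContention());
    }
}
```

With several threads incrementing the same entity, no update is lost, because a colliding commit is retried on fresh data instead of overwriting. When every attempt collides, the work gives up with an exception rather than retrying forever, which mirrors the trade-off above: no locks and no deadlocks, but heavy contention on one piece of data turns into repeated retries and eventually a timeout.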

Ok, to answer your questions specifically:

* A load() is just a load(). In a transaction you skip the cache. It loads the current state of the entity. The loaded value is *ALWAYS* potentially stale, even 0.000001s after executing (hey, it’s a live system). The only time you can count on a loaded value being consistent is when you load it in a transaction: when that txn commits, it will either write (yay) or retry (if something else modified it in the meantime).

* A “non-transactional save” is the same as a transaction that does only the save. It will retry until success. If you didn’t load the value inside the transaction, that means the unit of work is “save exactly this value no matter what, overwriting whatever was or wasn’t in the datastore beforehand and ignoring any other transactions in progress”.

* Outside of a transaction, you cannot trust the value of any data. It can *always* be written by some other operation 0.00000001s after the read executes. If you care about consistency and order of operations, you need transactions and carefully defined units of work. This is the same whether you are using the datastore or any other kind of database.

Suerte,
Jeff

* Assuming you have the database in SERIALIZABLE mode

Amir Naor

Jun 11, 2017, 2:43:30 PM
to Google App Engine, je...@infohazard.org
Super helpful. I took that context, read further, and implemented accordingly. Your clarification about the optimistic concurrency model helped solve another issue I was experiencing. Everything seems to work perfectly now. Thanks Jeff!