Creating Unique Entities

Francis Stephens

May 13, 2015, 8:59:57 AM
to google-a...@googlegroups.com
We have an issue where we want to lazily create an entity if it does not exist. There is some discussion going on about how to do this and I would like to clarify some things around app engine transactions. I will limit my query to single entity group transactions.

I am using Go in my examples, but I hope the code is clear enough for non-Go programmers.

My understanding is that a transaction, on a single entity group, will succeed only if the entity group is not modified externally during the transaction. The 'entity group timestamp' indicating when an entity group was changed is stored in the root entity of the entity group. So during a transaction the current 'entity group timestamp' is read and the transaction can only succeed if it hasn't changed by the end of the transaction.

    key := datastore.NewKey(c, "Counter", "mycounter", 0, nil)
    count := new(Counter)
    err := datastore.RunInTransaction(c, func(c appengine.Context) error {
        err := datastore.Get(c, key, count)
        if err != nil && err != datastore.ErrNoSuchEntity {
            return err
        }
        count.Count++
        _, err = datastore.Put(c, key, count)
        return err
    }, nil)


In the example above (taken from https://cloud.google.com/appengine/docs/go/datastore/transactions) I can see two non-error cases.

1: The Get succeeds and the 'entity group timestamp' on the counter can be used to ensure no other transactions update the counter during this transaction.
2: The Get fails with ErrNoSuchEntity and the Put is used to store the counter for the first time.

In the second case it is possible that another identical transaction is running. If both transactions' Get return ErrNoSuchEntity how does the datastore ensure that only one put succeeds? I would expect there to be no 'entity group timestamp' in the datastore to test against?

Does the transaction know that it needs to test for the non-existence of the counter in order for the Put and the entire transaction to succeed?

Is there a chance in this case for two transactions to succeed and for one Put to overwrite the other?

If there is documentation, or videos etc, around the mechanism that controls this I would love to read it.

Thanks in advance.

Francis Stephens

May 14, 2015, 8:49:15 AM
to google-a...@googlegroups.com
It's unclear to me where I should be posting this question. I've posted it to Stack Overflow in case that is more appropriate; however, the nature of the question goes against the SO guidelines.

Jeff Schnitzer

May 14, 2015, 8:35:10 PM
to Google App Engine
I'm honestly having a hard time understanding your questions.

GAE transactions are pretty simple. When you start a transaction, each entity group (defined by the root key, whether or not an entity exists at that key) touched is enlisted in the transaction. When you commit a transaction, if any of those entity groups have been changed by anyone anywhere, your transaction rolls back (in Java you get ConcurrentModificationException).

Your questions would be easier to understand without bringing up implementation details like timestamps. The implementation is not particularly relevant to the behavior.

Jeff


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/39fce7cf-fa5a-464f-821e-0826f8fdb500%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Francis Stephens

May 15, 2015, 5:46:12 PM
to google-a...@googlegroups.com
I appreciate that the question is difficult, and I'm sure it could be worded differently. However, the reason I have phrased it as I did was because I really need to clarify this one case. It's very important to me.

Perhaps it would be clearer if I try to address your description of an app engine transaction.

"When you start a transaction, each entity group (defined by the root key, whether or not an entity exists at that key) touched is enlisted in the transaction."

In the case I described the transaction is entered simultaneously by two instances.

In this scenario both Get(...) calls return ErrNoSuchEntity. We know (or I believe) at that point that the key does not exist anywhere in the datastore, attached to an entity or not. So I would like to know how the entity group is enlisted in this case.

"When you commit a transaction, if any of those entity groups have been changed by anyone anywhere, your transaction rolls back"

When the two Put(...) calls are executed, one of them must fail with an ErrConcurrentTransaction (or ConcurrentModificationException in Java). So the question I have is: where is the information that would allow one of the Put(...) calls to fail?

I have two reasons for asking for clarification on this point.

1: Data races are often very subtle, and I would like to clarify my understanding in this instance.
2: In the case given above, a low-probability data race in which the first call's write might be overwritten would be acceptable for most counters. That would be a very small counting loss, and the chance of such a race, if it is possible at all, would have negligible impact. We don't have the same luxury, so I want to be absolutely sure.

Thanks for your response.

Francis

--
Francis Stephens
Software Developer @ Belua

Francis Stephens

May 15, 2015, 6:06:38 PM
to google-a...@googlegroups.com
Maybe this clarifies my question. The paragraph at bottom is a description of a transaction from Google. It refers to checking the 'last update time' of an entity group and then checking whether 'it has changed since our initial check'.

While I don't expect this short paragraph to describe all of the complexities of App Engine transactions it does lay out a simplistic implementation which does not cover the 'create if it doesn't exist' transaction given on the same page. I have read a number of other descriptions of App Engine transactions and none of them indicate that this case is covered. My expectation is that it is, in fact, covered and these transactions work. But I do need to be sure.

"When a transaction starts, App Engine uses 
optimistic concurrency control by checking the last update time for the entity groups used in the transaction. Upon committing a transaction for the entity groups, App Engine again checks the last update time for the entity groups used in the transaction. If it has changed since our initial check, an error is returned."


Francis

Jeff Schnitzer

May 17, 2015, 1:28:50 AM
to Google App Engine
This is really weird. Actually, it's pretty unique in my experience.

You clearly speak English well, and yet - even after two rounds - I have absolutely no idea what you are asking. While I don't pretend to be the smartest person in the universe, I suspect that nobody else has any idea what you are asking either.

Let's take a step back for a second. Stop worrying about put() or any other kind of operation. There's only one operation that matters in GAE-land or any other kind of optimistic-concurrency-land: commit().

You can do all kinds of crazy shit in a transaction but it only hits the fan when you try to commit it. If someone else was making sweet sweet love to your EG while you were away, you get an optimistic concurrency failure - ConcurrentModificationException in Java, some kind of annoying return value if you're in Go. The right answer in either case is to retry your idempotent (!) transaction. Eventually, no matter how many retries later, your txn will succeed and you will ride off into the sunset with the transaction of appropriate sex.
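
A minimal sketch of that retry loop, in Go to match the original example. It is self-contained and does not call the datastore: errConcurrent and retryOnConflict are hypothetical names, and the transaction body is simulated by a closure that loses the race twice before succeeding.

```go
package main

import (
	"errors"
	"fmt"
)

// errConcurrent is a hypothetical stand-in for the conflict error a real
// commit would return (datastore.ErrConcurrentTransaction in Go).
var errConcurrent = errors.New("concurrent transaction")

// retryOnConflict runs txn at least once and keeps rerunning it while it
// fails with a conflict, up to maxAttempts runs in total. txn must be
// idempotent because it may run more than once.
func retryOnConflict(maxAttempts int, txn func() error) error {
	for attempt := 1; ; attempt++ {
		err := txn()
		// Stop on success, on an error a retry will not fix, or when
		// the attempt budget is used up.
		if !errors.Is(err, errConcurrent) || attempt >= maxAttempts {
			return err
		}
	}
}

func main() {
	calls := 0
	err := retryOnConflict(5, func() error {
		calls++
		if calls < 3 {
			return errConcurrent // simulate losing the race twice
		}
		return nil
	})
	fmt.Println("attempts:", calls, "error:", err)
}
```

With the App Engine SDK the closure would wrap the real transaction. It is worth checking whether the SDK's own transaction helper already retries on conflict before layering a loop like this on top of it.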

There is absolutely nothing unique or novel about GAE transactions. Just search the internet for Optimistic Concurrency Control; GAE is not special in this regard.

Jeff


Dan Dubois

May 17, 2015, 9:26:33 AM
to google-a...@googlegroups.com
Hi Francis,

I think I understand exactly what you mean and it's an interesting edge case when designing the datastore transaction system.

Somehow within a transaction it needs to record that the 'put' method expects the entity it is saving either to already exist or not to exist. What's more, this logic needs to kick in only if earlier in the transaction a 'get' bothered to detect if the entity existed in the first place!

I hope the designers of the datastore considered this, and I am sure they have, as I have not seen anything to suggest transactional integrity has ever been broken in my apps. I guess you are looking for confirmation, though, as the way the datastore's internals are described in various places doesn't suggest the edge case is catered for.

Maybe you could write a test and run it a reasonable number of times to see if atomicity guarantees break. I would be interested in seeing the results.
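
One possible shape for such a test, sketched in Go. Everything here is an assumption for illustration: fakeStore is an in-memory stand-in whose mutex plays the role the transaction is expected to play, and countCreators is a hypothetical harness. A real test would pass in a function that runs the actual datastore transaction.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeStore is a hypothetical in-memory stand-in for the datastore.
type fakeStore struct {
	mu     sync.Mutex
	owners map[string]int
}

// createIfAbsent stores id under key only if the key is unused, and
// reports whether this caller was the one that created it.
func (s *fakeStore) createIfAbsent(key string, id int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.owners[key]; exists {
		return false
	}
	s.owners[key] = id
	return true
}

// countCreators starts n concurrent callers and returns how many of them
// report having created the entity. Any result other than 1 would mean
// the create-if-absent guarantee was broken.
func countCreators(n int, create func(id int) bool) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	created := 0
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if create(id) {
				mu.Lock()
				created++
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return created
}

func main() {
	s := &fakeStore{owners: map[string]int{}}
	n := countCreators(100, func(id int) bool {
		return s.createIfAbsent("mycounter", id)
	})
	fmt.Println("creators:", n)
}
```

A single passing run proves little about a race, so the harness would need to be repeated many times against the real service.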

Dan

Nick

May 17, 2015, 5:39:53 PM
to google-a...@googlegroups.com
1. Transactional units are 'Entity Groups', not 'Entities'. Existence of entities is irrelevant, just whether or not there is a concurrent change to the group.
2. Entity groups are represented by keys, and nothing else. You might have an entity stored with that key, but that is not meaningful except to say that entity belongs in that group for transactional purposes.
3. Transactions are almost definitely more nuanced than 'last updated time stamp' - don't worry about how they work, you just need to trust they do. This is true of every persistence software you've ever used.
4. To address your specific question, whether an entity does or does not exist can be tested and acted on in the confines of a transactional boundary. If your transaction fails, when you rerun it you will need to test the existence again (hence idempotent comment from Jeff).

5. If you are going to write a test to verify this behavior, be aware that you'll need to deploy that to ensure you're testing the right things. The dev servers behave differently at a low level to the actual datastore. I think you will struggle to test this behavior until you fully understand how transactions work, though. False positives, etc. Read the docs again.

I hope that helps,

Nick

May 17, 2015, 5:42:36 PM
to google-a...@googlegroups.com
Apologies, forgot one important item:

6. The datastore uses a restful interface. A put always puts the entity, regardless of whether it already exists or not. If you use a deterministic key the behavior you seek will just happen. It will create an entity if none exists, or update the existing one if it does.

Francis Stephens

May 17, 2015, 6:53:53 PM
to google-a...@googlegroups.com
Dan,

That is a very good summary of what I am looking for. I will write some tests. Your description of the problem is spot on. Much clearer than anything I've written :)

I also hope that the designers of the datastore thought of this. I fully expect that they have. I just need to confirm it.

F




Francis Stephens

May 17, 2015, 6:56:35 PM
to google-a...@googlegroups.com
Nick,

The Puts in the code snippet really mustn't 'always put the entity'. It is crucial in this scenario that one of the Puts fails. If all Puts always succeed (barring non-concurrent failures) then the transactions definitely don't work.

With respect I can't not "worry about how they work, you just need to trust they do". My expectation is that the App Engine datastore really is like every other persistence software I've ever used. If you look closely at the isolation guarantees given by popular relational databases you will see a raft of complex and surprising edge cases.

A good review of these complexities to be found is


Francis


Jeff Schnitzer

May 18, 2015, 2:38:51 PM
to Google App Engine
That was a miscommunication; Nick was just saying that a Put is an upsert operation. Concurrency is handled at a "whole transaction" level. When you 'touch' EGs, you enlist them in your transaction; when you commit the transaction, if any of those EGs have been modified by someone else, your transaction fails.

If you make all transactions idempotent and retry them, your app will be fine and you don't have to worry about edge cases.

HOWEVER, you have to take idempotence rather seriously. The "edge cases" of the datastore produce errors/exceptions. Errors can happen even if the underlying transaction committed successfully. Your Counter example is not an idempotent transaction - if you retry it, it can produce a miscount.

That may not matter; usually counters are fairly flexible, and the likelihood of success-but-error is low. If you need to make an *exact* count, you have to work a little harder at constructing the transaction.
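
One common way to get an exact count is to give each increment a unique operation ID and record it together with the count, so a retry after an ambiguous error becomes a no-op. The sketch below is an in-memory illustration under that assumption, not a method confirmed in this thread; Counter, Applied and Increment are hypothetical names.

```go
package main

import "fmt"

// Counter is a hypothetical entity that remembers which operations it has
// already applied. In the datastore the markers would need to be written
// in the same transaction (and entity group) as the count.
type Counter struct {
	Count   int
	Applied map[string]bool
}

// Increment applies the operation identified by opID at most once, so
// rerunning it after an ambiguous error cannot double count.
func (c *Counter) Increment(opID string) {
	if c.Applied[opID] {
		return // an earlier attempt already committed this operation
	}
	if c.Applied == nil {
		c.Applied = map[string]bool{}
	}
	c.Applied[opID] = true
	c.Count++
}

func main() {
	c := &Counter{}
	c.Increment("request-42") // committed, but suppose the caller saw an error
	c.Increment("request-42") // the retry changes nothing
	c.Increment("request-43") // a different operation counts normally
	fmt.Println("count:", c.Count)
}
```

The set of applied IDs grows without bound in this sketch; a real design would need some way to expire old markers.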

If you have further questions and can ask them at a conceptual level, we can help. However, you will not likely get implementation details out of Google. Forget timestamps, they don't really matter.

Jeff






Dan Dubois

May 19, 2015, 1:38:36 AM
to google-a...@googlegroups.com
Hi Jeff,

I just wanted some clarification on the following statement:

"HOWEVER, you have to take idempotence rather seriously. The "edge cases" of the datastore produce errors/exceptions. Errors can happen even if the underlying transaction committed successfully."

Are you saying that datastore transaction API might return an error to us but actually commit anyway? I just hope that it's not also the other way around where the transaction API returns that everything is OK but actually doesn't commit!

Dan

Jeff Schnitzer

May 19, 2015, 1:55:02 AM
to Google App Engine
It isn't the other way 'round, no. This is pretty standard fare for all distributed systems, including RDBMSes.

Let's say you get a timeout error (on GAE or Oracle). Did the txn commit or not? It's not clear; there was probably a network disconnect and you can't assume it was before or after the commit. You don't know.

The HRD was (last time I checked) documented to have some edge conditions that would throw ConcurrentModificationException even when the txn commits. I didn't find a similar statement on a casual search now; it would be nice to have some clarification from Google on this, because it would make life easier if CME was idempotently retry-able. Timeouts and whatnot can just be raised to the user as long as they aren't frequent.

On the other hand, successful commits are successful commits. You don't need to wonder about that.

If you work on distributed applications (and practically all apps are these days), this should be mandatory reading: http://en.wikipedia.org/wiki/Two_Generals%27_Problem

Jeff


Dan Dubois

May 25, 2015, 11:49:08 AM
to google-a...@googlegroups.com, je...@infohazard.org
Jeff,

Thanks for confirmation and all the extra background information.

Dan

Francis Stephens

May 28, 2015, 5:53:55 AM
to google-a...@googlegroups.com
A quick update. This question has been answered on Stackoverflow here


I will summarise the answer here, if some curious soul is browsing google-groups. The explanation was provided by Jamie Gomez, and confirmed by Ryan Barrett.

In the scenario described previously.

1: When the first Get(...) returns there is no entity, and therefore no timestamp for the transaction, but a new timestamp is generated at this point.
2: The Put(...) does not check or generate a timestamp.
3: When the transaction concludes the generated timestamp is tested against the timestamp in the datastore. If this transaction is the first, then there is no timestamp in the datastore and the transaction succeeds. If this transaction is the second then there will be a timestamp for the other transaction and this one will fail.
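
The three steps above can be modelled with a toy in-memory store. This is only a sketch of the behaviour as I understand it, not the datastore's implementation: it uses a per-key change counter instead of a timestamp, treats each key as its own entity group, is not safe for concurrent goroutines, and every name in it is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errConcurrent stands in for datastore.ErrConcurrentTransaction.
var errConcurrent = errors.New("concurrent transaction")

// store keeps a change counter per key; a key never written has counter 0.
type store struct {
	versions map[string]int
	counts   map[string]int
}

// txn remembers the counter it saw for each key it read, and buffers writes.
type txn struct {
	s      *store
	seen   map[string]int
	writes map[string]int
}

func newStore() *store {
	return &store{versions: map[string]int{}, counts: map[string]int{}}
}

func (s *store) begin() *txn {
	return &txn{s: s, seen: map[string]int{}, writes: map[string]int{}}
}

// get records the key's current counter even when no entity exists (step 1).
func (t *txn) get(key string) (int, bool) {
	if _, ok := t.seen[key]; !ok {
		t.seen[key] = t.s.versions[key]
	}
	v, ok := t.s.counts[key]
	return v, ok
}

// put only buffers the write; it checks nothing (step 2).
func (t *txn) put(key string, v int) { t.writes[key] = v }

// commit fails if any key read has changed since it was read (step 3).
func (t *txn) commit() error {
	for key, v := range t.seen {
		if t.s.versions[key] != v {
			return errConcurrent
		}
	}
	for key, v := range t.writes {
		t.s.counts[key] = v
		t.s.versions[key]++
	}
	return nil
}

func main() {
	s := newStore()
	a, b := s.begin(), s.begin()
	_, foundA := a.get("mycounter") // both transactions see no entity
	_, foundB := b.get("mycounter")
	a.put("mycounter", 1)
	b.put("mycounter", 1)
	fmt.Println("found:", foundA, foundB)
	fmt.Println("a:", a.commit()) // first commit wins
	fmt.Println("b:", b.commit()) // second commit is rejected
}
```

In this model the losing transaction has to be rerun from the start, at which point its get finds the entity the winner created.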

It was very interesting to learn that timestamps are created for unsuccessful Get(...) calls. That was fun :)

If what I have written above is inaccurate or flat out wrong I would love to hear about that. Thanks.

Francis