Versioning applications/entities

Kyle Baley

Sep 20, 2010, 2:29:54 PM
to Google App Engine
We've just released the first version of our application and are now
looking at a problem we've been avoiding until now: what is the best
way to upgrade the application to a new version that requires changes
to the datastore? We're looking at two options:

1) Big Bang Upgrade
We take the application down and run an upgrade process to update all
entities from version 1 to version 2.

Pros: Easy to maintain; intuitive
Cons: App has to be taken down for a period of time, which will
increase as time passes and more data is added to the datastore
(potentially hitting the limit for long-running processes eventually)
Question: What's a good way to take the app offline?

2) Version Entities Individually
Each entity has a version number and we have a series of commands,
each one responsible for upgrading an entity from one version to the
next. As we request entities, we check whether each one is at the
latest version. If not, we run the necessary upgrade commands in
sequence until it is.

Pros: No need to take the app offline; provides flexibility on whether
to upgrade everything at once or piecemeal
Cons: Not as intuitive; entities with different versions in the
datastore (if that matters)
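A minimal sketch of option 2, with entities modeled as plain dicts (on App Engine they would be model instances with an integer property for the schema version; the field names and the two upgrade steps here are made up for illustration):

```python
# Sketch of option 2: run an entity through a chain of per-version
# upgrade commands until it reaches the latest schema version.

LATEST_VERSION = 3

def upgrade_1_to_2(entity):
    # Hypothetical v2 change: split a single "name" field in two.
    first, _, last = entity.pop("name").partition(" ")
    entity["first_name"], entity["last_name"] = first, last

def upgrade_2_to_3(entity):
    # Hypothetical v3 change: add a lowercase email for lookups.
    entity["email_lower"] = entity["email"].lower()

UPGRADES = {1: upgrade_1_to_2, 2: upgrade_2_to_3}

def upgrade_to_latest(entity):
    """Run each necessary upgrade command in sequence on load."""
    while entity["schema_version"] < LATEST_VERSION:
        UPGRADES[entity["schema_version"]](entity)
        entity["schema_version"] += 1
    return entity  # the caller would then put() the entity back
```

An entity loaded at version 1 passes through both commands and comes out at version 3, so entities at any historical version can coexist in the datastore.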

What do other people do to upgrade their datastore for a live
application?


Sep 22, 2010, 4:13:54 AM
to Google App Engine
I'm considering the same issue currently, and I'm looking forward to
seeing other suggestions. Here's mine.

Up until now I have gotten away with making entities backward
compatible. I have designed them so they can be upgraded on the fly
and still work even when not upgraded. But this is probably coming to
an end, since some changes just can't be made that way.

I'm very much against option 1. It's hard to test the upgrade, and if
something goes wrong during the update you have no transactional logic
to protect you. The only way to roll back would be making a data dump
and restoring it if things go wrong. More time spent, more things
which need to be tested.

That leaves option 2. This has the advantage that you can start the
upgrade before changing default versions, and you don't have any
downtime. The entities still need to be somewhat backward compatible,
since you probably want to upgrade before changing the default
version. It's also easier to "recover" from, since you can probably
live with some entities needing some patching for a while if something
goes wrong - your system will run as a whole. I'm not sure how well a
version number will do in the indexes. And of course there is the
waste of an entire index, just for recording metadata about the
entity.

As mentioned, I hope we see some other update strategies in this
thread.


Tim Hoffman

Sep 22, 2010, 5:41:37 AM
to Google App Engine
Why can't you push version 2, start updating the data, but still keep
running version 1 until the data update has finished?

That, I believe, is the normal recommended approach.

Is there something about your new version 2 data model that will break
version 1?


Kyle Baley

Sep 22, 2010, 1:46:31 PM
to Google App Engine
Here's one extreme example. In the original data model, the password
isn't encrypted. In the new one, it is.

Tim Hoffman

Sep 22, 2010, 6:43:28 PM
to Google App Engine

That's one example where I think you should handle the data
discrepancy in the code.
Update your code to support both forms of password during the
transition (either the old form or the new form), then update the
data. Once you have converted everything, remove the cleartext
password.
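A sketch of that transition, with users as plain dicts and a hypothetical `hash_password` helper (SHA-256 here purely for illustration; a real app should prefer a slow KDF such as bcrypt or scrypt):

```python
import hashlib

def hash_password(cleartext, salt):
    # Illustrative only: a real app should use a slow KDF, not plain SHA-256.
    return hashlib.sha256((salt + cleartext).encode("utf-8")).hexdigest()

def check_password(user, cleartext):
    # During the transition, accept whichever form this user record has.
    if "password_hash" in user:                       # new, hashed schema
        return user["password_hash"] == hash_password(cleartext, user["salt"])
    return user["password"] == cleartext              # old cleartext schema

def migrate_user(user, salt):
    # Upgrade one user in place: hash the password, then drop the cleartext.
    if "password" in user:
        user["salt"] = salt
        user["password_hash"] = hash_password(user["password"], salt)
        del user["password"]
```

Logins keep working before, during, and after the migration, and the cleartext field disappears as each user is converted.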


Eli Jones

Sep 22, 2010, 8:07:01 PM9/22/10
It might be useful for you to use a namespace for the new version of the datastore.

Thus, you could have the "new version" of the app deployed as a non-live version of the app.. and code that "new version" to use the "new version datastore" namespace.

Then, when you are ready.. just change the live version of your app to the "new version".

So.. you could just think of your version 1 datastore as "that old customer who we're going to dump just as soon as our new version 2 datastore customer is ready"... or something like that.

It's better (I think) than adding a "version" property to all of your models or trying to maintain model consistency between app versions.
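A toy, in-memory stand-in for what the namespace switch looks like. On App Engine the equivalent is calling `namespace_manager.set_namespace()` before normal datastore operations; the "v1"/"v2" names and the conversion step are illustrative:

```python
# In-memory stand-in for a namespaced datastore. Entities stored under
# different namespaces never collide, which is what lets the "new
# version" app and the live app share one datastore safely.

class NamespacedStore(object):
    def __init__(self):
        self._data = {}
        self._namespace = ""

    def set_namespace(self, ns):
        self._namespace = ns

    def put(self, key, entity):
        self._data[(self._namespace, key)] = entity

    def get(self, key):
        return self._data.get((self._namespace, key))

def convert_v1_to_v2(store, keys):
    # Copy each v1 entity into the v2 namespace, transforming as needed.
    for key in keys:
        store.set_namespace("v1")
        entity = dict(store.get(key))
        entity["schema"] = 2          # example transformation
        store.set_namespace("v2")
        store.put(key, entity)
```

The v1 data is left untouched, so switching the live app version back is just a matter of pointing it at the old namespace again.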


Kyle Baley

Sep 22, 2010, 9:39:43 PM
to Google App Engine
That's an interesting idea. But if you do that, wouldn't you need to
first copy all the data from the v1 namespace to the v2 namespace?

Tim Hoffman

Sep 22, 2010, 11:22:07 PM
to Google App Engine
Yep, you will duplicate your data.

I don't think it's a good idea unless you only have a very small
amount of data.



Sep 23, 2010, 3:17:07 AM
to Google App Engine
I would say that's a case where you can easily be backward compatible.

Then hash them at your leisure.


Sep 23, 2010, 3:18:03 AM
to Google App Engine
Plus you'll need to sync that data until v2 is live.

Eli Jones

Sep 23, 2010, 11:27:45 AM9/23/10
Yes, you'll need to duplicate your data.

This is the cleanest way to do this, in my mind. You would have two separate versions of your app, and you could easily switch back to your old version if you realized some horrible mistake was made after switching to the new version.

The main work you would need to do for the changeover from Old Version to New Version would be:

1.  Write code that converts old datastore data to the New Version namespace data.
2.  Test New Version datastore with converted data on New App version.
3.  Schedule a downtime late at night where the app is brought offline (just upload a dummy app.yaml for the Old Version that points /.* to some message page, or you could be official and have a "Maintenance Period" version of your app that you make live, so that you make no changes whatsoever to the Old Version), and run the datastore conversion code after you are comfortable that no users are using the site and all of them are viewing the "maintenance period" messages.
4.  Once complete, make New Version app the live version.

To me, it seems like a massive headache to try to convert the data in place, in the same datastore: you have to have new model names, or at least some new "version" property on your models, and then your key_names have to be different, etc. If you screw something up, that's a potential headache for your live app. With the New Version namespace, you can freely move ahead with experimenting with your coding, and even make plenty of changes to your models (if you think they are beneficial), without thinking "it's going to be a pain to figure out how to keep track of the old version and new version entities, and now I have to add a new model or property definition to my schema code."

With a namespace, you just have to add the correct mapping and conversion code to your datastore converter, make schema changes to your New Version app code, and leave the Old Version datastore and code alone.

You can also get fancy with your converter (depending on how the conversion actually works): have your converter code get a keys_only cursor for each model, serialize that cursor, and pass it off to a task that iterates through the cursor and breaks the keys into batches, firing off tasks to do the batches in parallel. And, to make the process simpler, you could just use the deferred API (so you don't have to deal with setting up task handler URLs or anything like that).

However big your datastore is, and however much money it might cost to have your data duplicated (it costs $0.01 a day for 2 GB of data above the free limit), I think it would be worth it. It would be valuable to learn to do it this way, since a large-scale site would (or should) do something like this.
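The fan-out described above can be sketched like this; `defer` is a stand-in for `deferred.defer` (which would serialize the call onto a real task queue), and the sorted key walk stands in for a keys_only query with a cursor:

```python
BATCH_SIZE = 100
task_queue = []   # stand-in for the App Engine task queue

def defer(func, *args):
    # deferred.defer would enqueue this call as a task instead.
    task_queue.append((func, args))

def convert_batch(keys, source, dest):
    # Convert one batch of entities into the new-version store.
    for key in keys:
        entity = dict(source[key])
        entity["schema"] = 2                  # example transformation
        dest[key] = entity

def start_conversion(source, dest):
    # A keys_only query with a cursor in App Engine; a plain walk here.
    keys = sorted(source)
    for i in range(0, len(keys), BATCH_SIZE):
        defer(convert_batch, keys[i:i + BATCH_SIZE], source, dest)

def run_pending_tasks():
    # On the task queue service these batches would run in parallel.
    while task_queue:
        func, args = task_queue.pop(0)
        func(*args)
```

Splitting the work into small batches keeps each task well under the request deadline, however large the datastore grows.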

andy stevko

Sep 23, 2010, 1:03:07 PM9/23/10
Not being very familiar with the namespace APIs: is it possible with the approach below to convert only a portion of the datastore object model and not duplicate the entire datastore?

For example - convert Users and Passwords but leave the weakly related Accounts in place.

John McLaughlin

Sep 24, 2010, 10:33:30 AM
to Google App Engine
For this particular case I would consider doing a two pass process.
Add the encrypted password to the datastore on the first pass. Then
switch the code to use the encrypted passwords, and finally remove the
old passwords. The advantages are that you can do your final testing
on live data, it's easy to roll back, and there's no need to code a
special transition case.

Kyle Baley

Sep 27, 2010, 9:37:58 PM
to Google App Engine

I can see a benefit to this approach but isn't there a chance some
data would be missed? If customers are using the app while you're
testing the new version, wouldn't you need to re-import the data from
version 1 and convert it again while the app is down?

Eli Jones

Sep 27, 2010, 10:56:21 PM9/27/10
Yes, when you do the official switch to put the new version of your app live for your customers, you would need to take the app down and re-convert all the datastore data.

Now, you could put a lot of time and energy figuring out a synchronization process.. but it seems like that would take a lot of extra coding.

You'd have to have some process to compare the old datastore entities to the new datastore entities (either some sort of "datemodified" property or entity checksum.. or some sort of prop by prop comparison) and really.. seems like that would take more resources than just doing a straight re-conversion.

I suppose if the re-conversion took hours and hours, and cost a lot of money, it might make sense. But then you'd have to figure: would the re-conversion cost more money (and customer time) than the price of your time (and saved customer time) to code up an iterative sync process?

Trying to straddle two databases (one for an old version of code and one for a new version of code) that stay synchronized can get pretty mind-bending. But that all depends on the complexity of your datastore.

So, synchronization is sort of a catch 22.

If your datastore is so big and complex that synchronization takes less time than total re-conversion, the time to code out a sync process might be prohibitive.

If your datastore is so simple that coding a synchronization process is easy, then re-conversion probably wouldn't take much time to run..

But, this is just a guess on my part.. I don't know what your datastore changes are going to look like.

Robert Kluin

Sep 27, 2010, 11:16:44 PM
I have several apps with complex models and relationships. Personally,
I have found that using a version property on my models has made most
of my upgrades and schema changes fairly straightforward. Although it
does sometimes take an interim version of the code capable of handling
both versions; or, possibly, one that sets the "version" explicitly on
a put() so that it can be upgraded later.

In some cases I simply leave a version of the 'old' and 'new' logic in
place. When dealing with new data I use the new logic; when dealing
with existing data I use the version of the logic corresponding to the
entity or set of entities I am dealing with. Of course the structure
of my code itself facilitates this method, but it has proven very easy
to make even significant schema changes. Basically I push the interim
version then kick off my upgrade handlers. Works really well if you
do not want much, if any, downtime. YMMV.

I like the idea of using namespaces to version the data too. But
figuring out how to either keep stuff in sync or going offline for a
full conversion could be tricky. I suppose you could add some type of
change indicator that gets set on _all_ of your models, then increment
it each time you run a conversion. That would let you identify what
has changed since your last run... possibly minimizing downtime?


Kyle Baley

Sep 28, 2010, 9:20:37 AM9/28/10
to Google App Engine
I asked Fred Sauer at Google and his advice sounded like what Robert,
and possibly John if I understand it correctly, describe:

The best practice is to deploy a version of your app which understands
the old model, but which on update/insert creates the new model. For
minor changes (like a new property), you can just keep running this
way. For bigger changes, or if you want to clean up, use the Mapper
API (or your own task queues) to migrate all existing entities. Once
migrated, you can remove the support for the old data structure from
the app.

Advantages I can see to this approach:
- No downtime
- Data store doesn't contain various versions of entities. There are
only two versions: old and new. And the old ones will get cleaned up
in short order.

Disadvantage is that you need to support two models for a time and you
need to re-deploy to remove support for the old version. Doesn't seem
too big a deal since I doubt we'd deploy specifically for this
purpose. More likely, we'd bundle it with another deployment.
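The "understand the old model, write the new model" pattern Fred describes can be sketched like this; the item shape and field names are made up (on App Engine these would be model properties, and the save step would be a put()):

```python
# Lazy migration: read either schema, always write the new one, so
# entities migrate as they are touched.

def load_item(raw):
    # Read either schema; normalize to the new shape in memory.
    item = dict(raw)
    if isinstance(item.get("tags"), str):         # old schema: "a,b,c"
        item["tags"] = item["tags"].split(",")
    return item

def save_item(item, store, key):
    # Always write the new schema on update/insert.
    assert isinstance(item["tags"], list)
    store[key] = item
```

Entities that are never touched stay in the old shape, which is why a Mapper-style sweep is still needed before the old-schema code path can be deleted.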


Sep 29, 2010, 4:11:37 AM
to Google App Engine
Is that version number indexed?

And if so, how is performance on puts (and searches on that index)?

Robert Kluin

Sep 29, 2010, 11:55:58 AM9/29/10
Yes, I usually use the built-in index (like any field that does not
get indexed=False). The field is never used in any other (composite)
indexes. I leave it indexed so that I can very easily run a query such
as: TheModel.all().filter('version <', 7).fetch(50). That lets me
have background processing slowly (1 serial task), or quickly (many
parallel tasks), update my entities in a reliable way. Performance is
fine, just like any other property.

You could also forgo the index and instead loop over all entities of
that kind to do a bulk update. Or, you could simply leave code that
knows how to handle the old schema in place, and update entities as
you encounter them.
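Robert's background pass can be sketched like this; `fetch_outdated` stands in for the `TheModel.all().filter('version <', LATEST).fetch(50)` query, with entities as plain dicts:

```python
LATEST = 7

def fetch_outdated(entities, limit):
    # Stand-in for TheModel.all().filter('version <', LATEST).fetch(limit).
    return [e for e in entities if e["version"] < LATEST][:limit]

def upgrade(entity):
    # Whatever per-version work is needed; here we just stamp the version.
    entity["version"] = LATEST

def background_pass(entities, batch_size=50):
    # One serial task: upgrade a batch, then repeat until the query
    # comes back empty. On App Engine each iteration would defer()
    # the next pass instead of looping in-process.
    passes = 0
    while True:
        batch = fetch_outdated(entities, batch_size)
        if not batch:
            return passes
        for entity in batch:
            upgrade(entity)
        passes += 1
```

Because the query itself finds the remaining work, the process is restartable: a crashed or re-run task simply picks up whatever entities are still behind.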

