data migration as application evolves


__doc__

Apr 14, 2008, 4:33:24 PM
to Google App Engine
How do you migrate an existing body of data when you update an application?

P.S. also opened as http://code.google.com/p/googleappengine/issues/detail?id=181
which is currently set to Invalid. Please reopen it as you see fit, as I
don't think this question/need is addressed in any documentation, API,
or concept, and it is therefore something for Google to do.

Craig

Apr 14, 2008, 4:43:25 PM
to Google App Engine

__doc__

Apr 14, 2008, 6:26:23 PM
to Google App Engine
On Apr 14, 10:43 pm, Craig <craig.carpen...@gmail.com> wrote:
> http://code.google.com/appengine/articles/bulkload.html

This does not cover data migration; bulk loading != data migration.

Example:
-1 Write a fancy myapp 1.0 that handles music (simplified, the model
looks like album 1:n song) and deploy it.
-2 Decide it would be great if an offer entity were connected to your
albums and songs, so you can price them and store the offer in a
customer's contract, such that it reflects the price at the time of sale.
-3 Change your application code to work with the revised model.
-4 Change all your existing data and deploy myapp 1.1.

That is a simple example, but basically it means you should be able
to:
1) add new fields with default values to all instances of a type
2) remove fields from all instances of a type
3) crunch through all existing data to build new instances and connect
them to the existing data.
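To make the three operations concrete, here is a minimal sketch of a run-once migration for the album/song example. It assumes a hypothetical in-memory store in which each entity is a plain dict with a `kind`, a `key`, and its properties; the `Offer` kind, the `price` and `genre` properties, and `migrate_to_1_1` are invented for illustration and are not App Engine APIs.

```python
# Hypothetical in-memory stand-in for the datastore: each entity is a dict.
store = [
    {"kind": "Album", "key": "a1", "title": "Blue", "genre": "jazz"},
    {"kind": "Song", "key": "s1", "album": "a1", "title": "Intro"},
    {"kind": "Song", "key": "s2", "album": "a1", "title": "Outro"},
]

def migrate_to_1_1(entities, default_price=0.99):
    """Run-once migration from myapp 1.0 to 1.1 (illustrative only)."""
    new_entities = []  # collected separately so we don't grow the list mid-loop
    for entity in entities:
        if entity["kind"] == "Song":
            # 1) Add a new field with a default value to every Song.
            entity.setdefault("price", default_price)
            # 3) Build a new Offer entity and connect it to the existing Song.
            new_entities.append({
                "kind": "Offer",
                "key": "offer-" + entity["key"],
                "song": entity["key"],
                "price": entity["price"],
            })
        elif entity["kind"] == "Album":
            # 2) Remove a field from every Album.
            entity.pop("genre", None)
    entities.extend(new_entities)
    return entities

migrate_to_1_1(store)
```

In a real application each step would also have to save the changed entities; the sketch only shows the shape of the transformation.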

barryhunter

Apr 14, 2008, 7:42:27 PM
to Google App Engine
Seems to me this would be highly specific to your application, and it
would be relatively easy to write some loops to migrate.

However, the effectively freeform datastore means that many large-scale
migrations are not really needed.

1) You just add it to your model definition. When loading an entity that
doesn't have that value set, it will be set to the default; the
application doesn't know (or care) that it was not set in the original.
The next time (if ever) it is written, it will be saved.

2) Just remove it from your model definition. When loading 'old' data it
will just be ignored, and when you save, it will be rewritten without it.

3) Create a loop that does that. Or write code so it's not needed.
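Points 1 and 2 describe how the model layer treats stored data. The following is a plain-Python simulation of that behaviour, not the actual `google.appengine.ext.db` code: the hypothetical `load` function fills in defaults for properties the stored entity lacks and drops stored properties the model no longer declares, so the next `save` writes the entity in the new shape.

```python
# Hypothetical model definition: property name -> default value.
USER_MODEL = {"name": "", "email": "", "newsletter": False}

def load(model, stored):
    """Simulate loading a stored entity into the current model definition."""
    # Missing properties get the model's default; properties the model
    # no longer declares are silently dropped.
    return {prop: stored.get(prop, default) for prop, default in model.items()}

def save(entity):
    """Simulate saving: only what the loaded instance holds is written back."""
    return dict(entity)

old_row = {"name": "Ann", "email": "ann@example.com", "fax": "555-0100"}
print(save(load(USER_MODEL, old_row)))
# {'name': 'Ann', 'email': 'ann@example.com', 'newsletter': False}
```

Note that the default here is a constant; a value that has to be computed from other data is not covered by this mechanism.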

__doc__

Apr 15, 2008, 5:32:49 AM
to Google App Engine
On Apr 14, 4:42 pm, barryhunter <BarryBHun...@googlemail.com> wrote:
> Seems to me this would be highly specific to your application, and it
Data migration isn't specific to any application; it's specific to
evolving your application logic and changing your data accordingly.

> would be relatively easy to write some loops to migrate.
As far as I know, you can't "loop" on appengine, because there is a
query limit of 1000 rows and a runtime limit for a method of some
200ms.

> However, the effectively freeform datastore means that many large-scale
> migrations are not really needed.
I don't believe that, and it is in fact easily refutable. ZODB is a
freeform data store in that it stores objects and their members. You
_do_ perform data migration tasks on data stored in ZODB when you
write code that wouldn't be compatible with the old data.

> 1) You just add it to your model definition. When loading an entity that
> doesn't have that value set, it will be set to the default; the
> application doesn't know (or care) that it was not set in the original.
> The next time (if ever) it is written, it will be saved.
That works only for the limited case where the value can have a
constant default, which is not always desirable.

> 2) Just remove it from your model definition. When loading 'old' data it
> will just be ignored, and when you save, it will be rewritten without it.
That only takes care of members that need no further processing (not,
for instance, when the removal is really a move to somewhere else).

> 3) Create a loop that does that.
As far as I know, impossible on appengine.

> Or write code so it's not needed.
That is a bad option. It is bad because it forces you to write
application code that has to care about the past from the beginning of
time, makes your code unrefactorable, and buries your essential
application in reams of conditional code that takes care of every
change up to the present.

The alternative of refactoring your code and getting rid of the
baggage, and then writing a run once migrate job to take care of the
conversion is much more maintainable.
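One way to keep such run-once jobs organised is to stamp each entity with a schema version and register one upgrade step per version, so the application code only has to understand the current shape. A minimal sketch, assuming entities are plain dicts with a hypothetical `schema_version` property; the `migration` decorator and both example steps are invented for illustration.

```python
# Registry of upgrade steps: from_version -> function that upgrades in place.
MIGRATIONS = {}

def migration(from_version):
    """Register a function that upgrades an entity from `from_version` to the next."""
    def register(fn):
        MIGRATIONS[from_version] = fn
        return fn
    return register

@migration(1)
def rename_name_to_title(entity):
    # Version 2 renamed the "name" property to "title".
    entity["title"] = entity.pop("name")

@migration(2)
def add_price(entity):
    # Version 3 added a "price" property; existing songs get a default.
    entity["price"] = 0.99

CURRENT_VERSION = max(MIGRATIONS) + 1  # 3

def upgrade(entity):
    """Apply every pending step, in order, until the entity is current."""
    version = entity.get("schema_version", 1)  # untagged entities count as version 1
    while version < CURRENT_VERSION:
        MIGRATIONS[version](entity)
        version += 1
        entity["schema_version"] = version
    return entity

print(upgrade({"name": "Intro"}))
# {'title': 'Intro', 'schema_version': 3, 'price': 0.99}
```

The run-once job is then `upgrade` plus a save for every entity; once everything reports the current version, the old steps can be deleted along with the baggage.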

nayrelgof

May 12, 2008, 6:54:00 PM
to Google App Engine
I want very much to start using Google App Engine, at least for
personal projects, and then possibly for professional after that. Data
Migration is the only thing holding me back. I cannot bring myself to
use a framework that doesn't let me be wrong. I will name a field
wrong. I will come back later and want to rename it. As far as I can
tell, that's not even possible.

I could add a field, trigger an action that migrates the data, and
then remove the old field. However, I can only do that to 1000 records
at a time.

I could add a wrapper object that handles old versions of records, but
how many versions can I handle before I go insane?

Renaming is just the start. What if my desired format changes? What if
I find that I need to extract part of a model into another model, or
rename one altogether? In my current project, I'm about to remove a
model that was unnecessarily over-engineered. I need to migrate the
data into its parent object. As far as I can tell, that's not possible
in Google App Engine.
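Setting aside whether App Engine can run it in one go, the transformation itself (folding a child model back into its parent) is a copy-then-delete pass. A minimal sketch over hypothetical in-memory tables; the `profiles`/`settings` names, the `parent` link property, and the rule that the parent's value wins on a clash are all assumptions.

```python
# Hypothetical tables: parent key -> properties, child key -> properties.
profiles = {"p1": {"name": "Ann"}, "p2": {"name": "Bob"}}
settings = {
    "s1": {"parent": "p1", "theme": "dark", "language": "en"},
    "s2": {"parent": "p2", "theme": "light", "language": "fr"},
}

def fold_children_into_parents(parents, children, link="parent"):
    """Copy each child's properties onto its parent, then delete the child."""
    for child_key, child in list(children.items()):  # snapshot: we delete below
        parent = parents[child[link]]
        for prop, value in child.items():
            if prop != link:
                # If both define a property, keep the parent's value.
                parent.setdefault(prop, value)
        # Delete the child only after its data lives on the parent.
        del children[child_key]

fold_children_into_parents(profiles, settings)
print(profiles["p1"])  # {'name': 'Ann', 'theme': 'dark', 'language': 'en'}
```

Writing the parent before deleting the child means an interrupted run leaves some data duplicated rather than lost, and rerunning the pass over the remaining children is harmless.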

Iterative development is all about making constant changes, and I
can't do that without data migrations. No framework is ready for
development use, much less production, until it supports some form of
data migrations.

Ross Ridge

May 12, 2008, 8:50:12 PM
to Google App Engine
nayrelgof wrote:
> I want very much to start using Google App Engine, at least for
> personal projects, and then possibly for professional after that. Data
> Migration is the only thing holding me back. I cannot bring myself to
> use a framework that doesn't let me be wrong. I will name a field
> wrong. I will come back later and want to rename it. As far as I can
> tell, that's not even possible.

You can create Model properties that have different names than are
used in the datastore so you can effectively rename "fields" in your
source code as much as you want.

> Renaming is just the start. What if my desired format changes? What if
> I find that I need to extract part of a model into another model, or
> rename one altogether?

You can "rename" models in your source code by just assigning them to
another variable:

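from google.appengine.ext import db

class old_model_name(db.Model):
    # The Python attribute is new_property_name, but the value is still
    # stored in the datastore under the old name.
    new_property_name = db.StringProperty(name="old_property_name")

new_model_name = old_model_name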

As for other kinds of changes, I'd consider migrating the data one
entity at a time as it's loaded. I think you can do something like
this:

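from google.appengine.ext import db

class thingie(db.Model):
    name = db.StringProperty()       # old property
    names = db.StringListProperty()  # new property

    def __init__(self, *args, **kwargs):
        # Pass positional arguments (such as parent) through as well, since
        # the datastore may supply them when building an instance from a
        # stored entity.
        value = kwargs.get("name")
        if value is not None:
            # Move the old single value into the new list property.
            kwargs["names"] = [value]
            kwargs["name"] = None
        db.Model.__init__(self, *args, **kwargs)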

Eventually, once most of the entities have been migrated this way, you
can just load and save the rest 1000 at a time and then remove the
"name" property and the __init__ function from your source code.
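"Load and save the rest 1000 at a time" can be spread over several requests so that no single request runs into the limits __doc__ mentions: each request migrates one bounded batch in key order and returns the last key it handled, and the next request resumes after it. A minimal sketch against a hypothetical in-memory store; `fetch_batch`, `migrate_batch`, and `BATCH_SIZE` are assumptions rather than App Engine APIs, and the `while` loop stands in for a client re-issuing requests until the job reports it is done.

```python
from bisect import bisect_right

# Hypothetical stand-in for the datastore: key -> entity properties.
store = {f"thingie{i:04d}": {"name": f"item {i}", "names": []} for i in range(250)}

BATCH_SIZE = 100  # assumption: comfortably below the 1000-row query limit

def fetch_batch(after_key, limit):
    """Return up to `limit` keys that sort strictly after `after_key`."""
    keys = sorted(store)
    start = 0 if after_key is None else bisect_right(keys, after_key)
    return keys[start:start + limit]

def migrate_entity(entity):
    """The same change thingie.__init__ makes: move "name" into "names"."""
    if entity.get("name") is not None:
        entity["names"] = [entity["name"]]
        entity["name"] = None

def migrate_batch(after_key=None, limit=BATCH_SIZE):
    """Migrate one batch; return the key to resume after, or None when done."""
    keys = fetch_batch(after_key, limit)
    for key in keys:
        migrate_entity(store[key])  # a real job would also save the entity
    return keys[-1] if len(keys) == limit else None

# Each loop iteration stands in for one short request.
cursor, requests = migrate_batch(), 1
while cursor is not None:
    cursor, requests = migrate_batch(cursor), requests + 1
print(requests)  # 3
```

Because the migration step skips entities that are already converted, a batch that is retried after a failed request does no harm.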

For more complex migrations you'd have to implement something like the
CSV uploader that migrates your data instead of uploading it.

Ross Ridge

Filip

May 15, 2008, 8:16:51 AM
to Google App Engine
Great suggestions.
Also, __doc__ should probably star issue 6 and/or issue 112?

Filip

Christopher Blunck

May 23, 2008, 3:06:08 PM
to Google App Engine
I agree with __doc__. There are many migrations (especially those of
a relational nature) that are too clumsy to handle at subsequent
object retrieval. Not only is that inefficient from a computing
standpoint, it's a nightmare from a code-maintenance perspective.

Say it's day 0 and I have a User class with firstName, lastName,
userName, and type fields. Now let's say it's day 50 and I realize
that I need some additional model information for the User when their
type is "employee". Furthermore, let's expand that by saying I have
other fields that are needed when the user's type is "customer" or
"owner". You can't write if logic all over the place that looks for
different fields in the User based on the type. At that point you're
throwing out decades of advances in OO.

It's much easier to say "prior to version X of the datastore, all I had
were employees. Thus, to migrate from X to X+1, I need to create
Employee instances for each User instance and copy the User data over,
while assigning reasonable defaults to the new fields."
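That X to X+1 step could look roughly like the following. It is a sketch over plain dicts, not datastore code; the `employee_id` and `department` fields and their defaults are hypothetical.

```python
users = [
    {"key": "u1", "firstName": "Ann", "lastName": "Lee", "userName": "alee", "type": "employee"},
    {"key": "u2", "firstName": "Bo", "lastName": "Kim", "userName": "bkim", "type": "employee"},
]

# Hypothetical new Employee-only fields and their "reasonable defaults".
NEW_EMPLOYEE_DEFAULTS = {"employee_id": None, "department": "unassigned"}

def migrate_users_to_employees(users):
    """Version X -> X+1: create one Employee per User (all users were employees)."""
    employees = []
    for user in users:
        # Link the new Employee back to the User it was created from.
        employee = {"key": "emp-" + user["key"], "user": user["key"]}
        # Copy the shared User data over ...
        for prop in ("firstName", "lastName", "userName"):
            employee[prop] = user[prop]
        # ... and assign defaults to the new fields.
        employee.update(NEW_EMPLOYEE_DEFAULTS)
        employees.append(employee)
    return employees

employees = migrate_users_to_employees(users)
```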

Also ... what happens when you change the multiplicity of
relationships? Or change the relationship paths? Rather than a User
having a single Address, they now have an [Address]. And then what
happens next when you need to subclass Address into USAddress and
EUAddress?
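Both changes can also be written as one-off passes. The sketch below assumes a hypothetical dict representation: first the single `address` property becomes an `addresses` list, then each address is tagged with a new class name using a made-up `country` rule (an illustration only; a real split would depend on your data).

```python
EU_COUNTRIES = {"DE", "FR", "IT", "ES", "NL"}  # abbreviated list, for illustration

def to_address_list(user):
    """Change multiplicity: a single `address` becomes the list `addresses`."""
    address = user.pop("address", None)
    user["addresses"] = [address] if address is not None else []
    return user

def address_class(country):
    """Pick the new subclass for an address (hypothetical rule)."""
    if country == "US":
        return "USAddress"
    if country in EU_COUNTRIES:
        return "EUAddress"
    return "Address"  # anything else stays the base class

def classify_addresses(user):
    """Subclassing step: record the concrete class on each address."""
    for address in user["addresses"]:
        address["class"] = address_class(address["country"])
    return user

user = {"name": "Ann", "address": {"street": "1 Main St", "country": "US"}}
classify_addresses(to_address_list(user))
print([a["class"] for a in user["addresses"]])  # ['USAddress']
```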

The point that __doc__ and I are trying to make is: your persistence
model will invariably change over time. An effective way to manage
that change is through data migration scripts that can be applied at
specific points in time to cutover data from version X to version X +
1.

I'm currently evaluating AppEngine for use on my site, but like
__doc__ I'm holding back because a clear-cut migration path doesn't
exist for the DataStore.


-c

Edoardo Marcora

May 23, 2008, 4:26:36 PM
to Google App Engine
I agree with Chris and __doc__ that migrations are a much-needed
feature. I submitted an issue to the tracker; please star it if you
agree that having migrations would be nice
<http://code.google.com/p/googleappengine/issues/detail?id=385>.

However, let's not forget that this is still a beta release and Google
has already given us a lot to play with... things will only get better!

Dado

On May 23, 12:06 pm, Christopher Blunck <christopher.blu...@gmail.com>
wrote: