I haven't seen a framework that does data versioning like that, but I
haven't looked very hard, as I rolled my own abstraction layer over the
datastore.
Here's an excerpt from something I wrote a while back and never got
around to publishing. It's a little outdated compared to what I do nowadays,
but it captures the essence of how the framework handles schema updates:
=======
Entity Versions and Lazy Migration
Before diving into data versioning, let's quickly go over how we will
use the methods we created. First, let's create a BaseEntity object in
the datastore:
...
BaseEntity myEntity = new BaseEntityImpl(); // BaseEntity is an interface; instantiate the implementation
myEntity.setCreatedAt(new Date());
myEntity.setUpdatedAt(myEntity.getCreatedAt());
Key myEntityKey = datastore.put(myEntity.toEntity());
...
And that's how it's done. (Yes, we did not set the version; more on that
later.) Below is how we would read from the datastore (assuming we
have the entity key ready):
...
Entity entity = datastore.get(myEntityKey);
BaseEntity myEntity = new BaseEntityImpl();
entity = myEntity.fromEntity(entity);
...
After the call to fromEntity(Entity entity) above, myEntity is now
updated with the data from the datastore and is ready to be used. And
lastly, this is how we would update an entity in the datastore:
...
Entity entity = datastore.get(myEntityKey);
BaseEntity myEntity = new BaseEntityImpl();
entity = myEntity.fromEntity(entity);
myEntity.setUpdatedAt(new Date());
myEntityKey = datastore.put(myEntity.updateEntity(entity));
...
So you can see why we have toEntity() and updateEntity(Entity entity)
as separate methods: updateEntity(Entity entity) does not touch the
entity key, which ensures that when we put the entity to the datastore,
it updates the exact same entity instead of creating a new one.
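The create/read/update pattern above can be exercised without App Engine. This is a minimal sketch built on hypothetical stand-ins: `SimpleEntity` plays the role of App Engine's `Entity` (a nullable key plus named properties), and `InMemoryDatastore` mimics `put()`/`get()` by allocating a key only for keyless entities. The `BaseEntity` interface and `BaseEntityImpl` are collapsed into one class, and the field-copying bodies of `toEntity()`, `updateEntity()` and `fromEntity()` are assumptions about what the earlier, unshown methods do.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for App Engine's Entity: a key (null until the first put) plus named properties.
class SimpleEntity {
    Long key;
    final Map<String, Object> properties = new HashMap<>();

    Object getProperty(String name) { return properties.get(name); }
    void setProperty(String name, Object value) { properties.put(name, value); }
}

// Hypothetical in-memory datastore: put() allocates a key for keyless entities
// and overwrites the stored copy when the key is already set.
class InMemoryDatastore {
    private final Map<Long, SimpleEntity> rows = new HashMap<>();
    private long nextId = 1;

    long put(SimpleEntity entity) {
        if (entity.key == null) {
            entity.key = nextId++;
        }
        rows.put(entity.key, copy(entity));
        return entity.key;
    }

    SimpleEntity get(long key) {
        SimpleEntity stored = rows.get(key);
        return stored == null ? null : copy(stored);
    }

    int size() { return rows.size(); }

    private static SimpleEntity copy(SimpleEntity source) {
        SimpleEntity c = new SimpleEntity();
        c.key = source.key;
        c.properties.putAll(source.properties);
        return c;
    }
}

// BaseEntity interface and implementation collapsed into one class for brevity.
class BaseEntityImpl {
    static final String NcreatedAt = "createdAt";
    static final String NupdatedAt = "updatedAt";
    static final String Nversion = "version";

    private Date createdAt;
    private Date updatedAt;
    private Long version = 1L;

    // Fresh entity with no key: putting it creates a new record.
    SimpleEntity toEntity() {
        return updateEntity(new SimpleEntity());
    }

    // Copies fields onto an existing entity without touching its key: putting it overwrites that record.
    SimpleEntity updateEntity(SimpleEntity entity) {
        entity.setProperty(NcreatedAt, createdAt);
        entity.setProperty(NupdatedAt, updatedAt);
        entity.setProperty(Nversion, version);
        return entity;
    }

    // Populates this object from a stored entity and hands the entity back.
    SimpleEntity fromEntity(SimpleEntity entity) {
        createdAt = (Date) entity.getProperty(NcreatedAt);
        updatedAt = (Date) entity.getProperty(NupdatedAt);
        version = (Long) entity.getProperty(Nversion);
        return entity;
    }

    Date getCreatedAt() { return createdAt; }
    void setCreatedAt(Date d) { createdAt = d; }
    Date getUpdatedAt() { return updatedAt; }
    void setUpdatedAt(Date d) { updatedAt = d; }
}

class CrudDemo {
    public static void main(String[] args) {
        InMemoryDatastore datastore = new InMemoryDatastore();

        // Create
        BaseEntityImpl created = new BaseEntityImpl();
        created.setCreatedAt(new Date());
        created.setUpdatedAt(created.getCreatedAt());
        long key = datastore.put(created.toEntity());

        // Read, then update in place
        SimpleEntity entity = datastore.get(key);
        BaseEntityImpl loaded = new BaseEntityImpl();
        entity = loaded.fromEntity(entity);
        loaded.setUpdatedAt(new Date());
        long sameKey = datastore.put(loaded.updateEntity(entity));

        System.out.println(key == sameKey);   // true: the same record was overwritten
        System.out.println(datastore.size()); // 1
    }
}
```

Putting `loaded.toEntity()` instead of `loaded.updateEntity(entity)` in the last step would allocate a second key and leave two records, which is exactly the duplication that keeping `updateEntity()` separate avoids.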
Now I have to confess: our implementation of the BaseEntity interface is
not quite correct. We will have to modify it in order to take
advantage of data versioning.
To illustrate how versioning works, let's assume that for some stupid
reason we want to store the createdAt date as a string! So we change
our implementation as follows:
public class BaseEntityImpl implements BaseEntity {
    /* variables */
    private String createdAt;
    private Date updatedAt;
    private Long version;
    ...
}
OK, after making the above change, we update the interface and the
getters and setters so that everything compiles. We also change our
fromEntity(Entity entity) method to reflect the new state of things:
...
@Override
public Entity fromEntity(Entity entity) {
    // createdAt is now expected to be stored as a String
    setCreatedAt((String) entity.getProperty(BaseEntity.NcreatedAt));
    setEntityKey(entity.getKey());
    setUpdatedAt((Date) entity.getProperty(BaseEntity.NupdatedAt));
    setVersion((Long) entity.getProperty(BaseEntity.Nversion));
    return entity;
}
...
But here's the catch: what if you already have 10,000,000 entities in
your production datastore whose createdAt properties are stored as
Date objects? The new fromEntity() would throw a ClassCastException on
every one of them. This is where data versioning becomes useful: it
lets us do what I like to call Lazy Migration (the idea is probably not
new, so you might know it by another name).
The concept behind lazy migration is to migrate data from the old
version to the new version only when that data is actually read. Since we
are working with Google App Engine and paying for CPU cycles, this may
save you some money: entities that are never read are never migrated,
which matters especially if your schema happens to change more often
than it should. Let's go back to our implementation and set it up to
support this concept.
First, we will go over what the implementation should have looked like
when createdAt was stored as Date:
public class BaseEntityImpl implements BaseEntity {
    /* variables */
    private Date createdAt;
    private Date updatedAt;
    private Long version = 1L; // 1L, not 1: an int literal cannot be boxed to Long
    ...
}
The only change is that version is now initialized to 1. Now that we have
changed our model to store createdAt as a String, the implementation looks like this:
public class BaseEntityImpl implements BaseEntity {
    /* variables */
    private String createdAt;
    private Date updatedAt;
    private Long version = 2L;
    ...
}
And here is how we do the lazy migration inside fromEntity(Entity
entity):
...
@Override
public Entity fromEntity(Entity entity) {
    Object property;
    Long version = (Long) entity.getProperty(BaseEntity.Nversion);
    // convert from old version (Date) to new version (String)
    property = entity.getProperty(BaseEntity.NcreatedAt);
    // compare against 1L: a Long is never equals() to the Integer 1
    if (version.equals(1L)) { // old version, do conversion
        setCreatedAt(((Date) property).toString());
        entity.setProperty(BaseEntity.NcreatedAt, getCreatedAt());
    } else { // new version, no worries
        setCreatedAt((String) property);
    }
    setEntityKey(entity.getKey());
    setUpdatedAt((Date) entity.getProperty(BaseEntity.NupdatedAt));
    // we converted any old data to new data above, so set the new version
    entity.setProperty(BaseEntity.Nversion, getVersion());
    return entity;
}
...
This now fulfills the fromEntity(Entity entity) contract of returning
an entity that is updated to the latest version. Notice that we never
set the version variable. It is now effectively read-only, and if
you'd like, you could make it static final.
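For a self-contained check of the migration logic, here is a sketch of the version 2 `fromEntity()` running against a hypothetical `SimpleEntity` stand-in for App Engine's `Entity` (keys and `setEntityKey()` are left out). It keeps the version as a `static final` constant, as suggested above; the constant name `VERSION` is an assumption.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for App Engine's Entity (properties only; keys are omitted here).
class SimpleEntity {
    private final Map<String, Object> properties = new HashMap<>();

    Object getProperty(String name) { return properties.get(name); }
    void setProperty(String name, Object value) { properties.put(name, value); }
}

// Version 2 of the implementation: createdAt is now stored as a String.
class BaseEntityImpl {
    static final String NcreatedAt = "createdAt";
    static final String NupdatedAt = "updatedAt";
    static final String Nversion = "version";
    // Read-only schema version of this class.
    private static final Long VERSION = 2L;

    private String createdAt;
    private Date updatedAt;

    SimpleEntity fromEntity(SimpleEntity entity) {
        Long storedVersion = (Long) entity.getProperty(Nversion);
        Object property = entity.getProperty(NcreatedAt);
        if (storedVersion.equals(1L)) {
            // Version 1 stored a Date: convert it and write the new form back onto the entity.
            createdAt = ((Date) property).toString();
            entity.setProperty(NcreatedAt, createdAt);
        } else {
            createdAt = (String) property;
        }
        updatedAt = (Date) entity.getProperty(NupdatedAt);
        // Every older version has been converted above, so stamp the entity as current.
        entity.setProperty(Nversion, VERSION);
        return entity;
    }

    String getCreatedAt() { return createdAt; }
    Long getVersion() { return VERSION; }
}

class MigrationDemo {
    public static void main(String[] args) {
        Date created = new Date(0L);
        SimpleEntity old = new SimpleEntity();
        old.setProperty(BaseEntityImpl.NcreatedAt, created);
        old.setProperty(BaseEntityImpl.NupdatedAt, created);
        old.setProperty(BaseEntityImpl.Nversion, 1L);

        BaseEntityImpl myEntity = new BaseEntityImpl();
        SimpleEntity migrated = myEntity.fromEntity(old);

        System.out.println(migrated.getProperty(BaseEntityImpl.Nversion));                      // 2
        System.out.println(migrated.getProperty(BaseEntityImpl.NcreatedAt) instanceof String); // true
        System.out.println(myEntity.getCreatedAt().equals(created.toString()));                // true
    }
}
```

The returned entity already carries the String form and version 2, so handing it to `updateEntity()` and putting it back is all it takes to persist the migration.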
Hopefully you can now see the reason for this pattern of reading
entities from and writing them to the datastore. Let's look again at the
update code:
...
Entity entity = datastore.get(myEntityKey);
BaseEntity myEntity = new BaseEntityImpl();
entity = myEntity.fromEntity(entity);
myEntity.setUpdatedAt(new Date());
myEntityKey = datastore.put(myEntity.updateEntity(entity));
...
You should be able to convince yourself that with this technique, once
you store the updated entity back in the datastore, it has been
converted from version 1 to version 2: the lazy migration took place
without any code above the entity implementation level knowing about
it. As for the code getting unwieldy as more and more schema changes
pile up: after giving the system some time to work, you can run some
admin queries to see whether any leftover entities are still stored in
an old version, and then migrate that much smaller dataset directly if
you want to simplify your fromEntity(Entity entity) code.
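A sketch of that clean-up step, under stated assumptions: rows are plain `Map`s standing in for stored entities, and `findLeftovers()` is a hypothetical stand-in for an admin query filtered on the `version` property (real code would query the datastore rather than scan a list). `migrate()` applies the same Date-to-String conversion that `fromEntity()` performs lazily.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LeftoverSweep {
    static final long CURRENT_VERSION = 2L;

    // Builds a stored row; the version is boxed to Long.
    static Map<String, Object> row(long version, Object createdAt) {
        Map<String, Object> r = new HashMap<>();
        r.put("version", version);
        r.put("createdAt", createdAt);
        return r;
    }

    // Same conversion that fromEntity() performs lazily, applied to one stored row.
    static void migrate(Map<String, Object> row) {
        if ((Long) row.get("version") == 1L) {
            row.put("createdAt", ((Date) row.get("createdAt")).toString());
            row.put("version", CURRENT_VERSION);
        }
    }

    // Stand-in for an admin query: rows whose version is older than the current one.
    static List<Map<String, Object>> findLeftovers(List<Map<String, Object>> rows) {
        List<Map<String, Object>> leftovers = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if ((Long) row.get("version") < CURRENT_VERSION) {
                leftovers.add(row);
            }
        }
        return leftovers;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(2L, "Fri Jan 01 00:00:00 UTC 2010")); // migrated earlier by a read + put
        rows.add(row(1L, new Date(0L)));                   // never read since the schema change
        rows.add(row(1L, new Date(1000L)));                // never read either

        List<Map<String, Object>> leftovers = findLeftovers(rows);
        System.out.println("leftovers before sweep: " + leftovers.size()); // 2
        for (Map<String, Object> r : leftovers) {
            migrate(r); // in real code: convert, then put the entity back
        }
        System.out.println("leftovers after sweep: " + findLeftovers(rows).size()); // 0
    }
}
```

Once `findLeftovers()` comes back empty, the version 1 branch in `fromEntity()` has nothing left to handle and can be deleted.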