Updating Google App Engine datastore

Víctor Mayoral

Jun 1, 2010, 7:28:10 AM
to Google App Engine
Hello,
I've just started an application using GWT and GAE. After reading the
docs and working through some examples I decided to begin coding, but I
soon ran into something tricky.

My application data will live in the GAE datastore, accessed through
JDO. I want to assume that the app "will live". Let me explain:
the entities in the datastore will change over time. Some properties
will be added and some may be removed, which leaves me with a ton of
exceptions to handle. Knowing this, what is the right way to update the
datastore entities? One by one after a change is made? Should I handle
the exceptions and then update just the entity that was requested? Is
there a cleaner way to do this?

I've been thinking about how to do this with my current knowledge, and
I think the most logical way to proceed is to get the object, remove it
from the database, create a new one based on the old object, and then
persist it again.

To sum up, the question is whether there is an easy way to add a
property to a stored entity in the GAE datastore.

I hope this makes sense. Any suggestions would be appreciated.

Thanks,

Víctor.

Tristan

Jun 1, 2010, 6:41:58 PM
to Google App Engine
I'm not sure how it would work in JDO, but it's simple to add
properties with the low-level datastore API.

I store my entities with a version property. When I change an entity's
definition, I bump the version and change my code to either initialize
a missing property on an old entity or delete a deprecated property.
This happens lazily (only when the entity is actually used).
You can also keep a counter of how many entities have been updated from
old to new and have a flag trigger when 100% of the updates are
complete. Then, in the next version of the code, you can remove the
updating method, since all old entities have been lazily updated. An
alternative is to set the notification at 90% or some other percentage
and then trigger a task that cleans up the rest of the entities. I use
this approach to keep my data consistent without hugely intensive
datastore scans every time the schema changes.
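
A minimal sketch of the lazy upgrade and progress counter described
above. It is not code from the thread: a plain Map stands in for a
low-level datastore Entity so the example runs without the App Engine
SDK, and the names LazyUpgrade, CURRENT_VERSION, "nickname", and
"legacyFlag" are hypothetical. With the real low-level API the
equivalent calls would be getProperty, setProperty, and removeProperty
on Entity.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a low-level datastore entity: a plain property map.
// All names in this class are hypothetical.
class LazyUpgrade {
    static final long CURRENT_VERSION = 2L;

    // Entities upgraded so far; a real app would persist this counter.
    static int upgraded = 0;

    // Brings one entity up to CURRENT_VERSION; returns true if it changed.
    static boolean upgrade(Map<String, Object> entity) {
        Object stored = entity.get("version");
        // A missing version property is treated as the oldest version.
        long version = (stored instanceof Long) ? (Long) stored : 1L;
        if (version >= CURRENT_VERSION) {
            return false; // already current, nothing to do
        }
        // v1 -> v2: initialize a new property and drop a deprecated one.
        entity.putIfAbsent("nickname", "");
        entity.remove("legacyFlag");
        entity.put("version", CURRENT_VERSION);
        upgraded++;
        return true;
    }

    // True once the upgraded share reaches the threshold (0.9 for 90%).
    static boolean thresholdReached(int totalOld, double threshold) {
        return totalOld > 0 && (double) upgraded / totalOld >= threshold;
    }
}
```

Calling upgrade() from the code path that reads an entity keeps the
migration lazy; thresholdReached() is the point where a cleanup task
for the remaining old entities could be triggered.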

Hope this will give you some ideas.

Cheers!

Tristan

Víctor Mayoral

Jun 2, 2010, 2:25:34 AM
to Google App Engine
Hey Tristan!

Thanks for your answer. I will check the docs again and take a look at
the low-level API. The notification + trigger method seems pretty
clever; I will check that too.
But I was just wondering, isn't there something like a framework for
this type of job?

Again, thanks

Víctor.

Tristan

Jun 2, 2010, 2:45:42 AM
to Google App Engine
I haven't seen a framework that does data versioning like that.. but I
haven't looked very hard as I rolled my own abstraction layer with the
datastore...

Here's an excerpt from some stuff I wrote a while back and never got
around to publishing. It's a little outdated from what I do nowadays,
but it captures the essence of the framework that can do updating:

=======

Entity Versions and Lazy Migration

Before diving into data versioning, let's quickly go over how we will
use the methods we created. First, let's create a BaseEntity object in
the datastore:

...

// BaseEntity is an interface, so instantiate the implementation class
BaseEntity myEntity = new BaseEntityImpl();
myEntity.setCreatedAt(new Date());
myEntity.setUpdatedAt(myEntity.getCreatedAt());
Key myEntityKey = datastore.put(myEntity.toEntity());

...

And that's how it's done. (Yes, we did not set version, more on that
later). Below is how we would read from the datastore (assuming we
have the entity key ready):

...

Entity entity = datastore.get(myEntityKey);
BaseEntity myEntity = new BaseEntityImpl();
// copies the stored properties into myEntity and returns the entity
entity = myEntity.fromEntity(entity);

...

After the call to fromEntity(Entity entity) above, myEntity is now
updated with the data from the datastore and is ready to be used. And
lastly, this is how we would update an entity in the datastore:

...

Entity entity = datastore.get(myEntityKey);
BaseEntity myEntity = new BaseEntityImpl();
entity = myEntity.fromEntity(entity);
myEntity.setUpdatedAt(new Date());
// updateEntity keeps the original key, so put() overwrites the same entity
myEntityKey = datastore.put(myEntity.updateEntity(entity));

...

So you can see why we have toEntity() and updateEntity(Entity entity)
as separate methods. The updateEntity(Entity entity) does not touch
the entity key, therefore ensuring that when we execute a put to the
datastore, it will update the exact same entity instead of creating a
new one.

Now I have to confess: our implementation of the BaseEntity interface
is not quite correct. We are going to have to modify it in order to
take advantage of data versioning.

To illustrate how versioning works, let's assume that for some stupid
reason we want to store the createdAt date as a string! So we change
our implementation as follows:

public class BaseEntityImpl implements BaseEntity {

    /* variables */
    private String createdAt;
    private Date updatedAt;
    private Long version;

    ...

}

Ok, after making the above change we make sure that the interface and
the getters and setters are changed so that everything compiles. We
also change our fromEntity(Entity entity) method to reflect the new
state of things:

...

@Override
public Entity fromEntity(Entity entity) {
    setCreatedAt((String) entity.getProperty(BaseEntity.NcreatedAt));
    setEntityKey(entity.getKey());
    setUpdatedAt((Date) entity.getProperty(BaseEntity.NupdatedAt));
    setVersion((Long) entity.getProperty(BaseEntity.Nversion));
    return entity;
}

...


But here's the catch! What if you already have 10,000,000 entities in
your production datastore whose createdAt properties are stored as
Date objects? This is where data versioning becomes useful: it lets us
do what I like to call Lazy Migration (the idea is probably not new, so
you might know it as something else).

The concept behind lazy migration is to do the data migration from old
version to new version only when that data is actually read. Since we
are working with Google App Engine and we are paying for CPU cycles,
this may save you some money, especially if your schema happens to
change more than it should. Let's go back to our implementation and
set it up to support this concept.

First, we will go over what the implementation should have looked like
when createdAt was stored as Date:

public class BaseEntityImpl implements BaseEntity {

    /* variables */
    private Date createdAt;
    private Date updatedAt;
    private Long version = 1L; // Long field needs a long literal

    ...

}

What we changed was to set the version number to 1. Now that we have
changed our model to use String for storage, the implementation will
look like:

public class BaseEntityImpl implements BaseEntity {

    /* variables */
    private String createdAt;
    private Date updatedAt;
    private Long version = 2L; // Long field needs a long literal

    ...

}

And here is how we do the lazy migration inside fromEntity(Entity
entity):

...

@Override
public Entity fromEntity(Entity entity) {
    Object property;

    // version stored with the entity, not the version field of this class
    Long version = (Long) entity.getProperty(BaseEntity.Nversion);

    // convert from old version (Date) to new version (String)
    property = entity.getProperty(BaseEntity.NcreatedAt);
    // compare Long to Long: equals(1) would box an Integer and never match
    if (Long.valueOf(1L).equals(version)) { // old version, do conversion
        setCreatedAt(((Date) property).toString());
        entity.setProperty(BaseEntity.NcreatedAt, getCreatedAt());
    } else { // new version, no worries
        setCreatedAt((String) property);
    }

    setEntityKey(entity.getKey());
    setUpdatedAt((Date) entity.getProperty(BaseEntity.NupdatedAt));

    // we converted any old data to new data above, so set the new version
    entity.setProperty(BaseEntity.Nversion, getVersion());
    return entity;
}

...

This now fulfills the fromEntity(Entity entity) contract of returning
an entity that is updated to the latest version. Notice that we don't
set the version variable. That now becomes pretty much read only, and
if you'd like, you could probably make it final static.

Hopefully now you'll see the reason for the syntax of reading and
updating entities from and to the datastore. Let's look again at the
update code:

...

Entity entity = datastore.get(myEntityKey);
BaseEntity myEntity = new BaseEntityImpl();
// fromEntity also migrates an old-version entity to the latest version
entity = myEntity.fromEntity(entity);
myEntity.setUpdatedAt(new Date());
myEntityKey = datastore.put(myEntity.updateEntity(entity));

...

You should be able to convince yourself that with the above technique,
once you store the updated entity in the datastore, it has been
converted from version 1 to version 2: the lazy migration took place
without any code above the entity implementation level knowing about
it. As for the code blowing up as more and more schema changes
accumulate: after giving the system some time to work, you can run some
admin queries to see whether any entities are still stored as the old
version, and then run a migration on that much smaller dataset if you
want to simplify your fromEntity(Entity entity) code.
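
One way to keep the conversion code manageable as versions pile up is
to register one small step per version and apply them in order. The
sketch below is an assumption layered on the excerpt, not part of it: a
plain Map stands in for the datastore Entity, the v2 -> v3 step and the
"status" property are invented for illustration, and countLeftovers()
only imitates the admin query over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper: one migration step per schema version.
class MigrationChain {
    // STEPS.get(i) upgrades an entity from version i+1 to version i+2.
    static final List<Consumer<Map<String, Object>>> STEPS = new ArrayList<>();
    static {
        // v1 -> v2: createdAt stored as a Date becomes a String.
        STEPS.add(e -> e.put("createdAt", ((Date) e.get("createdAt")).toString()));
        // v2 -> v3: invented later change that adds a property with a default.
        STEPS.add(e -> e.putIfAbsent("status", "active"));
    }

    static long latestVersion() {
        return STEPS.size() + 1L;
    }

    // Applies every step between the stored version and the latest one.
    static void migrate(Map<String, Object> entity) {
        Object stored = entity.get("version");
        long version = (stored instanceof Long) ? (Long) stored : 1L;
        for (long v = version; v < latestVersion(); v++) {
            STEPS.get((int) (v - 1)).accept(entity);
        }
        entity.put("version", latestVersion());
    }

    // In-memory stand-in for an admin query that counts old-version entities.
    static long countLeftovers(List<Map<String, Object>> all) {
        return all.stream()
                  .filter(e -> !Long.valueOf(latestVersion()).equals(e.get("version")))
                  .count();
    }
}
```

Once countLeftovers() reaches zero for a given version, the matching
step can be retired in a later release.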

Jeff Schnitzer

Jun 2, 2010, 1:46:41 PM
to google-a...@googlegroups.com
There is a distinction to be made here: do your properties change
dynamically with program code, or do they just evolve rapidly over
time? I.e., do you need a dynamic interface to properties
(Map<String,Object>), or do you just need a more flexible way to deal
with schema evolution?

For a fully dynamic interface, go directly to the Low-Level API. The
Entity object is basically a hashmap and you can get/set any
properties you want. You can iterate through the available
properties.
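
To illustrate the "basically a hashmap" point, here is a small sketch
that walks whatever properties happen to be present. It uses a plain
Map as a stand-in so it runs without the App Engine SDK; the class name
DynamicProperties and the sample property names are hypothetical. On a
real low-level Entity, getProperties() returns a comparable map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper working on a property map, not on a real Entity.
class DynamicProperties {
    // Describes whatever properties are present, sorted by name.
    static String describe(Map<String, Object> properties) {
        StringBuilder out = new StringBuilder();
        Map<String, Object> sorted = new TreeMap<String, Object>(properties);
        for (Map.Entry<String, Object> p : sorted.entrySet()) {
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(p.getKey()).append('=').append(p.getValue());
        }
        return out.toString();
    }
}
```

Because nothing here depends on a fixed set of fields, properties can be
added or removed at runtime and the same code keeps working.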

For basic schema evolution, you have more choices. JDO is
particularly bad at this. Pretty much any change in the underlying
schema will break your application.

Objectify is particularly good at this. Possibly the other
alternative datastore APIs can help too (Twig, SimpleDS, Slim3) but I
don't know. You can read a lot more about Objectify's tools for
schema migration here:

http://code.google.com/p/objectify-appengine/wiki/IntroductionToObjectify#Migrating_Schemas

Basically, you can add or remove fields to your entities at any time
without any ill effects. You can also create mappings so that you can
change your entity classes, loading the old schema into the new
entities and upgrading them on the fly. This lets you make some
fairly extensive schema changes with zero downtime.

Note that this still isn't a wholly generic Map interface to the
datastore; you are still working with your typed Java classes. But
you can use both the Low-Level API and Objectify together very well;
the data structures have an intuitive 1-to-1 mapping.

Here's the project:

http://code.google.com/p/objectify-appengine/

Good luck,
Jeff

Víctor Mayoral

Jun 2, 2010, 4:04:44 PM
to Google App Engine
Thanks Tristan and Jeff,
Both of you have been really kind and your suggestions were really
clear and understandable.

I'll start with both approaches and see what fits me better.

Best regards,

Víctor.

