Using bulkloader to restore entities created using objectify


Dennis Lo

May 15, 2011, 11:46:51 AM
to objectify...@googlegroups.com
Hi,

I'm using Objectify 3.0 and the bulkloader to back up and restore entities that I've created.
To test, I'm using the "Game" kind and exporting it to a CSV file called game.csv.
However, when I upload the data and then try to retrieve it, I get the following error:

Uncaught exception from servlet
java.lang.IllegalStateException: Loaded Entity has name but com.example.app.model.Game has no String @Id
	at com.googlecode.objectify.impl.ConcreteEntityMetadata.setKey(ConcreteEntityMetadata.java:343)
	at com.googlecode.objectify.impl.ConcreteEntityMetadata.toObject(ConcreteEntityMetadata.java:210)
	at com.googlecode.objectify.impl.QueryImpl$ToObjectIterator.translate(QueryImpl.java:640)
	at com.googlecode.objectify.impl.QueryImpl$ToObjectIterator.translate(QueryImpl.java:629)
	at com.googlecode.objectify.util.TranslatingIterator.next(TranslatingIterator.java:35)
	at com.googlecode.objectify.impl.QueryImpl.list(QueryImpl.java:470)

This is the process I go through:
1. Download the game kind to game.csv using:
appcfg.py download_data --config_file=bulkloader.yaml --kind=Game --filename=game.csv --application=MyAppId --url=http://MyAppId.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100

2. Delete all Game entities. I then checked my app's admin portal datastore tab and saw that there are no more entities in my datastore.

3. Upload the Game kind from the game.csv created in step 1 (the command is the same as download_data but with upload_data):

appcfg.py upload_data --config_file=bulkloader.yaml --kind=Game --filename=game.csv --application=MyAppId --url=http://MyAppId.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100

4. Run a servlet that retrieves an entity by 'name' (this is the property shown in Game.java below). The error above occurs.

Does anyone know how to fix this?

In case it is useful, this is my Game.java:

public class Game {

    @Id private Long id; // This is my key, auto-generated by Objectify

    private String name;

    private String genre;

    private Date releasedate;

    // omitting getters and setters
}

I suspect it has something to do with my bulkloader.yaml file not being configured correctly, so I've posted it below:

- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Game
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: __scatter__
      #external_name: __scatter__
      # Type: ShortBlob Stats: 56 properties of this type in this kind.

    - property: genre
      external_name: genre
      # Type: String Stats: 6639 properties of this type in this kind.

    - property: name
      external_name: name
      # Type: String Stats: 6639 properties of this type in this kind.

    - property: releasedate
      external_name: releasedate
      # Type: Date/Time Stats: 6548 properties of this type in this kind.
      import_transform: transform.import_date_time('%Y-%m-%dT%H:%M:%S')
      export_transform: transform.export_date_time('%Y-%m-%dT%H:%M:%S')



Thanks!


Jeff Schnitzer

May 15, 2011, 2:01:15 PM
to objectify...@googlegroups.com
The error message says that the data in the datastore has a String name
key, but your Game entity has a numeric Long @Id. I don't really know
the syntax for the bulk loader, but the most suspect line is this one:

>       export_transform: transform.key_id_or_name_as_string

It looks like you are converting all numeric ids to strings here,
which would be your problem. Leave them as numbers.

(background info: entities in the datastore can have either a numeric
id or a string name. it's either-or.)
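
To make the either-or point concrete, here is a minimal Java sketch of how a numeric id could turn into a string name during a CSV round trip. KeyIdentifier and all of its methods are hypothetical stand-ins written for this illustration; they are not App Engine, Objectify, or bulkloader classes, and the "naive import" only models what the error message suggests happened.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for a datastore key identifier (not an App Engine class):
 * it holds a numeric id OR a string name, never both.
 */
final class KeyIdentifier {
    private final Long id;      // non-null only for numeric-id keys
    private final String name;  // non-null only for string-name keys

    private KeyIdentifier(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    static KeyIdentifier ofId(long id) {
        return new KeyIdentifier(id, null);
    }

    static KeyIdentifier ofName(String name) {
        return new KeyIdentifier(null, Objects.requireNonNull(name));
    }

    boolean hasName() {
        return name != null;
    }

    /** Export in the spirit of "id or name as string": the id/name distinction is lost. */
    String exportAsString() {
        return hasName() ? name : Long.toString(id);
    }

    /** Naive import: every CSV cell is text, so every key comes back as a name. */
    static KeyIdentifier importNaive(String cell) {
        return ofName(cell);
    }

    /** Import that turns all-digit cells back into numeric ids. */
    static KeyIdentifier importPreservingIds(String cell) {
        return cell.matches("\\d{1,18}") ? ofId(Long.parseLong(cell)) : ofName(cell);
    }
}
```

Round-tripping KeyIdentifier.ofId(42) through exportAsString() and importNaive() yields a key with a name, which is the situation the exception describes. importPreservingIds() restores a numeric id, at the cost of misreading any genuine name that consists only of digits.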

Jeff

Dennis Lo

May 16, 2011, 7:52:57 PM
to objectify...@googlegroups.com
I did some experiments and I believe I have the solution.

I got the idea from a Stack Overflow question I asked: http://stackoverflow.com/questions/6008929/using-java-google-app-engine-bulkloader-to-download-entire-datastore-to-one-csv-f

The fix is to avoid using the --config_file and bulkloader.yaml.

I used the following to download every kind to a single csv file:

appcfg.py download_data --filename=backup.csv --application=MyAppId --url=http://MyAppId.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100

I used the following to upload the single csv file back to the datastore:

appcfg.py upload_data --filename=backup.csv --application=MyAppId --url=http://MyAppId.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100


They are the same commands, just with download_data and upload_data swapped around.

The idea is just let appcfg download and upload all entities (not being kind specific) with using any export or import transformations.

Dennis Lo

May 16, 2011, 8:00:36 PM
to objectify...@googlegroups.com
Sorry, the last sentence:


"The idea is just let appcfg download and upload all entities (not being kind specific) with using any export or import transformations."

Should say:

"The idea is just let appcfg download and upload all entities (not being kind specific) WITHOUT using any export or import transformations from your bulkloader.yaml config file."

Jeff Schnitzer

May 16, 2011, 8:55:22 PM
to objectify...@googlegroups.com
That seems safest.

One thing to watch out for, reported here in the past: The bulk
loader API doesn't restore properties with the exact same indexed
state as they were in the original data. You might find you need to
re-put() all your entities with Objectify (post-restore) to get proper
index behavior.

Jeff

Dennis Lo

May 16, 2011, 9:11:12 PM
to objectify...@googlegroups.com
I'm unsure what you mean by "same indexed state".

How do I "re-put()" all my entities (I'm thinking about the 30-second
time limit again), as there are quite a few of them?

Jeff Schnitzer

May 17, 2011, 1:59:05 PM
to objectify...@googlegroups.com
Every individual property of every single entity has a state of
indexed-or-not. It's the difference, at the low-level api, of calling
Entity.setProperty() or Entity.setUnindexedProperty(). In addition to
whether or not the single-property index is created, this state
defines whether or not the entity participates in a multi-property
index that includes that property.

The problem is that this bit of state is write-only. There is no
API at the low-level to find out whether a property was indexed or
not. So I believe (at least, this is what was reported recently) the
bulkloader just assumes that all properties are unindexed. Thus
queries don't work after you restore.
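
As a rough illustration of why that would break queries, here is a small self-contained Java model. ModelEntity is a hypothetical class invented for this sketch, not the App Engine Entity class; it only borrows the two method names mentioned above, and restoredAsUnindexed() models the reported (unconfirmed) restore behavior rather than anything known about the bulkloader's internals.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of an entity; NOT the App Engine Entity class. */
final class ModelEntity {
    private final Map<String, Object> values = new HashMap<String, Object>();
    private final Map<String, Boolean> indexed = new HashMap<String, Boolean>();

    /** Stores the value and marks the property as indexed. */
    void setProperty(String name, Object value) {
        values.put(name, value);
        indexed.put(name, Boolean.TRUE);
    }

    /** Stores the value and marks the property as unindexed. */
    void setUnindexedProperty(String name, Object value) {
        values.put(name, value);
        indexed.put(name, Boolean.FALSE);
    }

    /** An equality filter can only find the entity through an index entry. */
    boolean matchesFilter(String name, Object value) {
        return Boolean.TRUE.equals(indexed.get(name)) && value.equals(values.get(name));
    }

    /** Models a restore that copies every value but writes it as unindexed. */
    ModelEntity restoredAsUnindexed() {
        ModelEntity copy = new ModelEntity();
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            copy.setUnindexedProperty(entry.getKey(), entry.getValue());
        }
        return copy;
    }
}
```

In this model the restored copy still holds the same values, but matchesFilter() returns false for it until the property is written again with setProperty().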

The only solution to this is to define your schema (and its indexing
rules) and feed that to the bulk loader. But this isn't very
convenient if you aren't using python.

Another solution is to use Objectify to re-put() every single entity.
Just iterate through every single entity in your datastore with a
query and put() it (making no other changes - watch out for lifecycle
callbacks). Objectify will set the indexed state properly.

Obviously, going through and re-put()ing all your entities may have a
cost, potentially severe if you have billions of entities. But then,
a bulk restore won't be cheap either.
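
For the time-limit concern, one possible shape for the re-put loop is sketched below: work in batches, stop when a time budget is used up, and hand back a cursor to resume from. EntityStore, Page, and RePutJob are hypothetical names made up for this sketch. In a real app, fetch() would presumably wrap an Objectify query with a cursor and putAll() would wrap an Objectify put, but those calls are deliberately left out here.

```java
import java.util.List;

/** Hypothetical abstraction over the datastore; not an Objectify or App Engine type. */
interface EntityStore<T> {
    /** Returns up to limit entities starting at cursor (null means start from the beginning). */
    Page<T> fetch(String cursor, int limit);

    /** Saves the entities unchanged so the mapping layer rewrites their index state. */
    void putAll(List<T> entities);
}

/** One batch of results plus the position to continue from. */
final class Page<T> {
    final List<T> items;
    final String nextCursor; // null when there are no more results

    Page(List<T> items, String nextCursor) {
        this.items = items;
        this.nextCursor = nextCursor;
    }
}

final class RePutJob {
    /**
     * Re-saves entities batch by batch until everything is done or the time budget
     * runs out. Always processes at least one batch, so every call makes progress.
     * Returns the cursor to resume from, or null when all entities were processed.
     */
    static <T> String run(EntityStore<T> store, String cursor, int batchSize, long budgetMillis) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        do {
            Page<T> page = store.fetch(cursor, batchSize);
            if (!page.items.isEmpty()) {
                store.putAll(page.items);
            }
            cursor = page.nextCursor;
        } while (cursor != null && System.currentTimeMillis() < deadline);
        return cursor;
    }
}
```

A request handler could call RePutJob.run() with a budget comfortably below the request limit and, if the result is not null, pass that cursor to a follow-up request.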

This is a major deficiency in the bulk load/restore process. There
doesn't appear to be an open issue in the appengine issue tracker for
it, but there should be.

Jeff

Riley

May 18, 2011, 10:35:06 AM
to objectify-appengine
Regarding unindexed bulkuploader... are you sure that it doesn't
assume that every property SHOULD be indexed? I've used the
bulkuploader to restore ~1M entities (cost 6 cpu hours) and my queries
work fine afterwards.

Riley

Jeff Schnitzer

May 18, 2011, 2:19:34 PM
to objectify...@googlegroups.com
Not sure at all. I just know that one person complained about it on
this list a few months ago.

Assuming every property should be indexed brings up an entirely
different set of problems, depending on your schema...

Jeff
