create_bulkloader_config doesn't reflect the latest schema changes


Hooman Korasani

Jun 25, 2013, 3:58:18 AM
to google-a...@googlegroups.com
I have created a bulkloader.yaml like this:

appcfg.py create_bulkloader_config --filename=tools/bulkloader.yaml --url=https://myid.appspot.com/_ah/remote_api


However, it doesn't reflect the schema changes I made a day ago. How long do I have to wait until it picks them up?

According to the datastore statistics page, the statistics were last updated "1 day, 21:11:28 ago".
However, the same page also says "Statistics are updated at least once per day".

Is there any other way to get the bulkloader to consider my latest changes?

Many Thanks,
Hooman


Hooman Korasani

Jun 26, 2013, 5:41:40 AM
to google-a...@googlegroups.com
Anyone from Google here?

The statistics have been failing for three days. Without up-to-date statistics there is no way to get the bulkloader working properly.

http://stackoverflow.com/questions/17275202/how-to-force-create-bulkloader-config-to-pickup-latest-schema-data/17277052#comment25075099_17277052

Jesse Rohwer

Jun 27, 2013, 1:58:18 PM
to google-a...@googlegroups.com
Hi Hooman,

Bulkload operations on the HRD are eventually consistent, which means you cannot use bulkloader with HRD and be guaranteed to get up-to-date data.

In particular, the documentation says:
"Note: This document applies to apps that use the master/slave datastore. If your app uses the High Replication datastore, it is possible to copy data from the app, but Google does not currently support this use case. If you attempt to copy from a High Replication datastore, you'll see a high_replication_warning error in the Admin Console, and the downloaded data might not include recently saved entities."

That said, it is true that in the past two days the HRD has had longer-than-usual eventual consistency. This is returning to a more normal range now, but again: for operations that are not strongly consistent, including bulkloader, there are no guarantees. I hope this helps clarify the situation.

Thanks,
Jesse Rohwer on behalf of the App Engine team

Hooman Korasani

Jun 27, 2013, 2:28:30 PM
to google-a...@googlegroups.com
Hi Jesse,

First off, thank you so much for replying to this issue. We are a young startup and are in the process of migrating from AWS to GAE.
On day one (early this week) the bulkloader worked nicely, even with the HRD. I understand I can't rely on it, but how else do we transfer our data to Google App Engine?

If I could at least create one bulkloader config YAML file, I would be able to transfer our data via CSV files. Is there any other official way for HRD?

Many Thanks,
Hooman

Jesse Rohwer

Jun 28, 2013, 4:21:06 PM
to google-a...@googlegroups.com
Hi Hooman,

For uploading data, if your dataset is relatively small, the bulkloader will do fine. You might also want to consider uploading the data as CSV to Cloud Storage, then running a mapreduce to copy it from Cloud Storage to Datastore. This is the method we would currently recommend for larger datasets; please see https://developers.google.com/appengine/docs/python/googlestorage/#Using_Google_Storage
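
To make the CSV step a little more concrete, here is a minimal sketch of the per-row conversion such a copy job would need: it turns each CSV row into a dict of typed property values. The column names, the SCHEMA mapping and the row_to_properties / parse_csv helpers are all hypothetical, and the actual Datastore write and the mapreduce wiring are deliberately left out.

```python
import csv
import io

# Hypothetical schema: CSV column name -> converter for the property value.
SCHEMA = {
    "name": str,
    "age": int,
    "score": float,
}


def row_to_properties(row, schema=SCHEMA):
    """Convert one CSV row (a dict of strings) into typed property values.

    Empty or missing cells become None; columns that are not in the
    schema are kept as strings.
    """
    props = {}
    for column, raw in row.items():
        if raw is None or raw == "":
            props[column] = None
            continue
        convert = schema.get(column, str)
        props[column] = convert(raw)
    return props


def parse_csv(text, schema=SCHEMA):
    """Yield one property dict per data row; the first row is the header."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield row_to_properties(row, schema)


if __name__ == "__main__":
    sample = "name,age,score\nAda,36,9.5\nBob,,7\n"
    for props in parse_csv(sample):
        print(props)
```

In a real job, each resulting dict would be handed to whatever code creates and saves the entity; keeping the conversion separate makes it easy to check against a small sample file first.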

Also, if your issue is just that the bulkloader auto config is not picking up recent datastore changes, you could manually edit the configuration as described here: https://developers.google.com/appengine/docs/python/tools/uploadingdata#Editing_the_Configuration_File.
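
As a rough illustration of that manual edit: the generated file has one property_map entry per property under each kind, so a property the auto config missed can be added by hand. The sketch below only prints such entries so they can be pasted under the right kind; the property names (nickname, login_count) and the property_map_entry helper are made up for the example, and the transform shown assumes an integer property.

```python
def property_map_entry(name, import_transform=None, export_transform=None):
    """Return the YAML lines for one property_map entry in bulkloader.yaml.

    The external_name (the CSV column) is assumed to match the property name.
    """
    lines = [
        "    - property: %s" % name,
        "      external_name: %s" % name,
    ]
    if import_transform:
        lines.append("      import_transform: %s" % import_transform)
    if export_transform:
        lines.append("      export_transform: %s" % export_transform)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical properties that were added after the statistics last ran.
    print(property_map_entry("nickname"))
    print(property_map_entry(
        "login_count",
        import_transform="transform.none_if_empty(int)",
    ))
```

String properties need no transform; other types usually need an import_transform so the CSV text is converted before the entity is saved.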

Hope that helps!