Prerelease: New Appengine Bulkloader


Matthew Blain

Apr 9, 2010, 6:26:45 PM
to Google App Engine
I’d like to announce a wider prerelease of the new App Engine
Bulkloader configuration format. It extends the existing bulkloader
and offers several advantages:

* Semi-Automatic configuration generation: The SDK can generate a
bulkloader configuration based on your existing data.
* Supports more data formats. CSV support has been extended to files
with headers, basic XML support has been added, and it is simpler to
use alternate text encodings in the files. Additional and custom data
connectors are now easier to create.
* Easier to use -- the new syntax is more declarative and
descriptive than the old one. Developers who use languages other than
Python no longer need to write code in the Python language, although
the Python SDK is still required to run the tool.

To try out the new bulkloader, please visit
http://bulkloadersample.appspot.com/. Preliminary documentation is
available in the README on that site, along with a version of the
1.3.2 Python SDK containing the new bulkloader.

You can send feedback to the group or directly to me.

--Matthew Blain
Google App Engine Team
matthew.blai...@google.com

Jayz

Apr 21, 2010, 1:55:55 AM
to Google App Engine
A welcome improvement over the previous one. I tried this one on my
Java app. One small concern: the generated YAML file contains
duplicate properties (with the message "# Warning: This property is a
duplicate, but with a different kind."). I had to comment out all the
duplicate properties to make this work. Next I have to try it on large
entities.
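
A rough way to spot such duplicates before editing by hand is to scan the generated file for property names that repeat within one kind. The sketch below is only illustrative: it assumes entries shaped like `- kind: Name` and `- property: name`, which may not match the prerelease output exactly, and `find_duplicate_properties` is a hypothetical helper, not part of the SDK.

```python
import re
from collections import defaultdict

# Assumed line shapes in the generated YAML; adjust if your file differs.
KIND_RE = re.compile(r'^\s*-\s*kind:\s*(\S+)')
PROPERTY_RE = re.compile(r'^\s*-\s*property:\s*(\S+)')


def find_duplicate_properties(yaml_text):
    """Return {kind: [property names listed more than once]}."""
    counts = defaultdict(lambda: defaultdict(int))
    current_kind = None
    for line in yaml_text.splitlines():
        if line.lstrip().startswith('#'):
            continue  # entries that are already commented out are ignored
        kind_match = KIND_RE.match(line)
        if kind_match:
            current_kind = kind_match.group(1)
            continue
        prop_match = PROPERTY_RE.match(line)
        if prop_match and current_kind is not None:
            counts[current_kind][prop_match.group(1)] += 1
    return {
        kind: sorted(name for name, n in props.items() if n > 1)
        for kind, props in counts.items()
        if any(n > 1 for n in props.values())
    }


# Made-up sample input, not real wizard output.
SAMPLE = """\
transformers:
- kind: Person
  property_map:
    - property: name
      external_name: name
    - property: name
      external_name: name
    - property: age
      external_name: age
"""

if __name__ == '__main__':
    print(find_duplicate_properties(SAMPLE))
```

A line scan is used instead of a YAML parser so the sketch has no dependencies; it only reports candidates and leaves the decision about which entry to comment out to you.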

Sam Briesemeister

Apr 22, 2010, 1:13:12 PM
to Google App Engine
A couple of questions:

- Is this feature easily ported into the 1.3.3 release of 4/21?
- Is there a release timeline for this feature?

Thanks!


Matthew Blain

Apr 23, 2010, 2:36:13 PM
to Google App Engine
I have just updated http://bulkloadersample.appspot.com/ with a new
release.
* Updated to the 1.3.3 Python SDK
* The wizard is easier to run and generates TODO comments on all
lines which require editing. (You may want to do additional edits
too.)
* 'xml' format has been renamed to 'simplexml', leaving room for a
more effective XML connector in the future, which may have a different
interface. (No, there are no current plans for one.)

I plan to talk about this at Google I/O:
http://code.google.com/events/io/2010/sessions/data-migration-appengine.html
Now is a great time to send feedback or questions about the tool, to
maximize the chance that they can be addressed in the session.

--Matthew


Josh

Apr 28, 2010, 3:44:39 PM
to Google App Engine
I get the following error and then the process has to be killed to
exit. Any ideas?

[ERROR   ] [Thread-11] ExportProgressThread:
Traceback (most recent call last):
  File "/Users/s/Downloads/google_appengine/google/appengine/tools/bulkloader.py", line 1442, in run
    self.PerformWork()
  File "/Users/s/Downloads/google_appengine/google/appengine/tools/bulkloader.py", line 2210, in PerformWork
    item.key_end)
  File "/Users/s/Downloads/google_appengine/google/appengine/tools/bulkloader.py", line 1993, in StoreKeys
    self.py_type))
AssertionError: agh3Zi1kYWlseXJFCxInX19TdGF0X1Byb3BlcnR5VHlwZV9Qcm9wZXJ0eU5hbWVfS2luZF9fIhhCbG9iX2NvbnRlbnRfRW50aXR5U2hhcmQM is a <class 'google.appengine.api.datastore_types.Key'>, _ProgressDatabase expected <type 'int'>

Yoav Aviram

May 3, 2010, 8:21:23 AM
to Google App Engine
I am getting the same AssertionError exception.

Craig Berry

May 3, 2010, 10:53:54 PM
to Google App Engine
Perhaps this is documented somewhere, but I can't find it. I'm trying
to upload to a Java app, configured as recommended, if that matters.
How do I provide authentication information (username and password) to
bulkloader.py so that it can connect to my remote_api?

Craig Berry

May 4, 2010, 1:51:15 PM
to Google App Engine
Anybody there? I could really use a pointer on this. To clarify my
situation:

1. I'm using the Eclipse Java plugin to build and deploy my app.
2. I'm trying to use the preview bulkloader tool to transform and
upload data from an existing db.
3. I set up the app-side Java remote_api servlet as described.
4. I obtained the zip file, extracted it to a new dir on disk, and ran
the script as described in the doc.
5. When I do that, I get an authentication failure.

This is perhaps not surprising, as I never provided my credentials
anywhere. How and where is that supposed to happen?

Matthew Blain

May 4, 2010, 5:34:47 PM
to Google App Engine
You should get prompted for authentication. Make sure remote_api is
installed correctly on your server by visiting, in your browser, the
URL you passed as the --url argument. It should require you to log in
and then say "This request did not contain a necessary header".

--Matthew

Matthew Blain

May 4, 2010, 5:36:37 PM
to Google App Engine
If you don't mind sending me your appid privately and giving me
permission to download the data from your app, I can try to
investigate.

--Matthew

Craig Berry

May 4, 2010, 5:42:40 PM
to google-a...@googlegroups.com
I did that check, and got prompted to log in. But the script dies
without ever having prompted me.
--
Craig Berry - http://lapidum.org/home.html
"Magicians lie to the universe, and the
universe believes them." -- Lenore Berry

Ryan

May 8, 2010, 6:42:07 PM
to Google App Engine

Not sure where you want bug reports, but I fixed a little something:

Basically, I couldn't have null/empty reference attributes; I would
get this exception:

Traceback (most recent call last):
  File "/Users/ryan/Downloads/google_appengine/google/appengine/tools/adaptive_thread_pool.py", line 150, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/tools/bulkloader.py", line 695, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/tools/bulkloader.py", line 850, in _TransferItem
    self.content = self.request_manager.EncodeContent(self.rows)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/tools/bulkloader.py", line 1271, in EncodeContent
    entity = loader.create_entity(values, key_name=key, parent=parent)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 383, in create_entity
    return self.dict_to_entity(input_dict, self.bulkload_state)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 129, in dict_to_entity
    self.__run_import_transforms(input_dict, instance, bulkload_state)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 231, in __run_import_transforms
    value = self.__dict_to_prop(transform, input_dict, bulkload_state)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 186, in __dict_to_prop
    value = transform.import_transform(value)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/ext/bulkload/bulkloader_parser.py", line 90, in __call__
    return self.method(*args, **kwargs)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/ext/bulkload/transform.py", line 141, in generate_foreign_key_lambda
    return datastore.Key.from_path(kind, value)
  File "/Users/ryan/Downloads/google_appengine/google/appengine/api/datastore_types.py", line 382, in from_path
    ValidateString(id_or_name, 'name')
  File "/Users/ryan/Downloads/google_appengine/google/appengine/api/datastore_types.py", line 107, in ValidateString
    raise exception('%s must not be empty.' % name)
BadValueError: name must not be empty.

The fix is pretty simple (in ext/bulkload/transform.py):

  def generate_foreign_key_lambda(value):
    # Check for an empty value first so that int('') is never attempted.
    if not value:  # ADDED
      return None  # ADDED
    if key_is_id:
      value = int(value)
    return datastore.Key.from_path(kind, value)

  return generate_foreign_key_lambda

Other than that, working great for me.

Thanks,
Ryan
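
For readers without the SDK at hand, the guard can be tried in isolation with a self-contained sketch. Everything here is a stand-in: `FakeKey` only imitates the empty-name failure of `datastore.Key.from_path` seen in the traceback, and the outer signature `generate_foreign_key(kind, key_is_id=False)` is an assumption inferred from the closure variables, not copied from the SDK source.

```python
class FakeKey(object):
    """Stand-in for datastore.Key so the sketch runs without the SDK."""

    def __init__(self, kind, id_or_name):
        self.kind = kind
        self.id_or_name = id_or_name

    @classmethod
    def from_path(cls, kind, id_or_name):
        # Imitate the failure from the traceback for empty names.
        if id_or_name == '' or id_or_name is None:
            raise ValueError('name must not be empty.')
        return cls(kind, id_or_name)


def generate_foreign_key(kind, key_is_id=False):
    """Build an import transform that turns a column value into a key."""

    def generate_foreign_key_lambda(value):
        if not value:
            return None  # empty reference: leave the property unset
        if key_is_id:
            value = int(value)
        return FakeKey.from_path(kind, value)

    return generate_foreign_key_lambda


# 'Author' is a hypothetical kind name used only for illustration.
to_author_key = generate_foreign_key('Author')

if __name__ == '__main__':
    print(to_author_key(''))
    print(to_author_key('bob').id_or_name)
```

Returning None for an empty cell means the entity is stored without the reference instead of aborting the whole upload, which is the behaviour the patch is after.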

Matthew Blain

May 11, 2010, 9:10:54 PM
to Google App Engine
Hi Ryan,
Thanks for the note. You can also do the same thing in your transform
with

transform.none_if_empty(transform.generate_foreign_key(...))

Perhaps this should be the default on a few more of the supplied
transforms.

--Matthew
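
The wrapping idea behind that one-liner can be sketched in plain Python. This is not the SDK source: `none_if_empty` below is a local re-creation of the behaviour described (empty input becomes None, anything else is passed through), and `to_upper` is a hypothetical transform standing in for the foreign-key one.

```python
def none_if_empty(fn):
    """Wrap a transform so empty input yields None instead of calling fn."""

    def wrapper(value):
        if value is None or value == '':
            return None
        return fn(value)

    return wrapper


def to_upper(value):
    # Hypothetical transform that would fail on None.
    return value.upper()


safe_upper = none_if_empty(to_upper)

if __name__ == '__main__':
    print(safe_upper('abc'))
    print(safe_upper(''))
```

Wrapping keeps the empty-value policy out of each individual transform, so the same guard can be applied to any of them from the configuration file without patching the SDK.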

Stirman

May 12, 2010, 12:23:09 PM
to Google App Engine
I am also getting the same AssertionError exception. Has anyone
discovered what the issue is?

Ryan

May 16, 2010, 3:32:39 PM
to Google App Engine
I have another issue: I'm using GeoModel, which stores a list of
strings. However, when uploading values, the string list is instead set
to a single string (e.g. "[ u'8', u'82', ...]"). Is there a YAML field
I'm missing to tell the bulk loader that a column in my CSV is
actually a list?

Thanks!
Ryan
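
No answer appears in this thread. One possible direction, offered only as a sketch: since the earlier traceback shows an import transform being called with the raw column value, a custom transform could split a delimited cell into a list. The helper name `split_to_list` and the comma-separated cell format are assumptions for illustration, not part of the SDK.

```python
def split_to_list(separator=','):
    """Return a transform that splits one text cell into a list of strings."""

    def transform(value):
        if value is None or value.strip() == '':
            return []  # empty cell: no list items
        return [part.strip() for part in value.split(separator)]

    return transform


cells_from_csv = split_to_list(',')

if __name__ == '__main__':
    print(cells_from_csv('8, 82, 82a'))
```

This only helps if the source file stores the values as delimited text; a cell holding a printed Python list would need different parsing.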