Using bulkloader to load 13k entities from 1 kind consumes all the Datastore Write Operations free quota and loads only 2k entities on a Master/Slave datastore

Mathieu Clavel

Nov 16, 2011, 4:34:07 AM
to google-a...@googlegroups.com
Hello,

I'm trying to load data into the datastore for my application.
I'm loading more than 13,000 entities of a single kind from a CSV file via the bulkloader.
Since the new free quota, the load fails: it consumes the whole quota while loading only 1,000-2,000 entities.

I'm using the Python SDK 1.5.3 (I have not yet tested 1.6.0).
The command line I use is:
appcfg.py upload_data --batch_size=1000 --rps_limit=1000 --config_file=bulkloader.yaml --filename=contrats.csv --kind=Contrat --url "https://%APP_HOST%.appspot.com/remote_api"

My bulkloader.yaml looks like this (not every property is shown):
transformers:

- kind: Contrat
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
    encoding: windows-1252
    import_options:
        dialect: 'excel'
        delimiter: ';'
    export_options:
        dialect: 'excel'
        delimiter: ';'
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
      import_transform: transform.create_foreign_key('Contrat', key_is_id=True)

    - property: canalClientId
      external_name: canalClientId
      # Type: Integer Stats: 6010 properties of this type in this kind.
      import_transform: transform.none_if_empty(int)

    - property: codePostalPDL
      external_name: codePostalPDL
      # Type: String Stats: 7135 properties of this type in this kind.

My file contrats.csv was obtained by downloading data from another server using the bulkloader.
Upload works on the local dev server (not the same command line; see the sketch below) and on another App Engine app with billing enabled.
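
For comparison, the dev-server variant of the command looks roughly like this (a sketch only: localhost and port 8080 are the dev server defaults, and the /remote_api path assumes the same handler mapping as in production):

appcfg.py upload_data --config_file=bulkloader.yaml --filename=contrats.csv --kind=Contrat --url http://localhost:8080/remote_api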

How can trying to load 13k entities consume 50k Datastore Write Operations (while loading only 2k)?
(Datastore Write Operations was at 0% before the first try.)


Thanks,

Mathieu

Mathieu Clavel

Nov 17, 2011, 5:44:48 AM
to google-a...@googlegroups.com
Here is the error log (using only 1 thread instead of the default 10):

Uploading data records.
[INFO    ] Logging to bulkloader-log-20111117.113920
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 1000/second
[INFO    ] Batch Size: 1000
[INFO    ] Opening database: bulkloader-progress-20111117.113920.sql3
[INFO    ] Connecting to altergaz-oav.appspot.com/remote_api
[INFO    ] Starting import; maximum 1000 entities per post
.[INFO    ] [WorkerThread-0] Backing off due to errors: 1.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 2.0 seconds
[ERROR   ] [WorkerThread-0] WorkerThread:
Traceback (most recent call last):
  File "C:\Dev\sdk\pyActuel\google\appengine\tools\adaptive_thread_pool.py", line 176, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "C:\Dev\sdk\pyActuel\google\appengine\tools\bulkloader.py", line 764, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "C:\Dev\sdk\pyActuel\google\appengine\tools\bulkloader.py", line 935, in _TransferItem
    self.request_manager.PostEntities(self.content)
  File "C:\Dev\sdk\pyActuel\google\appengine\tools\bulkloader.py", line 1420, in PostEntities
    datastore.Put(entities)
  File "C:\Dev\sdk\pyActuel\google\appengine\api\datastore.py", line 602, in Put
    return PutAsync(entities, **kwargs).get_result()
  File "C:\Dev\sdk\pyActuel\google\appengine\datastore\datastore_rpc.py", line 783, in get_result
    result = rpc.get_result()
  File "C:\Dev\sdk\pyActuel\google\appengine\api\apiproxy_stub_map.py", line 592, in get_result
    return self.__get_result_hook(self)
  File "C:\Dev\sdk\pyActuel\google\appengine\datastore\datastore_rpc.py", line 1547, in __put_hook
    self.check_rpc_success(rpc)
  File "C:\Dev\sdk\pyActuel\google\appengine\datastore\datastore_rpc.py", line 1182, in check_rpc_success
    rpc.check_success()
  File "C:\Dev\sdk\pyActuel\google\appengine\api\apiproxy_stub_map.py", line 558, in check_success
    self.__rpc.CheckSuccess()
  File "C:\Dev\sdk\pyActuel\google\appengine\api\apiproxy_rpc.py", line 156, in _WaitImpl
    self.request, self.response)
  File "C:\Dev\sdk\pyActuel\google\appengine\ext\remote_api\remote_api_stub.py", line 248, in MakeSyncCall
    handler(request, response)
  File "C:\Dev\sdk\pyActuel\google\appengine\ext\remote_api\remote_api_stub.py", line 391, in _Dynamic_Put
    'datastore_v3', 'Put', put_request, put_response)
  File "C:\Dev\sdk\pyActuel\google\appengine\ext\remote_api\remote_api_stub.py", line 177, in MakeSyncCall
    self._MakeRealSyncCall(service, call, request, response)
  File "C:\Dev\sdk\pyActuel\google\appengine\ext\remote_api\remote_api_stub.py", line 199, in _MakeRealSyncCall
    raise UnknownJavaServerError("An unknown error has occured in the "
UnknownJavaServerError: An unknown error has occured in the Java remote_api handler for this call.
[INFO    ] An error occurred. Shutting down...
[ERROR   ] Error in WorkerThread-0: An unknown error has occured in the Java remote_api handler for this call.

[INFO    ] 13320 entities total, 0 previously transferred
[INFO    ] 2000 entities (3920912 bytes) transferred in 162.3 seconds
[INFO    ] Some entities not successfully transferred

Simon Knott

Nov 17, 2011, 5:56:07 AM
to google-a...@googlegroups.com
How many properties do you have indexed on your types?

Remember that each new entity consumes 1 write for the entity itself, 2 writes for each indexed property, and 1 write for each custom (composite) query index it appears in.
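
As a rough sanity check, here is a minimal sketch of that arithmetic in Python (the property and index counts are illustrative guesses, not taken from the actual schema):

def put_cost(indexed_properties, composite_indexes):
    # Per the rule above: 1 write for the entity itself,
    # 2 writes per indexed property, and 1 write per custom
    # (composite) index the entity appears in.
    return 1 + 2 * indexed_properties + composite_indexes

# ~50k writes for ~2k entities is about 25 writes per entity,
# which would match e.g. 10 indexed properties plus 4 composite
# indexes (illustrative numbers only):
print(put_cost(10, 4))  # -> 25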

Mathieu Clavel

Nov 17, 2011, 6:03:53 AM
to google-a...@googlegroups.com
52 indexes...
I'm vacuuming the old indexes; I thought that was done automatically when uploading a new version. I will try again once they are cleaned up (command sketch below).
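
For reference, a sketch of the vacuum command (assuming the standard Python SDK tooling; <app-directory> is a placeholder for the directory containing app.yaml and index.yaml):

appcfg.py vacuum_indexes <app-directory>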

Thanks, now I know why all the quota was used.

Mathieu