Hi all,
I am using the Pipeline and MapReduce APIs to import Datastore entities into BigQuery. My configuration is pretty much the same as the bigqueryload example (a rough sketch follows the list):
- The input is a DatastoreInput.
- The output is a BigQueryGoogleCloudStorageStoreOutput.
- There is an additional job, BigQueryLoadGoogleCloudStorageFilesJob, that reads the CSV files and fills the table.
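For reference, here is roughly how the job is wired (a sketch from memory, not verbatim code: MyRow, MyEntityMapper, the kind name, the shard count, and the elided output constructor arguments are placeholders; the exact signatures are in the bigqueryload example):

// Input: one Datastore entity per map call, split across 10 shards.
DatastoreInput input = new DatastoreInput("MyEntityKind", 10);

// The mapper turns each Entity into the POJO that gets serialized
// as one CSV row (MyEntityMapper and MyRow are placeholder names).
Mapper<Entity, Void, MyRow> mapper = new MyEntityMapper();

// Writes the rows as CSV files under gs://my_bucket/Job-.../Shard-.../,
// one file per shard; constructor arguments elided, quoting from memory.
BigQueryGoogleCloudStorageStoreOutput<MyRow> output =
    new BigQueryGoogleCloudStorageStoreOutput<>(/* bucket, schema, ... */);

// A follow-up BigQueryLoadGoogleCloudStorageFilesJob then points a
// BigQuery load job at those files and fills the table.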
I am using appengine-mapreduce version 0.8.1 and everything works just fine. When I try to update the library to 0.8.2, the final staging job fails:
com.google.appengine.tools.mapreduce.bigqueryjobs.RetryLoadOrCleanupJob run: Job failed while writing to Bigquery. Retrying...#attempt 4 Error details : invalid: Invalid path: gs://my_bucket/Job-57656379-0175-4393-afaa-6d70d86a3322/Shard-0000/file-1426599362203 at null
I noticed that, regardless of the version or the number of Datastore entities, some CSV files may stay empty:
com.google.appengine.tools.mapreduce.impl.WorkerShardTask run: Ending slice after 0 items read and calling the worker 0 times
I am having a hard time understanding what's going on, but my guess is that the 0.8.2 version of BigQueryLoadGoogleCloudStorageFilesJob doesn't like empty CSV files very much.
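To confirm that theory, this is the kind of check I run against the shard files from inside the app (a small sketch using the appengine-gcs-client library; the bucket and object names are copied from the error above):

import com.google.appengine.tools.cloudstorage.GcsFileMetadata;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import java.io.IOException;

// Reports the size of one shard file; 0 bytes means an empty CSV.
public class ShardFileCheck {
  public static void checkShardFile() throws IOException {
    GcsService gcs = GcsServiceFactory.createGcsService();
    // Object name taken from the "Invalid path" error message above.
    GcsFilename file = new GcsFilename("my_bucket",
        "Job-57656379-0175-4393-afaa-6d70d86a3322/Shard-0000/file-1426599362203");
    GcsFileMetadata meta = gcs.getMetadata(file); // null if the file is missing
    if (meta == null) {
      System.out.println("file does not exist");
    } else {
      System.out.println("length = " + meta.getLength() + " bytes");
    }
  }
}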
Any idea about that?
Thanks for the help,
Julien