Hi all,
I am using the Pipeline and MapReduce APIs to import Datastore entities into BigQuery. My configuration is pretty much the same as the bigqueryload example (a rough sketch follows the list):
- The input is a DatastoreInput.
- The output is a BigQueryGoogleCloudStorageStoreOutput.
- There is an additional job, BigQueryLoadGoogleCloudStorageFilesJob, that reads the CSV files and fills the table.
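For reference, here is roughly how the job is wired (a sketch from memory, not verbatim code: MyRow, MyEntityMapper, the kind name, the shard count, and the elided output constructor arguments are placeholders; the exact signatures are in the bigqueryload example):

// Input: one Datastore entity per map call, split across 10 shards.
DatastoreInput input = new DatastoreInput("MyEntityKind", 10);

// The mapper turns each Entity into the POJO that gets serialized
// as one CSV row (MyEntityMapper and MyRow are placeholder names).
Mapper<Entity, Void, MyRow> mapper = new MyEntityMapper();

// Writes the rows as CSV files under gs://my_bucket/Job-.../Shard-.../,
// one file per shard; constructor arguments elided, quoting from memory.
BigQueryGoogleCloudStorageStoreOutput<MyRow> output =
    new BigQueryGoogleCloudStorageStoreOutput<>(/* bucket, schema, ... */);

// A follow-up BigQueryLoadGoogleCloudStorageFilesJob then points a
// BigQuery load job at those files and fills the table.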
I am using appengine-mapreduce version 0.8.1 and everything works just fine. When I try to update the library to 0.8.2, the final staging job fails:
com.google.appengine.tools.mapreduce.bigqueryjobs.RetryLoadOrCleanupJob run: Job failed while writing to Bigquery. Retrying...#attempt 4 Error details : invalid: Invalid path: gs://my_bucket/Job-57656379-0175-4393-afaa-6d70d86a3322/Shard-0000/file-1426599362203 at null
I noticed that, regardless of the version or the number of Datastore entities, some CSV files may stay empty:
com.google.appengine.tools.mapreduce.impl.WorkerShardTask run: Ending slice after 0 items read and calling the worker 0 times
I am having a hard time understanding what's going on, but my guess is that the 0.8.2 version of BigQueryLoadGoogleCloudStorageFilesJob doesn't like empty CSV files very much.
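To confirm that theory, this is the kind of check I run against the shard files from inside the app (a small sketch using the appengine-gcs-client library; the bucket and object names are copied from the error above):

import com.google.appengine.tools.cloudstorage.GcsFileMetadata;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import java.io.IOException;

// Reports the size of one shard file; 0 bytes means an empty CSV.
public class ShardFileCheck {
  public static void checkShardFile() throws IOException {
    GcsService gcs = GcsServiceFactory.createGcsService();
    // Object name taken from the "Invalid path" error message above.
    GcsFilename file = new GcsFilename("my_bucket",
        "Job-57656379-0175-4393-afaa-6d70d86a3322/Shard-0000/file-1426599362203");
    GcsFileMetadata meta = gcs.getMetadata(file); // null if the file is missing
    if (meta == null) {
      System.out.println("file does not exist");
    } else {
      System.out.println("length = " + meta.getLength() + " bytes");
    }
  }
}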
Any idea about that?
Thanks for the help,
Julien