Greetings all. I've hit a snag trying to get a Dataflow pipeline (using Scio/Beam) that accesses BigQuery running under Cloud Composer. The pipeline reads data from BigQuery, processes it, and writes the results out to Bigtable.
I have the DAG written and installed. When I trigger the job by hand from the CLI, it does kick off the Dataflow job correctly, but the job then dies with the following exception:
INFO - Start waiting for DataFlow process to complete.
WARNING - Dec 12, 2018 9:53:43 PM com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor info
WARNING - INFO: Refreshing the OAuth token
WARNING - Exception in thread "main" java.io.IOException: Unable to create parent directories of /.bigquery/81e826a252ca6001d4581b1debbfaf08.table.txt
As far as I can tell, Composer/Airflow is trying to create a scratch directory to export the BigQuery data into, and is failing to do so due to some sort of permissions issue. The stagingLocation and tempLocation options are both set to GCS buckets I have access to.
In the DAG, I'm running the job via the DataFlowJavaOperator, in case that's relevant.
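For reference, here's roughly how the task is wired up. This is a simplified sketch, not the actual DAG: the project, bucket names, jar path, and task ids are all placeholders.

```python
# Hypothetical sketch of the DAG -- project, buckets, jar path, and ids
# are placeholders, not the real values from our environment.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

with DAG(
    dag_id="bq_to_bigtable",          # placeholder name
    start_date=datetime(2018, 12, 1),
    schedule_interval=None,
) as dag:
    run_pipeline = DataFlowJavaOperator(
        task_id="run_bq_to_bigtable",
        # Assembly jar for the Scio/Beam pipeline, staged in GCS (placeholder path).
        jar="gs://my-bucket/jars/pipeline-assembly.jar",
        dataflow_default_options={
            "project": "my-gcp-project",                  # placeholder
            "stagingLocation": "gs://my-bucket/staging",  # buckets we have access to
            "tempLocation": "gs://my-bucket/temp",
        },
    )
```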
Any insight into how to iron this out would be greatly appreciated. This is the first time we've tried to run a pipeline that reads from BigQuery, and I'm presuming we've missed a configuration flag or something similar.
Cheers all,
Monte