BigQuery hook doesn't work fully for BigQuery dataset in regions other than US and EU


Shiqiang Duan

Dec 27, 2018, 9:06:55 AM
to cloud-composer-discuss
We use Cloud Composer to run log data load jobs. Recently we started using it with a BigQuery dataset that is not in the US or EU (asia-southeast1), and one of our DAGs stopped working with errors like the one below:

[2018-12-27 04:22:18,884] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
    jobId=self.running_job_id).execute()
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/perx-production/jobs/job_lvpf7lmKyR92vFxdEzJ0sH4cnZpx?alt=json returned "Not found: Job perx-production:job_lvpf7lmKyR92vFxdEzJ0sH4cnZpx">

After some digging, I figured out that the error occurs because the Airflow BigQuery hook doesn't submit the required "location" parameter when polling for the results of jobs created in locations other than US and EU.
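
For illustration, here is a minimal sketch of what a location-aware status poll could look like. It is not the actual Airflow patch. The helper names (build_job_get_kwargs, wait_for_job) are hypothetical, and jobs_resource is assumed to be the object returned by service.jobs() on a googleapiclient BigQuery v2 client, whose jobs.get call accepts an optional location argument.

```python
import time


def build_job_get_kwargs(project_id, job_id, location=None):
    """Build keyword arguments for a BigQuery v2 jobs.get call.

    Hypothetical helper: `location` is only included when it is set,
    so jobs in the US/EU multi-regions keep working without it.
    """
    kwargs = {"projectId": project_id, "jobId": job_id}
    if location:
        kwargs["location"] = location
    return kwargs


def wait_for_job(jobs_resource, project_id, job_id, location=None,
                 poll_interval=5.0, sleep=time.sleep):
    """Poll a job until its state is DONE and return the job resource.

    `jobs_resource` is assumed to behave like `service.jobs()` from
    googleapiclient: `.get(**kwargs).execute()` returns a dict.
    """
    kwargs = build_job_get_kwargs(project_id, job_id, location)
    while True:
        job = jobs_resource.get(**kwargs).execute()
        if job["status"]["state"] == "DONE":
            return job
        sleep(poll_interval)
```

With a real client this would be called as something like wait_for_job(service.jobs(), "my-project", job_id, location="asia-southeast1"), where the project name is a placeholder. Without the location, a status lookup for a job created in asia-southeast1 comes back as the 404 shown in the traceback above.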

I have created a JIRA ticket, https://issues.apache.org/jira/projects/AIRFLOW/issues/AIRFLOW-3577, and I think it is quite critical for companies that store their data in locations other than US and EU.

Any help would be appreciated.

Shiqiang Duan

Dec 27, 2018, 9:13:38 AM
to cloud-composer-discuss
An off-topic piece of feedback for the Airflow project: issues don't seem to be getting attention. I have created 5 JIRA tickets over time for bugs or improvements, but no one is working on any of them. BTW, 4 of the 5 tickets are related to Google Cloud hooks or operators. Since Composer is such an important offering for ETL on GCP (or maybe it's not), I think Google should put some effort into improving the integration with GCP.

Shiqiang Duan

Dec 27, 2018, 9:34:21 AM
to cloud-composer-discuss

Shiqiang Duan

Jan 10, 2019, 11:35:56 AM
to cloud-composer-discuss
I see that the fix for the issue has been merged to master. When will Cloud Composer include the fix?

Instead of waiting for a new version of Cloud Composer, can I somehow use the fixed upstream versions of the operators and hooks?

I really hope there's a simple way to achieve this...

Wilson Lian

Jan 14, 2019, 4:25:19 PM
to Shiqiang Duan, cloud-composer-discuss
The PR to use the new BigQueryHook functionality is still pending. We are tracking that PR and will consider backporting it once it's accepted.

