Storage to BigQuery time partitioned tables


Matías Battocchia

Aug 16, 2018, 5:55:03 PM
to cloud-composer-discuss
Hello,

I am struggling with an issue in Composer that does not occur in a local installation of Airflow 1.9.0.

I want to load records into a daily partitioned table in an idempotent way. The table is configured to time partition on a field called updated_on. Normally I would specify dataset.table$partition with WRITE_TRUNCATE to achieve this, so I can upload the same set of records for a given day any number of times without messing it up.
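
For illustration, here is a minimal sketch of the kind of load configuration described above, written as the plain dictionary the BigQuery REST API accepts for a load job. The project, dataset, table, and bucket names are placeholders, and the helper name partition_load_config is hypothetical; none of them come from the actual DAG.

```python
from datetime import date


def partition_load_config(project, dataset, table, day, source_uris):
    """Build a load-job configuration that replaces a single daily partition.

    Hypothetical helper for illustration only.
    """
    # A daily partition is addressed with a decorator: table$YYYYMMDD.
    decorated_table = "{}${}".format(table, day.strftime("%Y%m%d"))
    return {
        "load": {
            "sourceUris": list(source_uris),
            "destinationTable": {
                "projectId": project,
                "datasetId": dataset,
                "tableId": decorated_table,
            },
            # With a decorated destination, WRITE_TRUNCATE replaces only
            # that partition, which is what makes re-runs idempotent.
            "writeDisposition": "WRITE_TRUNCATE",
        }
    }


# Placeholder values, not taken from the original pipeline.
config = partition_load_config(
    "my-project",
    "my_dataset",
    "my_table",
    date(2018, 8, 16),
    ["gs://my-bucket/records/2018-08-16/*.json"],
)
print(config["load"]["destinationTable"]["tableId"])  # my_table$20180816
```

Running the same load again for the same day targets the same partition, so the records for that day are replaced rather than duplicated.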

The error is:

BigQuery job failed. Final error was: {u'reason': u'invalid', u'message': u'Incompatible table partitioning specification. Expects partitioning specification interval(type:day,field:updated_on), but input partitioning specification is interval(type:day)'}

If I try to fix this by setting time_partitioning={'type':'day','field':'updated_on'} on the operator, the following error shows up:

airflow.exceptions.AirflowException: Cannot specify field partition and partition name (dataset.table$partition) at the same time

And if I use field partitioning without the partition name, the job finishes but the whole table gets truncated.

Thanks in advance.

Matías

Tim Swast

Aug 17, 2018, 1:29:11 PM
to Matías Battocchia, cloud-composer-discuss
What happens if you specify the partition name but do not supply a partition configuration?

--
You received this message because you are subscribed to the Google Groups "cloud-composer-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-composer-di...@googlegroups.com.
To post to this group, send email to cloud-compo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-composer-discuss/3c675b9d-2fa7-47c1-9eb5-c197e99b85e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
  •  Tim Swast
  •  Software Friendliness Engineer
  •  Google Cloud Developer Relations
  •  Seattle, WA, USA

Matías Battocchia

Aug 17, 2018, 2:24:46 PM
to sw...@google.com, cloud-compo...@googlegroups.com
On Fri, Aug 17, 2018 at 2:29 PM Tim Swast <sw...@google.com> wrote:
> What happens if you specify the partition name but do not supply a partition configuration?

That would be the first case, where the time_partitioning argument is not used.

BigQuery job failed. Final error was: {u'reason': u'invalid', u'message': u'Incompatible table partitioning specification. Expects partitioning specification interval(type:day,field:updated_on), but input partitioning specification is interval(type:day)'}

I found this issue report, which is exactly what is happening to me.

Tim Swast

Aug 18, 2018, 1:00:19 PM
to Matías Battocchia, cloud-composer-discuss
> I found this issue report, which is exactly what is happening to me.

Yuck. Thanks for looking into that. I agree that Airflow is doing the wrong thing by raising that error. Honestly, I'd suggest using the PythonOperator with the google.cloud.bigquery library for now.
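
A minimal sketch of what that workaround might look like, assuming newline-delimited JSON files in Cloud Storage and the updated_on partitioning field from the original post. The function names, the source format, and the choice to pass a matching time_partitioning specification are assumptions for illustration, not something confirmed in this thread.

```python
from datetime import date


def decorated_table_id(table_id, day):
    """Return the partition-decorated table ID, e.g. my_table$20180816."""
    return "{}${}".format(table_id, day.strftime("%Y%m%d"))


def load_partition(project, dataset_id, table_id, day, source_uri):
    """Replace one daily partition with the contents of source_uri.

    Hypothetical callable intended for use with a PythonOperator.
    """
    # Imported lazily so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    destination = bigquery.DatasetReference(project, dataset_id).table(
        decorated_table_id(table_id, day)
    )

    job_config = bigquery.LoadJobConfig()
    # Assumption: the source files are newline-delimited JSON.
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    # Truncate only the decorated partition, not the whole table.
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    # Assumption: match the table's existing partitioning specification.
    job_config.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="updated_on"
    )

    job = client.load_table_from_uri(source_uri, destination, job_config=job_config)
    job.result()  # Block until the load job finishes; raises on failure.
    return job.output_rows
```

In a DAG, load_partition could be passed to a PythonOperator as python_callable, with the project, dataset, table, day, and source URI supplied through op_kwargs.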


z...@useracquisition.com

Sep 18, 2018, 9:01:30 PM
to cloud-composer-discuss
Hi Tim, 

Any idea when https://github.com/apache/incubator-airflow/pull/3901 will make its way into Cloud Composer? Not that this constitutes a national emergency :) but it's blocking one of our pipelines. I will go ahead and use PythonOperator if I have to, but it would be swell if I didn't have to :)

Thanks!

Zachary Friedman

Tim Swast

Sep 19, 2018, 3:15:24 PM
to z...@useracquisition.com, cloud-composer-discuss
As this didn't make it into Airflow 1.10, I've added it to our list of patches to backport to Cloud Composer. Hopefully it will go out in a Cloud Composer release soon.
