Cannot write to specific Bigquery partition

247 views
Skip to first unread message

Andrew Lisi

unread,
Aug 10, 2021, 2:16:10 PM8/10/21
to CDAP User
I have a table that is partitioned by day with partition filter required. When attempting to write to specific partitions, I can only ever get the bigquery sink to write to the current day's partition (2021-08-10).

I have tried:
 1. specifying partition filter field as both _PARTITIONDATE = "2021-08-08" and _PARTITIONTIME = "2021-08-08" (separate runs) while using insert operation

2. specifying partition filter field as both _PARTITIONDATE = "2021-08-08" and _PARTITIONTIME = "2021-08-08" (separate runs) while using upsert operation and specifying Primary Key appropriately.

If anyone knows how to write to specific partition, please let me know. Thank you!

Albert Shau

unread,
Aug 10, 2021, 7:48:55 PM8/10/21
to cdap...@googlegroups.com
Hi Andrew,

The partition filter field isn't used to control which partition to write to. It's used in update/upsert situations to avoid processing entire tables when not needed (update/upsert is implemented as a BigQuery merge query).

If you want to write to a specific partition, you should make sure the partition field is part of the incoming data to the sink. For example, the input schema to the sink should have a _PARTITIONDATE field that is always set to '2021-08-08'. You can use wrangler or the field adder plugin (from Hub) in conjunction with the logicalStartTime macro (https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1188036697/Macros+and+macro+functions#Logical-Start-Time-function) to do add a field with the date.

Regards,
Albert


-----------------------------
The information contained in this e-mail message is intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, or is not the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this message in error, please notify us immediately by telephone or reply by e-mail and then promptly delete the message. Thank-you.

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/18c1cb03-e5c7-4d8a-9782-9e6692f1b079n%40googlegroups.com.

Andrew Lisi

unread,
Aug 13, 2021, 4:40:02 PM8/13/21
to CDAP User
Ah, OK yes that makes sense thank you for the thorough explanation Albert. I think I was thrown off because I know it was possible with the BQ CLI by specifying the partition you want to write to via a partition decorator in the table name like so:

bq load --source_format=CSV 'my_dataset.my_table$20210808' data.csv

I'm now guessing that's what it's really doing under the hood via the decorator but I appreciate your insight and help,
Andy

Reply all
Reply to author
Forward
0 new messages