BigQuery source plugin with sharded tables


Daniel SA

Aug 16, 2022, 6:51:16 AM
to CDAP User
Hi there,

I need to copy several tables from one BigQuery project to another (which uses a different service account). Most of them are sharded tables, and the BigQueryTable source plugin (v0.18.2) does not accept a wildcard in the table name:
"Table name can only contain letters (lower or uppercase), numbers and '_'."

Is there a way in CDAP to copy all sharded tables without having to specify each full table name? I guess BigQuery Execute might be an option if the tables stayed in the same project, or if the service account had read access to both projects, but that is something I cannot change.

Will "*" be accepted in an existing or future version of the Google plugin?

Thanks in advance and kind regards, Daniel
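
For readers facing the same constraint, one workaround is to enumerate the shards outside the plugin and feed each full table name to its own pipeline run. The sketch below is only an illustration of that idea, not part of CDAP or the plugin: it assumes shards follow the common `prefix_YYYYMMDD` naming, it takes the character rule from the error message quoted above, and `group_shards` and the sample table IDs are hypothetical.

```python
import re
from collections import defaultdict

# Character rule taken from the plugin's error message: letters, digits and '_'.
VALID_TABLE_NAME = re.compile(r"[A-Za-z0-9_]+")
# Assumption: date-sharded tables end in an 8-digit YYYYMMDD suffix.
SHARD_NAME = re.compile(r"(?P<prefix>.+?)(?P<suffix>\d{8})")


def group_shards(table_ids):
    """Group table IDs by shard prefix; unsharded tables are keyed by their own name."""
    groups = defaultdict(list)
    for table_id in table_ids:
        if not VALID_TABLE_NAME.fullmatch(table_id):
            raise ValueError(f"invalid table name: {table_id!r}")
        match = SHARD_NAME.fullmatch(table_id)
        key = match.group("prefix") if match else table_id
        groups[key].append(table_id)
    return {key: sorted(ids) for key, ids in groups.items()}


# Hypothetical table IDs, e.g. obtained by listing the source dataset.
tables = ["events_20220816", "events_20220815", "users", "sales_20220101"]
shards = group_shards(tables)
for prefix, ids in shards.items():
    print(prefix, ids)
```

Each full name in a group passes the plugin's name check, so it could be supplied as the table name for a single-table source, while a wildcard such as `events_*` is rejected up front.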

Albert Shau

Sep 6, 2022, 5:46:02 PM
to CDAP User
Hi Daniel,

The BigQuery source is designed to work with just a single table, so there are no plans for it to take a wildcard.

This type of use case has normally been handled with the 'multiple table plugins' source (available in the Hub) connected to a BigQuery Multi Sink. Note that the primary use case for this source is relational databases, so it requires a JDBC driver. BigQuery does have one (https://cloud.google.com/bigquery/docs/reference/odbc-jdbc-drivers), but I don't know whether anybody has ever tried it with the source.

Regards,
Albert 

viann bono

Sep 29, 2022, 4:28:39 AM
to CDAP User
Hi Albert,
Piggy-backing on this post: is there a plan to add the features that the BQ multi-table plugin is currently missing compared to the main plugin? In my case, the insert/update/upsert options are the most needed, but they don't seem to be available.
Also, from my testing and from posts online, "Allow flexible schemas in Output" seems to be required for it to work at all; otherwise I get the error message "BigQuery Multi Table has no outputs. Please check that the sink calls addOutput at some point."

Any help figuring out how to navigate these limitations is greatly appreciated.
Vianney

viann bono

Sep 29, 2022, 5:03:27 AM
to CDAP User
There's also the bucket name issue:
'error: The value of property bq.delegating.multi.bucket must not be null'
though the documentation says:
'Temporary Bucket Name: (...) If it is not provided, a unique bucket will be created and then deleted after the run finishes.'

Albert Shau

Sep 30, 2022, 6:03:04 PM
to cdap...@googlegroups.com
Hi Vianney,

I didn't see any Jira issue open for adding more properties to the multi-sink, so I have opened https://cdap.atlassian.net/browse/PLUGIN-1416. You can follow it there to see whether it gets prioritized (or contribute an enhancement if you wish).

I'm not familiar with the bucket error, but if you've found a bug, it would be helpful to open a Jira issue with steps to reproduce and the full stack trace from the logs.

Regards,
Albert
