[JIRA] (PLUGIN-678) BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

Yaojie Feng (Jira)

Apr 22, 2021, 4:43:14 PM

to cdap...@googlegroups.com

Yaojie Feng created an issue

CDAP Plugins / PLUGIN-678

BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

Issue Type:	Bug
Assignee:	Unassigned
Created:	22/Apr/21 1:43 PM
Fix Versions:	6.5.0
Labels:	regression
Priority:	Blocker
Reporter:	Yaojie Feng

If the input schema of the BigQuery sink contains an integer field, the sink cannot write records to existing tables. The pipeline fails with an exception such as:

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308) ~[avro-1.8.2.jar:1.8.2]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:90) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:37) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:58) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]

or

org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308) ~[org.apache.avro.avro-1.8.2.jar:1.8.2]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:90) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:37) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:58) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:32) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.cdap.etl.spark.io.TrackingRecordWriter.write(TrackingRecordWriter.java:41) ~[hydrator-spark-core2_2.11-6.5.0-SNAPSHOT

To reproduce, create a pipeline in which the input schema of the BigQuery sink contains a field of type integer, and write to the sink. If the target table does not exist, the first run succeeds but subsequent runs fail. If the table already exists, every run fails.
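
Both traces appear to come from the same mismatch: the Avro schema used by the writer declares the field as `long`, while the record still carries a `java.lang.Integer`. For a non-nullable `long` field, the generic writer casts the value to `Long`, which produces the `ClassCastException`; for a nullable field, it picks the union branch from the value's Java type, and an `Integer` maps to `int`, which is not in `["long","null"]`. The sketch below is a simplified model of that union check using only the JDK, not Avro's actual code; `AvroUnionCheckSketch`, `schemaNameOf`, and `resolveUnion` are hypothetical names.

```java
import java.util.List;

/**
 * Simplified model (not Avro's implementation) of how a generic Avro writer
 * chooses a union branch from the Java type of the value being written.
 */
public class AvroUnionCheckSketch {

  /** Maps a boxed Java value to the Avro primitive type name it would match. */
  static String schemaNameOf(Object datum) {
    if (datum == null) return "null";
    if (datum instanceof Integer) return "int";
    if (datum instanceof Long) return "long";
    if (datum instanceof Float) return "float";
    if (datum instanceof Double) return "double";
    if (datum instanceof Boolean) return "boolean";
    if (datum instanceof CharSequence) return "string";
    throw new IllegalArgumentException("unsupported value type: " + datum.getClass());
  }

  /** Returns the index of the matching union branch, or fails in the spirit of UnresolvedUnionException. */
  static int resolveUnion(List<String> union, Object datum) {
    int index = union.indexOf(schemaNameOf(datum));
    if (index < 0) {
      throw new IllegalStateException("Not in union " + union + ": " + datum);
    }
    return index;
  }

  public static void main(String[] args) {
    List<String> nullableLong = List.of("long", "null"); // same union as in the second trace
    try {
      resolveUnion(nullableLong, 1); // autoboxed to Integer, like the failing record
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println("Long value uses branch " + resolveUnion(nullableLong, 1L));
  }
}
```

The same numeric value is accepted once it is a `Long`, which is why a `long` writer schema paired with `int` data fails while matching schema and data types succeed.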

Get Jira notifications on your phone! Download the Jira Cloud app for Android or iOS

This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100157-sha1:de71da4)

Greeshma Swaminathan (Jira)

Apr 27, 2021, 7:36:53 PM

to cdap...@googlegroups.com

Greeshma Swaminathan commented on PLUGIN-678

Re: BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

https://github.com/data-integrations/google-cloud/pull/636

https://github.com/data-integrations/google-cloud/pull/637

The configured schema was overridden for the BigQuery sink in prepareRun to prevent unexpected schema changes in the target table (https://cdap.atlassian.net/browse/PLUGIN-395).
With the datetime changes (9bde7d2), the configured schema was used to create the Avro schema, and this resulted in this bug. This was done because the configured schema could be a subset of the data schema, and it is consistent with the BigQueryJsonConverter behavior.
Moved the schema override to the OutputCommitter, applying it only to the load job settings, so that the configured schema remains unchanged and the target table schema is not altered.
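
The flow described in this comment can be modeled in a few lines. The sketch below is a hypothetical simplification, not the plugin's code: schemas are reduced to field-name-to-type maps, an existing table's INTEGER column is represented as `long` (BigQuery INTEGER values are 64-bit), and the `tableExists` flag stands in for the first-run versus later-run behavior described in the issue. All class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the PLUGIN-678 schema flow; names are illustrative only. */
public class SinkSchemaFlowSketch {

  /** Schema derived from an existing BigQuery table: INTEGER columns are 64-bit, modeled as long. */
  static Map<String, String> tableSchema() {
    Map<String, String> schema = new LinkedHashMap<>();
    schema.put("id", "long");
    return schema;
  }

  /** Schema configured on the sink, matching the records the pipeline produces. */
  static Map<String, String> configuredSchema() {
    Map<String, String> schema = new LinkedHashMap<>();
    schema.put("id", "int");
    return schema;
  }

  /** Before the fix: the table schema replaces the configured one before the Avro writer schema is built. */
  static Map<String, String> writerSchemaBeforeFix(boolean tableExists) {
    return tableExists ? tableSchema() : configuredSchema();
  }

  /** After the fix: the writer keeps the configured schema; tableExists is intentionally ignored here. */
  static Map<String, String> writerSchemaAfterFix(boolean tableExists) {
    return configuredSchema();
  }

  /** Checks that each record value has the Java type the writer schema expects. */
  static boolean recordFits(Map<String, String> writerSchema, Map<String, Object> record) {
    for (Map.Entry<String, Object> field : record.entrySet()) {
      String expected = writerSchema.get(field.getKey());
      Object value = field.getValue();
      boolean ok = ("int".equals(expected) && value instanceof Integer)
          || ("long".equals(expected) && value instanceof Long);
      if (!ok) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Object> record = Map.of("id", 1); // an Integer, as in the stack traces
    for (boolean tableExists : new boolean[] {false, true}) {
      System.out.printf("table exists=%b: before fix fits=%b, after fix fits=%b%n",
          tableExists,
          recordFits(writerSchemaBeforeFix(tableExists), record),
          recordFits(writerSchemaAfterFix(tableExists), record));
    }
  }
}
```

In this model, the `Integer` record fits before the fix only when no table exists, matching the reported first-run success; after the fix the writer schema no longer depends on the table. Applying the table schema to the load job is not modeled.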

Robin Rielley (Jira)

Apr 29, 2021, 2:17:26 PM

to cdap...@googlegroups.com

Robin Rielley commented on PLUGIN-678

Re: BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

I’ve added this blurb to the 6.4.0 release notes under Known Issues:

PLUGIN-678: Data pipelines that include version 0.17.0 of the BigQuery sink fail or give incorrect results. This is fixed in BigQuery sink version 0.17.1, which is available for download in the Hub.

Workaround: In the Hub, download Google Cloud Platform version 0.17.1. For each pipeline, replace BigQuery sink plugins version 0.17.0 with BigQuery sink plugins version 0.17.1.

Viral Kothari (Jira)

Dec 13, 2021, 12:35:14 AM

to cdap...@googlegroups.com

Viral Kothari commented on PLUGIN-678

Re: BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

I am getting this error in GCP BigQuery Sink 0.18.2

Not in union [{"type":"int","logicalType":"date"},"null"]:
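
The comment does not show the offending value. For reference, Avro's `date` logical type annotates an `int` that counts days since 1970-01-01, so a writer that applies no logical-type conversion accepts only an `Integer` in that union branch. The sketch below converts a `java.time.LocalDate` to that representation; assuming the failing value was a `LocalDate` (or any non-`Integer` date object) is a guess, and the helper name `toAvroDate` is hypothetical.

```java
import java.time.LocalDate;

/**
 * Hypothetical helper: converts a LocalDate to the value Avro's "date" logical
 * type expects, an int counting days since 1970-01-01.
 */
public class AvroDateSketch {

  static int toAvroDate(LocalDate date) {
    // toEpochDay() returns a long; toIntExact throws instead of silently overflowing.
    return Math.toIntExact(date.toEpochDay());
  }

  public static void main(String[] args) {
    Object value = toAvroDate(LocalDate.of(2021, 12, 13)); // autoboxed to Integer
    System.out.println(value + " is an Integer: " + (value instanceof Integer));
  }
}
```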
