[JIRA] (PLUGIN-678) BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

Yaojie Feng (Jira)

Apr 22, 2021, 4:43:14 PM
to cdap...@googlegroups.com
Yaojie Feng created an issue
 
CDAP Plugins / Bug PLUGIN-678
BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer
Issue Type: Bug
Assignee: Unassigned
Created: 22/Apr/21 1:43 PM
Fix Versions: 6.5.0
Labels: regression
Priority: Blocker
Reporter: Yaojie Feng

If the input schema of the BigQuery sink contains an integer field, the sink cannot write records to existing tables. The pipeline fails with an exception like:

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308) ~[avro-1.8.2.jar:1.8.2]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:90) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:37) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:58) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]

or

org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 1
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308) ~[org.apache.avro.avro-1.8.2.jar:1.8.2]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:90) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.AvroRecordWriter.write(AvroRecordWriter.java:37) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:58) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.plugin.gcp.bigquery.sink.BigQueryRecordWriter.write(BigQueryRecordWriter.java:32) ~[SYSTEM-google-cloud-0.18.0-SNAPSHOT.jar:na]
	at io.cdap.cdap.etl.spark.io.TrackingRecordWriter.write(TrackingRecordWriter.java:41) ~[hydrator-spark-core2_2.11-6.5.0-SNAPSHOT

To reproduce, create a pipeline whose BigQuery sink input schema contains an integer field, and run it. If the table does not exist, the first run succeeds but subsequent runs fail. If the table already exists, every run fails.
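
For readers unfamiliar with the two exceptions above, here is a minimal plain-Java sketch of the mismatch they describe: the record carries a java.lang.Integer while the writer expects a long (or a ["long","null"] union). This is not the plugin's code and does not use the Avro library. The class IntegerToLongSketch, its methods, and the union lookup are hypothetical and only model the behavior seen in the traces, on the assumption that the union branch is chosen from the value's runtime type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the google-cloud plugin or from Avro.
public class IntegerToLongSketch {

    // Models a writer that expects a "long" field and casts the value directly.
    // Throws ClassCastException when the value is actually an Integer.
    static long castToLong(Object datum) {
        return (Long) datum;
    }

    // One possible remedy (an assumption, not the actual fix): widen integral values to long.
    static long widenToLong(Object datum) {
        if (datum instanceof Integer || datum instanceof Long) {
            return ((Number) datum).longValue();
        }
        throw new IllegalArgumentException("Not an integral value: " + datum);
    }

    // Simplified model of union resolution: the branch is picked by the value's runtime type,
    // so an Integer looks for an "int" branch and never matches "long".
    static int resolveUnion(List<String> branches, Object datum) {
        String wanted;
        if (datum == null) {
            wanted = "null";
        } else if (datum instanceof Integer) {
            wanted = "int";
        } else if (datum instanceof Long) {
            wanted = "long";
        } else {
            wanted = datum.getClass().getSimpleName();
        }
        int index = branches.indexOf(wanted);
        if (index < 0) {
            throw new IllegalStateException("Not in union " + branches + ": " + datum);
        }
        return index;
    }

    public static void main(String[] args) {
        Object value = Integer.valueOf(1);
        List<String> union = Arrays.asList("long", "null");

        try {
            castToLong(value);
        } catch (ClassCastException e) {
            System.out.println("Direct cast failed: " + e.getMessage());
        }

        try {
            resolveUnion(union, value);
        } catch (IllegalStateException e) {
            System.out.println("Union lookup failed: " + e.getMessage());
        }

        // After widening, both the cast and the union lookup succeed.
        long widened = widenToLong(value);
        System.out.println("Widened value: " + widened);
        System.out.println("Union branch index: " + resolveUnion(union, widened));
    }
}
```

The sketch suggests why the first run can succeed when the table does not exist: nothing forces a "long" schema onto the Integer values until an existing table's schema is applied.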

This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100157-sha1:de71da4)

Greeshma Swaminathan (Jira)

Apr 27, 2021, 7:36:53 PM
to cdap...@googlegroups.com
Greeshma Swaminathan commented on Bug PLUGIN-678
 
Re: BigQuery sink is not able to write to existing BigQuery tables if the schema contains type integer

https://github.com/data-integrations/google-cloud/pull/636

https://github.com/data-integrations/google-cloud/pull/637

  • The configured schema was overridden for BQSink in prepareRun to prevent unexpected schema changes in the target table (https://cdap.atlassian.net/browse/PLUGIN-395).
  • With the datetime changes (9bde7d2), the configured schema was used to create the Avro schema, which caused this bug. Using the configured schema was deliberate: it can be a subset of the data schema, and doing so is consistent with the BigQueryJsonConverter behavior.
  • The fix moves the schema override into OutputCommitter, where it applies only to the load job settings, so the configured schema stays the same and the target table schema is not changed.

Robin Rielley (Jira)

Apr 29, 2021, 2:17:26 PM
to cdap...@googlegroups.com

I’ve added this blurb to the 6.4.0 release notes under Known Issues:

PLUGIN-678: Data pipelines that include BigQuery sinks version 0.17.0 fail or give incorrect results. This is fixed in BigQuery sink version 0.17.1, which is available for download in the Hub. 

Workaround: In the Hub, download Google Cloud Platform version 0.17.1. For each pipeline, replace BigQuery sink plugins version 0.17.0 with BigQuery sink plugins version 0.17.1.


Viral Kothari (Jira)

Dec 13, 2021, 12:35:14 AM
to cdap...@googlegroups.com

I am getting this error in GCP BigQuery Sink 0.18.2:

 Not in union [{"type":"int","logicalType":"date"},"null"]: 
