Re: How to integrate MySQL to BigQuery CDC tables with Pub/Sub and Debezium

15 views
Skip to first unread message

Sehwee Kim

unread,
Sep 16, 2025, 4:46:12 AMSep 16
to debe...@googlegroups.com, Cloud Pub/Sub Users, cloud-pu...@google.com
+ resending to @debe...@googlegroups.com



On Tue, Sep 16, 2025 at 4:16 PM Sehwee Kim <seh...@google.com> wrote:
Hi team,
I am looking for referable material on integrating MySQL to Debezium to Pub/Sub to BigQuery CDC tables using BigQuery Subscriptions.

I found the following related resources, but they do not contain detailed information on integrating Debezium to Pub/Sub with BigQuery CDC tables.
For example, I need to know how to transform the `__op` field from Debezium into `_CHANGE_TYPE` for BigQuery CDC tables when using use_topic_schema.

Any guidance or additional resources you can provide would be greatly appreciated.

Regards,
Sehwee

                                                                                              

Sehwee Kim (김세휘)
Customer Engineer, Data Analytics, Google Cloud
seh...@google.com 
+82-2-531-9961

Nathan Smit

unread,
Sep 16, 2025, 6:03:59 AMSep 16
to debezium
Hey there, are you planning to use Debezium + Kafka or Debezium Server or are you still researching?

We currently ingest data into bq with Debezium Server using the Mysql source and the Pubsub sink.  We don't use the BQ CDC tables unfortunately.  The way we do this is we flatten the messages with ExtractNewRecordState (which you seem to be doing as well) and then send to pubsub with a bigquery subscription and use the big query table schema.  We then implement our own procedures to do the merge.

For the BQ CDC tables you'll need to map the __op field to either UPSERT (c,r,u) or DELETE (d).  Unfortunately as you've probably seen there isn't a simple way to do this as far as I'm aware (although I'm sure the DBZ maintainers can weigh in).  I believe the only way to do this would be to write your own custom SMT.  If you haven't done this before there is a tutorial here:  https://www.confluent.io/blog/kafka-connect-single-message-transformation-tutorial-with-examples/

I wonder though if another option would be to use the newly added Pubsub SMTs?  I'm not sure what functionality has been added there, but perhaps this could work for the transform you need:  https://cloud.google.com/blog/products/data-analytics/pub-sub-single-message-transforms

Just aliasing the column names (without needing to inspect and transform them) is pretty easy if you use ExtractNewRecordState.  For example, you could do:

transforms.unwrap.add.fields=op:CHANGE_TYPE,source.ts_ns:_CHANGE_SEQUENCE_NUMBER

And then set add.fields.prefix to be just a single underscore.  So you'd then have the op named as the correct psuedo-column for pubsub and then can use a really granular timestamp like the source nanoseconds for the change sequence number.  Then the only remaining challenge is mapping the more detailed OP to the CHANGE_TYPE.

Also just FYI your google doc you linked is not publicly accessible.

Sehwee Kim

unread,
Sep 17, 2025, 8:08:43 PMSep 17
to debe...@googlegroups.com, Mehran Nazir, Junho Jang
Hi Nathan,
Thank you for your email and valuable insights.

I plan to use Debezium Server directly to Pub/Sub, similar to your approach. Could you share what products or features you utilize for the merging stage?

Regarding your suggestion to alias the column name, I encountered an issue where `CHANGE_TYPE` was automatically prefixed with a double underscore. I will try setting `add.fields.prefix`, which I overlooked previously.

Thank you again for sharing your experience. I will keep you updated on the testing progress.

FYI, I have updated the reference links to publicly accessible versions:

    Regards,
    Sehwee


                                                                                                  

    Sehwee Kim (김세휘)
    Customer Engineer, Data Analytics, Google Cloud
    seh...@google.com 
    +82-2-531-9961


    This email and all contents are subject to the following disclaimer: https://www.pepkor.co.za/email-disclaimer

    IMPORTANT: This electronic message may contain information that is considered confidential. If received in error, please notify the sender immediately and then delete the original and all subsequent/related messages.

    --
    You received this message because you are subscribed to the Google Groups "debezium" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
    To view this discussion visit https://groups.google.com/d/msgid/debezium/aaaa0fa8-92cd-4117-84b0-a70cfddedd1en%40googlegroups.com.
    Reply all
    Reply to author
    Forward
    0 new messages