Issue getting NVARCHAR data in to GCP

32 views
Skip to first unread message

Denzil Brokken

unread,
Dec 15, 2025, 11:40:11 PM (8 days ago) Dec 15
to debezium
Hi there,

We currently have an issue with 2x fields (PRTB_CTNT and NON_PRTB_CTNT) not properly pulling data through from source system (sql server) to GCP. I see that instead of the expected value in the source system, it comes through as __debezium_unavailable_value. Is there something we can enable in the debezium config to ensure the data passes through correctly?

Thanks,
Denzil.
value in source mmo_t.png
mmo_t_non_prtb_ctnt_prtb_ctnt_fields2.png
value in gcp mmo_t.png

jiri.p...@gmail.com

unread,
Dec 16, 2025, 12:11:42 AM (8 days ago) Dec 16
to debezium
Hi,

could you please share the connector log? I just double checked SQL Server source code I don't think we operate with unavalibale valu placeholder in it.

Jiri

Denzil Brokken

unread,
Dec 17, 2025, 11:32:34 PM (6 days ago) Dec 17
to debezium
Hi Jiri,

I did trial lob.enabled = true and it seemingly resolved the issue. now instead of seeing __debezium_value_unavailable in kafka i see the actual text mentioned from the NVARCHAR(4000) which is great.

However, we want a least impact approach. Are you able to confirm if lob.enabled = true can be configured at a per column or per table level? We would ideally just wish to set this configuration for a single column but I am unsure if this is possible?

If you could please confirm what my options are here with support documentation that would be great, thank you.

Cheers,
Denzil.

jiri.p...@gmail.com

unread,
Dec 18, 2025, 2:40:35 AM (6 days ago) Dec 18
to debezium
Hi,

I am now super confused, in the original report you are talking abou SQL Server but lob.enabled is config option for Oracle connector so there should be no impact on SQL Server.
Can you please double check what are we talking about?

Thanks

Jiri

Denzil Brokken

unread,
Dec 21, 2025, 10:35:05 PM (2 days ago) Dec 21
to debezium
Hi Jiri,

My apologies yes you are correct source is SQL Server and the lob.enabled setting is Oracle so this does not fix my issue.

Wondering is there any similar setting for SQL Server to handle NVARCHAR(4000) or CLOB fields and parse in the correct value? 

Right now I am seeing __debezium_unavailable_value but i want this to flow through with the correct data in the field.

Thanks,
Denzil.

Chris Cranford

unread,
Dec 21, 2025, 10:45:22 PM (2 days ago) Dec 21
to debe...@googlegroups.com
Hi Denzil -

So the `lob.enabled` toggle does not necessarily replace or somehow make the unavailable value placeholders disappear in the event stream. This setting toggles the Oracle connector to capture changes for CLOB/BLOB columns from the transaction logs. Without it, such operations are not mined and therfore those fields will always be represented as being unavailable. 

What you are likely after is the ReselectPostProcessor which you can use to have any event that contains the unavailable value placeholders to be queried out of bands to populate the field, if the value is still available in the base table. 
Does that help?

-cc
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/646d7358-0ec8-4749-8444-4e8a79554173n%40googlegroups.com.

Denzil Brokken

unread,
Dec 21, 2025, 10:59:31 PM (2 days ago) Dec 21
to debezium
Hi Chris,

Thanks for the prompt response. I see. So just want to check this aligns with the useage of  ReselectPostProcessor

Currently in kafka i see the record with the field being represented correctly (screenshot attached), yet misrepresented in GCP (screenshot attached).

This behaviour feels odd to me, if i can see it in kafka but then it gets hidden between kafka and GCP - is this abnormal in your eyes?

Thanks,
Denzil.
nonprtbctnt_in_kafka.png
nonprtbctnt_in_gcp.png

Chris Cranford

unread,
Dec 21, 2025, 11:12:16 PM (2 days ago) Dec 21
to debe...@googlegroups.com
Hi Denzil.

So the screenshot labels "in kafka" appears to be an insert for MMOID 4370932; however, on the GCP side what you showed does not tell me whether that correlates to that insert, or its a row that was later mutated by an update?

In databases, there are two types of storage. There is the data that is stored in the data page, which is where most column data is maintained. But for specific data types that consume significant space like VARCHAR2(MAX), TEXT, CLOB, BLOB, XML, VARBINARY, and JSON -- these are most often stored in what is called external storage. In the Oracle connector we use the phrase LOB and in PostgreSQL this is referred to as TOAST data.

On inserts, these column values are being persisted for the first time, and so the change vector in the transaction log will always have the value for these columns. This explains why your "In Kafka" screenshot shows the value, because before is null and after has a value (aka an insert). 

But for updates and deletes, this is where this unavailable value placeholder shows up. These LOB/TOAST columns are only ever recorded by the database in the change vector if and only if the column is modified by the update. If the update does not change that externally stored value, the change vector will indicate the field was unchanged but supplies no value for performance reasons. In addition, if the field is modified, you will notice a value in the after, but in the before block, it will have the unavailable value placeholder. And much like updates, since deletes do not mutate data, these column types generally will not ave a value in the before block, again because external storage values are not pre-fetched for the change vector recording.

The ReselectPostProcessor is a way to try and bridge this gap. It's not always going to work in latent corner cases or for deletes (unless you're using Oracle).

-cc
Reply all
Reply to author
Forward
0 new messages