Oracle SCN in kafka message headers


Bernd Helmle

Aug 10, 2021, 11:40:05 AM
to debe...@googlegroups.com
Hi Folks,

Not sure if the following is really a Debezium thing or something that
belongs in the Kafka Connect community, but since
ExtractNewRecordState is involved, I thought I'd ask here first.

I'm currently playing around with the Oracle source connector in
Debezium, and today I was faced with the following observation:

I'm using a configuration that flattens the Debezium events emitted by
the Oracle source connector via the ExtractNewRecordState transform and
puts the "scn" field into the message headers via add.headers. The
config looks as follows (nothing surprising):

"transforms.unwrap.type":
"io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.handling.mode": "none",
"transforms.unwrap.add.headers": "scn,table,commit_scn",

I have developed my own transform predicate class that evaluates
specific properties of the Kafka message headers.
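For context, such a predicate implements Kafka Connect's Predicate interface; a minimal sketch (the class name and the header.key property are made up for illustration, not taken from the actual code discussed here) could look like this:

```java
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.header.Header;
import org.apache.kafka.connect.transforms.predicates.Predicate;

// Hypothetical predicate: matches when a record carries a given header.
public class HasHeaderPredicate<R extends ConnectRecord<R>> implements Predicate<R> {

    private String headerKey;

    @Override
    public ConfigDef config() {
        return new ConfigDef().define("header.key", ConfigDef.Type.STRING,
                ConfigDef.Importance.HIGH, "Name of the header to look for");
    }

    @Override
    public void configure(Map<String, ?> configs) {
        headerKey = (String) configs.get("header.key");
    }

    @Override
    public boolean test(R record) {
        // Scan the record's headers for the configured key.
        for (Header header : record.headers()) {
            if (header.key().equals(headerKey)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void close() {
        // nothing to release
    }
}
```

Such a predicate would be wired up in the connector config via `predicates` and attached to a transform with `transforms.<name>.predicate`.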

I was surprised that when extracting the header value for
"__source_scn", which should now hold the SCN from Oracle, I get a
java.lang.Integer object back rather than the String object that an
untouched Debezium event carries in its source struct.

Is there a potential misunderstanding on my side?

Thanks,

Bernd


Gunnar Morling

Aug 10, 2021, 1:08:22 PM
to debezium
Hi Bernd,

Indeed, there should be no type conversion happening for a field promoted from source to a header property. If you log the headers, e.g. via the console consumer, what does it look like there?

--Gunnar

Bernd Helmle

Aug 11, 2021, 5:13:44 AM
to debezium
Hi Gunnar,


On Tuesday, August 10, 2021 at 7:08:22 PM UTC+2 gunnar....@googlemail.com wrote:
Hi Bernd,

Indeed, there should be no type conversion happening for a field promoted from source to a header property. If you log the headers, e.g. via the console consumer, what does it look like there?


With kafka-console-consumer.sh --property print.headers=true I get this (for an INSERT):

__source_scn:11091227,__table:FOO,__source_commit_scn:11091235,__source_ts_ms:1628672316000    {"schema":{"type":"struct","fields":[{"type":"double","optional":false,"field":"ID"},{"type":"double","optional":false,"field":"VALUE"}],"optional":false,"name":"test.HR.FOO.Value"},"payload":{"ID":3.0,"VALUE":3.0}}

I've added some debug messages to my Predicate class and get this when looping through the Headers collection (note the type at the end of each line):

found header key __source_scn / matching key: _source_scn / type java.lang.Integer
found header key __table / matching key: _source_scn / type java.lang.String
found header key __source_commit_scn / matching key: _source_scn / type java.lang.Integer
found header key __source_ts_ms / matching key: _source_scn / type java.lang.Long

After reflection, I'm not so much concerned about the type conversion itself. What makes me nervous is that, AFAIR, an Oracle SCN is a 48-bit number, whereas java.lang.Integer obviously has its maximum at 2^31 - 1.
If that is all correct, I might get into trouble once an Oracle instance's SCN exceeds Integer.MAX_VALUE...
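To put numbers on that concern, a quick stdlib-only check (nothing Debezium-specific) shows how far the largest 48-bit SCN lies beyond Integer.MAX_VALUE:

```java
public class ScnRange {
    public static void main(String[] args) {
        // Largest value representable in 48 bits: 2^48 - 1
        long maxScn48 = (1L << 48) - 1;
        System.out.println(maxScn48);                      // 281474976710655
        System.out.println(Integer.MAX_VALUE);             // 2147483647
        System.out.println(maxScn48 > Integer.MAX_VALUE);  // true
    }
}
```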

Thanks,

Bernd

Chris Cranford

Aug 11, 2021, 10:23:00 AM
to debe...@googlegroups.com
Hi Bernd -

This specific concern was identified in DBZ-2994 [1], and we have since moved all references to the SCN in both the offsets and source blocks to be represented as String values. Additionally, the connector now uses a domain class, Scn, to represent these values; it handles converting the String values into a numerical representation that lets us do comparisons and so forth without any risk of value overflow.
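The idea behind a string-backed SCN can be illustrated with plain java.math.BigInteger: the value survives any magnitude and is converted to a number only when a comparison is needed (the SCN values below are made up for illustration):

```java
import java.math.BigInteger;

public class ScnCompare {
    public static void main(String[] args) {
        // SCNs kept as strings have no fixed-width limit; parse only to compare.
        BigInteger scn = new BigInteger("281474976710655");       // a 48-bit SCN
        BigInteger commitScn = new BigInteger("281474976710656");
        System.out.println(scn.compareTo(commitScn) < 0);         // true
    }
}
```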

Can you confirm whether you're at least using Debezium 1.5.0.Final or later?  If you are not, I would highly recommend updating to the latest 1.6.x.Final.

Thanks
CC

[1]: https://issues.redhat.com/browse/DBZ-2994
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/db738e04-f565-4fda-a245-eec57eb3677bn%40googlegroups.com.

Bernd Helmle

Aug 12, 2021, 11:53:49 AM
to debezium
Hi Chris

Thanks for your input.

On Wednesday, August 11, 2021 at 4:23:00 PM UTC+2 Chris Cranford wrote:
Hi Bernd -

This specific concern was identified in DBZ-2994 [1], and we have since moved all references to the SCN in both the offsets and source blocks to be represented as String values. Additionally, the connector now uses a domain class, Scn, to represent these values; it handles converting the String values into a numerical representation that lets us do comparisons and so forth without any risk of value overflow.

Hmm, interesting, but somehow this information gets lost when the field is added to the Kafka message header.

Can you confirm whether you're at least using Debezium 1.5.0.Final or later?  If you are not, I would highly recommend updating to the latest 1.6.x.Final.



I'm on 1.6.1 already.
Thanks,

Bernd

Chris Cranford

Aug 13, 2021, 1:00:14 PM
to debe...@googlegroups.com
Hi Bernd -

Interesting, so that definitely sounds like a bug then if you're on 1.6. Could you open a Jira, if you haven't already, with the details, and we'll take a look to see if it's something with the SMT?

Thanks,
CC

Bernd Helmle

Aug 25, 2021, 6:54:24 AM
to debe...@googlegroups.com
Hi Chris, Folks...

On Friday, Aug 13, 2021 at 13:00 -0400, Chris Cranford wrote:
> Hi Bernd -
>
> Interesting, so that definitely sounds like a bug then if you're on
> 1.6. Could you open a Jira, if you haven't already, with the details,
> and we'll take a look to see if it's something with the SMT?
>

Sorry for the delay, I had other issues to solve first...

I've investigated this a little more this morning and finally
realized what's going on here: I'm using the default
SimpleHeaderConverter in my configuration, which is called on the sink
side when reading in the SinkRecord. It uses its toConnectHeader()
method, which doesn't honor the schema attached to the header values
but reparses the content of the header.

toConnectHeader() then returns a SchemaAndValue object that
dynamically attaches a value schema based on the result of
Values.parseString(), which converts the header value into the closest
appropriate type it can find (as stated in its documentation). So the
value gets its final type depending on its magnitude. I've tested this
with a custom field injected into the header via ExtractNewRecordState.

Depending on the size of the number, you either get an INT32 (e.g. for
2^16) or an INT64 (e.g. for 2^62).
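A minimal illustration of that inference, assuming the Kafka Connect API is on the classpath (the literals below mirror the 2^16 and 2^62 cases; the expected types are those reported above):

```java
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.data.Values;

public class ParseDemo {
    public static void main(String[] args) {
        // Values.parseString() infers the narrowest schema that fits the literal.
        SchemaAndValue small = Values.parseString("65536");                // 2^16
        SchemaAndValue large = Values.parseString("4611686018427387904");  // 2^62
        System.out.println(small.schema().type());  // INT32
        System.out.println(large.schema().type());  // INT64
    }
}
```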

So everything looks right; there is no bug involved here. A correct
solution looks like I just need to watch out for the possible type(s)
and convert them into a BigDecimal...
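A normalization step along those lines might look like this sketch (toBigDecimal is a hypothetical helper, not existing Debezium or Connect API):

```java
import java.math.BigDecimal;

public class ScnHeaderValue {

    // Normalize whatever numeric type the header reparsing produced.
    static BigDecimal toBigDecimal(Object headerValue) {
        if (headerValue instanceof Integer) {
            return BigDecimal.valueOf((Integer) headerValue);
        }
        if (headerValue instanceof Long) {
            return BigDecimal.valueOf((Long) headerValue);
        }
        if (headerValue instanceof String) {
            return new BigDecimal((String) headerValue);
        }
        throw new IllegalArgumentException(
                "Unexpected SCN header type: " + headerValue.getClass());
    }

    public static void main(String[] args) {
        System.out.println(toBigDecimal(11091227));          // INT32-sized SCN
        System.out.println(toBigDecimal(281474976710655L));  // INT64-sized SCN
    }
}
```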

Thanks,

Bernd


Chris Cranford

Aug 25, 2021, 11:51:10 AM
to debe...@googlegroups.com
Hi Bernd -

I'm glad it's resolved and thanks for the update!

CC