Hello, we need your help.
We are encountering a problem with the Debezium Oracle connector 3.0 in a production environment consisting of OpenShift 4.9, Confluent for Kubernetes 2.9.3, Confluent Platform 7.7.1, and Oracle Database 19c (RAC on Exadata).
In our last test, we set the "rac.nodes" parameter to try to mitigate the error, which we had been encountering frequently, but even with this parameter set the error occurred again after a week.
The error is as follows: after a random amount of time, the connector reports "missing scn" warnings for a specific SCN (scn=32152259204298). About 5 minutes later, it starts reporting "long running transaction" errors that reference the SCN immediately before the one in the "missing scn" warning (scn=32152259204297), and it keeps reporting this error indefinitely. If we restart the connector, it crashes because it cannot find that SCN (scn=32152259204297).
When we try to locate this SCN (scn=32152259204297) in the archive logs, it does not exist; only the previous and next ones exist, for example 32152259204296 and 32152259204298.
According to Oracle support, this behavior is normal: there may be gaps in the SCN numbers.
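To make the gap concrete, here is a minimal Python sketch of what we mean. It is illustrative only: it is not the query we ran against the archive logs, and find_scn_gaps is a hypothetical helper name. It takes a list of SCNs that were observed and reports the ranges that are absent between them.

```python
def find_scn_gaps(scns):
    """Return (first_missing, last_missing) ranges between consecutive observed SCNs."""
    ordered = sorted(set(scns))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # A difference greater than 1 means at least one SCN value is absent.
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


# The two SCNs we could find in the archive logs around the missing one.
observed = [32152259204296, 32152259204298]
print(find_scn_gaps(observed))  # [(32152259204297, 32152259204297)]
```

With the two neighbouring SCNs from our case, the only missing value is 32152259204297, which is exactly the SCN the connector keeps referencing.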
Therefore, some questions we have are:
Is this a bug in Debezium?
How does Debezium detect a "missing SCN"?
Why does Debezium assume that a "missing SCN" is a problem?
And why does it treat a "missing SCN" as a "long running transaction"?
Or is it possible that the information given to us by Oracle support was inaccurate, and there should not be any SCN gaps?
More information is attached.
Note: Timestamps in the log file are in UTC, and those in the PNG file are in UTC-3. Once this offset is accounted for, the metric debezium_metrics_numberofactivetransactions changed to 1 at the same time that the "missing scn" was detected.
Thank you in advance!
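For anyone comparing the two attachments, the offset can be applied as in the small Python sketch below. The timestamp used is a made-up placeholder, not a value taken from our logs or chart, and chart_time_to_utc is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# The PNG chart uses UTC-3; the connector log uses UTC.
UTC_MINUS_3 = timezone(timedelta(hours=-3))


def chart_time_to_utc(local_time: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' chart timestamp (UTC-3) to UTC."""
    naive = datetime.strptime(local_time, "%Y-%m-%d %H:%M:%S")
    as_utc = naive.replace(tzinfo=UTC_MINUS_3).astimezone(timezone.utc)
    return as_utc.strftime("%Y-%m-%d %H:%M:%S")


# Placeholder timestamp for illustration only.
print(chart_time_to_utc("2024-01-01 09:00:00"))  # 2024-01-01 12:00:00
```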