Oracle Source Connector Issue ORA-01013


hyojin lee

Mar 3, 2026, 2:42:33 AM (8 days ago) Mar 3
to debezium
Hi Team,

I am using the Debezium Oracle Source Connector v3.3.1.Final and have encountered a task failure that I cannot recover from without changing the connector name. I would like to understand why this happened and how to fix it properly.

1. The Error:
The connector stopped with the following exception:
java.sql.SQLTimeoutException: ORA-01013: user requested cancel of current operation
at io.debezium.connector.oracle.logminer.AbstractLogMinerStreamingChangeEventSource.executeAndProcessQuery

2. The Problem after Restart:
About 14 hours after the failure, I restarted the task. The logs show that it successfully connected and skipped the snapshot (as expected), but it got stuck:

It keeps committing the same offset: {scn=14280879081915}.

Only 1 record is sent repeatedly during offset commit intervals.

Even though there is 14 hours' worth of data in the source DB, the connector does not seem to mine any new changes or progress to a newer SCN.

Questions:

Root Cause: What specifically triggers ORA-01013 in the LogMiner streaming phase? Is it purely a log.mining.query.timeout.ms issue, or could it be related to specific DB conditions?

SCN Stuck: Why does the connector fail to fetch new data after being down for 14 hours? If the required archive logs were missing, I would expect a different error (like ORA-01291). Why does it just sit there with a "1 record sent" message at the old SCN?

Prevention: What configuration changes are recommended to prevent this timeout and subsequent "stuck" state in high-traffic environments?

Recovery: Is there a way to force the existing connector (same name) to resume or "jump" to a specific SCN? I had to create a new connector with a different name to get it working, but I want to avoid this to maintain data continuity in the same topics.

Environment:

Debezium version: 3.3.1.Final

Oracle Version: AWS RDS Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production

Log Mining Strategy: hybrid

I have attached the full error and restart logs for your reference. Any insights would be greatly appreciated.

Best regards,
sourceconnector_config.txt
OracleSourceConnector_Restart.txt
OracleSourceConnector_Error.txt

Nathan Smit

Mar 3, 2026, 4:57:59 AM (8 days ago) Mar 3
to debezium
Hey there, I can take a stab at some of these:

1) This is mentioned in the FAQ. By default, Debezium has a ten-minute timeout on queries. You can increase this value by setting `database.query.timeout.ms`: https://debezium.io/documentation/reference/stable/connectors/oracle.html

2) This is more complicated. Do you track any metrics for your environment, like lag or oldest SCN? Do you know if you have long-running transactions in your environment? My guess would be that your LogMiner query is spanning a very wide transaction and the low watermark is being pinned while the transaction is running. Each loop takes longer and longer to execute, which prevents it from catching up. This seems plausible if you hit the database timeout. That's just a guess, though; I would probably need to see your metrics.

hyojin lee

Mar 10, 2026, 6:32:08 AM (yesterday) Mar 10
to debezium

Thanks for the detailed response! To follow up on your points:

1. On log.mining.transaction.retention.ms

We set this to 60000 (1 min) temporarily and the lag cleared up after a while. However, we've since reverted it back to a higher value because we have long-running legitimate transactions (e.g. migrations) in our environment that could be abandoned prematurely. So this was more of a workaround than a fix.

I'm actually not 100% sure why the retention-based abandon didn't kick in automatically during the stuck state — the connector was running continuously and the ghost transaction appeared to have already disappeared from v$transaction, yet the offset SCN stayed frozen. Any insight into why the abandon logic might not trigger in this case would be appreciated.


2. On the strange transaction pattern — logs attached

Looking at our logs (15,770 lines total), the abandoned transactions all share a suspicious characteristic: 0 events.

# Line 7474-7475
2026-03-05T06:07:28,488 WARN All transactions with SCN <= 14280962899892 will be abandoned.
2026-03-05T06:07:28,489 WARN Transaction 19001e0053860300 (start SCN 14280962899863, change time 2026-03-05T09:11:21Z, redo thread 1, 0 events) is being abandoned.

# Line 7484-7485
2026-03-05T06:15:36,157 WARN Transaction 11001200c9d00b00 (start SCN 14280963678246, change time 2026-03-05T09:53:00Z, redo thread 1, 0 events) is being abandoned.

# Line 7488-7489
2026-03-05T06:18:24,558 WARN Transaction 0e000c00a03e0900 (start SCN 14280963936999, change time 2026-03-05T10:05:38Z, redo thread 1, 0 events) is being abandoned.

# Line 7491-7492
2026-03-05T06:19:07,734 WARN Transaction 02000200807e0700 (start SCN 14280963999698, change time 2026-03-05T10:08:49Z, redo thread 1, 0 events) is being abandoned.

# Line 7495-7496
2026-03-05T06:21:28,086 WARN Transaction 17002100ad4f0600 (start SCN 14280964221149, change time 2026-03-05T10:20:00Z, redo thread 1, 0 events) is being abandoned.

# Line 15159-15160 (most recent incident, 2026-03-09)
2026-03-10T01:22:15,160 WARN All transactions with SCN <= 14281111131412 will be abandoned.
2026-03-10T01:22:15,161 WARN Transaction 0a000c0019550b00 (start SCN 14281111130435, change time 2026-03-10T07:06:08Z, redo thread 1, 0 events) is being abandoned.

Every single abandoned transaction has 0 events — meaning Debezium sees them as open but has never captured any DML inside them.

The key problem: the freeze starts at line 14756:

# Line 14756
2026-03-09T22:07:38,121 WARN Offset SCN 14281111130434 has not changed in 25 mining session iterations. Commit SCNs {1=14281111131716}

When we immediately query v$transaction after this WARN, we get 0 rows. The transaction is already gone from Oracle's perspective, but Debezium's in-memory cache still holds it as open and pins the offset SCN. The connector then stays frozen from line 14756 (22:07) all the way until line 15159 (01:22) — over 3 hours — with only this repeating:

WorkerSourceTask{id=prod-source-erp10-v2-0} Committing offsets for 1 acknowledged messages

Questions:

  • Why does Debezium's in-memory transaction cache retain a transaction that no longer exists in v$transaction? Is this a known behavior with hybrid log mining strategy on RDS Oracle?
  • Is there a way to force-expire or flush a specific transaction from Debezium's cache without a full connector restart?

3. On metrics tracking

We're planning to set up SCN lag monitoring. Key metrics we're thinking of tracking:

  • Oldest active transaction SCN vs current DB SCN → gap indicates a pinned low watermark
  • MilliSecondsBehindSource via JMX
  • Offset SCN change rate — if it stops moving for N iterations, that's the signal

Are there specific JMX MBean paths or v$ views you'd recommend? We're on AWS RDS Oracle so access to some internal views may be limited.
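For concreteness, the freeze detection we have in mind for the third signal could be sketched like this (the class name and the 25-sample window are our own illustrative choices, mirroring Debezium's "Offset SCN ... has not changed in 25 mining session iterations" warning; the samples would come from whatever metric source we end up scraping):

```python
from collections import deque

class OffsetFreezeDetector:
    """Flag a frozen offset SCN from periodically sampled values.

    Hypothetical helper: feed it one OffsetScn sample per polling
    interval; it reports a freeze once the SCN has not moved across
    `window` consecutive samples.
    """

    def __init__(self, window: int = 25):
        self.window = window
        self.samples: deque = deque(maxlen=window)

    def observe(self, offset_scn: int) -> bool:
        """Record a sample; return True if the offset looks frozen."""
        self.samples.append(offset_scn)
        # Frozen = window is full and every sample is identical.
        return len(self.samples) == self.window and len(set(self.samples)) == 1

detector = OffsetFreezeDetector(window=3)
print(detector.observe(14280879081915))  # window not full yet -> False
print(detector.observe(14280879081915))  # False
print(detector.observe(14280879081915))  # same SCN 3 times -> True
print(detector.observe(14280879082000))  # SCN advanced -> False
```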


4. On frequency and operational impact

This is happening more than twice a week and causing real business impact. We're now building a heartbeat-based lag monitor (using DEBEZIUM_HEARTBEAT table + __source_ts_ms) to detect when the offset freezes and trigger an automatic connector restart via the Kafka Connect REST API.
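The restart decision for that heartbeat-based monitor might be as simple as the following sketch (the function name and the 30-minute threshold are our own assumptions, not anything from Debezium):

```python
def should_restart(last_heartbeat_ts_ms: int,
                   now_ms: int,
                   max_lag_ms: int = 30 * 60 * 1000) -> bool:
    """Return True when the newest heartbeat's source_ts_ms lags the
    wall clock by more than max_lag_ms (illustrative threshold)."""
    return (now_ms - last_heartbeat_ts_ms) > max_lag_ms

# 31 minutes of silence -> restart; 10 minutes -> leave it alone.
print(should_restart(0, 31 * 60 * 1000))  # True
print(should_restart(0, 10 * 60 * 1000))  # False
```

The restart itself would then be a POST to the Kafka Connect REST API's restart endpoints (`/connectors/<name>/restart` or `/connectors/<name>/tasks/<id>/restart`).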

However, we'd prefer to prevent the issue rather than just react to it. Given that all abandoned transactions consistently show 0 events, we suspect these are Oracle-internal transactions or system-level operations that Debezium picks up from the redo log but never resolves.

Is there a way to filter out or exclude these 0-event transactions before they pin the offset SCN and trigger the "Offset SCN ... has not changed in 25 mining session iterations" warnings? And more broadly, could this issue be related to the MSK or Kafka Connect worker server specifications? Any recommendations on a prevention strategy would be greatly appreciated. We're happy to provide additional logs or configuration details if that would help.

Best regards,

On Tuesday, March 3, 2026 at 6:57:59 PM UTC+9, nath...@pepkorit.com wrote:
oracle-connector.log

Chris Cranford

Mar 10, 2026, 6:25:53 PM (14 hours ago) Mar 10
to debezium
Hi -

Regarding 1, Oracle is designed for high-throughput writes to its transaction log system; however, read operations are significantly slower. So while you may have seen a transaction disappear from V$TRANSACTION, it could take Debezium much longer to read the transaction off disk and emit any relevant changes depending on the overall IO workload on the system, how your disk groups are setup in Oracle, and what speed of disks the redo and archive logs are placed on.

Transaction retention is based on the duration of the transaction while executing in Oracle. If the retention is 10 minutes, and you run a bulk operation that runs as a single transaction taking 9 minutes and 59 seconds, that transaction won't be discarded. The transaction would have to run for 10 minutes and 1 second in Oracle to be abandoned. The duration that Debezium takes to ingest that 9 minute 59 second transaction is irrelevant here. If it takes Debezium 20 minutes to ingest it, the transaction won't be discarded with a 10 minute retention. It's done this way so that if you were to stop Debezium for 30 minutes and restart, Debezium doesn't immediately enter into a position where transactions are discarded.
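To make the rule above concrete, here is a tiny sketch of the abandonment check as described (illustrative names only, not Debezium's actual code): the only input that matters is how long the transaction ran inside Oracle, never how long Debezium took to ingest it.

```python
def is_abandoned(txn_start_ms: int, txn_end_ms: int,
                 retention_ms: int) -> bool:
    """A transaction is abandoned only if its duration *in Oracle*
    exceeds the retention window; Debezium's ingest time is irrelevant."""
    return (txn_end_ms - txn_start_ms) > retention_ms

RETENTION = 10 * 60 * 1000                        # 10-minute retention
print(is_abandoned(0, 599_000, RETENTION))        # 9m59s txn: kept  -> False
print(is_abandoned(0, 601_000, RETENTION))        # 10m01s txn: abandoned -> True
```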

Regarding 2, this ties back to my explanation above. If you need more detail, please let me know. 

Regarding 3, I suggest scraping all Oracle connector metrics to be safe. While you can make some assumptions by looking at one or two metrics, you may sometimes need to investigate several of them to get to a root cause, as there is a lot of intertwined state to consider. However, some of the most critical ones to make sure you have would be:

    * MilliSecondsSinceLastEvent
    * TotalNumberOfEventsSeen
    * QueueRemainingCapacity
    * MilliSecondsBehindSource
    * CurrentScn
    * OldestScn
    * OldestScnAgeInMilliseconds
    * OffsetScn
    * MiningSession[Lower|Upper]Bounds
    * MiningFetch[Lower|Upper]Bounds
    * MaximumMinedLogCount
    * LogSwitchCount
    * [Last|Total]CapturedDmlCount
    * [Last|Max]DurationOfFetchQueryInMilliseconds
    * [Last|Total]BatchProcessingTimeInMilliseconds
    * ProcessedRows
    * BatchSize
    * NumberOfEventsInBuffer
    * NumberOfActiveTransactions
    * NumberOfCommittedTransactions
    * NumberOfRolledBackTransactions
    * NumberOfPartialRollbackCount
    * LagFromSourceInMilliseconds
    * ScnFreezeCount

These are generally the metrics I most often request when we are looking at connector stability to gain insight into the possible root causes.
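For scraping, these streaming metrics are exposed under a per-connector JMX MBean; a small helper to build the ObjectName might look like this (pattern as documented for the Oracle connector, where `server` is your `topic.prefix` value; the function name is illustrative):

```python
def oracle_streaming_mbean(server_name: str) -> str:
    """Build the JMX ObjectName for Debezium Oracle streaming metrics."""
    return ("debezium.oracle:type=connector-metrics,"
            f"context=streaming,server={server_name}")

# e.g. for a connector whose topic.prefix is "erp10":
print(oracle_streaming_mbean("erp10"))
# -> debezium.oracle:type=connector-metrics,context=streaming,server=erp10
```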

Regarding 4, please make sure you have moved to 3.4.2.Final. As I mentioned in the prior post on the ML, there is one corner case where LogMiner did not handle emitting the transaction ID for rollback events properly, and this created issues with Debezium marrying the rollback event with the start/dml events. This is fixed, so it would be interesting to see if that helps. But understand, just because your SCN does not advance after 25 iterations is not necessarily a reason to stop and restart the connector. This could indicate that your `log.mining.batch.size.*` settings are too small and need to be adjusted. It could also mean you may need to increase your `query.fetch.size` beyond the default, particularly if you have a lot of activity. And lastly, it could also be a sign that other configuration properties need to be adjusted.

But looking at your logs, you have a number of "Detected SCN gap" messages too. These are just informational, and are quite common if you have large batch jobs that create lots of changes in a small window where the changes all have the same SCN. They show you positions in the redo logs where LogMiner needs to perform considerably more mining work, because clustered SCN ranges require LogMiner to do more bookkeeping to maintain order.

Lastly, regarding ORA-01013 in your logs: this is due to a LogMiner session that took longer than 10 minutes to return data. This can be a sign that many logs are queued up for LogMiner to read. Your redo logs are only 1GB and you only have 6 log groups. If you are doing a lot of data loads or large batch transactions in production, this may be insufficiently sized, and you could be facing checkpoint starvation on the database. When this happens, LogMiner will stop, and won't be able to move forward until the database checkpoints. I'd talk with the DBA and make sure you don't have any checkpoint waits.

And my last question is: what is your connector configuration? I don't see it logged in the connector logs, and there may be an opportunity there to optimize your setup. Could you share that information?


Thanks,
-cc

Chris Cranford

Mar 10, 2026, 6:27:12 PM (14 hours ago) Mar 10
to debezium
My apologies, regarding ORA-01013, I also meant to add that you can set `database.query.timeout.ms` to `0`, and that will cause Debezium to wait indefinitely for query results, even if it takes many minutes/hours to return.
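For example, in the connector config it would look something like this (a sketch; the surrounding properties are illustrative, not your actual configuration):

```json
{
  "name": "oracle-source",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "log.mining.strategy": "hybrid",
    "database.query.timeout.ms": "0"
  }
}
```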

hyojin lee

Mar 10, 2026, 10:42:50 PM (10 hours ago) Mar 10
to debezium

Hi again,

Thank you so much for the detailed explanation. Your insights were incredibly helpful in clearing up some of our misconceptions. I have a few follow-up questions regarding your points to ensure we fix the root cause properly.

1. On log.mining.transaction.retention.ms and the 0-event transactions

Thanks to your explanation, I realized I had misunderstood how the retention.ms option works. I originally thought that if a transaction started at 4:01 PM and was left improperly open, setting retention to 1 hour would force Debezium to abandon it 1 hour later (at 5:01 PM). However, you clarified that it is based on the actual duration the transaction is executing inside Oracle.

This makes my situation even more confusing. The problematic transactions we experienced were not heavy, long-running queries. They were abnormal "ghost" transactions with 0 events (no DMLs) that were improperly left open. If the retention option relies strictly on execution duration, why weren't these 0-event transactions cleaned up by the retention threshold instead of pinning the offset SCN for over 7 hours?

2. Fundamental solution for the "SCN Stuck" issue and Fetch Sizing

The reason we previously resorted to restarting the connector or manipulating the retention.ms value was that whenever we forced an abandon after hours of "SCN stuck" WARN logs, the pipeline would immediately unblock and flush all the pending messages. Because of this, we assumed "skipping the stuck transaction" was the proper fix.

If blindly restarting the connector isn't the answer, what is the fundamental root cause and solution for the offset SCN freezing due to these lingering transactions? You mentioned adjusting log.mining.batch.size.* and fetch sizes. Looking at our current configuration, we have both database.statement.fetchSize and log.mining.batch.size.max set to 2000. Is this value too conservative for a high-volume environment, causing the connector to digest pending transactions too slowly? What values would you recommend we scale these up to?

3. Best practices for Redo Log Sizing

I completely agree with your assessment regarding the redo log capacity being insufficient. (Fortunately, we have over 250GB of free storage, so scaling up the logs is not an issue at all.)

Currently, we are running with 6 groups of 1GB, which we had already increased from the default setup. Due to the nature of our business, our DB workload is relatively quiet, with steady I/U/D traffic most of the time. However, during specific periods, such as financial month-end and month-start closings, we experience massive spikes of batch operations and bulk inserts. It seems these bursty spikes overwhelmed our 1GB log groups and caused the checkpoint starvation.

In a CDC environment with a "bursty" workload like ours, what is the general best practice for sizing the redo log files and determining the optimal number of log groups?
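For our own planning, we have been using this back-of-envelope sketch: size each group so that, at peak redo generation, a log switch happens no more often than roughly every 20 minutes (the 20-minute target is just a rule of thumb we picked up, not something we have validated; we intend to check it against actual switch rates in V$LOG_HISTORY):

```python
def redo_log_size_gb(peak_redo_mb_per_s: float,
                     target_switch_interval_min: float = 20.0) -> float:
    """Per-group redo log size (GB) so that a switch occurs no more
    often than the target interval at the given peak redo rate.
    The 20-minute default is an assumed rule of thumb, not a mandate."""
    return peak_redo_mb_per_s * target_switch_interval_min * 60 / 1024

# e.g. a burst writing ~5 MB/s of redo suggests roughly 5.9 GB per group
print(round(redo_log_size_gb(5.0), 1))
```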


Per your request, here is our current connector configuration (with sensitive info masked). We also already have database.query.timeout.ms set to 0 as you suggested.


On Wednesday, March 11, 2026 at 7:27:12 AM UTC+9, Chris Cranford wrote:
source-docker-compose.yml
oracle_connector_config.json