[Debezium Oracle Connector] ERROR Graceful stop of task xx-connector-0 failed.


masami310

Feb 2, 2022, 3:54:26 AM
to debezium
Hi,

I am getting the following error in our production environment.

ERROR  ||  Graceful stop of task xx-connector-0 failed.   [org.apache.kafka.connect.runtime.Worker]

This error occurs every few hours.
There are no other exceptions on the log.
In what cases does this error occur?

The version of Debezium is 1.8.0.Final.
Oracle is 12.2.
Kafka broker(Amazon MSK) 2.8.0.

masami310

Feb 2, 2022, 4:31:39 AM
to debezium
Addendum.

The status of the connector is RUNNING as shown below.

curl -H "Accept:application/json" xx.xx.xx.xx:xx/connectors/xx-connector/status
{"name":"xx-connector","connector":{"state":"RUNNING","worker_id":"xx.xx.xx.xx:xx"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"xx.xx.xx.xx:xx"}],"type":"source"}
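For scripting a health check against that same endpoint, the task state can be extracted from the status document. A minimal sketch, where the JSON mirrors the response shape shown above and the host, port, and connector name are placeholders for your own deployment:

```shell
# Status JSON in the same shape as above (normally fetched with:
#   curl -s -H "Accept:application/json" $HOST/connectors/$NAME/status )
status='{"name":"xx-connector","connector":{"state":"RUNNING","worker_id":"10.0.0.1:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"10.0.0.1:8083"}],"type":"source"}'

# Crude extraction of the first task's state without jq.
task_state=$(echo "$status" | sed -n 's/.*"tasks":\[{"id":0,"state":"\([A-Z]*\)".*/\1/p')
echo "task 0 state: $task_state"   # prints: task 0 state: RUNNING

# A FAILED task could then be restarted via the Connect REST API:
#   curl -X POST $HOST/connectors/$NAME/tasks/0/restart
```

Note that, as this thread shows, a task can report RUNNING even after a graceful stop has failed, so the REST state alone is not a reliable health signal.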

On Wednesday, February 2, 2022 at 5:54:26 PM UTC+9, masami310 wrote:

Chris Cranford

Feb 2, 2022, 7:47:40 AM
to debe...@googlegroups.com, masami310
Hi -

Is the cluster attempting to rebalance, which would require the connector to be stopped and restarted?

There are quite a number of reasons why a connector would be gracefully stopped in Kafka Connect, and the answer should be in the logs.  Could you zip and attach the full log in a mail to me so that I can take a look off-list and see if I can understand why?

Thanks,
Chris

masami310

Feb 3, 2022, 5:09:47 AM
to debezium
Hi, Chris

I got the log, so I will attach it.
Thank you.

On Wednesday, February 2, 2022 at 9:47:40 PM UTC+9, Chris Cranford wrote:
debezium_log.zip

masami310

Feb 3, 2022, 11:42:13 PM
to debezium
Supplement.

2022-01-31T11:22:34.789Z 2022-01-31 20:22:34,789 ERROR  ||  Graceful stop of task xx-xx-connector-0 failed.   [org.apache.kafka.connect.runtime.Worker]

When the above error occurred, change data messages stopped flowing.
However, the status of the connector task was still RUNNING.

On Thursday, February 3, 2022 at 7:09:47 PM UTC+9, masami310 wrote:

Chris Cranford

Feb 4, 2022, 12:02:41 PM
to debe...@googlegroups.com, masami310
At 20:18:11, you had some stability issues on your cluster.  At 20:22:22, a request was made for the connector to stop, but according to Kafka Connect it didn't respond, and a forced shutdown began at 20:22:34.  From the logs it's hard to determine whether there was a Debezium issue or whether this is a result of the instability.  In either case I can only theorize, but here are some things that jumped out at me:

1. maxCommitDuration = 12.2s
This indicates a substantially long commit loop.  Perhaps you had some bulk transactions that needed to be processed, and the duration exceeded your configured task.shutdown.graceful.timeout.ms value.

2. lastDurationOfFetchingQuery = 12.25s
This again is a substantially long fetch query, particularly using online_catalog with a batch size of 68,000 rows at the time the task finally stopped.

3. totalDurationOfFetchingQuery=59 minutes and 5.5s
This seems quite high overall for a connector that has only been running since 19:10, effectively 1 hour and 12 minutes.  It indicates that the vast majority of your time is being spent just getting the results from LogMiner, which could be indicative of a number of things.  Your database's alert log would give me more detail to diagnose what could be causing the delays.

With regard to (1) or (2), this can most definitely lead to the ERROR you saw.  Given the value in (3), I'm inclined to think that if my observations are related, it's more attributable to (2) than to (1).  If that's true, then you may just need to raise task.shutdown.graceful.timeout.ms to a value that's a bit more reasonable given your environment's performance.
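For reference, task.shutdown.graceful.timeout.ms is a worker-level property set in the Kafka Connect worker configuration file rather than in the connector's JSON; a sketch, where the 30000 ms value is only an illustrative assumption to tune for your environment:

```properties
# connect-distributed.properties (Kafka Connect worker configuration)
# Default is 5000 ms; raise it so that long commit/fetch cycles can
# finish before Kafka Connect escalates to a forced shutdown.
task.shutdown.graceful.timeout.ms=30000
```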

Lastly, regarding the performance indicated by (3), I'd like you to remove the "schema.include.list" configuration property and see whether the performance metrics for (2) and (3) look better after about 1 hour and 12 minutes of execution.  Ideally what we're looking for is that the "totalDurationOfFetchingQuery" relative to the connector's runtime, i.e. ((totalDurationOfFetchingQueryMetric / totalRuntimeOfConnector) * 100), is less than 82%.  Whether the removal of "schema.include.list" helps or hurts performance, I'd like to know.  It seems we may need to see if there is a way to optimize that query.
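As a worked example of that ratio, using the figures from this thread (59 minutes 5.5 seconds of fetch time over roughly 1 hour 12 minutes of runtime):

```shell
# Metric values reported earlier in this thread.
total_fetch_s=3545.5        # totalDurationOfFetchingQuery: 59 min 5.5 s
runtime_s=$((72 * 60))      # connector runtime: ~1 h 12 min

# (totalDurationOfFetchingQueryMetric / totalRuntimeOfConnector) * 100
ratio=$(awk -v f="$total_fetch_s" -v r="$runtime_s" \
  'BEGIN { printf "%.1f", f / r * 100 }')
echo "fetch time is ${ratio}% of runtime"   # 82.1%, right at the threshold
```

So the observed connector was spending about 82% of its wall-clock time inside the LogMiner fetch query, which is exactly the boundary described above.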

For what it's worth, I haven't observed any performance issues locally in tests using online_catalog; however, we also do not specifically test with Oracle 12.1, so there may be some nuance with Oracle 12 R1 that needs some tuning.

Chris

masami310

Feb 7, 2022, 7:21:33 PM
to debezium
Hi, Chris

Thank you for your very thorough response.
I will change the settings based on your advice.
I will report the results at a later date.

Thank you always so much.

On Saturday, February 5, 2022 at 2:02:41 AM UTC+9, Chris Cranford wrote:

Long Văn

Jul 23, 2023, 11:47:01 PM
to debezium
Hello @Chris Cranford,
I'm seeing the same problem.
This error occurs every day.
Could you tell me what causes this error?

Debezium is 2.2.1.Final.
Kafka 3.4.0.
DBMS: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production (Version 19.3.0.0.0).

Abnormal entries I found in my log:

[2023-07-21 20:09:35,831] WARN [my-connector|task-0] LOB value was truncated due to the connector limitation of 40 MB (io.debezium.connector.oracle.logminer.events.LogMinerEventRow:224)

...

[2023-07-21 20:56:07,431] ERROR [my-connector|task-0] Graceful stop of task my-connector-0 failed. (org.apache.kafka.connect.runtime.Worker:1025)

I also attach the full log from around 8 p.m.

Thank you.

Graceful-stop-failed.txt

Chris Cranford

Jul 24, 2023, 7:37:19 AM
to debe...@googlegroups.com
Hi,

Could you please raise a Jira issue?  That looks like a bug.  We shouldn't truncate the LOB data; instead we should rely on Kafka Connect and Kafka to inform you that you are generating messages larger than the topic is configured to store.

Thanks,
Chris