MySQL Connector - Issue with Message Flushing After Initial Operation


Kaua Rodrigues

Feb 2, 2026, 4:00:45 PM
to debezium

Hi all,

We're experiencing an issue with our AWS MSK Connect (Debezium) connector attached to a MySQL RDS instance hosting several large databases (~20 GB/week of data). After running successfully for a few hours or days, the connector stops flushing/committing messages to Kafka (0 messages sent), even though no errors or warnings appear in CloudWatch Logs, the connector remains alive and connected to MySQL, the source tables are actively updated, and the MySQL binlog continues to grow.

[Worker-0989ee99c536b29b0] [2026-02-02 20:27:51,011] INFO [mysql|task-0|offsets] WorkerSourceTask{id=mysql-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:507)
[Worker-0989ee99c536b29b0] [2026-02-02 20:27:56,012] INFO [mysql|task-0|offsets] WorkerSourceTask{id=mysql-extractor-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:490)

Our current configuration is:
database_server_id = "5401"
max_batch_size = 128
max_queue_size = 2048
mcu_count = 1
worker_count = 1
connect_timeout_ms = 30000
offset_flush_timeout_ms = 1200000 # 20min
offset_flush_interval_ms = 5000 # 5s
errors_log_enable = true
errors_tolerance = "all"
errors_log_include_messages = true
producer_buffer_memory_bytes = 16777216 # 16MB
snapshot_mode = "schema_only"
snapshot.locking.mode = "none"
tasks.max = "1"
key.converter = "org.apache.kafka.connect.json.JsonConverter"
value.converter = "org.apache.kafka.connect.json.JsonConverter"
transforms = "unwrap,addVersion,underscoreRemover"
transforms.unwrap.type = "io.debezium.transforms.ExtractNewRecordState"
transforms.unwrap.delete.handling.mode = "rewrite"

Destroying and recreating the connector via Terraform resolves the issue temporarily, but it recurs after some time. Moreover, when the connector reaches this stale state, the Kafka offset topic stops receiving new messages, which implies the offsets stop advancing even though the binlog keeps growing.

We have also investigated the binlog positions where the connector stopped processing, looking at two occurrences of this ghost state. In both cases, the last recorded offset corresponded to an Xid event (the transaction commit marker in the binlog), and the transactions involved INSERT operations across multiple tables. In case 1, one table was in the connector's exclude list. In case 2, all three tables were in the include list, with tables 2 and 3 actually being the same table.

Finally, we attempted to enable DEBUG logging, but discovered that AWS MSK Connect apparently supports only INFO-level logging.

We're using the Debezium MySQL connector plug-in (2.7.1) with Kafka Connect (2.7.1).

I'd really appreciate any insights into potential root causes or recommended troubleshooting steps.

Thank you in advance,

Chris Cranford

Feb 3, 2026, 11:42:11 AM
to debe...@googlegroups.com
Hi -

Given that MSK does not support anything but INFO level logging, we likely need to use some radical debugging.

First, could you enable `heartbeat.interval.ms`, setting it to emit a heartbeat perhaps every 60 or 120 seconds? This is simply a pulse sent from Kafka Connect to Kafka on that interval. When you observe that you are no longer getting messages from your database tables, could you check whether you are still getting heartbeat events, or whether they stop too?
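For reference, a minimal sketch of such a heartbeat configuration (the 60-second interval is illustrative; `__debezium-heartbeat` is Debezium's default topic prefix, shown here only for clarity):

```properties
# Emit a heartbeat record every 60 seconds so connector liveness can be
# observed even when the captured tables are quiet.
heartbeat.interval.ms=60000
# Heartbeat records go to <prefix>.<topic.prefix>; this is the default prefix.
heartbeat.topics.prefix=__debezium-heartbeat
```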

In conjunction, can you check whether there is any sort of Gateway Load Balancer or proxy sitting between the MSK cluster and the MySQL RDS instance? And lastly, if you can identify the time when this occurs, are you able to tell whether any specific activity takes place on the RDS side at that moment? Is there any sort of backup or other process that could lock or prevent the binlog client from reading from the MySQL database? Do the RDS MySQL instance logs share any enlightening information?

Thanks,
-cc
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/4c1c20d8-8bd7-4a12-92f1-9f8efc5195e8n%40googlegroups.com.

Kaua Rodrigues

Feb 9, 2026, 8:32:58 AM
to debezium

Hi Chris,

Thanks for the suggestion. We enabled the Debezium heartbeat as suggested, with the following configurations:

heartbeat.interval.ms = 300000 (5 minutes)

topic.heartbeat.prefix = "heartbeat"

No query heartbeat was configured. After this change, the connector ran normally for around two days, emitting both CDC events and heartbeat messages. Late Friday afternoon it entered the same stale state: CDC events stopped, heartbeat messages stopped at the same time, and no new events appeared in CloudWatch or Kafka.

Meanwhile the connector and task remain "RUNNING", the logs keep showing "Committing offsets / flushing 0 outstanding messages", MySQL RDS still shows an active "Binlog Dump – Sending to client" session, the tables continue to be updated, and the binlog keeps growing. We are now checking the RDS logs, but I wanted to ask first: does the fact that the heartbeat stops together with CDC already point to a known class of issues, or otherwise clarify what the problem might be?

Thanks again for the help.

Chris Cranford

Feb 9, 2026, 12:42:44 PM
to debe...@googlegroups.com
Hi -

If I were to speculate, the fact that heartbeats stopped would signal to me that the connector entered some sort of blocking state, where the connector thread likely hit a deadlock. This most often happens when `QueueRemainingCapacity` reaches 0, which occurs when Kafka Connect can no longer deliver events to the Kafka broker. But there could be other reasons for a thread deadlock.
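To illustrate the mechanism (this is an illustrative sketch, not Debezium code): the source task pushes change events into a bounded in-memory queue (cf. the `max_queue_size = 2048` setting above), and a single thread produces both CDC events and heartbeats. If the downstream side stalls, that thread blocks on the full queue and everything it would emit stops at once, which matches heartbeats and CDC events dying together:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: a tiny model of a bounded event queue with a single
// producer thread. When the consumer (the delivery path to the broker)
// stalls, the producer wedges on the full queue and emits nothing further.
public class QueueStallDemo {

    static int runPipeline(boolean consumerStalls) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4); // tiny "max.queue.size"
        int[] delivered = {0};

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    if (consumerStalls && delivered[0] >= 2) {
                        Thread.sleep(30_000); // simulate a hung delivery to the broker
                    }
                    String item = queue.take();
                    if (item.equals("STOP")) return;
                    delivered[0]++;
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.setDaemon(true);
        consumer.start();

        int produced = 0;
        for (int i = 0; i < 20; i++) {
            // The single source thread. In the real connector this is a
            // blocking put with no timeout, so the thread simply wedges;
            // the timeout here just lets the demo terminate.
            if (!queue.offer("event-" + i, 1, TimeUnit.SECONDS)) break;
            produced++;
        }
        if (!consumerStalls) {
            queue.put("STOP");
            consumer.join();
        }
        return produced;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("healthy: " + runPipeline(false)); // all 20 events enqueued
        System.out.println("stalled: " + runPipeline(true));  // 2 consumed + 4 queued = 6, then wedged
    }
}
```

In the stalled case the producer never throws and never logs an error; it is simply parked, which is consistent with a connector that stays "RUNNING" while flushing 0 messages.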

Is it possible when that happens to supply us with a thread dump of the connector process?

-cc

Kaua Rodrigues

Feb 9, 2026, 6:25:11 PM
to debezium

Hi Chris,

We investigated the issue from our side. Looking at the Aurora RDS logs around the time the connector stopped flushing, we only saw aborted connections and the binlog backlog growing. There were no errors or events indicating a crash, timeout, or lock that could explain the connector halt, and the database continued receiving updates normally. The heartbeat and offsets from Debezium stopped well before any of these Aurora events, so I believe the connector may indeed have entered some kind of internal blocking state, as you suggested.

Regarding your suggestion of a thread dump: since we're using AWS MSK Connect (managed infrastructure), we don't have access to the JVM process, so I'm afraid we can't provide one. Are there any alternative ways to diagnose these kinds of blocking states in MSK Connect, given the INFO-only logging limitation?

Thanks,

Chris Cranford

Feb 10, 2026, 9:24:07 AM
to debe...@googlegroups.com
Hi -

There are a few things I'd check.

    1. Make sure that `connect.keep.alive` isn't being forced to `false`.
    2. Check if there is any Gateway Load Balancer between MSK and RDS, removing it from the topology if so.
    3. See if using `heartbeat.interval.ms` + `heartbeat.action.query` generates adequate traffic to keep the database connection alive.
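For item 3, a minimal sketch of the combined settings (the `debezium_heartbeat` table and its columns are assumptions for illustration; the table must exist, be writable by the Debezium user, and be covered by the connector's include list):

```properties
# Illustrative only: table name and columns are placeholders.
heartbeat.interval.ms=60000
heartbeat.action.query=UPDATE debezium_heartbeat SET ts = NOW() WHERE id = 1
```

Unlike a plain heartbeat, the action query writes to the source database on each interval, generating real binlog traffic that keeps the connection exercised even when the captured tables are idle.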
    
Hope that helps.
-cc
