Hi team,
We run multiple Kafka Connect pods hosting around 10 Debezium MySQL connectors connected to RDS. These produce messages to MSK brokers, from where they are consumed by the respective services.
Every now and then, our connectors randomly stop producing messages for exactly 14 minutes, each time we see the message below:
INFO: Keepalive: Trying to restore lost connection to aurora-prod-cluster.cluster-asdasdasd.us-east-1.rds.amazonaws.com:3306
The connector then auto-recovers after exactly 14 minutes. If I restart the Connect pod hosting the connector during those 14 minutes, the connector recovers in ~3-5 minutes.
I have tried tweaking many configuration settings on the Kafka side, and also tried adding the following:
database.additional.properties: "socketTimeout=20000;connectTimeout=10000;tcpKeepAlive=true"
But nothing helped.
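To put those values in perspective, here is a small Python sketch that decodes the property string. The helper name `parse_properties` is made up for illustration, and I am assuming the two timeouts are in milliseconds, as they are for MySQL Connector/J. Under that assumption both timeouts are well under a minute, so they do not obviously explain a 14-minute stall, which makes me suspect they are not being applied to the connection that hangs.

```python
# Hypothetical helper: decode the semicolon-separated driver properties quoted above.
# Assumption: socketTimeout and connectTimeout are in milliseconds (MySQL Connector/J convention).

def parse_properties(raw: str) -> dict[str, str]:
    """Split 'k1=v1;k2=v2' into a dict, ignoring empty segments."""
    pairs = (item.split("=", 1) for item in raw.split(";") if item.strip())
    return {key.strip(): value.strip() for key, value in pairs}


raw = "socketTimeout=20000;connectTimeout=10000;tcpKeepAlive=true"
props = parse_properties(raw)

# Convert the millisecond timeouts to seconds for comparison with the stall.
socket_timeout_s = int(props["socketTimeout"]) / 1000
connect_timeout_s = int(props["connectTimeout"]) / 1000
observed_stall_s = 14 * 60  # the 14-minute stall described above, in seconds

print(f"socketTimeout  = {socket_timeout_s:.0f} s")
print(f"connectTimeout = {connect_timeout_s:.0f} s")
print(f"observed stall = {observed_stall_s} s")
```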
I cannot afford a delay of 15 minutes for a few of my most important tables, as they are extremely critical and the delay breaches our SLA with clients.
Has anyone faced this before, and what could be the issue here?
Any help will be greatly appreciated.