kafka status/offset/config topic for debezium consumer group coordinator not available


Jinsong Hu

Mar 31, 2023, 6:38:14 PM
to debezium
I have been running Debezium 2.1 and found that the Kafka consumers for the status, offset, and config topics went into an endless loop: they connect to Kafka, are told that the consumer group coordinator is not available, disconnect, and connect again. The loop goes on forever and no messages can be sent. The only way to recover is to stop Debezium, delete and re-create these three topics, and start Debezium again. But after a while, the same issue happens again. Does anybody have a solution?

Here is the message:

2023-03-26 04:07:30,483 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, groupId=tpdata-pg-introspect02-cluster] Group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) is unavailable or invalid due to cause: coordinator unavailable.isDisconnected: true. Rediscovery will be attempted. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]
2023-03-26 04:07:30,485 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, groupId=tpdata-pg-introspect02-cluster] Discovered group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]
2023-03-26 04:07:30,485 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, groupId=tpdata-pg-introspect02-cluster] Group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) is unavailable or invalid due to cause: coordinator unavailable.isDisconnected: false. Rediscovery will be attempted. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]
2023-03-26 04:07:30,485 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, groupId=tpdata-pg-introspect02-cluster] Requesting disconnect from last known coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]
2023-03-26 04:07:30,585 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, groupId=tpdata-pg-introspect02-cluster] Discovered group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]
2023-03-26 04:07:30,641 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-1, groupId=tpdata-pg-introspect02-cluster] Node 2147483644 disconnected. (org.apache.kafka.clients.NetworkClient) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-offsets]
2023-03-26 04:07:30,641 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-1, groupId=tpdata-pg-introspect02-cluster] Group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) is unavailable or invalid due to cause: coordinator unavailable.isDisconnected: true. Rediscovery will be attempted. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-offsets]
2023-03-26 04:07:30,642 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-1, groupId=tpdata-pg-introspect02-cluster] Discovered group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-offsets]
2023-03-26 04:07:30,642 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-1, groupId=tpdata-pg-introspect02-cluster] Group coordinator ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) is unavailable or invalid due to cause: coordinator unavailable.isDisconnected: false. Rediscovery will be attempted. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) [KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-offsets]
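
For anyone triaging a similar loop, here is a minimal Python sketch (not part of the original report) that counts the "unavailable or invalid" events per worker thread and decodes the coordinator node id. It relies on the Kafka client convention, as I understand it, that the coordinator connection uses the synthetic id `Integer.MAX_VALUE - broker_id`, so `2147483644` would correspond to broker 3. The names `summarize` and `SAMPLE` are made up for illustration; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumption: the client addresses the coordinator as Integer.MAX_VALUE - broker_id.
JAVA_INT_MAX = 2**31 - 1

# Matches only the "is unavailable or invalid" lines and captures host, node id, thread.
LINE_RE = re.compile(
    r"Group coordinator (?P<host>\S+) \(id: (?P<node>\d+) rack: \S+\) "
    r"is unavailable or invalid.*\[(?P<thread>[^\]]+)\]\s*$"
)


def summarize(lines):
    """Count coordinator-unavailable events per (thread, host, broker id)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            broker_id = JAVA_INT_MAX - int(match.group("node"))
            counts[(match.group("thread"), match.group("host"), broker_id)] += 1
    return counts


# Three lines taken from the log in this post (two failures, one rediscovery).
SAMPLE = [
    "2023-03-26 04:07:30,483 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, "
    "groupId=tpdata-pg-introspect02-cluster] Group coordinator ch-kafka-event03.domain.redacted:9092 "
    "(id: 2147483644 rack: null) is unavailable or invalid due to cause: coordinator unavailable."
    "isDisconnected: true. Rediscovery will be attempted. "
    "(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) "
    "[KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]",
    "2023-03-26 04:07:30,485 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-3, "
    "groupId=tpdata-pg-introspect02-cluster] Discovered group coordinator "
    "ch-kafka-event03.domain.redacted:9092 (id: 2147483644 rack: null) "
    "(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) "
    "[KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-configs]",
    "2023-03-26 04:07:30,641 INFO [Consumer clientId=consumer-tpdata-pg-introspect02-cluster-1, "
    "groupId=tpdata-pg-introspect02-cluster] Group coordinator ch-kafka-event03.domain.redacted:9092 "
    "(id: 2147483644 rack: null) is unavailable or invalid due to cause: coordinator unavailable."
    "isDisconnected: true. Rediscovery will be attempted. "
    "(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) "
    "[KafkaBasedLog Work Thread - tpdata.pg-introspect02.connect-offsets]",
]

if __name__ == "__main__":
    for (thread, host, broker_id), n in summarize(SAMPLE).items():
        print(f"{n} event(s) on '{thread}' against {host} (broker id {broker_id})")
```

Running it over a full worker log should show whether every internal-topic consumer is stuck on the same broker or only some of them.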
 

Chris Cranford

Apr 1, 2023, 10:10:24 AM
to debe...@googlegroups.com
Hi -

This typically implies that the Kafka broker has become unavailable, and the logs don't make clear why; you'll need to do some investigation yourself. One common reason is simply that your Kafka broker's disks have filled up. In a distributed Kafka environment, not enabling the auto.leader.rebalance.enable option can also cause such situations, because partitions are not properly redistributed when a node goes offline.

Please take a look at your Kafka broker's stability.

Hope that helps.
Chris

Jinsong Hu

Apr 4, 2023, 6:48:13 PM
to debezium
The disk is only 50% full. I have confirmed that this issue happens with a single broker as well as with 3 brokers. I also tried Kafka versions 2.4, 3.2, and 3.4; all of them have this problem. Another strange thing I found is that

kafka-consumer-groups.sh --bootstrap-server $kafka_brokers --list

doesn't show the consumer group for Debezium, while it does show the groups for other clients. This leads me to believe that the consumer group, including the group coordinator, was never set up properly, hence the "coordinator unavailable" message when it reconnects.
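
One way to narrow this down is to find which `__consumer_offsets` partition the group maps to, since the leader of that partition acts as the group's coordinator. The sketch below is an illustration only: it assumes the broker picks the partition as `abs(groupId.hashCode()) % offsets.topic.num.partitions`, with the sign bit masked off, and that the partition count is the default of 50. Both assumptions should be checked against your broker version and configuration. The function names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Replicate java.lang.String.hashCode() as a signed 32-bit integer."""
    h = 0
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like a Java int
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets assumed to own the given group id."""
    # Assumption: Kafka masks the sign bit instead of using Math.abs.
    return (java_string_hash(group_id) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    group = "tpdata-pg-introspect02-cluster"  # group id from the log above
    print(f"{group} -> __consumer_offsets partition {offsets_partition(group)}")
```

With the partition number in hand, `kafka-topics.sh --describe --topic __consumer_offsets` can be used to check whether that partition currently has a leader and in-sync replicas.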

As a tentative workaround, I have set connections.max.idle.ms=-1 on both the client side and the server side so that the consumer socket connection never disconnects, which solves the problem in the development environment. But I am not sure this is appropriate for a production environment, as a socket connection can break for all kinds of reasons when running long enough. Setting this parameter also means load balancing is disabled in the consumer group.

We still need to know the root cause of this issue and have a permanent fix. 

Jinsong Hu

Apr 4, 2023, 7:09:24 PM
to debezium
By the way, in both the 3-node and the single-node Kafka cluster, the brokers never went down during my testing. In the single-node case, balancing is not relevant at all, because there is no other node to balance to. So this issue doesn't seem to be related to balancing, and there is no load redistribution.

Jinsong Hu

Apr 4, 2023, 7:30:31 PM
to debezium
It turns out https://github.com/orgs/strimzi/discussions/8038 is also discussing this issue.

Jinsong Hu

Apr 11, 2023, 12:08:21 PM
to debezium
Just an update:

I noticed that a new Kafka Connect image, http://quay.io/strimzi/kafka:0.34.0-kafka-3.3.2, was released recently.
So I used that one, and the "consumer group coordinator unavailable" issue is gone.
But we still do not know the cause of this, or how it was fixed.

While doing this, I also noticed that Kafka needs to be upgraded to version 3.0 or above to avoid
a deadlock issue in Kafka.

Chris Cranford

Apr 11, 2023, 5:31:18 PM
to debe...@googlegroups.com
Hi Jinsong -

These questions are best directed at the maintainers of Apache Kafka and Kafka Connect. All I can say is that it could have been a bug or some compatibility issue between your Kafka Connect cluster and the Kafka broker.

Thanks,
Chris

Tuyến Nguyễn Kim

Nov 20, 2023, 3:00:43 AM
to debezium
Hi Jinsong, I ran into the same problem, but I'm using the Kafka Connect image http://quay.io/strimzi/kafka:0.38.0-kafka-3.6.0. How can I fix it?

Here is the message:
2023-11-17 10:49:58,062 INFO [Worker clientId=connect-1, groupId=connect-cluster] Group coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 (id: 2147483647 rack: null) is unavailable or invalid due to cause: error response NOT_COORDINATOR. isDisconnected: false. Rediscovery will be attempted. (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,062 INFO [Worker clientId=connect-1, groupId=connect-cluster] Requesting disconnect from last known coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 (id: 2147483647 rack: null) (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,062 INFO [Worker clientId=connect-1, groupId=connect-cluster] JoinGroup failed: This is not the correct coordinator. Marking coordinator unknown. Sent generation was Generation{generationId=-1, memberId='', protocol='null'} (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,062 INFO [Worker clientId=connect-1, groupId=connect-cluster] Request joining group due to: rebalance failed due to 'This is not the correct coordinator.' (NotCoordinatorException) (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,162 INFO [Worker clientId=connect-1, groupId=connect-cluster] Client requested disconnect from node 2147483647 (org.apache.kafka.clients.NetworkClient) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,164 INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 (id: 2147483647 rack: null) (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,164 INFO [Worker clientId=connect-1, groupId=connect-cluster] Group coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 (id: 2147483647 rack: null) is unavailable or invalid due to cause: coordinator unavailable. isDisconnected: false. Rediscovery will be attempted. (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,164 INFO [Worker clientId=connect-1, groupId=connect-cluster] Requesting disconnect from last known coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 (id: 2147483647 rack: null) (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
2023-11-17 10:49:58,265 INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 (id: 2147483647 rack: null) (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]
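
This log mixes two different causes: an explicit `NOT_COORDINATOR` error response from the broker, and a plain "coordinator unavailable", which as far as I can tell is the client's own judgement that its connection to the coordinator is unusable. A small Python sketch (hypothetical helper `tally_causes`, sample lines copied from the log above) that tallies the causes separately may help show which one dominates over a longer log:

```python
import re
from collections import Counter

# Captures the text between "due to cause:" and "isDisconnected:".
# The optional whitespace covers both spacings seen in this thread.
CAUSE_RE = re.compile(r"due to cause: (?P<cause>.+?)\.\s*isDisconnected: (?:true|false)")


def tally_causes(lines):
    """Count how often each rediscovery cause appears."""
    causes = Counter()
    for line in lines:
        match = CAUSE_RE.search(line)
        if match:
            causes[match.group("cause")] += 1
    return causes


SAMPLE = [
    "2023-11-17 10:49:58,062 INFO [Worker clientId=connect-1, groupId=connect-cluster] "
    "Group coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 "
    "(id: 2147483647 rack: null) is unavailable or invalid due to cause: "
    "error response NOT_COORDINATOR. isDisconnected: false. Rediscovery will be attempted. "
    "(org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]",
    "2023-11-17 10:49:58,164 INFO [Worker clientId=connect-1, groupId=connect-cluster] "
    "Group coordinator evn-cluster-kafka-0.evn-cluster-kafka-brokers.strimzi.svc:9093 "
    "(id: 2147483647 rack: null) is unavailable or invalid due to cause: "
    "coordinator unavailable. isDisconnected: false. Rediscovery will be attempted. "
    "(org.apache.kafka.connect.runtime.distributed.WorkerCoordinator) [DistributedHerder-connect-1-1]",
]

if __name__ == "__main__":
    for cause, n in tally_causes(SAMPLE).most_common():
        print(f"{n} x {cause}")
```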

On Tuesday, April 11, 2023 at 23:08:21 UTC+7, Jinsong Hu wrote: