In connect-distributed mode, multiple connector threads are created for a single connector

Anirudh Sharma Arrojwala

Jul 20, 2018, 5:48:29 AM
to Confluent Platform
Hi All,

We developed file source connectors (which are supposed to pick up files from a specified location and process them one after another). They use a source partition and source offset to record the previous offset, so that processing can resume from it after the rebalance that happens when new connectors are added to the worker.

During rebalancing, however, I observed that a new connector instance is created for an already running connector, so both instances keep running in parallel, and the logs show the following error. A new connector instance is created every time a connector is added to or deleted from the worker.


[2018-07-16 07:07:11,575] INFO Rebalance started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1214)
[2018-07-16 07:07:11,575] INFO Stopping task ami-bge-location-source-connector-0 (org.apache.kafka.connect.runtime.Worker:464)
[2018-07-16 07:07:16,576] ERROR Graceful stop of task ami-bge-location-source-connector-0 failed. (org.apache.kafka.connect.runtime.Worker:493)
[2018-07-16 07:07:16,576] INFO Finished stopping tasks in preparation for rebalance (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
[2018-07-16 07:07:16,576] INFO [Worker clientId=connect-1, groupId=ami-comed-source-connector] (Re-)joining group 


Please suggest how to prevent new instances from being created for an already running connector, and how to resolve the "Graceful stop of task ... failed" error.

Regards,
Anirudh. 

Konstantine Karantasis

Aug 16, 2018, 10:41:27 PM
to confluent...@googlegroups.com

Re-checking the implementations of poll() and stop() in your SourceTask might help. For some reason your source tasks don't return from their poll loop in a way that lets them notice that a stop has been requested because of the rebalance. As a result, the graceful stop times out after 5 seconds (the default timeout, controlled by the property task.shutdown.graceful.timeout.ms). Even after that, the task is not forcefully terminated: Connect depends on the task executing the poll loop regularly in order to check termination conditions. If the task doesn't do that, it might keep running in the background.

That's an educated guess of what might be happening. More logs would be helpful. 
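
To illustrate the kind of poll/stop cooperation described above, here is a minimal sketch. It is not your connector and it does not use the Connect API directly: CooperativeFileTask, enqueue, MAX_BATCH and WAIT_MS are hypothetical names, and the class is a plain-Java stand-in for something that would extend org.apache.kafka.connect.source.SourceTask and return SourceRecord objects. The point is only that poll() does a bounded amount of work and returns, and that stop() merely sets a flag that poll() checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a SourceTask subclass; it mirrors only the
// poll()/stop() contract and uses String in place of SourceRecord.
class CooperativeFileTask {
    // Assumed tuning values, not Connect settings.
    private static final int MAX_BATCH = 100;
    private static final long WAIT_MS = 200;

    // Set by stop() (called from another thread), read by poll().
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    // Hypothetical buffer of lines already read from the files.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    void enqueue(String line) {
        pending.add(line);
    }

    // Returns a bounded batch, or null when there is nothing to hand over.
    // It never loops until "all files are done"; it always returns promptly
    // so the caller can check whether the task should shut down.
    List<String> poll() {
        if (stopping.get()) {
            return null; // stop requested: produce nothing more
        }
        String first;
        try {
            // Bounded wait instead of blocking forever on an empty queue.
            first = pending.poll(WAIT_MS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
            return null;
        }
        if (first == null) {
            return null; // nothing available right now
        }
        List<String> batch = new ArrayList<>();
        batch.add(first);
        pending.drainTo(batch, MAX_BATCH - 1); // cap the batch size
        return batch;
    }

    // Only signals; the poll() side notices the flag on its next call.
    void stop() {
        stopping.set(true);
    }
}
```

If your poll() instead processes a whole file (or every file) in one call, or sleeps in an inner loop without looking at such a flag, the worker has no opportunity to stop the task within the graceful timeout, which would match the log lines above.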

Konstantine

