kafka-connect: Timeout expired while fetching topic metadata

Barry Kaplan

Jun 13, 2016, 1:13:08 PM
to Confluent Platform
When starting kafka-connect we are getting:

Exception in thread "main" org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

This seems to cause kafka-connect to hang in a zombie state.

I can't tell which setting, if any, will adjust this timeout, nor what the current value is.

We are running two brokers on EC2 D2 instances, which are at 2% CPU. The configuration is pulling six topics, but I need to go into the hundreds.

Barry Kaplan

Jun 13, 2016, 1:19:37 PM
to Confluent Platform
Also, in this state the REST API does not respond, so I can't even delete the configuration.

Ewen Cheslack-Postava

Jun 14, 2016, 1:16:44 AM
to Confluent Platform
Can you include more of the log? It's not clear from just that error message which topic is the problem. It could be one of the internal cluster topics (config, offsets, status, etc.), which could block startup of the worker and would explain why the REST API does not respond.

If the topic does exist, this implies there's a connectivity issue with the cluster.
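To narrow down which internal topics a worker depends on, one option is to read them out of the worker properties file. The sketch below is only an illustration: the key names are the standard distributed-worker settings (`config.storage.topic`, `offset.storage.topic`, `status.storage.topic`), but the sample file contents, topic names, and helper functions are hypothetical.

```python
# Keys a distributed Kafka Connect worker uses for its internal topics.
INTERNAL_TOPIC_KEYS = (
    "config.storage.topic",
    "offset.storage.topic",
    "status.storage.topic",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def internal_topics(props):
    """Return the internal topic settings; missing keys map to None."""
    return {key: props.get(key) for key in INTERNAL_TOPIC_KEYS}


# Hypothetical worker config, for illustration only.
SAMPLE = """\
# connect-distributed.properties (example values)
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    print("brokers:", props.get("bootstrap.servers"))
    for key, topic in internal_topics(props).items():
        print(key, "->", topic)
```

Once the topic names are known, they can be checked against the cluster with whatever topic-listing tool is available.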

-Ewen

Barry Kaplan

Jun 14, 2016, 2:42:54 AM
to Confluent Platform
That was all of the log. Well, actually it didn't even make it into the log; it went to stdout. I got hold of it from the Mesos UI.

But the real problem is that when this error occurs, the REST API is dead and I cannot even reconfigure.

This happened late last night (for me). I will do some experiments and see what I can learn. I'm guessing that I will have to destroy the connect config topic to get back to a working state, which of course would be really bad if this were production.

Barry Kaplan

Jun 14, 2016, 2:45:58 AM
to Confluent Platform
I see this occurs at Fetcher.java:262 in getTopicMetadata(). That method knows the topics in the MetadataRequest argument, but the exception does not include any of that information.

Barry Kaplan

Jun 14, 2016, 6:32:47 AM
to Confluent Platform
The ultimate problem was that the brokers were hung. We lost our ZooKeeper nodes and Kafka never recovered. After restarting the brokers, there were no more errors fetching topic metadata. I was put off by the error message, which led me to believe that the broker was contacted but we just did not get the result. But really the broker was never contacted.
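
For anyone hitting the same error, a quick first step is to check whether the broker addresses in `bootstrap.servers` are reachable at all before digging into timeouts. Below is a minimal sketch using only the Python standard library; the helper names and the example host names are made up for illustration, and IPv6 literals are not handled. Note that it only tests TCP reachability: a hung broker may still accept connections, so a successful probe does not prove the broker is healthy.

```python
import socket


def parse_bootstrap_servers(value, default_port=9092):
    """Split a bootstrap.servers string into (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if sep:
            endpoints.append((host, int(port)))
        else:
            # No port given; fall back to the assumed default.
            endpoints.append((item, default_port))
    return endpoints


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Hypothetical broker list; replace with the worker's bootstrap.servers.
    servers = "broker1:9092,broker2:9092"
    for host, port in parse_bootstrap_servers(servers):
        state = "reachable" if can_connect(host, port) else "UNREACHABLE"
        print(f"{host}:{port} {state}")
```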