Kafka ingestion sessionId=INVALID

Raven Tan

Jul 29, 2021, 10:29:11 PM
to Druid User

I have set up a Kafka ingestion supervisor. It works fine at first. However, usually after a day or two, it starts hitting the following error in the log at random. The ingestion task is still shown as successful, even though no data is actually ingested.

...
2021-07-29T23:50:29,887 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-kafka-supervisor-feommldm-1, groupId=kafka-supervisor-feommldm] Subscribed to partition(s): final-occupancy-0
2021-07-29T23:50:29,892 INFO [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Seeking partition[0] to[253221925924319233].
2021-07-29T23:50:29,892 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-kafka-supervisor-feommldm-1, groupId=kafka-supervisor-feommldm] Seeking to offset 253221925924319233 for partition final-occupancy-0
2021-07-29T23:50:30,977 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@59b98ad1{/,null,AVAILABLE}
2021-07-29T23:50:30,998 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@7a3a49e5{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
2021-07-29T23:50:30,998 INFO [main] org.eclipse.jetty.server.Server - Started @9187ms
2021-07-29T23:50:30,999 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2021-07-29T23:50:31,030 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2021-07-29T23:50:31,049 INFO [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - [Consumer clientId=consumer-kafka-supervisor-feommldm-1, groupId=kafka-supervisor-feommldm] Cluster ID: pulsar-cluster-1
2021-07-29T23:51:01,112 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-kafka-supervisor-feommldm-1, groupId=kafka-supervisor-feommldm] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2042978450:
org.apache.kafka.common.errors.DisconnectException: null
2021-07-29T23:51:31,242 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-kafka-supervisor-feommldm-1, groupId=kafka-supervisor-feommldm] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2042978450:
org.apache.kafka.common.errors.DisconnectException: null
2021-07-29T23:52:01,426 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - [Consumer clientId=consumer-kafka-supervisor-feommldm-1, groupId=kafka-supervisor-feommldm] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2042978450:
org.apache.kafka.common.errors.DisconnectException: null
...

I understand that this may not be a Druid issue per se, but I'd like some pointers on how to find the root cause. Right now the log is not very informative. Specifically, I have no idea why sessionId=INVALID appears in the first place. How can I gather more information?

Many thanks.
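
One cheap way to get more out of the existing log is to measure how far apart the fetch errors are. Below is a minimal Python sketch that does this for the lines quoted above (shortened with "..."); the function name and regular expression are illustrative, not part of Druid or Kafka. On this excerpt the gaps come out at roughly 30 seconds, which is at least consistent with the Kafka consumer's request.timeout.ms (30000 ms by default in recent client versions) and may suggest the fetch request is timing out rather than being rejected. That is a hint to follow up on, not a diagnosis.

```python
import re
from datetime import datetime

# Lines copied from the task log in this thread, shortened with "...".
LOG = """\
2021-07-29T23:50:31,049 INFO [task-runner-0-priority-0] org.apache.kafka.clients.Metadata - ... Cluster ID: pulsar-cluster-1
2021-07-29T23:51:01,112 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - ... Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2042978450:
org.apache.kafka.common.errors.DisconnectException: null
2021-07-29T23:51:31,242 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - ... Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2042978450:
org.apache.kafka.common.errors.DisconnectException: null
2021-07-29T23:52:01,426 INFO [task-runner-0-priority-0] org.apache.kafka.clients.FetchSessionHandler - ... Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2042978450:
org.apache.kafka.common.errors.DisconnectException: null
"""

# Timestamp layout used by the log lines above (comma before the milliseconds).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"

# Hypothetical pattern: leading timestamp, then a FetchSessionHandler error line.
FETCH_ERROR = re.compile(r"^(\S+) .*FetchSessionHandler - .*Error sending fetch request")


def fetch_error_gaps(log_text):
    """Return the seconds elapsed between consecutive fetch-error log lines."""
    times = []
    for line in log_text.splitlines():
        match = FETCH_ERROR.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), TS_FORMAT))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in fetch_error_gaps(LOG):
        print(f"{gap:.3f} s between fetch errors")
```

Running the same function over the full task log would show whether the interval stays constant, which helps tell a client-side timeout apart from a broker that is actively closing the connection.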

Peter Marshall

Aug 11, 2021, 4:14:09 AM
to Druid User
Could this be because there are no messages left to consume? Druid acts just like a normal consumer, and (not that I know ANYTHING technical about Kafka!!) most of the Google results for the INVALID epoch stuff are about timeouts and not being able to consume messages.

Maybe you need to increase the timeout on the tasks? There are also some Druid config options around how far back in time to consume messages from, etc. (e.g. lateMessageRejectionPeriod) that may be worth investigating.
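
For reference, here is a rough Python sketch of where such settings would sit in a Kafka supervisor's ioConfig. The helper name, the broker address and every value are placeholders, and which timeout is actually worth raising is a guess: request.timeout.ms is a standard Kafka consumer property passed through consumerProperties, while completionTimeout and lateMessageRejectionPeriod are Druid ioConfig fields. Only the topic name is taken from the log above.

```python
import json


def build_io_config(topic, bootstrap_servers,
                    late_rejection="P1D", request_timeout_ms=60000):
    """Build an illustrative Kafka supervisor ioConfig fragment (placeholder values)."""
    return {
        "type": "kafka",
        "topic": topic,
        "consumerProperties": {
            "bootstrap.servers": bootstrap_servers,
            # Kafka consumer setting; raised here purely as an experiment.
            "request.timeout.ms": request_timeout_ms,
        },
        # ISO 8601 durations; both values are placeholders.
        "taskDuration": "PT1H",
        "completionTimeout": "PT30M",
        # Messages with timestamps older than this period are rejected.
        "lateMessageRejectionPeriod": late_rejection,
    }


if __name__ == "__main__":
    # "kafka-broker:9092" is a made-up address; substitute the real one.
    print(json.dumps(build_io_config("final-occupancy", "kafka-broker:9092"), indent=2))
```

The printed fragment would replace the ioConfig section of the existing supervisor spec; the dataSchema and tuningConfig sections are left out here because nothing in the thread describes them.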
