KafkaSupervisor task failed with completion timeout elapsed

Sohel Sarder

Apr 25, 2018, 10:23:53 AM
to Druid User

Hi All,

I am using:

1. Druid version: druid-0.12.0
2. Kafka version: kafka_2.11-1.1.0
3. The druid-kafka-indexing-service extension to ingest a Kafka stream into Druid.

In my Overlord log I see the following error message, and the task completes with a failed status.

2018-04-25 11:47:28,742 ERROR i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-test3] No task in [[index_kafka_test3_970f510e2628df6_phpjaoje]] succeeded before the completion timeout elapsed [PT1800S]!: {class=io.druid.indexing.kafka.supervisor.KafkaSupervisor}

A few of the tasks complete successfully, but the majority fail.

I have increased "taskDuration" in my supervisor spec and increased "druid.worker.capacity" in middle manager properties. 

Can you please let me know what other configuration tuning I can do?

Thanks in advance for your help.

Best Wishes,
Sohel

张鑫

Apr 25, 2018, 10:29:18 AM
to druid...@googlegroups.com
Same problem. Have you solved it?

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/3da3b1ba-7efb-455f-bfba-ce59a3ab0547%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jonathan Wei

Apr 27, 2018, 6:45:49 PM
to druid...@googlegroups.com
How much data in bytes is being published by the task? Do the logs for the failed tasks show them trying to publish segments?

If, given the data size, it's not unreasonable for the segment publish to take 30 minutes or more, you can try increasing `completionTimeout` in the ioConfig of the Kafka supervisor spec from its default of PT1800S (30 minutes).
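
As a rough sketch of that change, the Python snippet below builds an ioConfig fragment with a longer `completionTimeout` and prints it as JSON. The topic name is taken from the supervisor id in the error log; the broker address, `taskDuration`, and the PT3600S value are placeholder assumptions, not recommendations.

```python
import json

# Hypothetical ioConfig fragment for a Kafka supervisor spec.
# Durations are ISO 8601 periods; PT1800S (30 minutes) is the default
# completionTimeout reported in the error message.
io_config = {
    "topic": "test3",  # topic name inferred from the supervisor id in the log
    "consumerProperties": {"bootstrap.servers": "localhost:9092"},  # assumed broker address
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT10M",         # assumed value
    "completionTimeout": "PT3600S",  # assumed value: double the default
}

# Print the fragment in the shape it would take inside the supervisor spec.
print(json.dumps({"ioConfig": io_config}, indent=2))
```

Raising the timeout only helps if the publish is genuinely slow; it would not address a publish that never finishes.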

If that time seems unreasonable for the amount of data you're working with, it may be a good idea to investigate why the publish is taking so long.

Thanks,
Jon

Sohel Sarder

May 10, 2018, 3:56:44 AM
to Druid User
Hi Jon,

Thanks for your reply. My data is very small, only a few kilobytes. The Overlord log, which is written every 30 seconds, shows that the failed task was still publishing when the timeout elapsed.

2018-05-09 13:18:28,742 INFO i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-test3] {id='test3', generationTime=2018-05-09T13:18:28.742Z, payload={dataSource='test3', topic='test3', partitions=1, replicas=1, durationSeconds=600, active=[{id='index_kafka_test3_04c8dd52c83a25b_hbicpkfi', startTime=null, remainingSeconds=null}], publishing=[{id='index_kafka_test3_04c8dd52c83a25b_aefkalhc', startTime=2018-05-09T12:39:30.055Z, remainingSeconds=62}]}}

2018-05-09 13:18:58,741 INFO i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-test3] {id='test3', generationTime=2018-05-09T13:18:58.741Z, payload={dataSource='test3', topic='test3', partitions=1, replicas=1, durationSeconds=600, active=[{id='index_kafka_test3_04c8dd52c83a25b_hbicpkfi', startTime=null, remainingSeconds=null}], publishing=[{id='index_kafka_test3_04c8dd52c83a25b_aefkalhc', startTime=2018-05-09T12:39:30.055Z, remainingSeconds=32}]}}

2018-05-09 13:19:28,741 INFO i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-test3] {id='test3', generationTime=2018-05-09T13:19:28.741Z, payload={dataSource='test3', topic='test3', partitions=1, replicas=1, durationSeconds=600, active=[{id='index_kafka_test3_04c8dd52c83a25b_hbicpkfi', startTime=null, remainingSeconds=null}], publishing=[{id='index_kafka_test3_04c8dd52c83a25b_aefkalhc', startTime=2018-05-09T12:39:30.055Z, remainingSeconds=2}]}}

2018-05-09 13:19:58,739 ERROR i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-test3] No task in [[index_kafka_test3_04c8dd52c83a25b_aefkalhc]] succeeded before the completion timeout elapsed [PT1800S]!: {class=io.druid.indexing.kafka.supervisor.KafkaSupervisor}
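
The `remainingSeconds` values in these log lines are consistent with a deadline of startTime + taskDuration + completionTimeout. The Python sketch below checks that arithmetic against the timestamps above. The formula is inferred from the logged numbers and is an assumption, not taken from the Druid source; `remaining_seconds` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"  # timestamp format used in the supervisor log


def remaining_seconds(start_time, generation_time,
                      task_duration_s=600, completion_timeout_s=1800):
    """Seconds left before the assumed publish deadline.

    Assumption: deadline = startTime + taskDuration + completionTimeout.
    """
    start = datetime.strptime(start_time, FMT)
    now = datetime.strptime(generation_time, FMT)
    deadline = start + timedelta(seconds=task_duration_s + completion_timeout_s)
    return (deadline - now).total_seconds()


start = "2018-05-09T12:39:30.055Z"  # startTime of the publishing task
samples = [
    ("2018-05-09T13:18:28.742Z", 62),  # (generationTime, logged remainingSeconds)
    ("2018-05-09T13:18:58.741Z", 32),
    ("2018-05-09T13:19:28.741Z", 2),
]

for generated, logged in samples:
    computed = remaining_seconds(start, generated)
    print(f"{generated}: computed {computed:.3f}s, logged {logged}s")
```

With these inputs the computed values fall within one second of the logged ones, and the ERROR line appears on the first supervisor run after the computed deadline.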

Regards,
Sohel

Steven Dang

Sep 20, 2018, 8:35:06 PM
to Druid User
I ran into the same issue. Is there any solution for this yet?