Help with Kafka index tasks in failed state


Luis Gomez

unread,
Jan 14, 2020, 5:10:23 AM1/14/20
to druid...@googlegroups.com
Hello,

I have an issue with Druid. I have a stream on Kafka and I subscribe to the topic through a supervisor.
The tasks are created and sometimes fail. Even though they fail, the segments appear on the Historical node and queries are resolved, so the segments are being stored correctly.
I'm using Druid 0.16.1-incubating version and the last lines of the peon log, peon status and peon report are attached.
Could you help me get the tasks to end up in SUCCESS status instead of FAILED and understand what happens to fix it?

Thanks!
peon_log.txt
report.json
status_peon.json

Vaibhav Vaibhav

unread,
Jan 14, 2020, 7:19:43 AM1/14/20
to druid...@googlegroups.com
Hi Luis Gomez,

Please check the MiddleManager and Overlord logs for the failed task; they should give you more details.

Thanks and Regards,
Vaibhav

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/CAMso-UpWNbLL2%3D6jdhRmny%2BDzVaKbLNyX%2BADWQ32UsfTooaq8Q%40mail.gmail.com.

Luis Gomez

unread,
Jan 14, 2020, 9:09:16 AM1/14/20
to Druid User
Hello,

Attached are the logs of MiddleManager, Overlord and the peon task.
Could you help me see what's going on and why the indexing task is failing?

Thank you!

overlod.txt
peon_task_log.txt
middle_manager.txt

Vaibhav Vaibhav

unread,
Jan 14, 2020, 10:53:33 AM1/14/20
to druid...@googlegroups.com
Hi Luis ,

For the attached peon task, the MiddleManager and Overlord logs do not have any details; they seem incomplete. The peon task completed at 2020-01-14T13:47:24,929, but the MiddleManager and Overlord logs only have logging up to 2020-01-14 12:47.

However, I looked into one of the older Kafka indexing tasks for supervisor [KafkaSupervisor-rt-idbox], and I see the error below in the Overlord log:

Task-Id: index_kafka_rt-idbox_4ed329169eec831_hcbhclbm

2020-01-14 12:46:39.004,"2020-01-14T12:46:39,004 INFO [KafkaSupervisor-rt-idbox] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='rt-idbox', generationTime=2020-01-14T12:46:39.004Z, payload=KafkaSupervisorReportPayload{dataSource='rt-idbox', topic='rt-idbox', partitions=1, replicas=2, durationSeconds=1800, active=[{id='index_kafka_rt-idbox_1257830c5cc7315_dclelgnc', startTime=2020-01-14T12:17:05.024Z, remainingSeconds=26}, {id='index_kafka_rt-idbox_1257830c5cc7315_dknffpaf', startTime=2020-01-14T12:17:07.505Z, remainingSeconds=28}], publishing=[{id='index_kafka_rt-idbox_4ed329169eec831_hcbhclbm', startTime=2020-01-14T11:46:56.488Z, remainingSeconds=18}, {id='index_kafka_rt-idbox_4ed329169eec831_adfklccf', startTime=2020-01-14T11:46:56.066Z, remainingSeconds=18}], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[]}}"

2020-01-14 12:47:06.116,"2020-01-14T12:47:06,116 ERROR [KafkaSupervisor-rt-idbox] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - No task in [[index_kafka_rt-idbox_4ed329169eec831_hcbhclbm, index_kafka_rt-idbox_4ed329169eec831_adfklccf]] for taskGroup [0] succeeded before the completion timeout elapsed [PT1800S]!: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor}"

2020-01-14 12:47:06.116,"2020-01-14T12:47:06,116 INFO [KafkaSupervisor-rt-idbox] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_rt-idbox_4ed329169eec831_hcbhclbm] because: [No task in pending completion taskGroup[0] succeeded before completion timeout elapsed]"


Kafka indexing tasks are supposed to finish within the completion timeout. If they don't, the Kafka supervisor assumes there is a problem and issues a kill/shutdown signal to the tasks, which is what seems to have happened here.

A running task will normally be in one of two states: reading or publishing. A task will remain in reading state for taskDuration, at which point it will transition to publishing state. A task will remain in publishing state for as long as it takes to generate segments, push segments to deep storage, and have them be loaded and served by a Historical process (or until completionTimeout elapses).
completionTimeout is the length of time to wait before declaring a publishing task as failed and terminating it. If it is set too low, your tasks may never publish. The publishing clock for a task begins roughly after taskDuration elapses.

For now, please increase the completion timeout to 60 minutes (i.e. completionTimeout: PT60M) and see if that helps.
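For reference, completionTimeout lives in the supervisor spec's tuningConfig. A minimal sketch of the relevant fragment (only the completionTimeout line is the point here; the rest is illustrative and your spec will have more fields):

```json
{
  "type": "kafka",
  "tuningConfig": {
    "type": "kafka",
    "completionTimeout": "PT60M"
  }
}
```

Resubmitting the full spec to the Overlord (POST /druid/indexer/v1/supervisor) applies the new value to subsequently created tasks.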

Additionally, I suggest going through the links below to fine-tune your Kafka ingestion:

https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html
https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#capacity-planning

Thanks and Regards,
Vaibhav




Laxmikant Pandhare

unread,
Sep 22, 2023, 6:18:10 PM9/22/23
to Druid User
Even after I set the completion timeout to 60 minutes, my job is failing with the same error:

No task in the corresponding pending completion taskGroup[0] succeeded before completion timeout ela...

Sergio Ferragut

unread,
Sep 26, 2023, 11:42:04 AM9/26/23
to druid...@googlegroups.com
A publishing task completes when a Historical picks up the published segment(s) and announces them.
Your publishing task is not seeing that announcement. A few things to check:
- Is the segment being published? Check deep storage to find it.
- Is the Coordinator assigning it? Check the Coordinator log for errors.
- Is the Historical picking it up? Check the Coordinator/Historical logs.
- Is it being announced? Check the Historical logs, and ZooKeeper if you are using ZooKeeper.
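As a small aid for the first symptom in this thread, the supervisor status payload (the same structure as the KafkaSupervisorReportPayload quoted in the Overlord log above, fetched via GET /druid/indexer/v1/supervisor/<id>/status on the Overlord) can be inspected to spot publishing tasks that are about to hit completionTimeout. A minimal sketch, assuming you have already fetched the payload as JSON:

```python
# Sketch: flag publishing tasks whose remaining time before the
# completion timeout is below a warning threshold.

def tasks_near_timeout(payload, warn_below_seconds=60):
    """Return ids of publishing tasks with remainingSeconds under the threshold."""
    return [
        task["id"]
        for task in payload.get("publishing", [])
        if task.get("remainingSeconds", 0) < warn_below_seconds
    ]

# Example payload, shaped like the supervisor log entry quoted earlier
# in this thread (values copied from that log line).
payload = {
    "dataSource": "rt-idbox",
    "publishing": [
        {"id": "index_kafka_rt-idbox_4ed329169eec831_hcbhclbm", "remainingSeconds": 18},
        {"id": "index_kafka_rt-idbox_4ed329169eec831_adfklccf", "remainingSeconds": 18},
    ],
}

print(tasks_near_timeout(payload))
```

If this regularly reports tasks shortly before they are killed, that is a sign publishing (or Historical handoff) is consistently slower than completionTimeout allows.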

