Overlord logs
2022-03-11T06:35:14,890 INFO [KafkaSupervisor-events_datasource] org.apache.druid.indexing.overlord.MetadataTaskStorage - Inserting task index_kafka_events_datasource with status: TaskStatus{id=index_kafka_events_datasource, status=RUNNING, duration=-1, errorMsg=null}
2022-03-11T06:35:14,901 INFO [KafkaSupervisor-events_datasource] org.apache.druid.indexing.overlord.TaskLockbox - Adding task[index_kafka_events_datasource] to activeTasks
2022-03-11T06:35:14,902 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_kafka_events_datasource
2022-03-11T06:35:14,902 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_kafka_events_datasource
2022-03-11T06:35:14,902 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[
10.244.13.154:8091] to add task[index_kafka_events_datasource]
2022-03-11T06:35:14,906 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task index_kafka_events_datasource switched from pending to running (on [
10.244.13.154:8091])
2022-03-11T06:35:14,920 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[
10.244.13.154:8091] wrote RUNNING status for task [index_kafka_events_datasource] on [TaskLocation{host='null', port=-1, tlsPort=-1}]
2022-03-11T06:35:14,923 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[
10.244.13.154:8091] wrote RUNNING status for task [index_kafka_events_datasource] on [TaskLocation{host='10.244.13.154', port=8100, tlsPort=-1}]
2022-03-11T06:35:31,118 INFO [KafkaSupervisor-events_datasource] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='events_datasource', generationTime=2022-03-11T06:35:31.118Z, payload=KafkaSupervisorReportPayload{dataSource='events_datasource', topic='events_datasource_topic', partitions=1, replicas=1, durationSeconds=3600, active=[{id='index_kafka_events_datasource', startTime=2022-03-11T06:35:23.416Z, remainingSeconds=3592}], publishing=[], suspended=false, healthy=false, state=UNHEALTHY_TASKS, detailedState=UNHEALTHY_TASKS, recentErrors=[]}}
2022-03-11T06:35:31,160 INFO [KafkaSupervisor-events_datasource] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='events_datasource', generationTime=2022-03-11T06:35:31.160Z, payload=KafkaSupervisorReportPayload{dataSource='events_datasource', topic='events_datasource_topic', partitions=1, replicas=1, durationSeconds=3600, active=[{id='index_kafka_events_datasource', startTime=2022-03-11T06:35:23.416Z, remainingSeconds=3592}], publishing=[], suspended=false, healthy=false, state=UNHEALTHY_TASKS, detailedState=UNHEALTHY_TASKS, recentErrors=[]}}
ERROR in OVERLORD Removing the task form activeTasks
2022-03-11T06:36:00,094 INFO [TaskQueue-StorageSync] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_events_datasource] from activeTasks
2022-03-11T06:36:00,094 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_skills]]]
2022-03-11T06:36:00,095 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker:
10.244.13.154:8091, status 200 OK, response: {"task":"index_kafka_events_datasource"}
2022-03-11T06:36:00,146 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_events_datasource] because: [task is not in knownTaskIds[[index_kafka_skills]]]
2022-03-11T06:36:00,147 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker:
10.244.13.154:8091, status 200 OK, response: {"task":"index_kafka_events_datasource"}
2022-03-11T06:36:00,494 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[
10.244.13.154:8091] wrote FAILED status for task [index_kafka_events_datasource] on [TaskLocation{host='10.214.13.134', port=8100, tlsPort=-1}]
2022-03-11T06:36:00,494 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[
10.244.13.154:8091] completed task[index_kafka_events_datasource] with status[FAILED]
2022-03-11T06:36:00,494 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Received FAILED status for task:
2022-03-11T06:36:00,494 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_kafka_events_datasource] because: [notified status change from task]
2022-03-11T06:36:00,494 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_events_datasource] on worker[
10.244.13.154:8091]
2022-03-11T06:36:00,496 WARN [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Unknown task completed: index_kafka_events_datasource
2022-03-11T06:36:00,496 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task FAILED: AbstractTask{id='index_kafka_events_datasource', groupId='index_kafka_events_datasource', taskResource=TaskResource{availabilityGroup='index_kafka_events_datasource_30c2b50db08af1a', requiredCapacity=1}, dataSource='events_datasource', context={forceTimeChunkLock=true, checkpoints={"0":{"0":126063}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}} (45572 run duration)
2022-03-11T06:36:00,496 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_events_datasource] went bye bye.
MiddleManager Logs
corresponding ERROR in middleManager
2022-03-11T06:36:00,095 INFO [qtp1365833373-100] org.apache.druid.indexing.overlord.ForkingTaskRunner - Shutdown [index_kafka_events_datasource] because: [shut down request via HTTP endpoint]
2022-03-11T06:36:00,095 INFO [qtp1365833373-100] org.apache.druid.indexing.overlord.ForkingTaskRunner - Closing output stream to task[index_kafka_events_datasource].
2022-03-11T06:36:00,147 INFO [qtp1365833373-124] org.apache.druid.indexing.overlord.ForkingTaskRunner - Shutdown [index_kafka_events_datasource] because: [shut down request via HTTP endpoint]
2022-03-11T06:36:00,147 INFO [qtp1365833373-124] org.apache.druid.indexing.overlord.ForkingTaskRunner - Closing output stream to task[index_kafka_events_datasource].
2022-03-11T06:36:00,231 INFO [forking-task-runner-2-[index_kafka_events_datasource]] org.apache.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[2] for task: index_kafka_events_datasource
2022-03-11T06:36:00,232 INFO [forking-task-runner-2] com.cloud.bots.druid.storage.OciTaskLogs - Pushing task log var/druid/task/index_kafka_events_datasource/log to druid/indexer-logs/index_kafka_events_datasource/log
2022-03-11T06:36:00,365 INFO [forking-task-runner-2] com.cloud.bots.druid.storage.OciTaskLogs - Pushing task reports var/druid/task/index_kafka_events_datasource/990d3707-37af-4f30-b98f-bcfd9ffaf12b/report.json to druid/indexer-logs/index_kafka_events_datasource/report.json
2022-03-11T06:36:00,485 INFO [forking-task-runner-2] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_kafka_events_datasource
2022-03-11T06:36:00,493 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [index_kafka_events_datasource] completed with status [FAILED].
Corresponding issue in kafka index task logs
2022-03-11T06:36:00,095 INFO [parent-monitor-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Triggering JVM shutdown.
2022-03-11T06:36:00,096 INFO [Thread-69] org.apache.druid.cli.CliPeon - Running shutdown hook
2022-03-11T06:36:00,098 INFO [Thread-69] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
Not sure about the why the task keep getting again and again if supervisor is suspended/terminated, but the overlord is removing the task from the active task list even when task is running, due to which task get terminated.
Waiting for valuable suggestions.
Thanks
Kuldeep Gaur