having trouble with stop/shutdown kafka indexing supervisor jobs


Scott N

Oct 29, 2016, 9:45:02 AM
to Druid User
Hi, I'm fairly new to Druid and am primarily using the Kafka indexing service. I'm having trouble shutting down running Kafka indexing data sources.

curl -XGET http://IP:PORT/druid/indexer/v1/supervisor
returns an empty list, as it should, since I've run this:

curl -XPOST http://IP:PORT/druid/indexer/v1/supervisor/DATASOURCE/shutdown

on each dataSource in the kafka indexing specs.

However, for days now I still see activity in the logs (example below). Oddly, some data sources are still indexing incoming data while others are not (viewing the data in Pivot).

I've restarted all nodes and no change.

I've also tried killing running tasks I see in the coordinator console using 
curl -X POST -H 'Content-Type: application/json' -d @kill-task.json http://HOST:PORT/druid/indexer/v1/task

and the response is
{"error":"Task[index_kafka_DATASOURCE_54d9e987be475d5_kdejoddf] already exists!"}
which makes no sense to me


What's the trick to managing what the cluster is doing? Have I overloaded it and it's just really slowly catching up?

Thanks for any help!



task that is actively indexing data, even though it does not show up in the GET /supervisor call:
2016-10-29T13:23:30,550 INFO [qtp26418585-82] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_kafka_DATASOURCE2_54d9e987be475d5_kdejoddf]: SegmentAllocateAction{dataSource='DATASOURCE2', timestamp=2016-10-29T13:23:30.501Z, queryGranularity=NoneGranularity, preferredSegmentGranularity=MINUTE, sequenceName='index_kafka_DATASOURCE2_54d9e987be475d5_1', previousSegmentId='DATASOURCE2_2016-10-29T13:21:00.000Z_2016-10-29T13:22:00.000Z_2016-10-29T13:21:30.194Z_3'}


task that is NOT actively indexing data:
2016-10-29T13:11:33,468 INFO [qtp1044965465-43] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-10-29T13:11:33.468Z","service":"druid/historical","host":"DRUIDHOST:9083","metric":"query/bytes","value":870,"context":"{\"finalize\":false,\"queryId\":\"ceabeae2-922a-4d07-9233-edb7c6f1da56\",\"timeout\":40000}","dataSource":"DATASOURCE","duration":"PT180120S","hasFilters":"false","id":"ceabeae2-922a-4d07-9233-edb7c6f1da56","interval":["2016-10-13T00:49:00.000Z/2016-10-15T01:15:00.000Z","2016-10-15T01:52:00.000Z/2016-10-15T02:45:00.000Z","2016-10-15T09:42:00.000Z/2016-10-15T09:44:00.000Z","2016-10-15T14:08:00.000Z/2016-10-15T14:49:00.000Z"],"remoteAddress":"xxxx","type":"segmentMetadata","version":"0.9.1.1"}]

2016-10-29T13:11:33,407 INFO [HttpClient-Netty-Worker-3] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-10-29T13:11:33.407Z","service":"druid/broker","host":"DRUIDHOST:9082","metric":"query/node/ttfb","value":8695,"dataSource":"DATASOURCE","duration":"PT180120S","hasFilters":"false","id":"ceabeae2-922a-4d07-9233-edb7c6f1da56","interval":["2016-10-13T00:49:00.000Z/2016-10-15T01:15:00.000Z","2016-10-15T01:52:00.000Z/2016-10-15T02:45:00.000Z","2016-10-15T09:42:00.000Z/2016-10-15T09:44:00.000Z","2016-10-15T14:08:00.000Z/2016-10-15T14:49:00.000Z"],"server":"DRUIDHOST:9083","type":"segmentMetadata","version":"0.9.1.1"}]

David Lim

Oct 31, 2016, 9:15:15 PM
to Druid User
Hey Scott,

When you shut down the supervisor, it attempts to stop the indexing tasks it was managing, but if the tasks don't respond before the timeout, the supervisor will just exit and the tasks may remain. There have been some improvements in 0.9.2 that should make the KafkaIndexTask more responsive to commands from the supervisor even when it's in a bad state.

Regarding killing tasks, I believe you're using the wrong command. What you're submitting is a KillTask, which is used to remove a segment's metadata and delete the segment from deep storage, and you're giving your kill task the ID of an existing task, which is why it gets rejected. The command you're actually looking for is:

POST http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/shutdown

as described here: http://druid.io/docs/0.9.1.1/design/indexing-service.html
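Putting that together with the task id from your error message, it would look something like this (the overlord address is a placeholder; 8090 is only the default overlord port, so substitute your own):

```shell
# Placeholders: substitute your overlord's address and the actual task id.
OVERLORD="http://OVERLORD_IP:8090"
TASK_ID="index_kafka_DATASOURCE_54d9e987be475d5_kdejoddf"

# Ask the overlord to shut the task down gracefully. This is a task shutdown,
# not a KillTask -- it does not touch segments or deep storage.
curl -X POST "$OVERLORD/druid/indexer/v1/task/$TASK_ID/shutdown"
```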

If you post the full task log of one of the tasks that isn't exiting when the supervisor is shut down, I can check to see if there are any clues in there as to why it's not shutting down.

Suhas

Jun 4, 2018, 5:13:27 AM
to Druid User
Hey David,

I'm using Kafka 1.1.0 and the problem still persists. When I submit a supervisor spec to the server, it creates a supervisor and then creates an indexing task for it. Shutting down the supervisor doesn't kill the indexing task immediately; I need to kill them manually.

David Lim

Jun 4, 2018, 1:43:32 PM
to Druid User
Hey Suhas,

Stopping the supervisor does not kill the indexing task immediately. Under normal circumstances, when the supervisor is stopped, it will attempt to signal the indexing tasks to stop reading and begin publishing. You can see if this is happening by searching for 'Pausing ingestion until resumed' in the task logs and reading the log messages following that. This publishing process can take minutes or even a few hours depending on the quantity/complexity of data read.

If the supervisor is not able to signal to the indexing tasks to stop reading, then the supervisor will exit without touching the indexing tasks (i.e. it will not force kill them). If this is what's happening, check your overlord logs to see why it was unable to send a request to the task.
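To look for that pause/publish sequence from the shell, something like the following works (host and task id are placeholders; this assumes the overlord's GET /druid/indexer/v1/task/{taskId}/log endpoint for fetching task logs):

```shell
OVERLORD="http://OVERLORD_IP:8090"
TASK_ID="index_kafka_DATASOURCE_54d9e987be475d5_kdejoddf"

# Pull the task log from the overlord and show each pause message plus a few
# lines of context after it, to see whether publishing actually started.
curl -s "$OVERLORD/druid/indexer/v1/task/$TASK_ID/log" \
  | grep -A 5 "Pausing ingestion until resumed"
```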

Suhas

Jun 27, 2018, 7:46:57 AM
to Druid User
Hey David,

This is happening very sporadically, and I'm unable to reproduce the issue. Under normal circumstances, shutting down the supervisor finishes the remaining task and marks it as "Success." But on some days, shutting down the supervisor doesn't finish the task at all and it runs forever. The overlord log shows "index_kafka_id" paused successfully, and it hangs there forever.

Eddie C

Jun 27, 2018, 3:22:33 PM
to Druid User
I notice the same thing. We kill the supervisor tasks and the subtasks do not die. Very frustrating, because the ids of the subtasks are like index_kafka_thing_f94ddb4adea3e44_dlepgcee and it takes extra scripting to shut down the subtasks.
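A rough sketch of that extra scripting: list running tasks from the overlord, filter the Kafka indexing subtasks by their id prefix, and POST a shutdown to each. OVERLORD and the datasource name are placeholders, and it parses the /runningTasks JSON with grep/cut rather than jq just to avoid a dependency:

```shell
OVERLORD="http://OVERLORD_IP:8090"
DATASOURCE="thing"   # the supervisor's datasource name

# List running tasks, keep only the Kafka indexing subtasks for this
# datasource (ids look like index_kafka_<datasource>_<hash>_<suffix>),
# and POST a graceful shutdown to each one.
for TASK_ID in $(curl -s "$OVERLORD/druid/indexer/v1/runningTasks" \
    | grep -o "\"id\":\"index_kafka_${DATASOURCE}_[^\"]*\"" \
    | cut -d'"' -f4); do
  echo "shutting down $TASK_ID"
  curl -X POST "$OVERLORD/druid/indexer/v1/task/$TASK_ID/shutdown"
done
```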