Kafka Indexing Service


Robin

Jun 1, 2016, 4:56:59 AM
to Druid User
Hi,

I'm reading up on the new experimental Kafka Indexing Service.

As I understand it, I need to submit a supervisor spec for each of the data sources. The spec is then persisted to the metadata storage and will survive restarts of the overlord, etc.?

Also, does this indexing service mean that I can remove the specFile configuration from the realtime node?


regards,
Robin

Fangjin Yang

Jun 1, 2016, 11:04:55 AM
to Druid User
As you'll note from 0.9.0, we've changed our entire getting started process to use the indexing service and removed the focus from realtime nodes. If you choose to use the indexing service, you should no longer need realtime nodes.

David Lim

Jun 1, 2016, 11:57:41 AM
to Druid User
Hi Robin,

Yeah, that is correct - each data source is managed by one supervisor, so you'll need to submit a spec for each data source. The spec is persisted to metadata storage and will automatically be loaded when an overlord restarts or takes leadership in an HA configuration.
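
For reference, a supervisor spec is typically submitted to the overlord over HTTP; here's a hedged sketch (the host, port, and file name are placeholders, not from this thread):

# submit the spec; the overlord persists it to metadata storage and starts managing tasks
curl -X POST -H 'Content-Type: application/json' \
  -d @supervisor-spec.json \
  http://OVERLORD_IP:8090/druid/indexer/v1/supervisor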

Robin

Jun 7, 2016, 8:34:35 AM
to Druid User
Hi,

I started on this today and have run into a few problems.

First, the KafkaIndexTask didn't work out that well.

The 'still waiting for handoff' message appeared once every minute for about 30 minutes before the task ended like this:
2016-06-07T12:02:25,928 INFO [coordinator_handoff_scheduled_0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://127.0.0.1:8081
2016-06-07T12:02:25,972 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2016-06-07T00:00:00.000Z/2016-06-08T00:00:00.000Z, version='2016-06-07T10:35:25.486Z', partitionNumber=0}]]
2016-06-07T12:02:42,387 INFO [qtp1261810493-31] io.druid.indexing.kafka.KafkaIndexTask - Stopping gracefully.
2016-06-07T12:02:42,387 INFO [qtp1261810493-31] io.druid.indexing.kafka.KafkaIndexTask - Interrupting run thread (status: [PUBLISHING])
2016-06-07T12:02:42,512 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down...
2016-06-07T12:02:42,637 INFO [appenderator_persist_0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[datasource_2016-06-07T00:00:00.000Z_2016-06-08T00:00:00.000Z_2016-06-07T10:35:25.486Z] at path[/druid/segments/127.0.0.1:8100/127.0.0.1:8100_indexer-executor__default_tier_2016-06-07T10:35:26.238Z_66bf5db44ba045248a92b99d82071cba0]
2016-06-07T12:02:42,638 INFO [appenderator_persist_0] io.druid.curator.announcement.Announcer - unannouncing [/druid/segments/127.0.0.1:8100/127.0.0.1:8100_indexer-executor__default_tier_2016-06-07T10:35:26.238Z_66bf5db44ba045248a92b99d82071cba0]
2016-06-07T12:02:43,170 INFO [appenderator_persist_0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Removing sink for segment[datasource_2016-06-07T00:00:00.000Z_2016-06-08T00:00:00.000Z_2016-06-07T10:35:25.486Z].
2016-06-07T12:02:43,370 INFO [task-runner-0-priority-0] io.druid.indexing.kafka.KafkaIndexTask - The task was asked to stop before completing
2016-06-07T12:02:43,414 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_datasource_d173e68e1e4aa53_llplejna] status changed to [SUCCESS].
2016-06-07T12:02:43,671 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_kafka_datasource_d173e68e1e4aa53_llplejna",
  "status" : "SUCCESS",
  "duration" : 5418515
}

The second supervisor, working on another Kafka topic, didn't publish any events at all; it just logged the following a few times during its lifetime:
2016-06-07T11:00:24,799 INFO [task-runner-0-priority-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483647 dead.


regards,
Robin

drew dahlke

Jun 7, 2016, 10:06:23 AM
to Druid User
I'm having the same problem with the Kafka index task. The middle manager logs show it waiting for handoff and then marking the coordinator dead:

{"t": "2016-06-07T13:46:54.716+0000", "message": "Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2016-06-07T13:00:00.000Z/2016-06-07T14:00:00.000Z, version='2016-06-07T13:00:01.283Z', partitionNumber=1}, SegmentDescriptor{interval=2016-06-07T12:00:00.000Z/2016-06-07T13:00:00.000Z, version='2016-06-07T12:00:27.406Z', partitionNumber=1}, SegmentDescriptor{interval=2016-06-07T11:00:00.000Z/2016-06-07T12:00:00.000Z, version='2016-06-07T11:00:05.975Z', partitionNumber=2}]]", "level": "INFO", "logger_name": "io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier", "thread_name": "coordinator_handoff_scheduled_0"}
{"t": "2016-06-07T13:47:10.207+0000", "message": "Generating: http://localhost:4000", "level": "INFO", "logger_name": "com.metamx.http.client.pool.ChannelResourceFactory", "thread_name": "HttpPostEmitter-1-0"}
{"t": "2016-06-07T13:47:11.762+0000", "message": "Generating: http://localhost:4000", "level": "INFO", "logger_name": "com.metamx.http.client.pool.ChannelResourceFactory", "thread_name": "HttpPostEmitter-1-0"}
{"t": "2016-06-07T13:47:12.048+0000", "message": "Generating: http://localhost:4000", "level": "INFO", "logger_name": "com.metamx.http.client.pool.ChannelResourceFactory", "thread_name": "HttpPostEmitter-1-0"}
{"t": "2016-06-07T13:47:12.192+0000", "message": "Marking the coordinator 2147483645 dead.", "level": "INFO", "logger_name": "org.apache.kafka.clients.consumer.internals.AbstractCoordinator", "thread_name": "task-runner-0-priority-0"}


The coordinator node is logging very actively and it's not crashing (the supervisord log is clean):

{"t": "2016-06-07T13:53:52.827+0000", "message": "Found 0 availableSegments, skipping the cleanup of segments from historicals. This is done to prevent a race condition in which the coordinator would drop all segments if it started running cleanup before it finished polling the metadata storage for available segments for the first time.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorCleanupUnneeded", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:53:52.827+0000", "message": "No segments found.  Cannot balance.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorBalancer", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:53:52.828+0000", "message": "Load Queues:", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorLogger", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:53:52.828+0000", "message": "Server[172.18.11.43:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 0 bytes served.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorLogger", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:53:52.828+0000", "message": "Server[172.18.11.200:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 0 bytes served.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorLogger", "thread_name": "Coordinator-Exec--0"}

Which is weird because there are clearly a bunch of active segments in MySQL:

mysql> select count(*) from druid_segments where used = true;
+----------+
| count(*) |
+----------+
|     1306 |
+----------+

mysql> select count(*) from druid_pendingSegments ;
+----------+
| count(*) |
+----------+
|     1339 |
+----------+

It runs the coordination every minute

cat druid.coordinator.log | grep -v 4000 | grep Start
...
{"t": "2016-06-07T13:56:52.835+0000", "message": "Starting coordination. Getting available segments.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:57:52.837+0000", "message": "Starting coordination. Getting available segments.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:58:52.840+0000", "message": "Starting coordination. Getting available segments.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T13:59:52.843+0000", "message": "Starting coordination. Getting available segments.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T14:00:52.845+0000", "message": "Starting coordination. Getting available segments.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader", "thread_name": "Coordinator-Exec--0"}
{"t": "2016-06-07T14:01:52.848+0000", "message": "Starting coordination. Getting available segments.", "level": "INFO", "logger_name": "io.druid.server.coordinator.helper.DruidCoordinatorSegmentInfoLoader", "thread_name": "Coordinator-Exec--0"}

Any suggestions?

Drew

drew dahlke

Jun 7, 2016, 10:34:49 AM
to Druid User
This is interesting. The coordinator log shows a new segment added with size=0 (which might exclude it from being loaded)

{"t": "2016-06-07T14:21:43.216+0000", "message": "Adding segment[DataSegment{size=0, shardSpec=NumberedShardSpec{partitionNum=4, partitions=0}, metrics=[events, tAbandon, tAbandon.cnt, tIvr, tIvr.cnt, tAnswered, tAnswered.cnt, tAcd, tAcd.cnt, tTalk, tTalk.cnt, tTalkCompleted, tTalkCompleted.cnt, tHeld, tHeld.cnt, tHeldCompleted, tHeldCompleted.cnt, tAcw, tAcw.cnt, tHandle, tHandle.cnt, tVoicemail, tVoicemail.cnt, tUserResponseTime, tUserResponseTime.cnt, tAgentResponseTime, tAgentResponseTime.cnt, nOffered, nOverSla, nTransferred, nDialerAttempted, nDialerConnected, nDialerAbandoned, nError, mCreatedVoicemailSize, mCreatedVoicemailDuration, mDeletedVoicemailSize, mDeletedVoicemailDuration, oMailboxVoicemailSize, oMailboxVoicemailDuration, oMailboxVoicemailCount], dimensions=[], version='2016-06-07T14:00:10.365Z', loadSpec={}, interval=2016-06-07T14:00:00.000Z/2016-06-07T15:00:00.000Z, dataSource='imetrics', binaryVersion='null'}] for server[DruidServerMetadata{name='172.18.11.43:8104', host='172.18.11.43:8104', maxSize=0, tier='_default_tier', type='indexer-executor', priority='0'}]", "level": "DEBUG", "logger_name": "io.druid.client.CoordinatorServerView", "thread_name": "CoordinatorServerView-0"}

But if you check mysql for that same segment, size=59840

mysql> select * from druid_segments where datasource = "imetrics" and start = "2016-06-07T14:00:00.000Z" limit 1 \G;
*************************** 1. row ***************************
          id: imetrics_2016-06-07T14:00:00.000Z_2016-06-07T15:00:00.000Z_2016-06-07T14:00:10.365Z_1
  dataSource: imetrics
created_date: 2016-06-07T14:21:09.100Z
       start: 2016-06-07T14:00:00.000Z
         end: 2016-06-07T15:00:00.000Z
 partitioned: 1
     version: 2016-06-07T14:00:10.365Z
        used: 1
     payload: {"dataSource":"imetrics","interval":"2016-06-07T14:00:00.000Z/2016-06-07T15:00:00.000Z","version":"2016-06-07T14:00:10.365Z","loadSpec":{"type":"s3_zip","bucket":"inin-dca-useast1-analytics","key":"druid/analytics-druid-v3/v3/realtime/archive/imetrics/2016-06-07T14:00:00.000Z_2016-06-07T15:00:00.000Z/2016-06-07T14:00:10.365Z/1/index.zip"},"dimensions":"orgid,cid,chat,chatroom,media,interaction_type,session_id,dg,direction,ani,callid,user,station,team,edge,dnis,wrapup_code,address_from,address_to,email","metrics":"events,tAbandon,tAbandon.cnt,tIvr,tIvr.cnt,tAnswered,tAnswered.cnt,tAcd,tAcd.cnt,tTalk,tTalk.cnt,tTalkCompleted,tTalkCompleted.cnt,tHeld,tHeld.cnt,tHeldCompleted,tHeldCompleted.cnt,tAcw,tAcw.cnt,tHandle,tHandle.cnt,tVoicemail,tVoicemail.cnt,tUserResponseTime,tUserResponseTime.cnt,tAgentResponseTime,tAgentResponseTime.cnt,nOffered,nOverSla,nTransferred,nDialerAttempted,nDialerConnected,nDialerAbandoned,nError,mCreatedVoicemailSize,mCreatedVoicemailDuration,mDeletedVoicemailSize,mDeletedVoicemailDuration,oMailboxVoicemailSize,oMailboxVoicemailDuration,oMailboxVoicemailCount","shardSpec":{"type":"numbered","partitionNum":1,"partitions":0},"binaryVersion":9,"size":59840,"identifier":"imetrics_2016-06-07T14:00:00.000Z_2016-06-07T15:00:00.000Z_2016-06-07T14:00:10.365Z_1"}
1 row in set (0.04 sec)

David Lim

Jun 7, 2016, 12:11:30 PM
to Druid User
Hey Robin,

It sounds like you might have to set a completionTimeout > PT30M in your supervisor config since the tasks are taking longer than 30 minutes to publish. You should make sure that your coordinator is alive and that you have sufficient capacity in your historical nodes to load the segments from deep storage; otherwise the segment will never be handed off and the task will sit around until it gets killed by the timeout.
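
For example, a hedged sketch of where completionTimeout sits in the supervisor spec's ioConfig (the values here are purely illustrative):

"ioConfig": {
  "topic": "your_topic",
  "consumerProperties": { "bootstrap.servers": "kafkahost:9092" },
  "taskCount": 1,
  "replicas": 1,
  "taskDuration": "PT1H",
  "completionTimeout": "PT2H"
}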

For the second supervisor, it's hard to tell what's going on without seeing the logs. I see that 'marking the coordinator .. dead' log message fairly frequently in my tasks, and I find it usually recovers (but doesn't tell you it recovered). If you post your logs (task + overlord) and supervisor spec, I can see if there's anything interesting going on in there.

David Lim

Jun 7, 2016, 12:26:26 PM
to Druid User
Hey Drew,

The 'coordinator' in the logs isn't actually the Druid coordinator; the message is generated by the Kafka client:


{"t": "2016-06-07T13:47:12.192+0000", "message": "Marking the coordinator 2147483645 dead.", "level": "INFO", "logger_name": "org.apache.kafka.clients.consumer.internals.AbstractCoordinator", "thread_name": "task-runner-0-priority-0"}

I believe this message is logged whenever the Kafka consumer gets disconnected from a broker which can happen transiently for a number of different reasons.

If you're still having trouble with segments being published / handed off, post your task + overlord logs and supervisor spec and I can take a look.

drew dahlke

Jun 7, 2016, 12:43:38 PM
to Druid User
Awesome, thanks David
druid.zip

David Lim

Jun 7, 2016, 2:15:21 PM
to Druid User
Hey Drew,

Your task logs are pretty intense since your logger seems to be aggregating messages from all your tasks and it's not immediately clear what's going on with any given task. What issue(s) are you actually seeing with the indexing tasks? Are they not handing off to the historicals properly? I saw some tasks succeeding for both evalmetrics and imetrics, so are only some of the tasks failing to publish?

drew dahlke

Jun 7, 2016, 2:34:44 PM
to Druid User
I'll see a task complete, make its way to deep storage, and get written to the druid_segments table. However, the coordinator never seems to get the hint that it should assign the segment to a historical node. When it polls the MySQL table it sees zero segments available. I enabled debug logging on the coordinator (which is why it's so verbose) hoping to see an explanation of why it won't assign any segments, but I'm not seeing anything of interest.

Another interesting log message from the middle manager:

{"t": "2016-06-07T16:14:10.330+0000", "message": "[GET http://172.18.11.43:8081/druid/coordinator/v1/datasources/imetrics/intervals/2016-06-07T15:00:00.000Z_2016-06-07T16:00:00.000Z/serverview?partial=true] Channel disconnected before response complete", "level": "WARN", "logger_name": "com.metamx.http.client.NettyHttpClient", "thread_name": "HttpClient-Netty-Worker-2"}
{"t": "2016-06-07T16:14:10.330+0000", "message": "Exception while checking handoff for dataSource[imetrics] Segment[SegmentDescriptor{interval=2016-06-07T15:00:00.000Z/2016-06-07T16:00:00.000Z, version='2016-06-07T15:00:01.907Z', partitionNumber=2}], Will try again after [60000]secs", "level": "ERROR", "logger_name": "io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier", "thread_name": "coordinator_handoff_scheduled_0"}
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Channel disconnected
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:98) ~[druid-server-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
    at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier.checkForSegmentHandoffs(CoordinatorBasedSegmentHandoffNotifier.java:101) [druid-server-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
    at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier$1.run(CoordinatorBasedSegmentHandoffNotifier.java:86) [druid-server-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_75]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [?:1.7.0_75]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [?:1.7.0_75]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.7.0_75]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_75]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_75]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_75]
Caused by: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Channel disconnected
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
    at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:68) ~[druid-server-0.9.1-SNAPSHOT.jar:0.9.1-SNAPSHOT]

I can curl http://172.18.11.43:8081/druid/coordinator/v1/datasources/imetrics/intervals/2016-06-07T15:00:00.000Z_2016-06-07T16:00:00.000Z/serverview?partial=true no problem and get the payload almost instantly.
Message has been deleted

David Lim

Jun 7, 2016, 5:19:41 PM
to Druid User
Ah, ignore my last message, I think the log messages have changed and I was looking for the wrong thing. I'll keep investigating.

David Lim

Jun 7, 2016, 5:49:53 PM
to Druid User
Hey Drew,

My suspicion is that there might be something misconfigured with your load/drop rules (http://druid.io/docs/latest/operations/rule-configuration.html). Could you check the default rules in the coordinator console (default http://{coordinator_ip}:8081/#/datasources, click on the pencil next to 'default rules') and make sure that it has load rules for your segments? The easiest thing to do is add a load forever rule for _default_tier as rule #1. If that seems okay, maybe you can try ingesting into a new datasource in case the datasource level rules got messed up at some point.
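
For reference, a minimal sketch of a load-forever rule as rule #1 (assuming the default tier and a single replicant; adjust the replicant count to your cluster):

[
  {
    "type": "loadForever",
    "tieredReplicants": { "_default_tier": 1 }
  }
]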

Robin

Jun 8, 2016, 4:29:25 AM
to Druid User
Hi,

No worries about the second supervisor. It turns out the producer had been silently dropping new events since the Kafka restart.

I'll get back to you on the publishing issue. However, taking more than 30 minutes to publish this small dataset sounds wrong. Could it be related to the number of processing threads? I found that the historical node was configured to run with only one thread.

regards,
Robin

Robin

Jun 8, 2016, 5:43:19 AM
to Druid User
Hi again,

No, changing the thread setting didn't help at all.

Any hints on which nodes I should look at so I can consolidate the logs from this time period?

The second supervisor had about 20 events and the merged data was about 8k, so this must be related to something else.

thanks

Robin

Jun 8, 2016, 7:04:16 AM
to Druid User
This is the complete supervisor log from where the two Kafka supervisor tasks had been running for 3600s until they were killed 1800s later.

drew dahlke

Jun 8, 2016, 10:03:14 AM
to Druid User
David, either the coordinator console UI is broken (Firefox/Chrome) or there's a disagreement in state. The load rule is in MySQL and shows up in the UI, but data sources are missing and tasks are missing. The overlord & coordinator are on the same machine with the same common runtime config. The overlord seems to be running smoothly. I attached some screenshots & MySQL state so you can compare.
druid2.zip

David Lim

Jun 8, 2016, 12:48:00 PM
to Druid User
Hey Drew,

Could you post a dump of your druid_segments table as well as your server configs (common.runtime.properties + runtime.properties for each node)? I haven't seen any clear hints of what's wrong yet. The ingestion part is working fine; the problem seems to be that the coordinator is unable to read the segments table for some reason.

drew dahlke

Jun 8, 2016, 1:48:34 PM
to Druid User
Figured it out. This is what happens when you provide the MySQL 0.9.0 extension to Druid 0.9.1. Whoops. Fixing that got everything back on track. Thanks a bunch for your help, David.


David Lim

Jun 8, 2016, 1:52:40 PM
to Druid User
Wow that's brutal. Glad you figured it out!

David Lim

Jun 8, 2016, 2:15:40 PM
to Druid User
Hey Robin,

The ingestion-related nodes are the overlord, middle manager (with the peon processes) and the realtime nodes (which I don't think you're using). Their job is finished once they have created a segment (check in your deep storage to confirm the files are there) and written segment metadata (check the druid_segments table or whatever you named it). Well, almost finished - in order to handle realtime queries, the peon process responds to broker queries using the data it has ingested but has not yet written into a segment. After pushing the segment to deep storage, it waits for a notice that a historical has begun serving the segment that it pushed so it knows it can stop responding to queries, at which point the task will complete. What's likely happening in your case is that the segment never gets served by a historical, so the task stays alive until it's killed by timeout.

So once you confirm that the segment is in deep storage and there's metadata written, you can focus on the logs for the coordinator and historical nodes. In your case, it looks like the coordinator doesn't find any segments that it needs to load. Reasons for this are usually a rule preventing the loading of segments or a dataSource having been disabled. Your logs show that you have two rules in place (a default rule and a datasource-specific rule), so you should check to make sure that they aren't preventing the segment from being loaded. The druid_segments table has a used column which indicates whether or not the coordinator will load the segment, which should give you some hints about what's going on.
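
As an illustration, a check along these lines (assuming MySQL metadata storage and the default table name; 'your_datasource' is a placeholder) shows whether the most recent segments are marked as used:

mysql> select id, used, created_date from druid_segments where dataSource = 'your_datasource' order by created_date desc limit 10;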

Having said all this, you should probably see if you're having the same issue Drew had: apparently the 0.9.0 mysql extension isn't compatible with 0.9.1 and prevents the coordinator from reading the segments table properly.

Robin

Jun 9, 2016, 5:25:57 AM
to Druid User
Hey David!


On Wednesday, June 8, 2016 at 8:15:40 PM UTC+2, David Lim wrote:
Hey Robin,

Having said all this, you should probably see if you're having the same issue Drew had: apparently the 0.9.0 mysql extension isn't compatible with 0.9.1 and prevents the coordinator from reading the segments table properly.

This was actually another issue I was looking into. The console was giving errors and it seemed that Druid hadn't properly loaded some of the older segments. I upgraded this morning to 0.9.1-rc3 using the proper MySQL extension for 0.9.1-rc3 and now it works as expected. The console works. All segments seem to be loaded as they should and, more importantly, the segment handoff is working!

Thanks for the help. And also thanks to Drew for finding out that the older MySQL extension didn't work :)

regards,
Robin

Chanh Le

Sep 26, 2016, 4:22:18 AM
to Druid User
Hi everyone,

I have the same issue. 
These are the logs from the overlord. It throws these a lot, and my index task is not showing in the console (:8090), but when I created the task I received no error message.


2016-09-26T08:17:40,037 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T08:17:40,051 INFO [KafkaSupervisor-test-kafka-task-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483646 dead.
2016-09-26T08:17:40,139 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T08:17:40,152 INFO [KafkaSupervisor-test-kafka-task-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483646 dead.
2016-09-26T08:17:40,240 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T08:17:40,253 INFO [KafkaSupervisor-test-kafka-task-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483646 dead.
2016-09-26T08:17:40,341 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T08:17:40,354 INFO [KafkaSupervisor-test-kafka-task-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483646 dead.
2016-09-26T08:17:40,443 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T08:17:40,456 INFO [KafkaSupervisor-test-kafka-task-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483646 dead.

My task spec:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "test-kafka",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "time",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            "url",
            "user"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "views",
        "type": "count"
      },
      {
        "name": "latencyMs",
        "fieldName": "latencyMs",
        "type": "doubleSum"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "MINUTE"
    }
  },
  "ioConfig": {
    "topic": "test",
    "consumerProperties": {
      "bootstrap.servers": "kafka1:9091,kafka1:9092,kafka2:9091,kafka2:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}



David Lim

Sep 26, 2016, 1:04:41 PM
to Druid User
Hi Chanh,

Could you post the full overlord logs so that we can see if there's any hints in there?

Chanh Le

Sep 27, 2016, 12:08:18 AM
to Druid User
Hi David,
Here are the full overlord logs from before it happened.



2016-09-26T07:47:07,174 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:47:07.174Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/count","value":0,"bufferpoolName":"mapped"}]
2016-09-26T07:47:11,435 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - {id='test-kafka', generationTime=2016-09-26T07:47:11.435Z, payload={dataSource='test-kafka', topic='test', partitions=2, replicas=1, durationSeconds=600, active=[{id='index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli', startTime=2016-09-26T07:40:03.008Z, remainingSeconds=171}], publishing=[]}}
2016-09-26T07:47:36,420 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 2 tasks from storage (0 tasks added, 0 tasks removed).
2016-09-26T07:47:41,436 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - {id='test-kafka', generationTime=2016-09-26T07:47:41.436Z, payload={dataSource='test-kafka', topic='test', partitions=2, replicas=1, durationSeconds=600, active=[{id='index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli', startTime=2016-09-26T07:40:03.008Z, remainingSeconds=141}], publishing=[]}}
2016-09-26T07:48:06,914 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:06.914Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jetty/numOpenConnections","value":8}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/max","value":3087007744,"memKind":"heap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/committed","value":3087007744,"memKind":"heap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/used","value":452692792,"memKind":"heap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/init","value":3221225472,"memKind":"heap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/max","value":-1,"memKind":"nonheap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/committed","value":96100352,"memKind":"nonheap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/used","value":94380248,"memKind":"nonheap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/mem/init","value":2555904,"memKind":"nonheap"}]
2016-09-26T07:48:07,172 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/max","value":251658240,"poolKind":"nonheap","poolName":"Code Cache"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.172Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/committed","value":31916032,"poolKind":"nonheap","poolName":"Code Cache"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/used","value":31589696,"poolKind":"nonheap","poolName":"Code Cache"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/init","value":2555904,"poolKind":"nonheap","poolName":"Code Cache"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/max","value":-1,"poolKind":"nonheap","poolName":"Metaspace"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/committed","value":57106432,"poolKind":"nonheap","poolName":"Metaspace"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/used","value":56019056,"poolKind":"nonheap","poolName":"Metaspace"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/init","value":0,"poolKind":"nonheap","poolName":"Metaspace"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/max","value":1073741824,"poolKind":"nonheap","poolName":"Compressed Class Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/committed","value":7077888,"poolKind":"nonheap","poolName":"Compressed Class Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/used","value":6771496,"poolKind":"nonheap","poolName":"Compressed Class Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/init","value":0,"poolKind":"nonheap","poolName":"Compressed Class Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/max","value":805306368,"poolKind":"heap","poolName":"PS Eden Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/committed","value":805306368,"poolKind":"heap","poolName":"PS Eden Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/used","value":407688760,"poolKind":"heap","poolName":"PS Eden Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/init","value":805306368,"poolKind":"heap","poolName":"PS Eden Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/max","value":134217728,"poolKind":"heap","poolName":"PS Survivor Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/committed","value":134217728,"poolKind":"heap","poolName":"PS Survivor Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/used","value":24161784,"poolKind":"heap","poolName":"PS Survivor Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/init","value":134217728,"poolKind":"heap","poolName":"PS Survivor Space"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/max","value":2147483648,"poolKind":"heap","poolName":"PS Old Gen"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/committed","value":2147483648,"poolKind":"heap","poolName":"PS Old Gen"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/used","value":20842248,"poolKind":"heap","poolName":"PS Old Gen"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/pool/init","value":2147483648,"poolKind":"heap","poolName":"PS Old Gen"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/gc/count","value":0,"gcName":"PS Scavenge"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/gc/time","value":0,"gcName":"PS Scavenge"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/gc/count","value":0,"gcName":"PS MarkSweep"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/gc/time","value":0,"gcName":"PS MarkSweep"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/capacity","value":8927071,"bufferpoolName":"direct"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/used","value":8927071,"bufferpoolName":"direct"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/count","value":396,"bufferpoolName":"direct"}]
2016-09-26T07:48:07,173 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/capacity","value":0,"bufferpoolName":"mapped"}]
2016-09-26T07:48:07,174 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.173Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/used","value":0,"bufferpoolName":"mapped"}]
2016-09-26T07:48:07,174 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:07.174Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"jvm/bufferpool/count","value":0,"bufferpoolName":"mapped"}]
2016-09-26T07:48:09,616 INFO [qtp557016605-162] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Beginning shutdown of KafkaSupervisor[test-kafka]
2016-09-26T07:48:09,618 INFO [qtp557016605-162] io.druid.indexing.overlord.RemoteTaskRunner - Unregistered listener [KafkaSupervisor-test-kafka]
2016-09-26T07:48:09,618 INFO [qtp557016605-162] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Posting GracefulShutdownNotice, signalling managed tasks to complete and publish
2016-09-26T07:48:09,623 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Task group [0] has run for [PT600S]
2016-09-26T07:48:09,625 INFO [KafkaSupervisor-test-kafka-0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://adx-pro-vdchcm-ants19-ip-10-199-0-19:8100
2016-09-26T07:48:09,916 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Setting endOffsets for tasks in taskGroup [0] to {0=105954, 1=184201} and resuming
2016-09-26T07:48:09,917 INFO [KafkaSupervisor-test-kafka-0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://adx-pro-vdchcm-ants19-ip-10-199-0-19:8100
2016-09-26T07:48:09,940 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483641 dead.
2016-09-26T07:48:09,949 INFO [qtp557016605-162] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Shutdown notice handled
2016-09-26T07:48:09,949 INFO [qtp557016605-162] io.druid.indexing.kafka.supervisor.KafkaSupervisor - KafkaSupervisor[test-kafka] has stopped
2016-09-26T07:48:09,949 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - KafkaSupervisor[test-kafka] interrupted, exiting
2016-09-26T07:48:09,951 INFO [qtp557016605-162] org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values:
2016-09-26T07:48:09,956 INFO [qtp557016605-162] org.apache.kafka.common.utils.AppInfoParser - Kafka version : 0.9.0.1
2016-09-26T07:48:09,956 INFO [qtp557016605-162] org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : 23c69d62a0cabf06
2016-09-26T07:48:09,956 INFO [qtp557016605-162] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Started KafkaSupervisor[test-kafka], first run in [PT5S]
2016-09-26T07:48:10,604 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[adx-pro-vdchcm-ants19-ip-10-199-0-19:8091] wrote SUCCESS status for task [index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli] on [TaskLocation{host='adx-pro-vdchcm-ants19-ip-10-199-0-19', port=8100}]
2016-09-26T07:48:10,604 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[adx-pro-vdchcm-ants19-ip-10-199-0-19:8091] completed task[index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli] with status[SUCCESS]
2016-09-26T07:48:10,605 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Received SUCCESS status for task: index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli
2016-09-26T07:48:10,605 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli] on worker[adx-pro-vdchcm-ants19-ip-10-199-0-19:8091]
2016-09-26T07:48:10,606 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli] from activeTasks
2016-09-26T07:48:10,607 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli to status: TaskStatus{id=index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli, status=SUCCESS, duration=493156}
2016-09-26T07:48:10,609 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Task done: KafkaIndexTask{id=index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli, type=index_kafka, dataSource=test-kafka}
2016-09-26T07:48:10,609 INFO [Curator-PathChildrenCache-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-09-26T07:48:10.609Z","service":"druid/overlord","host":"adx-pro-vdchcm-ants19-ip-10-199-0-19:8090","metric":"task/run/time","value":493156,"dataSource":"test-kafka","taskStatus":"SUCCESS","taskType":"index_kafka"}]
2016-09-26T07:48:10,609 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Task SUCCESS: KafkaIndexTask{id=index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli, type=index_kafka, dataSource=test-kafka} (493156 run duration)
2016-09-26T07:48:10,609 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli] status changed to [SUCCESS].
2016-09-26T07:48:10,609 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_test-kafka_d7b4e7c4b7d4894_ofpogpli] went bye bye.
2016-09-26T07:48:14,957 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_ad_statistics_hourly_2016-09-26T07:00:00.000Z_0_0] location changed to [TaskLocation{host='adx-pro-vdchcm-ants19-ip-10-199-0-19', port=8101}].
2016-09-26T07:48:14,957 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.overlord.RemoteTaskRunner - Registered listener [KafkaSupervisor-test-kafka]
2016-09-26T07:48:15,064 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [0] discovered for topic [test1], adding to task group [0]
2016-09-26T07:48:15,064 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [1] discovered for topic [test1], adding to task group [1]
2016-09-26T07:48:15,064 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [2] discovered for topic [test1], adding to task group [2]
2016-09-26T07:48:15,064 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [3] discovered for topic [test1], adding to task group [0]
2016-09-26T07:48:15,064 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [4] discovered for topic [test1], adding to task group [1]
2016-09-26T07:48:15,064 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [5] discovered for topic [test1], adding to task group [2]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [6] discovered for topic [test1], adding to task group [0]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [7] discovered for topic [test1], adding to task group [1]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [8] discovered for topic [test1], adding to task group [2]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [9] discovered for topic [test1], adding to task group [0]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [10] discovered for topic [test1], adding to task group [1]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [11] discovered for topic [test1], adding to task group [2]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [12] discovered for topic [test1], adding to task group [0]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [13] discovered for topic [test1], adding to task group [1]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [14] discovered for topic [test1], adding to task group [2]
2016-09-26T07:48:15,065 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - New partition [15] discovered for topic [test1], adding to task group [0]
2016-09-26T07:48:15,066 INFO [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Creating new task group [0] for partitions [0, 3, 6, 9, 12, 15]
2016-09-26T07:48:15,069 WARN [KafkaSupervisor-test-kafka-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Topic in metadata storage [test] doesn't match spec topic [test1], ignoring stored offsets
2016-09-26T07:48:15,078 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T07:48:15,180 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T07:48:15,282 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T07:48:15,384 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T07:48:15,486 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.
2016-09-26T07:48:15,588 INFO [KafkaSupervisor-test-kafka-0] org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483645 dead.

David Lim

Sep 27, 2016, 1:28:42 AM
to Druid User
Hey Chanh,

The Kafka indexing service doesn't like it when you change the topic you're ingesting from because it causes the checks used to guarantee exactly-once ingestion to fail since the offsets and number of partitions are different between topics. There are two things you can do here:

1) Change the name of the datasource in the supervisor spec to something different that hasn't been used before.

2) Clear the metadata containing the persisted offset markers by removing the entry for that datasource from the druid_dataSource table.
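
For option 2, a hedged sketch assuming MySQL metadata storage with the default table name (back up the table before modifying it):

mysql> delete from druid_dataSource where dataSource = 'test-kafka';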
        "name"</

Chanh Le

Sep 27, 2016, 4:54:59 AM
to druid...@googlegroups.com
Hey David,
Thank you for your support.
I have some feedback about this: shouldn't there be something to handle this case? It generated a ton of log messages in a short time.


Regards,
Chanh





David Lim

Sep 27, 2016, 12:12:08 PM
to Druid User
Hey Chanh,

We added an endpoint to reset the supervisor when things happen like the topic changing, see: https://github.com/druid-io/druid/pull/3484. It will be available in Druid 0.9.2. We didn't want to automatically reset the supervisor because doing so would break exactly-once ingestion and the user may not notice the log messages indicating this, so the supervisor/tasks will not proceed until manually reset.
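
Once you're on 0.9.2, the reset should look something like this (a hedged sketch; the host, port, and supervisor id are placeholders):

curl -X POST http://OVERLORD_IP:8090/druid/indexer/v1/supervisor/test-kafka/reset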

Chanh Le

Sep 27, 2016, 12:15:12 PM
to druid...@googlegroups.com
Thank you, David.
This helps me a lot.

