Problem with data ingestion using Tranquility and a Kafka server


Bikash Singh

Dec 8, 2016, 2:56:55 AM12/8/16
to Druid Development
Hi,

I have three topics configured in Kafka which are consumed by Tranquility. I am seeing a very strange thing: during ingestion, the Druid tasks keep running without ever emitting a success message.
Looking at the logs, I found that they keep waiting for segment handoff and are never notified by the coordinator that the newly created segment has been successfully handed over to a historical node.

Observations:
1. Tranquility runs with hour-based granularity, and tasks are submitted on a per-hour basis.
2. While some tasks run to completion with a success message, other tasks keep waiting to be notified by the coordinator.
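The "Still waiting for Handoff" messages in the logs below come from a scheduled poller: the task periodically asks the coordinator whether the segment is being served, and on any error it logs the exception and retries after a fixed delay ("Will try again after [60000]secs"). A minimal sketch of that poll-and-retry pattern, with a hypothetical `check` callable standing in for the coordinator query (this is an illustration, not Druid's actual code):

```python
import time

def wait_for_handoff(check, interval_secs=60, max_attempts=None):
    """Poll check() until it reports the segment has been handed off.

    Mirrors the behavior visible in CoordinatorBasedSegmentHandoffNotifier's
    logs: a successful check ends the wait; an exception is logged and the
    poll is retried after the configured delay (60 s in the logs above).
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            if check():
                return True  # "Segment Handoff complete"
        except Exception as exc:
            # "Exception while checking handoff ... Will try again after [60000]secs"
            print(f"handoff check failed (attempt {attempt}): {exc!r}")
        time.sleep(interval_secs)
    return False
```

Note that the loop never gives up unless a maximum attempt count is set, which matches the observed symptom: a task whose check keeps failing simply runs forever.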

Success Logs:
2016-12-07T13:15:23,198 INFO [pixel-feed-2016-12-07T12:00:00.000Z-persist-n-merge] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_realtime_pixel-feed_2016-12-07T12:00:00.000Z_0_0] to overlord[http://54.159.125.81:8090/druid/indexer/v1/action]: SegmentInsertAction{segments=[DataSegment{size=23330739, shardSpec=LinearShardSpec{partitionNum=0}, metrics=[count], dimensions=[uid, pixel_id, u1, u2, u3, u4, u5, u6, u7, u8, u9, u10, u11, u12, u13, u14, u15, u16, u17, u18, u19, u20, u21, u22, u23, u24, u25], version='2016-12-07T11:57:03.885Z', loadSpec={type=s3_zip, bucket=analyst-adhoc, key=druid/prod/v1/pixel-feed/2016-12-07T12:00:00.000Z_2016-12-07T13:00:00.000Z/2016-12-07T11:57:03.885Z/0/index.zip}, interval=2016-12-07T12:00:00.000Z/2016-12-07T13:00:00.000Z, dataSource='pixel-feed', binaryVersion='9'}]}
2016-12-07T13:15:23,198 INFO [pixel-feed-2016-12-07T12:00:00.000Z-persist-n-merge] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://54.159.125.81:8090
2016-12-07T13:16:01,293 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2016-12-07T12:00:00.000Z/2016-12-07T13:00:00.000Z, version='2016-12-07T11:57:03.885Z', partitionNumber=0}]]
2016-12-07T13:17:01,293 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Segment Handoff complete for dataSource[pixel-feed] Segment[SegmentDescriptor{interval=2016-12-07T12:00:00.000Z/2016-12-07T13:00:00.000Z, version='2016-12-07T11:57:03.885Z', partitionNumber=0}]
2016-12-07T13:17:01,294 INFO [plumber_merge_0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[pixel-feed_2016-12-07T12:00:00.000Z_2016-12-07T13:00:00.000Z_2016-12-07T11:57:03.885Z] at path[/druid/prod/segments/54.197.229.247:8101/54.197.229.247:8101_realtime__default_tier_2016-12-07T11:57:04.002Z_e22e07f97e304e95bcc67ba7ad2cd4d30]
2016-12-07T13:17:01,294 INFO [plumber_merge_0] io.druid.curator.announcement.Announcer - unannouncing [/druid/prod/segments/54.197.229.247:8101/54.197.229.247:8101_realtime__default_tier_2016-12-07T11:57:04.002Z_e22e07f97e304e95bcc67ba7ad2cd4d30]
2016-12-07T13:17:01,321 INFO [plumber_merge_0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_realtime_pixel-feed_2016-12-07T12:00:00.000Z_0_0]: LockReleaseAction{interval=2016-12-07T12:00:00.000Z/2016-12-07T13:00:00.000Z}
2016-12-07T13:17:01,336 INFO [plumber_merge_0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_realtime_pixel-feed_2016-12-07T12:00:00.000Z_0_0] to overlord[http://54.159.125.81:8090/druid/indexer/v1/action]: LockReleaseAction{interval=2016-12-07T12:00:00.000Z/2016-12-07T13:00:00.000Z}
2016-12-07T13:17:01,336 INFO [plumber_merge_0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://54.159.125.81:8090
2016-12-07T13:17:01,383 INFO [plumber_merge_0] io.druid.segment.realtime.plumber.RealtimePlumber - Deleting Index File[/mnt/xvdb/druid/prod/persistent/task/index_realtime_pixel-feed_2016-12-07T12:00:00.000Z_0_0/work/persist/pixel-feed/2016-12-07T12:00:00.000Z_2016-12-07T13:00:00.000Z]
2016-12-07T13:17:01,397 INFO [plumber_merge_0] io.druid.segment.realtime.plumber.RealtimePlumber - Removing sinkKey 1481112000000 for segment pixel-feed_2016-12-07T12:00:00.000Z_2016-12-07T13:00:00.000Z_2016-12-07T11:57:03.885Z
2016-12-07T13:17:01,405 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Job done!
2016-12-07T13:17:01,412 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_realtime_pixel-feed_2016-12-07T12:00:00.000Z_0_0",
  "status" : "SUCCESS",
  "duration" : 4800269
}

Logs of a task which keeps on running:
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:98) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier.checkForSegmentHandoffs(CoordinatorBasedSegmentHandoffNotifier.java:101) [druid-server-0.9.1.1.jar:0.9.1.1]
	at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier$1.run(CoordinatorBasedSegmentHandoffNotifier.java:86) [druid-server-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
	at com.google.common.util.concurrent.Futures$ImmediateFailedFuture.get(Futures.java:186) ~[guava-16.0.1.jar:?]
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:82) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	... 9 more
Caused by: org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
	at com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:137) ~[http-client-1.0.4.jar:?]
	at com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14) ~[http-client-1.0.4.jar:?]
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:68) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	... 9 more
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /54.159.125.81:8081
	at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.4.Final.jar:?]
	... 3 more
2016-12-08T07:51:43,632 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2016-12-07T11:00:00.000Z/2016-12-07T12:00:00.000Z, version='2016-12-07T10:57:00.076Z', partitionNumber=0}]]
2016-12-08T07:52:33,619 INFO [coordinator_handoff_scheduled_0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://54.159.125.81:8081
2016-12-08T07:52:43,631 WARN [HttpClient-Netty-Boss-0] org.jboss.netty.channel.SimpleChannelUpstreamHandler - EXCEPTION, please implement org.jboss.netty.handler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handling.
org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /54.159.125.81:8081
	at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139) [netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83) [netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) [netty-3.10.4.Final.jar:?]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.4.Final.jar:?]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.4.Final.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
2016-12-08T07:52:43,631 ERROR [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Exception while checking handoff for dataSource[pixel-feed] Segment[SegmentDescriptor{interval=2016-12-07T11:00:00.000Z/2016-12-07T12:00:00.000Z, version='2016-12-07T10:57:00.076Z', partitionNumber=0}], Will try again after [60000]secs
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:98) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier.checkForSegmentHandoffs(CoordinatorBasedSegmentHandoffNotifier.java:101) [druid-server-0.9.1.1.jar:0.9.1.1]
	at io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier$1.run(CoordinatorBasedSegmentHandoffNotifier.java:86) [druid-server-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_91]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
	at com.google.common.util.concurrent.Futures$ImmediateFailedFuture.get(Futures.java:186) ~[guava-16.0.1.jar:?]
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:82) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	... 9 more
Caused by: org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
	at com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:137) ~[http-client-1.0.4.jar:?]
	at com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14) ~[http-client-1.0.4.jar:?]
	at io.druid.client.coordinator.CoordinatorClient.fetchServerView(CoordinatorClient.java:68) ~[druid-server-0.9.1.1.jar:0.9.1.1]
	... 9 more
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /54.159.125.81:8081
	at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[netty-3.10.4.Final.jar:?]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[netty-3.10.4.Final.jar:?]
	... 3 more
2016-12-08T07:52:43,632 INFO [coordinator_handoff_scheduled_0] io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifier - Still waiting for Handoff for Segments : [[SegmentDescriptor{interval=2016-12-07T11:00:00.000Z/2016-12-07T12:00:00.000Z, version='2016-12-07T10:57:00.076Z', partitionNumber=0}]]

My Problem:
Why is it that some tasks cannot connect to the coordinator for segment handoff while others connect to it successfully and run to completion? I have configured my ports properly and none of them are blocked.

Can anyone please help me out?
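Since the root cause in the failing trace is a raw `ConnectTimeoutException` to `54.159.125.81:8081`, one quick way to narrow this down is to test plain TCP reachability to the coordinator from each middle-manager host that launches peons. A minimal sketch, using a hypothetical helper that is not part of Druid:

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Attempt a TCP connect, mirroring what the task's Netty client
    does before it raises ConnectTimeoutException. Returns True if a
    connection is established within the timeout, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (run from each middle-manager host; values taken from the logs):
#   can_connect("54.159.125.81", 8081)  # coordinator port
#   can_connect("54.159.125.81", 8090)  # overlord port (succeeds per the logs)
```

If this returns False only on some hosts, the problem is host-level networking (security groups, routing, NAT) rather than Druid configuration, which would explain why some tasks complete while others hang.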

Fangjin Yang

Dec 13, 2016, 6:56:25 PM12/13/16
to Druid Development