com.twitter.finagle.FailedFastException: Endpoint druid:overlord is marked down.

464 views
Skip to first unread message

SShivaram

unread,
Apr 14, 2016, 9:23:05 PM4/14/16
to Druid Development
Hi All,

I'm trying to create a storm bolt using tranquility to send data to druid. My storm bolt looks as follows :

class DruidBeamBolt implements com.metamx.tranquility.storm.BeamFactory<Map<String, Object>> {

    @Override
    public Beam<Map<String, Object>> makeBeam(Map<?, ?> map, IMetricsContext imc) {
        try {

            final CuratorFramework curator = CuratorFrameworkFactory.newClient(
                    Configuration.ZOOKEEPER_SERVERS_CONFIG, new BoundedExponentialBackoffRetry(100, 1000, 5));
            curator.start(); 

            final String dataSource = Configuration.KAFKA_SERVERS_TOPIC;
            final List<String> dimensions = ImmutableList.of("timestamp, series[0].tags, series[0].host, series[0].device_name, series[0].interval, series[0].metric, series[0].points[0][0], series[0].points[0][1], series[0].type, series[1].tags[0], series[1].host, series[1].device_name, series[1].interval, series[1].metric, series[1].points[0][0], series[1].points[0][1], series[1].type");
            final List<AggregatorFactory> aggregators = ImmutableList.<AggregatorFactory>of(
                    new CountAggregatorFactory("events"),
                    new DoubleSumAggregatorFactory("sum_value", "value"),
                    new MaxAggregatorFactory("max_value", "value"),
                    new MinAggregatorFactory("min_value", "value"));

            final DruidBeams.Builder<Map<String, Object>> builder = DruidBeams
                    .builder(
                            new Timestamper<Map<String, Object>>() {
                                @Override
                                public DateTime timestamp(Map<String, Object> theMap) {
                                    try {
                                        Long date = Long.parseLong(theMap.get("timestamp").toString());
                                        date = date * 1000;
                                        return new DateTime(date.longValue());

                                    }
                                    catch(NumberFormatException e)
                                    {
                                        System.out.println(e);
                                    }

                                    return  DateTime.now();
                                }
                            }
                    )
                    .curator(curator)
                    .discoveryPath("/druid/discovery")
                    .location(
                            new DruidLocation(
                                    new DruidEnvironment(
                                            "druid/overlord",
                                            "druid:local:firehose:%s"
                                    ), dataSource
                            )
                    )
                    .rollup(DruidRollup.create(dimensions, aggregators, QueryGranularity.MINUTE))
                    .tuning(ClusteredBeamTuning.create(Granularity.HOUR, new Period("PT0M"), new Period("PT10M"), 1, 1));

            final Beam<Map<String, Object>> beam = builder.buildBeam();

            return beam;
        } catch (Exception e) {
            throw Throwables.propagate(e);
        }
    }
}

-------------------------------------------------Overlord/runtime.properties-----------------------------------------
druid.service=druid/overlord
druid.port=8090

druid.indexer.queue.startDelay=PT30S

druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

I get the following error when I run the code


30403 [ClusteredBeam-ZkFuturePool-e9a72a07-d141-4ba8-a9f1-2c02d30d6900] INFO  com.metamx.common.scala.net.finagle.DiscoResolver - Updating instances for service[druid:overlord] to Set(ServiceInstance{name='druid:overlord', id='cebfe02b-6593-49c4-866f-90e884847256', address='localhost', port=8090, sslPort=null, payload=null, registrationTimeUTC=1460677871525, serviceType=DYNAMIC, uriSpec=null})
30510 [ClusteredBeam-ZkFuturePool-e9a72a07-d141-4ba8-a9f1-2c02d30d6900] INFO  com.metamx.tranquility.finagle.FinagleRegistry - Created client for service: druid:overlord
31708 [finagle/netty3-17] WARN  com.metamx.tranquility.finagle.FutureRetry$ - Transient error, will try again in 12,079 ms
com.twitter.finagle.ChannelWriteException: java.net.ConnectException: Connection refused: no further information: localhost/127.0.0.1:8090 from service: druid:overlord
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) ~[na:1.7.0_79]
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) [netty-3.10.5.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) [netty-3.10.5.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) [netty-3.10.5.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) [netty-3.10.5.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) [netty-3.10.5.Final.jar:na]
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.10.5.Final.jar:na]
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.10.5.Final.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
Caused by: java.net.ConnectException: Connection refused: no further information: localhost/127.0.0.1:8090
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Any pointers will be appreciated!




Gian Merlino

unread,
Apr 15, 2016, 7:35:50 PM4/15/16
to druid-de...@googlegroups.com
"Connection refused: no further information: localhost/127.0.0.1:8090" sounds like either there is no overlord running, or maybe you have an overlord but it is not announcing itself properly in service discovery (registering itself as "localhost" instead of an actual reachable hostname).

If it's the latter, try setting "druid.host=xxx" on the overlord's runtime.properties, where "xxx" is an IP address that the overlord is externally reachable on.

Gian

--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/71eac786-b1db-4009-9ec0-e2f598846e44%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

SShivaram

unread,
Apr 17, 2016, 9:21:21 PM4/17/16
to Druid Development
Thanks Gian. The previous error was taken of by following your suggestion. Now, I seems to be getting the following error,

135012 [finagle/netty3-3] WARN  com.metamx.tranquility.beam.ClusteredBeam - Emitting alert: [anomaly] Failed to propagate events: druid:overlord/test
{
  "eventCount" : 179,
  "timestamp" : "2016-04-18T01:00:00.000Z",
  "beams" : "MergingPartitioningBeam(DruidBeam(interval = 2016-04-18T01:00:00.000/2016-04-18T02:00:00.000, partition = 0, tasks = [index_realtime_test_2016-04-18T01:00:00.000Z_0_0/test-01-0000-0000]))"
}
java.io.IOException: Failed to send request to task[index_realtime_test_2016-04-18T01:00:00.000Z_0_0]: 500 Internal Server Error

I looked at the overlord console/logs to see if I could see any errors:

2016-04-18T01:14:15,469 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Attempting to lock file[/tmp/persistent/task/index_realtime_test_2016-04-18T01:00:00.000Z_0_0/lock].
2016-04-18T01:14:15,470 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Acquired lock file[/tmp/persistent/task/index_realtime_test_2016-04-18T01:00:00.000Z_0_0/lock] in 1ms.
2016-04-18T01:14:15,474 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Running task: index_realtime_test_2016-04-18T01:00:00.000Z_0_0
2016-04-18T01:14:15,483 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Creating plumber using rejectionPolicy[io.druid.segment.realtime.plumber.NoopRejectionPolicyFactory$1@3ec869fe]
2016-04-18T01:14:15,485 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Expect to run at [2016-04-18T02:10:00.000Z]
2016-04-18T01:14:15,487 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2016-04-18T01:14:15,487 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2016-04-18T01:14:15,487 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2016-04-18T01:14:15,516 INFO [task-runner-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Connecting firehose: firehose:druid:overlord:test-01-0000-0000
2016-04-18T01:14:15,526 INFO [task-runner-0] io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory - Found chathandler of class[io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider]
2016-04-18T01:14:15,526 INFO [task-runner-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[firehose:druid:overlord:test-01-0000-0000]
2016-04-18T01:14:15,531 INFO [task-runner-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='firehose:druid:overlord:test-01-0000-0000', host='192.168.100.150', port=8100}]
2016-04-18T01:14:15,557 INFO [task-runner-0] io.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Registering Eventhandler[test-01-0000-0000]
2016-04-18T01:14:15,558 INFO [task-runner-0] io.druid.curator.discovery.CuratorServiceAnnouncer - Announcing service[DruidNode{serviceName='test-01-0000-0000', host='192.168.100.150', port=8100}]
2016-04-18T01:14:15,566 WARN [task-runner-0] org.apache.curator.utils.ZKPaths - The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
2016-04-18T01:14:15,568 INFO [task-runner-0] io.druid.server.metrics.EventReceiverFirehoseRegister - Registering EventReceiverFirehoseMetric for service [firehose:druid:overlord:test-01-0000-0000]
2016-04-18T01:14:15,569 INFO [task-runner-0] io.druid.data.input.FirehoseFactory - Firehose created, will shut down at: 2016-04-18T02:15:00.000Z

I couldn't see any problems. Any pointers about what I might be doing wrong?


Fangjin Yang

unread,
Apr 26, 2016, 1:01:11 PM4/26/16
to Druid Development
Do you see errors in the logs of the task itself?
Reply all
Reply to author
Forward
0 new messages