Not able to send events from storm-tranquility to Druid


ravi teja

Dec 17, 2014, 12:53:04 AM
to druid-de...@googlegroups.com
Hi,
 
I am using Tranquility inside my Storm bolt to send data to Druid.
I am hitting timeouts when sending the events.

I have learnt from other posts that this can happen when Tranquility is not able to find the tasks.

When I checked the Storm, overlord and realtime logs, something looked fishy.

Tranquility creates one set of tasks, but looks for a different set of tasks while sending data.

I am using partitions = 1 and replication = 2.
Please find the logs below:


storm-tranquility log during task creation:


2014-12-17 05:00:21 c.m.c.s.control$ [INFO] Creating druid indexing task with id: index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_1_fkaopgij (service = overlord)
2014-12-17 05:00:21 c.m.c.s.control$ [INFO] Created druid indexing task with id: index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_0_dnfdfimf (service = overlord)
2014-12-17 05:00:21 c.m.c.s.control$ [INFO] Created druid indexing task with id: index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_1_fkaopgij (service = overlord)

 Tasks created:
  index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_0_dnfdfimf
  index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_1_fkaopgij


Coordinator console:
running: index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_0_dnfdfimf
pending : index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_1_fkaopgij


storm-tranquility log while sending data:

2014-12-17 04:58:50 c.m.e.c.LoggingEmitter [INFO] Event [{"feed":"alerts","timestamp":"2014-12-17T04:58:50.629Z","service":"tranquility","host":"localhost","severity":"anomaly","description":"Failed to propagate events: overlord/offer_impressions_test_3","data":{"exceptionType":"com.twitter.finagle.GlobalRequestTimeoutException","exceptionStackTrace":"com.twitter.finagle.GlobalRequestTimeoutException: exceeded 1.minutes+30.seconds to druid:firehose:offer_impressions_test_3-04-0000-0001 while waiting for a response for the request, including retries (if applicable)\n\tat com.twitter.finagle.NoStacktrace(Unknown Source)\n","timestamp":"2014-12-17T04:00:00.000Z","beams":"HashPartitionBeam(DruidBeam(timestamp = 2014-12-17T04:00:00.000Z, partition = 0, tasks = [index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_0_pklcjeac/offer_impressions_test_3-04-0000-0000; index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_1_epnblfng/offer_impressions_test_3-04-0000-0001]))","eventCount":101,"exceptionMessage":"exceeded 1.minutes+30.seconds to druid:firehose:offer_impressions_test_3-04-0000-0001 while waiting for a response for the request, including retries (if applicable)"}}]


2014-12-17 04:58:50 c.m.t.b.ClusteredBeam [WARN] Emitting alert: [anomaly] Failed to propagate events: overlord/offer_impressions_test_3
{
  "eventCount" : 101,
  "timestamp" : "2014-12-17T04:00:00.000Z",
  "beams" : "HashPartitionBeam(DruidBeam(timestamp = 2014-12-17T04:00:00.000Z, partition = 0, tasks = [index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_0_pklcjeac/offer_impressions_test_3-04-0000-0000; index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_1_epnblfng/offer_impressions_test_3-04-0000-0001]))"
}
 

Tasks being looked for:
index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_0_pklcjeac
index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_1_epnblfng
 

2014-12-17 05:00:21 c.m.t.b.ClusteredBeam [INFO] Writing new beam data to[/tranquility/beams/overlord/offer_impressions_test_3/data]: {"latestTime":"2014-12-17T05:00:00.000Z","latestCloseTime":"2014-12-17T03:00:00.000Z","beams":{"2014-12-17T04:00:00.000Z":[{"timestamp":"2014-12-17T04:00:00.000Z","partition":0,"tasks":[{"id":"index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_0_pklcjeac","firehoseId":"offer_impressions_test_3-04-0000-0000"},{"id":"index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_1_epnblfng","firehoseId":"offer_impressions_test_3-04-0000-0001"}]}],"2014-12-17T05:00:00.000Z":[{"timestamp":"2014-12-17T05:00:00.000Z","partition":0,"tasks":[{"id":"index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_0_dnfdfimf","firehoseId":"offer_impressions_test_3-05-0000-0000"},{"id":"index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_1_fkaopgij","firehoseId":"offer_impressions_test_3-05-0000-0001"}]}]}}



These are the different tasks it is looking for while sending; they don't exist on the coordinator.
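
To compare the two sets of IDs side by side, the hour embedded in each task ID can be pulled out with a short script. This is only a sketch: the ID layout in the regular expression is inferred from the IDs in these logs rather than taken from Tranquility documentation, and `group_by_interval` is a made-up helper name.

```python
import re
from collections import defaultdict

# Task IDs copied from the logs in this message.
CREATED = [
    "index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_0_dnfdfimf",
    "index_realtime_offer_impressions_test_3_2014-12-17T05:00:00.000Z_0_1_fkaopgij",
]
LOOKED_FOR = [
    "index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_0_pklcjeac",
    "index_realtime_offer_impressions_test_3_2014-12-17T04:00:00.000Z_0_1_epnblfng",
]

# Assumed layout (inferred from the IDs above, not from documentation):
# index_realtime_<dataSource>_<interval start>_<partition>_<replica>_<suffix>
TASK_ID = re.compile(
    r"^index_realtime_(?P<datasource>.+)"
    r"_(?P<start>\d{4}-\d{2}-\d{2}T[\d:.]+Z)"
    r"_(?P<partition>\d+)_(?P<replica>\d+)_(?P<suffix>[a-z]+)$"
)


def group_by_interval(task_ids):
    """Map each interval start to the (partition, replica) pairs found for it."""
    groups = defaultdict(list)
    for task_id in task_ids:
        match = TASK_ID.match(task_id)
        if match is None:
            raise ValueError("unexpected task id layout: " + task_id)
        groups[match.group("start")].append(
            (int(match.group("partition")), int(match.group("replica")))
        )
    return dict(groups)


if __name__ == "__main__":
    print("created:   ", group_by_interval(CREATED))
    print("looked for:", group_by_interval(LOOKED_FOR))
```

On the IDs above, this groups the created tasks under 2014-12-17T05:00:00.000Z and the tasks being looked for under 2014-12-17T04:00:00.000Z, so the two sets belong to different hourly intervals.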

Can you please help us with this? We are not able to send data to Druid from Storm because of this issue.

Is this a 


Thanks,
Ravi

ravi teja

Dec 17, 2014, 1:21:12 AM
to druid-de...@googlegroups.com
The Storm version being used is apache-storm-0.9.2-incubating, not the Metamarkets-patched Storm.

Thanks,
Ravi

Nishant Bangarwa

Dec 17, 2014, 10:29:23 AM
to druid-de...@googlegroups.com
Hi Ravi, 

How many middle managers are running, what is the configured capacity on them, and how many concurrent tasks are running on the overlord?
I wonder if the task is in the pending state because no capacity is available on the middle managers in the cluster.
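
As a back-of-the-envelope check of that capacity question, the number of task slots Tranquility needs can be estimated from the partition and replication settings. The sketch below is an estimate under stated assumptions, not Druid code: the function names are hypothetical, and `overlapping_intervals=2` assumes that the previous hour's tasks are still running when the next hour's tasks start, as the beam data earlier in the thread (04:00 and 05:00 beams listed together) suggests.

```python
def required_task_slots(partitions, replicants, overlapping_intervals=2):
    """Estimate how many worker slots the realtime tasks need at once.

    overlapping_intervals is an assumption: tasks for the previous segment
    interval may still be running while tasks for the next one start.
    """
    return partitions * replicants * overlapping_intervals


def has_capacity(worker_capacities, running_tasks, needed):
    """Return True if the free slots across all middle managers cover `needed`."""
    free_slots = sum(worker_capacities) - running_tasks
    return free_slots >= needed


if __name__ == "__main__":
    needed = required_task_slots(partitions=1, replicants=2)
    print("slots needed:", needed)
    # Hypothetical cluster: one middle manager with 3 slots, 1 already in use.
    print("enough capacity:", has_capacity([3], running_tasks=1, needed=needed))
```

For partitions = 1 and replication = 2 this comes to 4 slots under the overlap assumption.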


--
You received this message because you are subscribed to the Google Groups "Druid Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/0e24db42-7131-4268-96fa-cd7eb3eedc32%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



ravi teja

Dec 17, 2014, 12:00:34 PM
to druid-de...@googlegroups.com
Hi Nishant,

Currently, to narrow down this issue, we are running only one middle manager.
It has a capacity of 50 and only one task is running.

I am not sure why it's waiting even though capacity is available.

Thanks,
Ravi



Nishant Bangarwa

Dec 17, 2014, 12:31:30 PM
to druid-de...@googlegroups.com
Hi Ravi, 

Since you are running with a replication factor of 2, you will need at least 2 middle managers to replicate your data across different machines.
When you run with a replication factor in Tranquility, the generated tasks are assigned to the same availabilityGroup to ensure that the replicated tasks run on separate middle managers.
Can you try adding another middle manager? This will allow both tasks to run on different machines.

You can also look at http://druid.io/docs/latest/Tasks.html for availabilityGroup.
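
The effect of that constraint can be illustrated with a toy model. This is not the overlord's scheduling code; `MiddleManager`, `assign` and the group name `example-group` are hypothetical. It only encodes the rule described above: two tasks in the same availabilityGroup are never placed on the same middle manager.

```python
from dataclasses import dataclass, field


@dataclass
class MiddleManager:
    name: str
    capacity: int
    running: list = field(default_factory=list)
    groups: set = field(default_factory=set)


def assign(tasks, managers):
    """Place (task_id, availability_group) pairs on managers.

    A manager accepts a task only if it has a free slot and is not already
    running a task from the same availability group.
    Returns (running, pending): a dict of task_id -> manager name, and a
    list of task ids that could not be placed.
    """
    running, pending = {}, []
    for task_id, group in tasks:
        for manager in managers:
            has_slot = len(manager.running) < manager.capacity
            if has_slot and group not in manager.groups:
                manager.running.append(task_id)
                manager.groups.add(group)
                running[task_id] = manager.name
                break
        else:
            # No manager satisfied both conditions.
            pending.append(task_id)
    return running, pending


if __name__ == "__main__":
    replicas = [("replica_0", "example-group"), ("replica_1", "example-group")]
    print(assign(replicas, [MiddleManager("mm1", 50)]))
    print(assign(replicas, [MiddleManager("mm1", 50), MiddleManager("mm2", 50)]))
```

With one middle manager the second replica stays pending even though 49 slots are free, which matches the running/pending pair reported in the first message; with two middle managers both replicas run.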



ravi teja

Dec 18, 2014, 6:24:39 AM
to druid-de...@googlegroups.com
Hi Nishant,

I have started another realtime node, and this time there are no pending tasks either.
We still get the same timeout exceptions.


The same problem continues even if I set replication to 1.

Thanks,
Ravi

Gian Merlino

Dec 18, 2014, 8:23:00 PM
to druid-de...@googlegroups.com
Do you have druid.indexer.task.chathandler.type=announce set in your middle manager properties? This is necessary to enable announcing of tasks in service discovery, which Tranquility needs in order to find them. You should see something with the phrase "Announcing service" in your task logs, which should look like this:

Announcing service[DruidNode{serviceName='druid:firehose:offer_impressions_test_3-04-0000-0001', host='a.b.c.d', port=xxxx}]
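
One way to check for those lines is to scan a task log for the announcement and list the service names it contains. A minimal sketch, assuming the log line format shown above; `announced_services` is a hypothetical helper, and where the task logs live depends on your setup.

```python
import re

# Matches the announcement line format quoted above and captures serviceName.
ANNOUNCE = re.compile(
    r"Announcing service\[DruidNode\{serviceName='(?P<service>[^']+)'"
)


def announced_services(log_lines):
    """Return the service names announced in the given task log lines."""
    services = []
    for line in log_lines:
        match = ANNOUNCE.search(line)
        if match is not None:
            services.append(match.group("service"))
    return services


if __name__ == "__main__":
    sample = [
        "some unrelated log line",
        "Announcing service[DruidNode{serviceName="
        "'druid:firehose:offer_impressions_test_3-04-0000-0001', "
        "host='a.b.c.d', port=xxxx}]",
    ]
    print(announced_services(sample))
```

If the list is empty for a running task, the task is not announcing itself, and the timeout alert's target (for example druid:firehose:offer_impressions_test_3-04-0000-0001) has nothing to resolve to.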