Druid ingestion task failing randomly


Anuj

Aug 24, 2017, 5:44:20 AM
to Druid Development
Hi,

We are using Druid 0.9.2, and while running a batch ingestion task we get the exception below:

2017-08-17T13:29:20,084 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1501261194059_188051_m_000797_0, Status : FAILED
Error: java.lang.RuntimeException: native lz4 library not available
        at org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
        at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:114)
        at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:97)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1609)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1489)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


Because of this, tasks are failing randomly.

I have not been able to work out what the cause could be.

Anuj

Sep 13, 2017, 7:54:13 AM
to Druid Development
Hi,

Does anyone have any ideas about this?

Sai Chandu

Apr 9, 2018, 5:32:10 AM
to Druid Development
Hi Anuj,

Did you find a solution for this error?

For me, Tranquility keeps hitting the error below and I don't know why. Sometimes the data is pushed to Druid and sometimes it is not; the tasks are unable to hand off properly to the historical nodes.

Can you tell me what the main causes of this error are?

Emitting alert: [anomaly] Failed to propagate events: druid:overlord/spyagent
{
  "eventCount" : 1,
  "timestamp" : "2018-04-09T09:13:00.000Z",
  "beams" : "MergingPartitioningBeam(DruidBeam(interval = 2018-04-09T09:13:00.000Z/2018-04-09T09:14:00.000Z, partition = 0, tasks = [index_realtime_spyagent_2018-04-09T09:13:00.000Z_0_0/spyagent-013-0000-0000]))"
}
com.twitter.finagle.NoBrokersAvailableException: No hosts are available for disco!firehose:druid:overlord:spyagent-013-0000-0000, Dtab.base=[], Dtab.local=[]

Gian Merlino

Apr 9, 2018, 1:38:21 PM
to druid-de...@googlegroups.com
Hi Sai,

It could be because your realtime tasks are not able to start up. Try verifying that handoff is working (tasks should exit promptly). If it isn't, check here for troubleshooting tips: https://github.com/druid-io/tranquility/blob/master/docs/trouble.md

Btw, this message is a better fit for the users list: druid...@googlegroups.com, so if you need to follow up then please use that list rather than this thread. Thanks!

Gian
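
One way to check that tasks exit promptly is to compare each running task's age with the lifetime a Tranquility task is expected to have, which is roughly segmentGranularity + windowPeriod plus some time for push and handoff. Below is a minimal Python sketch of that check. It assumes the task list comes from the overlord's running-tasks endpoint (/druid/indexer/v1/runningTasks) and that each entry carries "id" and "createdTime" fields. The one-minute segment granularity matches the beam interval in the alert above; the window period, grace period, overlord URL, and sample creation times are hypothetical values chosen for illustration.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen


def parse_iso(ts):
    """Parse a UTC timestamp such as 2018-04-09T09:13:00.000Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def fetch_running_tasks(overlord_url):
    """Fetch running tasks from the overlord (assumed endpoint and JSON shape)."""
    with urlopen(overlord_url + "/druid/indexer/v1/runningTasks") as resp:
        return json.load(resp)


def overdue_tasks(tasks, now, segment_granularity, window_period, grace):
    """Return (task id, age) for tasks older than their expected lifetime."""
    limit = segment_granularity + window_period + grace
    overdue = []
    for task in tasks:
        age = now - parse_iso(task["createdTime"])
        if age > limit:
            overdue.append((task["id"], age))
    return overdue


if __name__ == "__main__":
    # Hypothetical sample data; in practice use
    # fetch_running_tasks("http://overlord-host:8090") with your own host.
    sample = [
        {"id": "index_realtime_spyagent_2018-04-09T09:13:00.000Z_0_0",
         "createdTime": "2018-04-09T09:13:00.000Z"},
        {"id": "index_realtime_spyagent_2018-04-09T09:40:00.000Z_0_0",
         "createdTime": "2018-04-09T09:40:00.000Z"},
    ]
    now = datetime(2018, 4, 9, 9, 45, tzinfo=timezone.utc)
    for task_id, age in overdue_tasks(
        sample,
        now,
        segment_granularity=timedelta(minutes=1),
        window_period=timedelta(minutes=10),  # assumed windowPeriod
        grace=timedelta(minutes=5),           # assumed push/handoff allowance
    ):
        print(task_id, "has been running for", age)
```

Tasks that show up here on every run are candidates for a handoff problem, and the troubleshooting page linked above is the place to continue from.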
