Historicals not able to load shards of some segments


Asra Yousuf

Feb 16, 2017, 4:05:16 AM
to Druid User
Hello,

I am currently facing an issue where certain historicals are not able to load particular shards of given segments. This is happening only with a certain set of segments, while the others are loading fine. The corresponding shards of the affected segments are present in my deep storage.
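To rule out Druid itself, I could try reading the same file directly with the Hadoop client from one of the failing historicals. Here is a minimal sketch (my assumption: it is run with the same Hadoop client jars and *-site.xml config files the historical uses on its classpath; the path is the loadSpec path from the coordinator log below):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Reads the shard's index.zip straight from HDFS, exercising the same
// getBlockLocations/open code path as HdfsDataSegmentPuller, but without Druid.
public class SegmentReadCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
    Path segment = new Path(
        "hdfs://druid/deepstorage/druid_ingest/20170207T200000.000Z_20170207T210000.000Z/2017-02-08T04_45_54.230Z/1/index.zip");
    try (FileSystem fs = segment.getFileSystem(conf);
         InputStream in = fs.open(segment)) {
      byte[] buf = new byte[8192];
      long total = 0;
      for (int n; (n = in.read(buf)) != -1; ) {
        total += n;
      }
      System.out.println("Read " + total + " bytes successfully");
    }
  }
}

If this fails with the same IllegalStateException, the problem would be between the Hadoop client on that historical and the namenode rather than in Druid.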

The logs on the coordinator read:

[ERROR] 2017-02-16 03:22:11.014 [Master-PeonExec--0] LoadQueuePeon - Server[/druid/loadQueue/ccg22history035623.ccg22.com:8083], throwable caught when submitting [SegmentChangeRequestLoad{segment=DataSegment{size=530025378, shardSpec=HashBasedNumberedShardSpec{partitionNum=1, partitions=5, partitionDimensions=[]}, metrics=[records, pageviews, visits, entrypage, exitpage, clicks, bounces, tpage_sum], dimensions=[page_group, page_name, pagegroup_link_name, page_link_name], version='2017-02-08T04:45:54.230Z', loadSpec={type=hdfs, path=hdfs://druid/deepstorage/druid_ingest/20170207T200000.000Z_20170207T210000.000Z/2017-02-08T04_45_54.230Z/1/index.zip}, interval=2017-02-07T20:00:00.000Z/2017-02-07T21:00:00.000Z, dataSource='druid_ingest', binaryVersion='9'}}].


On the historical, the error message is as follows:

[ERROR] 2017-02-16 07:06:35.179 [ZkCoordinator-0] ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[druid_ingest_2017-02-07T20:00:00.000Z_2017-02-07T21:00:00.000Z_2017-02-08T04:45:54.230Z_1], segment=DataSegment{size=530025378, shardSpec=HashBasedNumberedShardSpec{partitionNum=1, partitions=5, partitionDimensions=[]}, metrics=[records, pageviews, visits, entrypage, exitpage, clicks, bounces, tpage_sum], dimensions=[page_group, page_name, pagegroup_link_name, page_link_name], version='2017-02-08T04:45:54.230Z', loadSpec={type=hdfs, path=hdfs://druid_ingest/20170207T200000.000Z_20170207T210000.000Z/2017-02-08T04_45_54.230Z/1/index.zip}, interval=2017-02-07T20:00:00.000Z/2017-02-07T21:00:00.000Z, dataSource='druid_ingest', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[druid_ingest_2017-02-07T20:00:00.000Z_2017-02-07T21:00:00.000Z_2017-02-08T04:45:54.230Z_1]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:310) ~[druid-server-0.9.2.jar:0.9.2]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:351) [druid-server-0.9.2.jar:0.9.2]
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.2.jar:0.9.2]
        at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:153) [druid-server-0.9.2.jar:0.9.2]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.11.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.11.0.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.11.0.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.11.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:513) [curator-recipes-2.11.0.jar:?]
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.11.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773) [curator-recipes-2.11.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
Caused by: java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:161) ~[guava-16.0.1.jar:?]
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:636) ~[?:?]
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1062) ~[?:?]
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1080) ~[?:?]
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:208) ~[?:?]
        at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_73]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_73]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[?:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[?:?]
        at com.sun.proxy.$Proxy60.getBlockLocations(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1131) ~[?:?]
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1121) ~[?:?]
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1111) ~[?:?]
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:272) ~[?:?]
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:239) ~[?:?]
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:232) ~[?:?]
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1279) ~[?:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:296) ~[?:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:292) ~[?:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[?:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:292) ~[?:?]
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller$1.openInputStream(HdfsDataSegmentPuller.java:107) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:298) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:241) ~[?:?]
        at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:138) ~[java-util-0.27.10.jar:?]
        at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:134) ~[java-util-0.27.10.jar:?]
        at com.metamx.common.RetryUtils.retry(RetryUtils.java:60) ~[java-util-0.27.10.jar:?]
        at com.metamx.common.RetryUtils.retry(RetryUtils.java:78) ~[java-util-0.27.10.jar:?]
        at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:132) ~[java-util-0.27.10.jar:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:235) ~[?:?]
        at io.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:62) ~[?:?]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) ~[druid-server-0.9.2.jar:0.9.2]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.2.jar:0.9.2]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.2.jar:0.9.2]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:306) ~[druid-server-0.9.2.jar:0.9.2]



Can you help me decipher the error message so I can figure out what might be going wrong?

Regards,

Asra

Slim Bouguerra

Feb 16, 2017, 11:12:27 AM
to druid...@googlegroups.com
This seems to be an issue on the Hadoop HDFS side. Which Hadoop version is this?
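One quick thing to verify is which Hadoop client version the historical JVM actually picks up, since it can differ from the cluster's version. A rough sketch (run it with the same classpath the historical process uses, then compare against what the namenode reports):

import org.apache.hadoop.util.VersionInfo;

// Prints the Hadoop client version found on the classpath.
public class HadoopClientVersion {
  public static void main(String[] args) {
    System.out.println("Hadoop client version: " + VersionInfo.getVersion());
    System.out.println("Built from revision:   " + VersionInfo.getRevision());
  }
}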
-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______


Asra Yousuf

Feb 16, 2017, 11:19:57 AM
to druid...@googlegroups.com
We are currently using Hadoop 2.6.0.2.2.9.0-3393.


Slim Bouguerra

Feb 16, 2017, 11:43:10 PM
to druid...@googlegroups.com
I am afraid this is a configuration issue; I am not sure how to reproduce it.
Can you check what makes some nodes fail while others don't? Maybe they have different Hadoop config files.
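For example, you could dump the effective HDFS settings as each node sees them and diff the output between a healthy historical and a failing one. A rough sketch (it assumes each node's own Hadoop config directory is on the classpath when it runs):

import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;

// Dumps the effective Hadoop configuration, sorted by key, so the output
// from two nodes can be compared with a plain diff.
public class DumpHadoopConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration(); // loads core-site.xml / hdfs-site.xml
    Map<String, String> sorted = new TreeMap<>();
    for (Map.Entry<String, String> e : conf) {
      sorted.put(e.getKey(), e.getValue());
    }
    sorted.forEach((k, v) -> System.out.println(k + "=" + v));
  }
}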

-- 

B-Slim
_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______
