Deleted a segment from HDFS directly


Amol Purohit

Nov 19, 2015, 6:17:06 PM
to Druid User
I deleted a segment directly from our HDFS. Now my historical node won't start up; it looks like it is still trying to load the missing segment. I also went into MySQL and deleted the record for that segment, but the historical node keeps trying to load it. How do I recover from this?
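The MySQL change I made was roughly the following (a sketch only; the druid database name is an assumption, and in hindsight marking the segment unused with used = 0 may be safer than deleting the row outright):

    # Roughly what was run -- delete the segment's metadata row
    # (database/user names are assumptions):
    mysql -u druid -p druid -e "DELETE FROM druid_segments \
      WHERE id = 'wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z';"
    # Safer alternative: mark it unused instead of deleting the row:
    #   UPDATE druid_segments SET used = 0 WHERE id = '<segment id>';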

Bingkun Guo

Nov 19, 2015, 6:30:13 PM
to Druid User
Hi Amol,

Could you please provide some logs? Did you also clean up the historical node's cache directory?

Amol Purohit

Nov 20, 2015, 1:11:33 PM
to Druid User
Hi Bingkun,

Where is this historical cache directory located?

Here is the log from the historical node when it tries to load the segment handed off by realtime.


2015-11-20T17:56:21,303 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z], segment=DataSegment{size=9852, shardSpec=NoneShardSpec, metrics=[count, added, deleted, delta], dimensions=[anonymous, city, continent, country, language, namespace, newPage, page, region, robot, unpatrolled, user], version='2015-11-13T19:23:45.354Z', loadSpec={type=hdfs, path=/AAA/druid/segments/wikipedia/20130831T000000.000Z_20130901T000000.000Z/2015-11-13T19_23_45.354Z/0/index.zip}, interval=2013-08-31T00:00:00.000Z/2013-09-01T00:00:00.000Z, dataSource='wikipedia', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:146) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [?:1.7.0_05]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [?:1.7.0_05]
        at java.lang.Thread.run(Thread.java:722) [?:1.7.0_05]
Caused by: io.druid.segment.loading.SegmentLoadingException: /tmp/druid/indexCache/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2015-11-13T19:23:45.354Z/0/index.drd (No such file or directory)
        at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:40) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:94) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        ... 20 more
Caused by: java.io.FileNotFoundException: /tmp/druid/indexCache/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z/2015-11-13T19:23:45.354Z/0/index.drd (No such file or directory)
        at java.io.FileInputStream.open(Native Method) ~[?:1.7.0_05]
        at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.7.0_05]
        at io.druid.segment.SegmentUtils.getVersionFromDir(SegmentUtils.java:24) ~[druid-api-0.3.9.jar:0.8.1-iap2]
        at io.druid.segment.IndexIO.loadIndex(IndexIO.java:165) ~[druid-processing-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:37) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:94) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        ... 20 more
2015-11-20T17:56:21,304 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/lpdbd0036.phx.aexp.com:8083/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z] was removed
2015-11-20T17:56:51,267 INFO [ZkCoordinator-Exec--0] io.druid.server.coordination.ServerManager - Told to delete a queryable for a dataSource[wikipedia] that doesn't exist.
2015-11-20T17:56:51,267 WARN [ZkCoordinator-Exec--0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[/tmp/druid/indexCache/info_dir/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z]
2015-11-20T17:57:21,282 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - New request[LOAD: wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z] with zNode[/druid/loadQueue/lpdbd0036.phx.aexp.com:8083/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z].
2015-11-20T17:57:21,282 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Loading segment wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z
2015-11-20T17:57:21,285 WARN [ZkCoordinator-0] com.metamx.common.RetryUtils - Failed on try 1, retrying in 2,044ms.
java.io.FileNotFoundException: File /AAA/druid/segments/wikipedia/20130831T000000.000Z_20130901T000000.000Z/2015-11-13T19_23_45.354Z/0/index.zip does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) ~[?:?]
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:722) ~[?:?]
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) ~[?:?]
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) ~[?:?]
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137) ~[?:?]
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) ~[?:?]
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller$1.openInputStream(HdfsDataSegmentPuller.java:108) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:299) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:242) ~[?:?]
        at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:136) ~[java-util-0.27.0.jar:?]
        at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:132) ~[java-util-0.27.0.jar:?]
        at com.metamx.common.RetryUtils.retry(RetryUtils.java:38) [java-util-0.27.0.jar:?]
        at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:130) [java-util-0.27.0.jar:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:236) [druid-hdfs-storage-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:59) [druid-hdfs-storage-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:141) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) [druid-server-0.8.1-iap2.jar:0.8.1-iap2]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_05]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [?:1.7.0_05]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [?:1.7.0_05]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [?:1.7.0_05]

Bingkun Guo

Nov 20, 2015, 1:50:26 PM
to Druid User
Hi Amol,

Are you sure the segment "wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z" has actually been removed from MySQL?
It's also possible that your ZooKeeper has stale information under "/druid/loadQueue/"; can you try restarting ZooKeeper?
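If you want to clear just the stale load request rather than restarting ZooKeeper entirely, something like this should work (a sketch using ZooKeeper's zkCli.sh; the localhost:2181 address and /druid base path are assumptions, and the node path below is taken from your log):

    # Connect to ZooKeeper and remove the stale load-queue entry (sketch):
    zkCli.sh -server localhost:2181
    ls /druid/loadQueue/lpdbd0036.phx.aexp.com:8083
    rmr /druid/loadQueue/lpdbd0036.phx.aexp.com:8083/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z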

From your log, your cache directory is "/tmp/druid/indexCache/" (with per-segment info files under "info_dir/"); more details about the historical cache can be found at http://druid.io/docs/latest/configuration/historical.html#storing-segments.
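Clearing the cached copy while the historical is stopped would look roughly like this (a sketch; paths are taken from your log, so adjust to whatever druid.segmentCache.locations points at on your node):

    # Stop the historical process first, then remove the cached segment files
    # and the per-segment info file the node replays at startup:
    rm -rf /tmp/druid/indexCache/wikipedia/2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z
    rm -f  /tmp/druid/indexCache/info_dir/wikipedia_2013-08-31T00:00:00.000Z_2013-09-01T00:00:00.000Z_2015-11-13T19:23:45.354Z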

Amol Purohit

Dec 2, 2015, 12:41:31 PM
to Druid User
Thanks. Clearing the local cache and removing the stale nodes from ZooKeeper worked for us.

Fangjin Yang

Dec 4, 2015, 2:34:25 AM
to Druid User
If you run the indexing service, you can also look into kill tasks to automate hard-deleting segments from HDFS.
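A kill task only deletes segments that are already marked unused in the metadata store, and the spec is small. A sketch for the 0.8.x indexing service (the overlord host and port are assumptions):

    # Submit a kill task to the overlord to hard-delete unused segments
    # in the given interval (sketch; overlord-host:8090 is an assumption):
    cat > kill_task.json <<'EOF'
    {
      "type": "kill",
      "dataSource": "wikipedia",
      "interval": "2013-08-31/2013-09-01"
    }
    EOF
    curl -X POST -H 'Content-Type: application/json' \
         -d @kill_task.json http://overlord-host:8090/druid/indexer/v1/task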
