Error on Historical nodes, missing index.drd file


Maurizio Gallelli

Feb 20, 2017, 4:08:46 AM
to Druid User
Hi,
I've been seeing strange behavior on the Historical nodes over the last few days: I'm getting exceptions about a missing index.drd file.
My Druid version is 0.9.1.1.

Can someone help me understand this error and how to solve it?

Here some logs:

2017-02-19 22:38:09,884 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] Completed request [LOAD: buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]
2017-02-19 22:38:09,885 ERROR i.d.s.c.ZkCoordinator [ZkCoordinator-0] Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00], segment=DataSegment{size=643323273, shardSpec=LinearShardSpec{partitionNum=0}, metrics=[count, rtb_count, bid_response, wins, clicks, clicks_global, conversion, clickfraud_count, conversionfraud_count, bid_spent, min_bid, max_bid, min_win, events, bid_time_total, max_bid_time, min_bid_time, max_win, win_spent, company_spent, advertiser_spent, max_advertiser_bag, total_advertiser_bad, total_advertiser_transaction, revenue, peer39_price, track_revenue], dimensions=[account_executive, account_manager, account_onboarding, account_sales, advertiser_bag, advertiser_id, advertiser_price, advertiser_transaction, app_bundle, banner_size, bid_floor, bid_price, bid_time, browser_id, browser_version_id, campaign_id, campaign_type, carrier_id, categories, city_id, company_id, company_price, creative_id, country_id, currency, conversion_type, device_make_id, device_model_id, device_os_id, device_osv_id, device_type_id, domain, event_id, exchange_id, native_type, offer_id, optimization_id, optimization_status, place_id, place_name, pub_id, pub_name, publisher_id, publisher_src, publisher_src1, publisher_src2, publisher_src3, publisher_src4, publisher_src5, supply_type, region_id, rotator_id, rtb, sec_counter, smartlink_id, url_id, win_price], version='2017-02-19T20:00:00.000-05:00', loadSpec={type=local, path=/s3/dataStorage/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/index.zip}, interval=2017-02-19T20:00:00.000-05:00/2017-02-19T21:00:00.000-05:00, dataSource='buck_bidding', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:309) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.1.1.jar:0.9.1.1]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.10.0.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:514) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.10.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:772) [curator-recipes-2.10.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
Caused by: io.druid.segment.loading.SegmentLoadingException: /druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/index.drd (No such file or directory)
        at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:52) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:96) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        ... 18 more
Caused by: java.io.FileNotFoundException: /druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/index.drd (No such file or directory)
        at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_73]
        at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_73]
        at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_73]
        at io.druid.segment.SegmentUtils.getVersionFromDir(SegmentUtils.java:43) ~[druid-api-0.9.1.1.jar:0.9.1.1]
        at io.druid.segment.IndexIO.loadIndex(IndexIO.java:211) ~[druid-processing-0.9.1.1.jar:0.9.1.1]
        at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:49) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:96) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) ~[druid-server-0.9.1.1.jar:0.9.1.1]
        ... 18 more
2017-02-19 22:38:09,885 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] zNode[/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] was removed
2017-02-19 22:38:39,882 WARN i.d.s.c.ZkCoordinator [ZkCoordinator-Exec--0] Unable to delete segmentInfoCacheFile[/druid/historical/indexCache/info_dir/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]

Then after some minutes:
2017-02-19 22:42:44,743 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Reading reply sessionid:0x45870066b440773, packet:: clientPath:/druid-aws/loadQueue/10.80.4.55:8081 serverPath:/druid-aws/loadQueue/10.80.4.55:8081 finished:false header:: 3470,12  replyHeader:: 3470,107378822171,0  request:: '/druid-aws/loadQueue/10.80.4.55:8081,T  response:: v{'buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00},s{107375333209,107375333209,1482155297346,1482155297346,0,71969,0,0,0,1,107378822171} 
2017-02-19 22:42:44,744 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Reading reply sessionid:0x45870066b440773, packet:: clientPath:/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00 serverPath:/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00 finished:false header:: 3471,4  replyHeader:: 3471,107378822171,0  request:: '/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00,T  response:: #7b22616374696f6e223a226c6f6164222c2264617461536f75726365223a226275636b5f62696464696e67222c22696e74657276616c223a22323031372d30322d31395432303a30303a30302e3030302d30353a30302f323031372d30322d31395432313a30303a30302e3030302d30353a3030222c2276657273696f6e223a22323031372d30322d31395432303a30303a30302e3030302d30353a3030222c226c6f616453706563223a7b2274797065223a226c6f63616c222c2270617468223a222f73332f6461746153746f726167652f6275636b5f62696464696e672f323031372d30322d31395432303a30303a30302e3030302d30353a30305f323031372d30322d31395432313a30303a30302e3030302d30353a30302f323031372d30322d31395432303a30303a30302e3030302d30353a30302f302f696e6465782e7a6970227d2c2264696d656e73696f6e73223a226163636f756e745f6578656375746976652c6163636f756e745f6d616e616765722c6163636f756e745f6f6e626f617264696e672c6163636f756e745f73616c65732c616476657274697365725f6261672c616476657274697365725f69642c616476657274697365725f70726963652c616476657274697365725f7472616e73616374696f6e2c6170705f62756e646c652c62616e6e65725f73697a652c6269645f666c6f6f722c6269645f70726963652c6269645f74696d652c62726f777365725f69642c62726f777365725f76657273696f6e5f69642c63616d706169676e5f69642c63616d706169676e5f747970652c636172726965725f69642c63617465676f726965732c636974795f69642c636f6d70616e795f69642c636f6d70616e795f70726963652c63726561746976655f69642c636f756e747279
5f69642c63757272656e63792c636f6e76657273696f6e5f747970652c6465766963655f6d616b655f69642c6465766963655f6d6f64656c5f69642c6465766963655f6f735f69642c6465766963655f6f73765f69642c6465766963655f747970655f69642c646f6d61696e2c6576656e745f69642c65786368616e67655f69642c6e61746976655f747970652c6f666665725f69642c6f7074696d697a6174696f6e5f69642c6f7074696d697a6174696f6e5f7374617475732c706c6163655f69642c706c6163655f6e616d652c7075625f69642c7075625f6e616d652c7075626c69736865725f69642c7075626c69736865725f7372632c7075626c69736865725f737263312c7075626c69736865725f737263322c7075626c69736865725f737263332c7075626c69736865725f737263342c7075626c69736865725f737263352c737570706c795f747970652c726567696f6e5f69642c726f7461746f725f69642c7274622c7365635f636f756e7465722c736d6172746c696e6b5f69642c75726c5f69642c77696e5f7072696365222c226d657472696373223a22636f756e742c7274625f636f756e742c6269645f726573706f6e73652c77696e732c636c69636b732c636c69636b735f676c6f62616c2c636f6e76657273696f6e2c636c69636b66726175645f636f756e742c636f6e76657273696f6e66726175645f636f756e742c6269645f7370656e742c6d696e5f6269642c6d61785f6269642c6d696e5f77696e2c6576656e74732c6269645f74696d655f746f74616c2c6d61785f6269645f74696d652c6d696e5f6269645f74696d652c6d61785f77696e2c77696e5f7370656e742c636f6d70616e795f7370656e742c616476657274697365725f7370656e742c6d61785f616476657274697365725f6261672c746f74616c5f616476657274697365725f6261642c746f74616c5f616476657274697365725f7472616e73616374696f6e2c726576656e75652c7065657233395f70726963652c747261636b5f726576656e7565222c22736861726453706563223a7b2274797065223a226c696e656172222c22706172746974696f6e4e756d223a307d2c2262696e61727956657273696f6e223a392c2273697a65223a3634333332333237332c226964656e746966696572223a226275636b5f62696464696e675f323031372d30322d31395432303a30303a30302e3030302d30353a30305f323031372d30322d31395432313a30303a30302e3030302d30353a30305f323031372d30322d31395432303a30303a30302e3030302d30353a3030227d,s{107378822171,107378822171,1487562164742,1487562164742,0,0,0,457238547567738880,1617
,0,107378822171} 
2017-02-19 22:42:44,744 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] New request[LOAD: buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] with zNode[/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00].
2017-02-19 22:42:44,744 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] Loading segment buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00
2017-02-19 22:42:54,905 WARN i.d.s.l.SegmentLoaderLocalCacheManager [ZkCoordinator-0] Segment [buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] is different than expected size. Expected [643323273] found [642701534]
2017-02-19 22:42:54,908 INFO i.d.s.c.BatchDataSegmentAnnouncer [ZkCoordinator-0] Announcing segment[buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] at existing path[/druid-aws/segments/10.80.4.55:8081/10.80.4.55:8081_historical__default_tier_2017-01-10T10:36:21.934-05:00_5a9a3cf46507491bbf5b3a5ddadb99ca489]
2017-02-19 22:42:54,913 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Got WatchedEvent state:SyncConnected type:NodeDeleted path:/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00 for sessionid 0x45870066b440773
2017-02-19 22:42:54,913 DEBUG o.a.z.ClientCnxn [ZkCoordinator-0-SendThread(10.80.4.69:2181)] Reading reply sessionid:0x45870066b440773, packet:: clientPath:null serverPath:null finished:false header:: 3473,2  replyHeader:: 3473,107378822620,0  request:: '/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00,-1  response:: null
2017-02-19 22:42:54,913 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] Completed request [LOAD: buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00]
2017-02-19 22:42:54,913 INFO i.d.s.c.ZkCoordinator [ZkCoordinator-0] zNode[/druid-aws/loadQueue/10.80.4.55:8081/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] was removed


and the folder contains:
[root@aws-druid-history18 logs]# ls -la  /druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0/
total 627656
drwxr-xr-x 2 root root      4096 Feb 19 22:42 .
drwxr-xr-x 3 root root      4096 Feb 19 22:42 ..
-rw-r--r-- 1 root root 642698582 Feb 19 22:42 00000.smoosh
-rw-r--r-- 1 root root      2948 Feb 19 22:42 meta.smoosh
-rw-r--r-- 1 root root         4 Feb 19 22:42 version.bin
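As a quick sanity check before the next load attempt, one minimal sketch (paths and expected size taken from the logs above; adjust for your deployment) is to compare the unpacked segment's on-disk size against the size Druid recorded in the load spec:

```shell
# Compare the unpacked segment cache directory's total size against the
# expected segment size from the DataSegment metadata in the ERROR log.
seg_dir="/druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0"
expected=643323273
# du -sb (GNU coreutils) prints total bytes; empty if the directory is missing.
actual=$(du -sb "$seg_dir" 2>/dev/null | cut -f1)
if [ "${actual:-0}" -ne "$expected" ]; then
  echo "size mismatch: expected $expected bytes, found ${actual:-nothing}"
fi
```

A mismatch here (as in the WARN below from SegmentLoaderLocalCacheManager) points at a bad copy in the local cache or in deep storage rather than a Historical-side bug.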


Thanks,
Maurizio

Maurizio Gallelli

Feb 22, 2017, 5:27:50 AM
to Druid User
Hi guys,
any help on this?

Thanks
Maurizio

Nishant Bangarwa

Feb 22, 2017, 2:26:28 PM
to Druid User
Looking at the WARN message in the logs, it looks like your segment got corrupted somehow. Can you reindex the data for this interval and retry?
2017-02-19 22:42:54,905 WARN i.d.s.l.SegmentLoaderLocalCacheManager [ZkCoordinator-0] Segment [buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00] is different than expected size. Expected [643323273] found [642701534]
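In the meantime, one possible recovery sketch, assuming the copy in deep storage is intact (which it may not be here, given the mismatch): drop the Historical's cached copy and its info file so the node re-pulls the segment from deep storage on the coordinator's next LOAD request. Paths are the ones from the logs in this thread; adjust for your deployment, and stop the Historical first to be safe.

```shell
# Remove the (apparently corrupt) local cache copy of the segment...
rm -rf "/druid/historical/indexCache/buck_bidding/2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00/2017-02-19T20:00:00.000-05:00/0"
# ...and the segmentInfoCache entry the node failed to delete in the WARN above,
# so the segment is not re-announced from stale local state on restart.
rm -f "/druid/historical/indexCache/info_dir/buck_bidding_2017-02-19T20:00:00.000-05:00_2017-02-19T21:00:00.000-05:00_2017-02-19T20:00:00.000-05:00"
```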

Maurizio Gallelli

Feb 24, 2017, 3:40:22 AM
to Druid User
Hi Nishant,
thanks for the feedback; I'm trying to reindex the segment.
Could the size mismatch be related to the fact that I have two Realtime nodes doing the same job, ingesting the same datasources and partitions from Kafka at the same time?
I've been running this configuration for two years and never received any kind of error. A few weeks ago I updated to version 0.9.1.1.
In the realtime spec file, what I have for each datasource is:

"shardSpec": {
    "type": "linear",
    "partitionNum": 0
}
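That configuration could well matter: with the same linear shardSpec, the same partitionNum, and the same version, both Realtime nodes publish under the same segment identifier, so whichever node pushes last can overwrite (or race with) the other's index.zip in deep storage — which would explain the size mismatch. One hedged sketch, if the two nodes are meant to contribute data rather than act as exact replicas, is to give each node its own partitionNum (fragments below are illustrative; verify against your own spec files):

```json
"shardSpec": {
    "type": "linear",
    "partitionNum": 0
}
```

on the first Realtime node, and

```json
"shardSpec": {
    "type": "linear",
    "partitionNum": 1
}
```

on the second, so their segments get distinct identifiers and distinct deep-storage paths.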

Thanks
Maurizio