insert-segment-to-db generated loadSpec path is relative, segments not loaded by druid


Ashish Awasthi

Aug 16, 2016, 11:55:48 PM
to Druid User
Hi

We use Imply 1.0.2.

We populated the PostgreSQL segments table using insert-segment-to-db. The difference we see from the earlier metadata in the segments table is that the loadSpec (after hex decoding) changed from

"loadSpec":{"type":"hdfs","path":"hdfs://master-7e09d585.node.rsa:9000/data/druid/segments/.../0/index.zip"}
to
"loadSpec":{"type":"hdfs","path":"/data/druid/segments/.../0/index.zip"}
 
When Druid started after this, it failed with the error:

java.io.FileNotFoundException: File /data/druid/segments/datasource1/20160816T083000.000Z_20160816T084500.000Z/2016-08-16T09_06_16.511Z/0/index.zip does not exist


Is there a way we can either generate the full HDFS URL from the tool, or make Druid load segments from the generated relative URL?


The command used to insert segments was:
java \
-Ddruid.metadata.storage.type=postgresql \
-Ddruid.metadata.storage.connector.connectURI=jdbc\:postgresql\://$POSTGRESQL_HOST\:$POSTGRESQL_PORT/druid \
-Ddruid.metadata.storage.connector.user=$DRUID_USER \
-Ddruid.metadata.storage.connector.password=$DRUID_PASSWORD \
-Ddruid.extensions.loadList=[\"postgresql-metadata-storage\",\"druid-hdfs-storage\"] \
-Ddruid.storage.type=hdfs \
-classpath '/root/imply-1.2.0/dist/druid/lib/*' \
io.druid.cli.Main tools insert-segment-to-db --workingDir hdfs://$HDFS_HOST:$HDFS_PORT//data/druid/segments



Thanks 
Ashish

Ashish Awasthi

Aug 17, 2016, 3:30:47 AM
to Druid User
Following is the full stack trace. It looks like Druid is looking for the file in the local file system and not in HDFS:

java.io.FileNotFoundException: File /data/druid/segments/sessionsSummary/20160816T084500.000Z_20160816T090000.000Z/2016-08-16T09_06_18.360Z/0/index.zip does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) ~[?:?]
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:722) ~[?:?]
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) ~[?:?]
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) ~[?:?]
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137) ~[?:?]
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) ~[?:?]
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:765) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller$1.openInputStream(HdfsDataSegmentPuller.java:107) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:298) ~[?:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:241) ~[?:?]
        at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:138) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.CompressionUtils$1.call(CompressionUtils.java:134) ~[java-util-0.27.7.jar:?]
        at com.metamx.common.RetryUtils.retry(RetryUtils.java:38) [java-util-0.27.7.jar:?]
        at com.metamx.common.CompressionUtils.unzip(CompressionUtils.java:132) [java-util-0.27.7.jar:?]
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:235) [druid-hdfs-storage-0.9.0.jar:0.9.0]
        at io.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:62) [druid-hdfs-storage-0.9.0.jar:0.9.0]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:143) [druid-server-0.9.0.jar:0.3.16]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) [druid-server-0.9.0.jar:0.3.16]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) [druid-server-0.9.0.jar:0.9.0]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:305) [druid-server-0.9.0.jar:0.9.0]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:350) [druid-server-0.9.0.jar:0.9.0]
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.0.jar:0.9.0]
        at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:152) [druid-server-0.9.0.jar:0.9.0]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:518) [curator-recipes-2.9.1.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:512) [curator-recipes-2.9.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.9.1.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83) [curator-framework-2.9.1.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:509) [curator-recipes-2.9.1.jar:?]
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.9.1.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:766) [curator-recipes-2.9.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_74]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_74]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]

Ashish Awasthi

Aug 17, 2016, 10:03:50 PM
to Druid User
The problem was resolved by adding core-site.xml under conf/druid/_common/, as mentioned in http://druid.io/docs/latest/tutorials/cluster.html.
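
For reference, the fix boils down to a minimal core-site.xml like the following (a sketch, not the exact file we deployed; the value reuses the NameNode address from the loadSpec above):

<configuration>
  <property>
    <!-- Tells Hadoop clients, including Druid's HDFS extension,
         which file system schemeless paths resolve against. -->
    <name>fs.defaultFS</name>
    <value>hdfs://master-7e09d585.node.rsa:9000</value>
  </property>
</configuration>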

It looks like the class io.druid.storage.hdfs.HdfsDataSegmentFinder changes the absolute path to a relative one when the tool executes.

While loading the segments using the relative path, io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles calls path.getFileSystem(config), which (in the absence of "fs.defaultFS" in the Hadoop configuration, inside org.apache.hadoop.fs.FileSystem) assumes it is a file on the local disk.
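
A minimal sketch of that resolution behavior (illustration only, not Druid code; the segment path and HDFS address below are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DefaultFsDemo
{
  public static void main(String[] args) throws Exception
  {
    // A schemeless path, like the one insert-segment-to-db wrote into the loadSpec.
    Path path = new Path("/data/druid/segments/example/index.zip");

    // With no fs.defaultFS configured (and no core-site.xml on the
    // classpath), Hadoop falls back to "file:///", so the path is
    // treated as local -- hence the FileNotFoundException above.
    FileSystem localFs = path.getFileSystem(new Configuration());
    System.out.println(localFs.getUri()); // prints file:///

    // With fs.defaultFS set, as core-site.xml does, the same path
    // resolves against HDFS (needs the HDFS client on the classpath).
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
    FileSystem hdfs = path.getFileSystem(conf);
    System.out.println(hdfs.getUri()); // prints hdfs://namenode:9000
  }
}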

Ashish Awasthi

Aug 17, 2016, 11:57:19 PM
to Druid User
Ideally, the insert-segment-to-db tool should write an absolute URL, to be consistent with the path inserted by indexing.

Also, at read time, if the loadSpec type is "hdfs" and the path is a relative URL, then instead of assuming a local file, Druid could use the druid.storage.storageDirectory path to locate the file, along the lines of the sketch below.
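
Something like this (resolveSegmentPath is a hypothetical helper, not existing Druid code; the storageDirectory value in the comment is a placeholder):

import org.apache.hadoop.fs.Path;

public class SegmentPathResolver
{
  /**
   * Hypothetical read-time fallback: if the loadSpec path has no scheme,
   * resolve it against druid.storage.storageDirectory instead of letting
   * it fall through to the local file system.
   */
  static Path resolveSegmentPath(String loadSpecPath, String storageDirectory)
  {
    Path path = new Path(loadSpecPath);
    if (path.toUri().getScheme() != null) {
      return path; // already a full URL, e.g. hdfs://host:port/...
    }
    // Hadoop's Path resolution takes the scheme and authority from the
    // parent when the child path is root-relative, so resolving
    // "/data/druid/segments/x/index.zip" against a storageDirectory of
    // "hdfs://master:9000/data/druid/segments" yields
    // "hdfs://master:9000/data/druid/segments/x/index.zip".
    return new Path(new Path(storageDirectory), path);
  }
}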

Fangjin Yang

Aug 25, 2016, 5:01:49 PM
to Druid User
Hi Ashish, do you mind helping to improve the documentation for others who use this tool?

Ashish Awasthi

Aug 26, 2016, 5:46:59 AM
to Druid User
Sure, will do that.

Ashish Awasthi

Aug 27, 2016, 12:09:46 PM
to Druid User
Doc update submitted, please check