This is possible using the Hadoop Azure Filesystem implemented and used within MS's HDInsight product: https://hadoop.apache.org/docs/stable2/hadoop-azure/index.html
My use case: I have an HDInsight cluster where data is stored in Azure Storage Blobs, and the cluster is configured to use the wasb protocol (the custom hadoop-azure FileSystem implementation linked above) to read and write data.
I would like to use Presto to query the Hive tables defined in this cluster. I configured the Presto hive-hadoop2 catalog to point at the Hive metastore. Additionally, I copied the following JARs for the custom FileSystem into the <presto install>/plugin/hive-hadoop2 directory:
- azure-storage-4.4.0.jar
- hadoop-azure-2.7.3.jar
I can successfully query the metastore (SHOW DATABASES, SHOW TABLES, etc. all work fine).
However, querying data, e.g.
SELECT * FROM hive.default.hivesampletable;
always returns 0 rows, even though I have verified with the Hive command-line client that the same query does return data.
Any thoughts or things I could troubleshoot?
thanks!
You received this message because you are subscribed to a topic in the Google Groups "Presto" group.
java.lang.NoClassDefFoundError: org/mortbay/util/ajax/JSON$Convertor
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.createDefaultStore(NativeAzureFileSystem.java:1064)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1035)
at org.apache.hadoop.fs.PrestoFileSystemCache.createFileSystem(PrestoFileSystemCache.java:74)
at org.apache.hadoop.fs.PrestoFileSystemCache.getInternal(PrestoFileSystemCache.java:61)
at org.apache.hadoop.fs.PrestoFileSystemCache.get(PrestoFileSystemCache.java:43)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at com.facebook.presto.hive.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:67)
at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:66)
at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:60)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:280)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:222)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:77)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:178)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.mortbay.util.ajax.JSON$Convertor
at com.facebook.presto.server.PluginClassLoader.loadClass(PluginClassLoader.java:106)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more
2016-09-26T21:04:40.648Z INFO query-execution-2 com.facebook.presto.event.query.QueryMonitor TIMELINE: Query 20160926_210438_00000_7k8yi :: Transaction:[c9010f6c-308c-43cd-a759-5b905e9eb2ad] :: elapsed 1641ms :: planning 708ms :: scheduling 933ms :: running 0ms :: finishing 933ms :: begin 2016-09-26T21:04:38.692Z :: end 2016-09-26T21:04:40.333Z
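The `Caused by` line shows the lookup failing inside `PluginClassLoader`, which suggests that none of the JARs visible to the hive-hadoop2 plugin provides `org.mortbay.util.ajax.JSON$Convertor`. A rough way to check this is sketched below in Java, assuming the plugin's view can be approximated by a class loader over every JAR in the plugin directory plus the JDK. Presto's real `PluginClassLoader` also delegates some packages to the server class loader, so treat a `false` result as a hint, not proof. The class name `PluginClassCheck` and its default directory are hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PluginClassCheck {
    // Returns true if className can be resolved using only the JDK and the JARs in pluginDir.
    public static boolean isResolvable(Path pluginDir, String className) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        // Parent is the bootstrap loader (null), so the application classpath is NOT visible,
        // roughly mimicking an isolated plugin class loader.
        try (URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]), null)) {
            // initialize=false: resolve the class without running its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: the plugin directory relative to the Presto install root.
        Path dir = Paths.get(args.length > 0 ? args[0] : "plugin/hive-hadoop2");
        // Nested classes use their binary name with '$'.
        String cls = args.length > 1 ? args[1] : "org.mortbay.util.ajax.JSON$Convertor";
        System.out.println(cls + " resolvable: " + isResolvable(dir, cls));
    }
}
```

For example: `java PluginClassCheck <presto install>/plugin/hive-hadoop2 'org.mortbay.util.ajax.JSON$Convertor'`. The single quotes stop the shell from expanding `$Convertor`.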
This is logged as a WARN, but it may indicate a more serious issue. Could this be the root cause of no data being returned?
2016-09-26T21:50:13.942Z WARN hive-hive-0 com.facebook.presto.hive.util.ResumableTasks ResumableTask completed exceptionally
java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/Class;)Lorg/apache/hadoop/conf/Configuration;
at org.apache.hadoop.fs.azure.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:45)
at org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider.getStorageAccountKey(ShellDecryptionKeyProvider.java:40)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.getAccountKeyFromConfiguration(AzureNativeFileSystemStore.java:841)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:921)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:439)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1160)
at org.apache.hadoop.fs.PrestoFileSystemCache.createFileSystem(PrestoFileSystemCache.java:74)
at org.apache.hadoop.fs.PrestoFileSystemCache.getInternal(PrestoFileSystemCache.java:61)
at org.apache.hadoop.fs.PrestoFileSystemCache.get(PrestoFileSystemCache.java:43)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at com.facebook.presto.hive.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:67)
at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:66)
at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:60)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:280)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:222)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:77)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:178)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-09-26T21:50:14.163Z INFO query-execution-2 com.facebook.presto.event.query.QueryMonitor TIMELINE: Query 20160926_215012_00000_ukb8u :: Transaction:[381ea87a-1958-4ec6-93b3-cce079ec085b] :: elapsed 1578ms :: planning 855ms :: scheduling 723ms :: running 0ms :: finishing 723ms :: begin 2016-09-26T21:50:12.445Z :: end 2016-09-26T21:50:14.023Z
I can see that the hadoop-common JAR in the plugin folder implements this method, so I'm not sure where else the Presto server could be loading a copy of that class from. Any ideas?
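A `NoSuchMethodError` usually means the class was loaded from a different JAR than the one you inspected, one compiled without that method. Both stack traces go through `org.apache.hadoop.fs.PrestoFileSystemCache`, which is not part of stock Hadoop, so the plugin appears to ship its own build of the `org.apache.hadoop` classes alongside the copied JARs. The Java sketch below (hypothetical class `DuplicateClassFinder`) lists every JAR in a directory that contains a given class; more than one hit for `org.apache.hadoop.security.ProviderUtils` would point to a version conflict. It only inspects JAR entries and makes no assumption about which copy the plugin class loader picks first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class DuplicateClassFinder {
    // Lists every JAR in dir that contains className, e.g. "org.apache.hadoop.security.ProviderUtils".
    public static List<Path> jarsContaining(Path dir, String className) throws IOException {
        // Class files are stored under their package path: a.b.C -> a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getJarEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        hits.sort(null); // natural Path ordering, for stable output
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        String className = args.length > 1 ? args[1] : "org.apache.hadoop.security.ProviderUtils";
        List<Path> hits = jarsContaining(dir, className);
        System.out.println(className + " found in " + hits.size() + " JAR(s):");
        for (Path jar : hits) {
            System.out.println("  " + jar.getFileName());
        }
    }
}
```

For example: `java DuplicateClassFinder <presto install>/plugin/hive-hadoop2 org.apache.hadoop.security.ProviderUtils`.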
hive.config.resources=/etc/hadoop/conf/hdfs-site.xml,/etc/hadoop/conf/core-site.xml
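`hive.config.resources` is the Hive connector property that tells Presto which Hadoop XML files to load, so the wasb settings from the cluster's `core-site.xml` only reach Presto if these paths exist on every Presto node. As a minimal sketch under that assumption, the Java program below (hypothetical class `ConfigResourceCheck`) splits the property value on commas, confirms each file is readable, and lists property names starting with `fs.azure`, the prefix hadoop-azure uses for its settings. It does not validate values, and on HDInsight the account key may be stored encrypted and decrypted by a key-provider script (the stack trace above goes through `ShellDecryptionKeyProvider`), which this check cannot exercise.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfigResourceCheck {
    // Returns the <name> of every <property> in a Hadoop-style XML file that starts with prefix.
    public static List<String> propertyNames(File xml, String prefix) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < props.getLength(); i++) {
            NodeList nameNodes = ((Element) props.item(i)).getElementsByTagName("name");
            if (nameNodes.getLength() > 0) {
                String name = nameNodes.item(0).getTextContent().trim();
                if (name.startsWith(prefix)) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Default is the value from the hive.config.resources line; pass your own as args[0].
        String resources = args.length > 0 ? args[0]
                : "/etc/hadoop/conf/hdfs-site.xml,/etc/hadoop/conf/core-site.xml";
        for (String path : resources.split(",")) {
            File f = new File(path.trim());
            if (!f.canRead()) {
                System.out.println(path + ": missing or unreadable");
                continue;
            }
            System.out.println(path + ": " + propertyNames(f, "fs.azure"));
        }
    }
}
```

Running it on each Presto node (coordinator and workers) shows whether the files are present there and whether they carry any `fs.azure` settings at all.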
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-install-presto
OR
https://github.com/hdinsight/presto-hdinsight
Thanks,
Dharmesh