[Help] How to load a Hive table into Alluxio?


Leon Zhong

May 23, 2016, 11:35:17 PM
to Alluxio Users
alluxio-1.0.1+hive-1.0.0+spark-1.6.0
I want to use Alluxio to cache table data. Right now all of the data is in Hive tables stored on HDFS, and my cluster runs two engines, Hive and Spark, across 1000+ nodes.
All tables are created with a Hive schema, and I cannot find an API to load a table into Alluxio directly.
The Alluxio loadufs command only loads the files into Alluxio without the table metadata, so SQL queries still scan the data from HDFS.
One option is to create a new table in Alluxio with location alluxio://xxx:19998, but then the table name changes, which is incompatible with the SQL that is already running; we would have to change the SQL.
That is not transparent to users.
Can anybody help me solve this problem? Any ideas?

Haoyuan Li

May 29, 2016, 11:49:26 PM
to Leon Zhong, Alluxio Users
Leon,

You need to use the new location "alluxio://xxx:19998" instead of the previous location. Is this acceptable?
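
If keeping the existing table name is important, one option (just a sketch, not something confirmed in this thread: it assumes the table data has already been copied into Alluxio, and "my_table" and the warehouse path are placeholders) is to repoint the existing Hive table at the Alluxio path instead of creating a new table:

ALTER TABLE my_table SET LOCATION 'alluxio://xxx:19998/user/hive/warehouse/my_table';

This keeps the table name, so existing SQL would not need to change; only the storage location moves.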

Best,

Haoyuan


ashish kumar

Jul 20, 2016, 8:39:31 AM
to Alluxio Users, mei...@126.com
Hi Haoyuan,

I've set up Hadoop with Alluxio and am able to run Hadoop MapReduce jobs on the Alluxio file system.

But when I try to create a Hive table on top of Alluxio, it gives an error:

hive> CREATE TABLE test (c STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'alluxio://<master>:19888/ashish';

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.IOException: No FileSystem for scheme: alluxio)


And at the same time, I can access the file system with:

hadoop fs -ls alluxio://<master>:19888/


Any help would be appreciated.



Thanks,

Gene Pang

Jul 20, 2016, 9:04:54 AM
to Alluxio Users, mei...@126.com
I think you will have to configure Hive to also know about the Alluxio scheme.

Hive needs to know about the "fs.alluxio.impl" property, mentioned here: http://www.alluxio.org/docs/master/en/Running-Hadoop-MapReduce-on-Alluxio.html#configuring-hadoop

It looks like you can use the "--hiveconf" command-line option or hive-site.xml: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
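
For example, a minimal hive-site.xml snippet (a sketch only, assuming the Alluxio 1.x client class names from the docs linked above, and that the Alluxio client jar is already on Hive's classpath) could look like this:

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.alluxio-ft.impl</name>
  <value>alluxio.hadoop.FaultTolerantFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>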

Thanks,
Gene

ashish kumar

Jul 20, 2016, 1:10:19 PM
to Alluxio Users, mei...@126.com
Thanks, Gene, for the reply!

I tried starting the Hive prompt like this:

hive --hiveconf fs.alluxio.impl=alluxio.hadoop.FileSystem --hiveconf fs.alluxio-ft.impl=alluxio.hadoop.FaultTolerantFileSystem --hiveconf fs.AbstractFileSystem.alluxio.impl=alluxio.hadoop.AlluxioFileSystem


But I'm still getting the same error. Am I missing anything here?

Gene Pang

Jul 20, 2016, 1:24:36 PM
to Alluxio Users, mei...@126.com
I'm not sure, but maybe you can try the hiveconf:

 fs.default.name=alluxio://localhost:19998

as well? Maybe this thread could help: https://groups.google.com/d/topic/alluxio-users/9rrNBtywRbY

Thanks,
Gene

ashish kumar

Jul 21, 2016, 4:44:53 AM
to Alluxio Users, mei...@126.com

I've tried passing the filesystem config as well:

hive --hiveconf fs.alluxio.impl=alluxio.hadoop.FileSystem --hiveconf fs.alluxio-ft.impl=alluxio.hadoop.FaultTolerantFileSystem --hiveconf fs.AbstractFileSystem.alluxio.impl=alluxio.hadoop.AlluxioFileSystem --hiveconf fs.default.name=alluxio://<master>:19888


Now I'm getting a new error:


Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Frame size (67108864) larger than max length (16777216)!
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Frame size (67108864) larger than max length (16777216)!
    at alluxio.AbstractClient.checkVersion(AbstractClient.java:115)
    at alluxio.AbstractClient.connect(AbstractClient.java:178)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:325)
    at alluxio.client.file.FileSystemMasterClient.getStatus(FileSystemMasterClient.java:185)
    at alluxio.client.file.BaseFileSystem.getStatus(BaseFileSystem.java:175)
    at alluxio.client.file.BaseFileSystem.getStatus(BaseFileSystem.java:167)
    at alluxio.hadoop.AbstractFileSystem.getFileStatus(AbstractFileSystem.java:295)
    at alluxio.hadoop.FileSystem.getFileStatus(FileSystem.java:25)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)

Gene Pang

Jul 21, 2016, 9:17:58 AM
to Alluxio Users, mei...@126.com
Hi,

Typically, the "Frame size larger than max length" error happens when an incorrect port is used for the communication. Is 19888 the correct port? It is different from the default, so did you modify that parameter?
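
For reference, with an out-of-the-box install the master RPC port is 19998, so the earlier command would look like this (a sketch only, assuming default ports and the same class names as before):

hive --hiveconf fs.alluxio.impl=alluxio.hadoop.FileSystem --hiveconf fs.alluxio-ft.impl=alluxio.hadoop.FaultTolerantFileSystem --hiveconf fs.AbstractFileSystem.alluxio.impl=alluxio.hadoop.AlluxioFileSystem --hiveconf fs.default.name=alluxio://<master>:19998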

Thanks,
Gene

ashish kumar

Jul 24, 2016, 10:23:43 AM
to Alluxio Users, mei...@126.com
Thanks Gene!

I found the actual issue. It was related to the Hive version. Now I'm using Hive 1.2 and it is working fine.
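
For anyone hitting the same mismatch, a quick way to confirm which Hive version the client is running before retrying:

hive --version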

Gene Pang

Jul 25, 2016, 11:19:42 AM
to Alluxio Users
Thanks for the confirmation!

-Gene