Query 3


Alejandro Montero

Jul 18, 2016, 7:20:40 AM
to Big Data Benchmark for BigBench
Hi,

I'm currently running BigBench and all queries seem to work just fine, except for query number 3. The error message I get is the following:

============================
</settings from hiveSettings.sql>
============================
hive.exec.compress.output=false
OK
Time taken: 0.163 seconds
OK
Time taken: 0.948 seconds
FailedPredicateException(identifier,{useSQL11ReservedKeywordsForIdentifier()}?)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:10924)
at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45850)
at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectItem(HiveParser_SelectClauseParser.java:2941)
at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectList(HiveParser_SelectClauseParser.java:1373)
at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:1128)
at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:45827)
at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:41495)
at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41402)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40413)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40283)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.subQuerySource(HiveParser_FromClauseParser.java:5307)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:3741)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1873)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1518)
at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45873)
at org.apache.hadoop.hive.ql.parse.HiveParser.singleFromStatement(HiveParser.java:41020)
at org.apache.hadoop.hive.ql.parse.HiveParser.fromStatement(HiveParser.java:40743)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40398)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40283)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.subQuerySource(HiveParser_FromClauseParser.java:5307)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:3741)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1873)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1518)
at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:45873)
at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:41516)
at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41230)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40413)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40283)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1590)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:396)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 8:27 Failed to recognize predicate 'user'. Failed rule: 'identifier' in selection target
======= q03_hive_RUN_QUERY_0 time =======
Start timestamp: 2016/07/18:11:13:20 1468840400
Stop  timestamp: 2016/07/18:11:13:33 1468840413
Duration:  0h 0m 13s

I'm running BigBench in a 2-node Vagrant virtual cluster with Hive 1.2.1 and Mahout 0.9.

Thank you very much for your help.

Michael Frank

Jul 18, 2016, 8:56:28 AM
to Big Data Benchmark for BigBench
Hi,

'user' in query 3 has been a reserved keyword in Hive since this patch: https://issues.apache.org/jira/browse/HIVE-6617
Two solutions:
a) in the query, change 'user' to something else, e.g. 'username' or 'user2'
b) as a workaround, until the query stops using the reserved keyword: set hive.support.sql11.reserved.keywords=false

You can add the option for b) to /engines/hive/conf/engineSettings.sql, or configure it globally for your entire Hive installation.
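For option b), a minimal sketch of the line you would add (the path is the one from this thread; check your BigBench checkout for the actual file and its existing settings):

```sql
-- engines/hive/conf/engineSettings.sql
-- Workaround: stop Hive (>= 1.2) from treating SQL:2011 reserved words
-- such as 'user' as keywords, so query 3 parses again.
set hive.support.sql11.reserved.keywords=false;
```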
Please note that we consider Mahout the deprecated ML engine for BigBench; Spark is the new default.

cheers
Michael

Alejandro Montero

Jul 19, 2016, 5:55:10 AM
to Big Data Benchmark for BigBench
Thank you very much for your help. You were completely right: telling Hive not to use reserved keywords worked.

Now I'm moving from Mahout to Spark, but I'm facing some issues as well. The version I'm currently using is Spark 1.6.1 without Hadoop. I've configured BigBench to use this engine and set all requirements for it to work; I've also set Java's Xms to 512 MB and Xmx to 1024 MB to avoid heap issues. When launching query 5 (one of the ML queries), Spark reports an error:

java.lang.ClassNotFoundException: org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
You need to build Spark with -Phive and -Phive-thriftserver.

I cannot build any version of Spark at the moment; I need to use one of the prebuilt versions available at http://apache.rediris.es/spark/. Any idea how to make it work?

Once again, thank you very much.

Manuel Danisch

Jul 19, 2016, 8:51:50 AM
to Big Data Benchmark for BigBench
Hi Alejandro,

The error message states that your Spark binary lacks Hive support. Spark needs certain Hive components to, e.g., access the Hive metastore (where all table metadata is stored). None of the prebuilt binaries I've seen so far have Hive support compiled in. So as far as I can see, you have the following options:
  1. Find a binary with Hive support compiled in (as said, I don't know of anyone who provides one).
  2. Get your Spark binary provider to include the two mentioned flags (-Phive -Phive-thriftserver) when compiling Spark.
  3. Compile Spark yourself with these two flags.
Spark compilation is documented here: http://spark.apache.org/docs/latest/building-spark.html. As soon as Spark is compiled with the two flags, everything should work.
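As a sketch, a build invocation for a Hive-enabled Spark 1.6.1 distribution could look like the following (the -Phive and -Phive-thriftserver flags are the two from the error message; the -Phadoop-2.6 and -Pyarn profiles are assumptions you should match to your cluster):

```
# Build a distributable Spark 1.6.1 tarball with Hive + thriftserver support.
# make-distribution.sh ships in the top level of the Spark 1.6.x source tree.
cd spark-1.6.1
./make-distribution.sh --name hive-enabled --tgz \
    -Phive -Phive-thriftserver -Phadoop-2.6 -Pyarn -DskipTests

# Quick sanity check on the resulting binary: a Hive-enabled build contains
# the class the error complained about in its assembly jar.
jar tf dist/lib/spark-assembly-*.jar | grep SparkSQLCLIDriver
```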

Best regards,

Manuel