Spark SQL Error using HiveQL


Philip Lee

Jan 4, 2016, 10:33:51 AM
to Big Data Benchmark for BigBench, Michael Frank
Hello,

I am testing Spark SQL with scale factors 1, 50, etc. using BigBench.

However, only queries 2, 4, 5, 7, and 8 fail, and they fail at every scale factor, even sf1. The rest of the queries ran fine on Spark SQL. I briefly describe the errors of the queries below and have attached more details.

Query 2, 4 =

java.io.IOException: Stream closed

at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:453)

at java.io.OutputStream.write(OutputStream.java:127)

at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:93)

at java.io.BufferedOutputStream.write(BufferedOutputStream.java:137)

at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:53)

at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(ScriptTransformation.scala:255)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)


Query 5 = I do not think the query at sf1 should hit a buffer overflow. I am trying to fix this error.
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 6. To avoid this, increase spark.kryoserializer.buffer.max value.
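For reference, that setting can be raised when launching the job. A minimal sketch, assuming a spark-sql invocation; the 256m value and the q05.sql file name are placeholders, not from the benchmark:

/usr/lib/spark/bin/spark-sql --master local[*] \
  --conf spark.kryoserializer.buffer.max=256m \
  -f q05.sql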

Query 7 =

Unsupported language features in query: INSERT INTO TABLE q07_spark_run_query_0_result

SELECT

 ca_state,

 COUNT(*) AS cnt

FROM

 customer_address a,

 customer c,

 store_sales s,

...

(

...

)

GROUP BY ca_state

HAVING cnt >= 10 --at least 10 customers

ORDER BY cnt DESC, ca_state --top 10 states in descending order

LIMIT 10


Query 8 =

16/01/01 21:54:12 ERROR Utils: Uncaught exception in thread Thread-ScriptTransformation-Feed

java.lang.NullPointerException

at org.apache.spark.unsafe.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:235)

at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortedIterator.loadNext(UnsafeInMemorySorter.java:165)

at org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:142)

at org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:129)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at scala.collection.Iterator$class.foreach(Iterator.scala:727)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at org.apache.spark.sql.hive.execution.ScriptTransformationWriterThread$$anonfun$run$1.apply$mcV$sp(ScriptTransformation.scala:255)


Have you ever seen these kinds of errors before when running these queries?

Best,
Phil


Attachment: SparkSQLError.pdf

Philip Lee

Jan 7, 2016, 8:33:43 AM
to Big Data Benchmark for BigBench, michae...@bankmark.de

Now only queries 2, 4, and 8 still fail on Spark SQL; the other ones were solved.
I tried removing the return type after "REDUCE ... USING" so the Python code would apply, but the same error remains.
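For context, these queries go through Hive's script transform, and the return type is the AS (...) column list after USING. A generic sketch of that shape; the table, script, and column names here are made up, not the actual BigBench files:

FROM (
  SELECT c.user_sk, c.item_sk FROM some_clicks c CLUSTER BY c.user_sk
) q
REDUCE q.user_sk, q.item_sk
USING 'python some_reducer.py'
AS (user_sk BIGINT, item_sk BIGINT);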

Any suggestions?

James

Jan 8, 2016, 1:15:00 AM
to Big Data Benchmark for BigBench, michae...@bankmark.de
Hi Philip,
Which Spark version are you using for this test? Apache Spark 1.5/1.6 or CDH Spark?

Philip Lee

Jan 8, 2016, 5:41:10 AM
to James, Michael Frank, Big Data Benchmark for BigBench

It is Spark 1.5.2.

I found the exact cause of the errors in queries 2, 4, and 8.

It was a path issue, but after fixing it I hit another error: Error in query: cannot recognize input near '$' '{' 'hiveconf' in table name; line 1 pos 21.

I guess the SQL statements in the queries cannot create or drop their temp tables. It is still an open JIRA issue: https://issues.apache.org/jira/browse/SPARK-11972
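One way to confirm that the variable substitution itself is the problem is to expand the hiveconf variable before the statement ever reaches the parser. A hypothetical workaround sketch; q02.sql and q02_temp2 are made-up names:

sed 's/\${hiveconf:TEMP_TABLE2}/q02_temp2/g' q02.sql > q02_expanded.sql
/usr/lib/spark/bin/spark-sql --master local[*] -f q02_expanded.sql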

But, as you know, other people have run these queries successfully before. How could that be?

On Jan 8, 2016, at 7:15 AM, "James" <yia...@gmail.com> wrote:

James

Jan 10, 2016, 8:14:53 PM
to Big Data Benchmark for BigBench, michae...@bankmark.de

The issue tracked in SPARK-11624/SPARK-11972 causes the problem you hit with Spark 1.5.2. It should pass with an earlier Spark version. According to the current PR status updates, a patch has been created, but it is still being reviewed by the community.

Philip Lee

Jan 11, 2016, 8:17:40 AM
to James, Big Data Benchmark for BigBench, Michael Frank
Thanks, James.

I followed your suggestion and tried the earlier Spark version 1.3.2, but I still see this error:

- DROP TABLE IF EXISTS ${hiveconf:TEMP_TABLE2}
16/01/11 13:12:02 INFO ParseDriver: Parsing command: DROP TABLE IF EXISTS ${hiveconf:TEMP_TABLE2}
NoViableAltException(16@[184:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)

- DROP TABLE IF EXISTS ${hiveconf:TEMP_TABLE2}
org.apache.spark.sql.AnalysisException: cannot recognize input near '$' '{' 'hiveconf' in table name; line 1 pos 21
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:254)

- org.apache.spark.sql.AnalysisException: cannot recognize input near '$' '{' 'hiveconf' in table name; line 1 pos 21
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:254)




James

Jan 12, 2016, 2:41:29 AM
to Big Data Benchmark for BigBench, yia...@gmail.com, michae...@bankmark.de
Could you please first try to reproduce the issue in the Spark CLI, to isolate the issue itself, like the example in SPARK-11972/SPARK-11624?

For example, reproduce steps:
/usr/lib/spark/bin/spark-sql -v --driver-memory 4g --executor-memory 7g --executor-cores 5 --num-executors 31 --master yarn-client --conf spark.yarn.executor.memoryOverhead=1024 --hiveconf RESULT_TABLE=test_result01

>use test;
>DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE};
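On an affected build, the DROP above should fail with the cannot recognize input near '$' '{' 'hiveconf' parse error, while the same statement with the value written out should parse fine (value taken from the --hiveconf flag above):

>DROP TABLE IF EXISTS test_result01;

If both statements behave the same, the problem is likely in the environment rather than the substitution bug.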

James

Jan 12, 2016, 8:10:38 AM
to Big Data Benchmark for BigBench, yia...@gmail.com, michae...@bankmark.de
As a side note, you are running Spark on YARN in yarn-client mode, right?

Philip Lee

Jan 12, 2016, 8:21:50 AM
to James, Big Data Benchmark for BigBench, Michael Frank
Following the BigBench default setting in /engines/spark/conf, I use --master local[*].

James

Jan 13, 2016, 1:36:56 AM
to Big Data Benchmark for BigBench, yia...@gmail.com, michae...@bankmark.de
I once ran the BigBench queries with Spark SQL on Spark 1.5.0 in yarn-client mode and did not hit the 'NoViableAltException' error you came across. So please first try the example case mentioned in the JIRA to check whether it is a Spark environment issue or something else.

Philip Lee

Jan 13, 2016, 2:45:40 AM
to James, Michael Frank, Big Data Benchmark for BigBench

Thanks for trying it.

So did you run the queries that use external scripts, i.e. 2, 4, and 8? What about trying --master local[*]? That was the recommended option.

But you should first have seen the return-type error, unless you changed the return type in the REDUCE ... USING code.

On Jan 13, 2016, at 7:36 AM, "James" <yia...@gmail.com> wrote:

James

Jan 15, 2016, 8:42:05 AM
to Big Data Benchmark for BigBench, yia...@gmail.com, michae...@bankmark.de
Q2, Q4, and Q8 run successfully in --master local[*] mode with Spark 1.5 after resolving the file-path issue for the Python scripts, so I am not sure why they failed in your test. Could you please check the environment for any exception? As a side note, the option '--master local[*]' is only an example for running the engine. Local mode means running the Spark application on the local machine instead of in distributed mode; a Spark application can equally run on YARN as the resource manager (e.g., in yarn-client mode), as sketched below.
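To make the distinction concrete, here is a sketch of the two launch styles; the paths and resource numbers are illustrative only:

# local mode: driver and executors run in a single JVM on this machine
/usr/lib/spark/bin/spark-sql --master local[*]

# yarn-client mode: driver runs here, executors are scheduled by YARN
/usr/lib/spark/bin/spark-sql --master yarn-client --num-executors 31 --executor-cores 5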

Matthias Schmidt

Feb 18, 2016, 6:04:54 AM
to Big Data Benchmark for BigBench, michae...@bankmark.de
Hi Philip,

did you also fix query 5?

Matthias Schmidt

Jun 8, 2016, 10:34:46 AM
to Big Data Benchmark for BigBench, michae...@bankmark.de
Hi Philip,

how did you fix query 5?

Br,
Matthias