Can't Start Query 9


Alejandro Montero

Aug 29, 2016, 4:48:27 AM
to Big Data Benchmark for BigBench
Hello, 

I'm facing some problems trying to start Query 9. There is no error message; the query simply won't start after finishing all the Hive settings. This is the last output I'm able to see:

===============================================
Running query : q09
-----------------------------------------------
benchmark phase: RUN_QUERY
stream number  : 0
user parameter file: 
user settings file : 
log: /mnt/aloja/aloja-bench_3/src/BigBench/logs/q09_hive_RUN_QUERY_0.log
===============================================
checking existence of local: /mnt/aloja/aloja-bench_3/src/BigBench/logs
creating folders and setting permissions
/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: [: missing `]'
/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: -eq: command not found
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.2.4-5/0/hive-log4j.properties
============================
<settings from queryParameters.sql>
============================
============================
</settings from queryParameters.sql>
============================
============================
<settings from hiveSettings.sql>
============================
============================
Print most important properties
============================
hive.exec.parallel=false
hive.exec.parallel.thread.number=8
hive.exec.compress.intermediate=false
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.exec.compress.output=false
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.default.fileformat=TextFile
mapred.max.split.size=67108864
mapred.min.split.size=1
hive.exec.reducers.bytes.per.reducer=256000000
hive.exec.reducers.max=1009
hive.auto.convert.sortmerge.join=true
hive.auto.convert.sortmerge.join.noconditionaltask is undefined
hive.optimize.bucketmapjoin=true
hive.optimize.bucketmapjoin.sortedmerge=false
hive.optimize.ppd=true
hive.optimize.index.filter=true
hive.auto.convert.join.noconditionaltask.size=319039733
hive.auto.convert.join=true
hive.optimize.mapjoin.mapreduce is undefined
hive.mapred.local.mem=0
hive.mapjoin.smalltable.filesize=5000000
hive.mapjoin.localtask.max.memory.usage=0.9
hive.optimize.skewjoin=false
hive.optimize.skewjoin.compiletime=false
hive.groupby.skewindata=false
============================
</settings from hiveSettings.sql>
============================
hive.exec.compress.output=false
OK
Time taken: 1.191 seconds
OK
Time taken: 0.777 seconds

After that the execution freezes. It's important to note that all other queries run flawlessly. Thank you very much for your help.

Michael Frank

Aug 29, 2016, 12:04:51 PM
to Big Data Benchmark for BigBench
Hi Alejandro,
 
To help us give good support, please always provide:
  • bigbench version (tpcx-bb kit or intel github version and which git #hash)
  • the full command line you used to start bigbench
  • the logfile(s) - see the logs/ folder!
  • the on screen message (that one you provided)


 There is no error message

Yes there is:

/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: [: missing `]'
/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: -eq: command not found

What's curious is that line 73 does not correspond to relevant code in either official repo, neither in https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench nor in the zip you can download from the TPC (TPCX-BB_Tools_v1.1.0.zip). Both versions have different line numbers for the same code because of different license headers.

=> Please clarify which BigBench version you are using: the TPC one or the official one from GitHub.


As the error refers to "-eq", this is the only line in the runQuery file that matches your error (and is in the same vicinity)

 if [ -z "$DEBUG_QUERY_PART" ] || [ $DEBUG_QUERY_PART -eq 1 ]

There are multiple possibilities:
  • missing `]'   :: this is often caused by a missing space between the brackets [ ] and the statement - probable cause: someone tinkered with the file.
    • Wrong: [-z "$DEBUG_QUERY_PART"]
    • Right: [ -z "$DEBUG_QUERY_PART" ] # note the spaces!
  • -eq: command not found :: this can be caused by a non-integer argument
    • maybe you started bigbench with the -D "query part to debug" option and provided a non-integer argument (e.g. a string, or null)
      • for this you probably also provided "-U" to unlock expert mode - probably to run queries individually. Please provide the full command line you used to start the query.
    • missing spaces again?
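To make both failure modes concrete, here is a tiny standalone bash sketch (NOT the actual runQuery code - DEBUG_QUERY_PART is just reused here as a stand-in variable):

```shell
#!/bin/bash
# Standalone illustration of both error messages (not the real runQuery code).

DEBUG_QUERY_PART="abc"

# A missing space before the closing bracket makes "]" part of the last
# argument, so the [ builtin complains: "[: missing `]'"
msg1=$( [ -z "$DEBUG_QUERY_PART"] 2>&1 )
echo "$msg1"

# If the opening "[" is lost (or the test line is otherwise mangled) and the
# unquoted variable expands to nothing, the shell tries to execute "-eq"
# itself: "-eq: command not found"
DEBUG_QUERY_PART=""
msg2=$( $DEBUG_QUERY_PART -eq 1 2>&1 )
echo "$msg2"

# Correct form: spaces around the brackets and a guaranteed integer operand.
DEBUG_QUERY_PART=1
if [ -z "$DEBUG_QUERY_PART" ] || [ "$DEBUG_QUERY_PART" -eq 1 ]; then
    echo "debug part 1 selected"
fi
```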
Hope that gives you a clue what to look for.
If you need further help, please provide the information I requested above, and if you solve it, please post your answer so others might find it helpful :)

Cheers,
Michael

Alejandro Montero

Sep 9, 2016, 5:10:43 AM
to Big Data Benchmark for BigBench
Hi again, sorry for being away for so long; a few days of vacation are always needed :).

Continuing with the problem: I'm deploying BigBench on PaaS, using both the Hive and Spark engines. On Spark all 30 queries finish flawlessly, and on Hive all but query #9 work as well. The BigBench version we are currently using is the one in the official GitHub repo, but with a tweak in the main binary to accept config files from folders other than the original. We have already corrected the issue with the -eq, and as expected it was nothing more than a cosmetic issue.

One of the main drawbacks of using PaaS is that most of the time there is a remote HDFS, resulting in a warning message each time Hive accesses the metastore:

"Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem."

Though it seems not to affect the execution, it's important to mention it.

I've also found a Mahout issue. It seems Mahout 0.9 and Hadoop 2.7 have several incompatibilities. In each Mahout query, when reaching step 3 (which is commonly calculating k-means with partial results from previous steps), the query would fail because Mahout was unable to find those partial results in HDFS. The usual advice is to upgrade to Mahout 0.11 or above, though, at least for me, the issue still occurs. So what I did was set BIG_BENCH_HDFS_ABSOLUTE_PATH to, well..., what the variable was expecting: the complete absolute path of the HDFS. This is a pain in the ass on PaaS, as the user should not need to know this information. On premise, though, a relative path such as /dfs/$BIG_BENCH_USER works flawlessly.
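For reference, this is the shape of what I ended up exporting (the namenode host, port and user below are placeholders, not our real values - every deployment has its own):

```shell
# Placeholder values - substitute your own namenode host/port and user.
export BIG_BENCH_USER="pristine"
export BIG_BENCH_HDFS_ABSOLUTE_PATH="hdfs://namenode.example.com:8020/user/$BIG_BENCH_USER"
echo "$BIG_BENCH_HDFS_ABSOLUTE_PATH"
```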

So, to elaborate on the issue with query #9: the BigBench command line is dynamically created differently on each cluster; in our case it's the following:

20160909_084027 22224: DEBUG: BigBench command:

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    export PATH=/mnt/aloja/aplic2/apps/apache-mahout-distribution-0.12.2/bin/:/home/pristine/share/sw/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
    
/mnt/src/BigBench/bin/bigBench runQuery -q 9

Unfortunately this is not the only issue I'm dealing with. On our local cluster (5 machines, 12 cores and 64 GB RAM per machine), BigBench works without major issues on the Spark engine using 1 executor per node with all cores and 54 GB of RAM (this is the first iteration; we expect to tune it once everything works), but Hive fails on queries #9 and #18 due to Java memory errors.

Q09:

============================
Print most important properties
============================
hive.exec.parallel=false
hive.exec.parallel.thread.number=8
hive.exec.compress.intermediate=false
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.exec.compress.output=false
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.default.fileformat=TextFile
mapred.max.split.size=67108864
mapred.min.split.size=1
hive.exec.reducers.bytes.per.reducer=256000000
hive.exec.reducers.max=1009
hive.auto.convert.sortmerge.join=false
hive.auto.convert.sortmerge.join.noconditionaltask is undefined
hive.optimize.bucketmapjoin=false
hive.optimize.bucketmapjoin.sortedmerge=false
hive.optimize.ppd=true
hive.optimize.index.filter=false
hive.auto.convert.join.noconditionaltask.size=10000000
hive.auto.convert.join=true
hive.optimize.mapjoin.mapreduce is undefined
hive.mapred.local.mem=0
hive.mapjoin.smalltable.filesize=5000000
hive.mapjoin.localtask.max.memory.usage=0.9
hive.optimize.skewjoin=false
hive.optimize.skewjoin.compiletime=false
hive.groupby.skewindata=false
============================
</settings from hiveSettings.sql>
============================
hive.exec.compress.output=false
OK
Time taken: 1.484 seconds
OK
Time taken: 0.819 seconds
Query ID = pristine_20160908113020_955813a0-3562-4b81-a6c9-bcf6a7f45ba1
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/scratch/local/aplic2/apps/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/scratch/local/aplic2/apps/spark_hive-1.6.2/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /scratch/attached/1/aloja-bench_3/hive_logs/pristine_20160908113020_955813a0-3562-4b81-a6c9-bcf6a7f45ba1.log
2016-09-08 11:30:29     Starting to launch local task to process map join;      maximum memory = 514523136
2016-09-08 11:30:32     Processing rows:        200000  Hashtable size: 199999  Memory usage:   119680168       percentage:     0.233
2016-09-08 11:30:32     Processing rows:        300000  Hashtable size: 299999  Memory usage:   155328552       percentage:     0.302
2016-09-08 11:30:32     Processing rows:        400000  Hashtable size: 399999  Memory usage:   190976928       percentage:     0.371
2016-09-08 11:30:32     Processing rows:        500000  Hashtable size: 499999  Memory usage:   224078992       percentage:     0.436
2016-09-08 11:30:32     Processing rows:        600000  Hashtable size: 599999  Memory usage:   257113776       percentage:     0.50
2016-09-08 11:30:32     Processing rows:        700000  Hashtable size: 699999  Memory usage:   286264128       percentage:     0.556
Execution failed with exit status: 3
Obtaining error information

Task failed!
Task ID:
  Stage-15

Logs:

/scratch/attached/1/aloja-bench_3/hive_logs/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

As far as I recall, exit code 3 indicates memory issues.
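What I'm considering trying in hiveSettings.sql for this query (just a guess based on the properties printed above: either skip the automatic map-join conversion entirely, or shrink what qualifies for it so the local hashtable stays small):

```sql
-- Sketch only: disable the automatic map-join so the failing local
-- hashtable task is skipped entirely (slower, but avoids the crash)...
set hive.auto.convert.join=false;
-- ...or keep it but lower the small-table threshold so fewer tables qualify:
-- set hive.mapjoin.smalltable.filesize=1000000;
```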

Q18:

new io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF()
new io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF() done
initialize io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF
initialize io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF done
Query ID = pristine_20160908120844_8d3a6c02-e7d5-475d-91ab-89ca1f041961
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1473323077826_0116, Tracking URL = http://minerva-105:8088/proxy/application_1473323077826_0116/
Kill Command = /scratch/local/aplic2/apps/hadoop-2.7.1/bin/hadoop job  -kill job_1473323077826_0116
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
2016-09-08 12:08:54,491 Stage-1 map = 0%,  reduce = 0%
2016-09-08 12:09:55,130 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 195.69 sec
2016-09-08 12:09:58,258 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 207.79 sec
2016-09-08 12:10:10,802 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 234.51 sec
2016-09-08 12:11:03,302 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 291.44 sec
MapReduce Total cumulative CPU time: 4 minutes 51 seconds 440 msec
Ended Job = job_1473323077826_0116
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1473323077826_0117, Tracking URL = http://minerva-105:8088/proxy/application_1473323077826_0117/
Kill Command = /scratch/local/aplic2/apps/hadoop-2.7.1/bin/hadoop job  -kill job_1473323077826_0117
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2016-09-08 12:11:15,090 Stage-3 map = 0%,  reduce = 0%
2016-09-08 12:11:23,433 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.49 sec
MapReduce Total cumulative CPU time: 4 seconds 490 msec
Ended Job = job_1473323077826_0117
Loading data to table bigbenchorc.q18_hive_run_query_0_result
Table bigbenchorc.q18_hive_run_query_0_result stats: [numFiles=0, numRows=84255, totalSize=0, rawDataSize=17405743]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3   Cumulative CPU: 291.44 sec   HDFS Read: 75868354 HDFS Write: 17490307 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 4.49 sec   HDFS Read: 17492873 HDFS Write: 14131898 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 55 seconds 930 msec
OK
Time taken: 161.531 seconds
OK
Time taken: 0.19 seconds
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1472)
at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:66)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:683)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I have the default memory properties in mapred-site.xml and 1 GB for the Java Xmx. Any advice on how to scale these values depending on the scale factor?
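One thing I plan to try for the Q18 crash (a guess based on the PermGen stack trace, not something from the BigBench docs): raising the Hive client's heap and PermGen via HADOOP_CLIENT_OPTS before launching the run. The sizes below are arbitrary starting points:

```shell
# Java 7 only: PermGen is a separate pool from the -Xmx heap, so raising
# the heap alone does not cure an "OutOfMemoryError: PermGen space".
export HADOOP_CLIENT_OPTS="-Xmx2g -XX:MaxPermSize=512m"
echo "$HADOOP_CLIENT_OPTS"
```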

Notice that I'm currently using SCALE_FACTOR=1 in the tests.

As always, thank you very much for your help!