Can't Start Query 9


Alejandro Montero

Aug 29, 2016, 4:48:27 AM
to Big Data Benchmark for BigBench
Hello, 

I'm facing some problems trying to start Query 9. There is no error message; the query simply won't start after finishing all the Hive settings. This is the last output I'm able to see:

===============================================
Running query : q09
-----------------------------------------------
benchmark phase: RUN_QUERY
stream number  : 0
user parameter file: 
user settings file : 
log: /mnt/aloja/aloja-bench_3/src/BigBench/logs/q09_hive_RUN_QUERY_0.log
===============================================
checking existence of local: /mnt/aloja/aloja-bench_3/src/BigBench/logs
creating folders and setting permissions
/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: [: missing `]'
/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: -eq: command not found
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.2.4-5/0/hive-log4j.properties
============================
<settings from queryParameters.sql>
============================
============================
</settings from queryParameters.sql>
============================
============================
<settings from hiveSettings.sql>
============================
============================
Print most important properties
============================
hive.exec.parallel=false
hive.exec.parallel.thread.number=8
hive.exec.compress.intermediate=false
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.exec.compress.output=false
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.default.fileformat=TextFile
mapred.max.split.size=67108864
mapred.min.split.size=1
hive.exec.reducers.bytes.per.reducer=256000000
hive.exec.reducers.max=1009
hive.auto.convert.sortmerge.join=true
hive.auto.convert.sortmerge.join.noconditionaltask is undefined
hive.optimize.bucketmapjoin=true
hive.optimize.bucketmapjoin.sortedmerge=false
hive.optimize.ppd=true
hive.optimize.index.filter=true
hive.auto.convert.join.noconditionaltask.size=319039733
hive.auto.convert.join=true
hive.optimize.mapjoin.mapreduce is undefined
hive.mapred.local.mem=0
hive.mapjoin.smalltable.filesize=5000000
hive.mapjoin.localtask.max.memory.usage=0.9
hive.optimize.skewjoin=false
hive.optimize.skewjoin.compiletime=false
hive.groupby.skewindata=false
============================
</settings from hiveSettings.sql>
============================
hive.exec.compress.output=false
OK
Time taken: 1.191 seconds
OK
Time taken: 0.777 seconds

After that the execution freezes. It's important to note that all other queries run flawlessly. Thank you very much for your help.

Michael Frank

Aug 29, 2016, 12:04:51 PM
to Big Data Benchmark for BigBench
Hi Alejandro,
 
To help us give good support, please always provide:
  • bigbench version (tpcx-bb kit or intel github version and which git #hash)
  • the full command line you used to start bigbench
  • the logfile(s) - see the logs/ folder!
  • the on screen message (that one you provided)


 There is no error message

Yes there is:

/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: [: missing `]'
/mnt/aloja/aloja-bench_3/src/BigBench/engines/hive/bin/runQuery: line 73: -eq: command not found

What's curious is that line 73 does not correspond to relevant code in either official repo, neither in https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench nor in the zip you can download from the TPC (TPCX-BB_Tools_v1.1.0.zip). Both versions have different line numbers for the same code because of different license headers.

=> Please clarify which BigBench version you are using: the TPC one or the official one from GitHub.


As the error refers to "-eq", this is the only line in the runQuery file that matches your error (and is in the same vicinity)

 if [ -z "$DEBUG_QUERY_PART" ] || [ $DEBUG_QUERY_PART -eq 1 ]

There are multiple possibilities:
  • missing `]'   :: this is often caused by a missing space between the brackets [ ] and the statement - probable cause: someone tinkered with the file.
    • Wrong: [-z "$DEBUG_QUERY_PART"]
    • Right: [ -z "$DEBUG_QUERY_PART" ] # note the spaces!
  • -eq: command not found :: this can be caused by a non-integer argument
    • maybe you started bigbench with the -D "query part to debug" option and provided a non-integer argument (e.g. a string, or null)
      • for this you probably also provided "-U" to unlock expert mode - probably to run queries individually. Please provide the full command line you used to start the query.
    • missing spaces again?
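To make both failure modes concrete, here is a tiny standalone bash sketch (NOT the actual runQuery code - DEBUG_QUERY_PART is just reused here as a stand-in variable):

```shell
#!/bin/bash
# Standalone illustration of both error messages (not the real runQuery code).

DEBUG_QUERY_PART="abc"

# A missing space before the closing bracket makes "]" part of the last
# argument, so the [ builtin complains: "[: missing `]'"
msg1=$( [ -z "$DEBUG_QUERY_PART"] 2>&1 )
echo "$msg1"

# If the opening "[" is lost (or the test line is otherwise mangled) and the
# unquoted variable expands to nothing, the shell tries to execute "-eq"
# itself: "-eq: command not found"
DEBUG_QUERY_PART=""
msg2=$( $DEBUG_QUERY_PART -eq 1 2>&1 )
echo "$msg2"

# Correct form: spaces around the brackets and a guaranteed integer operand.
DEBUG_QUERY_PART=1
if [ -z "$DEBUG_QUERY_PART" ] || [ "$DEBUG_QUERY_PART" -eq 1 ]; then
    echo "debug part 1 selected"
fi
```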
Hope that gives you a clue what to look for.
If you need further help, please provide the information I requested above, and if you solve it, please post your answer so others might find it helpful :)

Cheers,
Michael

Alejandro Montero

Sep 9, 2016, 5:10:43 AM
to Big Data Benchmark for BigBench
Hi again, sorry for being away for so long; a few days of vacation are always needed :).

Continuing with the problem: I'm deploying BigBench on PaaS, using both the Hive and Spark engines. On Spark all 30 queries finish flawlessly, and on Hive all but query #9 work as well. The BigBench version we are currently using is the one in the official GitHub repo, but with a tweak in the main binary to accept config files from folders other than the original. We have already corrected the issue with the -eq, and as expected it was nothing more than a cosmetic issue.

One of the main drawbacks of using PaaS is that most of the time there is a remote HDFS, resulting in a warning message each time Hive accesses the metastore:

"Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem."

Though it seems not to affect the execution, it's important to mention it.

I've also found a Mahout issue. It seems Mahout 0.9 and Hadoop 2.7 have several incompatibilities. In each Mahout query, when reaching step 3 (which is commonly calculating k-means with partial results from previous steps), the query would fail because Mahout was unable to find those partial results in HDFS. The usual advice is to upgrade to Mahout 0.11 or above, though, at least for me, the issue still occurs. So what I did was set BIG_BENCH_HDFS_ABSOLUTE_PATH to, well..., what the variable was expecting: the complete absolute path of the HDFS. This is a pain in the ass on PaaS, as the user should not need to know this information. On premise, though, a relative path such as /dfs/$BIG_BENCH_USER works flawlessly.
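For reference, this is the shape of what I ended up exporting (the namenode host, port and user below are placeholders, not our real values - every deployment has its own):

```shell
# Placeholder values - substitute your own namenode host/port and user.
export BIG_BENCH_USER="pristine"
export BIG_BENCH_HDFS_ABSOLUTE_PATH="hdfs://namenode.example.com:8020/user/$BIG_BENCH_USER"
echo "$BIG_BENCH_HDFS_ABSOLUTE_PATH"
```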

So, to elaborate on the issue with query #9: the BigBench command line is dynamically created differently on each cluster; in our case it's the following:

20160909_084027 22224: DEBUG: BigBench command:

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    export PATH=/mnt/aloja/aplic2/apps/apache-mahout-distribution-0.12.2/bin/:/home/pristine/share/sw/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
    
/mnt/src/BigBench/bin/bigBench runQuery -q 9

Unfortunately this is not the only issue I'm dealing with. On our local cluster (5 machines, 12 cores and 64 GB RAM per machine), BigBench works without major issues on the Spark engine using 1 executor per node with all cores and 54 GB of RAM (this is the first iteration; we expect to tune it once everything works), but Hive fails on queries #9 and #18 due to Java memory errors.

Q09:

============================
Print most important properties
============================
hive.exec.parallel=false
hive.exec.parallel.thread.number=8
hive.exec.compress.intermediate=false
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.exec.compress.output=false
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.default.fileformat=TextFile
mapred.max.split.size=67108864
mapred.min.split.size=1
hive.exec.reducers.bytes.per.reducer=256000000
hive.exec.reducers.max=1009
hive.auto.convert.sortmerge.join=false
hive.auto.convert.sortmerge.join.noconditionaltask is undefined
hive.optimize.bucketmapjoin=false
hive.optimize.bucketmapjoin.sortedmerge=false
hive.optimize.ppd=true
hive.optimize.index.filter=false
hive.auto.convert.join.noconditionaltask.size=10000000
hive.auto.convert.join=true
hive.optimize.mapjoin.mapreduce is undefined
hive.mapred.local.mem=0
hive.mapjoin.smalltable.filesize=5000000
hive.mapjoin.localtask.max.memory.usage=0.9
hive.optimize.skewjoin=false
hive.optimize.skewjoin.compiletime=false
hive.groupby.skewindata=false
============================
</settings from hiveSettings.sql>
============================
hive.exec.compress.output=false
OK
Time taken: 1.484 seconds
OK
Time taken: 0.819 seconds
Query ID = pristine_20160908113020_955813a0-3562-4b81-a6c9-bcf6a7f45ba1
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/scratch/local/aplic2/apps/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/scratch/local/aplic2/apps/spark_hive-1.6.2/lib/spark-assembly-1.6.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /scratch/attached/1/aloja-bench_3/hive_logs/pristine_20160908113020_955813a0-3562-4b81-a6c9-bcf6a7f45ba1.log
2016-09-08 11:30:29     Starting to launch local task to process map join;      maximum memory = 514523136
2016-09-08 11:30:32     Processing rows:        200000  Hashtable size: 199999  Memory usage:   119680168       percentage:     0.233
2016-09-08 11:30:32     Processing rows:        300000  Hashtable size: 299999  Memory usage:   155328552       percentage:     0.302
2016-09-08 11:30:32     Processing rows:        400000  Hashtable size: 399999  Memory usage:   190976928       percentage:     0.371
2016-09-08 11:30:32     Processing rows:        500000  Hashtable size: 499999  Memory usage:   224078992       percentage:     0.436
2016-09-08 11:30:32     Processing rows:        600000  Hashtable size: 599999  Memory usage:   257113776       percentage:     0.50
2016-09-08 11:30:32     Processing rows:        700000  Hashtable size: 699999  Memory usage:   286264128       percentage:     0.556
Execution failed with exit status: 3
Obtaining error information

Task failed!
Task ID:
  Stage-15

Logs:

/scratch/attached/1/aloja-bench_3/hive_logs/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

As far as I recall, exit code 3 indicates memory issues.
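What I'm considering trying in hiveSettings.sql for this query (just a guess based on the properties printed above: either skip the automatic map-join conversion entirely, or shrink what qualifies for it so the local hashtable stays small):

```sql
-- Sketch only: disable the automatic map-join so the failing local
-- hashtable task is skipped entirely (slower, but avoids the crash)...
set hive.auto.convert.join=false;
-- ...or keep it but lower the small-table threshold so fewer tables qualify:
-- set hive.mapjoin.smalltable.filesize=1000000;
```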

Q18:

new io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF()
new io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF() done
initialize io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF
initialize io.bigdatabenchmark.v1.queries.q18.NegativeSentimentUDF done
Query ID = pristine_20160908120844_8d3a6c02-e7d5-475d-91ab-89ca1f041961
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1473323077826_0116, Tracking URL = http://minerva-105:8088/proxy/application_1473323077826_0116/
Kill Command = /scratch/local/aplic2/apps/hadoop-2.7.1/bin/hadoop job  -kill job_1473323077826_0116
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
2016-09-08 12:08:54,491 Stage-1 map = 0%,  reduce = 0%
2016-09-08 12:09:55,130 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 195.69 sec
2016-09-08 12:09:58,258 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 207.79 sec
2016-09-08 12:10:10,802 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 234.51 sec
2016-09-08 12:11:03,302 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 291.44 sec
MapReduce Total cumulative CPU time: 4 minutes 51 seconds 440 msec
Ended Job = job_1473323077826_0116
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1473323077826_0117, Tracking URL = http://minerva-105:8088/proxy/application_1473323077826_0117/
Kill Command = /scratch/local/aplic2/apps/hadoop-2.7.1/bin/hadoop job  -kill job_1473323077826_0117
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2016-09-08 12:11:15,090 Stage-3 map = 0%,  reduce = 0%
2016-09-08 12:11:23,433 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.49 sec
MapReduce Total cumulative CPU time: 4 seconds 490 msec
Ended Job = job_1473323077826_0117
Loading data to table bigbenchorc.q18_hive_run_query_0_result
Table bigbenchorc.q18_hive_run_query_0_result stats: [numFiles=0, numRows=84255, totalSize=0, rawDataSize=17405743]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3   Cumulative CPU: 291.44 sec   HDFS Read: 75868354 HDFS Write: 17490307 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 4.49 sec   HDFS Read: 17492873 HDFS Write: 14131898 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 55 seconds 930 msec
OK
Time taken: 161.531 seconds
OK
Time taken: 0.19 seconds
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1472)
at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:66)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:683)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I have the default memory properties in mapred-site.xml and 1 GB for the Java Xmx. Any advice on how to scale these values depending on the scale factor?
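One thing I plan to try for the Q18 crash (a guess based on the PermGen stack trace, not something from the BigBench docs): raising the Hive client's heap and PermGen via HADOOP_CLIENT_OPTS before launching the run. The sizes below are arbitrary starting points:

```shell
# Java 7 only: PermGen is a separate pool from the -Xmx heap, so raising
# the heap alone does not cure an "OutOfMemoryError: PermGen space".
export HADOOP_CLIENT_OPTS="-Xmx2g -XX:MaxPermSize=512m"
echo "$HADOOP_CLIENT_OPTS"
```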

Notice that I'm currently using SCALE_FACTOR=1 in the tests.

As always, thank you very much for your help!