Query 9


dwayne lessner

Jan 21, 2016, 7:04:22 PM
to Big Data Benchmark for BigBench
I think this line in the log points to why query 9 keeps failing:

Query ID = root_20160121190101_e96cfce4-af0d-425b-9409-fe6efc6ad263
Total jobs = 5
Execution log at: /tmp/root/root_20160121190101_e96cfce4-af0d-425b-9409-fe6efc6ad263.log
2016-01-21 07:01:42     Starting to launch local task to process map join;      maximum memory = 257425408

I need to change this value. I have tried changing the Java settings in /root/Big-Data-Benchmark-for-Big-Bench/conf/userSettings.conf, but no luck.




Yan Tang

Jan 21, 2016, 10:00:54 PM
to Big Data Benchmark for BigBench
Hi,

You can try enlarging the Map/Reduce task memory for this issue.

Regards,
Yan

On Friday, January 22, 2016 at 8:04:22 AM UTC+8, dwayne lessner wrote:

James

Jan 21, 2016, 10:34:17 PM
to Big Data Benchmark for BigBench

Do you mean the local map join failure for Q09? If yes, please try increasing the 'Client Java Heap Size in Bytes' value (e.g., 2 GB, 3 GB, or larger) on the Hive side via Cloudera Manager; that is what raises the "maximum memory" size.

Todor Ivanov

Jan 22, 2016, 5:42:21 AM
to Big Data Benchmark for BigBench
Hi,

We were able to solve this by adding the parameters -Xms2147483648 and -Xmx2147483648 (increasing the Heap size to 2GB) to the environment variable HADOOP_CLIENT_OPTS in the hive-env.sh file.
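For reference, the resulting line in hive-env.sh would look roughly like this (a sketch based on the values above; 2147483648 bytes = 2 GB):

```shell
# Sketch of the hive-env.sh change: fix the client JVM's initial (-Xms)
# and maximum (-Xmx) heap at 2 GB so the local map-join task has room.
export HADOOP_CLIENT_OPTS="-Xms2147483648 -Xmx2147483648 $HADOOP_CLIENT_OPTS"
```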

Best Regards,
Todor 

dwayne lessner

Jan 22, 2016, 12:26:57 PM
to Big Data Benchmark for BigBench
Thanks everyone

I have already changed the Client Java Heap Size in Bytes to 12 GB. I made the change in both Cloudera Manager and hive-env.sh, but it doesn't seem to take effect; Hive always seems to use some stock config. I copied hive-env.sh from a template, made the changes, and restarted the Hive service.




hive.stats.fetch.partition.stats=true
hive.script.operator.truncate.env=false
hive.compute.query.using.stats=false
hive.vectorized.execution.enabled=false
hive.vectorized.execution.reduce.enabled=true
hive.stats.autogather=true
mapreduce.input.fileinputformat.split.minsize=1
mapreduce.input.fileinputformat.split.maxsize=256000000
hive.exec.reducers.bytes.per.reducer=256000000
hive.exec.reducers.max=1009
hive.exec.parallel=false
hive.exec.parallel.thread.number=8
hive.exec.compress.intermediate=false
hive.exec.compress.output=false
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
hive.default.fileformat=TextFile
hive.auto.convert.sortmerge.join=false
hive.auto.convert.sortmerge.join.noconditionaltask is undefined
hive.optimize.bucketmapjoin=false
hive.optimize.bucketmapjoin.sortedmerge=false
hive.auto.convert.join.noconditionaltask.size=10000000
hive.auto.convert.join=true
hive.optimize.mapjoin.mapreduce is undefined
hive.mapred.local.mem=0
hive.mapjoin.smalltable.filesize=25000000
hive.mapjoin.localtask.max.memory.usage=0.9
hive.optimize.skewjoin=false
hive.optimize.skewjoin.compiletime=false
hive.optimize.ppd=true
hive.optimize.ppd.storage=true
hive.ppd.recognizetransivity=true
hive.optimize.index.filter=false
hive.optimize.sampling.orderby.number=1000
hive.optimize.sampling.orderby.percent=0.1
bigbench.hive.optimize.sampling.orderby=true
bigbench.hive.optimize.sampling.orderby.number=20000
bigbench.hive.optimize.sampling.orderby.percent=0.1
hive.groupby.skewindata=false
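As a rough illustration of how these settings relate to the failure in the first post (an editorial sketch, not from the thread): hive.mapjoin.localtask.max.memory.usage=0.9 means the local map-join task aborts once its hashtable reaches 90% of the client JVM's maximum memory, so with the stock heap the budget is small:

```shell
# With the stock client heap from the first post
# (maximum memory = 257425408 bytes, ~245 MiB) and
# hive.mapjoin.localtask.max.memory.usage=0.9, the local task
# aborts once the hashtable nears 90% of that heap.
max_memory=257425408
threshold=$(( max_memory * 9 / 10 ))   # 90% in integer arithmetic
echo "local task aborts near ${threshold} bytes"
```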

-DL

Yan Tang

Jan 22, 2016, 8:03:33 PM
to Big Data Benchmark for BigBench
You can modify this configuration directly in Cloudera Manager: search for the "Client Java Heap Size in Bytes" property under Hive > Configuration.

Regards,
Yan

On Saturday, January 23, 2016 at 1:26:57 AM UTC+8, dwayne lessner wrote:

James

Jan 24, 2016, 11:00:59 PM
to Big Data Benchmark for BigBench

If you modify the value of 'Client Java Heap Size in Bytes' and then restart via Cloudera Manager (CM), you will see the change reflected in HADOOP_CLIENT_OPTS in /etc/hive/conf/hive-env.sh. Could you please double-check your steps?
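One way to check (a hypothetical sketch: the sample string below stands in for a `grep HADOOP_CLIENT_OPTS /etc/hive/conf/hive-env.sh` on a real node) is to pull the -Xmx value out of the line CM writes:

```shell
# Extract the -Xmx heap size (in bytes) from a HADOOP_CLIENT_OPTS
# string like the one Cloudera Manager writes into hive-env.sh.
opts='-Xmx2147483648 -Djava.net.preferIPv4Stack=true'
heap=$(printf '%s\n' "$opts" | sed -n 's/.*-Xmx\([0-9]*\).*/\1/p')
echo "client heap: ${heap} bytes"
```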

James

Jan 24, 2016, 11:03:52 PM
to Big Data Benchmark for BigBench
BTW, do you mean that the maximum memory still shows the stock value, with no change, after correctly modifying the 'Client Java Heap Size in Bytes' parameter via CM?

dwayne lessner

Feb 1, 2016, 9:46:44 PM
to Big Data Benchmark for BigBench
Not sure why email notification isn't working, but I appreciate the help. I will test and confirm this tonight.

dwayne lessner

Feb 1, 2016, 10:45:10 PM
to Big Data Benchmark for BigBench
export HADOOP_CLIENT_OPTS="-Xmx8589934592 -XX:MaxPermSize=5128M -Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"



dwayne lessner

Feb 1, 2016, 10:52:12 PM
to Big Data Benchmark for BigBench
Still not working with 8 GB of client RAM

Duration:  0h 0m 27s
q09_hive_power_test_0 FAILED exit code: 3
----- result -----
EMPTY  bytes: 0
to display: hadoop fs -cat /user/root/benchmarks/bigbench/queryResults/q09_hive_power_test_0_result/*
----- logs -----
time&status: /root/Big-Data-Benchmark-for-Big-Bench/logs/times.csv
full log: /root/Big-Data-Benchmark-for-Big-Bench/logs/q09_hive_power_test_0.log

Michael Frank

Feb 3, 2016, 12:58:09 PM
to Big Data Benchmark for BigBench
Hi Dwayne,

From your other post (https://groups.google.com/d/msg/big-bench/msMBmR5ahYk/4cwjKJ2FEQAJ) I take it you are running on a Cloudera distribution.
Is this line something you copied from /etc/hive/conf/hive-env.sh as proof of your settings, as requested by James, or is it the result of an edit you made manually to that file yourself?

export HADOOP_CLIENT_OPTS="-Xmx8589934592 -XX:MaxPermSize=5128M -Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS"

In the latter case, a word of caution: in a managed Hadoop stack, refrain from manually modifying files like /etc/hive/conf/hive-env.sh! Virtually all such files are under the control of Cloudera's management.
Manual edits may not just be (silently) overwritten by e.g. Cloudera Manager, but may also conflict with other settings you make in Cloudera Manager; as a result, you may have prevented your settings from taking effect.
Additionally, when editing such files manually, they must be replicated across your cluster nodes by hand, and your cluster needs a restart for the changes to take effect.

To trace your q9 mapjoin memory issue I would require the full uncensored log file:
 /root/Big-Data-Benchmark-for-Big-Bench/logs/q09_hive_power_test_0.log

In this file, the line which tells you if your settings really have taken effect is this one:
2016-01-28 12:47:51    Starting to launch local task to process map join;    maximum memory = 3340763136
As you can see, I am running with a 3 GB 'Client Java Heap Size in Bytes', which works fine up to scale factor 3000, i.e., 3 TB of data (I am running CDH 5.5.1).
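As a quick sanity check (an editorial sketch), the logged "maximum memory" value is the client JVM's max heap in bytes, so converting it confirms it matches a ~3 GB heap setting:

```shell
# Convert the logged "maximum memory = 3340763136" to MiB to confirm
# it corresponds to a ~3 GB client heap.
bytes=3340763136
echo "$(( bytes / 1024 / 1024 )) MiB"
```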


Log snippet of q09_hive_power_test_0.log:
Query ID = bmuser1_20160128124747_7258dc1b-a393-4949-b462-36e1f6772e57
Total jobs = 1
Execution log at: /tmp/bmuser1/bmuser1_20160128124747_7258dc1b-a393-4949-b462-36e1f6772e57.log
2016-01-28 12:47:51    Starting to launch local task to process map join;    maximum memory = 3340763136
2016-01-28 12:47:53    Processing rows:    200000    Hashtable size:    199999    Memory usage:    305474776    percentage:    0.091
2016-01-28 12:47:53    Processing rows:    300000    Hashtable size:    299999    Memory usage:    338606224    percentage:    0.101
2016-01-28 12:47:53    Processing rows:    400000    Hashtable size:    399999    Memory usage:    369640496    percentage:    0.111
2016-01-28 12:47:53    Processing rows:    500000    Hashtable size:    499999    Memory usage:    408433336    percentage:    0.122
2016-01-28 12:47:53    Processing rows:    600000    Hashtable size:    599999    Memory usage:    266637872    percentage:    0.08
2016-01-28 12:47:53    Processing rows:    700000    Hashtable size:    699999    Memory usage:    297020432    percentage:    0.089
2016-01-28 12:47:53    Processing rows:    800000    Hashtable size:    799999    Memory usage:    334998632    percentage:    0.10
2016-01-28 12:47:53    Processing rows:    900000    Hashtable size:    899999    Memory usage:    365381200    percentage:    0.109
2016-01-28 12:47:53    Processing rows:    1000000    Hashtable size:    999999    Memory usage:    403359408    percentage:    0.121
2016-01-28 12:47:53    Processing rows:    1100000    Hashtable size:    1099999    Memory usage:    442130600    percentage:    0.132
2016-01-28 12:47:53    Processing rows:    1200000    Hashtable size:    1199999    Memory usage:    480108816    percentage:    0.144
2016-01-28 12:47:53    Processing rows:    1300000    Hashtable size:    1299999    Memory usage:    510491368    percentage:    0.153
2016-01-28 12:47:53    Processing rows:    1400000    Hashtable size:    1399999    Memory usage:    548469584    percentage:    0.164
2016-01-28 12:47:53    Processing rows:    1500000    Hashtable size:    1499999    Memory usage:    578852136    percentage:    0.173
2016-01-28 12:47:54    Processing rows:    1600000    Hashtable size:    1599999    Memory usage:    616830336    percentage:    0.185
2016-01-28 12:47:54    Processing rows:    1700000    Hashtable size:    1699999    Memory usage:    647212888    percentage:    0.194
2016-01-28 12:47:57    Processing rows:    1800000    Hashtable size:    1799999    Memory usage:    620329400    percentage:    0.186
2016-01-28 12:47:57    Processing rows:    1900000    Hashtable size:    1899999    Memory usage:    651137264    percentage:    0.195

Cheers,
Michael