bart@sandy-quad-1:~/Big-Bench$ ./scripts/bigBench runBenchmark -m 4 -f 1 -s 2
[...]
Does anybody have an idea why query 9 and 20 are still failing?
I've investigated query 9, and my educated guess is that it suffers from the map join problem described in the 'optimization for joins' section of hiveSettings.sql.
To overcome that problem, I tried
set hive.mapjoin.smalltable.filesize=1;
to prevent the common join from being converted to a map join, but that did not work.
From the Hive configuration documentation:
hive.mapjoin.smalltable.filesize
- Default Value: 25000000
- Added In: Hive 0.7.0 with HIVE-1642: hive.smalltable.filesize (replaced by hive.mapjoin.smalltable.filesize in Hive 0.8.1)
- Added In: Hive 0.8.1 with HIVE-2499: hive.mapjoin.smalltable.filesize
The threshold (in bytes) for the input file size of the small tables; if the file size is smaller than this threshold, it will try to convert the common join into map join.
But even when setting hive.smalltable.filesize / hive.mapjoin.smalltable.filesize to 0 or 1, I still get:
2014-09-19 02:15:23 Starting to launch local task to process map join; maximum memory = 257949696
2014-09-19 02:15:25 Dump the side-table into file: file:/tmp/bart/hive_2014-09-19_14-14-45_643_6217887407085482903-1/-local-10006/HashTable-Stage-5/MapJoin-mapfile21--.hashtable
2014-09-19 02:15:25 Upload 1 File to: file:/tmp/bart/hive_2014-09-19_14-14-45_643_6217887407085482903-1/-local-10006/HashTable-Stage-5/MapJoin-mapfile21--.hashtable
2014-09-19 02:15:25 Dump the side-table into file: file:/tmp/bart/hive_2014-09-19_14-14-45_643_6217887407085482903-1/-local-10006/HashTable-Stage-5/MapJoin-mapfile11--.hashtable
2014-09-19 02:15:25 Upload 1 File to: file:/tmp/bart/hive_2014-09-19_14-14-45_643_6217887407085482903-1/-local-10006/HashTable-Stage-5/MapJoin-mapfile11--.hashtable
2014-09-19 02:15:25 Processing rows: 200000 Hashtable size: 199999 Memory usage: 118769408 percentage: 0.46
2014-09-19 02:15:26 Processing rows: 300000 Hashtable size: 299999 Memory usage: 151032040 percentage: 0.586
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-15
[...]
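As a sanity check on the log above, the "percentage" column matches the memory usage divided by the "maximum memory = 257949696" reported when the local task starts. The sketch below is a hypothetical helper (not Hive code) that reproduces that arithmetic for the two sampled lines; it shows the in-memory hash table already occupying roughly 59% of the local task's heap after 300000 rows, shortly before the task fails.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the "percentage" column of the
// map-join local task log. Assumption: percentage = memory usage / maximum
// memory; the logged numbers are consistent with this, but this is not Hive's code.
public class LocalTaskMemory {

    // Value from the "Starting to launch local task ... maximum memory = 257949696" line
    static final long MAX_MEMORY = 257_949_696L;

    // Fraction of the local task's heap in use at a given log line.
    static double usageFraction(long memoryUsageBytes, long maxMemoryBytes) {
        return (double) memoryUsageBytes / maxMemoryBytes;
    }

    public static void main(String[] args) {
        // "Memory usage" values logged at 200000 and 300000 processed rows
        long[] samples = {118_769_408L, 151_032_040L};
        for (long usage : samples) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
            System.out.println(String.format(Locale.ROOT, "%d bytes -> %.3f",
                    usage, usageFraction(usage, MAX_MEMORY)));
        }
    }
}
```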
So it seems the common join is still being converted to a map join.
Suggestions on how to solve this problem are still welcome.
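Read literally, the documented hive.mapjoin.smalltable.filesize rule is a plain size comparison. The sketch below models only that documented sentence; the class and method names are hypothetical and this is not Hive's planner code. Under this rule alone, a threshold of 0 or 1 byte excludes every non-empty table, which is why the local map-join task in the log above is surprising and suggests the conversion is being decided by some other setting or code path.

```java
// Simplified model of the documented rule: "if the file size is smaller than
// this threshold, it will try to convert the common join into map join".
// Hypothetical helper for illustration only; not taken from Hive's source.
public class SmallTableThreshold {

    // Documented default of hive.mapjoin.smalltable.filesize, in bytes
    static final long DEFAULT_THRESHOLD_BYTES = 25_000_000L;

    // True if, under the documented rule, Hive would try a map join.
    static boolean wouldTryMapJoin(long smallTableBytes, long thresholdBytes) {
        return smallTableBytes < thresholdBytes;
    }

    public static void main(String[] args) {
        long tableBytes = 10_000_000L; // hypothetical 10 MB side table
        System.out.println(wouldTryMapJoin(tableBytes, DEFAULT_THRESHOLD_BYTES));
        System.out.println(wouldTryMapJoin(tableBytes, 1L));
        System.out.println(wouldTryMapJoin(tableBytes, 0L));
    }
}
```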
set hive.auto.convert.join=false;
Does anybody have an idea why query 9 and 20 are still failing?
Error: java.lang.NumberFormatException: For input string: "\N"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
at java.lang.Double.valueOf(Double.java:504)
at org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:48)
at org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
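The trace shows Mahout's InputMapper passing the token "\N" to Double.valueOf. Hive writes SQL NULL as the literal string \N in text output by default, and that string is not a valid double. The sketch below reproduces the exception and shows one hypothetical way to guard against it; parseRow is an illustrative helper that assumes whitespace-separated numeric rows, not Mahout's API and not the fix that was pushed to GitHub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustration of the "\N" failure mode and a hypothetical null-aware parser.
public class NullAwareParsing {

    // Hive's default text representation of SQL NULL (the two characters \ and N)
    static final String HIVE_NULL = "\\N";

    // Reproduces the failure from the stack trace.
    static boolean throwsOnHiveNull() {
        try {
            Double.valueOf(HIVE_NULL);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Parses a whitespace-separated row (assumed format). If any field is NULL,
    // the whole row is dropped so that every vector keeps the same dimension.
    static Optional<List<Double>> parseRow(String line) {
        List<Double> values = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.equals(HIVE_NULL)) {
                return Optional.empty();
            }
            if (!token.isEmpty()) {
                values.add(Double.valueOf(token));
            }
        }
        return Optional.of(values);
    }

    public static void main(String[] args) {
        System.out.println(throwsOnHiveNull());
        System.out.println(parseRow("1.5 \\N 2.0"));
        System.out.println(parseRow("1.5 2.0"));
    }
}
```

Dropping the whole row rather than skipping only the NULL field is a deliberate choice here: clustering input vectors are expected to share one dimension. Filtering NULLs in the Hive query that produces the input would be an alternative.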
./scripts/bigBench -q 20 runQuery
q20:
==========================
I have located and fixed the cause of this issue (the update is on GitHub).
But I wonder why it surfaced only now, since previous Big-Bench versions did not show this behavior. Perhaps it is due to an updated, more correct database or a more recent Hive version.
To retest this issue, re-run q20 with: ./scripts/bigBench -q 20 runQuery
bart@sandy-quad-1:~$ hive --version
Hive 0.12.0-cdh5.1.2
Subversion git://ubuntu64-12-04-mk1/var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH5.1.2-Packaging-Hive-2014-08-25_19-47-00/hive-0.12.0+cdh5.1.2+375-1.cdh5.1.2.p0.2~precise/src -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on Mon Aug 25 19:53:31 PDT 2014
From source with checksum 6cc3c71c3f8c2dfbf547fde9f3b23985