FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Hive converted a join into a locally running and faster 'mapjoin', but ran out of memory while doing so.
maximum memory = 257949696
256MB of memory is not enough for the local map-join task.
There are two bugs responsible for this.
Bug 1)
------
Hive's metric for deciding whether to convert a join miscalculates the required amount of memory. This is especially true for compressed and ORC files: Hive uses the file size on disk as its metric, but compressed tables require considerably more memory in their uncompressed in-memory representation.
You can decrease 'hive.smalltable.filesize' to compensate for the misleading metric, or increase 'hive.mapred.local.mem' to allow the local map task to allocate more memory.
The latter option may run into bug number two if you happen to have an affected Hadoop version.
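For illustration, a minimal sketch of these two session settings (the values are assumptions, not recommendations; check what your cluster can actually spare):

    -- consider only tables below ~10MB on disk for conversion to a map-join
    set hive.smalltable.filesize=10000000;
    -- give the local map-join task a larger heap (value assumed to be in MB;
    -- verify the unit against your Hive version's documentation)
    set hive.mapred.local.mem=1024;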
Bug 2)
------
Hive/Hadoop ignores 'hive.mapred.local.mem'!
(More precisely: a bug in Hadoop 2.2 where hadoop-env.cmd sets the -Xmx parameter multiple times, effectively overriding the user-set hive.mapred.local.mem setting.)
see: https://issues.apache.org/jira/browse/HADOOP-10245

There are 3 workarounds for this bug:
1) Assign more memory to the local Hadoop JVM client (note: this is NOT mapred.map.memory), because the map-join child JVM inherits the parent JVM's settings.
+In Cloudera Manager home, click on the "hive" service,
+then on the Hive service page click on "Configuration".
+Gateway Base Group --(expand)--> Resource Management -> Client Java Heap Size in Bytes -> 1GB
2) Reduce "hive.smalltable.filesize" to ~1MB or below (the exact value depends on your cluster settings for the local JVM).
3) Turn off "hive.auto.convert.join" to prevent Hive from converting joins to a map-join at all.
The preferred solution is 1). If you cannot increase your memory settings to high enough values, you can additionally employ workaround 2).
Workaround 3) is the last resort: turning off the automatic join conversion implies a huge performance penalty.
2) & 3) can be set in Big-Bench/hive/hiveSettings.sql
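A minimal sketch of what the corresponding lines in hiveSettings.sql could look like (the 1MB threshold is an example value, not a recommendation; pick the workaround that matches your situation):

    -- Workaround 2): only tables below ~1MB on disk may be auto-converted to a map-join
    set hive.smalltable.filesize=1000000;
    -- Workaround 3): last resort, disables automatic map-join conversion entirely
    -- set hive.auto.convert.join=false;

In an interactive Hive session you can check which value is currently in effect by issuing the property name without a value, e.g. 'set hive.auto.convert.join;'.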