I installed RHadoop on the Hortonworks Sandbox 2.1, following the instructions at:
I ran into the problem that MapReduce jobs started but never completed. The map progress in the terminal stayed at 0% (and in Ambari it was stuck at 5%).
I noticed that several other people had the same problem and eventually gave up:
After several hours of struggling, I was finally able to fix it by changing the memory settings. The following resources were especially helpful:
The following settings worked for me:
- yarn.nodemanager.resource.memory-mb: 3072
- yarn.scheduler.minimum-allocation-mb: 512
- yarn.scheduler.maximum-allocation-mb: 3072
- yarn.nodemanager.vmem-pmem-ratio: 10
- mapreduce.map.memory.mb: 1024
- mapreduce.reduce.memory.mb: 1024
- yarn.app.mapreduce.am.resource.mb: 1024
- yarn.app.mapreduce.am.command-opts: -Xmx768m
- mapreduce.task.io.sort.mb: 512
- mapreduce.map.java.opts: -Xmx768m
- mapreduce.reduce.java.opts: -Xmx768m
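For reference, here is how the first few of these settings might look in configuration-file form (a sketch only; the `yarn.*` properties belong in yarn-site.xml and the `mapreduce.*` properties in mapred-site.xml, and on the sandbox the same values can be set through the Ambari UI instead). The remaining properties follow the same pattern:

```
<!-- yarn-site.xml (excerpt) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>

<!-- mapred-site.xml (excerpt) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
```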
Notes:
- Beforehand, I increased the VirtualBox memory of the VM from 4096 MB to 5291 MB.
- When running R, I entered the following after loading rmr2:
rmr.options(backend.parameters = list(
  hadoop = list(D = "mapreduce.map.memory.mb=1024",
                D = "mapreduce.reduce.memory.mb=1024")
))
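With those settings in place, a tiny job such as the standard rmr2 "squares" example (any small job would do; this one is not from my original setup) can serve as a smoke test that map tasks now run to completion rather than hanging at 0%:

```r
library(rmr2)

# Pass the container sizes to the Hadoop backend, as above
rmr.options(backend.parameters = list(
  hadoop = list(D = "mapreduce.map.memory.mb=1024",
                D = "mapreduce.reduce.memory.mb=1024")
))

# Smoke-test job: write 1..10 to HDFS and square each value
small.input <- to.dfs(1:10)
result <- mapreduce(
  input = small.input,
  map   = function(k, v) keyval(v, v^2)
)
from.dfs(result)  # keys 1..10 paired with their squares, if the job completes
```

If this job finishes instead of stalling, the YARN containers are being allocated and scheduled correctly.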
I would like to know whether these settings are optimal for the HDP 2.1 Sandbox, or whether further optimization is possible.