I don't even know where to begin providing details on this one.
The mapper runs for an hour and completes at 100% with no errors.
The reducer runs for an hour and completes at 100% with no errors.
Then the ApplicationMaster kills the reducer's container and runs the reducer all over again.
The only error details I get are:
AttemptID:attempt_1451522080526_0003_r_000000_1 Timed out after 300 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
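For anyone reading the attempt ID in that message, here is a minimal sketch of how I read it. It assumes the usual layout attempt_<cluster timestamp>_<job sequence>_<m or r>_<task index>_<attempt index>, with the attempt index counted from zero. The class and method names (AttemptIdParser, describe) are made up for illustration and are not part of Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptIdParser {
    // Assumed layout: attempt_<clusterTimestamp>_<jobSeq>_<m|r>_<taskIndex>_<attemptIndex>
    private static final Pattern ATTEMPT =
            Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    // Returns a short human-readable summary of a task attempt ID.
    public static String describe(String attemptId) {
        Matcher m = ATTEMPT.matcher(attemptId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized attempt ID: " + attemptId);
        }
        String type = m.group(3).equals("r") ? "reduce" : "map";
        int task = Integer.parseInt(m.group(4));
        int attempt = Integer.parseInt(m.group(5)); // zero-based, so 1 is the second run
        return type + " task " + task + ", attempt " + attempt
                + " (run #" + (attempt + 1) + ")";
    }

    public static void main(String[] args) {
        // The ID copied from the error above.
        System.out.println(describe("attempt_1451522080526_0003_r_000000_1"));
        // prints "reduce task 0, attempt 1 (run #2)"
    }
}
```

Read that way, the message is about the second run of the single reduce task, which matches the reducer being started over. Exit code 143 is 128 + 15, i.e. the container was stopped with SIGTERM rather than crashing by itself.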
It's worth noting I did have to set the following args to get this far:
mapreduce.map.memory.mb=8576
mapreduce.reduce.java.opts=-Xmx7808m
mapreduce.reduce.memory.mb=8576
yarn.app.mapreduce.am.command-opts=-Xmx7808m
yarn.app.mapreduce.am.resource.mb=8576
mapreduce.map.java.opts=-Xmx7808m
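To make the relationship between those numbers explicit: as I understand it, the *.memory.mb and *.resource.mb values are the YARN container sizes, and the matching -Xmx values are the JVM heaps that have to fit inside them. The sketch below only does the arithmetic for the values above; the class and method names are hypothetical, and it does not talk to Hadoop at all.

```java
import java.util.Locale;

public class HeapHeadroom {
    // Memory left in the container after the JVM heap is taken out, in MB.
    static int headroomMb(int containerMb, int heapMb) {
        return containerMb - heapMb;
    }

    // Fraction of the container that the JVM heap is allowed to use.
    static double heapFraction(int containerMb, int heapMb) {
        return (double) heapMb / containerMb;
    }

    public static void main(String[] args) {
        int containerMb = 8576; // mapreduce.reduce.memory.mb (same value for map and AM)
        int heapMb = 7808;      // -Xmx7808m from mapreduce.reduce.java.opts

        System.out.println(String.format(Locale.ROOT,
                "headroom = %d MB, heap fraction = %.3f",
                headroomMb(containerMb, heapMb),
                heapFraction(containerMb, heapMb)));
        // prints "headroom = 768 MB, heap fraction = 0.910"
    }
}
```

So each container has 768 MB beyond the heap for everything else the JVM needs (thread stacks, metaspace, direct buffers), with the heap at roughly 91% of the container.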