Hi
Some of the steps I usually follow:
1. Increase the DFS block size (dfs.block.size) to 256 or 512 MB
2. Set mapred.min.split.size = dfs.block.size (don't raise it beyond that just to get fewer maps)
3. Set mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution to false
4. Set mapred.job.reuse.jvm.num.tasks = -1 (reuse each JVM for an unlimited number of tasks)
5. mapred.child.java.opts – give the child tasks a larger heap
   io.sort.mb – if the task logs show maps spilling more than once, increase it
6. mapred.compress.map.output – true (compress the intermediate map output)
7. Use a combiner, if the aggregation allows it
8. Total task slots (map + reduce) should exceed the number of cores
9. Increase the ulimit (max open files) for the user running Hadoop
10. Use the right Writable type for your data
    Text is very expensive – it has to be parsed on every use
    Reuse Writables instead of allocating new ones per record
11. Implement a RawComparator for your custom key format, so keys sort without being deserialized
12. Create objects in the setup() method of your Mapper/Reducer class and reuse them across map()/reduce() calls
13. mapred.reduce.slowstart.completed.maps – 0.8 (don't start reducers until 80% of the maps are done)
14. tasktracker.http.threads – 2 × number of cores
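
Most of the knobs above live in mapred-site.xml. A sketch with purely illustrative values – every number here should be tuned for your own cluster:

```xml
<!-- mapred-site.xml fragment: values are examples, not recommendations -->
<property>
  <name>mapred.min.split.size</name>
  <value>268435456</value> <!-- 256 MB, matching dfs.block.size -->
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value> <!-- reuse the JVM for unlimited tasks -->
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.8</value>
</property>
<property>
  <name>tasktracker.http.threads</name>
  <value>16</value> <!-- e.g. 2 x 8 cores -->
</property>
```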
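
On step 7: a combiner just pre-aggregates on the map side so less data crosses the network. A minimal pure-Java sketch of that idea – this is not the Hadoop Combiner API, and the class/method names are mine:

```java
import java.util.HashMap;
import java.util.Map;

public class CombinerSketch {
    // Locally pre-aggregate (word, 1) pairs before they would be shuffled
    // to reducers -- the same reduction a map-side combiner performs.
    static Map<String, Integer> combine(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // sum counts per key
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> out = combine(new String[] {"a", "b", "a", "a"});
        System.out.println(out.get("a")); // 3
        System.out.println(out.size());   // 2
    }
}
```

Four records shrink to two key/value pairs before the shuffle; that is the whole win.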
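
On step 11: the point of a raw comparator is to order keys by their serialized bytes without deserializing them first. A standalone sketch for big-endian serialized ints – illustrative only, not Hadoop's WritableComparator:

```java
import java.nio.ByteBuffer;

public class RawIntComparator {
    // Compare two big-endian serialized ints byte-by-byte, skipping
    // deserialization entirely. The first byte carries the sign, so it is
    // compared as signed; the remaining bytes are compared as unsigned.
    static int compare(byte[] a, int aOff, byte[] b, int bOff) {
        int c = Byte.compare(a[aOff], b[bOff]); // signed high byte
        if (c != 0) return c;
        for (int i = 1; i < 4; i++) {
            c = Integer.compare(a[aOff + i] & 0xff, b[bOff + i] & 0xff);
            if (c != 0) return c;
        }
        return 0;
    }

    // Helper for the demo: big-endian serialization of an int.
    static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        System.out.println(compare(serialize(3), 0, serialize(10), 0) < 0); // true
        System.out.println(compare(serialize(-1), 0, serialize(1), 0) < 0); // true
    }
}
```

During the sort phase this runs per key comparison, so avoiding an object allocation and a parse on each call adds up fast.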
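
Steps 10 and 12 are both about allocation: build your output objects once (as you would in setup()) and mutate them per record, instead of calling new inside map()/reduce() millions of times. A hypothetical sketch of the pattern outside Hadoop – IntHolder stands in for a reusable Writable like IntWritable:

```java
public class ReuseSketch {
    // Mutable holder standing in for a reusable Writable.
    static final class IntHolder {
        int value;
        void set(int v) { value = v; }
    }

    // Allocated once up front (as in Mapper.setup()) and reused for
    // every record, instead of "new IntHolder()" per call.
    private final IntHolder out = new IntHolder();

    int[] process(int[] records) {
        int[] emitted = new int[records.length];
        for (int i = 0; i < records.length; i++) {
            out.set(records[i] * 2); // reuse the same holder each iteration
            emitted[i] = out.value;
        }
        return emitted;
    }
}
```

Hadoop's framework already reuses the objects it passes into map(), which is exactly why your code should copy values out rather than hold references to them.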
--