Thanks for the info. I uploaded the sequence file again and checked its properties with fsck. Here is the output:
FSCK started by sabanerjee (auth:SIMPLE) from /127.0.0.1 for path /user/sabanerjee/perftest/input/part-00000 at Sun Mar 30 00:20:48 EDT 2014
.Status: HEALTHY
Total size: 8448509 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 5 (avg. block size 1689701 B)
Minimally replicated blocks: 5 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Sun Mar 30 00:20:48 EDT 2014 in 6 milliseconds
So it looks like there are 5 blocks, as expected. However, the number of parallel map tasks is still 2. I also noticed that the total number of map tasks is now 6, even though it is set to 5 in DkproHadoopDriver.
I searched around a bit more and made the following config changes:
1. Set yarn.nodemanager.resource.cpu-vcores to 5 in yarn-config.xml
2. Removed the option mapreduce.map.memory.mb, which was set to 3072. Since YARN is allocated 6 GB in yarn-config, I thought this per-map limit might be what restricts the number of mappers (my total memory is 8 GB).
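The reasoning behind change 2 can be sketched as simple arithmetic. This is only a rough estimate: the helper name max_concurrent_containers is hypothetical, it ignores the memory taken by the ApplicationMaster container and any scheduler rounding, and the 1024 MB value is purely an illustrative smaller container size, not something read from my configuration.

```python
def max_concurrent_containers(node_memory_mb: int, container_memory_mb: int) -> int:
    """Rough upper bound on how many containers of a given size fit in the
    memory handed to YARN on one node.

    Assumption: ignores the ApplicationMaster container and scheduler
    rounding, so the real number can be lower.
    """
    if container_memory_mb <= 0:
        raise ValueError("container_memory_mb must be positive")
    return node_memory_mb // container_memory_mb


yarn_memory_mb = 6 * 1024      # 6 GB allocated to YARN (from the post)
old_map_memory_mb = 3072       # previous mapreduce.map.memory.mb (from the post)
illustrative_map_memory_mb = 1024  # hypothetical smaller container size

print(max_concurrent_containers(yarn_memory_mb, old_map_memory_mb))
print(max_concurrent_containers(yarn_memory_mb, illustrative_map_memory_mb))
```

With 3072 MB per map task, at most two such containers fit into 6 GB, which matches the 2 parallel map tasks I was seeing; a smaller per-map request would leave room for more.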
My new configurations are at
Any other thoughts?
Regards,
Samudra