how to use mapred.min.split.size option ?

how to use mapred.min.split.size option ? Mapred Learn 5/24/11 5:17 PM
Hi,
My input splits are only a few MB each.
I want to submit 1 GB of input to every mapper. How can I do this?
Currently each mapper gets one input split, which results in many small map-output files.

I tried setting -Dmapred.map.min.split.size=<number>, but it still does not take effect.
Thanks,
-JJ
Re: how to use mapred.min.split.size option ? Mapred Learn 5/25/11 7:08 AM
Resending my question from 5/24.



Re: how to use mapred.min.split.size option ? Juwei Shi 5/25/11 7:49 AM
The input split size is determined by mapred.min.split.size, dfs.block.size and mapred.map.tasks.

goalSize  = totalSize / mapred.map.tasks
minSize   = max(mapred.min.split.size, minSplitSize)
splitSize = max(minSize, min(goalSize, dfs.block.size))

minSplitSize is set by each InputFormat (for example, SequenceFileInputFormat).

You may want to refer to FileInputFormat.java for more details.
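The formula above can be sketched in Python as a minimal model of the old-API (org.apache.hadoop.mapred) FileInputFormat calculation in 0.20.2. The function and parameter names are hypothetical, and the InputFormat's own minSplitSize is assumed to be 1, as for a plain FileInputFormat subclass.

```python
MB = 1024 * 1024


def compute_split_size(total_size, map_tasks, min_split_size_conf,
                       block_size, min_split_size_format=1):
    """Sketch of the old-API split-size formula (Hadoop 0.20.2).

    Names are hypothetical; min_split_size_format stands in for the
    InputFormat's own minSplitSize (assumed to be 1 by default).
    """
    # goalSize = totalSize / mapred.map.tasks (integer division, 0 treated as 1)
    goal_size = total_size // max(map_tasks, 1)
    # minSize = max(mapred.min.split.size, minSplitSize)
    min_size = max(min_split_size_conf, min_split_size_format)
    # splitSize = max(minSize, min(goalSize, dfs.block.size))
    return max(min_size, min(goal_size, block_size))


# Values from this thread: ten 233 MB files, 64 MB blocks, min split size 1 GB.
# mapred.map.tasks=2 is an assumed value; minSize dominates either way.
print(compute_split_size(10 * 233 * MB, 2, 1_000_000_000, 64 * MB))  # 1000000000
# With mapred.min.split.size left at 1, the block size wins instead.
print(compute_split_size(10 * 233 * MB, 2, 1, 64 * MB))  # 67108864
```

Note that the computed split size is only an upper bound per split: a split is cut from a single file, so a 1 GB split size still cannot turn a 233 MB file into a 1 GB split.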


--
- Juwei Shi
Re: how to use mapred.min.split.size option ? Juwei Shi 5/25/11 7:51 AM
The formulas above apply to Hadoop 0.20.2.

--
- Juwei Shi (史巨伟)
Re: how to use mapred.min.split.size option ? Mapred Learn 5/25/11 7:59 AM
Thanks Juwei!
I will go through this.

Re: how to use mapred.min.split.size option ? Mapred Learn 5/25/11 9:58 AM
I set mapred.min.size=1000000000L, i.e. 1 GB. Each input file is 233 MB and the block size is 64 MB.
With these values I expected my split size to take effect, so that 4 input files would be combined into a roughly 1 GB input split. That does not happen: I get 10 mappers, each corresponding to one 233 MB file.

Re: how to use mapred.min.split.size option ? Harsh J 5/25/11 10:05 AM
This is the correct behavior. Regular FileInputFormat derivatives
produce at least one mapper per file (one file == one mapper, at
minimum); a split never spans files. You need to look at
CombineFileInputFormat etc. to get multiple files per map task.
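Why one file always yields at least one mapper can be seen in a simplified model of how FileInputFormat cuts splits: each file is processed on its own, and the loop follows the SPLIT_SLOP rule of 0.20.2's FileInputFormat (the last split of a file may be up to 10% larger than splitSize). The helper names are hypothetical, and zero-length files, non-splittable files and block locations are ignored.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def splits_for_file(file_len, split_size):
    """Split lengths cut from ONE file (simplified model, hypothetical name)."""
    splits = []
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining:
        splits.append(remaining)  # tail of the file becomes its own split
    return splits


def splits_for_input(file_lens, split_size):
    # Files are split independently, so no split ever contains two files.
    return [s for n in file_lens for s in splits_for_file(n, split_size)]


files = [233 * MB] * 10  # the ten input files from this thread
print(len(splits_for_input(files, 1_000_000_000)))  # 10: one split per file
print(len(splits_for_input(files, 64 * MB)))        # 40: four splits per file
```

Raising the split size can only reduce each file to a single split; it never merges files, which is why 10 files gave 10 mappers.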

--
Harsh J

Re: how to use mapred.min.split.size option ? Mapred Learn 5/25/11 11:34 AM
Hi Harsh,
I just implemented a CombineFileInputFormat and its record reader for my case.

Now my input has 10 files of 233 MB each, and with this format my job runs just 1 mapper that processes all of them.

How can I control this by split size? For example, if I make every split 1 GB, I want 3 mappers for these 10 files, not 1.
 
Thanks,
-JJ
Re: how to use mapred.min.split.size option ? Mapred Learn 5/25/11 12:44 PM
Sorry, it is working. I was not giving the right value with -Dmapred.max.split.size.
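The effect of mapred.max.split.size on a CombineFileInputFormat can be illustrated with a deliberately simplified, single-node model: blocks from consecutive files are packed into one split until it reaches the maximum, and a maximum of 0 (unset) means no limit, which matches the single mapper seen above. The real CombineFileInputFormat also groups blocks by node and rack and applies per-node and per-rack minimums; those details are left out, and the helper names are hypothetical.

```python
MB = 1024 * 1024


def file_blocks(file_len, block_size):
    """Block lengths of one file (hypothetical helper)."""
    full, rest = divmod(file_len, block_size)
    return [block_size] * full + ([rest] if rest else [])


def combine_splits(file_lens, block_size, max_split_size):
    """Simplified single-node packing of blocks into combined splits.

    max_split_size == 0 is treated as "no limit", so every block ends up
    in one split. Node/rack grouping is deliberately ignored.
    """
    splits, current = [], 0
    for length in file_lens:
        for block in file_blocks(length, block_size):
            current += block
            # Close the split once it reaches the maximum; it can overshoot
            # by less than one block.
            if max_split_size and current >= max_split_size:
                splits.append(current)
                current = 0
    if current:
        splits.append(current)  # leftover blocks form a final, smaller split
    return splits


files = [233 * MB] * 10
print(len(combine_splits(files, 64 * MB, 0)))          # 1 mapper
print(len(combine_splits(files, 64 * MB, 1024 * MB)))  # 3 mappers
```

In this model a 1 GB maximum yields two splits of about 1 GB and a 233 MB remainder, i.e. the 3 mappers asked for earlier in the thread.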
 
Thanks for your help !

Re: how to use mapred.min.split.size option ? khan 4/20/13 9:56 AM
Dear,

What is the correct statement for -Dmapred.max.split.size?
I have a 60 MB data file and want to divide it further into smaller splits (i.e. 15 MB each), so that 4 mappers process it. What is the correct statement?

Many thanks
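A sketch of the arithmetic behind this question, assuming the job uses the new-API org.apache.hadoop.mapreduce.lib.input.FileInputFormat, whose 0.20.2 formula is splitSize = max(mapred.min.split.size, min(mapred.max.split.size, blockSize)). Under that assumption, a 15 MB maximum (-Dmapred.max.split.size=15728640, i.e. 15 * 1024 * 1024 bytes) gives four splits for a 60 MB file; the -D option is only picked up if the driver parses generic options, e.g. via ToolRunner. The old-API formula earlier in this thread has no maximum term; there, mapred.map.tasks=4 would instead make goalSize 15 MB. The helper names are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1
LONG_MAX = 2**63 - 1  # default when mapred.max.split.size is unset


def new_api_split_size(block_size, min_size=1, max_size=LONG_MAX):
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))


def count_splits(file_len, split_size):
    """Number of splits cut from one file, using the SPLIT_SLOP rule."""
    remaining, count = file_len, 0
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    return count + (1 if remaining else 0)


# The 60 MB file fits in one 64 MB block; cap splits at 15 MB.
size = new_api_split_size(64 * MB, max_size=15 * MB)
print(size, count_splits(60 * MB, size))  # 15728640 4
```

Without the maximum, the block size (64 MB) wins and the whole 60 MB file becomes a single split.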