How can I increase the amount of input per map task?


zhang辉张

Oct 15, 2012, 5:00:44 AM
to hadoo...@googlegroups.com
As the subject says: how can I reduce the number of map tasks, i.e. increase the amount of input each map task processes?

feng lu

Oct 15, 2012, 5:20:53 AM
to hadoo...@googlegroups.com
Hi,

You can do this by controlling the InputFormat's getSplits method.

 /** 
   * Logically split the set of input files for the job.  
   * 
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   * 
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract 
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;
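
If you are using a FileInputFormat subclass (TextInputFormat is the default), you usually do not need a custom getSplits at all: raising the minimum split size makes getSplits return fewer, larger splits, so fewer map tasks run. A minimal sketch using the new (org.apache.hadoop.mapreduce) API; the driver class name, input path, and the 512 MB figure are illustrative, not from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class FewerMapsDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "fewer-maps-example");
        job.setJarByClass(FewerMapsDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Ask FileInputFormat.getSplits() for splits of at least 512 MB,
        // which reduces the number of map tasks accordingly.
        FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);

        // ... set mapper/reducer/output classes and paths as usual,
        // then job.waitForCompletion(true).
      }
    }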
  


On Mon, Oct 15, 2012 at 5:00 PM, zhang辉张 <zhang...@gmail.com> wrote:
As the subject says: how can I reduce the number of map tasks, i.e. increase the amount of input each map task processes?




--
Don't Grow Old, Grow Up... :-)

zhang辉张

Oct 15, 2012, 5:27:16 AM
to hadoo...@googlegroups.com
I submitted the job with Hadoop Streaming. Is there a parameter I can use to change how much data is fed to each map task?

feng lu

Oct 15, 2012, 5:47:51 AM
to hadoo...@googlegroups.com
Streaming should also let you specify the input/output format. I don't know which InputFormat you are using, or whether its splits can be tuned with a parameter; take a look at the documentation for your InputFormat.

zhang辉张

Oct 15, 2012, 5:51:36 AM
to hadoo...@googlegroups.com
Right, it can be specified. I'm probably using the default InputFormat, TextInputFormat I think, though I'm not entirely sure.

I'm going to try the mapred.min.split.size parameter and set it to a larger value to see whether it helps. Has anyone here had experience with it?

feng lu

Oct 15, 2012, 6:06:37 AM
to hadoo...@googlegroups.com
Are you using FileInputFormat? That parameter can be set, but the split size will not go below the block size.
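
For reference, the split size FileInputFormat ends up using is roughly the following (a simplified sketch of computeSplitSize in the Hadoop source), which is why the minimum only has an effect once it exceeds the block size:

    // Simplified sketch of FileInputFormat's split-size decision:
    // the split stays at blockSize unless maxSize pulls it down or
    // minSize (mapred.min.split.size) pushes it above the block size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
      return Math.max(minSize, Math.min(maxSize, blockSize));
    }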


Hailang You

Feb 3, 2016, 10:42:12 PM
to Hadoop In China
Just use CombineTextInputFormat; it merges small files into larger splits.
    job.setInputFormatClass(CombineTextInputFormat.class);

    // Per-node minimum split size: 0 (no minimum).
    job.getConfiguration().setLong(CombineFileInputFormat.SPLIT_MINSIZE_PERNODE, 0 * 1024 * 1024);
    // Per-rack minimum split size: 256 MB.
    job.getConfiguration().setLong(CombineFileInputFormat.SPLIT_MINSIZE_PERRACK, 256 * 1024 * 1024);
    // Maximum size of a combined split: 512 MB.
    job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", 512 * 1024 * 1024);

On Monday, October 15, 2012 at 5:00:44 PM UTC+8, atomzhang wrote:
As the subject says: how can I reduce the number of map tasks, i.e. increase the amount of input each map task processes?