How can I increase the amount of input per map task?


zhang辉张

Oct 15, 2012, 5:00:44 AM
to hadoo...@googlegroups.com
As the subject says: how can I reduce the number of map tasks, i.e. increase the amount of input each map task processes?

feng lu

Oct 15, 2012, 5:20:53 AM
to hadoo...@googlegroups.com
Hi,

You can do this by controlling the InputFormat's getSplits method.

 /** 
   * Logically split the set of input files for the job.  
   * 
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   * 
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract 
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;
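
If you are using a FileInputFormat subclass (TextInputFormat is the default), you usually do not need a custom getSplits at all: raising the minimum split size makes getSplits return fewer, larger splits, so fewer map tasks run. A minimal sketch using the new (org.apache.hadoop.mapreduce) API; the driver class name, input path, and the 512 MB figure are illustrative, not from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class FewerMapsDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "fewer-maps-example");
        job.setJarByClass(FewerMapsDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Ask FileInputFormat.getSplits() for splits of at least 512 MB,
        // which reduces the number of map tasks accordingly.
        FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);

        // ... set mapper/reducer/output classes and paths as usual,
        // then job.waitForCompletion(true).
      }
    }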
  


On Mon, Oct 15, 2012 at 5:00 PM, zhang辉张 <zhang...@gmail.com> wrote:
As the subject says: how can I reduce the number of map tasks, i.e. increase the amount of input each map task processes?




--
Don't Grow Old, Grow Up... :-)

zhang辉张

Oct 15, 2012, 5:27:16 AM
to hadoo...@googlegroups.com
I submitted the job with Hadoop Streaming. Is there a parameter I can use to change how much data is fed to each map task?

feng lu

Oct 15, 2012, 5:47:51 AM
to hadoo...@googlegroups.com
Streaming should also let you specify the input/output format. I don't know which InputFormat you are using, or whether its splits can be tuned with a parameter; take a look at the documentation for your InputFormat.

zhang辉张

Oct 15, 2012, 5:51:36 AM
to hadoo...@googlegroups.com
Right, it can be specified. I'm probably using the default InputFormat, TextInputFormat I think, though I'm not entirely sure.

I'm going to try the mapred.min.split.size parameter and set it to a larger value to see whether it helps. Has anyone here had experience with it?

feng lu

Oct 15, 2012, 6:06:37 AM
to hadoo...@googlegroups.com
Are you using FileInputFormat? That parameter can be set, but the split size will not go below the block size.
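
For reference, the split size FileInputFormat ends up using is roughly the following (a simplified sketch of computeSplitSize in the Hadoop source), which is why the minimum only has an effect once it exceeds the block size:

    // Simplified sketch of FileInputFormat's split-size decision:
    // the split stays at blockSize unless maxSize pulls it down or
    // minSize (mapred.min.split.size) pushes it above the block size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
      return Math.max(minSize, Math.min(maxSize, blockSize));
    }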


Hailang You

Feb 3, 2016, 10:42:12 PM
to Hadoop In China
Just use CombineTextInputFormat; it merges small files into larger splits.
    job.setInputFormatClass(CombineTextInputFormat.class);

    // Per-node minimum split size: 0 (no minimum).
    job.getConfiguration().setLong(CombineFileInputFormat.SPLIT_MINSIZE_PERNODE, 0 * 1024 * 1024);
    // Per-rack minimum split size: 256 MB.
    job.getConfiguration().setLong(CombineFileInputFormat.SPLIT_MINSIZE_PERRACK, 256 * 1024 * 1024);
    // Maximum size of a combined split: 512 MB.
    job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", 512 * 1024 * 1024);

On Monday, October 15, 2012 at 5:00:44 PM UTC+8, atomzhang wrote:
As the subject says: how can I reduce the number of map tasks, i.e. increase the amount of input each map task processes?