Re: How to handle input consisting of a large batch of small files


leon...@gmail.com

Jul 5, 2012, 7:40:47 PM
to hadoo...@googlegroups.com
The simplest approach would probably be to enable JVM reuse.
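For reference, a minimal sketch of what enabling JVM reuse could look like with the classic (MRv1) API; it assumes Hadoop 1.x, where JobConf.setNumTasksToExecutePerJvm(int) corresponds to the mapred.job.reuse.jvm.num.tasks property:

  // Minimal sketch: turning on JVM reuse with the classic (MRv1) API.
  // Assumes Hadoop 1.x; a value of -1 lets one JVM run any number of the
  // job's tasks instead of forking a fresh JVM per map.
  import org.apache.hadoop.mapred.JobConf;

  public class JvmReuseSketch {
      public static void enableJvmReuse(JobConf conf) {
          // Equivalent to setting mapred.job.reuse.jvm.num.tasks=-1 in the job conf.
          conf.setNumTasksToExecutePerJvm(-1);
      }
  }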



-- Sent from my HP Veer


On 2012-7-4 09:49, zhang辉张 <zhang...@gmail.com> wrote:

hi,

     We're currently running into a problem: the input is a large batch of small files, and each map task decodes one small file. The decoding itself is fast, but the overhead of starting a map task is substantial.

     I'd like to ask everyone: how do you handle input that consists of a large batch of small files?


zhang辉张

Jul 10, 2012, 7:24:30 AM
to hadoo...@googlegroups.com
Has anyone seen the error "ENOENT: No such file or directory" after enabling JVM reuse?

zhang辉张

Jul 13, 2012, 2:29:47 AM
to hadoo...@googlegroups.com
Has anyone used CombineFileInputFormat, and did you run into OutOfMemory errors when using it?

There are more than 60,000 small files under one directory, each no larger than 10 MB.

feng lu

Jul 13, 2012, 10:06:47 PM
to hadoo...@googlegroups.com
60,000 files at up to 10 MB each still adds up to roughly 600 GB. Did you set CombineFileInputFormat's maxSplitSize? How many InputSplits does its getSplits return? Could the problem be that a single map task ends up processing too much data?
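A rough sketch of how one could check the number of planned splits before submitting, using the old mapred API (Hadoop 1.x assumed):

  // Rough sketch: print how many InputSplits the configured InputFormat
  // would produce for this job, using the old (mapred) API.
  import org.apache.hadoop.mapred.InputFormat;
  import org.apache.hadoop.mapred.InputSplit;
  import org.apache.hadoop.mapred.JobConf;

  public class SplitCountCheck {
      public static void printSplitCount(JobConf conf) throws Exception {
          InputFormat<?, ?> format = conf.getInputFormat();
          // The second argument is only a hint for the desired number of splits.
          InputSplit[] splits = format.getSplits(conf, 1);
          System.out.println("Planned InputSplits: " + splits.length);
          // Very few splits for ~600 GB of input means each map task handles a
          // huge chunk of data, which could be one cause of OutOfMemory errors.
      }
  }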

2012/7/13 zhang辉张 <zhang...@gmail.com>



--
Don't Grow Old, Grow Up... :-)

zhang辉张

Jul 19, 2012, 8:04:57 AM
to hadoo...@googlegroups.com
How do you set CombineFileInputFormat's maxSplitSize?
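Not an authoritative answer, but a sketch of one way it could be done with the Hadoop 1.x (mapred) API: CombineFileInputFormat exposes a protected setMaxSplitSize(long) that a subclass can call in its constructor, and its getSplits is expected to fall back to the mapred.max.split.size property when no explicit value was set (property name assumed for 1.x):

  // Sketch, assuming Hadoop 1.x: cap each combined split at 256 MB via the
  // mapred.max.split.size property, which CombineFileInputFormat's getSplits
  // is expected to read when no explicit setMaxSplitSize() value was given.
  import org.apache.hadoop.mapred.JobConf;

  public class MaxSplitSizeSketch {
      public static void configure(JobConf conf) {
          // With ~600 GB of input this should yield on the order of
          // 600 GB / 256 MB ≈ 2400 splits instead of one split per small file.
          conf.setLong("mapred.max.split.size", 256L * 1024 * 1024);
      }
  }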

2012/7/14 feng lu <amuse...@gmail.com>

zhang辉张

Jul 19, 2012, 9:14:58 AM
to hadoo...@googlegroups.com


After resetting maxSplitSize, I'm now getting an I/O error.

The error log contains the following:

 R/W/S=150/147/0 in:6=150/24 [rec/s] out:6=147/24 [rec/s]

What does this mean?

2012/7/19 zhang辉张 <zhang...@gmail.com>