cs402pku
Looking for pointers: how should the file ID (in integer form) be implemented?
Krasus C
Jul 20, 2014, 4:16:55 AM
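I can get the file name, but I can't figure out how to do this part. I tried keeping a global variable and adding 1 to it for each new name, but that approach doesn't seem to work.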
郭行健
Jul 20, 2014, 4:54:24 AM
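You need to build an auxiliary dictionary. Since file IDs are non-negative integers, you can simply put all the file names (or Path objects) into an array and use each entry's index as its ID. The array can be created in the Mapper's setup method, or it can be prepared in advance and handed to every Mapper task by the Hadoop runtime as a Distributed Cache (for the Distributed Cache API, see Chapter 8 of Hadoop: The Definitive Guide). The advantage of the latter is that it will not cause an out-of-memory error when the number of files is very large.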
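The array-as-dictionary idea can be sketched in plain Java as follows. This is only an illustration, not the course's reference solution: the class name `FileIdTable` and its methods are made up for this example, and sorting the names before numbering them is my own assumption (the post does not say how the array is ordered). In a real Mapper whose input comes from a FileInputFormat, the current file name would typically be obtained with something like `((FileSplit) context.getInputSplit()).getPath().getName()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper (not part of Hadoop): maps file names to non-negative integer IDs.
class FileIdTable {
    private final List<String> names;        // index (= file ID) -> file name
    private final Map<String, Integer> ids;  // file name -> file ID

    FileIdTable(List<String> fileNames) {
        // Assumption: deduplicate and sort, so the numbering does not depend on
        // the order in which the names were collected.
        names = new ArrayList<>(new TreeSet<>(fileNames));
        ids = new HashMap<>();
        for (int i = 0; i < names.size(); i++) {
            ids.put(names.get(i), i);
        }
    }

    // Returns the ID of a known file name, or throws if the name is not in the table.
    int idOf(String fileName) {
        Integer id = ids.get(fileName);
        if (id == null) {
            throw new IllegalArgumentException("unknown file: " + fileName);
        }
        return id;
    }

    // Reverse lookup: the file name that was assigned the given ID.
    String nameOf(int id) {
        return names.get(id);
    }

    public static void main(String[] args) {
        FileIdTable table = new FileIdTable(Arrays.asList("c.txt", "a.txt", "b.txt"));
        System.out.println(table.idOf("b.txt")); // prints 1
    }
}
```

Keeping both the list and the map makes lookups cheap in both directions: name to ID while mapping, and ID back to name when reading the output.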
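On Sunday, July 20, 2014 at 4:16:55 PM UTC+8, Krasus C wrote:
I can get the file name, but I can't figure out how to do this part. I tried keeping a global variable and adding 1 to it for each new name, but that approach doesn't seem to work.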
张雨晴
Jul 20, 2014, 5:59:23 AM
How do I get the names of all the files?
Krasus C
Jul 20, 2014, 7:32:38 AM
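What you can get is the name of the current file. Do you want to fetch all of them up front and then build the table?
That did give me an idea, though: keep a set, insert each file name into it as it is obtained, and use the size of the set as the ID.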
On Sunday, July 20, 2014 at 5:59:23 PM UTC+8, 张雨晴 wrote:
Krasus C
Jul 20, 2014, 9:12:49 AM
But then I don't know how to deal with the ordering problem.
On Sunday, July 20, 2014 at 7:32:38 PM UTC+8, Krasus C wrote:
杨博文
Jul 20, 2014, 9:24:34 AM
However crude and brute-force it sounds, it looks like this is the only way to do it (╯‵□′)╯︵┻━┻
On Sunday, July 20, 2014 at 4:54:24 PM UTC+8, 郭行健 wrote:
Krasus C
Jul 20, 2014, 11:48:49 AM
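When building the array, if each new ID is taken from the number of objects currently in the dictionary, won't the numbering differ from one job run to the next? For example, file A might sometimes be 2 and sometimes 3. In that case, should the job also output the dictionary it used for that run?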
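The concern can be reproduced outside Hadoop with a small sketch. The class and method names below are hypothetical. `bySeenOrder` mimics the "use the current size as the next ID" scheme, while `bySortedOrder` numbers the full, sorted set of names, which is one possible way (my assumption, not something the thread prescribes) to make the IDs repeatable. Note also that Mapper tasks generally run in separate JVMs, often on different machines, so a counter or set held in a static variable is not shared between them.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class: compares two ways of assigning integer IDs to file names.
class IdOrderDemo {
    // Incremental scheme: a name gets the number of distinct names seen before it.
    static Map<String, Integer> bySeenOrder(List<String> arrivalOrder) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String name : arrivalOrder) {
            ids.putIfAbsent(name, ids.size());
        }
        return ids;
    }

    // Deterministic scheme: number the complete set of names in sorted order.
    static Map<String, Integer> bySortedOrder(Collection<String> allNames) {
        Map<String, Integer> ids = new TreeMap<>();
        for (String name : new TreeSet<>(allNames)) {
            ids.put(name, ids.size());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> run1 = Arrays.asList("A", "B", "C");
        List<String> run2 = Arrays.asList("B", "C", "A"); // same files, different arrival order

        System.out.println(bySeenOrder(run1));   // {A=0, B=1, C=2}
        System.out.println(bySeenOrder(run2));   // {B=0, C=1, A=2}
        System.out.println(bySortedOrder(run1)); // {A=0, B=1, C=2}
        System.out.println(bySortedOrder(run2)); // {A=0, B=1, C=2}
    }
}
```

With the incremental scheme, file A is 0 in one run and 2 in the other. Numbering the sorted set gives the same table as long as the set of input files itself does not change.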
On Sunday, July 20, 2014 at 4:54:24 PM UTC+8, 郭行健 wrote:
郭行健
Jul 20, 2014, 11:59:32 PM
It is probably better to fetch all the file names in one go; that is less error-prone. For the details, see the method on page 64 of Hadoop: The Definitive Guide (3rd edition).
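As a rough illustration of "fetch everything in one go", here is a sketch that lists a directory on the local file system with `java.nio.file`. This is a stand-in, not the book's code: on HDFS the corresponding call would be Hadoop's `FileSystem.listStatus(Path)`, which returns `FileStatus` objects from which the paths can be read. The class name `ListInputFiles` and the decision to sort the result are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-file-system stand-in for listing the job's input directory.
class ListInputFiles {
    // Returns the names of the regular files directly inside dir, in sorted order.
    static List<String> listFileNames(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)          // skip subdirectories
                    .map(p -> p.getFileName().toString())  // keep only the file name
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = listFileNames(dir);
        // The position in the sorted list serves as the file ID.
        for (int id = 0; id < names.size(); id++) {
            System.out.println(id + "\t" + names.get(id));
        }
    }
}
```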
On Sunday, July 20, 2014 at 7:32:38 PM UTC+8, Krasus C wrote:
Haoyan Huo
Jul 21, 2014, 8:45:50 AM
Fetching the list once in every Mapper's setup is risky; it is better to get this done before submitting the job.
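One way to read this suggestion: the driver builds the table once, writes it to a file, and every task only loads that file. The sketch below shows the save and load halves in plain Java on the local file system. The class name `FileIdTableIO` and the one-name-per-line format are assumptions made for illustration. In an actual job the file would live on HDFS and could be shipped to the tasks through the distributed cache (for example with `Job.addCacheFile` in Hadoop 2), then read in the Mapper's setup method.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: persists the file-name table once and reloads it in each task.
class FileIdTableIO {
    // Driver side: write one file name per line; a name's line position is its ID.
    static void save(List<String> sortedNames, Path out) throws IOException {
        Files.write(out, sortedNames, StandardCharsets.UTF_8);
    }

    // Task side: rebuild the name -> ID map; IDs count non-empty lines from 0.
    static Map<String, Integer> load(Path in) throws IOException {
        Map<String, Integer> ids = new HashMap<>();
        for (String line : Files.readAllLines(in, StandardCharsets.UTF_8)) {
            if (!line.isEmpty()) {
                ids.putIfAbsent(line, ids.size());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempFile("file-ids", ".txt");
        save(Arrays.asList("a.txt", "b.txt", "c.txt"), table);
        System.out.println(load(table).get("c.txt")); // prints 2
    }
}
```

Because every task reads the same file, all tasks agree on the numbering, and the file doubles as the dictionary for that run, which addresses the earlier question about outputting the table alongside the results.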