Tutorial: Simulating Secondary Sort on Values with Hadoop


Han Jiang

Jul 21, 2014, 2:46:27 AM
to cs40...@googlegroups.com
Original URL: http://sonerbalkir.blogspot.com/2010/01/simulating-secondary-sort-on-values.html

Some of you may not be able to open that page, so here is the PDF instead.
HadoopComparatortutorial.pdf

张雨晴

Jul 21, 2014, 5:05:15 AM
to cs40...@googlegroups.com
Thanks for sharing, this is very useful material :)
There is one part I didn't quite understand, on the last page of the PDF:
public class Reducer {
  long denominator;

  public void reduce(key, values) {
    string word = key.substring(0, last index of '#');
    string doc_id = key.substring(last index of '#' + 1, key.length);
    if (doc_id == 0) {
      denominator = 0;
      for each v in values
        denominator += v;
    }
    else {
      long sum = 0;
      for each v in values
        sum += v;
      emit(key, sum / denominator);
    }
  }
}
Something seems off here. denominator is only assigned inside the if branch, so how can the else branch see its value?



On Monday, July 21, 2014 at 2:46:27 PM UTC+8, Han Jiang wrote:

Han Jiang

Jul 21, 2014, 5:13:41 AM
to cs402pku
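The same reducer receives all the keys that are partitioned into one group, but not necessarily in a single call. They may be collected over multiple calls to reduce.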
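To make the grouping concrete, here is a minimal sketch in plain Java, with no Hadoop dependencies, of how composite keys of the form word#doc_id might be routed so that every key for a given word reaches the same reducer. The class NaturalKeyPartitioner and its methods are hypothetical names for illustration; a real job would put similar logic in a subclass of Hadoop's Partitioner rather than in a standalone helper.

```java
// Hypothetical helper: routes "word#doc_id" keys by the word part only.
class NaturalKeyPartitioner {
    // Returns the word portion of a composite key such as "hadoop#3".
    static String naturalKey(String compositeKey) {
        int sep = compositeKey.lastIndexOf('#');
        return sep < 0 ? compositeKey : compositeKey.substring(0, sep);
    }

    // Picks a partition from the word alone, so "hadoop#0" and "hadoop#3"
    // always land on the same reducer.
    static int partitionFor(String compositeKey, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Both keys share the word "hadoop", so the partitions match.
        System.out.println(partitionFor("hadoop#0", 4) == partitionFor("hadoop#3", 4)); // prints "true"
    }
}
```

Because the partition depends only on the word, the full key (including doc_id) stays available for sorting within that partition.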

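Here denominator is modified as a member variable. The code's logic guarantees that an entry with doc_id == 0 always exists, and that such entries are always collected before the entries with doc_id > 0. So the doc_id == 0 entries can be tallied first. Then, when computing the score for entries with doc_id > 0, the accumulated denominator is used as the divisor to smooth the score.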
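The behavior described above can be sketched in plain Java without Hadoop. The class DenominatorReducer, its output list, and the sample keys and counts are all made up for illustration, and a TreeMap stands in for the sorted order in which keys reach the reducer. The point it shows is that a field set during the doc_id == 0 call is still there when later calls handle doc_id > 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reducer; this is not Hadoop's Reducer API.
class DenominatorReducer {
    // Member variable: keeps its value across reduce() calls on this instance.
    private long denominator = 0;
    final List<String> output = new ArrayList<>();

    void reduce(String key, List<Long> values) {
        String docId = key.substring(key.lastIndexOf('#') + 1);
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        if (docId.equals("0")) {
            // The special doc_id 0 entry carries the total for the word.
            denominator = sum;
        } else {
            // Cast to double: long / long would truncate toward zero.
            output.add(key + "\t" + (double) sum / denominator);
        }
    }

    public static void main(String[] args) {
        // TreeMap iterates in sorted key order, so "hadoop#0" comes first.
        Map<String, List<Long>> sorted = new TreeMap<>();
        sorted.put("hadoop#2", Arrays.asList(1L));
        sorted.put("hadoop#0", Arrays.asList(3L, 1L));
        sorted.put("hadoop#1", Arrays.asList(3L));

        DenominatorReducer reducer = new DenominatorReducer();
        sorted.forEach(reducer::reduce);
        reducer.output.forEach(System.out::println);
    }
}
```

One design detail in this sketch: Java's division of two long values truncates, so an expression like sum / denominator evaluates to 0 whenever sum is smaller than denominator. The cast to double avoids that, and it may be worth checking in any implementation whose scores always come out as 0.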







--
Han Jiang

Team of Search Engine and Web Mining,
School of Electronic Engineering and Computer Science,
Peking University, China

张雨晴

Jul 21, 2014, 7:19:49 AM
to cs40...@googlegroups.com, h...@apache.org

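Hmm... I think I'm doing the same thing, but the DF in my output is always 0. I've been debugging for a long time and still can't track it down...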










On Monday, July 21, 2014 at 5:13:41 PM UTC+8, Han Jiang wrote: