Map-Reduce inconsistent behavior using Array for _id

David Pellegrini

Nov 19, 2014, 3:24:32 PM
to mon...@googlegroups.com
Hi All,

I've noticed some inconsistent behaviors in the Map-Reduce functionality, and am wondering if they are known issues or not. I'll present one issue here, then start another topic on the other issue, just to keep the discussions focused.

First up ...

A Map-Reduce process in which the _id is an array works fine in the mongo shell, but produces incomplete results via Mongoid.

The example process essentially takes lists of companies and produces a strength-of-relationship score for each pair of companies, based on the number of lists containing the pair. The lists are actually heterogeneous so, in the code below, the first step of the mapper filters the list items for just the companies. Then it computes each pair of company ids and emits the pair with a nominal value of 1:

@@mapper = <<EOM
function() {
  if (typeof this.items !== "undefined") {
    // Lists are heterogeneous, so keep only the company items.
    var companies = this.items.filter(function(x) { return x.type == "Company"; });
    // Visit every unordered pair of companies exactly once.
    for (var i = 0; i < companies.length-1; i++) {
      var cid1 = companies[i].id;
      for (var j = i+1; j < companies.length; j++) {
        var cid2 = companies[j].id;
        // Order the ids so that (a, b) and (b, a) produce the same key.
        if (cid1 < cid2) {
          emit([cid1, cid2], 1);
        } else {
          emit([cid2, cid1], 1);
        }
      }
    }
  }
}
EOM

@@reducer = <<EOR
function(key, values) {
  // Sum rather than count: MongoDB may re-run reduce on its own earlier output.
  return Array.sum(values);
}
EOR
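
A note on reducers in general: MongoDB can call the reduce function more than once for the same key, passing the output of an earlier reduce back in as one of the values, so a reducer should give the same answer either way. The plain-JavaScript sketch below runs in Node.js without MongoDB and contrasts a reducer that returns values.length with one that sums the values. The function names, the "pair" key and the split point are made up for illustration and are not part of MongoDB or Mongoid.

```javascript
// Hypothetical reducers for illustration only.
function countReducer(key, values) {
  return values.length;
}

function sumReducer(key, values) {
  return values.reduce(function(acc, v) { return acc + v; }, 0);
}

// Simulate a re-reduce: reduce two partial batches separately,
// then reduce the two partial results together.
function reduceInTwoPasses(reducer, key, values, split) {
  var first = reducer(key, values.slice(0, split));
  var second = reducer(key, values.slice(split));
  return reducer(key, [first, second]);
}

var emitted = [1, 1, 1, 1, 1]; // five emits of the nominal value 1

var countResult = reduceInTwoPasses(countReducer, "pair", emitted, 3);
var sumResult = reduceInTwoPasses(sumReducer, "pair", emitted, 3);

console.log("count-based:", countResult); // 2, not the 5 emits that were made
console.log("sum-based:", sumResult);     // 5
```

With a single reduce pass the two reducers agree, which is why the difference is easy to miss on small data sets.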

The mapper and reducer are invoked like this:

IndustryList.
  where(@@criteria).
  map_reduce(@@mapper, @@reducer).
  out(replace: 'tmp_scored_companies')
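
For reference, the pairing and counting that the mapper and reducer are meant to perform can be sketched in plain JavaScript, without a database, to see what scores to expect. The sample lists, the string ids and the scorePairs name are invented for illustration; real ids would be ObjectIds, and the pair is joined into a string here only because a plain JavaScript object needs string keys.

```javascript
// Hypothetical stand-in for the map and reduce steps, run in memory.
function scorePairs(lists) {
  var scores = {};
  lists.forEach(function(list) {
    if (typeof list.items === "undefined") { return; }
    // Keep only the company items, as the mapper does.
    var companies = list.items.filter(function(x) { return x.type == "Company"; });
    for (var i = 0; i < companies.length - 1; i++) {
      for (var j = i + 1; j < companies.length; j++) {
        var a = companies[i].id;
        var b = companies[j].id;
        // Order the ids so both orderings map to the same key.
        var key = a < b ? a + "," + b : b + "," + a;
        // Each occurrence of the pair adds 1 to its score.
        scores[key] = (scores[key] || 0) + 1;
      }
    }
  });
  return scores;
}

// Made-up sample data: two lists sharing companies c1 and c2, one list without items.
var sampleLists = [
  { items: [ { type: "Company", id: "c1" }, { type: "Person", id: "p1" }, { type: "Company", id: "c2" } ] },
  { items: [ { type: "Company", id: "c2" }, { type: "Company", id: "c1" }, { type: "Company", id: "c3" } ] },
  { name: "list without items" }
];

var scores = scorePairs(sampleLists);
console.log(scores); // { 'c1,c2': 2, 'c2,c3': 1, 'c1,c3': 1 }
```

The pair c1/c2 appears in two lists and scores 2, while the other pairs appear once each.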

I developed the mapper and reducer in the mongo shell, where they produced many documents like this:

{
  "_id" : [
    ObjectId("4d430abf91a34357e000000c"),
    ObjectId("4d9a4a4c91a3437035000028")
  ],
  "value" : 26
}

However, after copy-pasting the functions into the Ruby code and executing them via Mongoid, the results were very different -- exactly one document, like this:

{
  "_id" : [
    ObjectId("4d430abf91a34357e000000c"),
    ObjectId("4d9a4a4c91a3437035000028")
  ],
  "value" : 1
}

It's as though the mapper simply stopped after the first emit.

Changing the mapper code to emit hashes rather than arrays produces consistent results in both the mongo shell and Mongoid:

        if (cid1 < cid2) {
          emit({1: cid1, 2: cid2}, 1);
        } else {
          emit({1: cid2, 2: cid1}, 1);
        }

I should mention versions:
    mongoid (3.1.6)
    MongoDB shell version: 2.6.3

Comments?