map-reduce: summing hashes of {word: count, word2: count}


James K

Sep 6, 2010, 10:22:38 AM
to mongodb-user
Hello everyone.
I was experimenting with Mongo's map-reduce last week and hit an odd
issue with JavaScript objects. First I'll set up the problem for you.

I have tokenized a set of text from a forum into a collection. One
field is a map of word => count, and there are other fields for
topic_id, user_id etc. Each object in the collection represents one
post, and I need to roll them up by user or topic.

For example, one object may look like {user_id: 1, topic_id: 1,
post_id: 1, stem_counts: { 'foo': 2, 'bar': 6}}.

Now, summing over hashes gets tricky because JS hashes are just
objects, and they aren't really empty.
Iterating over the keys in the object (for word in words) will result
in not just 'foo' and 'bar' but also 'eval' and 'watch', whose values
are functions rather than numbers. I had to put in checks for
the type of the value returned from the object, which seems rather
inelegant.

Before pasting my map and reduce functions in, my questions are:
should I have modeled this differently in MongoDB? If the bag of word
counts is reasonable, is there a better way to get a truly empty
object in JS, so that I don't have to check the type of the stored
values in the map?

Thanks

userIdMapper = function () {
    var total_words = 0;
    var word; // declared locally so the loop does not create a global
    for (word in this['stem_counts'])
    {
        // only numeric counts contribute to the total
        if (typeof this['stem_counts'][word] == 'number')
            total_words += this['stem_counts'][word];
    }
    emit(this.user_id, {total_words: total_words, words: this.stem_counts});
}

r = function(key, vals) {
    var totals = {};
    var curr;
    var len = vals.length;
    var wordCount = 0;
    var word;
    var existingTotal;
    for (var idx = 0; idx < len; idx++)
    {
        //iterate over all keys
        curr = vals[idx]['words'];
        for (word in curr)
        {
            if (typeof curr[word] == 'number')
            {
                existingTotal = totals[word];
                if (typeof existingTotal != 'number')
                {
                    existingTotal = 0;
                }
                totals[word] = existingTotal + (curr[word] || 0);
                wordCount = wordCount + (curr[word] || 0);
            }
        }
    }
    return {total_words: wordCount, words: totals};
}


db.runCommand({mapreduce: 'posts', map: userIdMapper, reduce: r,
               out: 'posts_by_user'})
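
A reducer like r may be called again on its own output, so its return
value has to have the same shape as the emitted values. The sketch
below checks that locally in plain JavaScript, without a server. The
sample documents and the names mapValue and reduceCounts are made up
for illustration; reduceCounts is a condensed copy of the logic in r,
not a different algorithm.

```javascript
// Made-up sample documents shaped like the example post above.
var posts = [
    {user_id: 1, topic_id: 1, post_id: 1, stem_counts: {foo: 2, bar: 6}},
    {user_id: 1, topic_id: 2, post_id: 2, stem_counts: {foo: 1, baz: 4}},
    {user_id: 1, topic_id: 2, post_id: 3, stem_counts: {bar: 3}}
];

// Hypothetical local stand-in for the mapper: returns the value that
// would be passed to emit() for one post.
function mapValue(post) {
    var total = 0;
    var word;
    for (word in post.stem_counts) {
        if (typeof post.stem_counts[word] == 'number') {
            total += post.stem_counts[word];
        }
    }
    return {total_words: total, words: post.stem_counts};
}

// Condensed copy of the reduce logic, with the same typeof guards.
function reduceCounts(key, vals) {
    var totals = {};
    var wordCount = 0;
    var word, curr, idx, existing;
    for (idx = 0; idx < vals.length; idx++) {
        curr = vals[idx].words;
        for (word in curr) {
            if (typeof curr[word] == 'number') {
                existing = totals[word];
                if (typeof existing != 'number') {
                    existing = 0;
                }
                totals[word] = existing + curr[word];
                wordCount += curr[word];
            }
        }
    }
    return {total_words: wordCount, words: totals};
}

var values = [];
for (var i = 0; i < posts.length; i++) {
    values.push(mapValue(posts[i]));
}

// Reduce everything in one call ...
var allAtOnce = reduceCounts(1, values);

// ... and in two steps, feeding the first result back into the reducer.
var partial = reduceCounts(1, [values[0], values[1]]);
var reReduced = reduceCounts(1, [partial, values[2]]);
```

If the two results differ, the reducer is not safe to re-run on its own
output. With this sample data both paths should give the same totals.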

Eliot Horowitz

Sep 7, 2010, 1:58:44 AM
to mongod...@googlegroups.com
Are you sure about eval and watch showing up?

> x = { a : 1 }
{ "a" : 1 }
> for ( z in x ) print(z)
a

Otherwise I think that looks good.


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


James K

Sep 7, 2010, 11:17:34 AM
to mongodb-user
Interesting, I get the same iteration behavior here.

Since iteration works as expected, it must have been my lookups that
failed when one of the special words was present in one of the hashes
being summed and not the other, like:

> z['vitamin']
1
> z['eval']
function eval() {
[native code]
}
> z['watch']
function watch() {
[native code]
}

I'll double check the lookup code, because at some point the sum at
'watch' becomes something like "12function watch(){[native code]}134512345":
once the function's string form sneaks in, the + operator switches
from addition to concatenation.

Thanks!
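
The lookup behaviour described above can be reproduced outside the
mongo shell. This is a minimal sketch, not the code from the thread:
'constructor' is used as a stand-in for 'eval' and 'watch' on the
assumption that those two are inherited only in some engines, while
'constructor' is inherited by plain objects everywhere. The helper
addCounts is hypothetical; it guards both the loop and the lookup with
hasOwnProperty instead of typeof checks.

```javascript
// A plain {} has no own keys, but bracket lookup still follows the
// prototype chain and finds inherited members.
var empty = {};
var inherited = empty['constructor'];   // a function, not undefined

// Adding a number to a function converts both sides to strings, so the
// running "sum" turns into a string from that point on.
var broken = 12 + inherited + 1345;

var hasOwn = Object.prototype.hasOwnProperty;

// Hypothetical helper: add the counts in `counts` into `totals`,
// looking only at properties the objects actually own.
function addCounts(totals, counts) {
    var word, existing;
    for (word in counts) {
        if (!hasOwn.call(counts, word)) {
            continue;
        }
        existing = hasOwn.call(totals, word) ? totals[word] : 0;
        totals[word] = existing + counts[word];
    }
    return totals;
}

var summed = addCounts({}, {vitamin: 1, constructor: 2});
summed = addCounts(summed, {constructor: 3, foo: 5});
```

Here broken ends up as a string, while summed keeps numeric totals even
for the word that collides with an inherited property name.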

On Sep 7, 1:58 am, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Are you sure about eval and watch showing up?
>
> > x = { a : 1 }
> { "a" : 1 }
> > for ( z in x ) print(z)
>
> a
>
> Otherwise I think that looks good.
>