James K
Sep 6, 2010, 10:22:38 AM
to mongodb-user
Hello everyone.
I was experimenting with Mongo's map-reduce last week and hit an odd
issue with JavaScript objects. First I'll set up the problem for you.
I have tokenized the text of a forum into a collection. One
field is a map of word => count, and there are other fields for
topic_id, user_id, etc. Each object in the collection represents one
post, and I need to roll them up by user or topic.
For example, one object may look like {user_id: 1, topic_id: 1,
post_id: 1, stem_counts: { 'foo': 2, 'bar': 6}}.
Summing over hashes gets tricky because JS hashes are just
objects, and they aren't really empty.
Iterating over the keys of the object (for (word in words)) yields
not just 'foo' and 'bar' but also 'eval' and 'watch', whose values
are functions rather than numbers. I had to add checks on the type
of each value read from the object, which seems rather
inelegant.
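One alternative to the type check, as a minimal sketch: filter on hasOwnProperty instead. This assumes the extra keys are inherited through the prototype chain rather than stored on the object itself, which I have not confirmed for the shell. The helper name sumOwnCounts is made up for illustration and is not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB function): sums only the numeric
// entries that belong to the object itself.
function sumOwnCounts(counts) {
  var total = 0;
  for (var word in counts) {
    // hasOwnProperty skips anything inherited from the prototype chain.
    // Assumption: the stray 'eval'/'watch' keys are inherited, not own.
    if (Object.prototype.hasOwnProperty.call(counts, word) &&
        typeof counts[word] === 'number') {
      total += counts[word];
    }
  }
  return total;
}

var sample = { foo: 2, bar: 6 };
console.log(sumOwnCounts(sample)); // 8
```

The typeof test is kept alongside hasOwnProperty so that a stored non-numeric value still cannot leak into the sum.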
Before I paste my map and reduce functions, my questions are:
should I have modeled this differently in MongoDB? If the bag of word
counts is reasonable, is there a better way to get a truly empty
object in JS, so that I don't have to check the type of the stored
values in the map?
Thanks
userIdMapper = function () {
    // Sum this post's word counts, skipping any non-numeric values.
    var counts = this.stem_counts;
    var total_words = 0;
    for (var word in counts) {   // 'var' keeps word from leaking as a global
        if (typeof counts[word] == 'number') {
            total_words += counts[word];
        }
    }
    emit(this.user_id, {total_words: total_words, words: counts});
};
r = function (key, vals) {
    var totals = {};
    var curr;
    var len = vals.length;
    var wordCount = 0;
    var word;
    var existingTotal;
    for (var idx = 0; idx < len; idx++) {
        // iterate over all keys
        curr = vals[idx]['words'];
        for (word in curr) {
            if (typeof curr[word] == 'number') {
                existingTotal = totals[word];
                if (typeof existingTotal != 'number') {
                    existingTotal = 0;
                }
                totals[word] = existingTotal + (curr[word] || 0);
                wordCount = wordCount + (curr[word] || 0);
            }
        }
    }
    return {total_words: wordCount, words: totals};
};
db.runCommand({mapreduce: 'posts', map: userIdMapper, reduce: r, out: 'posts_by_user'});
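Since MongoDB may call reduce more than once for the same key and feed earlier results back in, the reducer has to accept its own output. Here is a rough local check of that property in plain JavaScript, outside mongo. reduceCounts is a stand-in I wrote with the same shape as r so the snippet runs on its own, and the three sample values are invented.

```javascript
// Stand-in for r (hypothetical name), redefined so this runs without mongo.
function reduceCounts(key, vals) {
  var totals = {};
  var wordCount = 0;
  for (var i = 0; i < vals.length; i++) {
    var curr = vals[i].words;
    for (var word in curr) {
      if (Object.prototype.hasOwnProperty.call(curr, word) &&
          typeof curr[word] === 'number') {
        // Only reuse a running total we stored ourselves.
        var existing = Object.prototype.hasOwnProperty.call(totals, word)
          ? totals[word]
          : 0;
        totals[word] = existing + curr[word];
        wordCount += curr[word];
      }
    }
  }
  return { total_words: wordCount, words: totals };
}

// Invented sample values in the same shape the mapper emits.
var a = { total_words: 8, words: { foo: 2, bar: 6 } };
var b = { total_words: 3, words: { foo: 3 } };
var c = { total_words: 1, words: { baz: 1 } };

// Reducing everything at once should match reducing in two stages.
var allAtOnce = reduceCounts(1, [a, b, c]);
var inStages = reduceCounts(1, [reduceCounts(1, [a, b]), c]);

console.log(JSON.stringify(allAtOnce));
console.log(JSON.stringify(inStages));
```

If the two printed lines match, the emitted value and the reduced value have compatible shapes, at least for this sample.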