Finding Interesting People / Nested Map Reduce Mayhem?
 6 messages

From: pctj101 <mir...@x2o.net>
Date: Tue, 11 Sep 2012 09:24:15 -0700 (PDT)
Local: Tues, Sep 11 2012 12:24 pm
Subject: Finding Interesting People / Nested Map Reduce Mayhem?

I'm trying to analyze a dataset of People.  I'd like to find interesting
people.

I don't need you guys to write the entire solution... I'd like to just ask
a few questions:

Here's the situation:

For each person
Find the average salary of my "peers" defined as:
All people that are:
Within +/- 5 years of age
Within +/- 10 pounds

Only include the Person in the "ultimate" result set:
If the average salary of my "peers" differs from this person's
salary by more than $X
And include the "set" of peers that this person got compared to

I can use map-reduce to get the average of a set (for just 1 person's
peers).  No problem!

* In this case, because the set of "peers" is different for each
"Person", do I:
1a) Write a more crazy map/reduce function that...
1b) Perhaps even a map/reduce function that calls another
map/reduce function?
M-R: (Key = Person, Value = ( M-R ( Key = Uhhhh?, Query:
Peers ) ) <- bad pseudocode but.. general idea... loop in a loop

1c) Run map-reduce N times, once for each person, each with a
different "query" filter to pick my peers?

1d) If running N times, do you see "a problem" with using a
forEach loop on the mongodb server? (rather than having a Perl/PHP script
run 10000 M-R calls, just have the serverside Javascript run the loop)

Just general thoughts are fine... If you see anything like "OMG don't do
that!" or "Here's a great idea" please let me know :)
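For reference, the problem statement above can be sketched as a brute-force, in-memory check. This is only a minimal sketch to pin down the definition, not a MongoDB solution: the field names `age`, `weight` and `salary`, the sample data, and the choice to exclude a person from their own peer set are all assumptions.

```python
from statistics import mean


def find_interesting(people, threshold, age_span=5, weight_span=10):
    """Return (person, peers, peer_avg) for each person whose salary differs
    from the average salary of their peers by more than `threshold`."""
    results = []
    for person in people:
        # Peers: everyone else within +/- age_span years and +/- weight_span pounds.
        # Excluding the person themselves is an assumption, not stated in the post.
        peers = [
            other for other in people
            if other is not person
            and abs(other["age"] - person["age"]) <= age_span
            and abs(other["weight"] - person["weight"]) <= weight_span
        ]
        if not peers:
            continue  # nobody to compare against
        peer_avg = mean(p["salary"] for p in peers)
        if abs(peer_avg - person["salary"]) > threshold:
            results.append((person, peers, peer_avg))
    return results


# Hypothetical sample data.
people = [
    {"name": "Bob", "age": 30, "weight": 150, "salary": 50000},
    {"name": "Jane", "age": 32, "weight": 155, "salary": 52000},
    {"name": "Andy", "age": 33, "weight": 158, "salary": 90000},
    {"name": "Zed", "age": 60, "weight": 200, "salary": 40000},
]
interesting = find_interesting(people, threshold=20000)
```

This does one pass over the whole set per person, which is the "loop in a loop" shape the question describes; the replies below are about avoiding or cheapening that inner pass.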

From: Jan Riechers <janpet...@freenet.de>
Date: Wed, 12 Sep 2012 13:19:37 +0300
Local: Wed, Sep 12 2012 6:19 am
Subject: Re: [mongodb-user] Finding Interesting People / Nested Map Reduce Mayhem?
On 11.09.2012 19:24, pctj101 wrote:

[TRIMMED]

Hi,

I am not sure which language you are using to write your
application, but instead of map/reduce you could also use the
"$gte" and "$lte" operators to select ranges.

For example, in Python:
currentMoney = 100              # pounds
currentAge = 25                 # years
query = {'money': {'$gte': currentMoney, '$lte': currentMoney + 10},
         'age': {'$gte': currentAge, '$lte': currentAge + 5}}

This matches entries whose money and age values are greater than or
equal to the current values, but also less than or equal to the
current value +10 for money and +5 for age (your example range values).

I think this will do the trick for you without even using
map/reduce, just by using the query magic.
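To cover the full +/- window from the original question, the range has to extend below the current values as well as above them. A minimal sketch follows, assuming the "pounds" in the question are body weight stored in a field called `weight` (the field names and the helper name `peer_query` are hypothetical):

```python
def peer_query(person, age_span=5, weight_span=10):
    """Build a MongoDB filter document matching the peers of `person`.

    Each field gets its own {"$gte": low, "$lte": high} range.
    """
    return {
        "age": {
            "$gte": person["age"] - age_span,
            "$lte": person["age"] + age_span,
        },
        "weight": {
            "$gte": person["weight"] - weight_span,
            "$lte": person["weight"] + weight_span,
        },
    }


query = peer_query({"name": "Bob", "age": 25, "weight": 100})
# With pymongo, the filter could be passed as: db.people.find(query)
```

The function only builds a plain dict, so it can be inspected or tested without a running server.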

From: Jenna deBoisblanc <jenna.deboisbl...@10gen.com>
Date: Wed, 12 Sep 2012 11:35:34 -0700 (PDT)
Local: Wed, Sep 12 2012 2:35 pm
Subject: Re: Finding Interesting People / Nested Map Reduce Mayhem?
Hi,

You have an interesting problem to solve. Jan makes a good point:
using a query to filter out non-matching peers will significantly
improve MR performance and reduce complexity. If possible, I would
recommend using the aggregation framework (available in MongoDB
version 2.2) for this task. MR uses JavaScript, and since MongoDB uses
SpiderMonkey, a single-threaded JavaScript engine, MR is a slow,
blocking operation. As such, I would not recommend running multiple MR
commands back-to-back for each user. It is also possible to specify a
query in your aggregation command:
Agg framework: http://docs.mongodb.org/manual/applications/aggregation/

It may also be possible to do the calculation for all users with a
single aggregation command. If you are still searching for a solution,
could you provide a sample document to help us tailor our response to

"Binning" or "bucketing" the users based on age could simplify the
problem, and it might allow you to accomplish the aggregation with
a single command. An example of "binning":

{name: "Bob", age: 21, weight: 100}
{name: "Jenna", age: 23, weight: 108}

{name: "Gene", age: 24, weight: 120}
{name: "Susan", age: 22, weight: 127}

{name: "Tom", age: 25, weight: 101}
{name: "Ellen", age: 26, weight: 102}

bin age by 5, weight by 10:
Bob and Jenna are peers, defined as users in the ages 20 - 24, weight
100 - 109.
Gene and Susan are peers, defined as users in the ages 20 - 24, weight
120 - 129.
Tom and Ellen are peers, defined as users in ages 25 - 29, weight 100
- 109.
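The binning described above can be sketched in Python, first as a plain function and then as an aggregation pipeline document. This is a rough sketch under assumptions: the `salary` field does not appear in the sample documents and is hypothetical, as are the names `bin_key` and `BIN_PIPELINE`.

```python
from collections import defaultdict


def bin_key(user, age_width=5, weight_width=10):
    """Round age and weight down to the start of their bin."""
    return (
        user["age"] // age_width * age_width,
        user["weight"] // weight_width * weight_width,
    )


users = [
    {"name": "Bob", "age": 21, "weight": 100},
    {"name": "Jenna", "age": 23, "weight": 108},
    {"name": "Gene", "age": 24, "weight": 120},
    {"name": "Susan", "age": 22, "weight": 127},
    {"name": "Tom", "age": 25, "weight": 101},
    {"name": "Ellen", "age": 26, "weight": 102},
]

grouped = defaultdict(list)
for user in users:
    grouped[bin_key(user)].append(user["name"])
bins = dict(grouped)

# The same idea as a pipeline: compute the bin starts with $subtract/$mod,
# then group on them. "salary" is an assumed field name.
BIN_PIPELINE = [
    {"$project": {
        "salary": 1,
        "age_bin": {"$subtract": ["$age", {"$mod": ["$age", 5]}]},
        "weight_bin": {"$subtract": ["$weight", {"$mod": ["$weight", 10]}]},
    }},
    {"$group": {
        "_id": {"age": "$age_bin", "weight": "$weight_bin"},
        "avg_salary": {"$avg": "$salary"},
        "members": {"$push": "$_id"},
    }},
]
# With pymongo, the pipeline could be passed as: db.people.aggregate(BIN_PIPELINE)
```

Note that bins are fixed ranges, not a window centred on each person: a 24-year-old and a 25-year-old fall into different age bins even though they are only one year apart.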

Hope this helps! Let us know if you need any additional help.


From: pctj101 <mir...@x2o.net>
Date: Sat, 15 Sep 2012 03:52:26 -0700 (PDT)
Local: Sat, Sep 15 2012 6:52 am
Subject: Re: Finding Interesting People / Nested Map Reduce Mayhem?

So okay... thanks for the tips so far.  In fact I am using the $gt/$lt
operators and aggregation.

The problem is that I feel like I have to run this "query" for each row in
my collection.

For Bob, map-reduce/aggregate/whatever { :age => {$gt => bobs.age - 5} .... }
For Jane, map-reduce/aggregate/whatever { :age => {$gt => janes.age - 5} .... }
For Andy, map-reduce/aggregate/whatever { :age => {$gt => andys.age - 5} .... }
... repeat this n-times ...

Hey just as a quick question.. if I ran a "for" / "foreach" loop on a
result set.. like:

peoplez = db.people.find()
peoplez.forEach(function (person) { /* whatever */ })

Does that loop occur entirely on the mongodb process? or does mongocli have
to fetch all the records back into the client side to run the loop?

If the loop runs entirely on the serverside without having to transfer the
entire people collection to the client side, I suppose that would be an
alternative solution?

Thanks!

Jeff

From: Jenna deBoisblanc <jenna.deboisbl...@10gen.com>
Date: Tue, 18 Sep 2012 09:57:28 -0700 (PDT)
Local: Tues, Sep 18 2012 12:57 pm
Subject: Re: Finding Interesting People / Nested Map Reduce Mayhem?

Hi Jeff,

Do you need the ranges to be +- 5 years of the particular user, or can you
"bin" the users into groups (see the previous comment for an example of
binning)? The latter procedure will drastically simplify the aggregation
command.

> Does that loop occur entirely on the mongodb process? or does mongocli
> have to fetch all the records back into the client side to run the loop?

The client does not fetch all of the records at once; instead, the server
returns documents in batches.  The cursor iterates through the batch
client-side, and if additional records are required to meet the query, a
getMore() command is issued to the server:
http://www.mongodb.org/display/DOCS/Queries+and+Cursors#QueriesandCur...
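The batching behaviour described above can be illustrated with a toy model. This is not driver code: `ToyCursor` is a hypothetical stand-in that only mimics the idea that the loop body runs on the client while documents arrive one batch at a time.

```python
class ToyCursor:
    """Toy model of a cursor: documents arrive in batches, and the
    per-document loop runs on the client side."""

    def __init__(self, server_documents, batch_size=2):
        self._server_documents = server_documents
        self._batch_size = batch_size
        self._position = 0
        self.round_trips = 0  # how many batches were requested

    def _fetch_batch(self):
        # Stands in for the initial query reply or a later getMore request.
        start = self._position
        batch = self._server_documents[start:start + self._batch_size]
        self._position += len(batch)
        self.round_trips += 1
        return batch

    def __iter__(self):
        # A real server signals when the cursor is exhausted; this toy
        # simply checks how many documents are left.
        while self._position < len(self._server_documents):
            for document in self._fetch_batch():
                yield document


cursor = ToyCursor([{"n": i} for i in range(5)], batch_size=2)
seen = [document["n"] for document in cursor]
```

In this model every document still crosses to the client eventually, just not all at once, which is why a client-side loop over the whole collection still transfers the whole collection.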

From: Alaa Qutaish <alaa.quta...@gmail.com>
Date: Fri, 28 Sep 2012 05:50:14 -0700 (PDT)
Local: Fri, Sep 28 2012 8:50 am
Subject: Re: Finding Interesting People / Nested Map Reduce Mayhem?

Thumbs up for this question.

Actually I am running into the same problem. I have a situation where I
have to calculate the trending songs for every user using mapReduce, and I
need a way to iterate over all the users and execute the map/reduce over
their IDs.

my query is like this:

db.runCommand({"mapreduce": "trending_users", "map": map, "reduce": reduce,
"scope": {user_id: "111111111"}, "query": {'value.friends': {$in:
['111111111']}}, 'out': {inline: 1}})

I am trying to find a way to execute the same query in a loop over the
user IDs.
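One way to run the same command per user is to build the command document in a loop on the client. A minimal sketch in Python follows; the helper name `trending_command`, the placeholder map/reduce bodies and the second user ID are hypothetical, and only the command shape is taken from the post above.

```python
def trending_command(user_id, map_js, reduce_js):
    """Build the mapreduce command document for one user.

    The command name is the first key; Python dicts keep insertion
    order (3.7+), so the order below is preserved.
    """
    return {
        "mapreduce": "trending_users",
        "map": map_js,
        "reduce": reduce_js,
        "scope": {"user_id": user_id},
        "query": {"value.friends": {"$in": [user_id]}},
        "out": {"inline": 1},
    }


# Placeholder JavaScript bodies; the real map/reduce functions are not
# shown in the thread.
MAP_JS = "function () { /* placeholder */ }"
REDUCE_JS = "function (key, values) { /* placeholder */ }"

user_ids = ["111111111", "222222222"]
commands = [trending_command(uid, MAP_JS, REDUCE_JS) for uid in user_ids]

# With pymongo, each command could then be run as: db.command(command)
```

As noted earlier in the thread, map/reduce is a slow, blocking operation, so running one command per user back-to-back may not scale well; this only shows how the loop could be structured.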