2.4 Accessing other Collections in Map() function.

Gavin Hogan

Mar 22, 2013, 10:51:41 AM
to mongod...@googlegroups.com
In my existing application there are use cases for accessing documents in other Collections during a map function. The basic idea is that the emitted document is enriched with data from other collections before it is emitted.

This worked in prior releases by using db.<OtherCollection>.find({_id:this.otherCollectionId}); this is now forbidden in 2.4, and I am at a loss for a solution.

Any help would be appreciated.

Thanks

Gavin Hogan

David Hows

Mar 28, 2013, 2:15:04 AM
to mongod...@googlegroups.com
Hi Gavin,

Can you give me an idea of what errors you are seeing? And what your map reduce code looks like?

Thanks,
David

Dan Pasette

Mar 28, 2013, 1:16:19 PM
to mongod...@googlegroups.com
Hi Gavin,

The ability to use global properties such as "db" in map-reduce functions was removed in 2.4. Please see the release notes:

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Sam Millman

Apr 2, 2013, 6:38:44 AM
to mongod...@googlegroups.com
Tbh it is kinda odd that you were relying on this functionality; it has been deprecated since about version 1.8 (maybe 1.6 actually), and definitely hasn't been documented since then.

There is no known workaround that I can think of. You will need to consider exactly what you're doing and why you're doing it, and restructure appropriately.


On 2 April 2013 09:26, Paul Gallagher <gallagh...@gmail.com> wrote:
Gavin, or others: is there a recommended workaround or a better pattern identified yet?

I've also run into this problem due to the new restriction in 2.4 that prevents access to the "db" global in map-reduce. The pattern all these (now failing) map-reduce jobs implement is, I think, just what Gavin was dealing with, i.e.:
  • map collection A, and look up values in collection B (using db.) to enrich or alter the values emitted from the map function.
In most cases, these are doing something akin to an outer join, and the foreign key from A->B is unrelated to the key of the main map-reduce process, so I don't think it is possible to use the more common approach of map/reduce A and B independently into a results collection.

Cheers,
Paul 

Sam Millman

Apr 2, 2013, 6:39:56 AM
to mongod...@googlegroups.com
However:


"so I don't think it is possible to use the more common approach of map/reduce A and B independently into a results collection."

Yes, you can: emit on the FK and merge the foreign row into the parent row in a two-step MR. However, this is probably not recommended tbh.
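A minimal sketch of that two-step join, in plain JavaScript so it runs without a server. The helper mapReduceInto is hypothetical: it only imitates, in memory, how mapReduce with out: { reduce: ... } re-reduces new values together with an existing result. The collections and the field names bId, name and qty are made up for illustration. mapB, mapA and reduceJoin are written in the shape the server expects (emit() as a global, this bound to the current document), so they should be roughly what you would pass to db.B.mapReduce(mapB, reduceJoin, { out: "joined" }) followed by db.A.mapReduce(mapA, reduceJoin, { out: { reduce: "joined" } }).

```javascript
// Sketch only: an in-memory stand-in for mapReduce with out: { reduce: ... }.
// mapReduceInto is a hypothetical helper, not part of any MongoDB API.
let emit; // the server provides emit() as a global inside map functions

function mapReduceInto(out, docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc); // `this` is the current document
  for (const [key, values] of groups) {
    // out: { reduce } re-reduces an existing result together with the new values
    if (out.has(key)) values.unshift(out.get(key));
    out.set(key, values.length === 1 ? values[0] : reduce(key, values));
  }
  return out;
}

// Pass 1 over collection B (the referenced side): key on B's own _id.
function mapB() {
  emit(this._id, { b: { name: this.name }, aDocs: [] });
}

// Pass 2 over collection A: key on the foreign key that points at B.
function mapA() {
  emit(this.bId, { b: null, aDocs: [{ _id: this._id, qty: this.qty }] });
}

// Returns the same shape it receives, so it is safe to re-apply.
function reduceJoin(key, values) {
  const result = { b: null, aDocs: [] };
  for (const v of values) {
    if (v.b !== null) result.b = v.b;
    result.aDocs.push(...v.aDocs);
  }
  return result;
}

// Hypothetical sample data.
const collB = [{ _id: "b1", name: "first" }, { _id: "b2", name: "second" }];
const collA = [
  { _id: "a1", bId: "b1", qty: 3 },
  { _id: "a2", bId: "b1", qty: 5 },
  { _id: "a3", bId: "b2", qty: 1 },
];

const joined = mapReduceInto(new Map(), collB, mapB, reduceJoin);
mapReduceInto(joined, collA, mapA, reduceJoin);
console.log(JSON.stringify(joined.get("b1")));
// {"b":{"name":"first"},"aDocs":[{"_id":"a1","qty":3},{"_id":"a2","qty":5}]}
```

The result is keyed on B's _id, i.e. on the foreign key, so if the main map-reduce needs a different key, another pass is needed to re-key it. A documents whose B is missing keep b: null and B documents with no A documents keep an empty aDocs, so both sides survive, much like a full outer join.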

Paul Gallagher

Apr 2, 2013, 8:30:03 AM
to mongod...@googlegroups.com
> Tbh it is kinda odd you were relying on this functionality, it has been deprecated since about version 1.8 (maybe 1.6 actually), and definitely hasn't been documented since then.

why? because it has worked fine! And was elegant and performant (caveat: I don't have a sharding concern)

> You will need to consider exactly what you're doing and why you're doing it, and restructure appropriately.

Yep, that's what I'm trying to figure out. Perhaps if I describe a scenario below, someone can point out a standard approach or pattern that might fit? NB: I've translated this from my domain terminology to something more readily understandable.


Let's say the problem concerns matching and then rating/ranking a huge number of menu combinations of meat + veg dishes (say, for a fast food chain doing menu planning).

The starting point is two collections of data that come from external sources (so we can't change or restructure them): Meat and Veg.
 
Meat: is a large collection of meat dishes with attributes like cost, and nutritional detail. Also flagged for either lunch or dinner.
Veg: is a large collection of veg dishes with attributes like cost, and nutritional detail. Also flagged for either lunch or dinner.

What we want to produce (as input for all subsequent map/reduce processing) is a MeatVegCombo collection, which is the cross product of all 'lunch' meat dishes with all 'lunch' veg dishes, and likewise for 'dinner' courses.

At the moment (and now broken in 2.4) MeatVegCombo is manufactured with a map/reduce on Meat that will emit a varying number of Meat/Veg combos for each Meat dish (to do so it needs to lookup the matching Veg options). This is basically tricking map-reduce into doing a cross join of Meat and Veg on matching lunch/dinner flag. (In fact, this was originally done as a script that did the naive iteration to generate the MeatVegCombo collection. But it was redone as map-reduce to push it all into MongoDB, and take advantage of the replace/reduce functionality to achieve 'upsert' type behaviour).
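The cross join itself is easy to state outside map-reduce. A minimal sketch in plain JavaScript, assuming hypothetical fields meal ("lunch" or "dinner") and cost on both Meat and Veg, and a made-up combo _id format: Veg is grouped by its meal flag once, then each Meat dish is paired with every Veg dish that has the same flag.

```javascript
// Sketch: the Meat x Veg cross join on the lunch/dinner flag, done directly.
// Field names (meal, cost) and the combo _id format are hypothetical.
function buildCombos(meatDocs, vegDocs) {
  // Group Veg by meal flag once, rather than looking Veg up for every Meat dish.
  const vegByMeal = new Map();
  for (const veg of vegDocs) {
    if (!vegByMeal.has(veg.meal)) vegByMeal.set(veg.meal, []);
    vegByMeal.get(veg.meal).push(veg);
  }

  const combos = [];
  for (const meat of meatDocs) {
    for (const veg of vegByMeal.get(meat.meal) ?? []) {
      combos.push({
        _id: `${meat._id}:${veg._id}`, // deterministic, so a rerun can upsert
        meal: meat.meal,
        meatId: meat._id,
        vegId: veg._id,
        cost: meat.cost + veg.cost,
      });
    }
  }
  return combos;
}

// Hypothetical sample data.
const meatDishes = [
  { _id: "m1", meal: "lunch", cost: 4 },
  { _id: "m2", meal: "dinner", cost: 7 },
];
const vegDishes = [
  { _id: "v1", meal: "lunch", cost: 1 },
  { _id: "v2", meal: "lunch", cost: 2 },
  { _id: "v3", meal: "dinner", cost: 3 },
];
const combos = buildCombos(meatDishes, vegDishes);
console.log(combos.map((c) => `${c._id}=${c.cost}`).join(" "));
// m1:v1=5 m1:v2=6 m2:v3=10
```

The output size is the sum, over each meal flag, of (Meat dishes with that flag) times (Veg dishes with that flag), so MeatVegCombo can grow much faster than either input. The deterministic _id is what would let a rerun overwrite existing combos instead of duplicating them.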

So, of course, we could revert to the external scripting technique, but the real question is what 'best practice' is for this kind of requirement, and in particular how we can keep it as a workload for MongoDB and not for a client script. I suspect the core issue here is that I am trying to get MongoDB to fake a join, but I want a technique that works well for processing huge data volumes.

When I look at discussions of this (e.g. http://stackoverflow.com/questions/9618711/accessing-another-collection-in-mongodbs-map-reduce), the solutions don't apply. For example:
(a) normalise the data: well, Meat and Veg actually have no relation. The MeatVegCombo I want to manufacture is essentially a fully normalised and independent collection.
(b) multi-part M/R: seems a chicken-and-egg proposition. Neither Meat nor Veg know how to emit the keys required for MeatVegCombo without reference to each other.

Hope that all makes sense. Any guidance would be appreciated.

Sam Millman

Apr 2, 2013, 9:08:31 AM
to mongod...@googlegroups.com
"why? because it has worked fine! And was elegant and performant (caveat: I don't have a sharding concern)"

Without sharding, I can understand that. Sharding is why this feature was taken out.

I will look over the scenario and see if I can think of anything.

Paul Gallagher

Apr 2, 2013, 4:05:30 PM
to mongod...@googlegroups.com
On Tuesday, 2 April 2013, Sam Millman wrote:
> "why? because it has worked fine! And was elegant and performant (caveat: I don't have a sharding concern)"
>
> Without sharding I can understand. That is why this feature was taken out.
>
> I will look over the scenario and see if I can think of anything.

Thanks Sam, it will be greatly appreciated.

PS: if you have a copy of MapReduce Design Patterns (O'Reilly), what I'm describing basically corresponds to the scenarios covered in Section 5: Join Patterns.

Gavin Hogan

Apr 3, 2013, 8:24:23 AM
to mongod...@googlegroups.com
TBH I tend to be very careful about using deprecated features, but I cannot recall finding any documentation indicating that this was deprecated. It seemed pretty obvious to us to use this feature, since it was the first thing we thought to try and it worked. I have been using MongoDB since 1.6/1.8, so I probably did not notice any release notes deprecating features from 1.4 to 1.6 or earlier (and did not find anything today after a quick review of previous release notes).

There are some cases where I can easily avoid this approach, and others where I cannot. For the record, I do not think this is necessarily an invalid concept or feature request. I can understand that having access to the db object might be too permissive, but getting a document from another collection during map() is very powerful. I would be curious to know if there is a case to be made for supporting such a capability.

Thanks.

Sam Millman

Apr 3, 2013, 10:33:41 AM
to mongod...@googlegroups.com
At the time, the reasoning was due to sharding concerns: how such a thing could possibly and feasibly work with sharding. To be honest, I cannot remember the details, but I remember it being something along those lines.

Ben Becker

Apr 23, 2013, 1:16:46 PM
to mongod...@googlegroups.com
Hi Raman/Paul,

For additional details on why 'db' access was removed, please see SERVER-8104. Conceptually, MapReduce does not permit arbitrary queries; it should rely only on the input document set.

Regarding the scenario mentioned by Paul, I think you'll find a faster solution using a client-side script.  I think what you're really after is server-side scripting, which can be accomplished (to some degree) with stored JavaScript functions and db.eval().  Note the details of the 'lock' parameter limitations.

Overall, this is a job best suited for a client-side script.  I would expect the code to be roughly the same complexity as the current script.  It may also be significantly faster, depending on the language used and network performance.
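If the work moves to a client-side script, the results still need to be written back with the upsert-style behaviour Paul was getting from map-reduce output. A minimal sketch, assuming the official Node.js driver: it only builds operation objects in the replaceOne/upsert shape that collection.bulkWrite() accepts, batched to keep each request bounded. The function name toUpsertBatches, the batch size and the sample documents are illustrative assumptions; the driver call itself appears only in a comment and is not executed.

```javascript
// Sketch: write precomputed combos back from a client-side script.
// Builds operation objects in the shape accepted by the Node.js driver's
// collection.bulkWrite(); nothing here talks to a server.
function toUpsertBatches(combos, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < combos.length; i += batchSize) {
    batches.push(
      combos.slice(i, i + batchSize).map((combo) => ({
        // replace the whole document by _id, inserting it if it does not exist
        replaceOne: { filter: { _id: combo._id }, replacement: combo, upsert: true },
      })),
    );
  }
  return batches;
}

// Hypothetical sample combos.
const sampleCombos = [
  { _id: "m1:v1", cost: 5 },
  { _id: "m1:v2", cost: 6 },
  { _id: "m2:v3", cost: 10 },
];
const batches = toUpsertBatches(sampleCombos, 2);
console.log(batches.map((ops) => ops.length)); // [ 2, 1 ]

// With a connected driver (not run here), each batch would be sent as:
//   await db.collection("MeatVegCombo").bulkWrite(ops, { ordered: false });
```

Passing ordered: false lets the server continue past an individual failed write within a batch; whether that is acceptable depends on the job.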

Hope that helps.

     -ben

On Monday, April 22, 2013 1:30:28 PM UTC-7, Raman Gupta wrote:

On Tuesday, April 2, 2013 4:05:30 PM UTC-4, Paul Gallagher wrote:
On Tuesday, 2 April 2013, Sam Millman wrote:
"why? because it has worked fine! And was elegant and performant (caveat: I don't have a sharding concern)"


+1 Same issue here.
 
Without sharding I can understand. That is why this feature was taken out.

I will look over the scenario and see if I can think of anything.

Thanks Sam, it will be greatly appreciated.


Has anyone had a chance to look at this yet?
 
PS: if you have a copy of MapReduce Design Patterns (O'Reilly), what I'm describing basically corresponds to the scenarios covered in Section 5: Join Patterns.

Regards,
Raman
 