Schema question: best bet for join-like sorting.

43 views
Skip to first unread message

AndyW

unread,
May 8, 2012, 10:44:10 PM5/8/12
to mongod...@googlegroups.com
Let's say I have a collection of entries and a collection of sources.  Each entry has one source from the source collection associated with it, and each source has a trust number of 1-10 which can change (though it will likely not change very much).  When extracting the entries for presentation, I want to be able to constrain them by time range (the last 24 hours, say), but also have them sorted by most trusted source first.  In a relational DB, this would be done with a join on the sources table.  Obviously, this is not possible in mongoDB.  What is the best way to do this?  Is it best to just map-reduce all entries to insert the trust number directly into the document and then do incremental map-reduce to map new entries?

What happens if I do want to change a source's trust number?  Do I have to just know that I need to run another complete map-reduce manually, or is there a facility to accomplish this automatically?  I would normally just use the client to do the linking, but that removes the ability to sort.

Ideas?  Again, the source trust numbers will not change often, so expense isn't as much of an issue, but they do need to be able to change.

Andreas Jung

unread,
May 8, 2012, 11:43:57 PM5/8/12
to mongod...@googlegroups.com


AndyW wrote:
> Let's say I have a collection of entries and a collection of sources.
> Each entry has one source from the source collection associated with
> it, and each source has a trust number of 1-10 which can change (though
> it will likely not change very much). When extracting the entries for
> presentation, I want to be able to constrain them by time range (the
> last 24 hours, say), but also have them sorted by most trusted source
> first. In a relational DB, this would be done with a join on the
> sources table. Obviously, this is not possible in mongoDB. What is the
> best way to do this? Is it best to just map-reduce all entries to
> insert the trust number directly into the document and then do
> incremental map-reduce to map new entries?

http://www.mongodb.org/display/DOCS/Schema+Design

Standard answer: use embedded documents or perform multiple queries.

-aj

--
ZOPYX Limited | Python | Zope | Plone | MongoDB
Charlottenstr. 37/1 | Consulting & Development
D-72070 T�bingen | Electronic Publishing Solutions
www.zopyx.com | Scalable Web Solutions
--------------------------------------------------
Produce & Publish - www.produce-and-publish.com


AndyW

unread,
May 9, 2012, 4:00:50 AM5/9/12
to mongod...@googlegroups.com
That's a start, but I could use a bit more information.  If by "embedded documents" you are referring to derived nesting, your post doesn't address best practices when I need to change the number for a particular source and have it reflected across the document database.  It's sounding like I will need to update all documents containing that source, if it's derived.  I could simply store the source key name in each, but how would I arbitrarily order their output by the value that key points to in the sources collection?

If, for instance, I have a sources collection:

{"_id" : ObjectId( "4a92af2db3d09cb83d985f6f") , "source_one" : 1, "metadata" : "blah blah"}
{"_id" : ObjectId( "4a92af2db3d09cb83d985f70") , "source_two" : 10, "metadata" : "and so on"}
{"_id" : ObjectId( "4a92af2db3d09cb83d985f71") , "source_three" : 20, "metadata" : "who cares"}

And a document collection:

{"_id" : ObjectId( "9cb83d985f6f4a92af2db3d0") , "important_value" : "foo", "source" : "source_two"}
{"_id" : ObjectId( "9cb83d985f6f4a92af2db3d1") , "important_value" : "bar", "source" : "source_one"}
{"_id" : ObjectId( "9cb83d985f6f4a92af2db3d2") , "important_value" : "baz", "source" : "source_three"}
{"_id" : ObjectId( "9cb83d985f6f4a92af2db3d3") , "important_value" : "wibble", "source" : "source_one"}

I would like to be able to write a query that returns important values in source-sorted order
{ "important_value" : "baz" }
{ "important_value" : "foo" }
{ "important_value" : "bar" }
{ "important_value" : "wibble"}

is group() what I'm looking for?

-Andy

Kevin Matulef

unread,
May 9, 2012, 2:46:59 PM5/9/12
to mongod...@googlegroups.com
You have a few options for handling this, and the best method is going to depend on your access patterns.  

One solution, as you're aware, is just to store the trust number itself inside each "entry" document.  The easiest way to do this might be to change your application code so that it pre-computes the trust number (perhaps by querying the "sources" collection) before inserting each new entry. If you did this, you'd also have to change your application code so that whenever the trust number of a particular source changes, you issue a corresponding multi-update to the "entries" collection (see the multi=true option here: http://www.mongodb.org/display/DOCS/Updating ) to change all entries associated with that source.  This could be an expensive operation, but if trust numbers don't change very often, it might be a small price to pay.

Another option is to query the "entries" collection and get the result sorted by source, then step through the results and query the "sources" collection for each trust number you need to look up, then have your application code sort by that.  This definitely requires more work on the part of your application code, and more round-trip queries to the server (you might be able to reduce the amount of round-trip communication by locally caching the results of the trust number lookups for common sources, but this sounds more complicated than you need).

Since in your case the trust numbers don't change very often and I think you'd like to optimize for fast reads, it seems like the first solution would be preferable. 
Reply all
Reply to author
Forward
0 new messages