Message from discussion
Schema question: best bet for join-like sorting.
Date: Wed, 9 May 2012 11:46:59 -0700 (PDT)
From: Kevin Matulef <matu...@10gen.com>
To: mongodb-user@googlegroups.com
Message-ID: <24783806.934.1336589219554.JavaMail.geo-discussion-forums@yncd3>
In-Reply-To: <30716840.93.1336550450650.JavaMail.geo-discussion-forums@yndz2>
References: <2730140.3176.1336531450807.JavaMail.geo-discussion-forums@ynff7>
<30716840.93.1336550450650.JavaMail.geo-discussion-forums@yndz2>
Subject: Re: Schema question: best bet for join-like sorting.
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_932_11784065.1336589219552"
------=_Part_932_11784065.1336589219552
Content-Type: multipart/alternative;
boundary="----=_Part_933_16528135.1336589219553"
------=_Part_933_16528135.1336589219553
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
You have a few options for handling this, and the best method is going to
depend on your access patterns.
One solution, as you're aware, is just to store the trust number itself
inside each "entry" document. The easiest way to do this might be to
change your application code so that it pre-computes the trust number
(perhaps by querying the "sources" collection) before inserting each new
entry. If you did this, you'd also have to change your application code so
that whenever the trust number of a particular source changes, you issue a
corresponding multi-update to the "entries" collection (see the multi=true
option here: http://www.mongodb.org/display/DOCS/Updating ) to change all
entries associated with that source. This could be an expensive operation,
but if trust numbers don't change very often, it might be a small price to
pay.
Another option is to query the "entries" collection and get the result
sorted by source, then step through the results and query the "sources"
collection for each trust number you need to look up, then have your
application code sort by that. This definitely requires more work on the
part of your application code, and more round-trip queries to the server
(you might be able to reduce the amount of round-trip communication by
locally caching the results of the trust number lookups for common sources,
but this sounds more complicated than you need).
Since in your case the trust numbers don't change very often and I think
you'd like to optimize for fast reads, it seems like the first solution
would be preferable.
On Wednesday, May 9, 2012 4:00:50 AM UTC-4, AndyW wrote:
>
> That's a start, but I could use a bit more information. If by "embedded
> documents" you are referring to derived nesting, your post doesn't address
> best practices when I need to change the number for a particular source and
> have it reflected across the document database. It's sounding like I will
> need to update all documents containing that source, if it's derived. I
> could simply store the source key name in each, but how would I arbitrarily
> order their output by the value that key points to in the sources
> collection?
>
> If, for instance, I have a sources collection:
>
> {"_id" : ObjectId( "4a92af2db3d09cb83d985f6f") , "source_one" : 1,
> "metadata" : "blah blah"}
> {"_id" : ObjectId( "4a92af2db3d09cb83d985f70") , "source_two" : 10,
> "metadata" : "and so on"}
> {"_id" : ObjectId( "4a92af2db3d09cb83d985f71") , "source_three" : 20,
> "metadata" : "who cares"}
>
> And a document collection:
>
> {"_id" : ObjectId( "9cb83d985f6f4a92af2db3d0") , "important_value" :
> "foo", "source" : "source_two"}
> {"_id" : ObjectId( "9cb83d985f6f4a92af2db3d1") , "important_value" :
> "bar", "source" : "source_one"}
> {"_id" : ObjectId( "9cb83d985f6f4a92af2db3d2") , "important_value" :
> "baz", "source" : "source_three"}
> {"_id" : ObjectId( "9cb83d985f6f4a92af2db3d3") , "important_value" :
> "wibble", "source" : "source_one"}
>
> I would like to be able to write a query that returns important values in
> source-sorted order
> { "important_value" : "baz" }
> { "important_value" : "foo" }
> { "important_value" : "bar" }
> { "important_value" : "wibble"}
>
> is group() what I'm looking for?
>
> -Andy
>
> On Tuesday, May 8, 2012 10:44:10 PM UTC-4, AndyW wrote:
>>
>> Let's say I have a collection of entries and a collection of sources.
>> Each entry has one source from the source collection associated with it,
>> and each source has a trust number of 1-10 which can change (though it will
>> likely not change very much). When extracting the entries for
>> presentation, I want to be able to constrain them by time range (the last
>> 24 hours, say), but also have them sorted by most trusted source first. In
>> a relational DB, this would be done with a join on the sources table.
>> Obviously, this is not possible in mongoDB. What is the best way to do
>> this? Is it best to just map-reduce all entries to insert the trust number
>> directly into the document and then do incremental map-reduce to map new
>> entries?
>>
>> What happens if I do want to change a source's trust number? Do I have
>> to just know that I need to run another complete map-reduce manually, or is
>> there a facility to accomplish this automatically? I would normally just
>> use the client to do the linking, but that removes the ability to sort.
>>
>> Ideas? Again, the source trust numbers will not change often, so expense
>> isn't as much of an issue, but they do need to be able to change.
>>
>
------=_Part_933_16528135.1336589219553
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
You have a few options for handling this, and the best method is going to d=
epend on your access patterns. <div><br></div><div>One solution, as y=
ou're aware, is just to store the trust number itself inside each "entry" d=
ocument. The easiest way to do this might be to change your applicati=
on code so that it pre-computes the trust number (perhaps by querying the "=
sources" collection) before inserting each new entry. If you did this, you'=
d also have to change your application code so that whenever the trust numb=
er of a particular source changes, you issue a corresponding multi-update t=
o the "entries" collection (see the multi=3Dtrue option here: http://w=
ww.mongodb.org/display/DOCS/Updating ) to change all entries associated wit=
h that source. This could be an expensive operation, but if trust num=
bers don't change very often, it might be a small price to pay.</div><div><=
br></div><div>Another option is to query the "entries" collection and get t=
he result sorted by source, then step through the results and query the "so=
urces" collection for each trust number you need to look up, then have your=
application code sort by that. This definitely requires more work on=
the part of your application code, and more round-trip queries to the serv=
er (you might be able to reduce the amount of round-trip communication by l=
ocally caching the results of the trust number lookups for common sources, =
but this sounds more complicated than you need).<br><br>Since in your case =
the trust numbers don't change very often and I think you'd like to optimiz=
e for fast reads, it seems like the first solution would be preferable.&nbs=
p;<br><br>On Wednesday, May 9, 2012 4:00:50 AM UTC-4, AndyW wrote:<blockquo=
te class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left:=
1px #ccc solid;padding-left: 1ex;">That's a start, but I could use a bit m=
ore information. If by "embedded documents" you are referring to deri=
ved nesting, your post doesn't address best practices when I need to change=
the number for a particular source and have it reflected across the docume=
nt database. It's sounding like I will need to update all documents c=
ontaining that source, if it's derived. I could simply store the sour=
ce key name in each, but how would I arbitrarily order their output by the =
value that key points to in the sources collection?<div><br></div><div>If, =
for instance, I have a sources collection:</div><div><br></div>{"_id" : Obj=
ectId( "4a92af2db3d09cb83d985f6f") , "source_one" : 1, "metadata" : "blah b=
lah"}<div>{"_id" : ObjectId( "4a92af2db3d09cb83d985f70") , "source_two" : 1=
0, "metadata" : "and so on"}</div><div>{"_id" : ObjectId( "4a92af2db3d09cb8=
3d985f71") , "source_three" : 20, "metadata" : "who cares"}<br><div><br></d=
iv><div>And a document collection:</div><div><br></div><div>{"_id" : Object=
Id( "9cb83d985f6f4a92af2db3d0") , "important_value" : "foo", "source" : "so=
urce_two"}</div><div>{"_id" : ObjectId( "9cb83d985f6f4a92af2db3d1") , "impo=
rtant_value" : "bar", "source" : "source_one"}<br></div><div>{"_id" : Objec=
tId( "9cb83d985f6f4a92af2db3d2") , "important_value" : "baz", "source" : "s=
ource_three"}<br></div><div>{"_id" : ObjectId( "9cb83d985f6f4a92af2db3d3") =
, "important_value" : "wibble", "source" : "source_one"}<br></div><div><br>=
</div><div>I would like to be able to write a query that returns important =
values in source-sorted order</div><div>{ "important_value" : "baz" }</div>=
<div>{ "important_value" : "foo" }</div><div>{ "important_value" : "bar" }<=
/div><div>{ "important_value" : "wibble"}</div><div><br></div><div>is group=
() what I'm looking for?</div><div><br></div><div>-Andy</div><div><br></div=
><div>On Tuesday, May 8, 2012 10:44:10 PM UTC-4, AndyW wrote:<blockquote cl=
ass=3D"gmail_quote" style=3D"margin:0;margin-left:0.8ex;border-left:1px #cc=
c solid;padding-left:1ex">Let's say I have a collection of entries and a co=
llection of sources. Each entry has one source from the source collec=
tion associated with it, and each source has a trust number of 1-10 which c=
an change (though it will likely not change very much). When extracti=
ng the entries for presentation, I want to be able to constrain them by tim=
e range (the last 24 hours, say), but also have them sorted by most trusted=
source first. In a relational DB, this would be done with a join on =
the sources table. Obviously, this is not possible in mongoDB. =
What is the best way to do this? Is it best to just map-reduce all en=
tries to insert the trust number directly into the document and then do inc=
remental map-reduce to map new entries?<div><br></div><div>What happens if =
I do want to change a source's trust number? Do I have to just know t=
hat I need to run another complete map-reduce manually, or is there a facil=
ity to accomplish this automatically? I would normally just use the c=
lient to do the linking, but that removes the ability to sort.<div><br></di=
v><div>Ideas? Again, the source trust numbers will not change often, =
so expense isn't as much of an issue, but they do need to be able to change=
.</div></div></blockquote></div></div></blockquote></div>
------=_Part_933_16528135.1336589219553--
------=_Part_932_11784065.1336589219552--