query and intersection

Nicolas Clairon

May 18, 2011, 7:44:26 AM
to mongod...@googlegroups.com
Hi,

I have a use case and I wonder whether it is a scenario that MongoDB
can handle natively:

Here's a bunch of documents:

{'_id':1, 'kw': ['a', 'c', 'f', 'g']}
{'_id':2, 'kw': ['a', 'b', 'c', 'd', 'e']}
{'_id':3, 'kw': ['a', 'e', 'f', 'd']}

Now, I have an array of values and I'd like to get the documents that
best match the array.

For instance, ['a', 'b', 'e'] will return document "2" and ['a',
'c', 'g'] document "1".

Basically, it's like computing the intersection with each document and
returning the one with the largest intersection.
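
To pin down what "best match" means here, a minimal in-memory sketch in
plain Python over the three sample documents. The helper names `overlap`
and `best_match` and the tie-breaking rule (lowest `_id` wins) are
assumptions for illustration; the thread does not specify them.

```python
docs = [
    {'_id': 1, 'kw': ['a', 'c', 'f', 'g']},
    {'_id': 2, 'kw': ['a', 'b', 'c', 'd', 'e']},
    {'_id': 3, 'kw': ['a', 'e', 'f', 'd']},
]

def overlap(doc, query):
    """Number of query values that also appear in the document's 'kw' array."""
    return len(set(doc['kw']) & set(query))

def best_match(docs, query):
    """Return the document with the largest overlap (assumed tie-break: lowest _id)."""
    return max(docs, key=lambda d: (overlap(d, query), -d['_id']))

print(best_match(docs, ['a', 'b', 'e'])['_id'])  # 2
print(best_match(docs, ['a', 'c', 'g'])['_id'])  # 1
```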

Is this something that can be done easily (via a query) with MongoDB?
Any hints for doing this efficiently?

Thanks,

N.

Andreas Jung

May 18, 2011, 7:48:51 AM
to mongod...@googlegroups.com
What you get from MongoDB is the $all operator, which returns the
documents whose array contains all of the given values, without any
further filtering. I don't think that you can do further filtering or
ranking at the MongoDB level, so perform that step client-side.
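
To illustrate why $all alone does not rank anything, here is a sketch
that emulates the matching semantics of $all (every queried value must be
present) and $in (at least one must be present) in plain Python. The
`matches_all` and `matches_in` helpers are hypothetical stand-ins, not
driver calls; the corresponding MongoDB filters are shown in the comments.

```python
docs = [
    {'_id': 1, 'kw': ['a', 'c', 'f', 'g']},
    {'_id': 2, 'kw': ['a', 'b', 'c', 'd', 'e']},
    {'_id': 3, 'kw': ['a', 'e', 'f', 'd']},
]

def matches_all(doc, values):
    # Emulates the filter {'kw': {'$all': values}}
    return set(values) <= set(doc['kw'])

def matches_in(doc, values):
    # Emulates the filter {'kw': {'$in': values}}
    return bool(set(values) & set(doc['kw']))

def ids(predicate, values):
    """_ids of the sample documents accepted by the given predicate."""
    return [d['_id'] for d in docs if predicate(d, values)]

print(ids(matches_all, ['a', 'b', 'e']))  # [2]
print(ids(matches_in, ['a', 'b', 'e']))   # [1, 2, 3]
# A single value missing from every document empties the $all result,
# even though document 2 is still the closest partial match.
print(ids(matches_all, ['a', 'b', 'z']))  # []
```

Neither filter carries a score, which is why the ranking step has to
happen somewhere else.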

-aj
--
ZOPYX Limited           | zopyx group
Charlottenstr. 37/1     | The full-service network for Zope & Plone
D-72070 Tübingen        | Produce & Publish
www.zopyx.com           | www.produce-and-publish.com
------------------------------------------------------------------------
E-Publishing, Python, Zope & Plone development, Consulting

Nicolas Clairon

May 18, 2011, 7:56:06 AM
to mongod...@googlegroups.com
This is an issue because I have more than 3M documents and I need to
perform this query on the fly. Doing it on the client side would take
forever and would exhaust the RAM.
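
On the memory side, one option is to stream the results and keep only
the top few candidates instead of loading everything. A sketch, assuming
the documents arrive from any iterator (a generator stands in for a
database cursor here); `stream_docs` and `top_matches` are hypothetical
names. This bounds memory but still scans every candidate, so it does
nothing for the "on the fly" latency concern.

```python
import heapq

def stream_docs():
    """Stand-in for a database cursor: yields documents one at a time."""
    yield {'_id': 1, 'kw': ['a', 'c', 'f', 'g']}
    yield {'_id': 2, 'kw': ['a', 'b', 'c', 'd', 'e']}
    yield {'_id': 3, 'kw': ['a', 'e', 'f', 'd']}

def top_matches(cursor, query, n=1):
    """Return the n best (score, _id) pairs while holding at most n in memory."""
    wanted = set(query)
    scored = ((len(wanted.intersection(doc['kw'])), doc['_id']) for doc in cursor)
    # nlargest consumes the generator lazily; ties on score go to the larger _id.
    return heapq.nlargest(n, scored)

print(top_matches(stream_docs(), ['a', 'b', 'e'], n=2))  # [(3, 2), (2, 3)]
```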

If anyone has a solution for this, I would love to hear it. At
the moment, I'm looking at ElasticSearch...

Cheers,

N.

Andreas Jung

May 18, 2011, 9:17:03 AM
to mongod...@googlegroups.com
You can also look at the aggregation docs

http://www.mongodb.org/display/DOCS/Aggregation

and check whether there is something suitable, but I doubt that you can
solve your issue *solely* at the MongoDB query level.
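
For readers on later MongoDB releases than the ones discussed in this
thread: the aggregation pipeline added afterwards has the
`$setIntersection` and `$size` expression operators, which should be able
to express this ranking server-side. The sketch below only builds the
pipeline as plain Python data; `best_match_pipeline` is a hypothetical
helper, the field name `score` is an assumption, and nothing here has
been run against a server or measured on 3M documents.

```python
def best_match_pipeline(query, limit=1):
    """Build an aggregation pipeline that ranks documents by overlap with `query`."""
    return [
        # Keep only documents sharing at least one value with the query.
        {'$match': {'kw': {'$in': query}}},
        # score = size of the intersection between the 'kw' array and the query.
        {'$addFields': {'score': {'$size': {'$setIntersection': ['$kw', query]}}}},
        # Highest score first; _id as an assumed tie-breaker.
        {'$sort': {'score': -1, '_id': 1}},
        {'$limit': limit},
    ]

pipeline = best_match_pipeline(['a', 'b', 'e'])
for stage in pipeline:
    print(stage)
# With a driver such as pymongo, a list like this would be passed to the
# aggregate() method of a collection object.
```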

-aj

Kyle Banker

May 18, 2011, 10:33:11 AM
to mongodb-user
Andreas is right. You can't do this with the query language alone. If
you have some idea of what you want to match in advance, then you can
perhaps optimize by pre-aggregating. Otherwise, you'll have to use an
aggregation framework (map-reduce, etc.).
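
A rough illustration of the map-reduce idea, emulated in plain Python
rather than on the server: the map step emits one count per keyword
shared with the query, and the reduce step sums those counts per
document. The function names `map_doc`, `reduce_counts`, and `rank` are
hypothetical, and this says nothing about how a real map-reduce job
would perform on 3M documents.

```python
from collections import Counter

docs = [
    {'_id': 1, 'kw': ['a', 'c', 'f', 'g']},
    {'_id': 2, 'kw': ['a', 'b', 'c', 'd', 'e']},
    {'_id': 3, 'kw': ['a', 'e', 'f', 'd']},
]

def map_doc(doc, query):
    """Map step: emit (_id, 1) for every distinct keyword that is also in the query."""
    wanted = set(query)
    for kw in set(doc['kw']):
        if kw in wanted:
            yield doc['_id'], 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per _id."""
    totals = Counter()
    for doc_id, n in pairs:
        totals[doc_id] += n
    return totals

def rank(docs, query):
    """Return (_id, score) pairs, best match first."""
    pairs = (pair for doc in docs for pair in map_doc(doc, query))
    return reduce_counts(pairs).most_common()

print(rank(docs, ['a', 'b', 'e']))  # [(2, 3), (3, 2), (1, 1)]
```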

I'd recommend creating a Jira ticket with this use case so that we can
track it.

Kyle

Nicolas Clairon

May 19, 2011, 5:54:02 AM
to mongod...@googlegroups.com