MongoDB - Querying all tags, only some return


MKN Web Solutions

Oct 2, 2012, 10:49:14 PM
to mongod...@googlegroups.com
So I've got a collection of documents, each containing an array of tags (e.g. "apple", "orange", "peach").  There are 50 different tags in total.  Each document can contain 1 to 5 tags.  The "tags" field is also indexed.

A person can search for one or more tags (e.g. "apple" or "apple orange").  My system already parses the search input, queries MongoDB with {"tags":{"$in":["apple","orange"]}}, and returns the correct documents.


The Issue:
When a search includes every possible tag, e.g. {"tags":{"$in":["apple","orange","peach","tomato","grape","etc","etc"]}}, it returns only 29% of the documents in the collection.  I've also tried '$or', but it behaved exactly like '$in'.  Any ideas?

MKN Web Solutions

Oct 3, 2012, 1:35:40 PM
to mongod...@googlegroups.com
Any ideas?

Sam Helman

Oct 3, 2012, 2:30:37 PM
to mongod...@googlegroups.com
Where are you running the code from?  Are you using a language driver or the javascript shell?



MKN Web Solutions

Oct 3, 2012, 3:04:46 PM
to mongod...@googlegroups.com
I tried both PHP and the JavaScript shell.  Same results.

Scott Hernandez

Oct 3, 2012, 4:01:28 PM
to mongod...@googlegroups.com
Can you post the output of db.coll.getIndexes(), the count query you are using, a count query checking for null in the field, and the output of db.coll.distinct(<field-name>)?

Please post a mongo JavaScript shell session to gist/pastebin/etc.
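
A minimal sketch of the four checks requested above, gathered into one helper. It assumes the legacy mongo shell API (getCollection, getIndexes, find().count(), distinct); `collectDiagnostics`, the collection name "xxx", and the sample tag list are placeholders for illustration, not something from the original posts.

```javascript
// Sketch of the requested checks. `collectDiagnostics` is a hypothetical
// helper, not part of MongoDB; `db` is the shell's database handle.
function collectDiagnostics(db, collName, field, inValues) {
  var coll = db.getCollection(collName);

  // The query under investigation: {"<field>": {"$in": [...]}}
  var inQuery = {};
  inQuery[field] = { $in: inValues };

  // {"<field>": null} matches documents where the field is null or missing.
  var nullQuery = {};
  nullQuery[field] = null;

  return {
    indexes: coll.getIndexes(),
    total: coll.find().count(),
    matched: coll.find(inQuery).count(),
    nullOrMissing: coll.find(nullQuery).count(),
    distinct: coll.distinct(field)
  };
}

// In the shell, something like:
// printjson(collectDiagnostics(db, "xxx", "tags", ["retail", "yoga"]));
```

Comparing `matched` with `total`, and scanning `distinct` for values that are not plain strings, is usually enough to see whether the data or the query is at fault.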

MKN Web Solutions

Oct 3, 2012, 4:07:45 PM
to mongod...@googlegroups.com
Index for "tags":
{
    "v" : 1,
    "key" : {
        "tags" : NumberLong(1)
    },
    "ns" : "xxx.xxx",
    "background" : NumberLong(1),
    "name" : "tags"
}


Distinct output (note: these tags don't match the example I posted above; these are the real tags):
> db.xxx.distinct("tags")
[
    { "0" : "health", "4" : "retail" },
    "retail",
    "yoga",
    { "0" : "health", "2" : "retail" },
    "photography",
    { "0" : "fitness", "3" : "retail" },
    { "0" : "health", "1" : "massage", "3" : "retail" },
    { "0" : "fitness", "1" : "art", "2" : "yoga", "5" : "retail" },
    "sporting goods",
    "american",
    "massage",
    { "0" : "fitness", "1" : "yoga", "3" : "retail" },
    "food",
    { "0" : "skin care", "2" : "massage", "5" : "retail" },
    { "0" : "photography", "1" : "art", "6" : "retail" },
    { "0" : "art", "2" : "retail" },
    { "0" : "fitness", "2" : "health", "6" : "retail" },
    { "0" : "spa", "2" : "massage", "5" : "retail" },
    { "0" : "mexican", "1" : "american", "3" : "retail" },
    { "0" : "services", "1" : "yogurt", "3" : "food", "7" : "retail" },
    { "0" : "fitness", "2" : "yoga", "5" : "retail" },
    { "0" : "fitness", "1" : "art", "6" : "retail" },
    { "0" : "yoga", "3" : "retail" },
    { "0" : "american", "3" : "retail" },
    { "0" : "mexican", "1" : "american", "5" : "retail" },
    { "0" : "art", "4" : "retail" },
    { "0" : "fitness", "1" : "sporting goods", "3" : "retail" },
    { "0" : "adventure", "1" : "yoga", "3" : "retail" },
    { "0" : "golf", "2" : "yoga", "5" : "retail" },
    { "0" : "amusement", "2" : "fitness", "5" : "retail" },
    "fitness",
    { "0" : "detailing", "3" : "retail" },
    { "0" : "food", "1" : "art", "6" : "retail" },
    { "0" : "tours", "1" : "wine", "3" : "yoga", "7" : "retail" },
    { "0" : "american", "1" : "pets", "5" : "retail" },
    { "0" : "tours", "1" : "yoga", "3" : "retail" },
    { "0" : "detailing", "5" : "retail" },
    { "0" : "photography", "1" : "sporting goods", "3" : "retail" },
    { "0" : "food", "1" : "american", "3" : "retail" },
    { "0" : "services", "1" : "sporting goods", "3" : "retail" },
    { "0" : "fishing", "1" : "yoga", "3" : "retail" },
    { "0" : "fitness", "1" : "detailing", "3" : "retail" },
    { "0" : "tours", "1" : "food", "3" : "retail" },
    { "0" : "hotels", "1" : "yoga", "3" : "retail" },
    { "0" : "wine", "1" : "massage", "3" : "retail" },
    { "0" : "detailing", "1" : "yoga", "3" : "retail" },
    { "0" : "sushi", "1" : "american", "3" : "retail" },
    { "0" : "wine", "1" : "fitness", "3" : "retail" },
    "detailing",
    { "0" : "adventure", "2" : "yoga", "5" : "retail" },
    { "0" : "baked goods", "2" : "food", "5" : "retail" }
]


Mongo Query: (All tags)
> db.xxx.find({"tags":{"$in":["massage","hotels","yoga","skin care","massage","yogurt","art","wine","detailing","photography","tours","american","mexican","spa","sporting goods","baked goods","food","hotels","sushi","fishing","amusement","automotive","coffee","golf","pets","adventure","health","fitness","services","retail"]}}).count()
88

Mongo Query: (All)
> db.xxx.find().count()
349

aliane abdelouahab

Oct 3, 2012, 5:54:19 PM
to mongodb-user
Can you try "$unwind"? Here is an example:

db.users.find({"personnel.pseudo":"alucaard"}).distinct("produit_up.spec")
Out[62]:
[{u'abus': 0,
u'date': u'2012-09-30',
u'description': u"portable tr\xe8s solide, peu servi, avec batterie
d'une autonomie de 3 heures.",
u'id': u'alucaard134901952647',
u'namep': u'nokia 3310',
u'nombre': 1,
u'prix': 1000,
u'tags': [u'portable', u'nokia', u'3310'],
u'vendu': False},
{u'abus': 0,
u'date': u'2012-09-30',
u'description': u'\u0646\u0628\u064a\u0639 \u0623\u064a
\u0641\u0648\u0646 \u062c\u062f\u064a\u062f \u0641\u064a
\u0627\u0644\u0628\u0648\u0627\u0637 \u0645\u0639\u0627\u0647
\u0634\u0627\u0631\u062c\u0648\u0631 \u062f\u0648\u0631\u064a\u062c
\u064a\u0646',
u'id': u'alucaard134902092967',
u'namep': u'iphone 3gs',
u'nombre': 1,
u'prix': 20000,
u'tags': [u'iphone', u'3gs', u'apple'],
u'vendu': False},
{u'abus': 0,
u'date': u'2012-09-30',
u'description': u'vends 206 toutes options 2006 hdi.',
u'id': u'alucaard134902099082',
u'namep': u'peugeot 206',
u'nombre': 1,
u'prix': 500000,
u'tags': [u'voiture', u'206', u'hdi'],
u'vendu': False}]


db.users.aggregate([
    {"$unwind": "$produit_up"},
    {"$match": {"produit_up.spec.tags": {"$in": ["3310", "iphone", "heloooooo"]}}},
    {"$group": {"_id": "$_id", "result": {"$push": "$produit_up.spec.namep"}}}
])
Out[63]:
{u'ok': 1.0,
u'result': [{u'_id': ObjectId('5061fab93a5f3a09f4be0e21'),
u'result': [u'nokia 3310', u'iphone 3gs']}]}


Scott Hernandez

Oct 3, 2012, 10:29:04 PM
to mongod...@googlegroups.com
You have tags which are not strings but embedded documents. You will not be able to find those docs when searching with your list of strings in $in.

I suspect you need to clean up your data, or the way you are collecting data in your program.
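
To illustrate the point above, here is a small plain-JavaScript sketch that mimics how `$in` treats a field holding strings versus one holding an embedded document. It is a simplified model for illustration only, not MongoDB's actual matching code; `matchesIn` and the two sample documents are hypothetical, with the broken one modelled on the distinct output earlier in the thread.

```javascript
// Simplified model of {"<field>": {"$in": values}} for string values only.
// Not MongoDB's implementation; `matchesIn` is a hypothetical helper.
function matchesIn(doc, field, values) {
  var value = doc[field];
  if (Array.isArray(value)) {
    // Array field: match if any element is one of the listed values.
    return value.some(function (el) { return values.indexOf(el) !== -1; });
  }
  // Non-array field: the whole value must equal one of the listed values.
  // An embedded document is never equal to a string.
  return values.indexOf(value) !== -1;
}

var good = { tags: ["health", "retail"] };               // stored as an array
var broken = { tags: { "0": "health", "4": "retail" } }; // stored as a subdocument

console.log(matchesIn(good, "tags", ["health", "retail"]));   // true
console.log(matchesIn(broken, "tags", ["health", "retail"])); // false
```

In the shell, a query along the lines of db.xxx.find({"tags": {"$type": 3}}).count() (type 3 being the BSON embedded-document type) should give a rough count of the affected documents, assuming the field itself is the subdocument.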


MKN Web Solutions

Oct 3, 2012, 10:49:09 PM
to mongod...@googlegroups.com
Got it - I see what I did there. Going to give this a shot now.

MKN Web Solutions

Oct 3, 2012, 11:06:52 PM
to mongod...@googlegroups.com
Issue has been fixed!

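Using PHP's "array_unique" caused the issue: it keeps the original keys, so once duplicates are removed the array has gaps in its keys, and when that array is inserted into MongoDB it becomes a subdoc.  So get rid of the keys!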
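
For documents that were already saved with the bad shape, a cleanup could convert each subdocument back into a plain array. The sketch below shows only that conversion in plain JavaScript; `tagsToArray` is a hypothetical helper, and applying it to every affected document (and saving the result) is left out.

```javascript
// Turn a subdocument such as {"0": "health", "4": "retail"} back into a
// plain array, ordered by numeric key. `tagsToArray` is a hypothetical helper.
function tagsToArray(tags) {
  if (Array.isArray(tags)) {
    return tags; // already an array, nothing to do
  }
  return Object.keys(tags)
    .sort(function (a, b) { return Number(a) - Number(b); })
    .map(function (key) { return tags[key]; });
}

console.log(tagsToArray({ "0": "health", "4": "retail" })); // [ 'health', 'retail' ]
```

On the PHP side, the usual way to drop the keys before inserting is to wrap the call, as in array_values(array_unique($tags)), which re-indexes the result from 0.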