Solving the dynamic attribute challenge properly (design/indexing)

587 views
Skip to first unread message

Kaitsu

unread,
Feb 23, 2012, 2:30:22 AM2/23/12
to mongodb-user
I have following kind of data in MongoDB:

{
"name":"some name",
"attrs":[
{"n":"subject","v":"Some subject"},
{"n":"description","v":"Some great description"},
{"n":"comments","v":"Comments are here!"},
]
}

The attrs array is a container for dynamic attributes, i.e. I don't
beforehand know what kind of attributes are put there. n stands for
name and v stands for value.

The "MongoDB In Action" book describes this design as a solution for
having dynamic attributes in the case where the attributes are
completely upredictable. It also describes that you can index it like
this:

db.mycollection.ensureIndex({"attrs.n":1, "attrs.v":1})

Queries can then be done like this:

db.mycollection.find({attrs: {$elemMatch: {n: "subject", v: "Some
subject"}}})

When i test this, I get very poor performance. I tested with
mycollection having about 2 million documents and having no index
seems to perform better. Here's the explain of the query, which shows
that only the n field is taken from the index:

{
"cursor" : "BtreeCursor attrs.n_1_attrs.v_1",
"nscanned" : 2000202,
"nscannedObjects" : 2000202,
"n" : 2,
"millis" : 4111,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"attrs.n" : [
[
"a_id",
"a_id"
]
],
"attrs.v" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}

If I use multikeys to index it like this:

db.mycollection.ensureIndex({"attrs":1})

and then search like this:

db.mycollection.find({"attrs": {n: "subject", v: "Some subject"}})

the search result is correct and the search uses the multikey index
just fine:

{
"cursor" : "BtreeCursor attrs_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"attrs" : [
[
{
"n" : "a_id",
"v" : "19"
},
{
"n" : "a_id",
"v" : "19"
}
]
]
}
}

So, the question goes: is the dynamic attribute example in the
"MongoDB in Action" wrong and my multikey solution correct? Or is
there some other way to solve this kind of dynamic attribute setting
so that the indexing gives good performance? In my case, it is not
feasible to just have keys like "subject" and "description" and index
them all separately.

Kyle Banker

unread,
Feb 23, 2012, 11:21:41 AM2/23/12
to mongodb-user
Hi Kaitsu,

The strategy you use at the end of your post is perfectly acceptable
as long as you only require exact matches on the 'v' field. If you
need to use any other query operator on 'v' ($in, $gt, etc.), you'll
have to use $elemMatch, but as you've noticed, the index isn't being
used very effectively in that case. I refer you to a SERVER case,
which indicate that this will be fixed very shortly. Please add
yourself as a voter and a watcher on it:

https://jira.mongodb.org/browse/SERVER-3104

- Kyle

Kyle Banker

unread,
Feb 24, 2012, 1:23:25 PM2/24/12
to mongod...@googlegroups.com
Just to add some more detail:

The query used to be efficient in 1.8, but a change in 2.0 to the query engine designed to fix a certain
edge case that returned incorrect results caused a performance regression.

Again, follow the server ticket for progress on this. Here's a link to another, related discussion:

damirv

unread,
Aug 30, 2012, 2:09:01 PM8/30/12
to mongod...@googlegroups.com, ky...@10gen.com
I have the same problem, Jenny seems to have found the solution for me:
https://groups.google.com/d/msg/mongodb-user/RKrsyzRwHrE/uctTn4T6dHgJ

I did not know that you can use $elemMatch with comparison operators (such as $gt, $lt) on objects. This seems to be a solution to this issue. Here is an example (in python):
connection = Connection()
connection.drop_database('testing')
db = connection.testing
db.items.ensure_index("attrs", pymongo.ASCENDING)
db.items.insert({ "attrs" : [{ "v0" : 0 }, { "v2" : 1 }, {"v3":10}] })
db.items.insert({ "attrs" : [{ "v1" : 1 }, { "v2" : 1 }, {"v3":10}] })
db.items.insert({ "attrs" : [{ "v1" : 100 }, { "v2" : 100 }, {"v3":100}] })
db.items.insert({ "attrs" : [{ "v1" : 1000 }, { "v2" : 1000 }, {"v3":1000}] })

# all documents that have "v2" greater than 100, notice the arbitrary upperbound of 10000000 (explained in the post why)

cursor = db.items.find({"attrs":{'$elemMatch':{'$gt':{"v2":100}, '$lt':{"v2":10000000}}}})

# all documents less with v1 less than 50 (again notice the lower bound of 0)
cursor = db.items.find({"attrs":{'$elemMatch':{'$lt':{"v1":50}, '$gt':{"v1":0}}}})

For me this seems to solve the problem.

damirv

unread,
Aug 30, 2012, 2:23:33 PM8/30/12
to mongod...@googlegroups.com, ky...@10gen.com
I now see that it does not work for your example like this (because you don't use $gt).. but when you do it like this:

from pprint import pprint
from pymongo import Connection
import pymongo


connection = Connection()
connection.drop_database('testing')
db = connection.testing
db.items.ensure_index("attrs", pymongo.ASCENDING)

db.items.insert({
    "name":"some name",
    "attrs":[
        {"_n":"subject","_v":"Some subject"},
        {"_n":"description","_v":"Some great description"},
        {"_n":"comments","_v":"Comments are here!"},
        ] })
db.items.insert({

    "name":"some name",
    "attrs":[
        {"_n":"subject","_v":"Other subject"},
        {"_n":"description","_v":"Some great description"},
        {"_n":"comments","_v":"Comments are here!"},
        ] })

cursor = db.items.find({'attrs': {'$elemMatch': {'$gte':{'_n': "subject", '_v': "Some subject"}, '$lte':{'_n': "subject", '_v': "Some subject"}}}})
pprint(cursor.explain())
pprint(list(cursor))


it works! :) (the index bounds are correct)
Reply all
Reply to author
Forward
0 new messages