Indexing embedded JSON objects inside an array - how does it work?

3,310 views
Skip to first unread message

Aleksei T

unread,
Feb 22, 2011, 6:51:21 PM2/22/11
to mongod...@googlegroups.com
So according to the docs, if array is part of an index, all its elements are indexed, which is great.

http://www.mongodb.org/display/DOCS/Multikeys

In the examples, most of the time you see a "tagging" scenario, like this:
tags: ['weather', 'hot', 'record', 'april']
However, there is just one quick section on the indexing of JSON objects inside an array:

Embedded object fields in an array

Additionally the same technique can be used for fields in embedded objects:

> db.posts.find( { "comments.author" : "julie" } )
{"title" : "How the west was won" ,
"comments" : [{"text" : "great!" , "author" : "sam"},
{"text" : "ok" , "author" : "julie"}],
"_id" : "497ce79f1ca9ca6d3efca325"}

For example, having the following array, how can I make sure it's elements are indexed and can be efficiently queried:

"records" : [
{ "name" : "status" , "value" : "200" },
{ "name" : "URL" , "value" : "www.cnn.com" },
{ "name" : "customerId" , "value" : "12345" }
]

What would the index need to be if I want to use this field "records" in an index to be able to query efficiently like this:

find( { ... "records.name":"status", "records.value":"200" })

Would I just use the array field in the index and assume it will index all the element accordingly (seems to be implied in documentation) or would I have to explicitly declare a compound index on "records.name" and "records.value" ?

I hope just using the "records" array field in an index will do all the indexing to be able to use the automatically created indices in the query.

Thank you!

Gates

unread,
Feb 22, 2011, 7:24:31 PM2/22/11
to mongodb-user
So, you can index on "records.name", but you're probably not going to
get the type of index you want. You can test this by creating the
index and then running your query with .explain().

Looking at your "records" data you have created an _array of objects_.
However, in your example, each _object_ is also a key-value pair.

Is there a reason you're not just structuring the data like the
following?

"records" : {
"status" : "200",
"URL" : "www.cnn.com",
"customerId" : "12345"
}

Using the above data you would be able to index "records.status". Then
do a *find( { "records.status" : "200" } ). This will use the index on
"records.status".

Hopefully that points you in the right direction.

- Gates

On Feb 22, 3:51 pm, Aleksei T <alex...@gmail.com> wrote:
> So according to the docs, if array is part of an index, all its elements are
> indexed, which is great.
>
> http://www.mongodb.org/display/DOCS/Multikeys
>
> In the examples, most of the time you see a "tagging" scenario, like this:
>
> tags: ['weather', 'hot', 'record', 'april']
>
> However, there is just one quick section on the indexing of JSON objects
> inside an array:
> Embedded object fields in an array
>
> Additionally the same technique can be used for fields in embedded objects:
>
> > db.posts.find( { "comments.author" : "julie" } )
>
> {"title" : "How the west was won" ,
>  "comments" : [{"text" : "great!" , "author" : "sam"},
>                {"text" : "ok" , "author" : "julie"}],
>  "_id" : "497ce79f1ca9ca6d3efca325"}
>
> For example, having the following array, how can I make sure it's elements
> are indexed and can be efficiently queried:
>
> "records" : [
> { "name" : "status" , "value" : "200" },
> { "name" : "URL" , "value" : "www.cnn.com" },
> { "name" : "customerId" , "value" : "12345" }
> ]
>
> What would the index need to be if I want to use this field "records" in an
> index to be able to query efficiently like this:
>
> *find( { ... "records.name":"status", "records.value":"200" })*

Aleksei T

unread,
Feb 22, 2011, 7:53:11 PM2/22/11
to mongod...@googlegroups.com
Thanks Gates.  One little problem (and probably the reason we thought of using the array) is that these embedded objects can have different names and values, it's really one of the main use cases for using MongoDB - this flexibility. 

So, if we went with the sub-document approach (vs array), one document could have a set of name/values that is different in the names and number of elements than another, for example, these two documents:

{
"id:123

"records" : {
  "status" : "200",
  "URL" : "www.cnn.com",
  "customerId" : "12345"
}
"bytes" : 2500
"requests" : 546
}
vs.
{
"id":456
"records" : {
  "contentType" : "img/jpg",
  "status" : "404",
}
"bytes" : 45600
"requests" : 2300
}

Please note - we do know all the possible name options that may appear in the record, but any document may have an arbitrary number of them AND they could be in arbitrary order - so, we know there are maybe 10 different names that could appear (status, URL, content type, country, zipcode, etc.).  One thing is that the documents will have a fixed number and order of these for each "id".  So, in the example above, we know for sure that all documents for id 123 will have that specific schema, and all documents for id 456 will have theirs.

A million dollar question is how would we be able to ensure an index such that it would index these elements in each document according to its structure. 

The way these documents get created and populated are via continuous update $inc calls as data arrives (updating the measures - e.g. bytes and requests incrementally).

So, how could we ensure the right index for each document?  I thought indexing on the array.name and array.value would achieve what we want?  Is such compound index possible at all?

Alternatively, if we were to go with the array, we could just create an array of strings with name/value combined in each element, e.g.

"records" : [ "status|200" , "URL|www.cnn.com" , "customerId|12345" ]

And we would use an index on the array, according to docs, and it would index all the strings accordingly.  On the application side, we would have to post-process the strings to split them on "|" and get the value out, pretty much.

What's THE answer here? :)

Eliot Horowitz

unread,
Feb 23, 2011, 5:34:52 AM2/23/11
to mongod...@googlegroups.com
A common pattern is to do this
{ a : [ { status : 200 } ] }
index on a
then do
find( { a : { status : 200 } } )

That's if I understand the question.

> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/mongodb-user?hl=en.
>

Aleksei T

unread,
Feb 23, 2011, 11:37:44 AM2/23/11
to mongod...@googlegroups.com
Great, thanks, that's what I was hoping would be doable.

So, just to confirm, if we have these two documents:

{ id : 1, records : [ { status : 200, type : "img/jpg" } ] }
{ id : 1, records : [ { url : "cnn.com", customerId : 456, city : "Paris" } ] }

By indexing "records" field, all the following finds will be using that index to find each document:

find( { records : { status : 200 } } ) -- first element in array, OK
find( { records : { type : "img/jpg" } } ) -- not the first element in the array, will work?
find( { records : { city : "Paris", url : "cnn.com" } } )  -- order is different, will work?

Thanks for the help!

Aleksei T

unread,
Feb 23, 2011, 11:58:17 AM2/23/11
to mongod...@googlegroups.com
Based on some testing I think what I am seeing is this:

find( { records : { status : 200 } } ) -- first element - works
find( { records : { type : "img/jpg" } } ) -- not first element, works
find( { records : { city : "Paris", url : "cnn.com" } } )  -- multiple elements, arbitrary order - does not work

So, it's limited to only finding by one element in the array regardless of its position, right?  I cannot find by multiple elements?

Keith Branton

unread,
Feb 23, 2011, 12:00:26 PM2/23/11
to mongod...@googlegroups.com
Does your last test work if the order matches? i.e. is it multiple elements or arbitrary order that is breaking it?

Keith Branton

unread,
Feb 23, 2011, 12:08:47 PM2/23/11
to mongod...@googlegroups.com
I'd have thought you would need to use $elemMatch for these queries as you're not trying to match a complete document. I'm a bit surprised any of them work.


I have no idea how $elemMatch affects index usage.

Scott Hernandez

unread,
Feb 23, 2011, 12:15:17 PM2/23/11
to mongod...@googlegroups.com
When document matching is used the order of the fields is significant.

{a:1, b:1} != {b:1, a:1}

When comparisons are done on the document they are basically a binary compare.

On Wed, Feb 23, 2011 at 9:00 AM, Keith Branton <ke...@branton.co.uk> wrote:
> Does your last test work if the order matches? i.e. is it multiple elements
> or arbitrary order that is breaking it?
>

Keith Branton

unread,
Feb 23, 2011, 12:17:18 PM2/23/11
to mongod...@googlegroups.com
I did some tests with your examples. On 1.7.5 none of your queries return a result. If $elemMatch is used they all return the correct results but don't use the index.

If a query for an exact matching document is used it works and uses the index:

.find({records:{status:200,type:"img/jpg"}}).explain() 

Scott Hernandez

unread,
Feb 23, 2011, 12:21:28 PM2/23/11
to mongod...@googlegroups.com
On Wed, Feb 23, 2011 at 9:17 AM, Keith Branton <ke...@branton.co.uk> wrote:
> I did some tests with your examples. On 1.7.5 none of your queries return a
> result. If $elemMatch is used they all return the correct results but don't
> use the index.

You can't use a whole document index to match fields within that doc.
It would require the match code to decode each document in the index
to do a comparison for the just the fields you are looking for; it is
something that could be useful, but it doesn't work that way now. If
you want that behavior you are better off creating a compound index
with just the fields you are concerned about.

> If a query for an exact matching document is used it works and uses the
> index:
> .find({records:{status:200,type:"img/jpg"}}).explain()
>

Keith Branton

unread,
Feb 23, 2011, 12:52:30 PM2/23/11
to mongod...@googlegroups.com
It looks like you may be better off restructuring your documents to store these attributes as key/value maps...

.insert({ id : 1, records : [ {k:"status", v:200},{k: "type",v : "img/jpg"} ] })  
.insert({ id : 1, records : [ {k:"url", v:"cnn.com"},{k: "customerId",v : 456},{k:"city",v:"Paris"} ] })

Then add an index...

.ensureIndex({records:1})    

Then your queries become...

.find({records:{$all:[{k:"status",v:200}]}})
.find({records:{$all:[{k:"type",v:"img/jpg"}]}})
.find({records:{$all:[{k:"city",v:"Paris"},{k:"url",v:"cnn.com"}]}})

... all work and DO use the index.

Obviously you don't need the $all when only matching on a single attribute but it doesn't seem to matter and may make query generation simpler.

Regards,

Keith. 

Keith Branton

unread,
Feb 23, 2011, 1:04:56 PM2/23/11
to mongod...@googlegroups.com
@Scott, Thanks for jumping in!
 
You can't use a whole document index to match fields within that doc.

It would be nice if the $elemMatch docs mentioned that it can't take advantage of a "whole document index". Would have saved a bit of time.
 
It is something that could be useful, but it doesn't work that way now. 

Do you want to open a jira for that or should I? It seems like it's a deficiency with some nasty workarounds - either 
  1. Create an arbitrary number of compound indexes to speed up all the slow queries - very difficult to keep track of since the OP stated that there are many possible keys in these documents, and not very scalable because of the increase in working set size and degradation in update performance from maintaining all these indexes
  2. Restructure your data to take advantage of what Mongo is able to do today, but losing a lot of the intuitiveness of a clean, simple data structure.
Neither particularly appealing imo.

Scott Hernandez

unread,
Feb 23, 2011, 1:49:33 PM2/23/11
to mongod...@googlegroups.com
If you are going to use a key/value pattern then you probably don't
want to use the index like that -- a full document index.

Once you have broken it down into k/v then you can just create a
compound index. The reason Eliot suggested the full document index was
since you didn't have a known set of fields.

Aleksei T

unread,
Feb 23, 2011, 1:51:59 PM2/23/11
to mongod...@googlegroups.com
Great, thanks, that seems to work, wow (at least retrieving all the matching records).  Nice.  But yes, seems like a more intuitive mechanism would be very useful. 

Thanks for all the help, this is quite mind-stimulating :)

Scott Hernandez

unread,
Feb 23, 2011, 1:54:10 PM2/23/11
to mongod...@googlegroups.com
On Wed, Feb 23, 2011 at 10:04 AM, Keith Branton <ke...@branton.co.uk> wrote:
> @Scott, Thanks for jumping in!
>
>>
>> You can't use a whole document index to match fields within that doc.
>
> It would be nice if the $elemMatch docs mentioned that it can't take
> advantage of a "whole document index". Would have saved a bit of time.

Nothing can reach into values within an indexed field (like an
embedded document). For example, regex queries that need to check
within a string can't use an index either. It is the same concept.

>>
>> It is something that could be useful, but it doesn't work that way now.
>
> Do you want to open a jira for that or should I? It seems like it's a
> deficiency with some nasty workarounds - either
>
> Create an arbitrary number of compound indexes to speed up all the slow
> queries - very difficult to keep track of since the OP stated that there are
> many possible keys in these documents, and not very scalable because of the
> increase in working set size and degradation in update performance from
> maintaining all these indexes
> Restructure your data to take advantage of what Mongo is able to do today,
> but losing a lot of the intuitiveness of a clean, simple data structure.
>
> Neither particularly appealing imo.
>

Aleksei T

unread,
Feb 23, 2011, 1:58:03 PM2/23/11
to mongod...@googlegroups.com
I would love to use Eliot's suggestion, as it is definitely simple and intuitive, but I would need the ability to find documents with a match on an arbitrary number of fields in the order that may be different than what's in the document.  It seems like with that approach I could only match on a single field, but if I did multiple in matching or non-matching order, it didnt work.

If the array approach does work with a compound index, that's fine, too, as the queries will be generated programmatically, so their "clumsiness" is less important than ability to use the index and allow querying by multiple elements in random order.

Scott Hernandez

unread,
Feb 23, 2011, 2:38:34 PM2/23/11
to mongod...@googlegroups.com
The key to doing what you want is to store one value in each array
element and to index the array (elements).

> db.test3.save({_id:1, "records" : [ {name:"ralph"}, {age:"old"}, {hair:"brown"}]})
> db.test3.save({_id:2, "records" : [ {name:"scott"}, {age:"old enough"}, {hair:"brown"}]})
> db.test3.save({_id:3, "records" : [ {name:"chris"}, {eyes:"blue"}, {hair:"green"}]})
>db.test3.ensureIndex({records:1})

Now you can search for documents with any number of record elements.

> db.test3.find({records:{hair:"brown"}}) // 2 results
> db.test3.find({records:{$all:[{name:"scott"}, {hair:"brown"}]}}) //just scott

This is very much what Kieth suggested but there is no need to add the
extra k/v fields. Normally when you do the k/v thing you would index
with a compound index on both values, but this is probably simpler and
easier to understand.


On Wed, Feb 23, 2011 at 10:58 AM, Aleksei T <ale...@gmail.com> wrote:
> I would love to use Eliot's suggestion, as it is definitely simple and
> intuitive, but I would need the ability to find documents with a match on an
> arbitrary number of fields in the order that may be different than what's in
> the document.  It seems like with that approach I could only match on a
> single field, but if I did multiple in matching or non-matching order, it
> didnt work.

You can match on multiple fields but the order does matter; and that
can be complicated.

> If the array approach does work with a compound index, that's fine, too, as
> the queries will be generated programmatically, so their "clumsiness" is
> less important than ability to use the index and allow querying by multiple
> elements in random order.
>

Keith Branton

unread,
Feb 23, 2011, 3:24:49 PM2/23/11
to mongod...@googlegroups.com
Nice solution Scott! [head-slap moment]. It's obvious when you see it :) Thanks for sharing.

Aleksei T

unread,
Feb 23, 2011, 3:29:18 PM2/23/11
to mongod...@googlegroups.com
Great, thanks a lot guys, this seems to be promising, appreciate the quick help on this.  This group of people here is pretty amazing, just like MongoDB :)

Aleksei T

unread,
Feb 23, 2011, 4:27:57 PM2/23/11
to mongod...@googlegroups.com
Alright, to finalize this, here is what I have done following the recommendation -
  - populated some rows with data
  - called ensureIndex({records:1})

Here is what's in my collection:

> r.find()
{ "_id" : ObjectId("4d6574e985904ac21ab43fc8"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 302 }, { "url" : "cnn.com" }, { "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574fe85904ac21ab43fcc"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 789 } ] }
{ "_id" : ObjectId("4d65750485904ac21ab43fcd"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "yahoo.com" }, { "custid" : 789 } ] }
{ "_id" : ObjectId("4d65750a85904ac21ab43fce"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 404 }, { "url" : "yahoo.com" }, { "custid" : 789 } ] }
{ "_id" : ObjectId("4d65795585904ac21ab43fcf"), "bytes" : 30, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 123 } ] }
{ "_id" : ObjectId("4d6574ef85904ac21ab43fc9"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 302 }, { "url" : "yahoo.com" }, { "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "yahoo.com" }, { "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 456 } ] }

> r.find({records:{ $all:[ {status:200},{custid:456} ] }})
{ "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "yahoo.com" }, { "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 456 } ] }

> r.find({records:{ $all:[ {status:200},{custid:456} ] }}).explain()
{
    "cursor" : "BtreeCursor records_1",
    "nscanned" : 5,
    "nscannedObjects" : 5,
    "n" : 2,
    "millis" : 0,
    "indexBounds" : {
        "records" : [
            [
                {
                    "status" : 200
                },
                {
                    "status" : 200
                }
            ]
        ]
    }
}
>

So everything seems to be working, actually.  Interestingly enough, it even works if I query out of order - now that is great, actually:

> r.find({records:{ $all:[ {custid:456},{status:200} ] }})                                          
{ "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "yahoo.com" }, { "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 456 } ] }
> r.find({records:{ $all:[ {custid:456},{status:200} ] }}).explain()
{
    "cursor" : "BtreeCursor records_1",
    "nscanned" : 4,
    "nscannedObjects" : 4,
    "n" : 2,
    "millis" : 0,
    "indexBounds" : {
        "records" : [
            [
                {
                    "custid" : 456
                },
                {
                    "custid" : 456
                }
            ]
        ]
    }
}

One last question - the explain() statement in the above examples - does info in "indexBounds" mean it is using the index?  How do you know if the query is indeed using an index?

Scott Hernandez

unread,
Feb 23, 2011, 4:39:32 PM2/23/11
to mongod...@googlegroups.com, Aleksei T

The order doesn't matter in the array. The part where the order
mattered is the field order, {a:1, b:1} (the order of a, b in that
doc). Since all of your embedded docs are single value the order is
always the same. If you started to store more than one field in the
array elements then it will problematic.

>> r.find({records:{ $all:[ {custid:456},{status:200} ]
>> }})
> { "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1,
> "records" : [ { "status" : 200 }, { "url" : "yahoo.com" }, { "custid" : 456
> } ] }
> { "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1,
> "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 456 }
> ] }
>> r.find({records:{ $all:[ {custid:456},{status:200} ] }}).explain()
> {
>     "cursor" : "BtreeCursor records_1",
>     "nscanned" : 4,
>     "nscannedObjects" : 4,
>     "n" : 2,
>     "millis" : 0,
>     "indexBounds" : {
>         "records" : [
>             [
>                 {
>                     "custid" : 456
>                 },
>                 {
>                     "custid" : 456
>                 }
>             ]
>         ]
>     }
> }
>
> One last question - the explain() statement in the above examples - does
> info in "indexBounds" mean it is using the index?  How do you know if the
> query is indeed using an index?

The "cursor : BtreeCursor records_1" part means it is using a
BTreeCursor called records_1

Aleksei T

unread,
Feb 23, 2011, 4:55:18 PM2/23/11
to mongod...@googlegroups.com, Aleksei T
Yes, got it, excellent, this is a big relief for me, means we can actually structure and query the data the way we'd like to, and it will use the index.  Pretty good, guys.

Also, as I understand from this the explain output part of this example:


> r.find({records:{ $all:[ {custid:456},{status:200} ] }})                                          
{ "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "yahoo.com" }, { "custid" : 456 } ] }
{ "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [ { "status" : 200 }, { "url" : "cnn.com" }, { "custid" : 456 } ] }

> r.find({records:{ $all:[ {custid:456},{status:200} ] }}).explain()
{
    "cursor" : "BtreeCursor records_1",
    "nscanned" : 4,
    "nscannedObjects" : 4,
    "n" : 2,
    "millis" : 0,
    "indexBounds" : {
        "records" : [
            [
                {
                    "custid" : 456
                },
                {
                    "custid" : 456
                }
            ]
        ]
    }
}
It means it used the index on the first element "custid":456, returned 4 documents, then scanned those 4 docs and returned 2 with status 200.  So it applied the index on the first element and still needs to do the scan on the resulting documents (much smaller number of course) to filter out the rest of the query conditions (status:200 in this case).

Is that a correct understanding?  Or did it use the index further on the 4 docs to get the status:200 ones?  I guess I am wondering if it behaves like a compound index in that sense?  Very interesting to know how this actually works :)  Thanks guys, you are extremely helpful.

Scott Hernandez

unread,
Feb 23, 2011, 5:03:08 PM2/23/11
to mongod...@googlegroups.com

Yes, and yes; I'm not sure what you mean by works like a compound
index. Since it is a multivalue field it is a little more complicated
and in some ways restrictive. For each value in the array and index
value is created. In order to do an all it could require scanning the
index for all of the criteria and then taking a union, or just search
for the first term and then analyzing the docs for the rest of the
criteria. I believe ATM mongodb does that later only.

Aleksei T

unread,
Feb 23, 2011, 5:12:35 PM2/23/11
to mongod...@googlegroups.com
Great, thanks, So, it would use the index on the first element in the find() query to get back  the matching documents (4 in this case), but then on these 4 documents, does it use the index again on the second find() element (status:200) within these 4 documents, or it does the "full scan" of the 4 docs and evaluate "status" field to see if it's 200 or not.  It would seem like if it used the index again on the sub-set of documents matching the 1st element it would be much faster than doing a full scan of those?  If it applies the index repetitively on each element match, it would be great and I would put this to rest, finally! :))  Thanks again, I am a Mongo newbie, so please forgive my ignorance :)

Scott Hernandez

unread,
Feb 23, 2011, 5:22:31 PM2/23/11
to mongod...@googlegroups.com
It walks through those 4 docs. The difference between nScanned and
nScannedObjects values would indicate the relative scanning.

It is up to the indexing system to choose the best behavior; as I
mentioned, that is the way things currently work, but in the future
the query optimizer could choose a different path. ATM it is rather
simplistic and optimizations will be added.

Aleksei T

unread,
Feb 23, 2011, 5:44:58 PM2/23/11
to mongod...@googlegroups.com
Great, got it, thanks a lot - so the indexing system will choose the field with the most effective index (max cardinality, probably?) to apply to the find() and then scan for the rest of the match terms OR would it always scan by the first element regardless of whether it is indeed most effective. Thank you!

Scott Hernandez

unread,
Feb 23, 2011, 5:53:27 PM2/23/11
to mongod...@googlegroups.com
Unfortunately at the moment it selects the first. The query optimizer
doesn't currently take cardinality into concern when making choices.
Instead it will run multiple concurrent queries and remember the best
performing plans; this is not applicable to your data with a single
field, but you can imaging how it applies when multiple fields are
used where there is more than one index to choose from.

Aleksei T

unread,
Feb 23, 2011, 6:02:19 PM2/23/11
to mongod...@googlegroups.com
OK, thanks a lot, Scott, very detailed information that is very helpful.  I imagine there is a lot of clever work/design in progress for indexing as this would be one of those huge impact areas for a lot of MongoDB users.  If you guys can figure out a smart indexing mechanism for arbitrary arrays and embedded documents that would take into account cardinality and other things that would be glorious :)  Thanks again!

Scott Hernandez

unread,
Feb 23, 2011, 6:07:20 PM2/23/11
to mongod...@googlegroups.com
If this still doesn't perform to your needs feel free to add a jira
issue (http://jira.mongodb.org); or you can just create one now so you
can get notified when optimizations are made.

There are a huge number of improvements/optimizations that can be made..

Sometimes the first step is get the indexes working at all for certain
use-cases :)

Aleksei T

unread,
Feb 23, 2011, 6:39:46 PM2/23/11
to mongod...@googlegroups.com
Thanks for the suggestion, just created the JIRA issue, hopefully it makes sense :)

http://jira.mongodb.org/browse/SERVER-2617


arieljake

unread,
Apr 1, 2012, 3:35:35 AM4/1/12
to mongod...@googlegroups.com
Hi everyone. I just finished reading this exchange, and just in case anyone is still monitoring the discussion, I'd like to ask a newbie question.

The original poster said:

One little problem (and probably the reason we thought of using the array) is that these embedded objects can have different names and values
For example, these two documents:

{
"id:123
"records" : { 
  "status" : "200", 
  "URL" : "www.cnn.com", 
  "customerId" : "12345" 
}
"bytes" : 2500
"requests" : 546
}
vs.
{
"id":456
"records" : { 
  "contentType" : "img/jpg", 
  "status" : "404", 
}
"bytes" : 45600
"requests" : 2300
}

But I don't understand why _not_ to  make a flat data structure where each document has different attributes, like:

{
"id:123
"status" : "200", 
"URL" : "www.cnn.com", 
"customerId" : "12345" 
"bytes" : 2500
"requests" : 546
}
and
{
"id":456
"contentType" : "img/jpg", 
"status" : "404", 
"bytes" : 45600
"requests" : 2300
}

I mean, if you create an index on status, they both have the value. Does something unexpected happen if you index on contentType as well, since the first record does not have that field? Thanks, I'm just trying to wrap my head around this NoSQL stuff.


Eliot Horowitz

unread,
Apr 2, 2012, 12:43:00 AM4/2/12
to mongod...@googlegroups.com
In that example flat would be ok, but there are many others where
embedding is good for management, etc..
Also opens up arrays down the line.
You may want to open a new thread for any follow up.

> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.

> To view this discussion on the web visit
> https://groups.google.com/d/msg/mongodb-user/-/L7Q7z1gTE9QJ.

Reply all
Reply to author
Forward
0 new messages