Indexing of Embedded Documents


Ron Yosipovich

Jan 28, 2015, 3:14:43 PM
to mongod...@googlegroups.com
Hello, I have a question about how MongoDB indexes embedded documents.
Say I would like to index the product embedded document; which fields are indexed?
Will the product.attributes.similar array be indexed as well?
How can I tell which fields are indexed after creating an index (is there a command)?

{  
   "status":{  
      "version":"3.1",
      "code":"200",
      "message":"free",
      "find":"0049000006582",
      "deepSearch":"2014-12-20 01:26:03",
      "run":"0.4056"
   },
   "product":{  
      "attributes":{  
         "product":"Diet Coke, 12-Ounce Cans (Pack Of 24)",
         "description":"12oz Can",
         "price_new":"7.9900",
         "price_new_extra":"USD",
         "price_new_extra_long":"US Dollars",
         "price_new_extra_id":"537",
         "asin_com":"B004JXDCC2",
         "category":"51",
         "category_text":"Food",
         "category_text_long":"Food",
         "long_desc":"Diet Coke is the most popular sugar-free soft drink in America. It's the original sparkling beverage for those who want great flavor without the calories\u2014a drink for those with great taste.",
         "features":"<ul><li>Diet Coke is the most popular sugar-free soft drink in America<\/li><li>It's the original sparkling beverage for those who want great flavor without the calories<\/li><\/ul>",
         "binding":"Grocery",
         "similar":[  
            "0049000001327",
            "5449000131836",
            "0049000006346",
            "0049000042528",
            "0049000001310",
            "0049000012507"
         ],
         "language":"553",
         "language_text":"en",
         "language_text_long":"English"
      },
      "EAN13":"0049000006582",
      "UPCA":"049000006582",
      "UPCE":"04965802",
      "barcode":{  
         "EAN13":"http:\/\/eandata.com\/image\/0049000006582.png",
         "UPCA":"http:\/\/eandata.com\/image\/049000006582.png",
         "UPCE":"http:\/\/eandata.com\/image\/04965802.png"
      },
      "locked":"0",
      "modified":"2014-12-20 01:26:03",
      "image":"http:\/\/eandata.com\/image\/products\/004\/900\/000\/0049000006582.jpg"
   },
   "company":{  
      "name":"The Coca-Cola Company",
      "logo":"",
      "url":"www.coca-colacompany.com\/",
      "address":"PO Box 1734 Atlanta, GA, USA 30313",
      "phone":"8004382653",
      "locked":"0"
   }
}

Will Berkeley

Jan 28, 2015, 3:56:52 PM
to mongod...@googlegroups.com
When you index a field, you index the values of that field. For a field holding embedded documents, the indexed values are the embedded documents themselves, taken whole; none of the fields inside them are indexed individually. This often has unexpected results, and it means you almost always want to index a field, or a combination of fields, inside the embedded document rather than the embedded document field itself. It's easiest to explain with an example from the mongo shell:

> db.test.drop()
> db.test.insert({ "x" : { "a" : 1, "b" : 2 } })
> db.test.insert({ "x" : { "b" : 2, "a" : 1 } })
> db.test.ensureIndex({ "x" : 1 })

If the index on x indexed the fields embedded in x, we might expect the query

> db.test.find({ "x.a" : 1, "x.b" : 2 })

to use the index. But it doesn't, as you can check using .explain():

> db.test.find({ "x.a" : 1, "x.b" : 2 }).explain()

{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 2,
    ...
}

Instead, the index keys are the entire embedded document, and field order matters:

> db.test.find({ "x" : { "a" : 1, "b" : 2 } })
{ "_id" : ObjectId(...), "x" : { "a" : 1, "b" : 2 } }

> db.test.find({ "x" : { "a" : 1, "b" : 2 } }).explain()
{
    "cursor" : "BtreeCursor x_1",
    "isMultiKey" : false,
    "n" : 1,
    ...
    "indexBounds" : {
        "x" : [
            [ { "a" : 1, "b" : 2 },
              { "a" : 1, "b" : 2 } ]
        ]
    },
    ...
}

> db.test.find({ "x" : { "b" : 2, "a" : 1 } })
{ "_id" : ObjectId(...), "x" : { "b" : 2, "a" : 1 } }
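
The order sensitivity can be mimicked outside the database. What follows is a minimal plain-JavaScript sketch, not MongoDB's actual implementation: it assumes an index on x keys each document by its whole embedded value, compared in stored field order. wholeDocKey and findByWholeDoc are hypothetical helpers introduced only for illustration.

```javascript
// Toy model (assumption): the key of an index on "x" is the embedded document
// itself, serialized in stored field order. JSON.stringify keeps the insertion
// order of string keys, so it stands in for an order-sensitive comparison.
function wholeDocKey(doc, field) {
  return JSON.stringify(doc[field]);
}

// Return the documents whose embedded value equals the query value exactly,
// including the order of its fields.
function findByWholeDoc(docs, field, value) {
  const wanted = JSON.stringify(value);
  return docs.filter((d) => wholeDocKey(d, field) === wanted);
}

const docs = [
  { x: { a: 1, b: 2 } },
  { x: { b: 2, a: 1 } },
];

const ab = findByWholeDoc(docs, "x", { a: 1, b: 2 });
const ba = findByWholeDoc(docs, "x", { b: 2, a: 1 });
console.log(ab.length, ba.length); // prints "1 1"
```

Each query matches only the document stored with the same field order, which mirrors the two find results above.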

To summarize, and to answer your question about how to know which fields are indexed: the indexed fields are precisely the ones you specify when creating the index. For a field whose values are embedded documents, the indexed value is the whole embedded document. If you want to index a subfield of an embedded document, use dot notation. For example, to create an index that serves the first query in the example

db.test.find({ "x.a" : 1, "x.b" : 2 })

create an index on { "x.a" : 1, "x.b" : 1 } or { "x.b" : 1, "x.a" : 1 }.
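
To see why a dotted index serves that query regardless of how the fields inside x are ordered, here is a small plain-JavaScript sketch. It is a toy model under stated assumptions, not MongoDB internals. getPath, compoundKey and arrayKeys are hypothetical helpers: the first resolves a dot-notation path, the second builds the key for a compound index spec, and the third assumes that an indexed array field (such as product.attributes.similar from the original question) yields one key per element.

```javascript
// Resolve a dot-notation path such as "x.a" against a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, part) => (value == null ? undefined : value[part]),
    doc
  );
}

// Build the key for a compound index spec, e.g. { "x.a": 1, "x.b": 1 }.
// Only the listed paths contribute, so field order inside x is irrelevant.
function compoundKey(doc, spec) {
  return Object.keys(spec).map((path) => getPath(doc, path));
}

const spec = { "x.a": 1, "x.b": 1 };
const first = compoundKey({ x: { a: 1, b: 2 } }, spec);
const second = compoundKey({ x: { b: 2, a: 1 } }, spec);
console.log(JSON.stringify(first), JSON.stringify(second)); // prints "[1,2] [1,2]"

// Assumed model of an indexed array field: one index key per array element.
function arrayKeys(doc, path) {
  const value = getPath(doc, path);
  return Array.isArray(value) ? value : [value];
}

const sample = {
  product: { attributes: { similar: ["0049000001327", "5449000131836"] } },
};
console.log(arrayKeys(sample, "product.attributes.similar").length); // prints 2
```

Both documents produce the same key under the dotted spec, so one index covers either field order. In the shell, db.test.getIndexes() lists the indexes that exist on a collection along with their key specifications.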

-Will
