query performance very slow while filtering embedding large arrays documents Mongodb

374 views
Skip to first unread message

sai sandeep edpuganti

unread,
Oct 13, 2017, 9:44:48 PM10/13/17
to mongodb-user

In my mongodb collection i have 15 millions documents with following json structure. The embedded document count inside playfields array field changes for each json document. All my queries involves filtering the documents based on data in playfields array field. All queries are taking  more than 2 minutes to execute. below attached sample queries I am running.

Please suggest if i am doing anything wrong in writing queries or missing any indexes or I have to move data from embed documents in single document to multiple collections?


Index created:

db.collection.ensureIndex({ "playfields.ID": 1, "playfields.value": 1 })


Queries running slow:


db.playfieldvalues.find({$and:[

  {playfields: {$elemMatch:{ID:"Play.NHL.NHLHomeTeam" ,value: "BOS BOSTON BRUINS"}}},

  {$or:[ {playfields: {$elemMatch:{ID:"Play.NHL.NHLAwayTeam" ,value: "NYI NEW YORK ISLANDERS"}}},{playfields: {$elemMatch:{ID:"Play.NHL.NHLAwayTeam" ,value: "T.B TAMPA BAY LIGHTNING"}}}]},

  {playfields: {$elemMatch:{ID:"Play.NHL.NHLEventY" ,value: -38}}},

  {playfields: {$elemMatch:{ID:"Play.NHL.NHLEventX" ,value: {$gt: 0}}}},

  {playfields: {$elemMatch:{ID:"Play.NHL.NHLEventScoreDifferential" ,value: {$gt: 0}}}}

  ]

})


db.playfieldvalues.find({playfields: {$elemMatch:{ID:"Play.NHL.NHLHomeTeam" ,value: "BOS BOSTON BRUINS"}}})


JSON Document:

    "_id" : ObjectId("59dbd4c5704aa82e70ac10b5"), 
    "playid" : "2594c658-aa3b-4a98-b2eb-0cc03e4dc9e5", 
    "playfields" : [
        {
            "ID" : "Play.NHL.NHLGameDate", 
            "TS" : "", 
            "value" : NumberInt(20160228)
        }, 
        {
            "ID" : "Play.GameDate", 
            "TS" : "", 
            "value" : "2/28/2016 12:00:00 AM"
        }, 
        {
            "ID" : "Play.NHL.NHLEventType", 
            "TS" : "", 
            "value" : "HIT"
        }, 
        {
            "ID" : "Play.NHL.NHLClockTime", 
            "TS" : "", 
            "value" : "03:08"
        }, 
        {
            "ID" : "Play.NHL.NHLClockTimeSeconds", 
            "TS" : "", 
            "value" : NumberInt(188)
        }, 
        {
            "ID" : "Play.NHL.NHLEventX", 
            "TS" : "", 
            "value" : NumberInt(62)
        }, 
        {
            "ID" : "Play.NHL.NHLEventY", 
            "TS" : "", 
            "value" : NumberInt(-38)
        }, 
        {
            "ID" : "Play.NHL.NHLEventPeriod", 
            "TS" : "", 
            "value" : "1"
        }, 
        {
            "ID" : "Play.NHL.NHLGameCode", 
            "TS" : "", 
            "value" : "20933"
        }, 
        {
            "ID" : "Play.NHL.NHLSeason", 
            "TS" : "", 
            "value" : "20152016"
        }
    ]
}
stackoverflowjson.JPG
queries.JPG

Kevin Adistambha

unread,
Oct 17, 2017, 9:43:20 PM10/17/17
to mongodb-user

Hi

I believe you cross-posted this question in StackOverflow as well: https://stackoverflow.com/questions/46736673/query-performance-while-filtering-embedding-large-arrays-documents-mongodb and there were some suggestions from the community regarding your schema design.

Having said that, I agree with the StackOverflow thread that your schema design is sub-optimal for your query. Without much information, generally it’s best to either:

  1. Separate each item of your array into a separate document, or
  2. Remove the array and put individual items into a separate field in a single document.

Which approach is best would depend on your use case. Since you only posted one example document, it’s difficult to see any pattern in your data.

To optimize your queries and schema design, I would suggest you to peruse these links:

Best regards
Kevin

Reply all
Reply to author
Forward
0 new messages