Mongo queries are slow and yielding/waiting for locks thousand times

220 views
Skip to first unread message

Abhishek Somwanshi

unread,
Nov 20, 2018, 5:23:20 PM11/20/18
to mongodb-user
Hello

We are using mongo setup with replica sets on AWS. 
Setup Details:
1 Primary node: r3.8xlarge
4 secondary nodes: r5.xlarge
Mongo version: 3.0.8 (WiredTiger)
Database size: 358GB

We have configured mongo cloud monitoring for this setup. Here are few stats from it:

Mongo Primary Network: 350MB/S (In: 40MB/S, Out: 310MB/S, Num of requests: 5.32KB/S) 
1.1K Connections on average on primary node
Tickets available on primary: Reads: 125, Write: 100
Queues: Total: 15-20, Read : 0-1, Write: 15-20


Other Points:
CPU and memory stats on the instance look pretty much under control. 
We are using 800gb GP2 EBS volume(2400 IOPS) and we are consuming around 2000 IOPS. The burst balance is almost available to full capacity which means that also is not exhausted.
Primary node is of r3.8xlarge type, so it has 10gig network.
Ulimits are set as follows:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1967994              1967994              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us


Problem: 
We are accessing mongo using Java application and we are seeing that a lot of our queries are taking considerable amount of time such as 2-4 seconds.
We enabled profiling and listed slowest queries. In that we found that a sample query for update object, acquired intent lock 17000 times and had to yield those many times. 
We are not able to understand if it is intent lock then why does it need to yield so many times? Does intent lock also not allow other operations to proceed?
If the query has to yield so many times, how can we see the advantage of document level lock?
We have also seen some of our queries getting timed out(probably waiting for the lock too long and eventually dying).
Could you please guide us on how to debug this problem and improve the performance of our mongo ?

Here's the profiler output of a sample query:
{
       
"op" : "update",
       
"ns" : "backend.Scores",
       
"query" : {
               
"channel.id" : "hkxj",
               
"metric" : "YVR",
               
"date" : ISODate("2018-11-20T10:00:00Z")
       
},
       
"updateobj" : {
               
"$set" : {
                       
"channel" : {
                               
"id" : "hkxj",
                               
"failures" : 39,
                               
"items" : [
                             
                                       
{
                                               
"_id" : ObjectId("5bf3e434800075956f1"),
                                               
"image" : "http://images.abcd.com/something/wscoob.jpg",
                                               
"b_time" : ISODate("2018-11-26T19:24:00Z"),
                                               
"title" : "What's New ",
                                               
"id" : "fq969"
                                       
},
                                       
{
                                               
"_id" : ObjectId("5bf3e43f800075956f0"),
                                               
"image" : "http://images.abcd.com/something/w4citv2.jpg",
                                               
"broadcast_time" : ISODate("2018-11-26T20:24:00Z"),
                                               
"title" : "Let's End This",
                                               
"id" : "fsfgd"
                                       
}
                               
]
                       
},
                       
"metric" : "YVR",
                       
"date" : ISODate("2018-11-20T10:00:00Z")
               
},
               
"$setOnInsert" : {
                       
"__v" : 0
               
}
       
},
       
"nscanned" : 0,
       
"nscannedObjects" : 2209900,
       
"nMatched" : 1,
       
"nModified" : 1,
       
"keyUpdates" : 0,
       
"writeConflicts" : 0,
       
"numYield" : 17264,
       
"locks" : {
               
"Global" : {
                       
"acquireCount" : {
                               
"r" : NumberLong(17266),
                               
"w" : NumberLong(17266)
                       
}
               
},
               
"Database" : {
                       
"acquireCount" : {
                               
"w" : NumberLong(17266)
                       
}
               
},
               
"Collection" : {
                       
"acquireCount" : {
                               
"w" : NumberLong(17265)
                       
}
               
},
               
"oplog" : {
                       
"acquireCount" : {
                               
"w" : NumberLong(1)
                       
}
               
}
       
},
       
"millis" : 3864,
       
"execStats" : {


       
},
       
"ts" : ISODate("2018-11-20T10:40:25.104Z"),
       
"client" : "172.2.3.52",
       
"allUsers" : [ ],
       
"user" : ""
}



Robert Cochran

unread,
Nov 20, 2018, 6:13:48 PM11/20/18
to mongodb-user
I know this does not answer your issue -- I understand that -- but I want to say that my understanding is that MongoDB version 3.0.8 reached end-of-life in February 2018. You should upgrade to a supported version. 

Thanks so much

Bob

Kevin Adistambha

unread,
Nov 22, 2018, 8:33:03 PM11/22/18
to mongodb-user

Hi Abhishek,

I think this part of the output you posted showed what went wrong:

       "nscanned" : 0,
        "nscannedObjects" : 2209900,
        "nMatched" : 1,
        "nModified" : 1,
        "keyUpdates" : 0,
        "writeConflicts" : 0,
        "numYield" : 17264,

So the query scanned 2,209,900 documents to find the single one to update. Since nscanned is zero, I think the collection is not indexed, thus MongoDB was forced to load all documents from disk and check if any of them matches the update criteria. As you can imagine, this is very inefficient. The high number of yields was due to MongoDB waiting for the document to be loaded from disk.

In short, the slow performance has nothing to do with locks. It was due to an unindexed update command in a relatively large collection.

I believe this is the query involved in the update:

        "query" : {
                "channel.id" : "hkxj",
                "metric" : "YVR",
                "date" : ISODate("2018-11-20T10:00:00Z")

You might want to check if adding an index such as:

db.collection.createIndex({channel.id: 1, metric: 1, date: 1})

would help with this query. See Indexes and Indexing Strategies for more information.

Please try this in a development environment before doing this in production. Also remember to take backups before you do any admin work such as this in your prod environment.

I would also like to reiterate Bob’s comment that MongoDB 3.0 is very old and is out of support. Please consider upgrading to a newer version (4.0.4 is the newest currently) for bugfixes and improvements. For upgrading, please see Upgrade to the Latest Revision of MongoDB, noting that MongoDB would have to be upgraded by following major versions progression, e.g. 3.0 -> 3.2 -> 3.4 -> 3.6 -> 4.0.

Best regards,
Kevin

Abhishek Somwanshi

unread,
Nov 26, 2018, 12:30:04 AM11/26/18
to mongodb-user
Thank you Robert and Kevin. I was able to get rid of the query which was scanning million records to return one. Now when I move to next set of slowest queries, there are few update queries which are waiting for lock. What is strange is they are waiting for intent lock(w) for 2.5 seconds. Can you tell me what could be the possible reason for this?
Here's the execution details for one such query:
    "writeConflicts" : 0,
    "numYield" : 1,
    "locks" : {
        "Global" : {
            "acquireCount" : {
                "r" : NumberLong(3),
                "w" : NumberLong(3)
            }
        },
        "Database" : {
            "acquireCount" : {
                "w" : NumberLong(3)
            },
            "acquireWaitCount" : {
                "w" : NumberLong(1)
            },
            "timeAcquiringMicros" : {
                "w" : NumberLong(2497196)
            }
        },
        "Collection" : {
            "acquireCount" : {
                "w" : NumberLong(2)
            }
        },
        "oplog" : {
            "acquireCount" : {
                "w" : NumberLong(1)
            }
        }
    },
    "millis" : 2500,

Kevin Adistambha

unread,
Nov 27, 2018, 10:29:05 PM11/27/18
to mongodb-user

Hi Abhishek

According to the What type of locking does MongoDB use, intent lock provides a signal that a thread is intending to do a read or write. This is different from an exclusive lock, which signals the intention to take exclusive access of a resource.

What I understand from the output you posted is that the thread to update yielded 1 time. Typically a thread yields when it’s waiting for something slow to finish, e.g. fetching the needed data from disk. From the output, I see that it’s waiting for an intent lock. Based on this output and the content of the linked page above, I can think of one possibility: the thread is trying to write the update, but it’s forced to wait behind an exclusive lock on that document that was obtained by another thread. Of course, this is probably not the only reason why it’s waiting.

Is this operation generally slow and it’s detrimental to your overall operation? Is this operation always slow, or is it only slow some time? It’s hard to pinpoint what exactly happened when looking at a single occurrence of an event. This could be caused by many things, e.g. slow disk, overburdened server, etc. You might be able to gain a more holistic view of the state of your deployment using tools such as mongostat or mongotop. It’s also best to observe the recommendations in the Production Notes and check if you have any suboptimal setting.

Best regards,
Kevin

Reply all
Reply to author
Forward
0 new messages