Mongo queries are slow and yielding/waiting for locks thousand times

Abhishek Somwanshi

unread,

Nov 20, 2018, 5:23:20 PM11/20/18

to mongodb-user

Hello

We are using mongo setup with replica sets on AWS.

Setup Details:

1 Primary node: r3.8xlarge

4 secondary nodes: r5.xlarge

Mongo version: 3.0.8 (WiredTiger)

Database size: 358GB

We have configured mongo cloud monitoring for this setup. Here are few stats from it:

Mongo Primary Network: 350MB/S (In: 40MB/S, Out: 310MB/S, Num of requests: 5.32KB/S)

1.1K Connections on average on primary node

Tickets available on primary: Reads: 125, Write: 100

Queues: Total: 15-20, Read : 0-1, Write: 15-20

Other Points:

CPU and memory stats on the instance look pretty much under control.

We are using 800gb GP2 EBS volume(2400 IOPS) and we are consuming around 2000 IOPS. The burst balance is almost available to full capacity which means that also is not exhausted.

Primary node is of r3.8xlarge type, so it has 10gig network.

Ulimits are set as follows:

Limit Soft Limit Hard Limit Units

Max cpu time unlimited unlimited seconds

Max file size unlimited unlimited bytes

Max data size unlimited unlimited bytes

Max stack size 8388608 unlimited bytes

Max core file size 0 unlimited bytes

Max resident set unlimited unlimited bytes

Max processes 64000 64000 processes

Max open files 64000 64000 files

Max locked memory 65536 65536 bytes

Max address space unlimited unlimited bytes

Max file locks unlimited unlimited locks

Max pending signals 1967994 1967994 signals

Max msgqueue size 819200 819200 bytes

Max nice priority 0 0

Max realtime priority 0 0

Max realtime timeout unlimited unlimited us

Problem:

We are accessing mongo using Java application and we are seeing that a lot of our queries are taking considerable amount of time such as 2-4 seconds.

We enabled profiling and listed slowest queries. In that we found that a sample query for update object, acquired intent lock 17000 times and had to yield those many times.

We are not able to understand if it is intent lock then why does it need to yield so many times? Does intent lock also not allow other operations to proceed?

If the query has to yield so many times, how can we see the advantage of document level lock?

We have also seen some of our queries getting timed out(probably waiting for the lock too long and eventually dying).

Could you please guide us on how to debug this problem and improve the performance of our mongo ?

Here's the profiler output of a sample query:

{
        "op" : "update",
        "ns" : "backend.Scores",
        "query" : {
                "channel.id" : "hkxj",
                "metric" : "YVR",
                "date" : ISODate("2018-11-20T10:00:00Z")
        },
        "updateobj" : {
                "$set" : {
                        "channel" : {
                                "id" : "hkxj",
                                "failures" : 39,
                                "items" : [
                             
                                        {
                                                "_id" : ObjectId("5bf3e434800075956f1"),
                                                "image" : "http://images.abcd.com/something/wscoob.jpg",
                                                "b_time" : ISODate("2018-11-26T19:24:00Z"),
                                                "title" : "What's New ",
                                                "id" : "fq969"
                                        },
                                        {
                                                "_id" : ObjectId("5bf3e43f800075956f0"),
                                                "image" : "http://images.abcd.com/something/w4citv2.jpg",
                                                "broadcast_time" : ISODate("2018-11-26T20:24:00Z"),
                                                "title" : "Let's End This",
                                                "id" : "fsfgd"
                                        }
                                ]
                        },
                        "metric" : "YVR",
                        "date" : ISODate("2018-11-20T10:00:00Z")
                },
                "$setOnInsert" : {
                        "__v" : 0
                }
        },
        "nscanned" : 0,
        "nscannedObjects" : 2209900,
        "nMatched" : 1,
        "nModified" : 1,
        "keyUpdates" : 0,
        "writeConflicts" : 0,
        "numYield" : 17264,
        "locks" : {
                "Global" : {
                        "acquireCount" : {
                                "r" : NumberLong(17266),
                                "w" : NumberLong(17266)
                        }
                },
                "Database" : {
                        "acquireCount" : {
                                "w" : NumberLong(17266)
                        }
                },
                "Collection" : {
                        "acquireCount" : {
                                "w" : NumberLong(17265)
                        }
                },
                "oplog" : {
                        "acquireCount" : {
                                "w" : NumberLong(1)
                        }
                }
        },
        "millis" : 3864,
        "execStats" : {


        },
        "ts" : ISODate("2018-11-20T10:40:25.104Z"),
        "client" : "172.2.3.52",
        "allUsers" : [ ],
        "user" : ""
}

Robert Cochran

unread,

Nov 20, 2018, 6:13:48 PM11/20/18

to mongodb-user

I know this does not answer your issue -- I understand that -- but I want to say that my understanding is that MongoDB version 3.0.8 reached end-of-life in February 2018. You should upgrade to a supported version.

Thanks so much

Bob

Kevin Adistambha

unread,

Nov 22, 2018, 8:33:03 PM11/22/18

to mongodb-user

Hi Abhishek,

I think this part of the output you posted showed what went wrong:

       "nscanned" : 0,
        "nscannedObjects" : 2209900,
        "nMatched" : 1,
        "nModified" : 1,
        "keyUpdates" : 0,
        "writeConflicts" : 0,
        "numYield" : 17264,

So the query scanned 2,209,900 documents to find the single one to update. Since nscanned is zero, I think the collection is not indexed, thus MongoDB was forced to load all documents from disk and check if any of them matches the update criteria. As you can imagine, this is very inefficient. The high number of yields was due to MongoDB waiting for the document to be loaded from disk.

In short, the slow performance has nothing to do with locks. It was due to an unindexed update command in a relatively large collection.

I believe this is the query involved in the update:

        "query" : {
                "channel.id" : "hkxj",
                "metric" : "YVR",
                "date" : ISODate("2018-11-20T10:00:00Z")

You might want to check if adding an index such as:

db.collection.createIndex({channel.id: 1, metric: 1, date: 1})

would help with this query. See Indexes and Indexing Strategies for more information.

Please try this in a development environment before doing this in production. Also remember to take backups before you do any admin work such as this in your prod environment.

I would also like to reiterate Bob’s comment that MongoDB 3.0 is very old and is out of support. Please consider upgrading to a newer version (4.0.4 is the newest currently) for bugfixes and improvements. For upgrading, please see Upgrade to the Latest Revision of MongoDB, noting that MongoDB would have to be upgraded by following major versions progression, e.g. 3.0 -> 3.2 -> 3.4 -> 3.6 -> 4.0.

Best regards,
Kevin

Abhishek Somwanshi

unread,

Nov 26, 2018, 12:30:04 AM11/26/18

to mongodb-user

Thank you Robert and Kevin. I was able to get rid of the query which was scanning million records to return one. Now when I move to next set of slowest queries, there are few update queries which are waiting for lock. What is strange is they are waiting for intent lock(w) for 2.5 seconds. Can you tell me what could be the possible reason for this?

Here's the execution details for one such query:

"writeConflicts" : 0,

"numYield" : 1,

"locks" : {

"Global" : {

"acquireCount" : {

"r" : NumberLong(3),

"w" : NumberLong(3)

}

},

"Database" : {

"acquireCount" : {

"w" : NumberLong(3)

},

"acquireWaitCount" : {

"w" : NumberLong(1)

},

"timeAcquiringMicros" : {

"w" : NumberLong(2497196)

}

},

"Collection" : {

"acquireCount" : {

"w" : NumberLong(2)

}

},

"oplog" : {

"acquireCount" : {

"w" : NumberLong(1)

}

},

"millis" : 2500,

Kevin Adistambha

unread,

Nov 27, 2018, 10:29:05 PM11/27/18

to mongodb-user

Hi Abhishek

According to the What type of locking does MongoDB use, intent lock provides a signal that a thread is intending to do a read or write. This is different from an exclusive lock, which signals the intention to take exclusive access of a resource.

What I understand from the output you posted is that the thread to update yielded 1 time. Typically a thread yields when it’s waiting for something slow to finish, e.g. fetching the needed data from disk. From the output, I see that it’s waiting for an intent lock. Based on this output and the content of the linked page above, I can think of one possibility: the thread is trying to write the update, but it’s forced to wait behind an exclusive lock on that document that was obtained by another thread. Of course, this is probably not the only reason why it’s waiting.

Is this operation generally slow and it’s detrimental to your overall operation? Is this operation always slow, or is it only slow some time? It’s hard to pinpoint what exactly happened when looking at a single occurrence of an event. This could be caused by many things, e.g. slow disk, overburdened server, etc. You might be able to gain a more holistic view of the state of your deployment using tools such as mongostat or mongotop. It’s also best to observe the recommendations in the Production Notes and check if you have any suboptimal setting.

Best regards,
Kevin

Reply all

Reply to author

Forward