High cpu usage on secondary node while others almost idle

745 views
Skip to first unread message

Assaf Lavie

unread,
May 14, 2012, 5:52:53 AM5/14/12
to mongod...@googlegroups.com
mongod seems to be constantly at 90% cpu on one of the replica set's nodes. There's practically no real load on the mongo server, and the other nodes (including the primary) hover quietly around 2% CPU...

I've checked disk space and RAM. Plenty of those left. There's almost no IO, except for the heartbeat and the log being written.

Can't for the life of me figure out why this particular node is so busy. Logs show nothing except the heartbeat every second or so. Adding -vvvvv didn't reveal anything, either.

Adam C

unread,
May 14, 2012, 7:45:16 AM5/14/12
to mongod...@googlegroups.com
Any chance you are replicating a capped collection over to that node?

What is the output of db.currentOp() on that node (especially compared to others)?

Adam

Assaf Lavie

unread,
May 16, 2012, 11:36:23 AM5/16/12
to mongod...@googlegroups.com
Yes, definitely a chance of that. This replica set is being used, among other things, for logging.

currentOp shows this:

{
        "inprog" : [
                {
                        "opid" : 4747,
                        "active" : false,
                        "waitingForLock" : false,
                        "op" : "none",
                        "ns" : "",
                        "query" : {

                        },
                        "client" : "",
                        "desc" : "rsSync",
                        "threadId" : "0x7eff347fc700",
                        "numYields" : 1
                }
        ]
}

And arbitrarily breaking into mongod with gdb seems to always stop at:

#0  0x00000000008c3e3a in mongo::ps::Rolling::access(unsigned long, short, bool) ()
#1  0x00000000008c3bd6 in mongo::Record::accessed() ()
#2  0x000000000086eb03 in mongo::CoveredIndexMatcher::matches(mongo::BSONObj const&, mongo::DiskLoc const&, mongo::MatchDetails*, bool) ()
#3  0x000000000086ed78 in mongo::CoveredIndexMatcher::matchesCurrent(mongo::Cursor*, mongo::MatchDetails*) ()
#4  0x00000000009794dc in mongo::UpdateOp::next() ()
#5  0x00000000008c9a96 in mongo::QueryPlanSet::Runner::nextOp(mongo::QueryOp&) ()
#6  0x00000000008d9930 in mongo::QueryPlanSet::Runner::next() ()
#7  0x00000000008d9f86 in mongo::QueryPlanSet::Runner::runUntilFirstCompletes() ()
#8  0x00000000008ddc3e in mongo::QueryPlanSet::runOp(mongo::QueryOp&) ()
#9  0x00000000008df5f5 in mongo::MultiPlanScanner::runOpOnce(mongo::QueryOp&) ()
#10 0x00000000008dfaf0 in mongo::MultiCursor::nextClause() ()
#11 0x00000000008e1bcd in mongo::MultiCursor::MultiCursor(char const*, mongo::BSONObj const&, mongo::BSONObj const&, boost::shared_ptr<mongo::MultiCursor::CursorOp>, bool) ()
#12 0x000000000095f542 in mongo::_updateObjects(bool, char const*, mongo::BSONObj const&, mongo::BSONObj, bool, bool, bool, mongo::OpDebug&, mongo::RemoveSaver*) ()
#13 0x00000000009642d5 in mongo::updateObjects(char const*, mongo::BSONObj const&, mongo::BSONObj, bool, bool, bool, mongo::OpDebug&) ()
#14 0x00000000008339cf in mongo::applyOperation_inlock(mongo::BSONObj const&, bool) ()
#15 0x00000000008232cb in mongo::ReplSetImpl::syncApply(mongo::BSONObj const&) ()
#16 0x0000000000826f65 in mongo::ReplSetImpl::syncTail() ()
#17 0x00000000008283e5 in mongo::ReplSetImpl::_syncThread() ()
#18 0x0000000000828438 in mongo::ReplSetImpl::syncThread() ()
#19 0x00000000008288d0 in mongo::startSyncThread() ()
#20 0x0000000000aabd90 in thread_proxy ()
#21 0x00007effd9cd09ca in start_thread () from /lib/libpthread.so.0
#22 0x00007effd927fcdd in clone () from /lib/libc.so.6

Thanks,
Assaf

Eliot Horowitz

unread,
May 17, 2012, 1:02:21 AM5/17/12
to mongod...@googlegroups.com
Can you try adding an index on _id for the capped collection?
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

Assaf Lavie

unread,
May 17, 2012, 1:39:17 AM5/17/12
to mongod...@googlegroups.com
Thanks, that seems to have fixed it. Any idea why this should cause high CPU usage?

Nat

unread,
May 17, 2012, 2:44:42 AM5/17/12
to mongod...@googlegroups.com
Do you perform a lot of updates on capped collection? Replication doesn't work well with updates on capped collection.

> See also the IRC channel -- freenode.net#mongodb

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to

Assaf Lavie

unread,
May 17, 2012, 2:57:57 AM5/17/12
to mongod...@googlegroups.com
Absolutely no updates. It's just logs. 



> See also the IRC channel -- freenode.net#mongodb

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to

See also the IRC channel -- freenode.net#mongodb

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to

Adam C

unread,
May 17, 2012, 3:56:43 AM5/17/12
to mongod...@googlegroups.com
There are a few related bugs for this one:


The short version is that you didn't have an _id index and hence replication was essentially doing a full table scan when replicating the capped collection over to the secondary.  By adding the index you removed that isses, it started using an index, and your CPU dropped.  Once the listed issues get sorted out, this will no longer be an issue, but for now capped collections require some manual intervention to replicate efficiently.


Adam

> See also the IRC channel -- freenode.net#mongodb

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to

See also the IRC channel -- freenode.net#mongodb

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to

Assaf Lavie

unread,
May 17, 2012, 11:59:38 AM5/17/12
to mongod...@googlegroups.com
Thanks for the detailed and thorough reply.

One more question: are the _id values unique (they are auto-generated, I assume by mongo or pymongo)? The docs warn about uniqueness...


On Thursday, May 17, 2012 10:56:43 AM UTC+3, Adam C wrote:
There are a few related bugs for this one:


The short version is that you didn't have an _id index and hence replication was essentially doing a full table scan when replicating the capped collection over to the secondary.  By adding the index you removed that isses, it started using an index, and your CPU dropped.  Once the listed issues get sorted out, this will no longer be an issue, but for now capped collections require some manual intervention to replicate efficiently.


Adam
[snip] 

Adam C

unread,
May 17, 2012, 12:21:18 PM5/17/12
to mongod...@googlegroups.com
See the spec here:


With that make up, you can reasonably expect uniqueness when auto-generating them.  Even if the IDs were created at the exact same time, by machines with the same name/vid/mac, by processes with the same PID, the incrementing/random value in the last three bytes would have to be exactly the same too.  Not a likely event :)

Adam
Reply all
Reply to author
Forward
0 new messages