ClientCursor::yield can't unlock b/c of recursive lock

Chris Clarke

unread,

Feb 9, 2012, 8:02:07 AM2/9/12

to mongod...@googlegroups.com

Hi there,

I'm doing a map-reduce job on mongo 2.0.2 (single server) that joins together several documents from a number of collections and places them incrementally in a new collection.

Whilst the map reduce is running mongodb becomes very unresponsive for reads and sometimes causes a seg fault (using the php driver). I'm seeing the following output in the logs:

Thu Feb 9 12:42:34 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 464, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11805/144997 8%", progress: { done: 11805, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:35 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_itemtoitem top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 465, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11834/144997 8%", progress: { done: 11834, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:36 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 466, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11866/144997 8%", progress: { done: 11866, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:36 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 466, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11871/144997 8%", progress: { done: 11871, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:36 [conn7] query acorn.CBD_harvest ntoreturn:1 nscanned:1 nreturned:1 reslen:2001 107ms
11900/144997 8%
Thu Feb 9 12:42:39 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 469, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11955/144997 8%", progress: { done: 11955, total: 144997 }, numYields: 17 }

Here's some mongostat output (incidentally, mongostat can take up to 20s to start when this MR is running):

$ mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 6 0 0 0 7 0 36.3g 73g 2.56g 207 79.5 0 0|0 0|1 1k 2k 10 12:46:37
0 0 0 0 0 1 0 36.3g 73g 2.56g 121 108 0 0|0 0|1 62b 1k 10 12:46:38
0 1494 0 0 0 1 0 36.3g 73g 2.6g 208 53.4 0 0|0 1|0 62b 1k 10 12:46:39
0 2698 0 0 0 1 0 36.3g 73g 2.6g 12 0 0 0|0 1|0 62b 1k 10 12:46:40
0 2951 0 0 0 1 0 36.3g 73g 2.6g 3 0 0 0|0 1|0 62b 1k 10 12:46:41
0 3026 0 0 0 1 0 36.3g 73g 2.6g 2 0 0 0|0 0|0 62b 1k 10 12:46:42
0 2785 0 0 0 11 0 36.3g 73g 2.6g 9 0 0 0|0 1|0 703b 4k 10 12:46:43
0 2878 0 0 0 1 0 36.3g 73g 2.6g 1 0 0 0|0 0|0 62b 1k 10 12:46:44
0 1486 0 0 0 1 0 36.3g 73g 2.63g 18 0 0 0|0 1|0 62b 1k 10 12:46:45
0 0 0 0 0 1 0 36.3g 73g 2.79g 7 0 0 0|0 1|0 62b 1k 10 12:46:46
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 1 0 36.3g 73g 2.95g 5 0 0 0|0 1|0 62b 1k 10 12:46:47
0 0 0 0 0 1 1 36.3g 73g 3.16g 14 0 0 0|0 1|0 62b 1k 10 12:46:48
0 0 0 0 0 1 0 36.3g 73g 3.29g 155 0 0 0|0 1|0 62b 1k 10 12:46:49
0 0 0 0 0 1 0 36.3g 73g 3.5g 11 0 0 0|0 1|0 62b 1k 10 12:46:50
0 0 0 0 0 1 0 36.3g 73g 3.7g 7 0 0 0|0 1|0 62b 1k 10 12:46:51
0 0 0 0 0 1 0 36.3g 73g 3.86g 65 0 0 0|0 1|0 62b 1k 10 12:46:52
0 0 0 0 0 1 0 36.3g 73g 4.04g 44 0 0 0|0 1|0 62b 1k 10 12:46:53
0 1501 0 0 0 1 0 36.3g 73g 4.04g 3 0 0 0|0 0|0 62b 1k 10 12:46:54
0 2770 0 0 0 1 0 36.3g 73g 4.05g 9 0 0 0|0 0|0 62b 1k 10 12:46:55
0 2970 0 0 0 1 0 36.3g 73g 4.05g 7 0 0 0|0 1|0 62b 1k 10 12:46:56
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 2634 0 0 0 1 0 36.3g 73g 4.05g 8 0 0 0|0 0|0 62b 1k 10 12:46:57
0 2955 0 0 0 1 0 36.3g 73g 4.06g 6 0 0 0|0 1|0 62b 1k 10 12:46:58
0 3075 0 0 0 1 0 36.3g 73g 4.06g 2 0 0 0|0 0|0 62b 1k 10 12:46:59
0 2704 0 0 0 1 0 36.3g 73g 4.06g 11 0 0 0|0 1|0 62b 1k 10 12:47:00

My first thought is that a query inside my M-R isn't making use of indexes and thus causing the large amount of faults recorded by mongostat. However these are all doing db.coll.find({'_id':someid}) as far as I can see, so should be making use of the _id index.

The log statements above include:

query: { $msg: "query not recording (too large)" }

Is there any way I can get some information about what the query in question might be so I can track it down?

Or perhaps someone has some other ideas to investigate?

Thanks,

Chris

Antoine Girbal

unread,

Feb 9, 2012, 5:15:50 PM2/9/12

to mongodb-user

could you give the sample MR cmd that you are running?
Can you post the stats for the db and all collections that MR is
using?
Note that you have about 4GB of resident memory so if all those
collections / indexes dont fit you will get a lot of page faults.
The message that display "can't unlock" is usually due to the fact
that a nested operation (like code within MR or eval) is taking a long
time to execute.
Also note that it is not advised to do internal db requests to other
collections within MR since it would not work if you decide to shard.

Chris Clarke

unread,

Feb 10, 2012, 6:02:09 AM2/10/12

to mongodb-user

Hi Antoine,

I will send you the code by mail. The DB is much larger than RAM
actually so it may just be a case of an undersized box:

db.stats()
{
"db" : "some-db",
"collections" : 13,
"objects" : 11125052,
"avgObjSize" : 3030.2757437897817,
"dataSize" : 33711975224,
"storageSize" : 40355508112,
"numExtents" : 190,
"indexes" : 38,
"indexSize" : 2974371568,
"fileSize" : 70766297088,
"nsSizeMB" : 16,
"ok" : 1
}

I know eval() doesn't work well with sharding but I was unaware that
db queries within map reduce don't play nicely with sharing....

Are there any other limitations of doing db access within MR (other
than sharding) that I should be aware of?

Chris

On Feb 9, 10:15 pm, Antoine Girbal <anto...@10gen.com> wrote:
> could you give the sample MR cmd that you are running?
> Can you post the stats for the db and all collections that MR is
> using?
> Note that you have about 4GB of resident memory so if all those
> collections / indexes dont fit you will get a lot of page faults.
> The message that display "can't unlock" is usually due to the fact
> that a nested operation (like code within MR or eval) is taking a long
> time to execute.
> Also note that it is not advised to do internal db requests to other
> collections within MR since it would not work if you decide to shard.
>

Antoine Girbal

unread,

Feb 10, 2012, 3:31:10 PM2/10/12

to mongodb-user

yes accessing a lot of data in random fashion through indexes that do
not fit in RAM will be very slow.
Doing db access within JS methods of MR is possible but not advised.
We tend to prevent / disable things that do not work seamlessly when
sharding.
The db queries will only access data that exist within that shard and
so you would get inconsistent results if used with sharding.

The only other thing to note is that db operations cannot yield
(release lock) within JS and so any query you run should be very short
lived (e.g. access few documents)

> ...
>
> read more »

Chris Clarke

unread,

Feb 12, 2012, 5:00:14 AM2/12/12

to mongodb-user

Hi Antoine,

I think our first route will be to ensure we're using a more suitably
sized server to run this job (this is just a test instance on EC2) and
see if the lock yield situation improves.

We are only dealing with retrieving documents by ID in the M-R job so
it should only be handling a very small amount of docs.

Chris

Reply all

Reply to author

Forward