I'm doing a map-reduce job on mongo 2.0.2 (single server) that joins together several documents from a number of collections and places them incrementally in a new collection.
Whilst the map reduce is running mongodb becomes very unresponsive for reads and sometimes causes a seg fault (using the php driver). I'm seeing the following output in the logs:
Thu Feb 9 12:42:34 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 464, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11805/144997 8%", progress: { done: 11805, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:35 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_itemtoitem top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 465, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11834/144997 8%", progress: { done: 11834, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:36 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 466, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11866/144997 8%", progress: { done: 11866, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:36 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 466, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11871/144997 8%", progress: { done: 11871, total: 144997 }, numYields: 17 }
Thu Feb 9 12:42:36 [conn7] query acorn.CBD_harvest ntoreturn:1 nscanned:1 nreturned:1 reslen:2001 107ms
11900/144997 8%
Thu Feb 9 12:42:39 [conn7] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: acorn.CBD_harvest top: { opid: 411, active: true, lockType: "read", waitingForLock: false, secs_running: 469, op: "query", ns: "acorn.CBD_itemtoitem", query: { $msg: "query not recording (too large)" }, client: "127.0.0.1:60612", desc: "conn", threadId: "0x7fcd3b7f9700", connectionId: 7, msg: "m/r: (1/3) emit phase 11955/144997 8%", progress: { done: 11955, total: 144997 }, numYields: 17 }
Here's some mongostat output (incidentally, mongostat can take up to 20s to start when this MR is running):
$ mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 6 0 0 0 7 0 36.3g 73g 2.56g 207 79.5 0 0|0 0|1 1k 2k 10 12:46:37
0 0 0 0 0 1 0 36.3g 73g 2.56g 121 108 0 0|0 0|1 62b 1k 10 12:46:38
0 1494 0 0 0 1 0 36.3g 73g 2.6g 208 53.4 0 0|0 1|0 62b 1k 10 12:46:39
0 2698 0 0 0 1 0 36.3g 73g 2.6g 12 0 0 0|0 1|0 62b 1k 10 12:46:40
0 2951 0 0 0 1 0 36.3g 73g 2.6g 3 0 0 0|0 1|0 62b 1k 10 12:46:41
0 3026 0 0 0 1 0 36.3g 73g 2.6g 2 0 0 0|0 0|0 62b 1k 10 12:46:42
0 2785 0 0 0 11 0 36.3g 73g 2.6g 9 0 0 0|0 1|0 703b 4k 10 12:46:43
0 2878 0 0 0 1 0 36.3g 73g 2.6g 1 0 0 0|0 0|0 62b 1k 10 12:46:44
0 1486 0 0 0 1 0 36.3g 73g 2.63g 18 0 0 0|0 1|0 62b 1k 10 12:46:45
0 0 0 0 0 1 0 36.3g 73g 2.79g 7 0 0 0|0 1|0 62b 1k 10 12:46:46
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 0 0 0 0 1 0 36.3g 73g 2.95g 5 0 0 0|0 1|0 62b 1k 10 12:46:47
0 0 0 0 0 1 1 36.3g 73g 3.16g 14 0 0 0|0 1|0 62b 1k 10 12:46:48
0 0 0 0 0 1 0 36.3g 73g 3.29g 155 0 0 0|0 1|0 62b 1k 10 12:46:49
0 0 0 0 0 1 0 36.3g 73g 3.5g 11 0 0 0|0 1|0 62b 1k 10 12:46:50
0 0 0 0 0 1 0 36.3g 73g 3.7g 7 0 0 0|0 1|0 62b 1k 10 12:46:51
0 0 0 0 0 1 0 36.3g 73g 3.86g 65 0 0 0|0 1|0 62b 1k 10 12:46:52
0 0 0 0 0 1 0 36.3g 73g 4.04g 44 0 0 0|0 1|0 62b 1k 10 12:46:53
0 1501 0 0 0 1 0 36.3g 73g 4.04g 3 0 0 0|0 0|0 62b 1k 10 12:46:54
0 2770 0 0 0 1 0 36.3g 73g 4.05g 9 0 0 0|0 0|0 62b 1k 10 12:46:55
0 2970 0 0 0 1 0 36.3g 73g 4.05g 7 0 0 0|0 1|0 62b 1k 10 12:46:56
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
0 2634 0 0 0 1 0 36.3g 73g 4.05g 8 0 0 0|0 0|0 62b 1k 10 12:46:57
0 2955 0 0 0 1 0 36.3g 73g 4.06g 6 0 0 0|0 1|0 62b 1k 10 12:46:58
0 3075 0 0 0 1 0 36.3g 73g 4.06g 2 0 0 0|0 0|0 62b 1k 10 12:46:59
0 2704 0 0 0 1 0 36.3g 73g 4.06g 11 0 0 0|0 1|0 62b 1k 10 12:47:00
My first thought is that a query inside my M-R isn't making use of indexes and thus causing the large amount of faults recorded by mongostat. However these are all doing db.coll.find({'_id':someid}) as far as I can see, so should be making use of the _id index.
The log statements above include:
query: { $msg: "query not recording (too large)" }
Is there any way I can get some information about what the query in question might be so I can track it down?
Or perhaps someone has some other ideas to investigate?
Thanks,
Chris