has anyone seen this pattern (min writer queue size) ?


Pierre Couzy
Jan 16, 2012, 7:35:39 AM
to mongodb-user
Hey there,

I'm running a very simple single-server mongo setup, and I see this
pattern from time to time:

1/ Normal behaviour, almost idle
insert  query  update  delete  getmore  command  flushes  mapped  vsize  res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn  time
0  68  120  26  0  154  0  3.95g  8.06g  3.18g  0  55.8  0  0|0  0|0  141k  137k  32  03:25:38
0  58  85  7  0  93  0  3.95g  8.05g  3.18g  0  42.6  0  0|0  0|0  116k  114k  31  03:25:39
0  40  48  4  0  50  0  3.95g  8.05g  3.18g  0  1.4  0  5|1  5|2  57k  59k  31  03:25:41
0  64  95  12  0  120  0  3.95g  8.05g  3.18g  6  58.5  0  0|0  0|0  126k  132k  31  03:25:42
0  69  97  9  0  110  0  3.95g  8.05g  3.18g  0  55.2  0  0|0  0|0  138k  139k  30  03:25:43
0  93  111  13  0  125  0  3.95g  8.05g  3.18g  0  19.1  0  7|2  7|3  146k  147k  30  03:25:44
0  75  128  11  0  144  0  3.95g  8.05g  3.18g  22  58.8  0  0|2  0|3  181k  183k  31  03:25:45

Notice that queued & active workers regularly drop back to zero, as
expected.
Now have a look at this one:

insert  query  update  delete  getmore  command  flushes  mapped  vsize  res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn  time
0  192  198  37  0  232  0  3.95g  8.13g  3.53g  8  44.7  0  47|6  47|7  241k  260k  106  14:04:04
0  289  478  69  1  558  0  3.95g  8.13g  3.53g  42  54.9  0  0|0  0|0  618k  627k  111  14:04:05
0  172  194  16  0  166  0  3.95g  8.13g  3.53g  0  55.8  0  9|46  9|48  201k  335k  110  14:04:06
0  219  391  64  0  507  0  3.95g  8.14g  3.53g  19  53.7  0  0|0  0|0  548k  421k  113  14:04:07
0  208  222  31  0  197  0  3.95g  8.14g  3.53g  1  58.8  0  7|54  7|56  233k  396k  113  14:04:08
0  279  443  72  0  579  0  3.95g  8.13g  3.53g  7  45.5  0  0|0  1|0  643k  542k  111  14:04:09
0  254  361  61  0  426  0  3.95g  8.13g  3.53g  2  64.7  0  0|0  0|0  447k  485k  111  14:04:10
0  225  266  29  0  284  0  3.95g  8.13g  3.53g  2  8.6  0  19|3  19|4  338k  379k  111  14:04:11
0  260  304  40  0  240  0  3.95g  8.17g  3.53g  247  93.6  0  1|116  1|119  241k  570k  150  14:04:12
0  296  518  81  0  714  0  3.95g  8.17g  3.53g  4  29.8  0  0|11  0|12  816k  560k  149  14:04:13

Notice the peak at 14:04:12? From then on, the minimum qw & aw never
goes below 11:

0  265  398  70  0  483  0  3.95g  8.17g  3.53g  5  77.6  0  0|11  0|11  540k  559k  149  14:04:53
0  250  365  51  0  422  0  3.95g  8.17g  3.53g  0  47.7  0  0|11  0|12  460k  471k  147  14:04:54
0  214  317  37  0  355  0  3.95g  8.17g  3.53g  2  8.8  0  0|11  0|12  420k  403k  147  14:04:55
0  241  355  31  0  390  0  3.95g  8.17g  3.53g  11  79.4  0  0|11  0|12  478k  485k  146  14:04:56
0  254  328  45  0  367  0  3.95g  8.17g  3.53g  0  20.5  0  9|18  9|19  379k  409k  145  14:04:57
0  232  368  40  0  423  0  3.95g  8.17g  3.53g  1  70.5  0  0|11  0|11  520k  530k  145  14:04:58
0  156  147  19  0  146  0  3.95g  8.17g  3.53g  8  49.9  0  0|38  2|38  161k  288k  144  14:04:59
0  284  454  60  1  542  0  3.95g  8.17g  3.53g  5  36.7  0  0|11  0|11  639k  584k  144  14:05:00
0  190  183  20  0  159  0  3.95g  8.17g  3.53g  0  54.4  0  7|59  8|59  188k  367k  144  14:05:01
0  264  471  72  1  592  0  3.95g  8.17g  3.53g  10  37.3  0  0|11  0|11  644k  489k  145  14:05:02
0  214  218  25  1  193  0  3.95g  8.17g  3.53g  1  67.6  0  0|63  1|65  225k  429k  144  14:05:03
0  310  529  89  0  676  0  3.95g  8.17g  3.53g  3  45.5  0  0|11  0|11  762k  557k  143  14:05:04
0  215  278  32  0  293  0  3.95g  8.16g  3.53g  3  83.5  0  1|23  1|32  320k  415k  142  14:05:05
0  214  375  55  0  455  0  3.95g  8.16g  3.53g  19  16.1  0  0|11  0|11  495k  406k  142  14:05:07
0  229  320  40  0  361  0  3.95g  8.17g  3.53g  9  89.7  0  0|11  0|11  433k  425k  143  14:05:08

The same happens regularly:

insert  query  update  delete  getmore  command  flushes  mapped  vsize  res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn  time
0  320  504  61  0  578  0  3.95g  8.14g  3.55g  49  97.1  0  2|11  2|12  728k  705k  119  14:37:29
0  273  273  40  0  251  0  3.95g  8.14g  3.55g  2  63.7  0  10|77  10|78  297k  539k  119  14:37:30
0  318  565  95  0  735  0  3.95g  8.15g  3.55g  36  41.1  0  0|11  0|11  845k  631k  125  14:37:31
0  229  267  31  0  251  0  3.95g  8.15g  3.55g  3  72  0  0|57  0|60  296k  423k  125  14:37:33
0  282  448  67  0  573  0  3.95g  8.15g  3.55g  0  18.5  0  0|11  0|11  670k  546k  125  14:37:34
0  196  172  19  0  182  0  3.95g  8.15g  3.55g  10  43.2  0  69|19  69|20  235k  248k  127  14:37:35
0  339  510  44  0  488  0  3.95g  8.2g  3.55g  353  93.1  0  0|83  0|93  601k  777k  181  14:37:36
0  305  446  65  0  575  0  3.95g  8.2g  3.55g  6  13.2  0  13|23  13|24  674k  521k  181  14:37:37
0  333  511  82  0  605  0  3.95g  8.2g  3.55g  2  68.8  0  0|22  0|23  667k  682k  180  14:37:38

(peak, then min aw/qw goes to 22)

One more (to 71)

insert  query  update  delete  getmore  command  flushes  mapped  vsize  res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn  time
0  284  392  47  0  439  0  3.95g  8.2g  3.67g  12  72.9  0  0|48  2|52  516k  559k  173  17:55:03
0  268  351  44  0  395  0  3.95g  8.2g  3.67g  18  8.2  0  34|53  34|54  487k  477k  174  17:55:04
0  407  617  65  0  691  0  3.95g  8.2g  3.67g  28  80.2  0  0|48  2|48  836k  884k  174  17:55:05
0  312  426  56  0  489  0  3.95g  8.2g  3.67g  13  43.2  0  1|48  3|48  566k  622k  174  17:55:06
0  241  281  29  0  305  0  3.95g  8.21g  3.67g  30  43.8  0  64|56  64|57  381k  336k  175  17:55:07
0  372  599  69  1  681  0  3.95g  8.21g  3.67g  59  30.6  0  0|48  2|48  832k  899k  180  17:55:08
0  279  317  41  0  349  0  3.95g  8.21g  3.67g  11  9.5  0  46|57  46|58  412k  403k  179  17:55:10
0  262  278  32  0  179  0  3.95g  8.25g  3.67g  236  116  0  0|192  2|196  214k  609k  216  17:55:11
0  371  666  107  0  898  0  3.95g  8.25g  3.67g  38  33.7  0  10|72  10|73  1m  674k  216  17:55:12
0  267  383  65  0  455  0  3.95g  8.25g  3.67g  26  44.5  0  0|71  2|72  501k  533k  217  17:55:13

Almost dead:

0  386  592  88  0  704  0  3.95g  8.36g  3.71g  22  24.3  0  0|174  4|177  829k  743k  322  18:39:26
0  230  198  15  0  121  0  3.95g  8.36g  3.71g  16  77  0  30|269  30|270  158k  367k  325  18:39:28
0  417  728  91  0  909  0  3.95g  8.37g  3.71g  120  39.1  0  0|181  3|184  1m  832k  340  18:39:29
0  264  331  44  0  379  0  3.95g  8.37g  3.71g  14  62.1  0  0|181  3|187  410k  503k  341  18:39:30

Dead for all practical purposes:

insert  query  update  delete  getmore  command  flushes  mapped  vsize  res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn  time
0  95  104  20  0  129  0  3.95g  8.46g  3.72g  28  57.1  0  0|202  4|203  134k  165k  426  19:00:35
0  4  10  1  0  16  0  3.95g  8.46g  3.72g  14  0  0  0|202  4|202  14k  6k  427  19:00:36
0  84  104  9  0  77  0  3.95g  8.45g  3.72g  4  13.5  0  30|243  29|244  73k  103k  421  19:00:38
0  31  48  13  0  105  0  3.95g  8.45g  3.72g  33  66.6  0  0|202  4|202  102k  87k  421  19:00:39
0  7  4  1  0  6  1  3.95g  8.45g  3.72g  10  1.3  0  0|202  4|202  4k  6k  421  19:00:40
0  60  98  11  0  118  0  3.95g  8.45g  3.72g  2  58.1  0  0|204  4|206  113k  134k  416  19:00:41
0  2  10  0  0  15  0  3.95g  8.45g  3.72g  9  8.1  0  0|202  4|202  18k  6k  416  19:00:42
0  93  123  12  0  114  0  3.95g  8.45g  3.72g  11  52.7  0  6|211  10|226  114k  160k  414  19:00:43
0  41  59  11  0  102  0  3.95g  8.45g  3.72g  14  12  0  0|202  4|202  112k  82k  414  19:00:45
0  151  254  27  0  246  0  3.95g  8.45g  3.72g  374  6.8  0  0|202  4|239  281k  228k  421  19:00:46

This is running on a 64-bit Windows box with 16 GB RAM.


Have you ever seen that behaviour?

Pierre

Eliot Horowitz
Jan 17, 2012, 1:38:59 AM
to mongod...@googlegroups.com
The most common reason for that is slow queries. Possible?
db.currentOp() is a good way to look, or profiling.
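For reference, Eliot's two suggestions can be run from the mongo shell roughly like this (a sketch, not from the thread; the 100 ms threshold is an arbitrary choice):

```javascript
// Show operations currently in progress; look for long secs_running
// values or many entries with waitingForLock: true.
db.currentOp()

// Turn on the profiler for operations slower than 100 ms...
db.setProfilingLevel(1, 100)

// ...then inspect the slowest recent entries it recorded.
db.system.profile.find().sort({ millis: -1 }).limit(10)
```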


Pierre Couzy
Jan 16, 2012, 4:23:43 PM
to mongodb-user
To be more specific: this is most probably a condition triggered by
the application and not by a bug in Mongo, but we're trying to
pinpoint it, so if you once made a mistake that gave you an
ever-increasing number of active/queued writes, I'd like to know
about it. In case you wonder about the limitation at 400 connections
or so, that is a limit we defined in the client driver (pool size).

Pierre Couzy
Jan 17, 2012, 8:05:53 AM
to mongodb-user
We had a look at currentOp() and there was nothing really obvious
there (see extract below).
We experienced issues during mongodump and tried a repair, which gave
something meaningful:

[...]
Tue Jan 17 00:09:46 [initandlisten] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: Main.cachedItems top: { opid: 221, active: true, waitingForLock: false, secs_running: 0, op: "getmore", ns: "Main.cachedItems", query: {}, client: "0.0.0.0:0", desc: "initandlisten", numYields: 0 }
(the same warning line appears three times in the log)
Tue Jan 17 00:09:46 [initandlisten] build index Main.cachedItems { _id: 1 }
        1000000/1051544 95%
Tue Jan 17 00:10:17 [initandlisten] external sort used : 2 files in 31 secs
Tue Jan 17 00:10:27 [initandlisten] done building bottom layer, going to commit
Tue Jan 17 00:10:27 [initandlisten] build index done 1051544 records 41.668 secs
Tue Jan 17 00:10:27 [initandlisten] build index Main.cachedItems { k: 1.0 }
Tue Jan 17 00:10:36 [initandlisten] external sort used : 2 files in 8 secs
Tue Jan 17 00:10:38 [initandlisten] build index done 1051544 records 10.78 secs
Tue Jan 17 00:10:38 [initandlisten] build index Main.cachedItems { e: 1.0 }
Tue Jan 17 00:10:41 [initandlisten] build index done 1051544 records 3.031 secs
Tue Jan 17 00:10:45 [initandlisten] removeJournalFiles
Tue Jan 17 00:10:46 [initandlisten] finished checking dbs
Tue Jan 17 00:10:46 dbexit:

I don't know about recursive locks, but that smells like something
that could hang around and be reissued, which would explain why the
writers never go down past a certain level. Do you know how we can get
more information? We have mongo logs, mongostat dumps, and application
logs as well.


Extract of currentOp():
{
    "inprog" : [
        {
            "opid" : 11198852,
            "active" : true,
            "lockType" : "write",
            "waitingForLock" : false,
            "secs_running" : 0,
            "op" : "update",
            "ns" : "Main.cachedItems",
            "query" : {
                "k" : "tile0_3_3-2431917"
            },
            "client" : "79.125.41.97:54646",
            "desc" : "conn",
            "connectionId" : 5602,
            "numYields" : 1
        },
        {
            "opid" : 11198392,
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "remove",
            "ns" : "",
            "query" : {
                "k" : "lck_visits29570744"
            },
            "client" : "79.125.41.97:54643",
            "desc" : "conn",
            "connectionId" : 5600,
            "numYields" : 0
        },
        {
            "opid" : 11198484,
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "remove",
            "ns" : "",
            "query" : {
                "k" : "lck_visits16822501"
            },
            "client" : "79.125.83.28:58052",
            "desc" : "conn",
            "connectionId" : 5597,
            "numYields" : 0
        },
        {
            "opid" : 11198507,
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "",
            "query" : {
                "k" : "data16377358"
            },
            "client" : "79.125.41.97:54716",
            "desc" : "conn",
            "connectionId" : 5623,
            "numYields" : 0
        },
        {
            "opid" : 11198825,
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "",
            "query" : {
                "k" : "tile0_4_4-12012251"
            },
            "client" : "79.125.41.97:54599",
            "desc" : "conn",
            "connectionId" : 5578,
            "numYields" : 0
        },
        {
            "opid" : 11198801,
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "",
            "query" : {
                "k" : "data33690616"
            },
            "client" : "176.34.221.185:61670",
            "desc" : "conn",
            "connectionId" : 5594,
            "numYields" : 0
        },

Pierre

Eliot Horowitz
Jan 17, 2012, 8:12:06 AM
to mongod...@googlegroups.com
Are you doing findAndModify or db.eval on Main.cachedItems?
If so, there might be an unindexed query.

Pierre Couzy
Jan 17, 2012, 8:19:56 AM
to mongodb-user
No findAndModify, and no evals, but I'll ask someone here to chime in
and give more detailed explanations on the client side in case I'm
missing something.

Pierre

Yann Schwartz
Jan 17, 2012, 8:18:04 AM
to mongodb-user
We're doing upserts (find and modify) using two indexed fields (one of
them being defined as a sparse index). No eval.

Eliot Horowitz
Jan 17, 2012, 8:37:07 AM
to mongod...@googlegroups.com
Can you send the findAndModify commands and indexes?

Yann Schwartz
Jan 17, 2012, 8:43:22 AM
to mongodb-user
To be more precise, we're using a (dirty?) trick to atomically check
and set a mostly empty document acting as a lock.

Our operation is roughly:
db.mycollection.update({ k: "mykey", e: { $lt: new Date() } }, x, /* upsert */ true);

k is unique and indexed
e is indexed (and sparse)

So if the document exists but e is still in the future, mongo tries to
insert, and the resulting duplicate-key exception means the document
is still held; if it does not exist or has expired, the document is
set to the new expiration. We're using SafeMode.
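The semantics of that trick can be sketched in plain JavaScript, simulating the collection with an in-memory map (this is a hypothetical illustration of the logic described above, not the MongoDB implementation; `tryAcquire`, `docs`, and the millisecond timestamps are ours):

```javascript
// The "collection": lock key -> expiry timestamp (the e field),
// with uniqueness on the key guaranteed by the Map itself, the way
// the unique index on k guarantees it in MongoDB.
const docs = new Map();

// Simulate update({k: key, e: {$lt: now}}, {$set: {e: now + ttlMs}}, /*upsert*/ true).
// Returns true if the lock was acquired, false if it is still held.
function tryAcquire(key, now, ttlMs) {
  const expiry = docs.get(key);
  if (expiry === undefined || expiry < now) {
    // Query matches (document absent or expired): the upsert takes
    // the lock by writing the new expiry.
    docs.set(key, now + ttlMs);
    return true;
  }
  // Document exists and e is still in the future: the query matches
  // nothing, so the upsert attempts an insert and the unique index
  // on k raises a duplicate-key error -> "lock still held".
  return false;
}
```

The duplicate-key error is therefore not a failure but the signal that another client holds the lock until its expiry passes.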

Yann Schwartz
Jan 17, 2012, 8:46:28 AM
to mongodb-user
Since we're using the official C# driver:

var query = Query.EQ(idKey, key).And(query, Query.LT(expirationKey, DateTime.UtcNow));
var update = new UpdateBuilder().Set(expirationKey, expiration.Value);
Collection.Update(query, update, UpdateFlags.Upsert, SafeMode.True);

Yann Schwartz
Jan 17, 2012, 8:50:10 AM
to mongodb-user
Sorry, messed up when trying to simplify our code...

var query = Query.And(Query.EQ("k", key), Query.LT("e", DateTime.UtcNow));
var update = new UpdateBuilder().Set("e", someValue);
Collection.Update(query, update, UpdateFlags.Upsert, SafeMode.True);

Eliot Horowitz
Jan 18, 2012, 12:03:01 AM
to mongod...@googlegroups.com
Can you do an explain on the query you are doing in findAndModify?

Yann Schwartz
Jan 18, 2012, 9:15:43 AM
to mongodb-user
Our indices:

> db.cachedItems.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "Main.cachedItems",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "k" : 1
        },
        "unique" : true,
        "ns" : "Main.cachedItems",
        "name" : "k_1"
    },
    {
        "v" : 1,
        "key" : {
            "e" : 1
        },
        "ns" : "Main.cachedItems",
        "name" : "e_1",
        "sparse" : true
    }
]
>


The explain plan:

> db.cachedItems.find({ "k" : "MyKey", "e" : { "$lt" : ISODate("2012-01-18T13:12:17.739Z") } }).explain(true)
{
    "cursor" : "BtreeCursor k_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 2,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "k" : [ [ "MyKey", "MyKey" ] ]
    },
    "allPlans" : [
        {
            "cursor" : "BtreeCursor k_1",
            "indexBounds" : {
                "k" : [ [ "MyKey", "MyKey" ] ]
            }
        },
        {
            "cursor" : "BtreeCursor e_1",
            "indexBounds" : {
                "e" : [ [ true, ISODate("2012-01-18T13:12:17.739Z") ] ]
            }
        },
        {
            "cursor" : "BasicCursor",
            "indexBounds" : {
            }
        }
    ],
    "oldPlan" : {
        "cursor" : "BtreeCursor k_1",
        "indexBounds" : {
            "k" : [ [ "MyKey", "MyKey" ] ]
        }
    }
}
>

Yann Schwartz
Jan 18, 2012, 9:31:07 AM
to mongodb-user
So from the explain, I gather that we're using the right index (k_1,
unique). It would be problematic if we were using a BasicCursor, and
not as optimal if the e_1 index were chosen first.

We currently have ~1M documents, ~200k of which have e defined.