Long flushes

Jesterkst

unread,

Oct 28, 2011, 2:20:51 PM10/28/11

to mongodb-user, y.v...@dev1team.net, el...@dev1team.net

Hi all.

We're experiencing issues caused by long flushes (at least this seems
to be so). Flushing is set up to occur once per 60s.
There is a relevant fragment of output db.serverStatus():
"backgroundFlushing" : {
"flushes" : 12289,
"total_ms" : 114933178,
"average_ms" : 9352.524859630565,
"last_ms" : 20448,
"last_finished" : ISODate("2011-10-28T15:31:33.589Z")
}

Here is some output from mongostat (with correct tabs output placed
here http://pastebin.com/JA2vUZcT):
insert query update delete getmore command flushes mapped vsize
res faults locked % idx miss % qr|qw ar|aw netIn netOut
conn time
144 190 78 0 0 235 0 46.1g 47g
26.8g 0 2.7 0 0|0 0|0 60k 40k
72 18:30:57
226 216 12 0 0 243 0 46.1g 47g
26.8g 0 1.9 0 0|0 0|0 64k 38k
72 18:30:58
188 190 11 0 0 201 0 46.1g 47g
26.8g 0 1.5 0 0|0 0|0 54k 26k
72 18:30:59
82 73 19 0 0 111 0 46.1g 47g
26.8g 0 1.3 0 0|0 0|0 25k 23k
73 18:31:00
219 213 9 0 0 232 0 46.1g 47g
26.8g 0 1.8 0 0|0 0|0 61k 31k
73 18:31:01
245 341 255 0 0 530 0 46.1g 47g
26.8g 0 6.4 0 0|0 0|0 130k 80k
73 18:31:02
271 271 24 0 0 304 0 46.1g 47g
26.8g 0 2.5 0 0|0 0|0 80k 51k
73 18:31:03
234 229 6 0 0 243 0 46.1g 47g
26.8g 0 1.8 0 0|0 0|0 65k 32k
73 18:31:04
259 248 35 0 0 310 0 46.1g 47g
26.8g 0 2.5 0 0|0 0|0 78k 48k
73 18:31:05
86 74 19 0 0 113 0 46.1g 47g
26.8g 0 1 0 0|0 0|0 26k 17k
73 18:31:06
insert query update delete getmore command flushes mapped vsize
res faults locked % idx miss % qr|qw ar|aw netIn netOut
conn time
172 138 36 0 0 221 0 46.1g 47g
26.8g 0 2.1 0 0|0 0|0 51k 30k
73 18:31:07
189 190 3 0 0 194 0 46.1g 47g
26.8g 0 1.3 0 0|0 0|1 53k 27k
73 18:31:08
259 349 246 0 0 536 0 46.1g 47g
26.8g 0 7.4 0 0|0 0|0 131k 89k
73 18:31:09
256 258 9 0 0 269 0 46.1g 47g
26.8g 0 2.2 0 0|0 0|0 72k 39k
73 18:31:10
231 225 12 0 0 248 0 46.1g 47g
26.8g 0 1.9 0 0|0 0|0 65k 30k
73 18:31:11
225 285 155 0 0 400 0 46.1g 47g
26.8g 0 4.7 0 0|0 0|0 100k 57k
73 18:31:12
257 255 26 0 0 292 0 46.1g 47g
26.8g 0 3.3 0 0|0 0|0 76k 43k
73 18:31:13
260 269 94 0 0 371 0 46.1g 47g
26.8g 0 3.9 0 0|0 0|0 92k 49k
73 18:31:14
265 254 42 0 0 322 0 46.1g 47g
26.8g 0 3.3 0 0|0 0|0 81k 48k
73 18:31:15
275 243 45 0 0 336 0 46.1g 47g
26.8g 0 3.4 0 0|0 0|0 82k 47k
69 18:31:16
insert query update delete getmore command flushes mapped vsize
res faults locked % idx miss % qr|qw ar|aw netIn netOut
conn time
89 87 6 0 0 98 0 46.1g 47g
26.8g 0 0.8 0 0|0 0|0 25k 13k
69 18:31:17
109 97 12 0 0 126 0 46.1g 47g
26.8g 0 1.1 0 0|0 0|0 31k 18k
69 18:31:18
181 187 3 0 0 186 0 46.1g 47g
26.8g 0 1.3 0 0|0 0|0 51k 23k
69 18:31:19
259 386 345 0 0 646 0 46.1g 47g
26.8g 0 8.5 0 0|0 0|0 155k 106k
69 18:31:20
172 196 7 0 0 175 0 46.1g 47g
26.8g 1 1.2 0 1|17 1|18 49k 26k
70 18:31:23
278 245 35 0 0 333 0 46.1g 47g
26.8g 0 75.7 0 0|0 0|0 82k 63k
67 18:31:24
240 335 221 0 0 485 0 46.1g 47g
26.8g 0 6.3 0 0|0 0|0 121k 78k
67 18:31:25
244 222 22 0 0 279 0 46.1g 47g
26.8g 0 2.7 0 0|0 0|0 70k 42k
66 18:31:26
82 84 6 0 0 89 0 46.1g 47g
26.8g 0 0.7 0 0|0 0|0 24k 11k
66 18:31:27
227 219 9 0 0 242 0 46.1g 47g
26.8g 1 2 0 0|0 0|0 64k 33k
66 18:31:28
insert query update delete getmore command flushes mapped vsize
res faults locked % idx miss % qr|qw ar|aw netIn netOut
conn time
230 315 219 0 0 475 0 46.1g 47g
26.8g 0 5.8 0 0|0 0|0 117k 74k
66 18:31:29
220 231 0 0 0 221 0 46.1g 47g
26.8g 0 1.5 0 0|0 0|0 62k 31k
66 18:31:30
234 244 87 0 0 338 0 46.1g 47g
26.8g 0 3.7 0 0|0 0|0 83k 52k
66 18:31:31
138 123 18 0 0 164 0 46.1g 47g
26.8g 0 1.5 0 0|0 0|0 40k 25k
66 18:31:32
222 222 6 0 0 231 1 46.1g 47g
26.8g 0 1.8 0 0|0 0|0 62k 30k
66 18:31:33
241 234 9 0 0 254 0 46.1g 47g
26.8g 0 2 0 0|0 0|0 67k 32k
66 18:31:34
213 279 148 0 0 378 0 46.1g 47g
26.8g 0 4.3 0 0|0 0|0 96k 54k
66 18:31:35

As you can see, the load is quite low. Along with capturing mongostat
output I've captured IO activity with diskstats (http://
aspersa.googlecode.com/svn/html/diskstats.html):

#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt
wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg
{7} cciss/c0d2p1 0.3 128.0 0.0 0% 0.0 1.0
59.4 13.3 0.4 41% 5.3 88.4 72% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 3.9 2006.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 5.9 3006.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 7.9 4006.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
155.3 13.2 1.0 38% 43.9 282.6 84% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 16.0 0.0 50% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
344.4 16.7 2.8 52% 11.7 34.0 12% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
54.1 18.8 0.5 57% 0.6 11.2 2% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
462.6 18.1 4.1 56% 13.9 30.0 99% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 1.8 938.5 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 3.8 1938.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 20.0 0.0 0% 5.8 2939.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 8.0 0.0 0% 7.8 3939.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 8.0 0.0 0% 9.7 4933.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
3.9 8.0 0.0 0% 25.3 6432.5 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
40.3 10.9 0.2 39% 16.0 398.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
323.5 11.6 1.8 30% 8.3 25.5 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 16.0 0.0 0% 2.2 1136.0 100% 0
{1} cciss/c0d2p1 1.0 128.0 0.1 0% 2.1 2149.0
141.7 10.5 0.7 24% 10.6 74.5 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
128.0 12.4 0.8 37% 5.6 43.6 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 16.0 0.0 0% 1.9 c 991.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 3.9 1991.0 100% 0
{1} cciss/c0d2p1 1.0 128.0 0.1 0% 0.0 1.0
345.5 12.2 2.1 35% 26.6 77.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 16.0 0.0 0% 1.9 971.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 3.9 1971.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 5.8 2971.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 7.8 3971.0 100% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
2.0 12.0 0.0 0% 9.8 4971.0 100% 0
{1} cciss/c0d2p1 1.0 8.0 0.0 0% 0.0 4.0
138.6 17.6 1.2 54% 12.6 90.7 12% 0
{1} cciss/c0d2p1 0.0 0.0 0.0 0% 0.0 0.0
0.0 0.0 0.0 0% 0.0 0.0 0% 0

Remember we had ("last_ms" : 20448) in db.serverStatus() output (got
it right after the mongostat signaled about flush)? Diskstat at the
same time showed 100% drive busy during ~21 sec. So I assume that this
is flushing process. Actually mongo isn't writing any significant
amounts of data to drive (wr_mb_s column). But write response time
values are huge (wr_rt) & there is relatively high concurrency
(wr_cnc). We expect writes to be quick because of hitting RAID
controller's cache (1 GB). The c0d2p1 is placed on RAID 1 and is
dedicated for mongo only.

Do you have any ideas why do we have so slow/long flushes?

PS: which IO scheduler should be used for mongo partitions?

Eliot Horowitz

unread,

Oct 28, 2011, 11:49:47 PM10/28/11

to mongod...@googlegroups.com

How much data are you writing into the system?
You could try lowering --syncdelay and seeing if things are smoother.

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

Igor Valko

unread,

Oct 31, 2011, 12:51:06 PM10/31/11

to mongod...@googlegroups.com

> How much data are you writing into the system?

There are two heavy write collections with the avgObjSizes 258.7 and 197918.01 that are the targets of ~80% requests . The load is balanced approximately equally among those two. So we can suppose that there are 40% of writes from previously posted mongostat output are issued to the collection with avgObjSize = 258.7 and the other 40% - to one with avgObjSize = 197918.01. The first one is append only - we do not delete any documents. The stats for these collections:

"count" : 43261219,

"size" : 11192756168,

"avgObjSize" : 258.72493717756777,

"storageSize" : 13612350704,

"numExtents" : 36,

"nindexes" : 3,

"lastExtentSize" : 2146426864,

"paddingFactor" : 1.9499999994096007,

"flags" : 0,

"totalIndexSize" : 11884048672,

"indexSizes" : {

"_id_" : 2973811648,

"Idx_..." : 4840195760,

"Idx_..." : 4070041264

},

"ok" : 1,

"count" : 400,

"size" : 79167204,

"avgObjSize" : 197918.01,

"storageSize" : 111137536,

"numExtents" : 12,

"nindexes" : 2,

"lastExtentSize" : 25047552,

"paddingFactor" : 1,

"flags" : 1,

"totalIndexSize" : 81920,

"indexSizes" : {

"_id_" : 40960,

"Idx_..." : 40960

},

"ok" : 1

Also you could notice from diskstats output that mongo doesn't flush too much data (wr_mb_s).

> You could try lowering --syncdelay and seeing if things are smoother.

I've already tried that. For syncdelay = 30s some flushes take about 27s!

Bernie Hackett

unread,

Oct 31, 2011, 2:45:53 PM10/31/11

to mongodb-user

Couple of questions:

- Is flushing always slow or is this intermittent?
- Have you always had this problem or did it just begin recently?
- What file system are you using?
- Are you sure the write cache is enabled on your RAID card?

Igor Valko

unread,

Nov 1, 2011, 6:29:55 AM11/1/11

to mongod...@googlegroups.com

Hi Bernie,

- Is flushing always slow or is this intermittent?

No, it is NOT intermittent, and under any decent load it gets worse.

- Have you always had this problem or did it just begin recently?

The problem has been revealed recently. I cannot give you the exact answer right now, it's also the part of our investigation whether any changes brought us to the problem.

- What file system are you using?

ext3

- Are you sure the write cache is enabled on your RAID card?

yep, write-back cache is enabled for both controllers.

Thanks for assistance!

Nat

unread,

Nov 1, 2011, 6:41:28 AM11/1/11

to mongod...@googlegroups.com

Ext3 doesn't support posix_fallocate like ext4 so database preallocation can cause a long pause. You should consider upgrading to ext4.

From: Igor Valko <jest...@gmail.com>

Sender: mongod...@googlegroups.com

Date: Tue, 1 Nov 2011 13:29:55 +0300

To: <mongod...@googlegroups.com>

ReplyTo: mongod...@googlegroups.com

Subject: Re: [mongodb-user] Re: Long flushes

--

Igor Valko

unread,

Nov 1, 2011, 6:49:20 AM11/1/11

to mongod...@googlegroups.com

Thanks. We'll consider this. But I don't believe that's the root issue - i see bad performance on each flush.

2011/11/1 Nat <nat....@gmail.com>

Nat

unread,

Nov 1, 2011, 6:56:25 AM11/1/11

to mongod...@googlegroups.com

You can check the log file to see how long it took to allocate a file when it happens.

From: Igor Valko <jest...@gmail.com>

Sender: mongod...@googlegroups.com

Date: Tue, 1 Nov 2011 13:49:20 +0300

Bernie Hackett

unread,

Nov 1, 2011, 2:00:00 PM11/1/11

to mongodb-user

Definitely recommend migrating to ext4 or switching to xfs. Some RAID
controllers use battery backups for durability and will automatically
disable write caching if the battery fails.

Igor Valko

unread,

Apr 17, 2012, 9:23:17 AM4/17/12

to mongod...@googlegroups.com

btw, the issue was solved by upgrading firmware of the RAID
controllers.

1 ноября 2011 г. 12:49 пользователь Igor Valko <jest...@gmail.com> написал:

Reply all

Reply to author

Forward