Hi there,
I plugged mongo into my production system today to see how it handles the write load over a few days. So far I have only one server, with a 15k RPM spinning disk and 8GB of RAM. If I take this to production properly I intend to shard, but hopefully across only 2 or 3 servers.
Right now I'm doing about 200 upsert findAndModify operations per second, plus 200 upsert updates per second. The collection stats tell me avgObjSize is 129 bytes for the documents receiving findAndModify operations (4 indexes) and 91 bytes for the documents receiving updates (1 index). The dataset (data + indexes) is currently around 1GB in total.
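For scale, the figures above work out to roughly 44 KB/s of document payload. This is a minimal sketch of that arithmetic, assuming each operation rewrites one average-size document. It ignores index updates, journaling, and the fact that memory-mapped files are flushed in whole pages, so it is not an estimate of actual disk traffic. The constant and function names are made up for illustration.

```python
# Figures quoted in the post.
FIND_AND_MODIFY_PER_S = 200   # upsert findAndModify operations per second
UPDATES_PER_S = 200           # upsert updates per second
FAM_DOC_BYTES = 129           # average object size, findAndModify collection
UPDATE_DOC_BYTES = 91         # average object size, update collection


def payload_bytes_per_second():
    """Document bytes touched per second, assuming (hypothetically) that
    every operation rewrites exactly one average-size document."""
    return (FIND_AND_MODIFY_PER_S * FAM_DOC_BYTES
            + UPDATES_PER_S * UPDATE_DOC_BYTES)


per_second = payload_bytes_per_second()
per_minute = per_second * 60
print(f"{per_second} bytes/s, {per_minute / 1e6:.2f} MB/min")
```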
I have these entries in the mongo logs - I don't know what they mean yet, but they're concerning:
Wed Nov 14 21:49:23 [DataFileSync] flushing mmaps took 10482ms for 11 files
Wed Nov 14 21:53:23 [DataFileSync] flushing mmaps took 10520ms for 11 files
Wed Nov 14 21:55:07 [websvr] serverStatus was very slow: { after basic: 0, middle of mem: 1640, after mem: 1640, after connections: 1640, after extra info: 1640, after counters: 1640, after repl: 1640, after asserts: 1640, after dur: 1640, at end: 1640 }
Wed Nov 14 21:56:22 [DataFileSync] flushing mmaps took 10078ms for 11 files
Wed Nov 14 21:57:04 [journal] old journal file will be removed: /var/lib/mongodb/journal/j._46
Wed Nov 14 21:57:22 [DataFileSync] flushing mmaps took 10008ms for 11 files
Wed Nov 14 22:00:23 [DataFileSync] flushing mmaps took 10751ms for 11 files
Wed Nov 14 22:01:23 [DataFileSync] flushing mmaps took 11075ms for 11 files
Wed Nov 14 22:05:11 [journal] old journal file will be removed: /var/lib/mongodb/journal/j._47
Wed Nov 14 22:06:23 [DataFileSync] flushing mmaps took 10347ms for 11 files
Wed Nov 14 22:07:24 [DataFileSync] flushing mmaps took 11479ms for 11 files
Wed Nov 14 22:10:23 [DataFileSync] flushing mmaps took 10410ms for 11 files
Wed Nov 14 22:11:23 [DataFileSync] flushing mmaps took 10340ms for 11 files
Wed Nov 14 22:13:24 [DataFileSync] flushing mmaps took 11128ms for 11 files
Wed Nov 14 22:13:27 [journal] old journal file will be removed: /var/lib/mongodb/journal/j._48
Wed Nov 14 22:14:24 [DataFileSync] flushing mmaps took 11807ms for 11 files
Wed Nov 14 22:15:24 [DataFileSync] flushing mmaps took 11205ms for 11 files
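To compare these flush times with anything else, it helps to pull the numbers out of the log. The following is a rough Python sketch that extracts the timestamp, duration, and file count from the [DataFileSync] lines and skips everything else. The regex is written against the lines shown above only, the year is an assumption (the log lines do not carry one), and month-name parsing assumes an English locale.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Wed Nov 14 21:49:23 [DataFileSync] flushing mmaps took 10482ms for 11 files
FLUSH_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d+) (\d{2}:\d{2}:\d{2}) \[DataFileSync\] "
    r"flushing mmaps took (\d+)ms for (\d+) files"
)


def parse_flushes(lines, year=2012):
    """Return (logged_at, duration_ms, file_count) for each flush line.

    The year is an assumption because the log format omits it.
    """
    flushes = []
    for line in lines:
        m = FLUSH_RE.match(line.strip())
        if not m:
            continue  # skip [journal], [websvr], and any other lines
        month, day, clock, ms, files = m.groups()
        logged_at = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        flushes.append((logged_at, int(ms), int(files)))
    return flushes


def summarize(flushes):
    """Count, min, max, and mean flush duration in milliseconds."""
    durations = [ms for _, ms, _ in flushes]
    return {
        "count": len(durations),
        "min_ms": min(durations),
        "max_ms": max(durations),
        "mean_ms": sum(durations) / len(durations),
    }


sample = [
    "Wed Nov 14 21:49:23 [DataFileSync] flushing mmaps took 10482ms for 11 files",
    "Wed Nov 14 21:53:23 [DataFileSync] flushing mmaps took 10520ms for 11 files",
    "Wed Nov 14 21:57:04 [journal] old journal file will be removed: /var/lib/mongodb/journal/j._46",
]
print(summarize(parse_flushes(sample)))
```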
I'm not doing any reads yet, but I decided to check how consistent read latency is, in case the mmap flushing is causing read stalls.
I set up a simple query that fetches one record via an index and repeated it every second, logging the duration (in seconds) and the timestamp whenever it took longer than 3ms (most took under 1ms). This is the output I saw: some operations took multiple seconds for a single-record indexed query, even though all the data and indexes should still be in RAM.
0.104574109 2012-11-14 22:13:54 -0600
4.74783542 2012-11-14 22:14:22 -0600
0.243556581 2012-11-14 22:14:25 -0600
0.821347295 2012-11-14 22:14:30 -0600
0.036629986 2012-11-14 22:14:56 -0600
3.851662545 2012-11-14 22:15:05 -0600
0.007991038 2012-11-14 22:15:10 -0600
0.004475775 2012-11-14 22:15:57 -0600
0.09404014 2012-11-14 22:16:00 -0600
0.004445754 2012-11-14 22:16:07 -0600
0.024827767 2012-11-14 22:16:16 -0600
0.004718648 2012-11-14 22:16:29 -0600
0.543830541 2012-11-14 22:16:48 -0600
0.081387443 2012-11-14 22:16:57 -0600
0.093996291 2012-11-14 22:17:55 -0600
0.013581206 2012-11-14 22:18:01 -0600
0.455109956 2012-11-14 22:18:17 -0600
0.543997559 2012-11-14 22:18:20 -0600
0.006581586 2012-11-14 22:18:27 -0600
0.040279709 2012-11-14 22:18:52 -0600
3.9563892 2012-11-14 22:19:22 -0600
0.08867152 2012-11-14 22:19:36 -0600
0.044744453 2012-11-14 22:19:54 -0600
0.011553574 2012-11-14 22:20:14 -0600
0.151064978 2012-11-14 22:20:50 -0600
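For anyone wanting to reproduce this kind of measurement, the probe described above can be sketched as follows. This is not the script that produced the output. It is a Python approximation that takes the query as a callable (`fetch`, a hypothetical name) so it runs without a database, and it stamps each slow call after the call returns, which may differ from how the original timestamps were taken.

```python
import time
from datetime import datetime

SLOW_THRESHOLD_S = 0.003  # 3 ms, the cutoff mentioned in the post


def timed_call(fetch):
    """Run fetch() once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    fetch()
    return time.perf_counter() - start


def probe(fetch, iterations, interval_s=1.0,
          threshold_s=SLOW_THRESHOLD_S, log=print):
    """Call fetch() roughly every interval_s seconds and log slow calls.

    Each logged line is "<seconds> <local timestamp>", with the timestamp
    taken after the call returns. Returns the list of slow durations.
    """
    slow = []
    for _ in range(iterations):
        elapsed = timed_call(fetch)
        if elapsed > threshold_s:
            stamp = datetime.now().astimezone().strftime("%Y-%m-%d %H:%M:%S %z")
            log(f"{elapsed:.9f} {stamp}")
            slow.append(elapsed)
        # Sleep for whatever is left of the interval, never a negative time.
        time.sleep(max(0.0, interval_s - elapsed))
    return slow


# Stand-in for the real query: a call that takes about 20 ms.
# With a driver, fetch would instead wrap a single indexed lookup.
probe(lambda: time.sleep(0.02), iterations=2, interval_s=0.1)
```

Passing the query in as a callable keeps the timing logic separate from the driver, so the same loop can time any single-record lookup.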
My munin graph shows disk utilization at 33%. iostat usually reports util at 0% - 2%, but with bursts to 100%.
Is it possible I have something misconfigured here or is there something I likely need to tune? Am I overloading the machine? It's my first day throwing data at mongo, so if someone more experienced could offer any insight, it would be much appreciated.
Cheers,
Tim