Initially I thought the problem might be with MongoDB itself, so as a test I set up another machine and ran ~2,500 inserts per second - same result. The occasional latency spikes remained at the same frequency, even with a 100x increase in the number of inserts per second.
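For anyone who wants to reproduce the test, a timed-insert loop along these lines in the mongo shell is enough (the collection name and document shape here are just placeholders, not my actual schema):

// time each insert round-trip and print anything that stalls
var coll = db.getSiblingDB("latencytest").post;
for (var i = 0; i < 100000; i++) {
    var start = new Date();
    coll.insert({ n: i, pad: new Array(512).join("x") });
    db.getLastError();  // block until the write is applied
    var ms = new Date() - start;
    if (ms > 100) print(i + ": " + ms + "ms");
}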
This is just a personal project at the moment, but when I have time I'll probably see what the difference is on ephemeral storage (possibly with the db replicated to EBS), see whether using XFS makes a difference, and look at striping across multiple EBS volumes.
I also briefly tried a 64-bit EC2 micro instance, but was seeing latency spikes of up to 15,000ms. Not sure if that's endemic to micro instances, or if I just caught EBS on a bad day.
Running Ubuntu 10.04 (http://alestic.com/ 64-bit S3-based AMI), with mongo-stable installed via apt-get from the 10gen repository.
The only non-stock software on the machine is mongodb, some JavaScript map/reduce jobs run once an hour by cron (not running during this sample), and now iostat.
sdf in the iostat output below is the EBS volume.
There are 4 indexes on the user collection and 5 on post - but despite the one slow operation of each type in the log excerpt below, about 90% of the spikes are on post inserts.
The post collection is currently about 3 million documents, users 1.8 million.
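For reference, the index counts come straight from system.indexes in the shell:

db.system.indexes.find({ ns: "site.post" }).count();  // 5
db.system.indexes.find({ ns: "site.user" }).count();  // 4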
db version v1.6.3, pdfile version 4.5
git hash: 278bd2ac2f2efbee556f32c13c1b6803224d1c01
Mongodb log:
dbtop (occurences|percent of elapsed)
NS          total      Reads     Writes     Queries   GetMores   Inserts    Updates    Removes
GLOBAL      50  1.4%   0  0.0%   50  1.4%   0  0.0%   0  0.0%    25  1.1%   25  0.3%   0  0.0%
site.post   25  1.1%   0  0.0%   25  1.1%   0  0.0%   0  0.0%    25  1.1%   0   0.0%   0  0.0%
site.user   25  0.3%   0  0.0%   25  0.3%   0  0.0%   0  0.0%    0   0.0%   25  0.3%   0  0.0%
write lock % time in write lock, by 4 sec periods
1 2 4 10 5 5 3 3 1 2 3 3 4 5 3 2 3 4 3 2 15 2 3 6 2 4 1 1 7 10 6 5 4 1
4 14 7 5 3 12 2 6 2 22 4 2 4 3 2 3 6 4 3 2 2 2 4 3 3 4 4 2 2 3 2 3 7 3
2 2 3 2 2 2 38 3 1 3 5 1 4 9 3 6 5 5 12 1 4 1 4 10 6 2 5 2 3 7 13
write locked now: false
05:23:18 [conn635] insert site.post 103ms
05:23:33 [conn635] update site.user query: { userId: 3634671 } 113ms
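Those two lines are the stock slow-operation logging (anything over 100ms). If it's useful I can also turn on the profiler, which records the same slow ops into a queryable capped collection:

db.setProfilingLevel(1);  // profile ops slower than the default 100ms threshold
db.system.profile.find().sort({ $natural: -1 }).limit(10);  // most recent slow ops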
mongostat:
(columns: insert/s query/s update/s delete/s getmore/s command/s flushes/s mapped vsize res faults/s locked % idx miss % q t|r|w conn time)
7 0 7 0 0 1 0 12235 12919 6667 1  0.7 0 0|0|0 45 05:23:08
7 0 7 0 0 1 0 12235 12919 6667 5  4.3 0 0|0|0 45 05:23:09
6 0 6 0 0 1 0 12235 12919 6667 3  3.8 0 0|0|0 45 05:23:10
5 0 5 0 0 1 0 12235 12919 6667 1  1.6 0 0|0|0 45 05:23:11
7 0 7 0 0 1 0 12235 12919 6667 3  3.5 0 0|0|0 45 05:23:12
6 0 6 0 0 1 0 12235 12919 6667 1  1.7 0 0|0|0 45 05:23:13
7 0 7 0 0 1 0 12235 12919 6667 2  2.9 0 0|0|0 45 05:23:14
9 0 9 0 0 1 0 12235 12919 6667 3  4.9 0 0|0|0 45 05:23:15
7 0 7 0 0 1 0 12235 12919 6667 5  4.1 0 0|0|0 45 05:23:16
7 0 7 0 0 1 0 12235 12919 6667 1  1.6 0 0|0|0 45 05:23:17
7 0 7 0 0 1 0 12235 12919 6667 4 16.6 0 0|0|0 45 05:23:18
8 0 8 0 0 1 0 12235 12919 6667 4 11.7 0 0|0|0 45 05:23:19
7 0 7 0 0 1 0 12235 12919 6667 2  1.3 0 0|0|0 45 05:23:20
7 0 7 0 0 1 0 12235 12919 6666 2  1.6 0 0|0|0 45 05:23:21
6 0 6 0 0 1 0 12235 12919 6666 1  1.8 0 0|0|0 45 05:23:22
6 0 6 0 0 1 0 12235 12919 6666 0  0.2 0 0|0|0 45 05:23:23
8 0 8 0 0 1 0 12235 12919 6666 2  2.7 0 0|0|0 45 05:23:24
7 0 7 0 0 1 0 12235 12919 6666 3  2.7 0 0|0|0 45 05:23:25
3 0 3 0 0 1 0 12235 12919 6666 1  0.6 0 0|0|0 45 05:23:26
6 0 6 0 0 1 0 12235 12919 6666 3  3.1 0 0|0|0 45 05:23:27
7 0 7 0 0 1 0 12235 12919 6666 3  3.5 0 0|0|0 45 05:23:28
7 0 7 0 0 1 0 12235 12919 6666 2  4.5 0 0|0|0 45 05:23:29
8 0 8 0 0 1 0 12235 12919 6666 5  9.8 0 0|0|0 45 05:23:30
8 0 8 0 0 1 0 12235 12919 6666 0  0.4 0 0|0|0 45 05:23:31
7 0 7 0 0 1 0 12235 12919 6666 0  0.5 0 0|0|0 45 05:23:32
4 0 4 0 0 1 0 12235 12919 6666 4  6.3 0 0|0|0 45 05:23:33
5 0 5 0 0 1 0 12235 12919 6666 2 14.9 0 0|0|0 45 05:23:34
3 0 3 0 0 1 0 12235 12919 6666 0  0.1 0 0|0|0 45 05:23:35
6 0 6 0 0 1 0 12235 12919 6666 8  7.3 0 0|0|0 45 05:23:36
6 0 6 0 0 1 0 12235 12919 6666 1  2.0 0 0|0|0 45 05:23:37
5 0 5 0 0 1 0 12235 12919 6667 1  0.4 0 0|0|0 45 05:23:38
7 0 7 0 0 1 0 12235 12919 6667 2  5.8 0 0|0|0 45 05:23:39
iostat -tz -x 2
(avg-cpu columns: %user %nice %system %iowait %steal %idle; device columns: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util)

10/07/2010 05:23:07 AM
avg-cpu: 0.00 0.00 0.00 1.74 0.00 98.26
sda1 0.00 0.00 0.00 1.99 0.00 15.92 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 11.94 0.00 405.97 0.00 34.00 0.11 9.58 2.50 2.99

10/07/2010 05:23:09 AM
avg-cpu: 0.00 0.00 0.00 0.74 0.00 99.26
sdf 0.00 0.00 5.97 1.99 242.79 15.92 32.50 0.01 1.88 1.87 1.49

10/07/2010 05:23:11 AM
avg-cpu: 0.00 0.00 0.00 1.24 0.00 98.76
sda1 0.00 0.00 0.00 1.00 0.00 7.96 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 3.98 0.00 111.44 0.00 28.00 0.04 11.25 6.25 2.49

10/07/2010 05:23:13 AM
avg-cpu: 0.00 0.00 0.00 1.01 0.00 98.99
sda1 0.00 0.00 0.00 1.99 0.00 15.92 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 5.47 0.00 147.26 0.00 26.91 0.02 4.55 4.55 2.49

10/07/2010 05:23:15 AM
avg-cpu: 0.00 0.00 0.00 1.98 0.00 98.02
sdf 0.00 0.00 6.97 1.99 175.12 15.92 21.33 0.08 9.44 3.33 2.99

10/07/2010 05:23:17 AM
sdf 0.00 0.00 5.47 0.00 107.46 0.00 19.64 0.08 14.55 5.45 2.99

10/07/2010 05:23:19 AM
avg-cpu: 0.00 0.00 0.00 15.14 0.00 84.86
sda1 0.00 0.00 0.00 2.99 0.00 23.88 8.00 0.00 0.00 0.00 0.00
sdf 0.00 396.02 13.93 415.92 557.21 6495.52 16.41 32.15 74.80 0.75 32.34

10/07/2010 05:23:21 AM
avg-cpu: 0.00 0.00 0.26 0.78 0.00 98.96
sdf 0.00 0.00 8.46 0.00 222.89 0.00 26.35 0.03 3.53 1.18 1.00

10/07/2010 05:23:23 AM
avg-cpu: 0.00 0.00 0.00 0.56 0.00 99.44
sdf 0.00 0.00 1.99 0.00 99.50 0.00 50.00 0.04 20.00 5.00 1.00

10/07/2010 05:23:25 AM
avg-cpu: 0.00 0.00 0.00 1.25 0.00 98.75
sda1 0.00 0.00 0.00 2.99 0.00 23.88 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 8.96 1.99 433.83 15.92 41.09 0.06 5.45 1.82 1.99

10/07/2010 05:23:27 AM
avg-cpu: 0.00 0.00 0.00 0.75 0.00 99.25
sda1 0.00 0.00 0.00 1.49 0.00 11.94 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 7.96 0.00 234.83 0.00 29.50 0.04 5.62 2.50 1.99

10/07/2010 05:23:29 AM
avg-cpu: 0.00 0.00 0.00 2.00 0.00 98.00
sdf 0.00 0.00 7.96 1.99 302.49 15.92 32.00 0.12 12.50 4.50 4.48

10/07/2010 05:23:31 AM
avg-cpu: 0.00 0.00 0.00 2.49 0.00 97.51
sda1 0.00 0.00 0.00 1.99 0.00 15.92 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 8.96 0.00 401.99 0.00 44.89 0.20 22.22 5.56 4.98

10/07/2010 05:23:33 AM
avg-cpu: 0.00 0.00 0.00 9.75 0.00 90.25
sdf 0.00 431.00 2.00 357.50 16.00 5712.00 15.93 29.87 68.83 0.58 21.00

10/07/2010 05:23:35 AM
avg-cpu: 0.00 0.00 0.00 26.93 0.25 72.82
sdf 0.00 960.70 3.48 905.47 39.80 15526.37 17.13 95.35 110.51 0.62 56.72

10/07/2010 05:23:37 AM
avg-cpu: 0.00 0.00 0.00 1.99 0.00 98.01
sda1 0.00 0.00 0.00 2.99 0.00 23.88 8.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 8.96 0.00 214.93 0.00 24.00 0.11 12.78 4.44 3.98

10/07/2010 05:23:39 AM
avg-cpu: 0.00 0.00 0.25 3.00 0.00 96.75
sdf 0.00 61.19 1.49 76.12 39.80 1098.51 14.67 3.25 41.47 0.83 6.47
The occasional latency almost always happens on the inserts, not the upserts. I've seen a few people suggest that writing zeros across the entire EBS volume before first use can improve performance, but I haven't seen a definitive answer or solid numbers testing that theory.
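For anyone who wants to test that theory, the recipe I've seen suggested is a plain zero-fill of the raw device before creating the filesystem - destructive, so only on a brand-new volume (device name assumes the same attachment point as mine):

# WARNING: wipes the volume - only run against a fresh, unmounted EBS device
dd if=/dev/zero of=/dev/sdf bs=1M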