> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
If you need totally random access, the best thing to do is have enough
shards to keep the index almost entirely in ram.
So if you have 32gb of index, and 16gb machines, you would want 2-3 shards.
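That sizing arithmetic can be sketched roughly as follows (the 75% usable-RAM fraction is an assumption for illustration, not a MongoDB figure - each machine also needs memory for the OS, connections, and working data):

```python
import math

def shards_needed(index_gb, ram_per_machine_gb, usable_fraction=0.75):
    """Estimate how many shards are needed to keep the index resident in RAM.

    usable_fraction is a rough allowance for memory that the OS and the
    rest of the workload consume on each machine (assumed value).
    """
    usable_gb = ram_per_machine_gb * usable_fraction
    return math.ceil(index_gb / usable_gb)

# 32 GB of index on 16 GB machines -> 3 shards
print(shards_needed(32, 16))  # -> 3
```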
Why are you using string _id values that look like numbers? It would be
better to use actual numbers, as the index will be smaller, so more of
it will fit in RAM.
Different types still impact index size, though there are many other
factors, including density, etc...
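To get a sense of the difference, here is a rough sketch of the BSON element size for a numeric versus a string _id, per the BSON wire format (this counts document-element bytes only and ignores per-entry B-tree overhead, so treat the absolute numbers as illustrative):

```python
def bson_int64_element_size(key):
    # type byte + key name + NUL terminator + 8-byte integer
    return 1 + len(key) + 1 + 8

def bson_string_element_size(key, value):
    # type byte + key name + NUL + 4-byte length prefix + string bytes + NUL
    return 1 + len(key) + 1 + 4 + len(value) + 1

print(bson_int64_element_size("_id"))                # -> 13
print(bson_string_element_size("_id", "483894948"))  # -> 19
```

For a nine-digit key like the ones in the logs below, the string form costs roughly 50% more per element, and the gap grows with longer keys.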
(Looks like I sent this to the wrong post)
Sharding is a good long term strategy for any system where you have a
lot of data.
What's the read/write ratio on this system?
If it's mostly reads, one option is to have 2 servers and 2 shards:
each server would run a master, plus a slave for the other server.
The slave will mostly not be in memory, since the LRU cache is across
all data on the server.
This only works if it's mostly reads, though.
Also - for your 3 collections, these are the % differences and total
size disparities between the shards:
2.5% - 3gb
25% - 500mb
5% - 3gb
Those are right along the thresholds for when things will get balanced,
so I think that's in pretty good shape.
It's not efficient to keep things perfectly balanced, as a temporary
imbalance would then trigger lots of behind-the-scenes work.
Better to wait and see for a bit - and then fix.
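As an illustration, per-collection figures like the ones above might be computed along these lines (a sketch only - the balancer actually makes its decisions based on chunk counts, not raw byte sizes):

```python
def disparity(size_a_gb, size_b_gb):
    """Absolute size gap between two shards, and that gap as a
    percentage of the larger shard (hypothetical helper)."""
    gap = abs(size_a_gb - size_b_gb)
    pct = 100.0 * gap / max(size_a_gb, size_b_gb)
    return pct, gap

# e.g. two shards holding 3.5 GB and 3.0 GB of one collection
pct, gap = disparity(3.5, 3.0)
print(f"{pct:.1f}% - {gap}gb")
```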
I'll try to recreate the scenario with the same load. What would help
to debug - mongostat, anything else? Log files? More db stats?
As this will be done on the production environment, it's better to
gather as much info as I can from the test.
Thanks, ofer
High locked% and faults/s. This server also runs the config server and
a mongos, and has 32GB RAM.
This is from the shard2 server:
connected to: localhost:27018
insert/s query/s update/s delete/s getmore/s command/s flushes/s mapped vsize res faults/s locked % idx miss % q t|r|w conn time
      12       0       42        0         0         2         0 278345 282495 18114        7      9.2          0 0|0|0  467 00:46:09
      54       0        9        0         0         2         0 278345 282495 18114        1      1.7          0 0|0|0  467 00:46:10
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:11
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:12
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:13
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:14
       1       0       11        0         0         1         0 278345 282495 18114        5     10.1          0 0|0|0  467 00:46:15
       2       0        5        0         0         1         0 278345 282495 18114        2        3          0 0|0|0  467 00:46:16
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:17
       6       0        0        0         0         3         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:18
       4       0       19        0         0         2         0 278345 282495 18114        3      4.4          0 0|0|0  467 00:46:19
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:20
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:21
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:22
       0       0        0        0         0         1         0 278345 282495 18114        0        0          0 0|0|0  467 00:46:23
       0       0        9        0         0         1         0 278345 282495 18114        1      0.1          0 0|0|0  467 00:46:24
      37       0       23        0         0         1         0 278345 282495 18114       10     19.2          0 0|0|0  467 00:46:25
     202       0       17        0         0         2         0 278345 282495 18114        0      0.8          0 0|0|0  467 00:46:26
It looks like the second shard server is in rather better shape; there
are spikes of locks. This server runs only the shard server and mongos
processes, and has 32GB RAM as well.
Everything went well while we were only writing to this cluster; the
moment we started reading from it as well, the locks began.
We have reversed back to only reading, but the high lock% is still there.
Are there any documents that I'm missing? I couldn't find anything on
the site that could help me diagnose this better.
I do see in iostat that util goes up to 100% (but with no high disk
usage for some reason - no high writes/s or reads/s, or even read/write
MB/s). There's some iowait in vmstat (around 30% sometimes), but I
can't figure out why. More servers won't help much, I guess, as the
index is not yet large enough to exceed 32GB.
What more can I do to diagnose and understand why I have high lock%?
Adding more shard servers? More config servers?
We won't add any servers until we know that it will help
performance-wise; once we see that it works, we will start adding more
config servers and a replica set for each shard server, to keep things
running even if a server crashes.
Current Configuration:
shard00 server - 32GB, RAID10 - 4 disks (running config server on port
27019, shard server on port 27018, mongos on port 27017)
shard01 server - 32GB, RAID10 - 4 disks (running shard server on port
27018, mongos on port 27017)
Each application server is running an instance of mongos that points to
the config server on shard00 (port 27019).
Thanks!
I did notice that the disks were saturated (high util%); I'm trying to
understand how it went so hard on the disk if the index fits well in
memory.
I will turn on profiling; I just hope it won't decrease performance by
too much.
Thanks,
Erez.
We use _id which is indexed.
Fri Oct 1 13:18:27 [conn284] query db.collection2 ntoreturn:1 idhack
reslen:2367 bytes:2351 1076ms
Fri Oct 1 13:18:27 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 330ms
Fri Oct 1 13:18:32 [conn269] query db.collection2 ntoreturn:1 idhack
reslen:3100 bytes:3084 581ms
Fri Oct 1 13:18:32 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 487ms
Fri Oct 1 13:18:35 [conn360] update db.collection2 query: { _id:
483894948 } nscanned:1 moved 126ms
Fri Oct 1 13:18:35 [conn406] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 103ms
Fri Oct 1 13:18:37 [conn92] query db.collection2 ntoreturn:1 idhack
reslen:131 bytes:115 795ms
Fri Oct 1 13:18:42 [conn157] query db.collection2_interaction
ntoreturn:1 idhack reslen:140 bytes:124 614ms
Fri Oct 1 13:18:42 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 138ms
Fri Oct 1 13:18:42 [conn386] query db.collection2 ntoreturn:1 idhack
reslen:1326 bytes:1310 116ms
Fri Oct 1 13:18:43 [conn126] query db.collection2_interaction
ntoreturn:1 idhack reslen:140 bytes:124 589ms
Fri Oct 1 13:18:43 [conn84] query db.collection2 ntoreturn:1 idhack
reslen:62 bytes:46 273ms
Fri Oct 1 13:18:44 [conn291] query db.collection2 ntoreturn:1 idhack
reslen:2883 bytes:2867 394ms
Fri Oct 1 13:18:44 [conn299] query db.collection2 ntoreturn:1 idhack
reslen:168 bytes:152 262ms
Fri Oct 1 13:18:45 [conn287] query db.collection2 ntoreturn:1 idhack
reslen:619 bytes:603 224ms
Fri Oct 1 13:18:47 [conn147] query db.collection2_interaction
ntoreturn:1 idhack reslen:82 bytes:66 212ms
Fri Oct 1 13:18:47 [conn99] query db.collection2 ntoreturn:1 idhack
reslen:62 bytes:46 134ms
Fri Oct 1 13:18:51 [conn293] query db.collection2 ntoreturn:1 idhack
reslen:8370 bytes:8354 251ms
Fri Oct 1 13:18:52 [conn324] query db.collection3 ntoreturn:1 idhack
reslen:220 bytes:204 211ms
Fri Oct 1 13:18:52 [conn363] query db.collection3 ntoreturn:1 idhack
reslen:58 bytes:42 228ms
Fri Oct 1 13:18:58 [conn207] query db.collection3 ntoreturn:1 idhack
reslen:104 bytes:88 1283ms
Fri Oct 1 13:18:58 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 947ms
Fri Oct 1 13:18:59 [conn82] query db.collection2 ntoreturn:1 idhack
reslen:305 bytes:289 292ms
Fri Oct 1 13:18:59 [conn225] query db.collection2 ntoreturn:1 idhack
reslen:2643 bytes:2627 136ms
Fri Oct 1 13:19:00 [conn82] query db.collection2 ntoreturn:1 idhack
reslen:62 bytes:46 225ms
Fri Oct 1 13:19:00 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 176ms
Fri Oct 1 13:19:02 [conn157] query db.collection2 ntoreturn:1 idhack
reslen:3818 bytes:3802 251ms
Fri Oct 1 13:19:03 [conn159] query db.collection2 ntoreturn:1 idhack
reslen:1608 bytes:1592 907ms
Fri Oct 1 13:19:03 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 571ms
Fri Oct 1 13:19:07 [conn64] query db.collection2_interaction
ntoreturn:1 idhack reslen:1072 bytes:1056 208ms
Fri Oct 1 13:19:07 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 196ms
Fri Oct 1 13:19:08 [conn11] query db.collection2 ntoreturn:1 idhack
reslen:1863 bytes:1847 212ms
Fri Oct 1 13:19:08 [conn95] query db.collection2 ntoreturn:1 idhack
reslen:62 bytes:46 220ms
Fri Oct 1 13:19:10 [conn46] query db.collection2 ntoreturn:1 idhack
reslen:10490 bytes:10474 121ms
Fri Oct 1 13:19:10 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 112ms
Fri Oct 1 13:19:12 [conn369] update db.collection2 query: { _id:
463316827 } nscanned:1 moved 284ms
Fri Oct 1 13:19:12 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 143ms
Fri Oct 1 13:19:12 [conn226] query db.collection2 ntoreturn:1 idhack
reslen:1544 bytes:1528 237ms
Fri Oct 1 13:19:12 [conn231] query db.collection2_interaction
ntoreturn:1 idhack reslen:175 bytes:159 198ms
Fri Oct 1 13:19:13 [conn99] query db.collection3 ntoreturn:1 idhack
reslen:58 bytes:42 221ms
Fri Oct 1 13:19:13 [conn91] query db.collection2 ntoreturn:1 idhack
reslen:634 bytes:618 251ms
Fri Oct 1 13:19:13 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 216ms
Fri Oct 1 13:19:13 [conn369] update db.collection2 query: { _id:
468690787 } nscanned:1 moved 103ms
Fri Oct 1 13:19:13 [conn90] query db.collection2 ntoreturn:1 idhack
reslen:62 bytes:46 139ms
Fri Oct 1 13:19:15 [conn457] query admin.$cmd ntoreturn:1 command: {
serverStatus: 1 } reslen:901 680ms
Fri Oct 1 13:19:15 [conn214] query db.collection2 ntoreturn:1 idhack
reslen:382 bytes:366 1302ms
Fri Oct 1 13:19:18 [conn345] query db.collection3 ntoreturn:1 idhack
reslen:58 bytes:42 162ms
Fri Oct 1 13:19:18 [conn83] query db.collection2 ntoreturn:1 idhack
reslen:1854 bytes:1838 126ms
Fri Oct 1 13:19:19 [conn355] query admin.$cmd ntoreturn:1 command: {
features: 1 } reslen:85 112ms
Fri Oct 1 13:19:19 [conn454] query admin.$cmd ntoreturn:1 command: {
features: 1 } reslen:85 112ms
Fri Oct 1 13:19:19 [conn230] query db.collection2 ntoreturn:1 idhack
reslen:843 bytes:827 144ms
Fri Oct 1 13:19:19 [conn429] query admin.$cmd ntoreturn:1 command: {
features: 1 } reslen:85 105ms
Fri Oct 1 13:19:20 [conn390] query db.collection2 ntoreturn:1 idhack
reslen:4101 bytes:4085 243ms
lock% started to grow a little bit:
insert/s query/s update/s delete/s getmore/s command/s flushes/s mapped vsize res faults/s locked % idx miss % q t|r|w conn time
       8     767        7        0         0         1         0 278505 282514 18278       41      5.3          0 15|2|13     444 13:21:03
       0     199       11        0         0         1         0 278505 282514 18278       14     12.9          0 231|69|162  444 13:21:04
       2     121        7        0         0         2         0 278505 282514 18278       11      0.7          0 249|21|228  444 13:21:07
      24     867       19        0         0         2         0 278505 282514 18279       57      3.3          0 0|0|0       444 13:21:08
       7     715       20        0         0         4         0 278505 282514 18279       37      4.4          0 127|70|57   444 13:21:09
       1     217       10        0         0         1         0 278505 282514 18279       16        1          0 207|93|114  444 13:21:10
      24     672       19        0         0         1         0 278505 282514 18280       54      5.8          0 127|25|102  444 13:21:11
      29     858       17        0         0         1         0 278505 282514 18281       59      9.6          0 49|12|37    444 13:21:12
      27     679        9        0         0         1         0 278505 282514 18281       24      4.8          0 12|6|6      444 13:21:14
       7     448       24        0         0         1         0 278505 282514 18281       13      3.4          0 129|41|88   444 13:21:15
       5     231        6        0         0         1         0 278505 282514 18282       31      1.6          0 192|4|188   444 13:21:16
      25    1035       35        0         0         1         0 278505 282514 18282       47      7.7          0 172|11|161  444 13:21:17
      22     958       40        0         0         3         0 278505 282514 18283       53        8          0 68|56|12    444 13:21:18
       4     374       23        0         0         1         0 278505 282514 18283       16     34.4          0 228|170|58  444 13:21:19
The disks are not saturated as before, though; they are fine (util% is
around 50-70, and iowait is very low, if any at all).
It makes mongo hang a little bit sometimes, and it started from the
minute we included updates.
Thanks,
Erez.
Another thing that I have come across: perhaps try decreasing the
syncdelay to 30 instead of 60? Is it possible that the fsync interval
of 60 seconds is too high for our needs?
Thanks,
Erez.