fast put, super slow findOne (MongoCursorTimeoutException after 30000ms)


oferfort

unread,
Aug 27, 2010, 6:44:08 PM8/27/10
to mongodb-user
Moving on with our move from MySQL to Mongo, I am seeing the find
getting slower and slower.
All my inserts take less than a millisecond, but my findOne calls are
getting slower and slower.
Could this all be because of the size of the index?
I now have multiple machines inserting data and querying Mongo, and
the findOne takes longer and longer.
In my frontend (PHP) it times out after 30s (I guess that's the
default).
On my backend (Java) I see responses ranging from a few milliseconds to a
few MINUTES (300,000 ms).
My Mongo server is on a machine with 16GB RAM, and CPU usage seems pretty
low most of the time.

Those slow responses started as soon as I added more and more servers
inserting and querying, but still, 300,000ms for a query based on the
only index the table has seems strange.

Is there some configuration I can change so that it will be able to
handle many requests from different clients?

In my tests, when only a few clients were connected, the query rate
was much higher and the response time was very good.

Please help me, since this is a critical stage in our move, and I
didn't expect this behavior.

thanks,
ofer

Eliot Horowitz

unread,
Aug 27, 2010, 7:52:30 PM8/27/10
to mongod...@googlegroups.com
What's your index-to-RAM ratio? Also, while the performance is bad, can you run "iostat -x 2"?

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>

Ofer Fort

unread,
Aug 27, 2010, 8:03:05 PM8/27/10
to mongod...@googlegroups.com
I have 16GB RAM and about 32GB of index. I understand why it's not all super fast, since half the index is not in RAM, but a few minutes for a get?
When I was inserting and querying from a single client at a much higher rate I had no problem, a few millis at most; the problem started when I started running more and more clients.
Attached is the iostat file, thanks.



> db.collection1.stats()
{
        "ns" : "db1.collection1",
        "count" : 396809500,
        "size" : 199473952388,
        "avgObjSize" : 502.69449795934827,
        "storageSize" : 207042362624,
        "numExtents" : 134,
        "nindexes" : 1,
        "lastExtentSize" : 1991168256,
        "paddingFactor" : 1,
        "flags" : 1,
        "totalIndexSize" : 15569675984,
        "indexSizes" : {
                "_id_" : 15569675984
        },
        "ok" : 1
}
> db.collection2.stats()
{
        "ns" : "db1.collection2",
        "count" : 417544667,
        "size" : 115451087512,
        "avgObjSize" : 276.4999690727699,
        "storageSize" : 123413295872,
        "numExtents" : 92,
        "nindexes" : 1,
        "lastExtentSize" : 1991168256,
        "paddingFactor" : 1,
        "flags" : 1,
        "totalIndexSize" : 16444851920,
        "indexSizes" : {
                "_id_" : 16444851920
        },
        "ok" : 1
}
> db.collection3.stats()
{
        "ns" : "db1.collection3",
        "count" : 136453683,
        "size" : 4618535340,
        "avgObjSize" : 33.84690862466497,
        "storageSize" : 6874642432,
        "numExtents" : 33,
        "nindexes" : 1,
        "lastExtentSize" : 1152296704,
        "paddingFactor" : 1,
        "flags" : 1,
        "totalIndexSize" : 5270865584,
        "indexSizes" : {
                "_id_" : 5270865584
        },
        "ok" : 1
iostat.txt

Eliot Horowitz

unread,
Aug 27, 2010, 8:11:09 PM8/27/10
to mongod...@googlegroups.com
Overall your disk is totally saturated.
Can you send the log for the few-minute query?
Do you need truly random reads? If so, the two options are adding RAM or sharding.

Ofer Fort

unread,
Aug 27, 2010, 8:23:16 PM8/27/10
to mongod...@googlegroups.com
My reads are rather random. What would be a good sharding architecture? How many servers with how much RAM?

But does this explain a few-minutes query?
Thanks
log.txt

Eliot Horowitz

unread,
Aug 28, 2010, 2:18:39 AM8/28/10
to mongod...@googlegroups.com
The slowest one I see in the log is 634ms, which can happen if the
disk is totally saturated.

If you need totally random access, the best thing to do is have enough
shards to keep the index almost entirely in RAM.

So if you have 32GB of index and 16GB machines, you would want 2-3 shards.
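Eliot's sizing rule can be sketched as a quick back-of-the-envelope calculation. This is only an illustrative sketch: the `headroom` factor (RAM reserved for data and OS cache rather than the index) is my assumption, not a figure from this thread or MongoDB's documentation.

```python
import math

def shards_needed(index_gb: float, ram_per_machine_gb: float,
                  headroom: float = 0.7) -> int:
    """Rough shard count so the index fits (mostly) in RAM across shards.

    headroom: assumed fraction of each machine's RAM usable for the index;
    the rest is left for the working set and OS cache. Illustrative only.
    """
    usable = ram_per_machine_gb * headroom
    return max(1, math.ceil(index_gb / usable))

# Ofer's case: 32GB of index on 16GB machines
print(shards_needed(32, 16))       # 3 with 70% headroom
print(shards_needed(32, 16, 1.0))  # 2 if the full 16GB were usable
```

With any reasonable headroom this lands in Eliot's 2-3 shard range.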

Ofer Fort

unread,
Aug 28, 2010, 2:47:05 AM8/28/10
to mongod...@googlegroups.com
From the client side, the time it takes to get a reply is much more than 600ms; during the long ones it gets to 30,000ms, and I even saw it once at 900,000ms.
Could it be something with the client library? (I'm using the Java 2.0 jar), and I see the same behavior from the PHP client.

Now I've changed my code so I'm only inserting from the Java backend. On the client side it returns very fast, less than 1ms, but in the mongod log I see very long times:
Fri Aug 27 23:41:40 [conn9874] update content_cache.object  query: { _id: "406993891" } 131ms
Fri Aug 27 23:41:40 [conn11479] update content_cache.object  query: { _id: "176360685" } 9403ms
Fri Aug 27 23:41:40 [conn9612] update content_cache.owner  query: { _id: "11476787" } 9542ms
Fri Aug 27 23:41:41 [conn11572] update content_cache.object  query: { _id: "399658804" } 9289ms
Fri Aug 27 23:41:41 [conn11051] update content_cache.object  query: { _id: "419258889" } 73316ms
Fri Aug 27 23:41:41 [conn10111] update content_cache.owner  query: { _id: "36545786" } 44616ms
Fri Aug 27 23:41:41 [conn9278] update content_cache.object  query: { _id: "131120004" } 6070ms
Fri Aug 27 23:41:42 [conn9158] update content_cache.object  query: { _id: "421771801" } 5892ms
Fri Aug 27 23:41:42 [conn11397] update content_cache.object  query: { _id: "421162026" } 42324ms
Fri Aug 27 23:41:42 [conn9916] update content_cache.owner  query: { _id: "6102139" } 20589ms
Fri Aug 27 23:41:42 [conn9095] update content_cache.object  query: { _id: "401307292" } 63114ms
Fri Aug 27 23:41:42 [conn11519] update content_cache.object  query: { _id: "408067654" } 4863ms
Fri Aug 27 23:41:42 [conn9688] update content_cache.owner  query: { _id: "170701" } 20189ms
Fri Aug 27 23:41:42 [conn10168] update content_cache.owner  query: { _id: "493240" } 62855ms
Fri Aug 27 23:41:42 [conn9947] update content_cache.owner  query: { _id: "18624614" } 4726ms
Fri Aug 27 23:41:42 [conn10763] update content_cache.object  query: { _id: "401447851" } 28651ms
Fri Aug 27 23:41:42 [conn10875] update content_cache.object  query: { _id: "347403105" } 3956ms
Fri Aug 27 23:41:43 [conn9767] update content_cache.object  query: { _id: "420110281" } 26706ms
Fri Aug 27 23:41:43 [conn10118] update content_cache.owner  query: { _id: "187677" } 14297ms
Fri Aug 27 23:41:43 [conn11952] update content_cache.owner  query: { _id: "122911794" } 26420ms
Fri Aug 27 23:41:43 [conn10066] update content_cache.owner  query: { _id: "4006151" } 3422ms
Fri Aug 27 23:41:44 [conn10872] update content_cache.object  query: { _id: "170544967" } 103ms
Fri Aug 27 23:41:45 [conn10441] update content_cache.object  query: { _id: "400291482" } 37664ms
Fri Aug 27 23:41:45 [conn9676] update content_cache.owner  query: { _id: "115842420" } 25181ms
Fri Aug 27 23:41:45 [conn10836] update content_cache.object  query: { _id: "406181682" } 24635ms
Fri Aug 27 23:41:45 [conn11464] update content_cache.object  query: { _id: "354031534" } 10342ms
Fri Aug 27 23:41:45 [conn9198] update content_cache.object  query: { _id: "391526194" } 9920ms

I guess that's because the client doesn't wait for the update to finish?
And I still get the same times from the front end...

Eliot Horowitz

unread,
Aug 28, 2010, 2:50:13 AM8/28/10
to mongod...@googlegroups.com
Your disk is totally saturated, so everything can be slow.

Why are you using string _id values that look like numbers? It would be
better to use actual numbers, as the index will be smaller, so more of it
will fit in RAM.

Ofer Fort

unread,
Aug 28, 2010, 2:52:57 AM8/28/10
to mongod...@googlegroups.com
I didn't mean to do that; I just set the _id for each new object I insert to be the same id as in the rest of the system.
How can I make it an int instead of a string? And how much will it reduce the index?

thanks

Ofer Fort

unread,
Aug 28, 2010, 2:56:33 AM8/28/10
to mongod...@googlegroups.com
I now see that in my code I convert the int value to a String, what a waste...
So how much will it reduce the index size? And can I do an update on the db to change all the ids from string to int?

Again, thanks a lot, we are in the midst of the transfer and I really appreciate your input.
ofer

Ankur

unread,
Aug 28, 2010, 8:07:59 AM8/28/10
to mongodb-user
You can't update the '_id' field, so you will be forced to
create a new collection, either by copying from your old Mongo collection
or, if you still have the source data, by recreating it from scratch. But that
would definitely help your index size tremendously. Maybe you can
post the index sizes after you are done, I am curious.
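A minimal sketch of the conversion step for such a copy. The pure function below is the only tested part; the surrounding read-transform-insert loop is shown as a comment with placeholder collection names, since the exact driver calls depend on your setup.

```python
def with_int_id(doc: dict) -> dict:
    """Return a copy of doc whose string _id is converted to an int.

    Meant for a read-transform-insert loop while copying into a new
    collection (you cannot change _id in place).
    """
    new_doc = dict(doc)
    new_doc["_id"] = int(doc["_id"])
    return new_doc

# The copy loop itself would look roughly like this with a driver
# (collection names are placeholders, not from the thread):
#
#   for doc in db.old_collection.find():
#       db.new_collection.insert(with_int_id(doc))

print(with_int_id({"_id": "406993891", "data": "x"}))
```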

Ankur


Kristina Chodorow

unread,
Aug 28, 2010, 9:02:15 AM8/28/10
to mongod...@googlegroups.com
String size is the length of the string + 5, so "12345" would take 10 bytes of storage.
Int size is 4 bytes (or 8, if you use 64-bit ints).
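Kristina's rule applied to the ids in this thread (a small sketch; the sample id "406993891" is taken from Ofer's log paste above):

```python
def string_key_bytes(s: str) -> int:
    # BSON string value: 4-byte length prefix + UTF-8 bytes + trailing NUL
    # = len + 5, per Kristina's rule above
    return len(s) + 5

INT32_BYTES = 4
INT64_BYTES = 8

print(string_key_bytes("12345"))      # 10, matching Kristina's example
print(string_key_bytes("406993891"))  # 14 for one of Ofer's 9-digit ids
print(INT64_BYTES)                    # 8 if the same id is stored as a long
```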


Ofer Fort

unread,
Aug 28, 2010, 9:04:24 AM8/28/10
to mongod...@googlegroups.com
Is there an estimate for the memory size gain? I read in one of the
posts that it's 10%, and if that's the case, I don't think I'll go
through with it, as I have about 1B records, and importing them again
will take a long time.
Right now my index is 32GB on a 16GB machine, so 10% would still force
me to use shards or increase memory.

Ofer Fort

unread,
Aug 28, 2010, 9:20:18 AM8/28/10
to mongod...@googlegroups.com
So if my numbers are between 100M and 1B, that would be 15 bytes, and
the int would be 8 (64-bit).
That is actually a big gain, it might even postpone my move to shards...

Ofer Fort

unread,
Aug 28, 2010, 9:52:04 AM8/28/10
to mongod...@googlegroups.com
If my numbers are actually Long and not Integer, does it matter or
is it still 8 bytes?

Eliot Horowitz

unread,
Aug 28, 2010, 12:05:12 PM8/28/10
to mongod...@googlegroups.com
If it's a long, it's 8 bytes each, so not as much of a saving, but
probably still some.
Are the 2 collections totally different data, or do they have the same
ids in them?

Ankur

unread,
Aug 28, 2010, 12:19:55 PM8/28/10
to mongodb-user
Is there more to the index than just the key storage size, though? Let me
paste the relevant info from our collection stats:
"count" : 15438917,
"size" : 10527137432,
"avgObjSize" : 681.8572463340531,
"storageSize" : 12163553536,
"nindexes" : 9,
"totalIndexSize" : 6493766080,
"indexSizes" : {
"_id_" : 576054208,
... other indexes
}

{ "_id" : NumberLong( 5 ) }
{ "_id" : NumberLong( 51 ) }
{ "_id" : NumberLong( 73 ) }

So just the _id index is 576MB for 15 million longs.
576054208 bytes divided by 15438917 records is about 37 bytes per record.


Taking Ofer's first post:

db.collection1: 39 bytes per record
db.collection2: 39 bytes per record
db.collection3: 38 bytes per record

So is there no benefit to using long instead of string for an index?
Or is there some other factor in the lookup?
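Ankur's per-entry arithmetic can be reproduced with a quick script. All the input numbers come from the stats pastes earlier in this thread (Ankur's long-keyed _id index, then Ofer's three string-keyed collections); the takeaway is that B-tree per-entry overhead dominates, so the key type changes the total less than the raw key sizes suggest.

```python
def index_bytes_per_entry(total_index_size: int, count: int) -> int:
    """Whole bytes of index per entry (truncated, as in Ankur's figures)."""
    return total_index_size // count

# Ankur's _id index over longs:
print(index_bytes_per_entry(576054208, 15438917))    # 37

# Ofer's three collections, _id stored as strings:
print(index_bytes_per_entry(15569675984, 396809500))  # 39
print(index_bytes_per_entry(16444851920, 417544667))  # 39
print(index_bytes_per_entry(5270865584, 136453683))   # 38
```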

Ankur


Eliot Horowitz

unread,
Aug 28, 2010, 12:35:18 PM8/28/10
to mongod...@googlegroups.com
Yes - there is overhead, etc.

Different types still impact index size though, but there are many
factors, including density, etc...

(Looks like I sent this to the wrong post)

oferfort

unread,
Aug 28, 2010, 1:07:19 PM8/28/10
to mongodb-user
The collections are totally different, but the id numbers are probably
similar, as they represent two auto-incremented tables that started at
the same number and are growing at roughly the same rate.


Eliot Horowitz

unread,
Aug 28, 2010, 1:11:25 PM8/28/10
to mongod...@googlegroups.com
I think your best bet is to look into sharding or getting more RAM in that box.

Sharding is a good long-term strategy for any system where you have a
lot of data.

What's the read/write ratio on this system?

If it's mostly reads, one option is to have 2 servers and 2 shards: each
server would run a master for its own shard and a slave for the other
server's shard.
The slave will mostly not be in memory, since the LRU is across all data
on the server.
This only works if it's mostly reads, though.

Ofer Fort

unread,
Aug 28, 2010, 2:11:00 PM8/28/10
to mongod...@googlegroups.com
It's actually mostly writes, from a backend server, around 5-10M
records a day, and that will increase once the servers are online.
The reads are much less frequent but need to be very fast, as they come
from a web server.

Eliot Horowitz

unread,
Aug 28, 2010, 2:27:39 PM8/28/10
to mongod...@googlegroups.com
Ok - then sharding is the way to go.

oferfort

unread,
Aug 28, 2010, 3:00:23 PM8/28/10
to mongodb-user
OK, I'll set up a second server and probably import all the data
again, this time with the id as a long.

Great, thanks for the info.


oferfort

unread,
Sep 16, 2010, 4:18:33 AM9/16/10
to mongodb-user
OK guys, I've set up a second shard, imported all my data into it, and
changed the id to be a long instead of a string, but the index size is
actually bigger than what it was on one shard, and it doesn't seem to
be evenly distributed between the shards. Am I doing anything wrong?

> use storage
switched to db storage
> db.collection1.stats()
{
        "sharded" : true,
        "ns" : "storage.collection1",
        "count" : 436655414,
        "size" : 238318515940,
        "avgObjSize" : 545.7816582574194,
        "storageSize" : 254791264768,
        "nindexes" : 1,
        "nchunks" : 896,
        "shards" : {
                "shard0000" : {
                        "ns" : "storage.collection1",
                        "count" : 158106430,
                        "size" : 120734713280,
                        "avgObjSize" : 763.6293683944416,
                        "storageSize" : 131377968896,
                        "numExtents" : 96,
                        "nindexes" : 1,
                        "lastExtentSize" : 1991168256,
                        "paddingFactor" : 1.0099999999453324,
                        "flags" : 1,
                        "totalIndexSize" : 5989311216,
                        "indexSizes" : {
                                "_id_" : 5989311216
                        },
                        "ok" : 1
                },
                "shard0001" : {
                        "ns" : "storage.collection1",
                        "count" : 278548984,
                        "size" : 117583802660,
                        "avgObjSize" : 422.12971295562147,
                        "storageSize" : 123413295872,
                        "numExtents" : 92,
                        "nindexes" : 1,
                        "lastExtentSize" : 1991168256,
                        "paddingFactor" : 1.009999999999807,
                        "flags" : 1,
                        "totalIndexSize" : 13123552976,
                        "indexSizes" : {
                                "_id_" : 13123552976
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
> db.collection2.stats()
{
        "sharded" : true,
        "ns" : "storage.collection2",
        "count" : 148778689,
        "size" : 4557257264,
        "avgObjSize" : 30.631115885152074,
        "storageSize" : 7256946176,
        "nindexes" : 1,
        "nchunks" : 35,
        "shards" : {
                "shard0000" : {
                        "ns" : "storage.collection2",
                        "count" : 81371890,
                        "size" : 2543001104,
                        "avgObjSize" : 31.25159196867616,
                        "storageSize" : 3961892352,
                        "numExtents" : 30,
                        "nindexes" : 1,
                        "lastExtentSize" : 666838528,
                        "paddingFactor" : 1.0099999999946485,
                        "flags" : 1,
                        "totalIndexSize" : 3054814144,
                        "indexSizes" : {
                                "_id_" : 3054814144
                        },
                        "ok" : 1
                },
                "shard0001" : {
                        "ns" : "storage.collection2",
                        "count" : 67406799,
                        "size" : 2014256160,
                        "avgObjSize" : 29.882091864353328,
                        "storageSize" : 3295053824,
                        "numExtents" : 29,
                        "nindexes" : 1,
                        "lastExtentSize" : 555698944,
                        "paddingFactor" : 1.5999999999994619,
                        "flags" : 1,
                        "totalIndexSize" : 2551473088,
                        "indexSizes" : {
                                "_id_" : 2551473088
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
> db.collection3.stats()
{
        "sharded" : true,
        "ns" : "storage.collection3",
        "count" : 443557862,
        "size" : 120656719360,
        "avgObjSize" : 272.02024740573756,
        "storageSize" : 131338832896,
        "nindexes" : 1,
        "nchunks" : 598,
        "shards" : {
                "shard0000" : {
                        "ns" : "storage.collection3",
                        "count" : 218183582,
                        "size" : 58469078292,
                        "avgObjSize" : 267.9811091010505,
                        "storageSize" : 63678248192,
                        "numExtents" : 62,
                        "nindexes" : 1,
                        "lastExtentSize" : 1991168256,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 8132306640,
                        "indexSizes" : {
                                "_id_" : 8132306640
                        },
                        "ok" : 1
                },
                "shard0001" : {
                        "ns" : "storage.collection3",
                        "count" : 225374280,
                        "size" : 62187641068,
                        "avgObjSize" : 275.9305146443507,
                        "storageSize" : 67660584704,
                        "numExtents" : 64,
                        "nindexes" : 1,
                        "lastExtentSize" : 1991168256,
                        "paddingFactor" : 1,
                        "flags" : 1,
                        "totalIndexSize" : 8453482192,
                        "indexSizes" : {
                                "_id_" : 8453482192
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
>

thanks for any help

Eliot Horowitz

unread,
Sep 16, 2010, 9:51:23 AM9/16/10
to mongod...@googlegroups.com
Balancing happens over time, and there are thresholds for when to
trigger balancing.

Also, for your 3 collections, these are the % differences and total
size disparities between the shards:

2.5% - 3GB
25% - 500MB
5% - 3GB

Those are right around the thresholds for when things get balanced,
so I think that's in pretty good shape.

It's not efficient to keep things perfectly balanced, as a temporary
imbalance would then trigger lots of behind-the-scenes work.
Better to wait and see for a bit, and then fix.
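The disparity figures above can be recomputed from the stats paste. This sketch uses collection2's _id index sizes from Ofer's numbers; how Eliot rounded to his exact percentages is my assumption, so the result here is in the same ballpark rather than identical.

```python
def shard_disparity(size_a: int, size_b: int):
    """Absolute and relative index-size gap between two shards.

    The relative figure is the gap as a fraction of the larger shard,
    which is an assumption about how the thread's percentages were derived.
    """
    gap = abs(size_a - size_b)
    return gap, gap / max(size_a, size_b)

# collection2 from Ofer's stats: _id index of 3054814144 vs 2551473088 bytes
gap, pct = shard_disparity(3054814144, 2551473088)
print(f"{gap / 1e9:.1f}GB, {pct:.0%}")  # → 0.5GB, 16%
```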

Ofer Fort

unread,
Sep 16, 2010, 11:30:57 AM9/16/10
to mongod...@googlegroups.com
OK, I'll wait for it to happen.

Any suggestions about the amount of memory I need for those processes?

Eliot Horowitz

unread,
Sep 16, 2010, 10:29:00 PM9/16/10
to mongod...@googlegroups.com
Migrations shouldn't use much memory.
It's fixed at about 200MB.

Ofer Fort

unread,
Sep 17, 2010, 5:55:07 AM9/17/10
to mongod...@googlegroups.com
Since my index now doesn't fit in memory, I wonder what would be better: add another server, or add more memory to the current machine.
Any recommendations?

thanks

Eliot Horowitz

unread,
Sep 17, 2010, 10:38:02 AM9/17/10
to mongod...@googlegroups.com
There are pros and cons to each, of course.
More servers gets you more CPU, but more complexity.
Either should work, though.

Ofer Fort

unread,
Sep 30, 2010, 9:13:59 PM9/30/10
to mongod...@googlegroups.com
Hey guys, it seems like we hit another bump with our mysql->mongo transfer.
As long as we were only writing to MongoDB, everything was very smooth and fast.
Once we started reading from Mongo, everything got very, very slow.
My system has 2 shards (32GB RAM each), 1 config server, and 10 mongos instances (one for each server that connects to the shards).
I have 3 collections; the total index is about 40GB, so I should be OK.

My system has multiple Java processes, each running multiple threads reading and writing to Mongo.

We are getting new messages in mongos that I haven't seen before:
Thu Sep 30 17:03:16 [conn2262] ns: storage.owner ClusteredCursor::query ShardConnection had to change attempt: 0
Thu Sep 30 17:03:16 [conn2252] ns: storage.owner ClusteredCursor::query ShardConnection had to change attempt: 0
Thu Sep 30 17:03:16 [conn2251] ns: storage.owner ClusteredCursor::query ShardConnection had to change attempt: 0
Thu Sep 30 17:03:16 [conn2258] ns: storage.object ClusteredCursor::query ShardConnection had to change attempt: 0
Thu Sep 30 17:03:17 [conn2255] ns: storage.object ClusteredCursor::query ShardConnection had to change attempt: 0
Thu Sep 30 17:03:18 [conn2263] ns: storage.owner ClusteredCursor::query ShardConnection had to change attempt: 0

and a lot of locks in the config:
Thu Sep 30 16:29:12 [conn4] update config.locks  query: { _id: "balancer" } byid  1702ms
Thu Sep 30 16:32:08 [conn54] update config.locks  query: { _id: "balancer", state: 0, ts: ObjectId('4ca51e330c1a7f7b631afe65') } 1035ms
Thu Sep 30 16:34:09 [conn48] update config.mongos  query: { _id: "a.b.c.d:27017" } 644ms
Thu Sep 30 16:45:08 [conn31] update config.lockpings  query: { _id: "a.b.c.d:1285346887:1804289383" } 103ms
Thu Sep 30 16:48:09 [conn32] update config.lockpings  query: { _id: "a.b.c.d:1285346887:1804289383" } 1209ms
Thu Sep 30 16:53:20 [conn93] update config.lockpings  query: { _id: "a.b.c.d:1285346869:1804289383" } 3865ms
Thu Sep 30 16:55:12 [conn4] update config.locks  query: { _id: "balancer" } byid  2763ms
Thu Sep 30 17:40:24 [conn4] update config.mongos  query: { _id: "a.b.c.d:27017" } 430910ms
Thu Sep 30 17:48:43 [conn45] update config.mongos  query: { _id: "a.b.c.d:27017" } 111ms


The system was very slow, with timeouts of 120s. I tried to log in to one of my shards and it took a few minutes; I guess CPU was high?

these are my db stats:
> db.stats()
{
        "raw" : {
                "ch-mdb1.tra.cx:27018" : {
                        "collections" : 5,
                        "objects" : 492241465,
                        "avgObjSize" : 414.1699891292173,
                        "dataSize" : 203871642208,
                        "storageSize" : 223712341760,
                        "numExtents" : 203,
                        "indexes" : 3,
                        "indexSize" : 19206190160,
                        "fileSize" : 291849109504,
                        "ok" : 1
                },
                "ch-mdb2.tra.cx:27018" : {
                        "collections" : 5,
                        "objects" : 598202525,
                        "avgObjSize" : 352.7227614293337,
                        "dataSize" : 210999646512,
                        "storageSize" : 224236464384,
                        "numExtents" : 202,
                        "indexes" : 3,
                        "indexSize" : 25628732832,
                        "fileSize" : 293995544576,
                        "ok" : 1
                }
        },
        "objects" : 1090443990,
        "avgObjSize" : 380.4608879727972,
        "dataSize" : 414871288720,
        "storageSize" : 447948806144,
        "numExtents" : 405,
        "indexes" : 6,
        "indexSize" : 44834922992,
        "fileSize" : 585844654080,
        "ok" : 1
}

This is really a showstopper for us, and I'd appreciate any help I can get. It's been a long process to move to Mongo, and this is now at a critical stage.

thanks

Eliot Horowitz

unread,
Sep 30, 2010, 9:26:33 PM9/30/10
to mongod...@googlegroups.com
What version is this?
Can you run mongostat on the config server?
Also, you should really have 3 config servers, not 1.

Ofer Fort

unread,
Sep 30, 2010, 9:40:24 PM9/30/10
to mongod...@googlegroups.com
Thanks for the swift reply.
We're using 1.6.3.
Why 3 config servers? Is most of the load on the config servers? What kind of hardware is best for them? (Is it memory- or CPU-demanding?)
This is mongostat, but I've changed my system again to only write to Mongo and not read, as this is the production environment.

insert/s query/s update/s delete/s getmore/s command/s mapped  vsize    res faults/s locked % idx miss %  conn       time
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:46
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:47
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:48
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:49
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:50
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:51
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:52
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:53
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:54
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:55
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:56
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:57
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:58
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:36:59
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:00
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:01
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:02
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:03
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:04
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:05
insert/s query/s update/s delete/s getmore/s command/s mapped  vsize    res faults/s locked % idx miss %  conn       time
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:06
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:07
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:08
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:09
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:10
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:11
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:12
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:13
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:14
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:15
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:16
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:17
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:18
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:19
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:20
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:21
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:22
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:23
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:24
       0       0        0        0         0         1      0    375     37        0        0          0     1   18:37:25

Eliot Horowitz

Sep 30, 2010, 11:43:04 PM
to mongod...@googlegroups.com
You need 3 for redundancy.
If you have one and it goes down, it can bring the whole cluster down.
It's hard to diagnose what's going on with no load.

Ofer Fort

Oct 1, 2010, 2:53:25 AM
to mongod...@googlegroups.com
But performance-wise, will it change anything?

I'll try to recreate a scenario with the same load. What would help
to debug? mongostat, anything else? Log files? More db stats?
As this will be done on the production env, it's better to gather as
much info as I can from the test.

Thanks, ofer

Erez Zarum

Oct 1, 2010, 3:50:47 AM
to mongod...@googlegroups.com
Hey Eliot,
I am working with Ofer. This is the mongostat output from the shard1 server:
connected to: localhost:27018
insert/s query/s update/s delete/s getmore/s command/s flushes/s mapped vsize res faults/s locked % idx miss % q t|r|w conn time
64 0 149 0 0 10 0 276378 280607 19035 43 88.4 0 269|7|262 462 00:43:13
29 0 44 0 0 2 0 276378 280607 19035 7 57.5 0 261|0|261 462 00:43:19
33 0 46 0 0 2 0 276378 280607 19036 14 105 0 261|0|261 462 00:43:20
36 0 145 0 0 1 0 276378 280607 19036 30 139 0 261|0|261 462 00:43:22
41 0 93 0 0 70 0 276378 280607 19037 24 65.7 0 261|0|261 462 00:43:23

high locked% and faults/s. This server also runs the config server; it
has 32GB RAM, and there's a mongos on it as well.

This is from the shard2 server:
connected to: localhost:27018
insert/s query/s update/s delete/s getmore/s command/s flushes/s mapped vsize res faults/s locked % idx miss % q t|r|w conn time
12 0 42 0 0 2 0 278345 282495 18114 7 9.2 0 0|0|0 467 00:46:09
54 0 9 0 0 2 0 278345 282495 18114 1 1.7 0 0|0|0 467 00:46:10
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:11
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:12
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:13
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:14
1 0 11 0 0 1 0 278345 282495 18114 5 10.1 0 0|0|0 467 00:46:15
2 0 5 0 0 1 0 278345 282495 18114 2 3 0 0|0|0 467 00:46:16
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:17
6 0 0 0 0 3 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:18
4 0 19 0 0 2 0 278345 282495 18114 3 4.4 0 0|0|0 467 00:46:19
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:20
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:21
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:22
0 0 0 0 0 1 0 278345 282495 18114 0 0 0 0|0|0 467 00:46:23
0 0 9 0 0 1 0 278345 282495 18114 1 0.1 0 0|0|0 467 00:46:24
37 0 23 0 0 1 0 278345 282495 18114 10 19.2 0 0|0|0 467 00:46:25
202 0 17 0 0 2 0 278345 282495 18114 0 0.8 0 0|0|0 467 00:46:26

It looks like the second shard server is in much better shape; there
are spikes of locks. This server runs only a shard server and a mongos
process, and has 32GB RAM as well.

Everything went well while we were only writing to this cluster; the
moment we started reading from it as well, the locks began.

We have reverted back to only reading, but the high lock% is still there.

Eliot Horowitz

Oct 1, 2010, 10:38:56 AM
to mongod...@googlegroups.com
Looks like things are fairly saturated.
You might want to look at why things are so busy without the reads:
indexes, etc.
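One generic way to check index sizes against RAM is the `dbStats` and collection-stats commands. The sketch below uses the same 2.x-era Java driver style as the rest of the thread; the host, port, and database/collection names are placeholders, not taken from this cluster:

```java
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.Mongo;

public class IndexSizeCheck {
    public static void main(String[] args) throws Exception {
        // Connect directly to a shard's mongod (hypothetical host/port).
        Mongo mongo = new Mongo("localhost", 27018);
        DB db = mongo.getDB("db");

        // Total index size for the database, in bytes:
        CommandResult dbStats = db.command("dbStats");
        System.out.println("indexSize: " + dbStats.get("indexSize"));

        // Per-collection breakdown:
        CommandResult collStats = db.getCollection("collection2").getStats();
        System.out.println("totalIndexSize: " + collStats.get("totalIndexSize"));

        mongo.close();
    }
}
```

With memory-mapped storage, indexes that don't fit in RAM cause page faults on random access, which shows up as faults/s in mongostat.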

Erez Zarum

Oct 1, 2010, 11:22:11 AM
to mongod...@googlegroups.com
I'm trying to understand where to start. As soon as we start updating
we get a high lock%, and I have no clue what's wrong or where to start
looking.
The index size on that shard server is around 19GB and we have 32GB RAM
on the server, so it should be fine.
It uses more resident memory than the index for some reason, around 7% more.

Are there any documents I'm missing? I couldn't find anything on the
site that could help me diagnose this better.

I do see in iostat that util goes up to 100% (but, strangely, no high
disk usage: no high writes/s or reads/s, or even read/write MB/s).
There's some iowait in vmstat (around 30% sometimes), but I can't
figure out why. More servers won't help much, I guess, as the index is
not yet large enough to exceed 32GB.

What more can I do to diagnose and understand why I have a high lock%?

Adding more shard servers? More config servers?
We won't add any servers until we know it will help performance-wise;
once we see it works, we will start adding more config servers and a
replica set for each shard server so it keeps running even if a server
crashes.

Current Configuration:
shard00 server - 32GB, RAID10 - 4 disks (running config server on port
27019, shard server on port 27018, mongos on port 27017)
shard01 server - 32GB, RAID10 - 4 disks (running shard server on port
27018, mongos on port 27017)
each application server runs an instance of mongos that points to the
config server on shard00 (port 27019).

Thanks!

Erez Zarum

Oct 1, 2010, 12:08:19 PM
to mongod...@googlegroups.com
I also have to add that we are querying by _id, and _id is indexed as well.

Eliot Horowitz

Oct 1, 2010, 1:46:35 PM
to mongod...@googlegroups.com
Can you turn on profiling on the shards?
Also - any slow ops in the log?
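For reference, profiling can be switched on from the Java driver via the `profile` database command. This is a generic sketch, not something specific to this cluster; the host, port, database name, and `slowms` threshold are placeholders, and it assumes the 2.x-era driver used elsewhere in this thread:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.Mongo;

public class EnableProfiling {
    public static void main(String[] args) throws Exception {
        // Connect directly to the shard's mongod, not the mongos.
        Mongo mongo = new Mongo("localhost", 27018);
        DB db = mongo.getDB("db");

        // Level 1 records operations slower than slowms; level 2 records all.
        CommandResult result = db.command(
                new BasicDBObject("profile", 1).append("slowms", 100));
        System.out.println(result);

        // Slow operations then land in the system.profile collection:
        System.out.println(db.getCollection("system.profile")
                .find().sort(new BasicDBObject("$natural", -1))
                .limit(5).toArray());

        mongo.close();
    }
}
```

Level 1 with a sensible threshold usually has modest overhead compared to level 2, which records everything.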

Erez Zarum

Oct 1, 2010, 1:54:29 PM
to mongod...@googlegroups.com
When we started issuing a lot of updates, all the operations were slow,
ranging from 300ms to 1000ms+.

I did notice that the disks were saturated (high util%); I'm trying to
understand how it went so hard on the disk if the index fits well in
memory.

I will turn on profiling; I just hope it won't decrease performance too
much.

Thanks,
Erez.

Erez Zarum

Oct 1, 2010, 3:22:31 PM
to mongod...@googlegroups.com
I have looked a little in the log and found lines like this:
[conn185] update db.collection3 query: { _id: 54149149 } 7308ms

We use _id which is indexed.

Erez Zarum

Oct 1, 2010, 4:27:27 PM
to mongod...@googlegroups.com
We have loaded only 2 app servers, currently updating mongodb:

Fri Oct 1 13:18:27 [conn284] query db.collection2 ntoreturn:1 idhack reslen:2367 bytes:2351 1076ms
Fri Oct 1 13:18:27 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 330ms
Fri Oct 1 13:18:32 [conn269] query db.collection2 ntoreturn:1 idhack reslen:3100 bytes:3084 581ms
Fri Oct 1 13:18:32 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 487ms
Fri Oct 1 13:18:35 [conn360] update db.collection2 query: { _id: 483894948 } nscanned:1 moved 126ms
Fri Oct 1 13:18:35 [conn406] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 103ms
Fri Oct 1 13:18:37 [conn92] query db.collection2 ntoreturn:1 idhack reslen:131 bytes:115 795ms
Fri Oct 1 13:18:42 [conn157] query db.collection2_interaction ntoreturn:1 idhack reslen:140 bytes:124 614ms
Fri Oct 1 13:18:42 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 138ms
Fri Oct 1 13:18:42 [conn386] query db.collection2 ntoreturn:1 idhack reslen:1326 bytes:1310 116ms
Fri Oct 1 13:18:43 [conn126] query db.collection2_interaction ntoreturn:1 idhack reslen:140 bytes:124 589ms
Fri Oct 1 13:18:43 [conn84] query db.collection2 ntoreturn:1 idhack reslen:62 bytes:46 273ms
Fri Oct 1 13:18:44 [conn291] query db.collection2 ntoreturn:1 idhack reslen:2883 bytes:2867 394ms
Fri Oct 1 13:18:44 [conn299] query db.collection2 ntoreturn:1 idhack reslen:168 bytes:152 262ms
Fri Oct 1 13:18:45 [conn287] query db.collection2 ntoreturn:1 idhack reslen:619 bytes:603 224ms
Fri Oct 1 13:18:47 [conn147] query db.collection2_interaction ntoreturn:1 idhack reslen:82 bytes:66 212ms
Fri Oct 1 13:18:47 [conn99] query db.collection2 ntoreturn:1 idhack reslen:62 bytes:46 134ms
Fri Oct 1 13:18:51 [conn293] query db.collection2 ntoreturn:1 idhack reslen:8370 bytes:8354 251ms
Fri Oct 1 13:18:52 [conn324] query db.collection3 ntoreturn:1 idhack reslen:220 bytes:204 211ms
Fri Oct 1 13:18:52 [conn363] query db.collection3 ntoreturn:1 idhack reslen:58 bytes:42 228ms
Fri Oct 1 13:18:58 [conn207] query db.collection3 ntoreturn:1 idhack reslen:104 bytes:88 1283ms
Fri Oct 1 13:18:58 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 947ms
Fri Oct 1 13:18:59 [conn82] query db.collection2 ntoreturn:1 idhack reslen:305 bytes:289 292ms
Fri Oct 1 13:18:59 [conn225] query db.collection2 ntoreturn:1 idhack reslen:2643 bytes:2627 136ms
Fri Oct 1 13:19:00 [conn82] query db.collection2 ntoreturn:1 idhack reslen:62 bytes:46 225ms
Fri Oct 1 13:19:00 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 176ms
Fri Oct 1 13:19:02 [conn157] query db.collection2 ntoreturn:1 idhack reslen:3818 bytes:3802 251ms
Fri Oct 1 13:19:03 [conn159] query db.collection2 ntoreturn:1 idhack reslen:1608 bytes:1592 907ms
Fri Oct 1 13:19:03 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 571ms
Fri Oct 1 13:19:07 [conn64] query db.collection2_interaction ntoreturn:1 idhack reslen:1072 bytes:1056 208ms
Fri Oct 1 13:19:07 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 196ms
Fri Oct 1 13:19:08 [conn11] query db.collection2 ntoreturn:1 idhack reslen:1863 bytes:1847 212ms
Fri Oct 1 13:19:08 [conn95] query db.collection2 ntoreturn:1 idhack reslen:62 bytes:46 220ms
Fri Oct 1 13:19:10 [conn46] query db.collection2 ntoreturn:1 idhack reslen:10490 bytes:10474 121ms
Fri Oct 1 13:19:10 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 112ms
Fri Oct 1 13:19:12 [conn369] update db.collection2 query: { _id: 463316827 } nscanned:1 moved 284ms
Fri Oct 1 13:19:12 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 143ms
Fri Oct 1 13:19:12 [conn226] query db.collection2 ntoreturn:1 idhack reslen:1544 bytes:1528 237ms
Fri Oct 1 13:19:12 [conn231] query db.collection2_interaction ntoreturn:1 idhack reslen:175 bytes:159 198ms
Fri Oct 1 13:19:13 [conn99] query db.collection3 ntoreturn:1 idhack reslen:58 bytes:42 221ms
Fri Oct 1 13:19:13 [conn91] query db.collection2 ntoreturn:1 idhack reslen:634 bytes:618 251ms
Fri Oct 1 13:19:13 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 216ms
Fri Oct 1 13:19:13 [conn369] update db.collection2 query: { _id: 468690787 } nscanned:1 moved 103ms
Fri Oct 1 13:19:13 [conn90] query db.collection2 ntoreturn:1 idhack reslen:62 bytes:46 139ms
Fri Oct 1 13:19:15 [conn457] query admin.$cmd ntoreturn:1 command: { serverStatus: 1 } reslen:901 680ms
Fri Oct 1 13:19:15 [conn214] query db.collection2 ntoreturn:1 idhack reslen:382 bytes:366 1302ms
Fri Oct 1 13:19:18 [conn345] query db.collection3 ntoreturn:1 idhack reslen:58 bytes:42 162ms
Fri Oct 1 13:19:18 [conn83] query db.collection2 ntoreturn:1 idhack reslen:1854 bytes:1838 126ms
Fri Oct 1 13:19:19 [conn355] query admin.$cmd ntoreturn:1 command: { features: 1 } reslen:85 112ms
Fri Oct 1 13:19:19 [conn454] query admin.$cmd ntoreturn:1 command: { features: 1 } reslen:85 112ms
Fri Oct 1 13:19:19 [conn230] query db.collection2 ntoreturn:1 idhack reslen:843 bytes:827 144ms
Fri Oct 1 13:19:19 [conn429] query admin.$cmd ntoreturn:1 command: { features: 1 } reslen:85 105ms
Fri Oct 1 13:19:20 [conn390] query db.collection2 ntoreturn:1 idhack reslen:4101 bytes:4085 243ms

lock% started to grow a little bit:

insert/s query/s update/s delete/s getmore/s command/s flushes/s mapped vsize res faults/s locked % idx miss % q t|r|w conn time
8 767 7 0 0 1 0 278505 282514 18278 41 5.3 0 15|2|13 444 13:21:03
0 199 11 0 0 1 0 278505 282514 18278 14 12.9 0 231|69|162 444 13:21:04
2 121 7 0 0 2 0 278505 282514 18278 11 0.7 0 249|21|228 444 13:21:07
24 867 19 0 0 2 0 278505 282514 18279 57 3.3 0 0|0|0 444 13:21:08
7 715 20 0 0 4 0 278505 282514 18279 37 4.4 0 127|70|57 444 13:21:09
1 217 10 0 0 1 0 278505 282514 18279 16 1 0 207|93|114 444 13:21:10
24 672 19 0 0 1 0 278505 282514 18280 54 5.8 0 127|25|102 444 13:21:11
29 858 17 0 0 1 0 278505 282514 18281 59 9.6 0 49|12|37 444 13:21:12
27 679 9 0 0 1 0 278505 282514 18281 24 4.8 0 12|6|6 444 13:21:14
7 448 24 0 0 1 0 278505 282514 18281 13 3.4 0 129|41|88 444 13:21:15
5 231 6 0 0 1 0 278505 282514 18282 31 1.6 0 192|4|188 444 13:21:16
25 1035 35 0 0 1 0 278505 282514 18282 47 7.7 0 172|11|161 444 13:21:17
22 958 40 0 0 3 0 278505 282514 18283 53 8 0 68|56|12 444 13:21:18
4 374 23 0 0 1 0 278505 282514 18283 16 34.4 0 228|170|58 444 13:21:19

The disks are not as saturated as before though; they are fine (util%
is around 50-70, and iowait is very low if any at all).
It makes mongo hang a little bit sometimes; this started the minute we
included updates.

Thanks,
Erez.

Ofer Fort

Oct 2, 2010, 11:58:05 AM
to mongod...@googlegroups.com
I'm trying to understand if we are doing something wrong.
We are doing inserts, queries and updates all based on the _id, and the index fits in memory.
Maybe I am using the API (Java driver) in a bad way? This is how we are doing our operations:

query:
DBObject object = m_mongoDB.getDB(db).getCollection(collection).findOne(new BasicDBObject().append("_id", id));

insert:
m_mongoDB.getDB(db).getCollection(collection).insert(new BasicDBObject(data).append("_id", id));

update:
m_mongoDB.getDB(db).getCollection(collection).save(new BasicDBObject(data).append("_id", id));
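One hypothetical variant (not from the thread) worth comparing against: `save()` replaces the entire document, and the `nscanned:1 moved` log entries suggest documents are growing and being relocated on disk. Using `update()` with a `$set` modifier touches only the changed fields. A minimal sketch, assuming a reachable mongod and the same 2.x-era driver; the field name `lastSeen` and the `_id` value are made up for illustration:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class UpdateSketch {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);  // hypothetical host/port
        DBCollection coll = mongo.getDB("db").getCollection("collection");

        // Instead of save(), which rewrites the whole document:
        //   coll.save(new BasicDBObject(data).append("_id", id));
        // update only the fields that changed, matched by _id:
        DBObject query = new BasicDBObject("_id", 54149149);
        DBObject change = new BasicDBObject("$set",
                new BasicDBObject("lastSeen", System.currentTimeMillis()));
        coll.update(query, change, false /* upsert */, false /* multi */);

        mongo.close();
    }
}
```

Whether this helps depends on the update pattern; if documents still grow over time, they can still be moved once they outgrow their padding.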

thanks for any help

ofer

Ofer Fort

Oct 3, 2010, 1:41:14 PM
to mongod...@googlegroups.com
Anybody?
I don't understand what I'm doing wrong. Could it be the volume of our updates?

Eliot Horowitz

Oct 3, 2010, 4:26:09 PM
to mongod...@googlegroups.com
You still have 16GB of RAM and 32GB of indexes?
If so, I think that's what's causing the slowness.
What's the access pattern on _id?
Do you update totally random _ids, or is there some pattern that can be
leveraged?

Erez Zarum

Oct 3, 2010, 4:56:04 PM
to mongod...@googlegroups.com
Each server in the shard has 32GB of RAM (shard0000 has an index size
of around 18GB while shard0001 has around 25GB).
I will wait for Ofer to say which access pattern we're on, but I assume
it's quite random.
I did notice high util% on the disks.

Another thing I have come across: perhaps try decreasing syncdelay to
30 instead of 60? Is it possible that the fsync interval of 60 seconds
is too high for our needs?
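For reference, syncdelay is a mongod startup option; a hypothetical invocation (the dbpath is a placeholder, the port matches the shard setup described above), shown only as a sketch:

```shell
# Flush memory-mapped data files every 30 seconds instead of the default 60
# (trades one large, bursty flush for smaller, more frequent ones):
mongod --dbpath /data/db --port 27018 --syncdelay 30
```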

Thanks,
Erez.

Ofer Fort

Oct 3, 2010, 4:55:28 PM
to mongod...@googlegroups.com
No, I have two shards of 32GB each, and the index size is 17GB on one of the shards and 23GB on the other.

Updating is not exactly random.
I have 3 collections. One (currently 500M docs) hardly gets updates, but gets a lot of inserts (3M a day) and queries (also 3M a day).
The next one (also around 500M docs) gets a lot of inserts and queries (around 4M a day), and there are usually around 5M updates a day.
The last one (160M docs) gets inserts and queries (0.5M a day) and has around 1M updates a day.

The updating pattern is a kind of degradation: once I insert a document, I will update it once every few hours, then once every few days, and then after a week or two I don't update it anymore.