sharding makes reads/writes slower.

babak

Jan 5, 2011, 5:51:22 AM
to mongodb-user
Hi!

We are evaluating different MongoDB setups on Amazon EC2 with Ubuntu
10.04 AMIs. We set up 3 config servers on 3 micro instances and a
router (mongos) on a large instance. We first tried sharding on a
single large instance and got ~9000 upserts/second, which didn't
change much during the test. When we added a shard on another large
instance we only got ~7000 upserts/second. Checking mongostat, we can
see that the upsert rate varies a lot (between 0 and 35,000
updates/s), and we also get a weird StaleConfigException at the same
time. Reads also seem to get slower with two shards. This seems very
strange and we can't find the cause. Do you have any ideas?
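
For reference, the sharding part of the setup corresponds roughly to
the following admin commands run against the mongos (a sketch only;
the hosts and the uuid shard key match the sharding status shown
further down):

// run from the mongos shell, in the admin database
use admin
db.runCommand({ addshard: "10.235.57.133:27017" })
db.runCommand({ addshard: "10.228.94.255:27017" })
db.runCommand({ enablesharding: "ud" })
db.runCommand({ shardcollection: "ud.visits", key: { uuid: 1 } })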

mongostat:

insert/s query/s update/s delete/s getmore/s command/s mapped vsize res faults/s locked % idx miss % conn time
0        0       2975     0        0         1         0      438   20  0        0        0          13   10:34:10
0        0       4292     0        0         1         0      438   20  0        0        0          13   10:34:11
0        0       4349     0        0         1         0      438   20  0        0        0          13   10:34:12
0        0       22941    0        0         1         0      438   20  0        0        0          13   10:34:13
0        0       12207    0        0         1         0      438   20  0        0        0          13   10:34:14
0        0       10701    0        0         1         0      438   20  0        0        0          13   10:34:15
0        0       4544     0        0         1         0      438   21  0        0        0          13   10:34:16
0        0       4453     0        0         1         0      438   21  0        0        0          13   10:34:17
0        0       3957     0        0         1         0      438   21  0        0        0          13   10:34:18
0        0       5227     0        0         1         0      438   21  0        0        0          13   10:34:19
0        0       4192     0        0         1         0      438   21  0        0        0          13   10:34:20
0        0       25435    0        0         1         0      438   21  0        0        0          13   10:34:21
0        0       4121     0        0         1         0      438   21  0        0        0          13   10:34:22
0        0       2416     0        0         1         0      438   21  0        0        0          13   10:34:23
0        0       2166     0        0         1         0      438   21  0        0        0          13   10:34:24
0        0       2195     0        0         1         0      438   21  0        0        0          13   10:34:25
0        0       2044     0        0         1         0      438   21  0        0        0          13   10:34:26
0        0       12227    0        0         1         0      438   21  0        0        0          13   10:34:27
0        0       3643     0        0         1         0      438   21  0        0        0          13   10:34:28
0        0       25697    0        0         1         0      438   21  0        0        0          13   10:34:29
insert/s query/s update/s delete/s getmore/s command/s mapped vsize res faults/s locked % idx miss % conn time
0        0       35755    0        0         1         0      438   21  0        0        0          13   10:34:30
0        0       7316     0        0         1         0      438   21  0        0        0          13   10:34:31
0        0       9863     0        0         1         0      438   21  0        0        0          13   10:34:32
0        0       10730    0        0         1         0      438   21  0        0        0          13   10:34:33
0        0       10401    0        0         1         0      438   21  0        0        0          13   10:34:34
0        0       8030     0        0         1         0      438   21  0        0        0          13   10:34:35
0        0       8410     0        0         1         0      438   21  0        0        0          13   10:34:36
....

mongos log:

Wed Jan 5 10:29:22 [conn3] autosplitting ud.visits size: 58893640 shard: ns:ud.visits at: shard0001:10.228.94.255:27017 lastmod: 3|3 min: { uuid: "7cbe5f73-b804-4d76-834b-26ea8dfc2060" } max: { uuid: "fffff375-ed72-4482-a20e-dd4edbf7ef97" } on: { uuid: "bdaef5f0-8766-41b5-992c-05155942a798" }(splitThreshold 52428800)
Wed Jan 5 10:29:22 [conn3] config change: { _id: "ip-10-235-79-102-2011-01-05T10:29:22-4", server: "ip-10-235-79-102", time: new Date(1294223362667), what: "split", ns: "ud.visits", details: { before: { min: { uuid: "7cbe5f73-b804-4d76-834b-26ea8dfc2060" }, max: { uuid: "fffff375-ed72-4482-a20e-dd4edbf7ef97" } }, left: { min: { uuid: "7cbe5f73-b804-4d76-834b-26ea8dfc2060" }, max: { uuid: "bdaef5f0-8766-41b5-992c-05155942a798" } }, right: { min: { uuid: "bdaef5f0-8766-41b5-992c-05155942a798" }, max: { uuid: "fffff375-ed72-4482-a20e-dd4edbf7ef97" } } } }
Wed Jan 5 10:29:22 [conn5] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "a004fb80-9a35-43e5-afbb-44d4af5c9982" }
Wed Jan 5 10:29:22 [conn11] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "7401f7a0-7a7c-448d-b5ce-e556ddd087a2" }
Wed Jan 5 10:29:22 [conn8] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "6df95be6-df9e-4aa8-b621-c7aaca98fd59" }
Wed Jan 5 10:29:22 [WriteBackListener] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "cd9298db-fe7b-4001-a3b6-c208752ca448" }
Wed Jan 5 10:29:22 [conn12] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "d6357772-3872-4872-9c7e-bba466a3a770" }
Wed Jan 5 10:29:22 [conn10] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "e4145573-9f70-4be2-b048-04f41569be5a" }
Wed Jan 5 10:29:22 [conn4] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "3e6137f3-208c-4812-8dd6-65114ecfedff" }
Wed Jan 5 10:29:22 [conn6] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "3bf80a2e-2535-4c3f-908d-6b2138c6111f" }
Wed Jan 5 10:29:22 [conn7] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "c1748bde-efc4-4b1f-8149-e36a2a550c46" }
Wed Jan 5 10:29:22 [conn9] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "de0af13f-e141-465d-93cd-c923321813a5" }
Wed Jan 5 10:29:23 [conn4] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "fffffac7-276a-4d6c-82f4-372ea4aa3ae8" }
Wed Jan 5 10:29:28 [conn13] creating WriteBackListener for: 10.235.73.46:27019,10.235.46.84:27019,10.234.167.245:27019
Wed Jan 5 10:29:31 [conn5] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "000018a8-fa63-48c5-ad46-7fffec6a33fc" }
Wed Jan 5 10:29:32 [conn10] update failed b/c of StaleConfigException, retrying left:4 ns: ud.visits query: { uuid: "000018a8-fa63-48c5-ad46-7fffec6a33fc" }
Wed Jan 5 10:30:27 connection accepted from 127.0.0.1:59287 #14
Wed Jan 5 10:30:53 [conn12] autosplitting ud.visits size: 54813920 shard: ns:ud.visits at: shard0001:10.228.94.255:27017 lastmod: 3|6 min: { uuid: "7cbe5f73-b804-4d76-834b-26ea8dfc2060" } max: { uuid: "bdaef5f0-8766-41b5-992c-05155942a798" } on: { uuid: "9cd3aaa4-5ee4-4a41-b0df-d06f3333269c" }(splitThreshold 52428800)
...

Sharding status:

> db.printShardingStatus();
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard0000", "host" : "10.235.57.133:27017" }
      { "_id" : "shard0001", "host" : "10.228.94.255:27017" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "ud", "partitioned" : true, "primary" : "shard0001" }
          ud.visits chunks:
              { "uuid" : { $minKey : 1 } } -->> { "uuid" : "00003c42-dee2-47d8-8d5e-423a7ad949b7" } on : shard0000 { "t" : 2000, "i" : 0 }
              { "uuid" : "00003c42-dee2-47d8-8d5e-423a7ad949b7" } -->> { "uuid" : "1d6b4869-8bc7-4f6f-b308-fe48766e9ace" } on : shard0001 { "t" : 3000, "i" : 14 }
              { "uuid" : "1d6b4869-8bc7-4f6f-b308-fe48766e9ace" } -->> { "uuid" : "3c227fab-f792-4800-999b-5de0c3630b7f" } on : shard0001 { "t" : 3000, "i" : 15 }
              { "uuid" : "3c227fab-f792-4800-999b-5de0c3630b7f" } -->> { "uuid" : "5c5d6d3b-7bb1-450d-86e0-604a06df5a0a" } on : shard0001 { "t" : 3000, "i" : 10 }
              { "uuid" : "5c5d6d3b-7bb1-450d-86e0-604a06df5a0a" } -->> { "uuid" : "7cbe5f73-b804-4d76-834b-26ea8dfc2060" } on : shard0001 { "t" : 3000, "i" : 11 }
              { "uuid" : "7cbe5f73-b804-4d76-834b-26ea8dfc2060" } -->> { "uuid" : "9cd3aaa4-5ee4-4a41-b0df-d06f3333269c" } on : shard0001 { "t" : 3000, "i" : 8 }
              { "uuid" : "9cd3aaa4-5ee4-4a41-b0df-d06f3333269c" } -->> { "uuid" : "bdaef5f0-8766-41b5-992c-05155942a798" } on : shard0001 { "t" : 3000, "i" : 9 }
              { "uuid" : "bdaef5f0-8766-41b5-992c-05155942a798" } -->> { "uuid" : "de557951-997a-4f30-a970-3581697bc4bf" } on : shard0001 { "t" : 3000, "i" : 12 }
              { "uuid" : "de557951-997a-4f30-a970-3581697bc4bf" } -->> { "uuid" : "fffff375-ed72-4482-a20e-dd4edbf7ef97" } on : shard0001 { "t" : 3000, "i" : 13 }
              { "uuid" : "fffff375-ed72-4482-a20e-dd4edbf7ef97" } -->> { "uuid" : { $maxKey : 1 } } on : shard0000 { "t" : 3000, "i" : 0 }


//Babak

babak

Jan 5, 2011, 5:57:49 AM
to mongodb-user
When checking further, it seems that all writes go to one shard and
the shards then rebalance. This would imply that we have chosen a bad
shard key, but we can't understand why uuid would be a bad shard key.
Here is the db.currentOp() output (a sketch of how we plan to verify
the chunk distribution follows after it):

> db.currentOp()
{
    "inprog" : [
        {
            "opid" : "shard0001:7273380",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "ad263a18-a214-4d19-a678-f0aa570c4c45" },
            "client" : "10.235.79.102:59959",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273437",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "789ae250-c61a-4af7-b3e7-14a7effe1883" },
            "client" : "10.235.79.102:59983",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273420",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "f19e95b1-5a84-4663-b8df-0f65349c4f9c" },
            "client" : "10.235.79.102:59968",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273443",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "7c00a1f2-6fc4-45bc-b76f-31567e40d1cd" },
            "client" : "10.235.79.102:59976",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273459",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "a038a031-0708-4fce-be50-d4546ceed38f" },
            "client" : "10.235.79.102:48110",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:5462825",
            "active" : true,
            "waitingForLock" : false,
            "secs_running" : 1020,
            "op" : "query",
            "ns" : "ud.visits",
            "query" : { "writebacklisten" : ObjectId("4d24320d3e5fa20dc3b4c1f7") },
            "client" : "10.235.79.102:48118",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273453",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "efd07c98-3506-410b-a276-5cb790d367fd" },
            "client" : "10.235.79.102:48112",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273429",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "a390d3ac-101f-46e4-994f-da99436d969d" },
            "client" : "10.235.79.102:48113",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273436",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "7e088d7f-df1d-4002-8f9a-312c150f6175" },
            "client" : "10.235.79.102:48114",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273463",
            "active" : true,
            "lockType" : "write",
            "waitingForLock" : false,
            "secs_running" : 0,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "b37f9485-e127-4d5f-ad80-aa170bd165bf" },
            "client" : "10.235.79.102:48115",
            "desc" : "conn"
        },
        {
            "opid" : "shard0001:7273130",
            "active" : false,
            "lockType" : "write",
            "waitingForLock" : true,
            "op" : "update",
            "ns" : "ud.visits",
            "query" : { "uuid" : "3970e364-6fae-40c3-a910-ea24e929e078" },
            "client" : "10.235.79.102:48116",
            "desc" : "conn"
        },
        {
            "opid" : "shard0000:26",
            "active" : true,
            "waitingForLock" : false,
            "secs_running" : 1555,
            "op" : "query",
            "ns" : "?",
            "query" : { "writebacklisten" : ObjectId("4d24320d3e5fa20dc3b4c1f7") },
            "client" : "10.235.79.102:43198",
            "desc" : "conn"
        }
    ]
}
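
To double-check how the chunks (and therefore the writes) are
distributed, we plan to count them per shard in the config database,
roughly like this (a sketch only, run against the mongos; "ud.visits"
is our collection and the shard names are from the status above):

// count chunks per shard for the sharded collection
use config
db.chunks.find({ ns: "ud.visits" }).count()
db.chunks.find({ ns: "ud.visits", shard: "shard0000" }).count()
db.chunks.find({ ns: "ud.visits", shard: "shard0001" }).count()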

Claudio Bisegni

Jan 5, 2011, 6:31:33 AM
to mongod...@googlegroups.com
Hi, I'm doing a lot of tests with sharding because I need to put experimental data into Mongo: I have many, many devices sending raw data to Mongo at a high rate. I played around with the shard key and, with the help of this group, changed it to use a device ID plus a seed (a random key between a min and a max value). With auto-splitting on these two keys I can grow the inserts per second; queries are obviously a little slower, but this way I can balance the input across more shards.
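
A rough sketch of the kind of compound key I mean (the database,
collection and field names here are just examples, not my real ones):

// shard on a compound key: device id + a random seed per document
use admin
db.runCommand({ enablesharding: "telemetry" })
db.runCommand({ shardcollection: "telemetry.samples",
                key: { deviceId: 1, seed: 1 } })

// each inserted document carries a random seed, so inserts for the
// same device still spread across many chunks (and shards)
db.getSiblingDB("telemetry").samples.insert({ deviceId: 42,
                                              seed: Math.random(),
                                              value: 3.14 })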

Alvin Richards

Jan 5, 2011, 11:19:21 PM
to mongodb-user
So there are two parts to sharding. First, a chunk (i.e. a range of
shard key values) is split at a median value. The algorithm has
changed a little between 1.6 and 1.7.x, but essentially, as data is
inserted you will get more and more chunks, each containing a smaller
range of values.

When there is a difference of more than 8 chunks between shards, the
migration kicks in to move data between the shards.

Looking at your printShardingStatus output, it looks like you have
just about enough data to cause the migration to happen.

You can also split and migrate chunks manually; take a look at these documents:

http://www.mongodb.org/display/DOCS/Splitting+Chunks

http://www.mongodb.org/display/DOCS/Moving+Chunks
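
For example, something along these lines from the mongos shell (a
sketch only; the split point here is made up, and "shard0000" is just
the other shard from your status output):

// manually split a chunk at a chosen shard key value, then move it
use admin
db.runCommand({ split: "ud.visits",
                middle: { uuid: "80000000-0000-0000-0000-000000000000" } })
db.runCommand({ moveChunk: "ud.visits",
                find: { uuid: "80000000-0000-0000-0000-000000000000" },
                to: "shard0000" })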

-Alvin
