dst
Nov 3, 2011
to mongodb-user
I am fairly new to Mongo, and started looking into sharding to solve a
problem where, I am beginning to think, it may have been the wrong
choice. The problem, simplified: let's say I have three environments:
admin, prod, review. I was thinking I could partition my data into
three shards matching the environments. I put the skids on after
reading this recently:
From the O'Reilly "Scaling MongoDB" book, they talk about Low-
Cardinality Shard Keys:
"""
Some people don't really trust or understand how MongoDB automatically
distributes data, so they think something along the lines of, "I have
four shards, so I will use a field with four possible values for my
shard key." This is a really, really bad idea.
...
If you choose a shard key with low cardinality, you will end up with
large, unmovable, unsplittable chunks that will make your life
miserable.
"""
That's scary -- and it's exactly what I am proposing! It's not that I
mistrust Mongo; I think I may just be using the wrong tool for the
job. I simply want to move data across environments efficiently, and
also keep data close to the applications running in those
environments. If anyone else has similar experience or advice in this
area, I would appreciate hearing about it.
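To convince myself the book's warning applies here, I wrote a small sketch (plain Python, not MongoDB code) of the constraint it describes: a chunk can only be split at a shard key value strictly inside its range, so a three-value key caps the collection at four chunks forever, and a chunk holding a single key value can never split again no matter how big it grows. The MIN_KEY/MAX_KEY sentinels below are just stand-ins for $minKey/$maxKey.

```python
# Stand-ins for MongoDB's reserved $minKey / $maxKey boundary values.
MIN_KEY, MAX_KEY = "\x00", "\uffff"

def possible_chunks(values):
    """Return every chunk range obtainable by splitting at each distinct
    shard key value. With N distinct values, at most N + 1 chunks exist."""
    bounds = [MIN_KEY] + sorted(set(values)) + [MAX_KEY]
    return list(zip(bounds, bounds[1:]))

chunks = possible_chunks(["admin", "prod", "review"])
print(len(chunks))  # 4 -- and no further splits are possible
print(chunks)
```

Inserting a million more "prod" documents changes nothing: the set of distinct values, and therefore the set of possible split points, stays the same.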
I ran a prototype of the concept, and it seemed to "work". I've
included the output from my session for anyone who is interested (I
learned a lot from it)...
-------------
First, I setup the shard servers
## start config server
./mongod --dbpath /dbs/config --port 20000 --fork
## start shard routing server
./mongos --port 30000 --configdb localhost:20000
## start admin shard
./mongod --dbpath /dbs/admin-master --port 10000 --fork
## start prod1 shard
./mongod --dbpath /dbs/prod1-master --port 10001 --fork
## start review shard
./mongod --dbpath /dbs/review-master --port 10003 --fork
## connect to Administrative Server (mongos)
./mongo localhost:30000/admin
db.runCommand({addshard: "localhost:10000", name: "admin", allowLocal: true})
db.runCommand({addshard: "localhost:10001", name: "prod1", allowLocal: true})
db.runCommand({addshard: "localhost:10003", name: "review", allowLocal: true})
# Enable sharding on database mlt
db.runCommand({"enablesharding": "mlt"});
# Enable sharding on the dogs collection
db.runCommand({"shardcollection": "mlt.dogs", "key" : {"env": 1}})
# split by environment
db.runCommand({split: "mlt.dogs", middle: {"env": "admin"}})
{ "ok" : 1 }
db.runCommand({split: "mlt.dogs", middle: {"env": "prod"}})
{ "ok" : 1 }
db.runCommand({split: "mlt.dogs", middle: {"env": "review"}})
{ "ok" : 1 }
Here is a look at my "mlt" database's sharding info. Everything by
default went to the admin shard (on: admin)
{ "_id" : "mlt", "partitioned" : true, "primary" : "admin" }
mlt.dogs chunks:
admin 4
{ "env" : { $minKey : 1 } } -->> { "env" : "admin" } on : admin { "t" : 1000, "i" : 1 }
{ "env" : "admin" } -->> { "env" : "prod" } on : admin { "t" : 1000, "i" : 3 }
{ "env" : "prod" } -->> { "env" : "review" } on : admin { "t" : 1000, "i" : 5 }
{ "env" : "review" } -->> { "env" : { $maxKey : 1 } } on : admin { "t" : 1000, "i" : 6 }
Next, I moved the chunks while the database was still empty:
db.runCommand({moveChunk: "mlt.dogs", find: { "env": "admin"}, to: "admin"});
{ "ok" : 0, "errmsg" : "that chunk is already on that shard" }
db.runCommand({moveChunk: "mlt.dogs", find: { "env": "prod"}, to: "prod1"});
{ "millis" : 2094, "ok" : 1 }
db.runCommand({moveChunk: "mlt.dogs", find: { "env": "review"}, to: "review"});
{ "millis" : 2180, "ok" : 1 }
Now the chunks are distributed across the shards, covering ranges
bounded by the reserved $minKey and $maxKey values:
{ "_id" : "mlt", "partitioned" : true, "primary" : "admin" }
mlt.dogs chunks:
admin 2
prod1 1
review 1
{ "env" : { $minKey : 1 } } -->> { "env" : "admin" } on : admin { "t" : 3000, "i" : 1 }
{ "env" : "admin" } -->> { "env" : "prod" } on : admin { "t" : 1000, "i" : 3 }
{ "env" : "prod" } -->> { "env" : "review" } on : prod1 { "t" : 2000, "i" : 0 }
{ "env" : "review" } -->> { "env" : { $maxKey : 1 } } on : review { "t" : 3000, "i" : 0 }
Next, while connected to the mongos router, I inserted 4 records:
db.dogs.insert({"name": "Collins", "env": "prod"})
db.dogs.insert({"name": "Bluesy", "env": "review"})
db.dogs.insert({"name": "Billy", "env": "prod"})
db.dogs.insert({"name": "Spot", "env": "admin"})
When I connect to each individual shard, here is what I get:
mongo localhost:10003 // connecting to review shard
use mlt
db.dogs.find()
{ "env" : "review", "name" : "Bluesy" }
mongo localhost:10000 // connect to admin shard
use mlt
db.dogs.find()
{ "env" : "admin", "name" : "Spot" }
mongo localhost:10001 // connect to prod1 shard
use mlt
db.dogs.find()
{ "env" : "prod", "name" : "Collins" }
{ "env" : "prod", "name" : "Billy" }
One thing that I learned: you cannot modify a shard key's value, which
implies that once a document is on a shard, it stays there.
db.dogs.update({"name": "Billy"}, { "$set": { "env": "review"}})
>> can't do non-multi update with query that doesn't have a valid shard key
db.dogs.update({"name": "Billy", "env": "prod"}, { "$set": { "env": "review"}})
>> Can't modify shard key's value fieldenv for collection: mlt.dogs
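My reading of the first error: a non-multi update must be routable to exactly one shard, so the query has to contain the shard key. Here is a hypothetical Python sketch of that targeting rule (not mongos internals; the CHUNK_OWNERS mapping just mirrors the chunk layout from my session above):

```python
# Chunk ownership from the session: each env value maps to the shard
# that owns its chunk. This table is an assumption for illustration.
CHUNK_OWNERS = {"admin": "admin", "prod": "prod1", "review": "review"}
ALL_SHARDS = sorted(set(CHUNK_OWNERS.values()))

def target_shards(query, shard_key="env"):
    """Shards a query must touch. With the shard key present, the router
    can target one shard; without it, every shard would have to be hit."""
    if shard_key in query:
        return [CHUNK_OWNERS[query[shard_key]]]
    return ALL_SHARDS

print(target_shards({"name": "Billy", "env": "prod"}))  # ['prod1']
print(target_shards({"name": "Billy"}))  # all three shards
```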
For fun, let's say a value gets stored without a defined environment.
db.dogs.insert({"name": "Homeless", "env": "shelter"})
Which shard did it end up in? Mongo routes documents to chunks by
ordered string comparison of the shard key. So it ends up in the
"review" shard, since "s" sorts after "r" and "shelter" falls in the
chunk ranging from "review" up to $maxKey.
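That routing rule can be sketched in a few lines of plain Python (again, an illustration of range-based routing, not mongos code): find the rightmost chunk whose lower bound sorts at or before the key, using the chunk table from my session above.

```python
import bisect

# (lower bound, owning shard) per chunk, from the session's final layout;
# each chunk's upper bound is the next entry's lower bound. "" stands in
# for $minKey.
chunk_table = [("", "admin"), ("admin", "admin"),
               ("prod", "prod1"), ("review", "review")]
lowers = [low for low, _ in chunk_table]

def route(env):
    """Pick the shard owning the chunk whose range contains env."""
    i = bisect.bisect_right(lowers, env) - 1  # rightmost lower bound <= env
    return chunk_table[i][1]

print(route("shelter"))  # review -- "shelter" sorts into ["review", $maxKey)
```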