Hi Jameson,
By default, MongoDB uses an ObjectId as the value of the _id field when you do not specify one. The most significant bits of an ObjectId are a timestamp, which means the values increase in a regular and predictable pattern. They are not hashed values.
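To illustrate, here is a minimal Python sketch (standard library only) that reads the leading timestamp out of an ObjectId's hex string. The two sample IDs are made up for illustration. The only thing it relies on is that the first 4 bytes of an ObjectId hold seconds since the Unix epoch.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> int:
    """Return the seconds-since-epoch stored in the first 4 bytes of an ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    return int(oid_hex[:8], 16)


# Illustrative sample values, not taken from a real collection.
older = "507f1f77bcf86cd799439011"
newer = "507f1f78bcf86cd799439011"  # same tail, timestamp one second later

print(datetime.fromtimestamp(objectid_timestamp(older), tz=timezone.utc))
# An ID created later carries a larger prefix, so it sorts after earlier ones.
print(objectid_timestamp(older) < objectid_timestamp(newer))
```

Because newly generated IDs always fall at the high end of the range, a plain (unhashed) ObjectId shard key tends to direct new inserts at the same chunk.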
If you specify your own _id values and have already hashed them (or have hashed another field you want to shard on), you can use that field as your shard key instead of asking the server to calculate the hash for you.
That said, letting MongoDB create the hashed index for you has an advantage. If you shard an empty collection using a hashed shard key, MongoDB will automatically create and migrate chunks so that each shard has two chunks. You can control how many chunks MongoDB creates with the numInitialChunks parameter to shardCollection.
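As a rough sketch, the command document could be built like this in Python. The namespace "mydb.mycoll" and the chunk count of 8 are placeholders, and the helper name is my own. The commented-out line assumes you are using pymongo, which is not imported here.

```python
def shard_collection_command(namespace: str, field: str, num_initial_chunks: int) -> dict:
    """Build a shardCollection command document for a hashed shard key."""
    if num_initial_chunks < 1:
        raise ValueError("num_initial_chunks must be positive")
    return {
        "shardCollection": namespace,  # the command name must be the first key
        "key": {field: "hashed"},
        "numInitialChunks": num_initial_chunks,
    }


# Placeholder namespace and chunk count, for illustration only.
cmd = shard_collection_command("mydb.mycoll", "_id", 8)
print(cmd)

# With pymongo (an assumption), this could be sent to a mongos as:
#   client.admin.command(cmd)
```

Note that numInitialChunks only has an effect when the collection is empty and the shard key is hashed, as described above.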
For example, even before there are any records in the collection, sharding by a hashed index will give you:
shard key: { "_id": "hashed" }
chunks:
shard01: 2
shard02: 2
{ "_id": { "$minKey" : 1 } } -> { "_id": NumberLong("-4611686018427387902") } on: shard01 Timestamp(2, 2)
{ "_id": NumberLong("-4611686018427387902") } -> { "_id": NumberLong("0") } on: shard01 Timestamp(2, 3)
{ "_id": NumberLong("0") } -> { "_id": NumberLong("4611686018427387902") } on: shard02 Timestamp(2, 4)
{ "_id": NumberLong("4611686018427387902") } -> { "_id": { "$maxKey" : 1 } } on: shard02 Timestamp(2, 5)
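If you are wondering where those boundary values come from, the sketch below reproduces them by dividing the signed 64-bit hash range symmetrically around zero. This is only arithmetic that matches the output above for 4 chunks. I am not claiming it is exactly what the server does internally, and the function name is my own.

```python
INT64_MAX = 2**63 - 1  # largest value a NumberLong can hold


def hashed_split_points(num_chunks: int) -> list:
    """Return split points that cut the 64-bit hash range into num_chunks pieces."""
    if num_chunks < 2:
        return []  # a single chunk needs no split point
    interval = (INT64_MAX // num_chunks) * 2
    points = []
    if num_chunks % 2 == 0:
        points.append(0)  # an even count splits at zero
        current = interval
    else:
        current = interval // 2  # an odd count straddles zero instead
    for _ in range((num_chunks - 1) // 2):
        points.extend([current, -current])
        current += interval
    return sorted(points)


# 2 shards x 2 chunks each = 4 chunks, hence 3 split points.
print(hashed_split_points(4))
```

With 4 chunks this gives -4611686018427387902, 0 and 4611686018427387902, the same boundaries as in the listing.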
Alternatively, if you are using your own hashed values, you can manually pre-split chunks for an empty collection before importing any data. You can pre-split chunks on both hashed and non-hashed values. This way, the impact of the balancer is only felt when you add a new shard (and the cluster must therefore rebalance).
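A pre-split along those lines could be sketched as follows. It builds one split command per boundary. The namespace "mydb.mycoll", the field name "idHash" and the hex-prefix boundaries are all hypothetical, chosen only to show the shape of the commands, and the commented-out loop again assumes pymongo.

```python
def presplit_commands(namespace: str, field: str, split_points: list) -> list:
    """Build one `split` command per boundary, in ascending order."""
    return [
        {"split": namespace, "middle": {field: point}}
        for point in sorted(set(split_points))  # drop duplicates, keep order stable
    ]


# Hypothetical example: "idHash" holds a lowercase hex digest you computed
# yourself, so three boundaries give four roughly equal ranges.
commands = presplit_commands("mydb.mycoll", "idHash", ["8", "4", "c"])
for command in commands:
    print(command)

# With pymongo (an assumption), run these against a mongos on the empty,
# already sharded collection before importing data:
#   for command in commands:
#       client.admin.command(command)
```

How you choose the boundaries depends on how your own hash values are distributed, so treat the three points here purely as a placeholder.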
Regards,
Wan.