(Cross-posting my question from Stack Overflow here, I hope that's OK: http://stackoverflow.com/questions/12961873/whats-a-good-mongodb-shard-key-for-this-schema )
I'm storing performance metrics in the following schema:
{
"_id" : ObjectId("5069d68700a2934015000000"),
"port_name" : "CL1-A",
"metric" : 340,
"port_number" : "0",
"datetime" : ISODate("2012-09-30T13:44:00Z"),
"array_serial" : "12345"
}
Each array has 128 ports. The port names are CL1-A, CL2-A, CL3-A, etc., and they correspond to port_numbers 0, 1, 2, 3, etc. I'm storing per-minute data for each metric, with one collection per metric. I'd like to shard the collections, but I'm having trouble figuring out a proper shard key and a unique index strategy.
Am I correct that the only way to enforce a unique key on a sharded collection is through the shard key? If I want to ensure uniqueness on (array_serial, port_name, datetime), would that combination make an acceptable shard key? Will it provide enough cardinality while still allowing for query localization and manageable chunks?
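To put rough numbers on the cardinality part of the question, here is a back-of-the-envelope sketch in Python. The function names are made up for illustration, and it assumes that port names repeat across arrays and that every port reports once per minute; only the 128 ports per array comes from my actual setup.

```python
PORTS_PER_ARRAY = 128        # from the description above
MINUTES_PER_DAY = 24 * 60    # one sample per port per minute (assumed)


def distinct_keys_port_name_only():
    """Distinct shard key values if the key is port_name alone.

    Assumes every array uses the same 128 port names, so adding
    arrays or days does not add new key values.
    """
    return PORTS_PER_ARRAY


def distinct_keys_compound(num_arrays, days):
    """Distinct values of (array_serial, port_name, datetime)."""
    return num_arrays * PORTS_PER_ARRAY * MINUTES_PER_DAY * days


print(distinct_keys_port_name_only())                # 128
print(distinct_keys_compound(num_arrays=1, days=1))  # 184320
```

So with port_name alone there would never be more than 128 distinct key values, while the compound key grows with every array and every minute of data.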
Or should I shard only on port_name, so that the records are spread evenly across the cluster? If that is the shard key, do I have to keep a proxy collection made up like this:
{
"_id" : ObjectId("5069d68700a2934015000000"),
"key" : "1234,CL1-A,<dateinmilliseconds>"
}
Would I then write to the sharded collection only if a write to the above proxy collection succeeds? That seems like a lot of extra overhead.
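To make the two-step write I have in mind concrete, here is a minimal in-memory sketch in Python. Everything in it is a hypothetical stand-in: `ProxyCollection`, `DuplicateKey`, `proxy_key` and `guarded_insert` are names I made up, a set plays the role of a unique index on "key", and a plain list plays the role of the sharded metrics collection. It is not driver code.

```python
from datetime import datetime, timezone


class DuplicateKey(Exception):
    """Raised when the proxy collection already holds the key (hypothetical)."""


class ProxyCollection:
    """In-memory stand-in for a collection with a unique index on 'key'."""

    def __init__(self):
        self._keys = set()

    def insert(self, key):
        if key in self._keys:
            raise DuplicateKey(key)
        self._keys.add(key)


def proxy_key(doc):
    """Build 'array_serial,port_name,<date in milliseconds>' for a metric doc."""
    millis = int(doc["datetime"].timestamp() * 1000)
    return "{},{},{}".format(doc["array_serial"], doc["port_name"], millis)


def guarded_insert(proxy, metrics, doc):
    """Write to the metrics store only if the proxy insert succeeds."""
    try:
        proxy.insert(proxy_key(doc))
    except DuplicateKey:
        return False  # duplicate sample, skip the second write
    metrics.append(doc)
    return True


proxy, metrics = ProxyCollection(), []
sample = {
    "port_name": "CL1-A",
    "metric": 340,
    "port_number": "0",
    "datetime": datetime(2012, 9, 30, 13, 44, tzinfo=timezone.utc),
    "array_serial": "12345",
}
print(guarded_insert(proxy, metrics, sample))  # True
print(guarded_insert(proxy, metrics, sample))  # False
```

Even in this toy form, every sample costs two writes, and the two steps are not atomic: if the second write fails, the proxy entry is left behind without a matching metric document. That is the overhead I'm worried about.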
Sorry, this is all a bit new and confusing.