(Cross-posting my question from Stack Overflow here, I hope that's OK:
http://stackoverflow.com/questions/12961873/whats-a-good-mongodb-shar...
)
I'm storing performance metrics in the following schema:
{
"_id" : ObjectId("5069d68700a2934015000000"),
"port_name" : "CL1-A",
"metric" : 340,
"port_number" : "0",
"datetime" : ISODate("2012-09-30T13:44:00Z"),
"array_serial" : "12345"
}
Each array has 128 ports. The port names are CL1-A, CL2-A, CL3-A, etc., and
they correspond to port_numbers 0, 1, 2, 3, etc. I'm storing per-minute
data for each metric, in one collection per metric. I'd like to shard the
collections, but I'm having trouble choosing a proper shard key and
working out a unique index strategy.
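To put rough numbers on the volume, here is a back-of-the-envelope sketch. It assumes every port reports exactly one sample per minute; the constant and function names are made up for illustration.

```javascript
// Rough sizing for ONE metric collection, per array.
// Assumption: every port reports exactly one sample per minute.
const PORTS_PER_ARRAY = 128;
const SAMPLES_PER_DAY = 24 * 60; // one sample per minute

function docsPerArrayPerDay() {
  return PORTS_PER_ARRAY * SAMPLES_PER_DAY;
}

function docsPerArrayPerYear() {
  return docsPerArrayPerDay() * 365;
}

console.log(docsPerArrayPerDay());  // 184320
console.log(docsPerArrayPerYear()); // 67276800
```

So each array adds roughly 184 thousand documents per day to each metric collection.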
Am I correct that the only way to enforce a unique key on a sharded
collection is on the shard key? If I want to ensure uniqueness on
(array_serial, port_name, datetime), is that going to be an OK shard key?
Will it provide enough cardinality while still allowing for query
localization and manageable chunks?
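For reference, this is roughly the compound key I have in mind, plus a small plain-JavaScript sketch of how I understand query localization: a query can be routed to specific shards only when its filter covers a leading prefix of the shard key. `SHARD_KEY` and `targetablePrefix` are names I made up, and the `metrics.port_metric` namespace in the comment is hypothetical.

```javascript
// Compound key under consideration. In the mongo shell this spec would be
// passed as:
//   sh.shardCollection("metrics.port_metric", SHARD_KEY, true)
// where the third argument requests a unique index on the shard key.
// "metrics.port_metric" is a hypothetical namespace.
const SHARD_KEY = { array_serial: 1, port_name: 1, datetime: 1 };

// Return the leading shard key fields that a query's filter covers.
// A non-empty result means the query can be targeted using that prefix;
// an empty result means it has to be sent to every shard.
function targetablePrefix(queryFields) {
  const keyFields = Object.keys(SHARD_KEY);
  const filter = new Set(queryFields);
  let n = 0;
  while (n < keyFields.length && filter.has(keyFields[n])) n++;
  return keyFields.slice(0, n);
}

// One array over a time range: routed by array_serial only.
console.log(targetablePrefix(["array_serial", "datetime"]));
// One port of one array: routed by array_serial and port_name.
console.log(targetablePrefix(["array_serial", "port_name"]));
// Time range across all arrays: no usable prefix, so scatter-gather.
console.log(targetablePrefix(["datetime"]));
```

With this field order, my typical "one array, one port, some time range" query would stay localized, while a pure time-range query across all arrays would not.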
Or should I shard only on port_name, so that the records are evenly spread
out across the cluster? If this is the shard key, do I have to keep a proxy
collection made up like this:
{
"_id" : ObjectId("5069d68700a2934015000000"),
"key" : "1234,CL1-A,<dateinmilliseconds>"
}
And only write to the sharded collection if a write to the above proxy
collection succeeds? That seems like a lot of extra overhead.
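To make the two-step write concrete, here is a minimal sketch in plain JavaScript. The in-memory `Set` and array only stand in for the proxy collection (with a unique index on "key") and the sharded collection; `proxyKey`, `makeStore` and `writeMetric` are hypothetical names, not driver calls.

```javascript
// Build the proxy key "<array_serial>,<port_name>,<datetime in ms>".
function proxyKey(doc) {
  return [doc.array_serial, doc.port_name, doc.datetime.getTime()].join(",");
}

// In-memory stand-ins: `proxy` plays the unsharded collection with a unique
// index on "key", `metrics` plays the sharded collection.
function makeStore() {
  return { proxy: new Set(), metrics: [] };
}

// Two-step write: claim the key first, then write the metric only if the
// claim succeeded. Returns true if the metric was stored.
function writeMetric(store, doc) {
  const key = proxyKey(doc);
  if (store.proxy.has(key)) return false; // duplicate: skip the second write
  store.proxy.add(key);    // write 1: proxy collection
  store.metrics.push(doc); // write 2: sharded collection
  return true;
}

const store = makeStore();
const doc = {
  array_serial: "12345",
  port_name: "CL1-A",
  port_number: "0",
  metric: 340,
  datetime: new Date("2012-09-30T13:44:00Z"),
};
console.log(writeMetric(store, doc));                    // true
console.log(writeMetric(store, { ...doc, metric: 999 })); // false
```

Every sample then costs two writes, and against a real cluster the two writes would not be atomic: a failure between them would leave a claimed key with no metric behind it.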
Sorry, this is all a bit new and confusing.