Excellent post, Asya. Do you have any more insights/numbers on a distributed (sharded) cluster benchmarks, which also have replica sets?
I am currently in a planning phase to run YCSB benchmark on a MongoDB cluster. I have 11 machines to run this benchmark, each with following configurations:
RAM: 128GB
Disks: 8 disks, 7200 RPM, 1TB - in a RAID 0 array
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz - 16 physical cores, 32 HT cores
Network: 1x10gbe
OS: CentOS 6.5
JVM: 1.7.0_67
I am thinking about deploying a 10 node sharded cluster with replica sets (1 primary, 2 secondary) and planning to keep 1 node for config servers. I am planning to run YCSB workloads in following order:
1) Load 2 TB of data, all unique keys.
2) Run zipfian get, read all fields
3) Run range scan
4) Run mixed workload of 95% of get and 5% of update operations
5) Run mixed workload of 50% of get and 50% of update operations.
I am going to use the default 1K row size, with 10 fields/columns with 100 bytes of payload and the sharding will be done based on the key which will be in format user+'long int'.
As per mongodb documentation,
http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ and
http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/, I found that defining replica set and sharding adds a manual overhead during the setup.
I would appreciate if you could provide a guidance on what is the best way to setup the MongoDB cluster for this exercise and if you have done similar testing in-house, what type of numbers (throughput/latency) should I expect as an outcome?
Thanks in advance.
Milind