Hi Christian,
To import into a sharded cluster deployment, you can pre-split the collection before importing any data.
You can do this as follows:
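1. Create the empty collection using db.createCollection(), and create the necessary indexes using db.collection.createIndex(), including the index that corresponds to the shard key.
2. Shard the collection using sh.shardCollection().
3. Split the empty collection into chunks and let them be distributed across the shards.
4. Import the data through mongos.
Since the empty chunks are already distributed and balanced, the imported data will go straight into the proper shard, and your import should run faster.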
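As a rough illustration of the pre-splitting steps, here is a minimal sketch that computes evenly spaced split points and builds the matching sh.splitAt() commands as strings, so you can review them before running anything. The namespace "mydb.mycoll", the field "userId", and the key range 0 to 1,000,000 are assumptions made up for this example; substitute your own shard key and a range that matches your data. The helper functions are hypothetical and not part of the shell.

```javascript
// Hypothetical namespace and shard key field; replace with your own.
var NAMESPACE = "mydb.mycoll";
var SHARD_KEY_FIELD = "userId";

// Return (chunks - 1) evenly spaced split points dividing [min, max) into `chunks` ranges.
function evenSplitPoints(min, max, chunks) {
  var points = [];
  var step = (max - min) / chunks;
  for (var i = 1; i < chunks; i++) {
    points.push(Math.floor(min + i * step));
  }
  return points;
}

// Build one sh.splitAt() command string per split point.
function presplitCommands(ns, field, points) {
  return points.map(function (p) {
    return 'sh.splitAt("' + ns + '", { ' + field + ": " + p + " })";
  });
}

var commands = presplitCommands(
  NAMESPACE,
  SHARD_KEY_FIELD,
  evenSplitPoints(0, 1000000, 4)
);

// Use print() in the mongo shell, console.log elsewhere.
var out = typeof print === "function" ? print : console.log;
commands.forEach(function (c) { out(c); });
```

Before running the generated commands through mongos, the collection would already need to be created, indexed, and sharded (for example with sh.enableSharding("mydb") and sh.shardCollection("mydb.mycoll", { userId: 1 }), again using the assumed names). After the splits, check sh.status() to confirm the empty chunks have been spread across the shards before starting the import.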
Please note that you should only pre-split an empty collection.
Choosing the correct shard key is extremely important for a sharded cluster deployment, since the shard key cannot be changed once you shard a collection. For more information regarding shard keys, please see Considerations for Selecting Shard Keys.
Regarding the hashed shard key: A hashed shard key helps distribute writes if the indexed field you are hashing has high cardinality. However, a hashed shard key only supports equality queries based on the shard key, so range queries on a collection with a hashed shard key will always be less efficient scatter/gather queries (see: Distributed Queries). You may be able to choose a shard key that better supports your use case than a hashed shard key.
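To make the difference between targeted and scatter/gather queries concrete, here is a toy model of query routing. It is only a sketch and not how mongos is implemented: the shard names, the one-chunk-per-shard layout, and the toyHash() function are all assumptions for illustration (MongoDB's hashed indexes use a different hash function).

```javascript
// Toy model of query targeting; not MongoDB's real routing code.
var SHARDS = ["shard0", "shard1", "shard2"]; // hypothetical shard names
var HASH_SPACE = 4294967296; // 2^32, the size of the toy hash space

// Toy stand-in for a hash function (assumption; not the hash MongoDB uses).
function toyHash(x) {
  return (x * 2654435761) % HASH_SPACE;
}

// Shard owning `value` when [0, max) is cut into one equal-width chunk per shard.
function ownerOf(value, max) {
  var width = max / SHARDS.length;
  var index = Math.min(SHARDS.length - 1, Math.floor(value / width));
  return SHARDS[index];
}

// Ranged shard key: a range query [lo, hi] only touches the shards
// whose chunks overlap that range.
function targetsRanged(lo, hi, max) {
  var first = SHARDS.indexOf(ownerOf(lo, max));
  var last = SHARDS.indexOf(ownerOf(hi, max));
  return SHARDS.slice(first, last + 1);
}

// Hashed shard key: an equality match hashes to exactly one chunk.
function targetsHashedEquality(value) {
  return [ownerOf(toyHash(value), HASH_SPACE)];
}

// Hashed shard key: neighbouring key values hash to unrelated positions,
// so a range query has to be sent to every shard.
function targetsHashedRange() {
  return SHARDS.slice();
}

var out = typeof print === "function" ? print : console.log;
out("ranged key, range query:    " + targetsRanged(100, 200, 900).join(", "));
out("hashed key, equality query: " + targetsHashedEquality(150).join(", "));
out("hashed key, range query:    " + targetsHashedRange().join(", "));
```

In this model the same range query reaches one shard with a ranged key but all three with a hashed key, which is the trade-off to weigh against the better write distribution.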
However, I also have some questions about your deployment:
> I am implementing a cluster in mongodb (v 3.0.10.)
I would recommend using the latest release in the 3.0.x series, which is currently 3.0.11.
> - 3 ConfigServer on different servers (2 MAC and 1 UBUNTU on port 27020 each)
> - 3 shards on different servers (2 MAC with 64 RAM and 1 UBUNTU 4 RAM)
Regarding the hardware and setup:
Do you mean that the Macs are equipped with 64 GB of RAM, and the Ubuntu machine is equipped with 4 GB of RAM? If so, is there a reason for using machines with such a large discrepancy in hardware capabilities? This is fine for a development setup, but not recommended for a production setup, because MongoDB balances the cluster by counting chunks, not data size, so the 4 GB machine would be assigned roughly as many chunks as each of the 64 GB machines.
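A back-of-the-envelope sketch of why this matters: the snippet below spreads a chunk count evenly across three shards and compares each shard's share of data with its RAM. The shard names, the total of 300 chunks, and the assumption that every chunk is close to the 64 MB default maximum are all illustrative; real chunks are often smaller, and the balancer only evens out counts to within a threshold.

```javascript
var CHUNK_MB = 64; // default maximum chunk size; assumes chunks are nearly full

// Hypothetical shards matching the hardware described above.
var shardHardware = [
  { name: "mac1", ramGB: 64 },
  { name: "mac2", ramGB: 64 },
  { name: "ubuntu", ramGB: 4 }
];

// Distribute chunks evenly by count, ignoring hardware, as a simplified
// picture of count-based balancing.
function balanceByCount(totalChunks, shards) {
  var base = Math.floor(totalChunks / shards.length);
  var extra = totalChunks % shards.length;
  return shards.map(function (s, i) {
    var chunks = base + (i < extra ? 1 : 0);
    return {
      name: s.name,
      ramGB: s.ramGB,
      chunks: chunks,
      dataGB: (chunks * CHUNK_MB) / 1024
    };
  });
}

var out = typeof print === "function" ? print : console.log;
balanceByCount(300, shardHardware).forEach(function (s) {
  out(s.name + ": " + s.chunks + " chunks, ~" + s.dataGB + " GB data, " + s.ramGB + " GB RAM");
});
```

Under these assumptions each shard ends up with 100 chunks, or about 6.25 GB of data, which already exceeds the RAM of the 4 GB machine while barely touching the 64 GB machines.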
Where are the config servers located (e.g. on the same machines as the shards, or on separate hardware), and what is the order of the config servers in the mongos --configdb setting? The first config server listed in the setting will get more read traffic than the other config servers.
Please note that the recommended sharded cluster deployment in a production environment involves using replica sets as shards, as noted in the Production Cluster Architecture page.
Best regards,
Kevin