Best practices for splitting large data


Boas Enkler

unread,
May 10, 2016, 10:38:15 AM5/10/16
to mongodb-user
I'm currently migrating a huge amount of data to MongoDB.

There are about 10 million documents. Each is identified by a date and an assigned company ID.

Now I'm wondering whether it would make any difference if I split the data, so that I would have a separate collection or database for each company and/or date.

Should I expect performance differences between keeping all documents in one collection versus spreading them over multiple collections/databases?

sam

unread,
May 10, 2016, 1:46:06 PM5/10/16
to mongodb-user
You should use MongoDB sharding and pick a shard key that isn't the company or date. Then add secondary indexes to the collection. You only need one collection. You could use separate databases or collections per client, but I don't believe that is needed.
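As a rough sketch of the suggestion above, in the mongo shell (database, collection, and field names such as `mydb.documents`, `companyId`, and `date` are assumptions for illustration, not from the thread):

```javascript
// Enable sharding on the database, then shard the collection on a hashed _id
// rather than on companyId or date, so writes spread evenly across shards.
sh.enableSharding("mydb")
sh.shardCollection("mydb.documents", { _id: "hashed" })

// Secondary compound index to support the common lookups by company and date.
db.documents.createIndex({ companyId: 1, date: 1 })
```

A hashed `_id` is just one reasonable choice here; any key with high cardinality and even write distribution would serve the same purpose.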

Wan Bachtiar

unread,
May 16, 2016, 1:16:22 AM5/16/16
to mongodb-user

Should I expect performance differences between keeping all documents in one collection versus spreading them over multiple collections/databases?

Hi Boas,

The answer depends on your use case and dataset. For example, if your application's frequent queries can be satisfied from a single collection, it would be wise to keep the data in one collection and avoid extra queries against other collections. See also Data Modelling.
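To make the single-collection idea concrete, a hypothetical document shape and query in the mongo shell (field names are assumptions for illustration):

```javascript
// One document per record, tagged with its company and date:
db.documents.insertOne({
  companyId: 42,
  date: ISODate("2016-05-01"),
  payload: { amount: 99.5 }
})

// With everything in one collection, a frequent query such as
// "all documents for company 42 in May 2016" stays a single find():
db.documents.find({
  companyId: 42,
  date: { $gte: ISODate("2016-05-01"), $lt: ISODate("2016-06-01") }
})
```

Backed by a compound index on `{ companyId: 1, date: 1 }`, this query touches only the matching index range, so one large collection need not be slower than many small ones.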

Now I'm wondering whether it would make any difference if I split the data

You may want to check out Sharding. Sharding solves the problem of horizontal scaling: you can add more machines to support data growth and the demands of read and write operations. If you already have a large dataset to migrate to MongoDB, see Manual Split Chunks in a Sharded Cluster.
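For a bulk migration, chunks can be created ahead of the data load so the initial insert traffic doesn't all land on one shard. A minimal sketch in the mongo shell (namespace, key, and split point are assumptions for illustration):

```javascript
// With a hashed shard key, pre-split at collection creation time by asking
// for an initial number of chunks (spread across the shards automatically):
sh.shardCollection("mydb.documents", { _id: "hashed" }, false,
                   { numInitialChunks: 8 })

// With a ranged shard key, split chunks manually at chosen boundaries
// before loading the data:
sh.splitAt("mydb.documents", { companyId: 500 })
```

The split points should roughly match the distribution of the data being imported; the balancer then migrates the empty chunks before the load begins.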


Regards,

Wan.
