Pre-create datafiles



Julien Annibal


Oct 28, 2014, 6:49:02 AM

to mongod...@googlegroups.com

I'm testing a very large data load on my cluster (9 shards). I observed that it takes ~8 s per shard each time a new 2 GB data file is allocated, which increases the loading time by roughly 20%.

So do you have a way to pre-create all the data files? Do you think I could load my database until it reaches the expected size, then drop each collection, and then load my database with the production data?

Regards,

Julien

Will Berkeley


Oct 28, 2014, 12:09:35 PM

to mongod...@googlegroups.com

Yes, you can do this, but I don't think it would gain you anything. Preallocation should happen in the background and not block ongoing inserts, unless the disk is saturated. If you preload the database, you're going to spend time inserting data that you'll just throw away so you can insert the real data, while still incurring the 8 s preallocation penalty per data file during the preload, so the total time to load data will end up being much longer. Plus, if an additional 8 s per 2 GB raises the loading time by 20%, then it takes something like 40 s to add 2 GB, or about 6 1/2 hours to load a terabyte into one shard. That doesn't seem too bad to me, especially given that I ignored the parallelization of inserts across the shards.
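To make the back-of-the-envelope arithmetic explicit, here is a small Python sketch. It assumes the 20% figure means the 8 s is 20% of the pure insert time per 2 GB file (so roughly 40 s of inserts plus 8 s of preallocation per file), takes 1 TB as about 1000 GB, and, as a best case for the preload idea, assumes the real load reuses the already-allocated files and pays no preallocation at all. The function name and constants are illustrative only.

```python
# Back-of-the-envelope check of the numbers discussed in this thread.
# Assumption: "8 s adds ~20%" means 8 s is 20% of the insert time per file.

PREALLOC_S = 8.0   # preallocation cost per data file (from the post)
OVERHEAD = 0.20    # fraction of load time added by preallocation (from the post)
FILE_GB = 2.0      # data file size (from the post)


def estimate_hours(total_gb: float) -> dict[str, float]:
    """Estimate per-shard load time (hours) for the two strategies discussed."""
    insert_s = PREALLOC_S / OVERHEAD  # ~40 s of pure insert work per file
    files = total_gb / FILE_GB

    # Plain load: every new file pays insert time plus preallocation.
    plain = files * (insert_s + PREALLOC_S) / 3600

    # Preload-then-drop: the throwaway pass pays both costs, and the real
    # pass (best case: files already allocated) still pays the insert time.
    preload = plain + files * insert_s / 3600

    return {"plain": plain, "preload_then_drop": preload}


if __name__ == "__main__":
    est = estimate_hours(1000)  # roughly 1 TB into one shard
    print(f"plain load:        {est['plain']:.1f} h")
    print(f"preload then drop: {est['preload_then_drop']:.1f} h")
```

Under these assumptions a terabyte into one shard comes to about 6.7 hours for a plain load and about 12.2 hours for preload-then-drop, so even in the best case the throwaway pass costs far more than the preallocation it avoids.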

As an alternative, you could load the data on a non-production cluster and get it to a ready, unchanging state, then move the data files over to the production instance. This would have to be done for all of the nodes in the cluster, including the config servers. This is essentially restoring from a backup, so you can consult the backup documentation for more information.
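A rough Python sketch of the copy step, for illustration only. It assumes every mongod in both clusters (shard members and config servers) has been cleanly shut down, that the production cluster has the same topology as the staging one, and that each staging node's dbpath maps to exactly one production dbpath; the function name, node names and paths are hypothetical. The MongoDB backup documentation remains the authority on the supported procedure.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dbpaths(node_map: dict[str, tuple[Path, Path]]) -> list[str]:
    """Copy each staging node's dbpath into its production dbpath.

    Assumes all mongod processes on both sides are stopped cleanly, and
    refuses to overwrite a production dbpath that already holds files.
    """
    # Check every destination first so we don't copy half the cluster
    # and then fail on a later node.
    for node, (_, dst) in node_map.items():
        if dst.exists() and any(dst.iterdir()):
            raise FileExistsError(f"{node}: {dst} is not empty")

    copied = []
    for node, (src, dst) in node_map.items():
        # Copies everything under the dbpath, including subdirectories.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(node)
    return copied


if __name__ == "__main__":
    # Hypothetical demo on temporary directories standing in for dbpaths.
    root = Path(tempfile.mkdtemp())
    src = root / "staging-shard0"
    src.mkdir()
    (src / "mydb.0").write_bytes(b"\0" * 16)
    print(copy_dbpaths({"shard0": (src, root / "prod-shard0")}))
```

Checking all destinations before copying anything keeps a mistake in the node mapping from leaving the production cluster half-populated.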