Yes, you can do this, but I don't think it would win you anything. Preallocation should happen in the background and not block ongoing inserts unless the disk is saturated. If you preload the database, you'll spend time inserting data that you're just going to throw away before you insert the real data, and you'll still incur the 8s preallocation penalty per data file, so the total load time will end up much longer. Plus, if an additional 8s per 2GB raises the loading time by 20%, then the base insert time is something like 40s per 2GB (48s including preallocation), which works out to roughly 6 1/2 to 7 hours to load a terabyte into one shard. That doesn't seem too bad to me, especially since that estimate ignores the parallelization of inserts across the shards.
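As a rough check on the arithmetic above, here is a small Python sketch. It assumes each 2GB data file costs a fixed base insert time plus the 8s preallocation penalty, and that the penalty equals 20% of the base time. The function names are hypothetical, and the estimate covers a single shard with no parallel inserts across shards.

```python
# Back-of-envelope load-time estimate for ONE shard (no cross-shard
# parallelism). All defaults are assumptions taken from the figures above.


def base_seconds_per_file(prealloc_s: float, overhead_fraction: float) -> float:
    """If prealloc_s extra seconds raise the load time by overhead_fraction,
    the base (non-preallocation) insert time per file is their ratio."""
    return prealloc_s / overhead_fraction


def load_time_hours(total_gb: float,
                    gb_per_file: float = 2.0,
                    prealloc_s: float = 8.0,
                    overhead_fraction: float = 0.20) -> float:
    """Estimated hours to load total_gb, including the preallocation penalty."""
    files = total_gb / gb_per_file
    base_s = base_seconds_per_file(prealloc_s, overhead_fraction)
    total_s = files * (base_s + prealloc_s)  # base insert + preallocation per file
    return total_s / 3600.0


if __name__ == "__main__":
    print(f"base per 2GB file: {base_seconds_per_file(8.0, 0.20):.0f}s")
    print(f"1 TB  (1000 GB): {load_time_hours(1000):.2f} h")
    print(f"1 TiB (1024 GB): {load_time_hours(1024):.2f} h")
```

With these assumptions it prints a 40s base per file, about 6.67 hours for 1000 GB and about 6.83 hours for 1024 GB, which is where the "6 1/2 to 7 hours" range comes from.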
As an alternative, you could load the data on a non-production cluster, get it into a ready, unchanging state, and then move the data files over to the production instance. This would have to be done for every node in the cluster, including the config servers. This is essentially restoring from a backup, so you can consult the backup documentation for more information.
-Will