Hi Andrey, I think the best example for what you want to do is a little farther down in the Motor tutorial:
>>> @gen.coroutine
... def do_insert():
...     for i in range(2000):
...         future = db.test_collection.insert({'i': i})
...         result = yield future
...
>>> IOLoop.current().run_sync(do_insert)
Tornado's IOLoop.current() creates an event loop for you the first time you call it. "run_sync" starts the loop, runs it until the "do_insert" coroutine completes, then stops the loop.
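If it helps to see what "run_sync" is doing for you, here's a rough sketch of the equivalent manual loop management, assuming the same "do_insert" coroutine as above (the real run_sync also supports a timeout):

from tornado.ioloop import IOLoop

loop = IOLoop.current()
future = do_insert()  # starts the coroutine; it makes progress once the loop runs
loop.add_future(future, lambda f: loop.stop())  # stop the loop when it's done
loop.start()  # blocks here until loop.stop() is called
future.result()  # re-raises any exception do_insert raised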
To parallelize and do 4 concurrent inserts, you might want to split your data into 4 even chunks:
from itertools import islice

data = range(2000)
n_chunks = 4

for i in range(n_chunks):
    # Get every 4th number, starting at i: i, i + 4, i + 8, ...
    chunk = islice(data, i, None, n_chunks)
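To see what those slices contain, you can run the same islice call standalone on a small range:

from itertools import islice

for i in range(4):
    print(list(islice(range(8), i, None, 4)))

# Prints:
# [0, 4]
# [1, 5]
# [2, 6]
# [3, 7]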
Insert them in parallel and wait for all the inserts to finish:
from itertools import islice

from tornado import gen
from tornado.ioloop import IOLoop
from motor import MotorClient


@gen.coroutine
def insert_chunk(chunk):
    for i in chunk:
        yield db.test_collection.insert({'_id': i})


@gen.coroutine
def parallel_inserts():
    yield db.test_collection.remove({})
    data = range(2000)
    n_chunks = 4
    futures = []
    for i in range(n_chunks):
        # Get every 4th number, starting at i: i, i + 4, i + 8, ...
        chunk = islice(data, i, None, n_chunks)
        future = insert_chunk(chunk)
        futures.append(future)

    # Yielding a list of futures waits for all of them in parallel.
    yield futures

    n_inserted = yield db.test_collection.count()
    print('Inserted %d' % n_inserted)


db = MotorClient().test
IOLoop.current().run_sync(parallel_inserts)
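If all 2000 inserts succeed, the final count matches the input size and the script prints:

Inserted 2000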
Each of the four coroutines will run much faster if it does a bulk insert instead of inserting documents one at a time:
@gen.coroutine
def insert_chunk(chunk):
    yield db.test_collection.insert([{'_id': i} for i in chunk])
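If your Motor version is new enough to expose PyMongo 3's CRUD API, "insert_many" is the modern equivalent of a bulk insert; a minimal sketch of the same coroutine with it:

@gen.coroutine
def insert_chunk(chunk):
    # insert_many sends the whole list of documents to the server in one batch.
    yield db.test_collection.insert_many([{'_id': i} for i in chunk])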
I'd expect this parallelization to gain some performance with the MMAPv1 storage engine, but you'll tend to bottleneck on the database-level write lock at a small n_chunks. Increasing n_chunks helps you overcome network latency and lets your client's CPU BSON-encode documents while the server performs inserts, but raising it to a very large number won't win you much. You can scale higher if you insert into collections in multiple databases at once.
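To sketch that last idea, you can give each chunk its own database (hypothetical names test0 through test3), so that under MMAPv1 each insert stream contends for a different database-level lock; otherwise it's the same pattern as parallel_inserts above:

@gen.coroutine
def insert_chunk(collection, chunk):
    yield collection.insert([{'_id': i} for i in chunk])

@gen.coroutine
def parallel_multi_db_inserts():
    client = MotorClient()
    data = range(2000)
    n_chunks = 4
    futures = []
    for i in range(n_chunks):
        # One database per chunk: test0.test_collection, test1.test_collection, ...
        collection = client['test%d' % i].test_collection
        chunk = islice(data, i, None, n_chunks)
        futures.append(insert_chunk(collection, chunk))
    yield futures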
With MongoDB 3 and WiredTiger you'll be able to usefully scale up n_chunks, even if all the inserts go to one collection.