We have seen that writing brings the whole mongo instance’s performance down, i.e.
Hi Xiao,
It’s been a while since you posted this question. Have you found a way to improve insert performance?
Before going deeper into the Spark config, I would recommend limiting the scope of the performance test, for example by testing how your MongoDB instance handles 4M inserts on its own. The goal is to find the performance bottleneck with simple tests, such as checking MongoDB memory usage and disk I/O during the inserts. See also MongoDB Capacity Planning.
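To take Spark out of the picture, a small script along these lines could time plain batched inserts against the instance. This is only a sketch: it assumes pymongo is installed, and the URI, the database and collection names, the document shape, and the batch size are placeholders I made up for illustration. `time_bulk_inserts` and `benchmark_local_instance` are hypothetical helper names, not part of any library.

```python
import time


def time_bulk_inserts(collection, n_docs, batch_size=10_000):
    """Insert n_docs synthetic documents in batches.

    Returns (inserted_count, elapsed_seconds, docs_per_second).
    `collection` only needs an insert_many(documents, ordered=...) method,
    which is what a pymongo Collection provides.
    """
    inserted = 0
    start = time.perf_counter()
    while inserted < n_docs:
        size = min(batch_size, n_docs - inserted)
        # Synthetic documents; replace with something shaped like your real data.
        batch = [{"seq": inserted + i, "payload": "x" * 100} for i in range(size)]
        # Unordered inserts do not stop at the first failing document.
        collection.insert_many(batch, ordered=False)
        inserted += size
    elapsed = time.perf_counter() - start
    rate = inserted / elapsed if elapsed > 0 else float("inf")
    return inserted, elapsed, rate


def benchmark_local_instance(uri="mongodb://localhost:27017", n_docs=4_000_000):
    """Run the benchmark against a test instance (assumed URI and names)."""
    from pymongo import MongoClient  # assumption: pymongo is installed

    coll = MongoClient(uri)["perf_test"]["insert_bench"]  # hypothetical names
    coll.drop()  # start from an empty collection; use a throwaway database
    n, secs, rate = time_bulk_inserts(coll, n_docs)
    print(f"{n} docs in {secs:.1f}s ({rate:.0f} docs/s)")
```

Calling `benchmark_local_instance()` while watching memory and disk I/O on the MongoDB host should show whether the instance itself is the bottleneck before any Spark tuning.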
You may find the following performance-related resources useful:
Is there a way to ensureIndex on the collection through mongo-spark connector’s python api? Will this help speed up writing?
Generally, adding an index won’t improve insert performance, because every insert also has to update each index on the collection. An index may speed up update operations that have to query for the documents to modify, but that depends on the update operation itself.
we are using Mongo 3.2 without wiredTiger
If possible, I would recommend testing the WiredTiger storage engine for your use case.
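When comparing engines, it helps to confirm which one a given instance is actually running. A minimal sketch, assuming pymongo and relying on the `storageEngine` section of the `serverStatus` command output (`storage_engine_name` is a hypothetical helper name):

```python
def storage_engine_name(db):
    """Return the storage engine name reported by serverStatus.

    `db` is expected to behave like a pymongo Database, i.e. to expose
    command(name) returning the server's reply as a dict.
    """
    status = db.command("serverStatus")
    return status["storageEngine"]["name"]
```

With pymongo this could be called as `storage_engine_name(MongoClient(uri)["admin"])`, where `uri` is your own connection string.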
Regards,
Wan.