Hi,
I'm trying to find a way to insert a bunch of docs via the Python bulk API, e.g. insert_many, when the docs may contain duplicates.
The current behavior is that the insert stops at the first duplicate key error and does not proceed, because an exception is thrown. I'd like to avoid that: I want the bulk insert to succeed and simply skip the duplicates. Here is a simple example:
from pymongo import MongoClient, DESCENDING
from pymongo.errors import BulkWriteError
uri = 'mongodb://localhost:8230'
client = MongoClient(uri)
coll = client['test']['db']
coll.create_index([('test',DESCENDING)], unique=True)
docs = [{'test':1} for _ in range(10)] + [{'foo':1} for _ in range(5)]
try:
    coll.insert_many(docs)
except BulkWriteError:
    pass
docs = [{'bla':1} for _ in range(10)]
coll.insert_many(docs)
After running this, I see only two docs in test.db:
{"test": 1, "_id": "572bd5392f74d466951ebb4a"}
{"_id": "572bd5392f74d466951ebb55", "bla": 1}
whereas I want to see three docs: one with the test key, one with foo, and one with bla.
We have an application that needs to write millions of docs, and I was hoping to avoid a full scan to remove duplicates beforehand. Of course I can use plain single-document inserts, but that will be much slower.
Thanks,
Valentin.