batch insert failure management

mrCode

Mar 2, 2012, 4:46:47 AM
to mongodb-user
My application does a ton of parallel batch inserts, and every item/
document in every batch must be inserted successfully into the
database. If one is not, I need to know which ones failed so I can
try them again.

I know a batch insert is not atomic and the call could fail for
reasons other than resource failure, such as a timeout or a network
problem. I think in such a situation part of the batch has been
persisted and part has NOT, and several items in the batch could have
failed. As far as I know, mongodb will NOT tell me which ones were
persisted and which ones failed. I can probably use ContinueOnError
so that the batch at least divides into the ones that were persisted
and the ones that failed, but mongo still will not tell me which ones
failed. Is this correct?

My application can NOT live with a single item not being persisted,
nor with double insertion (I cannot simply insert the same batch again
upon failure). How should I handle a partial batch insert failure
without knowing which items failed? What comes to mind is to add logic
in the application layer: upon a batch failure, query back all the
items in the batch, figure out which ones are missing, create a new
batch of the missing ones, and try to persist it again. Is that my
only option?
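
A minimal sketch of that query-back step, in Python. It assumes every document in the batch carries an _id assigned by the application before the insert, and that `collection` behaves like a PyMongo collection whose `find(filter, projection)` returns matching documents. The function name `find_missing` is hypothetical.

```python
from typing import Any, Dict, List


def find_missing(collection: Any, batch: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the documents from `batch` whose _id is not in `collection`."""
    ids = [doc["_id"] for doc in batch]
    # Request only the _id field; the rest of each document is not needed
    # to decide whether it was persisted.
    cursor = collection.find({"_id": {"$in": ids}}, {"_id": 1})
    persisted = {row["_id"] for row in cursor}
    # Keep the original batch order so the retry batch is predictable.
    return [doc for doc in batch if doc["_id"] not in persisted]
```

The returned list can be sent as the retry batch. Because _id is unique, a document that was in fact persisted cannot be stored a second time under the same _id.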

Randolph Tan

Mar 5, 2012, 11:46:40 AM
to mongod...@googlegroups.com
Basically, there are a couple of options:

1. Your method, which is to remember the _id values of the batch and query them back. Depending on the size of the batch, this may not be as bad as it sounds, for 2 reasons: _id is indexed, and the default BSON ObjectId increases over time, which means the batch documents will most likely be very close to each other in the b-tree. To make this really fast, you can limit the result to return only the _id field so the query can be covered by the index.
2. Decompose the batch into individual inserts and call getLastError after each one. I would imagine that this would be slower than 1, since you get more back-and-forth traffic.
3. There is an issue (https://jira.mongodb.org/browse/SERVER-4637) that is close to what you are looking for.
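
Option 2 could look roughly like the Python sketch below. It assumes `collection` behaves like a PyMongo collection with an acknowledged write concern, so that `insert_one` raises when the server reports an error; that plays the role of calling getLastError after each insert. The function name `insert_one_by_one` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def insert_one_by_one(
    collection: Any, batch: List[Dict[str, Any]]
) -> List[Tuple[Dict[str, Any], Exception]]:
    """Insert each document separately and collect the ones that raised."""
    failed: List[Tuple[Dict[str, Any], Exception]] = []
    for doc in batch:
        try:
            # One round trip per document: slower than a batch, but every
            # failure is tied to exactly one document.
            collection.insert_one(doc)
        except Exception as exc:
            # Keep going so one bad document does not block the rest.
            failed.append((doc, exc))
    return failed
```

A network error or timeout leaves the outcome of that one insert unknown, so a document in the returned list may still have been persisted. Checking it by _id before retrying, as in option 1, avoids guessing.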