How to avoid a MongoDB race condition


luocheng

May 27, 2012, 11:58:22 PM
to mongod...@googlegroups.com
Hi all,
I want to update some data in MongoDB. My logic is as follows:
    # Find the specific document with (md5, time, size).
    query = {"src_md5": file_md5, "src_time": file_time, "src_size": file_size}
    if collection.find(query).count() == 0:
        # Not found. If no document has an idx yet, start at 1.
        if collection.find({}, {"idx": 1}).count() == 0:
            idx = 1
        # Otherwise sort by idx descending, take the biggest one, and add 1.
        else:
            idx = collection.find({}, {"idx": 1}).sort('idx', -1).limit(1)[0]['idx']
            idx = idx + 1
        # Insert the file info together with its idx.
        if not self.insertFileInfo(collection, file_obj, file_md5, file_time, file_size, long(idx)):
            return None
    # The document with (md5, time, size) already exists:
    else:
        # just fetch the idx for this specific (md5, time, size).
        idx = collection.find(query, {"idx": 1})[0]['idx']
        return None

I will run the above code on 4 machines, which means 4 processes would update MongoDB almost simultaneously. How can I ensure the operations are atomic?

Eliot Horowitz

May 28, 2012, 12:03:01 AM
to mongod...@googlegroups.com
Are you just trying to simulate an auto-increment key?
In general I would avoid it, but if you need to, the best thing is just to have a unique index on that field.
Then you'll just get a duplicate key error if there is a race, and you can retry.
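For illustration, a minimal sketch of that retry loop (the name insert_with_next_idx is made up for this example; it uses the modern PyMongo API and assumes a unique index on "idx"):

    from pymongo.errors import DuplicateKeyError

    def insert_with_next_idx(collection, doc):
        """Assign the next idx, retrying if another process wins the race."""
        while True:
            # Read the current largest idx (None if no document has one yet).
            last = collection.find_one({"idx": {"$exists": True}},
                                       sort=[("idx", -1)])
            doc["idx"] = last["idx"] + 1 if last else 1
            try:
                # The unique index on "idx" makes this insert the atomic step.
                collection.insert_one(doc)
                return doc["idx"]
            except DuplicateKeyError:
                # Another process took this idx first; recompute and retry.
                continue

If there is also a compound unique index on the file fields, check which index raised the duplicate key error before retrying, since a duplicate *file* should return the existing idx rather than loop.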

luocheng

May 28, 2012, 12:44:37 AM
to mongod...@googlegroups.com
My record schema is:
 {"src_md5":"djapijfdakfiwqjfkasdj","src_size":2376498,"src_time":1338179291,"idx":1}
 {"src_md5":"jdfipajkoijjipjefjidwpj","src_size":234876323,"src_time":1338123873,"idx":2}
 {"src_md5":"djapojfkdasxkjipkjkf","src_size":3829874,"src_time":1338127634,"idx":3}

It's not a simple auto-increment key: it should be incremented when (md5, size, time) changes, and inserted together with them as one record.
I created a compound unique index on {"src_md5","src_time","src_size"} and a unique index on {"idx"}, but before I insert new info I have to fetch the idx that already exists and then increment it.
There are two situations:
1. if an idx for the specific (md5, size, time) already exists, just return that idx;
2. if not, increment idx by 1.
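In PyMongo those two indexes can be declared like this (a sketch; "collection" is the handle from the code above):

    collection.create_index(
        [("src_md5", 1), ("src_time", 1), ("src_size", 1)], unique=True)
    collection.create_index("idx", unique=True)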



Saar Korren

May 28, 2012, 2:58:52 AM
to mongodb-user
You can use a second collection and the procedure below (a code sketch follows the list).
For brevity, we'll call the second collection "temp_lock". We'll also
refer to {"src_md5":"...", "src_size":"...", "src_time":"..."} as a
"file", and to the collection which contains the files as documents as
"myfiles".
1. Use findAndModify with upsert to insert the file into temp_lock, if
it does not exist. If an upsert did not occur, quit.
2. Search for the file in myfiles. If it is found, remove the file
from temp_lock, and quit.
3. Use findAndModify and $inc to increment a field "idx" in a document
in temp_lock. Find this document using {idx:{$exists:true}}. Use
upsert to create it if no document exists. Retrieve the *new* "idx".
4. Insert the file with the new "idx" into myfiles. You can use the
unique indexes for extra safety, but they are not needed at this point.
5. Remove the file from temp_lock.
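Here is a minimal sketch of steps 1-5 (the name get_or_assign_idx and the db handle are illustrative; find_one_and_update is the modern PyMongo spelling of findAndModify):

    from pymongo import ReturnDocument

    def get_or_assign_idx(db, file_md5, file_time, file_size):
        file_key = {"src_md5": file_md5, "src_time": file_time,
                    "src_size": file_size}

        # 1. Atomically claim the file in temp_lock. If a document comes
        #    back, another process already holds the claim: quit.
        previous = db.temp_lock.find_one_and_update(
            file_key, {"$setOnInsert": {"claimed": True}}, upsert=True,
            return_document=ReturnDocument.BEFORE)
        if previous is not None:
            return None

        # 2. If the file is already in myfiles, release the claim and quit.
        existing = db.myfiles.find_one(file_key)
        if existing is not None:
            db.temp_lock.delete_one(file_key)
            return existing["idx"]

        # 3. Atomically $inc the shared counter document in temp_lock
        #    (upserted on first use) and read the *new* idx.
        counter = db.temp_lock.find_one_and_update(
            {"idx": {"$exists": True}}, {"$inc": {"idx": 1}}, upsert=True,
            return_document=ReturnDocument.AFTER)
        idx = counter["idx"]

        # 4. Insert the file with its new idx into myfiles.
        db.myfiles.insert_one(dict(file_key, idx=idx))

        # 5. Release the claim.
        db.temp_lock.delete_one(file_key)
        return idx

The claim in step 1 is what serializes concurrent inserts of the same file; the counter document serializes idx assignment across all files.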

Using this method, "myfiles" will never contain duplicate files or
duplicate idx values, and any "skips" in idx will be temporary, which is
a form of eventual consistency (this is the best guarantee possible here).
Note that this does assume none of the processes ever crash or restart.
It can be improved by giving processes unique IDs with which they mark
the files they are trying to insert, so that a process can retry an
insert after a crash.
