Implementing Optimistic Locking

Aurélien Geron

Apr 11, 2013, 6:44:00 AM
to mongoeng...@googlegroups.com
Hi everyone,

I'm new to MongoDB and MongoEngine, and I just joined this mailing list.  :-)

I've read the docs and searched the web and this mailing list, and unless I'm mistaken there's no discussion about optimistic locking.  Here's an example where some data is lost because of a race condition:

from mongoengine import *
connect("test")

class Task(EmbeddedDocument):
    text = StringField()

class TodoList(Document):
    name = StringField()
    tasks = ListField(EmbeddedDocumentField(Task))
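    def move_task(self, index, new_index):
        # Insert a second reference at the new position, then delete the original,
        # whose index shifts by one if the insert landed at or before it.
        self.tasks.insert(new_index, self.tasks[index])
        if new_index <= index:
            index += 1
        del self.tasks[index]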
    def __str__(self):
        return self.name + ": " + ", ".join([task.text for task in self.tasks])

todo = TodoList(name="Organize birthday")
todo.tasks.append(Task(text = "Buy presents"))
todo.tasks.append(Task(text = "Send invitations"))
todo.tasks.append(Task(text = "Bake cake"))
todo.save() # <TodoList: Organize birthday: Buy presents, Send invitations, Bake cake>


todo_copy = TodoList.objects.get(id = todo.id) # another instance for the same mongodb document
todo_copy.tasks.insert(0, Task(text = "Prepare a poem"))
todo_copy.save() # <TodoList: Organize birthday: Prepare a poem, Buy presents, Send invitations, Bake cake>


todo.move_task(1, 0)  # reordering arrays seems like a dangerous mongodb operation, without locking.
todo.save() # <TodoList: Organize birthday: Send invitations, Buy presents, Bake cake>

todo.reload() # <TodoList: Organize birthday: Send invitations, Buy presents, Bake cake>

When manipulating multiple Document instances that represent the same MongoDB document, we run the risk of losing data.  In this case, the "Prepare a poem" task is lost: the second save writes back a task list that was loaded before that task was added.
I know that pushing an element onto the end of a list is atomic, and MongoDB 2.4 now handles push+order, but there are many cases where that's just not enough, for example when you need to reorder a list.

Wouldn't it be nice if we could write something like this?

...
class TodoList(Document):
    version = IntField()
    name = StringField()
    ...
...

todo.save(lock__version = True)

This would raise an exception in case the document's "version" was changed between (re)loading and saving the TodoList instance.  If the save succeeds, the version field is incremented.  Note that this syntax allows for multiple locks, even within one save/update operation.
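
To make the check+increment idea concrete, here is a minimal in-memory sketch.  It is not MongoEngine code: InMemoryCollection and VersionConflict are hypothetical names, and the plain dict only stands in for a MongoDB collection.

```python
import copy


class VersionConflict(Exception):
    """Hypothetical error: the stored version differs from the loaded one."""


class InMemoryCollection:
    """Toy stand-in for a MongoDB collection (document id -> dict)."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, fields):
        doc = copy.deepcopy(fields)
        doc["version"] = 0
        self._docs[doc_id] = doc

    def load(self, doc_id):
        # Each caller gets its own snapshot, like a separate Document instance.
        return copy.deepcopy(self._docs[doc_id])

    def save(self, doc_id, snapshot):
        stored = self._docs[doc_id]
        # Check: refuse to overwrite if someone else saved since we loaded.
        if stored["version"] != snapshot["version"]:
            raise VersionConflict("document %r changed since it was loaded" % doc_id)
        # Increment: a successful save bumps the version.
        snapshot["version"] += 1
        self._docs[doc_id] = copy.deepcopy(snapshot)


store = InMemoryCollection()
store.insert("todo", {"tasks": ["Buy presents", "Send invitations", "Bake cake"]})

first = store.load("todo")
second = store.load("todo")

second["tasks"].insert(0, "Prepare a poem")
store.save("todo", second)  # version goes from 0 to 1

first["tasks"].insert(0, first["tasks"].pop(1))  # reorder on a stale snapshot
try:
    store.save("todo", first)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True  # the stale save is rejected, nothing is lost
```

Against a real server, the check and the write would have to be a single conditional update rather than two separate steps, otherwise the race simply moves into the gap between them.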

We could also have something simpler, like this:

...
class TodoList(Document):
    version = IntField(optimistic_lock = True)
    name = StringField()
    ...
...

todo.save()

All saves and updates would automatically check+increment the version field.

I guess we could also limit the lock to some specific fields:

...
class TodoList(Document):
    version = IntField(optimistic_lock_for = ["tasks"])
    name = StringField()
    ...
...

todo.save()

This would use the lock (meaning checking+incrementing) only if the tasks field is modified.

One last option:

...
@optimistic_lock
class TodoList(Document):
    name = StringField()
    ...
...

todo.save()  # all saves and updates would automatically check+increment a "_ver" field.

We may not need all (or any) of the previous options.  Does this sound cool?  Do you have any other ideas?

Cheers,
Aurélien

Ross Lawley

Apr 11, 2013, 8:59:28 AM
to mongoeng...@googlegroups.com
Hi Aurélien,

That is definitely one solution and I'd love for an implementation to be added to mongoengine!  So, does anyone want to send a pull request? :D  I'd happily help get it into the codebase :)

However, there is a solution available now with MongoDB: do not use save ;)  It's problematic because it is an overwrite and not an atomic update.  Operators like $set and $push help ensure you don't overwrite data when several threads write to the same document.
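
The difference between an overwriting save and an operator-style update can be illustrated with a toy simulation.  This sketch is not from the thread and talks to no database: save_whole_document and push_task are hypothetical helpers that only mimic the two behaviours on a plain dict.

```python
import copy


def save_whole_document(server, local_copy):
    """Mimic an overwriting save: the stored list is replaced wholesale."""
    server["tasks"] = list(local_copy["tasks"])


def push_task(server, task):
    """Mimic a $push-style update: only the new element is sent, and it is
    applied to whatever the server currently holds."""
    server["tasks"].append(task)


initial = {"tasks": ["Buy presents"]}

# Two clients load the same document, each appends a task, both save.
server = copy.deepcopy(initial)
client_a = copy.deepcopy(server)
client_b = copy.deepcopy(server)
client_a["tasks"].append("Prepare a poem")
client_b["tasks"].append("Book venue")
save_whole_document(server, client_a)
save_whole_document(server, client_b)
after_saves = server["tasks"]  # client A's task has been overwritten

# The same two changes expressed as operator-style updates.
server = copy.deepcopy(initial)
push_task(server, "Prepare a poem")
push_task(server, "Book venue")
after_pushes = server["tasks"]  # both tasks survive
```

In MongoEngine, the operator-style form would look something like TodoList.objects(id=todo.id).update_one(push__tasks=Task(text="Prepare a poem")).  Note that this covers appends, not the reordering case raised in the first message.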

Ross




Aurélien Geron

Apr 15, 2013, 9:05:27 AM
to mongoeng...@googlegroups.com
Hi Rozza,

I just implemented a first draft.  It's available for testing and commenting on my github clone of mongoengine:

It's not exactly what I promised, but it does the job.  Basically, the idea is that every field can have a list of version_locks, which are the names of IntFields that should be checked and incremented whenever that field is modified and the document is saved.

Here's an example:

        class Task(EmbeddedDocument):
            description = StringField()

        class TodoList(Document):
            name = StringField()
            tasks_version = IntField(db_field = "tver")
            tasks = ListField(EmbeddedDocumentField("Task", db_field="t"), version_locks = ["tasks_version"])

Now whenever a task list is saved, if the tasks field is modified (the list itself or any task in the list), then the tasks_version IntField is checked against the value in mongodb, and incremented if it has the expected value.  If it's not the expected value, then a VersionLockError is raised.  For example (taken from tests/test_version_locks.py):

        todo_list1 = self.TodoList(name='Test').save()
        todo_list2 = self.TodoList.objects.get(id = todo_list1.id)

        self.assertIsNone(todo_list1.tasks_version)
        self.assertIsNone(todo_list2.tasks_version)

        todo_list1.name = "First change"
        todo_list1.save()
        self.assertIsNone(todo_list1.tasks_version)
        self.assertIsNone(todo_list2.tasks_version)
        
        todo_list2.name = "Second change (no conflict)"
        todo_list2.save()
        self.assertIsNone(todo_list1.tasks_version)
        self.assertIsNone(todo_list2.tasks_version)

        todo_list1.tasks = [self.Task(description = "Buy presents")]
        todo_list1.save()
        self.assertEquals(todo_list1.tasks_version, 1)
        self.assertIsNone(todo_list2.tasks_version)

        todo_list2.name = "Third change (still no conflict)"
        todo_list2.save()
        self.assertEquals(todo_list1.tasks_version, 1)
        self.assertIsNone(todo_list2.tasks_version)

        todo_list2.tasks = [self.Task(description = "Party")]
        self.assertRaises(VersionLockError, todo_list2.save)
        self.assertEquals(todo_list1.tasks_version, 1)
        self.assertIsNone(todo_list2.tasks_version)

        todo_list2.reload()
        self.assertEquals(todo_list1.tasks_version, 1)
        self.assertEquals(todo_list2.tasks_version, 1)

        todo_list2.tasks.append(self.Task(description = "Party"))
        todo_list2.save()
        self.assertEquals(todo_list1.tasks_version, 1)
        self.assertEquals(todo_list2.tasks_version, 2)
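
For readers wondering what such a save might send to MongoDB, here is a rough sketch of how the query and update documents could be assembled.  It is an assumption for illustration, not the code from the fork: build_versioned_update and its parameters are hypothetical, and it only builds plain dicts.

```python
def build_versioned_update(doc_id, changed, version_locks, loaded_versions):
    """Build (query, update) dicts for a conditional update.

    changed:         db field name -> new value
    version_locks:   db field name -> list of version db fields guarding it
    loaded_versions: version db field -> value seen at load time (None if unset)
    """
    query = {"_id": doc_id}
    increments = {}
    for field in changed:
        for lock in version_locks.get(field, []):
            expected = loaded_versions.get(lock)
            # A version that was never set is matched as "field is absent".
            query[lock] = {"$exists": False} if expected is None else expected
            # $inc on a missing field creates it with the increment value.
            increments[lock] = 1
    update = {"$set": dict(changed)}
    if increments:
        update["$inc"] = increments
    return query, update


locks = {"t": ["tver"]}  # the tasks field ("t") is guarded by tasks_version ("tver")

# Changing only the name touches no lock.
q1, u1 = build_versioned_update(42, {"name": "First change"}, locks, {"tver": None})

# First change to the tasks: the version must still be absent, and becomes 1.
q2, u2 = build_versioned_update(42, {"t": [{"description": "Buy presents"}]}, locks, {"tver": None})

# Later change: the version must still equal the value that was loaded.
q3, u3 = build_versioned_update(42, {"t": []}, locks, {"tver": 1})
```

With a driver such as PyMongo, the pair would be passed to a single update call, and an update that matches no document would be reported as a version conflict.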

As a special shortcut, setting the version_locks attribute in the document class's _meta dictionary applies the version_locks to all fields.  For example:

        class Task(EmbeddedDocument):
            description = StringField()

        class TodoList(Document):
            meta = { "version_locks": ["version"] }
            name = StringField()
            version = IntField(db_field = "ver")
            tasks = ListField(EmbeddedDocumentField("Task", db_field="t"))

Now anytime the document is saved, the version lock is checked & incremented.  If its value is unexpected, a VersionLockError is raised:

        todo_list1 = self.TodoList(name='Test').save()
        todo_list2 = self.TodoList.objects.get(id = todo_list1.id)

        todo_list1.name = "First change"
        todo_list1.save()
        
        todo_list2.name = "Second change (conflicts)"
        self.assertRaises(VersionLockError, todo_list2.save)
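
On the caller's side, a conflict is usually handled by reloading, reapplying the change and saving again.  The sketch below shows one possible retry loop; save_with_retry is a hypothetical helper, VersionLockError is redefined locally as a stand-in, and the "document" is a plain dict rather than a MongoEngine object.

```python
class VersionLockError(Exception):
    """Local stand-in for the exception named in this thread."""


def save_with_retry(load, mutate, save, attempts=3):
    """Reload, reapply the change and save again until no conflict occurs."""
    for _ in range(attempts):
        doc = load()
        mutate(doc)
        try:
            save(doc)
            return doc
        except VersionLockError:
            continue  # someone else saved first: start over from fresh data
    raise VersionLockError("gave up after %d attempts" % attempts)


# Toy storage that simulates one concurrent write before our first save.
stored = {"name": "Test", "version": 0}
interfering_writes = [1]


def load():
    return dict(stored)


def save(doc):
    if interfering_writes[0] > 0:
        interfering_writes[0] -= 1
        stored["version"] += 1  # another process got there first
    if doc["version"] != stored["version"]:
        raise VersionLockError("stale version")
    doc["version"] += 1
    stored.update(doc)


def rename(doc):
    doc["name"] = "Second change"


result = save_with_retry(load, rename, save)
```

Whether a blind retry is appropriate depends on the change: renaming can be reapplied safely, whereas an index-based reorder may need to be recomputed against the fresh list.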

This is a very quick implementation, and I'm new to both mongodb and mongoengine, so I hope I didn't mess everything up.

Caveat: for the moment, the version_locks should only be set on fields in root document classes, not in EmbeddedDocuments or dynamic documents.

Could you please check this out and tell me what you think?  I'll be happy to send you a pull request if you think the code is good enough for your standards.

BTW, is there a specific reason why updates and removals were done in two different calls to the update method on the collection in the document's save method?  I now do both in a single update and the tests still pass, so I hope I haven't broken the law!  ;-)

Cheers,
Aurélien

Ross Lawley

Apr 15, 2013, 2:58:59 PM
to mongoeng...@googlegroups.com
Hi Aurelien,

That looks great!  I need to spend more time reviewing it - but looks good so far.  We need to make sure people understand it only works if you use save().

I'll schedule it for 0.8 - but thanks for the contribution, feel free to add a pull request.

Regarding update using separate $set and $unset calls - I think that has been fixed in 0.8, but yes, there was no reason for it!

Ross

Aurélien Geron

Apr 15, 2013, 5:25:22 PM
to mongoeng...@googlegroups.com
Hi Ross,

I'm glad you like it.  :-))  I just sent you a pull request.

Unfortunately, I'll be in a cave for a few days (work, work, work), but I'll be back!  Please tell me whatever you want me to fix or improve, and I'll do my best.  I'm still very new to both mongodb and mongoengine, so I may be missing trivial stuff.  I'm not sure how version locks could work with embedded documents, for example:

    class Post(EmbeddedDocument):
        text = StringField()
        comments_version = IntField()
        comments = ListField(StringField(), version_locks = ["comments_version"])

    class Blog(Document):
        posts = ListField(EmbeddedDocumentField("Post"))

    blog = Blog()
    blog.posts.append(Post(text="foo", comments=["bla bla", "+1"]))
    blog.posts.append(Post(text="bar", comments=["great", "cool"]))
    blog.posts.append(Post(text="zap", comments=["bad bad", "amazing"]))
    blog.save()

    blog.posts[0].comments.append("Hey!")
    del blog.posts[2].comments[0]

    blog.save()

In this example, I guess we would have to check the version locks based on the Posts' index in the posts ListField.  Something like this in the mongo shell:

db.blog.update({ "posts.0.comments_version": {$exists: false}, "posts.2.comments_version": {$exists: false} },
  { $inc: {"posts.0.comments_version": 1, "posts.2.comments_version": 1},
    $set: {"posts.0.comments": ["bla bla", "+1", "Hey!"], "posts.2.comments": ["amazing"]} })

There's a risk that a post may be inserted by another process.  This would lead to checking and possibly updating the wrong post's version locks.  I guess you'd have to add a version lock for the posts ListField itself to avoid this problem.  Version locks in EmbeddedDocuments could be useful in some cases, but this kind of pitfall should be well documented.
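
The index pitfall is easy to reproduce on plain Python data.  The following sketch is only an illustration: resolve is a hypothetical helper that follows a dotted path the way "posts.0.comments_version" would be followed, and posts_version plays the role of a version lock on the posts list itself.

```python
def resolve(doc, dotted_path):
    """Follow a dotted path such as 'posts.0.text' through dicts and lists."""
    node = doc
    for part in dotted_path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


blog = {
    "posts_version": 0,
    "posts": [{"text": "foo"}, {"text": "bar"}, {"text": "zap"}],
}

path = "posts.0.text"
loaded_posts_version = blog["posts_version"]
seen_at_load = resolve(blog, path)  # the post we intend to modify

# Another process inserts a post at the front and bumps the list-level lock.
blog["posts"].insert(0, {"text": "new"})
blog["posts_version"] += 1

seen_at_save = resolve(blog, path)  # same path, different post
list_lock_still_valid = blog["posts_version"] == loaded_posts_version
```

The path now points at the newly inserted post, so a per-post version check on it would pass or fail for the wrong reasons; only the list-level version reveals that the indexes have shifted.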

Cheers,
Aurélien