Options for merging and conflict resolution for offline data


Andrew Wilcox

Sep 13, 2013, 3:12:58 PM
to meteo...@googlegroups.com
Suppose a user performs an operation A resulting in a Meteor method call (which could be an insert, update, or remove on a collection, each of which internally turns into a Meteor method call), followed later by someone performing operation B (it might be a second person, or perhaps the same person on a different device).

There are various ways in which these two operations might conflict or need to be merged.

For example, delivery of A might be delayed so that B arrives at the server first.  This can happen in a standard Meteor application even without offline data: the first device might have temporarily lost its Internet connection.  Meteor will retry until it gets an Internet connection again, and then automatically send any waiting methods to the server.  This effect can be much more pronounced with an offline data implementation: a device might be offline for hours or days, and once it is turned on and goes online it could finally send the waiting method call, if that's the right design choice for an application.

When might this be a problem?  Imagine I'm in a cafe on my tablet without an Internet connection.  I type in a note, turn off the device, go home, and later in the day -- or even tomorrow -- I turn on my tablet and it finally has an Internet connection.  The note might be Very Important to me, so I really don't want it to get lost.  I really do want the tablet to save the note I typed in for as long as is necessary to get it safely to the server.

However, suppose at the cafe I had instead typed in just a short text: "idea: pet rock poem".  I go home and again don't have my tablet on, but this time I get on my desktop and in that same note field (which is empty because my tablet hasn't connected yet) I write my masterpiece "Ode to Pet Rocks".  An hour later I happily save and turn on my tablet to do something else... and the text "idea: pet rock poem" is finally delivered to the server... and overwrites my actual poem.

So one goal might be that even if B arrives at the server before A, if A happened first then we want "B arrives at the server and then A arrives at the server" to have the same outcome (the same final result in the database) as "A arrives first at the server followed by B".

To take the particular example of setting a field

Notes.update(noteId, {$set: {text: noteText}});

we could give the update a timestamp, and on the server keep track of a timestamp value for the "text" field.  Then at the server we could avoid saving A if it has an earlier timestamp than the current timestamp value in the database, because we know that a later update (B) had already been applied.
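Here is a minimal sketch of that timestamp check in plain JavaScript, with an in-memory Map standing in for the server's Notes collection.  The names setNoteText and textUpdatedAt are made up for illustration, and it assumes the client clocks are reasonably in sync, which a real application would have to deal with.

```javascript
// Hypothetical in-memory stand-in for the server-side Notes collection.
const notes = new Map();

// Apply a "set text" operation only if it is not older than what is stored.
// clientTime is when the user made the edit, not when it reached the server.
function setNoteText(noteId, text, clientTime) {
  const note = notes.get(noteId) || { text: "", textUpdatedAt: 0 };
  if (clientTime < note.textUpdatedAt) {
    return false; // a later edit was already applied, so drop this one
  }
  notes.set(noteId, { text: text, textUpdatedAt: clientTime });
  return true;
}

// B (desktop, the later edit) reaches the server before A (tablet, the earlier edit).
const appliedB = setNoteText("note1", "Ode to Pet Rocks", 2000);
const appliedA = setNoteText("note1", "idea: pet rock poem", 1000);

console.log(notes.get("note1").text); // prints "Ode to Pet Rocks"
```

The delayed operation A is rejected, so the final text is the same as if A had arrived first and B second.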

However, there are lots of different kinds of operations.  I might have a method that commutes, and so A and B can be applied in either order without worrying about timestamps:

Entries.update(entryId, {$inc: {votes: 1}});
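To make the difference concrete, here is a small sketch on plain objects (no Meteor or Mongo involved; the inc and set helpers are hypothetical stand-ins for $inc and $set on a single field).  The two increments give the same result in either order, while the two sets do not.

```javascript
// Hypothetical helpers mimicking $inc and $set on one field of a plain object.
function inc(doc, field, amount) {
  return Object.assign({}, doc, { [field]: (doc[field] || 0) + amount });
}
function set(doc, field, value) {
  return Object.assign({}, doc, { [field]: value });
}

// Two votes: the order of arrival does not matter.
const entry = { votes: 10 };
const voteA = (doc) => inc(doc, "votes", 1);
const voteB = (doc) => inc(doc, "votes", 1);
const incAB = voteB(voteA(entry)).votes;
const incBA = voteA(voteB(entry)).votes;

// Two sets: the last one to arrive wins, so the order matters.
const note = { text: "" };
const setA = (doc) => set(doc, "text", "idea: pet rock poem");
const setB = (doc) => set(doc, "text", "Ode to Pet Rocks");
const setAB = setB(setA(note)).text;
const setBA = setA(setB(note)).text;

console.log(incAB === incBA); // true
console.log(setAB === setBA); // false
```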

Operational transform is a sophisticated example of designing rules so that "apply B and later apply A" can have a consistent result.

On the other hand, Meteor methods can be arbitrarily complex, so there's no general way of taking any operation A and operation B and having them be able to be applied in either order and get the same result.  You need to look at the particular operation that your particular application needs to do, and figure out what will work.

Nor is "come up with a final result that's the same whether A or B arrives at the server first" the only possible goal that one might have for conflict resolution.

For example, suppose first someone edits a field and then later someone else edits the field.  If the second person sees the text entered by the first person, then we can assume they'll take into account what was written (and so, for instance, if they delete something we can expect that they really meant to delete it).

But suppose the second person is offline and so doesn't see the text the first person entered.  If operation A is "set grocery list to 'apples, bananas'" and operation B is "set grocery list to 'carrots'" then it makes a difference whether the second person saw "apples, bananas" or not.

So here we might want to create a versioning system, where let's say version 1 of the grocery list on the server is "" and version 2 on the server is "apples, bananas".  The second person's client would have either {text: "", version:1} or {text: "apples, bananas", version: 2}, depending on whether it was online and had seen the first person's update or not.  Then, when it sent the update "carrots" to the server, it would also include the last known server version number.  If it's 2, we know that the previous update was in fact seen and deleting "apples, bananas" was intentional; but if it's an older version we know we have a conflict.
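A rough sketch of that version check, again in plain JavaScript with an in-memory object standing in for the server's copy of the grocery list.  The function name setText and the shape of the result object are assumptions for illustration; what to do once a conflict is detected is left to the application.

```javascript
// Hypothetical server-side state: version 1 of the grocery list is "".
const server = { text: "", version: 1 };

// The client sends the new text together with the last server version it saw.
function setText(newText, baseVersion) {
  if (baseVersion !== server.version) {
    // The client had not seen the latest text: report a conflict, change nothing.
    return { ok: false, current: { text: server.text, version: server.version } };
  }
  server.text = newText;
  server.version += 1;
  return { ok: true, version: server.version };
}

// First person sets the list; the server is now at version 2.
const first = setText("apples, bananas", 1);

// Second person was offline and still has {text: "", version: 1}: conflict.
const offline = setText("carrots", 1);

// Second person was online and had seen version 2: the overwrite was intentional.
const online = setText("carrots", 2);
```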

But again, this particular implementation strategy just applies to the one particular case of setting a field, while Meteor methods can be arbitrarily complex.  So if your application is doing some other kind of operation, you may need some other or adjusted way to detect conflicts.

Ultimately it comes down to the application and what operations it needs to perform.  There are lots of possible options, including the versioning and timestamp techniques I've mentioned here.  But each particular technique will only apply to particular situations, which may or may not be what your application needs.  Thus when talking about merge or conflict resolution strategies, I think it's important to start with your actual application and what it needs to do.

You'll notice I've tried to be very specific in my discussion: "first person A does X and then person B does Y".  With a specific description we can then look at whether one technique or another will work for that scenario.

Andrew


--
Andrew Wilcox  http://awwx.ws/

Morten Henriksen

Sep 30, 2013, 9:03:07 AM
to meteo...@googlegroups.com
I think it would be nice to be able to apply a conflict handling strategy to a collection. For starters we could try to make a simple OT conflict handling mechanism that could be applied to a standard Meteor collection if needed. My typical use would be business apps.

Let's say for my application I want to use the OT conflict handler and apply it to my collection.

I install a package "conflicthandler-ot" and apply it maybe like: ConflictHandler.OT(myCollection, 50 * $mb); 

Now, I would expect:

1. Latest data wins - operations could be rerun via the oplog in chronological order (only in case of older operations added during a sync and a diff patch would be applied)
2. Mergeable fields (text/array/objects) should be merged - e.g. leave out numbers/binary/base64/date/_id
3. The oplog collection would be prefixed and size handled over time
4. I'd expect two timestamps on all operations - ctime, utime (delete operations being a difficult carrier)

--

I'm not sure about how deletes should be handled.

Users A and B edit the same document. A is offline and has just deleted the document; a second later user B, who is online, edits the document. Some minutes later A comes online and syncs. What should happen to the document?
  • Should the delete operation be denied (e.g. cannot remove a document with a timestamp newer than the delete timestamp - maybe respond "Access denied")
  • Should the document be deleted and user B's data discarded?
Can an older destructive operation win over the newer data?
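Just to make the first option concrete, here is a sketch of a "deny stale deletes" rule in plain JavaScript.  It is only one possible policy, not a recommendation; the update and remove functions and the in-memory Map are hypothetical, and utime is the update timestamp from the list above.

```javascript
// Hypothetical in-memory collection; utime is the time of the last applied edit.
const docs = new Map([["doc1", { text: "draft", utime: 100 }]]);

function update(id, text, opTime) {
  const doc = docs.get(id);
  if (!doc) return "not found";
  docs.set(id, { text: text, utime: Math.max(doc.utime, opTime) });
  return "updated";
}

// Deny a remove if the document was edited after the remove was issued.
function remove(id, opTime) {
  const doc = docs.get(id);
  if (!doc) return "not found";
  if (doc.utime > opTime) return "denied";
  docs.delete(id);
  return "removed";
}

// A deletes offline at t=200; B edits online at t=201; A syncs afterwards.
const editResult = update("doc1", "B's changes", 201);
const removeResult = remove("doc1", 200);

console.log(removeResult); // prints "denied"
```

With this rule B's newer edit survives and A would have to be told that the delete did not go through.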

Users A and B edit the same document. A is online and has just deleted the document; a second later user B, who is offline, edits the document. Some minutes later B comes online and syncs. What should happen to the document now?
  • Should the delete operation be discarded in favor of the older document?
  • Should the update operation be told "Access denied" - since the document does not exist?
Can a newer destructive operation win over older data? (data could be a year old)

--

I see your point that merging text can be complicated. Say A and B are both offline, A types in "apples, bananas", then B types "carrots": what would the resulting text look like? Getting "apples, bananas, carrots" would take application-specific merging to add ", " between the items (we are messing with language), whereas a generic merge would look like "apples, bananascarrots".
On the good side we did not lose the users' data; they could correct this themselves.
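A small sketch of the difference, in plain JavaScript.  Both functions are hypothetical: naiveMerge is a generic merge that knows nothing about the text, and mergeCommaList is the kind of application-specific merge that understands the field is a comma-separated list.

```javascript
// Generic merge: just append, with no knowledge of what the text means.
function naiveMerge(a, b) {
  return a + b;
}

// Application-specific merge: treat both values as comma-separated lists,
// keep every distinct item once, and join them with ", ".
function mergeCommaList(a, b) {
  const items = [];
  for (const part of (a + "," + b).split(",")) {
    const item = part.trim();
    if (item !== "" && !items.includes(item)) {
      items.push(item);
    }
  }
  return items.join(", ");
}

const generic = naiveMerge("apples, bananas", "carrots");
const listAware = mergeCommaList("apples, bananas", "carrots");

console.log(generic);   // prints "apples, bananascarrots"
console.log(listAware); // prints "apples, bananas, carrots"
```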

We could add an operator for text merging, $merge, so the user could decide whether data should be overwritten (default) or should be mergeable.
