The more we discuss this, the more I'm becoming convinced that the
right answer is to transform the operations to something that will be
recoverable by the application. The easiest way to do this is to move
the otherwise deleted data to some recovery area (a list). The
application can issue a subsequent move to put the data back wherever
it wants (or delete it). Because that would be another move operation,
it would chain nicely, and if both the server and client automatically
resolved the conflict, we'd get the normal OT semantics, merging
consistently. (If they both do the same thing, that's what will
happen; otherwise exactly one of them will win.)
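To make the "exactly one of them will win" part concrete, here's a minimal sketch of a transform for two conflicting moves. The `Move` class, `transform` signature, and the `'left'`/`'right'` side convention are all illustrative, not the real type's API; the point is just that picking the winner deterministically keeps client and server in sync.

```python
# Hypothetical sketch: transform one move against a concurrent move.
# If both target the same destination key, the 'left' (e.g. server) op
# wins and the 'right' op is redirected into a recovery list, so both
# sides resolve the conflict identically.
from dataclasses import dataclass

@dataclass
class Move:
    src: str
    dst: str

def transform(op: Move, other: Move, side: str) -> Move:
    if op.dst == other.dst and op.src != other.src:
        if side == 'right':
            # The losing move: redirect its data into the recovery area.
            return Move(op.src, '_recovery')
    return op

# Winner is unchanged; loser gets rerouted; non-conflicting ops pass through.
assert transform(Move('x', 'z'), Move('y', 'z'), 'left') == Move('x', 'z')
assert transform(Move('y', 'z'), Move('x', 'z'), 'right') == Move('y', '_recovery')
assert transform(Move('a', 'b'), Move('c', 'd'), 'right') == Move('a', 'b')
```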
I'm still a little bit tempted to ignore the problem and convert the
operation into an overwrite just because of how rare it is, but I
think that'd make object move mostly useless, and I think recovering
from that state (or working around it) is going to be really difficult
/ annoying. I think it's workable by clever app developers (see my
comments below), but working around it is complicated, error-prone,
and hard to reason about & test if you aren't used to it.
I don't think making transform configurable with a user-specified
function is the right option. You'd need to supply the exact same
function in both the client and the server, and if you don't the
document will get silently out of sync. And it would break a bunch of
interfaces that I'd rather keep stable.
We could add a flag in move operations to tell the OT code what it
should do in the case of a conflict. Then the options could be merging
the values, replacing (last writer wins), or renaming the key to
oldkey-1. But I think sometimes you don't want any of those things. In
the IDE example, you probably want to rename librope.h to librope-1.h,
not librope.h-1. Also renaming the key is tricky because the transform
function doesn't have access to the document, and again I'd rather not
change the API.
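If we did go the flag route, it might look something like the sketch below. The policy names and the op shape are hypothetical, just to make the options above concrete:

```python
# Illustrative sketch of a per-operation conflict policy flag.
# These names are made up for discussion, not part of any real API.
from enum import Enum

class ConflictPolicy(Enum):
    MERGE = 'merge'      # deep-merge the two values
    REPLACE = 'replace'  # last writer wins
    RENAME = 'rename'    # rename the losing key, e.g. oldkey-1
    RECOVER = 'recover'  # move the losing value into a recovery list

# A move op would then carry its policy:
op = {'type': 'move', 'src': 'librope.h', 'dst': 'resources/librope.h',
      'onConflict': ConflictPolicy.RECOVER}

assert op['onConflict'] is ConflictPolicy.RECOVER
```

Note the rename policy runs straight into the problem above: the transform function can't see the document, so it can't know that `librope-1.h` is the right rename rather than `librope.h-1`.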
More comments below.
On Tue, Oct 21, 2014 at 1:49 AM, Patrick Coleman <patco...@google.com> wrote:
>>
>> My worry is that if someone else moves another file called rope.h to
>> the resources directory at the same time, I don't want the result to
>> ever be that one of our files gets deleted from disk.
>>
>
> If the client/server keeps recent operations in memory and supports undo /
> has their own logic to recognize and handle move collisions, this seems
> less of a problem.
True - but this is a little tricky.
- I'm not sure if I want to make this OT type invertible. If we don't,
finding the data that was actually deleted will be difficult
- Any concurrent edits on the data that conflicted would be
lost, because we'd replace a move with a delete-reinsert
But we could implement this much more cleanly if move operations
specify a fallback location for conflicting data to get moved into. If
that fallback location was a list, we can guarantee we won't have any
cascaded conflicts. To resolve the conflict, the user can just move or
delete the conflicted object like normal.
Given doc {x:1, y:2, _recovery:[]}.
Op1: (Move 'x' -> 'z', on conflict move to:'_recovery')
Op2: (Move 'y' -> 'z', on conflict move to:'_recovery')
Op1 transforms to: (move 'x' -> _recovery[0], flags:conflicted)
Result: {z:2, _recovery:[1]}
... We could add some metadata too if we want - we could make it end
up with _recovery:[{from:'x', to:'z', data:2}] or something.
> Alternatively, are any tests supported? (like in 6902
> <https://tools.ietf.org/html/rfc6902#appendix-A.8>)
> in which case each move could include a test that the original value is
> missing, and so would automatically be undone when transformed against an
> identical move.
That's an interesting idea, but I think it'd be a little wonky if you
moved the object, and then had to wait for a roundtrip to find out
that the move failed and it got moved back. But the bigger problem is
that you might move B->C then A->B. If B->C fails, we have to undo
A->B as well, and so on. Doable, but ugh. (And what if you did B->C
then insert {lots of data}->B? We have the same problem with the
inserted data - it'll now conflict with the moved-back B.)
The nice thing about that idea is that we wouldn't need a special
first-writer-wins set operation, because you could implement that as a
test+set. But that shouldn't be hard to implement anyway.
> Some observations:
>> - This will almost never actually happen in real life.
>> - Even if we transform the operation into a 'replace', we
>> could add a flag to the operation to mark that it was unintentionally
>> overwritten, and then the process which is making the corresponding
>> changes on disk could see that and do some special
>> application-specific behaviour instead of actually deleting the file.
>
>
> Indeed - your structure could instead be:
> { "name": "librope",
> "package": {
> "src": {
> "rope.c": "somelonguniqueid",
> "rope.h": "anotherfileid",
> }},
> data: {
> "somelonguniqueid": { "text": "#include <rope.h>\n ..." },
> "anotherfileid": { "text": "// This is a cool header" },
> }}
>
> Then deletion is some separate offline index/cleanup, and resolving
> conflicts is simpler.
> (although 'move' is less useful here as there are no child properties).
Yeah - that's definitely usable. It kind of moves the mess around your
plate a little. It's very similar to the idea of just not allowing
move-to-object operations at all, which would force every
dictionary-like structure to be managed using lists with manual
(custom!) dedup in the case of conflicts. Either way, application
writers would need to do some manual garbage collection. In your case,
two people could both insert the same file at the same time, and you'd
end up with a data key which isn't referenced. Also if two people
renamed different files to the same thing, one of those references
will go away and you'll need to detect that & clean up.
Definitely manageable though!
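The "manual garbage collection" could be an offline pass like the sketch below, assuming the id-indirection structure from your example (the `gc` and `collect_ids` names are hypothetical):

```python
# Walk the package tree, collect every referenced file id, and drop
# unreferenced entries from the data map.
def collect_ids(tree):
    for value in tree.values():
        if isinstance(value, dict):
            yield from collect_ids(value)
        else:
            yield value

def gc(doc):
    live = set(collect_ids(doc['package']))
    doc['data'] = {k: v for k, v in doc['data'].items() if k in live}

doc = {
    'package': {'src': {'rope.h': 'anotherfileid'}},
    'data': {
        'somelonguniqueid': {'text': '...'},  # orphaned by a rename conflict
        'anotherfileid': {'text': '// This is a cool header'},
    },
}
gc(doc)
assert set(doc['data']) == {'anotherfileid'}
```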
>>
>> > Regarding object move, I'm in favor of keeping things simple, but I'm also
>> > in favor of thinking about this in terms of what people might actually do
>> > with this capability. Since mapping a directory structure of files is the
>> > one good example you have, then maybe multiple moves to the same key should
>> > merge their values as directory moves merge their files. That would mean a
>> > slightly modified version of your option 1. In the simple case, you have
>> > {x:1, y:2} -> {z:2}, but in the complex case you have {x:{a:1}, y:{b:2}} ->
>> > {z:{a:1, b:2}}.
>>
>
> Does move distinguish between PUT vs. PATCH?
> I would assume PUT (x: {a: 1}) -> z and PUT (y: {b: 2}) -> z would end up
> in z: {b:2}
> but if the second operation received was a PATCH operation, they would be
> merged rather than replaced.
> For file system moves they'd probably be PUT, but there might be instances
> where a PATCH is useful.
I was just thinking of move as a PUT operation. When is PATCH useful?
In what use cases do you actually want two object structures to get
merged?
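For reference, the PUT vs. PATCH distinction from the quote above could be sketched like this (helper names are made up; PUT replaces the destination wholesale, PATCH merges into it):

```python
# PUT-style move: the destination value is clobbered.
def move_put(doc, src, dst):
    doc[dst] = doc.pop(src)

# PATCH-style move: object contents are merged into the destination.
def move_patch(doc, src, dst):
    val = doc.pop(src)
    if isinstance(doc.get(dst), dict) and isinstance(val, dict):
        doc[dst].update(val)
    else:
        doc[dst] = val

d1 = {'x': {'a': 1}, 'z': {'b': 2}}
move_put(d1, 'x', 'z')
assert d1 == {'z': {'a': 1}}             # PUT: z's old value is gone

d2 = {'x': {'a': 1}, 'z': {'b': 2}}
move_patch(d2, 'x', 'z')
assert d2 == {'z': {'a': 1, 'b': 2}}     # PATCH: contents merged
```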