JSON editing and move conflicts

101 views
Skip to first unread message

Joseph Gentle

unread,
Oct 18, 2014, 7:34:05 PM10/18/14
to sha...@googlegroups.com, derbyjs, wave-dev
I'm (finally!) taking a serious look at making a better version of the
JSON OT type. I'm cross-posting this here because it directly effects
sharejs & derby users, and I think any serious rewrite of wave will
use something like this at the top level to store waves.

I have two questions that I would love some input on from the wider community.

We want to add a 'move' action which will let you transplant arbitrary
parts of the JSON structure to other places in the tree. This is
really useful if, for example, you want a hierarchal to-do list, a
tree of source files for your IDE or a tree of blips.

But I can't figure out what should happen if two users move different
objects to the same place in the tree at the same time.

1. The simplest option is to simply delete one of them. This is really
convenient from a programming pov, but I think that would be the worst
kind of surprise. {x:1, y:2} -> {z:2}
2. We could try and make one of the moves fail - {x:1, y:2} -> {x:1,
z:2}. On the face of it this sounds great, but this is a cascading
failure. If (locally) I move x->z then insert some new data at x, what
happens to the new data? Does the new data get deleted? Does x get
deleted? I can't think of a good answer here which isn't just
dangerous in a more complicated way. Making both the moves fail has
the same problem.
3. We could pick a winner and then move the loser to some special
lost&found bucket or something. So, {x:1, y:2} -> {__recovered_z:1,
z:2} But that could play havoc with data bindings, and we'd have to
introduce some reserved keys. The other way to do this is to introduce
a new top-level structure which contains the data, so it'd be
{data:{x:1, y:2}} -> {data:{z:2}, lost:[{z:1}]}.
4. Or we could add the backup location into the move operation itself,
so whoever's building on top of this API can make the choice. (They
can make a better decision because they know what the data structure
looks like). So instead of (move x->z) we'd have (move x->z,
onconflict:(move x->__x)) or something.

Are there any other choices I'm missing here? I'm edging toward option
4, although it might increase the type's complexity by 50% unless I
can think of a clean way to do it.

Thanks
-J

Joseph Gentle

unread,
Oct 18, 2014, 8:46:11 PM10/18/14
to wave-dev, sha...@googlegroups.com, derbyjs
It does complicate the type, and it breaks my nice abstraction. Also
remember that any funny logic will need to be on the server and the
client - and I'd really like the server to be application-agnostic if
possible. That said, the server will probably need
application-specific schema validation & access control code anyway,
so maybe its not a big deal.

Another option is to just do the dangerous thing for objects, but
actively encourage people to use lists instead. If you're inserting
into a list instead of inserting into an object, the semantics are
safe & easy. The final list will just contains both items. For
hierarchal to-do lists, you probably want a children:[] list on your
nodes anyway. Using object-move when the key is either a GUID or a
hash of the data would work fine as well because your keys won't
conflict.

This won't work if you're trying to map a directory structure of files
though. But thats the only bad use case I can think of. Maybe
something as simple as a flag on the move operation saying "in the
case of conflicts, rewrite the destination key to {key}_n". Or we
could just force the IDE to use a list of files, and leave deduping
hacks there.

-J



On Sat, Oct 18, 2014 at 4:44 PM, Ali Lown <a...@lown.me.uk> wrote:
> Hi Joseph,
>
> I think that the only sensible option is to delegate the resolution of
> this action in the case of conflict back into the application, so some
> sort of extension of (4) that allows some arbitrary lambda expression
> to be passed as the onconflict method. (Depending on the situation,
> they might want to 'merge' the two items (if possible), rather than
> moving one into a 'backup' location).
>
> This does complicate the OT type, and does make it more difficult to
> analyse how long certain actions will take to resolve though...
>
> How does this sound?
>
> Ali

Erik

unread,
Oct 19, 2014, 1:46:23 PM10/19/14
to sha...@googlegroups.com, wave...@incubator.apache.org, der...@googlegroups.com
Joseph Gentle, did you see rfc6902 

JavaScript Object Notation (JSON) Patch specification

? Do you plan you add OT for this spec in livedb?

Joseph Gentle

unread,
Oct 20, 2014, 1:32:07 PM10/20/14
to sha...@googlegroups.com
I had a couple of emails back and forth with one of the authors of
that spec, and then I dropped the ball. I hadn't realised they'd
finished the spec.

It looks like their final design is similar to the current sharejs OT type.

- We decided to use list based paths instead of string based paths.
(['a', 'b', 'c'] instead of '/a/b/c' or 'a.b.c'). There were two
reasons - first splitting javascript strings is incredibly slow, and
secondly you have to implement escape characters so you can put your
path separator in an object key. There was a conversation in a github
issue a few years ago, and I can't find it now.
- Up until we finish the new JSON code, ShareJS won't have a way to do
object-move operations.
- The current JSON0 implementation uses the list-of-changes model. The
problem is that given two operations, transforming an operation with M
changes by another operation with N changes requires N*M steps - its
really slow. The way I'm structuring the new code is the same as the
current text type where an operation is actually a traversal of the
original object which can rewrite it on the way. Because the traversal
is in order, we can scan both operations together and the operation
takes N+M steps.

At a glance, we can probably write some pretty simple bidirectional
translation code, converting sharejs operations to json patches and
vice versa. I don't know if the json patch specification is used
anywhere yet, but if it is that would probably be very handy.

-J
> --
> You received this message because you are subscribed to the Google Groups
> "ShareJS" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sharejs+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Richard Davis

unread,
Oct 20, 2014, 9:53:46 PM10/20/14
to sha...@googlegroups.com, wave...@incubator.apache.org, der...@googlegroups.com
Hi Joseph,

Regarding object move, I'm in favor of keeping things simple, but I'm also in favor of thinking about this in terms of what people might actually do with this capability. Since mapping a directory structure of files is the one good example you have, then maybe multiple moves to the same key should merge their values as directory moves merge their files. That would mean a slightly modified version of your option 1. In the simple case, you have {x:1, y:2} -> {z:2}, but in the complex case you have {x:{a:1}, y:{b:2}} -> {z:{a:1, b:2}}.

Maybe you could add a flag that allows the writer to specify whether they want a move to "replace" the existing value (your option 1) or "merge" with the existing value (this modified option 1). It's still clear what will happen if two writers use two different flags, as long as you can put the two ops in some order: if the "replace" comes second you can replace, and if the "merge" comes second you can merge.

The only downside I see of merging is that it can have a cascading effect. That could be tricky, and I haven't thought it through, but I had hope that this wouldn't be fundamentally different from the other changes you are making in JSON2.

-Richard

Joseph Gentle

unread,
Oct 21, 2014, 3:12:10 AM10/21/14
to wave-dev, sha...@googlegroups.com, derbyjs
Thats a good idea.

Lets say I'm editing a small C project in my IDE. Lets say the project
is being stored as one big JSON object (including the text of all
files). I have another process which is periodically saving out any
updated files to disk.

My document might look like this:

{
"name": "librope",
... Compiler options, etc.
"files": {
"src": {
"rope.c": { "text": "#include <rope.h>\n ..." },
"rope.h" { "text": "// This is a cool header" },
},
"resources": {... more files .... }
}
}

Lets say I want to move src/rope.h to the resources directory. I drag
the file in my IDE, which does an object move operation to move the
files/src/rope.h to files/resources/rope.h. This is fine, and will
work fine.

My worry is that if someone else moves another file called rope.h to
the resources directory at the same time, I don't want the result to
ever be that one of our files gets deleted from disk.

Some observations:
- This will almost never actually happen in real life.
- Even if we turn the operation will transform into a 'replace', we
could add a flag to the operation to mark that it was unintentionally
overwritten, and then the process which is making the corresponding
changes on disk could see that and do some special
application-specific behaviour instead of actually deleting the file.

-J

On Mon, Oct 20, 2014 at 8:27 PM, Michael MacFadden
<michael....@gmail.com> wrote:
> Joseph,
>
> I¹d like to drill into the directory structure mapping concept a bit. Can
> you epand on the use case where you think ShareJS would be used to map to a
> directory structure? I think thing this would help us understand some of
> the challenges.
>
> ~Michael
>> On Sat, Oct 18, 2014 at 4:44 PM, Ali Lown <a...@lown.me.uk <javascript:> >
>> wrote:
>>> > Hi Joseph,
>>> >
>>> > I think that the only sensible option is to delegate the resolution of
>>> > this action in the case of conflict back into the application, so some
>>> > sort of extension of (4) that allows some arbitrary lambda expression
>>> > to be passed as the onconflict method. (Depending on the situation,
>>> > they might want to 'merge' the two items (if possible), rather than
>>> > moving one into a 'backup' location).
>>> >
>>> > This does complicate the OT type, and does make it more difficult to
>>> > analyse how long certain actions will take to resolve though...
>>> >
>>> > How does this sound?
>>> >
>>> > Ali
>>> >
>>> > On 19 October 2014 00:33, Joseph Gentle <jos...@gmail.com <javascript:> >

Joseph Gentle

unread,
Oct 21, 2014, 3:41:01 PM10/21/14
to wave-dev, sha...@googlegroups.com, derbyjs
The more we discuss this, the more I'm becoming convinced that the
right answer is to transform the operations to something that will be
recoverable by the application. The easiest way to do this is to move
the otherwise deleted data to some recovery area (a list). The
application can issue a subsequent move to put the data back wherever
it wants (or delete it). Because that would be another move operation,
it would chain nicely and if both the server and client automatically
resolved the conflict, we'd get the normal OT semantics merging that
consistently. (If they both do the same thing, thats what will happen.
Otherwise exactly one of them will win).

I'm still a little bit tempted to ignore the problem and convert the
operation into an overwrite just because of how rare it is, but I
think that'd make object move mostly useless, and I think recovering
from that state (or working around it) is going to be really difficult
/ annoying. I think its workable by clever app developers (see my
comments below), but working around that is complicated and error
prone and hard to reason about & test if you aren't used to it.

I don't think making transform configurable with a user specified
function is the right option. You'd need to supply the exact same
function in both the client and the server, and if you don't the
document will get silently out of sync. And it would break a bunch of
interfaces that I'd rather keep stable.

We could add a flag in move operations to tell the OT code what it
should do in the case of a conflict. Then the options could be merging
the values, replacing (last writer wins), or renaming the key to
oldkey-1. But I think sometimes you don't want any of those things. In
the IDE example, you probably want to rename librope.h to librope-1.h,
not librope.h-1. Also renaming the key is tricky because the transform
function doesn't have access to the document, and again I'd rather not
change the API.

More comments below.

On Tue, Oct 21, 2014 at 1:49 AM, Patrick Coleman <patco...@google.com> wrote:
>>
>> My worry is that if someone else moves another file called rope.h to
>> the resources directory at the same time, I don't want the result to
>> ever be that one of our files gets deleted from disk.
>>
>
> If the client/server keeps recent operations in memory and supports undo /
> has their own logic to recognize and handle move collisions, this seems
> less of a problem.

True - but this is a little tricky.
- I'm not sure if I want to make this OT type invertable. If we don't,
finding the data that was actually deleted will be difficult
- Any concurrent edits on the data that conflicted might would be
lost, because we'd replace a move with a delete-reinsert

But we could implement this much more cleanly if move operations
specify a fallback location for conflicting data to get moved into. If
that fallback location was a list, we can guarantee we won't have any
cascaded conflicts. To resolve the conflict, the user can just move or
delete the conflicted object like normal.

Given doc {x:1, y:2, _recovery:[]}.
Op1: (Move 'x' -> 'z', on conflict move to:'_recovery')
Op2: (Move 'y' -> 'z', on conflict move to:'_recovery')

Op1 transforms to: (move 'x' -> _recovery[0], flags:conflicted)
Result: {z:2, _recovery:[1]}

... We could add some metadata too if we want - we could make it end
up with _recovery:[{from:'x', to:'z', data:2}] or something.

> Alternatively, are any tests supported? (like in 6902
> <https://tools.ietf.org/html/rfc6902#appendix-A.8>)
> in which case each move could include a test that the original value is
> missing, and so would automatically be undone when transformed against an
> identical move.

Thats an interesting idea, but I think it'd be a little wonky if you
moved the object, and then had to wait for a roundtrip to find out
that the move failed and it got moved back. But the bigger problem is
that you might move B->C then A->B. If B->C fails, we have to undo
A->B as well, and so on. Doable, but ugh. (And what if you did B->C
then insert {lots of data}->B. We have the same problem with the
inserted data - it'll now conflict with the moved-back B.

The nice thing about that idea is that we wouldn't need a special
first-writer-wins set operation, because you could implement that as a
test+set. But that shouldn't be hard to implement anyway.

> Some observations:
>> - This will almost never actually happen in real life.
>> - Even if we turn the operation will transform into a 'replace', we
>> could add a flag to the operation to mark that it was unintentionally
>> overwritten, and then the process which is making the corresponding
>> changes on disk could see that and do some special
>> application-specific behaviour instead of actually deleting the file.
>
>
> Indeed - your structure could instead be:
> { "name": "librope",
> "package": {
> "src": {
> "rope.c": "somelonguniqueid",
> "rope.h": "anotherfileid",
> }},
> data: {
> "somelonguniqueid": { "text": "#include <rope.h>\n ..." },
> "anotherfileid": { "text": "// This is a cool header" },
> }}
>
> Then deletion is some separate offline index/cleanup, and resolving
> conflicts is simpler.
> (although 'move' is less useful here as there are no child properties).

Yeah - thats definitely usable. It kind of moves the mess around your
plate a little. Its very similar to the idea of just not allowing
move-to-object operations at all, which would force every
dictionary-like structure to be managed using lists with manual
(custom!) dedup in the case of conflicts. Either way, application
writers would need to do some manual garbage collection. In your case,
two people could both insert the same file at the same time, and you'd
end up with a data key which isn't referenced. Also if two people
renamed different files to the same thing, one of those references
will go away and you'll need to detect that & clean up.

Definitely manageable though!

>>
>> > Regarding object move, I'm in favor of keeping things simple, but I'm
>> also
>> > in favor of thinking about this in terms of what people might actually do
>> > with this capability. Since mapping a directory structure of files is the
>> > one good example you have, then maybe multiple moves to the same key
>> should
>> > merge their values as directory moves merge their files. That would mean
>> a
>> > slightly modified version of your option 1. In the simple case, you have
>> > {x:1, y:2} -> {z:2}, but in the complex case you have {x:{a:1}, y:{b:2}}
>> ->
>> > {z:{a:1, b:2}}.
>>
>
> Does move distinguish between PUT vs. PATCH?
> I would assume PUT (x: {a: 1}) -> z and PUT (y: {b: 2}) -> z would end up
> in z: {b:2}
> but if the second operation received was a PATCH operation, they would be
> merged rather than replaced.
> For file system moves they'd probably be PUT, but their might be instances
> where a PATCH is useful.

I was just thinking of move as a PUT operation. When is PATCH useful?
In what use cases do you actually want two object structures to get
merged?

Joseph Gentle

unread,
Oct 22, 2014, 3:08:28 PM10/22/14
to wave-dev, sha...@googlegroups.com, derbyjs
On Wed, Oct 22, 2014 at 2:50 AM, Patrick Coleman <patco...@google.com> wrote:
>>
>> - I'm not sure if I want to make this OT type invertable. If we don't,
>
> What are the costs of making it invertable? i.e. are there particular
> reasons not to?
> I guess for proper invertability, you'd need to copy all the data being
> moved?

I honestly don't know what the complexity cost is. And I've wavered
back and forth on whether invertibility is worth it for the past few
years. The way I'm looking at designing operations at the moment is
something like this:

Given a doc: {x:5, y:[1,2,3]}:
Pickup phase: go in key x, delete, skip to key y, go in key 1, pick up (id=0)
Edit phase: go in key y.0. Add 10
Drop phase: go to key x, insert "hi", go in key z, drop id=0

-> The document will be {x:"hi", y:[11,3], z:2}

If this works out, inverting the operation *should be* really easy -
we just swap the pickup phase & drop phase, then invert everything in
the edit phase. We *don't* need a copy of moved data, we just need to
swap the source and destination. I'm thinking that if you explicitly
want to overwrite something, you'll need to delete it in the pickup
phase before you can move something over the top of it. Then the
semantics of conflicts-on-insert will match the semantics of
conflict-on-move. But that also means that only deletes would need to
specify what was deleted to make invertibility work.

I don't know if this will actually work out in practice - I'm writing
the transform function now and I may end up changing the data model
somewhat.

I also want a couple more properties:

- I want to be able to embed subtypes for things like rich text - and
those subtypes aren't necessarily invertible themselves.
- I want to make a TP2 version of this JSON type, so we can make it
into a p2p federated wave-like structure at some point. That type will
not be invertible because of the requirements of tp2 semantics.

So, I'm not sure.

>> But we could implement this much more cleanly if move operations
>> specify a fallback location for conflicting data to get moved into. If
>> that fallback location was a list, we can guarantee we won't have any
>> cascaded conflicts. To resolve the conflict, the user can just move or
>> delete the conflicted object like normal.
>>
>
> Out of interest, is this something that conflicting 'set' operations would
> want too?
> e.g. (Set 'c' {v: 1}) & (Set 'c' {v: 2}) seems to have similar loss
> semantics,
> should _recovery then be [{set: 'c', data: {v: 1}}]

Yep absolutely.
> Yep, I was assuming everyone wants manual garbage collection anyway :)
> I realized though that moving folders messes this up a bit;
> either the folders are also files (possible, but then every N-deep folder
> traversal does N id lookups)
> or they are objects (e.g. "src": {"js": { "file1": id1, "file2": id2,
> ...}}) in which case you're back to the original problem.

Yeah exactly. I think the merge conflict semantics I suggested above
are a decent fit for this problem, and will make things a lot easier.

>> Does move distinguish between PUT vs. PATCH?
>> > I would assume PUT (x: {a: 1}) -> z and PUT (y: {b: 2}) -> z would end
>> up
>> > in z: {b:2}
>> > but if the second operation received was a PATCH operation, they would be
>> > merged rather than replaced.
>> > For file system moves they'd probably be PUT, but their might be
>> instances
>> > where a PATCH is useful.
>>
>> I was just thinking of move as a PUT operation. When is PATCH useful?
>> In what use cases do you actually want two object structures to get
>> merged?
>>
>
> Only thing I could think of was batch file moves - i.e. moving a set of
> string-id'd files from one directory to another,
> you'd want to PATCH move {'f1.js': {...}, 'f2.js': {...}} from src -> dest
> but then it's a bit weird, as you'd want PATCH on the top level and PUT
> underneath, and it'd probably be better off as a list of PUT moves.
> Otherwise, I couldn't think of one, unless a client wanted to get really
> abstract and e.g. batch a 'local edits' object which then get 'applied'
> by a PATCH move onto the persisted object, but that also sounds like
> something that could be done better in other ways.

Yeah.... I'm just going to leave this out for now. If there is a
compelling use case, A special OT merge operation would be strictly
better than manually merging all the keys, but if we can't think of
any actual use for it, I'll save myself the trouble.

-J
Reply all
Reply to author
Forward
0 new messages