New JSON type - update / progress report

90 views
Skip to first unread message

Joseph Gentle

unread,
Jul 2, 2015, 9:47:32 PM7/2/15
to sha...@googlegroups.com, derbyjs
Hi everyone!

I've been working on and off for the past few weeks on the new OT
type. The progress is thus:

- Transform is working (O_o). There aren't any more bugs that I know
about (I have about 30 test cases), although I'm sure the randomizer
will find some new edge cases I didn't think of. Still not implemented
in transform: embedded types (easy), set-null (hard).

- Apply is working, although that was really easy.

- The only hard-ish function left to do is compose, although compose
will probably be a lot simpler than transform. The trick will be
factoring both of them to reuse as much code as possible.

Transform is by far the hardest code in the type. I'd say its about
65% done in total.


I had a big design review with Nate last week while I was briefly in
SF, and we talked through some things:

Invertibility:

So, I think the only thing non-invertible in operations is the removed
data. Instead of just specifying that data is being removed, I'm going
to make operations optionally invertible. I don't know how this will
play out exactly in livedb.

It might also be worth adding a standard getInvert(doc, op) function
for ops which returns an op's invert. Formally, given op' =
invert(doc, op) then doc * op * op` == doc. This would be really easy
to write for all the other OT types too and it would make OT-level
undo easy to implement.


Initializing data (aka set null):

The ability to use OT to initialize is actually super useful /
important. Its used to express 'initialize doc to {count:0} and then
increment count'. Without explicit support, its impossible to express
that operation in a way that transforms correctly.

The JSON-patch specification contains arbitrary conditionals (instead
of just if-null-then-insert), but unfortunately I can't think of a way
to capture the semantics of that in a general way through transform.


Format of operations:

We spent a lot of time talking about the awkwardness of the current
operation format. If you want to set doc.name.first = "Nate" then
currently you have to say:
{o:{name:{o:{first:{di:"Nate"}}}}}
Nate (fairly) complained that adding the o's everywhere is awkward,
and making it work properly for lists requires coercing strings into
numbers in javascript and sorting.

He suggested instead we use lists-of-lists:
['name','first',{i:'Nate'}]

... Which is a compacted version of this:
['name',[['first',[{i:'Nate'}]]]]
If we wanted to set first and last names:
['name',[['first',{i:'Nate'}], ['last',{i:'Smith'}]]]

I'm not sure what to think about this. The fact that there's multiple
ways to write the same operation is going to either make the OT code
more complicated, or require translating between the expanded and
compact representations at the start and end of all the OT functions.
So ... ugh.

I translated all the examples from the draft spec to nate's proposed system:
https://gist.github.com/josephg/bd05e9dd240dc0ac7888
In terms of bytes-on-wire its about the same for the operations I
looked at, though if you have super deep objects and simple operations
(super common) then nate's proposal will start looking a lot better.

I'd love some input - otherwise I'm going to sit on it and think for a
week or two. There might be a compromise solution somewhere, but in
any case I can't make much more progress until we figure this out.

Cheers
-J

Nate Smith

unread,
Jul 8, 2015, 3:25:37 PM7/8/15
to sha...@googlegroups.com, der...@googlegroups.com
Invertibility means that it is possible to calculate the opposite of an op. Applying the inverse of an op is effectively the way to "undo" the op.

It is not necessary for playback of history, though it can be used for that. What it lets you do, is that if you have a snapshot at version 10 and the ops, having an invertible op means that you can modify the snapshot to create snapshot version 9 directly from snapshot version 10 and op version 9 (the op between snapshot version 9 and snapshot version 10). Alternatively, you can calculate any version from scratch with just the ops if you have the full op history. You can also store all the snapshot versions or keyframes of the versions (such as store every 10 snapshots) and calculate forward from there.

Remove ops are not necessarily invertible, because the description of the remove does not need to include the stuff that is being deleted. For example, you can say remove the 5th through 10th characters in a string without keeping around what those characters were. They can be optionally invertible by simply making storage of the deleted content optional. It is also possible to calculate what is removed from the full history of operations. Thus, our conclusion was that the operation format should support making invertible ops, but not require it.

There is no need for concern, since making this optional is just additional functionality. It will mean that it is possible to have less verbose ops for those who don't need invertibility, but invertible ops will be equivalent to before.

On Wed, Jul 8, 2015 at 2:31 AM, Pedro Machado Santa <pedro...@gmail.com> wrote:
Hi Joseph,

Awesome, thanks for the update. +1

Could you elaborate a little bit more about non-invertible/optionally invertible operations? Isn't the invertability of an operation important to OT? Is it optional?

Also, if it's not too much to ask, could you elaborate on why data being removed cannot be inverted?

Sorry for the amount of questions. :D I'm a little concerned, because I rely heavily on the fact that operations are invertible for a document timeline version compare purposes - though one could think of a custom format for it if it comes to that, I guess. :)

Cheers.

Pedro

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Joseph Gentle

unread,
Jul 9, 2015, 4:25:14 AM7/9/15
to sha...@googlegroups.com, derbyjs

There's an answer to that which doesn't require invertability though - it would be easy enough to just write a function which takes a doc and an op and produces the inverse to the op.

This would also be easy to add to the text type.

-J

On 9 Jul 2015 3:44 pm, "Jeremy Apthorp" <norn...@nornagon.net> wrote:
Invertibility critically also allows you to start with document D, apply op A, then apply op B, then apply op A' and end up with a document that's equivalent to D*B (where A' is the inverse of A).

That is:

D*A*B*A' == D*B

This allows you to do things like implement an undo system where you can undo your own changes without affecting anyone else's changes.

Joseph Gentle

unread,
Jul 21, 2015, 6:23:18 PM7/21/15
to sha...@googlegroups.com, derbyjs
Hah sure. But it really is quite easy. Say the document is:
"ABCDEFG"

and you apply a text operation:
[2, 'x', {d:3}] (Skip 2 characters, insert 'x', then delete 3
characters ('CDE').)

The resultant document will be:
"ABxFG"

Inverting the operation is only hard because looking at the operation
we don't know what characters were deleted. If you have a version of
the doc from right before the operation is applied, you can just go
look. With that, just swap inserts to deletes, deletes to inserts and
swaps stay the same.

The inverse is:
[2, {d:1}, "CDE"]

And if you apply that to ABxFG you get back the original ABCDEFG.

I'm proposing adding a standard function to the ot types which does
this work. We can call it invertWithDoc or something. Pass in the
operation and the document before that operation was applied and it
can produce the inverse such that doc+op+op' == doc. That should be
easy to add to the fuzzer as well.

We can use that for the undo stack in the client by generating &
storing the inverse before each local operation is applied.

-J



On Wed, Jul 22, 2015 at 1:02 AM, David Evans <da...@playcanvas.com> wrote:
> "There's an answer to that which doesn't require invertability though - it
> would be easy enough to just write a function which takes a doc and an op
> and produces the inverse to the op."
>
> This is like the margin note in Fermat's last theorum...
>
> Could you go into any more detail about how to create an invert operation
> for the ottypes/text type. We need to implement undo for our text editor and
> it's proving problematic without the an invertible text type.

Joseph Gentle

unread,
Aug 19, 2015, 11:04:45 PM8/19/15
to sha...@googlegroups.com, derbyjs
If you have experience with diffs, I can see where you're coming from.
However in this case, we have enough information to perfectly recreate
the inverted operation from the op + document which it was applied to.
This can happen in the client or the server when the operation is
created. And I can verify that the code is correct (using the fuzzer).

As I get closer to finishing the OT type I am more convinced that this
optional invertability approach is the best one. Consider this edge
case:

Say you have users users A and B editing this document:
{x:5, z:{}}
A moves doc.x -> doc.z.x. Result: {z:{x:5}}
B removes doc.z. Result: {x:5}.

What should the final result of applying these two operations be?
There's two possible answers:
1. We cancel / undo the move, leaving {x:5}. Unfortunately, this will
cause more problems. What if a subsequent operation by A moves
something else into doc.x? Do we need to apply back-pressure to that
as well? We might end up causing problems in a whole chain of edits.
2. We consider the move to have happened first, and the moved data gets deleted.

Without conflict markers, I think 2 is the only workable solution
(though I'd be curious if anyone disagrees with this).

So lets talk about the transform function. Remember that the transform
function only sees the operations, not the document state when the
operations were applied[1]. The transform function needs to perform:
transform( move doc.x -> doc.z.x , remove doc.z:{} ) -> remove doc.x:5
and
transform( remove doc.z:{}, move doc.x -> doc.z.x ) -> remove doc.z:{x:5}

But the transform function doesn't have enough information to know
that x currently has a value of 5 - implementing this is impossible if
invertable data needs to be stored in the op.

So I'm just going to make the transform function always strip
invertability information. All removes will simply mark that section
as needing to be removed. Livedb *does* have a document snapshot when
the operations are saved to the database. (This is important to verify
the operation's validity). At that point we can add back in
information about what was removed if thats desirable. The record
about what has been deleted is more reliable this way - right now
misbehaving clients can lie about what they removed and the server
will accept it.

-J

[1] And for good reason - if an client sends an operation for an old
document, we'd have to reconstruct the document state at that old
version in order to transform; which may be quite expensive to do.



On Wed, Aug 19, 2015 at 12:30 AM, Dmitry <uvar...@gmail.com> wrote:
> Hi Joseph,
>
> Regarding inversible ops I have my own separate opinion that it's better to
> keep deleted data together with OP instead of trying to rebuild it using
> document snapshot. I have experience with building full DIFFs of changes
> between 2 versions in a text document, that can show what was deleted,
> inserted or modified by each collaborator. It requires to have inverted ops
> and building them for each revision is resource expensive. Of course it's
> just one of the custom requirements and it might not be important for 90% of
> others.
>
> Another benefit of having delete objects recorded in the OP is that you'll
> be able to implement very strict state checks, that means you can make sure
> that what client requested to delete is what you actually have in your
> server snapshot. Over wise it increases chances to get to the broken
> unrecoverable document state in case there was an error on the client and
> something went wrong.

Joseph Gentle

unread,
Aug 22, 2015, 1:10:05 AM8/22/15
to Jeremy Apthorp, ShareJS, derbyjs
Yep.

So I've been banging my head against transform for the past couple of
weeks - I think I've written and rewritten the code about 4 or 5 times
now. I just did a pass deduping the tests because I've written so many
of them. Writing this code is like weaving about 10 spider threads
through each other while making sure they all touch in *exactly* the
right place.

If you're curious, this is the rule set for transform:
https://dl.dropboxusercontent.com/u/2494815/transform.png
... Which doesn't take into account the semantics for list moves
(which is crazy complicated, but I've already written that part enough
times now that there are no *known* bugs).

-J

On Fri, Aug 21, 2015 at 3:10 AM, Jeremy Apthorp <norn...@nornagon.net> wrote:
> Move cannot be considered as a remove + insert operation, because then
> transforming "edit" past "remove + insert" results in the edit being lost.
>
> On Thu, 20 Aug 2015 at 09:13 Dmitry <uvar...@gmail.com> wrote:
>>
>> I had situations when client code submitted something broken to the
>> document and without proper checks I ended up with broken state. Not very
>> good old code maybe that I used, but I prefer to always have my back covered
>> so in new rich text OT i keep deleted information together with OP.
>>
>> Yes, JSON stuff is more complex than working with text :) and transform
>> needs to affect that deleted info as well, greatly increasing code
>> complexity. Agree with that.
>>
>> I'm sure you already thought about this and probably had a reason to go
>> with "move" operation, but if you consider "move" as "remove" + "insert"
>> then your case can be solved easily with the only outcome of Result: {x:5}.
>> That's what would happen in the text OT.
>>
>> just my 2 cents.

Curran Kelleher

unread,
Aug 28, 2015, 11:20:37 PM8/28/15
to Derby, norn...@nornagon.net, sha...@googlegroups.com, m...@josephg.com
Hello,

I'm just wondering, would it make sense to try to leverage the text OT type on stringified JSON?

Best regards,
Curran

Joseph Gentle

unread,
Aug 29, 2015, 1:41:08 AM8/29/15
to Curran Kelleher, Derby, Jeremy Apthorp, sha...@googlegroups.com
Heh I should set up a FAQ.

Short answer: No, it won't work.
Long answer: if you have an empty list: "[]" and we both insert into
the list (pos 1, insert "1") vs (pos 1, insert "2") we'll have "[12]"
instead of "[1,2]". There's dozens of examples like this where it
would break - some would create invalid JSON ('[1,2]' -> '[,]'),
others would simply make something weird and wrong (like above). The
text type is also missing the ability to move characters, which the
new type will support.

-J

Curran Kelleher

unread,
Aug 30, 2015, 9:21:04 PM8/30/15
to Derby, curran....@gmail.com, norn...@nornagon.net, sha...@googlegroups.com, m...@josephg.com
Ah yes, I see what you mean. Interesting cases that didn't pop into my mind. Thank you for clarifying.

Is there a function somewhere in the Derby stack that computes a diff between two JSON objects and generates a corresponding array of JSON0 operations? I see there is this function setDiff, but I'm not sure if that is what this is doing exactly. I can't seem to find documentation on this function and how it should be called, maybe I'm just missing where to find it.

Thank you for your work on this stuff, the stack looks amazing and I'm getting excited about the potential of using ShareJS / Derby for real-time collaborative data visualization applications.

Best regards,
Curran

Ian Johnson

unread,
Aug 31, 2015, 2:03:57 PM8/31/15
to der...@googlegroups.com, Curran Kelleher, Jeremy Apthorp, sha...@googlegroups.com, Joseph Gentle
Basically setDiff walks the tree of a json object and checks for equality, if things are unequal then it will set the value at that path to the value of the object being compared with. setDiffDeep does the same thing but does extra work to check for deep equality.
It will not use OT string operations when setting on text values however, and there may be some other limitations I'm not aware of.

Racer is essentially a convenience wrapper around the share API designed for building Derby-like applications. It may be worth it in some cases to write your own functions that do repeated OT operations. If you have more semantic information about what kind of differences matter when comparing your json objects that might help in designing something more robust.

--
You received this message because you are subscribed to the Google Groups "Derby" group.
To unsubscribe from this group and stop receiving emails from it, send an email to derbyjs+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Ian Johnson - 周彦
Reply all
Reply to author
Forward
0 new messages