New JSON type - update / progress report

261 views
Skip to first unread message

Joseph Gentle

unread,
Jul 2, 2015, 9:47:31 PM7/2/15
to sha...@googlegroups.com, derbyjs
Hi everyone!

I've been working on and off for the past few weeks on the new OT
type. The progress is thus:

- Transform is working (O_o). There aren't any more bugs that I know
about (I have about 30 test cases), although I'm sure the randomizer
will find some new edge cases I didn't think of. Still not implemented
in transform: embedded types (easy), set-null (hard).

- Apply is working, although that was really easy.

- The only hard-ish function left to do is compose, although compose
will probably be a lot simpler than transform. The trick will be
factoring both of them to reuse as much code as possible.

Transform is by far the hardest code in the type. I'd say its about
65% done in total.


I had a big design review with Nate last week while I was briefly in
SF, and we talked through some things:

Invertibility:

So, I think the only thing non-invertible in operations is the removed
data. Instead of just specifying that data is being removed, I'm going
to make operations optionally invertible. I don't know how this will
play out exactly in livedb.

It might also be worth adding a standard getInvert(doc, op) function
for ops which returns an op's invert. Formally, given op' =
invert(doc, op) then doc * op * op` == doc. This would be really easy
to write for all the other OT types too and it would make OT-level
undo easy to implement.


Initializing data (aka set null):

The ability to use OT to initialize is actually super useful /
important. Its used to express 'initialize doc to {count:0} and then
increment count'. Without explicit support, its impossible to express
that operation in a way that transforms correctly.

The JSON-patch specification contains arbitrary conditionals (instead
of just if-null-then-insert), but unfortunately I can't think of a way
to capture the semantics of that in a general way through transform.


Format of operations:

We spent a lot of time talking about the awkwardness of the current
operation format. If you want to set doc.name.first = "Nate" then
currently you have to say:
{o:{name:{o:{first:{di:"Nate"}}}}}
Nate (fairly) complained that adding the o's everywhere is awkward,
and making it work properly for lists requires coercing strings into
numbers in javascript and sorting.

He suggested instead we use lists-of-lists:
['name','first',{i:'Nate'}]

... Which is a compacted version of this:
['name',[['first',[{i:'Nate'}]]]]
If we wanted to set first and last names:
['name',[['first',{i:'Nate'}], ['last',{i:'Smith'}]]]

I'm not sure what to think about this. The fact that there's multiple
ways to write the same operation is going to either make the OT code
more complicated, or require translating between the expanded and
compact representations at the start and end of all the OT functions.
So ... ugh.

I translated all the examples from the draft spec to nate's proposed system:
https://gist.github.com/josephg/bd05e9dd240dc0ac7888
In terms of bytes-on-wire its about the same for the operations I
looked at, though if you have super deep objects and simple operations
(super common) then nate's proposal will start looking a lot better.

I'd love some input - otherwise I'm going to sit on it and think for a
week or two. There might be a compromise solution somewhere, but in
any case I can't make much more progress until we figure this out.

Cheers
-J

Wout Mertens

unread,
Jul 3, 2015, 1:05:33 AM7/3/15
to sha...@googlegroups.com, derbyjs
For reducing the footprint of the messages, there's https://github.com/Sage/jsurl, which turns
{o:{name:{o:{first:{di:"Nate"}}}}} into
~(o~(name~(first~(di~'Nate))))

and
['name','first',{i:'Nate'}] into
~(~'name~'first~(i~'Nate))

It solves the extra constraint of being URI compatible, which is not necessary here, but the basic ideas are nice.

It comes at the price of a string escaping step.

There's also http://ubjson.org/ which compresses better, but not sure which of them is faster.

Another worthy consideration is a dictionary of strings, that you could populate with project-specific strings so that last example could become ~(#0#1~(i#2)) given the dictionary ['name','first','Nate']. (but use character indexes instead of digits)

Wout.

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--

Wout.
(typed on mobile, excuse terseness)

Wout Mertens

unread,
Jul 3, 2015, 1:14:10 AM7/3/15
to sha...@googlegroups.com, derbyjs
Also worth noting is that array handling is much faster than object handling. Natively expressing ops as arrays might be better for memory footprint and speed.


On Fri, Jul 3, 2015 at 3:47 AM Joseph Gentle <jos...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joseph Gentle

unread,
Jul 3, 2015, 1:27:34 AM7/3/15
to sha...@googlegroups.com

That could work. That said, I think the most important goal for the operations is that they're simple to make & work with. Once I'm done with the js version I want a c implementation of the type - and no doubt that will just pack the information directly using enums & union types. I could use the same binary format in js too by exploiting the arraybuffer tricks asmjs uses, but ugh at the thought of staring at base64 strings stored in my databases. In the long run we can add tooling, but I want this to be usable soon.

So I want some sort of pure json embedding of the operations usable by humans. It's more a question of how to do it semi-cleanly.

-J

Pedro Machado Santa

unread,
Jul 8, 2015, 5:31:05 AM7/8/15
to sha...@googlegroups.com, der...@googlegroups.com
Hi Joseph,

Awesome, thanks for the update. +1

Could you elaborate a little bit more about non-invertible/optionally invertible operations? Isn't the invertability of an operation important to OT? Is it optional?

Also, if it's not too much to ask, could you elaborate on why data being removed cannot be inverted?

Sorry for the amount of questions. :D I'm a little concerned, because I rely heavily on the fact that operations are invertible for a document timeline version compare purposes - though one could think of a custom format for it if it comes to that, I guess. :)

Cheers.

Pedro

Nate Smith

unread,
Jul 8, 2015, 3:25:35 PM7/8/15
to sha...@googlegroups.com, der...@googlegroups.com
Invertibility means that it is possible to calculate the opposite of an op. Applying the inverse of an op is effectively the way to "undo" the op.

It is not necessary for playback of history, though it can be used for that. What it lets you do, is that if you have a snapshot at version 10 and the ops, having an invertible op means that you can modify the snapshot to create snapshot version 9 directly from snapshot version 10 and op version 9 (the op between snapshot version 9 and snapshot version 10). Alternatively, you can calculate any version from scratch with just the ops if you have the full op history. You can also store all the snapshot versions or keyframes of the versions (such as store every 10 snapshots) and calculate forward from there.

Remove ops are not necessarily invertible, because the description of the remove does not need to include the stuff that is being deleted. For example, you can say remove the 5th through 10th characters in a string without keeping around what those characters were. They can be optionally invertible by simply making storage of the deleted content optional. It is also possible to calculate what is removed from the full history of operations. Thus, our conclusion was that the operation format should support making invertible ops, but not require it.

There is no need for concern, since making this optional is just additional functionality. It will mean that it is possible to have less verbose ops for those who don't need invertibility, but invertible ops will be equivalent to before.

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.

Jeremy Apthorp

unread,
Jul 9, 2015, 1:44:40 AM7/9/15
to sha...@googlegroups.com, der...@googlegroups.com
Invertibility critically also allows you to start with document D, apply op A, then apply op B, then apply op A' and end up with a document that's equivalent to D*B (where A' is the inverse of A).

That is:

D*A*B*A' == D*B

This allows you to do things like implement an undo system where you can undo your own changes without affecting anyone else's changes.

Joseph Gentle

unread,
Jul 9, 2015, 4:25:13 AM7/9/15
to sha...@googlegroups.com, derbyjs

There's an answer to that which doesn't require invertability though - it would be easy enough to just write a function which takes a doc and an op and produces the inverse to the op.

This would also be easy to add to the text type.

-J

Geoff Goodman

unread,
Jul 9, 2015, 12:33:38 PM7/9/15
to sha...@googlegroups.com, derbyjs
Hi Nate, thanks for the detailed explanation. Can you elaborate on why invertible ops would be required for snapshotting?

Also, it seems that if you are snapshotting and are using non-invertible remove operations, you could reconstruct any state given the last snapshot prior to that state and all subsequent ops. To the extent that snapshots are fairly regular, this seems like it would be pretty painless to do. I feel like I'm probably missing something you guys with experience know that I don't.

Exciting to see this move forward.

Geoff

Cryptic Swarm

unread,
Jul 9, 2015, 1:02:28 PM7/9/15
to sha...@googlegroups.com, derbyjs
Snapshotting does not require invertible ops.

A non invertible op is like a forward iterator. It allows you to iterator forward in history.

An invertible op is like a bidirectional iterator. It allows you to move either forward in history or backwards.

David Evans

unread,
Jul 21, 2015, 11:02:36 AM7/21/15
to sha...@googlegroups.com, der...@googlegroups.com
"There's an answer to that which doesn't require invertability though - it would be easy enough to just write a function which takes a doc and an op and produces the inverse to the op."

This is like the margin note in Fermat's last theorum...

Could you go into any more detail about how to create an invert operation for the ottypes/text type. We need to implement undo for our text editor and it's proving problematic without the an invertible text type.

Joseph Gentle

unread,
Jul 21, 2015, 6:23:17 PM7/21/15
to sha...@googlegroups.com, derbyjs
Hah sure. But it really is quite easy. Say the document is:
"ABCDEFG"

and you apply a text operation:
[2, 'x', {d:3}] (Skip 2 characters, insert 'x', then delete 3
characters ('CDE').)

The resultant document will be:
"ABxFG"

Inverting the operation is only hard because looking at the operation
we don't know what characters were deleted. If you have a version of
the doc from right before the operation is applied, you can just go
look. With that, just swap inserts to deletes, deletes to inserts and
swaps stay the same.

The inverse is:
[2, {d:1}, "CDE"]

And if you apply that to ABxFG you get back the original ABCDEFG.

I'm proposing adding a standard function to the ot types which does
this work. We can call it invertWithDoc or something. Pass in the
operation and the document before that operation was applied and it
can produce the inverse such that doc+op+op' == doc. That should be
easy to add to the fuzzer as well.

We can use that for the undo stack in the client by generating &
storing the inverse before each local operation is applied.

-J

Vaios Kalpias-Ilias

unread,
Jul 22, 2015, 12:25:42 PM7/22/15
to ShareJS, der...@googlegroups.com
Hey,

I have a question about redoing operations. Say you have a local undo stack and then you undo an operation. Then we calculate the inverse of that operation and add it to a local redo stack.

Now if some remote operations come in, what would be the correct way to transform the redo stack so that when you redo, the operations get added correctly to the document?

So if a document has these operations (A is user A and B is user B):
A1 A2 B1 A3 B2 B3

Then user A undoes A3:

A3' = invert(A3)
A3' = transform(undo, B2, 'left')
A3' = transform(undo, B3, 'left)

A3'' = invert(A3') 

then A3'' is in the redo stack and after applying A3' (the undo op) the series becomes:
A1 A2 B1 A3 B2 B3 A3'

Then B does another operation:

A1 A2 B1 B2 B3 A3' B4

now we need to transform A3'' which is in the redo stack somehow... I am doing

A3'' = transform(A3'', B4, 'left')

But this doesn't give the correct result in all cases. Mostly when you have multiple operations in the redo stack, then if you transform all of them with the remote operation it can produce incorrect results. Is there a better way? Let me know if I'm not clear enough :)

Joseph Gentle

unread,
Jul 25, 2015, 8:53:31 PM7/25/15
to sha...@googlegroups.com
Hey

Everything you said is correct up to here:

On Thu, Jul 23, 2015 at 2:25 AM, Vaios Kalpias-Ilias <vkal...@gmail.com> wrote:
> But this doesn't give the correct result in all cases. Mostly when you have
> multiple operations in the redo stack, then if you transform all of them
> with the remote operation it can produce incorrect results. Is there a
> better way? Let me know if I'm not clear enough :)

... And you're totally right - you have to be a bit trickier when you
transform multiple undo / redo stack entries.

Going back to your example, say we have:

A1 A2 B1

(A1,A2 from user A, B1 from user B).

A has operations A1' and A2' which are the inverse of A1 and A2. When
A gets operation B1, how do we convert A1' and A2' ? Its actually
incorrect to transform them both by B1 directly. There's a trick you
need to do.

The reason you need the trick is that transform has a constraint that
isn't talked about much. Whenever you call transform(x, y), there must
exist a document state where either x or y would be valid operations.
In this case, operations B1 and A2' are both valid operations on the
document state between A2 and B1. You don't need to have the document
state to transform, but there must exist one. It would not be valid to
transform A1' by B1 because there exists no document state which could
accept either A1' or B1.

The trick is explained more easily by a diagram. In short, you need to do this:
A2'' = transform(A2', B1, 'left')
B1' = transform(B1, A2', 'right')
now we can calculate A1'' with B1':
A1'' = transform(A1', B1', 'left')

In code:
given undoStack, remoteOp:

r = remoteOp;
for (op in undoStack in from the top of the stack to the bottom)
op' = transform(op, r, 'left')
r = transform(r, op, 'right)
replace op with op' in undoStack

The same trick will work with the redo stack. If you want to convince
yourself this is correct, all the types have a random op generator. It
should be easy to make a simple fuzzer that starts with a document,
generates some random operations from two clients and undoes some of
them (and redoes others) and makes sure everything checks out.

-J

Vaios Kalpias-Ilias

unread,
Jul 25, 2015, 10:49:33 PM7/25/15
to ShareJS
I see I think I get why we need the trick. It all makes more sense now. I will give this a try asap - thanks so much!

Wout Mertens

unread,
Jul 26, 2015, 10:37:22 AM7/26/15
to sha...@googlegroups.com, der...@googlegroups.com

I just realized that you don't actually need to transfer the deleted text on a deletion op to the server for it to be reversible. You only need to store that text once you apply the op and you want to be able to invert...

Dmitry

unread,
Aug 18, 2015, 10:30:48 AM8/18/15
to ShareJS, der...@googlegroups.com
Hi Joseph,

Regarding inversible ops I have my own separate opinion that it's better to keep deleted data together with OP instead of trying to rebuild it using document snapshot. I have experience with building full DIFFs of changes between 2 versions in a text document, that can show what was deleted, inserted or modified by each collaborator. It requires to have inverted ops and building them for each revision is resource expensive. Of course it's just one of the custom requirements and it might not be important for 90% of others.

Another benefit of having delete objects recorded in the OP is that you'll be able to implement very strict state checks, that means you can make sure that what client requested to delete is what you actually have in your server snapshot. Over wise it increases chances to get to the broken unrecoverable document state in case there was an error on the client and something went wrong.

Joseph Gentle

unread,
Aug 19, 2015, 11:04:45 PM8/19/15
to sha...@googlegroups.com, derbyjs
If you have experience with diffs, I can see where you're coming from.
However in this case, we have enough information to perfectly recreate
the inverted operation from the op + document which it was applied to.
This can happen in the client or the server when the operation is
created. And I can verify that the code is correct (using the fuzzer).

As I get closer to finishing the OT type I am more convinced that this
optional invertability approach is the best one. Consider this edge
case:

Say you have users users A and B editing this document:
{x:5, z:{}}
A moves doc.x -> doc.z.x. Result: {z:{x:5}}
B removes doc.z. Result: {x:5}.

What should the final result of applying these two operations be?
There's two possible answers:
1. We cancel / undo the move, leaving {x:5}. Unfortunately, this will
cause more problems. What if a subsequent operation by A moves
something else into doc.x? Do we need to apply back-pressure to that
as well? We might end up causing problems in a whole chain of edits.
2. We consider the move to have happened first, and the moved data gets deleted.

Without conflict markers, I think 2 is the only workable solution
(though I'd be curious if anyone disagrees with this).

So lets talk about the transform function. Remember that the transform
function only sees the operations, not the document state when the
operations were applied[1]. The transform function needs to perform:
transform( move doc.x -> doc.z.x , remove doc.z:{} ) -> remove doc.x:5
and
transform( remove doc.z:{}, move doc.x -> doc.z.x ) -> remove doc.z:{x:5}

But the transform function doesn't have enough information to know
that x currently has a value of 5 - implementing this is impossible if
invertable data needs to be stored in the op.

So I'm just going to make the transform function always strip
invertability information. All removes will simply mark that section
as needing to be removed. Livedb *does* have a document snapshot when
the operations are saved to the database. (This is important to verify
the operation's validity). At that point we can add back in
information about what was removed if thats desirable. The record
about what has been deleted is more reliable this way - right now
misbehaving clients can lie about what they removed and the server
will accept it.

-J

[1] And for good reason - if an client sends an operation for an old
document, we'd have to reconstruct the document state at that old
version in order to transform; which may be quite expensive to do.

Dmitry

unread,
Aug 20, 2015, 12:13:07 PM8/20/15
to ShareJS, der...@googlegroups.com, m...@josephg.com
I had situations when client code submitted something broken to the document and without proper checks I ended up with broken state. Not very good old code maybe that I used, but I prefer to always have my back covered so in new rich text OT i keep deleted information together with OP.

Yes, JSON stuff is more complex than working with text :) and transform needs to affect that deleted info as well, greatly increasing code complexity. Agree with that. 

I'm sure you already thought about this and probably had a reason to go with "move" operation, but if you consider "move" as "remove" + "insert" then your case can be solved easily with the only outcome of Result: {x:5}. That's what would happen in the text OT.

just my 2 cents.

Jeremy Apthorp

unread,
Aug 20, 2015, 1:10:23 PM8/20/15
to ShareJS, der...@googlegroups.com, m...@josephg.com
Move cannot be considered as a remove + insert operation, because then transforming "edit" past "remove + insert" results in the edit being lost.

--

Joseph Gentle

unread,
Aug 22, 2015, 1:10:04 AM8/22/15
to Jeremy Apthorp, ShareJS, derbyjs
Yep.

So I've been banging my head against transform for the past couple of
weeks - I think I've written and rewritten the code about 4 or 5 times
now. I just did a pass deduping the tests because I've written so many
of them. Writing this code is like weaving about 10 spider threads
through each other while making sure they all touch in *exactly* the
right place.

If you're curious, this is the rule set for transform:
https://dl.dropboxusercontent.com/u/2494815/transform.png
... Which doesn't take into account the semantics for list moves
(which is crazy complicated, but I've already written that part enough
times now that there are no *known* bugs).

-J

Joseph Gentle

unread,
Aug 29, 2015, 1:41:08 AM8/29/15
to Curran Kelleher, Derby, Jeremy Apthorp, sha...@googlegroups.com
Heh I should set up a FAQ.

Short answer: No, it won't work.
Long answer: if you have an empty list: "[]" and we both insert into
the list (pos 1, insert "1") vs (pos 1, insert "2") we'll have "[12]"
instead of "[1,2]". There's dozens of examples like this where it
would break - some would create invalid JSON ('[1,2]' -> '[,]'),
others would simply make something weird and wrong (like above). The
text type is also missing the ability to move characters, which the
new type will support.

-J

On Sat, Aug 29, 2015 at 1:20 PM, Curran Kelleher
<curran....@gmail.com> wrote:
> Hello,
>
> I'm just wondering, would it make sense to try to leverage the text OT type
> on stringified JSON?
>
> Best regards,
> Curran

Ian Johnson

unread,
Aug 31, 2015, 2:03:55 PM8/31/15
to der...@googlegroups.com, Curran Kelleher, Jeremy Apthorp, sha...@googlegroups.com, Joseph Gentle
Basically setDiff walks the tree of a json object and checks for equality, if things are unequal then it will set the value at that path to the value of the object being compared with. setDiffDeep does the same thing but does extra work to check for deep equality.
It will not use OT string operations when setting on text values however, and there may be some other limitations I'm not aware of.

Racer is essentially a convenience wrapper around the share API designed for building Derby-like applications. It may be worth it in some cases to write your own functions that do repeated OT operations. If you have more semantic information about what kind of differences matter when comparing your json objects that might help in designing something more robust.

On Sun, Aug 30, 2015 at 6:21 PM, Curran Kelleher <curran....@gmail.com> wrote:
Ah yes, I see what you mean. Interesting cases that didn't pop into my mind. Thank you for clarifying.

Is there a function somewhere in the Derby stack that computes a diff between two JSON objects and generates a corresponding array of JSON0 operations? I see there is this function setDiff, but I'm not sure if that is what this is doing exactly. I can't seem to find documentation on this function and how it should be called, maybe I'm just missing where to find it.

Thank you for your work on this stuff, the stack looks amazing and I'm getting excited about the potential of using ShareJS / Derby for real-time collaborative data visualization applications.

Best regards,
Curran

--
You received this message because you are subscribed to the Google Groups "Derby" group.
To unsubscribe from this group and stop receiving emails from it, send an email to derbyjs+u...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Ian Johnson - 周彦
Reply all
Reply to author
Forward
0 new messages