Dev Diary: JSON conflicts, ACID and GraphQL

270 views
Skip to first unread message

Joseph Gentle

unread,
Oct 14, 2015, 12:22:23 AM10/14/15
to sha...@googlegroups.com, derbyjs
Hi!

I wanted to post this last week, but I didn't want this stuff to get
lost in the noise announcing the sharejs & sharedb restructure.

I've been stalled working on the JSON transform function. The fuzzer
runs for about 200 iterations now, and then it finds a problem which
looks like this:

Given a document {x:{}, y:{}}
Operation 1: Move x into y (-> {y:{x:{}}} )
Operation 2: Move y into x (-> {x:{y:{}}} )

What should happen if both of those operations are concurrent?

I think the least-bad answer is that one of the operations wins and
undoes the other operation's move. However, this has its own awful set
of flow-on effects, and it might have unexpected consequences for data
consistency depending on how your data is modelled. Its also really
complicated to implement and I don't want to do so then change my mind
later.

Meanwhile, I've still been spinning on what to do about default data
in a document. I think it makes sense to have the ability to commit an
operation atomically, and explicitly fail back to user code if the
operation needs to be transformed by another op. This is really easy
to implement, and it would give you a huge amount of flexibility. If
you don't like the OT semantics for JSON, you can just do everything
using regular ACID transactions. But if your intent can be expressed
using operations then you can just express that directly and let the
OT code resolve your intent.

Given three possible feature sets:
a) Powerful, complicated and correct
b) Powerful, complicated and sometimes not correct
c) Simple and correct

... I personally prefer (c) over (b). When OT works well it should be
(a), but I'm worried that the semantics of this awful
insert-two-objects-into-each-other case is one of those times where
even if I make it do something consistently, it won't necessarily be
what the developer wants. If I can't communicate effectively what the
resulting behaviour is under OT, it'll be really easy to write code
which has latent bugs. (Also worth mentioning: in non-text documents
transform is actually quite rare).

So this makes me think that maybe a better approach is to extend an
ACID-style conflict mechanism deeper. So you can say "Here's an
operation, but if it conflicts with another operation in one of these
ways, don't handle it automatically. Instead don't apply the operation
and punt it back to my code". The super important thing here is that
the application developer can reason about what happens if they want,
and change the behaviour accordingly.

So with that in mind I'm thinking about alternate APIs where you
provide a set of conflict flags specifying which operations are
allowed to be resolved through OT and which aren't. The code path
would be to first check for conflicts, then for each conflict check
the flags. If the flags say the concurrent edits are incompatible,
error back to the user. If the flags say they are compatible, resolve
the conflict. Then transform as normal. The regular transform function
would just allow everything to be automatically resolved. Actually
exposing that modified API to the user would require some subsequent
changes in the client.

---

The other thing I've been thinking about is the relationship between
users and data on the cloud. I'd like to have a database which users
own and web applications can keep their data in, so my application
data isn't sitting in another company's servers.

I made a video blog talking about this here:
https://www.youtube.com/watch?v=yAIO-W_o1Gc ... and I'll post up a
text version soon.

The interesting & hard thing about this is that we'd need 'one
database to rule them all'. KV databases would be too slow if you had
to do a roundtrip to a remote DB for each fetch. SQL databases aren't
suitable for a large number of tasks - and normalizing your data is
super annoying. I actually think facebook's GraphQL might be a great
tool for the job. At lever we had a performance problem where entries
in one collection were (individually) quite large. To show a list of
these entries we ended up severely overfetching data. The projections
in livedb were made to address this problem, and they work - but
they're a huge hack. They make other things harder (like we started
double-fetching the data to load it both into the list and in the
detail view). Also editing got harder.

The solution in graphql is far more elegant. In your graphql query you
specify which fields you're interested in. Graphql can follow & expand
references, so you can request part of an object and part of some
other object it refers to without additional round-trips. Combining
that with something like the JSON OT code might result in a really
sexy stack.

-J

Jeremy Apthorp

unread,
Oct 14, 2015, 6:06:00 PM10/14/15
to sha...@googlegroups.com, derbyjs
For custom conflict resolution, would that code exist on the server or the client?

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joseph Gentle

unread,
Oct 14, 2015, 6:28:04 PM10/14/15
to sha...@googlegroups.com, derbyjs

The client. I'm imagining something like a transaction retry block, like fdb and firebase uses. There might be a way to send code to the server instead - I'd need to think through some use cases.

Joe Leaver

unread,
Oct 21, 2015, 3:24:45 PM10/21/15
to ShareJS, der...@googlegroups.com
I really like your thinking on this. A transaction retry block in case of conflict would give developers flexibility to handle conflicts in a way that makes most sense to the application (which the underlying library can't predict.) At the very least, how conflicts got handled would be clear and predictable to the developer, since the responsibility for choosing how it's handled gets passed the them. 

It will no doubt always be a frequent support topic, however. Concurrency is hard! 

Jonathan Kienzle

unread,
Dec 11, 2015, 5:38:26 AM12/11/15
to ShareJS, der...@googlegroups.com, m...@josephg.com
Hey Joseph,

the json1 type sounds really interesting! Would be great if you could give us an update on its current status. Is this something you're still working on?

Best,
Johnny

Joseph Gentle

unread,
Dec 11, 2015, 7:37:43 AM12/11/15
to sha...@googlegroups.com
Yeah I want to get back to it - I've been working on some other projects the last couple of months. 
--

Jonathan Kienzle

unread,
Dec 11, 2015, 10:39:39 AM12/11/15
to ShareJS, m...@josephg.com
Great to hear that you plan on getting back to it! :)
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+unsubscribe@googlegroups.com.

Joe Leaver

unread,
Dec 14, 2015, 12:00:36 PM12/14/15
to ShareJS, m...@josephg.com
I super excited to hear this too! I've been eagerly anticipating json1 with bated breath. I keep the github page for it open and refresh it many times a day to check for updates. json1 is more than just a type -- json1 will align the planets and bring word peace and fulfillment for all mankind. Or, at least, it's going to make nested trees a lot easier to work with in share!
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+unsubscribe@googlegroups.com.

Joe Leaver

unread,
Jan 12, 2016, 12:04:03 PM1/12/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
Ok, I've been thinking about this for a while, off and on, for the last several months. My view on this might be somewhat myopic, however, since my application centers around moving around nodes on a nested tree. Here's my idea:
 
Given a document {x:{}, y:{}} 
Operation 1: Move x into y (-> {y:{x:{}}} ) 
Operation 2: Move y into x (-> {x:{y:{}}} ) 

What should happen if both of those operations are concurrent? 

I'm thinking the end state of this concurrent operation should be {y:{}, x:{}}. At first glance this might seem mildly retarded (and maybe it is?), but consider it from the user's point of view: a) as a user, I know I'm editing a document concurrent with other users, b)  my change placed my document in the order/location where I expected it to be, in spite of the fact that it's not nested the same way because another user made a change to my intended new parent. c) it's non destructive -- both node y and x are intact, though they have had their positions swapped. This actually provides decent feedback that the two nodes have been changed concurrently. And it's predictable behavior. 

What do you think? Could this not be a non-destructive reasonable default, even if you were to provide a transaction retry/override block in a later version?

Jeremy Apthorp

unread,
Jan 12, 2016, 10:03:31 PM1/12/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
Technically keys in objects aren't ordered, so that's the same thing as not applying either of the changes (which is maybe a reasonable thing to do?)

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.

Joe Leaver

unread,
Jan 12, 2016, 10:18:00 PM1/12/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
Actually, maybe it is a reasonable thing to do. It's an impossible op, or, at least, a non-nonsensical one.  I just don't think there's a mechanism for communicating such back through the stack (maybe I'm wrong?) and failing silently seems like an antipattern. But I'd take it, if it meant I could get all the other juicy goodness of json1.

Honestly, I'd be pleased with pretty much any predictable behavior, though erring on the side of doing less (or nothing) feels right.

Joe Leaver

unread,
Jan 12, 2016, 10:25:55 PM1/12/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
Followup -- maybe swapping (rather than doing nothing) still makes sense (if it ever did?) Conflicting ops can happen anywhere at different levels in the tree.

Consider {a: {x:{}}, b: {y: {}} } 
Operation 1: Move x into y (-> {b: {y:{x:{}}}} ) 
Operation 2: Move y into x (-> {a: {x:{y:{}}}} )  

I'd be pleased if the final result was {a: {y:{}}, b: {x: {}} }. This seems to me (in my specific imaginary use) to be a 'best effort' of honoring the intent of the technically impossible (or at least nonsensical) op. 

Jeremy Apthorp

unread,
Jan 13, 2016, 9:12:21 PM1/13/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
The tricky part isn't just deciding how to resolve this conflict, it's making sure that it's valid with other things transforming past it.

e.g, say we both have this document:
{x:{}, y:{}}
I mutate it into:
{x:{y:{}}}
and then into:
{x:{y:{a:3}}}

and you mutate the original into:
{y:{x:{}}}

what should the resulting document be, and how do we make sure that when i transform your operation past mine and apply it, i get the same document as when you transform my operation past yours and apply it (i.e. that TP1 is satisfied)? In this case, any of these are reasonable things to want:
{}
{x:{}, y:{a:3}}
{x:{y:{a:3}}}
{y:{x:{}, a:3}}

An even trickier case is something like

{ x:{y:{z:{}}}, a:{b:{c:{}}} }

where party (A) transforms to

{ x:{y:{z:{a:{b:{c:{}}}}}} }

and party (B) transforms to

{ a:{b:{c:{x:{y:{z:{}}}}}} }

we want to be able to resolve this sort of conflict such that all of (A)'s future operations _and_ all of (B)'s future operations continue to make sense on the resulting document.

Joe Leaver

unread,
Jan 14, 2016, 10:33:59 AM1/14/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
Yeah, that makes total sense. You know, all these cases are actually the same thing, which is trying to do an operation on a DAG that creates a cycle. Any op (or a conflict between ops) that tries to create a cycle in a tree has to be an invalid op. Or what about {a: {x: {y: {}}} and an op that tries to move a into y? You could interpret it any number of ways (shuffle children left...), but really, it's an invalid op and needs to be rejected. Or what about an insert into y that happens concurrently with a delete of a? But rejecting any op for any reason creates all kinds of issues with offline or lagging clients, etc., and doesn't feel very 'transformy.' If we aren't to reject ops, and instead try to resolve conflicts, what next? Perhaps a hierarchy of safe resolutions: moves are always non-destructive, node contents are always preserved, conflicting ops send a rollback op to the chosen loser, conflicting ops rise in the tree to nearest legal sibling??? And what does this all look like when trying to communicate a coherent stream of events to catch a client up? Concurrency is hard, yo. 

Jeremy Apthorp

unread,
Jan 14, 2016, 2:33:06 PM1/14/16
to ShareJS, der...@googlegroups.com, m...@josephg.com
Ops that are invalid by themselves are easy enough to handle, but when it's the combination of two ops that's invalid, by the time you know it's invalid it's kind of too late.
Reply all
Reply to author
Forward
0 new messages