Dev Diary: JSON conflicts, ACID and GraphQL

Joseph Gentle

Oct 14, 2015, 12:22:24 AM
to sha...@googlegroups.com, derbyjs
Hi!

I wanted to post this last week, but I didn't want this stuff to get
lost in the noise announcing the sharejs & sharedb restructure.

I've been stalled working on the JSON transform function. The fuzzer
runs for about 200 iterations now, and then it finds a problem which
looks like this:

Given a document {x:{}, y:{}}
Operation 1: Move x into y (-> {y:{x:{}}} )
Operation 2: Move y into x (-> {x:{y:{}}} )

What should happen if both of those operations are concurrent?
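
To make the problem concrete, here is a rough sketch of what happens if each peer applies its own move and then receives the other's move untransformed. This is not the ShareJS JSON type; `Tree`, `move` and `tryMove` are names invented for the illustration, and moves are restricted to root-level keys to keep it short.

```typescript
// Illustrative only: a document is a tree of nested objects, and `move` is a
// hypothetical helper that relocates one root-level key inside another.
type Tree = { [key: string]: Tree };

function move(doc: Tree, from: string, to: string): Tree {
  if (!(from in doc)) throw new Error(`"${from}" is no longer at the root`);
  if (!(to in doc)) throw new Error(`"${to}" is no longer at the root`);
  const { [from]: moved, ...rest } = doc;
  return { ...rest, [to]: { ...rest[to], [from]: moved } };
}

// Returns null on success, or the error message if the op no longer applies.
function tryMove(doc: Tree, from: string, to: string): string | null {
  try {
    move(doc, from, to);
    return null;
  } catch (err) {
    return (err as Error).message;
  }
}

const initial: Tree = { x: {}, y: {} };
const peerA = move(initial, "x", "y"); // Operation 1 applied locally
const peerB = move(initial, "y", "x"); // Operation 2 applied locally

console.log(JSON.stringify(peerA)); // {"y":{"x":{}}}
console.log(JSON.stringify(peerB)); // {"x":{"y":{}}}

// Each peer now receives the other's operation untransformed.
const errorOnA = tryMove(peerA, "y", "x");
const errorOnB = tryMove(peerB, "x", "y");
console.log(errorOnA, errorOnB);
```

Neither remote operation applies as written, and the two peers are left holding different documents. Forcing both moves through would put x inside y inside x, so a transform function has to pick a winner or reject one of the operations.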

I think the least-bad answer is that one of the operations wins and
undoes the other operation's move. However, this has its own awful set
of flow-on effects, and it might have unexpected consequences for data
consistency depending on how your data is modelled. It's also really
complicated to implement, and I don't want to build it and then change
my mind later.

Meanwhile, I've still been spinning on what to do about default data
in a document. I think it makes sense to have the ability to commit an
operation atomically, and explicitly fail back to user code if the
operation needs to be transformed by another op. This is really easy
to implement, and it would give you a huge amount of flexibility. If
you don't like the OT semantics for JSON, you can just do everything
using regular ACID transactions. But if your intent can be expressed
using operations then you can just express that directly and let the
OT code resolve your intent.
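
A minimal sketch of that atomic-commit idea, assuming a version-numbered document. `AtomicStore`, `submitAtomic` and the result shape are hypothetical names, not an existing ShareJS or livedb API. The commit only succeeds if nothing else has landed since the version the caller read; otherwise it fails back to user code instead of transforming.

```typescript
type Snapshot = { version: number; data: { [key: string]: unknown } };

type SubmitResult =
  | { ok: true; version: number }
  | { ok: false; reason: "conflict"; currentVersion: number };

class AtomicStore {
  private snapshot: Snapshot = { version: 0, data: {} };

  read(): Snapshot {
    return { version: this.snapshot.version, data: { ...this.snapshot.data } };
  }

  // Commit only if the document is still at `baseVersion`. If another op got
  // in first, this op would need transforming, so it is rejected instead.
  submitAtomic(
    baseVersion: number,
    mutate: (data: { [key: string]: unknown }) => { [key: string]: unknown },
  ): SubmitResult {
    if (baseVersion !== this.snapshot.version) {
      return { ok: false, reason: "conflict", currentVersion: this.snapshot.version };
    }
    this.snapshot = {
      version: baseVersion + 1,
      data: mutate({ ...this.snapshot.data }),
    };
    return { ok: true, version: this.snapshot.version };
  }
}

// Two clients both see an empty document and both try to write default data.
const docStore = new AtomicStore();
const seenByA = docStore.read();
const seenByB = docStore.read();

const resultA = docStore.submitAtomic(seenByA.version, (d) => ({ ...d, title: "from A" }));
const resultB = docStore.submitAtomic(seenByB.version, (d) => ({ ...d, title: "from B" }));

console.log(resultA.ok, resultB.ok); // true false
console.log(docStore.read().data.title); // from A
```

Client B gets an explicit failure and can re-read the document, notice the defaults already exist, and decide what to do next.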

Given three possible feature sets:
a) Powerful, complicated and correct
b) Powerful, complicated and sometimes not correct
c) Simple and correct

... I personally prefer (c) over (b). When OT works well it should be
(a), but I'm worried that this awful insert-two-objects-into-each-other
case is one of those times where even if I make it do something
consistently, it won't necessarily be what the developer wants. If I
can't communicate effectively what the resulting behaviour is under
OT, it'll be really easy to write code which has latent bugs. (Also
worth mentioning: in non-text documents, needing to transform at all
is actually quite rare.)

So this makes me think that maybe a better approach is to extend an
ACID-style conflict mechanism deeper. So you can say "Here's an
operation, but if it conflicts with another operation in one of these
ways, don't handle it automatically. Instead, don't apply the operation
and punt it back to my code". The super important thing here is that
the application developer can reason about what happens if they want,
and change the behaviour accordingly.

So with that in mind I'm thinking about alternate APIs where you
provide a set of conflict flags specifying which operations are
allowed to be resolved through OT and which aren't. The code path
would be to first check for conflicts, then for each conflict check
the flags. If the flags say the concurrent edits are incompatible,
error back to the user. If the flags say they are compatible, resolve
the conflict. Then transform as normal. The regular transform function
would just allow everything to be automatically resolved. Actually
exposing that modified API to the user would require some subsequent
changes in the client.
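
The code path described above could look roughly like this. Everything here is an assumption made for the sketch: the three-op model, the `ConflictKind` names, and the `checkAgainst` function are invented, and the actual transform step is left out.

```typescript
type Op =
  | { kind: "set"; key: string; value: unknown }
  | { kind: "remove"; key: string }
  | { kind: "move"; from: string; to: string };

// Hypothetical conflict categories; a real JSON type would have more.
type ConflictKind = "setSet" | "setRemove" | "moveMove";

function detectConflict(a: Op, b: Op): ConflictKind | null {
  if (a.kind === "set" && b.kind === "set" && a.key === b.key) return "setSet";
  if (a.kind === "set" && b.kind === "remove" && a.key === b.key) return "setRemove";
  if (a.kind === "remove" && b.kind === "set" && a.key === b.key) return "setRemove";
  if (a.kind === "move" && b.kind === "move" && a.from === b.to && a.to === b.from) {
    return "moveMove";
  }
  return null;
}

type Outcome =
  | { status: "rejected"; conflicts: ConflictKind[] }
  | { status: "accepted"; resolvedByOT: ConflictKind[] };

// First find the conflicts, then check each against the caller's flags.
// Anything not allowed is punted back; the rest would go on to transform.
function checkAgainst(
  op: Op,
  concurrent: Op[],
  allowed: ReadonlySet<ConflictKind>,
): Outcome {
  const found = concurrent
    .map((other) => detectConflict(op, other))
    .filter((c): c is ConflictKind => c !== null);
  const refused = found.filter((c) => !allowed.has(c));
  if (refused.length > 0) return { status: "rejected", conflicts: refused };
  return { status: "accepted", resolvedByOT: found };
}

// The regular transform function corresponds to allowing everything.
const ALLOW_ALL: ReadonlySet<ConflictKind> = new Set<ConflictKind>([
  "setSet",
  "setRemove",
  "moveMove",
]);

const mine: Op = { kind: "move", from: "x", to: "y" };
const theirs: Op = { kind: "move", from: "y", to: "x" };

const strict = checkAgainst(mine, [theirs], new Set<ConflictKind>(["setSet"]));
const lenient = checkAgainst(mine, [theirs], ALLOW_ALL);
console.log(strict.status, lenient.status); // rejected accepted
```

With the strict flag set, the mutual move is handed back to the application; with `ALLOW_ALL`, it would continue into whatever the transform function decides.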

---

The other thing I've been thinking about is the relationship between
users and data on the cloud. I'd like to have a database which users
own and web applications can keep their data in, so my application
data isn't sitting in another company's servers.

I made a video blog talking about this here:
https://www.youtube.com/watch?v=yAIO-W_o1Gc ... and I'll post up a
text version soon.

The interesting & hard thing about this is that we'd need 'one
database to rule them all'. KV databases would be too slow if you had
to do a roundtrip to a remote DB for each fetch. SQL databases aren't
suitable for a large number of tasks - and normalizing your data is
super annoying. I actually think Facebook's GraphQL might be a great
tool for the job. At Lever we had a performance problem where entries
in one collection were (individually) quite large. To show a list of
these entries we ended up severely overfetching data. The projections
in livedb were made to address this problem, and they work - but
they're a huge hack. They make other things harder (like we started
double-fetching the data to load it both into the list and in the
detail view). Also editing got harder.

The solution in GraphQL is far more elegant. In your GraphQL query you
specify which fields you're interested in. GraphQL can follow & expand
references, so you can request part of an object and part of some
other object it refers to without additional round-trips. Combining
that with something like the JSON OT code might result in a really
sexy stack.
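
A toy imitation of that idea, to show the shape of it. This is not GraphQL syntax or any GraphQL library; the in-memory `entities` table, the `$ref` convention and the `select` function are all made up for the sketch. The caller names the fields it wants, and references are expanded in the same call rather than in a second round-trip.

```typescript
type Ref = { $ref: string };
type Value = string | number | Ref;
type Entity = { [field: string]: Value };
type Selection = { [field: string]: true | Selection };
type Result = { [field: string]: string | number | Result };

// Hypothetical data: a large entry that refers to its author.
const entities: { [id: string]: Entity } = {
  "entry:1": {
    title: "Candidate notes",
    body: "a very large blob of text that the list view does not need",
    author: { $ref: "user:7" },
  },
  "user:7": { name: "Ada", email: "ada@example.com" },
};

function isRef(value: Value): value is Ref {
  return typeof value === "object";
}

// Return only the selected fields, following references where the selection
// supplies a nested field list for them.
function select(id: string, selection: Selection): Result {
  const entity = entities[id];
  if (entity === undefined) throw new Error(`unknown id ${id}`);
  const out: Result = {};
  for (const [field, sub] of Object.entries(selection)) {
    const value = entity[field];
    if (value === undefined) continue;
    if (isRef(value)) {
      if (sub !== true) out[field] = select(value.$ref, sub);
    } else {
      out[field] = value;
    }
  }
  return out;
}

const listItem = select("entry:1", { title: true, author: { name: true } });
console.log(JSON.stringify(listItem));
// {"title":"Candidate notes","author":{"name":"Ada"}}
```

The large `body` field never leaves the store for the list view, and a detail view can ask for it with a different selection over the same data.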

-J

Joseph Gentle

Oct 14, 2015, 6:28:06 PM
to sha...@googlegroups.com, derbyjs

The client. I'm imagining something like a transaction retry block, like the ones fdb and Firebase use. There might be a way to send code to the server instead - I'd need to think through some use cases.
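
A rough sketch of what a client-side retry block could look like. The names (`serverDoc`, `tryCommit`, `transact`) are invented here and do not come from fdb, Firebase or ShareJS; the "server" is just a variable so the example runs on its own.

```typescript
type Counter = { count: number };
type Versioned = { version: number; data: Counter };

// Stand-in for the server's copy of the document.
let serverDoc: Versioned = { version: 0, data: { count: 0 } };

// Commit succeeds only if nobody else has written since `baseVersion`.
function tryCommit(baseVersion: number, data: Counter): boolean {
  if (baseVersion !== serverDoc.version) return false;
  serverDoc = { version: baseVersion + 1, data };
  return true;
}

// Re-run `update` against fresh data until the commit goes through.
// Returns the number of attempts it took.
function transact(update: (data: Counter) => Counter, maxAttempts = 5): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const base = serverDoc;
    const next = update({ ...base.data });
    if (tryCommit(base.version, next)) return attempt;
  }
  throw new Error("gave up after too many conflicting writes");
}

// Simulate another client sneaking in a write during our first attempt.
let interfered = false;
const attempts = transact((data) => {
  if (!interfered) {
    interfered = true;
    tryCommit(serverDoc.version, { count: 10 });
  }
  return { count: data.count + 1 };
});

console.log(attempts, serverDoc.data.count); // 2 11
```

The update function is ordinary application code, so custom conflict handling amounts to whatever it does when it is re-run against the newer data.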

On 15 Oct 2015 9:06 AM, "Jeremy Apthorp" <norn...@nornagon.net> wrote:
For custom conflict resolution, would that code exist on the server or the client?

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.