On setNull and schemas

Joseph Gentle

Aug 29, 2015, 2:30:41 AM

to sha...@googlegroups.com, derbyjs

I've been working on the new JSON type and thinking about setNull.

Derby currently has a setNull function which lets you set a property
to some value, but only if it's currently null. This is super useful
when your document is missing a field and you want to insert
something into the field.

E.g., imagine I'm storing my friend Jeremy in my database:

{ handle: 'nornagon' }

and I want to add a list of URLs to Jeremy's webpages - so I want:

{
  handle: 'nornagon',
  urls: ['nornagon.net']
}

Unfortunately, I can't do this in a clean way with OT. If I insert
urls: ['nornagon.net'] at the same time as someone else inserts
urls: ['github.com/nornagon'], we'll end up with only one of our two
URLs. The other one will be removed! (aak!)
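A toy simulation makes the lost update concrete. Everything here is a hypothetical simplification, not ShareJS code: an op `{ p, oi }` borrows json0's object-insert name, and `transform` breaks ties with a 'left'/'right' side flag, giving the tie to whichever op the server received second (last-writer-wins).

```javascript
// Toy model, hypothetical: an op is { p: key, oi: value }, meaning
// "set doc[key] = value" (the oi name is borrowed from json0's object insert).
function apply(doc, op) {
  return op === null ? doc : { ...doc, [op.p]: op.oi };
}

// Transform `op` so it applies after a concurrent `other`.
// side === 'left' means `op` wins a tie on the same key.
function transform(op, other, side) {
  if (op.p !== other.p) return op;     // different keys: no conflict
  return side === 'left' ? op : null;  // the loser becomes a no-op
}

const base = { handle: 'nornagon' };
const mine = { p: 'urls', oi: ['nornagon.net'] };
const theirs = { p: 'urls', oi: ['github.com/nornagon'] };

// Last-writer-wins: `theirs` reached the server second, so it wins the tie.
const clientA = [mine, transform(theirs, mine, 'left')].reduce(apply, base);
const clientB = [theirs, transform(mine, theirs, 'right')].reduce(apply, base);

console.log(clientA.urls, clientB.urls); // both clients keep only the GitHub URL
```

Both clients converge, which is the point of OT, but they converge on a document that has silently dropped one URL.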

So we want to be able to say:
user.setNull('urls', []); user.urls.push('nornagon.net');
...and have that come out all right.

Because the JSON OT code is last-writer-wins, there's no real way to
do this right now. We've been punting on the problem until the new
JSON OT type, which will magically fix everything, and that time is
here.

I have four possible solutions, and I'd like some feedback on which
one(s) to implement:

Option 1: Make the normal insert / move first-writer-wins instead of
last-writer-wins. I can't remember why we currently break ties with
last-writer-wins semantics. (Does anyone know?)

If it's just arbitrary, we could swap it around to make it
first-writer-wins. Then you could at least implement setNull with an
extra round-trip (which is awful, but doable): first do a round-trip
to the server inserting urls: [], and when that comes back (even if
someone else got there first) you can insert into the list.
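A sketch of the Option 1 round-trip, under loud assumptions: the server object, `submit`, and the `{ p, oi }` / `{ p, li }` op shapes are all hypothetical toys. The only rule illustrated is that, with first-writer-wins, a concurrent `urls: []` that arrives second is dropped rather than replacing the list, so appends made after the ack are safe.

```javascript
// Hypothetical toy ops: { p: key, oi: value } sets a key,
// { p: key, li: value } appends to the list at that key.
function applyOp(doc, op) {
  if ('oi' in op) return { ...doc, [op.p]: op.oi };
  return { ...doc, [op.p]: [...doc[op.p], op.li] };
}

const server = { doc: { handle: 'nornagon' }, history: [] };

// `seen` = how many history entries the client had when it created `op`.
// First-writer-wins: if a concurrent op already set the same key, the
// incoming set loses and is dropped. Appends never collide in this toy.
function submit(op, seen) {
  const concurrent = server.history.slice(seen);
  const loses = 'oi' in op && concurrent.some((o) => 'oi' in o && o.p === op.p);
  if (!loses) {
    server.doc = applyOp(server.doc, op);
    server.history.push(op);
  }
  return server.history.length; // version the client is at after the ack
}

// Round-trip 1: both clients concurrently try to create the empty list.
const vA = submit({ p: 'urls', oi: [] }, 0); // arrives first, wins
const vB = submit({ p: 'urls', oi: [] }, 0); // arrives second, dropped
// Round-trip 2: only after the ack does each client append its URL.
submit({ p: 'urls', li: 'nornagon.net' }, vA);
submit({ p: 'urls', li: 'github.com/nornagon' }, vB);

console.log(server.doc.urls); // both URLs are kept
```

Under last-writer-wins, the second `urls: []` would replace the list instead, and could wipe out the other client's append if it arrived after that append.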

Option 2: We add an alternate insert to the new OT type. So, we'd be
able to say something like:
(path: 'urls', insert if null: []), (path:'urls.0', insert 'nornagon.net')

The simplest way to make the semantics of this consistent is that if
you both setNull different values, one person wins and the other
person's edits get deleted. Compose will have to be a bit tricky, but
we'll cope.
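A possible shape for the Option 2 component, as a sketch only: `setNull` as a component field and `transformLoser` are hypothetical names, deep equality is approximated with JSON.stringify, and only the losing op's transform is shown. The winning op would additionally have to replace the loser's value on clients that already applied it, which is left out here.

```javascript
// Hypothetical op components (not part of any released OT type):
//   { p: key, setNull: value }  set doc[key] = value only if it is null/undefined
//   { p: key, li: value }       append value to the list at doc[key]
function applyComponent(doc, c) {
  if ('setNull' in c) {
    return doc[c.p] == null ? { ...doc, [c.p]: c.setNull } : doc;
  }
  return { ...doc, [c.p]: [...doc[c.p], c.li] };
}

// Toy deep equality via JSON (sensitive to key order; fine for this demo).
const sameValue = (x, y) => JSON.stringify(x) === JSON.stringify(y);

// Transform the losing op against the winning one. If both setNull the same
// key with different values, every component of the loser under that key is
// deleted. With equal values nothing is dropped: the loser's setNull simply
// finds the key non-null and does nothing, and its appends still apply.
function transformLoser(loser, winner) {
  const lostKeys = new Set();
  for (const c of loser) {
    if (!('setNull' in c)) continue;
    const w = winner.find((x) => 'setNull' in x && x.p === c.p);
    if (w && !sameValue(w.setNull, c.setNull)) lostKeys.add(c.p);
  }
  return loser.filter((c) => !lostKeys.has(c.p));
}

const base = { handle: 'nornagon' };
const opA = [{ p: 'urls', setNull: [] }, { p: 'urls', li: 'nornagon.net' }];
const opB = [{ p: 'urls', setNull: [] }, { p: 'urls', li: 'github.com/nornagon' }];
const opC = [{ p: 'urls', setNull: ['x'] }, { p: 'urls', li: 'y' }];

const afterA = opA.reduce(applyComponent, base);
const sameInit = transformLoser(opB, opA).reduce(applyComponent, afterA);
const diffInit = transformLoser(opC, opA).reduce(applyComponent, afterA);
console.log(sameInit.urls); // both URLs
console.log(diffInit.urls); // only 'nornagon.net'; opC's edits were deleted
```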

Option 3: We make insert more clever...

We could make it so that if two people insert the same thing at the
same place, neither person's edit gets overwritten.

So:
user.urls = ['nornagon.net']
vs
user.urls = ['github.com/nornagon']
would result in one url winning and the other losing

but:
user.urls = []; user.urls[0] = 'nornagon.net';
vs
user.urls = []; user.urls[0] = 'github.com/nornagon';
would result in both URLs (in some arbitrary, consistent order)

To make this work, transform would check when two inserts collide to
see if they're inserting the same value. If they are, we just let them
'both win' as it were.
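A sketch of the Option 3 transform check, under stated assumptions: ops are arrays of hypothetical `{ p, oi }` and `{ p, i, li }` components, 'left' wins ties, and equality is a toy JSON comparison. When two inserts collide with the same value, the duplicate insert is dropped but both ops' children are kept; different values fall back to a single winner. The `converge` helper applies the two ops in both delivery orders to check that the clients agree.

```javascript
// Hypothetical toy ops: arrays of components, each either
//   { p: key, oi: value }            set doc[key] = value
//   { p: key, i: index, li: value }  insert value into the list doc[key] at index
const sameValue = (x, y) => JSON.stringify(x) === JSON.stringify(y); // toy deep-equal

function applyOp(doc, op) {
  const out = { ...doc };
  for (const c of op) {
    if ('oi' in c) {
      out[c.p] = c.oi;
    } else {
      const list = out[c.p].slice();
      list.splice(c.i, 0, c.li);
      out[c.p] = list;
    }
  }
  return out;
}

// Transform `op` to apply after a concurrent `other`; side 'left' wins ties.
function transform(op, other, side) {
  const result = [];
  const lost = new Set(); // keys where op's insert lost: drop its children too
  const won = new Set();  // keys where op overwrote other's (different) value
  for (const c of op) {
    if (lost.has(c.p)) continue;
    if ('oi' in c) {
      const clash = other.find((o) => 'oi' in o && o.p === c.p);
      if (!clash) result.push(c);
      else if (!sameValue(clash.oi, c.oi)) {
        if (side === 'left') { result.push(c); won.add(c.p); } else lost.add(c.p);
      }
      // Same value: drop the duplicate insert but keep op's children (Option 3).
      continue;
    }
    let i = c.i;
    if (!won.has(c.p)) {
      for (const o of other) {
        // Shift past other's list inserts before us, or at our index if we lose the tie.
        if ('li' in o && o.p === c.p && (o.i < i || (o.i === i && side === 'right'))) i++;
      }
    }
    result.push({ ...c, i });
  }
  return result;
}

// Apply both ops in both orders; opA wins ties.
function converge(base, opA, opB) {
  const atA = applyOp(applyOp(base, opA), transform(opB, opA, 'right'));
  const atB = applyOp(applyOp(base, opB), transform(opA, opB, 'left'));
  return [atA, atB];
}

const base = { handle: 'nornagon' };
// user.urls = [...] vs user.urls = [...]: different values, one wins.
const [x1, x2] = converge(base,
  [{ p: 'urls', oi: ['nornagon.net'] }],
  [{ p: 'urls', oi: ['github.com/nornagon'] }]);
// user.urls = []; user.urls[0] = ...: same insert, both URLs survive.
const [y1, y2] = converge(base,
  [{ p: 'urls', oi: [] }, { p: 'urls', i: 0, li: 'nornagon.net' }],
  [{ p: 'urls', oi: [] }, { p: 'urls', i: 0, li: 'github.com/nornagon' }]);
console.log(x1.urls, y1.urls);
```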

It's sort of a quirk of the way operations are written that those are
both independently expressible. I was about to make one of them
invalid when I realised the possibility.

Option 4: We solve the problem in a different layer - in particular in livedb.

So, basically the problem we're trying to solve here is that of data
schemas. Another (maybe better) way to solve this problem is to make
every collection have an (optional) schema / default value defining
what fields are in the document.

The schema could even be as simple as a version number + default value
for the document.
So, user schema V1 is {}
user schema V2 is {urls:[]}

... Then we define an operation for converting documents from schema
V1 to V2. In this case, (insert urls:[])

When the client code is updated to care about a url list, it requests
the document at schema v2 and livedb automatically migrates the data
between versions before handing the data to the client. Then the
clients can just insert the new URLs in and everything will be peachy.
I'm not sure what clients with old code are supposed to do - what
happens if you request the document at V1? Does it error? If so,
you have to restart all your frontend servers at exactly the same
time. I guess this is a problem anyway.
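Option 4 could look roughly like this on the server. This is a hypothetical sketch, not a livedb API: `userSchema`, its `migrations` table, and `fetchAtVersion` are invented for illustration, and refusing an older-version request is just one possible answer to the question above.

```javascript
// Hypothetical schema registry for a 'users' collection. None of this is an
// existing livedb API; it only sketches the versioned-schema idea.
const userSchema = {
  version: 2,
  // migrations[n] upgrades a document from schema version n to n + 1
  migrations: {
    1: (data) => ({ ...data, urls: [] }), // V1 {} -> V2 {urls: []}
  },
};

// Bring a stored snapshot { v, data } up to the schema version a client asked for.
// A real system would presumably also persist the migrated snapshot.
function fetchAtVersion(schema, snapshot, wanted) {
  if (wanted > schema.version) throw new Error(`unknown schema v${wanted}`);
  if (snapshot.v > wanted) {
    // One possible answer to "what if old code requests V1?": refuse.
    throw new Error(`document is at v${snapshot.v}; client asked for v${wanted}`);
  }
  let { v, data } = snapshot;
  while (v < wanted) {
    data = schema.migrations[v](data);
    v += 1;
  }
  return { v, data };
}

const stored = { v: 1, data: { handle: 'nornagon' } };
const upgraded = fetchAtVersion(userSchema, stored, 2);
upgraded.data.urls.push('nornagon.net'); // V2 code can rely on urls existing
console.log(upgraded);
```

Because the migration runs once on the server before any V2 client sees the document, no two clients ever race to create `urls`.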

---

I'm leaning toward ditching an explicit setNull and implementing
options 3 and 4. Option 3 is the ghetto hacky way for people who are
into that sort of thing and option 4 will satisfy people who care more
about correctness.

(Oh, and I won't get around to adding schemas to livedb anytime soon.
We'll need to do some more design work before writing that code.)

What do you all think? Are there any other use cases for setNull that
you care about? (Fiddling with insert is strictly less powerful than
having a proper setNull operator.)

-J

Jeremy Apthorp

Aug 29, 2015, 2:30:40 PM

to sha...@googlegroups.com, derbyjs

Is it an option to simply not solve this problem? Create the user blob with the 'urls' field already in place :) Then upgrading the schema looks like {set "urls", []} for all users. I guess clients might have an old version and try to create users without that field, though. Hm.

Another option would be to define "insert" on undefined to treat it as an empty list, same way "add" treats undefined as 0.
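Jeremy's suggestion could be sketched as a lenient apply. The component names borrow json0's `na` (number add) and `li` (list insert) for familiarity, but the semantics here, including treating a missing field as 0 or [], are this sketch's assumptions rather than a description of what json0 does.

```javascript
// Hypothetical lenient semantics for two toy components:
//   { p: key, na: n }               add n to the number at doc[key]
//   { p: key, i: index, li: value } insert value into the list at doc[key]
function applyLenient(doc, c) {
  if ('na' in c) {
    return { ...doc, [c.p]: (doc[c.p] ?? 0) + c.na }; // missing treated as 0
  }
  const list = (doc[c.p] ?? []).slice();              // missing treated as []
  list.splice(c.i, 0, c.li);
  return { ...doc, [c.p]: list };
}

let doc = { handle: 'nornagon' };
// Neither client creates `urls` explicitly, so there is no competing object
// insert; the two list inserts are ordered by ordinary list-insert transform
// (the second index is shown already shifted from 0 to 1).
doc = applyLenient(doc, { p: 'urls', i: 0, li: 'nornagon.net' });
doc = applyLenient(doc, { p: 'urls', i: 1, li: 'github.com/nornagon' });
doc = applyLenient(doc, { p: 'visits', na: 1 });
console.log(doc);
```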

--
You received this message because you are subscribed to the Google Groups "ShareJS" group.

Joseph Gentle

Aug 29, 2015, 8:13:27 PM

to sha...@googlegroups.com, derbyjs

Not solving the problem is basically where I was going with option 4.
That is to say, we punt the problem to livedb, and then it's not a
problem that needs fancy OT semantics.

I've thought before about making insert automatically create parent
objects, but I think it's a bad idea - I think there should be
some kind of explicit intentionality about it. If I rename a field
across my database and forget to change one piece of code somewhere,
I'd rather that code crash than have it just recreate the old field
every time it runs. Those bugs are the worst. (Also I think there was
some quite good OT-related reason why automatically making objects
would be bad, but I can't remember what it is now.)
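The failure mode Joseph prefers can be shown with a hypothetical strict list insert, `applyStrict`, that throws when the target list does not exist rather than creating it. The renamed-field scenario is invented to match the description above.

```javascript
// Hypothetical strict list insert: refuses to create a missing parent.
function applyStrict(doc, c) {
  if (!Array.isArray(doc[c.p])) {
    throw new Error(`list insert into missing field '${c.p}'`);
  }
  const list = doc[c.p].slice();
  list.splice(c.i, 0, c.li);
  return { ...doc, [c.p]: list };
}

// The field was renamed from `urls` to `links` across the database, but one
// stale piece of code still writes to `urls`.
const renamed = { handle: 'nornagon', links: ['nornagon.net'] };
try {
  applyStrict(renamed, { p: 'urls', i: 0, li: 'github.com/nornagon' });
} catch (err) {
  console.error(err.message); // fails loudly instead of recreating `urls`
}
```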

-J

Joe Leaver

Sep 1, 2015, 2:15:04 PM

to ShareJS, der...@googlegroups.com, m...@josephg.com

So, I've been lurking here for a couple of months and watching the progress on the new JSON type.

Just my opinion, but I don't think option 3 is a "ghetto hacky way" at all. In my mind (addled though it is) this represents the "correct" behavior: all data is kept, and the order can't be predicted but is consistent.

Option 4 doesn't belong in the type at all; if it exists, I think it belongs in the livedb layer, possibly as a filter or callback?

Dmitry

Sep 14, 2015, 1:49:28 PM

to ShareJS, der...@googlegroups.com, m...@josephg.com

If you remember, we had this discussion for json0, and I think my pull request is still hanging somewhere :)

I was trying to find a way to create an editable object on whichever client connects first, while keeping possible concurrency in mind. In my case, I need to insert a child rich text object into the parent JSON document before I can start working on it. For example:

initial: {}

client1: { doc: {--rich text object format--} }, then client 1 makes some edits inside it

client2 (concurrently): { doc: {--rich text object format--} }, then client 2 makes some edits inside it

expected result:

{doc: {client1 edits + client2 edits} }

So here neither "first wins" nor "last wins" works.

Joseph Gentle

Sep 14, 2015, 5:47:02 PM

to Dmitry, derbyjs, sha...@googlegroups.com

Yeah, that's what I'm implementing!

Dmitry

Sep 18, 2015, 10:47:53 AM

to ShareJS, uvar...@gmail.com, der...@googlegroups.com, m...@josephg.com

In this case, what I ended up with was that the system needs a limitation: every client that uses a concurrent setNull operation needs to use the same initial value, or inversion will break. It's enough to insert an empty document and then apply regular operations on top of it, but everyone needs to agree to insert a document of the same type.
