I've been working on the new JSON type and thinking about setNull.
Derby currently has a setNull function, which lets you set a property
to some value, but only if it's currently null. This is super useful
when your document is missing a field and you want to insert
something into it.
E.g., imagine I'm storing my friend Jeremy in my database:
{ handle: 'nornagon' }
and I want to add a list of URLs for Jeremy's webpages, so I want:
{
  handle: 'nornagon',
  urls: ['nornagon.net']
}
Unfortunately, I can't do this in a clean way with OT. If I insert
urls: ['nornagon.net'] at the same time as someone else inserts
urls: ['github.com/nornagon'], we'll end up with only one of our two URLs.
The other one will be removed! (aak!)
So we want to be able to say:
user.setNull('urls', []); user.urls.push('nornagon.net');
... and have that come out alright.
Because the JSON OT code is last-writer-wins, there's no real way to
do this right now. We've been punting on the problem until the new
JSON OT type, which will magically fix everything, and that time is
here.
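To make the failure concrete, here is a minimal JavaScript sketch of two concurrent inserts under last-writer-wins. The op encoding ({p, set}), apply, and transformLWW are hypothetical toys written for illustration, not the JSON OT type's real format or API. Each replica applies its own op first and then the other op transformed against it; both converge, but one URL is gone.

```javascript
// Hypothetical toy encoding, not the real JSON OT type: an op is a list of
// components {p: 'field', set: value} that assign a top-level field.

function apply(doc, op) {
  const out = { ...doc };
  for (const c of op) out[c.p] = c.set;
  return out;
}

// Transform `op` so it can be applied after `other`.
// side === 'left' means the server ordered `op` first. With last-writer-wins,
// when both ops assign the same field, the op ordered first is dropped.
function transformLWW(op, other, side) {
  return op.filter(c => !(side === 'left' && other.some(o => o.p === c.p)));
}

const doc = { handle: 'nornagon' };
const mine = [{ p: 'urls', set: ['nornagon.net'] }];
const theirs = [{ p: 'urls', set: ['github.com/nornagon'] }];

// The server happens to receive `mine` first. Each replica applies its own op,
// then the other op transformed against it.
const atMine = apply(apply(doc, mine), transformLWW(theirs, mine, 'right'));
const atTheirs = apply(apply(doc, theirs), transformLWW(mine, theirs, 'left'));
console.log(atMine.urls, atTheirs.urls); // both [ 'github.com/nornagon' ]: 'nornagon.net' is gone
```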
I have four possible solutions, and I'd like some feedback on which
one(s) to implement:
Option 1: Make the normal insert / move first-writer-wins instead of
last-writer-wins. I can't remember why we currently break ties with
last-writer-wins semantics. (Does anyone know?)
If it's just arbitrary, we could swap it around to make it
first-writer-wins. Then you could at least implement setNull with an
extra round-trip. (Which is awful, but doable.) First, do a round-trip
to the server inserting urls: [], and when that comes back (even if
someone else got there first), you can insert into the list.
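Here is a rough sketch of why the tie-break matters for the round-trip approach, assuming a hypothetical toy encoding ({p, set} and {p, li}) and made-up helpers (applyComponent, transformSet, serverReceive). Client A inserts urls: [], waits for the ack, then pushes its URL; client B's concurrent urls: [] reaches the server only after both of A's ops. Under last-writer-wins B's late insert replaces the list and A's URL disappears; under first-writer-wins B's insert is dropped and the URL survives.

```javascript
// Hypothetical toy encoding, not the real JSON OT type. Components:
//   {p: ['urls'], set: v}      assign top-level field 'urls' = v
//   {p: ['urls', i], li: v}    insert v at index i of the list in 'urls'

function applyComponent(doc, c) {
  const out = { ...doc };
  if ('set' in c) {
    out[c.p[0]] = c.set;
  } else {
    const list = out[c.p[0]].slice();
    list.splice(c.p[1], 0, c.li);
    out[c.p[0]] = list;
  }
  return out;
}

// Transform a `set` component `c` past an already-applied component `o`.
// side === 'right' means `c` reached the server after `o`. Returns null if `c` is dropped.
function transformSet(c, o, side, policy) {
  if (!('set' in o) || o.p[0] !== c.p[0]) return c; // list inserts don't affect a set
  const cWins = policy === 'last-writer-wins' ? side === 'right' : side === 'left';
  return cWins ? c : null;
}

// The server receives `incoming`, which was generated before every op in `history`.
function serverReceive(doc, history, incoming, policy) {
  let c = incoming;
  for (const o of history) {
    if (c === null) break;
    c = transformSet(c, o, 'right', policy);
  }
  return c === null ? doc : applyComponent(doc, c);
}

const base = { handle: 'nornagon' };
const aSet = { p: ['urls'], set: [] };                // round-trip 1: insert urls: []
const aPush = { p: ['urls', 0], li: 'nornagon.net' }; // sent only after aSet was acked
const bSet = { p: ['urls'], set: [] };                // B's concurrent insert, arrives late

const afterA = applyComponent(applyComponent(base, aSet), aPush);
const lww = serverReceive(afterA, [aSet, aPush], bSet, 'last-writer-wins');
const fww = serverReceive(afterA, [aSet, aPush], bSet, 'first-writer-wins');
console.log(lww.urls); // [] : B's late insert clobbered A's URL
console.log(fww.urls); // [ 'nornagon.net' ]
```

After its own ack arrives, B would push its URL the same way A did.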
Option 2: We add an alternate insert to the new OT type. So, we'd be
able to say something like:
(path: 'urls', insert if null: []), (path: 'urls.0', insert 'nornagon.net')
The simplest way to make the semantics of this consistent is that if
you both setNull different values, one person wins and the other
person's edits get deleted. Compose will have to be a bit tricky, but
we'll cope.
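A minimal sketch of how transform might treat the insert-if-null component, in a hypothetical toy encoding ({p, ifNull}, {p, at, li}, plus a {p, set} that only transform produces). It assumes the field is missing in the base document and that each op carries at most one list insert; the case where the field already existed (so both ifNulls were no-ops) is deliberately left out. When both ops supply the same default, both list inserts survive; when the defaults differ, this sketch arbitrarily lets the op the server ordered first win and deletes the other op's edits.

```javascript
// Hypothetical toy encoding for "insert if null", not a real OT library. An op is a list of:
//   {p: 'urls', ifNull: v}     set field 'urls' to v only if it is currently missing
//   {p: 'urls', set: v}        unconditionally replace 'urls' (only produced by transform)
//   {p: 'urls', at: i, li: v}  insert v at index i of the list in 'urls'
// Assumes the field is missing in the base document and each op has at most one list insert.

// Structural equality for JSON-like values (key order matters; fine for lists and primitives).
const sameValue = (a, b) => JSON.stringify(a) === JSON.stringify(b);

function apply(doc, op) {
  const out = { ...doc };
  for (const c of op) {
    if ('ifNull' in c) {
      if (out[c.p] == null) out[c.p] = c.ifNull;
    } else if ('set' in c) {
      out[c.p] = c.set;
    } else {
      const list = out[c.p].slice();
      list.splice(c.at, 0, c.li);
      out[c.p] = list;
    }
  }
  return out;
}

// Transform `op` to apply after `other`; side === 'left' means `op` was ordered first.
function transform(op, other, side) {
  const mine = op.find(c => 'ifNull' in c);
  const theirs = mine && other.find(c => 'ifNull' in c && c.p === mine.p);
  if (!theirs) return op;
  if (!sameValue(mine.ifNull, theirs.ifNull)) {
    // Different defaults: the first op wins and the other op's edits are deleted.
    // The winner must overwrite the loser's default, so its ifNull becomes a set.
    return side === 'left' ? op.map(c => (c === mine ? { p: c.p, set: c.ifNull } : c)) : [];
  }
  // Same default: our ifNull is now a harmless no-op; shift our list insert past theirs.
  const theirInserts = other.filter(c => 'li' in c && c.p === mine.p);
  return op.map(c => {
    if (!('li' in c)) return c;
    let at = c.at;
    for (const o of theirInserts) if (o.at < at || (o.at === at && side === 'right')) at++;
    return { ...c, at };
  });
}

const base = { handle: 'nornagon' };
const a = [{ p: 'urls', ifNull: [] }, { p: 'urls', at: 0, li: 'nornagon.net' }];
const b = [{ p: 'urls', ifNull: [] }, { p: 'urls', at: 0, li: 'github.com/nornagon' }];
// Server orders a before b; each replica applies its own op first, then the transformed other.
const replicaA = apply(apply(base, a), transform(b, a, 'right'));
const replicaB = apply(apply(base, b), transform(a, b, 'left'));
console.log(replicaA.urls, replicaB.urls); // both [ 'nornagon.net', 'github.com/nornagon' ]
```

Under the same ordering, replacing b with [{ p: 'urls', ifNull: ['x'] }] leaves both replicas with ['nornagon.net']: a wins and b's edit is deleted.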
Option 3: We make insert more clever...
We could make it so that if two people insert the same thing at the
same place, neither person's edit gets overwritten.
So:
user.urls = ['nornagon.net']
vs
user.urls = ['github.com/nornagon']
would result in one URL winning and the other losing,
but:
user.urls = []; user.urls[0] = 'nornagon.net';
vs
user.urls = []; user.urls[0] = 'github.com/nornagon';
would result in both URLs (in some arbitrary, consistent order).
To make this work, transform would check when two inserts collide to
see if they're inserting the same value. If they are, we just let them
'both win', as it were.
It's sort of a quirk of the way operations are written that those are
both independently expressible. I was about to make one of them
invalid when I realised the possibility.
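The collision check for option 3 might look roughly like this. The encoding ({p, oi} for inserting a field, {p, at, li} for a list insert), apply, and transform are hypothetical toys; the sketch assumes each op begins by inserting the field it edits and carries at most one list insert. Differing values keep the current last-writer-wins behaviour, while identical inserts 'both win': the later op skips its redundant insert, which would otherwise wipe the other op's list edits, and only its list insert is shifted.

```javascript
// Hypothetical toy encoding, not the real JSON OT type. An op is a list of:
//   {p: 'urls', oi: v}         insert (assign) field 'urls' = v
//   {p: 'urls', at: i, li: v}  insert v at index i of the list in 'urls'
// Assumes each op starts by inserting the field it edits and has at most one list insert.

// Structural equality for JSON-like values (key order matters; fine for lists and primitives).
const sameValue = (a, b) => JSON.stringify(a) === JSON.stringify(b);

function apply(doc, op) {
  const out = { ...doc };
  for (const c of op) {
    if ('oi' in c) {
      out[c.p] = c.oi;
    } else {
      const list = out[c.p].slice();
      list.splice(c.at, 0, c.li);
      out[c.p] = list;
    }
  }
  return out;
}

// Transform `op` to apply after `other`; side === 'left' means `op` was ordered first.
function transform(op, other, side) {
  const mine = op.find(c => 'oi' in c);
  const theirs = mine && other.find(c => 'oi' in c && c.p === mine.p);
  if (!theirs) return op;
  if (!sameValue(mine.oi, theirs.oi)) {
    // Values differ: keep last-writer-wins, so the op ordered first is dropped entirely.
    return side === 'left' ? [] : op;
  }
  // Identical inserts "both win": drop our redundant insert and shift our list
  // insert past theirs, breaking index ties by side.
  const theirInserts = other.filter(c => 'li' in c && c.p === mine.p);
  return op.filter(c => c !== mine).map(c => {
    if (!('li' in c)) return c;
    let at = c.at;
    for (const o of theirInserts) if (o.at < at || (o.at === at && side === 'right')) at++;
    return { ...c, at };
  });
}

// Final `urls` on both replicas when the server orders `a` before `b`.
const base = { handle: 'nornagon' };
const run = (a, b) => [
  apply(apply(base, a), transform(b, a, 'right')).urls,
  apply(apply(base, b), transform(a, b, 'left')).urls
];

const conflicting = run(
  [{ p: 'urls', oi: ['nornagon.net'] }],
  [{ p: 'urls', oi: ['github.com/nornagon'] }]
);
const merged = run(
  [{ p: 'urls', oi: [] }, { p: 'urls', at: 0, li: 'nornagon.net' }],
  [{ p: 'urls', oi: [] }, { p: 'urls', at: 0, li: 'github.com/nornagon' }]
);
console.log(conflicting); // both replicas: [ 'github.com/nornagon' ]
console.log(merged);      // both replicas: [ 'nornagon.net', 'github.com/nornagon' ]
```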
Option 4: We solve the problem in a different layer - in particular in livedb.
So, basically the problem we're trying to solve here is that of data
schemas. Another (maybe better) way to solve this problem is to make
every collection have an (optional) schema / default value defining
what fields are in the document.
The schema could even be as simple as a version number + default value
for the document.
So, user schema V1 is {}
user schema V2 is {urls:[]}
... Then we define an operation for converting documents from schema
V1 to V2. In this case, that's (insert urls: []).
When the client code is updated to care about a URL list, it requests
the document at schema V2, and livedb automatically migrates the data
between versions before handing it to the client. Then the
clients can just insert the new URLs and everything will be peachy.
I'm not sure what clients with old code are supposed to do. What
happens if you request the document at V1? Does it error? If so,
you have to restart all your frontend servers at exactly the same
time. I guess this is a problem anyway.
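A rough sketch of the schema idea, assuming a hypothetical userSchema table and hypothetical fetchAtSchema and createDoc helpers (livedb has no such API). Version 1 holds the default document and each later version holds the op that migrates from the previous one. Requesting an older version than the stored one simply throws here, which is one possible answer to the open question above, not a decision.

```javascript
// Hypothetical sketch of per-collection schemas; livedb has no such API.
// Versions are listed in ascending order. Version 1 carries the default document;
// each later version carries the op that migrates a document from the previous one.
const userSchema = [
  { version: 1, defaultValue: {} },
  { version: 2, migration: [{ p: 'urls', oi: [] }] } // V1 -> V2: insert urls: []
];

// Deep copy for JSON-like values, so documents never share the schema's objects.
const copy = v => JSON.parse(JSON.stringify(v));

// Apply a migration op made of top-level field inserts.
function applyOp(doc, op) {
  const out = { ...doc };
  for (const c of op) out[c.p] = copy(c.oi);
  return out;
}

// Upgrade a stored snapshot to the schema version the client's code expects.
function fetchAtSchema(schema, snapshot, wanted) {
  if (wanted < snapshot.schemaVersion) {
    // The post leaves open what old clients should get; this sketch just refuses.
    throw new Error(`document is at v${snapshot.schemaVersion}, client asked for v${wanted}`);
  }
  let { schemaVersion, data } = snapshot;
  for (const step of schema) {
    if (step.version > schemaVersion && step.version <= wanted) {
      data = applyOp(data, step.migration);
      schemaVersion = step.version;
    }
  }
  if (schemaVersion !== wanted) throw new Error(`unknown schema version v${wanted}`);
  return { schemaVersion, data };
}

// New documents start from the V1 default and are migrated like any other.
function createDoc(schema) {
  const first = schema[0];
  const latest = schema[schema.length - 1].version;
  return fetchAtSchema(schema, { schemaVersion: first.version, data: copy(first.defaultValue) }, latest);
}

const stored = { schemaVersion: 1, data: { handle: 'nornagon' } };
const v2 = fetchAtSchema(userSchema, stored, 2);
console.log(v2.data); // { handle: 'nornagon', urls: [] }
```

Once the client has the V2 document, pushing URLs into urls is an ordinary list insert, so no setNull is needed.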
---
I'm leaning toward ditching an explicit setNull and implementing
options 3 and 4. Option 3 is the quick-and-dirty way for people who are
into that sort of thing, and option 4 will satisfy people who care more
about correctness.
(Oh, and I won't get around to adding schemas to livedb anytime soon.
We'll need to do some more design work before writing that code.)
What do you all think? Are there any other use cases for setNull that
you care about? (Fiddling with insert is strictly less powerful than
having a proper setNull operator.)
-J