On setNull and schemas

Joseph Gentle

Aug 29, 2015, 2:30:41 AM

to sha...@googlegroups.com, derbyjs

I've been working on the new JSON type and thinking about setNull.

Derby currently has a setNull function which lets you set a property
to some value, but only if it's currently null. This is super useful
when your document is missing a field and you want to insert
something into it.

Eg, imagine I'm storing my friend Jeremy in my database:

{ handle: 'nornagon' }

and I want to add a list of URLs to Jeremy's webpages - so I want:

{
handle: 'nornagon',
urls: ['nornagon.net']
}

Unfortunately, I can't do this in a clean way with OT. If I insert
urls: ['nornagon.net'] when someone else inserts urls:
['github.com/nornagon'] we'll end up with only one of our two URLs.
The other one will be removed! (aak!)
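A minimal sketch of that lost update, assuming a toy last-writer-wins insert. The applyInsert helper and the op shape are made up for illustration and are not the JSON OT API:

```javascript
// Hypothetical stand-in for a last-writer-wins object insert (not the real OT type).
function applyInsert(doc, key, value) {
  // Return a new document with `key` set, overwriting any existing value.
  return Object.assign({}, doc, { [key]: value });
}

const base = { handle: 'nornagon' };

// Two clients concurrently insert their own `urls` list.
const opA = { key: 'urls', value: ['nornagon.net'] };
const opB = { key: 'urls', value: ['github.com/nornagon'] };

// The server applies them in some order; the later one overwrites the earlier.
let serverDoc = applyInsert(base, opA.key, opA.value);
serverDoc = applyInsert(serverDoc, opB.key, opB.value);

console.log(serverDoc.urls); // [ 'github.com/nornagon' ]
```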

So we want to be able to say:
user.setNull('urls', []); user.urls.push('nornagon.net');
... And have that come out alright.
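For reference, the local (single-client) behaviour being asked for looks roughly like this. The setNull function here is a hypothetical helper on a plain object, not Derby's implementation, and it says nothing about concurrency:

```javascript
// Hypothetical local helper mirroring the described behaviour.
function setNull(obj, key, value) {
  // `== null` matches both null and undefined (a missing field).
  if (obj[key] == null) obj[key] = value;
}

const user = { handle: 'nornagon' };
setNull(user, 'urls', []);
user.urls.push('nornagon.net');

// A second call leaves the existing list alone.
setNull(user, 'urls', []);
user.urls.push('github.com/nornagon');

console.log(user.urls); // [ 'nornagon.net', 'github.com/nornagon' ]
```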

Because the JSON OT code is last-writer-wins, there's no real way to
do this right now. We've been punting on the problem until the new
JSON OT type which will magically fix everything, and that time is
here.

I have four possible solutions, and I'd like some feedback on which
one(s) to implement:

Option 1: Make the normal insert / move first-writer-wins instead of
last-writer-wins. I can't remember why we currently break ties with
last-writer-wins semantics. (Does anyone know?)

If it's just arbitrary, we could swap it around to make it
first-writer-wins. Then you could at least implement setNull with an
extra round-trip. (Which is awful, but doable.) First do a round-trip
to the server inserting urls:[] and when that comes back (even if
someone else got there first) you can insert into the list.
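A rough simulation of that round-trip, assuming a toy in-memory server. ToyServer, insertKey and insertAt are hypothetical names, not livedb or ShareJS calls:

```javascript
// Toy in-memory "server"; all names here are hypothetical.
class ToyServer {
  constructor(doc) {
    this.doc = doc;
  }

  // First-writer-wins object insert: keep the existing value if the key is taken.
  insertKey(key, value) {
    if (!(key in this.doc)) this.doc[key] = value;
  }

  // List insert into a list that already exists.
  insertAt(key, index, value) {
    this.doc[key].splice(index, 0, value);
  }
}

const server = new ToyServer({ handle: 'nornagon' });

// Both clients first send the empty-list insert...
server.insertKey('urls', []);
server.insertKey('urls', []); // loses the tie, which is harmless here
// ...and only after that round-trip do they insert into the list.
server.insertAt('urls', 0, 'nornagon.net');
server.insertAt('urls', 0, 'github.com/nornagon');

console.log(server.doc.urls); // [ 'github.com/nornagon', 'nornagon.net' ]
```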

Option 2: We add an alternate insert to the new OT type. So, we'd be
able to say something like:
(path: 'urls', insert if null: []), (path:'urls.0', insert 'nornagon.net')

The simplest way to make the semantics of this consistent is that if
you both setNull different values, one person wins and the other
person's edits get deleted. Compose will have to be a bit tricky, but
we'll cope.
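Something like the following could apply such an op on one machine. The component shape ({ path, insertIfNull } and { path, insert }) is invented for this sketch and is not the new type's actual format; transform and compose are not covered:

```javascript
// Deep copy for plain JSON values.
const clone = (v) => JSON.parse(JSON.stringify(v));

// Hypothetical component shapes:
//   { path: [...], insertIfNull: value }  - only applies if the target is null/missing
//   { path: [...], insert: value }        - list insert or plain object set
function applyOp(doc, components) {
  const result = clone(doc);
  for (const c of components) {
    // Walk to the parent container of the target.
    let parent = result;
    for (const key of c.path.slice(0, -1)) parent = parent[key];
    const last = c.path[c.path.length - 1];

    if ('insertIfNull' in c) {
      if (parent[last] == null) parent[last] = clone(c.insertIfNull);
    } else if (Array.isArray(parent)) {
      parent.splice(last, 0, clone(c.insert));
    } else {
      parent[last] = clone(c.insert);
    }
  }
  return result;
}

const op = [
  { path: ['urls'], insertIfNull: [] },
  { path: ['urls', 0], insert: 'nornagon.net' },
];

const fresh = applyOp({ handle: 'nornagon' }, op);
const existing = applyOp({ handle: 'nornagon', urls: ['github.com/nornagon'] }, op);

console.log(fresh.urls);    // [ 'nornagon.net' ]
console.log(existing.urls); // [ 'nornagon.net', 'github.com/nornagon' ]
```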

Option 3: We make insert more clever...

We could make it so that if two people insert the same thing at the
same place, neither person's edit gets overwritten.

So:
user.urls = ['nornagon.net']
vs
user.urls = ['github.com/nornagon']
would result in one url winning and the other losing

but:
user.urls = []; user.urls[0] = 'nornagon.net';
vs
user.urls = []; user.urls[0] = 'github.com/nornagon';
would result in both URLs (in some arbitrary, consistent order)

To make this work, transform would check when two inserts collide to
see if they're inserting the same value. If they are, we just let them
'both win' as it were.
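A very simplified sketch of that transform rule. The op format, the helper names, and the assumption that a numeric last path element means a list insert are all made up here; JSON.stringify is used as a crude (key-order-sensitive) equality check:

```javascript
const clone = (v) => JSON.parse(JSON.stringify(v));
const sameJson = (a, b) => JSON.stringify(a) === JSON.stringify(b);
const isPrefix = (prefix, path) => prefix.every((k, i) => path[i] === k);

// Transform `op` so it can be applied after `winner` has already been applied.
function transform(op, winner) {
  const lostPaths = [];
  const out = [];
  for (const c of op) {
    // Drop edits inside a subtree this op already lost.
    if (lostPaths.some((p) => isPrefix(p, c.path))) continue;
    const last = c.path[c.path.length - 1];
    // Only object-key inserts can collide; list inserts never overwrite each other.
    const rival = typeof last === 'string'
      ? winner.find((w) => sameJson(w.path, c.path))
      : undefined;
    if (!rival) {
      out.push(c);
    } else if (!sameJson(rival.insert, c.insert)) {
      lostPaths.push(c.path); // different values: this op loses the key
    }
    // else: identical value is already there, so drop the insert but keep children
  }
  return out;
}

function apply(doc, op) {
  const result = clone(doc);
  for (const c of op) {
    let parent = result;
    for (const k of c.path.slice(0, -1)) parent = parent[k];
    const last = c.path[c.path.length - 1];
    if (Array.isArray(parent)) parent.splice(last, 0, clone(c.insert));
    else parent[last] = clone(c.insert);
  }
  return result;
}

const base = { handle: 'nornagon' };

// Case 1: both insert a non-empty list, so one wins.
const a1 = [{ path: ['urls'], insert: ['nornagon.net'] }];
const b1 = [{ path: ['urls'], insert: ['github.com/nornagon'] }];
const oneWins = apply(apply(base, a1), transform(b1, a1));

// Case 2: both insert [] and then push, so both survive.
const a2 = [{ path: ['urls'], insert: [] }, { path: ['urls', 0], insert: 'nornagon.net' }];
const b2 = [{ path: ['urls'], insert: [] }, { path: ['urls', 0], insert: 'github.com/nornagon' }];
const bothWin = apply(apply(base, a2), transform(b2, a2));

console.log(oneWins.urls); // [ 'nornagon.net' ]
console.log(bothWin.urls); // [ 'github.com/nornagon', 'nornagon.net' ]
```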

It's sort of a quirk of the way operations are written that those are
both independently expressible. I was about to make one of them
invalid when I realised the possibility.

Option 4: We solve the problem in a different layer - in particular in livedb.

So, basically the problem we're trying to solve here is that of data
schemas. Another (maybe better) way to solve this problem is to make
every collection have an (optional) schema / default value defining
what fields are in the document.

The schema could even be as simple as a version number + default value
for the document.
So, user schema V1 is {}
user schema V2 is {urls:[]}

... Then we define an operation for converting documents from schema
V1 to V2. In this case, (insert urls:[])

When the client code is updated to care about a url list, it requests
the document at schema v2 and livedb automatically migrates the data
between versions before handing the data to the client. Then the
clients can just insert the new URLs in and everything will be peachy.
I'm not sure what clients with old code are supposed to do - what
happens if you request the document at V1? Does it error? That means
you have to restart all your frontend servers at exactly the same
time. I guess this is a problem anyway.
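As a sketch of what migrate-on-fetch might look like: userSchemas, fetchAtVersion and the stored snapshot shape are all hypothetical, since none of this exists in livedb, and requesting an older version simply throws here:

```javascript
// Hypothetical schema table: each version has a default document and a
// migration that upgrades a document from the previous version.
const userSchemas = {
  1: { defaultValue: {} },
  2: {
    defaultValue: { urls: [] },
    // Stand-in for the op (insert urls:[]); skipped if the field already exists.
    migrate: (doc) => ('urls' in doc ? doc : Object.assign({}, doc, { urls: [] })),
  },
};

// Upgrade a stored snapshot step by step until it reaches the requested version.
function fetchAtVersion(stored, wantedVersion) {
  if (wantedVersion < stored.schemaVersion) {
    throw new Error('downgrading is not handled in this sketch');
  }
  let data = stored.data;
  for (let v = stored.schemaVersion + 1; v <= wantedVersion; v++) {
    data = userSchemas[v].migrate(data);
  }
  return { schemaVersion: wantedVersion, data };
}

const stored = { schemaVersion: 1, data: { handle: 'nornagon' } };
const upgraded = fetchAtVersion(stored, 2);

// Client code written against V2 can now insert without any setNull.
upgraded.data.urls.push('nornagon.net');
console.log(upgraded.data); // { handle: 'nornagon', urls: [ 'nornagon.net' ] }
```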

---

I'm leaning toward ditching an explicit setNull and implementing
options 3 and 4. Option 3 is the ghetto hacky way for people who are
into that sort of thing and option 4 will satisfy people who care more
about correctness.

(Oh, and I won't get around to adding schemas to livedb anytime soon.
We'll need to do some more design work before writing that code.)

What do you all think? Are there any other use cases for setNull that
you care about? (fiddling with insert is strictly less powerful than
having a proper setnull operator).

-J

Joseph Gentle

Aug 29, 2015, 8:13:28 PM

to sha...@googlegroups.com, derbyjs

Not solving the problem is basically where I was going with option 4.
Which is to say, we punt the problem to livedb and then it's not a
problem that needs fancy OT semantics.

I've thought about making insert just automatically create parent
objects before, but I think it's a bad idea - I think there should be
some kind of explicit intentionality about it. If I rename a field
across my database and forget to change one piece of code somewhere,
I'd rather that code crashes than it just recreate the old field every
time it runs. Those bugs are the worst. (Also I think there was some
quite good OT-related reason why automatically making objects would be
bad, but I can't remember what it is now)
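To illustrate the renamed-field case: with a strict insert (the insertStrict helper below is hypothetical), stale code fails loudly instead of quietly recreating the old field:

```javascript
// Hypothetical strict list insert: refuses to invent missing parents.
function insertStrict(doc, key, index, value) {
  if (!Array.isArray(doc[key])) {
    throw new TypeError(`cannot insert into missing list '${key}'`);
  }
  doc[key].splice(index, 0, value);
}

// The field was renamed from `urls` to `links` across the database.
const user = { handle: 'nornagon', links: [] };

let message = null;
try {
  insertStrict(user, 'urls', 0, 'nornagon.net'); // stale code still uses the old name
} catch (err) {
  message = err.message;
}

console.log(message);        // cannot insert into missing list 'urls'
console.log('urls' in user); // false
```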

-J

On Sun, Aug 30, 2015 at 4:30 AM, Jeremy Apthorp <norn...@nornagon.net> wrote:
> Is it an option to simply not solve this problem? Create the user blob with
> the 'urls' field already in place :) then upgrading the schema looks like
> {set "urls", []} for all users. I guess clients might have an old version
> and try to create users without that field though. Hm.
>
> Another option would be to define "insert" on undefined to treat it as an
> empty list, same way "add" treats undefined as 0.

>> --
>> You received this message because you are subscribed to the Google Groups
>> "ShareJS" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to sharejs+u...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

Carl-Johan Blomqvist

Sep 2, 2015, 11:13:31 AM

to Derby

My 5 cents for the OT part:
I'm not sure I'm following you properly. If you do setNull and then push on two clients simultaneously, would that still cause the issue? If not, then it seems there only needs to be some kind of convenience fn for calling both in one (which seems like something that belongs at the Racer level). If it does, then it seems like it should be solved in the OT layer, at least eventually, since one should be able to add items to an existing or non-existing array without a schema.

Here, I'm a little bit lost. It seems to me that .push (insert) is not working as desired if two clients try to push two elements simultaneously and only one gets added. I.e. it seems to me that if these are the operations:
<client1: setnull-op>
<client1: insert-op>
<client2: setnull-op>
<client2: insert-op>

Then it seems like one of the setNulls should be discarded automatically (the second one), which would make us have the following list of ops:
<client1: setnull-op>
<client1: insert-op>
<client2: insert-op>

Which should work right now, right? If that doesn't work, what happens in normal situations where there's a long time delay between client1's and client2's actions (i.e. when two clients push items to an array, generally speaking)? Or am I missing something? Maybe I'm just unaware of the OT code and my assumptions are completely off - but from a user perspective it seems like both should be inserted. It should however probably be possible to re-set the whole array, and then it's a different question. Generally, it seems like option 3 is favourable.

My 5 cents for schemas:
To implement schemas at the livedb-layer would be pretty nice. In Derby (or more specifically Racer), it's possible to have schemas but hard to make them very useful. From a Derby-perspective, there are three things which schemas can add:
1. Enforce only certain data gets added to the actual DB/server-side
- Existing solutions for this work OK and it's pretty easy to add support for this. However, this should be a last resort since it does not necessarily make for good UX. Further, currently it doesn't really work to revert operations in the client afterwards - at least not reliably (although it sounds like support for this can be more easily added with the new OT code if each op has an inverse op).
2. Hook up certain model paths for schema validation (fields as well as more complicated objects).
- There are a lot of things to think about for this and a lot of different options would probably be needed (such as whether the schema should enforce it, just throw errors, or put all errors at another path, etcetera). Generally, it seems this topic is only partly related to this discussion so I'll skip it.
3. Have default values.
- This seems also to be quite difficult, or at least quite impractical, to implement at a Derby/Racer-level since neither really touches the data when transmitted to the server (please correct me if I'm wrong) - it seems it goes more or less straight to Share/LiveDB. Thus, it needs to be added somewhere in that part of the stack. Currently, one could possibly do it in the hooks of Share. However, this does not fare well with the client, since the client just expects the operation to have gone through (not have been modified). Possibly, this could be fixed in Racer by comparing the actual data to the data returned by Share after an operation has been applied (i.e. when any default values would have been set), but it seems we're then also blurring the lines of what an operation actually means (since it can also add default values).

Generally, adding schemas to the stack would be really really neat and might be one of the largest missing pieces of the Derby-ecosystem currently.

Joseph Gentle

Sep 14, 2015, 5:47:03 PM

to Dmitry, derbyjs, sha...@googlegroups.com

Yeah, that's what I'm implementing!

On 15 Sep 2015 3:49 am, "Dmitry" <uvar...@gmail.com> wrote:

If you remember, we had this discussion for json0 and I think my pull request is still hanging somewhere :)

I was trying to achieve a solution for creating an editable object on the client who connects first, keeping in mind possible concurrency. In my case, I need to insert a child Rich Text object into the parent JSON to start working on it. For example:
initial: {}
client1: { doc: {--rich text object format--} }, client 1 does some edits here
client2 (concurrently): { doc: {--rich text object format--} }, client 2 does some edits here

expected result:
{doc: {client1 edits + client2 edits} }

so here neither "first wins" nor "last wins" works.
