It's time to redesign ShareJS.

Joseph Gentle

Mar 18, 2013, 10:03:49 PM
to sha...@googlegroups.com
A month ago I got hired by Lever and moved over to San Francisco.
We're building an applicant tracking system for hiring. It's a realtime
web app built on top of [Derby](http://derbyjs.com/) and
[Racer](http://racerjs.com/), which is a web framework written by the
company's cofounders.

Racer doesn't do proper OT and it doesn't scale. Over the next few
months, I'm going to refactor and rewrite big chunks of ShareJS so we
can use it underneath racer to keep data in sync between the browser
and our servers. I'm going to refactor ShareJS into a few modules
(long overdue), add live queries to ShareJS and make the database
layer support scaling.

I want feedback on this before I start. I will break things, but I
think it's worth it in the long term.

So, without further ado, here's the master plan:

https://dl.dropbox.com/u/2494815/redesign%20architecture.png
(I've attached the same diagram at the bottom of this email, but I'm
not sure it'll survive google groups.)


# Standardized OT Library

First, ShareJS's OT types are written to a simple API and don't depend
on any external services. I'm going to pull them out into their own
project, akin to [libOT](https://github.com/josephg/libot).

The types here should be super stable and fast, and preferably written
in multiple languages.

I considered adding some simple, reusable OT management code in there
too, but by the time I pared OT down until I had something reusable,
it was just a `for` loop.
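To give a sense of what that `for` loop amounts to, here is a minimal sketch. The insert-only `insertType` and the `transformAgainst` helper are made up for illustration and are not part of ShareJS or libOT; the only assumption is that a type exposes `apply(snapshot, op)` and `transform(op, otherOp, side)`.

```javascript
// Toy insert-only text type (hypothetical; not one of ShareJS's real types).
// An op is {pos, text}: insert `text` at character offset `pos`.
const insertType = {
  apply(snapshot, op) {
    if (op.pos < 0 || op.pos > snapshot.length) throw new Error('Invalid op');
    return snapshot.slice(0, op.pos) + op.text + snapshot.slice(op.pos);
  },
  // Rewrite `op` so it can be applied after `other` has already been applied.
  // `side` breaks ties when both ops insert at the same position.
  transform(op, other, side) {
    const shift = other.pos < op.pos || (other.pos === op.pos && side === 'right');
    return shift ? { pos: op.pos + other.text.length, text: op.text } : op;
  },
};

// The "management code": bring a stale op up to date by transforming it
// against every op committed since the version it was written at.
function transformAgainst(type, op, laterOps) {
  for (const committedOp of laterOps) {
    op = type.transform(op, committedOp, 'right');
  }
  return op;
}

const committed = [{ pos: 0, text: 'Hello ' }]; // already in the oplog
const stale = { pos: 0, text: 'world' };        // written against the empty doc
const fresh = transformAgainst(insertType, stale, committed);
const doc = insertType.apply(insertType.apply('', committed[0]), fresh);
console.log(doc); // Hello world
```

Everything type-specific lives in `transform`, which is why so little reusable management code is left over.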

I'm not sure where the text & JSON API wrappers should go. The
wrappers are generally useful, but not coded in a particularly
reusable way.

# Scalable database backend

Next, we need a scalable version of ShareJS's database code. I want to
pull out ShareJS's database code and make it support scaling the
server across multiple machines.

I also want to add:
- **Collections**: Documents will be scoped by collection. I expect
collections to map to SQL tables, mongodb collections or couchdb
databases. Collections seem to be a standard, useful thing.
- **Live queries**: I want to be able to issue a query saying "get me
all docs in the profiles collection with age > 50". The result set
should update in realtime as documents are added & removed from that
set. This should also work with paginated requests. I don't want to
invent my own query language - I'll just use whatever native format
the database uses. (SQL select statements, couchdb views, mongo find()
queries, etc).
- **Snapshot update hooks**: For example, I want to be able to issue a
query to a full-text search database (like SOLR) and reuse the same
live query mechanism. I imagine this working via a post-update hook
that the application can use to update SOLR. As a first pass, I'll
poll all outstanding queries against the database when documents are
updated, but I can optimise for certain common use cases down the
track.
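The first-pass strategy (re-polling every outstanding query whenever a document is updated) could look roughly like the sketch below. `LiveStore` and its methods are hypothetical names, the store is in memory, and a plain predicate function stands in for the database's native query format.

```javascript
// Hypothetical in-memory store; a predicate stands in for the native query.
class LiveStore {
  constructor() {
    this.docs = new Map(); // "collection/id" -> snapshot
    this.queries = [];     // outstanding live queries
  }

  // Register a live query. `onChange(added, removed)` receives doc keys.
  query(collection, predicate, onChange) {
    const q = { collection, predicate, onChange, results: new Set() };
    this.queries.push(q);
    this._poll(q);
    return q;
  }

  // Write a snapshot, then naively re-poll every outstanding query.
  set(collection, id, snapshot) {
    this.docs.set(`${collection}/${id}`, snapshot);
    for (const q of this.queries) this._poll(q);
  }

  // Recompute the result set and report only the difference.
  _poll(q) {
    const now = new Set();
    for (const [key, snapshot] of this.docs) {
      if (key.startsWith(q.collection + '/') && q.predicate(snapshot)) now.add(key);
    }
    const added = [...now].filter((k) => !q.results.has(k));
    const removed = [...q.results].filter((k) => !now.has(k));
    q.results = now;
    if (added.length || removed.length) q.onChange(added, removed);
  }
}

const store = new LiveStore();
const events = [];
store.query('profiles', (doc) => doc.age > 50, (added, removed) => {
  events.push({ added, removed });
});
store.set('profiles', 'ann', { age: 61 }); // enters the result set
store.set('profiles', 'bob', { age: 30 }); // never matches, so no event
store.set('profiles', 'ann', { age: 49 }); // leaves the result set
console.log(events);
```

Re-polling costs one full query per outstanding query per write, which is the part that would need optimising for common cases.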

I want to get the API here stable first and let the implementation
grow in complexity as we need it to be more scalable and reliable. At
first, this code will route all messages through a single redis
server. Later I want to set it up with a redis slave for automatic
failover and make the server shard between multiple DB instances using
consistent hashing of document IDs or something.

I'm nervous about how the DB code and the operational transform
algorithm will work. If the DB backend doesn't understand OT, the API
will have to be strongly tied to ShareJS's model code and harder to
reuse. But if I make it understand OT and subsume ShareJS's model
code, it makes the DB code much harder to adapt to work with other
databases (you'll need to rewrite all that code!). I really love the
state of [model.coffee](https://github.com/josephg/ShareJS/blob/master/src/server/model.coffee)
in ShareJS at the moment, though it took me 2 near complete rewrites
to get to that point.

I would also like to make a trivial in-memory implementation for
examples and for testing. Once I have two implementations and a test
suite, it should be possible to rewrite this layer on top of Hadoop or
AWS or whatever.


# ShareJS code

What's left for ShareJS?

ShareJS's primary responsibility is to let you access the OT database
in a web browser or nodejs client in a way that's secure & safe.

It will (still) have these components:
- Auth function for limiting reading & writing. I want to extend this
for JSON documents to make it easy to restrict / trim access to
certain parts of some documents.
- Session code to manage client sessions. All the protocol stuff that's
in [session.coffee](https://github.com/josephg/ShareJS/blob/master/src/server/session.coffee).
I want to rewrite / refactor this to use NodeJS's new streams.
- Presence, although this will require some rethinking to work with
the new database backend stuff.
- A simple API that lets you tell the server when it has a new client,
and pass messages for it. I'm sick of all the nonsense around
socket.io, browserchannel, sockjs, etc so I want to just make it the
user's problem. Again, this will use the new streams API. This also
makes it really easy for applications to send messages to their server
that don't have anything to do with OT.
- Equivalent connection code on the client, currently in
[client/connection.coffee](https://github.com/josephg/ShareJS/blob/master/src/client/connection.coffee).
- Client-side OT code, currently in
[client/doc.coffee](https://github.com/josephg/ShareJS/blob/master/src/client/doc.coffee).
- Build script to bundle up & minify the client and required OT types
for the browser. I want to rewrite this in Make. (Sorry windows
developers).
- Tests. It looks like nodeunit is no longer actively maintained, so
it might be time to port the tests to a different framework.
(Suggestions? What does everybody use these days?)

ShareJS has slowly become a grab bag of other stuff that I like. I'm
not sure whether all this stuff should stay in ShareJS or what.

There is:
- The examples. These will wire ShareJS up with browserchannel and
[express](http://expressjs.com/). The examples will add a few
dependencies that ShareJS won't otherwise have.
- The different database backends. Unless someone makes an adapter for
my new database code, these are all going to break. Sorry.
- Browser bindings for textareas, ace and codemirror.
- All the ongoing etherpad work. I met a bunch of etherpad & etherpad
lite developers at an event last week, and they were awesome. Super
happy this is happening.



---------------------------

# Thoughts

That's the gist of the redesign. Some thoughts:

> I hate making ShareJS more complicated, but at the same time I think it's important to make it actually useful. People need to scale their servers and they need to be able to build complex applications on top of all this stuff. I love how ShareJS's entire server is basically encapsulated in one file, and it'll be a pity to lose that.

> This change will break existing code. Sorry. The current DB adapters will break, and putting documents in collections will change APIs all the way through ShareJS.

> I'm still not entirely sure how this redesign will interact with my [C port of ShareJS](https://github.com/josephg/sharedb). Before I realised how integral ShareJS would be to my current work, I was intending to finish working on my C implementation next. For now, I guess that'll take a back seat. (In exchange, I'll be working on this stuff while at work, and not just on weekends.)

> This design allows some nice application features. For example, the auth stuff can much more easily enforce schemas for documents. You could enforce that everything in the 'code' collection has type 'text', everything in the 'projects' collection is JSON (with a particular structure) and items in the 'profiles' collection are only editable by the user who owns the profile. You could probably do that before, but it was a bit more subtle.
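A per-collection rule table of that kind might be sketched like this. None of these names (`rules`, `authorize`, the action strings) are ShareJS API; they are invented for illustration, and the sketch assumes a profile document is named after the user who owns it.

```javascript
// Hypothetical per-collection rules; none of these names are ShareJS API.
const rules = {
  code: { type: 'text' },
  projects: { type: 'json' },
  // Assumption: a profile document is named after its owner's user id.
  profiles: { type: 'json', canEdit: (user, docName) => user === docName },
};

// Decide whether `user` may create or edit `docName` in `collection`.
function authorize(user, action, collection, docName, docType) {
  const rule = rules[collection];
  if (!rule) return false; // unknown collection
  if (action === 'create' && docType !== rule.type) return false; // schema check
  if (action === 'edit' && rule.canEdit && !rule.canEdit(user, docName)) return false;
  return true;
}

console.log(authorize('ann', 'create', 'code', 'main.js', 'json')); // false
console.log(authorize('ann', 'edit', 'profiles', 'ann'));           // true
console.log(authorize('bob', 'edit', 'profiles', 'ann'));           // false
```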

> As I said above, I'm not sure where the line should be drawn between the DB project and the model. If they're two separate projects, they should have a very clear separation of concerns. I'm really trying to build a DB wrapper that provides the API that I want databases to provide directly, in a scalable way. However, that idea is entangled with the OT and presence functionality. What a mess.


----------


I want feedback on all this stuff. I know a lot of people are building
cool stuff with ShareJS, or want to. Do these plans make your lives
better or worse? Should we keep the current simple incarnation of
ShareJS around? If I'm taking the time to rip the guts out of ShareJS,
what else would you like to see changed? How do these ideas interact
with the etherpad interaction work?

Thanks!
-Joseph
redesign architecture.png

Collin Miller

Mar 19, 2013, 12:29:29 AM
to sha...@googlegroups.com
I think the biggest issues I had with ShareJS are programming the auth functions and the slowness of the json OT code.

The json OT code uses json stringify/parsing to dup objects and it didn't really take much data
to choke it (The comments suggest the possibility of removing it, but that's beyond my understanding).

My journey into the profiler always showed the object cloning to be the culprit here.

I'm actually on Firebase now, but would much rather be on an OT system and host my own database.

Have you played around with Firebase much? I've found some small joys in the client
API, not exactly sure it would all translate back to ShareJS, but it's some good stuff.





--
You received this message because you are subscribed to the Google Groups "ShareJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sharejs+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Joseph Gentle

Mar 19, 2013, 12:43:42 AM
to sha...@googlegroups.com
I know how to fix that.

The problem is that the model caches the document. If the server
receives an invalid op (something that makes the OT code throw an
exception) and I haven't cloned the snapshot first, the cached snapshot
will be left in an invalid state.

The easy fix is to just invalidate the cache when that happens. The
downside is that someone could DoS the server by submitting lots of
invalid ops.

The harder fix is using better preprocessing checks before trying to
apply an operation, though there's no way I'd catch every edge case
without hooking up a fuzzer.
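The cloning approach and the easy fix can be contrasted in a small sketch. The `listType`, the `cache` layout and both function names are hypothetical and much simpler than the real model code; the JSON round-trip clone mirrors what Collin describes above.

```javascript
// Hypothetical type whose apply() mutates the snapshot and can throw halfway.
const listType = {
  apply(snapshot, op) {
    for (const item of op) {
      if (typeof item !== 'string') throw new Error('Invalid op');
      snapshot.push(item); // items pushed before a failure stay pushed
    }
    return snapshot;
  },
};

const cache = new Map([['doc1', { v: 0, snapshot: ['a'] }]]);

// Approach 1: clone first, so a throwing op never touches the cached copy.
function applyWithClone(name, op) {
  const entry = cache.get(name);
  const copy = JSON.parse(JSON.stringify(entry.snapshot)); // the slow part
  entry.snapshot = listType.apply(copy, op);
  entry.v++;
}

// Approach 2 (the "easy fix"): mutate in place, drop the entry on failure.
function applyOrInvalidate(name, op) {
  const entry = cache.get(name);
  try {
    listType.apply(entry.snapshot, op);
    entry.v++;
  } catch (err) {
    cache.delete(name); // the next read has to reload from the database
    throw err;
  }
}

let rejected = 0;
try { applyWithClone('doc1', ['b', 42]); } catch (err) { rejected++; }
const afterClone = cache.get('doc1').snapshot.slice(); // still ['a']
try { applyOrInvalidate('doc1', ['b', 42]); } catch (err) { rejected++; }
const stillCached = cache.has('doc1'); // false: the entry was invalidated
console.log(rejected, afterClone, stillCached);
```

Each rejected op in the second approach forces a database reload, which is where the DoS concern comes from.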

Eventually I'd also like a JSON OT implementation in C to sit
alongside my text implementation, but it's not a priority for me.

-J

Wout Mertens

Mar 19, 2013, 2:02:34 AM
to sha...@googlegroups.com

First of all, Yey! :-)

Secondly, I'd rather optimize for the case without invalid ops and build in throttling for clients that send invalid ops.

Thirdly, promises are really great. They make asynchronous programming a lot less error prone. I really like the Q library at http://github.com/kriskowal/q . It would be awesome if share were to use promises.

I'm on my phone now, I'll respond later with my thoughts on the overall architecture.

Wout.

fmayot

Mar 19, 2013, 4:31:39 PM
to sha...@googlegroups.com
Just my 2 cents regarding your proposal:
  • Scalability: I'm not sure if scaling the DB layer will be sufficient. Looking at my own use case, I'd probably look at ways to scale the Session/OT layer too. I would be interested in your perspective though. That could I guess be handled outside of ShareJS entirely (e.g. having a load balancing layer that dispatches documents to different servers).
  • Presence: I'm probably missing something here but is there a strong reason to tie a presence system to ShareJS? Couldn't that be implemented separately using an XMPP infrastructure for instance? I think the question is more whether you want to build an OT platform or a collaboration platform.
  • Platform availability: It seems that there aren't a lot of solutions out there for OT native clients. Might be a good thing to make the design/protocols/etc. such that ports to C/Java/etc. are easily doable and efficient.
2 additional points regarding the positioning of ShareJS against other solutions:
  • Google has just released a Real-time API for Drive, which means that ShareJS probably won't be the solution for those willing to integrate with Drive
  • How can ShareJS fit into the PaaS picture? There's an example/script to deploy on Amazon but I'm wondering what will be needed to make deployment possible on any other PaaS provider. I'm also tempted to go back to my comment on scalability and reliability, which is a major selling point for PaaS providers.

Joseph Gentle

Mar 19, 2013, 5:08:43 PM
to sha...@googlegroups.com
On Tue, Mar 19, 2013 at 1:31 PM, fmayot <fma...@gmail.com> wrote:
> Just my 2 cents regarding your proposal:
>
> Scalability: I'm not sure if scaling the DB layer will be sufficient.
> Looking at my own use case, I'd probably look at ways to scale the
> Session/OT layer too. I would be interested in your perspective though. That
> could I guess be handled outside of ShareJS entirely (e.g. having a load
> balancing layer that dispatches documents to different servers).

It will be, though you'll need a load balancer as well.

Basically the work will allow you to spin up a bunch of ShareJS
instances which all connect to the same database cluster. A client can
connect to any sharejs instance and do the same things. So, you can
use nodejs clusters & classical load balancers in front of ShareJS to
scale out.

> Presence: I'm probably missing something here but is there a strong reason
> to tie a presence system to ShareJS? Couldn't that be implemented separately
> using an XMPP infrastructure for instance? I think the question is more if
> you want to build a OT platform or a Collaboration platform

You could do presence through a separate system, but cursors basically
have to be done inline in ShareJS. If I type, your cursor should move
on my screen. In a more complicated example, if I were to make a
spreadsheet using the JSON API, if you delete a column, my cursor
should move correctly.

I guess you could build a system around Sharejs, but it's going to have
to know some of Sharejs's internals to work properly. It's easier done
inside Share.

> Platform availability: It seems that there isn't a lot of solutions out
> there for OT native clients. Might be a good thing to make the
> design/protocols/etc. such that ports to C/Java/etc. are easily doable and
> efficient.

I agree! I've wanted that for ages. I actually just got back from
lunch with the Simperium guys who have been working on similar stuff
natively on iOS.

The new server's connection layer will expose a NodeJS 0.10 object
stream for each client. If you add framing, you can pipe that straight
into a TCP stream or whatever. The current protocol is documented
here: https://github.com/josephg/ShareJS/wiki/Wire-Protocol . It'll
need to change to support collections & queries. I'll update it when I
figure out the new protocol.
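As a rough illustration of what "adding framing" could mean, the sketch below uses newline-delimited JSON. This is an assumption, not the ShareJS wire protocol: `encodeFrame`, `FrameDecoder` and the message shapes are invented for the example, and chunks are assumed to arrive as strings (for instance from a socket with a UTF-8 encoding set).

```javascript
// Encode one message as a single newline-terminated JSON line.
// JSON.stringify escapes newlines inside strings, so '\n' is a safe delimiter.
function encodeFrame(message) {
  return JSON.stringify(message) + '\n';
}

// Stateful decoder: feed it arbitrary string chunks, get back whole messages.
class FrameDecoder {
  constructor() {
    this.buffer = '';
  }
  feed(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop(); // trailing piece is an incomplete frame (or '')
    return lines.filter((line) => line.length > 0).map((line) => JSON.parse(line));
  }
}

// Two made-up messages, delivered in chunks that split the first frame.
const wire = encodeFrame({ doc: 'hello', open: true }) + encodeFrame({ v: 1, op: [] });
const decoder = new FrameDecoder();
const first = decoder.feed(wire.slice(0, 10)); // partial frame: nothing yet
const rest = decoder.feed(wire.slice(10));     // completes both messages
console.log(first, rest);
```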

> 2 additional points regarding the positioning of ShareJS against other
> solutions:
>
> Google has just released a Real-time API for Drive, which means that ShareJS
> probably won't be the solution for those willing to integrate with Drive

I heard about that a couple hours ago. I'll read up on it in a minute...

> How can ShareJS fit into the PaaS picture? There's an example/script to
> deploy on Amazon but I'm wondering what will be needed to make deployment
> possible on any other PaaS provider. I'm also tempted to go back to my
> comment on scalability and reliability, which is a major selling point for
> PaaS providers.

I honestly don't know. Do you have experience with PaaS providers?
What do we need to do to make it easy?

-J

Ted Young

Mar 20, 2013, 1:25:57 AM
to sha...@googlegroups.com
Hi Joseph!

We met at the etherpad meetup. I'm planning on adding sharejs support to ethersheet, so I'm happy to see you're going to town on it. In particular:

- Breaking things up into multiple modules
- Switching to node.js standard conventions, such as err first etc
- Using streams, and in general moving towards a "plays well with others" as opposed to "kitchen sink" style

I'm confident you'll get scalability working, and I look forwards to lending a hand when I switch to working on ethersheet full-time.

Cheers,

Ted

fmayot

Mar 20, 2013, 2:01:44 PM
to sha...@googlegroups.com

Thanks Joseph. A few comments below

On Tuesday, March 19, 2013 2:08:43 PM UTC-7, Joseph Gentle wrote:
> On Tue, Mar 19, 2013 at 1:31 PM, fmayot <fma...@gmail.com> wrote:
>> Scalability: I'm not sure if scaling the DB layer will be sufficient.
>> Looking at my own use case, I'd probably look at ways to scale the
>> Session/OT layer too. I would be interested in your perspective though. That
>> could I guess be handled outside of ShareJS entirely (e.g. having a load
>> balancing layer that dispatches documents to different servers).
>
> It will be, though you'll need a load balancer as well.
>
> Basically the work will allow you to spin up a bunch of ShareJS
> instances which all connect to the same database cluster. A client can
> connect to any sharejs instance and do the same things. So, you can
> use nodejs clusters & classical load balancers in front of ShareJS to
> scale out.

Not sure I get that part... 2 clients accessing the same ShareJS document should be connected to the same ShareJS instance or am I missing something? If that's the case, then you need to use some kind of custom load balancing of the documents between ShareJS instances. This seems doable but not as trivial as putting a standard SW or HW load balancer as an entry point.

>> Platform availability: It seems that there isn't a lot of solutions out
>> there for OT native clients. Might be a good thing to make the
>> design/protocols/etc. such that ports to C/Java/etc. are easily doable and
>> efficient.
>
> I agree! I've wanted that for ages. I actually just got back from
> lunch with the Simperium guys who have been working on similar stuff
> natively on iOS.
>
> The new server's connection layer will expose a NodeJS 0.10 object
> stream for each client. If you add framing, you can pipe that straight
> into a TCP stream or whatever. The current protocol is documented
> here: https://github.com/josephg/ShareJS/wiki/Wire-Protocol . It'll
> need to change to support collections & queries. I'll update it when I
> figure out the new protocol.

Sounds great. The only thing missing would be a port of the OT types. You addressed that point in your initial post I think.

>> How can ShareJS fit into the PaaS picture? There's an example/script to
>> deploy on Amazon but I'm wondering what will be needed to make deployment
>> possible on any other PaaS provider. I'm also tempted to go back to my
>> comment on scalability and reliability, which is a major selling point for
>> PaaS providers.
>
> I honestly don't know. Do you have experience with PaaS providers?
> What do we need to do to make it easy?

Unfortunately I don't. I've started looking into this. There aren't a lot of PaaS providers offering persistent connection support. So far, I've listed OpenShift (RedHat) and Nodejitsu, which both provide websockets support. Heroku offers XHR-polling support, which should be fine in most scenarios. Nodejitsu doesn't seem to host the DBs but rather offers hooks to PaaS DB providers. I'm not sure how this will work at large scale with data-intensive / low-latency applications (and also in terms of bandwidth costs...). The only annoying part is the scalability. Most of these platforms offer auto-scalable hosting, firing up instances as needed. However (and hence my question above), I'm not sure if we could offer something as straightforward with ShareJS.

Joseph Gentle

Mar 20, 2013, 2:36:44 PM
to sha...@googlegroups.com
On Wed, Mar 20, 2013 at 11:01 AM, fmayot <fma...@gmail.com> wrote:
>
> Thanks Joseph. A few comments below
>
> On Tuesday, March 19, 2013 2:08:43 PM UTC-7, Joseph Gentle wrote:
>>
>> On Tue, Mar 19, 2013 at 1:31 PM, fmayot <fma...@gmail.com> wrote:
>> > Scalability: I'm not sure if scaling the DB layer will be sufficient.
>> > Looking at my own use case, I'd probably look at ways to scale the
>> > Session/OT layer too. I would be interested in your perspective though.
>> > That
>> > could I guess be handled outside of ShareJS entirely (e.g. having a load
>> > balancing layer that dispatches documents to different servers).
>>
>> It will be, though you'll need a load balancer as well.
>>
>> Basically the work will allow you to spin up a bunch of ShareJS
>> instances which all connect to the same database cluster. A client can
>> connect to any sharejs instance and do the same things. So, you can
>> use nodejs clusters & classical load balancers in front of ShareJS to
>> scale out.
>
>
> Not sure I get that part... 2 clients accessing the same ShareJS document
> should be connected to the same ShareJS instance or am I missing something?
> If that's the case, then you need to use some kind of custom load balancing
> of the documents between ShareJS instances. This seems doable but not as
> trivial as putting a standard SW or HW load balancer as an entry point.

Right; so the first version is going to allow multiple sharejs
instances to access the same document at the same time. I'm going to
achieve this by doing STM-style operation commits in the database. To
do this, you need database support for a query like "append to the
oplog but only if the last entry in the oplog is version X".

In redis 2.6 you can do this using a lua script. In mongo you can
specify the current version number in an update. E.g., update({_id:xxx,
v:50}, {$push:...., v:51}). The return value tells you how many
records were updated. If someone else appended to the oplog first, the
query won't match any records, nothing will be updated and the update
will return 0. In that case, you have to fetch the new operation and retry.
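The commit-and-retry loop described here can be sketched against an in-memory stand-in for the oplog. The `Oplog` class, `appendIfVersion`, `submit` and the `onAttempt` hook are hypothetical names for illustration; a real backend would do the conditional append with a redis lua script or a versioned mongo update as described above.

```javascript
// Hypothetical in-memory stand-in for the database's oplog.
class Oplog {
  constructor() {
    this.ops = [];
  }
  since(v) {
    return this.ops.slice(v);
  }
  // Append only if nobody else has appended since `expectedVersion`.
  // Returns the number of records updated (0 or 1), like the update above.
  appendIfVersion(expectedVersion, op) {
    if (this.ops.length !== expectedVersion) return 0;
    this.ops.push(op);
    return 1;
  }
}

// Toy type: ops are numbers added to a counter, so transform is the identity.
const counter = { transform: (op, other) => op };

// Commit `op`, written against version `v`, retrying until the append wins.
function submit(oplog, type, op, v, onAttempt = () => {}) {
  for (let attempt = 1; ; attempt++) {
    const missed = oplog.since(v); // ops committed since version v
    for (const other of missed) op = type.transform(op, other);
    v += missed.length;
    onAttempt(attempt); // test hook: lets a rival instance commit here
    if (oplog.appendIfVersion(v, op) === 1) return { v: v + 1, attempts: attempt };
    // Someone else appended first: loop to fetch their op and retry.
  }
}

const oplog = new Oplog();
// Simulate another ShareJS instance winning the race on our first attempt.
const result = submit(oplog, counter, 5, 0, (attempt) => {
  if (attempt === 1) oplog.appendIfVersion(0, 2);
});
console.log(result, oplog.ops); // { v: 2, attempts: 2 } [ 2, 5 ]
```

The loop never takes a lock; under contention the cost shows up as extra attempts, which is why throughput ends up bounded by the database.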

If all the clients accessing a document are connected to the same
server, the current caching logic should keep performance the same as
it is at the moment. If multiple clients connected to multiple sharejs
instances all try and edit the document at the same time, performance
will ultimately be limited by the database. But this pushes a lot of
the atomic update requirements to the database, which should be able
to use consistent hashing and whatnot to scale out.

Once I get this working, I want to benchmark the crap out of it and
see how it actually performs. But I figure this is a first step, and
I'm hoping that changes that rearchitect the database code will be
limited to the separated out realtime database project.

>> > Platform availability: It seems that there isn't a lot of solutions out
>> > there for OT native clients. Might be a good thing to make the
>> > design/protocols/etc. such that ports to C/Java/etc. are easily doable
>> > and
>> > efficient.
>>
>> I agree! I've wanted that for ages. I actually just got back from
>> lunch with the Simperium guys who have been working on similar stuff
>> natively on iOS.
>>
>> The new server's connection layer will expose a NodeJS 0.10 object
>> stream for each client. If you add framing, you can pipe that straight
>> into a TCP stream or whatever. The current protocol is documented
>> here: https://github.com/josephg/ShareJS/wiki/Wire-Protocol . It'll
>> need to change to support collections & queries. I'll update it when I
>> figure out the new protocol.
>
>
> Sounds great. The only thing missing would be a port of the OT types. You
> addressed that point in your initial post I think.

Yeah. I've ported the 'text2' type in ShareJS to C here:
https://github.com/josephg/libot .

I benchmarked the crap out of it, and for small ops, it's really fast.
Like, millions-of-transforms-per-second fast. (Compared to
tens-of-thousands with my coffeescript implementation). Unfortunately
I lost all the benchmark graphs when my laptop got stolen late last
year. That'll show me for not putting it online.

As part of this change, I'm promoting text2 to be the default text
type in ShareJS. I also want to start identifying ot types by
canonical URLs through which you can access their specs, code in
various languages and tests.

I'm about to start uploading ShareJS's types here:
https://github.com/josephg/ot-types . Should the C types live
in a separate repository from their JS equivalents?

I would also really like to have a C implementation of the JSON2 type
once we're happy with the coffeescript implementation.

>> > How can ShareJS fit into the PaaS picture? There's an example/script to
>> > deploy on Amazon but I'm wondering what will be needed to make
>> > deployment
>> > possible on any other PaaS provider. I'm also tempted to go back to my
>> > comment on scalability and reliability, which is a major selling point
>> > for
>> > PaaS providers.
>>
>> I honestly don't know. Do you have experience with PaaS providers?
>> What do we need to do to make it easy?
>
>
> Unfortunately I don't. I've started looking into this. There isn't a lot of
> PaaS offering persistent connection support. So far, I've listed OpenShift
> (RedHat) and Nodejitsu, which both provide websockets support. Heroku offers
> XHR-polling support, which should be fine in most scenarios. Nodejitsu
> doesn't seem to host the DBs but rather offers hooks to PaaS DB providers.
> I'm not sure how this will work at large scale with data intensive / low
> latency applications (and also in terms of bandwidth costs...) The only
> annoying part is the scalability. Most of these platforms offer
> auto-scalable hosting, firing up instances as needed. However (and hence my
> question above), I'm not sure if we could offer something as straightforward
> with ShareJS.

Well, as I said above, you can :D And now you can pick your
server-browser communication library as needed.

-J

fmayot

Mar 20, 2013, 3:36:38 PM
to sha...@googlegroups.com
I see, but there's something I still don't quite understand (certainly because I'm not familiar enough with OT). Take the following scenario: client A is connected to SJS #1, and client B to SJS #2. Both SJS instances are accessing the same DB cluster. Clients A and B are accessing the same document. Client A pushes OP_A to SJS #1. OP_A is persisted in the DB. SJS #2 needs to update its list of ops to reflect OP_A, transform it and send it to client B. Am I missing something?

In such a scenario, your ShareJS instance needs read/write access to the DB vs write only access if a given document is served by a single instance. I don't quite understand how you can reach low latency and high scalability... (I hope what I'm saying makes some sense ;-)

Do you want to explore this architecture because it's simpler to develop/maintain, is it because it'll be easier to load balance clients, or is it for some other reason?

Last time I looked at the repository, it seemed that the implementation of the JSON2 type had been stalled for a couple of months. I looked into the code and documentation but couldn't find what the JSON2 type will bring. Is it conceptually different or is it just a new implementation of the JSON type? Would you also recommend avoiding a port of the current JSON type to C? If yes, why?

Jeremy Apthorp

Mar 20, 2013, 4:11:09 PM
to sha...@googlegroups.com
Stalled because I'm busy :P

It will bring:
- embedded documents of other OT types
- being able to move objects around the tree more freely

See the thread about it earlier on this list for more info.
 

--

j

Joseph Gentle

Mar 20, 2013, 5:19:33 PM
to sha...@googlegroups.com
On Wed, Mar 20, 2013 at 12:36 PM, fmayot <fma...@gmail.com> wrote:
> I see but there's something I still don't quite understand (certainly
> because I'm not enough familiar with OT). If you have the following scenario
> where client A is connected to SJS #1, and client B to SJS #2. Both SJS
> instances are accessing the same DB cluster. Client A and B are accessing
> the same document. Client A pushes OP_A to SJS #1. OP_A is persisted in the
> DB. SJS 2 needs to update its list of ops to reflect OP_A, transform it and
> send it to client B. Am I missing something?

Almost. You're right that SJS 2 needs to know about the op so it can
tell client B about the operation. SJS 2 won't do any transformation
unless client B has also sent an operation at the same time.

I'll notify SJS #2 using a pubsub system. To start with, I'm just
using a redis pubsub channel, which is why redis will be required in
the first pass.
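The notification path can be sketched with Node's built-in `EventEmitter` standing in for the redis pubsub channel. The `Instance` class, its methods and the message shape are invented for the example; with real redis the delivery would be asynchronous and cross-process.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a redis pubsub channel; emit() is synchronous and in-process.
const pubsub = new EventEmitter();

// Hypothetical model of one ShareJS server process.
class Instance {
  constructor(name) {
    this.name = name;
    this.docs = new Map(); // docName -> Map(clientId -> ops sent to that client)
  }

  // A client opens a document on this instance.
  open(clientId, docName) {
    if (!this.docs.has(docName)) {
      this.docs.set(docName, new Map());
      // First local client for this doc: subscribe to its channel.
      pubsub.on(docName, (msg) => this._fanOut(docName, msg));
    }
    const inbox = [];
    this.docs.get(docName).set(clientId, inbox);
    return inbox;
  }

  // Called after an op has been committed to the database.
  publish(docName, clientId, op, v) {
    pubsub.emit(docName, { source: clientId, op, v });
  }

  // Forward a published op to every local client except its author.
  _fanOut(docName, msg) {
    for (const [clientId, inbox] of this.docs.get(docName)) {
      if (clientId !== msg.source) inbox.push({ op: msg.op, v: msg.v });
    }
  }
}

const sjs1 = new Instance('SJS #1');
const sjs2 = new Instance('SJS #2');
const inboxA = sjs1.open('A', 'doc1');
const inboxB = sjs2.open('B', 'doc1');
sjs1.publish('doc1', 'A', 'OP_A', 1);
console.log(inboxA, inboxB); // [] [ { op: 'OP_A', v: 1 } ]
```

No transform happens on this path; SJS #2 only transforms if client B has an op of its own in flight.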

> In such a scenario, your ShareJS instance needs read/write access to the DB
> vs write only access if a given document is served by a single instance. I
> don't quite understand how you can reach low latency and high scalability...
> (I hope what I'm saying makes some sense ;-)

ShareJS always needed to be able to read from the database in the case
of cache misses. But you're right - it'll need to read more frequently
when two instances are editing the same document because all of the
communication will go via the database.

> Do you want to explore this architecture because it's simpler to
> develop/maintain, is it because it'll be easier to load balance clients, or
> is it for some other reason?

I want ShareJS to scale. Right now, you can't have more than one
ShareJS instance and that makes it unusable in production for most
people.

The initial version, using a redis pubsub channel and whatnot,
probably won't be the final form of this, but I think it's a step
in the right direction.

-J

Wout Mertens

Mar 20, 2013, 6:23:44 PM
to sha...@googlegroups.com
Hi Joseph,

Wild idea: How do you feel about making the application server code be another client with more privileges?

In other words, where currently you would include Share as a component of your application, you would instead connect to Share using one of the transports (a local Unix socket?) and some authentication method, then specify an application name and supply the authorization function.

Example privileges:
- define authorization function for your documents
- listen on events for all documents
- give names to clients

Advantages:
- application server-side code is very similar to client-side code
- clean separation of code
- paves the way to a Share cloud that provides document persistence, authentication and presence
- server-less apps for simple doc sharing

Disadvantages:
- A performance hit because ops and events will need to be sent to the server code
* Mitigation: Allow authentication function to be uploaded to Share so it can run inside Share
- An extra process to run
* Mitigation: Make it optional to work in this way, still allowing embedding Share as now

If this is too out-there, it would still be nice to unify the server and client APIs. I think that's what you're hinting at in your diagram.

Another crazy idea is to use XMPP as the communication protocol and use the XMPP server for presence and authentication. There's a JS XMPP implementation :)

Comments inline:

On Mar 19, 2013, at 3:03 , Joseph Gentle <jos...@gmail.com> wrote:

> # ShareJS code
>
> It will (still) have these components:
> - Auth function for limiting reading & writing. I want to extend this
> for JSON documents to make it easy to restrict / trim access to
> certain parts of some documents.

What about authentication (vs authorization)? Is that something the server app needs to handle or should Share have pluggable authentication modules?

> - Build script to bundle up & minify the client and required OT types
> for the browser. I want to rewrite this in Make. (Sorry windows
> developers).

You don't like the shelljs now in use in the Cakefile? It's cross-platform… It would need https://github.com/arturadib/shelljs/issues/52 to make Make-like things possible.

> - Tests. It looks like nodeunit is no longer actively maintained, so
> it might be time to port the tests to a different framework.
> (Suggestions? What does everybody use these days?)

Mocha? Chai?

> ShareJS has slowly become a grab bag of other stuff that I like. I'm
> not sure whether all this stuff should stay in ShareJS or what.

Those could all be plugins under a sharejs umbrella on github.

I'm really looking forward to the redesign, although it's always risky, some projects never get out of it. Here's to a swift change :)

Wout.

fmayot

Mar 20, 2013, 7:22:58 PM
to sha...@googlegroups.com
Can I ask why you'd like to use an OT system rather than Firebase?

Joseph Gentle

Mar 20, 2013, 7:33:03 PM
to sha...@googlegroups.com
On Wed, Mar 20, 2013 at 3:23 PM, Wout Mertens <wout.m...@gmail.com> wrote:
> Hi Joseph,
>
> Wild idea: How do you feel about making the application server code be another client with more privileges?
>
> In other words, where currently you would include Share as a component of your application, you would instead connect to Share using one of the transports (a local Unix socket?) and some authentication method, then specify an application name and allow you to specify the authorization function.
>
> Example privileges:
> - define authorization function for your documents
> - listen on events for all documents
> - give names to clients

It's an interesting idea. Doing authentication would be hard. The
advantage to having auth in-process is that you can use closure
contexts and whatnot to access extra data to make your decisions.
Round-tripping via an attached client process would be terrible for
performance, and just uploading a script that the server executes
would be less powerful than the current system.

> Advantages:
> - application server-side code is very similar to client-side code
> - clean separation of code
> - paves the way to a Share cloud that provides document persistence, authentication and presence
> - server-less apps for simple doc sharing
>
> Disadvantages:
> - A performance hit because ops and events will need to be sent to the server code
> * Mitigation: Allow authentication function to be uploaded to Share so it can run inside Share
> - An extra process to run
> * Mitigation: Make it optional to work in this way, still allowing embedding Share as now
>
> If this is too out-there, it would still be nice to unify the server and client APIs. I think that's what you're hinting at in your diagram.

Yeah, that'd be good. So, the new nodejs 0.10 streams let you have a
stream of objects. It's really trivial to wrap that in a TCP stream to
make native clients which are nice and fast.
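One common way to carry a stream of objects over a byte stream such as TCP is newline-delimited JSON. The sketch below is only an illustration of that framing, not ShareJS's wire protocol; `encodeMessage` and `createDecoder` are hypothetical names, and the decoder is fed plain strings so the example runs without a socket.

```javascript
// Encode one object as a single newline-terminated line.
// JSON.stringify escapes newlines inside strings, so '\n' is a safe delimiter.
function encodeMessage(obj) {
  return JSON.stringify(obj) + '\n';
}

// Returns a function that accepts arbitrary string chunks (as a socket would
// deliver them) and calls `onMessage` once per complete line.
function createDecoder(onMessage) {
  let buffered = '';
  return function push(chunk) {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line.length > 0) onMessage(JSON.parse(line));
    }
  };
}

// Demo: two messages arrive split at an awkward chunk boundary.
const received = [];
const push = createDecoder((msg) => received.push(msg));
const wire = encodeMessage({ doc: 'a', v: 1 }) + encodeMessage({ doc: 'a', v: 2 });
push(wire.slice(0, 10)); // ends in the middle of the first message
push(wire.slice(10));
```

In a real client the chunks would come from the socket's data events, and the decoder would hand each parsed object to the same code path the in-process API uses.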

> Another crazy idea is to use XMPP as the communication protocol and use the XMPP server for presence and authentication. There's a JS XMPP implementation :)

Ergh no. Embedding everything in an XMPP extension that wraps sharejs
data in an XML stream over TLS is a rabbit hole from which no man
returns the same.

> Comments inline:
>
> On Mar 19, 2013, at 3:03 , Joseph Gentle <jos...@gmail.com> wrote:
>
>> # ShareJS code
>>
>> It will (still) have these components:
>> - Auth function for limiting reading & writing. I want to extend this
>> for JSON documents to make it easy to restrict / trim access to
>> certain parts of some documents.
>
> What about authentication (vs authorization)? Is that something the server app needs to handle or should Share have pluggable authentication modules?

I don't know. As far as I know, nobody has really pushed on the API
and taken it to its limits. Would a plugin architecture be good here?
I don't really have strong opinions.

>> - Build script to bundle up & minify the client and required OT types
>> for the browser. I want to rewrite this in Make. (Sorry windows
>> developers).
>
> You don't like the shelljs now in use in the Cakefile? It's cross-platform… It would need https://github.com/arturadib/shelljs/issues/52 to make Make-like things possible.

.... I think our Cakefile is way more complicated than the equivalent
Makefile, and the extra complexity is unnecessary. It's cool, but I
don't think it buys us enough to justify using extra (weird) tools.

>> - Tests. It looks like nodeunit is no longer actively maintained, so
>> it might be time to port the tests to a different framework.
>> (Suggestions? What does everybody use these days?)
>
> Mocha? Chai?

Cool. Moving to mocha. https://github.com/josephg/ot-types/tree/master/test

>> ShareJS has slowly become a grab bag of other stuff that I like. I'm
>> not sure whether all this stuff should stay in ShareJS or what.
>
> Those could all be plugins under a sharejs umbrella on github.
>
> I'm really looking forward to the redesign, although it's always risky, some projects never get out of it. Here's to a swift change :)

Yeah agree :)

-J

> Wout.

David Greisen

Mar 22, 2013, 11:46:45 AM
to sha...@googlegroups.com
My big concern is the ability to reject ops based on the resultant document. I currently do that by looking at the document after it has been transformed by the JSON type, but before the op has been accepted, then rejecting the op and discarding the transformed document if the transformed document doesn't validate.

If you remove the object deep copy (which is not performant), this method won't work. If ops are invertible, I can still make it work, though it's harder. With neither a copy nor invertible ops, it basically becomes impossible to do client-side validation of sharejs documents (I think). This would make sharejs unusable for us.

Geoff Goodman

Mar 22, 2013, 12:04:48 PM
to sha...@googlegroups.com
I'd like to chime in on what David mentioned: it would be very nice to be able to enforce schema conformance. For example, the JSON Schema standard could be very helpful in documenting and restricting document formats (see http://tools.ietf.org/html/draft-zyp-json-schema-03).

Another idea would be to expose the OT engine such that a client could (for example) request a series of ops from the server and then could replay those ops at a user-defined pace.  This would mean that the same mechanism used by an app for interfacing with ShareJS could be used to support 'replays'. That would be cool!

Geoff

Joseph Gentle

Mar 22, 2013, 12:30:37 PM
to sha...@googlegroups.com
This is something my employer wants too. We're talking about adding
code that lets you whitelist certain paths for edits and analyse
operations by reading the JSON op stream. We'll do this through the
auth function.
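As a rough sketch of what path whitelisting could look like: the code below assumes a JSON op is an array of components that each carry a path array `p`, and it accepts an op only if every component falls under an allowed prefix. `hasPrefix`, `isOpAllowed` and the example paths are made up for illustration and are not the planned auth API.

```javascript
// True if `path` starts with every element of `prefix`.
function hasPrefix(path, prefix) {
  if (path.length < prefix.length) return false;
  return prefix.every((key, i) => path[i] === key);
}

// An op is allowed only if every component edits something at or below
// one of the whitelisted path prefixes.
function isOpAllowed(op, whitelist) {
  return op.every((component) =>
    whitelist.some((prefix) => hasPrefix(component.p, prefix)));
}

// Hypothetical whitelist: users may edit comments and their own bio.
const whitelist = [['comments'], ['profile', 'bio']];

const allowedOp = [{ p: ['comments', 0], li: 'nice work' }];
const deniedOp = [{ p: ['profile', 'email'], oi: 'x@example.com' }];
// Replacing the whole profile object would also touch the email, so it is rejected.
const tooBroadOp = [{ p: ['profile'], oi: { bio: 'hi' } }];
```

Because the check only reads paths out of the op, it needs neither a cloned document nor invertible ops.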

I'm not sure if this is exactly what you want, but it's pretty close,
and it doesn't depend on invert or cloning documents.

That said, if you want to stay with your current system, you can just
clone the document yourself before applying.
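A minimal sketch of the clone-then-validate approach, assuming the type exposes an `apply(snapshot, op)` function that returns the new snapshot and may mutate its argument. `applyIfValid`, the toy type and the validator are hypothetical stand-ins, not ShareJS code.

```javascript
// Apply `op` to a private copy of the snapshot and keep the result only if
// `validate` accepts it. The caller's snapshot is never mutated.
function applyIfValid(type, snapshot, op, validate) {
  const copy = JSON.parse(JSON.stringify(snapshot)); // deep clone; JSON-safe data only
  const next = type.apply(copy, op);
  if (!validate(next)) {
    return { accepted: false, snapshot: snapshot }; // discard the transformed copy
  }
  return { accepted: true, snapshot: next };
}

// Toy stand-in for an OT type so the sketch runs on its own; NOT the real JSON type.
const toyType = {
  apply(doc, op) { doc[op.key] = op.value; return doc; }
};
// Hypothetical validation rule: the title must be a string.
const isValid = (doc) => typeof doc.title === 'string';

const before = { title: 'hello' };
const ok = applyIfValid(toyType, before, { key: 'title', value: 'hi' }, isValid);
const bad = applyIfValid(toyType, before, { key: 'title', value: 42 }, isValid);
```

The cost is one deep copy per op, which is exactly the overhead the redesign wants to make optional.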

-J

Joseph Gentle

Mar 22, 2013, 12:31:58 PM
to sha...@googlegroups.com
There's already a server-side API for getting the op stream (getOps).
It just isn't exposed through the wire protocol because I didn't need
it. That's a pretty simple patch away.
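Once a client has the list of ops in hand, a replay is just applying them one at a time and letting the caller decide how fast to step. The sketch below assumes the ops have already been fetched and that the type has an `apply(snapshot, op)` function; `replaySnapshots` and the toy text type are hypothetical.

```javascript
// Yields the initial snapshot, then the snapshot after each op in order.
// The caller controls the pace by deciding when to pull the next value.
function* replaySnapshots(type, initial, ops) {
  let snapshot = initial;
  yield snapshot;
  for (const op of ops) {
    snapshot = type.apply(snapshot, op);
    yield snapshot;
  }
}

// Toy insert-only text type for the demo; not one of the real ShareJS types.
const toyText = {
  apply(text, op) { return text.slice(0, op.pos) + op.insert + text.slice(op.pos); }
};

const history = [
  { pos: 0, insert: 'hi' },
  { pos: 2, insert: '!' },
  { pos: 0, insert: 'oh ' },
];
const frames = Array.from(replaySnapshots(toyText, '', history));
```

A UI could pull one frame per timer tick instead of collecting them all at once, which gives the user-defined pace Geoff describes.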

-J

Wout Mertens

Mar 23, 2013, 5:28:59 AM
to sha...@googlegroups.com

You can still revert when you don't deep clone or have invertible ops: simply take an older snapshot and apply all accepted ops from then onwards. It's a pretty fast operation. This means doing the deep clone for the historic snapshot only every few seconds instead of on each apply.
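The checkpoint-and-replay idea can be sketched as follows, assuming a type with a mutating `apply(snapshot, op)`. `CheckpointedDoc`, the list type and the validator are hypothetical names used only for illustration.

```javascript
const clone = (value) => JSON.parse(JSON.stringify(value)); // JSON-safe data only

class CheckpointedDoc {
  constructor(type, initial) {
    this.type = type;
    this.checkpoint = clone(initial); // last deep-cloned snapshot
    this.accepted = [];               // ops accepted since the checkpoint
    this.live = clone(initial);       // working copy, mutated in place
  }

  // Apply `op` in place. If the result fails validation, rebuild the live
  // document from the checkpoint plus the previously accepted ops.
  submit(op, validate) {
    this.live = this.type.apply(this.live, op);
    if (validate(this.live)) {
      this.accepted.push(op);
      return true;
    }
    this.live = this.accepted.reduce(
      (doc, past) => this.type.apply(doc, past), clone(this.checkpoint));
    return false;
  }

  // Meant to be called every few seconds rather than on every op.
  takeCheckpoint() {
    this.checkpoint = clone(this.live);
    this.accepted = [];
  }
}

// Toy type and validator so the sketch runs on its own.
const listType = { apply(doc, op) { doc.items.push(op.item); return doc; } };
const allStrings = (doc) => doc.items.every((item) => typeof item === 'string');

const doc = new CheckpointedDoc(listType, { items: [] });
const results = [
  doc.submit({ item: 'a' }, allStrings),
  doc.submit({ item: 42 }, allStrings), // rejected, triggers a rebuild
  doc.submit({ item: 'b' }, allStrings),
];
```

The deep clone happens only in the constructor, in `takeCheckpoint` and on a rejection, so the common accepted-op path stays copy-free.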

Wout.

Josh Taylor

Mar 29, 2013, 2:12:14 PM
to sha...@googlegroups.com
Build: Would you consider using gruntjs for builds instead of Make? 

Test: I think Jasmine and Mocha are the two most popular test frameworks.

Soroush Hat

Mar 30, 2013, 5:32:36 AM
to sha...@googlegroups.com
I think that the Meta branch is so important, what's your plan for it?

Joseph Gentle

Mar 30, 2013, 5:46:21 AM
to sha...@googlegroups.com
Do you want the general metadata stuff or just cursors?

Soroush Hat

Mar 31, 2013, 4:37:09 PM
to sha...@googlegroups.com
I'm mostly interested in general meta-data, but having cursors would be nice as well.

David Greisen

Mar 31, 2013, 5:24:17 PM
to sharejs

Second the general metadata

Joseph Gentle

Mar 31, 2013, 5:31:00 PM
to sha...@googlegroups.com
What are your use cases?

David Greisen

Mar 31, 2013, 7:30:30 PM
to sharejs
Top priority for me is to know who is logged on, when they were last active, what documents they have open.

Soroush Hat

Mar 31, 2013, 7:43:03 PM
to sha...@googlegroups.com
In my project there is a discussion thread per document (similar to the Google Docs discussion thread) where people can tag different sections of the document and talk about them.

Wout Mertens

Apr 4, 2013, 8:16:35 AM
to sha...@googlegroups.com
On Mar 29, 2013, at 19:12 , Josh Taylor <josh...@gmail.com> wrote:

> Build: Would you consider using gruntjs for builds instead of Make?

+1

Wout.

Mauro

Apr 4, 2013, 8:36:27 AM
to sha...@googlegroups.com
My primary use case for ShareJS would be collaborative rich text editing (see https://github.com/josephg/ShareJS/issues/1). Seems like the most promising approach is to convert the rich text to a JSON representation and work on that.

In particular, I’d be interested in using ShareJS with something like TinyMCE, which has only getContent and setContent methods and no way to track insertions and deletions as individual OT ops (these would be hard to track since changes can come from a variety of sources like typing, copy and paste, plugins, etc.). An alternative to OT (which that paper refers to as event passing) was proposed in http://neil.fraser.name/writing/sync/, but maybe that’s not necessary.

So I’d love ShareJS to have a method where I simply can put in the new text and it figures out all the changes itself like:

onLocalChange ->
    localHtml = editor.getContent()
    sharejs.setContent(localHtml)

And on the other end(s) the incoming patches would be applied to a dirty (or fuzzy) text that has already changed again from typing locally, like:

onRemoteChange ->
    localHtml = editor.getContent()
    updatedHtml = sharejs.patch(localHtml)
    editor.setContent(updatedHtml)

Seems like a lot of this work has already been done for the textarea example, but there I have to give it a textarea DOM element instead of having an API to set and get strings.

Hope my use case description helps.

Best,
Mauro

Wout Mertens

Apr 4, 2013, 9:05:44 AM
to sha...@googlegroups.com
You can get the innerHTML of the DOM element and pass that to the same diff method that textarea.js has.
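For reference, a diff of this kind can be as simple as trimming the common prefix and suffix of the old and new strings and turning what is left into one delete and one insert. The sketch below is written from scratch under that assumption; `diffToOps`, `applyOps` and the op shape are hypothetical and are not copied from textarea.js.

```javascript
// Compare two strings and describe the change as at most one delete
// followed by one insert, both at the end of the common prefix.
function diffToOps(oldText, newText) {
  let start = 0;
  while (start < oldText.length && start < newText.length &&
         oldText[start] === newText[start]) {
    start++;
  }
  let end = 0;
  while (end < oldText.length - start && end < newText.length - start &&
         oldText[oldText.length - 1 - end] === newText[newText.length - 1 - end]) {
    end++;
  }
  const ops = [];
  const removed = oldText.length - start - end;
  if (removed > 0) ops.push({ type: 'del', pos: start, length: removed });
  const inserted = newText.slice(start, newText.length - end);
  if (inserted.length > 0) ops.push({ type: 'insert', pos: start, text: inserted });
  return ops;
}

// Apply the ops produced above to a string (used here to check the round trip).
function applyOps(text, ops) {
  return ops.reduce((current, op) => (op.type === 'del'
    ? current.slice(0, op.pos) + current.slice(op.pos + op.length)
    : current.slice(0, op.pos) + op.text + current.slice(op.pos)), text);
}

const insertOnly = diffToOps('hello world', 'hello brave world');
const replaced = diffToOps('abcd', 'axd');
```

This only finds a single changed region, so two simultaneous edits far apart in the HTML collapse into one large delete-and-insert. That is fine for typing but coarse for whole-document rewrites by an editor's clean-up pass.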

Storing the DOM as a JSON object may be an option but it's probably quite slow too. You would have to convert each DOM element into a regular JSON object that has all the right attributes to create a DOM element again.

Wout.

Mauro

Apr 12, 2013, 10:48:22 AM
to sha...@googlegroups.com
Actually I forked textarea.js until it worked with TinyMCE. I even introduced an atomic delete-plus-insert op to not lose opening tags when moving them around, but I don't think I'll ever get it to work 100% reliably since the interactions between OT, HTML tags and the editor's clean-up methods are just too complex. OT is not meant for rich text. Either you work on a tree (like JSON) or you use OT on plain text and store the rich text markup with range annotations like Google Docs and Firepad do.

@Joseph, is rich text something you're looking into for the next major release or not quite yet? If you read Firebase's blog post (https://www.firebase.com/blog/2013-04-09-firepad-open-source-realtime-collaborative-editor.html), I think the market is definitely there.

Best,
Mauro

Joseph Gentle

Apr 12, 2013, 9:02:10 PM
to sha...@googlegroups.com
Yeah I saw that post. OT _is_ meant for rich text, but as you say you
have to do range annotations.
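To make "range annotations" concrete, here is a small sketch in which the document is plain text plus a list of `{start, end, attrs}` ranges, and a text insert shifts or stretches the ranges it touches. The data layout, the `insertText` name and the boundary policy (an insert exactly at the end of a range does not extend it) are all assumptions for illustration, not how any existing editor stores formatting.

```javascript
// Insert `str` at `pos` and adjust annotation ranges (end is exclusive).
// Ranges after the insert point shift right; a range that contains the
// insert point grows; an insert exactly at a range's end leaves it alone.
function insertText(doc, pos, str) {
  const text = doc.text.slice(0, pos) + str + doc.text.slice(pos);
  const annotations = doc.annotations.map((a) => ({
    start: a.start >= pos ? a.start + str.length : a.start,
    end: a.end > pos ? a.end + str.length : a.end,
    attrs: a.attrs,
  }));
  return { text, annotations };
}

// "world" is bold: it occupies positions 6 up to (not including) 11.
const richDoc = {
  text: 'hello world',
  annotations: [{ start: 6, end: 11, attrs: { bold: true } }],
};

const shifted = insertText(richDoc, 6, 'big ');  // insert just before the bold range
const stretched = insertText(richDoc, 8, '!');   // insert inside the bold range
```

The markup never appears in the text itself, so the text ops stay ordinary string inserts and deletes and there are no HTML tags to lose or split.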

I'd like to do rich text (I've been talking about rich text ever since
ShareJS launched). Doing OT isn't really that hard, someone just has
to spend the necessary couple weeks _doing it_. The place to put a
rich text OT implementation is https://github.com/josephg/ot-types .
Scroll down to read about what functions you need to implement.
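To give a feel for the size of the job, here is a toy insert-only text type with an `apply` and a `transform` function. The function names, the `side` argument and the op shape are assumptions made for this sketch; the authoritative list of required functions is the one in the ot-types README.

```javascript
// Toy OT type: an op is a single insert { pos, str } into a string.
const insertType = {
  // Return the document after applying `op`.
  apply(text, op) {
    return text.slice(0, op.pos) + op.str + text.slice(op.pos);
  },

  // Rewrite `op` so it can be applied after `other` has already been applied.
  // `side` breaks ties when both ops insert at the same position:
  // the 'right' op ends up after the 'left' one.
  transform(op, other, side) {
    const shift = other.pos < op.pos || (other.pos === op.pos && side === 'right');
    return shift ? { pos: op.pos + other.str.length, str: op.str } : op;
  },
};

// Two users insert at the same position concurrently.
const base = 'ab';
const opA = { pos: 1, str: 'X' };
const opB = { pos: 1, str: 'Y' };

// One site applies A then the transformed B; the other applies B then the transformed A.
const viaA = insertType.apply(
  insertType.apply(base, opA), insertType.transform(opB, opA, 'right'));
const viaB = insertType.apply(
  insertType.apply(base, opB), insertType.transform(opA, opB, 'left'));
```

Both orders end with the same document, which is the property every transform function has to guarantee. A rich text type needs the same guarantee for deletes and annotation changes as well, which is where the couple of weeks go.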

-J

Mauro

Apr 16, 2013, 4:16:47 AM
to sha...@googlegroups.com
Thanks for the answer, I'll look into it the next time I've got a week to spare. Would you transmit the annotations via a separate channel with JSON OT or wrap everything in one ot-type?

Joseph Gentle

Apr 16, 2013, 11:02:13 AM
to sha...@googlegroups.com

Wrap everything in one type. I wouldn't reference JSON OT at all.

-J

On Apr 16, 2013 1:16 AM, "Mauro" <mauro...@gmail.com> wrote:
> Thanks for the answer, I'll look into it the next time I've got a week to spare. Would you transmit the annotations via a separate channel with JSON OT or wrap everything in one ot-type?

Vincent Woo

Apr 21, 2013, 9:46:37 AM
to sha...@googlegroups.com
  • Seconding general meta-data as very important for me, as well. Right now, detecting disconnects is a huge pain, and support for cursors is hacky.
  • Also, some simpler helper operations like "blow away all the text in this text object and replace it with this" would be nice. Right now I have to reimplement everything in terms of submitOp.
  • It would be nice to see all JSON subdocs be castable to any other type. Right now, I can't make, say, a "text2" doc out of a JSON document's .at('sometextfield').

Geoff Goodman

Apr 21, 2013, 11:27:35 AM
to sha...@googlegroups.com

Just wanted to voice my +1 to all of Vincent's ideas.

Joseph Gentle

Apr 21, 2013, 12:52:02 PM
to sha...@googlegroups.com
On Sun, Apr 21, 2013 at 6:46 AM, Vincent Woo <aka...@gmail.com> wrote:
> Seconding general meta-data as very important for me, as well. Right now,
> detecting disconnects is a huge pain, as well as hacky support for cursors.

Detecting when you disconnect, or when other users disconnect?

> Also, some simpler helper operations like "blow away all the text in this
> text object and replace it with this" would be nice. Right now I have to
> reimplement everything in terms of submitOp.

Can you give me a list of the other simple operations you'd like to
see? The one you mentioned could be implemented as:
doc.del(0, doc.getText().length);
doc.insert(0, "oh hi");

But are there any others?

The new version can also take optional initial document data in
create().

> It would be nice to see all json subdocs be castable to any other type.
> right now, i can't make, say, a "text2" doc out of a json's
> .at('sometextfield').

Hm. This wouldn't be too hard to implement because of the new changes
to create & delete, though it'd make the JSON type kind of godlike.
Interesting!

Geoff Goodman

Apr 21, 2013, 4:04:28 PM
to sha...@googlegroups.com

Godlike is good!

David Greisen

May 13, 2013, 2:51:02 PM
to sharejs
I've been thinking on this some more. In addition to the whitelisted paths, it would be incredibly useful if you could open just a part of a JSON document. This would be useful for sharing. If I have the following JSON document:

{ "private": [...], "shared": { "jgentle": [...], "wmertens": [...] } }

it would be very nice if I could write an auth function that would let Joseph open shared.jgentle as a document, while letting Wout open shared.wmertens, without giving either any access to the rest of the document.
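A rough sketch of what that could look like, assuming a subdocument is addressed by a path array and the auth function is asked about each path a user tries to open. `getAtPath`, `canOpen`, `openSubdocument` and the placeholder array contents are hypothetical and not part of ShareJS.

```javascript
// Walk `path` down from the document root; returns undefined if it does not exist.
function getAtPath(doc, path) {
  return path.reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Hypothetical auth rule: a user may open only shared.<their own name> or below.
function canOpen(user, path) {
  return path.length >= 2 && path[0] === 'shared' && path[1] === user.name;
}

// Return just the requested part of the document, or throw if access is denied.
function openSubdocument(doc, user, path) {
  if (!canOpen(user, path)) throw new Error('access denied');
  return getAtPath(doc, path);
}

// Placeholder contents stand in for the elided arrays in the example document.
const fullDoc = {
  private: ['secret'],
  shared: { jgentle: ['for joseph'], wmertens: ['for wout'] },
};

const josephView = openSubdocument(fullDoc, { name: 'jgentle' }, ['shared', 'jgentle']);
```

Incoming ops would need the same treatment in reverse: each op's path gets the subdocument's path prepended before it is applied to the full document.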

Soroush Hat

May 13, 2013, 6:16:16 PM
to sha...@googlegroups.com
This is very interesting. I've tried to do this on a "text" type document before. The idea was that in a code editor, we allow users to lock some sections of their code so others can't edit those portions.