Thanks a lot for all the replies. I thought I'd outline what we do in a bit more detail, since I'd love to hear about more efficient approaches, and would be very happy if anything we do is useful to others. (All code is at https://github.com/chili-epfl/FROG.)
So the basic idea is that we want to have a plugin system for rich learning activities - these could be generic, like watching a video, editing a text/form, doing a quiz etc, but also very specific, like a physics simulation, etc. The key is that they should be configurable, accept input data, stream learning analytics, and produce output data. We also have operators which are basically functions (with configuration) which transform data. The end result is some kind of very specialized graphical programming language for collaborative learning scenarios.
The idea is to make these activity plugins as easy to write as possible, abstracting away a lot of the "hard work". We use Meteor and ShareDB, but activities are separate NPM packages, which only need to import React. They provide a configuration object, and we use react-jsonschema-form to render the config in the editor and store the configuration. The activity package also defines the initial shape of its data collection, as well as a merge function which receives the input data from previous operators/activities and merges it into the activity's own data collection.
The way the graph works is that the bottom line denotes individual activities, the middle line group activities, and the top line "whole class" activities. Depending on where the activity is located, we generate one ShareDB document per individual, one per group, or one for the whole class. The document is initialized with the "shape" requested by the activity (for example {}), and then the merge function is run if there is any incoming data.
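To make the per-plane document creation concrete, here is a minimal sketch. The helper names (`instancesForPlane`, `initialDocData`), the numeric plane values, and the `mergeFunction(incoming, data)` signature are assumptions for illustration, not the actual FROG code. Only the idea comes from the description above: one document per individual, group, or class, seeded with the activity's shape and then merged.

```js
// Hypothetical helper: which instances (and so which ShareDB documents)
// does an activity need? Assumed encoding: plane 1 = individual,
// 2 = group, 3 = whole class.
function instancesForPlane(plane, students) {
  if (plane === 1) return students.map(s => s.id);
  if (plane === 2) return [...new Set(students.map(s => s.group))];
  return ['all'];
}

// Hypothetical helper: start from a copy of the shape the activity asked
// for, then let the activity's own merge function fold in incoming data.
function initialDocData(activity, incoming) {
  const data = JSON.parse(JSON.stringify(activity.dataStructure));
  if (incoming !== undefined && activity.mergeFunction) {
    activity.mergeFunction(incoming, data);
  }
  return data;
}

const students = [
  { id: 'alice', group: 'g1' },
  { id: 'bob', group: 'g1' },
  { id: 'carol', group: 'g2' }
];

// An activity whose shape is a list and whose merge appends incoming items.
const chat = {
  id: 'ac-chat',
  dataStructure: [],
  mergeFunction: (incoming, data) => incoming.forEach(x => data.push(x))
};

// One document ID per group instance, in an activityId/instanceId style.
const docIds = instancesForPlane(2, students).map(i => `${chat.id}/${i}`);
console.log(docIds); // two group documents for three students
```

Copying the shape before merging means every instance starts from its own fresh document rather than sharing one mutable default.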
This gives activity authors an enormous amount of flexibility, while still making it easy to do individual and collaborative activities. Below is an almost complete example of a minimal group chat, with some configuration:
```js
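import React from 'react';
// Assumption: uuid() comes from the `uuid` npm package.
import { v4 as uuid } from 'uuid';
// Assumption: TextInput is a small shared input component that calls
// callbackFn with the submitted text; its definition is not shown here.

// Display metadata for the activity
const meta = {
  name: 'Text Component'
};

// JSON Schema for the configuration form (rendered with react-jsonschema-form)
const config = {
  type: 'object',
  properties: {
    title: {
      type: 'string',
      title: 'Title'
    }
  }
};

// Initial shape of the activity's document: a list of chat messages
const dataStructure = [];

const ActivityRunner = ({ config, data, dataFn, logger, userInfo }) => (
  <div>
    <h1>{config.title}</h1>
    <ul>
      {data.map(x => (
        <li key={x.id}>
          {x.user}: {x.msg}
        </li>
      ))}
    </ul>
    <TextInput
      callbackFn={e => {
        // Append the message to the shared list, then log it for analytics
        dataFn.listAppend({ msg: e, user: userInfo.name, id: uuid() });
        logger({ chat: e });
      }}
    />
  </div>
);

export default { id: 'ac-chat', meta, config, dataStructure, ActivityRunner };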
```
So far we have mainly been using json0; however, we just added support for text, with a simple component called `<ReactiveText type='textarea|textinput' path={['text']} dataFn={dataFn} />`. This creates a text area or text input and stores the text at the given path in the document passed to the activity runner. We are also planning to add rich text and code editing (syntax highlighting, live preview, etc.), and as the result of a previous project, we have complex forms with collaborative editing (https://github.com/chili-epfl/collab-react-components).
Part of this research project is to look at how we can provide sophisticated feedback to the teacher, and analytics for researchers, if we control more of the stack (instead of using Google Docs etc). We have two students currently working on analytics of collaborative writing behaviour, based on data from either ShareDB or Etherpad (the first step is to create a unified representation of those two data sources). Ideas we are exploring: highlighting areas of the text that are the "oldest" or "most vs least edited", and inferring collaborative behaviour (I write one paragraph while you write another, vs you write a paragraph and I add information, vs you write a paragraph and I make a lot of changes, etc). Can this kind of higher-level information predict collaborative success?
---
Anyway, so the pattern right now is that we have a number of Meteor servers (all connected to the same Mongo server), which import the ShareDB server, connecting to a separate Mongo instance for ShareDB, as well as a Redis server. In development, the front-end client connects to two websockets on the Meteor server, one for ShareDB and one for the Meteor stuff.
In production, we only use the Meteor ShareDB instance for back-end processing, and front-end clients connect to four user-facing ShareDB servers (just a few lines of Node importing and starting the server, see https://github.com/chili-epfl/FROG/blob/unil/sharedb/index.js), connected to the same Redis and Mongo DBs, using Nginx for IP-hash based load balancing and SSL reverse proxying.
Students move in lock-step through the script, so when a teacher transitions to a new activity set, one or several activities need to be initialized: the data needed is calculated, and then the necessary set of ShareDB documents is created from the server side. This could be many hundreds of documents, and it is not quite as fast as we'd like, even though it's tolerable for now. Initially we did it very naively, first subscribing to each document and then creating the document in on('load'), and not even doing it in parallel. Currently we get the documents directly and create them without waiting for load, because we know they don't already exist, and we try to use Futures to get parallelism, which makes it much better.
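For anyone facing the same bulk-creation step, here is a rough sketch of the "get, then create, in parallel" idea. It uses plain Promises rather than the Futures we actually use, and `createDoc`, `createAllDocs`, and the `batchSize` limit are names and choices made up for this example. The only ShareDB calls involved are `connection.get(collection, id)` and `doc.create(data, callback)`.

```js
// Wrap ShareDB's callback-style doc.create in a Promise.
// No subscribe or fetch first: we assume the document does not exist yet.
function createDoc(connection, collection, id, data) {
  return new Promise((resolve, reject) => {
    const doc = connection.get(collection, id);
    doc.create(data, err => (err ? reject(err) : resolve(id)));
  });
}

// Create all documents, with at most `batchSize` creations in flight at
// once (an assumed safeguard so hundreds of creates don't all hit the
// database at the same instant).
async function createAllDocs(connection, collection, docs, batchSize = 50) {
  const created = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    const ids = await Promise.all(
      batch.map(d => createDoc(connection, collection, d.id, d.data))
    );
    created.push(...ids);
  }
  return created;
}
```

If any single create fails, `Promise.all` rejects for that batch, so the caller would have to decide whether to retry or abort the transition.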
In most workloads, only 4-5 students are connected to a single document. In the future there will be more collaborative editing, but currently it's a lot of adding list items or objects. We are experimenting with objects of the form { 'uuid': { id: 'uuid', chatmsg: 'hi', etc } }, which both gives us easy editing/deleting without worrying about list indexes, and makes it easy to do `Object.values(data).map(x => <li key={x.id}>...</li>)` etc. Thus conflicting edits should be minimal.
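A small sketch of what the keyed-object approach looks like in terms of json0 operations. The helper names and the `createdAt` field are assumptions added for this example (the field is only there so messages can be sorted explicitly). The op shapes themselves, `oi` for object insert and `od` for object delete, are standard json0.

```js
// json0 operations on an object keyed by id:
//   { p: [key], oi: value }  inserts value at key
//   { p: [key], od: value }  deletes the value currently at key
// Neither depends on a list index, so concurrent adds and deletes by
// different students don't need index transformation.
function addMessageOp(id, user, chatmsg, createdAt = Date.now()) {
  return [{ p: [id], oi: { id, user, chatmsg, createdAt } }];
}

function removeMessageOp(data, id) {
  return [{ p: [id], od: data[id] }];
}

// Sort explicitly instead of relying on object key order.
function messagesInOrder(data) {
  return Object.values(data).sort((a, b) => a.createdAt - b.createdAt);
}

// With a real ShareDB doc, these ops would be passed to doc.submitOp(op).
const data = {
  b2: { id: 'b2', user: 'bob', chatmsg: 'hello', createdAt: 2 },
  a1: { id: 'a1', user: 'alice', chatmsg: 'hi', createdAt: 1 }
};
console.log(messagesInOrder(data).map(m => m.chatmsg));
```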
Once an activity concludes, we need to gather the data from all of its instances and write it to the database or use it for future activities. Here we also used to do it incredibly inefficiently (because with a few test users on localhost it was never a problem), but after switching to a simple fetch query it is superfast. We generate IDs like activityId/instanceId and want all instances for a given activityId, so the query actually does a regexp on the document ID (not sure this is a good idea; is there any way of attaching metadata to a document that doesn't become part of doc.data?).
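Roughly, the gathering step can look like the sketch below. `connection.createFetchQuery(collection, query, options, callback)` is the ShareDB call; the helper names are made up for this example, and the query shape assumes the sharedb-mongo adapter, where (as far as I understand it) the document ID is stored in `_id` and a `$regex` condition is passed through to Mongo.

```js
// Escape characters that have a special meaning in a regular expression,
// so an activityId is always matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Fetch every instance document of one activity in a single query.
// Assumption: document IDs look like `${activityId}/${instanceId}`.
function fetchActivityData(connection, collection, activityId) {
  const query = { _id: { $regex: '^' + escapeRegExp(activityId) + '/' } };
  return new Promise((resolve, reject) => {
    connection.createFetchQuery(collection, query, {}, (err, results) => {
      if (err) return reject(err);
      // Each result is a ShareDB Doc; keep only the instance ID and data.
      resolve(
        results.map(doc => ({
          instanceId: doc.id.slice(activityId.length + 1),
          data: doc.data
        }))
      );
    });
  });
}
```

Anchoring the pattern with `^` and the trailing `/` keeps an activity ID that happens to be a prefix of another one from matching the wrong documents.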
My concern was more around the number of users/websocket connections to a given instance. We had many issues with 200 users on a single node, and because we do experiments in large classrooms with up to 300 students connecting at the same time, it's not so easy to experiment and narrow it down. We therefore went with 5 Meteor instances and 4 separate ShareDB instances, and this worked well. It might well be that we need far fewer though. Maybe Meteor is much more demanding than ShareDB, because the consensus in the Meteor community seems to be around 50 active clients for a small 1-CPU VM...
... which seems very little to me. During my PhD, I wrote collaborative MOOC software with Elixir/Phoenix, which handled thousands of websockets with never more than 3% server load. Back then I actually tried rewriting the server part of what was then ShareJS in Elixir, thinking I could make it compatible with the ShareJS front-end client, make it much easier to integrate into my Elixir program, and have it scale well (I really didn't like Node back then). I actually made a bunch of the JS tests pass against this server, but never completed it. I sometimes still dream about completing this and launching "ShareDB as a service" :)
https://github.com/houshuang/ot_text/blob/master/lib/text.ex
Anyway, thanks for listening. If you have any feedback on the architecture, it would be welcome. Right now I'm trying with 4 Digital Ocean droplets, all four running the ShareDB server, with one of them also running Redis and MongoDB (I don't know if these should be separate). I might try in the future to see if I can get by with fewer servers. In the future, we might also do experiments in MOOCs, so we could expect thousands of people connecting at the same time, although not editing the same documents. However, we have many different group structures. For example, you might be assigned a role as a mayor and be in group 55: you first meet with all the mayors to discuss how to respond to an earthquake, and then go to group 55 to discuss with a property developer, an environmental activist, and a business owner. These would be two ShareDB documents with different subscribers, so it would be very difficult to shard based on collection ID. So I first need to know roughly how many users a single instance can support, then how many instances a single Redis/Mongo server can mediate between, and at what point things begin to break down.
Also very happy to discuss with anyone who is doing research around collaboration, collaborative editing etc.
Thanks to all the ShareDB people for an amazing tool!
Stian