[ANN] Schism, a set of CRDTs for Clojure and ClojureScript

265 views
Skip to first unread message

Alex Redington

unread,
Apr 18, 2018, 9:05:50 PM4/18/18
to Clojure
Good evening!

I submit for your evaluation and reasoned feedback a library I've been working on to provide a set of convergent replicated data types to Clojure and ClojureScript:

https://github.com/aredington/schism
[com.holychao/schism "0.1.0"]

Schism provides convergent collections for sets, vectors, lists, and maps, along with edn readers and writers out of the gate. My intent is to provide you with all the tools necessary to replicate a collection across two or more computing nodes.

Replication is a hard problem, so there are a few caveats to these tools as I provide them:

- You must identify nodes within your cluster. You may choose to identify nodes with a random UUID, in which case schism will help you to do so. Node identifiers must be clojure serializable data.
- Schism purposefully avoids carrying around a monotonically increasing historical collection of data about deleted entries. Consequently there are some ambiguities during convergence that may not exactly mirror local modification.
- Schism is only solving the problem of synchronizing two in memory sets. Maintaining identity of those two sets, tracking state changes, and long term durability are responsibilities left to the schism user.

Schism collections are persistent collections, so you should feel free to work with them as a drop in replacement for a function which would work against a Clojure collection. The usual utilities such as conj, assoc, dissoc, disj, and rest are pure functions which will return you derived copies, implicitly soing the convergence bookkeeping necessary in the background. As you work with it, schism will maintain node and timestamp information, with the goal of convergence providing the same result as if all previous invocations had occurred on one local collection in memory. Schism's requirements are your expectations of a Clojure collection, so hash, =, and support for meta are all included, as well as many other functions defined against Clojure's own collections. Particularly with hash and =, you can expect a schism collection to return the same hashcode as its Clojure equivalent, and to determine equality in the same way as its Clojure equivalent would.

I've built schism in the absence of a direct itch to scratch with it. Should you find that it betrays your expectations of Clojure collections, I'd greatly appreciate a bug report and will work to correct schism quickly.

I hope that it assists you in solving your own problems.

-Alex

John Newman

unread,
Apr 18, 2018, 9:39:45 PM4/18/18
to Clojure
Wow, this is really cool!

Could you describe how one might go about using this to implement, say, a crdt chat room using maps and sets? Can I for instance just push crdts between clients in a serverless fashion?

Perhaps some examples would be instructive, but I've been hoping for something just like this recently, so thanks!

V/r

John

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Luke Burton

unread,
Apr 18, 2018, 9:55:36 PM4/18/18
to Clojure

This is cool! 

There seems to be some room for a recommendation on how to apply these data structures effectively in practice, especially around the distribution mechanism. For instance, I can imagine storing these in an atom and using watchers to push changes out over zero-mq. Or is that the Worst Idea Ever™? I feel like the transit layer probably has *some* impact on whether CRDTs are the right solution to the problem, and I'm interested to know what those boundaries are.

Have you used them in production over the network, and if so, what solution did you use?

Thanks again!



Alex Redington

unread,
Apr 19, 2018, 8:55:46 AM4/19/18
to Clojure
I'll try to answer this question and John's at the same time.

Schism does not try to manage state for you over time; there are a lot of good tools for doing that already (atoms, refs, agents, channels, etc). Schism is a set of collections that augment basic clojure collections with enough additional information that you can take two persistent collections with common ancestry and merge them together. So, you could take a schism map held in an atom, dereference it and send it from a pedestal process to an om application as edn, and read-string it in the om app to have a replicated copy of the map. The om app could make some changes (dissoc one key, update another) without communicating to the pedestal app for each operation. Then send it back to the pedestal app in a POST body, and the pedestal app would swap! the atom using schism.core/converge and the value it just received from om. The changes, both additive and destructive, would be replicated in the atom's value.

So, yes, you could build a serverless chatroom where each client held an atom with a schism collection, and they sent their copies around using WebRTC to stay in sync. You could push a schism edn serialization over a message queue to be synchronized on the other end. I anticipate pushing each discrete state over the mq with an add-watch hook might yield bad performance for little semantic gain. Schism collections are necessarily more expensive than Clojure collections in serialization size, and synchronization is where all of the signficant computational expense of these kinds of data structures resides, so doing it less frequently than every discrete update will probably be best. (A debounced send & sync that guaranteed transmission after n milliseconds of being inert would make sense to me)

I'm not presently confronting a problem that these solve in my day job; my intent was to build and have the tool ready if and when we wanted to have better answers for "offline sync" in a single page ClojureScript application.

Thank you for the great questions!

-Alex

John Newman

unread,
Apr 19, 2018, 1:46:41 PM4/19/18
to Clojure
Alex, yeah that explains it for me. I'll probably want to use a fully connected address space, via a mesh or a hub and spoke topology, that I fully manage - so this is the perfect level of abstraction for me. I'd like to see that ZQ implementation though.

So, to be clear, if node A sends schism-map M to node B, both A and B update M, B sends M back to A, A converges M1 and M2: If M1 and M2 have destructive conflicts, does the most "recent" change win? Or does the local copy win?

Thanks,

V/r

John

John


For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.

Alex Redington

unread,
Apr 19, 2018, 7:19:20 PM4/19/18
to Clojure
Another good question, and the answer is "it depends". As a guiding principle for the expectations of this abstraction, convergence should yield a collection as if it had all of the operations applied to it in the order that they were applied, locally. So let's work with M, and nodes A and B.

A creates M as a simple map {:foo true} and replicates it to B. A has M ({:foo true}), and B has M' ({:foo true}).
A performs (assoc M :bar false}. A has M ({:foo true, :bar false}) and B has M' ({:foo true}).
B performs (dissoc M' :foo). A has M ({:foo true, :bar false}) and B has M' ({}).
B sends M' to A, and A synchronizes it. A has M ({:bar false}). B has M' ({}).
A sends M to B, and B synchronizes it. A has M ({:bar false}). B has M' ({:bar false}).

That's a simple case because A and B are modifying different keys and converge to the same value without contention over those keys. But what if A had done (assoc M :foo false) instead? There's still a synchronization conflict between the two, but the key :foo has been put in contention between A and B. In schism, this is resolved by the last writer winning, because that is what would happen on an isolated node working with a data structure locally.
Reply all
Reply to author
Forward
0 new messages