How to implement a distributed and concurrent system in Clojure?


Hussein B.

Jul 3, 2013, 5:26:53 AM
to clo...@googlegroups.com
Hi,

I recently read that Clojure's concurrency tools make it easy to implement a highly concurrent system, but only on a single machine.

But how do I implement a highly concurrent system that runs on multiple machines?

Erlang, Elixir and Scala have the Actor model.

Please correct me if I'm wrong.

Thanks for your help and time.

Joseph Guhlin

Jul 3, 2013, 9:27:08 AM
to clo...@googlegroups.com
I'd like to hear others' opinions on this too. I don't believe Clojure has anything built in at this point. My plan of action (not yet implemented) is to use Gearman (possibly the Java implementation, although it no longer seems to be updated) and zeroconf for clusters (just for automatic master determination).

I know there is support for Hadoop in Clojure as well, which does not fit my needs but may fit yours. A quick Google search will get you started.

Immutant has support for clustering too, but I believe it requires Leiningen to start, whereas I need to compile everything into a single jar. Under the hood, Immutant uses JBoss' message queue, so that may be an option to explore as well.

I'm curious what others are doing.

Best,
--Joseph

Softaddicts

Jul 3, 2013, 10:18:32 AM
to clo...@googlegroups.com
clj-zookeeper + avout. We run our solution on clusters of small nodes, so we needed
a lightweight solution. We implemented cluster queues and use avout locking.
Our configuration is also stored in ZooKeeper as Clojure expressions.

We isolated this in a coordinator module so nothing spills out into the rest of the code.
We could swap this for another alternative with minimal headaches, but so far it scales
pretty well.

For inter-cluster exchanges we are using ZeroMQ, but there we do not need the same
tight integration as within a cluster.

Luc P.
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
--
Softaddicts<lprefo...@softaddicts.ca> sent by ibisMail from my ipad!

Philippe Guillebert

Jul 4, 2013, 4:21:03 AM
to clo...@googlegroups.com

Hi

I'm surprised nobody has mentioned Storm yet. It's massively scalable real-time computing, using Clojure.

David Pollak

Jul 4, 2013, 5:39:46 AM
to clo...@googlegroups.com
Please keep in mind that Scala's "Actor Model" is a very thin piece of code that is not inherently distributed.

There are a ton of issues in Scala related to crossing address spaces.

Scala is not nearly as biased to immutability as Clojure. Sure, there are case classes, but case classes can easily contain mutable data. When I wrote Goat Rodeo (http://goatrodeo.org), I wrote a compiler plugin that guaranteed immutability and serializability of the data structures used for Goat Rodeo's Actor-based messaging... and this led to the second issue...

Having a class-based design means that one has to deal with serializing/deserializing class-based data structures. This is a huge problem. It means that the inter-process data structures must contain class signatures... and for distributed systems that are going to have 100% uptime, that means version and class signatures so that a message sent from a version 1 system can be deserialized on a version 2 system even if the class has changed.

Long story short... Akka, the only popular distributed system in Scala, is marginally better than RMI/J/EE, so for enterprise Java shops, it's great (much like Spring was). But it's not something to aspire to.

If I get some time, I'll work on a distributed version of core.async. The only real challenge I can see is marshalling a Channel identifier across address spaces. Everything else should be a walk in the park.







Luca Antiga

Jul 4, 2013, 3:42:54 PM
to clo...@googlegroups.com
Not that I'd recommend using it in production, but I experimented with distributed reference types on top of Redis some time ago: https://github.com/lantiga/exoref

Several aspects are very rough, e.g. when the connection to Redis is lost or when it comes to reusing a key.
Overall it's probably naive, but it may be a starting point.

Luca

09goral

Jul 18, 2015, 6:52:20 PM
to clo...@googlegroups.com
David,

Has your opinion on Akka changed since 2013, now that you have seen its progress? I am very interested in your view.

Regards,
Mateusz

Stuart Sierra

Jul 19, 2015, 11:13:50 AM
to clo...@googlegroups.com, hubag...@gmail.com
This is an old thread, but it showed up in my Google Groups so I figured I would give an answer.

I have worked on fairly large (10-50 machines) distributed systems written entirely in Clojure.

The language itself doesn't provide an explicit mechanism for communication between machines, so you just have to pick an existing tool or technique that suits your application architecture. There are many possibilities to choose from, all with different strengths and weaknesses:

1. Direct connections between nodes, e.g. HTTP, raw sockets, HornetQ
2. Message broker, e.g. ActiveMQ, RabbitMQ
3. Distributed shared memory, e.g. ZooKeeper, Hazelcast, memcached
4. Distributed job control, e.g. Hadoop, Storm

You can end up implementing something that looks very much like the Actor Model using these components. But you have other options as well.

Where Clojure helps you in designing these systems is its focus on generic, immutable data structures and pure functions.

Clojure's data structures are trivially serializable (EDN, Transit, Fressian). When all the data in your application can be represented by Clojure's generic data structures, it is easy to distribute work or data across multiple machines.
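
As a minimal sketch of that round trip, assuming nothing beyond clojure.core and clojure.edn (the `job` map and its keys are hypothetical, and Transit and Fressian would need their own libraries):

```clojure
(require '[clojure.edn :as edn])

;; A hypothetical job message built only from generic Clojure data.
(def job {:id   42
          :type :resize-image
          :args ["in.png" {:width 640 :height 480}]
          :tags #{:batch :low-priority}})

;; Serialize to an EDN string; this is what would go over the wire.
(def wire (pr-str job))

;; On the receiving machine, read it back with the EDN reader,
;; which parses data only and does not evaluate code.
(def received (edn/read-string wire))

;; Value equality survives the round trip.
(= job received)
```

No classes or class signatures are involved: the receiver only needs an EDN reader, not the sender's code.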

When functions are stateless, it is less important "where" they are executed. When functions (or declarative expressions, e.g. database queries) can be expressed as data structures, they are easier to compose and distribute.
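
One way to picture "functions as data" is the sketch below. It assumes a registry of pure functions that every node shares; the names `ops` and `run-task` and the task shape are hypothetical, not taken from any particular library:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical registry of pure functions, identical on every node.
(def ops {:sum   (fn [xs] (reduce + xs))
          :count (fn [xs] (count xs))})

;; Run a task that is described purely as data. An unknown :op throws
;; rather than being silently ignored.
(defn run-task [{:keys [op args]}]
  (if-let [f (get ops op)]
    (f args)
    (throw (ex-info "Unknown op" {:op op}))))

;; A task as it might arrive from another machine, as an EDN string.
(def task (edn/read-string "{:op :sum :args [1 2 3]}"))

(run-task task) ;=> 6
```

Because the task is plain data and the functions are stateless, any node holding the same registry could run it and get the same result.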

–S

09goral

Jul 19, 2015, 11:40:53 AM
to clo...@googlegroups.com
Thanks, Stuart, for your answer; it is very helpful. Would you choose Clojure again?




--
Regards,
Mateusz Górski

Stuart Sierra

Jul 19, 2015, 11:54:56 AM
to clo...@googlegroups.com, gor...@gmail.com
Absolutely would use again. But I'm biased towards Clojure already. :)
–S



Colin Yates

Jul 19, 2015, 12:24:43 PM
to clo...@googlegroups.com
I don’t have anything to add at that scale, but I wanted to echo Stuart’s comment about the serialisability of EDN. Moving EDN between the front and back tiers of our app has cut down a bunch of boilerplate. That principle can scale across machines as well.


Devin Walters

Jul 19, 2015, 12:38:09 PM
to clo...@googlegroups.com
http://docs.paralleluniverse.co/pulsar/ is out there. I can't say I've used it in anger, but I did enjoy experimenting with it.

Christopher Small

Jul 19, 2015, 1:11:24 PM
to clo...@googlegroups.com

I'll also add that if you're interested in the Storm model (distributed stream processing), you may want to check out Onyx (https://github.com/onyx-platform/onyx). It's newer, but I have a feeling that moving forward we're going to see it take a dominant position as far as that flavor of distributed computing goes. The reasons for this (briefly):

* Data all the topologies: Because computational topologies are data structures, it's far simpler to build dynamic computational workflows than with Storm's opaque macros.
* It's more Clojure-centric: Parts of Storm are written in Clojure, but judging by the documentation, the Clojure API seems almost an afterthought. (However, support is on the way for other JVM languages.)
* Pace of development: Storm is now an Apache Incubator project, and development is slower than molasses. I was working on a project for over a year and many times encountered issues with stale dependencies that didn't play nicely with newer things.
* Designed with batch processing in mind: Even though both are designed around streaming first, Onyx has a little more explicit support for batched computations.

For a while, the biggest advantage of Storm was that it was a lot faster, but the gap there has closed considerably recently (perhaps entirely?). Storm is still more battle-tested, but there are folks starting to use Onyx in production.

The biggest reason I see Onyx gaining sway over Storm is the data-centric approach. I see this resonating with the community's "data all the things" philosophy as a way of maximizing composability and productivity. Anecdotally, folks using Onyx are already saying this about it.

My 2 c

Chris

Derek Troy-West

Jul 19, 2015, 8:52:44 PM
to clo...@googlegroups.com
We use Storm/Trident fairly extensively for distributed computation. It has not been painless and the documentation is poor; however, it does perform well once you understand its peculiarities. I'm keeping an interested eye on Onyx, but my bandwidth is fairly limited at the moment.

The only things I would add to your post are:

* It's possible to write Storm and Trident topologies purely in Clojure; in fact, parts of Storm were originally written in Clojure. I'm not sure the DSLs were an afterthought, but I agree they are fairly impenetrable at first. Storm has a steep learning curve and lots of quirks, and the documentation is pretty bad (in fact, completely wrong in parts).

* Storm is currently limited to Clojure 1.5.1, though the latest beta release has been updated to 1.6.0.

* Recently the momentum of the project appears to have picked up, but certainly for a while there (just after Apache incubation) it appeared a bit stagnant. I presume they were busy settling the project in.

Matan

Jun 30, 2017, 3:53:58 AM
to Clojure
This thread is a bit old, but it has seen two-year awakenings before... and it still shows up high in Google searches.
I wonder how you folks see it now. Onyx is a bit older by now, and not much seems new out there. What would you choose for reliable and resilient distributed computing with Clojure if you had to start over at this time? In particular, for a message-driven architecture, not a streaming one.

Christopher Small

Jun 30, 2017, 12:15:58 PM
to clo...@googlegroups.com
I'd have to know a little more about the specifics of your message-driven architecture to say for sure, but in general I'd still highly recommend Onyx over Storm. In the last two years the Distributed Masonry crew have done a TON of fabulous work on Onyx, including closing the performance gap (maybe someone can chime in here about the current state of that gap) by switching to Aeron for message passing (the old version used HornetQ, I believe). It's also significantly more battle-tested now, and the team just recently got $500k for building out tooling and services around Onyx. There are also quite a few new features, like improved aggregation support, and so on. They also implemented a runtime for Onyx jobs in cljc, meaning you can test out jobs locally without all the overhead of firing up ZooKeeper! EVEN IN THE BROWSER! Imagine that: dynamically building and testing (with small data) a distributed system in the browser via some domain-specific UI, and then pushing it to the cluster when you're ready to let her rip... Nothing like this exists for Storm or any of the other competitors in this space. I've even heard some dashing and enterprising group of fellows are considering building a re-frame-esque event-driven UI system using Onyx. Who knows? The future is bright.

Chris

PS: Conflict of Interest Disclaimer: I have received no moneys or favors from Distributed Masonry for frequent and profuse recommendation of their products. They're just that cool.




Bobby Calderwood

Jun 30, 2017, 7:38:08 PM
to Clojure
Onyx is super cool, has matured substantially, and has a great team behind it.

I've also had success building with Kafka (and its ecosystem libraries Kafka Connect and Kafka Streams) and Datomic.

Cheers,
Bobby

Derek Troy-West

Jul 1, 2017, 1:25:21 AM
to Clojure
I still have Storm topologies in prod, but I'm investigating Kafka Streams and Onyx right now.

Christopher Small

Jul 1, 2017, 3:46:13 AM
to clo...@googlegroups.com
There's an onyx-kafka plugin, I believe, so you should be in luck!
