Advice concerning Elixir/OTP for Distributed Cron system


Dan Carpenter

May 25, 2016, 5:40:16 PM
to elixir-lang-talk

So I am curious about using Elixir to build a distributed cron system. Our platform runs user-defined “flows” across a variety of IoT devices/services (think Nest, SmartThings, Lifx, Fitbit, etc.) as well as digital services (Twitter, Facebook, etc.). Here is an example: when it’s 9am, or when I turn my car on at the house, dim my lights and turn down the thermostat, but only if the outside temperature is below 60; otherwise leave the thermostat at its current level.


We need to keep track of these time-sensitive “jobs”, and we do so by having our data router send the cron job to a “scheduler” node whenever one of our brokers sends in a new request (each integration, e.g. Nest, Facebook, etc., has its own broker or group of brokers). The scheduler node schedules it using node-crontab, storing basically a snapshot of the data it will send to the processing engine. We currently have thousands of jobs per node in memory; unfortunately, when a node goes down, so do its jobs. When a cron job needs to run, the in-memory payload is sent off for processing and execution. We are trying to think through a way to have another node take over a failed node's jobs (starting with how a node, or group of scheduler nodes, gets notified that this has happened and which node it happened to) without having to scan all the defined jobs in the central k/v store (Redis) and pull out all the jobs belonging to the failed node (when jobs come in, part of the job key is the hash of the scheduler node the job was sent to).
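To illustrate that key scheme (purely a made-up sketch, not our real code):

    defmodule JobKeys do
      # Hypothetical illustration of the scheme described above: the Redis
      # key embeds a hash of the owning scheduler node, so finding a dead
      # node's jobs currently means scanning for every key carrying that
      # node's hash.
      def key(job_id, scheduler_node) do
        "jobs:#{:erlang.phash2(scheduler_node)}:#{job_id}"
      end
    end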


A couple of people are leaning towards a Zookeeper master/slave system to solve the notification problem, but we are still faced with how to quickly have another node take over a failed node's jobs. Anyway, another person and I have deployed Elixir for a few semi-critical services, but nothing like what we might need here. Erlang/OTP/Elixir seem a perfect fit: perhaps each node keeps its jobs in a local Agent while also writing them to Mnesia, or perhaps a cluster of nodes keeps track of all jobs in Mnesia so that when a node goes down, another one can take over its jobs by grabbing them from Mnesia and writing them into its own cron. I don't have much experience with these kinds of systems, but it seems like a natural fit for Elixir and the OTP model. Any advice or guidance would be very welcome.
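To make that concrete, here is the rough shape of the Mnesia idea I'm picturing. Purely a sketch: module, table, and field names are made up, and schema creation and table replication setup are glossed over.

    defmodule CronStore do
      # Sketch only: jobs live in a Mnesia table replicated across the
      # scheduler nodes, keyed by job id and tagged with the owning node.
      # Assumes :mnesia.create_schema/1 and :mnesia.start/0 already ran.

      @table :cron_jobs

      def create_table(nodes) do
        :mnesia.create_table(@table,
          attributes: [:id, :owner_node, :schedule, :payload],
          disc_copies: nodes
        )
      end

      def put(id, schedule, payload) do
        :mnesia.transaction(fn ->
          :mnesia.write({@table, id, node(), schedule, payload})
        end)
      end

      # After a node dies, a survivor rewrites the dead node's rows as
      # its own and can then feed them into its local cron.
      def take_over(failed_node) do
        :mnesia.transaction(fn ->
          for {@table, id, ^failed_node, schedule, payload} <-
                :mnesia.match_object({@table, :_, failed_node, :_, :_}) do
            :mnesia.write({@table, id, node(), schedule, payload})
          end
        end)
      end
    end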


For an example, here is something kind of like what we want, written in Go: http://dkron.io/ (though as far as I can tell it lacks robust failover mechanisms).


Thanks,


Dan 

Ed W

May 26, 2016, 7:38:05 AM
to elixir-l...@googlegroups.com
On 25/05/2016 22:40, Dan Carpenter wrote:

So I am curious about using Elixir to build a distributed cron system.

...


We currently have thousands of jobs per node in memory; unfortunately, when a node goes down, so do its jobs. When a cron job needs to run, the in-memory payload is sent off for processing and execution. We are trying to think through a way to have another node take over a failed node's jobs


A couple of people are leaning towards a Zookeeper master/slave system to solve the notification problem, but we are still faced with how to quickly have another node take over a failed node's jobs.


So, the use case gets in the way a little here and makes this sound more complicated than it needs to. Yet the underlying problem actually is incredibly complex, subtle, and hard to get right...

So, competing goals:
- Single point of truth, because you don't want two nodes running the same job (i.e. we want each job to run at least once, preferably at most once; the second can't be guaranteed, hence make jobs idempotent where possible: see the sketch after this list)
- Distributed truth, in case our single point of knowledge dies...
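(To illustrate the idempotency point with a minimal, made-up sketch: key each execution by job id plus scheduled tick and refuse to run a key twice, so an at-least-once delivery of the trigger does no extra harm.)

    defmodule IdempotentRun do
      # Sketch with invented names: `seen` is a set of already-executed
      # {job_id, tick} keys; re-delivery of the same tick is a no-op.
      def run(seen, job_id, tick, fun) do
        key = {job_id, tick}

        if MapSet.member?(seen, key) do
          {seen, :already_ran}
        else
          {MapSet.put(seen, key), fun.()}
        end
      end
    end

    # IdempotentRun.run(MapSet.new(), "job-1", "2016-05-26T09:00", fn -> :dim_lights end)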

So, you need a distributed log with ordered events:

- The cheat's way is to slap Redis in and hope no one notices the single point of failure

- A better way is to build a distributed consensus service yourself (hard: it's easy enough to get the theoretical algorithm right, but in practice implementations break in the real world)

- The alternative is to grab someone else's already-debugged system, e.g. Zookeeper


Now, grab some papers on Zookeeper, Paxos and Raft. Fascinating stuff, and provably these algorithms let you create a distributed consensus on some "truth". However, there are two problems:

1) It's expensive to do this for every decision...

2) It's extremely complicated to get the implementation correct...

One way that seems common to solve 1) is to use the consensus ONLY to elect a single point of truth, then stick with that single point until "something happens" (tm), and then elect a new single point of truth.
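(For flavour: stock OTP lets you fake this cheaply with :global name registration; whoever wins the name is the leader, and the rest monitor it and re-race when it dies. Made-up sketch, and note that :global is emphatically NOT partition-safe, which is the whole point above.)

    defmodule Leader do
      use GenServer

      # Naive leader election: whichever node registers :cron_leader
      # first wins; losers monitor the winner and re-run the election
      # when it dies. There is a small race if the leader dies between
      # registration failing and the monitor attaching, but the
      # immediate :DOWN/:noproc message re-triggers the election.

      def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

      def init(:ok) do
        send(self(), :elect)
        {:ok, %{leader?: false}}
      end

      def handle_info(:elect, state) do
        case :global.register_name(:cron_leader, self()) do
          :yes ->
            {:noreply, %{state | leader?: true}}

          :no ->
            Process.monitor(:global.whereis_name(:cron_leader))
            {:noreply, %{state | leader?: false}}
        end
      end

      def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
        send(self(), :elect)
        {:noreply, state}
      end
    end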

An incremental improvement: to avoid the "single point" becoming a bottleneck, you can use a consistent hash (insert favourite technique here) to split the load across multiple processes. However, this is really just the same thing. It's like saying the server over there handles everything starting with "a" and the one over there handles "b", etc.; really that's no different from saying "that server does everything", you've just partitioned the problem so that "everything" has a smaller scope. (See the one-line sketch below.)
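(The dumbest possible version of that split takes one line of Erlang/Elixir; made-up sketch below. Caveat: this is a plain modulo hash, and a real consistent-hash ring reshuffles far fewer keys when membership changes.)

    defmodule Partitioner do
      # Every node computes the same owner for a given job id, provided
      # they agree on the node list; sorting makes the list order stable.
      def owner(job_id, nodes) when nodes != [] do
        Enum.at(Enum.sort(nodes), :erlang.phash2(job_id, length(nodes)))
      end
    end

    # Partitioner.owner("job-42", Node.list([:this, :visible]))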


So many people use Zookeeper (et al.) as a way of electing a leader (a single point of failure) for all decisions, or a partition of them, and then that leader ensures decisions are serialised and acknowledged appropriately.

There are lots of interesting ways to partition the load; for example, "rings" are quite popular at the moment (see Riak).


Of course, if that all sounds complex... I guess it is. Most people cheat and claim all kinds of perfection, but if you really, really care about stuff working in the presence of failure, then you HAVE to do something like the above. Anything else will work "most of the time", which may be good enough (?), but at least understand why being properly robust is different and hard...

This link should be fascinating if you really want to see whether you are safe in the face of failure!
    https://aphyr.com/tags/Jepsen

Good luck!

Ed W

Ed W

May 26, 2016, 7:45:37 AM
to elixir-l...@googlegroups.com

A couple of people are leaning towards a Zookeeper master/slave system to solve the notification problem, but we are still faced with how to quickly have another node take over a failed node's jobs.


Practical quick suggestions.

- Riak can operate in a "strongly consistent" mode. This is a two-phase commit which should give you guaranteed consistency (though you can probably use the ordinary mode if you are careful)
- Kafka is a very fast messaging/log service with ordered commits
- You can use Zookeeper to implement something yourself (probably just elect leaders and run your own database, re-electing if something falls over)
- There are a few Raft implementations, and the odd Paxos one, in Erlang/Elixir. It would be extremely interesting to see one fixed up and developed into an Erlang Zookeeper-alike...
- Possibly Phoenix Presence is robust enough for your needs. I believe you would currently need to look carefully at its partition tolerance; I think it makes no consistency guarantees there. It keeps running if nodes become partitioned, and I believe the research is on using CRDTs to achieve eventual consistency once the split is healed. I guess you could look at adding features to enforce certain levels of consistency on top of that, as your needs require?

Good luck

Ed W

Dan Carpenter

May 26, 2016, 12:38:32 PM
to elixir-lang-talk
Ed,

Thanks for all of the suggestions. Looks like some more reading is in order.

Adam Keating

May 28, 2016, 10:28:09 PM
to elixir-lang-talk
Hi Dan, 
I watched Chris McCord's presentation at ElixirConfEU this morning. There are links to the slides and video here: http://www.elixirconf.eu/elixirconf2016/chris-mccord. The section you might be interested in is the replication of state (presence) via CRDTs. Perhaps all nodes should get the details of all jobs, with some rule for grabbing jobs from failed nodes (see the sketch below). Of course, the node might come back from a network partition and try to reacquire its jobs... An interesting problem indeed, but likely a solved one. Good luck.
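For the "rule for grabbing jobs", plain OTP node monitoring might be a starting point. A made-up sketch (both the tie-break deciding which survivor claims the jobs and the persistent store it reads from are hand-waved; and as noted above, under a partition both sides will claim, so a returning node will want its jobs back):

    defmodule FailoverWatcher do
      use GenServer

      def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

      def init(:ok) do
        # Subscribe to {:nodeup, node} / {:nodedown, node} messages.
        :net_kernel.monitor_nodes(true)
        {:ok, %{}}
      end

      def handle_info({:nodedown, failed}, state) do
        # Crude tie-break: the lowest-named survivor claims the jobs.
        survivors = Enum.sort([node() | Node.list()])

        if hd(survivors) == node() do
          # Hypothetical store holding the dead node's job definitions.
          CronStore.take_over(failed)
        end

        {:noreply, state}
      end

      def handle_info({:nodeup, _node}, state), do: {:noreply, state}
    end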
Adam