We've just started digging into Chicago Boss and are loving it so far.
From my reading of the docs and code, CB uses global modules to
achieve "clustering", which means that the one master_node is a
potential single point of failure. My first idea was to change this
behaviour from a normal gen_server to gen_leader. Since I've just done
a couple of projects with gen_leader, I have fairly good insight into
it. But after spending a good 30 minutes converting, I felt that it
might be the wrong way to go. Here's a modified session controller
that handles creation of sessions on all nodes in a cluster:
https://github.com/bipthelin/ChicagoBoss/commit/48eded3c7a03166aac45f...
This only works for new sessions and doesn't remove/update existing
sessions, so before I code anything more I'd like to discuss some
design decisions, since I don't (yet) have any deep knowledge of the
architecture.
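
To make the idea concrete, here is a minimal sketch of one way to replicate session writes to every node. It is not the code from the commit linked above and not Chicago Boss's actual session API: the module name session_replica_sketch and the functions store/2, lookup/1 and remove/1 are hypothetical, and the only library calls used are from OTP's gen_server and maps modules. Each node is assumed to run one copy of the server under the same local name.

```erlang
-module(session_replica_sketch).
-behaviour(gen_server).

%% Hypothetical module for illustration only; not part of Chicago Boss.
-export([start_link/0, store/2, lookup/1, remove/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Start the local replica under a locally registered name.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Writes are broadcast to this node and to all connected nodes.
%% abcast is fire-and-forget, so a node that is down misses the update.
store(SessionId, Data) ->
    gen_server:abcast([node() | nodes()], ?MODULE, {store, SessionId, Data}),
    ok.

%% Removals are broadcast the same way as writes.
remove(SessionId) ->
    gen_server:abcast([node() | nodes()], ?MODULE, {remove, SessionId}),
    ok.

%% Reads are answered by the local replica only.
lookup(SessionId) ->
    gen_server:call(?MODULE, {lookup, SessionId}).

init([]) ->
    {ok, #{}}.

handle_call({lookup, SessionId}, _From, Sessions) ->
    {reply, maps:find(SessionId, Sessions), Sessions}.

handle_cast({store, SessionId, Data}, Sessions) ->
    {noreply, maps:put(SessionId, Data, Sessions)};
handle_cast({remove, SessionId}, Sessions) ->
    {noreply, maps:remove(SessionId, Sessions)}.

handle_info(_Msg, Sessions) ->
    {noreply, Sessions}.
```

This covers create, update and remove, but it has no catch-up for a node that joins late or was temporarily unreachable, which is exactly the kind of gap that makes a hand-rolled approach questionable.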
My gut feeling is that there should be a more generic approach to
clustering instead of handling it separately in every place: some
gen_leader here, some mnesia there, etc.
If anyone with more insight into the code (Evan?) can share some of
their ideas/visions for clustering in CB, I'd be glad to chip in my
0.5c and the corresponding code.
I've never used gen_leader so I can't say whether it's a good fit. In general I think this is a hard problem because for true clustering we'll need replication of data (not just services). I think you've discovered the difficulty with session storage, but then we also need to worry about making BossMQ (the message queue) truly distributed and fault-tolerant. That sounds hard to me.
A more practical approach might be to cluster the master_node pure-computation services (e.g. incoming email), then for data services just interface to external applications that have already implemented fault tolerance. This is done for sessions (which can use memcached), and could be done for the message queue as well. As much as I like CB's batteries-included approach, I think it's best to farm out hard problems like data replication to other servers.
On Mon, Jan 9, 2012 at 12:38 PM, Bip Thelin <bip.the...@evolope.se> wrote:
> Hi,
Incidentally, "Prehistoric Boss" (circa 2008) used CouchDB exclusively. But I kept getting weird errors and gave up on my NoSQL/Erlang dreams until I discovered Tyrant the next year. Ah, memories.
Anyways, Couch is much more stable now and it'd be great to add it to the mix.
On Mon, Jan 9, 2012 at 8:26 PM, Dave Cottlehuber <d...@muse.net.nz> wrote:
> I'd love to add a couchdb backend for Boss and then point it at a
> bigcouch cluster :-)))
> The first part I hopefully will have time for in Feb.
> Erlang FTW.
> On 10 January 2012 02:53, Evan Miller <emmil...@gmail.com> wrote:
Good, this was the kind of response I was looking for. A few notes on gen_leader (a leader-election behaviour; it is sometimes lumped together with Paxos, but it is not a Paxos implementation): it's an Erlang behaviour where a cluster of services can dispatch messages to an elected leader, and that leader can dispatch messages to all workers (i.e. gen_leader processes that are not the elected leader). If the elected leader goes down, a new leader is automatically elected and takes over its responsibilities. It's a simple and elegant solution to the problem where you have a bunch of services but at any given time want only one of them performing some task, like sending mail. In my opinion it's a perfect candidate for a master_node setup.
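
The "only one of them acts, and another takes over on failure" idea can be sketched without gen_leader itself. The example below is a deliberately simpler stand-in built on OTP's global module and process monitors; it is not gen_leader's API or its election algorithm, and the module name leader_failover_sketch, the registered name and the functions start/0 and role/1 are all hypothetical. Each candidate tries to claim a global name; whoever succeeds is the leader, and the rest monitor the leader and campaign again when it dies.

```erlang
-module(leader_failover_sketch).

%% Hypothetical module for illustration only; this is NOT gen_leader.
-export([start/0, role/1]).

-define(NAME, leader_failover_sketch_leader).

%% Spawn one candidate process (one per node in a real cluster).
start() ->
    spawn(fun campaign/0).

%% Ask a candidate for its current role: leader | follower.
role(Pid) ->
    Ref = make_ref(),
    Pid ! {role, self(), Ref},
    receive
        {Ref, Role} -> Role
    after 1000 ->
        timeout
    end.

%% Try to claim the global name; the winner becomes the leader.
campaign() ->
    case global:register_name(?NAME, self()) of
        yes -> loop(leader, undefined);
        no  -> follow()
    end.

%% Monitor the current leader so we notice when it goes down.
follow() ->
    case global:whereis_name(?NAME) of
        undefined ->
            campaign();
        Leader ->
            MRef = erlang:monitor(process, Leader),
            loop(follower, MRef)
    end.

loop(Role, MRef) ->
    receive
        {role, From, Ref} ->
            From ! {Ref, Role},
            loop(Role, MRef);
        {'DOWN', MRef, process, _Pid, _Reason} when Role =:= follower ->
            %% Give global a moment to drop the dead registration,
            %% then try to take over.
            timer:sleep(100),
            campaign()
    end.
```

In a real setup the leader branch would be where the singleton work (sending mail, for example) happens. What this stand-in lacks compared to gen_leader is a proper election protocol and the leader-to-workers dispatch described above.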
It's not as good a fit for clustering if you want a truly horizontal approach like memcached or Riak (the latter with its "gossip" protocol).
This is where I stopped. I started with a gen_leader approach but felt halfway through that a true clustering approach would be more suitable.
I agree that one (pretty good) approach is to use external applications like memcached. My biggest gripe with this is ending up with dependencies on a bunch of different servers/applications, the hassle of configuring and running all of them, and the eventual cyclic dependency hell. I'm not saying that this is where one always ends up, but I just had a rather unpleasant experience with Scribe (log transport for Hadoop), which ended with us rolling our own (https://github.com/bipthelin/zerolog).
There is a fine line between keeping it simple to set up, use and maintain, and ending up rolling your own Riak. I like the "batteries-included approach", so I'll do some research and see what I come up with. On another note, gen_leader might be a good addition to some other parts of CB, just not as a distributed k/v store.
-- Bip Thelin
On Tue, Jan 10, 2012 at 2:01 AM, Dave Cottlehuber <d...@muse.net.nz> wrote:
> On 10 January 2012 03:44, Evan Miller <emmil...@gmail.com> wrote:
>> Incidentally, "Prehistoric Boss" (circa 2008) used CouchDB
>> exclusively. But I kept getting weird errors and gave up on my
>> NoSQL/Erlang dreams until I discovered Tyrant the next year. Ah,
>> memories.
> Any code remnants??
Sure -- after minutes of digging under the hot Illinois sun, I found a partial skeleton: