Replication

Pierre-Yves Kerembellec

Nov 7, 2009, 1:10:55 PM
to beanstalk-talk
Hi there,

We would like to implement (or have implemented) replication between
at least 2 distinct physical beanstalkd servers, so that
a beanstalkd server does not become an architectural SPOF. Do you know
if and when this feature would be implemented? Are you interested in
external contributions to add this capability?

Thanks for your reply,
Pierre-Yves / Dailymotion

Keith Rarick

Nov 9, 2009, 12:42:21 AM
to beansta...@googlegroups.com
On Sat, Nov 7, 2009 at 10:10 AM, Pierre-Yves Kerembellec
<py.kere...@gmail.com> wrote:
> We would like to implement (or have implemented) replication between
> at least 2 distinct physical beanstalkd servers, so that
> a beanstalkd server does not become an architectural SPOF. Do you know
> if and when this feature would be implemented? Are you interested in
> external contributions to add this capability?

Yes! Some coworkers discussed this with me long ago, but we never got
a chance to do it. I don't know when I will have time to do it myself;
if you want to implement it I will do whatever I can to help.

We can discuss the design here on the list. Here are some
(underdeveloped) ideas from my old discussions:

 * peer-peer, not master-slave
 * built-in eventual consistency
    - in a general-purpose store, the client has to resolve inconsistencies
    - but hopefully beanstalkd can resolve inconsistencies automatically
 * introduce the feature a piece at a time rather than one huge change
 * maybe embed the replication protocol in the existing protocol

kr

Pierre-Yves Kerembellec

Nov 9, 2009, 12:49:09 PM
to beanstalk-talk
On 9 nov, 06:42, Keith Rarick <k...@xph.us> wrote:
> Yes! Some coworkers discussed this with me long ago, but we never got
> a chance to do it. I don't know when I will have time to do it myself;
> if you want to implement it I will do whatever I can to help.
>
> We can discuss the design here on the list. Here are some
> (underdeveloped) ideas from my old discussions:
>
>  * peer-peer, not master-slave
>  * built-in eventual consistency
>     - in a general-purpose store, the client has to resolve inconsistencies
>     - but hopefully beanstalkd can resolve inconsistencies automatically
>  * introduce the feature a piece at a time rather than one huge change
>  * maybe embed the replication protocol in the existing protocol

I've read some of the past discussions and I understand the different
points. The idea I was pursuing was not to enable fully consistent
cross-replication across multiple servers: I guess that type of
scalability would be achieved by sharding producers and consumers over
multiple beanstalkd instances that do not really talk to each other.

I was thinking more about some kind of "remote binlog" capability, in
a master/slave configuration, where a "master" server would provide a
stream of its modifications to a "slave" server, so that a catastrophic
failure of the master (not at the beanstalkd software level but at the
physical machine level) would lead to clients switching to the slave,
with little to no job loss (depending on the replication and network
speeds between the master and the slave).
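
To make the "remote binlog" idea concrete, here is a minimal in-memory sketch. It assumes the master numbers every modification with a sequence number and the slave pulls whatever it has not applied yet. The class and method names (`Master`, `Slave`, `records_after`, `pull`) are hypothetical and are not part of beanstalkd.

```python
class Master:
    """Hypothetical master: keeps an ordered log of its modifications."""

    def __init__(self):
        self.log = []  # list of (seq, record); seq starts at 1

    def write(self, record):
        seq = len(self.log) + 1
        self.log.append((seq, record))
        return seq

    def records_after(self, seq):
        # The record with sequence number s lives at index s - 1,
        # so everything after seq starts at index seq.
        return self.log[seq:]


class Slave:
    """Hypothetical slave: applies the master's records in order."""

    def __init__(self):
        self.applied = []
        self.last_seq = 0

    def pull(self, master, limit=None):
        batch = master.records_after(self.last_seq)
        if limit is not None:
            batch = batch[:limit]
        for seq, record in batch:
            self.applied.append(record)
            self.last_seq = seq
        return len(batch)


def lag(master, slave):
    """Number of records that would be lost if the master died right now."""
    return len(master.log) - slave.last_seq
```

In this sketch the job loss on failover is exactly the replication lag at the moment the master dies, which is why it depends on the replication and network speeds.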

The "switch" would be handled by some sort of floating IP mechanism
(software VRRP like ucarpd, or quorum-based clustering like wackamole,
come to mind). This capability is quite mandatory in our forthcoming
architecture, since we rely heavily on beanstalkd to run multiple
workflows, and we cannot afford to lose the ongoing tasks in case of
hardware failure.

Most network "data" servers have some sort of master/slave replication
mechanism to address this particular issue (MySQL, MemcacheDB, Redis,
etc). I know this is not exactly what you had in mind (from what I've
read here), but this is an interesting feature for us (and we are
willing to implement it), so what's your position on this?

Thanks,
Pierre-Yves / Dailymotion

Keith Rarick

Nov 10, 2009, 4:58:00 AM
to beansta...@googlegroups.com
On Mon, Nov 9, 2009 at 9:49 AM, Pierre-Yves Kerembellec
<py.kere...@gmail.com> wrote:
> Most network "data" servers have some sort of master/slave replication
> mechanism to address this particular issue (MySQL, MemcacheDB, Redis,
> etc). I know this is not exactly what you had in mind (from what I've
> read here), but this is an interesting feature for us (and we are
> willing to implement it), so what's your position on this?

The design I was hoping for isn't really for scalability either, just
high availability.

The reason I prefer peer-peer 2-way replication is because "failover"
is ridiculously simple for clients and requires no external tools or
fancy routing.

I'm still very much in favor even if you really want to do it
master-slave. Either way this will be a great improvement.

kr

Dennis Krul

Nov 10, 2009, 5:55:30 AM
to beanstalk-talk
Pierre-Yves,

I just would like to say that this would be a very welcome addition to
beanstalk for us! (And I'm sure we are not the only ones out there.)

We also have the redundancy challenge. We don't really need to scale
out, beanstalk handles everything just fine, but in our architecture
it's currently also a SPOF. We made it 'sorta' HA with keepalived, but
when a failover occurs the jobs on the queue are lost. That's not
really that critical for our platform, but it would be nice if we
could avoid it. We are currently considering using the binlog with DRBD
underneath. But if you could implement this feature, that would be a
lot better and we can wait for that :)

Are you going to implement this? How long do you think you need? If
you need some help with testing we'd be more than willing to help.

Thanks

Dennis Krul


Silas

Nov 10, 2009, 11:54:42 AM
to beanstalk-talk
It seems like this could be done with the new persistence stuff in 1.4
and something like DRBD+Wackamole.

Jaume Sabater

Nov 11, 2009, 4:30:17 AM
to beansta...@googlegroups.com
On Tue, Nov 10, 2009 at 11:55 AM, Dennis Krul <d...@krul.nu> wrote:

> We also have the redundancy challenge. We don't really need to scale
> out, beanstalk handles everything just fine, but in our architecture
> it's currently also a SPOF. We made it 'sorta' HA with keepalived, but
> when a failover occurs the jobs on the queue are lost. That's not
> really that critical for our platform, but it would be nice if we
> could avoid it. We are currently considering using the binlog with DRBD
> underneath. But if you could implement this feature, that would be a
> lot better and we can wait for that :)

Binlog + DRBD is what I had in mind, too, although it's not critical
for us at work at the moment.

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"

Kuangwei Hwang

Jun 13, 2011, 3:53:11 PM
to beansta...@googlegroups.com
We need exactly the same replication and failover functionality.
We would love to use the replication feature and are wondering
what the status of this feature is.
If it has not been started, we could probably contribute to the code base and open-source it.


Keith Rarick

Jun 14, 2011, 2:06:19 AM
to beansta...@googlegroups.com
On Mon, Jun 13, 2011 at 12:53 PM, Kuangwei Hwang <khw...@betterworks.com> wrote:
> We need exactly the same replication and failover functionality.
> We would love to use the replication feature and are wondering
> what the status of this feature is.

As far as I know, no work has been done.

kr

Wil Moore

Jun 14, 2011, 3:46:52 PM
to beansta...@googlegroups.com
Would it not be of interest to think about standing on the shoulders of Redis, MongoDB, etc. as optional backends? This way, you gain optional replication and a few other features as well, for example:

1. Ability to do pagination
2. Drop arbitrary messages by ID
3. Inspection and backup tools for those technologies become tools for beanstalk

Jan Kantert

Jun 14, 2011, 4:21:08 PM
to beansta...@googlegroups.com
Hi Kuangwei,
Hi Wil,

There are solutions based on Redis, like Resque from the GitHub guys.
Personally I like beanstalkd because it's simple and fast. Resque offers
a lot more features (especially more management), but it comes with more
overhead.

Redundancy is good, but if you're using beanstalkd as an async queue you
may just tolerate the failure of one node and run the jobs later on after
restoring the host. You can easily use multiple beanstalk servers at once
and do some kind of load balancing between them. In case of a failure you
will only lose some jobs temporarily: they would run later, after the
host was restored. Just my thoughts ;-).
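
A minimal sketch of that kind of client-side load balancing, assuming a producer that rotates over several servers and skips any that are unreachable. The `send(server, body)` callable is hypothetical: it stands in for whatever client library actually issues the `put`, and is assumed to raise `ConnectionError` when a server is down.

```python
class RoundRobinProducer:
    """Spread jobs over several beanstalkd servers, skipping dead ones."""

    def __init__(self, servers, send):
        self.servers = list(servers)
        self.send = send  # hypothetical: send(server, body)
        self._next = 0    # index of the server to try first

    def put(self, body):
        n = len(self.servers)
        for i in range(n):
            server = self.servers[(self._next + i) % n]
            try:
                self.send(server, body)
            except ConnectionError:
                continue  # server is down, try the next one
            # Start the next put on the server after the one that worked.
            self._next = (self._next + i + 1) % n
            return server
        raise ConnectionError("no beanstalkd server reachable")
```

Jobs already queued on a failed host stay there until it is restored, which matches the trade-off described above.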


Jan

Wil Moore

Jun 15, 2011, 10:46:31 AM
to beansta...@googlegroups.com
Good ideas. Have you (or anyone else) figured out a sane way to touch specific jobs by ID?

Jesse Sanford

Jan 4, 2012, 12:48:32 PM
to beansta...@googlegroups.com
I would like to pick up this project. Keith, can you point me at any code spikes (if there are any) that have been done in this direction? Is there any proof-of-concept code? I know binlog replay is a pattern that is pretty fleshed out in other persistence daemons. I know that is not necessarily what you had been thinking, per your statement back in '09:

"The design I was hoping for isn't really for scalability either, just 
high availability. 

The reason I prefer peer-peer 2-way replication is because "failover" 
is ridiculously simple for clients and requires no external tools or 
fancy routing. 

I'm still very much in favor even if you really want to do it 
master-slave. Either way this will be a great improvement."

But I am not sure you even had the on-disk persistence then?

In the meantime I, like others, am planning on running a hot spare that can pick up the mount with the binlog if there is a failure in the primary, and then fencing the primary out.

Keith Rarick

Jan 4, 2012, 7:21:04 PM
to beansta...@googlegroups.com
On Wed, Jan 4, 2012 at 9:48 AM, Jesse Sanford <jesses...@gmail.com> wrote:
> I would like to pick up this project. Keith can you point me at any (if any)
> code spikes that have been done in this direction?

I don't know of any, but I'm happy to answer questions and do
code review. Is there anything else you need to get started?
It'd be good to briefly talk about the design (especially as it
impacts the user experience of beanstalkd) before actually
writing code.

The current on-disk format is different on different hardware,
since beanstalkd just copies bytes out from the job struct.
This means that, for example, replaying the log from a 32-bit
machine on a 64-bit machine won't work. (In hindsight, it
would have been better to design an architecture-independent
format; if the format ever changes again, I'll want to do that.)
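
As an illustration of what an architecture-independent record could look like, here is a minimal sketch that uses explicit byte order and fixed field widths instead of copying a struct's in-memory bytes. The field list and layout are hypothetical and are not beanstalkd's actual binlog format.

```python
import struct

# Hypothetical layout, big-endian, no padding:
#   id (uint64), pri (uint32), delay (uint64), ttr (uint64), body size (uint32)
RECORD_HEADER = struct.Struct(">QIQQI")


def encode_job(job_id, pri, delay, ttr, body):
    """Serialize a job to bytes that read the same on any architecture."""
    return RECORD_HEADER.pack(job_id, pri, delay, ttr, len(body)) + body


def decode_job(buf):
    """Inverse of encode_job; raises ValueError on a truncated body."""
    job_id, pri, delay, ttr, size = RECORD_HEADER.unpack_from(buf, 0)
    start = RECORD_HEADER.size
    body = buf[start:start + size]
    if len(body) != size:
        raise ValueError("truncated record")
    return job_id, pri, delay, ttr, body
```

Because the widths and byte order are spelled out, the header is 32 bytes whether it is written on a 32-bit or a 64-bit machine.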

kr

Keith Rarick

Dec 4, 2012, 6:25:50 PM
to beansta...@googlegroups.com
On Tue, Dec 4, 2012 at 8:39 AM, Nathaniel Cook <natha...@qualtrics.com> wrote:
> So once again I am a volunteer to work on this. Same question: has there
> been any progress? I have already forked the github repo and will be
> submitting pull requests soon.

No progress that I know of.

> I think there is a simple change that can be made to move this in the right
> direction. I propose that we add support for multiple binlog dirs so the
> binlogs can exist somewhere on a shared file system for disaster recovery.
> Then we can work on replicating the binlog itself.

That doesn't sound simple. Can you give a more concrete description?
What would the command-line flags look like? What files would be
created in various scenarios, when would they be written, when would
they be read, and by whom?

Nathaniel Cook

Dec 5, 2012, 3:00:29 PM
to beansta...@googlegroups.com

Specify WAL dirs as a ':'-separated list (like the PATH env var). Each directory is then written to in parallel, while only the first directory is used for reading. The point of such a change is to allow the data to exist on different disks without having to use something like DRBD. Then, in the event of a disk failure, the data can be restored, though it would be a manual process.
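
A rough sketch of those semantics, assuming one log file per directory. The names `parse_wal_dirs` and `MirroredLog` and the file name `binlog.0` are made up for illustration, and the writes here are sequential rather than parallel to keep the example short.

```python
import os


def parse_wal_dirs(spec):
    """Split a PATH-style ':'-separated list, dropping empty entries."""
    return [d for d in spec.split(":") if d]


class MirroredLog:
    """Append to the same log in every directory; read only from the first."""

    def __init__(self, dirs, name="binlog.0"):
        if not dirs:
            raise ValueError("at least one WAL directory is required")
        for d in dirs:
            os.makedirs(d, exist_ok=True)
        self.paths = [os.path.join(d, name) for d in dirs]

    def append(self, record):
        for path in self.paths:
            with open(path, "ab") as f:
                f.write(record)
                f.flush()
                os.fsync(f.fileno())  # make sure it reaches the disk

    def read_all(self):
        with open(self.paths[0], "rb") as f:
            return f.read()
```

Restoring after a disk failure would then mean pointing the server at one of the surviving directories by hand.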

Discussion on how to implement replication in the protocol.

I like the idea of a peer-peer replication scheme. At a high level, what I am thinking is that you define a list of peers to push/pull replication events to/from. (I am still thinking through whether we want to allow more than one peer.) This way multiple servers can have the same job. Each server will allow the job to be reserved but quickly replicate that it has been reserved. The inconsistency that can arise is that a single job could be reserved by two different clients at the same time. This condition can already exist if the TTR runs out and the job is released while the client that originally reserved it continues to run the job anyway. So we could resolve this issue the same way we already do. We can also prevent infinite loops in replication because we can uniquely identify a job by the server id (currently in progress) and its own job id.

So to take a crack at the protocol it could be something along the lines of:

replicate put <jobrec>\r\n
<data>\r\n

replicate reserve <jobid> <serverid>\r\n
replicate release <jobid> <serverid> <pri> <delay>\r\n

And so on for each of the job-specific commands.
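
A small sketch of how the `reserve` and `release` lines above could be formatted and parsed. It assumes that job ids, priorities and delays are integers and that the server id is an opaque token without spaces; none of this is settled, and the helper names are hypothetical.

```python
# Argument names per command, in wire order (from the proposal above).
FIELDS = {
    "reserve": ("jobid", "serverid"),
    "release": ("jobid", "serverid", "pri", "delay"),
}


def format_replicate(op, **fields):
    """Build one 'replicate <op> ...' line terminated by CRLF."""
    words = ["replicate", op] + [str(fields[name]) for name in FIELDS[op]]
    return (" ".join(words) + "\r\n").encode("ascii")


def parse_replicate(line):
    """Parse one line into a dict; raises ValueError if it is malformed."""
    if not line.endswith(b"\r\n"):
        raise ValueError("missing CRLF terminator")
    words = line[:-2].decode("ascii").split(" ")
    if len(words) < 2 or words[0] != "replicate":
        raise ValueError("not a replicate command")
    op, args = words[1], words[2:]
    if op not in FIELDS or len(args) != len(FIELDS[op]):
        raise ValueError("unknown command or wrong argument count")
    event = {"op": op}
    for name, raw in zip(FIELDS[op], args):
        # The server id is kept as text; everything else is numeric.
        event[name] = raw if name == "serverid" else int(raw)
    return event
```

The pair `(event["serverid"], event["jobid"])` is what a receiving server could use as the globally unique job key when deciding whether an event has already been seen.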

Just some ideas to get the discussion going.

Nathaniel Cook

Jan 7, 2013, 12:36:19 PM
to beansta...@googlegroups.com
I have started tinkering around with the code, but I am mostly waiting to get some input from others.

My main question is: do we let jobs be served by multiple servers?
     If we do, then do we use some form of Paxos to ensure only one server actually releases the job? Or do we just do a best-effort algorithm that informs all servers holding the job that it was released, so they don't release it?

     If we don't let multiple servers serve the job, then do servers just keep the replicated jobs in memory until the source server dies?

There are some performance/simplicity trade-offs here.

I'd love some input from the community.

On Monday, January 7, 2013 3:40:40 AM UTC-7, Verachten Bruno wrote:
> Hi there,
>
> is there anything in the works?
>
> Kind regards,
>
> Bruno Verachten