Improved OCF resource agent for dynamic active-active mirrored clustering


Bogdan Dobrelya

Jun 9, 2015, 10:17:56 AM
to rabbitm...@googlegroups.com
Hello.

I'd like to contribute an improved OCF resource agent (RA) for dynamic membership control of the nodes in a RabbitMQ active-active mirrored cluster.
Active-active means that all currently available cluster nodes accept direct AMQP connections and process messages as usual.
The Fuel for OpenStack project [0] has used this OCF RA since release 5.1. The agent was originally implemented under a related blueprint [1]
and has been improved constantly since then. The current clustering business logic behind this RA is represented by the flow charts [2].
Note that the slide "Handling the failed node" is out of scope: it requires a separate fence daemon process to be running, see [3] for details.

As you can see, the main idea is:
- do not keep cluster nodes in a configuration file, but instead represent them as a multistate (single master, multiple slaves) clone resource in Pacemaker;
- do our best to keep as many rabbit nodes available as possible at any given moment;
- keep some clustering-related data (rabbit app uptime, resource master score and master attribute) in the Pacemaker Cluster Information Base (CIB);
- on a cold start of the RabbitMQ nodes, elect a Master for the related resource and join the other RabbitMQ nodes to it to form
a cluster;
- demote the Master if it has to be stopped or has crashed for some reason, and promote a new one, choosing the node with the longest rabbit app uptime.
Once the new Master is elected, rejoin the other rabbit nodes, represented as multistate clone Slaves, to it;
- when any rabbit node stops or crashes for some reason, kick it from the cluster so that the cluster membership is updated dynamically. If it comes back
later, its Mnesia database is reset and it re-joins the cluster.
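The Master election criterion above (the longest rabbit app uptime wins) can be illustrated with a minimal Python sketch. It assumes the per-node uptimes have already been read from the CIB into a dict mapping node name to seconds; the function name, the "uptime 0 means not running" rule and the tie-break by node name are hypothetical choices for illustration, not taken from the actual agent (which is written in bash).

```python
from typing import Dict, Optional


def elect_master(uptimes: Dict[str, int]) -> Optional[str]:
    """Pick the node whose rabbit app has been up the longest.

    Hypothetical helper: uptimes maps node name -> rabbit app uptime in
    seconds (0 = app not running). Returns None when no app is running;
    ties go to the smallest node name so that every node computes the
    same winner from the same data.
    """
    running = {node: up for node, up in uptimes.items() if up > 0}
    if not running:
        return None
    # Highest uptime first; equal uptimes fall back to name order.
    return min(running, key=lambda node: (-running[node], node))


print(elect_master({"rabbit@a": 0, "rabbit@b": 3600, "rabbit@c": 7200}))  # rabbit@c
```

A deterministic tie-break matters here because every node evaluates the rule independently; how the real agent handles a cold start, when no app is running yet, is not modeled.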

So, the action items, as I see them:
- submit a pull request to [4]. The original code currently resides in the Fuel project library [5]
- code review and make the OCF agent backwards compatible
- update the docs and include the flow charts [2], if possible.
- consider adding unit tests. Yes, for bash, you know...

Thank you for your time reading this :)

Jean-Sébastien Pédron

Jun 9, 2015, 12:01:19 PM
to rabbitm...@googlegroups.com
On 09.06.2015 16:17, Bogdan Dobrelya wrote:
> Hello.

Hi Bogdan!

> I'd like to contribute the improved OCF resource agent for dynamic
> membership control of the nodes in RabbitMQ active-active mirrored cluster.

Great! I will look at what you prepared and get back to you.

Thank you!

--
Jean-Sébastien Pédron
Pivotal / RabbitMQ

Jean-Sébastien Pédron

Jun 10, 2015, 10:22:07 AM
to rabbitm...@googlegroups.com
On 09.06.2015 16:17, Bogdan Dobrelya wrote:
> Hello.

Hi!

> I'd like to contribute the improved OCF resource agent for dynamic
> membership control of the nodes in RabbitMQ active-active mirrored
> cluster.

I know neither Pacemaker nor the OCF spec in detail, so I may
misunderstand things. For instance, when a node is demoted from
master to slave, RabbitMQ (the Erlang application, not the entire node)
is stopped but not restarted. Am I correct? Is this expected?

When a node is stopped, it is removed from the cluster. I don't see
anywhere in the code whether you wait for HA synchronization to finish
before removing a node that is the master for some queues.

By the way, when you call a node a master, does this mean you want all
queue masters to run on that particular node?

> So, the action items, as I can see them :
> - submit a pull request to [4]. The original code currently resides in
> the Fuel project library [5]

I agree, it would be easier to comment on the code. For now, I attached
a commented version to this mail. My comments start with "# XXX" and
address only implementation details, not the workflow itself.

> - code review and make the OCF agent backwards compatible

The current RA is little more than a copy of a simple init script, with
no policy enforcement. Your implementation is designed for a cluster
with all resources replicated on all nodes, and it dynamically removes
and adds nodes to the cluster. As you enforce a particular choice, I'm
not sure this new resource agent can be backward-compatible with the
current one.

Of course, this doesn't mean we won't include it. I personally have no
idea how users run the current RA in production: it brings nearly no
value compared to a simple init script.

I would like to hear from people using the OCF RA: what do you think of
this new one? Should we include both? Should the new one replace the
current one? In general, what do you expect from the official RA?

Michael Klishin

Jun 10, 2015, 10:31:54 AM
to Jean-Sébastien Pédron, rabbitm...@googlegroups.com
We can include this more opinionated version and recommend it for OpenStack, for instance.

MK

Bogdan Dobrelya

Jun 10, 2015, 11:10:14 AM
to rabbitm...@googlegroups.com
On 10.06.2015 16:22, Jean-Sébastien Pédron wrote:
> On 09.06.2015 16:17, Bogdan Dobrelya wrote:
>
> Hi!
>
>
> I know neither Pacemaker nor the OCF spec in details, so I may
> understand things incorrectly. For instance, when a node is demoted from
> master to slave, RabbitMQ (the Erlang application, not the entire node)
> is stopped but not restarted. Am I correct? Is this expected?

Yes. As you can see from the flow charts, the OCF agent starts the
rabbit app only after the Master Pacemaker resource has been promoted
successfully, or after a Slave resource has started and joined the
rabbit cluster of the existing Master. This is handled by the
post-promote and post-start notifications sent cluster-wide by Pacemaker.

>
> When a node is stopped, it is removed from the cluster. I see nowhere in
> the code if you wait for the HA synchronization to finish before
> removing a node which is the master for some queues.

This OCF agent treats any non-running state of the rabbit Pacemaker
resource as a failure; it does not wait for synchronization (the most
pessimistic case is assumed). If a Slave goes down, it will be re-joined
later, once/if it is available again. If the Master goes down, a new one
is elected from the remaining nodes running as Slaves, as part of the
fail-over procedure. The node with the longest rabbit app uptime wins
the Master role. The rest re-join it upon receiving the post-promote or
post-start events. Whether Mnesia is reset depends on the situation: for
example, it is reset if a node thinks it is clustered with another node,
but that node disagrees.
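The Mnesia reset condition described above (this node believes it is clustered with a peer, but that peer disagrees) can be sketched as a membership consistency check. This is a hypothetical Python illustration, not the agent's code: each node's view of its cluster is assumed to be available as a set of node names, how those views are collected is not shown, and ignoring unreachable peers is an assumed policy.

```python
from typing import Dict, Set


def needs_mnesia_reset(me: str, my_view: Set[str], peer_views: Dict[str, Set[str]]) -> bool:
    """Return True if `me` believes it is clustered with a peer that does not list `me` back.

    my_view: node names `me` thinks are in its cluster (may include itself).
    peer_views: each reachable peer's own view of its cluster; unreachable
    peers are simply absent and are not treated as disagreement.
    """
    for peer in sorted(my_view - {me}):
        view = peer_views.get(peer)
        if view is not None and me not in view:
            return True  # the peer has forgotten us, so our local Mnesia is stale
    return False


# A thinks it is clustered with C, but C has already forgotten A.
print(needs_mnesia_reset("rabbit@a", {"rabbit@a", "rabbit@c"},
                         {"rabbit@c": {"rabbit@b", "rabbit@c"}}))  # True
```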

But you're right. Perhaps the agent should wait for HA synchronization
if the resource stop is graceful, for example when the operator requests
it. Do you think this case should be addressed as a bug?

>
> By the way, when you call a node a master, does this mean you want all
> queue's masters to run on that particular node?
>

No, queue masters may belong to any rabbit node. The Master and Slave
we're referring to here are only the Pacemaker multistate clone roles.
The "Master" is also the rabbit node that the other members (the
"Slaves") normally specify as the target of the join_cluster command.



--
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando

Michael Klishin

Jun 10, 2015, 2:56:23 PM
to Bogdan Dobrelya, rabbitm...@googlegroups.com
On 10 June 2015 at 18:10:15, Bogdan Dobrelya (bdob...@mirantis.com) wrote:
> But you're right. Perhaps, the HA synchronization should be expected if
> the resource stop is graceful, for example by the operator request. Do
> you think this case should be addressed as a bug?

What specifically would you like to see improved in RabbitMQ? We are happy
to consider many (most?) things.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Jean-Sébastien Pédron

Jun 11, 2015, 5:17:56 AM
to rabbitm...@googlegroups.com
On 10.06.2015 17:10, Bogdan Dobrelya wrote:
> Yes, as you can see from the flow charts, OCF agent starts the rabbit
> app only after the Master pacemaker resource promoted successfully or
> the Slave resource started & joined rabbit cluster by existing Master.
> This is handled by post promote and post start notifications sent
> cluster-wide by Pacemaker.

OK, so Pacemaker takes care of transitioning the stopped slave node
to a running state.

Another question because I'm not sure I got that right: consider an
already running cluster of three nodes, A, B and C. When A is demoted
and B is promoted master, does C leave the cluster and join B again?

>> When a node is stopped, it is removed from the cluster. I see nowhere in
>> the code if you wait for the HA synchronization to finish before
>> removing a node which is the master for some queues.
>
> This OCF agent considers any non running state of the rabbit pacemaker
> resource as a failure, synchronization is not expected (the most
> pessimistic case is assumed). If it was a Slave down, it will be
> re-joined later, once/if available again. If it was a Master down, the
> new one will be re-elected from the rest of the nodes remaining running
> as Slaves - as a part of fail-over procedure. The one who has the most
> uptime of the rabbit app will win the Master role. The rest will re-join
> him on post-promote or post-start events received. Mnesia reset depends
> on the situation. For example, if node thinks it is clustered with
> another node, but that one disagrees.
>
> But you're right. Perhaps, the HA synchronization should be expected if
> the resource stop is graceful, for example by the operator request. Do
> you think this case should be addressed as a bug?

In case of a graceful stop/demotion of a master, yes, it could be nice
to ensure the queues owned by this node are fully synchronized
elsewhere. Otherwise, there could be data loss.

An unexpected stop/crash is of course a different situation.
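If the agent were to wait for mirror synchronization on a graceful stop or demotion, as suggested above, the core would be a bounded polling loop. A minimal Python sketch, assuming a hypothetical callback `unsynced_count()` that reports how many queues mastered on this node still lack a synchronized mirror; how to obtain that number is not shown, and the timeout policy is an assumption.

```python
import time
from typing import Callable


def wait_for_sync(
    unsynced_count: Callable[[], int],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no queue mastered on this node is left unsynchronized.

    Returns True once unsynced_count() reports 0, or False if `timeout`
    seconds elapse first, so a graceful stop can decide to proceed or abort.
    `clock` and `sleep` are injectable to make the loop testable.
    """
    deadline = clock() + timeout
    while True:
        if unsynced_count() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    remaining = iter([3, 1, 0])  # simulated counts reported by successive polls
    print(wait_for_sync(lambda: next(remaining), timeout=5, interval=0.01))  # True
```

Bounding the wait keeps a stuck synchronization from blocking the Pacemaker stop operation indefinitely; an unexpected crash would bypass this path entirely.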

>> By the way, when you call a node a master, does this mean you want all
>> queue's masters to run on that particular node?
>
> No, queue masters may belong to any rabbit nodes. The Master and Slave
> we're referring here are only for the pacemaker multistate clone roles.
> The "Master" is also the rabbit node that is normally specified by
> another members ("Slaves") as a target for the join_cluster command.

Ok, thanks.

Bogdan Dobrelya

Jun 11, 2015, 8:56:25 AM
to Jean-Sébastien Pédron, rabbitm...@googlegroups.com
On 11.06.2015 11:17, Jean-Sébastien Pédron wrote:
> On 10.06.2015 17:10, Bogdan Dobrelya wrote:
>
> Ok, so a Pacemaker takes care of transitionning the stopped slave node
> to a running state.
>
> Another question because I'm not sure I got that right: consider an
> already running cluster of three nodes, A, B and C. When A is demoted
> and B is promoted master, does C leaves the cluster and join B again?

The flow will be as follows:
Case I. Corosync and Pacemaker on node A are alive.
1. A is demoted (its rabbit app is stopped).
2. Upon the post-demote notification, B or C kicks A from the cluster
(whichever does it first succeeds).
3. B or C (whichever has the longest rabbit app uptime) is promoted;
let it be C. Its rabbit app is started.
4. Upon the post-promote notification, B's app is stopped, B joins C
(Mnesia is reset only if it cannot join), and its app is started.
C ignores the post-promote notification.
___
5. (Optional) A comes back. Its rabbit app is started, but will likely
fail to start because A thinks it is clustered with some node that has
already forgotten it. If so, its Mnesia is reset and the rabbit app is
started anyway.
6. A, B and C process the post-start notification for A:
- A joins C and its rabbit app is started. If it cannot join, it resets
Mnesia and stops until the next start attempt;
- B checks "what if I need to join as well?" and does nothing if it is
already clustered;
- C does nothing.
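Steps 1 to 4 of Case I above can be replayed with a small, deterministic Python model to make the ordering of notifications explicit. It is a hypothetical sketch, not the agent's logic: it only logs the steps, applies the longest-uptime election rule with a tie-break by node name, and always lets the first survivor in name order perform the kick (in the real flow, whichever of B or C acts first succeeds).

```python
from typing import Dict, List


def case_one(uptimes: Dict[str, int], demoted: str) -> List[str]:
    """Replay Case I (steps 1-4) for a demoted Master whose Pacemaker stays alive.

    Hypothetical model: `uptimes` maps node -> rabbit app uptime in seconds.
    Returns a human-readable event log in the order the steps happen.
    """
    log: List[str] = []
    survivors = sorted(n for n in uptimes if n != demoted)

    # 1. Demote: the rabbit app on the old Master is stopped.
    log.append(f"{demoted}: demoted, app stopped")
    # 2. post-demote: one survivor kicks the old Master (simplified: first by name).
    log.append(f"{survivors[0]}: kicked {demoted} from cluster")
    # 3. The survivor with the longest uptime is promoted (ties -> smaller name).
    master = min(survivors, key=lambda n: (-uptimes[n], n))
    log.append(f"{master}: promoted, app started")
    # 4. post-promote: the other Slaves stop, join the new Master and start again.
    for node in survivors:
        if node == master:
            log.append(f"{node}: ignores post-promote")
        else:
            log.append(f"{node}: app stopped, joined {master}, app started")
    return log


for line in case_one({"A": 100, "B": 50, "C": 80}, "A"):
    print(line)
```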

Case II. Node A just dies and vanishes, so Pacemaker sends no
notification.
Everything is the same, except that node A is not kicked, and some
additional Mnesia reset actions are required to join it back smoothly
later, if it returns.
To handle this situation, a dedicated fence daemon may detect
"corosync node left the cluster" events and kick the affected rabbit
nodes, but this is out of the scope of the OCF RA.
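For Case II, the core idea of such a fence daemon (turning "node left the Corosync membership" events into rabbit nodes to kick) can be sketched in Python. Everything here is hypothetical: membership is assumed to arrive as successive snapshots of host names, rabbit nodes are assumed to use the default `rabbit@<host>` naming, and actually kicking a node is left out.

```python
from typing import Iterable, List, Set


def kicks_from_membership(snapshots: Iterable[Set[str]], local_host: str) -> List[str]:
    """Turn successive Corosync membership snapshots into rabbit nodes to kick.

    Each snapshot is the set of host names currently in the membership.
    Every host that disappears between two snapshots yields one kick,
    except the host this daemon runs on.
    """
    kicks: List[str] = []
    previous: Set[str] = set()
    for current in snapshots:
        for host in sorted(previous - current):
            if host != local_host:
                kicks.append(f"rabbit@{host}")
        previous = set(current)
    return kicks


events = [{"a", "b", "c"}, {"b", "c"}, {"a", "b", "c"}, {"b"}]
print(kicks_from_membership(events, local_host="b"))
# ['rabbit@a', 'rabbit@a', 'rabbit@c']
```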


Bogdan Dobrelya

Jun 12, 2015, 7:29:55 AM
to Jean-Sébastien Pédron, rabbitm...@googlegroups.com
On 11.06.2015 11:17, Jean-Sébastien Pédron wrote:
> On 10.06.2015 17:10, Bogdan Dobrelya wrote:
>
> Ok, so a Pacemaker takes care of transitionning the stopped slave node
> to a running state.
>
> Another question because I'm not sure I got that right: consider an
> already running cluster of three nodes, A, B and C. When A is demoted
> and B is promoted master, does C leaves the cluster and join B again?
>
>
> In case of a graceful stop/demotion of a master, yes, it could be nice
> to ensure the queues owned by this node are fully synchronized
> elsewhere. Otherwise, there could be data loss.

I forgot to answer this part, sorry.
Here is a bug for the Fuel project [0].
I hope that by the time we address this change in the OCF script, we
will be doing it as a contribution to the rabbitmq-server project, at
the script's new location :)

[0] https://bugs.launchpad.net/fuel/+bug/1464637


Jean-Sébastien Pédron

Jun 15, 2015, 10:40:08 AM
to rabbitm...@googlegroups.com
On 11.06.2015 14:56, Bogdan Dobrelya wrote:
>> Another question because I'm not sure I got that right: consider an
>> already running cluster of three nodes, A, B and C. When A is demoted
>> and B is promoted master, does C leaves the cluster and join B again?
>
> The flow will be as the following:
> Case I. When Corosync and Pacemaker on node A are alive.
> ...
>
> Case II. When the node A just died and vanished - there will be no
> notification sent by pacemaker.
> ...

Thank you very much for the explanations!

To move forward with this, you can submit a pull request to add your OCF
RA (as an addition to, not a replacement for, the current RA). Please base
your request on the "stable" branch so it gets included in the next
minor release.

Michael Klishin

Jun 15, 2015, 10:44:40 AM
to Jean-Sébastien Pédron, rabbitm...@googlegroups.com
On 15 June 2015 at 17:40:07, Jean-Sébastien Pédron (jean-se...@rabbitmq.com) wrote:
> Please base your request on the "stable" branch so it gets included
> in the next minor release.

In the next bug fix release :)

Jean-Sébastien Pédron

Jun 15, 2015, 10:49:29 AM
to rabbitm...@googlegroups.com
On 15.06.2015 16:44, Michael Klishin wrote:
> On 15 June 2015 at 17:40:07, Jean-Sébastien Pédron (jean-se...@rabbitmq.com) wrote:
>> Please base your request on the "stable" branch so it gets included
>> in the next minor release.
>
> In the next bug fix release :)

Yes, sorry, that is what I meant :)

Bogdan Dobrelya

unread,
Jun 16, 2015, 10:02:42 AM6/16/15
to rabbitm...@googlegroups.com
Here is a PR [0] based on the stable branch.
Could we expect it, if accepted, to appear in the next bugfix releases
for 3.3.5 or 3.4.x as well, so that we could drop this OCF code from
the Fuel project and use the upstream version?

Another question: in the Fuel project we have CI gates validating the
RabbitMQ cluster in an HA layout, as well as other HA components of
OpenStack infrastructure nodes. What would be the workflow for accepting
changes to this OCF script upstream?

[0] https://github.com/rabbitmq/rabbitmq-server/pull/189

Jean-Sébastien Pédron

Jun 18, 2015, 11:27:39 AM
to rabbitm...@googlegroups.com
On 16.06.2015 16:02, Bogdan Dobrelya wrote:
>> To move forward with this, you can submit a pull request to add your OCF
>> RA (as an addition to, not a replacement of the current RA).
>
> Here is a PR [0] based on the stable branch.

Thanks! I put my comments there.

> Could we expect it, if accepted, would appear in the next bugfix release
> for the 3.3.5 or 3.4.x versions as well, so we could drop this OCF code
> out of the Fuel project and use the upstream version?

Unfortunately, we won't publish any new releases on the 3.3.x and 3.4.x
branches. We only maintain the latest stable branch (3.5.x).

> Another question is, in Fuel project we have CI gates validating
> RabbitMQ cluster in HA layout as well as other HA components of
> OpenStack infrastructure nodes. What would be the workflow for
> accepting changes to the subject OCF script upstream?

Just to be sure I understand your question, do you want to know if we
have an internal validation process to test the OCF resource agent?

We don't have such a tool currently. Our current resource agent looks
like an init script and has no advanced features to test.

Bogdan Dobrelya

Jul 7, 2015, 3:45:32 AM
to rabbitm...@googlegroups.com
On 18.06.2015 17:27, Jean-Sébastien Pédron wrote:
> Just to be sure I understand your question, do you want to know if we
> have a validating process internally to test the OCF resource agent?
>
> We don't have such a tool currently. Our current resource agent looks
> like an init script and has no advanced feature to test.
>

I believe it makes sense to establish a publicly available validation
process; otherwise it is not clear how to support this script and make
changes to it in the future.
I'm open to suggestions and ready to participate in this effort.

For now, I suggest we postpone the pull request. I will continue
supporting this OCF script as part of the Fuel project, as we have CI
gates that cover the HA RabbitMQ cluster deployment scenario and catch
code regressions, if any.

Bogdan Dobrelya

Jul 24, 2015, 5:36:03 AM
to rabbitm...@googlegroups.com
Folks, I addressed most of the comments on my PR [0].
I'm not sure postponing the work would benefit the community, so
I decided to resume it.
I believe we could consider accepting this patch, and treat minor
fixes and a public verification process as next steps.

Thank you again for providing great comments! Those helped to improve
the script a lot.

[0] https://github.com/rabbitmq/rabbitmq-server/pull/189


--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

Jean-Sébastien Pédron

Jul 24, 2015, 10:35:00 AM
to rabbitm...@googlegroups.com
On 24.07.2015 11:35, Bogdan Dobrelya wrote:
> Folks, I addressed the most of the comments to my PR [0].
> I'm not sure postponing the work being done would benefit community, so
> I decided to resume it.
> I believe we could consider accepting this patch and think about minor
> fixes and public verification process as next steps.

Hi!

I agree, style issues could be fixed later. Meanwhile, interested users
can start to play with it.

I merged your pull request to the master branch.

> Thank you again for providing great comments! Those helped to improve
> the script a lot.

Good :) Thank you very much for your contribution!

Bogdan Dobrelya

Jul 27, 2015, 5:32:27 AM
to rabbitmq-users, jean-se...@rabbitmq.com
Thank you.
The next step is to:
- update the docs and include the flow charts [0], if possible.

Could you explain how you usually handle documentation updates?


On Friday, July 24, 2015 at 16:35:00 UTC+2, Jean-Sébastien Pédron wrote:

Jean-Sébastien Pédron

Jul 27, 2015, 5:51:55 AM
to rabbitm...@googlegroups.com
On 27.07.2015 11:32, Bogdan Dobrelya wrote:
> Thank you.
> The next step is
> - update the docs and include the flow charts [0], if possible.
>
> Can you help with the way you usually provide documentation updates?

Our documentation is part of the website. It was open-sourced recently
and is now available on GitHub:
https://github.com/rabbitmq/rabbitmq-website

You can take site/build-erlang-client.xml as an example:
https://github.com/rabbitmq/rabbitmq-website/blob/master/site/build-erlang-client.xml

You should create your branch off of "master". The README.md explains
the branches and how to locally test the website.

Thank you!

Michael Klishin

Jul 27, 2015, 5:57:01 AM
to Jean-Sébastien Pédron, rabbitm...@googlegroups.com, Bogdan Dobrelya
On 27 July 2015 at 12:51:56, Jean-Sébastien Pédron (jean-se...@rabbitmq.com) wrote:
> Our documentation is part of the website. It was open-sourced recently
> and is now available on GitHub:
> https://github.com/rabbitmq/rabbitmq-website

…which has a fairly small number of dependencies in the (old and home grown) static
site generator. The README should cover that.

> You take site/build-erlang-client.xml as an example:
> https://github.com/rabbitmq/rabbitmq-website/blob/master/site/build-erlang-client.xml
>
> You should create your branch off of "master". The README.md explains
> the branches and how to locally test the website.

This means it will be available on next.rabbitmq.com, and live only after 3.6.0 ships (in October).

We may want to push this live earlier, in which case it needs to be
branched off of `live`. Thoughts?

Jean-Sébastien Pédron

Jul 27, 2015, 6:13:27 AM
to rabbitm...@googlegroups.com
On 27.07.2015 11:56, Michael Klishin wrote:
> This means it will be available on next.rabbitmq.com and after 3.6.0
> ships (in October).
>
> We may want to push this live earlier, in which case it needs to be
> branched off of `live`. Thoughts?

I'm ok to merge the new OCF resource agent to "stable" in RabbitMQ and
put the documentation in the "live" branch too, if that is what you mean.

Michael Klishin

Jul 27, 2015, 6:16:07 AM
to Jean-Sébastien Pédron, rabbitm...@googlegroups.com, Bogdan Dobrelya
On 27 July 2015 at 13:13:27, Jean-Sébastien Pédron (jean-se...@rabbitmq.com) wrote:
> I'm ok to merge the new OCF resource agent to "stable" in RabbitMQ and
> put the documentation in the "live" branch too, if that is what you mean.

Yup, that's what I had in mind. Let's do it then.

Bogdan, if you are going to look into doc changes, please branch off of live.
That branch is deployed every few days.

Bogdan Dobrelya

Sep 19, 2015, 8:48:46 AM
to rabbitmq-users, jean-se...@rabbitmq.com, bdob...@mirantis.com
Hi there.
I submitted a PR [0] that describes how to use the OCF script with Pacemaker.
It covers only the basics. If you think more details are needed, including the aforementioned
clustering flow chart, they can be added later in separate PRs.

Please review it and check that it actually works, as I'm not a DocBook pro :-)

[0] https://github.com/rabbitmq/rabbitmq-website/pull/79

On Monday, July 27, 2015 at 12:16:07 UTC+2, Michael Klishin wrote:

Bogdan Dobrelya

Oct 13, 2015, 9:06:39 AM
to rabbitm...@googlegroups.com
On 27.07.2015 11:32, Bogdan Dobrelya wrote:
> Thank you.
> The next step is
> - update the docs and include the flow charts [0], if possible.
>
> Can you help with the way you usually provide documentation updates?
>
> [0] http://goo.gl/PPNrw7

The RabbitMQ pacemaker guide [0] now contains the required information.
See "Auto-configuration of a cluster with a Pacemaker".

[0] http://www.rabbitmq.com/pacemaker.html


Bogdan Dobrelya

Oct 13, 2015, 9:21:44 AM
to rabbitm...@googlegroups.com

Which rabbitmq-server build will contain the OCF script?
I'd like this to be made clear in the OpenStack guide [0].
I checked the installed package:

Version table:
*** 3.5.6-1 0
500 http://www.rabbitmq.com/debian/ testing/main amd64 Packages
100 /var/lib/dpkg/status

and it is not there yet.

[0] https://bugs.launchpad.net/openstack-manuals/+bug/1497528

Michael Klishin

Oct 13, 2015, 9:27:35 AM
to rabbitm...@googlegroups.com, Bogdan Dobrelya
On 13 Oct 2015 at 16:21:44, Bogdan Dobrelya (bdob...@mirantis.com) wrote:
> Which rabbitmq-server build will contain the OCF script?
> I'd like this to be clear in the OpenStack guide reference [0].
> I checked with the
>
> Version table:
> *** 3.5.6-1 0
> 500 http://www.rabbitmq.com/debian/ testing/main amd64
> Packages
> 100 /var/lib/dpkg/status
>
> and it is not there yet.

Your PR was submitted against master, which will be 3.6.0 (the issue
is marked as such):

https://github.com/rabbitmq/rabbitmq-server/pull/189

We can backport it: feel free to submit the same change against `stable`,
and then it will be in 3.5.7 (scheduled for early November).

Bogdan Dobrelya

Oct 22, 2015, 9:24:19 AM
to rabbitm...@googlegroups.com
On 15.10.2015 12:36, Bogdan Dobrelya wrote:
I updated the Pacemaker auto-configuration guide [0]: I fixed wrong or
missing data and re-verified it by following the steps in my lab, with
both the crm and pcs tools.

I also prepared a Vagrant box [1] for the VirtualBox and libvirt
providers, which makes it much easier to play with the new rabbit
clustering method. The box is maintained in this repo [2]. You may want
to use the Vagrantfile from that repo to launch several nodes running
RabbitMQ and Corosync/Pacemaker clusters out of the box.

Later, we could think about fully automated CI/CD. For now, only the
Vagrant box builds are automated.

I hope this works for everyone now :)

[0] https://github.com/rabbitmq/rabbitmq-website/pull/95
[1] https://atlas.hashicorp.com/bogdando/boxes/rabbitmq-cluster-ocf
[2] https://github.com/bogdando/packer-atlas-example

Bogdan Dobrelya

Feb 15, 2016, 10:36:57 AM
to rabbitmq-users, Michael Klishin
Hello! A quick status update:

- Added an example Travis CI smoke test config on top of the rabbitmq-cluster fork [0].
  I'd like us to think about implementing something similar upstream as well!
- Moved the Vagrantfile and cluster provisioning scripts to a separate repo [1].
  The packer example repo [2] now manages only Atlas and Docker (new!) builds.
- Added Docker images for Ubuntu Trusty [3] and Wily [4]. Only the latter works
  reliably, though. Perhaps I can build new ones for Xenial or other distros as well.
- The Vagrantfile can now also deploy with the Docker provider, although there are
  a few ugly kludges to work around unimplemented things like this [5] and that [6]...

Bogdan Dobrelya

Mar 15, 2016, 10:57:47 AM
to rabbitmq-users, mkli...@pivotal.io
Folks, I proposed a patch [0] to implement a Travis CI check for the rabbit cluster OCF RA.
Note that I moved the existing .travis.yml Erlang checks to test group 1 and added a new check for
the OCF RA to group 2, so that they do not interfere with each other. The new check bootstraps a cluster of
two nodes (in Docker containers [1]) and runs a simple smoke test.

Please consider enabling the web hooks to display check results on pull requests.
Even though the current Erlang checks in the stable branch seem to be failing, it would be really
nice for the rabbit OCF RA maintainers to see the group 2 results anyway!

TODO (optional improvements)
- Use https://www.rabbitmq.com/build-server.html for group 2. For now, it just installs
the vanilla rabbitmq-server package that ships with the Linux distribution (see the Docker image details [1]).
- Add group 2 checks for Ubuntu Xenial as well, or CentOS, or whatever we want.

[1] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf-wily/

On Monday, February 15, 2016 at 16:36:57 UTC+1, Bogdan Dobrelya wrote: