Queues lost on RabbitMQ pods after VM restart


杜昱萱

Jun 22, 2018, 3:14:46 AM
to rabbitmq-users
We installed RabbitMQ with a Helm chart.
There are then 2 pods on the worker nodes, which form a cluster.
$ kubectl get pod
rabbitmq-0                               1/1       Running            2          1h
rabbitmq-1                               1/1       Running            2          1h

On the node there are 37 queues, which are declared as durable=true and auto-delete=false.
We also set an HA policy:
bash-4.2# rabbitmqctl set_policy ha-all "" '{"ha-mode":"exactly", "ha-params":2, "ha-promote-on-shutdown":"always"}'

Then we shut off all the worker nodes and restarted them.
Only 23 queues are left.
14 queues are missing, and this is not random: we always lose the same queues.
The policy I had just set is lost as well.

Any idea of the issue?
The RabbitMQ version is 3.7.0.

Thanks.


Michael Klishin

Jun 22, 2018, 3:44:15 AM
to rabbitm...@googlegroups.com
If the line you use really is

rabbitmqctl set_policy ha-all "" '{"ha-mode":"exactly", "ha-params":2, "ha-promote-on-shutdown":"always"}'

then it will match no queues since the pattern is an empty string. Therefore no queues will be mirrored.
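For example, a sketch of a corrected command (assuming the intent is to mirror every queue in the default vhost; ".*" matches all queue names):

```shell
# The second argument is a regular expression matched against queue names;
# ".*" matches every queue, unlike the empty string used above.
rabbitmqctl set_policy ha-all ".*" \
  '{"ha-mode":"exactly", "ha-params":2, "ha-promote-on-shutdown":"always"}'
```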

There isn't much information to work with otherwise, see server logs for clues.

We don't guess on this list, but other possible causes that have come up here before, specifically on Kubernetes, are:

 * Stateful sets were not used (this is a requirement) [1]
 * An example deployment file included a dangerous option and the Helm chart adopted it without much consideration [2][3]


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

杜昱萱

Jun 22, 2018, 4:39:37 AM
to rabbitmq-users
However, I can see that the defined ha-all policy really does apply to all queues.
bash-4.2# rabbitmqctl list_queues durable auto_delete consumers policy pid owner_pid exclusive name
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
true    false   0       ha-all  <rab...@rabbitmq-0.3.368.0>             false   agentdmupdate
true    false   10      ha-all  <rab...@rabbitmq-0.3.374.0>             false   diameter-adapter
true    false   20      ha-all  <rab...@rabbitmq-0.3.371.0>             false   cig.adaptationlayer.update.request.impact
true    false   1       ha-all  <rab...@rabbitmq-0.3.377.0>             false   impactexchange.request.lwm2m
true    false   1       ha-all  <rab...@rabbitmq-0.3.380.0>             false   impactexchange.request.oma2
true    false   3       ha-all  <rab...@rabbitmq-0.3.383.0>             false   agentresponse
true    false   1       ha-all  <rab...@rabbitmq-0.3.386.0>             false   agentnotifydefault
true    false   1       ha-all  <rab...@rabbitmq-0.3.389.0>             false   PPGNODE_1
true    false   1       ha-all  <rab...@rabbitmq-0.3.392.0>             false   gateway.message.mbus
true    false   0       ha-all  <rab...@rabbitmq-0.3.395.0>             false   pushnotifyretry
true    false   2       ha-all  <rab...@rabbitmq-0.3.398.0>             false   diameter-adapter-adapter1

And in the Helm chart, we use the peer-discovery-k8s plugin to form the cluster.
The node names are based on IP addresses.
bash-4.2# pwd
/var/lib/rabbitmq/mnesia
bash-4.2# ls
rabbit@192.168.1.140                 rabbit@192.168.1.181                 rabbit@192.168.1.244                 rabbit@192.168.1.41
rab...@192.168.1.140-plugins-expand  rab...@192.168.1.181-plugins-expand  rab...@192.168.1.244-plugins-expand  rab...@192.168.1.41-plugins-expand
rabbit@192.168.1.140.pid             rabbit@192.168.1.181.pid             rabbit@192.168.1.244.pid             rabbit@192.168.1.41.pid
rabbit@192.168.1.172                 rabbit@192.168.1.190                 rabbit@192.168.1.36
rab...@192.168.1.172-plugins-expand  rab...@192.168.1.190-plugins-expand  rab...@192.168.1.36-plugins-expand
rabbit@192.168.1.172.pid             rabbit@192.168.1.190.pid             rabbit@192.168.1.36.pid

After a restart the pod IPs have changed, so every restart generates a new data directory.


Michael Klishin

Jun 22, 2018, 5:03:04 AM
to rabbitm...@googlegroups.com
You can configure the node's data directory [1] to not include the hostname (or IP address),
or make sure that the hostname doesn't change and resolves to whatever IP is currently in use.
The latter is probably optimal, because you will have to access the management UI somehow, and
using IP addresses for that is not what humans like to do.



杜昱萱

Jun 22, 2018, 5:17:30 AM
to rabbitmq-users
Thank you so much!
For the first option, you mean setting RABBITMQ_MNESIA_DIR to a stable directory?


Michael Klishin

Jun 22, 2018, 5:42:19 AM
to rabbitm...@googlegroups.com
Yes, e.g. set it to /var/lib/rabbitmq/db via rabbitmq-env.conf [1] or any other way of setting
environment variables and this specific issue should be gone. Note that the RABBITMQ_ prefix must
be removed in rabbitmq-env.conf but present in all other cases.
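A minimal sketch of such a rabbitmq-env.conf (the path is the one suggested above; note the missing prefix):

```shell
# rabbitmq-env.conf: variable names lose their RABBITMQ_ prefix in this file
MNESIA_DIR=/var/lib/rabbitmq/db
```

When set instead through the container environment (e.g. in the StatefulSet spec), the full name RABBITMQ_MNESIA_DIR would be used.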



杜昱萱

Jun 22, 2018, 7:37:34 AM
to rabbitmq-users
Oh, I changed it to RABBITMQ_MNESIA_DIR=/var/lib/rabbitmq/mnesia/localhost in rabbitmq-env.conf, and the RabbitMQ server can be brought up.
The issue now is that the cluster_nodes.config file under this directory records the previous nodes' IP information,
so the nodes cannot cluster, since the pod IPs change after a VM restart.

$ kubectl get pod
rabbitmq-0                               0/1       CrashLoopBackOff   3          24m
rabbitmq-1                               0/1       CrashLoopBackOff   3          24m
$ kubectl logs rabbitmq-0
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{failed_to_cluster_with,['rab...@192.168.1.142','rab...@192.168.1.210'],\"Mnesia could not connect to any nodes.\"},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{failed_to_cluster_with,['rab...@192.168.1.142','rab...@192.168.1.210'],"Mnesia could not connect to any nodes."}


Michael Klishin

Jun 22, 2018, 11:03:40 AM
to rabbitm...@googlegroups.com
That's why I suggested using domain names that resolve: relying on IP addresses, well, just about anywhere
will go haywire if they change.

Sorry but your remaining issues are not specific to RabbitMQ. RabbitMQ nodes cannot possibly know what the new IP is. It's up to you
to make sure nodes use stable hostnames to identify each other.
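A hypothetical sketch of what that could look like with the peer discovery plugin mentioned earlier (option names assume the 3.7 rabbitmq.conf format; verify them against your chart):

```
# rabbitmq.conf: resolve peers by hostname instead of pod IP
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
```

The node name itself would then also need to be derived from the pod's stable StatefulSet hostname (for example via RABBITMQ_NODENAME set from the Downward API) rather than from the pod IP.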
