Queues lost on RabbitMQ pods after VM restart


杜昱萱

Jun 22, 2018, 3:14:46 AM
to rabbitmq-users
We installed RabbitMQ with a Helm chart.
There are then 2 pods on the worker nodes, which form a cluster.
$ kubectl get pod
rabbitmq-0                               1/1       Running            2          1h
rabbitmq-1                               1/1       Running            2          1h

On the node there are 37 queues, which are declared as durable=true and auto-delete=false.
We also set an HA policy:
bash-4.2# rabbitmqctl set_policy ha-all "" '{"ha-mode":"exactly", "ha-params":2, "ha-promote-on-shutdown":"always"}'

Then we shut off all the worker nodes and restarted them.
Only 23 queues are left.
14 queues are missing, and this is not random: we always lose the same queues.
The policy I had just set is lost as well.

Any idea of the issue?
The RabbitMQ version is 3.7.0.

Thanks.


Michael Klishin

Jun 22, 2018, 3:44:15 AM
to rabbitm...@googlegroups.com
If the line you use really is

rabbitmqctl set_policy ha-all "" '{"ha-mode":"exactly", "ha-params":2, "ha-promote-on-shutdown":"always"}'

then it will match no queues since the pattern is an empty string. Therefore no queues will be mirrored.
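For example, a sketch of a corrected command (assuming the intent is to mirror every queue in the default vhost; ".*" matches all queue names):

```shell
# The second argument is a regular expression matched against queue names;
# ".*" matches every queue, unlike the empty string used above.
rabbitmqctl set_policy ha-all ".*" \
  '{"ha-mode":"exactly", "ha-params":2, "ha-promote-on-shutdown":"always"}'
```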

There isn't much information to work with otherwise, see server logs for clues.

We don't guess on this list, but other possible causes that have come up here before, specifically on Kubernetes, are:

 * Stateful sets were not used (this is a requirement) [1]
 * An example deployment file included a dangerous option and the Helm chart adopted it without much consideration [2][3]


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

杜昱萱

Jun 22, 2018, 4:39:37 AM
to rabbitmq-users
However, I can see that the defined ha-all policy really does apply to all queues.
bash-4.2# rabbitmqctl list_queues durable auto_delete consumers policy pid owner_pid exclusive name
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
true    false   0       ha-all  <rab...@rabbitmq-0.3.368.0>             false   agentdmupdate
true    false   10      ha-all  <rab...@rabbitmq-0.3.374.0>             false   diameter-adapter
true    false   20      ha-all  <rab...@rabbitmq-0.3.371.0>             false   cig.adaptationlayer.update.request.impact
true    false   1       ha-all  <rab...@rabbitmq-0.3.377.0>             false   impactexchange.request.lwm2m
true    false   1       ha-all  <rab...@rabbitmq-0.3.380.0>             false   impactexchange.request.oma2
true    false   3       ha-all  <rab...@rabbitmq-0.3.383.0>             false   agentresponse
true    false   1       ha-all  <rab...@rabbitmq-0.3.386.0>             false   agentnotifydefault
true    false   1       ha-all  <rab...@rabbitmq-0.3.389.0>             false   PPGNODE_1
true    false   1       ha-all  <rab...@rabbitmq-0.3.392.0>             false   gateway.message.mbus
true    false   0       ha-all  <rab...@rabbitmq-0.3.395.0>             false   pushnotifyretry
true    false   2       ha-all  <rab...@rabbitmq-0.3.398.0>             false   diameter-adapter-adapter1

And in the Helm chart, we use the peer-discovery-k8s plugin to form the cluster.
The node names are based on IP addresses.
bash-4.2# pwd
/var/lib/rabbitmq/mnesia
bash-4.2# ls
rabbit@192.168.1.140                 rabbit@192.168.1.181                 rabbit@192.168.1.244                 rabbit@192.168.1.41
rab...@192.168.1.140-plugins-expand  rab...@192.168.1.181-plugins-expand  rab...@192.168.1.244-plugins-expand  rab...@192.168.1.41-plugins-expand
rabbit@192.168.1.140.pid             rabbit@192.168.1.181.pid             rabbit@192.168.1.244.pid             rabbit@192.168.1.41.pid
rabbit@192.168.1.172                 rabbit@192.168.1.190                 rabbit@192.168.1.36
rab...@192.168.1.172-plugins-expand  rab...@192.168.1.190-plugins-expand  rab...@192.168.1.36-plugins-expand
rabbit@192.168.1.172.pid             rabbit@192.168.1.190.pid             rabbit@192.168.1.36.pid

After a restart the pod IPs have changed, so every restart generates a new data directory.


Michael Klishin

Jun 22, 2018, 5:03:04 AM
to rabbitm...@googlegroups.com
You can configure the node's data directory [1] to not include the hostname (or IP address),
or make sure that the hostname doesn't change and resolves to whatever IP is currently in use.
The latter is probably optimal, because you will have to access the management UI somehow, and
using IP addresses for that is not what humans like to do.



杜昱萱

Jun 22, 2018, 5:17:30 AM
to rabbitmq-users
Thank you so much!
For the first option, you mean setting RABBITMQ_MNESIA_DIR to a stable directory?


Michael Klishin

Jun 22, 2018, 5:42:19 AM
to rabbitm...@googlegroups.com
Yes, e.g. set it to /var/lib/rabbitmq/db via rabbitmq-env.conf [1] or any other way of setting
environment variables and this specific issue should be gone. Note that the RABBITMQ_ prefix must
be removed in rabbitmq-env.conf but present in all other cases.
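A minimal sketch of such a rabbitmq-env.conf (the path is the one suggested above; note the missing prefix):

```shell
# rabbitmq-env.conf: variable names lose their RABBITMQ_ prefix in this file
MNESIA_DIR=/var/lib/rabbitmq/db
```

When set instead through the container environment (e.g. in the StatefulSet spec), the full name RABBITMQ_MNESIA_DIR would be used.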



杜昱萱

Jun 22, 2018, 7:37:34 AM
to rabbitmq-users
Oh, I changed it to RABBITMQ_MNESIA_DIR=/var/lib/rabbitmq/mnesia/localhost in rabbitmq-env.conf, and the RabbitMQ server can be brought up.
The issue now is that the cluster_nodes.config file under this directory records the previous nodes' IP information,
so the nodes cannot cluster, since the pod IPs change after a VM restart.

$ kubectl get pod
rabbitmq-0                               0/1       CrashLoopBackOff   3          24m
rabbitmq-1                               0/1       CrashLoopBackOff   3          24m
$ kubectl logs rabbitmq-0
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{failed_to_cluster_with,['rab...@192.168.1.142','rab...@192.168.1.210'],\"Mnesia could not connect to any nodes.\"},{rabbit,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{failed_to_cluster_with,['rab...@192.168.1.142','rab...@192.168.1.210'],"Mnesia could not connect to any nodes."}


Michael Klishin

Jun 22, 2018, 11:03:40 AM
to rabbitm...@googlegroups.com
That's why I suggested using domain names that resolve: relying on IP addresses, well, just about anywhere
will go haywire if they change.

Sorry but your remaining issues are not specific to RabbitMQ. RabbitMQ nodes cannot possibly know what the new IP is. It's up to you
to make sure nodes use stable hostnames to identify each other.
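A hypothetical sketch of what that could look like with the peer discovery plugin mentioned earlier (option names assume the 3.7 rabbitmq.conf format; verify them against your chart):

```
# rabbitmq.conf: resolve peers by hostname instead of pod IP
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
```

The node name itself would then also need to be derived from the pod's stable StatefulSet hostname (for example via RABBITMQ_NODENAME set from the Downward API) rather than from the pod IP.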
