RMQ in a bad state; how can I fix it?

Albert Meyer

Dec 2, 2019, 7:02:49 PM
to rabbitmq-users
We had a network issue and now RMQ is unhappy. I see RMQ errors in the logs of various OpenStack components:

/var/log/neutron/neutron-metadata-agent.log:
error: [Errno 32] Broken pipe


/var/log/designate/designate-central.log:

2019-12-02 15:28:14.203 4106 ERROR oslo.messaging._drivers.impl_rabbit [req-155b359b-4468-46ce-a95a-e2a8a50fc3ee 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - - -] [ee9e3b84-55c7-40a4-a7e5-7f9c72688002] AMQP server on us01odc-p02-ctrl2.internal.synopsys.com:5672 is unreachable: [Errno 32] Broken pipe. Trying again in 1 seconds.: error: [Errno 32] Broken pipe

If I look at cluster_status it looks fine:

root@us01odc-p02-ctrl1:/var/log/rabbitmq# rabbitmqctl cluster_status
Cluster status of node 'rabbit@us01odc-p02-ctrl1'
[{nodes,[{disc,['rabbit@us01odc-p02-ctrl1']},
         {ram,['rabbit@us01odc-p02-ctrl3','rabbit@us01odc-p02-ctrl2']}]},
 {running_nodes,['rabbit@us01odc-p02-ctrl3','rabbit@us01odc-p02-ctrl2',
                 'rabbit@us01odc-p02-ctrl1']},
 {cluster_name,<<"rab...@us01odc-p02-ctrl1.internal.synopsys.com">>},
 {partitions,[]},
 {alarms,[{'rabbit@us01odc-p02-ctrl3',[]},
          {'rabbit@us01odc-p02-ctrl2',[]},
          {'rabbit@us01odc-p02-ctrl1',[]}]}]


But when I go to the web interface and look at the Summary, I see 1000 Ready messages and 4500 Unacked. If I go to Queues and sort by Total, I see the unacked messages under q-plugin:

Name      Node                  Features  State    Ready  Unacked  Total  incoming  deliver / get  ack
q-plugin  us01odc-p02-ctrl1 +2  ha-all    running  0      4,659    4,659  1.8/s     4.8/s          3.4/s

The q-plugin queue was out of sync so I synced it but that didn't fix the problem. If I try to get messages from q-plugin I get an error "Queue is empty." If I purge it, the unacked messages don't go away. How can I help this queue recover from the network outage?
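
For anyone reading along, the same numbers can be checked outside the web interface. As far as I understand it, a purge only drops Ready messages; Unacked ones have already been delivered to a consumer and stay attached to that consumer's channel until it acks them or its channel or connection closes. The Python sketch below parses the tab-separated output of `rabbitmqctl list_queues name messages_ready messages_unacknowledged` and lists queues with a large unacked backlog. The function names, the threshold, and the `q-example` row are made up for illustration; only `q-plugin` and its count come from this thread.

```python
def parse_list_queues(output):
    """Parse rows of: name <TAB> messages_ready <TAB> messages_unacknowledged."""
    rows = []
    for line in output.splitlines():
        parts = line.strip().split("\t")
        # Skip banner or header lines such as "Listing queues ..."
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        rows.append((parts[0], int(parts[1]), int(parts[2])))
    return rows


def queues_with_unacked(rows, min_unacked=1000):
    """Return queues holding at least min_unacked unacknowledged messages, largest first."""
    hits = [row for row in rows if row[2] >= min_unacked]
    return sorted(hits, key=lambda row: row[2], reverse=True)


# Hypothetical captured output; q-example is an invented queue name.
sample = "Listing queues ...\nq-plugin\t0\t4659\nq-example\t3\t0\n"

for name, ready, unacked in queues_with_unacked(parse_list_queues(sample)):
    print(name, "ready:", ready, "unacked:", unacked)
```

If a queue shows up here with zero Ready and thousands Unacked, the next thing to look at would be which consumer connections are holding those deliveries, rather than the queue itself.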

Wesley Peng

Dec 2, 2019, 7:10:54 PM
to rabbitm...@googlegroups.com
This looks more like an OpenStack networking issue.
Have you checked the neutron setup?
Is it working correctly?
What about the configuration of namespaces, firewall, routing, etc.?

Regards 


Albert Meyer

Dec 2, 2019, 9:22:24 PM
to rabbitmq-users
We haven't changed the configuration. It wasn't a neutron issue; the external network was failing, and somehow that caused messages to stack up in RMQ, and now it isn't working right. Before the network issue everything worked correctly.

The neutron logs look normal except for the RMQ failures. RMQ isn't hard down; I see the clients losing their connection and then getting it back:

designate-sink.log:
2019-12-02 15:23:25.414 4115 ERROR oslo.messaging._drivers.impl_rabbit [-] [a25cf2e1-8f04-43ef-a66b-b3b4694c58db] AMQP server on us01odc-p02-ctrl3.internal.synopsys.com:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.: error: [Errno 111] ECONNREFUSED
2019-12-02 15:23:26.430 4115 INFO oslo.messaging._drivers.impl_rabbit [-] [a25cf2e1-8f04-43ef-a66b-b3b4694c58db] Reconnected to AMQP server on us01odc-p02-ctrl1.internal.synopsys.com:5672 via [amqp] client with port 54292.

designate-central.log:
2019-12-02 15:28:14.203 4106 ERROR oslo.messaging._drivers.impl_rabbit [req-155b359b-4468-46ce-a95a-e2a8a50fc3ee 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - - -] [ee9e3b84-55c7-40a4-a7e5-7f9c72688002] AMQP server on us01odc-p02-ctrl2.internal.synopsys.com:5672 is unreachable: [Errno 32] Broken pipe. Trying again in 1 seconds.: error: [Errno 32] Broken pipe
2019-12-02 15:28:14.733 4106 INFO oslo.messaging._drivers.impl_rabbit [req-fa36139a-2e3c-4ee8-85ef-fdee55cab911 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - - -] [86ec0561-d1fd-4aaf-8df5-06c78f629ae9] Reconnected to AMQP server on us01odc-p02-ctrl2.internal.synopsys.com:5672 via [amqp] client with port 45198.
2019-12-02 15:28:14.858 4106 INFO oslo.messaging._drivers.impl_rabbit [req-4981f664-aca2-431e-b88b-92b8ee5b1975 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - - -] [5b6e58f5-93ad-4689-8d54-89081254de80] Reconnected to AMQP server on us01odc-p02-ctrl1.internal.synopsys.com:5672 via [amqp] client with port 58144.
2019-12-02 15:28:14.912 4106 ERROR oslo.messaging._drivers.impl_rabbit [req-4981f664-aca2-431e-b88b-92b8ee5b1975 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - - -] [2d51ea22-7b76-43c0-a0cc-97429b12a256] AMQP server on us01odc-p02-ctrl2.internal.synopsys.com:5672 is unreachable: [Errno 32] Broken pipe. Trying again in 1 seconds.: error: [Errno 32] Broken pipe
2019-12-02 15:28:14.936 4106 INFO oslo.messaging._drivers.impl_rabbit [req-4981f664-aca2-431e-b88b-92b8ee5b1975 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - - -] [8857e0aa-79c1-46e7-9e2a-d4889209167b] Reconnected to AMQP server on us01odc-p02-ctrl2.internal.synopsys.com:5672 via [amqp] client with port 45204.
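
To see whether the disconnects cluster on one broker node, the lines above can be tallied per host. This is only a rough sketch: the two regular expressions are written against the message text shown in these excerpts, and `summarize` is a name made up for illustration, not anything from oslo.messaging.

```python
import re
from collections import Counter

# Patterns written against the log text quoted in this thread.
UNREACHABLE = re.compile(
    r"AMQP server on ([^\s:]+):\d+ is unreachable: \[Errno (\d+)\]"
)
RECONNECTED = re.compile(r"Reconnected to AMQP server on ([^\s:]+):\d+")


def summarize(lines):
    """Count failures per (host, errno) and reconnects per host."""
    failures = Counter()
    reconnects = Counter()
    for line in lines:
        match = UNREACHABLE.search(line)
        if match:
            failures[(match.group(1), int(match.group(2)))] += 1
            continue
        match = RECONNECTED.search(line)
        if match:
            reconnects[match.group(1)] += 1
    return failures, reconnects


# The two designate-sink.log lines from above.
sample_lines = [
    "2019-12-02 15:23:25.414 4115 ERROR oslo.messaging._drivers.impl_rabbit [-] "
    "[a25cf2e1-8f04-43ef-a66b-b3b4694c58db] AMQP server on "
    "us01odc-p02-ctrl3.internal.synopsys.com:5672 is unreachable: "
    "[Errno 111] ECONNREFUSED. Trying again in 1 seconds.: error: "
    "[Errno 111] ECONNREFUSED",
    "2019-12-02 15:23:26.430 4115 INFO oslo.messaging._drivers.impl_rabbit [-] "
    "[a25cf2e1-8f04-43ef-a66b-b3b4694c58db] Reconnected to AMQP server on "
    "us01odc-p02-ctrl1.internal.synopsys.com:5672 via [amqp] client with port 54292.",
]

failures, reconnects = summarize(sample_lines)
for (host, errno), count in failures.items():
    print("unreachable", host, "errno", errno, "x", count)
for host, count in reconnects.items():
    print("reconnected", host, "x", count)
```

Run over a whole log file (for example `summarize(open(path))`, where `path` points at one of the service logs), this would show whether Broken pipe and ECONNREFUSED errors are spread across all three nodes or concentrated on one.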

