RabbitMQ on OpenStack: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply

2,480 views
Skip to first unread message

sankube

unread,
Jul 21, 2020, 8:10:34 AM7/21/20
to rabbitmq-users
Hi,

OpenStack train release deployed on kubernetes. RabbitMQ version 3.7.13

I see the below error only few times randomly when trying to create an instance. I see below error in nova-conductor log:

2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager [req-77f1419f-f62f-46f5-bcad-37797a3460f9 dddd13ba833f41d9bc8103980ffd0cf2 90499b64c9cc4d68a81f26ff751c0e3a - default default] Failed to schedule instances: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 32bf2a91b7e24492aaf0bc7bfcdadb44
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager Traceback (most recent call last):
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 397, in get
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     return self._queues[msg_id].get(block=True, timeout=timeout)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     return waiter.wait()
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     return get_hub().switch()
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     return self.greenlet.switch()
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager queue.Empty
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager During handling of the above exception, another exception occurred:
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager Traceback (most recent call last):
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/nova/conductor/manager.py", line 1360, in schedule_and_build_instances
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     instance_uuids, return_alternates=True)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/nova/conductor/manager.py", line 839, in _schedule_instances
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     return_alternates=return_alternates)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     instance_uuids, return_objects, return_alternates)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     return cctxt.call(ctxt, 'select_destinations', **msg_args)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     transport_options=self.transport_options)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     transport_options=transport_options)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 646, in send
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     transport_options=transport_options)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 634, in _send
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     call_monitor_timeout)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 523, in wait
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     message = self.waiters.get(msg_id, timeout=timeout)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager   File "/var/lib/openstack/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 401, in get
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager     'to message ID %s' % msg_id)
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 32bf2a91b7e24492aaf0bc7bfcdadb44
2020-07-21 11:16:44.940 1 ERROR nova.conductor.manager

When I grep rabbitmq logs for errors, all I see is below messages, but there's no actual errors themselves. Is there a way to see actual error?

2020-07-21 08:22:25.000 [warning] <0.118.0> lager_error_logger_h dropped 12 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 08:39:05.999 [warning] <0.118.0> lager_error_logger_h dropped 12 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 08:53:33.012 [warning] <0.118.0> lager_error_logger_h dropped 1 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 09:12:25.000 [warning] <0.118.0> lager_error_logger_h dropped 3 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 09:29:26.012 [warning] <0.118.0> lager_error_logger_h dropped 21 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 09:44:24.005 [warning] <0.118.0> lager_error_logger_h dropped 7 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 10:00:35.999 [warning] <0.118.0> lager_error_logger_h dropped 8 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 10:17:32.013 [warning] <0.118.0> lager_error_logger_h dropped 6 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 11:06:27.000 [warning] <0.118.0> lager_error_logger_h dropped 5 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 11:36:55.999 [warning] <0.118.0> lager_error_logger_h dropped 3 messages in the last second that exceeded the limit of 1000 messages/sec
2020-07-21 11:52:46.999 [warning] <0.118.0> lager_error_logger_h dropped 6 messages in the last second that exceeded the limit of 1000 messages/sec

I see few 'client unexpectedly closed connection' but think that's okay:

2020-07-21 11:16:10.318 [warning] <0.29574.49> closing AMQP connection <0.29574.49> (15.119.163.145:48536 -> 10.233.101.4:5672 - health-probe.py:26463:f4313de5-8770-418a-a053-712b5592fc96, vhost: 'neutron', user: 'neutron'):
client unexpectedly closed TCP connection
2020-07-21 11:16:51.413 [info] <0.30197.49> accepting AMQP connection <0.30197.49> (10.233.64.0:43144 -> 10.233.101.4:5672)
--
2020-07-21 11:16:52.313 [warning] <0.30197.49> closing AMQP connection <0.30197.49> (10.233.64.0:43144 -> 10.233.101.4:5672 - health-probe.py:5746:8447ed6c-a40f-42ec-a582-8ac80e7f93ff, vhost: 'nova', user: 'nova'):
client unexpectedly closed TCP connection
2020-07-21 11:17:03.513 [info] <0.30226.49> accepting AMQP connection <0.30226.49> (10.233.127.0:46908 -> 10.233.101.4:5672)
--
2020-07-21 11:17:03.688 [warning] <0.30226.49> closing AMQP connection <0.30226.49> (10.233.127.0:46908 -> 10.233.101.4:5672 - health-probe.py:10801:48e46432-b20d-48e5-a988-281886fdb1cb, vhost: 'neutron', user: 'neutron'):
client unexpectedly closed TCP connection

 I checked nova-scheduler logs and there's no trace of corresponding request there. So I assume nova-conductor could publish message to RabbitMQ but the message didn't reach nova-scheduler. What approach should we follow to debug this?

NOTE: We currently have not enabled debug for oslo.messaging in other service.


Reply all
Reply to author
Forward
0 new messages