Sending a message to the queue failed, after a large number of connection closed and reconnect


cuiyd

Jan 9, 2019, 11:14:41 AM
to rabbitmq-users

Hi,

 

    I am using RabbitMQ. Because of a batch restart of components, a large number of connections were closed and reconnected over a period of time. After that, several processes failed to send messages; the call stack is shown below. However, the consumers do receive the messages and log many duplicate-message warnings. This is because RabbitMQ does not send the ack (publisher confirm) back to the publisher.

 

[Attachment: 图片1.png (screenshot of the publisher call stack)]

 

I found a suspicious queue by running "rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'" (the result is below), and the same error occurs when I send a message to that queue with my test script.

Found 6 suspicious processes.
[{pid,<6404.114.0>},
 {registered_name,[]},
 {current_stacktrace,[{worker_pool_worker,run,2,[]},
                      {worker_pool_worker,handle_call,3,[]},
                      {gen_server2,handle_msg,2,[]},
                      {proc_lib,wake_up,3,
                                [{file,"proc_lib.erl"},{line,250}]}]},
 {initial_call,{proc_lib,init_p,5}},
 {dictionary,[{'$ancestors',[worker_pool_sup,rabbit_sup,<6404.87.0>]},
              {worker_pool_worker,true},
              {'$initial_call',{gen,init_it,6}}]},
 {message_queue_len,0},
 {links,[<6404.106.0>,<6404.1159.73>]},
 {monitors,[]},
 {monitored_by,[<6404.104.0>,<6404.107.0>,<6404.24861.72>]},
 {heap_size,376}]
[{pid,<6404.24861.72>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_misc,execute_mnesia_transaction,1,
          [{file,"src/rabbit_misc.erl"},{line,530}]},
      {rabbit_misc,execute_mnesia_tx_with_tail,1,
          [{file,"src/rabbit_misc.erl"},{line,572}]},
      {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
      {rabbit_amqqueue_process,terminate_shutdown,2,[]},
      {gen_server2,terminate,3,[]},
      {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,250}]}]},
 {initial_call,{proc_lib,init_p,5}},
 {dictionary,
     [{{xtype_to_module,direct},rabbit_exchange_type_direct},
      {'$ancestors',
          [<6404.24807.72>,rabbit_amqqueue_sup_sup,rabbit_sup,<6404.87.0>]},
      {process_name,
          {rabbit_amqqueue_process,
              {resource,<<"/">>,queue,
                  <<"q-agent-notifier-l2pation-update_fanout_497f9863489e424f87e97aee3e11b3ae">>}}},
      {'$initial_call',{gen,init_it,6}},
      {guid,{{3585719385,3740494773,3798005005,4196272880},0}}]},
 {message_queue_len,6761},
 {links,[<6404.24807.72>]},
 {monitors,[{process,<6404.114.0>}]},
 {monitored_by,
     [<6404.18372.160>,<6404.299.0>,<6404.11487.141>,<6404.12726.137>,
      <6404.17385.137>,<6404.104.0>,<6404.19873.160>,<6404.17444.160>,
      <6404.17283.160>,<6404.17942.160>,<6404.18523.160>,<6404.17776.160>,
      <6404.16390.160>,<6404.18500.160>,<6404.16935.160>,<6404.2005.160>,
      <6404.17894.160>,<6404.2545.160>,<6404.18221.160>,<6404.20078.160>,
      <6404.2007.160>]},
 {heap_size,833026}]
[{pid,<6404.1159.73>},
 {registered_name,[]},
 {current_stacktrace,
     [{timer,sleep,1,[{file,"timer.erl"},{line,153}]},
      {mnesia_tm,restart,9,[{file,"mnesia_tm.erl"},{line,914}]},
      {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
          [{file,"src/rabbit_misc.erl"},{line,534}]},
      {worker_pool_worker,'-run/2-fun-0-',3,[]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,
     [{mnesia_activity_state,
          {mnesia,{tid,709989,<6404.1159.73>},{tidstore,2907979901,[],1}}},
      {random_seed,{10057,8283,23562}},
      {worker_pool_worker,true}]},
 {message_queue_len,3},
 {links,[<6404.114.0>,<6404.152.0>]},
 {monitors,[]},
 {monitored_by,[]},
 {heap_size,376}]
[{pid,<6404.11487.141>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_amqqueue,info,2,[{file,"src/rabbit_amqqueue.erl"},{line,561}]},
      {rabbit_misc,with_exit_handler,2,
          [{file,"src/rabbit_misc.erl"},{line,495}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,510}]},
      {rabbit_misc,filter_exit_map,2,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_amqqueue,info_all,2,
          [{file,"src/rabbit_amqqueue.erl"},{line,590}]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,[{delegate,delegate_3}]},
 {message_queue_len,0},
 {links,[]},
 {monitors,[{process,<6404.24861.72>}]},
 {monitored_by,[<6404.14.0>]},
 {heap_size,121536}]
[{pid,<6404.12726.137>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_amqqueue,info,2,[{file,"src/rabbit_amqqueue.erl"},{line,561}]},
      {rabbit_misc,with_exit_handler,2,
          [{file,"src/rabbit_misc.erl"},{line,495}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,510}]},
      {rabbit_misc,filter_exit_map,2,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_amqqueue,info_all,2,
          [{file,"src/rabbit_amqqueue.erl"},{line,590}]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,[{delegate,delegate_14}]},
 {message_queue_len,0},
 {links,[]},
 {monitors,[{process,<6404.24861.72>}]},
 {monitored_by,[<6404.14.0>]},
 {heap_size,196650}]
[{pid,<6404.17385.137>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_amqqueue,info,2,[{file,"src/rabbit_amqqueue.erl"},{line,561}]},
      {rabbit_misc,with_exit_handler,2,
          [{file,"src/rabbit_misc.erl"},{line,495}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,510}]},
      {rabbit_misc,filter_exit_map,2,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_amqqueue,info_all,2,
          [{file,"src/rabbit_amqqueue.erl"},{line,590}]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,[{delegate,delegate_5}]},
 {message_queue_len,0},
 {links,[]},
 {monitors,[{process,<6404.24861.72>}]},
 {monitored_by,[<6404.14.0>]},
 {heap_size,121536}]
ok


Therefore I suspect something is wrong with this queue and that it is causing the problem. The details of the suspicious queue are below; compared with a normal queue, you can see that it has no slave pids and no gm_pids.


{ok,#amqqueue{name = #resource{virtual_host = <<"/">>,
                               kind = queue,
                               name = <<"q-agent-notifier-l2pation-update_fanout_497f9863489e424f87e97aee3e11b3ae">>},
              durable = false,auto_delete = false,exclusive_owner = none,
              arguments = [{<<"x-expires">>,signedint,600000}],
              pid = <0.24861.72>,slave_pids = [],sync_slave_pids = [],
              down_slave_nodes = [],
              policy = [{vhost,<<"/">>},
                        {name,<<"ha_all_queue">>},
                        {pattern,<<"^">>},
                        {'apply-to',<<"queues">>},
                        {definition,[{<<"ha-mode">>,<<"all">>},
                                     {<<"ha-sync-mode">>,<<"automatic">>},
                                     {<<"max-length">>,46600},
                                     {<<"message-ttl">>,46400000}]},
                        {priority,1}],
              gm_pids = [],decorators = [],state = live}}


A normal queue, for comparison:

{ok,#amqqueue{name = #resource{virtual_host = <<"/">>,
                               kind = queue,
                               name = <<"q-agent-notifier-l2pation-update_fanout_010f35aeea4b415189dc2f5b9b6062ad">>},
              durable = false,auto_delete = false,exclusive_owner = none,
              arguments = [{<<"x-expires">>,signedint,600000}],
              pid = <3186.30389.6>,
              slave_pids = [<0.19317.1>],
              sync_slave_pids = [<0.19317.1>],
              down_slave_nodes = [rabbit@rabbitmqNode0],
              policy = [{vhost,<<"/">>},
                        {name,<<"ha_all_queue">>},
                        {pattern,<<"^">>},
                        {'apply-to',<<"queues">>},
                        {definition,[{<<"ha-mode">>,<<"all">>},
                                     {<<"ha-sync-mode">>,<<"automatic">>},
                                     {<<"max-length">>,46600},
                                     {<<"message-ttl">>,46400000}]},
                        {priority,1}],
              gm_pids = [{<0.19318.1>,<0.19317.1>},
                         {<3186.30392.6>,<3186.30389.6>}],
              decorators = [],state = live}}


Why does RabbitMQ not reply with an ack even though the message is actually delivered? Is it because gm_pids is empty? What is the role of gm_pids? Does the call stack mean that the queue has been stuck deleting itself? What could cause the queue's gm_pids to be empty? In this case, is there any solution other than restarting RabbitMQ?


I am using RabbitMQ 3.5.6 on Erlang 18.3.


Thanks!

Luke Bakken

Jan 9, 2019, 4:50:46 PM
to rabbitmq-users
Hello,

Please, please upgrade both RabbitMQ and Erlang. RabbitMQ 3.5.6 is over 3 years old, and that version of Erlang is known to have TCP issues.

While the RabbitMQ core engineering team tries to help everyone out, it's not a good use of our time to try to debug issues for such old software.

Thanks,
Luke

Michael Klishin

Jan 11, 2019, 5:13:17 PM
to rabbitm...@googlegroups.com
In this specific case I'm much more likely to think that it's a lack of a sensible channel operation timeout
in RabbitMQ 3.5.x that's the biggest contributor.

That said, Erlang/OTP 18.3 does have bugs that are catastrophic to RabbitMQ.

Please see [1][2][3] and upgrade. Even RabbitMQ 3.6.x has been out of support for over 6 months now.




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

cuiyd

Jan 14, 2019, 12:15:33 PM
to rabbitmq-users
Thanks for the reply. These days I have been testing RabbitMQ 3.7.6, but when many connections are closed and reconnected there are still a lot of these errors, and the error log below keeps being printed even after the connections become stable.

I use RabbitMQ 3.7.6 with Erlang 20.3.8.14.
5000 servers are already running on topic rpc-01.
At the same time:
5000 servers on topic rpc-02,
5000 servers on topic rpc-03,
and a client on topic rpc-01 publishes 60 fanout messages.
 
the error log:
2019-01-14 21:36:40,280 - DEBUG: Received recoverable error from kombu:
Traceback (most recent call last):
  File "xxxxxx/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "xxxxxx/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 804, in execute_method
    method()
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 1220, in _publish
    compression=self.kombu_compression)
  File "xxxxxx/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "xxxxxx/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "xxxxxx/amqp/channel.py", line 2133, in basic_publish_confirm
    self.wait(allowed_methods=[(60, 80), (60, 120)], timeout=30)
  File "xxxxxx/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods, timeout)
  File "xxxxxx/amqp/connection.py", line 241, in _wait_method
    channel, method_sig, args, content = read_timeout(timeout)
  File "xxxxxx/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "xxxxxx/amqp/method_framing.py", line 196, in read_method
    raise m
timeout
2019-01-14 21:36:40,281 - ERROR: AMQP server on x.x.x.x:5672 is unreachable: . Trying again in 1 seconds.




Then I ran the command 'rabbitmqctl list_queues name messages consumers', and the following was printed in the rabbitmq-server log:

2019-01-14 22:16:39.527 [warning] <0.150.0> Mnesia(user@testNode0): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:39.645 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:39.731 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.256 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.402 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.417 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.461 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.045 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.396 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.430 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.437 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.873 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.942 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.962 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.992 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:43.035 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}


2019-01-14 22:15:53.049 [error] emulator Discarding message {'$gen_call',{<0.17752.2973>,#Ref<0.1372313994.2354315267.255964>},{info,[name,messages,consumers]}} from <0.17752.2973> to <0.17867.20> in an old incarnation (2) of this node (1)

2019-01-14 22:15:53.253 [error] emulator Discarding message {'$gen_call',{<0.17752.2973>,#Ref<0.1372313994.1123287061.26354>},{info,[name,messages,consumers]}} from <0.17752.2973> to <0.23066.20> in an old incarnation (2) of this node (1)

2019-01-14 22:15:53.253 [error] emulator Discarding message {'$gen_call',{<0.17752.2973>,#Ref<0.1372313994.1123287061.26381>},{info,[name,messages,consumers]}} from <0.17752.2973> to <0.18739.20> in an old incarnation (2) of this node (1)


This problem has been bothering me for a long time; I hope you can help.

Thanks!
cuiyd

On Saturday, January 12, 2019 at 6:13:17 AM UTC+8, Michael Klishin wrote:

cuiyd

Jan 14, 2019, 12:23:56 PM
to rabbitmq-users
Thank you for your reply, and sorry for not being able to upgrade to the new version in time. These days I have tested this problem on RabbitMQ 3.7.6 and found that similar errors still occur. I would be very glad to get your help; for details, please see my reply to Michael.

Thanks
cuiyd
On Thursday, January 10, 2019 at 5:50:46 AM UTC+8, Luke Bakken wrote:

Michael Klishin

Jan 14, 2019, 12:31:41 PM
to rabbitm...@googlegroups.com
I'm not sure I understand what the question is. Kombu is a Python client that historically had awful limitations
and poor design choices, e.g. it did not support heartbeats at all [1], which means RabbitMQ was closing connections
on it if they didn't have any activity. So if you have mass client disconnects with Kombu, that's my leading hypothesis.

Avoid Kombu if you can help it; Pika is a much better client in almost every way.

You can almost certainly ignore the Mnesia overload warning.

Lastly, the "previous incarnation" message means that a queue mirror received a message or operation directed at the previous
queue master [2] (but now supposedly this one was elected a new master for the queue). Such messages are discarded.

This was a fairly common warning to see in the 3.5.x and early 3.6.x days but not any more (could be because we handle those scenarios
better or with fewer warnings logged).

cuiyd

Feb 17, 2019, 10:09:09 AM
to rabbitmq-users
Thanks for the reply!
I use kombu and oslo_messaging, which do support heartbeats, like other clients; as far as I know, oslo_messaging sets the heartbeat via heartbeat_timeout_threshold.
My problem is:
Because of the restart of many components, a large number of connections were closed and reconnected over a period of time, and as a result several processes repeatedly fail to send messages. The traceback is below. However, the consumer receives the messages and logs many duplicates; this is because RabbitMQ does not send the ack back to the publisher.

2019-01-14 21:38:41,285 - DEBUG: Received recoverable error from kombu:
Traceback (most recent call last):
  File "xxxxxx/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "xxxxxx/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 804, in execute_method
    method()
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 1220, in _publish
    compression=self.kombu_compression)
  File "xxxxxx/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "xxxxxx/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "xxxxxx/amqp/channel.py", line 2133, in basic_publish_confirm
    self.wait(allowed_methods=[(60, 80), (60, 120)], timeout=30)
  File "xxxxxx/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods, timeout)
  File "xxxxxx/amqp/connection.py", line 241, in _wait_method
    channel, method_sig, args, content = read_timeout(timeout)
  File "xxxxxx/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "xxxxxx/amqp/method_framing.py", line 196, in read_method
    raise m
timeout
2019-01-14 21:38:41,286 - ERROR: AMQP server on x.x.x.x:5672 is unreachable: . Trying again in 1 seconds.
2019-01-14 21:40:42,288 - DEBUG: Received recoverable error from kombu:
Traceback (most recent call last):
  File "xxxxxx/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "xxxxxx/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 804, in execute_method
    method()
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 1220, in _publish
    compression=self.kombu_compression)
  File "xxxxxx/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "xxxxxx/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "xxxxxx/amqp/channel.py", line 2133, in basic_publish_confirm
    self.wait(allowed_methods=[(60, 80), (60, 120)], timeout=30)
  File "xxxxxx/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods, timeout)
  File "xxxxxx/amqp/connection.py", line 241, in _wait_method
    channel, method_sig, args, content = read_timeout(timeout)
  File "xxxxxx/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "xxxxxx/amqp/method_framing.py", line 196, in read_method
    raise m
timeout
2019-01-14 21:40:40,289 - ERROR: AMQP server on x.x.x.x:5672 is unreachable: . Trying again in 1 seconds.
.....
On Tuesday, January 15, 2019 at 1:31:41 AM UTC+8, Michael Klishin wrote:

Luke Bakken

Feb 17, 2019, 11:07:06 AM
to rabbitmq-users
Hello,

These stack traces are only marginally helpful. Please note this message:

ERROR: AMQP server on x.x.x.x:5672 is unreachable

"unreachable" usually means there is a network issue. Are you using a firewall or load balancer between your application and RabbitMQ? What is shown in the RabbitMQ log at the same point in time?

Thanks,
Luke

cuiyd

Feb 18, 2019, 11:50:33 AM
to rabbitmq-users
RabbitMQ log at the same point in time:
(the RabbitMQ log entry corresponding to each closed connection appears about 3 minutes later; the heartbeat is 60 s)
2019-02-14 39:36:40.027 [warning] <0.2872.0> closing AMQP connection <0.2872.0> (clientIP:43580 -> mqIP:5672):
missed heartbeats from client, timeout: 60s

The log "ERROR: AMQP server on x.x.x.x:5672 is unreachable" is just beacause of the log above.There is no network issue.If the network is unreachable ,it will log:No route to host or EHOSTUNREACH Immediately Instead of about 2mins later.

On Monday, February 18, 2019 at 12:07:06 AM UTC+8, Luke Bakken wrote:

Luke Bakken

Feb 18, 2019, 11:56:34 AM
to rabbitmq-users
Thanks for clarifying that.

"missed heartbeats" is exactly what it says it is - the library you're using and / or your code is preventing heartbeats from being sent to RabbitMQ.

Without a code sample that reproduces what you report, there isn't anything else to add.
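
For context, a minimal sketch of what keeping heartbeats alive looks like on the client side with kombu, assuming kombu's Connection(heartbeat=...) and heartbeat_check() API; the broker URL is a placeholder. If the application blocks for longer than roughly two heartbeat intervals without giving the library a chance to send a heartbeat frame, RabbitMQ logs "missed heartbeats from client" and closes the connection:

import time
from kombu import Connection

# Placeholder URL; heartbeat=60 matches the broker-side timeout in the logs.
with Connection('amqp://guest:guest@broker:5672//', heartbeat=60) as conn:
    conn.connect()
    while True:
        # Needs to run roughly every heartbeat/rate seconds (30 s here); it
        # sends a heartbeat frame when one is due and raises if the broker
        # has gone silent for too long.
        conn.heartbeat_check(rate=2)
        # ... do a bounded amount of work; a publish that blocks for minutes
        # waiting on a confirm starves this loop ...
        time.sleep(1)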

Luke

cuiyd

Feb 18, 2019, 12:14:51 PM
to rabbitmq-users
Thanks for the reply.
1. I captured packets to investigate. First the client closes the connection because of the timeout error (while waiting for the ack); then the client stops sending heartbeats to RabbitMQ, and after two heartbeat intervals RabbitMQ closes the connection for missed heartbeats.
So the cause of the connection close is the timeout while waiting for the ack from RabbitMQ.
2. I will attach a code sample later.


On Tuesday, February 19, 2019 at 12:56:34 AM UTC+8, Luke Bakken wrote:

Gabriele Santomaggio

Feb 18, 2019, 1:09:32 PM
to rabbitmq-users
Which oslo.messaging/kombu/py-amqp versions are you using?

There are a few fixes about that:
https://github.com/celery/py-amqp/issues/6

recently we fixed a problem related to "Too many heartbeats missed" here:
https://bugs.launchpad.net/oslo.messaging/+bug/1800957

about {dump_log,write_threshold}, read:

Check also the HA policy: you should not mirror all the queues, especially the reply_* queues.
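
As a hedged illustration of that last point, one way to scope the HA policy so it skips the transient reply_* queues is the RabbitMQ management HTTP API; the host, credentials, policy name and exact pattern below are placeholders, not values from this thread (the policy shown earlier in the thread mirrors everything with pattern "^"):

import requests

policy = {
    # Mirror everything except reply_* and amq.* queues; adjust the regex
    # to your own queue naming.
    "pattern": "^(?!reply_|amq\\.)",
    "definition": {"ha-mode": "all", "ha-sync-mode": "automatic"},
    "apply-to": "queues",
    "priority": 1,
}

# PUT /api/policies/<vhost>/<policy-name>; "%2F" is the default "/" vhost.
resp = requests.put(
    "http://broker:15672/api/policies/%2F/ha-no-reply",
    json=policy,
    auth=("guest", "guest"),
)
resp.raise_for_status()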

-
Gabriele

Michael Klishin

Feb 19, 2019, 1:34:51 PM
to rabbitm...@googlegroups.com
Publisher confirms can be lost due to network failures, so publishers should use a reasonable timeout
(comparable to the heartbeat timeout, perhaps) after which they should conclude that the message should be
republished.
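
A minimal sketch of that approach with kombu (the client stack used in this thread): enable publisher confirms, bound the wait, and republish when the publish fails, accepting that the consumer may then see duplicates, exactly as reported above. The broker URL, exchange name and retry count are placeholders:

from kombu import Connection, Exchange, Producer

conn = Connection(
    'amqp://guest:guest@broker:5672//',
    heartbeat=60,
    # With confirm_publish, publish() blocks until the broker sends basic.ack.
    transport_options={'confirm_publish': True},
)
exchange = Exchange('demo_fanout', type='fanout', durable=False)

def publish_or_republish(payload, max_retries=3):
    producer = Producer(conn, exchange=exchange)
    # Connection.ensure() re-establishes the connection/channel and re-runs
    # the publish when it hits a recoverable connection error, so the message
    # is republished rather than silently lost (and the consumer may see a
    # duplicate).
    safe_publish = conn.ensure(producer, producer.publish,
                               max_retries=max_retries)
    safe_publish(payload, exchange=exchange, routing_key='',
                 declare=[exchange])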

cuiyd

Feb 20, 2019, 7:50:19 AM
to rabbitmq-users
Thanks for the reply.
I use oslo.messaging 4.5.1
py-amqp 1.4.8-1.1
kombu 3.0.32-1
On Tuesday, February 19, 2019 at 2:09:32 AM UTC+8, Gabriele Santomaggio wrote:

cuiyd

Feb 20, 2019, 8:25:05 AM
to rabbitmq-users
Actually the client does use a timeout, and it republishes; however, because of the wait timeout, the connection keeps being reconnected and the message keeps being republished.

Way to reproduce (a rough sketch of the publisher side follows after this list):
I use RabbitMQ 3.7.6 with Erlang 20.3.8.14.
5000 servers are already running on topic rpc-01,
and a client on topic rpc-01 publishes 60 fanout messages.
At the same time (close the connections by killing the server-net.py process and reconnect immediately by running server-net.py again, repeated about every 60 s):
5000 servers on topic rpc-02
5000 servers on topic rpc-03
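
Since the attached client-net.py / server-net.py are not reproduced in this archive, the following is only a rough, hypothetical approximation of the publisher side of the scenario above (60 fanout messages published with confirms enabled while consumer connections churn); every name and the broker URL are placeholders:

from kombu import Connection, Exchange, Producer

with Connection('amqp://guest:guest@broker:5672//',
                heartbeat=60,
                transport_options={'confirm_publish': True}) as conn:
    # oslo.messaging-style fanout exchange name for topic rpc-01 (assumed).
    exchange = Exchange('rpc-01_fanout', type='fanout', durable=False)
    producer = Producer(conn, exchange=exchange)
    for i in range(60):
        # Each publish blocks until the broker confirms it; under heavy
        # connection churn this wait is where the 30-second timeouts in the
        # tracebacks above show up.
        producer.publish({'msg_id': i, 'payload': 'x' * 128},
                         routing_key='', declare=[exchange])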

Thank
cuiyd

On Wednesday, February 20, 2019 at 2:34:51 AM UTC+8, Michael Klishin wrote:
client-net.py
server-net.py
rpc-net.conf

cuiyd

Feb 20, 2019, 10:01:28 PM
to rabbitmq-users
Please use the new code. Thanks.

On Wednesday, February 20, 2019 at 9:25:05 PM UTC+8, cuiyd wrote:
server-ack.py
client-net.py
rpc-net.conf

cuiyd

Feb 25, 2019, 12:41:19 PM
to rabbitmq-users
This issue also happens in a different environment, and it always appears when a lot of connections are closed and reconnected. I am not sure whether the problem is a performance issue or has some other cause, so I would like to ask the community for help. Perhaps no one else has raised a similar question because this scenario, publishing messages while a lot of connections are being closed and reconnected, is uncommon.

On Thursday, February 21, 2019 at 11:01:28 AM UTC+8, cuiyd wrote: