Sending a message to the queue failed, after a large number of connection closed and reconnect


cuiyd

Jan 9, 2019, 11:14:41 AM
to rabbitmq-users

Hi,

 

    I am using RabbitMQ. Because of a batch restart of components, a large number of connections were closed and reconnected over a period of time. After that, several processes failed to send messages; the call stack is shown below. However, the consumers do receive the messages and log many duplicate-message warnings. This is because RabbitMQ does not send the ack (publisher confirm) back to the publisher.

 

[Attachment: 图片1.png (screenshot of the publisher call stack)]

 

I found a suspicious queue by running "rabbitmqctl eval 'rabbit_diagnostics:maybe_stuck().'" (the result is below), and the same error occurs when I send a message to that queue with my test script.

Found 6 suspicious processes.
[{pid,<6404.114.0>},
 {registered_name,[]},
 {current_stacktrace,[{worker_pool_worker,run,2,[]},
                      {worker_pool_worker,handle_call,3,[]},
                      {gen_server2,handle_msg,2,[]},
                      {proc_lib,wake_up,3,
                                [{file,"proc_lib.erl"},{line,250}]}]},
 {initial_call,{proc_lib,init_p,5}},
 {dictionary,[{'$ancestors',[worker_pool_sup,rabbit_sup,<6404.87.0>]},
              {worker_pool_worker,true},
              {'$initial_call',{gen,init_it,6}}]},
 {message_queue_len,0},
 {links,[<6404.106.0>,<6404.1159.73>]},
 {monitors,[]},
 {monitored_by,[<6404.104.0>,<6404.107.0>,<6404.24861.72>]},
 {heap_size,376}]
[{pid,<6404.24861.72>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_misc,execute_mnesia_transaction,1,
          [{file,"src/rabbit_misc.erl"},{line,530}]},
      {rabbit_misc,execute_mnesia_tx_with_tail,1,
          [{file,"src/rabbit_misc.erl"},{line,572}]},
      {rabbit_amqqueue_process,'-terminate_delete/3-fun-1-',6,[]},
      {rabbit_amqqueue_process,terminate_shutdown,2,[]},
      {gen_server2,terminate,3,[]},
      {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,250}]}]},
 {initial_call,{proc_lib,init_p,5}},
 {dictionary,
     [{{xtype_to_module,direct},rabbit_exchange_type_direct},
      {'$ancestors',
          [<6404.24807.72>,rabbit_amqqueue_sup_sup,rabbit_sup,<6404.87.0>]},
      {process_name,
          {rabbit_amqqueue_process,
              {resource,<<"/">>,queue,
                  <<"q-agent-notifier-l2pation-update_fanout_497f9863489e424f87e97aee3e11b3ae">>}}},
      {'$initial_call',{gen,init_it,6}},
      {guid,{{3585719385,3740494773,3798005005,4196272880},0}}]},
 {message_queue_len,6761},
 {links,[<6404.24807.72>]},
 {monitors,[{process,<6404.114.0>}]},
 {monitored_by,
     [<6404.18372.160>,<6404.299.0>,<6404.11487.141>,<6404.12726.137>,
      <6404.17385.137>,<6404.104.0>,<6404.19873.160>,<6404.17444.160>,
      <6404.17283.160>,<6404.17942.160>,<6404.18523.160>,<6404.17776.160>,
      <6404.16390.160>,<6404.18500.160>,<6404.16935.160>,<6404.2005.160>,
      <6404.17894.160>,<6404.2545.160>,<6404.18221.160>,<6404.20078.160>,
      <6404.2007.160>]},
 {heap_size,833026}]
[{pid,<6404.1159.73>},
 {registered_name,[]},
 {current_stacktrace,
     [{timer,sleep,1,[{file,"timer.erl"},{line,153}]},
      {mnesia_tm,restart,9,[{file,"mnesia_tm.erl"},{line,914}]},
      {rabbit_misc,'-execute_mnesia_transaction/1-fun-0-',1,
          [{file,"src/rabbit_misc.erl"},{line,534}]},
      {worker_pool_worker,'-run/2-fun-0-',3,[]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,
     [{mnesia_activity_state,
          {mnesia,{tid,709989,<6404.1159.73>},{tidstore,2907979901,[],1}}},
      {random_seed,{10057,8283,23562}},
      {worker_pool_worker,true}]},
 {message_queue_len,3},
 {links,[<6404.114.0>,<6404.152.0>]},
 {monitors,[]},
 {monitored_by,[]},
 {heap_size,376}]
[{pid,<6404.11487.141>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_amqqueue,info,2,[{file,"src/rabbit_amqqueue.erl"},{line,561}]},
      {rabbit_misc,with_exit_handler,2,
          [{file,"src/rabbit_misc.erl"},{line,495}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,510}]},
      {rabbit_misc,filter_exit_map,2,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_amqqueue,info_all,2,
          [{file,"src/rabbit_amqqueue.erl"},{line,590}]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,[{delegate,delegate_3}]},
 {message_queue_len,0},
 {links,[]},
 {monitors,[{process,<6404.24861.72>}]},
 {monitored_by,[<6404.14.0>]},
 {heap_size,121536}]
[{pid,<6404.12726.137>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_amqqueue,info,2,[{file,"src/rabbit_amqqueue.erl"},{line,561}]},
      {rabbit_misc,with_exit_handler,2,
          [{file,"src/rabbit_misc.erl"},{line,495}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,510}]},
      {rabbit_misc,filter_exit_map,2,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_amqqueue,info_all,2,
          [{file,"src/rabbit_amqqueue.erl"},{line,590}]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,[{delegate,delegate_14}]},
 {message_queue_len,0},
 {links,[]},
 {monitors,[{process,<6404.24861.72>}]},
 {monitored_by,[<6404.14.0>]},
 {heap_size,196650}]
[{pid,<6404.17385.137>},
 {registered_name,[]},
 {current_stacktrace,
     [{gen,do_call,4,[{file,"gen.erl"},{line,168}]},
      {gen_server2,call,3,[]},
      {rabbit_amqqueue,info,2,[{file,"src/rabbit_amqqueue.erl"},{line,561}]},
      {rabbit_misc,with_exit_handler,2,
          [{file,"src/rabbit_misc.erl"},{line,495}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_misc,'-filter_exit_map/2-lc$^0/1-0-',3,
          [{file,"src/rabbit_misc.erl"},{line,510}]},
      {rabbit_misc,filter_exit_map,2,
          [{file,"src/rabbit_misc.erl"},{line,508}]},
      {rabbit_amqqueue,info_all,2,
          [{file,"src/rabbit_amqqueue.erl"},{line,590}]}]},
 {initial_call,{erlang,apply,2}},
 {dictionary,[{delegate,delegate_5}]},
 {message_queue_len,0},
 {links,[]},
 {monitors,[{process,<6404.24861.72>}]},
 {monitored_by,[<6404.14.0>]},
 {heap_size,121536}]
ok


Therefore I suspect something is wrong with this queue and that it is causing the problem. The details of the suspicious queue are below; compared with a normal queue, you can see that it has no slave pids and no gm_pids.


{ok,#amqqueue{name = #resource{virtual_host = <<"/">>,
                               kind = queue,
                               name = <<"q-agent-notifier-l2pation-update_fanout_497f9863489e424f87e97aee3e11b3ae">>},
              durable = false,auto_delete = false,exclusive_owner = none,
              arguments = [{<<"x-expires">>,signedint,600000}],
              pid = <0.24861.72>,slave_pids = [],sync_slave_pids = [],
              down_slave_nodes = [],
              policy = [{vhost,<<"/">>},
                        {name,<<"ha_all_queue">>},
                        {pattern,<<"^">>},
                        {'apply-to',<<"queues">>},
                        {definition,[{<<"ha-mode">>,<<"all">>},
                                     {<<"ha-sync-mode">>,<<"automatic">>},
                                     {<<"max-length">>,46600},
                                     {<<"message-ttl">>,46400000}]},
                        {priority,1}],
              gm_pids = [],decorators = [],state = live}}


A normal queue, for comparison:

{ok,#amqqueue{name = #resource{virtual_host = <<"/">>,
                               kind = queue,
                               name = <<"q-agent-notifier-l2pation-update_fanout_010f35aeea4b415189dc2f5b9b6062ad">>},
              durable = false,auto_delete = false,exclusive_owner = none,
              arguments = [{<<"x-expires">>,signedint,600000}],
              pid = <3186.30389.6>,
              slave_pids = [<0.19317.1>],
              sync_slave_pids = [<0.19317.1>],
              down_slave_nodes = [rabbit@rabbitmqNode0],
              policy = [{vhost,<<"/">>},
                        {name,<<"ha_all_queue">>},
                        {pattern,<<"^">>},
                        {'apply-to',<<"queues">>},
                        {definition,[{<<"ha-mode">>,<<"all">>},
                                     {<<"ha-sync-mode">>,<<"automatic">>},
                                     {<<"max-length">>,46600},
                                     {<<"message-ttl">>,46400000}]},
                        {priority,1}],
              gm_pids = [{<0.19318.1>,<0.19317.1>},
                         {<3186.30392.6>,<3186.30389.6>}],
              decorators = [],state = live}}


Why does RabbitMQ not reply with an ack even though the message is actually delivered? Is it because gm_pids is empty? What is the role of gm_pids? Does the call stack mean that the queue has been stuck deleting itself? What could cause the queue's gm_pids to be empty? In this case, is there any solution other than restarting RabbitMQ?


I am using RabbitMQ 3.5.6 on Erlang 18.3.


Thanks!

Luke Bakken

Jan 9, 2019, 4:50:46 PM
to rabbitmq-users
Hello,

Please, please upgrade both RabbitMQ and Erlang. RabbitMQ 3.5.6 is over 3 years old, and that version of Erlang is known to have TCP issues.

While the RabbitMQ core engineering team tries to help everyone out, it's not a good use of our time to try to debug issues for such old software.

Thanks,
Luke

Michael Klishin

Jan 11, 2019, 5:13:17 PM
to rabbitm...@googlegroups.com
In this specific case I'm much more likely to think that it's a lack of a sensible channel operation timeout
in RabbitMQ 3.5.x that's the biggest contributor.

That said, Erlang/OTP 18.3 does have bugs that are catastrophic to RabbitMQ.

Please see [1][2][3] and upgrade. Even RabbitMQ 3.6.x has been out of support for over 6 months now.




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

cuiyd

Jan 14, 2019, 12:15:33 PM
to rabbitmq-users
Thanks for the reply. These days I have been testing RabbitMQ 3.7.6, but when many connections are closed and reconnected there are still a lot of these errors, and the error log below keeps being printed even after the connections become stable.

I use RabbitMQ 3.7.6 with Erlang 20.3.8.14.
5000 servers are already running on topic rpc-01.
At the same time:
5000 servers on topic rpc-02,
5000 servers on topic rpc-03,
and a client on topic rpc-01 publishes 60 fanout messages.
 
the error log:
2019-01-14 21:36:40,280 - DEBUG: Received recoverable error from kombu:
Traceback (most recent call last):
  File "xxxxxx/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "xxxxxx/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 804, in execute_method
    method()
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 1220, in _publish
    compression=self.kombu_compression)
  File "xxxxxx/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "xxxxxx/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "xxxxxx/amqp/channel.py", line 2133, in basic_publish_confirm
    self.wait(allowed_methods=[(60, 80), (60, 120)], timeout=30)
  File "xxxxxx/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods, timeout)
  File "xxxxxx/amqp/connection.py", line 241, in _wait_method
    channel, method_sig, args, content = read_timeout(timeout)
  File "xxxxxx/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "xxxxxx/amqp/method_framing.py", line 196, in read_method
    raise m
timeout
2019-01-14 21:36:40,281 - ERROR: AMQP server on x.x.x.x:5672 is unreachable: . Trying again in 1 seconds.




Then I ran the command 'rabbitmqctl list_queues name messages consumers', and the following was printed in the rabbitmq-server log:

2019-01-14 22:16:39.527 [warning] <0.150.0> Mnesia(user@testNode0): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:39.645 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:39.731 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.256 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.402 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.417 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:40.461 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.045 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.396 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.430 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.437 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.873 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.942 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.962 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:42.992 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}
2019-01-14 22:16:43.035 [warning] <0.150.0> Mnesia(user@testNode0 ): ** WARNING ** Mnesia is overloaded: {dump_log,write_threshold}


2019-01-14 22:15:53.049 [error] emulator Discarding message {'$gen_call',{<0.17752.2973>,#Ref<0.1372313994.2354315267.255964>},{info,[name,messages,consumers]}} from <0.17752.2973> to <0.17867.20> in an old incarnation (2) of this node (1)

2019-01-14 22:15:53.253 [error] emulator Discarding message {'$gen_call',{<0.17752.2973>,#Ref<0.1372313994.1123287061.26354>},{info,[name,messages,consumers]}} from <0.17752.2973> to <0.23066.20> in an old incarnation (2) of this node (1)

2019-01-14 22:15:53.253 [error] emulator Discarding message {'$gen_call',{<0.17752.2973>,#Ref<0.1372313994.1123287061.26381>},{info,[name,messages,consumers]}} from <0.17752.2973> to <0.18739.20> in an old incarnation (2) of this node (1)


This problem has been bothering me for a long time; I hope you can help.

Thanks!
cuiyd

On Saturday, January 12, 2019 at 6:13:17 AM UTC+8, Michael Klishin wrote:

cuiyd

Jan 14, 2019, 12:23:56 PM
to rabbitmq-users
Thank you for your reply, and sorry for not being able to upgrade to the new version in time. These days I have tested this problem on RabbitMQ 3.7.6 and found that similar errors still occur. I would be very glad to get your help; for details, please see my reply to Michael.

Thanks
cuiyd
On Thursday, January 10, 2019 at 5:50:46 AM UTC+8, Luke Bakken wrote:

Michael Klishin

Jan 14, 2019, 12:31:41 PM
to rabbitm...@googlegroups.com
I'm not sure I understand what the question is. Kombu is a Python client that historically had awful limitations
and poor design choices, e.g. it did not support heartbeats at all [1], which means RabbitMQ was closing connections
on it if they didn't have any activity. So if you have mass client disconnects with Kombu, that's my leading hypothesis.

Avoid Kombu if you can help it; Pika is a much better client in almost every way.

You can almost certainly ignore the Mnesia overload warning.

Lastly, the "previous incarnation" message means that a queue mirror received a message or operation directed at the previous
queue master [2] (but now supposedly this one was elected a new master for the queue). Such messages are discarded.

This was a fairly common warning to see in the 3.5.x and early 3.6.x days but not any more (could be because we handle those scenarios
better or with fewer warnings logged).

cuiyd

Feb 17, 2019, 10:09:09 AM
to rabbitmq-users
Thanks for the reply!
I use kombu and oslo_messaging, which do support heartbeats, like other clients; as far as I know, oslo_messaging sets the heartbeat via heartbeat_timeout_threshold.
My problem is:
Because of the restart of many components, a large number of connections were closed and reconnected over a period of time, and as a result several processes repeatedly fail to send messages. The traceback is below. However, the consumer receives the messages and logs many duplicates; this is because RabbitMQ does not send the ack back to the publisher.

2019-01-14 21:38:41,285 - DEBUG: Received recoverable error from kombu:
Traceback (most recent call last):
  File "xxxxxx/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "xxxxxx/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 804, in execute_method
    method()
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 1220, in _publish
    compression=self.kombu_compression)
  File "xxxxxx/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "xxxxxx/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "xxxxxx/amqp/channel.py", line 2133, in basic_publish_confirm
    self.wait(allowed_methods=[(60, 80), (60, 120)], timeout=30)
  File "xxxxxx/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods, timeout)
  File "xxxxxx/amqp/connection.py", line 241, in _wait_method
    channel, method_sig, args, content = read_timeout(timeout)
  File "xxxxxx/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "xxxxxx/amqp/method_framing.py", line 196, in read_method
    raise m
timeout
2019-01-14 21:38:41,286 - ERROR: AMQP server on x.x.x.x:5672 is unreachable: . Trying again in 1 seconds.
2019-01-14 21:40:42,288 - DEBUG: Received recoverable error from kombu:
Traceback (most recent call last):
  File "xxxxxx/kombu/connection.py", line 436, in _ensured
    return fun(*args, **kwargs)
  File "xxxxxx/kombu/connection.py", line 508, in __call__
    return fun(*args, channel=channels[0], **kwargs), channels[0]
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 804, in execute_method
    method()
  File "xxxxxx/oslo_messaging/_drivers/impl_rabbit.py", line 1220, in _publish
    compression=self.kombu_compression)
  File "xxxxxx/kombu/messaging.py", line 172, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "xxxxxx/kombu/messaging.py", line 188, in _publish
    mandatory=mandatory, immediate=immediate,
  File "xxxxxx/amqp/channel.py", line 2133, in basic_publish_confirm
    self.wait(allowed_methods=[(60, 80), (60, 120)], timeout=30)
  File "xxxxxx/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods, timeout)
  File "xxxxxx/amqp/connection.py", line 241, in _wait_method
    channel, method_sig, args, content = read_timeout(timeout)
  File "xxxxxx/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "xxxxxx/amqp/method_framing.py", line 196, in read_method
    raise m
timeout
2019-01-14 21:40:40,289 - ERROR: AMQP server on x.x.x.x:5672 is unreachable: . Trying again in 1 seconds.
.....
On Tuesday, January 15, 2019 at 1:31:41 AM UTC+8, Michael Klishin wrote:

Luke Bakken

Feb 17, 2019, 11:07:06 AM
to rabbitmq-users
Hello,

These stack traces are only marginally helpful. Please note this message:

ERROR: AMQP server on x.x.x.x:5672 is unreachable

"unreachable" usually means there is a network issue. Are you using a firewall or load balancer between your application and RabbitMQ? What is shown in the RabbitMQ log at the same point in time?

Thanks,
Luke

cuiyd

Feb 18, 2019, 11:50:33 AM
to rabbitmq-users
RabbitMQ log at the same point in time:
(the RabbitMQ log entry corresponding to each closed connection appears about 3 minutes later; the heartbeat is 60 s)
2019-02-14 39:36:40.027 [warning] <0.2872.0> closing AMQP connection <0.2872.0> (clientIP:43580 -> mqIP:5672):
missed heartbeats from client, timeout: 60s

The log "ERROR: AMQP server on x.x.x.x:5672 is unreachable" is just beacause of the log above.There is no network issue.If the network is unreachable ,it will log:No route to host or EHOSTUNREACH Immediately Instead of about 2mins later.

On Monday, February 18, 2019 at 12:07:06 AM UTC+8, Luke Bakken wrote:

Luke Bakken

Feb 18, 2019, 11:56:34 AM
to rabbitmq-users
Thanks for clarifying that.

"missed heartbeats" is exactly what it says it is - the library you're using and / or your code is preventing heartbeats from being sent to RabbitMQ.

Without a code sample that reproduces what you report, there isn't anything else to add.
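
For context, a minimal sketch of what keeping heartbeats alive looks like on the client side with kombu, assuming kombu's Connection(heartbeat=...) and heartbeat_check() API; the broker URL is a placeholder. If the application blocks for longer than roughly two heartbeat intervals without giving the library a chance to send a heartbeat frame, RabbitMQ logs "missed heartbeats from client" and closes the connection:

import time
from kombu import Connection

# Placeholder URL; heartbeat=60 matches the broker-side timeout in the logs.
with Connection('amqp://guest:guest@broker:5672//', heartbeat=60) as conn:
    conn.connect()
    while True:
        # Needs to run roughly every heartbeat/rate seconds (30 s here); it
        # sends a heartbeat frame when one is due and raises if the broker
        # has gone silent for too long.
        conn.heartbeat_check(rate=2)
        # ... do a bounded amount of work; a publish that blocks for minutes
        # waiting on a confirm starves this loop ...
        time.sleep(1)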

Luke

cuiyd

Feb 18, 2019, 12:14:51 PM
to rabbitmq-users
Thanks for the reply.
1. I captured packets to investigate. First the client closes the connection because of the timeout error (while waiting for the ack); then the client stops sending heartbeats to RabbitMQ, and after two heartbeat intervals RabbitMQ closes the connection for missed heartbeats.
So the cause of the connection close is the timeout while waiting for the ack from RabbitMQ.
2. I will attach a code sample later.


On Tuesday, February 19, 2019 at 12:56:34 AM UTC+8, Luke Bakken wrote:

Gabriele Santomaggio

Feb 18, 2019, 1:09:32 PM
to rabbitmq-users
Which oslo.messaging/kombu/py-amqp versions are you using?

There are a few fixes about that:
https://github.com/celery/py-amqp/issues/6

recently we fixed a problem related to "Too many heartbeats missed" here:
https://bugs.launchpad.net/oslo.messaging/+bug/1800957

about {dump_log,write_threshold}, read:

Check also the HA policy: you should not mirror all the queues, especially the reply_* queues.
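
As a hedged illustration of that last point, one way to scope the HA policy so it skips the transient reply_* queues is the RabbitMQ management HTTP API; the host, credentials, policy name and exact pattern below are placeholders, not values from this thread (the policy shown earlier in the thread mirrors everything with pattern "^"):

import requests

policy = {
    # Mirror everything except reply_* and amq.* queues; adjust the regex
    # to your own queue naming.
    "pattern": "^(?!reply_|amq\\.)",
    "definition": {"ha-mode": "all", "ha-sync-mode": "automatic"},
    "apply-to": "queues",
    "priority": 1,
}

# PUT /api/policies/<vhost>/<policy-name>; "%2F" is the default "/" vhost.
resp = requests.put(
    "http://broker:15672/api/policies/%2F/ha-no-reply",
    json=policy,
    auth=("guest", "guest"),
)
resp.raise_for_status()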

-
Gabriele

Michael Klishin

Feb 19, 2019, 1:34:51 PM
to rabbitm...@googlegroups.com
Publisher confirms can be lost due to network failures, so publishers should use a reasonable timeout
(comparable to the heartbeat timeout, perhaps) after which they should conclude that the message should be
republished.
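
A minimal sketch of that approach with kombu (the client stack used in this thread): enable publisher confirms, bound the wait, and republish when the publish fails, accepting that the consumer may then see duplicates, exactly as reported above. The broker URL, exchange name and retry count are placeholders:

from kombu import Connection, Exchange, Producer

conn = Connection(
    'amqp://guest:guest@broker:5672//',
    heartbeat=60,
    # With confirm_publish, publish() blocks until the broker sends basic.ack.
    transport_options={'confirm_publish': True},
)
exchange = Exchange('demo_fanout', type='fanout', durable=False)

def publish_or_republish(payload, max_retries=3):
    producer = Producer(conn, exchange=exchange)
    # Connection.ensure() re-establishes the connection/channel and re-runs
    # the publish when it hits a recoverable connection error, so the message
    # is republished rather than silently lost (and the consumer may see a
    # duplicate).
    safe_publish = conn.ensure(producer, producer.publish,
                               max_retries=max_retries)
    safe_publish(payload, exchange=exchange, routing_key='',
                 declare=[exchange])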

cuiyd

Feb 20, 2019, 7:50:19 AM
to rabbitmq-users
Thanks for the reply.
I use oslo.messaging 4.5.1
py-amqp 1.4.8-1.1
kombu 3.0.32-1
On Tuesday, February 19, 2019 at 2:09:32 AM UTC+8, Gabriele Santomaggio wrote:

cuiyd

Feb 20, 2019, 8:25:05 AM
to rabbitmq-users
Actually the client does use a timeout, and it republishes; however, because of the wait timeout, the connection keeps being reconnected and the message keeps being republished.

Way to reproduce (a rough sketch of the publisher side follows after this list):
I use RabbitMQ 3.7.6 with Erlang 20.3.8.14.
5000 servers are already running on topic rpc-01,
and a client on topic rpc-01 publishes 60 fanout messages.
At the same time (close the connections by killing the server-net.py process and reconnect immediately by running server-net.py again, repeated about every 60 s):
5000 servers on topic rpc-02
5000 servers on topic rpc-03
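
Since the attached client-net.py / server-net.py are not reproduced in this archive, the following is only a rough, hypothetical approximation of the publisher side of the scenario above (60 fanout messages published with confirms enabled while consumer connections churn); every name and the broker URL are placeholders:

from kombu import Connection, Exchange, Producer

with Connection('amqp://guest:guest@broker:5672//',
                heartbeat=60,
                transport_options={'confirm_publish': True}) as conn:
    # oslo.messaging-style fanout exchange name for topic rpc-01 (assumed).
    exchange = Exchange('rpc-01_fanout', type='fanout', durable=False)
    producer = Producer(conn, exchange=exchange)
    for i in range(60):
        # Each publish blocks until the broker confirms it; under heavy
        # connection churn this wait is where the 30-second timeouts in the
        # tracebacks above show up.
        producer.publish({'msg_id': i, 'payload': 'x' * 128},
                         routing_key='', declare=[exchange])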

Thank
cuiyd

On Wednesday, February 20, 2019 at 2:34:51 AM UTC+8, Michael Klishin wrote:
client-net.py
server-net.py
rpc-net.conf

cuiyd

Feb 20, 2019, 10:01:28 PM
to rabbitmq-users
Please use the new code. Thanks.

On Wednesday, February 20, 2019 at 9:25:05 PM UTC+8, cuiyd wrote:
server-ack.py
client-net.py
rpc-net.conf

cuiyd

Feb 25, 2019, 12:41:19 PM
to rabbitmq-users
This issue also happens in a different environment, and it always appears when a lot of connections are closed and reconnected. I am not sure whether the problem is a performance issue or has some other cause, so I would like to ask the community for help. Perhaps no one else has raised a similar question because this scenario, publishing messages while a lot of connections are being closed and reconnected, is uncommon.

On Thursday, February 21, 2019 at 11:01:28 AM UTC+8, cuiyd wrote: