[HA] Plenty of messages with "discarding message" errors after node restart in the cluster

Ivan Udovichenko

Jul 29, 2019, 10:26:38 AM
to rabbitmq-users
Hello,

RabbitMQ: 3.7.16
Erlang: 22.0.7

I run a cluster with 3 nodes and 4 queues with an "ha-mode: all" policy; node 2 was selected as the master for all queues.
I generated a load of about 800 messages/s.

Then I restarted the 2nd node and noticed errors like the following in its log file:
2019-07-29 14:14:43.032 [error] emulator Discarding message {'$gen_call',{<0.442.0>,#Ref<0.3746561768.1184890881.104501>},stat} from <0.442.0> to <0.32134.65> in an old incarnation (1) of this node (2)

2019-07-29 14:14:43.035 [error] emulator Discarding message {'$gen_call',{<0.435.0>,#Ref<0.3746561768.1184890881.104506>},stat} from <0.435.0> to <0.3956.17> in an old incarnation (1) of this node (2)

2019-07-29 14:14:43.311 [error] emulator Discarding message {'$gen_call',{<0.431.0>,#Ref<0.3746561768.1184890881.104511>},stat} from <0.431.0> to <0.32141.65> in an old incarnation (1) of this node (2)

Could you please explain in detail what this means?
I found an explanation by Mickaël Rémond [1], who stated: "The problem is that you have process ID retained in your cluster that come from a node that has been stopped and restarted."

Is this the same case? Could you please help me find those process IDs and the related cluster information? I can't verify Mickaël's explanation on my own.

Thank you in advance!

Luke Bakken

Jul 29, 2019, 3:32:05 PM
to rabbitmq-users
Hi Ivan,

In this case "message" is an Erlang VM inter-process message, not an AMQP message.

You can expect to see log entries like these when you restart a node.
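
To illustrate (a minimal sketch, not specific to RabbitMQ): '$gen_call' and '$gen_cast' are the internal envelopes that OTP's gen_server wraps around requests before sending them between Erlang processes. The stat requests in your log are most likely queue processes being polled for statistics.

%% Runnable in any Erlang shell: cast to a plain process and inspect the
%% raw tuple it receives.
Parent = self(),
Pid = spawn(fun() -> receive Msg -> Parent ! {got, Msg} end end),
gen_server:cast(Pid, stat),
receive {got, M} -> M end.
%% => {'$gen_cast',stat}   (calls are wrapped as {'$gen_call',{FromPid,Ref},Request})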

Do you see any other issues during your testing?

Ivan Udovichenko

Jul 29, 2019, 4:18:37 PM
to rabbitmq-users
Hi Luke,

Thank you for your reply. I want to understand the nature of these errors: why does a normal restart produce so many of them (at first sight)?
Yes, there are also other errors like:
...
2019-07-29 14:11:03.230 [error] emulator Discarding message {'$gen_cast',{force_event_refresh,#Ref<0.3746561768.1184628737.258574>}} from <0.291.0> to <0.30239.65> in an old incarnation (1) of this node (2)
2019-07-29 14:11:03.230 [error] emulator Discarding message {'$gen_cast',{force_event_refresh,#Ref<0.3746561768.1184628737.258574>}} from <0.291.0> to <0.28736.64> in an old incarnation (1) of this node (2)
...
2019-07-29 14:12:20.902 [error] emulator Discarding message {'$gen_cast',{deliver,{delivery,false,true,<6204.12121.0>,{basic_message,{resource,<<"/openstack">>,exchange,<<"cinder-scheduler_fanout">>},[<<>>],{content,60,{'P_basic',<<"application/json">>,<<"utf-8">>,[],2,0,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<248,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110,5,117,116,102,45,56,0,0,0,0,2,0>>,rabbit_framing_amqp_0_9_1,[<<"{\"oslo.message\": \"{\\"_context_domain\\": null, \\"_context_request_id\\": \\"req-fe64f644-2826-42fc-a3c7-c28208f1df1b\\", \\"_context_global_request_id\\": null, \\"_context_quota_class\\": null, \\"_context_service_catalog\\": [], \\"_context_auth_token\\": null, \\"_context_resource_uuid\\": null, \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_show_deleted\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"3.3\\", \\"_context_project_domain\\": null, \\"_context_timestamp\\": \\"2019-07-29T14:11:47.001291\\", \\"method\\": \\"update_service_capabilities\\", \\"_context_project\\": null, \\"_context_remote_address\\": null, \\"_context_roles\\": [\\"admin\\"], \\"args\\": {\\"cluster_name\\": null, \\"service_name\\": \\"volume\\", \\"host\\": \\"cmp2@lvm-driver\\", \\"capabilities\\": {\\"filter_function\\": null, \\"goodness_function\\": null, \\"shared_targets\\": false, \\"volume_backend_name\\": \\"lvm-driver\\", \\"driver_version\\": \\"3.0.0\\", \\"sparse_copy_volume\\": true, \\"pools\\": [{\\"pool_name\\": \\"lvm-driver\\", \\"filter_function\\": null, \\"goodness_function\\": null, \\"multiattach\\": true, \\"total_volumes\\": 1, \\"provisioned_capacity_gb\\": 0.0, \\"allocated_capacity_gb\\": 0, \\"thin_provisioning_support\\": true, \\"free_capacity_gb\\": 19.0, \\"location_info\\": \\"LVMVolumeDriver:cmp2:cinder-volumes:thin:0\\", \\"total_capacity_gb\\": 19.0, \\"thick_provisioning_support\\": false, \\"reserved_percentage\\": 0, \\"QoS_support\\": false, \\"max_over_subscription_ratio\\": \\"20.0\\", \\"backend_state\\": \\"up\\"}], \\"vendor_name\\": \\"Open Source\\", \\"storage_protocol\\": \\"iSCSI\\"}, \\"timestamp\\": \\"2019-07-29T14:11:48.277405\\"}, \\"_context_domain_id\\": null, \\"_unique_id\\": \\"4c428d24297045849aceae01fc8c13d8\\", \\"_context_is_admin_project\\": true, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_context_user_identity\\": \\"- - - - -\\", \\"_context_tenant\\": null, \\"_context_project_id\\": null, \\"_context_read_only\\": false, \\"_context_user_domain\\": null}\", \"oslo.version\": \"2.0\"}">>]},<<217,11,69,46,227,82,98,121,72,162,157,241,231,74,127,55>>,true},5,flow},false}} from <0.439.0> to <0.16805.30> in an old incarnation (1) of this node (2)
2019-07-29 14:12:20.904 [error] emulator Discarding message {'$gen_cast',{deliver,{delivery,false,true,<6204.12121.0>,{basic_message,{resource,<<"/openstack">>,exchange,<<"cinder-scheduler_fanout">>},[<<>>],{content,60,{'P_basic',<<"application/json">>,<<"utf-8">>,[],2,0,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<248,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110,5,117,116,102,45,56,0,0,0,0,2,0>>,rabbit_framing_amqp_0_9_1,[<<"{\"oslo.message\": \"{\\"_context_domain\\": null, \\"_context_request_id\\": \\"req-fe64f644-2826-42fc-a3c7-c28208f1df1b\\", \\"_context_global_request_id\\": null, \\"_context_quota_class\\": null, \\"_context_service_catalog\\": [], \\"_context_auth_token\\": null, \\"_context_resource_uuid\\": null, \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_show_deleted\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"3.3\\", \\"_context_project_domain\\": null, \\"_context_timestamp\\": \\"2019-07-29T14:11:47.001291\\", \\"method\\": \\"update_service_capabilities\\", \\"_context_project\\": null, \\"_context_remote_address\\": null, \\"_context_roles\\": [\\"admin\\"], \\"args\\": {\\"cluster_name\\": null, \\"service_name\\": \\"volume\\", \\"host\\": \\"cmp2@lvm-driver\\", \\"capabilities\\": {\\"filter_function\\": null, \\"goodness_function\\": null, \\"shared_targets\\": false, \\"volume_backend_name\\": \\"lvm-driver\\", \\"driver_version\\": \\"3.0.0\\", \\"sparse_copy_volume\\": true, \\"pools\\": [{\\"pool_name\\": \\"lvm-driver\\", \\"filter_function\\": null, \\"goodness_function\\": null, \\"multiattach\\": true, \\"total_volumes\\": 1, \\"provisioned_capacity_gb\\": 0.0, \\"allocated_capacity_gb\\": 0, \\"thin_provisioning_support\\": true, \\"free_capacity_gb\\": 19.0, \\"location_info\\": \\"LVMVolumeDriver:cmp2:cinder-volumes:thin:0\\", \\"total_capacity_gb\\": 19.0, \\"thick_provisioning_support\\": false, \\"reserved_percentage\\": 0, \\"QoS_support\\": false, \\"max_over_subscription_ratio\\": \\"20.0\\", \\"backend_state\\": \\"up\\"}], \\"vendor_name\\": \\"Open Source\\", \\"storage_protocol\\": \\"iSCSI\\"}, \\"timestamp\\": \\"2019-07-29T14:11:48.277405\\"}, \\"_context_domain_id\\": null, \\"_unique_id\\": \\"4c428d24297045849aceae01fc8c13d8\\", \\"_context_is_admin_project\\": true, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_context_user_identity\\": \\"- - - - -\\", \\"_context_tenant\\": null, \\"_context_project_id\\": null, \\"_context_read_only\\": false, \\"_context_user_domain\\": null}\", \"oslo.version\": \"2.0\"}">>]},<<217,11,69,46,227,82,98,121,72,162,157,241,231,74,127,55>>,true},5,flow},false}} from <0.439.0> to <0.15885.64> in an old incarnation (1) of this node (2)

Ivan Udovichenko

Jul 29, 2019, 7:45:25 PM
to rabbitmq-users
I also tried a policy with "ha-mode: exactly":
# rabbitmqctl list_policies
Listing policies for vhost "/" ...
vhost   name    pattern apply-to        definition      priority
/       HA      ^(?!amq\.).*    all     {"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic","message-ttl":120000}  0
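
(A policy like the one above can be set with something along these lines; shown as a sketch, the exact invocation may vary:)
# rabbitmqctl set_policy -p / HA '^(?!amq\.).*' '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic","message-ttl":120000}' --apply-to all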

Errors are pretty much the same:
2019-07-29 23:38:29.117 [error] emulator Discarding message {'$gen_cast',{deliver,{delivery,false,true,<6202.14993.3>,{basic_message,{resource,<<"/openstack">>,exchange,<<"neutron">>},[<<"q-reports-plugin">>],{content,60,{'P_basic',<<"application/json">>,<<"utf-8">>,[],2,0,undefined,undefined,<<"60000">>,undefined,undefined,undefined,undefined,undefined,undefined},<<249,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110,5,117,116,102,45,56,0,0,0,0,2,0,5,54,48,48,48,48>>,rabbit_framing_amqp_0_9_1,[<<"{\"oslo.message\": \"{\\"_context_domain\\": null, \\"_msg_id\\": \\"69c4861c63924aaf9dec658c5d603ab2\\", \\"_context_global_request_id\\": null, \\"_context_tenant_name\\": null, \\"_context_read_only\\": false, \\"_context_request_id\\": \\"req-e9a70bf4-2f8b-43b4-b3a2-a99d9a189a0c\\", \\"args\\": {\\"agent_state\\": {\\"agent_state\\": {\\"binary\\": \\"neutron-dhcp-agent\\", \\"uuid\\": \\"6e0c1b1a-296b-4ea5-9d5f-26736a5f6eb0\\", \\"availability_zone\\": \\"nova\\", \\"topic\\": \\"dhcp_agent\\", \\"host\\": \\"gtw03\\", \\"agent_type\\": \\"DHCP agent\\", \\"configurations\\": {\\"subnets\\": 0, \\"log_agent_heartbeats\\": false, \\"dhcp_lease_duration\\": 600, \\"dhcp_driver\\": \\"neutron.agent.linux.dhcp.Dnsmasq\\", \\"networks\\": 0, \\"ports\\": 0}}}, \\"time\\": \\"2019-07-29T23:38:23.703120\\"}, \\"_unique_id\\": \\"c572a6c286e44c2db3414384ff6ab82b\\", \\"_context_resource_uuid\\": null, \\"_context_tenant_id\\": null, \\"_context_is_admin_project\\": true, \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_project_name\\": null, \\"_context_user_identity\\": \\"- - - - -\\", \\"_reply_q\\": \\"reply_347b8b60d91041ea8bc4921fb789962f\\", \\"_context_auth_token\\": null, \\"_context_show_deleted\\": false, \\"_context_tenant\\": null, \\"_context_roles\\": [], \\"_context_is_admin\\": true, \\"version\\": \\"1.0\\", \\"_context_project_id\\": null, \\"_context_project_domain\\": null, \\"_context_timestamp\\": \\"2019-07-29 23:38:23.702695\\", \\"_context_user_domain\\": null, \\"_context_user_name\\": null, \\"method\\": \\"report_state\\", \\"_context_project\\": null}\", \"oslo.version\": \"2.0\"}">>]},<<191,121,202,181,97,206,90,242,53,75,76,103,165,187,30,34>>,true},4,flow},false}} from <0.3282.0> to <0.6154.0> in an old incarnation (2) of this node (3)

2019-07-29 23:38:29.118 [error] emulator Discarding message {'$gen_call',{<0.3282.0>,#Ref<0.93407759.1541406721.158363>},stat} from <0.3282.0> to <0.26132.0> in an old incarnation (2) of this node (3)

2019-07-29 23:38:29.118 [error] emulator Discarding message {'$gen_call',{<0.3282.0>,#Ref<0.93407759.1541406721.158364>},stat} from <0.3282.0> to <0.26124.0> in an old incarnation (2) of this node (3)

Ivan Udovichenko

Jul 29, 2019, 7:52:46 PM
to rabbitmq-users
I even tried adding a sleep of about 60 seconds to the systemd unit, just in case (to be sure the net tick time would take effect anyway):
ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl shutdown
ExecStop=/bin/sleep 60
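
(For context: these lines go in a systemd drop-in; an illustrative sketch of the override follows, assuming the stock rabbitmq-server unit. systemd runs multiple ExecStop= commands in sequence, and the empty ExecStop= clears any inherited value first.)

# /etc/systemd/system/rabbitmq-server.service.d/override.conf (hypothetical path)
[Service]
ExecStop=
ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl shutdown
ExecStop=/bin/sleep 60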


:::Cluster status:::
# rabbitmqctl cluster_status
Cluster status of node rabbit@msg03 ...
[{nodes,[{disc,[rabbit@msg01,rabbit@msg02,rabbit@msg03]}]},
 {running_nodes,[rabbit@msg01,rabbit@msg02,rabbit@msg03]},
 {cluster_name,<<"openstack">>},
 {partitions,[]},
 {alarms,[{rabbit@msg01,[]},{rabbit@msg02,[]},{rabbit@msg03,[]}]}]

:::Logs:::
2019-07-29 23:43:40.807 [info] <0.622.8> Halting Erlang VM with the following applications:
    ranch
    ssl
    public_key
    asn1
    observer_cli
    crypto
    inets
    xmerl
    recon
    jsx
    lager
    goldrush
    compiler
    syntax_tools
    sasl
    stdlib
    kernel
2019-07-29 23:44:47.390 [info] <0.8.0> Log file opened with Lager
2019-07-29 23:44:48.352 [info] <0.43.0> Application mnesia exited with reason: stopped
2019-07-29 23:44:49.307 [info] <0.309.0> 
 Starting RabbitMQ 3.7.16 on Erlang 22.0.7
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/
2019-07-29 23:44:49.311 [info] <0.309.0> 
 node           : rabbit@msg03
 home dir       : /var/lib/rabbitmq/
 config file(s) : /etc/rabbitmq/rabbitmq.config
 cookie hash    : +uF1zQicqZ3sG7d/YLRTVA==
 log(s)         : /var/log/rabbitmq/rab...@msg03.log
                : /var/log/rabbitmq/rabbit@msg03_upgrade.log
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@msg03
2019-07-29 23:44:51.383 [info] <0.309.0> Running boot step pre_boot defined by app rabbit
2019-07-29 23:44:51.383 [info] <0.309.0> Running boot step rabbit_core_metrics defined by app rabbit
2019-07-29 23:44:51.384 [info] <0.309.0> Running boot step rabbit_alarm defined by app rabbit
2019-07-29 23:44:51.390 [info] <0.434.0> Memory high watermark set to 1594 MiB (1672157593 bytes) of 1993 MiB (2090196992 bytes) total
2019-07-29 23:44:51.397 [info] <0.436.0> Enabling free disk space monitoring
2019-07-29 23:44:51.397 [info] <0.436.0> Disk free limit set to 50MB
2019-07-29 23:44:51.400 [info] <0.309.0> Running boot step code_server_cache defined by app rabbit
2019-07-29 23:44:51.400 [info] <0.309.0> Running boot step file_handle_cache defined by app rabbit
2019-07-29 23:44:51.400 [info] <0.443.0> Limiting to approx 8092 file handles (7280 sockets)
2019-07-29 23:44:51.401 [info] <0.444.0> FHC read buffering:  OFF
2019-07-29 23:44:51.401 [info] <0.444.0> FHC write buffering: ON
2019-07-29 23:44:51.402 [info] <0.309.0> Running boot step worker_pool defined by app rabbit
2019-07-29 23:44:51.402 [info] <0.310.0> Will use 1 processes for default worker pool
2019-07-29 23:44:51.402 [info] <0.310.0> Starting worker pool 'worker_pool' with 1 processes in it
2019-07-29 23:44:51.402 [info] <0.309.0> Running boot step database defined by app rabbit
2019-07-29 23:44:55.782 [info] <0.309.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-07-29 23:45:05.314 [info] <0.309.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-07-29 23:45:05.346 [info] <0.309.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-07-29 23:45:05.346 [info] <0.309.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2019-07-29 23:45:05.346 [info] <0.309.0> Running boot step database_sync defined by app rabbit
2019-07-29 23:45:05.346 [info] <0.309.0> Running boot step codec_correctness_check defined by app rabbit
2019-07-29 23:45:05.346 [info] <0.309.0> Running boot step external_infrastructure defined by app rabbit
2019-07-29 23:45:05.346 [info] <0.309.0> Running boot step rabbit_registry defined by app rabbit
2019-07-29 23:45:05.346 [info] <0.309.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2019-07-29 23:45:05.346 [info] <0.309.0> Running boot step rabbit_queue_location_random defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_event defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2019-07-29 23:45:05.347 [info] <0.309.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step rabbit_priority_queue defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Priority queues enabled, real BQ is rabbit_variable_queue
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step kernel_ready defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2019-07-29 23:45:05.348 [info] <0.309.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2019-07-29 23:45:05.352 [info] <0.309.0> Running boot step guid_generator defined by app rabbit
2019-07-29 23:45:05.354 [info] <0.309.0> Running boot step rabbit_node_monitor defined by app rabbit
2019-07-29 23:45:05.379 [info] <0.1292.0> Starting rabbit_node_monitor
2019-07-29 23:45:05.379 [info] <0.309.0> Running boot step delegate_sup defined by app rabbit
2019-07-29 23:45:05.380 [info] <0.309.0> Running boot step rabbit_memory_monitor defined by app rabbit
2019-07-29 23:45:05.380 [info] <0.309.0> Running boot step core_initialized defined by app rabbit
2019-07-29 23:45:05.380 [info] <0.309.0> Running boot step upgrade_queues defined by app rabbit
2019-07-29 23:45:05.395 [error] emulator Discarding message {'$gen_call',{<0.1294.0>,#Ref<0.1358085431.3152281601.225407>},stat} from <0.1294.0> to <0.29340.7> in an old incarnation (2) of this node (3)

2019-07-29 23:45:05.402 [error] emulator Discarding message {'$gen_call',{<0.1294.0>,#Ref<0.1358085431.3152281601.225408>},stat} from <0.1294.0> to <0.582.8> in an old incarnation (2) of this node (3)
...
2019-07-29 23:45:18.512 [error] emulator Discarding message {'$gen_cast',{deliver,{delivery,false,true,<6202.5664.4>,{basic_message,{resource,<<"/openstack">>,exchange,<<"cinder-scheduler_fanout">>},[<<>>],{content,60,{'P_basic',<<"application/json">>,<<"utf-8">>,[],2,0,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<248,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110,5,117,116,102,45,56,0,0,0,0,2,0>>,rabbit_framing_amqp_0_9_1,[<<"{\"oslo.message\": \"{\\"_context_domain\\": null, \\"_context_request_id\\": \\"req-ced052fb-d6fa-4a05-ab4c-58cb73eb7fc6\\", \\"_context_global_request_id\\": null, \\"_context_quota_class\\": null, \\"_context_service_catalog\\": [], \\"_context_auth_token\\": null, \\"_context_resource_uuid\\": null, \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_show_deleted\\": false, \\"_context_is_admin\\": true, \\"version\\": \\"3.3\\", \\"_context_project_domain\\": null, \\"_context_timestamp\\": \\"2019-07-29T23:45:17.515494\\", \\"method\\": \\"update_service_capabilities\\", \\"_context_project\\": null, \\"_context_remote_address\\": null, \\"_context_roles\\": [\\"admin\\"], \\"args\\": {\\"cluster_name\\": null, \\"service_name\\": \\"volume\\", \\"host\\": \\"cmp1@lvm-driver\\", \\"capabilities\\": {\\"filter_function\\": null, \\"goodness_function\\": null, \\"shared_targets\\": false, \\"volume_backend_name\\": \\"lvm-driver\\", \\"driver_version\\": \\"3.0.0\\", \\"sparse_copy_volume\\": true, \\"pools\\": [{\\"pool_name\\": \\"lvm-driver\\", \\"filter_function\\": null, \\"goodness_function\\": null, \\"multiattach\\": true, \\"total_volumes\\": 1, \\"provisioned_capacity_gb\\": 0.0, \\"allocated_capacity_gb\\": 0, \\"thin_provisioning_support\\": true, \\"free_capacity_gb\\": 19.0, \\"location_info\\": \\"LVMVolumeDriver:cmp1:cinder-volumes:thin:0\\", \\"total_capacity_gb\\": 19.0, \\"thick_provisioning_support\\": false, \\"reserved_percentage\\": 0, \\"QoS_support\\": false, \\"max_over_subscription_ratio\\": \\"20.0\\", \\"backend_state\\": \\"up\\"}], \\"vendor_name\\": \\"Open Source\\", \\"storage_protocol\\": \\"iSCSI\\"}, \\"timestamp\\": \\"2019-07-29T23:45:18.505283\\"}, \\"_context_domain_id\\": null, \\"_unique_id\\": \\"198e91cb1b0240eba29659a534aaa0ae\\", \\"_context_is_admin_project\\": true, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_context_user_identity\\": \\"- - - - -\\", \\"_context_tenant\\": null, \\"_context_project_id\\": null, \\"_context_read_only\\": false, \\"_context_user_domain\\": null}\", \"oslo.version\": \"2.0\"}">>]},<<77,200,112,108,238,93,15,169,255,179,108,47,127,33,179,22>>,true},5,flow},false}} from <0.1297.0> to <0.27301.7> in an old incarnation (2) of this node (3)

Ivan Udovichenko

Jul 29, 2019, 8:30:19 PM
to rabbitmq-users
Just noticed that Michael announced the RabbitMQ 3.7.17 release.
I upgraded, and the result is the same.

::: Policies :::
# rabbitmqctl list_policies
Listing policies for vhost "/" ...
vhost   name    pattern apply-to        definition      priority
/       HA      ^(?!amq\.).*    all     {"ha-mode":"all","ha-sync-mode":"automatic","message-ttl":120000}       0

::: Config :::
/etc/rabbitmq/rabbitmq.config 
[
    {rabbit, [{vm_memory_high_watermark, 0.8},
              {memory_alarms, true},
              {disk_free_limit, 50000000},
              {cluster_partition_handling, autoheal},
              {tcp_listen_options,
                        [binary,
                          {packet, raw},
                          {reuseaddr, true},
                          {backlog, 128},
                          {nodelay, true},
                          {exit_on_close, false},
                          {keepalive, true}
                         ]
              },
              {cluster_nodes, {['rabbit@msg01', 'rabbit@msg02', 'rabbit@msg03'], disc}},
              {loopback_users, []},
              {tcp_listeners, [{"10.11.0.42",5672}]},
              {ssl_listeners, [{"0.0.0.0",5671}]},
              {ssl_options, [{cacertfile,"/etc/rabbitmq/ssl/ca.pem"},
                             {certfile,"/etc/rabbitmq/ssl/cert.pem"},
                             {keyfile,"/etc/rabbitmq/ssl/key.pem"},
                             {verify,verify_peer},
                             {versions, ['tlsv1.2', 'tlsv1.1', 'tlsv1']},
                             {fail_if_no_peer_cert,true}
                            ]
              }
              
             ]
    },
    {rabbitmq_management,
              [{listener, [{port, 15672 },
                           {ip, "10.11.0.42" }
                         ]}]}
].

::: Package versions :::
# apt policy rabbitmq-server
rabbitmq-server:
  Installed: 3.7.17-1
  Candidate: 3.7.17-1
  Version table:
 *** 3.7.17-1 1400
        500 https://dl.bintray.com/rabbitmq/debian xenial/main amd64 Packages
        500 https://dl.bintray.com/rabbitmq/debian xenial/main i386 Packages
        100 /var/lib/dpkg/status

# apt policy erlang-base-hipe
erlang-base-hipe:
  Installed: 1:22.0.7-1
  Candidate: 1:22.0.7-1
  Version table:
 *** 1:22.0.7-1 1400
        500 http://dl.bintray.com/rabbitmq-erlang/debian xenial/erlang-22.x amd64 Packages
        100 /var/lib/dpkg/status

::: Logs :::
2019-07-30 00:16:00.121 [info] <0.298.0> 
 Starting RabbitMQ 3.7.17 on Erlang 22.0.7
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/
2019-07-30 00:16:00.126 [info] <0.298.0> 
 node           : rabbit@msg03
 home dir       : /var/lib/rabbitmq/
 config file(s) : /etc/rabbitmq/rabbitmq.config
 cookie hash    : +uF1zQicqZ3sG7d/YLRTVA==
 log(s)         : /var/log/rabbitmq/rab...@msg03.log
                : /var/log/rabbitmq/rabbit@msg03_upgrade.log
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@msg03
2019-07-30 00:16:00.138 [info] <0.298.0> Running boot step pre_boot defined by app rabbit
2019-07-30 00:16:00.138 [info] <0.298.0> Running boot step rabbit_core_metrics defined by app rabbit
2019-07-30 00:16:00.139 [info] <0.298.0> Running boot step rabbit_alarm defined by app rabbit
2019-07-30 00:16:00.145 [info] <0.304.0> Memory high watermark set to 1594 MiB (1672157593 bytes) of 1993 MiB (2090196992 bytes) total
2019-07-30 00:16:00.151 [info] <0.306.0> Enabling free disk space monitoring
2019-07-30 00:16:00.151 [info] <0.306.0> Disk free limit set to 50MB
2019-07-30 00:16:00.154 [info] <0.298.0> Running boot step code_server_cache defined by app rabbit
2019-07-30 00:16:00.155 [info] <0.298.0> Running boot step file_handle_cache defined by app rabbit
2019-07-30 00:16:00.155 [info] <0.309.0> Limiting to approx 8092 file handles (7280 sockets)
2019-07-30 00:16:00.155 [info] <0.310.0> FHC read buffering:  OFF
2019-07-30 00:16:00.155 [info] <0.310.0> FHC write buffering: ON
2019-07-30 00:16:00.156 [info] <0.298.0> Running boot step worker_pool defined by app rabbit
2019-07-30 00:16:00.156 [info] <0.299.0> Will use 1 processes for default worker pool
2019-07-30 00:16:00.156 [info] <0.299.0> Starting worker pool 'worker_pool' with 1 processes in it
2019-07-30 00:16:00.156 [info] <0.298.0> Running boot step database defined by app rabbit
2019-07-30 00:16:04.066 [info] <0.298.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-07-30 00:16:34.067 [warning] <0.298.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_user,rabbit_topic_permission,rabbit_vhost,rabbit_semi_durable_route,rabbit_topic_trie_node,rabbit_topic_trie_binding,rabbit_runtime_parameters,rabbit_queue]}
2019-07-30 00:16:34.067 [info] <0.298.0> Waiting for Mnesia tables for 30000 ms, 8 retries left
2019-07-30 00:16:47.299 [info] <0.298.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-07-30 00:16:47.334 [info] <0.298.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2019-07-30 00:16:47.334 [info] <0.298.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2019-07-30 00:16:47.334 [info] <0.298.0> Running boot step database_sync defined by app rabbit
2019-07-30 00:16:47.334 [info] <0.298.0> Running boot step feature_flags defined by app rabbit
2019-07-30 00:16:47.335 [info] <0.298.0> Running boot step codec_correctness_check defined by app rabbit
2019-07-30 00:16:47.335 [info] <0.298.0> Running boot step external_infrastructure defined by app rabbit
2019-07-30 00:16:47.335 [info] <0.298.0> Running boot step rabbit_registry defined by app rabbit
2019-07-30 00:16:47.335 [info] <0.298.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2019-07-30 00:16:47.335 [info] <0.298.0> Running boot step rabbit_queue_location_random defined by app rabbit
2019-07-30 00:16:47.335 [info] <0.298.0> Running boot step rabbit_event defined by app rabbit
2019-07-30 00:16:47.336 [info] <0.298.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2019-07-30 00:16:47.336 [info] <0.298.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2019-07-30 00:16:47.336 [info] <0.298.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2019-07-30 00:16:47.336 [info] <0.298.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2019-07-30 00:16:47.336 [info] <0.298.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2019-07-30 00:16:47.337 [info] <0.298.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2019-07-30 00:16:47.337 [info] <0.298.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2019-07-30 00:16:47.337 [info] <0.298.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2019-07-30 00:16:47.337 [info] <0.298.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2019-07-30 00:16:47.337 [info] <0.298.0> Running boot step rabbit_priority_queue defined by app rabbit
2019-07-30 00:16:47.337 [info] <0.298.0> Priority queues enabled, real BQ is rabbit_variable_queue
2019-07-30 00:16:47.337 [info] <0.298.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2019-07-30 00:16:47.338 [info] <0.298.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2019-07-30 00:16:47.338 [info] <0.298.0> Running boot step kernel_ready defined by app rabbit
2019-07-30 00:16:47.338 [info] <0.298.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2019-07-30 00:16:47.338 [info] <0.298.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2019-07-30 00:16:47.342 [info] <0.298.0> Running boot step guid_generator defined by app rabbit
2019-07-30 00:16:47.346 [info] <0.298.0> Running boot step rabbit_node_monitor defined by app rabbit
2019-07-30 00:16:47.380 [info] <0.922.0> Starting rabbit_node_monitor
2019-07-30 00:16:47.381 [info] <0.298.0> Running boot step delegate_sup defined by app rabbit
2019-07-30 00:16:47.381 [info] <0.298.0> Running boot step rabbit_memory_monitor defined by app rabbit
2019-07-30 00:16:47.382 [info] <0.298.0> Running boot step core_initialized defined by app rabbit
2019-07-30 00:16:47.382 [info] <0.298.0> Running boot step upgrade_queues defined by app rabbit
2019-07-30 00:16:47.408 [error] emulator Discarding message {'$gen_call',{<0.924.0>,#Ref<0.1379925993.2084044801.73396>},stat} from <0.924.0> to <0.3102.0> in an old incarnation (2) of this node (1)

2019-07-30 00:16:47.411 [error] emulator Discarding message {'$gen_call',{<0.924.0>,#Ref<0.1379925993.2084044801.73397>},stat} from <0.924.0> to <0.5334.0> in an old incarnation (2) of this node (1)

2019-07-30 00:16:47.417 [error] emulator Discarding message {'$gen_call',{<0.924.0>,#Ref<0.1379925993.2084044801.73398>},stat} from <0.924.0> to <0.3102.0> in an old incarnation (2) of this node (1)

2019-07-30 00:16:47.425 [error] emulator Discarding message {'$gen_call',{<0.924.0>,#Ref<0.1379925993.2084044801.73399>},stat} from <0.924.0> to <0.3541.0> in an old incarnation (2) of this node (1)

2019-07-30 00:16:47.426 [info] <0.298.0> Running boot step rabbit_connection_tracking defined by app rabbit
2019-07-30 00:16:47.426 [error] emulator Discarding message {'$gen_call',{<0.924.0>,#Ref<0.1379925993.2084044801.73400>},stat} from <0.924.0> to <0.1595.0> in an old incarnation (2) of this node (1)
...
2019-07-30 00:16:49.410 [error] emulator Discarding message {'$gen_cast',{deliver,{delivery,false,true,<6202.7267.0>,{basic_message,{resource,<<"/openstack">>,exchange,<<"nova">>},[<<"conductor">>],{content,60,{'P_basic',<<"application/json">>,<<"utf-8">>,[],2,0,undefined,undefined,<<"30000">>,undefined,undefined,undefined,undefined,undefined,undefined},<<249,0,16,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110,5,117,116,102,45,56,0,0,0,0,2,0,5,51,48,48,48,48>>,rabbit_framing_amqp_0_9_1,[<<"{\"oslo.message\": \"{\\"_context_domain\\": null, \\"_msg_id\\": \\"eb78ae17f759404ca4b1bd653df7e8fd\\", \\"_context_global_request_id\\": null, \\"_context_quota_class\\": null, \\"_context_read_only\\": false, \\"_context_request_id\\": \\"req-7645981a-3d15-400f-86c9-4926a50122e8\\", \\"_context_service_catalog\\": [], \\"args\\": {\\"object_versions\\": {\\"ComputeNodeList\\": \\"1.17\\", \\"ComputeNode\\": \\"1.18\\", \\"HVSpec\\": \\"1.2\\", \\"PciDevicePool\\": \\"1.1\\", \\"PciDevicePoolList\\": \\"1.1\\"}, \\"objmethod\\": \\"get_all_by_host\\", \\"args\\": [\\"cmp2\\"], \\"objname\\": \\"ComputeNodeList\\", \\"kwargs\\": {\\"use_slave\\": true}}, \\"_unique_id\\": \\"def56d487f3e483c9a18f197c7614318\\", \\"_context_resource_uuid\\": null, \\"_context_instance_lock_checked\\": false, \\"_context_is_admin_project\\": true, \\"_context_user\\": null, \\"_context_user_id\\": null, \\"_context_project_name\\": null, \\"_context_read_deleted\\": \\"no\\", \\"_context_user_identity\\": \\"- - - - -\\", \\"_reply_q\\": \\"reply_24c25351a8e9457a8188f5fe1a17c06a\\", \\"_context_auth_token\\": null, \\"_context_show_deleted\\": false, \\"_context_tenant\\": null, \\"_context_roles\\": [], \\"_context_is_admin\\": true, \\"version\\": \\"3.0\\", \\"_context_project_id\\": null, \\"_context_project_domain\\": null, \\"_context_timestamp\\": \\"2019-07-30T00:15:45.799466\\", \\"_context_user_domain\\": null, \\"_context_user_name\\": null, \\"method\\": \\"object_class_action_versions\\", \\"_context_project\\": null, \\"_context_remote_address\\": null}\", \"oslo.version\": \"2.0\"}">>]},<<142,167,101,89,76,182,22,205,43,36,165,136,135,146,31,209>>,true},4,flow},false}} from <0.938.0> to <0.1600.0> in an old incarnation (2) of this node (1)

Ivan Udovichenko

Jul 30, 2019, 8:33:33 AM
to rabbitmq-users
I forgot to mention the policies on the '/openstack' vhost:
# rabbitmqctl list_policies -p /openstack
Listing policies for vhost "/openstack" ...
vhost   name    pattern apply-to        definition      priority
/openstack      ha-notif        ^(alarm|alarming|event|metering|notifications)\.        all     {"ha-mode":"all","ha-sync-mode":"automatic"} 0
/openstack      default-policy  ^(?!amq\.).*    all     {"message-ttl":120000}  0
/openstack      results_expire  ^results\.      all     {"expires":3600000}     1
/openstack      heat_worker_rpc_expire  ^engine_worker\.        all     {"expires":3600000}     1
/openstack      tasks_expire    ^tasks\.        all     {"expires":3600000}     1
/openstack      heat_engine_rpc_expire  ^heat-engine-listener\. all     {"expires":3600000}     1
/openstack      Ceilometer      ^metering.sample$       all     {"ha-mode":"all","message-ttl":10800000}        10


Also noticed that these issues can't be reproduced with an "ha-mode: all" policy applied to all vhosts, everything else being the same.

Luke Bakken

Jul 30, 2019, 10:31:00 AM
to rabbitmq-users
Hi Ivan,

Like I said, these errors are normal and to be expected. When you restart a node, the processes that were running within its Erlang VM come to a halt. Other nodes in the cluster may retain identifiers for those defunct processes and may try to send messages to them. This will fail because the processes no longer exist.
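
You can see the underlying mechanism in plain Erlang (a minimal sketch, not RabbitMQ-specific): every pid embeds the "creation" number of the node incarnation that spawned it, and the runtime discards messages addressed to pids minted by an earlier incarnation.

%% In an Erlang shell:
erlang:system_info(creation).
%% => 1   (for example; the current incarnation of this node)
%% A pid saved now embeds creation 1. After the node restarts, the creation
%% becomes 2, and anything sent to the saved pid is dropped with:
%%   Discarding message ... in an old incarnation (1) of this node (2)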

Thanks,
Luke

Ivan Udovichenko

Jul 30, 2019, 12:00:04 PM
to rabbitmq-users
Hi Luke,

Great, thank you for the explanation!
That's all I need.

Thank you!

Michael Klishin

Jul 31, 2019, 3:42:01 PM
to rabbitmq-users
The messages discussed in this thread are logged by the runtime (the Erlang VM) itself.

RabbitMQ can log similar messages as warnings if a queue encounters a message from an old "term" (to use Raft speak).
That is, after a queue master changes, all messages directed to the previous master "incarnation" are intentionally dropped,
both in classic mirrored queues and 3.8 quorum queues (in the latter case it's mandatory Raft peer behavior we must adhere to).
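
One way to observe a master change (illustrative, rabbitmqctl 3.7 syntax) is to list each queue's master and mirror pids before and after the restart and watch them change:

# rabbitmqctl list_queues -p /openstack name pid slave_pids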

Ivan Udovichenko

Aug 13, 2019, 5:04:52 AM
to rabbitmq-users
Hi Michael,

Thank you for the information!

gayathri devi

Jan 29, 2020, 8:51:36 AM
to rabbitmq-users
Hi, 

RabbitMQ: 3.7.15
Erlang: 22.x
We are using a two-node RabbitMQ cluster. When the nodes hosting RabbitMQ are rebooted multiple times, the RabbitMQ service sometimes does not work correctly after a node rejoins the cluster.

Is it possible to fix this issue in RabbitMQ 3.7.15 instead of upgrading to RabbitMQ 3.8?

Michael Klishin

Jan 29, 2020, 10:55:44 AM
to rabbitmq-users
This list uses a "one question per thread" policy. Please start a new thread.

We cannot suggest anything with the amount of information provided. How do you define "not working"? What's in the logs of all nodes involved?

Two-node clusters are highly recommended against [1], but that alone is very unlikely to be the root cause. We need server logs and a set of steps to reproduce in order to suggest anything specific. Please help others help you.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/26b9e80d-5a9f-42a8-89fa-f025c971f32b%40googlegroups.com.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ