RabbitMQ crashed after updating 3.8 to 3.9


Roman Levitsky

Dec 21, 2021, 7:02:24 AM
to rabbitm...@googlegroups.com
Hi Group,

I am trying to upgrade my RabbitMQ servers.
After installing version 3.9.11 with exactly the same config that works
with 3.8.16, I am getting a "no function clause matching" exception;
please find the log file excerpt below. It repeats every 10 minutes, and
I see no connection from my Logstash instance in the web interface.

Could you please advise what I should do to identify the issue?

Best regards,
Roman Levitsky

Here is the log (time stripped to make it shorter):

[erro] <0.2284.0> crasher:
[erro] <0.2284.0> initial call: cowboy_stream_h:request_process/3
[erro] <0.2284.0> pid: <0.2284.0>
[erro] <0.2284.0> registered_name: []
[erro] <0.2284.0> exception error: no function clause matching
[erro] <0.2284.0> rabbit_mgmt_wm_node:find_type('rab...@h028.company.com',
[erro] <0.2284.0> []) (rabbit_mgmt_wm_node.erl, line 74)
[erro] <0.2284.0> in function rabbit_mgmt_wm_node:node_data/2 (rabbit_mgmt_wm_node.erl, line 65)
[erro] <0.2284.0> in call from rabbit_mgmt_wm_node:node0/1 (rabbit_mgmt_wm_node.erl, line 44)
[erro] <0.2284.0> in call from rabbit_mgmt_wm_node:resource_exists/2 (rabbit_mgmt_wm_node.erl, line 29)
[erro] <0.2284.0> in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1575)
[erro] <0.2284.0> in call from cowboy_rest:expect/6 (src/cowboy_rest.erl, line 1558)
[erro] <0.2284.0> in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[erro] <0.2284.0> in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 300)
[erro] <0.2284.0> ancestors: [<0.2283.0>,<0.537.0>,<0.529.0>,<0.528.0>,<0.526.0>,
[erro] <0.2284.0> rabbit_web_dispatch_sup,<0.515.0>]
[erro] <0.2284.0> message_queue_len: 0
[erro] <0.2284.0> messages: []
[erro] <0.2284.0> links: [<0.2283.0>]
[erro] <0.2284.0> dictionary: []
[erro] <0.2284.0> trap_exit: false
[erro] <0.2284.0> status: running
[erro] <0.2284.0> heap_size: 6772
[erro] <0.2284.0> stack_size: 29
[erro] <0.2284.0> reductions: 3003
[erro] <0.2284.0> neighbours:
[erro] <0.2284.0>
[erro] <0.2283.0> Ranch listener {acceptor,{0,0,0,0,0,0,0,0},15672}, connection process <0.2283.0>, stream 1 had its request process <0.2284.0> exit with reason function_clause and stacktrace [{rabbit_mgmt_wm_node,find_type,['rab...@h028.company.com',[]],[{file,"rabbit_mgmt_wm_node.erl"},{line,74}]},{rabbit_mgmt_wm_node,node_data,2,[{file,"rabbit_mgmt_wm_node.erl"},{line,65}]},{rabbit_mgmt_wm_node,node0,1,[{file,"rabbit_mgmt_wm_node.erl"},{line,44}]},{rabbit_mgmt_wm_node,resource_exists,2,[{file,"rabbit_mgmt_wm_node.erl"},{line,29}]},{cowboy_rest,call,3,[{file,"src/cowboy_rest.erl"},{line,1575}]},{cowboy_rest,expect,6,[{file,"src/cowboy_rest.erl"},{line,1558}]},{cowboy_rest,upgrade,4,[{file,"src/cowboy_rest.erl"},{line,284}]},{cowboy_stream_h,execute,3,[{file,"src/cowboy_stream_h.erl"},{line,300}]}]
2021-12-21 09:01:32.681905+00:00 [erro] <0.2283.0>



Luke Bakken

Dec 21, 2021, 10:16:07 AM
to rabbitmq-users
Hello,

How exactly did you perform the upgrade?

There is something wrong with your cluster status. What is the output of these commands?

rabbitmqctl eval "rabbit_mnesia:status()."
rabbitmqctl cluster_status

Thanks,
Luke

Roman Levitsky

Dec 21, 2021, 10:49:40 AM
to rabbitm...@googlegroups.com
Hi Luke,

Thank you very much for your reply.

I've just pulled the "rabbitmq:3-management" image with Docker and run
it with my 3.8 configs.

Here is the output of the rabbitmqctl eval "rabbit_mnesia:status().":

[{nodes,[{disc,['rab...@logs-node1.company.com']}]},
{running_nodes,['rab...@logs-node1.company.com']},
{cluster_name,<<"rab...@h028.company.com">>},
{partitions,[]}]

I see no difference with the other node running 3.8.16, however.

Here is the output of the rabbitmqctl cluster_status:

Cluster status of node rab...@logs-node1.company.com ...
Basics
Cluster name: rab...@h028.company.com
Disk Nodes
rab...@logs-node1.company.com
Running Nodes
rab...@logs-node1.company.com
Versions
rab...@logs-node1.company.com: RabbitMQ 3.9.11 on Erlang 24.2
Maintenance status
Node: rab...@logs-node1.company.com, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rab...@logs-node1.company.com, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rab...@logs-node1.company.com, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rab...@logs-node1.company.com, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rab...@logs-node1.company.com, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rab...@logs-node1.company.com, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Feature flags
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: stream_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

I see no difference from the second node running 3.8.16, other than one
extra line here in 3.9:
Flag: stream_queue, state: enabled

Thank you for your time,
Roman.
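
Reading the crash log next to the status output above, one detail stands out: the failing call is find_type('rab...@h028.company.com', []), while the only node listed is 'rab...@logs-node1.company.com'. A "no function clause matching" error with [] as the last argument is what a lookup produces when it has walked the whole node list without finding the requested name. The sketch below is a Python analogy of that kind of lookup, not RabbitMQ's actual source; the function body and the exception type are assumptions made for illustration, and the node names are copied as they appear in the thread.

```python
def find_type(node, nodes_by_type):
    """Return the type ('disc', 'ram', ...) under which a node name is listed.

    Illustrative analogy only: walks (type, names) pairs and fails once the
    list is exhausted, similar to a function with no clause for [].
    """
    for node_type, names in nodes_by_type:
        if node in names:
            return node_type
    # Nothing left to inspect: the requested name is not a cluster member.
    raise LookupError(f"no clause matching find_type({node!r}, [])")


# Node list as reported by rabbit_mnesia:status() in this thread.
status_nodes = [("disc", ["rab...@logs-node1.company.com"])]

print(find_type("rab...@logs-node1.company.com", status_nodes))
```

Calling find_type("rab...@h028.company.com", status_nodes) raises LookupError, because that string is the cluster name here, not the name of a node.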

Luke Bakken

Dec 21, 2021, 1:35:10 PM
to rabbitmq-users
Hello,

As you can see there is only one node in this environment. Is this what you intended?

Please attach your complete configuration and docker files to your response.

Thanks,
Luke

Roman Levitsky

Dec 22, 2021, 12:13:54 AM
to rabbitm...@googlegroups.com
Hi Luke,

Yes, there is only one node and it is a typical setup for our
remote sites.

Actually, we are using Ansible to do the pulling/starting job.
Its configuration looks like this (I've substituted values):

- name: Run docker rabbitmq container
  docker_container:
    log_driver: "json-file"
    log_options:
      max-size: "10m"
      max-file: "3"
    env: 'RABBITMQ_ERLANG_COOKIE="thecookie" RABBITMQ_USE_LONGNAME="true" TZ="UTC"'
    hostname: "logs-node1.company.com"
    image: rabbitmq:3-management
    name: rabbitmq
    network_mode: host
    networks: "[]"
    recreate: true
    restart_policy: always
    volumes:
      - "/opt/rabbitmq"

I've also attached the 'docker inspect' command output to this message
along with 'rabbitmq.conf' and 'enabled_plugins' config files.
I skipped the SSL certificates only.

Thank you very much for your effort,
rabbitmq.conf
enabled_plugins
inspect-rabbitmq.txt

Marcin Gryszkalis

Dec 25, 2021, 4:05:58 PM
to rabbitmq-users
I can see the same exception on RabbitMQ 3.9.10 on Erlang 24.1.7.
Roman, do you use any tool to monitor your RabbitMQ instance? My first thought was that in my case it's caused by Zabbix (I have the exception repeating every minute, similar to your 10-minute interval).

Roman Levitsky

Dec 31, 2021, 12:57:34 AM
to rabbitm...@googlegroups.com
Hi Marcin,

Yes, we use Zabbix too.
However, I can now see that Zabbix connects every minute and no crash
occurs on its connection.

Roman Levitsky

Dec 31, 2021, 3:25:56 AM
to rabbitmq-users
I've figured out which Zabbix check causes the RabbitMQ crash:

2021/12/31 09:41:32.545083 received passive check request: 'system.cpu.intr' from '10.135.2.6'
2021/12/31 09:41:32.545142 [1] processing update request (1 requests)
2021/12/31 09:41:32.545153 [1] adding new request for key: 'system.cpu.intr'
2021/12/31 09:41:32.545160 [1] created direct exporter task for plugin 'ZabbixAsync' itemid:0 key 'system.cpu.intr'
2021/12/31 09:41:32.545212 executing direct exporter task for key 'system.cpu.intr'
2021/12/31 09:41:32.545305 executed direct exporter task for key 'system.cpu.intr'
2021/12/31 09:41:32.545336 sending passive check response: '838913203' to '10.135.2.6'

Results in Rabbit logs:

2021-12-31 07:41:32.548561+00:00 [dbug] <0.3546.0> User 'zabbix' authenticated successfully by backend rabbit_auth_backend_internal
2021-12-31 07:41:32.549734+00:00 [erro] <0.3546.0> crasher:
2021-12-31 07:41:32.549734+00:00 [erro] <0.3546.0> initial call: cowboy_stream_h:request_process/3
2021-12-31 07:41:32.549734+00:00 [erro] <0.3546.0> pid: <0.3546.0>
2021-12-31 07:41:32.549734+00:00 [erro] <0.3546.0> registered_name: []
2021-12-31 07:41:32.549734+00:00 [erro] <0.3546.0> exception error: no function clause matching ...

It seems to me that Zabbix connects, gets what it wants, then disconnects.
And upon this disconnection, RabbitMQ generates the crash report.

Wes Peng

Dec 31, 2021, 3:28:38 AM
to rabbitm...@googlegroups.com
RMQ has the official Prometheus monitoring integration. :)


Roman Levitsky

Dec 31, 2021, 5:23:28 AM
to rabbitm...@googlegroups.com
Hi All,

I figured out why Rabbit is sending this crash report.

The offending check is actually made via an HTTP request, not a Zabbix Agent request:

12:11:32.534860 IP 10.135.2.6.57992 > 10.135.2.25.15672: Flags [P.], seq 1:182, ack 1, win 229, options [nop,nop,TS val 1729999850 ecr 1528946302], length 181
GET /api/nodes/rab...@company.com?memory=true?memory=true HTTP/1.1
Host: company.com:15672
Authorization: Basic emFiYml4Om9hamUzc2FpV3VnaGE1YWg=
Accept: */*

12:11:32.537262 IP 10.135.2.25.15672 > 10.135.2.6.57992: Flags [P.], seq 1:58, ack 182, win 235, options [nop,nop,TS val 1528946305 ecr 1729999850], length 57
HTTP/1.1 500 Internal Server Error
content-length: 0
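
A small aside on the captured request line: it carries "?memory=true" twice. Whether that contributes to the 500 is not established in this thread, but it is easy to see how a standard URL parser reads such a target. The sketch below uses only Python's urllib.parse; the helper name inspect_target is made up for illustration, and the node name is kept exactly as it appears in the capture.

```python
from urllib.parse import parse_qs, urlsplit


def inspect_target(target):
    """Split a request target into the node name and its parsed query."""
    parts = urlsplit(target)
    # The node name is the last path segment after /api/nodes/.
    node_name = parts.path.rsplit("/", 1)[-1]
    # Only the first '?' starts the query; any later '?' is part of a value.
    return node_name, parse_qs(parts.query)


node, query = inspect_target("/api/nodes/rab...@company.com?memory=true?memory=true")
print(node)
print(query)
```

With the doubled suffix, the single "memory" parameter ends up with the value "true?memory=true" rather than "true", so the template URL is worth checking for a duplicated query string as well as for the node name.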

Luke Bakken

Dec 31, 2021, 5:41:59 PM
to rabbitmq-users
Hello,

You've left off the most important part of the RabbitMQ log message! Please attach the entire log file in your response.

Thanks,
Luke

Marcin Gryszkalis

Jan 3, 2022, 2:54:26 AM
to rabbitm...@googlegroups.com
On 31.12.2021 11:23, 'Roman Levitsky' via rabbitmq-users wrote:
> I figured out why Rabbit sending this crash report.

I tried to answer via groups.google.com but it borked, sorry if you get
an answer twice...

Anyway, great analysis. I followed the tracks, and in my case the
problem is caused by the URL in the template, which uses the HOST.NAME
macro (FQDN) as the rabbit cluster hostname, while my rabbit installation
uses short hostnames. I confirmed with curl that using a non-existent
cluster hostname causes a 500 error.

curl -v -u zbx_monitor:xxx
'http://127.0.0.1:15672/api/nodes/rabbit@hostname?memory=true'

< HTTP/1.1 200 OK


curl -v -u zbx_monitor:xxx
'http://127.0.0.1:15672/api/nodes/rab...@hostname.domain.com?memory=true'

< HTTP/1.1 500 Internal Server Error


So I edited the template item "RabbitMQ: Get nodes" and changed url from

{$RABBITMQ.API.SCHEME}://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true

to

{$RABBITMQ.API.SCHEME}://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{$RABBITMQ.CLUSTER.HOSTNAME}?memory=true


and defined the macro RABBITMQ.CLUSTER.HOSTNAME="hostname" on the monitored host.


No more crashes :)


A side question: should using an invalid cluster hostname cause such an
error? But that's up to the developers, I guess.

best regards
--
Marcin Gryszkalis, PGP 0xA5DBEEC7 http://fork.pl/gpg.txt
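
To check a monitoring template before deploying it, the node name it will request can be compared with the names the broker itself reports. The sketch below is a minimal example under some assumptions: it relies on the management API's GET /api/nodes endpoint returning a JSON list of objects with a "name" field, and the helper names (node_url, fetch_node_names, check_node_name), the base URL, and the credentials are placeholders, not part of any Zabbix template.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def node_url(base, node_name):
    """Build the per-node URL in the same form as the curl commands above."""
    return f"{base}/api/nodes/{quote(node_name, safe='@')}?memory=true"


def fetch_node_names(base, user, password):
    """Ask the management API which node names it knows about."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = Request(f"{base}/api/nodes",
                      headers={"Authorization": f"Basic {token}"})
    with urlopen(request, timeout=10) as response:
        return [node["name"] for node in json.load(response)]


def check_node_name(configured, known_names):
    """Return (ok, hint); ok is False when the configured name is unknown."""
    if configured in known_names:
        return True, ""
    return False, f"{configured!r} not in {known_names!r}"
```

A possible use, with placeholder credentials: call fetch_node_names("http://127.0.0.1:15672", "zbx_monitor", "xxx"), pass the result to check_node_name together with the name the template expands to, and only then request node_url(...) for that name. A mismatch such as FQDN versus short hostname shows up in the hint instead of as a 500 in the broker log.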
