Hi folks,
We are using RabbitMQ in our web applications and there are multiple Android devices and server clients connected to it for exchanging messages.
We recently upgraded RabbitMQ from version to version 3.7.8
Erlang Version: 20.3.4
RabbitMQ runs inside a docker container.
All logs are send to AWS cloudwatch.
Configurations :
1. rabbitqm_auth_backend_http
2. rabbitmq_auth_backend_cache
{cache_ttl, 60000}
3. {vm_memory_high_watermark, 0.7}
4. {heartbeat, 90}
List of enabled plugins:
[E*] rabbitmq_auth_backend_cache 3.7.8
[E*] rabbitmq_auth_backend_http 3.7.8
[ ] rabbitmq_auth_backend_ldap 3.7.8
[ ] rabbitmq_auth_mechanism_ssl 3.7.8
[ ] rabbitmq_consistent_hash_exchange 3.7.8
[ ] rabbitmq_event_exchange 3.7.8
[ ] rabbitmq_federation 3.7.8
[ ] rabbitmq_federation_management 3.7.8
[ ] rabbitmq_jms_topic_exchange 3.7.8
[E*] rabbitmq_management 3.7.8
[e*] rabbitmq_management_agent 3.7.8
[E*] rabbitmq_mqtt 3.7.8
[ ] rabbitmq_peer_discovery_aws 3.7.8
[ ] rabbitmq_peer_discovery_common 3.7.8
[ ] rabbitmq_peer_discovery_consul 3.7.8
[ ] rabbitmq_peer_discovery_etcd 3.7.8
[ ] rabbitmq_peer_discovery_k8s 3.7.8
[ ] rabbitmq_random_exchange 3.7.8
[ ] rabbitmq_recent_history_exchange 3.7.8
[ ] rabbitmq_sharding 3.7.8
[ ] rabbitmq_shovel 3.7.8
[ ] rabbitmq_shovel_management 3.7.8
[ ] rabbitmq_stomp 3.7.8
[ ] rabbitmq_top 3.7.8
[ ] rabbitmq_tracing 3.7.8
[ ] rabbitmq_trust_store 3.7.8
[e*] rabbitmq_web_dispatch 3.7.8
[E*] rabbitmq_web_mqtt 3.7.8
[E*] rabbitmq_web_mqtt_examples 3.7.8
After the upgrade, we are getting below errors at an average interval of 12-18hrs and all connections get broken from rabbitmq server.
Below is the error stack trace:
** Last message in was {inet_async,{sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>},#Ref<0.2517789809.3853778946.1543>,{ok,<<16,243,2,0,6,77,81,73,115,100,112,3,236,0,60,0,16,55,48,98,102,56,99,102,56,56,51,98,49,101,48,102,99,0,24,107,111,107,111,47,115,101,114,118,101,114,47,107,112,47,52,50,47,115,116,97,116,117,115,0,7,111,102,102,108,105,110,101,0,5,107,112,95,52,50,1,41,101,121,74,48,101,88,65,105,79,105,74,75,86,49,81,105,76,67,74,104,98,71,99,105,79,105,74,73,85,122,73,49,78,105,74,57,46,101,121,74,49,99,50,86,121,88,50,108,107,73,106,111,105,78,68,73,105,76,67,74,49,99,50,86,121,88,51,66,104,99,51,78,51,98,51,74,107,73,106,111,105,77,122,103,52,90,71,89,50,90,68,85,53,78,106,70,109,79,87,73,48,89,106,65,53,78,84,107,119,77,71,81,119,78,68,90,109,90,68,82,109,77,50,85,105,76,67,74,115,98,50,100,112,98,108,57,116,89,87,53,104,90,50,86,121,88,50,53,104,98,87,85,105,79,105,74,109,100,87,86,115,88,51,66,118,97,87,53,48,88,121,73,115,73,109,70,119,99,70,57,48,101,88,66,108,73,106,111,105,97,51,65,105,76,67,74,110,90,87,53,108,99,109,70,48,90,87,82,102,89,88,81,105,79,105,73,121,77,68,69,52,76,84,69,121,76,84,65,122,73,68,69,121,79,106,69,119,79,106,85,50,76,106,73,49,77,122,69,53,77,121,74,57,46,66,109,65,95,112,49,112,99,73,76,76,53,52,100,99,67,81,53,...>>}}
** When Server state == {state,{sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>},"<server_ip>:49171 -> <server_ip>:8883",true,undefined,true,running,{none,<0.16316.210>},<0.16233.210>,false,none,{proc_state,{sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>},#{},{undefined,<<"amq.ctag-P1SlWWXXGq0PBD7g081_EQ">>},{0,nil},{0,nil},undefined,1,"android_device_id",false,{mqtt_msg,true,1,"topic_name",false,undefined,<<"offline">>},{<0.16268.210>,undefined},<0.16273.210>,<<"amq.topic">>,{amqp_adapter_info,{0,0,0,0,0,65535,2815,38},8883,{0,0,0,0,0,65535,2815,2},49171,<<"10.255.0.2:49171 -> <server_ip>:8883">>,{'MQTT',"N/A"},[{variable_map,#{<<"client_id">> => <<"android_device_id">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_128_gcm},{ssl_hash,aead}]},none,<0.843.0>,{auth_state,<<"kp_42">>,{user,<<"kp_42">>,[],[{rabbit_auth_backend_cache,none}]},<<"/">>},#Fun<rabbit_mqtt_processor.0.11905929>},<0.16273.210>,{state,fine,5000,#Ref<0.2517789809.3860332545.149892>}}
** Reason for termination ==
** {function_clause,[{rabbit_mqtt_processor,process_login,[<<"kp_42">>,<<"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoiNDIiLCJ1c2VyX3Bhc3N3b3JkIjoiMzg4ZGY2ZDU5NjFmOWI0YjA5NTkwMGQwNDZmZDRmM2UiLCJsb2dpbl9tYW5hZ2VyX25hbWUiOiJmdWVsX3BvaW50XyIsImFwcF90eXBlIjoia3AiLCJnZW5lcmF0ZWRfYXQiOiIyMDE4LTEyLTAzIDEyOjEwOjU2LjI1MzE5MyJ9.BmA_p1pcILL54dcCQ59hKJo0U9M0XVKQhCXP5u1fXno">>,3,{proc_state,{sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>},#{},{undefined,<<"amq.ctag-P1SlWWXXGq0PBD7g081_EQ">>},{0,nil},{0,nil},undefined,1,"android_device_id",false,{mqtt_msg,true,1,"topic_name",false,undefined,<<"offline">>},{<0.16268.210>,undefined},<0.16273.210>,<<"amq.topic">>,{amqp_adapter_info,{0,0,0,0,0,65535,2815,38},8883,{0,0,0,0,0,65535,2815,2},49171,<<"<server_ip>:49171 -> <server_ip>:8883">>,{'MQTT',"N/A"},[{variable_map,#{<<"client_id">> => <<"android_device_id">>}},{variable_map,#{<<"client_id">> => <<"android_device_id">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_128_gcm},{ssl_hash,aead}]},none,<0.843.0>,{auth_state,<<"kp_42">>,{user,<<"kp_42">>,[],[{rabbit_auth_backend_cache,none}]},<<"/">>},#Fun<rabbit_mqtt_processor.0.11905929>}],[{file,"src/rabbit_mqtt_processor.erl"},{line,477}]},{rabbit_mqtt_processor,process_request,...},...]}
2019-01-24 05:03:03.292 [error] <0.16109.210> CRASH REPORT Process <0.16109.210> with 0 neighbours exited with reason: no function clause matching rabbit_mqtt_processor:process_login(<<"kp_42">>, <<"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoiNDIiLCJ1c2VyX3Bhc3N3b3JkIjoiMzg4ZGY2ZDU5Nj...">>, 3, {proc_state,{sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>},#{},{undefined,...},...}) line 477 in gen_server2:terminate/3 line 1166
2019-01-24 05:03:03.292 [error] <0.16276.210> Supervisor {<0.16276.210>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.16233.210>, {acceptor,{0,0,0,0,0,0,0,0},8883}, {sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>}) at <0.16109.210> exit with reason no function clause matching rabbit_mqtt_processor:process_login(<<"kp_42">>, <<"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoiNDIiLCJ1c2VyX3Bhc3N3b3JkIjoiMzg4ZGY2ZDU5Nj...">>, 3, {proc_state,{sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>},#{},{undefined,...},...}) line 477 in context child_terminated
2019-01-24 05:03:03.293 [error] <0.16276.210> Supervisor {<0.16276.210>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.16233.210>, {acceptor,{0,0,0,0,0,0,0,0},8883}, {sslsocket,{gen_tcp,#Port<0.1637478>,tls_connection,<0.865.0>},<0.16201.210>}) at <0.16109.210> exit with reason reached_max_restart_intensity in context shutdown
After this error, all devices get disconnected for sometime and they have to reconnect again.
Please assist incase if anyone has faced similar situation or if there's any solution which you could recommend.
Thanks