Hi,
I have a 3-node cluster running inside Docker (each container runs on a different VM) using --network=host. After upgrading to 3.10.7, the RabbitMQ logs are being spammed with the following message and stack trace:
2022-08-24 14:28:21.590397+00:00 [info] <0.3134.0> STREAM_NAME_1661200157105845916 [osiris_replica:init/1] next offset 0, tail info {0,empty}
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> crasher:
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> initial call: osiris_replica:init/1
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> pid: <0.3134.0>
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> registered_name: []
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> exception error: no case clause matching
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {error,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {connection_refused,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {child,undefined,#Ref<0.2446360502.61865985.142836>,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {osiris_replica_reader,start_link,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> [#{connection_token =>
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> <<15,62,59,206,36,84,92,90,173,46,8,157,65,39,90,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> 234,81,254,166,85,32,226,234,76,27,52,247,215,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> 82,95,175,240>>,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> hosts =>
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> ["VM_NAME",
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> "DNS_NAME_ASSIGNED_TO_VM",
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {VM_IP},
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {DOCKER_CONTAINER_IP}],
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> leader_pid => <12910.31741.0>,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> name =>
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> "STREAM_NAME_REDACTED",
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> port => 6114,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> reference =>
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> {resource,<<"VHOST_NAME">>,queue,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> <<"management.sync">>},
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> replica_pid => <0.3134.0>,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> start_offset => {0,empty},
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> transport => ssl}]},
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> temporary,false,5000,worker,
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> [osiris_replica_reader]}}}
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> in function osiris_replica_reader:start/2 (src/osiris_replica_reader.erl, line 108)
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> in call from osiris_replica:init/1 (src/osiris_replica.erl, line 234)
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> in call from gen_server:init_it/2 (gen_server.erl, line 848)
2022-08-24 14:28:21.596951+00:00 [error] <0.3134.0> in call from gen_server:init_it/6 (gen_server.erl, line 811)
It looks like the node is trying to connect to its own replication port, which it chose at random from the 6000-6500 range, but the connection fails and it immediately makes another attempt with a different port. I've verified that a connection to a port in that range can be established from within the container using both VM_NAME and DNS_NAME_ASSIGNED_TO_VM. Netstat shows that RabbitMQ starts listening on one of the 6000-6500 ports for a short while, then unbinds and tries another one; this happens more than once per second.
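For anyone who wants to repeat the connectivity check, it can be sketched in Python roughly as follows. This is only an illustration, not the exact commands used: `probe` and `probe_all` are hypothetical helper names, and the host names and port are the placeholders from the log above. Because the log shows `transport => ssl`, the sketch can optionally attempt a TLS handshake as well; certificate verification is switched off, so it only tests reachability and the handshake, not trust.

```python
import socket
import ssl


def probe(host, port, use_tls=False, timeout=2.0):
    """Try to connect to host:port and return (ok, detail).

    With use_tls=True a TLS handshake is attempted after the TCP connect.
    Certificate verification is disabled on purpose: this checks that the
    port is reachable, not that the certificate is trusted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if use_tls:
                ctx = ssl.create_default_context()
                ctx.check_hostname = False
                ctx.verify_mode = ssl.CERT_NONE
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    return True, "tls ok (" + str(tls.version()) + ")"
            return True, "tcp ok"
    except OSError as exc:
        # Covers connection refused, timeouts, DNS failures and TLS errors.
        return False, type(exc).__name__ + ": " + str(exc)


def probe_all(hosts, port, use_tls=False):
    """Probe the same port on every host; returns {host: (ok, detail)}."""
    return {host: probe(host, port, use_tls) for host in hosts}
```

Run from inside the container, `probe_all(["VM_NAME", "DNS_NAME_ASSIGNED_TO_VM"], 6114, use_tls=True)` would show which of the advertised host entries accept a connection. Since the listener only stays bound for a fraction of a second, the port has to be taken from a fresh log line or netstat output and probed immediately, otherwise a refused connection says nothing.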
Kind Regards,
Maciej