Hello all
I am trying to set up an RMQ cluster with the autocluster plugin. I have tried both autocluster v0.6.1 and v0.7.0. The issue I am seeing is that the RMQ server takes about 4-5 minutes to come up.
Below are the details:
I am running rmq 3.6.9.
List of plugins enabled:
root@rabbitmq-0:/usr/lib/rabbitmq/bin# rabbitmq-plugins list
Configured: E = explicitly enabled; e = implicitly enabled
|/
[e ] amqp_client 3.6.9
[E ] autocluster 0.6.1
[e ] cowboy 1.0.4
[e ] cowlib 1.0.2
[ ] rabbitmq_amqp1_0 3.6.9
[ ] rabbitmq_auth_backend_ldap 3.6.9
[ ] rabbitmq_auth_mechanism_ssl 3.6.9
[e ] rabbitmq_aws 0.1.2
[ ] rabbitmq_consistent_hash_exchange 3.6.9
[ ] rabbitmq_event_exchange 3.6.9
[E ] rabbitmq_federation 3.6.9
[E ] rabbitmq_federation_management 3.6.9
[ ] rabbitmq_jms_topic_exchange 3.6.9
[E ] rabbitmq_management 3.6.9
[e ] rabbitmq_management_agent 3.6.9
[ ] rabbitmq_management_visualiser 3.6.9
[ ] rabbitmq_mqtt 3.6.9
[ ] rabbitmq_recent_history_exchange 3.6.9
[ ] rabbitmq_sharding 3.6.9
[E ] rabbitmq_shovel 3.6.9
[E ] rabbitmq_shovel_management 3.6.9
[ ] rabbitmq_stomp 3.6.9
[ ] rabbitmq_top 3.6.9
[ ] rabbitmq_tracing 3.6.9
[ ] rabbitmq_trust_store 3.6.9
[e ] rabbitmq_web_dispatch 3.6.9
[ ] rabbitmq_web_mqtt 3.6.9
[ ] rabbitmq_web_mqtt_examples 3.6.9
[ ] rabbitmq_web_stomp 3.6.9
[ ] rabbitmq_web_stomp_examples 3.6.9
[ ] sockjs 0.3.4
This is my env and config:
# config
root@rabbitmq-0:/usr/lib/rabbitmq/bin# cat /etc/rabbitmq/rabbitmq.config
[
{rabbit, [
{tcp_listen_options, [
{backlog, 128},
{nodelay, true},
{linger, {true,0}},
{exit_on_close, false},
{sndbuf, 12000},
{recbuf, 12000}
]},
{log_levels, [{autocluster, debug}, {connection, info}]},
{loopback_users, []}
]},
{autocluster, [
{dummy_param_without_comma, true},
{autocluster_log_level, debug},
{backend, etcd},
{autocluster_failure, stop},
{cleanup_interval, 30},
{cluster_cleanup, true},
{cleanup_warn_only, false},
{etcd_ttl, 30},
{etcd_scheme, http},
{etcd_host, "etcd.kube-system.svc.cluster.local"},
{etcd_port, 2379}
]}
].
# Env
root@rabbitmq-0:/usr/lib/rabbitmq/bin# cat /etc/rabbitmq/rabbitmq-env.conf
USE_LONGNAME=true
RABBITMQ_USE_LONGNAME=true
# k8s will inject $POD_IP
NODENAME=rabbit@$POD_IP
Starting the server essentially hangs (the broker doesn't come up for 4-5 minutes):
root@rabbitmq-0:/usr/lib/rabbitmq/bin# ./rabbitmq-server start
root@rabbitmq-0:/usr/lib/rabbitmq/bin# rabbitmqctl status
[{pid,9882},
{running_applications,[{ssl,"Erlang/OTP SSL application","7.3"},
{public_key,"Public key infrastructure","1.1.1"},
{cowlib,"Support library for manipulating Web protocols.",
"1.0.2"},
{crypto,"CRYPTO","3.6.3"},
{mnesia,"MNESIA CXC 138 12","4.13.3"},
{asn1,"The Erlang ASN1 compiler version 4.0.2",
"4.0.2"},
{inets,"INETS CXC 138 49","6.2"},
{xmerl,"XML parser","1.3.10"},
{sasl,"SASL CXC 138 11","2.7"},
{stdlib,"ERTS CXC 138 10","2.8"},
{kernel,"ERTS CXC 138 10","4.2"}]},
{os,{unix,linux}},
{erlang_version,"Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:8:8] [async-threads:128] [kernel-poll:true]\n"},
{memory,[{total,130689648},
{connection_readers,0},
{connection_writers,0},
{connection_channels,0},
{connection_other,0},
{queue_procs,0},
{queue_slave_procs,0},
{plugins,0},
{other_proc,10493288},
{mnesia,59296},
{metrics,0},
{mgmt_db,0},
{msg_index,0},
{other_ets,1714320},
{binary,94528},
{code,13998139},
{atom,553569},
{other_system,103776508}]},
{alarms,[]},
{listeners,[]},
{processes,[{limit,262144},{used,82}]},
{run_queue,0},
{uptime,16},
{kernel,{net_ticktime,60}}]
As you can see, the broker app and the plugins are not up yet. I suspect the cause is the rabbitmq_aws plugin making calls to the AWS EC2 instance metadata server, even though the configured backend is etcd.
Here is the tcpdump output; startup hangs here until the retries and timeouts are exhausted:
root@rabbitmq-0:/# tcpdump -i eth0 dst 169.254.169.254
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
03:01:19.720263 IP rabbitmq-0.rabbitmq.maglev-system.svc.cluster.local.38890 > 169.254.169.254.80: Flags [S], seq 4159351748, win 29200, options [mss 1460,sackOK,TS val 222882878 ecr 0,nop,wscale 7], length 0
03:01:19.720947 IP rabbitmq-0.rabbitmq.maglev-system.svc.cluster.local.38890 > 169.254.169.254.80: Flags [.], ack 285812580, win 229, options [nop,nop,TS val 222882878 ecr 873474382], length 0
03:01:19.728853 IP rabbitmq-0.rabbitmq.maglev-system.svc.cluster.local.38890 > 169.254.169.254.80: Flags [P.], seq 0:134, ack 1, win 229, options [nop,nop,TS val 222882880 ecr 873474382], length 134: HTTP: GET /latest/meta-data/placement/availability-zone HTTP/1.1
After this, once the broker starts, autocluster registers with etcd and the cluster comes up, but I am stuck with this initial delay.
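One rough way to reproduce the stall outside of RabbitMQ is to time a single request to the same metadata URL that shows up in the tcpdump output, from inside the pod. The sketch below is only an illustration: the function name time_http_get and the 5-second timeout are assumptions made for this example and are not taken from the plugin's code or its retry settings.

```python
import http.client
import time
import urllib.request


def time_http_get(url, timeout=5.0):
    """Issue one GET and return (elapsed_seconds, outcome_string)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            outcome = "status %d" % resp.status
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        outcome = "error: %s" % exc
    return time.monotonic() - start, outcome


if __name__ == "__main__":
    # URL taken from the tcpdump capture above.
    url = "http://169.254.169.254/latest/meta-data/placement/availability-zone"
    elapsed, outcome = time_http_get(url)
    print("%.2fs %s" % (elapsed, outcome))
```

If a single request takes close to the full timeout, that would be consistent with the startup delay coming from repeated metadata lookups, though it does not prove which plugin is issuing them.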
Appreciate any help!
--
Harish