k8s_statefulsets example crash


Deepak Kaul

Sep 27, 2018, 11:22:59 AM
to rabbitmq-users
Hi, I apologize if this has been posted already, but I couldn't find this exact error in a search; the closest match had to do with certs, and my error doesn't mention certs.

I have a Kubernetes cluster with 1 master and 2 worker nodes on bare metal. I'm trying to follow the example in github.com/rabbitmq/rabbitmq-peer-discovery-k8s/examples/README.md
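
For reference, the peer discovery settings in the rabbitmq.conf from that example are roughly the following (abbreviated; the host value is also what shows up in the error below):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local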

When I follow the steps, I get the error below. I can't make sense of why it's crashing. Any thoughts on what I might be doing wrong? Thanks in advance.

kubectl -n test-rabbitmq logs rabbitmq-0
2018-09-27 15:09:35.926 [info] <0.33.0> Application lager started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.145 [info] <0.33.0> Application crypto started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.145 [info] <0.33.0> Application jsx started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.145 [info] <0.33.0> Application xmerl started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.219 [info] <0.33.0> Application mnesia started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.223 [info] <0.33.0> Application os_mon started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.223 [info] <0.33.0> Application cowlib started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.271 [info] <0.33.0> Application inets started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.271 [info] <0.33.0> Application asn1 started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.271 [info] <0.33.0> Application public_key started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.313 [info] <0.33.0> Application ssl started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.317 [info] <0.33.0> Application ranch started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.318 [info] <0.33.0> Application cowboy started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.318 [info] <0.33.0> Application ranch_proxy_protocol started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.319 [info] <0.33.0> Application recon started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.319 [info] <0.33.0> Application rabbit_common started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.324 [info] <0.33.0> Application amqp_client started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.331 [info] <0.201.0> 
 Starting RabbitMQ 3.7.7 on Erlang 20.3.4
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.7. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2018-09-27 15:09:36.345 [info] <0.201.0> 
 node           : rab...@10.244.2.33
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rab...@10.244.2.33
2018-09-27 15:09:37.638 [info] <0.209.0> Memory high watermark set to 76778 MiB (80507604172 bytes) of 191945 MiB (201269010432 bytes) total
2018-09-27 15:09:37.642 [info] <0.211.0> Enabling free disk space monitoring
2018-09-27 15:09:37.642 [info] <0.211.0> Disk free limit set to 50MB
2018-09-27 15:09:37.645 [info] <0.213.0> Limiting to approx 1048476 file handles (943626 sockets)
2018-09-27 15:09:37.645 [info] <0.214.0> FHC read buffering:  OFF
2018-09-27 15:09:37.645 [info] <0.214.0> FHC write buffering: ON
2018-09-27 15:09:37.647 [info] <0.201.0> Node database directory at /var/lib/rabbitmq/mnesia/rab...@10.244.2.33 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2018-09-27 15:09:37.647 [info] <0.201.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2018-09-27 15:09:37.647 [info] <0.201.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2018-09-27 15:09:37.647 [info] <0.201.0> Peer discovery backend does not support locking, falling back to randomized delay
2018-09-27 15:09:37.647 [info] <0.201.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2018-09-27 15:09:42.655 [info] <0.201.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2018-09-27 15:09:42.656 [error] <0.200.0> CRASH REPORT Process <0.200.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 134
2018-09-27 15:09:42.656 [info] <0.33.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n                 {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,805}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau


All the system pods seem to be running fine:

kubectl -n kube-system get pods
NAME                              READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-hrq9j          1/1       Running   86         40d
coredns-78fcdf6894-sz9h9          1/1       Running   86         40d
etcd-daphne0                      1/1       Running   51         40d
kube-apiserver-daphne0            1/1       Running   350        19d
kube-controller-manager-daphne0   1/1       Running   2          19h
kube-flannel-ds-2lc27             1/1       Running   11         2d
kube-flannel-ds-jzbls             1/1       Running   6          2d
kube-flannel-ds-l6469             1/1       Running   66         28d
kube-proxy-6wlgq                  1/1       Running   5          2d
kube-proxy-9rtbm                  1/1       Running   6          2d
kube-proxy-mr5dp                  1/1       Running   48         40d
kube-scheduler-daphne0            1/1       Running   53         40d

Michael Klishin

Sep 27, 2018, 5:54:56 PM
to rabbitm...@googlegroups.com
> {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443

reads "error: failed to connect to address kubernetes.default.svc.cluster.local on port 443".
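
In other words, the node could not even resolve kubernetes.default.svc.cluster.local (nxdomain), so peer discovery never reached the Kubernetes API; this is a DNS problem inside the pod rather than a RabbitMQ one. One way to confirm (a sketch; the pod name and image here are arbitrary) is to run a throwaway pod in the same namespace and try to resolve that hostname from it:

kubectl -n test-rabbitmq run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default.svc.cluster.local

If that also fails with NXDOMAIN, pod DNS (CoreDNS / the resolv.conf pushed by kubelet) needs to be fixed first; peer discovery should work once the API server hostname resolves.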

See [1][2] and mailing list archives [3].


--
MK

Staff Software Engineer, Pivotal/RabbitMQ