k8s_statefulsets example crash


Deepak Kaul

Sep 27, 2018, 11:22:59 AM
to rabbitmq-users
Hi, I apologize if this has been posted already, but I couldn't find this exact error in a search; the closest match had to do with certs, and my error doesn't mention certs.

I have a Kubernetes cluster with 1 master and 2 worker nodes on bare metal. I'm trying to follow the example in github.com/rabbitmq/rabbitmq-peer-discovery-k8s/examples/README.md
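
For reference, the peer discovery settings in the rabbitmq.conf from that example are roughly the following (abbreviated; the host value is also what shows up in the error below):

cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local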

When I follow the steps, I get the error below. I can't make sense of why it's crashing. Any thoughts on what I might be doing wrong? Thanks in advance.

kubectl -n test-rabbitmq logs rabbitmq-0
2018-09-27 15:09:35.926 [info] <0.33.0> Application lager started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.145 [info] <0.33.0> Application crypto started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.145 [info] <0.33.0> Application jsx started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.145 [info] <0.33.0> Application xmerl started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.219 [info] <0.33.0> Application mnesia started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.223 [info] <0.33.0> Application os_mon started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.223 [info] <0.33.0> Application cowlib started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.271 [info] <0.33.0> Application inets started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.271 [info] <0.33.0> Application asn1 started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.271 [info] <0.33.0> Application public_key started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.313 [info] <0.33.0> Application ssl started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.317 [info] <0.33.0> Application ranch started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.318 [info] <0.33.0> Application cowboy started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.318 [info] <0.33.0> Application ranch_proxy_protocol started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.319 [info] <0.33.0> Application recon started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.319 [info] <0.33.0> Application rabbit_common started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.324 [info] <0.33.0> Application amqp_client started on node 'rab...@10.244.2.33'
2018-09-27 15:09:36.331 [info] <0.201.0> 
 Starting RabbitMQ 3.7.7 on Erlang 20.3.4
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/

  ##  ##
  ##  ##      RabbitMQ 3.7.7. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2018-09-27 15:09:36.345 [info] <0.201.0> 
 node           : rab...@10.244.2.33
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : XhdCf8zpVJeJ0EHyaxszPg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rab...@10.244.2.33
2018-09-27 15:09:37.638 [info] <0.209.0> Memory high watermark set to 76778 MiB (80507604172 bytes) of 191945 MiB (201269010432 bytes) total
2018-09-27 15:09:37.642 [info] <0.211.0> Enabling free disk space monitoring
2018-09-27 15:09:37.642 [info] <0.211.0> Disk free limit set to 50MB
2018-09-27 15:09:37.645 [info] <0.213.0> Limiting to approx 1048476 file handles (943626 sockets)
2018-09-27 15:09:37.645 [info] <0.214.0> FHC read buffering:  OFF
2018-09-27 15:09:37.645 [info] <0.214.0> FHC write buffering: ON
2018-09-27 15:09:37.647 [info] <0.201.0> Node database directory at /var/lib/rabbitmq/mnesia/rab...@10.244.2.33 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2018-09-27 15:09:37.647 [info] <0.201.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2018-09-27 15:09:37.647 [info] <0.201.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2018-09-27 15:09:37.647 [info] <0.201.0> Peer discovery backend does not support locking, falling back to randomized delay
2018-09-27 15:09:37.647 [info] <0.201.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2018-09-27 15:09:42.655 [info] <0.201.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2018-09-27 15:09:42.656 [error] <0.200.0> CRASH REPORT Process <0.200.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 134
2018-09-27 15:09:42.656 [info] <0.33.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n                 {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,805}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau


All the system pods seem to be running fine:

kubectl -n kube-system get pods
NAME                              READY     STATUS    RESTARTS   AGE
coredns-78fcdf6894-hrq9j          1/1       Running   86         40d
coredns-78fcdf6894-sz9h9          1/1       Running   86         40d
etcd-daphne0                      1/1       Running   51         40d
kube-apiserver-daphne0            1/1       Running   350        19d
kube-controller-manager-daphne0   1/1       Running   2          19h
kube-flannel-ds-2lc27             1/1       Running   11         2d
kube-flannel-ds-jzbls             1/1       Running   6          2d
kube-flannel-ds-l6469             1/1       Running   66         28d
kube-proxy-6wlgq                  1/1       Running   5          2d
kube-proxy-9rtbm                  1/1       Running   6          2d
kube-proxy-mr5dp                  1/1       Running   48         40d
kube-scheduler-daphne0            1/1       Running   53         40d

Michael Klishin

Sep 27, 2018, 5:54:56 PM
to rabbitm...@googlegroups.com
> {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443

reads "error: failed to connect to address kubernetes.default.svc.cluster.local on port 443".
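
In other words, the node could not even resolve kubernetes.default.svc.cluster.local (nxdomain), so peer discovery never reached the Kubernetes API; this is a DNS problem inside the pod rather than a RabbitMQ one. One way to confirm (a sketch; the pod name and image here are arbitrary) is to run a throwaway pod in the same namespace and try to resolve that hostname from it:

kubectl -n test-rabbitmq run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default.svc.cluster.local

If that also fails with NXDOMAIN, pod DNS (CoreDNS / the resolv.conf pushed by kubelet) needs to be fixed first; peer discovery should work once the API server hostname resolves.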

See [1][2] and mailing list archives [3].


--
MK

Staff Software Engineer, Pivotal/RabbitMQ