Cluster fails to start in k8s

Coders Magic

Nov 21, 2018, 1:57:25 AM
to rabbitmq-users

Hi All,

The first pod in my StatefulSet fails to come up. The manifests are below, followed by the error from the pod log.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq 
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoint-reader
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpoint-reader
subjects:
- kind: ServiceAccount
  name: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader

---
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: 
  labels:
    app: 
    chart: 
    release: 
    heritage: 
spec:
  serviceName: my-rabbitmq
  replicas: 3
  selector:
    matchLabels:
      app: 
      release: 
  template:
    metadata:
      labels:
        app: 
        release: 
        name: 
    spec:
      serviceAccountName: rabbitmq
     




2018-11-21 06:51:35.541 [info] <0.205.0> FHC read buffering:  OFF

2018-11-21 06:51:35.541 [info] <0.205.0> FHC write buffering: ON

2018-11-21 06:51:35.543 [info] <0.191.0> Node database directory at /var/lib/rabbitmq/mnesia/rab...@10.56.9.66 is empty. Assuming we need to join an existing cluster or initialise from scratch...

2018-11-21 06:51:35.543 [info] <0.191.0> Configured peer discovery backend: rabbit_peer_discovery_k8s

2018-11-21 06:51:35.543 [info] <0.191.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s

2018-11-21 06:51:35.548 [info] <0.191.0> Peer discovery backend does not support locking, falling back to randomized delay

2018-11-21 06:51:35.548 [info] <0.191.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.

2018-11-21 06:51:35.593 [info] <0.191.0> Failed to get nodes from k8s - 404

2018-11-21 06:51:35.594 [error] <0.190.0> CRASH REPORT Process <0.190.0> with 0 neighbours exited with reason: no case clause matching {error,"404"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 134

2018-11-21 06:51:35.594 [info] <0.33.0> Application rabbit exited with reason: no case clause matching {error,"404"} in rabbit_mnesia:init_from_config/0 line 164

Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"404"}},[{rabbit_mnesia,init_from_config,0,[{file

{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"404\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,816}]}]}}}}}"}


Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

Coders Magic

Nov 21, 2018, 1:59:24 AM
to rabbitmq-users
The env section of the StatefulSet looks roughly like this:

env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: RABBITMQ_NODENAME
    value: "rabbit@$(MY_POD_IP)"
  - name: RABBITMQ_USE_LONGNAME
    value: "true"
  - name: K8S_SERVICE_NAME
    value: "rabbitmq"
  - name: RABBITMQ_ERLANG_COOKIE
    value: "mycookie"
  - name: RABBITMQ_SSL_CERTFILE
    value: {{ .Values.internalTls.mountPath }}/{{ .Values.internalTls.crt }}
  - name: RABBITMQ_SSL_KEYFILE
    value: {{ .Values.internalTls.mountPath }}/{{ .Values.internalTls.key }}
  - name: RABBITMQ_SSL_CACERTFILE
    value: {{ .Values.internalTls.mountPath }}/{{ .Values.internalTls.ca }}
  - name: RABBITMQ_SSL_CA_FILE
    value: {{ .Values.internalTls.mountPath }}/{{ .Values.internalTls.ca }}

Michael Klishin

Nov 21, 2018, 11:34:42 AM
to rabbitm...@googlegroups.com
The peer discovery process issues an HTTP request to a Kubernetes API endpoint, and that endpoint responds with a 404.
If you are not sure which path is being requested, I'd recommend doing a traffic capture to find out,
since the path depends on several settings in the plugin plus the value in the `k8s_namespace_path` file.
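
As a rough illustration of how those values combine, here is a minimal Python sketch of the URL the plugin is expected to request. It assumes the path has the form `/api/v1/namespaces/<namespace>/endpoints/<service name>` and uses the default host, port, and namespace file location as I understand them from the plugin's documentation. The function names `endpoints_url` and `read_namespace` are hypothetical, so please verify the details against your plugin version.

```python
from pathlib import Path

# Assumed defaults, taken from the plugin's documentation as I understand it.
DEFAULT_HOST = "kubernetes.default.svc.cluster.local"
DEFAULT_PORT = 443
DEFAULT_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def read_namespace(path=DEFAULT_NAMESPACE_FILE):
    """Read the pod's namespace from the service account mount."""
    return Path(path).read_text().strip()


def endpoints_url(service_name, namespace, host=DEFAULT_HOST,
                  port=DEFAULT_PORT, scheme="https"):
    """Build the Endpoints URL that peer discovery is assumed to GET."""
    return (f"{scheme}://{host}:{port}"
            f"/api/v1/namespaces/{namespace}/endpoints/{service_name}")


if __name__ == "__main__":
    # With K8S_SERVICE_NAME=rabbitmq in the "default" namespace:
    print(endpoints_url("rabbitmq", "default"))
```

If no Endpoints object with that exact name exists in that namespace, the Kubernetes API answers with a 404, which would match the "Failed to get nodes from k8s - 404" line in the log.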

Note that you must reset nodes (or make sure their pods are recreated) between attempts in development environments [1].


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Coders Magic

Nov 22, 2018, 2:53:50 AM
to rabbitmq-users
Setting K8S_SERVICE_NAME to "my-rabbitmq" worked.

It had been wrongly set to "rabbitmq".

Thanks for the help MK
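
A possible explanation, under stated assumptions: the StatefulSet in the first message sets `serviceName: my-rabbitmq`, and a Kubernetes Endpoints object carries the same name as its Service. If the Service is indeed called `my-rabbitmq` (its manifest is not shown in this thread), a lookup for Endpoints named `rabbitmq` would find nothing. The Python sketch below checks for that mismatch. `check_service_name` is a hypothetical helper, not part of RabbitMQ or Kubernetes, and the fallback value "rabbitmq" is the plugin default as I understand it.

```python
def check_service_name(statefulset_service_name, env):
    """Return a warning if K8S_SERVICE_NAME differs from serviceName, else None.

    `env` is a plain dict of the container's environment variables.
    """
    # Assumed plugin default when the variable is not set.
    configured = env.get("K8S_SERVICE_NAME", "rabbitmq")
    if configured != statefulset_service_name:
        return (f"K8S_SERVICE_NAME is {configured!r} but the StatefulSet "
                f"serviceName is {statefulset_service_name!r}")
    return None


if __name__ == "__main__":
    # Values from this thread: the original setting, then the fix.
    print(check_service_name("my-rabbitmq", {"K8S_SERVICE_NAME": "rabbitmq"}))
    print(check_service_name("my-rabbitmq", {"K8S_SERVICE_NAME": "my-rabbitmq"}))
```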


Michael Klishin

Nov 22, 2018, 7:21:58 PM
to rabbitm...@googlegroups.com
I'm not sure why "rabbitmq" would be wrong but "my-rabbitmq" would be right. May I ask you to elaborate?
