Hello Guys,
Setup info:
I have a 3-node Kubernetes cluster (Ubuntu nodes), where I'm using the Bitnami Helm chart with etcd version 3.5.9.
Multiple pods belonging to the etcd StatefulSet are allowed to be scheduled onto the same node.
NOTE: This issue is intermittent.
Issue:
When one of the nodes is drained, the etcd instance on it restarts and gets scheduled onto an available node. The usual behavior is that a terminating etcd instance removes itself from the etcd cluster member list (member is removed), and when it comes back up it joins the member list again (member is added). In the failing case, however, the etcd member is missing from the member list.
voltha voltha-etcd-cluster-client-0 1/1 Running 0 14h
voltha voltha-etcd-cluster-client-1 1/1 Running 0 14h
voltha voltha-etcd-cluster-client-2 0/1 CrashLoopBackOff 173 (4m31s ago) 14h
Below are the logs from the first time etcd crashes; it reports a "member not found" error. After Kubernetes restarts etcd, it never comes up and stays stuck in the CrashLoopBackOff state.
Error Logs:
2024-02-20T15:03:22.167595028Z stderr F etcd 15:03:22.16 INFO ==> ** Starting etcd setup **
2024-02-20T15:03:22.185870876Z stderr F etcd 15:03:22.18 INFO ==> Validating settings in ETCD_* env vars..
2024-02-20T15:03:22.188192766Z stderr F etcd 15:03:22.18 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
2024-02-20T15:03:22.19508585Z stderr F etcd 15:03:22.19 INFO ==> Initializing etcd
2024-02-20T15:03:22.197954667Z stderr F etcd 15:03:22.19 INFO ==> Generating etcd config file using env variables
2024-02-20T15:03:22.221563855Z stderr F etcd 15:03:22.22 INFO ==> Detected data from previous deployments
2024-02-20T15:03:22.336259228Z stderr F etcd 15:03:22.33 INFO ==> Updating member in existing cluster
2024-02-20T15:03:22.38884315Z stderr F {"level":"warn","ts":"2024-02-20T15:03:22.388607Z","logger":"etcd-client","caller":"v...@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001c0000/voltha-etcd-cluster-client-0.voltha-etcd-cluster-client-headless.voltha.svc.cluster.local:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
2024-02-20T15:03:22.388876646Z stderr F Error: etcdserver: member not found
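To make the sequence in these logs easier to discuss, here is a minimal Python sketch of how I understand the remove-on-stop / add-on-start flow. It is only a model under my own assumptions: the function names, the in-memory `members` set and the `has_old_data` flag are hypothetical and are not taken from the Bitnami startup script. The failing case seems to correspond to the last branch, where old data is detected but the member has already been removed.

```python
class MemberNotFound(Exception):
    """Stands in for the 'etcdserver: member not found' error."""


def stop_member(members, name):
    """Model the self-removal that normally happens on termination."""
    members.discard(name)


def start_member(members, name, has_old_data):
    """Model one pod start against the cluster's member set (hypothetical)."""
    if not has_old_data:
        # Fresh data directory: join the cluster as a new member.
        members.add(name)
        return "added"
    if name in members:
        # Old data and still registered: update the existing entry.
        return "updated"
    # Old data detected, but the member was already removed:
    # an update has nothing to update.
    raise MemberNotFound(name)
```

In this model, a stop followed by a start with `has_old_data=True` raises `MemberNotFound` on every restart, which would match a pod stuck in CrashLoopBackOff.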
When I checked, the member is indeed not present in the member list.
kv exec -it voltha-etcd-cluster-client-1 -- etcdctl member list -w table
+----+--------+------+------------+--------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+----+--------+------+------------+--------------+------------+
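Since the issue is intermittent, the member list check above could be automated. The following is a rough Python sketch that takes the text printed by `etcdctl member list -w simple` and reports which expected pods are absent. I am assuming the simple format is one member per line with six fields separated by ", " (ID, status, name, peer addrs, client addrs, is learner); the function names are hypothetical.

```python
FIELDS = ["id", "status", "name", "peer_addrs", "client_addrs", "is_learner"]


def parse_member_list(text):
    """Parse simple-format member list output into a list of dicts."""
    members = []
    for line in text.splitlines():
        # Split on ", " so that several URLs joined by a bare "," stay together.
        parts = [p.strip() for p in line.split(", ")]
        if len(parts) == len(FIELDS):
            members.append(dict(zip(FIELDS, parts)))
    return members


def missing_members(text, expected_names):
    """Return the expected member names that are not in the output, sorted."""
    present = {m["name"] for m in parse_member_list(text)}
    return sorted(set(expected_names) - present)
```

For this cluster, `expected_names` would be the three pod names `voltha-etcd-cluster-client-0`, `voltha-etcd-cluster-client-1` and `voltha-etcd-cluster-client-2`; a non-empty result right after a drain would show that the member was removed and never re-added.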
As this issue is intermittent, I am hoping someone can help me resolve it, or tell me whether it is a known issue that has been fixed in a later release.
Thanks,
Abhay