I am running a 3-replica etcd cluster on a single-node (m5.8xlarge) EKS cluster, as a test before I move to multiple nodes. I am using etcd version 3.5.6 with EBS persistent volumes on the cluster nodes. The etcd StatefulSet and services all come up OK, and I can verify via the "member list" and "endpoint health" commands that the cluster is healthy. However, every few minutes one or more of the three etcd pods goes into CrashLoopBackOff, and I see many connectivity-related error messages in the etcd pod logs.
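For reference, this is roughly how I'm checking cluster health, by exec'ing into one of the pods (the pod and service names below are assumptions based on my setup, i.e. a StatefulSet named etcd with a headless service etcd, as seen in the advertise URLs in the logs):

    # list members and check health from inside one of the etcd pods
    kubectl exec -it etcd-0 -- etcdctl member list -w table
    kubectl exec -it etcd-0 -- etcdctl \
      --endpoints=http://etcd-0.etcd:2379,http://etcd-1.etcd:2379,http://etcd-2.etcd:2379 \
      endpoint health

Both commands report a healthy cluster right after startup. Here are some of the errors I then see in the pod logs around the time the pods crash: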
1. {"level":"info","ts":"2023-09-22T00:17:24.691Z","caller":"rafthttp/peer.go:335","msg":"stopped remote peer","remote-peer-id":"b429c86e3cd4e077"}
{"level":"warn","ts":"2023-09-22T00:17:24.692Z","caller":"rafthttp/http.go:413","msg":"failed to find remote peer in cluster","local-member-id":"2e80f96756a54ca9","remote-peer-id-stream-handler":"2e80f96756a54ca9","remote-peer-id-from":"7fd61f3f79d97779","cluster-id":"718fb68f6a80fda9"}
2. {"level":"info","ts":"2023-09-22T00:17:24.690Z","caller":"rafthttp/peer.go:335","msg":"stopped remote peer","remote-peer-id":"7fd61f3f79d97779"}
{"level":"info","ts":"2023-09-22T00:17:24.690Z","caller":"rafthttp/peer.go:330","msg":"stopping remote peer","remote-peer-id":"b429c86e3cd4e077"}
{"level":"warn","ts":"2023-09-22T00:17:24.690Z","caller":"rafthttp/stream.go:286","msg":"closed TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","remote-peer-id":"b429c86e3cd4e077"}
3. {"level":"info","ts":"2023-09-22T00:17:24.695Z","caller":"embed/etcd.go:568","msg":"stopping serving peer traffic","address":"[::]:2380"}
{"level":"info","ts":"2023-09-22T00:17:25.695Z","caller":"embed/etcd.go:573","msg":"stopped serving peer traffic","address":"[::]:2380"}
{"level":"info","ts":"2023-09-22T00:17:25.695Z","caller":"embed/etcd.go:375","msg":"closed etcd server","name":"etcd-0","data-dir":"/var/run/etcd/etcd-0","advertise-peer-urls":["
http://etcd-0:2380"],"advertise-client-urls":["
http://etcd-0.etcd:2379"]}
Any hints to troubleshoot this further would be appreciated!
Thanks,