According to Spindel's test, the k8s cluster recovers after restarting the apiserver, so I think this may also be apiserver related. Anyway, I'd like to build some tooling or a PR to see what happens in the scheduler (I cannot reproduce it locally, and the scheduler's log doesn't have much info).
/cc @kubernetes/sig-api-machinery-bugs
/close
Closed #45237.
Slacked with Spindel@; it seems OK after updating ETCD to 3.0.17 :).
@Spindel, I closed this for now as it seems OK with ETCD 3.0.17; if there is still an issue, please feel free to re-open this and ping me :).
I'm facing this on GCE with 1.8.4-gke.1. I create a new beefy compute node and schedule a container that can only fit there. After it is stopped (by the Jupyter auto-cull of the container), I am unable to schedule it again. My current workaround is to delete the node and add a new one; the scheduler always works the first time. It seems like the scheduler isn't tracking the release of resources properly. I wasn't able to find a scheduler pod in any namespace, maybe a GCE thing.
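For anyone hitting the same thing, a rough way to check whether the scheduler's accounting matches what is actually on the node is to compare the node's "Allocated resources" against the pods still running there. This is just a sketch; the node name below is a placeholder, not from the report above.

```sh
# Hypothetical node name; show the scheduler's view of requested/limited resources on the node
kubectl describe node my-beefy-node | grep -A 10 "Allocated resources"

# List the pods actually assigned to that node, to compare against the allocation above
kubectl get pods --all-namespaces --field-selector spec.nodeName=my-beefy-node
```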
I'm seeing this with AKS. I have nodes with 8G of RAM and I schedule one pod per node with limits and requests of 6.5G of memory. Sometimes it works fine; other times it says "insufficient memory" when there is clearly enough.
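For reference, a minimal sketch of the kind of pod spec described above (the pod name and image are placeholders, not taken from the report):

```sh
# Hypothetical example: a single pod requesting and limiting 6.5Gi of memory
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: memory-heavy-pod   # placeholder name
spec:
  containers:
  - name: app
    image: nginx           # placeholder image
    resources:
      requests:
        memory: "6.5Gi"
      limits:
        memory: "6.5Gi"
EOF
```

Note that node allocatable memory is usually lower than the physical 8G because of system and kubelet reservations, so `kubectl describe node` is worth checking as well before assuming the scheduler's accounting is wrong.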
I am plagued by this issue on EKS (Kubernetes version 1.21)