@diogo-reis: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-bugs
/sig openstack
@diogo-reis What's your k8s version?
My k8s version is v1.7.5.
k8s version 1.8, on AWS:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x2 over 3m) default-scheduler PersistentVolumeClaim is not bound: "www-web-0" (repeated 9 times)
Normal Scheduled 2m default-scheduler Successfully assigned web-0 to ip-10-222-38-161.eu-west-1.compute.internal
Normal SuccessfulMountVolume 2m kubelet, ip-10-222-38-161.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-td6qx"
Warning FailedMount 2m attachdetach AttachVolume.Attach failed for volume "pvc-37038357-ad9c-11e7-9798-0a5f89d13502" : Error attaching EBS volume "vol-0c159c62175f227b6" to instance "i-0af5130e8d06ffb5a": "IncorrectState: vol-0c159c62175f227b6 is not 'available'.\n\tstatus code: 400, request id: 53305145-566b-457e-b1d2-83272f8fd889"
Warning FailedMount 2m attachdetach AttachVolume.Attach failed for volume "pvc-37038357-ad9c-11e7-9798-0a5f89d13502" : Error attaching EBS volume "vol-0c159c62175f227b6" to instance "i-0af5130e8d06ffb5a": "IncorrectState: vol-0c159c62175f227b6 is not 'available'.\n\tstatus code: 400, request id: cd609dca-2e74-48bd-a309-c90cf98df2a7"
Normal SuccessfulMountVolume 2m kubelet, ip-10-222-38-161.eu-west-1.compute.internal MountVolume.SetUp succeeded for volume "pvc-37038357-ad9c-11e7-9798-0a5f89d13502"
I have seen the same on OpenStack; it takes quite a few minutes to get it working. I have kube 1.8.
Got the same error.
We modified the resource request/limits for one statefulset with 3 replicas.
K8s moved one of the replicas to another node, which has enough resources, but the volume was still attached to the old node.
K8s version: v1.8.1+coreos.0
Running on AWS
Warning FailedAttachVolume 7m (x2987 over 12m) attachdetach Multi-Attach error for volume "pvc-4fe430e8-db4d-11e7-9931-02138f142c30" Volume is already exclusively attached to one node and can't be attached to another
@diogo-reis, sorry for the late reply. You mentioned that you see this when you move a Pod with a "nodeSelector:" expression to another node. Could you please confirm that the pod is first killed on node 1 and then started on node 2? Is node 1 still running?
We also got this "Multi-Attach error" on Azure, in v1.9.6. We found that the volume in node.volumesInUse is not removed even after the pod using that volume has been moved off the node for a very long time. I filed another issue here: #62282
I'm getting the same "Multi-Attach error" issue on Azure as well, in v1.9.6.
Hi @SuperMarioo, this "Multi-Attach error" issue on Azure is fixed in v1.9.7, PR: #62467.
Please follow the link below to mitigate:
https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#5-azure-disk-pvc-multi-attach-error-makes-disk-mount-very-slow-or-mount-failure-forever
What is the status of this on AWS/EBS? I have the same problem on AWS with v1.9.3.
IMO this "bug" exists in all volume types. If you have a pod with a PVC (any type, RWX types excluded) running on node1 and you shut down node1, the pod will start again on some other node, but failing the volumes over (it will return that multi-attach error) takes 6-10 minutes, because it waits for the force detach.
Yes, it seems this is general.
@zetaab that's correct. On Azure, the time cost of detaching a disk and attaching it to another node is around 1 min, so a Multi-Attach error within that 1 min is expected. However, we found an issue specific to containerized kubelet where the UnmountDevice process always fails, which means the disk detach from the node never succeeds; in that case, we hit the Multi-Attach error for hours...
This is not a bug; this is expected behavior. You must not double mount with the ReadWriteOnce policy; this is what Kubernetes is trying to avoid. However, if the node which is down does not respond within 6 minutes (the default), the volumes will be forcibly attached to the replacement pod.
Reference to this 'knowledge'.
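For anyone debugging this, a small sketch of how to see whether the old node still claims the volume, using the node status fields mentioned earlier in the thread and the VolumeAttachment objects mentioned further down; the node and PV names are placeholders, not values from this issue:

# Placeholders: <old-node> and <pv-name> are examples only.
# Volumes the kubelet on the old node still reports as in use:
kubectl get node <old-node> -o jsonpath='{.status.volumesInUse}{"\n"}'
# Volumes the attach/detach controller still considers attached to that node:
kubectl get node <old-node> -o jsonpath='{.status.volumesAttached}{"\n"}'
# The attachment objects themselves:
kubectl get volumeattachments.storage.k8s.io | grep <pv-name>

If the volume never disappears from these lists, the detach from the old node is what is stuck.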
@dElogics I think there is a subproblem on AWS with provisioning EBS volumes and switching nodes: if a volume is provisioned as /dev/ and then reattached to a node type that exposes it under /dev/nvme, like M5, I think it will fail even if it is attached correctly to the node.
I'm experiencing the same issue on GKE 1.8.10-gke.0
@AmazingTurtle can you describe in more detail what happened to the old node to cause your pod to be rescheduled? Did the node become NotReady, or was it upgraded, terminated or repaired?
In my case it became NotReady when I stopped the Docker service. I'm using Rancher 2.0 so everything (including kubelet) is containerized. It runs on 3 bare-metal Ubuntu 16.04 nodes, latest Docker CE and Kubernetes 1.10.1. I have a Deployment with 1 replica that mounts a Ceph RBD PVC in RWO mode.
I've got the same issue with k8s 1.11.0 and ceph using dynamic provisioning.
This issue also occurs when I do a
kubectl apply -f deployment.yml
As such, it's not possible to modify something without waiting 6 min... :(
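One workaround discussed further down in this thread, for Deployments that can tolerate a brief outage, is to switch the update strategy to Recreate so the old pod terminates and releases the RWO volume before the replacement pod is created. A minimal sketch; the deployment name is a placeholder:

# Placeholder deployment name. Clearing rollingUpdate is required when switching
# the strategy type away from RollingUpdate.
kubectl patch deployment <my-deployment> --type merge \
  -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'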
I think I just experienced the same issue on AWS ...
We are facing the same issue with k8s 1.9.8 and RBD volumes. But in our case the pod was just redeployed on another node due to changes via kubectl edit deployment ...
@dakleine could you please describe what changes you made when editing the deployment? What is the status of the old node? Thanks!
@jingxu97 we only changed the image version of our PostgreSQL deployment. The old node is Ready.
This is happening to me on AWS as well. I have a pod right now that has been stuck in "ContainerCreating" for 15 minutes. Any ideas what to do?
Warning FailedMount 7s (x7 over 13m) kubelet, ip-x-x-x-x.x-x-x.compute.internal Unable to mount volumes for pod "gitlab02-runner-678ffc74f4-m2w8m_build(7a473b7f-d23e-11e8-8cfe-0688ae24c2fe)": timeout expired waiting for volumes to attach/mount for pod "build"/"gitlab02-runner-678ffc74f4-m2w8m". list of unattached/unmounted volumes=[data-volume]
@christensen143, it looks like your data-volume is not attached. If you can access the kube-controller-manager log on the master, it should print out a detailed message about why it failed to attach the volume.
Just experienced the same on Digital Ocean. The pod is still in ContainerCreating after 13 mins...
postgres-deployment-77c874df64-k4hn9 0/1 ContainerCreating 0 13m
Same issue on AWS. Is there a way to "fix" the volume?
@pittar you can try restarting all master controllers (one by one, so you do not have downtime).
@zetaab , thanks for the suggestion. We are actually on OpenShift Dedicated (managed service), so I guess I'll put in a support ticket.
@pittar could you please provide more details about your issue? In what situation your pod is stuck in "ContainerCreating"? Are you trying to delete your pod and then start it on another node? Thanks!
Hi @jingxu97 , since it's a managed service, I've submitted a support ticket, but I'll try to explain what happened here in case it benefits others.
Last night was a scheduled upgrade of our OpenShift cluster from 3.9 (k8s 1.9) to 3.11 (k8s 1.11). During the upgrade, pods would have been evacuated and re-created, so my guess is certain pods (like postgres) tried to re-attach to a PV (aws ebs) before the old pod had actually shut down. This seems to have left things in a strange state.
This morning, I tried killing the pods and restarting. The first error I would get from the pods was that the volumes were already attached to a different container (AWS EBS volumes are not RWX). After killing them again and trying to restart, I would get a timeout error similar to @christensen143's last comment.
One of the pods eventually re-attached (after I killed it and waited a good 10min before starting it again). Another that was in the same state still hasn't been able to start properly (still getting timeout trying to attach/mount the pv).
@rootfs @jsafrane @thockin do you guys have any idea how we could improve this situation? This volume mount problem has been a problem for a long time. I have tried to solve this twice, but the storage or node SIGs always say that my solution is incorrect.
We have a customer who runs cronjobs every 5 minutes, and they have a volume in them as well. You can imagine what happens when you ask volumes to mount every 5 minutes while the force-detach time is 6 minutes. I think we can modify the force-detach time in the cluster, but that still does not remove the problem. It seems this volume mount problem exists in all cloud providers; sometimes it takes 5-20 minutes to get the volume in place. 20 minutes is quite a long time if your application is running in production.
I compared Kubernetes 1.9 and 1.13 with this cronjob-with-volume scenario. In 1.13 volume mounts work as they should; 1.9 does not work correctly. So if you see problems, I would say please update your cluster first. Then the remaining problem is when a node is shut down or similar; that other ticket will hopefully solve that.
I have a similar issue on DigitalOcean. If a pod is scheduled for deployment on another node it will break, as the current node and pod are already linked and the old pod's volume will not detach before the new one tries to attach.
FIX attempt 1: Add RollingUpdate with
maxUnavailable: 100%
--> FAILED
FIX attempt 2: FIX 1 + add affinity to deploy the pod only to one node --> SUCCESS (see the sketch below)
This means that you will have the service offline for a few seconds, and you will not be able to use the rest of the cluster or scale the service to other nodes.
DigitalOcean volumes, like many others, support only ReadWriteOnce. That means we need to find a better solution, because deploying to one node and accepting downtime is not what Kubernetes is about, and it heavily undermines the entire idea of persistent volumes.
Version:
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
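A minimal sketch of the node pin from FIX attempt 2, using a nodeSelector instead of a full affinity block (same effect, less YAML). The label key, node name and deployment name are illustrative placeholders, not values from this thread:

# Label one node and pin the deployment's pod template to it, so the RWO volume
# never has to move between nodes. All names below are placeholders.
kubectl label node <node-name> example.com/pvc-pin=true
kubectl patch deployment <my-deployment> --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"example.com/pvc-pin":"true"}}}}}'

This trades away scheduling flexibility: if that node goes down, the pod cannot run anywhere else.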
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
Closed #53059.
I am still having this exact issue on v1.12.8 on Google Kubernetes Engine.
It happens when I run kubectl apply -f app.yaml and make a pod recreate itself.
How is this still not fixed? Am I using Kubernetes incorrectly?
/reopen
/remove-lifecycle rotten
@jonstelly: You can't reopen an issue/PR unless you authored it or you are a collaborator.
(Quoting the earlier "this is not a bug, this is expected behavior" comment above about the 6-minute force detach.)
Unfortunately, in the case of iSCSI, the replacement pod always remains in the ContainerCreating state, which is problematic.
Hi,
I'm writing a CSI plugin.
I've found the 6-minute timeout here: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/attach_detach_controller.go#L89
I still have a problem: if a node is down, the application will be down for 6 minutes.
I do not think this is an acceptable timeout for applications.
I want Kubernetes to detach the volume faster.
I do not want to change this timeout (even if it were configurable without recompiling the source), as it changes the timeout for every unmount.
What I do want is to tell k8s that the volume is detached once I know the node is down.
Since I'm the one managing the plugin, I believe this is a valid request.
Is there a way to do so?
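There is no supported way for a CSI driver to declare its own volume detached, but as a manual recovery, later comments in this thread force-remove the stuck pod and its VolumeAttachment once the node is known to be dead. A rough sketch with placeholder names; only do this if you are certain the node really is down, because it bypasses the protection against double mounts:

# All names are placeholders.
kubectl delete pod <stuck-pod> --grace-period=0 --force
kubectl get volumeattachments.storage.k8s.io | grep <dead-node>
kubectl delete volumeattachment <attachment-name>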
This also affects me; I'm using the linstor-csi plugin.
Steps to reproduce:
1. kubectl get pod -o wide | grep <statefulsetname> to find the node where it is running
2. kubectl cordon <node>
3. kubectl get pod -o wide | grep <statefulsetname> to find the new node it is running on; the pod gets stuck in the ContainerCreating state
4. You can also check kubectl get volumeattachments.storage.k8s.io to track the detaching/attaching process
This is a new bug, reported here: #86281
Has this issue been resolved?
I have the same issue on k8s 1.17.2 with rook-ceph as storage. When one worker node gets turned off, the pod tries to be evicted after 5 minutes but cannot start because "Volume is exclusively used ...by the old pod". The old pod gets stuck in "Terminating". Workaround: kill the old pod, kill the new pod, wait while the new pod is still unable to start, kill the new pod again. Pretty weak for a cluster solution.
same on scaleway kapsule
Same on OKD 3.11 with Ceph RBD storage.
The Taint-based Evictions feature should do the trick - it has been GA since k8s 1.18.
The lesson learned here: don't use Kubernetes for databases.
Same issue here on k8s 1.15 on AWS, provisioned with kOps.
After setting up Cluster Autoscaler together with the Horizontal Pod Autoscaler, this has become a huge problem, as nodes come and go all the time based on demand.
I've been forced to add safe-to-evict: false to all deployments that use pods with PVs, which defeats some of the purpose of autoscaling (see the sketch below).
Is this a problem with EBS attach/detach timeouts/limits or with K8s?
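For reference, a sketch of the annotation mentioned above; it goes on the pod template so Cluster Autoscaler sees it on the pods it would otherwise evict. The deployment name is a placeholder:

# Placeholder deployment name; cluster-autoscaler.kubernetes.io/safe-to-evict is
# the annotation Cluster Autoscaler checks before evicting a pod.
kubectl patch deployment <my-deployment> --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"false"}}}}}'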
(Quotes @MichaelOrtho's earlier DigitalOcean comment above about the RollingUpdate maxUnavailable: 100% and single-node affinity workarounds.)
Having the same issue on DigitalOcean. There are two things involved:
1. RollingUpdate vs Recreate. Obviously, for zero downtime, RollingUpdate is preferred: it keeps the old pod until the new pod is ready. Here comes the problem: the new pod will fail saying "Multi-Attach error for volume "pvc-xxx" Volume is already used by pod(s) xxx". Changing to Recreate seems to eliminate this error, which makes sense: it destroys the old pod first, leaving some downtime, but it ensures the volume is completely detached before the new pod is scheduled and attaches the volume. Not sure if @MichaelOrtho's FIX1 equals Recreate. But as @MichaelOrtho said, this defeats one of k8s's main purposes, zero downtime. Ideally, with RollingUpdate, k8s should be able to transfer the volume attachment from the old pod to the new pod. Is this a bug, or is it just not possible and an expected limitation of k8s's RollingUpdate?
2. ReadWriteOnce only allows the volume to be mounted on one node. This error occurs even if your update strategy is Recreate. The current workaround is, as @MichaelOrtho mentioned, to add affinity to ensure scheduling on the same node. The question is: is this a bug in k8s? At least for Recreate, can k8s detach the volume from one node/old pod and attach it to another node/new pod?
@adipascu and other people on StackOverflow mentioned StatefulSet for stateful apps; I haven't tried it yet.
If the above are not considered k8s bugs, then from a user/developer experience perspective, I really think we should either disable PVC support on Deployments completely, or, if a PVC is used, configure Recreate and affinity for the user by default, or at least highlight this in the Deployment documentation and guide people toward StatefulSet, since users will absolutely hit this wall when using a PVC with a Deployment.
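For completeness, a minimal sketch of the StatefulSet alternative mentioned above; the image, size and all names are illustrative only, and the headless Service it references is omitted. A StatefulSet with volumeClaimTemplates gives each replica its own stable PVC, although it does not by itself avoid the multi-attach wait when a node dies:

# Illustrative manifest only; names, image and size are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db
  replicas: 1
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
      - name: db
        image: postgres:13
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
EOF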
IIRC, while working through this issue in 2018, the codebase for handling this situation was not "common" and had a custom implementation in each storage driver. Not sure if that info helps at all.
It seems the HPE driver has something called the pod monitor, which forcefully deletes the pod and the VolumeAttachment. Here is an article on it: https://datamattsson.tumblr.com/post/622329432678514688/better-automatic-recovery-for-csi-drivers-on
I am not sure this is a very graceful solution.
Still having this issue on AKS v1.27.
The delayed volume detachment added a 6-minute delay to our database pods coming back up on a new VM. My workaround is to explicitly delete the VolumeAttachments for the node set for removal, after it has been drained. This allows the pod to start up within 1 minute, which is probably as good as can be expected. This only works for planned maintenance operations where K8s nodes are being explicitly shut down.
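A sketch of that planned-maintenance workaround with placeholder names, assuming jq is available: drain the node first, then delete the VolumeAttachments that still reference it so replacement pods do not sit out the 6-minute force-detach:

NODE=<node-being-removed>   # placeholder
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
# Remove any VolumeAttachments still pointing at the drained node:
kubectl get volumeattachments.storage.k8s.io -o json \
  | jq -r --arg node "$NODE" '.items[] | select(.spec.nodeName == $node) | .metadata.name' \
  | xargs -r kubectl delete volumeattachment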
Same with EKS 1.21..24 and aws-ebs-csi-driver:v1.5.1