A CSI plugin that the 1.10 controller manager can't detect (which can occur in a number of scenarios, such as during a rollout or when a node is evacuated) appears to retry without backoff. In this case I have a pod referencing a PVC -> PV that is backed by my CSI plugin, but no controller plugin running.
The controller probes continuously, every 100ms, for a single volume attachment; it should back off instead. I have not tested with higher numbers of pods, but on a busy system this could cause the controller manager to burn CPU.
I0611 02:50:56.620411 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:56.740576 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:56.840987 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:56.941971 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.042799 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.144185 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.245544 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.301899 1 wrap.go:42] GET /healthz: (33.695µs) 200 [[kube-probe/1.10+] 10.0.2.15:50350]
I0611 02:50:57.346632 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.448573 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.548648 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.649474 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.749117 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.849607 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
I0611 02:50:57.950186 1 csi_attacher.go:134] kubernetes.io/csi: probing VolumeAttachment [id=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
E0611 02:50:57.964001 1 csi_attacher.go:201] kubernetes.io/csi: attacher.WaitForAttach timeout after 15s [volume=volume-1; attachment.ID=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0]
E0611 02:50:57.964202 1 csi_attacher.go:168] [attachment.ID=csi-f827919f325dd68a8460161ccb7a8bce56dc849a4c6ef8ca8a9431fc0808d7c0] watch channel had been closed
[the previous line repeats another 12 times within the same millisecond]
Also, there isn't actually a useful error telling the user why the probing is happening (I knew in this case, but an admin would likely have no idea).
/assign
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/milestone v1.14
Adding it to the next milestone as per kubernetes/enhancements#626 (comment); if it doesn't make it, it can be moved out further.
As the KEP has been moved, unless an exception is made:
/milestone v1.15
@nikopen the KEP linked above is unrelated to this issue.
This is a bug, and I don't think it needs an associated enhancement.
sorry, got diverted by the comments on that KEP.
/milestone v1.14
Hello from the Bug Triage Shadow. I'd like to remind you that code freeze is coming in 3 days (Friday, March 8th). ❄️ 🏔 Is this still planned to be fixed for 1.14?
We have not been tracking this for 1.14.
/assign @jsafrane
can you help triage and see if this is still an issue and if there is a quick fix?
I can't reproduce the issue. @smarterclayton, do you remember what you did?
From the logs it seems that the watch channel consistently returns watch.Error, so a new waitForVolumeAttachmentInternal is called recursively (beware, this is 1.10 code, but it's very similar in master):
Can that produce a fairly stable 100ms delay between waitForVolumeAttachmentInternal calls? Anyway, this is not the usual code path; the watcher does not return watch.Error in most cases.
@vladimirvivien, if I read the code correctly, such a recursive waitForVolumeAttachmentInternal call must end up with a timeout: the caller does not care about waitForVolumeAttachmentInternal's return value, and the caller's watch probably won't return anything useful. Is there a reason why we don't return an error? That would kick in the exponential backoff in the A/D controller.
clearing the milestone for tracking purposes to let the fix land in its own time
/milestone clear
/lifecycle stale
I found the error happens when a PV is attaching (the VolumeAttachment has been created) and the csi-attacher is deleted at the same time. Then the log prints very frequently (because of the recursion).
@cwdsuzhou is this fixed by #79144 ?
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Closed #64952.