/cc @kubernetes/sig-storage-bugs
I am having this problem too with nfs-provisioner. What I've discovered is that the problem only happens if you first create a PVC and then run the provisioner. All PVCs that are created after the provisioner is deployed will succeed.
Here's the failing PVC log:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
29m 8s 118 persistentvolume-controller Normal FailedBinding no persistent volumes available for this claim and no storage class is set
When I deleted the PVC and recreated it using the same template:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
13s 13s 2 persistentvolume-controller Normal ExternalProvisioning waiting for a volume to be created, either by external provisioner "local.net/nfs" or manually created by system administrator
13s 13s 1 local.net/nfs nfs-provisioner-0 07729f15-a8f2-11e7-9b4b-d61a662c2538 Normal Provisioning External provisioner is provisioning volume for claim "test/test-postgresql"
13s 13s 1 local.net/nfs nfs-provisioner-0 07729f15-a8f2-11e7-9b4b-d61a662c2538 Normal ProvisioningSucceeded Successfully provisioned volume pvc-11346119-a8f3-11e7-b405-00d5a3ffd69a
In the provisioner logs, it first failed and then succeeded:
E1004 10:58:21.872600 1 controller.go:569] Error getting claim "test/test-postgresql"'s StorageClass's fields: StorageClass "" not found
E1004 10:58:36.872758 1 controller.go:569] Error getting claim "test/test-postgresql"'s StorageClass's fields: StorageClass "" not found
E1004 10:58:51.872880 1 controller.go:569] Error getting claim "test/test-postgresql"'s StorageClass's fields: StorageClass "" not found
I1004 10:59:17.728782 1 controller.go:1052] scheduleOperation[lock-provision-test/test-postgresql[11346119-a8f3-11e7-b405-00d5a3ffd69a]]
I1004 10:59:17.744725 1 controller.go:1052] scheduleOperation[lock-provision-test/test-postgresql[11346119-a8f3-11e7-b405-00d5a3ffd69a]]
I1004 10:59:17.754203 1 leaderelection.go:154] attempting to acquire leader lease...
I1004 10:59:17.773669 1 leaderelection.go:176] successfully acquired lease to provision for pvc test/test-postgresql
I1004 10:59:17.777182 1 controller.go:1052] scheduleOperation[provision-test/test-postgresql[11346119-a8f3-11e7-b405-00d5a3ffd69a]]
I1004 10:59:17.816305 1 provision.go:411] using service SERVICE_NAME=nfs-provisioner cluster IP 10.110.0.92 as NFS server IP
I1004 10:59:17.824160 1 controller.go:786] volume "pvc-11346119-a8f3-11e7-b405-00d5a3ffd69a" for claim "test/test-postgresql" created
I1004 10:59:17.829239 1 controller.go:803] volume "pvc-11346119-a8f3-11e7-b405-00d5a3ffd69a" for claim "test/test-postgresql" saved
I1004 10:59:17.829255 1 controller.go:839] volume "pvc-11346119-a8f3-11e7-b405-00d5a3ffd69a" provisioned for claim "test/test-postgresql"
I1004 10:59:19.797022 1 leaderelection.go:196] stopped trying to renew lease to provision for pvc test/test-postgresql, task succeeded
By the way, I created a Helm chart to deploy nfs-provisioner, might be useful for testing: https://github.com/IlyaSemenov/nfs-provisioner-chart
@avagin, @IlyaSemenov do you still see the issue?
If no default storage class existed at the time you created the PVC, then the admission controller was not able to fill in the name of the default storage class on the PVC object.
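To make the ordering concrete: the default class is just a StorageClass carrying the is-default-class annotation, and only claims that omit storageClassName entirely get it filled in at admission time. A minimal sketch (the class name is illustrative; the provisioner value is taken from the logs above):
--- default-storageclass.yaml ---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs
  annotations:
    # marks this class as the cluster default for the admission controller
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: local.net/nfs
------
This object must exist, with that annotation, before the PVC is created, because the admission controller only runs on PVC create.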
+1
Like @msau42 mentioned, is the storage class created and set to default before the PVC is created? @IlyaSemenov I tried your helm chart, and if I create a PVC before installing the chart, the storage class is created only after the PVC exists.
I wasn't able to reproduce the issue otherwise using nfs-provisioner. @IlyaSemenov @aliasmee what are your repro steps?
I confirm that it works if the storage class exists prior to creating a PVC. After all, I mentioned the same in my first comment: "What I've discovered is that the problem only happens if you first create a PVC and then run the provisioner."
However, I believe this is a typical scenario for newcomers on newly created clusters. You deploy an app (with a Helm chart or manually) which creates a PVC for itself, but it fails to come up. You then read the logs and discover that you're missing a provisioner, so you create a default provisioner, but it still does not help and shows cryptic messages 🤷. It's far from obvious that the default provisioner will not "pick up" previously created PVCs that were meant to use the default provisioner, and that you need to recreate all PVCs.
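For the record, the workaround is simply to recreate the claim once the default class exists. A short sketch (namespace and claim name taken from the logs above; the manifest filename is illustrative):
kubectl delete pvc test-postgresql -n test
# re-apply the same template; this time admission fills in the default class
kubectl apply -f test-postgresql-pvc.yaml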
@jsafrane what do you think of modifying existing PVCs to add the default storage class if the default was created/set after the PVC? Will this break backwards compatibility?
I've been out of the loop for a bit, but I see that the pvc's storage class is set to "" (above), I assume because no storage class was provided. If this is true, how will the controller know the difference between an explicit setting of "" for storage class in the pvc (which means not to use any storage class) vs. the storage class being omitted in the claim but "" filled in when the pvc object was created? Is another internal annotation needed?
PVC with storageclass == nil is used to indicate the default, whereas storageclass == "" means no storageclass. But the filling in of the default storageclass is done in the admission controller on PVC create, so if you create/set your default storageclass after creating the PVC, the storageclass field won't be updated afterwards. I think we could possibly have a long running controller do the same thing as the admission controller, filling in the storageclass name with the default storageclass, however, we need to consider backwards compatibility scenarios.
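Spelled out, the three cases look like this (a sketch of just the one spec field; the class name is illustrative):
# storageClassName omitted entirely -> nil: the default class is filled in at admission time, if one exists
# storageClassName: ""              -> empty string: no class; the claim binds only to PVs without a class
# storageClassName: nfs             -> that named class provisions the volume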
Maybe it's ok. Who's going to have an existing pending PVC with nil storageclass from before the release (1.6?) in which default storageclasses were introduced?
Why not just use your local storage? Like this:
--- vz-test-claim.yaml ---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: vz-test-claim
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  hostPath:
    path: "/mnt/vstorage/kube"
------
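For completeness, the claim side then pins the same class so it binds to that PV. A sketch (the claim name is illustrative):
--- vz-test-pvc.yaml ---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: vz-test-pvc
spec:
  storageClassName: manual   # must match the PV's storageClassName
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
------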
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I believe the problem needs to be fixed ("" updated to nil automatically), or at least the error messages should be more explanatory.
#44370 (comment) still reflects the sum of the problem from the user's perspective, and #44370 (comment) the internals.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
PVC with storageclass == nil is used to indicate the default, whereas storageclass == "" means no storageclass. But the filling in of the default storageclass is done in the admission controller on PVC create, so if you create/set your default storageclass after creating the PVC, the storageclass field won't be updated afterwards. I think we could possibly have a long running controller do the same thing as the admission controller, filling in the storageclass name with the default storageclass, however, we need to consider backwards compatibility scenarios.
Unfortunately this dependency between storage classes and PVCs breaks the resource model of Kubernetes, where everything converges after a while. I don't know whether there are any backwards-compatibility concerns, but a controller that sets the default storage class on PVCs that do not specify one sounds good to me.
/remove-lifecycle stale
Backfilling PVCs that have long existed before the default storageclass logic was added also breaks backwards compatibility.
Backfilling PVCs that have long existed before the default storageclass logic was added also breaks backwards compatibility.
Does it also break for unbound PVCs?
Unbound PVCs should be safe to backfill
I do not like filling it in either. I think this can be solved adequately with better communication. What if the admission controller wrote an event on the PVC like "the DefaultStorageClass plugin is on but no default storage class yet exists, no storage class has been set" (I am not sure if it is allowed to write events...but obviously it is allowed to write to the PVC spec.)
I think, for people familiar with the kube API, "default" probably carries the implication of "default at creation time". For everybody else it is not so clear that "default" is only a creation-time thing.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
I believe at least better communication should be added.
/remove-lifecycle rotten
I have the same issue.
We're seeing the same behavior on a vanilla Minikube cluster. The default storage class exists, and a simple PVC with no storage class set will sometimes fail to bind, with the error no persistent volumes available for this claim and no storage class is set. This is on Minikube v0.33.1 for what it's worth, but the issue only shows up sporadically, and a reboot seems to fix it, at least temporarily.
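When it happens again, it may help to confirm what the claim actually recorded. A small diagnostic sketch (the claim name is illustrative):
kubectl get storageclass
# the default class should show "(default)" next to its name
kubectl get pvc my-claim -o jsonpath='{.spec.storageClassName}'
# empty output means the claim was admitted without any class, and it will not be backfilled later
kubectl describe pvc my-claim
# the Events section shows whether the claim is waiting on a provisioner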
Encountered the same issue when testing a local installation. Simply googling the error message from pod creation does not help; I went into the provisioner pod and googled the logged error message, which finally led me to this thread.
I agree that backfilling might not be a good idea. But maybe this can be mentioned in the official storage class documentation (or somewhere appropriate).
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
I do not like filling it in either. I think this can be solved adequately with better communication. What if the admission controller wrote an event on the PVC like "the DefaultStorageClass plugin is on but no default storage class yet exists, no storage class has been set" (I am not sure if it is allowed to write events...but obviously it is allowed to write to the PVC spec.)
I think, for people familiar with the kube API, "default" probably carries the implication of "default at creation time". For everybody else it is not so clear that "default" is only a creation-time thing.
I wholly disagree. Kubernetes is a state resolution engine. Define your desired state and eventually you have that state. I should not be required to create a StorageClass primitive with a specific annotation before any other primitive in order to achieve consistent behavior. I should be able to spin up a vanilla cluster, kubectl apply -f ., and have a working application. This behavior is inconsistent with the rest of kubernetes and its philosophies and should be changed.
I would put storage class in the same class of objects as psp or quota, something an admin/bootstrapper creates to allow/deny access to resources. I don't want to argue what constitutes "kubernetes philosophies", but order matters in lots of places, e.g. you cannot create a custom resource before its CRD, or a pod before its service account. A common pattern is to prepend your yamls with numbers (e.g. 00-storageclass.yaml before 10-app.yaml) so they apply in order.
Anyway, I didn't say putting an event on the PVC was the best solution, just an adequate one. Backfilling would be the best UX but requires more thought. You need to consider races with the pv controller, which may be in the middle of binding the pvc to a pv at the time you create the default storage class. Or, if you want the pv controller itself to do the backfilling, then you need to touch its code, which is complicated.
Help to fix this is welcome... it is unfortunately not a high priority, because the majority of users are on cloud providers/distributions that already bootstrap a default storage class. (This issue is 2 years old now and only being run into sporadically.)
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Closed #44370.
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.