Hello folks,
If you are running external-resizer v1.13.0 or later with the `RecoverVolumeExpansionFailure` feature enabled, and your Kubernetes api-server is on v1.32 while one or more nodes are still on v1.31 (or older), you could be affected by https://github.com/kubernetes/kubernetes/issues/131402, where `allocatedResourceStatuses` is not cleared on a PVC after a resize operation finishes. This stale status persists even after a successful expansion and can block subsequent expansion of the same PVC.
If your Kubernetes api-server is on v1.32 or later and you intend to keep using nodes on v1.31 or older, we recommend that you either switch to an older version of external-resizer, such as https://github.com/kubernetes-csi/external-resizer/releases/tag/v1.12.0, or disable `RecoverVolumeExpansionFailure` in external-resizer with the `--feature-gates` flag (for example, `--feature-gates=RecoverVolumeExpansionFailure=false`). Afterwards, clear any PVCs stuck in this state by running the command:
kubectl patch pvc <pvc name> --type=json --subresource status -p='[{"op": "remove", "path": "/status/allocatedResourceStatuses"}]'
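To find candidates for the patch above, a rough sketch like the following can list PVCs that still carry `allocatedResourceStatuses`. It assumes `jq` is installed, and `list_stale_pvcs` is a made-up helper name, not part of kubectl or external-resizer. A PVC in the middle of a legitimate resize will also show up here, so check that its expansion has actually completed before patching it.

```shell
#!/usr/bin/env bash
# Sketch only: list PVCs whose status still has allocatedResourceStatuses set.
# Assumptions: jq is available; list_stale_pvcs is a hypothetical helper name.

# Reads a PVC list in JSON (as printed by `kubectl get pvc -o json`) on stdin
# and prints "namespace<TAB>name<TAB>statuses" for each PVC with the field set.
list_stale_pvcs() {
  jq -r '.items[]
    | select((.status.allocatedResourceStatuses // {}) | length > 0)
    | [.metadata.namespace, .metadata.name, (.status.allocatedResourceStatuses | tojson)]
    | @tsv'
}

# Usage against a live cluster (read-only, nothing is patched):
#   kubectl get pvc --all-namespaces -o json | list_stale_pvcs
```

The helper only reports; each listed PVC still has to be reviewed and patched individually with the command above (adding `-n <namespace>` as needed).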
If your api-server and kubelets are all on v1.32 or later, you are not affected by this bug. Similarly, if you are running an older version of api-server (v1.31.0), you are not affected either.
We will update this thread when fixes are available.