@mlmhl: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-pr-reviews
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @gnufied
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+The `volumeFSResizingAnnotation` annotation's key is `volumes.kubernetes.io/fs-resizing-{volumeName}`, and set it's value to `yes` if online file system resizing if required. We can use `volume.Spec.Name()` as the `volumeName` to guarantee that:
+
+* Annotation's key is a qualified name
+* `volumeName` is unique for a single pod
+
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet.
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
Typo in the word "reflector"? I am still unclear though - how will this cache become available to ExpandController, since it will be an internal property of AttachDetachController?
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet.
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
+
+The `GetPodsInVolume` method will look like this:
+
+```go
+func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName v1.UniqueVolumeName) []*v1.Pod {
We should really use ActualStateOfWorld rather than dsw here.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+The `volumeFSResizingAnnotation` annotation's key is `volumes.kubernetes.io/fs-resizing-{volumeName}`, and set it's value to `yes` if online file system resizing if required. We can use `volume.Spec.Name()` as the `volumeName` to guarantee that:
+
+* Annotation's key is a qualified name
+* `volumeName` is unique for a single pod
+
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet.
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
I've updated the proposal to add more detailed description of this reflector operation.
And a preliminary implementation can be found in these commits: "reflector" and "remove dependence".
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet.
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
+
+The `GetPodsInVolume` method will look like this:
+
+```go
+func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName v1.UniqueVolumeName) []*v1.Pod {
Yeah, you are right. I think we should get the pod-volume relationship (which pod uses which volume) from DesiredStateOfWorld, as ActualStateOfWorld doesn't cache this relationship, and get the volume-node relationship (which volume is mounted to which node) from ActualStateOfWorld. The current proposal misses the latter; I will update the proposal later.
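A minimal sketch of how the two lookups could be combined. The types here are simplified, hypothetical stand-ins for the real caches in `pkg/controller/volume/attachdetach/cache`, and `resolveResizeTargets` is an illustrative helper, not proposed API:

```go
package main

import "fmt"

// Simplified stand-ins for the attach/detach controller caches; the real
// DesiredStateOfWorld and ActualStateOfWorld use Kubernetes API types.
type desiredStateOfWorld struct {
	podsByVolume map[string][]string // volume name -> pods that use it
}

type actualStateOfWorld struct {
	nodesByVolume map[string][]string // volume name -> nodes it is mounted to
}

func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName string) []string {
	return dsw.podsByVolume[volumeName]
}

func (asw *actualStateOfWorld) GetNodesForVolume(volumeName string) []string {
	return asw.nodesByVolume[volumeName]
}

// resolveResizeTargets combines both caches: the pods using the volume come
// from the desired state, the nodes it is mounted to from the actual state.
func resolveResizeTargets(dsw *desiredStateOfWorld, asw *actualStateOfWorld, volumeName string) (pods, nodes []string) {
	return dsw.GetPodsInVolume(volumeName), asw.GetNodesForVolume(volumeName)
}

func main() {
	dsw := &desiredStateOfWorld{podsByVolume: map[string][]string{"pv-1": {"pod-a", "pod-b"}}}
	asw := &actualStateOfWorld{nodesByVolume: map[string][]string{"pv-1": {"node-1"}}}
	pods, nodes := resolveResizeTargets(dsw, asw, "pv-1")
	fmt.Println(pods, nodes) // prints: [pod-a pod-b] [node-1]
}
```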
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet.
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
+
+The `GetPodsInVolume` method will look like this:
+
+```go
+func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName v1.UniqueVolumeName) []*v1.Pod {
I added a detailed explanation for this subject to the proposal.
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+## Implementation Design
+
+### Overview
+
+The key point of online file system resizing is how kubelet to discover which PVCs need file system resizing. We achieve this goal by adding a `volumeFSResizingAnnotation` annotation to pod. In detail, if a volume received an online file system resizing request, the annotation will be added to the pods using this volume, so kubelet can watch for pods running on it to discover this annotation, and then start the online file system resizing operation for according volumes.
+
+The `volumeFSResizingAnnotation` annotation's key is `volumes.kubernetes.io/fs-resizing-{volumeName}`, and set it's value to `yes` if online file system resizing if required. We can use `volume.Spec.Name()` as the `volumeName` to guarantee that:
+
+* Annotation's key is a qualified name
+* `volumeName` is unique for a single pod
+
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet.
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
cc @liggitt
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence.
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+
Is there any example of this design being used previously? I am just curious.
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+#### Prerequisite of Controller Manager Resize
+
+* We use a three tuple `(pod,volume,node)` to represent a relationship: `pod` is using `volume`, `pod` is running on `node` and `volume` has been mounted to `node`. Add the `volumeFSResizingAnnotation` to `pod` only if this relationship was established.
+
+* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence.
Do you mean reflector or you mean refactor ?
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state.
+
+Besides, we add a `VolumeFsResizeMapPopulator` interface as the producer of `VolumeFsResizeMap`, it runs an asynchronous periodic loop to populate the volumes need file system resize. In each loop, `VolumeFsResizeMapPopulator` traverses each pod's annotation, and generate a `VolumeToResize` object for each `volumeFSResizingAnnotation`.
Will this be a new package or part of volume-manager's interface?
@liggitt commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence.
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+
> Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object

no... controller context is not intended to be a shared mutable struct
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+#### Prerequisite of Controller Manager Resize
+
+* We use a three tuple `(pod,volume,node)` to represent a relationship: `pod` is using `volume`, `pod` is running on `node` and `volume` has been mounted to `node`. Add the `volumeFSResizingAnnotation` to `pod` only if this relationship was established.
+
+* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence.
Sorry, typo in word, I've fixed this.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state.
+
+Besides, we add a `VolumeFsResizeMapPopulator` interface as the producer of `VolumeFsResizeMap`, it runs an asynchronous periodic loop to populate the volumes need file system resize. In each loop, `VolumeFsResizeMapPopulator` traverses each pod's annotation, and generate a `VolumeToResize` object for each `volumeFSResizingAnnotation`.
I'm not sure if I understand what you mean. We can put VolumeFsResizeMap interface under package pkg/kubelet/volumemanager/cache, just like the ASW and DSW, and put VolumeFsResizeMapPopulator interface under package pkg/kubelet/volumemanager/populator, just like the DesiredStateOfWorldPopulator. Then pkg/kubelet/volumemanager/reconciler.Reconciler can work as a consumer of VolumeFsResizeMap. These two interfaces will look like this:
```go
type VolumeFsResizeMap interface {
	AddVolumeForPod(volumeName string, pod *v1.Pod)
	GetVolumesToResize() []VolumeToResize
	MarkAsFileSystemResized(pod *v1.Pod, volumeName string) error
}

type VolumeFsResizeMapPopulator interface {
	Run(stopCh <-chan struct{})
}
```
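A minimal sketch of what an implementation of `VolumeFsResizeMap` could look like, assuming a mutex-guarded map keyed by volume name (`Pod` here is a local stand-in for `*v1.Pod`, and the whole implementation is illustrative, not the proposed code):

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a stand-in for *v1.Pod.
type Pod struct{ Name string }

// VolumeToResize represents a single pending file system resize request.
type VolumeToResize struct {
	VolumeName string
	Pod        *Pod
}

// volumeFsResizeMap is a hypothetical implementation of the VolumeFsResizeMap
// interface: a mutex-guarded request queue keyed by volume name.
type volumeFsResizeMap struct {
	mu      sync.Mutex
	volumes map[string]VolumeToResize
}

func newVolumeFsResizeMap() *volumeFsResizeMap {
	return &volumeFsResizeMap{volumes: map[string]VolumeToResize{}}
}

func (m *volumeFsResizeMap) AddVolumeForPod(volumeName string, pod *Pod) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.volumes[volumeName] = VolumeToResize{VolumeName: volumeName, Pod: pod}
}

func (m *volumeFsResizeMap) GetVolumesToResize() []VolumeToResize {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]VolumeToResize, 0, len(m.volumes))
	for _, v := range m.volumes {
		out = append(out, v)
	}
	return out
}

// MarkAsFileSystemResized removes the pending request once the resize is done.
func (m *volumeFsResizeMap) MarkAsFileSystemResized(pod *Pod, volumeName string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.volumes[volumeName]; !ok {
		return fmt.Errorf("no pending resize request for volume %q", volumeName)
	}
	delete(m.volumes, volumeName)
	return nil
}

func main() {
	m := newVolumeFsResizeMap()
	m.AddVolumeForPod("pv-1", &Pod{Name: "pod-a"})
	fmt.Println(len(m.GetVolumesToResize())) // prints: 1
}
```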
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them. + +#### Fetch All Pods Using a PVC + +For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes. + +**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes. + + To achieve this goal, we need to: + +* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. + +* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter. + +* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`. +
Yeah, this refactoring is really a bit tricky. Do we have any better idea to share ASW/DSW?
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence.
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+
Let's list out all the ideas we discussed in the GitHub issue, so that we know all the available options. The other ways of implementing this are:
I probably miss something very obvious, but why do we need new annotations? Right now, kubelet already knows that a PV needs filesystems resize, because it does offline resize. And it also can update the PVC and "finish" the resize so user knows it's done. So why a new annotation and a new controller?
@jsafrane commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+ return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:
Kubelet already loops through all mounts in VolumeManager. It can see that a volume needs fs resize there. And based on whether the mount exists in its ASW, it can see if it's going to perform online or offline resize.
In addition, fs resize must be coordinated with VolumeManager anyway, since VolumeManager should not touch the volume (e.g. unmount it) during resize.
> I probably miss something very obvious, but why do we need new annotations?

Kubelet only watches pods, and only fetches the PVC object from the apiserver when a pod is starting; it won't fetch the latest PVC object again after the pod has started. So if we resize the volume after the pod started, kubelet can't see the request.

We add an annotation to the pod to inform kubelet that some volumes need resizing, but as @gnufied mentioned, this is just one of several available approaches to inform kubelet.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+ return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:
> Kubelet already loops through all mounts in VolumeManager. It can see that a volume needs fs resize there.

The problem is that the volume objects cached in ASW are only updated when a pod is starting, and are not updated again after the pod has started. That means if we resize the volume after the pod has already started, VolumeManager can't see this change from ASW.
> In addition, fs resize must be coordinated with VolumeManager anyway, since VolumeManager should not touch the volume (e.g. unmount it) during resize.

We append the resize operation to OperationExecutor so that VolumeManager cannot touch the volume before the resize operation has finished.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+ return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:
The original discussion is in this issue: kubernetes/kubernetes#57185
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+ return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:
I think what we can do is: when we are iterating through pods in the volume manager's desired_state_of_world for mount and call the PodExistsInVolume function, that function can make an additional API call to check if the underlying volume needs resizing. Currently PodExistsInVolume already raises a remountRequired error when some change requires us to remount the volume. So we can introduce a new resizeRequired error which will be raised by the function, and that in turn will cause a file system resize on the kubelet.
This will require no API change or annotation, and everything will work online as expected. Thanks @jsafrane for pointing it out.
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+ return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:
One potential problem with the ^ approach is that the reconciler loop runs every 100ms, and if a node has 20 pods with volumes, we will be making 20 GET requests every 100ms, because there is no informer inside volume_manager. Again - I am not sure how badly it will affect us.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +
+ return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:
Maybe we can add a lastUpdateTimestamp for each volume and use a loop interval independent of the reconciler's. This can reduce the request count, but I am not sure how much it would help.
/ok-to-test
We need help from @kubernetes/sig-node-proposals.
kubelet needs to know when a volume has been resized so it can call /sbin/resize2fs (or similar utility) to resize filesystem on the volume. Kubelet does not have any informer on PV/PVCs, because it would allow compromised node to read all PVs / PVCs on the system.
This proposal adds a new controller that detects this on behalf of kubelet and puts annotations to pods. Kubelet has informer on pods so it can get an event, get the affected PV from API server and resize it.
It would be much better if kubelet had some sort of informer on PVs or PVCs so it can get an event when a PV (or PVC) is resized, without a special controller poking pods just to let kubelet know.
To use an approach similar to the remountRequired feature, what we can do is:

* In syncPod we can request reprocessing of the pod. This does not require any new change; all pods are reprocessed periodically.
* Set a flag on the attachedVolume.mountedPods struct to indicate that a resize is requested.
* We need to implement a caching PVC manager along the lines of https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/configmap/configmap_manager.go#L98 . So basically PVC information is cached there with a TTL.
This looks like a good solution. I will test it first and then update the proposal if it works.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: saad-ali
Assign the PR to them by writing /assign @saad-ali in a comment when ready.
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@gnufied
This proposal is updated to conform to our new solution, PTAL.
@gnufied commented on this pull request.
mostly looks good. left some minor comments.
cc @derekwaynecarr @sjenning @Random-Liu for kubelet review.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> + if err != nil &&
+ !nestedpendingoperations.IsAlreadyExists(err) &&
+ !exponentialbackoff.IsExponentialBackoff(err) {
+ // Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+ // Log all other errors.
+ glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+ }
+ if err == nil {
+ glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+ }
+}
+```
+
+As can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs the volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` method to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeds. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of the corresponding `mountedPod` to ensure no further resize operation will be performed.
What is `ActualStateOfWorldMounterUpdater`?
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +	}
> +	processedVolumesForFSResize.Insert(string(uniqueVolumeName))
> +}
> +```
> +
> +It is important to note that:
> +
> +- We only perform online resizing for mounted volumes, which means these volumes must exist in `ActualStateOfWorld`, so we fetch all mounted volumes from ASW in advance (stored in the input parameter `mountedVolumesForPod`) and skip volumes which do not exist in `mountedVolumesForPod`.
> +
> +- Although we add `fsResizeRequired` to `mountedPod`, file system resizing is actually a global operation for a volume: if more than one pod mounts the same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we process. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
> +
> +#### File System Resizing Operation
> +
> +Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` trigger the file system resize operation too.
> +
> +In each loop, for each volume in DSW, `Reconciler` checks whether this volume exists in ASW or not. If the volume exists but is marked as requiring file system resizing, ASW returns a `fsResizeRequiredError` error, and `Reconciler` then triggers a file system resizing operation for this volume. The code snippet looks like this:
So I assume this code will run in the place where we check for mounted volumes, right?
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +		asw.attachedVolumes[volumeName].mountedPods[podName] = podObj
> +	}
> +}
> +```
> +
> +And `DesiredStateOfWorldPopulator` uses a `checkVolumeFSResize` method to perform the operation described above, which is invoked inside the `processPodVolumes` method:
> +
> +```go
> +// checkVolumeFSResize checks whether a PVC mounted by the pod requires file
> +// system resize or not. If so, marks this volume as fsResizeRequired in ASW.
> +// - mountedVolumesForPod stores all mounted volumes in ASW, because online
> +//   volume resize only considers mounted volumes.
> +// - processedVolumesForFSResize stores all volumes we have checked in the
> +//   current loop, because the file system resize operation is a global
> +//   operation for a volume, so we only need to check it once if more than
> +//   one pod uses it.
> +func (dswp *desiredStateOfWorldPopulator) checkVolumeFSResize(
It might be worth improving this check to consider only RW volumes within pods. I know that we will mostly be dealing with the global mount path, but it is best to skip the entire resize operation for the pod if the volume is mounted as read-only.
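The suggested read-only check could look something like this tiny sketch; the `mountedVolume` record and function name are hypothetical stand-ins for kubelet's real per-pod mount bookkeeping:

```go
package main

import "fmt"

// mountedVolume is a toy stand-in for kubelet's per-pod mount record.
type mountedVolume struct {
	name     string
	readOnly bool
}

// needsOnlineResize applies the check suggested above: even if the PVC grew,
// skip the resize for pods that mount the volume read-only.
func needsOnlineResize(v mountedVolume, pvcGrew bool) bool {
	if v.readOnly {
		return false
	}
	return pvcGrew
}

func main() {
	fmt.Println(needsOnlineResize(mountedVolume{name: "data", readOnly: false}, true))
	fmt.Println(needsOnlineResize(mountedVolume{name: "config", readOnly: true}, true))
}
```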
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +  - [Online File System Resize in Kbuelet](#online-file-system-resize-in-kbuelet)
> +    - [Volume Resize Request Detect](#volume-resize-request-detect)
> +    - [File System Resizing Operation](#file-system-resizing-operation)
> +
> +## Overview
> +
> +Release 1.10 only supports offline file system resizing for PVCs, as this operation is only executed inside the `MountVolume` operation in kubelet. If a resizing request is submitted after the volume is mounted, it won't be performed. This proposal intends to support online file system resizing for PVCs in kubelet.
> +
> +## Goals
> +
> +Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
> +
> +## Non Goals
> +
> +* Offline file system resizing is not included. If we find a volume that needs file system resizing but is not yet mounted to the node, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
> +
Let's reword "If we found a volume need file system resizing but not mounted to the node yet, we will do nothing." to -> "If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. The file system on volume will be resized according to existing
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> + if err != nil &&
+ !nestedpendingoperations.IsAlreadyExists(err) &&
+ !exponentialbackoff.IsExponentialBackoff(err) {
+ // Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+ // Log all other errors.
+ glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+ }
+ if err == nil {
+ glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+ }
+}
+```
+
+As can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs the volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` method to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeds. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of the corresponding `mountedPod` to ensure no further resize operation will be performed.
`ActualStateOfWorldMounterUpdater` is defined here. `OperationGenerator` uses it to `MarkVolumeAsMounted`, `MarkVolumeAsUnmounted`, etc. `VolumeManager`'s `ActualStateOfWorld` implements this interface.
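A toy model of the `fsResizeRequired` flag lifecycle discussed here, with the real interfaces collapsed into one struct for brevity (the structure and field layout are assumptions): the populator sets the flag via `MarkFSResizeRequired`, and `MarkVolumeAsResized` clears it after a successful resize so the operation is not repeated.

```go
package main

import "fmt"

// mountedPod carries the per-pod resize flag described in the proposal.
type mountedPod struct {
	fsResizeRequired bool
}

// actualStateOfWorld is a drastically simplified ASW:
// volumeName -> podName -> mount record.
type actualStateOfWorld struct {
	attachedVolumes map[string]map[string]*mountedPod
}

// MarkFSResizeRequired flags a mounted pod's volume as needing an fs resize.
func (asw *actualStateOfWorld) MarkFSResizeRequired(volume, pod string) {
	if p, ok := asw.attachedVolumes[volume][pod]; ok {
		p.fsResizeRequired = true
	}
}

// MarkVolumeAsResized clears the flag after the resize operation succeeds.
func (asw *actualStateOfWorld) MarkVolumeAsResized(volume, pod string) {
	if p, ok := asw.attachedVolumes[volume][pod]; ok {
		p.fsResizeRequired = false
	}
}

func main() {
	asw := &actualStateOfWorld{attachedVolumes: map[string]map[string]*mountedPod{
		"pvc-1": {"pod-a": {}},
	}}
	asw.MarkFSResizeRequired("pvc-1", "pod-a")
	fmt.Println(asw.attachedVolumes["pvc-1"]["pod-a"].fsResizeRequired)
	asw.MarkVolumeAsResized("pvc-1", "pod-a")
	fmt.Println(asw.attachedVolumes["pvc-1"]["pod-a"].fsResizeRequired)
}
```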
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +	}
> +	processedVolumesForFSResize.Insert(string(uniqueVolumeName))
> +}
> +```
> +
> +It is important to note that:
> +
> +- We only perform online resizing for mounted volumes, which means these volumes must exist in `ActualStateOfWorld`, so we fetch all mounted volumes from ASW in advance (stored in the input parameter `mountedVolumesForPod`) and skip volumes which do not exist in `mountedVolumesForPod`.
> +
> +- Although we add `fsResizeRequired` to `mountedPod`, file system resizing is actually a global operation for a volume: if more than one pod mounts the same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we process. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
> +
> +#### File System Resizing Operation
> +
> +Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` trigger the file system resize operation too.
> +
> +In each loop, for each volume in DSW, `Reconciler` checks whether this volume exists in ASW or not. If the volume exists but is marked as requiring file system resizing, ASW returns a `fsResizeRequiredError` error, and `Reconciler` then triggers a file system resizing operation for this volume. The code snippet looks like this:
Yeah, you are right. Currently the check for mounted volumes has two branches: the volume is not mounted yet, or the volume is mounted (possibly requiring a remount).
This proposal adds a third branch: the volume is mounted and requires resizing.
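The three branches could be sketched as follows; the sentinel error values and method shapes here are simplified stand-ins for the real ASW error types, not the actual kubelet code:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy errors mirroring the two existing reconciler branches plus the
// proposed third one.
var (
	errNotMounted       = errors.New("volume not mounted")
	errRemountRequired  = errors.New("remount required")
	errFSResizeRequired = errors.New("fs resize required")
)

type volState struct {
	mounted          bool
	remountRequired  bool
	fsResizeRequired bool
}

// podExistsInVolume sketches the three-branch check: not mounted, mounted
// but needing a remount, mounted but needing an fs resize.
func podExistsInVolume(s volState) error {
	if !s.mounted {
		return errNotMounted
	}
	if s.remountRequired {
		return errRemountRequired
	}
	if s.fsResizeRequired {
		return errFSResizeRequired
	}
	return nil
}

// reconcileVolume dispatches on the returned error, as the reconciler loop
// would, and names the operation it would trigger.
func reconcileVolume(s volState) string {
	switch podExistsInVolume(s) {
	case errNotMounted:
		return "MountVolume"
	case errRemountRequired:
		return "RemountVolume"
	case errFSResizeRequired:
		return "VolumeFileSystemResize"
	default:
		return "NoOp"
	}
}

func main() {
	fmt.Println(reconcileVolume(volState{mounted: true, fsResizeRequired: true}))
}
```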
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +		asw.attachedVolumes[volumeName].mountedPods[podName] = podObj
> +	}
> +}
> +```
> +
> +And `DesiredStateOfWorldPopulator` uses a `checkVolumeFSResize` method to perform the operation described above, which is invoked inside the `processPodVolumes` method:
> +
> +```go
> +// checkVolumeFSResize checks whether a PVC mounted by the pod requires file
> +// system resize or not. If so, marks this volume as fsResizeRequired in ASW.
> +// - mountedVolumesForPod stores all mounted volumes in ASW, because online
> +//   volume resize only considers mounted volumes.
> +// - processedVolumesForFSResize stores all volumes we have checked in the
> +//   current loop, because the file system resize operation is a global
> +//   operation for a volume, so we only need to check it once if more than
> +//   one pod uses it.
> +func (dswp *desiredStateOfWorldPopulator) checkVolumeFSResize(
Nice idea, I've already added this check to this proposal, PTAL.
@saad-ali commented on this pull request.
Can you add a section indicating in which cases online expansion is safe. Traditionally filesystem resizing requires the volume to be unmounted first. Explain why it is safe to do a resize without unmounting.
Linux kernel 3.3 introduced online resizing of ext4 volumes (https://www.ibm.com/developerworks/library/l-33linuxkernel/), and xfs has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted xfs file system).
These are the only 2 file systems we support for resizing currently. Our motivation for doing online resizing is that we have had many users on Slack and elsewhere confused by the current resizing workflow. It is not obvious when they have to delete and recreate the pod. We have added conditions and events for that, but the workflow is still not very user friendly.
I think we can add a section to inform the user that currently we only support resizing for xfs, ext3, and ext4, and they are all safe for online resizing. BTW, right now offline resizing is also performed after the volume is mounted, as there is no way to expand an unmounted xfs file system.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
> +
> +## Non Goals
> +
> +* Offline file system resizing is not included. If we find a volume that needs file system resizing but is not yet mounted to the node, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
> +
> +* Extending resize tools: we only support the most common file systems' offline resizing in the current release, and we prefer to stay the same for the online one.
> +
> +## Use Cases
> +
> +* As a user I am running MySQL on a 100GB volume, but I am running out of space. I should be able to increase the size of the volume MySQL is using without losing all my data. (online and with data)
> +* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or the mount point. (online, with data, and without taking the pod offline)
> +
> +## Prerequisite
> +
> +Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If the user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
@saad-ali @gnufied I've added a section here to illustrate the safety of online resizing, PTAL.
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If the user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
Should this behavior be gated on a check of kernel version? If kernel < 3.3 then don't do online resizing?
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If the user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
Yeah, it's better to add this check. Currently offline resizing is also performed after the volume is mounted, as there is no way to expand an unmounted xfs file system, so I guess we can add this check inside resizeFileSystem; then both online and offline operations can be gated. cc @gnufied
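If such a gate were added inside `resizeFileSystem`, the kernel-version comparison might look like this sketch, parsing a `uname -r` style release string. This is purely illustrative; the function name and parsing rules are assumptions:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a release string like "4.15.0-36-generic"
// (as returned by uname -r) is at least major.minor. This is the kind of
// check that could gate online ext3/ext4 resizing on kernel >= 3.3.
func kernelAtLeast(release string, major, minor int) bool {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	// Keep only the leading digits of the minor component, dropping
	// suffixes like "-rc1".
	digits := strings.FieldsFunc(parts[1], func(r rune) bool {
		return r < '0' || r > '9'
	})
	if len(digits) == 0 {
		return false
	}
	min, err := strconv.Atoi(digits[0])
	if err != nil {
		return false
	}
	return maj > major || (maj == major && min >= minor)
}

func main() {
	fmt.Println(kernelAtLeast("4.15.0-36-generic", 3, 3))
	fmt.Println(kernelAtLeast("3.2.0-rc1", 3, 3))
}
```

Note gnufied's point below still applies: any kernel that can run a supported container runtime is well past 3.3, so the check may be unnecessary in practice.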
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If the user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
I am not sure about adding this check. Linux kernel 3.3 is really old; Docker's requirement is already 3.10+ - https://docs.docker.com/install/linux/docker-ce/binaries/
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +## Non Goals
> +
> +* Offline file system resizing is not included. If we find a volume that needs file system resizing but is not yet mounted to the node, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
> +
> +* Extending resize tools: we only support the most common file systems' offline resizing in the current release, and we prefer to stay the same for the online one.
> +
> +## Use Cases
> +
> +* As a user I am running MySQL on a 100GB volume, but I am running out of space. I should be able to increase the size of the volume MySQL is using without losing all my data. (online and with data)
> +* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or the mount point. (online, with data, and without taking the pod offline)
> +
> +## Prerequisite
> +
> +- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If the user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
> +
> +- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`; if the user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
@gnufied The description of the online resizing feature gate is added here.
@jsafrane commented on this pull request.
I have a couple of nits here and there. I did not review the code snippets though.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +	// UnregisterPod unregisters PersistentVolumeClaims from a given pod that are not
> +	// used by any other registered pod.
> +	UnregisterPod(pod *v1.Pod)
> +}
> +```
> +
> +`RegisterPod` is invoked when a pod is added to or updated in `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`. `RegisterPod` just increases the reference count of each PVC used by the pod; the corresponding PVC object won't be fetched from `apiserver` until `GetPersistentVolumeClaim` is invoked for the first time. Similarly, `UnregisterPod` just reduces the reference count and clears the corresponding item if no alive pods use this volume.
> +
> +`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
> +
> +- PVC doesn't exist in the local cache: this is the first time this volume is requested, so we need to fetch it from `etcd`
> +- PVC exists in the local cache but has expired: this is just a periodic refresh of a PVC we successfully fetched previously, so we can fetch it from `apiserver`'s cache (setting `ResourceVersion` to 0)
> +- PVC exists in the local cache and hasn't expired: just return the cached object
> +
> +### Online File System Resize in Kbuelet
Kbuelet -> Kubelet
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +	// WARNING: Register/UnregisterPod functions should be efficient,
> +	// i.e. should not block on network operations.
> +
> +	// RegisterPod registers all PersistentVolumeClaims from a given pod.
> +	RegisterPod(pod *v1.Pod)
> +
> +	// UnregisterPod unregisters PersistentVolumeClaims from a given pod that are not
> +	// used by any other registered pod.
> +	UnregisterPod(pod *v1.Pod)
> +}
> +```
> +
> +`RegisterPod` is invoked when a pod is added to or updated in `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`. `RegisterPod` just increases the reference count of each PVC used by the pod; the corresponding PVC object won't be fetched from `apiserver` until `GetPersistentVolumeClaim` is invoked for the first time. Similarly, `UnregisterPod` just reduces the reference count and clears the corresponding item if no alive pods use this volume.
> +
> +`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
It's quite confusing to have two interfaces named Manager and VolumeManager. I guess there is nothing we can do about it, but please be extra careful with variable names and so on - it should be clear which manager is used where and how.
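The reference-counting behavior described above can be condensed into a small sketch, with pods reduced to lists of claim names; this is not the real interface, just the counting logic:

```go
package main

import "fmt"

// pvcManager tracks, per claim key, how many registered pods use it,
// mirroring the RegisterPod/UnregisterPod description above.
type pvcManager struct {
	refCounts map[string]int
}

// RegisterPod increments the reference count for each claim the pod uses.
// No API fetch happens here; that is deferred to GetPersistentVolumeClaim.
func (m *pvcManager) RegisterPod(claims []string) {
	for _, c := range claims {
		m.refCounts[c]++
	}
}

// UnregisterPod decrements the counts and evicts a claim once no live
// registered pod uses it anymore.
func (m *pvcManager) UnregisterPod(claims []string) {
	for _, c := range claims {
		if m.refCounts[c]--; m.refCounts[c] <= 0 {
			delete(m.refCounts, c)
		}
	}
}

func main() {
	m := &pvcManager{refCounts: map[string]int{}}
	m.RegisterPod([]string{"ns/data", "ns/logs"})
	m.RegisterPod([]string{"ns/data"})
	m.UnregisterPod([]string{"ns/data"})
	fmt.Println(m.refCounts["ns/data"], len(m.refCounts))
}
```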
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
> +
> +- PVC doesn't exist in the local cache: this is the first time this volume is requested, so we need to fetch it from `etcd`
> +- PVC exists in the local cache but has expired: this is just a periodic refresh of a PVC we successfully fetched previously, so we can fetch it from `apiserver`'s cache (setting `ResourceVersion` to 0)
> +- PVC exists in the local cache and hasn't expired: just return the cached object
> +
> +### Online File System Resize in Kbuelet
> +
> +#### Volume Resize Request Detect
> +
> +Currently `DesiredStateOfWorldPopulator` loops through all alive pods in `PodManager` and reprocesses each pod periodically. When we process a pod, we can also loop through each PVC mounted by this pod to detect which volumes require file system resizing. A PVC requires file system resizing if:
> +
> +* `PVC.Status.Capacity` is less than `PV.Spec.Capacity`
> +* `PVC.Status.Condition` has a `PersistentVolumeClaimFileSystemResizePending` condition
> +
> +To mark a volume as file system resizing required, we add a `fsResizeRequired` field to `attachedVolume.mountedPod`, , and add a `MarkFSResizeRequired` method to `ActualStateOfWorld`. If `DesiredStateOfWorldPopulator` discovers a volume requires file system resizing, it invokes `MarkFSResizeRequired` to set the corresponding `mountedPod.fsResizeRequired` to true.
typo: ", ,"
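The two detection criteria in the quoted section boil down to a check like this sketch, using plain byte sizes instead of `resource.Quantity` and a boolean instead of the real condition list:

```go
package main

import "fmt"

// pvcStatus is a simplified stand-in for PVC.Status: sizes are plain bytes
// and only the resize-pending condition is kept.
type pvcStatus struct {
	capacityBytes       int64
	fsResizePendingCond bool
}

// volumeRequiresFSResize mirrors the two detection criteria quoted above:
// the PVC's recorded capacity lags behind the PV's, or the
// PersistentVolumeClaimFileSystemResizePending condition is set.
func volumeRequiresFSResize(pvc pvcStatus, pvCapacityBytes int64) bool {
	return pvc.capacityBytes < pvCapacityBytes || pvc.fsResizePendingCond
}

func main() {
	fmt.Println(volumeRequiresFSResize(pvcStatus{capacityBytes: 100}, 200))
	fmt.Println(volumeRequiresFSResize(pvcStatus{capacityBytes: 200}, 200))
}
```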
In contributors/design-proposals/storage/online-grow-volume-size.md:
> + // This volume is a PVC, we should check it whether requires file system resize or not.
+ pvc, err := dswp.pvcManager.GetPersistentVolumeClaim(pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName)
+ if err != nil {
+ glog.Errorf("Failed to check whether pod volume(%s/%s) requires "+
+ "file system resize or not, get PVC(%s/%s) from PVCManager failed: %v",
+ uniquePodName, podVolume.Name, pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName, err)
+ return
+ }
+ if volumeRequiresFSResize(pvc, volumeSpec.PersistentVolume) {
+ dswp.actualStateOfWorld.MarkFSResizeRequired(uniqueVolumeName, uniquePodName)
+ }
+ processedVolumesForFSResize.Insert(string(uniqueVolumeName))
+}
+```
+
+It is important to note that:
Please add a note that since we only support XFS and EXT3/4, we do not need to worry about exotic filesystems that can be attached/mounted to multiple nodes at the same time and require running a resize command on a node, such as gfs2. Who knows what happens if two or more nodes run gfs2_grow at the same time. IMO GFS2 should survive, however let's not play with the devil.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> + !exponentialbackoff.IsExponentialBackoff(err) {
+ // Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+ // Log all other errors.
+ glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+ }
+ if err == nil {
+ glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+ }
+}
+```
+
+As can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs the volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` method to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeds. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of the corresponding `mountedPod` to ensure no further resize operation will be performed.
+
+ The main work of `GenerateVolumeFSResizeFunc` is simply to invoke the `resizeFileSystem` method and then `MarkVolumeAsResized`, and looks like this:
@jsafrane commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +## Use Cases
> +
> +* As a user I am running MySQL on a 100GB volume, but I am running out of space. I should be able to increase the size of the volume MySQL is using without losing all my data. (online and with data)
> +* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or the mount point. (online, with data, and without taking the pod offline)
> +
> +## Prerequisite
> +
> +- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If the user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
> +
> +- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`; if the user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
> +
> +## Implementation Design
> +
> +### Design Overview
> +
> +The key point of online file system resizing is how kubelet discovers which PVCs need file system resizing. We achieve this goal by adding a PVC cache to kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and removes the corresponding PVCs when a pod is deleted. The PVC information is cached with a TTL to reduce the load on both `etcd` and `apiserver`.
@liggitt, heads up: this proposal adds PVC cache into Kubelet, just as it has Secrets and ConfigMap caches. It will refresh (GET) all PVCs used by running or scheduled pods every now and then.
Should we worry about performance / API server load?
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +### Online File System Resize in Kbuelet
Nice catch, fixed.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +`RegisterPod` is invoked when a pod is added to or updated in `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`.
Thanks for the reminder, I've already adjusted the description here to reduce the confusion.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires a file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
+
+- PVC doesn't exist in the local cache: this is the first time this volume has been requested, so we need to fetch it from `etcd`
+- PVC exists in the local cache but has expired: this is a periodic refresh of a PVC we successfully fetched previously, so we can fetch it from the `apiserver`'s cache (setting `ResourceVersion` to 0)
+- PVC exists in the local cache and hasn't expired: just return the cached object
+
+### Online File System Resize in Kubelet
+
+#### Volume Resize Request Detection
+
+Currently `DesiredStateOfWorldPopulator` loops through all live pods in `PodManager` and reprocesses each pod periodically. When we process a pod, we can also loop through each PVC mounted by the pod to detect which volumes require file system resizing. A PVC requires file system resizing if:
+
+* `PVC.Status.Capacity` is less than `PV.Spec.Capacity`
+* `PVC.Status.Conditions` has a `PersistentVolumeClaimFileSystemResizePending` condition
+
+To mark a volume as requiring file system resizing, we add a `fsResizeRequired` field to `attachedVolume.mountedPod`, and add a `MarkFSResizeRequired` method to `ActualStateOfWorld`. If `DesiredStateOfWorldPopulator` discovers that a volume requires file system resizing, it invokes `MarkFSResizeRequired` to set the corresponding `mountedPod.fsResizeRequired` to true.
Nice catch, fixed.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> + // This volume is a PVC, we should check it whether requires file system resize or not.
+ pvc, err := dswp.pvcManager.GetPersistentVolumeClaim(pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName)
+ if err != nil {
+ glog.Errorf("Failed to check whether pod volume(%s/%s) requires "+
+ "file system resize or not, get PVC(%s/%s) from PVCManager failed: %v",
+ uniquePodName, podVolume.Name, pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName, err)
+ return
+ }
+ if volumeRequiresFSResize(pvc, volumeSpec.PersistentVolume) {
+ dswp.actualStateOfWorld.MarkFSResizeRequired(uniqueVolumeName, uniquePodName)
+ }
+ processedVolumesForFSResize.Insert(string(uniqueVolumeName))
+}
+```
+
+It is important to note that:
Thanks for reminding, a note is added for this.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> + !exponentialbackoff.IsExponentialBackoff(err) {
+ // Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+ // Log all other errors.
+ glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+ }
+ if err == nil {
+ glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+ }
+}
+```
+
+As can be seen from the code above, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs the volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` method to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeds. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of the corresponding `mountedPod` to ensure that no further resize operations will be performed.
+
+The main work of `GenerateVolumeFSResizeFunc` is simply to invoke the `resizeFileSystem` method and then `MarkVolumeAsResized`; it looks like this:
Thanks for the reminder; I've added two notes describing the PVC size/condition management and the error handling.
Windows NTFS supports both online expand & shrink
@yujuhong commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +## Use Cases
+
+* As a user I am running MySQL on a 100GB volume, but I am running out of space; I should be able to increase the size of the volume MySQL is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or the mount point. (online, with data, and without taking the pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`; if a user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet discovers which PVCs need file system resizing. We achieve this goal by adding a PVC cache to kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and removes the corresponding PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.
Is the TTL going to be set according to the cluster size, similar to that for Secrets and ConfigMaps? I think in general we do have concerns about the scalability of these requests. The hope (AFAIK) is to switch to "watch" if the security concerns can be addressed.
/cc @wojtek-t for the scalability issues.
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +## Use Cases
+
+* As a user I am running MySQL on a 100GB volume, but I am running out of space; I should be able to increase the size of the volume MySQL is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or the mount point. (online, with data, and without taking the pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`; if a user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet discovers which PVCs need file system resizing. We achieve this goal by adding a PVC cache to kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and removes the corresponding PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.
Isn't there a cost associated with watches as well? I have seen some production failures where the number of watches pushed etcd over the edge. If watch is the way to go, it has to apply selectively and should not be cluster-wide, right?
@gnufied commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +## Use Cases
+
+* As a user I am running MySQL on a 100GB volume, but I am running out of space; I should be able to increase the size of the volume MySQL is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or the mount point. (online, with data, and without taking the pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`; if a user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet discovers which PVCs need file system resizing. We achieve this goal by adding a PVC cache to kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and removes the corresponding PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.
One of the other things we can do is - only poll/cache for PVCs which are "resizable". What I mean is - not all PVCs can be resized (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md) and only PVCs which were dynamically provisioned from storageClass that has allowVolumeExpansion set can be expanded.
Steps could be:
kubernetes.io/resizable: true)

One downside is that if a kube admin edits the SC, the PVC still cannot be online-resized until the user deletes and recreates the pod. I think that is an acceptable compromise that applies to other cases where editing the SC does not result in a state change of an in-use PVC.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +## Use Cases + +* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data) +* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline) + +## Prerequisite + +- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`,`ext4` is introduced from [Linux kernel-3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` always supported growing mounted partitions (in-fact currently there is no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If user tries to expand a volume with other formats an error event will be reported for according pod. + +- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`, if user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first. + +## Implementation Design + +### Design Overview + +The key point of online file system resizing is how kubelet to discover which PVCs need file system resizing. We achieve this goal by adding a PVC cache for kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and remove according PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.
@yujuhong Yeah, the TTL is similar to that for Secrets and ConfigMaps.
Hi guys @jsafrane, @yujuhong, we changed this proposal again, as we found that kubelet's `VolumeManager` is already fetching the PV/PVC during every pod resync loop, see here and here. So we decided to reuse this code, using the fetched PV/PVC objects to check whether a volume requires online resizing, instead of adding a PVC cache.
Actually, the current PV/PVC reprocessing mechanism has a hidden performance problem, so we opened a separate issue to track it: #63274
cc @gnufied
In light of @mlmhl's finding that we are already fetching the PVC/PV from the api-server on every pod sync, I agree that we do not need a PVC cache similar to the one for configmaps etc. This proposal can definitely work without introducing new API calls.
I propose we tackle the bug/issue with the PVC/PV being fetched all the time in the separate GitHub issue that has been opened, kubernetes/kubernetes#63274. cc @saad-ali
@jsafrane commented on this pull request.
lgtm-ish; however, you should note somewhere what happens when a volume is being resized and, at the same time, a pod is deleted and the same volume is being unmounted. IMO the unmount must wait (should not be started) until the resize finishes. The Reconciler(?) should check that.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +- We only perform online resizing for PVCs used in `Filesystem` mode, because a `Block` mode volume will not be formatted with a file system.
+
+- If a PVC is mounted as read-only by a pod, we won't invoke `MarkFSResizeRequired` for that pod, to preserve the semantics of `ReadOnly`.
+
+- Although we add `fsResizeRequired` to `mountedPod`, file system resizing is actually a global operation on the volume; if more than one pod mounts the same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we process. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
+
+- Since we support only `xfs` and `ext3/4`, we needn't worry about exotic file systems that can be attached/mounted to multiple nodes at the same time and require a resizing operation on a node, such as [gfs2](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/s1-manage-growfs).
+
+#### File System Resizing Operation
+
+Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` trigger the file system resize operation too.
+
+In each loop, for each volume in the DSW, `Reconciler` checks whether the volume exists in the ASW. If the volume exists but is marked as requiring file system resizing, the ASW returns a `fsResizeRequiredError`, and `Reconciler` then triggers a file system resizing operation for this volume. The code snippet looks like this:
+
+```go
+if cache.IsFSResizeRequiredError(err) {
I don't like passing random return values via errors; they should be reserved for real errors/exceptions. But we can solve that in the implementation phase, not in this proposal.
@jsafrane yeah, I think there is a note in the original resize proposal that if a PVC has any of the resize-in-progress conditions, then we will wait for them to clear before unmounting the volume.
@mlmhl commented on this pull request.
In contributors/design-proposals/storage/online-grow-volume-size.md:
> +- We only perform online resizing for PVCs used in `Filesystem` mode, because a `Block` mode volume will not be formatted with a file system.
+
+- If a PVC is mounted as read-only by a pod, we won't invoke `MarkFSResizeRequired` for that pod, to preserve the semantics of `ReadOnly`.
+
+- Although we add `fsResizeRequired` to `mountedPod`, file system resizing is actually a global operation on the volume; if more than one pod mounts the same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we process. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
+
+- Since we support only `xfs` and `ext3/4`, we needn't worry about exotic file systems that can be attached/mounted to multiple nodes at the same time and require a resizing operation on a node, such as [gfs2](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/s1-manage-growfs).
+
+#### File System Resizing Operation
+
+Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` trigger the file system resize operation too.
+
+In each loop, for each volume in the DSW, `Reconciler` checks whether the volume exists in the ASW. If the volume exists but is marked as requiring file system resizing, the ASW returns a `fsResizeRequiredError`, and `Reconciler` then triggers a file system resizing operation for this volume. The code snippet looks like this:
+
+```go
+if cache.IsFSResizeRequiredError(err) {
Currently we use a `remountRequiredError` to indicate that a volume needs a remount, so I added a `resizeRequiredError` to keep consistency. As you said, we can review this again in the implementation phase.
@angapov approved this pull request.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
/remove-lifecycle stale
still relevant right?
This has been replaced by https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190125-online-growing-persistent-volume-size.md
/close
Closed #1535.
@msau42: Closed this PR.