Re: [kubernetes/community] Proposal for online resizing of persistent volumes (#1535)


k8s-ci-robot

Dec 20, 2017, 7:17:34 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl: Reiterating the mentions to trigger a notification:
@kubernetes/sig-storage-pr-reviews

In response to this:

@kubernetes/sig-storage-pr-reviews @gnufied

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.



mlmhl

Dec 20, 2017, 7:20:59 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

/assign @gnufied

Hemant Kumar

Dec 20, 2017, 12:05:09 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+The `volumeFSResizingAnnotation` annotation's key is `volumes.kubernetes.io/fs-resizing-{volumeName}`, and set it's value to `yes` if online file system resizing if required. We can use `volume.Spec.Name()` as the `volumeName` to guarantee that:
+
+* Annotation's key is a qualified name
+* `volumeName` is unique for a single pod
+
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet. 
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.

typo in word reflector? I am still unclear though - how will this cache become available to ExpandController, since it will be an internal property of AttachDetachController?

Hemant Kumar

Dec 20, 2017, 12:05:46 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet. 
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
+
+The `GetPodsInVolume` method will look like this:
+
+```go
+func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName v1.UniqueVolumeName) []*v1.Pod {

We should really use ActualStateOfWorld rather than dsow here.

mlmhl

Dec 21, 2017, 4:04:06 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+The `volumeFSResizingAnnotation` annotation's key is `volumes.kubernetes.io/fs-resizing-{volumeName}`, and set it's value to `yes` if online file system resizing if required. We can use `volume.Spec.Name()` as the `volumeName` to guarantee that:
+
+* Annotation's key is a qualified name
+* `volumeName` is unique for a single pod
+
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet. 
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.

I've updated the proposal to add a more detailed description of this reflector operation.
A preliminary implementation can be found in these commits: reflector
remove dependence

mlmhl

Dec 21, 2017, 7:32:17 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet. 
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
+
+The `GetPodsInVolume` method will look like this:
+
+```go
+func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName v1.UniqueVolumeName) []*v1.Pod {

Yeah, you are right. I think we should get the pod-volume relationship (which pod uses which volume) from DesiredStateOfWorld, as ActualStateOfWorld doesn't cache this relationship, and get the volume-node relationship (which volume is mounted to which node) from ActualStateOfWorld. The current proposal misses the latter; I will update the proposal later.
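As an illustration, the combined lookup could be sketched like this. These types are simplified stand-ins invented for the example, not the actual attachdetach cache interfaces:

```go
package main

import "fmt"

// Simplified stand-ins for the two caches; the real DesiredStateOfWorld
// and ActualStateOfWorld interfaces are richer than this.
type podEntry struct{ pod, node string }

type desiredState struct {
	// volume name -> pods (with the node each pod runs on) using it
	podsByVolume map[string][]podEntry
}

type actualState struct {
	// volume name -> set of nodes the volume is mounted to
	mounted map[string]map[string]bool
}

type resizeTarget struct{ pod, volume, node string }

// targetsForVolume joins the pod-volume relationship (from the DSW) with
// the volume-node relationship (from the ASW): a (pod, volume, node)
// tuple qualifies only when the pod's node actually has the volume mounted.
func targetsForVolume(dsw desiredState, asw actualState, volume string) []resizeTarget {
	var targets []resizeTarget
	for _, entry := range dsw.podsByVolume[volume] {
		if asw.mounted[volume][entry.node] {
			targets = append(targets, resizeTarget{entry.pod, volume, entry.node})
		}
	}
	return targets
}

func main() {
	dsw := desiredState{podsByVolume: map[string][]podEntry{
		"pvc-1": {{"pod-a", "node-1"}, {"pod-b", "node-2"}},
	}}
	asw := actualState{mounted: map[string]map[string]bool{
		"pvc-1": {"node-1": true}, // not yet mounted on node-2
	}}
	fmt.Println(targetsForVolume(dsw, asw, "pvc-1")) // only pod-a qualifies
}
```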

mlmhl

Dec 22, 2017, 1:41:58 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet. 
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.
+
+### Controller Manager Resize
+
+#### Fetch All Pods Using a PVC
+
+We need to know which pods are using a volume with specified name, but `ExpandController` doesn't cache this relationship. To achieve this goal, We can share the `DesiredStateOfWorld` with `AttachDetachController`. In detail, reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and add a `GetPodsInVolume` method to `DesiredStateOfWorld`. Expand controller use this method to fetch pods using the volume with specified name.
+
+The `GetPodsInVolume` method will look like this:
+
+```go
+func (dsw *desiredStateOfWorld) GetPodsInVolume(volumeName v1.UniqueVolumeName) []*v1.Pod {

I added a detailed explanation for this subject to the proposal.

Hemant Kumar

Jan 2, 2018, 5:44:49 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+## Implementation Design
+
+### Overview
+
+The key point of online file system resizing is how kubelet to discover which  PVCs need file system resizing. We achieve this goal by adding a `volumeFSResizingAnnotation` annotation to pod. In detail, if a volume received an online file system resizing request, the annotation will be added to the pods using this volume, so kubelet can watch for pods running on it to discover this annotation, and then start the online file system resizing operation for according volumes.
+
+The `volumeFSResizingAnnotation` annotation's key is `volumes.kubernetes.io/fs-resizing-{volumeName}`, and set it's value to `yes` if online file system resizing if required. We can use `volume.Spec.Name()` as the `volumeName` to guarantee that:
+
+* Annotation's key is a qualified name
+* `volumeName` is unique for a single pod
+
+### Prerequisite
+
+* `sig-api-machinery` need to allow pod's annotation update from kubelet. 
+* `sig-api-machinery` need to allow pod's get and annotation update from expand-controller.

cc @liggitt

Hemant Kumar

Jan 2, 2018, 5:47:24 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. 
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+

Is there any example of this design being used previously? I am just curious.

Hemant Kumar

Jan 2, 2018, 5:47:51 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+#### Prerequisite of Controller Manager Resize
+
+* We use a three tuple `(pod,volume,node)` to represent a relationship: `pod` is using `volume`, `pod` is running on `node` and `volume` has been mounted to `node`. Add the `volumeFSResizingAnnotation` to `pod` only if this relationship was established.
+
+* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. 

Do you mean reflector or you mean refactor ?

Hemant Kumar

Jan 2, 2018, 5:49:54 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state.
+
+Besides, we add a `VolumeFsResizeMapPopulator` interface as the producer of `VolumeFsResizeMap`, it runs an asynchronous periodic loop to populate the volumes need file system resize. In each loop, `VolumeFsResizeMapPopulator` traverses each pod's annotation, and generate a `VolumeToResize` object for each `volumeFSResizingAnnotation`.

Will this be a new package or part of volume-manager's interface?

Jordan Liggitt

Jan 2, 2018, 10:04:15 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@liggitt commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. 
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+

Then we can create a common DesiredStateOfWorld and ActualStateOfWorld, implemented as fields of the ControllerContext object

no... controller context is not intended to be a shared mutable struct

mlmhl

Jan 3, 2018, 5:20:58 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+#### Prerequisite of Controller Manager Resize
+
+* We use a three tuple `(pod,volume,node)` to represent a relationship: `pod` is using `volume`, `pod` is running on `node` and `volume` has been mounted to `node`. Add the `volumeFSResizingAnnotation` to `pod` only if this relationship was established.
+
+* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. 

Sorry, that was a typo; I've fixed it.

mlmhl

Jan 3, 2018, 5:41:48 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state.
+
+Besides, we add a `VolumeFsResizeMapPopulator` interface as the producer of `VolumeFsResizeMap`, it runs an asynchronous periodic loop to populate the volumes need file system resize. In each loop, `VolumeFsResizeMapPopulator` traverses each pod's annotation, and generate a `VolumeToResize` object for each `volumeFSResizingAnnotation`.

I'm not sure I understand what you mean. We can put the `VolumeFsResizeMap` interface under the `pkg/kubelet/volumemanager/cache` package, just like the ASW and DSW, and put the `VolumeFsResizeMapPopulator` interface under `pkg/kubelet/volumemanager/populator`, just like the `DesiredStateOfWorldPopulator`. Then `pkg/kubelet/volumemanager/reconciler.Reconciler` can work as a consumer of `VolumeFsResizeMap`. These two interfaces will look like this:

```go
type VolumeFsResizeMap interface {
    AddVolumeForPod(volumeName string, pod *v1.Pod)
    GetVolumesToResize() []VolumeToResize
    MarkAsFileSystemResized(pod *v1.Pod, volumeName string) error
}

type VolumeFsResizeMapPopulator interface {
    Run(stopCh <-chan struct{})
}
```
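A hedged sketch of what one pass of the populator loop might do with those interfaces, using illustrative stand-in types rather than the real kubelet ones (`*v1.Pod` etc.):

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative stand-in; the real kubelet works with *v1.Pod.
type pod struct {
	name        string
	annotations map[string]string
}

// Annotation key format from the proposal:
// volumes.kubernetes.io/fs-resizing-{volumeName} = "yes"
const resizingPrefix = "volumes.kubernetes.io/fs-resizing-"

type volumeToResize struct{ podName, volumeName string }

// populateOnce is one pass of the VolumeFsResizeMapPopulator loop: scan
// each active pod's annotations and produce a VolumeToResize per match.
func populateOnce(pods []pod) []volumeToResize {
	var out []volumeToResize
	for _, p := range pods {
		for key, value := range p.annotations {
			if strings.HasPrefix(key, resizingPrefix) && value == "yes" {
				out = append(out, volumeToResize{
					podName:    p.name,
					volumeName: strings.TrimPrefix(key, resizingPrefix),
				})
			}
		}
	}
	return out
}

func main() {
	pods := []pod{{
		name:        "web-0",
		annotations: map[string]string{resizingPrefix + "data": "yes"},
	}}
	fmt.Println(populateOnce(pods))
}
```

The real populator would feed these results into `VolumeFsResizeMap.AddVolumeForPod` from its `Run(stopCh)` loop.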

mlmhl

Jan 3, 2018, 7:26:18 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. 
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+

Yeah, this refactoring is really a bit tricky. Do we have any better ideas for sharing the ASW/DSW?

Hemant Kumar

Jan 24, 2018, 10:54:09 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +* For a specific `(volume,node)`, if more than one `pod` satisfies the relationship above, only add the annotation to one pod inside them.
+
+#### Fetch All Pods Using a PVC
+
+For a volume with online resizing request, we need to find which pods and nodes satisfy the relationship defines above, but `ExpandController` doesn't cache this relationship, so we need to fetch this information from `DesiredStateOfWorld` and `ActualStateOfWorld` of `AttachDetachController`. `DesiredStateOfWorld`caches the used volumes for each pods, `ActualStateOfWorld` caches the mounted volumes for each nodes.
+
+**Note**: `ActualStateOfWorld` only caches the mount relationship between volumes and nodes, not the volumes and pods, because only kubelet knows which pods actually mount which volumes.
+
+ To achieve this goal, we need to:
+
+* reflector the `pkg/controller/volume/attachdetach/cache` package to `pkg/controller/volume/cache`, and remove the parameter `volumePluginMgr` of `NewDesiredStateOfWorld` and `NewActualStateOfWorld` function, so that we can create a new `DesiredStateOfWorld` and `ActualStateOfWorld` without any dependence. 
+
+* This modification is possible because `DesiredStateOfWorld` and `ActualStateOfWorld` only use `volumePluginMgr` to fetch volume name, inside the `DesiredStateOfWorld.AddPod` and `ActualStateOfWorld.AddVolumeNode` method, we can get the volume name outside and transform to them as a method parameter.
+
+* Then we can create a common `DesiredStateOfWorld` and `ActualStateOfWorld`, implemented as fields of the `ControllerContext` object, instead of creating them inside `NewAttachDetachController`. After this, `AttachDetachController` and `ExpandController` can both access it through `ControllerContext`.
+

Let's list out all the ideas we discussed in the GitHub issue, so we know all the available options. The other ways of implementing this are:

  1. We discussed that the volumemanager on the kubelet side could watch for PVCs and perform FS resize when certain conditions are met. The downside of this approach is that in a large enough cluster (think 5000 nodes), it may not be a good idea to create such a large number of watches. Having said that, the volumemanager need not watch all PVCs - it can watch only the PVCs that are in its actual_state_of_world. I am not sure about the performance penalty of watching all PVCs vs. watching only a small subset of PVCs.
  2. Another option is that ExpandController can build its own object graph of the pvc/pod/node mapping. This will enable us to determine the pod/node where a PVC is being used without sharing a cache with the Attach/Detach controller.
  3. We have kicked around the idea of adding the node name to the PVC after attach is done for a while now. This will require an API change, but it will enable us to determine the pvc->node relationship rather quickly.
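For option 2, a minimal sketch of a controller-local index (all names hypothetical; real code would wire these methods into pod informer event handlers):

```go
package main

import "fmt"

type podRef struct{ pod, node string }

// pvcIndex is the ExpandController's own pvc -> (pod, node) graph,
// maintained from pod add/delete events instead of a shared cache.
type pvcIndex struct {
	byPVC map[string][]podRef
}

func newPVCIndex() *pvcIndex { return &pvcIndex{byPVC: map[string][]podRef{}} }

// onPodAdd would be wired to the pod informer's AddFunc.
func (i *pvcIndex) onPodAdd(pod, node string, pvcs []string) {
	for _, pvc := range pvcs {
		i.byPVC[pvc] = append(i.byPVC[pvc], podRef{pod, node})
	}
}

// onPodDelete would be wired to the informer's DeleteFunc.
func (i *pvcIndex) onPodDelete(pod string, pvcs []string) {
	for _, pvc := range pvcs {
		kept := i.byPVC[pvc][:0] // filter in place
		for _, r := range i.byPVC[pvc] {
			if r.pod != pod {
				kept = append(kept, r)
			}
		}
		i.byPVC[pvc] = kept
	}
}

// podsUsing answers "which pods, on which nodes, use this PVC?".
func (i *pvcIndex) podsUsing(pvc string) []podRef { return i.byPVC[pvc] }

func main() {
	idx := newPVCIndex()
	idx.onPodAdd("pod-a", "node-1", []string{"pvc-1"})
	fmt.Println(idx.podsUsing("pvc-1"))
}
```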

Jan Šafránek

Jan 25, 2018, 8:50:55 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

I'm probably missing something very obvious, but why do we need new annotations? Right now, kubelet already knows that a PV needs a filesystem resize, because it does offline resize. And it can also update the PVC and "finish" the resize so the user knows it's done. So why a new annotation and a new controller?

Jan Šafránek

Jan 25, 2018, 8:53:46 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@jsafrane commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+    return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:

Kubelet already loops through all mounts in VolumeManager. It can see that a volume needs fs resize there. And based on whether the mount exists in its ASW, it can see if it's going to perform online or offline resize.

In addition, fs resize must be coordinated with VolumeManager anyway, since VolumeManager should not touch the volume (e.g. unmount it) during resize.

mlmhl

Jan 25, 2018, 9:30:18 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@jsafrane

I probably miss something very obvious, but why do we need new annotations?

Kubelet only watches pods and only fetches the PVC object from the apiserver when a pod is starting; it won't fetch the latest PVC object again after the pod has started. So if we resize the volume after the pod has started, kubelet can't notice the request.

We add an annotation to the pod to inform kubelet that some volumes need resizing, but as @gnufied mentioned, this is just one of several available approaches to inform kubelet.

mlmhl

Jan 25, 2018, 10:04:38 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+    return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:

Kubelet already loops through all mounts in VolumeManager. It can see that a volume needs fs resize there.

The problem is that the volume objects cached in the ASW are only updated when a pod is starting, and won't be updated any more after the pod has started. That means if we resize the volume after the pod has already started, VolumeManager can't notice this change from the ASW.

In addition, fs resize must be coordinated with VolumeManager anyway, since VolumeManager should not touch the volume (e.g. unmount it) during resize.

We append the resize operation to the OperationExecutor so that VolumeManager can't touch the volume before the resize operation finishes.

mlmhl

Jan 25, 2018, 10:06:54 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+    return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:

The original discussion is in this issue: kubernetes/kubernetes#57185

Hemant Kumar

Jan 26, 2018, 10:17:28 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+    return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* Pod running on the node has one more `volumeFSResizingAnnotation` annotations.
+* Volume with a online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet need to periodically loop through the list of active pods to detect which volume requires file system resizing. We use a `VolumeToResize` object to represent a volume resizing requests and introduce a new interface `VolumeFsResizeMap` rather than `DesiredStateOfWorld` to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:

I think what we can do is: when we iterate through pods in volumemanager's desired_state_of_world for mount and call the PodExistsInVolume function, that function can make an additional API call to check if the underlying volume needs resizing. Currently PodExistsInVolume already raises a remountRequired error when some change requires us to remount the volume. So we can introduce a new resizeRequired error, which will be raised by the function and in turn trigger a file system resize on the kubelet.

This will require no API change or annotation, and everything will work online as expected. Thanks @jsafrane for pointing it out.
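A hedged sketch of the check being proposed; the real PodExistsInVolume signature differs, and the capacities are plain integers here for brevity:

```go
package main

import (
	"errors"
	"fmt"
)

// errResizeRequired plays the role of the proposed resizeRequired error,
// analogous to the existing remountRequired.
var errResizeRequired = errors.New("filesystem resize required")

// podExistsInVolume reports whether the pod's mount exists, and raises
// errResizeRequired when the PV has been grown past the PVC's recorded
// status capacity, which would trigger an online fs resize in kubelet.
func podExistsInVolume(mounted bool, pvCapacityGi, pvcStatusGi int64) (bool, error) {
	if !mounted {
		return false, nil
	}
	if pvCapacityGi > pvcStatusGi {
		return true, errResizeRequired
	}
	return true, nil
}

func main() {
	exists, err := podExistsInVolume(true, 20, 10)
	fmt.Println(exists, err)
}
```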

Hemant Kumar

Jan 26, 2018, 11:42:12 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+    return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* A pod running on the node has one or more `volumeFSResizingAnnotation` annotations.
+* The volume with an online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet needs to periodically loop through the list of active pods to detect which volumes require file system resizing. We use a `VolumeToResize` object to represent a volume resizing request and introduce a new interface `VolumeFsResizeMap`, rather than `DesiredStateOfWorld`, to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:

One potential problem with the approach above is that the reconciler loop turns every 100ms, so if a node has, say, 20 pods with volumes, we would be making 20 GET requests every 100ms, because there is no informer inside volume_manager. Again, I am not sure how badly that would affect us.

mlmhl

Jan 27, 2018, 4:21:41 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +
+    return nil
+}
+```
+
+### File System Resize on Kubelet
+
+#### Prerequisite of Online File System Resize
+
+* A pod running on the node has one or more `volumeFSResizingAnnotation` annotations.
+* The volume with an online resizing request has been mounted to the node.
+* `PV.Spec.Capacity` is greater than `PVC.Status.Capacity`.
+
+#### Volume Resize Request Detect
+
+Kubelet needs to periodically loop through the list of active pods to detect which volumes require file system resizing. We use a `VolumeToResize` object to represent a volume resizing request and introduce a new interface `VolumeFsResizeMap`, rather than `DesiredStateOfWorld`, to cache such `VolumeToResize` objects, since we treat it as a request queue, not a state. This interface will look like this:

Maybe we can add a lastUpdateTimestamp for each volume and use a loop interval independent of the reconciler's. This would reduce the request count, but I am not sure how much it would help.

Christoph Blecker

Jan 29, 2018, 12:29:54 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

/ok-to-test

Jan Šafránek

Feb 2, 2018, 8:49:30 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

We need help from @kubernetes/sig-node-proposals.

kubelet needs to know when a volume has been resized so it can call /sbin/resize2fs (or a similar utility) to resize the filesystem on the volume. Kubelet does not have any informer on PVs/PVCs, because that would allow a compromised node to read all PVs/PVCs in the system.

This proposal adds a new controller that detects this on behalf of kubelet and puts annotations on pods. Kubelet has an informer on pods, so it can get an event, fetch the affected PV from the API server, and resize it.

It would be much better if kubelet had some sort of informer on PVs or PVCs so it could get an event when a PV (or PVC) is resized, without a special controller poking pods just to let kubelet know.

Hemant Kumar

Apr 5, 2018, 5:10:31 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

To use an approach similar to the remountRequired feature, what we can do is:

  1. On syncPod we can request reprocessing of the pod. This does not require any new change; all pods are reprocessed periodically.
  2. We need to implement a caching PVC manager along the lines of https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/configmap/configmap_manager.go#L98, so PVC information is cached there with a TTL.
  3. When reprocessing the pod in desired_state_of_world_populator, it can check for a PVC change and add a flag to the attachedVolume.mountedPods struct to indicate that a resize is requested.
  4. Whenever the reconciler loop turns and it detects that a volume requires a resize, it will do the resize online.

mlmhl

Apr 6, 2018, 11:51:19 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

We need to implement a caching PVC manager along the lines of https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/configmap/configmap_manager.go#L98, so PVC information is cached there with a TTL.

This looks like a good solution. I will test it first and then update this proposal accordingly if it works.

k8s-ci-robot

Apr 9, 2018, 7:48:28 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: saad-ali

Assign the PR to them by writing /assign @saad-ali in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mlmhl

Apr 9, 2018, 8:09:36 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied
This proposal has been updated to conform to our new solution, PTAL.

Hemant Kumar

Apr 9, 2018, 11:56:05 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.

Mostly looks good. Left some minor comments.

cc @derekwaynecarr @sjenning @Random-Liu for kubelet review.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	if err != nil &&
+		!nestedpendingoperations.IsAlreadyExists(err) &&
+		!exponentialbackoff.IsExponentialBackoff(err) {
+		// Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+		// Log all other errors.
+		glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+	}
+	if err == nil {
+		glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+	}
+}
+```
+
+As can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs the volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` method to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeds. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of the corresponding `mountedPod` to guarantee that no further resize operation will be performed.

What is ActualStateOfWorldMounterUpdater?


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	}
+	processedVolumesForFSResize.Insert(string(uniqueVolumeName))
+}
+```
+
+It is important to note that:
+
+- We only perform online resizing for mounted volumes, which means these volumes must exist in `ActualStateOfWorld`, so we fetch all mounted volumes from ASW in advance (stored in the input parameter `mountedVolumesForPod`) and skip volumes which do not exist in `mountedVolumesForPod`
+
+- Although we add `fsResizeRequired` to `mountedPod`, file system resizing is actually a global operation on the volume; if more than one pod mounts the same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we process. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
+
+#### File System Resizing Operation
+
+Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` trigger the file system resize operation too.
+
+In each loop, for each volume in DSW, `Reconciler` checks whether the volume exists in ASW. If the volume exists but is marked as requiring a file system resize, ASW returns a `fsResizeRequiredError` error, and `Reconciler` then triggers a file system resizing operation for this volume. The code snippet looks like this:

So I assume this code will run in the place where we check for mounted volumes, right?


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +		asw.attachedVolumes[volumeName].mountedPods[podName] = podObj
+	}
+}
+```
+
+And `DesiredStateOfWorldPopulator` uses a `checkVolumeFSResize` method to perform the operation described as above, which invokes inside `processPodVolumes` method:
+
+```go
+// checkVolumeFSResize checks whether a PVC mounted by the pod requires file
+// system resize or not. If so, marks this volume as fsResizeRequired in ASW.
+// - mountedVolumesForPod stores all mounted volumes in ASW, because online
+//   volume resize only considers mounted volumes.
+// - processedVolumesForFSResize stores all volumes we have checked in the current loop,
+//   because the file system resize operation is a global operation on the volume, so
+//   we only need to check it once even if more than one pod uses it.
+func (dswp *desiredStateOfWorldPopulator) checkVolumeFSResize(

It might be worth improving this check to consider only RW volumes within pods. I know that we will be mostly dealing with the global mount path, but it is best to skip the entire resize operation for the pod if the volume is mounted as read-only.

Hemant Kumar

Apr 9, 2018, 12:00:16 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	- [Online File System Resize in Kbuelet](#online-file-system-resize-in-kbuelet)
+		- [Volume Resize Request Detect](#volume-resize-request-detect)
+		- [File System Resizing Operation](#file-system-resizing-operation)
+
+## Overview
+
+Release 1.10 only supports offline file system resizing for PVCs, as this operation is only executed inside the `MountVolume` operation in kubelet. If a resizing request is submitted after the volume is mounted, it won't be performed. This proposal intends to support online file system resizing for PVCs in kubelet.
+
+## Goals
+
+Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
+
+## Non Goals
+
+* Offline file system resizing not included. If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
+

Lets reword "If we found a volume need file system resizing but not mounted to the node yet, we will do nothing." to -> "If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. The file system on volume will be resized according to existing

mlmhl

Apr 9, 2018, 10:59:27 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	if err != nil &&
+		!nestedpendingoperations.IsAlreadyExists(err) &&
+		!exponentialbackoff.IsExponentialBackoff(err) {
+		// Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+		// Log all other errors.
+		glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+	}
+	if err == nil {
+		glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+	}
+}
+```
+
+As can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs the volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` method to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeds. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of the corresponding `mountedPod` to guarantee that no further resize operation will be performed.

ActualStateOfWorldMounterUpdater is defined here. OperationGenerator uses it to MarkVolumeAsMounted, MarkVolumeAsUnmounted, etc. VolumeManager's ActualStateOfWorld implements this interface.

mlmhl

Apr 9, 2018, 11:19:51 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	}
+	processedVolumesForFSResize.Insert(string(uniqueVolumeName))
+}
+```
+
+It is important to note that:
+
+- We only perform online resizing for mounted volumes, which means these volumes must exist in `ActualStateOfWorld`, so we fetch all mounted volumes from ASW in advance (stored in the input parameter `mountedVolumesForPod`) and skip volumes which do not exist in `mountedVolumesForPod`
+
+- Although we add `fsResizeRequired` to `mountedPod`, file system resizing is actually a global operation on the volume; if more than one pod mounts the same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we process. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
+
+#### File System Resizing Operation
+
+Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` trigger the file system resize operation too.
+
+In each loop, for each volume in DSW, `Reconciler` checks whether the volume exists in ASW. If the volume exists but is marked as requiring a file system resize, ASW returns a `fsResizeRequiredError` error, and `Reconciler` then triggers a file system resizing operation for this volume. The code snippet looks like this:

Yeah, you are right. Currently the check for mounted volumes has two branches:

  • Volume isn't attached
  • Volume isn't mounted or requires remounting

This proposal adds a third branch: Volume is mounted and requires resizing.

mlmhl

Apr 10, 2018, 8:26:35 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +		asw.attachedVolumes[volumeName].mountedPods[podName] = podObj
+	}
+}
+```
+
+And `DesiredStateOfWorldPopulator` uses a `checkVolumeFSResize` method to perform the operation described as above, which invokes inside `processPodVolumes` method:
+
+```go
+// checkVolumeFSResize checks whether a PVC mounted by the pod requires file
+// system resize or not. If so, marks this volume as fsResizeRequired in ASW.
+// - mountedVolumesForPod stores all mounted volumes in ASW, because online
+//   volume resize only considers mounted volumes.
+// - processedVolumesForFSResize stores all volumes we have checked in the current loop,
+//   because the file system resize operation is a global operation on the volume, so
+//   we only need to check it once even if more than one pod uses it.
+func (dswp *desiredStateOfWorldPopulator) checkVolumeFSResize(

Nice idea, I've already added this check to this proposal, PTAL.

Saad Ali

Apr 10, 2018, 4:16:42 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@saad-ali commented on this pull request.

Can you add a section indicating in what cases online expansion is safe? Traditionally, filesystem resizing requires the volume to be unmounted first. Explain why it is safe to resize without unmounting.

Hemant Kumar

Apr 10, 2018, 9:32:26 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Linux kernel 3.3 introduced online resizing of ext4 volumes (https://www.ibm.com/developerworks/library/l-33linuxkernel/), and xfs has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted xfs file system).

These are the only two file systems we support for resizing currently. Our motivation for doing online resizing is that we have had many users on Slack and elsewhere confused by the current resizing workflow. It is not obvious when they have to delete and recreate the pod. We have added conditions and events for that, but the workflow is still not very user friendly.

mlmhl

Apr 10, 2018, 10:14:11 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

I think we can add a section informing the user that we currently only support resizing for xfs, ext3, and ext4, and that they are all safe for online resizing. BTW, right now offline resizing is also performed after the volume is mounted, as there is no way to expand an unmounted xfs file system.

mlmhl

Apr 10, 2018, 11:33:27 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
+
+## Non Goals
+
+* Offline file system resizing not included. If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
+
+* Extending resize tools: We only support offline resizing for the most common file systems in the current release, and we prefer to keep the same set for the online one.
+
+## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.

@saad-ali @gnufied I've added a section here to illustrate the safety of online resizing, PTAL.

Saad Ali

Apr 12, 2018, 1:29:52 PM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@saad-ali commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
+
+## Non Goals
+
+* Offline file system resizing not included. If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
+
+* Extending resize tools: We only support offline resizing for the most common file systems in the current release, and we prefer to keep the same set for the online one.
+
+## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.

Should this behavior be gated on a check of the kernel version? If kernel < 3.3, then don't do online resizing?

mlmhl

Apr 13, 2018, 8:11:50 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
+
+## Non Goals
+
+* Offline file system resizing not included. If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
+
+* Extending resize tools: We only support offline resizing for the most common file systems in the current release, and we prefer to keep the same set for the online one.
+
+## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.

Yeah, it's better to add this check. Currently offline resizing is also performed after the volume is mounted, as there is no way to expand an unmounted xfs file system, so I guess we can add this check inside resizeFileSystem; then both online and offline operations can be gated. cc @gnufied

Hemant Kumar

Apr 13, 2018, 9:40:45 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +Enable users to increase the size of a PVC which is already in use (mounted). The user will update the PVC to request a new size. Underneath, we expect that the corresponding kubelet will resize the file system for the PVC.
+
+## Non Goals
+
+* Offline file system resizing not included. If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
+
+* Extending resize tools: We only support offline resizing for the most common file systems in the current release, and we prefer to keep the same set for the online one.
+
+## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.

I am not sure about adding this check. Linux kernel 3.3 is really old; Docker's requirement is 3.10+ - https://docs.docker.com/install/linux/docker-ce/binaries/

mlmhl

Apr 14, 2018, 4:51:27 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +## Non Goals
+
+* Offline file system resizing not included. If we found a volume need file system resizing but not mounted to the node yet, we will do nothing. This situation will be dealt with by the existing [offline file system resizing handler](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md).
+
+* Extending resize tools: We only support offline resizing for the most common file systems in the current release, and we prefer to keep the same set for the online one.
+
+## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, and `ext4`. Online resizing of `ext3`/`ext4` was introduced in [Linux kernel 3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` has always supported growing mounted partitions (in fact, there is currently no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If a user tries to expand a volume with another format, an error event will be reported for the corresponding pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`; if a user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.

@gnufied The description of online resizing feature gate is added here.

Jan Šafránek

Apr 16, 2018, 7:10:20 AM
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@jsafrane commented on this pull request.

I have a couple of nits here and there. I did not review the code snippets though.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	// UnregisterPod unregisters PersistentVolumeClaims from a given pod that are not
+	// used by any other registered pod.
+	UnregisterPod(pod *v1.Pod)
+}
+```
+
+`RegisterPod` is invoked when a pod is added to or updated in `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`. `RegisterPod` just increases the reference count of each PVC used by the pod; the corresponding PVC object won't be fetched from the `apiserver` until `GetPersistentVolumeClaim` is invoked for the first time. Similarly, `UnregisterPod` just reduces the reference count and clears the corresponding item if no live pods use this volume.
+
+`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
+
+- If the PVC doesn't exist in the local cache, this is the first request for this volume, and we need to fetch it from `etcd`
+- If the PVC exists in the local cache but has expired, this is just a periodic refresh of a PVC we successfully fetched previously, and we can fetch it from the `apiserver`'s cache (setting `ResourceVersion` to 0)
+- If the PVC exists in the local cache and hasn't expired, we just return the cached object
+
+### Online File System Resize in Kbuelet

Kbuelet -> Kubelet


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	// WARNING: Register/UnregisterPod functions should be efficient,
+	// i.e. should not block on network operations.
+
+	// RegisterPod registers all PersistentVolumeClaims from a given pod.
+	RegisterPod(pod *v1.Pod)
+
+	// UnregisterPod unregisters PersistentVolumeClaims from a given pod that are not
+	// used by any other registered pod.
+	UnregisterPod(pod *v1.Pod)
+}
+```
+
+`RegisterPod` is invoked when a pod is added to or updated in `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`. `RegisterPod` just increases the reference count of each PVC used by the pod; the corresponding PVC object won't be fetched from the `apiserver` until `GetPersistentVolumeClaim` is invoked for the first time. Similarly, `UnregisterPod` just reduces the reference count and clears the corresponding item if no live pods use this volume.
+
+`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:

It's quite confusing to have two interfaces named Manager and VolumeManager. I guess there is nothing we can do about it, but please be extra careful with variable names and so on; it should be clear which manager is used where and how.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
+
+- If the PVC doesn't exist in the local cache, this is the first request for this volume, and we need to fetch it from `etcd`
+- If the PVC exists in the local cache but has expired, this is just a periodic refresh of a PVC we successfully fetched previously, and we can fetch it from the `apiserver`'s cache (setting `ResourceVersion` to 0)
+- If the PVC exists in the local cache and hasn't expired, we just return the cached object
+
+### Online File System Resize in Kbuelet
+
+#### Volume Resize Request Detect
+
+Currently `DesiredStateOfWorldPopulator` loops through all live pods in `PodManager` and reprocesses each pod periodically. When we process a pod, we can also loop through each PVC mounted by the pod to detect which volumes require file system resizing. A PVC requires file system resizing if:
+
+* `PVC.Status.Capacity` is less than  `PV.Spec.Capacity`
+* `PVC.Status.Condition` has a `PersistentVolumeClaimFileSystemResizePending` condition
+
+To mark a volume as file system resizing required, we add a `fsResizeRequired` field to `attachedVolume.mountedPod`, , and add a `MarkFSResizeRequired` method to `ActualStateOfWorld`. If `DesiredStateOfWorldPopulator` discovers a volume requires file system resizing, it invokes `MarkFSResizeRequired` to set according `mountedPod.fsResizeRequired` to true.

typo: ", ,"


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	// This volume is a PVC, we should check it whether requires file system resize or not.
+	pvc, err := dswp.pvcManager.GetPersistentVolumeClaim(pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName)
+	if err != nil {
+		glog.Errorf("Failed to check whether pod volume(%s/%s) requires "+
+			"file system resize or not, get PVC(%s/%s) from PVCManager failed: %v",
+			uniquePodName, podVolume.Name, pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName, err)
+		return
+	}
+	if volumeRequiresFSResize(pvc, volumeSpec.PersistentVolume) {
+		dswp.actualStateOfWorld.MarkFSResizeRequired(uniqueVolumeName, uniquePodName)
+	}
+	processedVolumesForFSResize.Insert(string(uniqueVolumeName))
+}
+```
+
+It is important to note that:

Please add a note that since we support only XFS and EXT3/4, we do not need to worry about exotic filesystems that can be attached/mounted to multiple nodes at the same time and require running a resize command on a node, such as gfs2. Who knows what happens if two or more nodes run gfs2_grow at the same time. IMO GFS2 should survive, but let's not play with the devil.
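The safety check Jan asks for can be as simple as a whitelist. A minimal sketch (the function name is hypothetical, not from the proposal):

```go
package main

import "fmt"

// supportsOnlineResize reports whether online expansion is known to be safe
// for the given filesystem. Only single-node filesystems whose kernels
// support mounted growth (ext3/ext4 since kernel 3.3, xfs always) are
// allowed; cluster filesystems such as gfs2 are deliberately excluded so
// grow commands are never raced from multiple nodes.
func supportsOnlineResize(fsType string) bool {
	switch fsType {
	case "ext3", "ext4", "xfs":
		return true
	default:
		return false
	}
}

func main() {
	for _, fs := range []string{"ext4", "xfs", "gfs2"} {
		fmt.Printf("%s: %v\n", fs, supportsOnlineResize(fs))
	}
}
```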


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +		!exponentialbackoff.IsExponentialBackoff(err) {
+		// Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+		// Log all other errors.
+		glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+	}
+	if err == nil {
+		glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+	}
+}
+```
+
+Can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeded. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of according `mountedPod` to promise no more resize operation will be performed.
+
+ The main work of `GenerateVolumeFSResizeFunc` is simply invoking the `resizeFileSystem` method and `MarkVolumeAsResized`, looks like this:
  • Who / when changes PVC size and/or conditions?
  • How does it handle errors? I see it in the code, however it should be explicitly noted somewhere.

Jan Šafránek

unread,
Apr 16, 2018, 7:57:19 AM4/16/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@jsafrane commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`,`ext4` is introduced from [Linux kernel-3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` always supported growing mounted partitions (in-fact currently there is no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If user tries to expand a volume with other formats an error event will be reported for according pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`, if user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet to discover which  PVCs need file system resizing. We achieve this goal by adding a PVC cache for kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and  remove according PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.

@liggitt, heads up: this proposal adds a PVC cache into Kubelet, just as it has Secret and ConfigMap caches. It will refresh (GET) all PVCs used by running or scheduled pods every now and then.

Should we worry about performance / API server load?

mlmhl

unread,
Apr 16, 2018, 11:55:45 PM4/16/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	// UnregisterPod unregisters PersistentVolumeClaims from a given pod that are not
+	// used by any other registered pod.
+	UnregisterPod(pod *v1.Pod)
+}
+```
+
+`RegisterPod` is invoked when a pod is added or updated to  `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`. `RegisterPod` just increases the count reference of each PVC used by the pod, according PVC
+ object won't be fetched from `apiserver` until `GetPersistentVolumeClaim` is invoked for the first time. Similarly, `UnregisterPod` just reduce the reference count and clear according item if no alive pods use this volume.
+
+`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
+
+- PVC doesn't exist in local cache, it means this is the first time to request this volume, we need to fetch it from `etcd`
+- PVC exists in local cache but expired, it means this is just a periodic refresh of a PVC we successfully fetched previously, we can just fetch it from `apiserver`'s cache(setting `ResourceVersion` to 0)
+- PVC exists in local cache and hasn't expired, just return the cached object
+
+### Online File System Resize in Kbuelet

Nice catch, fixed.

mlmhl

unread,
Apr 16, 2018, 11:56:38 PM4/16/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	// WARNING: Register/UnregisterPod functions should be efficient,
+	// i.e. should not block on network operations.
+
+	// RegisterPod registers all PersistentVolumeClaims from a given pod.
+	RegisterPod(pod *v1.Pod)
+
+	// UnregisterPod unregisters PersistentVolumeClaims from a given pod that are not
+	// used by any other registered pod.
+	UnregisterPod(pod *v1.Pod)
+}
+```
+
+`RegisterPod` is invoked when a pod is added or updated to  `PodManager`, and `UnregisterPod` is invoked when a pod is deleted from `PodManager`. `RegisterPod` just increases the count reference of each PVC used by the pod, according PVC
+ object won't be fetched from `apiserver` until `GetPersistentVolumeClaim` is invoked for the first time. Similarly, `UnregisterPod` just reduce the reference count and clear according item if no alive pods use this volume.
+
+`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:

Thanks for the reminder, I've adjusted the description here to reduce the confusion.
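The reference-counting semantics of `RegisterPod`/`UnregisterPod` quoted above can be sketched with a small stand-alone model; the `pvcRegistry` type is hypothetical, and the lazy fetch of the actual PVC object (only on first `GetPersistentVolumeClaim`) is elided here:

```go
package main

import "fmt"

// pvcRegistry models the Register/UnregisterPod reference counting:
// registration only bumps a counter (no network call), and the entry is
// dropped once no live pod uses the claim.
type pvcRegistry struct {
	refs map[string]int
}

func (r *pvcRegistry) RegisterPod(claims ...string) {
	for _, c := range claims {
		r.refs[c]++
	}
}

func (r *pvcRegistry) UnregisterPod(claims ...string) {
	for _, c := range claims {
		if r.refs[c]--; r.refs[c] <= 0 {
			// No live pod uses this claim any more: clear the item.
			delete(r.refs, c)
		}
	}
}

func (r *pvcRegistry) registered(claim string) bool {
	_, ok := r.refs[claim]
	return ok
}

func main() {
	r := &pvcRegistry{refs: map[string]int{}}
	r.RegisterPod("ns/data") // pod A
	r.RegisterPod("ns/data") // pod B shares the claim
	r.UnregisterPod("ns/data")
	fmt.Println(r.registered("ns/data")) // still referenced by pod B
	r.UnregisterPod("ns/data")
	fmt.Println(r.registered("ns/data")) // last reference gone
}
```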

mlmhl

unread,
Apr 16, 2018, 11:56:55 PM4/16/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +`VolumeManager` invokes `GetPersistentVolumeClaim` to check whether a volume requires file system resize or not. `GetPersistentVolumeClaim` will try to fetch the PVC object in the following order:
+
+- PVC doesn't exist in local cache, it means this is the first time to request this volume, we need to fetch it from `etcd`
+- PVC exists in local cache but expired, it means this is just a periodic refresh of a PVC we successfully fetched previously, we can just fetch it from `apiserver`'s cache(setting `ResourceVersion` to 0)
+- PVC exists in local cache and hasn't expired, just return the cached object
+
+### Online File System Resize in Kbuelet
+
+#### Volume Resize Request Detect
+
+Currently `DesiredStateOfWorldPopulator` loops through all alive pods in `PodManager` and reprocess each pod periodically. When we process the pod, we can also loop through each PVCs mounted by this pod to detect which volume requires file system resizing. A PVC requires file system resizing if:
+
+* `PVC.Status.Capacity` is less than  `PV.Spec.Capacity`
+* `PVC.Status.Condition` has a `PersistentVolumeClaimFileSystemResizePending` condition
+
+To mark a volume as file system resizing required, we add a `fsResizeRequired` field to `attachedVolume.mountedPod`, , and add a `MarkFSResizeRequired` method to `ActualStateOfWorld`. If `DesiredStateOfWorldPopulator` discovers a volume requires file system resizing, it invokes `MarkFSResizeRequired` to set according `mountedPod.fsResizeRequired` to true.

Nice catch, fixed.
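The two detection conditions quoted above (recorded capacity lagging the PV spec, or a pending filesystem-resize condition) reduce to a simple predicate. A sketch with simplified stand-in types, not the real `v1` API objects:

```go
package main

import "fmt"

// pvcStatus is a simplified stand-in for the PVC status fields the check reads.
type pvcStatus struct {
	capacityGiB int
	conditions  []string
}

// volumeRequiresFSResize mirrors the two conditions in the proposal: the
// claim's recorded capacity is less than the PV's spec capacity, or a
// FileSystemResizePending condition is present.
func volumeRequiresFSResize(pvc pvcStatus, pvSpecCapacityGiB int) bool {
	if pvc.capacityGiB < pvSpecCapacityGiB {
		return true
	}
	for _, c := range pvc.conditions {
		if c == "FileSystemResizePending" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(volumeRequiresFSResize(pvcStatus{capacityGiB: 100}, 200)) // capacity lags the PV
	fmt.Println(volumeRequiresFSResize(pvcStatus{capacityGiB: 200, conditions: []string{"FileSystemResizePending"}}, 200))
	fmt.Println(volumeRequiresFSResize(pvcStatus{capacityGiB: 200}, 200)) // up to date
}
```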

mlmhl

unread,
Apr 16, 2018, 11:57:22 PM4/16/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +	// This volume is a PVC, we should check it whether requires file system resize or not.
+	pvc, err := dswp.pvcManager.GetPersistentVolumeClaim(pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName)
+	if err != nil {
+		glog.Errorf("Failed to check whether pod volume(%s/%s) requires "+
+			"file system resize or not, get PVC(%s/%s) from PVCManager failed: %v",
+			uniquePodName, podVolume.Name, pod.Namespace, podVolume.PersistentVolumeClaim.ClaimName, err)
+		return
+	}
+	if volumeRequiresFSResize(pvc, volumeSpec.PersistentVolume) {
+		dswp.actualStateOfWorld.MarkFSResizeRequired(uniqueVolumeName, uniquePodName)
+	}
+	processedVolumesForFSResize.Insert(string(uniqueVolumeName))
+}
+```
+
+It is important to note that:

Thanks for the reminder, a note has been added for this.

mlmhl

unread,
Apr 16, 2018, 11:58:02 PM4/16/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +		!exponentialbackoff.IsExponentialBackoff(err) {
+		// Ignore nestedpendingoperations.IsAlreadyExists and exponentialbackoff.IsExponentialBackoff errors, they are expected.
+		// Log all other errors.
+		glog.Errorf(volumeToMount.GenerateErrorDetailed("operationExecutor.VolumeFileSystemResize failed", err).Error())
+	}
+	if err == nil {
+		glog.V(4).Infof(volumeToMount.GenerateMsgDetailed("operationExecutor.VolumeFileSystemResize started", ""))
+	}
+}
+```
+
+Can be seen from the above code, we add a `VolumeFileSystemResize` method to `OperationExecutor` and a `GenerateVolumeFSResizeFunc` method to `OperationGenerator`, which performs volume file system resizing operation.
+
+We also add a `MarkVolumeAsResized` to `ActualStateOfWorldMounterUpdater`, which will be invoked after the file system resizing operation succeeded. `MarkVolumeAsResized` clears the `fsResizeRequired` flag of according `mountedPod` to promise no more resize operation will be performed.
+
+ The main work of `GenerateVolumeFSResizeFunc` is simply invoking the `resizeFileSystem` method and `MarkVolumeAsResized`, looks like this:

Thanks for the reminder, I've added two notes to describe the PVC size/condition management and the error handling.

Patrick Lang

unread,
Apr 17, 2018, 1:33:16 PM4/17/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Windows NTFS supports both online expand & shrink

Yu-Ju Hong

unread,
Apr 24, 2018, 1:38:48 PM4/24/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@yujuhong commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`,`ext4` is introduced from [Linux kernel-3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` always supported growing mounted partitions (in-fact currently there is no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If user tries to expand a volume with other formats an error event will be reported for according pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`, if user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet to discover which  PVCs need file system resizing. We achieve this goal by adding a PVC cache for kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and  remove according PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.

Is the TTL going to be set according to the cluster size, similar to that for Secrets and ConfigMaps? I think in general we do have concerns about the scalability of these requests. The hope (AFAIK) is to switch to "watch" if the security concerns can be addressed.

/cc @wojtek-t for the scalability issues.

Hemant Kumar

unread,
Apr 24, 2018, 9:11:28 PM4/24/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`,`ext4` is introduced from [Linux kernel-3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` always supported growing mounted partitions (in-fact currently there is no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If user tries to expand a volume with other formats an error event will be reported for according pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`, if user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet to discover which  PVCs need file system resizing. We achieve this goal by adding a PVC cache for kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and  remove according PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.

Isn't there a cost associated with watches as well? I have seen some production failures where the number of watches pushed etcd over the edge. If watches are the way to go, they have to apply selectively and should not be cluster-wide - right?

Hemant Kumar

unread,
Apr 24, 2018, 9:18:53 PM4/24/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@gnufied commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`,`ext4` is introduced from [Linux kernel-3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` always supported growing mounted partitions (in-fact currently there is no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If user tries to expand a volume with other formats an error event will be reported for according pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`, if user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet to discover which  PVCs need file system resizing. We achieve this goal by adding a PVC cache for kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and  remove according PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.

One of the other things we can do is - only poll/cache for PVCs which are "resizable". What I mean is - not all PVCs can be resized (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md) and only PVCs which were dynamically provisioned from storageClass that has allowVolumeExpansion set can be expanded.

Steps could be:

  1. In the existing resize admission controller, when allowing a PVC to resize, also set an annotation that marks the PVC as resizable (a bit like `kubernetes.io/resizable: true`)
  2. Whenever someone creates a pod using such a PVC, it will be polled as per this proposal; the PVC will be cached, and if a change is detected the volume will be resized.

One downside is that if a kube admin edits the SC, the PVC still cannot be online-resized until the user deletes and recreates the pod. I think that is an acceptable compromise, consistent with other cases where editing the SC does not result in a state change of an in-use PVC.

@saad-ali @jsafrane thoughts?
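The two steps sketched in this comment could look like the following; the annotation key (taken from the "bit like" example above), the function names, and the flow are hypothetical, not a settled API:

```go
package main

import "fmt"

// Hypothetical annotation key from the comment above.
const resizableAnnotation = "kubernetes.io/resizable"

// markIfResizable models step 1: at admission time, copy the storage
// class's allowVolumeExpansion into a PVC annotation so that kubelet can
// later decide whether the claim is worth polling at all.
func markIfResizable(annotations map[string]string, allowVolumeExpansion bool) {
	if allowVolumeExpansion {
		annotations[resizableAnnotation] = "true"
	}
}

// shouldPoll models step 2: kubelet only refreshes claims carrying the marker.
func shouldPoll(annotations map[string]string) bool {
	return annotations[resizableAnnotation] == "true"
}

func main() {
	a := map[string]string{}
	markIfResizable(a, true)
	fmt.Println(shouldPoll(a)) // expandable SC: poll this claim

	b := map[string]string{}
	markIfResizable(b, false)
	fmt.Println(shouldPoll(b)) // no marker: skip it
}
```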

mlmhl

unread,
Apr 25, 2018, 1:00:30 AM4/25/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +## Use Cases
+
+* As a user I am running Mysql on a 100GB volume - but I am running out of space, I should be able to increase size of volume mysql is using without losing all my data. (online and with data)
+* As a user I am running an application with a PVC. I should be able to resize the volume without losing data or mount point. (online and with data and without taking pod offline)
+
+## Prerequisite
+
+- Currently we only support offline resizing for `xfs`, `ext3`, `ext4`. Online resizing of `ext3`,`ext4` is introduced from [Linux kernel-3.3](https://www.ibm.com/developerworks/library/l-33linuxkernel/), and `xfs` always supported growing mounted partitions (in-fact currently there is no way to expand an unmounted `xfs` file system), so they are all safe for online resizing. If user tries to expand a volume with other formats an error event will be reported for according pod.
+
+- This feature will be protected by an alpha feature gate `ExpandOnlinePersistentVolumes` in v1.11. We separate this feature gate from the offline resizing gate `ExpandPersistentVolumes`, if user wants to enable this feature, `ExpandPersistentVolumes` should be enabled first.
+
+## Implementation Design
+
+### Design Overview
+
+The key point of online file system resizing is how kubelet to discover which  PVCs need file system resizing. We achieve this goal by adding a PVC cache for kubelet. In detail, when kubelet receives a new pod, it adds all PVCs mounted by this pod to the cache, and  remove according PVCs when a pod is deleted. The PVC information is cached there with a TTL to reduce the load on both `etcd` and `apiserver`.

@yujuhong Yeah, the TTL is similar to that for Secrets and ConfigMaps.

mlmhl

unread,
Apr 28, 2018, 2:55:57 AM4/28/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Hi guys @jsafrane, @yujuhong, we changed this proposal again, as we found that Kubelet's VolumeManager already fetches the PV/PVC objects during every pod resync loop, see here and here. So we decided to reuse this code, using the fetched PV/PVC objects to check whether a volume requires online resizing, instead of adding a PVC cache.

Actually, the current PV/PVC reprocess mechanism has a hidden performance problem, so we opened a separate issue to track it: #63274

cc @gnufied

Hemant Kumar

unread,
Apr 28, 2018, 8:09:51 AM4/28/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

In light of @mlmhl's finding that we are already fetching PVC/PV from the api-server on every pod sync, I do agree that we do not need a PVC cache similar to the ones for configmaps etc. This proposal can definitely work without introducing new API calls.

I propose we tackle the bug/issue with PVC/PV getting fetched all the time in the separate GitHub issue that has been opened - kubernetes/kubernetes#63274. cc @saad-ali

Hemant Kumar

unread,
May 19, 2018, 9:29:15 AM5/19/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

/lgtm

cc @saad-ali @msau42 @jsafrane let's try and get this merged for 1.11. I do not think there are any remaining blockers here.

Jan Šafránek

unread,
May 21, 2018, 3:13:20 AM5/21/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@jsafrane commented on this pull request.

lgtm-ish; however, you should note somewhere what happens when a volume is being resized and, at the same time, a pod is deleted and the same volume is being unmounted. IMO the unmount must wait (should not be started) until the resize finishes. The Reconciler(?) should check that.
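The rule Jan asks for (unmount must not start while a resize of the same volume is pending) reduces to one extra check in the reconciler. A toy sketch; `volumeState` and `safeToUnmount` are hypothetical names, not the actual reconciler API:

```go
package main

import "fmt"

// volumeState is a hypothetical slice of the actual state of the world
// that the reconciler consults before unmounting.
type volumeState struct {
	inDSW            bool // some pod still wants the volume mounted
	fsResizeRequired bool // a filesystem resize is pending or in flight
}

// safeToUnmount encodes the rule above: unmount only when no pod wants
// the volume AND no resize of it is still pending.
func safeToUnmount(v volumeState) bool {
	return !v.inDSW && !v.fsResizeRequired
}

func main() {
	fmt.Println(safeToUnmount(volumeState{fsResizeRequired: true})) // wait for the resize to finish
	fmt.Println(safeToUnmount(volumeState{}))                       // idle volume: ok to unmount
}
```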


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +- We only perform online resizing for PVCs used as`Filesystem` mode, because `Block` mode volume will not be formatted with a file system.
+
+- If a PVC is mounted as read only by a pod, we won't invoke `MarkFSResizeRequired` for this pod to promise the semantic of `ReadOnly`.
+
+- Although we add `fsResizeRequired` to `mountedPod`, actually file system resizing is a global operation for volume, if more than one pod mount a same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we processed. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
+
+- Since we support only `xfs` and `ext3/4`, we needn't to worry about exotic file systems that can be attached/mounted to multiple nodes at the same time and require a resizing operation on a node, such as [gfs2](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/s1-manage-growfs).
+
+#### File System Resizing Operation
+
+Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` to trigger file system resize operation too.
+
+In each loop, for each volume in DSW, `Reconciler` checks whether this volume exists in ASW or not. If volume exists but marked as file system resizing required, ASW returns a `fsResizeRequiredError` error. then `Reconciler` triggers a file system resizing operation for this volume. The code snippet looks like this:
+
+```go
+if cache.IsFSResizeRequiredError(err) {

I don't like passing random return values via errors, they should be reserved to real errors / exceptions. But we can solve that in implementation phase and not in this proposal.

Hemant Kumar

unread,
May 21, 2018, 6:57:37 AM5/21/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@jsafrane yeah, I think there is a note in the original resize proposal that if a PVC has any resize-in-progress conditions, we will wait for them to clear before unmounting the volume.

mlmhl

unread,
May 22, 2018, 8:26:50 AM5/22/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@mlmhl commented on this pull request.


In contributors/design-proposals/storage/online-grow-volume-size.md:

> +- We only perform online resizing for PVCs used as`Filesystem` mode, because `Block` mode volume will not be formatted with a file system.
+
+- If a PVC is mounted as read only by a pod, we won't invoke `MarkFSResizeRequired` for this pod to promise the semantic of `ReadOnly`.
+
+- Although we add `fsResizeRequired` to `mountedPod`, actually file system resizing is a global operation for volume, if more than one pod mount a same PVC, we needn't invoke `MarkFSResizeRequired` for each pod. Instead, we only mark the first pod we processed. The input parameter `processedVolumesForFSResize` is introduced for this purpose.
+
+- Since we support only `xfs` and `ext3/4`, we needn't to worry about exotic file systems that can be attached/mounted to multiple nodes at the same time and require a resizing operation on a node, such as [gfs2](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/s1-manage-growfs).
+
+#### File System Resizing Operation
+
+Currently `Reconciler` runs a periodic loop to reconcile the desired state of the world with the actual state of the world by triggering attach, detach, mount, and unmount operations. Based on this mechanism, we can make `Reconciler` to trigger file system resize operation too.
+
+In each loop, for each volume in DSW, `Reconciler` checks whether this volume exists in ASW or not. If volume exists but marked as file system resizing required, ASW returns a `fsResizeRequiredError` error. then `Reconciler` triggers a file system resizing operation for this volume. The code snippet looks like this:
+
+```go
+if cache.IsFSResizeRequiredError(err) {

Currently we use a remountRequiredError to indicate that a volume needs a remount, so I added a resizeRequiredError for consistency. As you said, we can review this again in the implementation phase.

Vasily Angapov

unread,
May 31, 2018, 9:12:06 PM5/31/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@angapov approved this pull request.

fejta-bot

unread,
Aug 29, 2018, 9:48:04 PM8/29/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot

unread,
Sep 28, 2018, 10:10:21 PM9/28/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

Hemant Kumar

unread,
Nov 7, 2018, 3:48:24 PM11/7/18
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

/remove-lifecycle rotten

fejta-bot

unread,
Feb 5, 2019, 4:37:30 PM2/5/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

mlmhl

unread,
Feb 6, 2019, 4:38:18 AM2/6/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

/remove-lifecycle stale

fejta-bot

unread,
May 7, 2019, 5:46:20 AM5/7/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

Moody Saada

unread,
May 7, 2019, 11:23:01 AM5/7/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

/remove-lifecycle stale
still relevant right?

Michelle Au

unread,
May 7, 2019, 11:29:57 AM5/7/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Kubernetes Prow Robot

unread,
May 7, 2019, 11:30:12 AM5/7/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

Closed #1535.

Kubernetes Prow Robot

unread,
May 7, 2019, 11:30:14 AM5/7/19
to kubernetes/community, k8s-mirror-storage-pr-reviews, Team mention

@msau42: Closed this PR.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
