NodeUnpublish and NodeUnstage calls with container restart


apurv barve

Sep 9, 2020, 11:25:17 AM
to container-storage-interface-community, Saad Ali, Hemant Kumar, Ben Swartzlander, Sandeep P S, Michelle Au
Hi experts,

When an application container fails and stops because the underlying storage faults, NodeUnpublish and NodeUnstage calls are not made.
We need these calls because, when an application faults, the file system must be unmounted and remounted to perform log replay. The unmount would happen through NodeUnpublish and NodeUnstage when the container stops, and the remount through NodePublish and NodeStage when the container restarts after the storage fault is recovered.
We see this behavior in both configurations: a pod with a single container and a pod with multiple containers.
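
For reference, the unmount we rely on is roughly what a CSI node plugin does in NodeUnstageVolume. Below is a minimal sketch, assuming a driver written against the Go CSI bindings (github.com/container-storage-interface/spec/lib/go/csi); the nodeServer type and the bare syscall.Unmount call are simplifications, not our actual driver code:

package driver

import (
    "context"
    "syscall"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// nodeServer is the driver's CSI node service (simplified for illustration).
type nodeServer struct{}

// NodeUnstageVolume unmounts the volume from the staging path so that a later
// NodeStageVolume can remount it cleanly, which is when filesystem log replay
// would run. Today kubelet only issues this when the pod is torn down, not
// when a single container restarts.
func (ns *nodeServer) NodeUnstageVolume(
    ctx context.Context,
    req *csi.NodeUnstageVolumeRequest,
) (*csi.NodeUnstageVolumeResponse, error) {
    target := req.GetStagingTargetPath()
    if target == "" {
        return nil, status.Error(codes.InvalidArgument, "staging target path is required")
    }
    // A production driver should be idempotent and tolerate an already
    // unmounted path; this sketch just unmounts.
    if err := syscall.Unmount(target, 0); err != nil {
        return nil, status.Errorf(codes.Internal, "failed to unmount %s: %v", target, err)
    }
    return &csi.NodeUnstageVolumeResponse{}, nil
}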

Are NodeUnpublish and NodeUnstage called only when the pod is restarted, and not when a container restarts?

--
Regards,
Apurv

Saad Ali

Sep 9, 2020, 9:44:49 PM
to apurv barve, Xing Yang, kubernetes-sig-storage, container-storage-interface-community, Hemant Kumar, Ben Swartzlander, Sandeep P S, Michelle Au
Today only pod termination/recreation results in unpublish/unstage followed by a new stage/publish.
If a container restarts, the assumption is that the existing mount is good and it is reused.
We are working on CSI Volume Health. Initially this will be informative only, but in the future we could use it for programmatic repair.

Xing Yang

Sep 10, 2020, 12:10:53 AM
to Saad Ali, apurv barve, kubernetes-sig-storage, container-storage-interface-community, Hemant Kumar, Ben Swartzlander, Sandeep P S, Michelle Au

Adding to what Saad has said, CSI Volume Health is an Alpha feature just introduced in Kubernetes 1.19.  It enables CSI drivers to check volume conditions on the storage system and communicate that information back to Kubernetes.  As Saad mentioned, initially this is informative only so abnormal volume conditions will be logged as events on PVCs or Pods.  We will be looking at how to make this information available so that we can programmatically make corrections based on volume health.  The problem you described looks like a good use case we can consider.


Here is the Volume Health KEP: 

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1432-volume-health-monitor.

Here’s the repo: https://github.com/kubernetes-csi/external-health-monitor.
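
To make this concrete, a node plugin built against CSI spec v1.3.0 or later can report the volume condition from NodeGetVolumeStats. Here is a rough sketch (the nodeServer type and the checkMount helper are illustrative, not from any particular driver); the health monitor components turn abnormal conditions into events on the PVC or Pod:

package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// nodeServer is the driver's CSI node service (simplified for illustration).
type nodeServer struct{}

// checkMount is a hypothetical driver-specific health check on the mounted
// volume path (e.g. stat the mount, read a sentinel file, query the backend).
func checkMount(path string) error { return nil }

// NodeGetVolumeStats returns a VolumeCondition (usage stats omitted for
// brevity) describing whether the volume is healthy on this node.
func (ns *nodeServer) NodeGetVolumeStats(
    ctx context.Context,
    req *csi.NodeGetVolumeStatsRequest,
) (*csi.NodeGetVolumeStatsResponse, error) {
    path := req.GetVolumePath()
    if path == "" {
        return nil, status.Error(codes.InvalidArgument, "volume path is required")
    }
    condition := &csi.VolumeCondition{Abnormal: false, Message: "volume is healthy"}
    if err := checkMount(path); err != nil {
        condition = &csi.VolumeCondition{
            Abnormal: true,
            Message:  "volume is inaccessible: " + err.Error(),
        }
    }
    return &csi.NodeGetVolumeStatsResponse{VolumeCondition: condition}, nil
}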


Thanks,

Xing


apurv barve

Sep 10, 2020, 4:29:16 AM
to Xing Yang, Saad Ali, kubernetes-sig-storage, container-storage-interface-community, Hemant Kumar, Ben Swartzlander, Sandeep P S, Michelle Au
Hi Xing, Saad,

Thanks for this information. While the monitoring functionality of the volume health check would be useful, we want to take programmatic actions against storage failures in the CSI plugin.

Since NodeUnpublish and NodeUnstage are called only in the context of pods going down and not for container restarts, is there a way to instruct k8s to forcefully terminate the whole pod on storage errors?

Another enhancement (if it does not already exist) could be to work in conjunction with the k8s scheduler and fail the pod over to another node where the storage is accessible, instead of repeatedly trying to start the pod on the same node. A negotiated error from NodePublish or NodeStage calls would help the k8s scheduler decide whether to attempt rescheduling the pod to another node. Is anything being discussed around this?
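
For concreteness, the kind of error I mean could already be returned by the node plugin today, roughly as sketched below (backendReachable is a hypothetical driver-specific check); the gap is that kubelet just retries on the same node instead of feeding this back into scheduling:

package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// nodeServer is the driver's CSI node service (simplified for illustration).
type nodeServer struct{}

// backendReachable is a hypothetical check for whether the storage backing
// the volume is accessible from this node.
func backendReachable(volumeID string) bool { return true }

func (ns *nodeServer) NodeStageVolume(
    ctx context.Context,
    req *csi.NodeStageVolumeRequest,
) (*csi.NodeStageVolumeResponse, error) {
    if !backendReachable(req.GetVolumeId()) {
        // kubelet treats this as a retryable failure on the same node; it
        // does not currently influence where the pod is scheduled.
        return nil, status.Errorf(codes.Unavailable,
            "storage for volume %s is not reachable from this node", req.GetVolumeId())
    }
    // ... format and mount the volume at req.GetStagingTargetPath() ...
    return &csi.NodeStageVolumeResponse{}, nil
}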

Regards,
Apurv

Xing Yang

Sep 10, 2020, 10:22:04 AM
to apurv barve, Saad Ali, kubernetes-sig-storage, container-storage-interface-community, Hemant Kumar, Ben Swartzlander, Sandeep P S, Michelle Au, patric...@intel.com
> Thanks for this information. While the monitoring functionality of the volume health check would be useful, we want to take programmatic actions against storage failures in the CSI plugin.

Right now volume health just provides information in the form of events. In the future, we want to look into how to take programmatic actions when the volume on the underlying storage system has problems.


> Since NodeUnpublish and NodeUnstage are called only in the context of pods going down and not for container restarts, is there a way to instruct k8s to forcefully terminate the whole pod on storage errors?


As Saad mentioned, this is not there today. This is something we can think about when working on the next phase of the Volume Health feature.
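
In the meantime, one workaround (sketched below, assuming client-go v0.18 or later and a controller you run yourself) is to watch for failures, for example the abnormal-condition events from the health monitor, and delete the affected pod; that forces NodeUnpublish/NodeUnstage and, once the pod is recreated, a fresh NodeStage/NodePublish. The volumeLooksBroken predicate here is hypothetical:

package main

import (
    "context"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

// volumeLooksBroken is a hypothetical predicate; in practice it could inspect
// the PVC/Pod events produced by the external health monitor.
func volumeLooksBroken(namespace, pod string) bool { return false }

func main() {
    config, err := rest.InClusterConfig()
    if err != nil {
        log.Fatalf("building in-cluster config: %v", err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatalf("building clientset: %v", err)
    }

    namespace, pod := "default", "example-pod"
    if volumeLooksBroken(namespace, pod) {
        // Deleting the pod triggers NodeUnpublish/NodeUnstage; a controller
        // such as a Deployment then recreates it, which re-runs
        // NodeStage/NodePublish once the storage has recovered.
        err := clientset.CoreV1().Pods(namespace).Delete(
            context.TODO(), pod, metav1.DeleteOptions{})
        if err != nil {
            log.Printf("deleting pod %s/%s: %v", namespace, pod, err)
        }
    }
}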


> Another enhancement (if it does not already exist) could be to work in conjunction with the k8s scheduler and fail the pod over to another node where the storage is accessible, instead of repeatedly trying to start the pod on the same node. A negotiated error from NodePublish or NodeStage calls would help the k8s scheduler decide whether to attempt rescheduling the pod to another node. Is anything being discussed around this?


This is something we may want to build on top of this KEP on storage capacity tracking: 

https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1472-storage-capacity-tracking. So in addition to capacity, we could also consider volume health or "pool" health when making scheduling decisions.


CC: Patrick Ohly


Thanks,

Xing

