@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +1. Set `VolumeAttachment.Status.Attached` field to true to indicate the volume is attached. +2. Update the `VolumeAttachment.Status.AttachmentMetadata` field with the contents of the returned `PublishVolumeInfo`. +3. Clear the `VolumeAttachment.Status.AttachError` field. + +If the operation fails, the external-attacher will: + +1. Ensure the `VolumeAttachment.Status.Attached` field to still false to indicate the volume is not attached. +2. Set the `VolumeAttachment.Status.AttachError` field detailing the error. +3. Create an event against the Kubernetes API associated with the `VolumeAttachment` object to inform users what went wrong. + +The external-attacher may implement it’s own error recovery strategy, and retry as long as conditions specified for attachment above are valid. It is strongly recommended that the external-attacher implement an exponential backoff strategy for retries. + +The detach operation will be triggered by the deletion of the `VolumeAttachment` Kubernetes API objects. Since the `VolumeAttachment` Kubernetes API object will have a finalizer added by the external-attacher, it will wait for confirmation from the external-attacher before deleting the object. + +Once all the following conditions are true, the external-attacher should call `ControllerUnpublishVolume` against the CSI volume driver to detach the volume from the specified node: +1. A `VolumeAttachment` Kubernetes API object is deleted by Kubernetes: the value for the `VolumeAttachment.metadata.deletionTimestamp` field is set.
That seems reasonable to me.
@kubernetes/sig-api-machinery-api-reviews is this a reasonable use of finalizers?
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + // This field must only be set by the entity completing the attach
> + // operation, i.e. the external-attacher.
> + // +optional
> + AttachmentMetadata map[string]string `json:"attachmentMetadata,omitempty" protobuf:"bytes,2,rep,name=attachmentMetadata"`
> +
> + // The last error encountered during attach operation, if any.
> + // This field must only be set by the entity completing the attach
> + // operation, i.e. the external-attacher.
> + // +optional
> + AttachError *VolumeError `json:"attachError,omitempty" protobuf:"bytes,3,opt,name=attachError,casttype=VolumeError"`
> +
> + // The last error encountered during detach operation, if any.
> + // This field must only be set by the entity completing the detach
> + // operation, i.e. the external-attacher.
> + // +optional
> + DetachError *VolumeError `json:"detachError,omitempty" protobuf:"bytes,4,opt,name=detachError,casttype=VolumeError"`
Does it matter if the controller knows the difference? How are we going to react differently?
@lavalamp commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +1. Set `VolumeAttachment.Status.Attached` field to true to indicate the volume is attached. +2. Update the `VolumeAttachment.Status.AttachmentMetadata` field with the contents of the returned `PublishVolumeInfo`. +3. Clear the `VolumeAttachment.Status.AttachError` field. + +If the operation fails, the external-attacher will: + +1. Ensure the `VolumeAttachment.Status.Attached` field to still false to indicate the volume is not attached. +2. Set the `VolumeAttachment.Status.AttachError` field detailing the error. +3. Create an event against the Kubernetes API associated with the `VolumeAttachment` object to inform users what went wrong. + +The external-attacher may implement it’s own error recovery strategy, and retry as long as conditions specified for attachment above are valid. It is strongly recommended that the external-attacher implement an exponential backoff strategy for retries. + +The detach operation will be triggered by the deletion of the `VolumeAttachment` Kubernetes API objects. Since the `VolumeAttachment` Kubernetes API object will have a finalizer added by the external-attacher, it will wait for confirmation from the external-attacher before deleting the object. + +Once all the following conditions are true, the external-attacher should call `ControllerUnpublishVolume` against the CSI volume driver to detach the volume from the specified node: +1. A `VolumeAttachment` Kubernetes API object is deleted by Kubernetes: the value for the `VolumeAttachment.metadata.deletionTimestamp` field is set.
Yes, you should definitely not allow new references of something that is being deleted.
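To make the finalizer discussion concrete, here is a minimal sketch of the gating logic, assuming a trimmed-down stand-in for `VolumeAttachment`. The `attachment` type, the `attacherFinalizer` name, and all helper functions are hypothetical and exist only for illustration; they are not taken from the proposal.

```go
package main

import (
	"fmt"
	"time"
)

// attachment is a hypothetical, trimmed-down stand-in for VolumeAttachment.
type attachment struct {
	DeletionTimestamp *time.Time
	Finalizers        []string
	Attached          bool
}

// attacherFinalizer is an assumed finalizer name, used only in this sketch.
const attacherFinalizer = "external-attacher/example-driver"

// markDeleted simulates the API server setting metadata.deletionTimestamp.
func markDeleted(a *attachment) {
	now := time.Now()
	a.DeletionTimestamp = &now
}

func hasFinalizer(a attachment, name string) bool {
	for _, f := range a.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// shouldAttach refuses to start an attach for an object that is being deleted.
func shouldAttach(a attachment) bool {
	return a.DeletionTimestamp == nil && !a.Attached
}

// shouldDetach is true once deletion was requested and our finalizer is still present.
func shouldDetach(a attachment) bool {
	return a.DeletionTimestamp != nil && hasFinalizer(a, attacherFinalizer)
}

// removeFinalizer returns a copy of finalizers without name, which would
// let the pending delete proceed after a successful detach.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	a := attachment{Finalizers: []string{attacherFinalizer}, Attached: true}
	markDeleted(&a)
	fmt.Println("attach allowed:", shouldAttach(a), "detach needed:", shouldDetach(a))

	a.Finalizers = removeFinalizer(a.Finalizers, attacherFinalizer)
	fmt.Println("detach needed after finalizer removal:", shouldDetach(a))
}
```

The point of the sketch is only that both predicates key off the deletion timestamp, so an object marked for deletion can never trigger a new attach.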
@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +Notably, `ProvisionableVolumePlugin` and `DeletableVolumePlugin` are not implemented because provisioning and deleting for CSI volumes is handled by an external provisioner. + +#### Mount and Unmount + +The in-tree volume plugin’s SetUp and TearDown methods will trigger the `NodePublishVolume` and `NodeUnpublishVolume` CSI calls via Unix Domain Socket. Kubernetes will generate a unique `target_path` (unique per pod per volume) to pass via `NodePublishVolume` for the CSI plugin to mount the volume. Upon successful completion of the `NodeUnpublishVolume` call (once volume unmount has been verified), Kubernetes will delete the directory. + +The Kubernetes volume sub-system does not currently support block volumes (only file), so for alpha, the Kubernetes CSI volume plugin will only support file. + +#### Attaching and Detaching + +The attach/detach controller,running as part of the kube-controller-manager binary on the master, decides when a CSI volume must be attached or detached from a particular node. + +When the controller decides to attach a CSI volume, it will call the in-tree CSI volume plugin’s attach method. The in-tree CSI volume plugin’s attach method will do the following: + +1. Create a new `VolumeAttachment` object (defined in the “Communication Channels” section) to attach the volume. + * The name of the of the `VolumeAttachment` object will be the `pv-<PVName>-<NodeName>`.
No requirement that all names are subdomains (some are not) but the general default is subdomain or even label. What does the name being defined this way buy us? Can the name be made simpler, and the data be carried in the spec?
@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + * The operation to timeout. + * The operation terminates with timeout error. + +### Recommended Mechanism for Deploying CSI Drivers on Kubernetes + +Although, Kubernetes does not dictate the packaging for a CSI volume driver, it offers the following recommendations to simplify deployment of a containerized CSI volume driver on Kubernetes. + + + +To deploy a containerized third-party CSI volume driver, it is recommended that storage vendors: + + * Create a “CSI volume driver” container that implements the volume plugin behavior and exposes a gRPC interface via a unix domain socket, as defined in the CSI spec (including Controller, Node, and Identity services). + * To deploy a CSI plugin, a cluster admin should deploy the following + * The Kubernetes team will provide helper containers (external-attacher, external-provisioner, Kubernetes CSI Helper) which will assist the “CSI volume driver” container in interacting with the Kubernetes system. + * StatefulSet with replica size 1, that should + * A StatefulSet (unlike a ReplicaSet) will guarantee that no more than 1 instance of the pod will be running at once (so we don’t have to worry about multiple instances of the external-provisioner or external-attacher in the cluster).
ACK
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +## Design Details + +### Third-Party CSI Volume Drivers + +Kubernetes is as minimally prescriptive on the packaging and deployment of a CSI Volume Driver as possible. Use of the *Communication Channels* (documented below) is the only requirement for enabling an arbitrary external CSI compatible storage driver in Kubernetes. + +This document recommends a standard mechanism for deploying an arbitrary containerized CSI driver on Kubernetes. This can be used by a Storage Provider to simplify deployment of containerized CSI compatible volume drivers on Kubernetes (see the “Recommended Mechanism for Deploying CSI Drivers on Kubernetes” section below). This mechanism, however, is strictly optional. + +### Communication Channels + +#### Kubelet to CSI Driver Communication + +Kubelet (responsible for mount and unmount) will communicate with an external “CSI volume driver” running on the same host machine (whether containerized or not) via a Unix Domain Socket. + +CSI volume drivers should create a socket at the following path on the node machine: `/var/lib/kubelet/plugins/csi/sockets/[driverName]/kubeletproxy.sock`. For alpha, kubelet will assume this is the location for the Unix Domain Socket to talk to the CSI volume driver. For the beta implementation, we can consider using the [Device Plugin Unix Domain Socket Registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md#unix-socket) mechanism to register the Unix Domain Socket with kubelet. This mechanism would need to be extended to support registration of both CSI volume drivers and device plugins independently.
Good points, I am going to change the path to: `/var/lib/kubelet/plugins/[SanitizedCSIDriverName]/csi.sock`.
For the alpha implementation, these CSI paths will be hard-coded like this. For the beta implementation, we will enable the driver to register the socket with kubelet -- this will allow the driver to have a custom socket path and use the same socket for both device and storage.
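As a rough illustration of the hard-coded alpha layout, the sketch below builds the socket path from a driver name. The `sanitizeDriverName` rule (replace anything outside letters, digits, `.` and `-` with `-`) is an assumption made for this example; the thread leaves the exact sanitization method as an implementation detail.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"regexp"
)

// pluginsDir is the base directory named in the comment above.
const pluginsDir = "/var/lib/kubelet/plugins"

// unsafeChars matches every character outside an assumed safe set.
var unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9.-]`)

// sanitizeDriverName is a hypothetical sanitizer, not the real one.
func sanitizeDriverName(name string) string {
	return unsafeChars.ReplaceAllString(name, "-")
}

// socketPath returns the assumed alpha socket location for a driver.
func socketPath(driverName string) (string, error) {
	if driverName == "" {
		return "", errors.New("driver name must not be empty")
	}
	return path.Join(pluginsDir, sanitizeDriverName(driverName), "csi.sock"), nil
}

func main() {
	p, err := socketPath("com.example/driver")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```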
@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +To provision a new CSI volume, an end user would create a `PersistentVolumeClaim` object referencing this `StorageClass`. The external provisioner will react to the creation of the PVC and issue the `CreateVolume` call against the CSI volume driver to provision the volume. The `CreateVolume` name will be auto-generated as it is for other dynamically provisioned volumes. The `CreateVolume` capacity will be take from the `PersistentVolumeClaim` object. The `CreateVolume` parameters will be passed through from the `StorageClass` parameters (opaque to Kubernetes). Once the operation completes successfully, the external provisioner creates a `PersistentVolume` object to represent the volume using the information returned in the `CreateVolume` response. The `PersistentVolume` object is bound to the `PersistentVolumeClaim` and available for use. + +To delete a CSI volume, an end user would delete the corresponding `PersistentVolumeClaim` object. The external provisioner will react to the deletion of the PVC and based on its reclamation policy it will issue the `DeleteVolume` call against the CSI volume driver commands to delete the volume. It will then delete the `PersistentVolume` object. + +##### Attaching and Detaching + +Attach/detach operations must also be handled by an external component (an “attacher”). The attacher watches the Kubernetes API on behalf of the external CSI volume driver for new `VolumeAttachment` objects (defined below), and triggers the appropriate calls against the CSI volume driver to attach the volume. The attacher must watch for `VolumeAttachment` object and mark it as attached even if the underlying CSI driver does not support `ControllerPublishVolume` call, as Kubernetes has no knowledge about it. + +More specifically, an external “attacher” must watch the Kubernetes API on behalf of the external CSI volume driver to handle attach/detach requests. 
+ +Once the following conditions are true, the external-attacher should call `ControllerPublishVolume` against the CSI volume driver to attach the volume to the specified node: + +1. A new `VolumeAttachment` Kubernetes API objects is created by Kubernetes attach/detach controller. +2. The `VolumeAttachment.Spec.Attacher` value in that object corresponds to the name of the external attacher. +3. The `VolumeAttachment.Status.Attached` value is not yet set to true. +4. A Kubernetes Node API object exists with the name matching `VolumeAttachment.Spec.NodeName` and that object contains a `nodeid.csi.volume.kubernetes.io/<sanitized CSIDriverName>` annotation corresponding to the CSI volume driver so that the CSI Driver’s NodeId mapping can be retrieved and used in the `ControllerPublishVolume` calls.
this needs updating to cover the change in format of the annotation
In contributors/design-proposals/storage/container-storage-interface.md:
> +#### Kubelet to CSI Driver Communication + +Kubelet (responsible for mount and unmount) will communicate with an external “CSI volume driver” running on the same host machine (whether containerized or not) via a Unix Domain Socket. + +CSI volume drivers should create a socket at the following path on the node machine: `/var/lib/kubelet/plugins/csi/sockets/[driverName]/kubeletproxy.sock`. For alpha, kubelet will assume this is the location for the Unix Domain Socket to talk to the CSI volume driver. For the beta implementation, we can consider using the [Device Plugin Unix Domain Socket Registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md#unix-socket
) mechanism to register the Unix Domain Socket with kubelet. This mechanism would need to be extended to support registration of both CSI volume drivers and device plugins independently. + +Upon initialization of the external “CSI volume driver”, some external component must call the CSI method `GetNodeId` to get the mapping from Kubernetes Node names to CSI driver NodeID. It must then add the CSI driver NodeID to the `csi.volume.kubernetes.io/nodeid` annotation on the Kubernetes Node API object. The key of the annotation must be `csi.volume.kubernetes.io/nodeid`. The value of the annotation is a JSON blob, containing key/value pairs for each CSI driver. + +For example: +``` +csi.volume.kubernetes.io/nodeid: "{ \"driver1\": \"name1\", \"driver2\": \"name2\" } +``` + +This will enable the component that will issue `ControllerPublishVolume` calls to use the annotation as a mapping from cluster node ID to storage node ID. + +`Sanitized CSIDriverName` is CSI driver name that does not contain dangerous character and can be used as annotation name. It can follow the same pattern that we use for [volume plugins](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/strings/escape.go#L27). Too long or too ugly driver names can be rejected, i.e. all components described in this document will report an error and won't talk to this CSI driver. Exact sanitization method is implementation detail (SHA in the worst case).
Is this used anywhere now?
In contributors/design-proposals/storage/container-storage-interface.md:
> + +The detach operation will be triggered by the deletion of the `VolumeAttachment` Kubernetes API objects. Since the `VolumeAttachment` Kubernetes API object will have a finalizer added by the external-attacher, it will wait for confirmation from the external-attacher before deleting the object. + +Once all the following conditions are true, the external-attacher should call `ControllerUnpublishVolume` against the CSI volume driver to detach the volume from the specified node: +1. A `VolumeAttachment` Kubernetes API object is marked for deletion: the value for the `VolumeAttachment.metadata.deletionTimestamp` field is set. + +If the operation completes successfully, the external-attacher will: +1. Remove its finalizer from the list of finalizers on the `VolumeAttachment` object permitting the delete operation to continue. + +If the operation fails, the external-attacher will: + +1. Ensure the `VolumeAttachment.Status.Attached` field remains true to indicate the volume is not yet detached. +2. Set the `VolumeAttachment.Status.DetachError` field detailing the error. +3. Create an event against the Kubernetes API associated with the `VolumeAttachment` object to inform users what went wrong. + +The new API object called `VolumeAttachment` will be defined as follows:
Devil's advocate: did we rule out making this just be new fields on PV?
I don't see much that I disagree with here, but the discovery part needs some thinking, and there's some outstanding comments to resolve. I wouldn't let that stop you from making progress on it. Might be worth a f2f to go over last details?
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +## Design Details + +### Third-Party CSI Volume Drivers + +Kubernetes is as minimally prescriptive on the packaging and deployment of a CSI Volume Driver as possible. Use of the *Communication Channels* (documented below) is the only requirement for enabling an arbitrary external CSI compatible storage driver in Kubernetes. + +This document recommends a standard mechanism for deploying an arbitrary containerized CSI driver on Kubernetes to simplify deployments of containerized CSI compatible volume drivers on Kubernetes (see the “Recommended Mechanism for Deploying CSI Drivers on Kubernetes” section below), however, this is strictly optional. + +### Communication Channels + +#### Kubelet to CSI Driver Communication + +Kubelet (responsible for mount and unmount) will communicate with an external “CSI volume driver” running on the same host machine (whether containerized or not) via a Unix Domain Socket. + +The Unix Domain Socket will be registered with kubelet using the [Device Plugin Unix Domain Socket Registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md#unix-socket) mechanism. This mechanism will need to be extended to support registration of both CSI volume drivers and device plugins independently.
The registration mechanism is the way it is currently implemented by the device plugin. I'm open to your proposal, but I'll defer to @jiayingz.
@jiayingz commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +## Design Details + +### Third-Party CSI Volume Drivers + +Kubernetes is as minimally prescriptive on the packaging and deployment of a CSI Volume Driver as possible. Use of the *Communication Channels* (documented below) is the only requirement for enabling an arbitrary external CSI compatible storage driver in Kubernetes. + +This document recommends a standard mechanism for deploying an arbitrary containerized CSI driver on Kubernetes to simplify deployments of containerized CSI compatible volume drivers on Kubernetes (see the “Recommended Mechanism for Deploying CSI Drivers on Kubernetes” section below), however, this is strictly optional. + +### Communication Channels + +#### Kubelet to CSI Driver Communication + +Kubelet (responsible for mount and unmount) will communicate with an external “CSI volume driver” running on the same host machine (whether containerized or not) via a Unix Domain Socket. + +The Unix Domain Socket will be registered with kubelet using the [Device Plugin Unix Domain Socket Registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md#unix-socket) mechanism. This mechanism will need to be extended to support registration of both CSI volume drivers and device plugins independently.
Sorry @thockin I missed your original comment.
It is a good point that we should try to use a one-way communication model for its simplicity. We did discuss the option of having kubelet detect new device plugins through inotify during the device plugin design discussion. We chose to export a Registration RPC service from kubelet mainly because a) it is easier to detect an API version mismatch early on during registration, b) we discussed the possible need to authenticate device plugins during registration, and gRPC has more flexible authentication support, and c) it gives device plugins more flexibility in choosing their communication channel: they don't have to use a Unix socket under a canonical path.
Having said that, the current device plugin implementation hasn't really used the listed flexibility. The implementation expects a device plugin to export its RPC interface through a socket under the /var/lib/kubelet/device-plugins directory. Our current security model is that /var/lib/kubelet/device-plugins requires privileged access, so a device plugin needs to run with privilege to register with kubelet. From our experience, requiring device plugins to register with kubelet does complicate failure recovery. In particular, when kubelet restarts, there is a time gap before previously registered device plugins re-register, and we have to add some code complexity to populate device plugin resources based on previously checkpointed state and gracefully garbage collect resources from device plugins that fail to re-register. So there are some tradeoffs to consider, and the discussion here will be very useful to us. Whether we decide to use a poll model or a registration model for CSI, we would be interested in following the same model in device plugin to have a consistent communication mechanism across different plugins.
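To illustrate the re-registration gap described above, here is a minimal sketch of the garbage-collection decision after a kubelet restart. The function name, the grace-period parameter, and the plugin names are all hypothetical; this is not how the device plugin manager is actually written, only one way the tradeoff could look in code.

```go
package main

import (
	"fmt"
	"time"
)

// staleAfterRestart returns the checkpointed plugins that have not
// re-registered, but only once the grace period has elapsed. Before
// that, nothing is reported so slow plugins keep their resources.
func staleAfterRestart(checkpointed []string, registered map[string]bool, elapsed, grace time.Duration) []string {
	if elapsed < grace {
		return nil
	}
	var stale []string
	for _, name := range checkpointed {
		if !registered[name] {
			stale = append(stale, name)
		}
	}
	return stale
}

func main() {
	checkpointed := []string{"gpu-plugin", "nic-plugin"} // hypothetical plugin names
	registered := map[string]bool{"gpu-plugin": true}

	// Inside the grace period: keep everything from the checkpoint.
	fmt.Println(staleAfterRestart(checkpointed, registered, 5*time.Second, 30*time.Second))
	// After the grace period: plugins that never came back are collected.
	fmt.Println(staleAfterRestart(checkpointed, registered, 45*time.Second, 30*time.Second))
}
```

A poll model would avoid this bookkeeping, since kubelet could rediscover sockets on its own after a restart.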
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +#### Kubelet to CSI Driver Communication + +Kubelet (responsible for mount and unmount) will communicate with an external “CSI volume driver” running on the same host machine (whether containerized or not) via a Unix Domain Socket. + +CSI volume drivers should create a socket at the following path on the node machine: `/var/lib/kubelet/plugins/csi/sockets/[driverName]/kubeletproxy.sock`. For alpha, kubelet will assume this is the location for the Unix Domain Socket to talk to the CSI volume driver. For the beta implementation, we can consider using the [Device Plugin Unix Domain Socket Registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md#unix-socket) mechanism to register the Unix Domain Socket with kubelet. This mechanism would need to be extended to support registration of both CSI volume drivers and device plugins independently. + +Upon initialization of the external “CSI volume driver”, some external component must call the CSI method `GetNodeId` to get the mapping from Kubernetes Node names to CSI driver NodeID. It must then add the CSI driver NodeID to the `csi.volume.kubernetes.io/nodeid` annotation on the Kubernetes Node API object. The key of the annotation must be `csi.volume.kubernetes.io/nodeid`. The value of the annotation is a JSON blob, containing key/value pairs for each CSI driver. + +For example: +``` +csi.volume.kubernetes.io/nodeid: "{ \"driver1\": \"name1\", \"driver2\": \"name2\" } +``` + +This will enable the component that will issue `ControllerPublishVolume` calls to use the annotation as a mapping from cluster node ID to storage node ID. + +`Sanitized CSIDriverName` is CSI driver name that does not contain dangerous character and can be used as annotation name. It can follow the same pattern that we use for [volume plugins](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/strings/escape.go#L27). 
Too long or too ugly driver names can be rejected, i.e. all components described in this document will report an error and won't talk to this CSI driver. Exact sanitization method is implementation detail (SHA in the worst case).
Revising the doc, it should be used in the plugin socket path.
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +To provision a new CSI volume, an end user would create a `PersistentVolumeClaim` object referencing this `StorageClass`. The external provisioner will react to the creation of the PVC and issue the `CreateVolume` call against the CSI volume driver to provision the volume. The `CreateVolume` name will be auto-generated as it is for other dynamically provisioned volumes. The `CreateVolume` capacity will be take from the `PersistentVolumeClaim` object. The `CreateVolume` parameters will be passed through from the `StorageClass` parameters (opaque to Kubernetes). Once the operation completes successfully, the external provisioner creates a `PersistentVolume` object to represent the volume using the information returned in the `CreateVolume` response. The `PersistentVolume` object is bound to the `PersistentVolumeClaim` and available for use. + +To delete a CSI volume, an end user would delete the corresponding `PersistentVolumeClaim` object. The external provisioner will react to the deletion of the PVC and based on its reclamation policy it will issue the `DeleteVolume` call against the CSI volume driver commands to delete the volume. It will then delete the `PersistentVolume` object. + +##### Attaching and Detaching + +Attach/detach operations must also be handled by an external component (an “attacher”). The attacher watches the Kubernetes API on behalf of the external CSI volume driver for new `VolumeAttachment` objects (defined below), and triggers the appropriate calls against the CSI volume driver to attach the volume. The attacher must watch for `VolumeAttachment` object and mark it as attached even if the underlying CSI driver does not support `ControllerPublishVolume` call, as Kubernetes has no knowledge about it. + +More specifically, an external “attacher” must watch the Kubernetes API on behalf of the external CSI volume driver to handle attach/detach requests. 
+ +Once the following conditions are true, the external-attacher should call `ControllerPublishVolume` against the CSI volume driver to attach the volume to the specified node: + +1. A new `VolumeAttachment` Kubernetes API objects is created by Kubernetes attach/detach controller. +2. The `VolumeAttachment.Spec.Attacher` value in that object corresponds to the name of the external attacher. +3. The `VolumeAttachment.Status.Attached` value is not yet set to true. +4. A Kubernetes Node API object exists with the name matching `VolumeAttachment.Spec.NodeName` and that object contains a `nodeid.csi.volume.kubernetes.io/<sanitized CSIDriverName>` annotation corresponding to the CSI volume driver so that the CSI Driver’s NodeId mapping can be retrieved and used in the `ControllerPublishVolume` calls.
Updating.
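For reference, a minimal sketch of how a component could read the JSON-blob form of the annotation quoted earlier in this thread (`csi.volume.kubernetes.io/nodeid` holding one key/value pair per driver). The function name and error messages are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeIDAnnotation is the annotation key quoted in the design doc.
const nodeIDAnnotation = "csi.volume.kubernetes.io/nodeid"

// nodeIDForDriver looks up a driver's NodeID in the Node's annotations.
// It is a hypothetical helper, not part of any Kubernetes package.
func nodeIDForDriver(annotations map[string]string, driver string) (string, error) {
	raw, ok := annotations[nodeIDAnnotation]
	if !ok {
		return "", fmt.Errorf("node has no %s annotation", nodeIDAnnotation)
	}
	ids := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &ids); err != nil {
		return "", fmt.Errorf("parsing %s: %w", nodeIDAnnotation, err)
	}
	id, ok := ids[driver]
	if !ok {
		return "", fmt.Errorf("no node ID recorded for driver %q", driver)
	}
	return id, nil
}

func main() {
	annotations := map[string]string{
		nodeIDAnnotation: `{"driver1": "name1", "driver2": "name2"}`,
	}
	id, err := nodeIDForDriver(annotations, "driver2")
	fmt.Println(id, err)
}
```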
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +The detach operation will be triggered by the deletion of the `VolumeAttachment` Kubernetes API objects. Since the `VolumeAttachment` Kubernetes API object will have a finalizer added by the external-attacher, it will wait for confirmation from the external-attacher before deleting the object. + +Once all the following conditions are true, the external-attacher should call `ControllerUnpublishVolume` against the CSI volume driver to detach the volume from the specified node: +1. A `VolumeAttachment` Kubernetes API object is marked for deletion: the value for the `VolumeAttachment.metadata.deletionTimestamp` field is set. + +If the operation completes successfully, the external-attacher will: +1. Remove its finalizer from the list of finalizers on the `VolumeAttachment` object permitting the delete operation to continue. + +If the operation fails, the external-attacher will: + +1. Ensure the `VolumeAttachment.Status.Attached` field remains true to indicate the volume is not yet detached. +2. Set the `VolumeAttachment.Status.DetachError` field detailing the error. +3. Create an event against the Kubernetes API associated with the `VolumeAttachment` object to inform users what went wrong. + +The new API object called `VolumeAttachment` will be defined as follows:
Mostly because it makes detaching trickier.
@saad-ali pushed 1 commit.
> I don't see much that I disagree with here, but the discovery part needs some thinking, and there's some outstanding comments to resolve. I wouldn't let that stop you from making progress on it. Might be worth a f2f to go over last details?
Great, feedback addressed. I'll set up some time with you tomorrow to hammer it out.
@jsafrane commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +#### Kubelet to CSI Driver Communication + +Kubelet (responsible for mount and unmount) will communicate with an external “CSI volume driver” running on the same host machine (whether containerized or not) via a Unix Domain Socket. + +CSI volume drivers should create a socket at the following path on the node machine: `/var/lib/kubelet/plugins/csi/sockets/[driverName]/kubeletproxy.sock`. For alpha, kubelet will assume this is the location for the Unix Domain Socket to talk to the CSI volume driver. For the beta implementation, we can consider using the [Device Plugin Unix Domain Socket Registration](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md#unix-socket) mechanism to register the Unix Domain Socket with kubelet. This mechanism would need to be extended to support registration of both CSI volume drivers and device plugins independently. + +Upon initialization of the external “CSI volume driver”, some external component must call the CSI method `GetNodeId` to get the mapping from Kubernetes Node names to CSI driver NodeID. It must then add the CSI driver NodeID to the `csi.volume.kubernetes.io/nodeid` annotation on the Kubernetes Node API object. The key of the annotation must be `csi.volume.kubernetes.io/nodeid`. The value of the annotation is a JSON blob, containing key/value pairs for each CSI driver. + +For example: +``` +csi.volume.kubernetes.io/nodeid: "{ \"driver1\": \"name1\", \"driver2\": \"name2\" } +``` + +This will enable the component that will issue `ControllerPublishVolume` calls to use the annotation as a mapping from cluster node ID to storage node ID. + +`Sanitized CSIDriverName` is CSI driver name that does not contain dangerous character and can be used as annotation name. It can follow the same pattern that we use for [volume plugins](https://github.com/kubernetes/kubernetes/blob/master/pkg/util/strings/escape.go#L27). 
Too long or too ugly driver names can be rejected, i.e. all components described in this document will report an error and won't talk to this CSI driver. Exact sanitization method is implementation detail (SHA in the worst case).
It's used in finalizers; for some reason they need to pass IsQualifiedName validation.
Just met with Tim. To summarize what we discussed:
I'll update the doc accordingly.
AI: We should have the CSI spec define a valid plugin name to be the same as a label.
Opened container-storage-interface/spec#145
AI: Add an "alternatives considered" section to mention this was considered and why we chose not to proceed with it.
Done.
AI: Change the VolumeAttachment name to a hash. If an object with that name already exists, verify that it matches; otherwise post a big error somewhere (this indicates a hash collision).
Done.
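A minimal sketch of what a hashed name plus collision check could look like. The `csi-` prefix, the choice of SHA-256, the input ordering, and the `attachmentSpec` type are assumptions for illustration; the thread only says the name becomes a hash and that an existing object must be verified to match.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// attachmentSpec is a hypothetical subset of the VolumeAttachment spec.
type attachmentSpec struct {
	Attacher string
	PVName   string
	NodeName string
}

// attachmentName derives a deterministic object name from the spec.
// The NUL separators keep ("ab","c") and ("a","bc") from hashing alike.
func attachmentName(s attachmentSpec) string {
	sum := sha256.Sum256([]byte(s.Attacher + "\x00" + s.PVName + "\x00" + s.NodeName))
	return "csi-" + hex.EncodeToString(sum[:])
}

// checkExisting reports a collision when an object with the same name
// already exists but carries a different spec.
func checkExisting(existing, wanted attachmentSpec) error {
	if existing != wanted {
		return fmt.Errorf("hash collision: %s holds %+v, wanted %+v",
			attachmentName(wanted), existing, wanted)
	}
	return nil
}

func main() {
	spec := attachmentSpec{Attacher: "blah", PVName: "pv1", NodeName: "node1"}
	fmt.Println(attachmentName(spec))
	fmt.Println(checkExisting(spec, spec))
}
```

Note that a full hex-encoded SHA-256 with this prefix is 68 characters, which fits a DNS subdomain name but not a 63-character label; a shorter encoding would be needed if label-length names are the goal.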
AI: Consider putting VolumeSource under "source" in VolumeAttachment.
Done. Changed AttachedVolumeSource from ,inline to volumeSource. So that the spec would look something like:
spec:
  attacher: "blah"
  volumeSource:
    persistentVolumeName: pv1
  nodeName: "node1"
AI: Consider collapsing in to a single error field for beta.
Ack.
AI: Consider changing attached in VolumeAttachment object from bool to enum
Ack.
PTAL.
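The hash-based naming action item above could look roughly like the sketch below. The struct, the choice of SHA-256, the `csi-` prefix, and the set of hashed fields are all assumptions for illustration; the thread only says the name becomes a hash and that a mismatch with an existing object should surface as an error.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// attachmentSpec is a hypothetical, trimmed-down stand-in for VolumeAttachmentSpec.
type attachmentSpec struct {
	Attacher             string
	PersistentVolumeName string
	NodeName             string
}

// attachmentName derives a deterministic object name from the spec fields.
func attachmentName(s attachmentSpec) string {
	// A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
	data := s.Attacher + "\x00" + s.PersistentVolumeName + "\x00" + s.NodeName
	sum := sha256.Sum256([]byte(data))
	return "csi-" + hex.EncodeToString(sum[:])
}

// checkExisting returns an error when an object already stored under the
// hashed name describes a different attachment, which indicates a collision.
func checkExisting(existing, wanted attachmentSpec) error {
	if existing != wanted {
		return fmt.Errorf("hash collision on %s: existing %+v, wanted %+v",
			attachmentName(wanted), existing, wanted)
	}
	return nil
}
```

On an "already exists" response from the API server, the creator would fetch the stored object and pass both specs to `checkExisting` before treating the create as a success.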
@govint commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> +Attach/detach operations must also be handled by an external component (an “attacher”). The attacher watches the Kubernetes API on behalf of the external CSI volume driver for new `VolumeAttachment` objects (defined below), and triggers the appropriate calls against the CSI volume driver to attach the volume. The attacher must watch for `VolumeAttachment` object and mark it as attached even if the underlying CSI driver does not support `ControllerPublishVolume` call, as Kubernetes has no knowledge about it. + +More specifically, an external “attacher” must watch the Kubernetes API on behalf of the external CSI volume driver to handle attach/detach requests. + +Once the following conditions are true, the external-attacher should call `ControllerPublishVolume` against the CSI volume driver to attach the volume to the specified node: + +1. A new `VolumeAttachment` Kubernetes API objects is created by Kubernetes. +2. The `VolumeAttachment.Spec.Attacher` value in that object corresponds to the name of the external attacher. +3. The `VolumeAttachment.Status.Attached` value is not yet set to true. +4. A Kubernetes Node API object exists with the name matching `VolumeAttachment.Spec.NodeName` and that object contains a `nodeid.csi.volume.kubernetes.io/<sanitized CSIDriverName>` annotation corresponding to the CSI volume driver so that the CSI Driver’s NodeId mapping can be retrieved and used in the `ControllerPublishVolume` calls. +5. The `VolumeAttachment.Metadata.DeletionTimestamp` is not set. + +Before starting the `ControllerPublishVolume` operation, the external-attacher should add these finalizers to these Kubernetes API objects: + +* To the `VolumeAttachment` so that when the object is deleted, the external-attacher has an opportunity to detach the volume first. External attacher removes this finalized once the volume is fully detached from the node. 
+* To the `PersistentVolume` referenced by `VolumeAttachment` so the the PV cannot be deleted while the volume is attached. External attacher needs information from the PV to perform detach operation. The attacher will remove the finalizer once all `VolumeAttachment` objects that refer to the PV are deleted, i.e. the volume is detached from all nodes.
Perfect!
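For reference, the attach preconditions quoted above can be sketched as a single check. The types here are hypothetical simplifications rather than the real API objects, and the annotation key follows the `nodeid.csi.volume.kubernetes.io/<sanitized CSIDriverName>` form from the quoted text. The first condition (the object exists) is implied by the function being called with one.

```go
package main

// node is a hypothetical stand-in for the Kubernetes Node API object.
type node struct {
	Name        string
	Annotations map[string]string
}

// volumeAttachment is a hypothetical, flattened view of a VolumeAttachment.
type volumeAttachment struct {
	Attacher string // Spec.Attacher
	NodeName string // Spec.NodeName
	Attached bool   // Status.Attached
	Deleting bool   // true when Metadata.DeletionTimestamp is set
}

const nodeIDAnnotationPrefix = "nodeid.csi.volume.kubernetes.io/"

// shouldAttach reports whether ControllerPublishVolume should be called and,
// if so, returns the CSI node ID read from the Node annotation.
func shouldAttach(va volumeAttachment, n *node, attacherName, sanitizedDriver string) (string, bool) {
	// Conditions 2, 3 and 5: right attacher, not yet attached, not being deleted.
	if va.Attacher != attacherName || va.Attached || va.Deleting {
		return "", false
	}
	// Condition 4: the Node exists and carries the driver's node ID annotation.
	if n == nil || n.Name != va.NodeName {
		return "", false
	}
	id, ok := n.Annotations[nodeIDAnnotationPrefix+sanitizedDriver]
	if !ok || id == "" {
		return "", false
	}
	return id, true
}
```

Returning the node ID together with the decision keeps the annotation lookup in one place, since the same value is needed for the subsequent publish call.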
@smarterclayton commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +Because the kubelet would be responsible for fetching and passing the mount secret to the CSI driver,the Kubernetes NodeAuthorizer must be updated to allow kubelet read access to mount secrets. + +## Alternatives Considered + +### Extending PersistentVolume Object + +Instead of creating a new `VolumeAttachment` object, another option we considered was extending the exiting `PersistentVolume` object. + +`PersistentVolumeSpec` would be extended to include: +* List of nodes to attach the volume to (initially empty). + +`PersistentVolumeStatus` would be extended to include: +* List of nodes the volume was successfully attached to. + +We dismissed this approach because having attach/detach triggered by the creation/deletion of an object is much easier to manage (for both external-attacher and Kubernetes) and more robust (fewer corner cases to worry about).
What happens if I delete the PV before the attach objects, then a PV gets recreated with the same name?
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +Because the kubelet would be responsible for fetching and passing the mount secret to the CSI driver,the Kubernetes NodeAuthorizer must be updated to allow kubelet read access to mount secrets. + +## Alternatives Considered + +### Extending PersistentVolume Object + +Instead of creating a new `VolumeAttachment` object, another option we considered was extending the exiting `PersistentVolume` object. + +`PersistentVolumeSpec` would be extended to include: +* List of nodes to attach the volume to (initially empty). + +`PersistentVolumeStatus` would be extended to include: +* List of nodes the volume was successfully attached to. + +We dismissed this approach because having attach/detach triggered by the creation/deletion of an object is much easier to manage (for both external-attacher and Kubernetes) and more robust (fewer corner cases to worry about).
@jsafrane there will be a finalizer on the PV to prevent deletion until the detach is completed.
@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +The detach operation will be triggered by the deletion of the `VolumeAttachment` Kubernetes API objects. Since the `VolumeAttachment` Kubernetes API object will have a finalizer added by the external-attacher, it will wait for confirmation from the external-attacher before deleting the object. + +Once all the following conditions are true, the external-attacher should call `ControllerUnpublishVolume` against the CSI volume driver to detach the volume from the specified node: +1. A `VolumeAttachment` Kubernetes API object is marked for deletion: the value for the `VolumeAttachment.metadata.deletionTimestamp` field is set. + +If the operation completes successfully, the external-attacher will: +1. Remove its finalizer from the list of finalizers on the `VolumeAttachment` object permitting the delete operation to continue. + +If the operation fails, the external-attacher will: + +1. Ensure the `VolumeAttachment.Status.Attached` field remains true to indicate the volume is not yet detached. +2. Set the `VolumeAttachment.Status.DetachError` field detailing the error. +3. Create an event against the Kubernetes API associated with the `VolumeAttachment` object to inform users what went wrong. + +The new API object called `VolumeAttachment` will be defined as follows:
Did you make this case in the doc? I can't find it without re-reading the whole thing.
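As a reading aid for the detach flow quoted above, here is a rough sketch of the success and failure paths. The finalizer name, the `attachment` type, and the `unpublishFunc` callback (standing in for the CSI `ControllerUnpublishVolume` call) are hypothetical, and event creation is omitted.

```go
package main

import "errors"

// attacherFinalizer is a hypothetical finalizer name.
const attacherFinalizer = "example.com/external-attacher"

// errDriverBusy is a hypothetical failure, present only so the sketch can be exercised.
var errDriverBusy = errors.New("driver busy")

// attachment is a hypothetical, flattened view of a VolumeAttachment.
type attachment struct {
	Deleting    bool // true when metadata.deletionTimestamp is set
	Finalizers  []string
	Attached    bool   // Status.Attached
	DetachError string // Status.DetachError
}

// unpublishFunc stands in for the CSI ControllerUnpublishVolume call.
type unpublishFunc func() error

// handleDetach runs the detach flow for an object that is marked for deletion.
func handleDetach(va *attachment, unpublish unpublishFunc) {
	if !va.Deleting {
		return // detach is triggered only by deletion
	}
	if err := unpublish(); err != nil {
		// Failure: the volume is not yet detached, so record the error and
		// keep the finalizer. A real attacher would also emit an event here.
		va.Attached = true
		va.DetachError = err.Error()
		return
	}
	// Success: drop our finalizer so the delete operation can continue.
	kept := make([]string, 0, len(va.Finalizers))
	for _, f := range va.Finalizers {
		if f != attacherFinalizer {
			kept = append(kept, f)
		}
	}
	va.Finalizers = kept
}
```

Because the finalizer is removed only on success, a failed detach leaves the object in place and the attacher can retry on its next pass.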
@thockin commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + // Status of the VolumeAttachment request.
+ // Populated by the entity completing the attach or detach
+ // operation, i.e. the external-attacher.
+ // +optional
+ Status VolumeAttachmentStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
+}
+
+// The specification of a VolumeAttachment request.
+type VolumeAttachmentSpec struct {
+ // Attacher indicates the name of the volume driver that MUST handle this
+ // request. This is the name returned by GetPluginName() and must be the
+ // same as StorageClass.Provisioner.
+ Attacher string `json:"attacher" protobuf:"bytes,1,opt,name=attacher"`
+
+ // AttachedVolumeSource represents the volume that should be attached.
+ VolumeSource AttachedVolumeSource `json:"volumeSource" protobuf:"bytes,2,opt,name=volumeSource"`
propose just "source" for this field
@thockin commented on this pull request.
Approving. Please keep in mind the various API-related feedback as you iterate towards beta.
/lgtm
/approve
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue.
Merged #1258.
@saad-ali commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + // Status of the VolumeAttachment request.
+ // Populated by the entity completing the attach or detach
+ // operation, i.e. the external-attacher.
+ // +optional
+ Status VolumeAttachmentStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
+}
+
+// The specification of a VolumeAttachment request.
+type VolumeAttachmentSpec struct {
+ // Attacher indicates the name of the volume driver that MUST handle this
+ // request. This is the name returned by GetPluginName() and must be the
+ // same as StorageClass.Provisioner.
+ Attacher string `json:"attacher" protobuf:"bytes,1,opt,name=attacher"`
+
+ // AttachedVolumeSource represents the volume that should be attached.
+ VolumeSource AttachedVolumeSource `json:"volumeSource" protobuf:"bytes,2,opt,name=volumeSource"`
Ack, will update the implementation accordingly.
@jsafrane commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +Because the kubelet would be responsible for fetching and passing the mount secret to the CSI driver,the Kubernetes NodeAuthorizer must be updated to allow kubelet read access to mount secrets. + +## Alternatives Considered + +### Extending PersistentVolume Object + +Instead of creating a new `VolumeAttachment` object, another option we considered was extending the exiting `PersistentVolume` object. + +`PersistentVolumeSpec` would be extended to include: +* List of nodes to attach the volume to (initially empty). + +`PersistentVolumeStatus` would be extended to include: +* List of nodes the volume was successfully attached to. + +We dismissed this approach because having attach/detach triggered by the creation/deletion of an object is much easier to manage (for both external-attacher and Kubernetes) and more robust (fewer corner cases to worry about).
If the PV is deleted before the AttachedVolume is created, the external attacher controller won't even start attaching the volume (as there is nothing to attach).
When the PV is deleted after the external attacher controller has started attaching, the attacher already has a finalizer on the PV - it puts the finalizer there before it starts attaching.
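The ordering described here (finalizer first, then attach, and release only when nothing refers to the PV any more) can be sketched as follows. The type, the finalizer name, and the way remaining attachments are passed in as a list of PV names are assumptions made for illustration.

```go
package main

// pvFinalizer is a hypothetical finalizer name.
const pvFinalizer = "example.com/external-attacher"

// persistentVolume is a hypothetical stand-in for the PersistentVolume object.
type persistentVolume struct {
	Name       string
	Finalizers []string
}

// ensureFinalizer adds the finalizer if it is missing. It is meant to run
// before the attach operation starts, and is safe to call repeatedly.
func ensureFinalizer(pv *persistentVolume) {
	for _, f := range pv.Finalizers {
		if f == pvFinalizer {
			return
		}
	}
	pv.Finalizers = append(pv.Finalizers, pvFinalizer)
}

// releaseIfUnused removes the finalizer once no remaining attachment refers
// to the PV. remainingPVNames lists the PV name of every live attachment.
// It reports whether the finalizer was released.
func releaseIfUnused(pv *persistentVolume, remainingPVNames []string) bool {
	for _, name := range remainingPVNames {
		if name == pv.Name {
			return false // still attached somewhere
		}
	}
	kept := make([]string, 0, len(pv.Finalizers))
	for _, f := range pv.Finalizers {
		if f != pvFinalizer {
			kept = append(kept, f)
		}
	}
	pv.Finalizers = kept
	return true
}
```

With this ordering, deleting the PV while an attachment exists only sets its deletion timestamp; the object itself stays until the last attachment is gone, so a same-named PV cannot appear in the meantime.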
@hairesis commented on this pull request.
In contributors/design-proposals/storage/container-storage-interface.md:
> + +To enable easy deployment of an external containerized CSI volume driver, the Kubernetes team will provide a sidecar "Kubernetes CSI Helper" container that can manage the unix domain socket registration and NodeId initialization. This is detailed in the “Suggested Mechanism for Deploying CSI Drivers on Kubernetes” section below. + +#### Master to CSI Driver Communication + +Because CSI volume driver code is considered untrusted, it might not be allowed to run on the master. Therefore, the Kube controller manager (responsible for create, delete, attach, and detach) can not communicate via a Unix Domain Socket with the “CSI volume driver” container. Instead, the Kube controller manager will communicate with the external “CSI volume driver” through the Kubernetes API. + +More specifically, some external component must watch the Kubernetes API on behalf of the external CSI volume driver and trigger the appropriate operations against it. This eliminates the problems of discovery and securing a channel between the kube-controller-manager and the CSI volume driver. + +To enable easy deployment of an external containerized CSI volume driver on Kubernetes, without making the driver Kubernetes aware, Kubernetes will provide a sidecar “Kubernetes to CSI” proxy container that will watch the Kubernetes API and trigger the appropriate operations against the “CSI volume driver” container. This is detailed in the “Suggested Mechanism for Deploying CSI Drivers on Kubernetes” section below. + +The external component watching the Kubernetes API on behalf of the external CSI volume driver must handle provisioning, deleting, attaching, and detaching. 
+ +##### Provisioning and Deleting + +Provisioning and deletion operations are handled using the existing [external provisioner mechanism](https://github.com/kubernetes-incubator/external-storage/tree/master/docs), where the external component watching the Kubernetes API on behalf of the external CSI volume driver will act as an external provisioner.
broken link