Question on CSI snapshot design

sandee...@gmail.com

Jan 29, 2021, 5:29:25 PM

to container-storage-interface-community

Hi,

I was going through the CSI snapshot spec, the design docs, examples, etc., and a few questions came to mind.

(1)

https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/177-volume-snapshot/README.md#controller-split

I'm wondering why we had to use two controllers: the snapshot controller and the external-snapshotter sidecar. Could we not have built the CSI snapshot functionality entirely in the external-snapshotter sidecar? That would make debugging simpler, since everything would be in one container. I may be missing something, so I'm curious to know the answer.

(2)

The same above link also says this - "For dynamic provisioning, the creation of a new VolumeSnapshot object referencing a VolumeSnapshotClass CRD object corresponding to this driver causes the snapshot controller to trigger the creation of a Kubernetes VolumeSnapshotContent object to represent the to-be-created new snapshot."

So, for volume provisioning, the PV is created only after the CSI driver creates the volume on the storage system. If VolumeSnapshot/VolumeSnapshotContent are the equivalent of PVC/PV, why isn't the same design followed for VolumeSnapshot/VolumeSnapshotContent as well? In other words, is there a reason why the VolumeSnapshotContent is not created after the CSI driver creates the snapshot?

(3) Since snapshots take additional space, is snapshot usage integrated into namespace storage quota accounting?

Thanks,

Sandeep

sandee...@gmail.com

Jan 29, 2021, 5:33:13 PM

to container-storage-interface-community

A few corrections to my previous email:

On Friday, January 29, 2021 at 2:29:25 PM UTC-8 sandee...@gmail.com wrote:

Hi,

I was going through the CSI snapshot spec, the design docs, examples, etc., and a few questions came to mind.

(1)
https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/177-volume-snapshot/README.md#controller-split

I'm wondering why we had to use two controllers: the snapshot controller and the external-snapshotter sidecar. Could we not have built the CSI snapshot functionality entirely in the external-snapshotter sidecar? That would make debugging simpler, since everything would be in one container. I may be missing something, so I'm curious to know the answer.

(2)
The same above link also says this - "For dynamic provisioning, the creation of a new VolumeSnapshot object referencing a VolumeSnapshotClass CRD object corresponding to this driver causes the snapshot controller to trigger the creation of a Kubernetes VolumeSnapshotContent object to represent the to-be-created new snapshot."

So, for volume provisioning, the PV is created only after the CSI driver creates the volume on the storage system. If VolumeSnapshot/VolumeSnapshotContent are the equivalent of PVC/PV, why isn't the same design followed for VolumeSnapshot/VolumeSnapshotContent as well? In other words, is there a reason why the VolumeSnapshotContent is not created after the CSI driver creates the snapshot?

Xing Yang

Jan 29, 2021, 8:30:51 PM

to sandee...@gmail.com, container-storage-interface-community

Hi Sandeep,

(1) Regarding the controller split, this follows the model of the in-tree PV controller and the external-provisioner sidecar. Like the PV controller, the snapshot controller handles logic such as binding VolumeSnapshot and VolumeSnapshotContent API objects. The sidecar is a helper for the CSI driver, so it should be lightweight.

(2) In the alpha version of the volume snapshot feature, we followed the PV/PVC model strictly. When moving to beta, we made some enhancements, so the design is no longer exactly the same. The reason a VolumeSnapshotContent is created before the snapshot is created on the storage system is to avoid leaking snapshot resources.

(3) Regarding quota, there is still a limitation in this area because VolumeSnapshot is a custom resource. For custom resources, we can set a quota on the object count (the total number of VolumeSnapshots in a namespace), but we cannot set a quota on capacity, as is currently possible for PVCs. This is an area that needs improvement.
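For illustration, a count-only quota uses Kubernetes' `count/<resource>.<group>` syntax for custom resources, here with the `snapshot.storage.k8s.io` group. The sketch below builds such a ResourceQuota as plain JSON using only the Go standard library, so it can be piped to `kubectl apply -f -`. The quota name, namespace, and limit are hypothetical. Note that it caps how many VolumeSnapshots exist, not how much storage they consume.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quotaManifest builds a ResourceQuota (as plain maps, to stay dependency-free)
// that caps the number of VolumeSnapshot objects in a namespace.
// "count/<resource>.<group>" is the object-count quota key for custom
// resources; it limits the number of snapshots, not their capacity.
func quotaManifest(namespace string, maxSnapshots int) map[string]any {
	return map[string]any{
		"apiVersion": "v1",
		"kind":       "ResourceQuota",
		"metadata": map[string]any{
			"name":      "snapshot-count-quota", // hypothetical name
			"namespace": namespace,
		},
		"spec": map[string]any{
			"hard": map[string]string{
				"count/volumesnapshots.snapshot.storage.k8s.io": fmt.Sprint(maxSnapshots),
			},
		},
	}
}

func main() {
	// "demo" and 10 are hypothetical values.
	out, err := json.MarshalIndent(quotaManifest("demo", 10), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```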

Thanks,

Xing

--
You received this message because you are subscribed to the Google Groups "container-storage-interface-community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to container-storage-interf...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/container-storage-interface-community/6df57f90-f0ce-4d35-91be-9a885b7502f2n%40googlegroups.com.

sandee...@gmail.com

Feb 2, 2021, 3:40:35 PM

to container-storage-interface-community

Thanks, Xing, for replying to my email. I have one more question, followed by some follow-up comments and questions.

(4) Regarding this doc https://kubernetes.io/docs/concepts/storage/volume-snapshots/#volume-snapshot-contents, why does the VolumeSnapshotContent for dynamic provisioning not have a snapshotHandle, and why does the VolumeSnapshotContent for a pre-provisioned snapshot not have a volumeHandle? Wouldn't we need both snapshotHandle and volumeHandle in both use cases? If not, what is the reasoning?

On (1), it is currently somewhat complex to debug issues related to volume binding and attach/detach, since we need to look at two different logs (kube-controller-manager and external-attacher/external-provisioner). Sometimes we do not get matching logs because they roll over at different rates. We had an opportunity to consolidate the snapshot controller and the sidecar to simplify the binding logic. Other than keeping it consistent with the PV controller, was there any other reason?

On (2), can you provide some details on the leaking of snapshot resources and how creating the VolumeSnapshotContent before creating the actual snapshot solves it?

On (3), have we done any analysis of, or thinking about, possible solutions?

Thanks,

Sandeep

Xing Yang

Feb 2, 2021, 11:15:12 PM

to sandee...@gmail.com, container-storage-interface-community

Hi Sandeep,

(4) We want to specify clearly whether a snapshot is dynamically provisioned or pre-provisioned by looking at the VolumeSnapshotContent Spec Source field. Source must contain exactly one of VolumeHandle or SnapshotHandle. For dynamic provisioning, we need to know the VolumeHandle to take the snapshot. For a pre-provisioned snapshot, the SnapshotHandle already exists. SnapshotHandle is also in the VolumeSnapshotContent Status for both dynamically provisioned and pre-provisioned snapshots.
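The "exactly one of" rule can be sketched as a small check. `ContentSource` below is a hypothetical, simplified mirror of the VolumeSnapshotContent spec.source field, not the real API type. In a cluster, this kind of rule is enforced by the CRD schema and validation, not by user code. The sketch only shows how the handle that is set identifies the provisioning mode.

```go
package main

import (
	"errors"
	"fmt"
)

// ContentSource is a hypothetical, simplified mirror of the
// VolumeSnapshotContent spec.source field: exactly one handle may be set.
type ContentSource struct {
	VolumeHandle   *string // dynamic provisioning: the volume to snapshot
	SnapshotHandle *string // pre-provisioned: the snapshot that already exists
}

// provisioningMode reports which case a content object represents and
// rejects sources that set both handles or neither.
func provisioningMode(src ContentSource) (string, error) {
	hasVol := src.VolumeHandle != nil
	hasSnap := src.SnapshotHandle != nil
	switch {
	case hasVol && hasSnap:
		return "", errors.New("only one of volumeHandle or snapshotHandle may be set")
	case hasVol:
		return "dynamic", nil
	case hasSnap:
		return "pre-provisioned", nil
	default:
		return "", errors.New("one of volumeHandle or snapshotHandle must be set")
	}
}

func main() {
	vol, snap := "vol-123", "snap-456" // hypothetical handles
	for _, src := range []ContentSource{
		{VolumeHandle: &vol},
		{SnapshotHandle: &snap},
		{VolumeHandle: &vol, SnapshotHandle: &snap},
	} {
		mode, err := provisioningMode(src)
		fmt.Println(mode, err)
	}
}
```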

(1) Sidecars are supposed to be lightweight. We put most of the controller logic in the common snapshot controller, so the CSI snapshotter sidecar is very small. Some users may want to deploy multiple CSI drivers; in that case, only one snapshot controller needs to be deployed, as it is shared among all CSI drivers. I agree that it is easier to debug if everything is in the same controller. There are pros and cons to both options.
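The division of work with several drivers can be modeled roughly as follows. One shared snapshot controller creates and binds content objects for every driver. Each per-driver sidecar then acts only on the objects whose spec.driver names its own driver. The `Content` type and the second driver name are hypothetical, and the real sidecar's filtering lives inside its informer and queue handling, which this sketch does not reproduce.

```go
package main

import "fmt"

// Content is a hypothetical, trimmed-down VolumeSnapshotContent with only the
// fields needed to show how work is divided between components.
type Content struct {
	Name   string
	Driver string // mirrors spec.driver
}

// sidecarQueue models a per-driver sidecar: it skips content objects that
// belong to other CSI drivers, so several sidecars can coexist with one
// cluster-wide snapshot controller.
func sidecarQueue(driverName string, contents []Content) []string {
	var mine []string
	for _, c := range contents {
		if c.Driver == driverName {
			mine = append(mine, c.Name)
		}
	}
	return mine
}

func main() {
	// In this model, the shared snapshot controller has already created and
	// bound all of these; each sidecar issues CSI calls only for its driver.
	all := []Content{
		{Name: "snapcontent-a", Driver: "hostpath.csi.k8s.io"},
		{Name: "snapcontent-b", Driver: "example.csi.vendor.com"}, // hypothetical driver
		{Name: "snapcontent-c", Driver: "hostpath.csi.k8s.io"},
	}
	fmt.Println(sidecarQueue("hostpath.csi.k8s.io", all))
}
```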

(2) The external-provisioner added the following fix to handle potential leaks of volumes caused by CSI driver timeouts:

https://github.com/kubernetes-csi/external-provisioner/pull/312

In the snapshot case, since the VolumeSnapshotContent is created before the snapshot is created, we simply re-queue any failed snapshot creation, including timeouts, and retry. The logic is much simpler.
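The retry argument can be sketched in a simplified model. The CSI spec requires CreateSnapshot to be idempotent by name, so if the content object recording the request name exists before the call, a timed-out call can be retried with the same name. The backend then returns the snapshot it already cut rather than leaving an orphan. `fakeBackend`, `content`, `reconcile`, and the request name are all hypothetical stand-ins, not the external-snapshotter's code. A real controller re-queues with backoff instead of looping.

```go
package main

import (
	"errors"
	"fmt"
)

var errTimeout = errors.New("deadline exceeded")

// fakeBackend is a hypothetical storage system. CreateSnapshot is idempotent
// by name: repeating a request with the same name returns the existing
// snapshot instead of creating a second one.
type fakeBackend struct {
	snapshots    map[string]string // request name -> snapshot ID
	timeoutsLeft int               // responses to "lose" after the work is done
}

func (b *fakeBackend) CreateSnapshot(name, sourceVolume string) (string, error) {
	id, ok := b.snapshots[name]
	if !ok {
		id = fmt.Sprintf("snap-%d", len(b.snapshots)+1)
		b.snapshots[name] = id // the snapshot now exists on the backend...
	}
	if b.timeoutsLeft > 0 {
		b.timeoutsLeft--
		return "", errTimeout // ...but the caller never sees the reply
	}
	return id, nil
}

// content stands in for a VolumeSnapshotContent created *before* the CSI
// call, so the request name survives retries (and, in a real cluster,
// controller restarts).
type content struct {
	requestName    string
	volumeHandle   string
	snapshotHandle string
}

// reconcile retries on any failure, including timeouts, always reusing
// c.requestName so the backend can deduplicate.
func reconcile(b *fakeBackend, c *content, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		id, err := b.CreateSnapshot(c.requestName, c.volumeHandle)
		if err == nil {
			c.snapshotHandle = id
			return nil
		}
	}
	return errors.New("gave up for now; content still records the request")
}

func main() {
	b := &fakeBackend{snapshots: map[string]string{}, timeoutsLeft: 2}
	c := &content{requestName: "snapshot-uid-1234", volumeHandle: "vol-1"} // hypothetical values
	if err := reconcile(b, c, 5); err != nil {
		panic(err)
	}
	fmt.Println(c.snapshotHandle, len(b.snapshots)) // one snapshot, no leak
}
```

Without the pre-created object, a controller that crashed after a timeout might have no record of the request name, and the snapshot already cut on the backend could be left orphaned.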

(3) Regarding quota for snapshot capacity, I looked into it a while ago. The PVC capacity quota is handled by in-tree logic, and we can’t use similar logic for custom resources. I think capacity quota support for custom resources needs to be added first. I have not investigated it further. You are welcome to propose ideas.

Thanks,

Xing
