Velero CSI Snapshots for OCS


Rahul R

Jan 13, 2021, 2:21:19 AM
to Project Velero
Hello,

I am trying to configure Velero to perform CSI Snapshots for OCS. I am not sure if Snapshot Location should be specified during configuration.

With Volume Snapshots Enabled:

velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.1.0,velero/velero-plugin-for-csi:v0.1.2 --bucket velero --secret-file ./credentials-velero --use-volume-snapshots=true --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000 --image velero/velero:v1.5.2 --features=EnableCSI

I tried the command above, which resulted in a default snapshot location being configured with aws as the provider. Please note that I am using MinIO (for its S3 interface) only for storage.

Without Volume Snapshot Location:
Since we are using CSI snapshots, I am not sure what the volume snapshot location should point to. My understanding is that it should contain provider-specific parameters, but I couldn't find any relevant documentation on CSI snapshot location parameters. Is this even required (as the VolumeSnapshotClass should have the relevant information)?

I deleted the volume snapshot location (is this a correct configuration?) and I could see that the snapshots succeeded. The backup details showed volumesnapshots and volumesnapshotcontents populated in the resource list.

If this is not a correct configuration, can someone confirm what the correct configuration for VolumeSnapshotLocation is?

I tried a namespace restore, which succeeded without any errors. However, the Pod consuming the PVC couldn't come up because the PVC was never bound. Is this due to an incorrect configuration (SnapshotLocation)?

Thanks,
Rahul

Rahul R

Jan 14, 2021, 11:51:48 PM
to Project Velero
Team,
I would really appreciate it if someone could take a look at this.

I think the issue is that during restore the VolumeSnapshot is not patched to point to the VolumeSnapshotContent. Instead it keeps pointing to the PVC (as in the original backup), and hence the volume is not restored.

Thanks,
Rahul

Swanand Shende

Jan 15, 2021, 12:22:25 AM
to Rahul R, Project Velero
Yes, you are correct. That is the issue. Its root cause is that during restore, the PVC and VolumeSnapshot resources are still pointing to the old namespace. This issue has been corrected in the PR submitted by ssh...@catalogicsoftware.com, where the restore action plugins for CSI have been updated.

Thanks and Regards,
Swanand


Rahul R

Jan 15, 2021, 12:37:18 AM
to Project Velero
Thanks much, Swanand. Can you please point me to the PR and the Velero version that contains the fix?

Please let me know the Velero client and server versions along with the plugin versions.

-Rahul

Rahul R

Jan 15, 2021, 12:38:20 AM
to Project Velero
Also, I faced the same issue when restoring to the original namespace.

Swanand Shende

Jan 15, 2021, 1:09:11 AM
to Rahul R, Project Velero
Hi Rahul,

The PR is for CSI plugin version v0.1.2 and Velero v1.5.2.

Regards,
Swanand

Rahul R

Jan 15, 2021, 2:05:36 AM
to Project Velero

I upgraded Velero to 1.5.3, which contains the fix above, but the result is still the same. In fact, the PVC is now also not getting restored.

Excerpt from Velero Logs for Restore:

time="2021-01-15T06:53:08Z" level=info msg="Starting VolumeSnapshotContentRestoreItemAction" cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/restore/volumesnapshotcontent_action.go:47" pluginName=velero-plugin-for-csi restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Returning from VolumeSnapshotContentRestoreItemAction with 0 additionalItems" cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/restore/volumesnapshotcontent_action.go:65" pluginName=velero-plugin-for-csi restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Attempting to restore VolumeSnapshotContent: snapcontent-d0e5bfba-9dcf-4851-bd9e-744f53d49ff7" logSource="pkg/restore/restore.go:1107" restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Restore of VolumeSnapshotContent, snapcontent-d0e5bfba-9dcf-4851-bd9e-744f53d49ff7 skipped: it already exists in the cluster and is the same as the backed up version" logSource="pkg/restore/restore.go:1164" restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Restoring resource 'volumesnapshots.snapshot.storage.k8s.io' into namespace 'nginx-1-1502'" logSource="pkg/restore/restore.go:724" restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Getting client for snapshot.storage.k8s.io/v1beta1, Kind=VolumeSnapshot" logSource="pkg/restore/restore.go:768" restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Executing item action for volumesnapshots.snapshot.storage.k8s.io" logSource="pkg/restore/restore.go:1002" restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:08Z" level=info msg="Starting VolumeSnapshotRestoreItemAction" cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/restore/volumesnapshot_action.go:60" pluginName=velero-plugin-for-csi restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:09Z" level=info msg="Returning from VolumeSnapshotRestoreItemAction with no additionalItems" cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/restore/volumesnapshot_action.go:132" pluginName=velero-plugin-for-csi restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:09Z" level=info msg="Attempting to restore VolumeSnapshot: velero-ceph-ext-wxfmg" logSource="pkg/restore/restore.go:1107" restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:09Z" level=info msg="Restoring resource 'secrets' into namespace 'nginx-1-1502'" logSource="pkg/restore/restore.go:724" restore=velero/backup-nginx-1-1501-20210115122236

Probable issue?

It seems the PVC can't be restored. The original namespace that was backed up is nginx-1:

time="2021-01-15T06:53:09Z" level=info msg="Adding PVC nginx-1/ceph-ext as an additional item to restore" cmd=/velero logSource="pkg/restore/add_pvc_from_pod_action.go:58" pluginName=velero restore=velero/backup-nginx-1-1501-20210115122236

time="2021-01-15T06:53:09Z" level=warning msg="unable to restore additional item" additionalResource=persistentvolumeclaims additionalResourceName=ceph-ext additionalResourceNamespace=nginx-1 error="stat /tmp/761109646/resources/persistentvolumeclaims/namespaces/nginx-1/ceph-ext.json: no such file or directory" logSource="pkg/restore/restore.go:1034" restore=velero/backup-nginx-1-1501-20210115122236

VolumeSnapshot Not Pointing to Snapshot Content:

Below is an excerpt from the describe output of the restored VolumeSnapshot. The Spec still points to the PVC rather than the snapshot content, even though this is a restore operation.

Spec:
  Source:
    Persistent Volume Claim Name:  ceph-ext
  Volume Snapshot Class Name:      ocs-storagecluster-rbdplugin-snapclass
Status:
  Error:
    Message:     Failed to create snapshot content with error snapshot controller failed to update velero-ceph-ext-wxfmg on API server: cannot get claim from snapshot
    Time:        2021-01-15T06:53:09Z
  Ready To Use:  false
Events:
  Type     Reason                         Age    From                 Message
  ----     ------                         ----   ----                 -------
  Warning  SnapshotContentCreationFailed  9m44s  snapshot-controller  Failed to create snapshot content with error snapshot controller failed to update velero-ceph-ext-wxfmg on API server: cannot get claim from snapshot

Swanand Shende

Jan 15, 2021, 4:14:36 AM
to Rahul R, Project Velero
Hi Rahul,

The PR is for the CSI plugin repository: https://github.com/vmware-tanzu/velero-plugin-for-csi. I guess the PR has not yet been merged into the CSI plugin repository. Please share where you saw that this PR is available in the latest releases.

To answer your other question, VolumeSnapshotLocations are required if you wish to use the native storage provider snapshot functionality (non-CSI persistent volumes) to protect persistent volumes provisioned by those storage providers. VolumeSnapshotLocations play no part in protecting CSI volumes when the CSI feature flag is enabled during Velero installation.
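As a minimal sketch of an install that relies only on CSI snapshots, the command from earlier in this thread could be run with --use-volume-snapshots=false so that no default VolumeSnapshotLocation is created. The function name install_velero_csi_only is hypothetical; the bucket, credentials file, MinIO URL and versions are the ones quoted in this thread and would need adjusting for another setup.

```shell
# Hypothetical wrapper around the install command from this thread.
# Nothing runs until the function is called.
install_velero_csi_only() {
  # --use-volume-snapshots=false skips creating a default VolumeSnapshotLocation;
  # CSI snapshots are handled by the CSI plugin once EnableCSI is set.
  velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0,velero/velero-plugin-for-csi:v0.1.2 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000 \
    --image velero/velero:v1.5.2 \
    --features=EnableCSI
}
```

After calling install_velero_csi_only, running "velero snapshot-location get" should show whether any VolumeSnapshotLocation exists.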
      
Regards,
Swanand

Rahul R

Jan 15, 2021, 4:44:47 AM
to Project Velero
Thanks Swanand, I see that the PR is still to be merged.

Also, looking at the PR, and as you mentioned, the fix seems to address the namespace. But on my setup the VolumeSnapshot is not patched to point to the VolumeSnapshotContent, which seems like a different issue?

Thanks for confirming my understanding that a VolumeSnapshotLocation is not required for CSI snapshots. I had initially installed Velero with "--use-volume-snapshots", which led to the creation of a default VolumeSnapshotLocation, which I eventually deleted.

Swanand Shende

Jan 15, 2021, 7:23:22 AM
to Rahul R, Project Velero
Hi Rahul,

Your symptom, "VolumeSnapshot is not patched to point to VolumeSnapshotContent", stems from the root cause that has been fixed in PR https://github.com/vmware-tanzu/velero-plugin-for-csi/pull/82. I shall now try to explain how. Please correct me if there is a break in the logic.

There are two scenarios being discussed here. Please correct me if I am wrong.
  1. A backup of namespace nginx-1 is being restored into namespace nginx-1-1502.
  2. A backup of namespace nginx-1 is being restored into namespace nginx-1 (the original namespace).
Let us consider the first scenario, i.e. restore to an alternate namespace. Up to v0.1.2 (i.e. without the pending PR https://github.com/vmware-tanzu/velero-plugin-for-csi/pull/82 being merged), the CSI plugin code performs the static binding of VolumeSnapshot <-> VolumeSnapshotContent during a restore only if it does not find the VolumeSnapshot object in the destination namespace (the namespace into which the VolumeSnapshot and PVC are to be restored). Refer to the condition at https://github.com/vmware-tanzu/velero-plugin-for-csi/blob/v0.1.2/internal/restore/volumesnapshot_action.go#L71.

Having said that, it looks like there is a bug in the current implementation. Without the check for an alternate namespace (which is what the PR implements), the plugin checks during restore for the existence of the VolumeSnapshot object in the old namespace rather than the destination namespace, and it may well find it there. Hence the static binding of VolumeSnapshot <-> VolumeSnapshotContent never happens. The VolumeSnapshot object is restored into the new destination namespace exactly as it was in the backup, so it continues to point to the PVC. Then the PVC is restored (there is a similar bug there), and the PVC now points to the VolumeSnapshot object restored earlier. This creates a cyclical reference between the PVC and the VolumeSnapshot, with each pointing to the other as its source. The result is what you are observing today. So the suggested PR needs to be merged to get this working. Hope this nudges the reviewers to review this PR :)
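To check which side of this a restored VolumeSnapshot ended up on, one could inspect its spec.source, as in the rough sketch below. The function name snapshot_source is hypothetical. The two field names come from the VolumeSnapshot API: a statically bound snapshot would be expected to set volumeSnapshotContentName, whereas persistentVolumeClaimName matches the symptom described above.

```shell
# Hypothetical helper: print both possible sources of a VolumeSnapshot.
# $1 = namespace, $2 = VolumeSnapshot name.
# A missing field is printed as an empty value.
snapshot_source() {
  kubectl -n "$1" get volumesnapshot "$2" \
    -o jsonpath='pvc={.spec.source.persistentVolumeClaimName} content={.spec.source.volumeSnapshotContentName}{"\n"}'
}
```

For the restore discussed in this thread, the call would be: snapshot_source nginx-1-1502 velero-ceph-ext-wxfmg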

Let us take up the second scenario. The reasoning for the first scenario carries over. One additional thing to remember is that Velero traditionally does not overwrite resources. So if you back up a namespace and try to restore it into the same namespace, no existing objects are going to be overwritten. Try deleting the namespace and then restoring it. This time you will have the PVCs restored using the CSI snapshots.
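The delete-then-restore sequence could look roughly like the sketch below. The function name restore_into_original_namespace is hypothetical, and the backup name used in the example call is inferred from the restore name in the logs above, so treat it as an assumption.

```shell
# Hypothetical helper: $1 = backup name, $2 = namespace to restore.
restore_into_original_namespace() {
  # Velero skips resources that already exist, so remove the namespace first.
  # Caution (worth verifying on your cluster): if the VolumeSnapshotClass uses
  # deletionPolicy Delete, deleting the namespace may also delete the
  # underlying snapshots that the restore depends on.
  kubectl delete namespace "$2" --wait=true &&
    velero restore create --from-backup "$1" --include-namespaces "$2" --wait
}
```

Example call, with the assumed backup name: restore_into_original_namespace backup-nginx-1-1501 nginx-1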
    
You could try building the CSI plugin with the changes in the PR yourself (it is a really small change) and use the new image as the CSI plugin in Velero to test the outcome. For testing purposes, you could directly use this image (https://hub.docker.com/layers/catalogicsoftware/velero-plugin-for-csi/v0.1.2.2/images/sha256-513685da43912033d2cd70e8222a6f7c161df7e7fae3e50da9b446475ead3532?context=explore), which has the fix proposed in the suggested PR. Just use this image to update the init container in the Velero deployment, or specify this image name while installing the CSI plugin (velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.1.0,catalogicsoftware/velero-plugin-for-csi:v0.1.2.2 --bucket velero --secret-file ./credentials-velero --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000 --image velero/velero:v1.5.2 --features=EnableCSI).
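On an existing installation, one way to swap the plugin image without reinstalling might be the velero plugin subcommands, sketched below. The function name swap_csi_plugin is hypothetical; the image names in the example call are the ones mentioned in this thread.

```shell
# Hypothetical helper: $1 = plugin image to remove, $2 = plugin image to add.
swap_csi_plugin() {
  # Each plugin is an init container on the Velero deployment; remove the old
  # one, then add the replacement.
  velero plugin remove "$1" &&
    velero plugin add "$2"
}
```

Example call: swap_csi_plugin velero/velero-plugin-for-csi:v0.1.2 catalogicsoftware/velero-plugin-for-csi:v0.1.2.2. Afterwards, "velero plugin get" lists the plugins the server has loaded.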

Please share the results and any further questions. Hope this helps.

Regards,
Swanand

Rahul R

Jan 19, 2021, 10:59:12 PM
to Project Velero
Thanks much for the additional details! I am now trying to restore to the original namespace (after deleting it). That had failed earlier.
I am starting afresh and doing it again, and will share the outcome here.

Thanks,
Rahul

Bahadir Kiziltan

Oct 22, 2021, 5:52:27 AM
to Project Velero
Hello,

Can anyone confirm if the issue described here is addressed with CSI plug-in v0.2.0?

Thanks.
