Help! Backup stuck InProgress, using restic and PV AzureFile

Jean-Bernard Massicotte

Sep 11, 2020, 1:41:51 PM
to Project Velero
Desperately looking for help on the problem below so hopefully this is the right forum.

First, here is what I did:
  • Installed velero in Azure AKS, with use-restic flag on, 1 Gi memory
  • Deployed an app that mounts 3 ManagedDisk PVs and 1 AzureFile PV
  • I added the AzureFile PV name to the application pod annotation (backup.velero.io/backup-volumes=…)
  • I added the mountOption nouser_xattr to the AzureFile storageclass
  • I started a backup: velero backup create backup1 --include-namespaces=<mynamespace>
Then here is what I see:
  • 'velero backup describe' shows the backup as InProgress with no errors and no warnings; mynamespace is included, 1 item backed up out of 62, and 1 new Restic Backup
  • Azure Portal shows a restic folder and a namespace subfolder created under the Azure container provided in velero install
  • 'kubectl logs' for the velero pod shows no errors; the last log line says 'Initializing restic repository'. Then nothing.
I have been at this for a week. Any help appreciated!!!
I can attach more info if anybody is willing to help.

-JB

Nolan Brubaker

Sep 22, 2020, 12:55:36 PM
to Project Velero
Hi JB,

The value to use in the pod annotation is the volume name within the pod spec, not the PV name. This is because Velero's restic support has to get at the data through the pod, not directly on the PV.

This field is at the path `spec.containers[].volumeMounts[].name`, which has to match the volume's own entry at `spec.volumes[].name`.
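
To illustrate where that name lives, here is a minimal sketch of a pod manifest. Every name in it (`my-app`, `shared-files`, `azurefile-pvc`, the image and mount path) is a hypothetical placeholder, not something taken from JB's deployment. The point is only that the annotation value matches the pod-level volume name, not the PVC or PV name.

```yaml
# Hypothetical pod; all names below are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: mynamespace
  annotations:
    # The value is the pod-level volume name, not the PV or PVC name.
    backup.velero.io/backup-volumes: shared-files
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: shared-files        # same name as in spec.volumes below
          mountPath: /data
  volumes:
    - name: shared-files            # this is the name the annotation refers to
      persistentVolumeClaim:
        claimName: azurefile-pvc    # the PVC/PV name is NOT used in the annotation
```

Several volumes can be listed in the annotation as a comma-separated value.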

Jean-Bernard Massicotte

Oct 16, 2020, 12:37:26 PM
to Project Velero
hm... true, and thank you!

BTW the flaw you mentioned is real, but it was not the reason for the never-ending InProgress situation. It turns out the restic daemonset was not running on all the VMs in my cluster. You see, I use taints and tolerations to force some apps to run on some nodes, and other taints and tolerations to force another app to run elsewhere. These taints are what prevented restic from being scheduled everywhere. I added the needed toleration to the daemonset, and now it is running everywhere.
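
For anyone hitting the same thing, the fix could look roughly like the fragment below. It is a sketch only: the taint key, value, and effect (`dedicated`, `app-a`, `NoSchedule`) are made-up examples, since the actual taints used in this cluster are not given in the thread. It assumes a default install where the daemonset is named `restic` in the `velero` namespace.

```yaml
# Fragment of the restic DaemonSet spec (assumed: name "restic", namespace "velero").
spec:
  template:
    spec:
      tolerations:
        # Hypothetical taint; replace key/value/effect with the taints on your nodes.
        # Add one entry per taint that restic pods must tolerate.
        - key: dedicated
          operator: Equal
          value: app-a
          effect: NoSchedule
```

Once the tolerations are in place, the daemonset should report one restic pod per node that hosts pods with volumes to back up.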
