Backup and Restore of Operators


Chris Johnson

May 3, 2021, 5:28:32 PM
to operator-framework-olm-dev
We're trying to develop a procedure for backing up and restoring OLM-managed operators that do not

We were thinking of the following process (assuming a single namespace with an Own-Namespace OperatorGroup):
1.  Back up the CatalogSource(s) image digest (pin the CatalogSource)
2.  Back up the OperatorGroup
3.  Set the Subscriptions to Manual
4.  Back up the Subscriptions.

Then to restore:
1.  Restore the CatalogSource(s) using the digest
2.  Restore the namespace and OperatorGroup
3.  Restore the Subscriptions.
4.  Set the Subscriptions to Automatic.

This would then trigger OLM to create the InstallPlans needed to return the namespace to the state it was in before the backup. With the Subscriptions set to Manual, there shouldn't be a race condition when OLM tries to resolve dependencies, since they would already be present. Does this seem like the right set of actions?
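
To make the "pin the CatalogSource" and "set to Manual / Automatic" steps concrete, here is a minimal Python sketch that edits manifests held as plain dicts (for example, loaded from `kubectl get -o json` output). The helper names `pin_catalog_source` and `set_approval` are hypothetical, and the sketch assumes a grpc-type CatalogSource that carries its image in `spec.image`; it is not an OLM API.

```python
import copy


def pin_catalog_source(catalog_source: dict, digest: str) -> dict:
    """Return a copy of a CatalogSource manifest with spec.image pinned by digest.

    Hypothetical helper: assumes the catalog image lives in spec.image.
    """
    pinned = copy.deepcopy(catalog_source)
    image = pinned["spec"]["image"]
    # Drop any existing digest, keeping the repository (and possibly a tag).
    repo = image.split("@", 1)[0]
    # A ":" after the last "/" is a tag; a ":" before it is a registry port.
    if repo.rfind(":") > repo.rfind("/"):
        repo = repo[: repo.rfind(":")]
    pinned["spec"]["image"] = f"{repo}@{digest}"
    return pinned


def set_approval(subscription: dict, approval: str) -> dict:
    """Return a copy of a Subscription with spec.installPlanApproval set."""
    if approval not in ("Manual", "Automatic"):
        raise ValueError(f"unsupported approval strategy: {approval}")
    updated = copy.deepcopy(subscription)
    updated["spec"]["installPlanApproval"] = approval
    return updated
```

Both helpers return copies, so the manifest captured for the backup is never mutated by the later switch back to Automatic.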

--Chris

Chris Johnson

May 4, 2021, 6:20:24 PM
to operator-framework-olm-dev
We initially tried to use Velero / Konveyor / OADP to do a backup and restore of the Operator namespace.  This does not work, because there are several states that are not implicitly restored, such as the clusterPermissions (ClusterRoles/Bindings) and any other resources that are at the Cluster scope that may be in the bundle.
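
As a rough illustration of why a namespace-only backup falls short, the Python sketch below splits a bundle's manifests into namespaced and cluster-scoped groups. The function name `split_by_scope` is hypothetical, and the set of kinds is a deliberately short, non-exhaustive assumption; a real bundle may ship other cluster-scoped kinds.

```python
# Illustrative, non-exhaustive set of cluster-scoped kinds (an assumption).
CLUSTER_SCOPED_KINDS = {
    "ClusterRole",
    "ClusterRoleBinding",
    "CustomResourceDefinition",
}


def split_by_scope(manifests):
    """Partition manifests into (namespaced, cluster_scoped) lists by kind."""
    namespaced, cluster_scoped = [], []
    for manifest in manifests:
        if manifest["kind"] in CLUSTER_SCOPED_KINDS:
            cluster_scoped.append(manifest)
        else:
            namespaced.append(manifest)
    return namespaced, cluster_scoped
```

Anything that lands in the second list lives outside the operator namespace, so a backup scoped to that namespace alone would not capture it.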

My previous post did not work.  We instead needed to sanitize the Subscriptions and artificially pause OLM...

To back up:
1.  Back up the CatalogSource(s) image digest (pin the CatalogSource)
2.  Back up the OperatorGroup
3.  Back up the Subscriptions

Then to restore:
1.  Restore the CatalogSource(s) using the digest
2.  Restore the namespace and OperatorGroup
3.  Create a "broken" Subscription (one that points to an invalid CatalogSource, for example).  This pauses the OLM SAT solver for the entire namespace.
4.  Restore the sanitized Subscriptions (stripped of all but the bare minimum metadata and spec, so that OLM does not go looking for InstallPlans and other state that is no longer there)
5.  Delete the "broken" subscription, allowing the good subscriptions to resolve.
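
Restore steps 3 and 4 can be sketched in Python as follows, again treating manifests as plain dicts. The helper names, the placeholder Subscription name `restore-pause`, and the choice of which `spec` fields count as the "bare minimum" are all assumptions made for illustration, not something confirmed in this thread.

```python
# Spec fields kept after sanitizing (an assumption about the "bare minimum").
KEEP_SPEC_FIELDS = (
    "channel",
    "name",
    "source",
    "sourceNamespace",
    "installPlanApproval",
)


def sanitize_subscription(subscription: dict) -> dict:
    """Strip status, annotations, and server-set metadata from a Subscription."""
    meta = subscription["metadata"]
    spec = subscription["spec"]
    return {
        "apiVersion": subscription.get("apiVersion", "operators.coreos.com/v1alpha1"),
        "kind": "Subscription",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": {k: spec[k] for k in KEEP_SPEC_FIELDS if k in spec},
    }


def broken_subscription(namespace: str, name: str = "restore-pause") -> dict:
    """Build a placeholder Subscription that points at a nonexistent CatalogSource."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "channel": "stable",
            "name": "does-not-exist",
            "source": "does-not-exist",  # hypothetical, deliberately invalid
            "sourceNamespace": namespace,
        },
    }
```

The placeholder would be applied before the sanitized Subscriptions and deleted afterwards, matching the order of steps 3 to 5.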

Here's a slack conversation on olm-dev:

Daniel Messer

May 31, 2021, 8:46:30 AM
to Chris Johnson, operator-framework-olm-dev
Hi Chris,

I'm late to the party, but after catching up on this, there seem to be two aspects that I didn't see covered in this thread or in the Slack conversation: the need to create a broken Subscription to pause the SAT solver, and the need to create sanitized Subscriptions. Can you explain the issues you experienced before you resorted to doing those specific things? What kind of race conditions did you see when you restored multiple Subscriptions at once? What problems appeared when you applied a `Subscription` manifest from a backup that potentially contained additional annotations and status blocks?

/D

--
Daniel Messer

Product Manager Operator Framework & Quay

Red Hat OpenShift
