Operand rollback due to failed upgrade

Craig Brookes

Oct 27, 2020, 12:59:21 PM
to Operator Framework, Wei Li, John Frizelle
Hey Guys,

I am looking for feedback and advice on how an operator should handle rollback when it identifies a failed upgrade.
Assume the operator can spot a failed upgrade by monitoring metrics such as error rate. Since the currently installed version of the operator is responsible for upgrading the operand, is it reasonable that it should also contain the logic to roll that upgrade back? I can't see another way of it working.
One problem I see is what happens on the next reconcile once the rollback has been performed. Should the operator be able to reconcile both versions, or should there be some signal to OLM that a rollback has happened and that it should put the previous version of the operator in place?

--
Craig Brookes
Integreatly 
@maleck13 Github

Daniel Messer

Nov 4, 2020, 5:51:51 AM
to Craig Brookes, Operator Framework, Wei Li, John Frizelle
Hi Craig,

In practice, migration requires that your Operator understands and can reconcile both the most up-to-date version and the previous version of your Operand and CRDs. If an Operand update fails, the Operator should make the workload fall back to the previous version. If an Operator update fails, nothing should have been touched yet, but this is impossible for OLM to detect.
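
As a rough illustration of what "reconcile both versions and fall back on failure" could look like, here is a language-agnostic sketch written in Python. Everything in it is an assumption made for the example: the version strings, the `OperandState` fields, and the `rolled_back` marker are hypothetical and not part of OLM or any operator SDK. The marker is one possible way to stop the next reconcile from immediately retrying the upgrade that just failed.

```python
from dataclasses import dataclass

# Hypothetical version labels; a real operator would take these from its
# own release metadata.
CURRENT_VERSION = "2.0.0"
PREVIOUS_VERSION = "1.9.0"
SUPPORTED_VERSIONS = {CURRENT_VERSION, PREVIOUS_VERSION}


@dataclass
class OperandState:
    """Hypothetical snapshot of what the operator observes about its operand."""
    running_version: str       # operand version currently deployed
    healthy: bool              # e.g. derived from error-rate metrics
    rolled_back: bool = False  # set once a failed upgrade has been reverted


def desired_version(state: OperandState) -> str:
    """Decide which operand version this reconcile pass should converge on."""
    if state.running_version not in SUPPORTED_VERSIONS:
        raise ValueError(f"cannot reconcile operand version {state.running_version}")
    if state.rolled_back:
        # A rollback already happened: stay on the previous version
        # instead of retrying the upgrade on every reconcile.
        return PREVIOUS_VERSION
    if state.running_version == CURRENT_VERSION and not state.healthy:
        # The upgrade was applied but the operand is failing: fall back.
        return PREVIOUS_VERSION
    return CURRENT_VERSION


def reconcile(state: OperandState) -> OperandState:
    """Return the state the operator converges on after one pass."""
    target = desired_version(state)
    just_rolled_back = (
        state.running_version == CURRENT_VERSION and target == PREVIOUS_VERSION
    )
    # A real operator would apply manifests here; the sketch only records
    # the outcome. Health is re-observed on the next pass, not assumed.
    return OperandState(
        running_version=target,
        healthy=state.healthy,
        rolled_back=state.rolled_back or just_rolled_back,
    )
```

With this shape, a failing operand on the current version is moved back to the previous version, and later reconcile passes keep it there until something (a fixed release, for instance) clears the marker.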

This complexity is why OLM currently doesn't support downgrades. Failed Operand upgrades should be fixed by downgrading the Operand. Failed Operator upgrades should be fixed by shipping a subsequent release that does not have the bug that caused the failure or, if a fix is not possible, by publishing an update to the upgrade graph that removes the path between the versions. OLM is investing significantly in the package and catalog tooling to support this, so Operator authors don't have to blow up their migration code.
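
To make the "remove the path between the versions" option concrete, here is a small Python sketch that models an upgrade graph as a plain dictionary of edges. This is only a model of the idea: the dictionary layout, the version numbers, and the helper names `upgrade_targets` and `without_edge` are invented for the example and are not OLM's catalog format or API.

```python
from collections import deque


def upgrade_targets(graph, start):
    """Return every version reachable from `start` by following upgrade edges."""
    seen = set()
    queue = deque([start])
    while queue:
        version = queue.popleft()
        for nxt in graph.get(version, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def without_edge(graph, src, dst):
    """Return a copy of `graph` with the upgrade path src -> dst removed."""
    return {
        version: {t for t in targets if not (version == src and t == dst)}
        for version, targets in graph.items()
    }


# Hypothetical graph: 1.0.0 -> 1.1.0 -> 1.2.0, where the 1.1.0 -> 1.2.0
# upgrade turned out to be broken.
graph = {
    "1.0.0": {"1.1.0"},
    "1.1.0": {"1.2.0"},
    "1.2.0": set(),
}

fixed = without_edge(graph, "1.1.0", "1.2.0")
```

In this model, `upgrade_targets(graph, "1.0.0")` includes `1.2.0`, while `upgrade_targets(fixed, "1.0.0")` does not, so installations on older versions are no longer offered the broken upgrade.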

HTH,
Daniel

--
Daniel Messer

Product Manager Operator Framework & Quay

Red Hat OpenShift
