Validate CR with CRD

Martin Choma

Apr 21, 2022, 10:55:17 AM
to Operator Framework
Hi,

I am looking for a way to test this condition in advance:

"All existing instances, or custom resources, that are associated with the serving versions of the CRD are valid when validated against the validation schema of the new CRD." [1]

I think the ability to validate a CR against a CRD, the same way I can validate XML against an XSD, would work for me. Basically, I would say this is the same as [2].

Is something like that possible?

Regards,
Martin Choma

Alex Greene

Apr 22, 2022, 10:35:23 AM
to Martin Choma, Operator Framework
Hello Martin,

Thanks for reaching out with this question! I don't know of a way to do this off cluster, but if your main concern is that "a new version of my operator which updates the CRD schema may invalidate existing CRs" there are practical steps you can take to avoid this issue:
  • Do not introduce required fields without introducing a new version of your CRD. This does not apply to required fields within a struct of an optional field.
  • Do not tighten validation on existing fields without introducing a new version of your CRD.
  • If you introduce a new version of your CRD, include a conversion webhook that is able to morph any version of your CRD into a different version. If the new version of your operator is the stored version, this approach works particularly well with the Kube-Storage-Version-Migrator project as all your objects will be updated to the new stored type.
  • If you introduce a new version of your CRD and you are not interested in a conversion webhook, you can introduce some migration logic in versions of your operator prior to marking the new CRD version as the stored version, but I don't recommend this approach.
As a shameless plug, the operator-lifecycle-manager project includes some logic which prevents upgrades that invalidate existing CRs, and the RukPak project has merged a pull request introducing a validatingWebhookConfiguration that prevents CRD upgrades that invalidate existing CRs (this may be made available outside of RukPak in the future, TBD). Neither of these solutions prevents the issue prior to releasing your operator, but they do protect your users' clusters.
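
The first two rules in the list above lend themselves to a mechanical check. As a minimal sketch (not something the thread says OLM or the Operator SDK provides), the Python function below compares the `openAPIV3Schema` of the same CRD version before and after a change and reports fields that became required. It assumes both schemas are already loaded as plain dicts; the name `newly_required` and the sample fields (`size`, `image`, `backup`) are hypothetical. It only follows object `properties` (not `items` or `additionalProperties`) and does not detect other tightened validation such as a narrower `enum` or `maximum`.

```python
def newly_required(old, new, path=""):
    """Return dotted paths that `new` requires but `old` did not.

    Properties that do not exist in `old` are not descended into, so a
    required field inside a newly added optional struct is not reported.
    """
    found = []
    old_required = set(old.get("required", []))
    for name in new.get("required", []):
        if name not in old_required:
            found.append(f"{path}.{name}" if path else name)
    old_props = old.get("properties", {})
    for name, sub in new.get("properties", {}).items():
        if name in old_props:
            child = f"{path}.{name}" if path else name
            found.extend(newly_required(old_props[name], sub, child))
    return found


# Hypothetical schemas for one CRD version, before and after a change.
old_schema = {
    "type": "object",
    "properties": {
        "spec": {
            "type": "object",
            "properties": {"size": {"type": "integer"}},
        }
    },
}
new_schema = {
    "type": "object",
    "properties": {
        "spec": {
            "type": "object",
            "required": ["image"],  # newly required: breaks stored CRs
            "properties": {
                "size": {"type": "integer"},
                "image": {"type": "string"},
                "backup": {  # new optional struct with its own required field
                    "type": "object",
                    "required": ["schedule"],
                    "properties": {"schedule": {"type": "string"}},
                },
            },
        }
    },
}

print(newly_required(old_schema, new_schema))  # ['spec.image']
```

A non-empty result would be a signal to introduce a new CRD version rather than change the existing one.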

I hope this helps!

Best,

Alex


--
Alexander Greene
He - Him - His
Senior Software Developer
IRC: agreene

Martin Choma

Apr 25, 2022, 5:55:28 AM
to Alex Greene, Operator Framework
Hi Alex,

Thank you very much for your response. Nicely summarized.  Responses inline below.

Regards,
Martin


On Fri, Apr 22, 2022 at 4:35 PM Alex Greene <agr...@redhat.com> wrote:
> Hello Martin,
>
> Thanks for reaching out with this question! I don't know of a way to do this off cluster, but if your main concern is that "a new version of my operator which updates the CRD schema may invalidate existing CRs" there are practical steps you can take to avoid this issue:

Validation on cluster would work for us as well, if it can be triggered easily: "I have the new CRD installed on the cluster; validate that a given CR complies with this CRD using some oc command." Would something like that be possible?

>   • Do not introduce required fields without introducing a new version of your CRD. This does not apply to required fields within a struct of an optional field.
>   • Do not tighten validation on existing fields without introducing a new version of your CRD.

Yes, we are aware of that limitation now. We are looking for the simplest way to check this automatically earlier in the development cycle, so that no such backward-incompatible change can get into an existing version of the CRD. As this would be a common issue for all Operators, I was expecting that the Operator Framework could provide some solution.

>   • If you introduce a new version of your CRD, include a conversion webhook that is able to morph any version of your CRD into a different version. If the new version of your operator is the stored version, this approach works particularly well with the Kube-Storage-Version-Migrator project as all your objects will be updated to the new stored type.
>   • If you introduce a new version of your CRD and you are not interested in a conversion webhook, you can introduce some migration logic in versions of your operator prior to marking the new CRD version as the stored version, but I don't recommend this approach.
> As a shameless plug, the operator-lifecycle-manager project includes some logic which prevents upgrades that invalidate existing CRs, and the RukPak project has merged a pull request introducing a validatingWebhookConfiguration that prevents CRD upgrades that invalidate existing CRs (this may be made available outside of RukPak in the future, TBD). Neither of these solutions prevents the issue prior to releasing your operator, but they do protect your users' clusters.

Exactly, we hit that logic after release. We introduced a required field whose default value was provided by the Operator itself, so we did not hit the problem during our standard tests. Only when the Operator upgrade process was initiated did the above-mentioned CRD validation fail. Now we are looking for the easiest way to check that automatically before release. I would be thankful for any hint.
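
The failure described above can be illustrated off cluster in a few lines of Python. This is a sketch under stated assumptions, not the check OLM performs: it looks only at `required` lists under object `properties`, and the schema, the CRs, and the function name `missing_required` are hypothetical. It shows why a CR stored before the change can fail the new schema while a CR that already carries the Operator-provided value passes.

```python
def missing_required(schema, obj, path=""):
    """Return dotted paths of fields that `schema` requires but `obj` lacks."""
    if not isinstance(obj, dict):
        return []
    missing = []
    for name in schema.get("required", []):
        if name not in obj:
            missing.append(f"{path}.{name}" if path else name)
    for name, sub in schema.get("properties", {}).items():
        if name in obj:
            child = f"{path}.{name}" if path else name
            missing.extend(missing_required(sub, obj[name], child))
    return missing


# Hypothetical new schema: `spec.image` has just become required.
new_schema = {
    "type": "object",
    "properties": {
        "spec": {
            "type": "object",
            "required": ["image"],
            "properties": {
                "size": {"type": "integer"},
                "image": {"type": "string"},
            },
        }
    },
}

# A CR created before the change, as it would be stored on the cluster.
stored_cr = {"spec": {"size": 3}}
# The same CR once the value the Operator fills in is present.
defaulted_cr = {"spec": {"size": 3, "image": "example/image:1.0"}}

print(missing_required(new_schema, stored_cr))     # ['spec.image']
print(missing_required(new_schema, defaulted_cr))  # []
```

The useful input for such a check is a set of sample CRs written for the previous release, not objects that have already been through the Operator's defaulting.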

David Lanouette

Apr 25, 2022, 8:00:20 AM
to Martin Choma, Alex Greene, Operator Framework
Hi Martin.
 
> We are looking for the simplest way to check this automatically earlier in the development cycle, so that no such backward-incompatible change can get into an existing version of the CRD.

I would think the simplest way to test this would be a unit and/or integration test. In the test, create a minimal version of the resource and try to apply it. It will get validated when it's deserialized from YAML to an object. If your CRD changes in an incompatible way, that deserialization will fail. If you run your tests often (and you should), you will catch any incompatible changes.

If you do find an incompatible change, then, as Alex said, you will want to create a conversion webhook that will translate the old version of the resource to the new version.  That usually means applying default values for new (required) fields.  Here is an article that takes you step-by-step through an example using the memcached sample project.
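
As a rough sketch of what "applying default values for new (required) fields" could look like, the Python below handles a conversion request as plain dicts. The `ConversionReview` request and response field names follow the `apiextensions.k8s.io/v1` API; everything else (the `v1beta1` version name, the `image` field, and its default value) is hypothetical, and the HTTPS server a real webhook needs is omitted.

```python
import copy

DEFAULT_IMAGE = "memcached:latest"  # hypothetical default for the new field


def convert_object(obj, desired_api_version):
    """Return a copy of `obj` expressed in `desired_api_version`."""
    converted = copy.deepcopy(obj)
    converted["apiVersion"] = desired_api_version
    if desired_api_version.endswith("/v1beta1"):
        # The (hypothetical) new version requires spec.image: fill it in
        # for objects written before the field existed.
        converted.setdefault("spec", {}).setdefault("image", DEFAULT_IMAGE)
    # Other directions only relabel the version in this sketch.
    return converted


def handle_conversion_review(review):
    """Build the ConversionReview response for a ConversionReview request."""
    request = review["request"]
    converted = [
        convert_object(obj, request["desiredAPIVersion"])
        for obj in request["objects"]
    ]
    return {
        "apiVersion": review["apiVersion"],
        "kind": "ConversionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "convertedObjects": converted,
            "result": {"status": "Success"},
        },
    }
```

Values that the user already set are left untouched; only a missing field receives the default.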


HTH.


David Lanouette

Principal Software Engineer

David.L...@Redhat.com   
M: 919-610-6656    



Alex Greene

Apr 25, 2022, 2:32:45 PM
to David Lanouette, Martin Choma, Operator Framework
AFAIK the operator-framework does not provide any way to test CRD backwards compatibility today. There are a few options though:
  • You can deploy OLM or possibly RukPak as part of your test suite and attempt an upgrade of your operator and check for CRD validation failures. Given that you ship your operator with OLM, I would recommend using OLM in your test suite.
  • If the validatingWebhookConfiguration is made available outside of RukPak, you may be able to deploy it and test the CRD changes.
  • The validate function behind the validatingWebhookConfiguration could be used to test your CRDs off cluster with fake clients / objects. I don't think this is supported today but I think it could work, so you may want to open an issue to add some level of support. Also, if you want to avoid using a fake, you would need to file an issue requesting changes to the code, and then possibly submit a POC. This could be done off cluster in unit tests.
  • You can create a test like David suggested.
I hope this helps,

Alex

Martin Choma

Apr 25, 2022, 11:30:21 PM
to David Lanouette, Alex Greene, Operator Framework
Hi David,

> I would think the simplest way to test this would be a unit and/or integration test. In the test, create a minimal version of the resource and try to apply it. It will get validated when it's deserialized from YAML to an object. If your CRD changes in an incompatible way, that deserialization will fail.

We do this a lot in our tests. Unfortunately, it did not catch the issue, I assume because the Operator provides a default value for the newly required field. It was only the CRD validation during the upgrade that was failing.

Regards,
Martin

Martin Choma

Apr 25, 2022, 11:38:23 PM
to Alex Greene, David Lanouette, Operator Framework
Hi Alex,

Are there any examples of the first two points you mention, so I can have a look and better understand what you mean?

Yes, I can create an issue. Which project should I create the issue against?

As I replied to David, we have tests like the one David suggested, but they did not catch the particular issue we faced.

Regards,
Martin

Alex Greene

Apr 27, 2022, 12:09:53 PM
to Martin Choma, David Lanouette, Operator Framework
> are there any examples of first two points you are mentioning, so I can have a look to understand better how you mean?
None that I can point to; this would be an in-house solution where you install the latest version of OLM, deploy your operator, create some CRs, and attempt an upgrade. Very heavy-handed.

> Yes, I can create an issue. Which project should I create issue against?

I might try to do this in the next week or so.

Best,

Alex

Martin Choma

Apr 28, 2022, 2:33:49 AM
to Alex Greene, David Lanouette, Operator Framework
> I might try to do this in the next week or so.

Thanks

Camila Macedo

May 10, 2022, 7:10:39 PM
to Martin Choma, Alex Greene, David Lanouette, Operator Framework, Rashmi Gottipati
Hi Martin, 

By looking at your comment:
> That was just CRD validation during the upgrade, which was failing.

The SDK has a command that allows us to test the upgrade. See `operator-sdk run bundle-upgrade`[1] 
OperatorHub CI[2] uses it to check if it is possible to upgrade the new bundle with what is shipped already.

Could this check not be done with that command?

Example (in the CI):
  • Use `operator-sdk olm install` to install OLM.
  • Use `operator-sdk run bundle <image>` to install the latest released bundle (e.g. one built from a tag).
  • Build a new bundle from the PR source code and push it to a registry with `make bundle-build bundle-push BUNDLE_IMG=<some-registry>/<name>-operator-bundle:<prID>`.
  • Then run `operator-sdk run bundle-upgrade <some-registry>/<name>-operator-bundle:<prID>`.
WDYT? Could this be a way to verify your scenario?
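
The CI steps above could be scripted roughly as follows. This is only a sketch: the commands are the ones listed in this message, plus a `kubectl apply` step that creates CRs before the upgrade (Alex's earlier reply mentions creating some CRs, and the check needs existing instances to validate). The image names and the sample CR path are hypothetical placeholders, and with `dry_run=True` the script only prints the commands.

```python
import shlex
import subprocess


def upgrade_check_commands(released_bundle, pr_bundle, sample_crs):
    """Return the upgrade-check steps, in order, as argv lists."""
    return [
        ["operator-sdk", "olm", "install"],
        ["operator-sdk", "run", "bundle", released_bundle],
        # Create CRs with the released schema so the upgrade has
        # existing instances to validate against the new CRD.
        ["kubectl", "apply", "-f", sample_crs],
        ["make", "bundle-build", "bundle-push", f"BUNDLE_IMG={pr_bundle}"],
        ["operator-sdk", "run", "bundle-upgrade", pr_bundle],
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    steps = upgrade_check_commands(
        released_bundle="registry.example.com/my-operator-bundle:v1.0.0",  # hypothetical
        pr_bundle="registry.example.com/my-operator-bundle:pr-123",        # hypothetical
        sample_crs="config/samples/",                                      # hypothetical path
    )
    run_steps(steps, dry_run=True)
```

The expectation, based on the thread, is that a CRD change which invalidates the sample CRs would surface as a failed final step; that behaviour is worth confirming against a known-bad change before relying on it.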

CAMILA MACEDO

SR. SOFTWARE ENGINEER 

RED HAT Operator framework

Red Hat UK

She / Her / Hers

IM: cmacedo

I respect your work-life balance. Therefore there is no need to answer this email out of your office hours.



