Avoid concurrent reconciles on same CR


Olivier Chantrel

Jul 28, 2021, 6:06:10 AM
to Operator Framework
Hi, 
I developed a Go operator and noticed that the Reconcile method is called two or more times for a single event (creation, update or deletion) on a CR. It seems to depend on the number of nodes in the K8s cluster: I don't see this behaviour locally on CRC, perhaps because it is a one-node cluster, but when the operator is deployed on the production cluster, events appear in parallel for the same resource.
This is not really an issue, because the operator is built to handle it correctly, but it produces error logs. For example, on delete I remove external resources related to the CR in the finalize() method; once one reconcile has deleted the external resources, the next ones fail because those resources no longer exist.
I thought this was controlled by the MaxConcurrentReconciles parameter in my custom controller, but setting it makes no difference.

Thanks for your help,
Olivier 

David Lanouette

Jul 28, 2021, 4:32:59 PM
to Operator Framework
Do you update the CR at all in your operator (set the status, etc.)? If you do, that counts as an update, and your operator will be called again to reconcile the change.

Olivier Chantrel

Jul 29, 2021, 5:33:06 AM
to Operator Framework
First of all, thanks for your replies.
I confirm I have only one replica in the config file. 
I did notice that setting the status on a CR launches a new reconciliation, but in this case I am dealing with the finalize() method, with no update to the status or other subresources.

Regards,
Olivier

Zvonko Kaiser

Jul 29, 2021, 5:51:01 AM
to Olivier Chantrel, Operator Framework
You could add a GenerationChangedPredicate, which discards update events that did not change the object's metadata.generation. 
Status is usually a subresource, and updating it does not change the generation of an object. 
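
A minimal, stdlib-only sketch of the filtering idea, assuming a simplified object that carries only metadata.generation. It mimics the Update check of controller-runtime's predicate.GenerationChangedPredicate (an update passes only if the generation changed); the `object`, `updateEvent` and `generationChanged` names are hypothetical and are not the controller-runtime API. In a real operator you would pass the predicate itself, e.g. `.For(&myv1.MyCR{}, builder.WithPredicates(predicate.GenerationChangedPredicate{}))`, where `myv1.MyCR` stands in for your own CR type.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes object; only
// metadata.generation matters for this filter.
type object struct {
	Name       string
	Generation int64
}

// updateEvent is a hypothetical stand-in for an update event carrying
// the old and new versions of the object.
type updateEvent struct {
	Old, New *object
}

// generationChanged mirrors the Update check of a generation-based
// predicate: drop the event unless metadata.generation differs.
func generationChanged(e updateEvent) bool {
	if e.Old == nil || e.New == nil {
		return false
	}
	return e.Old.Generation != e.New.Generation
}

func main() {
	// A status-only update (status subresource): generation unchanged.
	statusOnly := updateEvent{
		Old: &object{Name: "my-cr", Generation: 3},
		New: &object{Name: "my-cr", Generation: 3},
	}
	// A spec change: the API server bumps the generation.
	specChange := updateEvent{
		Old: &object{Name: "my-cr", Generation: 3},
		New: &object{Name: "my-cr", Generation: 4},
	}
	fmt.Println(generationChanged(statusOnly)) // false: no reconcile
	fmt.Println(generationChanged(specChange)) // true: reconcile
}
```

Create and delete events are not affected by this check; only updates are filtered.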

See this for more information, including how to filter out further events that you do not want to trigger a reconciliation: 


Regards, 
Zvonko

--
You received this message because you are subscribed to the Google Groups "Operator Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to operator-framew...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/operator-framework/8472d3ba-6a52-4315-ae6a-687c81ba4ed3n%40googlegroups.com.

David Lanouette

Jul 29, 2021, 8:50:36 AM
to Operator Framework
>  but in the current case I deal with finalize() method with no update on status or other subresources.

My understanding is that finalize() only gets called when your resource is deleted, not when it is updated. See the kubebuilder docs on finalizers for more details.

Camila Macedo

Aug 2, 2021, 7:07:36 PM
to David Lanouette, Operator Framework
Hi David, 

I'd suggest ensuring that your variable holds the latest state of the resource, i.e. re-fetching (getting) it before performing the actions.  

Example:   
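// Check if the deployment already exists; if not, create a new one.
// (errors is k8s.io/apimachinery/pkg/api/errors; types is k8s.io/apimachinery/pkg/types;
// err is declared earlier in Reconcile)
found := &appsv1.Deployment{}
err = r.Get(ctx, types.NamespacedName{Name: memcached.Name, Namespace: memcached.Namespace}, found)
if err != nil && errors.IsNotFound(err) {
	// {... code: create the Deployment}
} else if err != nil {
	// {... code: return the error so the request is requeued}
}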

Then, before doing anything with the Deployment stored in the `found` variable, we get it again to ensure that we have its latest version/state. Between the first time we fetch it from the cluster and the moment we use it, it may have changed or may no longer exist there; the variable only holds the data as it was when we requested it from the k8s API via the client.

So, if one reconcile has already deleted the external resource, the next ones won't try to delete it again, because they first check that it no longer exists. In that scenario, you can stop the reconcile by returning, or move on to the next step/operation (if required). It is like a "loop" that is executed until you have ensured that the desired state is applied to the cluster. 

The finalizer is useful when you want to ensure that an action/operation is performed before the CR is deleted: the CR cannot be deleted until that requirement is satisfied. 

Also, note that MaxConcurrentReconciles already defaults to one (1) (see https://pkg.go.dev/github.com/kubernetes-sigs/controller-runtime/pkg/controller#Options and https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/controller/controller.go#L109). So if you change the code to set this option explicitly to 1, it should not make any difference. 
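
Why the option has no visible effect here can be sketched with a simplified, hypothetical re-implementation of the deduplication rules of client-go's workqueue, which controller-runtime controllers use: events for a key that is already queued collapse into one entry, and a key held by a worker is not handed to a second worker; it is re-queued once the first worker calls Done. So, within a single manager process, reconciles of the same CR run one after another even when MaxConcurrentReconciles is greater than 1 (more workers only let different keys be processed in parallel). This is a stdlib-only sketch under those assumptions, not the client-go code; `keyQueue` and its methods are illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a simplified, hypothetical re-implementation of the
// dedup rules of client-go's workqueue.
type keyQueue struct {
	mu         sync.Mutex
	cond       *sync.Cond
	queue      []string
	dirty      map[string]bool // waiting to be processed
	processing map[string]bool // currently held by a worker
}

func newKeyQueue() *keyQueue {
	q := &keyQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add enqueues key unless it is already waiting; if a worker holds it,
// it is only marked dirty and re-queued when that worker calls Done.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if q.processing[key] {
		return
	}
	q.queue = append(q.queue, key)
	q.cond.Signal()
}

// Get blocks until a key is available and marks it as processing.
func (q *keyQueue) Get() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.queue) == 0 {
		q.cond.Wait()
	}
	key := q.queue[0]
	q.queue = q.queue[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key
}

// Done releases key; if events arrived meanwhile, it is queued once more.
func (q *keyQueue) Done(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	delete(q.processing, key)
	if q.dirty[key] {
		q.queue = append(q.queue, key)
		q.cond.Signal()
	}
}

// Len reports how many keys are waiting.
func (q *keyQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.queue)
}

func main() {
	q := newKeyQueue()
	// Three events for the same CR before any worker runs: one entry.
	q.Add("default/my-cr")
	q.Add("default/my-cr")
	q.Add("default/my-cr")
	fmt.Println(q.Len()) // 1

	// A worker starts reconciling default/my-cr, then a new event arrives.
	key := q.Get()
	q.Add("default/my-cr")
	fmt.Println(q.Len()) // 0: not handed to a second worker
	q.Done(key)
	fmt.Println(q.Len()) // 1: reconciled again afterwards, sequentially
}
```

Under this model, repeated reconciles of one CR come from new events arriving (status or finalizer updates, for example) rather than from two workers running at once; truly parallel reconciles of the same CR would need more than one controller process watching it.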

I hope that it helps you. 

Cheers, 

CAMILA MACEDO

SR. SOFTWARE ENGINEER 

RED HAT Operator framework

Red Hat UK

She / Her / Hers

IM: cmacedo




