Why is the backoff duration of a failed CertificateRequest hard-coded to an hour?

Yelei Wu

Dec 28, 2020, 1:34:52 AM
to cert-manager-dev
Hi there,

We have been using cert-manager to request certificates from Let's Encrypt and occasionally encounter a problem where Let's Encrypt cannot verify the DNS01 challenge response even though cert-manager's self-check succeeds. Deleting the CertificateRequest to trigger a retry always solves the problem. However, the default retry backoff for a failed CertificateRequest is hard-coded to an hour in cert-manager:

log.Info("the failed existing certificate request failed less than an hour ago, will be scheduled for reprocessing in an hour")

I'm wondering whether there is any specific reason for choosing this duration. If not, should we make it configurable?

Best Regards,

Maël Valais

Jun 17, 2021, 3:29:09 AM
to cert-manager-dev
That is a great question, so I dug into the arbitrary "one hour" that cert-manager uses. In trigger_controller.go:

// the amount of time after the LastFailureTime of a Certificate
// before the request should be retried.
// In future this should be replaced with a more dynamic exponential
// back-off algorithm.
retryAfterLastFailure = time.Hour

The commit itself gives no further explanation, other than suggesting that one hour seemed to be a reasonable backoff duration.

As a cert-manager user, I would probably expect an exponential backoff instead of a fixed one. And I would probably expect the backoff upper limit to be something like one hour. 
