Frequent 502/503 errors from DigiCert Wyvern 2026h1 log


Michel Le Bihan

Oct 5, 2025, 11:35:35 AM
to Certificate Transparency Policy

Hi all,

I'm experiencing frequent "502 Bad Gateway" and "503 Service Temporarily Unavailable" errors when attempting to retrieve entries from https://wyvern.ct.digicert.com/2026h1/.

The errors occur inconsistently across different get-entries requests, making it impossible to reliably follow the growing tree. Even with retries and backoff, the frequency of failures is high enough that I'm falling behind.
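For context, my fetch loop looks roughly like the sketch below (Python with the requests library; the endpoint path is the standard RFC 6962 get-entries one, and the retry parameters are illustrative rather than exactly what I run):

    import time
    import requests

    LOG_URL = "https://wyvern.ct.digicert.com/2026h1/"

    def get_entries(start, end, max_retries=8, base_delay=1.0):
        # Fetch entries [start, end], retrying 502/503 with exponential backoff.
        url = LOG_URL + "ct/v1/get-entries"
        for attempt in range(max_retries):
            resp = requests.get(url, params={"start": start, "end": end}, timeout=30)
            if resp.status_code == 200:
                return resp.json()["entries"]
            if resp.status_code in (502, 503):
                # Transient gateway error: back off exponentially and try again.
                time.sleep(base_delay * (2 ** attempt))
                continue
            resp.raise_for_status()
        raise RuntimeError("get-entries %d-%d failed after %d retries" % (start, end, max_retries))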

Is anyone else seeing this behavior from this log? Are there any known issues or recommended strategies for handling this level of instability?

Thanks,
Michel Le Bihan


Carlos Joan Rafael Ibarra Lopez

Oct 6, 2025, 1:28:49 PM
to Michel Le Bihan, Certificate Transparency Policy, Certificate Transparency Operations

Chrome's monitoring is also running into this; we started seeing the error responses for get-entries around 2025-05-05 2:00 UTC. By 2025-05-06 1:00 UTC this put the log under the 99% availability requirement.
Can DigiCert please investigate this issue and provide an update following the incident response process?

Thanks,
Carlos, on behalf of the Chrome CT team


Carlos Joan Rafael Ibarra Lopez

Oct 6, 2025, 1:35:55 PM
to Michel Le Bihan, Certificate Transparency Policy, Certificate Transparency Operations
Apologies, the dates should have been 2025-09-05 2:00 UTC and 2025-09-06 1:00 UTC.

Carlos Joan Rafael Ibarra Lopez

Oct 6, 2025, 1:37:21 PM
to Michel Le Bihan, Certificate Transparency Policy, Certificate Transparency Operations
And apologies again, that correction should have been 2025-10-05 2:00 UTC and 2025-10-06 1:00 UTC.

Jeremy Rowley

Oct 6, 2025, 1:58:51 PM
to Carlos Joan Rafael Ibarra Lopez, Michel Le Bihan, Certificate Transparency Policy, Certificate Transparency Operations

Acknowledged – we are looking at it now.

Rick Roos

Oct 6, 2025, 10:39:25 PM
to Carlos Joan Rafael Ibarra Lopez, Michel Le Bihan, Certificate Transparency Policy, Certificate Transparency Operations
Hi Carlos, you should be seeing a recovery on that endpoint as of now. It appears that as the load on that log suddenly increased (probably due to Let's Encrypt's rollover to using the 2026 logs), the CTile pods did not scale up appropriately to handle the new load. Those have been scaled up and are now returning responses.

Also, we use New Relic synthetic checks on that endpoint to get alerts when it is down. In this case the failure threshold was set too high relative to the checking frequency, creating a window in which the alerts would not be triggered as long as a few requests made it through. This is also being adjusted.
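To illustrate the gap (the numbers below are hypothetical and not our actual New Relic settings): when an alert requires several consecutive failed checks, a single successful request resets the counter, so an endpoint that fails only part of the time can go unalerted for hours.

    import random

    CHECK_INTERVAL_MIN = 5     # hypothetical: one synthetic check every 5 minutes
    FAILURES_TO_ALERT = 5      # hypothetical: alert only after 5 consecutive failures
    FAILURE_RATE = 0.5         # half of get-entries requests fail

    random.seed(0)
    consecutive = 0
    for check in range(24 * 60 // CHECK_INTERVAL_MIN):    # one day of checks
        failed = random.random() < FAILURE_RATE
        consecutive = consecutive + 1 if failed else 0    # any success resets the streak
        if consecutive >= FAILURES_TO_ALERT:
            print("alert after %d minutes" % ((check + 1) * CHECK_INTERVAL_MIN))
            break
    else:
        print("no alert fired all day despite ~50% of requests failing")

Lowering the threshold relative to the check frequency closes that window, which is the adjustment being made.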

Thanks,
Rick

