All,
With this message, we would like to inform the community of an upcoming maintenance window affecting all Mammoth sharded CT logs and to provide an update on a recent availability incident.
Scheduled Maintenance
Sectigo has a scheduled maintenance window on March 16th, 2024 starting at 09:00 UTC. The maintenance window is expected to take no longer than 8 hours. During this maintenance window the logs will be completely unavailable.
Incident Report
On March 5th we were notified by the Chrome Certificate Transparency team that they were observing intermittent availability issues with all Mammoth CT log shards.
Our own logging shows that an anomaly started on March 4th at 16:20 UTC. By March 6th at 18:13, our logging reported a return to a normal state.
Unfortunately, we were unable to confirm a definitive root cause of the availability issue. Our internal monitoring system did not detect any issues with the CT logs themselves; however, several tests performed from outside of our network confirmed the issues reported by the Chrome CT team.
Upon reviewing our system logs, we discovered that the incident started right after we increased the available memory and CPU limits for all our CT logs. We didn’t believe that those increases themselves could have caused this issue, but this discovery led us to investigate the control plane nodes of our Kubernetes cluster. We performed restarts of the control plane nodes, which resolved the issue.
Based on further investigation, it’s our belief that after our Kubernetes cluster renewed several certificates, at least some of the control plane nodes did not automatically switch to using these renewed certificates. Our best guess is that the intermittent availability issues were due to the previous certificates still being in use after they had expired. Automation is great when it works, but when it fails unexpectedly it can sometimes be harder to detect and diagnose the problem than it would have been if a manual mechanism had been used instead!
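As an illustration of the suspected failure mode (a certificate that is still being served after its expiry date), the following is a minimal Python sketch of an expiry check. It is not taken from Sectigo's tooling: the function names and the sample dates are hypothetical, and it assumes the certificate's notAfter value is already available as a string in the format returned by Python's ssl module (for example "Mar 10 12:00:00 2024 GMT").

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the seconds remaining before `not_after`; negative if already past.

    `not_after` uses the certificate time format understood by
    ssl.cert_time_to_seconds, e.g. "Mar 10 12:00:00 2024 GMT".
    `now` is seconds since the epoch; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) - now


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """Return True if the notAfter time is at or before `now`."""
    return seconds_until_expiry(not_after, now) <= 0


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from the incident.
    check_time = ssl.cert_time_to_seconds("Mar 16 09:00:00 2024 GMT")
    for label, not_after in [
        ("old certificate", "Mar 10 12:00:00 2024 GMT"),
        ("renewed certificate", "Mar 20 12:00:00 2024 GMT"),
    ]:
        state = "EXPIRED" if is_expired(not_after, check_time) else "valid"
        print(f"{label}: {state}")
```

A check of this kind is only useful if it is run against the certificate a component is actually serving, rather than the renewed file on disk, since the two can differ until the component reloads or restarts.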
Regards,
Martijn Katerbarg
Sectigo