Problems monitoring Yeti2025

Rob Stradling

Apr 24, 2025, 11:37:56 AM
to Certificate Transparency Policy
For the past week or two, crt.sh has been hitting rate limits that have prevented it from getting entries from Yeti2025 quickly enough, resulting in a backlog that is currently around 90 million entries.

In contrast, CertSpotter currently has no backlog for Yeti2025 (according to https://sslmate.com/resources/certspotter_stats).

Can anyone advise me on the best strategy for monitoring this log (i.e., number of concurrent get-entries requests, requests per second, and batch size)?

crt.sh is also struggling somewhat with Wyvern2025h2 and Sphinx2025h2, so I'd appreciate advice on the best strategy for monitoring these logs as well.

Thanks!

Andrew Ayer

Apr 24, 2025, 12:06:34 PM
to Rob Stradling, 'Rob Stradling' via Certificate Transparency Policy
Hi Rob,

Monitoring these three logs has been a struggle for us as well. We eventually settled on 4 parallel get-entries requests with a batch size of 64. Unfortunately, we have to rotate between 4 different source IP addresses to avoid rate limits. This is not great since not everyone has that many IPv4 addresses available.

One of the problems with Wyvern2025h2 and Sphinx2025h2 is that get-entries frequently returns an error (such as 500 Internal Server Error or 504 Gateway Timeout) that necessitates a retry.
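
For illustration, here is a minimal Go sketch of the fetch pattern described above -- 4 workers, batch size 64, retry with backoff on 5xx -- with our IP rotation omitted. The log URL is only an example, and the backoff policy is a simplification rather than exactly what we run:

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
        "sync"
        "time"
    )

    // getEntriesResponse models the RFC 6962 get-entries JSON body.
    type getEntriesResponse struct {
        Entries []struct {
            LeafInput string `json:"leaf_input"`
            ExtraData string `json:"extra_data"`
        } `json:"entries"`
    }

    const (
        logURL    = "https://yeti2025.ct.digicert.com/log" // example base URL
        batchSize = 64
        workers   = 4
    )

    // fetchRange retrieves entries [start, end], retrying with exponential
    // backoff on transport errors and non-200 responses (e.g. 500, 504).
    func fetchRange(client *http.Client, start, end int64) (*getEntriesResponse, error) {
        url := fmt.Sprintf("%s/ct/v1/get-entries?start=%d&end=%d", logURL, start, end)
        for attempt := 0; attempt < 5; attempt++ {
            resp, err := client.Get(url)
            if err != nil {
                time.Sleep(time.Second << attempt)
                continue
            }
            if resp.StatusCode != http.StatusOK {
                resp.Body.Close()
                time.Sleep(time.Second << attempt)
                continue
            }
            var body getEntriesResponse
            err = json.NewDecoder(resp.Body).Decode(&body)
            resp.Body.Close()
            if err != nil {
                return nil, err
            }
            // NB: a log may return fewer entries than requested; a real
            // monitor must re-request the remainder (not shown here).
            return &body, nil
        }
        return nil, fmt.Errorf("get-entries %d-%d: retries exhausted", start, end)
    }

    func main() {
        client := &http.Client{Timeout: 30 * time.Second}
        ranges := make(chan int64)
        var wg sync.WaitGroup
        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for start := range ranges {
                    body, err := fetchRange(client, start, start+batchSize-1)
                    if err != nil {
                        fmt.Println(err)
                        continue
                    }
                    fmt.Printf("got %d entries at index %d\n", len(body.Entries), start)
                }
            }()
        }
        // Dispatch the first ten batches as a demonstration.
        for start := int64(0); start < 10*batchSize; start += batchSize {
            ranges <- start
        }
        close(ranges)
        wg.Wait()
    }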

Regards,
Andrew

Rob Stradling

Apr 24, 2025, 12:26:39 PM
to Certificate Transparency Policy, Andrew Ayer, 'Rob Stradling' via Certificate Transparency Policy, Rob Stradling
> Unfortunately, we have to rotate between 4 different source IP addresses to avoid rate limits. This is not great since not everyone has that many IPv4 addresses available.

Thanks Andrew.  I suspected as much.  crt.sh doesn't have the luxury of multiple source IP addresses, unfortunately.  :-(

Pierre Barre

Apr 24, 2025, 4:00:48 PM
to Rob Stradling, Certificate Transparency Policy, Andrew Ayer
Hi Rob,

We’re doing the same for Merklemap, pooling multiple IPs behind unified fetch logic and shared state for error handling.

Best,
Pierre

Bas Westerbaan

Apr 25, 2025, 10:04:44 AM
to Pierre Barre, Rob Stradling, Certificate Transparency Policy, Andrew Ayer
Hi Rob,

We also can't keep up. Our backlog is about 100 million entries now. We use a single IP and ingest at about 70 entries/s, which is below the ~120 entries/s needed to keep pace with the log's growth.
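
Back-of-the-envelope, using the rough numbers above:

    package main

    import "fmt"

    func main() {
        const growth = 120.0 // approx. entries/s added to Yeti2025
        const ingest = 70.0  // approx. entries/s one IP can fetch under the rate limit
        deficit := growth - ingest
        fmt.Printf("backlog grows by ~%.0f entries/s, ~%.1fM entries/day\n",
            deficit, deficit*86400/1e6)
        // Output: backlog grows by ~50 entries/s, ~4.3M entries/day
    }

So from a single IP the backlog only ever gets worse.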

I do not think it's acceptable to ask monitors to use multiple IPs for scraping.

By the way, we noticed that Yeti is, surprisingly, also rate-limiting get-sth.

Best,

 Bas

Rob Stradling

Apr 25, 2025, 11:55:29 AM
to Certificate Transparency Policy, Bas Westerbaan, Rob Stradling, Andrew Ayer, Pierre Barre
> I do not think it's acceptable to ask monitors to use multiple IPs for scraping.

Thanks Bas.  I agree.  Monitors should be able to monitor any log without having to use DDoS techniques to circumvent rate limits.

If raising the rate limits is not possible, then in my view the correct solution to this problem is for the log operator to slow the growth of the log by imposing a tighter global rate limit on submissions.

Perhaps this is something that should be addressed by the browser CT policies?

Pierre Barre

Apr 25, 2025, 1:39:27 PM
to Rob Stradling, Certificate Transparency Policy, Bas Westerbaan, Andrew Ayer
Wouldn’t a long-term solution be some kind of authentication that is not an IP address? That’d allow everyone to pull enough while preventing “bad actors” and rogue scripts from pushing unnecessary load.

Best,
Pierre

Bas Westerbaan

Apr 25, 2025, 5:51:33 PM
to Pierre Barre, Rob Stradling, Certificate Transparency Policy, Andrew Ayer
What would be the criterion to accept a new credential?

If any new keypair would be accepted and rate-limited separately, then that'd be strictly less effective than IP-based limits: at least IPs have a tangible cost to acquire.

If, on the other hand, we were very strict and only admitted publicly established monitors, that would hurt trust in the transparency of the whole system.

Of course there might be something in between.

However, I think Static CT is the obvious first step instead of authentication. Let's hope that'll be enough for now.
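
For context on why I expect it to help: a static-ct-api (c2sp.org/static-ct-api) log serves its contents as immutable, cacheable files rather than through a rate-limited query endpoint, so the read path can sit behind a CDN. A monitor's fetch loop reduces to plain HTTP GETs; a rough sketch, with a placeholder URL and the tile parsing omitted:

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // Monitoring prefix of a hypothetical static-ct-api log.
        const prefix = "https://static-log.example.com"

        // The checkpoint is a small, signed, static file describing the
        // current tree head.
        resp, err := http.Get(prefix + "/checkpoint")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Printf("checkpoint:\n%s", body)

        // Entry data is served as immutable tiles under /tile/data/<N>,
        // which a CDN can cache indefinitely, so per-IP rate limits on a
        // get-entries endpoint have no equivalent here.
    }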

Best,

 Bas


Joe DeBlasio

Apr 25, 2025, 7:18:50 PM
to Bas Westerbaan, Pierre Barre, Rob Stradling, Certificate Transparency Policy, Andrew Ayer
I'm responding informally, without having consulted with the rest of my team, and on a Friday afternoon. This is perhaps foolish. Please do not consider this as an official policy communication from Chrome.

Restrictive rate limits are, in my opinion, not consistent with the goals of the transparency ecosystem. CT monitors should typically be well short of triggering any limits, since they play an essential and intended role in the ecosystem. I'm similarly quite uncomfortable with authentication, for the reasons Bas mentioned. Whether we should encourage rate limits primarily on submission is also somewhat tricky -- doing so may better distribute load from certificate issuance across the ecosystem... so long as there is sufficient excess capacity among other logs. It is important, however, that CT submission not become a blocker for certificate issuance.

Log operators ought to rely on rate limits only as a last resort. From a policy perspective, Chrome considers aggressive rate limiting a form of reduced availability, but one that's mostly invisible to the monitoring infrastructure (since the restrictions are not uniformly felt). While Chrome CT folks have been quite lax on enforcement, it is my hope and intent that enforcement will be ratcheted up in the future.

The present reality, however, makes that somewhat difficult.

4 of 6 current log operators in Chrome's log list have 2025-expiry Usable logs (i.e. those that are under heaviest load) that are not currently compliant with Chrome's policy of 99% uptime (DigiCert, Sectigo, Cloudflare, TrustAsia). The path to improved availability for many of these existing logs does not seem trivial.

Sectigo's postgres-backed Trillian RFC6962 logs and Let's Encrypt's static-ct-api logs offer hope, but while trending upward, neither set of logs is demonstrating sufficient availability yet. Future Tessera-based static-ct-api logs from Google will offer another avenue, but no public logs exist from that codebase yet.

If and when these efforts succeed, Chrome should more stringently enforce availability requirements.  At present, it's less clear to me what options are available. Suggestions are very much welcome.

Joe, very much on behalf of no one but himself.

Chuck Blevins

Apr 25, 2025, 8:00:48 PM
to Joe DeBlasio, Bas Westerbaan, Pierre Barre, Rob Stradling, Certificate Transparency Policy, Andrew Ayer
Hi all,
DigiCert has made some improvements:
  • Added two additional nodes to each Kubernetes cluster, then redeployed applications to spread pods out across the new nodes. We will likely increase the cluster size again based on node CPU load metrics.
  • Increased the pod count of the frontend and backend servers for the Wyvern 2025h2 and Sphinx 2025h2 CT log shards to observe any performance difference. We will watch over the weekend to see whether this is sustainable and evaluate whether we can go higher.
  • Increased the rate limit of the get-entries endpoint from 1 request/second/IP to 3 requests/second/IP on all Wyvern and Sphinx CT log shards (see the rough throughput arithmetic below).
  • Enabled settings in the frontend application so that it no longer returns raw error strings to the client when there is an internal error.
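
For rough context, assuming clients request 64 entries per call (the batch size mentioned earlier in this thread; responses may contain fewer), the new per-IP limit works out to:

    package main

    import "fmt"

    func main() {
        const reqPerSec = 3.0  // new per-IP get-entries limit
        const batchSize = 64.0 // assumed entries per response
        perSec := reqPerSec * batchSize
        fmt.Printf("~%.0f entries/s per IP, ~%.1fM entries/day\n",
            perSec, perSec*86400/1e6)
        // Output: ~192 entries/s per IP, ~16.6M entries/day
    }
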
We agree rate limiting is undesirable and are actively evaluating how to scale our CT servers so that the limits can be removed, except in exigent circumstances.
I apologize for the frustrations and appreciate your patience. 
DigiCert is committed to delivering a stable and available CT service. 

Cheers
Chuck

Matt Palmer

Apr 27, 2025, 7:41:11 PM
to ct-p...@chromium.org
On Fri, Apr 25, 2025 at 04:18:27PM -0700, Joe DeBlasio wrote:
> Restrictive rate limits are, in my opinion, not consistent with the goals
> of the transparency ecosystem. CT monitors should typically be well short
> of triggering any limits, since they play an essential and intended role
> in the ecosystem. I'm similarly quite uncomfortable with authentication,
> for the reasons Bas mentioned. Whether we should encourage rate limits
> primarily on submission is also somewhat tricky -- doing so may better
> distribute load from certificate issuance across the ecosystem... so long
> as there is sufficient excess capacity among other logs. It is important,
> however, that CT submission not become a blocker for certificate issuance.

There's a *super* big chicken-and-egg problem here.

It is economically rational that additional log capacity won't be
provided until there is demand for it. However, there won't be demand
for additional log capacity until pain is felt by those who need it --
i.e., issuing CAs.

Thus, predicating a browser policy change on there being capacity
available, so that CAs won't be inconvenienced, when capacity won't be
made available until CAs *are* inconvenienced, means that the policy
change will never be made -- and hence monitors will keep running
behind, because log operators have no incentive to satisfy that segment
of the user population.

> Sectigo's postgres-backed Trillian RFC6962 logs and Let's Encrypt's
> static-ct-api logs offer hope, but while trending upward, neither set
> of logs is demonstrating sufficient availability yet. Future
> Tessera-based static-ct-api logs from Google will offer another
> avenue, but no public logs exist from that codebase yet.
>
> If and when these efforts succeed, Chrome should more stringently
> enforce availability requirements. At present, it's less clear to me
> what options are available. Suggestions are very much welcome.

I wouldn't call it a suggestion, as such, but I would observe that I've
periodically been trying, since the birth of CT[1], to gather financial
support for independently-operated CT logs. For various reasons, that
has never come to fruition, but it still seems to me that logs to which
a variety of organisations contribute resources (i.e., money) would have
far more capacity than logs whose capacity is constrained by what a
single organisation is willing to support.

- Matt

[1] Those who have been around long enough may recall that a log I
operated, alpha.ctlogs.org, was the first non-Google log to be accepted
by the Chromium CT log policy.

Pierre Barre

Apr 28, 2025, 3:51:14 PM
to Matt Palmer, Certificate Transparency Policy
If it is economically rational that capacity will not be provided without demand, it follows logically that CAs, who are the primary drivers of that demand, must be directly responsible for ensuring the ecosystem’s health.

Expecting capacity to somehow emerge without making CAs take explicit responsibility seems unrealistic. If CAs are to rely on CT logs as critical infrastructure, it makes far more coherent sense that they should be required to operate, or fund, sufficient logging capacity themselves. Otherwise, we remain trapped in the deadlock you describe: waiting for someone else to solve a problem whose resolution is not in their immediate interest.

In short, if the need is clear and the affected parties are identifiable, responsibility should follow naturally.

Best,
Pierre

Matt Palmer

Apr 28, 2025, 8:22:46 PM
to Certificate Transparency Policy
On Mon, Apr 28, 2025 at 11:57:33PM +0200, Winston de Greef wrote:
> The ecosystem is currently transitioning to static CT logs. These are
> significantly cheaper to run, so maybe this issue just resolves itself?

Perhaps, and I certainly hope so. But we're also about to see a
significant increase in issuance volume, with the gradual reduction of
validity periods to 45 days. It's entirely possible that the extra
capacity of static CT logs might be quickly consumed by the increased
volumes from validity period decreases and natural growth.

> Also, ct.cloudflare.com shows that Internet Security Research Group/Let's
> Encrypt, Sectigo, Google, and DigiCert collectively issue 91% of all
> certificates tracked. All of these entities run their own CT logs.
> So, practically speaking, CAs are already taking explicit responsibility:
> 91% of the load on the CT system comes from entities that run their own
> logs.

Kinda-sorta, but not really. As Joe previously mentioned:

> 4 of 6 current log operators in Chrome's log list have 2025-expiry
> Usable logs (i.e. those that are under heaviest load) that are not
> currently compliant with Chrome's policy of 99% uptime (DigiCert,
> Sectigo, Cloudflare, TrustAsia).

Further, at least one of those CAs is running a log that apparently
cannot be meaningfully audited (Yeti2025), which is what spawned this
thread.

- Matt

Matt Palmer

Apr 28, 2025, 8:54:43 PM
to Certificate Transparency Policy
On Mon, Apr 28, 2025 at 09:50:36PM +0200, Pierre Barre wrote:
> If it is economically rational that capacity will not be provided
> without demand, it follows logically that CAs, who are the primary
> drivers of that demand, must be directly responsible for ensuring the
> ecosystem’s health.

CAs are "directly responsible" (I prefer the term "rationally
motivated") for making sufficient CT log *write* capacity available,
insofar as if they don't have a log to write to, they effectively can't
issue certificates. It follows they will expend the resources necessary
to ensure that capacity is available, one way or another.

Where we have the problem is on the other side: ensuring the CT logs
have sufficient capacity available to service the monitoring
functionality which is essential for the CT ecosystem as a whole, but
is not important for CAs themselves.

Which is where the possible policy change comes in. Chromium's CT log
policy could be amended to more clearly specify availability
requirements for the get-entries endpoint, particularly something along
the lines of requiring that a single IP address not be prevented by rate
limiting from retrieving the entire contents of the log at some multiple
of the log's growth rate.
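
To make that concrete, with purely illustrative numbers:

    package main

    import "fmt"

    func main() {
        const growth = 120.0 // hypothetical log growth rate, entries/s
        const multiple = 2.0 // policy-mandated multiple of the growth rate
        required := growth * multiple
        fmt.Printf("a single IP must be able to fetch ~%.0f entries/s\n", required)
        // At 2x, a monitor that has fallen a full day behind (~10.4M
        // entries) catches up in about a day, because it drains the
        // backlog at the log's growth rate (240 - 120 = 120 entries/s).
    }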

If a log were to be at risk of being nuked because it couldn't be
monitored, that would threaten the write capacity that is the directly
valuable portion of a log to a CA, and it would then follow that
resources would be allocated to ensure compliance with the policy. How
compliance is achieved is left to the log operator's discretion: they
could provide sufficient get-entries capacity to satisfy the current
write rate, or throttle writes to allow get-entries to run fast enough.
(There might also be other solutions I haven't considered.)

> Expecting capacity to somehow emerge without making CAs take explicit
> responsibility seems unrealistic.

Yes, that is, essentially, my thesis.

> If CAs are to rely on CT logs as critical infrastructure, it makes far
> more coherent sense that they should be required to operate, or fund,
> sufficient logging capacity themselves.

I think you might have misunderstood what I was driving at in my
previous message. I absolutely do not think that CAs should be "forced"
to take any specific action to solve the problem of insufficient CT log
capacity. That way lies all manner of unpleasantness.

Rather, I wanted to highlight the contradiction in Joe's stated desire
to introduce more stringent policy requirements only if doing so doesn't
inconvenience CAs. If "cannot block issuance" is a bar to improvement,
then no improvement is ever likely to happen, because most changes have
the potential to disrupt issuance one way or another.

To be clear: I believe that a policy change along the lines I described
above should be introduced as soon as possible, with an appropriate
"lead time" to allow CAs and log operators to adjust to the new reality,
before commencing enforcement.

By specifying policy without mandating a particular implementation,
Chromium would allow the market to innovate to produce the desired
outcome in a relatively efficient manner -- certainly more efficiently
than if Chromium were to mandate a particular means of achieving it
(such as requiring CAs to chip into a "CT log operation fund", for
example).

- Matt

Winston de Greef

May 12, 2025, 12:08:22 PM
to Pierre Barre, Matt Palmer, Certificate Transparency Policy
The ecosystem is currently transitioning to static CT logs. These are significantly cheaper to run, so maybe this issue just resolves itself?

Also, ct.cloudflare.com shows that Internet Security Research Group/Let's Encrypt, Sectigo, Google, and DigiCert collectively issue 91% of all certificates tracked. All of these entities run their own CT logs.
So, practically speaking, CAs are already taking explicit responsibility: 91% of the load on the CT system comes from entities that run their own logs.

Sincerely,
Winston de Greef


Pierre Barre

May 12, 2025, 12:18:04 PM
to Winston de Greef, Matt Palmer, Certificate Transparency Policy
I think making CA-operated CT logs an explicit requirement would still have value beyond just cost considerations.

While major CAs already run their own logs, this is currently voluntary. Making it a formal requirement would:

- Eliminate ambiguity about who is responsible for CT log operation
- Create a clear framework that doesn't rely on the goodwill of major players

CAs are granted significant trust to issue publicly trusted certificates. I believe this privilege should come with explicit obligations for transparency infrastructure, not just implicit expectations based on current practices.

Best,
Pierre