Hoi,
A quick update and a few questions from IPng's Gouda / Halloumi logs.
- On 2026-03-12 at 16:35 UTC I rate limited two cross posters (or one
cross poster operating from two different networks); a sketch of this
style of per-prefix limiter follows the list below:
- We observed submissions go from ~400/sec to ~45/sec immediately
- Of all traffic to Halloumi + Gouda (93M total), 39M were from the
cross poster(s):
 #  LABEL                COUNT
 1  2a01:4f9:4b::/48     36 031 259
 2  2a03:4000:29::/48    13 279 765
 3  40.75.145.0/24        3 822 027
 4  48.204.59.0/24        3 672 889
 5  240d:c000:f05f::/48   3 427 313
- Focusing on the write path: of the 56M submissions in the last 24
hours, 27M were from the cross poster(s) and were rejected (HTTP 403 or
HTTP 429):
 #  LABEL  COUNT
 1  403    24 723 966
 2  429     2 695 745
 3  404       113 159
 4  410        10 226
 5  499         6 135
 6  500         1 559
- This means that of the 39M cross-poster requests, 27M were rejected
and 12M made it through.
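As a point of reference, the shape of such a limiter is roughly as
follows. This is a minimal sketch, not our actual setup: the qps and
burst numbers are made up, and bucketing clients by /48 (IPv6) or /24
(IPv4) simply mirrors the aggregation used in the tables above.

```go
package main

import (
	"net/http"
	"net/netip"
	"sync"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex
	limiters = map[netip.Prefix]*rate.Limiter{}
)

// limiterFor returns the token bucket for the client's aggregate prefix,
// creating it on first sight.
func limiterFor(addr netip.Addr) *rate.Limiter {
	bits := 24 // aggregate IPv4 clients per /24
	if addr.Is6() {
		bits = 48 // and IPv6 clients per /48
	}
	prefix, _ := addr.Prefix(bits)
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[prefix]
	if !ok {
		l = rate.NewLimiter(rate.Limit(50), 100) // ~50 qps, burst 100: made-up numbers
		limiters[prefix] = l
	}
	return l
}

// rateLimit wraps a handler and answers HTTP 429 once a prefix's bucket
// runs dry.
func rateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ap, err := netip.ParseAddrPort(r.RemoteAddr)
		if err != nil {
			http.Error(w, "bad remote address", http.StatusInternalServerError)
			return
		}
		if !limiterFor(ap.Addr()).Allow() {
			http.Error(w, "rate limited", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/ct/v1/add-chain", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // stand-in for the real submission handler
	})
	http.ListenAndServe(":8080", rateLimit(mux))
}
```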
However, once I placed the limiter on the write path:
- We saw latency return to normal, and 500s mostly vanished on regular
traffic (only ~450 HTTP 500s out of 93M queries served)
- Shortly after I applied the rate limit to the cross posters, Matthew
showed that submission latency to Gouda, as seen from Let's Encrypt,
improved markedly.
- The Sunlight pool size, which previously clipped regularly at 750 and
caused submissions to be rejected with HTTP 503, is now consistently
under 40 (see the pool sketch after this list).
- After I notified the hosting providers, they both relayed the message
to their customer (it could be one customer or two, I do not know), but
the cross posters have not stopped sending ~400qps of submissions to
Gouda and Halloumi.
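For readers not following Sunlight internals: the pool is a bounded set
of pending submissions, and once it is full new submissions are shed
with HTTP 503 rather than queued. I'm not quoting Sunlight's actual code
here; this is just a generic sketch of that admission pattern, with the
750 cap taken from the clip point we observed.

```go
package main

import "net/http"

// pool bounds the number of in-flight submissions; 750 matches the clip
// point we observed on Gouda before the rate limit.
var pool = make(chan struct{}, 750)

// admit sheds load with HTTP 503 once the pool is full, instead of queueing.
func admit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case pool <- struct{}{}: // take a slot if one is free
			defer func() { <-pool }() // give it back when the submission completes
			next.ServeHTTP(w, r)
		default:
			http.Error(w, "sequencing pool full", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	http.ListenAndServe(":8080", admit(http.NotFoundHandler()))
}
```

With the cross posters limited upstream, the pool now stays far below
that cap, hence the drop to under 40.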
I've left Halloumi (the TesseraCT log) untouched. It, too, is serving
~400 submissions/sec from these cross posters.
For Gouda, the current ratio of HTTP 200s to HTTP 400s is quite high;
about 11% are HTTP 400 and about 88.9% are HTTP 200. The 400s on the
write path are fairly evenly distributed. For example, the query
'website~=gouda.* AND uri~=/ct AND status=400' yields this top 5:
 #  LABEL              COUNT
 1  172.71.164.0/24    37 761
 2  162.158.202.0/24   37 170
 3  2a0a:4cc0:c0::/48  35 301
 4  172.70.242.0/24    31 959
 5  172.71.172.0/24    17 893
On Halloumi, they are heavily skewed toward the cross posters (#1 and
#2, which are not rate limited there), with the query
'website~=halloumi.* AND uri~=/ct AND status=400' (a sketch of this kind
of per-prefix tally follows the table):
 #  LABEL              COUNT
 1  2a01:4f9:4b::/48   422 588
 2  2a0a:4cc0:c0::/48   44 192
 3  172.71.164.0/24     37 762
 4  162.158.202.0/24    37 172
 5  172.70.243.0/24     31 436
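(The tables above come from our own log analytics. Purely as an
illustrative stand-in, here is a small sketch that computes the same
kind of top-5 per-prefix tally of HTTP 400s on /ct from a
combined-format access log on stdin; the field positions are an
assumption about the log format, not a description of our pipeline.)

```go
package main

import (
	"bufio"
	"fmt"
	"net/netip"
	"os"
	"sort"
	"strings"
)

func main() {
	counts := map[netip.Prefix]int{}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		// Assumed combined log format, naively split on spaces:
		// $remote_addr - $user [$time $zone] "$method $uri $proto" $status ...
		f := strings.Fields(sc.Text())
		if len(f) < 9 || f[8] != "400" || !strings.Contains(f[6], "/ct") {
			continue
		}
		addr, err := netip.ParseAddr(f[0])
		if err != nil {
			continue
		}
		bits := 24 // aggregate IPv4 per /24
		if addr.Is6() {
			bits = 48 // and IPv6 per /48
		}
		p, _ := addr.Prefix(bits)
		counts[p]++
	}
	type row struct {
		p netip.Prefix
		n int
	}
	var rows []row
	for p, n := range counts {
		rows = append(rows, row{p, n})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].n > rows[j].n })
	for i := 0; i < len(rows) && i < 5; i++ {
		fmt.Printf("%d %s %d\n", i+1, rows[i].p, rows[i].n)
	}
}
```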
I've e-mailed the two hosting providers. One of them responded pretty
quickly, noting that their customer was rather uncooperative. I've
followed up with both hosting providers' abuse teams to see if I can
arrange a dialog. So far, it's not looking great, though.
Questions:
- Does Chrome offer anything shorter than a 90-day rolling average for
log availability? On
https://www.gstatic.com/ct/compliance/endpoint_uptime.csv I still see
our availability as ~98.89%, despite us having served almost no 500s in
the last ~4 days (some back-of-the-envelope arithmetic on this follows
these bullets). From the CT Log Policy:
```Log availability is measured on a per-endpoint basis over a 90-day
rolling average from all requests made to the log by the Chrome team’s
compliance monitoring infrastructure. The log’s overall availability is
represented by the minimum of all per-endpoint availabilities.```
- Cloudflare shows average uptime at 100% in
https://radar.cloudflare.com/certificate-transparency/log/gouda2026h1?dateRange=24w
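Partly answering my own first question with back-of-the-envelope
arithmetic: under the simplification that every day carries equal
request weight (an assumption on my part, the policy text does not say
this), an incident stays in the 90-day window until it ages out
completely, so a handful of clean days barely moves the average. A
sketch with made-up incident numbers:

```go
package main

import "fmt"

func main() {
	const window = 90.0    // days in the rolling average
	const baseline = 0.9999 // assumed availability on ordinary days
	incidentDays, incidentAvail := 4.0, 0.75 // hypothetical dip severity
	for clean := 0.0; clean <= 12; clean += 4 {
		// 'clean' perfect days have slid into the window, displacing
		// ordinary baseline days; the incident itself has not yet slid
		// out, so the average barely moves.
		avail := ((window-incidentDays-clean)*baseline +
			incidentDays*incidentAvail + clean*1.0) / window
		fmt.Printf("%2.0f clean days later: %.4f\n", clean, avail)
	}
}
```

In other words, I would not expect the published number to recover
meaningfully until the incident days fall out of the 90-day window.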
I'd like to get a signal from CAs (and possibly monitors, although the
read path was performant throughout for both Gouda and Halloumi) as to
whether submissions to Gouda have improved or not. I believe we are in
the clear.
groet,
Pim
On 2026-03-12 16:04, 'Pim van Pelt' via Certificate Transparency Policy
wrote:
> Hoi folks,
>
> Last week the Chrome folks sent us a heads-up that IPng has dipped just
> under 99% availability on the write path of Gouda2026h1. We have been
> investigating proximate and symptomatic causes, but we have not yet
> found a root cause. Our ZFS diskpool (consisting of three Samsung
> MZILT3T8HBLS (SAS-3) drives in raidz-1) has slowed down considerably,
> while at the same time the load on Gouda is considerably higher than
> on other logs, due to some cross posters and Tor (write) traffic.
>
> I had previously loadtested [1] Sunlight to at least an order of
> magnitude more writes, and it is as yet unclear what the root cause
> is. The pattern is intermittent 503s (the blue spikes below) -
>
> [image: intermittent 503 spikes]
>
> Due to sequencing pool overruns -
>
> [image: sequencing pool size]
>
> Gouda is loaded a fair bit higher than other Sunlight logs; Filippo
> showed a comparison (Gouda in green) -
>
> [image: load comparison across Sunlight logs]
>