Xiaoming Yang
May 25, 2026, 7:48:28 AM
to Certificate Transparency Policy
Symptom
1. The get-sth-consistency endpoint intermittently returned incorrect data for both the log2026a and log2026b logs, causing data validation to fail.
2. Troubleshooting showed that requests to get-sth-consistency with different parameters received the same cached response.
Root Cause
1. To improve data-serving efficiency at the gateway layer, the gateway was updated at 2026-05-22 17:50:00 +0800 to add a static API CT data pulling endpoint. In the same change, the gateway's route matching priority was reordered to put high-volume endpoints first.
2. In the adjusted routing rules, the regex-based routes at index 5 and index 6 were ^/log2026a.*/get-sth.* and ^/log2026a.*/get-sth-consistency.*, respectively. The caching rule for index 5 was configured to ignore URL query parameters, whereas the rule for index 6 was configured to include them.
3. The index 5 pattern ^/log2026a.*/get-sth.* also matches get-sth-consistency paths, and after the adjustment it was evaluated before the more specific pattern. As a result, get-sth-consistency requests matched index 5, query parameters were ignored when caching, and requests with different parameters received the same cached payload.
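The matching behavior described above can be reproduced with a small Python sketch. This is only an illustration and not the gateway's actual code: it assumes rules are tried in ascending index order with the first match winning, and the helper names (`match_rule`, `cache_key`), the `/ct/v1/` path prefix, and the query values are hypothetical.

```python
import re

# Simplified model of the two rules (only the fields relevant to the incident).
# Assumption: the gateway tries rules in ascending index order, first match wins.
FAULTY_RULES = [
    {"index": 5, "location_regex": r"^/log2026a.*/get-sth.*", "ignore_args": True},
    {"index": 6, "location_regex": r"^/log2026a.*/get-sth-consistency.*", "ignore_args": False},
]
FIXED_RULES = [
    {"index": 5, "location_regex": r"^/log2026a.*/get-sth-consistency.*", "ignore_args": False},
    {"index": 6, "location_regex": r"^/log2026a.*/get-sth.*", "ignore_args": True},
]


def match_rule(rules, path):
    """Return the first rule, by ascending index, whose regex matches the path."""
    for rule in sorted(rules, key=lambda r: r["index"]):
        if re.match(rule["location_regex"], path):
            return rule
    return None


def cache_key(rules, path, query):
    """Build a hypothetical cache key: the path alone when ignore_args is set."""
    rule = match_rule(rules, path)
    if rule is None:
        return None  # no rule matched, nothing is cached
    return path if rule["ignore_args"] else f"{path}?{query}"


# Hypothetical request path and query strings, for illustration only.
path = "/log2026a/ct/v1/get-sth-consistency"
faulty_a = cache_key(FAULTY_RULES, path, "first=1&second=2")
faulty_b = cache_key(FAULTY_RULES, path, "first=3&second=4")
fixed_a = cache_key(FIXED_RULES, path, "first=1&second=2")
fixed_b = cache_key(FIXED_RULES, path, "first=3&second=4")

print(faulty_a == faulty_b)  # both requests share one cache entry
print(fixed_a == fixed_b)    # each request gets its own cache entry
```

Under these assumptions, the faulty ordering collapses both requests onto one cache key, while the corrected ordering keeps them separate.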
Impact Scope
1. Only affected data caching for the get-sth-consistency endpoint at the gateway layer.
2. Did not affect the ct_server service.
3. Did not affect the trillian_log service.
4. Did not affect database operations.
Remediation
1. Adjusted the gateway configuration to set ^/log2026a.*/get-sth-consistency.* to index=5 and ^/log2026a.*/get-sth.* to index=6, so that the more specific pattern is matched first.
Improvements
1. Clean up previously preset caching rules at the gateway layer, and progressively phase out caching at the gateway layer entirely.
2. Enhance testing procedures following configuration changes to ensure testing coverage spans the full scope of potential impact.
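One possible form for such a post-change check is a small script that resolves a sample path for every route and reports any route shadowed by an earlier, broader pattern. The sketch below is an assumption about how this could be done, not our actual test tooling: `find_shadowed`, the `(index, regex)` rule tuples, and the sample paths are hypothetical, and first-match-wins evaluation in ascending index order is assumed.

```python
import re


def find_shadowed(rules, samples):
    """Return (intended_index, actual_index, path) for samples that hit the wrong rule.

    rules:   list of (index, regex) tuples, evaluated in ascending index order.
    samples: list of (intended_index, path) tuples, one or more per rule.
    """
    ordered = sorted(rules, key=lambda r: r[0])
    problems = []
    for intended, path in samples:
        # Index of the first rule whose regex matches, or None if nothing matches.
        actual = next((idx for idx, pattern in ordered if re.match(pattern, path)), None)
        if actual != intended:
            problems.append((intended, actual, path))
    return problems


# Rule order during the incident and after the fix (regexes from the configuration).
faulty = [(5, r"^/log2026a.*/get-sth.*"), (6, r"^/log2026a.*/get-sth-consistency.*")]
fixed = [(5, r"^/log2026a.*/get-sth-consistency.*"), (6, r"^/log2026a.*/get-sth.*")]

# Hypothetical sample paths; the /ct/v1/ prefix is an assumption.
faulty_samples = [(5, "/log2026a/ct/v1/get-sth"), (6, "/log2026a/ct/v1/get-sth-consistency")]
fixed_samples = [(6, "/log2026a/ct/v1/get-sth"), (5, "/log2026a/ct/v1/get-sth-consistency")]

print(find_shadowed(faulty, faulty_samples))  # reports the shadowed get-sth-consistency route
print(find_shadowed(fixed, fixed_samples))    # reports nothing
```

Running a check like this against every route after each configuration change would flag a reordering that lets a broad pattern capture a more specific endpoint.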
Timeline
1. 2026-05-22 17:50:00 +0800: Gateway configuration updated; misconfigured settings deployed.
2. 2026-05-23 02:51:00 +0800: Received email notification from Google.
3. 2026-05-23 06:10:00 +0800: Google-uptime monitoring showed intermittent fluctuations in the availability of get-sth-consistency.
4. 2026-05-24 16:50:00 +0800: Suspended add-chain / add-pre-chain endpoints.
5. 2026-05-24 21:03:00 +0800: Gateway configuration updated; misconfiguration resolved.
6. 2026-05-25 09:20:00 +0800: Resumed data submission endpoints.
Configuration Before Optimization:
```toml
{index=0, location_regex = "^/log2026a.*/get-sth-consistency.*", proxy_pass = "proxy_pass_log2026a", rate_limit = 16, cache_config = { cache_time_sec = 30, ignore_args = false, enable_compression = false } },
{index=1, location_regex = "^/log2026a.*/get-sth.*", proxy_pass = "proxy_pass_log2026a", rate_limit = 2, cache_config = { cache_time_sec = 60, ignore_args = true, enable_compression = false } },
```
Faulty Configuration (During Incident):
```toml
{index=5, location_regex = "^/log2026a.*/get-sth.*", proxy_pass = "proxy_pass_log2026a", rate_limit = 2, cache_config = { cache_time_sec = 60, ignore_args = true, enable_compression = false } },
{index=6, location_regex = "^/log2026a.*/get-sth-consistency.*", proxy_pass = "proxy_pass_log2026a", rate_limit = 16, cache_config = { cache_time_sec = 30, ignore_args = false, enable_compression = false } },
```
Configuration After Fix:
```toml
{index=6, location_regex = "^/log2026a.*/get-sth.*", proxy_pass = "proxy_pass_log2026a", rate_limit = 2, cache_config = { cache_time_sec = 60, ignore_args = true, enable_compression = false } },
{index=5, location_regex = "^/log2026a.*/get-sth-consistency.*", proxy_pass = "proxy_pass_log2026a", rate_limit = 16, cache_config = { cache_time_sec = 30, ignore_args = false, enable_compression = false } },
```