TrustAsia CT2025A and CT2025B MMD Violations

Andrew Ayer

Jul 14, 2025, 9:19:36 PM

to trustasi...@trustasia.com, Certificate Transparency Policy

TrustAsia CT2025A (log ID KOKBOP2DIUXpqdaqdTdtg3eohRKzwH9yQUgh3L3pjGY=) and CT2025B (log ID KCyL3YEP+QkSCs4W1uDsIBvqgqOkrxnZ7/tZ6D/cQmg=) are currently incorporating log entries with delays in excess of 24 hours.

Entry 115,911,249 of TrustAsia CT2025A has a timestamp of 2025-07-13 05:02:56.178+00, but the STH for tree size 115,908,250 has a timestamp of 2025-07-15 00:58:17.427+00.

Entry 110,014,266 of TrustAsia CT2025B has a timestamp of 2025-07-13 17:21:00.119+00, but the STH for tree size 110,009,267 has a timestamp of 2025-07-15 01:02:32.83+00.
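
The arithmetic behind the two observations above can be sketched as follows. This is a minimal illustration, not the tooling used to produce the report: the function name `observed_merge_delay` is hypothetical, and it assumes that an STH for tree size N covers entry indices 0 through N-1, so an entry whose index is at least N was still unincorporated when that STH was signed.

```python
from datetime import datetime, timedelta, timezone

MMD = timedelta(hours=24)  # the 24-hour limit referred to in this report


def observed_merge_delay(entry_index, entry_ts, tree_size, sth_ts):
    """Return a lower bound on the merge delay, or None if the STH covers the entry.

    Hypothetical helper: an STH of size N covers indices 0..N-1, so an entry
    with index >= N had waited at least (sth_ts - entry_ts) without inclusion.
    """
    if entry_index < tree_size:
        return None
    return sth_ts - entry_ts


utc = timezone.utc
delay_a = observed_merge_delay(
    115_911_249, datetime(2025, 7, 13, 5, 2, 56, 178000, tzinfo=utc),
    115_908_250, datetime(2025, 7, 15, 0, 58, 17, 427000, tzinfo=utc),
)
delay_b = observed_merge_delay(
    110_014_266, datetime(2025, 7, 13, 17, 21, 0, 119000, tzinfo=utc),
    110_009_267, datetime(2025, 7, 15, 1, 2, 32, 830000, tzinfo=utc),
)

for name, delay in (("CT2025A", delay_a), ("CT2025B", delay_b)):
    print(name, delay, "exceeds MMD" if delay > MMD else "within MMD")
```

Both delays are lower bounds: the entries were still absent from the tree at the STH timestamp, so the actual incorporation delay is at least this large.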

The STHs are attached to this email.

I recommend that this log stop accepting new entries, to allow the submission backlog to clear.

Regards,
Andrew

ct2025a-sth.json

ct2025b-sth.json

Xiaoming Yang

Jul 15, 2025, 12:56:15 AM

to Certificate Transparency Policy, trustasi...@trustasia.com

TrustAsia CT Log2025a and Log2025b have temporarily blocked the add-chain API and are waiting for the merge to complete.

Carlos Joan Rafael Ibarra Lopez

Jul 15, 2025, 2:29:13 PM

to Xiaoming Yang, Certificate Transparency Policy, trustasi...@trustasia.com

Thanks Andrew, our monitoring also detected some unincluded certificates in both TrustAsia 2025 logs yesterday. All of those have been included now.

Xiaoming: Thanks for stopping acceptance of new entries. Please investigate this issue and share the results with ct-p...@chromium.org.

Carlos, on behalf of the Chrome CT team

--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/daf7c5bd-ebb6-482b-9953-8ee9c6b44618n%40chromium.org.

Mustafa Emre Acer

Jul 23, 2025, 8:23:51 PM

to Carlos Joan Rafael Ibarra Lopez, Xiaoming Yang, Certificate Transparency Policy, trustasi...@trustasia.com

Hi Xiaoming, did you have a chance to investigate this issue? If so, could you please share an update?

Thanks,

Mustafa, Chrome CT Team

Xiaoming Yang

Jul 24, 2025, 12:15:55 PM

to Certificate Transparency Policy, Mustafa Emre Acer, Xiaoming Yang, Certificate Transparency Policy, trustasi...@trustasia.com, Carlos Joan Rafael Ibarra Lopez

Hi Mustafa,

Yes, we have completed the investigation and are now preparing the analysis report. We expect to submit the report this week and will keep you updated. Thank you.

Xiaoming Yang

Jul 25, 2025, 12:17:04 PM

to Certificate Transparency Policy, Certificate Transparency Policy, trustasi...@trustasia.com

Incident Report for TrustAsia log2025a and log2025b MMD violations

There were two phases in this incident: 1) 2025-07-08: approximately 11 hours of intermittent 5xx errors and a short pause of the signing service; 2) 2025-07-12 to 2025-07-15: log merge backlog. Both were chain reactions triggered by a sudden surge in the number of log submissions and pulls.

Impact and Details
1. 2025-07-08 Signing service paused
2. 2025-07-08 HTTP 500/502/504 errors occurred for approximately 4% of total requests
3. 2025-07-08 Our log service observed the database error “host is blocked because of many connection errors”
4. 2025-07-15 Log merge backlog; the tree merge service was running but could not keep pace with the submission rate, so submitted log entries were incorporated with delays in excess of 24 hours

Impact Window
1. 2025-07-08 04:20 +0800 ~ 2025-07-08 15:30 +0800
2. 2025-07-12 16:30 +0800 ~ 2025-07-15 20:30 +0800

Resolution
1. Blocked some IPs with large-volume submissions and restarted services. The signing service briefly recovered but paused again, and HTTP 500/504 errors persisted
2. Temporary solution: suspended the add-cert and add-pre-cert interfaces. After certificate submission was suspended, the signing service recovered and the HTTP 500/504 errors ceased. However, the 500/504 errors returned when certificate submission was reopened
3. Added CPU resources to the database servers
4. Phase two actions: temporarily disabled the certificate submission interfaces
5. Phase two actions: upgraded the database server disks for performance

Root Cause Analysis
1. Phase one: a large number of submissions to log2025a/log2025b began around 2025-07-08 04:00 +0800. The surge increased database query volume and pushed database CPU usage to its limit. This caused a query backlog and ultimately triggered the error “host is blocked because of many connection errors”. After we added CPU resources to the database servers, the 5xx errors were mitigated.
2. Phase two: after the brief stabilization from adding CPU resources on July 8th, a continued increase in requests gradually raised the database IOPS demand. Sustained heavy querying led to an IOPS bottleneck, causing a sharp decline in database processing capacity. This eventually affected the operation of trillian_log_signer, slowing the tree merge rate and causing a backlog to accumulate.
3. We monitor merge delays using the sequencer_merge_delay_count and sequencer_sequenced metrics. However, during periods of high overall system pressure, these metrics became inaccurate because their values stagnated.
4. Our existing monitoring of certificate submission and tree signing remained operational. Because merging was still progressing continuously, no alert rules were triggered.
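
One possible way to catch the value stagnation described in item 3 is to flag a metric whose last several samples are identical. The sketch below is illustrative only and is not TrustAsia's monitoring: the class name `StagnationDetector` and the window size are assumptions, and how samples of sequencer_sequenced or sequencer_merge_delay_count would be scraped is left out.

```python
from collections import deque


class StagnationDetector:
    """Flag a metric whose last `window` samples are all identical.

    Hypothetical helper: `window` is an assumed tuning parameter, not a
    value taken from the incident report.
    """

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.samples = deque(maxlen=window)

    def observe(self, value):
        """Record one sample; return True if the metric looks stagnant."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and len(set(self.samples)) == 1


detector = StagnationDetector(window=3)
flags = [detector.observe(v) for v in [10, 12, 12, 12, 12, 15]]
print(flags)
```

A flat counter can also mean there is simply no traffic, so a check like this is only meaningful when combined with evidence that submissions are still arriving.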

Timeline
1. 2025-07-08 04:30 +0800 Alerts for HTTP 500 and 504 errors were posted in our internal workplace tool.
2. 2025-07-08 05:11 +0800 Our staff raised a manual phone alert about request timeouts on log2025b. Investigation revealed that, starting from 2025-07-08 04:00 +0800, log2025a had begun receiving a surge of requests. We suspected it was caused by large-volume certificate submissions, so we identified a batch of suspicious IPs and temporarily blocked them.
3. 2025-07-08 06:00 +0800 We found that HTTP 500 and 504 errors were still occurring on log2025b and that its signing service kept pausing after short recoveries. We then restarted the services and rebooted the business server, but the signing service still paused again after a short recovery.
4. 2025-07-08 09:00 +0800 After inspecting the database and connection middleware, we identified the error message “host is blocked because of many connection errors”. We attempted to adjust database runtime parameters and restart the database. The signing service briefly recovered but failed again, confirming insufficient CPU resources on the database servers.
5. 2025-07-08 11:00 +0800 Suspended add-cert and add-pre-cert interfaces. Checked the database status after it stabilized.
6. 2025-07-08 14:20 +0800 Prepared to add CPU resources to the database servers.
7. 2025-07-08 15:10 +0800 Completed adding CPU resources and the service began to gradually recover.
8. 2025-07-08 15:30 +0800 The HTTP 500 and 504 errors were confirmed to have ceased and the signing service resumed normal operation.
9. 2025-07-11 08:00 +0800 ~ 2025-07-15 10:00 +0800 A growing backlog of unsigned entries was observed in the merge tree.
10. 2025-07-15 09:27 +0800 Received an external email notification that log entries were being incorporated with delays in excess of 24 hours.
11. 2025-07-15 09:42 +0800 Suspended the certificate submission interface.
12. 2025-07-15 16:00 +0800 Implemented monitoring and alerting mechanisms for the Unsequenced table in the database.
13. 2025-07-15 19:20 +0800 Started upgrading disk resources.
14. 2025-07-15 20:12 +0800 Completed upgrading disk resources.
15. 2025-07-15 20:26 +0800 Restored the certificate submission interface and began accepting submissions. Waited for the system to run stably.
16. 2025-07-16 17:00 +0800 The monitoring system indicated stable operation.

Actions
1. Added database CPU resources, upgraded database disk performance and enhanced database processing capabilities.
2. In addition to using the sequencer_merge_delay_count and sequencer_sequenced metrics to monitor merge delays, monitoring and alerts have been added for the Unsequenced table.
3. Implemented a new operational strategy: the add-cert interface is automatically suspended when the Unsequenced table reaches a threshold, preventing excessive data accumulation.
4. Plan to implement new external monitoring for more comprehensive API monitoring.
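
The automatic suspension in action 3 could be sketched as a small gate driven by the Unsequenced row count. This is a hedged illustration rather than the deployed mechanism: the class `SubmissionGate`, both threshold values, and the separate lower resume threshold (added here to avoid flapping around a single value) are assumptions. The report only states that submission is suspended when the table reaches a threshold.

```python
class SubmissionGate:
    """Suspend submissions when the backlog is high; resume once it drains.

    Hypothetical sketch: `suspend_at` and `resume_at` are assumed values,
    and the resume threshold is not described in the report.
    """

    def __init__(self, suspend_at, resume_at):
        if resume_at > suspend_at:
            raise ValueError("resume_at must not exceed suspend_at")
        self.suspend_at = suspend_at
        self.resume_at = resume_at
        self.accepting = True

    def update(self, unsequenced_rows):
        """Feed the current Unsequenced row count; return whether to accept."""
        if self.accepting and unsequenced_rows >= self.suspend_at:
            self.accepting = False
        elif not self.accepting and unsequenced_rows <= self.resume_at:
            self.accepting = True
        return self.accepting


gate = SubmissionGate(suspend_at=1000, resume_at=200)
states = [gate.update(n) for n in [100, 999, 1000, 500, 200, 300]]
print(states)
```

With two thresholds, the gate stays closed while the backlog drains from 1000 toward 200 instead of reopening as soon as the count dips just below the suspend value.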

Mustafa Emre Acer

Jul 25, 2025, 6:33:50 PM

to Certificate Transparency Policy, xiaomi...@trustasia.com, Certificate Transparency Policy, trustasi...@trustasia.com

Thank you for the detailed update and the fixes, Xiaoming!
