Elephant2026h1 index corruption and repair

Rob Stradling

Apr 2, 2026, 3:22:52 PM
to Certificate Transparency Policy
At 2026-04-01 07:15 UTC I received a report from a monitor operator that https://elephant2026h1.ct.sectigo.com/ct/v1/get-entries?start=227980800&end=227981055, as well as get-entries calls for just the first or last of those entries, were consistently producing HTTP 500 errors.  It was also observed that get-entries calls for various other entry ranges worked without any problems.
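For anyone wanting to re-check the affected range, here is a minimal sketch of how the request above is formed. The helper name `get_entries_url` is hypothetical; the path and the inclusive `start`/`end` parameters are the RFC 6962 get-entries ones from the URL in the report. It also shows that the range covers 256 entries and starts on a multiple of 256.

```python
from urllib.parse import urlencode

LOG_URL = "https://elephant2026h1.ct.sectigo.com"


def get_entries_url(start: int, end: int, base: str = LOG_URL) -> str:
    """Build a get-entries URL; start and end are inclusive entry indices."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"{base}/ct/v1/get-entries?{urlencode({'start': start, 'end': end})}"


start, end = 227980800, 227981055

print(get_entries_url(start, end))
print("entries in range:", end - start + 1)            # range is inclusive
print("start is a multiple of 256:", start % 256 == 0)

# The single-entry requests that were also reported as failing:
print(get_entries_url(start, start))
print(get_entries_url(end, end))
```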

Errors of the following form were spotted in the CTFE logs:
GetEntries handler error: backend GetLeavesByRange request failed: rpc error: code = Unknown desc = ERROR: could not open file "base/16386/16484.166" (target block 519901635): previous segment is only 88541 blocks (SQLSTATE XX000)
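The numbers in that error can be sanity-checked with some quick arithmetic. This is only a sketch, and it assumes the PostgreSQL defaults of 8 KiB blocks and 1 GiB segment files (131072 blocks per segment), and that segments 0 to 164 of the index are full; none of that is confirmed for this deployment. Under those assumptions the requested block lies far beyond the end of the index, which would be consistent with a corrupted block pointer rather than a missing file.

```python
BLOCK_SIZE = 8192            # bytes per block (PostgreSQL default; assumed)
BLOCKS_PER_SEGMENT = 131072  # 1 GiB segment / 8 KiB blocks (default; assumed)


def locate_block(block: int) -> tuple[int, int]:
    """Return (segment number, block offset within that segment)."""
    return divmod(block, BLOCKS_PER_SEGMENT)


target_block = 519901635      # "target block" from the error message
previous_segment = 165        # the segment before "16484.166"
previous_segment_blocks = 88541

segment, offset = locate_block(target_block)

# Approximate index size, assuming segments 0..164 are completely full.
index_blocks = previous_segment * BLOCKS_PER_SEGMENT + previous_segment_blocks
index_gib = index_blocks * BLOCK_SIZE / 2**30

print(f"target block would live in segment {segment}, offset {offset}")
print(f"index has about {index_blocks} blocks ({index_gib:.1f} GiB)")
print(f"target is {target_block / index_blocks:.1f}x past the end of the index")
```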

Executing the query "SELECT relname, relkind FROM pg_class WHERE relfilenode = 16484;" showed us that the affected object was "leafdata_pkey", the primary key index on the "leafdata" table.  Since this object is an index rather than a table, we concluded that rebuilding the index should resolve the problem.

A "REINDEX CONCURRENTLY" operation began at 2026-04-01 07:58 UTC and eventually completed at 2026-04-02 06:37 UTC.  Since then, the previously problematic get-entries calls have consistently worked correctly.

The first and last CTFE errors indicating the index corruption occurred at 2026-03-30 04:11 and 2026-04-02 06:36 respectively.
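To put the timestamps above side by side, here is a small sketch that computes the durations. It assumes the CTFE error timestamps are also in UTC, which is not stated explicitly. The last error falling just before the reindex completed would fit with REINDEX CONCURRENTLY leaving the old index in use until the new one is swapped in at the end.

```python
from datetime import datetime, timedelta, timezone


def utc(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM' as a UTC timestamp."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


reindex_start = utc("2026-04-01 07:58")
reindex_end = utc("2026-04-02 06:37")
first_error = utc("2026-03-30 04:11")  # assumed UTC
last_error = utc("2026-04-02 06:36")   # assumed UTC

reindex_duration = reindex_end - reindex_start
error_window = last_error - first_error
gap_after_last_error = reindex_end - last_error

print("reindex duration:", reindex_duration)
print("window between first and last error:", error_window)
print("last error to reindex completion:", gap_after_last_error)
```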

https://www.gstatic.com/ct/compliance/endpoint_uptime_24h.csv shows poor availability for Elephant2026h1 over the past 24 hours.  We suspect that this was due to the performance impact of the reindexing operation, and so we're optimistic that those numbers will look healthy again within the next 24 hours.

We think it's likely that our recent Proxmox outage was the root cause of the index corruption.  That incident ended around 6 days before the first evidence of index corruption appeared.  However, our access logs show that no requests for https://elephant2026h1.ct.sectigo.com/ct/v1/get-entries?start=227980800&end=227981055 were received during that 6-day window, so the corruption could have been present but unobserved throughout it.  We have not found any evidence of other entry ranges being impacted.

We will continue to monitor this log for any recurrence of this issue.