Thanks Luke for posting this report.
RFC 6962 section 3.5 says that an STH's "timestamp MUST be at least as recent as the most recent SCT timestamp in the tree", and it's clear that the observed behaviour from mammoth2025h1 did not meet that requirement. The only plausible explanation that we can come up with is clock skew on one or more of the nodes that produce SCTs / STHs, although sadly we no longer have logs from 2024-06-09 with which to prove or disprove that hypothesis.
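For anyone who wants to check for this condition, the requirement reduces to a single comparison. The following is a minimal sketch only, not code from any monitor or from Trillian; the function name and the sample values are hypothetical, and timestamps are assumed to be milliseconds since the Unix epoch, as in RFC 6962.

```python
def sth_violates_rfc6962(sth_timestamp_ms: int, sct_timestamps_ms: list[int]) -> bool:
    """Return True if the STH is older than the newest SCT in the tree.

    Hypothetical helper: RFC 6962 section 3.5 requires the STH timestamp
    to be at least as recent as the most recent SCT timestamp in the tree.
    """
    if not sct_timestamps_ms:
        # An empty tree has no SCT timestamps to compare against.
        return False
    return sth_timestamp_ms < max(sct_timestamps_ms)


# Illustrative values only, not taken from mammoth2025h1.
print(sth_violates_rfc6962(1_000, [900, 1_001]))
print(sth_violates_rfc6962(1_000, [900, 1_000]))
```

An STH whose timestamp equals the newest SCT timestamp satisfies the requirement; only a strictly older STH is a violation.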
We sought advice from the Trillian / CTFE (CT Front End) engineers on the
transparency-dev Slack (public invitation
here). Quoting Al Cutter from that conversation:
'Yeah, clock skew seems likely.
Unfortunately this is very tricky to check for - the CTFE chooses the timestamp, and that’s opaque by the time it gets to Trillian (it’s simply bytes in the Merkle Leaf at that point), so if the clock on the server running CTFE is far enough ahead of the
one on the machine running the log_server then this could potentially happen.
Trillian’s log_signer binary does have a
--sequencer_guard_window flag which is intended to help with this; it more or less says “entries in the queue must have been there
for at least this long before they’re eligible for integration into the tree”.'
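To make the effect of the guard window concrete, here is a simplified model of the behaviour described in the quote. It is a sketch under stated assumptions, not Trillian's implementation: the QueuedEntry type, the eligible function, and all of the numbers are hypothetical, and it assumes that the queue time and the STH timestamp both come from the clock of the machine doing the integration.

```python
from dataclasses import dataclass


@dataclass
class QueuedEntry:
    sct_timestamp_ms: int  # chosen by the front end's clock (assumption)
    queued_at_ms: int      # time of queueing on the integrating machine's clock (assumption)


def eligible(entries, now_ms, guard_window_ms):
    """Return the entries that have been queued for at least the guard window."""
    cutoff = now_ms - guard_window_ms
    return [e for e in entries if e.queued_at_ms <= cutoff]


# Hypothetical scenario: the front end's clock runs 3 s ahead, so an entry
# queued at t=100_000 carries an SCT timestamp of 103_000.
entry = QueuedEntry(sct_timestamp_ms=103_000, queued_at_ms=100_000)

# 1 s window: the entry is integrated at t=101_200, before the integrating
# clock has caught up with the SCT timestamp, so the STH would be too old.
print(eligible([entry], now_ms=101_200, guard_window_ms=1_000))

# 10 s window: the same entry is held back at t=101_200 and only becomes
# eligible at t=110_000, by which time the STH timestamp exceeds 103_000.
print(eligible([entry], now_ms=101_200, guard_window_ms=10_000))
print(eligible([entry], now_ms=110_000, guard_window_ms=10_000))
```

In this model a guard window protects against skew only up to the length of the window, so a larger value tolerates a larger clock difference at the cost of a longer delay before entries are integrated.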
We reviewed our log_signer configuration and found that we were already setting
sequencer_guard_window=1s. In the hope of avoiding any future occurrence of this type of problem, we intend to increase that to
sequencer_guard_window=10s across all of our logs.