Sunlight v0.8.1 security release


Filippo Valsorda

Jun 30, 2026, 3:29:30 PM
to Certificate Transparency Policy
Hello fellow humans, AIs, and super-intelligent shades of the color blue,

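Today we released Sunlight v0.8.1, a security fix for v0.8.0.

https://github.com/FiloSottile/sunlight/releases/tag/v0.8.1
https://github.com/FiloSottile/sunlight/compare/v0.8.0...v0.8.1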

Summary. The Sunlight deduplication cache used 128-bit keys, which potentially allowed obtaining SCTs for forged certificates. In the WebPKI, this would have required a colluding, compromised, or separately vulnerable CA. Tuscolo, Willow, and Sycamore have rolled out the patch. Gouda is scheduled to roll out the patch soon.

Details. Submissions were deduplicated by looking them up in a SQLite database using a SHA-256 hash truncated to 128 bits. If a match is found in the deduplication cache, Sunlight signs an SCT over the stored (old) timestamp and index, and the submitted (pre-)certificate. With 128-bit keys, an attacker could compute a collision offline in 2⁶⁴ work (costing on the order of tens of thousands of dollars), submit an honest certificate, and then obtain an SCT for the colliding certificate, without the latter making it into the log.
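
To make the collision cost concrete, here is a minimal Python sketch (not Sunlight's code) of a birthday search against a truncated SHA-256 key. The truncation is shrunk to 24 bits so the search finishes almost instantly; `truncated_key` and `find_collision` are hypothetical names, the byte strings stand in for certificates, and the timestamp and index values are arbitrary.

```python
import hashlib
from itertools import count


def truncated_key(data: bytes, bits: int) -> bytes:
    """Return the first `bits` bits (a multiple of 8) of SHA-256(data)."""
    return hashlib.sha256(data).digest()[: bits // 8]


def find_collision(bits: int) -> tuple[bytes, bytes]:
    """Birthday search: expected work is roughly 2**(bits/2) hash evaluations."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        candidate = b"toy-certificate-%d" % i
        key = truncated_key(candidate, bits)
        if key in seen:
            return seen[key], candidate
        seen[key] = candidate


honest, forged = find_collision(bits=24)

# A toy dedup cache keyed on the truncated hash, as in the vulnerable design.
# Only the honest entry is ever "logged".
cache = {truncated_key(honest, 24): {"timestamp": 1751311770000, "index": 42}}

# The forged entry hits the honest entry's record even though it was never logged.
hit = cache.get(truncated_key(forged, 24))
```

At 128 bits the same search takes about 2⁶⁴ hash evaluations instead of a few thousand, which is the offline work quoted above.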

Mitigating factors. This is a collision attack, so it requires selecting a pair of certificates which happen to randomly have the same truncated SHA-256 hash from a large pool of 2⁶⁴ candidates. In the WebPKI, this attack would require a colluding, compromised, or separately vulnerable CA, because the TLS BRs require unpredictable entropy in the serial number, making it impossible to compute the collision offline.

Remediation. Sunlight v0.8.1 now only inserts new entries in the cache using their full SHA-256 hash, in a new "cache256" table. Existing 128-bit keys are still queried, to avoid a wave of cache misses when upgrading from v0.8.0 (or earlier) to v0.8.1 (or later). This is relatively safe because a collision attack requires control over both entries. Either the attack was executed in the past, in which case the attacker already obtained a forged SCT, or the existing 128-bit keys are honest. With approximately 2³² honest entries in the cache, a multi-target second preimage attack that searches for a new certificate that matches one of them would require 2⁹⁶ work (taking on the order of a couple of years with the same energy consumption as the whole Bitcoin network).
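
The lookup order described above can be sketched as follows. This is an illustration under assumptions, not the actual Sunlight implementation: the table names "cache" and "cache256" come from the announcement, while the column layout and the `lookup` and `insert` helpers are hypothetical.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
# Table names come from the announcement; the columns are an assumption.
db.execute("CREATE TABLE cache (key BLOB PRIMARY KEY, timestamp INTEGER, leaf_index INTEGER)")
db.execute("CREATE TABLE cache256 (key BLOB PRIMARY KEY, timestamp INTEGER, leaf_index INTEGER)")


def lookup(entry: bytes):
    """Check full 256-bit keys first, then fall back to legacy 128-bit keys."""
    digest = hashlib.sha256(entry).digest()
    row = db.execute(
        "SELECT timestamp, leaf_index FROM cache256 WHERE key = ?", (digest,)
    ).fetchone()
    if row is None:
        row = db.execute(
            "SELECT timestamp, leaf_index FROM cache WHERE key = ?", (digest[:16],)
        ).fetchone()
    return row


def insert(entry: bytes, timestamp: int, leaf_index: int) -> None:
    """New entries are only ever written under their full hash."""
    digest = hashlib.sha256(entry).digest()
    db.execute("INSERT INTO cache256 VALUES (?, ?, ?)", (digest, timestamp, leaf_index))


# An entry cached before the upgrade is still found through the fallback.
old = b"entry logged before the upgrade"
db.execute("INSERT INTO cache VALUES (?, ?, ?)", (hashlib.sha256(old).digest()[:16], 1000, 7))
insert(b"entry logged after the upgrade", 2000, 8)
```

The 2⁹⁶ figure follows from dividing the 2¹²⁸ cost of a single-target second preimage on a 128-bit key by the roughly 2³² available targets.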

For extra safety, operators can optionally run the new recompute-cache command to rebuild the cache with 256-bit keys from the backend storage (with verification of the STH, not to trust the backend). Once recompute-cache has completed, running

sqlite3 cache.db "ALTER TABLE cache RENAME TO cache_legacy;"

disables the fallback. This is safe to run concurrently with the log. (Do not use DROP as that would be a long write transaction, which would block the sequencer. Ask me how I know. Poor Navigli.)
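
As a small illustration of why the rename is enough, the sketch below runs the same statement against an in-memory SQLite database. The column layout and the `legacy_lookup` helper are hypothetical, and treating a missing table as a cache miss is an assumption: the thread only states that the rename disables the fallback, not how Sunlight handles the absent table internally.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cache (key BLOB PRIMARY KEY, timestamp INTEGER, leaf_index INTEGER)")
db.execute("INSERT INTO cache VALUES (?, ?, ?)", (b"\x00" * 16, 1000, 7))

# The rename from the announcement: the rows are kept, only the name changes.
db.execute("ALTER TABLE cache RENAME TO cache_legacy")


def legacy_lookup(key: bytes):
    """Hypothetical fallback query; a missing table is treated as a miss."""
    try:
        return db.execute(
            "SELECT timestamp, leaf_index FROM cache WHERE key = ?", (key,)
        ).fetchone()
    except sqlite3.OperationalError:
        return None


tables = {name for (name,) in db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
rows_kept = db.execute("SELECT COUNT(*) FROM cache_legacy").fetchone()[0]
```

The old rows remain on disk under the new name, so the rename does not reclaim any space.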

The Tuscolo logs have rolled out the patch, regenerated the cache, and disabled the fallback. I am told the Willow and Sycamore logs are now running v0.8.1, and that the Gouda rollout is planned for 22:30 CEST. I'd like to thank the Sunlight operators for their rapid response.

Retrospective. The deduplication cache implementation was probably too clever. It uses deterministic ECDSA to produce byte-identical SCTs without storing the signatures. I felt pressure to limit the on-disk size of this cache, because it needs to live locally, even in setups that use object storage for the log. That was probably over-indexed in retrospect: the Tuscolo2025h2 cache is 70GB. I also apparently did think about collisions, because the cache type had a comment saying

// birthday bound of 2⁴⁸ entries with collision chance 2⁻³²

but I failed to realize that intentional collisions were both possible and valuable to an attacker. Besides being less clever with the cache, it's not clear how else this could have been prevented without hindsight. Suggestions are welcome.
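
The gap between the two threat models can be worked out numerically. The sketch below uses the standard birthday approximation for accidental collisions among random 128-bit keys and compares it with the work needed for a deliberate collision; the variable names are illustrative.

```python
import math

KEY_BITS = 128
entries = 2**48

# Accidental collisions: P ≈ k(k-1)/2 / 2^n for k random n-bit keys.
pairs = entries * (entries - 1) // 2
log2_accidental = math.log2(pairs) - KEY_BITS

# Deliberate collisions: about 2^(n/2) hash evaluations, independent of cache size.
log2_attack_work = KEY_BITS / 2
```

Here `log2_accidental` comes out at about -33, within a factor of two of the 2⁻³² in the comment (which matches the looser k²/2ⁿ estimate), while `log2_attack_work` is 64. The comment's bound covers honest, random entries only; an attacker who chooses both inputs pays 2⁶⁴ regardless of how many entries the cache holds.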

Timeline.
  • 2026-06-10: the issue is spontaneously reported by Anthropic, along with lower severity issues in Sunlight and age. The issue is reported as severity LOW. Upon a first skim, I fail to recognize the actual severity, and defer it.
  • 2026-06-25: upon triaging the issue, I identify its severity, and coordinate with other Sunlight operators on a suitable release date. (No details of the vulnerability were disclosed to the operators.) Tuesday is selected to avoid US and Canada holidays.
  • 2026-06-27: fix deployed to the Navigli staging logs.
  • 2026-06-28: fix deployed to the Tuscolo production logs.
  • 2026-06-30: public release, fix deployed to Willow and Sycamore; Gouda deployment planned for 22:30 CEST.
Credit. This vulnerability was discovered by Claude, Anthropic's AI assistant, and triaged by the Anthropic security team in collaboration with Anthropic Research.

Alla prossima (well, hopefully not),
Filippo

Matthew McPherrin

Jun 30, 2026, 3:56:00 PM
to Filippo Valsorda, Certificate Transparency Policy
Let’s Encrypt has updated our logs to 0.8.1.

Thanks to Filippo for your hard work on maintaining Sunlight and working with us on coordinating schedules.

Our logs were down for a few minutes each while we deployed updated software.


Andrew Ayer

Jun 30, 2026, 4:55:58 PM
to Filippo Valsorda, Certificate Transparency Policy
Hi Filippo,

By remarkable coincidence I filed <https://github.com/transparency-dev/tessera/issues/1018> in Tessera/Tesseract last week. It's the same class of issue - a fault in the dedupe cache leading to an unincorporated SCT.

I made the observation that it's fragile for the input to the SCT signing operation to come from two places (the submitted certificate and the dedupe cache). Tesseract is going to address that by retrieving the entire entry from storage (instead of just the timestamp) and signing over that instead of an amalgamation.

I think it would be a very good idea for Sunlight to also do that, or to store the SCT signature in the cache. Either change would make the correct operation of the log not dependent on the correct operation of the dedupe cache.

By the way, v0.8.1 is not currently part of any branch in the sunlight repo.

Regards,
Andrew

Jeroen Massar

Jun 30, 2026, 5:44:10 PM
to Filippo Valsorda, Certificate Transparency Policy


> On 30 Jun 2026, at 21:29, Filippo Valsorda <fil...@ml.filippo.io> wrote:
>
> Hello fellow humans, AIs, and super-intelligent shades of the color blue,
>
> Today we released Sunlight v0.8.1, a security fix for v0.8.0.
>
> https://github.com/FiloSottile/sunlight/releases/tag/v0.8.1
> https://github.com/FiloSottile/sunlight/compare/v0.8.0...v0.8.1
>
> Summary. The Sunlight deduplication cache used 128-bit keys, which potentially allowed obtaining SCTs for forged certificates. In the WebPKI, this would have required a colluding, compromised, or separately vulnerable CA. Tuscolo, Willow, and Sycamore have rolled out the patch. Gouda is scheduled to roll out the patch soon.

IPng Rennet was upgraded at 20:30 CEST and Gouda has been updated since 22:30 CEST. Both running fine.

Cache regeneration is in progress.


Thanks Filippo for all the great work, and for the timely heads-up so we could be prepped for this quick and painless upgrade.

Regards,
Jeroen

Filippo Valsorda

Jun 30, 2026, 5:49:51 PM
to Andrew Ayer, Certificate Transparency Policy
2026-06-30 22:55 GMT+02:00 Andrew Ayer <ag...@andrewayer.name>:
> I made the observation that it's fragile for the input to the SCT signing operation to come from two places (the submitted certificate and the dedupe cache). Tesseract is going to address that by retrieving the entire entry from storage (instead of just the timestamp) and signing over that instead of an amalgamation.
>
> I think it would be a very good idea for Sunlight to also do that, or to store the SCT signature in the cache. Either change would make the correct operation of the log not dependent on the correct operation of the dedupe cache.

I considered retrieving the entry from storage (at least to upgrade the 128-bit key to the 256-bit version), but that would be a large change in the performance profile: serving a submission of a known certificate, which can be resubmitted an unlimited number of times (in the version where we do this for robustness, not for migration), would require retrieving 256 certificates from object storage, plus ~ 4 x 256 hashes to authenticate it. (Sunlight does not consider the object storage trusted, only the local cache and the lock storage.) Caching will not help against the worst case. I am kinda surprised TesseraCT can make that work.

The alternative is indeed storing the signature instead of recomputing it. That would roughly triple the size of the cache, which we just doubled. I had over-indexed on cache size, but 6x might actually matter?

The code is pretty easy to audit at this point, the cache is

SHA-256(entry_type || issuer_key_hash? || certificate) => timestamp, index

and the ECDSA message is

SHA-256(0 || 0 || timestamp || entry_type || issuer_key_hash? || certificate || index_extension)

I'll need to ponder it a bit. Thank you for the input.
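
For readers following along, the two formulas can be sketched in Python. This is a simplified illustration, not Sunlight's code: the field widths (a 64-bit timestamp and a 16-bit entry type) are assumptions based on the RFC 6962 SCT structure, the real encoding also carries length prefixes that are omitted here, and the `index_extension` bytes and all values are opaque placeholders.

```python
import hashlib
import struct


def cache_key(entry_type: int, issuer_key_hash: bytes, certificate: bytes) -> bytes:
    """SHA-256(entry_type || issuer_key_hash? || certificate)."""
    return hashlib.sha256(
        struct.pack(">H", entry_type) + issuer_key_hash + certificate
    ).digest()


def sct_message(timestamp: int, entry_type: int, issuer_key_hash: bytes,
                certificate: bytes, index_extension: bytes) -> bytes:
    """SHA-256(0 || 0 || timestamp || entry_type || issuer_key_hash? || certificate || index_extension)."""
    return hashlib.sha256(
        b"\x00\x00" + struct.pack(">Q", timestamp) + struct.pack(">H", entry_type)
        + issuer_key_hash + certificate + index_extension
    ).digest()


# Timestamp and index come from the cache record; the certificate comes from the submission.
cached = {"timestamp": 1751311770000, "index_extension": b"\x00\x00\x00\x00\x2a"}
logged = sct_message(cached["timestamp"], 0, b"", b"logged certificate", cached["index_extension"])
resubmitted = sct_message(cached["timestamp"], 0, b"", b"other certificate", cached["index_extension"])
```

Because the signed message mixes cached fields with the submitted certificate, a wrong cache hit produces a signature over a certificate that was never logged, which is the fragility discussed in this thread.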

> By the way, v0.8.1 is not currently part of any branch in the sunlight repo.

Yeah, I forked off v0.8.0 to minimize the diff and ensure the rollout wouldn't require a rollback. I'll merge it in main as I go on developing v0.9.0.

Andrew Ayer

Jul 1, 2026, 8:49:30 AM
to Filippo Valsorda, Certificate Transparency Policy
On Tue, 30 Jun 2026 23:49:27 +0200
"Filippo Valsorda" <fil...@ml.filippo.io> wrote:

> The alternative is indeed storing the signature instead of
> recomputing it. That would roughly triple the size of the cache,
> which we just doubled. I had over-indexed on cache size, but 6x might
> actually matter?

I think it would be fine to go back to a 128 bit hash, since the impact of a collision would be returning an SCT with an invalid signature, which is not a violation of the log's integrity. You can even validate the signature against the submission before returning it.

It might even be viable to use an even shorter hash, make the index non-unique, and if there are multiple hits, validate the signatures to find the right one?
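
A rough sketch of this idea, under loudly stated assumptions: HMAC-SHA-256 stands in for the log's ECDSA signature purely so the example runs on the Python standard library, the message layout is simplified, and every name here (`remember`, `lookup`, `short_key`) is hypothetical. The cache key is deliberately tiny and non-unique, and a record is only returned if its stored signature verifies against the submitted entry.

```python
import hashlib
import hmac
from collections import defaultdict

LOG_KEY = b"toy log key"  # stand-in secret; a real log would sign with ECDSA
SHORT_KEY_BYTES = 2       # deliberately short, so buckets may hold several records


def message(timestamp: int, entry: bytes) -> bytes:
    return timestamp.to_bytes(8, "big") + entry


def sign(msg: bytes) -> bytes:
    return hmac.new(LOG_KEY, msg, hashlib.sha256).digest()


def verify(msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(msg), sig)


def short_key(entry: bytes) -> bytes:
    return hashlib.sha256(entry).digest()[:SHORT_KEY_BYTES]


cache: dict[bytes, list[tuple[int, int, bytes]]] = defaultdict(list)


def remember(entry: bytes, timestamp: int, index: int) -> None:
    cache[short_key(entry)].append((timestamp, index, sign(message(timestamp, entry))))


def lookup(entry: bytes):
    """Return a cached record only if its stored signature covers this exact entry."""
    for timestamp, index, sig in cache.get(short_key(entry), []):
        if verify(message(timestamp, entry), sig):
            return timestamp, index, sig
    return None


remember(b"certificate A", 1000, 7)
# Simulate a key collision by also filing A's record under B's bucket.
cache[short_key(b"certificate B")].extend(cache[short_key(b"certificate A")])
```

With this shape, a collision in the cache key degrades to a cache miss rather than a signature over an unlogged entry.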

> The code is pretty easy to audit at this point, the cache is

Another benefit is resilience to operator mistakes. Consider an operator setting up a new shard who forgets to update the cache path when copying the config for the previous shard. If the new shard's expiry range overlaps with the previous shard's (which one log operator likes to do) then currently one of the logs dies as soon as a certificate is submitted to both shards.

I think it would be a meaningful increase in resilience if the global lock backend were the only storage trusted by Sunlight.

> > By the way, v0.8.1 is not currently part of any branch in the
> > sunlight repo.
>
> Yeah, I forked off v0.8.0 to minimize the diff and ensure the rollout
> wouldn't require a rollback. I'll merge it in main as I go on
> developing v0.9.0.

Note that GitHub is currently displaying the same warning for v0.8.1 commits that you get when viewing untrusted code from a fork. It's arguably a GitHub bug but not using branches is a weird way of using Git.

Regards,
Andrew

Filippo Valsorda

Jul 1, 2026, 9:55:30 AM
to Andrew Ayer, Certificate Transparency Policy
2026-07-01 14:49 GMT+02:00 Andrew Ayer <ag...@andrewayer.name>:
> On Tue, 30 Jun 2026 23:49:27 +0200
> "Filippo Valsorda" <fil...@ml.filippo.io> wrote:
>
> > The alternative is indeed storing the signature instead of
> > recomputing it. That would roughly triple the size of the cache,
> > which we just doubled. I had over-indexed on cache size, but 6x might
> > actually matter?
>
> I think it would be fine to go back to a 128 bit hash, since the impact of a collision would be returning an SCT with an invalid signature, which is not a violation of the log's integrity. You can even validate the signature against the submission before returning it.

That's still a 5x size increase compared to v0.8.0.

Would be useful to hear from Let's Encrypt and IPng if this would be a problem for them. The resilience argument is compelling.

> > > By the way, v0.8.1 is not currently part of any branch in the
> > > sunlight repo.
> > Yeah, I forked off v0.8.0 to minimize the diff and ensure the rollout
> > wouldn't require a rollback. I'll merge it in main as I go on
> > developing v0.9.0.
>
> Note that GitHub is currently displaying the same warning for v0.8.1 commits that you get when viewing untrusted code from a fork. It's arguably a GitHub bug but not using branches is a weird way of using Git.

Definitely a poor implementation on GitHub's part: refs/tags/ is just as trusted as refs/heads/ if not more. (I use Jujutsu, so it actually feels weirder to manually create an extra bookmark for something that is developed branchless and already pushed as a tag.)

Jeroen Massar

Jul 2, 2026, 9:10:36 AM
to Filippo Valsorda, Andrew Ayer, Certificate Transparency Policy


> On 1 Jul 2026, at 15:55, Filippo Valsorda <fil...@ml.filippo.io> wrote:
>
> 2026-07-01 14:49 GMT+02:00 Andrew Ayer <ag...@andrewayer.name>:
>>
>>
>> On Tue, 30 Jun 2026 23:49:27 +0200
>> "Filippo Valsorda" <fil...@ml.filippo.io> wrote:
>>
>> > The alternative is indeed storing the signature instead of
>> > recomputing it. That would roughly triple the size of the cache,
>> > which we just doubled. I had over-indexed on cache size, but 6x might
>> > actually matter?
>>
>> I think it would be fine to go back to a 128 bit hash, since the impact of a collision would be returning an SCT with an invalid signature, which is not a violation of the log's integrity. You can even validate the signature against the submission before returning it.
>
>
> That's still a 5x size increase compared to v0.8.0.
>
> Would be useful to hear from Let's Encrypt and IPng if this would be a problem for them. The resilience argument is compelling.

The rennet2026h2 conversion just finished; unfortunately do not have a before-snapshot of the sizes (if I run another I will definitely do that).

We currently have:

102K rennet2025h2 (recomputed)
7.6M rennet2026h1 (recomputed)
8.3G rennet2026h2 (recomputed)
235K rennet2027h1
235K rennet2027h2

22G gouda2025h2
50G gouda2026h1
21G gouda2026h2
1.1G gouda2027h1
427K gouda2027h2

They all have "cache256" table for new entries and the old "cache" table for the 128bit ones. Did not drop them yet.

Running that conversion on rennet2026h2 took quite some time (~2 days) btw; I think there is either some locking or other thing happening that it is slowing it down (average rate was around 500/s ...) or it was just slow as it was just heavy and we should have some multi-core thing happening...

even if you take the above largest 50G and make it 300G.... that is doable from storage perspective IMHO.

Noting that the above is the operational set, the 2025h2 ones are retired already, when archived they do not take space; thus operationally rotating them to archive frees that storage up for the next year.

Regards,
Jeroen

Filippo Valsorda

Jul 2, 2026, 9:24:39 AM
to Jeroen Massar, Andrew Ayer, Certificate Transparency Policy
I don’t recommend dropping the old table: that would lock the database for long enough to cause a visible outage. The rename disables the fallback, but it does not reclaim space.

> Running that conversion on rennet2026h2 took quite some time (~2 days) btw; I think there is either some locking or other thing happening that it is slowing it down (average rate was around 500/s ...) or it was just slow as it was just heavy and we should have some multi-core thing happening...

The recomputation is not very optimized, it’s designed to be safe and simple, but it should be I/O bound, not CPU bound.

On our machine the rate was more like 6,000-8,000/s, so this suggests there’s something slow in your I/O stack.

(It’s not urgent, anyway. You could start them all in parallel and let them run for a week or two.)
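
As a back-of-the-envelope check on the figures quoted in this exchange (both are approximate, so the results are only indicative):

```python
SECONDS_PER_DAY = 86_400

# Figures quoted in the thread: about 2 days at roughly 500 entries/s.
entries = 500 * 2 * SECONDS_PER_DAY

# The same workload at the 6,000-8,000 entries/s rate mentioned above.
hours_at_6000 = entries / 6_000 / 3_600
hours_at_8000 = entries / 8_000 / 3_600
```

That works out to roughly 86 million entries, which would take about 3 to 4 hours at the faster rate instead of two days.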

> even if you take the above largest 50G and make it 300G.... that is doable from storage perspective IMHO.

Yeah I think I’m 90% convinced to add the signature to the cache. The numbers are not prohibitive and only trusting the lock backend, as Andrew pointed out, is compelling.

Would be good to get a thumbs up from Let’s Encrypt as well on the cache size growth, since they’re the ones that store it separately from object storage.

Jeroen Massar

Jul 2, 2026, 11:57:21 AM
to Filippo Valsorda, Certificate Transparency Policy


> On 2 Jul 2026, at 15:24, Filippo Valsorda <fil...@ml.filippo.io> wrote:
>
> 2026-07-02 15:10 GMT+02:00 Jeroen Massar <jer...@massar.ch>:

> [..]
>> Running that conversion on rennet2026h2 took quite some time (~2 days) btw; I think there is either some locking or other thing happening that it is slowing it down (average rate was around 500/s ...) or it was just slow as it was just heavy and we should have some multi-core thing happening...
>
>
> The recomputation is not very optimized, it’s designed to be safe and simple, but it should be I/O bound, not CPU bound.
>
> On our machine the rate was more like 6,000-8,000/s, so this suggests there’s something slow in your I/O stack.
>
> (It’s not urgent, anyway. You could start them all in parallel and let them run for a week or two.)

Yes, that is why I did not take the time to peek at where the bottleneck was yet.

>> even if you take the above largest 50G and make it 300G.... that is doable from storage perspective IMHO.
>
>
> Yeah I think I’m 90% convinced to add the signature to the cache. The numbers are not prohibitive and only trusting the lock backend, as Andrew pointed out, is compelling.

Sounds reasonable to me indeed.

I'll hold off on the recompute for the other logs for the time being unless you say otherwise.

I guess that if you apply the above change it might imply another recompute, and if that is the long-term one, I'll have a poke at where the bottleneck is. It did not obviously look like I/O on the NVMe, but that was only a quick look at btop.

Good thing of having staging and prod logs, can 'safely' run it first on staging ;)

Regards,
Jeroen
