
Static CT API v1.0.0-rc.1 and Sunlight v0.3.1


Filippo Valsorda

Aug 28, 2024, 1:42:34 PM
to Certificate Transparency Policy
Hi all,

I'm happy to announce a release candidate of the Static CT API specification at https://c2sp.org/static...@v1.0.0-rc.1. There are no significant changes since https://github.com/C2SP/C2SP/pull/76.

The Sunlight implementation has been updated to implement the latest spec, and fresh development logs are available at https://rome.ct.filippo.io. These are not production logs, but we invite CAs to submit to them (but not include the SCTs in production certificates) and monitors to... well, monitor them :)

(The new Sunlight version also implements a two-step commit algorithm that should be more robust against object storage misbehavior and make recovery in case of simultaneous sequencers easier. This was suggested at https://github.com/FiloSottile/sunlight/issues/11.)

Looking forward to any feedback!

Cheers,
Filippo

P.S. I will be on vacation from tomorrow to September 9th. Hopefully that will lead to delayed responses :)

Matthew McPherrin

Aug 28, 2024, 2:28:21 PM
to Filippo Valsorda, Certificate Transparency Policy
Thanks Filippo! I'd like to thank everyone who has contributed to this specification. It's very exciting to get close to a 1.0 release.

Let's Encrypt plans to shortly take down our existing Sunlight log shards, which are based on the previous draft of the specification. We will stand up new logs running the current version of the specification.

We intend Twig to remain a test log, with Willow and Sycamore applying to become trusted logs once the CT programs are ready to accept such logs.

I will update this thread when the new logs are available. As always, the list of Let's Encrypt logs will be updated at https://letsencrypt.org/docs/ct-logs/ as well.

Matthew McPherrin
Let's Encrypt SRE


Elliot Cubit

Aug 29, 2024, 4:39:08 PM
to Certificate Transparency Policy, Matthew McPherrin, Certificate Transparency Policy, Filippo Valsorda
This seems like a good place to mention that Censys now supports monitoring the static CT API and is monitoring the development logs (Rome 2024h2/2025h1/2025h2).

We expect to finish mirroring the largest (2024h2) in a few days.

We are also prepared to mirror the Let's Encrypt logs running the new version of the spec as soon as they are operational.

Thanks to everyone who has contributed!
Elliot Cubit
Censys Software Engineer

Matthew McPherrin

Oct 3, 2024, 5:07:15 PM
to Filippo Valsorda, Certificate Transparency Policy
Let's Encrypt is now running three logs on Sunlight v0.3.1, implementing the v1.0.0-rc.1 specification.

The details of those logs are available at https://letsencrypt.org/docs/ct-logs/#Sunlight as well as on the landing page for each log:

https://twig.ct.letsencrypt.org/ (test log)

The previous twig, sycamore, and willow shards have been deleted.
The new shards are designated 2025h1b and 2025h2b (adding a "b" suffix). 2026 shards will come in the near future.

Let's Encrypt is currently submitting production pre-certificates and certificates to all three, though of course not using the SCTs in certificates.
Our staging environment is using SCTs from twig and Rome, and submitting final certificates there as well.

If you have any questions or concerns, please feel free to email me, this list, or ask on our community forum: https://community.letsencrypt.org/

I look forward to any feedback from anyone who is trying to use these logs.

Matthew McPherrin
Let's Encrypt SRE

Andrew Ayer

Oct 15, 2024, 8:49:27 AM
to Matthew McPherrin, 'Matthew McPherrin' via Certificate Transparency Policy, Filippo Valsorda
Hi Matthew,

I assume it was a mistake to set Willow 2025h2b's expiry end date in
2026? Unfortunately, the log does indeed contain a certificate expiring
as late as July 24, 2026 (at index 1140900) so this log would not comply
with Chrome's CT Log Policy regarding maximum shard length.

I can't help but wonder if the weird start/end dates contributed to this
mistake. When I was configuring my monitor, I was so focused on the
month and day that I didn't notice the incorrect year and ended up
misconfiguring it as 2025. (I only became aware of the mistake this
morning when I received alerts about the log containing certificates
outside of the range.) If logs generally configured their shards using
strings like "2025", "2025h2" or "2025q3" instead of a pair of
date-times, it seems like mistakes would be less likely.

Regards,
Andrew

Filippo Valsorda

Oct 15, 2024, 9:08:15 AM
to Andrew Ayer, Matthew McPherrin, Certificate Transparency Policy
2024-10-15 14:49 GMT+02:00 Andrew Ayer <ag...@andrewayer.name>:
> I can't help but wonder if the weird start/end dates contributed to this
> mistake.  When I was configuring my monitor, I was so focused on the
> month and day that I didn't notice the incorrect year and ended up
> misconfiguring it as 2025.  (I only became aware of the mistake this
> morning when I received alerts about the log containing certificates
> outside of the range.)  If logs generally configured their shards using
> strings like "2025", "2025h2" or "2025q3" instead of a pair of
> date-times, it seems like mistakes would be less likely.

Matthew McPherrin

Oct 15, 2024, 10:13:47 AM
to Andrew Ayer, 'Matthew McPherrin' via Certificate Transparency Policy, Filippo Valsorda
Thanks for pointing this out.  That does look like a mistake.

I feel like a small explanation of the dates is in order:

1. We don't want shard start/end transitions to happen over the New Year's holiday, to reduce the risk of anything breaking or alerting during the holidays.
2. We want the start/end transitions of our own logs to be offset from each other, as well as from other logs. This makes sure the test shards get traffic first, and ensures any issues encountered during the switchover from one shard to the next don't happen for all the logs at once.
3. We want transitions to happen on weekdays.

For this example, the 2025h1 shards are backdated from the "ideal" Jan 1st 2025, to Dec 17th, 18th, and 19th of 2024.  All of the other dates follow from them.

This does imply we'll have to stop the Willow shard 2025h2b, and replace it with Willow 2025h2c.

Given these logs aren't yet trusted, I expect we will do so promptly without much further notice. 

For some immediate mitigation to ensure this exact mistake doesn't happen again, I will add a test that the shards aren't too long.
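
Roughly something like the following sketch; the config values are hypothetical and the 190-day bound is an assumption (six months plus an allowance for our offsets), not our actual test:

package config

import (
	"testing"
	"time"
)

// maxShardLength is an assumed bound: roughly six months plus a couple of
// weeks of deliberate start/end offset.
const maxShardLength = 190 * 24 * time.Hour

func TestShardsNotTooLong(t *testing.T) {
	// Hypothetical entries; in practice these come from the log configs.
	shards := map[string]struct{ start, end time.Time }{
		"willow2025h2c": {
			time.Date(2025, 6, 19, 0, 0, 0, 0, time.UTC),
			time.Date(2025, 12, 18, 0, 0, 0, 0, time.UTC),
		},
	}
	for name, s := range shards {
		if got := s.end.Sub(s.start); got > maxShardLength {
			t.Errorf("%s: shard spans %v, which exceeds %v", name, got, maxShardLength)
		}
	}
}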

Philippe Boneff

Oct 15, 2024, 10:46:21 AM
to Matthew McPherrin, Andrew Ayer, 'Matthew McPherrin' via Certificate Transparency Policy, Filippo Valsorda
> When I was configuring my monitor, I was so focused on the
> month and day that I didn't notice the incorrect year and ended up
> misconfiguring it as 2025
I feel the pain; I got bitten by the same thing a few months ago (on a different log, though).

Multiple RFC 6962 operators have adopted the hX/qX syntax, but they don't necessarily all follow the same end-date patterns, and there are also temporal logs that don't use this syntax at all. Reading Matthew's message, I now realize that the logic for deciding LE's end dates is a bit more elaborate than I thought (and I actually think all these points make a lot of sense).

Andrew, were you thinking of pushing this further than Sunlight?
Would it make sense, for instance, to have something in the specs that says that a temporal shard must have a YYYY{,h1,h2} suffix? And that the end_date of a log must be within X days of the boundary of this period?

Cheers,
Philippe


Andrew Ayer

Oct 15, 2024, 11:18:05 AM
to Philippe Boneff, 'Philippe Boneff' via Certificate Transparency Policy, Matthew McPherrin, Filippo Valsorda
On Tue, 15 Oct 2024 16:45:40 +0200
"'Philippe Boneff' via Certificate Transparency Policy"
<ct-p...@chromium.org> wrote:

> Andrew, were you thinking of pushing this further than Sunlight?
> Would it make sense for instance to have something in the specs that
> says that a temporal shard must have a YYYY{,h1,h2} suffix? And that
> the end_date of a log must not be X days +/- the beginning of this
> period?

I don't think the specs should be too prescriptive about shards because
the ecosystem is going to need to adapt sharding strategies as certificate
lifetimes become shorter and shorter.

I'd rather:

1. Spec a mandatory log API endpoint that returns machine-readable
metadata that monitors, log list curators, submitters, etc. can
retrieve instead of manually copying information from web pages
or log inclusion bugs. To avoid reinventing the wheel, the API
endpoint can just return the same log object that the JSON log list
uses.

2. Encourage log software to provide user-friendly ways of configuring
shards that aren't start/end dates. The software could offset the
start/end dates if desired, but the calculations should be done by the
software, not humans.
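
For illustration, a minimal Go sketch of what point 2 could look like; this helper is hypothetical, not from any existing log software:

package shards

import (
	"fmt"
	"time"
)

// Interval expands a shard label like "2025", "2025h2", or "2025q3" into a
// [start, end) interval in UTC. Any operator-specific offsets would then be
// applied by the software, not computed by hand.
func Interval(label string) (start, end time.Time, err error) {
	date := func(y, m int) time.Time {
		// time.Date normalizes month 13 to January of the next year.
		return time.Date(y, time.Month(m), 1, 0, 0, 0, 0, time.UTC)
	}
	var year, n int
	switch {
	case len(label) == 4: // "2025": a whole year
		if _, err := fmt.Sscanf(label, "%4d", &year); err == nil {
			return date(year, 1), date(year+1, 1), nil
		}
	case len(label) == 6 && label[4] == 'h': // "2025h2": a half year
		if _, err := fmt.Sscanf(label, "%4dh%1d", &year, &n); err == nil && n >= 1 && n <= 2 {
			return date(year, 1+(n-1)*6), date(year, 1+n*6), nil
		}
	case len(label) == 6 && label[4] == 'q': // "2025q3": a quarter
		if _, err := fmt.Sscanf(label, "%4dq%1d", &year, &n); err == nil && n >= 1 && n <= 4 {
			return date(year, 1+(n-1)*3), date(year, 1+n*3), nil
		}
	}
	return time.Time{}, time.Time{}, fmt.Errorf("unrecognized shard label %q", label)
}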

Regards,
Andrew

Matthew McPherrin

Oct 15, 2024, 2:34:51 PM
to Andrew Ayer, Philippe Boneff, 'Philippe Boneff' via Certificate Transparency Policy, Filippo Valsorda
Let's Encrypt now has configuration validation ensuring that our log shards are approximately six months long.

I'm definitely in favour of sharing log metadata as JSON, à la the Apple/Google log lists.
I think the biggest question for me is: should we host one JSON file per operator at an arbitrary location?
Or should each log host metadata at a prescribed location relative to the root of the log?
Per-log metadata is easiest to automate, but a single JSON file of all logs might be easier to consume.

Elliot Cubit

Oct 15, 2024, 2:38:56 PM
to Certificate Transparency Policy, Matthew McPherrin, Philippe Boneff, 'Philippe Boneff' via Certificate Transparency Policy, Filippo Valsorda, Andrew Ayer
I think that a single JSON file per operator at an arbitrary location is the right choice - this would be very, very useful for Censys (and probably other monitors).

Filippo Valsorda

Oct 15, 2024, 3:56:49 PM
to Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Philippe Boneff, Andrew Ayer
I would like to serve such a file from the Sunlight instance itself, so ideally you'd support merging multiple lists for an operator.

Note that per-instance is different from per-log: Sunlight instances can host multiple logs (and I expect usually each instance will host all the shards under the same name). For example, there would be two production Let's Encrypt files in the current configuration:


This would be just barely less convenient for monitors, but would minimize the margin of error and the manual operations required of log operators. (Operators would still be free to manage a single list manually at an arbitrary location.)


(Note how the root of a Sunlight instance already presents an HTML list of its logs. e.g. https://sycamore.ct.letsencrypt.org)
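
If instances each served such a file, merging them on the monitor side would only be a few lines. A rough Go sketch, assuming each file follows the JSON log list v3 shape (operators containing logs); the names here are hypothetical:

package ctmeta

import (
	"encoding/json"
	"net/http"
)

// logList mirrors just enough of the log list v3 shape to merge files.
type logList struct {
	Operators []struct {
		Name string            `json:"name"`
		Logs []json.RawMessage `json:"logs"`
	} `json:"operators"`
}

// mergeLogs fetches several per-instance metadata files and concatenates
// the log entries they contain.
func mergeLogs(urls []string) ([]json.RawMessage, error) {
	var all []json.RawMessage
	for _, u := range urls {
		resp, err := http.Get(u)
		if err != nil {
			return nil, err
		}
		var ll logList
		err = json.NewDecoder(resp.Body).Decode(&ll)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		for _, op := range ll.Operators {
			all = append(all, op.Logs...)
		}
	}
	return all, nil
}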

Joe DeBlasio

Oct 15, 2024, 4:50:39 PM
to Filippo Valsorda, Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Philippe Boneff, Andrew Ayer
I think Chrome folks would be excited about a metadata json file, either in a per-log or per-operator form. I personally think it'd make some sense to opportunistically put this into the tiled log spec. That would add weight to the per-log model. There's definitely value in a per-operator file (e.g. if we explored something like a submission-proxy like we discussed at the transparency.dev summit -- more details to come), but that's still nascent, so I personally favor the simpler per-log approach. I'm also personally probably least interested in a per-instance file -- per-instance is not a conceptually useful abstraction for many besides the log operator. If support for per-log configuration existed in log software, we'd likely move to require it for future log applications.

Apropos of matching the existing log list schemas, we'll be announcing (hopefully by end-of-next-week) changes we'll be making to accommodate static-ct-api logs, including a minor addition to our v3 log list schema that'll add a separate list for tiled logs, and an additional entry for the monitoring URL prefix.

Joe


Philippe Boneff

Oct 15, 2024, 4:52:39 PM
to Filippo Valsorda, Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Andrew Ayer
Right now, from a client and configuration point of view, two temporal logs that share a prefix (e.g. sycamore2025h1 and sycamore2025h2) are no different from two logs that have different prefixes (e.g. argon2025h1, xenon2025h2). For instance, clients don't need to configure them together; they can be configured as independent logs.

If two logs now expose this file on a single endpoint, that introduces a new URL to keep track of, and some logical connection between the two temporal logs. Also, I like that Sunlight logs follow a `<name>` prefix plus `/<temporal>` path pattern, but that might not be what every log operator decides to use. I can see pros (and cons) for having a per-operator endpoint or a per-log endpoint, but I wonder whether introducing a new level of logical grouping is worth it. I'll admit that there is already a precedent here, because temporal logs are grouped on crbug, but that seems fine since it is intended for humans, not for machines.

Cheers,
Philippe

Adit Sachde

Oct 22, 2024, 11:24:40 AM
to Philippe Boneff, Filippo Valsorda, Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Andrew Ayer
I operate the Itko CT log (https://github.com/aditsachde/itko#public-instance), which runs my implementation of the static CT API spec. With Itko, each temporal shard is its own instance, so implementing an endpoint which covers all temporal shards would be difficult. I've added this endpoint to my log, following the log list v3 schema: https://ct2025.itko.dev/logs.v3.json

I also wanted to document an edge case for operators with static CT logs that hit my log last week.

Andrew notified me that the data tile `x023/404` did not match the hashes in the corresponding level 0 tile. However, he also checked the uncached version of the tile, which returned the correct data. After investigating, I believe the following sequence of events occurred.

1. The log tried to incorporate a bunch of new entries into the tree
2. The full data tile was written to disk
3. The log got OOM killed before the checkpoint was written
4. A consumer tried to opportunistically fetch the full data tile, putting the incorrect version in cache
5. The log rolled back to the last signed checkpoint on restart (which is fine because SCTs are not issued until the checkpoint is written to disk)
6. The log wrote a new data tile, hashes, and checkpoint
7. The incorrect data tile remained in cache, resulting in a mismatch

This case seems possible to hit with Sunlight as well, especially if the log is running on very constrained hardware as mine is, so it might be worth documenting somewhere. From what I can tell, Let's Encrypt's Sunlight logs don't cache with CloudFront, so they shouldn't hit this issue.

Best,
Adit


Matthew McPherrin

Oct 22, 2024, 11:47:41 AM
to Certificate Transparency Policy
Willow 2025h2c is running now. 2025h2b will be shut down in the future.

Information about all Willow shards is available at https://willow.ct.letsencrypt.org/, and all Let's Encrypt logs are documented at https://letsencrypt.org/docs/ct-logs/

{
  "description": "willow 2025h2c",
  "log_id": "kqECxXwi2rGMzCrnH9TMWcBdJR2hbHPiKBvT8LBImIc=",
  "key": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEaUvzqBm/C9pNUsVI1jqpms5OkW3Kk+Eb3/veW6P3ogOItkqqEvkZfU7zBbsvm1j1Ep003iNUGFOrilPl5TpCRg==",
  "temporal_interval": {
    "start_inclusive": "2025-06-19T00:00:00Z",
    "end_exclusive": "2025-12-18T00:00:00Z"
  },
  "url": "https://willow.ct.letsencrypt.org/2025h2c/"
}

Rob Stradling

Oct 23, 2024, 6:26:41 AM
to Certificate Transparency Policy
crt.sh now supports monitoring the Static CT API.

I've added the currently available Willow and Sycamore logs, since they are anticipated to become production logs.  (It'll take quite a while for crt.sh to ingest the backlog from these logs!)
I also briefly added the Rome logs, but then disabled them again.  (crt.sh is focused on publicly-trusted certificates, whereas the Rome logs are development logs that accept staging and test roots.)

A few shout-outs to other projects that greatly helped me in this effort...


Al Cutter

Oct 23, 2024, 7:00:59 AM
to Adit Sachde, Philippe Boneff, Filippo Valsorda, Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Andrew Ayer
On Tue, Oct 22, 2024 at 4:24 PM Adit Sachde <m...@aditsachde.com> wrote:
> I operate the Itko CT log (https://github.com/aditsachde/itko#public-instance), which runs my implementation of the static CT API spec. With Itko, each temporal shard is its own instance, so implementing an endpoint which covers all temporal shards would be difficult. I've added this endpoint to my log, following the log list v3 schema: https://ct2025.itko.dev/logs.v3.json
>
> I also wanted to document an edge case for operators with static CT logs that hit my log last week.
>
> Andrew notified me that the data tile `x023/404` did not match the hashes in the corresponding level 0 tile. However, he also checked the uncached version of the tile, which returned the correct data. After investigating, I believe the following sequence of events occurred.
>
> 1. The log tried to incorporate a bunch of new entries into the tree
> 2. The full data tile was written to disk
> 3. The log got OOM killed before the checkpoint was written
> 4. A consumer tried to opportunistically fetch the full data tile, putting the incorrect version in cache
> 5. The log rolled back to the last signed checkpoint on restart (which is fine because SCTs are not issued until the checkpoint is written to disk)
> 6. The log wrote a new data tile, hashes, and checkpoint
> 7. The incorrect data tile remained in cache, resulting in a mismatch
>
> This case seems possible to hit with Sunlight as well, especially if the log is running on very constrained hardware as mine is, so it might be worth documenting somewhere. From what I can tell, Let's Encrypt's Sunlight logs don't cache with CloudFront, so they shouldn't hit this issue.

This is in some ways a client failure at step (4), since the client read and relied on data the log had never committed to (no checkpoint existed at the time committing to a tree size that implied the existence of those tiles). However, the implication that a bad client can poison downstream caches for everybody else is interesting...
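
A sketch of the defensive pattern, with hypothetical names and an illustrative tile path (full data tiles hold 256 entries in the tiled-log design):

package client

import "fmt"

// fetchFullDataTile fetches a full data tile only if a verified checkpoint
// of size ckptSize commits to it. The path format here is illustrative,
// not the spec's exact tile path encoding.
func fetchFullDataTile(ckptSize, tileIndex uint64, get func(path string) ([]byte, error)) ([]byte, error) {
	const tileWidth = 256 // entries per full tile
	if (tileIndex+1)*tileWidth > ckptSize {
		// The log never committed to this full tile: fetching (and
		// caching) it would rely on unsigned, possibly rolled-back data.
		return nil, fmt.Errorf("tile %d not committed by checkpoint of size %d", tileIndex, ckptSize)
	}
	return get(fmt.Sprintf("tile/data/%d", tileIndex))
}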

I think Tessera avoids the issue because it durably sequences entries at the time they're added, so tree integrations then become "just" an idempotent derivation (modulo the POSIX storage impl which I suspect is currently vulnerable to this).

Cheers,
Al.
 

Filippo Valsorda

Oct 23, 2024, 7:44:06 AM
to Adit Sachde, Philippe Boneff, Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Andrew Ayer
2024-10-22 16:50 GMT+02:00 Adit Sachde <m...@aditsachde.com>:
> I operate the Itko CT log (https://github.com/aditsachde/itko#public-instance), which runs my implementation of the static CT API spec. With Itko, each temporal shard is its own instance, so implementing an endpoint which covers all temporal shards would be difficult. I've added this endpoint to my log, following the log list v3 schema: https://ct2025.itko.dev/logs.v3.json
>
> I also wanted to document an edge case for operators with static CT logs that hit my log last week.
>
> Andrew notified me that the data tile `x023/404` did not match the hashes in the corresponding level 0 tile. However, he also checked the uncached version of the tile, which returned the correct data. After investigating, I believe the following sequence of events occurred.
>
> 1. The log tried to incorporate a bunch of new entries into the tree
> 2. The full data tile was written to disk
> 3. The log got OOM killed before the checkpoint was written
> 4. A consumer tried to opportunistically fetch the full data tile, putting the incorrect version in cache
> 5. The log rolled back to the last signed checkpoint on restart (which is fine because SCTs are not issued until the checkpoint is written to disk)
> 6. The log wrote a new data tile, hashes, and checkpoint
> 7. The incorrect data tile remained in cache, resulting in a mismatch
>
> This case seems possible to hit with Sunlight as well, especially if the log is running on very constrained hardware as mine is, so it might be worth documenting somewhere. From what I can tell, Let's Encrypt's Sunlight logs don't cache with CloudFront, so they shouldn't hit this issue.

Thank you for sharing this! Indeed, early versions of Sunlight worked similarly. We switched to a two-step commit process based on feedback from Jelle van den Hooff, and we have extensive tests for the various ways the log can crash, which also ensure that immutable assets are never changed. Note that the only mutable resources are the checkpoint and the staging bundles (which are not part of the API and of no interest to clients).

Three concerns led to the change (which added a storage round-trip to serialization, so not entirely free):
  1. if a writer crashes and restarts it needs to update "production" tiles, which therefore can't be marked read-only with the storage backend;
  2. if two writers run in parallel the "losing" one that fails to update the lock might still win the race to write a tile, which would not be a split but would present inconsistent data to clients;
  3. if the storage backend rolls back a tile to a previously attempted write (for example because cancellation of the write didn't propagate) which was part of a failed sequencing round, the log would present inconsistent data to clients.
2 and 3 are only recoverable if you have object versioning enabled. Some S3-like services don't offer that feature, but do have "create-only-if-not-exists", which however breaks due to 1. The cache poisoning issue you encountered is similar to 1.

Ultimately, I am very glad we switched to a design where all client-visible writes (except the checkpoint, which anyway is stored authoritatively in the lock database) are idempotent, and would recommend that to anyone working with object storage. (I spent so much time modeling "zombie writes" like in point 3 above, which are just not an issue this way.)
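
To make that shape concrete, here is a much-simplified sketch of such a sequencing round; the interfaces are hypothetical, and this is not Sunlight's actual code:

package sequencer

// Backend is an object storage client. Upload must be idempotent: writing
// the same key with the same bytes any number of times is safe.
type Backend interface {
	Upload(key string, data []byte) error
}

// LockDB holds the single authoritative copy of the checkpoint. Swap
// atomically replaces the old checkpoint with a new one, failing if
// another sequencer committed first.
type LockDB interface {
	Swap(oldCheckpoint, newCheckpoint []byte) error
}

// sequence runs one two-step commit round: durably stage, commit in the
// lock database, then replay idempotent client-visible writes. A sequencer
// that crashes at any point here can be restarted safely.
func sequence(b Backend, db LockDB, oldCheckpoint, stagedBundle, newCheckpoint []byte, tiles map[string][]byte) error {
	// Step 1: stage the round. The staging bundle is mutable state and is
	// not part of the API, so clients never see it.
	if err := b.Upload("staging/bundle", stagedBundle); err != nil {
		return err
	}
	// Step 2: commit by swapping the checkpoint in the lock database. A
	// losing sequencer fails here having written nothing client-visible.
	if err := db.Swap(oldCheckpoint, newCheckpoint); err != nil {
		return err
	}
	// Publish. Every tile write is an idempotent derivation of the
	// committed round, so replays and zombie writes are harmless.
	for key, data := range tiles {
		if err := b.Upload(key, data); err != nil {
			return err // safe to retry with identical bytes
		}
	}
	return b.Upload("checkpoint", newCheckpoint)
}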

Adit Sachde

Oct 29, 2024, 3:53:33 PM
to Filippo Valsorda, Philippe Boneff, Elliot Cubit, Certificate Transparency Policy, Matthew McPherrin, Andrew Ayer
Thanks for the rundown, Filippo! I was not aware that Sunlight had moved to a two-step commit process; the concerns you listed for why make a lot of sense.

I really like the tar-based approach that Sunlight uses; I will implement a similar mechanism in Itko when I get a chance.

Best,
Adit

Luke Valenta

Nov 20, 2024, 10:27:14 AM
to Rob Stradling, Certificate Transparency Policy
Hi folks,

Cloudflare also now supports monitoring Static CT API logs. We're monitoring only the Willow and Sycamore logs for now as well.  It'll take a few more days before we catch up on sycamore2025h1b (crawling Static CT logs is blazingly fast, but we had to add rate limits for internal consumers), but stats are available now in Merkle Town: https://ct.cloudflare.com/logs.

I'd also like to echo Rob's shout-outs! It was extremely helpful to have those other projects to reference.

Best,
Luke



--
Luke Valenta
Systems Engineer - Research

Matthew McPherrin

Nov 20, 2024, 12:27:27 PM
to Luke Valenta, Rob Stradling, Certificate Transparency Policy
I'm glad to see these monitored by Cloudflare. Thanks!

One thing I notice is that the uptime measured in Merkle Town appears significantly lower than our own measurements, and our targeted service level.

Our own measurements show submission success rates between 99.9% and 100%.

If at all possible, I'd love to get more information about what problems you're seeing.



Luke Valenta

Nov 20, 2024, 1:26:17 PM
to Matthew McPherrin, Rob Stradling, Certificate Transparency Policy
Hi Matthew,

I'm seeing some 400 responses to our add-(pre-)chain requests, which looks like we're submitting chains that the log considers invalid and counting those responses as the log being down (which is perhaps a mistake on our end). I'll investigate and follow up in the transparency-dev Slack, but I wouldn't put too much stake in those uptime metrics for the time being.

Best,
Luke
