The Sunlight CT log implementation and monitoring API


Filippo Valsorda

Mar 14, 2024, 11:26:59 AM
to Certificate Transparency Policy

Hi all,

Late last year multiple WebPKI community participants noticed the same set of issues burdening the CT ecosystem, and came up with overlapping solutions. Today, I am happy to announce Sunlight, a design developed in collaboration with Let’s Encrypt, and borne out of discussions with the Chrome and TrustFabric teams at Google, the Sigsum project, and other CT log and monitor operators.

You can find a design document at https://filippo.io/a-different-CT-log, a formal specification of the API at https://c2sp.org/sunlight, the implementation at https://github.com/FiloSottile/sunlight, and additional resources including running logs at https://sunlight.dev.

This design is probably not perfect. The question we’re asking today is whether it is moving in the right direction, and if the migration path is sustainable for every stakeholder. If so, we hope we can incorporate feedback and go ahead with deploying it, and then iterate further in the future.

Technical details

We invite everyone to read the more in-depth resources above, but here is a summary of the technical details.

No Merge Delay. add-chain requests are pooled and held until the next sequencing, which happens every second. No SCT is returned until the leaf is merged. add-chain requests complete in ~700ms on average, 1.5s at most.

SCT leaf_index extension. SCTs include an RFC 6962 extension with the index of the incorporated leaf.
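
For illustration, here is a minimal Go sketch of pulling the leaf index out of an SCT's extensions field. It assumes the encoding described by the c2sp.org/sunlight spec (a list of extensions where type 0 carries a big-endian 40-bit leaf index); the helper name and error handling are ours, not part of the Sunlight API.

    package sketch

    import (
    	"errors"

    	"golang.org/x/crypto/cryptobyte"
    )

    // leafIndexFromExtensions scans an SCT's extensions body (outer length
    // prefix already stripped) for the leaf_index extension. Assumed wire
    // format per the spec: a uint8 extension_type followed by a 16-bit
    // length-prefixed extension_data, with type 0 carrying a 40-bit
    // big-endian leaf index.
    func leafIndexFromExtensions(ext []byte) (uint64, error) {
    	s := cryptobyte.String(ext)
    	for !s.Empty() {
    		var extType uint8
    		var data cryptobyte.String
    		if !s.ReadUint8(&extType) || !s.ReadUint16LengthPrefixed(&data) {
    			return 0, errors.New("malformed extensions")
    		}
    		if extType != 0 /* leaf_index */ {
    			continue
    		}
    		if len(data) != 5 {
    			return 0, errors.New("unexpected leaf_index length")
    		}
    		var idx uint64
    		for _, b := range data {
    			idx = idx<<8 | uint64(b)
    		}
    		return idx, nil
    	}
    	return 0, errors.New("no leaf_index extension")
    }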

Object storage backed. All critical data is stored in object or file storage, like S3 or a filesystem. There is no relational database.

Static tile-based read path. Monitors can fetch the tree directly from object storage, where it’s laid out as “tiles”. (A Go client library is coming soon, and we’d love to hear from monitors what they need from it.)
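
As a rough sketch of what "tiles" means in practice, the sumdb tlog package (which Sunlight reuses) addresses tiles by height, level, and index; the actual monitoring URLs and the separate data tiles are defined by the spec, so treat the paths below as illustrative only.

    package sketch

    import (
    	"fmt"

    	"golang.org/x/mod/sumdb/tlog"
    )

    func tilePaths() {
    	// A full level-0 tile of a height-8 tree covers 256 consecutive leaf hashes.
    	t := tlog.Tile{H: 8, L: 0, N: 0, W: 256}
    	fmt.Println(t.Path()) // tile/8/0/000

    	// Larger tile indexes split into three-digit path elements.
    	fmt.Println(tlog.Tile{H: 8, L: 0, N: 1234067, W: 256}.Path()) // tile/8/0/x001/x234/067
    }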

Single node. Instances operate with a single node, relying on the object store for durability, and on the distributed nature of CT for overall system availability.

Compare-and-swap safety mechanism. As an additional safety measure against operational issues, each sequencing performs a compare-and-swap operation of the STH against an auxiliary storage backend.
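
A minimal sketch of the idea follows; the interface and function names are illustrative, not Sunlight's actual types, and the backend could be any store with conditional writes.

    package sketch

    import (
    	"context"
    	"fmt"
    )

    // LockBackend is an auxiliary store supporting an atomic compare-and-swap,
    // for example a key/value store with conditional writes.
    type LockBackend interface {
    	// CompareAndSwap replaces old with new, failing unless the stored
    	// value is still exactly old.
    	CompareAndSwap(ctx context.Context, old, new []byte) error
    }

    // publishCheckpoint refuses to sign and upload a new tree head unless this
    // instance still owns the latest checkpoint, protecting against a second
    // instance, a rolled-back object store, or a restored backup forking the tree.
    func publishCheckpoint(ctx context.Context, lock LockBackend, prev, next []byte,
    	upload func(context.Context, []byte) error) error {
    	if err := lock.CompareAndSwap(ctx, prev, next); err != nil {
    		return fmt.Errorf("refusing to publish: checkpoint changed underneath us: %w", err)
    	}
    	return upload(ctx, next)
    }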

What’s there to like

Sunlight was designed to address pain points of log operators and monitors.

Reduced cost. Object storage is cheaper than database storage, Sunlight can run with trivial amounts of CPU and memory, and bandwidth costs are reduced by tile compression. Operating the write path in the cloud should cost less than $5k/year. The read path in the cloud is dominated by bandwidth costs, which benefit significantly from tile compression. Off-cloud, any 1Gbps racked server will do.

Reduced operational complexity. No RDBMS, no merge delay, no consensus. Object storage is readily available in both cloud and self-hosted environments. Porting it to a new infrastructure (Fly.io and Tigris) took a couple days.

Nearly bulletproof. As long as the compare-and-swap backend (which requires no manual maintenance) is not tampered with, it should be impossible to cause irreversible incidents such as consistency failures. The operator can run parallel instances, rollback local or object storage, run out of disk, or even reuse keys, and Sunlight will fail safe.

No read rate-limits. If S3 has rate-limits, I couldn’t find them. With a 10Gbps connection monitors should be able to fetch a whole log in a couple hours, and certainly won’t be rate-limited tailing the log.

Easily scalable. Object storage scales effortlessly, and Sunlight was designed to handle 30x the current WebPKI issuance rate without load balancing across logs. Essentially all CPU goes towards chain verification and SCT signing, and currently consumes ~40% of a single CPU core, suggesting we can get to 128x the current rate on a single node.

A step towards killing SCTs. We are not proposing any changes now, but embedding the leaf index in the SCT is a step towards removing that indirection altogether and verifying proofs on the clients.

You can read more about how these qualities relate to the experience of a log operator in Let’s Encrypt’s announcement.

To sum up, Sunlight aims to make CT log operation more accessible by reducing cost, complexity, and risk. As a side effect, we also hope the API changes will remove “rust” from the protocol, allowing it to iterate more often in the next ten years than it did in the past ten.

We look forward to collecting feedback and answering questions here, or in the #sunlight channel of the transparency-dev Slack. PRs and issues are also welcome at the spec and implementation repositories.

A presto,
Filippo


Joe DeBlasio

Mar 14, 2024, 4:56:16 PM
to Filippo Valsorda, Certificate Transparency Policy
Hi folks!

We're excited by the work on Sunlight, and any efforts that make running CT logs easier, cheaper, and more reliable. As the specification, implementations, and ecosystem support evolve, we look forward to being able to include Sunlight logs alongside RFC6962 logs in Chrome. While we're not quite there yet, we encourage the CT community to experiment with the prototype logs and specification as much as possible.

As with any new thing, there will inevitably be some rough edges. You can help find and resolve them by submitting certificates to the test logs, building monitoring tools, embedding SCTs from Sunlight logs, etc. When you encounter issues, please flag them -- making sure that Sunlight is as robust as possible helps the entire community.

Best,
Joe, from the Chrome CT Team


Clint Wilson

Mar 14, 2024, 5:46:40 PM
to Certificate Transparency Policy, Filippo Valsorda, Joe DeBlasio
Hello all,

It’s wonderful to see this evolution of CT logs taking place with Sunlight. Quite a few worthwhile smaller problems are addressed or improved, along with a couple of larger ones, culminating in what seems likely to be a true iteration on Certificate Transparency (and likely with minimal to no client impact to boot!).

That said, we’re not planning on accepting new Sunlight logs quite yet in the Apple CT Program (though we hope that is the end result). Before that jump, we’d especially like to hear additional input (even if it’s just “LGTM”) from CT log Monitors and Auditors, as well as CAs and other relying parties submitting certificates to CT logs. Along with that, we invite input from additional SMEs able to perform (informal) security and risk assessments of the proposal — especially if they’re accompanied by filed issues :)

This is a truly exciting development and we are grateful for the effort put into formalizing the specification of the Sunlight API. Thank you all!

Cheers,
-Clint

Andrew Ayer

Mar 18, 2024, 9:19:42 AM
to Filippo Valsorda, Certificate Transparency Policy
It's extremely exciting to see the launch of Sunlight! It should make
CT logs significantly easier to operate reliably, which will improve
the health of the CT ecosystem.

I've updated SSLMate's monitor to ingest STHs from the 15 public
Sunlight logs, and it has already detected an issue. The Sunlight
specification says:

"Note that all cryptographic operations (such as hashes and signatures)
are as specified by RFC 6962, so these APIs can be thought of as an
alternative encoding format for the same data"

However, at least 10 Sunlight logs have produced STHs with a tree size
of 0 and an all-zero root hash, while RFC 6962 specifies that the root
hash of an empty tree is the SHA-256 hash of an empty string. I've
attached the STHs to this email in the form of an STH pollination JSON
document.

My next step is implementing log entry ingestion. I'll report back any
feedback.

Regards,
Andrew
inconsistent_sths.json

Ben Laurie

Mar 18, 2024, 4:48:46 PM
to Filippo Valsorda, Certificate Transparency Policy
I do like this design, not least because it is a step towards a design I've noodled for many years now: RAIL (Redundant Array of Inexpensive Logs), the core idea there being you accept an even lower availability and thus accept that you need to get more SCTs (or, in fact, inclusion proofs) in your cert. Which leads to some comments:

a) Availability is the tradeoff here - and that should (probably) lead to an increase in the required number of SCTs/inclusion proofs.
b) You've pushed the complexity that was painful back onto object stores and "global" databases, both of which you require to be highly reliable and available - a reasonable tradeoff, but probably brings costs that are not immediately obvious (simultaneous outages of large numbers of logs being one that springs to mind).
c) Obviously switching to inclusion proofs is something I've always wanted, and solves many problems, so yay.

Matthew McPherrin

Mar 18, 2024, 6:34:57 PM
to Ben Laurie, Filippo Valsorda, Certificate Transparency Policy
I acknowledge the Sunlight design is trading off write-path availability, but not the read path. I would expect S3 + CloudFront to have similar or higher availability than Trillian on RDS. AWS advertises a 99.95% SLA for RDS and 99.99% for S3. In practice we have had more problems with RDS than S3, primarily related to scaling under load.
Given that, I'm not sure that increasing the number of required SCTs follows.

The concern about multiple operators on a single object store leading to shared outages is a real one. Based on their public IP addresses, I believe the current Let's Encrypt, DigiCert, and TrustAsia logs are all backed by AWS.
I don't know what the best path to de-risking that is, but it is not a new risk. Filippo's Rome test log is stored on Tigris, and my own private test infrastructure uses MinIO. I hope we can get more operators using different platforms.

Matthew McPherrin

Mar 18, 2024, 6:40:37 PM
to Certificate Transparency Policy, Andrew Ayer, Certificate Transparency Policy, Filippo Valsorda
Thanks, I've opened a bug to track the all-zero root hash:

https://github.com/FiloSottile/sunlight/issues/14

All the current Sunlight logs are instantiated the same way, signing a zero-height tree during setup, so this problem does affect all of them.

Ben Laurie

Mar 19, 2024, 10:58:40 AM
to Matthew McPherrin, Filippo Valsorda, Certificate Transparency Policy
On Mon, 18 Mar 2024 at 22:34, Matthew McPherrin <ma...@letsencrypt.org> wrote:
I acknowledge the Sunlight design is trading off the write path availability, but not the read path. I would expect that S3 + Cloudfront to have similar or higher availability than Trillian on RDS. AWS advertises a 99.95% sla for RDS, and 99.99% for S3. In practice we have had more problems with RDS than S3, primarily related to scaling under load.

The server itself is the weak point. Possibly this is going to work out OK because recovering a dead server ought to be pretty simple. But OTOH, it may not turn out so well in practice...
 
Given that, I'm not sure that increasing the number of required SCTs follows from that.

The concern about multiple operators on a single object store leading to shared outages is a real concern. I believe based on their public IP addresses, the current Let's Encrypt, Digicert, and TrustAsia's logs are all backed by AWS.
I don't know what the best path is to de-risking that, but it is not a new risk.

Agreed - but this design reduces the plausible pool of service providers considerably.

Matthew McPherrin

Mar 19, 2024, 12:01:55 PM
to Ben Laurie, Filippo Valsorda, Certificate Transparency Policy
One of the design goals of introducing a new read-path API is that the tiles can be served directly from a web server or object storage (either directly, or via CDN).
The application server handles the write APIs and stores tiles, but isn't involved in the read path at all.

Thus Monitors/Auditors should be able to continue to process entries even if the application server handling writes is down, or gone.
I anticipate this will make keeping historical log archives or mirrors much easier as well.

I don't anticipate that the current design will reduce service providers very much, either.
The cloud providers all have object storage available as a commodity item (At least: AWS, Google, Azure, Oracle, DigitalOcean, OVH, Linode).

While the current Sunlight implementation uses object storage, the design can be adapted to standard file storage.
That means any classical "web servers with a storage cluster" type on-prem/bare-metal deployment will work as well.
Or if you just have four servers in a rack, Minio or Ceph are options.

The case of "we only have databases" isn't currently supported, but I'm not sure that's a significant fraction of plausible service providers.
And of course, we don't want a single CT log implementation running all logs either.
Anyone who wants to run a database-backed log can continue to run Trillian or other implementations.

Filippo Valsorda

Mar 19, 2024, 2:51:49 PM
to Andrew Ayer, Certificate Transparency Policy
Thank you everyone for the excellent feedback so far! I join Joe and Clint in waiting to hear back from Monitors about your experience integrating support for Sunlight logs. Please don't hesitate to reach out via email or Slack.

(Responding to a few comments in separate emails.)

2024-03-18 14:14 GMT+01:00 Andrew Ayer <ag...@andrewayer.name>:
It's extremely exciting to see the launch of Sunlight!  It should make
CT logs significantly easier to operate reliably, which will improve
the health of the CT ecosystem.

I've updated SSLMate's monitor to ingest STHs from the 15 public
Sunlight logs, and it has already detected an issue.  The Sunlight
specification says:

"Note that all cryptographic operations (such as hashes and signatures)
are as specified by RFC 6962, so these APIs can be thought of as an
alternative encoding format for the same data"

However, at least 10 Sunlight logs have produced STHs with a tree size
of 0 and an all-zero root hash, while RFC 6962 specifies that the root
hash of an empty tree is the SHA-256 hash of an empty string.   I've
attached the STHs to this email in the form of an STH pollination JSON
document.

Never been this excited to hear a bug report =)

This is indeed an implementation issue. The Go Checksum Database and its tooling (specifically, golang.org/x/mod/sumdb/tlog, which I am reusing for Sunlight) diverge from RFC 6962 in how they define the hash of an empty tree.


I wish we had caught this sooner. All zeros is a bit nicer, but not worth diverging over.

Will fix. (See Matt's issue https://github.com/FiloSottile/sunlight/issues/14.) Logs that aim for eventual inclusion will need to rekey. I don't plan to rekey Rome unless the old size-zero checkpoint is annoying for monitors.
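
For reference, the two values in question (the RFC 6962 empty-tree hash versus the all-zero value the affected logs signed) can be reproduced in a few lines of Go; this is a standalone sketch, not Sunlight code.

    package sketch

    import (
    	"crypto/sha256"
    	"fmt"
    )

    func emptyTreeHashes() {
    	// RFC 6962: the root hash of an empty tree is SHA-256 of the empty string.
    	rfc6962 := sha256.Sum256(nil)
    	fmt.Printf("%x\n", rfc6962) // e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

    	// The divergent value the affected logs signed: 32 zero bytes.
    	fmt.Printf("%x\n", make([]byte, sha256.Size))
    }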

Filippo Valsorda

Mar 19, 2024, 7:03:07 PM
to Ben Laurie, Certificate Transparency Policy
2024-03-18 21:48 GMT+01:00 Ben Laurie <be...@google.com>:
I do like this design, not least because it is a step towards a design I've noodled for many years now: RAIL (Redundant Array of Inexpensive Logs), the core idea there being you accept an even lower availability and thus accept that you need to get more SCTs (or, in fact, inclusion proofs) in your cert.

Hi Ben, very glad to hear you like Sunlight! Hopefully that means you don't mind that we used the name of your initial draft ^^ (If you actually do mind, the name would be my fault, for the record.)

Indeed, I do think it's a step closer to the ideas of RAIL, and there is an availability tradeoff. However, I want to stress that Sunlight was explicitly designed not to require additional SCTs, as that would be a significant change in browser policy, which could slow down rollout.

The CT system has two somewhat orthogonal redundancy mechanisms: multiple qualified logs in the ecosystem, and multiple SCTs in a certificate.

If a log's add-chain endpoint is unavailable, CAs need to submit to other logs. This is where Sunlight's single node architecture trades off availability: it can't have as many nines as a Trillian instance can have. This will be ok as long as there are enough logs overall, and Sunlight will hopefully increase the overall number of logs.

On the other hand, if a log becomes Rejected, certificates need to include other SCTs to remain valid. Sunlight does not suffer a tradeoff here. The kind of issues that cause logs to become Rejected are not any more likely to occur to a Sunlight log than they are to a Trillian log. Even if the 99% SLO is strictly enforced on the write path, 22h of downtime (1% of three months) is the stuff of major incidents or software bugs, not the effect of updating or migrating a single-node system. Major incidents and software bugs spare no one, neither single nodes nor distributed systems, as we’ve seen. Hell, my maybe unpopular opinion is that distributed systems are less likely to have brief downtimes but more likely to have long downtimes, because when they do go down they are often more complex to bring back up.

Anyway, to sum up Sunlight does make an availability tradeoff, but doesn’t increase log rejection chances, so doesn’t need additional SCTs.

P.S. We could even have a conversation about decoupling write-path and read-path SLO, because the former is compensated by log redundancy, while the latter is also about making sure certificates make it to monitors in a timely fashion. However, I prefer not to put that conversation on the critical path of deploying Sunlight logs, as there's no need.

Ben Laurie

Mar 20, 2024, 12:34:48 PM
to Matthew McPherrin, Filippo Valsorda, Certificate Transparency Policy
On Tue, 19 Mar 2024 at 16:01, Matthew McPherrin <ma...@letsencrypt.org> wrote:
One of the design goals of introducing a new read-path API is that the tiles can be served directly from a web server or object storage (either directly, or via CDN).
The application server handles the write APIs and stores tiles, but isn't involved in the read path at all.

Thus Monitors/Auditors should be able to continue to process entries even if the application server handling writes is down, or gone.

If the app server is down, this is not a log, it's a record of an ex-log.

 
I anticipate this will make keeping historical log archives or mirrors much easier as well.

It isn't hard now.

I don't anticipate that the current design will reduce service providers very much, either.
The cloud providers all have object storage available as a commodity item (At least: AWS, Google, Azure, Oracle, DigitalOcean, OVH, Linode).

While the current Sunlight implementation uses object storage, the design can be adapted to standard file storage.
That means any classical "web servers with a storage cluster" type on-prem/bare-metal deployment will work as well.
Or if you just have four servers in a rack, Minio or Ceph are options.

You can't have it both ways: this will *definitely* impact reliability.
 

The case of "we only have databases" isn't currently supported, but I'm not sure that's a significant fraction of plausible service providers.
And of course, we don't want a single CT log implementation running all logs either.
Anyone who wants to run a database-backed log can continue to run Trillian or other implementations.

Not if you're going to change the protocol to work off inclusion proofs.

Ben Laurie

Mar 20, 2024, 12:38:58 PM
to Filippo Valsorda, Certificate Transparency Policy
On Tue, 19 Mar 2024 at 23:03, Filippo Valsorda <fil...@ml.filippo.io> wrote:
2024-03-18 21:48 GMT+01:00 Ben Laurie <be...@google.com>:
I do like this design, not least because it is a step towards a design I've noodled for many years now: RAIL (Redundant Array of Inexpensive Logs), the core idea there being you accept an even lower availability and thus accept that you need to get more SCTs (or, in fact, inclusion proofs) in your cert.

Hi Ben, very glad to hear you like Sunlight! Hopefully that means you don't mind that we used the name of your initial draft ^^ (If you actually do mind, the name would be my fault, for the record.)

Indeed, I do think it's a step closer to the ideas of RAIL, and there is an availability tradeoff. However, I want to stress that Sunlight was explicitly designed not to require additional SCTs, as that would be a significant change in browser policy, which could slow down rollout.

The CT system has two somewhat orthogonal redundancy mechanisms: multiple qualified logs in the ecosystem, and multiple SCTs in a certificate.

If a log's add-chain endpoint is unavailable, CAs need to submit to other logs. This is where Sunlight's single node architecture trades off availability: it can't have as many nines as a Trillian instance can have. This will be ok as long as there are enough logs overall, and Sunlight will hopefully increase the overall number of logs.

On the other hand, if a log becomes Rejected, certificates need to include other SCTs to remain valid. Sunlight does not suffer a tradeoff here. The kind of issues that cause logs to become Rejected are not any more likely to occur to a Sunlight log than they are to a Trillian log. Even if the 99% SLO is strictly enforced on the write path, 22h of downtime (1% of three months) is the stuff of major incidents or software bugs, not the effect of updating or migrating a single-node system. Major incidents and software bugs spare no one, neither single nodes nor distributed systems, as we’ve seen. Hell, my maybe unpopular opinion is that distributed systems are less likely to have brief downtimes but more likely to have long downtimes, because when they do go down they are often more complex to bring back up.

I do agree with this and said it myself somewhere (possibly not publicly :-).
 

Anyway, to sum up Sunlight does make an availability tradeoff, but doesn’t increase log rejection chances, so doesn’t need additional SCTs.

Hmm. I guess we shall see! I have not seen a measurement of the reliability of these backends that would inform such an opinion.
 

P.S. We could even have a conversation about decoupling write-path and read-path SLO, because the former is compensated by log redundancy, while the latter is also about making sure certificates make it to monitors in a timely fashion. However, I prefer not to put that conversation on the critical path of deploying Sunlight logs, as there's no need.

RAIL does decouple like this.

Andrew Ayer

Apr 4, 2024, 7:59:17 PM
to Certificate Transparency Policy
I have run into an issue with the Sunlight protocol while implementing
entry processing in SSLMate's monitor.

In RFC 6962, get-entries returns a chain to an accepted root for every
logged certificate. This chain is necessary for attributing blame for a
misissued certificate: "If a certificate is accepted and an SCT issued,
the accepting log MUST store the entire chain used for verification,
including the certificate or Precertificate itself and including the
root certificate used to verify the chain (even if it was omitted from
the submission), and MUST present this chain for auditing upon request.
This chain is required to prevent a CA from avoiding blame by logging
a partial or empty chain." [RFC 6962 Section 3.1]

RFC 9162 further strengthens this requirement by explicitly calling it
log misbehavior for a log to not return a valid chain.

However, Sunlight dispenses with the chain for each entry, instead
requiring monitors to build a chain themselves using the log's
issuers bundle. This is problematic. First, building an X509 chain
is considerably more complex than verifying an X509 chain, so Sunlight
increases the burden on monitors and makes it hard to migrate to
Sunlight. Worse, X509 chain building is rife with
implementation-specific quirks. For example, it would be quite
reasonable for a log to do some or all of the following:

1. Not require byte-for-byte equality of Issuer and Subject (openssl
behavior).

2. Not require AKI and SKI to match (Go behavior).

3. Not require Issuer and Subject to match if AKI and SKI match
(Microsoft behavior).

If a monitor (or an RFC 6962 translation proxy) doesn't use the same
logic as the log, it might build a different chain, or fail to build a
chain at all. This would lead to reports of log misbehavior, creating
a burden on the log operator to investigate and find the right chain.

(Currently, to account for chain verification idiosyncrasies and avoid
spurious misbehavior reports, SSLMate's monitor verifies only the
signatures in the chain. This isn't feasible with Sunlight's issuers
bundle as it would require attempting a verification with every public
key in the bundle.)

I think this can be easily fixed by including the chain in Sunlight's
extra_data, but as fingerprints instead of full certificates. Monitors
would use the fingerprint to find the corresponding certificate in the
issuers bundle. This adds some extra bytes that logs need to store and
transmit, but it would still be *considerably* less than what Trillian /
RFC 6962 now require.

Regards,
Andrew

Filippo Valsorda

Apr 4, 2024, 10:33:41 PM
to Andrew Ayer, Certificate Transparency Policy
Thank you Andrew for bringing this up, it's an intentional design choice that's worth discussing with the community.

Monitors indeed need a simple reproducible process to reconstruct chains.

My preference would be to specify byte-for-byte equality of Issuer and Subject as a requirement for Sunlight-logged chains. Given that, it should be almost as easy to build a chain from the issuers bundle as it would be with fingerprints: looking up potential parents by their Subject bytes is not much different from looking them up by their hash; there might be a few candidates instead of only one, but not so many that trial signature verifications become impractical.
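
To make that lookup concrete, here is a hedged Go sketch (a hypothetical helper, not Sunlight or SSLMate code): index the issuers bundle by raw Subject bytes, then use trial signature verification to pick among the few candidates.

    package sketch

    import "crypto/x509"

    // findIssuer returns the first certificate in the bundle whose raw Subject
    // matches the child's raw Issuer byte-for-byte and whose key actually signed
    // the child. bySubject is keyed by string(cert.RawSubject); with only a few
    // keys per Subject, the trial verifications are cheap.
    func findIssuer(child *x509.Certificate, bySubject map[string][]*x509.Certificate) *x509.Certificate {
    	for _, candidate := range bySubject[string(child.RawIssuer)] {
    		if child.CheckSignatureFrom(candidate) == nil {
    			return candidate
    		}
    	}
    	return nil
    }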

That might sound like a CT-enforced rule, which depending on how you view the role of CT might be undesirable. However, it's my understanding that currently all Trillian logs already require byte-for-byte equality of Issuer and Subject when building chains (see isValid and ValidateChain), so we would in fact just be capturing the existing state of the world, and making a lot of logic simpler by allowing monitors to rely on it.

(Personally, I would consider ratcheting the existing requirement forward into a specification MUST and stopping a backslide into the more complex OpenSSL or Microsoft behaviors as a positive side-effect.)

If the above is still considered problematic, I am not strongly opposed to including fingerprints of the issuers in the TileLeaf structure. I agree it would be a simple change with modest cost.

Filippo Valsorda

Apr 6, 2024, 4:52:23 PM
to Andrew Ayer, Certificate Transparency Policy
2024-04-05 04:33 GMT+02:00 Filippo Valsorda <fil...@ml.filippo.io>:
Thank you Andrew for bringing this up, it's an intentional design choice that's worth discussing with the community.

Monitors indeed need a simple reproducible process to reconstruct chains.

My preference would be to specify byte-for-byte equality of Issuer and Subject as a requirement for Sunlight-logged chains.

I opened https://github.com/C2SP/C2SP/pull/69 to codify the requirement. I would appreciate feedback from monitors on whether this sufficiently addresses chain-building complexity, and from browsers on whether logs with such an explicit requirement would be acceptable. (Note that all Trillian logs already implicitly apply such a requirement.)


Andrew Ayer

Apr 8, 2024, 5:02:42 PM
to Filippo Valsorda, Certificate Transparency Policy
On Fri, 05 Apr 2024 04:33:07 +0200
"Filippo Valsorda" <fil...@ml.filippo.io> wrote:

> My preference would be to specify byte-for-byte equality of Issuer
> and Subject as a requirement for Sunlight-logged chains. Given that,
> it should be almost as easy to build a chain from the issuers bundle
> as it would be with fingerprints: looking up potential parents by
> their Subject bytes is not much different from looking them up by
> their hash; there might be a few candidates instead of only one, but
> not so many that trial signature verifications become impractical.

I see two CA subjects in CT which have 9,734 and 9,910 keys respectively,
both Merge Delay Intermediates from Google's monitoring. Among real CAs,
the current maximum is 8, but there is technically no limit on how high
that could go, so I think it would be prudent for monitors to try AKI/SKI
first, lest a CA that goes wild with rekeying tank their performance.
That's workable, but it's several degrees less simple than it was in
RFC 6962, which has the chain right in the get-entries response.

But my bigger concern is that because multiple distinct CA certificates
can contain the same subject and public key, it would still be possible
for monitors or translation proxies to construct a different chain than
the one used during submission, or from each other. So the process would
not be reproducible, and a Sunlight log would not be able to provide an
RFC6962-compliant interface using a proxy.

That might prove OK in the end, but it makes me nervous. The Sunlight
spec claims "these APIs can be thought of as an alternative encoding
format for the same data". I find this an extremely compelling
property that justifies accelerated adoption of Sunlight, because it
means consumers can ultimately get the same bytes from a Sunlight log
as they would from an RFC6962 log, giving me confidence that monitoring
Sunlight logs will work just like monitoring RFC6962 logs. But if
monitors and translation proxies have to perform non-reproducible chain
building, then the property claimed in the Sunlight spec does not hold,
and I worry that there is some new edge case waiting to be discovered.

I think it would be well worth it to include the fingerprints in TileLeaf
to get the property claimed in the spec.

(By the way, I support requiring byte-for-byte subject/issuer matching
during acceptance, and this can still be required. But it should not
be a load-bearing part of the protocol.)

Regards,
Andrew

Filippo Valsorda

Apr 9, 2024, 6:40:08 AM
to Andrew Ayer, Certificate Transparency Policy
2024-04-08 23:02 GMT+02:00 Andrew Ayer <ag...@andrewayer.name>:
That might prove OK in the end, but it makes me nervous.  The Sunlight
spec claims "these APIs can be thought of as an alternative encoding
format for the same data".  I find this an extremely compelling
property that justifies accelerated adoption of Sunlight, because it
means consumers can ultimately get the same bytes from a Sunlight log
as they would from an RFC6962 log, giving me confidence that monitoring
Sunlight logs will work just like monitoring RFC6962 logs. But if
monitors and translation proxies have to perform non-reproducible chain
building, then the property claimed in the Sunlight spec does not hold,
and I worry that there is some new edge case waiting to be discovered.

Yeah, this is a compelling argument. I'll propose a couple PRs to add chain fingerprints to TileLeaf.

While at it, I would like feedback on whether to keep reproducing the Precertificate Signing Certificate in each TileLeaf, or whether to treat it like any other issuer. The original reasoning was that there's nothing stopping CAs from making a new Precertificate Signing Certificate every hour, flooding the issuers bundle. However, as you point out, there's technically nothing stopping CAs from doing that with regular intermediates either. Are Precertificate Signing Certificates common enough to even care about this? Do they rotate more frequently than intermediates?

(By the way, I support requiring byte-for-byte subject/issuer matching
during acceptance, and this can still be required.  But it should not
be a load-bearing part of the protocol.)

I think there's generally consensus for the requirement, but it feels out of scope for the Sunlight spec if it's not necessary to enable the monitoring API. I plan to keep the de-facto requirement in the implementation, though (also because Sunlight uses the Trillian chain verifier).

Andrew Ayer

Apr 10, 2024, 6:23:20 PM
to Filippo Valsorda, Certificate Transparency Policy
On Tue, 09 Apr 2024 12:39:37 +0200
"Filippo Valsorda" <fil...@ml.filippo.io> wrote:

> While at it, I would like feedback on whether to keep reproducing the
> Precertificate Signing Certificate in each TileLeaf, or whether to
> treat it like any other issuer. The original reasoning was that
> there's nothing stopping CAs from making a new Precertificate Signing
> Certificate every hour, flooding the issuers bundle. However, as you
> point out, there's *technically* nothing stopping CAs from doing that
> with regular intermediates either. Are Precertificate Signing
> Certificate common enough to even care about it? Do they rotate more
> frequently than intermediates?

This raises another question, which is whether issuers.pem is
sufficiently resilient to spam attacks that create a huge number of
intermediates (possibly under a constrained or revoked intermediate).

Spam attacks are already a concern with leaf certificates, but once
a leaf spam attack is mitigated, logs can go back to normal. In
contrast, spammy intermediates would occupy issuers.pem for the rest of
the log's life, causing pain any time it changes and monitors have to
download a new copy. It might be necessary to freeze the log after a
bad intermediate spam attack.

Spam attacks don't necessarily require a malicious CA - if a
browser-trusted CA cross-signs the US Federal PKI (again), then
suddenly a huge number of new intermediates could appear in issuers.pem.
(Maybe not enough to kill a log, but it would still be wasteful.)

Maybe DER-encoded issuer certificates should be served as individual
files at /issuers/{SHA256 fingerprint}?

I think it would be fine to serve Precertificate Signing Certificates
that way too.
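
A sketch of what that would look like on the monitor side (the path layout here follows the suggestion above and is not a finalized spec):

    package sketch

    import (
    	"crypto/sha256"
    	"encoding/hex"
    )

    // issuerURL builds the fingerprint-addressed path for a DER-encoded issuer,
    // assuming the proposed /issuers/{SHA-256 fingerprint} layout.
    func issuerURL(monitoringPrefix string, issuerDER []byte) string {
    	fp := sha256.Sum256(issuerDER)
    	return monitoringPrefix + "/issuers/" + hex.EncodeToString(fp[:])
    }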

> > (By the way, I support requiring byte-for-byte subject/issuer
> > matching during acceptance, and this can still be required. But it
> > should not be a load-bearing part of the protocol.)
>
> I think there's generally consensus for the requirement, but it feels
> out of scope for the Sunlight spec if it's not necessary to enable
> the monitoring API. I plan to keep the de-facto requirement in the
> implementation, though (also because Sunlight uses the Trillian chain
> verifier).

That's reasonable.

Regards,
Andrew

Filippo Valsorda

Jun 7, 2024, 11:28:10 AM
to Certificate Transparency Policy, Andrew Ayer
Hello all,

I have prepared a revision to the Sunlight spec with the following changes:
  1. moved issuers from a bundle to individual files;
  2. switched treatment of Precertificate Signing Certificates to be equivalent to issuers;
  3. stored issuer fingerprints with each leaf, to guarantee complete RFC 6962 equivalence;
  4. slightly tweaked the URL to remove tile height (which is now hardcoded to eight) following discussion around a general API for tiled Merkle Trees (https://github.com/C2SP/C2SP/pull/73).

Most of these changes are based on the discussion in this thread, and on Andrew's suggestions in particular. (Thank you!) I would appreciate feedback from potential log operators, monitors, and relying parties.

Once discussion on this change concludes, I will make the implementation changes and bundle them with the fix for the empty-tree hash issue reported earlier in this thread. After that, we'll reset the existing logs.

We're also considering changing the name of the specification to more clearly disambiguate the Sunlight CT monitoring API (which may be implemented by any log implementation) and the Sunlight CT log implementation. The best I could come up with so far is the "Static Certificate Transparency API" by analogy to static websites (those served directly from static assets, without server-side computation), or c2sp.org/static-ct-api for short. We welcome both feedback on the change, and suggestions for alternative names.

Thank you,
Filippo

Andrew Ayer

Jun 9, 2024, 12:32:19 PM
to Filippo Valsorda, Certificate Transparency Policy
On Fri, 07 Jun 2024 17:27:46 +0200
"Filippo Valsorda" <fil...@ml.filippo.io> wrote:

> I have prepared a revision to the Sunlight spec with the following
> changes:

I just finished writing an RFC 6962 compatibility proxy against the new
Sunlight spec: https://github.com/AGWA/sunglasses/

I found the exercise very straightforward. The logic is simple - much
simpler than it would have been without the change to certificate chains.
I'm feeling good about the protocol.

I wanted to flag one aspect of the protocol for discussion. The log
entries in the data tile are not prefixed by their length, so skipping
an entry requires parsing it successfully. Consequently, if a
log publishes a malformed entry, monitors won't be able to parse any
subsequent entry in the tile. In contrast, a single malformed entry in
RFC 6962 doesn't prevent other entries from being parsed. But perhaps
a single malformed entry necessitates disqualifying the log anyways,
so this is not a big deal? This has never happened with RFC 6962 logs
so there's no precedent. I'm curious what others think.

Regards,
Andrew

Filippo Valsorda

Jun 10, 2024, 9:53:12 AM
to Andrew Ayer, Certificate Transparency Policy
2024-06-09 18:32 GMT+02:00 Andrew Ayer <ag...@andrewayer.name>:
On Fri, 07 Jun 2024 17:27:46 +0200
"Filippo Valsorda" <fil...@ml.filippo.io> wrote:

> I have prepared a revision to the Sunlight spec with the following
> changes:

I just finished writing an RFC 6962 compatibility proxy against the new

😎💯

I found the exercise very straightforward.  The logic is simple - much
simpler than it would have been without the change to certificate chains.
I'm feeling good about the protocol.

Thank you so much for building this, it's an important test and validation of the protocol.

I wanted to flag one aspect of the protocol for discussion.  The log
entries in the data tile are not prefixed by their length, so skipping
an entry requires parsing it successfully.  Consequentially, if a
log publishes a malformed entry, monitors won't be able to parse any
subsequent entry in the tile.  In contrast, a single malformed entry in
RFC 6962 doesn't prevent other entries from being parsed.  But perhaps
a single malformed entry necessitates disqualifying the log anyways,
so this is not a big deal?  This has never happened with RFC 6962 logs
so there's no precedent.  I'm curious what others think.

I went back and forth on this, and more broadly on the encoding of the "data" tiles.

Here are the things I considered:
  1. RFC 6962 made what in hindsight I believe was a mistake by requiring extra per-entry data which is not hashed into the Merkle tree, so we need somewhere to put this data.
  2. This data compresses very well against the Merkle leaf, because it's mainly the pre_certificate which shares most of its bytes with the tbs_certificate.
  3. The encoding of Merkle leaves (and Sunlight data tiles) is very simple: it's not ASN.1 but a TLS-like length-prefixed encoding.
    1. To be clear, the "malformed entry" that breaks parsing of subsequent entries here would have malformed TLS-style framing. Malformed ASN.1 would still be contained in length-prefixed opaque fields.
    2. I've included a complete reference of the TLS structures at https://gist.github.com/FiloSottile/40d1490ce2e599a2ddf6dcb83a0d95ac.
1 and 2 make it very inefficient to adopt the proposed c2sp.org/tlog-tiles entries format, which is a shame. In my view, the main value of length-prefixing entries was being able to write generic tlog code that can hash or enumerate entries without being aware of Sunlight or CT. However, we'd have to put the extra data in separate files where they wouldn't compress against the tbs_certificate. (We also considered specifying space for extra data in c2sp.org/tlog-tiles, but decided against it to avoid encouraging what we believe is a dangerous design.)

Due to 3, I think the value of further length-prefixing entries is small. Sure, there would be one length field per entry instead of a few, but I'm not sure that changes the complexity that much. Logs could still get that wrong, just like they could forget a closing quote or a comma in the get-entries JSON response. (In that sense, I disagree that a malformed entry in RFC 6962 doesn't prevent other entries from being parsed. It depends on what level of encoding you look at.) The consensus seems to be that if the hashed entries were correct and they can fix the encoding, it's a recoverable issue, too.
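
To illustrate the framing question, here is a minimal Go sketch of sequential parsing (the field layout is simplified; see the gist above for the real TLS structures): a single bad length prefix desynchronizes everything after it in the tile.

    package sketch

    import (
    	"fmt"

    	"golang.org/x/crypto/cryptobyte"
    )

    // countEntries walks a data tile by reading TLS-style length-prefixed fields
    // in order. The two fields per entry here are a simplification of the real
    // structure; the point is that entry N+1 can only be located by fully parsing
    // entry N, so a malformed length makes the rest of the tile unparseable.
    func countEntries(tile []byte) (int, error) {
    	s := cryptobyte.String(tile)
    	n := 0
    	for !s.Empty() {
    		var certificate, extraData cryptobyte.String
    		if !s.ReadUint24LengthPrefixed(&certificate) ||
    			!s.ReadUint16LengthPrefixed(&extraData) {
    			return n, fmt.Errorf("malformed entry after index %d", n)
    		}
    		n++
    	}
    	return n, nil
    }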

Another reason to length-prefix would be to let clients be forwards-compatible by skipping unknown future entries they don't know how to parse. I think this is generally undesirable, as clients are monitors, and we don't want a mechanism to include entries in the log that a monitor will ignore.

Ultimately, I don't have a strong opinion on this. I think length-prefixing would be mostly superfluous but not harmful, and if there's consensus in favor of it I'm happy to make the change. (To time box the discussion, I am planning to merge the change and work on deploying it at the end of this week.)

Andrew Ayer

Jun 11, 2024, 8:23:27 AM
to Filippo Valsorda, Certificate Transparency Policy
I think you're right that the value of the length prefix is small, and
I don't have a strong opinion either. If no one has a stronger opinion
by the end of the week, I'm good with leaving it as-is.

Regards,
Andrew

Andrew Ayer

Jun 21, 2024, 9:54:07 AM
to Filippo Valsorda, Certificate Transparency Policy
On Fri, 07 Jun 2024 17:27:46 +0200
"Filippo Valsorda" <fil...@ml.filippo.io> wrote:

> We're also considering changing the name of the specification to more
> clearly disambiguate the Sunlight CT monitoring API (which may be
> implemented by any log implementation) and the Sunlight CT log
> implementation. The best I could come up with so far is the "Static
> Certificate Transparency API" by analogy to static websites (those
> served directly from static assets, without server-side computation),
> or c2sp.org/static-ct-api for short. We welcome both feedback on the
> change, and suggestions for alternative names.

I think it's a very good idea to disambiguate the API and the
implementation.

I haven't been able to think of a better name than "Static Certificate
Transparency API". The ability to serve from static assets is, I
believe, the key benefit of this protocol, so it seems quite appropriate
to include static in the name.

Regards,
Andrew

Filippo Valsorda

Jun 27, 2024, 10:06:12 AM
to Certificate Transparency Policy
Thank you all for the feedback.

I have now merged the changes (https://github.com/C2SP/C2SP/pull/76) discussed above.

I also landed a small tweak to clarify that checkpoints can carry extra signatures beyond the RFC 6962 one (https://github.com/C2SP/C2SP/pull/83). This leaves operators the option to participate in the transparency witness network in the future (which we can discuss separately as an idea). I am planning to "grease" this joint by returning extra signatures from Sunlight logs. Clients that use note.Open and sunlight.NewRFC6962Verifier, or that otherwise implement the note spec correctly, will be unaffected. The extra signatures can be safely discarded by proxies such as Sunglasses.

I have opened a C2SP issue to rename the spec, too. https://github.com/C2SP/C2SP/issues/82

I think we've reached a viable v1, and I'm hoping there will be no more semantic or wire changes from the current text.

I am working to update the Sunlight implementation to match the new changes, and we'll coordinate to empty and rekey the existing logs.

Matthew McPherrin

Jun 27, 2024, 12:14:06 PM
to Filippo Valsorda, Certificate Transparency Policy
Thanks Filippo! These changes look good to me. Once the sunlight implementation is updated, Let's Encrypt will set up new log shards running the new format, and shut down the old ones.

Let's Encrypt's motivation for going to half-year shards was primarily our underlying use of Trillian, with separate MySQL instances per shard for database scale, which is much less relevant with Sunlight logs. We overlapped their start and end dates to minimize operational risk during switchover.
We're considering going back to year-long shards for our logs using the Sunlight implementation, because the database-scaling constraints aren't as relevant. We will stagger the start dates of our shards, but may choose to stop overlapping them as well, as we can more easily run multiple fully independent logs.
If anyone has any feedback on shard length, I'd be interested in hearing what tradeoffs we might want to consider from other people in the ecosystem. Is having too many logs a hassle for configuration management? Are there any other concerns we should have? Since we plan to run more logs, having fewer shards for each should keep the overall number of entries in the log lists etc. about the same.

Matthew McPherrin
Let's Encrypt SRE


Andrew Ayer

Jun 27, 2024, 1:50:26 PM
to Matthew McPherrin, ct-p...@chromium.org
On Thu, 27 Jun 2024 12:13:25 -0400
"'Matthew McPherrin' via Certificate Transparency Policy"
<ct-p...@chromium.org> wrote:

> We're considering going back to year-long shards for our logs using
> the Sunlight implementation, because the database scale options
> aren't as relevant. We will stagger the start dates of our shards,
> but may choose to stop overlapping them as well, as we can more
> easily run multiple fully-independent logs.
> If anyone has any feedback on shard length, I'd be interested in
> hearing what tradeoffs we might want to consider from other people in
> the ecosystem. Is having too many logs a hassle for configuration
> management? Are there any other concerns we should have? Since we
> plan to run more logs, having less shards for each should keep the
> overall number of entries about the same in the log lists etc.

Monitors are usually only interested in unexpired certificates.
With yearlong shards, there will be more expired certificates to sift
through if a monitor hasn't been following the log the entire time.
Granted, the new static API will make it easier to scarf down an
entire log, but it would still waste bandwidth.

Additionally, a monitor which stores an index of every log entry to
enable fast STH and SCT auditing (as SSLMate does) has storage costs
which would be increased by a return to yearlong shards.

Considering that monitors are already facing an increase in bandwidth
and storage costs due to the anticipated increase in redundant logs,
I'm strongly opposed to yearlong shards.

A large number of logs doesn't give me any hassle with configuration
management. I'd much rather have shorter shard lengths. (I hope to
see quarterly shards if/when certificate lifetimes are capped at 90
days!)

Regards,
Andrew

Joe DeBlasio

Jun 27, 2024, 7:26:41 PM
to Andrew Ayer, Matthew McPherrin, ct-p...@chromium.org
Stabilizing on a v1 is a great milestone! We're excited for that stability to enable support among more members of the ecosystem, which will open the door to accepting these logs as trusted by Chrome.

Towards that end, our team is experimenting with being able to monitor Sunlight/static-ct-api logs for future policy compliance, and we invite folks to not only let this list know if/when you stand up new logs, but also to include our monitoring root as an accepted root -- we'd love to monitor your logs to work out the kinks in our tooling.

Regarding shard length, there is some meaningful hassle in configuration management on our side in juggling many new logs. We agree that it'd be nice to be able to support shorter-lived logs, especially as certificate lifetimes trend downward, so we're investigating ways to reduce this burden. For now though, we'd ask that shards be no shorter than 6 months and no longer than a year.

Best,
Joe


Matthew McPherrin

Jun 27, 2024, 7:32:20 PM
to Joe DeBlasio, Andrew Ayer, ct-p...@chromium.org
Thanks for the feedback to both of you!  I think that's a reasonable take for sure.  We'll keep that in mind as we make our decision about new shards.

I do wonder if there's an opportunity for some method of providing compact additional metadata for tree ranges, to allow monitors to more efficiently scan subsets of a log. For example, every 2^16 certificates or so, a metadata file could be written with notBefore and notAfter ranges, and something like Bloom filters of subjectAlternativeNames or other interesting metadata. I think there are significant engineering challenges in finding a balance that makes it useful, but it may be an interesting area for future design.
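
A very rough sketch of what such a per-range metadata record could contain (purely hypothetical, not part of any spec):

    package sketch

    import "time"

    // RangeMetadata summarizes one fixed-size span of entries (say 2^16 leaves)
    // so a monitor can decide whether to download the underlying data tiles at all.
    type RangeMetadata struct {
    	Start, End   int64     // leaf index range covered, [Start, End)
    	NotBeforeMin time.Time // earliest notBefore seen in the range
    	NotAfterMax  time.Time // latest notAfter seen in the range
    	NameFilter   []byte    // serialized Bloom filter over DNS names (false positives acceptable)
    }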

Having monitors able to scan subsets of logs could potentially reduce the bandwidth log operators serve, which is our largest operating cost, on the thesis that many monitors are only looking for a small subset of certificates.

Seo Suchan

Jun 28, 2024, 5:52:55 PM
to Certificate Transparency Policy, Matthew McPherrin, Andrew Ayer, ct-p...@chromium.org, Joe DeBlasio

(By mistake I sent the old version of this privately, so I'm writing it again publicly.)
For SANs, I think the metadata should contain individual TLD+1-level base domains rather than the entire SAN object or full FQDNs, because most CT log monitoring services are about live-watching certificates issued for any subdomain of a domain.
Also, such metadata itself needs to be verified, because otherwise a malicious log could 'hide' a certificate by lying in the metadata, claiming that a problematic domain isn't in those tiles, or marking tiles as containing only already-expired certificates. A full aggregator would still find it, though.

On Friday, June 28, 2024 at 8:32:20 AM UTC+9, Matthew McPherrin wrote: