Hi all,
Late last year multiple WebPKI community participants noticed the same set of issues burdening the CT ecosystem, and came up with overlapping solutions. Today, I am happy to announce Sunlight, a design developed in collaboration with Let’s Encrypt, and borne out of discussions with the Chrome and TrustFabric teams at Google, the Sigsum project, and other CT log and monitor operators.
You can find a design document at https://filippo.io/a-different-CT-log, a formal specification of the API at https://c2sp.org/sunlight, the implementation at https://github.com/FiloSottile/sunlight, and additional resources including running logs at https://sunlight.dev.
This design is probably not perfect. The question we’re asking today is whether it is moving in the right direction, and if the migration path is sustainable for every stakeholder. If so, we hope we can incorporate feedback and go ahead with deploying it, and then iterate further in the future.
We invite everyone to read the more in depth resources above, but here is a summary of the technical details.
No Merge Delay. add-chain requests are pooled and held until the next sequencing, which happens every second. No SCT is returned until the leaf is merged. add-chain requests complete in ~700ms on average, 1.5s at most.
SCT leaf_index extension. SCTs include an RFC 6962 extension with the index of the incorporated leaf.
Object storage backed. All critical data is stored in object or file storage, like S3 or a filesystem. There is no relational database.
Static tile-based read path. Monitors can fetch the tree directly from object storage, where it’s laid out as “tiles”. (A Go client library is coming soon, and we’d love to hear from monitors what they need from it.)
Single node. Instances operate with a single node, relying on the object store for durability, and on the distributed nature of CT for overall system availability.
Compare-and-swap safety mechanism. As an additional safety measure against operational issues, each sequencing performs a compare-and-swap operation of the STH against an auxiliary storage backend.
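To illustrate the compare-and-swap safety mechanism, here is a minimal Go sketch. The `casBackend` type and the string checkpoints are hypothetical stand-ins for Sunlight's auxiliary lock backend, not its actual schema: the point is only that a sequencer with a stale view of the tree cannot overwrite the latest checkpoint.

```go
package main

import (
	"fmt"
	"sync"
)

// casBackend is a hypothetical in-memory stand-in for an auxiliary lock
// backend: it swaps the stored checkpoint only if the caller presents the
// value currently stored, atomically.
type casBackend struct {
	mu  sync.Mutex
	sth []byte
}

// CompareAndSwap replaces the stored checkpoint with new only if the stored
// value still equals old. A sequencer whose view is stale gets false back
// and must refuse to sign, failing safe instead of forking the tree.
func (b *casBackend) CompareAndSwap(old, new []byte) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if string(b.sth) != string(old) {
		return false // another instance sequenced first: don't sign
	}
	b.sth = append([]byte(nil), new...)
	return true
}

func main() {
	b := &casBackend{sth: []byte("tree size 10")}
	fmt.Println(b.CompareAndSwap([]byte("tree size 10"), []byte("tree size 11"))) // true
	fmt.Println(b.CompareAndSwap([]byte("tree size 10"), []byte("tree size 12"))) // false: stale view
}
```

A parallel instance accidentally left running would lose every CAS race and simply stall, rather than publish a conflicting STH.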
Sunlight was designed to address pain points of log operators and monitors.
Reduced cost. Object storage is cheaper than database storage, Sunlight can run with trivial amounts of CPU and memory, and tile compression reduces bandwidth costs. Operating the write path in the cloud should cost less than $5k/year. The read path in the cloud is dominated by bandwidth costs, which benefit significantly from tile compression. Off-cloud, any 1Gbps racked server will do.
Reduced operational complexity. No RDBMS, no merge delay, no consensus. Object storage is readily available in both cloud and self-hosted environments. Porting it to a new infrastructure (Fly.io and Tigris) took a couple of days.
Nearly bulletproof. As long as the compare-and-swap backend (which requires no manual maintenance) is not tampered with, it should be impossible to cause irreversible incidents such as consistency failures. The operator can run parallel instances, roll back local or object storage, run out of disk, or even reuse keys, and Sunlight will fail safe.
No read rate-limits. If S3 has rate-limits, I couldn’t find them. With a 10Gbps connection monitors should be able to fetch a whole log in a couple hours, and certainly won’t be rate-limited tailing the log.
Easily scalable. Object storage scales effortlessly, and Sunlight was designed to handle 30x the current WebPKI issuance rate without load balancing across logs. Essentially all CPU goes to chain verification and SCT signing, which currently consume ~40% of a single CPU core, suggesting a single node could reach 128x the current rate.
A step towards killing SCTs. We are not proposing any changes now, but embedding the leaf index in the SCT is a step towards removing that indirection altogether and verifying proofs on the clients.
You can read more about how these qualities relate to the experience of a log operator in Let’s Encrypt’s announcement.
To sum up, Sunlight aims to make CT log operation more accessible by reducing cost, complexity, and risk. As a side effect, we also hope the API changes will remove “rust” from the protocol, allowing it to iterate more often in the next ten years than it did in the past ten.
We look forward to collecting feedback and answering questions here, or in the #sunlight channel of the transparency-dev Slack. PRs and issues are also welcome at the spec and implementation repositories.
A presto,
Filippo
--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/e452b894-d691-4f6f-aecc-454c77654c05%40app.fastmail.com.
I acknowledge the Sunlight design trades off write-path availability, but not read-path availability. I would expect S3 + CloudFront to have similar or higher availability than Trillian on RDS: AWS advertises a 99.95% SLA for RDS, and 99.99% for S3. In practice we have had more problems with RDS than S3, primarily related to scaling under load.
Given that, I'm not sure that increasing the number of required SCTs follows. The concern about multiple operators on a single object store leading to shared outages is real: based on their public IP addresses, I believe the current Let's Encrypt, DigiCert, and TrustAsia logs are all backed by AWS. I don't know what the best path is to de-risking that, but it is not a new risk.
It's extremely exciting to see the launch of Sunlight! It should make CT logs significantly easier to operate reliably, which will improve the health of the CT ecosystem.

I've updated SSLMate's monitor to ingest STHs from the 15 public Sunlight logs, and it has already detected an issue. The Sunlight specification says:

"Note that all cryptographic operations (such as hashes and signatures) are as specified by RFC 6962, so these APIs can be thought of as an alternative encoding format for the same data"

However, at least 10 Sunlight logs have produced STHs with a tree size of 0 and an all-zero root hash, while RFC 6962 specifies that the root hash of an empty tree is the SHA-256 hash of an empty string. I've attached the STHs to this email in the form of an STH pollination JSON document.
I do like this design, not least because it is a step towards a design I've noodled for many years now: RAIL (Redundant Array of Inexpensive Logs), the core idea there being you accept an even lower availability and thus accept that you need to get more SCTs (or, in fact, inclusion proofs) in your cert.
One of the design goals of introducing a new read-path API is that the tiles can be served directly from a web server or object storage (either directly, or via CDN). The application server handles the write APIs and stores tiles, but isn't involved in the read path at all. Thus monitors/auditors should be able to continue to process entries even if the application server handling writes is down, or gone.
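One way to see why no application server is needed on the read path: tile addresses are plain, computable paths. The helper below sketches the sumdb-style index-to-path encoding used by tile-based logs (the index split into groups of three decimal digits, all but the last prefixed with "x"); the exact URL layout is illustrative here, and the c2sp.org specification is authoritative.

```go
package main

import (
	"fmt"
	"strings"
)

// tileIndexPath encodes a tile index as the sumdb-style path suffix used by
// tile-based transparency logs: groups of three decimal digits, with every
// group except the last prefixed by "x". A monitor can compute a tile's
// path and GET it straight from object storage or a CDN.
func tileIndexPath(n uint64) string {
	var groups []string
	for {
		groups = append([]string{fmt.Sprintf("%03d", n%1000)}, groups...)
		n /= 1000
		if n == 0 {
			break
		}
	}
	for i := 0; i < len(groups)-1; i++ {
		groups[i] = "x" + groups[i]
	}
	return strings.Join(groups, "/")
}

func main() {
	fmt.Println(tileIndexPath(5))       // 005
	fmt.Println(tileIndexPath(1234))    // x001/234
	fmt.Println(tileIndexPath(1234567)) // x001/x234/567
}
```

Because the mapping from index to path is a pure function, a static file tree (or bucket) is the entire read-path "API".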
I anticipate this will make keeping historical log archives or mirrors much easier as well.
I don't anticipate that the current design will narrow the field of plausible service providers very much, either. The cloud providers all have object storage available as a commodity item (at least AWS, Google, Azure, Oracle, DigitalOcean, OVH, and Linode). While the current Sunlight implementation uses object storage, the design can be adapted to standard file storage. That means any classical "web servers with a storage cluster" on-prem/bare-metal deployment will work as well. Or if you just have four servers in a rack, MinIO or Ceph are options.
The case of "we only have databases" isn't currently supported, but I'm not sure that's a significant fraction of plausible service providers. And of course, we don't want a single CT log implementation running all logs either. Anyone who wants to run a database-backed log can continue to run Trillian or other implementations.
2024-03-18 21:48 GMT+01:00 Ben Laurie <be...@google.com>:
> I do like this design, not least because it is a step towards a design I've noodled for many years now: RAIL (Redundant Array of Inexpensive Logs), the core idea there being you accept an even lower availability and thus accept that you need to get more SCTs (or, in fact, inclusion proofs) in your cert.

Hi Ben, very glad to hear you like Sunlight! Hopefully that means you don't mind that we used the name of your initial draft ^^ (If you actually do mind, the name would be my fault, for the record.)

Indeed, I do think it's a step closer to the ideas of RAIL, and there is an availability tradeoff. However, I want to stress that Sunlight was explicitly designed not to require additional SCTs, as that would be a significant change in browser policy, which could slow down rollout.

The CT system has two somewhat orthogonal redundancy mechanisms: multiple qualified logs in the ecosystem, and multiple SCTs in a certificate.

If a log's add-chain endpoint is unavailable, CAs need to submit to other logs. This is where Sunlight's single-node architecture trades off availability: it can't have as many nines as a Trillian instance can. This will be OK as long as there are enough logs overall, and Sunlight will hopefully increase the overall number of logs.

On the other hand, if a log becomes Rejected, certificates need to include other SCTs to remain valid. Sunlight does not suffer a tradeoff here. The kinds of issues that cause logs to become Rejected are no more likely for a Sunlight log than for a Trillian log. Even if the 99% SLO is strictly enforced on the write path, 22h of downtime (1% of three months) is the stuff of major incidents or software bugs, not the effect of updating or migrating a single-node system. Major incidents and software bugs spare no one, neither single nodes nor distributed systems, as we've seen.
Hell, my maybe unpopular opinion is that distributed systems are less likely to have brief downtimes but more likely to have long downtimes, because when they do go down they are often more complex to bring back up.
Anyway, to sum up: Sunlight does make an availability tradeoff, but it doesn't increase a log's chances of rejection, so it doesn't need additional SCTs.
P.S. We could even have a conversation about decoupling write-path and read-path SLO, because the former is compensated by log redundancy, while the latter is also about making sure certificates make it to monitors in a timely fashion. However, I prefer not to put that conversation on the critical path of deploying Sunlight logs, as there's no need.
Thank you Andrew for bringing this up, it's an intentional design choice that's worth discussing with the community. Monitors indeed need a simple, reproducible process to reconstruct chains. My preference would be to specify byte-for-byte equality of Issuer and Subject as a requirement for Sunlight-logged chains.
That might prove OK in the end, but it makes me nervous. The Sunlight spec claims "these APIs can be thought of as an alternative encoding format for the same data". I find this an extremely compelling property that justifies accelerated adoption of Sunlight, because it means consumers can ultimately get the same bytes from a Sunlight log as they would from an RFC 6962 log, giving me confidence that monitoring Sunlight logs will work just like monitoring RFC 6962 logs. But if monitors and translation proxies have to perform non-reproducible chain building, then the property claimed in the Sunlight spec does not hold, and I worry that there is some new edge case waiting to be discovered.
(By the way, I support requiring byte-for-byte subject/issuer matching during acceptance, and this can still be required. But it should not be a load-bearing part of the protocol.)
On Fri, 07 Jun 2024 17:27:46 +0200 "Filippo Valsorda" <fil...@ml.filippo.io> wrote:
> I have prepared a revision to the Sunlight spec with the following changes:

I just finished writing an RFC 6962 compatibility proxy against the new Sunlight spec: https://github.com/AGWA/sunglasses/
I found the exercise very straightforward. The logic is simple - much simpler than it would have been without the change to certificate chains. I'm feeling good about the protocol.
I wanted to flag one aspect of the protocol for discussion. The log entries in the data tile are not prefixed by their length, so skipping an entry requires parsing it successfully. Consequently, if a log publishes a malformed entry, monitors won't be able to parse any subsequent entry in the tile. In contrast, a single malformed entry in RFC 6962 doesn't prevent other entries from being parsed. But perhaps a single malformed entry necessitates disqualifying the log anyway, so this is not a big deal? This has never happened with RFC 6962 logs, so there's no precedent. I'm curious what others think.
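To make the failure mode concrete, here is a toy Go parser. The entry format is invented for illustration (Sunlight's actual tile encoding is defined by the spec): each entry carries an inner length but no outer framing, so finding entry N+1 requires successfully parsing entry N, and one corrupt length field loses the rest of the tile.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseEntries walks a toy tile of concatenated entries, each a 2-byte
// big-endian length followed by that many payload bytes. Because there is
// no outer framing, a single malformed length desynchronizes the parser
// and makes every subsequent entry in the tile unreachable.
func parseEntries(tile []byte) ([][]byte, error) {
	var entries [][]byte
	for len(tile) > 0 {
		if len(tile) < 2 {
			return entries, fmt.Errorf("truncated entry header")
		}
		n := int(binary.BigEndian.Uint16(tile))
		if len(tile)-2 < n {
			return entries, fmt.Errorf("entry claims %d bytes, only %d remain", n, len(tile)-2)
		}
		entries = append(entries, tile[2:2+n])
		tile = tile[2+n:]
	}
	return entries, nil
}

func main() {
	good := []byte{0, 1, 'a', 0, 2, 'b', 'c', 0, 1, 'd'}
	entries, err := parseEntries(good)
	fmt.Println(len(entries), err) // 3 <nil>

	bad := append([]byte{}, good...)
	bad[1] = 200 // corrupt the first entry's length
	entries, err = parseEntries(bad)
	fmt.Println(len(entries), err != nil) // 0 true
}
```

With RFC 6962's get-entries, each entry arrives individually framed, so a monitor can set a bad entry aside and keep going; in the unframed layout it cannot.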