Questions about SCT Auditing in Chrome

Ryan Lehmkuhl

Oct 31, 2024, 5:42:02 PM

to jdeb...@chromium.org, cth...@chromium.org, trusty-t...@chromium.org

Hi all,

I'm a PhD student at MIT working on applied cryptography with Henry Corrigan-Gibbs. I had a few small questions about Chrome's current SCT auditing approach after reading your public document "Opt-out SCT Auditing in Chrome":

1. Does Chrome audit 1 in every 1,000 SCTs, or 1 in every 10,000? The document says 1000, but the chromium source code says 10000.
2. The Phase 1 description + the chromium source code describe caching happening per-SCT rather than per-connection: so even if I visit a website often, if I didn't select it for auditing initially, that decision is remembered (until I restart my browser and the cache resets). Is this still the behavior that Chrome follows? It's not clear to me from the Phase 2 document.
3. Can you provide any information on the set of popular SCTs that gets downloaded for all clients? Knowing which domains are included, or even just the size of the set, would be very useful.

Thanks for all your great work on this!

Best,

Ryan

--
You received this message because you are subscribed to the Google Groups "(Retired) trusty-transport" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trusty-transpo...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/trusty-transport/CAH1d7sWuLxFYqdNe9ffCEVU6%3DnKcL%2BLDT52FTVNTJMLe%3DL-%3Dmg%40mail.gmail.com.
For more options, visit https://groups.google.com/a/chromium.org/d/optout.

Chris Thompson

Nov 1, 2024, 1:22:10 PM

to Ryan Lehmkuhl, jdeb...@chromium.org, trusty-t...@chromium.org

Hi Ryan --

I think the 1/1,000 connection sampling was used to do our anonymity set analysis. The code is the definitive source (we have the ability to override that parameter via our fieldtrial framework, but we have not done so). This also matched what we did for our initial opt-in SCT auditing, so it was simpler to have the two modes share a sampling rate. I can say that our initial conservative 1-in-10k sampling rate has worked fine (we have successfully detected unlogged SCTs, most recently for the Sabre 2025h1 MMD violation) and also serves as a throttle on how much traffic our collection server receives.

The deduplication cache is keyed on the hash of the set of SCTs included in the report. Phase 2 (opt-out) selects a random SCT from the cert, checks the popular SCTs list, and then tries to generate a report. This report will only have the single randomly selected SCT (versus Phase 1 opt-in clients which include the full list of SCTs). So for Phase 1 clients, if the set of SCTs changes (e.g., the site is using a new certificate, or serving a slightly different set via the TLS extension) the report will not match a previous cache entry. For Phase 2 clients, if the connection is sampled for the same website a second time and a different random SCT is selected, the report will not match a previous cache entry and a new report will be created. Happy to provide more details if that is still confusing though.
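
The cache behavior described above can be sketched roughly as follows. This is a minimal illustration, not Chrome's implementation: the SHA-256 digest, the in-memory set, and the names `report_cache_key`, `ReportDeduplicator`, `phase1_scts`, and `phase2_scts` are all assumptions made for the example. It only shows why a different SCT set (Phase 1) or a different randomly selected SCT (Phase 2) misses the cache.

```python
import hashlib
import random


def report_cache_key(scts):
    """Hash a set of SCTs (as byte strings) into an order-independent key.

    Assumption: SHA-256 over the sorted, length-prefixed SCTs. The thread only
    says the cache is keyed on "the hash of the set of SCTs" in the report.
    """
    h = hashlib.sha256()
    for sct in sorted(set(scts)):
        h.update(len(sct).to_bytes(4, "big"))
        h.update(sct)
    return h.hexdigest()


class ReportDeduplicator:
    """Hypothetical in-memory cache that lets each distinct SCT set through once."""

    def __init__(self):
        self._seen = set()

    def should_report(self, scts):
        key = report_cache_key(scts)
        if key in self._seen:
            return False  # same SCT set as an earlier report: cache hit
        self._seen.add(key)
        return True


def phase1_scts(cert_scts):
    """Phase 1 (opt-in): the report carries the full list of SCTs."""
    return list(cert_scts)


def phase2_scts(cert_scts, rng=random):
    """Phase 2 (opt-out): the report carries one randomly selected SCT."""
    return [rng.choice(list(cert_scts))]
```

With this sketch, `should_report([b"a", b"b"])` followed by `should_report([b"b", b"a"])` hits the cache on the second call, while a changed set such as `[b"a", b"c"]` produces a new entry.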

The popular SCTs set is delivered to Chrome clients via the component updater as part of our PKIMetadata component. The data for this component is inspectable on a desktop machine running Chrome if you want to see the currently delivered set of popular SCTs. (On my macOS machine, this is stored in ~/Library/Application\ Support/Google/Chrome\ Canary/PKIMetadata/<version_number>/ct_config.pb, which uses this definition.) The list is generated based on SCTs uploaded by opt-in clients, and is currently limited to a set of 1024 SCTs.

Hope that helps! I'm always happy to answer questions about this, particularly as it relates to the client implementation.

- Chris


Ryan Lehmkuhl

Nov 6, 2024, 1:03:03 PM

to Chris Thompson, jdeb...@chromium.org, trusty-t...@chromium.org

Chris,

Thanks for such a detailed and quick response! This answered most of my questions :)

re: client caching, would it be fair to say that most users make very few, if any, SCT audits then? Speaking for myself, I would guess I don't visit more than a few hundred unique domains over the span of a year (though perhaps I'm underestimating things), in which case there would be a good chance I never make an audit (if I'm understanding things correctly).

Ah thanks! Does there happen to be a mapping to the domains these SCTs correspond to?

Best,

Ryan

Chris Thompson

Nov 6, 2024, 1:38:54 PM

to Ryan Lehmkuhl, jdeb...@chromium.org, trusty-t...@chromium.org

Yes, we expect that most users would make few or no SCT audits. To that end, for Phase 2 (opt-out) auditing we also cap the number of reports that a client can ever send to 3 as a "max privacy exposure" limit.
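
A back-of-the-envelope check of the "few or no audits" expectation, assuming each connection is sampled independently at the 1-in-10k rate mentioned earlier in the thread. The function name and the connection counts are hypothetical, and the sketch ignores the deduplication cache, the popular SCTs list, the hashdance lookup, and the report cap, so it only bounds how often a connection is sampled at all.

```python
def prob_at_least_one_sample(connections, rate=1 / 10_000):
    """Probability that at least one of `connections` connections is sampled.

    Assumes independent sampling per connection at a fixed rate.
    """
    return 1.0 - (1.0 - rate) ** connections


if __name__ == "__main__":
    # Hypothetical connection counts, chosen only for illustration.
    for n in (300, 10_000, 100_000):
        print(f"{n:>7} connections: {prob_at_least_one_sample(n):.3f}")
```

Under these assumptions, a few hundred connections give roughly a 3% chance of any sample, and about 10,000 connections are needed before a sample becomes more likely than not.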

We don't have a mapping of SCTs to domains (the list is dynamically generated server side, and then we ship the minimum data of just the leaf hashes). However, you might be able to query CT logs to kind of reverse engineer this with the ctclient tool -- for example, I think you could use ctclient get-inclusion-proof to check each CT log to find the one that has an inclusion proof for the leaf hash, which gets you the log index; then you could use ctclient get-entries with the log and index to get the log entry and certificate. I haven't tried this myself, but hopefully that gives you a possible path forward.

Ryan Lehmkuhl

Nov 6, 2024, 1:49:07 PM

to Chris Thompson, jdeb...@chromium.org, trusty-t...@chromium.org

Ah, I see. Just to make sure I'm understanding, "report" here refers to making an SCT audit? For some reason I thought it referred to when a user actually reported a failed audit.

Thanks for the tip! I'll go ahead and try that.

Chris Thompson

Nov 6, 2024, 1:52:10 PM

to Ryan Lehmkuhl, jdeb...@chromium.org, trusty-t...@chromium.org

The "report" is the bundle of details that gets sent to the server (either because it was randomly selected for Phase 1 clients, or because it was randomly selected and also failed the hashdance lookup for Phase 2 clients). For Phase 2 clients we don't cap how many hashdance lookups they can do (the sampling rate, distribution of SCTs, and our bucketing approach get us to our privacy threshold for hashdance lookups) but we do cap how many "final reports" they can ever send, if that makes sense.
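
The distinction between uncapped lookups and capped final reports can be sketched as a small state machine for a Phase 2 client. Everything here is an illustrative assumption rather than Chrome's code: the class name `Phase2Auditor`, the `lookup_found` callback standing in for the hashdance lookup, the choice to skip SCTs found on the popular list, and the order of the checks. The deduplication cache is left out for brevity. Only the 1-in-10k rate and the cap of 3 reports come from the thread.

```python
import random


class Phase2Auditor:
    """Hypothetical model of a Phase 2 (opt-out) client's per-connection decision."""

    def __init__(self, popular_scts, lookup_found, sampling_rate=1 / 10_000,
                 max_reports=3, rng=None):
        self.popular_scts = set(popular_scts)
        # Stand-in for the hashdance lookup: returns True if the SCT is found.
        self.lookup_found = lookup_found
        self.sampling_rate = sampling_rate
        self.max_reports = max_reports
        self.rng = rng or random.Random()
        self.lookups_done = 0   # not capped
        self.reports_sent = 0   # capped at max_reports

    def on_connection(self, cert_scts):
        if self.rng.random() >= self.sampling_rate:
            return "not_sampled"
        sct = self.rng.choice(list(cert_scts))
        # Assumption: an SCT on the popular list needs no lookup.
        if sct in self.popular_scts:
            return "popular_skip"
        self.lookups_done += 1
        if self.lookup_found(sct):
            return "found_in_log"
        # Lookup failed: send a final report unless the lifetime cap is reached.
        if self.reports_sent >= self.max_reports:
            return "report_cap_reached"
        self.reports_sent += 1
        return "report_sent"
```

In this model a client can perform any number of lookups, but once `reports_sent` reaches 3 every further failed lookup ends in `report_cap_reached`.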

Ryan Lehmkuhl

Nov 6, 2024, 1:58:59 PM

to Chris Thompson, jdeb...@chromium.org, trusty-t...@chromium.org

That makes sense! I think that's all the questions I have right now. Thanks for the quick and helpful responses :)

Best,

Ryan

Ryan Lehmkuhl

Nov 11, 2024, 10:30:52 AM

to Chris Thompson, jdeb...@chromium.org, trusty-t...@chromium.org

Hi again,

Hope your weekend went well!

To give some more context to my previous questions, we recently did some PIR work
relevant to SCT auditing. This led us to think about the privacy / security guarantees
of an ideal auditing protocol, and which of those guarantees Chrome's current
approach achieves. We're planning to submit an RWC talk with some of that
analysis–would love to have a conversation about this with y'all at some point if there's
interest! Of course, feel free to forward this email to anyone else relevant.

One thing that we thought you should know is that we believe there is a flaw with the current
anonymity set analysis for a simple reason: it doesn't consider the highly-skewed
popularity of websites. You mention this fact later when discussing pre-loading SCTs,
but it seems to be critical to the anonymity set analysis as well. For example, in a recent
web study over 95% of page-loads were for the top one-million domains. As a result,
with high probability any audit result will fall within the top one-million domains. If I
repeat your anonymity set analysis with this in mind then I get, with high probability,
an anonymity set of size 2. (Chrome's caching behavior muddles this analysis a little
bit, but the point should still more or less stand.)

Let me know if I missed anything here.

Best,
Ryan

Joe DeBlasio

Nov 13, 2024, 10:00:45 PM

to Ryan Lehmkuhl, Chris Thompson, trusty-t...@chromium.org

Hi Ryan,

Thanks for the context and analysis! This is interesting, and it's definitely an oversight that we didn't take website popularity skew into account. From an information-theoretic perspective, the current approach is certainly far from ideal.

Thinking out loud, and speaking only for myself: while I still have dreams of implementing a stronger PIR scheme for SCT auditing, I can't say I'm going to lose much sleep over the current implementation. Several things come together to make the current scheme feel like a pretty small privacy compromise: a sampling rate of 1-in-10k connections, combined with the top-heavy distribution of site popularity and the generally low number of sites visited per user per day; the fact that the queries are credential-less (so they leak the IP address but little other metadata); and the fact that an anonymity set of 2 still gives a user plausible deniability (which feels vaguely differentially private).

The popularity-skew observation would definitely prevent us from upping the sampling rate of the current scheme, but even without the skew concern, we'd need to up the sampling rate several orders of magnitude above where it is now for it to meaningfully change what value we get out of auditing. Something I noted recently to Lena Heimberger and Bas Westerbaan (who spent some time this summer investigating whether more recent PIR schemes could be interesting for SCT auditing) was that the current auditor isn't an effective tool against a targeted attack, but works fine as a means of monitoring logs for unintended misbehavior. Unless PIR enabled us to use SCT auditing to detect an individual misissued SCT with high probability, it subjectively doesn't add much value above baseline.

I'm always happy to chat about SCT auditing (or frankly anything else) if that's valuable. I'm also definitely excited to see your talk! Whether it ends up at RWC or somewhere else, if you want to give a reprise to Chrome Security at some point, I'd love to make that happen.

Best,

Joe

Ryan Lehmkuhl

Nov 15, 2024, 1:33:52 PM

to Joe DeBlasio, Chris Thompson, trusty-t...@chromium.org

Joe,

Thanks for the thoughtful response! It's very helpful to hear different opinions
on something like this. Just to be explicit: are you all ok with us submitting a
talk proposal with this analysis in it?

Yes, I actually had a few conversations with both Lena and Bas recently and
they mentioned this fact. I actually have two slightly different ideas for improving
the value of auditing: I don't have time to type them up right now, but will send you
another email early next week, I'd be curious to get your thoughts.

And if the talk gets in, I would love to give a reprise! I'll send you an email in a few
months if that ends up happening.

Best,
Ryan

Joe DeBlasio

Nov 15, 2024, 2:04:46 PM

to Ryan Lehmkuhl, Chris Thompson, trusty-t...@chromium.org

Hi Ryan,

Your work is based on open source and public information, and the implementation details we've provided in this email thread are available to anyone with sufficient open source code spelunking, so I don't think it's my place to grant or withhold a blessing. While I'd probably hope you'd keep the privacy leakage in its greater context, I think the lack of domain popularity analysis is a good observation and probably worth talking about publicly. Candidly, my only real worry is a (IMO) hyperbolic news headline, but that's my problem, not yours. 🙃

Otherwise, I'm looking forward to hearing more of your ideas on improving value, and on how the submission goes.

Best,

Joe

Ryan Lehmkuhl

Nov 18, 2024, 4:13:53 PM

to Joe DeBlasio, Chris Thompson, trusty-t...@chromium.org

Joe,

Yes, of course! I guess my question was more asking if there was specific
framing / context you would like to be included in the talk: you all have
done some awesome work here and I want to make sure I respect that :)
From what you've said so far, I'll make sure to include your points about the
low frequency + credential-less queries in the talk.

--

To give some relevant details you might find interesting, some of our recent
work has focused on using popularity distributions of databases to speed up
PIR. The core idea is conceptually simple: split up the database into popular
and un-popular subsets, and query these subsets with different probabilities.
By doing this you introduce a few complications (decreased correctness,
some tricks required for indexing into the databases, etc.) but you end up with
a concretely more efficient scheme if the popularity distribution is far from
uniform. For example, here's a table from our talk submission for the relevant
SCT auditing costs. (Note that the client state is just a list of the top
1-million popular domains and can be updated very infrequently).

As I mentioned before, I believe that making explicit use of website popularity
can give two (more realistic) paths forwards towards increasing the value of
SCT auditing:

Better audit coverage: currently the chance that a website is audited is
directly proportional to its popularity (since a client has to visit it). As a
result, less-popular websites, even ones with thousands of visitors, might
never be audited. To mitigate this, an auditing scheme could explicitly set
a website's sampling rate to be closer to the inverse of its popularity.

Say we did this in a fairly granular fashion. So you keep the auditing rate
the same for the top 95% of websites, but increase the rate to 1-in-2500 for
the bottom 5% of queries. This would increase costs by ~15% (the
expected auditing rate becomes 1-in-8700) for a 4x "better coverage"
(the metric here is a bit loose) in the tail of the distribution.

Protection from targeted attacks: as you mention, protecting from targeted
attacks would require increasing the sampling rate by several orders of
magnitude. Instead of doing this for all domains, this could be done for only
popular domains, protecting against targeted attacks for 95% of websites
that users visit at a much smaller overhead. Considering this is a fairly
small database + we have good batching techniques for PIR, I wouldn't be
surprised if you could get something reasonable here. I'd have to
understand what the real-world constraints are better to know how best to
do the batching though.
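
The cost figures in the "better audit coverage" idea above can be checked with a few lines of arithmetic. This sketch assumes that "top 95%" and "bottom 5%" are shares of queries, and that the unchanged rate is the 1-in-10k figure from earlier in the thread; the function and variable names are made up for the example.

```python
from fractions import Fraction


def blended_rate(segments):
    """Expected per-query audit rate for (share_of_queries, audit_rate) segments."""
    return sum(share * rate for share, rate in segments)


baseline = Fraction(1, 10_000)

proposed = blended_rate([
    (Fraction(95, 100), Fraction(1, 10_000)),  # popular traffic: rate unchanged
    (Fraction(5, 100), Fraction(1, 2_500)),    # tail traffic: rate raised
])

one_in = 1 / proposed                    # "1-in-N" form of the blended rate
cost_increase = proposed / baseline - 1  # relative growth in audit volume
coverage_gain = Fraction(1, 2_500) / Fraction(1, 10_000)  # tail rate vs. baseline

if __name__ == "__main__":
    print(f"blended rate: 1-in-{float(one_in):.0f}")
    print(f"cost increase: {float(cost_increase):.0%}")
    print(f"tail coverage gain: {float(coverage_gain):.0f}x")
```

Under those assumptions the blended rate is 0.95/10000 + 0.05/2500 = 0.000115, which is about 1-in-8700, a 15% increase in audits for a 4x higher rate in the tail.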

Both of these directions arguably require PIR due to the increased audit frequency
+ require some client storage to know what is popular + you'd have to switch to a
regular audit frequency or else be fine with some leakage from how often the
client queries (since a given client's audit frequency is now correlated with how
"normal" their web traffic is).

Anyways, sorry for the long email! Hopefully this is interesting to you, would be
very interested to hear your thoughts here.

Best,

Ryan
